Intake pipeline
The intake path is short and deterministic. Every email goes through the same sequence; nothing is skipped.
1. Receive
Cloudflare Email Routing accepts the message at jar@splitjar.app and hands it to the worker as a ForwardableEmailMessage. We read the raw message only up to a hard byte budget; anything larger is dropped at the door.
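A minimal sketch of reading under a byte budget, assuming the raw message arrives as a web ReadableStream of bytes (in a Cloudflare email handler that is message.raw). The function name readWithBudget and the MAX_RAW_BYTES value are hypothetical; the real budget is not stated here. The reader stops pulling data as soon as the budget is exceeded instead of buffering the whole message first.

```typescript
// Minimal shape of a byte-stream reader; a web ReadableStream<Uint8Array> fits it.
interface ByteReader {
  read(): Promise<{ done: boolean; value?: Uint8Array }>;
  cancel(reason?: unknown): Promise<void>;
}
interface ByteStream {
  getReader(): ByteReader;
}

// Hypothetical budget: the real limit is not documented here.
const MAX_RAW_BYTES = 5 * 1024 * 1024;

// Hypothetical helper: returns the full message bytes, or null once the budget is exceeded.
async function readWithBudget(
  stream: ByteStream,
  maxBytes: number = MAX_RAW_BYTES,
): Promise<Uint8Array | null> {
  const reader = stream.getReader();
  const chunks: Uint8Array[] = [];
  let total = 0;
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    if (!value) continue;
    total += value.byteLength;
    if (total > maxBytes) {
      // Over budget: stop reading; the caller drops the message.
      await reader.cancel("over byte budget");
      return null;
    }
    chunks.push(value);
  }
  // Concatenate the accepted chunks into one contiguous buffer.
  const out = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    out.set(chunk, offset);
    offset += chunk.byteLength;
  }
  return out;
}
```

In a worker, message.rawSize also allows a cheap size check before any bytes are read; the streaming check still guards against a size that turns out to be wrong.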
2. Filter automated mail
Bulk and auto-responder mail is dropped before any database write. The rules:
- Auto-Submitted present and not "no" → automated (RFC 3834)
- X-Auto-Response-Suppress set → out-of-office responder
- X-Spam-Flag: YES → upstream classifier flagged it
- Precedence: bulk | list | junk → mailing list / bulk sender
Filtering at this stage keeps the unknown-sender bounce path from replying to automated mail and feeding mail loops.
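The four rules can be sketched as a single header check. This is an illustration, not the worker's actual code: the function name automatedReason and its reason labels are hypothetical, and the header lookup is typed loosely so a Fetch API Headers object (as on message.headers) fits it.

```typescript
// Minimal header lookup; a Fetch API Headers object satisfies this shape.
interface HeaderLookup {
  get(name: string): string | null;
}

type AutomatedReason = "auto-submitted" | "auto-response-suppress" | "spam-flag" | "precedence";

// Hypothetical helper: returns why a message looks automated, or null if it passes.
function automatedReason(headers: HeaderLookup): AutomatedReason | null {
  // RFC 3834: any Auto-Submitted keyword other than "no" marks automated mail.
  // The keyword may be followed by ";"-separated parameters.
  const autoSubmitted = headers.get("Auto-Submitted");
  if (autoSubmitted !== null) {
    const keyword = autoSubmitted.split(";")[0].trim().toLowerCase();
    if (keyword !== "no") return "auto-submitted";
  }
  // Presence alone counts, whatever the value.
  if (headers.get("X-Auto-Response-Suppress") !== null) return "auto-response-suppress";
  if (headers.get("X-Spam-Flag")?.trim().toUpperCase() === "YES") return "spam-flag";
  const precedence = headers.get("Precedence")?.trim().toLowerCase();
  if (precedence === "bulk" || precedence === "list" || precedence === "junk") return "precedence";
  return null;
}
```

Returning a reason rather than a boolean makes it easy to log why a message was dropped.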
3. Authenticate the sender
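We read the Authentication-Results and ARC-Authentication-Results headers stamped by Cloudflare. Only headers whose authserv-id is mx.cloudflare.net are trusted. The verdict gate is DMARC, not DKIM: a passing DKIM signature proves only the signing domain (d=), which need not match the visible From: domain, whereas DMARC additionally requires the authenticated domain to align with From:.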
See Sender authentication for the full story.
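A rough sketch of the trust-then-gate logic, assuming each Authentication-Results or ARC-Authentication-Results header instance is available as its own string (for example from the MIME parser; a Fetch Headers.get call joins repeated headers with ", ", which would mangle them). The function name trustedDmarcResult is hypothetical, and the ";" split is a simplification that ignores comments and quoted strings.

```typescript
// Only verdicts stamped by this authserv-id are trusted.
const TRUSTED_AUTHSERV_ID = "mx.cloudflare.net";

// Hypothetical helper: returns the DMARC result (e.g. "pass", "fail") from the first
// trusted header that reports one, or null if no trusted header mentions DMARC.
function trustedDmarcResult(headerValues: string[]): string | null {
  for (const raw of headerValues) {
    // Simplification: split on ";" without handling comments or quoted strings.
    const parts = raw.split(";").map((p) => p.trim()).filter((p) => p !== "");
    // ARC-Authentication-Results values start with an instance tag such as "i=1".
    if (parts.length > 0 && /^i=\d+$/i.test(parts[0])) parts.shift();
    if (parts.length === 0) continue;
    // The authserv-id may be followed by an optional version number.
    const authservId = parts[0].split(/\s+/)[0].toLowerCase();
    if (authservId !== TRUSTED_AUTHSERV_ID) continue;
    for (const resinfo of parts.slice(1)) {
      const m = /^dmarc\s*=\s*([a-z]+)/i.exec(resinfo);
      if (m) return m[1].toLowerCase();
    }
  }
  return null;
}
```

Under the DMARC gate only "pass" counts as authenticated; per the failure handling at the end of this page, anything else sends the receipt to needs_review rather than rejecting it.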
4. Parse attachments
The MIME parser pulls out images and PDFs. Caps:
- 10 files per ingest
- Per-file size limit
- Total size limit
Anything past any of these caps is dropped, and we log what was rejected.
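A sketch of cap enforcement over already-parsed MIME parts. The 10-file cap is documented; MAX_FILE_BYTES and MAX_TOTAL_BYTES are hypothetical placeholders because the real per-file and total limits are not given here. The Attachment shape and the applyAttachmentCaps name are also assumptions. Rejected parts are returned so the caller can log them.

```typescript
interface Attachment {
  filename: string;
  mimeType: string;
  size: number; // bytes
}

type RejectReason = "file-too-large" | "too-many-files" | "total-too-large";

interface Rejection {
  filename: string;
  reason: RejectReason;
}

const MAX_FILES = 10; // documented cap
// Hypothetical values: the docs name these caps but not their sizes.
const MAX_FILE_BYTES = 10 * 1024 * 1024;
const MAX_TOTAL_BYTES = 25 * 1024 * 1024;

// Hypothetical helper: keeps images and PDFs within the caps, in message order,
// and returns the rejected ones for logging.
function applyAttachmentCaps(parts: Attachment[]): { kept: Attachment[]; rejected: Rejection[] } {
  const kept: Attachment[] = [];
  const rejected: Rejection[] = [];
  let total = 0;
  for (const part of parts) {
    const type = part.mimeType.toLowerCase();
    // Only images and PDFs are receipt candidates; other parts are ignored.
    if (!type.startsWith("image/") && type !== "application/pdf") continue;
    let reason: RejectReason | null = null;
    if (part.size > MAX_FILE_BYTES) reason = "file-too-large";
    else if (kept.length >= MAX_FILES) reason = "too-many-files";
    else if (total + part.size > MAX_TOTAL_BYTES) reason = "total-too-large";
    if (reason !== null) {
      rejected.push({ filename: part.filename, reason });
      continue;
    }
    kept.push(part);
    total += part.size;
  }
  return { kept, rejected };
}
```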
5. Persist + route
The receipt row goes into D1; the raw MIME and attachments go to R2. Routing matches the sender's email address to a known user; if there is a match, the receipt is assigned to a jar chosen from the subject line, falling back to the user's default jar.
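One way the jar choice could work, as a sketch only: the exact subject-matching rule is not documented, so this assumes a case-insensitive substring match on jar names, preferring the longest matching name. The Jar shape and the pickJar name are hypothetical.

```typescript
interface Jar {
  id: string;
  name: string;
}

// Hypothetical rule: the jar whose name appears in the subject wins; longer names are
// tried first so "Trip" does not shadow "Trip to Rome". No match falls back to the default.
function pickJar(subject: string, jars: Jar[], defaultJarId: string): string {
  const s = subject.toLowerCase();
  const byLength = [...jars].sort((a, b) => b.name.length - a.name.length);
  const hit = byLength.find((j) => j.name.trim() !== "" && s.includes(j.name.toLowerCase()));
  return hit ? hit.id : defaultJarId;
}
```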
6. Enqueue extraction
The receipt is queued for the extract pipeline. Web uploads skip steps 1–4 and start here.
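A minimal sketch of the hand-off, assuming a Cloudflare Queues producer binding, whose send(body) method returns a promise. The ExtractJob payload shape, its source field, and the enqueueExtraction name are hypothetical; the point is that email and web uploads converge on the same queue message.

```typescript
// Hypothetical job payload; the real message shape is not documented here.
interface ExtractJob {
  receiptId: string;
  source: "email" | "web";
}

// Minimal producer shape; a Queues binding's send(body) has this form.
interface JobQueue {
  send(body: ExtractJob): Promise<void>;
}

// Hypothetical helper: both intake paths enqueue the same job shape for extraction.
async function enqueueExtraction(
  queue: JobQueue,
  receiptId: string,
  source: ExtractJob["source"],
): Promise<void> {
  await queue.send({ receiptId, source });
}
```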
7. Extract
A two-model pipeline reads the receipt. See Extraction for details.
8. Finalise
Once extraction returns and FX conversion completes, the finaliser computes shares, writes the splits, and the receipt joins the running ledger.
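The share rule itself is not described here, so this is only an illustration of the arithmetic a finaliser has to get right: an equal split in integer minor units (e.g. cents), with leftover units handed out one at a time so the shares sum exactly to the total. The equalShares name and the equal-split rule are assumptions.

```typescript
// Hypothetical split rule: equal shares in integer minor units.
function equalShares(totalMinor: number, participants: string[]): Map<string, number> {
  if (!Number.isInteger(totalMinor) || totalMinor < 0) {
    throw new RangeError("totalMinor must be a non-negative integer");
  }
  if (participants.length === 0) throw new RangeError("need at least one participant");
  const base = Math.floor(totalMinor / participants.length);
  let remainder = totalMinor - base * participants.length;
  const shares = new Map<string, number>();
  for (const p of participants) {
    // Give one leftover unit to each of the first `remainder` participants.
    const extra = remainder > 0 ? 1 : 0;
    remainder -= extra;
    shares.set(p, base + extra);
  }
  return shares;
}
```

For example, 1000 split three ways gives 334, 333 and 333; working in integer minor units after FX conversion avoids floating-point drift in the ledger.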
When things go sideways
- Stuck receipts are swept on each queue handler invocation; any receipt that has been in the extracting state for more than a few minutes is re-queued.
- Sender-auth failures route to needs_review instead of being rejected outright, so the user can still see and confirm the receipt.
- Duplicates are detected by a tier-1 exact content-hash check, then a tier-2 fuzzy match (vendor + amount + date), and held until the recipient confirms or dismisses them.
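The stuck-receipt sweep and the two duplicate tiers can be sketched as pure functions. Several details are assumptions: STUCK_AFTER_MS stands in for "a few minutes", the statusSince timestamp field is hypothetical, vendor normalisation (lowercase, letters and digits only) is one plausible choice, and maxDayGap is a hypothetical date tolerance that defaults to an exact date match. The names findStuck and duplicateTier are also illustrative.

```typescript
interface ReceiptState {
  id: string;
  status: string; // e.g. "extracting", "needs_review"
  statusSince: number; // hypothetical: epoch ms when the status was last set
}

// Hypothetical threshold standing in for "a few minutes".
const STUCK_AFTER_MS = 5 * 60 * 1000;

// Hypothetical helper: ids to re-queue on this queue handler invocation.
function findStuck(receipts: ReceiptState[], nowMs: number): string[] {
  return receipts
    .filter((r) => r.status === "extracting" && nowMs - r.statusSince > STUCK_AFTER_MS)
    .map((r) => r.id);
}

interface ReceiptFingerprint {
  contentHash: string; // digest of the receipt content (hash choice assumed)
  vendor: string;
  amountMinor: number; // integer minor units
  date: string; // ISO "YYYY-MM-DD"
}

// Assumed normalisation: compose Unicode, lowercase, keep only letters and digits.
function normVendor(v: string): string {
  return v.normalize("NFKC").toLowerCase().replace(/[^\p{L}\p{N}]/gu, "");
}

// Tier 1: identical content hash. Tier 2: same vendor and amount, dates within maxDayGap.
function duplicateTier(a: ReceiptFingerprint, b: ReceiptFingerprint, maxDayGap = 0): 1 | 2 | null {
  if (a.contentHash === b.contentHash) return 1;
  const dayMs = 24 * 60 * 60 * 1000;
  // An unparseable date yields NaN, which fails the comparison below.
  const gapDays = Math.abs(Date.parse(a.date) - Date.parse(b.date)) / dayMs;
  if (normVendor(a.vendor) === normVendor(b.vendor) && a.amountMinor === b.amountMinor && gapDays <= maxDayGap) {
    return 2;
  }
  return null;
}
```

Checking the cheap exact hash first means the fuzzy comparison only runs on receipts whose bytes differ, such as a forwarded copy and a photo of the same receipt.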