Extraction
Extraction turns an image, a PDF, or an HTML email into a structured Receipt row. The pipeline is small on purpose: two models, one schema, one confidence threshold.
Two models
- Primary: Gemini 3 Flash (preview). Fast, cheap, multimodal.
- Fallback: Claude Sonnet 4.6. Slower, more accurate on hard receipts.
The primary runs first. We escalate to the fallback when:
- The schema validator rejected the response twice
- The confidence score came back below the configured threshold (default 0.85)
- The primary timed out or returned an error
The fallback runs once. If it also fails, the receipt is held for review.
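The escalation rules above can be sketched as a pair of small decision functions. This is a minimal illustration, not the pipeline's actual code: the type names, the `afterPrimary` and `afterFallback` helpers, and the idea that a single schema rejection earns one retry of the primary are all assumptions. Only the three triggers and the 0.85 default come from the text.

```typescript
// Hypothetical types and names; only the triggers and the 0.85 default are from the docs.
type PrimaryOutcome =
  | { kind: "ok"; confidence: number }
  | { kind: "schema_rejected"; rejections: number }
  | { kind: "timeout" }
  | { kind: "error"; message: string };

type NextStep =
  | { action: "accept" }
  | { action: "retry_primary" }
  | { action: "escalate"; reason: "schema_rejected" | "low_confidence" | "primary_failed" };

const DEFAULT_THRESHOLD = 0.85;

// Decides what happens after one run of the primary model.
function afterPrimary(outcome: PrimaryOutcome, threshold: number = DEFAULT_THRESHOLD): NextStep {
  switch (outcome.kind) {
    case "schema_rejected":
      // Assumed: the first rejection retries the primary; the second escalates.
      return outcome.rejections >= 2
        ? { action: "escalate", reason: "schema_rejected" }
        : { action: "retry_primary" };
    case "timeout":
    case "error":
      return { action: "escalate", reason: "primary_failed" };
    case "ok":
      // "Below the threshold" is read as a strict comparison.
      return outcome.confidence < threshold
        ? { action: "escalate", reason: "low_confidence" }
        : { action: "accept" };
  }
}

// The fallback gets a single attempt; a failure holds the receipt for review.
function afterFallback(fallbackSucceeded: boolean): "accept" | "hold_for_review" {
  return fallbackSucceeded ? "accept" : "hold_for_review";
}
```

Keeping the decision separate from the model calls makes each trigger easy to test without invoking either model.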
What the model returns
The schema is a single Zod object covering:
- vendor, vendor_original, description, category, country_code
- amount, currency, purchased_at
- line_items[] with name, amount, optional assigned_name
- tip_amount, service_charge
- split_hint with type, members, custom_shares, confidence
- unresolved_names[] for people the model can’t match to a member
- purchaser_hint, notes, confidence
- vendor_address, vendor_phone, vendor_maps_query (opportunistic)
The amount here is the receipt total in its original currency. Conversion to the jar’s base currency happens in a separate FX queue step after extraction returns; the converted base amount is stored on the receipt row, not in the model output.
assigned_name and split_hint are how handwritten or in-body splits make it through the pipeline. A subject line of “split 3 ways with Steve and Rod” or a handwritten “Steve” next to a row both end up encoded in the same shape.
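To make the shared shape concrete, here is a hedged sketch of how the two cases might look as data. The field names come from the schema list; the field types, the `"equal"` value for type, the sample items and amounts, and the `namesMentioned` helper are all made up for illustration, and the real Zod schema may differ.

```typescript
// Assumed field types; only the field names are taken from the docs.
interface LineItem {
  name: string;
  amount: number;
  assigned_name?: string; // e.g. a handwritten name next to the row
}

interface SplitHint {
  type: string; // "equal" below is an invented value
  members: string[];
  custom_shares?: Record<string, number>;
  confidence: number;
}

interface SplitFields {
  line_items: LineItem[];
  split_hint?: SplitHint;
}

// Subject line: "split 3 ways with Steve and Rod" (sample values are invented)
const fromSubjectLine: SplitFields = {
  line_items: [{ name: "Dinner", amount: 90 }],
  split_hint: { type: "equal", members: ["Steve", "Rod"], confidence: 0.9 },
};

// A handwritten "Steve" next to one row (sample values are invented)
const fromHandwriting: SplitFields = {
  line_items: [
    { name: "Burger", amount: 14, assigned_name: "Steve" },
    { name: "Salad", amount: 11 },
  ],
};

// Hypothetical helper: collects every person named in the split fields, de-duplicated.
function namesMentioned(receipt: SplitFields): string[] {
  const names = new Set<string>();
  for (const item of receipt.line_items) {
    if (item.assigned_name) names.add(item.assigned_name);
  }
  for (const member of receipt.split_hint?.members ?? []) {
    names.add(member);
  }
  return Array.from(names);
}
```

Because both sources land in the same fields, downstream code such as the helper above does not need to know whether a name came from an email subject or from handwriting.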
Files API for big photos
Anything whose raw size exceeds the provider’s upload threshold is sent through that provider’s Files API instead of inline. Each threshold sits below the provider’s inline-payload cap:
- Anthropic: 3.5MB raw threshold (5MB inline cap)
- Gemini: 14MB raw threshold (20MB inline cap)
The raw payload is uploaded once and the model references it by handle. The uploaded file is deleted in a finally block after extraction returns, whether extraction succeeded or not. Provider auto-expiry (48h for Gemini, workspace-policy for Anthropic) is the safety net; eager deletion keeps the file list tidy and quota predictable. A delete failure is logged but never fails the extraction.
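The upload-then-clean-up pattern described above might look roughly like the sketch below. The `FileStore` interface stands in for a provider SDK and is hypothetical, as are `needsFilesApi` and `extractViaFilesApi`; real client libraries use different method names. The thresholds are the ones listed above, and treating MB as 1024 × 1024 bytes is an assumption.

```typescript
// Hypothetical names throughout; only the two thresholds come from the docs.
type Provider = "anthropic" | "gemini";

const MB = 1024 * 1024; // assumption: binary megabytes

const RAW_UPLOAD_THRESHOLD_BYTES: Record<Provider, number> = {
  anthropic: 3.5 * MB,
  gemini: 14 * MB,
};

// True when the raw payload is large enough to go through the Files API.
function needsFilesApi(provider: Provider, rawBytes: number): boolean {
  return rawBytes > RAW_UPLOAD_THRESHOLD_BYTES[provider];
}

// Stand-in for a provider SDK, not a real client interface.
interface FileStore {
  upload(payload: Uint8Array): Promise<string>; // resolves to a file handle
  remove(handle: string): Promise<void>;
}

async function extractViaFilesApi<T>(
  store: FileStore,
  payload: Uint8Array,
  extract: (handle: string) => Promise<T>,
  log: (message: string) => void = console.warn,
): Promise<T> {
  const handle = await store.upload(payload);
  try {
    return await extract(handle);
  } finally {
    // Eager cleanup runs whether extraction succeeded or threw.
    try {
      await store.remove(handle);
    } catch (err) {
      // A delete failure is logged but never fails the extraction.
      log(`failed to delete ${handle}: ${String(err)}`);
    }
  }
}
```

The text does not explain the gap between each threshold and its cap. One plausible reason is headroom for base64 encoding, which inflates raw bytes by about a third.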
Confidence and review
The confidence score is what the model itself reports. We trust it directionally; receipts under 0.85 (configurable) are routed to needs_review with reason low_confidence.
Long-thread emails (forwarded chains) are also routed to needs_review with reason long_thread, regardless of confidence. The pipeline keeps the most likely receipt, but the user should double-check that the right attachment was picked.
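The two review rules can be combined in one routing function, sketched below under stated assumptions. The `routeReceipt` name, the `accepted` status, the `isLongThread` flag, and returning both reasons when both apply are illustrative choices. The text only specifies the needs_review status, the two reason strings, and the 0.85 default.

```typescript
// Hypothetical names; only needs_review, the two reasons, and 0.85 come from the docs.
type ReviewReason = "low_confidence" | "long_thread";

type Routing =
  | { status: "accepted" } // invented name for the non-review outcome
  | { status: "needs_review"; reasons: ReviewReason[] };

interface RoutingInput {
  confidence: number;    // the model's self-reported score
  isLongThread: boolean; // assumed flag set by the email parser
}

function routeReceipt(input: RoutingInput, threshold: number = 0.85): Routing {
  const reasons: ReviewReason[] = [];
  if (input.confidence < threshold) reasons.push("low_confidence");
  // Long threads go to review regardless of confidence.
  if (input.isLongThread) reasons.push("long_thread");
  return reasons.length > 0 ? { status: "needs_review", reasons } : { status: "accepted" };
}
```

For example, `routeReceipt({ confidence: 0.97, isLongThread: true })` still returns needs_review, because the long-thread rule does not depend on the score.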
What you don’t do
You don’t write extraction prompts. You don’t manage model credentials in your jar. You don’t pick between models. The pipeline does that.