Extraction

Extraction turns an image, a PDF, or an HTML email into a structured Receipt row. The pipeline is small on purpose: two models, one schema, one confidence threshold.

Two models

Primary: Gemini 3 Flash (preview). Fast, cheap, multimodal.
Fallback: Claude Sonnet 4.6. Slower, more accurate on hard receipts.

The primary runs first. We escalate to the fallback when:

The schema validator rejected the response twice
The confidence score came back below the configured threshold (default 0.85)
The primary timed out or returned an error

The fallback runs once. If it also fails, the receipt is held for review.
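The escalation rules above can be sketched as one function. This is a minimal synchronous sketch under stated assumptions. `Attempt`, `Model`, and `extract` are hypothetical names. Each model call is stubbed as a function that returns one outcome, and timeouts are folded into the generic error case. The sketch also assumes two more things. A schema rejection from the fallback is not retried. A low-confidence fallback result is still returned, so the routing described under Confidence and review can flag it.

```typescript
// Hypothetical outcome of a single model call.
type Attempt =
  | { kind: "ok"; confidence: number }
  | { kind: "schema_error" } // schema validator rejected the response
  | { kind: "error" };       // timeout or provider error

type Model = () => Attempt;

type Result =
  | { status: "extracted"; model: "primary" | "fallback"; confidence: number }
  | { status: "held_for_review" };

function extract(primary: Model, fallback: Model, threshold = 0.85): Result {
  // Primary: a first schema rejection gets one retry; a second one escalates.
  let schemaRejections = 0;
  while (schemaRejections < 2) {
    const attempt = primary();
    if (attempt.kind === "ok") {
      if (attempt.confidence >= threshold) {
        return { status: "extracted", model: "primary", confidence: attempt.confidence };
      }
      break; // below threshold: escalate
    }
    if (attempt.kind === "error") break; // timeout or error: escalate at once
    schemaRejections++;
  }

  // Fallback runs exactly once (assumption: no schema retry here).
  const second = fallback();
  if (second.kind === "ok") {
    // Low confidence is not rejected here; review routing happens downstream.
    return { status: "extracted", model: "fallback", confidence: second.confidence };
  }
  return { status: "held_for_review" };
}
```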

What the model returns

The schema is a single Zod object covering:

vendor, vendor_original, description, category, country_code
amount, currency, purchased_at
line_items[] with name, amount, optional assigned_name
tip_amount, service_charge
split_hint with type, members, custom_shares, confidence
unresolved_names[] for people the model can’t match to a member
purchaser_hint, notes, confidence
vendor_address, vendor_phone, vendor_maps_query (opportunistic)
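For reference, here is the same shape written as plain TypeScript types rather than Zod, so it runs without dependencies. Several details are assumptions, not the real schema: the field types, the nullability choices, the `custom_shares` shape, and the `"equal"` split type. The example values are illustrative. They encode a subject-line split with Steve and Rod, plus a handwritten “Steve” beside one line item.

```typescript
interface LineItem {
  name: string;
  amount: number;
  assigned_name?: string; // e.g. a handwritten "Steve" beside the row
}

interface SplitHint {
  type: string;                                 // allowed values not documented here
  members: string[];
  custom_shares: Record<string, number> | null; // shape assumed
  confidence: number;
}

// Plain-TypeScript mirror of the extraction output; types are assumptions.
interface ExtractionOutput {
  vendor: string;
  vendor_original: string;
  description: string;
  category: string;
  country_code: string;
  amount: number;       // receipt total in its original currency
  currency: string;
  purchased_at: string; // date format assumed
  line_items: LineItem[];
  tip_amount: number | null;
  service_charge: number | null;
  split_hint: SplitHint | null;
  unresolved_names: string[];
  purchaser_hint: string | null;
  notes: string | null;
  confidence: number;
  vendor_address?: string; // opportunistic fields: omitted when absent
  vendor_phone?: string;
  vendor_maps_query?: string;
}

// Illustrative values only.
const example: ExtractionOutput = {
  vendor: "Example Cafe",
  vendor_original: "EXAMPLE CAFE LTD",
  description: "Lunch",
  category: "food",
  country_code: "GB",
  amount: 42.5,
  currency: "GBP",
  purchased_at: "2025-01-15",
  line_items: [
    { name: "Soup", amount: 8.5, assigned_name: "Steve" },
    { name: "Sandwich", amount: 30 },
  ],
  tip_amount: 4,
  service_charge: null,
  // Whether the purchaser also appears in members is not specified.
  split_hint: { type: "equal", members: ["Steve", "Rod"], custom_shares: null, confidence: 0.8 },
  unresolved_names: [],
  purchaser_hint: null,
  notes: null,
  confidence: 0.92,
};
```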

The amount here is the receipt total in its original currency. Conversion to the jar’s base currency happens in a separate FX queue step after extraction returns; the converted base amount is stored on the receipt row, not in the model output.
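The FX step can be sketched as a pure function over the receipt row. `ReceiptRow`, `applyFx`, and the `base_amount` field name are hypothetical. Where the rate comes from is out of scope here. Rounding to two decimal places assumes a base currency with cents. The point is that the extracted `amount` and `currency` stay untouched, and the converted value lands in a separate field.

```typescript
// Hypothetical receipt-row shape: only the fields the FX step touches.
interface ReceiptRow {
  amount: number;             // receipt total in original currency (from extraction)
  currency: string;           // original currency code, e.g. "EUR"
  base_amount: number | null; // filled in by the FX step; null until then
}

// `rate` = units of base currency per unit of the receipt's currency.
// Rounds to 2 decimal places (assumes a 2-decimal base currency).
function applyFx(row: ReceiptRow, baseCurrency: string, rate: number): ReceiptRow {
  const base =
    row.currency === baseCurrency
      ? row.amount // same currency: no conversion, rate ignored
      : Math.round(row.amount * rate * 100) / 100;
  return { ...row, base_amount: base }; // original amount and currency preserved
}
```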

assigned_name and split_hint are how handwritten or in-email splits make it through the pipeline. A subject line reading “split 3 ways with Steve and Rod” and a handwritten “Steve” next to a row both end up encoded in the same shape.

Files API for big photos

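Anything whose raw size exceeds a per-provider threshold is uploaded via the provider’s Files API instead of being sent inline. Each threshold sits below the model’s inline-payload cap, which leaves headroom for base64 encoding (roughly 4/3 the raw size):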

Anthropic: 3.5MB raw threshold (5MB inline cap)
Gemini: 14MB raw threshold (20MB inline cap)

The raw payload is uploaded once, the model references it by handle, and the uploaded file is deleted in a finally block after extraction returns, whether it succeeded or not. Provider auto-expiry (48h for Gemini, workspace-policy for Anthropic) is the safety net; eager deletion keeps the file list tidy and quota predictable. A delete failure is logged but never fails the extraction.
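A sketch of the size check and the upload-then-delete lifecycle follows. `FilesClient`, `extractWithUpload`, and `useFilesApi` are hypothetical stand-ins, not the Anthropic or Gemini SDK surface. Treating 1MB as 1024 * 1024 bytes is also an assumption. The sketch shows two things. First, `return await` inside `try` makes the `finally` run after the model call settles, not before. Second, an inner `try`/`catch` logs a delete failure instead of rethrowing it.

```typescript
type Provider = "anthropic" | "gemini";

// Raw-size thresholds from the docs; 1MB = 1024 * 1024 bytes (assumption).
const MB = 1024 * 1024;
const FILES_API_THRESHOLD: Record<Provider, number> = {
  anthropic: 3.5 * MB,
  gemini: 14 * MB,
};

function useFilesApi(provider: Provider, rawBytes: number): boolean {
  return rawBytes > FILES_API_THRESHOLD[provider];
}

// Hypothetical client interface, not a real SDK.
interface FilesClient {
  upload(data: Uint8Array): Promise<string>; // resolves to a file handle
  delete(handle: string): Promise<void>;
}

type Payload = { kind: "inline"; data: Uint8Array } | { kind: "file"; handle: string };

async function extractWithUpload<T>(
  provider: Provider,
  data: Uint8Array,
  files: FilesClient,
  runModel: (payload: Payload) => Promise<T>,
  log: (msg: string) => void = console.warn,
): Promise<T> {
  if (!useFilesApi(provider, data.byteLength)) {
    return runModel({ kind: "inline", data });
  }
  const handle = await files.upload(data); // uploaded once
  try {
    // `return await` keeps the finally block from running before the model settles.
    return await runModel({ kind: "file", handle });
  } finally {
    // Eager cleanup on success or failure; provider auto-expiry is the safety net.
    try {
      await files.delete(handle);
    } catch (err) {
      log(`file delete failed for ${handle}: ${String(err)}`); // logged, never rethrown
    }
  }
}
```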

Confidence and review

The confidence score is the one the model itself reports. We treat it as a directional signal rather than a calibrated probability; receipts under 0.85 (configurable) are routed to needs_review with reason low_confidence.

Long-thread emails (forwarded chains) are also routed to needs_review with reason long_thread, regardless of confidence. We picked the most likely receipt, but the user should double-check that the right attachment was chosen.
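Both review rules can be sketched as one routing function. `reviewReasons` and `RoutingInput` are hypothetical names. This page does not say how a thread is judged “long”. Returning every applicable reason, rather than only the first, is an assumption of this sketch. In this sketch the comparison is strict, so a confidence of exactly 0.85 is not routed to review.

```typescript
type ReviewReason = "low_confidence" | "long_thread";

interface RoutingInput {
  confidence: number;    // model-reported score
  isLongThread: boolean; // how "long" is decided is outside this sketch
}

// Returns the review reasons that apply; an empty array means no review.
function reviewReasons(input: RoutingInput, threshold = 0.85): ReviewReason[] {
  const reasons: ReviewReason[] = [];
  if (input.confidence < threshold) reasons.push("low_confidence");
  if (input.isLongThread) reasons.push("long_thread"); // regardless of confidence
  return reasons;
}
```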

What you don’t do

You don’t write extraction prompts. You don’t manage model credentials in your jar. You don’t pick between models. The pipeline does that.