Extraction

Extraction turns an image, a PDF, or an HTML email into a structured Receipt row. The pipeline is small on purpose: two models, one schema, one confidence threshold.

  • Primary: Gemini 3 Flash (preview). Fast, cheap, multimodal.
  • Fallback: Claude Sonnet 4.6. Slower, more accurate on hard receipts.

The primary runs first. We escalate to the fallback when:

  • The schema validator rejects the primary's response twice
  • The confidence score comes back below the configured threshold (default 0.85)
  • The primary times out or returns an error

The fallback runs once. If it also fails, the receipt is held for review.
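
The escalation rules above can be sketched as a pure decision function. This is a minimal illustration, not the pipeline's actual code: the PrimaryOutcome type, the nextStep function, and the choice to retry the primary after a first schema rejection are assumptions made for the example. Only the three escalation triggers and the 0.85 default come from the text.

```typescript
// Hypothetical result of one primary-model attempt (names are illustrative).
type PrimaryOutcome =
  | { kind: "ok"; confidence: number }
  | { kind: "schema_rejected"; rejections: number } // rejections so far, including this one
  | { kind: "timeout" }
  | { kind: "error"; message: string };

type NextStep = "accept" | "retry_primary" | "escalate";

// Default threshold stated in the text; configurable per the document.
const DEFAULT_CONFIDENCE_THRESHOLD = 0.85;

function nextStep(
  outcome: PrimaryOutcome,
  threshold: number = DEFAULT_CONFIDENCE_THRESHOLD,
): NextStep {
  switch (outcome.kind) {
    case "ok":
      // Escalate only when the score is strictly below the threshold.
      return outcome.confidence < threshold ? "escalate" : "accept";
    case "schema_rejected":
      // Assumption: the first rejection triggers a primary retry, the second escalates.
      return outcome.rejections >= 2 ? "escalate" : "retry_primary";
    case "timeout":
    case "error":
      return "escalate";
  }
}
```

Keeping the decision separate from the model calls makes the rules easy to test without touching either provider.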

The schema is a single Zod object covering:

  • vendor, vendor_original, description, category, country_code
  • amount, currency, purchased_at
  • line_items[] with name, amount, optional assigned_name
  • tip_amount, service_charge
  • split_hint with type, members, custom_shares, confidence
  • unresolved_names[] for people the model can’t match to a member
  • purchaser_hint, notes, confidence
  • vendor_address, vendor_phone, vendor_maps_query (opportunistic)

The amount here is the receipt total in its original currency. Conversion to the jar’s base currency happens in a separate FX queue step after extraction returns; the converted base amount is stored on the receipt row, not in the model output.

assigned_name and split_hint are how handwritten or in-body splits make it through the pipeline. A subject line of “split 3 ways with Steve and Rod” or a handwritten “Steve” next to a row both end up encoded in the same shape.
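
To make the "same shape" point concrete, here is a hedged sketch of how the two cases might look once extracted. The field names come from the schema list above; the TypeScript types, the "equal" value for split_hint.type, the sample amounts, and the namesMentioned helper are all assumptions for illustration. The real schema is a Zod object whose exact types are not shown in this section.

```typescript
// Dependency-free mirror of the split-related fields (types are assumed).
interface LineItem {
  name: string;
  amount: number;
  assigned_name?: string;
}

interface SplitHint {
  type: string; // assumed values such as "equal"; not confirmed by the docs
  members: string[];
  custom_shares?: Record<string, number>;
  confidence: number;
}

interface ReceiptSplitFields {
  line_items: LineItem[];
  split_hint?: SplitHint;
}

// Subject line: "split 3 ways with Steve and Rod" (illustrative values).
const fromSubjectLine: ReceiptSplitFields = {
  line_items: [{ name: "Dinner", amount: 90 }],
  split_hint: { type: "equal", members: ["Steve", "Rod"], confidence: 0.9 },
};

// Handwritten "Steve" next to one row (illustrative values).
const fromHandwriting: ReceiptSplitFields = {
  line_items: [
    { name: "Burger", amount: 14, assigned_name: "Steve" },
    { name: "Salad", amount: 11 },
  ],
};

// Hypothetical helper: collect every person named in either place, deduplicated and sorted.
function namesMentioned(receipt: ReceiptSplitFields): string[] {
  const names: string[] = [];
  const add = (n: string): void => {
    if (names.indexOf(n) === -1) names.push(n);
  };
  for (const item of receipt.line_items) {
    if (item.assigned_name) add(item.assigned_name);
  }
  for (const member of receipt.split_hint?.members ?? []) add(member);
  return names.sort();
}
```

Because both inputs land in the same fields, downstream code such as namesMentioned does not need to know whether a name came from a subject line or from handwriting.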

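Anything over the provider's raw-size threshold is uploaded via that provider's Files API instead of being sent inline. Each threshold sits below the inline-payload cap, which leaves headroom for encoding overhead: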

  • Anthropic: 3.5MB raw threshold (5MB inline cap)
  • Gemini: 14MB raw threshold (20MB inline cap)
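
The thresholds above can be read as a simple size check. The sketch below is an illustration under stated assumptions: it treats 1MB as 1024 × 1024 bytes, assumes inline payloads are base64-encoded, and assumes the inline cap is measured on the encoded size. The names shouldUseFilesApi and base64Length are hypothetical.

```typescript
type Provider = "anthropic" | "gemini";

// Assumption: the document's "MB" means 1024 * 1024 bytes.
const MB = 1024 * 1024;

// Raw-size thresholds and inline caps as listed in the text.
const RAW_THRESHOLD_BYTES: Record<Provider, number> = {
  anthropic: 3.5 * MB,
  gemini: 14 * MB,
};

const INLINE_CAP_BYTES: Record<Provider, number> = {
  anthropic: 5 * MB,
  gemini: 20 * MB,
};

// Length of standard padded base64 for a payload of rawBytes bytes.
function base64Length(rawBytes: number): number {
  return 4 * Math.ceil(rawBytes / 3);
}

// "Anything over" the threshold goes to the Files API, so the comparison is strict.
function shouldUseFilesApi(provider: Provider, rawBytes: number): boolean {
  return rawBytes > RAW_THRESHOLD_BYTES[provider];
}
```

Under these assumptions, a payload exactly at the raw threshold still fits under the inline cap after encoding: base64 grows a payload by roughly a third, so 3.5MB becomes about 4.67MB and 14MB becomes about 18.67MB.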

The raw payload is uploaded once, the model references it by handle, and the handle is deleted in a finally block after extraction returns, whether it succeeded or not. Provider auto-expiry (48h for Gemini, per workspace policy for Anthropic) is the safety net; eager deletion keeps the file list tidy and quota predictable. A delete failure is logged but never fails the extraction.
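
The upload, extract, delete lifecycle described above might look like the following sketch. The FileStore interface and the withUploadedFile function are hypothetical stand-ins, not either provider's SDK; the point is only the try/finally shape and the swallowed delete error.

```typescript
// Hypothetical minimal view of a provider's file storage (not a real SDK interface).
interface FileStore {
  upload(bytes: Uint8Array): Promise<string>; // resolves to a file handle
  delete(handle: string): Promise<void>;
}

async function withUploadedFile<T>(
  store: FileStore,
  bytes: Uint8Array,
  extract: (handle: string) => Promise<T>,
  log: (message: string) => void = (message) => console.warn(message),
): Promise<T> {
  // Upload once; the model call refers to the payload by handle.
  const handle = await store.upload(bytes);
  try {
    return await extract(handle);
  } finally {
    // Runs whether extraction succeeded or threw.
    try {
      await store.delete(handle);
    } catch (err) {
      // A delete failure is logged and swallowed so it cannot change
      // the extraction's result or mask the extraction's own error.
      log(`file delete failed for ${handle}: ${String(err)}`);
    }
  }
}
```

The inner try/catch matters: an error thrown from a finally block would otherwise replace the extraction's return value or original error.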

The confidence score is self-reported by the model. We treat it as a directional signal rather than an exact measure; receipts scoring below the threshold (default 0.85, configurable) are routed to needs_review with reason low_confidence.

Long-thread emails (forwarded chains) are also routed to needs_review with reason long_thread, regardless of confidence. The pipeline picks the most likely receipt, but the user should double-check that the right attachment was chosen.
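
The two review rules can be summarized in one small function. This is a sketch under assumptions: the ExtractionSummary type and reviewReason function are hypothetical, and the order of the checks (long_thread first when both conditions hold) is a guess, since the text does not say which reason wins.

```typescript
type ReviewReason = "low_confidence" | "long_thread";

// Hypothetical summary of one extraction result.
interface ExtractionSummary {
  confidence: number;
  isLongThread: boolean;
}

// Returns the needs_review reason, or null when the receipt can proceed.
function reviewReason(
  summary: ExtractionSummary,
  threshold: number = 0.85,
): ReviewReason | null {
  // Long threads go to review regardless of confidence.
  // Assumption: this reason takes precedence when both apply.
  if (summary.isLongThread) return "long_thread";
  if (summary.confidence < threshold) return "low_confidence";
  return null;
}
```

For example, a forwarded chain with a confidence of 0.99 still returns "long_thread", while a single email at 0.80 returns "low_confidence".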

You don’t write extraction prompts. You don’t manage model credentials in your jar. You don’t pick between models. The pipeline does that.