Bem fine-tuning add-on: how does the $500/month per trained function work, and how do corrections feed retraining?

8 min read

Most teams don’t lose time on models. They lose time on glue code, retraining, and “who owns this?” when accuracy drifts. The fine-tuning add-on exists so you can treat accuracy like software quality—versioned, monitored, and rolled back—without standing up your own ML infra or juggling model vendors.

Quick Answer: The Bem fine-tuning add-on is a $500/month fee per trained function that covers custom model training on your schemas and data, plus automatic retraining as you submit corrections. You don’t pay extra for inference—the fine-tuned function runs at the same per-call price, and every correction you send through the Corrections API feeds back into the model and its golden dataset.

Why This Matters

If you’re running anything critical—AP, claims, logistics packets, onboarding—“pretty accurate” isn’t enough. You need a system that:

  • Knows your schema and business rules.
  • Improves as your operators correct it.
  • Can be rolled back instantly when a change degrades quality.

That’s what the fine-tuning add-on gives you: a dedicated, trainable function that converges on your ground truth over time and stays stable as layouts, vendors, and edge cases change. You get production-grade accuracy without building a bespoke ML platform.

Key Benefits:

  • Predictable cost model: $500/month per trained function, no inference surcharge, no token or page-based surprises.
  • Always-learning pipeline: Corrections flow into golden datasets and trigger retrains automatically, keeping the model aligned with reality.
  • Production-safe evolution: Every trained function is versioned with rollback, so you can ship improvements without risking silent regressions.

Core Concepts & Key Points

| Concept | Definition | Why it's important |
| --- | --- | --- |
| Trained function | A Bem function that has its own fine-tuned model, trained specifically on your schemas, documents, and corrections. | Gives you a dedicated “brain” per workflow (e.g., ap_invoice_v3) instead of a generic extraction model. |
| $500/month per trained function | Flat monthly fee that covers training, infra, auto-retraining on corrections, model versioning, and rollback. Inference still uses normal per-call pricing. | Separates “accuracy engineering” from per-call costs so you can scale usage without worrying about training overhead. |
| Corrections-driven retraining | When you submit corrections via the Corrections API, they’re added to your golden dataset and used to automatically retrain and improve the function over time. | Turns operator work into compounding accuracy gains—every fix reduces future manual effort. |

How It Works (Step-by-Step)

At a high level, you:

  1. Stand up a function and schema.
  2. Turn on fine-tuning for that function.
  3. Let real-world calls and corrections drive the model forward.

Here’s how that plays out in practice.

1. Define and deploy a function

You start with the standard Bem workflow:

  • Define the schema for your target JSON (e.g., invoice_number, invoice_date, total, line items, enums for payment_terms).
  • Create a function (e.g., ap_invoice_v3) that:
    • Ingests PDFs, images, emails, etc.
    • Routes, Splits, Transforms, and Enriches as needed.
    • Outputs schema-enforced JSON (or an explicit exception) with per-field confidence.
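To make the schema step concrete, here is a minimal sketch of what a target schema for ap_invoice_v3 might look like, expressed as JSON Schema. The field names come from the example above; the enum values and the exact schema dialect Bem accepts are assumptions for illustration.

```python
# Hypothetical target schema for ap_invoice_v3, written as JSON Schema.
# The enum values and schema dialect are illustrative assumptions, not
# Bem's official format.
AP_INVOICE_SCHEMA = {
    "type": "object",
    "required": ["invoice_number", "invoice_date", "total"],
    "properties": {
        "invoice_number": {"type": "string"},
        "invoice_date": {"type": "string", "format": "date"},
        "total": {"type": "number"},
        # Enum-tightened field: the model must pick one of these values.
        "payment_terms": {"type": "string", "enum": ["net_15", "net_30", "net_60"]},
        "line_items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "description": {"type": "string"},
                    "amount": {"type": "number"},
                },
            },
        },
    },
}
```

Tightening fields with enums and `required` lists is what lets the function return schema-enforced JSON or an explicit exception, rather than free-form guesses.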

At this point, the function runs on Bem’s base models. You’re paying only the standard per-call price:

  • 1–10,000 calls/month: $0.09/call
  • 10,001–100,000 calls/month: $0.07/call
  • 100,001–1,000,000 calls/month: $0.05/call
  • 1,000,001–10,000,000 calls/month: $0.04/call
  • 10,000,001+ calls/month: Contact us

The free tier gives you 100 free calls / month with no feature gates, so you can prototype the workflow end-to-end before you ever touch fine-tuning.
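The tiers above are graduated (marginal): each tier's rate applies only to the calls that fall inside it. A small sketch of that arithmetic, using rates in cents so the math stays exact (this is an illustration, not an official Bem calculator):

```python
# Graduated per-call pricing from the table above.
# (tier upper bound in calls, rate in cents per call)
TIERS = [
    (10_000, 9),
    (100_000, 7),
    (1_000_000, 5),
    (10_000_000, 4),  # volumes above 10M are custom-priced ("Contact us")
]

def monthly_call_cost(calls: int) -> float:
    """Usage cost in dollars for one function's monthly call volume."""
    cents, prev_cap = 0, 0
    for cap, rate in TIERS:
        in_tier = min(calls, cap) - prev_cap  # calls that land in this tier
        if in_tier <= 0:
            break
        cents += in_tier * rate
        prev_cap = cap
    return cents / 100
```

For example, 50,000 calls costs the first 10,000 at $0.09 plus the next 40,000 at $0.07, so `monthly_call_cost(50_000)` returns 3700.0, before any fine-tuning add-on.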

2. Enable the $500/month fine-tuning add-on for that function

When accuracy matters enough that you want the function to learn from your data, you enable fine-tuning for that specific function:

  • Cost: $500/month per trained function.
  • What you get:
    • Custom model trained on your schemas and data.
    • Auto-retraining when you submit corrections.
    • Model versioning and instant rollback.
    • Ongoing engineering and infra to keep the model up to date.

What you don’t pay extra for:

  • No inference premium. The fine-tuned model runs at the same per-call rate listed above.
  • No token/page fees. You still pay per function call, regardless of how long the document is.

Think of the $500 as the “accuracy engine” subscription for that function. Usage still scales on the normal, graduated per-call pricing.

3. Run real traffic and capture corrections

As you push production traffic through the trained function:

  • Each call returns:
    • Schema-valid JSON or an explicit exception.
    • Per-field confidence scores.
    • Hallucination detection flags.
  • Low-confidence fields can be:
    • Routed to a human review Surface.
    • Corrected directly in the UI or via your own tools.

When a human spots an issue, they submit a correction:

  • Via the Corrections API: programmatic submission from your own review tools.
  • Via Bem Surfaces: generated UIs based on your schema for review/correction/approval.

Pricing note: The Corrections API is free. There is no per-correction charge. We want you to correct aggressively—this is the data that drives your model forward.
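As a rough sketch, a correction submission might carry the function name, the original call's ID, and the before/after values per field. The payload shape and field names below are assumptions for illustration; consult Bem's API reference for the real contract.

```python
# Hypothetical Corrections API payload. The structure and field names are
# illustrative assumptions, not Bem's documented schema.
correction = {
    "function": "ap_invoice_v3",
    "call_id": "call_8f2a",  # placeholder: ID returned by the original call
    "corrections": {
        # previous model output vs. the human-verified ground truth
        "total": {"previous": 1043.50, "corrected": 1034.50},
        "payment_terms": {"previous": "net_15", "corrected": "net_30"},
    },
}
```

The key idea is that each correction pairs the model's previous output with the corrected ground truth, which is exactly what the retraining loop needs.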

4. Corrections feed golden datasets and retraining

Under the hood, every correction:

  • Is added to a golden dataset attached to that function.
  • Becomes part of the training corpus for the next round of fine-tuning.
  • Carries additional signals:
    • Original document.
    • Previous model output.
    • Corrected ground truth.
    • Timestamps, version, and context.

Bem’s training loop then:

  1. Aggregates corrections into the golden dataset.
  2. Trains an updated model version for that function.
  3. Runs evals (F1 scores, pass rates) on held-out golden data.
  4. Flags regressions before the new version is promoted.

Once a new model version passes thresholds, it’s deployed to serve that function’s calls.
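The promote-or-flag decision in steps 3–4 amounts to a regression gate. A minimal sketch of that logic, with illustrative thresholds (the actual metrics and tolerances Bem uses are not published here):

```python
# Illustrative regression gate for promoting a retrained model version.
# Thresholds are assumptions; the real pipeline's criteria may differ.
def should_promote(new_f1: float, current_f1: float,
                   min_f1: float = 0.95, tolerance: float = 0.005) -> bool:
    """Promote only if the new version clears an absolute quality floor
    and does not regress meaningfully against the serving version."""
    meets_floor = new_f1 >= min_f1
    no_regression = new_f1 >= current_f1 - tolerance
    return meets_floor and no_regression
```

A version that scores 0.97 against a serving version at 0.96 would be promoted; one that drops to 0.94 would be flagged instead of silently shipped.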

You don’t manage any of this training plumbing. The $500/month covers:

  • Data management and golden set curation.
  • Training orchestration and monitoring.
  • Versioning, regression tests, and rollout.

5. Versioning, rollback, and safe iteration

Every trained function is versioned:

  • Example: ap_invoice_v3@1.7.2.
  • You can see:
    • When a version was trained.
    • Which corrections/golden set it was based on.
    • Its eval metrics (e.g., F1, pass rate per field).

If a new version underperforms in your environment:

  • You can roll back instantly to a previous version.
  • Because execution is idempotent, you can safely re-run calls under the previous version without corrupting downstream systems.
  • You keep full auditability: which version produced which output for which document.

That’s the core premise: accuracy behaves like code. Versioned. Measured. Reversible.

How the Pricing Actually Works in Practice

To make this concrete, imagine:

  • You run AP invoices through ap_invoice_v3.
  • Volume: 50,000 calls/month.
  • You enable fine-tuning on that function.

Your monthly costs for this function:

  • Per-call: first 10,000 calls at $0.09 ($900), next 40,000 at $0.07 ($2,800), so $3,700 in usage.
  • Fine-tuning: $500 flat for the trained function.
  • Total: $4,200/month for this function.

You’re not paying:

  • Extra per-call for the fact it’s fine-tuned.
  • Extra per correction.
  • Extra for training runs.

As volume grows, your per-call rate decreases (graduated pricing). The fine-tuning line item stays flat at $500/month per trained function.

If you later stand up a second trained function—for example, claims_packet_v2—that’s another $500/month for that second trained function, plus its own per-call usage.
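Putting the 50,000-call example into numbers (a quick sanity check, not an official calculator):

```python
# Monthly cost for the ap_invoice_v3 example: 50,000 calls plus the add-on.
usage = 10_000 * 0.09 + 40_000 * 0.07  # $900 + $2,800 in graduated per-call fees
fine_tuning_addon = 500                # flat monthly fee for the trained function
total = usage + fine_tuning_addon      # about $4,200/month for this function
```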

Common Mistakes to Avoid

  • Treating fine-tuning as a one-off project:
    Don’t think of this as “we’ll fine-tune once and we’re done.” The value comes from ongoing corrections → retraining → evals. Keep your correction flow alive, even if accuracy feels “good enough.” It’s how you stay ahead of layout drift and new vendors.

  • Under-investing in golden datasets and evals:
    If you never look at evals, you’re flying blind. Use golden datasets and per-field metrics (F1, pass rate) to decide when a retrained version is ready, which workflows need more corrections, and where to set confidence thresholds for human review.

Real-World Example

Say you’re a fleet management platform processing maintenance invoices from thousands of shops.

You define a function:

  • fleet_maintenance_invoice@prod
  • Schema includes:
    • vehicle_id, odometer, service_date
    • Line items with labor_hours, parts_cost, taxes
    • Enum-tightened fields like service_type and payment_terms

You enable fine-tuning for this function:

  • You start with your historical invoices as seed data.
  • Operator team reviews low-confidence fields in a Bem Surface and pushes corrections via the Corrections API.
  • Over a few weeks:
    • The model learns your specific vendor layouts.
    • It gets better at subtle distinctions—like detecting whether tax is embedded in a line item or listed separately.
    • You see evaluation metrics improve and manual review rates drop.

By month two:

  • The function is consistently returning schema-valid, line-item-accurate totals.
  • Your team moves from reviewing ~40% of invoices to <10%, focused on genuinely weird edge cases.
  • Because retrieval and enrichment are part of the same workflow, the function can Enrich vendor_id from your Collections and output the exact JSON your ERP expects.

Behind the scenes:

  • That entire improvement curve is driven by:
    • The $500/month fine-tuning add-on on fleet_maintenance_invoice.
    • Continuous corrections via the free Corrections API.
    • Automated retraining, evals, and versioned rollout.

Pro Tip: If you’re not sure whether a workflow justifies fine-tuning, watch your manual review rate and exception rate. When a function is stable but still driving a lot of corrections on similar patterns, that’s the perfect time to enable fine-tuning and let those corrections compound.

Summary

The Bem fine-tuning add-on is simple on purpose:

  • $500/month per trained function pays for a dedicated, continuously improving model tied to that function’s schema and data.
  • You keep the same per-call pricing as any other function—no inference premium for being fine-tuned.
  • The Corrections API is free, and every correction feeds a golden dataset that drives automatic retraining, evals, and safe rollout.
  • You get versioning and instant rollback, so you can push for higher accuracy without risking silent regressions in production.

Instead of building your own training pipelines, eval harness, and model registry, you buy a predictable line item that makes your workflows trainable, deterministic, and debuggable.

Next Step

Get Started