
Bem fine-tuning add-on: how does the $500/month per trained function work, and how do corrections feed retraining?
Quick Answer: The Bem fine-tuning add-on is a $500/month charge per trained function that turns a generic workflow into a model specialized on your exact schemas and data. You still pay the same per-call price as any other function; the fee covers training, retraining, versioning, and rollback. Corrections you submit via the Corrections API automatically flow into golden datasets and trigger retraining so accuracy keeps improving without you managing models manually.
Why This Matters
Most teams eventually hit the ceiling of “generic LLM + prompts.” It demos well; it fails on edge cases, vendor quirks, and downstream business rules. Fine-tuning is how you cross that gap—by training the underlying model on your schemas, your fields, your documents. Bem’s fine-tuning add-on is built so you can get those gains without hiring an ML team or owning a training pipeline: you define the function, send real traffic, submit corrections, and we handle the training loop, evals, and versioning as infrastructure.
Key Benefits:
- Production accuracy without ML overhead: You get a custom-trained function for $500/month, with Bem managing training, retraining, and model versions for you.
- Same per-call cost, higher performance: Fine-tuned functions run at the standard per-call rates—no hidden “model tax” for inference.
- Continuous improvement via corrections: Every correction you submit feeds a golden dataset and powers automatic retraining, so the model converges on your edge cases over time.
Core Concepts & Key Points
| Concept | Definition | Why it's important |
|---|---|---|
| Trained function | A Bem function whose underlying model has been fine-tuned on your schemas and data, with its own versioned weights. | Moves you from “prompted general model” to a specialized model optimized for your fields, formats, and business rules. |
| $500/month per trained function | Flat monthly charge for each function you choose to fine-tune, on top of normal per-call pricing. | You’re paying for training infrastructure, retraining, evals, versioning, and rollback—not for higher inference costs. |
| Corrections-driven retraining | A feedback loop where every correction you send via the Corrections API is added to a golden dataset and used to retrain your model. | Accuracy improves where it matters most: real-world errors from your own workflows, not synthetic benchmarks. |
How It Works (Step-by-Step)
At a high level, the Bem fine-tuning add-on works like this:
- You define a function with a schema.
- You run real traffic through it.
- You correct mistakes via the Corrections API.
- Bem builds golden datasets, trains a model, versions it, and rolls it out.
- Corrections keep flowing, triggering automatic retrains as your data evolves.
Here’s that flow in more detail.
- Start with a normal function
You don’t fine-tune in the abstract. You start with a working function:
- You define a JSON Schema for your output (e.g., `InvoiceExtractionV3` with `vendor_name`, `total_amount`, line items, etc.).
- You wire that schema into a Bem function via the API.
- You connect your inputs (PDFs, images, emails, mixed packets) and start calling the function.
At this stage, you’re paying standard graduated pricing per call:
- 1–10,000 calls/month: $0.09/call
- 10,001–100,000 calls/month: $0.07/call
- 100,001–1,000,000 calls/month: $0.05/call
- 1,000,001–10,000,000 calls/month: $0.04/call
- 10,000,001+ calls/month: Contact us
You get 100 free calls every month (no card, no feature gates), and you can use all primitives, evals, the Corrections API, and review workflows even before fine-tuning.
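To make the tier arithmetic concrete, here is a minimal estimator sketch. It rests on two assumptions the pricing list does not spell out: that "graduated" means marginal tiers (each call is billed at the rate of the tier it falls into), and that the 100 free monthly calls can be ignored for a rough estimate. The function names are illustrative, not part of any Bem SDK. The optional `trained_functions` argument adds the flat $500/month add-on fee described in the Quick Answer.

```python
# Tiers from the list above: (upper bound of the tier in calls, price per call in cents).
TIERS = [
    (10_000, 9),
    (100_000, 7),
    (1_000_000, 5),
    (10_000_000, 4),
]
TRAINED_FUNCTION_FEE_CENTS = 500 * 100  # $500/month per trained function


def estimate_monthly_cents(calls: int, trained_functions: int = 0) -> int:
    """Estimate a monthly bill in cents.

    Assumption: tiers are marginal, and the 100 free calls are not deducted.
    """
    if calls < 0 or trained_functions < 0:
        raise ValueError("calls and trained_functions must be non-negative")
    if calls > TIERS[-1][0]:
        raise ValueError("above 10,000,000 calls/month the listed price is 'Contact us'")

    total = 0
    lower = 0
    for upper, cents in TIERS:
        # Number of calls that land inside this tier.
        in_tier = max(0, min(calls, upper) - lower)
        total += in_tier * cents
        lower = upper
    return total + trained_functions * TRAINED_FUNCTION_FEE_CENTS


def as_dollars(cents: int) -> str:
    return f"${cents / 100:,.2f}"


if __name__ == "__main__":
    print(as_dollars(estimate_monthly_cents(5_000)))      # $450.00
    print(as_dollars(estimate_monthly_cents(50_000, 1)))  # $4,200.00
```

Working in integer cents avoids floating-point drift when summing across tiers. If the tiers turn out to be volume-based rather than marginal, only the loop body needs to change.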
- Turn on fine-tuning for that function
Once you see that a function is stable but hitting accuracy limits—especially on edge cases—you enable fine-tuning:
- You opt that specific function into fine-tuning.
- That function now incurs $500/month as a trained function.
- The call pricing stays exactly the same. No new per-call cost, no model surcharges.
What the $500/month covers:
- Custom models trained on your schemas and data.
- Automatic retraining when you submit corrections.
- Model versioning for that function.
- Safe, instant rollback to previous versions.
You’re not paying for GPU hours per se; you’re paying for an ML pipeline you don’t have to build, maintain, or debug.
- Send corrections via the Corrections API
When a function call is wrong, you don’t accept it—you correct it.
- You get a result payload plus per-field confidence and hallucination signals.
- Low-confidence or flagged fields can be routed to human review (via Bem Surfaces or your own UI).
- When an operator fixes the data, you call the Corrections API with:
- The original function call ID.
- The corrected JSON matching your schema.
- Optional metadata (source, reason, tags).
The critical part: the Corrections API is free. There is no charge per correction. We want you to submit corrections aggressively; they’re the fuel for accuracy.
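The review loop above can be sketched in a few lines. This is an illustration under stated assumptions, not the Corrections API itself: the payload keys (`call_id`, `corrected_output`, `metadata`), the confidence threshold, and both helper names are hypothetical. The sketch only builds the request body from the three ingredients listed above and leaves the actual HTTP call to your client.

```python
import json
from typing import Any, Optional

REVIEW_THRESHOLD = 0.85  # hypothetical cutoff; tune per field and workflow


def fields_needing_review(confidences: dict[str, float],
                          flagged: set[str],
                          threshold: float = REVIEW_THRESHOLD) -> list[str]:
    """Return fields that are low-confidence or flagged as possible hallucinations."""
    low = {name for name, score in confidences.items() if score < threshold}
    return sorted(low | flagged)


def build_correction(call_id: str,
                     corrected: dict[str, Any],
                     source: Optional[str] = None,
                     reason: Optional[str] = None,
                     tags: Optional[list[str]] = None) -> dict[str, Any]:
    """Assemble a correction body. Key names are assumptions, not a documented schema."""
    if not call_id:
        raise ValueError("a correction must reference the original function call ID")
    payload: dict[str, Any] = {"call_id": call_id, "corrected_output": corrected}
    # Only include the optional metadata that was actually provided.
    candidates = {"source": source, "reason": reason, "tags": tags}
    metadata = {key: value for key, value in candidates.items() if value}
    if metadata:
        payload["metadata"] = metadata
    return payload


if __name__ == "__main__":
    confidences = {"total_amount": 0.62, "vendor_name": 0.97, "due_date": 0.91}
    print(fields_needing_review(confidences, flagged={"due_date"}))
    # ['due_date', 'total_amount']

    correction = build_correction(
        "call_123",
        {"vendor_name": "Acme Freight", "total_amount": "1250.00", "due_date": "2024-07-01"},
        source="ap_review",
        reason="handwritten total misread",
        tags=["fax"],
    )
    print(json.dumps(correction, indent=2))
```

Because corrections are free, a loop like this can err on the side of sending more fields to review rather than fewer.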
- Bem builds golden datasets and runs evals
Behind the scenes, corrections are not just “patches.” They become structured training data:
- Every correction is added to a golden dataset tied to that function and schema.
- We track:
- Input artifacts (PDFs, images, emails).
- Expected outputs (your corrected JSON).
- Model version that produced the original output.
- Bem automatically computes accuracy metrics (F1, pass/fail rates, per-field stats) on these golden examples.
This gives you two things:
- Ground truth that reflects your actual documents and rules, not synthetic tests.
- Eval pipelines to compare new fine-tuned versions against prior versions before rollout.
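To show what per-field stats and pass/fail rates over a golden dataset look like in practice, here is a small sketch. It is a simplified stand-in for whatever Bem computes internally: it uses exact-match comparison per field and counts a document as a pass only if every golden field matches. It does not compute F1, and the `evaluate` name and result keys are assumptions.

```python
from typing import Any


def evaluate(golden: list[dict[str, Any]],
             predicted: list[dict[str, Any]]) -> dict[str, Any]:
    """Compare model outputs against corrected (golden) outputs, field by field."""
    if not golden or len(golden) != len(predicted):
        raise ValueError("golden and predicted must be non-empty and the same length")

    fields = sorted({name for example in golden for name in example})
    correct = {name: 0 for name in fields}
    seen = {name: 0 for name in fields}
    passed = 0

    for expected, actual in zip(golden, predicted):
        all_fields_ok = True
        for name, want in expected.items():
            seen[name] += 1
            if actual.get(name) == want:
                correct[name] += 1
            else:
                all_fields_ok = False
        passed += all_fields_ok  # True counts as 1

    return {
        "pass_rate": passed / len(golden),
        "field_accuracy": {name: correct[name] / seen[name] for name in fields},
    }


if __name__ == "__main__":
    golden = [
        {"total_amount": "100.00", "currency": "USD"},
        {"total_amount": "250.50", "currency": "EUR"},
    ]
    predicted = [
        {"total_amount": "100.00", "currency": "USD"},
        {"total_amount": "205.50", "currency": "EUR"},  # transposed digits
    ]
    print(evaluate(golden, predicted))
    # {'pass_rate': 0.5, 'field_accuracy': {'currency': 1.0, 'total_amount': 0.5}}
```

Per-field accuracy is what tells you where corrections are paying off: a high pass rate can hide one high-value field that is still weak.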
- Automatic retraining on corrections
As your golden dataset grows, Bem periodically retrains the model behind your trained function:
- New corrections are batched.
- A new version of the model is trained on:
- Your schema.
- Your historical golden data.
- Your latest corrections.
- The new candidate version is evaluated against:
- The golden dataset.
- Any additional eval suites you define.
Only when the new version meets or beats your thresholds (e.g., F1 score, per-field accuracy) does it become the active version for that function.
You don’t script any of this. No custom ML jobs. No Airflow. No fine-tuning endpoint juggling. You just keep sending corrections.
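Although you never script this yourself, the promotion rule is easy to picture as a gate. The sketch below is one plausible reading, not Bem's actual logic: a candidate must clear your minimum F1 and per-field accuracy floors, and (an added assumption) must not score below the current version. `EvalResult` and `should_promote` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EvalResult:
    version: str
    f1: float
    field_accuracy: dict[str, float] = field(default_factory=dict)


def should_promote(candidate: EvalResult,
                   current: EvalResult,
                   min_f1: float,
                   min_field_accuracy: dict[str, float]) -> tuple[bool, list[str]]:
    """Return (promote?, reasons for rejection). An empty reasons list means promote."""
    reasons: list[str] = []
    if candidate.f1 < min_f1:
        reasons.append(f"f1 {candidate.f1} is below threshold {min_f1}")
    if candidate.f1 < current.f1:
        reasons.append(f"f1 regressed from {current.f1} to {candidate.f1}")
    for name, floor in min_field_accuracy.items():
        # A field missing from the eval results is treated as failing its floor.
        got = candidate.field_accuracy.get(name, 0.0)
        if got < floor:
            reasons.append(f"{name} accuracy {got} is below threshold {floor}")
    return (not reasons, reasons)


if __name__ == "__main__":
    v5 = EvalResult("invoice_extractor_v5", f1=0.91, field_accuracy={"total_amount": 0.96})
    v6 = EvalResult("invoice_extractor_v6", f1=0.94, field_accuracy={"total_amount": 0.99})
    ok, reasons = should_promote(v6, v5, min_f1=0.9, min_field_accuracy={"total_amount": 0.98})
    print(ok, reasons)  # True []
```

Returning the rejection reasons alongside the boolean makes a failed promotion explainable, which matters when a retrain improves overall F1 but slips on a single critical field.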
- Versioning, rollout, and rollback
Every trained function has explicit versions:
- `invoice_extractor_v5` might be your current production version.
- `invoice_extractor_v6` is a candidate, trained on the latest corrections.
- Bem stores:
- Model weights per version.
- Eval results per version.
- Changelogs of what data went into each retrain.
When a new version is promoted:
- Existing workflows automatically start using it for new calls.
- You can pin workflows to a specific version if you want to roll out gradually.
- If something regresses, rollback is instant: you revert to the previous version with one configuration change—no re-training, no data migration.
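The promote, pin, and rollback behavior can be modeled with a tiny in-memory registry. This is a conceptual sketch only: the `VersionRegistry` class and the workflow names are hypothetical, and the real configuration surface may look quite different. The point it illustrates is that rollback is a pointer change, not a retrain.

```python
class VersionRegistry:
    """Tracks promoted versions of one trained function and per-workflow pins."""

    def __init__(self, initial: str) -> None:
        self._history: list[str] = [initial]  # promoted versions, oldest first
        self._pins: dict[str, str] = {}       # workflow name -> pinned version

    @property
    def active(self) -> str:
        return self._history[-1]

    def promote(self, version: str) -> None:
        self._history.append(version)

    def pin(self, workflow: str, version: str) -> None:
        if version not in self._history:
            raise ValueError(f"unknown version: {version}")
        self._pins[workflow] = version

    def resolve(self, workflow: str) -> str:
        """Pinned workflows keep their version; everything else follows the active one."""
        return self._pins.get(workflow, self.active)

    def rollback(self) -> str:
        if len(self._history) < 2:
            raise RuntimeError("no previous version to roll back to")
        removed = self._history.pop()
        # Drop pins that pointed at the version being rolled back.
        self._pins = {w: v for w, v in self._pins.items() if v != removed}
        return self.active


if __name__ == "__main__":
    registry = VersionRegistry("invoice_extractor_v5")
    registry.pin("erp_posting", "invoice_extractor_v5")  # roll out gradually
    registry.promote("invoice_extractor_v6")
    print(registry.resolve("email_intake"))  # invoice_extractor_v6
    print(registry.resolve("erp_posting"))   # invoice_extractor_v5
    print(registry.rollback())               # invoice_extractor_v5
```

Pinning the workflow that posts to your ERP while unpinned workflows pick up the new version is one way to stage a rollout before trusting a retrain everywhere.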
The $500/month per trained function is what keeps this loop running: data ingestion, training, evals, version management, and safe rollout/rollback.
Common Mistakes to Avoid
- Assuming fine-tuning changes per-call pricing: It doesn’t. Fine-tuned functions run at the same per-call rates as non–fine-tuned functions. The $500/month is for the training infrastructure, not more expensive inference.
- Treating corrections as optional: If you don’t send corrections, your fine-tuned model will stagnate. Use the Corrections API aggressively—especially for edge cases and high-value fields—to drive meaningful improvement.
Real-World Example
Imagine you’re automating AP for a logistics company. You have:
- Thousands of invoices from hundreds of vendors.
- Handwritten totals on some scans.
- Fuel surcharges, accessorials, and discounts applied differently per carrier.
- A requirement that “Totals must be 100% correct, including line items” before anything posts to your ERP.
You start with a generic Bem function:
- Schema: `carrier_name`, `invoice_number`, `total_amount`, `currency`, `due_date`, `line_items[]`.
- You run 5,000 invoices through it in the first month.
- Accuracy is strong on clean PDFs, weaker on faxes and weird layouts.
Then you:
- Turn on fine-tuning for this function (now $500/month for that trained function).
- Route low-confidence documents and hallucination-flagged fields to a review Surface.
- When reviewers fix totals, line items, or dates, you submit corrections via the Corrections API.
- Bem builds a golden dataset of the “hard” invoices—faxed copies, overlapping stamps, odd line item formats.
- Within a cycle or two, the fine-tuned model:
- Learns vendor-specific patterns.
- Handles noisy scans more reliably.
- Captures line items and totals at a level you can trust in production.
You didn’t hire an ML engineer. You didn’t spin up a separate training pipeline. You just tuned one function, paid $500/month for it, and fed it corrections. The output is operational: “Totals including line items are correct, or the call is explicitly flagged as an exception.”
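That "correct, or explicitly flagged" rule is something you can also enforce on your side before anything posts to the ERP. A minimal sketch, assuming each entry in `line_items[]` carries an `amount` string (the schema above does not specify the line item shape) and that surcharges and discounts appear as their own line items:

```python
from decimal import Decimal
from typing import Any


def check_totals(invoice: dict[str, Any]) -> dict[str, str]:
    """Post only when line items sum exactly to total_amount; otherwise flag an exception."""
    line_sum = sum(
        (Decimal(item["amount"]) for item in invoice["line_items"]),
        Decimal("0"),
    )
    total = Decimal(invoice["total_amount"])
    if line_sum == total:
        return {"status": "ok"}
    return {
        "status": "exception",
        "reason": f"line items sum to {line_sum}, total_amount is {total}",
    }


if __name__ == "__main__":
    invoice = {
        "invoice_number": "INV-1001",
        "total_amount": "1250.00",
        "line_items": [
            {"description": "Linehaul", "amount": "1200.00"},
            {"description": "Fuel surcharge", "amount": "50.00"},
        ],
    }
    print(check_totals(invoice))  # {'status': 'ok'}

    invoice["total_amount"] = "1205.00"  # e.g. a misread handwritten total
    print(check_totals(invoice))
    # {'status': 'exception', 'reason': 'line items sum to 1250.00, total_amount is 1205.00'}
```

`Decimal` is used instead of floats so that currency amounts compare exactly. Invoices that fail the check are natural candidates for review and, once fixed, for a correction.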
Pro Tip: Don’t wait for a huge dataset before enabling fine-tuning. Turn it on once you’ve found a stable schema and you’re feeling the pain of recurring errors. Then make corrections submission part of your normal review loop—Bem will handle batching and retraining as the data grows.
Summary
The Bem fine-tuning add-on is a way to turn your most critical functions into specialized, continuously improving models without owning ML infrastructure. For $500/month per trained function, you get:
- A custom model trained on your schemas and data.
- Automatic retraining driven by the corrections you submit, batched as your golden dataset grows.
- Versioning, evals, and instant rollback as first-class features.
- The same transparent, graduated per-call pricing as any other function.
Corrections are the key. The Corrections API is free, and every fix becomes labeled data that feeds your golden datasets and retraining cycles. Over time, that feedback loop is what gets you from “demo accuracy” to “production you can bet a P&L on.”