
How do I configure Bem so schema-invalid outputs or low-confidence fields get flagged and routed to a review queue?
Most teams don’t lose trust in AI because it “misses a field.” They lose trust because it returns bad data and no one notices until it hits the ERP, the GL, or a customer. On Bem, the whole point of the production layer is the opposite: schema-valid JSON, or it flags the exception. Low-confidence fields never slip through quietly—they get routed to a review queue, corrected, and those corrections train the system.
Quick Answer: On Bem, you configure schema validation and confidence routing at the workflow/function level. You define a strict JSON Schema for your output, set per-field confidence thresholds, and enable exception routing to a review Surface. Anything schema-invalid or below threshold is automatically flagged, queued for review, and fed back into model training—no silent failures, no guessing.
## Why This Matters
Once you plug AI into AP, claims, onboarding, or logistics, “probably right” is a production incident. You need guarantees: either the data matches your schema and meets your confidence thresholds, or it’s explicitly flagged and routed to an operator. That’s what you’re configuring in Bem: the line between safe automation and auditable exception handling.
When schema-invalid outputs or low-confidence fields get routed to a review queue instead of slipping into downstream systems, you:
Key Benefits:
- Eliminate silent failures: Bad or incomplete data never “passes” as valid; exceptions are explicit objects with traceable context.
- Make accuracy enforceable: Confidence thresholds and JSON Schema become guardrails, not suggestions—Bem will not guess.
- Continuously improve the system: Every human correction updates the model, passes regression tests, and raises your F1 scores over time.
## Core Concepts & Key Points
| Concept | Definition | Why it's important |
|---|---|---|
| Schema validation | Enforcing a JSON Schema on every workflow output (types, required fields, enums, formats). | Ensures Bem either returns schema-valid JSON or flags a precise exception, never “best-effort” blobs. |
| Confidence thresholds | Per-field minimum confidence scores required before a value is considered safe. | Lets you automate the high-confidence path while routing uncertain fields for human review. |
| Review Surfaces & queues | Auto-generated operator UIs that receive flagged records/fields for correction. | Turn exceptions into a controlled workflow: review, correct, approve, and push clean data downstream. |
## How It Works (Step-by-Step)
At a high level, you’re wiring three things together:
- A function/workflow that outputs structured data according to your schema.
- Validation and confidence thresholds that determine what’s “good enough.”
- A Surface that receives all exceptions for review, correction, and approval.
### 1. Define and attach a strict JSON Schema
You start by telling Bem what “correct” looks like.
A simple AP invoice schema might look like:
```json
{
  "$id": "ap_invoice_v1",
  "type": "object",
  "required": ["vendor", "total", "due_date", "category", "line_items"],
  "properties": {
    "vendor": { "type": "string" },
    "total": { "type": "number" },
    "due_date": {
      "type": "string",
      "format": "date"
    },
    "category": {
      "type": "string",
      "enum": ["AP", "AR", "PO"]
    },
    "line_items": {
      "type": "array",
      "items": {
        "type": "object",
        "required": ["description", "quantity", "unit_price"],
        "properties": {
          "description": { "type": "string" },
          "quantity": { "type": "number" },
          "unit_price": { "type": "number" }
        }
      }
    }
  }
}
```
In Bem, you:
- Create this schema (via API or UI).
- Attach it to a function or workflow step (e.g., `extract_ap_invoice_v1`).
- Mark which fields are required and which are constrained by enums, formats, etc.
Once attached, Bem enforces: schema-valid output, or it flags the exception.
Example: if an incoming invoice produces `"category": "Refund"`, which is not in the `["AP", "AR", "PO"]` enum, Bem will emit:
```json
{
  "status": "exception",
  "error_type": "schema_violation",
  "details": {
    "field": "category",
    "value": "Refund",
    "expected_enum": ["AP", "AR", "PO"]
  }
}
```
…and route that record to your review queue.
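Bem performs this validation server-side. As a simplified mental model, the way a required-field or enum violation becomes an explicit exception object can be sketched in plain Python (the `validate_output` helper is hypothetical; Bem enforces the full JSON Schema, not just this subset):

```python
# Simplified sketch of the server-side check: required fields and enum
# membership only. Hypothetical illustration of how a violation turns
# into an explicit exception object instead of a "best-effort" blob.

def validate_output(doc: dict, schema: dict) -> dict:
    # Required fields first: a missing field is an exception, not a guess.
    for field in schema.get("required", []):
        if field not in doc:
            return {
                "status": "exception",
                "error_type": "schema_violation",
                "details": {"field": field, "value": None, "reason": "missing_required"},
            }
    # Enum constraints: any value outside the allowed set is flagged.
    for field, rules in schema.get("properties", {}).items():
        allowed = rules.get("enum")
        if allowed is not None and field in doc and doc[field] not in allowed:
            return {
                "status": "exception",
                "error_type": "schema_violation",
                "details": {"field": field, "value": doc[field], "expected_enum": allowed},
            }
    return {"status": "ok", "data": doc}

schema = {
    "required": ["category"],
    "properties": {"category": {"type": "string", "enum": ["AP", "AR", "PO"]}},
}
result = validate_output({"category": "Refund"}, schema)
# -> status "exception", with the offending field, value, and expected enum
```

The key design point mirrored here: a failure returns a structured exception with traceable context, never a partially valid record.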
### 2. Configure per-field confidence thresholds
Schema validity is binary. Confidence is continuous. You want both.
On Bem, each extracted field comes with a confidence score and hallucination detection. You configure routing rules like:
- “If any required field is below 0.99, route to review.”
- “If line-item totals don’t reconcile with invoice total, flag the whole document.”
- “If hallucination detector is non-zero for a field, treat it as low confidence.”
A typical configuration pattern:
```json
{
  "workflow_id": "ap_invoice_ingest_v3",
  "confidence_policy": {
    "default_min_confidence": 0.98,
    "overrides": {
      "vendor": 0.99,
      "total": 0.995,
      "line_items[*].description": 0.97
    },
    "on_below_threshold": "route_to_review"
  }
}
```
What this does:
- Applies a default 0.98 threshold to all fields.
- Tightens `vendor` and `total`, and slightly loosens line-item descriptions.
- When any field is below threshold, the document gets an exception status and lands in a review queue instead of being emitted as “clean.”
Combine this with schema constraints and you get a simple rule of thumb:
If we aren’t ~99% sure and schema-valid, we do not pass it through. We route it.
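As a mental model, that policy can be applied locally like the sketch below. The field names and thresholds mirror the config sample; the `[*]` wildcard matching and the `route` helper are illustrative assumptions, not Bem's actual evaluation logic:

```python
import re

# Mirrors the confidence_policy block above (illustrative, not Bem's API).
POLICY = {
    "default_min_confidence": 0.98,
    "overrides": {
        "vendor": 0.99,
        "total": 0.995,
        "line_items[*].description": 0.97,
    },
}

def min_confidence_for(field: str) -> float:
    """Resolve the threshold for a concrete field path, honoring [*] wildcards."""
    for pattern, threshold in POLICY["overrides"].items():
        # Treat [*] in the policy as "any numeric index" in the field path.
        regex = re.escape(pattern).replace(r"\[\*\]", r"\[\d+\]")
        if re.fullmatch(regex, field):
            return threshold
    return POLICY["default_min_confidence"]

def route(field_confidences: dict) -> str:
    """Route the whole document to review if any field misses its threshold."""
    low = [f for f, c in field_confidences.items() if c < min_confidence_for(f)]
    return "route_to_review" if low else "emit_clean"

decision = route({
    "vendor": 0.992,                     # clears its 0.99 override
    "total": 0.991,                      # misses its 0.995 override
    "line_items[0].description": 0.975,  # clears the 0.97 wildcard override
})
# -> "route_to_review", because total is below 0.995
```

Note the document-level decision: one low-confidence field is enough to hold the whole record back, which is exactly the conservative behavior you want for financial data.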
### 3. Wire exceptions to a review Surface
Flagging is only half the story. You also need a place for humans to fix things.
On Bem, you:

- **Define a Surface for your schema** (e.g., `ap_invoice_surface_v1`). Bem auto-generates the UI from your JSON Schema; fields, enums, and formats map directly to form controls.
- **Bind exceptions from your workflow to that Surface.** In the workflow config, you specify:
```json
{
  "workflow_id": "ap_invoice_ingest_v3",
  "exception_routing": {
    "on_schema_violation": {
      "surface_id": "ap_invoice_surface_v1"
    },
    "on_low_confidence": {
      "surface_id": "ap_invoice_surface_v1"
    }
  }
}
```
- **Configure approval + sync behavior.** When an operator corrects and approves a record:
  - Bem saves the correction (“Amzn Mktp” → “Amazon Web Services”).
  - A new model version is created (e.g., `v2.4.1`).
  - Regression tests run on your golden dataset.
  - If tests pass, the new version becomes active: a self-healing accuracy loop.
  - Clean, schema-valid JSON is synced to your downstream system (ERP, DB, API).
All of this is traceable: which function/workflow version, which model version, which operator corrected which field.
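A toy version of the dispatch step ties the pieces together. The routing table mirrors the `exception_routing` config above, while the `dispatch` function and in-memory queue are illustrative assumptions about what Bem does internally, not its real implementation:

```python
# Routing table mirroring the exception_routing config above.
ROUTING = {
    "on_schema_violation": {"surface_id": "ap_invoice_surface_v1"},
    "on_low_confidence": {"surface_id": "ap_invoice_surface_v1"},
}

# Hypothetical in-memory stand-in for the Surface's review queue.
review_queue = []

def dispatch(record: dict) -> str:
    """Pass clean records downstream; queue exceptions on their bound Surface."""
    if record["status"] == "ok":
        return "synced_downstream"
    rule = ROUTING.get("on_" + record["error_type"])
    if rule is None:
        raise ValueError("no routing rule for " + record["error_type"])
    review_queue.append({"surface_id": rule["surface_id"], "record": record})
    return rule["surface_id"]

surface = dispatch({"status": "exception", "error_type": "low_confidence"})
# -> "ap_invoice_surface_v1"; the record now sits in review_queue
```

Clean records skip the queue entirely; every exception lands on exactly one Surface, which is what makes the review workload bounded and auditable.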
## Common Mistakes to Avoid
- **Treating “almost schema-valid” as success:** If you relax your schema or make fields optional just to “get green checks,” you’re pushing complexity downstream. Keep the schema strict. Let Bem flag the edge cases and route them.
- **Using one global confidence threshold for everything:** High-value fields (totals, dates, GL codes) deserve stricter thresholds than low-risk ones (descriptions, notes). Use per-field overrides. Tie thresholds to business risk, not convenience.
## Real-World Example
You’re processing 100k invoices a week. Historically, your OCR/IDP tool “mostly worked”…until the month-end close when totals didn’t match, vendor names were inconsistent, and auditors started asking questions.
You switch to Bem and set up:
- A strict `ap_invoice_v1` JSON Schema with enums for `category`, ISO 8601 dates for `due_date`, and required line items.
- Confidence thresholds of 0.995 for `total`, 0.99 for `vendor`, and 0.98 for everything else.
- A Surface for AP operators to review exceptions.
Day 1:
- 85% of invoices come through as schema-valid and above threshold. They auto-post.
- 15% get exceptions: missing categories, low-confidence totals, weird vendor names.
Your AP team spends time only on those 15%. Every correction (“Amzn Mktp” → “Amazon Web Services”, misread decimal points, category corrections) feeds back:
- Bem spins a new model version (`v2.4.1`).
- Golden datasets are re-run, with F1 scores tracked by field.
- If regression tests pass, the new version rolls forward automatically.
By the end of month 1, your exception rate drops from 15% to 3–5%. Totals—including line items—are correct. You still have the same schema, the same thresholds. The system just got better, because the review queue became a training loop, not a manual rework pile.
Pro Tip: Start with stricter thresholds than you think you need (e.g., 0.99+) and a tight schema. Let the initial exception rate be higher, then drive it down with training. It’s easier to relax thresholds later than to unwind bad data in your ERP.
## Summary
Bem is designed so you don’t have to trust “AI vibes” in production. You define what valid looks like with JSON Schema, set explicit confidence thresholds by field, and wire exceptions to a review Surface. From there:
- Schema-valid + above-threshold → auto-processed.
- Schema-invalid or low-confidence → flagged, routed, corrected.
- Corrections → new model versions, regression-tested, with rising F1 scores.
No guessing. No silent failures. Just deterministic pipelines with self-healing accuracy loops.