How do I configure Bem so schema-invalid outputs or low-confidence fields get flagged and routed to a review queue?

Most teams only discover their extraction pipeline is fragile when a bad field slips into production. The vendor name is half-garbled, the total doesn’t match the sum of line items, or someone picked a category value that doesn’t exist in your ERP. With Bem, you don’t have to trust that it “probably worked.” You can enforce a hard rule: either the output is schema-valid and high-confidence, or it gets flagged and routed to a review queue.

Quick Answer: You configure Bem to flag and route bad outputs by (1) defining a strict JSON Schema for your workflow, (2) turning on schema enforcement and confidence thresholds at the function/workflow level, and (3) wiring a review “Surface” + exception route so any schema-invalid or low-confidence field lands in a human review queue instead of your production system. Bem’s architecture makes this deterministic: schema-valid output, or an exception you can explicitly handle.

Why This Matters

If you’re running AP, claims, logistics, or onboarding flows, “mostly right” is still a production incident. One malformed enum, one hallucinated line item, and now reconciliation breaks or revenue gets delayed. You need AI that behaves like infrastructure: observable, auditable, and governed by contracts.

Configuring Bem to route schema-invalid and low-confidence fields into a review queue gives you that contract. You get:

  • A single source of truth for what “valid” means (your schema).
  • Automatic guardrails that stop uncertain fields before they touch your ERP/CRM.
  • A feedback loop where every human correction upgrades the model and tightens accuracy over time.

Key Benefits:

  • Deterministic safety rails: Either Bem produces schema-valid JSON, or it flags the exception. It never silently guesses.
  • Lighter operational overhead: Operators review only the fields Bem isn’t ≥99% confident about (roughly 1–5% of fields), instead of re-checking everything.
  • Continuous accuracy gains: Each correction in the review queue creates a new model version, backed by regression tests and F1 scores.

Core Concepts & Key Points

| Concept | Definition | Why it's important |
| --- | --- | --- |
| Schema enforcement | Using JSON Schema (types, enums, required fields) as a hard contract for Bem’s outputs. | Turns “AI parsing” into a deterministic interface: valid JSON or flagged exception, nothing in-between. |
| Confidence thresholds | Per-field minimum confidence (e.g., 0.99) that Bem must reach before accepting a value. | Prevents low-confidence guesses from entering your systems; pushes them into review instead. |
| Surfaces & review queues | Auto-generated UIs based on your schema where humans review, correct, and approve flagged fields. | Gives you a structured, auditable place to resolve exceptions and feed corrections back into training. |

How It Works (Step-by-Step)

At a high level, you’ll:

  1. Define your schema and confidence thresholds.
  2. Configure your workflow to enforce that schema and route exceptions.
  3. Connect a Surface so humans can review, correct, and push updates back into Bem.

Here’s how that typically looks in practice.

1. Define a strict JSON Schema

Start by deciding what “valid” means for your output. For example, an AP invoice:

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "Invoice",
  "type": "object",
  "required": ["vendor", "invoice_number", "total", "due_date", "line_items"],
  "properties": {
    "vendor": { "type": "string" },
    "invoice_number": { "type": "string" },
    "total": { "type": "number" },
    "currency": {
      "type": "string",
      "enum": ["USD", "EUR", "GBP", "CAD", "AUD"]
    },
    "due_date": {
      "type": "string",
      "format": "date"
    },
    "category": {
      "type": "string",
      "enum": ["AP", "AR", "PO"]
    },
    "line_items": {
      "type": "array",
      "items": {
        "type": "object",
        "required": ["description", "quantity", "unit_price", "line_total"],
        "properties": {
          "description": { "type": "string" },
          "quantity": { "type": "number" },
          "unit_price": { "type": "number" },
          "line_total": { "type": "number" }
        }
      }
    }
  }
}

You can:

  • Mark fields as required (e.g., vendor, total).
  • Constrain values via enums (e.g., category).
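  • Enforce formats (e.g., "format": "date" on due_date, which expects a full date such as 2024-07-01). Note that in JSON Schema draft-07, validators are allowed to treat format as an annotation only, so confirm that format checking is actually enabled wherever you validate.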

This schema becomes the contract for your workflow.
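
To make the contract concrete, here is a minimal sketch of checking a candidate output against the required, type, and enum rules above. It is a hand-rolled subset for illustration only (top-level fields, with a trimmed copy of the schema); it is not a full JSON Schema validator and not Bem's implementation, and `check_against_schema` is a hypothetical helper name.

```python
# Trimmed copy of the invoice schema above (top-level rules only).
INVOICE_SCHEMA = {
    "required": ["vendor", "invoice_number", "total", "due_date", "line_items"],
    "properties": {
        "vendor": {"type": "string"},
        "invoice_number": {"type": "string"},
        "total": {"type": "number"},
        "currency": {"type": "string", "enum": ["USD", "EUR", "GBP", "CAD", "AUD"]},
        "due_date": {"type": "string"},
        "category": {"type": "string", "enum": ["AP", "AR", "PO"]},
        "line_items": {"type": "array"},
    },
}

# JSON Schema type names mapped to Python types.
PY_TYPES = {"string": str, "number": (int, float), "array": list, "object": dict}


def check_against_schema(record, schema):
    """Return a dict of field name -> problem for every rule the record breaks."""
    problems = {}
    for name in schema["required"]:
        if name not in record:
            problems[name] = "missing required field"
    for name, rules in schema["properties"].items():
        if name not in record:
            continue
        value = record[name]
        # bool is a subclass of int in Python, so reject it before the type check.
        if isinstance(value, bool) or not isinstance(value, PY_TYPES[rules["type"]]):
            problems[name] = f"expected {rules['type']}"
        elif "enum" in rules and value not in rules["enum"]:
            problems[name] = f"value {value!r} not in enum"
    return problems


candidate = {
    "vendor": "Acme Corp",
    "invoice_number": "INV-1001",
    "total": "1200.00",      # wrong type: string instead of number
    "category": "OPEX",      # not in the enum
    "line_items": [],
}
problems = check_against_schema(candidate, INVOICE_SCHEMA)
for field, problem in sorted(problems.items()):
    print(f"{field}: {problem}")
```

In this sample, the missing due_date, the string-typed total, and the out-of-enum category are each reported as separate problems, which is the per-field granularity the later routing steps rely on.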

2. Attach the schema to a workflow function

Next, you wire your schema into a Bem function responsible for extraction + shaping.

Conceptually (simplified):

{
  "name": "invoice_extract_v1",
  "type": "transform",
  "input": { "type": "file" },
  "output_schema": "schemas/invoice.json",
  "options": {
    "enforce_schema": true,
    "include_confidence": true,
    "hallucination_detection": true
  }
}

Key things happening here:

  • output_schema ties the function to your JSON Schema.
  • enforce_schema: true tells Bem to treat the schema as hard validation. If any field fails (wrong type, missing, enum mismatch), that field is flagged instead of “best-guess filled.”
  • include_confidence: true ensures you get per-field confidence scores back.
  • hallucination_detection: true guards against invented fields or values that weren’t actually present in the input.
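
One way to picture the hallucination guard is as a grounding check: a value counts as supported only if it can be found in the source text. The sketch below is a deliberately naive illustration of that idea, assuming you have the document's raw text available. `is_grounded` is a hypothetical helper, and nothing here describes how Bem performs detection internally.

```python
import re


def normalize(text):
    """Lowercase and collapse everything except letters and digits to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def is_grounded(value, source_text):
    """True if the normalized value appears as whole tokens in the normalized source."""
    needle = normalize(str(value))
    haystack = f" {normalize(source_text)} "
    return bool(needle) and f" {needle} " in haystack


source_text = "INVOICE INV-1001  Acme Corp.  Total due: 1,200.00 USD"
extracted = {"vendor": "Acme Corp", "invoice_number": "INV-1001", "currency": "EUR"}

# Collect fields whose values cannot be found anywhere in the source text.
ungrounded = [name for name, value in extracted.items()
              if not is_grounded(value, source_text)]
print(ungrounded)
```

A substring check like this misses legitimately derived values (normalized vendor names, computed totals), so treat it as a mental model for "was this in the input?" rather than a production guard.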

3. Set per-field confidence thresholds

Now you define when Bem is “confident enough” to accept a field. For critical fields, you can require ≥99% confidence; for secondary fields, you might accept a slightly lower bar (the example below uses ≥97% for line_items).

Example configuration:

{
  "name": "invoice_workflow_v1",
  "steps": [
    {
      "id": "extract",
      "function": "invoice_extract_v1",
      "on_result": {
        "route": [
          {
            "condition": "any(field.confidence < 0.99 for field in ['vendor', 'total', 'due_date'])",
            "to": "review_queue"
          },
          {
            "condition": "any(field.confidence < 0.97 for field in ['line_items'])",
            "to": "review_queue"
          },
          {
            "condition": "any(field.schema_status == 'invalid')",
            "to": "review_queue"
          },
          {
            "condition": "default",
            "to": "sync_to_erp"
          }
        ]
      }
    }
  ]
}

The point:

  • Low-confidence fields don’t silently pass; they trigger a route.
  • Schema-invalid fields (like category not in ["AP","AR","PO"]) are automatically detected and routed.
  • Only results that are both schema-valid and above your threshold continue to sync_to_erp.

In practice, Bem handles the heavy lifting:

  • Auto-routing on low confidence: any field below its threshold (≥0.99 for the critical fields in this example) goes to review.
  • Schema status: each field is marked Valid or Flagged based on your schema.
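
The routing rules above boil down to a simple decision per record. Here is a minimal Python sketch of that logic, assuming each extracted field arrives with a confidence score and a schema status. The `FieldResult` shape, the `THRESHOLDS` table, and the `route` function are hypothetical names for illustration, not Bem's response format or API.

```python
from dataclasses import dataclass


@dataclass
class FieldResult:
    value: object
    confidence: float
    schema_status: str  # "valid" or "invalid"


# Per-field minimum confidence, mirroring the example workflow above.
THRESHOLDS = {"vendor": 0.99, "total": 0.99, "due_date": 0.99, "line_items": 0.97}


def fields_needing_review(fields):
    """Return the names of fields that are schema-invalid or below their threshold."""
    flagged = []
    for name, result in fields.items():
        # Fields without an explicit threshold are only checked for schema validity.
        below = result.confidence < THRESHOLDS.get(name, 0.0)
        if result.schema_status == "invalid" or below:
            flagged.append(name)
    return flagged


def route(fields):
    """Send the record to review if anything is flagged, otherwise on to the ERP sync."""
    return "review_queue" if fields_needing_review(fields) else "sync_to_erp"


record = {
    "vendor": FieldResult("Acme Corp", 0.93, "valid"),
    "total": FieldResult(1200.0, 0.995, "valid"),
    "due_date": FieldResult("2024-07-01", 0.999, "valid"),
    "category": FieldResult("OPEX", 0.98, "invalid"),
}
print(route(record), fields_needing_review(record))
```

Returning the list of flagged field names, not just the route, lets a review UI highlight exactly what needs attention instead of asking the operator to re-check the whole record.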

4. Create a Surface as a review queue

Once you’re routing exceptions, you need a place for humans to resolve them. Bem generates a “Surface” from your schema that acts as your review UI.

Conceptually:

{
  "name": "invoice_review_surface_v1",
  "schema": "schemas/invoice.json",
  "views": {
    "queue": {
      "source": "review_queue",
      "filters": {
        "status": ["flagged", "low_confidence"]
      }
    }
  }
}

Operators see:

  • A PDF preview with the extracted fields highlighted.
  • A panel listing flagged or low-confidence fields (e.g., category enum violation, vendor at 0.93 confidence).
  • For each field:
    • Extracted value.
    • Confidence score.
    • Schema status (Valid / Flagged).
  • A simple edit experience:
    • They correct values (e.g., “Amzn Mktp” → “Amazon Marketplace”).
    • Click Save & Approve.

Behind the scenes, when they save:

  • Bem creates a new model version (e.g., v2.4.1) with the correction.
  • It automatically runs regression tests on your golden dataset.
  • If tests pass, the new version can be promoted to production.
  • Accuracy metrics (F1 scores) update so you can track drift and improvement.
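
To show what "gated by regression tests and F1 scores" can mean in practice, here is a minimal sketch of a per-field F1 calculation and a promotion gate over a golden dataset. It assumes predictions and gold records are plain dicts and counts a wrong value as both a false positive and a false negative; `field_f1` and `passes_regression` are hypothetical helpers, not Bem's eval API.

```python
def field_f1(predictions, gold, field):
    """F1 for one field across aligned lists of predicted and gold records."""
    tp = fp = fn = 0
    for pred, truth in zip(predictions, gold):
        p, t = pred.get(field), truth.get(field)
        if p is not None and p == t:
            tp += 1
        else:
            if p is not None:
                fp += 1  # predicted a value that is wrong or spurious
            if t is not None:
                fn += 1  # the true value was missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def passes_regression(candidate, baseline, gold, fields):
    """Promote only if the candidate is at least as good as the baseline on every field."""
    return all(
        field_f1(candidate, gold, f) >= field_f1(baseline, gold, f) for f in fields
    )


gold = [{"vendor": "Acme Corporation"}, {"vendor": "Globex"},
        {"vendor": "Initech"}, {"vendor": "Umbrella"}]
baseline = [{"vendor": "Acme Corporation"}, {"vendor": "Globex"},
            {"vendor": "Initech"}, {"vendor": "Umbrela"}]   # one misspelling
candidate = [dict(g) for g in gold]                          # all four correct

print(field_f1(baseline, gold, "vendor"), field_f1(candidate, gold, "vendor"))
print(passes_regression(candidate, baseline, gold, ["vendor"]))
```

Gating on every field separately, not on an average, keeps an improvement on vendor from masking a regression on total.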

5. Wire the workflow → Surface → downstream system loop

Tie it all together with explicit flows:

  1. Documents in
    • You send files via REST / webhook into invoice_workflow_v1.
  2. Extraction + validation
    • invoice_extract_v1 runs, enforcing the schema and computing confidence.
    • Bem checks your routing rules:
      • If any field is schema-invalid or below threshold → review_queue.
      • Else → sync_to_erp.
  3. Review queue handling
    • Operators work in invoice_review_surface_v1:
      • Fix values.
      • Approve the record.
    • The corrected output is:
      • Written back into Bem (for versioning + training).
      • Re-validated against the schema.
      • Emitted via webhook/REST to your ERP.
  4. Continuous learning + monitoring
    • Every correction feeds training (“self-healing loop”).
    • Golden datasets run automatically to catch regressions.
    • You monitor F1 scores per field (vendor, total, line_items) and adjust thresholds as accuracy increases.
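
Step 3 re-validates corrected output because a human edit can itself break the contract (a typo in an enum value, for instance). Here is a minimal sketch of that gate, assuming corrections arrive as a dict of field overrides. `apply_review` and the single enum check are hypothetical simplifications, not Bem's API.

```python
VALID_CATEGORIES = {"AP", "AR", "PO"}  # enum from the invoice schema


def remaining_problems(record):
    """Enum check only, for brevity; a real gate would re-run the full schema."""
    if record.get("category") not in VALID_CATEGORIES:
        return {"category": "not in enum"}
    return {}


def apply_review(record, corrections):
    """Merge operator corrections, then decide whether the record may be emitted."""
    merged = {**record, **corrections}
    problems = remaining_problems(merged)
    status = "emit_to_erp" if not problems else "back_to_review"
    return merged, status, problems


flagged = {"vendor": "Acme Corp", "total": 1200.0, "category": "OPEX"}

_, status_typo, _ = apply_review(flagged, {"category": "ap"})    # lowercase typo
fixed, status_ok, _ = apply_review(flagged, {"category": "AP"})  # valid correction
print(status_typo, status_ok)
```

The original record is never mutated, so the pre-correction values stay available for the audit trail and for training.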

Common Mistakes to Avoid

  • Treating the schema as documentation, not enforcement:
    If you define a schema but don’t turn on enforce_schema, you’re back to “best-effort parsing.” Always enable schema enforcement so invalid fields are flagged, not silently coerced.

  • Using a single global confidence threshold:
    Not all fields are equal. total and due_date should have stricter thresholds than memo. Set per-field thresholds so you’re strict where it matters and pragmatic where it doesn’t.

  • Skipping regression tests after “fixing” a model:
    Manually correcting fields is powerful, but without regression tests you can accidentally break previous behavior. Use Bem’s built-in evals and golden datasets so every new model version is gated by F1 scores, not vibes.

  • Letting exceptions pile up without enrichment:
    If the same vendor or category shows up in review repeatedly, enrich against your Collections (vendor master, GL codes) so future matches are automatic and high-confidence.
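
As an illustration of the enrichment idea in the last point, the sketch below matches a raw vendor string against a small vendor master using an exact alias lookup first and a fuzzy fallback second. The `VENDOR_MASTER` data, the 0.85 cutoff, and `match_vendor` are assumptions for the example; Bem's Collections are not modeled here.

```python
import difflib

# Hypothetical vendor master: canonical name -> aliases seen on past documents.
VENDOR_MASTER = {
    "Amazon Web Services": ["AWS", "Amazon Web Services, Inc."],
    "Acme Corporation": ["Acme Corp", "ACME CORP."],
}


def build_alias_index(master):
    """Map every lowercase canonical name and alias to its canonical name."""
    index = {}
    for canonical, aliases in master.items():
        for name in [canonical, *aliases]:
            index[name.lower()] = canonical
    return index


def match_vendor(raw, index, cutoff=0.85):
    """Return the canonical vendor, or None if nothing is close enough."""
    key = raw.strip().lower()
    if key in index:
        return index[key]
    # Fuzzy fallback: best single match at or above the similarity cutoff.
    close = difflib.get_close_matches(key, list(index), n=1, cutoff=cutoff)
    return index[close[0]] if close else None


index = build_alias_index(VENDOR_MASTER)
for raw in ["acme corp", "Amazon Web Service", "Zenith Freight"]:
    print(raw, "->", match_vendor(raw, index))
```

A None result is the useful signal: it means the vendor is genuinely unknown and belongs in review, while anything matched can be normalized automatically. Each reviewed correction can then be added to the alias list so the same string never reaches the queue twice.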

Real-World Example

A finance ops team at a mid-market SaaS company moved their invoices onto Bem. Their constraints:

  • Thousands of invoices weekly from hundreds of vendors.
  • Totals and line items had to be 100% accurate.
  • Categories had to match a strict GL enum list.

They defined a schema similar to the invoice example above and configured:

  • vendor, total, due_date, and category to require ≥0.99 confidence.
  • All enums (currency, category) to be schema-enforced.
  • Routing so any schema-invalid field or sub-threshold confidence went to a review Surface.

First week:

  • ~20–25% of invoices had at least one flagged field (mostly vendor normalization and category).
  • Operators corrected them in the Surface UI:
    • “Amzn Mktp” → “Amazon Marketplace”
    • Misclassified category values → valid GL categories

Each correction:

  • Created a new model version, validated via regression tests.
  • Updated F1 scores on key fields.

Within a few cycles:

  • Field-level F1 for vendor crossed 0.997.
  • total and line_items accuracy climbed above 0.99 as edge cases were absorbed.
  • Only a low-single-digit percentage of invoices required any human review.

Critically, they didn’t have a single bad total reach the ERP. Every schema-invalid or low-confidence value was caught, surfaced, and fixed upstream, with a full audit trail.

Pro Tip: Start with stricter confidence thresholds than you think you need (e.g., 0.99 on critical fields) and relax them only after you see stable F1 scores and passing regression tests. It’s easier to reduce workload once you trust the evals than to unwind silent bad data in your ERP.
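
One way to decide when a threshold can safely be relaxed is to replay reviewed results at several candidate thresholds and compare the review workload against the number of errors that would have slipped through. A minimal sketch, assuming you have (confidence, was_correct) pairs from past review outcomes; the sample data and `sweep` are hypothetical.

```python
def sweep(samples, thresholds):
    """For each threshold, report the review rate and how many errors would be auto-accepted.

    samples: list of (confidence, was_correct) pairs from previously reviewed fields.
    """
    rows = []
    for t in thresholds:
        accepted = [ok for conf, ok in samples if conf >= t]
        review_rate = 1 - len(accepted) / len(samples)
        escaped_errors = sum(1 for ok in accepted if not ok)
        rows.append((t, review_rate, escaped_errors))
    return rows


# Illustrative history for one field (not real data).
samples = [
    (0.999, True), (0.995, True), (0.992, True), (0.98, False),
    (0.97, True), (0.96, True), (0.93, False), (0.90, True),
]

rows = sweep(samples, [0.99, 0.97, 0.95])
for threshold, review_rate, escaped in rows:
    print(f"threshold={threshold:.2f} review_rate={review_rate:.1%} escaped_errors={escaped}")
```

In this toy history, dropping from 0.99 to 0.97 reduces the review rate but lets one wrong value through, which is exactly the trade-off to check against your evals before relaxing a threshold on a critical field.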

Summary

Configuring Bem to flag schema-invalid outputs and low-confidence fields is about turning AI parsing into a deterministic, contract-based system:

  • Your JSON Schema defines what valid looks like.
  • Schema enforcement and per-field confidence thresholds ensure Bem never guesses; anything uncertain gets flagged.
  • Surfaces provide a structured review queue where humans correct, approve, and continuously improve the underlying models via versioned updates and regression tests.

You end up with a pipeline that either returns schema-valid JSON you can trust or explicit exceptions you can route and resolve—no silent failures, no “hope it worked.”
