
How do I build my first Bem function to extract invoice fields into a strict JSON schema?
Quick Answer: You build your first Bem invoice extractor by defining a strict JSON Schema, creating a Transform function via the /v2/functions API, and then calling it with a raw invoice (PDF, image, email, etc.) to get schema‑valid JSON back. The schema does the hard work: it enforces structure, types, and required fields so the workflow returns either a valid payload or a flagged exception, never a silent guess.
Why This Matters
Most “invoice AI” demos are a prompt wrapped around OCR. They look great on three PDFs and fall apart on the fourth when layouts shift, vendors change, or the model hallucinates line items. What you actually need in production is a deterministic contract: a strict JSON schema the system must satisfy on every call, plus a function you can version, test, and route around edge cases.
A Bem function does exactly that. You define the schema for your invoice fields—Vendor Name, Invoice Number, Date, totals, line items—then let Bem handle routing, extraction, validation, and confidence scoring. One REST call in, schema‑valid JSON (or explicit exceptions) out.
Key Benefits:
- Schema-enforced outputs: Every response is validated against your JSON Schema, so downstream systems never see malformed or missing fields.
- Production-grade control: Functions are versioned, idempotent, and observable—easy to test, roll back, and monitor like any other piece of critical infra.
- Faster to value: Instead of months of OCR glue code and brittle parsers, you ship an invoice workflow in hours and iterate on the function as requirements change.
Core Concepts & Key Points
| Concept | Definition | Why it's important |
|---|---|---|
| Strict JSON Schema | A formal definition of the invoice payload structure (fields, types, enums, required properties) used by Bem to shape and validate outputs. | Turns “AI extraction” into a contract: the system either returns schema‑valid JSON or flags exceptions, which makes downstream integrations safe. |
| Transform Function | A versioned Bem function that takes unstructured input (PDF, image, email body, etc.) and returns structured JSON matching your schema. | It’s the core unit of work in Bem—trainable, evaluable, and deployable—so you can treat unstructured extraction like code, not like a one‑off prompt. |
| Workflows & Routing | Composable chains of functions (Route, Transform, Enrich, Validate, etc.) that process mixed inputs and handle edge cases. | Let you start with a single invoice extractor, then graduate to handling mixed packets, vendor‑specific quirks, and enrichment against internal systems. |
How It Works (Step-by-Step)
At a high level, you’ll:
- Define a strict invoice JSON Schema.
- Create a Bem Transform function that uses that schema.
- Call the function with an invoice and consume the structured response.
1. Define your invoice schema (the contract)
This is where you decide what “an invoice” means to your system. Don’t start from “what text can I get?” Start from “what does my ERP or ledger need?”
For a first pass, a minimal, strict schema might have:
- vendor_name (string)
- invoice_number (string)
- invoice_date (string, ISO date)
- total_amount (number)
- currency_code (string, enum)
- line_items (array of objects: description, quantity, unit_price, line_total)
Example (TypeScript/JS object form) matching how you’d send this to Bem:

    // Minimal invoice JSON Schema for bem
    const invoiceSchema = {
      type: "object",
      properties: {
        vendor_name: {
          type: "string",
          description: "The name of the vendor issuing the invoice",
        },
        invoice_number: {
          type: "string",
          description: "Unique identifier for the invoice",
        },
        invoice_date: {
          type: "string",
          format: "date",
          description: "Invoice date in YYYY-MM-DD format",
        },
        total_amount: {
          type: "number",
          description: "The final total amount on the invoice, including tax",
        },
        currency_code: {
          type: "string",
          enum: ["USD", "EUR", "GBP", "CAD", "AUD"],
          description: "3-letter ISO currency code",
        },
        line_items: {
          type: "array",
          description: "Each billed item or service on the invoice",
          items: {
            type: "object",
            properties: {
              description: {
                type: "string",
                description: "Line item description",
              },
              quantity: {
                type: "number",
                description: "Quantity for this line",
              },
              unit_price: {
                type: "number",
                description: "Unit price for this line",
              },
              line_total: {
                type: "number",
                description: "Total for this line (quantity * unit_price, including any line-level tax/discounts)",
              },
            },
            // quantity and unit_price stay optional: some invoices list only a line total
            required: ["description", "line_total"],
            additionalProperties: false,
          },
        },
      },
      required: ["vendor_name", "invoice_number", "invoice_date", "total_amount", "currency_code", "line_items"],
      additionalProperties: false,
    };

A few things to notice:
- required fields ensure you never ingest “half an invoice” silently.
- enum on currency_code prevents garbage values.
- additionalProperties: false keeps the model from inventing fields.
This schema is what Bem uses to force the extraction to match your expectations.
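The three constraints above can also be mirrored in your own application code as a last line of defense before data reaches the ledger. The following is a minimal sketch, not part of Bem's API: the `Invoice` type, the `isInvoice` guard, and the helper names are hypothetical, and the date check only verifies the YYYY-MM-DD shape, not that the date exists.

```typescript
// Hypothetical local guard mirroring the invoice schema above (not a Bem API).
const CURRENCIES = ["USD", "EUR", "GBP", "CAD", "AUD"] as const;
type Currency = (typeof CURRENCIES)[number];

interface LineItem {
  description: string;
  quantity?: number;
  unit_price?: number;
  line_total: number;
}

interface Invoice {
  vendor_name: string;
  invoice_number: string;
  invoice_date: string;
  total_amount: number;
  currency_code: Currency;
  line_items: LineItem[];
}

const INVOICE_KEYS = ["vendor_name", "invoice_number", "invoice_date", "total_amount", "currency_code", "line_items"];
const LINE_ITEM_KEYS = ["description", "quantity", "unit_price", "line_total"];

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Mirrors additionalProperties: false, so every key must be a known one.
function hasOnlyKeys(obj: Record<string, unknown>, allowed: string[]): boolean {
  return Object.keys(obj).every((key) => allowed.indexOf(key) >= 0);
}

function isOptionalNumber(v: unknown): boolean {
  return v === undefined || typeof v === "number";
}

function isLineItem(v: unknown): v is LineItem {
  return (
    isRecord(v) &&
    hasOnlyKeys(v, LINE_ITEM_KEYS) &&
    typeof v.description === "string" && // required
    typeof v.line_total === "number" && // required
    isOptionalNumber(v.quantity) &&
    isOptionalNumber(v.unit_price)
  );
}

function isInvoice(v: unknown): v is Invoice {
  return (
    isRecord(v) &&
    hasOnlyKeys(v, INVOICE_KEYS) &&
    typeof v.vendor_name === "string" &&
    typeof v.invoice_number === "string" &&
    typeof v.invoice_date === "string" &&
    /^\d{4}-\d{2}-\d{2}$/.test(v.invoice_date) && // shape check only
    typeof v.total_amount === "number" &&
    typeof v.currency_code === "string" &&
    (CURRENCIES as readonly string[]).indexOf(v.currency_code) >= 0 && // enum
    Array.isArray(v.line_items) &&
    v.line_items.every(isLineItem)
  );
}
```

A guard like this rejects a payload with a missing required field, a currency outside the enum, or an unexpected extra key, which are the same three failure modes the schema is meant to rule out.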
2. Create your first Bem Transform Function
Once you have your schema, you register a function with Bem’s API. This function is how your app talks to Bem in production.
Define the function via API
You’ll call POST /v2/functions with:
- functionName: a versioned name like "invoice-extractor-v1".
- type: "transform" (we’re converting unstructured → structured).
- outputSchemaName or an inline schema definition.
Example in TypeScript/Node:

    import fetch from "node-fetch";

    const invoiceSchema = { /* same as above */ };

    async function createInvoiceExtractor() {
      const res = await fetch("https://api.bem.ai/v2/functions", {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
          "x-api-key": process.env.BEM_API_KEY!, // from app.bem.ai
        },
        body: JSON.stringify({
          functionName: "invoice-extractor-v1",
          type: "transform",
          outputSchema: invoiceSchema, // alternatively, outputSchemaName if stored in Bem
          description: "Extracts invoice fields into a strict JSON schema",
        }),
      });

      // Surface the status code and response body so a failed registration is easy to debug
      if (!res.ok) {
        const errorBody = await res.text();
        throw new Error(`Failed to create function: ${res.status} ${errorBody}`);
      }

      const data = await res.json();
      console.log("Created function:", data);
    }

    createInvoiceExtractor().catch(console.error);

Under the hood, Bem binds this function definition to a deterministic extraction pipeline. The schema you provided is now the contract every execution must satisfy.
Because functions are versioned, when you later add fields (e.g., tax breakdowns, PO numbers), you’ll create invoice-extractor-v2 and keep v1 live for existing workflows until you’re ready to migrate.
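One way to keep v1 and v2 side by side is to derive the new schema from the old one instead of editing it in place. The sketch below assumes a trimmed-down stand-in for the v1 schema and a hypothetical optional `po_number` field; the request-body fields (`functionName`, `type`, `outputSchema`) are the ones used in the creation call above.

```typescript
// Trimmed-down stand-in for the v1 schema (only two fields shown for brevity).
const invoiceSchemaV1 = {
  type: "object",
  properties: {
    vendor_name: { type: "string" },
    invoice_number: { type: "string" },
  },
  required: ["vendor_name", "invoice_number"],
  additionalProperties: false,
};

// v2 copies v1 and adds a hypothetical optional PO number.
// The v1 object is never mutated, so existing callers keep their contract.
const invoiceSchemaV2 = {
  ...invoiceSchemaV1,
  properties: {
    ...invoiceSchemaV1.properties,
    po_number: { type: "string", description: "Purchase order number, if present" },
  },
};

// Two request bodies for POST /v2/functions, one per version.
const functionDefinitions = [
  { functionName: "invoice-extractor-v1", type: "transform", outputSchema: invoiceSchemaV1 },
  { functionName: "invoice-extractor-v2", type: "transform", outputSchema: invoiceSchemaV2 },
];

console.log(functionDefinitions.map((def) => def.functionName));
```

Because `po_number` is not added to `required`, a v2 payload for an invoice without a PO number stays valid, which keeps the migration from v1 low-risk.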
3. Call the function with an invoice
Now you can send real invoices—PDFs, images, even email threads containing invoices—to Bem and get a strict JSON payload back.
Example call:

    import fs from "fs";
    import fetch from "node-fetch";

    async function extractInvoice(filePath: string) {
      // Read the invoice from disk and base64-encode it for the JSON request body
      const fileBuffer = fs.readFileSync(filePath);
      const base64Content = fileBuffer.toString("base64");

      const res = await fetch("https://api.bem.ai/v2/functions/invoke", {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
          "x-api-key": process.env.BEM_API_KEY!,
        },
        body: JSON.stringify({
          functionName: "invoice-extractor-v1",
          // bem accepts different input surfaces; here we use a simple file payload
          input: {
            type: "file",
            mimeType: "application/pdf",
            data: base64Content,
          },
        }),
      });

      if (!res.ok) {
        const errorBody = await res.text();
        throw new Error(`Invoke failed: ${res.status} ${errorBody}`);
      }

      const output = await res.json();
      console.dir(output, { depth: null });
    }

    extractInvoice("./sample-invoice.pdf").catch(console.error);

A typical response shape:

    {
      "data": {
        "vendor_name": "ACME Supplies LLC",
        "invoice_number": "INV-20348",
        "invoice_date": "2026-01-02",
        "total_amount": 1543.67,
        "currency_code": "USD",
        "line_items": [
          {
            "description": "Industrial cleaner (24-pack)",
            "quantity": 3,
            "unit_price": 120.5,
            "line_total": 361.5
          },
          {
            "description": "Safety gloves (case)",
            "quantity": 10,
            "unit_price": 118.21,
            "line_total": 1182.1
          }
        ]
      },
      "meta": {
        "functionName": "invoice-extractor-v1",
        "version": "1",
        "confidence": {
          "vendor_name": 0.99,
          "invoice_number": 0.97,
          "invoice_date": 0.98,
          "total_amount": 0.995
        },
        "hallucinationFlags": {
          "line_items": false
        }
      }
    }

Every field conforms to your schema. If Bem can’t produce schema‑valid JSON, you don’t get a “best effort” blob—you get a clear exception you can route to a human review surface.
Common Mistakes to Avoid
- Treating the schema as “nice to have” instead of the contract: If you skip strict required fields, enums, and type constraints, you’re back to guessing at runtime. Be explicit up front, even if the schema feels strict. You can relax later if needed.
- Overloading one function with every edge case: Don’t jam vendor‑specific quirks, multi‑doc packets, and currency conversion into a single Transform Function. Use workflows: Route by document type, Transform per schema, then Enrich/Validate. Keep your first invoice function clean and focused.
Real-World Example
A finance ops team wants to stop keying invoices into their ERP. They start by defining the minimal strict JSON schema their system needs: vendor, invoice number, date, currency, grand total, and line items with descriptions and amounts. They register invoice-extractor-v1 in Bem using that schema, then wire a simple workflow:
- A vendor uploads a PDF via a portal.
- The backend sends the PDF to Bem’s invoice-extractor-v1.
- Bem returns schema‑valid JSON plus per‑field confidence.
- If all critical fields are above a 0.98 confidence threshold, the invoice posts directly into the ERP. If anything is low, it routes to a review surface auto-generated from the same schema.
- Corrections made by operators are captured as golden data to evaluate and improve future function versions.
Within a week, they’ve eliminated manual entry for the majority of invoices and built a measurable accuracy pipeline, not just a brittle OCR script.
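The confidence gate in this workflow could be sketched roughly as follows. This is an illustration under stated assumptions: the list of critical fields, the `decide` function, and the `Decision` type are hypothetical, scores at or above the threshold are treated as passing, and a missing score is treated as low confidence. The input shape follows the `meta.confidence` object from the sample response.

```typescript
// Per-field confidence scores, shaped like meta.confidence in the sample response.
type Confidence = { [field: string]: number | undefined };

// Assumption: the fields this team treats as critical for auto-posting.
const CRITICAL_FIELDS = ["vendor_name", "invoice_number", "invoice_date", "total_amount"];
const THRESHOLD = 0.98;

type Decision =
  | { action: "post" }
  | { action: "review"; lowFields: string[] };

function decide(confidence: Confidence, threshold: number = THRESHOLD): Decision {
  // Collect every critical field whose score is missing or below the threshold.
  const lowFields = CRITICAL_FIELDS.filter((field) => {
    const score = confidence[field];
    return score === undefined || score < threshold;
  });
  return lowFields.length === 0 ? { action: "post" } : { action: "review", lowFields };
}

// Scores from the sample response: invoice_number (0.97) falls below 0.98.
const decision = decide({
  vendor_name: 0.99,
  invoice_number: 0.97,
  invoice_date: 0.98,
  total_amount: 0.995,
});
console.log(decision); // { action: "review", lowFields: ["invoice_number"] }
```

Returning the list of low-confidence fields, rather than a bare boolean, lets the review surface highlight exactly what the operator needs to check.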
Pro Tip: Start by instrumenting your first function with a simple eval harness: keep a small golden set of invoices with known “correct” JSON, run invoice-extractor-v1 against them on every change, and track F1 scores per field. Treat accuracy like test coverage and only cut over to v2 in production once it passes your thresholds.
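A per-field F1 check for such a harness might look like the sketch below. It assumes exact-match scoring on flat scalar fields: a correct value counts as a true positive, a wrong value counts as both a false positive and a false negative, and a missing value counts as a false negative only. The names `scoreField` and `passesGate` are hypothetical, and the predictions are hard-coded here; in practice they would come from running the function over the golden set.

```typescript
type FlatInvoice = { [field: string]: string | number | undefined };

interface FieldScore {
  precision: number;
  recall: number;
  f1: number;
}

// Exact-match precision/recall/F1 for one field across a golden set.
function scoreField(field: string, gold: FlatInvoice[], predicted: FlatInvoice[]): FieldScore {
  if (gold.length !== predicted.length) {
    throw new Error("gold and predicted sets must be the same length");
  }
  let tp = 0;
  let fp = 0;
  let fn = 0;
  gold.forEach((goldDoc, i) => {
    const expected = goldDoc[field];
    const actual = predicted[i][field];
    if (actual !== undefined && actual === expected) {
      tp += 1;
    } else {
      if (actual !== undefined) fp += 1; // extracted something, but it was wrong
      if (expected !== undefined) fn += 1; // the correct value was not recovered
    }
  });
  const precision = tp + fp === 0 ? 0 : tp / (tp + fp);
  const recall = tp + fn === 0 ? 0 : tp / (tp + fn);
  const f1 = precision + recall === 0 ? 0 : (2 * precision * recall) / (precision + recall);
  return { precision, recall, f1 };
}

// Gate a release: every listed field must reach the minimum F1.
function passesGate(fields: string[], gold: FlatInvoice[], predicted: FlatInvoice[], minF1: number): boolean {
  return fields.every((field) => scoreField(field, gold, predicted).f1 >= minF1);
}

const gold: FlatInvoice[] = [
  { invoice_number: "INV-1", total_amount: 10 },
  { invoice_number: "INV-2", total_amount: 20 },
];
const predicted: FlatInvoice[] = [
  { invoice_number: "INV-1", total_amount: 10 },
  { invoice_number: "INV-9", total_amount: 20 }, // wrong invoice number
];

console.log(scoreField("invoice_number", gold, predicted).f1); // 0.5
console.log(scoreField("total_amount", gold, predicted).f1); // 1
```

Running `passesGate` in CI on every schema or function change gives the "accuracy as test coverage" behavior described above: a regression on any tracked field blocks the cutover.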
Summary
Building your first Bem function to extract invoice fields into a strict JSON schema is about flipping the usual pattern: define the contract first, then let the infrastructure satisfy it. You declare a JSON Schema that matches what your downstream systems need, create a Transform function bound to that schema, and call it with real invoices to get schema‑valid JSON plus confidence and hallucination signals. From there, you can add routing, enrichment, evals, and review surfaces—but the foundation is that first, strict function.