
How do I build my first Bem function to extract invoice fields into a strict JSON schema?
Most teams can hack together a demo that “reads invoices.” The real question is: can you get a strict, schema-valid JSON payload you’d trust to hit your ERP—and keep it working across vendors, layouts, and edge cases? That’s what your first Bem function should do.
Quick Answer: You build your first Bem invoice extractor by defining a strict JSON Schema, creating a transformFunction via the Bem API, and then calling it with raw invoice inputs (PDFs, images, emails) to get back schema-enforced JSON or explicit exceptions, never silent guesses. The schema becomes the contract: either the output is valid, or you get a flagged case you can route to review.
Why This Matters
If your “invoice AI” can change shape on every call, you don’t have a system—you have a demo. Finance, AP, and billing workflows depend on deterministic structures: vendor name where you expect it, totals that reconcile, and line items that always follow the same schema. Bem’s model is simple: unstructured in, strict JSON out, with versioned functions and workflows you can treat like software, not vibes.
Key Benefits:
- Schema-valid outputs by design: Bem enforces your JSON Schema at the architecture level, so malformed payloads become explicit exceptions, not hidden bugs.
- Production-ready from day one: Versioned functions, idempotent execution, and confidence scores let you test, roll back, and govern invoice extraction like code.
- No page- or layout-specific parsers: One function can handle messy real-world invoices across vendors and formats—PDF, image, email, attachments—without brittle templates.
Core Concepts & Key Points
| Concept | Definition | Why it's important |
|---|---|---|
| Transform Function | A Bem function that takes unstructured input (PDF, image, text) and returns structured JSON that must conform to a JSON Schema. | This is your “invoice extractor” primitive—trainable, versioned, and enforced by schema, not prompt tricks. |
| Strict JSON Schema | A JSON Schema that defines fields (types, required, enums, descriptions) for your invoice output: vendor, dates, totals, line items. | The schema is the contract between AI and your systems; it prevents shape drift and makes validation and exceptions deterministic. |
| Per-call outcome | One request in, one schema-valid JSON (or explicit exception) out, regardless of pages, layouts, or formats. | You stop thinking in pages and templates and start thinking in outcomes your ERP or AP system can consume reliably. |
How It Works (Step-by-Step)
From zero to “my first Bem function to extract invoice fields into a strict JSON schema” is basically three steps:
- Define your invoice JSON Schema
- Create a Bem Transform Function using that schema
- Call the function with real invoices and wire it into your workflow
1. Define your invoice JSON Schema
You start by describing the exact shape of the invoice JSON payload you want. Not “whatever the model thinks is helpful.” A strict schema.
For a minimal, production-ready invoice schema, you probably care about:
- vendor_name
- invoice_number
- invoice_date
- total_amount
- currency
- line_items (array with description, quantity, unit_price, total)
Example (TypeScript/JSON Schema-esque):
// invoiceSchema.ts
export const invoiceSchema = {
  type: "object",
  properties: {
    vendor_name: {
      type: "string",
      description: "The name of the vendor issuing the invoice",
    },
    invoice_number: {
      type: "string",
      description: "Unique identifier for the invoice",
    },
    invoice_date: {
      type: "string",
      format: "date",
      description: "The invoice issue date in ISO 8601 format (YYYY-MM-DD)",
    },
    currency: {
      type: "string",
      description: "The currency code for the invoice totals (e.g. USD, EUR)",
    },
    total_amount: {
      type: "number",
      description: "The final total amount due on the invoice",
    },
    line_items: {
      type: "array",
      description: "List of line items included in the invoice",
      items: {
        type: "object",
        properties: {
          description: {
            type: "string",
            description: "Description of the line item",
          },
          quantity: {
            type: "number",
            description: "Quantity billed for this line item",
          },
          unit_price: {
            type: "number",
            description: "Unit price for this line item",
          },
          total: {
            type: "number",
            description: "Total amount for this line item",
          },
        },
        required: ["description", "total"],
      },
    },
  },
  required: ["vendor_name", "total_amount", "line_items"],
} as const;
Two important things:
- required fields are enforced. If the model can’t produce a schema-valid payload, you get an exception case, not half-broken JSON.
- Descriptions are for humans and models. They make your expectations explicit and help the underlying inference engine.
You can register this schema with Bem as a named schema (e.g., "Invoice") or inline it when you create your function.
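To make "required fields are enforced" concrete, here is a minimal local sketch of the kind of check that enforcement implies. It is not Bem's validator and not a full JSON Schema implementation: `missingRequired` and `schemaSketch` are hypothetical names, and the helper only walks `required`, nested `properties`, and array `items`. For real validation you would likely reach for a complete JSON Schema validator library.

```typescript
// Hypothetical helper, not part of Bem: lists required fields that are absent
// from a payload. Only `required`, `properties`, and `items` are considered.
type Schema = {
  type?: string;
  properties?: Record<string, Schema>;
  items?: Schema;
  required?: readonly string[];
};

function missingRequired(schema: Schema, value: unknown, path = "$"): string[] {
  const missing: string[] = [];
  if (Array.isArray(value)) {
    // Check every array element against the `items` subschema, if there is one.
    value.forEach((item, i) => {
      if (schema.items) {
        missing.push(...missingRequired(schema.items, item, `${path}[${i}]`));
      }
    });
    return missing;
  }
  if (typeof value !== "object" || value === null) return missing;
  const obj = value as Record<string, unknown>;
  // Record each required key that is not present on the object.
  for (const key of schema.required ?? []) {
    if (!(key in obj)) missing.push(`${path}.${key}`);
  }
  // Recurse into the properties that are present.
  for (const [key, sub] of Object.entries(schema.properties ?? {})) {
    if (key in obj) missing.push(...missingRequired(sub, obj[key], `${path}.${key}`));
  }
  return missing;
}

// A trimmed copy of the invoice schema's `required` rules.
const schemaSketch: Schema = {
  type: "object",
  properties: {
    line_items: {
      type: "array",
      items: { type: "object", required: ["description", "total"] },
    },
  },
  required: ["vendor_name", "total_amount", "line_items"],
};

const problems = missingRequired(schemaSketch, {
  vendor_name: "Acme Supplies Inc.",
  line_items: [{ description: "Printer toner - black" }],
});
console.log(problems); // [ '$.total_amount', '$.line_items[0].total' ]
```

A payload like the one above is the sort of case that should surface as an exception rather than reach your ERP with a missing total.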
2. Create your first Bem Transform Function
Now you wrap that schema in a Transform Function. This is not a chat prompt; it’s a deterministic function with a versioned definition.
// create-function.ts
import fetch from "node-fetch";
import { invoiceSchema } from "./invoiceSchema";

const BEM_API_KEY = process.env.BEM_API_KEY;

async function createInvoiceExtractor() {
  const res = await fetch("https://api.bem.ai/v2/functions", {
    method: "POST",
    headers: {
      "x-api-key": BEM_API_KEY!,
      "content-type": "application/json",
    },
    body: JSON.stringify({
      functionName: "invoice-extractor-v1",
      type: "transform",
      // Either reference a pre-registered schema by name or pass the schema inline:
      outputSchemaName: "Invoice", // if you've registered it
      // OR:
      // outputSchema: invoiceSchema,
      description:
        "Extracts structured invoice data (vendor, invoice number, dates, totals, line items) from unstructured inputs into a strict JSON schema.",
    }),
  });

  if (!res.ok) {
    const error = await res.text();
    throw new Error(`Failed to create function: ${error}`);
  }

  const data = await res.json();
  console.log("Function created:", data);
}

createInvoiceExtractor().catch(console.error);
A few design notes:
- functionName is versioned by you. Start with invoice-extractor-v1, and bump to -v2 when you change behavior or schema.
- type: "transform" tells Bem this is an unstructured → structured function that must return schema-valid JSON.
- You can inspect the returned payload for IDs, versions, and metadata; everything is auditable and traceable.
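Because the version lives in the function name, bumping it can be scripted. A small sketch, assuming names end in a `-vN` suffix as in this article; `nextVersionName` is a hypothetical helper, not something Bem provides.

```typescript
// Hypothetical helper: increments the trailing "-vN" suffix of a function name.
function nextVersionName(functionName: string): string {
  const match = /^(.*)-v(\d+)$/.exec(functionName);
  // Names without a version suffix start at v1.
  if (!match) return `${functionName}-v1`;
  return `${match[1]}-v${Number(match[2]) + 1}`;
}

console.log(nextVersionName("invoice-extractor-v1")); // invoice-extractor-v2
console.log(nextVersionName("invoice-extractor")); // invoice-extractor-v1
```

Deriving the next name this way keeps the old function untouched, so callers pinned to v1 keep working while you evaluate v2.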
3. Call the function with real invoices
Once the function exists, you send it real-world invoices: PDFs, images, email threads, S3 URLs, etc. You get back structured, schema-enforced JSON plus per-field confidence.
Example: calling the function on a single PDF invoice.
// call-function.ts
import fetch from "node-fetch";
import fs from "fs/promises";
import path from "path";

const BEM_API_KEY = process.env.BEM_API_KEY;

async function extractInvoiceFromPdf(filePath: string) {
  // Read the PDF and base64-encode it for the JSON request body.
  const fileBuffer = await fs.readFile(filePath);
  const base64Content = fileBuffer.toString("base64");

  const res = await fetch(
    "https://api.bem.ai/v2/functions/invoke/invoice-extractor-v1",
    {
      method: "POST",
      headers: {
        "x-api-key": BEM_API_KEY!,
        "content-type": "application/json",
      },
      body: JSON.stringify({
        input: {
          type: "file",
          mimeType: "application/pdf",
          data: base64Content,
          // Send the actual file name rather than a hard-coded one.
          filename: path.basename(filePath),
        },
      }),
    }
  );

  // Fail loudly on HTTP errors instead of treating an error body as a result.
  if (!res.ok) {
    const error = await res.text();
    throw new Error(`Failed to invoke function: ${error}`);
  }

  const data = await res.json();
  console.dir(data, { depth: null });
}

extractInvoiceFromPdf("./samples/invoice-1234.pdf").catch(console.error);
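If you send more than PDFs, the request body can be built by a small pure function. The sketch below reuses the `input` field names from the call above; `buildInvokeBody` and the extension-to-MIME map are hypothetical, and which MIME types the API actually accepts is an assumption you should confirm against the Bem documentation.

```typescript
// Assumed mapping from file extension to MIME type; extend or trim as needed.
const MIME_BY_EXTENSION: Record<string, string> = {
  pdf: "application/pdf",
  png: "image/png",
  jpg: "image/jpeg",
  jpeg: "image/jpeg",
};

// Hypothetical helper: builds the invoke request body from a file name and
// already base64-encoded content.
function buildInvokeBody(filename: string, base64Content: string) {
  const ext = filename.split(".").pop()?.toLowerCase() ?? "";
  const mimeType = MIME_BY_EXTENSION[ext];
  // Reject unknown types up front rather than sending a guess.
  if (!mimeType) throw new Error(`Unsupported file extension: "${ext}"`);
  return {
    input: {
      type: "file",
      mimeType,
      data: base64Content,
      filename,
    },
  };
}

console.log(buildInvokeBody("scan-0042.JPG", "aGVsbG8=").input.mimeType); // image/jpeg
```

Keeping this step free of I/O makes it easy to unit test separately from the network call.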
Typical response shape (simplified):
{
  "output": {
    "vendor_name": "Acme Supplies Inc.",
    "invoice_number": "INV-2024-00123",
    "invoice_date": "2024-11-08",
    "currency": "USD",
    "total_amount": 1423.75,
    "line_items": [
      {
        "description": "Printer toner - black",
        "quantity": 5,
        "unit_price": 79.5,
        "total": 397.5
      },
      {
        "description": "Maintenance service - Q4",
        "quantity": 1,
        "unit_price": 1026.25,
        "total": 1026.25
      }
    ]
  },
  "meta": {
    "schemaValid": true,
    "confidence": {
      "vendor_name": 0.99,
      "invoice_number": 0.97,
      "invoice_date": 0.95,
      "total_amount": 0.998
    },
    "hallucination": {
      "vendor_name": false,
      "total_amount": false
    }
  }
}
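The sample output above reconciles: 397.5 + 1026.25 = 1423.75. Schema validity does not guarantee that, so a downstream arithmetic check is a cheap extra guard. A minimal sketch, assuming a two-decimal currency and an invoice whose total is just the sum of its line items (no separate tax or shipping); `reconcile` is a hypothetical helper.

```typescript
type LineItem = { description: string; quantity?: number; unit_price?: number; total: number };
type Invoice = { total_amount: number; line_items: LineItem[] };

// Compare in integer cents to avoid floating-point drift.
function toCents(amount: number): number {
  return Math.round(amount * 100);
}

// Hypothetical check: does the invoice total equal the sum of line-item totals?
function reconcile(invoice: Invoice): { ok: boolean; differenceCents: number } {
  const lineSum = invoice.line_items.reduce((sum, item) => sum + toCents(item.total), 0);
  const differenceCents = toCents(invoice.total_amount) - lineSum;
  return { ok: differenceCents === 0, differenceCents };
}

const sample: Invoice = {
  total_amount: 1423.75,
  line_items: [
    { description: "Printer toner - black", quantity: 5, unit_price: 79.5, total: 397.5 },
    { description: "Maintenance service - Q4", quantity: 1, unit_price: 1026.25, total: 1026.25 },
  ],
};

console.log(reconcile(sample)); // { ok: true, differenceCents: 0 }
```

A non-zero difference does not necessarily mean the extraction is wrong, but it is a reasonable trigger for human review.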
If the payload can’t satisfy your schema, you don’t get malformed JSON. You get a flagged exception you can route into an operator Surface for review, correction, and feedback.
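Routing can be sketched as a pure function over the `meta` block from the response shape shown earlier. The `routeExtraction` name, the `Route` type, and the 0.9 default threshold are assumptions for illustration; pick a threshold from your own evaluation data.

```typescript
// Mirrors the `meta` fields from the sample response; other fields are omitted.
type ExtractionMeta = {
  schemaValid: boolean;
  confidence: Record<string, number>;
};

type Route = { action: "auto" } | { action: "review"; reasons: string[] };

// Hypothetical router: auto-process only when the payload is schema-valid and
// every reported field confidence meets the threshold.
function routeExtraction(meta: ExtractionMeta, minConfidence = 0.9): Route {
  const reasons: string[] = [];
  if (!meta.schemaValid) reasons.push("schema validation failed");
  for (const [field, score] of Object.entries(meta.confidence)) {
    if (score < minConfidence) reasons.push(`low confidence on ${field}: ${score}`);
  }
  return reasons.length === 0 ? { action: "auto" } : { action: "review", reasons };
}

const sampleMeta: ExtractionMeta = {
  schemaValid: true,
  confidence: { vendor_name: 0.99, invoice_number: 0.97, invoice_date: 0.95, total_amount: 0.998 },
};

console.log(routeExtraction(sampleMeta)); // { action: 'auto' }
console.log(routeExtraction(sampleMeta, 0.96)); // review: invoice_date falls below 0.96
```

Returning the reasons alongside the decision gives operators context when a case lands in the review queue.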
Common Mistakes to Avoid
- Skipping the schema and “just prompting the model.” How to avoid it: Always start with a JSON Schema that mirrors what your downstream systems need. Treat the schema as the contract. If you add fields later (e.g., tax breakdown, PO number), version the schema and the function.
- Treating invoices as per-page OCR problems instead of per-call outcomes. How to avoid it: Don’t build pipelines that think in pages, bounding boxes, and coordinates unless you truly need them. Use Bem to think in terms of “invoice in → strict JSON out,” independent of layout or page count.
Real-World Example
A typical AP team starts with “invoice AI” that looks good in a demo: a model reads a single vendor’s clean PDF and prints out key fields. Then reality hits. Mixed packets. Scanned faxes. Multi-currency. Invoices embedded in email threads. Suddenly, the brittle parser breaks, and engineers are maintaining regexes instead of shipping product.
With Bem, that same team defines a single strict JSON Schema for invoices and ships invoice-extractor-v1 as a Transform Function. They route every incoming invoice—PDF attachments, EDI fallbacks, even images from mobile uploads—through the same function. For 90–95% of cases, they get schema-valid JSON with per-field confidence that can flow directly into their ERP. The remaining low-confidence cases appear in a review Surface: operators fix them in a few clicks, and those corrections flow back into evaluation datasets.
The result: no more per-vendor templates, no more manual keying for the majority of invoices, and a pipeline they can regression test and version like any other production service.
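Regression testing here can be as simple as comparing each extraction with an operator-corrected "golden" record, field by field. A rough sketch over top-level scalar fields only; `mismatchedFields` is a hypothetical helper, and real evals would also need to compare line items and tolerate formatting differences.

```typescript
type FlatInvoice = Record<string, string | number>;

// Hypothetical eval helper: names of golden fields the extraction did not
// reproduce exactly.
function mismatchedFields(golden: FlatInvoice, actual: FlatInvoice): string[] {
  return Object.keys(golden).filter((field) => actual[field] !== golden[field]);
}

const golden: FlatInvoice = { vendor_name: "Acme Supplies Inc.", total_amount: 1423.75 };
const extracted: FlatInvoice = { vendor_name: "Acme Supplies Inc.", total_amount: 1432.75 };

console.log(mismatchedFields(golden, extracted)); // [ 'total_amount' ]
```

Running a check like this across a fixed set of invoices before promoting a new function version gives you a concrete pass/fail signal.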
Pro Tip: Start narrow with your first Bem function: vendor, invoice number, date, total, and line items. Once it’s stable and covered by evals, add fields like tax breakdown, due dates, and payment terms as a new version (invoice-extractor-v2); don’t mutate v1 in production.
Summary
Building your first Bem function to extract invoice fields into a strict JSON schema is about shifting from “AI that reads invoices” to “infrastructure that guarantees structure.” You define a JSON Schema, create a Transform Function around it, and then call that function with messy real-world invoices. The output is either schema-valid JSON with confidence and hallucination signals—or an explicit exception that you can route and review. No silent failures, no layout-specific hacks, and a pipeline your finance and engineering teams can trust.