
Bem API quickstart: what’s the simplest REST example to send a PDF and get schema-validated JSON back?
Most teams don’t need a full platform tour to get started with Bem. You just want the simplest REST call that takes a PDF in, runs it through a workflow, and returns schema-validated JSON out. This quickstart gives you exactly that, plus the minimum context you need to make it production-safe instead of “demo-only.”
Quick Answer: The simplest Bem API call is a single `POST /v2/calls` with `multipart/form-data`: send your PDF as `file` and the name of your workflow as `workflowName`. Bem returns a JSON payload that includes your schema-enforced output, per-field confidence, and a `schema_valid` flag so you can trust, or route, each result deterministically.
Why This Matters
For most teams, the hard part isn’t “calling an LLM.” It’s getting reliable, schema-correct JSON from messy PDFs without building a pile of brittle parsing code. You want one REST call that works across vendors, layouts, and edge cases—and that either gives you data your system can ingest or a clear exception your operators can handle.
Bem is designed as that production layer. Instead of per-page OCR or a fragile prompt, you define your schema once, wire it into a workflow, and then hit a single endpoint with any document. You get strict JSON validation, confidence scores, and exception routing out of the box.
Key Benefits:
- Deterministic outputs: Schema enforcement means you either get schema-valid JSON or an explicit exception—no silent failures, no half-populated objects.
- One call, any input: PDFs, images, emails, and mixed packets all use the same `POST /v2/calls` pattern; your app doesn’t care about the source.
- Production tooling baked in: Versioned functions/workflows, idempotent execution, and audit-ready traces make it safe to move from demo to production quickly.
Core Concepts & Key Points
| Concept | Definition | Why it's important |
|---|---|---|
| Function | A versioned, atomic unit in Bem that does one job (e.g., “extract invoice JSON given this schema”). | Lets you treat extraction like code: version, test, roll back, and reuse across workflows. |
| Workflow | A composed pipeline of functions (Route, Split, Transform, Enrich, etc.) that runs when you call POST /v2/calls. | Keeps the entire unstructured → structured pipeline deterministic and observable, not just the extraction step. |
| Schema-validated output | JSON that is guaranteed to conform to your JSON Schema, or else the call is flagged as an exception with per-field confidence and hallucination signals. | Your downstream systems (ERP, TMS, billing) can trust the shape of the data—no ad-hoc null checks or regex band-aids. |
How It Works (Step-by-Step)
At the simplest level, there are three pieces:
- Define the schema you want (e.g., for an invoice or BOL).
- Attach that schema to a Bem function and workflow.
- `POST` a PDF to `/v2/calls` with your `workflowName`.
Below is the minimal sequence to go from “no setup” to “PDF → schema-enforced JSON” using REST.
1. Define Your Schema (Once)
You can define your schema in the Bem UI or via API. Here’s a BOL-style example via POST /v2/functions:
curl https://api.bem.ai/v2/functions \
  --request POST \
  --header "X-Api-Key: YOUR_API_KEY" \
  --header "Content-Type: application/json" \
  --data '{
    "name": "extract-bol-v1",
    "type": "extract",
    "schema": {
      "type": "object",
      "properties": {
        "shipper_name": { "type": "string" },
        "vessel_name": { "type": "string" },
        "containers": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "container_id": { "type": "string" },
              "seal_number": { "type": "string" }
            },
            "required": ["container_id"]
          }
        }
      },
      "required": ["shipper_name", "containers"]
    }
  }'
Response (trimmed):
{
  "name": "extract-bol-v1",
  "version": "1",
  "id": "fn_123...",
  "schema": { "...": "..." }
}
You’ll reference this function when building your workflow. Usually you’ll do that once in the UI, then just call the workflow by name from your app.
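If you keep the schema in code rather than in the UI, a small pre-flight check can catch typos before you register the function. The sketch below is an illustration, not part of Bem: `undeclared_required` and `build_function_request` are hypothetical helper names, and the request body simply mirrors the cURL example above. JSON Schema itself allows a `required` name that has no entry in `properties`, but in an extraction schema that is usually a mistake, so the helper treats it as an error.

```python
import json

# Same schema as in the cURL request above, kept as a Python dict.
BOL_SCHEMA = {
    "type": "object",
    "properties": {
        "shipper_name": {"type": "string"},
        "vessel_name": {"type": "string"},
        "containers": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "container_id": {"type": "string"},
                    "seal_number": {"type": "string"},
                },
                "required": ["container_id"],
            },
        },
    },
    "required": ["shipper_name", "containers"],
}


def undeclared_required(schema, path="$"):
    """Return paths of `required` names that have no entry in `properties`."""
    problems = []
    if schema.get("type") == "object":
        props = schema.get("properties", {})
        for name in schema.get("required", []):
            if name not in props:
                problems.append(f"{path}.{name}")
        # Recurse into nested objects and arrays.
        for name, sub in props.items():
            problems.extend(undeclared_required(sub, f"{path}.{name}"))
    elif schema.get("type") == "array" and isinstance(schema.get("items"), dict):
        problems.extend(undeclared_required(schema["items"], f"{path}[]"))
    return problems


def build_function_request(name, schema):
    """Serialize a request body shaped like the POST /v2/functions example."""
    problems = undeclared_required(schema)
    if problems:
        raise ValueError(f"required but not declared: {problems}")
    return json.dumps({"name": name, "type": "extract", "schema": schema})
```

Calling `build_function_request("extract-bol-v1", BOL_SCHEMA)` returns the JSON string you would pass as the request body; a misspelled name in `required` raises `ValueError` instead of reaching the API.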
2. Create a Workflow That Uses the Function
In the Bem UI, you’d wire up a simple workflow:
- Step 1 – Route: Accept the uploaded file.
- Step 2 – Transform (Extract): Call `extract-bol-v1` with your schema.
- Step 3 – (Optional) Enrich: Match vendor/customer IDs from a Bem Collection.
- Step 4 – Validate: Enforce schema, set confidence thresholds, and configure exception routing.
Give this workflow a name, e.g., shipping-packet. That’s the only identifier your app needs.
(You can also create workflows via API, but for a quickstart, the UI is fastest.)
3. Call the Workflow: PDF → Schema-Validated JSON
This is the core of your quickstart: a single REST call.
cURL example
curl https://api.bem.ai/v2/calls \
  --request POST \
  --header "X-Api-Key: YOUR_API_KEY" \
  --form "file=@/path/to/document.pdf" \
  --form "workflowName=shipping-packet"
Typical success response (shape simplified):
{
  "id": "call_9y3...",
  "workflowName": "shipping-packet",
  "status": "completed",
  "schema_valid": true,
  "confidence": 0.99,
  "result": {
    "shipper_name": "Acme Logistics Ltd",
    "vessel_name": "Ever Strong",
    "containers": [
      {
        "container_id": "TGHU1234567",
        "seal_number": "SEAL-9981"
      }
    ]
  },
  "meta": {
    "fields": {
      "shipper_name": { "confidence": 0.998, "hallucination": false },
      "vessel_name": { "confidence": 0.982, "hallucination": false }
    }
  }
}
The key flags:
- `schema_valid: true` → output conforms to your JSON Schema and is safe to ingest.
- `confidence` + per-field confidences → let you implement your own routing thresholds.
If Bem can’t produce schema-valid output (e.g., the PDF is unreadable or a required field is missing), it will flag that instead of guessing:
{
  "id": "call_9y4...",
  "workflowName": "shipping-packet",
  "status": "exception",
  "schema_valid": false,
  "error": {
    "code": "SCHEMA_VALIDATION_FAILED",
    "message": "Missing required property: shipper_name"
  }
}
You can route this to a human review Surface or your own backoffice queue.
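The two response shapes above can be turned into a single routing decision. The following is a minimal sketch, assuming the simplified field names shown in this article (`status`, `schema_valid`, `error.code`, `meta.fields`); the function name `route_call` and the 0.95 threshold are illustrative assumptions, so check the real response shape and pick thresholds from your own data.

```python
def route_call(payload, min_field_confidence=0.95):
    """Return ("ingest" | "review", reason) for one /v2/calls response.

    The keys read here follow the simplified responses in this article;
    the default threshold is an arbitrary placeholder.
    """
    # Hard gate: anything that is not a completed, schema-valid call goes to review.
    if payload.get("status") != "completed" or not payload.get("schema_valid"):
        error = payload.get("error") or {}
        return "review", error.get("code", "UNKNOWN")

    # Soft gate: inspect per-field signals in a stable (sorted) order.
    fields = payload.get("meta", {}).get("fields", {})
    for name, info in sorted(fields.items()):
        if info.get("hallucination"):
            return "review", f"hallucination:{name}"
        if info.get("confidence", 0.0) < min_field_confidence:
            return "review", f"low_confidence:{name}"

    return "ingest", "ok"
```

Keeping the decision in one pure function makes it easy to unit-test against saved responses and to tighten the threshold later without touching the HTTP code.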
4. Same Call in Python and Node.js
If you’re building an app, you’ll likely wrap that cURL in a client.
Python (requests)
import requests

API_KEY = "YOUR_API_KEY"
WORKFLOW_NAME = "shipping-packet"
PDF_PATH = "document.pdf"

url = "https://api.bem.ai/v2/calls"
headers = {"X-Api-Key": API_KEY}

# Keep the file open for the duration of the upload.
with open(PDF_PATH, "rb") as f:
    files = {"file": f}
    data = {"workflowName": WORKFLOW_NAME}
    resp = requests.post(url, headers=headers, files=files, data=data)

# Fail fast on HTTP-level errors before inspecting the body.
resp.raise_for_status()
payload = resp.json()

if payload.get("schema_valid"):
    result = payload["result"]
    # Use result directly in your system
else:
    # Handle exception path
    print("Exception:", payload.get("error"))
Node.js (fetch / node-fetch)
import fs from "node:fs";
import fetch from "node-fetch";
import FormData from "form-data";

const API_KEY = "YOUR_API_KEY";
const WORKFLOW_NAME = "shipping-packet";
const PDF_PATH = "document.pdf";

async function run() {
  const form = new FormData();
  form.append("file", fs.createReadStream(PDF_PATH));
  form.append("workflowName", WORKFLOW_NAME);

  const res = await fetch("https://api.bem.ai/v2/calls", {
    method: "POST",
    headers: {
      "X-Api-Key": API_KEY,
      ...form.getHeaders()
    },
    body: form
  });

  if (!res.ok) {
    throw new Error(`HTTP ${res.status}: ${await res.text()}`);
  }

  const payload = await res.json();
  if (payload.schema_valid) {
    const result = payload.result;
    // Push to ERP, TMS, etc.
  } else {
    console.error("Exception:", payload.error);
    // Route to manual review or retry logic
  }
}

run().catch(console.error);
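Both clients above fail on the first network error. For the "retry logic" path, a small backoff wrapper is one option. This is a generic sketch rather than anything Bem-specific: `TransientError` and `call_with_retries` are hypothetical names, which failures count as retryable is your decision, and whether a repeated upload is de-duplicated on the server side is something to confirm in the Bem documentation before relying on it.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (e.g., timeouts, HTTP 5xx)."""


def call_with_retries(send, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns, retrying TransientError with exponential backoff.

    `send` is any zero-argument callable that performs the upload and
    returns the parsed payload. `sleep` is injectable so tests run instantly.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return send()
        except TransientError:
            if attempt == attempts - 1:
                raise  # Out of attempts: surface the last error to the caller.
            sleep(base_delay * 2 ** attempt)  # Waits 1s, 2s, 4s, ... by default.
```

To use it with the Python client, wrap the `requests.post` call in a function that raises `TransientError` for the conditions you consider temporary, and let every other error propagate so it reaches your exception path.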
Common Mistakes to Avoid
- Treating the response as “just LLM output”: Don’t ignore `schema_valid`, confidence scores, and exception status. Use them as hard gates: only ingest when `schema_valid: true`, and route everything else to review or a retry workflow.
- Hard-coding layouts or templates on top of Bem: The point is to stop chasing per-vendor templates. Let the workflow + schema handle layout variability; if you start bolting regexes and coordinates around it, you’re reintroducing the fragility Bem is designed to remove.
Real-World Example
A fleet management team we work with had a common pattern: they could get a “nice” demo extraction from invoices, but it broke the moment they hit real traffic—mixed batches, half-scanned images, different tax layouts. Their engineers were stuck mapping every new vendor PDF to a custom parser.
They switched to a schema-first Bem workflow: defined one invoice schema, wired an extract function and enrichment against their vendor master Collection, and then moved their integration to a single POST /v2/calls with the invoice PDF. Now, every invoice—regardless of layout—returns schema-valid JSON totals and line items, plus match confidence against their vendor table. When Bem can’t confidently match, the call lands in a review Surface instead of silently corrupting their GL.
Pro Tip: In your first week, log `schema_valid`, per-field confidence, and exception codes for every call. Use that log as your “eval harness” to set thresholds, refine your schema, and decide where you need Surfaces for human review.
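One way to build that log is to flatten each response into a small record and summarize the records periodically. The sketch below assumes the simplified response fields used in this article; `to_log_record` and `summarize` are hypothetical helper names, and the summary statistics are just a starting point.

```python
import json
from collections import Counter


def to_log_record(payload):
    """Keep only the signals needed for threshold tuning, not the document data."""
    fields = payload.get("meta", {}).get("fields", {})
    return {
        "id": payload.get("id"),
        "workflowName": payload.get("workflowName"),
        "schema_valid": bool(payload.get("schema_valid")),
        "error_code": (payload.get("error") or {}).get("code"),
        "field_confidence": {k: v.get("confidence") for k, v in fields.items()},
    }


def summarize(records):
    """Aggregate log records into an exception rate and per-field confidence floor."""
    total = len(records)
    invalid = sum(1 for r in records if not r["schema_valid"])
    lowest = {}
    for r in records:
        for name, conf in r["field_confidence"].items():
            if conf is not None:
                lowest[name] = min(conf, lowest.get(name, conf))
    codes = Counter(r["error_code"] for r in records if r["error_code"])
    return {
        "calls": total,
        "exception_rate": invalid / total if total else 0.0,
        "min_confidence": lowest,
        "error_codes": dict(codes),
    }
```

Each record serializes cleanly with `json.dumps(to_log_record(payload))`, so appending one line per call to a JSONL file is enough to get started. The per-field minimum shows which fields sit closest to any threshold you are considering.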
Summary
You don’t need a giant integration project to get value from Bem. Define a schema, attach it to a workflow, and hit POST /v2/calls with your PDF and workflowName. In return, you get deterministic, schema-validated JSON with traceable confidence and exception handling—exactly what you need to plug into a TMS, ERP, or core product without a pile of brittle parsing code.
From there, you can iterate into enrichment, routing, and review queues, but the core pattern stays the same: one workflow, one call, structured data out.