Bem API quickstart: what’s the simplest REST example to send a PDF and get schema-validated JSON back?

Most teams don’t need a full platform tour to get started with Bem. You just want the simplest REST call that takes a PDF in, runs it through a workflow, and returns schema-validated JSON out. This quickstart gives you exactly that, plus the minimum context you need to make it production-safe instead of “demo-only.”

Quick Answer: The simplest Bem API call is a single POST /v2/calls with multipart/form-data: send your PDF as file and the name of your workflow as workflowName. Bem returns a JSON payload that includes your schema-enforced output, per-field confidence, and a schema_valid flag so you can trust—or route—each result deterministically.

Why This Matters

For most teams, the hard part isn’t “calling an LLM.” It’s getting reliable, schema-correct JSON from messy PDFs without building a pile of brittle parsing code. You want one REST call that works across vendors, layouts, and edge cases—and that either gives you data your system can ingest or a clear exception your operators can handle.

Bem is designed as that production layer. Instead of per-page OCR or a fragile prompt, you define your schema once, wire it into a workflow, and then hit a single endpoint with any document. You get strict JSON validation, confidence scores, and exception routing out of the box.

Key Benefits:

  • Deterministic outputs: Schema enforcement means you either get schema-valid JSON or an explicit exception—no silent failures, no half-populated objects.
  • One call, any input: PDFs, images, emails, and mixed packets all use the same POST /v2/calls pattern; your app doesn’t care about the source.
  • Production tooling baked in: Versioned functions/workflows, idempotent execution, and audit-ready traces make it safe to move from demo to production quickly.

Core Concepts & Key Points

Concept | Definition | Why it's important
Function | A versioned, atomic unit in Bem that does one job (e.g., “extract invoice JSON given this schema”). | Lets you treat extraction like code: version, test, roll back, and reuse across workflows.
Workflow | A composed pipeline of functions (Route, Split, Transform, Enrich, etc.) that runs when you call POST /v2/calls. | Keeps the entire unstructured → structured pipeline deterministic and observable, not just the extraction step.
Schema-validated output | JSON that is guaranteed to conform to your JSON Schema, or else the call is flagged as an exception with per-field confidence and hallucination signals. | Your downstream systems (ERP, TMS, billing) can trust the shape of the data, with no ad-hoc null checks or regex band-aids.
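To make the "conforms to your JSON Schema" guarantee concrete, here is a minimal sketch of what a required-field check means for nested objects and arrays. It is not Bem's validator and covers only the `required`, `properties`, and `items` keywords; the invoice schema and the `missing_required` helper are hypothetical, chosen purely for illustration. Something like it could serve in a contract test for your own integration.

```python
def missing_required(schema: dict, value, path: tuple = ()) -> list:
    """Return dotted paths of required properties that are absent from `value`.

    Illustrative only: handles "required", "properties", and "items",
    and ignores every other JSON Schema keyword.
    """
    missing = []
    if schema.get("type") == "object" and isinstance(value, dict):
        for name in schema.get("required", []):
            if name not in value:
                missing.append(".".join(path + (name,)))
        for name, sub_schema in schema.get("properties", {}).items():
            if name in value:
                missing += missing_required(sub_schema, value[name], path + (name,))
    elif schema.get("type") == "array" and isinstance(value, list):
        for index, item in enumerate(value):
            missing += missing_required(schema.get("items", {}), item, path + (str(index),))
    return missing


# Hypothetical invoice schema, used only for this example.
INVOICE_SCHEMA = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "line_items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "sku": {"type": "string"},
                    "amount": {"type": "number"},
                },
                "required": ["amount"],
            },
        },
    },
    "required": ["invoice_number", "line_items"],
}

incomplete = {"line_items": [{"sku": "A-1"}]}
problems = missing_required(INVOICE_SCHEMA, incomplete)
```

An object like `incomplete` is the kind of half-populated result that schema enforcement is meant to turn into an explicit exception rather than pass downstream.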

How It Works (Step-by-Step)

At the simplest level, there are three pieces:

  1. Define the schema you want (e.g., for an invoice or a bill of lading, BOL).
  2. Attach that schema to a Bem function and workflow.
  3. POST a PDF to /v2/calls with your workflowName.

Below is the minimal sequence to go from “no setup” to “PDF → schema-enforced JSON” using REST.


1. Define Your Schema (Once)

You can define your schema in the Bem UI or via API. Here’s a BOL-style example via POST /v2/functions:

curl https://api.bem.ai/v2/functions \
  --request POST \
  --header "X-Api-Key: YOUR_API_KEY" \
  --header "Content-Type: application/json" \
  --data '{
    "name": "extract-bol-v1",
    "type": "extract",
    "schema": {
      "type": "object",
      "properties": {
        "shipper_name": { "type": "string" },
        "vessel_name":  { "type": "string" },
        "containers": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "container_id": { "type": "string" },
              "seal_number":  { "type": "string" }
            },
            "required": ["container_id"]
          }
        }
      },
      "required": ["shipper_name", "containers"]
    }
  }'

Response (trimmed):

{
  "name": "extract-bol-v1",
  "version": "1",
  "id": "fn_123...",
  "schema": { "...": "..." }
}

You’ll reference this function when building your workflow. Usually you’ll do that once in the UI, then just call the workflow by name from your app.
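If you would rather keep the schema in your codebase than in a shell script, the same request can be assembled in Python. The sketch below uses only the standard library and builds the request without sending it; the endpoint, header, and body fields are copied from the cURL example above, while `BOL_SCHEMA` and `build_function_request` are illustrative names, not part of any Bem SDK.

```python
import json
import urllib.request

# Same schema as in the cURL example, kept as a dict so it can be
# versioned and reused in tests.
BOL_SCHEMA = {
    "type": "object",
    "properties": {
        "shipper_name": {"type": "string"},
        "vessel_name": {"type": "string"},
        "containers": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "container_id": {"type": "string"},
                    "seal_number": {"type": "string"},
                },
                "required": ["container_id"],
            },
        },
    },
    "required": ["shipper_name", "containers"],
}


def build_function_request(api_key: str, name: str, schema: dict) -> urllib.request.Request:
    """Build (but do not send) the POST /v2/functions request."""
    body = json.dumps({"name": name, "type": "extract", "schema": schema}).encode("utf-8")
    return urllib.request.Request(
        "https://api.bem.ai/v2/functions",
        data=body,
        method="POST",
        headers={"X-Api-Key": api_key, "Content-Type": "application/json"},
    )


req = build_function_request("YOUR_API_KEY", "extract-bol-v1", BOL_SCHEMA)
# To actually send it: urllib.request.urlopen(req, timeout=30)
```

Keeping the schema as data also lets you diff it in code review when you cut a new function version.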


2. Create a Workflow That Uses the Function

In the Bem UI, you’d wire up a simple workflow:

  • Step 1 – Route: Accept the uploaded file.
  • Step 2 – Transform (Extract): Call extract-bol-v1 with your schema.
  • Step 3 – (Optional) Enrich: Match vendor/customer IDs from a Bem Collection.
  • Step 4 – Validate: Enforce schema, set confidence thresholds, and configure exception routing.

Give this workflow a name, e.g., shipping-packet. That’s the only identifier your app needs.

(You can also create workflows via API, but for a quickstart, the UI is fastest.)


3. Call the Workflow: PDF → Schema-Validated JSON

This is the core of your quickstart: a single REST call.

cURL example

curl https://api.bem.ai/v2/calls \
  --request POST \
  --header "X-Api-Key: YOUR_API_KEY" \
  --form "file=@/path/to/document.pdf" \
  --form "workflowName=shipping-packet"

Typical success response (shape simplified):

{
  "id": "call_9y3...",
  "workflowName": "shipping-packet",
  "status": "completed",
  "schema_valid": true,
  "confidence": 0.99,
  "result": {
    "shipper_name": "Acme Logistics Ltd",
    "vessel_name": "Ever Strong",
    "containers": [
      {
        "container_id": "TGHU1234567",
        "seal_number": "SEAL-9981"
      }
    ]
  },
  "meta": {
    "fields": {
      "shipper_name": { "confidence": 0.998, "hallucination": false },
      "vessel_name":  { "confidence": 0.982, "hallucination": false }
    }
  }
}

The key flags:

  • schema_valid: true → the output conforms to your JSON Schema and is safe to ingest.
  • confidence (overall) and meta.fields.<field>.confidence (per field) → let you apply your own routing thresholds.

If Bem can’t produce schema-valid output (e.g., the PDF is unreadable or a required field is missing), it will flag that instead of guessing:

{
  "id": "call_9y4...",
  "workflowName": "shipping-packet",
  "status": "exception",
  "schema_valid": false,
  "error": {
    "code": "SCHEMA_VALIDATION_FAILED",
    "message": "Missing required property: shipper_name"
  }
}

You can route this to a human review Surface or your own back-office queue.
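The two response shapes above suggest a simple three-way gate in your application. The following is a minimal sketch that relies only on the fields shown in the sample payloads (`status`, `schema_valid`, `confidence`, `meta.fields`); the `route` function and the 0.95 and 0.90 thresholds are assumptions to tune against your own traffic, not values recommended by Bem.

```python
def route(payload: dict, min_overall: float = 0.95, min_field: float = 0.90) -> str:
    """Return "ingest", "review", or "exception" for one /v2/calls response."""
    # Hard gate: anything that is not schema-valid never reaches ingestion.
    if payload.get("status") == "exception" or not payload.get("schema_valid"):
        return "exception"
    # Soft gates: schema-valid but low-confidence results go to a human.
    if payload.get("confidence", 0.0) < min_overall:
        return "review"
    fields = payload.get("meta", {}).get("fields", {})
    for info in fields.values():
        if info.get("hallucination") or info.get("confidence", 0.0) < min_field:
            return "review"
    return "ingest"


success = {
    "status": "completed",
    "schema_valid": True,
    "confidence": 0.99,
    "meta": {
        "fields": {
            "shipper_name": {"confidence": 0.998, "hallucination": False},
            "vessel_name": {"confidence": 0.982, "hallucination": False},
        }
    },
}
failure = {
    "status": "exception",
    "schema_valid": False,
    "error": {"code": "SCHEMA_VALIDATION_FAILED"},
}

decisions = [route(success), route(failure)]
```

Treating a missing `confidence` as 0.0 is a deliberately conservative choice: an unexpected payload shape lands in review instead of being ingested.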


4. Same Call in Python and Node.js

If you’re building an app, you’ll likely wrap that cURL call in a small client.

Python (requests)

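import requests

API_KEY = "YOUR_API_KEY"
WORKFLOW_NAME = "shipping-packet"
PDF_PATH = "document.pdf"

url = "https://api.bem.ai/v2/calls"
headers = {"X-Api-Key": API_KEY}

# Open in binary mode; requests builds the multipart/form-data body
# and sets the Content-Type header (including the boundary) itself.
with open(PDF_PATH, "rb") as f:
    files = {"file": f}
    data = {"workflowName": WORKFLOW_NAME}

    # requests has no default timeout, so set one explicitly.
    # 60 seconds is an arbitrary starting point; tune it for your documents.
    resp = requests.post(url, headers=headers, files=files, data=data, timeout=60)

# Raise on 4xx/5xx before trying to parse the body.
resp.raise_for_status()
payload = resp.json()

if payload.get("schema_valid"):
    result = payload["result"]
    # Use result directly in your system
else:
    # Handle exception path
    print("Exception:", payload.get("error"))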
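If you cannot add a third-party dependency, the same multipart request can be assembled with the standard library. This is a sketch under stated assumptions: the form field names `file` and `workflowName` come from the cURL example, while `encode_multipart` and `build_call_request` are hypothetical helpers. The request is built but not sent, so you can inspect it offline.

```python
import urllib.request
import uuid


def encode_multipart(fields: dict, file_field: str, filename: str,
                     file_bytes: bytes, content_type: str = "application/pdf"):
    """Encode text fields plus one file as multipart/form-data.

    Returns (body, content_type_header). Field names and the filename are
    assumed to contain no double quotes or line breaks.
    """
    boundary = uuid.uuid4().hex
    delimiter = b"--" + boundary.encode("ascii")
    lines = []
    for name, value in fields.items():
        lines += [
            delimiter,
            f'Content-Disposition: form-data; name="{name}"'.encode("utf-8"),
            b"",
            str(value).encode("utf-8"),
        ]
    lines += [
        delimiter,
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"'.encode("utf-8"),
        f"Content-Type: {content_type}".encode("ascii"),
        b"",
        file_bytes,
        delimiter + b"--",  # closing delimiter
        b"",
    ]
    return b"\r\n".join(lines), f"multipart/form-data; boundary={boundary}"


def build_call_request(api_key: str, workflow_name: str,
                       filename: str, pdf_bytes: bytes) -> urllib.request.Request:
    """Build (but do not send) the POST /v2/calls request."""
    body, content_type = encode_multipart(
        {"workflowName": workflow_name}, "file", filename, pdf_bytes
    )
    return urllib.request.Request(
        "https://api.bem.ai/v2/calls",
        data=body,
        method="POST",
        headers={"X-Api-Key": api_key, "Content-Type": content_type},
    )


req = build_call_request(
    "YOUR_API_KEY", "shipping-packet", "document.pdf", b"%PDF-1.4 placeholder bytes"
)
# To actually send it: urllib.request.urlopen(req, timeout=60)
```

For large files, a streaming client such as requests is usually the better choice, since this version holds the whole PDF in memory.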

Node.js (node-fetch + form-data)

import fs from "node:fs";
import fetch from "node-fetch";
import FormData from "form-data";

const API_KEY = "YOUR_API_KEY";
const WORKFLOW_NAME = "shipping-packet";
const PDF_PATH = "document.pdf";

async function run() {
  const form = new FormData();
  form.append("file", fs.createReadStream(PDF_PATH));
  form.append("workflowName", WORKFLOW_NAME);

  const res = await fetch("https://api.bem.ai/v2/calls", {
    method: "POST",
    headers: {
      "X-Api-Key": API_KEY,
      ...form.getHeaders()
    },
    body: form
  });

  if (!res.ok) {
    throw new Error(`HTTP ${res.status}: ${await res.text()}`);
  }

  const payload = await res.json();

  if (payload.schema_valid) {
    const result = payload.result;
    // Push to ERP, TMS, etc.
  } else {
    console.error("Exception:", payload.error);
    // Route to manual review or retry logic
  }
}

run().catch(console.error);

Common Mistakes to Avoid

  • Treating the response as “just LLM output”:
    Don’t ignore schema_valid, confidence scores, and exception status. Use them as hard gates—only ingest when schema_valid: true, and route everything else to review or a retry workflow.

  • Hard-coding layouts or templates on top of Bem:
    The point is to stop chasing per-vendor templates. Let the workflow + schema handle layout variability; if you start bolting regexes and coordinates around it, you’re reintroducing the fragility Bem is designed to remove.

Real-World Example

A fleet management team we work with had a common pattern: they could get a “nice” demo extraction from invoices, but it broke the moment they hit real traffic—mixed batches, half-scanned images, different tax layouts. Their engineers were stuck mapping every new vendor PDF to a custom parser.

They switched to a schema-first Bem workflow: defined one invoice schema, wired an extract function and enrichment against their vendor master Collection, and then moved their integration to a single POST /v2/calls with the invoice PDF. Now, every invoice—regardless of layout—returns schema-valid JSON totals and line items, plus match confidence against their vendor table. When Bem can’t confidently match, the call lands in a review Surface instead of silently corrupting their GL.

Pro Tip: In your first week, log schema_valid, per-field confidence, and exception codes for every call. Use that log as your “eval harness” to set thresholds, refine your schema, and decide where you need Surfaces for human review.
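One way to build that log is to flatten each response into a single record and write it as a line of JSON. The sketch below assumes the response fields shown in the examples earlier in this guide; `to_log_record` and `pass_rate` are hypothetical helpers, and the thresholds passed to `pass_rate` are candidates to evaluate, not recommendations.

```python
import json


def to_log_record(payload: dict) -> dict:
    """Flatten one /v2/calls response into a record suitable for JSON Lines."""
    fields = payload.get("meta", {}).get("fields", {})
    confidences = {name: info.get("confidence") for name, info in fields.items()}
    known = [c for c in confidences.values() if c is not None]
    return {
        "call_id": payload.get("id"),
        "workflow": payload.get("workflowName"),
        "status": payload.get("status"),
        "schema_valid": bool(payload.get("schema_valid")),
        "error_code": (payload.get("error") or {}).get("code"),
        "min_field_confidence": min(known) if known else None,
        "field_confidence": confidences,
    }


def pass_rate(records: list, threshold: float) -> float:
    """Share of calls that are schema-valid with every field at or above threshold."""
    if not records:
        return 0.0
    passed = sum(
        1
        for r in records
        if r["schema_valid"]
        and r["min_field_confidence"] is not None
        and r["min_field_confidence"] >= threshold
    )
    return passed / len(records)


records = [
    to_log_record({
        "id": "call_a", "workflowName": "shipping-packet",
        "status": "completed", "schema_valid": True,
        "meta": {"fields": {"shipper_name": {"confidence": 0.998},
                            "vessel_name": {"confidence": 0.982}}},
    }),
    to_log_record({
        "id": "call_b", "workflowName": "shipping-packet",
        "status": "exception", "schema_valid": False,
        "error": {"code": "SCHEMA_VALIDATION_FAILED"},
    }),
]
log_lines = [json.dumps(r) for r in records]  # append these to a .jsonl file
```

Replaying `pass_rate` over a week of records at several thresholds shows how much traffic each setting would send to review before you commit to one.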

Summary

You don’t need a giant integration project to get value from Bem. Define a schema, attach it to a workflow, and hit POST /v2/calls with your PDF and workflowName. In return, you get deterministic, schema-validated JSON with traceable confidence and exception handling—exactly what you need to plug into a TMS, ERP, or core product without a pile of brittle parsing code.

From there, you can iterate into enrichment, routing, and review queues, but the core pattern stays the same: one workflow, one call, structured data out.
