Bem API quickstart: what’s the simplest REST example to send a PDF and get schema-validated JSON back?

Quick Answer: The simplest Bem REST call is a single POST /v2/calls with your PDF as file and a workflowName that already knows your schema. You send the document, Bem runs your workflow, and you get back strict, schema-validated JSON—or an explicit exception if it can’t safely comply.

Why This Matters

Most “document AI” demos stop at OCR and a pretty JSON sample. Production breaks when you hit real packets: mixed layouts, missing fields, and downstream systems that reject bad data. A single, deterministic Bem API call—PDF in, schema-valid JSON out—means you can wire AI extraction directly into your product, ERP, or TMS without months of glue code and manual QA.

Key Benefits:

  • Deterministic outputs: You either get schema-valid JSON or a flagged exception, not silent hallucinations.
  • One API for any input: The same REST pattern works for PDFs, images, emails, audio, and video—not just “documents.”
  • Production-grade controls: Per-field confidence, hallucination detection, versioning, and idempotent re-runs so you can ship once and keep it running.

Core Concepts & Key Points

Concept | Definition | Why it's important
Workflow | A named, versioned pipeline in Bem that knows how to Route, Split, Transform, Enrich, and Validate a document into your schema. | Lets you call /v2/calls with just workflowName and trust the output shape.
Schema-validated JSON | Strictly typed JSON (backed by JSON Schema) that either passes validation or returns an exception with what failed. | Your downstream systems get clean, predictable data instead of “close enough” AI guesses.
Per-call outcome | Bem charges and reasons at the function/workflow call level, not per page or token. | You optimize for workflows and business outcomes, not per-page OCR billing.
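
To make the "schema-validated JSON" concept concrete, here is a minimal Python sketch of what a schema for the shipping-packet example might look like, together with a simplified required-field check. The schema contents and the find_missing_required helper are illustrative assumptions, not Bem's actual schema format or validator; the field names are borrowed from the sample response shown later in this guide.

```python
# Hypothetical JSON Schema for a shipping-packet payload. Assumption: the
# workflow's schema is ordinary JSON Schema; field names mirror the sample response.
SHIPPING_PACKET_SCHEMA = {
    "type": "object",
    "required": ["shipper_name", "vessel_name", "containers"],
    "properties": {
        "shipper_name": {"type": "string"},
        "vessel_name": {"type": "string"},
        "containers": {
            "type": "array",
            "items": {
                "type": "object",
                "required": ["container_id", "seal_number"],
                "properties": {
                    "container_id": {"type": "string"},
                    "seal_number": {"type": "string"},
                },
            },
        },
    },
}


def find_missing_required(schema, payload, path=""):
    """Return dotted paths of required fields that are absent from payload.

    Simplified illustration: it checks only `required` on objects and recurses
    into `properties` and array `items`. It is not a full JSON Schema validator.
    """
    missing = []
    if schema.get("type") == "object" and isinstance(payload, dict):
        for name in schema.get("required", []):
            if name not in payload:
                missing.append(f"{path}{name}")
        for name, sub_schema in schema.get("properties", {}).items():
            if name in payload:
                missing += find_missing_required(sub_schema, payload[name], f"{path}{name}.")
    elif schema.get("type") == "array" and isinstance(payload, list):
        for index, item in enumerate(payload):
            item_path = f"{path[:-1]}[{index}]."
            missing += find_missing_required(schema.get("items", {}), item, item_path)
    return missing
```

A payload that lacks shipper_name, or a container without a seal_number, produces a non-empty list here. That is the kind of condition the article describes as surfacing in an exception rather than passing through silently.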

How It Works (Step-by-Step)

At a high level, you’ll do three things:

  1. Define your schema & workflow in Bem (one-time setup).
  2. Send a PDF to /v2/calls with that workflow’s name.
  3. Handle the JSON response: either schema_valid: true or an explicit exception.

The simplest possible quickstart below assumes you already have:

  • A Bem API key (sent in the X-Api-Key header)
  • A workflow created in Bem’s UI or API, e.g. invoice-processing or shipping-packet, that is wired to your schema and validation rules
  • A local PDF file, e.g. document.pdf

1. Minimal cURL example (PDF → schema-valid JSON)

curl https://api.bem.ai/v2/calls \
  --request POST \
  --header "X-Api-Key: YOUR_API_KEY" \
  --form "file=@document.pdf" \
  --form "workflowName=shipping-packet"

A successful call looks like:

{
  "id": "call_01HTM7G2PQ...",
  "workflowName": "shipping-packet",
  "schema_valid": true,
  "confidence": 0.99,
  "data": {
    "shipper_name": "ACME Logistics LLC",
    "vessel_name": "Ever Superior",
    "containers": [
      {
        "container_id": "ACMU1234567",
        "seal_number": "SEAL-90871"
      }
    ]
  },
  "meta": {
    "hallucination_score": 0.01,
    "duration_ms": 1342
  }
}

Key fields:

  • schema_valid: true means the JSON matches your schema; false means the response carries an exception object describing what failed.
  • confidence: aggregate confidence score for the extraction.
  • data: the schema-enforced payload your systems consume.
  • meta: additional evaluation and traceability signals.

If the workflow can’t safely produce schema-valid output, you’ll see something like:

{
  "id": "call_01HTM7G2PQ...",
  "workflowName": "shipping-packet",
  "schema_valid": false,
  "data": null,
  "exception": {
    "type": "SCHEMA_VALIDATION_ERROR",
    "details": {
      "missing_required": ["shipper_name"],
      "invalid_fields": []
    }
  }
}

No guessing. No half-valid payloads sneaking into production.
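
The two response shapes above can be collapsed into a single branch in calling code. The following is a minimal sketch, assuming the response fields look exactly like the samples in this guide (schema_valid, data, exception.type, exception.details); interpret_call is a hypothetical helper name, and the element shape of invalid_fields is not shown in the sample, so it is simply formatted as text.

```python
def interpret_call(result):
    """Classify a response dict as ("ok", data) or ("exception", reasons).

    Assumes the response shape from the sample payloads in this guide.
    """
    if result.get("schema_valid") is True:
        return "ok", result["data"]

    exception = result.get("exception") or {}
    details = exception.get("details") or {}

    # Turn the structured details into short, human-readable reasons.
    reasons = [f"missing: {name}" for name in details.get("missing_required", [])]
    reasons += [f"invalid: {field}" for field in details.get("invalid_fields", [])]

    # Fall back to the exception type when no field-level details are present.
    if not reasons:
        reasons = [exception.get("type", "UNKNOWN")]
    return "exception", reasons


failed = {
    "schema_valid": False,
    "data": None,
    "exception": {
        "type": "SCHEMA_VALIDATION_ERROR",
        "details": {"missing_required": ["shipper_name"], "invalid_fields": []},
    },
}
status, reasons = interpret_call(failed)
```

Returning a status plus a list of reasons keeps the caller's logic small: anything other than "ok" goes to exception handling along with the reasons needed to triage it.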

2. Quick Python example

import requests

API_KEY = "YOUR_API_KEY"
WORKFLOW_NAME = "shipping-packet"
FILE_PATH = "document.pdf"

url = "https://api.bem.ai/v2/calls"

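with open(FILE_PATH, "rb") as f:
    # Multipart form: the PDF goes in "file", the workflow name in a plain form field
    files = {"file": (FILE_PATH, f, "application/pdf")}
    data = {"workflowName": WORKFLOW_NAME}
    headers = {"X-Api-Key": API_KEY}

    # A timeout keeps a stalled upload from hanging the caller indefinitely
    resp = requests.post(url, headers=headers, files=files, data=data, timeout=60)
    resp.raise_for_status()  # raise on HTTP 4xx/5xx before parsing the body
    result = resp.json()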

if not result.get("schema_valid"):
    # Route to exception handling / human review
    print("Schema validation failed:", result.get("exception"))
else:
    payload = result["data"]
    print("Shipper:", payload["shipper_name"])
    print("Containers:", len(payload["containers"]))

You can drop this directly behind an upload endpoint in your app and forward the JSON into your ERP, TMS, or database.

3. Quick Node.js example

import fetch from "node-fetch";
import FormData from "form-data";
import fs from "fs";

const API_KEY = "YOUR_API_KEY";
const WORKFLOW_NAME = "shipping-packet";
const FILE_PATH = "document.pdf";

async function run() {
  const form = new FormData();
  form.append("file", fs.createReadStream(FILE_PATH));
  form.append("workflowName", WORKFLOW_NAME);

  const resp = await fetch("https://api.bem.ai/v2/calls", {
    method: "POST",
    headers: {
      "X-Api-Key": API_KEY
    },
    body: form
  });

  if (!resp.ok) {
    console.error("HTTP error:", resp.status, await resp.text());
    return;
  }

  const result = await resp.json();

  if (!result.schema_valid) {
    console.log("Schema validation failed:", result.exception);
    return;
  }

  const data = result.data;
  console.log("Schema-valid JSON:", JSON.stringify(data, null, 2));
}

run().catch(console.error);

Common Mistakes to Avoid

  • Calling without a real workflow schema:
    If you fire /v2/calls at a “toy” demo workflow, you’ll get a nice JSON example but no guarantees. Define a schema-based workflow first (in Bem’s UI or via the /v2/functions and /v2/workflows endpoints), then use that workflowName.

  • Ignoring schema_valid and confidence signals:
    Treat schema_valid and per-field confidence as part of your contract. Don’t just assume every response is production-ready—route low-confidence or invalid cases to a review queue or retry strategy.
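
One way to treat these signals as a contract is a small routing function that decides what happens to each response. This is a sketch under stated assumptions: it uses the aggregate confidence field from the sample response, the 0.95 threshold is an arbitrary placeholder to tune against your own data, and the route names are hypothetical.

```python
def route(result, min_confidence=0.95):
    """Decide where a response should go based on validity and confidence.

    min_confidence is an illustrative default, not a recommended value.
    """
    if not result.get("schema_valid"):
        return "review_invalid"

    confidence = result.get("confidence")
    # A missing confidence score is treated as low confidence, not as a pass.
    if confidence is None or confidence < min_confidence:
        return "review_low_confidence"

    return "accept"
```

Treating a missing score as low confidence is a deliberately conservative choice: a response only reaches "accept" when both signals are present and pass.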

Real-World Example

A logistics team plugs Bem directly between their inbox and their TMS. Every inbound shipping packet—PDF bills of lading, scanned packing lists, mixed-layout carrier docs—gets forwarded to shipping-packet@workflow.bem.ai (an email entry point tied to the same workflow you just called via REST).

Behind the scenes, it’s the same pattern:

  • Email → Bem workflow (shipping-packet)
  • Workflow parses attachments, runs extraction + validation against a BOL schema
  • Bem returns schema-valid JSON (or explicit exceptions) via webhook
  • Their TMS ingests only payloads where schema_valid: true; exceptions go to an operator UI that Bem auto-generates from the schema

They don’t manage parsers per vendor. They don’t regex through OCR outputs. They operate on a simple invariant: “If Bem says schema_valid: true, we can safely book the shipment.”

Pro Tip: In your first integration, log the full Bem response (including meta, per-field confidence, and exception details). Use that log as your “golden trace” to build evals and regression tests before you wire Bem into automatic posting flows.
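
A "golden trace" log of this kind can be as simple as a JSON Lines file plus a comparison helper. The sketch below uses only the Python standard library; the function names and the one-response-per-line file layout are assumptions for illustration, not part of Bem's tooling.

```python
import json


def append_trace(path, result):
    """Append one full API response to a JSON Lines file (one object per line)."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(result, sort_keys=True) + "\n")


def load_traces(path):
    """Read back every logged response, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def diff_data(golden, current):
    """Return the top-level data keys whose values differ between two responses."""
    golden_data = golden.get("data") or {}
    current_data = current.get("data") or {}
    all_keys = set(golden_data) | set(current_data)
    return sorted(k for k in all_keys if golden_data.get(k) != current_data.get(k))
```

In a regression test, you would re-run a stored document through the workflow and assert that diff_data(golden, current) is empty, or that it contains only the fields you intended to change.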

Summary

The simplest Bem API quickstart is a single REST call:

  • POST https://api.bem.ai/v2/calls
  • Send your PDF as file
  • Pass a workflowName that already encodes your schema and business rules

Bem returns one of two outcomes: schema-valid JSON with confidence signals, or an explicit exception. No invisible heuristics. No “it looked right in the demo” surprises. From there, you can add webhooks, Collections-based enrichment, idempotent retries, and regression-tested updates—but the core pattern stays the same: unstructured in, schema-enforced JSON out.
