
Bem vs Extracta: how do they compare on long-document pricing (per-page vs per-call) at 100k and 1M+ volume?
Quick Answer: Extracta prices long documents per page; Bem prices per function call, regardless of length, layout, or input type. At 100k and 1M+ documents per month, that difference usually makes Bem far cheaper for long packets and much easier to forecast, because a 2‑page invoice and a 200‑page claim packet cost the same to process.
Why This Matters
If you’re running real volume—100k or 1M+ long documents per month—the pricing model is not a footnote. It is the architecture. Per-page billing (like Extracta and most OCR/IDP tools) ties your unit economics to layout complexity and page count. Every extra appendix, rider, or email chain adds cost and forces you to obsess over page thresholds. Bem’s per-call pricing decouples cost from length and format: one function call, one price, any input. That lets you design the pipeline you want (process the whole packet, not just the “main” pages) without getting punished for being thorough.
Key Benefits:
- Predictable unit economics at scale: With Bem, a 10-page invoice, a 50-page contract, and a 5-minute call all cost the same per function call, so 100k or 1M+ “things processed” is easy to forecast.
- Incentive to process whole packets, not cherry-picked pages: Per-call pricing plus schema-enforced JSON encourages you to run the entire packet through a single workflow, instead of hacking around page-based costs.
- Better total cost of ownership (TCO): You aren’t just saving on per-page charges; you’re avoiding the glue code, retries, and manual exception handling that per-page tools push downstream.
Core Concepts & Key Points
| Concept | Definition | Why it's important |
|---|---|---|
| Per-page pricing (Extracta-style) | You’re charged for each page processed (e.g., a 50-page contract costs ~5x a 10-page invoice). | Your cost scales with document length and layout complexity, which is exactly what you don’t control in production. |
| Per-call pricing (Bem) | You’re charged once per function call, regardless of pages, duration, or media type. A 1-page doc and a 100-page packet are the same cost. | Your cost matches the outcome (“one packet in, one structured JSON out”), making large-document and mixed-packet processing economically viable. |
| Whole-pipeline vs per-page tools | Per-page tools give you text/bounding boxes; you build the rest. Bem handles routing, extraction, enrichment, validation, and exception handling in one workflow call. | The pricing model aligns with the pipeline: with Bem, you pay once for the whole workflow, not per slice of raw text. This compresses both cost and engineering time at scale. |
How It Works (Step-by-Step)
Bem is built for “unstructured in → structured, schema-valid JSON out” in a single call. Here’s how that plays out when you’re comparing Extracta’s per-page pricing with Bem’s per-call pricing at 100k and 1M+ volume.
1. Define the workflow & schema
You start by defining the outcome you want, not the pages you have.
- Define your JSON Schema
  Example for a 50-page invoice packet with line items, terms, and remittance details:

  ```json
  {
    "type": "object",
    "properties": {
      "vendor_name": { "type": "string" },
      "invoice_number": { "type": "string" },
      "invoice_date": { "type": "string", "format": "date" },
      "total_amount": { "type": "number" },
      "currency": { "type": "string", "enum": ["USD", "EUR", "GBP"] },
      "line_items": {
        "type": "array",
        "items": {
          "type": "object",
          "properties": {
            "description": { "type": "string" },
            "quantity": { "type": "number" },
            "unit_price": { "type": "number" },
            "line_total": { "type": "number" }
          },
          "required": ["description", "line_total"]
        }
      }
    },
    "required": ["vendor_name", "invoice_number", "total_amount"]
  }
  ```

- Compose a Bem workflow using primitives:
  - Route – detect document type (invoice, contract, claim packet, email thread) and choose a function/workflow.
  - Split – if needed, split packet into logical parts (main form, attachments, emails), not pages.
  - Transform/Enrich – normalize dates, amounts, and vendor names using your own Collections (e.g., vendor master, GL codes).
  - Validate – ensure output is schema-valid; if something is off, flag instead of guessing.
  - Surfaces – send low-confidence fields to human review; corrections feed back into evals and training.
One workflow, one function call, one price. Whether the input is 8 pages or 80.
2. Call the API: one call vs many
With Extracta-style per-page tools:
- A 50-page contract = 50 pages * (transcription + extraction) * N retries.
- Mixed packets (scanned docs + emails + images) often become multiple calls per document because you have to normalize each piece.
With Bem:
```shell
curl -X POST https://api.bem.ai/v1/workflows/claims-packet-run \
  -H "Authorization: Bearer $BEM_API_KEY" \
  -F "file=@/path/to/50-page-contract.pdf"
```
- That single call handles routing, extraction, enrichment, validation, and exception pathing.
- A 10-page PDF, 50-page contract, 5-minute call, or WhatsApp thread are all one function call at the same price tier.
3. Cost at 100k and 1M+ volume
Using Bem’s published example numbers and typical per-page pricing ranges for long documents, you can see how the models diverge.
Bem (per function call, any length / type):
From the provided pricing example:
- First 100 calls: free
- 1–10,000 calls: $0.09 / call
- 10,001–100,000 calls: $0.07 / call
- The example shows a 50,000-call month:
  - 100 free
  - 9,900 calls @ $0.09 = $891
  - 40,000 calls @ $0.07 = $2,800
  - Total: $3,691 → $0.0738 / call blended
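The tier arithmetic above can be checked with a few lines of Python. This is a minimal sketch, assuming the tiers are graduated (each call is billed at the rate of the tier it falls in) and that the $0.09 tier starts billing at call 101, after the free calls. `TIERS` and `graduated_cost` are illustrative names, not part of any Bem SDK.

```python
from decimal import Decimal

# Tiers from the pricing example: (last call number in the tier, price per call).
# Assumption: the first 100 calls of the month are free, so the $0.09 rate
# applies to calls 101 through 10,000.
TIERS = [
    (100, Decimal("0.00")),
    (10_000, Decimal("0.09")),
    (100_000, Decimal("0.07")),
]


def graduated_cost(calls: int) -> Decimal:
    """Total monthly cost for `calls` function calls under graduated tiers."""
    if calls < 0:
        raise ValueError("calls must be non-negative")
    if calls > TIERS[-1][0]:
        # The example does not list rates beyond 100,000 calls.
        raise ValueError("volume exceeds the tiers listed in the example")
    total = Decimal("0")
    previous_cap = 0
    for cap, price in TIERS:
        # Number of calls that land inside this tier.
        in_tier = max(0, min(calls, cap) - previous_cap)
        total += in_tier * price
        previous_cap = cap
    return total


if __name__ == "__main__":
    total = graduated_cost(50_000)
    blended = (total / 50_000).quantize(Decimal("0.0001"))
    print(f"50,000 calls: ${total} total, ${blended} per call blended")
```

Under the same assumptions, `graduated_cost(100_000)` gives the 100k-call figure used in the scenarios that follow. Volumes above 100,000 raise an error on purpose, because the example lists no rates for them.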
Now extrapolate to longer-document workloads:
At 100k documents/month
Assume:
- Workload: 100k “packets” (long invoices, contracts, claims) per month.
- Bem: per-call pricing, per our example.
- Extracta: per-page pricing, using the ranges Bem’s docs quote for “per-page tools” as a benchmark (not Extracta’s own rate card).
From Bem’s own comparison table:
- 10-page PDF: $0.40 – $0.90 for per-page tools vs $0.09 for Bem.
- 50-page contract: $2.00 – $4.50 for per-page tools vs $0.09 for Bem.
Let’s do simple ranges:
- Scenario A: average packet 10 pages
  - Extracta-style per-page:
    - Low: $0.40 * 100,000 = $40,000/month
    - High: $0.90 * 100,000 = $90,000/month
  - Bem per-call (assuming ~7–9¢ blended in this range):
    - 100,000 calls * ~$0.07–$0.09 ≈ $7,000–$9,000/month
- Scenario B: average packet 50 pages
  - Extracta-style per-page:
    - Low: $2.00 * 100,000 = $200,000/month
    - High: $4.50 * 100,000 = $450,000/month
  - Bem per-call:
    - Same 100,000 calls * ~$0.07–$0.09 ≈ $7,000–$9,000/month
Even if Extracta gives you “volume discounts,” those discounts have to close a gap of roughly 4–13x at 10 pages per packet, or roughly 22–64x at 50 pages, just to get in the same ballpark. And you still pay more if documents get longer.
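To rerun these scenarios with your own volumes, a small sketch like the following may help. It assumes the per-page range implied by the comparison table ($0.40 to $0.90 per 10 pages, so $0.04 to $0.09 per page) and the ~$0.07 to $0.09 blended per-call range used above. The function and constant names are illustrative, and none of these figures are Extracta's actual rates.

```python
# Per-page range implied by the comparison table ($0.40-$0.90 per 10 pages).
PER_PAGE_LOW, PER_PAGE_HIGH = 0.04, 0.09
# Blended per-call range assumed in the scenarios (an assumption, not a quote).
PER_CALL_LOW, PER_CALL_HIGH = 0.07, 0.09


def monthly_costs(packets: int, pages_per_packet: int) -> dict:
    """Monthly cost ranges for both pricing models, plus the gap between them."""
    pages = packets * pages_per_packet
    per_page = (pages * PER_PAGE_LOW, pages * PER_PAGE_HIGH)
    per_call = (packets * PER_CALL_LOW, packets * PER_CALL_HIGH)
    return {
        "per_page": per_page,
        "per_call": per_call,
        # Narrowest gap: cheapest per-page bill vs. priciest per-call bill.
        # Widest gap: priciest per-page bill vs. cheapest per-call bill.
        "ratio": (per_page[0] / per_call[1], per_page[1] / per_call[0]),
    }


if __name__ == "__main__":
    for pages in (10, 50):
        r = monthly_costs(100_000, pages)
        print(
            f"{pages} pages/packet: per-page ${r['per_page'][0]:,.0f}-${r['per_page'][1]:,.0f}, "
            f"per-call ${r['per_call'][0]:,.0f}-${r['per_call'][1]:,.0f}, "
            f"gap {r['ratio'][0]:.1f}x-{r['ratio'][1]:.1f}x"
        )
```

The ratio is reported as a range because both price lists are ranges. Any volume discount on the per-page side has to close at least the lower end of that gap.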
At 1M documents/month
Now scale the same logic.
- Scenario A: average packet 10 pages
  - Extracta-style per-page:
    - Low: $0.40 * 1,000,000 = $400,000/month
    - High: $0.90 * 1,000,000 = $900,000/month
  - Bem per-call:
    - 1,000,000 calls * (assume graduated pricing + volume commitments)
    - Even if the blended rate dropped only modestly (say to ~$0.05–$0.07 / call), that’s $50,000–$70,000/month
- Scenario B: average packet 50 pages
  - Extracta-style per-page:
    - Low: $2.00 * 1,000,000 = $2M/month
    - High: $4.50 * 1,000,000 = $4.5M/month
  - Bem per-call:
    - Still the same 1,000,000 calls * ~$0.05–$0.07 = $50,000–$70,000/month
The exact Extracta numbers will depend on their current per-page schedule and discounts. The pattern will not: longer documents get linearly (or worse) more expensive, while Bem is flat per document/packet.
And notice: we haven’t even priced audio, video, WhatsApp, or email threads—things per-page tools often don’t support at all, or force through separate, additional pipelines.
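One way to make the "flat vs. linear" pattern concrete is the break-even page count: the packet length at which a per-page bill equals one flat per-call charge. The sketch below assumes a strictly linear per-page price with no minimums or discounts, and uses the $0.09 top-tier call price and the $0.04 to $0.09 per-page range from the comparison table. `break_even_pages` is an illustrative helper, not a vendor formula.

```python
def break_even_pages(price_per_call: float, price_per_page: float) -> float:
    """Packet length (in pages) at which per-page billing equals one per-call charge."""
    if price_per_page <= 0:
        raise ValueError("price_per_page must be positive")
    return price_per_call / price_per_page


if __name__ == "__main__":
    call_price = 0.09  # most expensive per-call tier in the example
    for page_rate in (0.04, 0.09):
        pages = break_even_pages(call_price, page_rate)
        print(f"${page_rate:.2f}/page: per-call is cheaper beyond {pages:.2f} pages")
```

Any packet longer than the break-even count costs more under per-page billing, and the gap widens with every additional page.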
Common Mistakes to Avoid
- Optimizing your process around per-page pricing instead of around your business data.
  Teams using per-page tools often truncate packets (“just the first 3 pages”) to save money, then rebuild the rest manually. This is backwards. With Bem’s per-call model, design your schema to reflect the whole packet and let the workflow handle it.
- Comparing headline “per-page” rates to Bem’s per-call price without modeling real packets.
  A $0.02/page rate looks cheap until you’re processing 50-page contracts, retries, and exception reruns. Always model cost on real traffic: longest packets, worst-case page counts, and expected re-runs. Then compare that to Bem’s flat per-call pricing on the same packet count.
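Modeling on real traffic can be as simple as the sketch below. Everything in it is a placeholder to replace with your own data: the packet mix, the 10% rerun rate, and the assumption that a rerun is billed in full under both models (all pages again for per-page, one more call for per-call). The $0.02/page rate is the hypothetical headline rate from the paragraph above, and $0.09/call is the most expensive tier in the pricing example.

```python
from dataclasses import dataclass


@dataclass
class PacketClass:
    name: str
    packets_per_month: int
    avg_pages: int


def per_page_cost(mix, page_rate: float, rerun_rate: float) -> float:
    """Per-page bill: every page is billed, and reruns bill those pages again."""
    pages = sum(p.packets_per_month * p.avg_pages for p in mix)
    return pages * page_rate * (1 + rerun_rate)


def per_call_cost(mix, call_rate: float, rerun_rate: float) -> float:
    """Per-call bill: one charge per packet; assumes each rerun is one more call."""
    packets = sum(p.packets_per_month for p in mix)
    return packets * call_rate * (1 + rerun_rate)


# Hypothetical traffic mix; replace with counts from your own logs.
MIX = [
    PacketClass("invoice", 70_000, 12),
    PacketClass("carrier packet", 28_000, 45),
    PacketClass("dispute file", 2_000, 120),
]

if __name__ == "__main__":
    rerun_rate = 0.10  # assumed share of packets processed a second time
    print(f"per-page @ $0.02: ${per_page_cost(MIX, 0.02, rerun_rate):,.0f}/month")
    print(f"per-call @ $0.09: ${per_call_cost(MIX, 0.09, rerun_rate):,.0f}/month")
```

The point of the exercise is the page total: a small share of very long packets can dominate the per-page bill while barely moving the packet count.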
Real-World Example
Imagine an AP team at a logistics company:
- They receive a mix of:
  - 8–15 page invoices
  - 30–60 page carrier packets (BOLs, PODs, rate confirmations, emails)
  - Occasional 100+ page dispute files
On Extracta-style per-page pricing, this becomes:
- Hundreds of thousands of pages per month.
- Invoices and packets are split into multiple “jobs” to keep costs down.
- Long disputes are often handled manually because “it’s not worth the pages.”
On Bem, they flip the model:
- One packet = one function call, whether it’s 8 or 80 pages.
- The workflow does:
  - Route packet type (invoice, carrier packet, dispute).
  - Split logical sections (invoice, BOL, email chain) without caring about page boundaries.
  - Transform & Enrich with their vendor master and rate tables.
  - Validate against a strict JSON Schema (e.g., totals must match sum of line items).
  - Route exceptions (mismatched totals, unknown vendors, low-confidence fields) to a Surface for AP review.
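The "totals must match sum of line items" rule is a cross-field check, which standard JSON Schema keywords such as `type`, `required`, and `enum` do not express on their own. As an illustration of what such a check computes (not of how Bem implements it), here is a plain-Python sketch over the invoice fields from the schema earlier. `total_mismatch` and the one-cent tolerance are assumptions.

```python
from decimal import Decimal
from typing import Optional


def total_mismatch(invoice: dict, tolerance: Decimal = Decimal("0.01")) -> Optional[Decimal]:
    """Return total_amount minus the sum of line totals if they disagree, else None.

    A missing `line_items` key is treated as an empty list, so a non-zero
    total with no line items is reported as a mismatch.
    """
    line_sum = sum(
        (Decimal(str(item["line_total"])) for item in invoice.get("line_items", [])),
        Decimal("0"),
    )
    diff = Decimal(str(invoice["total_amount"])) - line_sum
    return diff if abs(diff) > tolerance else None


if __name__ == "__main__":
    invoice = {
        "vendor_name": "Example Freight Co",  # made-up sample data
        "invoice_number": "INV-001",
        "total_amount": 160.00,
        "line_items": [
            {"description": "Linehaul", "line_total": 100.00},
            {"description": "Fuel surcharge", "line_total": 50.00},
        ],
    }
    diff = total_mismatch(invoice)
    if diff is not None:
        # In the workflow described above, this is the kind of case that
        # would be routed to human review instead of being auto-approved.
        print(f"Totals differ by {diff}; flag for review")
```

Values are converted through `str` into `Decimal` so that float rounding in the parsed JSON does not trigger false mismatches.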
Operationally, they go from:
- Manually picking “which pages to send to the extractor” → sending the full packet every time.
- Paying for every extra appendix → paying once per document.
- Seeing cost spike with contract complexity → having flat, predictable unit economics even as packets grow.
Pro Tip: When you evaluate Bem vs Extracta, don’t just compare “per-page” and “per-call” in a vacuum. Take your actual traffic—largest packets, expected growth, and re-run rates—then run three scenarios: 10-page average, 25-page average, 50-page average. If the per-page tool still looks cheaper, you probably modeled the demo, not production.
Summary
Bem and Extracta represent two different philosophies:
- Extracta and other per-page tools sell you OCR/extraction as a metered commodity. The longer and messier your documents, the more you pay. You still own the pipeline, the glue code, and the exception handling.
- Bem sells you outcomes: unstructured input in, schema-enforced JSON out, for a single per-call price across PDFs, images, audio, video, WhatsApp/SMS, and email threads. Long documents and mixed packets don’t change your unit cost; they just make the workflow more valuable.
At 100k and 1M+ volume, per-page pricing compounds exactly where you can’t control it—page count and layout. Bem’s per-call pricing plus deterministic workflows, schema validation, and exception routing let you scale unstructured processing like software, not like labor.