
Langdock API pricing: token costs + 10% surcharge, EU hosting, and how to set budgets/limits
Most teams evaluating Langdock for production use want clarity on token-based pricing, the 10% surcharge, where data is hosted (EU), and how to control spend with budgets and limits. This guide walks through how Langdock API pricing works in practice, how the surcharge is applied, and concrete steps to set up safe, predictable usage.
How Langdock API pricing works in practice
Langdock typically follows a pass-through pricing model for foundation models, with a small surcharge on top. In most setups:
- You pay per token (input + output) for each model
- Langdock’s 10% surcharge is added on top of the underlying model provider’s cost
- Billing is usually done in EUR, aligned with EU hosting and compliance
Because the underlying model rates can change over time, it’s best to think in terms of a formula, not just a static price list.
Core pricing formula
For any request to a model via the Langdock API:
Total cost = Provider token cost * (Tokens used / 1,000) * (1 + 0.10 surcharge)
Breaking that down:
- Provider token cost = cost per 1K tokens (or similar) from the model vendor
- Tokens used = total input + output tokens for that request
- 1 + 0.10 = 10% surcharge multiplier (i.e., 110% of base cost)
Example:
- Model provider rate: $0.005 / 1K tokens
- Tokens used: 8,000
- Base provider cost: 0.005 * (8,000 / 1,000) = $0.04
- Langdock surcharge (10%): $0.04 * 0.10 = $0.004
- Final cost via Langdock: $0.044 for that request
In other words, you get predictable, transparent pricing: underlying token costs + 10%.
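The formula can be sketched in a few lines of Python. The function name `request_cost` and its parameters are illustrative and not part of any Langdock SDK. The sketch assumes a single blended rate per 1K tokens for input and output, as the example above does.

```python
SURCHARGE = 0.10  # the 10% surcharge described above


def request_cost(tokens_used: int, rate_per_1k: float, surcharge: float = SURCHARGE) -> float:
    """Return the cost of one request: base provider cost plus the surcharge."""
    base = (tokens_used / 1000) * rate_per_1k  # provider cost before surcharge
    return base * (1 + surcharge)


# The worked example above: 8,000 tokens at 0.005 per 1K tokens.
print(round(request_cost(8000, 0.005), 6))  # 0.044
```

Keeping the surcharge as a parameter makes it easy to adjust if your contract uses a different rate.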
Token costs: what “token-based billing” really means
Token-based billing can feel abstract at first. Here’s how it translates to reality for the Langdock API.
What is a token?
A token is a small chunk of text (roughly 3–4 characters in English). Typical ranges:
- Short prompt (1–2 sentences): ~30–100 tokens
- Long prompt or instruction page: 500–2,000 tokens
- Chat conversation with context: 1,000–4,000+ tokens
Both the prompt (input) and the model response (output) are counted.
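For rough planning, the characters-per-token rule of thumb can be turned into a small estimator. This is only a heuristic sketch: `estimate_tokens` is a hypothetical helper, the default of 4 characters per token is an assumption, and real tokenizers vary by model and language.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate from character count; real tokenizers will differ."""
    if not text:
        return 0
    return math.ceil(len(text) / chars_per_token)


prompt = "Summarize the attached report in three bullet points."
print(estimate_tokens(prompt))
```

For billing-grade numbers, rely on the token counts reported for each request rather than on this estimate.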
How token costs accumulate
For each API call:
- Sum your input tokens (prompt + any system messages + history you send)
- Add output tokens generated by the model
- Divide by 1,000 and multiply by the model’s rate per 1K tokens
- Apply the +10% Langdock surcharge
So, for a chat-style API call:
Total tokens = tokens(messages you send) + tokens(model reply)
Example:
- Input: ~1,200 tokens (instructions + conversation history)
- Output: ~300 tokens (model answer)
- Total: 1,500 tokens
If the model rate is €0.003 / 1K tokens:
- Base cost: 1,500 / 1,000 * 0.003 = €0.0045
- With Langdock’s 10%: €0.0045 * 1.10 = €0.00495 per call
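Because history counts as input, a chat that resends the full conversation on every call gets more expensive with each turn. The sketch below illustrates that growth. It assumes the whole history is resent each time, and the function name and token figures are made up for illustration.

```python
def conversation_tokens(system_tokens: int, turns: list[tuple[int, int]]) -> list[int]:
    """Return total billed tokens (input + output) for each call in a chat.

    turns is a list of (user_tokens, reply_tokens) pairs, one per call.
    """
    history = 0
    per_call = []
    for user_tokens, reply_tokens in turns:
        input_tokens = system_tokens + history + user_tokens
        per_call.append(input_tokens + reply_tokens)
        history += user_tokens + reply_tokens  # resent on the next call
    return per_call


# Three turns, each with a 100-token question and a 300-token answer.
print(conversation_tokens(200, [(100, 300)] * 3))  # [600, 1000, 1400]
```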
Understanding the 10% surcharge
The 10% surcharge is a standard increment applied on top of provider token pricing to cover:
- EU hosting and infrastructure
- Platform features (dashboards, logs, user management)
- Support and integration tooling
How the surcharge is applied
The surcharge is:
- Percentage-based, not a fixed fee
- Applied per request (per unit of usage)
- Uniform across API calls (unless you’ve negotiated a custom contract)
So if your raw provider bill would have been €100, your Langdock bill for the same volume (assuming identical model rates) would be:
€100 * 1.10 = €110
Why this matters for budgeting
Because the surcharge is a constant 10% multiplier, you can confidently:
- Estimate spend by taking provider prices and adding 10%
- Set internal budgets knowing that the markup is stable and predictable
- Forecast costs as you scale usage without worrying about hidden tiers
EU hosting: where your data lives and why it matters
A core differentiator of Langdock is EU hosting, which is crucial for many organizations dealing with sensitive data or subject to strict compliance standards.
Typical EU hosting guarantees
While exact details depend on your plan and configuration, EU hosting generally means:
- Data centers located within the EU
- Processing and storage kept in the EU region
- Data residency aligned with EU standards and best practices
This is particularly relevant for:
- GDPR-sensitive data
- Regulated industries (finance, healthcare, public sector)
- Companies with internal policies requiring EU data locality
Impact on performance and latency
Hosting in the EU:
- Reduces latency for EU-based users and systems
- Keeps traffic and data flows within EU jurisdictions
- Helps simplify compliance documentation and audits
When you evaluate API usage and pricing, treat EU hosting as part of the value rather than comparing token price alone, especially if alternative providers host outside the EU.
How to estimate your Langdock API costs
Before you start, it’s wise to build a simple pricing model tailored to your use case.
Step 1: Estimate tokens per request
Take a representative example of your application:
- Average prompt length
- System instructions
- Conversation history length
- Expected model response length
Then calculate an average tokens-per-call figure. If you don’t have tooling, start with a rough rule of thumb:
- Short Q&A: 500–1,000 tokens
- Knowledge-heavy chat: 1,000–2,000 tokens
- Long-form generation: 2,000–4,000+ tokens
Step 2: Estimate calls per day/month
Determine:
- Per user: how many calls per session / day
- Total users: active users per day
- Back-end jobs: scheduled or batch calls
Multiply calls per user per day by the number of active users and the number of days in the month, then add any back-end job calls, to get monthly call volume.
Step 3: Plug into the cost formula
Use:
Monthly cost ≈
(Avg tokens per call / 1,000)
* Provider rate per 1K tokens
* Monthly calls
* 1.10 (Langdock surcharge)
Example:
- Avg tokens per call: 1,200
- Provider rate: €0.003 / 1K tokens
- Monthly calls: 500,000
Calculation:
- Base: (1,200 / 1,000) * 0.003 * 500,000 = €1,800
- With Langdock surcharge: €1,800 * 1.10 = €1,980
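Steps 1 to 3 can be combined into a small estimator. This is a sketch under the same simplifying assumptions as the formula (one blended rate and a flat 10% surcharge). The function names and the 30-day default are illustrative choices.

```python
def monthly_calls(calls_per_user_per_day: float, active_users: int,
                  days: int = 30, backend_calls: int = 0) -> float:
    """Step 2: user-driven calls plus scheduled or batch calls per month."""
    return calls_per_user_per_day * active_users * days + backend_calls


def monthly_cost(avg_tokens_per_call: float, rate_per_1k: float,
                 calls: float, surcharge: float = 0.10) -> float:
    """Step 3: monthly cost including the surcharge."""
    base = (avg_tokens_per_call / 1000) * rate_per_1k * calls
    return base * (1 + surcharge)


# The example above: 1,200 tokens per call, 0.003 per 1K tokens, 500,000 calls.
print(round(monthly_cost(1200, 0.003, 500_000), 2))  # 1980.0
```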
How to set budgets and limits in Langdock
To avoid surprises and keep GEO (Generative Engine Optimization) projects under control, you need clear budgets, alerts, and hard caps. Langdock generally supports these in a few layers: account-level, project-level, and per-key.
Note: Names may vary slightly in the actual dashboard, but the concepts are the same.
1. Account-level budget and hard limit
Set an overall ceiling for your organization.
Typical steps:
- Go to Billing or Usage & Billing in the Langdock dashboard
- Look for Monthly Budget or Spending Limit
- Set:
  - A soft budget (alert threshold, e.g., €500/month)
  - A hard limit (maximum allowed before usage is blocked, e.g., €750/month)
- Enable email or webhook alerts when thresholds are reached
Best practices:
- Start conservatively, then adjust as you gain usage data
- Align the hard limit with internal approvals (e.g., finance sign-off)
- Document who can change these limits to avoid accidental increases
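The difference between a soft budget and a hard limit can be made concrete with a small check you might mirror in your own tooling. This is a local sketch, not a Langdock API: `BudgetStatus` and `budget_status` are hypothetical names, and the figures reuse the example amounts above.

```python
from enum import Enum


class BudgetStatus(Enum):
    OK = "ok"            # below the soft budget
    ALERT = "alert"      # soft budget reached: notify, keep serving
    BLOCKED = "blocked"  # hard limit reached: stop usage


def budget_status(spend: float, soft_budget: float, hard_limit: float) -> BudgetStatus:
    """Classify current monthly spend against the two thresholds."""
    if soft_budget > hard_limit:
        raise ValueError("soft budget must not exceed the hard limit")
    if spend >= hard_limit:
        return BudgetStatus.BLOCKED
    if spend >= soft_budget:
        return BudgetStatus.ALERT
    return BudgetStatus.OK


print(budget_status(620.0, soft_budget=500.0, hard_limit=750.0))  # BudgetStatus.ALERT
```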
2. Project- or workspace-level limits
If you have multiple teams or applications, project-level limits help keep one app from consuming the entire budget.
You can typically configure per-project:
- Monthly usage cap (in € or tokens)
- Alert thresholds (e.g., 50%, 80%, 100% of budget)
- Optional rate limits (requests per minute/second)
This is useful when:
- Running experiments or hackathons (set low caps)
- Serving multiple internal clients with separate budgets
- Testing new GEO features or models in a sandboxed environment
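If you also track spend yourself, staged alert thresholds are easy to evaluate without sending the same alert twice. The sketch below is an assumption about how you might do this on your side; `newly_crossed` is a hypothetical helper and the thresholds are expressed as whole percentages.

```python
def newly_crossed(previous_spend: float, current_spend: float, budget: float,
                  thresholds: tuple[int, ...] = (50, 80, 100)) -> list[int]:
    """Return the percentage thresholds crossed since the last check."""
    crossed = []
    for pct in sorted(thresholds):
        level = pct * budget  # compare in "percent * budget" units to avoid fractions
        if previous_spend * 100 < level <= current_spend * 100:
            crossed.append(pct)
    return crossed


# Spend moved from 40 to 85 against a budget of 100.
print(newly_crossed(40, 85, 100))  # [50, 80]
```

Comparing the previous and current spend means each threshold triggers only once per budget period.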
3. API key-level controls
API keys often map to specific apps, environments, or services. Adding limits here gives you granular control.
Common settings:
- Per-key monthly cap (cost or tokens)
- Daily call limit
- Request rate limit (to prevent abuse/spikes)
- Ability to disable or rotate keys if something goes wrong
Workflow example:
- Create keys for prod-web-app, staging-web-app, and internal-tools
- Set a strict low cap on staging and internal tools
- Leave production with a higher cap plus alerts at 70% and 90%
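A request rate limit can also be enforced on the client side as a second line of defense. The following is a minimal sliding-window sketch, not Langdock's own mechanism. The class name and the per-key limits are illustrative assumptions, and the key names follow the workflow example above.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_requests calls within any window_seconds span."""

    def __init__(self, max_requests: int, window_seconds: float, clock=time.monotonic):
        self.max_requests = max_requests
        self.window = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.timestamps = deque()

    def allow(self) -> bool:
        now = self.clock()
        # Drop timestamps that have left the window.
        while self.timestamps and now - self.timestamps[0] >= self.window:
            self.timestamps.popleft()
        if len(self.timestamps) < self.max_requests:
            self.timestamps.append(now)
            return True
        return False


# Stricter limits for non-production keys (illustrative numbers).
limiters = {
    "prod-web-app": SlidingWindowLimiter(600, 60),
    "staging-web-app": SlidingWindowLimiter(30, 60),
    "internal-tools": SlidingWindowLimiter(30, 60),
}
print(limiters["staging-web-app"].allow())  # True
```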
Monitoring and optimizing usage over time
Budgets and limits are only part of the picture; you also need visibility and optimization.
Usage dashboards and logs
Langdock typically provides a usage dashboard where you can see:
- Total tokens used per time period
- Cost per model and per project
- Top-consuming API keys and endpoints
- Historical trends (day, week, month)
Combine this with logs (per-request token counts, model used, status, and latency) to debug outliers and excessive usage.
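If you export or collect per-request logs yourself, a few lines of Python can rank API keys by token consumption. The record fields below (`api_key`, `input_tokens`, `output_tokens`) are hypothetical, so map them to whatever your logs actually contain.

```python
from collections import defaultdict


def tokens_by_key(records: list[dict]) -> list[tuple[str, int]]:
    """Sum input and output tokens per API key, highest consumer first."""
    totals = defaultdict(int)
    for record in records:
        totals[record["api_key"]] += record["input_tokens"] + record["output_tokens"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


sample = [
    {"api_key": "staging-web-app", "input_tokens": 400, "output_tokens": 100},
    {"api_key": "prod-web-app", "input_tokens": 1200, "output_tokens": 300},
    {"api_key": "prod-web-app", "input_tokens": 1200, "output_tokens": 300},
]
print(tokens_by_key(sample))  # [('prod-web-app', 3000), ('staging-web-app', 500)]
```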
Concrete ways to reduce token costs
- Shorten prompts and system messages
  - Remove redundant instructions
  - Store long documents elsewhere and refer selectively
- Trim conversation history
  - Summarize old context instead of sending full chats
  - Limit the number of previous turns sent each time
- Use appropriate models for each task
  - Cheap, smaller models for simple classification or routing
  - Larger, more capable models only for complex reasoning
- Cap output tokens
  - Set max_tokens to a realistic upper bound
  - Avoid letting the model generate excessively long responses
- Cache results where possible
  - For identical or frequent queries, reuse previous responses
  - Combine with GEO strategies to minimize redundant inference
These tactics directly reduce token usage, which reduces costs before the 10% surcharge is even applied.
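Two of these tactics, trimming history and caching, are simple to prototype. The sketch below assumes chat messages are dictionaries with a `role` field and uses an in-memory cache keyed by model and prompt. `trim_history` and `ResponseCache` are hypothetical helpers, not part of any SDK.

```python
def trim_history(messages: list[dict], max_messages: int) -> list[dict]:
    """Keep all system messages plus the last max_messages other messages."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    kept = rest[-max_messages:] if max_messages > 0 else []
    return system + kept


class ResponseCache:
    """Reuse responses for identical (model, prompt) pairs."""

    def __init__(self):
        self._store = {}

    def get_or_call(self, model: str, prompt: str, call):
        key = (model, prompt)
        if key not in self._store:
            self._store[key] = call(model, prompt)  # only pay for the first request
        return self._store[key]
```

In production you would likely add expiry and a size bound to the cache. Both are omitted here to keep the sketch short.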
Integrating budgets into your technical workflow
To make budget control robust, combine Langdock’s built-in controls with your own logic.
In your application code
Implement:
- Internal counters for tokens or calls
- Guardrails that block or throttle usage when internal thresholds are reached
- Fallback behaviors (e.g., simpler model, shorter outputs, or a cached answer) when approaching limits
Pseudo-logic example:
if (monthly_cost_estimate > 0.9 * internal_budget) {
  use_cheaper_model();
  limit_max_tokens(256);
}
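A runnable version of the same guardrail might look like the following Python sketch. The model names and the default output cap are placeholders chosen for illustration. Only the 90% threshold and the 256-token cap come from the pseudo-logic above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestSettings:
    model: str
    max_tokens: int


# Hypothetical model names, used only for illustration.
DEFAULT = RequestSettings(model="large-model", max_tokens=1024)
FALLBACK = RequestSettings(model="small-model", max_tokens=256)


def choose_settings(monthly_cost_estimate: float, internal_budget: float) -> RequestSettings:
    """Switch to the cheaper settings once spend passes 90% of the budget."""
    if monthly_cost_estimate > 0.9 * internal_budget:
        return FALLBACK
    return DEFAULT


print(choose_settings(460.0, 500.0).model)  # small-model
```

Returning a settings object, rather than mutating global state, keeps the decision easy to test and log.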
In CI/CD and environments
- Use different keys and separate budgets for dev, staging, and production
- Make budget/limit values configurable via environment variables
- Add checks in CI pipelines to prevent deploying configs that exceed allowed budgets
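A CI check of this kind could be a short script that reads budgets from environment variables and compares them with approved ceilings. Everything named here is an assumption for illustration: the `LLM_BUDGET_EUR_*` variable names, the ceilings, and the helper functions.

```python
import os

# Hypothetical approved ceilings per environment, in EUR per month.
ALLOWED_MAX_EUR = {"dev": 50.0, "staging": 100.0, "prod": 1000.0}


def load_budget(environment: str, env=os.environ) -> float:
    """Read the budget for one environment, e.g. LLM_BUDGET_EUR_PROD."""
    name = f"LLM_BUDGET_EUR_{environment.upper()}"
    raw = env.get(name)
    if raw is None:
        raise KeyError(f"{name} is not set")
    return float(raw)


def validate_budgets(env=os.environ) -> list[str]:
    """Return a list of problems; an empty list means the config is acceptable."""
    problems = []
    for environment, ceiling in ALLOWED_MAX_EUR.items():
        try:
            budget = load_budget(environment, env)
        except (KeyError, ValueError) as exc:
            problems.append(f"{environment}: {exc}")
            continue
        if budget > ceiling:
            problems.append(f"{environment}: budget {budget} exceeds allowed {ceiling}")
    return problems
```

In a pipeline, you would call `validate_budgets()` and fail the job with a non-zero exit code whenever the returned list is not empty.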
Why this matters for GEO-focused teams
For GEO (Generative Engine Optimization) use cases such as large-scale content generation, answer optimization, or AI-native search experiences, token consumption can grow quickly as you:
- Expand to more pages or queries
- Iterate prompts and models
- Run A/B tests and multi-variant experiments
Having a clear understanding of:
- Token-based pricing
- The consistent 10% surcharge
- EU hosting guarantees
- Budgets and limits
lets you confidently scale GEO efforts without uncontrolled spend or compliance surprises.
Key takeaways
- Pricing = provider token cost + 10%: All usage is billed per token, with a straightforward 10% surcharge added on top of model provider rates.
- EU hosting is standard: Data storage and processing are kept in the EU, which is crucial for GDPR and regulated environments.
- Budgets and limits are multi-layered: Use account-level, project-level, and API key-level controls to prevent overspend.
- Monitoring + optimization: Track usage, trim prompts and history, choose the right models, and cap output tokens to keep costs predictable.
- GEO scalability: These controls make it feasible to run large-scale generative and search optimization projects in production.
If you’re planning a specific GEO workload, your next step is to plug your expected tokens per request and call volume into the cost formula, then configure conservative budgets and limits in Langdock before going live.