
How do I sign up for together.ai and buy credits (what’s the minimum purchase) to start using the API?
Most teams can go from no account to making their first API call on together.ai in under 10 minutes. You create a free account, get automatic starter credits, then optionally add paid credits or move to a custom plan once you know your traffic profile and cost envelope.
Note: together.ai currently grants free credits on new accounts so you can start using the API without an upfront minimum purchase. Exact free-credit amounts and paid pricing tiers can change, so always confirm on the pricing page or with Sales.
Quick Answer: You sign up for together.ai by creating a free account on the website, then generating an API key from your dashboard. New accounts typically receive free credits so you can start using the API without a minimum purchase. When you’re ready to scale, you can add paid credits or talk to Sales for a custom plan.
The Quick Overview
- What It Is: A self-serve way to register for together.ai, get an API key, and fund usage (via free credits and then paid credits or plans) so you can call the AI Native Cloud APIs.
- Who It Is For: Developers, data teams, and AI product owners who want fast, cost-efficient access to top open-source and partner models via an OpenAI-compatible API.
- Core Problem Solved: Removes the friction of standing up GPUs and serving infrastructure yourself, so you can start testing models (and later scale to production) without long-term commitments.
How It Works
At a high level, the flow is:
- Create a together.ai account (free).
- Get your API key and confirm your free credits.
- Make your first calls in the Playground or via the OpenAI-compatible API.
- When you’re ready to ramp usage, add billing (buy credits or choose a plan) and select the right deployment mode—Serverless Inference, Batch Inference, Dedicated Model Inference, Dedicated Container Inference, or GPU Clusters.
Once billing is configured, your usage is metered at the token or GPU level depending on the product. You keep full control over spend by choosing the appropriate mode for your workload and scaling pattern.
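As a sketch of the first-call step, the snippet below targets Together's OpenAI-compatible chat completions endpoint using only the Python standard library. The endpoint URL and model name are assumptions to be confirmed against your dashboard and the API reference, and actually sending the request spends credits, so the send is guarded behind an environment variable.

```python
import json
import os
import urllib.request

# Assumed OpenAI-compatible endpoint; confirm in the together.ai API docs.
TOGETHER_CHAT_URL = "https://api.together.xyz/v1/chat/completions"

def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a chat completion request."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        TOGETHER_CHAT_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

if __name__ == "__main__":
    # Only send if a real key is configured; this consumes credits.
    key = os.environ.get("TOGETHER_API_KEY")
    if key:
        # Model name is a placeholder; pick one from your dashboard.
        req = build_chat_request(key, "example/model-name", "Say hello.")
        with urllib.request.urlopen(req) as resp:
            print(json.loads(resp.read())["choices"][0]["message"]["content"])
```

Because the interface is OpenAI-compatible, most existing OpenAI client libraries can be pointed at the same base URL instead of hand-building requests like this.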
- Sign Up & Verify Email: Go to together.ai and create an account using your work email. You’ll receive a verification email; once confirmed, you can access the dashboard.
- Generate an API Key & Start with Free Credits: Inside the dashboard, generate an API key. New accounts typically receive free credits automatically, allowing you to start using the API (text, image, code, etc.) with no minimum purchase. Use Together Sandbox and the Playground to validate models, prompts, and latency.
- Add Billing & Scale to Production: As you move from experimentation to production, add a payment method or talk to Sales. You can then:
- Use Batch Inference to process up to 30 billion tokens at up to 50% lower cost.
- Spin up Dedicated Model Inference or Dedicated Container Inference for steady, latency-sensitive workloads.
- Reserve GPU Clusters when you need full control over training/fine-tuning or custom runtimes.
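Before adding billing, it can help to track estimated spend against your free-credit balance on the client side. The tracker below is a hypothetical sketch: the starting-credit amount and per-token price are placeholders, not published together.ai figures.

```python
class CreditTracker:
    """Client-side estimate of credit burn; all dollar amounts are placeholders."""

    def __init__(self, starting_credits_usd: float, usd_per_million_tokens: float):
        self.remaining = starting_credits_usd
        self.rate = usd_per_million_tokens

    def record_usage(self, tokens: int) -> float:
        """Deduct the estimated cost of a call and return remaining credits."""
        self.remaining -= tokens / 1_000_000 * self.rate
        return self.remaining

    def should_add_billing(self, threshold_usd: float = 1.0) -> bool:
        """Suggest adding a payment method once the estimate runs low."""
        return self.remaining < threshold_usd

# Hypothetical: $5 in starter credits at a placeholder $0.20 per million tokens.
tracker = CreditTracker(starting_credits_usd=5.0, usd_per_million_tokens=0.20)
tracker.record_usage(2_000_000)  # 2M tokens ≈ $0.40 at the placeholder rate
print(round(tracker.remaining, 2))  # 4.6
```

Your dashboard remains the source of truth for the actual balance; a tracker like this just gives you an early warning inside your own code.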
Features & Benefits Breakdown
| Core Feature | What It Does | Primary Benefit |
|---|---|---|
| Free Credits at Sign-Up | New accounts receive starter credits tied to your API key. | Start testing models and latency with no minimum purchase or commitment. |
| OpenAI-Compatible API | Exposes together.ai models via an interface compatible with OpenAI. | Swap in together.ai with no code changes in most clients; quickly benchmark price-performance. |
| Multiple Deployment Modes | Serverless, Batch, Dedicated Model, Dedicated Container, GPU Clusters. | Match cost and latency to workload: 2.75x faster inference and up to 50% less cost for large jobs. |
| Every Modality, One API | Text, image, video, code, and voice served on the AI Native Cloud. | Avoid stitching multiple providers; simplify auth, logging, and cost tracking. |
| Research-to-Production Engine | Uses FlashAttention, Together Kernel Collection, ATLAS, and CPD. | Higher throughput and lower latency (e.g., 2x faster serverless on top models) on the same budget. |
| Enterprise-Ready Controls | SOC 2 Type II, tenant-level isolation, encryption in transit/at rest. | Safely move prototypes into always-on production; your data and models remain fully under your ownership. |
Ideal Use Cases
- Best for teams starting evaluation: You can sign up for together.ai, get free credits, and call the OpenAI-compatible API without negotiating a contract or committing to a minimum spend. This is ideal for:
- Porting an existing OpenAI integration to test latency/cost parity.
- Running small POCs: RAG workflows, agents, or long-context summarization.
- Best for teams with a clear production roadmap: Once your benchmarks look good, you can seamlessly:
- Move hot paths to Dedicated Model Inference for predictable, low-latency SLAs.
- Push offline workloads to Batch Inference for up to 50% cost savings.
- Scale to GPU Clusters from 8 to 4,000+ GPUs without dealing with provisioning.
Limitations & Considerations
- Exact minimum purchase and free-credit amounts can change: together.ai uses a self-serve pricing model with transparent, usage-based charges, but the exact minimum credit top-up and free-credit grant can vary over time or by region. Always confirm the current minimum purchase and unit pricing in your dashboard or on the pricing and contact-sales pages.
- Plan selection depends on workload pattern: The “right” way to buy usage is less about nominal minimums and more about traffic shape:
- Spike-heavy workloads → Serverless Inference (pay-per-use, no reservation).
- Massive, offline workloads → Batch Inference (cheaper per token).
- Steady, latency-sensitive workloads → Dedicated Model Inference / Dedicated Container Inference (reserved capacity and stronger SLOs).
For complex fleets, it’s often worth talking to Sales early to align unit economics with your scaling plan.
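The traffic-shape guidance above can be sketched as a small routing function. The thresholds here are illustrative assumptions, not together.ai guidance; real decisions should be aligned with Sales and your own benchmarks.

```python
def choose_deployment_mode(peak_to_avg_ratio: float,
                           latency_sensitive: bool,
                           offline_batch: bool) -> str:
    """Map a workload's traffic shape to a together.ai deployment mode.

    Thresholds are illustrative placeholders, not official guidance.
    """
    if offline_batch:
        return "Batch Inference"            # large async jobs, cheaper per token
    if latency_sensitive and peak_to_avg_ratio < 2.0:
        return "Dedicated Model Inference"  # steady traffic, strict SLOs
    return "Serverless Inference"           # spiky or exploratory traffic

print(choose_deployment_mode(5.0, False, False))  # Serverless Inference
print(choose_deployment_mode(1.2, True, False))   # Dedicated Model Inference
print(choose_deployment_mode(1.0, False, True))   # Batch Inference
```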
Pricing & Plans
together.ai is designed so you can start with free credits, then add paid usage as you prove out your product’s economics:
- Free Trial / Self-Serve Credits:
- New accounts receive free credits once you sign up and verify your email.
- You can immediately:
- Use Together Sandbox (fast cold starts at 2.7s P95, with 500ms P95 snapshot resumes).
- Call text, image, code, and other models via the OpenAI-compatible API.
- Benchmark up to 2.75x faster inference versus other providers.
- Usage-Based Plans (Serverless & Batch):
- You pay per unit of usage (tokens, images, etc.).
- No long-term commitment; you can ramp up or down as needed.
- Batch Inference can process up to 30B tokens asynchronously at up to 50% less cost, ideal for large-scale classification, summarization, and synthetic data.
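To see what an up-to-50% batch discount means for a large job, a quick back-of-envelope calculation helps; the serverless rate below is a placeholder, not a published price.

```python
def batch_savings(tokens: int, serverless_usd_per_m: float,
                  batch_discount: float = 0.5) -> tuple[float, float]:
    """Return (serverless_cost, batch_cost) in USD for a given token count."""
    serverless = tokens / 1_000_000 * serverless_usd_per_m
    return serverless, serverless * (1 - batch_discount)

# Hypothetical: a 1B-token summarization job at a placeholder $0.20/M rate.
serverless_cost, batch_cost = batch_savings(1_000_000_000, 0.20)
print(serverless_cost, batch_cost)  # 200.0 100.0
```

At scale, even modest per-token discounts compound; plug in the current rates from the pricing page before committing to either mode.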
For larger teams and production workloads, you’ll typically consider:
- Dedicated Inference / Enterprise Plan: Best for teams needing:
- Predictable or steady traffic.
- Latency-sensitive applications (e.g., interactive agents, real-time UX).
- High-throughput production workloads with strict budget targets.
To finalize pricing details, custom minimums, and enterprise terms, use the Contact Sales flow.
- Serverless & Batch (Self-Serve): Best for builders who:
- Want to start immediately with free credits.
- Are experimenting or have variable traffic patterns.
- Prefer pay-as-you-go with no reservations.
- Dedicated & GPU Clusters (Sales-Assisted): Best for organizations that:
- Need reserved capacity, guaranteed SLOs, and lower per-unit cost at scale.
- Want to run custom containers or fine-tune models they control.
- Expect to scale training or inference to thousands of GPUs.
Frequently Asked Questions
Do I need to buy credits before I can call the together.ai API?
Short Answer: No. New together.ai accounts typically receive free credits so you can start using the API without an upfront purchase.
Details:
Once you register and verify your together.ai account, you’ll see an API key in your dashboard and a free-credit balance (the exact amount may change over time). You can:
- Call models via the Together Sandbox or directly through the OpenAI-compatible API.
- Explore features like long-context inference, multimodal models, and Batch Inference.
- Measure latency, throughput, and cost for representative workloads.
You only need to add billing (buy credits or move to a plan) once you approach the free-credit limit or want to unlock higher volume and production-grade SLOs.
What’s the minimum amount I have to spend to use together.ai in production?
Short Answer: There’s no fixed minimum to start using together.ai; you begin with free credits and then move into usage-based billing or a custom enterprise plan once your traffic justifies it.
Details:
For smaller or early-stage workloads:
- You can often stay on self-serve, usage-based pricing for Serverless and Batch Inference with no contractual minimums.
- Your effective “minimum purchase” is just your actual usage beyond free credits.
For larger, production workloads:
- Dedicated Model Inference, Dedicated Container Inference, and GPU Clusters are typically structured via Sales, where minimums (if any) are tied to reserved capacity and SLA requirements.
- This is where you can negotiate economics that match your traffic, often achieving:
- 2x or better latency improvements vs prior setups.
- Meaningful cost reductions (e.g., Salesforce AI Research saw costs cut “by approximately a third”).
The most accurate way to understand current minimums is to contact Sales with your projected monthly tokens, QPS targets, and latency constraints.
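When preparing those numbers for Sales, a rough projection of monthly token volume and average QPS from your own traffic assumptions is enough to start the conversation. Every input below is a hypothetical figure you would replace with real measurements.

```python
def project_monthly_load(requests_per_day: int,
                         avg_tokens_per_request: int) -> dict:
    """Project monthly token volume and average QPS from daily traffic.

    Uses a 30-day month and assumes traffic spread evenly across the day;
    peak QPS will be higher for spiky workloads.
    """
    monthly_tokens = requests_per_day * avg_tokens_per_request * 30
    avg_qps = requests_per_day / 86_400  # seconds per day
    return {"monthly_tokens": monthly_tokens, "avg_qps": round(avg_qps, 2)}

# Hypothetical workload: 100k requests/day, ~1,500 tokens each.
print(project_monthly_load(100_000, 1_500))
# {'monthly_tokens': 4500000000, 'avg_qps': 1.16}
```

Pair this with your latency constraints (e.g., P95 targets) so Sales can map the projection onto the right dedicated capacity or cluster size.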
Summary
Signing up for together.ai and funding usage is intentionally low-friction:
- Create a free account and verify your email.
- Get free credits and an API key to start calling the OpenAI-compatible API immediately.
- Benchmark your real workloads—RAG, agents, long-context summarization, or multimodal—on an AI Native Cloud that delivers up to 2.75x faster inference and up to 50% lower batch cost.
- As you move from POC to production, add billing and choose the right mode—Serverless Inference, Batch Inference, Dedicated Model/Container Inference, or GPU Clusters—without re-architecting your code.
You start with no minimum purchase, then scale into the plan and capacity level that matches your traffic, performance SLOs, and unit economics targets.