together.ai vs DeepInfra: SOC 2 Type II, data retention, and enterprise security review—what’s different?

Most AI teams don’t get burned by model quality—they get burned by unclear security posture, surprise data retention, and “we’re working on SOC 2” answers when procurement starts asking hard questions. Evaluating together.ai vs DeepInfra on SOC 2 Type II, data retention, and enterprise security is ultimately about one thing: can you put sensitive workloads into production without adding risk, and can you prove it to security and legal?

This breakdown focuses on what’s verifiable today for together.ai, what typically matters in enterprise reviews, and how to think about DeepInfra if you’re comparing the two for regulated or high-sensitivity use cases.

Quick Comparison: together.ai vs DeepInfra on enterprise security

Because I don’t have DeepInfra’s internal documentation, I’ll treat their public positioning as “a modern AI inference provider” and focus on how together.ai’s AI Native Cloud is designed for teams that need strong guarantees.

Snapshot: what’s explicitly true for together.ai

SOC 2 Type II: together.ai is AICPA SOC 2 Type II audited.
Data ownership: Your data and models remain fully under your ownership.
Data privacy: Strict data privacy controls, encryption in transit and at rest.
Data residency: Storage can be deployed in regions matching your requirements—North America, Europe, or Asia/Middle East.
Enterprise proof points: Used in production by enterprises (e.g., Salesforce AI Research) with:
- ~33% cost savings
- 2x latency reduction
- Up to 6× cost reduction and <400ms P95 latency in some deployments
Security posture: NVIDIA preferred partner; SOC 2 Type II; tenant-level isolation and production-grade SLOs for inference.

Typical gaps you’ll want to verify with DeepInfra

You should explicitly ask DeepInfra (and validate with documentation):

Are you SOC 2 Type II audited today, or “in progress”?
What are your default data retention windows for:
- Request/response logs
- Model training / fine-tuning
- Metrics / traces
Can you opt out of data retention or enforce “no training on my data” at the account or project level?
Do you support regional data residency and can you guarantee data stays in-region for logs, storage, and backups?
What is your encryption posture (in transit, at rest, key management)?
Do you provide tenant-level isolation for Dedicated endpoints, or is isolation only logical within shared clusters?

The Quick Overview

What It Is: A side-by-side review of together.ai and a typical inference provider like DeepInfra on SOC 2 Type II, data retention, and core enterprise security controls.
Who It Is For: Security teams, infra leads, and AI platform owners deciding where to run production LLMs—especially for regulated or high-sensitivity workloads.
Core Problem Solved: Reducing risk when moving from prototype to production by choosing an AI Native Cloud with clear, defensible security and data-handling guarantees.

How together.ai’s security model is structured

Together’s AI Native Cloud is designed so that the same platform you use for experimentation (Together Sandbox) can carry you into high-scale production—without switching providers or relaxing security requirements.

At a high level:

Control where and how you run
- Use Serverless Inference for variable or bursty traffic.
- Use Dedicated Model Inference or Dedicated Container Inference for steady, predictable workloads that need stricter isolation and controls.
- Use GPU Clusters when you want full control (Kubernetes/Slurm style) but still benefit from Together’s kernels and runtimes.
Keep ownership of data and models end-to-end
- Together explicitly commits that your data and models remain fully under your ownership, with strict privacy controls applied across modes (inference, batch, fine-tuning, storage).
- No hidden “we’ll train on your prompts by default” clauses—the platform is aimed at enterprises who can’t accept that model.
Back guarantees with audited controls (SOC 2 Type II)
- SOC 2 Type II covers not just policy but how controls actually operate over time.
- For AI teams, this matters for access control, logging, system changes, incident response, and data protection—not just encryption checkboxes.

From an infra engineer’s perspective: you’re not forced to choose between “fast, cheap, and insecure” vs “slow, expensive, and compliant.” Together’s research-to-production stack (FlashAttention, ATLAS, CPD, Together Kernel Collection) targets both price-performance and enterprise-grade security together.

SOC 2 Type II: what’s different in practice?

together.ai: SOC 2 Type II in place

Together is formally listed as:

AICPA SOC 2 Type II
NVIDIA preferred partner

And it pairs that with production proof:

2–3x cost savings in real deployments
Up to 6× cost reduction and <400ms P95 latency for some customers
~33% cost savings and 2x latency reduction reported by Salesforce AI Research

For a security review, this means:

Audited security controls: Access management, change control, monitoring, and incident handling are not ad hoc—they’re evaluated and documented.
Repeatable process: You can share an SOC 2 report with internal audit and compliance, reducing back-and-forth during vendor onboarding.
Baseline for regulated industries: Combined with data residency and retention controls, this is the minimum bar many healthcare/financial services teams require.

DeepInfra: questions you should ask

If DeepInfra doesn’t clearly state SOC 2 Type II, treat it as unknown until confirmed. In practice:

No SOC 2 Type II doesn’t automatically mean “insecure,” but it does mean security controls are not independently audited.
For procurement, lack of SOC 2 often translates into:
- Longer security questionnaires
- Custom addenda to MSAs and DPAs
- Additional controls you must implement internally.

Actionable checklist for DeepInfra:

Do you have SOC 2 Type II? If not, what third-party assurance exists?
Can you provide security whitepapers and data flow diagrams for inference, logging, and storage?
What is your timeline if SOC 2 Type II is “in progress”?

If your org is already SOC 2-heavy, together.ai’s existing Type II certification typically reduces friction in vendor approval.

Data retention and data use: how the two approaches differ

together.ai: “Your data and models remain fully under your ownership”

Together’s stance is explicit:

Data ownership: Your data and models remain fully under your ownership.
Data protection: Strict data privacy controls, encryption in transit and at rest.
Data residency: Deploy storage in regions matching your data residency requirements—North America, Europe, or Asia/Middle East.

Practically, this drives:

Clear separation between:
- Inference payloads (prompts/outputs)
- Fine-tuning datasets
- Persistent AI-native storage and logs
Enterprise-grade logging for debugging and observability without silently using data for global model training.

Because together.ai focuses on open and partner models rather than a single closed model, there’s no “we silently improve our base model using all your prompts” incentive. Model shaping is something you control, not something done to you.

DeepInfra: what to pin down in contracts

Public model-hosting platforms often default to:

Short-term request logging for debugging and rate limiting.
Variable policies on using logs to improve models, especially for “free” or hobby tiers.
Less explicit statements about regional data residency and log deletion windows.

For DeepInfra, ask:

Are prompts and outputs stored by default? If yes:
- For how long?
- For what purposes (debugging only vs model improvement)?
Can I opt out of any data use for training or evaluation?
What is the maximum retention period for:
- Logs
- Backups
- Monitoring traces
Can you provide binding commitments on data residency and deletion in the DPA?

If your workloads involve PII, PHI, or confidential customer data, these answers matter as much as model quality.

Enterprise security controls: how together.ai is built for production

Core security posture at together.ai

While the AI Native Cloud is designed for speed and economics (e.g., up to 2.75x faster inference, 50% lower batch cost for large jobs), the security layer is not an afterthought:

Encryption in transit and at rest across inference, storage, and control planes.
Tenant-level isolation for Dedicated Inference and GPU Clusters—your workloads are not co-mingled with other tenants at the runtime level.
Regional storage options to match data residency requirements (North America, Europe, Asia/Middle East).
SOC 2 Type II and “strict data privacy controls” codified into the platform.

This applies across deployment modes:

Serverless Inference (real-time / variable traffic)
- Best when you care about no infrastructure management and no long-term commitments.
- Still inherits SOC 2 Type II controls and encryption defaults.
- Good fit for prompts that don’t contain the highest-sensitivity data but still need compliance-ready handling.
Batch Inference (large offline jobs)
- Designed to scale to 30 billion tokens with up to 50% less cost.
- Security considerations similar to serverless, but with large dataset handling and data residency becoming more important.
Dedicated Model Inference / Dedicated Container Inference
- Best when you need tenant-level isolation, strict SLOs, and predictable performance.
- Ideal for workloads with customer PII, financial data, or proprietary documents.
- You can deploy endpoints in minutes without shipping artifacts off-platform if you trained on Together’s Accelerated Compute.
GPU Clusters
- For teams that want Kubernetes or Slurm-style control with Together’s performance kernels.
- Best when you bring your own orchestration and need strong isolation plus economic advantages.

How this compares conceptually with DeepInfra

DeepInfra offers model hosting and inference; the key questions are:

Do they provide clear isolation guarantees (e.g., per-customer GPU allocation for steady workloads)?
Are there Dedicated endpoint options with stronger boundaries than generic shared serverless hosting?
How are keys and credentials managed, rotated, and audited?

If DeepInfra is primarily optimized for “host many open-source models quickly,” you’ll want explicit documentation that their security controls scale with enterprise expectations, not just with model variety.

Features & Benefits Breakdown

From a security and compliance perspective, here’s how together.ai’s platform features translate into concrete benefits versus a generic inference provider like DeepInfra.

Core Feature	What It Does	Primary Benefit
SOC 2 Type II & audited controls	Formal third-party audit of security, availability, and process controls.	Easier vendor approval, less friction with security and compliance teams.
Data ownership & privacy guarantees	Ensures your data and models remain fully under your ownership with strict privacy controls.	Reduces legal and regulatory risk; no surprise training-on-your-data behavior.
Regional data residency & encryption	Lets you deploy storage in specific regions with encryption in transit and at rest.	Supports GDPR/industry requirements; simplifies cross-border data flow reviews.
Tenant-level isolation options	Dedicated Model/Container Inference and GPU Clusters isolate workloads at the tenant/runtime level.	Better blast-radius control for sensitive data and predictable performance for production workloads.

Ideal use cases

Best for regulated or security-sensitive workloads:
Because together.ai combines SOC 2 Type II, data residency, encryption, and isolation options with high performance, it’s well suited for:
- Financial services
- Healthcare / life sciences (with HIPAA-aligned options)
- Enterprise SaaS handling customer PII
Best for teams standardizing on an OpenAI-compatible gateway:
Because together.ai exposes an OpenAI-compatible API and lets you pick Serverless vs Dedicated vs GPU Clusters, you can:
- Migrate from multi-provider chaos to a single AI Native Cloud
- Keep sensitive workloads on Dedicated endpoints
- Use more flexible modes for lower-risk use cases, all under one security/compliance umbrella.

For DeepInfra, the best fit is typically:

Lightweight or experimental workloads where:
- Security requirements are lower.
- You prioritize quick model access over audited controls and granular residency guarantees.

Limitations & considerations

together.ai documentation granularity:
While the high-level commitments are clear—SOC 2 Type II, data ownership, encryption, residency—you’ll still want the SOC 2 report, DPA, and security whitepaper for formal review. That’s standard for any enterprise vendor.
DeepInfra specifics may vary over time:
If DeepInfra evolves its security posture (e.g., adds SOC 2 Type II, clearer retention policies), the comparison changes. Always rely on current documentation and signed agreements, not assumptions.

Pricing & plans (security lens)

Together’s pricing is built around deployment modes rather than security “tiers,” which is usually what security teams prefer—you get consistent controls across modes.

Serverless Inference:
Best for teams needing no infrastructure to manage, no commitments, and good security defaults for workloads with moderate sensitivity. The OpenAI-compatible API makes it easy to switch without major code changes.
Dedicated Inference & GPU Clusters:
Best for teams needing tenant-level isolation, strict SLOs, and the ability to tune infrastructure to traffic patterns and latency targets—especially when running sensitive or regulated workloads at scale.

For DeepInfra, clarify:

Whether stronger isolation or compliance features require a different pricing tier or custom enterprise plan.
How costs scale when you move from hobby/experimental workloads to production with strict SLAs and security requirements.

Frequently Asked Questions

Does together.ai store or train on my prompts by default?

Short Answer: together.ai’s positioning is that your data and models remain fully under your ownership, with strict data privacy controls. There’s no default “we own your data” stance.

Details:
Enterprise customers typically sign a DPA that clarifies:

Data is processed solely to provide the service (inference, training you explicitly request, storage you configure).
Together does not take ownership of your prompts, outputs, or models.
Any use of data for system improvement is governed by contractual terms, not buried defaults.

Always confirm the exact wording in your DPA, but the baseline is: you retain ownership, and data privacy is a first-class concern.

How do I compare together.ai vs DeepInfra for a security review?

Short Answer: Use a structured checklist: SOC 2 Type II status, data retention, data use for training, encryption, data residency, and tenant isolation. together.ai will check many of these boxes out of the gate.

Details:
For each provider, capture written answers (and references) to:

Certifications: SOC 2 Type II? Any other relevant attestations?
Data retention: Default retention for logs, prompts, outputs, backups; options to shorten or disable.
Data use: Whether any data is used to train global models; opt-out controls.
Encryption: Transport (TLS), storage encryption, key management practices.
Residency: Region selection for storage and logs; guarantees that data stays in-region.
Isolation: Options for dedicated endpoints or clusters; tenant separation model.
Incident handling: How incidents are detected, communicated, and resolved.

In practice, teams often find together.ai simplifies this process because SOC 2 Type II, data ownership guarantees, and regional options are already baked into the AI Native Cloud.

Summary

When you evaluate together.ai vs DeepInfra on SOC 2 Type II, data retention, and enterprise security, the main difference is maturity of the security and compliance story. Together’s AI Native Cloud combines:

SOC 2 Type II and strict data privacy controls
Clear data ownership and data residency options
Encryption in transit and at rest
Tenant-level isolation via Dedicated Inference and GPU Clusters

all on top of a performance-optimized stack that’s already delivering up to 6× cost reduction and <400ms P95 latencies in production.

DeepInfra can be a good fit for lighter-weight or experimental workloads, but for enterprise teams with security and compliance as gatekeepers—not suggestions—together.ai’s combination of audited controls, data guarantees, and flexible deployment modes is usually the safer and more scalable choice.

Next Step

Get Started