
How do we use Tonic Structural to subset a huge database for local dev without breaking relationships?
Most teams hit the same wall the first time they try to bring a truly large production database down to a laptop: you either copy too much and overwhelm local resources, or you clip the data and quietly break foreign keys, business rules, and edge cases. Tonic Structural exists to solve exactly that tension—give developers a production-shaped subset that still behaves like the real thing, without dragging sensitive data or petabytes of rows into lower environments.
Quick Answer: Use Tonic Structural’s patented, graph-based subsetter to define the slice of production you want, then let it automatically pull all related records across tables while keeping referential integrity intact. The result is a smaller, fully consistent dataset that runs locally, preserves real-world behavior, and strips out sensitive information.
The Quick Overview
- What It Is: Tonic Structural is a production data transformation engine that subsets, de-identifies, and optionally synthesizes structured and semi-structured data while preserving relationships, formats, and distributions.
- Who It Is For: Engineering, QA, data, and AI teams who need realistic, production-shaped data for local dev, CI environments, and automated testing—but can’t safely copy raw production into those environments.
- Core Problem Solved: It lets you shrink huge production databases down to laptop-friendly, privacy-safe subsets without breaking foreign keys, cross-table logic, or critical edge cases.
How It Works
At its core, Structural connects to your production database, builds a graph of how your tables relate, and then uses that graph to generate subsets that maintain referential integrity end-to-end. Instead of hard-coding JOIN logic or hand-curating CSVs, you define the entry points and size constraints; Structural traces dependencies through the graph, pulls the right rows, and applies masking or synthesis in one pass.
1. Map & Classify Your Schema: Structural ingests your schema, infers relationships, and lets you validate or add foreign-key and dependency rules. It also scans and tags sensitive columns (PII, PHI, etc.) so you can decide how to transform them.
2. Design the Subset Strategy: You choose subset “anchors” (like a set of customers, accounts, or projects), define how far to traverse the graph (depth and direction of relationships), and set size or percentage targets. Structural’s patented subsetter then computes the exact rows needed across tables to keep the dataset coherent.
3. Generate, Transform, and Deliver: Structural executes the subset plan inside your environment, applies de-identification and synthesis transforms, and writes out a fully relational, smaller database for local dev or staging. You can schedule and version this process so developers always have a consistent, refreshable dataset.
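To make the graph idea concrete, here is a minimal, self-contained sketch of graph-based subsetting. This is illustrative only, not Tonic's actual algorithm or API: the `FOREIGN_KEYS`, `TABLES`, and `subset` names are hypothetical, and the in-memory tables stand in for a real database. The point is that rows are selected by traversing relationships from an anchor, so every selected child row's parent is guaranteed to be in the subset.

```python
from collections import defaultdict, deque

# Hypothetical schema graph: parent table -> list of (child_table, child_fk_column).
# Illustrative sketch only; Tonic Structural's real engine works against live databases.
FOREIGN_KEYS = {
    "customers": [("orders", "customer_id")],
    "orders": [("payments", "order_id")],
}

# Tiny in-memory stand-ins for production tables.
TABLES = {
    "customers": [{"id": 1}, {"id": 2}, {"id": 3}],
    "orders": [{"id": 10, "customer_id": 1}, {"id": 11, "customer_id": 2}],
    "payments": [{"id": 100, "order_id": 10}],
}

def subset(anchor_table, anchor_ids):
    """Traverse child relationships breadth-first from the anchor rows,
    collecting every row needed to keep foreign keys valid in the subset."""
    selected = defaultdict(set)  # table name -> set of selected primary keys
    queue = deque((anchor_table, pk) for pk in anchor_ids)
    while queue:
        table, pk = queue.popleft()
        if pk in selected[table]:
            continue  # already pulled this row
        selected[table].add(pk)
        # Follow each outgoing relationship to the child rows that reference pk.
        for child, fk_col in FOREIGN_KEYS.get(table, []):
            for row in TABLES[child]:
                if row[fk_col] == pk:
                    queue.append((child, row["id"]))
    return {t: sorted(ids) for t, ids in selected.items()}

print(subset("customers", [1]))
# {'customers': [1], 'orders': [10], 'payments': [100]}
```

Anchoring on customer 1 pulls exactly that customer's orders and payments; nothing is orphaned, and nothing unrelated comes along.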
Features & Benefits Breakdown
| Core Feature | What It Does | Primary Benefit |
|---|---|---|
| Graph-Based Subsetting with Referential Integrity | Builds a dependency graph across tables and selects records by traversing relationships, not just applying raw filters. | Local datasets shrink by orders of magnitude while foreign keys, joins, and app logic still work as if you were hitting production. |
| Built-In De-Identification & Synthesis | Applies deterministic masking, format-preserving transforms, or full synthesis to sensitive fields while preserving distributions and formats. | You can use production-shaped data for dev and QA without exposing real identities or violating compliance. |
| Repeatable, Configurable Pipelines | Saves subset rules, sensitivity policies, and transforms as reusable configs, with automation hooks for CI/CD. | Developers get consistent, refreshable local databases without waiting days for one-off exports or manual scripts. |
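The deterministic masking row above has a property worth spelling out: the same input always maps to the same fake output, so a masked value stays consistent everywhere it appears and joins still line up. Below is a minimal sketch of that idea, assuming a hypothetical `mask_email` helper and per-project secret; it is not Tonic's implementation, just the underlying technique.

```python
import hashlib

# Hypothetical per-project secret; in a real tool this would be managed securely.
SECRET = b"per-project-secret"

def mask_email(email: str) -> str:
    """Deterministically replace an email with a fake one of the same shape.
    The same input always yields the same output, so masked values stay
    consistent across tables and joins keep working."""
    local, _, _domain = email.partition("@")
    digest = hashlib.sha256(SECRET + local.encode()).hexdigest()[:10]
    return f"user_{digest}@example.com"

# Determinism: a customer email appearing in two tables masks identically.
assert mask_email("alice@corp.com") == mask_email("alice@corp.com")
# Distinct inputs map to distinct outputs (with overwhelming probability).
assert mask_email("alice@corp.com") != mask_email("bob@corp.com")
```

Format preservation follows the same pattern: keep the structure (here, `local@domain`) and replace only the sensitive part.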
Ideal Use Cases
- Best for massive OLTP/OLAP databases powering web or mobile apps: Because it automatically preserves cross-table consistency—user → orders → payments → logs—while giving each developer a smaller copy they can run locally.
- Best for regulated teams needing realistic data without compliance risk: Because Structural transforms sensitive production data into safe, high-fidelity test data and supports HIPAA-, GDPR-, and SOC 2–aligned workflows.
Limitations & Considerations
- Subsetting still requires an initial schema understanding: You’ll get more out of Structural if you can identify key anchor entities and critical workflows; plan to spend a bit of time aligning the subset design with how your app actually behaves.
- Local resource constraints still matter: Structural can shrink an 8 PB environment down to gigabytes, but you still need to calibrate subset size to fit your local dev machines and test suite performance.
Pricing & Plans
Tonic Structural is licensed for organizations that need to operationalize safe, realistic test data at scale across multiple environments and teams. Pricing typically reflects database footprint, deployment model (Tonic Cloud or self-hosted), and feature needs (e.g., enterprise SSO/SAML, advanced governance).
- Team / Growth Plan: Best for engineering orgs and QA teams needing to hydrate dev/staging with realistic subsets from a few core production databases.
- Enterprise Plan: Best for larger organizations with multiple regulated data stores, complex compliance requirements, and a need to standardize test data provisioning across dozens or hundreds of services.
(For exact pricing, Tonic will scope to your estate and usage pattern.)
Frequently Asked Questions
How do we actually configure a subset in Structural so it doesn’t break relationships?
Short Answer: You define one or more anchor tables and rules, and Structural’s graph-based subsetter automatically pulls in all related rows while maintaining referential integrity across the schema.
Details:
The operational pattern looks like this:
1. Connect and discover:
   - Point Structural at your production database (e.g., Postgres, MySQL, SQL Server, Snowflake).
   - Structural analyzes the schema, ingesting declared foreign keys and inferring additional relationships from naming and usage patterns.
2. Validate relationships:
   - In the UI, confirm or adjust inferred relationships. You can add logical dependencies even when there’s no explicit FK (e.g., `orders.account_id` depends on `accounts.id`).
   - Structural builds a graph view that shows how tables connect—this is the backbone of the subsetter.
3. Pick anchors for local dev:
   For local development, you typically anchor on entities that map to “what a developer actually pokes” in the app, e.g.:
   - A cohort of customers or tenants
   - A subset of projects, repos, or accounts
   - A specific region, feature flag, or product line
   You can select those anchors via filters (e.g., 1% of customers, region = ‘US’, or a list of IDs) or via sampling strategies.
4. Define traversal rules:
   - Tell Structural how far to traverse the graph from those anchors (e.g., pull all related orders, payments, line items, events).
   - Configure direction: downstream (child records), upstream (parents), or both.
   - Set caps to avoid runaway expansion (e.g., maximum depth, max rows per table, exclusion rules for low-value, high-volume logs).
5. Simulate and inspect:
   - Structural can simulate the subset and show you resulting row counts by table.
   - You can inspect sampled records to confirm that the subset captures realistic user journeys and critical edge cases.
6. Generate with transforms:
   - Once the subset looks right, you apply masking/synthesis policies (deterministic masking, format-preserving encryption, etc.) to sensitive columns.
   - Structural then executes the subset plan, writing out to a target database or files that your local environment can ingest.
Because the subsetter operates on the schema graph, not ad-hoc WHERE clauses, you avoid the usual failure modes: orphaned records, broken foreign keys, or missing IDs that cause app errors as soon as a developer logs in.
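The traversal caps in step 4 can be sketched in code. The example below is a conceptual illustration, not Tonic's engine or configuration format: `capped_subset`, `FOREIGN_KEYS`, and the toy tables are all hypothetical. It shows how a maximum depth and a per-table row cap keep a high-volume leaf table (here, `events`) from ballooning the subset.

```python
from collections import defaultdict, deque

# Hypothetical schema graph and toy tables; illustrative only.
FOREIGN_KEYS = {
    "accounts": [("orders", "account_id")],
    "orders": [("events", "order_id")],
}
TABLES = {
    "accounts": [{"id": 1}],
    "orders": [{"id": 10, "account_id": 1}, {"id": 11, "account_id": 1}],
    "events": [{"id": n, "order_id": 10} for n in range(100, 1000)],  # 900 log rows
}

def capped_subset(anchor_table, anchor_ids, max_depth=2, row_caps=None):
    """Breadth-first traversal from the anchors, bounded by a maximum depth
    and optional per-table row caps. Capping is only safe on leaf tables
    (logs, events) where dropped child rows don't orphan anything."""
    row_caps = row_caps or {}
    selected = defaultdict(set)  # table -> selected primary keys
    queue = deque((anchor_table, pk, 0) for pk in anchor_ids)
    while queue:
        table, pk, depth = queue.popleft()
        cap = row_caps.get(table)
        if pk in selected[table] or (cap is not None and len(selected[table]) >= cap):
            continue  # already selected, or this table hit its row cap
        selected[table].add(pk)
        if depth >= max_depth:
            continue  # stop traversing past the depth limit
        for child, fk_col in FOREIGN_KEYS.get(table, []):
            for row in TABLES[child]:
                if row[fk_col] == pk:
                    queue.append((child, row["id"], depth + 1))
    return {t: len(ids) for t, ids in selected.items()}

print(capped_subset("accounts", [1], max_depth=2, row_caps={"events": 25}))
# {'accounts': 1, 'orders': 2, 'events': 25}
```

Without the cap, the single anchored account would drag in all 900 event rows; with it, the subset keeps every account and order but only a bounded sample of the log-like table.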
How small can we go for local dev before we lose realism?
Short Answer: You can shrink a huge production database down to a few gigabytes for local dev, as long as you anchor on realistic entities and preserve the relationship graph; Structural’s subsetting keeps behavior intact even at small scales.
Details:
What matters for local dev isn’t raw row count—it’s whether the data exercises the flows your code depends on:
- Preserve variety, not just volume: Instead of “give me 1% of every table,” you can say “give me 500 customers with a mix of active/inactive statuses, churned accounts, high-volume tenants, and at least one with edge-case data (very long names, weird encodings, etc.).” Structural pulls the full relational footprint for those customers.
- Target critical workflows: For example, if you’re debugging payments, you anchor on accounts with a variety of payment methods and failure modes. Structural then pulls all related invoices, transactions, chargebacks, and audit logs.
- Use subsetting plus synthesis: When certain tables explode (clickstream, logs, events), you can subset more aggressively and optionally synthesize additional rows that match statistical properties without dragging the entire firehose down to a laptop.
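To illustrate the "synthesize rows that match statistical properties" idea in the last bullet, here is a minimal sketch using only the standard library. It is hypothetical and deliberately simplistic (a single numeric column, a fitted normal distribution); real synthesis engines model far richer structure, and none of these names come from Tonic's product.

```python
import random
import statistics

random.seed(7)  # fixed seed so the sketch is reproducible

# A small real sample of a numeric column (e.g., request durations in ms)
# kept from the aggressively subsetted events table. Values are invented.
sampled_durations = [102, 98, 110, 95, 105, 99, 101, 97]

# Fit the sample's statistical properties.
mu = statistics.mean(sampled_durations)
sigma = statistics.stdev(sampled_durations)

def synthesize_events(n):
    """Generate n synthetic rows whose duration column is drawn from a
    normal distribution fitted to the real sample (clamped at zero)."""
    return [{"duration_ms": max(0, round(random.gauss(mu, sigma)))} for _ in range(n)]

# Inflate the subset with synthetic rows that track the real distribution.
synthetic = synthesize_events(1000)
syn_mu = statistics.mean(r["duration_ms"] for r in synthetic)
# The synthetic mean lands close to the sample mean, within sampling noise.
```

The trade is explicit: you keep a small real slice for fidelity on critical rows, then pad high-volume tables with cheap synthetic rows that preserve aggregate behavior for load-sensitive tests.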
Customers routinely report cutting multi-terabyte environments to datasets that are hundreds or thousands of times smaller, while still uncovering defects that were previously escaping into production because their old test data was too toy-like.
Summary
Subsetting a huge production database for local dev is not just a storage problem—it’s a relationship problem. If you cut the wrong way, your app stops working, your tests become meaningless, and your “quick fix” turns into a maintenance nightmare of scripts and manual exports.
Tonic Structural solves this by treating subsetting as a graph problem: model the relationships, define anchors that map to real user journeys, traverse intelligently, and transform sensitive data as you go. You end up with a small, fast, and safe dataset that still looks and behaves like production—so developers can move faster without opening new breach surfaces or violating compliance.