
How do I connect Tonic Structural to our Postgres/MySQL database and run a first de-identified dataset?
Most engineering teams hit the same wall: they need production-like data in Postgres or MySQL for dev and QA, but they can’t afford the risk of cloning raw production into every lower environment. Tonic Structural exists to break that tradeoff—letting you connect directly to your live databases, de-identify the sensitive bits, and hydrate staging with high-fidelity, referentially intact test data.
Quick Answer: You connect Tonic Structural to Postgres/MySQL using a read-access connection, let it automatically profile your schema, choose de-identification transforms for sensitive columns, and then write the output into a separate test schema or database. In one run, you go from risky production clones to a safe, production-shaped dataset your apps and tests can actually use.
The Quick Overview
- What It Is: A structured and semi-structured data de-identification and synthesis engine that connects directly to your Postgres/MySQL databases, preserves referential integrity, and outputs test-ready datasets.
- Who It Is For: Dev, QA, data, and security teams that need realistic staging/local data without copying raw PII/PHI out of production.
- Core Problem Solved: Turning production databases into safe, production-like test datasets without DIY scripts, broken foreign keys, or weeks of approvals.
How It Works
At a high level, you point Tonic Structural at your source database (Postgres or MySQL), define where the de-identified data should land (usually a separate schema or a dedicated test database), and let Structural handle the heavy lifting: schema discovery, sensitivity detection, transform application, and writing the masked/synthetic data back out with referential integrity intact.
Here’s the flow you’ll follow for a first run:
- Connect & Profile: Configure a secure connection to your Postgres/MySQL instance, select the source schema, and let Structural automatically scan and map out tables, relationships, and basic statistics.
- Define Destination & Transforms: Choose where the de-identified data should be written (test schema or database), then apply recommended or custom transforms to sensitive columns while preserving structure and relationships.
- Generate & Validate: Run the job to create your first de-identified dataset, validate that your applications and tests run as expected, and iterate on transforms as needed.
Step 1: Prepare access to your Postgres/MySQL database
Before you plug anything into Tonic Structural, set up database access with the right blast radius:
- Create a dedicated DB user for Structural with:
- Read access to the source schema(s) you want to de-identify.
- Write access to the target schema or database where the de-identified data will land.
- Avoid superuser/admin roles. Structural doesn’t need them; minimal privilege keeps your security team happy.
- Network access: Ensure the Structural deployment (Docker, Kubernetes, or cloud) can reach your Postgres/MySQL host and port over TLS if available.
Typical permission pattern:
- Source
production_db:SELECTon relevant schemas/tables.
- Target
test_dborprod_sanitized_schema:CREATE,INSERT,UPDATE,DELETE,ALTERas needed for writing de-identified data.
This ensures Structural can fully hydrate the target without ever mutating your production data.
Step 2: Connect Tonic Structural to Postgres/MySQL
Once your connectivity and permissions are in place, you connect Structural via its UI (or API) using native connectors.
In the Structural UI:
- Add a new data source.
- Choose PostgreSQL or MySQL as the database type.
- Provide:
- Hostname (or IP)
- Port
- Database name
- Username and password for the restricted Structural user
- SSL/TLS settings if required
- Test the connection to validate network and credentials.
- Save the data source.
Structural will now be able to:
- Automatically scan your schema.
- Track tables, relationships, and column statistics.
- Power subsetting and cross-table consistency.
Step 3: Let Structural profile your schema
Manual profiling is where most DIY masking projects die—hours of SQL for cardinality, foreign key inference, and schema archaeology. Structural automates this.
After connecting:
- Select the source schema(s) you want to work with.
- Trigger a schema scan:
- Structural captures:
- Table structure and data types.
- Explicit foreign keys.
- Inferred relationships where possible.
- Basic statistics (distributions, cardinality) for smarter transforms.
- Structural captures:
- Review:
- The schema graph to see how tables tie together.
- Detected relationships that will drive referential integrity during generation.
This is the foundation that keeps your joins, app logic, and reports working on the de-identified dataset.
Step 4: Configure the destination for de-identified data
Next you decide where your de-identified Postgres/MySQL dataset will live. Structural is built to write directly to target databases via native connectors, so you avoid manual export/import steps.
Common patterns:
- Postgres → Postgres (same cluster):
- Source:
production_db.public - Target:
test_db.publicorproduction_db.sanitized
- Source:
- MySQL → MySQL (same server or replica):
- Source:
prod_db - Target:
prod_db_sanitizedor a dedicated test schema
- Source:
In the Structural project:
- Select your source connection (Postgres/MySQL).
- Add a destination connection:
- Often another Postgres/MySQL database or schema with write permissions only.
- Define mapping:
- Source schema → Target schema.
- Decide whether Structural creates tables or writes into pre-created ones.
You can also configure Structural to export to:
- Files (CSV, SQL dumps) for one-off pipelines.
- Container images to drop ready-made datasets into containerized CI/CD workflows.
For a first run, writing back into a dedicated test database is usually simplest.
Step 5: Identify sensitive data and choose transforms
This is where speed and safety intersect. You want to strip PII/PHI while preserving the utility your tests depend on.
In Structural:
- Run sensitivity detection (if enabled):
- Structural can flag likely sensitive columns (names, emails, phone numbers, addresses, IDs).
- Review tables and columns:
- Classify them as:
- Sensitive: must be de-identified (PII/PHI, secrets).
- Quasi-identifiers: may need transformation to avoid re-identification.
- Non-sensitive: safe to leave as-is (e.g., flags, statuses, non-identifying metrics).
- Classify them as:
- Apply transforms per column:
- Options typically include:
- Deterministic masking for values that must remain linkable across tables (e.g., user IDs, emails).
- Format-preserving encryption for structured identifiers (e.g., credit card-like fields).
- Randomization/synthesis to generate realistic but fake values while preserving distributions.
- Nulling or generalization for fields you don’t need at all.
- Options typically include:
- Ensure cross-table consistency:
- For keys referenced in multiple tables, configure consistent transforms so foreign-key joins still work.
- Structural’s understanding of relationships ensures that referential integrity and cross-table consistency are preserved in the generated dataset.
Remember the goal: the data behaves like production—constraints hold, distributions look right—without representing real customers.
Step 6: Run your first de-identified dataset generation
With source, destination, and transforms configured, you’re ready for the first full run.
- Dry run / preview (optional):
- Sample a subset of data to validate that transforms look right.
- Check a few high-value tables for realism and uniqueness where it matters.
- Execute the job:
- Structural reads from your Postgres/MySQL source.
- Applies configured de-identification and synthesis transforms.
- Writes out to your target schema or database with:
- Referential integrity preserved.
- Relationships intact.
- Schema mirrored.
- Monitor progress:
- Use the Structural UI or logs to track status and any errors.
- If a table or transform fails, Structural surfaces the issue so you can adjust.
At the end of this run, you’ve created your first sanitized, production-shaped dataset ready for dev, QA, demos, or analytics.
Step 7: Validate against real workflows
The test of any de-identified dataset isn’t how clever the transforms are—it’s whether your workflows run without surprises.
Once Structural has hydrated your target database:
- Point your application (dev or staging config) at the de-identified Postgres/MySQL database.
- Run:
- Regression test suites.
- Critical user journeys (signup, purchase, workflow completion).
- Key reports and analytics queries.
- Watch for:
- Broken foreign keys or failed joins.
- UI errors driven by unexpected nulls or invalid formats.
- Performance cliff-dives due to radically changed distributions.
When teams switch from DIY masking to Structural, they routinely see:
- Fewer escaped defects because test data mirrors production complexity.
- Faster environment refresh cycles (Patterson, for example, generated test data 75% faster and boosted developer productivity by 25%).
If something looks off, iterate your transforms and rerun—Structural is designed for this loop.
Features & Benefits Breakdown
| Core Feature | What It Does | Primary Benefit |
|---|---|---|
| Native Postgres/MySQL connectors | Connects directly to your databases with read/write access where needed. | Eliminates brittle export/import cycles; integrates cleanly into existing DB workflows. |
| Automatic schema & relationship profiling | Scans schema, statistics, and foreign key relationships in minutes. | Preserves referential integrity so your apps and tests work on de-identified data. |
| Configurable de-identification & synthesis | Applies column-level transforms while preserving structure and distributions. | Delivers high-fidelity, production-like datasets without exposing real identities. |
Ideal Use Cases
- Best for hydrating dev and staging with safe data: Because Structural keeps foreign keys working and preserves statistical properties, your Postgres/MySQL-backed services behave like they do in production—without the risk of cloning PII into every lower env.
- Best for CI/CD and containerized test workflows: Because Structural can write directly to target databases or export container images, you can bake de-identified datasets into automated pipelines and ephemeral test environments.
Limitations & Considerations
- Initial schema and transform design takes thought: Structural automates profiling, but you still need to make explicit decisions about what’s sensitive and how to treat it. Plan a first pass plus at least one iteration after test validation.
- Not a replacement for DB-level access controls: Structural drastically reduces exposure by avoiding raw production clones in lower envs, but you still need role-based access control, encryption, and audit practices on the target Postgres/MySQL instances.
Pricing & Plans
Tonic Structural is licensed on an annual basis with volume-based pricing, tailored to the size and complexity of your data estate and deployment model (Tonic Cloud or self-hosted on Docker/Kubernetes across AWS, GCP, or Azure).
Typical patterns:
- Team/Department Plan: Best for product and QA teams needing to hydrate a handful of Postgres/MySQL-backed applications with safe test data on a regular refresh cadence.
- Enterprise Plan: Best for organizations with multiple regulated workloads, needing broad database coverage, SSO/SAML, strict compliance (SOC 2 Type II, HIPAA, GDPR), and deep CI/CD integrations.
For precise sizing and pricing, Tonic routes teams through a short discovery to match licensing to your environment footprint.
Frequently Asked Questions
Do we need to copy our production Postgres/MySQL data out first?
Short Answer: No. Structural connects directly to your source database, reads in place, and writes de-identified data to your chosen target.
Details: The recommended pattern is to grant Structural read access to your production database and write access only to the test/staging target. Structural uses native connectors to pull from Postgres/MySQL, apply transforms, and hydrate the destination. This avoids ad-hoc dumps scattered across laptops and file shares—exactly the kind of unofficial copies that expand your breach surface.
Will de-identification break our foreign keys or application logic?
Short Answer: No, not when you use Structural’s relationship-aware transforms.
Details: Because Structural profiles schema and foreign key relationships upfront, it understands how your tables connect. When you apply transforms, Structural maintains cross-table consistency, ensuring that keys and referenced values stay in sync. Deterministic transforms and synthesis are applied in a way that keeps joins valid, constraints satisfied, and distributions realistic—so your tests still exercise the same logic paths as production.
Summary
Connecting Tonic Structural to your Postgres or MySQL database and running a first de-identified dataset is a straightforward path to safer, faster shipping: configure a restricted DB user, point Structural at your source, let it profile relationships, define where the sanitized data should land, apply relationship-aware transforms, and execute the job. The result is a high-fidelity, referentially intact dataset that mirrors the complexity of production without exposing real customers—fueling dev, QA, and AI workflows with zero-regret test data.