How do we schedule automated refreshes with Tonic Structural for staging and QA environments?

Keeping staging and QA environments in sync with production is a constant fight between speed and safety. You need fresh, realistic data to reproduce bugs and validate releases, but manual refreshes and ad‑hoc scripts slow everything down—and copying raw production data into lower environments creates privacy and compliance risk you can’t ignore.

This is exactly the workflow Tonic Structural is built to own: turn your production databases into high‑fidelity, de‑identified test data, then hydrate staging and QA on an automated schedule, without ever shipping raw PII downstream.

Quick Answer: You schedule automated refreshes with Tonic Structural by defining generation jobs for your staging/QA targets, then wiring those jobs into your scheduler of choice (e.g., Jenkins, GitHub Actions, cron, or your CI/CD platform) using Structural’s CLI, REST API, or SDK. Structural runs the full pipeline—subsetting, de‑identification, synthesis, and write‑out—as a repeatable job that can be triggered on a schedule or per‑build.


The Quick Overview

  • What It Is: A workflow for automatically refreshing staging and QA with de‑identified, production‑shaped data generated by Tonic Structural on a fixed schedule or as part of CI/CD.
  • Who It Is For: Engineering, QA, and DevOps teams who need reliable, production‑like data in non‑prod environments without exposing live customer information.
  • Core Problem Solved: Eliminates manual data refreshes and unsafe production clones, giving you environment‑aware, compliant test data that stays in lockstep with schema changes and releases.

How It Works

At a high level, you configure Structural once for your production source, define how the data should be transformed and subsetted for staging/QA, and then let your automation framework call Structural on a schedule. Structural handles the heavy lifting—pulling from your source database, applying deterministic masking or synthesis, preserving referential integrity, and writing directly into your staging and QA databases via native connectors.

The result: staging and QA are continuously hydrated with realistic, secure data that mirrors the complexity and shape of production, but contains no live identifiers.

  1. Define the generation job in Structural:
    Connect to production, classify sensitive data, configure de‑identification and subsetting rules, and choose your staging/QA destinations. Save this as a repeatable generation.
  2. Automate the trigger via your scheduler:
    Use Jenkins, GitHub Actions, GitLab CI, Azure DevOps, or cron to call Structural’s CLI/REST API on a cadence (nightly, weekly) or per‑build. Pass environment‑specific parameters (volume, destinations, policies) as needed.
  3. Monitor, audit, and iterate:
    Use Structural’s logs and audit trails to track each refresh, verify data volumes and transformation coverage, and adjust policies as your schema or testing requirements evolve.
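The three steps above reduce, on the automation side, to a trigger-and-wait loop: start the generation, poll until it finishes, then let the rest of the pipeline proceed. Here is a minimal Python sketch of that control flow. The endpoint paths and payload shape (`/api/generate`, `/api/jobs/{id}`, `jobId`, `status`) are placeholders, not Structural's documented API, so verify them against Structural's API reference; the HTTP calls are injected as functions so the loop itself is API-agnostic:

```python
import time


def trigger_generation(post, generation_id):
    """Start a generation run; `post(path, body)` performs the HTTP call.

    The endpoint path and payload keys here are illustrative
    placeholders, not Structural's documented API.
    """
    resp = post("/api/generate", {"generationId": generation_id})
    return resp["jobId"]


def wait_for_completion(get, job_id, poll_seconds=30, timeout=3600):
    """Poll the job's status until it reaches a terminal state or times out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get(f"/api/jobs/{job_id}")["status"]
        if status in ("Completed", "Failed", "Canceled"):
            return status
        time.sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} did not finish in {timeout}s")
```

Whatever scheduler you use (Jenkins, GitHub Actions, cron), its job body is essentially these two calls plus a non-zero exit on anything other than a completed run.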

Features & Benefits Breakdown

Core Feature | What It Does | Primary Benefit
Repeatable generation jobs | Encapsulates source connection, transforms, subsetting, and destination configuration into a reusable job. | One‑click or automatically triggered refreshes; no fragile bespoke scripts to maintain.
Native write‑out to staging/QA | Writes masked/synthetic data directly into target databases (e.g., MySQL, PostgreSQL, Snowflake) or test schemas. | Staging and QA are always hydrated with production‑shaped data that preserves foreign keys and cross‑table consistency.
CI/CD and scheduler integration | Hooks into Jenkins and other automation frameworks via CLI, REST API, or SDK. | Turns data refresh into a first‑class part of your release pipeline: nightly refreshes, per‑branch databases, or pre‑deploy checks.

Step‑by‑Step: Scheduling Automated Refreshes with Tonic Structural

1. Model the “ideal” staging/QA dataset in Structural

Before you automate anything, you define what “good” looks like:

  • Connect Structural to production:
    Use native connectors for your source (e.g., PostgreSQL, MySQL, Snowflake).
  • Classify sensitive data:
    Let Structural detect PII/PHI using built‑in rules and, if needed, add custom sensitivity rules to catch domain‑specific columns.
  • Configure transformations:
    • Deterministic masking or format‑preserving encryption for identifiers.
    • Synthetic generation to protect high‑risk columns while preserving realistic distributions.
    • Cross‑table consistency and referential integrity, so joins and app logic still work.
  • Define subsetting for staging and QA:
    • Staging might need a larger, environment‑wide subset for end‑to‑end regression.
    • QA might use smaller, targeted slices for specific test suites.
      Structural’s subsetting keeps referential integrity intact, so you don’t end up with orphaned rows.

Save this configuration as a generation—that becomes the unit your scheduler will call.
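Deterministic masking is what keeps joins working after step 1: the same input must always map to the same masked output, so a customer ID masked in one table still matches the foreign key masked in another. A toy illustration of that property in Python (this shows the general HMAC-based technique, not Structural's internal implementation; the secret key is a placeholder):

```python
import hashlib
import hmac

SECRET = b"rotate-me"  # placeholder; keep real keys in a secret store


def mask_id(value: str, digits: int = 8) -> str:
    """Deterministically map an identifier to a fixed-width numeric token.

    Same input always yields the same output, so foreign keys masked
    independently in different tables still join correctly.
    """
    digest = hmac.new(SECRET, value.encode(), hashlib.sha256).hexdigest()
    return str(int(digest, 16) % 10**digits).zfill(digits)
```

Because the mapping is keyed, the masked values are stable across refreshes only as long as the key is stable, which is exactly the trade-off deterministic masking asks you to manage.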

2. Configure destinations for staging and QA

Structural lets you choose where the transformed data lands:

  • Direct database write‑out:
    • Point to your staging and QA databases or schemas.
    • Structural writes masked/synthetic data directly over native connections.
  • Isolated DBs for short‑lived QA:
    • For branch‑based or PR‑based testing, output to ephemeral databases.
    • Use Structural’s ability to export datasets as container images for containerized test workflows.
  • Environment‑aware policies:
    • Different volume caps per environment.
    • Different refresh cadences (e.g., staging nightly, QA weekly or per build).
    • Access controls so only the right teams can trigger or consume each dataset.

You can create separate generations for staging and QA, each with its own destination and policies, or parameterize a single template.
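If you go the single-template route, one simple pattern is to keep per-environment settings as plain data and merge them over a shared baseline at trigger time. A sketch (the parameter names such as `subset_percent` and `cadence` are illustrative, not Structural's actual option names):

```python
# Shared settings every environment inherits.
BASELINE = {"generation_id": "gen-base", "preserve_fk": True}

# Per-environment overrides; names are illustrative placeholders.
ENVIRONMENTS = {
    "staging": {"destination": "staging-db", "subset_percent": 40,
                "cadence": "nightly"},
    "qa":      {"destination": "qa-db", "subset_percent": 10,
                "cadence": "weekly"},
}


def run_params(env: str) -> dict:
    """Merge the shared baseline with environment-specific overrides."""
    if env not in ENVIRONMENTS:
        raise KeyError(f"unknown environment: {env}")
    return {**BASELINE, **ENVIRONMENTS[env]}
```

The scheduler then passes only the environment name; everything else is resolved from one place instead of being duplicated across job definitions.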

3. Tie Structural into your scheduler or CI/CD

Once your generation jobs are defined, you automate refreshes through your existing tooling. Common patterns:

Using Jenkins (or another CI server)

  1. Create a Jenkins job dedicated to test data refresh (e.g., staging-data-refresh).
  2. Call Structural’s CLI or REST API in your build steps:
    • Trigger a specific generation ID.
    • Pass environment parameters (e.g., destination connection, subset size).
  3. Schedule the job using Jenkins’ cron syntax:
    • Nightly at 2am, before daily regression runs.
    • Weekly, aligned with sprint boundaries.

Structural’s integration with Jenkins and other automation frameworks is straightforward—you’re essentially just telling Jenkins: “Run this Structural generation and wait until it finishes.”
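In practice the Jenkins build step usually shells out to a thin wrapper script. A sketch of a helper that assembles that command line (the `tonic-cli` binary name and every flag here are placeholders; substitute the actual CLI invocation from Structural's documentation):

```python
def build_refresh_command(env: str, generation_id: str) -> list:
    """Assemble the shell command a Jenkins build step would run.

    Binary, subcommand, and flag names are illustrative placeholders,
    not Structural's real CLI surface.
    """
    return [
        "tonic-cli", "generate",            # placeholder binary/subcommand
        "--generation-id", generation_id,
        "--destination", f"{env}-db",
        "--wait",                           # block until the run finishes
    ]
```

Having the step block until completion (rather than fire-and-forget) is what lets downstream Jenkins stages, such as the regression suite, safely assume fresh data.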

Using GitHub Actions, GitLab CI, or Azure DevOps

  • Define a workflow that runs on a schedule (on: schedule) or on events (on: push, on: pull_request).
  • Add a step that calls Structural via:
    • Docker container (Structural running where your CI agent lives).
    • REST API request to your deployed Structural instance.
  • Use environment variables/secrets to inject credentials and generation IDs.

This pattern is powerful for:

  • Per‑branch or per‑PR databases:
    Combine Structural’s output‑to‑repos/containers with your CI to spin up isolated databases for each PR, run tests, then tear them down.
  • Pre‑deploy gating:
    Refresh staging as part of a release candidate pipeline, then run your full integration suite on fresh, realistic data.
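Per-branch databases need a deterministic, engine-safe name derived from the branch or PR, so repeated pipeline runs target the same ephemeral database and teardown can find it. A small helper (the naming scheme is a suggestion, not anything Structural mandates):

```python
import re


def ephemeral_db_name(branch: str, pr_number: int) -> str:
    """Derive a database identifier that is safe for most engines:
    lowercase, alphanumerics and underscores only, bounded length."""
    slug = re.sub(r"[^a-z0-9]+", "_", branch.lower()).strip("_")
    return f"qa_pr{pr_number}_{slug}"[:63]  # 63 = common identifier limit
```

The same function runs in both the create step and the teardown step, which is what makes cleanup reliable.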

Using cron or platform schedulers

If you’re not ready to couple with CI/CD, you can still automate:

  • A cron job that:
    • Calls Structural’s CLI on a set schedule.
    • Executes the staging and QA generations.
  • A cloud scheduler (e.g., AWS EventBridge, GCP Cloud Scheduler):
    • Triggers a lambda/function that calls Structural’s REST API.

The key principle: Structural is the engine that understands your schema, masking policies, and destinations; your scheduler just needs to call it on a rhythm.
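The cloud-scheduler variant is a few lines of function code. A sketch of an AWS Lambda-style handler that EventBridge invokes on a schedule, assuming an API-key header and a hypothetical `/api/generate` endpoint (verify the real path, auth header format, and payload against Structural's API reference before using):

```python
import json
import os
import urllib.request


def build_trigger_request(base_url, api_key, generation_id):
    """Build the HTTP request that starts a generation run.

    The endpoint path, payload keys, and auth header shape are
    assumptions; check Structural's API docs for the real ones.
    """
    return urllib.request.Request(
        f"{base_url}/api/generate",  # hypothetical endpoint
        data=json.dumps({"generationId": generation_id}).encode(),
        headers={"Authorization": f"Apikey {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )


def handler(event, context):
    """EventBridge-invoked entry point: fire the request, return status."""
    req = build_trigger_request(
        os.environ["STRUCTURAL_URL"],
        os.environ["STRUCTURAL_API_KEY"],
        event["generation_id"],
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return {"statusCode": resp.status}
```

Credentials come from the function's environment (backed by your secrets manager), so the scheduler itself never holds them.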

4. Make refreshes environment‑aware

Real value comes from treating each environment as a first‑class citizen, not a clone:

  • Different volumes
    • Staging: larger subsets mirroring real load.
    • QA: smaller, faster datasets optimized for test cycle times.
  • Different policies
    • Tighter de‑identification in broader‑access environments.
    • Potentially more detailed synthetic data for internal‑only performance environments.
  • Different cadences
    • Staging nightly, QA weekly, performance on demand.
    • Short‑lived PR environments created and destroyed per pipeline run.

Structural supports this through policies that control volume, refresh cadence, and access—so you’re not maintaining N different script stacks.

5. Audit, verify, and iterate

Once the pipeline is live, you need confidence it’s doing what you think:

  • Audit trails in Structural
    • Export audit logs that track transformations applied to each dataset.
    • Use them to support governance reviews and compliance workflows.
  • Monitoring and alerts
    • Watch for schema change alerts so new sensitive columns don’t slip through unmasked.
    • Integrate logs into your existing monitoring stack.
  • Feedback from QA and developers
    • Confirm that test data still mirrors production complexity.
    • Add or adjust synthetic distributions when new edge cases appear in production.

Teams that treat this as an iterative pipeline—not a one‑time project—see the biggest payoff: fewer escaped defects and faster release velocity.


Features & Benefits Breakdown (Applied to Scheduled Refreshes)

Core Feature | What It Does | Primary Benefit
Environment‑aware policies | Define volume, cadence, and access rules per environment. | Staging, QA, and ephemeral environments each get the right shape and size of data, without separate tooling.
Referentially intact subsetting | Preserves foreign keys and cross‑table consistency in subsets. | Your apps and tests behave like they do in production; no broken joins or orphaned data.
Schema change alerts | Detects new or changed columns in the source schema that impact sensitivity. | Protects you from silent regressions where new PII fields bypass masking and land in staging.

Ideal Use Cases

  • Best for nightly staging refreshes: Because it can automatically pull from production, de‑identify, subset, and write into staging before your regression suite runs—without you touching production data directly in non‑prod.
  • Best for CI‑driven QA environments: Because you can integrate Structural into Jenkins or your CI/CD pipeline to generate fresh, environment‑specific datasets per branch or build, enabling parallel test runs and short‑lived test databases.

Limitations & Considerations

  • Initial setup investment:
    You need to spend time upfront connecting sources/destinations, defining transforms, and modeling subsets. The payoff is repeatability and automated refreshes, but plan for a proper onboarding window instead of treating it like a “quick masking script.”
  • Infrastructure placement:
    Structural must be deployed where it can securely reach both your production sources and non‑prod targets (e.g., self‑hosted on Kubernetes, Docker, or cloud VMs). For strict network segmentation, coordinate with your infra team to align networking and access controls.

Pricing & Plans

Tonic doesn’t publish granular pricing, but its plans generally align with:

  • Team / Mid‑market tiers:
    Best for product and QA teams who need reliable, de‑identified test data for a small number of core applications and environments, with CI/CD integration and standard support.
  • Enterprise tiers:
    Best for large organizations with multiple regulated datasets, complex environment topologies, and strict compliance requirements (SOC 2 Type II, HIPAA, GDPR). These tiers typically include SSO/SAML, advanced governance features, and deployment flexibility (Tonic Cloud or fully self‑hosted).

For precise pricing and plan alignment, the next step is usually a tailored discussion around your data sources, environment count, and refresh requirements.


Frequently Asked Questions

Can we run different refresh cadences for staging and QA with the same Structural setup?

Short Answer: Yes. You can use the same Structural configuration and trigger it on different schedules or create environment‑specific generations with distinct policies.

Details:
In practice, most teams either:

  • Define one generation that contains all transforms and subsetting logic, then:
    • Run it nightly for staging via a Jenkins job.
    • Run it weekly or on‑demand for QA via a separate scheduler.

or

  • Create two generations derived from the same baseline:
    • staging-generation: larger subset, more frequent cadence.
    • qa-generation: smaller, faster subset tailored to test suites.

Your scheduler (Jenkins, GitHub Actions, etc.) handles cadence; Structural guarantees that each run applies the same transformation policies and preserves referential integrity.


How do we avoid accidentally pushing raw production data into staging during a refresh?

Short Answer: By making Structural the only engine that touches production data for non‑prod, and enforcing de‑identification policies and schema change alerts inside Structural.

Details:
The failure mode you’re avoiding is side‑door access: a manual dump or legacy script bypasses your masking pipeline. To prevent this:

  • Centralize the workflow:
    Only Structural has permission to read from the production source and write to staging/QA targets for refresh purposes.
  • Define strict de‑identification policies:
    Use Structural’s masking and synthesis tools across all sensitive columns, plus custom sensitivity rules for domain‑specific fields.
  • Enable schema change awareness:
    When new columns are added upstream, Structural surfaces them so you can apply appropriate transforms before they ever flow to staging.
  • Audit trails and logs:
    Structural’s audit logs show what was transformed and when, giving you evidence for governance and an early warning system if something’s off.

This shifts privacy from “please follow the policy” to “the pipeline enforces the policy,” which is the only model that scales.


Summary

Automated refreshes with Tonic Structural turn staging and QA from brittle, manually curated environments into first‑class, production‑shaped test beds. You define a generation once—connecting to production, configuring de‑identification and subsetting, and wiring in destinations—then let your scheduler or CI/CD platform call Structural on a cadence that matches your release rhythm.

Because Structural preserves referential integrity, cross‑table consistency, and statistical properties while stripping out sensitive data, your tests run on realistic datasets without the compliance baggage of raw production clones. Customers running this pattern have cut test data provisioning time by more than half and seen measurable gains in developer productivity and fewer defects escaping to production.

If you’re still hydrating staging and QA with a mix of SQL dumps, shell scripts, and “do not copy” warnings, you’re leaving both velocity and safety on the table. This is a pipeline problem, and it’s one Structural is designed to own end‑to‑end.


Next Step

Get Started