How can we let engineers use AI with company code while still having audit trails and approvals?

Most engineering leaders are stuck in the same bind: you want your teams to use AI directly on company code, but you also need a clean audit trail, clear approvals, and no surprises in production. Generic copilots solve the “type faster” problem; they don’t solve “who saw what, who changed what, and under whose authority.”

This is where agent-native systems start to matter. If you design AI around delegated tasks instead of autocomplete—refactors, incident response, migrations—you can wire in the same controls you expect from any serious production system: permissions, logs, approvals, and environment isolation.

Below is a practical, system-level way to let engineers use AI with company code while keeping an audit trail and an approval flow, using the way we built Factory Droids as the reference model.


1. Start with strict permissions, not model settings

The first guardrail is who can access what, not which model you use.

A safe pattern is: AI agents can only see what the human user can already see in the source system. That means permissions are enforced at the integration layer, not inside an LLM prompt.

With Factory, Droids inherit the user’s permissions from GitHub/GitLab/Bitbucket, ticketing systems, and document stores. The agent never gets a “superuser” view of your monorepo or production environment. Concretely:

  • If an engineer can only read a subset of repositories, the Droid only pulls context from those repos.
  • If an engineer doesn’t have production access, the Droid can’t execute production commands—it can only propose changes or run tests in a sandbox.
  • Queries and tasks are scoped to the user’s role; the agent doesn’t bypass RBAC to “be more helpful.”

What to implement in your own stack:

  • Integrate AI access with your existing SSO/SAML/SCIM and SCM permissions.
  • Enforce repo- and directory-level ACL checks before fetching context for the model.
  • Treat permissions enforcement as a hard boundary; no “override in prompt” shortcuts.
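
A minimal sketch of the repo- and directory-level check, in Python. `UserGrants`, `can_read`, and `fetch_context` are hypothetical names, and in a real system the grants would be synced from your SCM and identity provider rather than constructed by hand. This illustrates the "check before fetch" ordering; it is not how Factory implements it internally.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class UserGrants:
    """Read grants mirrored from the SCM for one user (hypothetical shape)."""
    user: str
    # repository name -> path prefixes the user may read ("" means the whole repo)
    readable: Dict[str, Tuple[str, ...]] = field(default_factory=dict)


class ContextAccessDenied(PermissionError):
    """Raised before any content is fetched, so nothing leaks into a prompt."""


def can_read(grants: UserGrants, repo: str, path: str) -> bool:
    prefixes = grants.readable.get(repo)
    if prefixes is None:
        return False  # no grant on this repository at all
    target = PurePosixPath(path)
    # Reject absolute paths and "..": PurePosixPath does not resolve them,
    # so "billing/../ledger" would otherwise pass a "billing" prefix check.
    if target.is_absolute() or ".." in target.parts:
        return False
    for prefix in prefixes:
        if prefix == "":
            return True
        allowed = PurePosixPath(prefix)
        # Compare whole path components, so "billing" does not match "billing_v2".
        if target == allowed or allowed in target.parents:
            return True
    return False


def fetch_context(grants: UserGrants, repo: str, path: str,
                  read_file: Callable[[str, str], str]) -> str:
    """Enforce the ACL in code, before the model ever sees the file."""
    if not can_read(grants, repo, path):
        raise ContextAccessDenied(f"{grants.user} may not read {repo}:{path}")
    return read_file(repo, path)
```

Because the check runs before `read_file` is called, a denied path never reaches the prompt, which is what makes the boundary hard rather than advisory.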

This solves the first half of the question: engineers can use AI on company code, but only within the boundaries they already have.


2. Use a single-tenant, sandboxed environment for AI execution

The next control surface is where your AI runs.

If your AI agents call shared multi-tenant APIs that also process other customers’ data, it’s difficult to make a credible statement about isolation or data residency. Enterprise teams typically want an environment that looks more like a dedicated subsystem than a browser plugin.

Factory’s approach:

  • Single-tenant VPC per customer: Each customer gets an isolated environment with its own VPC. Droids run inside that sandbox.
  • Network controls: Egress is tightly controlled; connections to your Git, ticketing, and CI/CD systems are explicit, auditable integrations.
  • Encryption everywhere: TLS 1.2+ in transit, AES-256 at rest, and key management aligned with your security posture.
  • No training on your code: Customer code and data are not used for training models without prior written consent.

This means when a Droid touches your code, you know where it ran, what it connected to, and that your IP isn’t quietly becoming someone’s training set.

What to implement in your own stack:

  • Host AI agents in a network-isolated environment (dedicated VPC or equivalent).
  • Log all external connections (SCM, CI, ticketing) with identifiers tied to the user session.
  • Establish and document a policy: no use of customer code as training data unless explicitly agreed.
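
For the egress and connection-logging items, here is an application-level sketch, assuming a hypothetical allowlist of integration hosts and a session ID issued when the user authenticates. Real egress enforcement belongs in the network layer (security groups, an egress proxy). This hook only shows how each outbound attempt can be tied to a user session in the log, whether or not it was allowed.

```python
import json
import time
from typing import List
from urllib.parse import urlsplit

# Hypothetical allowlist: only explicitly integrated systems are reachable.
EGRESS_ALLOWLIST = {
    "git.example.com": "scm",
    "ci.example.com": "ci",
    "tickets.example.com": "ticketing",
}


def check_egress(url: str, session_id: str, user: str, log: List[str]) -> bool:
    """Decide whether an outbound call is allowed and record the attempt either way."""
    host = urlsplit(url).hostname or ""  # hostname is lower-cased, port stripped
    integration = EGRESS_ALLOWLIST.get(host)
    allowed = integration is not None
    log.append(json.dumps({
        "ts": time.time(),
        "event": "egress_attempt",
        "session_id": session_id,
        "user": user,
        "host": host,
        "integration": integration,
        "allowed": allowed,
    }))
    return allowed
```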

This answers the “can we even expose our code to AI?” concern with a concrete, inspectable environment design.


3. Treat AI interactions as first-class audit log events

If AI is touching company code, its actions should show up in your audit pipeline the same way as any other system.

With Factory, every Droid action is traceable and reversible:

  • Audit trails for every session: Prompts, plans, tool invocations, file edits, test runs, and PR creations are all logged with timestamps and user identity.
  • SIEM export: Logs can be exported to your SIEM for correlation with other signals. You can answer questions like “Which incidents involved AI assistance?” or “Which PRs were drafted by Droids?”
  • Version-control integration: Proposed code changes are expressed as diffs or PRs in your existing Git provider, with the Droid’s involvement visible in commit messages, PR descriptions, or labels.
  • Explainability: The plan and the execution trace show why a Droid made certain changes, not just the final diff.

This moves AI work out of the “black box autocomplete” category and into the same territory as any other automated system in your stack.

What to implement in your own stack:

  • Log every AI-initiated file edit, test, or command with:
    • User identity (who initiated the Droid)
    • Target repository/branch and commit hash
    • Tools used (e.g., “git apply,” “pytest,” “curl to service X”)
    • Timestamps and outcome (success/failure)
  • Export these logs to your SIEM (Splunk, Datadog, etc.) with a dedicated “AI agent” schema.
  • Ensure you can reconstruct a complete narrative: “Ticket → Droid session → PR → merge.”
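
One way to make the field list concrete is a flat, serializable event record. The `AgentAuditEvent` schema and the `narrative` helper below are assumptions for illustration, not a Factory or SIEM format. The helper shows that the “Ticket → Droid session → PR” chain can be rebuilt from the events alone.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class AgentAuditEvent:
    # Assumed "AI agent" schema; adapt field names to your SIEM.
    ts: str                 # ISO-8601 UTC timestamp, e.g. "2024-05-01T10:00:00Z"
    user: str               # who initiated the agent session
    session_id: str
    ticket: Optional[str]
    repo: str
    branch: str
    commit: Optional[str]   # commit hash the action ran against, if any
    tool: str               # e.g. "git apply", "pytest"
    outcome: str            # "success" or "failure"
    pr: Optional[str] = None

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)


def narrative(events: Iterable[AgentAuditEvent], ticket: str) -> Dict[str, List[str]]:
    """Rebuild ticket -> sessions -> PRs for one ticket, in time order."""
    # Same-format UTC ISO-8601 strings sort chronologically as plain strings.
    related = sorted((e for e in events if e.ticket == ticket), key=lambda e: e.ts)
    return {
        "sessions": list(dict.fromkeys(e.session_id for e in related)),
        "prs": list(dict.fromkeys(e.pr for e in related if e.pr)),
        "steps": [e.tool for e in related],
    }
```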

This is the core of letting engineers use AI while maintaining an audit trail: every action has a durable, queryable record.


4. Put approvals in the Git and workflow layers, not inside the model

The biggest risk isn’t that AI generates bad code; it’s that bad code bypasses your existing guardrails.

The most reliable pattern is: AI produces artifacts, humans and automation approve and merge. Factory is explicit about this separation:

  • Droids propose; engineers approve. Droids draft changes, tests, and PRs. Engineers still own review and merge decisions.
  • Approvals live where they always have: In GitHub/GitLab/Bitbucket review rules, branch protections, and CI checks. No new approval system to trust.
  • Policy continuity: You reuse your existing rules—required reviewers, checks that must pass, code owners, and compliance gates.

Because the approval layer is unchanged, adopting AI doesn’t mean rewriting your governance model; it means adding a new, faster contributor that can’t bypass your rules.

What to implement in your own stack:

  • Let AI agents read from and push to working branches, but never allow them to bypass:
    • Required reviews
    • Required status checks
    • Branch protections
  • Make it explicit in your policy that:
    • AI-generated changes must go through the same PR and review process.
    • No autonomous merges to protected branches.
  • Use labels or PR templates to mark AI-assisted changes for extra visibility if desired.
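
The merge rules themselves should stay in your Git provider’s branch protection settings. As a cross-check, for example in a policy test or a pre-merge bot, the same rules can be expressed in code. This sketch assumes hypothetical agent account and check names, and it treats neither the agent nor the engineer who delegated the task as a valid approver, which is one reading of the two-person rule.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

PROTECTED_BRANCHES = {"main", "release"}
REQUIRED_CHECKS = {"ci/tests", "security/scan"}  # hypothetical check names
AGENT_ACCOUNTS = {"droid-bot"}                    # hypothetical service account


@dataclass
class PullRequest:
    author: str        # account that opened the PR (may be an agent)
    initiator: str     # engineer who delegated the task
    target_branch: str
    approvals: Set[str] = field(default_factory=set)
    checks: Dict[str, str] = field(default_factory=dict)  # name -> "success" | "failure" | "pending"


def merge_allowed(pr: PullRequest, merger: str, min_approvals: int = 1) -> Tuple[bool, str]:
    if pr.target_branch not in PROTECTED_BRANCHES:
        return True, "unprotected branch"
    if merger in AGENT_ACCOUNTS:
        return False, "agents may not merge to protected branches"
    # Only independent humans count: not an agent, not the author, not the initiator.
    independent = {r for r in pr.approvals
                   if r not in AGENT_ACCOUNTS and r not in (pr.author, pr.initiator)}
    if len(independent) < min_approvals:
        return False, "needs independent human review"
    failing = sorted(c for c in REQUIRED_CHECKS if pr.checks.get(c) != "success")
    if failing:
        return False, f"required checks not passing: {failing}"
    return True, "ok"
```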

This preserves the “two-person rule” for production: even if a Droid does the heavy lifting, humans still own the final gate.


5. Design AI around delegated tasks with clear outputs

A lot of risk stems from unstructured usage: “ask the AI anything about prod.” A safer—and more productive—pattern is configuring discrete, auditable task types.

Factory centers on delegated engineering tasks:

  • Refactors: Droids refactor modules or services, generate tests, and open PRs with fully scoped diffs.
  • Incident response: Droids assist in triage from Slack/Teams, summarize logs, suggest suspect changes, and draft fixes—but still within your permission model and with traceable commands.
  • Migrations: Droids run scripted, parallelized transforms across your codebase via CLI, generating PR batches that can be reviewed in waves.

Each of these workflows has:

  1. A well-defined input (ticket, incident ID, repo/module paths).
  2. A series of tool calls (read code, search logs, run tests).
  3. A concrete artifact (PR, incident report, technical overview).

Because the shapes are explicit, you can reason about risk per task type and enforce appropriate approvals.

What to implement in your own stack:

  • Define allowed AI workflows as named tasks, for example:
    • “Generate technical overview for service X”
    • “Propose refactor for module Y with tests”
    • “Draft migration PR from library A to B”
  • For each task type, define:
    • Allowed tools (read-only vs read-write)
    • Required approvals (who needs to review)
    • Environments allowed (dev, staging, prod logs)
  • Log task type identifiers alongside audit events for easy filtering and reporting.
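
A task registry can encode the three per-task decisions above as data that the agent runtime checks before every tool call. The task IDs, tool names, reviewer groups, and environment labels below are hypothetical. The design choice is deny-by-default: an unknown task, an unlisted tool, or a disallowed environment is refused, and a successful check returns the fields that tag the audit event with its task type.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, FrozenSet, Tuple


class Access(Enum):
    READ = "read"
    WRITE = "write"  # edits files or executes code, even in a sandbox


@dataclass(frozen=True)
class TaskType:
    title: str
    tools: Dict[str, Access]
    reviewers: Tuple[str, ...]   # groups that must approve the output
    environments: FrozenSet[str]


TASKS = {
    "tech_overview": TaskType(
        "Generate technical overview for a service",
        {"read_code": Access.READ, "search_docs": Access.READ},
        ("service-owner",), frozenset({"dev"})),
    "refactor_with_tests": TaskType(
        "Propose refactor for a module with tests",
        {"read_code": Access.READ, "edit_files": Access.WRITE, "run_tests": Access.WRITE},
        ("code-owners",), frozenset({"dev"})),
    "incident_triage": TaskType(
        "Summarize logs and suggest suspect changes",
        {"read_logs": Access.READ, "read_code": Access.READ},
        ("on-call",), frozenset({"prod-logs"})),
}


def authorize(task_id: str, tool: str, env: str) -> Dict[str, str]:
    """Deny by default; on success return fields to attach to the audit event."""
    task = TASKS.get(task_id)
    if task is None:
        raise PermissionError(f"unknown task type {task_id!r}")
    if tool not in task.tools:
        raise PermissionError(f"{tool!r} is not allowed for {task_id!r}")
    if env not in task.environments:
        raise PermissionError(f"{task_id!r} may not run against {env!r}")
    return {"task_type": task_id, "tool": tool, "access": task.tools[tool].value, "env": env}
```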

This makes “let engineers use AI” look less like a free-form chatbot and more like controlled, observable automation.


6. Bring AI into the tools engineers already use

Another common failure mode is siloed AI: a separate web portal or plugin that sits outside your normal workflow, where context and controls are weakest.

Factory deliberately meets engineers where they already work:

  • Droids where you code: VS Code, JetBrains, Vim, and terminals. Engineers delegate edits and refactors with code and tests in view.
  • Droids in the browser: Zero-setup web interface for exploration and fast onboarding, still governed by the same permissions and audit trails.
  • Droids in the war room: Slack/Teams for incidents—summarize alerts, dig into logs, propose next steps, all logged.
  • Droids in your backlog: Trigger work from tickets; Droids pull context from the issue, linked PRs, and relevant code, then produce a PR that stays tied to the original ticket.

Because the same agent system runs across all these surfaces, you get:

  • Consistent access control
  • Consistent logging
  • Consistent approval patterns

What to implement in your own stack:

  • Integrate AI with your existing surfaces (IDE, CLI, chat, PM tools) instead of adding a disconnected UI.
  • Use a central agent service that enforces permissions and logging for all fronts, rather than each integration rolling its own logic.
  • Keep session identifiers consistent so an IDE session and a Slack incident thread can be tied back to the same audit trail.
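
The central-service idea can be sketched as one gateway object that every surface calls, so permission checks and audit records are written in exactly one place. `AgentGateway` and its arguments are hypothetical. The `parent_session` parameter shows one way to link a Slack incident thread back to the IDE session that started the work.

```python
import uuid
from typing import Callable, Dict, Optional, Tuple


class AgentGateway:
    """Single entry point shared by IDE, CLI, chat, and tracker integrations."""

    def __init__(self, is_allowed: Callable[[str, str], bool],
                 audit_sink: Callable[[dict], None]) -> None:
        self._is_allowed = is_allowed  # (user, action) -> bool, backed by your RBAC
        self._audit = audit_sink       # e.g. a writer to your audit pipeline
        self._sessions: Dict[Tuple[str, str], str] = {}  # (surface, external id) -> session

    def session_for(self, surface: str, external_id: str,
                    parent_session: Optional[str] = None) -> str:
        key = (surface, external_id)
        if key not in self._sessions:
            # Reuse the parent's ID so cross-surface work shares one audit trail.
            self._sessions[key] = parent_session or f"sess-{uuid.uuid4().hex[:12]}"
        return self._sessions[key]

    def handle(self, user: str, surface: str, external_id: str, action: str,
               parent_session: Optional[str] = None) -> str:
        session = self.session_for(surface, external_id, parent_session)
        allowed = self._is_allowed(user, action)
        # Log denials too: a refused request is still an audit event.
        self._audit({"session_id": session, "user": user, "surface": surface,
                     "external_id": external_id, "action": action, "allowed": allowed})
        if not allowed:
            raise PermissionError(f"{user} may not {action}")
        return session
```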

This preserves workflow continuity—engineers don’t log into a separate “AI tool” with weaker governance.


7. Measure impact with artifacts, not token usage

To justify ongoing AI access to company code, you’ll need to show that it’s driving real engineering outcomes, not just more prompt traffic.

Factory Analytics is built around concrete developer artifacts:

  • Files created/edited
  • Commits and PRs generated
  • Tests generated and run
  • Org-level metrics like the “autonomy ratio” (the portion of work Droids can handle end-to-end under supervision)

Analytics can export via OpenTelemetry, letting you join AI usage data with:

  • Lead time for changes
  • MTTR for incidents
  • PR review times and rework rates

This lets you answer questions like:

  • “Are Droids actually reducing incident MTTR?”
  • “How many PRs per week are AI-assisted, and what is their defect rate?”
  • “Where in the SDLC are Droids most effective?”

What to implement in your own stack:

  • Instrument AI interactions as events that attach to:
    • PRs and commits
    • Incidents and tickets
    • Tests and builds
  • Export these to your observability stack (via OTEL or native integrations).
  • Build dashboards that track:
    • AI-assisted PRs per team
    • Time-to-merge for AI vs non-AI PRs
    • Incident MTTR with and without AI support

This reframes AI from a novelty to an accountable system whose benefits and risks are measurable.


8. Combine operational controls with formal assurances

For security and compliance teams, operational controls are necessary but not sufficient. They also look for standardized signals and third-party validation.

Factory aligns with this expectation through:

  • Compliance frameworks: SOC 2, GDPR/CCPA alignment, and early adoption of ISO 42001 for AI management.
  • Penetration testing and red teaming: Regular tests focused on how complex code generation behaves in adversarial contexts.
  • Explainability by design: The combination of planning traces, tool logs, and environment snapshots makes it possible to understand and review what Droids did.

For your stakeholders, this translates into a credible story: AI is not a shadow IT tool; it’s a governed system with controls, audits, and compliance posture aligned to the rest of your stack.

What to implement in your own stack:

  • Map your AI system to your existing control framework (SOC 2, ISO, internal policies).
  • Run and document regular penetration tests targeting agent behavior and tool misuse, not just static endpoints.
  • Ensure logs, configs, and policies are reviewable by internal audit and security teams.
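
Mapping to a control framework can start as a plain table from controls to the evidence your agent system produces, plus a check for gaps. The mapping below is illustrative only: the SOC 2 criteria labels (CC6.1 logical access, CC7.2 monitoring, CC8.1 change management) and every evidence name are assumptions to confirm with your auditor, not a complete or authoritative mapping.

```python
from typing import Dict, List, Set

# Illustrative mapping of controls to evidence sources (hypothetical names).
CONTROL_MAP: Dict[str, List[str]] = {
    "SOC2 CC6.1 (logical access)": ["scm_permission_sync", "context_acl_denials"],
    "SOC2 CC7.2 (monitoring)": ["agent_audit_events", "siem_export"],
    "SOC2 CC8.1 (change management)": ["branch_protection_config", "pr_review_records"],
    "Internal: agent pen-test cadence": [],  # not yet mapped to any evidence
}


def control_gaps(control_map: Dict[str, List[str]],
                 available_evidence: Set[str]) -> Dict[str, List[str]]:
    """Return each control with missing evidence, or with no evidence mapped at all."""
    gaps: Dict[str, List[str]] = {}
    for control, evidence in control_map.items():
        missing = [e for e in evidence if e not in available_evidence]
        if not evidence or missing:
            gaps[control] = missing or ["no evidence mapped"]
    return gaps
```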

This satisfies the “how do we sign off on this?” question from risk and compliance.


Putting it all together: a concrete pattern that works

If you want engineers using AI with company code while maintaining audit trails and approvals, the pattern looks like this:

  1. Guard access at the source: Enforce strict permissions using your existing identity and SCM controls.
  2. Isolate the environment: Run agents in single-tenant, sandboxed environments with clear network boundaries and no surprise training.
  3. Log every action: Treat AI just like any other production system—full audit trails, SIEM integration, version-control traces.
  4. Keep approvals where they are: Git-based PR reviews and CI checks remain the authority for what ships.
  5. Scope AI to tasks, not anything-goes chat: Define clear, auditable workflows—refactors, incidents, migrations, overviews.
  6. Meet engineers in their tools: IDE, terminal, web, CLI, Slack/Teams, and project trackers, all backed by a central agent system.
  7. Measure real outputs: PRs, commits, tests, MTTR—not token counts.
  8. Align with compliance: Combine operational evidence with formal frameworks and regular testing.

This is the design philosophy underlying Factory Droids: AI that works everywhere engineers do, on real end-to-end tasks, under the same controls you already trust for code and production systems.

If you want to see what this looks like in practice, with Droids generating PRs, handling refactors, and supporting incident response while every step is logged and reviewable, you can get started with Factory hands-on.