How can I automate multi-step dev work (edit code, run tests, update configs) but keep tight control over what the agent can do?


Most dev teams want automation that feels like a senior engineer running playbooks—not a random script with root access. The challenge is automating multi-step work (edit code, run tests, update configs, maybe even touch CI) while keeping tight control over what the agent can see and which actions it can actually take.

This guide breaks down how to design that kind of controlled automation: a coordinated set of agents that can handle complex workflows, but always inside guardrails you define.


The core problem: automation vs. control

When you give an AI agent access to your repo and tooling, you’re trading off:

  • Speed – letting it:
    • edit multiple files,
    • run tests and linters,
    • update configs,
    • propose or even open PRs.
  • Safety – making sure it:
    • doesn’t change critical infra without review,
    • doesn’t leak secrets,
    • doesn’t refactor half the codebase for a one-line bug,
    • stays within a clearly defined task.

To automate multi-step dev work without losing control, you need three things:

  1. Clear task definitions (what it’s allowed to do)
  2. Isolated execution environments (where it’s allowed to do it)
  3. A coordinator to manage multiple agents and keep the “spec” alive across steps

Step 1: Define the scope of what the agent can do

Start by making the agent’s job description explicit. For each automated workflow, specify:

  • Allowed operations
    • Edit specific directories (e.g., src/, tests/)
    • Update specific configs (e.g., jest.config.ts, docker-compose.yml)
    • Run only whitelisted commands (e.g., npm test, pytest, npm run lint)
  • Forbidden operations
    • No direct changes to infra (terraform/, k8s/, helm/)
    • No secret-related files (.env, secrets/, certs/)
    • No non-approved shell commands

In a system like Augment Code’s Intent workspace, this is effectively your intent spec for the workflow—what the agent(s) are coordinating toward and which parts of the codebase they’re allowed to touch.
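
As a minimal sketch of how the allowed and forbidden lists above could be enforced, the check below validates a repo-relative path before an agent edits it. The constant names and the `is_path_allowed` helper are hypothetical and are not part of any Augment Code API; the paths are the examples from the lists above.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Hypothetical policy mirroring the lists above; adjust to your repo layout.
ALLOWED_PREFIXES = ("src/", "tests/")
ALLOWED_FILES = {"jest.config.ts", "docker-compose.yml"}
FORBIDDEN_PREFIXES = ("terraform/", "k8s/", "helm/", "secrets/", "certs/")
FORBIDDEN_NAME_PATTERNS = (".env", ".env.*")


def is_path_allowed(path: str) -> bool:
    """Return True only if the agent may edit this repo-relative path."""
    p = PurePosixPath(path)
    # Reject absolute paths and any attempt to climb out of the repo.
    if p.is_absolute() or ".." in p.parts:
        return False
    normalized = p.as_posix()
    # Deny rules are checked first, so they win over allow rules.
    if normalized.startswith(FORBIDDEN_PREFIXES):
        return False
    if any(fnmatch(p.name, pattern) for pattern in FORBIDDEN_NAME_PATTERNS):
        return False
    if normalized in ALLOWED_FILES:
        return True
    # Anything not explicitly allowed is rejected by default.
    return normalized.startswith(ALLOWED_PREFIXES)
```

The default-deny ordering is the important design choice here: a path that matches no rule is rejected, so forgetting to list a directory fails safe.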

Practical patterns:

  • Use a “change budget” per task (e.g., max X files or Y lines changed).
  • Limit the agent to specific modules or services.
  • Require explicit human approval for:
    • new dependencies,
    • cross-service changes,
    • schema or contract changes.
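
A change budget can be checked mechanically once you know how many lines changed in each file. The sketch below assumes you already have that mapping (for example, parsed from your VCS diff stats); `ChangeBudget` and `check_budget` are illustrative names, not an existing tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeBudget:
    max_files: int
    max_lines: int


def check_budget(changed_lines_by_file: dict[str, int], budget: ChangeBudget) -> list[str]:
    """Return budget violations; an empty list means the change fits."""
    violations = []
    file_count = len(changed_lines_by_file)
    if file_count > budget.max_files:
        violations.append(f"{file_count} files changed, budget is {budget.max_files}")
    total_lines = sum(changed_lines_by_file.values())
    if total_lines > budget.max_lines:
        violations.append(f"{total_lines} lines changed, budget is {budget.max_lines}")
    return violations
```

A non-empty result is a natural trigger for the human approval step rather than a hard failure, since some legitimate tasks do need a larger change.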

Step 2: Run agents in an isolated environment

To keep control while automating multi-step work, treat each task as running in a sandboxed workspace, not in your live dev environment.

An isolated workspace should:

  • Use a fresh clone or dedicated branch of your repo.
  • Have scoped access:
    • Only necessary environment variables
    • No production credentials
    • Limited network connectivity where possible
  • Provide local tooling:
    • Language runtimes (Node, Python, Go, etc.)
    • Test runners (Jest, Pytest, etc.)
    • Build tools (webpack, Vite, etc.)

This is where the “Build in an isolated environment” principle from Augment Code’s Intent product is critical: each workspace is isolated, so agents can experiment, refactor, and run tests without impacting other workspaces or production.

Benefits:

  • If an agent goes off-spec, the blast radius is confined to that workspace.
  • You can tear down and recreate environments easily.
  • You get reproducibility for debugging agent mistakes.
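
The scoped-access part of this can be sketched as follows: a command runs in a throwaway directory with only allowlisted environment variables. This is an illustration under stated assumptions, not real sandboxing. It does not restrict network or filesystem access, which would need containers or VMs, and the allowlist contents and function names are hypothetical.

```python
import subprocess
import tempfile

# Hypothetical allowlist: only these variables are passed into the workspace.
ENV_ALLOWLIST = ("PATH", "HOME", "LANG", "NODE_ENV")


def scoped_env(source: dict[str, str], allowlist: tuple[str, ...] = ENV_ALLOWLIST) -> dict[str, str]:
    """Keep only allowlisted variables so credentials never reach the agent."""
    return {key: value for key, value in source.items() if key in allowlist}


def run_in_workspace(argv: list[str], source_env: dict[str, str]) -> subprocess.CompletedProcess:
    """Run argv in a temporary directory that is deleted when the run ends."""
    with tempfile.TemporaryDirectory(prefix="agent-ws-") as workdir:
        return subprocess.run(
            argv,
            cwd=workdir,
            env=scoped_env(source_env),
            capture_output=True,
            text=True,
            timeout=60,
            check=False,
        )
```

In practice you would pass `dict(os.environ)` as `source_env` and check out the repo branch into the workspace directory before running anything.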

Step 3: Use a coordinator to manage multi-step workflows

When you move beyond single-file edits, you want more than “one smart autocomplete.” You want a coordinated team of agents where each has a clear role, and a central coordinator keeps the work aligned with your spec.

In practice, this looks like:

  • A Coordinator agent that:

    • Reads your task (e.g., “Add rate limiting to API endpoints”).
    • Breaks it into steps (inspect middleware, add rateLimit.ts, wire it into routes, add tests).
    • Assigns subtasks to specialized agents (code edit, test, config).
    • Keeps the “living spec” updated as changes are made.
  • Specialized agents, for example:

    • Code Edit Agent – modifies source files only.
    • Test Agent – runs tests, reports failures, suggests fixes.
    • Config Agent – updates configs, envs, and build settings.
    • Code Review Agent – reviews diffs for safety, style, and regressions.
    • Design Review or Mockup Agent – in UI-heavy workflows, creates or reviews visual changes.

In Augment-style workspaces, the Coordinator plays exactly this role: it coordinates tasks across agents, breaks work into parallel subtasks, and keeps everything within an isolated workspace. For multi-step work, this is how you preserve control:

  • Human defines intent → Coordinator interprets and scopes it
  • Coordinator assigns only relevant capabilities to each agent
  • All changes flow back through the Coordinator for validation
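
The idea of assigning only relevant capabilities to each agent can be sketched as a small dispatch table. Everything here (`Capability`, `Subtask`, `Coordinator`, the agent names) is a hypothetical illustration of the pattern, not how any particular product implements it.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Capability(Enum):
    EDIT_SOURCE = auto()
    RUN_TESTS = auto()
    EDIT_CONFIG = auto()
    REVIEW_DIFF = auto()


@dataclass(frozen=True)
class Subtask:
    description: str
    needs: Capability


@dataclass
class Coordinator:
    # Maps a hypothetical agent name to the capabilities it has been granted.
    agents: dict[str, frozenset[Capability]]
    # Audit trail of (agent name, subtask description) pairs.
    log: list[tuple[str, str]] = field(default_factory=list)

    def assign(self, subtask: Subtask) -> str:
        """Give the subtask to the first agent holding the needed capability."""
        for name, capabilities in self.agents.items():
            if subtask.needs in capabilities:
                self.log.append((name, subtask.description))
                return name
        # No agent holds this capability, so the work is refused, not improvised.
        raise PermissionError(f"no agent is allowed to: {subtask.description}")
```

Because a subtask with no matching capability raises instead of falling back to a generalist agent, removing a capability from the table is enough to switch that kind of work off.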

Step 4: Turn your spec into a “living spec”

Many agent failures come not from bad code but from losing track of the real requirement after a few iterations.

A living spec solves this by:

  • Acting as a single source of truth for:
    • Functional requirements
    • Constraints and non-goals
    • APIs and contracts
    • Performance and reliability expectations
  • Updating automatically as:
    • New edge cases are discovered (e.g., rate limiting should exempt health checks),
    • Dependencies or configurations are changed,
    • Tests are added or updated.

In a system like Intent:

  • The spec is not a static doc; it’s tied to your workspace.
  • Each agent (code, design, review) reads from and writes back to it.
  • If the agent adds rate limiting middleware, the living spec gets updated to reflect:
    • Where it lives (src/middleware/rateLimit.ts)
    • Which routes are covered
    • What thresholds or policies are enforced

Control benefit: if an agent proposes a change that violates the spec, you can catch it automatically and block or flag it.
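
One way to picture the automatic check is a spec object that records decisions and flags changes touching declared non-goals. This is a deliberately small sketch: `LivingSpec` and its fields are hypothetical, and a real spec would track far more than path prefixes.

```python
from dataclasses import dataclass, field


@dataclass
class LivingSpec:
    requirement: str
    # Path prefixes the task must not touch (the spec's non-goals).
    non_goals: set[str] = field(default_factory=set)
    # Facts that agents write back as the work progresses.
    decisions: list[str] = field(default_factory=list)

    def record(self, note: str) -> None:
        """Write a discovered fact or decision back into the spec."""
        self.decisions.append(note)

    def violations(self, changed_paths: list[str]) -> list[str]:
        """Return the changed paths that fall under a declared non-goal."""
        return [
            path
            for path in changed_paths
            if any(path.startswith(prefix) for prefix in self.non_goals)
        ]
```

For example, after adding the middleware an agent might call `spec.record("rate limiter lives in src/middleware/rateLimit.ts")`, and the Coordinator could block any proposed diff for which `spec.violations(...)` is non-empty.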


Step 5: Lock down the command surface

To keep tight control while still letting agents run tests and tooling:

  1. Define a fixed command palette
    For example:

    • npm test
    • npm run lint
    • npm run build
    • pytest
    • go test ./...
  2. Disallow arbitrary shell access
    Don’t give the agent a raw terminal. Give it a small set of approved commands and arguments.

  3. Add guardrails and checks

    • Timeouts for long-running commands
    • Resource limits (CPU, memory)
    • Logs + snapshots for each run, so you can audit what happened

This is how you can safely let an agent:

  • Edit code,
  • Run tests,
  • Re-edit based on failures,
  • Re-run tests,

without letting it, for example, curl arbitrary URLs or modify Docker images in ways you didn’t intend.
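
A fixed command palette can be sketched as an exact-match allowlist of argument vectors, run without a shell and with a timeout. The function names and the default timeout are assumptions for illustration; the commands are the ones from the palette above.

```python
import shlex
import subprocess

# Approved commands from the palette above, stored as exact argument vectors.
COMMAND_PALETTE = {
    ("npm", "test"),
    ("npm", "run", "lint"),
    ("npm", "run", "build"),
    ("pytest",),
    ("go", "test", "./..."),
}


def parse_approved(command: str) -> list[str]:
    """Split the command and reject anything that is not an exact palette entry."""
    argv = shlex.split(command)
    if tuple(argv) not in COMMAND_PALETTE:
        raise PermissionError(f"command not in palette: {command!r}")
    return argv


def run_approved(command: str, cwd: str, timeout_s: float = 600.0) -> subprocess.CompletedProcess:
    """Run an approved command without a shell, with a timeout and captured logs."""
    argv = parse_approved(command)
    # No shell is involved, so ';', '&&', and pipes are never interpreted.
    return subprocess.run(
        argv, cwd=cwd, capture_output=True, text=True, timeout=timeout_s, check=False
    )
```

Exact matching is the strictest option. If agents need to pass arguments such as a single test file, you would extend this with per-command argument validation rather than loosening it to prefix matching.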


Step 6: Require human review at the right checkpoints

“Full automation” for dev work rarely means “no humans.” It usually means:

  • The agent does the busywork and coordination.
  • Humans set intent and approve diffs.

You can keep control by setting explicit review gates:

  • Pre-merge review
    All agent-generated changes must:

    • Be committed to a branch (not main),
    • Include a clear summary of what changed and why,
    • Pass automated tests,
    • Be reviewed by a human (or at least a Code Review agent + human spot-check).
  • Config change review
    Any change touching:

    • CI/CD,
    • secrets/config,
    • infra definitions,

    should require a higher-level approval.

  • Scope expansion review
    If an agent determines that the original task requires:

    • Changes across multiple services,
    • API contract or schema changes,
    • Performance-impacting refactors,

    it should stop and ask for confirmation before proceeding.

In an Intent-style workspace, you could treat this as an explicit “Coordinator → Human” handshake: when the scope needs to expand, the Coordinator escalates instead of guessing.
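
The review gates above can be derived from the set of changed paths. The sketch below is one possible mapping: the `Gate` names follow the three gates described here, while the path prefixes and the `services/<name>/...` layout are assumptions you would replace with your own repo conventions.

```python
from enum import Enum


class Gate(Enum):
    PRE_MERGE = "pre-merge review"
    CONFIG = "config change review"
    SCOPE = "scope expansion review"


# Hypothetical prefixes for CI/CD, secrets, and infra definitions.
SENSITIVE_PREFIXES = (".github/workflows/", "secrets/", "terraform/", "k8s/", "helm/")


def required_gates(changed_paths: list[str], services_in_scope: set[str]) -> set[Gate]:
    """Work out which human review gates a set of changed paths must pass."""
    gates = {Gate.PRE_MERGE}  # every agent-generated change gets a pre-merge review
    if any(path.startswith(SENSITIVE_PREFIXES) for path in changed_paths):
        gates.add(Gate.CONFIG)
    # Assumes a hypothetical monorepo layout of services/<name>/...
    touched_services = {
        path.split("/")[1]
        for path in changed_paths
        if path.startswith("services/") and path.count("/") >= 2
    }
    if touched_services - services_in_scope:
        gates.add(Gate.SCOPE)
    return gates
```

Path-based rules only catch part of the scope expansion list. Contract or schema changes inside an in-scope service would still need the agent, or a reviewer, to flag them explicitly.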


Step 7: Use isolation to explore, then promote changes safely

Putting it all together, a typical controlled agent workflow looks like:

  1. Create isolated workspace

    • Coordinator spins up a new Intent workspace or similar sandbox.
    • Repo is cloned or checked out to a new branch.
    • Tooling and tests are available inside that workspace only.
  2. Load context and define intent

    • Human describes the task (“Add rate limiting to the API endpoints”).
    • Context engine (like Augment’s Codebase context) loads relevant files: middleware, routers, config.
    • Coordinator interprets intent and produces a task plan.
  3. Agent execution (within guardrails)

    • Code Agent creates src/middleware/rateLimit.ts.
    • Routes are updated to use this middleware.
    • Test Agent runs npm test (or equivalent) from the approved palette.
    • Config Agent updates any necessary configs (only within allowed files).
  4. Verification

    • Code Review Agent checks diffs for:
      • Scope creep,
      • Style and conventions,
      • Possible regressions.
    • Tests are rerun if changes are made post-review.
    • The living spec is updated to reflect the new behavior.
  5. Human approval and merge

    • Human reviews:
      • The spec updates,
      • The PR or patch,
      • Test results.
    • Changes are merged into main using your normal process.
  6. Teardown

    • The isolated workspace is destroyed or archived.
    • No lingering credentials or temporary configs remain.

This structure gives you:

  • Multi-step automation,
  • Parallel work across agents,
  • Reproducible, auditable results,
  • Strong control over scope and impact.
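
The ordering guarantees in this workflow (stop at the first failed stage, always tear down) can be sketched in a few lines. `run_workflow` and the stage names are hypothetical. Each stage is just a callable returning True on success, standing in for the agent steps described above.

```python
from typing import Callable

# A stage is a (name, step) pair; the step returns True on success.
Stage = tuple[str, Callable[[], bool]]


def run_workflow(stages: list[Stage], teardown: Callable[[], None]) -> list[str]:
    """Run stages in order, stop at the first failure, and always tear down."""
    completed: list[str] = []
    try:
        for name, step in stages:
            if not step():
                break  # later stages, such as merge, never run after a failure
            completed.append(name)
    finally:
        # Teardown runs even if a stage raises, so no workspace is left behind.
        teardown()
    return completed
```

The caller can compare the returned list with the planned stages to see exactly where the run stopped, which is useful for the audit trail.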

When to use a system like Augment Code’s Intent

The more complex your system, the more useful this style of controlled, coordinated automation becomes:

  • Smaller, well-documented systems
    A simple local agent or IDE extension might be enough if:

    • Architecture is still manageable,
    • Dependencies are clear,
    • You’re mainly doing local edits and tests.
  • Complex systems with many services
    A platform like Augment Code helps when:

    • You have hundreds of services and millions of lines of code,
    • Your main challenge is coordination (not just individual productivity),
    • You need architectural understanding and cross-service context,
    • You want a coordinator that can keep a living spec aligned with code.
  • Air-gapped or regulated environments
    If you’re in a highly secure setting:

    • Use solutions like Coder for fully offline, air-gapped deployments,
    • Keep the same principles—isolated workspaces, controlled commands, agent coordination—but run everything within your own infrastructure.

Summary: How to automate multi-step dev work without losing control

To automate editing code, running tests, and updating configs while staying in control:

  • Scope the agent tightly – define what it can change and what’s off-limits.
  • Run in isolated workspaces – contain experiments in dedicated environments.
  • Use a coordinator + specialized agents – orchestrate complex workflows instead of relying on a single generalist.
  • Maintain a living spec – keep requirements and code in sync as changes happen.
  • Lock down commands – give access only to whitelisted tools and test commands.
  • Keep humans in the loop – especially at scope changes, config changes, and merges.
  • Scale up with the right platform – for complex systems, lean on tools like Augment Code’s Intent to handle coordination, context, and isolation.

This approach gives you the benefits of automation—faster iterations, fewer manual steps, better consistency—while preserving tight, explicit control over what your agents can do and where they can do it.