Is there a workflow where AI can run in a loop: propose a fix → I run tests/lint → it iterates until green?

Many developers are looking for a tight feedback loop where an AI suggests code changes, you run tests and lint locally, and then the AI iterates on failures until everything is green. The short answer is yes: you can build (or adopt) a workflow where AI operates in a loop like this. The specifics depend on whether you want a fully automated pipeline, a semi-automated “AI pair-programmer,” or a lightweight script around your existing tools.

Below is a practical guide to setting up such a workflow, how it works in detail, tools you can use, and important caveats around safety and control.


What you’re trying to achieve

Your goal is essentially:

  1. AI proposes a fix (code changes / patch).
  2. You (or a script) run tests and lints.
  3. Failures and diagnostics are fed back to the AI.
  4. AI proposes another fix, repeating the loop.
  5. Loop stops when tests & lint are green, or you decide to stop.

This is a form of AI-assisted iterative repair: a short feedback cycle in which the AI changes code, the test and lint results decide whether another round is needed, and you keep control over what is ultimately accepted.


Core building blocks of an AI-in-the-loop workflow

While tools differ, most workable setups have the same core components:

  1. Code representation
    • How the AI sees your code: full repo, selected files, or a summarized “context.”
  2. Change mechanism
    • How the AI proposes fixes: inline edits, patch diffs, or instructions for you to apply manually.
  3. Execution & feedback
    • How tests/lint run: local CLI, CI pipeline, or containerized environment.
    • How failures are captured: logs, exit codes, test summaries.
  4. Iteration controller
    • Manual: you decide when to re-prompt the AI.
    • Semi-automatic: a script that loops until green or until a max number of attempts.
    • Fully automatic: an “AI agent” that runs code and iterates with minimal human intervention.

Option 1: Semi-automatic loop using your existing editor + CLI

This is the simplest to adopt today: you stay in control, but structure your workflow like a loop.

Typical workflow

  1. Describe the failure or goal to the AI

    • Paste the failing test output or lint errors.
    • Optionally include the relevant file(s).
  2. AI proposes a fix

    • It suggests code edits or provides a unified diff (git diff style).
    • You apply the changes (manually or with an “Apply patch” feature if your tool supports it).
  3. Run tests/lint locally

    • Example:
      npm test
      npm run lint
      
    • Or:
      pytest
      ruff check .
      
  4. Feed results back into the AI

    • Paste failing test output, stack traces, lint messages.
    • Prompt: “Here is the output, please iterate on your fix. Only modify [these files].”
  5. Repeat until green

    • Stop when:
      • Tests & lint are green, or
      • The AI is cycling or degrading the solution.

How to make this workflow efficient

  • Use tightly scoped prompts
    • “Only change my_module.py and test_my_module.py.”
    • “Don’t touch configuration files.”
  • Share minimal but sufficient logs
    • Include only failing tests, not the entire test suite output.
  • Use context presets
    • Many IDE extensions let you pin files or define “project context” so the AI always sees core pieces of your codebase.
  • Track iterations
    • Keep branch history (git commit after each AI patch). This makes it easy to revert bad iterations.

This is not fully automated, but it behaves like a loop in practice and is often enough for productive AI-driven development.
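As a rough illustration of the “minimal but sufficient logs” tip, the sketch below trims pytest output down to its failure lines before you paste it into a prompt. It assumes pytest's short test summary, in which each failing or errored test appears on a line starting with FAILED or ERROR. The function name `failing_lines`, the `max_lines` parameter, and the sample text are illustrative; the sample is hand-written, not captured from a real run.

```python
def failing_lines(pytest_output: str, max_lines: int = 50) -> str:
    """Keep only the summary lines pytest prints for failed or errored tests."""
    kept = [
        line
        for line in pytest_output.splitlines()
        if line.startswith(("FAILED ", "ERROR "))
    ]
    return "\n".join(kept[:max_lines])


# Hand-written sample in the shape of pytest's short summary (an assumption).
sample = "\n".join(
    [
        "tests/test_cart.py .F",
        "==== short test summary info ====",
        "FAILED tests/test_cart.py::test_add_item - AssertionError",
        "==== 1 failed, 1 passed ====",
    ]
)

print(failing_lines(sample))
```

If the one-line summary is not enough for the model to find the bug, you would still paste the full traceback for that single test rather than the whole run.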


Option 2: Scripted AI loop (propose → test → iterate until green)

If you want something closer to an automated “AI agent” but still under your control, you can wrap your AI API and your test/lint commands in a script.

High-level architecture

  1. Loop controller (Python/Node/Bash):

    • Sends current state and instructions to the AI.
    • Applies AI-generated patches.
    • Runs tests/lints.
    • Feeds back logs.
    • Stops on success or after N iterations.
  2. AI backend:

    • OpenAI, Anthropic, or another model provider.
    • Prompt template includes:
      • Current goal.
      • Current code snippet(s) or files.
      • Test/lint failures.
      • Constraints (e.g., “Use only standard library,” “Don’t alter package.json.”)
  3. Code application layer:

    • Apply unified diffs:
      • AI returns a patch.
      • Script applies it via patch or similar.
    • Or file-level replacements:
      • AI returns complete file contents.
  4. Test & lint runner:

    • Commands like:
      ./run_tests.sh
      ./run_lint.sh
      
    • Script captures:
      • Exit codes.
      • Stdout/stderr snippets.
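The prompt template described under “AI backend” could be assembled along the following lines. This is a minimal sketch, not a recommended format: `LoopState` and `build_prompt` are hypothetical names, and the layout of the prompt text is an assumption you would tune for your model.

```python
from dataclasses import dataclass, field


@dataclass
class LoopState:
    goal: str
    files: dict[str, str]  # path -> current file contents
    failures: str  # trimmed test/lint output
    constraints: list[str] = field(default_factory=list)


def build_prompt(state: LoopState) -> str:
    """Combine goal, code, failures, and constraints into one prompt string."""
    parts = [f"Goal: {state.goal}", ""]
    for path, content in state.files.items():
        parts += [f"File: {path}", content, ""]
    parts += ["Test/lint failures:", state.failures, ""]
    if state.constraints:
        parts.append("Constraints:")
        parts += [f"- {c}" for c in state.constraints]
        parts.append("")
    parts.append("Respond with a unified diff patch that fixes the issues.")
    return "\n".join(parts)


state = LoopState(
    goal="Fix failing tests in tests/test_user_flow.py",
    files={"src/user_flow.py": "def start():\n    return None\n"},
    failures="AssertionError in test_start",
    constraints=["Use only standard library", "Don't alter package.json."],
)
print(build_prompt(state))
```

Keeping the state in one object makes it straightforward to log exactly what the model saw on each iteration.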

Pseudocode example

import subprocess

# Hypothetical helpers: ask_model wraps your AI provider's API and returns the
# reply text; apply_patch applies a unified diff (for example via `git apply`).
from my_ai_client import ask_model, apply_patch

MAX_ITERS = 10
TAIL_CHARS = 4000  # only send the tail of long outputs to the model


def run_cmd(cmd):
    """Run a command and return (exit code, combined stdout and stderr)."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode, proc.stdout + "\n" + proc.stderr


def run_checks():
    """Run tests and lint; return (all_green, failure report for the model)."""
    test_code, test_output = run_cmd(["pytest"])
    lint_code, lint_output = run_cmd(["ruff", "check", "."])
    report = (
        f"Test exit code: {test_code}\n"
        f"Lint exit code: {lint_code}\n\n"
        f"Test output:\n{test_output[-TAIL_CHARS:]}\n\n"
        f"Lint output:\n{lint_output[-TAIL_CHARS:]}\n"
    )
    return test_code == 0 and lint_code == 0, report


def main():
    goal = "Fix failing tests in tests/test_user_flow.py"

    for iteration in range(1, MAX_ITERS + 1):
        print(f"Iteration {iteration}")

        green, report = run_checks()
        if green:
            print("All green!")
            return

        prompt = (
            f"Goal: {goal}\n\n"
            "The tests or linting are failing. Here is the output:\n"
            f"{report}\n"
            "You may only modify the following directories: src/, tests/.\n\n"
            "Respond with a unified diff patch that fixes the issues.\n"
        )
        patch = ask_model(prompt)  # your wrapper around the AI API
        apply_patch(patch)  # implement using 'patch' or similar

    # Check once more so a fix made in the last iteration is not reported as a failure.
    green, _ = run_checks()
    print("All green!" if green else "Max iterations reached, still not green.")


if __name__ == "__main__":
    main()

You now have a loop: propose a fix → run tests/lint → refine → repeat until green or max iterations.
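One practical detail the loop leaves to `apply_patch`: model replies often wrap the diff in a Markdown code fence or add a sentence of explanation before it. The sketch below shows one way to pull out the diff before applying it. The function name `extract_diff` is hypothetical, and the assumption that the reply is either a bare diff or a single fenced block will not hold for every model.

```python
import re

FENCE = "`" * 3  # Markdown code fence marker
_FENCED = re.compile(
    re.escape(FENCE) + r"(?:diff|patch)?[ \t]*\n(.*?)" + re.escape(FENCE),
    re.DOTALL,
)


def extract_diff(reply: str) -> str:
    """Return the unified diff contained in a model reply.

    Raises ValueError if no diff header is found.
    """
    match = _FENCED.search(reply)
    body = match.group(1) if match else reply
    lines = body.splitlines()
    # Skip any explanation that precedes the first diff header.
    for i, line in enumerate(lines):
        if line.startswith(("diff --git ", "--- ")):
            return "\n".join(lines[i:]) + "\n"
    raise ValueError("no unified diff found in reply")


reply = (
    "Here is the fix:\n"
    + FENCE + "diff\n"
    + "--- a/cart.py\n+++ b/cart.py\n@@ -1 +1 @@\n-x = 1\n+x = 2\n"
    + FENCE + "\nLet me know if it fails."
)
print(extract_diff(reply))
```

Text that follows an unfenced diff is not stripped here, so it is still worth validating the result (for example with `git apply --check`) before applying it.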


Option 3: Use existing “AI agent” tools

Several emerging tools aim to automate exactly this cycle: read code → propose patch → run tests → iterate.

Examples (names may evolve quickly, but look for these capabilities):

  • Coding-agent tooling from model providers such as OpenAI
    • Some of these tools allow the model to read your repo, create branches, and run tests.
  • Cursor / Windsurf / other AI-native IDEs
    • Offer features like “Fix tests” that internally iterate on code and re-run tests.
  • AutoDev / OpenHands (formerly OpenDevin) / Aider-like tools
    • CLI-based agents that:
      • Index your repo.
      • Accept goals like “Make pytest pass.”
      • Use an LLM to edit files and invoke shell commands.

What to look for in a tool

  • Agent loop support
    • It should explicitly support “run command, read output, update code” loops.
  • Configurable limits
    • Max iterations, max cost, limited access to filesystem and commands.
  • Dry-run or human-in-the-loop mode
    • You approve patches before they’re applied.
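To make “configurable limits” concrete, here is a small sketch of the kind of budget check an agent loop might run before each model call. `LoopBudget` and its field names and default values are hypothetical and are not taken from any of the tools named above.

```python
from dataclasses import dataclass


@dataclass
class LoopBudget:
    max_iterations: int = 10
    max_cost_usd: float = 2.00
    iterations: int = 0
    cost_usd: float = 0.0

    def record(self, cost_usd: float) -> None:
        """Account for one completed model call."""
        self.iterations += 1
        self.cost_usd += cost_usd

    def exhausted(self) -> bool:
        """True once either the iteration cap or the cost cap is reached."""
        return (
            self.iterations >= self.max_iterations
            or self.cost_usd >= self.max_cost_usd
        )


budget = LoopBudget(max_iterations=3, max_cost_usd=1.00)
while not budget.exhausted():
    budget.record(cost_usd=0.40)  # pretend each call costs 40 cents
print(budget.iterations)
```

A tool that exposes limits like these lets you leave a loop unattended without worrying about a runaway bill.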

Keeping yourself in control: human-in-the-loop patterns

Even if you want automation, it’s wise to keep humans in the loop for safety and quality:

  1. Gate on Git
    • The AI works on a feature branch.
    • You review the diff before merging.
  2. Commit after each iteration
    • Easier to see whether the AI is improving or regressing.
  3. Approval checkpoints
    • Script pauses and prints the patch.
    • You confirm before applying & running the next test cycle.

Example (interactive step):

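printf '%s\n' "$PATCH" > patch.diff
git apply --stat patch.diff              # summarize what the patch would change
git apply --check patch.diff || exit 1   # stop if it would not apply cleanly
read -r -p "Apply this patch? (y/n) " answer
if [ "$answer" = "y" ]; then
  git apply patch.diff
fi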

Best practices for stable AI iteration loops

To avoid the AI thrashing or breaking unrelated parts of your codebase:

  1. Constrain the edit surface
    • Specify allowed folders or files.
    • In prompt: “Do not modify configuration files or setup.py.”
  2. Use precise goals
    • “Fix only the failing tests in test_checkout.py.”
    • “Resolve the type errors reported by mypy in src/payments/.”
  3. Share failing context, not everything
    • Include the failing test code + the module under test + error output.
  4. Ask for small changes
    • “Make minimal edits necessary to fix the failure.”
    • This reduces regression risk.
  5. Cap iterations
    • 3–10 attempts per goal is reasonable.
    • If still failing, reframe the goal or provide more context.
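Prompt instructions alone do not guarantee the model stays inside the allowed folders, so a script can also verify the patch before applying it. The sketch below is one way to do that under stated assumptions: the patch is a git-style unified diff with `a/` and `b/` path prefixes, and the names `touched_paths` and `patch_is_in_scope` are hypothetical.

```python
def touched_paths(patch: str) -> set[str]:
    """Collect file paths from the ---/+++ header pairs of a unified diff."""
    paths: set[str] = set()
    lines = patch.splitlines()
    for old, new in zip(lines, lines[1:]):
        # A file header is a '--- ' line immediately followed by a '+++ ' line.
        if not (old.startswith("--- ") and new.startswith("+++ ")):
            continue
        for header in (old, new):
            path = header[4:].split("\t")[0].strip()
            if path == "/dev/null":  # file is being created or deleted
                continue
            if path.startswith(("a/", "b/")):  # git-style prefixes
                path = path[2:]
            paths.add(path)
    return paths


def patch_is_in_scope(patch: str, allowed_prefixes: tuple[str, ...]) -> bool:
    """True only if the patch touches files and all of them are allowed."""
    paths = touched_paths(patch)
    if not paths:
        return False
    return all(
        p.startswith(allowed_prefixes) and ".." not in p.split("/")
        for p in paths
    )


ok = "--- a/src/cart.py\n+++ b/src/cart.py\n@@ -1 +1 @@\n-x = 1\n+x = 2\n"
bad = "--- a/setup.py\n+++ b/setup.py\n@@ -1 +1 @@\n-x = 1\n+x = 2\n"
print(patch_is_in_scope(ok, ("src/", "tests/")))
print(patch_is_in_scope(bad, ("src/", "tests/")))
```

Rejecting an out-of-scope patch and telling the model why tends to be safer than applying it and reverting afterwards.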

Example: step-by-step manual loop for a specific failure

Imagine you have:

pytest tests/test_cart.py::test_add_item
# Fails with AssertionError: expected total 42, got 40

Loop iteration 1

  • Prompt:

    Tests are failing in tests/test_cart.py::test_add_item.
    Here is the test code:
    [paste test]
    Here is the implementation:
    [paste implementation]
    Here is the error output:
    [paste pytest output]

    Please propose a minimal fix to make this test pass. Show just the updated cart.py code.

  • Apply fix manually.

  • Run test again.

Loop iteration 2

If it still fails:

  • Prompt:

    Your previous fix did not pass the test.
    Here is the updated cart.py:
    [paste updated file]
    Here is the current test failure output:
    [paste new output]

    Please adjust your fix. Keep changes minimal and only modify cart.py.

Repeat until green or until it’s clear you need to rethink requirements.


When to stop the loop and rethink

An AI-in-the-loop workflow is powerful, but it’s not magic. Stop and reconsider when:

  • The AI starts oscillating
    • Fix A → revert to B → revert to A again.
  • Tests are passing but semantics changed incorrectly
    • The AI “overfitted” to the tests (for example, by special-casing the expected values) rather than preserving the intended behavior.
  • Code quality is dropping
    • Increasing complexity, duplicated logic, or unclear hacks.

At that point, you may need to:

  • Clarify requirements in a higher-level prompt.
  • Write or improve tests to better reflect the desired behavior.
  • Manually refactor, then reintroduce the AI loop for smaller tasks.
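Oscillation in particular can be detected mechanically in a scripted loop. The sketch below fingerprints the files the AI is allowed to edit after each iteration and reports when an earlier state comes back. `OscillationDetector` is a hypothetical helper, and hashing whole file contents is just one simple choice of fingerprint.

```python
import hashlib


class OscillationDetector:
    """Flags a loop that returns to a source state it has already produced."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def revisits(self, files: dict[str, str]) -> bool:
        """Record the current file contents; return True if seen before."""
        digest = hashlib.sha256()
        for path in sorted(files):
            digest.update(path.encode("utf-8"))
            digest.update(b"\0")
            digest.update(files[path].encode("utf-8"))
            digest.update(b"\0")
        key = digest.hexdigest()
        if key in self._seen:
            return True
        self._seen.add(key)
        return False


detector = OscillationDetector()
history = [
    {"cart.py": "fix A"},
    {"cart.py": "fix B"},
    {"cart.py": "fix A"},  # back to the first attempt
]
flags = [detector.revisits(state) for state in history]
print(flags)
```

When the detector fires, stopping the loop and rethinking the prompt is usually more productive than spending further iterations.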

Security and safety considerations

Especially in semi-automated or automated loops:

  • Limit command execution
    • Don’t let an unsandboxed AI run arbitrary shell commands in production environments.
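  • Restrict network access and protect secrets
    • Run the loop without network access where possible, and keep secrets and credentials (.env, config files, etc.) out of the AI's reach and out of the logs you feed back to it.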
  • Audit logs
    • Keep a log of AI prompts, patches, and commands for traceability.
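For the audit-log point, an append-only JSON Lines file is often enough. This is a minimal sketch under assumptions of my own: the `log_event` name, the record fields (`ts`, `kind`, `payload`), and the file name are all illustrative, and the demo writes to a temporary directory.

```python
import json
import tempfile
import time
from pathlib import Path


def log_event(log_path: Path, kind: str, payload: str) -> None:
    """Append one prompt, patch, or command to a JSON Lines audit log."""
    record = {"ts": time.time(), "kind": kind, "payload": payload}
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")


# Demo: write two events to a throwaway log file.
demo_log = Path(tempfile.mkdtemp()) / "ai_loop_audit.jsonl"
log_event(demo_log, "command", "pytest")
log_event(demo_log, "prompt", "Goal: Fix failing tests in tests/test_user_flow.py")

for line in demo_log.read_text(encoding="utf-8").splitlines():
    print(json.loads(line)["kind"])
```

Because each line is a standalone JSON object, the log can be filtered or replayed later with ordinary text tools. If prompts may contain secrets, redact them before logging.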

Summary

Yes, you can set up a workflow where AI runs in a loop: it proposes fixes, you (or a script) run tests and lint, and it iterates until everything is green. You can approach this at three levels:

  • Manual/semi-automatic: Use your existing AI assistant, paste failures, apply patches, and re-run tests.
  • Scripted loop: Wrap the AI API and your test/lint commands in a script that iterates until success or a limit.
  • Agent-based tools: Use specialized “AI dev agents” that read your repo, edit files, run commands, and iterate automatically.

The most robust setups keep you in control: constrain what the AI can touch, gate changes via Git, and limit iterations. With those guardrails, a propose → test → iterate-until-green loop can become a powerful part of your development workflow.