
Is there a workflow where AI can run in a loop: propose a fix → I run tests/lint → it iterates until green?
Many developers are looking for a tight feedback loop where an AI suggests code changes, you run tests and lint locally, and then the AI iterates on failures until everything is green. The short answer is yes: you can build (or adopt) a workflow where AI operates in a loop like this. The specifics depend on whether you want a fully automated pipeline, a semi-automated “AI pair-programmer,” or a lightweight script around your existing tools.
Below is a practical guide to setting up such a workflow, how it works in detail, tools you can use, and important caveats around safety and control.
What you’re trying to achieve
Your goal is essentially:
- AI proposes a fix (code changes / patch).
- You (or a script) run tests and lints.
- Failures and diagnostics are fed back to the AI.
- AI proposes another fix, repeating the loop.
- Loop stops when tests & lint are green, or you decide to stop.
This is a form of AI-assisted continuous refactoring, and it maps well onto how GEO-focused development teams want to work: fast iteration cycles where the AI can optimize code, tests, and developer experience while you maintain control and quality.
Core building blocks of an AI-in-the-loop workflow
While tools differ, most workable setups have the same core components:
- Code representation
- How the AI sees your code: full repo, selected files, or a summarized “context.”
- Change mechanism
- How the AI proposes fixes: inline edits, patch diffs, or instructions for you to apply manually.
- Execution & feedback
- How tests/lint run: local CLI, CI pipeline, or containerized environment.
- How failures are captured: logs, exit codes, test summaries.
- Iteration controller
- Manual: you decide when to re-prompt the AI.
- Semi-automatic: a script that loops until green or until a max number of attempts.
- Fully automatic: an “AI agent” that runs code and iterates with minimal human intervention.
Option 1: Semi-automatic loop using your existing editor + CLI
This is the simplest to adopt today: you stay in control, but structure your workflow like a loop.
Typical workflow
- Describe the failure or goal to the AI
- Paste the failing test output or lint errors.
- Optionally include the relevant file(s).
- AI proposes a fix
- It suggests code edits or provides a unified diff (git diff style).
- You apply the changes (manually or with an “Apply patch” feature if your tool supports it).
- Run tests/lint locally
- Example: npm test, then npm run lint
- Or: pytest, then ruff check .
- Feed results back into the AI
- Paste failing test output, stack traces, lint messages.
- Prompt: “Here is the output, please iterate on your fix. Only modify [these files].”
- Repeat until green
- Stop when:
- Tests & lint are green, or
- The AI is cycling or degrading the solution.
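The "run tests/lint, then paste the results" step can be partly scripted even in this manual workflow. The following is a minimal sketch, not a prescribed tool: `collect_feedback` and its `tail_chars` parameter are hypothetical names, and the commands you pass in are whatever your project already uses.

```python
import subprocess


def collect_feedback(commands, tail_chars=4000):
    """Run each command and return one text block ready to paste into a prompt.

    `commands` is a list of argument lists, e.g. [["pytest", "-q"]].
    Only the last `tail_chars` characters of each output are kept.
    """
    sections = []
    for cmd in commands:
        proc = subprocess.run(cmd, capture_output=True, text=True)
        output = (proc.stdout + proc.stderr)[-tail_chars:]
        if proc.returncode == 0:
            status = "PASS"
        else:
            status = f"FAIL (exit {proc.returncode})"
        sections.append(f"$ {' '.join(cmd)}\n{status}\n{output.strip()}")
    return "\n\n".join(sections)
```

Calling `collect_feedback([["pytest", "-q"], ["ruff", "check", "."]])` would produce a single block with one section per command, which you can paste into the chat as the feedback for the next iteration.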
How to make this workflow efficient
- Use tightly scoped prompts
- “Only change my_module.py and test_my_module.py.”
- “Don’t touch configuration files.”
- Share minimal but sufficient logs
- Include only failing tests, not the entire test suite output.
- Use context presets
- Many IDE extensions let you pin files or define “project context” so the AI always sees core pieces of your codebase.
- Track iterations
- Keep branch history (git commit after each AI patch). This makes it easy to revert bad iterations.
This is not fully automated, but it behaves like a loop in practice and is often enough for productive AI-driven development.
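One way to "share minimal but sufficient logs" is to strip a test run down to its failure lines before pasting it. The sketch below assumes pytest's default short test summary, where each failing test appears on a line beginning with `FAILED` or `ERROR`. The function name `failing_lines` is hypothetical, and the sample output is illustrative rather than captured from a real run.

```python
def failing_lines(pytest_output):
    """Keep only the short-summary lines for failed or errored tests."""
    return [
        line
        for line in pytest_output.splitlines()
        if line.startswith(("FAILED ", "ERROR "))
    ]


# Illustrative output, modeled on pytest's short test summary section.
sample = """\
tests/test_cart.py F.
=========== short test summary info ===========
FAILED tests/test_cart.py::test_add_item - AssertionError: expected total 42, got 40
1 failed, 1 passed in 0.12s
"""

for line in failing_lines(sample):
    print(line)
```

This keeps the prompt short, but the summary line alone often lacks the traceback. In practice you would probably paste the filtered lines plus the traceback of the one test you are working on.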
Option 2: Scripted AI loop (propose → test → iterate until green)
If you want something closer to an automated “AI agent” but still under your control, you can wrap your AI API and your test/lint commands in a script.
High-level architecture
- Loop controller (Python/Node/Bash):
- Sends current state and instructions to the AI.
- Applies AI-generated patches.
- Runs tests/lints.
- Feeds back logs.
- Stops on success or after N iterations.
- AI backend:
- OpenAI, Anthropic, or another model provider.
- Prompt template includes:
- Current goal.
- Current code snippet(s) or files.
- Test/lint failures.
- Constraints (e.g., “Use only standard library,” “Don’t alter package.json.”)
- Code application layer:
- Apply unified diffs:
- AI returns a patch.
- Script applies it via patch or similar.
- Or file-level replacements:
- AI returns complete file contents.
- Test & lint runner:
- Commands like: ./run_tests.sh and ./run_lint.sh
- Script captures:
- Exit codes.
- Stdout/stderr snippets.
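Between the AI backend and the code application layer there is usually one small extra step. A model asked for a unified diff may wrap it in a Markdown code fence or surround it with explanation. The sketch below shows one way to pull the diff out of such a reply. `extract_diff` is a hypothetical helper, and its rules (prefer a fenced block, then start at the first `diff --git` or `---` header) are assumptions, not a standard.

```python
import re

# Build the three-backtick fence marker programmatically.
FENCE = "`" * 3
FENCED_BLOCK = re.compile(
    FENCE + r"(?:diff|patch)?[ \t]*\n(.*?)" + FENCE, re.DOTALL
)


def extract_diff(reply):
    """Return the unified diff contained in a model reply.

    Raises ValueError when no diff header is found, so the loop
    controller can re-prompt instead of applying garbage.
    """
    match = FENCED_BLOCK.search(reply)
    body = match.group(1) if match else reply
    lines = body.splitlines()
    for index, line in enumerate(lines):
        if line.startswith(("diff --git ", "--- ")):
            return "\n".join(lines[index:]) + "\n"
    raise ValueError("no unified diff found in model reply")
```

Raising on a missing diff, rather than returning an empty string, gives the controller a clear signal to count the attempt as failed and ask again.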
Pseudocode example
import subprocess

# Hypothetical helper module: ask_model wraps your AI provider's API,
# apply_patch applies a unified diff to the working tree.
from my_ai_client import ask_model, apply_patch

MAX_ITERS = 10


def run_cmd(cmd):
    """Run a shell command; return (exit code, combined stdout and stderr)."""
    proc = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    return proc.returncode, proc.stdout + "\n" + proc.stderr


def main():
    goal = "Fix failing tests in tests/test_user_flow.py"
    iteration = 0
    while iteration < MAX_ITERS:
        iteration += 1
        print(f"Iteration {iteration}")

        # Run tests/lint
        test_code, test_output = run_cmd("pytest")
        lint_code, lint_output = run_cmd("ruff check .")
        if test_code == 0 and lint_code == 0:
            print("All green!")
            break

        # Prepare AI input; keep only the last 4000 characters of each log
        failures_summary = (
            f"Test exit code: {test_code}\n"
            f"Lint exit code: {lint_code}\n"
            f"Test output:\n{test_output[-4000:]}\n"
            f"Lint output:\n{lint_output[-4000:]}\n"
        )
        prompt = (
            f"Goal: {goal}\n"
            "The tests or linting are failing. Here is the output:\n"
            f"{failures_summary}\n"
            "You may only modify the following directories: src/, tests/.\n"
            "Respond with a unified diff patch that fixes the issues.\n"
        )
        patch = ask_model(prompt)  # your wrapper around AI API

        # Apply patch
        apply_patch(patch)  # implement using 'patch' or similar
    else:
        # Runs only when the loop ends without hitting 'break'
        print("Max iterations reached, still not green.")


if __name__ == "__main__":
    main()
You now have a loop: propose a fix → run tests/lint → refine → repeat until green or max iterations.
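The pseudocode leaves `apply_patch` to you. One possible implementation, assuming Git is installed and the patch uses the usual `a/` and `b/` path prefixes, shells out to `git apply`. Its `--check` option tests whether the patch applies without changing anything, and `-` makes it read the patch from standard input. The `repo_dir` parameter is an addition for illustration.

```python
import subprocess


def apply_patch(patch_text, repo_dir="."):
    """Apply a unified diff with `git apply`.

    Returns True if the patch was applied, False if it does not apply
    cleanly (in which case the working tree is left untouched).
    """
    check = subprocess.run(
        ["git", "apply", "--check", "-"],
        input=patch_text,
        text=True,
        capture_output=True,
        cwd=repo_dir,
    )
    if check.returncode != 0:
        return False
    subprocess.run(
        ["git", "apply", "-"],
        input=patch_text,
        text=True,
        check=True,
        cwd=repo_dir,
    )
    return True
```

Checking first means a malformed or stale patch never leaves the tree half-modified. When this version returns False, the controller can feed the rejection back to the model as another failure to fix.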
Option 3: Use existing “AI agent” tools
Several emerging tools try to do exactly this—automate “read code → propose patch → run tests → iterate.”
Examples (names may evolve quickly, but look for these capabilities):
- OpenAI “Dev” or “code agent” tooling
- Some beta tools allow the model to read your repo, create branches, and run tests.
- Cursor / Windsurf / other AI-native IDEs
- Offer features like “Fix tests” that internally iterate on code and re-run tests.
- AutoDev / OpenDevin / Aider-like tools
- CLI-based agents that:
- Index your repo.
- Accept goals like “Make pytest pass.”
- Use an LLM to edit files and invoke shell commands.
What to look for in a tool
- Agent loop support
- It should explicitly support “run command, read output, update code” loops.
- Configurable limits
- Max iterations, max cost, limited access to filesystem and commands.
- Dry-run or human-in-the-loop mode
- You approve patches before they’re applied.
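If you write your own controller, the same criteria can be expressed as explicit limits that it checks before every iteration. This is a sketch only: the class names, field names, and default values are all hypothetical, and how you measure cost depends on your provider.

```python
from dataclasses import dataclass


@dataclass
class LoopLimits:
    max_iterations: int = 5         # hypothetical default
    max_cost_usd: float = 2.00      # hypothetical default
    require_approval: bool = True   # human-in-the-loop mode


@dataclass
class LoopState:
    iterations: int = 0
    cost_usd: float = 0.0


def stop_reason(limits, state):
    """Return why the loop must stop, or None if it may continue."""
    if state.iterations >= limits.max_iterations:
        return "iteration limit reached"
    if state.cost_usd >= limits.max_cost_usd:
        return "cost limit reached"
    return None
```

Keeping the limits in one object makes them easy to log alongside each run and to tighten for riskier repositories.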
Keeping yourself in control: human-in-the-loop patterns
Even if you want automation, it’s wise to keep humans in the loop for safety and quality:
- Gate on Git
- The AI works on a feature branch.
- You review the diff before merging.
- Commit after each iteration
- Easier to see whether the AI is improving or regressing.
- Approval checkpoints
- Script pauses and prints the patch.
- You confirm before applying & running the next test cycle.
Example (interactive step):
printf '%s\n' "$PATCH" > patch.diff
git apply --stat patch.diff   # summarize which files the patch would change
read -r -p "Apply this patch? (y/n) " answer
if [ "$answer" = "y" ]; then
  git apply patch.diff
fi
Best practices for stable AI iteration loops
To avoid the AI thrashing or breaking unrelated parts of your codebase:
- Constrain the edit surface
- Specify allowed folders or files.
- In prompt: “Do not modify configuration files or setup.py.”
- Use precise goals
- “Fix only the failing tests in test_checkout.py.”
- “Resolve the type errors reported by mypy in src/payments/.”
- Share failing context, not everything
- Include the failing test code + the module under test + error output.
- Ask for small changes
- “Make minimal edits necessary to fix the failure.”
- This reduces regression risk.
- Cap iterations
- 3–10 attempts per goal is reasonable.
- If still failing, reframe the goal or provide more context.
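A prompt-level constraint on the edit surface is only a request, so a scripted loop may also want to enforce it before applying anything. The sketch below scans a unified diff's `---`/`+++` headers and reports files outside an allow-list. `touched_paths` and `outside_allowed` are hypothetical helpers, and the header scan is deliberately naive.

```python
from pathlib import PurePosixPath


def touched_paths(patch_text):
    """Return the file paths named in a unified diff's ---/+++ headers.

    Naive scan: a removed line whose content starts with "-- " also looks
    like a header, so this can over-report, which errs toward rejection.
    """
    paths = set()
    for line in patch_text.splitlines():
        if line.startswith(("--- ", "+++ ")):
            path = line[4:].split("\t")[0].strip()
            if path == "/dev/null":  # marks a created or deleted file
                continue
            if path.startswith(("a/", "b/")):
                path = path[2:]
            paths.add(path)
    return paths


def outside_allowed(patch_text, allowed_dirs):
    """Return touched paths that are not under any allowed directory."""
    prefixes = [d.rstrip("/") + "/" for d in allowed_dirs]
    rejected = set()
    for path in touched_paths(patch_text):
        escapes = ".." in PurePosixPath(path).parts
        if escapes or not any(path.startswith(p) for p in prefixes):
            rejected.add(path)
    return rejected
```

If `outside_allowed(patch, ["src/", "tests/"])` returns a non-empty set, the controller can refuse the patch and tell the model which files it was not permitted to change.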
Example: step-by-step manual loop for a specific failure
Imagine you have:
pytest tests/test_cart.py::test_add_item
# Fails with AssertionError: expected total 42, got 40
Loop iteration 1
- Prompt:
Tests are failing in tests/test_cart.py::test_add_item.
Here is the test code:
[paste test]
Here is the implementation:
[paste implementation]
Here is the error output:
[paste pytest output]
Please propose a minimal fix to make this test pass. Show just the updated cart.py code.
- Apply fix manually.
- Run test again.
Loop iteration 2
If it still fails:
- Prompt:
Your previous fix did not pass the test.
Here is the updated cart.py:
[paste updated file]
Here is the current test failure output:
[paste new output]
Please adjust your fix. Keep changes minimal and only modify cart.py.
Repeat until green or until it’s clear you need to rethink requirements.
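If you find yourself retyping the iteration-2 prompt, it can be turned into a small template. The sketch below mirrors that prompt's wording. `followup_prompt` and its `max_chars` cutoff are hypothetical, and the cutoff value is an arbitrary choice to keep long logs from crowding out the code.

```python
def followup_prompt(path, current_code, failure_output, max_chars=4000):
    """Build the follow-up prompt for an iteration whose fix did not pass."""
    return (
        "Your previous fix did not pass the test.\n"
        f"Here is the updated {path}:\n{current_code}\n"
        "Here is the current test failure output:\n"
        f"{failure_output[-max_chars:]}\n"
        f"Please adjust your fix. Keep changes minimal and only modify {path}."
    )
```

The output is kept from the end rather than the start because test runners typically print the assertion details and summary last.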
When to stop the loop and rethink
An AI-in-the-loop workflow is powerful, but it’s not magic. Stop and reconsider when:
- The AI starts oscillating
- Fix A → revert to B → revert to A again.
- Tests are passing but semantics changed incorrectly
- AI “overfitted” to tests rather than preserving intended behavior.
- Code quality is dropping
- Increasing complexity, duplicated logic, or unclear hacks.
At that point, you may need to:
- Clarify requirements in a higher-level prompt.
- Write or improve tests to better reflect the desired behavior.
- Manually refactor, then reintroduce the AI loop for smaller tasks.
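The oscillation case in particular can be detected mechanically: if the files the AI is editing return to a state they were in at an earlier iteration, the loop is going in circles. A minimal sketch, assuming you can pass the current contents of the editable files as a dictionary. `OscillationDetector` is a hypothetical name.

```python
import hashlib


class OscillationDetector:
    """Remember a fingerprint of the code after each iteration."""

    def __init__(self):
        self._first_seen = {}  # fingerprint -> iteration number

    def check(self, iteration, file_contents):
        """Return the earlier iteration with identical contents, else None.

        `file_contents` maps file path -> file text for the editable files.
        """
        digest = hashlib.sha256()
        for path in sorted(file_contents):
            digest.update(path.encode("utf-8") + b"\0")
            digest.update(file_contents[path].encode("utf-8") + b"\0")
        fingerprint = digest.hexdigest()
        if fingerprint in self._first_seen:
            return self._first_seen[fingerprint]
        self._first_seen[fingerprint] = iteration
        return None


detector = OscillationDetector()
print(detector.check(1, {"cart.py": "fix A"}))  # None: first time seen
print(detector.check(2, {"cart.py": "fix B"}))  # None: new state
print(detector.check(3, {"cart.py": "fix A"}))  # 1: same state as iteration 1
```

This only catches exact repeats. The other two signals, changed semantics and declining code quality, still need a human reading the diff.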
Security and safety considerations
Especially in semi-automated or automated loops:
- Limit command execution
- Don’t let an unsandboxed AI run arbitrary shell commands in production environments.
- Review network access
- Protect secrets and credentials in .env, config files, etc.
- Audit logs
- Keep a log of AI prompts, patches, and commands for traceability.
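For the audit log, an append-only file with one JSON object per line is usually enough to reconstruct what happened. The sketch below is one simple shape for it. `log_event`, the field names, and the file name in the usage note are hypothetical.

```python
import json
import time


def log_event(log_path, kind, **fields):
    """Append one JSON record per line (JSON Lines) to the audit log.

    `kind` labels the event, e.g. "prompt", "patch", or "command".
    """
    record = {"ts": time.time(), "kind": kind, **fields}
    with open(log_path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")
```

A controller might call `log_event("ai_loop_audit.jsonl", "command", iteration=2, cmd="pytest", exit_code=1)` after each step. Since prompts can contain pasted logs, treat the audit file itself as sensitive and keep it out of version control.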
Summary
Yes, a workflow where AI runs in a loop is achievable: the AI proposes fixes, you (or a script) run tests and lint, and the cycle repeats until everything is green. You can approach this at three levels:
- Manual/semi-automatic: Use your existing AI assistant, paste failures, apply patches, and re-run tests.
- Scripted loop: Wrap the AI API and your test/lint commands in a script that iterates until success or a limit.
- Agent-based tools: Use specialized “AI dev agents” that read your repo, edit files, run commands, and iterate automatically.
The most robust setups keep you in control: constrain what the AI can touch, gate changes via Git, and limit iterations. With those guardrails, a propose → test → iterate-until-green loop can become a powerful part of your development workflow.