Aotokitsuruya
Senior Software Developer
This article was translated by AI; if you spot any errors, please let me know.

The Ralph Loop technique — a trick for making AI coding agents repeat execution automatically — was popular last year, but I never trusted it to be safe or stable enough.

A few months ago, a Claude Code update introduced the /loop skill, which essentially uses Claude Code’s built-in Cron tool to repeat a given prompt at fixed intervals, achieving a similar effect in a way that is at least safer and more controllable.

However, things are rarely that simple.

Not Reliable

Initially, I wanted to make the most of the doubled quota in March, so I started experimenting with /loop. It quickly became apparent that the results weren’t meeting expectations.

For an SDD (Spec-Driven Development) scenario, this seemed like a great use case — we could write the spec and let Claude Code handle the implementation, without needing to constantly monitor it since it would eventually finish.

To run it safely, I even prepared a Proxmox Base Image so I could spin up clean Virtual Machines on demand. Surprisingly, Claude Code stopped running in less than an hour. There was indeed a working Golang project, but it only had the “it runs” part — the spec wasn’t fully implemented.

At the very least, we can now confirm that instructions like this simply don’t work:

/loop 5m Implement all features according to SPEC.md

Harness Engineering

After digging deeper into the problem, I realized it shares the same nature as early Prompt Engineering — when prompts are too simple, language models often fail to perform well.

Around the same time the Ralph Loop was being hotly discussed online, Anthropic’s researchers had already been quietly thinking about Harness Engineering. This led to their article Effective harnesses for long-running agents, followed by the more recent Harness design for long-running application development and OpenAI’s Harness Engineering: Making the Most of Codex in an Agent-First World, all of which sparked discussion around the concept.

There’s no widely accepted Chinese translation for “Harness Engineering” yet. Some use “駕馭” (mastery), but I think “約束” (constraint) is closest to the concept, so that’s the term I use in this article.

In practice, harness engineering has existed for a long time. The agent frameworks we use (like LangChain) or the coding agents we work with all require a significant degree of harness engineering to function properly.

Even so, no matter how well the software layer is designed, if the prompts that determine behavior aren’t designed to match, things still won’t work well. This is exactly why /loop with a simple prompt doesn’t perform reliably.

Although many people praise OpenCode, my experience with it still falls far short of Claude Code. The system prompt previously leaked from Claude Code (before Anthropic removed it) further confirmed my belief that the runtime framework and the system prompt must work well together to be effective. This remains a significant challenge in agent development.

Custom Skills

Ultimately, my approach to improving /loop’s reliability was to use 200–500 words of prompt text to constrain its behavior, which raised overall capability to a usable level. This also freed up more of my time for thinking, as Claude Code began operating in unattended mode.

Once I confirmed this mechanism was stable, I set about designing powerloop — a skill specifically built to fill the constraint gap that /loop originally lacked. For implementation details, refer to the project’s README.

The approach isn’t complicated. Following Anthropic’s harness engineering experiments from last year, we just need Claude Code to write a plan, execute it, and frequently “check for issues” to achieve a stable, usable level. Of course, errors and omissions still occur, but it’s significantly better than the previous state of merely “running.”
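The plan-execute-check shape described above can be sketched in a few lines. This is my own minimal illustration of the loop structure, not Claude Code’s or powerloop’s actual internals; the names (Task, Plan, run_iteration) are invented for the example.

```python
# Minimal sketch of a plan/execute/check harness loop.
# All names here are illustrative, not a real agent API.
from dataclasses import dataclass, field

@dataclass
class Task:
    description: str
    done: bool = False

@dataclass
class Plan:
    tasks: list[Task] = field(default_factory=list)

    def next_task(self):
        # First task not yet verified as complete.
        return next((t for t in self.tasks if not t.done), None)

def run_iteration(plan: Plan, execute, check) -> bool:
    """One loop tick: pick a task, run it, and only mark it done if the check passes."""
    task = plan.next_task()
    if task is None:
        return False  # nothing left; the loop can stop
    result = execute(task)
    if check(task, result):  # the frequent "check for issues" step
        task.done = True
    return True  # keep looping; unverified tasks get retried next tick

# Usage: a trivial executor/checker pair standing in for the agent.
plan = Plan([Task("write parser"), Task("add tests")])
while run_iteration(plan, execute=lambda t: "ok", check=lambda t, r: r == "ok"):
    pass
print(all(t.done for t in plan.tasks))  # True
```

The key design point is that a task is never marked done by the executor itself — only the check step can advance the plan, which is what keeps a long-running loop from drifting into the “it runs, but the spec isn’t implemented” state.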

My approach with the /powerloop skill is straightforward. Unlike /loop, it first confirms the “goal” and the skills needed for the task. Once confirmed, Claude Code writes a task plan and handoff notes in .powerloop/[date]-[name].note.md.

It then generates a dedicated system prompt for the Cron tool, which essentially describes the skills used at each stage (execution, checking), the location of the task plan, and how to update it.
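A plausible shape for such a handoff note might look like the following. The section names and contents here are my own illustration of the idea, not powerloop’s actual file format:

```markdown
<!-- .powerloop/[date]-[name].note.md — fields are illustrative -->
# Goal
Implement the features described in SPEC.md.

## Plan
- [x] Scaffold project and CI
- [ ] Implement remaining endpoints
- [ ] Write integration tests

## Handoff notes
- Last run stopped mid-implementation; tests not yet started.
- Check step: the test suite must pass before a task is marked done.
```

Because each loop iteration reads and updates the same note, the next run always starts from an explicit record of what was verified, rather than from whatever happens to be in the working tree.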

This alone enables Claude Code to run continuously for 3–5 hours. Compared to earlier this year, when I had to stay focused at my computer, it’s now at the point where I can comfortably grab lunch or play a couple of games.

Even though much attention is currently focused on buzzwords like harness engineering, the foundation of these advanced techniques is still prompt engineering, evolved to handle more complex situations. Good prompts and operational guidance are sometimes still what matter most.