Aotokitsuruya
Senior Software Developer
How Much Content Do Agent Skills Need?

This article was translated by AI; if you have any corrections, please let me know.

I previously wrote Should You Have Your Own Agent Skills? to explain why I believe everyone should maintain their own skills. But I've also noticed another phenomenon worth questioning: are those "impressive skill collections" really as magical as they seem?

From my own experience, things work just as well without all that content — stuffing in too much might actually waste tokens and even hurt performance.

Reasoning Model

Most frontier models today are reasoning models. Because they include explicit reasoning steps, their performance is more stable, which means they no longer need Chain-of-Thought (CoT) prompting to assist them.

Take Claude Code as an example. Articles like Best Practices for Claude Code and Effective context engineering for AI agents repeatedly emphasize the importance of “setting goals” and “not being overly detailed,” because being too prescriptive can interfere with how the model operates.

For instance, with a non-reasoning model, when I prompt "print Hello World," there's a high chance I'll get a Python or JavaScript file, simply because those languages are statistically more common in training data.

With a reasoning model, the additional reasoning steps can change things. The model might think: “Wait, the user mentioned they’re a Ruby engineer, so defaulting to Ruby would make more sense” — and then produce a Ruby version, rather than relying purely on probability.

This is a simplified example. Even smaller local models can handle this kind of reasoning when the context is limited.

If instead I say “print Hello World in PHP,” the explicit PHP constraint removes the reasoning model’s opportunity to explore. Since the language is already specified, the reasoning steps don’t need to consider user preferences, project conventions, or other context — which means one less chance to make a better judgment call. This is exactly why overly detailed instructions in skill design can backfire.

Skill Types

Now that we understand how reasoning works, we can use this as a foundation for thinking about skill design — specifically, how to guide “reasoning” to proceed as expected.

Using the “print Hello World” example from earlier: printing text is the goal, and there are many ways to achieve it. When context is available — like CLAUDE.md or AGENTS.md — the model can reason: “Based on the CLAUDE.md contents, this project is mostly Ruby files, so Ruby would be the most appropriate choice.”
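As an illustration, a minimal CLAUDE.md fragment might look like the sketch below. The contents are invented for this example, not taken from any real project. With something like this in context, the model has concrete grounds to pick Ruby:

```markdown
# Project Notes

- Language: Ruby 3.3 on Rails
- Run tests with `bin/rails test`
- One-off scripts live in `script/`; prefer plain Ruby over shell
```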

But there are different scenarios. The above was about executing a task. What if the situation becomes “there’s a lot of duplication in the implementation — what should we do?” How do we drive the model to choose the approach we expect?

  • Write a method to reduce duplication
  • Suggest using Factory Pattern
  • Use Design Patterns to maintain the project

Without any context, the model is most likely to choose “write a method to reduce duplication.” Compared to the concept of Creation Method (a simplified form of Factory Pattern), “implement a method to reduce duplication” is far more common in training data.

An experienced engineer would likely give the instruction “refactor using Factory,” and there’s a high chance the model would choose the Creation Method approach.

However, considering reasoning model capabilities, a more flexible approach would be: “use Design Patterns to refactor.” The model will reason its way to the Creation Method on its own — you don’t need to describe “duplication” or hint at “Factory.” The model will discover the duplication and reason that if we’re talking about Design Patterns, Factory is probably the best fit.
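As a sketch of where that reasoning should land, here is a minimal Ruby example of the Creation Method, a simplified form of the Factory Pattern. All class and method names are hypothetical: the point is that the "which class do I build?" decision, previously duplicated at every call site, moves into one place.

```ruby
# Concrete classes whose construction used to be duplicated at every
# call site (e.g. PdfReport.new(data) here, CsvReport.new(data) there).
class PdfReport
  def initialize(data)
    @data = data
  end
end

class CsvReport
  def initialize(data)
    @data = data
  end
end

# Creation Method: callers ask for a report by format symbol instead of
# picking a concrete class themselves, so the decision lives in one place.
module Report
  FORMATS = { pdf: PdfReport, csv: CsvReport }.freeze

  def self.for(format, data)
    klass = FORMATS.fetch(format) do
      raise ArgumentError, "unknown format: #{format.inspect}"
    end
    klass.new(data)
  end
end

report = Report.for(:pdf, %w[a b c])
```

Adding a new format now means one new class and one new hash entry, rather than hunting down every call site.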

Following this classification, we can identify two broad directions:

  • Methods for achieving the goal
  • Knowledge needed to achieve the goal

At the same time, most people would agree that Large Language Models (LLMs) know more than most humans. So the key isn’t “repeatedly reminding the model of knowledge” but rather getting the model to “focus on the right knowledge.”

Expert Skills

Building an “expert skill” requires more than just knowledge — we need to combine “methods” and “knowledge” to do it well.

Although Claude Code currently treats Commands and Skills as having the same role, the directory structure still provides a useful distinction — which happens to map to the difference between “methods” and “knowledge.”

Using my coding-skills as an example, everything in the commands/ directory represents a set of methods (workflows). These won’t turn a junior engineer into a senior one, but they can significantly reduce trial-and-error. An experienced engineer has a stable set of habits — for instance, when debugging, they check logs first instead of blindly modifying code.

If you’ve used Claude Code’s official /skill-creator, one of the evaluation criteria is token efficiency. Without skills enabled, most tokens are wasted on trial and error.

However, smooth workflows alone can only ensure the process goes well — they can’t guarantee quality. That’s why the instructions in the commands/ directory include Decision Tables that specify which skills to activate.

For example, the /coding:write command requires activating the coding:principles skill when writing new implementations, to understand the guiding principles. The “principles” skill briefly describes SOLID applications along with a Rubric to help verify whether the implementation meets standards.
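For illustration, a decision table inside a command file can be as small as the sketch below. The layout, and every skill name other than coding:principles, is hypothetical; the real tables in coding-skills may differ:

```markdown
| Situation                    | Activate skill     |
| ---------------------------- | ------------------ |
| Writing a new implementation | coding:principles  |
| Modifying existing behavior  | coding:refactoring |
| Investigating a bug          | coding:debugging   |
```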

As a result, when I use /coding:write for a brand-new implementation, coding:principles is automatically activated. Even though the skill provides no examples or concrete approaches, only violation rules like "Direct instantiation of dependencies," a model like Opus 4.6 will follow SOLID and naturally choose dependency injection rather than instantiating dependencies directly in the constructor.
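A minimal Ruby sketch of what that violation rule steers the model away from (all class names are hypothetical): the first constructor hard-wires its dependency, while the second accepts it as an argument, so any object that responds to #charge can be injected.

```ruby
# Violation: "direct instantiation of dependencies". The processor
# builds its own gateway, coupling it to one concrete class.
class HardwiredOrderProcessor
  def initialize
    @gateway = StripeGateway.new # StripeGateway is a hypothetical class
  end

  def process(amount)
    @gateway.charge(amount)
  end
end

# Dependency injection: the collaborator is passed in, so callers can
# supply any object that responds to #charge, including a test double.
class OrderProcessor
  def initialize(gateway:)
    @gateway = gateway
  end

  def process(amount)
    @gateway.charge(amount)
  end
end

# A fake gateway shows why the injected version is easy to test:
# it records what was charged instead of hitting a real service.
class FakeGateway
  attr_reader :charged

  def charge(amount)
    @charged = amount
    :ok
  end
end
```

Driving OrderProcessor with a FakeGateway exercises the processing logic without touching any real payment service, which is exactly what the hard-wired version cannot do.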

The reasoning behind this design is straightforward. First, trust that the model's knowledge far exceeds what we imagine; the right keywords are more effective reminders than long explanations. Once those keywords take effect, good context helps even more. The initial state comes from a well-maintained CLAUDE.md that explains basic commands and project structure. Commands then establish workflows that guide the model to check recent changes with git log, search semantically for relevant files, and connect all of this with the knowledge activated by skills. The result is remarkably good performance.

This is why we mostly don’t need to rely on skills designed by others, or those “impressively rich” skill sets. We just need to arrange proper workflows to introduce context, pair them with skills that precisely activate the necessary knowledge, and most of the time the results will be excellent.

This is also why I believe we don’t need those content-heavy skills. Skills should reflect a team’s or individual’s understanding and application of knowledge.