Kobako: Cold Start Can Be 100× Faster?

contact@aotoki.me (Aotokitsuruya) — Wed, 17 Jun 2026 00:00:00 +0800

Kobako is a sandbox for the Ruby ecosystem that I recently built on WebAssembly and mruby. It supports Harness Engineering by filling a gap: code written by AI has no safe environment to run in.

I introduced Kobako’s design in a previous post, so this time I want to talk about performance. In early versions, the Cold Start (the initial startup) took roughly 500 ms, much slower than the 200 ms response time commonly targeted as a best practice. AI agents generally tolerate slower responses, but this delay isn’t spent waiting on an LLM, so it should still be judged by traditional API standards.

Ahead-of-Time Compilation

After years of development, the WebAssembly ecosystem has gained many new techniques and specifications. One of them is cwasm, the artifact Wasmtime produces when it pre-compiles a .wasm module, which is a form of AOT (Ahead-of-Time) compilation.

Why does this matter? A .wasm file is compiled for the WebAssembly virtual machine, not for a real CPU. Running mruby on top of it means starting an mruby virtual machine inside a WebAssembly virtual machine: two virtual machines stacked on top of each other.

With cwasm, the code is compiled ahead of time into machine code for the CPU of the current runtime environment. This comes very close to removing the WebAssembly layer from startup, leaving mostly the cost of the mruby virtual machine. Startup time drops from 500 ms to around 5 ms, making the Cold Start 100× faster.
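One way to organize "compile once per device, reuse afterward" is a small on-disk artifact cache. The Python sketch below is a simplified model, not Kobako’s or Wasmtime’s code: `fake_aot_compile` and `ArtifactCache` are hypothetical names, and the "compiler" only tags its input. The cache key combines the module hash with the host CPU architecture because precompiled machine code only runs on the CPU it targets (a real cwasm is also tied to the Wasmtime version and engine configuration, which this sketch ignores).

```python
import hashlib
import platform
import tempfile
from pathlib import Path


def fake_aot_compile(wasm_bytes: bytes) -> bytes:
    """Hypothetical stand-in for the expensive .wasm -> .cwasm step.

    A real AOT compiler emits machine code; this just tags the input with the
    host architecture so the caching logic can be exercised.
    """
    return b"NATIVE:" + platform.machine().encode() + b":" + wasm_bytes


class ArtifactCache:
    """Compile a module once per device, then reuse the stored artifact."""

    def __init__(self, cache_dir: Path) -> None:
        self.cache_dir = cache_dir
        self.compile_count = 0  # how many times the slow path has run

    def _key(self, wasm_bytes: bytes) -> str:
        # Machine code only runs on the CPU it targets, so the
        # architecture is part of the cache key alongside the module hash.
        digest = hashlib.sha256(wasm_bytes)
        digest.update(platform.machine().encode())
        return digest.hexdigest()

    def load(self, wasm_bytes: bytes) -> bytes:
        path = self.cache_dir / f"{self._key(wasm_bytes)}.cwasm"
        if path.exists():
            return path.read_bytes()  # fast path: reuse the artifact
        artifact = fake_aot_compile(wasm_bytes)  # slow path: compile once
        self.compile_count += 1
        path.write_bytes(artifact)
        return artifact


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cache = ArtifactCache(Path(tmp))
        module = b"\x00asm\x01\x00\x00\x00"  # wasm magic number + version 1
        first = cache.load(module)   # compiles and stores the artifact
        second = cache.load(module)  # reads the stored artifact
        print(first == second, cache.compile_count)  # True 1
```

Only the first `load` pays the compilation cost; every later call on the same machine is a file read.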

In short, Kobako itself doesn’t do much here. It simply leverages existing WebAssembly tooling to bring startup down to about the cost of a simple SQL query, which is practically imperceptible for most applications.

Coincidentally, Matz’s new project Spinel, which appeared at this year’s RubyKaigi, is also a kind of AOT compiler for Ruby. cwasm, on the other hand, was completely new to me, since my understanding had always stopped at .wasm; collaborating with AI unexpectedly taught me a new concept.

Pre-Booting

After Kobako has started once on a device, AOT brings every subsequent Cold Start down from 500 ms to about 5 ms. Can Kobako::Sandbox.new get faster too?

In the original design, every Kobako::Sandbox.new creates its own independent instance, and each #run or #eval re-initializes the mruby virtual machine, which keeps the sandbox reusable and cleanly isolated. The downside is that every additional sandbox takes about 560 KB of memory, a poor fit for a design that wants to “create and destroy in bulk”: memory use grows quickly with the number of sandboxes. Each creation also costs about 130 µs of initialization, although that is already very fast.
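A quick back-of-the-envelope calculation shows why this hurts bulk creation. `bulk_cost` is a hypothetical helper that assumes every sandbox keeps its full private footprint, as in the original design, and plugs in the 560 KB and 130 µs figures above.

```python
from typing import Tuple


def bulk_cost(sandboxes: int, kb_each: float, us_each: float) -> Tuple[float, float]:
    """Return (memory in MB, creation time in seconds) for N sandboxes that
    each keep a full private footprint.

    Decimal units (1 MB = 1000 KB) keep the numbers easy to read.
    """
    memory_mb = sandboxes * kb_each / 1000
    seconds = sandboxes * us_each / 1_000_000
    return memory_mb, seconds


if __name__ == "__main__":
    for n in (1, 100, 10_000):
        mb, s = bulk_cost(n, kb_each=560, us_each=130)
        print(f"{n:>6} sandboxes: {mb:>9.2f} MB, {s:.4f} s")
```

For 10,000 sandboxes this comes to about 5,600 MB of memory and 1.3 s spent on initialization alone.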

Wasmtime offers a mechanism called wizer that lets us prepare the runtime memory ahead of time. This works because a freshly started mruby virtual machine leaves memory in exactly the same state in every Kobako::Sandbox that uses the same .wasm. If we run the mruby virtual machine up to that started state once and save (“bake”) the resulting memory into the .wasm, later startups can load it directly instead of recomputing it.
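The baking idea can be modeled with a bytearray standing in for WebAssembly linear memory. This is a simplified Python sketch under that assumption: `boot_vm`, `bake`, and `start_from_snapshot` are hypothetical names, not wizer’s interface. It only works because booting is deterministic, which is exactly the property described above; wizer itself does this on real modules by running an initialization function and snapshotting the initialized state into a new .wasm.

```python
PAGE_SIZE = 64 * 1024  # one WebAssembly linear-memory page (64 KiB)


def boot_vm() -> bytearray:
    """Hypothetical stand-in for booting the mruby VM.

    The work is deterministic: it always leaves memory in the same state,
    which is the property baking depends on.
    """
    memory = bytearray(PAGE_SIZE)
    for i in range(0, PAGE_SIZE, 256):
        memory[i] = (i // 256) % 251  # pretend interpreter tables
    return memory


def bake() -> bytes:
    """Run initialization once and freeze the resulting memory image."""
    return bytes(boot_vm())


def start_from_snapshot(image: bytes) -> bytearray:
    """Later startups copy the baked image instead of calling boot_vm()."""
    return bytearray(image)


if __name__ == "__main__":
    image = bake()
    fresh = start_from_snapshot(image)
    print(fresh == boot_vm())  # True
```

If booting depended on anything that changes between runs (time, randomness, environment), the snapshot would no longer match a fresh boot and baking would be unsafe.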

Combining this with two Wasmtime features, InstancePre and Copy-on-Write, we can push startup time and memory usage even lower.

InstancePre - Resolves Kobako’s ABI (Application Binary Interface), the host functions the module imports, once ahead of time and reuses the result, instead of redoing it on every instantiation
Copy-on-Write - A new Kobako::Sandbox shares the same baked mruby memory image instead of building its own, and copies only the parts it writes to
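The InstancePre idea can be shown with a toy Python model. `Linker`, `InstancePre`, and the `kobako`/`host_call` import name are hypothetical illustrations that borrow Wasmtime’s terminology; they are not Wasmtime’s or Kobako’s API. Wasmtime’s real InstancePre also type-checks each import against the module’s declared signature, which this sketch skips.

```python
from typing import Callable, Dict, List, Tuple

HostFn = Callable[..., object]


class Linker:
    """Toy registry of host functions keyed by (module, name)."""

    def __init__(self) -> None:
        self.defs: Dict[Tuple[str, str], HostFn] = {}
        self.lookups = 0  # counts how often imports are resolved

    def define(self, module: str, name: str, fn: HostFn) -> None:
        self.defs[(module, name)] = fn

    def resolve(self, module: str, name: str) -> HostFn:
        self.lookups += 1
        return self.defs[(module, name)]  # KeyError if an import is missing


class InstancePre:
    """Resolve a module's imports once, then instantiate many times."""

    def __init__(self, linker: Linker, imports: List[Tuple[str, str]]) -> None:
        self.resolved = [linker.resolve(m, n) for m, n in imports]

    def instantiate(self) -> Dict[str, object]:
        # No lookups here: every instance reuses the pre-resolved imports
        # and only gets fresh per-instance state.
        return {"imports": self.resolved, "state": {}}


if __name__ == "__main__":
    linker = Linker()
    linker.define("kobako", "host_call", lambda *args: None)  # hypothetical import
    pre = InstancePre(linker, [("kobako", "host_call")])
    instances = [pre.instantiate() for _ in range(1000)]
    print(len(instances), linker.lookups)  # 1000 1
```

The resolution cost is paid once when the InstancePre is built, no matter how many sandboxes are created from it afterward.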

With these in place, a single Kobako::Sandbox.new drops from 130 µs to around 30 µs, and memory usage per sandbox falls from 560 KB to about 1 KB, because each sandbox no longer keeps a full copy of memory: all sandboxes share the same base until one of them changes something.
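To see why per-sandbox memory can shrink so far, here is a toy Python model of page-level copy-on-write. `CowMemory` is a hypothetical class, not Wasmtime’s implementation (which relies on the operating system’s virtual memory mapping rather than Python objects); it captures the rule that instances share one read-only base image and copy a page only on their first write to it.

```python
from typing import Dict

PAGE = 4096  # bytes; a common OS page size


class CowMemory:
    """Memory that shares a read-only base image and copies a page
    only on this instance's first write to it."""

    def __init__(self, base: bytes) -> None:
        self.base = base  # shared by every instance, never modified
        self.dirty: Dict[int, bytearray] = {}  # page index -> private copy

    def read(self, addr: int) -> int:
        page, offset = divmod(addr, PAGE)
        if page in self.dirty:
            return self.dirty[page][offset]
        return self.base[addr]

    def write(self, addr: int, value: int) -> None:
        page, offset = divmod(addr, PAGE)
        if page not in self.dirty:
            start = page * PAGE
            # First write to this page: copy it out of the shared base.
            self.dirty[page] = bytearray(self.base[start:start + PAGE])
        self.dirty[page][offset] = value

    def private_bytes(self) -> int:
        """Memory this instance owns beyond the shared base."""
        return len(self.dirty) * PAGE


if __name__ == "__main__":
    base = bytes(16 * PAGE)  # stand-in for the baked mruby image
    sandboxes = [CowMemory(base) for _ in range(1000)]
    sandboxes[0].write(10, 42)
    print(sandboxes[0].read(10), sandboxes[1].read(10))  # 42 0
    print(sum(s.private_bytes() for s in sandboxes))  # 4096
```

A thousand sandboxes share one base image, and only the single sandbox that wrote anything pays for a private page.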

Design Trade-offs

I had originally expected to trade more memory for speed, but the result surprised me: both speed and memory usage improved dramatically, which helped me bring the project to a production-ready level much sooner.

What stands out is that most of these techniques aren’t new; they build on mechanisms that have been in use for years. What AI did was help me dig them up quickly and put them where they belonged.

For software engineering as a profession, the impact still matches the shift we already know: experienced engineers become more valuable, precisely because “accepting unconditionally” and “selecting consciously” are two different things, and the latter is the mark of a professional engineer. Even before this revision, I had made the intuitive judgment that “this can be reused with a cache,” which reduced “waiting 500 ms every time” to waiting just once.

Once the overall functionality was complete, though, thinking about per-startup time and memory footprint led me to ask, “Can it be even better?” Since I wasn’t familiar with WebAssembly, I used AI to gather and explore information extensively, and quickly confirmed that AOT, baking, and similar techniques were sound and worthwhile. I then worked in stages (if AOT hadn’t paid off, baking alone probably wouldn’t have made much difference), moving forward step by step to a better outcome.

I think this question is well worth pondering in the AI era: what “acceleration” do we actually get from AI? In Kobako’s case, the dramatic improvement came from gathering information rapidly, combined with enough software development knowledge to make the right judgment calls.

WebAssembly on Aotokitsuruya is learning