<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:media="http://search.yahoo.com/mrss/"><channel><title>WebAssembly on Aotokitsuruya is learning</title><link>https://blog.aotoki.me/en/tags/WebAssembly/</link><description>Recent content in WebAssembly on Aotokitsuruya is learning</description><generator>Hugo -- gohugo.io</generator><language>en</language><managingEditor>contact@aotoki.me (Aotokitsuruya)</managingEditor><webMaster>contact@aotoki.me (Aotokitsuruya)</webMaster><lastBuildDate>Mon, 15 Jun 2026 21:40:29 +0800</lastBuildDate><atom:link href="https://blog.aotoki.me/en/tags/WebAssembly/index.xml" rel="self" type="application/rss+xml"/><item><title>Kobako: Cold Start Can Be 100× Faster?</title><link>https://blog.aotoki.me/en/posts/2026/06/17/kobako-cold-start-100x-faster/</link><category>LLM</category><category>AI</category><category>Experience</category><category>Ruby</category><category>WebAssembly</category><category>Gem</category><category>Performance</category><pubDate>Wed, 17 Jun 2026 00:00:00 +0800</pubDate><author>contact@aotoki.me (Aotokitsuruya)</author><guid>https://blog.aotoki.me/en/posts/2026/06/17/kobako-cold-start-100x-faster/</guid><description>&lt;p&gt;&lt;a href="https://github.com/elct9620/kobako"&gt;Kobako&lt;/a&gt; is a sandbox I recently built on WebAssembly and mruby for the Ruby ecosystem, in support of &lt;a href="https://blog.aotoki.me/en/tags/Harness-Engineering/"&gt;Harness Engineering&lt;/a&gt;, to fill the gap where AI-written code has no safe environment to run in.&lt;/p&gt;
&lt;p&gt;I already introduced Kobako&amp;rsquo;s design in &lt;a href="https://blog.aotoki.me/en/posts/2026/05/20/kobako-ruby-sandbox-for-ai/"&gt;a previous post&lt;/a&gt;, so this time I want to talk about performance. In its early versions, the Cold Start (the initial startup) took roughly 500 ms. That&amp;rsquo;s a lot slower than the 200 ms response time you&amp;rsquo;d usually aim for as a best practice. AI callers generally tolerate slower responses, but running code in a sandbox involves no waiting on an LLM, so it should still be judged by traditional API standards.&lt;/p&gt;</description><content:encoded>&lt;p&gt;&lt;a href="https://github.com/elct9620/kobako"&gt;Kobako&lt;/a&gt; is a sandbox I recently built on WebAssembly and mruby for the Ruby ecosystem, in support of &lt;a href="https://blog.aotoki.me/en/tags/Harness-Engineering/"&gt;Harness Engineering&lt;/a&gt;, to fill the gap where AI-written code has no safe environment to run in.&lt;/p&gt;
&lt;p&gt;I already introduced Kobako&amp;rsquo;s design in &lt;a href="https://blog.aotoki.me/en/posts/2026/05/20/kobako-ruby-sandbox-for-ai/"&gt;a previous post&lt;/a&gt;, so this time I want to talk about performance. In its early versions, the Cold Start (the initial startup) took roughly 500 ms. That&amp;rsquo;s a lot slower than the 200 ms response time you&amp;rsquo;d usually aim for as a best practice. AI callers generally tolerate slower responses, but running code in a sandbox involves no waiting on an LLM, so it should still be judged by traditional API standards.&lt;/p&gt;
&lt;h2 id="pre-compile"&gt;Ahead-of-Time Compilation&lt;/h2&gt;
&lt;p&gt;After years of development, the WebAssembly ecosystem now offers many new techniques and specifications. One of them is &lt;code&gt;cwasm&lt;/code&gt;, the artifact Wasmtime produces when it pre-compiles a &lt;code&gt;.wasm&lt;/code&gt; module. It is a form of AOT (Ahead-of-Time) compilation specific to Wasmtime rather than part of the WebAssembly standard itself.&lt;/p&gt;
&lt;p&gt;This matters because a &lt;code&gt;.wasm&lt;/code&gt; file targets the WebAssembly virtual machine rather than a real CPU. Running mruby on top of it means starting an mruby virtual machine inside a WebAssembly virtual machine: two layers stacked on top of each other.&lt;/p&gt;
&lt;p&gt;With &lt;code&gt;cwasm&lt;/code&gt;, the module is compiled ahead of time into native machine code for the CPU of the current machine. Wasmtime no longer has to compile the module every time it is loaded. As a result, the WebAssembly layer adds almost nothing to startup, and what remains is mostly the cost of the mruby virtual machine. Startup time drops from about 500 ms to around 5 ms, making the Cold Start roughly 100× faster.&lt;/p&gt;
&lt;p&gt;In short, Kobako itself does very little here. It relies on Wasmtime&amp;rsquo;s precompilation to bring startup down to roughly the cost of a simple SQL query, which is practically imperceptible for most applications.&lt;/p&gt;
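&lt;p&gt;The &amp;ldquo;compile once per device, then reuse&amp;rdquo; behavior can be sketched as a small on-disk cache. This is a minimal, hypothetical Ruby sketch, not Kobako&amp;rsquo;s actual code: &lt;code&gt;PrecompiledCache&lt;/code&gt; and its &lt;code&gt;target:&lt;/code&gt; option are illustrative names. The expensive compile step is passed in as a block and stands in for Wasmtime&amp;rsquo;s ahead-of-time precompilation. The cache key includes a target string because a precompiled artifact is only valid for the runtime version and CPU it was built for.&lt;/p&gt;

```ruby
require "digest"
require "fileutils"

# Hypothetical on-disk cache for precompiled artifacts. The compile step is
# supplied as a block; in a real setup it would call the runtime's
# ahead-of-time precompilation.
class PrecompiledCache
  def initialize(dir, target:)
    @dir = dir
    @target = target # e.g. runtime version plus CPU, since artifacts are not portable
    FileUtils.mkdir_p(@dir)
  end

  # Returns the artifact for wasm_bytes, running the block only on a cache miss.
  def fetch(wasm_bytes)
    key = Digest::SHA256.new.update(@target).update(wasm_bytes).hexdigest
    path = File.join(@dir, "#{key}.cwasm")
    return File.binread(path) if File.exist?(path)

    artifact = yield(wasm_bytes)
    # Write to a temporary file, then rename, so readers never see a partial file.
    tmp = "#{path}.#{Process.pid}.tmp"
    File.binwrite(tmp, artifact)
    File.rename(tmp, path)
    artifact
  end
end
```

&lt;p&gt;Because the key is a digest of the module bytes, a rebuilt &lt;code&gt;.wasm&lt;/code&gt; misses the cache automatically and stale native code is never loaded. Including the target in the key has the same effect when the runtime or CPU changes.&lt;/p&gt;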
&lt;blockquote&gt;
&lt;p&gt;Coincidentally, &lt;a href="https://github.com/matz/spinel"&gt;Spinel&lt;/a&gt;, the new project Matz presented at this year&amp;rsquo;s RubyKaigi, is also a kind of AOT compiler for Ruby. &lt;code&gt;cwasm&lt;/code&gt;, on the other hand, was new to me. My understanding had always stopped at &lt;code&gt;wasm&lt;/code&gt;, so collaborating with AI unexpectedly taught me a new concept.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2 id="pre-boot"&gt;Pre-Booting&lt;/h2&gt;
&lt;p&gt;After Kobako has run once on a device, AOT brings every subsequent Cold Start down from 500 ms to about 5 ms. Can &lt;code&gt;Kobako::Sandbox.new&lt;/code&gt; itself get faster too?&lt;/p&gt;
&lt;p&gt;In the original design, every &lt;code&gt;Kobako::Sandbox.new&lt;/code&gt; creates its own independent instance and re-initializes the mruby virtual machine on each &lt;code&gt;#run&lt;/code&gt; or &lt;code&gt;#eval&lt;/code&gt;, which keeps sandboxes reusable and cleanly isolated. The catch is that every extra sandbox takes about 560 KB of memory. That is unfriendly to a design meant to &amp;ldquo;create and destroy in bulk&amp;rdquo;: memory grows linearly with the sandbox count, so 1,000 sandboxes would already need around 560 MB. Each one also costs about 130 µs to initialize, which is already fast but still leaves room for improvement.&lt;/p&gt;
&lt;p&gt;Wasmtime offers a pre-initializer called &lt;a href="https://crates.io/crates/wasmtime-wizer"&gt;wizer&lt;/a&gt; that lets us prepare the runtime memory ahead of time. This works because a freshly started mruby virtual machine has identical memory contents in every &lt;code&gt;Kobako::Sandbox&lt;/code&gt; that uses the same &lt;code&gt;.wasm&lt;/code&gt;. We can run the mruby virtual machine up to its initialized state once and save that memory snapshot into the &lt;code&gt;.wasm&lt;/code&gt; in advance. Later startups then no longer need to recompute it.&lt;/p&gt;
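&lt;p&gt;The idea behind this kind of baking can be illustrated in plain Ruby. The sketch is only an analogy under stated assumptions. Wizer snapshots a WebAssembly module&amp;rsquo;s linear memory and globals, whereas this hypothetical &lt;code&gt;BakedImage&lt;/code&gt; class snapshots a Ruby object with &lt;code&gt;Marshal&lt;/code&gt;. What carries over is the shape of the technique: run the expensive initializer once, store the result, and boot every later instance from the stored state.&lt;/p&gt;

```ruby
# Hypothetical analogy for pre-initialization: run the expensive setup once,
# snapshot the result, and boot later instances from the snapshot.
class BakedImage
  # Runs the initializer block once and stores its serialized result.
  def self.bake
    new(Marshal.dump(yield))
  end

  def initialize(snapshot)
    @snapshot = snapshot.freeze
  end

  # Every boot gets an independent copy of the already-initialized state,
  # without running the initializer again.
  def boot
    Marshal.load(@snapshot)
  end
end
```

&lt;p&gt;Each &lt;code&gt;boot&lt;/code&gt; here still produces a full private copy of the state, which mirrors the per-sandbox memory cost described earlier. Avoiding that copy is the job of copy-on-write.&lt;/p&gt;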
&lt;p&gt;Combining this with two Wasmtime features, &lt;code&gt;InstancePre&lt;/code&gt; and &lt;code&gt;Copy-on-Write&lt;/code&gt;, we can push startup time and memory usage even lower.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;InstancePre&lt;/code&gt; - Resolves the module&amp;rsquo;s imports against Kobako&amp;rsquo;s ABI (Application Binary Interface) once, up front, and reuses the result instead of redoing that work for every new instance&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Copy-on-Write&lt;/code&gt; - Every new &lt;code&gt;Kobako::Sandbox&lt;/code&gt; starts from the same baked mruby virtual machine image instead of building a separate one, and a memory page is copied only when that sandbox writes to it&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;With these in place, a single &lt;code&gt;Kobako::Sandbox.new&lt;/code&gt; drops from 130 µs to around 30 µs, and memory usage falls from 560 KB to 1 KB. Each sandbox no longer keeps a full copy of memory; it shares the same base image until it changes something.&lt;/p&gt;
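&lt;p&gt;Both mechanisms can be modeled in a few lines of Ruby. In this hypothetical sketch, &lt;code&gt;SandboxTemplate&lt;/code&gt; plays the role of &lt;code&gt;InstancePre&lt;/code&gt; by resolving the required host functions once. Each &lt;code&gt;CowSandbox&lt;/code&gt; reads from a shared, frozen memory image and copies a page only on its first write. Wasmtime itself relies on the operating system&amp;rsquo;s virtual memory to map the image copy-on-write rather than doing anything like this in Ruby. The class names and the 4096-byte page size are assumptions chosen for illustration.&lt;/p&gt;

```ruby
# Hypothetical sketch: page size chosen to match a typical OS page.
PAGE_SIZE = 4096

# Prepared once: resolves imports up front (the role InstancePre plays in
# Wasmtime) and freezes the baked memory image into shared pages.
class SandboxTemplate
  def initialize(image, registry, required_imports)
    # Raises KeyError here, once, if a required host function is missing.
    @imports = required_imports.to_h { |name| [name, registry.fetch(name)] }.freeze
    @pages = (0...image.bytesize).step(PAGE_SIZE).map { |offset| image.byteslice(offset, PAGE_SIZE).freeze }.freeze
  end

  # Cheap per-sandbox step: no import resolution and no memory copy.
  def instantiate
    CowSandbox.new(@pages, @imports)
  end
end

# Reads fall through to the shared pages; writes copy a page on first touch.
class CowSandbox
  def initialize(shared_pages, imports)
    @shared = shared_pages
    @imports = imports
    @private_pages = {} # page index mapped to this sandbox's own copy
  end

  def read(offset)
    page(offset / PAGE_SIZE).getbyte(offset % PAGE_SIZE)
  end

  def write(offset, byte)
    index = offset / PAGE_SIZE
    @private_pages[index] ||= @shared[index].dup # copy on first write only
    @private_pages[index].setbyte(offset % PAGE_SIZE, byte)
  end

  # Bytes this sandbox owns beyond the shared image.
  def private_bytes
    @private_pages.size * PAGE_SIZE
  end

  def call(name, *args)
    @imports.fetch(name).call(*args)
  end

  private

  def page(index)
    @private_pages.fetch(index) { @shared[index] }
  end
end
```

&lt;p&gt;An untouched sandbox owns no private pages at all, and one that writes a single byte owns exactly one page. This is why the per-sandbox footprint can drop so sharply once the image is shared.&lt;/p&gt;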
&lt;h2 id="design-decision"&gt;Design Trade-offs&lt;/h2&gt;
&lt;p&gt;I had originally expected to trade some extra memory for speed, but the result surprised me. Both speed and memory usage improved dramatically, which let me bring the project to a production-ready level much sooner.&lt;/p&gt;
&lt;p&gt;What stands out is that most of these techniques aren&amp;rsquo;t new; they build on mechanisms that have been in use for years. With AI&amp;rsquo;s help, I was able to dig them up quickly and put them where they belonged.&lt;/p&gt;
&lt;p&gt;The impact on software engineering as a profession still matches the shift we already know. Experienced engineers become more valuable, precisely because &amp;ldquo;accepting unconditionally&amp;rdquo; and &amp;ldquo;selecting consciously&amp;rdquo; are two different things, and the latter is the mark of a professional engineer. Even before this revision, I had intuitively judged that &amp;ldquo;this can be reused with a cache,&amp;rdquo; turning &amp;ldquo;waiting 500 ms every time&amp;rdquo; into waiting just once.&lt;/p&gt;
&lt;p&gt;Once the overall functionality was complete, though, thinking about per-startup time and memory footprint led me to ask, &amp;ldquo;Can it be even better?&amp;rdquo; I was not very familiar with WebAssembly, so I used AI to gather and explore information extensively. I quickly confirmed that AOT, baking, and similar techniques were sound and helpful. I then worked in stages, starting with AOT, since baking alone would probably not have made much difference if AOT had not panned out. Step by step, I pushed toward a better outcome.&lt;/p&gt;
&lt;p&gt;I think this is a question well worth pondering in the AI era: what &amp;ldquo;acceleration&amp;rdquo; do we actually get from AI? In Kobako&amp;rsquo;s case, the dramatic improvement to the whole project came from gathering information rapidly. That was combined with enough software development knowledge to arrive at the right judgment.&lt;/p&gt;</content:encoded></item></channel></rss>