Disaggregated Prefill/Decode on Consumer GPUs

Running llm-d’s disaggregated prefill/decode architecture across an RTX 3060 and a Tesla T4 connected by 25GbE RDMA. What worked, what broke, and what I learned about KV cache transfer at the edge of what consumer hardware can do.

March 14, 2026 · 10 min · Sam Batschelet