Aditya Kumar

https://hiraditya.github.io/
Compilers, static analysis, software performance optimizations.
Last updated 2026-06-19T21:49:06+00:00. Generated by Jekyll. © 2026 Aditya Kumar

Building vLLM from Source: A Field Guide (with all the pitfalls)
Aditya Kumar, published 2026-06-19T15:00:00+00:00 (updated 2026-06-19T15:00:00+00:00)
https://hiraditya.github.io/posts/building-vllm-from-source/

A step-by-step field guide to building vLLM from source on Ubuntu 26.04, covering Python 3.14 compatibility, CUDA driver issues, and toolchain pitfalls.

vLLM's op IR, or: where the inference engine meets the compiler
Aditya Kumar, published 2026-06-17T15:00:00+00:00 (updated 2026-06-19T17:06:15+00:00)
https://hiraditya.github.io/posts/vllm-op-ir-where-inference-meets-compiler/

How vLLM's op-level IR reconciles the tension between a compiler target and hand-tuned kernel dispatch, enabling graph-level fusion while supporting multiple backends.

Loop Unrolling in the ML Era
Aditya Kumar, published 2026-06-16T15:00:00+00:00 (updated 2026-06-17T19:22:58+00:00)
https://hiraditya.github.io/posts/why-loop-unrolling-is-popular-again/

If you have a massive compute architecture—whether it’s a modern wide-SIMD vector engine, a Tensor Core array, or a custom deep learning accelerator like a Systolic Array—you face one fundamental problem: feeding the beast. You have immense execution width, but if your instructions are bottlenecked by branch overhead and short basic blocks, those execution units sit idle. This architectural sh...
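To make the branch-overhead argument concrete, here is a small sketch of manual loop unrolling in C. It is an illustration under our own assumptions, not code from the post: the function names `sum_rolled` and `sum_unrolled4` and the unroll factor of 4 are arbitrary choices. The unrolled version executes one loop-control branch per four elements instead of one per element, and uses four independent accumulators so the additions do not form a single serial dependency chain.

```c
#include <stddef.h>

/* Baseline: one compare-and-branch per element. */
long sum_rolled(const int *a, size_t n) {
    long s = 0;
    for (size_t i = 0; i < n; ++i)
        s += a[i];
    return s;
}

/* Unrolled by 4 (illustrative factor): one loop branch per four elements,
   with four independent accumulators that can be updated in parallel. */
long sum_unrolled4(const int *a, size_t n) {
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    /* Remainder loop handles the final n % 4 elements. */
    for (; i < n; ++i)
        s0 += a[i];
    return s0 + s1 + s2 + s3;
}
```

For an array holding 1 through 10, both functions return 55. With integer data the reordered additions are exact; with floating-point data, splitting the sum across accumulators changes the rounding order, which is why compilers generally only apply this kind of reassociation to floating-point reductions under relaxed math flags.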

"Hello, World!" in a Heterogeneous System
Aditya Kumar, published 2026-06-13T15:00:00+00:00 (updated 2026-06-16T14:25:05+00:00)
https://hiraditya.github.io/posts/hello-world-in-a-heterogeneous-system/

In a previous post, we explored the monumental software stack required to run a simple “Hello, World!” program on a modern operating system. But what happens when we apply these concepts to a heterogeneous system—where a host machine is solely responsible for launching the program on a completely different target architecture? Applying concepts like loaders, stack initialization, and ABI const...

Hardening the ELF: Understanding RELRO and GOT Overwrites
Aditya Kumar, published 2026-06-12T15:00:00+00:00 (updated 2026-06-16T05:38:50+00:00)
https://hiraditya.github.io/posts/hardening-the-elf-understanding-relro/

In our previous post, we took a deep dive into the hidden complexities of the simplest C program. We discussed how modern Position Independent Executables (PIE) rely on the PLT (Procedure Linkage Table) and GOT (Global Offset Table) to dynamically resolve shared library functions like puts(). We noted that under “lazy binding”, the dynamic linker looks up the true memory address of puts on the...
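The lazy-binding idea can be illustrated with a user-space analogy in C. This is a hypothetical sketch, not how the dynamic linker is actually implemented: `got_puts`, `resolve_puts`, `call_puts`, and `resolver_calls` are names invented for illustration. A writable function-pointer slot (standing in for a GOT entry) initially points at a resolver; the first call through the slot "looks up" `puts`, patches the slot, and forwards the call, so every later call goes straight to `puts`.

```c
#include <stdio.h>

/* Hypothetical analogy of a lazily bound GOT slot. */
typedef int (*puts_fn)(const char *);

static int resolve_puts(const char *s);

/* The "GOT entry": starts out pointing at the resolver stub. */
puts_fn got_puts = resolve_puts;
int resolver_calls = 0;

static int resolve_puts(const char *s) {
    ++resolver_calls;   /* counts how often the "symbol lookup" runs */
    got_puts = puts;    /* patch the slot with the real target address */
    return got_puts(s); /* forward this first call to the real function */
}

/* The "PLT stub": every call site jumps through the slot. */
int call_puts(const char *s) {
    return got_puts(s);
}
```

Calling `call_puts` twice runs the resolver only once. Because the slot stays writable, anything able to overwrite it redirects every later call; full RELRO removes that opportunity by resolving symbols at load time and then mapping the GOT read-only.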