class SagarSahu:
    role = "Senior Software Engineer"
    exp = "10+ years shipping production systems"
    focus = ["Heterogeneous Computing", "Edge AI", "Systems Programming", "Cloud & Security"]
    languages = ["Python", "C++", "C", "Rust", "Shell", "TypeScript"]
    currently = "Programming AMD GPUs & NPUs for accelerated AI inference"

I build software that runs close to the metal, from GPU kernels and NPU inference pipelines to cloud-scale security agents deployed on millions of endpoints. I care about performance, correctness, and writing code that other engineers enjoy reading.

| Project | What it does | Tech |
|---|---|---|
| gemem-AMD | 9-session tutorial series: heterogeneous computing on AMD GPU (RDNA 3.5) + NPU (XDNA 2), from OpenCL kernels to INT8 quantization | Python, OpenCL, ONNX Runtime, PyTorch |
| YoutubeSummarizer | AI-powered YouTube video summarizer with periodic updates | Python, AI/ML |
| sample-app-aoai-chatGPT | Web chat interface powered by Azure OpenAI ChatGPT | TypeScript, Azure OpenAI |
| distributed_computing | Distributed systems fundamentals: consensus, replication, fault tolerance | C, MPI |
- 🔥 Heterogeneous Computing — Writing GPU/NPU kernels for AMD Ryzen AI (RDNA 3.5 + XDNA 2)
- 🧮 Performance Engineering — Amdahl's Law → Roofline Model → Kernel Fusion → real benchmarks
- 🤖 Edge AI — INT8 quantization, ONNX Runtime execution providers, on-device inference
- ☁️ Cloud + AI — Azure OpenAI, Databricks, building production ML pipelines
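
For readers new to the performance engineering terms above, here is a minimal sketch of the two textbook formulas behind Amdahl's Law and the Roofline Model. The function names and every number in the usage lines are hypothetical and chosen only for illustration; they are not measurements from any AMD device or from the projects listed here.

```python
def amdahl_speedup(parallel_fraction: float, n_units: float) -> float:
    """Overall speedup when `parallel_fraction` of the runtime is sped up `n_units` times."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if n_units <= 0:
        raise ValueError("n_units must be positive")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_units)


def roofline_gflops(peak_gflops: float, bandwidth_gbs: float, intensity: float) -> float:
    """Attainable GFLOP/s: the lower of the compute roof and the memory roof.

    `intensity` is arithmetic intensity in FLOP per byte moved.
    """
    return min(peak_gflops, bandwidth_gbs * intensity)


if __name__ == "__main__":
    # Hypothetical workload: 90% of the runtime parallelizes across 10 units.
    print(f"Amdahl speedup: {amdahl_speedup(0.9, 10):.2f}x")
    # Hypothetical device: 1000 GFLOP/s peak compute, 100 GB/s memory bandwidth.
    for ai in (2.0, 20.0):
        print(f"intensity {ai} FLOP/byte -> {roofline_gflops(1000.0, 100.0, ai):.0f} GFLOP/s")
```

At low arithmetic intensity the memory roof limits throughput, which is why kernel fusion (fewer round trips to memory per FLOP) appears next in the chain.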

