
Efficient AI Computing,
Transforming the Future.

Projects


VLASH: Real-Time VLAs via Future-State-Aware Asynchronous Inference

IROS 2026

VLASH is a general asynchronous inference framework for VLAs that delivers smooth, accurate, and low-latency control with no overhead or architectural changes. By rolling the robot state forward with the previous action chunk, it achieves up to 2.03× speedup and 17.4× lower reaction latency while fully preserving accuracy.
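The idea of rolling the robot state forward can be illustrated with a minimal sketch. It is not VLASH's implementation: the function name `roll_state_forward`, the additive (delta) action model, and the step counts are all assumptions made for illustration. While the model computes the next chunk, the robot keeps executing actions from the previous chunk, so the policy is conditioned on the state the robot is expected to reach rather than the stale state observed when inference started.

```python
from typing import List, Sequence


def roll_state_forward(
    state: Sequence[float],
    pending_actions: Sequence[Sequence[float]],
) -> List[float]:
    """Estimate the state the robot will be in once inference finishes.

    Assumption (not from the source): each action is a per-step delta
    that is added to the state, e.g. joint-position increments.
    """
    future = list(state)
    for action in pending_actions:
        if len(action) != len(future):
            raise ValueError("action and state dimensions must match")
        future = [s + a for s, a in zip(future, action)]
    return future


# Hypothetical previous action chunk: 4 steps for a 2-DoF robot.
previous_chunk = [[0.1, 0.0], [0.1, 0.0], [0.0, 0.2], [0.0, 0.2]]
steps_already_executed = 1  # steps consumed before inference starts
inference_delay_steps = 2   # steps that will run while the model computes

# Actions that will still be executed during the inference window.
pending = previous_chunk[
    steps_already_executed : steps_already_executed + inference_delay_steps
]
current_state = [1.0, 2.0]

# The policy would be queried with this estimate instead of current_state.
future_state = roll_state_forward(current_state, pending)
print(future_state)
```

In a real system the rollout would use the robot's actual action semantics (absolute targets, velocities, or a learned dynamics model) rather than simple addition.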

ForeAct: Steering Your VLA with Efficient Visual Foresight Planning

CVPR 2026 Highlight

ForeAct is a plug-and-play visual foresight planner that enables state-of-the-art VLAs to anticipate high-fidelity future observations for improved decision-making, generating 640×480 predictions in just 0.33s on a single H100 GPU without any architectural changes.

Locality-aware Parallel Decoding for Efficient Autoregressive Image Generation

ICLR 2026 Oral

We introduce Locality-aware Parallel Decoding to accelerate autoregressive (AR) image generation. It is 13× faster than traditional AR models and at least 3.4× faster than previous parallelized AR models.

StreamingVLM: Real-Time Understanding for Infinite Video Streams

ICLR 2026

StreamingVLM enables real-time understanding of infinite video streams with low, stable latency. By aligning training on overlapped video chunks with an efficient KV cache, it runs at 8 FPS on a single H100. It achieves a 66.18% win rate against GPT-4o mini on a new benchmark whose videos average over 2 hours in length.