Pull requests: vllm-project/vllm
[Bugfix][Gemma4MoE] Fix AutoRound quantized Gemma4 MoE loading
#43227
opened May 20, 2026 by
wxwxwwxxx
[CPU] Experimentally enable Triton and MRV2
ci/build
cpu
Related to CPU backends
v1
#43225
opened May 20, 2026 by
bigPYJ1151
Member
•
Draft
4 tasks
Fix FlashInfer TRTLLM NvFP4 monolithic MoE routing
nvidia
#43223
opened May 20, 2026 by
zhangxin81
Contributor
[EPLB] Make async EPLB default
ci/build
documentation
Improvements or additions to documentation
#43219
opened May 20, 2026 by
ilmarkov
Contributor
4 tasks
[ROCm] MoRI connector telemetry
kv-connector
rocm
Related to AMD ROCm
#43218
opened May 20, 2026 by
simondanielsson
Contributor
•
Draft
4 tasks
[Misc] Add exponential distribution to multi-turn benchmark
performance
Performance-related issues
#43217
opened May 20, 2026 by
nikonyrh-siloai
4 tasks done
[Misc] Add --max-duration-sec to benchmark_serving_multi_turn.py
performance
Performance-related issues
#43215
opened May 20, 2026 by
nikonyrh-siloai
3 of 4 tasks
[Perf][DSv4] Add cuteDSL generic LL Blockwise FP8 GEMM
#43214
opened May 20, 2026 by
LopezCastroRoberto
Contributor
•
Draft
[Model] Fix MiniCPM-V 4.6 vit_merger qkv weight loading
#43213
opened May 20, 2026 by
tc-mb
Contributor
[Bugfix] Fix multi-turn benchmark's sleep to match the configured request rate
bug
Something isn't working
performance
Performance-related issues
#43212
opened May 20, 2026 by
nikonyrh-siloai
3 of 4 tasks
[Bugfix][Reasoning] Properly detect reasoning end when using thinking_token_budget
bug
Something isn't working
v1
#43210
opened May 20, 2026 by
schoennenbeck
Contributor
[7/n] Migrate pos_encoding and norm kernels to libtorch stable ABI (continued)
ci/build
#43209
opened May 20, 2026 by
cleonard530
Contributor
5 tasks
[Docs] Add drain shutdown section to Kubernetes deployment guide
documentation
Improvements or additions to documentation
#43208
opened May 20, 2026 by
markmc
Member
[KV Offload] Add get_request_offloading_context lifecycle hook
kv-connector
v1
#43205
opened May 20, 2026 by
ronensc
Contributor
4 tasks
[Cleanup]Simplify UnitaryKVCacheCoordinator hash_block_size assert
v1
#43204
opened May 20, 2026 by
maang-h
Contributor
[Spec Decode] Add FlashInfer metadata grouping for DFlash SWA
needs-rebase
nvidia
qwen
Related to Qwen models
speculative-decoding
v1
#43200
opened May 20, 2026 by
gq112
[Feature] Modify the fps parameter when loading the multimodal model Video.
bug
Something isn't working
multi-modality
Related to multi-modality (#4194)
#43198
opened May 20, 2026 by
lucky-dep
[CI] De-flake test_models for bigscience/bloom-560m
ready
ONLY add when PR is ready to merge/full CI is needed
#43197
opened May 20, 2026 by
haosdent
Contributor
Update KDA chunk prefill decay to use exp2 semantics
performance
Performance-related issues
verified
Run pre-commit for new contributors without triggering other tests
#43195
opened May 20, 2026 by
zexplorerhj
[Bugfix] fix device mismatch in MiniCPM-o-4_5 resampler
bug
Something isn't working
ready
ONLY add when PR is ready to merge/full CI is needed
#43194
opened May 20, 2026 by
yma11
Contributor
4 tasks
Extend prefix-cache soft-pin with a popular-insert signal (follow-up to #42985)
ci/build
deepseek
Related to DeepSeek models
documentation
Improvements or additions to documentation
kv-connector
nvidia
performance
Performance-related issues
v1
#43191
opened May 20, 2026 by
manueldomke
1 of 4 tasks