Cyrus Leung
|
e5de19ff9a
|
[CI/Build] Don't auto-rebase PRs with CI failures (#39443)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-04-09 13:57:37 -07:00 |
|
zzaebok
|
edee96519a
|
[Spec Decode] fix returning size mismatch on extract hidden states proposer (#38610)
Signed-off-by: Jaebok Lee <jaebok9541@naver.com>
Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
|
2026-04-09 20:39:39 +00:00 |
|
Rishi Puri
|
adaabb8a55
|
Add nightly b200 test for spec decode eagle correctness (#38577)
Signed-off-by: Rishi Puri <riship@nvidia.com>
|
2026-04-09 20:09:09 +00:00 |
|
Ekagra Ranjan
|
f7cad67412
|
[ASR] Fix spacing between chunks in multi-chunk audio transcription (#39116)
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
|
2026-04-09 12:46:33 -07:00 |
|
Xinyu Chen
|
a8134aef4e
|
[XPU] check is_xccl_available before oneccl warmup (#39302)
Signed-off-by: Xinyu Chen <xinyu1.chen@intel.com>
|
2026-04-09 12:42:17 -07:00 |
|
Michael Goin
|
2800706f06
|
[Refactor] Move NVFP4 GEMM management into NvFp4LinearKernel (#39129)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2026-04-09 15:05:36 -04:00 |
|
Cyrus Leung
|
0d310ffbeb
|
[CI/Build] Update auto-rebase rule (#39429)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-04-09 10:59:56 -07:00 |
|
Micah Williamson
|
d5f75fdf50
|
[ROCm] Correctly guard fused_silu_mul_block_quant on ROCm (#39387)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
|
2026-04-09 17:59:03 +00:00 |
|
PikaPikachu
|
827268e98d
|
[Quantization] Support Quark W8A8 INT8 MoE inference (#36320)
Signed-off-by: kangletian <Letian.Kang@amd.com>
|
2026-04-09 17:24:43 +00:00 |
|
Wentao Ye
|
56e19d7ee2
|
[Model Runner V2] Fix flex attention kv blocks calculation issue (#39353)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-04-09 13:07:43 -04:00 |
|
Andreas Karatzas
|
9036d4c464
|
[ROCm][CI] Resolved nvidia package deps issue (#39421)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-04-10 00:06:06 +08:00 |
|
Lucas Kabela
|
a8c6ee9b78
|
[Performance Improvement] Update batched_count_greater_than to handle batch size 1 without recompile (#38933)
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2026-04-09 23:51:31 +08:00 |
|
Cyrus Leung
|
3b1d9c3156
|
[CI/Build] Fix memory cleanup in MM test (#39411)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-04-09 08:50:45 -07:00 |
|
Cyrus Leung
|
54d244f28f
|
[UX] Improve error message for MM input too long (#39409)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-04-09 13:20:19 +00:00 |
|
Richard Zou
|
6c749399b7
|
[BugFix] fix tests/kernels/moe/test_moe_layer.py (#39404)
Signed-off-by: Richard Zou <zou3519@gmail.com>
|
2026-04-09 08:48:59 -04:00 |
|
lalit10
|
91eea72330
|
[Tests] Add Qwen3-VL multimodal memory leak check (#39268)
Signed-off-by: Lalit Laxminarayan Bangad <lalitbangad@gmail.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-04-09 04:54:46 -07:00 |
|
Andrii Skliar
|
df2503e125
|
nemotron-nano-vl: Allow use_audio_in_video to be passed at vllm serve time (#38538)
Signed-off-by: Andrii Skliar <askliar@nvidia.com>
Co-authored-by: Andrii Skliar <askliar@nvidia.com>
|
2026-04-09 11:44:39 +00:00 |
|
Nick Hill
|
c8d98f81f6
|
[Core] Simplify API server handshake (#39364)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-04-09 18:56:15 +08:00 |
|
Harry Mellor
|
d87fb264df
|
[Docs] Bring README updates into docs README (#39397)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-04-09 10:35:00 +00:00 |
|
wang.yuqi
|
66c079ae83
|
[Frontend][4/n] Improve pooling entrypoints | pooling. (#39153)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2026-04-09 10:09:45 +00:00 |
|
Shengqi Chen
|
b6c9be509e
|
[CI] fix possible user permission issues in nightly index generation (#39390)
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
|
2026-04-09 08:14:07 +00:00 |
|
Qidong Su
|
ed733802f0
|
Fix NUMA binding on non-CDMM Grace-Blackwell systems (#39361)
Signed-off-by: Qidong Su <soodoshll@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-04-09 07:36:51 +00:00 |
|
Andrew Barnes
|
8a34c5087a
|
[ROCm] Remove unnecessary fp8 roundtrip in gather cache NHD dequant (#39122)
Signed-off-by: Bortlesboat <bortstheboat@gmail.com>
|
2026-04-09 15:12:22 +08:00 |
|
Wentao Ye
|
ed2f282bc8
|
[Perf] Optimize redundant sync for pooling model, 3.7% Throughput Improvement (#39113)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-04-08 23:12:23 -07:00 |
|
Zhewen Li
|
9e78555743
|
[Docker] Add fastsafetensors to NVIDIA Dockerfile (#38950)
|
2026-04-08 22:21:37 -07:00 |
|
sihao_li
|
e80e633927
|
[XPU] Skip VLLM_BATCH_INVARIANT for XPU in EAGLE DP test (#39164)
Signed-off-by: sihao.li <sihao.li@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
|
2026-04-09 12:45:16 +08:00 |
|
Khairul Kabir
|
490f17d0c7
|
[Multimodal] Fix nested_tensors_equal: add length check for lists and tuple support (#38388)
Signed-off-by: khairulkabir1661 <khairulkabir1661@users.noreply.github.com>
Co-authored-by: khairulkabir1661 <khairulkabir1661@users.noreply.github.com>
|
2026-04-09 04:40:37 +00:00 |
|
Yongye Zhu
|
2e98406048
|
[Refactor] Improve indexer decode path metadata preparation (#38865)
|
2026-04-08 20:49:15 -07:00 |
|
Chendi.Xue
|
ef5a226819
|
[PD][HeteroArch] Fix accuracy issue with CPU_ATTN as decoder and Flash_ATTN as prefiller (#38935)
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
|
2026-04-09 11:19:07 +08:00 |
|
Wentao Ye
|
aec18492d0
|
[CI] Fix mypy for vllm/v1/ops (#39219)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-04-09 11:06:34 +08:00 |
|
noobHappylife
|
2a49284c8a
|
Fix Responses JSON schema alias serialization (#38519)
Signed-off-by: noobhappylife <aratar1991@hotmail.com>
Co-authored-by: OpenAI Codex <codex@openai.com>
|
2026-04-09 10:50:16 +08:00 |
|
Ilya Boytsov
|
d37b378762
|
[Model] Update ColModernVBERT to support latest HF checkpoint (#39307)
Signed-off-by: Ilya Boytsov <ilyaboytsov1805@gmail.com>
|
2026-04-09 10:48:51 +08:00 |
|
Wei Zhao
|
92fbec391b
|
[Bug] Fix routing bias dtype for trtllm per-block fp8 moe (#38989)
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
|
2026-04-08 19:42:43 -07:00 |
|
Ajay Anubolu
|
2f41d6c063
|
[Bugfix] Fix cpu-offload-gb assertion with non-default block sizes (#36461)
Signed-off-by: AjAnubolu <anuboluajay@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2026-04-08 19:42:16 -07:00 |
|
Dipika Sikka
|
3aecdf08b4
|
[Gemma4] Support quantized MoE (#39045)
Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com>
|
2026-04-08 21:57:53 -04:00 |
|
Michael Goin
|
eb4205fee5
|
[UX] Integrate DeepGEMM into vLLM wheel via CMake (#37980)
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
|
2026-04-08 18:56:32 -07:00 |
|
liuzhenwei
|
83aea2147f
|
[XPU][UT] update UTs in CI (#39296)
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>
Signed-off-by: Kunshang Ji <jikunshang95@gmail.com>
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
Co-authored-by: Kunshang Ji <jikunshang95@gmail.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
|
2026-04-09 09:38:16 +08:00 |
|
Maral
|
2e9034c998
|
[W8A8 Block Linear Refactor][2/N] Remove W8A8Fp8BlockLinearOp and adopt Fp8 block linear kernel selections. (#33892)
Signed-off-by: maral <maralbahari.98@gmail.com>
Signed-off-by: Maral <maralbahari.98@gmail.com>
|
2026-04-09 08:50:39 +08:00 |
|
Benjamin Chislett
|
8332078cfd
|
[Bugfix] FlashInfer MXINT4 MoE crashes, missing do_finalize (#39315)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-04-08 20:36:33 -04:00 |
|
Richard Zou
|
ba4a78eb5d
|
[torch.compile] Allow usage of Opaque Objects in PyTorch 2.11 (#39286)
Signed-off-by: Richard Zou <zou3519@gmail.com>
|
2026-04-08 23:21:10 +00:00 |
|
Kai Song
|
f3c7941ec8
|
[Bugfix] Fix EP precision for Qwen3.5, Qwen3-Next (#39181)
Signed-off-by: Song Kai <songkai05@baidu.com>
|
2026-04-09 01:47:48 +04:00 |
|
Wentao Ye
|
3352bf8b03
|
[CI Bug] Fix pre-commit issue in main (#39347)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-04-08 14:10:05 -07:00 |
|
triangleXIV
|
7c94ae16c6
|
[BugFix] --max-model-len=-1 causes over-limit requests to hang and starve the entire service (#39102)
Signed-off-by: triangle14 <y1019026570@gmail.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
|
2026-04-08 14:03:17 -07:00 |
|
Rishi Puri
|
ad05edfbca
|
tests/v1/e2e/spec_decode: assert async scheduling is used (#39206)
Signed-off-by: Rishi Puri <riship@nvidia.com>
Signed-off-by: Rishi Puri <puririshi98@berkeley.edu>
Signed-off-by: sfeng33 <4florafeng@gmail.com>
Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com>
Co-authored-by: Flora Feng <4florafeng@gmail.com>
|
2026-04-08 20:30:03 +00:00 |
|
Wentao Ye
|
2018137242
|
[Feature] Batch invariant nvfp4 linear support (#39322)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-04-08 16:29:13 -04:00 |
|
Jackmin801
|
a776a48b1c
|
[MoE] Move DEEP_GEMM into experts/ subdirectory (#39005)
Signed-off-by: Jackmin801 <ongjackm@gmail.com>
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2026-04-08 19:23:08 +00:00 |
|
Ben Browning
|
8477fe427d
|
[Tool] adjust_request to reasoning parser, and Gemma4 fixes (#39027)
Signed-off-by: Ben Browning <bbrownin@redhat.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
|
2026-04-08 19:04:04 +00:00 |
|
Lain
|
e24e0a43a4
|
[Attention] relax the head dim 512 and paged kv for sm90+FA4 (#38835)
Signed-off-by: Siyuan Fu <siyuanf@nvidia.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2026-04-08 18:23:18 +00:00 |
|
Roberto L. Castro
|
b55d830ec7
|
[Perf][Kernel] Persistent TopK scheduler: unified CUDAGraph-safe kernel with dynamic per-row dispatch - DeepSeek-V3.2 DSA decode (#37421)
Signed-off-by: LopezCastroRoberto <rocastro@redhat.com>
Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
|
2026-04-08 13:35:57 -04:00 |
|
Shengqi Chen
|
75e01a39a1
|
[Feature] NUMA binding support for GPU workers (#38635)
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Co-authored-by: Jason Li <jasonlizhengjian@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2026-04-08 09:55:24 -07:00 |
|