Ekagra Ranjan
|
f7cad67412
|
[ASR] Fix spacing bw chunks in multi chunk audio transcription (#39116)
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
|
2026-04-09 12:46:33 -07:00 |
|
PikaPikachu
|
827268e98d
|
[Quantization] Support Quark W8A8 INT8 MoE inference (#36320)
Signed-off-by: kangletian <Letian.Kang@amd.com>
|
2026-04-09 17:24:43 +00:00 |
|
Lucas Kabela
|
a8c6ee9b78
|
[Performance Improvement] Update batched_count_greater_than to handle batch size 1 without recompile (#38933)
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2026-04-09 23:51:31 +08:00 |
|
Cyrus Leung
|
3b1d9c3156
|
[CI/Build] Fix memory cleanup in MM test (#39411)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-04-09 08:50:45 -07:00 |
|
lalit10
|
91eea72330
|
[Tests] Add Qwen3-VL multimodal memory leak check (#39268)
Signed-off-by: Lalit Laxminarayan Bangad <lalitbangad@gmail.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-04-09 04:54:46 -07:00 |
|
wang.yuqi
|
66c079ae83
|
[Frontend][4/n] Improve pooling entrypoints | pooling. (#39153)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2026-04-09 10:09:45 +00:00 |
|
sihao_li
|
e80e633927
|
[XPU] Skip VLLM_BATCH_INVARIANT for XPU in EAGLE DP test (#39164)
Signed-off-by: sihao.li <sihao.li@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
|
2026-04-09 12:45:16 +08:00 |
|
Chendi.Xue
|
ef5a226819
|
[PD][HeteroArch]Fix accuracy issue with CPU_ATTN as Decoder and Flash_ATTN as prefiller (#38935)
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
|
2026-04-09 11:19:07 +08:00 |
|
noobHappylife
|
2a49284c8a
|
Fix Responses JSON schema alias serialization (#38519)
Signed-off-by: noobhappylife <aratar1991@hotmail.com>
Co-authored-by: OpenAI Codex <codex@openai.com>
|
2026-04-09 10:50:16 +08:00 |
|
Ilya Boytsov
|
d37b378762
|
[Model] Update ColModernVBERT to support latest HF checkpoint (#39307)
Signed-off-by: Ilya Boytsov <ilyaboytsov1805@gmail.com>
|
2026-04-09 10:48:51 +08:00 |
|
Michael Goin
|
eb4205fee5
|
[UX] Integrate DeepGEMM into vLLM wheel via CMake (#37980)
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
|
2026-04-08 18:56:32 -07:00 |
|
Maral
|
2e9034c998
|
[W8A8 Block Linear Refactor][2/N] Remove W8A8Fp8BlockLinearOp and adopt Fp8 block linear kernel selections. (#33892)
Signed-off-by: maral <maralbahari.98@gmail.com>
Signed-off-by: Maral <maralbahari.98@gmail.com>
|
2026-04-09 08:50:39 +08:00 |
|
Benjamin Chislett
|
8332078cfd
|
[Bugfix] FlashInfer MXINT4 MoE crashes, missing do_finalize (#39315)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-04-08 20:36:33 -04:00 |
|
Wentao Ye
|
3352bf8b03
|
[CI Bug] Fix pre-commit issue in main (#39347)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-04-08 14:10:05 -07:00 |
|
triangleXIV
|
7c94ae16c6
|
[BugFix] --max-model-len=-1 causes over-limit requests to hang and starve the entire service (#39102)
Signed-off-by: triangle14 <y1019026570@gmail.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
|
2026-04-08 14:03:17 -07:00 |
|
Rishi Puri
|
ad05edfbca
|
tests/v1/e2e/spec_decode: assert async scheduling is used (#39206)
Signed-off-by: Rishi Puri <riship@nvidia.com>
Signed-off-by: Rishi Puri <puririshi98@berkeley.edu>
Signed-off-by: sfeng33 <4florafeng@gmail.com>
Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com>
Co-authored-by: Flora Feng <4florafeng@gmail.com>
|
2026-04-08 20:30:03 +00:00 |
|
Wentao Ye
|
2018137242
|
[Feature] Batch invariant nvfp4 linear support (#39322)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-04-08 16:29:13 -04:00 |
|
Jackmin801
|
a776a48b1c
|
[MoE] Move DEEP_GEMM into experts/ subdirectory (#39005)
Signed-off-by: Jackmin801 <ongjackm@gmail.com>
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2026-04-08 19:23:08 +00:00 |
|
Ben Browning
|
8477fe427d
|
[Tool] adjust_request to reasoning parser, and Gemma4 fixes (#39027)
Signed-off-by: Ben Browning <bbrownin@redhat.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
|
2026-04-08 19:04:04 +00:00 |
|
Roberto L. Castro
|
b55d830ec7
|
[Perf][Kernel] Persistent TopK scheduler: unified CUDAGraph-safe kernel with dynamic per-row dispatch - DeepSeek-V3.2 DSA decode (#37421)
Signed-off-by: LopezCastroRoberto <rocastro@redhat.com>
Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
|
2026-04-08 13:35:57 -04:00 |
|
Shengqi Chen
|
75e01a39a1
|
[Feature] NUMA binding support for GPU workers (#38635)
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Co-authored-by: Jason Li <jasonlizhengjian@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2026-04-08 09:55:24 -07:00 |
|
Or Ozeri
|
512c5eb455
|
[kv_offload+HMA][5/N]: Track group block hashes and block IDs (#37109)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
|
2026-04-08 19:50:28 +03:00 |
|
Flora Feng
|
13151a4df4
|
[Bugfix] Fix Gemma4 streaming tool call corruption for split boolean/number values (#39114)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
|
2026-04-08 16:46:27 +00:00 |
|
Gregory Shtrasberg
|
56c976c1b5
|
[ROCm] Enable fused_silu_mul_block_quant on ROCm (#38817)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
|
2026-04-08 11:23:32 -05:00 |
|
haosdent
|
8904fc4d19
|
[Bugfix] Fix V1 logprobs empty strings for multi-byte UTF-8 tokens when logprobs > 0 (#34875)
Signed-off-by: haosdent <haosdent@gmail.com>
|
2026-04-08 15:30:00 +00:00 |
|
wang.yuqi
|
4e2ab1861d
|
[CI Failure] pin nomic-embed-text-v1 revision (#39292)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2026-04-08 11:43:06 +00:00 |
|
yoke
|
d734445fcd
|
[Bugfix][Frontend] Fix Gemma4 streaming HTML duplication after tool calls (#38909)
Signed-off-by: yoke233 <yoke2012@gmail.com>
|
2026-04-08 11:03:54 +08:00 |
|
Flora Feng
|
927975ead8
|
[Parser] Migrate response api streaming to unified parser (#38755)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
Signed-off-by: Andrew Xia <axia@meta.com>
|
2026-04-08 10:09:00 +08:00 |
|
Flora Feng
|
9ea7d670d8
|
[Bugfix] Fix Qwen3 tool parser for Responses API tools (#38848)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
|
2026-04-08 10:08:51 +08:00 |
|
Andrey Talman
|
2111997f96
|
[release 2.11] Update to torch 2.11 (#34644)
|
2026-04-07 18:55:48 -07:00 |
|
Giancarlo Delfin
|
5daf62271d
|
[Model Runner V2] Fuse probabilistic rejection sample kernels (#38496)
Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai>
|
2026-04-07 17:37:37 -07:00 |
|
Lucas Wilkinson
|
70406eb1dc
|
[Attention][V0 Deprecation] Deprecate accept output buffer (#39125)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2026-04-07 17:14:58 -04:00 |
|
ibifrost
|
96b5004b71
|
[KVConnector] Support 3FS KVConnector (#37636)
Signed-off-by: wuchenxin <wuchenxin.wcx@alibaba-inc.com>
Signed-off-by: ibifrost <47308427+ibifrost@users.noreply.github.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
|
2026-04-07 15:46:00 +00:00 |
|
Ilya Boytsov
|
6e1100889e
|
fix(test): recompute Jina ColBERT rotary inv_freq cleared by transformers v5 weight loader (#39176)
Signed-off-by: Ilya Boytsov <ilyaboytsov1805@gmail.com>
|
2026-04-07 22:40:55 +08:00 |
|
Ronen Schaffer
|
7c139ab23f
|
[KV Offload] Clean up ARC/LRU refactoring leftovers: group ARC tests and fix stale comment (#38217)
Signed-off-by: Ronen Schaffer <ronen.schaffer@ibm.com>
|
2026-04-07 15:14:45 +03:00 |
|
Jiangyun Zhu
|
8060bb0333
|
[vLLM IR] rework gemma_rms_norm (#39014)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Signed-off-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2026-04-07 01:37:00 -07:00 |
|
Andreas Karatzas
|
a435e3108d
|
[ROCm][CI] Fix test repo-root assumptions (#39053)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-04-07 13:36:21 +08:00 |
|
Andreas Karatzas
|
2df2c85be4
|
[Kernels][MoE] Fix legacy_routing to use bitmatrix-based routing path (#38504)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-04-07 10:57:09 +08:00 |
|
fxmarty-amd
|
00d7b497b3
|
[NVFP4] Support NVFP4 dense models from modelopt and compressed-tensors on AMD Instinct MI300, MI355X and Hopper through emulation (#35733)
Signed-off-by: Felix Marty <Felix.Marty@amd.com>
Signed-off-by: fxmarty-amd <felmarty@amd.com>
Co-authored-by: Kyle Sayers <kylesayrs@gmail.com>
|
2026-04-06 16:18:27 -06:00 |
|
Yongye Zhu
|
e8ebbdde83
|
[Quantization] Add FlashInfer CuteDSL batched experts backend for NVFP4 MoE (#38251)
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2026-04-06 11:57:53 -07:00 |
|
bnellnm
|
f01482408c
|
[MoE Refactor][Test] FusedMoE layer test (#24675)
Signed-off-by: Bill Nell <bnell@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2026-04-06 17:17:23 +00:00 |
|
zhanqiuhu
|
bfdc0a3a99
|
[NIXL][Mamba][3/N] Heterogeneous TP: 3-read conv state transfer (#37635)
|
2026-04-06 19:07:02 +02:00 |
|
Walter Beller-Morales
|
e69a265135
|
[Feat][Core] safely abort requests when FSM fails to advance (#38663)
Signed-off-by: walterbm <walter.beller.morales@gmail.com>
|
2026-04-06 08:00:16 -07:00 |
|
Julien Denize
|
fef56c1855
|
[Mistral Grammar] Support Grammar Factory (#38150)
Signed-off-by: juliendenize <julien.denize@mistral.ai>
|
2026-04-06 10:28:51 -04:00 |
|
bhargav-patel-29
|
c5e3454e5a
|
[Model] Add support for BharatGen's Param2MoE model (#38000)
Signed-off-by: bhargav-patel-29 <bhargav.patel@tihiitb.org>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-04-06 16:19:56 +08:00 |
|
liuchenbing2026
|
f6983f01de
|
MiniMax-M2: add Eagle3 speculative decoding support (#37512)
Signed-off-by: liuchenbing <chenliumail@163.com>
Signed-off-by: liucb <liuchengbao_work@163.com>
Co-authored-by: liuchenbing <chenliumail@163.com>
|
2026-04-05 19:50:18 -07:00 |
|
Micah Williamson
|
9570654c6d
|
[ROCm][CI] Run Kernels Core Operation Test On MI325 and mitigate flakiness (#38184)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
|
2026-04-06 09:42:02 +08:00 |
|
Greg Pereira
|
4dd49b06f8
|
[Bug] Fix Import paths for encoder_cudagraph modules (#38997)
Signed-off-by: greg pereira <grpereir@redhat.com>
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2026-04-05 19:11:58 +00:00 |
|
Greg Pereira
|
f53fa26e05
|
[Bugfix] Fix invalid JSON in Gemma 4 streaming tool calls by stripping partial delimiters (#38992)
Signed-off-by: greg pereira <grpereir@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2026-04-05 17:11:18 +00:00 |
|
Aaron Batilo
|
9a528260ef
|
[Bugfix][Spec Decode] Fix extract_hidden_states for VLM models (#38987)
Signed-off-by: Aaron Batilo <abatilo@coreweave.com>
|
2026-04-05 02:41:54 -07:00 |
|