Benjamin Chislett
|
8332078cfd
|
[Bugfix] FlashInfer MXINT4 MoE crashes, missing do_finalize (#39315)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-04-08 20:36:33 -04:00 |
|
Wentao Ye
|
3352bf8b03
|
[CI Bug] Fix pre-commit issue in main (#39347)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-04-08 14:10:05 -07:00 |
|
triangleXIV
|
7c94ae16c6
|
[BugFix] --max-model-len=-1 causes over-limit requests to hang and starve the entire service (#39102)
Signed-off-by: triangle14 <y1019026570@gmail.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
|
2026-04-08 14:03:17 -07:00 |
|
Rishi Puri
|
ad05edfbca
|
tests/v1/e2e/spec_decode: assert async scheduling is used (#39206)
Signed-off-by: Rishi Puri <riship@nvidia.com>
Signed-off-by: Rishi Puri <puririshi98@berkeley.edu>
Signed-off-by: sfeng33 <4florafeng@gmail.com>
Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com>
Co-authored-by: Flora Feng <4florafeng@gmail.com>
|
2026-04-08 20:30:03 +00:00 |
|
Wentao Ye
|
2018137242
|
[Feature] Batch invariant nvfp4 linear support (#39322)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-04-08 16:29:13 -04:00 |
|
Jackmin801
|
a776a48b1c
|
[MoE] Move DEEP_GEMM into experts/ subdirectory (#39005)
Signed-off-by: Jackmin801 <ongjackm@gmail.com>
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2026-04-08 19:23:08 +00:00 |
|
Ben Browning
|
8477fe427d
|
[Tool] adjust_request to reasoning parser, and Gemma4 fixes (#39027)
Signed-off-by: Ben Browning <bbrownin@redhat.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
|
2026-04-08 19:04:04 +00:00 |
|
Roberto L. Castro
|
b55d830ec7
|
[Perf][Kernel] Persistent TopK scheduler: unified CUDAGraph-safe kernel with dynamic per-row dispatch - DeepSeek-V3.2 DSA decode (#37421)
Signed-off-by: LopezCastroRoberto <rocastro@redhat.com>
Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
|
2026-04-08 13:35:57 -04:00 |
|
Shengqi Chen
|
75e01a39a1
|
[Feature] NUMA binding support for GPU workers (#38635)
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Co-authored-by: Jason Li <jasonlizhengjian@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2026-04-08 09:55:24 -07:00 |
|
Or Ozeri
|
512c5eb455
|
[kv_offload+HMA][5/N]: Track group block hashes and block IDs (#37109)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
|
2026-04-08 19:50:28 +03:00 |
|
Flora Feng
|
13151a4df4
|
[Bugfix] Fix Gemma4 streaming tool call corruption for split boolean/number values (#39114)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
|
2026-04-08 16:46:27 +00:00 |
|
Gregory Shtrasberg
|
56c976c1b5
|
[ROCm] Enable fused_silu_mul_block_quant on ROCm (#38817)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
|
2026-04-08 11:23:32 -05:00 |
|
haosdent
|
8904fc4d19
|
[Bugfix] Fix V1 logprobs empty strings for multi-byte UTF-8 tokens when logprobs > 0 (#34875)
Signed-off-by: haosdent <haosdent@gmail.com>
|
2026-04-08 15:30:00 +00:00 |
|
wang.yuqi
|
4e2ab1861d
|
[CI Failure] pin nomic-embed-text-v1 revision (#39292)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2026-04-08 11:43:06 +00:00 |
|
yoke
|
d734445fcd
|
[Bugfix][Frontend] Fix Gemma4 streaming HTML duplication after tool calls (#38909)
Signed-off-by: yoke233 <yoke2012@gmail.com>
|
2026-04-08 11:03:54 +08:00 |
|
Flora Feng
|
927975ead8
|
[Parser] Migrate response api streaming to unified parser (#38755)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
Signed-off-by: Andrew Xia <axia@meta.com>
|
2026-04-08 10:09:00 +08:00 |
|
Flora Feng
|
9ea7d670d8
|
[Bugfix] Fix Qwen3 tool parser for Responses API tools (#38848)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
|
2026-04-08 10:08:51 +08:00 |
|
Andrey Talman
|
2111997f96
|
[release 2.11] Update to torch 2.11 (#34644)
|
2026-04-07 18:55:48 -07:00 |
|
Giancarlo Delfin
|
5daf62271d
|
[Model Runner V2] Fuse probabilistic rejection sample kernels (#38496)
Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai>
|
2026-04-07 17:37:37 -07:00 |
|
Lucas Wilkinson
|
70406eb1dc
|
[Attention][V0 Deprecation] Deprecate accept output buffer (#39125)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2026-04-07 17:14:58 -04:00 |
|
ibifrost
|
96b5004b71
|
[KVConnector] Support 3FS KVConnector (#37636)
Signed-off-by: wuchenxin <wuchenxin.wcx@alibaba-inc.com>
Signed-off-by: ibifrost <47308427+ibifrost@users.noreply.github.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
|
2026-04-07 15:46:00 +00:00 |
|
Ilya Boytsov
|
6e1100889e
|
fix(test): recompute Jina ColBERT rotary inv_freq cleared by transformers v5 weight loader (#39176)
Signed-off-by: Ilya Boytsov <ilyaboytsov1805@gmail.com>
|
2026-04-07 22:40:55 +08:00 |
|
Ronen Schaffer
|
7c139ab23f
|
[KV Offload] Clean up ARC/LRU refactoring leftovers: group ARC tests and fix stale comment (#38217)
Signed-off-by: Ronen Schaffer <ronen.schaffer@ibm.com>
|
2026-04-07 15:14:45 +03:00 |
|
Jiangyun Zhu
|
8060bb0333
|
[vLLM IR] rework gemma_rms_norm (#39014)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Signed-off-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2026-04-07 01:37:00 -07:00 |
|
Andreas Karatzas
|
a435e3108d
|
[ROCm][CI] Fix test repo-root assumptions (#39053)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-04-07 13:36:21 +08:00 |
|
Andreas Karatzas
|
2df2c85be4
|
[Kernels][MoE] Fix legacy_routing to use bitmatrix-based routing path (#38504)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-04-07 10:57:09 +08:00 |
|
fxmarty-amd
|
00d7b497b3
|
[NVFP4] Support NVFP4 dense models from modelopt and compressed-tensors on AMD Instinct MI300, MI355X and Hopper through emulation (#35733)
Signed-off-by: Felix Marty <Felix.Marty@amd.com>
Signed-off-by: fxmarty-amd <felmarty@amd.com>
Co-authored-by: Kyle Sayers <kylesayrs@gmail.com>
|
2026-04-06 16:18:27 -06:00 |
|
Yongye Zhu
|
e8ebbdde83
|
[Quantization] Add FlashInfer CuteDSL batched experts backend for NVFP4 MoE (#38251)
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2026-04-06 11:57:53 -07:00 |
|
bnellnm
|
f01482408c
|
[MoE Refactor][Test] FusedMoE layer test (#24675)
Signed-off-by: Bill Nell <bnell@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2026-04-06 17:17:23 +00:00 |
|
zhanqiuhu
|
bfdc0a3a99
|
[NIXL][Mamba][3/N] Heterogeneous TP: 3-read conv state transfer (#37635)
|
2026-04-06 19:07:02 +02:00 |
|
Walter Beller-Morales
|
e69a265135
|
[Feat][Core] safely abort requests when FSM fails to advance (#38663)
Signed-off-by: walterbm <walter.beller.morales@gmail.com>
|
2026-04-06 08:00:16 -07:00 |
|
Julien Denize
|
fef56c1855
|
[Mistral Grammar] Support Grammar Factory (#38150)
Signed-off-by: juliendenize <julien.denize@mistral.ai>
|
2026-04-06 10:28:51 -04:00 |
|
bhargav-patel-29
|
c5e3454e5a
|
[Model] Add support for BharatGen's Param2MoE model (#38000)
Signed-off-by: bhargav-patel-29 <bhargav.patel@tihiitb.org>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-04-06 16:19:56 +08:00 |
|
liuchenbing2026
|
f6983f01de
|
MiniMax-M2: add Eagle3 speculative decoding support (#37512)
Signed-off-by: liuchenbing <chenliumail@163.com>
Signed-off-by: liucb <liuchengbao_work@163.com>
Co-authored-by: liuchenbing <chenliumail@163.com>
|
2026-04-05 19:50:18 -07:00 |
|
Micah Williamson
|
9570654c6d
|
[ROCm][CI] Run Kernels Core Operation Test On MI325 and mitigate flakiness (#38184)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
|
2026-04-06 09:42:02 +08:00 |
|
Greg Pereira
|
4dd49b06f8
|
[Bug] Fix Import paths for encoder_cudagraph modules (#38997)
Signed-off-by: greg pereira <grpereir@redhat.com>
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2026-04-05 19:11:58 +00:00 |
|
Greg Pereira
|
f53fa26e05
|
[Bugfix] Fix invalid JSON in Gemma 4 streaming tool calls by stripping partial delimiters (#38992)
Signed-off-by: greg pereira <grpereir@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2026-04-05 17:11:18 +00:00 |
|
Aaron Batilo
|
9a528260ef
|
[Bugfix][Spec Decode] Fix extract_hidden_states for VLM models (#38987)
Signed-off-by: Aaron Batilo <abatilo@coreweave.com>
|
2026-04-05 02:41:54 -07:00 |
|
Jeffrey Wang
|
ab79863e6c
|
Remove MQ multi-node tests (#38934)
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
|
2026-04-03 20:00:08 +00:00 |
|
Vasiliy Kuznetsov
|
7b1a7423be
|
[Frontend] new online quantization frontend (#38138)
Signed-off-by: Vasiliy Kuznetsov <vasiliy@meta.com>
|
2026-04-03 11:58:39 -04:00 |
|
Yusuf Mohammad
|
46f02e00f2
|
[Bugfix] Fix AWQ models batch invariance issues (#38670)
Signed-off-by: yusuf <yusuf@deeplearningmachine.mynet>
Signed-off-by: <>
Co-authored-by: yusuf <yusuf@deeplearningmachine.mynet>
|
2026-04-03 14:54:15 +00:00 |
|
Hyeonki Hong
|
25f2b55319
|
[Frontend] feat: add streaming support for token generation endpoint (#37171)
Signed-off-by: Hyeonki Hong <hyeonki.hong@moreh.io>
|
2026-04-03 10:20:32 +00:00 |
|
Netanel Haber
|
fa9e68022d
|
Fix Nano Nemotron VL regressions (#38655)
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
|
2026-04-03 15:22:06 +08:00 |
|
shunting314
|
8b141ed8c3
|
full cudagraph for flex-attn (#36298)
Signed-off-by: shunting314 <shunting@meta.com>
|
2026-04-02 21:15:01 -07:00 |
|
Varun Sundar Rabindranath
|
2ad7c0335f
|
[Model] Add Phi4ForCausalLMV for microsoft/Phi-4-reasoning-vision-15B (#38306)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
|
2026-04-02 21:14:57 -07:00 |
|
Bowen Bao
|
201d2ea5bf
|
[CI][ROCm] Add Qwen3.5-35B-A3B-MXFP4 model eval into CI (#38664)
Signed-off-by: Bowen Bao <bowenbao@amd.com>
|
2026-04-03 04:05:45 +00:00 |
|
Bowen Bao
|
103f0de565
|
[ROCm][Quantization][1/N] Refactor quark_moe w_mxfp4 w/ oracle (#38774)
Signed-off-by: Bowen Bao <bowenbao@amd.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2026-04-03 03:29:57 +00:00 |
|
wliao2
|
32e0c0bfa2
|
refactor hard coded device string in test files under tests/v1 and tests/lora (#37566)
Signed-off-by: Liao, Wei <wei.liao@intel.com>
|
2026-04-03 11:21:47 +08:00 |
|
Carl Y
|
3bc2734dd0
|
[Kernel] Fuse FP8 output quantization into merge_attn_states (#36518)
Signed-off-by: Carl You <4531192+carlyou@users.noreply.github.com>
|
2026-04-03 01:47:04 +00:00 |
|
Carl Y
|
1f5ec2889c
|
[mla] Support fused FP8/NVFP4 output quantization in MLA attention (#35792) (#36205)
Signed-off-by: Carl You <4531192+carlyou@users.noreply.github.com>
Signed-off-by: Carl Y <4531192+carlyou@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-04-02 21:16:11 -04:00 |
|