Jiangyun Zhu
|
8060bb0333
|
[vLLM IR] rework gemma_rms_norm (#39014)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Signed-off-by: Jiangyun Zhu <riverclouds.zhu@qq.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2026-04-07 01:37:00 -07:00 |
|
Rishapveer Singh
|
da4c0e4db9
|
[Model] Use AutoWeightsLoader for FalconH1 (#39092)
Signed-off-by: Rishapveer Singh <215205492+rishaps@users.noreply.github.com>
|
2026-04-07 16:25:17 +08:00 |
|
Netanel Haber
|
a9a0e0551f
|
nano-nemotron-vl: get_mm_max_tokens_per_item for audio, video, image == seq_len (#38727)
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
|
2026-04-07 00:23:29 -07:00 |
|
Andrew Barnes
|
5c35517a3e
|
[ROCm] Remove unused IS_FNUZ parameter from reshape_and_cache_shuffle_kernel (#39123)
Signed-off-by: Bortlesboat <bortstheboat@gmail.com>
|
2026-04-07 07:17:59 +00:00 |
|
Andreas Karatzas
|
a435e3108d
|
[ROCm][CI] Fix test repo-root assumptions (#39053)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-04-07 13:36:21 +08:00 |
|
Andreas Karatzas
|
2df2c85be4
|
[Kernels][MoE] Fix legacy_routing to use bitmatrix-based routing path (#38504)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-04-07 10:57:09 +08:00 |
|
Nick Hill
|
62095e82c1
|
[BugFix][MRV2] Fix cuda event reuse race (#39115)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-04-07 00:21:09 +00:00 |
|
bnellnm
|
b2b2c5239e
|
[MoE Refactor] Split up compressed_tensors_moe.py (#38960)
Signed-off-by: Bill Nell <bnell@redhat.com>
|
2026-04-06 20:07:54 -04:00 |
|
fxmarty-amd
|
00d7b497b3
|
[NVFP4] Support NVFP4 dense models from modelopt and compressed-tensors on AMD Instinct MI300, MI355X and Hopper through emulation (#35733)
Signed-off-by: Felix Marty <Felix.Marty@amd.com>
Signed-off-by: fxmarty-amd <felmarty@amd.com>
Co-authored-by: Kyle Sayers <kylesayrs@gmail.com>
|
2026-04-06 16:18:27 -06:00 |
|
Matthew Bonanni
|
9c81f35b1a
|
[Attention][MLA] Re-enable FA4 as default MLA prefill backend (#38819)
|
2026-04-06 17:51:46 -04:00 |
|
Woosuk Kwon
|
f186cfe75e
|
[MRV2] Fix hanging issue with DeepSeek V3.2 by setting skip_attn=False (#39098)
Signed-off-by: WoosukKwon <woosuk.kwon@berkeley.edu>
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
|
2026-04-06 12:55:13 -07:00 |
|
Netanel Haber
|
dfa5062a8f
|
NemotronH default mamba_ssm_cache_dtype=float32; enable auto-hook for NemotronHNanoVLV2Config (#39032)
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
|
2026-04-06 19:47:46 +00:00 |
|
Yongye Zhu
|
e8ebbdde83
|
[Quantization] Add FlashInfer CuteDSL batched experts backend for NVFP4 MoE (#38251)
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2026-04-06 11:57:53 -07:00 |
|
namgyu-youn
|
94fbb09894
|
[EASY] Drop duplicate KV-cache initialization (#38799)
Signed-off-by: namgyu-youn <namgyu.dev@gmail.com>
|
2026-04-06 18:05:39 +00:00 |
|
Wentao Ye
|
419e73cdfa
|
[Bug] Fix mistral version dependency (#39086)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-04-06 13:31:19 -04:00 |
|
bnellnm
|
f01482408c
|
[MoE Refactor][Test] FusedMoE layer test (#24675)
Signed-off-by: Bill Nell <bnell@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2026-04-06 17:17:23 +00:00 |
|
zhanqiuhu
|
bfdc0a3a99
|
[NIXL][Mamba][3/N] Heterogeneous TP: 3-read conv state transfer (#37635)
|
2026-04-06 19:07:02 +02:00 |
|
bnellnm
|
93bada494f
|
[MoE Refactor] Split of DefaultMoERunner class (#35326)
Signed-off-by: Bill Nell <bnell@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2026-04-06 12:41:59 -04:00 |
|
Frederik Gossen
|
608914de30
|
[Core] Re-enable Inductor pre-grad passes in standalone compile (torch>=2.12) (#38944)
Signed-off-by: Frederik Gossen <frgossen@meta.com>
|
2026-04-06 09:37:13 -07:00 |
|
Wentao Ye
|
4ae218c122
|
[Refactor] Remove unused dead code (#38842)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-04-06 11:52:05 -04:00 |
|
Lukas Geiger
|
f40d9879f2
|
[Models][GDN] Remove GPU/CPU syncs in GDNAttentionMetadata.build during speculative decoding (#38047)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
|
2026-04-06 15:39:37 +00:00 |
|
Lucas Wilkinson
|
47e605092b
|
[Gemma4] Enable Fast Prefill Optimization (#38879)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2026-04-06 11:19:39 -04:00 |
|
Walter Beller-Morales
|
e69a265135
|
[Feat][Core] safely abort requests when FSM fails to advance (#38663)
Signed-off-by: walterbm <walter.beller.morales@gmail.com>
|
2026-04-06 08:00:16 -07:00 |
|
Julien Denize
|
fef56c1855
|
[Mistral Grammar] Support Grammar Factory (#38150)
Signed-off-by: juliendenize <julien.denize@mistral.ai>
|
2026-04-06 10:28:51 -04:00 |
|
bhargav-patel-29
|
c5e3454e5a
|
[Model] Add support for BharatGen's Param2MoE model (#38000)
Signed-off-by: bhargav-patel-29 <bhargav.patel@tihiitb.org>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-04-06 16:19:56 +08:00 |
|
liuchenbing2026
|
f6983f01de
|
MiniMax-M2: add Eagle3 speculative decoding support (#37512)
Signed-off-by: liuchenbing <chenliumail@163.com>
Signed-off-by: liucb <liuchengbao_work@163.com>
Co-authored-by: liuchenbing <chenliumail@163.com>
|
2026-04-05 19:50:18 -07:00 |
|
Andreas Karatzas
|
780ba37458
|
[ROCm][Quantization] Add asymmetric INT8 quantization support to TritonInt8ScaledMMLinearKernel (#38501)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-04-06 09:42:10 +08:00 |
|
Micah Williamson
|
9570654c6d
|
[ROCm][CI] Run Kernels Core Operation Test On MI325 and mitigate flakiness (#38184)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
|
2026-04-06 09:42:02 +08:00 |
|
Netanel Haber
|
d56e952239
|
nano_nemotron_vl: fix tensor device mismatch exception when video profiling (#39029)
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
|
2026-04-05 22:23:45 +00:00 |
|
Kevin H. Luu
|
56de443db1
|
[ci] Switch some CI jobs to H200 MIG slices (#38956)
|
2026-04-05 13:26:11 -07:00 |
|
Greg Pereira
|
4dd49b06f8
|
[Bug] Fix Import paths for encoder_cudagraph modules (#38997)
Signed-off-by: greg pereira <grpereir@redhat.com>
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2026-04-05 19:11:58 +00:00 |
|
Greg Pereira
|
f53fa26e05
|
[Bugfix] Fix invalid JSON in Gemma 4 streaming tool calls by stripping partial delimiters (#38992)
Signed-off-by: greg pereira <grpereir@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2026-04-05 17:11:18 +00:00 |
|
Wei Zhao
|
1af6f78ae5
|
[Perf] Change Trtllm fp8 MoE to use Shuffled Weights and BlockMajorK Layout (#38993)
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2026-04-05 10:54:31 -04:00 |
|
Martin Vit
|
228023b3a5
|
[Bugfix][MoE] Fix 6-8% decode regression: prefer multi-stream shared expert overlap (#38990)
Signed-off-by: Martin Vit <martin@voipmonitor.org>
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2026-04-05 10:28:31 -04:00 |
|
Aaron Batilo
|
9a528260ef
|
[Bugfix][Spec Decode] Fix extract_hidden_states for VLM models (#38987)
Signed-off-by: Aaron Batilo <abatilo@coreweave.com>
|
2026-04-05 02:41:54 -07:00 |
|
Robert Shaw
|
968ed02ace
|
[Quantization][Deprecation] Remove Petit NVFP4 (#32694)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
|
2026-04-05 00:07:45 +00:00 |
|
Robert Shaw
|
7d266abb22
|
Revert "[vLLM IR] gemma_rms_norm" (#38998)
|
2026-04-04 17:48:08 -04:00 |
|
Xiaoshuang Wang
|
156405d243
|
[vLLM IR] gemma_rms_norm (#38780)
Signed-off-by: Icey <1790571317@qq.com>
|
2026-04-04 13:55:52 -04:00 |
|
Artem Perevedentsev
|
99e5539a67
|
[Perf][GDN] Align TMA usage with upstream FLA (#38981)
Signed-off-by: Artem Perevedentsev <aperevedents@nvidia.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-04-05 00:38:02 +08:00 |
|
Linkun
|
a88ce94bbb
|
[IR][RmsNorm] pass None if not has_weight (#38961)
Signed-off-by: Linkun Chen <github@lkchen.net>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2026-04-04 11:02:30 -04:00 |
|
Ziming Qi
|
2a36d8fb72
|
[Bugfix][CPU] Fix macOS compatibility broken by #36487 (#38970)
Signed-off-by: Ziming (2imi9) <148090931+2imi9@users.noreply.github.com>
|
2026-04-04 14:05:58 +00:00 |
|
lalit10
|
93726b2a1c
|
Refactor Arctic loading to use AutoWeightsLoader (#38955)
Signed-off-by: Lalit Laxminarayan Bangad <lalitbangad@gmail.com>
Co-authored-by: Lalit Laxminarayan Bangad <lalitbangad@meta.com>
|
2026-04-04 05:01:09 +00:00 |
|
Yongye Zhu
|
8617f8676b
|
[Bugfix] Fix DSV32 weight loading (#38870)
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
|
2026-04-03 19:57:52 -07:00 |
|
Andreas Karatzas
|
06fd9ffcc4
|
[ROCm][CI] Fix ROCm Dockerfile conftest generation for older Docker parsers (#38959)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-04-04 10:41:41 +08:00 |
|
Wentao Ye
|
cab4064cd5
|
[Bug] Fix workspace manager _current_workspaces size (#38853)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-04-04 01:29:45 +00:00 |
|
Wentao Ye
|
062f1a2d70
|
[Bug] Fix compile error for swap_blocks_batch in CUDA 13 (#38915)
|
2026-04-03 16:56:38 -07:00 |
|
elenalil-aws
|
81994e1d0e
|
[Bugfix][LoRA] Fix missing in_proj_z in Qwen3_5ForConditionalGenerati… (#38927)
Signed-off-by: elenalil-aws <elenalil@amazon.com>
|
2026-04-03 23:30:09 +00:00 |
|
Andreas Karatzas
|
4b506ff90a
|
[ROCm][CI] Minor missing import patch (#38951)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-04-03 23:01:20 +00:00 |
|
Andreas Karatzas
|
5875bb2e9c
|
[ROCm][CI] Added back missing common deps (#38937)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-04-03 15:58:57 -07:00 |
|
Kevin H. Luu
|
f0d3ad9f3e
|
[ci] Remove soft fail for AMD image build job (#38941)
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>
|
2026-04-03 20:42:33 +00:00 |
|