04a9e064db  JJJYmmm  2026-01-20 17:21:25 +00:00
    [Bugfix] fix the ima issue of qwen-vit (#32687)
    Signed-off-by: JJJYmmm <92386084+JJJYmmm@users.noreply.github.com>

4ca62a0dbd  whx  2026-01-20 16:19:21 +00:00
    [PluggableLayer][1/N] Define PluggableLayer (#32331)
    Signed-off-by: whx-sjtu <2952154980@qq.com>

fda3f03eb2  Cyrus Leung  2026-01-20 14:06:32 +00:00
    [4/N] Initialize MM components in context managers (M-P) (#32663)
    Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

c4e5bdf61b  Chauncey  2026-01-20 18:48:07 +08:00
    [Bugfix] Fix the fp8_mqa_logits dim mismatch (#32652)
    Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>

7f1bcd18ff  Cyrus Leung  2026-01-20 10:21:56 +00:00
    [3/N] Initialize MM components in context managers (I-L) (#32650)
    Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

e1a34c3a5d  Cyrus Leung  2026-01-20 08:12:56 +00:00
    [2/N] Initialize MM components in context managers (E-H) (#32641)
    Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

148117ea2e  vllmellm  2026-01-20 14:48:20 +08:00
    [Refactor] Make FP8 Linear Ops use kernel abstraction (#27814)
    Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>

b75e85dede  Cyrus Leung  2026-01-20 14:12:42 +08:00
    [1/N] Initialize MM components in context managers (A-D) (#32632)
    Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

4753f3bf69  Cyrus Leung  2026-01-20 11:43:38 +08:00
    [Model] Use context managers for encoder- and LM-only mode (#32605)
    Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

1a1fc3bbc0  Matthew Bonanni  2026-01-19 18:41:34 -05:00
    [Attention][MLA] Make FLASHINFER_MLA the default MLA backend on Blackwell, and TRTLLM the default prefill (#32615)
    Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
    Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>

4a5299c93f  Tomas Ruiz  2026-01-19 16:05:46 -05:00
    feat: spec decode with draft models (#24322)
    Signed-off-by: Tomas Ruiz <tomas.ruiz.te@gmail.com>

7350331718  jiahanc  2026-01-19 14:32:24 -05:00
    [BugFix] Fix TRT-LLM NVFP4 DP/EP (#32349)
    Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
    Signed-off-by: Robert Shaw <robshaw@redhat.com>
    Co-authored-by: Robert Shaw <robshaw@redhat.com>

cd3ac5b797  Netanel Haber  2026-01-19 18:15:58 +00:00
    support dynamic resolution image encoding for Nemotron Nano VL (#32121)
    Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>

2636d76257  Jee Jee Li  2026-01-19 17:34:59 +00:00
    [Misc] Remove unused ModelKeys (#32608)
    Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

aa7f37ccfa  danisereb  2026-01-19 22:30:44 +08:00
    Add support for LoRA adapters in Nemotron-H models (#30802)
    Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>

74c583bc50  Nicolò Lucchesi  2026-01-19 10:02:31 +00:00
    [Core] Whisper support torch.compile (#30385)
    Signed-off-by: NickLucche <nlucches@redhat.com>

71832ba71e  Yuxuan Zhang  2026-01-19 01:18:38 -08:00
    [GLM-4.7] GLM Model support for GLM-Lite (#31386)
    Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>
    Signed-off-by: Yuxuan Zhang <2448370773@qq.com>

976af2f314  honglyua  2026-01-19 03:06:02 +00:00
    [BugFix] Fix embed_input_ids argument error of QwenVLForConditionalGeneration (#32462)

f5d1740030  Iryna Boiko  2026-01-18 22:20:39 +00:00
    [Bugfix] Add OOT backend option (#32471)
    Signed-off-by: Iryna Boiko <iboiko@habana.ai>

ba29ab441e  Andrey Khalyavin  2026-01-18 19:14:22 +00:00
    Use the same memory for workspace13 and fused_output. (#31531)
    Signed-off-by: Andrey Khalyavin <halyavin@yandex-team.ru>

327a02d8db  bnellnm  2026-01-18 11:40:49 -05:00
    [MoE Refactor] Separate Router into OO Classes (#30623)
    Signed-off-by: Bill Nell <bnell@redhat.com>

2f03035a61  tjp_zju  2026-01-18 22:07:01 +08:00
    "refactor: refactor_repeated_interfaces" (#32486)
    Signed-off-by: tom-zju <tanjianpingzju1990@gmail.com>
    Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>

38bf2ffb21  Isotr0py  2026-01-18 19:17:59 +08:00
    [Bugfix] Fix GLM-ASR audio encoder RoPE dim (#32540)
    Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

c826c72a96  Li Xie  2026-01-18 10:20:46 +00:00
    [Model] Support Step1 Model (#32511)
    Signed-off-by: xieli <xieli@stepfun.com>

fe36bf5e80  Canlin Guo  2026-01-18 08:07:28 +00:00
    [Model] Remove the unnecessary dtype conversion in MiniCPM (#32523)
    Signed-off-by: gcanlin <canlinguosdu@gmail.com>

4a6af8813f  Robert Shaw  2026-01-18 12:16:59 +08:00
    [MoE Refactor] Move Test Impl into Test Dirs (#32129)
    Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>
    Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>

1646fea672  Kim Hee Su  2026-01-17 09:33:05 +00:00
    [Model] Molmo2: Enable quantized weight mapping for vision backbone (#32385)
    Signed-off-by: kimheesu <wlskaka4@gmail.com>

d3317bbba4  Paul Pak  2026-01-16 22:12:55 -08:00
    [Models] Lfm2Moe: minor name changes for resolving lora conflicts (#29063)
    Signed-off-by: Paul Pak <paulpak58@gmail.com>

2e7c89e708  Matthew Bonanni  2026-01-17 04:42:39 +00:00
    Revert "[Attention][MLA] Make FLASHINFER_MLA the default MLA backen… (#32484)
    Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>

7a1030431a  Hashem Hashemi  2026-01-16 11:45:04 -06:00
    Atomics Reduce Counting Optimization for SplitK Skinny GEMMs. (#29843)
    Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com>

180e981d56  Cyrus Leung  2026-01-16 08:22:45 +00:00
    [Chore] Replace swish with silu (#32459)
    Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

b66b0d6abb  Rabi Mishra  2026-01-16 15:31:10 +08:00
    fix(rocm): Enable non-gated MoE (is_act_and_mul=False) support on ROCm (#32244)
    Signed-off-by: rabi <ramishra@redhat.com>

03da3b52ef  Hongxin Xu  2026-01-16 15:13:58 +08:00
    [Bugfix] Refactor to support DP parallel in R3 (#32306)
    Signed-off-by: xhx1022 <1737006628@qq.com>
    Co-authored-by: arlenxu <arlenxu@tencent.com>

73f635a75f  XiongfeiWei  2026-01-16 05:17:12 +00:00
    [Bug] Add TPU backend option (#32438)
    Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>

5de6dd0662  Kebe  2026-01-16 03:21:55 +00:00
    [Bugfix] [DeepSeek-V3.2] fix sparse_attn_indexer padding (#32175)
    Signed-off-by: Kebe <mail@kebe7jun.com>
    Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
    Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>

709502558c  ltd0924  2026-01-15 19:04:16 -08:00
    [Model] Add Step3vl 10b (#32329)
    Signed-off-by: luotingdan <luotingdan@stepfun.com>
    Signed-off-by: ltd0924 <32387785+ltd0924@users.noreply.github.com>
    Co-authored-by: luotingdan <luotingdan@stepfun.com>
    Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
    Co-authored-by: Roger Wang <hey@rogerw.io>

bcf2333cd6  Matthew Bonanni  2026-01-16 00:52:49 +00:00
    [CI] Fix LM Eval Large Models (H100) (#32423)
    Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>

83239ff19a  Michael Goin  2026-01-15 16:45:44 -08:00
    Add thread_n=64 support to Marlin MoE (#32360)
    Signed-off-by: mgoin <mgoin64@gmail.com>

c277fbdf31  TomerBN-Nvidia  2026-01-15 16:15:05 -08:00
    [Feat] Support non-gated MoE with Marlin, NVFP4 CUTLASS, FP8, INT8, compressed-tensors (#32257)
    Signed-off-by: Tomer Natan <tbarnatan@computelab-frontend-8.nvidia.com>
    Signed-off-by: mgoin <mgoin64@gmail.com>
    Co-authored-by: Tomer Natan <tbarnatan@computelab-frontend-8.nvidia.com>
    Co-authored-by: mgoin <mgoin64@gmail.com>
    Co-authored-by: Tomer Natan <tbarnatan@ipp1-1429.ipp1a1.colossus.nvidia.com>

31c29257c8  Yongye Zhu  2026-01-15 12:53:40 -08:00
    [MoE Refactor][17/N] Apply Refactor to Bf16 (#31827)
    Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
    Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>

8c11001ba2  Aleksandr Malyshev  2026-01-15 14:13:08 -06:00
    [ROCM] DSfp4 mla projection gemms weight dynamic quantization (#32238)
    Signed-off-by: Aleksandr Malyshev <maleksan@amd.com>
    Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>

c36ba69bda  Lucas Wilkinson  2026-01-15 10:19:12 -08:00
    [BugFix] Fix assert x_s.shape[-1] == x_q.shape[-1] // group_shape[1] in Blackwell Quantized MoE Test (#32362)
    Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
    Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
    Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

361dfdc9d8  Dipika Sikka  2026-01-15 07:25:55 -08:00
    [Quant] Support MXFP4 W4A16 for compressed-tensors MoE models (#32285)
    Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com>
    Co-authored-by: Michael Goin <mgoin64@gmail.com>

8ebfacaa75  Matthew Bonanni  2026-01-15 09:49:57 -05:00
    [Attention][MLA] Make FLASHINFER_MLA the default MLA backend on Blackwell, and TRTLLM the default prefill (#32339)
    Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
    Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>

b89275d018  brian033  2026-01-15 04:16:00 -08:00
    [ROCm] Improve error handling while loading quantized model on gfx120… (#31715)
    Signed-off-by: brian033 <85883730+brian033@users.noreply.github.com>
    Co-authored-by: TJian <tunjian.tan@embeddedllm.com>

cdba4c74b3  Cyrus Leung  2026-01-15 17:01:59 +08:00
    [Model] Avoid token selection in SigLIP pooling head (#32389)
    Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

2c9b4cf5bf  Lucas Wilkinson  2026-01-15 06:32:22 +00:00
    [BugFix] Fix DeepSeek-V3.1 + DeepGEMM incompatible scale shapes (#32361)
    Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
    Co-authored-by: Eldar Kurtić <8884008+eldarkurtic@users.noreply.github.com>

3c2685645e  rasmith  2026-01-15 05:04:34 +00:00
    [CI][AMD][Quantization][BugFix] Fix fp8 max in quant_utils.py and update test_fp8_quant.::test_static_fp8_quant_group_2d to use correct fp8 dtype and adjust atol/rtol (#32201)
    Signed-off-by: Randall Smith <ransmith@amd.com>

9ea07b41da  Cyrus Leung  2026-01-14 15:25:31 +00:00
    [1/N] Reorganize multimodal processing code (#32327)
    Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

b8199f6049  Roger Wang  2026-01-14 15:40:30 +08:00
    [Model] Re-implement Qwen3Omni Audio Encoder (#32167)
    Signed-off-by: Roger Wang <hey@rogerw.io>