xuebwang-amd
|
f451b4558b
|
[Quantization][ROCm] Fix MoE weight loading to be robust (Qwen3_MoE/Qwen3_next as example models) (#33173)
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
|
2026-01-30 17:50:23 +00:00 |
|
Vasiliy Kuznetsov
|
3f96fcf646
|
fix QERL attention import path (#33432)
Signed-off-by: vasiliy <vasiliy@fb.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2026-01-30 09:29:09 -08:00 |
|
Michael Goin
|
fd0e377244
|
Support FP8 block quant for CompressedTensorsW8A16Fp8 (#33280)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2026-01-30 11:15:20 -05:00 |
|
Kyle Sayers
|
f857a03f6b
|
[QeRL] Layerwise Reloading (#32133)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
|
2026-01-30 08:50:05 -07:00 |
|
Frank Wang
|
8f5d51203b
|
Disable Cascade Attention for Batch Invariance (#32561)
Signed-off-by: frankwang28 <frank.wbb@hotmail.com>
Signed-off-by: Frank Wang <41319051+frankwang28@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2026-01-30 10:00:46 -05:00 |
|
Julien Denize
|
8e2ad97ad0
|
[BUGFIX] Pixtral cannot be loaded with --limit-mm-per-prompt 0 (#33406)
Signed-off-by: juliendenize <julien.denize@mistral.ai>
|
2026-01-30 02:52:02 -08:00 |
|
Patrick von Platen
|
10152d2194
|
[Realtime API] Adds minimal realtime API based on websockets (#33187)
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
|
2026-01-30 18:41:29 +08:00 |
|
杨朱 · Kiki
|
1a7894dbdf
|
[Misc] Replace Optional[X] with X | None syntax (#33332)
Signed-off-by: carlory <baofa.fan@daocloud.io>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
|
2026-01-30 01:56:59 -08:00 |
|
tianshu-Michael-yu
|
f45870b53f
|
fix: allow LFM2 MoE prefix caching (align) (#33376)
Signed-off-by: Tianshu Yu <tianshuyu.formal@gmail.com>
|
2026-01-30 08:23:14 +00:00 |
|
hujiaxin0
|
ba45bedfd1
|
[model] Add support for openPangu7B-VL (#32449)
Signed-off-by: hujiaxin <524446785@qq.com>
Signed-off-by: Emilie1001 <79921183+Emilie1001@users.noreply.github.com>
Co-authored-by: Emilie1001 <79921183+Emilie1001@users.noreply.github.com>
|
2026-01-30 15:54:27 +08:00 |
|
Harry Mellor
|
9432ed8c7e
|
Explicitly set return_dict for apply_chat_template (#33372)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-01-30 07:27:04 +00:00 |
|
Isotr0py
|
8bfc8d5600
|
[Models] Refactor Kimi-K2.5 weight loading (#33346)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-01-30 05:31:20 +00:00 |
|
Wang Haoyu
|
c46b0cd0af
|
[Model][Multimodal] Add explicit MusicFlamingo adapter (#32696)
Signed-off-by: WangHaoyuuu <mailwhaoyu@gmail.com>
|
2026-01-30 11:01:29 +08:00 |
|
Michael Goin
|
bfb9bdaf3f
|
[Bugfix] Enable Triton MoE for FP8 per-tensor dynamic (#33300)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2026-01-29 12:15:17 -08:00 |
|
danisereb
|
8e2a469b3b
|
Add Triton fused MoE config for B200 (Nemotron Nano) (#32804)
|
2026-01-29 19:21:33 +00:00 |
|
CarstyYou
|
23591e631e
|
[Bugfix][Kernel] Fix negative memory offset in GDN Triton kernel (#33326)
Signed-off-by: CarstyYou <186021327+CarstyYou@users.noreply.github.com>
|
2026-01-29 10:40:11 -08:00 |
|
Linda
|
0493d897c4
|
[NVIDIA] [feat] Integrate flashinfer Trtllmgen bf16 moe (#32954)
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
|
2026-01-29 10:00:13 -08:00 |
|
Cyrus Leung
|
831453fcef
|
[Chore] Move MediaConnector to vllm.multimodal.media (#33324)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-01-29 16:54:31 +00:00 |
|
Isotr0py
|
5e73e4900c
|
[Bugfix] Fix broken GLM-OCR initialization (#33350)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-01-29 07:56:05 -08:00 |
|
sthWrong
|
17b17c0684
|
[Backport] [Kimi-K2.5] Replace torch.cuda with current_platform for d… (#33320)
|
2026-01-29 12:29:17 +00:00 |
|
Roger Wang
|
8b3f0a99dd
|
[Models] Qwen3-ASR (#33312)
Signed-off-by: Roger Wang <hey@rogerw.io>
|
2026-01-29 19:27:15 +08:00 |
|
zofia
|
a5aa4d5c0f
|
[Quantization][Refactor] use platform dict to choose kernel (#33130)
Signed-off-by: Zhu, Zufang <zufang.zhu@intel.com>
Signed-off-by: zofia <110436990+zufangzhu@users.noreply.github.com>
|
2026-01-29 10:44:58 +00:00 |
|
Isotr0py
|
5400014d55
|
[Chore] Remove use_data_parallel kwargs from ViT implementation (#33310)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-01-29 10:20:52 +00:00 |
|
amirkl94
|
e01ff5c070
|
Bugfix: Pass router logits dtype in nemotron shared experts (#32669)
Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com>
|
2026-01-29 09:36:34 +00:00 |
|
whx
|
08b1195e62
|
[PluggableLayer][2/N] Apply PluggableLayer to linear layers (#33152)
Signed-off-by: whx-sjtu <2952154980@qq.com>
|
2026-01-29 16:53:15 +08:00 |
|
Ilya Markov
|
53fc166402
|
[BugFix] Fix EPLB fail for MoeFP4 model with Marlin backend (#33262)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
|
2026-01-29 16:52:11 +08:00 |
|
Didier Durand
|
31b25f6516
|
[Doc]: fixing multiple typos in diverse files (#33256)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
Signed-off-by: Didier Durand <2927957+didier-durand@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-01-29 16:52:03 +08:00 |
|
wang.yuqi
|
abb34ac43a
|
[Bugfix] Fix Qwen3-VL-Reranker load. (#33298)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-01-29 08:42:53 +00:00 |
|
Kiersten Stokes
|
9e138cb01d
|
[Misc][Build] Lazy load cv2 in nemotron_parse.py (#33189)
Signed-off-by: kiersten-stokes <kierstenstokes@gmail.com>
|
2026-01-29 06:55:50 +00:00 |
|
Cyrus Leung
|
51550179fc
|
[Refactor] Define MM data parser in processing info instead of processor itself (#33260)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-01-29 13:55:17 +08:00 |
|
Angela Yi
|
07ea184f00
|
[ez] Delete more torch version checks <= 2.8 (#33288)
Signed-off-by: angelayi <yiangela7@gmail.com>
|
2026-01-29 05:28:46 +00:00 |
|
Michael Goin
|
141cd43967
|
[UX] Remove noisy CT UnquantizedLinearMethod warn (#33273)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2026-01-28 16:09:30 -08:00 |
|
Angela Yi
|
4197168ea5
|
[ez] Remove checks for torch version <= 2.8 (#33209)
Signed-off-by: angelayi <yiangela7@gmail.com>
|
2026-01-28 16:03:56 -05:00 |
|
Rohan Potdar
|
59bcc5b6f2
|
Use aiter triton fused_add_rmsnorm_pad for gpt-oss (#30976)
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
|
2026-01-28 20:47:47 +00:00 |
|
Robert Shaw
|
af9b69f977
|
[Quantization][Deprecation] Remove Marlin 24 (#32688)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-01-28 15:54:59 +00:00 |
|
Robert Shaw
|
247d1a32ea
|
[Quantization][Deprecation] Remove BitBlas (#32683)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
|
2026-01-28 11:06:22 +00:00 |
|
Harry Mellor
|
f1acbd68c5
|
[CI] Enable mypy import following for vllm/compilation (#33199)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-01-28 08:59:54 +00:00 |
|
ramos
|
36d450e3b8
|
Adds FunAudioChat multimodal audio model support (#2) (#33058)
Signed-off-by: ramos <49182011+nemoramo@users.noreply.github.com>
Signed-off-by: mayufeng <mayufeng@example.com>
Co-authored-by: mayufeng <mayufeng@example.com>
|
2026-01-28 05:18:09 +00:00 |
|
Harry Mellor
|
35fb0b8613
|
Don't use min_pixels/max_pixels from Qwen2VL's processor (#33208)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-01-28 05:02:08 +00:00 |
|
Harry Mellor
|
2eb673a088
|
Add flake8-implicit-str-concat rules to Ruff (#33191)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-01-28 04:56:10 +00:00 |
|
Richard Zou
|
d9aa39a3bb
|
[torch.compile] Speed up MOE handling in forward_context (#33184)
Signed-off-by: Richard Zou <zou3519@gmail.com>
|
2026-01-27 15:17:54 -08:00 |
|
Matthew Bonanni
|
1cbccb6dba
|
[Attention] Use has_flashinfer helper (#33177)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-01-27 18:33:17 +00:00 |
|
Iris
|
bd92089d33
|
feature: support eagle3 for HunyuanVL & Hunyuan (#33035)
Signed-off-by: irisliu10 <601012173@qq.com>
Signed-off-by: Iris <38269816+irisliu10@users.noreply.github.com>
|
2026-01-27 17:55:48 +00:00 |
|
IriKa
|
66e601ef79
|
Support compress-tensors with nvfp4 or fp8 weights and modelopt with nvfp4 weights on Turing (#33076)
Signed-off-by: IriKa Qiu <qiujie.jq@gmail.com>
|
2026-01-27 11:04:05 -05:00 |
|
danielafrimi
|
83fb2d09e8
|
Support heterogeneous NemotronHPuzzle model (#32549)
Signed-off-by: <dafrimi@nvidia.com>
Signed-off-by: Daniel Afrimi <dafrimi@nvidia.com>
Signed-off-by: root <dafrimi@nvidia.com>
|
2026-01-27 10:55:54 -05:00 |
|
danisereb
|
f3a5ee705f
|
[LoRA][Spec Decode] Support LoRA for Nemotron-H MTP models (#32265)
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
|
2026-01-27 07:53:26 -08:00 |
|
Matthew Bonanni
|
a608b4c6c2
|
[5/N][Attention] Finish eliminating vllm/attention folder (#32064)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-01-27 10:02:51 -05:00 |
|
Harry Mellor
|
14385c80fc
|
Fix weight mapping test for Transformers v5 (#33162)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-01-27 12:30:14 +00:00 |
|
Lifan Shen
|
da8d0c441a
|
[AMD][QWEN3-NEXT] FP8 Tunings (#32042)
Signed-off-by: Lifan Shen <lifans@meta.com>
|
2026-01-27 09:34:13 +00:00 |
|
Roger Wang
|
b539f988e1
|
[Models] Kimi-K2.5 (#33131)
Signed-off-by: wanglinian <wanglinian@stu.pku.edu.cn>
Signed-off-by: wangln19 <96399074+wangln19@users.noreply.github.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: wanglinian <wanglinian@stu.pku.edu.cn>
Co-authored-by: wangln19 <96399074+wangln19@users.noreply.github.com>
Co-authored-by: Zaida Zhou <58739961+zhouzaida@users.noreply.github.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-01-27 14:50:31 +08:00 |
|