Michael Goin
|
bfb9bdaf3f
|
[Bugfix] Enable Triton MoE for FP8 per-tensor dynamic (#33300)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2026-01-29 12:15:17 -08:00 |
|
Kevin H. Luu
|
2284461d02
|
[release] Minor fixes to release annotation and wheel upload (#33129)
Signed-off-by: khluu <khluu000@gmail.com>
|
2026-01-29 12:09:35 -08:00 |
|
danisereb
|
8e2a469b3b
|
Add Triton fused MoE config for B200 (Nemotron Nano) (#32804)
|
2026-01-29 19:21:33 +00:00 |
|
CarstyYou
|
23591e631e
|
[Bugfix][Kernel] Fix negative memory offset in GDN Triton kernel (#33326)
Signed-off-by: CarstyYou <186021327+CarstyYou@users.noreply.github.com>
|
2026-01-29 10:40:11 -08:00 |
|
Linda
|
0493d897c4
|
[NVIDIA] [feat] Integrate flashinfer Trtllmgen bf16 moe (#32954)
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
|
2026-01-29 10:00:13 -08:00 |
|
Chendi.Xue
|
8c8ebeb941
|
[BUGFIX][XPU] fix memory check after XPU reuse GPU_worker (#33358)
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
|
2026-01-29 09:56:30 -08:00 |
|
Cyrus Leung
|
831453fcef
|
[Chore] Move MediaConnector to vllm.multimodal.media (#33324)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-01-29 16:54:31 +00:00 |
|
Angela Yi
|
5a66c9cc76
|
[ez] Delete torch25_custom_graph_pass (#33287)
Signed-off-by: angelayi <yiangela7@gmail.com>
|
2026-01-29 16:47:05 +00:00 |
|
Isotr0py
|
5e73e4900c
|
[Bugfix] Fix broken GLM-OCR initialization (#33350)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-01-29 07:56:05 -08:00 |
|
Cyrus Leung
|
c6e7404cc5
|
[Multimodal] Simplify MM input definitions (#33331)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-01-29 13:32:04 +00:00 |
|
sthWrong
|
17b17c0684
|
[Backport] [Kimi-K2.5] Replace torch.cuda with current_platform for d… (#33320)
|
2026-01-29 12:29:17 +00:00 |
|
Kunshang Ji
|
8bb6271c77
|
[Intel GPU] refine xpu worker (#32894)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
|
2026-01-29 12:26:52 +00:00 |
|
Roger Wang
|
8b3f0a99dd
|
[Models] Qwen3-ASR (#33312)
Signed-off-by: Roger Wang <hey@rogerw.io>
|
2026-01-29 19:27:15 +08:00 |
|
Li, Jiang
|
8311f083bd
|
[Bugfix][CPU] Fix thread num for shared memory communication (#33317)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
Signed-off-by: Li, Jiang <bigpyj64@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-01-29 03:26:58 -08:00 |
|
Patrick von Platen
|
40c35038d2
|
[Voxtral] Streaming example (#33042)
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2026-01-29 03:22:49 -08:00 |
|
zofia
|
a5aa4d5c0f
|
[Quantization][Refactor] use platform dict to choose kernel (#33130)
Signed-off-by: Zhu, Zufang <zufang.zhu@intel.com>
Signed-off-by: zofia <110436990+zufangzhu@users.noreply.github.com>
|
2026-01-29 10:44:58 +00:00 |
|
andrii.pasternak
|
615e8033e5
|
[Bug Fix] Handle variable-length tensors in MultiModalFlatField batching (#31751)
Signed-off-by: Andrii Pasternak <andriipasternak31@gmail.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
|
2026-01-29 10:42:59 +00:00 |
|
Ilya Markov
|
d09135fbd0
|
[BugFix] Async Eplb fix potential race condition (#32881)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
|
2026-01-29 10:31:40 +00:00 |
|
daniel-salib
|
8688c3d460
|
[fix] test mcp_tool_calling_streaming with a more complex math question (#32769)
Signed-off-by: Daniel Salib <danielsalib@meta.com>
|
2026-01-29 10:25:58 +00:00 |
|
Isotr0py
|
5400014d55
|
[Chore] Remove use_data_parallel kwargs from ViT implementation (#33310)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-01-29 10:20:52 +00:00 |
|
Isotr0py
|
3a92c6f3b5
|
[Misc] Cleanup Kimi-K2.5's vision chunk modality entrypoints (#33157)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-01-29 09:46:02 +00:00 |
|
amirkl94
|
e01ff5c070
|
Bugfix: Pass router logits dtype in nemotron shared experts (#32669)
Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com>
|
2026-01-29 09:36:34 +00:00 |
|
Harry Mellor
|
fb946a7f89
|
Make mypy opt-out instead of opt-in (#33205)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-01-29 09:12:26 +00:00 |
|
Lucas Wilkinson
|
a650ad1588
|
[Misc] Remove missed pad_for_cudagraph (#33283)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2026-01-29 09:12:05 +00:00 |
|
graftim
|
d697581a7c
|
[Doc] Update outdated link to Ray documentation (#32660)
Signed-off-by: graftim <38649219+graftim@users.noreply.github.com>
|
2026-01-29 00:56:06 -08:00 |
|
shanjiaz
|
5eeba80c74
|
Adding optional speculator tests for larger models (#32943)
Signed-off-by: shanjiaz <zsjwpianpian@gmail.com>
|
2026-01-29 16:54:02 +08:00 |
|
whx
|
08b1195e62
|
[PluggableLayer][2/N] Apply PluggableLayer to linear layers (#33152)
Signed-off-by: whx-sjtu <2952154980@qq.com>
|
2026-01-29 16:53:15 +08:00 |
|
cmunley1
|
3bba2edb0f
|
support returning tokenids in responses api (#33212)
Signed-off-by: Christian Munley <cmunley@nvidia.com>
|
2026-01-29 16:52:39 +08:00 |
|
Ilya Markov
|
53fc166402
|
[BugFix] Fix EPLB fail for MoeFP4 model with Marlin backend (#33262)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
|
2026-01-29 16:52:11 +08:00 |
|
Didier Durand
|
31b25f6516
|
[Doc]: fixing multiple typos in diverse files (#33256)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
Signed-off-by: Didier Durand <2927957+didier-durand@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-01-29 16:52:03 +08:00 |
|
wang.yuqi
|
abb34ac43a
|
[Bugfix] Fix Qwen3-VL-Reranker load. (#33298)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-01-29 08:42:53 +00:00 |
|
Pengchao Wang
|
2515bbd027
|
[CI/Build][BugFix] fix cuda/compat loading order issue in docker build (#33116)
Signed-off-by: Pengchao Wang <wpc@fb.com>
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com>
|
2026-01-29 00:19:05 -08:00 |
|
TJian
|
c487a8eef4
|
[Release] [ROCm] Remove old build step (#33316)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
|
2026-01-28 23:35:51 -08:00 |
|
Kiersten Stokes
|
9e138cb01d
|
[Misc][Build] Lazy load cv2 in nemotron_parse.py (#33189)
Signed-off-by: kiersten-stokes <kierstenstokes@gmail.com>
|
2026-01-29 06:55:50 +00:00 |
|
TJian
|
f9d03599ef
|
[Release] [CI] Optim release pipeline (#33156)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
|
2026-01-28 22:45:42 -08:00 |
|
wangln19
|
39037d258e
|
Fix tool call indexing double-counting (#33141)
Signed-off-by: wanglinian <wanglinian@stu.pku.edu.cn>
|
2026-01-29 05:57:09 +00:00 |
|
Cyrus Leung
|
51550179fc
|
[Refactor] Define MM data parser in processing info instead of processor itself (#33260)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-01-29 13:55:17 +08:00 |
|
Angela Yi
|
07ea184f00
|
[ez] Delete more torch version checks <= 2.8 (#33288)
Signed-off-by: angelayi <yiangela7@gmail.com>
|
2026-01-29 05:28:46 +00:00 |
|
Or Ozeri
|
a663b218ae
|
[Misc] Add orozery to CODEOWNERS (core, kv_transfer, kv_offload) (#33227)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
|
2026-01-29 04:24:20 +00:00 |
|
Michael Goin
|
1bd47d6e5a
|
[Bugfix] Register fp8 cutlass_group_gemm as supported for only SM90+SM100 (#33285)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2026-01-28 18:40:59 -08:00 |
|
Michael Goin
|
141cd43967
|
[UX] Remove noisy CT UnquantizedLinearMethod warn (#33273)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2026-01-28 16:09:30 -08:00 |
|
Nick Hill
|
6bf3b46d78
|
[ModelRunner V2] Misc code simplification and cleanup (#33266)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-01-28 14:41:23 -08:00 |
|
Matthew Bonanni
|
77c4f45c6c
|
[7/N][Attention][Docs] Add documentation for attention backends (#32477)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-01-28 17:20:22 -05:00 |
|
Michael Goin
|
ca1969186d
|
[UX] Enable nested configs in config yaml files (#33193)
|
2026-01-28 16:54:25 -05:00 |
|
Gregory Shtrasberg
|
ab597c869a
|
[Bugfix] Add missing encoder only guard for do_kv_cache_update (#33269)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
|
2026-01-28 21:25:07 +00:00 |
|
Angela Yi
|
4197168ea5
|
[ez] Remove checks for torch version <= 2.8 (#33209)
Signed-off-by: angelayi <yiangela7@gmail.com>
|
2026-01-28 16:03:56 -05:00 |
|
Rohan Potdar
|
59bcc5b6f2
|
Use aiter triton fused_add_rmsnorm_pad for gpt-oss (#30976)
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
|
2026-01-28 20:47:47 +00:00 |
|
Wentao Ye
|
3e440786af
|
[Feature] Fully support for async scheduling + PP, 30.8% E2E throughput improvement, 31.8% TPOT improvement (#32618)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
|
2026-01-28 20:30:32 +00:00 |
|
Kevin H. Luu
|
8bdd3979d8
|
[CI] Change GPU key to device key for B200 test (#33275)
Signed-off-by: khluu <khluu000@gmail.com>
|
2026-01-28 19:14:29 +00:00 |
|
Wentao Ye
|
c4e744dbd4
|
[Perf] Optimize moe_permute for CUTLASS FP8 (#32892)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-01-28 10:15:24 -08:00 |
|