Mark McLoughlin
|
141d3b9fc5
|
[docs] Update v1 metrics design doc (#27332)
Signed-off-by: Simon Mo <simon.mo@hey.com>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: atalhens <sneh.lata@nutanix.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: atalhens <sneh.lata@nutanix.com>
|
2025-10-22 06:29:15 -07:00 |
|
Jee Jee Li
|
abf3db40ef
|
[Core] Handle MoE LoRA edge cases (#27335)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-10-22 13:14:33 +00:00 |
|
gnovack
|
8e4ca4d14e
|
Bugfix - pass 'max_num_tokens_padded' into 'moe_lora_align_block_size' (#27311)
Signed-off-by: gnovack <gnovack@amazon.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-10-22 12:23:57 +00:00 |
|
Wentao Ye
|
1a0f4defb7
|
[Log] Add Warning for LLM(data_parallel_size=k) single-process DP Usage (#27282)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-22 12:12:21 +00:00 |
|
Li, Jiang
|
843af7f7fc
|
[Bugfix][CPU] Disable dual stream execution for experts on CPU (#27320)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-10-22 11:02:27 +00:00 |
|
wang.yuqi
|
1f633b8632
|
[Frontend][3/N] Improve all pooling task | Support binary embedding response (#27066)
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-10-22 18:38:57 +08:00 |
|
ExtReMLapin
|
a4c29e6e82
|
fixed reasoning streaming with tool_choice="required" (#24108)
Signed-off-by: CNE Pierre FICHEPOIL <pierre-1.fichepoil@gendarmerie.interieur.gouv.fr>
Signed-off-by: ExtReMLapin <3909752+ExtReMLapin@users.noreply.github.com>
Co-authored-by: CNE Pierre FICHEPOIL <pierre-1.fichepoil@gendarmerie.interieur.gouv.fr>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
|
2025-10-22 09:42:55 +00:00 |
|
Harry Mellor
|
8f18feb191
|
Remove last level references not removed in #26355 (#27260)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-22 09:18:17 +00:00 |
|
Huy Do
|
ed540d6d4c
|
Update release pipeline for PyTorch 2.9.0 (#27303)
Signed-off-by: Huy Do <huydhn@gmail.com>
|
2025-10-22 09:18:01 +00:00 |
|
wangxiyuan
|
f6027b2855
|
[1/N][Platform] Cleanup useless function (#26982)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
|
2025-10-22 09:04:57 +00:00 |
|
Jiangyun Zhu
|
ab3e80042e
|
[torch.compile] Enable silu_mul_fp8_quant fusion without custom ops enabled (#27146)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
|
2025-10-22 00:22:39 -04:00 |
|
Cyrus Leung
|
ceacedc1f9
|
[Benchmark] Add plot utility for parameter sweep (#27168)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-21 20:30:03 -07:00 |
|
Nicolò Lucchesi
|
bfa59be8f1
|
[CI] Nixl integration tests DP-EP (#27199)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-10-22 11:17:48 +08:00 |
|
vllmellm
|
265ecb05fb
|
[DOC] [ROCm] Add ROCm quickstart guide (#26505)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
|
2025-10-22 03:10:48 +00:00 |
|
Lain
|
09a7e6f617
|
[Deepseek v3.2] Remove extra logics in indexer (#26465)
Signed-off-by: Siyuan Fu <siyuanf@nvidia.com>
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
Signed-off-by: Lain <siyuanf@nvidia.com>
Co-authored-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
|
2025-10-21 23:34:03 +00:00 |
|
Tyler Michael Smith
|
6c2eef5a5d
|
[P/D] KVConnector for decode benchmarking (#25986)
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2025-10-21 16:30:47 -07:00 |
|
Benjamin Chislett
|
19748806f0
|
[Bugfix] skip cuda graph for drafter when running with eager (#26821)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
|
2025-10-21 15:39:09 -07:00 |
|
ExtReMLapin
|
4a8a567e16
|
Updated xgrammar backend to not deny supported string formats (#27253)
Signed-off-by: CNE Pierre FICHEPOIL <pierre-1.fichepoil@gendarmerie.interieur.gouv.fr>
Signed-off-by: ExtReMLapin <3909752+ExtReMLapin@users.noreply.github.com>
Co-authored-by: CNE Pierre FICHEPOIL <pierre-1.fichepoil@gendarmerie.interieur.gouv.fr>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-10-21 22:25:23 +00:00 |
|
Alexander Matveev
|
344a0017c0
|
[Performance] Dual stream execution of "shared_experts" and "selected_experts" inside FusedMoE (#26440)
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
|
2025-10-21 21:38:29 +00:00 |
|
Huy Do
|
becb7de40b
|
Update PyTorch to 2.9.0+cu129 (#24994)
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-10-21 17:20:18 -04:00 |
|
Tao He
|
250fb1b8ea
|
[Bugfix] fixes the decoding metadata of dense mla's fp8 kvcache. (#27144)
Signed-off-by: Tao He <linzhu.ht@alibaba-inc.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-10-21 18:27:03 +00:00 |
|
Nick Hill
|
647214f3d5
|
[V0 Deprecation] Remove V0 executors (#27142)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-10-21 11:09:37 -07:00 |
|
David Whyte-Gray
|
ddeec11ba9
|
[Bugfix][P/D] Reduce num_threads used by nixl ucx backend (#27196)
Signed-off-by: David Whyte-Gray <40244437+dagrayvid@users.noreply.github.com>
|
2025-10-21 13:41:52 -04:00 |
|
Wentao Ye
|
86ed77022d
|
[Feature] Batch Invariant for R1 TP 8 on Blackwell (#27229)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-21 10:25:55 -07:00 |
|
Micah Williamson
|
aa1356ec53
|
[ROCm] Update Triton, Torch, and AITER branches for ROCm base Dockerfile (#27206)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
|
2025-10-21 12:01:23 -04:00 |
|
Pavani Majety
|
ecc3c0940a
|
Add @pavanimajety to .github/codeowners for Flashinfer, ModelOpt related code (#27213)
Signed-off-by: Pavani Majety <pmajety@nvidia.com>
|
2025-10-21 22:59:53 +08:00 |
|
JartX
|
ba09652de2
|
[ROCM] Enable CompressedTensorsWNA16 (#27187)
Signed-off-by: JartX <sagformas@epdcenter.es>
|
2025-10-21 10:43:23 -04:00 |
|
Harry Mellor
|
bd66b8529b
|
[CI] Install pre-release version of apache-tvm-ffi for flashinfer (#27262)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-21 14:23:56 +00:00 |
|
dongbo910220
|
6c728f7771
|
[Chore] Separate out NCCL utilities from vllm.utils (#27197)
Signed-off-by: dongbo910220 <1275604947@qq.com>
|
2025-10-21 06:18:23 -07:00 |
|
Daniel Cámpora
|
80e9452984
|
[Deepseek v3.2] Optimize top_k_per_row (#26763)
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
|
2025-10-21 08:30:07 +00:00 |
|
Roger Wang
|
c3a2c6ac5f
|
[MM][Core] Decouple ViT backend from LM backend (#27061)
Signed-off-by: Roger Wang <hey@rogerw.io>
|
2025-10-21 00:30:10 -07:00 |
|
Nicolò Lucchesi
|
72f431e709
|
[Nixl] Minor refactor to handshake related metadata (#26410)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-10-21 09:07:47 +02:00 |
|
Zebing Lin
|
be4445072c
|
[Fix][Spec Decode] Fix llama4 draft loading with different quantization (#27136)
Signed-off-by: linzebing <linzebing1995@gmail.com>
|
2025-10-20 23:19:00 -07:00 |
|
Benjamin Chislett
|
f381cf2302
|
[Bugfix] Fix broken MTP weight loading for FP8 KV Scales (#27227)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
|
2025-10-20 22:51:44 -07:00 |
|
Varun Sundar Rabindranath
|
5ff5d94e77
|
[Bugfix] Fix gpt-oss w4a8 DP/EP on B200 (#26729)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-10-21 01:51:14 -04:00 |
|
Shu Wang
|
f95da13c3d
|
[ModelOpt] Load w13/w2_input_scale for all experts, nvfp4 (#26135)
Signed-off-by: Shu Wang <shuw@nvidia.com>
Signed-off-by: Shu Wang. <shuw@nvidia.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-10-21 01:50:31 -04:00 |
|
Po-Han Huang (NVIDIA)
|
aef368aa08
|
[BugFix] GPT-OSS Attention DP + MoE TP weight loading issue (#24032)
Signed-off-by: Po-Han Huang <pohanh@nvidia.com>
|
2025-10-21 04:03:47 +00:00 |
|
Chen Wu
|
5f6cbf60d6
|
[Feature][Kernel]FusedMoE LoRA (#21229)
Signed-off-by: wuchen <cntryroa@gmail.com>
Signed-off-by: banjuede <lmklhc@163.com>
Signed-off-by: Chen Wu <cntryroa@gmail.com>
Signed-off-by: Danielle Robinson <dmmaddix@amazon.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: bk-201 <joy25810@foxmail.com>
Co-authored-by: wuchen <wuchen@zetyun.com>
Co-authored-by: Nathan Van Gheem <vangheem@gmail.com>
Co-authored-by: banjuede <lmklhc@163.com>
Co-authored-by: Danielle Robinson <dmmaddix@amazon.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: bk-201 <joy25810@foxmail.com>
|
2025-10-21 03:01:37 +00:00 |
|
Russell Bryant
|
3ada34f9cb
|
[Frontend] Enforce tokenize=False when applying chat template (#27205)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-10-21 02:57:34 +00:00 |
|
Lunwen He
|
0eb8f2b880
|
create is_in_the_same_node on cpu (#26832)
Co-authored-by: Lunwen He <lunwenh@meta.com>
|
2025-10-21 02:04:14 +00:00 |
|
Fadi Arafeh
|
163965d183
|
[cpu] Dispatch un-quantized linear to oneDNN/ACL by default for AArch64 (#27183)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
Co-authored-by: Michael Yang <Michael.Yang@arm.com>
|
2025-10-21 02:02:58 +00:00 |
|
Nick Hill
|
a03cf9bc70
|
[V0 Deprecation] Remove V0 metrics code (#27215)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-10-21 02:02:10 +00:00 |
|
Isotr0py
|
352c0c8a28
|
[Quantization] Automatically infer AWQ modules_to_not_convert field (#26909)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-10-21 01:49:28 +00:00 |
|
Andrew Xia
|
bfe0b4bd2a
|
[ez] add uv lock to gitignore (#27212)
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
|
2025-10-21 00:37:44 +00:00 |
|
Concurrensee
|
58fbbcb2f5
|
[ROCm] enable some tests in entrypoints test groups on AMD (#26725)
Signed-off-by: Yida <yida.wu@amd.com>
|
2025-10-21 00:37:16 +00:00 |
|
Heng Guo
|
87778d5f00
|
[Feature][Quantization] auto_round support for mixed bits quantization (#23812)
Signed-off-by: n1ck-guo <heng.guo@intel.com>
Signed-off-by: Heng Guo <heng.guo@intel.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-10-20 22:23:30 +00:00 |
|
Nicolò Lucchesi
|
f9e7ad5400
|
[Bugfix][CI] Fix Distributed Tests (4 GPUs) async_sched+ray test (#27195)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-10-20 16:34:54 +00:00 |
|
shivampr
|
4d0f266113
|
[Kernel][Model] Tune fused_moe Triton configs for Qwen3-30B A3/A3B on H100 (FP8/BF16) (#26268)
Signed-off-by: Shivam <shivampr.dev@gmail.com>
|
2025-10-20 07:48:01 -07:00 |
|
Eugene Khvedchenya
|
e93ff6c8b9
|
Nemotron Nano V2 VL + EVS Video Support (#27107)
Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com>
Signed-off-by: Natan Bagrov <nbagrov@nvidia.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Natan Bagrov <nbagrov@nvidia.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2025-10-20 22:19:11 +08:00 |
|
ioana ghiban
|
1c691f4a71
|
AArch64 CPU Docker pipeline (#26931)
|
2025-10-20 07:09:40 -04:00 |
|