Tushar Shetty
|
c4d859c274
|
[Bugfix] Skip out-of-stage layers in get_layers_from_vllm_config for pipeline parallel (#36243)
Signed-off-by: Tushar Shetty <tushar.shetty@abbyy.com>
Signed-off-by: Tushar Shetty <54362365+tusharshetty61@users.noreply.github.com>
|
2026-03-08 20:40:16 -07:00 |
|
cong-or
|
747431044d
|
feat(attention): extract KV-cache update from FlexAttention backend (#36263)
Signed-off-by: cong-or <conchubhar.gannon@gmail.com>
|
2026-03-08 20:40:12 -07:00 |
|
Cyrus Leung
|
d62856b928
|
[Misc] Move processors to transformers_utils (#35953)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-03-09 11:31:39 +08:00 |
|
Alex Brooks
|
bd2659a566
|
Increase Flexibility for OOV Multimodal Token Handling (#34858)
Signed-off-by: Alex Brooks <albrooks@redhat.com>
|
2026-03-08 20:30:49 -07:00 |
|
Shaun Kotek
|
90512b2e8b
|
fix: Use iterator so as not to store all the file loads in memory at once (#36149)
Signed-off-by: Shaun Kotek - Nvidia <skotek@nvidia.com>
|
2026-03-08 20:25:21 -07:00 |
|
wang.yuqi
|
dcf8862fd4
|
[Examples][1/n] Resettle basic examples. (#35579)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-08 20:22:53 -07:00 |
|
Weiguang Li
|
43aa389231
|
[Bugfix] Fix CPU OMP autobind assertion to use local_world_size (#35815)
Signed-off-by: liweiguang <codingpunk@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
|
2026-03-08 20:07:29 -07:00 |
|
Wentao Ye
|
384425f84e
|
[Dependency] Remove default ray dependency (#36170)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-03-08 20:06:22 -07:00 |
|
Harry Mellor
|
a0f44bb616
|
Allow markdownlint to run locally (#36398)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-08 20:05:24 -07:00 |
|
Kunshang Ji
|
fde4771bbd
|
[XPU][Doc] update xpu document about triton dependency/conflict issue. (#36301)
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
|
2026-03-09 02:09:22 +00:00 |
|
Jiangyun Zhu
|
e5ff140216
|
[cudagraph] fix cudagraph warning in deepseekv32 (#28044)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
|
2026-03-08 20:27:41 -04:00 |
|
danisereb
|
0a6a3a1290
|
Add support for ModelOpt MXFP8 MoE models (#35986)
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>
|
2026-03-08 13:00:05 -07:00 |
|
Sage
|
4497431df6
|
[Frontend] Add GPU-less render serving path (vllm launch render) (#36166)
|
2026-03-08 16:35:09 +01:00 |
|
nvnbagrov
|
b7332b058c
|
[Model] Nano Nemotron VL - fast media preprocessing (#35657)
Signed-off-by: Natan Bagrov <nbagrov@nvidia.com>
|
2026-03-08 03:04:05 -07:00 |
|
Andreas Karatzas
|
40077ea3de
|
[CI] fix flaky empty responses and add diagnostic assertions in vision chat tests (#36341)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-08 14:42:24 +08:00 |
|
Samuel Shen
|
5d6aae4577
|
[LMCache MP Patch]: Race Condition + Duplicated Block Ids (#35831)
|
2026-03-07 13:52:48 -08:00 |
|
Roy Huang
|
63298ee173
|
[Bugfix][LMCache][KVConnector] fix potential memory leak in LMCache multiprocess mode (#35931)
|
2026-03-07 13:52:35 -08:00 |
|
Richard Zou
|
2dde535df1
|
[compile] Split compile/warmup monitoring (#36098)
|
2026-03-07 13:52:11 -08:00 |
|
Wei Zhao
|
379689d533
|
[Perf] Support FP8 KV cache for Flashinfer MLA Sparse (#35891)
|
2026-03-07 13:51:54 -08:00 |
|
PatchyTIS
|
a6be75dbd2
|
[Core] NGram GPU Implementation compatible with Async Scheduler (#29184)
|
2026-03-07 13:51:37 -08:00 |
|
Micah Williamson
|
ee54f9cdb9
|
[ROCm][CI] Accept Different But Valid Output for test_olmoe_tp (#35224)
|
2026-03-07 13:50:52 -08:00 |
|
Micah Williamson
|
fc4657756f
|
[ROCm][CI] Enable AITER for failing test_gpt_oss test case on MI355 (#36174)
|
2026-03-07 13:50:17 -08:00 |
|
qli88
|
eebd14651f
|
[CI] Enable Crosslayer KV layout tests for ROCm platforms (#35416)
|
2026-03-07 13:49:56 -08:00 |
|
Matthew Bonanni
|
ebb9cc5f2b
|
[UX][Startup] Account for CUDA graphs during memory profiling (#30515)
|
2026-03-07 13:49:23 -08:00 |
|
rahul-sarvam
|
85f50eb41f
|
Adding support for Sarvam's MoE models (#33942)
Signed-off-by: rahul-sarvam <140298821+rahul-sarvam@users.noreply.github.com>
|
2026-03-08 01:16:24 +08:00 |
|
Taneem Ibrahim
|
5261223c2d
|
[Misc] Remove duplicate parser registration (#36303)
Signed-off-by: Taneem Ibrahim <taneem.ibrahim@gmail.com>
|
2026-03-07 09:37:01 -05:00 |
|
lif
|
00b814ba5a
|
[V0 Deprecation] Remove unused swap_space parameter (#36216)
Signed-off-by: majiayu000 <1835304752@qq.com>
Co-authored-by: mcelrath
|
2026-03-07 22:09:55 +08:00 |
|
vllmellm
|
ee8a29511f
|
[Bugfix] Fix compressed-tensors quantization failure for DeepSeek-R1 on MI300x (#36247)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
|
2026-03-07 09:26:59 +00:00 |
|
milesial
|
755356b3d1
|
feat: expose media_io_kwargs at runtime (#34778)
Signed-off-by: Alexandre Milesi <milesial@users.noreply.github.com>
|
2026-03-07 04:27:04 +00:00 |
|
Andreas Karatzas
|
58928475e4
|
[ROCm][CI] Making entrypoints more deterministic on ROCm (#36293)
|
2026-03-06 19:04:40 -08:00 |
|
Mengtao (Martin) Yuan
|
1a9718085c
|
Fix CUDA graph decode capture crash in AITER FlashAttention (#36042)
Signed-off-by: Martin Yuan <myuan@meta.com>
Co-authored-by: Martin Yuan <myuan@meta.com>
|
2026-03-06 18:12:07 -08:00 |
|
Kunshang Ji
|
7eb524e64c
|
refine vllm bench throughput --backend hf (#35971)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
|
2026-03-07 02:10:33 +00:00 |
|
Nick Hill
|
c7f32e08c2
|
[BugFix] Avoid ignored trust_remote_code warnings (#36290)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-03-07 01:24:18 +00:00 |
|
Nick Hill
|
b354686524
|
[Model Runner V2] Fix warmup for pipeline parallel (#36280)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-03-06 16:58:51 -08:00 |
|
Nick Hill
|
6a18d8789b
|
[Core] Fix benign error log during normal shutdown (#36270)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Mark McLoughlin <markmc@redhat.com>
|
2026-03-07 00:39:21 +00:00 |
|
Itay Alroy
|
24a03915f5
|
mla: don't update kv cache on dummy forwards (#36282)
Signed-off-by: Itay Alroy <ialroy@nvidia.com>
|
2026-03-07 00:36:00 +00:00 |
|
Andreas Karatzas
|
b5e34e1fca
|
[ROCm][CI] Fixing yaml file for external amd-ci signal (#36284)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-06 18:30:39 -06:00 |
|
Copilot
|
ce8546a12b
|
[docs][torch.compile] Add fusions.md — kernel/operator fusion reference page (#35538)
Signed-off-by: ProExpertProg <luka.govedic@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: ProExpertProg <11367180+ProExpertProg@users.noreply.github.com>
Co-authored-by: ProExpertProg <luka.govedic@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2026-03-06 23:55:06 +00:00 |
|
Chuan (Richard) Li
|
c188749bcd
|
[ROCm] Support MLA with nhead<16 and FP8 KV cache for TP=8 (Kimi K2.5/Linear) (#35850)
Signed-off-by: Li <chuali@amd.com>
|
2026-03-06 20:24:03 +00:00 |
|
Alexei-V-Ivanov-AMD
|
225d1090a0
|
Enabling some B200-specific tests on MI355 (#35253)
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
Signed-off-by: Alexei-V-Ivanov-AMD <156011006+Alexei-V-Ivanov-AMD@users.noreply.github.com>
|
2026-03-06 19:27:20 +00:00 |
|
eellison
|
f3c6c9c9d7
|
[CustomOp] CustomOp FusedRMSNormGated (#35877)
Signed-off-by: Elias Ellison <elias.ellison@gmail.com>
Signed-off-by: eellison <elias.ellison@gmail.com>
|
2026-03-06 10:53:37 -08:00 |
|
Nick Hill
|
26bd43b52d
|
Revert "[BugFix] Fix engine hanging after KV cache initialization fai… (#36262)
|
2026-03-06 08:28:09 -08:00 |
|
Travis Johnson
|
6b625a8807
|
[Bugfix] Quickfix followups to busy loop removal in #28053 (#36068)
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
|
2026-03-06 08:13:05 -08:00 |
|
Richard Zou
|
54756b6109
|
[compile] Stop unconditionally patching constrain_to_fx_strides (#36152)
Signed-off-by: Richard Zou <zou3519@gmail.com>
|
2026-03-06 10:17:27 -05:00 |
|
Raphaël Rialland
|
39f9ea0da4
|
[Bugfix] Fix cudagraph_mode:FULL dispatch (This does not impact FULL_AND_PIECEWISE (default)) (#36165)
|
2026-03-06 09:15:31 -05:00 |
|
Isotr0py
|
e4ae148a78
|
[Refactor] Modular video loader backend refactoring (#35202)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-03-06 06:06:59 -08:00 |
|
Isotr0py
|
1d0c0d209c
|
[Misc] Lazy import registered processors (#36024)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2026-03-06 06:06:45 -08:00 |
|
Chenguang Zheng
|
fcb73f306c
|
[bugfix] add api process rank in default multimodal request (#36150)
Signed-off-by: fake0fan <645327136@qq.com>
Signed-off-by: Chenguang ZHENG <645327136@qq.com>
|
2026-03-06 12:00:09 +00:00 |
|
Harry Mellor
|
e2090bf3af
|
[CI] Fix startup error test (#36230)
A change in engine startup error messages in #35478 caused this test failure.
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-06 11:50:28 +00:00 |
|
Andreas Karatzas
|
2a00d3241f
|
[CI][MM] Gate vision encoder attention mask to MiniCPM only, fixing Aria regression (#36206)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-06 01:17:08 -08:00 |
|