Christina Norman
|
166ac3c94d
|
fix(shm): Add memory barriers for cross-process shared memory visibility (#30407)
Signed-off-by: Christina Holland <hey@christinaholland.com>
Signed-off-by: Christina <truffle@gmail.com>
|
2025-12-10 23:01:19 +00:00 |
|
Seiji Eicher
|
b9e0951f96
|
[docs] Improve wide-EP performance + benchmarking documentation (#27933)
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
|
2025-12-10 22:15:54 +00:00 |
|
Michael Goin
|
fcb894222f
|
[Docs] Update EPLB docs (#30426)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-12-10 11:56:51 -09:00 |
|
Nick Hill
|
6ccb7baeb1
|
[LMCache] Fix breakage due to new LMCache version (#30216)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-12-10 11:52:01 -08:00 |
|
Po-Han Huang (NVIDIA)
|
eea41804a4
|
[bug] Fix "Current vLLM config is not set." warnings when FlashInfer attention is used (#30241)
Signed-off-by: Po-Han Huang <pohanh@nvidia.com>
|
2025-12-10 11:18:51 -08:00 |
|
Jialin Ouyang
|
9f042ba26b
|
[Perf] Enable environment cache in EngineCore to enable the feature for UniProcExecutor as well (#29289)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
|
2025-12-10 14:13:01 -05:00 |
|
Cyrus Leung
|
e72d65b959
|
[Deprecation] Remove tokenizer setter (#30400)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-12-10 19:10:58 +00:00 |
|
Will Eaton
|
a9e4106f28
|
[P/D] KV Load Failure Recovery/Abort Configuration (#26813)
Signed-off-by: Will Eaton <weaton@redhat.com>
Signed-off-by: Will Eaton <me@wseaton.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Co-authored-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-12-10 11:00:52 -08:00 |
|
Anker
|
e8e8cd73e5
|
[Bugfix] Fix HunyuanOCR cross-image contamination in batch processing (#30344)
Signed-off-by: Lennart Brog <lennart.borg@list-ag.de>
Signed-off-by: Anker <20343812+anker-c2@users.noreply.github.com>
|
2025-12-10 18:09:31 +00:00 |
|
Cyrus Leung
|
253305d5b2
|
[Chore] Delay recent deprecations (#30398)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-12-10 17:48:38 +00:00 |
|
Matthew Bonanni
|
794a7875ee
|
[Misc] Consistent case for vllm bench serve results (#30403)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2025-12-10 09:44:02 -08:00 |
|
Mark McLoughlin
|
2dcbac9077
|
[Docs] Generate full list of metrics in user docs (#30388)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-12-10 16:09:34 +00:00 |
|
Lucas Wilkinson
|
aacf0abf8b
|
[BugFix] Fix AttributeError: 'MergedColumnParallelLinear' object has no attribute 'weight_scale' (#30399)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-12-10 07:59:23 -08:00 |
|
Nicolò Lucchesi
|
c756fb6781
|
[Core] Whisper enable FULL_DECODE_ONLY CudaGraph (#30072)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-12-10 06:14:24 -08:00 |
|
Roger Young
|
d017bceb08
|
[BugFix] Fix minimax m2 model rotary_dim (#30384)
Signed-off-by: xuebi <xuebi@minimaxi.com>
Co-authored-by: xuebi <xuebi@minimaxi.com>
|
2025-12-10 04:58:50 -08:00 |
|
Aditya Tewari
|
cebda2a4af
|
[CPU] Support for Whisper (#30062)
Signed-off-by: Aditya Tewari <aditya.tewari@arm.com>
|
2025-12-10 04:58:42 -08:00 |
|
Daniele
|
53d2420b44
|
[Bugfix] tpu_model_runner: set vllm config context when calling reset_dynamo_cache() (#30331)
Signed-off-by: Daniele Trifirò <dtrifiro@redhat.com>
|
2025-12-10 04:58:35 -08:00 |
|
Chauncey
|
9db78f34dc
|
[Bugfix] Fix the issue where DeepSeek v3.2 cannot use structured_output (#30371)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-12-10 08:30:16 +00:00 |
|
Fadi Arafeh
|
434ac76a7c
|
[cpu][ci] Add CPU Attention Tests for Neon Backend (#30347)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
|
2025-12-10 05:37:35 +00:00 |
|
Andreas Karatzas
|
ed7af3178a
|
[ROCm][CI] Attempt to fix the failures under a subgroup of the e2e test group (#29358)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
Co-authored-by: Micah Williamson <micah.williamson@amd.com>
|
2025-12-10 05:33:13 +00:00 |
|
Radu Salavat
|
180345807f
|
[CMake][Build]: Remove unused ACL CMake env variables (#30339)
Signed-off-by: Radu Salavat <radu.salavat@arm.com>
|
2025-12-10 04:27:19 +00:00 |
|
Mingliang Li
|
d007387aa7
|
[Bugfix] Cache added_vocab to avoid per-token overhead (#30351)
Signed-off-by: limingliang <limingliang@stepfun.com>
Co-authored-by: limingliang <limingliang@stepfun.com>
|
2025-12-10 12:05:51 +08:00 |
|
Wilson Wu
|
3bdd426636
|
Fix typos in comments across multiple files (#30345)
Signed-off-by: Wilson Wu <iwilsonwu@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2025-12-09 20:05:28 -08:00 |
|
haoyangli-amd
|
06462392e4
|
[bugfix][quantization] fix quark qwen3 kv_cache quantization (#30308)
Signed-off-by: Haoyang Li <lihaoyang0109@gmail.com>
|
2025-12-10 03:24:12 +00:00 |
|
Micah Williamson
|
7d80c73d42
|
[CI] Reduce Flakiness For test_spec_decode.py::test_suffix_decoding_acceptance (#30367)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
v0.13.0rc1
|
2025-12-10 02:35:49 +00:00 |
|
rasmith
|
b75f826fca
|
[CI/Build][AMD] Skip quantization kernels tests that require CUTLASS or e4m3fn when not supported by platform (#30020)
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
|
2025-12-10 02:28:37 +00:00 |
|
Andrew Xia
|
c3487aca34
|
[responsesAPI][6] Fix multi turn MCP tokenization (#30230)
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
|
2025-12-10 10:13:13 +08:00 |
|
Lucas Wilkinson
|
abe93bce59
|
[Attention] Make seq_lens_cpu optional in CommonAttentionMetadata to enable true async spec-decode (#29624)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com>
|
2025-12-09 17:18:10 -08:00 |
|
ElizaWszola
|
2e7035dd8c
|
[Bugfix] Fix fp8 DeepGemm compilation issues (#30336)
|
2025-12-09 20:17:25 -05:00 |
|
PatrykSaffer
|
4c2e10ea19
|
[Bugfix] Fix cuda graph sizes when running with speculative decoding (#30330)
Signed-off-by: Patryk Saffer <patryk.saffer99@gmail.com>
Signed-off-by: PatrykSaffer <patryk.saffer@mistral.ai>
Co-authored-by: Patryk Saffer <patryk.saffer99@gmail.com>
|
2025-12-10 00:47:07 +00:00 |
|
dongbo910220
|
03b5f940fd
|
[V1][Spec Decode] Optimize Medusa proposer to avoid GPU-CPU sync (#29723)
Signed-off-by: dongbo910220 <1275604947@qq.com>
|
2025-12-10 00:15:01 +00:00 |
|
Hashem Hashemi
|
2e7054da06
|
Improve wvsplitK tile and balance heuristics. (#29937)
Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com>
|
2025-12-09 23:51:32 +00:00 |
|
Charlie Fu
|
3c680f4a17
|
[Rocm][torch.compile] Adding layernorm + fp8 block quant and silu + fp8 block quant for Aiter (#25693)
Signed-off-by: charlifu <charlifu@amd.com>
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
Signed-off-by: Charlie Fu <Charlie.Fu@amd.com>
Co-authored-by: Micah Williamson <micah.williamson@amd.com>
Co-authored-by: wuhuikx <hattie.wu@amd.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com>
|
2025-12-09 22:39:26 +00:00 |
|
Kyle Sayers
|
fccd532587
|
[Quantization] FP8 Weight Reloading for Quantized RL Rollout (#28480)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
|
2025-12-09 13:54:32 -08:00 |
|
bnellnm
|
00e5cbb967
|
[MoE][Refactor] Remove most arguments to FusedMoEMethodBase.apply (#29066)
Signed-off-by: Bill Nell <bnell@redhat.com>
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
|
2025-12-09 13:48:25 -08:00 |
|
rasmith
|
7618dc973d
|
[CI/Build] Make test_mha_attn.py run on correct platform only and check for flash_attn_varlen_func in layer.py (#29145)
|
2025-12-09 20:18:17 +00:00 |
|
dependabot[bot]
|
f8dacc66b6
|
Bump actions/stale from 10.1.0 to 10.1.1 (#30234)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
|
2025-12-09 20:12:14 +00:00 |
|
dependabot[bot]
|
7cab92fd45
|
Bump actions/checkout from 6.0.0 to 6.0.1 (#30233)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
|
2025-12-09 20:03:16 +00:00 |
|
Tsukasa OI
|
73a484caa1
|
[Model][Quantization] Fix / Add GGUF support for Qwen2 MoE models (#30307)
Signed-off-by: Tsukasa OI <floss_llm@irq.a4lg.com>
|
2025-12-09 19:13:10 +00:00 |
|
Lucas Wilkinson
|
b37bf51e75
|
[CI/Test] Fix FP8 per-tensor quant test reference scale shape (#30352)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-12-09 12:52:20 -06:00 |
|
Lucas Wilkinson
|
95501a70ec
|
[BugFix] Fix DeepSeek-R1 hang with DP and MTP (#30119)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
|
2025-12-09 18:51:19 +00:00 |
|
Benjamin Chislett
|
e858bfe051
|
[Cleanup] Refactor profiling env vars into a CLI config (#29912)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-12-09 13:29:33 -05:00 |
|
Woosuk Kwon
|
d471b2aff0
|
[Model Runner V2] Support num NaNs in logits (#30187)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-12-09 10:00:49 -08:00 |
|
Woosuk Kwon
|
9e6562a3f6
|
[Model Runner V2] Fix Triton warning on tl.where (#30355)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-12-09 09:59:54 -08:00 |
|
Ilya Markov
|
0b6a8a304c
|
[BugFix] Fix non detected failing tests (#30277)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
|
2025-12-09 17:57:55 +00:00 |
|
Alexei-V-Ivanov-AMD
|
804e3468c0
|
Update AMD test definitions (2025-12-08) (#30298)
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
|
2025-12-09 17:31:30 +00:00 |
|
Wentao Ye
|
83319b44c2
|
[Compile] Fix torch warning TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled (#29897)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-12-09 10:40:37 -05:00 |
|
Lucas Wilkinson
|
56037dfa2f
|
[BugFix] Fix assert batch_descriptor.num_tokens == num_tokens_padded (#30173)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-12-09 10:36:12 -05:00 |
|
quanliu
|
5dcd593baf
|
[Feature] Batch-Invariant Support for FA2 and LoRA (#30018)
Signed-off-by: quanliu <18646313696@163.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2025-12-09 10:01:38 -05:00 |
|
Julien Denize
|
5c213d2899
|
[BUGFIX] Mistral tool call parser v11+ (#30332)
Signed-off-by: juliendenize <julien.denize@mistral.ai>
|
2025-12-09 14:55:38 +00:00 |
|