Richard Zou
|
254f6b9867
|
[Bugfix] Fix eagle dp tests on A100 (#31241)
Signed-off-by: Richard Zou <zou3519@gmail.com>
|
2025-12-25 00:05:04 +00:00 |
|
Cyrus Leung
|
09dc7c690c
|
[Chore][1/2] Drop v0.14 deprecations (#31285)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-12-24 09:54:01 -08:00 |
|
wang.yuqi
|
1ff67df182
|
[CI] Reorganization pooling_mteb_test (#31265)
Signed-off-by: wang.yuqi <noooop@126.com>
|
2025-12-24 23:36:20 +08:00 |
|
Cyrus Leung
|
aa3868ecfe
|
[Chore] Remove unused noqas (#31263)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-12-24 05:38:46 -08:00 |
|
Andreas Karatzas
|
0247a91e00
|
[ROCm][CI] Fix entrypoints tests and Python-only installation test on ROCm (#28979)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2025-12-23 22:42:30 -08:00 |
|
Michael Goin
|
8ee90c83f8
|
Add --max-model-len auto to auto-fit context to available memory (#29431)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-12-23 21:37:14 -08:00 |
|
Micah Williamson
|
3ce791ac77
|
[ROCm][CI] Set VLLM_FLOAT32_MATMUL_PRECISION="tf32" For terratorch Tests In AMD CI (#31242)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
|
2025-12-24 03:21:50 +00:00 |
|
Andreas Karatzas
|
e42894f5b5
|
[ROCm][CI][Bugfix] Fix Siglip2 rotary embedding dispatch and InternVL video test tolerance (#31235)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2025-12-24 02:56:58 +00:00 |
|
Chen Zhang
|
538e830caa
|
[KVEvent] User request.block_hash for parent block_hash (#30544)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu>
Co-authored-by: Yifan Qiao <yifanqiao@berkeley.edu>
|
2025-12-23 18:23:43 -08:00 |
|
rongfu.leng
|
4ed11105d7
|
[Misc] Remove unused custom ops copy_blocks and copy_blocks_mla (#30967)
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
|
2025-12-23 18:22:35 -08:00 |
|
Vadim Gimpelson
|
bc0a5a0c08
|
[CI] Add Qwen3-Next-FP8 to Blackwell model tests (#31049)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
|
2025-12-23 17:21:50 -08:00 |
|
Andreas Karatzas
|
bfa2c0bbb9
|
[ROCm][Bugfix] Fix RuntimeError in MMEncoderAttention by replacing .view() with .reshape() (#31203)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2025-12-23 21:48:01 +00:00 |
|
Mark McLoughlin
|
f790068600
|
[Core] Add a random suffix to frontend-provided request IDs (#27987)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-12-23 13:05:39 -08:00 |
|
Joachim Studnia
|
38c361f99d
|
Fix edge case Mistral tool parser (#30724)
Signed-off-by: Joachim Studnia <joachim@mistral.ai>
Signed-off-by: Joachim Studnia <studniajoachim@gmail.com>
Signed-off-by: juliendenize <julien.denize@mistral.ai>
Signed-off-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: juliendenize <julien.denize@mistral.ai>
Co-authored-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
|
2025-12-23 14:19:58 +00:00 |
|
Cyrus Leung
|
bb62dda2c3
|
[Misc] Introduce encode_*_url utility function (#31208)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-12-23 13:45:21 +00:00 |
|
Patrick von Platen
|
3faa8bee57
|
adapt voxtral (#31095)
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
|
2025-12-23 05:31:55 -08:00 |
|
Jakub Zakrzewski
|
23daef548d
|
[Frontend] Support using chat template as custom score template for reranking models (#30550)
Signed-off-by: Jakub Zakrzewski <jzakrzewski@nvidia.com>
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2025-12-23 11:19:16 +00:00 |
|
vllmellm
|
f32cfd7d97
|
[ROCm][FEAT] Support AITER RMSNorm quantization fusion pass (#26575)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
|
2025-12-23 02:07:54 -08:00 |
|
Cyrus Leung
|
8cef137689
|
[Chore] Update more locations to use attention_config.backend (#31153)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-12-22 19:19:50 -08:00 |
|
Pavani Majety
|
3e10262356
|
Revert "[SM100] Enable fp8 compute for prefill MLA (#30746)" (#31197)
Signed-off-by: Pavani Majety <pmajety@nvidia.com>
|
2025-12-22 18:15:33 -08:00 |
|
Angela Yi
|
612d5ffdab
|
[ci] Fix Pytorch compilation test oom in 2.10 (#31194)
Signed-off-by: angelayi <yiangela7@gmail.com>
|
2025-12-23 01:56:47 +00:00 |
|
Divakar Verma
|
78e5e62bbf
|
[AMD][CI] fix v1/engine test_preprocess_error_handling (#31192)
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
|
2025-12-23 01:28:19 +00:00 |
|
Michael Goin
|
6d518ffbaa
|
[CI Failure] Disable mosaicml/mpt-7b and databricks/dbrx-instruct tests (#31182)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-12-22 15:40:35 -08:00 |
|
Lucas Wilkinson
|
de71747655
|
[SpecDecode] Simplified alternative padded-speculation acceptance rate fix (#29845)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-12-22 13:06:10 -08:00 |
|
Pavani Majety
|
b10f41c894
|
[SM100] Enable fp8 compute for prefill MLA (#30746)
Signed-off-by: Pavani Majety <pmajety@nvidia.com>
|
2025-12-22 19:15:57 +00:00 |
|
Yongye Zhu
|
7b926e8901
|
[MoE Refactor][9/N] Use modular kernel for unquantized Triton MoE (#31052)
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
|
2025-12-22 17:34:19 +00:00 |
|
Nicolò Lucchesi
|
b1c3f96ae3
|
[CI][Bugfix] Fix entrypoints/openai/test_audio.py (#31151)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-12-22 07:21:40 -08:00 |
|
Shengqi Chen
|
2cf91c2ea4
|
[CI] add polling for precompiled wheel in python_only_compile.sh, fix index generation for releases (#30781)
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
|
2025-12-22 13:24:21 +00:00 |
|
Kevin McKay
|
42b42824ae
|
[Misc] Fix grammar errors in comments and messages (#31115)
Signed-off-by: c0de128 <kevin.mckay@outlook.com>
|
2025-12-21 21:14:02 -08:00 |
|
Kevin McKay
|
ec58c10ce1
|
[Misc] Fix quantization-related typos (#31116)
Signed-off-by: c0de128 <kevin.mckay@outlook.com>
|
2025-12-21 21:13:48 -08:00 |
|
Kevin McKay
|
8c084de59d
|
[Misc] Fix spelling typos in comments (#31114)
Signed-off-by: c0de128 <kevin.mckay@outlook.com>
|
2025-12-21 21:13:14 -08:00 |
|
CedricHuang
|
19cc9468fd
|
[Feature]: Support NVIDIA ModelOpt HF FP8 variants FP8_PER_CHANNEL_PER_TOKEN and FP8_PB_WO in vLLM (#30957)
|
2025-12-21 22:34:49 -05:00 |
|
Jee Jee Li
|
097978a15d
|
[Kernel] Enable fused_qknorm_rope_kernel supports partial rope (#30821)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-12-21 18:39:22 -08:00 |
|
Robert Shaw
|
b471092d3a
|
[MoE Refactor][4/N] Marlin Fp8 Mk (#31036)
|
2025-12-21 12:37:42 -05:00 |
|
汪志鹏
|
3e92b2b7ac
|
[BugFix]fix gpt-oss v1/completions response bug (#30608)
Signed-off-by: princepride <wangzhipeng628@gmail.com>
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: bbrowning <bbrownin@redhat.com>
|
2025-12-21 10:39:31 +08:00 |
|
baonudesifeizhai
|
54c8924384
|
[MoE Refactor][5/N] Isolate zero expert to LongCatFlash (#28891)
Signed-off-by: baonudesifeizhai <85092850+baonudesifeizhai@users.noreply.github.com>
Signed-off-by: Dongjie Zou <85092850+baonudesifeizhai@users.noreply.github.com>
Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com>
Signed-off-by: Robert Shaw <robertgshaw2@gmail.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robertgshaw2@gmail.com>
|
2025-12-20 18:22:04 +00:00 |
|
Lucas Wilkinson
|
ff2168bca3
|
[CI] FIx fixture 'siglip_attention_config' not found (#31053)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-12-20 03:46:15 +00:00 |
|
zejunchen-zejun
|
d52c5096d7
|
[Bugfix] fix the alias bug of AttentionBackendEnum when register CUSTOM attention backend to vllm (#30869)
Signed-off-by: zejunchen-zejun <zejun.chen@amd.com>
|
2025-12-20 09:03:35 +08:00 |
|
Robert Shaw
|
83a317f650
|
[MoE Refactor][3/N] Deprecate cutlass block quant fp8 (b200) (#30990)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
|
2025-12-19 13:09:54 -08:00 |
|
Seiji Eicher
|
1ab5213531
|
Make engine core client handshake timeout configurable (#27444)
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
|
2025-12-19 20:38:30 +00:00 |
|
Zhonghua Deng
|
969bbc7c61
|
[Model] Add MiMo-V2-Flash support (#30836)
Signed-off-by: Abatom <abzhonghua@gmail.com>
Signed-off-by: Jumiar <liuanqim10@126.com>
Signed-off-by: Zyann7 <zyann7@outlook.com>
Co-authored-by: Jumiar <liuanqim10@126.com>
Co-authored-by: Zyann7 <zyann7@outlook.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-12-19 17:17:03 +00:00 |
|
Marko Rosenmueller
|
455949675d
|
[Frontend][Bug] allow tool calls in analysis channel (#28139)
Signed-off-by: Marko Rosenmueller <5467316+dr75@users.noreply.github.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
|
2025-12-19 10:47:44 +00:00 |
|
lif
|
086b96339f
|
[Bugfix] Add validation for tool requests when tool_parser is unavailable (#30613)
Signed-off-by: majiayu000 <1835304752@qq.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
|
2025-12-19 18:23:28 +08:00 |
|
Wenqi Glantz
|
4924ac582c
|
Add hidden dimension validation for multimodal embedding inputs (#30968)
Signed-off-by: Wenqi Glantz <wglantz@nvidia.com>
|
2025-12-19 07:59:36 +00:00 |
|
Nick Hill
|
2ac85a4544
|
[BugFix] Fix logprobs with spec decode and modified logits (#30846)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-12-18 19:58:28 -08:00 |
|
Andreas Karatzas
|
7b43db210c
|
[ROCm][CI][Bugfix] Multi-Modal Model Support Fixes and Attention Backend Improvements (#30270)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2025-12-19 02:17:27 +00:00 |
|
PlatinumGod
|
6a09612b2e
|
[Bugfix] Fix tool_choice="none" being ignored by GPT-OSS/harmony models (#30867)
Signed-off-by: yujiepu <pyjapple@gmail.com>
Signed-off-by: PlatinumGod <pyjapple@gmail.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
|
2025-12-19 09:34:27 +08:00 |
|
Nick Hill
|
45c0526ac9
|
[BugFix] Handle errors when preprocessing added requests (#30895)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-12-19 01:29:11 +00:00 |
|
Elizabeth Thomas
|
41b6f9200f
|
Remove all2all backend envvar (#30363)
Signed-off-by: Elizabeth Thomas <email2eliza@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-12-18 19:46:28 +00:00 |
|
Isotr0py
|
700a5ad6c6
|
[MM Encoder]: Migrate legacy ViT MultiHeadAttention to new MMEncoderAttention interface (#30684)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-12-19 02:04:19 +08:00 |
|