Francesco Fusco
|
298e510848
|
[Hybrid] calling get_mamba_groups() once at MambaCopyBuffers.create() (#37318)
Signed-off-by: Francesco Fusco <ffu@zurich.ibm.com>
|
2026-03-21 09:29:43 +00:00 |
|
Andreas Karatzas
|
0d50fa1db6
|
[ROCm][CI] Mark gemma3 as large GPU test to avoid OOM on MI250 (#37610)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-21 12:57:25 +08:00 |
|
Simon Mo
|
1fa1e53a73
|
Revert "[compile] Initialize passes at VllmBackend init" (#37733)
|
2026-03-20 21:35:49 -07:00 |
|
Andreas Karatzas
|
3ffa52009f
|
[ROCm][CI] Guard CudaPlatform/RocmPlatform imports to fix test collection on cross-platform builds (#37617)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-21 11:58:58 +08:00 |
|
Yongye Zhu
|
87bd91892f
|
[MoE Refactor] Mxfp4 oracle rebased (#37128)
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-03-21 03:37:04 +00:00 |
|
Isotr0py
|
c7f98b4d0a
|
[Frontend] Remove librosa from audio dependency (#37058)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-03-21 11:36:15 +08:00 |
|
Santino Ramos
|
85f671b8e1
|
[Model Runner V2] Support Streaming Inputs (#37028)
Signed-off-by: Santino Ramos <elsantinoramos@gmail.com>
|
2026-03-20 20:42:25 +00:00 |
|
Vadim Gimpelson
|
4f16ebbbd3
|
[Bugfix] Disable monolithic TRTLLM MoE for Renormalize routing (#37591) (#37605)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
|
2026-03-20 12:19:26 -07:00 |
|
Angela Yi
|
12fd17eb51
|
[compile] Initialize passes at VllmBackend init (#35216)
Signed-off-by: angelayi <yiangela7@gmail.com>
|
2026-03-20 11:40:33 -07:00 |
|
Lucas Wilkinson
|
e1d85e5c24
|
[Attention] Support distinguishing between short extends and decodes (#37303)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2026-03-20 10:49:36 -07:00 |
|
Xin Yang
|
d0532bf38d
|
[Perf] Eliminate redundant SparseMatrix creation in gpt_oss_triton_kernels (#37683)
Signed-off-by: Xin Yang <xyangx@amazon.com>
|
2026-03-20 11:28:41 -06:00 |
|
Andreas Karatzas
|
fb4e8bf442
|
[ROCm][CI] Fix accuracy for llama-nemotron-vl pooling tests (#37613)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-20 10:16:59 -07:00 |
|
Harry Mellor
|
6ade4bc5a5
|
Fix various config related issues for Transformers v5 (#37681)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-20 16:30:12 +00:00 |
|
Zhengxu Chen
|
2e089b96a8
|
[compile] Add compiled artifact counter for VLLM_USE_MEGA_AOT_ARTIFACT=1. (#37589)
Signed-off-by: zhxchen17 <zhxchen17@fb.com>
|
2026-03-20 16:22:46 +00:00 |
|
Zhengxu Chen
|
c0f5fae601
|
[compile] Fix aot test failures with torch 2.12. (#37604)
Signed-off-by: zhxchen17 <zhxchen17@fb.com>
|
2026-03-20 16:06:29 +00:00 |
|
L.B.R.
|
1779c09898
|
[ROCm] Enable wvSplitK skinny GEMM kernel for RDNA4/gfx1x decode (#34709)
Signed-off-by: L.B.R. <lbr@mmonad.com>
Co-authored-by: L.B.R. <lbr@mmonad.com>
|
2026-03-20 10:11:23 -05:00 |
|
Ilya Boytsov
|
8b6c6b9505
|
[Model] Add LFM2-ColBERT-350M support (#37528)
Signed-off-by: Ilya Boytsov <ilyaboytsov1805@gmail.com>
|
2026-03-20 14:57:57 +00:00 |
|
Harry Mellor
|
9f6d9dd371
|
Fix attribute error in isaac_patch_hf_runner (#37685)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-20 14:49:40 +00:00 |
|
Flora Feng
|
b4c1aef21c
|
[Refactor] Relocate tests from tests/v1/entrypoints/ to tests/entrypoints/ (#37500)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
|
2026-03-20 02:50:34 -07:00 |
|
Flora Feng
|
6050b93bed
|
[Refactor] Move serve entrypoint tests under tests/entrypoints/serve/ (#37595)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
|
2026-03-20 02:10:47 -07:00 |
|
Andreas Karatzas
|
5a4a179591
|
[ROCm][CI] Fix granite_speech test for gfx90a by selecting compatible attention backend (#37611)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-20 17:07:26 +08:00 |
|
Andreas Karatzas
|
9cfd4ebb5e
|
[ROCm][CI] Update GSM8K eval config to use fp8-and-mixed models list (#37619)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-20 17:06:53 +08:00 |
|
wang.yuqi
|
ed359c497a
|
[Model] Deprecate the score task (this will not affect users). (#37537)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2026-03-20 08:07:56 +00:00 |
|
Flora Feng
|
e2d1c8b5e8
|
[Refactor] Relocate entrypoint tests to match serving code structure (#37593)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
|
2026-03-20 05:31:23 +00:00 |
|
Flora Feng
|
9040151fe1
|
[V0 Deprecation] Deprecate --disable-frontend-multiprocessing (#37612)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
|
2026-03-20 11:31:43 +08:00 |
|
Jee Jee Li
|
8fbe3f303f
|
[Bugfix][LoRA] Fix Qwen35 LoRA (#36976)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2026-03-20 11:09:32 +08:00 |
|
Xiao
|
ea2c148fa7
|
[compile][graph_partition]Add tensor size handling (#36038)
Signed-off-by: Xiao Fu <xiaofu@meta.com>
|
2026-03-19 19:55:25 -07:00 |
|
tianshu-Michael-yu
|
269bf46d99
|
fix: disambiguate multimodal prefix cache keys (#36708)
Signed-off-by: tianshu.yu <tianshuyu.formal@gmail.com>
|
2026-03-20 10:33:20 +08:00 |
|
Flora Feng
|
be12afd284
|
[Bugfix] Fix Deepseekv32 tool parser when stream interval > 1 (#36056)
|
2026-03-19 19:51:25 -04:00 |
|
rasmith
|
98ff042917
|
[CI][BugFix][AMD] Don't set VLLM_ROCM_USE_AITER anymore in test_rocm_aiter_topk since its not necessary (#36996)
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
|
2026-03-20 07:12:45 +08:00 |
|
Laith Sakka
|
112944fab9
|
test Qwen/Qwen3-4B-Instruct-2507 for unbacked (#36064)
Signed-off-by: Laith Sakka <lsakka@meta.com>
|
2026-03-19 17:28:45 -04:00 |
|
Andreas Karatzas
|
040a505ff5
|
[ROCm][CI] Cleaning and restructuring amd-ci legacy pipeline (#34839)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-19 14:30:58 -05:00 |
|
Ifta khairul Alam Adil
|
104605cbf2
|
Remove deprecated reasoning_content message field(part-2) (#37480)
Signed-off-by: JartX <sagformas@epdcenter.es>
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Philip Ottesen <phiott256@gmail.com>
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai>
Signed-off-by: Andy Lo <andy@mistral.ai>
Signed-off-by: Thillai Chithambaram <thillaichithambaram.a@gmail.com>
Signed-off-by: sihao.li <sihao.li@intel.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: JartX <sagformas@epdcenter.es>
Co-authored-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Philip Ottesen <phiott256@gmail.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Giancarlo Delfin <32987265+TheEpicDolphin@users.noreply.github.com>
Co-authored-by: Andy Lo <andy@mistral.ai>
Co-authored-by: Thillai Chithambaram <79466435+thillai-c@users.noreply.github.com>
Co-authored-by: sihao_li <165983188+1643661061leo@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-19 15:20:08 +00:00 |
|
DorBernsohn
|
c63ca2b2e6
|
[Bugfix] Add Kimi-K2.5 reasoning/tool parser aliases and tool_call_id support (#37438)
Signed-off-by: DorBernsohn <dor.bernsohn@gmail.com>
|
2026-03-19 21:08:00 +08:00 |
|
Cyrus Leung
|
765e461065
|
[Bugfix] Fix Nemotron Parse loading (#37407)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-03-19 09:55:29 +00:00 |
|
zhanqiuhu
|
d49f273144
|
[SSM/Mamba] Follow-up: N-1 prefill for P/D disaggregation (#37310)
|
2026-03-19 08:22:00 +01:00 |
|
Flora Feng
|
b21d384304
|
[Refactor] Relocate endpoint tests to mirror serving code directory structure (#37504)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
|
2026-03-19 07:19:36 +00:00 |
|
Sage Moore
|
c32a58cc2a
|
[EPLB] Simplify EPLB rearrange by only returning one map (#36267)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2026-03-18 20:34:00 -04:00 |
|
Thillai Chithambaram
|
828f862acb
|
[Bugfix] Expand quantization method support in perf metrics (#37231)
Signed-off-by: Thillai Chithambaram <thillaichithambaram.a@gmail.com>
|
2026-03-18 23:54:19 +00:00 |
|
Andy Lo
|
577df69b26
|
[Bugfix] Fix KV scales inconsistency in fp8 MLA & FlashInfer kv_cache_dtype "auto" leading to gibberish (#37054)
Signed-off-by: Andy Lo <andy@mistral.ai>
|
2026-03-18 23:07:29 +00:00 |
|
Wentao Ye
|
0d81a1fe61
|
[V0 Deprecation] Deprecate virtual engine (#37195)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-03-18 14:30:14 -07:00 |
|
Or Ozeri
|
5dd8df0701
|
[kv_offload+HMA][2/N]: Support multiple KV groups in GPULoadStoreSpec (#36642)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
|
2026-03-18 19:26:40 +02:00 |
|
Xin Yang
|
b1169d7be8
|
[Kernel] Add gpt-oss Router GEMM kernel (#37205)
Signed-off-by: Xin Yang <xyangx@amazon.com>
|
2026-03-18 08:15:56 -07:00 |
|
elvischenv
|
296839a1b0
|
[Perf] Eliminate padding and slicing op for GPT-OSS with Flashinfer MXFP4 MXFP8 MoE (#30647)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
|
2026-03-18 15:01:26 +00:00 |
|
Cyrus Leung
|
99267c23ca
|
[2/3] Refactor InternVL-based processors (#37324)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-03-18 22:22:19 +08:00 |
|
Or Ozeri
|
525f2eeb0b
|
[kv_offload+HMA][6/N]: Split offloading_connector.py (#37405)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
|
2026-03-18 14:42:46 +01:00 |
|
Yufeng He
|
918b7890a1
|
[Bugfix] Fix base64 JPEG video frames returning empty metadata (#37301)
Signed-off-by: Yufeng He <40085740+universeplayer@users.noreply.github.com>
Signed-off-by: Yufeng He <40085740+he-yufeng@users.noreply.github.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Yufeng He <40085740+universeplayer@users.noreply.github.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-03-18 13:40:03 +00:00 |
|
Andy Lo
|
98b09ddc27
|
[NIXL][Bugfix] metrics & testing minor bug (#36051)
Signed-off-by: Andy Lo <andy@mistral.ai>
|
2026-03-18 14:39:14 +01:00 |
|
Chauncey
|
b322b197f1
|
[Build] Bump python openai version (#32316)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2026-03-18 18:20:10 +08:00 |
|
Andreas Karatzas
|
eaf7c9b976
|
[CI] Fix PaddleOCR-VL HF test failure due to create_causal_mask API rename (#37328)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-18 09:44:12 +00:00 |
|