Commit Graph

4882 Commits

Author SHA1 Message Date
Flora Feng
b4c1aef21c [Refactor] Relocate tests from tests/v1/entrypoints/ to tests/entrypoints/ (#37500)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
2026-03-20 02:50:34 -07:00
Flora Feng
6050b93bed [Refactor] Move serve entrypoint tests under tests/entrypoints/serve/ (#37595)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
2026-03-20 02:10:47 -07:00
Andreas Karatzas
5a4a179591 [ROCm][CI] Fix granite_speech test for gfx90a by selecting compatible attention backend (#37611)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-20 17:07:26 +08:00
Andreas Karatzas
9cfd4ebb5e [ROCm][CI] Update GSM8K eval config to use fp8-and-mixed models list (#37619)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-20 17:06:53 +08:00
wang.yuqi
ed359c497a [Model] Deprecate the score task (this will not affect users). (#37537)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-03-20 08:07:56 +00:00
Flora Feng
e2d1c8b5e8 [Refactor] Relocate entrypoint tests to match serving code structure (#37593)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
2026-03-20 05:31:23 +00:00
Flora Feng
9040151fe1 [V0 Deprecation] Deprecate --disable-frontend-multiprocessing (#37612)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
2026-03-20 11:31:43 +08:00
Jee Jee Li
8fbe3f303f [Bugfix][LoRA] Fix Qwen35 LoRA (#36976)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2026-03-20 11:09:32 +08:00
Xiao
ea2c148fa7 [compile][graph_partition]Add tensor size handling (#36038)
Signed-off-by: Xiao Fu <xiaofu@meta.com>
2026-03-19 19:55:25 -07:00
tianshu-Michael-yu
269bf46d99 fix: disambiguate multimodal prefix cache keys (#36708)
Signed-off-by: tianshu.yu <tianshuyu.formal@gmail.com>
2026-03-20 10:33:20 +08:00
Flora Feng
be12afd284 [Bugfix] Fix Deepseekv32 tool parser when stream interval > 1 (#36056) 2026-03-19 19:51:25 -04:00
rasmith
98ff042917 [CI][BugFix][AMD] Don't set VLLM_ROCM_USE_AITER anymore in test_rocm_aiter_topk since its not necessary (#36996)
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
2026-03-20 07:12:45 +08:00
Laith Sakka
112944fab9 test Qwen/Qwen3-4B-Instruct-2507 for unbacked (#36064)
Signed-off-by: Laith Sakka <lsakka@meta.com>
2026-03-19 17:28:45 -04:00
Andreas Karatzas
040a505ff5 [ROCm][CI] Cleaning and restructuring amd-ci legacy pipeline (#34839)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-19 14:30:58 -05:00
Ifta khairul Alam Adil
104605cbf2 Remove deprecated reasoning_content message field(part-2) (#37480)
Signed-off-by: JartX <sagformas@epdcenter.es>
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Philip Ottesen <phiott256@gmail.com>
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai>
Signed-off-by: Andy Lo <andy@mistral.ai>
Signed-off-by: Thillai Chithambaram <thillaichithambaram.a@gmail.com>
Signed-off-by: sihao.li <sihao.li@intel.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: JartX <sagformas@epdcenter.es>
Co-authored-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Philip Ottesen <phiott256@gmail.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Giancarlo Delfin <32987265+TheEpicDolphin@users.noreply.github.com>
Co-authored-by: Andy Lo <andy@mistral.ai>
Co-authored-by: Thillai Chithambaram <79466435+thillai-c@users.noreply.github.com>
Co-authored-by: sihao_li <165983188+1643661061leo@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-19 15:20:08 +00:00
DorBernsohn
c63ca2b2e6 [Bugfix] Add Kimi-K2.5 reasoning/tool parser aliases and tool_call_id support (#37438)
Signed-off-by: DorBernsohn <dor.bernsohn@gmail.com>
2026-03-19 21:08:00 +08:00
Cyrus Leung
765e461065 [Bugfix] Fix Nemotron Parse loading (#37407)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-03-19 09:55:29 +00:00
zhanqiuhu
d49f273144 [SSM/Mamba] Follow-up: N-1 prefill for P/D disaggregation (#37310) 2026-03-19 08:22:00 +01:00
Flora Feng
b21d384304 [Refactor] Relocate endpoint tests to mirror serving code directory structure (#37504)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
2026-03-19 07:19:36 +00:00
Sage Moore
c32a58cc2a [EPLB] Simplify EPLB rearrange by only returning one map (#36267)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2026-03-18 20:34:00 -04:00
Thillai Chithambaram
828f862acb [Bugfix] Expand quantization method support in perf metrics (#37231)
Signed-off-by: Thillai Chithambaram <thillaichithambaram.a@gmail.com>
2026-03-18 23:54:19 +00:00
Andy Lo
577df69b26 [Bugfix] Fix KV scales inconsistency in fp8 MLA & FlashInfer kv_cache_dtype "auto" leading to gibberish (#37054)
Signed-off-by: Andy Lo <andy@mistral.ai>
2026-03-18 23:07:29 +00:00
Wentao Ye
0d81a1fe61 [V0 Deprecation] Deprecate virtual engine (#37195)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2026-03-18 14:30:14 -07:00
Or Ozeri
5dd8df0701 [kv_offload+HMA][2/N]: Support multiple KV groups in GPULoadStoreSpec (#36642)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2026-03-18 19:26:40 +02:00
Xin Yang
b1169d7be8 [Kernel] Add gpt-oss Router GEMM kernel (#37205)
Signed-off-by: Xin Yang <xyangx@amazon.com>
2026-03-18 08:15:56 -07:00
elvischenv
296839a1b0 [Perf] Eliminate padding and slicing op for GPT-OSS with Flashinfer MXFP4 MXFP8 MoE (#30647)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
2026-03-18 15:01:26 +00:00
Cyrus Leung
99267c23ca [2/3] Refactor InternVL-based processors (#37324)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-03-18 22:22:19 +08:00
Or Ozeri
525f2eeb0b [kv_offload+HMA][6/N]: Split offloading_connector.py (#37405)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
2026-03-18 14:42:46 +01:00
Yufeng He
918b7890a1 [Bugfix] Fix base64 JPEG video frames returning empty metadata (#37301)
Signed-off-by: Yufeng He <40085740+universeplayer@users.noreply.github.com>
Signed-off-by: Yufeng He <40085740+he-yufeng@users.noreply.github.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Yufeng He <40085740+universeplayer@users.noreply.github.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-03-18 13:40:03 +00:00
Andy Lo
98b09ddc27 [NIXL][Bugfix] metrics & testing minor bug (#36051)
Signed-off-by: Andy Lo <andy@mistral.ai>
2026-03-18 14:39:14 +01:00
Chauncey
b322b197f1 [Build] Bump python openai version (#32316)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2026-03-18 18:20:10 +08:00
Andreas Karatzas
eaf7c9b976 [CI] Fix PaddleOCR-VL HF test failure due to create_causal_mask API rename (#37328)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-18 09:44:12 +00:00
Karan Bansal
fad09e8a1f fix(glm47): improve tool call parsing and content normalization (#37386)
Signed-off-by: karanb192 <karan@example.com>
Co-authored-by: karanb192 <karan@example.com>
2026-03-18 08:12:21 +00:00
Or Ozeri
fcf0687b27 [kv_offload+HMA][0/N]: Support block-level preemption handling (#34805)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
2026-03-18 08:49:53 +02:00
liuzhenwei
86b7e3c95a [XPU] skip unsupported ut and update test_nixl_connector (#37179)
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
2026-03-18 13:32:59 +08:00
Andreas Karatzas
ce2ef42fd3 [CI] Stabilize test_cpu_offloading by waiting for async offload before cache reset (#37335)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-18 05:26:20 +00:00
Andreas Karatzas
8b6325758c [ROCm][CI] Add ROCM_EXTRA_ARGS to audio_in_video test server fixture (#37349)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-18 04:55:40 +00:00
gxd3
a0dd1995c7 [Hardware][TPU] Add supports_async_scheduling() method to Executor interface so that it can be extended for Executor implementations. (#36924)
Signed-off-by: Guangxiang Du <gxd@google.com>
2026-03-18 12:53:28 +08:00
Andreas Karatzas
58cde5c026 [ROCm][CI] Skip trtllm kvfp8 dequant tests on ROCm (#37330)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-18 11:12:26 +08:00
Yanan Cao
ff9fbc9aff [Kernel][Helion] [16/N] Refactor register_kernel API to be more Dynamo-friendly (#36705)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-18 01:23:35 +00:00
Michael Goin
09e4576f65 [Kernel] Add non-gated support for NVFP4 CUTLASS MoE (#37320)
Signed-off-by: mgoin <mgoin64@gmail.com>
2026-03-17 18:12:04 -04:00
Yong Hoon Shin
de35c06c66 Make KV connector metadata build overridable via plugin (#37336)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
2026-03-17 21:29:06 +00:00
Athrael Soju
c0745a851a [Model] Add ColQwen3.5 4.5B support (#36887)
Signed-off-by: Athrael Soju <athrael.soju@gmail.com>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-03-17 21:17:02 +00:00
Ekagra Ranjan
b5ca9c3557 [Models] Cohere ASR (#35809)
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
2026-03-17 21:04:17 +00:00
Cyrus Leung
51f0acda79 [Model] Remove unused handle_oov_mm_token (#37321)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-03-17 19:44:52 +00:00
Andrey Talman
68f783a727 [Torch 2.11] Guard torch._C._cpu attribute checks for forward compatibility (#35673)
Signed-off-by: atalman <atalman@fb.com>
2026-03-17 18:47:59 +00:00
Andreas Karatzas
4ed51308c8 [CI] Fix GPU memory leak when RemoteOpenAIServer fails to start in __init__ (#37230)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-17 09:08:08 -07:00
Isotr0py
a836524d20 [Chore] Replace all base64 usages with faster pybase64 package (#37290)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-03-17 14:44:19 +00:00
Bhoomit
3717a4dd47 [Misc][LoRA] Add --lora-target-modules to restrict LoRA to specific modules (#34984)
Signed-off-by: Bhoomit Vasani <bhoomit.2010@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-17 14:36:41 +00:00
Harry Mellor
ecfcdd2ce4 Fix Phi3 test that fails with Transformers v5 (#37298)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-17 14:29:24 +00:00