Flora Feng
|
b4c1aef21c
|
[Refactor] Relocate tests from tests/v1/entrypoints/ to tests/entrypoints/ (#37500)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
|
2026-03-20 02:50:34 -07:00 |
|
Flora Feng
|
6050b93bed
|
[Refactor] Move serve entrypoint tests under tests/entrypoints/serve/ (#37595)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
|
2026-03-20 02:10:47 -07:00 |
|
Andreas Karatzas
|
5a4a179591
|
[ROCm][CI] Fix granite_speech test for gfx90a by selecting compatible attention backend (#37611)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-20 17:07:26 +08:00 |
|
Andreas Karatzas
|
9cfd4ebb5e
|
[ROCm][CI] Update GSM8K eval config to use fp8-and-mixed models list (#37619)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-20 17:06:53 +08:00 |
|
wang.yuqi
|
ed359c497a
|
[Model] Deprecate the score task (this will not affect users). (#37537)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2026-03-20 08:07:56 +00:00 |
|
Flora Feng
|
e2d1c8b5e8
|
[Refactor] Relocate entrypoint tests to match serving code structure (#37593)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
|
2026-03-20 05:31:23 +00:00 |
|
Flora Feng
|
9040151fe1
|
[V0 Deprecation] Deprecate --disable-frontend-multiprocessing (#37612)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
|
2026-03-20 11:31:43 +08:00 |
|
Jee Jee Li
|
8fbe3f303f
|
[Bugfix][LoRA] Fix Qwen35 LoRA (#36976)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2026-03-20 11:09:32 +08:00 |
|
Xiao
|
ea2c148fa7
|
[compile][graph_partition]Add tensor size handling (#36038)
Signed-off-by: Xiao Fu <xiaofu@meta.com>
|
2026-03-19 19:55:25 -07:00 |
|
tianshu-Michael-yu
|
269bf46d99
|
fix: disambiguate multimodal prefix cache keys (#36708)
Signed-off-by: tianshu.yu <tianshuyu.formal@gmail.com>
|
2026-03-20 10:33:20 +08:00 |
|
Flora Feng
|
be12afd284
|
[Bugfix] Fix Deepseekv32 tool parser when stream interval > 1 (#36056)
|
2026-03-19 19:51:25 -04:00 |
|
rasmith
|
98ff042917
|
[CI][BugFix][AMD] Don't set VLLM_ROCM_USE_AITER anymore in test_rocm_aiter_topk since its not necessary (#36996)
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
|
2026-03-20 07:12:45 +08:00 |
|
Laith Sakka
|
112944fab9
|
test Qwen/Qwen3-4B-Instruct-2507 for unbacked (#36064)
Signed-off-by: Laith Sakka <lsakka@meta.com>
|
2026-03-19 17:28:45 -04:00 |
|
Andreas Karatzas
|
040a505ff5
|
[ROCm][CI] Cleaning and restructuring amd-ci legacy pipeline (#34839)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-19 14:30:58 -05:00 |
|
Ifta khairul Alam Adil
|
104605cbf2
|
Remove deprecated reasoning_content message field(part-2) (#37480)
Signed-off-by: JartX <sagformas@epdcenter.es>
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Philip Ottesen <phiott256@gmail.com>
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai>
Signed-off-by: Andy Lo <andy@mistral.ai>
Signed-off-by: Thillai Chithambaram <thillaichithambaram.a@gmail.com>
Signed-off-by: sihao.li <sihao.li@intel.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: JartX <sagformas@epdcenter.es>
Co-authored-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Philip Ottesen <phiott256@gmail.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Giancarlo Delfin <32987265+TheEpicDolphin@users.noreply.github.com>
Co-authored-by: Andy Lo <andy@mistral.ai>
Co-authored-by: Thillai Chithambaram <79466435+thillai-c@users.noreply.github.com>
Co-authored-by: sihao_li <165983188+1643661061leo@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-19 15:20:08 +00:00 |
|
DorBernsohn
|
c63ca2b2e6
|
[Bugfix] Add Kimi-K2.5 reasoning/tool parser aliases and tool_call_id support (#37438)
Signed-off-by: DorBernsohn <dor.bernsohn@gmail.com>
|
2026-03-19 21:08:00 +08:00 |
|
Cyrus Leung
|
765e461065
|
[Bugfix] Fix Nemotron Parse loading (#37407)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-03-19 09:55:29 +00:00 |
|
zhanqiuhu
|
d49f273144
|
[SSM/Mamba] Follow-up: N-1 prefill for P/D disaggregation (#37310)
|
2026-03-19 08:22:00 +01:00 |
|
Flora Feng
|
b21d384304
|
[Refactor] Relocate endpoint tests to mirror serving code directory structure (#37504)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
|
2026-03-19 07:19:36 +00:00 |
|
Sage Moore
|
c32a58cc2a
|
[EPLB] Simplify EPLB rearrange by only returning one map (#36267)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2026-03-18 20:34:00 -04:00 |
|
Thillai Chithambaram
|
828f862acb
|
[Bugfix] Expand quantization method support in perf metrics (#37231)
Signed-off-by: Thillai Chithambaram <thillaichithambaram.a@gmail.com>
|
2026-03-18 23:54:19 +00:00 |
|
Andy Lo
|
577df69b26
|
[Bugfix] Fix KV scales inconsistency in fp8 MLA & FlashInfer kv_cache_dtype "auto" leading to gibberish (#37054)
Signed-off-by: Andy Lo <andy@mistral.ai>
|
2026-03-18 23:07:29 +00:00 |
|
Wentao Ye
|
0d81a1fe61
|
[V0 Deprecation] Deprecate virtual engine (#37195)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-03-18 14:30:14 -07:00 |
|
Or Ozeri
|
5dd8df0701
|
[kv_offload+HMA][2/N]: Support multiple KV groups in GPULoadStoreSpec (#36642)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
|
2026-03-18 19:26:40 +02:00 |
|
Xin Yang
|
b1169d7be8
|
[Kernel] Add gpt-oss Router GEMM kernel (#37205)
Signed-off-by: Xin Yang <xyangx@amazon.com>
|
2026-03-18 08:15:56 -07:00 |
|
elvischenv
|
296839a1b0
|
[Perf] Eliminate padding and slicing op for GPT-OSS with Flashinfer MXFP4 MXFP8 MoE (#30647)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
|
2026-03-18 15:01:26 +00:00 |
|
Cyrus Leung
|
99267c23ca
|
[2/3] Refactor InternVL-based processors (#37324)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-03-18 22:22:19 +08:00 |
|
Or Ozeri
|
525f2eeb0b
|
[kv_offload+HMA][6/N]: Split offloading_connector.py (#37405)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
|
2026-03-18 14:42:46 +01:00 |
|
Yufeng He
|
918b7890a1
|
[Bugfix] Fix base64 JPEG video frames returning empty metadata (#37301)
Signed-off-by: Yufeng He <40085740+universeplayer@users.noreply.github.com>
Signed-off-by: Yufeng He <40085740+he-yufeng@users.noreply.github.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Yufeng He <40085740+universeplayer@users.noreply.github.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-03-18 13:40:03 +00:00 |
|
Andy Lo
|
98b09ddc27
|
[NIXL][Bugfix] metrics & testing minor bug (#36051)
Signed-off-by: Andy Lo <andy@mistral.ai>
|
2026-03-18 14:39:14 +01:00 |
|
Chauncey
|
b322b197f1
|
[Build] Bump python openai version (#32316)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2026-03-18 18:20:10 +08:00 |
|
Andreas Karatzas
|
eaf7c9b976
|
[CI] Fix PaddleOCR-VL HF test failure due to create_causal_mask API rename (#37328)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-18 09:44:12 +00:00 |
|
Karan Bansal
|
fad09e8a1f
|
fix(glm47): improve tool call parsing and content normalization (#37386)
Signed-off-by: karanb192 <karan@example.com>
Co-authored-by: karanb192 <karan@example.com>
|
2026-03-18 08:12:21 +00:00 |
|
Or Ozeri
|
fcf0687b27
|
[kv_offload+HMA][0/N]: Support block-level preemption handling (#34805)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
|
2026-03-18 08:49:53 +02:00 |
|
liuzhenwei
|
86b7e3c95a
|
[XPU] skip unsupported ut and update test_nixl_connector (#37179)
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
|
2026-03-18 13:32:59 +08:00 |
|
Andreas Karatzas
|
ce2ef42fd3
|
[CI] Stabilize test_cpu_offloading by waiting for async offload before cache reset (#37335)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-18 05:26:20 +00:00 |
|
Andreas Karatzas
|
8b6325758c
|
[ROCm][CI] Add ROCM_EXTRA_ARGS to audio_in_video test server fixture (#37349)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-18 04:55:40 +00:00 |
|
gxd3
|
a0dd1995c7
|
[Hardware][TPU] Add supports_async_scheduling() method to Executor interface so that it can be extended for Executor implementations. (#36924)
Signed-off-by: Guangxiang Du <gxd@google.com>
|
2026-03-18 12:53:28 +08:00 |
|
Andreas Karatzas
|
58cde5c026
|
[ROCm][CI] Skip trtllm kvfp8 dequant tests on ROCm (#37330)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-18 11:12:26 +08:00 |
|
Yanan Cao
|
ff9fbc9aff
|
[Kernel][Helion] [16/N] Refactor register_kernel API to be more Dynamo-friendly (#36705)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-03-18 01:23:35 +00:00 |
|
Michael Goin
|
09e4576f65
|
[Kernel] Add non-gated support for NVFP4 CUTLASS MoE (#37320)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2026-03-17 18:12:04 -04:00 |
|
Yong Hoon Shin
|
de35c06c66
|
Make KV connector metadata build overridable via plugin (#37336)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
|
2026-03-17 21:29:06 +00:00 |
|
Athrael Soju
|
c0745a851a
|
[Model] Add ColQwen3.5 4.5B support (#36887)
Signed-off-by: Athrael Soju <athrael.soju@gmail.com>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2026-03-17 21:17:02 +00:00 |
|
Ekagra Ranjan
|
b5ca9c3557
|
[Models] Cohere ASR (#35809)
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
|
2026-03-17 21:04:17 +00:00 |
|
Cyrus Leung
|
51f0acda79
|
[Model] Remove unused handle_oov_mm_token (#37321)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-03-17 19:44:52 +00:00 |
|
Andrey Talman
|
68f783a727
|
[Torch 2.11] Guard torch._C._cpu attribute checks for forward compatibility (#35673)
Signed-off-by: atalman <atalman@fb.com>
|
2026-03-17 18:47:59 +00:00 |
|
Andreas Karatzas
|
4ed51308c8
|
[CI] Fix GPU memory leak when RemoteOpenAIServer fails to start in __init__ (#37230)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-17 09:08:08 -07:00 |
|
Isotr0py
|
a836524d20
|
[Chore] Replace all base64 usages with faster pybase64 package (#37290)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-03-17 14:44:19 +00:00 |
|
Bhoomit
|
3717a4dd47
|
[Misc][LoRA] Add --lora-target-modules to restrict LoRA to specific modules (#34984)
Signed-off-by: Bhoomit Vasani <bhoomit.2010@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-17 14:36:41 +00:00 |
|
Harry Mellor
|
ecfcdd2ce4
|
Fix Phi3 test that fails with Transformers v5 (#37298)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-17 14:29:24 +00:00 |
|