biondizzle/vllm

Author	SHA1	Message	Date
Or Ozeri	5dd8df0701	[kv_offload+HMA][2/N]: Support multiple KV groups in GPULoadStoreSpec (#36642 ) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2026-03-18 19:26:40 +02:00
Xin Yang	b1169d7be8	[Kernel] Add gpt-oss Router GEMM kernel (#37205 ) Signed-off-by: Xin Yang <xyangx@amazon.com>	2026-03-18 08:15:56 -07:00
elvischenv	296839a1b0	[Perf] Eliminate padding and slicing op for GPT-OSS with Flashinfer MXFP4 MXFP8 MoE (#30647 ) Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>	2026-03-18 15:01:26 +00:00
Cyrus Leung	99267c23ca	[2/3] Refactor InternVL-based processors (#37324 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-03-18 22:22:19 +08:00
Or Ozeri	525f2eeb0b	[kv_offload+HMA][6/N]: Split offloading_connector.py (#37405 ) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2026-03-18 14:42:46 +01:00
Yufeng He	918b7890a1	[Bugfix] Fix base64 JPEG video frames returning empty metadata (#37301 ) Signed-off-by: Yufeng He <40085740+universeplayer@users.noreply.github.com> Signed-off-by: Yufeng He <40085740+he-yufeng@users.noreply.github.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Yufeng He <40085740+universeplayer@users.noreply.github.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-03-18 13:40:03 +00:00
Andy Lo	98b09ddc27	[NIXL][Bugfix] metrics & testing minor bug (#36051 ) Signed-off-by: Andy Lo <andy@mistral.ai>	2026-03-18 14:39:14 +01:00
Chauncey	b322b197f1	[Build] Bump python openai version (#32316 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2026-03-18 18:20:10 +08:00
Andreas Karatzas	eaf7c9b976	[CI] Fix PaddleOCR-VL HF test failure due to create_causal_mask API rename (#37328 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-18 09:44:12 +00:00
Karan Bansal	fad09e8a1f	fix(glm47): improve tool call parsing and content normalization (#37386 ) Signed-off-by: karanb192 <karan@example.com> Co-authored-by: karanb192 <karan@example.com>	2026-03-18 08:12:21 +00:00
Or Ozeri	fcf0687b27	[kv_offload+HMA][0/N]: Support block-level preemption handling (#34805 ) Signed-off-by: Or Ozeri <oro@il.ibm.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>	2026-03-18 08:49:53 +02:00
liuzhenwei	86b7e3c95a	[XPU] skip unsupported ut and update test_nixl_connector (#37179 ) Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>	2026-03-18 13:32:59 +08:00
Andreas Karatzas	ce2ef42fd3	[CI] Stabilize test_cpu_offloading by waiting for async offload before cache reset (#37335 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-18 05:26:20 +00:00
Andreas Karatzas	8b6325758c	[ROCm][CI] Add ROCM_EXTRA_ARGS to audio_in_video test server fixture (#37349 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-18 04:55:40 +00:00
gxd3	a0dd1995c7	[Hardware][TPU] Add supports_async_scheduling() method to Executor interface so that it can be extended for Executor implementations. (#36924 ) Signed-off-by: Guangxiang Du <gxd@google.com>	2026-03-18 12:53:28 +08:00
Andreas Karatzas	58cde5c026	[ROCm][CI] Skip trtllm kvfp8 dequant tests on ROCm (#37330 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-18 11:12:26 +08:00
Yanan Cao	ff9fbc9aff	[Kernel][Helion] [16/N] Refactor register_kernel API to be more Dynamo-friendly (#36705 ) Signed-off-by: Yanan Cao <gmagogsfm@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-18 01:23:35 +00:00
Michael Goin	09e4576f65	[Kernel] Add non-gated support for NVFP4 CUTLASS MoE (#37320 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2026-03-17 18:12:04 -04:00
Yong Hoon Shin	de35c06c66	Make KV connector metadata build overridable via plugin (#37336 ) Signed-off-by: Yong Hoon Shin <yhshin@meta.com>	2026-03-17 21:29:06 +00:00
Athrael Soju	c0745a851a	[Model] Add ColQwen3.5 4.5B support (#36887 ) Signed-off-by: Athrael Soju <athrael.soju@gmail.com> Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>	2026-03-17 21:17:02 +00:00
Ekagra Ranjan	b5ca9c3557	[Models] Cohere ASR (#35809 ) Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>	2026-03-17 21:04:17 +00:00
Cyrus Leung	51f0acda79	[Model] Remove unused `handle_oov_mm_token` (#37321 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-03-17 19:44:52 +00:00
Andrey Talman	68f783a727	[Torch 2.11] Guard torch._C._cpu attribute checks for forward compatibility (#35673 ) Signed-off-by: atalman <atalman@fb.com>	2026-03-17 18:47:59 +00:00
Andreas Karatzas	4ed51308c8	[CI] Fix GPU memory leak when RemoteOpenAIServer fails to start in __init__ (#37230 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-17 09:08:08 -07:00
Isotr0py	a836524d20	[Chore] Replace all base64 usages with faster pybase64 package (#37290 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-03-17 14:44:19 +00:00
Bhoomit	3717a4dd47	[Misc][LoRA] Add --lora-target-modules to restrict LoRA to specific modules (#34984 ) Signed-off-by: Bhoomit Vasani <bhoomit.2010@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-17 14:36:41 +00:00
Harry Mellor	ecfcdd2ce4	Fix Phi3 test that fails with Transformers v5 (#37298 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-17 14:29:24 +00:00
Sage	59192dfd39	[Frontend] Complete OpenAI render delegation (#37287 ) Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>	2026-03-17 13:53:55 +00:00
Cyrus Leung	f340324335	[1/2] Move InternVL-based processors (#37260 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-03-17 21:50:56 +08:00
Viacheslav	293f036e6d	Add gigachat 3.1 tool parser + fix gigachat3 tool parser (#36664 ) Signed-off-by: Viacheslav Barinov <viacheslav.teh@gmail.com>	2026-03-17 12:03:20 +00:00
Sage	00f8e0d211	[Frontend] Delegate tokenization serving preprocessing to OpenAIServingRender (#37266 ) Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>	2026-03-17 11:22:54 +00:00
Augusto Yao	9c7cab5ebb	[Feature]: Support for multiple embedding types in a single inference call (#35829 ) Signed-off-by: augusto.yjh <augusto.yjh@antgroup.com>	2026-03-17 17:05:42 +08:00
Chauncey	132bfd45b6	[Bugfix][ResponsesAPI] Fix crash when tool_choice=required exceeds max_output_tokens (#37258 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2026-03-17 08:54:52 +00:00
Benjamin Chislett	8a680463fa	[Bugfix] Fix NemotronH MTP + Chunked Prefill (#35447 )	2026-03-17 07:07:33 +01:00
Flora Feng	3e3d320c1b	[Refactor] Relocate responses API tests (#37241 ) Signed-off-by: sfeng33 <4florafeng@gmail.com>	2026-03-17 05:14:52 +00:00
Flora Feng	384dc7f77b	[Refactor] Relocate completion and chat completion tests (#37125 ) Signed-off-by: sfeng33 <4florafeng@gmail.com>	2026-03-17 11:31:23 +08:00
Flora Feng	f04d5226f8	[CI] Fix flaky tool_use chat completion tests with deterministic seed (#37027 ) Signed-off-by: sfeng33 <4florafeng@gmail.com>	2026-03-17 03:24:34 +00:00
Vadim Gimpelson	6c1cfbad32	Support non-contiguous KV cache in TRTLLM fp8 dequant kernel (#36867 ) Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com> Signed-off-by: Vadim Gimpelson <156319763+vadiklyutiy@users.noreply.github.com> Co-authored-by: Pavani Majety <pavanimajety@gmail.com>	2026-03-16 17:48:42 -07:00
Harry Huang	45f526d652	[BugFix] Correct max memory usage for multiple KV-cache groups (#36030 ) Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com>	2026-03-17 00:38:52 +00:00
Walter Beller-Morales	061980c36a	[Feature][Frontend] add support for Cohere Embed v2 API (#37074 ) Signed-off-by: walterbm <walter.beller.morales@gmail.com>	2026-03-16 19:55:53 -04:00
Ben Browning	7a49742b88	[CI/Build] Add common tool call parser test suite (#27599 ) Signed-off-by: Ben Browning <bbrownin@redhat.com>	2026-03-16 19:46:20 -04:00
Terry Gao	3e6a1e1686	[Custom Ops] Add functional + out variant for scaled_fp4_quant (#34389 ) Signed-off-by: tianrengao <terrygao87@gmail.com>	2026-03-16 18:51:46 -04:00
Andreas Karatzas	4f9b14c21c	[CI] Stabilize multinode DP internal LB completion tests (#36356 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-16 15:40:23 -07:00
EdalatiAli	e5b807607c	[Quant][Feature] Support online MXFP8 quantization for MoE and dense models (#35448 ) Signed-off-by: EdalatiAli <aliedalati@cohere.com>	2026-03-16 18:07:39 -04:00
Krish Gupta	c0f011918d	[Bugfix] opcheck false mutation error in rms_norm_per_block_quant (#36688 ) (#36779 ) Signed-off-by: Krish Gupta <krishom70@gmail.com>	2026-03-16 21:11:33 +00:00
rasmith	2cc26c3a99	[CI][BugFix][MORI][AMD] Add transfer_id to kv transfer params for test (#37213 ) Signed-off-by: Randall Smith <Randall.Smith@amd.com>	2026-03-16 13:22:57 -07:00
Flora Feng	dfa8852db2	[Refactor] Consolidate GPT-OSS reasoning parser tests (#36915 ) Signed-off-by: sfeng33 <4florafeng@gmail.com> Signed-off-by: Flora Feng <4florafeng@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-03-16 15:53:07 -04:00
Nicolò Lucchesi	f5c081d432	[PD][Nixl] Add support for hybrid SSM-FA models (#36687 )	2026-03-16 19:58:06 +01:00
Max de Bayser	9f9ecff4cd	Add simple granite4 tool parser (#36827 ) Signed-off-by: Max de Bayser <maxdebayser@gmail.com> Signed-off-by: Max de Bayser <mbayser@br.ibm.com>	2026-03-16 10:49:09 -07:00
haosdent	ca1954d58c	[Bugfix] Disable cross-layer KV cache for MLA attention backends (#37090 ) Signed-off-by: haosdent <haosdent@gmail.com> Co-authored-by: Or Ozeri <oro@il.ibm.com>	2026-03-16 19:03:10 +02:00

4859 Commits