biondizzle/vllm - vllm

Author	SHA1	Message	Date
Wentao Ye	0d81a1fe61	[V0 Deprecation] Deprecate virtual engine (#37195 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-03-18 14:30:14 -07:00
Netanel Haber	6ae4c8d6fc	chunk parakeet into 30s clips to prevent OOMs on long audios (#36671 ) Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>	2026-03-18 14:22:24 -07:00
JartX	a913b612d8	[Bugfix] Fix ROCm crash in qwen3_next multi-stream events (#36795 ) (#37427 ) Signed-off-by: JartX <sagformas@epdcenter.es>	2026-03-18 16:06:31 -04:00
Harry Mellor	5ce2d10e4a	Fix models which use `layer_type_validation` for Transformers v5 (#37398 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-18 18:41:51 +00:00
Chengyu Fang	738d0a281f	[Bugfix] Fix incorrect use of merge_size in Qwen3-VL video timestamp calculation (#37439 ) Signed-off-by: chengyufang <cnyvfang@outlook.com>	2026-03-18 11:36:34 -07:00
youkaichao	70b81c4f3d	[bugfix][async scheduling] fix extra cuda context in device 0 with EP/DP (#37449 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2026-03-18 18:32:30 +00:00
Cyrus Leung	7476d148db	[Model] Remove unnecessary processor definition for Nemotron Parse (#37456 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-03-18 18:25:13 +00:00
Cyrus Leung	f3732bd931	[Misc] Clean up model registry (#37457 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-03-18 18:24:44 +00:00
Wentao Ye	0ef7f79054	[Perf] Add tuned triton moe config for Qwen3.5 H200, 9.9% E2E throughput improvement (#37340 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-03-18 14:18:34 -04:00
Or Ozeri	5dd8df0701	[kv_offload+HMA][2/N]: Support multiple KV groups in GPULoadStoreSpec (#36642 ) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2026-03-18 19:26:40 +02:00
Harry Mellor	39bfb57b7c	Add API docs link if the CLI arg is a config class (#37432 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-18 17:19:35 +00:00
RonaldBXu	c9d838fc33	Adding deterministic lora benchmarking to vLLM Bench (#36057 ) Signed-off-by: Ubuntu <ubuntu@ip-172-31-43-201.ap-northeast-1.compute.internal> Signed-off-by: Ronald Xu <ronaldxu@amazon.com>	2026-03-18 16:02:03 +00:00
Xin Yang	b1169d7be8	[Kernel] Add gpt-oss Router GEMM kernel (#37205 ) Signed-off-by: Xin Yang <xyangx@amazon.com>	2026-03-18 08:15:56 -07:00
XLiu-2000	17808394bc	standardize load_weights using AutoWeightsLoader for kimi_linear and minimax_text_01 (#37371 ) Signed-off-by: XuLiu <xuliu40@gmail.com> Co-authored-by: XuLiu <xuliu40@gmail.com>	2026-03-18 15:05:37 +00:00
elvischenv	296839a1b0	[Perf] Eliminate padding and slicing op for GPT-OSS with Flashinfer MXFP4 MXFP8 MoE (#30647 ) Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>	2026-03-18 15:01:26 +00:00
Wentao Ye	c373b5c00d	[Log] Reduce duplicate log (#37313 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-03-18 10:57:44 -04:00
Itay Alroy	de1a86b7de	elastic_ep: Fix stateless group port races (#36330 ) Signed-off-by: Itay Alroy <ialroy@nvidia.com>	2026-03-18 14:36:18 +00:00
Cyrus Leung	99267c23ca	[2/3] Refactor InternVL-based processors (#37324 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-03-18 22:22:19 +08:00
Or Ozeri	525f2eeb0b	[kv_offload+HMA][6/N]: Split offloading_connector.py (#37405 ) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2026-03-18 14:42:46 +01:00
Yufeng He	918b7890a1	[Bugfix] Fix base64 JPEG video frames returning empty metadata (#37301 ) Signed-off-by: Yufeng He <40085740+universeplayer@users.noreply.github.com> Signed-off-by: Yufeng He <40085740+he-yufeng@users.noreply.github.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Yufeng He <40085740+universeplayer@users.noreply.github.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-03-18 13:40:03 +00:00
Andy Lo	98b09ddc27	[NIXL][Bugfix] metrics & testing minor bug (#36051 ) Signed-off-by: Andy Lo <andy@mistral.ai>	2026-03-18 14:39:14 +01:00
Shwetha Poojary	cef1f302d2	[Model] Enable LoRA support for tower and connector in H2OVL (#31696 ) Signed-off-by: shwetha-s-poojary <shwetha.s-poojary@ibm.com>	2026-03-18 13:26:47 +00:00
Elvir Crnčević	17c47fb869	[Bugfix] Fix EP weight filter breaking EPLB and NVFP4 accuracy (#37322 ) Signed-off-by: Elvir Crncevic <elvircrn@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Kevin H. Luu <khluu000@gmail.com>	2026-03-18 18:30:29 +08:00
Chauncey	b322b197f1	[Build] Bump python openai version (#32316 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2026-03-18 18:20:10 +08:00
Andreas Karatzas	eaf7c9b976	[CI] Fix PaddleOCR-VL HF test failure due to create_causal_mask API rename (#37328 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-18 09:44:12 +00:00
Aaron Hao	47a1f11bff	[docs] Add docs for new RL flows (#36188 ) Signed-off-by: ahao-anyscale <ahao@anyscale.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-18 09:04:26 +00:00
Karan Bansal	fad09e8a1f	fix(glm47): improve tool call parsing and content normalization (#37386 ) Signed-off-by: karanb192 <karan@example.com> Co-authored-by: karanb192 <karan@example.com>	2026-03-18 08:12:21 +00:00
Jee Jee Li	8c31f47c63	[LoRA] Make LoRA respect `language_model_only` (#37375 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2026-03-18 07:53:34 +00:00
Li, Jiang	261801242f	[Bugfix] Avoid OpenMP thread reallocation in CPU torch compile (#37391 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2026-03-18 07:51:39 +00:00
Or Ozeri	fcf0687b27	[kv_offload+HMA][0/N]: Support block-level preemption handling (#34805 ) Signed-off-by: Or Ozeri <oro@il.ibm.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>	2026-03-18 08:49:53 +02:00
liuzhenwei	86b7e3c95a	[XPU] skip unsupported ut and update test_nixl_connector (#37179 ) Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>	2026-03-18 13:32:59 +08:00
Andrew Xia	0e95916155	[responsesAPI] parser.extract_response_outputs can take in token IDs (#37130 ) Signed-off-by: Andrew Xia <axia@meta.com>	2026-03-18 05:31:31 +00:00
Andreas Karatzas	ce2ef42fd3	[CI] Stabilize test_cpu_offloading by waiting for async offload before cache reset (#37335 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-18 05:26:20 +00:00
Andreas Karatzas	8b6325758c	[ROCm][CI] Add ROCM_EXTRA_ARGS to audio_in_video test server fixture (#37349 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-18 04:55:40 +00:00
gxd3	a0dd1995c7	[Hardware][TPU] Add supports_async_scheduling() method to Executor interface so that it can be extended for Executor implementations. (#36924 ) Signed-off-by: Guangxiang Du <gxd@google.com>	2026-03-18 12:53:28 +08:00
Xin Yang	f1740006e4	[Perf] Enable dual stream execution of input projection for Qwen3 (#36795 ) Signed-off-by: Xin Yang <xyangx@amazon.com>	2026-03-18 11:13:27 +08:00
Andreas Karatzas	58cde5c026	[ROCm][CI] Skip trtllm kvfp8 dequant tests on ROCm (#37330 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-18 11:12:26 +08:00
Roy Wang	761e0aa7a0	[Performance] Add --enable-ep-weight-filter CLI option (#37351 ) Signed-off-by: esmeetu <jasonailu87@gmail.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-03-18 09:36:55 +08:00
Yanan Cao	ff9fbc9aff	[Kernel][Helion] [16/N] Refactor register_kernel API to be more Dynamo-friendly (#36705 ) Signed-off-by: Yanan Cao <gmagogsfm@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-18 01:23:35 +00:00
Divakar Verma	e6c4797704	[ROCm][Quantization] add fp8xfp8 attn support for rocm_aiter_unified_attn (#36927 ) Signed-off-by: Divakar Verma <divakar.verma@amd.com>	2026-03-18 08:49:32 +08:00
Michael Goin	09e4576f65	[Kernel] Add non-gated support for NVFP4 CUTLASS MoE (#37320 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2026-03-17 18:12:04 -04:00
Andreas Karatzas	3ed7b1e6e0	[ROCm] Validate block_size for explicitly selected attention backends (#36846 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-17 17:04:40 -05:00
JartX	e8f9dbc369	[Bugfix][ROCm] Fix worker startup OOM on ROCm by skipping unreliable cudagraph memory profiling (#36720 ) Signed-off-by: JartX <sagformas@epdcenter.es>	2026-03-17 17:55:34 -04:00
Yong Hoon Shin	de35c06c66	Make KV connector metadata build overridable via plugin (#37336 ) Signed-off-by: Yong Hoon Shin <yhshin@meta.com>	2026-03-17 21:29:06 +00:00
Athrael Soju	c0745a851a	[Model] Add ColQwen3.5 4.5B support (#36887 ) Signed-off-by: Athrael Soju <athrael.soju@gmail.com> Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>	2026-03-17 21:17:02 +00:00
Ekagra Ranjan	b5ca9c3557	[Models] Cohere ASR (#35809 ) Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>	2026-03-17 21:04:17 +00:00
Chao-Ju Chen	245758992e	[Bugfix] Rescale NVFP4 weight scales to fix BF16 dequant underflow (#34577 ) Signed-off-by: ricky-chaoju <ricky.chen@infinirc.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2026-03-17 20:48:42 +00:00
Dimitrios Bariamis	1204cf0a9d	[Bugfix] Fix mock.patch resolution failure for standalone_compile.FakeTensorMode on Python <= 3.10 (#37158 ) Signed-off-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com> Co-authored-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com>	2026-03-17 20:13:06 +00:00
Wei Zhao	b36adfa349	[Perf] Set Flashinfer sparse MLA as default backend for FP8 kv cache (#37252 ) Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>	2026-03-17 20:09:20 +00:00
Michael Goin	e78821b438	[Deprecation] Deprecate `--calculate-kv-scales` option (#37201 ) Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Michael Goin <mgoin64@gmail.com>	2026-03-17 19:57:24 +00:00

14989 Commits