biondizzle/vllm

Author	SHA1	Message	Date
Fynn Schmitt-Ulms	9433acb8df	[Spec Decode] Add hidden states extraction system (#33736 ) Signed-off-by: Fynn Schmitt-Ulms <fschmitt@redhat.com>	2026-03-02 14:29:09 -05:00
Richard Zou	d1a6e96d9e	[torch.compile] Improve cold and warm start compile tests (#35709 ) Signed-off-by: Richard Zou <zou3519@gmail.com>	2026-03-02 19:27:06 +00:00
Isotr0py	cc0d565f40	[CI/Build] Enable Qwen3.5 tests on CI (#35763 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-03-02 17:43:53 +00:00
Turner Jabbour	4034c3d32e	[Core] Move test utility to test file (#35672 ) Signed-off-by: Turner Jabbour <doubleujabbour@gmail.com>	2026-03-02 10:56:03 -05:00
Runkai Tao	ada4f4fadd	[Fix Bug]`num_active_loras` always equals to zero (#34119 ) Signed-off-by: Runkai Tao <rt572@physics.rutgers.edu> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2026-03-02 23:17:46 +08:00
Andreas Karatzas	ec27b36b4b	[CI] Defining extended V1 e2e + engine tests (#35580 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-02 08:10:54 +00:00
EdalatiAli	cb21972a97	[Kernel] Integrate SM100 MXFP8 blockscaled grouped MM and quant kernels (#34448 ) Signed-off-by: EdalatiAli <aliedalati@cohere.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2026-03-01 23:31:19 -08:00
Andreas Karatzas	c34963f138	[ROCm][CI] Disable skinny GEMMs in language model standard tests to fix non-determinism (#35152 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-02 15:04:18 +08:00
Hongxia Yang	f26650d649	[ROCm] add amd-quark package in requirements for rocm to use quantized models (#35658 ) Signed-off-by: Hongxia Yang <hongxiay.yang@amd.com> Co-authored-by: Hongxia Yang <hongxiay.yang@amd.com>	2026-03-02 06:02:43 +00:00
Jesse Cai	a60985b07e	Fix deprecated v1 config tests (#35327 ) Signed-off-by: Jesse Cai <jessecai@fb.com>	2026-03-01 20:32:03 -05:00
haosdent	6290470843	[Bugfix] Fix dtype mismatch in RMSNormGated.forward_native() during torch.compile (#35256 ) Signed-off-by: haosdent <haosdent@gmail.com>	2026-03-01 15:14:46 -05:00
Asaf Gardin	bbf81f9a92	[Mamba1] - Kernel Level Chunk Alignment for Prefix Caching (#34798 ) Signed-off-by: Josephasafg <ajgard7@gmail.com>	2026-03-01 20:40:23 +08:00
Ryan Rock	87d319c52f	[AMD][CI] Support Triton attention with ExampleConnector (#34931 ) Signed-off-by: Ryan Rock <ryan.rock@amd.com>	2026-03-01 09:58:07 +02:00
gnovack	3ecd0bf9fc	Add TMA support to fused_moe_lora kernel (#32195 ) Signed-off-by: gnovack <gnovack@amazon.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2026-03-01 10:55:25 +08:00
Martin Vit	95a395dbec	[Bugfix] Fix Anthropic API base64 image handling in Messages endpoint (#35557 ) Signed-off-by: Martin Vit <martin@voipmonitor.org>	2026-02-28 20:57:08 +00:00
Wentao Ye	e113a30113	[Deprecation] Deprecate code in 0.17 as scheduled (#35441 ) Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2026-02-28 17:32:37 +00:00
Augusto Yao	8e75d88554	add io_process_plugin for sparse embedding (#34214 ) Signed-off-by: augusto.yjh <augusto.yjh@antgroup.com> Signed-off-by: Augusto Yao <augusto.yjh@antgroup.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2026-02-28 09:16:37 +00:00
Hashem Hashemi	7600642eae	Add padding support to wvSplitK solution for skinny GEMMs (#33762 ) Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com>	2026-02-28 09:02:05 +00:00
Andreas Karatzas	1e69c04887	[ROCm][CI] Parametrize vision score tests across attention backends with per-backend tolerances (#35571 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-02-28 08:59:26 +00:00
Chauncey	06254d4cbb	[CI] add trainer_send_weights for MockWeightTransferEngine (#35589 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2026-02-28 06:47:43 +00:00
Andreas Karatzas	f5d1281c9d	[ROCm][CI] Expose tests to AMD production CI and fix amdsmi heap corruption (#35071 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-02-28 13:57:31 +08:00
Umut Polat	1d5ab5d603	[Bugfix] Move chat completion response_format validation to Pydantic model_validator (#35510 ) Signed-off-by: umut-polat <52835619+umut-polat@users.noreply.github.com>	2026-02-27 21:26:19 -08:00
Itay Alroy	dea268336f	[1/N] Elastic EP Milestone 2 (#34861 ) Signed-off-by: Yongji Wu <wuyongji317@gmail.com> Signed-off-by: Itay Alroy <ialroy@nvidia.com> Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com> Signed-off-by: Ron Tourgeman <rtourgeman@nvidia.com> Co-authored-by: Yongji Wu <wuyongji317@gmail.com> Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com> Co-authored-by: Ron Tourgeman <rtourgeman@nvidia.com>	2026-02-28 04:46:42 +00:00
Aaron Hao	2ce6f3cf67	[Feat][RL][2/2] Native Weight Syncing API: IPC (#34171 ) Signed-off-by: hao-aaron <ahao@anyscale.com> Signed-off-by: Aaron Hao <ahao@anyscale.com> Signed-off-by: ahao-anyscale <ahao@anyscale.com>	2026-02-27 13:45:21 -07:00
Lucas Wilkinson	1d532f9d8f	[DP] Only use DP padding when cudagraphs are actually used (#34102 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2026-02-27 15:14:31 -05:00
Zhengxu Chen	29b35477b0	[compile] Fix caching error over pytree slice node. (#35308 ) Signed-off-by: zhxchen17 <zhxchen17@fb.com>	2026-02-27 19:34:16 +00:00
Huamin Li	157722da75	[perf] Use pinned memory for async H2D transfer in do_mamba_copy_block (#35480 ) Signed-off-by: Huamin Li <3ericli@gmail.com>	2026-02-28 01:50:37 +08:00
fort726	905d76b51d	[Model] Add huggingface skt/A.X-K1 model (#32407 ) Signed-off-by: Sungwan(Alex) Kim <sw0726.kim@sktelecom.com> Signed-off-by: fort726 <38447663+fort726@users.noreply.github.com> Co-authored-by: Sungwan(Alex) Kim <sw0726.kim@sktelecom.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: TJian <tunjian.tan@embeddedllm.com>	2026-02-27 09:26:02 -08:00
Yanan Cao	9098ce690c	[Kernel] [Helion] [7/N] Use HOP to represent Helion Kernel call to enable fx tracing and pattern matching (#34390 ) Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>	2026-02-27 09:21:35 -08:00
Yueqian Lin	e8249378e4	[Bugfix] Fix check_interleaved_audio_video false positive for batched non-interleaved requests (#35487 ) Signed-off-by: linyueqian <linyueqian@outlook.com> Co-authored-by: Roger Wang <hey@rogerw.io>	2026-02-27 06:48:25 -08:00
Max Hu	9c3fe9936b	Flashinfer cuDNN backend for Qwen3 VL ViT attention (#34580 ) Signed-off-by: Max Hu <maxhu@nvidia.com> Signed-off-by: Max Hu <hyoung2991@gmail.com> Co-authored-by: Max Hu <maxhu@nvidia.com> Co-authored-by: Shang Wang <shangw@nvidia.com>	2026-02-27 20:20:23 +08:00
Umut Polat	b66a74649e	[Bugfix] Replace assert with ValueError for response_format validation in completions endpoint (#35456 ) Signed-off-by: umut-polat <52835619+umut-polat@users.noreply.github.com>	2026-02-27 08:01:06 +00:00
gnovack	a532c83849	use 'max_active_experts' for moe lora input size (#33197 ) Signed-off-by: gnovack <gnovack@amazon.com>	2026-02-27 03:50:43 +00:00
Nicolò Lucchesi	cabdaa7619	[Misc] Move `GPUModelRunner.prepare_kernel_block_sizes` to utils (#35400 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-02-27 11:42:51 +08:00
daniel-salib	d43048ce05	[Bugfix] Emit reasoning_part events in simple streaming path for Resp… (#35184 ) Signed-off-by: Daniel Salib <danielsalib@meta.com>	2026-02-27 09:49:06 +08:00
Michael Goin	4fec53cfcb	[CI] Actually run tests/kernels/quantization/test_block_fp8.py in CI (#34274 )	2026-02-26 17:58:03 -07:00
Andrii Skliar	56a6371706	[Update] Use FlashInfer fast_decode_plan directly instead of replication (#34687 ) Signed-off-by: Andrii <askliar@nvidia.com> Co-authored-by: Andrii <askliar@nvidia.com>	2026-02-26 16:31:43 -08:00
Tyler Michael Smith	eb19955c37	[WideEP] Remove pplx all2all backend (#33724 ) Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-26 14:30:10 -08:00
Lucia Fang	0f2f24c8b2	[Bugfix] Fix MessageQueue connect_ip for cross-node data parallelism (#35429 ) Signed-off-by: Lu Fang <fanglu@fb.com> Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>	2026-02-26 22:08:16 +00:00
不做了睡大觉	967572dd5f	fix(reasoning): Qwen3ReasoningParser returns truncated output as reasoning (#35230 ) Signed-off-by: stakeswky <stakeswky@users.noreply.github.com> Co-authored-by: stakeswky <stakeswky@users.noreply.github.com>	2026-02-26 20:30:45 +00:00
Lucas Wilkinson	5e58bdc711	[Bugfix] Remove erroneous lower bound on LoRA vocab size constraint (#35354 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2026-02-26 18:44:50 +00:00
Yiliu Dong	d940607629	[Core] Support `min_tokens` with speculative decoding (#32642 ) Signed-off-by: qianlihuang <yiliu.dong@qq.com> Co-authored-by: qianlihuang <yiliu.dong@qq.com>	2026-02-26 12:31:28 -05:00
Wentao Ye	99c7892c5b	[Perf] Optimize maxsim scores computation for pooling models, 13.9% E2E throughput improvement (#35330 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-02-26 17:14:54 +00:00
Jakub Zakrzewski	111d869069	[Model] Add nvidia/llama-nemotron-embed-vl-1b-v2 multimodal embedding model (#35297 ) Signed-off-by: Jakub Zakrzewski <jzakrzewski@nvidia.com>	2026-02-26 14:17:17 +00:00
Cyrus Leung	845ee348ef	[Misc] Standardize handling of `mm_processor_kwargs.size` (#35284 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-26 13:05:46 +00:00
Yueqian Lin	c0615a296d	[Bugfix] Fix Qwen2.5-Omni and Qwen3-Omni mixed-modality embed regression (#35368 ) Signed-off-by: linyueqian <linyueqian@outlook.com>	2026-02-26 11:58:23 +00:00
Jiangyun Zhu	ab87f85231	[Model] Ring 2.5 (#35102 ) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>	2026-02-26 02:17:11 -08:00
Krish Gupta	3827c8c55a	[Test] Add tests for n parameter in chat completions API (#35283 ) Signed-off-by: KrxGu <krishom70@gmail.com>	2026-02-26 09:14:07 +00:00
Chaojun Zhang	9f9a675b23	[XPU][8/N] Fix kernel bugs in XPU LoRA and MOE LORA (#34115 ) Signed-off-by: chzhang <chaojun.zhang@intel.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>	2026-02-26 15:46:44 +08:00
Cyrus Leung	d3a51da92a	[Benchmark] Simplify SLA scan (#35306 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-25 22:35:41 -08:00

4625 Commits