biondizzle/vllm - vllm

Author	SHA1	Message	Date
Yong Hoon Shin	021143561f	[ROCm] Add missing gemm_a8w8_blockscale import (#28378 ) Signed-off-by: Yong Hoon Shin <yhshin@meta.com>	2025-11-10 23:13:36 +00:00
Lucas Wilkinson	6dec9f6109	[BugFix] Fix DeepGEMM over-allocating workspace (#28254 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-11-10 17:01:17 -05:00
Sage Moore	40d33264c6	[Bugfix][EPLB] Disabled shared expert overlap when EPLB is enabled (#28377 ) Signed-off-by: Sage Moore <sage@neuralmagic.com> Signed-off-by: Sage Moore <sagemoore@utexas.edu> Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-11-10 20:39:19 +00:00
jiahanc	34553b9d27	[Performance] Support FP8 flashinfer TRTLLM MOE on Qwen3 and Qwen-3next (#27492 ) Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>	2025-11-10 12:34:57 -05:00
Varun Sundar Rabindranath	b039bfda8f	[Bugfix] Fix persistent_masked_m_silu_mul_quant tests (#28366 ) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-11-10 09:21:52 -08:00
Cyrus Leung	d0e186c16f	[V0 Deprecation] Remove unused `context_len` and `seq_len` from M-RoPE (#28395 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-11 00:30:06 +08:00
vllmellm	f080a83511	[RFC][ROCm][AITER] Keep all AITER kernels in `_aiter_ops` class like `_custom_ops` and `_ipex_ops` (#24490 ) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-11-10 08:20:53 -08:00
zejunchen-zejun	b06b9470ca	[Rocm][fused_moe][fp4] view weight to torch.float4_e2m1fn_x2 when running aiter fused moe for fp4 model (#27474 ) Signed-off-by: zejunchen-zejun <zejun.chen@amd.com>	2025-11-10 10:38:56 -05:00
Ferrebo	912744d066	[Fix] optimize visual token mask with caching and multi-token support (#28374 ) Signed-off-by: Ferrebo <itachi971009@gmail.com> Signed-off-by: kebo01 <kebo01@baidu.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-11-10 13:23:49 +00:00
Yu Jiaqi	15be507c86	[bugfix] fix siglip batch text output error (#28365 ) Signed-off-by: piood <2477084691@qq.com>	2025-11-10 21:21:15 +08:00
Xiake Sun	03fa4d3fb3	[Hardware][AMD][Model] Add Triton MoE tuning support and optimized configs for Qwen3 omni for MI308X (#28373 ) Signed-off-by: Xiake Sun <xiake.sun@amd.com> Signed-off-by: Xiake Sun <xisun@amd.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-11-10 04:53:40 +00:00
Jiangyun Zhu	c4768dcf47	[Kernel] Fix fused_gdn_gating (#28343 ) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>	2025-11-09 14:26:35 -07:00
Jiangyun Zhu	7ae5a5fb11	[Misc] Add some comments in qwen3-next (#28267 ) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>	2025-11-08 23:59:24 -08:00
Yong Hoon Shin	de2b78305f	[ROCm] Add env to enable/disable aiter triton gemm (#28321 ) Signed-off-by: Yong Hoon Shin <yhshin@meta.com>	2025-11-08 22:27:00 -08:00
Mohammad Miadh Angkad	404d7a9d14	[Performance][gpt-oss] Revert gpt-oss max cudagraph size to 1024 (#28345 ) Signed-off-by: Mohammad Miadh Angkad <MAngkad.BSDSBA2027@aim.edu>	2025-11-08 15:50:10 -07:00
Robert Shaw	26990d25dc	[Bugfix] Update device name for H200 detection (#28349 ) Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2025-11-08 19:01:11 +00:00
Isotr0py	934a9c3b79	[Model] Consolidate Deepseek-MoE implementation with DeepSeek-v2 (#28101 ) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>	2025-11-08 05:01:27 +00:00
Michael Goin	0852527647	[Perf][DeepSeek] Add sigmoid+bias fusion to fused_grouped_topk from TRTLLM (#28124 ) Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-11-07 18:20:55 -08:00
Kunshang Ji	1aaecda078	[XPU] Enable Expert parallel for MoE models (#28263 ) Signed-off-by: Yan Ma <yan.ma@intel.com> Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2025-11-08 00:33:11 +00:00
Pavani Majety	72b1c2ae2c	[Bugfix] Use latency MOE backend as default for Flashinfer and other misc fixes (#27439 ) Signed-off-by: Pavani Majety <pmajety@nvidia.com>	2025-11-07 04:18:39 -08:00
Lukas Geiger	e0919f331d	[Core][MM] Add mechanism to configure multimodal fields which should stay on CPU (#28168 ) Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>	2025-11-07 12:14:29 +00:00
Kevin H. Luu	8e19d470af	[fix] Revert "fixing mm placeholder replacement issue with gemma3" (#28285 ) Signed-off-by: Kevin H. Luu <khluu000@gmail.com>	2025-11-07 12:09:09 +00:00
Mengqing Cao	1958bda9b4	[Misc][Model][Refactor] Pass the prefix into Linear layers (#28259 ) Signed-off-by: MengqingCao <cmq0113@163.com>	2025-11-07 19:38:38 +08:00
smit kadvani	11fd69dd54	[amd][gptoss] Perf gain because of block alignment (#28024 ) Signed-off-by: Smit Kadvani <smit.kadvani@gmail.com> Co-authored-by: Smit Shaileshbhai Kadvani <kadvani@meta.com>	2025-11-07 05:27:42 +00:00
Harry Mellor	c0a4b95d64	Fix issues from #28242 (#28257 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-07 04:23:17 +00:00
Lucas Kabela	4bf56c79cc	[Multimodal][torch.compile] Add compilation config field for turning off ViT/MM compile (#28242 ) Signed-off-by: Lucas Kabela <lucaskabela@meta.com>	2025-11-07 00:16:03 +00:00
Aleksandr Malyshev	449de9001a	[ROCm] triton fp8 kernel (#27058 ) Signed-off-by: Aleksandr Malyshev <maleksan@amd.com> Co-authored-by: Aleksandr Malyshev <maleksan@amd.com> Co-authored-by: Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com>	2025-11-06 14:46:44 -05:00
Julien Denize	7a8375f8a0	Add llama 4 scaling support (#28145 ) Signed-off-by: Julien Denize <julien.denize@mistral.ai>	2025-11-06 18:55:17 +00:00
Eric Yue	0370679ce9	[Kernel][Model] Tune fused_moe Triton configs for MiniMax-M2 on H100 (#28200 ) Signed-off-by: minatoaquaMK2 <jiacheng.yue@foxmail.com>	2025-11-06 07:29:46 -08:00
xiangze-arm	c757a15f0f	[CPU]Improve cpu fused moe perf (#27244 ) Signed-off-by: Zhang Xiangze <Xiangze.Zhang@arm.com>	2025-11-06 11:04:18 +00:00
Seungduk Kim	201dc98acc	Fix hard-coded parameter name in gemma3n.py (#27946 ) Signed-off-by: Seungduk Kim <seungduk.kim@yanolja.com> Signed-off-by: Biswa Panda <biswa.panda@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Biswa Panda <biswa.panda@gmail.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>	2025-11-05 23:07:36 -08:00
Xiaozhu Meng	e31946f86e	[flashinfer] fix FI all2all with FI cutlass moe (#28166 ) Signed-off-by: Xiaozhu <mxz297@gmail.com>	2025-11-06 05:52:16 +00:00
Isotr0py	43ecd0a900	[Chore] Clean up deepseek v2/v3 config copy (#28055 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-11-06 03:46:30 +00:00
Wentao Ye	d71af5f502	[Feature] Enable TP + EP `shared_experts` overlap with router, 3.7% E2E performance improvement (#28164 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-11-05 17:21:08 -08:00
Vadim Gimpelson	b6a248bdd7	[PERF] Decouple projections from GDN custom op. Attempt 2 (#28083 ) Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>	2025-11-05 17:01:12 -08:00
wang.yuqi	802748bddb	[Bugfix] Fix Qwen3-Reranker-8B load (#28117 ) Signed-off-by: wang.yuqi <noooop@126.com>	2025-11-05 18:33:50 +00:00
Paul Zhang	faedbb4d4f	[Feature] Extend batch invariant torch.compile to B200 (#27856 ) Signed-off-by: PaulZhang12 <paulzhan@fb.com>	2025-11-05 10:04:49 -08:00
Chen Zhang	c765f0b443	[FlashInfer] Avoid FlashInfer block_size 16 + head_size 256 on blackwell (#27994 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2025-11-05 09:25:32 -08:00
Jiangyun Zhu	c18f88c6ca	[Kernel] Fuse computation of g and beta for Gated Delta Net (#28095 ) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>	2025-11-05 09:14:55 -08:00
Isotr0py	3f5a4b6473	[Bugfix] Validate custom logits processor xargs for online serving (#27560 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-11-05 16:53:33 +00:00
Ilya Markov	e50c454672	[BugFix] Support EP/DP + EPLB with MTP (#25311 ) Signed-off-by: ilmarkov <markovilya197@gmail.com> Signed-off-by: Sage Moore <sage@neuralmagic.com> Co-authored-by: Sage Moore <sage@neuralmagic.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>	2025-11-05 15:22:17 +00:00
Frost Mitchell	6e97eccf5d	[XPU] Enable custom routing functions in IPEX for Llama4 (#28004 ) Signed-off-by: frost-intel <frost.mitchell@intel.com>	2025-11-05 13:39:57 +00:00
amirkl94	6b7a81185d	Bugfix: Cutlass FP8 FusedMoE bad scaling factors (#27255 ) Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-11-05 06:06:06 -05:00
Alex Brooks	b7cbc25416	[Model, Core] Support Granite Speech & LoRA for STT (#24455 )	2025-11-05 08:33:48 +01:00
Isotr0py	0ff05e3770	[Bugfix] Fix encoder-only model support for transformers backend (#28021 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-04 22:24:41 -08:00
wangxiyuan	428bc7bf1c	[V0 deprecation] Remove VLLM_USE_V1 usage in most modules (#27955 ) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-11-04 20:51:16 -08:00
Kunshang Ji	18b39828d9	[XPU] Add gpt-oss model support for Intel GPU (#27786 ) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2025-11-05 02:17:23 +00:00
tou	4ea62b77f5	[Qwen3-Next] MOE configs for A100-SXM4-80GB TP4 TP8 (#27740 )	2025-11-05 09:25:09 +08:00
Vadim Gimpelson	d4e547bb7e	Revert "[PERF] Decouple projections from GDN custom op" (#28080 ) Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>	2025-11-04 15:58:23 -08:00
Aleksandr Malyshev	2d977a7a9e	[ROCm] gemm_a16w16 upstreaming (#26969 ) Signed-off-by: Aleksandr Malyshev <maleksan@amd.com> Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>	2025-11-04 16:01:00 -05:00

3211 Commits