biondizzle/vllm - vllm

Author	SHA1	Message	Date
JJJYmmm	04a9e064db	[Bugfix] fix the ima issue of qwen-vit (#32687) Signed-off-by: JJJYmmm <92386084+JJJYmmm@users.noreply.github.com>	2026-01-20 17:21:25 +00:00
whx	4ca62a0dbd	[PluggableLayer][1/N] Define PluggableLayer (#32331) Signed-off-by: whx-sjtu <2952154980@qq.com>	2026-01-20 16:19:21 +00:00
Cyrus Leung	fda3f03eb2	[4/N] Initialize MM components in context managers (M-P) (#32663) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-20 14:06:32 +00:00
Chauncey	c4e5bdf61b	[Bugfix] Fix the fp8_mqa_logits dim mismatch (#32652) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2026-01-20 18:48:07 +08:00
Cyrus Leung	7f1bcd18ff	[3/N] Initialize MM components in context managers (I-L) (#32650) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-20 10:21:56 +00:00
Cyrus Leung	e1a34c3a5d	[2/N] Initialize MM components in context managers (E-H) (#32641) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-20 08:12:56 +00:00
vllmellm	148117ea2e	[Refactor] Make FP8 Linear Ops use kernel abstraction (#27814) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>	2026-01-20 14:48:20 +08:00
Cyrus Leung	b75e85dede	[1/N] Initialize MM components in context managers (A-D) (#32632) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-20 14:12:42 +08:00
Cyrus Leung	4753f3bf69	[Model] Use context managers for encoder- and LM-only mode (#32605) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-20 11:43:38 +08:00
Matthew Bonanni	1a1fc3bbc0	[Attention][MLA] Make FLASHINFER_MLA the default MLA backend on Blackwell, and TRTLLM the default prefill (#32615) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2026-01-19 18:41:34 -05:00
Tomas Ruiz	4a5299c93f	feat: spec decode with draft models (#24322) Signed-off-by: Tomas Ruiz <tomas.ruiz.te@gmail.com>	2026-01-19 16:05:46 -05:00
jiahanc	7350331718	[BugFix] Fix TRT-LLM NVFP4 DP/EP (#32349) Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com> Signed-off-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <robshaw@redhat.com>	2026-01-19 14:32:24 -05:00
Netanel Haber	cd3ac5b797	support dynamic resolution image encoding for Nemotron Nano VL (#32121) Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>	2026-01-19 18:15:58 +00:00
Jee Jee Li	2636d76257	[Misc] Remove unused ModelKeys (#32608) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2026-01-19 17:34:59 +00:00
danisereb	aa7f37ccfa	Add support for LoRA adapters in Nemotron-H models (#30802) Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>	2026-01-19 22:30:44 +08:00
Nicolò Lucchesi	74c583bc50	[Core] Whisper support `torch.compile` (#30385) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-01-19 10:02:31 +00:00
Yuxuan Zhang	71832ba71e	[GLM-4.7] GLM Model support for GLM-Lite (#31386) Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com> Signed-off-by: Yuxuan Zhang <2448370773@qq.com>	2026-01-19 01:18:38 -08:00
honglyua	976af2f314	[BugFix] Fix embed_input_ids argument error of QwenVLForConditionalGeneration (#32462)	2026-01-19 03:06:02 +00:00
Iryna Boiko	f5d1740030	[Bugfix] Add OOT backend option (#32471) Signed-off-by: Iryna Boiko <iboiko@habana.ai>	2026-01-18 22:20:39 +00:00
Andrey Khalyavin	ba29ab441e	Use the same memory for workspace13 and fused_output. (#31531) Signed-off-by: Andrey Khalyavin <halyavin@yandex-team.ru>	2026-01-18 19:14:22 +00:00
bnellnm	327a02d8db	[MoE Refactor] Separate Router into OO Classes (#30623) Signed-off-by: Bill Nell <bnell@redhat.com>	2026-01-18 11:40:49 -05:00
tjp_zju	2f03035a61	"refactor: refactor_repeated_interfaces" (#32486) Signed-off-by: tom-zju <tanjianpingzju1990@gmail.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2026-01-18 22:07:01 +08:00
Isotr0py	38bf2ffb21	[Bugfix] Fix GLM-ASR audio encoder RoPE dim (#32540) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-01-18 19:17:59 +08:00
Li Xie	c826c72a96	[Model] Support Step1 Model (#32511) Signed-off-by: xieli <xieli@stepfun.com>	2026-01-18 10:20:46 +00:00
Canlin Guo	fe36bf5e80	[Model] Remove the unnecessary dtype conversion in MiniCPM (#32523) Signed-off-by: gcanlin <canlinguosdu@gmail.com>	2026-01-18 08:07:28 +00:00
Robert Shaw	4a6af8813f	[MoE Refactor] Move Test Impl into Test Dirs (#32129) Signed-off-by: Robert Shaw <rshaw@neuralmagic.com> Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>	2026-01-18 12:16:59 +08:00
Kim Hee Su	1646fea672	[Model] Molmo2: Enable quantized weight mapping for vision backbone (#32385) Signed-off-by: kimheesu <wlskaka4@gmail.com>	2026-01-17 09:33:05 +00:00
Paul Pak	d3317bbba4	[Models] Lfm2Moe: minor name changes for resolving lora conflicts (#29063) Signed-off-by: Paul Pak <paulpak58@gmail.com>	2026-01-16 22:12:55 -08:00
Matthew Bonanni	2e7c89e708	Revert "[Attention][MLA] Make `FLASHINFER_MLA` the default MLA backen… (#32484) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-01-17 04:42:39 +00:00
Hashem Hashemi	7a1030431a	Atomics Reduce Counting Optimization for SplitK Skinny GEMMs. (#29843) Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com>	2026-01-16 11:45:04 -06:00
Cyrus Leung	180e981d56	[Chore] Replace swish with silu (#32459) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-16 08:22:45 +00:00
Rabi Mishra	b66b0d6abb	fix(rocm): Enable non-gated MoE (is_act_and_mul=False) support on ROCm (#32244) Signed-off-by: rabi <ramishra@redhat.com>	2026-01-16 15:31:10 +08:00
Hongxin Xu	03da3b52ef	[Bugfix] Refactor to support DP parallel in R3 (#32306) Signed-off-by: xhx1022 <1737006628@qq.com> Co-authored-by: arlenxu <arlenxu@tencent.com>	2026-01-16 15:13:58 +08:00
XiongfeiWei	73f635a75f	[Bug] Add TPU backend option (#32438) Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>	2026-01-16 05:17:12 +00:00
Kebe	5de6dd0662	[Bugfix] [DeepSeek-V3.2] fix sparse_attn_indexer padding (#32175) Signed-off-by: Kebe <mail@kebe7jun.com> Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>	2026-01-16 03:21:55 +00:00
ltd0924	709502558c	[Model] Add Step3vl 10b (#32329) Signed-off-by: luotingdan <luotingdan@stepfun.com> Signed-off-by: ltd0924 <32387785+ltd0924@users.noreply.github.com> Co-authored-by: luotingdan <luotingdan@stepfun.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Roger Wang <hey@rogerw.io>	2026-01-15 19:04:16 -08:00
Matthew Bonanni	bcf2333cd6	[CI] Fix LM Eval Large Models (H100) (#32423) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-01-16 00:52:49 +00:00
Michael Goin	83239ff19a	Add thread_n=64 support to Marlin MoE (#32360) Signed-off-by: mgoin <mgoin64@gmail.com>	2026-01-15 16:45:44 -08:00
TomerBN-Nvidia	c277fbdf31	[Feat] Support non-gated MoE with Marlin, NVFP4 CUTLASS, FP8, INT8, compressed-tensors (#32257) Signed-off-by: Tomer Natan <tbarnatan@computelab-frontend-8.nvidia.com> Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: Tomer Natan <tbarnatan@computelab-frontend-8.nvidia.com> Co-authored-by: mgoin <mgoin64@gmail.com> Co-authored-by: Tomer Natan <tbarnatan@ipp1-1429.ipp1a1.colossus.nvidia.com>	2026-01-15 16:15:05 -08:00
Yongye Zhu	31c29257c8	[MoE Refactor][17/N] Apply Refactor to Bf16 (#31827) Signed-off-by: Yongye Zhu <zyy1102000@gmail.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2026-01-15 12:53:40 -08:00
Aleksandr Malyshev	8c11001ba2	[ROCM] DSfp4 mla projection gemms weight dynamic quantization (#32238) Signed-off-by: Aleksandr Malyshev <maleksan@amd.com> Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>	2026-01-15 14:13:08 -06:00
Lucas Wilkinson	c36ba69bda	[BugFix] Fix `assert x_s.shape[-1] == x_q.shape[-1] // group_shape[1]` in Blackwell Quantized MoE Test (#32362) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-01-15 10:19:12 -08:00
Dipika Sikka	361dfdc9d8	[Quant] Support MXFP4 W4A16 for compressed-tensors MoE models (#32285) Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2026-01-15 07:25:55 -08:00
Matthew Bonanni	8ebfacaa75	[Attention][MLA] Make `FLASHINFER_MLA` the default MLA backend on Blackwell, and TRTLLM the default prefill (#32339) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2026-01-15 09:49:57 -05:00
brian033	b89275d018	[ROCm] Improve error handling while loading quantized model on gfx120… (#31715) Signed-off-by: brian033 <85883730+brian033@users.noreply.github.com> Co-authored-by: TJian <tunjian.tan@embeddedllm.com>	2026-01-15 04:16:00 -08:00
Cyrus Leung	cdba4c74b3	[Model] Avoid token selection in SigLIP pooling head (#32389) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-15 17:01:59 +08:00
Lucas Wilkinson	2c9b4cf5bf	[BugFix] Fix DeepSeek-V3.1 + DeepGEMM incompatible scale shapes (#32361) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Co-authored-by: Eldar Kurtić <8884008+eldarkurtic@users.noreply.github.com>	2026-01-15 06:32:22 +00:00
rasmith	3c2685645e	[CI][AMD][Quantization][BugFix] Fix fp8 max in quant_utils.py and update test_fp8_quant.::test_static_fp8_quant_group_2d to use correct fp8 dtype and adjust atol/rtol (#32201) Signed-off-by: Randall Smith <ransmith@amd.com>	2026-01-15 05:04:34 +00:00
Cyrus Leung	9ea07b41da	[1/N] Reorganize multimodal processing code (#32327) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-14 15:25:31 +00:00
Roger Wang	b8199f6049	[Model] Re-implement Qwen3Omni Audio Encoder (#32167) Signed-off-by: Roger Wang <hey@rogerw.io>	2026-01-14 15:40:30 +08:00

3905 Commits