biondizzle/vllm - vllm - Gitea

Author	SHA1	Message	Date
amirkl94	e01ff5c070	Bugfix: Pass router logits dtype in nemotron shared experts (#32669 ) Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com>	2026-01-29 09:36:34 +00:00
whx	08b1195e62	[PluggableLayer][2/N] Apply PluggableLayer to linear layers (#33152 ) Signed-off-by: whx-sjtu <2952154980@qq.com>	2026-01-29 16:53:15 +08:00
Ilya Markov	53fc166402	[BugFix] Fix EPLB fail for MoeFP4 model with Marlin backend (#33262 ) Signed-off-by: ilmarkov <markovilya197@gmail.com>	2026-01-29 16:52:11 +08:00
Didier Durand	31b25f6516	[Doc]: fixing multiple typos in diverse files (#33256 ) Signed-off-by: Didier Durand <durand.didier@gmail.com> Signed-off-by: Didier Durand <2927957+didier-durand@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-01-29 16:52:03 +08:00
wang.yuqi	abb34ac43a	[Bugfix] Fix Qwen3-VL-Reranker load. (#33298 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io> Signed-off-by: wang.yuqi <noooop@126.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-01-29 08:42:53 +00:00
Kiersten Stokes	9e138cb01d	[Misc][Build] Lazy load cv2 in nemotron_parse.py (#33189 ) Signed-off-by: kiersten-stokes <kierstenstokes@gmail.com>	2026-01-29 06:55:50 +00:00
Cyrus Leung	51550179fc	[Refactor] Define MM data parser in processing info instead of processor itself (#33260 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-29 13:55:17 +08:00
Angela Yi	07ea184f00	[ez] Delete more torch version checks <= 2.8 (#33288 ) Signed-off-by: angelayi <yiangela7@gmail.com>	2026-01-29 05:28:46 +00:00
Michael Goin	141cd43967	[UX] Remove noisy CT UnquantizedLinearMethod warn (#33273 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2026-01-28 16:09:30 -08:00
Angela Yi	4197168ea5	[ez] Remove checks for torch version <= 2.8 (#33209 ) Signed-off-by: angelayi <yiangela7@gmail.com>	2026-01-28 16:03:56 -05:00
Rohan Potdar	59bcc5b6f2	Use aiter triton fused_add_rmsnorm_pad for gpt-oss (#30976 ) Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>	2026-01-28 20:47:47 +00:00
Robert Shaw	af9b69f977	[Quantization][Deprecation] Remove Marlin 24 (#32688 ) Signed-off-by: Robert Shaw <robshaw@redhat.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-01-28 15:54:59 +00:00
Robert Shaw	247d1a32ea	[Quantization][Deprecation] Remove BitBlas (#32683 ) Signed-off-by: Robert Shaw <robshaw@redhat.com> Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Robert Shaw <robshaw@redhat.com>	2026-01-28 11:06:22 +00:00
Harry Mellor	f1acbd68c5	[CI] Enable mypy import following for `vllm/compilation` (#33199 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-01-28 08:59:54 +00:00
ramos	36d450e3b8	Adds FunAudioChat multimodal audio model support (#2 ) (#33058 ) Signed-off-by: ramos <49182011+nemoramo@users.noreply.github.com> Signed-off-by: mayufeng <mayufeng@example.com> Co-authored-by: mayufeng <mayufeng@example.com>	2026-01-28 05:18:09 +00:00
Harry Mellor	35fb0b8613	Don't use `min_pixels`/`max_pixels` from Qwen2VL's processor (#33208 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-01-28 05:02:08 +00:00
Harry Mellor	2eb673a088	Add flake8-implicit-str-concat rules to Ruff (#33191 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-01-28 04:56:10 +00:00
Richard Zou	d9aa39a3bb	[torch.compile] Speed up MOE handling in forward_context (#33184 ) Signed-off-by: Richard Zou <zou3519@gmail.com>	2026-01-27 15:17:54 -08:00
Matthew Bonanni	1cbccb6dba	[Attention] Use `has_flashinfer` helper (#33177 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-01-27 18:33:17 +00:00
Iris	bd92089d33	feature: support eagle3 for HunyuanVL & Hunyuan (#33035 ) Signed-off-by: irisliu10 <601012173@qq.com> Signed-off-by: Iris <38269816+irisliu10@users.noreply.github.com>	2026-01-27 17:55:48 +00:00
IriKa	66e601ef79	Support compress-tensors with nvfp4 or fp8 weights and modelopt with nvfp4 weights on Turing (#33076 ) Signed-off-by: IriKa Qiu <qiujie.jq@gmail.com>	2026-01-27 11:04:05 -05:00
danielafrimi	83fb2d09e8	Support heterogeneous NemotronHPuzzle model (#32549 ) Signed-off-by: <dafrimi@nvidia.com> Signed-off-by: Daniel Afrimi <dafrimi@nvidia.com> Signed-off-by: root <dafrimi@nvidia.com>	2026-01-27 10:55:54 -05:00
danisereb	f3a5ee705f	[LoRA][Spec Decode] Support LoRA for Nemotron-H MTP models (#32265 ) Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2026-01-27 07:53:26 -08:00
Matthew Bonanni	a608b4c6c2	[5/N][Attention] Finish eliminating `vllm/attention` folder (#32064 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-01-27 10:02:51 -05:00
Harry Mellor	14385c80fc	Fix weight mapping test for Transformers v5 (#33162 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-01-27 12:30:14 +00:00
Lifan Shen	da8d0c441a	[AMD][QWEN3-NEXT] FP8 Tunings (#32042 ) Signed-off-by: Lifan Shen <lifans@meta.com>	2026-01-27 09:34:13 +00:00
Roger Wang	b539f988e1	[Models] Kimi-K2.5 (#33131 ) Signed-off-by: wanglinian <wanglinian@stu.pku.edu.cn> Signed-off-by: wangln19 <96399074+wangln19@users.noreply.github.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: youkaichao <youkaichao@gmail.com> Signed-off-by: Roger Wang <hey@rogerw.io> Co-authored-by: wanglinian <wanglinian@stu.pku.edu.cn> Co-authored-by: wangln19 <96399074+wangln19@users.noreply.github.com> Co-authored-by: Zaida Zhou <58739961+zhouzaida@users.noreply.github.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: youkaichao <youkaichao@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-01-27 14:50:31 +08:00
Robert Shaw	5a93b9162b	[MoE Refactor] Integrate Naive Prepare Finalize into MK (#32567 ) Signed-off-by: Robert Shaw <robshaw@redhat.com> Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com> Co-authored-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: amirkl94 <203507526+amirkl94@users.noreply.github.com>	2026-01-27 01:28:02 +00:00
XiongfeiWei	510ed1e8d3	[Bugfix][TPU] Return a Default fp8 MoE Backend (#32908 ) Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com> Signed-off-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2026-01-26 18:46:11 -05:00
Pengchao Wang	8caffd92df	[Bugfix][MXFP4] Call `trtllm_fp4_block_scale_moe` with kwargs (#33104 ) Signed-off-by: Pengchao Wang <wpc@fb.com>	2026-01-26 15:13:18 -08:00
Wentao Ye	8f987883cb	[Refactor] Remove unused `_moe_permute` function (#33108 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-01-26 16:06:45 -05:00
Robert Shaw	43a013c3a2	[Bugfix] Fix Dtypes for Pynccl Wrapper (#33030 ) Signed-off-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <robshaw@redhat.com>	2026-01-26 20:09:32 +00:00
Cyrus Leung	c25dbee40d	[Model] Bump transformers version for test registry (#33100 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-26 18:53:22 +00:00
Nicolò Lucchesi	19ab0f7ce5	[Bugfix] Fix Voxtral streaming slot_mapping (#33073 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-01-26 10:40:40 -08:00
danielafrimi	67fe677c53	[FIX] Always support TP > 4 for FP4 Gemm (#31099 ) Signed-off-by: dafrimi <dafrimi@nvidia.com> Co-authored-by: root <root@gpu-51.slurm-workers-slurm.slurm.svc.cluster.local>	2026-01-26 11:04:20 -07:00
Andy Lo	d56afd45fd	Remove unused logic in `models/mistral.py` (#33095 ) Signed-off-by: Andy Lo <andy@mistral.ai>	2026-01-26 09:01:52 -08:00
Pleaplusone	be6931ee27	[ROCm][Bugfix] Fix ptpc scale load issue for fused shared expert path in deepseek mtp (#33018 ) Signed-off-by: ganyi <ygan@amd.com>	2026-01-26 23:19:04 +08:00
Yuxuan Zhang	bb17e8f11c	[GLM-OCR] GLM-OCR with MTP Support (#33005 ) Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-01-26 06:24:43 -08:00
Cyrus Leung	dcd80206b7	[Chore] Update type annotation of `input_ids` in model forward (#33063 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-26 06:02:10 -08:00
danisereb	f4a0921c9c	[Performance] Tune Mamba selective scan kernel for B200 (#32873 ) Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2026-01-26 05:56:54 -08:00
VihaanThat	208c56256f	[Feature] Add LoRA support for Gemma3 vision components (#32764 )	2026-01-26 13:56:40 +00:00
Itay Etelis	6ca2c91b96	[Model] Use mm_position to compute mrope positions for Qwen3-Omni (#33010 ) Signed-off-by: Itay Etelis <itay.etelis@ibm.com> Co-authored-by: Itay Etelis <itay.etelis@ibm.com>	2026-01-26 13:48:07 +00:00
Cyrus Leung	11b556878b	[Refactor] Use data parser for matching data items to multi-modal UUIDs (#32955 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-26 15:00:28 +08:00
ltd0924	105d104576	[StepVL] support close img patch (#32923 ) Signed-off-by: luotingdan <luotingdan@stepfun.com> Signed-off-by: ltd0924 <32387785+ltd0924@users.noreply.github.com> Co-authored-by: luotingdan <luotingdan@stepfun.com>	2026-01-25 20:56:39 -08:00
Lucas Wilkinson	566cdb6cfb	[CI] Fix MHA attention test failure (AttributeError when model_config is None in ViT attention backend) (#33033 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2026-01-25 19:49:53 -08:00
Andreas Karatzas	22aeb43007	[Bugfix][VLM] Fix transformers backend embed_multimodal for Qwen2.5-VL profiling (#32969 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-01-26 08:34:05 +08:00
Itay Etelis	a698e8e7ad	[Model] Use mm_position to compute mrope positions for Qwen2.5-Omni (#32772 ) Signed-off-by: Itay Etelis <itay.etelis@ibm.com> Co-authored-by: Itay Etelis <itay.etelis@ibm.com>	2026-01-25 20:15:53 +08:00
JJJYmmm	7e67df5570	[Bugfix] fix encoder cache hang in Qwen3VL (#32684 ) Signed-off-by: JJJYmmm <92386084+JJJYmmm@users.noreply.github.com> Signed-off-by: Roger Wang <hey@rogerw.io> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Roger Wang <hey@rogerw.io> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-01-25 05:17:31 +00:00
Roberto L. Castro	fcb9df99bd	[Perf][Kernel] Optimize FP4 quantization kernels (SM100F) (#32520 ) Signed-off-by: LopezCastroRoberto <rocastro@redhat.com>	2026-01-24 18:45:27 -07:00
Lucas Wilkinson	da5e7b12be	[MLA] Fuse cat and quant for fp8 kv-cache (#32950 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2026-01-24 16:03:02 +00:00

3905 Commits