biondizzle/vllm - vllm - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
vllmellm	1a19e9cd87	[Bugfix][ROCm]Fix Qwen3-Next-80B-A3B-Thinking inference and optimize non-standard block size (544) support under rocm_atten (#31380) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>	2026-01-09 19:28:02 +08:00
Cyrus Leung	c8ed39b9dd	[Model] Reorganize pooling layers (#31973) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-09 11:02:14 +00:00
Andreas Karatzas	020732800c	[Bugfix] Fix OpenAPI schema test failures (#31921) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-01-09 10:56:20 +00:00
Alex Brooks	dc77cb7129	[Bugfix] Fix Var Length Batched Padding in Granite Speech (#31906) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>	2026-01-09 10:28:43 +00:00
gnovack	bde38c11df	fix lora moe sharding when rank < max_lora_rank (#31994) Signed-off-by: gnovack <gnovack@amazon.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2026-01-09 14:43:25 +08:00
Xin Yang	707b240d7e	[Bugfix] Fix FusedMoE LoRA w2_output_size (#31949) Signed-off-by: Xin Yang <xyangx@amazon.com>	2026-01-09 00:54:05 -05:00
Nick Hill	29ce48221c	[Cleanup] Remove obsolete spec decoding compatibility logic (#32003) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-01-09 05:44:18 +00:00
TJian	7a05d2dc65	[CI] [ROCm] Fix `tests/entrypoints/test_grpc_server.py` on ROCm (#31970) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>	2026-01-09 12:54:20 +08:00
Divakar Verma	a1648c4045	[ROCm][CI] Fix test_token_classification.py::test_bert_models (#31993) Signed-off-by: Divakar Verma <divakar.verma@amd.com>	2026-01-09 04:04:33 +00:00
RioS	e2d49ec2a4	[Bugfix] missing tokens occur in harmony streaming (#30437) Signed-off-by: RioS <aa248424@gmail.com> Signed-off-by: Ri0S <aa248424@gmail.com> Co-authored-by: Chauncey <chaunceyjiang@gmail.com>	2026-01-09 03:59:34 +00:00
Xin Yang	8413868dab	[Bugfix] Fix typo in FusedMoE LoRA reshape comment (#31992) Signed-off-by: Xin Yang <xyangx@amazon.com>	2026-01-08 18:46:05 -08:00
zhrrr	8ff4a99566	[Async][Feat] support apply penalty or bad_words for async + spec (#30495) Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com> Signed-off-by: izhuhaoran <izhuhaoran@qq.com> Signed-off-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: Nick Hill <nickhill123@gmail.com>	2026-01-09 02:31:50 +00:00
daniel-salib	a4ec0c5595	[Frontend] Add MCP tool streaming support to Responses API (#31761) Signed-off-by: Daniel Salib <danielsalib@meta.com>	2026-01-09 09:19:34 +08:00
Robert Shaw	0fa8dd24d2	[Bugfix] Fix Typo from NVFP4 Refactor (#31977) Signed-off-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <robshaw@redhat.com>	2026-01-08 16:18:50 -08:00
Max Hu	6ebe34d6fa	[Feature] Add iteration level logging and enhance nvtx marker (#31193) Signed-off-by: Max Hu <maxhu@nvidia.com> Signed-off-by: Max Hu <hyoung2991@gmail.com> Co-authored-by: Max Hu <maxhu@nvidia.com>	2026-01-09 00:13:39 +00:00
Nick Hill	11cec296dd	[BugFix] Add spec-decode-incompatible request param validation (#31982) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-01-09 00:08:21 +00:00
Robert Shaw	5825bbc1f7	[Quantization] Deprecate Long Tail of Schemes (#31688) Signed-off-by: Robert Shaw <robshaw@redhat.com> Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2026-01-08 19:07:45 -05:00
Yongye Zhu	d62cfe546d	[MoE Refactoring][Bugfix]Wrap WNA16 Triton kernel into mk and change compressed tensor kernel selection (#31752) Signed-off-by: Robert Shaw <robshaw@redhat.com> Signed-off-by: Yongye Zhu <zyy1102000@gmail.com> Co-authored-by: Robert Shaw <robshaw@redhat.com>	2026-01-08 19:01:30 -05:00
Lucas Wilkinson	6cdf015c3c	[Misc] Fix `Current vLLM config is not set.` warnings, assert to avoid issues in the future (#31747) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2026-01-08 15:20:49 -08:00
Dipika Sikka	5d3b6097ad	[Compressed-Tensors] Simplify NVFP4 Conditions, enable marlin support for NVFP4A16 MoEs (#30881)	2026-01-08 17:45:17 -05:00
bnellnm	e74698c27a	[Misc][Refactor] Add FusedMoERouter object (#30519) Signed-off-by: Bill Nell <bnell@redhat.com>	2026-01-08 20:52:55 +00:00
Cyrus Leung	aa125ecf0e	[Frontend] Improve error message (#31987) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-08 20:07:03 +00:00
Lucas Kabela	f16bfbe5bc	[Documentation][torch.compile] Add documentation for torch.compile + multimodal encoders (#31627) Signed-off-by: Lucas Kabela <lucaskabela@meta.com>	2026-01-08 14:33:24 -05:00
Michael Goin	87e07a6b46	Revert "feat(moe): Add is_act_and_mul=False support for Triton MoE kernels" (#31978)	2026-01-08 11:31:53 -08:00
Woosuk Kwon	7508243249	[Model Runner V2] Simplify BlockTables with UVA (#31965) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2026-01-08 10:24:26 -08:00
Nicolò Lucchesi	83e1c76dbe	[CI][ROCm] Fix NIXL tests on ROCm (#31728) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-01-09 01:34:43 +08:00
Nishidha Panpaliya	a563866b48	Fix ijson build for Power. (#31702) Signed-off-by: Nishidha Panpaliya <nishidha.panpaliya@partner.ibm.com>	2026-01-08 17:12:33 +00:00
Nick Hill	a3d909ad2b	[Misc] Tidy up some spec decode logic in GPUModelRunner (#31591) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-01-08 09:10:07 -08:00
Jee Jee Li	49568d5cf9	[Doc] Improve MM models LoRA notes (#31979) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2026-01-08 08:55:22 -08:00
danisereb	b8112c1d85	[Bugfix] Fix vllm serve failure with Nemotron Nano V3 FP8 (#31960) Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>	2026-01-08 16:08:37 +00:00
Chauncey	eaba8ece77	[Bugfix]: Fix Step3ReasoningParser missing is_reasoning_end_streaming (#31969) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2026-01-08 15:28:13 +00:00
yxing-bj	fe86be66c5	[Model] Support IQuestCoder model (#31575) Signed-off-by: yxing <yxing@iquestlab.com>	2026-01-08 14:42:57 +00:00
Chauncey	1da3a5441a	[Docs]: update claude code url (#31971) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2026-01-08 14:04:55 +00:00
TJian	72c068b8e0	[CI] [Bugfix] Fix unbounded variable in `run-multi-node-test.sh` (#31967) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>	2026-01-08 05:42:01 -08:00
Mary	7645bc524b	[OpenAI] Fix tool_choice=required streaming when output has trailing extra data (#31610) Signed-off-by: maylikenoother <ogedengbemary19@gmail.com> Co-authored-by: Chauncey <chaunceyjiang@gmail.com>	2026-01-08 21:01:42 +08:00
Ce Zhao	1123a87892	[Model] Enable LoRA support for Pixtral (#31724) Signed-off-by: <> Signed-off-by: 赵策 <alcor@zhaocedeMacBook-Air.local> Signed-off-by: 赵策 <alcor@mac.mynetworksettings.com> Co-authored-by: 赵策 <alcor@mac.mynetworksettings.com>	2026-01-08 05:00:57 -08:00
tianshu-Michael-yu	03fd76c570	[Model] Add LFM2-VL model support (#31758) Signed-off-by: Tianshu Yu <tianshuyu.formal@gmail.com> Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Roger Wang <hey@rogerw.io> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2026-01-08 05:00:27 -08:00
Bijaya Dangol	59d260f5e4	[Model] Add Grok-2 (#31847) Signed-off-by: dangoldbj <dangoldbj23@gmail.com>	2026-01-08 04:59:48 -08:00
Patrick von Platen	18d4e481d0	[Voxtral] Fix speech transcription api (#31388) Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Signed-off-by: bk-201 <joy25810@foxmail.com> Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Signed-off-by: prashanth058 <prashanth.dannamaneni@uipath.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: bk-201 <joy25810@foxmail.com> Co-authored-by: prashanth058 <prashanth.dannamaneni@uipath.com> Co-authored-by: Anexdeus <5142168@mail.ru> Co-authored-by: Julien Denize <40604584+juliendenize@users.noreply.github.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk> Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>	2026-01-08 18:34:19 +08:00
Isotr0py	2972a05473	[MM Encoder]: Make MMEncoderAttention's `scale` takes effect properly (#31950) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-01-08 02:33:48 -08:00
Cyrus Leung	5576227bc1	[Model] Standardize common vision encoders (#31947) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-08 02:33:16 -08:00
Cyrus Leung	d1b6fe007f	[Chore] Further cleanup pooler (#31951) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-08 02:16:21 -08:00
omer-dayan	04a49669d1	RayLLM Bugfix - Preserve obj store URL for multi engine_config creation (#30803) Signed-off-by: Omer Dayan <omdayan@nvidia.com> Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Isotr0py <2037008807@qq.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-01-08 10:00:25 +00:00
BingjiaWang	96fcd3c267	[Misc] Support qwen3-next lora (#31719)	2026-01-08 09:27:50 +00:00
DevByteAI	1f214290d6	fix(compile): apply partition wrapper when loading AOT cached functions (#31536) Signed-off-by: Devbyteai <abud6673@gmail.com> Signed-off-by: DevByteAI <161969603+devbyteai@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2026-01-08 17:27:26 +08:00
Ryan Rock	8cbdc7eb94	[CI/Build] Enable test_kv_cache_events_dp for AMD (#31834) Signed-off-by: Ryan Rock <ryan.rock@amd.com>	2026-01-08 09:00:24 +00:00
Lumosis	b634e619bb	Decouple page_size_bytes calculation in AttentionSpec for TPU/RPA Compatibility. (#31635) Signed-off-by: Lihao Ran <imlihao.ran@gmail.com> Signed-off-by: Lumosis <30372757+Lumosis@users.noreply.github.com>	2026-01-08 09:00:07 +00:00
Isotr0py	eac3b96ec0	[Models] Allow converting Qwen3-VL into Reranker model (#31890) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-01-08 08:10:15 +00:00
Zhiwei	573a1d1119	[ROCm]Skip test_torchao.py::test_pre_quantized_model on CDNA3 arch (#31905) Signed-off-by: ZhiweiYan-96 <zhiwei.yan@amd.com>	2026-01-08 15:47:44 +08:00
Shang Wang	33156f56e0	[docker] A follow-up patch to fix #30913: `[docker] install cuda13 version of lmcache and nixl` (#31775) Signed-off-by: Shang Wang <shangw@nvidia.com>	2026-01-07 23:47:02 -08:00

13773 Commits