biondizzle/vllm

Author	SHA1	Message	Date
Or Ozeri	a663b218ae	[Misc] Add orozery to CODEOWNERS (core, kv_transfer, kv_offload) (#33227 ) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2026-01-29 04:24:20 +00:00
Michael Goin	1bd47d6e5a	[Bugfix] Register fp8 cutlass_group_gemm as supported for only SM90+SM100 (#33285 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2026-01-28 18:40:59 -08:00
Michael Goin	141cd43967	[UX] Remove noisy CT UnquantizedLinearMethod warn (#33273 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2026-01-28 16:09:30 -08:00
Nick Hill	6bf3b46d78	[ModelRunner V2] Misc code simplification and cleanup (#33266 ) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-01-28 14:41:23 -08:00
Matthew Bonanni	77c4f45c6c	[7/N][Attention][Docs] Add documentation for attention backends (#32477 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-01-28 17:20:22 -05:00
Michael Goin	ca1969186d	[UX] Enable nested configs in config yaml files (#33193 )	2026-01-28 16:54:25 -05:00
Gregory Shtrasberg	ab597c869a	[Bugfix] Add missing encoder only guard for do_kv_cache_update (#33269 ) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2026-01-28 21:25:07 +00:00
Angela Yi	4197168ea5	[ez] Remove checks for torch version <= 2.8 (#33209 ) Signed-off-by: angelayi <yiangela7@gmail.com>	2026-01-28 16:03:56 -05:00
Rohan Potdar	59bcc5b6f2	Use aiter triton fused_add_rmsnorm_pad for gpt-oss (#30976 ) Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>	2026-01-28 20:47:47 +00:00
Wentao Ye	3e440786af	[Feature] Fully support for async scheduling + PP, 30.8% E2E throughput improvement, 31.8% TPOT improvement (#32618 ) Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Signed-off-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: Nick Hill <nickhill123@gmail.com>	2026-01-28 20:30:32 +00:00
Kevin H. Luu	8bdd3979d8	[CI] Change GPU key to device key for B200 test (#33275 ) Signed-off-by: khluu <khluu000@gmail.com>	2026-01-28 19:14:29 +00:00
Wentao Ye	c4e744dbd4	[Perf] Optimize `moe_permute` for CUTLASS FP8 (#32892 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-01-28 10:15:24 -08:00
Nicolò Lucchesi	8ebf372e9d	[CI] Whisper tests `enforce_eager=False` (#33098 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-01-28 09:36:56 -08:00
cwazai	f210f0b7b1	[lora/moe] Avoid extra intermediate buffer & Python slicing in expand phase when split_k == 1 (#32774 ) Signed-off-by: 陈建华 <1647430658@qq.com>	2026-01-29 00:22:45 +08:00
Bin Bao	392c5af4fe	[Benchmark] Add startup benchmarking to buildkite run (#33183 ) Signed-off-by: Bin Bao <binbao@meta.com>	2026-01-28 16:03:07 +00:00
Robert Shaw	af9b69f977	[Quantization][Deprecation] Remove Marlin 24 (#32688 ) Signed-off-by: Robert Shaw <robshaw@redhat.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-01-28 15:54:59 +00:00
Chauncey	8e5e40daf4	[Misc] Provide a DeepSeek ReasoningParser with thinking enabled by default (#33221 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2026-01-28 21:16:53 +08:00
Or Ozeri	2e8de86777	Revert "Enable Cross layers KV cache layout at NIXL Connector (#30207 )" (#33241 ) Signed-off-by: Or Ozeri <oro@il.ibm.com> Co-authored-by: Kevin H. Luu <khluu000@gmail.com>	2026-01-28 04:36:00 -08:00
Robert Shaw	247d1a32ea	[Quantization][Deprecation] Remove BitBlas (#32683 ) Signed-off-by: Robert Shaw <robshaw@redhat.com> Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Robert Shaw <robshaw@redhat.com>	2026-01-28 11:06:22 +00:00
Kevin H. Luu	ecb4f82209	[CI] Update job dependency syntax for Intel and AMD jobs (#33240 ) Signed-off-by: khluu <khluu000@gmail.com>	2026-01-28 01:33:59 -08:00
Kevin H. Luu	5914090765	[CI] Update job dependency for hardware and CPU jobs (#33237 ) Signed-off-by: khluu <khluu000@gmail.com>	2026-01-28 01:10:05 -08:00
Harry Mellor	f1acbd68c5	[CI] Enable mypy import following for `vllm/compilation` (#33199 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-01-28 08:59:54 +00:00
Yan Ma	9581185d51	[XPU]disable test_acceptance_length UT (#33226 )	2026-01-28 15:24:13 +08:00
Maryam Tahhan	2dd359f953	[Docs] Simplify CPU x86 Docker build documentation (#33071 ) Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>	2026-01-28 06:37:09 +00:00
Gregory Shtrasberg	22ad649501	[ROCm] Enabling forward_includes_kv_cache on ROCm MHA backends (#33106 ) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2026-01-28 14:36:14 +08:00
ramos	36d450e3b8	Adds FunAudioChat multimodal audio model support (#2 ) (#33058 ) Signed-off-by: ramos <49182011+nemoramo@users.noreply.github.com> Signed-off-by: mayufeng <mayufeng@example.com> Co-authored-by: mayufeng <mayufeng@example.com>	2026-01-28 05:18:09 +00:00
22quinn	a2b877df6c	[Bugfix] Lazy import NgramProposer in GPU model runner (#32821 ) Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>	2026-01-27 21:07:16 -08:00
Harry Mellor	35fb0b8613	Don't use `min_pixels`/`max_pixels` from Qwen2VL's processor (#33208 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-01-28 05:02:08 +00:00
Harry Mellor	2eb673a088	Add flake8-implicit-str-concat rules to Ruff (#33191 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-01-28 04:56:10 +00:00
Jeffrey Wang	a97b5e206d	Relax protobuf library version constraints (#33202 ) Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>	2026-01-28 04:15:53 +00:00
Micah Williamson	911b51b69f	[ROCm][CI] Add TORCH_NCCL_BLOCKING_WAIT For Distributed Tests (A100) (#32891 ) Signed-off-by: Micah Williamson <micah.williamson@amd.com>	2026-01-28 11:32:31 +08:00
Xinan Miao	604e3b87e8	[Feature]: Container image WORKDIR consistency (#33159 ) Signed-off-by: SouthWest7 <am1ao@qq.com> Co-authored-by: SouthWest7 <am1ao@qq.com>	2026-01-28 11:06:48 +08:00
Harry Mellor	706f123b23	[Docs] Use definition lists for CLI reference docs (#33186 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Ashwin Phadke <23502062+ashwin-phadke@users.noreply.github.com>	2026-01-28 02:22:48 +00:00
Angela Yi	fb7abfc1d0	[docs] Improve tlparse section (#33211 ) Signed-off-by: angelayi <yiangela7@gmail.com>	2026-01-28 02:07:37 +00:00
Kevin H. Luu	5d3d6e44e8	[CI] minor fixes to pipeline generator and tests (#33151 ) Signed-off-by: khluu <khluu000@gmail.com>	2026-01-27 17:04:02 -08:00
Woosuk Kwon	46ec6d71c7	[Model Runner V2] Use a different stream for grammar bitmask h2d copy (#33059 ) Signed-off-by: Woosuk Kwon <woosuk@inferact.ai> Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Co-authored-by: Nick Hill <nhill@redhat.com>	2026-01-27 16:37:43 -08:00
Matthew Bonanni	e82fa448c4	Add attention benchmarking tools (#26835 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Claude <noreply@anthropic.com>	2026-01-28 00:09:20 +00:00
Richard Zou	d9aa39a3bb	[torch.compile] Speed up MOE handling in forward_context (#33184 ) Signed-off-by: Richard Zou <zou3519@gmail.com>	2026-01-27 15:17:54 -08:00
Wentao Ye	3a6d5cbefd	[Perf] Optimize dcp allocate tensor (#33102 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-01-27 17:24:41 -05:00
linhaifeng	f5d7049cc1	[Bugfix] Fix display error (inconsistent with context) (#33020 ) Signed-off-by: linhaifeng <1371675203@qq.com>	2026-01-27 20:33:29 +00:00
Alexei-V-Ivanov-AMD	3c3c547ce0	Enabling "2 node" distributed tests in the AMD CI pipeline. (#32719 ) Signed-off-by: DCCS-4560 <alivanov@chi-mi325x-pod1-112.ord.vultr.cpe.ice.amd.com> Co-authored-by: DCCS-4560 <alivanov@chi-mi325x-pod1-112.ord.vultr.cpe.ice.amd.com> Co-authored-by: TJian <tunjian.tan@embeddedllm.com>	2026-01-27 19:13:21 +00:00
Matthew Bonanni	1cbccb6dba	[Attention] Use `has_flashinfer` helper (#33177 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-01-27 18:33:17 +00:00
Iris	bd92089d33	feature: support eagle3 for HunyuanVL & Hunyuan (#33035 ) Signed-off-by: irisliu10 <601012173@qq.com> Signed-off-by: Iris <38269816+irisliu10@users.noreply.github.com>	2026-01-27 17:55:48 +00:00
Karan Bansal	a6760f1525	[Doc] Improve serve parameter documentation with meaningful defaults (#33082 ) Signed-off-by: Karan Bansal <karanb192@gmail.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-01-27 09:19:37 -08:00
IriKa	66e601ef79	Support compress-tensors with nvfp4 or fp8 weights and modelopt with nvfp4 weights on Turing (#33076 ) Signed-off-by: IriKa Qiu <qiujie.jq@gmail.com>	2026-01-27 11:04:05 -05:00
Nick Hill	0cd259b2d8	[BugFix] Fix P/D with non-MoE DP (#33037 ) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-01-27 08:03:47 -08:00
danielafrimi	83fb2d09e8	Support heterogeneous NemotronHPuzzle model (#32549 ) Signed-off-by: <dafrimi@nvidia.com> Signed-off-by: Daniel Afrimi <dafrimi@nvidia.com> Signed-off-by: root <dafrimi@nvidia.com>	2026-01-27 10:55:54 -05:00
danisereb	f3a5ee705f	[LoRA][Spec Decode] Support LoRA for Nemotron-H MTP models (#32265 ) Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2026-01-27 07:53:26 -08:00
wang.yuqi	7cbbca9aaa	[Frontend] Cleanup api server (#33158 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io> Signed-off-by: wang.yuqi <noooop@126.com>	2026-01-27 15:18:10 +00:00
omkhalil	5ec44056f7	[Metrics][MFU] Fix UnembedMetrics FLOP overcounting for prefill (#33045) Fix UnembedMetrics to correctly count FLOPs for the unembedding (LM head) layer. The bug: UnembedMetrics used total_num_tokens(), which counts all tokens in the batch, for projection FLOPs, but in the autoregressive use case vocab projections run only on the last token. Co-authored-by: Omar Mohamed Khalil <omarkhalil@meta.com>	2026-01-27 15:16:49 +00:00
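
The overcounting described in commit 5ec44056f7 can be illustrated with a small arithmetic sketch. This is not the code from that change: the function name, batch shape, and model sizes below are hypothetical, and it assumes each request produces exactly one logits row per step (no prompt logprobs, no speculative decoding).

```python
def unembed_flops(num_projected_tokens: int, hidden_size: int, vocab_size: int) -> int:
    """FLOPs of a dense [tokens, hidden] x [hidden, vocab] matmul.

    Counts 2 FLOPs per multiply-add. Hypothetical helper, not a vLLM API.
    """
    return 2 * num_projected_tokens * hidden_size * vocab_size


# Hypothetical batch: two prefill requests plus one decode request.
prompt_lens = [512, 128]      # tokens scheduled for each prefill request
num_decode_requests = 1       # each decode request contributes one token

# Illustrative model sizes (assumptions, not taken from the commit).
hidden_size, vocab_size = 4096, 32000

# Every token in the batch: what the buggy count was based on.
total_tokens = sum(prompt_lens) + num_decode_requests

# Only the last token of each request reaches the LM head.
logit_tokens = len(prompt_lens) + num_decode_requests

overcounted = unembed_flops(total_tokens, hidden_size, vocab_size)
corrected = unembed_flops(logit_tokens, hidden_size, vocab_size)

print(f"tokens in batch: {total_tokens}, tokens projected: {logit_tokens}")
print(f"overcounted FLOPs: {overcounted:,}")
print(f"corrected FLOPs:   {corrected:,}")
```

Under these assumptions the gap grows with prompt length, which is why the miscount matters mostly for prefill-heavy batches.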

13773 Commits