biondizzle/vllm

Author	SHA1	Message	Date
Matthew Bonanni	0308901975	[2/N][Attention] Fix pre-commit errors (#32052 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-01-10 00:27:15 +00:00
Lucas Kabela	aaf4b70aae	[Misc][BE] Type coverage for vllm/compilation [2/3] (#31744 )	2026-01-09 18:30:38 -05:00
Nick Hill	3adffd5b90	[Misc] Enable async scheduling by default with spec decoding (#31998 ) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-01-09 23:09:34 +00:00
zhrrr	97ba96fbe9	[perf][async] support non cpu sync get logprob tensors for spec (#31336 ) Signed-off-by: izhuhaoran <izhuhaoran@qq.com> Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com>	2026-01-09 21:24:51 +00:00
Chendi.Xue	94578127a4	[NIXL] refine decoder side post process for heterogeneous BlockSize and kv_layout (#30275 )	2026-01-09 21:22:19 +00:00
Matthew Bonanni	2612ba9285	[1/N][Attention] Restructure attention: move files (#31916 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-01-09 13:10:24 -08:00
Andrew Xia	1f8b7c536b	[responsesAPI] fix incomplete_messages for simple/parsable context (#31836 ) Signed-off-by: Andrew Xia <axia@fb.com> Co-authored-by: Andrew Xia <axia@fb.com>	2026-01-09 21:00:57 +00:00
Lucas Wilkinson	0a0aa07747	[Quant] Make static quant support all group shapes (#30833 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2026-01-09 12:49:27 -08:00
jiahanc	f9e2a75a1e	[fix] add cutedsl to global sf (#32001 ) Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>	2026-01-09 12:03:02 -08:00
Runkai Tao	a4d5d663e2	Add unpermute-aware fused MoE path and small-batch fallback (#29354 ) Signed-off-by: Runkai Tao <rt572@physics.rutgers.edu> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2026-01-09 12:58:39 -07:00
Jeremy Teboul	657e9c0e18	[Fix] Introduce audio channels spec (#31595 ) Signed-off-by: Jeremy Teboul <jeremyte@meta.com>	2026-01-09 19:34:51 +00:00
Wentao Ye	308feab33f	[Perf] Optimize cutlass moe problem size calculation, 5.3% E2E Throughput improvement, 2.2% TTFT improvement (#31830 ) Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2026-01-09 11:13:43 -08:00
Wentao Ye	28ae32a5d3	[Refactor] Remove numpy split in async scheduling (#32034 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-01-09 19:09:02 +00:00
Andrew Xia	f32c629eb4	[Frontend][gpt-oss] Allow system message to overwrite model identity (#31737 ) Signed-off-by: lacora <hyelacora@gmail.com> Signed-off-by: Andrew Xia <axia@fb.com> Co-authored-by: lacora <hyelacora@gmail.com> Co-authored-by: Andrew Xia <axia@fb.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2026-01-09 14:03:57 -05:00
Yifan Qiao	cd4a95e3aa	[Feat][Core] Support multiple KV cache groups in Hybrid KV Coordinator (#31707 ) Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu>	2026-01-09 10:53:20 -08:00
Michael Goin	d5ec6c056f	[UX] Add vLLM model inspection view (#29450 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2026-01-09 10:12:35 -07:00
Shanshan Shen	08d954f036	[Doc] Add developer guide for CustomOp (#30886 ) Signed-off-by: shen-shanshan <467638484@qq.com>	2026-01-09 16:21:11 +00:00
Kevin Šuc	ac9f9330e6	Rename --exclude-log-deltas to --enable-log-deltas (#32020 ) Signed-off-by: Catacomba <kevinsuc16@gmail.com>	2026-01-09 15:30:40 +00:00
Isotr0py	2d0c5b630e	[Doc] Remove hardcoded Whisper in example openai translation client (#32027 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-01-09 14:44:52 +00:00
Michael Goin	34cd32fe30	[Perf][Kernel] Fused SiLU+Mul+Quant kernel for NVFP4 cutlass_moe (#31832 ) Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Michael Goin <mgoin64@gmail.com>	2026-01-09 07:40:33 -07:00
R3hankhan	8e27663b6a	[CPU] Add head sizes 80 and 112 with vec16 fallback (#31968 ) Signed-off-by: Rehan Khan <Rehan.Khan7@ibm.com>	2026-01-09 22:14:46 +08:00
maang	7cdf7e2fe0	[Model] Remove redundant None check in DeepSeekOCR image input processing (#32016 ) Signed-off-by: maang <maang_h@163.com>	2026-01-09 06:12:44 -08:00
Adolfo Victoria	bbf80ede43	Fix type error (#31999 ) Signed-off-by: Adolfo Victoria <adolfokarim@gmail.com> Co-authored-by: Adolfo Victoria <adovi@meta.com>	2026-01-09 22:03:32 +08:00
inkcherry	4505849b30	[ROCm][PD] add moriio kv connector. (#29304 ) Signed-off-by: inkcherry <mingzhi.liu@amd.com>	2026-01-09 14:01:57 +00:00
Roger Wang	db07433ce5	[Misc] Skip hashing kwargs if value is `None` (#32025 ) Signed-off-by: Roger Wang <hey@rogerw.io>	2026-01-09 13:20:59 +00:00
Andreas Karatzas	e02706d2d2	[ROCm][CI][V1] Fix `nixl_connector` test failure and achieve CUDA parity in `test_async_scheduling` (#32000 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-01-09 20:48:32 +08:00
Sophie du Couédic	b474782ad7	[Feature][Benchmarks] Custom dataset: read output length from dataset (#31881 ) Signed-off-by: Sophie du Couédic <sop@zurich.ibm.com>	2026-01-09 12:40:59 +00:00
Bofeng Xue	55212c1404	fix: remove duplicate engine_id check in nixl_connector (#31948 ) Signed-off-by: Bofeng BF1 Xue <xuebf1@Lenovo.com> Co-authored-by: Bofeng BF1 Xue <xuebf1@Lenovo.com>	2026-01-09 12:13:17 +00:00
Xin Yang	e7b68f4d6c	[Bugfix] Fix Triton FusedMoE LoRA (#30585 ) Signed-off-by: Xin Yang <xyangx@amazon.com>	2026-01-09 11:46:59 +00:00
vllmellm	1a19e9cd87	[Bugfix][ROCm]Fix Qwen3-Next-80B-A3B-Thinking inference and optimize non-standard block size (544) support under rocm_atten (#31380 ) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>	2026-01-09 19:28:02 +08:00
Cyrus Leung	c8ed39b9dd	[Model] Reorganize pooling layers (#31973 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-09 11:02:14 +00:00
Andreas Karatzas	020732800c	[Bugfix] Fix OpenAPI schema test failures (#31921 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-01-09 10:56:20 +00:00
Alex Brooks	dc77cb7129	[Bugfix] Fix Var Length Batched Padding in Granite Speech (#31906 ) Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>	2026-01-09 10:28:43 +00:00
gnovack	bde38c11df	fix lora moe sharding when rank < max_lora_rank (#31994 ) Signed-off-by: gnovack <gnovack@amazon.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2026-01-09 14:43:25 +08:00
Xin Yang	707b240d7e	[Bugfix] Fix FusedMoE LoRA w2_output_size (#31949 ) Signed-off-by: Xin Yang <xyangx@amazon.com>	2026-01-09 00:54:05 -05:00
Nick Hill	29ce48221c	[Cleanup] Remove obsolete spec decoding compatibility logic (#32003 ) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-01-09 05:44:18 +00:00
TJian	7a05d2dc65	[CI] [ROCm] Fix `tests/entrypoints/test_grpc_server.py` on ROCm (#31970 ) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>	2026-01-09 12:54:20 +08:00
Divakar Verma	a1648c4045	[ROCm][CI] Fix test_token_classification.py::test_bert_models (#31993 ) Signed-off-by: Divakar Verma <divakar.verma@amd.com>	2026-01-09 04:04:33 +00:00
RioS	e2d49ec2a4	[Bugfix] missing tokens occur in harmony streaming (#30437 ) Signed-off-by: RioS <aa248424@gmail.com> Signed-off-by: Ri0S <aa248424@gmail.com> Co-authored-by: Chauncey <chaunceyjiang@gmail.com>	2026-01-09 03:59:34 +00:00
Xin Yang	8413868dab	[Bugfix] Fix typo in FusedMoE LoRA reshape comment (#31992 ) Signed-off-by: Xin Yang <xyangx@amazon.com>	2026-01-08 18:46:05 -08:00
zhrrr	8ff4a99566	[Async][Feat] support apply penalty or bad_words for async + spec (#30495 ) Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com> Signed-off-by: izhuhaoran <izhuhaoran@qq.com> Signed-off-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: Nick Hill <nickhill123@gmail.com>	2026-01-09 02:31:50 +00:00
daniel-salib	a4ec0c5595	[Frontend] Add MCP tool streaming support to Responses API (#31761 ) Signed-off-by: Daniel Salib <danielsalib@meta.com>	2026-01-09 09:19:34 +08:00
Robert Shaw	0fa8dd24d2	[Bugfix] Fix Typo from NVFP4 Refactor (#31977 ) Signed-off-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <robshaw@redhat.com>	2026-01-08 16:18:50 -08:00
Max Hu	6ebe34d6fa	[Feature] Add iteration level logging and enhance nvtx marker (#31193 ) Signed-off-by: Max Hu <maxhu@nvidia.com> Signed-off-by: Max Hu <hyoung2991@gmail.com> Co-authored-by: Max Hu <maxhu@nvidia.com>	2026-01-09 00:13:39 +00:00
Nick Hill	11cec296dd	[BugFix] Add spec-decode-incompatible request param validation (#31982 ) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-01-09 00:08:21 +00:00
Robert Shaw	5825bbc1f7	[Quantization] Deprecate Long Tail of Schemes (#31688 ) Signed-off-by: Robert Shaw <robshaw@redhat.com> Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2026-01-08 19:07:45 -05:00
Yongye Zhu	d62cfe546d	[MoE Refactoring][Bugfix]Wrap WNA16 Triton kernel into mk and change compressed tensor kernel selection (#31752 ) Signed-off-by: Robert Shaw <robshaw@redhat.com> Signed-off-by: Yongye Zhu <zyy1102000@gmail.com> Co-authored-by: Robert Shaw <robshaw@redhat.com>	2026-01-08 19:01:30 -05:00
Lucas Wilkinson	6cdf015c3c	[Misc] Fix `Current vLLM config is not set.` warnings, assert to avoid issues in the future (#31747 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2026-01-08 15:20:49 -08:00
Dipika Sikka	5d3b6097ad	[Compressed-Tensors] Simplify NVFP4 Conditions, enable marlin support for NVFP4A16 MoEs (#30881 )	2026-01-08 17:45:17 -05:00
bnellnm	e74698c27a	[Misc][Refactor] Add FusedMoERouter object (#30519 ) Signed-off-by: Bill Nell <bnell@redhat.com>	2026-01-08 20:52:55 +00:00


13302 Commits