biondizzle/vllm - vllm

Author	SHA1	Message	Date
inkcherry	500f26e6d3	[Bugfix] fix DP-aware routing in OpenAI API requests (#29002 ) Signed-off-by: inkcherry <mingzhi.liu@amd.com>	2025-12-18 09:50:42 -08:00
sarathc-cerebras	28d15ab56b	adds jais 2 support (#30188 ) Signed-off-by: sarathc-cerebras <sarath.chandran@cerebras.net> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-12-18 15:46:58 +00:00
Lucas Wilkinson	30bb19a760	[BugFix] Partial revert of #29558 (DeepEP HT + PIECEWISE CG support) (#30910 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-12-17 23:50:15 -08:00
Yifan Qiao	11a89cf95c	[Fix][FlexAttention] return max logical block index to handle reused blocks (#30915 ) Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu>	2025-12-18 06:42:21 +00:00
Li, Jiang	e3ab93c896	[CPU] Refactor CPU fused MOE (#30531 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-12-18 14:36:49 +08:00
Zhengxu Chen	5f2f3fba1d	[compile] Fix CI for test_gpt2_cache_hit (#30902 ) Signed-off-by: zhxchen17 <zhxchen17@fb.com>	2025-12-17 20:22:23 -08:00
Nicolò Lucchesi	bc3700e0cd	[NIXL] Support P tensor-parallel-size > D tensor-parallel-size (#27274 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-12-18 11:53:30 +08:00
Micah Williamson	fd8afdf38d	[ROCm][CI] Reduce Flakiness For test_async_scheduling Using ROCM_ATTN With FP32 (#30811 ) Signed-off-by: Micah Williamson <micah.williamson@amd.com>	2025-12-18 10:27:37 +08:00
SungMinCho	a0b782f9cc	[Metrics] Model FLOPs Utilization estimation (#30738 ) Signed-off-by: SungMinCho <tjdals4565@gmail.com> Signed-off-by: Mark McLoughlin <markmc@redhat.com> Co-authored-by: Mark McLoughlin <markmc@redhat.com>	2025-12-18 01:40:51 +00:00
Isotr0py	74a1ac38b0	[v1] Add PrefixLM support to TritonAttention backend (#30386 )	2025-12-17 16:05:24 -08:00
Matthew Bonanni	7eb6cb6c18	[Attention] Update tests to remove deprecated env vars (#30563 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-12-17 09:49:59 -08:00
Nicolò Lucchesi	9ca8cb38fd	[CI][Bugfix] Fix flaky `tests/entrypoints/openai/test_audio.py::test_chat_streaming_audio` (#30878 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-12-17 18:49:56 +01:00
Jialin Ouyang	6e9dbcc50e	[Fix] uniform decode batch check (#30747 ) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>	2025-12-17 19:58:43 +08:00
Chauncey	9ad5b21710	[Refactor] [4/N] Move VLLM_SERVER_DEV endpoints into the serve directory (#30749 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-12-17 02:27:30 -08:00
Michael Goin	519ef9a911	[UX] Make `vllm bench serve` discover model by default and use --input-len (#30816 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-12-17 01:55:30 -08:00
Ye (Charlotte) Qi	a100152288	[Kernels][FI] Skip trtllm attention when num_kv_heads=1 (#30842 ) Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>	2025-12-17 01:54:21 -08:00
Xinyu Chen	3b1d440ede	CustomOp: grouped topk (#29575 ) Signed-off-by: Xinyu Chen <xinyu1.chen@intel.com>	2025-12-17 17:43:00 +08:00
Robin	20fda43151	[Bugfix][Frontend] Prevent IndexError in MiniMax M2 tool parser during streaming extraction (#30555 ) Signed-off-by: WangErXiao <863579016@qq.com>	2025-12-17 16:37:57 +08:00
Cyrus Leung	44d3b1df3d	[CI/Build] Fix compatibility between #30244 and #30396 (#30787 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-16 20:21:19 -08:00
Wentao Ye	b6ec077e05	[CI] Skip ci failure test (#30804 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-12-16 22:47:53 +00:00
Roger Wang	f5f51e5931	[Core][MM] Optimize encoder cache manager by operating with embeddings only (#30475 ) Signed-off-by: Roger Wang <hey@rogerw.io> Co-authored-by: Sun Kim <sunytokki@gmail.com>	2025-12-16 14:18:17 -08:00
Lucas Wilkinson	9fec0e13d5	[Attention] Cache attention metadata builds across hybrid KV-cache groups (#29627 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Co-authored-by: Stanislaw Wozniak <stw@zurich.ibm.com>	2025-12-16 17:10:16 -05:00
Wentao Ye	f21f5ea38c	[Refactor] Small refactor for group topk (#30562 ) Signed-off-by: yewentao256 <zhyanwentao@126.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2025-12-16 14:50:59 -05:00
Nicolò Lucchesi	ca702a14dc	[Frontend] Add `max-completion-token` option to transcription/translation endpoints (#30769 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-12-16 19:36:49 +00:00
Michael Goin	10ee1c64cf	[CI] Generalize gsm8k test args and add Qwen3-Next MTP B200 test (#30723 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-12-16 14:28:34 -05:00
Harry Mellor	af506fd76a	Fix instantiation of `HfHubHTTPError` in LoRA test (#30768 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-16 08:02:24 -08:00
Isotr0py	4de08ad698	[CI/Build] Skip broken ViT backend functionality test tempoarily (#30782 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-12-16 06:45:25 -08:00
Jee Jee Li	0e391e7570	[Bugfix] Fix RequestOutput miss lora_request (#30636 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-12-16 01:36:35 -08:00
Andrew Xia	0d0c929f23	[responsesAPI][8] input/output messages for ResponsesParser (#30158 ) Signed-off-by: Andrew Xia <axia@fb.com> Signed-off-by: Andrew Xia <axia@meta.com> Co-authored-by: Andrew Xia <axia@fb.com> Co-authored-by: Chauncey <chaunceyjiang@gmail.com>	2025-12-16 13:54:59 +08:00
jiangkuaixue123	b9ff4f2a8d	[feature] extend DBO to XBO (#30120 ) Signed-off-by: jiangkuaixue123 <jiangxiaozhou111@163.com> Co-authored-by: root <root@hk01dgx028.cm.cluster>	2025-12-16 00:04:01 -05:00
Boyuan Feng	c881db364e	improve lazy import test (#30733 ) Signed-off-by: Boyuan Feng <boyuan@meta.com>	2025-12-16 03:12:05 +00:00
Shanshan Shen	3bd9c49158	[CustomOp] Extract ApplyRotaryEmb as CustomOp and unify the dispatch logic (#29873 ) Signed-off-by: shen-shanshan <467638484@qq.com> Co-authored-by: gcanlin <canlinguosdu@gmail.com> Co-authored-by: TJian <tunjian.tan@embeddedllm.com>	2025-12-15 19:08:16 -08:00
penfree	bbd850e597	[Bugfix] fix streaming final output for non harmony (#30237 ) Signed-off-by: penfree <qiupengfei@baidu.com> Co-authored-by: penfree <qiupengfei@baidu.com>	2025-12-16 09:03:11 +08:00
Michael Goin	a450c64a30	[Bugfix] Fail instead of ignoring when CompilationConfig gets invalid args (#30708 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-12-15 20:18:02 +00:00
Cyrus Leung	ed586e7724	[Refactor] [3/N] Move tool parser tests and run on CPU (#30693 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-15 13:45:36 +00:00
Chauncey	2a1776b7ac	[Refactor] [2/N] Move tool parsers into the vLLM main directory (#30675 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2025-12-15 12:54:52 +00:00
wang.yuqi	4429d934de	[Model] Automatic conversion of TokenClassification model (#30666 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>	2025-12-15 08:13:00 +00:00
汪志鹏	1adeb3b84c	[New Model] BAGEL support (AR only) (#28439 ) Signed-off-by: princepride <wangzhipeng628@gmail.com> Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-12-15 14:58:23 +08:00
Wentao Ye	3778673ea8	[Feat] Refactor for `parallel_config` in `FusedMoEModularKernel` (#30282 ) Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2025-12-15 04:21:36 +00:00
Shanshan Shen	87b4d1557d	[CustomOp][MM] Extract MMEncoderAttention as CustomOp and replace the backend of QwenVisionAttention with it. (#30125 ) Signed-off-by: shen-shanshan <467638484@qq.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-12-15 11:13:32 +08:00
Wenqi Glantz	84e23d103d	additional protection for CVE-2025-62164 (#30649 ) Signed-off-by: Wenqi Glantz <wglantz@nvidia.com>	2025-12-15 03:07:10 +00:00
Or Ozeri	174e39ead7	CPU KV Offloading: Use more CUDA streams (#29013 ) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2025-12-14 23:50:45 +00:00
Chendi.Xue	ae2e503dda	[NIXL][BUG FIX] Fix a bug for PD with host_buffer after merging 29665 (#30420 ) Signed-off-by: Chendi Xue <chendi.xue@intel.com> Signed-off-by: Mark McLoughlin <markmc@redhat.com> Co-authored-by: Mark McLoughlin <markmc@redhat.com>	2025-12-14 15:38:28 +00:00
ElizaWszola	994acec0cc	[Bugfix] Fix fusion for VL models (#30244 ) Signed-off-by: ElizaWszola <ewszola@redhat.com>	2025-12-14 21:22:37 +08:00
Johannes F	060893654d	fix: Update json features supported by xGrammar (#30390 ) Signed-off-by: Johannes Flommersfeld <johannes.flommersfeld@tngtech.com> Signed-off-by: Johannes F <johannesflommersfeld@users.noreply.github.com> Co-authored-by: Johannes Flommersfeld <johannes.flommersfeld@tngtech.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-12-14 02:16:06 -08:00
Matthias Gehre	e9add129ad	[Bugfix] awq_gemm: fix argument order swap (#30364 ) Signed-off-by: Matthias Gehre <matthias.gehre@amd.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-12-14 18:15:37 +08:00
Lasha Koroshinadze	3a20450d31	Add AudioFlamingo3 model support (#30539 ) Signed-off-by: Lasha <26011196+lashahub@users.noreply.github.com> Signed-off-by: Lasha Koroshinadze <26011196+lashahub@users.noreply.github.com> Co-authored-by: Isotr0py <2037008807@qq.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-12-14 02:14:55 -08:00
Cyrus Leung	dcb31196da	[Chore] Remove redundant `RequestPrompt` (#30612 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-14 09:22:37 +00:00
Laith Sakka	f569c654e1	enable unbacked with aot_compile (#30462 ) Signed-off-by: Laith Sakka <lsakka@meta.com>	2025-12-14 08:14:06 +00:00
Kayvan Mivehnejad	29f7d97715	Improve parse_raw_prompt test cases for invalid input .v2 (#30512 ) Signed-off-by: Kayvan Mivehnejad <K.Mivehnejad@gmail.com>	2025-12-14 11:18:41 +08:00

4517 Commits