biondizzle/vllm - vllm

Author	SHA1	Message	Date
Kevin H. Luu	db14f61f2d	[ci] Refactor CI file structure (#29343 )	2025-12-08 17:25:43 -09:00
Micah Williamson	78c7503364	[ROCm][CI] Skip NVIDIA-Only Prime-RL Test in AMD CI (#29420 ) Signed-off-by: Micah Williamson <micah.williamson@amd.com>	2025-12-09 02:14:02 +00:00
Christina Norman	e41312a2f5	[Bugfix] Skip generation config fallback for GGUF to prevent multi-process hang (#30209 ) Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2025-12-09 01:52:43 +00:00
Yanan Cao	7b35011ad1	Mark qwen2_5_vl as xfail (#30283 ) Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>	2025-12-09 01:14:10 +00:00
Zhewen Li	ae339b1a67	[Bugfix] Fix DeepGEMM after #29546 (#30267 ) Signed-off-by: zhewenli <zhewenli@meta.com> Signed-off-by: Zhewen Li <zhewenli@meta.com>	2025-12-09 01:05:27 +00:00
Wentao Ye	0ee6416f67	[Perf] Optimize `group_topk` kernel, 1.9% Throughput improvement, 2.1% TPOT improvement (#30159 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-12-08 19:44:01 -05:00
Wentao Ye	d9417096d1	[Feature] Batch invariant: Enable `TRITON_MLA` without prefix-caching (#29125 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-12-08 19:31:57 -05:00
Ming Yang	9d6235ca9a	[moe] Allow disabling DP chunking (#29936 ) Signed-off-by: Ming Yang <minos.future@gmail.com>	2025-12-09 00:29:36 +00:00
Victor Ziliang Peng	f1599ca55d	feat(metrics): Add prefill KV compute metric excluding cached tokens (#30189 ) Signed-off-by: Ziliang Peng <ziliang@character.ai>	2025-12-09 00:08:48 +00:00
Ming Yang	60d17251c9	[Disagg] Support large batch size in proxy server and update NixlConnector doc for DP (#28782 ) Signed-off-by: Ming Yang <minos.future@gmail.com>	2025-12-09 00:01:08 +00:00
Lain	1fb632fdb6	[Perf] Improve fp8 quant in mla; replace ReduceSum with ReduceScatterSum (#29795 ) Signed-off-by: Siyuan Fu <siyuanf@nvidia.com>	2025-12-08 15:02:34 -08:00
Charlie Fu	6af70e11a0	[ROCm][CI] Fix test_max_len.py for Rocm (#29916 ) Signed-off-by: charlifu <charlifu@amd.com> Signed-off-by: Charlie Fu <Charlie.Fu@amd.com>	2025-12-08 16:58:30 -05:00
roikoren755	ae0f69b16a	Add SpecDec support to `selective_state_update` (#29488 ) Signed-off-by: Roi Koren <roik@nvidia.com>	2025-12-08 16:45:18 -05:00
Dmitry Tokarev	799804d140	Bump nvshmem to 3.3.24 and fix CUDA 13 installation (#30149 ) Signed-off-by: Dmitry Tokarev <dtokarev@nvidia.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-12-08 20:24:34 +00:00
Vasiliy Kuznetsov	0d402d2600	online fp8 quant with streaming weight post-processing (#29196 ) Signed-off-by: vasiliy <vasiliy@fb.com>	2025-12-08 20:15:10 +00:00
Johnny Yang	d1b5e7afbf	[TPU] Bump tpu-inference to 0.12.0 (#30221 ) Signed-off-by: Johnny Yang <johnnyyang@google.com>	2025-12-08 20:10:10 +00:00
shaharmor98	fcd5306f65	Add latent MoE support (#30203 ) Signed-off-by: Shahar Mor <smor@nvidia.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-12-08 17:35:01 +00:00
weiguihua2	398a596ed2	[MP executor] fix get device count for multi node of mp executor feature (#30042 ) Signed-off-by: weiguihua2 <weiguihua2@huawei.com>	2025-12-09 01:33:48 +08:00
Jee Jee Li	67312cad11	[Misc] Split the LoRA code (#30253 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-12-09 00:59:31 +08:00
Laith Sakka	87aee9ed2b	Add evaluate_guards option to DynamicShapesConfig (#27432 ) Signed-off-by: Laith Sakka <lsakka@meta.com>	2025-12-08 10:46:15 -05:00
Daniel Cámpora	184076c3fe	[DeepSeek v3.2] Make top-k work for any logit values. (#27568 ) Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-12-08 06:55:58 -08:00
Ye (Charlotte) Qi	eb1051fb95	[ROCm] Guard group quant RMS norm fusion patterns (#30239 )	2025-12-08 14:44:48 +00:00
Jee Jee Li	80433e225e	[LoRA] Reduce the loading time of MoE LoRA (#30243 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-12-08 13:29:47 +00:00
Harry Mellor	5c2433a6f3	Add tip for `mypy` and `markdownlint` to the pre-commit comment (#30259 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-08 13:11:51 +00:00
Simon Mo	77072e93b3	[docs] governance documents (#24801 ) Signed-off-by: simon-mo <simon.mo@hey.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Signed-off-by: Simon Mo <simon.mo@hey.com> Co-authored-by: Mark McLoughlin <markmc@redhat.com> Co-authored-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-12-08 12:06:20 +00:00
wang.yuqi	2e660c2434	[Frontend] Binary embedding response does not return metadata by setting encoding_format to bytes_only. (#30249 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io> Signed-off-by: wang.yuqi <noooop@126.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-12-08 12:01:21 +00:00
Shiming Zhang	408cf42f67	[CI] Prevents triggering of an inactive issue/PR check for forked repository. (#29654 ) Signed-off-by: Shiming Zhang <wzshiming@hotmail.com>	2025-12-08 10:29:14 +00:00
wang.yuqi	9e77ffca3f	[Model][7/N] Improve all pooling task | Deprecation as_reward_model. Extract hidden states prefer using new multi-vector retrieval API (#26686 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>	2025-12-08 08:10:09 +00:00
Dazhi Jiang	bcb6f5947f	[Perf] Remove sync point in vit torch sdpa attn backend (#30232 ) Signed-off-by: Dazhi Jiang <dazhi_jiang@163.com>	2025-12-08 07:12:42 +00:00
Zhiyu	cd00c443d2	[Misc] Rename TensorRT Model Optimizer to Model Optimizer (#30091 ) Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>	2025-12-08 07:05:27 +00:00
Jiangyun Zhu	d143271234	[Bugfix] fix fuse_allreduce_rms when tp =1 (#30178 ) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>	2025-12-08 06:43:47 +00:00
Zhiwei	c6df05ebb4	[ROCm] [Fused Moe EP] Use binary expert mask for aiter fused moe kernel (#29773 ) Signed-off-by: ZhiweiYan-96 <zhiwei.yan@amd.com>	2025-12-08 05:23:46 +00:00
Nick Hill	d726a7b0ed	[BugFix] Unblock use of LoRA with data parallel mode (#30220 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-12-08 12:21:05 +08:00
Zhijian Jiang	344b50d525	Address comment to mergify.yml in #30117 (#30219 ) Signed-off-by: Zhijian Jiang <Zhijian.Jiang@outlook.com>	2025-12-08 11:26:25 +08:00
Andrew Xia	735284ed86	[responsesAPI][7] Browser, Container MCP tools for non harmony models (#29989 ) Signed-off-by: Andrew Xia <axia@meta.com> Signed-off-by: Andrew Xia <axia@fb.com> Co-authored-by: Andrew Xia <axia@fb.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-12-08 10:04:03 +08:00
daniel-salib	444f0e3f33	[Frontend] Add MCP type support infrastructure to Responses API (#30054 ) Signed-off-by: Daniel Salib <danielsalib@meta.com>	2025-12-08 10:02:52 +08:00
ElizaWszola	af0444bf40	[Performance] Fused blockwise quant RMS norm (#27883 ) Signed-off-by: ElizaWszola <ewszola@redhat.com> Signed-off-by: yewentao256 <zhyanwentao@126.com> Co-authored-by: yewentao256 <zhyanwentao@126.com>	2025-12-07 16:38:04 +00:00
Lucas Wilkinson	0044c4038c	[BugFix][DeepSeek-V3.2] Fix backend selection logic for Blackwell (#30195 )	2025-12-07 10:53:51 -05:00
Isotr0py	b952f4d3c3	[v1] Add PrefixLM support to FlexAttention backend (#27938 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-12-07 15:51:36 +00:00
Wentao Ye	541a2ef892	[Perf] Deepgemm fused layout kernel for activations, 4.3% throughput improvement, 10.7% TTFT improvement. (#29546 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-12-07 20:31:14 +08:00
Jee Jee Li	b0f4866a77	[CI/Build]Temporary workaround for test_default_mm_loras timeout (#30202 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-12-07 20:27:11 +08:00
Jinzhen Lin	879ddb09c3	[Kernel][MoE] optimize `moe_align_block_size` (#29642 ) Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2025-12-07 01:58:47 -08:00
Yifan Qiao	1b0482b9d1	[Misc][Core] Remove unused `req_index` increment in scheduler (#30176 ) Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu>	2025-12-07 08:39:21 +00:00
Cyrus Leung	e83b7e379c	Revert "[Renderer] Separate out `RendererConfig` from `ModelConfig` (#30145 )" (#30199 )	2025-12-07 00:00:22 -08:00
Cyrus Leung	27f4c2fd46	[Renderer] Separate out `RendererConfig` from `ModelConfig` (#30145 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-06 23:15:42 -08:00
Luke	a49d813fa8	Lazy loading to avoid importing all files (#29716 ) Signed-off-by: Luke <yq0536@gmail.com>	2025-12-07 07:13:14 +00:00
Wentao Ye	17eb25e327	[Perf] Enable cuda graph for deepepHT, 5.3% throughput improvement, 4.4% TTFT improvement (#29558 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-12-07 04:44:50 +00:00
jeremyteboul	dce6d229f7	Support multiple image/audio embeddings per requests (#29988 ) Signed-off-by: Jeremy Teboul <jeremyteboul@fb.com> Co-authored-by: Jeremy Teboul <jeremyteboul@fb.com>	2025-12-07 04:34:24 +00:00
Yanan Cao	cbedb703cc	[Frontend] Remove confusing -O.xx flag error (#30169 ) Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>	2025-12-07 02:53:42 +00:00
AuruTus	8d3da4c79d	[MISC]: change NIXL compatibility hash logging level to debug (#30182 )	2025-12-07 00:21:03 +00:00

12064 Commits