biondizzle/vllm - vllm

Author	SHA1	Message	Date
Wentao Ye	3a6d5cbefd	[Perf] Optimize dcp allocate tensor (#33102 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-01-27 17:24:41 -05:00
linhaifeng	f5d7049cc1	[Bugfix] Fix display error (inconsistent with context) (#33020 ) Signed-off-by: linhaifeng <1371675203@qq.com>	2026-01-27 20:33:29 +00:00
Alexei-V-Ivanov-AMD	3c3c547ce0	Enabling "2 node" distributed tests in the AMD CI pipeline. (#32719 ) Signed-off-by: DCCS-4560 <alivanov@chi-mi325x-pod1-112.ord.vultr.cpe.ice.amd.com> Co-authored-by: DCCS-4560 <alivanov@chi-mi325x-pod1-112.ord.vultr.cpe.ice.amd.com> Co-authored-by: TJian <tunjian.tan@embeddedllm.com>	2026-01-27 19:13:21 +00:00
Matthew Bonanni	1cbccb6dba	[Attention] Use `has_flashinfer` helper (#33177 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-01-27 18:33:17 +00:00
Iris	bd92089d33	feature: support eagle3 for HunyuanVL & Hunyuan (#33035 ) Signed-off-by: irisliu10 <601012173@qq.com> Signed-off-by: Iris <38269816+irisliu10@users.noreply.github.com>	2026-01-27 17:55:48 +00:00
Karan Bansal	a6760f1525	[Doc] Improve serve parameter documentation with meaningful defaults (#33082 ) Signed-off-by: Karan Bansal <karanb192@gmail.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-01-27 09:19:37 -08:00
IriKa	66e601ef79	Support compress-tensors with nvfp4 or fp8 weights and modelopt with nvfp4 weights on Turing (#33076 ) Signed-off-by: IriKa Qiu <qiujie.jq@gmail.com>	2026-01-27 11:04:05 -05:00
Nick Hill	0cd259b2d8	[BugFix] Fix P/D with non-MoE DP (#33037 ) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-01-27 08:03:47 -08:00
danielafrimi	83fb2d09e8	Support heterogeneous NemotronHPuzzle model (#32549 ) Signed-off-by: <dafrimi@nvidia.com> Signed-off-by: Daniel Afrimi <dafrimi@nvidia.com> Signed-off-by: root <dafrimi@nvidia.com>	2026-01-27 10:55:54 -05:00
danisereb	f3a5ee705f	[LoRA][Spec Decode] Support LoRA for Nemotron-H MTP models (#32265 ) Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2026-01-27 07:53:26 -08:00
wang.yuqi	7cbbca9aaa	[Frontend] Cleanup api server (#33158 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io> Signed-off-by: wang.yuqi <noooop@126.com>	2026-01-27 15:18:10 +00:00
omkhalil	5ec44056f7	[Metrics][MFU] Fix UnembedMetrics FLOP overcounting for prefill (#33045 ) Fix UnembedMetrics to correctly count FLOPs for the unembedding (LM head) layer. The bug: UnembedMetrics used total_num_tokens(), which counts every token in the batch, to compute projection FLOPs, but in the autoregressive use case the vocab projection runs on just the last token. Co-authored-by: Omar Mohamed Khalil <omarkhalil@meta.com>	2026-01-27 15:16:49 +00:00
Nicolò Lucchesi	492a7983dd	[Bugfix] Fix DeepseekV32 `AssertionError: num_kv_heads == 1` (#33090 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-01-27 15:03:20 +00:00
Matthew Bonanni	a608b4c6c2	[5/N][Attention] Finish eliminating `vllm/attention` folder (#32064 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-01-27 10:02:51 -05:00
Nicolò Lucchesi	1f3a2c2944	[Bugfix] Disable CG for Whisper+FA2 (#33164 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-01-27 21:46:51 +08:00
omerpaz95	7227d06156	[Metrics] [KVConnector] Add Offloading Connector metrics (#27942 ) Added queries and hits metrics for the Offloading Connector. Also added timing metrics for store and load operations, which report the average per-token time to load or store. The metrics are available from Prometheus and from the StatLogger. Signed-off-by: omerpaz95 <omerpaz95@gmail.com> Co-authored-by: Omer Paz <Omer.Paz@ibm.com>	2026-01-27 13:34:49 +00:00
Harry Mellor	14385c80fc	Fix weight mapping test for Transformers v5 (#33162 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-01-27 12:30:14 +00:00
wang.yuqi	76139d0801	[Frontend] Frontend will only attach supported tasks corresponding entrypoints. (#33139 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io> Signed-off-by: wang.yuqi <noooop@126.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2026-01-27 12:15:43 +00:00
Lifan Shen	da8d0c441a	[AMD][QWEN3-NEXT] FP8 Tunings (#32042 ) Signed-off-by: Lifan Shen <lifans@meta.com>	2026-01-27 09:34:13 +00:00
rasmith	58996f3589	[AMD][Kernel][BugFix] Use correct scale in concat_and_cache_ds_mla_kernel when on gfx942 (#32976 ) Signed-off-by: Randall Smith <ransmith@amd.com> Signed-off-by: Randall Smith <Randall.Smith@amd.com> Co-authored-by: Randall Smith <ransmith@amd.com> (tag: v0.15.0rc1)	2026-01-27 07:16:43 +00:00
Roger Wang	b539f988e1	[Models] Kimi-K2.5 (#33131 ) Signed-off-by: wanglinian <wanglinian@stu.pku.edu.cn> Signed-off-by: wangln19 <96399074+wangln19@users.noreply.github.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: youkaichao <youkaichao@gmail.com> Signed-off-by: Roger Wang <hey@rogerw.io> Co-authored-by: wanglinian <wanglinian@stu.pku.edu.cn> Co-authored-by: wangln19 <96399074+wangln19@users.noreply.github.com> Co-authored-by: Zaida Zhou <58739961+zhouzaida@users.noreply.github.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: youkaichao <youkaichao@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-01-27 14:50:31 +08:00
Andreas Karatzas	6c00645712	[CI][Pooling] Stabilize ModernBERT test (#32909 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-01-27 05:26:48 +00:00
Ning Xie	b781eeaa15	[code clean] remove duplicate code (#33135 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2026-01-27 04:57:16 +00:00
Cyrus Leung	e0b005d9cf	[Frontend] Cleanup serving engine (#33103 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-26 20:47:26 -08:00
Richard Zou	3b8f0fe59e	[torch.compile] Stop assuming 32 bit indexing (#33113 ) Signed-off-by: Richard Zou <zou3519@gmail.com>	2026-01-27 04:25:02 +00:00
Cyrus Leung	c831911be2	[Frontend] Reduce mixin usage in serving pooling (#33101 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-27 11:50:37 +08:00
Paco Xu	157caf511b	[Perf] avoid duplicate mem_get_info() call in get_current_memory_usage (#33064 ) Signed-off-by: Paco Xu <paco.xu@daocloud.io>	2026-01-27 03:45:45 +00:00
Vincent Gimenes	0b53bec60b	[DOC]: Add warning about max_num_batched_tokens and max_model_len when chunked prefill is disabled (#33109 ) Signed-off-by: Vincent Gimenes <147169146+VincentG1234@users.noreply.github.com>	2026-01-27 03:05:02 +00:00
Strahinja Stamenkovic	c568581ff3	Fix IndexError with encoder-decoder models when using Custom Paged Attention (#33112 ) Signed-off-by: sstamenk <strahinja.stamenkovic@amd.com>	2026-01-27 10:33:37 +08:00
wangln19	2d7053438a	fix: preserve native tool call ID in multi-turn tool calling (#32768 ) Signed-off-by: wanglinian <wanglinian@stu.pku.edu.cn> Signed-off-by: wangln19 <96399074+wangln19@users.noreply.github.com> Signed-off-by: Roger Wang <hey@rogerw.io> Co-authored-by: Roger Wang <hey@rogerw.io> Co-authored-by: Isotr0py <2037008807@qq.com>	2026-01-27 10:22:35 +08:00
Robert Shaw	5a93b9162b	[MoE Refactor] Integrate Naive Prepare Finalize into MK (#32567 ) Signed-off-by: Robert Shaw <robshaw@redhat.com> Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com> Co-authored-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: amirkl94 <203507526+amirkl94@users.noreply.github.com>	2026-01-27 01:28:02 +00:00
Woosuk Kwon	6d86fde09c	[Model Runner V2] Remove UvaBufferPool for cpu->gpu copy (#33055 ) Signed-off-by: Woosuk Kwon <woosuk@inferact.ai> Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Co-authored-by: Nick Hill <nhill@redhat.com>	2026-01-26 16:47:35 -08:00
XiongfeiWei	510ed1e8d3	[Bugfix][TPU] Return a Default fp8 MoE Backend (#32908 ) Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com> Signed-off-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2026-01-26 18:46:11 -05:00
Pengchao Wang	8caffd92df	[Bugfix][MXFP4] Call `trtllm_fp4_block_scale_moe` with kwargs (#33104 ) Signed-off-by: Pengchao Wang <wpc@fb.com>	2026-01-26 15:13:18 -08:00
dolpm	58a05b0ca1	[fix] CPUDNNLGEMMHandler pointer baked into inductor artifact (#32913 ) Signed-off-by: dolpm <34420038+dolpm@users.noreply.github.com>	2026-01-26 16:59:44 -05:00
Jared Wen	6ee7f18f33	[Logging] add `--disable-access-log-for-endpoints` CLI option (#30011 ) Add a new CLI option --disable-access-log-for-endpoints to suppress uvicorn access logs for specified endpoints (e.g., /health, /metrics, /ping). This addresses the need to reduce log noise in production environments where health check endpoints are frequently polled by load balancers or monitoring systems, generating excessive log entries that obscure meaningful request logs. Fixes #29982 Signed-off-by: JaredforReal <w13431838023@gmail.com>	2026-01-26 21:49:03 +00:00
Wentao Ye	8f987883cb	[Refactor] Remove unused `_moe_permute` function (#33108 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-01-26 16:06:45 -05:00
Kevin H. Luu	ebe0ba91db	[ci] Sync test areas with test-pipeline.yaml and enable new pipeline generator (#33080 ) Signed-off-by: Kevin H. Luu <khluu000@gmail.com> Signed-off-by: khluu <khluu000@gmail.com> Co-authored-by: Kevin Luu <khluu@Kevins-MacBook-Pro.local>	2026-01-26 12:28:20 -08:00
Robert Shaw	43a013c3a2	[Bugfix] Fix Dtypes for Pynccl Wrapper (#33030 ) Signed-off-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <robshaw@redhat.com>	2026-01-26 20:09:32 +00:00
Cyrus Leung	c25dbee40d	[Model] Bump transformers version for test registry (#33100 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-26 18:53:22 +00:00
Nicolò Lucchesi	19ab0f7ce5	[Bugfix] Fix Voxtral streaming slot_mapping (#33073 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-01-26 10:40:40 -08:00
danielafrimi	67fe677c53	[FIX] Always support TP > 4 for FP4 Gemm (#31099 ) Signed-off-by: dafrimi <dafrimi@nvidia.com> Co-authored-by: root <root@gpu-51.slurm-workers-slurm.slurm.svc.cluster.local>	2026-01-26 11:04:20 -07:00
Andy Lo	d56afd45fd	Remove unused logic in `models/mistral.py` (#33095 ) Signed-off-by: Andy Lo <andy@mistral.ai>	2026-01-26 09:01:52 -08:00
Chauncey	a2393ed496	[CI] Fix AssertionError: MCP tool call not found in output_messages (#33093 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2026-01-26 15:19:57 +00:00
Pleaplusone	be6931ee27	[ROCm][Bugfix] Fix ptpc scale load issue for fused shared expert path in deepseek mtp (#33018 ) Signed-off-by: ganyi <ygan@amd.com>	2026-01-26 23:19:04 +08:00
Chauncey	9ef3b718d9	[Bugfix] Fix Can't instantiate abstract class DeepseekV32IndexerBackend (#33052 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2026-01-26 06:44:02 -08:00
Yuxuan Zhang	bb17e8f11c	[GLM-OCR] GLM-OCR with MTP Support (#33005 ) Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-01-26 06:24:43 -08:00
Cyrus Leung	dcd80206b7	[Chore] Update type annotation of `input_ids` in model forward (#33063 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-01-26 06:02:10 -08:00
danisereb	f4a0921c9c	[Performance] Tune Mamba selective scan kernel for B200 (#32873 ) Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2026-01-26 05:56:54 -08:00
VihaanThat	208c56256f	[Feature] Add LoRA support for Gemma3 vision components (#32764 )	2026-01-26 13:56:40 +00:00
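
Note on 5ec44056f7 (#33045): the commit message says the unembedding FLOP count should cover only the last token rather than every token in the batch. The sketch below works through that arithmetic under stated assumptions. It is not the vLLM implementation: the function names, the 2 * M * K * N matmul cost model, and the example sizes are all hypothetical, and it assumes one logits row per sequence (no prompt logprobs).

```python
from typing import Sequence


def matmul_flops(rows: int, hidden_size: int, vocab_size: int) -> int:
    """FLOPs of a dense [rows, hidden] x [hidden, vocab] projection.

    Assumes the usual 2 * M * K * N cost model (one multiply and one add
    per accumulated term). Hypothetical helper, not a vLLM function.
    """
    return 2 * rows * hidden_size * vocab_size


def overcounted_unembed_flops(prompt_lens: Sequence[int], hidden_size: int, vocab_size: int) -> int:
    """Counts every prompt token in the batch, as the commit describes the bug."""
    return matmul_flops(sum(prompt_lens), hidden_size, vocab_size)


def last_token_unembed_flops(prompt_lens: Sequence[int], hidden_size: int, vocab_size: int) -> int:
    """Counts one logits row per sequence (its last token) during prefill."""
    return matmul_flops(len(prompt_lens), hidden_size, vocab_size)


if __name__ == "__main__":
    # Illustrative sizes only; they are not taken from the commit.
    prompts = [512, 1024]
    hidden, vocab = 4096, 32000
    wrong = overcounted_unembed_flops(prompts, hidden, vocab)
    right = last_token_unembed_flops(prompts, hidden, vocab)
    print(f"overcounted: {wrong:,} FLOPs")
    print(f"last-token:  {right:,} FLOPs")
    print(f"ratio:       {wrong // right}x")
```

Under these assumptions the overcount factor equals the mean prompt length in the batch, so it grows with longer prefills and disappears for pure decode steps, where each sequence contributes a single token either way.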


13335 Commits