biondizzle/vllm - vllm

Author	SHA1	Message	Date
JartX	140cbb1186	[Bugfix] Cuda Clean up scales Kvcache fp8/int8_per_token_head (#39224 ) Signed-off-by: JartX <sagformas@epdcenter.es> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2026-04-08 04:08:04 -07:00
Kevin H. Luu	6155bbd1dd	[Bugfix][Docs] Fix ReadTheDocs build crash from mocked torch decorator (#39284 ) Signed-off-by: khluu <khluu000@gmail.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-08 09:43:01 +00:00
rasmith	78434b923c	[CI][AMD][BugFix][Kernel] Cast induction variable to int64 on MI350 for chunk_gated_delta_rule_fwd_kernel_h_blockdim64 to avoid illegal memory access (#39087 ) Signed-off-by: Randall Smith <Randall.Smith@amd.com>	2026-04-08 16:57:18 +08:00
Michael Goin	2488d1dca2	[Docs] Update README (#39251 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2026-04-08 11:34:07 +08:00
yoke	d734445fcd	[Bugfix][Frontend] Fix Gemma4 streaming HTML duplication after tool calls (#38909 ) Signed-off-by: yoke233 <yoke2012@gmail.com>	2026-04-08 11:03:54 +08:00
Flora Feng	927975ead8	[Parser] Migrate response api streaming to unified parser (#38755 ) Signed-off-by: sfeng33 <4florafeng@gmail.com> Signed-off-by: Andrew Xia <axia@meta.com>	2026-04-08 10:09:00 +08:00
Flora Feng	9ea7d670d8	[Bugfix] Fix Qwen3 tool parser for Responses API tools (#38848 ) Signed-off-by: sfeng33 <4florafeng@gmail.com>	2026-04-08 10:08:51 +08:00
Varun Sundar Rabindranath	7b80cd8ac3	[Docs] Add Phi-4-reasoning-vision to supported models + examples (#39232 ) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2026-04-08 02:02:26 +00:00
Andrey Talman	2111997f96	[release 2.11] Update to torch 2.11 (#34644 )	2026-04-07 18:55:48 -07:00
Flora Feng	5af684c319	[CI] Add reasoning parser tests to CI (#37025 ) Signed-off-by: sfeng33 <4florafeng@gmail.com>	2026-04-08 00:57:36 +00:00
Md. Mekayel Anik	d521dcdbcc	docs: clarify SMT and OMP acronyms in CpuPlatform (#39085 )	2026-04-07 17:42:07 -07:00
Giancarlo Delfin	5daf62271d	[Model Runner V2] Fuse probabilistic rejection sample kernels (#38496 ) Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai>	2026-04-07 17:37:37 -07:00
zofia	ad3304425b	[XPU] add xpu backend implementation of mxfp8 quant (#38682 ) Signed-off-by: Zhu, Zufang <zufang.zhu@intel.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>	2026-04-08 08:30:35 +08:00
Lucas Wilkinson	70406eb1dc	[Attention][V0 Deprecation] Deprecate accept output buffer (#39125 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2026-04-07 17:14:58 -04:00
Yubo Wang	08bfedc152	[Bugfix] Fix extract_hidden_states crash with quantized KV cache dtype (#39160 ) Signed-off-by: Yubo Wang <yubowang2019@gmail.com>	2026-04-07 11:18:33 -07:00
Flora Feng	0102bd2f4c	[Parser] Pass request.tools to tool parser (#38860 ) Signed-off-by: sfeng33 <4florafeng@gmail.com>	2026-04-08 01:36:21 +08:00
rasmith	83d09d36b5	[CI][Bugfix][AMD] Ensure weights are created when emulating OCP MXFP4 (#36993 ) Signed-off-by: Randall Smith <Randall.Smith@amd.com>	2026-04-08 00:37:16 +08:00
Chendi.Xue	92b9afeecd	[XPU] Quick fix for TritonMLA to remove cuda hardcode (#39088 ) Signed-off-by: Chendi Xue <chendi.xue@intel.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>	2026-04-08 00:17:58 +08:00
Jinzhen Lin	7310555482	[Bugfix] Fix marlin nvfp4 rescaling (#37502 ) Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com>	2026-04-07 08:57:17 -07:00
ibifrost	96b5004b71	[KVConnector] Support 3FS KVConnector (#37636 ) Signed-off-by: wuchenxin <wuchenxin.wcx@alibaba-inc.com> Signed-off-by: ibifrost <47308427+ibifrost@users.noreply.github.com> Co-authored-by: Simon Mo <simon.mo@hey.com>	2026-04-07 15:46:00 +00:00
kkyyxhll	98e1a43af7	[Bugfix][Quantization] Fix PerTensorScale loading with tuple shard_id in MergedColumnParallelLinear (#38517 ) Signed-off-by: loukang <loukang@xiaohongshu.com>	2026-04-07 11:16:26 -04:00
maobaolong	729eb59f60	[KVConnector]: prioritize external connector over internal registry (#38301 ) Signed-off-by: baoloongmao <baoloongmao@tencent.com> Co-authored-by: Chauncey <chaunceyjiang@gmail.com>	2026-04-07 15:03:11 +00:00
Ilya Boytsov	6e1100889e	fix(test): recompute Jina ColBERT rotary inv_freq cleared by transformers v5 weight loader (#39176 ) Signed-off-by: Ilya Boytsov <ilyaboytsov1805@gmail.com>	2026-04-07 22:40:55 +08:00
Harry Mellor	edcc37a8ce	Fix Mistral yarn warning in Transformers v5 (#37292 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>	2026-04-07 13:23:33 +00:00
Harry Mellor	79df4a794d	Automatically add links to API docs for matching strings in docs (#37434 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-04-07 21:21:18 +08:00
Ronen Schaffer	7c139ab23f	[KV Offload] Clean up ARC/LRU refactoring leftovers: group ARC tests and fix stale comment (#38217 ) Signed-off-by: Ronen Schaffer <ronen.schaffer@ibm.com>	2026-04-07 15:14:45 +03:00
Wei Zhao	0be9516ea4	[Bug] Fix Trtllm Fp8 MoE Weight Shuffle Memory Fragmentation (#39054 ) Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>	2026-04-07 08:04:08 -04:00
Kyle Mylonakis	7b9de7c892	[Bugfix] Correct mistake in chained comparison in static assert logic (#38699 ) Signed-off-by: Kyle Mylonakis <kyle@protopia.ai>	2026-04-07 18:24:39 +08:00
Rohan Potdar	dd9342e6bc	only patch runtime_env for torch >= 2.10 (#38763 ) Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>	2026-04-07 09:29:23 +00:00
Jiangyun Zhu	8060bb0333	[vLLM IR] rework gemma_rms_norm (#39014 ) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com> Signed-off-by: Jiangyun Zhu <riverclouds.zhu@qq.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2026-04-07 01:37:00 -07:00
Rishapveer Singh	da4c0e4db9	[Model] Use AutoWeightsLoader for FalconH1 (#39092 ) Signed-off-by: Rishapveer Singh <215205492+rishaps@users.noreply.github.com>	2026-04-07 16:25:17 +08:00
Netanel Haber	a9a0e0551f	nano-nemotron-vl: get_mm_max_tokens_per_item for audio, video, image == seq_len (#38727 ) Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>	2026-04-07 00:23:29 -07:00
Andrew Barnes	5c35517a3e	[ROCm] Remove unused IS_FNUZ parameter from reshape_and_cache_shuffle_kernel (#39123 ) Signed-off-by: Bortlesboat <bortstheboat@gmail.com>	2026-04-07 07:17:59 +00:00
Andreas Karatzas	a435e3108d	[ROCm][CI] Fix test repo-root assumptions (#39053 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-04-07 13:36:21 +08:00
Andreas Karatzas	2df2c85be4	[Kernels][MoE] Fix legacy_routing to use bitmatrix-based routing path (#38504 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-04-07 10:57:09 +08:00
Nick Hill	62095e82c1	[BugFix][MRV2] Fix cuda event reuse race (#39115 ) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-04-07 00:21:09 +00:00
bnellnm	b2b2c5239e	[MoE Refactor] Split up compressed_tensors_moe.py (#38960 ) Signed-off-by: Bill Nell <bnell@redhat.com>	2026-04-06 20:07:54 -04:00
fxmarty-amd	00d7b497b3	[NVFP4] Support NVFP4 dense models from `modelopt` and `compressed-tensors` on AMD Instinct MI300, MI355X and Hopper through emulation (#35733 ) Signed-off-by: Felix Marty <Felix.Marty@amd.com> Signed-off-by: fxmarty-amd <felmarty@amd.com> Co-authored-by: Kyle Sayers <kylesayrs@gmail.com>	2026-04-06 16:18:27 -06:00
Matthew Bonanni	9c81f35b1a	[Attention][MLA] Re-enable FA4 as default MLA prefill backend (#38819 )	2026-04-06 17:51:46 -04:00
Woosuk Kwon	f186cfe75e	[MRV2] Fix hanging issue with DeepSeek V3.2 by setting `skip_attn=False` (#39098 ) Signed-off-by: WoosukKwon <woosuk.kwon@berkeley.edu> Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>	2026-04-06 12:55:13 -07:00
Netanel Haber	dfa5062a8f	NemotronH default mamba_ssm_cache_dtype=float32; enable auto-hook for NemotronHNanoVLV2Config (#39032 ) Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>	2026-04-06 19:47:46 +00:00
Yongye Zhu	e8ebbdde83	[Quantization] Add FlashInfer CuteDSL batched experts backend for NVFP4 MoE (#38251 ) Signed-off-by: Yongye Zhu <zyy1102000@gmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Roger Wang <hey@rogerw.io>	2026-04-06 11:57:53 -07:00
namgyu-youn	94fbb09894	[EASY] Drop duplicate KV-cache initialization (#38799 ) Signed-off-by: namgyu-youn <namgyu.dev@gmail.com>	2026-04-06 18:05:39 +00:00
Wentao Ye	419e73cdfa	[Bug] Fix mistral version dependency (#39086 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-04-06 13:31:19 -04:00
bnellnm	f01482408c	[MoE Refactor][Test] FusedMoE layer test (#24675 ) Signed-off-by: Bill Nell <bnell@redhat.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2026-04-06 17:17:23 +00:00
zhanqiuhu	bfdc0a3a99	[NIXL][Mamba][3/N] Heterogeneous TP: 3-read conv state transfer (#37635 )	2026-04-06 19:07:02 +02:00
bnellnm	93bada494f	[MoE Refactor] Split of DefaultMoERunner class (#35326 ) Signed-off-by: Bill Nell <bnell@redhat.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2026-04-06 12:41:59 -04:00
Frederik Gossen	608914de30	[Core] Re-enable Inductor pre-grad passes in standalone compile (torch>=2.12) (#38944 ) Signed-off-by: Frederik Gossen <frgossen@meta.com>	2026-04-06 09:37:13 -07:00
Wentao Ye	4ae218c122	[Refactor] Remove unused dead code (#38842 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-04-06 11:52:05 -04:00
Lukas Geiger	f40d9879f2	[Models][GDN] Remove GPU/CPU syncs in `GDNAttentionMetadata.build` during speculative decoding (#38047 ) Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>	2026-04-06 15:39:37 +00:00

15593 Commits