biondizzle/vllm - vllm

Author	SHA1	Message	Date
Micah Williamson	0edf101d2b	[ROCm] Add `stablelm` Head Size 80 To Supported Head Sizes For ROCM_ATTN (#35527 ) Signed-off-by: Micah Williamson <micah.williamson@amd.com>	2026-02-28 12:16:34 +08:00
Douglas Lehr	d5b6f3ba36	[ROCm][Quantization] Add Composable Kernel (CK) backend support for M… (#34301 ) Signed-off-by: Doug Lehr <douglehr@amd.com> Signed-off-by: Douglas Lehr <91553416+dllehr-amd@users.noreply.github.com> Signed-off-by: Douglas Lehr <Doug.Lehr@amd.com> Co-authored-by: Doug Lehr <douglehr@amd.com> Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: Rohan Potdar <66227218+Rohan138@users.noreply.github.com>	2026-02-28 03:37:01 +00:00
Woosuk Kwon	1a014a0a93	[Model Runner V2] Move MM encoder to Model States [3/N] (#35564 ) Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>	2026-02-27 18:32:38 -08:00
Woosuk Kwon	86ac7bcf84	[Model Runner V2] Support pooling models (#35120 ) Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>	2026-02-27 18:03:01 -08:00
Umut Polat	405f28d38d	[Misc] Clean up ResponsesRequest model validators (#35531 ) Signed-off-by: umut-polat <52835619+umut-polat@users.noreply.github.com>	2026-02-28 01:19:21 +00:00
youkaichao	5323672bc2	[misc] cleanup one level of error stack when nixl fails to initialize (#35517 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2026-02-28 08:42:37 +08:00
Roberto L. Castro	a201ad72d8	[Refactor][Kernel] Add global helper to deduplicate vectorized memory ops (#35105 ) Signed-off-by: LopezCastroRoberto <rocastro@redhat.com> Signed-off-by: LopezCastroRoberto <roberto.lopez.castro@udc.es> Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com>	2026-02-27 16:28:17 -08:00
Rohan Potdar	e3691988d0	[ROCm]: fix aiter rope functionalization (#35533 ) Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>	2026-02-27 22:42:30 +00:00
Gregory Shtrasberg	9fa6c68fa6	[ROCm] Enabling encoder and encoder-decoder on ROCm and AITER unified backends (#35334 ) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2026-02-27 21:32:55 +00:00
Aaron Hao	2ce6f3cf67	[Feat][RL][2/2] Native Weight Syncing API: IPC (#34171 ) Signed-off-by: hao-aaron <ahao@anyscale.com> Signed-off-by: Aaron Hao <ahao@anyscale.com> Signed-off-by: ahao-anyscale <ahao@anyscale.com>	2026-02-27 13:45:21 -07:00
Jakub Zakrzewski	1f3dbd95fd	[Bugfix][Model] Fix gpt-oss batch invariance (#35404 ) Signed-off-by: Jakub Zakrzewski <jzakrzewski@nvidia.com>	2026-02-27 20:41:24 +00:00
Lucas Wilkinson	1d532f9d8f	[DP] Only use DP padding when cudagraphs are actually used (#34102 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2026-02-27 15:14:31 -05:00
Lucas Kabela	234a65b781	[Bugfix] Add monkeypatch to prevent race condition from writing (#35420 ) Signed-off-by: Lucas Kabela <lucaskabela@meta.com>	2026-02-27 14:51:36 -05:00
SteadfastAsArt	2decec9856	[Transformers backend] Ignore MTP weights when num_nextn_predict_layers=0 (#34888 ) Signed-off-by: SteadfastAsArt <695488173@qq.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-27 19:39:23 +00:00
Zhengxu Chen	29b35477b0	[compile] Fix caching error over pytree slice node. (#35308 ) Signed-off-by: zhxchen17 <zhxchen17@fb.com>	2026-02-27 19:34:16 +00:00
Nick Hill	b1d9f5372d	[Model Runner V2] Warmup kernels (#35172 ) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-02-27 10:43:30 -08:00
Raushan Turganbay	fd6de37fca	[BugFix] Fix 3D rope in transformers backend (#35097 ) Signed-off-by: raushan <raushan@huggingface.co> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-27 18:34:49 +00:00
Netanel Haber	c8aca0c9e1	Support parakeet as audio encoder for nemotron-nano-vl (#35100 ) Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> Co-authored-by: Roger Wang <hey@rogerw.io>	2026-02-27 11:07:38 -07:00
Martin Hickey	b602e4f299	[Doc] Fix link to Llama chat template for usability (#35525 ) Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2026-02-27 17:51:09 +00:00
Huamin Li	157722da75	[perf] Use pinned memory for async H2D transfer in do_mamba_copy_block (#35480 ) Signed-off-by: Huamin Li <3ericli@gmail.com>	2026-02-28 01:50:37 +08:00
Nick Hill	1d897ff04f	[Misc] Fill in some v1 CODEOWNERS gaps (#35524 ) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-02-27 09:34:37 -08:00
fort726	905d76b51d	[Model] Add huggingface skt/A.X-K1 model (#32407 ) Signed-off-by: Sungwan(Alex) Kim <sw0726.kim@sktelecom.com> Signed-off-by: fort726 <38447663+fort726@users.noreply.github.com> Co-authored-by: Sungwan(Alex) Kim <sw0726.kim@sktelecom.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: TJian <tunjian.tan@embeddedllm.com>	2026-02-27 09:26:02 -08:00
Yanan Cao	9098ce690c	[Kernel] [Helion] [7/N] Use HOP to represent Helion Kernel call to enable fx tracing and pattern matching (#34390 ) Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>	2026-02-27 09:21:35 -08:00
Nick Hill	876312f0b5	[Core] Fix `gpu_worker.py` pre-commit errors (#35312 ) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-02-27 07:54:24 -08:00
Boyuan Feng	5de98abc12	Add @BoyuanFeng to CODEOWNERS (#35317 ) Signed-off-by: Boyuan Feng <boyuan@meta.com>	2026-02-27 15:53:47 +00:00
Koushik Dutta	9251ed5c4f	[Bugfix] Handle case when kimi ends reasoning with a tool call (#33646 ) Signed-off-by: Koushik Dutta <koushd@gmail.com> Co-authored-by: mondaylord <20212010046@fudan.edu.cn> Co-authored-by: Chauncey <chaunceyjiang@gmail.com> Co-authored-by: Roger Wang <hey@rogerw.io>	2026-02-27 14:58:28 +00:00
Yueqian Lin	e8249378e4	[Bugfix] Fix check_interleaved_audio_video false positive for batched non-interleaved requests (#35487 ) Signed-off-by: linyueqian <linyueqian@outlook.com> Co-authored-by: Roger Wang <hey@rogerw.io>	2026-02-27 06:48:25 -08:00
haosdent	6d4f9d3ad5	[Bugfix] Fix DCP + FA3 crash due to missing num_splits in _forward_with_dcp (#35082 ) Signed-off-by: haosdent <haosdent@gmail.com>	2026-02-27 22:27:06 +08:00
Harry Mellor	fbe3f0120a	Revert "Add GlmOcrConfig for GLM-OCR model type recognition" (#35512 )	2026-02-27 06:13:27 -08:00
Jason Li	66c1751d13	[compile] Cleanup: Remove unnecessary +rms_norm forcing for sequence parallelism (#35410 ) Signed-off-by: jasonlizhengjian <jasonlizhengjian@gmail.com>	2026-02-27 08:36:37 -05:00
Tib	6467b635b6	[Bugfix] Add missing activation attr to RMSNormGated (#35423 ) Signed-off-by: tibG <naps@qubes.milou> Co-authored-by: tibG <naps@qubes.milou>	2026-02-27 12:53:35 +00:00
Max Hu	9c3fe9936b	Flashinfer cuDNN backend for Qwen3 VL ViT attention (#34580 ) Signed-off-by: Max Hu <maxhu@nvidia.com> Signed-off-by: Max Hu <hyoung2991@gmail.com> Co-authored-by: Max Hu <maxhu@nvidia.com> Co-authored-by: Shang Wang <shangw@nvidia.com>	2026-02-27 20:20:23 +08:00
Umut Polat	b66a74649e	[Bugfix] Replace assert with ValueError for response_format validation in completions endpoint (#35456 ) Signed-off-by: umut-polat <52835619+umut-polat@users.noreply.github.com>	2026-02-27 08:01:06 +00:00
Wang Xingran	07bdabef03	[Bugfix] Use 'sum' reduction instead of 'avg' in Async TP reduce-scatter (#33088 ) Signed-off-by: Xingran Wang <wangxingran123456@outlook.com> Signed-off-by: Hongjian Zhang <hirokenovo@gmail.com> Co-authored-by: Hongjian Zhang <hirokenovo@gmail.com>	2026-02-27 07:06:08 +00:00
Chengyi Nie	a572baff5e	[Model Performance] Add Qwen3MoE tuned MoE configs for H200 (#35457 ) Signed-off-by: Chengyi Nie <cnie@roblox.com> Co-authored-by: Chengyi Nie <cnie@roblox.com>	2026-02-27 13:51:14 +08:00
zofia	516cf26698	[Bug] correct out dtype of rms_norm_gated native path (#35369 ) Signed-off-by: Zhu, Zufang <zufang.zhu@intel.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>	2026-02-27 05:19:51 +00:00
Jiangyun Zhu	487e5c51f7	[Bugfix] disable allreduce_rms_fusion by default when pp size > 1 (#35424 ) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>	2026-02-27 04:18:52 +00:00
Daniel Huang	1a8c71674e	[BugFix] Repo utils debug print patch (#35434 ) Signed-off-by: Daniel Huang <daniel1.huang@intel.com>	2026-02-27 03:50:56 +00:00
Wentao Ye	062b789632	[Bug] Fix outdated links in source code (#35314 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-02-27 03:50:46 +00:00
gnovack	a532c83849	use 'max_active_experts' for moe lora input size (#33197 ) Signed-off-by: gnovack <gnovack@amazon.com>	2026-02-27 03:50:43 +00:00
Jee Jee Li	1e5ad9b74f	[Bugfix] Fix Qwen3NextForCausalLM packed_modules_mapping (#35413 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2026-02-26 19:46:30 -08:00
Nicolò Lucchesi	cabdaa7619	[Misc] Move `GPUModelRunner.prepare_kernel_block_sizes` to utils (#35400 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-02-27 11:42:51 +08:00
Chenyaaang	06be53563b	[Core]Extract is_last_rank in Ray for tpu to override (#33012 ) Signed-off-by: Chenyaaang <chenyangli@google.com>	2026-02-27 03:18:52 +00:00
Angela Yi	c29ee9c326	[compile] Invalidate cache for cpu flags (#35119 ) Signed-off-by: angelayi <yiangela7@gmail.com>	2026-02-27 02:54:11 +00:00
daniel-salib	d43048ce05	[Bugfix] Emit reasoning_part events in simple streaming path for Resp… (#35184 ) Signed-off-by: Daniel Salib <danielsalib@meta.com>	2026-02-27 09:49:06 +08:00
Michael Goin	4fec53cfcb	[CI] Actually run tests/kernels/quantization/test_block_fp8.py in CI (#34274 )	2026-02-26 17:58:03 -07:00
roikoren755	38c498b8e3	[Performance] Cublas Bf16 Gate with Fp32 Output (#35121 ) Signed-off-by: Roi Koren <roik@nvidia.com>	2026-02-26 16:51:28 -08:00
Andrii Skliar	56a6371706	[Update] Use FlashInfer fast_decode_plan directly instead of replication (#34687 ) Signed-off-by: Andrii <askliar@nvidia.com> Co-authored-by: Andrii <askliar@nvidia.com>	2026-02-26 16:31:43 -08:00
Pavani Majety	6283021142	[Bugfix] Fix KV Scale loading for MLA Models (#35430 ) Signed-off-by: Pavani Majety <pmajety@nvidia.com>	2026-02-26 23:38:19 +00:00
Aleksandr Malyshev	01923eec70	[ROCm][Quantization] GPT OSS Upstream MoE wmxfp4_afp8 with static scales (#30357 ) Signed-off-by: Aleksandr Malyshev <maleksan@amd.com> Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>	2026-02-26 16:50:16 -06:00

14318 Commits