biondizzle/vllm

Author	SHA1	Message	Date
Andreas Karatzas	2ff3e436ad	[Responses][CI] Filter negative token IDs in schema fuzz test to avoid 500 errors (#35231) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-02-25 05:52:44 +00:00
Jhao-Ting Chen	c2c4c4611a	[FIX] fused moe with lora shared expert dual stream (1.07x otps) (#34933) Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2026-02-25 04:40:45 +00:00
Rohan Potdar	f38f8c9742	[ROCm]: Enable customop and rope+kvcache fusion for AITER RoPE (#35180) Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>	2026-02-25 04:36:40 +00:00
Flora Feng	ec1d30c0f6	[Responses] Decouple SSE event helpers from Harmony context (#35148) Signed-off-by: sfeng33 <4florafeng@gmail.com>	2026-02-24 20:05:25 -08:00
Pooya Davoodi	e3b2324ec4	[Frontend] Use init_app_state and FrontendArgs in run_batch (#32967) Signed-off-by: Pooya Davoodi <pooya.davoodi@parasail.io> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2026-02-24 19:40:39 -08:00
Nick Hill	dbf0da817a	[Core] Cleanup engine pause/sleep logic (#34528) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-02-24 19:33:34 -08:00
Xin Yang	3bbb2046ff	[Bugfix] Fix expert_ids padding values in moe_align_block_size kernel (#35161) Signed-off-by: Xin Yang <xyangx@amazon.com>	2026-02-24 17:14:24 -08:00
yugong333	576fe50333	Adding Nemotron fp8 Triton MoE Config (#34674) Signed-off-by: Yu Gong <yu3.gong@gmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2026-02-24 15:56:38 -08:00
Hashem Hashemi	a0e50a4260	Convert wvSplitKQ to 16x16 MFMA in prep for mi4xx. (#34100) Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com>	2026-02-24 23:35:21 +00:00
Benjamin Chislett	9fa5b25a23	[Bug][DSV3.2] Always prepare metadata for DeepGEMM Sparse Attention (#35075) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>	2026-02-24 14:55:22 -08:00
Robert Shaw	ea97750414	[CI] Fix Distributed Tests (#35236) Signed-off-by: Robert Shaw <robertgshaw2@gmail.com>	2026-02-24 22:31:56 +00:00
Andreas Karatzas	067c5d9ad1	[ROCm][CI] Added MI325 mirrors (#34923) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-02-24 13:37:15 -08:00
Benjamin Chislett	f5972a872f	[Model][Spec Decode] Nemotron-H MTP and Mamba Speculative Decoding Support (#33726) Signed-off-by: Shahar Mor <smor@nvidia.com> Signed-off-by: Benjamin Chislett <bchislett@nvidia.com> Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Co-authored-by: Shahar Mor <smor@nvidia.com> Co-authored-by: Roi Koren <roik@nvidia.com> Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>	2026-02-24 09:49:56 -08:00
Matthew Bonanni	a9e15e040d	Add @MatthewBonanni to CODEOWNERS (#35207) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-02-24 10:45:10 -07:00
Lucas Wilkinson	542ca66357	Revert "[CI/Build] Remove redundant OpenTelemetry pip install from CI configs" (#35211)	2026-02-24 09:26:42 -08:00
Cyrus Leung	fc8456c336	[CI/Build] Fix kernels test location (#35205) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-24 09:20:34 -08:00
Wentao Ye	9ce8fad2a9	[Perf] Optimize Python Slice for Structured Output using `islice` instead of [:] (#33593) Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-02-24 09:02:36 -08:00
Harry Mellor	c38b8d5a31	Remove `padding_index` from models that don't use it for better Transformers v5 compatibility (#35189) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-24 08:04:46 -08:00
Robert Shaw	60da0e1544	[CI] Remove Duplicated Tests (#35199) Signed-off-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Robert Shaw <robshaw@redhat.com>	2026-02-24 23:53:30 +08:00
danisereb	9609b1f18d	Integrate flashinfer mm_mxfp8 in ModelOpt MXFP8 (#35053) Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>	2026-02-24 08:45:13 -07:00
danisereb	a0c7081695	Fix fallback to default tactic (flashinfer autotuner) with trtllm_fp4_block_scale_moe (#35088) Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>	2026-02-24 07:25:44 -08:00
R3hankhan	34ce0ffd1f	[CPU][Perf] Accelerate Attention head for s390x using vector intrinsics (#34434) Signed-off-by: Rehan Khan <Rehan.Khan7@ibm.com> Co-authored-by: Li, Jiang <jiang1.li@intel.com>	2026-02-24 07:25:39 -08:00
Robin Nabel	0de5333989	Fix GLM4 parser tests (#34905) Signed-off-by: Robin Nabel <opensource@nabel.co> Co-authored-by: Chauncey <chaunceyjiang@gmail.com>	2026-02-24 22:27:42 +08:00
Eldar Kurtić	a87cc50859	[Attn,KV-cache] Use per-head scales in the attention selector (#34281) Signed-off-by: Your Name <you@example.com> Signed-off-by: Eldar Kurtic <research@neuralmagic.com> Co-authored-by: Eldar Kurtic <research@neuralmagic.com> Co-authored-by: Your Name <you@example.com>	2026-02-24 09:02:43 -05:00
Cyrus Leung	761e63e541	[Frontend] Always pass `supported_tasks` to validation (#35186) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-24 04:16:33 -08:00
Isotr0py	d12d201409	[Bugfix] Fix failing FunASR processor test (#35111) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-02-24 04:13:45 -08:00
eustlb	b3ad37c5db	[glm-asr] change defaults dummy audio size (#35108) Signed-off-by: Eustache Le Bihan <eulebihan@gmail.com>	2026-02-24 04:13:33 -08:00
Wentao Ye	14561fabfd	[Perf] Optimize pooling model redundant copy, 1.8% throughput improvement (#35127) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-02-24 04:13:11 -08:00
Zhengxu Chen	c77f3e1207	[compile] Save aot compile artifacts atomically. (#35117) Signed-off-by: zhxchen17 <zhxchen17@fb.com>	2026-02-24 04:11:01 -08:00
Dor Huri	012dee9233	[Feature] Add LoRA tower/connector support for Llama 4 Vision (mllama4) (#35147) Signed-off-by: dorhuri123 <dor.huri1@live.biu.ac.il> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2026-02-24 04:10:32 -08:00
Tugsbayasgalan Manlaibaatar	f1c664545b	Make voxtral compile friendly (#33959) Signed-off-by: Tugsbayasgalan Manlaibaatar <tmanlaibaatar@fb.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>	2026-02-24 09:33:35 +01:00
Xin Yang	c870eb9e0f	[LoRA] Update LoRA expand kernel block_n calculation (#32621) Signed-off-by: Xin Yang <xyangx@amazon.com>	2026-02-23 23:17:53 -08:00
BadrBasowid	6af03f2394	[Refactor] [1/N] Reorganize kernel abstraction directory (#34055) Signed-off-by: BadrBasowid <badr.basowid@gmail.com> Co-authored-by: vllmellm <vllm.ellm@embeddedllm.com> Co-authored-by: TJian <tunjian.tan@embeddedllm.com>	2026-02-24 06:47:22 +00:00
Vlad Tiberiu Mihailescu	1a6cf39dec	[CI/Build] Remove redundant OpenTelemetry pip install from CI configs (#35032) Signed-off-by: Vlad Mihailescu <vtmihailescu@gmail.com>	2026-02-23 22:24:11 -08:00
Nicolò Lucchesi	f91808ae0d	[MM] Allow audio chunking for offline LLM (#34628) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-02-23 21:04:28 -08:00
Vadim Gimpelson	33a0d43c71	[BUGFIX][Qwen3.5] Hardcode `mlp.gate` as not quantizable (#35156) Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>	2026-02-23 19:42:24 -08:00
pschlan-amd	80d93fd6da	gpu_model_runner: Cache is_encoder_decoder from model config (#35099) Signed-off-by: Patrick Schlangen <pschlan@amd.com>	2026-02-23 19:08:34 -08:00
Jia Guo	ec85340531	[Quantization] Support FP8 MoE bias for models like GPT-OSS (#34906) Signed-off-by: jasperjiaguo <jasperg662@gmail.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-02-23 19:07:47 -08:00
Rohan Potdar	2ff4e51152	[ROCm] AITER fused RoPE+KVCache (#33443) Signed-off-by: Rohan138 <rohanpotdar138@gmail.com> Signed-off-by: charlifu <charlifu@amd.com> Signed-off-by: Rohan Potdar <66227218+Rohan138@users.noreply.github.com> Co-authored-by: charlifu <charlifu@amd.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Douglas Lehr <91553416+dllehr-amd@users.noreply.github.com>	2026-02-23 19:06:00 -08:00
Asaf Gardin	95642441d0	[Mamba1] - Change supports_update_block_table to True (#35054) Signed-off-by: Josephasafg <ajgard7@gmail.com>	2026-02-23 19:05:57 -08:00
Xin Yang	a7c9f7b7ec	[Bugfix] Fix lora_ids in FusedMoE LoRA test (#35135) Signed-off-by: Xin Yang <xyangx@amazon.com>	2026-02-23 21:49:25 -05:00
Michael Goin	a4bd661fb3	[Perf] Enable FlashInfer DeepGEMM swapAB on SM90 by default (#34924) Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2026-02-23 17:34:41 -08:00
Michael Goin	3ef9fd0f98	[Bugfix] Fix DSV3 kernels breaking _C and _moe_C on unsupported arches (#35123) Signed-off-by: mgoin <mgoin64@gmail.com>	2026-02-23 17:11:27 -08:00
Michael Goin	22a97e6613	[Perf] Improve default triton fused moe configs (#34846) Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2026-02-23 16:01:28 -08:00
Aaron Hao	596ed1f02e	[RL] Validation for pause_mode='keep' (#34992) Signed-off-by: ahao-anyscale <ahao@anyscale.com>	2026-02-23 16:30:56 -05:00
Nicolò Lucchesi	b8d8b7e934	[Misc] Monitor interface changes (#35113) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-02-23 17:14:51 +00:00
Harry Mellor	28c5e69ba0	Enforce that `model` is the first positional arg when `--served-model-name` is used (#34973) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-23 08:38:05 -08:00
Harry Mellor	864167d376	Fix custom processors that use deleted import for Transformers v5 (#35101) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-23 08:38:00 -08:00
haosdent	a2ba6a5244	[Bugfix] Fix prefix caching for Mamba 'all' mode (Nemotron models) (#34874) Signed-off-by: haosdent <haosdent@gmail.com>	2026-02-23 17:31:51 +01:00
Harry Mellor	c4f38696f7	Use Xet high performance mode for Transformers v5 (#35098) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-23 08:19:30 -08:00

14386 Commits