biondizzle/vllm - vllm - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Jason Li	9d37941017	[torch.compile] Sequence Parallelism threshold compile ranges (#28672 ) Signed-off-by: jasonlizhengjian <jasonlizhengjian@gmail.com> Signed-off-by: Jason Li <jasonlizhengjian@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2026-02-26 05:00:12 +00:00
Fadi Arafeh	4171ff6dd9	[CPU][Feat] Enable KleidiAI INT8_W4A8 for all input dtypes (#34890 ) Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com> Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>	2026-02-26 05:00:10 +00:00
Woosuk Kwon	13025e71e8	[Model Runner V2] Add coding style guide (#35325 ) Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>	2026-02-25 20:42:40 -08:00
Hanjie Qiu	71dfce6aa6	[Kernel] Refactor FlashInfer allreduce for mnnvl backend (#34109 ) Signed-off-by: hjjq <50634613+hjjq@users.noreply.github.com> Signed-off-by: wzhao18 <wzhao18.sz@gmail.com> Co-authored-by: wzhao18 <wzhao18.sz@gmail.com> Co-authored-by: Wei Zhao <51183510+wzhao18@users.noreply.github.com>	2026-02-26 03:17:20 +00:00
hujiaxin0	2aa4140402	openpangu-vl support video input (#34134 ) Signed-off-by: hujiaxin <524446785@qq.com> Signed-off-by: Emilie1001 <79921183+Emilie1001@users.noreply.github.com> Co-authored-by: Emilie1001 <79921183+Emilie1001@users.noreply.github.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-02-26 03:08:09 +00:00
Roberto L. Castro	86c3b5a808	[BugFix] Fix fp4 quant kernel on CUDA 12.8 (#35210 ) Signed-off-by: LopezCastroRoberto <rocastro@redhat.com>	2026-02-25 18:32:50 -08:00
Seungmin Kim	160424a937	[Bugfix] Fix CUDA compatibility path setting for both datacenter and consumer NVIDIA GPUs (#33992 ) Signed-off-by: Seungmin Kim <8457324+ehfd@users.noreply.github.com> Signed-off-by: Andrew Mello <19512127+88plug@users.noreply.github.com> Co-authored-by: 88plug <19512127+88plug@users.noreply.github.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2026-02-25 18:15:51 -08:00
Lucas Wilkinson	9511a3f8ee	[Bugfix] Fix AttributeError in SMControlContextManager (#35338 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2026-02-25 18:01:10 -08:00
Michael Goin	de527e1cec	[UX] Add `--moe-backend` arg for explicit kernel selection (#33807 ) Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2026-02-25 17:44:44 -08:00
Yongye Zhu	1976356ee6	[MoE Refactor] MXFP4 Cutlass Experts to MK (#34542 ) Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>	2026-02-25 17:32:39 -08:00
Michael Goin	cbf8f7028c	[UX] Add `--performance-mode {balanced,interactivity,throughput}` (#34936 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2026-02-25 17:28:31 -08:00
Ming Yang	6831650c40	[offloader] v2: Hide weight onloading latency via prefetching (#29941 ) Signed-off-by: Ming Yang <minos.future@gmail.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2026-02-25 17:20:59 -08:00
Andreas Karatzas	ed42507f6d	[ROCm][CI] Amending deletion of AMD mirror (#35322 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-02-25 14:17:56 -08:00
Andreas Karatzas	9571e99945	[ROCm][CI] Extending attention backend coverage for Eagle spec decode tests (#35265 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-02-25 14:16:18 -08:00
Elizabeth Thomas	c97234c08b	fix(mxfp4): Disable monolithic path for TRITON backend with EP (#34270 ) Signed-off-by: Elizabeth Thomas <email2eliza@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2026-02-25 13:33:42 -08:00
rasmith	b188bab441	[CI][AMD][BugFix] Add torch.cuda.set_device to test_punica_ops so punica kernels execute on same device as tensor (#34985 ) Signed-off-by: Randall Smith <Randall.Smith@amd.com>	2026-02-25 19:18:00 +00:00
Lucas Wilkinson	15d76f74e2	Revert "[Misc] Enable weights loading tracking for quantized models" (#35309 )	2026-02-25 09:20:15 -08:00
Andreas Karatzas	8fd6975479	[ROCm][CI] Disable skinny GEMMs in multimodal tests to fix non-deterministic results (#35049 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-02-25 16:48:37 +00:00
pushkar	5d18bf8b32	[Bugfix] Fix Harmony preamble visibility in Responses API (#32114 ) Signed-off-by: Pushkar Patel <git@thepushkarp.com> Signed-off-by: pupa <pupa@users.noreply.github.com>	2026-02-25 08:08:16 -08:00
haosdent	0788ff0a15	[Bugfix] Gracefully disable AllReduceFusionPass on GPUs without multicast support (#35085 ) Signed-off-by: haosdent <haosdent@gmail.com>	2026-02-25 07:31:45 -08:00
Chendi.Xue	d72b0be33c	[XPU]Fix for Qwen-OMNI crash (#35249 ) Signed-off-by: Chendi Xue <chendi.xue@intel.com>	2026-02-25 07:31:07 -08:00
Bhoomit	42489e43c2	[Misc][LoRA] Increase max vocab size limit to 258048 in logits processor (#34773 ) Signed-off-by: Bhoomit Vasani <vbhoomit@amazon.com>	2026-02-25 23:30:55 +08:00
Mario Hong	af5e6afa0a	[Bugfix] Fix step3p5 reasoning with interleaved thinking (#34211 ) Signed-off-by: mariohong <mariohong128@gmail.com> Co-authored-by: Chauncey <chaunceyjiang@gmail.com>	2026-02-25 15:13:01 +00:00
Benjamin Chislett	ee59a7c615	[Tests] Add GSM8k check to SpecDec E2E tests (#34772 ) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>	2026-02-25 07:51:14 -05:00
Joao Gante	709eadbb0b	Doc link typo (#35281 ) Signed-off-by: Joao Gante <joaofranciscocardosogante@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-02-25 03:00:31 -08:00
Harry Mellor	90fc7f9109	Fix custom processors that use deleted behaviour for Transformers v5 (#35107 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-25 02:36:21 -08:00
Yanwen Lin	675ec59aa9	[Bugfix][CPU] Fix basic unit tests failing in CPU platforms (#34677 ) Signed-off-by: Yanwen Lin <lyw1124278064@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-25 08:36:15 +00:00
Yanwen Lin	80e60a6133	[Doc] Suggest "--managed-python" flag when installing python using uv (#33069 ) Signed-off-by: Yanwen Lin <lyw1124278064@gmail.com>	2026-02-25 08:19:43 +00:00
jonoillar	26e722f906	[DOC][BugFix] Specfiy build dependency installation (#34513 ) Signed-off-by: Jon OILLARBURU <jon.oillarburu@multiversecomputing.com> Co-authored-by: Jon OILLARBURU <jon.oillarburu@multiversecomputing.com>	2026-02-25 08:04:06 +00:00
lichuang	2c619e5e3f	[Docs]Fix documentation formatting in architecture overview (#34679 ) Signed-off-by: codedump <lichuang1982@gmail.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-25 08:00:15 +00:00
Simon Mo	8a685be8d9	docs: document committer proposal process in governance (#35225 ) Signed-off-by: Simon Mo <simon.mo@hey.com> Co-authored-by: Cursor <cursoragent@cursor.com>	2026-02-25 07:58:48 +00:00
Laura Wang	2465071510	[Perf] Add opt-in SM100 Oink RMSNorm custom-op path (#31828 ) Signed-off-by: Laura Wang <3700467+Laurawly@users.noreply.github.com> Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>	2026-02-24 23:01:53 -08:00
wenshuai	cd43673668	[Perf] Optimize FP8 gemm of sm120. (#34424 ) Signed-off-by: wenshuai <wenshuai@xiaomi.com>	2026-02-24 22:25:24 -08:00
Xinyu Chen	35d44b4557	[XPU]Support CUDAGraph on XPU Platform (#34482 ) Signed-off-by: Xinyu Chen <xinyu1.chen@intel.com> Co-authored-by: chzhang <chaojun.zhang@intel.com> Co-authored-by: zhenwei-intel <zhenwei.liu@intel.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>	2026-02-24 22:22:52 -08:00
Kunshang Ji	8ad54a991b	[Platform] Add current_platform.num_compute_units interface (#35042 ) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com> Signed-off-by: Kunshang Ji <jikunshang95@gmail.com>	2026-02-24 22:22:49 -08:00
Kunshang Ji	92510edc32	remove cuda check in `top_k_top_p_triton` kernel (#35011 ) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2026-02-24 22:22:31 -08:00
Isotr0py	a6c137521c	[Misc] Add shard_id validation for MergedColumnLinear (#35055 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-02-24 22:12:28 -08:00
Isotr0py	4572a06afe	[Misc] Enable weights loading tracking for quantized models (#35074 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-02-24 22:11:03 -08:00
Zhengxu Chen	5cc29cfb8b	[compile] Improve error message during artifacts load failure. (#35115 ) Signed-off-by: zhxchen17 <zhxchen17@fb.com>	2026-02-24 22:01:09 -08:00
Chen Zhang	8fae54faff	[Linear Attention] fix bug for linear attention + prefix caching + reset_prefix_cache (#35157 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com>	2026-02-24 22:00:19 -08:00
Harry Mellor	f7967577f5	Remove requirement to use `--hf-overrides` for `DeepseekVLV2ForCausalLM` (#35203 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-24 22:00:06 -08:00
pks	af770b8e7b	[Bugfix] Fix AttributeError when passing StructuredOutputsParams to CompletionRequest (#35237 ) Signed-off-by: Patrick Simianer <patrick@lilt.com>	2026-02-24 22:00:03 -08:00
Andreas Karatzas	2ff3e436ad	[Responses][CI] Filter negative token IDs in schema fuzz test to avoid 500 errors (#35231 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-02-25 05:52:44 +00:00
Jhao-Ting Chen	c2c4c4611a	[FIX] fused moe with lora shared expert dual stream (1.07x otps) (#34933 ) Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2026-02-25 04:40:45 +00:00
Rohan Potdar	f38f8c9742	[ROCm]: Enable customop and rope+kvcache fusion for AITER RoPE (#35180 ) Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>	2026-02-25 04:36:40 +00:00
Flora Feng	ec1d30c0f6	[Responses] Decouple SSE event helpers from Harmony context (#35148 ) Signed-off-by: sfeng33 <4florafeng@gmail.com>	2026-02-24 20:05:25 -08:00
Pooya Davoodi	e3b2324ec4	[Frontend] Use init_app_state and FrontendArgs in run_batch (#32967 ) Signed-off-by: Pooya Davoodi <pooya.davoodi@parasail.io> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2026-02-24 19:40:39 -08:00
Nick Hill	dbf0da817a	[Core] Cleanup engine pause/sleep logic (#34528 ) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-02-24 19:33:34 -08:00
Xin Yang	3bbb2046ff	[Bugfix] Fix expert_ids padding values in moe_align_block_size kernel (#35161 ) Signed-off-by: Xin Yang <xyangx@amazon.com>	2026-02-24 17:14:24 -08:00
yugong333	576fe50333	Adding Nemotron fp8 Triton MoE Config (#34674 ) Signed-off-by: Yu Gong <yu3.gong@gmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2026-02-24 15:56:38 -08:00

14228 Commits