biondizzle/vllm

Author	SHA1	Message	Date
Nick Hill	b6d5a17298	[Model Runner V2] Fix error-handling (#35063) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-02-26 11:00:19 -08:00
Lucas Wilkinson	5e58bdc711	[Bugfix] Remove erroneous lower bound on LoRA vocab size constraint (#35354) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2026-02-26 18:44:50 +00:00
Runkai Tao	a1f53addb1	[BugFix] Align fused MoE-LoRA kernel config with actual weight shapes (#34396) Signed-off-by: Runkai Tao <rt572@physics.rutgers.edu>	2026-02-26 18:03:10 +00:00
Wentao Ye	05970c772c	[Refactor] Remove dead code for attention benchmark script (#35418) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-02-26 09:53:46 -08:00
Yiliu Dong	d940607629	[Core] Support `min_tokens` with speculative decoding (#32642) Signed-off-by: qianlihuang <yiliu.dong@qq.com> Co-authored-by: qianlihuang <yiliu.dong@qq.com>	2026-02-26 12:31:28 -05:00
Wentao Ye	99c7892c5b	[Perf] Optimize maxsim scores computation for pooling models, 13.9% E2E throughput improvement (#35330) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-02-26 17:14:54 +00:00
hujia177	ec8f943db1	Add GlmOcrConfig for GLM-OCR model type recognition (#34982)	2026-02-26 17:04:42 +00:00
Or Ozeri	f2ad952f40	[BugFix][kv_offload]: Fix kernel block size detection (#35125) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2026-02-26 16:29:34 +00:00
Sage Moore	9e2cabdf9c	[ROCm] Update the torch version in rocm_build.txt to use the official 2.10 release (#34387) Signed-off-by: Sage Moore <sage@neuralmagic.com>	2026-02-26 16:28:45 +00:00
Douglas Lehr	ec8ab9d254	[ROCm] Add dynamic mxfp4 quantization for DeepSeek V2 projection layers (#34157) Signed-off-by: Doug Lehr <douglehr@amd.com> Signed-off-by: Douglas Lehr <91553416+dllehr-amd@users.noreply.github.com> Co-authored-by: Doug Lehr <douglehr@amd.com> Co-authored-by: Rohan Potdar <66227218+Rohan138@users.noreply.github.com> Co-authored-by: Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com>	2026-02-26 10:00:49 -06:00
Wentao Ye	05972ea7e5	[Refactor] Remove dead or duplicate func utils or variables (#35318) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-02-26 10:57:56 -05:00
Jakub Zakrzewski	111d869069	[Model] Add nvidia/llama-nemotron-embed-vl-1b-v2 multimodal embedding model (#35297) Signed-off-by: Jakub Zakrzewski <jzakrzewski@nvidia.com>	2026-02-26 14:17:17 +00:00
stingoChen	7fea7250a4	[Bug] Fix missing <think> tag after tool call in MiniMax 2.1 (#35352) Signed-off-by: 冬马 <chenxinke@cai-inc.com> Co-authored-by: 冬马 <chenxinke@cai-inc.com>	2026-02-26 22:11:07 +08:00
Cyrus Leung	845ee348ef	[Misc] Standardize handling of `mm_processor_kwargs.size` (#35284) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-26 13:05:46 +00:00
Asaf Gardin	ec13e549d3	[Bugfix] Fix uint32 overflow in Mamba selective scan state pointer arithmetic (#35275) Signed-off-by: Josephasafg <ajgard7@gmail.com>	2026-02-26 12:22:06 +00:00
Li-Yongwen	c6ca51598a	[Bugfix] fix device_name for routing replay (#34336) Signed-off-by: liyongwen <1310439159@qq.com>	2026-02-26 12:18:38 +00:00
Yueqian Lin	c0615a296d	[Bugfix] Fix Qwen2.5-Omni and Qwen3-Omni mixed-modality embed regression (#35368) Signed-off-by: linyueqian <linyueqian@outlook.com>	2026-02-26 11:58:23 +00:00
Harry Mellor	01914445b0	Remove `bc-lint` (#35274) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-26 03:01:01 -08:00
Kunshang Ji	5281713e11	[XPU] use fixed UMD version in dockerfile.xpu (#35392) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2026-02-26 18:54:55 +08:00
HZY	32693db8ce	[Bugfix] [Qwen3.5]Fix Qwen3.5 FP8 quantization: tuple shard_id weight loading (#35289) Signed-off-by: daowu.hzy <daowu.hzy@alibaba-inc.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-02-26 18:26:15 +08:00
Akash kaothalkar	e03ddcfbd4	[Hardware][Powerpc]Enable prefix caching and chunked prefill for ppc64le (#35081) Signed-off-by: Akash kaothalkar <akash.kaothalkar@ibm.com> Co-authored-by: Akash kaothalkar <akash.kaothalkar@ibm.com>	2026-02-26 10:21:24 +00:00
Sophie du Couédic	02acd16861	[Benchmarks] Plot benchmark timeline and requests statistics (#35220) Signed-off-by: Sophie du Couédic <sop@zurich.ibm.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2026-02-26 02:17:43 -08:00
Jiangyun Zhu	ab87f85231	[Model] Ring 2.5 (#35102) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>	2026-02-26 02:17:11 -08:00
Krish Gupta	3827c8c55a	[Test] Add tests for n parameter in chat completions API (#35283) Signed-off-by: KrxGu <krishom70@gmail.com> v0.16.1rc0	2026-02-26 09:14:07 +00:00
Kevin McKay	ade81f17fe	[Bugfix][Hardware][AMD] Gate FP4 ops on gfx950 to prevent MI300X crash (#35250) Signed-off-by: c0de128 <kevin.mckay@outlook.com>	2026-02-26 16:11:07 +08:00
Gregory Shtrasberg	6042e66cd5	[ROCm] Add extra step in config initialization to populate custom ops before compilation config init (#34848) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2026-02-26 16:05:40 +08:00
Chaojun Zhang	9f9a675b23	[XPU][8/N] Fix kernel bugs in XPU LoRA and MOE LORA (#34115) Signed-off-by: chzhang <chaojun.zhang@intel.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>	2026-02-26 15:46:44 +08:00
Ofir Zafrir	a07c4c5939	[BugFix][XPU] Fix speculative decoding on Intel XPU due to bug with `IGC_ForceOCLSIMDWidth=16` (#35298) Signed-off-by: Ofir Zafrir <ofir.zafrir@intel.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>	2026-02-26 07:15:16 +00:00
Cyrus Leung	d3a51da92a	[Benchmark] Simplify SLA scan (#35306) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-25 22:35:41 -08:00
Flora Feng	186ea22efe	[Misc][Harmony] Move Responses API only harmony utils to responses/harmony.py (#35339) Signed-off-by: sfeng33 <4florafeng@gmail.com>	2026-02-26 14:35:16 +08:00
Daniele	4a9c07a0a2	[BugFix] anthropic/serving_messages: fix tool call arguments streaming (#34887) Signed-off-by: Daniele Trifirò <dtrifiro@redhat.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>	2026-02-26 05:39:48 +00:00
Jason Li	9d37941017	[torch.compile] Sequence Parallelism threshold compile ranges (#28672) Signed-off-by: jasonlizhengjian <jasonlizhengjian@gmail.com> Signed-off-by: Jason Li <jasonlizhengjian@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2026-02-26 05:00:12 +00:00
Fadi Arafeh	4171ff6dd9	[CPU][Feat] Enable KleidiAI INT8_W4A8 for all input dtypes (#34890) Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com> Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>	2026-02-26 05:00:10 +00:00
Woosuk Kwon	13025e71e8	[Model Runner V2] Add coding style guide (#35325) Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>	2026-02-25 20:42:40 -08:00
Hanjie Qiu	71dfce6aa6	[Kernel] Refactor FlashInfer allreduce for mnnvl backend (#34109) Signed-off-by: hjjq <50634613+hjjq@users.noreply.github.com> Signed-off-by: wzhao18 <wzhao18.sz@gmail.com> Co-authored-by: wzhao18 <wzhao18.sz@gmail.com> Co-authored-by: Wei Zhao <51183510+wzhao18@users.noreply.github.com>	2026-02-26 03:17:20 +00:00
hujiaxin0	2aa4140402	openpangu-vl support video input (#34134) Signed-off-by: hujiaxin <524446785@qq.com> Signed-off-by: Emilie1001 <79921183+Emilie1001@users.noreply.github.com> Co-authored-by: Emilie1001 <79921183+Emilie1001@users.noreply.github.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-02-26 03:08:09 +00:00
Roberto L. Castro	86c3b5a808	[BugFix] Fix fp4 quant kernel on CUDA 12.8 (#35210) Signed-off-by: LopezCastroRoberto <rocastro@redhat.com>	2026-02-25 18:32:50 -08:00
Seungmin Kim	160424a937	[Bugfix] Fix CUDA compatibility path setting for both datacenter and consumer NVIDIA GPUs (#33992) Signed-off-by: Seungmin Kim <8457324+ehfd@users.noreply.github.com> Signed-off-by: Andrew Mello <19512127+88plug@users.noreply.github.com> Co-authored-by: 88plug <19512127+88plug@users.noreply.github.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2026-02-25 18:15:51 -08:00
Lucas Wilkinson	9511a3f8ee	[Bugfix] Fix AttributeError in SMControlContextManager (#35338) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2026-02-25 18:01:10 -08:00
Michael Goin	de527e1cec	[UX] Add `--moe-backend` arg for explicit kernel selection (#33807) Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2026-02-25 17:44:44 -08:00
Yongye Zhu	1976356ee6	[MoE Refactor] MXFP4 Cutlass Experts to MK (#34542) Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>	2026-02-25 17:32:39 -08:00
Michael Goin	cbf8f7028c	[UX] Add `--performance-mode {balanced,interactivity,throughput}` (#34936) Signed-off-by: mgoin <mgoin64@gmail.com>	2026-02-25 17:28:31 -08:00
Ming Yang	6831650c40	[offloader] v2: Hide weight onloading latency via prefetching (#29941) Signed-off-by: Ming Yang <minos.future@gmail.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2026-02-25 17:20:59 -08:00
Andreas Karatzas	ed42507f6d	[ROCm][CI] Amending deletion of AMD mirror (#35322) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-02-25 14:17:56 -08:00
Andreas Karatzas	9571e99945	[ROCm][CI] Extending attention backend coverage for Eagle spec decode tests (#35265) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-02-25 14:16:18 -08:00
Elizabeth Thomas	c97234c08b	fix(mxfp4): Disable monolithic path for TRITON backend with EP (#34270) Signed-off-by: Elizabeth Thomas <email2eliza@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2026-02-25 13:33:42 -08:00
rasmith	b188bab441	[CI][AMD][BugFix] Add torch.cuda.set_device to test_punica_ops so punica kernels execute on same device as tensor (#34985) Signed-off-by: Randall Smith <Randall.Smith@amd.com>	2026-02-25 19:18:00 +00:00
Lucas Wilkinson	15d76f74e2	Revert "[Misc] Enable weights loading tracking for quantized models" (#35309)	2026-02-25 09:20:15 -08:00
Andreas Karatzas	8fd6975479	[ROCm][CI] Disable skinny GEMMs in multimodal tests to fix non-deterministic results (#35049) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-02-25 16:48:37 +00:00
pushkar	5d18bf8b32	[Bugfix] Fix Harmony preamble visibility in Responses API (#32114) Signed-off-by: Pushkar Patel <git@thepushkarp.com> Signed-off-by: pupa <pupa@users.noreply.github.com>	2026-02-25 08:08:16 -08:00

15309 Commits