biondizzle/vllm - vllm

Author	SHA1	Message	Date
Andreas Karatzas	89a77b1084	[ROCm][CI] Pin TorchCodec to v0.10.0 for ROCm compatibility (#34447 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com> (cherry picked from commit `4c078fa546`) (cherry picked from commit a976961fb77d38129abf69edd4952101731f2421) [tag: v0.16.0]	2026-02-24 20:30:22 -08:00
Kevin H. Luu	d3c1513f5f	[ci] Use the right tag for CPU arm64 image (#34915 ) Signed-off-by: Kevin H. Luu <khluu000@gmail.com> (cherry picked from commit `a1a2d79442`) (cherry picked from commit 772f70839192262ff01c533d821a11a225d1c00f)	2026-02-24 20:30:13 -08:00
Cyrus Leung	5dbfbc967b	[CI/Build] Fix gRPC version mismatch (#35013 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> (cherry picked from commit `965fe45935`) (cherry picked from commit 90308959295b66049024649fe1273070477f343d)	2026-02-24 20:30:02 -08:00
khluu	c86cdcbcd2	Revert "[Release 2.10] Update to Torch 2.10 - final release (#30525 )" This reverts commit `f97ca67176`.	2026-02-24 20:28:53 -08:00
khluu	3c9496f146	Revert "[Bugfix][ROCm][GPT-OSS] Use old triton_kernels implementation on ROCm if the new API is not available (#34153 )" This reverts commit `55a1baebc5`.	2026-02-24 20:28:45 -08:00
khluu	2d5be1dd5c	release script Signed-off-by: khluu <khluu000@gmail.com>	2026-02-12 02:37:52 -08:00
Michael Goin	7a06e5b05b	[Bugfix] Fix MTP accuracy for GLM-5 (#34385 ) Signed-off-by: mgoin <mgoin64@gmail.com> (cherry picked from commit `ec12d39d44`) [tag: v0.16.0rc3]	2026-02-11 20:54:27 -08:00
Junseo Park	946b2f106c	[Bugfix] send None sentinel on final commit so server properly sends transcription.done (#33963 ) Signed-off-by: pjs102793 <pjs102793@naver.com> Co-authored-by: Nick Hill <nickhill123@gmail.com> (cherry picked from commit `5458eb835d`)	2026-02-11 20:54:14 -08:00
Nick Hill	5e8adb0c49	[Misc] Bump `fastsafetensors` version for latest fixes (#34273 ) Signed-off-by: Nick Hill <nickhill123@gmail.com> (cherry picked from commit `79504027ef`)	2026-02-11 20:54:00 -08:00
Xinyu Dong	9be1ff2d3a	[Bugfix] fix default is_neox_style is True for deepseek (#34353 ) Signed-off-by: dongxinyu03 <dongxinyu03@baidu.com> (cherry picked from commit `be7f3d5d20`)	2026-02-11 20:53:40 -08:00
Jee Jee Li	b3ee90f961	[Model] GLM adaptation (#34124 ) (cherry picked from commit `978a37c823`)	2026-02-11 20:53:11 -08:00
Seiji Eicher	c44d0c6d66	Patch protobuf for CVE-2026-0994 (#34253 ) Signed-off-by: Seiji Eicher <seiji@anyscale.com> Co-authored-by: Kevin H. Luu <khluu000@gmail.com> (cherry picked from commit `5045d5c983`) [tag: v0.16.0rc2]	2026-02-11 02:33:40 -08:00
Kunshang Ji	83db96d8cd	[XPU][9/N] clean up existing ipex code/doc (#34111 ) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com> (cherry picked from commit `cb9574eb85`)	2026-02-11 02:33:27 -08:00
zofia	dbfb79fe45	[XPU][7/N] enable xpu fp8 moe (#34202 ) Signed-off-by: Zhu, Zufang <zufang.zhu@intel.com> (cherry picked from commit `b482f71e9f`)	2026-02-11 02:33:15 -08:00
Roger Wang	b2e1fc3589	[Bugfix][Core] Fix CPU memory leak from Request reference cycle in prefix caching (#34183 ) Signed-off-by: Roger Wang <hey@rogerw.io> (cherry picked from commit `8a5e0e2b2b`)	2026-02-11 02:33:04 -08:00
Gregory Shtrasberg	55a1baebc5	[Bugfix][ROCm][GPT-OSS] Use old triton_kernels implementation on ROCm if the new API is not available (#34153 ) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com> (cherry picked from commit `c60f8e3b49`)	2026-02-11 02:32:52 -08:00
Charlie Fu	e1e9841631	[torch.compile][Fusion] Fix attention fusion pass removing kv_update op. (#33945 ) Signed-off-by: charlifu <charlifu@amd.com> (cherry picked from commit `bb9f97308d`)	2026-02-11 02:32:41 -08:00
zofia	5bd63387c3	[XPU][6/N] add xpu scaled_mm kernel (#34117 ) Signed-off-by: Zhu, Zufang <zufang.zhu@intel.com> (cherry picked from commit `9bdb06b436`)	2026-02-11 02:32:27 -08:00
wang.yuqi	22b64948f6	[Frontend][last/5] Make pooling entrypoints request schema consensus. (#31127 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io> [tag: v0.16.0rc1]	2026-02-09 06:42:38 +00:00
Reagan Lee	7c233dbb36	[Tiny] Rename encoder budget file to more specific name (#34103 ) Signed-off-by: Reagan Lee <“reaganjlee@gmail.com”> Co-authored-by: Reagan Lee <“reaganjlee@gmail.com”>	2026-02-09 03:48:19 +00:00
kourosh hakhamaneshi	a75a5b54c7	[bug-fix] supported_tasks is breaking backward compatibility at init_app_state (#34027 ) Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com> Signed-off-by: kourosh hakhamaneshi <31483498+kouroshHakha@users.noreply.github.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2026-02-09 09:46:46 +08:00
Andrey Talman	f97ca67176	[Release 2.10] Update to Torch 2.10 - final release (#30525 )	2026-02-08 13:51:09 -08:00
danisereb	084aa19f02	Add support for ModelOpt MXFP8 dense models (#33786 ) Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>	2026-02-08 11:16:48 -08:00
navmarri14	1ecfabe525	glm 4.6 fused tuned inference config for B200 (#32958 )	2026-02-08 18:55:47 +00:00
Richard Zou	4df841fe75	[torch.compile] Add an option to force-enable the MOE cold start optimization (#33735 ) Signed-off-by: Richard Zou <zou3519@gmail.com>	2026-02-08 18:42:56 +00:00
TomerBN-Nvidia	a263aa6140	[BugFix] Change support no act and mul for marlin (#34088 ) Signed-off-by: Tomer Natan <tbarnatan@computelab-frontend-8.nvidia.com> Co-authored-by: Tomer Natan <tbarnatan@computelab-frontend-8.nvidia.com>	2026-02-08 17:18:22 +00:00
aabbccddwasd	179ae7da8f	[Revert] Fix performance regression for GLM-4.7-GPTQ decode and MTP acceptance rate (#33771 ) Signed-off-by: aabbccddwasd <aabbccddwasd@qq.com>	2026-02-08 08:13:24 -08:00
Reagan Lee	c4df59ad43	Add embedding input functionality for disabled modalities [remake] (#32493 ) Signed-off-by: Reagan Lee <“reaganjlee@gmail.com”> Signed-off-by: Reagan Lee <reaganjlee@gmail.com> Signed-off-by: Reagan Lee <96998476+reaganjlee@users.noreply.github.com> Co-authored-by: Reagan Lee <“reaganjlee@gmail.com”> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-02-08 04:57:16 -08:00
TJian	785cf28fff	[ROCm] [CI] Reduce Resource of two test groups (#34059 ) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>	2026-02-08 15:17:26 +08:00
Nick Hill	a96197f564	[Perf] Simplify DeepseekV32 tokenizer, ensure fast detokenization used (#33855 ) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-02-08 07:16:34 +00:00
Andreas Karatzas	ab10d79855	[ROCm][Bugfix] fix act_quant_fusion module import error (#34069 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-02-07 19:21:12 -08:00
Cyrus Leung	7fcb705b80	[CI/Build] Skip GCS test (#34057 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-07 08:52:38 -08:00
Cyrus Leung	b956cdf818	[Doc] Fix run_batch docs (#34056 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-07 06:18:16 -08:00
Hashem Hashemi	ed17f54c8b	Perf tuning and expansion of cases covered for wvSplitKrc (#33493 ) Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com>	2026-02-07 05:33:11 -08:00
Jiang Wu	860981d8d8	Make directory exist ok for ray spinning up multiple replicas on a single instance (#33604 ) Signed-off-by: Jiang Wu <jwu@cclgroup.com>	2026-02-07 05:30:49 -08:00
zifeitong	52181baaea	Update DeepGEMM version pin in Dockerfile to match #32479 (#33935 ) Signed-off-by: Zifei Tong <zifeitong@gmail.com> Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2026-02-07 05:30:22 -08:00
Rohan Potdar	de3869bb4d	move checks out of `unified_kv_cache_update` custom op (#33943 ) Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>	2026-02-07 05:30:09 -08:00
whx	ce9b3cd3e9	[PluggableLayer][3/N] Apply PluggableLayer to mamba layers. (#33660 ) Signed-off-by: whx-sjtu <2952154980@qq.com>	2026-02-07 05:26:05 -08:00
Jee Jee Li	db4ede9743	[Model] Enable Step3p5ForCausalLM testing (#33755 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2026-02-07 05:25:24 -08:00
Pooya Davoodi	2cb2340f7a	[Frontend]Add support for transcriptions and translations to run_batch (#33934 ) Signed-off-by: Pooya Davoodi <pooya.davoodi@parasail.io> Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2026-02-07 05:24:57 -08:00
TundeAtSN	4df44c16ba	Enable Eagle3 speculative decoding for Mistral3ForConditionalGeneration to support eagle3 (#33939 ) Signed-off-by: Akintunde Oladipo <akintunde.oladipo@servicenow.com> Signed-off-by: TundeAtSN <akintunde.oladipo@servicenow.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-02-07 05:24:52 -08:00
Richard Zou	81fe69cae5	[torch.compile] Stop compiling identical artifacts (#34003 ) Signed-off-by: Richard Zou <zou3519@gmail.com>	2026-02-07 05:24:48 -08:00
Mohammad Miadh Angkad	dd6a6e1190	[Kernel] Add KernelConfig flag to enable/disable FlashInfer autotune (#34006 ) Signed-off-by: Mohammad Miadh Angkad <176301910+mmangkad@users.noreply.github.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2026-02-07 05:24:44 -08:00
Cyrus Leung	edb359cce4	[Renderer] Define `render_cmpl` and `render_chat` (#34039 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-07 05:24:40 -08:00
wang.yuqi	6ed5eda300	[CI][Build] Pin grpcio-tools==1.78.0 (#34048 ) Signed-off-by: wang.yuqi <noooop@126.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2026-02-07 05:24:35 -08:00
Cyrus Leung	11a4c9d30d	[Misc] Simplify `get_max_tokens` (#34036 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-07 00:59:49 -08:00
lukec	15a0b9e570	Fix spelling errors (#33978 )	2026-02-06 23:58:50 -08:00
Andreas Karatzas	c490d8cc73	[ROCm][CI] Pinning lm-eval version to resolve multi-modal small eval bug (#34038 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-02-06 22:21:08 -08:00
Cyrus Leung	48312e579a	[Misc] Make `PlaceholderRange.get_num_embeds` a method (#34035 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-07 05:30:17 +00:00
Vel	bc32444b23	[Kernel] Add enable_sm120_or_later for SM121 (DGX Spark) CUTLASS support (#33517 ) Signed-off-by: code4me2 <velvetmoon222999@gmail.com>	2026-02-06 20:28:01 -08:00

13751 Commits