biondizzle/vllm - vllm

Author	SHA1	Message	Date
Hari	a3e2e250f0	[Feature] Add Azure Blob Storage support for RunAI Model Streamer (#34614 ) Signed-off-by: hasethuraman <hsethuraman@microsoft.com>	2026-03-15 19:38:21 +08:00
Isotr0py	6590a3ecda	[Frontend] Remove `torchcodec` from audio dependency (#37061 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-03-15 05:15:59 +00:00
arlo	8c29042bb9	[Feature] Add InstantTensor weight loader (#36139 )	2026-03-14 18:05:23 +01:00
seanmamasde	84868e4793	[Bugfix][Frontend] Fix audio transcription for MP4, M4A, and WebM formats (#35109 ) Signed-off-by: seanmamasde <seanmamasde@gmail.com>	2026-03-14 08:44:03 -07:00
Yanan Cao	236de72e49	[CI] Pin helion version (#37012 ) Signed-off-by: Yanan Cao <gmagogsfm@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-13 23:25:29 -04:00
Li, Jiang	092ace9e3a	[UX] Improve UX of CPU backend (#36968 ) Signed-off-by: jiang1.li <jiang1.li@intel.com> Signed-off-by: Li, Jiang <bigpyj64@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-03-14 09:27:29 +08:00
Simo Lin	572c776bfb	build: update smg-grpc-servicer to use vllm extra (#36938 ) Signed-off-by: Simo Lin <linsimo.mark@gmail.com>	2026-03-13 01:31:36 +00:00
Chang Su	507ddbe992	feat(grpc): extract gRPC servicer into smg-grpc-servicer package, add --grpc flag to vllm serve (#36169 ) Signed-off-by: Chang Su <chang.s.su@oracle.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2026-03-10 03:29:59 -07:00
Andrii Skliar	5d199ac8f2	Support Audio Extraction from MP4 Video for Nemotron Nano VL (#35539 ) Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> Signed-off-by: Andrii Skliar <askliar@nvidia.com> Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com> Signed-off-by: Andrii <askliar@nvidia.com> Co-authored-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> Co-authored-by: Andrii Skliar <askliar@oci-nrt-cs-001-vscode-01.cm.cluster> Co-authored-by: Andrii <askliar@nvidia.com> Co-authored-by: root <root@pool0-03748.cm.cluster> Co-authored-by: Roger Wang <hey@rogerw.io> Co-authored-by: root <root@pool0-02416.cm.cluster> Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com> Co-authored-by: root <root@pool0-04880.cm.cluster>	2026-03-03 23:20:33 -08:00
Lucas Wilkinson	8b5014d3dd	[Attention] FA4 integration (#32974 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2026-03-01 23:44:57 +00:00
Ma Jian	90805ff464	[CI/Build] CPU release supports both of AVX2 and AVX512 (#35466 ) Signed-off-by: jiang1.li <jiang1.li@intel.com> Co-authored-by: jiang1.li <jiang1.li@intel.com>	2026-02-28 04:35:21 +00:00
Sophie du Couédic	02acd16861	[Benchmarks] Plot benchmark timeline and requests statistics (#35220 ) Signed-off-by: Sophie du Couédic <sop@zurich.ibm.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2026-02-26 02:17:43 -08:00
Nick Hill	79504027ef	[Misc] Bump `fastsafetensors` version for latest fixes (#34273 ) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-02-11 00:30:09 -08:00
emricksini-h	325ab6b0a8	[Feature] OTEL tracing during loading (#31162 )	2026-02-05 16:59:28 -08:00
Michael Goin	d0cbac5827	[Dev UX] Add auto-detection for VLLM_PRECOMPILED_WHEEL_VARIANT during install (#32948 ) Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Shengqi Chen <i@harrychen.xyz>	2026-01-23 19:15:17 -08:00
Lucas Wilkinson	889722f3bf	[FlashMLA] Update FlashMLA to expose new arguments (#32810 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2026-01-21 22:02:39 -07:00
Yanan Cao	9d1e611f0e	[CI] Add Helion as an optional dependency (#32482 ) Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>	2026-01-19 19:09:56 +00:00
Isotr0py	cee7436a26	[Misc] Make `scipy` as optional audio/benchmark dependency (#32096 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-01-11 00:18:57 -08:00
TJian	7a05d2dc65	[CI] [ROCm] Fix `tests/entrypoints/test_grpc_server.py` on ROCm (#31970 ) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>	2026-01-09 12:54:20 +08:00
Chang Su	791b2fc30a	[grpc] Support gRPC server entrypoint (#30190 ) Signed-off-by: Chang Su <chang.s.su@oracle.com> Signed-off-by: njhill <nickhill123@gmail.com> Signed-off-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: njhill <nickhill123@gmail.com> Co-authored-by: Simon Mo <simon.mo@hey.com>	2026-01-07 23:24:46 -08:00
RickyChen / 陳昭儒	b3a2bdf1ac	[Feature] Add offline FastAPI documentation support for air-gapped environments (#30184 ) Signed-off-by: rickychen-infinirc <ricky.chen@infinirc.com> Signed-off-by: RickyChen / 陳昭儒 <ricky.chen@infinirc.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-29 16:22:39 +00:00
Andreas Karatzas	0247a91e00	[ROCm][CI] Fix entrypoints tests and Python-only installation test on ROCm (#28979 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2025-12-23 22:42:30 -08:00
Noa Neria	6366c098d7	Validating Runai Model Streamer Integration with S3 Object Storage (#29320 ) Signed-off-by: Noa Neria <noa@run.ai>	2025-12-04 18:04:43 +08:00
Shengqi Chen	1109f98288	[CI] fix docker image build by specifying merge-base commit id when downloading pre-compiled wheels (#29930 ) Signed-off-by: Shengqi Chen <harry-chen@outlook.com>	2025-12-03 14:08:19 -08:00
Amr Mahdi	f5d3d93c40	[docker] Build CUDA kernels in separate Docker stage for faster rebuilds (#29452 ) Signed-off-by: Amr Mahdi <amrmahdi@meta.com>	2025-12-03 11:41:53 +00:00
Shengqi Chen	4b612664fd	[CI] Renovation of nightly wheel build & generation (take 2) (#29838 ) Signed-off-by: Shengqi Chen <harry-chen@outlook.com>	2025-12-01 22:17:10 -08:00
Kevin H. Luu	1336a1ea24	Revert #29787 and #29690 (#29815 )	2025-12-01 13:42:03 -08:00
Shengqi Chen	37593deb02	[CI] fix url-encoding behavior in nightly metadata generation (#29787 ) Signed-off-by: Shengqi Chen <harry-chen@outlook.com>	2025-12-01 23:17:20 +08:00
Shengqi Chen	36db0a35e4	[CI] Renovation of nightly wheel build & generation (#29690 ) Signed-off-by: Shengqi Chen <harry-chen@outlook.com>	2025-12-01 21:25:39 +08:00
Ralf Gommers	7c1ed45848	[CI/Build]: make it possible to build with a free-threaded interpreter (#29241 ) Signed-off-by: Ralf Gommers <ralf.gommers@gmail.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-11-28 15:21:46 -08:00
yihong	2d4978a57e	fix: clean up function never use in setup.py (#29061 ) Signed-off-by: yihong0618 <zouzou0208@gmail.com>	2025-11-22 05:00:04 -08:00
Varun Sundar Rabindranath	9912b8ccb8	[Build] Add OpenAI triton_kernels (#28788 ) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-11-18 16:45:20 -08:00
Johnny Yang	fdfd5075aa	[TPU] patch TPU wheel build script to resolve metadata issue (#27279 ) Signed-off-by: Johnny Yang <johnnyyang@google.com>	2025-11-13 09:36:54 -08:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟	4ca5cd5740	[Core][AMD] Migrate fully transparent sleep mode to ROCm platform (#12695 ) Signed-off-by: Hollow Man <hollowman@opensuse.org> Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com> Co-authored-by: kliuae <kuanfu.liu@embeddedllm.com>	2025-11-12 15:24:12 -08:00
Benjamin Bartels	17d055f527	[Feat] Adds runai distributed streamer (#27230 ) Signed-off-by: bbartels <benjamin@bartels.dev> Signed-off-by: Benjamin Bartels <benjamin@bartels.dev> Co-authored-by: omer-dayan <omdayan@nvidia.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-10-29 21:09:10 -07:00
Cyrus Leung	ecca3fee76	[Frontend] Add `vllm bench sweep` to CLI (#27639 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-29 05:59:48 -07:00
Michael Goin	7ef6052804	[CI/Build] Add tool to build vllm-tpu wheel (#19165 ) Signed-off-by: mgoin <michael@neuralmagic.com> Signed-off-by: mgoin <mgoin64@gmail.com>	2025-10-12 16:25:40 -06:00
Michael Goin	c9d33c60dc	[UX] Add FlashInfer as default CUDA dependency (#26443 ) Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-10-09 14:10:02 -07:00
elvischenv	5e49c3e777	Bump Flashinfer to v0.4.0 (#26326 ) Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>	2025-10-08 23:58:44 -07:00
Harry Mellor	d6953beb91	Convert formatting to use `ruff` instead of `yapf` + `isort` (#26247 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-05 07:06:22 -07:00
Fadi Arafeh	9705fba7b7	[cpu][perf] Accelerate unquantized-linear for AArch64 through oneDNN/ACL and weight prepack (#25948 ) Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com> Co-authored-by: Li, Jiang <jiang1.li@intel.com>	2025-10-04 12:16:38 +08:00
Yongye Zhu	fa7e254a7f	[New Model] DeepSeek-V3.2 (Rebased to Main) (#25896 ) Signed-off-by: Chen Zhang <zhangch99@outlook.com> Signed-off-by: youkaichao <youkaichao@gmail.com> Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: NickLucche <nlucches@redhat.com> Signed-off-by: Yongye Zhu <zyy1102000@gmail.com> Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com> Signed-off-by: Lucia Fang <fanglu@meta.com> Co-authored-by: Chen Zhang <zhangch99@outlook.com> Co-authored-by: youkaichao <youkaichao@gmail.com> Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: yewentao256 <zhyanwentao@126.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: mgoin <mgoin64@gmail.com> Co-authored-by: Lucia Fang <116399278+luccafong@users.noreply.github.com> Co-authored-by: Lucia Fang <fanglu@meta.com> Co-authored-by: NickLucche <nlucches@redhat.com> Co-authored-by: Siyuan Fu <siyuanf@nvidia.com> Co-authored-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Xiaozhu Meng <mxz297@gmail.com> Co-authored-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>	2025-09-30 17:14:41 +08:00
Cyrus Leung	d346ec695e	[CI/Build] Consolidate model loader tests and requirements (#25765 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-09-26 21:45:20 -07:00
Simon Mo	fd2f10546c	[ci] fix wheel names for arm wheels (#24898 ) Signed-off-by: simon-mo <simon.mo@hey.com>	2025-09-15 14:39:08 -07:00
Benjamin Bartels	94b03f88dd	Bump Flashinfer to 0.3.1 (#24868 ) Signed-off-by: bbartels <benjamin@bartels.dev>	2025-09-15 12:45:55 -07:00
pwschuurman	4377b1ae3b	[Bugfix] Update Run:AI Model Streamer Loading Integration (#23845 ) Signed-off-by: Omer Dayan (SW-GPU) <omer@run.ai> Signed-off-by: Peter Schuurman <psch@google.com> Co-authored-by: Omer Dayan (SW-GPU) <omer@run.ai> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-09-09 21:37:17 -07:00
Woosuk Kwon	4172235ab7	[V0 deprecation] Deprecate V0 Neuron backend (#21159 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-09-06 16:15:18 -07:00
Po-Han Huang (NVIDIA)	78336a0c3e	Upgrade FlashInfer to v0.3.0 (#24086 ) Signed-off-by: Po-Han Huang <pohanh@nvidia.com> Co-authored-by: Simon Mo <simon.mo@hey.com>	2025-09-04 09:49:20 -07:00
weiliang	ae067888d6	Update Flashinfer to 0.2.14.post1 (#23537 ) Signed-off-by: Siyuan Fu <siyuanf@nvidia.com> Signed-off-by: siyuanf <siyuanf@nvidia.com> Signed-off-by: Weiliang Liu <weiliangl@nvidia.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Siyuan Fu <siyuanf@nvidia.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-08-25 18:30:44 -07:00
Daifeng Li	fa78de9dc3	Quantization: support FP4 quantized models on AMD CDNA2/CDNA3 GPUs (#22527 ) Signed-off-by: feng <fengli1702@gmail.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-08-22 20:53:21 -06:00

235 Commits