biondizzle/vllm - vllm

Author	SHA1	Message	Date
Aleksandr Malyshev	0925b28a8e	[ROCM] MoE fp4 CK kernel (#26545 ) Signed-off-by: Aleksandr Malyshev <maleksan@amd.com> Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>	2025-10-17 14:06:33 -04:00
Luka Govedič	bd7157a071	[torch.compile] Enable attention and allreduce fusion without custom ops enabled (#24604 ) Signed-off-by: Luka Govedič <lgovedic@redhat.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-10-17 08:10:23 -06:00
Reima Karhila (AMD)	c253745eb8	[Harware][AMD][Model] Triton MoE tuning configs for GLM-4.5 for MI350 and MI355 (#25586 ) Signed-off-by: Reima Karhila <reima.karhila@amd.com> Signed-off-by: xaguilar <Xavier.AguilarFruto@amd.com> Co-authored-by: xaguilar <Xavier.AguilarFruto@amd.com>	2025-10-17 04:56:12 -07:00
Harry Mellor	6c9fdbf725	[Docs] Replace `rst` style double-backtick with `md` single-backtick (#27091 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-17 02:47:34 -07:00
Mengqing Cao	e20eba753b	[VLM][Refactor] Remove useless func `get_input_positions` in `MRotaryEmbedding` (#27088 ) Signed-off-by: MengqingCao <cmq0113@163.com>	2025-10-17 02:00:30 -07:00
zhrrr	75c7ad9918	[Kernel][Performance] Fuse float cast and renormalize to topk softmax kernel (#26717 ) Signed-off-by: zhuhaoran <zhuhaoran.zhr@alibaba-inc.com> Signed-off-by: izhuhaoran <izhuhaoran@qq.com>	2025-10-17 07:30:35 +00:00
Boyuan Feng	08405609cc	disable graph partition in custom op (#26952 ) Signed-off-by: Boyuan Feng <boyuan@meta.com> Signed-off-by: Boyuan Feng <fby.1994@gmail.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-10-17 11:08:47 +08:00
Lukas Geiger	4d055ef465	Remove unused imports (#26972 ) Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>	2025-10-16 19:51:17 -07:00
Cyrus Leung	4d4d6bad19	[Chore] Separate out `vllm.utils.importlib` (#27022 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-17 00:48:59 +00:00
jiahanc	41d3071918	[NVIDIA] [Perf] Update to leverage flashinfer trtllm FP4 MOE throughput kernel (#26714 ) Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-10-16 16:20:25 -07:00
Bram Wasti	b2f78cbad4	[small][batch invariance] Rename the env and internal flags to simplify usage (#26855 ) Signed-off-by: Bram Wasti <bwasti@meta.com>	2025-10-16 21:40:25 +00:00
Wentao Ye	b3dda72c23	[Feature] Migrate DeepGEMM API from `get_m_alignment_for_contiguous_layout` to `get_mk_alignment_for_contiguous_layout` (#26935 ) Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-10-16 16:46:48 -04:00
Varun Sundar Rabindranath	fb0571b077	[GPTOSS][DP/EP][Marlin] Enable GPTOSS Batched DP/EP using Marlin kernels (#25997 ) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-10-16 12:53:11 -07:00
Kyle Sayers	a5464dcf92	[Compressed Tensors] Always clone output for compile robustness (#26849 ) Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-10-16 19:29:59 +00:00
Sungjae Lee	00417f4e44	[MISC] fix import violations for re and triton modules (#26654 ) Signed-off-by: Sungjae Lee <33976427+llsj14@users.noreply.github.com> Co-authored-by: Mengqing Cao <cmq0113@163.com>	2025-10-16 03:38:27 -07:00
Cyrus Leung	d2740fafbf	[Chore] Separate out `vllm.utils.collections` (#26990 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-16 08:35:35 +00:00
Bram Wasti	7d8975de84	Deepseek-v3 Batch Invariant on 8xH100 (#26609 ) Signed-off-by: Bram Wasti <bwasti@meta.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2025-10-15 22:06:02 -07:00
Vadim Gimpelson	785d8b6410	[PERF] Qwen3-next MTP speedup (change bool mask indexing to index_select / index_copy to reduce d2h) (#26437 ) Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>	2025-10-16 12:18:31 +08:00
Cyrus Leung	f6cdc9a02f	[Chore] Rename `utils` submodules (#26920 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-16 03:58:13 +00:00
kliuae	1317034379	[ROCm][FEAT] Fuse DeepSeek shared experts into AITER fused_moe ops (#24097 ) Signed-off-by: chenjun <junchen2@amd.com> Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com> Co-authored-by: valarLip <103567126+valarLip@users.noreply.github.com> Co-authored-by: TJian <tunjian.tan@embeddedllm.com>	2025-10-16 10:41:34 +08:00
felixzhu555	f96bc3649c	[Qwen3-Next] Add tuned MoE config for Qwen3-Next FP8 on H100 tp2 (#26887 ) Signed-off-by: Felix Zhu <felixzhu555@gmail.com>	2025-10-15 18:55:05 -07:00
XiaobingZhang	0b99f5d302	support flashinfer_fp4 moe for 5090 gpu (#26669 ) Signed-off-by: XiaobingSuper <xiaobingzhangupc@gmail.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-10-15 15:06:47 -04:00
Kaixi Hou	de92d916fe	[NVIDIA] Add support for cudnn fp4 gemm via flashinfer (#26107 ) Signed-off-by: kaixih <kaixih@nvidia.com> Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-10-15 13:53:00 -04:00
XiaobingZhang	d796375258	[ModelOpt] Remove NVFP4 MoE K%16==0 constraint (#26891 ) Signed-off-by: XiaobingSuper <xiaobingzhangupc@gmail.com>	2025-10-15 13:06:17 -04:00
Cyrus Leung	136a17fe6e	[Chore] Separate out `vllm.utils.func` (#26904 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-15 13:03:58 +00:00
Cyrus Leung	f93e348010	[Misc] Remove `isort` and `yapf` ignores (#26888 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-15 12:09:03 +00:00
wang.yuqi	f54f85129e	[Model][2/N] Improve all pooling task | Support multi-vector retrieval (#25370 ) Signed-off-by: wang.yuqi <noooop@126.com>	2025-10-15 11:14:41 +00:00
Morrison Turnansky	96b9aa5aa0	[Frontend][torch.compile] CompilationConfig Overhaul (#20283 ): name change compilation level to compilation mode, deprecation compilation level (#26355 ) Signed-off-by: morrison-turnansky <mturnans@redhat.com> Signed-off-by: Morrison Turnansky <mturnans@redhat.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-10-15 02:51:16 +00:00
Michael Goin	7e0ef4084a	[CI Failure] Fix torchao dep failure for Quantization Test (#26824 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-10-14 16:41:43 -07:00
Dhruvil Bhatt	0e65818910	Added MoE configs for llama 4, H200 device with tp=4/8 tuning (#26837 ) Signed-off-by: Dhruvil Bhatt <bhattdbh@amazon.com>	2025-10-14 14:21:03 -07:00
Huamin Li	87efc681db	llama4_vision_rope: add HIP override to accept (q, k) and avoid (positions, q, k) mismatch (#26790 ) Signed-off-by: Huamin Li <3ericli@gmail.com>	2025-10-14 11:54:12 -07:00
Ze'ev Klapow	aba48f7db1	[Kernel][MoE] Add MoE tunings for GLM 4.6-FP8 and GLM 4.5 Air on NVidia B200 (#26818 )	2025-10-14 11:20:39 -07:00
Heng Guo	29350922c6	[Feature][Quantization] auto_round format add support for regex (#24024 ) Signed-off-by: n1ck-guo <heng.guo@intel.com> Signed-off-by: Heng Guo <heng.guo@intel.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-10-14 03:03:16 +00:00
Varun Sundar Rabindranath	8ae169286f	[torch.compile] Unwrap fused_marlin_moe custom op (#26739 ) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-10-14 02:22:16 +00:00
Michael Goin	3e051bda82	[UX] Replace VLLM_ALL2ALL_BACKEND with --all2all-backend (#26732 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-10-13 18:12:52 -07:00
Wentao Ye	7200a21cd1	[Bug] Fix Assertion error DeepEP/csrc/kernels/intranode.cu:928: 'false and Unsupported type' (#26532 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-10-13 18:26:37 -04:00
Alex Kogan	89342ce4c0	[Quantization] [Performance] Enable Marlin GEMM kernels for the calibration-free RTN-based quantization (#26051 ) Signed-off-by: Alex Kogan <alex.kogan@oracle.com> Signed-off-by: Alex Kogan <82225080+sakogan@users.noreply.github.com>	2025-10-13 18:52:54 +00:00
Sangyeon Cho	a1b2d658ee	[CI/Build] upgrade compressed-tensors to 0.12.2 to address LGPLv3 (#26501 ) Signed-off-by: Sangyeon Cho <josang1204@gmail.com>	2025-10-13 12:58:33 -04:00
Bram Wasti	3263799056	[unrevert] Add batch invariant kernel override for FlashInfer backend [2/n] (#26373 ) Signed-off-by: Bram Wasti <bwasti@meta.com> Signed-off-by: Bram Wasti <bwasti@fb.com>	2025-10-13 10:24:53 -04:00
bnellnm	60e419c1ee	[Misc] cache result of disable_inplace (#26666 ) Signed-off-by: Bill Nell <bnell@redhat.com>	2025-10-13 00:17:50 +00:00
Harry Mellor	8fcaaf6a16	Update `Optional[x]` -> `x | None` and `Union[x, y]` to `x | y` (#26633 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-12 09:51:31 -07:00
Isotr0py	045b396d09	[Bugfix][CI/Build] Fix failing Mteb CI (#26638 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-10-12 02:42:42 -07:00
Vadim Gimpelson	82e64c7a20	[PERF] [Qwen3-next] Speed up gated RMSNorm (#26207 ) Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com> Signed-off-by: Vadim Gimpelson <156319763+vadiklyutiy@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-10-12 08:27:50 +00:00
dsinghvi	727144bed1	[Refactor]: Use M-RoPE interface directly while defining model class instead of maintaining model specific M-RoPE implementation in mrope.py (#24172 ) Signed-off-by: Divyansh Singhvi <divyanshsinghvi@gmail.com> Signed-off-by: dsinghvi <divyanshsinghvi@gmail.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: wwl2755 <wangwenlong2755@gmail.com>	2025-10-11 07:21:04 +00:00
Roger Wang	ddaff2938e	[MM] Move Qwen3Omni MRoPE impl to model file (#26608 ) Signed-off-by: Roger Wang <hey@rogerw.io> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-10-10 22:17:24 -07:00
Harry Mellor	7c12763b24	Fix some typing issues found by `mypy==1.18.2` (#26596 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-10 18:21:25 +00:00
Xiong Wang	19a9b169bf	Add Qwen3-Omni moe thinker (#25550 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Roger Wang <hey@rogerw.io> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Xiong Wang <feizi.wx@alibaba-inc.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: Roger Wang <hey@rogerw.io> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-10-10 17:00:56 +00:00
Roberto L. Castro	96ad65b7fe	[Transform] [Quantization] Add QuTLASS support to vLLM (#24440 ) Signed-off-by: LopezCastroRoberto <roberto.lopez.castro@udc.es> Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com> Signed-off-by: Andrei Panferov <andrei@panferov.org> Co-authored-by: Andrei Panferov <andrei@panferov.org> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-10-10 09:43:40 -07:00
Elvir Crnčević	7b03584de8	Silu v2 (#25074 ) Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: elvircrn <elvircrn@gmail.com> Signed-off-by: Elvir Crnčević <elvircrn@gmail.com> Co-authored-by: mgoin <mgoin64@gmail.com> Co-authored-by: Varun Sundar Rabindranath <varunsundar08@gmail.com>	2025-10-10 15:19:53 +00:00
Lucas Kabela	213b64452a	[Bugfix] Convert untraceable GroupShape to list for AMD impl (#26535 ) Signed-off-by: Lucas Kabela <lucaskabela@meta.com>	2025-10-10 13:32:29 +00:00

1329 Commits