Matthew Bonanni
a608b4c6c2
[5/N][Attention] Finish eliminating vllm/attention folder (#32064)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2026-01-27 10:02:51 -05:00
Cyrus Leung
dcd80206b7
[Chore] Update type annotation of input_ids in model forward (#33063)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-26 06:02:10 -08:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟
482914849c
[BugFix] LoRA: Support loading base_layer of experts (#31104)
Signed-off-by: Hollow Man <hollowman@opensuse.org>
2026-01-07 14:49:39 +08:00
Jee Jee Li
c3666f56fd
[Misc] Fix Qwen2-MoE shared_expert_gate (#31339)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-12-26 05:10:39 +00:00
Harry Mellor
cf3eacfe58
Standardise get_rope to use rope_parameters["partial_rotary_factor"], not rotary_dim (#30389)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-11 20:45:23 +00:00
Tsukasa OI
73a484caa1
[Model][Quantization] Fix / Add GGUF support for Qwen2 MoE models (#30307)
Signed-off-by: Tsukasa OI <floss_llm@irq.a4lg.com>
2025-12-09 19:13:10 +00:00
Matthew Bonanni
430dd4d9eb
[Attention] Remove imports from vllm/attention/__init__.py (#29342)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-11-26 10:53:15 -07:00
Harry Mellor
a8b70304d6
Update rope_scaling to rope_parameters in preparation for Transformers v5 (#28542)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-19 09:06:36 -08:00
Harry Mellor
97d1c99302
Rename clashing method names for vLLM model protocol (#27583)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-12 19:14:33 -08:00
Harry Mellor
8fcaaf6a16
Update Optional[x] -> x | None and Union[x, y] to x | y (#26633)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-12 09:51:31 -07:00
Jee Jee Li
f0a30a067b
[Bugfix] Fix qwen-moe packed_modules_mapping (#26634)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-10-11 15:21:33 +00:00
bnellnm
47e66c24e2
[Model] Apply shared experts overlap optimization to all models with shared experts (#26145)
Signed-off-by: Bill Nell <bnell@redhat.com>
2025-10-09 11:31:04 -04:00
Harry Mellor
d6953beb91
Convert formatting to use ruff instead of yapf + isort (#26247)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-10-05 07:06:22 -07:00
Woosuk Kwon
1c3ffdbecc
[V0 Deprecation] Remove V0 sampling metadata (#25345)
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
2025-09-21 10:37:11 -07:00
toncao
027d37df38
[Bugfix][Qwen3-Next] add prefixes to shared_expert in qwen3-next and mlp in qwen2moe to successfully load ignored params in quantized models (#24960)
Signed-off-by: toncao <cpatonn@gmail.com>
Co-authored-by: toncao <cpatonn@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-09-18 12:08:50 +08:00
whx
4a9375fe9d
[Model] Pass param prefix to LLMHead (#24862)
Signed-off-by: whx-sjtu <2952154980@qq.com>
2025-09-17 16:01:27 +08:00
Lukas Geiger
de533ab2a1
[Models] Improve iteration over layers (#19497)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
2025-08-29 09:26:34 +08:00
Cyrus Leung
65552b476b
[Misc] Use config definitions from Transformers library (#21913)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-08-08 23:10:51 -07:00
Jee Jee Li
a3a6c695f4
[Misc] Qwen MoE model supports LoRA (#20932)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-07-17 18:32:52 +00:00
Jee Jee Li
a99b9f7dee
[Quantization] add BNB for MixtralForCausalLM (#20893)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-07-14 07:34:34 +00:00
Jee Jee Li
8020e98c9f
[Quantization][1/N] MoE support BNB-Inflight Quantization (#20061)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-07-11 08:01:13 +00:00
Simon Mo
02f0c7b220
[Misc] Add SPDX-FileCopyrightText (#19100)
Signed-off-by: simon-mo <simon.mo@hey.com>
2025-06-03 11:20:17 -07:00
Isotr0py
f07a673eb2
[Misc] Allow AutoWeightsLoader to skip loading weights with specific substr in name (#18358)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-05-19 20:20:12 -07:00
Harry Mellor
26d0419309
Update deprecated type hinting in models (#18132)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-05-14 22:06:50 -07:00
bnellnm
f9c069c85e
Modularize fused experts and integrate PPLX kernels (#15956)
2025-05-14 13:11:54 -07:00
Tao He
60f7624334
Implements dual-chunk-flash-attn backend for dual chunk attention with sparse attention support (#11844)
2025-05-12 19:52:47 -07:00
Aaron Pham
da4e7687b5
[Fix] Support passing args to logger (#17425)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
2025-04-30 08:06:58 -07:00
Woosuk Kwon
b411418ff0
[Chore] Remove Sampler from Model Code (#17084)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-04-24 02:49:33 -07:00
rongfu.leng
5a1e1c8353
[Model] use AutoWeightsLoader for phimoe,qwen2_moe,qwen3_moe (#16203)
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
2025-04-08 04:05:47 -07:00
Tyler Michael Smith
72c62eae5f
[V1] EP/TP MoE + DP Attention (#13931)
2025-03-04 21:27:26 -08:00
Tyler Michael Smith
4f5b059f14
Clean up unused padding_idx variables across many model definitions (#13240)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-03-04 21:27:00 +00:00
Harry Mellor
cdc1fa12eb
Remove unused kwargs from model definitions (#13555)
2025-02-24 17:13:52 -08:00
Russell Bryant
e489ad7a21
[Misc] Add SPDX-License-Identifier headers to python source files (#12628)
- **Add SPDX license headers to python source files**
- **Check for SPDX headers using pre-commit**
commit 9d7ef44c3cfb72ca4c32e1c677d99259d10d4745
Author: Russell Bryant <rbryant@redhat.com>
Date: Fri Jan 31 14:18:24 2025 -0500
Add SPDX license headers to python source files
This commit adds SPDX license headers to python source files as recommended to the project by the Linux Foundation. These headers provide a concise way that is both human and machine readable for communicating license information for each source file. It helps avoid any ambiguity about the license of the code and can also be easily used by tools to help manage license compliance.
The Linux Foundation runs license scans against the codebase to help ensure we are in compliance with the licenses of the code we use, including dependencies. Having these headers in place helps that tool do its job.
More information can be found on the SPDX site:
- https://spdx.dev/learn/handling-license-info/
Signed-off-by: Russell Bryant <rbryant@redhat.com>
commit 5a1cf1cb3b80759131c73f6a9dddebccac039dea
Author: Russell Bryant <rbryant@redhat.com>
Date: Fri Jan 31 14:36:32 2025 -0500
Check for SPDX headers using pre-commit
Signed-off-by: Russell Bryant <rbryant@redhat.com>
---------
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-02-02 11:58:18 -08:00
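The SPDX commit above describes a machine-readable header placed at the top of each Python source file, enforced by a pre-commit hook. A minimal sketch of such a check follows; it is a hypothetical illustration in the spirit of that hook, not vLLM's actual implementation, and it assumes the Apache-2.0 identifier that vLLM uses:

```python
# Hypothetical sketch of an SPDX header check, similar in spirit to the
# pre-commit hook described in the commit above (not vLLM's actual hook).
# The Apache-2.0 identifier is assumed, matching vLLM's license.

SPDX_HEADER = "# SPDX-License-Identifier: Apache-2.0"


def has_spdx_header(path: str) -> bool:
    """Return True if the file's first non-empty line is the SPDX header."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                return line.strip() == SPDX_HEADER
    return False
```

A pre-commit hook would run a check like this over every staged `.py` file and fail the commit if any file lacks the header.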
Cyrus Leung
d848800e88
[Misc] Move print_*_once from utils to logger (#11298)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Maxime Fournioux <55544262+mfournioux@users.noreply.github.com>
Co-authored-by: Maxime Fournioux <55544262+mfournioux@users.noreply.github.com>
2025-01-09 12:48:12 +08:00
youkaichao
c055747867
[model][utils] add extract_layer_index utility function (#10599)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-23 22:22:54 -08:00
youkaichao
eebad39f26
[torch.compile] support all attention backends (#10558)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-22 14:04:42 -08:00
Isotr0py
c4e464333e
[Misc] Add uninitialized params tracking for AutoWeightsLoader (#10327)
Signed-off-by: Isotr0py <2037008807@qq.com>
2024-11-18 09:07:46 +08:00
Roger Wang
643ecf7b11
[V1] Refactor model executable interface for all text-only language models (#10374)
Signed-off-by: Roger Wang <ywang@roblox.com>
2024-11-17 05:18:46 +00:00
youkaichao
f89d18ff74
[6/N] pass whole config to inner model (#10205)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-11 06:41:46 +00:00
youkaichao
1a95f10ee7
[5/N] pass the whole config to model (#9983)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2024-11-09 14:17:28 +08:00
Joe Runde
d58268c56a
[V1] Make v1 more testable (#9888)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2024-11-06 11:57:35 -08:00
Aaron Pham
21063c11c7
[CI/Build] drop support for Python 3.8 EOL (#8464)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
2024-11-06 07:11:55 +00:00
Yongzao
2b5bf20988
[torch.compile] Adding torch compile annotations to some models (#9876)
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2024-11-01 00:25:47 -07:00
Murali Andoorveedu
0f6d7a9a34
[Models] Add remaining model PP support (#7168)
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
Signed-off-by: Murali Andoorveedu <muralidhar.andoorveedu@centml.ai>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-10-04 10:56:58 +08:00
Jee Jee Li
e497b8aeff
[Misc] Skip loading extra bias for Qwen2-MOE GPTQ models (#8329)
2024-09-10 20:59:19 -04:00
afeldman-nm
428dd1445e
[Core] Logprobs support in Multi-step (#7652)
2024-08-29 19:19:08 -07:00
Zijian Hu
f4fc7337bf
[Bugfix] support tie_word_embeddings for all models (#5724)
2024-08-19 20:00:04 -07:00
Dipika Sikka
d3bdfd3ab9
[Misc] Update Fused MoE weight loading (#7334)
2024-08-13 14:57:45 -04:00
Cyrus Leung
7025b11d94
[Bugfix] Fix weight loading for Chameleon when TP>1 (#7410)
2024-08-13 05:33:41 +00:00
xuyi
1d2e7fb73f
[Model] Pipeline parallel support for Qwen2 (#6924)
2024-07-31 18:49:51 -07:00