biondizzle/vllm - vllm - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
rongfu.leng	c68b5c63eb	[Misc] fix olmoe model layer can't laod in tp gt 1 (#18828) Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>	2025-05-28 17:36:21 +00:00
Harry Mellor	4c2b38ce9e	Enable Pydantic mypy checks and convert configs to Pydantic dataclasses (#17599) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-28 12:46:04 +00:00
wang.yuqi	3e9ce609bd	[Bugfix] Fix nomic max_model_len (#18755)	2025-05-27 20:29:53 -07:00
Satyajith Chilappagari	e0cbad4e30	[Neuron] Support quantization on neuron (#18283) Signed-off-by: Satyajith Chilappagari <satchill@amazon.com>	2025-05-27 22:10:33 +00:00
Cyrus Leung	696259ca01	[Core] Automatically cast multi-modal input dtype (#18756) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-27 23:45:48 +08:00
Cyrus Leung	4318c0559d	[CI/Build] Remove imports of built-in `re` (#18750) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-27 09:19:18 +00:00
Hyogeun Oh (오효근)	a68e293cb9	[Doc] Convert Sphinx directives ( `{class}`, `{meth}`, `{attr}`, ...) to MkDocs format for better documentation linking (#18663) Signed-off-by: Zerohertz <ohg3417@gmail.com>	2025-05-27 01:44:20 -07:00
Shawn Huang	6881107948	[BUG FIX] minicpm (#18739) Signed-off-by: huangyuxiang03 <huangyx0321@gmail.com> Co-authored-by: huangyuxiang03 <huangyx0321@gmail.com>	2025-05-27 01:04:49 -07:00
almersawi	a547aeb828	feat(rocm-support): support mamba2 on rocm (#18565) Signed-off-by: Islam Almersawi <islam.almersawi@openinnovation.ai> Co-authored-by: Islam Almersawi <islam.almersawi@openinnovation.ai>	2025-05-27 00:07:53 -07:00
vllmellm	d260f799a9	[FEAT] [ROCm] Upgrade AITER Fused MoE kernels. (#18271) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>	2025-05-26 23:14:07 -07:00
Lukas Geiger	b50602d5f0	[Model][Gemma3] Cast image pixel values already on CPU (#18732) Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>	2025-05-27 05:42:54 +00:00
Isotr0py	1f1b1bc03b	[V1][Quantization] Add CUDA graph compatible v1 GGUF support (#18646) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: Isotr0py <2037008807@qq.com>	2025-05-27 04:40:28 +00:00
Lukas Geiger	0eebd74842	[Model][Gemma3] Simplify image input validation (#18710) Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>	2025-05-27 11:13:37 +08:00
Cyrus Leung	a869baca73	[Bugfix] Fix Llama GGUF initialization (#18717) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-26 07:49:22 -07:00
Cyrus Leung	82e2339b06	[Doc] Move examples and further reorganize user guide (#18666) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-26 07:38:04 -07:00
Naveassaf	6d68030f1c	[Model] Add support for YARN in NemotronNAS models (#18427) Signed-off-by: Nave Assaf <nassaf@nvidia.com>	2025-05-26 10:31:49 +00:00
Maximilien de Bayser	561b77a0d6	[Bugfix] Fix the lm_head in gpt_bigcode in lora mode (#6357) Signed-off-by: Max de Bayser <mbayser@br.ibm.com> Signed-off-by: Max de Bayser <maxdebayser@gmail.com>	2025-05-26 14:52:25 +08:00
Cyrus Leung	57fd13a707	[Bugfix] Fix profiling dummy data for Pixtral (#18677) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-25 14:05:30 +00:00
Cyrus Leung	503f8487c2	[Misc] Reduce logs on startup (#18649) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-24 23:03:53 -07:00
Ning Xie	44073a7ac3	[BUGFIX] catch subclass first for try...except (#18672) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-05-25 05:34:24 +00:00
Isotr0py	75f81750f3	[VLM] Initialize video input support for InternVL models (#18499) Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2025-05-25 04:51:25 +00:00
Mengqing Cao	6ab681bcbe	[Misc][ModelScope] Change to use runtime VLLM_USE_MODELSCOPE (#18655) Signed-off-by: Mengqing Cao <cmq0113@163.com> Signed-off-by: Isotr0py <2037008807@qq.com> Co-authored-by: Isotr0py <2037008807@qq.com>	2025-05-25 04:51:21 +00:00
Ning Xie	6c6dcd8611	[MISC] correct signature for LoaderFunction (#18670) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-05-24 20:17:47 -07:00
wangxiyuan	b9018a3f9f	[BugFix] Fix import error for fused_moe (#18642) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2025-05-24 07:53:36 -07:00
ztang2370	2cd4d58df4	[Model] use AutoWeightsLoader for gpt2 (#18625) Signed-off-by: zt2370 <ztang2370@gmail.com>	2025-05-24 13:36:13 +00:00
Yuanhao WU	a859320575	[Model] Add support for Qwen2.5-Omni-7B-AWQ (Qwen2_5OmniForConditionalGeneration) (#18647)	2025-05-24 09:15:36 +00:00
Wenhua Cheng	ec82c3e388	FIX MOE issue in AutoRound format (#18586) Signed-off-by: wenhuach21 <wenhua.cheng@intel.com>	2025-05-23 22:01:40 -07:00
Feng XiaoLong	4fc1bf813a	[Bugfix] Migrate to REGEX Library to prevent catastrophic backtracking (#18454) Signed-off-by: Crucifixion-Fxl <xmufxl@gmail.com> Co-authored-by: Crucifixion-Fxl <xmufxl@gmail.com>	2025-05-23 16:16:26 -07:00
Pavani Majety	f2036734fb	[ModelOpt] Introduce VLLM_MAX_TOKENS_PER_EXPERT_FP4_MOE env var to control blockscale tensor allocation (#18160) Signed-off-by: Pavani Majety <pmajety@nvidia.com>	2025-05-23 15:52:20 -07:00
Michael Goin	0ddf88e16e	[CI] Enable test_initialization to run on V1 (#16736) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-05-23 15:09:44 -07:00
Jiayi Yao	2628a69e35	[V1] Support Deepseek MTP (#18435) Signed-off-by: Rui Qiao <ruisearch42@gmail.com> Signed-off-by: YaoJiayi <120040070@link.cuhk.edu.cn> Co-authored-by: Rui Qiao <ruisearch42@gmail.com>	2025-05-23 10:26:28 -07:00
youkaichao	6a7988c55b	Refactor pplx init logic to make it modular (prepare for deepep) (#18200) Signed-off-by: youkaichao <youkaichao@gmail.com>	2025-05-23 23:43:43 +08:00
Harry Mellor	d4c2919760	Include private attributes in API documentation (#18614) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-23 06:18:31 -07:00
Tristan Leclercq	6220f3c6b0	[Bugfix] Fix transformers model impl ignored for mixtral quant (#18602) Signed-off-by: Tristan Leclercq <tristanleclercq@gmail.com>	2025-05-23 05:54:13 -07:00
Harry Mellor	2edb533af2	Replace `{func}` with mkdocs style links (#18610) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-23 05:51:38 -07:00
Kay Yan	7ab056c273	[Hardware][CPU] Update intel_extension_for_pytorch 2.7.0 and move to `requirements/cpu.txt` (#18542) Signed-off-by: Kay Yan <kay.yan@daocloud.io>	2025-05-23 04:38:42 -07:00
Madeesh Kannan	e493e48524	[V0][Bugfix] Fix parallel sampling performance regression when guided decoding is enabled (#17731) Signed-off-by: Madeesh Kannan <shadeMe@users.noreply.github.com> Co-authored-by: Russell Bryant <rbryant@redhat.com>	2025-05-23 03:38:23 -07:00
Mengqing Cao	4ce64e2df4	[Bugfix][Model] Fix baichuan model loader for tp (#18597) Signed-off-by: Mengqing Cao <cmq0113@163.com>	2025-05-23 02:39:05 -07:00
Harry Mellor	a1fe24d961	Migrate docs from Sphinx to MkDocs (#18145) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-23 02:09:53 -07:00
Benjamin Chislett	583507d130	[Spec Decode] Make EAGLE3 draft token ID mapping optional (#18488) Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-05-22 20:17:39 -07:00
Harry Mellor	4b0da7b60e	Enable hybrid attention models for Transformers backend (#18494) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-23 10:12:08 +08:00
Mark McLoughlin	c6b636f9fb	[V1][Spec Decoding] Use model_loader.get_model() to load models (#18273) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-05-23 02:05:44 +00:00
Chenheli Hua	04eb88dc80	Re-submit: Fix: Proper RGBA -> RGB conversion for PIL images. (#18569) Signed-off-by: Chenheli Hua <huachenheli@outlook.com>	2025-05-23 01:59:18 +00:00
Sanger Steel	c32e249a23	[Frontend] [Core] Add Tensorizer support for V1, LoRA adapter serialization and deserialization (#17926) Signed-off-by: Sanger Steel <sangersteel@gmail.com>	2025-05-22 18:44:18 -07:00
Tyler Michael Smith	6e588da0f4	[Build/CI] Fix CUDA 11.8 build (#17679) Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> Signed-off-by: Tyler Michael Smith <tysmith@redhat.com> Co-authored-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>	2025-05-22 12:13:54 -07:00
aws-elaineyz	fa72f9a812	Order sequence ids + config update to support specifying custom quantization layers (#18279) Signed-off-by: Elaine Zhao <elaineyz@amazon.com> Co-authored-by: Tailin Pan <tailinpa@amazon.com> Co-authored-by: Rishabh Rajesh <rishyraj@amazon.com> Co-authored-by: Yishan McNabb <yishanm@amazon.com> Co-authored-by: Patrick Lange <patlange@amazon.com> Co-authored-by: Maxwell Goldberg <mgld@amazon.com> Co-authored-by: Aakash Shetty <sheaak@amazon.com>	2025-05-22 02:20:36 -07:00
aws-elaineyz	ebed81fbf5	Update default neuron config for speculation (#18274) Signed-off-by: Elaine Zhao <elaineyz@amazon.com> Co-authored-by: Shashwat Srijan <sssrijan@amazon.com> Co-authored-by: Aakash Shetty <sheaak@amazon.com>	2025-05-22 02:18:55 -07:00
Shane A	51797775c3	[Bugfix][Model] Make Olmo2Model weight loading return loaded weights (#18504) Signed-off-by: Shane A <shanea@allenai.org>	2025-05-21 21:17:03 -07:00
youngrok cha	d022115cc6	[Bugfix] Inconsistent token calculation compared to HF in llava family (#18479) Signed-off-by: jaycha <jaycha@ncsoft.com>	2025-05-21 20:21:47 -07:00
Dhia Eddine Rhaiem	20bd6f4d2e	[FalconH1] Fix output dtype in RMSNorm fallback path for Falcon-H1 (e.g. 0.5B) (#18500) Signed-off-by: dhia.rhaiem <dhia.rhaiem@tii.ae> Co-authored-by: younesbelkada <younesbelkada@gmail.com> Co-authored-by: Ilyas Chahed <ilyas.chahed@tii.ae> Co-authored-by: Jingwei Zuo <jingwei.zuo@tii.ae>	2025-05-21 19:23:59 -07:00

1869 Commits