Commit Graph

14386 Commits

Author SHA1 Message Date
youkaichao
c81081fece [torch.compile] transparent compilation with more logging (#12246)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-01-21 19:32:55 +08:00
Cyrus Leung
a94eee4456 [Bugfix] Fix mm_limits access for merged multi-modal processor (#12252)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-21 10:09:39 +00:00
Cyrus Leung
f2e9f2a3be [Misc] Remove redundant TypeVar from base model (#12248)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-21 08:40:39 +00:00
Jee Jee Li
1f1542afa9 [Misc]Add BNB quantization for PaliGemmaForConditionalGeneration (#12237)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-01-21 07:49:08 +00:00
Cyrus Leung
96912550c8 [Misc] Rename MultiModalInputsV2 -> MultiModalInputs (#12244)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-21 07:31:19 +00:00
youkaichao
2fc6944c5e [ci/build] disable failed and flaky tests (#12240)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-01-21 13:25:03 +08:00
Nicolò Lucchesi
5fe6bf29d6 [BugFix] Fix GGUF tp>1 when vocab_size is not divisible by 64 (#12230)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-01-21 12:23:14 +08:00
Gregory Shtrasberg
d4b62d4641 [AMD][Build] Porting dockerfiles from the ROCm/vllm fork (#11777)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-01-21 12:22:23 +08:00
Michael Goin
ecf67814f1 Add quantization and guided decoding CODEOWNERS (#12228)
Signed-off-by: mgoin <michael@neuralmagic.com>
2025-01-20 18:23:40 -07:00
Jinzhen Lin
750f4cabfa [Kernel] optimize moe_align_block_size for cuda graph and large num_experts (e.g. DeepSeek-V3) (#12222)
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
Co-authored-by: Michael Goin <mgoin@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-01-20 16:42:16 -08:00
Cheng Kuan Yong Jason
06a760d6e8 [bugfix] catch xgrammar unsupported array constraints (#12210)
Signed-off-by: Jason Cheng <jasoncky96@gmail.com>
2025-01-20 16:42:02 -08:00
youkaichao
da7512215f [misc] add cuda runtime version to usage data (#12190)
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2025-01-21 00:31:01 +00:00
Işık
af69a6aded fix: update platform detection for M-series arm based MacBook processors (#12227)
Signed-off-by: isikhi <huseyin.isik000@gmail.com>
2025-01-20 22:23:28 +00:00
Roger Wang
7bd3630067 [Misc] Update CODEOWNERS (#12229) 2025-01-20 22:19:09 +00:00
Chen Zhang
96663699b2 [CI] Pass local python version explicitly to pre-commit mypy.sh (#12224)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-01-20 23:49:18 +08:00
Cyrus Leung
18572e3384 [Bugfix] Fix HfExampleModels.find_hf_info (#12223)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-20 15:35:36 +00:00
wangxiyuan
86bfb6dba7 [Misc] Pass attention to impl backend (#12218)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-01-20 23:25:28 +08:00
Chen Zhang
5f0ec3935a [V1] Remove _get_cache_block_size (#12214)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-01-20 21:54:16 +08:00
youkaichao
c222f47992 [core][bugfix] configure env var during import vllm (#12209)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-01-20 19:35:59 +08:00
youkaichao
170eb35079 [misc] print a message to suggest how to bypass commit hooks (#12217)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-01-20 18:06:24 +08:00
Cyrus Leung
b37d82791e [Model] Upgrade Aria to transformers 4.48 (#12203)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-20 17:58:48 +08:00
Cyrus Leung
3127e975fb [CI/Build] Make pre-commit faster (#12212)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-20 17:36:24 +08:00
Cyrus Leung
4001ea1266 [CI/Build] Remove dummy CI steps (#12208)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-20 16:41:57 +08:00
youkaichao
5c89a29c22 [misc] add placeholder format.sh (#12206)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-01-20 16:04:49 +08:00
Cyrus Leung
59a0192fb9 [Core] Interface for accessing model from VllmRunner (#10353)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-20 15:00:59 +08:00
Isotr0py
83609791d2 [Model] Add Qwen2 PRM model support (#12202)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-01-20 14:59:46 +08:00
Yuan Tang
0974c9bc5c [Bugfix] Fix incorrect types in LayerwiseProfileResults (#12196)
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-01-20 14:59:20 +08:00
Yuan Tang
d2643128f7 [DOC] Add missing docstring in LLMEngine.add_request() (#12195)
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-01-20 14:59:00 +08:00
Yuan Tang
c5c06209ec [DOC] Fix typo in docstring and assert message (#12194)
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-01-20 14:58:29 +08:00
Harry Mellor
3ea7b94523 Move linting to pre-commit (#11975)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-01-20 14:58:01 +08:00
youkaichao
51ef828f10 [torch.compile] fix sym_tensor_indices (#12191)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-01-20 11:37:50 +08:00
shangmingc
df450aa567 [Bugfix] Fix num_heads value for simple connector when tp enabled (#12074)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
2025-01-20 02:56:43 +00:00
Martin Gleize
bbe5f9de7d [Model] Support for fairseq2 Llama (#11442)
Signed-off-by: Martin Gleize <mgleize@meta.com>
Co-authored-by: mgleize user <mgleize@a100-st-p4de24xlarge-4.fair-a100.hpcaas>
2025-01-19 10:40:40 -08:00
Roger Wang
81763c58a0 [V1] Add V1 support of Qwen2-VL (#12128)
Signed-off-by: Roger Wang <ywang@roblox.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: imkero <kerorek@outlook.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-19 19:52:13 +08:00
Isotr0py
edaae198e7 [Misc] Add BNB support to GLM4-V model (#12184)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-01-19 19:49:22 +08:00
gujing
936db119ed benchmark_serving support --served-model-name param (#12109)
Signed-off-by: zibai <zibai.gj@alibaba-inc.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
2025-01-19 09:59:56 +00:00
youkaichao
e66faf4809 [torch.compile] store inductor compiled Python file (#12182)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-01-19 16:27:26 +08:00
Cyrus Leung
630eb5b5ce [Bugfix] Fix multi-modal processors for transformers 4.48 (#12187) 2025-01-18 19:16:34 -08:00
Michal Adamczyk
4e94951bb1 [BUGFIX] Move scores to float32 in case of running xgrammar on cpu (#12152)
Signed-off-by: Michal Adamczyk <madamczyk@habana.ai>
2025-01-19 11:12:05 +08:00
Simon Mo
7a8a48d51e [V1] Collect env var for usage stats (#12115) 2025-01-19 03:07:15 +00:00
yancong
32eb0da808 [Misc] Support register quantization method out-of-tree (#11969) 2025-01-18 16:13:16 -08:00
youkaichao
6d0e3d3724 [core] clean up executor class hierarchy between v1 and v0 (#12171)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-01-18 14:35:15 +08:00
Isotr0py
02798ecabe [Model] Port deepseek-vl2 processor, remove dependency (#12169)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-01-18 13:59:39 +08:00
Russell Bryant
813f249f02 [Docs] Fix broken link in SECURITY.md (#12175)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2025-01-18 04:35:21 +00:00
youkaichao
da02cb4b27 [core] further polish memory profiling (#12126)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-01-18 12:25:08 +08:00
Hongxia Yang
c09503ddd6 [AMD][CI/Build][Bugfix] use pytorch stale wheel (#12172)
Signed-off-by: hongxyan <hongxyan@amd.com>
2025-01-18 11:15:53 +08:00
youkaichao
2b83503227 [misc] fix cross-node TP (#12166)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-01-18 10:53:27 +08:00
youkaichao
7b98a65ae6 [torch.compile] disable logging when cache is disabled (#12043)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-01-17 20:29:31 +00:00
Gregory Shtrasberg
b5b57e301e [AMD][FP8] Using MI300 FP8 format on ROCm for block_quant (#12134)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-01-17 17:12:26 +00:00
Kunshang Ji
54cacf008f [Bugfix] Mistral tokenizer encode accept list of str (#12149)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2025-01-17 16:47:53 +00:00