Michael Goin
5873877241
[Bugfix] Mistral tool calling when content is list ( #18729 )
...
Create Release / Create Release (push) Has been cancelled
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-05-27 09:05:37 -07:00
Cyrus Leung
696259ca01
[Core] Automatically cast multi-modal input dtype ( #18756 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-27 23:45:48 +08:00
chunxiaozheng
6b6d496114
optimize get_kv_cache_torch_dtype ( #18531 )
...
Signed-off-by: idellzheng <idellzheng@tencent.com >
2025-05-27 13:08:44 +00:00
cascade
aaa4ac1c95
Disable prefix cache by default for benchmark ( #18639 )
...
Signed-off-by: cascade812 <cascade812@outlook.com >
2025-05-27 20:06:34 +08:00
Mark McLoughlin
06a0338015
[V1][Metrics] Add API for accessing in-memory Prometheus metrics ( #17010 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-05-27 09:37:06 +00:00
Cyrus Leung
4318c0559d
[CI/Build] Remove imports of built-in re ( #18750 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-27 09:19:18 +00:00
Hyogeun Oh (오효근)
a68e293cb9
[Doc] Convert Sphinx directives ( {class}, {meth}, {attr}, ...) to MkDocs format for better documentation linking ( #18663 )
...
Signed-off-by: Zerohertz <ohg3417@gmail.com >
2025-05-27 01:44:20 -07:00
Shawn Huang
6881107948
[BUG FIX] minicpm ( #18739 )
...
Signed-off-by: huangyuxiang03 <huangyx0321@gmail.com >
Co-authored-by: huangyuxiang03 <huangyx0321@gmail.com >
2025-05-27 01:04:49 -07:00
Kebe
e0f0ff87b8
[Build] fix cpu build missing libtbbmalloc.so ( #18744 )
...
Signed-off-by: Kebe <mail@kebe7jun.com >
2025-05-27 01:03:56 -07:00
maobaolong
c24b1572ac
Minor fix about MooncakeStoreConnector ( #18721 )
...
Signed-off-by: baoloongmao <baoloongmao@tencent.com >
2025-05-27 08:02:28 +00:00
Calvin Chen
4693a3438c
[Doc] cleanup deprecated flag for doc ( #18715 )
...
Signed-off-by: calvin chen <120380290@qq.com >
2025-05-27 07:12:02 +00:00
Łukasz Durejko
bbd9a84dc5
[Hardware][Intel-Gaudi] [CI/Build] Fix multiple containers using the same name in run-hpu-test.sh ( #18752 )
...
Signed-off-by: Lukasz Durejko <ldurejko@habana.ai >
2025-05-27 00:10:26 -07:00
almersawi
a547aeb828
feat(rocm-support): support mamba2 on rocm ( #18565 )
...
Signed-off-by: Islam Almersawi <islam.almersawi@openinnovation.ai >
Co-authored-by: Islam Almersawi <islam.almersawi@openinnovation.ai >
2025-05-27 00:07:53 -07:00
Reid
fc6d0c290f
[Misc] improve docs ( #18734 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-27 07:07:01 +00:00
Cyrus Leung
753944fa9b
[Doc] Update reproducibility doc and example ( #18741 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-27 07:03:13 +00:00
Cyrus Leung
25a817f202
[Doc] Update OOT model docs ( #18742 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-27 06:30:31 +00:00
vllmellm
d260f799a9
[FEAT] [ROCm] Upgrade AITER Fused MoE kernels. ( #18271 )
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
2025-05-26 23:14:07 -07:00
Lukas Geiger
b50602d5f0
[Model][Gemma3] Cast image pixel values already on CPU ( #18732 )
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com >
2025-05-27 05:42:54 +00:00
Isotr0py
1f1b1bc03b
[V1][Quantization] Add CUDA graph compatible v1 GGUF support ( #18646 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-05-27 04:40:28 +00:00
Reid
1f88dbd2bb
[Misc] improve web section group title display ( #18684 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-27 04:35:16 +00:00
Lukas Geiger
0eebd74842
[Model][Gemma3] Simplify image input validation ( #18710 )
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com >
2025-05-27 11:13:37 +08:00
Harry Mellor
27bebcd897
Convert examples to ruff-format ( #18400 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-26 16:57:54 +00:00
Lukas Geiger
e7523c2e03
[V1][Sampler] Improve performance of FlashInfer sampling by sampling logits instead of probs ( #18608 )
2025-05-26 11:49:36 -04:00
Cyrus Leung
a869baca73
[Bugfix] Fix Llama GGUF initialization ( #18717 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-26 07:49:22 -07:00
Cyrus Leung
82e2339b06
[Doc] Move examples and further reorganize user guide ( #18666 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-26 07:38:04 -07:00
Cyrus Leung
9553fdb41e
[Doc] Improve API docs ( #18713 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-26 07:33:34 -07:00
dylan
243eb9199f
[Bugfix]: handle hf-xet CAS error when loading Qwen3 weights in vLLM ( #18701 )
2025-05-26 07:10:56 -07:00
Reid
0665e29998
[Misc] add AutoGen integration ( #18712 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-05-26 13:56:18 +00:00
Łukasz Durejko
e76be06550
[Hardware][Intel-Gaudi] [CI/Build] Add tensor parallel size = 2 test to HPU CI ( #18709 )
...
Signed-off-by: Lukasz Durejko <ldurejko@habana.ai >
2025-05-26 05:26:07 -07:00
Isotr0py
0877750029
[CI/Build] Split pooling and generation extended language models tests in CI ( #18705 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-05-26 04:00:08 -07:00
Naveassaf
6d68030f1c
[Model] Add support for YARN in NemotronNAS models ( #18427 )
...
Signed-off-by: Nave Assaf <nassaf@nvidia.com >
2025-05-26 10:31:49 +00:00
Ning Xie
5a2c76cbe1
[CI] fix dump_input for str type ( #18697 )
...
Signed-off-by: Andy Xie <andy.xning@gmail.com >
2025-05-26 18:23:35 +08:00
Cyrus Leung
38b13dfe78
[CI/Build] Replace math.isclose with pytest.approx ( #18703 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-26 02:05:17 -07:00
Cyrus Leung
61a45e7a72
[Bugfix] Fix Mistral-format models with sliding window ( #18693 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-26 01:44:04 -07:00
Cyrus Leung
65523a0995
[Doc] Fix issue template format ( #18699 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-26 00:45:39 -07:00
Cyrus Leung
4b7740a105
[GH] Add issue template for reporting CI failures ( #18696 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-26 00:42:04 -07:00
Ning Xie
4ea62c0ea0
[CI] add missing argument ( #18694 )
...
Signed-off-by: Andy Xie <andy.xning@gmail.com >
2025-05-26 00:22:04 -07:00
Maximilien de Bayser
561b77a0d6
[Bugfix] Fix the lm_head in gpt_bigcode in lora mode ( #6357 )
...
Signed-off-by: Max de Bayser <mbayser@br.ibm.com >
Signed-off-by: Max de Bayser <maxdebayser@gmail.com >
2025-05-26 14:52:25 +08:00
CYJiang
abd4030d94
refactor: simplify request handler, use positive condition check for handler assignment ( #18690 )
...
Signed-off-by: googs1025 <googs1025@gmail.com >
2025-05-26 06:32:28 +00:00
AlexZhao
8820821b59
[Misc] Fixed the abnormally high TTFT issue in the PD disaggregation example ( #18644 )
...
Signed-off-by: zhaohaidao <zhaohaidao2008@hotmail.com >
Signed-off-by: zhaohaiyuan <zhaohaiyuan@xiaohongshu.com >
Co-authored-by: zhaohaiyuan <zhaohaiyuan@xiaohongshu.com >
2025-05-26 13:51:27 +08:00
Cyrus Leung
fba0642704
[CI/Build][Doc] Update gte-Qwen2-1.5B-instruct usage ( #18683 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: Isotr0py <2037008807@qq.com >
Co-authored-by: Isotr0py <2037008807@qq.com >
2025-05-25 20:27:50 -07:00
Lukas Geiger
6071e989df
[Core][Multimodal] Convert PIL Image to array without data copy when hashing ( #18682 )
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com >
2025-05-25 17:33:35 +00:00
Cyrus Leung
57fd13a707
[Bugfix] Fix profiling dummy data for Pixtral ( #18677 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-25 14:05:30 +00:00
Reid
3a886bd58c
[Misc] small improve ( #18680 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-25 06:05:38 -07:00
Reid
35be8fad62
[CI/build] fix no regex ( #18676 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-25 10:10:51 +00:00
Yuqi Zhang
f2faac745d
[Bugfix] Fix cpu usage and cache hit stats reporting on cpu environment ( #18674 )
...
Signed-off-by: zzzyq <zhangyuqi94@gmail.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-05-25 02:36:06 -07:00
Reid
279f854519
[doc] improve readability ( #18675 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-25 01:40:31 -07:00
Reid
624b77a2b3
[doc] fix broken links ( #18671 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-25 01:36:33 -07:00
Cyrus Leung
503f8487c2
[Misc] Reduce logs on startup ( #18649 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-24 23:03:53 -07:00
Ning Xie
44073a7ac3
[BUGFIX] catch subclass first for try...except ( #18672 )
...
Signed-off-by: Andy Xie <andy.xning@gmail.com >
2025-05-25 05:34:24 +00:00
Michael Goin
63934543a0
Speed up the kernels/quantization/ tests ( #18669 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-05-25 05:02:59 +00:00
Isotr0py
75f81750f3
[VLM] Initialize video input support for InternVL models ( #18499 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-05-25 04:51:25 +00:00
Mengqing Cao
6ab681bcbe
[Misc][ModelScope] Change to use runtime VLLM_USE_MODELSCOPE ( #18655 )
...
Signed-off-by: Mengqing Cao <cmq0113@163.com >
Signed-off-by: Isotr0py <2037008807@qq.com >
Co-authored-by: Isotr0py <2037008807@qq.com >
2025-05-25 04:51:21 +00:00
Chenguang Li
cebc22f3b6
[Misc]Replace cuda hard code with current_platform in Ray ( #14668 )
...
Signed-off-by: noemotiovon <757486878@qq.com >
2025-05-24 20:26:31 -07:00
Ning Xie
6c6dcd8611
[MISC] correct signature for LoaderFunction ( #18670 )
...
Signed-off-by: Andy Xie <andy.xning@gmail.com >
2025-05-24 20:17:47 -07:00
Seiji Eicher
7891fdf0c6
[V1] Fix _pickle.PicklingError: Can't pickle <class 'transformers_modules.deepseek-ai.DeepSeek-V2-Lite... ( #18640 )
...
Signed-off-by: Seiji Eicher <seiji@anyscale.com >
2025-05-24 20:07:20 -07:00
Woosuk Kwon
6825d9a998
[BugFix][Spec Decode] Improve Prefix Caching Logic in Speculative Decoding ( #18668 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-05-24 17:33:46 -07:00
Reid
b554ab736e
[CI/Build] fix permission denied issue ( #18645 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-24 16:09:10 +00:00
Aaron Pham
9ea7f1abf3
fix(regression): clone from reference items ( #18662 )
...
Signed-off-by: Aaron Pham <contact@aarnphm.xyz >
2025-05-24 15:25:20 +00:00
Aaron Pham
2807271c86
[CI] enforce import regex instead of re ( #18665 )
...
Signed-off-by: Aaron Pham <contact@aarnphm.xyz >
2025-05-24 08:04:14 -07:00
wangxiyuan
b9018a3f9f
[BugFix] Fix import error for fused_moe ( #18642 )
...
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com >
2025-05-24 07:53:36 -07:00
Ning Xie
4ceafb6299
[MISC] typo fix and clean import ( #18664 )
...
Signed-off-by: Andy Xie <andy.xning@gmail.com >
2025-05-24 07:52:09 -07:00
Cyrus Leung
2e6705784f
[CI/Build] chmod +x to cleanup_pr_body.sh ( #18650 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-24 07:26:45 -07:00
Cyrus Leung
1cb194a018
[Doc] Reorganize user guide ( #18661 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-24 07:25:33 -07:00
ztang2370
2cd4d58df4
[Model] use AutoWeightsLoader for gpt2 ( #18625 )
...
Signed-off-by: zt2370 <ztang2370@gmail.com >
2025-05-24 13:36:13 +00:00
Cyrus Leung
6d166a8d35
[Doc] Add community links ( #18657 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-24 06:06:38 -07:00
Cyrus Leung
ef1dd6870f
[Doc] Fix indentation problems in V0 Paged Attention docs ( #18659 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-24 06:06:35 -07:00
Mengqing Cao
e77dc4bad8
[MISC][pre-commit] Add pre-commit check for triton import ( #17716 )
...
Signed-off-by: Mengqing Cao <cmq0113@163.com >
2025-05-24 20:09:15 +08:00
Cyrus Leung
07458a51ce
[Doc] Update README links, mark external links ( #18635 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-24 09:57:15 +00:00
qizixi
c1e4a4052d
[V1][Spec Decode] Support multi-layer eagle draft model ( #18030 )
...
Signed-off-by: qizixi <qizixi@meta.com >
2025-05-24 09:45:34 +00:00
Yuanhao WU
a859320575
[Model] Add support for Qwen2.5-Omni-7B-AWQ (Qwen2_5OmniForConditionalGeneration) ( #18647 )
2025-05-24 09:15:36 +00:00
Reid
441dc63ac7
[Frontend] improve vllm serve --help display ( #18643 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-24 07:53:22 +00:00
qizixi
d55e446d13
[V1][Spec Decode] Small refactors to improve eagle bookkeeping performance ( #18424 )
...
Signed-off-by: qizixi <qizixi@meta.com >
2025-05-24 06:51:22 +00:00
Wenhua Cheng
ec82c3e388
FIX MOE issue in AutoRound format ( #18586 )
...
Signed-off-by: wenhuach21 <wenhua.cheng@intel.com >
2025-05-23 22:01:40 -07:00
Mathieu Borderé
45ab403a1f
config.py: Clarify that only local GGUF checkpoints are supported. ( #18623 )
...
Signed-off-by: Mathieu Bordere <mathieu@letmetweakit.com >
2025-05-24 08:46:34 +08:00
Robert Shaw
2b10ba7491
[Bugfix][Nixl] Fix Preemption Bug ( #18631 )
...
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com >
2025-05-23 23:30:16 +00:00
Feng XiaoLong
4fc1bf813a
[Bugfix] Migrate to REGEX Library to prevent catastrophic backtracking ( #18454 )
...
Signed-off-by: Crucifixion-Fxl <xmufxl@gmail.com >
Co-authored-by: Crucifixion-Fxl <xmufxl@gmail.com >
2025-05-23 16:16:26 -07:00
Pavani Majety
f2036734fb
[ModelOpt] Introduce VLLM_MAX_TOKENS_PER_EXPERT_FP4_MOE env var to control blockscale tensor allocation ( #18160 )
...
Signed-off-by: Pavani Majety <pmajety@nvidia.com >
2025-05-23 15:52:20 -07:00
Cyrus Leung
7d9216495c
[Doc] Update references to doc files ( #18637 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-23 15:49:21 -07:00
Michael Goin
0ddf88e16e
[CI] Enable test_initialization to run on V1 ( #16736 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-05-23 15:09:44 -07:00
Huy Do
1645b60196
Use prebuilt FlashInfer x86_64 PyTorch 2.7 CUDA 12.8 wheel for CI ( #18537 )
...
Signed-off-by: Huy Do <huydhn@gmail.com >
2025-05-23 21:17:16 +00:00
Jiayi Yao
2628a69e35
[V1] Support Deepseek MTP ( #18435 )
...
Signed-off-by: Rui Qiao <ruisearch42@gmail.com >
Signed-off-by: YaoJiayi <120040070@link.cuhk.edu.cn >
Co-authored-by: Rui Qiao <ruisearch42@gmail.com >
2025-05-23 10:26:28 -07:00
Cyrus Leung
371f7e4ca2
[Doc] Fix broken links and unlinked docs, add shortcuts to home sidebar ( #18627 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-23 10:22:40 -07:00
Cyrus Leung
15b45ffb9a
[Doc] Avoid documenting dynamic / internal modules ( #18626 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-23 09:58:02 -07:00
Cyrus Leung
273cb3b4d9
[Doc] Fix top-level API links/docs ( #18621 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-23 09:46:56 -07:00
David Xia
8ddd1cf26a
[Doc] fix list formatting ( #18624 )
...
Signed-off-by: David Xia <david@davidxia.com >
2025-05-23 09:41:17 -07:00
Chen Zhang
6550114c9c
[v1] Redo "Support multiple KV cache groups in GPU model runner ( #17945 )" ( #18593 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
2025-05-23 09:39:47 -07:00
Michael Goin
9520a989df
[Docs] Change mkdocs to not use directory urls ( #18622 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-05-23 09:33:21 -07:00
Harry Mellor
3d28ad343f
Fix figures in design doc ( #18612 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-23 09:09:54 -07:00
youkaichao
6a7988c55b
Refactor pplx init logic to make it modular (prepare for deepep) ( #18200 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com >
2025-05-23 23:43:43 +08:00
Cyrus Leung
022d8abe29
[Doc] Use a different color for the announcement ( #18616 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-23 08:25:03 -07:00
Hyogeun Oh (오효근)
5221815a00
[Doc] Fix markdown list indentation for MkDocs rendering ( #18620 )
...
Signed-off-by: Zerohertz <ohg3417@gmail.com >
2025-05-23 08:23:21 -07:00
Simon Mo
1068556b2c
[Bugfix][Build/CI] Fixup CUDA compiler version check for CUDA_SUPPORTED_ARCHS ( #18579 )
2025-05-23 07:43:58 -07:00
Reid
2cd1fa4556
[Misc] add Haystack integration ( #18601 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-23 06:21:19 -07:00
Harry Mellor
d4c2919760
Include private attributes in API documentation ( #18614 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-23 06:18:31 -07:00
Tristan Leclercq
6220f3c6b0
[Bugfix] Fix transformers model impl ignored for mixtral quant ( #18602 )
...
Signed-off-by: Tristan Leclercq <tristanleclercq@gmail.com >
2025-05-23 05:54:13 -07:00
Harry Mellor
52fb23f47e
Fix examples with code blocks in docs ( #18609 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-23 05:53:44 -07:00
Cyrus Leung
6dd51c7ef1
[CI/Build] Fix V1 flag being set in entrypoints tests ( #18598 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-23 05:51:53 -07:00
Harry Mellor
2edb533af2
Replace {func} with mkdocs style links ( #18610 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-23 05:51:38 -07:00
Hyogeun Oh (오효근)
38a95cb4a8
[Doc] Fix indent of contributing to vllm ( #18611 )
...
Signed-off-by: Zerohertz <ohg3417@gmail.com >
2025-05-23 05:50:07 -07:00
Ning Xie
cd821ea5d2
[CI] fix kv_cache_type argument ( #18594 )
...
Signed-off-by: Andy Xie <andy.xning@gmail.com >
2025-05-23 04:49:18 -07:00
Kay Yan
7ab056c273
[Hardware][CPU] Update intel_extension_for_pytorch 2.7.0 and move to requirements/cpu.txt ( #18542 )
...
Signed-off-by: Kay Yan <kay.yan@daocloud.io >
2025-05-23 04:38:42 -07:00
Harry Mellor
6526e05111
Add myself as docs code owner ( #18605 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-23 04:08:31 -07:00
Madeesh Kannan
e493e48524
[V0][Bugfix] Fix parallel sampling performance regression when guided decoding is enabled ( #17731 )
...
Signed-off-by: Madeesh Kannan <shadeMe@users.noreply.github.com >
Co-authored-by: Russell Bryant <rbryant@redhat.com >
2025-05-23 03:38:23 -07:00
Mengqing Cao
4ce64e2df4
[Bugfix][Model] Fix baichuan model loader for tp ( #18597 )
...
Signed-off-by: Mengqing Cao <cmq0113@163.com >
2025-05-23 02:39:05 -07:00
Cyrus Leung
fbb13a2c15
Revert "[V1] [Bugfix] eagle bugfix and enable correct lm_head for multimodal ( #18034 )" ( #18600 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-23 02:18:22 -07:00
Harry Mellor
a1fe24d961
Migrate docs from Sphinx to MkDocs ( #18145 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-23 02:09:53 -07:00
Yuqi Zhang
d0bc2f810b
[Bugfix] Add half type support in reshape_and_cache_cpu_impl on x86 cpu platform ( #18430 )
...
Signed-off-by: Yuqi Zhang <yuqizhang@google.com >
Co-authored-by: Yuqi Zhang <yuqizhang@google.com >
2025-05-23 01:41:37 -07:00
Chauncey
b046cf792d
[Feature][V1]: suupports cached_tokens in response usage ( #18149 )
...
Co-authored-by: simon-mo <xmo@berkeley.edu >
2025-05-23 01:41:03 -07:00
Michael Goin
54af915949
[Doc] Update quickstart and install for cu128 using --torch-backend=auto ( #18505 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-05-23 08:36:37 +00:00
cascade
71ea614d4a
[Feature]Add async tensor parallelism using compilation pass ( #17882 )
...
Signed-off-by: cascade812 <cascade812@outlook.com >
2025-05-23 01:03:34 -07:00
RonaldBXu
4c611348a7
[V1] [Bugfix] eagle bugfix and enable correct lm_head for multimodal ( #18034 )
...
Signed-off-by: Ronald Xu <ronaldxu@amazon.com >
2025-05-23 00:37:18 -07:00
Ning Xie
60cad94b86
[Hardware] correct method signatures for HPU,ROCm,XPU ( #18551 )
...
Signed-off-by: Andy Xie <andy.xning@gmail.com >
2025-05-22 22:31:59 -07:00
Shanshan Shen
9c1baa5bc6
[Misc] Replace cuda hard code with current_platform ( #16983 )
...
Signed-off-by: shen-shanshan <467638484@qq.com >
2025-05-23 04:38:50 +00:00
Teruaki Ishizaki
4be2255c81
[Bugfix][Benchmarks] Fix a benchmark of deepspeed-mii backend to use api_key ( #17291 )
...
Signed-off-by: Teruaki Ishizaki <teruaki.ishizaki@ntt.com >
2025-05-23 12:30:47 +08:00
aws-elaineyz
ed5d408255
[Neuron] Remove bypass on EAGLEConfig and add a test ( #18514 )
...
Signed-off-by: Elaine Zhao <elaineyz@amazon.com >
2025-05-22 21:26:32 -07:00
Benjamin Chislett
583507d130
[Spec Decode] Make EAGLE3 draft token ID mapping optional ( #18488 )
...
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai >
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-05-22 20:17:39 -07:00
lkchen
e44d8ce8c7
[Bugfix] Set KVTransferConfig.engine_id in post_init ( #18576 )
...
Signed-off-by: Linkun Chen <github@lkchen.net >
2025-05-23 02:54:42 +00:00
Nick Hill
93ecb8139c
[BugFix] Increase TP execute_model timeout ( #18558 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-05-23 10:22:11 +08:00
CYJiang
fae453f8ce
[Misc] refactor: simplify input validation and num_requests handling in _convert_v1_inputs ( #18482 )
...
Signed-off-by: googs1025 <googs1025@gmail.com >
2025-05-23 10:15:32 +08:00
Harry Mellor
4b0da7b60e
Enable hybrid attention models for Transformers backend ( #18494 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-23 10:12:08 +08:00
Mark McLoughlin
c6b636f9fb
[V1][Spec Decoding] Use model_loader.get_model() to load models ( #18273 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-05-23 02:05:44 +00:00
Chenheli Hua
04eb88dc80
Re-submit: Fix: Proper RGBA -> RGB conversion for PIL images. ( #18569 )
...
Signed-off-by: Chenheli Hua <huachenheli@outlook.com >
2025-05-23 01:59:18 +00:00
rasmith
46791e1b4b
[AMD] [P/D] Compute num gpus for ROCm correctly in run_accuracy_test.sh ( #18568 )
...
Signed-off-by: Randall Smith <Randall.Smith@amd.com >
2025-05-22 18:45:35 -07:00
Sanger Steel
c32e249a23
[Frontend] [Core] Add Tensorizer support for V1, LoRA adapter serialization and deserialization ( #17926 )
...
Signed-off-by: Sanger Steel <sangersteel@gmail.com >
2025-05-22 18:44:18 -07:00
Kai Wu
c91fe7b1b9
[Frontend][Bug Fix] Update llama4 pythonic jinja template and llama4_pythonic parser ( #17917 )
...
Signed-off-by: Kai Wu <kaiwu@meta.com >
2025-05-22 16:44:08 -07:00
Ekagra Ranjan
a04720bc36
[V1][Spec Decode][Bugfix] Load quantize weights for EAGLE ( #18290 )
2025-05-22 15:17:33 -07:00
lkchen
7b9d832c80
[Tool] Add NIXL installation script ( #18172 )
...
Signed-off-by: Linkun <github@lkchen.net >
2025-05-22 14:33:16 -07:00
Tyler Michael Smith
6e588da0f4
[Build/CI] Fix CUDA 11.8 build ( #17679 )
...
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com >
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com >
Signed-off-by: Tyler Michael Smith <tysmith@redhat.com >
Co-authored-by: Lucas Wilkinson <lwilkinson@neuralmagic.com >
2025-05-22 12:13:54 -07:00
Mengqing Cao
f8d2cc5f55
[Compile][Platform] Make PiecewiseBackend pluggable and extendable ( #18076 )
...
Signed-off-by: Mengqing Cao <cmq0113@163.com >
Co-authored-by: youkaichao <youkaichao@gmail.com >
2025-05-22 12:11:53 -07:00
wangxiyuan
721fb9b181
[Platform] Move platform check to right place ( #18470 )
...
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com >
2025-05-22 12:11:28 -07:00
David Xia
1f3a1200e4
[Bugfix] make test_openai_schema.py pass ( #18224 )
...
Signed-off-by: David Xia <david@davidxia.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-22 18:34:06 +00:00
Lukas Geiger
54631f8262
[Misc] Call ndarray.tobytes() directly instead of ndarray.data.tobytes() ( #18347 )
...
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com >
2025-05-22 09:00:13 -07:00
Reid
cb506ecb5a
[Misc] improve Automatic Prefix Caching example ( #18554 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-22 14:50:46 +00:00
Li, Jiang
93f71673ce
[BugFix][CPU] Fix x86 SHM distributed module initialization ( #18536 )
...
Signed-off-by: jiang.li <jiang1.li@intel.com >
2025-05-22 07:35:00 -07:00
Calvin Chen
3f505233fd
[Doc] Add stream flag for chat completion example ( #18524 )
...
Signed-off-by: calvin chen <120380290@qq.com >
2025-05-22 14:07:10 +00:00
Bowen Wang
4e04eceb58
[Bugfix] Use random hidden states in dummy sampler run ( #18543 )
...
Signed-off-by: Bowen Wang <abmfy@icloud.com >
2025-05-22 06:48:56 -07:00
CYJiang
71075029f2
[Doc] Support --stream arg in openai_completion_client.py script ( #18388 )
...
Signed-off-by: googs1025 <googs1025@gmail.com >
2025-05-22 13:20:17 +00:00
Harry Mellor
ca86a7cf6e
[CI/Build] Update bamba test model location ( #18544 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-22 06:01:07 -07:00
lkchen
a35a494745
[Bugfix] Add kwargs to RequestOutput __init__ to be forward compatible ( #18513 )
...
Signed-off-by: Linkun <github@lkchen.net >
2025-05-22 05:24:43 -07:00
燃
f6037d1907
[Bugfix] Fix MRoPE Errors in the Qwen-VL Model When Processing Pure Text ( #18526 )
...
Co-authored-by: 松灵 <wpf272043@alibaba-inc.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-22 05:22:53 -07:00
aws-elaineyz
fa72f9a812
Order sequence ids + config update to support specifying custom quantization layers ( #18279 )
...
Signed-off-by: Elaine Zhao <elaineyz@amazon.com >
Co-authored-by: Tailin Pan <tailinpa@amazon.com >
Co-authored-by: Rishabh Rajesh <rishyraj@amazon.com >
Co-authored-by: Yishan McNabb <yishanm@amazon.com >
Co-authored-by: Patrick Lange <patlange@amazon.com >
Co-authored-by: Maxwell Goldberg <mgld@amazon.com >
Co-authored-by: Aakash Shetty <sheaak@amazon.com >
2025-05-22 02:20:36 -07:00
aws-elaineyz
ebed81fbf5
Update default neuron config for speculation ( #18274 )
...
Signed-off-by: Elaine Zhao <elaineyz@amazon.com >
Co-authored-by: Shashwat Srijan <sssrijan@amazon.com >
Co-authored-by: Aakash Shetty <sheaak@amazon.com >
2025-05-22 02:18:55 -07:00
Satyajith Chilappagari
e2d7d31244
[Neuron] Update Dockerfile.neuron to use latest neuron release (2.23) ( #18512 )
...
Signed-off-by: Satyajith Chilappagari <satchill@amazon.com >
2025-05-22 02:17:34 -07:00
Cyrus Leung
23b67b37b2
[Doc] Fix invalid JSON in example args ( #18527 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-22 07:11:46 +00:00
Jee Jee Li
db5a29ba19
[Bugfix] Fix LoRA test ( #18518 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-05-21 21:48:53 -07:00
Shane A
51797775c3
[Bugfix][Model] Make Olmo2Model weight loading return loaded weights ( #18504 )
...
Signed-off-by: Shane A <shanea@allenai.org >
2025-05-21 21:17:03 -07:00
Nick Hill
cf5984b2fe
[BugFix][DP] Send DP wave completion only from dp_rank==0 ( #18502 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
Co-authored-by: kourosh hakhamaneshi <kourosh@anyscale.com >
2025-05-21 20:25:25 -07:00
youngrok cha
d022115cc6
[Bugfix] Inconsistent token calculation compared to HF in llava family ( #18479 )
...
Signed-off-by: jaycha <jaycha@ncsoft.com >
2025-05-21 20:21:47 -07:00
Rabi Mishra
acb54ca8e1
Intialize io_thread_pool attribute in the beginning. ( #18331 )
...
Signed-off-by: rabi <ramishra@redhat.com >
2025-05-21 20:21:14 -07:00
Russell Bryant
6e0fd34d3c
[CI] Fix race condition with StatelessProcessGroup.barrier ( #18506 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-05-21 20:19:13 -07:00
Ning Xie
176d62e4ea
[MISC] update project urls in pyproject.toml ( #18519 )
...
Signed-off-by: Andy Xie <andy.xning@gmail.com >
2025-05-21 20:17:34 -07:00
Dhia Eddine Rhaiem
20bd6f4d2e
[FalconH1] Fix output dtype in RMSNorm fallback path for Falcon-H1 (e.g. 0.5B) ( #18500 )
...
Signed-off-by: dhia.rhaiem <dhia.rhaiem@tii.ae >
Co-authored-by: younesbelkada <younesbelkada@gmail.com >
Co-authored-by: Ilyas Chahed <ilyas.chahed@tii.ae >
Co-authored-by: Jingwei Zuo <jingwei.zuo@tii.ae >
2025-05-21 19:23:59 -07:00
Sebastian Schoennenbeck
1f079540db
[Bugfix] Consistent ascii handling in tool parsers ( #17704 )
...
Signed-off-by: Sebastian Schönnenbeck <sebastian.schoennenbeck@comma-soft.com >
2025-05-21 20:41:23 +00:00
vllmellm
94d8ec8d2b
[FEAT][ROCm] Upgrade AITER MLA v1 backend ( #18338 )
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2025-05-21 10:34:28 -07:00
Mark McLoughlin
bb0a311213
Revert "[v1] Support multiple KV cache groups in GPU model runner ( #17945 ) ( #18459 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-05-21 10:25:23 -07:00
Hosang
dd5fa7e04f
[ROCm][Kernel][V1] Enable AMD Radeon GPU Custom Paged Attention on v1 ( #17004 )
...
Signed-off-by: Hosang Yoon <hosang.yoon@amd.com >
2025-05-21 08:35:00 -07:00
Hyogeun Oh (오효근)
2b16104557
[Misc] Update deprecation message for --enable-reasoning ( #18404 )
2025-05-21 07:33:11 -07:00
Kebe
371376f996
[Build] fix Dockerfile shell ( #18402 )
2025-05-21 07:32:06 -07:00
bnellnm
c6c10ca920
[Bugfix] Reduce moe_sum test size to avoid OOM ( #18484 )
...
Signed-off-by: Bill Nell <bnell@redhat.com >
2025-05-21 06:46:39 -07:00
GiantCroc
c154d89306
[Doc] fix arg docstring in linear layers ( #18410 )
...
Signed-off-by: giantcroc <1204449533@qq.com >
2025-05-21 06:45:57 -07:00
Dhia Eddine Rhaiem
eca18691d2
[MODEL] FalconH1 ( #18406 )
...
Signed-off-by: dhia.rhaiem <dhia.rhaiem@tii.ae >
Co-authored-by: younesbelkada <younesbelkada@gmail.com >
Co-authored-by: Ilyas Chahed <ilyas.chahed@tii.ae >
Co-authored-by: Jingwei Zuo <jingwei.zuo@tii.ae >
2025-05-21 04:59:06 -07:00
Rabi Mishra
61acfc45bc
[Bugfix][Failing Test] Fix test_events.py ( #18460 )
...
Signed-off-by: rabi <ramishra@redhat.com >
2025-05-21 04:57:28 -07:00
Reid
107f5fc4cb
[Misc] refactor disaggregated-prefill-v1 example ( #18474 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-21 11:10:14 +00:00
Yong Hoon Shin
907f935de9
[V1] Fix general plugins not loaded in engine for multiproc ( #18326 )
...
Signed-off-by: Yong Hoon Shin <yhshin@meta.com >
2025-05-21 01:21:49 -07:00
Kebe
5d7f545204
[Frontend] deprecate --device arg ( #18399 )
...
Signed-off-by: Kebe <mail@kebe7jun.com >
2025-05-21 01:21:17 -07:00
Nicolò Lucchesi
cd8dfc6dfc
[Misc] MultiConnector._connectors type ( #18423 )
...
Signed-off-by: nicklucche <nlucches@redhat.com >
2025-05-20 22:48:43 -07:00
wwl2755
d06dd72ba9
[Bugfix][Failing Test] Fix nixl connector test when promt size < block size ( #18429 )
...
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com >
2025-05-20 22:41:44 -07:00
Cyrus Leung
ad0012a0ac
Revert "[Bugfix] Fix MRoPE Errors in the Qwen-VL Model When Processing Pure Text ( #18407 )" ( #18456 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-20 22:39:22 -07:00
bnellnm
92247c522e
[Bug] Fix moe_sum signature ( #18440 )
...
Signed-off-by: Bill Nell <bnell@redhat.com >
2025-05-20 22:37:08 -07:00
Gregory Shtrasberg
0c15c2e486
[Bugfix] config.head_dim is now explicitly set to None ( #18432 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
2025-05-20 21:04:33 -07:00
Michael Goin
3b17ea26e4
[TPU] Re-enable the Pallas MoE kernel ( #18025 )
...
Signed-off-by: Michael Goin <mgoin64@gmail.com >
2025-05-20 19:52:27 -07:00
Dilip Gowda Bhagavan
23baa2180b
fix:Build torch wheel inline rather than picking from nightly ( #18351 )
...
Signed-off-by: Dilip Gowda Bhagavan <dilip.bhagavan@ibm.com >
2025-05-20 22:22:24 +00:00
Percy
980a172474
[Kernel] update comment for KV shape in unified triton attn ( #18099 )
...
Signed-off-by: haochengxia <xhc_1007@163.com >
2025-05-20 11:19:34 -07:00
Calvin Chen
e1f5a71ed7
[Model] use AutoWeightsLoader for bloom ( #18300 )
...
Signed-off-by: calvin chen <120380290@qq.com >
2025-05-20 09:40:05 -07:00
Michael Goin
f4a8a37465
[Minor] Rename quantization nvfp4 to modelopt_fp4 ( #18356 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-05-20 09:08:37 -07:00
Reid
8f55962a7f
[Misc] refactor prompt embedding examples ( #18405 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-20 15:26:12 +00:00
燃
be48360c1f
[Bugfix] Fix MRoPE Errors in the Qwen-VL Model When Processing Pure Text ( #18407 )
...
Co-authored-by: 松灵 <wpf272043@alibaba-inc.com >
2025-05-20 06:59:48 -07:00
wang.yuqi
86847700d7
[CI] Add mteb testing to test the accuracy of the embedding model ( #17175 )
2025-05-20 06:51:12 -07:00
汪志鹏
d6c86d09ae
Update cpu.txt ( #18398 )
...
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com >
2025-05-20 10:53:23 +00:00
Jee Jee Li
6b35cb10a0
[Misc] Add LoRA code owner ( #18387 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-05-20 03:27:30 -07:00
Reid
1b1e8e05ff
[doc] update env variable export ( #18391 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-20 08:53:27 +00:00
Random Fly
bca55b556f
[Bugfix] fix adding bias twice in ipex GPTQ quantization ( #18363 )
...
Signed-off-by: rand-fly <randfly@outlook.com >
2025-05-20 00:54:33 -07:00
Kevin H. Luu
d981396778
[release] Change dockerhub username for TPU release ( #18389 )
2025-05-19 23:49:23 -07:00
Nan Qin
9609327fa4
[Core] [Bugfix]: tensor parallel with prompt embeds ( #18171 )
...
Signed-off-by: Nan2018 <nan@protopia.ai >
Co-authored-by: Andrew Sansom <andrew@protopia.ai >
2025-05-19 20:21:27 -07:00
Isotr0py
f07a673eb2
[Misc] Allow AutoWeightsLoader to skip loading weights with specific substr in name ( #18358 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-05-19 20:20:12 -07:00
Liangfu Chen
d565e0976f
[neuron] fix authorization issue ( #18364 )
...
Signed-off-by: Liangfu Chen <liangfc@amazon.com >
2025-05-19 23:30:32 +00:00
Lucia Fang
258bf621d5
fix CUDA_check redefinition in #17918 ( #18287 )
...
Signed-off-by: Lucia Fang <fanglu@fb.com >
Co-authored-by: Lucia (Lu) Fang <fanglu@meta.com >
2025-05-19 13:42:35 -07:00
Satyajith Chilappagari
dc1440cf9f
Neuron up mistral ( #18222 )
...
Signed-off-by: Satyajith Chilappagari <satchill@amazon.com >
2025-05-19 09:54:47 -07:00
Gong Shufan
8171221834
[Misc] Fix typo ( #18330 )
2025-05-19 09:51:01 -07:00
sunyicode0012
7937c2fd52
Add files via uploadAdd fused MoE kernel tuning configs (fp8_w8a8) for DeepSeek V3/R1 on a single-node 8x NVIDIA H20 96GB setup ( #18337 )
2025-05-19 09:49:57 -07:00
Wenhua Cheng
e2ee1e8e9e
[Feature]Add support for models quantized with AutoRound ( #17850 )
...
Signed-off-by: wenhuach21 <wenhua.cheng@intel.com >
2025-05-19 09:38:53 -07:00
Reid
20d8ce81eb
[Frontend] add --quick option for vllm chat/complete ( #18297 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-19 09:36:13 -07:00
Elad Segal
84ab4feb7e
[Doc] Fix typo ( #18355 )
2025-05-19 16:05:16 +00:00
Jee Jee Li
6781af5608
[Quantization] Pool model support bitsandbytes ( #18087 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-05-19 09:03:43 -07:00
Nick Hill
1b15df2546
[BugFix] Fix handling of num_computed_tokens with connector ( #18232 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com >
2025-05-19 09:03:25 -07:00
Cyrus Leung
43b5f61dce
[Doc] Move input-related docs to Features ( #18353 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-19 15:08:39 +00:00
Li Wang
c5bb0ebdc6
[Doc] Fix prompt embedding examples ( #18350 )
...
Signed-off-by: wangli <wangli858794774@gmail.com >
2025-05-19 06:48:16 -07:00
Shaoyu Yang
d637b96099
[BugFix] [Vul] Add missing usedforsecurity=False in MD5 hashing to enable FIPS ( #18319 )
...
Signed-off-by: cascade812 <cascade812@outlook.com >
Signed-off-by: shaoyuyoung <shaoyuyoung@gmail.com >
Co-authored-by: cascade <cascade812@outlook.com >
2025-05-19 01:31:23 -07:00
CYJiang
275c5daeb0
fix: Add type specifications for CLI arguments in tensorizer options ( #18314 )
2025-05-18 23:42:17 -07:00
Simon Mo
47fda6d089
[Build] Supports CUDA 12.6 and 11.8 after Blackwell Update ( #18316 )
...
Signed-off-by: simon-mo <simon.mo@hey.com >
2025-05-18 23:19:33 -07:00
Reid
27d0952600
[Misc] extract parser.parse_args() ( #18323 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-19 04:06:26 +00:00
Nan Qin
221cfc2fea
Feature/vllm/input embedding completion api ( #17590 )
...
Signed-off-by: Andrew Sansom <andrew@protopia.ai >
Signed-off-by: Nan2018 <nan@protopia.ai >
Co-authored-by: 临景 <linjing.yx@alibaba-inc.com >
Co-authored-by: Bryce1010 <bryceyx@gmail.com >
Co-authored-by: Andrew Sansom <andrew@protopia.ai >
Co-authored-by: Andrew Sansom <qthequartermasterman@gmail.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-05-18 20:18:05 -07:00
wwl2755
9da1095daf
[Spec Decode][V0] Fix spec decode correctness test in V0 eagle/medusa ( #18175 )
...
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com >
2025-05-18 19:49:46 -07:00
Robin
d1211f8794
[Doc] Add doc to explain the usage of Qwen3 thinking ( #18291 )
...
Signed-off-by: WangErXiao <863579016@qq.com >
2025-05-18 23:04:07 +00:00
Reid
b6a6e7a529
[Misc] add litellm integration ( #18320 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-18 15:32:30 +00:00
Lifu Huang
4fb349f66a
Fix copy-paste error in phi4mm image processing ( #18315 )
...
Signed-off-by: Lifu Huang <lifu.hlf@gmail.com >
2025-05-18 07:00:12 -07:00
22quinn
908733aca7
[Model] Use sigmoid for single-label classification ( #18313 )
...
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com >
2025-05-18 07:00:09 -07:00
Reid
1a8f68bb90
[doc] update reasoning doc ( #18306 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-18 06:59:14 -07:00
cascade
9ab2c02ff8
Support sequence parallelism combined with pipeline parallelism ( #18243 )
...
Signed-off-by: cascade812 <cascade812@outlook.com >
2025-05-17 22:47:25 +00:00
Ning Xie
66e63e86ec
[MISC] fix typo ( #18305 )
...
Signed-off-by: Andy Xie <andy.xning@gmail.com >
2025-05-17 10:52:09 -07:00
rongfu.leng
9214e60631
[Model] use AutoWeightsLoader for solar ( #18113 )
2025-05-17 00:24:17 -07:00
Nishidha
f880d42582
Fixed build on ppc64le due to openssl conflicts ( #18262 )
...
Signed-off-by: Nishidha Panpaliya <nishidha.panpaliya@partner.ibm.com >
2025-05-17 00:23:46 -07:00
Michael Goin
dcfe95234c
Update Dockerfile to build for Blackwell ( #18095 )
2025-05-17 00:23:25 -07:00
Siyuan Liu
48ac2bed5b
[Hardware][TPU] Optionally import for TPU backend ( #18269 )
...
Signed-off-by: Siyuan Liu <lsiyuan@google.com >
Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com >
Co-authored-by: Carol Zheng <cazheng@google.com >
Co-authored-by: Jade Zheng <zheng.shoujian@outlook.com >
Co-authored-by: Hongmin Fan <fanhongmin@google.com >
2025-05-17 15:23:12 +08:00
David Ben-David
3e0d435027
[P/D][V1] Support dynamic loading of external KV connector implementations ( #18142 )
...
Signed-off-by: David Ben-David <davidb@pliops.com >
Co-authored-by: David Ben-David <davidb@pliops.com >
2025-05-17 06:40:39 +00:00
汪志鹏
4ee4826ede
[BugFix] Correct max_model_len derivation from config.json for Mistral format ( #17937 )
...
Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com >
Co-authored-by: tracelogfb <48808670+tracelogfb@users.noreply.github.com >
Co-authored-by: Stephen Chen <tracelog@meta.com >
2025-05-17 04:20:13 +00:00
Reid
60017dc841
[Misc] reformat the collect-env output ( #18285 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-16 19:46:18 -07:00
Trevor Royer
55f1a468d9
Move cli args docs to its own page ( #18228 ) ( #18264 )
...
Signed-off-by: Trevor Royer <troyer@redhat.com >
2025-05-16 19:43:45 -07:00
Michael Goin
fd195b194e
[V1][P/D] Local attention optimization for NIXL ( #18170 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-05-16 21:16:33 -04:00
Woosuk Kwon
fabe89bbc4
[Spec Decode] Don't fall back to V0 when spec decoding is enabled ( #18265 )
2025-05-16 16:10:27 -07:00
Jinzhen Lin
e73b7dfd69
[Bugfix] fix an illegal memory access was encountered of marlin kernel + act_order ( #18245 )
2025-05-16 16:02:44 -07:00
Bowen Wang
7fdfa01530
[Sampler] Adapt to FlashInfer 0.2.3 sampler API ( #15777 )
...
Signed-off-by: Bowen Wang <abmfy@icloud.com >
Co-authored-by: mgoin <mgoin64@gmail.com >
2025-05-16 15:14:03 -07:00
Sanger Steel
aef94c6d07
[CI] Assign reviewer to mergify with changes to Tensorizer files ( #18278 )
2025-05-16 12:04:14 -07:00
Nick Hill
0ceaebf87b
[BugFix] Fix ordering of KVConnector finished send/rcv sets ( #18211 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-05-16 09:20:54 -07:00
Nick Hill
1db4f47f81
[BugFix] Fix multi async save in MultiConnector ( #18246 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-05-16 08:13:47 -07:00
Reid
d3d91b6f71
[Misc][MacOS] fix bfloat16 error ( #18249 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-16 15:05:59 +00:00
learner0810
87d871470d
[Model] Use autoweightloader for dbrx ( #18251 )
...
Signed-off-by: learner0810 <zhongjun.li@daocloud.io >
2025-05-16 07:54:13 -07:00
fxmarty-amd
a5f8c111c2
[Fix] Fix typo in resolve_hf_chat_template ( #18259 )
...
Signed-off-by: Felix Marty <felmarty@amd.com >
2025-05-16 14:52:41 +00:00
Lain
e23564cb70
use ceil_div in cutlass block scaling shape check ( #17918 )
2025-05-16 03:02:58 -07:00
Isotr0py
390ec88905
[Misc] Consolidate Audio tests into multimodal common generation tests ( #18214 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-05-16 09:18:08 +00:00
Seiji Eicher
541817670c
[Misc] Add Ray Prometheus logger to V1 ( #17925 )
...
Signed-off-by: Seiji Eicher <seiji@anyscale.com >
2025-05-16 01:02:42 -07:00
Vadim Gimpelson
67da5720d4
[PERF] Speed up Qwen2.5-VL model by speed up rotary position embedding ( #17973 )
...
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@centml.ai >
2025-05-15 23:31:02 -07:00
David Xia
5c04bb8b86
[doc] fix multimodal example script ( #18089 )
...
Signed-off-by: David Xia <david@davidxia.com >
2025-05-16 06:05:34 +00:00
Lucia Fang
3d2779c29a
[Feature] Support Pipeline Parallism in torchrun SPMD offline inference for V1 ( #17827 )
...
Signed-off-by: Lucia Fang <fanglu@fb.com >
2025-05-15 22:28:27 -07:00
Will Eaton
6b31c84aff
Throw better error for when running into k8s service discovery issue ( #18209 )
...
Signed-off-by: Will Eaton <weaton@redhat.com >
2025-05-15 21:07:28 -07:00
Harry Mellor
b18201fe06
Allow users to pass arbitrary JSON keys from CLI ( #18208 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-15 21:05:34 -07:00
Sky Lee
f4937a51c1
[Model] vLLM v1 supports Medusa ( #17956 )
...
Signed-off-by: lisiqi23 <lisiqi23@xiaomi.com >
Signed-off-by: skylee-01 <497627264@qq.com >
Co-authored-by: lisiqi23 <lisiqi23@xiaomi.com >
2025-05-15 21:05:31 -07:00
kliuae
ee659e3b60
[Bugfix][ROCm] Use chunked_prefill_paged_decode as fallback for V1 attention on ROCm ( #18093 )
...
Signed-off-by: kf <kuanfu.liu@embeddedllm.com >
2025-05-15 19:30:17 -07:00
Lucas Wilkinson
4e1c6a0264
[Bugfix] fix rotary embedding test for _get_padded_tensor_shape ( #18229 )
...
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com >
2025-05-16 01:32:45 +00:00
Lucas Wilkinson
c7852a6d9b
[Build] Allow shipping PTX on a per-file basis ( #18155 )
...
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com >
2025-05-15 16:41:55 -07:00
Lucia Fang
8795eb9975
[Bugfix] Fix test_eagle test ( #18223 )
...
Signed-off-by: Lucia Fang <fanglu@fb.com >
2025-05-15 15:59:42 -07:00
Alexei-V-Ivanov-AMD
0b34593017
Adding "AMD: Tensorizer Test" to amdproduction. ( #18216 )
2025-05-15 11:01:25 -07:00
Nicolò Lucchesi
e3f3aee6f4
[Misc] Avoid cuda graph log when sizes still match ( #18202 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-05-15 09:59:38 -07:00
TJian
92540529c0
[Bugfix] [ROCm]: Remove assertion logic when using AITER fused moe in unquantizedMethod to reenable LLama4 BF16 ( #18205 )
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
2025-05-15 09:53:18 -07:00
Zhonghua Deng
fadb8d5c2d
[Bugfix]Change the exception thrown by call_hf_processor from RuntimeError to ValueError ( #18181 )
...
Signed-off-by: Abatom <abzhonghua@gmail.com >
2025-05-15 09:01:47 -07:00
Sebastian Schoennenbeck
2aa5470ac5
[Frontend] Fix chat template content format detection ( #18190 )
...
Signed-off-by: Sebastian Schönnenbeck <sebastian.schoennenbeck@comma-soft.com >
2025-05-15 09:00:21 -07:00
Harry Mellor
51ff154639
Improve examples rendering in docs and GitHub ( #18203 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-15 15:57:49 +00:00
Alexei-V-Ivanov-AMD
566ec04c3d
Adding "Basic Models Test" and "Multi-Modal Models Test (Extended) 3" in AMD Pipeline ( #18106 )
...
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-05-15 08:49:23 -07:00
Thomas Parnell
01c22335ba
[Kernel] [V1] Fix performance regression for triton unified attention ( #18161 )
...
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com >
Co-authored-by: Lucas Wilkinson <lwilkinson@neuralmagic.com >
2025-05-15 06:39:00 -07:00
hustxiayang
451da4bcbd
add tools into TokenizeChatRequest ( #18187 )
...
Signed-off-by: yangxia <yangxiast@gmail.com >
2025-05-15 04:01:49 -07:00
Harry Mellor
07ad27121f
Update deprecated type hinting in model_loader ( #18130 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-15 04:00:21 -07:00
omahs
a9944aabfa
fix: typos ( #18151 )
...
Signed-off-by: omahs <73983677+omahs@users.noreply.github.com >
2025-05-15 02:16:15 -07:00
Russell Bryant
a8f5aec20a
[V1] Update zmq socket creation in nixl connector ( #18148 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-05-14 23:17:57 -07:00
David Xia
de71fec81b
[CI] don't skip fixed test_kv_cache_events() ( #18183 )
...
Signed-off-by: David Xia <david@davidxia.com >
2025-05-14 23:17:16 -07:00
Mengqing Cao
70f8b96724
[Bugfix] Fix FusedMoEPrepareAndFinalize for cuda-disalike backends ( #18178 )
...
Signed-off-by: Mengqing Cao <cmq0113@163.com >
2025-05-14 23:16:31 -07:00
inkcherry
dd2a94596a
[Model] Allow the use of sliding window in Qwen2 ( #17772 )
...
Signed-off-by: inkcherry <mingzhi.liu@intel.com >
2025-05-14 22:29:38 -07:00
Ning Xie
420caf7557
[UT] Add ut for none hash ( #17892 )
...
Signed-off-by: Andy Xie <andy.xning@gmail.com >
2025-05-15 13:28:11 +08:00
Chenheli Hua
4f07a64075
Support custom implementations of VideoLoader backends. ( #18091 )
2025-05-15 13:26:49 +08:00
Thomas Parnell
e6b8e65d2d
[Bugfix] Fix fp8 tests for triton_unified_attention for Triton 3.3 ( #18013 )
...
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com >
Co-authored-by: Lucas Wilkinson <lwilkinson@neuralmagic.com >
2025-05-15 13:26:34 +08:00
Harry Mellor
26d0419309
Update deprecated type hinting in models ( #18132 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-14 22:06:50 -07:00
Luka Govedič
83f74c698f
[Fix][ROCm] Enforce eager for all encoder-decoder models on ROCm ( #18154 )
...
Signed-off-by: Luka Govedič <lgovedic@redhat.com >
2025-05-14 22:04:43 -07:00
Reid
2dff093574
[Misc] add lobe-chat support ( #18177 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-15 05:02:23 +00:00
Aaron Pham
afe3236e90
[Chore] astral's ty ( #18116 )
...
Signed-off-by: Aaron Pham <contact@aarnphm.xyz >
2025-05-15 05:00:43 +00:00
Mark McLoughlin
65334ef3b9
[V1][Metrics] Remove unused code ( #18158 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-05-14 20:13:17 -07:00
Chen Zhang
e60f550b38
[v1] Support multiple KV cache groups in GPU model runner ( #17945 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
2025-05-14 18:54:54 -07:00
David Xia
f25e0d1125
[Bugfix]: make most of test_openai_schema.py pass ( #17664 )
2025-05-14 17:04:35 -07:00
Andrey Talman
09f106a91e
Upload vllm index for the rc builds ( #18173 )
2025-05-14 16:35:56 -07:00
Michael Goin
2142035b51
[V1] Support multiple kv connectors ( #17564 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: Nick Hill <nhill@redhat.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
2025-05-14 16:28:02 -07:00
Russell Bryant
78aa341d12
[CI] Fix race condition in test_kv_cache_events test ( #18169 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-05-14 16:27:48 -07:00
Jerry Zhang
7974736740
Add support for loading torchao models with AOPerModuleConfig ( #17826 )
...
Signed-off-by: Jerry Zhang <jerryzh168@gmail.com >
2025-05-14 16:24:59 -07:00
Aaron Pham
2fc9075b82
[V1] Structured Outputs + Thinking compatibility ( #16577 )
...
Signed-off-by: Aaron Pham <contact@aarnphm.xyz >
Co-authored-by: Russell Bryant <rbryant@redhat.com >
2025-05-14 15:45:24 -07:00
Lucas Wilkinson
d93c976a0d
[Kernel] Have rotary embeddings support tensors ( #18046 )
...
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com >
2025-05-14 15:43:55 -07:00
David Xia
749f792553
[Frontend] decrease import time of vllm.multimodal ( #18031 )
...
Co-authored-by: Aaron Pham <Aaronpham0103@gmail.com >
2025-05-14 15:43:32 -07:00
Robert Shaw
856865008e
[CI] Disable Failing Tests ( #18165 )
2025-05-14 13:49:56 -07:00
bnellnm
f9c069c85e
Modularize fused experts and integrate PPLX kernels ( #15956 )
2025-05-14 13:11:54 -07:00
Ekagra Ranjan
418d2f8bfb
[V1][Spec Decode] Share input embedding of target model with EAGLE draft model to free ~1GB for llama 3 model ( #17326 )
...
Co-authored-by: root <root@ekagra-8xh100.us-east5-a .c.serving-efficiency-poc.internal>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-05-14 12:31:46 -07:00
Chen Zhang
964472b966
[Doc] Update prefix cache metrics to counting tokens ( #18138 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
2025-05-14 15:23:30 +00:00
Nick Hill
59dd311cf5
[KVConnector] Keep KVTransferParams as a dict ( #18033 )
2025-05-14 08:05:57 -07:00
Cyrus Leung
d066e52013
[Bugfix] Fix chat utils tests ( #18139 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-14 05:38:21 -07:00
Harry Mellor
c8ea982d9b
Update deprecated type hinting in platform, plugins, triton_utils, vllm_flash_attn ( #18129 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-14 05:28:16 -07:00
Harry Mellor
dc372b9c8a
Update deprecated type hinting in vllm/device_allocator and vllm/distributed ( #18126 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-14 04:07:57 -07:00
Harry Mellor
9b5b39b650
Update deprecated type hinting in vllm/lora ( #18128 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-14 03:57:59 -07:00
Reid
9ccc6ded42
[doc] add missing import ( #18133 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-14 10:57:34 +00:00
Cyrus Leung
d62a076e84
[Model] GritLM supports other attention backends ( #18109 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-14 03:33:19 -07:00
Jee Jee Li
259127f8b8
[Bugfix] Fix LoRA test ( #18123 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-05-14 10:25:47 +00:00
TJian
612c2edb4f
[FEAT] [ROCm]: Add AITER CK 2 Stages MoE support ( #17110 )
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
Co-authored-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
2025-05-14 03:03:11 -07:00
Andrzej Kotłowski
38fe728d60
[Bugfix] Fix QKVCrossParallelLinear::sync_weight_attrs for PyTorch compile ( #17844 )
...
Signed-off-by: Andrzej Kotłowski <akotlowski@habana.ai >
2025-05-14 09:39:51 +00:00
rongfu.leng
82e7f9bb03
[Misc] replace does not exist model ( #18119 )
...
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io >
2025-05-14 02:13:47 -07:00
Jee Jee Li
63dc3426e0
[Model] Add packed_modules_mapping for Qwen3-MOE ( #18118 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-05-14 02:13:19 -07:00
Cyrus Leung
8f5dc41481
[Bugfix] Fix entrypoints audio test failure ( #18111 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-14 09:08:07 +00:00
wang.yuqi
63ad622233
[New Model]: support GTE NewModel ( #17986 )
2025-05-14 01:31:31 -07:00
majianpeng
e7ef61c1f0
[Bugfix][Example] make lmcache v0 work. ( #18051 )
...
Signed-off-by: Ma, Jianpeng <jianpeng.ma@intel.com >
2025-05-13 23:43:44 -07:00
Jinzhen Lin
d4154c35a2
[Bugfix] fix moe marlin topk_weight loading ( #18080 )
...
Co-authored-by: mgoin <mgoin64@gmail.com >
2025-05-13 23:31:57 -07:00
lkchen
6685890d11
[Fix] Move "model_config" as keyword args in chat_utils.py ( #18098 )
...
Signed-off-by: Linkun <github@lkchen.net >
2025-05-13 23:27:26 -07:00
Ecthlion_zyy
33011318c2
Fix broken example: examples/offline_inference/profiling at scheduler_config ( #18117 )
2025-05-13 23:19:14 -07:00
qli88
4f8b373225
[BugFix][AMD] Compatible patch for AITER lib after 04/20 ( #17912 )
...
Signed-off-by: Qiang Li <qiang.li2@amd.com >
2025-05-13 23:05:20 -07:00
Charlie Fu
7b2f28deba
[AMD][torch.compile] Enable silu+fp8_quant fusion for rocm ( #18082 )
...
Signed-off-by: charlifu <charlifu@amd.com >
2025-05-13 22:13:56 -07:00
vllmellm
2d912fb66f
[FEAT] [ROCm] [V1]: Add AITER biased group topk for DeepSeekV3 ( #17955 )
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com >
2025-05-13 22:03:47 -07:00
Michael Goin
12e6c0b41c
[Bugfix][V1] Fix FlashInfer V1 backend using the wrong VllmConfig ( #18086 )
2025-05-13 20:36:17 -07:00
Michael Goin
9a2a6357de
[Bugfix] Fix FP8 Marlin MoE and enable for compressed-tensors models ( #18026 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-05-13 19:48:33 -07:00
youkaichao
6266c57bae
[core][distributed] add ep group and all2all interface ( #18077 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com >
2025-05-14 10:46:49 +08:00
Jon Gill
754b699cbe
[Bug]: Fix S3 model/tokenizer path resolution ( #18083 )
...
Signed-off-by: Jon Gill <jon@yurts.ai >
2025-05-13 19:34:17 -07:00
Roger Wang
6e27c6d86b
[Misc] Remove unused numpy tensor ( #18084 )
...
Signed-off-by: Roger Wang <hey@rogerw.me >
2025-05-13 19:33:40 -07:00
Nick Hill
d5af47a149
[P/D] Add some more debug logs to NixlConnector ( #18102 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-05-13 19:33:03 -07:00
Pavani Majety
65f0f74b66
[Hardware/NVIDIA/Modelopt] Fix modelopt forward method for v1 torch.compile ( #18101 )
...
Signed-off-by: Pavani Majety <pmajety@nvidia.com >
2025-05-13 19:33:00 -07:00
Luka Govedič
176a95c670
[Fix] Support CUDAGraph capture for encoder-decoder on ROCm ( #18104 )
...
Signed-off-by: Luka Govedič <lgovedic@redhat.com >
2025-05-13 19:31:42 -07:00
Chen Zhang
f2ae883b67
[v1][KVCacheManager] pass num_new_computed_tokens to kv cache manager ( #18001 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
2025-05-13 19:09:39 -07:00
vllmellm
40de1ef455
[FEAT] [ROCm]: Add AITER Block-Scaled GEMM Feature ( #14968 )
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com >
2025-05-13 19:08:20 -07:00
Russell Bryant
0189a65a2e
[Docs] Expand security doc with firewall info ( #18081 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-05-13 19:36:00 +00:00
Nick Hill
55aa7af994
[V1] DP scale-out (2/N): Decouple engine process management and comms ( #15977 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-05-13 10:48:21 -07:00
Harry Mellor
0b217da646
Update deprecated type hinting in vllm/adapter_commons ( #18073 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-13 08:32:51 -07:00
Harry Mellor
19324d660c
Update deprecated type hinting in vllm/compilation ( #18072 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-13 08:32:48 -07:00
Harry Mellor
fc407a1425
Give auto-merge label workflow permission to add labels to issues ( #18078 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-13 07:53:13 -07:00
Harry Mellor
009d9e7590
Convert benchmarks to ruff format ( #18068 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-13 13:43:29 +00:00
Cyrus Leung
b922c2ebd2
[Bugfix] Fix entrypoints metrics tests ( #18063 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-13 06:42:43 -07:00
Russell Bryant
00b14e0f16
[CI] set token permissions for pre-commit CI job ( #17729 )
...
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
2025-05-13 13:38:30 +00:00
Russell Bryant
54e467e6f8
[CI] Add token permissions for add-ready-label CI job ( #17730 )
...
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
2025-05-13 13:38:13 +00:00
Russell Bryant
79a1d25bbd
[CI] Add workflow permissions for helm CI job ( #17727 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
2025-05-13 12:49:07 +00:00
Russell Bryant
9944011b30
[CI] Set token permissions for reminder comment CI job ( #17728 )
...
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
2025-05-13 12:46:58 +00:00
Harry Mellor
8c946cecca
Update deprecated type hinting in vllm/transformers_utils ( #18058 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-13 04:34:37 -07:00
Harry Mellor
ff334ca1cd
Update deprecated type hinting in vllm/profiler ( #18057 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-13 04:34:34 -07:00
Harry Mellor
6223dd8114
Update deprecated type hinting in model_executor/layers ( #18056 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-13 04:17:23 -07:00
Reid
906f0598fc
[doc] add download/list/delete HF model CLI usage ( #17940 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-13 11:15:51 +00:00
Aaron Pham
cb528d0585
[Fix] check to make sure processor has chat templates ( #18047 )
...
Signed-off-by: Aaron Pham <contact@aarnphm.xyz >
2025-05-13 03:04:10 -07:00
Harry Mellor
98fcba1575
Convert .buildkite to ruff format ( #17656 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-13 09:28:31 +00:00
Russell Bryant
23b3134eb5
[Benchmarks] Refactor run_structured_output_benchmarks.sh ( #17722 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-05-13 01:47:29 -07:00
Michael Goin
ea6ae8cb45
[Bugfix] Fix marlin moe fallback logic for llama4 ( #18042 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-05-13 07:53:28 +00:00
Woosuk Kwon
2ff297dce9
[BugFix] Set default random seed to 0 for V1 ( #17929 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-05-13 07:52:19 +00:00
Jin Huang
8dd0671bac
[Bugfix][V1] Only get input embeddings w/ multi-modal models if first PP ( #17916 )
...
Signed-off-by: Jin Huang <jinhun@amazon.com >
Co-authored-by: Jin Huang <jinhun@amazon.com >
2025-05-13 15:10:07 +08:00
Chen Zhang
f0d610a8ae
[v1][KVCacheManager] Avoid full cache hit by controlling max_length ( #17999 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-05-13 06:50:38 +00:00
Driss Guessous
e57e4d6e9e
Fix Broken macro for cutlass moe ( #18049 )
...
Signed-off-by: drisspg <drisspguessous@gmail.com >
2025-05-12 23:31:06 -07:00
Nick Hill
ee5be834e7
[BugFix] Fix 4-GPU RLHF tests ( #18007 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-05-12 23:03:55 -07:00
Calvin Chen
48545728d8
cleanup invalid prints ( #18050 )
...
Signed-off-by: calvin chen <120380290@qq.com >
2025-05-12 23:01:57 -07:00
Chauncey
dc1a821768
[Feature][V1] Support tool_choice: required when using Xgrammar as the StructuredOutputBackend. ( #17845 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-05-12 23:01:31 -07:00
Cyrus Leung
61e0a506a3
[Bugfix] Avoid repeatedly creating dummy data during engine startup ( #17935 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-12 22:40:19 -07:00
Michael Goin
1df491c522
[Bugfix] Fixes for new marlin moe usage ( #18017 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-05-13 03:50:04 +00:00
Arjun Kathuria
d8487ef557
[ROCm]: Fix build from source failure with gcc14 and ROCm 6.3 ( #13779 )
...
Signed-off-by: Arjun Kathuria <arjun.kathuria8@gmail.com >
2025-05-12 20:36:33 -07:00
Jee Jee Li
c06af9a959
[Misc] Slight spelling modification ( #18039 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-05-12 20:36:27 -07:00
Tao He
60f7624334
Implements dual-chunk-flash-attn backend for dual chunk attention with sparse attention support ( #11844 )
2025-05-12 19:52:47 -07:00
hissu-hyvarinen
f6518b2b48
[ROCm] Skip tests for quantizations incompatible with ROCm ( #17905 )
...
Signed-off-by: Hissu Hyvarinen <hissu.hyvarinen@amd.com >
2025-05-12 18:39:28 -06:00
Harry Mellor
d67085c2c8
Remove noisy warnings from SchedulerConfig ( #17995 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-13 00:33:45 +00:00
Michael Goin
307939f299
Use NVFP4 Marlin for CompressedTensorsW4A16Fp4 ( #18000 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: Dipika <dipikasikka1@gmail.com >
Co-authored-by: Dipika <dipikasikka1@gmail.com >
2025-05-12 18:07:34 -06:00
Harry Mellor
9d7ea9dbbf
Update some more deprecated type hinting ( #17998 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-12 23:49:33 +00:00
bwshen-mi
acee8f48aa
[Model] Support MiMo-7B inference with MTP ( #17433 )
...
Signed-off-by: wp-alpha <wangpeng66@xiaomi.com >
Co-authored-by: wangpeng66 <wangpeng66@xiaomi.com >
2025-05-12 23:25:33 +00:00
Michael Goin
f065de4e88
Fix FBGEMM integration ( #18002 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-05-12 23:02:07 +00:00
wwl2755
dc9905368d
[V1][Spec Decode] Eagle unit tests ( #17350 )
...
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com >
2025-05-12 23:01:17 +00:00
Russell Bryant
ebab1ac37c
[CI] Make JSON output tests less likely to fail ( #17859 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-05-12 22:31:54 +00:00
Yang Wang
2b0db9b0e2
Enable standard language model for torhc nightly ( #18004 )
...
Signed-off-by: Yang Wang <elainewy@meta.com >
2025-05-12 14:00:04 -07:00
Robert Shaw
195adb47c0
[Chore] Remove unused method ( #18024 )
...
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com >
2025-05-12 13:59:47 -07:00
Chen Zhang
302f3aca7e
[v1][KVCacheManager] Change prefix caching metric from counting blocks to counting tokens ( #18003 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
2025-05-12 13:46:12 -07:00
Alexei-V-Ivanov-AMD
e9c730c9bd
Enabling "Weight Loading Multiple GPU Test - Large Models" ( #18020 )
2025-05-12 13:05:33 -07:00
Jade Zheng
289199feb6
[Core] Use platform-agnostic device control for DP engine core ( #17245 )
...
Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com >
2025-05-12 12:09:16 -07:00
Carol Zheng
b9fd0d7a69
[CI/Build] Fix TPU V1 Test mixed use of & and && across tests ( #17968 )
2025-05-12 12:06:59 -07:00
Harry Mellor
72a3f6b898
Construct KVTransferConfig properly from Python instead of using JSON blobs without CLI ( #17994 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-12 11:25:33 -07:00
Jonathan Berkhahn
98ea35601c
[Lora][Frontend]Add default local directory LoRA resolver plugin. ( #16855 )
...
Signed-off-by: jberkhahn <jaberkha@us.ibm.com >
2025-05-12 10:39:10 -07:00
Robert Shaw
d19110204c
[P/D] NIXL Integration ( #17751 )
...
Signed-off-by: ApostaC <yihua98@uchicago.edu >
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com >
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com >
Signed-off-by: Robert Shaw <rshaw@neuralmagic.com >
Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: Nick Hill <nhill@redhat.com >
Signed-off-by: Brent Salisbury <bsalisbu@redhat.com >
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com >
Co-authored-by: ApostaC <yihua98@uchicago.edu >
Co-authored-by: Robert Shaw <rshaw@neuralmagic.com >
Co-authored-by: mgoin <mgoin64@gmail.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com >
Co-authored-by: Brent Salisbury <bsalisbu@redhat.com >
2025-05-12 09:46:16 -07:00
Maximilien de Bayser
05a4324f8e
Initialize the delta tool call fields explicitly ( #17340 )
...
Signed-off-by: Max de Bayser <mbayser@br.ibm.com >
Co-authored-by: igmainc <igmainc@icloud.com >
2025-05-12 13:28:58 +00:00
Jee Jee Li
7ea6cb28b2
[Misc] Improve modelscope import error ( #17983 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-05-12 10:46:45 +00:00
Aaruni Aggarwal
9fbf2bfbd5
Correcting testcases in builkite job for IBM Power ( #17675 )
...
Signed-off-by: Aaruni Aggarwal <aaruniagg@gmail.com >
2025-05-12 08:11:55 +00:00
Xu Wenqing
3a5ea75129
[Feature] Support DeepSeekV3 Function Call ( #17784 )
...
Signed-off-by: 许文卿 <xwq391974@alibaba-inc.com >
Signed-off-by: Xu Wenqing <xuwq1993@qq.com >
2025-05-12 00:45:21 -07:00
Brayden Zhong
891b9d33de
[Fix] Benchmark "EngineClient" has no attribute "model_config" ( #17976 )
...
Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca >
2025-05-11 22:55:53 -07:00
Siyuan Liu
430783018c
[Bugfix][TPU] Use np array when updating cache slot_mapping ( #17971 )
...
Signed-off-by: Siyuan Liu <lsiyuan@google.com >
2025-05-12 12:58:33 +08:00
Li Wang
19a3c78d1f
[Bugfix] Fix pydantic.errors.PydanticUserError ( #17962 )
...
Signed-off-by: wangli <wangli858794774@gmail.com >
2025-05-12 12:58:23 +08:00
Reid
ada50aa295
[bugfix] fix the wrong parser ( #17958 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-12 04:58:02 +00:00
Cheng Kuan Yong Jason
08bf784078
[Bugfix] validate grammar and throw 400 error instead of crashing the engine when xgrammar validation fails ( #17623 )
...
Signed-off-by: Jason Cheng <jasoncky96@gmail.com >
Co-authored-by: Russell Bryant <rbryant@redhat.com >
2025-05-12 09:06:10 +08:00
youkaichao
d45fe333fb
[misc] add instructions on how to install nvshmem/pplx/deepep ( #17964 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com >
2025-05-11 18:02:39 -07:00
Isotr0py
021c16c7ca
[Model] Broadcast Ovis2 implementation to fit Ovis1.6 ( #17861 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-05-11 17:56:30 -07:00
TJian
7de18d541b
[BUG] [ROCm] [MLA] Fix variable name bug due to change in variable name in PR #17483 ( #17961 )
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
2025-05-11 09:14:30 -07:00
TJian
a810b5b088
[BugFix] [ROCm]: Bugfix and handle addition case of input for rocm_aiter_rms_norm ( #17857 )
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
2025-05-11 04:17:11 -07:00
Reid
009b3d5382
[Misc] not show --model in vllm serve --help ( #16691 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-11 08:47:58 +00:00
wang.yuqi
e4b8713380
[New Model]: nomic-embed-text-v2-moe ( #17785 )
2025-05-11 00:59:43 -07:00
Gregory Shtrasberg
06c0922a69
[FP8][ROCm][Attention] Enable FP8 KV cache on ROCm for V1 ( #17870 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
2025-05-11 15:58:45 +08:00
Dipika Sikka
cd3edfc908
[Misc] Add compressed-tensors NVFP4A16 emulation support ( #17914 )
...
Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com >
Signed-off-by: Dipika <dipikasikka1@gmail.com >
2025-05-11 15:58:38 +08:00
Frieda Huang
9cea90eab4
[Frontend] Add /classify endpoint ( #17032 )
...
Signed-off-by: Frieda (Jingying) Huang <jingyingfhuang@gmail.com >
2025-05-11 07:57:07 +00:00
Reid
d1110f5b5a
[doc] update lora doc ( #17936 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-11 15:56:21 +08:00
Ben Browning
8132365b74
[Bugfix]: v1 engine - consider lora adapters in allowed_token_ids ( #17855 )
...
Signed-off-by: Ben Browning <bbrownin@redhat.com >
2025-05-11 00:53:58 -07:00
Shiyan Deng
eea22a56ab
fix amd triton mla path ( #17871 )
2025-05-11 07:53:31 +00:00
Kuntai Du
9112155283
[Perf] Use small max_num_batched_tokens for A100 ( #17885 )
...
Signed-off-by: KuntaiDu <kuntai@uchicago.edu >
2025-05-11 07:53:23 +00:00
xinli-centml
90d0a74b60
[Bugfix] Add revision to transformers.Auto*.from_pretrained processors ( #17948 )
...
Signed-off-by: Xin Li <xin@centml.ai >
2025-05-11 07:52:44 +00:00
Jinzhen Lin
d74e5f37bc
[Kernel] fp4 marlin kernel ( #17687 )
...
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com >
2025-05-10 19:58:49 -07:00
Chen Zhang
ca66a1674c
[v1] Rename specialized_manager.py to single_type_kv_cache_manager.py ( #17946 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
2025-05-10 16:14:12 -07:00
Chen Zhang
950751a987
[v1] Pass BlockTable and KVCacheSpec to AttentionMetadataBuilders ( #17483 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
2025-05-10 16:12:04 -07:00
Reid
4c31218f80
[Misc] remove --model from vllm serve usage ( #17944 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-10 13:23:31 +00:00
Harry Mellor
68311891f5
Don't default construct ModelConfig when default constructing VllmConfig ( #17943 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-10 13:23:00 +00:00
Ximo Guanter
fc4441a4ee
Add missing content type headers to /ping and /health ( #17036 ) ( #17786 )
...
Signed-off-by: Ximo Guanter <ximo.guanter@gmail.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-10 07:13:32 +01:00
tracelogfb
246e3e0a36
fix broken test vllm:test_kernels - test_attention_selector.py::test_flash_attn ( #17873 )
...
Co-authored-by: Stephen Chen <tracelog@meta.com >
2025-05-10 10:46:54 +08:00
Mark McLoughlin
7042cc96b0
[V1][Spec Decoding] Log accumulated metrics after system goes idle ( #17913 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-05-09 18:23:07 -07:00
Pavani Majety
0c0fdae84f
[Hardware/NVIDIA/Kernel] Enable nvidia/DeepSeek-R1-FP4 Model ( #16362 )
2025-05-09 16:24:41 -07:00
Alexei-V-Ivanov-AMD
3b602cdea7
AMD conditional all test execution // new test groups ( #17556 )
...
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com >
Signed-off-by: Yida Wu <yidawu@alumni.cmu.edu >
2025-05-09 15:35:58 -07:00
Harry Mellor
4b2ed7926a
Improve configs - the rest! ( #17562 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-09 15:18:44 -07:00
Mark McLoughlin
7e3571134f
[V1][Spec Decoding] Include bonus tokens in mean acceptance length ( #17908 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-05-09 13:32:36 -07:00
Richard Zou
ea2236bf95
Add option to use torch._inductor.standalone_compile ( #17057 )
...
Signed-off-by: rzou <zou3519@gmail.com >
2025-05-09 12:59:04 -07:00
Harry Mellor
7d4aedae7c
Handle error when str passed to /v1/audio/transcriptions ( #17909 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-09 19:23:59 +00:00
Michael Goin
22481fbfa3
Update CT WNA16MarlinMoE integration ( #16666 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-05-09 13:19:45 -04:00
Isotr0py
5c4c08f6f1
[Misc] Auto fallback to float16 for pre-Ampere GPUs when detected bfloat16 config ( #17265 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-05-09 17:16:12 +00:00
Rui Qiao
c44c384b1c
[Misc] Add references in ray_serve_deepseek example ( #17907 )
...
Signed-off-by: Rui Qiao <ruisearch42@gmail.com >
2025-05-09 16:59:36 +00:00
Michael Goin
85b72cb7b1
Revert "[BugFix][AMD] Compatible patch for latest AITER(05/07/2025)" ( #17910 )
2025-05-09 08:58:18 -07:00
Cyrus Leung
6e5595ca39
[CI/Build] Automatically retry flaky tests ( #17856 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-09 09:55:17 -06:00
Chen Zhang
200da9a517
[v1] Move block management logic from KVCacheManager to SpecializedManager ( #17474 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
2025-05-09 15:25:34 +00:00
qli88
9f64e93415
[BugFix][AMD] Compatible patch for latest AITER(05/07/2025) ( #17864 )
...
Signed-off-by: Qiang Li <qiang.li2@amd.com >
2025-05-09 08:59:36 -06:00
Reid
ec61ea20a8
[Misc] add dify integration ( #17895 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-09 03:42:39 -07:00
Harry Mellor
c6798baa9c
Change top_k to be disabled with 0 (still accept -1 for now) ( #17773 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-09 10:01:49 +00:00
inkcherry
5b2dcbf0b8
Fix Whisper crash caused by invalid`` max_num_batched_tokens`` config ( #17853 )
...
Signed-off-by: inkcherry <mingzhi.liu@intel.com >
2025-05-09 09:16:26 +00:00
Isotr0py
6e4a93e3f7
[Bugfix][CPU] Fix broken AVX2 CPU TP support ( #17252 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-05-09 08:55:14 +00:00
vllmellm
217db4baa6
[Bugfix][ROCm] Fix AITER MLA V1 ( #17880 )
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
2025-05-09 08:38:21 +00:00
Yan Ma
ff8c400502
[Doc] remove visible token in doc ( #17884 )
...
Signed-off-by: yan <yanma1@habana.ai >
2025-05-09 01:21:31 -07:00
Michael Yao
89a0315f4c
[Doc] Update several links in reasoning_outputs.md ( #17846 )
...
Signed-off-by: windsonsea <haifeng.yao@daocloud.io >
2025-05-09 01:20:55 -07:00
Simon Mo
3d1e387652
[Docs] Add Slides from NYC Meetup ( #17879 )
...
Signed-off-by: simon-mo <simon.mo@hey.com >
2025-05-08 21:46:54 -07:00
Ning Xie
d310e6de98
[BUGFIX]: return fast when request requires prompt logprobs ( #17251 )
2025-05-08 21:25:41 -07:00
Lucas Wilkinson
5e6f939484
[Attention] MLA move rotary embedding to cuda-graph region ( #17668 )
...
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com >
2025-05-09 11:14:42 +08:00
Shanshan Shen
760e3ecc8f
[V1][Structured Output] Update llguidance (>= 0.7.11) to avoid AttributeError (no StructTag) ( #17839 )
...
Signed-off-by: shen-shanshan <467638484@qq.com >
2025-05-08 20:14:18 -07:00
vllmellm
3c9396a64f
[FEAT][ROCm]: Support AITER MLA on V1 Engine ( #17523 )
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
Co-authored-by: qli88 <qiang.li2@amd.com >
Co-authored-by: Hongxia Yang <62075498+hongxiayang@users.noreply.github.com >
2025-05-09 10:42:05 +08:00
Shu Wang
376786fac1
Add cutlass support for blackwell fp8 blockwise gemm ( #14383 )
...
Signed-off-by: Shu Wang <shuw@nvidia.com >
2025-05-08 15:09:55 -07:00
Michael Goin
4f605a6de5
Fix noisy warning for uncalibrated q_scale/p_scale ( #17414 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-05-08 15:56:59 -04:00
Michael Goin
8342e3abd1
[CI] Prune down lm-eval small tests ( #17012 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-05-08 19:00:26 +00:00
yarongmu-google
a83a0f92b5
[Test] Attempt all TPU V1 tests, even if some of them fail. ( #17334 )
...
Signed-off-by: Yarong Mu <ymu@google.com >
2025-05-08 17:20:54 +00:00
Russell Bryant
226a4272cf
[V1] Improve VLLM_ALLOW_INSECURE_SERIALIZATION logging ( #17860 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-05-08 16:57:35 +00:00
Russell Bryant
ec54d73c31
[CI] Fix test_collective_rpc ( #17858 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-05-08 16:47:12 +00:00
Jee Jee Li
a944f8ede7
[Misc] Delete LoRA-related redundancy code ( #17841 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-05-08 06:02:21 -07:00
Cyrus Leung
015815fe01
[Bugfix] use_fast failing to be propagated to Qwen2-VL image processor ( #17838 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-08 05:39:21 -07:00
Harry Mellor
e4ca6e3a99
Fix transient dependency error in docs build ( #17848 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-08 03:42:03 -07:00
Reid
53d0cb7423
[Misc] add chatbox integration ( #17828 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-08 10:05:26 +00:00
Lu Fang
f50dcb7c21
[Easy] Eliminate c10::optional usage in vllm/csrc ( #17819 )
2025-05-08 03:05:10 -07:00
Cyrus Leung
a1e19b635d
[Doc] Fix a typo in the file name ( #17836 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-08 18:04:18 +08:00
fxmarty-amd
bb239a730f
[Bugfix] Fix quark fp8 format loading on AMD GPUs ( #12612 )
...
Signed-off-by: Felix Marty <felmarty@amd.com >
Signed-off-by: kewang2 <kewang2@amd.com >
Co-authored-by: kewang2 <kewang2@amd.com >
2025-05-08 02:53:53 -07:00
Jevin Jiang
a463555dee
[TPU] Fix the test_sampler ( #17820 )
2025-05-08 05:51:33 -04:00
Rick Yuan
ca04b97c93
[Bugfix] Fix tool call template validation for Mistral models ( #17644 )
...
Signed-off-by: Rick Yuan <yuan821120@gmail.com >
Signed-off-by: RIck Yuan <yuan821120@gmail.com >
Co-authored-by: Aaron Pham <Aaronpham0103@gmail.com >
2025-05-08 09:47:19 +00:00
xsank
0a9bbaa104
[Misc] support model prefix & add deepseek vl2 tiny fused moe config ( #17763 )
...
Signed-off-by: 唯勤 <xsank.mz@alibaba-inc.com >
Co-authored-by: 唯勤 <xsank.mz@alibaba-inc.com >
2025-05-08 07:50:22 +00:00
Qiong Zhou Huang
39956efb3f
[Bugfix] Fix bad words for Mistral models ( #17753 )
...
Signed-off-by: Qiong Zhou Huang <qiong@phonic.co >
2025-05-07 23:32:10 -07:00
Ximingwang-09
597051e56f
[Qwen3]add qwen3-235b-bf16 fused moe config on A100 ( #17715 )
2025-05-07 23:09:32 -07:00
Cyrus Leung
96722aa81d
[Frontend] Chat template fallbacks for multimodal models ( #17805 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-07 23:05:54 -07:00
Agata Dobrzyniewicz
843b222723
[Hardware][Intel-Gaudi] Support Automatic Prefix Caching on HPU ( #17648 )
...
Signed-off-by: Agata Dobrzyniewicz <adobrzyniewicz@habana.ai >
2025-05-07 22:37:03 -07:00
Akash kaothalkar
e515668edf
[Hardware][Power] Enable compressed tensor W8A8 INT8 quantization for POWER ( #17153 )
...
Signed-off-by: Akash Kaothalkar <akash.kaothalkar@ibm.com >
Co-authored-by: Akash Kaothalkar <akash.kaothalkar@ibm.com >
Co-authored-by: mgoin <mgoin64@gmail.com >
2025-05-07 22:35:03 -07:00
Hashem Hashemi
5a499e70d5
[Kernel][Hardware][AMD] Bf16 mfma opt for ROCm skinny GEMMs ( #17071 )
...
Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com >
Signed-off-by: charlifu <charlifu@amd.com >
Co-authored-by: charlifu <charlifu@amd.com >
2025-05-07 22:34:49 -07:00
Russell Bryant
6930a41116
[V1] Add VLLM_ALLOW_INSECURE_SERIALIZATION env var ( #17490 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
Signed-off-by: Nick Hill <nhill@redhat.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
2025-05-08 13:34:02 +08:00
Harry Mellor
998eea4a0e
Only log non-default CLI args for online serving ( #17803 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-07 22:33:29 -07:00
Mikhail Podvitskii
c747d84576
[Installation] OpenTelemetry version update ( #17771 )
...
Signed-off-by: Mikhail Podvitskii <podvitskiymichael@gmail.com >
2025-05-07 22:32:49 -07:00
Vadim Markovtsev
b2da14a05a
Improve exception reporting in MP engine ( #17800 )
...
Signed-off-by: Vadim Markovtsev <vadim@poolside.ai >
2025-05-08 05:32:39 +00:00
Chanh Nguyen
7ea2adb802
[Core] Support full cuda graph in v1 ( #16072 )
...
Signed-off-by: Chanh Nguyen <cnguyen@linkedin.com >
Co-authored-by: Chanh Nguyen <cnguyen@linkedin.com >
2025-05-07 22:30:15 -07:00
Nick Hill
3d13ca0e24
[BugFix] Fix --disable-log-stats in V1 server mode ( #17600 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-05-08 04:08:15 +00:00
Harry Mellor
66ab3b13c9
Don't call the venv vllm ( #17810 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-08 04:06:39 +00:00
Aaron Pham
a8238bbdb0
[Chore][Doc] uses model id determined from OpenAI client ( #17815 )
...
Signed-off-by: Aaron Pham <contact@aarnphm.xyz >
2025-05-08 01:48:57 +00:00
Wallas Henrique
d43f914d42
[Core][Feature] Input metadata dump on crash ( #13407 )
...
Signed-off-by: Wallas Santos <wallashss@ibm.com >
2025-05-07 22:15:09 +00:00
Nick Hill
ed5272cf21
[BugFix] Avoid secondary missing MultiprocExecutor.workers error ( #17811 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-05-07 21:55:04 +00:00
Akshat Tripathi
c20ef40fd0
[Hardware][TPU][V1] Multi-LoRA implementation for the V1 TPU backend ( #14238 )
...
Signed-off-by: Akshat Tripathi <akshat@krai.ai >
Signed-off-by: Chengji Yao <chengjiyao@google.com >
Co-authored-by: Chengji Yao <chengjiyao@google.com >
2025-05-07 16:28:47 -04:00
Bowen Bao
db593aa67f
[Quantization] Quark MXFP4 format loading ( #16943 )
2025-05-07 15:05:05 -04:00
Isotr0py
f98e307588
[Bugfix] Fix missing lora name mapping for lora without prefix ( #17793 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-05-07 16:17:12 +00:00
Harry Mellor
646a31e51e
Fix and simplify deprecated=True CLI kwarg ( #17781 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-07 16:51:06 +01:00
Isotr0py
be8ff88e66
[Bugfix] Fix Video IO error for short video ( #17791 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-05-07 15:36:06 +00:00
Christian Heimes
1a6af1453d
Only depend on importlib-metadata for Python < 3.10 ( #17776 )
...
Signed-off-by: Christian Heimes <christian@python.org >
2025-05-07 07:51:06 -07:00
Gregory Shtrasberg
32aa74c09c
[ROCm][FP8][Kernel] FP8 quantization fused into Custom Paged Attention ( #17139 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
2025-05-07 07:12:35 -07:00
Reid
7377dd0307
[doc] update the issue link ( #17782 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-07 20:29:05 +08:00
Yong Hoon Shin
98c89e16ff
Make key optional for rotary embedding ( #17566 )
...
Signed-off-by: Yong Hoon Shin <yhshin@meta.com >
2025-05-07 00:11:46 -07:00
Yong Hoon Shin
324a3119b0
Fix test_memory_usage_no_spec ( #17754 )
...
Signed-off-by: Yong Hoon Shin <yhshin@meta.com >
2025-05-07 00:10:33 -07:00
Cyrus Leung
8a15c2603a
[Frontend] Add missing chat templates for various MLLMs ( #17758 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-07 00:10:01 -07:00
Satyajith Chilappagari
043e4c4955
Add NeuronxDistributedInference support, Speculative Decoding, Dynamic on-device sampling ( #16357 )
...
Signed-off-by: Satyajith Chilappagari <satchill@amazon.com >
Co-authored-by: Aaron Dou <yzdou@amazon.com >
Co-authored-by: Shashwat Srijan <sssrijan@amazon.com >
Co-authored-by: Chongming Ni <chongmni@amazon.com >
Co-authored-by: Amulya Ballakur <amulyaab@amazon.com >
Co-authored-by: Patrick Lange <patlange@amazon.com >
Co-authored-by: Elaine Zhao <elaineyz@amazon.com >
Co-authored-by: Lin Lin Pan <tailinpa@amazon.com >
Co-authored-by: Navyadhara Gogineni <navyadha@amazon.com >
Co-authored-by: Yishan McNabb <yishanm@amazon.com >
Co-authored-by: Mrinal Shukla <181322398+mrinalks@users.noreply.github.com >
2025-05-07 00:07:30 -07:00
Jee Jee Li
ba7703e659
[Misc] Remove qlora_adapter_name_or_path ( #17699 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-05-06 23:10:37 -07:00
Wanrui Dai
f80ae5bdcf
[Kernel] Use fused rmsnorm for some models like qwen3 series ( #17735 )
...
Signed-off-by: evian <eviantai@u.nus.edu >
Co-authored-by: evian <eviantai@u.nus.edu >
2025-05-06 23:10:02 -07:00
Szymon Ożóg
1a45a61387
[Kernel] GGUF MoeVec kernel ( #16780 )
...
Signed-off-by: SzymonOzog <szymon.ozog@aleph-alpha.com >
Signed-off-by: SzymonOzog <szymon.ozog@gmail.com >
Signed-off-by: Isotr0py <2037008807@qq.com >
Co-authored-by: Isotr0py <2037008807@qq.com >
2025-05-06 23:07:23 -07:00
Isotr0py
c3e9d5060e
[Misc] Use apply_rotary_emb from vllm_flash_attn for Qwen2-VL vision RoPE ( #17726 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-05-07 04:51:33 +00:00
Jee Jee Li
822de7fb94
[Misc] Split model loader ( #17712 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-05-07 12:42:26 +08:00
Woosuk Kwon
8d84d836d1
[BugFix][Spec Decode] Fix hidden size mismatch between target and eagle head ( #17740 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-05-06 19:51:26 -07:00
Michael Goin
950b71186f
Replace lm-eval bash script with pytest and use enforce_eager for faster CI ( #17717 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-05-06 18:00:10 -07:00
Michael Goin
e50a1f1a9c
[TPU] Add kernel test for moe_pallas ( #17496 )
...
Signed-off-by: Michael Goin <mgoin64@gmail.com >
2025-05-06 17:59:57 -07:00
Michael Goin
a17cef70ea
Removed unused marlin cuda code ( #17684 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-05-06 17:59:47 -07:00
Chih-Chieh Yang
18dd5e01f2
[Model] Mamba2 causal conv1d Refactor to Split Prefill and Decode Requests for Corresponding Kernels ( #17146 )
...
Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com >
2025-05-06 17:59:30 -07:00
Yang Wang
6de3e13413
Add logging for torch nightly version ( #17669 )
...
Signed-off-by: Yang Wang <elainewy@meta.com >
2025-05-07 00:45:51 +00:00
Hongxia Yang
ed3a1d2106
[ROCm] fix num_stages for default moe config to avoid triton OutOfResource error ( #17744 )
...
Signed-off-by: Hongxia Yang <hongxia.yang@amd.com >
2025-05-07 00:39:48 +00:00
Harry Mellor
022afbeb4e
Fix doc build performance ( #17748 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-07 00:36:41 +00:00
Thomas Parnell
2f925e5777
[Kernel] Unified Triton kernel that doesn't distinguish between prefill + decode ( #16828 )
...
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com >
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com >
Co-authored-by: Lucas Wilkinson <lwilkinson@neuralmagic.com >
2025-05-06 18:21:48 -04:00
Gregory Shtrasberg
de906b95f9
[Bugfix] Fix for the condition to accept empty encoder inputs for mllama ( #17732 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
2025-05-06 19:59:06 +00:00
d.transposed
d456aea71f
[Misc] Add Next Edit Prediction (NEP) datasets support in benchmark_serving.py ( #16839 )
...
Signed-off-by: dtransposed <damian@damian-ml-machine.europe-west3-b .c.jetbrains-grazie.internal>
Signed-off-by: dtransposed <>
Co-authored-by: dtransposed <damian@damian-ml-machine.europe-west3-b .c.jetbrains-grazie.internal>
2025-05-06 15:38:45 -04:00
Jevin Jiang
621ca2c0ab
[TPU] Increase block size and reset block shapes ( #16458 )
2025-05-06 13:55:04 -04:00
Harry Mellor
6115b11582
Make right sidebar more readable in "Supported Models" ( #17723 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-06 16:48:26 +00:00
Cyrus Leung
5b8c390747
[Bugfix] Fix modality limits in vision language example ( #17721 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-06 16:12:28 +00:00
Reid
7525d5f3d5
[doc] Add RAG Integration example ( #17692 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-06 16:10:23 +00:00
Chen Zhang
aabcd2cae3
[v1] Introduce KVCacheBlocks as interface between Scheduler and KVCacheManager ( #17479 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
2025-05-06 08:50:34 -07:00
Michael Yao
0d115460a7
[Docs] Use gh-file to add links to tool_calling.md ( #17709 )
...
Signed-off-by: windsonsea <haifeng.yao@daocloud.io >
2025-05-06 15:27:19 +00:00
Aaron Pham
175bda67a1
[Feat] Add deprecated=True to CLI args ( #17426 )
...
Signed-off-by: Aaron Pham <contact@aarnphm.xyz >
2025-05-06 08:11:27 -07:00
Chen Zhang
cba31c47c4
[v1] AttentionMetadata for each layer ( #17394 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
2025-05-06 07:58:37 -07:00
Li, Jiang
a6fed02068
[V1][PP] Support PP for MultiprocExecutor ( #14219 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com >
Signed-off-by: jiang.li <jiang1.li@intel.com >
2025-05-06 07:58:05 -07:00
Michael Goin
d419aa5dc4
[V1] Enable TPU V1 backend by default ( #17673 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-05-06 06:49:49 -07:00
Mengqing Cao
f9bc5a0693
[Bugfix] Fix triton import with local TritonPlaceholder ( #17446 )
...
Signed-off-by: Mengqing Cao <cmq0113@163.com >
2025-05-06 17:53:09 +08:00
Harry Mellor
05e1f96419
Fix dockerfilegraph pre-commit hook ( #17698 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-06 08:56:48 +00:00
Lucas Wilkinson
6eae34533a
[Misc] Fix ScalarType float4 naming ( #17690 )
...
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com >
2025-05-06 01:07:15 -07:00
Cyrus Leung
63ced7b43f
[Doc] Update notes for H2O-VL and Gemma3 ( #17219 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-06 07:51:02 +00:00
Mikhail Podvitskii
dc47ba32f8
[Bugfix] Fixed prompt length for random dataset ( #17408 )
...
Signed-off-by: Mikhail Podvitskii <podvitskiymichael@gmail.com >
2025-05-06 07:00:08 +00:00
Richard Zou
edbf2d609e
[easy] Fix logspam on PiecewiseBackend errors ( #17138 )
...
Signed-off-by: rzou <zou3519@gmail.com >
2025-05-05 23:46:11 -07:00
Stan Wozniak
999328be0d
[Model] Add GraniteMoeHybrid 4.0 model ( #17497 )
...
Signed-off-by: Thomas Ortner <boh@zurich.ibm.com >
Signed-off-by: Stanislaw Wozniak <stw@zurich.ibm.com >
Co-authored-by: Thomas Ortner <boh@zurich.ibm.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com >
2025-05-06 12:00:31 +08:00
Michael Goin
98834fefaa
Update nm to rht in doc links + refine fp8 doc ( #17678 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-05-06 00:41:14 +00:00
Varun Sundar Rabindranath
90bd2ae172
[Bugfix] LoRA - Retire unused maxnreg LoRA kernel argument ( #17677 )
2025-05-05 17:34:29 -07:00
Nicolò Lucchesi
5941e0b7ea
[TPU][V1] Add support for top-logprobs ( #17072 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-05-05 14:20:15 -07:00
XiongfeiWei
9765940824
[TPU] Enable gemma3-27b with TP>1 on multi-chips. ( #17335 )
...
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com >
2025-05-05 14:19:58 -07:00
Nick Hill
5ea5c514da
[BugFix] Increase timeout for startup failure test ( #17642 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-05-05 20:53:19 +00:00
Russell Bryant
d3efde8176
[Benchmarks] Remove invalid option under V1 engine ( #17651 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-05-05 16:30:22 -04:00
Thomas J. Fan
aea302be6c
Use git-path commit in hook ( #17616 )
...
Signed-off-by: Thomas J. Fan <thomasjpfan@gmail.com >
2025-05-05 17:55:32 +00:00
Isotr0py
cc05b90d86
[Doc] Fix broken cuda installation doc rendering ( #17654 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-05-05 17:52:40 +00:00
Jinzhen Lin
1d0c9d6b2d
[Kernel] some optimizations for dense marlin and moe marlin ( #16850 )
...
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com >
2025-05-05 09:39:30 -07:00
Tyler Michael Smith
f62cad6431
[Build/CI] Upgrade CUTLASS to 3.9.2 ( #17641 )
...
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com >
2025-05-04 19:23:17 -07:00
Chauncey
5394ad7387
[Bugfix] fix KeyError on top logprobs are special tokens ( #17637 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-05-04 19:22:35 -07:00
Tyler Michael Smith
68e1ee0072
[Bugfix][Easy] Fix whitespace in shm_broadcast.py logging ( #17635 )
...
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com >
2025-05-04 19:20:19 -07:00
Cyrus Leung
2858830c39
[Bugfix] Prioritize dtype in root config before checking text config ( #17629 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-04 12:43:05 +00:00
Harry Mellor
d6484ef3c3
Add full API docs and improve the UX of navigating them ( #17485 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-03 19:42:43 -07:00
Cyrus Leung
46fae69cf0
[Misc] V0 fallback for --enable-prompt-embeds ( #17615 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-03 22:59:24 +00:00
Isotr0py
f66f1e0fa3
[Bugfix] Fix broken Qwen2.5-omni tests ( #17613 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-05-03 17:08:14 +00:00
Cyrus Leung
887d7af882
[Core] Gate prompt_embeds behind a feature flag ( #17607 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-04 00:19:20 +08:00
Gregory Shtrasberg
a92842454c
[Bugfix][ROCm] Using device_type because on ROCm the API is still torch.cuda ( #17601 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
2025-05-02 22:25:47 -07:00
Tyler Michael Smith
c8386fa61d
[Build/CI] Upgrade CUTLASS to 3.9.1 ( #17602 )
...
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com >
2025-05-02 22:25:14 -07:00
Chenyaaang
87baebebd8
[Frontend][TPU] Add TPU default max-num-batched-tokens based on device name ( #17508 )
...
Signed-off-by: Chenyaaang <chenyangli@google.com >
2025-05-02 21:42:44 -07:00
rasmith
e3d0a1d190
[Quantizaton] [AMD] Add support for running DeepSeek int8 w8a8 MoE on ROCm ( #17558 )
...
Signed-off-by: Randall Smith <Randall.Smith@amd.com >
2025-05-02 21:41:10 -07:00
22quinn
d47b605eca
Update test requirements to CUDA 12.8 ( #17576 )
...
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com >
2025-05-02 21:40:15 -07:00
Liangfu Chen
22c6f6397f
[Neuron][Build] Require setuptools >= 77.0.3 for PEP 639 ( #17603 )
...
Signed-off-by: Liangfu Chen <liangfc@amazon.com >
2025-05-03 02:41:59 +00:00
Kevin H. Luu
3ec97e2cc5
[release] Add command to clean up Docker containers/images in TPU release machine ( #17606 )
2025-05-02 18:54:34 -07:00
Eric Hartford
9b103a1d76
fix typo in logging ( #17605 )
2025-05-02 18:04:40 -07:00
Richard Zou
b90b0852e9
[easy] Print number of needed GPUs in skip message ( #17594 )
...
Signed-off-by: rzou <zou3519@gmail.com >
2025-05-02 15:27:43 -07:00
Xiaodong Wang
9352cdb56d
[Hardware][AMD] Improve OAM device ID + llama4 Maverick MOE tuning ( #16263 )
...
Signed-off-by: Lu Fang <lufang@fb.com >
Co-authored-by: Lu Fang <lufang@fb.com >
2025-05-02 19:44:19 +00:00
Zhiyu
182f40ea8b
Add NVIDIA TensorRT Model Optimizer in vLLM documentation ( #17561 )
2025-05-02 11:36:46 -07:00
Caleb_Du
3e887d2e0c
permute/unpermute kernel for moe optimization ( #14568 )
...
Signed-off-by: Caleb_Du <Caleb_Du@zju.edu.cn >
2025-05-02 11:31:55 -07:00
Lucas Wilkinson
0f87d8f7b2
[BugFix][Attention] Fix sliding window attention in V1 giving incorrect results ( #17574 )
...
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com >
2025-05-02 11:01:38 -07:00
Hui Liu
4c33d67321
[Bugfix] fix tmp_out and exp_sums dimensions ( #17438 )
...
Signed-off-by: Hui Liu <96135754+hliuca@users.noreply.github.com >
2025-05-02 16:44:07 +00:00
Cyrus Leung
cb234955df
[Misc] Clean up input processing ( #17582 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-02 08:11:53 -07:00
Reid
3a500cd0b6
[doc] miss result ( #17589 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-02 07:04:49 -07:00
Michael Goin
868c546da4
Support W8A8 INT8 MoE for compressed-tensors ( #16745 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-05-02 10:03:32 -04:00
Cyrus Leung
99404f53c7
[Security] Fix image hash collision ( #17378 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-02 08:36:39 -04:00
Harry Mellor
785d75a03b
Automatically tell users that dict args must be valid JSON in CLI ( #17577 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-02 05:24:55 -07:00
Reid
6d1479ca4b
[doc] add the print result ( #17584 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-02 05:24:45 -07:00
Yang Wang
b8b0859b5c
add more pytorch related tests for torch nightly ( #17422 )
...
Signed-off-by: Yang Wang <elainewy@meta.com >
2025-05-02 03:29:59 -07:00
Cyrus Leung
d7543862bd
[Misc] Rename assets for testing ( #17575 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-02 03:29:25 -07:00
Robert Shaw
c777df79f7
[BugFix] Fix Memory Leak ( #17567 )
...
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com >
2025-05-02 01:07:03 -07:00
Andrew Sansom
cc2a77d7f1
[Core] [Bugfix] Add Input Embeddings ( #15428 )
...
Signed-off-by: Andrew Sansom <andrew@protopia.ai >
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Co-authored-by: 临景 <linjing.yx@alibaba-inc.com >
Co-authored-by: Bryce1010 <bryceyx@gmail.com >
Co-authored-by: Nan2018 <nan@protopia.ai >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-02 01:06:39 -07:00
Isotr0py
9e2de9b9e9
[Bugifx] Remove TritonPlaceholder from sys.modules ( #17317 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-05-02 00:45:01 -07:00
Jerry Zhang
109e15a335
Add pt_load_map_location to allow loading to cuda ( #16869 )
...
Signed-off-by: Jerry Zhang <jerryzh168@gmail.com >
2025-05-01 23:23:42 -07:00
Michael Goin
f192ca90e6
Fix PixtralHF missing spatial_merge_size ( #17571 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-05-01 22:14:09 -07:00
Cyrus Leung
f89d0e11bf
[Misc] Continue refactoring model tests ( #17573 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-01 22:06:08 -07:00
Michael Goin
b4003d11fc
Check if bitblas is installed during support check ( #17572 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-05-02 04:32:54 +00:00
Michael Goin
292fc59d61
[CI] Actually run tests/kv_transfer/test_disagg.py in CI ( #17555 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-05-02 04:05:04 +00:00
Lucas Wilkinson
afcb3f8863
[Attention] MLA move o_proj q_proj into cuda-graph region ( #17484 )
...
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com >
2025-05-02 03:16:26 +00:00
David Xia
afb12e4294
[Doc] note that not all unit tests pass on CPU platforms ( #17554 )
...
Signed-off-by: David Xia <david@davidxia.com >
2025-05-02 02:57:21 +00:00
Michael Goin
24aebae177
[Bugfix] Disable gptq_bitblas for <SM80 to fix GPTQ on V100/T4 ( #17541 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-05-01 17:59:35 -07:00
qizixi
39c0813a7f
[V1][Spec Decode] Apply torch.compile & cudagraph to EAGLE3 ( #17504 )
...
Signed-off-by: qizixi <qizixi@meta.com >
2025-05-01 16:19:30 -07:00
Chenyaaang
9b70e2b4c1
[Misc][Tools][Benchmark] Publish script to auto tune server parameters ( #17207 )
...
Signed-off-by: Chenyaaang <chenyangli@google.com >
2025-05-01 19:53:03 +00:00
Chen Xia
173daac19d
[Bug]change the position of cuda_graph_sizes in dataclasses ( #17548 )
...
Signed-off-by: CXIAAAAA <cxia0209@gmail.com >
2025-05-01 11:52:37 -07:00
sstamenk
04f2cfc894
Remove duplicate code from dbrx.py ( #17550 )
2025-05-01 11:51:58 -07:00
Juan Villamizar
811a6c0972
[ROCM] Add gfx950 to the custom attention archs ( #16034 )
...
Signed-off-by: jpvillam <Juan.Villamizar@amd.com >
Signed-off-by: seungrokjung <seungrok.jung@amd.com >
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
Co-authored-by: seungrokjung <seungrok.jung@amd.com >
Co-authored-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
2025-05-01 11:18:28 -07:00
Cyrus Leung
9b1769dd9a
[Bugfix] Fix lint error ( #17547 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-01 11:12:19 -07:00
Chen Xia
61c299f81f
[Misc]add configurable cuda graph size ( #17201 )
...
Signed-off-by: CXIAAAAA <cxia0209@gmail.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-01 11:04:50 -07:00
Hongxia Yang
4acfa3354a
[ROCm] update installation guide to include build aiter from source instructions ( #17542 )
...
Signed-off-by: Hongxia Yang <hongxia.yang@amd.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-05-01 11:01:28 -07:00
Isotr0py
88c8304104
[Model] Refactor Ovis2 to support original tokenizer ( #17537 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-05-01 11:00:53 -07:00
Harry Mellor
6768ff4a22
Move the last arguments in arg_utils.py to be in their final groups ( #17531 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-01 10:31:44 -07:00
Cyrus Leung
f2e7af9b86
[CI/Build] Remove awscli dependency ( #17532 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-01 09:20:54 -07:00
Reid
7423cf0a9b
[Misc] refactor example - cpu_offload_lmcache ( #17460 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-01 15:05:24 +00:00
Sage Moore
460a2b1100
[torch.compile] Add torch inductor pass for fusing silu_and_mul with subsequent scaled_fp8_quant operations ( #10867 )
...
Signed-off-by: Sage Moore <sage@neuralmagic.com >
2025-05-01 07:59:28 -07:00
Hongxia Yang
28566d73b3
[ROCm] remove unsupported archs from rocm triton flash-attention supported list ( #17536 )
...
Signed-off-by: Hongxia Yang <hongxia.yang@amd.com >
2025-05-01 07:54:25 -07:00
Chauncey
98060b001d
[Feature][Frontend]: Deprecate --enable-reasoning ( #17452 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-05-01 06:46:16 -07:00
TJian
f5a3c655b2
[FEAT] [ROCm]: Add Qwen/Qwen3-235B-A22B-FP8 TP4 triton fused moe config ( #17535 )
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
2025-05-01 06:37:17 -07:00
Reid
7169f87ad0
[doc] add streamlit integration ( #17522 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-05-01 13:34:02 +00:00
Huy Do
b74d888c63
Fix more broken speculative decode tests ( #17450 )
...
Signed-off-by: Huy Do <huydhn@gmail.com >
2025-05-01 06:05:58 -07:00
TJian
2007d4d54f
[FEAT] [ROCm]: Add Qwen/Qwen3-30B-A3B-FP8 fused moe config for MI300X ( #17530 )
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
2025-05-01 06:03:13 -07:00
Cyrus Leung
48e925fab5
[Misc] Clean up test docstrings and names ( #17521 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-01 05:19:32 -07:00
Cyrus Leung
1903c0b8a3
[Frontend] Show progress bar for adding requests ( #17525 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-05-01 05:15:32 -07:00
Teruaki Ishizaki
86a1f67a3b
[Bugfix][Benchmarks] Allow benchmark of deepspeed-mii backend to select a model ( #17285 )
...
Signed-off-by: Teruaki Ishizaki <teruaki.ishizaki@ntt.com >
2025-05-01 11:54:51 +00:00
Harry Mellor
a257d9bccc
Improve configs - ObservabilityConfig ( #17453 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-05-01 03:52:05 -07:00
Chauncey
015069b017
[Misc] Optimize the Qwen3_ReasoningParser extract_reasoning_content ( #17515 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-05-01 03:29:01 -07:00
Russell Bryant
fbefc8a78d
[Core] Enable IPv6 with vllm.utils.make_zmq_socket() ( #16506 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-05-01 09:38:18 +00:00
Keyun Tong
26bc4bbcd8
Avoid overwriting vllm_compile_cache.py ( #17418 )
...
Signed-off-by: Keyun Tong <tongkeyun@gmail.com >
2025-05-01 07:30:57 +00:00
Lucas Wilkinson
3c3d767201
[BugFix] Fix mla cpu - missing 3 required positional arguments ( #17494 )
...
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com >
2025-05-01 14:36:52 +08:00
Noah Yoshida
13cf6b6236
[BugFix] fix speculative decoding memory leak when speculation is disabled ( #15506 )
...
Signed-off-by: Noah Yoshida <noahcy117@gmail.com >
2025-04-30 23:28:17 -07:00
Hongxia Yang
90d0a54c4d
[ROCm] Effort to reduce the number of environment variables in command line ( #17229 )
...
Signed-off-by: Hongxia Yang <hongxia.yang@amd.com >
2025-04-30 23:27:06 -07:00
Russell Bryant
7a0a146c54
[Build] Require setuptools >= 77.0.3 for PEP 639 ( #17389 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-04-30 23:25:36 -07:00
Alexei-V-Ivanov-AMD
7ab643e425
FIxing the AMD test failures caused by PR#16457 ( #17511 )
...
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com >
2025-04-30 23:23:07 -07:00
Cyrus Leung
afb4429b4f
[CI/Build] Reorganize models tests ( #17459 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-04-30 23:03:08 -07:00
Michael Goin
aa4502e7f3
[CI][Bugfix] Fix failing V1 Test due to missing 'cache_salt' arg ( #17500 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-04-30 21:03:30 -07:00
Michael Goin
17b4d85f63
[CI][TPU] Skip structured outputs+spec decode tests on TPU ( #17510 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-04-30 20:36:20 -07:00
NaLan ZeYu
1144a8efe7
[Bugfix] Temporarily disable gptq_bitblas on ROCm ( #17411 )
...
Signed-off-by: Yan Cangang <nalanzeyu@gmail.com >
2025-04-30 19:51:45 -07:00
Gregory Shtrasberg
08fb5587b4
[Bugfix][ROCm] Fix import error on ROCm ( #17495 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
2025-04-30 19:51:42 -07:00
Siyuan Liu
dbc18e7816
[CI][TPU] Skip Multimodal test ( #17488 )
...
Signed-off-by: Siyuan Liu <lsiyuan@google.com >
2025-04-30 19:51:39 -07:00
Alex Brooks
02bd654846
[Misc] Rename Audios -> Audio in Qwen2audio Processing ( #17507 )
...
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com >
2025-04-30 19:51:36 -07:00
Rahul Tuli
200bbf92e8
Bump Compressed Tensors version to 0.9.4 ( #17478 )
...
Signed-off-by: Rahul Tuli <rtuli@redhat.com >
Co-authored-by: mgoin <mgoin64@gmail.com >
2025-04-30 15:24:45 -07:00
Chen Zhang
81ecf425f0
[v1][Spec Decode] Make sliding window compatible with eagle prefix caching ( #17398 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
2025-04-30 18:25:53 +00:00
David Xia
42d9a2c4c7
doc: fix bug report Github template formatting ( #17486 )
...
Signed-off-by: David Xia <david@davidxia.com >
2025-04-30 10:03:20 -07:00
Reid
2ac74d098e
[doc] add install tips ( #17373 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-04-30 17:02:41 +00:00
Gregory Shtrasberg
584f5fb4c6
[Bugfix][ROCm] Restrict ray version due to a breaking release ( #17480 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
2025-04-30 09:59:06 -07:00
zh Wang
d586ddc691
[BugFix] Fix authorization of openai_transcription_client.py ( #17321 )
...
Signed-off-by: zh Wang <rekind133@outlook.com >
2025-04-30 09:51:05 -07:00
Michael Goin
0b7e701dd4
[Docs] Update optimization.md doc ( #17482 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-04-30 09:34:02 -07:00
Russell Bryant
947f2f5375
[V1] Allow turning off pickle fallback in vllm.v1.serial_utils ( #17427 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-04-30 16:10:54 +00:00
Pete Savage
739e03b344
[Bugfix] Fixed mistral tokenizer path when pointing to file ( #17457 )
...
Signed-off-by: Pete Savage <psavage@redhat.com >
2025-04-30 08:08:37 -07:00
Aaron Pham
da4e7687b5
[Fix] Support passing args to logger ( #17425 )
...
Signed-off-by: Aaron Pham <contact@aarnphm.xyz >
2025-04-30 08:06:58 -07:00
Russell Bryant
39317cf42b
[Docs] Add command for running mypy tests from CI ( #17475 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-04-30 08:06:09 -07:00
Chauncey
2990cee95b
[Feature] The Qwen3 reasoning parser supports guided decoding ( #17466 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-04-30 07:48:21 -07:00
Alec
0be6d05b5e
[V1][Metrics] add support for kv event publishing ( #16750 )
...
Signed-off-by: alec-flowers <aflowers@nvidia.com >
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
Co-authored-by: Mark McLoughlin <markmc@redhat.com >
2025-04-30 07:44:45 -07:00
Marko Rosenmueller
77073c77bc
[Core] Prevent side-channel attacks via cache salting ( #17045 )
...
Signed-off-by: Marko Rosenmueller <5467316+dr75@users.noreply.github.com >
2025-04-30 20:27:21 +08:00
Nicolò Lucchesi
a7d5b016bd
[TPU][V1][CI] Update regression test baseline for v6 CI ( #17064 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-04-30 04:03:22 -07:00
rongfu.leng
d803786731
[V1][Bugfix]: vllm v1 verison metric num_gpu_blocks is None ( #15755 )
...
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io >
2025-04-30 18:20:39 +08:00
Chauncey
1534d389af
[Misc] Remove deprecated files ( #17447 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-04-30 01:52:19 -07:00
Lu Fang
ece5a8b0b6
Make the _apply_rotary_emb compatible with dynamo ( #17435 )
2025-04-30 07:52:48 +00:00
Marco
54072f315f
[MODEL ADDITION] Ovis2 Model Addition ( #15826 )
...
Signed-off-by: Marco <121761685+mlinmg@users.noreply.github.com >
Signed-off-by: Isotr0py <2037008807@qq.com >
Signed-off-by: isotr0py <2037008807@qq.com >
Co-authored-by: Isotr0py <2037008807@qq.com >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-04-30 07:33:29 +00:00
Chauncey
be633fba0f
[Bugfix] Fix AttributeError: 'State' object has no attribute 'engine_client' ( #17434 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-04-30 00:11:04 -07:00
Kunshang Ji
ed6cfb90c8
[Hardware][Intel GPU] Upgrade to torch 2.7 ( #17444 )
...
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
Co-authored-by: Qiming Zhang <qiming1.zhang@intel.com >
2025-04-30 00:03:58 -07:00
Kunshang Ji
6ed9f6047e
[Intel GPU] [CI]Fix XPU ci, setuptools >=80.0 have build issue ( #17298 )
...
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
2025-04-29 22:54:10 -07:00
Michael Goin
a44c4f1d2f
Support LoRA for Mistral3 ( #17428 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-04-29 21:10:30 -07:00
Huy Do
88fcf00dda
Fix some speculative decode tests with tl.dot ( #17371 )
...
Signed-off-by: Huy Do <huydhn@gmail.com >
2025-04-29 19:41:02 -07:00
Harry Mellor
d1f569b1b9
Fix call to logger.info_once ( #17416 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-29 19:39:18 -07:00
Harry Mellor
13698db634
Improve configs - ModelConfig ( #17130 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-30 10:38:22 +08:00
Huy Do
2c4f59afc3
Update PyTorch to 2.7.0 ( #16859 )
2025-04-29 19:08:04 -07:00
Gabriel Marinho
1c2bc7ead0
Truncation control for embedding models ( #14776 )
...
Signed-off-by: Gabriel Marinho <gmarinho@ibm.com >
Signed-off-by: Max de Bayser <mbayser@br.ibm.com >
Co-authored-by: Max de Bayser <mbayser@br.ibm.com >
2025-04-30 09:24:57 +08:00
Kevin H. Luu
4055130a85
[release] Always git fetch all to get latest tag on TPU release ( #17322 )
2025-04-29 17:52:11 -07:00
Benjamin Chislett
34120f5acd
[V1][Feature] Enable Speculative Decoding with Structured Outputs ( #14702 )
...
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai >
Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com >
2025-04-30 00:02:10 +00:00
Harry Mellor
7489ec0bab
Remove Bamba 9B from CI ( #17407 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-29 21:10:31 +00:00
Bryan Lu
70788bdbdc
[V1][Spec Decode] Apply torch.compile & cudagraph to EAGLE ( #17211 )
...
Signed-off-by: Bryan Lu <yuzhelu@amazon.com >
2025-04-29 21:10:00 +00:00
Dilip Gowda Bhagavan
c9c1b59e59
Fix: Python package installation for opentelmetry ( #17049 )
...
Signed-off-by: Dilip Gowda Bhagavan <dilip.bhagavan@ibm.com >
2025-04-29 20:20:24 +00:00
Harry Mellor
0350809f3a
Remove Falcon3 2x7B from CI ( #17404 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-29 19:52:25 +00:00
Harry Mellor
a6977dbd15
Simplify (and fix) passing of guided decoding backend options ( #17008 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-29 19:02:23 +00:00
Isotr0py
2fa2a50bf9
[Bugfix] Fix Minicpm-O-int4 GPTQ model inference ( #17397 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-04-29 18:21:42 +00:00
Reid
08e15defa9
[CI/Build] Add retry mechanism for add-apt-repository ( #17107 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-04-29 10:40:52 -07:00
Aaron Pham
b37685afbb
[CI] Uses Python 3.11 for TPU ( #17359 )
...
Signed-off-by: Aaron Pham <contact@aarnphm.xyz >
2025-04-29 17:39:16 +00:00
Nicolò Lucchesi
792595b59d
[TPU][V1][CI] Replace python3 setup.py develop with standard pip install --e on TPU ( #17374 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-04-29 10:36:48 -07:00
casinca
0c1c788312
[Doc][Typo] Fixing label in new model requests link in overview.md ( #17400 )
2025-04-29 10:29:48 -07:00
Russell Bryant
56d64fbe30
[Docs] Propose a deprecation policy for the project ( #17063 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-04-29 10:29:44 -07:00
Alexei-V-Ivanov-AMD
608968b7c5
Enabling multi-group kernel tests. ( #17115 )
...
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com >
2025-04-29 10:27:27 -07:00
TY-AMD
06ffc7e1d3
[Misc][ROCm] Exclude cutlass_mla_decode for ROCm build ( #17289 )
...
Signed-off-by: Tianyuan Wu <Tianyuan.Wu@amd.com >
2025-04-29 10:26:42 -07:00
Qiming Zhang
d3cf61b89b
fix gemma3 results all zero ( #17364 )
...
Signed-off-by: mayuyuace <qiming1.zhang@intel.com >
2025-04-29 09:40:25 -07:00
mofanke
a39203f99e
[Bugfix] add qwen3 reasoning-parser fix content is None when disable … ( #17369 )
...
Signed-off-by: mofanke <mofanke@gmail.com >
2025-04-29 16:32:40 +00:00
Chen Zhang
24e6ad3f16
[V1] Remove num_input_tokens from attn_metadata ( #17193 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
2025-04-29 09:28:41 -07:00
Harry Mellor
2ef5d106bb
Improve literal dataclass field conversion to argparse argument ( #17391 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-29 16:25:08 +00:00
a2q1p
0ed27ef66c
Fix: Spelling of inference ( #17387 )
2025-04-29 09:23:39 -07:00
Harry Mellor
900edfa8d4
Transformers backend tweaks ( #17365 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-29 09:08:03 -07:00
Cyrus Leung
88ad9ec6b2
[Frontend] Support chat_template_kwargs in LLM.chat ( #17356 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-04-29 22:03:35 +08:00
Harry Mellor
40896bdf3f
pre-commit autoupdate (#17380 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-29 06:46:55 -07:00
Cyrus Leung
00ee37efa2
[Bugfix] Clean up MiniMax-VL and fix processing ( #17354 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-04-29 20:42:16 +08:00
Jee Jee Li
890f104cdf
[Doc] Fix QWen3MOE info ( #17381 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-04-29 12:38:32 +00:00
Harry Mellor
4a5e13149a
Update docs requirements ( #17379 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-29 11:35:47 +00:00
Ekagra Ranjan
97cc8729f0
[Model] Ignore rotary embed load for Cohere model ( #17319 )
2025-04-29 00:30:40 -07:00
Gregory Shtrasberg
4464109219
[Build][Bugfix] Restrict setuptools version to <80 ( #17320 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
2025-04-29 00:17:23 -07:00
Hyogeun Oh (오효근)
193e78e35d
[Fix] Documentation spacing in compilation config help text ( #17342 )
...
Signed-off-by: Zerohertz <ohg3417@gmail.com >
2025-04-29 00:16:17 -07:00
ponix-j
bdb2cddafc
[Misc]Use a platform independent interface to obtain the device attributes ( #17100 )
2025-04-29 06:59:13 +00:00
Cyrus Leung
ebb3930d28
[Misc] Move config fields to MultiModalConfig ( #17343 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-04-29 06:37:21 +00:00
qscqesze
cde384cd92
[Model] support MiniMax-VL-01 model ( #16328 )
...
Signed-off-by: qingjun <qingjun@minimaxi.com >
2025-04-29 12:05:50 +08:00
Chauncey
96e06e3cb7
[Misc] Add a Jinja template to support Mistral3 function calling ( #17195 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-04-28 19:53:44 -07:00
Zhengyuan Su (苏政渊)
17eb306fcc
[Bugfix] Add contiguous call inside rope kernel wrapper ( #17091 )
...
Signed-off-by: 苏政渊 <suzhengyuan@moonshot.cn >
Co-authored-by: 苏政渊 <suzhengyuan@moonshot.cn >
2025-04-28 19:24:07 -07:00
Richard Zou
165cb56329
Ignore '<string>' filepath ( #17330 )
...
Signed-off-by: rzou <zou3519@gmail.com >
2025-04-28 19:23:29 -07:00
Richard Barnes
d6da8a8ff2
[Bugfix] Fix numel() downcast in fused_layernorm_dynamic_per_token_quant.cu ( #17316 )
2025-04-28 19:23:18 -07:00
Lucia Fang
b4ac4fa04d
[model] make llama4 compatible with pure dense layers ( #17315 )
...
Signed-off-by: Lucia Fang <fanglu@fb.com >
2025-04-29 10:22:22 +08:00
Ekagra Ranjan
e136000595
[V1][Spec Decode] Make Eagle model arch config driven ( #17323 )
2025-04-29 10:22:02 +08:00
Michał Moskal
86d9fc29cb
implement Structural Tag with Guidance backend ( #17333 )
...
Signed-off-by: Michal Moskal <michal@moskal.me >
2025-04-29 02:21:32 +00:00
Cyrus Leung
506475de5f
[Optim] Compute multimodal hash only once per item ( #17314 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-04-29 09:40:35 +08:00
Ekagra Ranjan
cfe4532093
[Benchmark] Add single turn MTBench to Serving Bench ( #17202 )
2025-04-28 16:46:15 -07:00
Michael Goin
8fc88d63f1
[Model] Add tuned triton fused_moe configs for Qwen3Moe ( #17328 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-04-28 15:20:24 -07:00
Alex Wu
6e74fd4945
Support loading transformers models with named parameters ( #16868 )
...
Signed-off-by: Alex <alexwu@character.ai >
2025-04-28 23:15:58 +01:00
Simon Mo
dcbac4cb4b
[Model] Qwen3 Dense FP8 Compat Fixes ( #17318 )
...
Signed-off-by: simon-mo <xmo@berkeley.edu >
2025-04-28 14:12:01 -07:00
Charlie Fu
ed2462030f
[Bugfix] Fix moe weight losing all extra attrs after process_weights_after_loading. ( #16854 )
...
Signed-off-by: charlifu <charlifu@amd.com >
2025-04-28 21:05:07 +00:00
Lucas Wilkinson
cc5befbced
[BugFix] Fix cascade attention - RuntimeError: scheduler_metadata must have shape (metadata_size) ( #17283 )
...
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com >
2025-04-28 13:55:50 -07:00
Aaron Pham
2c89cd96a8
[Chore] cleanup license indicators in light of SPDX ( #17259 )
...
Signed-off-by: Aaron Pham <contact@aarnphm.xyz >
Co-authored-by: Russell Bryant <rbryant@redhat.com >
2025-04-28 19:43:52 +00:00
Russell Bryant
a0304dc504
[Security] Don't bind tcp zmq socket to all interfaces ( #17197 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-04-28 10:08:20 -07:00
Harry Mellor
c7941cca18
Explicitly explain quant method override ordering and ensure all overrides are ordered ( #17256 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-28 16:55:31 +00:00
Harry Mellor
b6dd32aa07
Make name of compressed-tensors quant method consistent across vLLM ( #17255 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-28 16:28:13 +00:00
Harry Mellor
f94886946e
Improve conversion from dataclass configs to argparse arguments ( #17303 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-28 16:22:12 +00:00
Russell Bryant
72dfe4c74f
[Docs] Add a security guide ( #17230 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-04-28 15:12:17 +00:00
Cyrus Leung
8b464d9660
[Misc] Clean up Qwen2.5-Omni code ( #17301 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-04-28 06:20:45 -07:00
Nicolò Lucchesi
889ebb2638
[Misc] Minor typo/grammar in platforms/interface.py ( #17307 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-04-28 05:45:42 -07:00
Reid
3ad986c28b
[doc] update wrong model id ( #17287 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-04-28 04:20:51 -07:00
Cyrus Leung
344e193b7d
[Bugfix] Add missing get_language_model to new MLLMs ( #17300 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-04-28 04:09:57 -07:00
Harry Mellor
fb1c933ade
Add missing class docstring for PromptAdapterConfig ( #17302 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-28 04:06:59 -07:00
idouba
72c5b97231
Update tpu_worker.py 's typo ( #17288 )
2025-04-28 04:01:15 -07:00
Alex Brooks
fa93cd9f60
[Model] Add Granite Speech Support ( #16246 )
...
Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com >
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com >
2025-04-28 10:05:00 +00:00
Cyrus Leung
aec9674dbe
[Core] Remove legacy input mapper/processor from V0 ( #15686 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-04-28 15:38:48 +08:00
Wanrui Dai
7fcc4223dc
[Minor][Models] Pass partial_rotary_factor parameter to rope ( #17266 )
...
Signed-off-by: evian <eviantai@u.nus.edu >
Co-authored-by: evian <eviantai@u.nus.edu >
2025-04-28 04:28:59 +00:00
Nick Hill
8262a3e23b
[Misc] Validate stop_token_ids contents ( #17268 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-04-28 03:54:05 +00:00
Reid
f211331c48
[Doc] small fix ( #17277 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-04-28 03:53:35 +00:00
Kuntai Du
9053d0b134
[Doc] Fix wrong github link in LMCache examples ( #17274 )
...
Signed-off-by: KuntaiDu <kuntai@uchicago.edu >
2025-04-28 03:09:11 +00:00
Michael Goin
cb3f2d8d10
[Bugfix] Fix Mistral3 spatial merge error ( #17270 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-04-27 19:40:05 -07:00
TherLF
c12df53b60
[Bugfix] Fix cutlass dispatch for fp8/int8 to properly invoke M<=16 c… ( #16751 )
...
Signed-off-by: Ther-LF <2639852836@qq.com >
2025-04-27 19:38:42 -07:00
Lennart K. M. Schulz
d1aeea7553
[Bugfix] Fix missing ARG in Dockerfile for arm64 platforms ( #17261 )
...
Signed-off-by: lkm-schulz <44176356+lkm-schulz@users.noreply.github.com >
2025-04-27 19:38:14 -07:00
Lucas Wilkinson
d8bccde686
[BugFix] Fix vllm_flash_attn install issues ( #17267 )
...
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
Co-authored-by: Aaron Pham <contact@aarnphm.xyz >
2025-04-27 17:27:56 -07:00
Lily Liu
20e489eaa1
[V1][Spec Decode] Make eagle compatible with prefix caching. ( #17137 )
...
Signed-off-by: LiuXiaoxuanPKU <lilyliupku@gmail.com >
2025-04-27 09:29:43 -07:00
Cyrus Leung
4213475ec7
[Metrics] Fix minor inconsistencies in bucket progression ( #17262 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-04-27 16:19:39 +00:00
Reid
d92879baf6
[doc] Add feature status legend ( #17257 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-04-27 08:17:02 -07:00
cascade
690fe019f0
[Feature] support sequence parallelism using compilation pass ( #16155 )
...
Signed-off-by: cascade812 <cascade812@outlook.com >
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com >
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com >
2025-04-27 06:29:35 -07:00
Kaixi Hou
ed7a29d9f8
[NVIDIA] Support Cutlass MLA for Blackwell GPUs ( #16032 )
...
Signed-off-by: kaixih <kaixih@nvidia.com >
2025-04-27 06:29:21 -07:00
Alex Brooks
756848e79e
[Bugfix] Fix Lora Name Parsing ( #17196 )
...
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2025-04-27 20:33:09 +08:00
Flex Wang
18445edd0f
[Misc] Change buckets of histogram_iteration_tokens to [1, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8096] to represent number of tokens ( #17033 )
...
Signed-off-by: sfc-gh-zhwang <flex.wang@snowflake.com >
2025-04-27 12:30:53 +00:00
Jade Zheng
30215ca61f
[MISC] Use string annotation types for class definitions ( #17244 )
...
Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com >
2025-04-27 08:39:57 +00:00
Chen Zhang
838cedade7
[Bugfix] Get a specific type of layer from forward context ( #17222 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
2025-04-27 00:58:05 -07:00
Jee Jee Li
4283a28c2f
[Bugfix] Fix QWen2 VL multimodal mapping ( #17240 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-04-27 05:53:23 +00:00
Cyrus Leung
93a126fbc7
[Misc] Make cached tokenizer pickle-compatible ( #17048 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-04-27 13:05:00 +08:00
rasmith
8e4b351a0c
[Kernel][Triton][FP8] Adding fp8 and variable length sequence support to Triton FAv2 kernel ( #12591 )
...
Signed-off-by: Randall Smith <Randall.Smith@amd.com >
2025-04-27 00:35:08 +00:00
Happy
9869453c42
Update test_flash_attn.py ( #17102 )
...
Signed-off-by: ShuaibinLi <lishuaibin@live.cn >
2025-04-26 22:17:35 +00:00
Reid
3642c59aa8
[CI/Build] remove -t for run-lm-eval-gsm-hf-baseline.sh ( #16271 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-04-26 18:25:05 +00:00
Woosuk Kwon
43eea2953b
[Minor] Fix lint error in main branch ( #17233 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-04-26 11:10:14 -07:00
Kero Liang
de7eb10ce4
[Bugfix] Fix Qwen2.5-Omni M-RoPE position ids generation ( #16878 )
...
Signed-off-by: imkero <kerorek@outlook.com >
2025-04-26 10:41:35 -07:00
Ning Xie
fd11a325b8
[MISC] rename interval to max_recent_requests ( #14285 )
2025-04-26 16:59:18 +00:00
Lu Fang
4d17e20310
Disable the torch.compile cache checks when VLLM_DISABLE_COMPILE_CACHE=1 ( #16573 )
...
Signed-off-by: Lu Fang <lufang@fb.com >
2025-04-26 09:17:58 -07:00
changjun.lee
10fd1d7380
[Bugfix] fix error due to an uninitialized tokenizer when using skip_tokenizer_init with num_scheduler_steps ( #9276 )
...
Signed-off-by: changjun.lee <pord7457@gmail.com >
2025-04-26 11:51:17 -04:00
Russell Bryant
52b4f4a8d7
[Docs] Update structured output doc for V1 ( #17135 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-04-26 15:12:18 +00:00
Aaron Pham
e782e0a170
[Chore] added stubs for vllm_flash_attn during development mode ( #17228 )
...
Signed-off-by: Aaron Pham <contact@aarnphm.xyz >
2025-04-26 07:45:26 -07:00
Ning Xie
dc2ceca5c5
[BUGFIX] use random for NONE_HASH only when PYTHONHASHSEED not set ( #17088 )
...
Signed-off-by: Andy Xie <andy.xning@gmail.com >
2025-04-26 14:34:24 +00:00
Russell Bryant
f8acd01ff7
[V1] Add structural_tag support using xgrammar ( #17085 )
2025-04-26 14:06:37 +00:00
Agata Dobrzyniewicz
c48334d405
[Hardware][Intel-Gaudi] Update hpu-extension and update bucketing system for HPU device ( #17186 )
...
Signed-off-by: Agata Dobrzyniewicz <adobrzyniewicz@habana.ai >
2025-04-26 05:55:14 -07:00
Cyrus Leung
909fdaf152
[Bugfix] Fix standard models tests ( #17217 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-04-26 02:26:41 -07:00
Isotr0py
8c1c926d00
[Bugfix] Fix missing int type for -n in multi-image example ( #17223 )
2025-04-26 08:49:52 +00:00
Nick Hill
df6f3ce883
[Core] Remove prompt string from engine core data structures ( #17214 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-04-25 23:41:05 -07:00
Woosuk Kwon
513f074766
[CI/test] Fix Eagle Correctness Test ( #17209 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-04-25 23:40:36 -07:00
Nick Hill
b07bf83c7d
[BugFix] Avoid race conditions in zero-copy tensor transmission ( #17203 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-04-26 06:00:07 +00:00
Zijing Liu
53e8cf53a4
[V1][Metrics] Allow V1 AsyncLLM to use custom logger ( #14661 )
...
Signed-off-by: Zijing Liu <liuzijing2014@gmail.com >
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
Signed-off-by: Nick Hill <nhill@redhat.com >
Co-authored-by: Mark McLoughlin <markmc@redhat.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
2025-04-25 22:05:40 -07:00
Charlie Fu
54271bb766
[ROCm][Misc] Follow-ups for Skinny Gemms on ROCm. ( #17011 )
...
Signed-off-by: charlifu <charlifu@amd.com >
2025-04-25 22:05:10 -07:00
Shu Wang
9e96f56efb
Allocate kv_cache with stride order ( #16605 )
...
Signed-off-by: shuw <shuw@nvidia.com >
2025-04-25 22:03:31 -07:00
Woosuk Kwon
b278911229
[Minor][Models] Fix Return Types of Llama & Eagle ( #17220 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-04-25 21:54:47 -07:00
yarongmu-google
7bd0c7745c
[Doc] Minor fix for the vLLM TPU setup page ( #17206 )
...
Signed-off-by: Yarong Mu <ymu@google.com >
2025-04-26 04:39:56 +00:00
Woosuk Kwon
1cf0719ebd
[Minor][Spec Decode] Add use_eagle to SpeculativeConfig ( #17213 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-04-25 21:08:15 -07:00
Reid
537d5ee025
[doc] add Anything LLM integration ( #17216 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-04-25 21:03:23 -07:00
Lu Fang
c8e5be35f7
[MISC][AMD] Add unused annotation to rocm kernel file ( #17097 )
...
Signed-off-by: Lu Fang <lufang@fb.com >
2025-04-25 20:33:35 -07:00
James Wu
a6e72e1e4f
[Bugfix] [pytorch] Patch AOTAutogradCache._get_shape_env ( #17142 )
...
Signed-off-by: James Wu <jjwu@meta.com >
2025-04-26 11:28:20 +08:00
Yihua Cheng
5e83a7277f
[v1] [P/D] Adding LMCache KV connector for v1 ( #16625 )
2025-04-26 03:03:38 +00:00
rasmith
68af5f6c5c
[AMD][FP8][BugFix] Remove V1 check in arg_utils.py for FP8 since it is not necessary ( #17215 )
...
Signed-off-by: Randall Smith <Randall.Smith@amd.com >
2025-04-25 19:55:05 -07:00
Chen Zhang
8de2901fea
[Bugfix] gemma[2,3] interleaved attention when sliding window is disabled ( #17180 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
2025-04-25 19:53:51 -07:00
Rui Qiao
c53e0730cb
[Misc] Refine ray_serve_deepseek example ( #17204 )
...
Signed-off-by: Rui Qiao <ruisearch42@gmail.com >
2025-04-25 16:06:59 -07:00
Benjamin Chislett
a0e619e62a
[V1][Spec Decode] EAGLE-3 Support ( #16937 )
...
Signed-off-by: Bryan Lu <yuzhelu@amazon.com >
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai >
Co-authored-by: Bryan Lu <yuzhelu@amazon.com >
2025-04-25 15:43:07 -07:00
Nick Hill
70116459c3
[BugFix][Frontend] Fix LLM.chat() tokenization ( #16081 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-04-25 22:20:05 +00:00
Christian Heimes
65e262b93b
Fix Python packaging edge cases ( #17159 )
...
Signed-off-by: Christian Heimes <christian@python.org >
2025-04-26 06:15:07 +08:00
Cyrus Leung
43faa0461a
[Bugfix] Fix hybrid model tests ( #17182 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-04-25 15:14:37 -07:00
Daniel Li
48cb2109b6
[V1] Move usage stats to worker and start logging TPU hardware ( #16211 )
2025-04-25 14:06:01 -06:00
Russell Bryant
a5450f11c9
[Security] Use safe serialization and fix zmq setup for mooncake pipe ( #17192 )
...
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com >
Co-authored-by: Shangming Cai <caishangming@linux.alibaba.com >
2025-04-25 16:53:23 +00:00
Cyrus Leung
9d98ab5ec6
[Misc] Inline Molmo requirements ( #17190 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-04-25 16:41:44 +00:00
Reid
df5c879527
[doc] update wrong hf model links ( #17184 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-04-25 16:40:54 +00:00
Harry Mellor
423e9f1cbe
Use Transformers helper get_text_config() instead of checking for text_config ( #17105 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-25 08:47:35 -07:00
Harry Mellor
0bd7f8fca5
Bump Transformers to 4.51.3 ( #17116 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-25 08:34:34 -07:00
Jasmond L
d5615af9ae
[Bugfix] Fix Mistral ChatCompletionRequest Body Exception ( #16769 )
...
Signed-off-by: Jasmond Loh <Jasmond.Loh@hotmail.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-04-25 07:26:30 -07:00
Cyrus Leung
19dcc02a72
[Bugfix] Fix mistral model tests ( #17181 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-04-25 06:03:34 -07:00
Alex Brooks
7feae92c1f
[Doc] Move todo out of beam search docstring ( #17183 )
...
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com >
2025-04-25 04:44:58 -07:00
Michael Yao
f851b84266
[Doc] Add two links to disagg_prefill.md ( #17168 )
...
Signed-off-by: windsonsea <haifeng.yao@daocloud.io >
2025-04-25 10:23:57 +00:00
Lu Fang
fc966e9cc6
Only turn on FastIncrementalDetokenizer when tokenizers >= 0.21.1 ( #17158 )
2025-04-25 17:10:32 +08:00
Michael Yao
ef19e67d2c
[Doc] Add headings to improve gptqmodel.md ( #17164 )
...
Signed-off-by: windsonsea <haifeng.yao@daocloud.io >
2025-04-25 01:13:13 -07:00
rasmith
a41351f363
[Quantization][FP8] Add support for FP8 models with input_scale for output projection and QK quantization ( #15734 )
...
Signed-off-by: Randall Smith <Randall.Smith@amd.com >
Signed-off-by: Luka Govedič <lgovedic@redhat.com >
Co-authored-by: Luka Govedič <lgovedic@redhat.com >
2025-04-25 00:45:02 -07:00
Sangyeon Cho
6aae216b4e
[Bugfix] remove fallback in guided_json (int range, patterns) ( #16725 )
...
Signed-off-by: csy1204 <josang1204@gmail.com >
Co-authored-by: 조상연[플레이스 AI] <sang-yeon.cho@navercorp.com >
2025-04-25 06:54:43 +00:00
yexin(叶鑫)
b22980a1dc
[Perf]Optimize rotary_emb implementation to use Triton operator for improved inference performance ( #16457 )
...
Signed-off-by: cynthieye <yexin93@qq.com >
Co-authored-by: MagnetoWang <magnetowang@outlook.com >
2025-04-25 14:52:28 +08:00
Lucas Wilkinson
881f735827
[Misc] Benchmark Serving Script Support Appending Results ( #17028 )
...
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com >
2025-04-24 22:53:55 -07:00
Mengqing Cao
2f54045508
[Bugfix][Misc] Use TritonPlaceholderModule to defensively import triton ( #15099 )
...
Signed-off-by: Mengqing Cao <cmq0113@163.com >
2025-04-24 22:51:02 -07:00
Lifu Huang
5aa6efb9a5
[Misc] Clean up redundant code in uniproc_executor.py ( #16762 )
...
Signed-off-by: Lifu Huang <lifu.hlf@gmail.com >
2025-04-24 22:49:30 -07:00
Harry Mellor
6ca0234478
Move missed SchedulerConfig args into scheduler config group in EngineArgs ( #17131 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-24 22:48:53 -07:00
Michael Goin
649818995f
[Docs] Fix True->true in supported_models.md ( #17141 )
2025-04-25 04:20:04 +00:00
Varun Sundar Rabindranath
7a0a9da72b
[Doc] V1 : Update LoRA status ( #17133 )
...
Signed-off-by: varun sundar rabindranath <vsundarr@redhat.com >
Co-authored-by: varun sundar rabindranath <vsundarr@redhat.com >
2025-04-24 20:17:22 -07:00
Zaida Zhou
69bff9bc89
fix float16 support for kimi-vl ( #17156 )
...
Co-authored-by: zhouzaida <zhouzaida@msh.team >
2025-04-24 20:16:32 -07:00
Lucas Wilkinson
41ca7eb491
[Attention] FA3 decode perf improvement - single mma warp group support for head dim 128 ( #16864 )
...
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com >
2025-04-24 20:12:21 -07:00
vllmellm
eef364723c
[FEAT] [ROCm]: AITER Fused MOE V1 Support ( #16752 )
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com >
2025-04-25 11:06:50 +08:00
jglaser
0d6e187e88
Use custom address for listening socket ( #15988 )
...
Signed-off-by: Jens Glaser <glaserj@ornl.gov >
2025-04-25 01:57:16 +00:00
Michael Goin
9420a1fc30
Better error message for missing mistral params.json ( #17132 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-04-24 23:43:08 +00:00
Rui Qiao
583e900996
[Misc] Add example to run DeepSeek with Ray Serve LLM ( #17134 )
...
Signed-off-by: Rui Qiao <ruisearch42@gmail.com >
2025-04-24 22:25:21 +00:00
Maximilien de Bayser
05e1fbfc52
Add chat template for Llama 4 models ( #16428 )
...
Signed-off-by: Max de Bayser <mbayser@br.ibm.com >
2025-04-24 20:19:36 +00:00
Yinghai Lu
fe92176321
Add collective_rpc to llm engine ( #16999 )
...
Signed-off-by: Yinghai Lu <yinghai@thinkingmachines.ai >
2025-04-24 20:16:52 +00:00
Russell Bryant
6d0df0ebeb
[Docs] Generate correct github links for decorated functions ( #17125 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-04-24 10:39:43 -07:00
Harry Mellor
0fa939e2d1
Improve configs - LoRAConfig + PromptAdapterConfig ( #16980 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-24 10:29:34 -07:00
Harry Mellor
0422ce109f
Add :markdownhelp: to EngineArgs docs so markdown docstrings render properly ( #17124 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-24 10:28:45 -07:00
Eyshika Agarwal
47bdee409c
Molmo Requirements ( #17026 )
...
Signed-off-by: Eyshika Agarwal <eyshikaengineer@gmail.com >
Signed-off-by: eyshika <eyshikaengineer@gmail.com >
2025-04-24 10:08:37 -07:00
Atilla
49f189439d
existing torch installation pip command fix for docs ( #17059 )
2025-04-24 10:07:21 -07:00
Aaruni Aggarwal
5adf6f6b7f
Updating builkite job for IBM Power ( #17111 )
...
Signed-off-by: Aaruni Aggarwal <aaruniagg@gmail.com >
2025-04-24 10:06:17 -07:00
Russell Bryant
4115f19958
[CI] Add automation for the tool-calling github label ( #17118 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-04-24 09:22:00 -07:00
Mark McLoughlin
340d7b1b21
[V1][Spec Decoding] Add num_drafts and num_accepted_tokens_per_position metrics ( #16665 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-04-24 08:57:40 -07:00
Reid
1bcbcbf574
[Misc] refactor example series - structured outputs ( #17040 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-04-24 07:49:48 -07:00
Michael Goin
82e43b2d7e
Add missing rocm_skinny_gemms kernel test to CI ( #17060 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-04-24 07:49:37 -07:00
wang.yuqi
67309a1cb5
[Frontend] Using matryoshka_dimensions control the allowed output dimensions. ( #16970 )
2025-04-24 07:06:28 -07:00
Shanshan Shen
b724afe343
[V1][Structured Output] Clear xgrammar compiler object when engine core shut down to avoid nanobind leaked warning ( #16954 )
...
Signed-off-by: shen-shanshan <467638484@qq.com >
2025-04-24 06:15:03 -07:00
Harry Mellor
21f4f1c9a4
Improve static type checking in LoRAModelRunnerMixin ( #17104 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-24 06:14:47 -07:00
Isotr0py
b0c1f6202d
[Misc] Remove OLMo2 config copy ( #17066 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-04-24 06:14:32 -07:00
Rui Qiao
c0dfd97519
[V1][PP] Optimization: continue scheduling prefill chunks ( #17080 )
...
Signed-off-by: Rui Qiao <ruisearch42@gmail.com >
2025-04-24 05:27:08 -07:00
Harry Mellor
a9138e85b1
Fix OOT registration test ( #17099 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-24 04:44:12 -07:00
Harry Mellor
0a05ed57e6
Simplify TokenizerGroup ( #16790 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-24 04:43:56 -07:00
Michael Goin
14288d1332
Disable enforce_eager for V1 TPU sampler and structured output tests ( #17016 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-04-24 02:50:09 -07:00
Woosuk Kwon
b411418ff0
[Chore] Remove Sampler from Model Code ( #17084 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-04-24 02:49:33 -07:00
omer-dayan
2bc0f72ae5
Add docs for runai_streamer_sharded ( #17093 )
...
Signed-off-by: Omer Dayan (SW-GPU) <omer@run.ai >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-04-24 01:03:21 -07:00
Reid
9c1244de57
[doc] update to hyperlink ( #17096 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-04-24 00:58:08 -07:00
Reid
db2f8d915c
[V1] Update structured output ( #16812 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-04-23 23:57:17 -07:00
张宇
6167c0e5d2
[Bugfix][Core] add seq_id_to_seq_group clearing to avoid memory leak when s… ( #16472 )
...
Signed-off-by: 开哲 <kaizhe.zy@alibaba-inc.com >
Co-authored-by: 开哲 <kaizhe.zy@alibaba-inc.com >
2025-04-24 11:25:37 +08:00
Areeb Syed
ed2e464653
Addendum Fix to support FIPS enabled machines with MD5 hashing ( #17043 )
...
Signed-off-by: sydarb <areebsyed237@gmail.com >
2025-04-23 19:55:00 -07:00
Harry Mellor
2c8ed8ee48
More informative error when using Transformers backend ( #16988 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-23 19:54:03 -07:00
Michael Goin
ed50f46641
[Bugfix] Enable V1 usage stats ( #16986 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: Nick Hill <nhill@redhat.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
2025-04-23 19:54:00 -07:00
Woosuk Kwon
46e678bcff
[Minor] Use larger batch sizes for A100/B100/B200/MI300x ( #17073 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-04-23 19:18:59 -07:00
Chen Xia
6b2427f995
[Quantization]add prefix for commandA quantized model ( #17017 )
2025-04-23 17:32:40 -07:00
Sangyeon Cho
b07d741661
[CI/Build] workaround for CI build failure ( #17070 )
...
Signed-off-by: csy1204 <josang1204@gmail.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2025-04-23 16:14:18 -07:00
Woosuk Kwon
41fb013d29
[V1][Spec Decode] Always use argmax for sampling draft tokens ( #16899 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-04-23 14:57:43 -07:00
Yong Hoon Shin
32d4b669d0
[BugFix][V1] Fix int32 token index overflow when preparing input ids ( #16806 )
2025-04-23 12:12:35 -07:00
Travis Johnson
3cde34a4a4
[Frontend] Support guidance:no-additional-properties for compatibility with xgrammar ( #15949 )
...
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com >
2025-04-23 18:34:41 +00:00
Harry Mellor
bdb3660312
Use @property and private field for data_parallel_rank_local ( #17053 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-23 08:50:08 -07:00
Harry Mellor
f3a21e9c68
CacheConfig.block_size should always be int when used (#17052 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-23 08:50:05 -07:00
Harry Mellor
8e630d680e
Improve Transformers backend model loading QoL ( #17039 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-23 07:33:51 -07:00
Russell Bryant
af869f6dff
[CI] Update structured-output label automation ( #17055 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-04-23 07:33:14 -07:00
Harry Mellor
53c0fa1e25
Ensure that pid passed to kill_process_tree is int for mypy ( #17051 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-23 07:32:26 -07:00
Michael Yao
f7912cba3d
[Doc] Add top anchor and a note to quantization/bitblas.md ( #17042 )
...
Signed-off-by: windsonsea <haifeng.yao@daocloud.io >
2025-04-23 07:32:16 -07:00
Michael Goin
6317a5174a
Categorize tests/kernels/ based on kernel type ( #16799 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-04-23 09:21:07 -04:00
Michael Goin
aa72d9a4ea
Mistral-format support for compressed-tensors ( #16803 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-04-23 08:46:23 -04:00
Russell Bryant
ce17db8085
[CI] Run v1/test_serial_utils.py in CI ( #16996 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-04-23 01:13:34 -07:00
Chauncey
8c87a9ad46
[Bugfix] Fix AssertionError: skip_special_tokens=False is not supported for Mistral tokenizers ( #16964 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-04-23 07:24:09 +00:00
huafeng
ec69124eb4
[Misc] Improve readability of get_open_port function. ( #17024 )
...
Signed-off-by: gitover22 <qidizou88@gmail.com >
2025-04-23 06:16:53 +00:00
Lucas Wilkinson
d0da99fb70
[BugFix] llama4 fa3 fix - RuntimeError: scheduler_metadata must have shape (metadata_size) ( #16998 )
...
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com >
2025-04-22 21:49:24 -07:00
Nick Hill
b2f195c429
[V1] Avoid socket errors during shutdown when requests are in in-flight ( #16807 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-04-23 12:36:29 +08:00
vllmellm
047797ef90
[Bugfix] Triton FA function takes no keyword arguments ( #16902 )
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
2025-04-22 21:35:24 -07:00
Reid
eb8ef4224d
[doc] add download path tips ( #17013 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-04-23 04:06:30 +00:00
Chendi.Xue
56a735261c
[INTEL-HPU][v0] Port delayed sampling to upstream ( #16949 )
...
Signed-off-by: Michal Adamczyk <michal.adamczyk@intel.com >
Signed-off-by: Chendi Xue <chendi.xue@intel.com >
Co-authored-by: Michal Adamczyk <madamczyk@habana.ai >
2025-04-22 20:14:11 -07:00
youkaichao
e1cf90e099
[misc] tune some env vars for GB200 ( #16992 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com >
2025-04-23 10:59:48 +08:00
Chauncey
6bc1e30ef9
Revert "[Misc] Add S3 environment variables for better support of MinIO." ( #17021 )
2025-04-22 19:22:29 -07:00
vllmellm
7e081ba7ca
[BugFix] Revert ROCm Custom Paged Attention Env Flag Check ( #17022 )
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
2025-04-22 19:17:48 -07:00
Nick Hill
1e013fa388
[V1][DP] More robust DP/EP dummy request coordination ( #16277 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-04-22 19:12:15 -07:00
Aleksandr Malyshev
bc7c4d206b
[Kernel][ROCM] Upstream prefix prefill speed up for vLLM V1 ( #13305 )
...
Signed-off-by: Sage Moore <sage@neuralmagic.com >
Signed-off-by: root <root@banff-cyxtera-s73-5.ctr.dcgpu >
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com >
Signed-off-by: root <root@banff-cyxtera-s65-4.amd.com >
Signed-off-by: maleksan85 <maleksan@amd.com >
Signed-off-by: <>
Co-authored-by: Sage Moore <sage@neuralmagic.com >
Co-authored-by: root <root@banff-cyxtera-s73-5.ctr.dcgpu >
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com >
Co-authored-by: qli88 <qiang.li2@amd.com >
Co-authored-by: root <root@banff-cyxtera-s65-4.amd.com >
2025-04-22 19:11:56 -07:00
Yang Wang
f67e9e9f22
add Dockerfile build vllm against torch nightly ( #16936 )
...
Signed-off-by: Yang Wang <elainewy@meta.com >
2025-04-22 19:08:27 -07:00
Guillaume Calmettes
36fe78769f
[Bugfix] validate urls object for multimodal content parts ( #16990 )
...
Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com >
2025-04-23 09:43:06 +08:00
Chenyaaang
83d933718c
[Core][V1][TPU] Enable structured decoding on TPU V1 ( #16499 )
...
Signed-off-by: Chenyaaang <chenyangli@google.com >
2025-04-22 18:05:23 -06:00
Nick Hill
5175b884f7
[BugFix] Remove default multiproc executor collective_rpc timeout ( #17000 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-04-22 23:27:14 +00:00
Alexei-V-Ivanov-AMD
5536b30a4c
Fencing Kernels Tests for enabling on AMD ( #16929 )
...
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com >
2025-04-22 09:32:40 -07:00
Richard Zou
7f58fb9718
Add assertion for no objects while hashing hf_config ( #16930 )
...
Signed-off-by: rzou <zou3519@gmail.com >
2025-04-22 09:32:22 -07:00
vllmellm
30bc3e0f66
[FEAT][ROCm]: Support AITER MLA ( #15893 )
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
Co-authored-by: qli88 <qiang.li2@amd.com >
2025-04-22 09:31:13 -07:00
Reid
f34410715f
[frontend] enhance tool_calls type check ( #16882 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-04-22 15:40:24 +00:00
Chauncey
68d4c33202
[Misc] Add S3 environment variables for better support of MinIO. ( #16977 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-04-22 14:27:36 +00:00
Zhengyuan Su (苏政渊)
f961d7f6ef
[BugFix] Pass in correct VLLM config in FlashInfer backend ( #13207 ) ( #16973 )
...
Signed-off-by: 苏政渊 <suzhengyuan@moonshot.cn >
Co-authored-by: 苏政渊 <suzhengyuan@moonshot.cn >
2025-04-22 06:44:10 -07:00
Harry Mellor
d059110498
Improve configs - SpeculativeConfig ( #16971 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-22 12:55:36 +00:00
Yang Fan
571e8dd65e
[Bugfix] Fix distributed bug again in Qwen2.5-VL & Qwen2.5-Omni ( #16974 )
...
Signed-off-by: fyabc <suyang.fy@alibaba-inc.com >
2025-04-22 12:23:17 +00:00
Reid
4b91c927f6
[Misc] refactor example series ( #16972 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-04-22 11:44:21 +00:00
vllmellm
0e237f0035
[FEAT][ROCm] Integrate Paged Attention Kernel from AITER ( #15001 )
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com >
2025-04-22 02:46:28 -07:00
Cyrus Leung
8f7bace7c3
[Doc] Improve documentation for multimodal CLI args ( #16960 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-04-22 08:35:35 +00:00
Nick Hill
e4d6144232
[BugFix] Fix incremental detokenization perf issue ( #16963 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-04-22 08:16:19 +00:00
Lei Wang
8d32dc603d
[Kernel] Support Microsoft Runtime Kernel Lib for our Low Precision Computation - BitBLAS ( #6036 )
...
Signed-off-by: xinyuxiao <xinyuxiao2024@gmail.com >
Co-authored-by: xinyuxiao <xinyuxiao2024@gmail.com >
2025-04-22 09:01:36 +01:00
Woosuk Kwon
c4ab9f3e71
[V1] Remove pre-allocation for KV cache ( #16941 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-04-22 00:52:18 -07:00
Flora Feng
2689d5c027
[Model] Use autoweightloader for mamba ( #16950 )
...
Signed-off-by: sfeng33 <4florafeng@gmail.com >
2025-04-22 07:48:15 +00:00
Chauncey
acba33a0f1
[Bugfix] Fix the issue where llm.generate cannot be called repeatedly after setting GuidedDecodingParams ( #16767 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
Signed-off-by: Russell Bryant <rbryant@redhat.com >
Co-authored-by: Russell Bryant <rbryant@redhat.com >
2025-04-22 06:02:20 +00:00
SnowCharm
a114bf20a3
[Perf] Optimize _update_states for GPU model runner ( #16910 )
...
Signed-off-by: snowcharm <snowcharmqq@gmail.com >
2025-04-22 14:01:54 +08:00
Michael Yao
3097ce3a32
[Doc] Update ai_accelerator/hpu-gaudi.inc.md ( #16956 )
...
Signed-off-by: windsonsea <haifeng.yao@daocloud.io >
2025-04-22 05:33:27 +00:00
Cyrus Leung
d6da9322c8
[Bugfix] Fix f-string for Python 3.9-3.11 ( #16962 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-04-21 21:45:55 -07:00
omer-dayan
71ce44047f
Support S3 Sharded loading with RunAI Model Streamer ( #16317 )
...
Signed-off-by: Omer Dayan (SW-GPU) <omer@run.ai >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-04-21 21:21:49 -07:00
Charlie Fu
188b7f9b8c
[Performance][ROCm] Add skinny gemms for unquantized linear on ROCm ( #15830 )
...
Signed-off-by: charlifu <charlifu@amd.com >
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com >
2025-04-21 20:46:22 -07:00
wangxiyuan
b9b4746950
[V1] Remove additional_config check ( #16710 )
...
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com >
2025-04-21 20:45:27 -07:00
Varun Sundar Rabindranath
7b8a2ab76f
[Kernel] Add expert_map support to Cutlass FP8 MOE ( #16861 )
...
Signed-off-by: varun sundar rabindranath <vsundarr@redhat.com >
Co-authored-by: varun sundar rabindranath <vsundarr@redhat.com >
2025-04-21 20:44:32 -07:00
Jee Jee Li
c9acbf1141
[Misc] Remove the chunked prefill warning for LoRA ( #16925 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-04-21 20:44:24 -07:00
kliuae
5b794cae8d
[ROCm] Add aiter tkw1 kernel for Llama4 fp8 ( #16727 )
...
Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com >
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com >
Co-authored-by: vllmellm <vllm.ellm@embeddedllm.com >
2025-04-21 20:42:34 -07:00
Jeffrey Li
0e4254492f
[Bugfix]: fix issue with n>1 sampling on v1 requests overriding each other ( #16863 )
...
Signed-off-by: Jeffrey Li <jeffrey.dot.li@gmail.com >
2025-04-22 11:40:19 +08:00
Woosuk Kwon
1311913f55
[BugFix][Spec Decode] No in-place update to draft probs ( #16952 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-04-21 19:54:19 -07:00
Cyrus Leung
29f395c97c
[Doc] Remove unnecessary V1 flag ( #16924 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-04-21 21:04:38 -04:00
Nicolò Lucchesi
fa3bba2a53
[TPU][V1] Enable Top-P ( #16843 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
Co-authored-by: mgoin <mgoin64@gmail.com >
2025-04-22 00:46:07 +00:00
Michael Goin
986537f1c3
[V1] V1 FlashInfer Attention ( #16684 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
Co-authored-by: Aurick Qiao <qiao@aurick.net >
2025-04-22 00:38:41 +00:00
Nicolò Lucchesi
210207525e
[TPU][V1] Capture multimodal encoder during model compilation ( #15051 )
...
Signed-off-by: Michael Goin <mgoin64@gmail.com >
Signed-off-by: NickLucche <nlucches@redhat.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: Siyuan Liu <lsiyuan@google.com >
2025-04-21 18:36:59 -06:00
Michael Goin
71eda0bb76
Update Qwen1.5-MoE-W4A16-compressed-tensors.yaml ( #16946 )
2025-04-21 18:35:32 -06:00
Chengji Yao
471fe65630
[TPU][V1] Implicitly adjust page size when there's SMEM OOM ( #16871 )
...
Signed-off-by: Chengji Yao <chengjiyao@google.com >
2025-04-21 15:43:13 -06:00
Woosuk Kwon
3a0fba5cf4
[V1][Spec Decode] Handle draft tokens beyond max_model_len ( #16087 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-04-21 12:38:50 -07:00
Chanh Nguyen
299ebb62b2
[Core] Speed up decode by remove synchronizing operation in sampler ( #16436 )
...
Signed-off-by: Chanh Nguyen <cnguyen@linkedin.com >
Co-authored-by: Chanh Nguyen <cnguyen@linkedin.com >
2025-04-21 18:18:22 +00:00
David Xia
f728ab8e35
[Doc] mention how to install in CPU editable mode ( #16923 )
...
Signed-off-by: David Xia <david@davidxia.com >
2025-04-21 17:45:51 +00:00
David Xia
63e26fff78
[doc] install required python3-dev apt package ( #16888 )
...
Signed-off-by: David Xia <david@davidxia.com >
2025-04-21 16:15:18 +00:00
Yan Ma
fe3462c774
[XPU][Bugfix] minor fix for XPU ( #15591 )
...
Signed-off-by: yan ma <yan.ma@intel.com >
2025-04-22 00:02:57 +08:00
Kartik Ramesh
3b34fd5273
Raise error for data-parallel with benchmark_throughput ( #16737 )
...
Signed-off-by: Kartik Ramesh <kartikx2000@gmail.com >
Co-authored-by: Simon Mo <simon.mo@hey.com >
2025-04-21 23:51:43 +08:00
Isotr0py
55d6d3fdb8
[Bugfix] Fix GLM rotary_dim issue and support v1 ( #16912 )
...
Signed-off-by: isotr0py <2037008807@qq.com >
2025-04-21 14:26:34 +00:00
Shanshan Shen
7272bfae77
[Misc] Refactor platform to get device specific stream and event ( #14411 )
...
Signed-off-by: shen-shanshan <467638484@qq.com >
2025-04-21 21:25:49 +08:00
wangxiyuan
d9ac9e3dc5
[Misc] fix collect_env version parse ( #15267 )
...
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com >
2025-04-21 20:29:40 +08:00
Han Zhang
d41faaf9df
Restore buffers when wake up from level 2 sleep ( #16564 ) ( #16889 )
...
Signed-off-by: Han <zh950713@gmail.com >
2025-04-21 20:18:28 +08:00
Alex Brooks
b34f33438a
[Doc] Split dummy_processor_inputs() in Multimodal Docs ( #16915 )
...
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com >
2025-04-21 11:10:01 +00:00
Yang Fan
26c0406555
[Bugfix] Fix distributed bug in Qwen2.5-VL & Qwen2.5-Omni ( #16907 )
2025-04-21 10:25:21 +00:00
Woosuk Kwon
4c41278b77
[CI/CD][V1] Add spec decode tests to CI ( #16900 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-04-20 22:37:16 -07:00
qizixi
bb3605db85
[Bugfix] Fix v1/spec_decode/test_ngram.py ( #16895 )
...
Signed-off-by: qizixi <qizixi@meta.com >
2025-04-20 20:54:29 -07:00
Richard Zou
fe742aef5a
[easy] Pass compile_fx only the config patches ( #16845 )
...
Signed-off-by: rzou <zou3519@gmail.com >
2025-04-20 12:25:19 +08:00
Harry Mellor
4b07d36891
Improve configs - CacheConfig ( #16835 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-20 12:25:04 +08:00
Staszek Paśko
87aaadef73
Serialize tensors using int8 views ( #16866 )
...
Signed-off-by: Staszek Pasko <staszek@gmail.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
2025-04-19 10:28:34 -07:00
Richard Zou
682e0b6d2f
Log how much time loading a compiled artifact takes ( #16848 )
...
Signed-off-by: rzou <zou3519@gmail.com >
2025-04-19 16:50:46 +00:00
Reid
d6195a748b
[doc] update hyperlink ( #16877 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-04-19 16:40:38 +00:00
Cyrus Leung
205d84aaa9
[VLM] Clean up models ( #16873 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-04-19 12:13:06 +00:00
Roger Wang
5124f5bf51
[Model] Qwen2.5-Omni Cleanup ( #16872 )
2025-04-19 09:37:02 +00:00
Isotr0py
83f3c3bd91
[Model] Refactor Phi-4-multimodal to use merged processor and support V1 ( #15477 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-04-19 02:26:11 -07:00
vie-serendipity
d9737ca1c6
[V1][Misc] stop update prefix cache stats when logs_stats is disabled ( #16460 )
...
Signed-off-by: vie-serendipity <2733147505@qq.com >
2025-04-19 02:25:19 -07:00
Nicolò Lucchesi
9d4ca19d50
[Misc] Benchmarks for audio models ( #16505 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-04-19 02:24:14 -07:00
Nicolò Lucchesi
2ef0dc53b8
[Frontend] Add sampling params to v1/audio/transcriptions endpoint ( #16591 )
...
Signed-off-by: Jannis Schönleber <joennlae@gmail.com >
Signed-off-by: NickLucche <nlucches@redhat.com >
Co-authored-by: Jannis Schönleber <joennlae@gmail.com >
2025-04-19 07:03:54 +00:00
Divakar Verma
1d4680fad2
[rocm][MI300] llama4 maverick fp8 moe config tp8 ( #16847 )
...
Signed-off-by: Divakar Verma <divakar.verma@amd.com >
2025-04-19 06:21:43 +00:00
Yang Fan
2c1bd848a6
[Model][VLM] Add Qwen2.5-Omni model support (thinker only) ( #15130 )
...
Signed-off-by: fyabc <suyang.fy@alibaba-inc.com >
Signed-off-by: Roger Wang <ywang@roblox.com >
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com >
Co-authored-by: Roger Wang <ywang@roblox.com >
Co-authored-by: Xiong Wang <wangxiongts@163.com >
2025-04-18 23:14:36 -07:00
omrishiv
5c9121203c
[release] Publish neuron docker image ( #16733 )
...
Signed-off-by: omrishiv <327609+omrishiv@users.noreply.github.com >
2025-04-18 17:11:25 -07:00
Justin Ho
490b1698a5
[Doc] Updated Llama section in tool calling docs to have llama 3.2 config info ( #16857 )
...
Signed-off-by: jmho <jaylenho734@gmail.com >
2025-04-18 23:28:53 +00:00
Reid
5a5e29de88
[Misc] refactor examples series - Chat Completion Client With Tools ( #16829 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-04-18 23:24:42 +00:00
wang.yuqi
3d3ab3689f
[New Model]: Snowflake Arctic Embed (Family) ( #16649 )
2025-04-18 08:11:57 -07:00
Harry Mellor
686623c5e7
Fix nullable_kvs fallback ( #16837 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-18 05:58:39 -07:00
Cyrus Leung
aadb656562
[Misc] Clean up Kimi-VL ( #16833 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-04-18 05:15:09 -07:00
Jonghyun Choe
87e067de41
[Model] use AutoWeightsLoader for BigCode, GPT-J ( #16823 )
...
Signed-off-by: Jonghyun Choe <andy.choe729@gmail.com >
2025-04-18 10:42:41 +00:00
Michael Yao
26507f8973
[Docs] Fix a link and grammar issue in production-stack.md ( #16809 )
...
Signed-off-by: windsonsea <haifeng.yao@daocloud.io >
2025-04-18 06:42:58 +00:00
Nathan Weinberg
9c1d5b456d
[Doc] add podman setup instructions for official image ( #16796 )
...
Signed-off-by: Nathan Weinberg <nweinber@redhat.com >
2025-04-18 06:10:49 +00:00
Lucia Fang
e31045f95c
[Bugfix] fix pp for llama4 ( #16746 )
...
Signed-off-by: Lu Fang <fanglu@fb.com >
2025-04-18 13:51:30 +08:00
Luka Govedič
aaec845f8e
[ROCm] [Attention] Cleanup ROCm output passing ( #16431 )
...
Signed-off-by: Luka Govedič <lgovedic@redhat.com >
2025-04-18 05:46:45 +00:00
rongfu.leng
7bdfd29a35
[Misc] add collect_env to cli and docker image ( #16759 )
...
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io >
2025-04-17 22:13:35 -07:00
Harry Mellor
e78587a64c
Improve-mm-and-pooler-and-decoding-configs ( #16789 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-17 22:13:32 -07:00
Lucas Wilkinson
7eb4255628
[BugFix] Accuracy fix for llama4 int4 - improperly casted scales ( #16801 )
...
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com >
2025-04-17 22:13:29 -07:00
Michael Goin
6a0f547561
Add hardware print to TPU V1 test ( #16792 )
2025-04-17 22:13:26 -07:00
Shanshan Shen
30ed81b7ca
[V1][Structured Output] Minor modification to _validate_structured_output() ( #16748 )
...
Signed-off-by: shen-shanshan <467638484@qq.com >
2025-04-18 13:12:54 +08:00
Chauncey
7a4a5de729
[Misc] Update outdated note: LMCache now supports chunked prefill ( #16697 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-04-18 05:12:42 +00:00
Cyrus Leung
c16fb5dae8
[Doc] Improve help examples for --compilation-config ( #16729 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-04-17 21:22:34 -07:00
Tarun Kumar
e37073efd7
Add property-based testing for vLLM endpoints using an API defined by an OpenAPI 3.1 schema ( #16721 )
...
Signed-off-by: Tarun Kumar <takumar@redhat.com >
Signed-off-by: Nick Hill <nhill@redhat.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
2025-04-17 21:08:27 -07:00
Lucas Wilkinson
183dad7a85
[Attention] Update to lastest FA3 code ( #13111 )
...
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com >
2025-04-17 15:14:07 -07:00
Yihua Cheng
3408e47159
[P/D][V1] KV Connector API V1 ( #15960 )
...
Signed-off-by: ApostaC <yihua98@uchicago.edu >
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com >
Signed-off-by: remi <remi@mistral.ai >
Co-authored-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
Co-authored-by: Rémi Delacourt <54138269+Flechman@users.noreply.github.com >
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com >
2025-04-17 13:22:40 -07:00
Nick Hill
0377b8310b
[MLA] Simplification to batch P/D reordering ( #16673 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-04-17 16:12:09 -04:00
Mark McLoughlin
e4755f7fac
[V1][Metrics] Fix http metrics middleware ( #15894 )
2025-04-17 19:52:18 +00:00
Sijia(Jackson) Chen
92edf35826
[ROCM] enable aiter fused moe kernel for llama4 bf16 checkpoints ( #16674 )
2025-04-17 11:44:34 -07:00
Nicolò Lucchesi
eb5819b2d9
[V1][TPU] Enable Top K ( #15489 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
Signed-off-by: Hyesoo Yang <hyeygit@gmail.com >
Co-authored-by: Hyesoo Yang <hyeygit@gmail.com >
2025-04-17 18:18:11 +00:00
Nicolò Lucchesi
5989f4684d
[TPU][V1] Fix padding recompilation when max-num-batched-tokens is not even ( #16726 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-04-17 18:09:57 +00:00
rongfu.leng
5125d72f02
[Model] use AutoWeightsLoader for olmoe,opt,orion,persimmon,phi3_small ( #16548 )
...
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io >
2025-04-17 17:48:31 +00:00
Ximingwang-09
a018e555fd
[Kernel] Add fp8_w8a8 fused MoE kernel tuning configs for DeepSeek V3/R1 on NVIDIA H20 ( #16753 )
...
Signed-off-by: ximing.wxm <ximing.wxm@antgroup.com >
Co-authored-by: ximing.wxm <ximing.wxm@antgroup.com >
2025-04-18 00:01:30 +08:00
Robin
6211b92273
[Bugfix]Fix index out of range error in api server log ( #16787 )
...
Signed-off-by: WangErXiao <863579016@qq.com >
2025-04-17 09:01:07 -07:00
Nick Hill
05fcd1b430
[V1][Perf] Faster incremental detokenization ( #15137 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-04-17 07:45:24 -07:00
Insu Kim
7c02d6a137
[Doc] Changed explanation of generation_tokens_total and prompt_tokens_total counter type metrics to avoid confusion ( #16784 )
...
Signed-off-by: insukim1994 <insu.kim@moreh.io >
2025-04-17 14:10:08 +00:00
wang.yuqi
11c3b98491
[Doc] Document Matryoshka Representation Learning support ( #16770 )
2025-04-17 13:37:37 +00:00
Cyrus Leung
dbe7f07001
[Doc] Make sure to update vLLM when installing latest code ( #16781 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-04-17 06:53:31 -06:00
Reid
c69bf4ee06
fix: hyperlink ( #16778 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-04-17 11:34:20 +00:00
Harry Mellor
d27ea94034
Improve configs - TokenizerPoolConfig + DeviceConfig ( #16603 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-17 11:19:42 +00:00
Reid
99ed526101
[Misc] refactor examples series - lmcache ( #16758 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-04-17 11:02:35 +00:00
Michael Yao
207da28186
[Doc] Fix a 404 link in installation/cpu.md ( #16773 )
...
Signed-off-by: windsonsea <haifeng.yao@daocloud.io >
2025-04-17 10:46:21 +00:00
intervitens
5b1aca2ae3
[Bugfix] Fix GLM4 model ( #16618 )
...
Signed-off-by: intervitens <intervitens@tutanota.com >
2025-04-17 03:35:07 -07:00
Reid
d8e557b5e5
[doc] add open-webui example ( #16747 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-04-17 18:27:32 +08:00
Cyrus Leung
61a44a0b22
[Doc] Add more tips to avoid OOM ( #16765 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-04-17 09:54:34 +00:00
DefTruth
a6481525b8
[misc] ignore marlin_moe_wna16 local gen codes ( #16760 )
...
Signed-off-by: DefTruth <qiustudent_r@163.com >
2025-04-17 17:15:14 +08:00
Richard Liaw
8cac35ba43
[Ray] Improve documentation on batch inference ( #16609 )
...
Signed-off-by: Richard Liaw <rliaw@berkeley.edu >
2025-04-16 22:19:26 -07:00
Russell Bryant
9dbf7a2dc1
[V1] Remove log noise when idle ( #16735 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-04-16 21:34:08 -07:00
David Heineman
607029e515
[Bugfix] Revert max_prompt_len validation for decoder-only models. ( #16741 )
...
Signed-off-by: David Heineman <david@davidheineman.com >
2025-04-16 21:33:15 -07:00
Isotr0py
cb072ce93b
[Bugfix] Update Florence-2 tokenizer to make grounding tasks work ( #16734 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-04-17 04:17:39 +00:00
Divakar Verma
95aca283b4
[rocm][V0] fix selection logic for custom PA in V0 ( #16426 )
...
Signed-off-by: Divakar Verma <divakar.verma@amd.com >
2025-04-16 19:52:11 -07:00
Robert Shaw
2b05b8ce69
[V1][Frontend] Improve Shutdown And Logs ( #11737 )
...
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com >
Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com >
Signed-off-by: Nick Hill <nhill@redhat.com >
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: Russell Bryant <rbryant@redhat.com >
Co-authored-by: Andrew Feldman <afeldman@neuralmagic.com >
Co-authored-by: afeldman-nm <156691304+afeldman-nm@users.noreply.github.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
2025-04-16 19:48:34 -07:00
Aaruni Aggarwal
3c776dcefb
Adding vllm buildkite job for IBM Power ( #16679 )
...
Signed-off-by: Aaruni Aggarwal <aaruniagg@gmail.com >
2025-04-17 10:47:47 +08:00
Bryan Lu
2cbd4d2999
[V1][Spec Dec Bug Fix] Respect Spec Dec Method Specification ( #16636 )
...
Signed-off-by: Bryan Lu <yuzhelu@amazon.com >
2025-04-16 19:47:26 -07:00
Staszek Paśko
3092375e27
[V1][Performance] Implement custom serializaton for MultiModalKwargs [Rebased] ( #16432 )
...
Signed-off-by: Staszek Pasko <staszek@gmail.com >
Signed-off-by: Nick Hill <nhill@redhat.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
2025-04-16 19:28:32 -07:00
Harry Mellor
3cd91dc955
Help user create custom model for Transformers backend remote code models ( #16719 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-17 01:05:59 +00:00
Jade Zheng
8a7368e069
[Misc] Remove redundant comment ( #16703 )
...
Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com >
2025-04-17 00:44:52 +00:00
Harry Mellor
93e561ec4d
Improve error for structured output backend selection ( #16717 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-17 00:35:35 +00:00
Joe Runde
e1b004839a
[Hardware] Add processor inputs to platform validation ( #16680 )
...
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com >
2025-04-16 09:28:42 -07:00
xsank
ee378f3d49
[Model] support modernbert ( #16648 )
...
Signed-off-by: 唯勤 <xsank.mz@alibaba-inc.com >
Co-authored-by: 唯勤 <xsank.mz@alibaba-inc.com >
2025-04-16 05:30:15 -07:00
DefTruth
e82ee40de3
[Bugfix][Kernel] fix potential cuda graph broken for merge_attn_states kernel ( #16693 )
...
Signed-off-by: DefTruth <qiustudent_r@163.com >
2025-04-16 03:31:39 -07:00
Cyrus Leung
facbe2a114
[Doc] Improve OOM troubleshooting ( #16704 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-04-16 18:29:48 +08:00
Reid
7168920491
[Misc] refactor examples series ( #16708 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-04-16 10:16:36 +00:00
Kay Yan
21378a2323
[CI] Cleanup additional_dependencies: [toml] for pre-commit yapf hook ( #16405 )
...
Signed-off-by: Kay Yan <kay.yan@daocloud.io >
2025-04-16 10:05:31 +00:00
Shanshan Shen
976711d9db
[V1][Structured Output] Move xgrammar related utils to backend_xgrammar.py ( #16578 )
...
Signed-off-by: shen-shanshan <467638484@qq.com >
2025-04-16 17:01:36 +08:00
Sage Moore
44fa4d556c
[ROCM] Bind triton version to 3.2 in requirements-built.txt ( #16664 )
...
Signed-off-by: Sage Moore <sage@neuralmagic.com >
2025-04-16 14:05:28 +08:00
billishyahao
3ac98edcb1
[Feature] add model aware kv ops helper ( #16020 )
...
Signed-off-by: billishyahao <bill.he@amd.com >
2025-04-15 23:00:43 -07:00
Richard Zou
966c742ed2
Disable remote caching when calling compile_fx ( #16611 )
...
Signed-off-by: rzou <zou3519@gmail.com >
2025-04-15 22:18:28 -07:00
Jee Jee Li
0d7d05f4b6
[Misc] Modify LRUCache touch ( #16689 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-04-16 04:51:38 +00:00
rongfu.leng
96bb8aa68b
[Bugfix] fix gpu docker image mis benchmarks dir ( #16628 )
...
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io >
2025-04-15 21:21:14 -07:00
Shinichi Hemmi
3badb0213b
[Model] Add PLaMo2 ( #14323 )
...
Signed-off-by: Shinichi Hemmi <50256998+Alnusjaponica@users.noreply.github.com >
Signed-off-by: shemmi <shemmi@preferred.jp >
Co-authored-by: Kento Nozawa <nzw0301@preferred.jp >
Co-authored-by: Hiroaki Mikami <mhiroaki@preferred.jp >
Co-authored-by: Calvin Metzger <metzger@preferred.jp >
2025-04-15 19:31:30 -07:00
Angky William
fdcb850f14
[Misc] Enable vLLM to Dynamically Load LoRA from a Remote Server ( #10546 )
...
Signed-off-by: Angky William <angkywilliam@Angkys-MacBook-Pro.local >
Co-authored-by: Angky William <angkywilliam@Angkys-MacBook-Pro.local >
2025-04-15 22:31:38 +00:00
Dipika Sikka
54a66e5fee
[Misc] Update compressed-tensors WNA16 to support zero-points ( #14211 )
2025-04-15 07:33:51 -06:00
DefTruth
280d62b8a2
[Kernel] Remove redundant Exp calculations ( #16123 )
...
Signed-off-by: DefTruth <qiustudent_r@163.com >
2025-04-15 12:58:37 +00:00
Xihui Cang
1666e66443
Add "/server_info" endpoint in api_server to retrieve the vllm_config. ( #16572 )
...
Signed-off-by: Xihui Cang <xihuicang@gmail.com >
2025-04-15 11:50:38 +00:00
Jee Jee Li
1575c1701a
[CI/Build] Fix LoRA OOM ( #16624 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-04-15 16:38:19 +08:00
Reid
6ae996a873
[Misc] refactor argument parsing in examples ( #16635 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-04-15 08:05:30 +00:00
Richard Zou
b590adfdc1
Fix vLLM x torch.compile config caching ( #16491 )
...
Signed-off-by: rzou <zou3519@gmail.com >
2025-04-14 23:11:11 -07:00
Michael Goin
b4fe16c75b
Add vllm bench [latency, throughput] CLI commands ( #16508 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-04-14 23:10:35 -07:00
Pooya Davoodi
bc5dd4f669
[Bugfix] Fix broken GritLM model and tests (missing pooling_metadata) ( #16631 )
...
Signed-off-by: Pooya Davoodi <pooya.davoodi@parasail.io >
2025-04-14 23:09:58 -07:00
Tyler Michael Smith
dbb036cf61
[Bugfix] Fix tests/kernels/test_mamba_ssm_ssd.py ( #16623 )
...
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com >
2025-04-15 05:35:38 +00:00
Taneem Ibrahim
70e7ed841d
[BugFix]: Update minimum pyzmq version ( #16549 )
...
Signed-off-by: Taneem Ibrahim <taneem.ibrahim@gmail.com >
Co-authored-by: mgoin <michael@neuralmagic.com >
2025-04-14 20:06:03 -07:00
Jinzhen Lin
d06ba4ed3f
[Kernel] moe wna16 marlin kernel ( #14447 )
...
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com >
Co-authored-by: Michael Goin <michael@neuralmagic.com >
Co-authored-by: mgoin <mgoin64@gmail.com >
2025-04-14 20:05:22 -07:00
Alex Brooks
6b40996ae8
[Core][Bugfix] Fix Offline MM Beam Search ( #16390 )
...
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-04-15 10:33:02 +08:00
Shuqiao Li
d2020acac7
config check sleep mode support oot platforms ( #16562 )
2025-04-14 16:31:50 -07:00
Chengji Yao
1eb3c2ed48
[DOC][TPU] Add core idea about avoiding recompilation after warmup ( #16614 )
...
Signed-off-by: Chengji Yao <chengjiyao@google.com >
2025-04-14 21:56:06 +00:00
Siyuan Liu
c64ee87267
[Hardware][TPU] Add torchvision to tpu dependency file ( #16616 )
...
Signed-off-by: Siyuan Liu <lsiyuan@google.com >
2025-04-14 17:50:46 -04:00
courage17340
b1308b84a3
[Model][VLM] Add Kimi-VL model support ( #16387 )
...
Signed-off-by: courage17340 <courage17340@163.com >
2025-04-14 21:41:48 +00:00
Nishan Acharya
7b5ecf79bd
s390x: Fix PyArrow build and add CPU test script for Buildkite CI ( #16036 )
...
Signed-off-by: Nishan Acharya <Nishan.Acharya@ibm.com >
2025-04-14 10:55:32 -07:00
Harry Mellor
9883a18859
Fix triton install condition on CPU ( #16600 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-14 17:06:01 +00:00
Nicolò Lucchesi
b3f2fddd17
[TPU][V1] Fix exponential padding when max-num-batched-tokens is not a power of 2 ( #16596 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-04-14 17:01:05 +00:00
Cyrus Leung
aa29841ede
[Bugfix] Multi-modal caches not acting like LRU caches ( #16593 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-04-14 09:24:16 -07:00
Md. Shafi Hussain
6bf27affb6
[fix]: Dockerfile.ppc64le fixes for opencv-python and hf-xet ( #16048 )
...
Signed-off-by: Md. Shafi Hussain <Md.Shafi.Hussain@ibm.com >
2025-04-14 17:08:39 +01:00
shangmingc
1dd23386ec
[Misc] Update usage with mooncake lib for kv transfer ( #16523 )
...
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com >
2025-04-14 11:31:37 +00:00
Reid
7cbfc10943
[Misc] refactor examples ( #16563 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-04-14 09:59:15 +00:00
DefTruth
ce4ddd2d1a
[Misc] remove warning if triton>=3.2.0 ( #16553 )
...
Signed-off-by: DefTruth <qiustudent_r@163.com >
2025-04-14 02:39:47 -07:00
Harry Mellor
e51929ebca
Improve configs - SchedulerConfig ( #16533 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-14 17:24:16 +08:00
Russell Bryant
dc1b4a6f13
[Core][V0] Enable regex support with xgrammar ( #13228 )
...
Create Release / Create Release (push) Has been cancelled
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-04-14 10:13:38 +08:00
Jennifer Zhao
63d2705edb
[Benchmark][Bugfix] Fix SonnetDataset default values in benchmark_throughput.py ( #16556 )
2025-04-13 17:20:26 -07:00
Michael Goin
d085a44082
Enable PTPC FP8 for CompressedTensorsW8A8Fp8MoEMethod (triton fused_moe) ( #16537 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-04-13 14:55:18 +00:00
Lily Liu
f49e5aff11
[V1][Spec Decode] KV cache slots for eagle heads ( #16370 )
...
Signed-off-by: LiuXiaoxuanPKU <lilyliupku@gmail.com >
2025-04-12 19:42:51 -07:00
Ryan McConville
6c11ecf8d3
[Bugfix] Validate logit biases to prevent out of vocab ids crashing engine ( #16529 )
...
Signed-off-by: Ryan McConville <ryan@ryanmcconville.com >
2025-04-12 20:19:19 +00:00
SnowCharm
93e5f3c5fb
[Perf] Optimize Preparing Inputs for GPU Model Runner ( #16484 )
...
Signed-off-by: snowcharm <snowcharmqq@gmail.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
2025-04-12 22:54:37 +08:00
Jie Fu (傅杰)
70363bccfa
Fix syntaxWarning: invalid escape sequence '\s' ( #16532 )
...
Signed-off-by: Jie Fu <jiefu@tencent.com >
2025-04-12 14:39:42 +00:00
Jee Jee Li
3cdc57669f
[Misc] Delete redundant code ( #16530 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-04-12 11:21:37 +00:00
Huazhong Ji
68bb122eb4
[MISC] Make GroupCoordinator compatible with out-of-tree devices ( #16464 )
...
Signed-off-by: hzji210@gmail.com <hzji210@gmail.com >
2025-04-12 09:20:25 +00:00
Cyrus Leung
d9fc8cd9da
[V1] Enable multi-input by default ( #15799 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-04-12 08:52:39 +00:00
Nicolò Lucchesi
f069f3ea74
[Misc] Openai transcription client example use same Whisper model ( #16487 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-04-12 07:27:03 +00:00
Cyrus Leung
c5bc0e7fcc
[Misc] Update chat utils tests ( #16520 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-04-12 06:48:43 +00:00
Tianer Zhou
4a3a518722
fix: spelling ( #16466 )
...
Signed-off-by: Tianer Zhou <ezhoureal@gmail.com >
2025-04-11 23:24:22 -07:00
wang.yuqi
fbf722c6e6
[Frontend] support matryoshka representation / support embedding API dimensions ( #16331 )
2025-04-11 23:23:10 -07:00
leon-seidel
e92d7085bf
[Feature][V1] Add xgrammar to support minLength, maxLength with test ( #16516 )
...
Signed-off-by: Leon Seidel <leon.seidel@fau.de >
2025-04-11 23:22:07 -07:00
Michael Goin
bd6028d6b0
Optimized topk for topk=1 (Llama-4) ( #16512 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-04-12 14:21:08 +08:00
Ye (Charlotte) Qi
802329dee9
[Doc] Update Llama4 Model Names in Supported Models ( #16509 )
...
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com >
2025-04-12 02:53:10 +00:00
Nick Hill
41cc883c29
[BugFix] Handle non-contiguous tensors properly when serializing ( #16492 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-04-11 17:54:06 -07:00
Michael Goin
57504a4bcf
[CI][Bugfix] Add mistral_tool_use to Ci ( #16517 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-04-11 17:52:38 -07:00
Yuan Tang
ed4792c990
[Doc] Fix link to vLLM blog ( #16519 )
...
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com >
2025-04-11 17:39:23 -07:00
Michael Goin
87b836ba77
Bugfix for PixtralHF models without spatial_merge_size ( #16513 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-04-11 23:32:22 +00:00
rongfu.leng
56c76c2e0e
[Bugfix] clean up duplicated code ( #16485 )
...
Signed-off-by: Gogs <gogs@fake.local >
Co-authored-by: Gogs <gogs@fake.local >
2025-04-11 23:19:40 +00:00
Christian Sears
c09632a66c
Update openai_compatible_server.md ( #16507 )
...
Signed-off-by: Christian Sears <csears@redhat.com >
2025-04-11 22:54:58 +00:00
Yong Hoon Shin
a3bf8d4a2b
[Kernel] Add tuned FusedMoE kernel config for Llama4 Scout, TP=8 on H100 ( #16488 )
2025-04-12 06:26:55 +08:00
Ye (Charlotte) Qi
16eda8c43a
[Frontend] Added chat templates for LLaMa4 pythonic tool calling ( #16463 )
...
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com >
Co-authored-by: Kai Wu <kaiwu@meta.com >
2025-04-12 06:26:17 +08:00
Harry Mellor
cd77382ac1
Improve configs - LoadConfig ( #16422 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-11 20:27:27 +00:00
Travis Johnson
71b9cde010
[Bugfix] handle alignment of encoder_seq_lens in mllama.py ( #14784 )
...
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com >
2025-04-11 19:59:50 +00:00
Isotr0py
5285589f37
[Doc] Document InternVL3 support ( #16495 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-04-11 19:41:09 +00:00
Michael Goin
f41647ee6b
[Kernel] Support W8A8 channel-wise weights and per-token activations in triton fused_moe_kernel ( #16366 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-04-11 17:54:08 +00:00
Nicolò Lucchesi
4d022cbc75
[TPU][V1] Make --disable_chunked_mm_input mandatory for serving MM models ( #16483 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-04-11 17:06:14 +00:00
Richard Zou
70de35a881
Fix erroneous "model doesn't support compile" warning ( #16486 )
...
Signed-off-by: rzou <zou3519@gmail.com >
2025-04-11 16:24:36 +00:00
Tomasz Zielinski
34b2cf3b33
[Hardware][Intel-Gaudi] Multi-step scheduling implementation for HPU ( #12779 )
...
Signed-off-by: Tomasz Zielinski <tomasz.zielinski@intel.com >
2025-04-11 07:38:36 -07:00
chaow-amd
9e90c9f73f
[Bugfix] Fix bugs of running Quark quantized models ( #16236 )
...
Signed-off-by: chaow <chaow@amd.com >
2025-04-11 10:18:32 -04:00
DefTruth
e9528f6dc6
[Kernel] support merge_attn_states CUDA kernel, 3x speedup ( #16173 )
...
Signed-off-by: DefTruth <qiustudent_r@163.com >
2025-04-11 06:50:50 -06:00
Harry Mellor
51baa9c333
Don't install triton on ppc64le platform ( #16470 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-11 10:11:00 +00:00
Reid
35e076b3a8
[Misc] update api_client example ( #16459 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-04-11 10:05:40 +00:00
Jee Jee Li
a26f59ccbc
[Misc] Raise error for V1 not supporting Long LoRA. ( #16415 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-04-11 01:51:20 -07:00
Michael Goin
aa3b3d76e0
Enforce valid max_num_batched_tokens when disable_chunked_mm_input=True ( #16447 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-04-11 08:09:52 +00:00
Jee Jee Li
f7030df3be
[Core][LoRA][1/N] Add LoRA for EncoderDecoderModelRunner ( #15990 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-04-11 15:32:37 +08:00
DefTruth
905e91e9ac
Revert "[Model] use AutoWeightsLoader for deepseek_v2, internlm2" ( #16453 )
2025-04-11 06:44:22 +00:00
Alex Brooks
f8f9c0ba62
[Bugfix] Don't set an upper bound on repetition penalty ( #16403 )
...
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
2025-04-11 14:19:40 +08:00
Li, Jiang
dda811021a
[CPU][Bugfix] Fix CPU docker issues ( #16454 )
...
Signed-off-by: jiang.li <jiang1.li@intel.com >
2025-04-11 14:19:07 +08:00
Isotr0py
93195146ea
[Bugfix][VLM] Fix failing Phi-4-MM multi-images tests and add vision-speech test ( #16424 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-04-11 04:57:16 +00:00
Michael Goin
ed37599544
Update supported_hardware.md for TPU INT8 ( #16437 )
2025-04-11 12:28:07 +08:00
Yong Hoon Shin
99ef59cf7f
[Llama4] Enable attention temperature tuning by default for long context (>32k) ( #16439 )
...
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com >
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com >
2025-04-10 21:26:07 -07:00
Chenyaaang
d544d141ec
update benchmark_serving_structured_output to include auto backend ( #16438 )
...
Signed-off-by: Chenyaaang <chenyangli@google.com >
2025-04-11 12:25:52 +08:00
Alexey Belyakov
3e397a9484
check input length of sonnet samples ( #16423 )
...
Signed-off-by: alexey-belyakov <alexey.belyakov@intel.com >
2025-04-11 10:15:06 +08:00
WWW
268c325078
Fix range_ratio Bug in RandomDataset ( #16126 )
...
Signed-off-by: jadewang21 <jadewangcn@outlook.com >
2025-04-10 15:31:17 -07:00
Nicolò Lucchesi
3cc9af88ff
[TPU][V1] Disable per-request seed/Generator ( #16172 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-04-10 17:05:44 -04:00
look
7cd0bd7212
[Bugfix] Fix output token length check logic ( #16419 )
...
Signed-off-by: look <eeslook@163.com >
2025-04-10 20:16:48 +00:00
Cyrus Leung
56d4aefa33
[VLM] Avoid unnecessary dummy multimodal data during processing ( #16416 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-04-10 19:32:14 +00:00
Nick Hill
dd143ef541
[V1] Zero-copy tensor/ndarray serialization/transmission ( #13790 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-04-10 19:23:14 +00:00
Chih-Chieh Yang
daefed052c
[Model] Reduce redundant computations in mamba2 blocks for Bamba-9B ( #15423 )
...
Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com >
Co-authored-by: Yu Chin Fabian Lim <flim@sg.ibm.com >
2025-04-10 19:07:07 +00:00
Chenyaaang
5fbab20e02
[Bugfix] Fix bug when dataset is json ( #15899 )
...
Signed-off-by: Chenyaaang <chenyangli@google.com >
2025-04-10 18:35:41 +00:00
Lily Liu
e8224f3dca
[V1][Spec Decode] Eagle Model loading ( #16035 )
...
Signed-off-by: LiuXiaoxuanPKU <lilyliupku@gmail.com >
2025-04-10 11:21:48 -07:00
Russell Bryant
9665313c39
[V1] Set structured output backend to auto by default ( #15724 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-04-10 17:53:26 +00:00
Harry Mellor
0c54fc7273
Improve configs - ParallelConfig ( #16332 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-10 17:34:37 +00:00
Nicolò Lucchesi
c1b57855ec
[TPU][V1] Use language_model interface for getting text backbone in MM ( #16410 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-04-10 17:32:04 +00:00
Cyrus Leung
83b824c8b4
[VLM] Remove BaseProcessingInfo.get_mm_max_tokens_per_item ( #16408 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-04-10 09:06:58 -07:00
Lu Fang
7678fcd5b6
Fix the torch version parsing logic ( #15857 )
2025-04-10 07:37:47 -07:00
wineandchord
8661c0241d
[CI] Add auto update workflow for Dockerfile graph ( #11879 )
...
Signed-off-by: wineandchord <guoqizhou19@gmail.com >
2025-04-10 13:43:05 +00:00
Reid
ce8d6b75fc
[doc] update the wrong link ( #16401 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-04-10 21:02:37 +08:00
Ye (Charlotte) Qi
61de3ef74b
[Model] Remove image mm limit for LLaMa4 ( #16365 )
...
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com >
2025-04-10 09:36:27 +00:00
cyyever
ec1f9c8c91
Update Numba to 0.61.2 ( #16376 )
...
Signed-off-by: cyy <cyyever@outlook.com >
2025-04-10 07:59:37 +00:00
Reid
65e09094c4
[doc] add download model tips ( #16389 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-04-10 07:45:26 +00:00
Michael Goin
c70cf0fe06
[Kernel] Use moe_wna16 kernel for compressed tensors wna16 moe models ( #16038 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-04-10 15:08:47 +08:00
Cyrus Leung
a5d11a54dc
[Bugfix] Fix validation error for text-only Mllama 3.2 ( #16377 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-04-10 14:19:42 +08:00
Cyrus Leung
3d4c87758e
[Misc] Update transformers version limits of multi-modal tests ( #16381 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-04-09 23:03:33 -07:00
Aaron Ang
a9bd832fc5
[Model] use AutoWeightsLoader for deepseek_v2, internlm2 ( #16383 )
...
Signed-off-by: Aaron Ang <aaron.angyd@gmail.com >
2025-04-09 23:01:00 -07:00
Chenyaaang
417bcefbae
fix sonnet dataset sample when prefix len is very small ( #16379 )
...
Signed-off-by: Chenyaaang <chenyangli@google.com >
2025-04-10 05:35:07 +00:00
Michael Goin
baada0e737
[Bugfix][TPU] Fix TPU validate_request ( #16369 )
...
Signed-off-by: Michael Goin <mgoin64@gmail.com >
2025-04-10 12:55:12 +08:00
Benjamin Kitor
82eb61dd4c
[misc] use tqdm.auto where appropriate ( #16290 )
...
Signed-off-by: Benjamin Kitor <bkitor@gigaio.com >
2025-04-09 21:54:54 -07:00
Roger Wang
0d4d06fe2f
[CI][Bugfix] Pin triton version for CPU ( #16384 )
...
Signed-off-by: Roger Wang <ywang@roblox.com >
2025-04-10 04:35:00 +00:00
Jintao
4aed0ca6a2
[bugfix] Avoid the time consumption caused by creating dummy videos. ( #16371 )
2025-04-10 04:30:05 +00:00
Chengji Yao
1621b25288
[TPU] Fix dummy loading OOM ( #16372 )
...
Signed-off-by: Chengji Yao <chengjiyao@google.com >
2025-04-10 04:06:16 +00:00
Aaron Ang
a564797151
[Model] use AutoWeightsLoader for granite, granitemoe, granitemoeshared, grok1, mixtral ( #16325 )
...
Signed-off-by: Aaron Ang <aaron.angyd@gmail.com >
2025-04-09 20:07:40 -07:00
Guillaume Calmettes
1da6a09274
[Bugfix]: do not shutdown server if skip_special_use=False for MistralTokenizer ( #14094 )
...
Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com >
2025-04-09 19:43:09 -07:00
Yuxuan Zhang
1e44ffc3ff
Add GLM-4-0414 support ( #16338 )
...
Signed-off-by: lvfei.lv <lvfei.lv@alibaba-inc.com >
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com >
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: yihong0618 <zouzou0208@gmail.com >
Signed-off-by: Lu Fang <fanglu@fb.com >
Signed-off-by: Ajay Vohra <ajayvohr@amazon.com >
Signed-off-by: NickLucche <nlucches@redhat.com >
Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com >
Co-authored-by: Accelerator1996 <lvfei.lv@alibaba-inc.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
Co-authored-by: Michael Goin <michael@neuralmagic.com >
Co-authored-by: yihong <zouzou0208@gmail.com >
Co-authored-by: Lucia Fang <116399278+luccafong@users.noreply.github.com >
Co-authored-by: ajayvohra2005 <ajayvohr@amazon.com >
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com >
Co-authored-by: Guillaume Calmettes <gcalmettes@scaleway.com >
2025-04-10 09:19:42 +08:00
Chengji Yao
a454748544
[TPU][V1] Refine tpu_model_runner to mitigate future recompilation issues ( #16275 )
...
Signed-off-by: Chengji Yao <chengjiyao@google.com >
2025-04-09 18:51:51 -06:00
Reid
1bff42c4b7
[Misc] refactor Structured Outputs example ( #16322 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-04-09 23:32:42 +00:00
Joe Runde
cb391d85dc
[Hardware] add platform-specific request validation api ( #16291 )
...
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com >
2025-04-09 12:50:01 -07:00
Russell Bryant
fee5b8d37f
[Build/CI] Add tracing deps to vllm container image ( #15224 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-04-09 19:14:06 +00:00
Michael Goin
b2ce859bd2
Fix benchmark_throughput.py --backend=hf ( #16352 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-04-09 19:09:28 +00:00
Chendi.Xue
566f10a929
[CI]Fix hpu docker and numpy version for CI ( #16355 )
...
Signed-off-by: Chendi Xue <chendi.xue@intel.com >
2025-04-09 17:52:26 +00:00
Guillaume Calmettes
c3b5189137
[Bugfix] catch AssertionError in MistralTokenizer as ValueError ( #16344 )
...
Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com >
2025-04-09 17:33:24 +00:00
zh Wang
a25866ac8d
[Bugfix] Fix profiling.py ( #16202 )
...
Signed-off-by: zh Wang <rekind133@outlook.com >
2025-04-09 17:03:34 +00:00
Michael Goin
098900d7c2
Revert "Update label-tpu mergify and remove removal bot" ( #16350 )
2025-04-09 07:59:36 -07:00
Guillaume Calmettes
98d01d3ce2
[Bugfix][Frontend] respect provided default guided decoding backend ( #15476 )
...
Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com >
2025-04-09 05:11:10 -07:00
Nicolò Lucchesi
d55244df31
[Model] Add SupportsMultiModal.get_language_model interface ( #16007 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-04-09 04:12:54 -07:00
yihong
04149cce27
[BugFix] fix some typos found by typos. ( #16314 )
...
Signed-off-by: yihong0618 <zouzou0208@gmail.com >
2025-04-09 03:43:59 -07:00
ajayvohra2005
24834f4894
update neuron config ( #16289 )
...
Signed-off-by: Ajay Vohra <ajayvohr@amazon.com >
2025-04-09 03:43:22 -07:00
Lucia Fang
ec7da6fcf3
[BugFix] llama4 qknorm should be not shared across head ( #16311 )
...
Signed-off-by: Lu Fang <fanglu@fb.com >
2025-04-09 00:59:14 -07:00
yihong
819d548e8a
[BugFix] logger is not callable ( #16312 )
...
Signed-off-by: yihong0618 <zouzou0208@gmail.com >
2025-04-09 00:59:02 -07:00
Michael Goin
477d2a8aa2
Update label-tpu mergify and remove removal bot ( #16298 )
2025-04-09 07:56:25 +00:00
Cyrus Leung
e484e02857
[Bugfix] Avoid transferring cached multi-modal items from P0 to P1 ( #16273 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-04-09 00:51:27 -07:00
Accelerator1996
24f6b9a713
[Misc] Fix test_sharded_state_loader.py( #16004 ) ( #16005 )
...
Signed-off-by: lvfei.lv <lvfei.lv@alibaba-inc.com >
2025-04-09 14:47:30 +08:00
Luka Govedič
9cdde47289
[BugFix] Fix fusion test and add them to CI ( #16287 )
...
Signed-off-by: luka <luka@neuralmagic.com >
2025-04-08 23:46:45 -07:00
Chengji Yao
b1eb4ca152
[TPU] Update PyTorch/XLA ( #16288 )
...
Signed-off-by: Chengji Yao <chengjiyao@google.com >
2025-04-09 14:46:32 +08:00
Michael Goin
87b4ac56c2
[CI][Bugfix] Fix bad tolerance for test_batch_base64_embedding ( #16221 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-04-09 04:14:46 +00:00
Russell Bryant
cb84e45ac7
[Core] Upgrade to xgrammar 0.1.18, add cache size limit ( #16283 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-04-08 19:13:22 -07:00
rongfu.leng
4716377fbc
[Feature] Estimate max-model-len use available KV cache memory ( #16168 )
...
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io >
2025-04-08 19:12:51 -07:00
rongfu.leng
4e9cf8c1dd
[Bugfix] fix gettid method is not define ( #16084 )
...
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io >
2025-04-08 19:12:44 -07:00
TJian
2976dc27e9
[Bug] [ROCm] Fix Llama 4 Enablement Bug on ROCm: V0 ROCmFlashAttentionImpl and Triton Fused MoE bugs ( #16198 )
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com >
Co-authored-by: Hongxia Yang <hongxia.yang@amd.com >
Co-authored-by: kliuae <kuanfu.liu@embeddedllm.com >
2025-04-08 19:12:34 -07:00
Chauncey
102bf967f0
[Model] Add smolvlm support ( #16017 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-04-08 19:12:17 -07:00
yueshen2016
1f4b09b525
Add support to modelopt quantization of Mixtral model ( #15961 )
...
Signed-off-by: Yue <yueshen@nvidia.com >
2025-04-09 01:53:31 +00:00
Jee Jee Li
86c3369eb8
[CI/Build] Fix CI LoRA failure ( #16270 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-04-09 09:13:56 +08:00
Russell Bryant
2755c34a8f
[V1] Update structured output offline inference example ( #15721 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-04-08 22:34:09 +00:00
Jinzhen Lin
db10422184
[Bugfix] fix deepseek fp16 scale bug ( #14809 )
...
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com >
Co-authored-by: mgoin <mgoin64@gmail.com >
2025-04-08 16:56:09 -04:00
Lucas Wilkinson
e1a2c699dd
[BugFix] Fix Llama4 - Index Error When Single Request Near Max Context ( #16209 )
...
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com >
2025-04-08 18:56:51 +00:00
Harry Mellor
0115ccd5c0
Add warning that content below line in template will be removed ( #16276 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-08 18:18:40 +00:00
Isotr0py
40b4284fe3
[Bugfix] Handle process_weights_after_loading for QKVCrossParallelLinear ( #15328 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-04-08 10:02:23 -07:00
Cyrus Leung
4ebc0b9640
[Bugfix] Proper input validation for multi-modal encoder-decoder models ( #16156 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-04-08 09:45:21 -07:00
Kero Liang
dc96fd54c6
[Misc] Avoid stripping meaningful whitespace from nvidia-smi topo -m output in collect_env.py ( #16272 )
...
Signed-off-by: imkero <kerorek@outlook.com >
2025-04-08 16:08:09 +00:00
wang.yuqi
1f5d13ab9f
[New Model]: jinaai/jina-embeddings-v3 ( #16120 )
2025-04-08 08:39:12 -07:00
Harry Mellor
90cb44eb02
Update to transformers==4.51.1 ( #16257 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-08 06:53:39 -07:00
Kebe
e11880deea
[Bugfix] Remove triton do_bench fast_flush arg ( #16256 )
...
Signed-off-by: Kebe <mail@kebe7jun.com >
2025-04-08 13:51:06 +00:00
TY-AMD
9351f91be9
[BugFix][ROCm] Fix GGUF MoE Dispatch Block_Dim for ROCm ( #16247 )
...
Signed-off-by: Tianyuan Wu <Tianyuan.Wu@amd.com >
2025-04-08 05:10:26 -07:00
rongfu.leng
5a1e1c8353
[Model] use AutoWeightsLoader for phimoe,qwen2_moe,qwen3_moe ( #16203 )
...
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io >
2025-04-08 04:05:47 -07:00
Alex Brooks
69ecaa7c79
[Misc] Add warning for multimodal data in LLM.beam_search ( #16241 )
...
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com >
2025-04-08 04:05:27 -07:00
Reid
7f00899ff7
[Misc] format and refactor some examples ( #16252 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-04-08 10:42:32 +00:00
Simon Mo
995e3d1f41
[Docs] Add Slides from Singapore Meetup ( #16213 )
...
Signed-off-by: simon-mo <simon.mo@hey.com >
2025-04-08 07:20:22 +00:00
Kebe
b4ac449a83
[Misc] Merge the logs of pp layers partitions ( #16225 )
...
Signed-off-by: Kebe <mail@kebe7jun.com >
2025-04-08 00:18:15 -07:00
Michael Goin
8e5314a468
[V1] Add disable_chunked_mm_input arg to disable partial mm input prefill ( #15837 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-04-07 23:24:07 -07:00
Siyuan Liu
87918e40c4
[torch.compile][TPU] Make @support_torch_compile work for XLA backend ( #15782 )
...
Signed-off-by: Siyuan Liu <lsiyuan@google.com >
Signed-off-by: mgoin <mgoin64@gmail.com >
Co-authored-by: mgoin <mgoin64@gmail.com >
2025-04-08 14:23:53 +08:00
Isotr0py
f6b32efb7f
[Bugfix] Fix and reorganize broken GGUF tests and bump gguf version ( #16194 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-04-08 13:38:13 +08:00
Michael Goin
b99733d092
[Bugfix] Do not skip "empty" parts of chats that are parsable ( #16219 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-04-08 05:14:15 +00:00
Yong Hoon Shin
05a015d6a5
Add warning for Attention backends that do not support irope yet ( #16212 )
2025-04-08 03:59:26 +00:00
zxfan-cpu
ad971af8c7
[Bugfix] fix use-ep bug to enable ep by dp/tp size > 1 ( #16161 )
2025-04-07 20:48:47 -07:00
Roger Wang
f2ebb6f541
[V1] Scatter and gather placeholders in the model runner ( #16076 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: Roger Wang <ywang@roblox.com >
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk >
Co-authored-by: mgoin <mgoin64@gmail.com >
Co-authored-by: Jennifer Zhao <ai.jenniferzhao@gmail.com >
2025-04-08 10:43:41 +08:00
Satyajith Chilappagari
1d01211264
Update BASE_IMAGE to 2.22 release of Neuron ( #16218 )
2025-04-07 19:11:18 -07:00
Miles Williams
f94ab12f79
[Misc] Update compressed-tensors to version 0.9.3 ( #16196 )
...
Signed-off-by: Miles Williams <42222518+mlsw@users.noreply.github.com >
2025-04-07 19:09:06 -07:00
youkaichao
a865bc1ca6
[core] do not send error across process ( #16174 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com >
2025-04-07 19:09:03 -07:00
Michael Goin
21802c4b6d
[ROCm][Bugfix][FP8] Make fp8 quant respect fused modules mapping ( #16031 )
...
Signed-off-by: mgoin <michael@neuralmagic.com >
2025-04-07 21:28:14 -04:00
Driss Guessous
652907b354
Torchao ( #14231 )
...
Signed-off-by: drisspg <drisspguessous@gmail.com >
2025-04-07 19:39:28 -04:00
leon-seidel
24f1c01e0f
[Bugfix][V0] XGrammar structured output supports Enum ( #15878 )
...
Signed-off-by: Leon Seidel <leon.seidel@fau.de >
2025-04-07 22:38:25 +00:00
Reid
fad6e2538e
[Misc] add description attribute in CLI ( #15921 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-04-07 22:30:35 +00:00
Nick Hill
7f6d47c1a2
[V1][BugFix] Exit properly if engine core fails during startup ( #16137 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-04-07 15:30:15 -07:00
Benjamin Chislett
3147586ebd
[Bugfix] Fix guidance backend for Qwen models ( #16210 )
...
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai >
2025-04-07 22:15:43 +00:00
Roger Wang
ed636d99ca
[Misc] Move Llama 4 projector call into encoder execution ( #16201 )
2025-04-07 14:02:05 -07:00
Nicolò Lucchesi
090c856d76
[Misc] Human-readable max-model-len cli arg ( #16181 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-04-07 14:40:58 -04:00
Gregory Shtrasberg
ad434d4cfe
Print the warning only once ( #16193 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
2025-04-07 18:30:06 +00:00
Cyrus Leung
66d433b94f
[V1] Revert the default max_num_seqs to V0 values for most hardware ( #16158 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-04-07 13:54:36 -04:00
Cyrus Leung
027b204ff1
[Bugfix] Re-enable support for ChatGLMForConditionalGeneration ( #16187 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-04-07 23:15:58 +08:00
Lu Fang
55dcce91df
Upstream Llama4 Support to Main ( #16113 )
...
Signed-off-by: Aston Zhang <22279212+astonzhang@users.noreply.github.com >
Signed-off-by: Chris Thi <chris.c.thi@gmail.com >
Signed-off-by: drisspg <drisspguessous@gmail.com >
Signed-off-by: Jon Swenson <jmswen@gmail.com >
Signed-off-by: Keyun Tong <tongkeyun@gmail.com >
Signed-off-by: Lu Fang <fanglu@meta.com >
Signed-off-by: Xiaodong Wang <xdwang@meta.com >
Signed-off-by: Yang Chen <yangche@fb.com >
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com >
Signed-off-by: Yong Hoon Shin <yhshin@meta.com >
Signed-off-by: Zijing Liu <liuzijing2014@gmail.com >
Signed-off-by: Lu Fang <lufang@fb.com >
Signed-off-by: Lu Fang <fanglu@fb.com >
Signed-off-by: Lucia Fang <fanglu@fb.com >
Signed-off-by: Roger Wang <ywang@roblox.com >
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Co-authored-by: Lu Fang <fanglu@fb.com >
Co-authored-by: Roger Wang <ywang@roblox.com >
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-04-07 08:06:27 -07:00
Robin
8017c8db7f
[Doc]Update image to latest version ( #16186 )
...
Signed-off-by: WangErXiao <863579016@qq.com >
2025-04-07 14:17:39 +00:00
Reid
dc3529dbf6
[Misc] improve example mlpspeculator and llm_engine_example ( #16175 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-04-07 11:53:52 +00:00
YamPengLi
7699258ef0
[Model] Add Qwen3 and Qwen3MoE ( #15289 )
...
Signed-off-by: YamPengLi <yampayne.lyp@alibaba-inc.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-04-07 04:06:41 -07:00
Shanshan Shen
e9ba99f296
[V1][Structured Output] Add supports_structured_output() method to Platform ( #16148 )
...
Signed-off-by: shen-shanshan <467638484@qq.com >
2025-04-07 11:06:24 +00:00
Isotr0py
7c80368710
[VLM] Florence-2 supports online serving ( #16164 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-04-07 04:04:02 -07:00
yihong
95d63f38c0
doc: fix some typos in doc ( #16154 )
...
Signed-off-by: yihong0618 <zouzou0208@gmail.com >
2025-04-07 05:32:06 +00:00
Roger Wang
bb8dab821e
[CI] Set max transformers version for Ultravox model test ( #16149 )
...
Signed-off-by: Roger Wang <ywang@roblox.com >
2025-04-07 04:37:58 +00:00
Isotr0py
fc0f87768a
[Bugfix] Make dummy encoder prompt padding alternative and add missing warnings ( #16129 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-04-07 04:07:15 +00:00
Cyrus Leung
0a57386721
[Misc] Update Mistral-3.1 example ( #16147 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-04-07 03:57:37 +00:00
Woosuk Kwon
3749e28774
[V1][Minor] Minor simplification for get_computed_blocks ( #16139 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-04-06 20:38:12 -07:00
Kay Yan
86fc2321ff
[Metrics] Add bucket for request_latency, time_to_first_token and time_per_output_token ( #15202 )
...
Signed-off-by: Kay Yan <kay.yan@daocloud.io >
2025-04-06 20:34:51 -07:00
Martin Hoyer
2549c0dfef
Fix requires-python ( #16132 )
2025-04-06 19:22:25 -07:00
Woosuk Kwon
b10e519895
[V1][Minor] Optimize get_cached_block ( #16135 )
2025-04-06 20:48:14 +00:00
Chengji Yao
9bde5ba127
[TPU] Update PyTorch/XLA ( #16130 )
...
Signed-off-by: Chengji Yao <chengjiyao@google.com >
2025-04-06 18:25:55 +00:00
Reid
72c8f1ad04
[Misc] update requires-python in pyproject.toml ( #16116 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-04-06 14:56:34 +00:00
paolovic
da224daaa9
[Bugfix] add hf_token to EngineArgs ( #16093 )
...
Signed-off-by: paolovic <paul-philipp.luley@uzh.ch >
Co-authored-by: paolovic <paul-philipp.luley@uzh.ch >
2025-04-06 14:47:33 +00:00
Varun Sundar Rabindranath
3a100b9278
[Bugfix] LoRA : Fix the order in which the kernels process LoRAs ( #16040 )
...
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com >
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com >
2025-04-06 14:04:50 +00:00
rongfu.leng
242a637aea
[Model] use AutoWeightsLoader for stablelm,starcoder2,zamba2 ( #16103 )
...
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io >
2025-04-06 05:52:01 -07:00
Isotr0py
c2a9671510
[Misc] Improve model redirect to accept json dictionary ( #16119 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-04-06 05:51:45 -07:00
Paul Schweigert
d5ae4f7f42
[Doc][Bugfix] Add missing EOF in k8s deploy doc ( #16025 )
2025-04-06 12:10:57 +00:00
Reid
b6c502a150
[Misc] refactor example eagle ( #16100 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-04-06 09:42:48 +00:00
Roger Wang
9ca710e525
[CI][V1] Fix passing tokenizer as kwarg to validate_guidance_grammar ( #16117 )
...
Signed-off-by: Roger Wang <ywang@roblox.com >
2025-04-06 16:18:00 +08:00
Ben Jackson
eb07c8cb5b
[Frontend] Fix typo in tool chat templates for llama3.2 and toolace ( #14501 )
...
Signed-off-by: Ben Jackson <ben@ben.com >
2025-04-06 07:44:36 +00:00
Hyesoo Yang
ba10801961
[Benchmark] Add sampling parameters to benchmark_serving. ( #16022 )
...
Signed-off-by: Hyesoo Yang <hyeygit@gmail.com >
2025-04-06 12:30:35 +08:00
Lucia Fang
620fc2d09e
[Model] fix model testing for TeleChat2ForCausalLM and V0 llama4 ( #16112 )
...
Signed-off-by: Lu Fang <fanglu@fb.com >
2025-04-05 21:23:40 -07:00
Jonghyun Choe
29283eaa7e
[Model] use AutoWeightsLoader for phi, gemma, deepseek ( #16088 )
...
Signed-off-by: Jonghyun Choe <andy.choe729@gmail.com >
2025-04-05 20:34:38 -07:00
Jinzhen Lin
2fa66ef713
[Bugfix] fix use_atomic_add support of marlin kernel when using v1 engine ( #15946 )
...
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com >
2025-04-05 20:04:22 -07:00
Chauncey
13affc432d
[Misc] Remove redundant code ( #16098 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-04-05 20:03:50 -07:00
Reid
d8f094a92a
[Misc] format output for encoder_decoder.py ( #16095 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-04-05 19:57:18 -07:00
Harry Mellor
97ae6d777f
Fix some capitalisations in generated examples doc titles ( #16094 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-05 13:44:03 +00:00
yihong
6baeee70d1
Revert "doc: add info for macos clang errors ( #16049 )" ( #16091 )
...
Signed-off-by: yihong0618 <zouzou0208@gmail.com >
2025-04-05 11:51:51 +00:00
Reid
d2517a4939
[doc] fix 404 ( #16082 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-04-05 11:39:18 +00:00
yihong
6342adc438
fix: support clang17 for macos and fix the real libomp ( #16086 )
...
Signed-off-by: yihong0618 <zouzou0208@gmail.com >
2025-04-05 11:00:12 +00:00
Kevin H. Luu
0adba91547
[CI] Fix benchmark script level ( #16089 )
2025-04-05 03:36:01 -07:00
Tristan Leclercq
4285e423a6
[Misc] Auto detect bitsandbytes pre-quantized models ( #16027 )
...
Signed-off-by: Tristan Leclercq <tristanleclercq@gmail.com >
2025-04-04 23:30:45 -07:00
Woosuk Kwon
63375f0cdb
[V1][Spec Decode] Update N-gram Proposer Interface ( #15750 )
...
Create Release / Create Release (push) Has been cancelled
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-04-04 16:32:54 -07:00
Michael Goin
70ad3f9e98
[Bugfix][TPU] Fix V1 TPU worker for sliding window ( #16059 )
...
Signed-off-by: Michael Goin <mgoin64@gmail.com >
2025-04-04 23:31:19 +00:00
bnellnm
d6fc629f4d
[Kernel][Minor] Re-fuse triton moe weight application ( #16071 )
...
Signed-off-by: Bill Nell <bnell@redhat.com >
2025-04-04 23:27:34 +00:00
Roger Wang
af51d80fa1
Revert "[V1] Scatter and gather placeholders in the model runner" ( #16075 )
2025-04-04 14:50:57 -07:00
Cyrus Leung
f5722a5052
[V1] Scatter and gather placeholders in the model runner ( #15712 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: Roger Wang <ywang@roblox.com >
Co-authored-by: mgoin <mgoin64@gmail.com >
Co-authored-by: Roger Wang <ywang@roblox.com >
2025-04-04 21:26:44 +00:00
Nick Hill
651cf0fec1
[V1] DP scale-out (1/N): Use zmq ROUTER/DEALER sockets for input queue ( #15906 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-04-04 12:56:43 -07:00
Kevin H. Luu
4dc52e1c53
[CI] Reorganize .buildkite directory ( #16001 )
...
Signed-off-by: kevin <kevin@anyscale.com >
2025-04-04 12:16:20 -07:00
Michael Goin
4708f13a9c
[Bugfix] Fix default behavior/fallback for pp in v1 ( #16057 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-04-04 17:58:08 +00:00
Gregory Shtrasberg
a6d042df0a
[ROCm][Bugfix] Bring back fallback to eager mode removed in #14917 , but for ROCm only ( #15413 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
2025-04-04 09:40:37 -07:00
Gregory Shtrasberg
40a36ccfeb
[ROCm][Bugfix] Use platform specific FP8 dtype ( #15717 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
2025-04-04 09:40:20 -07:00
Ilya Markov
ef608c37a7
[Distributed] [ROCM] Fix custom allreduce enable checks ( #16010 )
...
Signed-off-by: ilmarkov <imarkov@redhat.com >
Co-authored-by: ilmarkov <imarkov@redhat.com >
2025-04-04 09:39:08 -07:00
Li, Jiang
2386803f2a
[CPU] Change default block_size for CPU backend ( #16002 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com >
2025-04-04 09:39:05 -07:00
Ziji Shi (Steven)
95862f7b4d
[Benchmark][Doc] Update throughput benchmark and README ( #15998 )
...
Signed-off-by: StevenShi-23 <shi.ziji.sm@gmail.com >
Signed-off-by: Roger Wang <ywang@roblox.com >
Co-authored-by: Roger Wang <ywang@roblox.com >
2025-04-04 09:39:02 -07:00
Isotr0py
230b131b54
[Bugfix][kernels] Fix half2float conversion in gguf kernels ( #15995 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-04-04 09:38:58 -07:00
liuzhenwei
0812d8dd41
[Hardware][Gaudi][BugFix] fix arguments of hpu fused moe ( #15945 )
...
Signed-off-by: zhenwei <zhenweiliu@habana.ai >
2025-04-04 09:38:55 -07:00
Jonghyun Choe
bf7e3c51ae
[Model] use AutoWeightsLoader for baichuan, gpt-neox, mpt ( #15939 )
...
Signed-off-by: Jonghyun Choe <andy.choe729@gmail.com >
2025-04-04 09:38:52 -07:00
Mark McLoughlin
a35a8a8392
[V1][Spec Decode] Avoid logging useless nan metrics ( #16023 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-04-04 08:52:41 -07:00
yihong
4ef0bb1fcf
doc: add info for macos clang errors ( #16049 )
...
Signed-off-by: yihong0618 <zouzou0208@gmail.com >
2025-04-04 14:58:16 +00:00
Chengji Yao
fadc59c0e6
[TPU][V1] Remove ragged attention kernel parameter hard coding ( #16041 )
...
Signed-off-by: Chengji Yao <chengjiyao@google.com >
2025-04-04 07:48:50 -04:00
Reid
86cbd2eee9
[Misc] improve gguf check ( #15974 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-04-04 01:33:36 +00:00
Huy Do
092475f738
[ROCm] Tweak the benchmark script to run on ROCm ( #14252 )
2025-04-03 17:12:48 -07:00
bnellnm
dcc56d62da
[Bugfix] Fix function names in test_block_fp8.py ( #16033 )
...
Signed-off-by: Bill Nell <bnell@redhat.com >
2025-04-03 23:01:34 +00:00
Robert Shaw
f15e70d906
[TPU] Switch Test to Non-Sliding Window ( #15981 )
...
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
2025-04-03 14:28:45 -07:00
iefgnoix
b6be6f8d1e
[TPU] Support sliding window and logit soft capping in the paged attention kernel for TPU. ( #15732 )
...
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com >
2025-04-03 14:23:28 -07:00
Alexei-V-Ivanov-AMD
03a70eacaf
Re-enable the AMD Testing for the passing tests. ( #15586 )
...
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com >
2025-04-03 11:05:17 -07:00
yarongmu-google
45b1ff7a25
[Misc][Performance] Advance tpu.txt to the most recent nightly torch … ( #16024 )
2025-04-03 17:32:54 +00:00
bnellnm
15ba07ef25
[Minor] Fused experts refactor ( #15914 )
...
Signed-off-by: Bill Nell <bnell@redhat.com >
2025-04-03 10:19:38 -07:00
Liangfu Chen
d2b58ca203
[Neuron][kernel] Fuse kv cache into a single tensor ( #15911 )
...
Signed-off-by: Liangfu Chen <liangfc@amazon.com >
2025-04-03 09:51:32 -07:00
Kyle Sayers
82e7e19a6e
[SupportsQuant] Chameleon, Chatglm, Commandr ( #15952 )
...
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com >
2025-04-03 08:25:22 -07:00
Kyle Sayers
421c462948
[SupportsQuant] Bert, Blip, Blip2, Bloom ( #15573 )
...
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com >
2025-04-03 08:23:19 -07:00
yihong
84884cd9ac
fix: tiny fix make format.sh excutable ( #16015 )
...
Signed-off-by: yihong0618 <zouzou0208@gmail.com >
2025-04-03 15:18:05 +00:00
Reid
a43aa183dc
[doc] update contribution link ( #15922 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-04-03 10:47:31 +00:00
wwl2755
463bbb1835
[Bugfix][V1] Fix bug from putting llm_engine.model_executor in a background process ( #15367 )
...
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com >
2025-04-03 07:32:10 +00:00
youkaichao
5e125e74d1
[misc] improve error message for "Failed to infer device type" ( #15994 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com >
2025-04-03 14:45:03 +08:00
Ziji Shi (Steven)
06f21ce7a5
[Benchmark] Add AIMO Dataset to Benchmark ( #15955 )
...
Signed-off-by: Ziji Shi <shi.ziji.sm@gmail.com >
Signed-off-by: StevenShi-23 <shi.ziji.sm@gmail.com >
2025-04-03 06:09:18 +00:00
Aleksandr Malyshev
57a810db9c
[ROCM][V0] PA kennel selection when no sliding window provided ( #15982 )
...
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com >
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com >
2025-04-03 05:28:44 +00:00
youkaichao
8b664706aa
[bugfix] add seed in torchrun_example.py ( #15980 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com >
2025-04-03 12:25:01 +08:00
yihong
37bfee92bf
fix: better error message for get_config close #13889 ( #15943 )
...
Signed-off-by: yihong0618 <zouzou0208@gmail.com >
2025-04-03 03:53:19 +00:00
Aleksandr Malyshev
e73ff24e31
[ROCM][KERNEL] Paged attention for V1 ( #15720 )
...
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com >
Signed-off-by: root <root@banff-cyxtera-s65-4.amd.com >
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com >
Co-authored-by: root <root@banff-cyxtera-s65-4.amd.com >
2025-04-02 19:48:00 -07:00
Nicolò Lucchesi
bd7599d34a
[V1][TPU] Do not compile sampling more than needed ( #15883 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-04-03 01:36:01 +00:00
Chengji Yao
01b6113659
[TPU] optimize the all-reduce performance ( #15903 )
...
Signed-off-by: Chengji Yao <chengjiyao@google.com >
2025-04-03 00:25:14 +00:00
Hyesoo Yang
1b84eff03a
[V1][TPU] TPU-optimized top-p implementation (avoids scattering). ( #15736 )
...
Signed-off-by: Hyesoo Yang <hyeygit@gmail.com >
Co-authored-by: root <root@t1v-n-822696b7-w-0.us-central2-b .c.tpu-prod-env-large-adhoc.internal>
2025-04-02 17:18:08 -07:00
Harry Mellor
55acf86bf8
Fix huggingface-cli[hf-xet] -> huggingface-cli[hf_xet] ( #15969 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-02 23:37:30 +00:00
Michael Goin
f021b97993
[V1] Support Mistral3 in V1 ( #15950 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-04-02 15:36:24 -07:00
youkaichao
1cab43c2d2
[misc] instruct pytorch to use nvml-based cuda check ( #15951 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com >
2025-04-03 01:02:58 +08:00
Nishidha
8bd651b318
Restricted cmake to be less than version 4 as 4.x breaks the build of… ( #15859 )
...
Signed-off-by: Nishidha Panpaliya <nishidha.panpaliya@partner.ibm.com >
2025-04-02 16:19:39 +00:00
Jee Jee Li
58e234a754
[Misc] V1 LoRA support CPU offload ( #15843 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-04-02 23:04:43 +08:00
rongfu.leng
e86c414d6a
[Model] use AutoWeightsLoader in model load_weights ( #15770 )
...
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io >
2025-04-02 07:47:31 -07:00
Li, Jiang
550b2801ad
[CPU][Bugfix] Using custom allreduce for CPU backend ( #15934 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com >
2025-04-02 07:46:47 -07:00
Matthias Matt
cefb9e5a28
[Frontend] Implement Tool Calling with tool_choice='required' ( #13483 )
...
Signed-off-by: Liangfu Chen <liangfc@amazon.com >
Signed-off-by: Matt, Matthias <matthias.matt@tuwien.ac.at >
Co-authored-by: Liangfu Chen <liangfc@amazon.com >
Co-authored-by: mgoin <michael@neuralmagic.com >
2025-04-02 07:45:45 -07:00
Mark McLoughlin
98d7367b61
[Metrics] Hide deprecated metrics ( #15458 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-04-02 07:37:19 -07:00
Chauncey
594a8b9030
[Bugfix] Fix the issue where the model name is empty string, causing no response with the model name. ( #15938 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-04-02 06:33:52 -07:00
Kay Yan
44f990515b
[CI] Remove duplicate entrypoints-test ( #15940 )
...
Signed-off-by: Kay Yan <kay.yan@daocloud.io >
2025-04-02 02:44:01 -07:00
Brayden Zhong
252937806c
[Bugfix][Benchmarks] Ensure async_request_deepspeed_mii uses the OpenAI choices key ( #15926 )
...
Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca >
2025-04-02 02:19:35 -07:00
Harry Mellor
51826d51fa
Add minimum version for huggingface_hub to enable Xet downloads ( #15873 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-02 02:03:36 -07:00
Russell Bryant
14e53ed11f
[V1] Fix json_object support with xgrammar ( #15488 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-04-02 02:00:08 -07:00
Eric Tang
ddb94c2605
[core] Add tags parameter to wake_up() ( #15500 )
...
Signed-off-by: Eric <erictang000@gmail.com >
2025-04-02 01:59:27 -07:00
LukasBluebaum
90969fb39a
[Kernel] Add more dtype support for GGUF dequantization ( #15879 )
...
Signed-off-by: lukas.bluebaum <lukas.bluebaum@aleph-alpha.com >
2025-04-02 01:58:48 -07:00
Chris Thi
101f1481f9
[Build/CI] Update lm-eval to 0.4.8 ( #15912 )
...
Signed-off-by: Chris Thi <chris.c.thi@gmail.com >
2025-04-02 01:47:57 -07:00
Thien Tran
2edc87b161
[Bugfix] Fix cache block size calculation for CPU MLA ( #15848 )
...
Signed-off-by: Thien Tran <gau.nernst@yahoo.com.sg >
2025-04-02 01:45:02 -07:00
Jee Jee Li
4203926f10
[CI/Build] Further clean up LoRA tests ( #15920 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-04-02 01:39:09 -07:00
Chauncey
cdb57015a7
[Misc] Replace print with logger ( #15923 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-04-02 01:37:38 -07:00
Li Wang
aa557e6422
[Benchmark]Fix error message ( #15866 )
...
Signed-off-by: wangli <wangli858794774@gmail.com >
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com >
2025-04-02 01:32:24 -07:00
Roger Wang
0e00d40e4f
[V1][Bugfix] Fix typo in MoE TPU checking ( #15927 )
...
Signed-off-by: Roger Wang <ywang@roblox.com >
2025-04-01 23:46:42 -07:00
chun
c920e01242
[Doc] Update rocm.inc.md ( #15917 )
...
Signed-off-by: chun37 <chun.jb.37@gmail.com >
2025-04-01 23:38:26 -07:00
Woosuk Kwon
274d8e8818
[V1][Minor] Enhance SpecDecoding Metrics Log in V1 ( #15902 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-04-01 23:38:02 -07:00
Thien Tran
2039c6305b
[Bugfix] Fix imports for MoE on CPU ( #15841 )
...
Signed-off-by: Thien Tran <gau.nernst@yahoo.com.sg >
2025-04-02 03:33:55 +00:00
Brayden Zhong
6efb195a6e
[V1] Fix: make sure k_index is int64 for apply_top_k_only ( #15907 )
...
Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca >
2025-04-01 19:06:44 -07:00
Ekagra Ranjan
24b7fb455a
[Spec Decode] Fix input triton kernel for eagle ( #15909 )
2025-04-01 18:15:14 -07:00
Simon Mo
58f5a59769
[Docs] Add Intel as Sponsor ( #15913 )
...
Signed-off-by: simon-mo <simon.mo@hey.com >
2025-04-01 17:16:55 -07:00
Simon Mo
db9dfcfa6a
[Docs] Add Ollama meetup slides ( #15905 )
...
Signed-off-by: simon-mo <simon.mo@hey.com >
2025-04-01 13:58:59 -07:00
Gerald
9ef98d527e
[Model][MiniMaxText01] Support MiniMaxText01 model inference ( #13454 )
...
Signed-off-by: qscqesze <475517977@qq.com >
Co-authored-by: qingjun <qingjun@minimaxi.com >
Co-authored-by: qscqesze <475517977@qq.com >
2025-04-01 16:23:55 -04:00
yihong
93491aefc7
[BugFix] make sure socket close ( #15875 )
...
Signed-off-by: yihong0618 <zouzou0208@gmail.com >
2025-04-01 13:10:24 -07:00
Simon Mo
7acd539cd7
[Docs] update usage stats language ( #15898 )
...
Signed-off-by: simon-mo <simon.mo@hey.com >
2025-04-01 12:54:13 -07:00
Woosuk Kwon
e75a6301bd
[V1][Spec Decode] Implement Eagle Proposer [1/N] ( #15729 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-04-01 12:33:16 -07:00
Mark McLoughlin
a79cc68b3a
[V1][Metrics] Initial speculative decoding metrics ( #15151 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-04-01 10:45:04 -07:00
Roger Wang
7e3f7a4ee7
[CI] Disable flaky structure decoding test temporarily. ( #15892 )
...
Signed-off-by: Roger Wang <ywang@roblox.com >
2025-04-01 17:42:34 +00:00
cloud11665
9ec8257914
[Model] Add module name prefixes to gemma3 ( #15889 )
...
Signed-off-by: Bartholomew Sabat <bartek@recursal.ai >
Co-authored-by: Bartholomew Sabat <bartek@recursal.ai >
2025-04-01 10:13:40 -07:00
Jennifer Zhao
38327cf454
[Model] Aya Vision ( #15441 )
...
Signed-off-by: Jennifer Zhao <ai.jenniferzhao@gmail.com >
Signed-off-by: Roger Wang <ywang@roblox.com >
Co-authored-by: Roger Wang <ywang@roblox.com >
2025-04-01 16:30:43 +00:00
Jee Jee Li
dfa82e2a3d
[CI/Build] Clean up LoRA tests ( #15867 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-04-01 16:28:50 +00:00
bnellnm
e59ca942f5
Add option to use DeepGemm contiguous grouped gemm kernel for fused MoE operations. ( #13932 )
...
Signed-off-by: Bill Nell <bnell@redhat.com >
2025-04-01 12:07:43 -04:00
Gregory Shtrasberg
a57a3044aa
[ROCm][Build][Bugfix] Bring the base dockerfile in sync with the ROCm fork ( #15820 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
2025-04-01 08:56:39 -07:00
Isotr0py
4e5a0f6ae2
[Misc] Allow using OpenCV as video IO fallback ( #15055 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-01 15:55:13 +00:00
Harry Mellor
b63bd14999
Reinstate format.sh and make pre-commit installation simpler ( #15890 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-01 15:41:30 +00:00
chaow-amd
2041c0e360
[Doc] Quark quantization documentation ( #15861 )
...
Signed-off-by: chaow <chaow@amd.com >
2025-04-01 08:32:45 -07:00
wang.yuqi
085cbc4f9f
[New Model]: jinaai/jina-reranker-v2-base-multilingual ( #15876 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-04-01 08:32:26 -07:00
Harry Mellor
2b93162fb0
Remove format.sh as it's been unsupported >70 days ( #15884 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-01 22:27:46 +08:00
Reid
2e45bd29fe
[Misc] remove unused script ( #15746 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-04-01 13:58:05 +00:00
Michael Goin
51d7c6a2b2
[Model] Support Mistral3 in the HF Transformers format ( #15505 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-04-01 06:10:05 -07:00
Yang Chen
f3aca1ee30
setup correct nvcc version with CUDA_HOME ( #15725 )
...
Signed-off-by: Yang Chen <yangche@fb.com >
2025-04-01 06:09:40 -07:00
Rui Qiao
8dd41d6bcc
[Misc] Use envs.VLLM_USE_RAY_COMPILED_DAG_CHANNEL_TYPE ( #15831 )
...
Signed-off-by: Rui Qiao <ruisearch42@gmail.com >
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com >
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-04-01 06:07:53 -07:00
Isotr0py
0a298ea418
[Bugfix] Fix no video/image profiling edge case for MultiModalDataParser ( #15828 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-04-01 18:17:11 +08:00
Harry Mellor
d330558bab
[Docs] Fix small error in link text ( #15868 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-04-01 10:05:14 +00:00
shangmingc
656fd72976
[Misc] Fix speculative config repr string ( #15860 )
...
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com >
2025-04-01 02:26:22 -07:00
Varun Sundar Rabindranath
79455cf421
[Misc] Enable V1 LoRA by default ( #15320 )
...
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com >
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com >
2025-04-01 16:53:56 +08:00
Wei Zeng
30d6a015e0
[Feature] specify model in config.yaml ( #15798 )
...
Signed-off-by: weizeng <weizeng@roblox.com >
2025-04-01 01:20:06 -07:00
yihong
8af5a5c4e5
fix: can not use uv run collect_env close #13888 ( #15792 )
...
Signed-off-by: yihong0618 <zouzou0208@gmail.com >
2025-04-01 07:45:49 +00:00
Chen Zhang
3a5f0afcd2
[V1] Implement sliding window attention in kv_cache_manager ( #14097 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
2025-04-01 00:33:17 -07:00
Gregory Shtrasberg
c7e63aa4d8
[ROCm] Use device name in the warning ( #15838 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
2025-04-01 00:10:48 -07:00
Lionel Villard
4a9ce1784c
[sleep mode] clear pytorch cache after sleep ( #15248 )
...
Signed-off-by: <villard@us.ibm.com >
2025-03-31 22:58:58 -07:00
Alexander Matveev
7e4e709b43
[V1] TPU - Fix fused MOE ( #15834 )
...
Signed-off-by: Alexander Matveev <amatveev@redhat.com >
2025-03-31 22:58:07 -07:00
Alexey Kiryushin
63d8eabed0
[Bugfix]: Fix is_embedding_layer condition in VocabParallelEmbedding ( #15824 )
...
Signed-off-by: alexwl <alexey.a.kiryushin@gmail.com >
2025-03-31 22:57:59 -07:00
Percy
e830b01383
[Bugfix] Fix extra comma ( #15851 )
...
Signed-off-by: haochengxia <xhc_1007@163.com >
2025-03-31 22:57:28 -07:00
Yan Ma
ff6473980d
[Bugfix][Model] fix mllama multi-image ( #14883 )
...
Signed-off-by: yan ma <yan.ma@intel.com >
2025-03-31 22:53:37 -07:00
Kinfey
a164aea35d
[Frontend] Add Phi-4-mini function calling support ( #14886 )
...
Signed-off-by: Kinfey <kinfeylo@microsoft.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-03-31 22:50:05 -07:00
Harry Mellor
a76f547e11
Rename fallback model and refactor supported models section ( #15829 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-03-31 22:49:41 -07:00
Ilya Markov
b7b7676d67
[Distributed] Add custom allreduce support for ROCM ( #14125 )
...
Signed-off-by: ilmarkov <imarkov@redhat.com >
Co-authored-by: ilmarkov <imarkov@redhat.com >
2025-03-31 22:49:12 -07:00
Harry Mellor
e6e3c55ef2
Move dockerfiles into their own directory ( #14549 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-03-31 13:47:32 -07:00
Mark McLoughlin
f98a4920f9
[V1][Core] Remove unused speculative config from scheduler ( #15818 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-03-31 19:15:21 +00:00
Harry Mellor
d4bfc23ef0
Fix Transformers backend compatibility check ( #15290 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-03-31 10:27:07 -07:00
Alexander Matveev
9a2160fa55
[V1] TPU CI - Add basic perf regression test ( #15414 )
...
Signed-off-by: Alexander Matveev <amatveev@redhat.com >
2025-03-31 13:25:20 -04:00
yihong
2de4118243
fix: change GB to GiB in logging close #14979 ( #15807 )
...
Signed-off-by: yihong0618 <zouzou0208@gmail.com >
2025-03-31 10:00:50 -07:00
shangmingc
239b7befdd
[V1][Spec Decode] Remove deprecated spec decode config params ( #15466 )
...
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com >
2025-03-31 09:19:35 -07:00
Cyrus Leung
09e974d483
[Bugfix] Check dimensions of multimodal embeddings in V1 ( #15816 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-03-31 09:01:35 -07:00
Harry Mellor
e5ef4fa99a
Upgrade transformers to v4.50.3 ( #13905 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-03-31 08:59:37 -07:00
Mrm
037bcd942c
[Bugfix] Fix missing return value in load_weights method of adapters.py ( #15542 )
...
Signed-off-by: noc-turne <2270929247@qq.com >
2025-03-31 06:56:42 -07:00
Alex Brooks
c2e7507ad4
[Bugfix] Fix Crashing When Loading Modules With Batchnorm Stats ( #15813 )
...
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com >
2025-03-31 13:23:53 +00:00
Naveassaf
3aa2b6a637
[Model] Update support for NemotronNAS models ( #15008 )
...
Signed-off-by: Nave Assaf <nassaf@nvidia.com >
2025-03-31 20:35:14 +08:00
youkaichao
555aa21905
[V1] Fully Transparent Implementation of CPU Offloading ( #15354 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com >
2025-03-31 20:22:34 +08:00
yihong
e7ae3bf3d6
fix: better install requirement for install in setup.py ( #15796 )
...
Signed-off-by: yihong0618 <zouzou0208@gmail.com >
2025-03-31 05:13:32 -07:00
Harry Mellor
b932c048ac
Recommend developing with Python 3.12 in developer guide ( #15811 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-03-31 11:54:49 +00:00
Charlie Fu
e85829450d
[Feature][ROCm]Enable fusion pass for torch.compile on ROCm ( #15050 )
...
Signed-off-by: charlifu <charlifu@amd.com >
2025-03-31 04:42:18 -07:00
Jennifer Zhao
effc5d24fa
[Benchmark] Update Vision Arena Dataset and HuggingFaceDataset Setup ( #15748 )
...
Signed-off-by: Jennifer Zhao <ai.jenniferzhao@gmail.com >
2025-03-31 15:38:58 +08:00
Chengyang LIU
18ed3132d2
[Misc] update the comments ( #15780 )
...
Signed-off-by: chengyang liu <lcy4869@gmail.com >
Co-authored-by: chengyang liu <lcy4869@gmail.com >
2025-03-30 19:39:56 -07:00
Woosuk Kwon
9b459eca88
[V1][Scheduler] Avoid calling _try_schedule_encoder_inputs for every request ( #15778 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-03-30 14:10:42 -07:00
yihong
70fedd0f79
fix: Comments to English for better dev experience ( #15768 )
...
Signed-off-by: yihong0618 <zouzou0208@gmail.com >
2025-03-30 10:47:57 -07:00
kYLe
bb103b29bf
[Bugfix] Added embed_is_patch mask for fuyu model ( #15731 )
...
Signed-off-by: Kyle Huang <kylhuang@nvidia.com >
2025-03-30 03:45:08 -07:00
yihong
248e76c4df
fix: lint fix a ruff checkout syntax error ( #15767 )
...
Signed-off-by: yihong0618 <zouzou0208@gmail.com >
2025-03-30 03:36:02 -07:00
Cyrus Leung
803d5c35f3
[V1] Override mm_counts for dummy data creation ( #15703 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-03-30 03:20:42 -07:00
pansicheng
7fd8c0f85c
fix test_phi3v ( #15321 )
...
Signed-off-by: pansicheng <sicheng.pan.chn@gmail.com >
2025-03-30 02:01:34 -07:00
Reid
44c3a5abc3
[doc] update conda to usage link in installation ( #15761 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-03-30 08:12:13 +00:00
Julien Denize
6909a76201
[Bugfix] Fix Mistral guided generation using xgrammar ( #15704 )
...
Signed-off-by: Julien Denize <julien.denize@mistral.ai >
2025-03-29 20:20:19 -07:00
Chauncey
045533716b
[CI] xgrammar structured output supports Enum. ( #15757 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-03-29 20:20:02 -07:00
Isotr0py
3c0ff914ac
[Bugfix] Fix Mllama interleaved images input support ( #15564 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
Co-authored-by: Chen Zhang <zhangch99@outlook.com >
2025-03-29 18:11:15 +00:00
Woosuk Kwon
2bc4be4e32
[V1][Minor] Simplify rejection sampler's parse_output ( #15741 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-03-29 09:25:17 -07:00
Roger Wang
c67abd614f
[V1] Support interleaved modality items ( #15605 )
...
Signed-off-by: Roger Wang <ywang@roblox.com >
2025-03-29 06:30:09 -07:00
shangmingc
6fa7cd3dbc
[Feature][Disaggregated] Support XpYd disaggregated prefill with MooncakeStore ( #12957 )
...
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com >
2025-03-29 04:01:46 -07:00
wwl2755
94744ba41a
[V1] [Feature] Collective RPC ( #15444 )
...
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com >
2025-03-29 03:39:14 -07:00
TJian
4965ec42d2
[FEAT] [ROCm] Add AITER int8 scaled gemm kernel ( #15433 )
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
2025-03-29 03:33:56 -07:00
Reid
73aa7041bf
[doc] update doc ( #15740 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-03-29 04:27:22 +00:00
yarongmu-google
7c1f760024
[Kernel][TPU][ragged-paged-attn] vLLM code change for PR#8896 ( #15659 )
...
Signed-off-by: Yarong Mu <ymu@google.com >
2025-03-28 21:13:15 -07:00
Nicolò Lucchesi
da461f3cbf
[TPU][V1][Bugfix] Fix w8a8 recompiilation with GSM8K ( #15714 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-03-28 21:13:06 -07:00
Jinzhen Lin
5b800f0932
[Bugfix] set VLLM_WORKER_MULTIPROC_METHOD=spawn for vllm.entrypoionts.openai.api_server ( #15700 )
...
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com >
2025-03-28 21:12:26 -07:00
cyyever
8427f70493
Use numba 0.61 for python 3.10+ to support numpy>=2 ( #15692 )
...
Signed-off-by: cyy <cyyever@outlook.com >
2025-03-29 12:11:51 +08:00
Russell Bryant
7a7992085b
[CI] Speed up V1 structured output tests ( #15718 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-03-28 21:10:45 -07:00
Varun Sundar Rabindranath
1286211f57
[Bugfix] LoRA V1: add and fix entrypoints tests ( #15715 )
...
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com >
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com >
2025-03-28 21:10:41 -07:00
Nick Hill
6d531ad7b8
[Misc][V1] Misc code streamlining ( #15723 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-03-28 20:59:47 -07:00
Ce Gao
762b424a52
[Docs] Document v0 engine support in reasoning outputs ( #15739 )
...
Signed-off-by: Ce Gao <cegao@tensorchord.ai >
2025-03-29 03:46:57 +00:00
pengyuange
de1cb38769
[Model] Support Skywork-R1V ( #15397 )
...
Signed-off-by: jiacai.liu <932997367@qq.com >
Co-authored-by: jiacai.liu <932997367@qq.com >
2025-03-28 20:39:21 -07:00
Gregory Shtrasberg
c802f5430d
[ROCm][AMD][Build] Update AMD supported arch list ( #15632 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
2025-03-28 20:39:18 -07:00
simpx
cff8991a50
[Docs][V1] Optimize diagrams in prefix caching design ( #15716 )
2025-03-29 03:33:58 +00:00
daniel-salib
f3f8d8fff4
implement prometheus fast-api-instrumentor for http service metrics ( #15657 )
2025-03-29 00:12:02 +00:00
Reid
26df46ee59
[Misc] cli auto show default value ( #15582 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
2025-03-28 22:23:00 +00:00
Alexander Matveev
c3f687ac22
[V1] TPU - Fix the chunked prompt bug ( #15713 )
...
Signed-off-by: Alexander Matveev <amatveev@redhat.com >
2025-03-28 20:19:04 +00:00
Luka Govedič
04437e313d
[Bugfix] [torch.compile] Add Dynamo metrics context during compilation ( #15639 )
...
Signed-off-by: luka <luka@neuralmagic.com >
2025-03-28 14:01:09 -06:00
Robert Shaw
038bededba
[TPU] [Perf] Improve Memory Usage Estimation ( #15671 )
...
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
2025-03-28 17:37:52 +00:00
shangmingc
d03308be0c
[Misc] Remove stale func in KVTransferConfig ( #14746 )
...
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com >
2025-03-28 17:33:32 +00:00
Cyrus Leung
c6bc0034d0
[Misc] Remove unused utils and clean up imports ( #15708 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-03-28 09:41:16 -07:00
Woosuk Kwon
70e132244a
[Minor] Remove TGI launching script ( #15646 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-03-28 09:30:08 -07:00
Michael Goin
47e9038d23
Fix cpu offload testing for gptq/awq/ct ( #15648 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-03-29 00:29:32 +08:00
Kebe
432cf22a6a
[Bugfix] Fix regex compile display format ( #15368 )
...
Signed-off-by: Kebe <mail@kebe7jun.com >
2025-03-28 08:58:44 -07:00
Reid
2914006fe0
[doc] add missing imports ( #15699 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-03-28 15:56:48 +00:00
Russell Bryant
7329ff5468
[V1] Support disable_any_whtespace for guidance backend ( #15584 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-03-28 23:46:45 +08:00
Cyrus Leung
541d1df486
[Bugfix] embed_is_patch for Idefics3 ( #15696 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-03-28 08:27:52 -07:00
Chauncey
3b00ff9138
[Bugfix][v1] xgrammar structured output supports Enum. ( #15594 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-03-28 06:14:53 -07:00
Jee Jee Li
91276c5721
[Model] Adding torch compile annotations to chatglm ( #15624 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-03-28 21:14:09 +08:00
Harry Mellor
0b4167526d
[Docs] Add "Generation quality changed" section to troubleshooting ( #15701 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-03-28 13:03:21 +00:00
Reid
fd5fd26902
[Frontend] update priority for --api-key and VLLM_API_KEY ( #15588 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-03-28 19:40:12 +08:00
Ce Gao
3bbaacbe15
[Bugfix][Frontend] Eliminate regex based check in reasoning full generator ( #14821 )
...
Signed-off-by: Ce Gao <cegao@tensorchord.ai >
2025-03-28 11:20:35 +00:00
Lize Cai
a10314c6b3
[Misc] Fix test_sleep to use query parameters ( #14373 )
...
Signed-off-by: Lize Cai <lize.cai@sap.com >
Signed-off-by: youkaichao <youkaichao@gmail.com >
Co-authored-by: youkaichao <youkaichao@gmail.com >
2025-03-28 18:00:14 +08:00
Jee Jee Li
70f2c2a709
[Bugfix] Fix 'InductorAdaptor object has no attribute 'cache_dir' ( #15674 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-03-28 17:10:40 +08:00
Li, Jiang
280d074103
[CPU][CI] Improve CPU Dockerfile ( #15690 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com >
2025-03-28 01:36:31 -07:00
Ce Gao
32b14baf8a
[Refactor][Frontend] Keep all logic about reasoning into one class ( #14428 )
...
Signed-off-by: Ce Gao <cegao@tensorchord.ai >
2025-03-28 00:23:30 -07:00
Robert Shaw
2d9045fce8
[TPU][CI] Fix TPUModelRunner Test ( #15667 )
...
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
2025-03-28 00:01:26 -07:00
Cyrus Leung
355f66348c
[V1] Remove legacy input registry ( #15673 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-03-27 23:34:34 -07:00
Cyrus Leung
8693e47e6a
[Bugfix] Fix mm_hashes forgetting to be passed ( #15668 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-03-28 05:51:05 +00:00
Jason (Siyu) Zhu
cec8c7d7f8
Refactor error handling for multiple exceptions in preprocessing ( #15650 )
...
Signed-off-by: JasonZhu1313 <jasonchu13@outlook.com >
2025-03-28 03:27:20 +00:00
Gregory Shtrasberg
4d0ec37267
[Quantization][FP8] Adding support for fp8 gemm layer input in fp8 ( #14578 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2025-03-28 02:58:16 +00:00
Chen Xia
e7f720ea56
[Misc]add coding benchmark for speculative decoding ( #15303 )
...
Signed-off-by: CXIAAAAA <cxia0209@gmail.com >
2025-03-28 10:47:05 +08:00
Wes
4ae17bf1e2
Revert "Use Cache Hinting for fused_moe kernel ( #15511 )" ( #15645 )
...
Signed-off-by: Wes Medford <wryanmedford@gmail.com >
2025-03-27 19:45:55 -07:00
Robert Shaw
8a49eea74b
[CI][TPU] Temporarily Disable Quant Test on TPU ( #15649 )
...
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com >
2025-03-27 19:45:05 -07:00
wwl2755
b4245a48df
[Doc] Fix dead links in Job Board ( #15637 )
...
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com >
2025-03-28 02:43:40 +00:00
Kebe
4e0f6076be
[Bugfix] Fix failure to launch in Tensor Parallel TP mode on macOS. ( #14948 )
...
Signed-off-by: Kebe <mail@kebe7jun.com >
Signed-off-by: youkaichao <youkaichao@gmail.com >
Co-authored-by: youkaichao <youkaichao@gmail.com >
2025-03-28 10:13:41 +08:00
Jee Jee Li
726efc6a32
[Quantization][V1] BitsAndBytes support V1 ( #15611 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-03-28 10:12:47 +08:00
Robert Shaw
bd45912b99
[TPU] Lazy Import ( #15656 )
...
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com >
2025-03-28 09:57:01 +08:00
Nick Hill
15dac210f0
[V1] AsyncLLM data parallel ( #13923 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-03-27 16:14:41 -07:00
Russell Bryant
112b3e5b3b
[CI] Update rules for applying tpu label. ( #15634 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-03-27 22:15:26 +00:00
cnorman
32d669275b
Correct PowerPC to modern IBM Power ( #15635 )
...
Signed-off-by: Christy Norman <christy@linux.vnet.ibm.com >
2025-03-27 15:04:32 -07:00
Nicolò Lucchesi
4098b72210
[Bugfix][TPU][V1] Fix recompilation ( #15553 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-03-27 19:15:06 +00:00
Harry Mellor
46450b8d33
Use absolute placement for Ask AI button ( #15628 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-03-27 18:52:18 +00:00
Cyrus Leung
13ac9cab21
[Misc] Avoid direct access of global mm_registry in compute_encoder_budget ( #15621 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-03-27 17:52:00 +00:00
Yuan Tang
66aa4c0bf4
[Feature] Add middleware to log API Server responses ( #15593 )
...
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com >
2025-03-27 17:49:38 +00:00
Cyrus Leung
247181536f
[Misc] Replace is_encoder_decoder_inputs with split_enc_dec_inputs ( #15620 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-03-27 17:36:32 +00:00
Cyrus Leung
07bf813fb5
[Doc] Link to onboarding tasks ( #15629 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-03-27 16:30:53 +00:00
Hiroaki Sugiyama
8958217ad5
[Bugfix] Fix use_cascade_attention handling for Alibi-based models on vllm/v1 ( #15211 )
...
Signed-off-by: h-sugi <h.sugi@ieee.org >
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-03-27 22:29:29 +08:00
Cyrus Leung
ac5bc615b0
[Model] MiniCPM-V/O supports V1 ( #15487 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-03-27 06:07:29 -07:00
Reid
8063dfc61a
[Doc] update --system for transformers installation in docker doc ( #15616 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-03-27 20:38:46 +08:00
Richard Zou
6278bc829e
Fix incorrect filenames in vllm_compile_cache.py ( #15494 )
...
Signed-off-by: <zou3519@gmail.com >
Signed-off-by: youkaichao <youkaichao@gmail.com >
Co-authored-by: youkaichao <youkaichao@gmail.com >
2025-03-27 18:33:41 +08:00
wang.yuqi
3f532cb6a6
[Misc] Use model_redirect to redirect the model name to a local folder. ( #14116 )
2025-03-27 02:21:23 -07:00
Cyrus Leung
e6c9053f9e
[Misc] Clean up scatter_patch_features ( #15559 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-03-27 07:45:00 +00:00
Robert Shaw
43ed4143c4
[Quantization] Fp8 Channelwise Dynamic Per Token GroupedGEMM ( #15587 )
...
Signed-off-by: ElizaWszola <eliza@neuralmagic.com >
Signed-off-by: ElizaWszola <ewszola@redhat.com >
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com >
Co-authored-by: ElizaWszola <eliza@neuralmagic.com >
Co-authored-by: Lucas Wilkinson <wilkinson.lucas@gmail.com >
Co-authored-by: ElizaWszola <ewszola@redhat.com >
2025-03-27 06:47:25 +00:00
Bella kira
f4c98b4d4c
[Misc] Consolidate LRUCache implementations ( #15481 )
...
Signed-off-by: Bella kira <2374035698@qq.com >
2025-03-27 06:43:43 +00:00
Robert Shaw
e1e0fd7543
[TPU] Avoid Triton Import ( #15589 )
...
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com >
2025-03-27 06:43:02 +00:00
Rui Qiao
df8d3d1287
[Misc] Restrict ray version dependency and update PP feature warning in V1 ( #15556 )
2025-03-27 06:21:07 +00:00
Chengji Yao
619d3de8bd
[TPU] [V1] fix cases when max_num_reqs is set smaller than MIN_NUM_SEQS ( #15583 )
...
Signed-off-by: Chengji Yao <chengjiyao@google.com >
2025-03-26 22:46:26 -07:00
Gregory Shtrasberg
ecff8309a3
[ROCm] Env variable to trigger custom PA ( #15557 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
2025-03-26 22:46:12 -07:00
Jerry Zhang
dcf2a590f5
Allow torchao quantization in SiglipMLP ( #15575 )
2025-03-26 22:45:51 -07:00
Cody Yu
54aa619459
[V1] Refactor num_computed_tokens logic ( #15307 )
...
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com >
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-03-27 04:54:36 +00:00
Mengqing Cao
fb22be5817
[moe][quant] add weight name case for offset ( #15515 )
...
Signed-off-by: Mengqing Cao <cmq0113@163.com >
2025-03-27 04:50:29 +00:00
Wei Zeng
7f301dd8ef
[Doc] Update V1 user guide for fp8 kv cache support ( #15585 )
...
Signed-off-by: weizeng <weizeng@roblox.com >
2025-03-26 19:39:03 -07:00
Varun Sundar Rabindranath
8095341a01
[misc] LoRA: Remove unused long context test data ( #15558 )
...
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com >
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com >
2025-03-27 10:04:51 +08:00
Chenyaaang
69db16a46a
add platform check back ( #15578 )
...
Signed-off-by: Chenyaaang <llccyy1212@gmail.com >
2025-03-27 01:50:27 +00:00
Michael Goin
ce78f9af4e
Add automatic tpu label to mergify.yml ( #15560 )
2025-03-26 21:39:58 -04:00
ElizaWszola
9239bf718e
[Kernel] CUTLASS grouped gemm fp8 MoE kernel ( #13972 )
...
Signed-off-by: ElizaWszola <eliza@neuralmagic.com >
Signed-off-by: ElizaWszola <ewszola@redhat.com >
Co-authored-by: Lucas Wilkinson <wilkinson.lucas@gmail.com >
2025-03-27 00:54:44 +00:00
Matthew Vine
7a6d45bc8a
Support FIPS enabled machines with MD5 hashing ( #15299 )
...
Signed-off-by: Matthew Vine <32849887+MattTheCuber@users.noreply.github.com >
2025-03-26 20:19:46 -04:00
Chengji Yao
e74ff409e0
[TPU] support disabling xla compilation cache ( #15567 )
...
Signed-off-by: Chengji Yao <chengjiyao@google.com >
2025-03-27 00:09:28 +00:00
Wes
7a888271f5
Use Cache Hinting for fused_moe kernel ( #15511 )
2025-03-26 23:21:34 +00:00
Alexander Matveev
9d119a86ae
[V1] TPU CI - Fix test_compilation.py ( #15570 )
...
Signed-off-by: Alexander Matveev <amatveev@redhat.com >
2025-03-26 21:51:54 +00:00
Alexander Matveev
b2e85e26f4
[V1] TPU - Revert to exponential padding by default ( #15565 )
...
Signed-off-by: Alexander Matveev <amatveev@redhat.com >
2025-03-26 21:35:05 +00:00
Alexei-V-Ivanov-AMD
dd8a29da99
Applying some fixes for K8s agents in CI ( #15493 )
...
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com >
2025-03-26 20:35:11 +00:00
marko
27df5199d9
Support SHA256 as hash function in prefix caching ( #15297 )
...
Signed-off-by: Marko Rosenmueller <5467316+dr75@users.noreply.github.com >
2025-03-26 11:11:28 -07:00
Nick Hill
35fad35a48
[V1][Sampler] Faster top-k only implementation ( #15478 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-03-26 10:56:47 -07:00
Aaron Pham
733e7c9e95
[Refactor] Remove unnecessary backend parameter in structured output interface ( #15317 )
...
Signed-off-by: Aaron Pham <contact@aarnphm.xyz >
2025-03-26 17:51:56 +00:00
Harry Mellor
0af4d764d6
Fix weight loading for some models in Transformers backend ( #15544 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-03-26 10:17:53 -07:00
youkaichao
e64afa455c
multi-node offline DP+EP example ( #15484 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com >
2025-03-26 23:54:24 +08:00
Alex Brooks
1711b929b6
[Model] Add Reasoning Parser for Granite Models ( #14202 )
...
Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com >
Co-authored-by: Joe Runde <joe@joerun.de >
2025-03-26 14:28:07 +00:00
Harry Mellor
c091c0a588
Improve validation of TP in Transformers backend ( #15540 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-03-26 07:26:48 -07:00
cyyever
1aa162e030
Apply torchfix ( #15532 )
...
Signed-off-by: cyy <cyyever@outlook.com >
2025-03-26 12:09:06 +00:00
Harry Mellor
cf5c8f1686
Separate base model from TransformersModel ( #15467 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Signed-off-by: Isotr0py <2037008807@qq.com >
Co-authored-by: Isotr0py <2037008807@qq.com >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-03-26 18:13:38 +08:00
Reid
4ec2cee000
[Misc] improve example script output ( #15528 )
...
Signed-off-by: reidliu41 <reid201711@gmail.com >
Co-authored-by: reidliu41 <reid201711@gmail.com >
2025-03-26 10:12:47 +00:00
wwl2755
99f536f830
[Misc] Enhance warning information to user-defined chat template ( #15408 )
...
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com >
2025-03-26 02:21:15 -07:00
vllmellm
5ebf66748b
[FEAT][ROCm] Integrate Fused MoE Kernels from AITER ( #14967 )
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com >
2025-03-26 16:30:30 +08:00
Bryan Lu
781d056280
[Feature] Enhance EAGLE Architecture with Proper RMS Norms ( #14990 )
...
Signed-off-by: Bryan Lu <yuzhelu@amazon.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-03-26 08:24:07 +00:00
daniel-salib
5aefd6ac31
Fix raw_request extraction in load_aware_call decorator ( #15382 )
...
Signed-off-by: Daniel Salib <danielsalib@meta.com >
2025-03-25 22:29:54 -07:00
Varun Sundar Rabindranath
6c663dfd5e
[misc] LoRA - Skip LoRA kernels when not required ( #15152 )
...
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com >
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com >
2025-03-26 11:33:45 +08:00
Lucas Wilkinson
33437bc6e7
[BugFix] Fix nightly MLA failure (FA2 + MLA chunked prefill, i.e. V1, producing bad results) ( #15492 )
...
Signed-off-by: LucasWilkinson <lwilkinson@neuralmagic.com >
2025-03-25 20:33:22 -07:00
Tyler Michael Smith
23114d3364
[Misc] Warn about v0 in benchmark_paged_attn.py ( #15495 )
...
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com >
2025-03-25 20:31:04 -07:00
Cyrus Leung
997c8811d6
[Model] Support multi-image for Molmo ( #15438 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-03-26 11:26:33 +08:00
Harry Mellor
e42389f9d7
Transformers backend already supports V1 ( #15463 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-03-25 20:26:16 -07:00
Varun Sundar Rabindranath
ff38f0a32c
[CI/Build] LoRA: Delete long context tests ( #15503 )
...
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com >
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com >
2025-03-25 17:18:34 -07:00
Varun Sundar Rabindranath
a5cfbab3c8
[Core] LoRA: V1 Scheduler optimization ( #15422 )
...
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com >
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com >
2025-03-25 22:50:09 +00:00
Chenyaaang
ac3cd6e83c
[core] add bucket padding to tpu_model_runner ( #14995 )
...
Signed-off-by: Chenyaaang <llccyy1212@gmail.com >
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com >
Co-authored-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com >
2025-03-25 17:27:22 -04:00
Lu Fang
082ab86f5f
[V1] Support long_prefill_token_threshold in v1 scheduler ( #15419 )
...
Signed-off-by: Lu Fang <lufang@fb.com >
2025-03-25 14:22:26 -07:00
Nick Hill
6aa196c8dc
[V1][Minor] Use SchedulerInterface type for engine scheduler field ( #15499 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-03-25 14:21:36 -07:00
Nicolò Lucchesi
a0dd7dcd49
[TPU][V1] Fix Sampler recompilation ( #15309 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-03-25 16:43:54 -04:00
Maximilien de Bayser
e977c11111
Add workaround for shared field_names in pydantic model class ( #13925 )
...
Signed-off-by: Max de Bayser <mbayser@br.ibm.com >
2025-03-25 20:31:08 +00:00
Joe Runde
5f063a80bd
[bugfix] add supports_v1 platform interface ( #15417 )
...
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com >
2025-03-25 15:00:32 -04:00
Antonio Gómez
5d8e1c9279
[Bugfix] Support triton==3.3.0+git95326d9f for RTX 5090 (Unsloth + vLLM compatibility) ( #15471 )
...
Co-authored-by: ServerAI <ai@exc-mad-ai.com >
2025-03-25 17:59:25 +00:00
yarongmu-google
0a049c7d86
[CI/Build] Add tests for the V1 tpu_model_runner. ( #14843 )
...
Signed-off-by: Yarong Mu <ymu@google.com >
2025-03-25 12:27:16 -04:00
youkaichao
d0cfec7ab9
[bugfix] fix inductor cache on max_position_embeddings ( #15436 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com >
2025-03-25 07:05:39 -07:00
Szymon Ożóg
a608160027
[Kernel] Fix conflicting macro names for gguf kernels ( #15456 )
...
Signed-off-by: SzymonOzog <szymon.ozog@gmail.com >
2025-03-25 13:50:49 +00:00
Cyrus Leung
3f04a7fbf2
[Doc] Update V1 user guide for multi-modality ( #15460 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-03-25 11:01:58 +00:00
Cyrus Leung
5994430b84
[Misc] Remove redundant num_embeds ( #15443 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-03-25 18:27:57 +08:00
Cyrus Leung
a9e879b316
[Misc] Clean up MiniCPM-V/O code ( #15337 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-03-25 10:22:52 +00:00
Md. Shafi Hussain
3e2f37a69a
Dockerfile.ppc64le changes to move to UBI ( #15402 )
...
Signed-off-by: Md. Shafi Hussain <Md.Shafi.Hussain@ibm.com >
2025-03-25 10:15:14 +00:00
Thien Tran
4f044b1d67
[Kernel][CPU] CPU MLA ( #14744 )
...
Signed-off-by: Thien Tran <gau.nernst@yahoo.com.sg >
2025-03-25 09:34:59 +00:00
Siyuan Liu
4157f563b4
[Hardware][TPU][Bugfix] Fix v1 mp profiler ( #15409 )
...
Signed-off-by: Siyuan Liu <lsiyuan@google.com >
2025-03-25 01:43:00 -07:00
Lu Fang
051da7efe3
Fix CUDA kernel index data type in vllm/csrc/quantization/gptq_marlin/awq_marlin_repack.cu +10 ( #15160 )
...
Signed-off-by: Lu Fang <lufang@fb.com >
Co-authored-by: Richard Barnes <rbarnes@meta.com >
2025-03-25 15:36:45 +08:00
Woosuk Kwon
25f560a62c
[V1][Spec Decode] Update target_logits in place for rejection sampling ( #15427 )
...
Create Release / Create Release (push) Has been cancelled
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-03-24 21:04:41 -07:00
Russell Bryant
a09ad90a72
[V1] guidance backend for structured output + auto fallback mode ( #14779 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
Co-authored-by: Loc Huynh <jc1da.3011@gmail.com >
Co-authored-by: Michal Moskal <michal@moskal.me >
2025-03-24 21:02:33 -07:00
Chauncey
10b34e36b9
[Bugfix] Fixed the issue of not being able to input video and image simultaneously ( #15387 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-03-25 03:48:08 +00:00
Tyler Michael Smith
b5269db959
Revert "Fix non-contiguous input passed to Marlin kernel ( #15319 )" ( #15398 )
2025-03-24 20:43:51 -07:00
Jee Jee Li
6db94571d7
[Misc] Remove LoRA log ( #15388 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-03-24 20:43:48 -07:00
Harry Mellor
97cfa65df7
Add pipeline parallel support to TransformersModel ( #12832 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Signed-off-by: Isotr0py <2037008807@qq.com >
Co-authored-by: Isotr0py <2037008807@qq.com >
2025-03-25 10:41:45 +08:00
Woosuk Kwon
911c8eb000
[Minor][Spec Decode] Remove compiled_softmax ( #15416 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-03-24 19:09:04 -07:00
Woosuk Kwon
ebcebeeb6b
[V1][Spec Decode] Enable spec decode for top-p & top-k sampling ( #15063 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-03-24 17:16:46 -07:00
Gregory Shtrasberg
f533b5837f
[ROCm][Kernel] MoE weights padding ( #14454 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
Signed-off-by: charlifu <charlifu@amd.com >
Co-authored-by: charlifu <charlifu@amd.com >
2025-03-24 23:45:30 +00:00
Gregory Shtrasberg
8279201ce6
[Build] Cython compilation support fix ( #14296 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
2025-03-24 23:37:54 +00:00
Siyuan Liu
23fdab00a8
[Hardware][TPU] Skip failed compilation test ( #15421 )
...
Signed-off-by: Siyuan Liu <lsiyuan@google.com >
2025-03-24 23:28:57 +00:00
Nick Hill
623e2ed29f
[BugFix][V1] Quick fix for min_tokens with multiple EOS ( #15407 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-03-24 15:58:59 -07:00
Nick Hill
9d72daf4ce
[V1][Perf] Simpler request output queues ( #15156 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com >
Co-authored-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com >
2025-03-24 22:44:08 +00:00
Cyrus Leung
6dd55af6c9
[Doc] Update docs on handling OOM ( #15357 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: Roger Wang <ywang@roblox.com >
Co-authored-by: Roger Wang <ywang@roblox.com >
2025-03-24 14:29:34 -07:00
Yuan Tang
3eb08ed9b1
[DOC] Add Kubernetes deployment guide with CPUs ( #14865 )
2025-03-24 10:48:43 -07:00
liuzhenwei
5eeadc2642
[Hardware][Gaudi][Feature] Enable Dynamic MoE for Mixtral ( #12303 )
...
Signed-off-by: zhenwei <zhenweiliu@habana.ai >
2025-03-24 09:48:40 -07:00
Nick Hill
3aee6573dc
[V1] Aggregate chunked prompt logprobs in model runner ( #14875 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-03-24 12:27:57 -04:00
Yi Liu
9cc645141d
[MISC] Refine no available block debug msg ( #15076 )
...
Signed-off-by: Yi Liu <yiliu4@habana.ai >
Signed-off-by: yiliu30 <yi4.liu@intel.com >
Co-authored-by: Yi Liu <yiliu4@habana.ai >
2025-03-25 00:01:10 +08:00
Chen1022
0893567db9
[V1][Minor] fix comments ( #15392 )
...
Signed-off-by: chenjincong <chenjincong@baidu.com >
Signed-off-by: Chen-0210 <chenjincong11@gmail.com >
Co-authored-by: chenjincong <chenjincong@baidu.com >
2025-03-24 08:45:32 -07:00
Russell Bryant
8abe69b499
[Core] Don't force uppercase for VLLM_LOGGING_LEVEL ( #15306 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-03-24 08:27:30 -07:00
Manish Sethi
761702fd19
[Core] Integrate fastsafetensors loader for loading model weights ( #10647 )
...
Signed-off-by: Manish Sethi <Manish.sethi1@ibm.com >
2025-03-24 08:08:02 -07:00
youkaichao
9606d572ed
[distributed] fix dp group ( #15355 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com >
2025-03-24 14:54:27 +00:00
Cyrus Leung
cbcdf2c609
[Bugfix] Fix chat template loading ( #15143 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: Roger Wang <ywang@roblox.com >
Co-authored-by: chaunceyjiang <chaunceyjiang@gmail.com >
Co-authored-by: Roger Wang <ywang@roblox.com >
2025-03-24 13:50:09 +00:00
Russell Bryant
038de04d7b
Fix zmq IPv6 URL format error ( #15341 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-03-24 09:30:41 -04:00
Jinzhen Lin
6b3cc75be0
[Kernel] allow non-contiguous input for marlin kernel ( #14658 )
...
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com >
2025-03-24 09:21:33 -04:00
Simon Mo
7ffcccfa5c
Revert "[CI/Build] Use uv python for docker rather than ppa:deadsnakess/ppa ( #13569 )" ( #15377 )
...
Signed-off-by: simon-mo <simon.mo@hey.com >
2025-03-24 05:53:10 -07:00
sfbemerk
cc8accfd53
[Misc] Update guided decoding logs to debug ( #15310 )
...
Signed-off-by: Benjamin Merkel <benjamin.merkel@tngtech.com >
Co-authored-by: Benjamin Merkel <benjamin.merkel@tngtech.com >
2025-03-24 04:25:20 -07:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟
948ab03e7e
[Bugfix][V1] Avoid importing PreTrainedModel ( #15366 )
...
Signed-off-by: Hollow Man <hollowman@opensuse.org >
2025-03-24 10:33:12 +00:00
Rui Qiao
5797fb97e9
[Misc] Remove ignore_reinit_error for ray.init() ( #15373 )
2025-03-24 07:41:53 +00:00
Jee Jee Li
3892e58ad7
[Misc] Upgrade BNB version ( #15183 )
2025-03-24 05:51:42 +00:00
Qubitium-ModelCloud
d20e261199
Fix non-contiguous input passed to Marlin kernel ( #15319 )
2025-03-24 03:09:44 +00:00
Luka Govedič
f622dbcf39
[Fix] [torch.compile] Improve UUID system for custom passes ( #15249 )
...
Signed-off-by: luka <luka@neuralmagic.com >
2025-03-24 01:54:07 +00:00
Lucas Wilkinson
dccf535f8e
[V1] Enable V1 Fp8 cache for FA3 in the oracle ( #15191 )
...
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com >
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-03-23 15:07:04 -07:00
Roger Wang
9c5c81b0da
[Misc][Doc] Add note regarding loading generation_config by default ( #15281 )
...
Signed-off-by: Roger Wang <ywang@roblox.com >
2025-03-23 14:00:55 -07:00
Robin
d6cd59f122
[Frontend] Support tool calling and reasoning parser ( #14511 )
...
Signed-off-by: WangErXiao <863579016@qq.com >
2025-03-23 14:00:07 -07:00
Woosuk Kwon
bc8ed3c4ba
[V1][Spec Decode] Use better defaults for N-gram ( #15358 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-03-23 10:52:30 -07:00
Woosuk Kwon
b9bd76ca14
[V1][Spec Decode] Respect prompt_lookup_max ( #15348 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-03-23 10:41:44 -07:00
DefTruth
6ebaf9ac71
[Bugfix] consider related env vars for torch.compiled cache hash ( #14953 )
...
Signed-off-by: DefTruth <31974251+DefTruth@users.noreply.github.com >
2025-03-23 15:53:09 +00:00
DefTruth
f90d34b498
[Misc] Add tuned R1 w8a8 and MoE configs for NVIDIA L20 ( #15322 )
...
Signed-off-by: DefTruth <qiustudent_r@163.com >
2025-03-23 01:10:10 -07:00
youkaichao
f68cce8e64
[ci/build] fix broken tests in LLM.collective_rpc ( #15350 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com >
2025-03-23 14:49:48 +08:00
youkaichao
09b6a95551
[ci/build] update torch nightly version for GH200 ( #15135 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com >
2025-03-23 14:04:13 +08:00
shangmingc
50c9636d87
[V1][Usage] Refactor speculative decoding configuration and tests ( #14434 )
...
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com >
2025-03-22 19:28:10 -10:00
hijkzzz
0661cfef7a
Fix v1 supported oracle for worker-cls and worker-extension-cls ( #15324 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com >
Co-authored-by: youkaichao <youkaichao@gmail.com >
2025-03-23 10:23:35 +08:00
Chen Zhang
a827aa815d
[doc] Add back previous news ( #15331 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
2025-03-22 17:38:33 -07:00
Russell Bryant
b877031d80
Remove openvino support in favor of external plugin ( #15339 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-03-22 14:06:39 -07:00
Wang Ran (汪然)
dd861b992f
[BugFix][Typing] Fix Imprecise Type Annotations ( #15208 )
...
Signed-off-by: Wang Ran (汪然) <wrran@outlook.com >
2025-03-22 09:05:03 -07:00
Russell Bryant
eb63ea1e18
[V1] Add disable-any-whitespace option support for xgrammar ( #15316 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-03-22 15:56:17 +00:00
Naitong Yu
2f4bd358f1
[Model] Support Tele-FLM Model ( #15023 )
...
Signed-off-by: Naitong Yu <ntyu@baai.ac.cn >
Signed-off-by: jiangxin <horizon94@outlook.com >
Co-authored-by: Jason Fang <jasonfang3900@gmail.com >
Co-authored-by: jiangxin <horizon94@outlook.com >
2025-03-22 02:04:44 -07:00
Varun Sundar Rabindranath
8a8b30eac1
[Bugfix] LoRA V0 - Fix case where max_num_seqs is between cudagraph capture sizes ( #15308 )
...
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com >
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com >
2025-03-22 02:03:32 -07:00
Jee Jee Li
2fa0e1396b
[Bugfix] Fix torch.compile raise FileNotFoundError ( #15278 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-03-22 13:49:34 +08:00
wwl2755
1c2bec0f82
[Doc] add load_format items in docs ( #14804 )
...
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com >
2025-03-21 22:36:43 -07:00
TJian
ec870fba9a
[FEAT] [ROCm]: Add AITER RMS Norm (Layer Norm) Feature ( #14959 )
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
2025-03-21 22:36:14 -07:00
Andy Lo
df1430265c
[Bugfix][V0] Multi-sequence logprobs streaming edge case ( #15259 )
...
Signed-off-by: Andy Lo <andy@mistral.ai >
2025-03-21 22:35:37 -07:00
Rui Qiao
4c69e228b3
[Misc] Increase RayDistributedExecutor RAY_CGRAPH_get_timeout ( #15301 )
...
Signed-off-by: Rui Qiao <ruisearch42@gmail.com >
2025-03-21 22:25:43 -07:00
Russell Bryant
790b79750b
[Build/CI] Fix env var typo ( #15305 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-03-21 22:28:46 +00:00
Nicolò Lucchesi
cfbb8c930f
[TPU][V1] MHA Pallas backend ( #15288 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-03-21 08:50:39 -07:00
Cyrus Leung
baec0d4de9
Revert "[Feature] specify model in config.yaml ( #14855 )" ( #15293 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-03-21 08:30:23 -07:00
Mengqing Cao
c21b99b912
[Bugfix][VLM] fix llava processor ( #15285 )
...
Signed-off-by: Mengqing Cao <cmq0113@163.com >
2025-03-21 05:14:36 -07:00
Chen Zhang
93a00d7dde
[v1] Refactor KVCacheConfig ( #14079 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
2025-03-21 04:56:27 -07:00
Russell Bryant
61e8c18350
[Misc] Add cProfile helpers ( #15074 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-03-21 04:56:09 -07:00
Isotr0py
8afcd0f633
[Bugfix] Fix broken kernel test due to missing rename for v1 Triton backend ( #15282 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-03-21 11:42:06 +00:00
Lehua Ding
91ca929dc7
[V1] Fix wrong import path of get_flash_attn_version ( #15280 )
...
Signed-off-by: Lehua Ding <lehuading@tencent.com >
2025-03-21 03:54:11 -07:00
Isotr0py
84e00adc8a
[Bugfix] Fix incorrect resolving order for transformers fallback ( #15279 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-03-21 03:54:08 -07:00
Isotr0py
47c7126213
[Misc] Add attention mask pre-computation optimization back to Qwen2.5-VL ( #15273 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-03-21 10:32:33 +00:00
Shanshan Shen
a989ca2bf6
[Bugfix] Add int8 torch dtype for KVCache ( #15260 )
...
Signed-off-by: shen-shanshan <467638484@qq.com >
2025-03-21 08:58:28 +00:00
Wei Zeng
0fa3970deb
[Feature] specify model in config.yaml ( #14855 )
...
Signed-off-by: weizeng <weizeng@roblox.com >
2025-03-21 00:26:03 -07:00
Nick Hill
da6ea29f7a
[V1] Avoid redundant input processing in n>1 case ( #14985 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-03-20 22:24:10 -07:00
Edwin Hernandez
7297941b38
[Doc] Update LWS docs ( #15163 )
...
Signed-off-by: Edwinhr716 <Edandres249@gmail.com >
2025-03-20 21:18:47 -07:00
Isotr0py
f8a08cb90d
[V1] Enable Triton(ROCm) Attention backend for Nvidia GPUs ( #14071 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-03-21 03:14:19 +00:00
Siyuan Liu
b15fd2be2a
[Hardware][TPU] Add check for no additional graph compilation during runtime ( #14710 )
...
Signed-off-by: Siyuan Liu <lsiyuan@google.com >
2025-03-21 03:05:28 +00:00
Woosuk Kwon
e588ac237c
Add an example for reproducibility ( #15262 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-03-20 19:55:47 -07:00
Cody Yu
5df2da5b97
[Misc] Better RayExecutor and multiprocessing compatibility ( #14705 )
...
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com >
2025-03-20 19:27:46 -07:00
Woosuk Kwon
11b986b3fb
[Docs] Trim the latest news in README ( #15261 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-03-20 19:24:21 -07:00
Chih-Chieh Yang
296f927f24
[Model] RE: Mamba2 Prefill Performance Tweaks: Fixing Flurry of Unnecessary Memory Copies ( #14857 )
...
Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com >
2025-03-20 19:21:08 -07:00
Travis Johnson
0032903a5b
[Bugfix] detect alibi and revert to FA2 ( #15231 )
...
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com >
2025-03-20 19:20:16 -07:00
Hyesoo Yang
47195057e9
[V1][TPU] Speed up top-k on TPU by using torch.topk ( #15242 )
...
Signed-off-by: Hyesoo Yang <hyeygit@gmail.com >
2025-03-20 19:19:40 -07:00
Harry Mellor
6edbfa924d
Mention extra_body as a way top pass vLLM only parameters using the OpenAI client ( #15240 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-03-20 19:18:36 -07:00
Isotr0py
1e508343e1
[Bugfix] Fix incorrect qwen2.5-vl attention mask pre-computation ( #15200 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-03-20 19:18:04 -07:00
Sage Moore
2e0b4cfde0
[ROCM] Upgrade torch to 2.6 ( #15244 )
...
Signed-off-by: Sage Moore <sage@neuralmagic.com >
2025-03-20 19:17:33 -07:00
Jee Jee Li
10f55fe6c5
[Misc] Clean up the BitsAndBytes arguments ( #15140 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-03-20 19:17:12 -07:00
Lu Fang
d3ccbd6350
Fix CUDA kernel index data type in vllm/csrc/quantization/fused_kernels/layernorm_utils.cuh +10 ( #15159 )
...
Signed-off-by: Lu Fang <lufang@fb.com >
Co-authored-by: Richard Barnes <rbarnes@meta.com >
2025-03-21 10:01:11 +08:00
Varun Sundar Rabindranath
0cfe7d386d
[CI/Build] LoRA : make add_lora_test safer ( #15181 )
...
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com >
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com >
2025-03-21 09:28:53 +08:00
Woosuk Kwon
0c6f5023c3
[V1] Scheduler Refactoring [1/N] - Add Scheduler Interface ( #15250 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
2025-03-20 17:50:43 -07:00
Yu Chin Fabian Lim
06dd08256f
Enforce that TP > 1 is not supported for Mamba2 if Quantization is Enabled. ( #14617 )
...
Signed-off-by: Yu Chin Fabian Lim <flim@sg.ibm.com >
2025-03-21 00:44:37 +00:00
Woosuk Kwon
2b22290ce0
[V1] Add flag to disable cascade attention ( #15243 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-03-20 15:24:16 -07:00
Jason
d8e82bc06d
[Bugfix] fix V1 Engine crash while handling requests with duplicate request id ( #15043 )
...
Signed-off-by: Jiahui Sun <jhsun2020@gmail.com >
2025-03-20 10:01:02 -07:00
Chi Zhang
086b56824c
[ci] feat: make the test_torchrun_example run with tp=2, external_dp=2 ( #15172 )
...
Signed-off-by: Chi Zhang <zhangchi.usc1992@bytedance.com >
Signed-off-by: youkaichao <youkaichao@gmail.com >
Co-authored-by: youkaichao <youkaichao@gmail.com >
2025-03-21 00:30:04 +08:00
Harry Mellor
5a0905ba2a
Replace misc issues with link to forum ( #15226 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-03-20 23:18:20 +08:00
Richard Liu
a8f12a63fd
Fix env vars for running Ray distributed backend on GKE ( #15166 )
...
Signed-off-by: Richard Liu <ricliu@google.com >
2025-03-20 14:59:33 +00:00
Harry Mellor
69ae2380c6
Add user forum to README ( #15220 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-03-20 22:39:51 +08:00
Cyrus Leung
27261e40a6
[Bugfix] Multi-video inference on LLaVA-Onevision ( #15082 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Co-authored-by: Isotr0py <2037008807@qq.com >
2025-03-20 14:10:45 +00:00
Quang-Linh LE
e3f813c33b
[macOS] Ugrade pytorch to 2.6.0 ( #15129 )
2025-03-20 01:22:40 -07:00
Wang Ran (汪然)
c607a2652b
Fixing Imprecise Type Annotations ( #15192 )
2025-03-20 01:19:55 -07:00
Kevin H. Luu
3d45e3d749
[release] Tag vllm-cpu with latest upon new version released ( #15193 )
2025-03-20 01:19:10 -07:00
billishyahao
742369d35a
[Frontend][Bugfix] support prefill decode disaggregation on deepseek ( #14824 )
...
Signed-off-by: billishyahao <bill.he@amd.com >
Co-authored-by: Zhai Feiyue <80079571+ZhaiFeiyue@users.noreply.github.com >
2025-03-20 00:00:33 -07:00
Wang Ran (汪然)
bfe2fe0af4
typo: Update config.py ( #15189 )
2025-03-19 23:31:21 -07:00
Matt Ritter
a8652f4f0f
Enable CUDA graph support for llama 3.2 vision ( #14917 )
...
Signed-off-by: Matt Ritter <100659061+mritterfigma@users.noreply.github.com >
2025-03-19 23:29:16 -07:00
Cyrus Leung
2f726b241e
[Doc] Update README.md ( #15187 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-03-20 13:25:58 +08:00
Mickaël Seznec
a597a57595
[Attention] Flash Attention 3 - fp8 ( #14570 )
...
Signed-off-by: Mickael Seznec <mickael@mistral.ai >
2025-03-20 01:14:20 -04:00
Chauncey
ae65f3e237
[Misc]fixed disable these http request logs ( #14754 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-03-19 21:53:40 -07:00
Roger Wang
34868b106a
[Doc] Update Mistral Small 3.1/Pixtral example ( #15184 )
...
Signed-off-by: Roger Wang <ywang@roblox.com >
2025-03-20 04:46:06 +00:00
Russell Bryant
1f16b7fe74
[Core][V0] Add guidance backend for structured output ( #14589 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
Co-authored-by: Loc Huynh <lohuynh@microsoft.com >
Co-authored-by: Michal Moskal <michal@moskal.me >
Co-authored-by: Aaron Pham <contact@aarnphm.xyz >
2025-03-19 21:33:51 -07:00
Jennifer Zhao
b88be22165
[Benchmark] Allow oversample request in benchmark dataset ( #15170 )
...
Signed-off-by: Jennifer Zhao <ai.jenniferzhao@gmail.com >
2025-03-20 12:32:58 +08:00
Nicolò Lucchesi
d8c6d7d6b5
[V1][TPU] Support V1 Sampler for ragged attention ( #14227 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-03-19 21:00:39 -07:00
Wang, Yi
40828ce5fe
fix "Total generated tokens:" is 0 if using --backend tgi and --endpo… ( #14673 )
...
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com >
2025-03-19 20:56:16 -07:00
Cyrus Leung
ffa443afed
[Bugfix] Fix embedding assignment for InternVL-based models ( #15086 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-03-20 03:40:13 +00:00
Jovan Sardinha
70e500cad9
Fix broken tests ( #14713 )
...
Signed-off-by: JovanSardinha <jovan.sardinha@gmail.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2025-03-20 02:06:49 +00:00
Rui Qiao
4cb1c05c9e
[Doc] Clarify run vllm only on one node in distributed inference ( #15148 )
...
Signed-off-by: Rui Qiao <ruisearch42@gmail.com >
2025-03-20 09:55:59 +08:00
Nick Hill
c47aafa37c
[BugFix] Lazily import XgrammarBackend to avoid early cuda init ( #15171 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-03-20 01:30:43 +00:00
Alexander Matveev
cfbca8a2f2
[V1] TPU - Tensor parallel MP support ( #15059 )
2025-03-20 00:55:18 +00:00
Simon Mo
0fe5609874
[Docs] Annouce Ollama and Singapore Meetups ( #15161 )
...
Signed-off-by: simon-mo <simon.mo@hey.com >
2025-03-19 16:18:04 -07:00
Nick Hill
22d33baca2
[FrontEnd][Perf] merge_async_iterators fast-path for single-prompt requests ( #15150 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-03-19 21:04:41 +00:00
iefgnoix
b0e96aaebb
[V1][TPU] Change kv cache shape. ( #15145 )
...
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com >
2025-03-19 12:16:42 -07:00
Wang Ran (汪然)
8310e0b59b
simple bugfix: Update stats.py ( #15139 )
2025-03-19 18:26:27 +00:00
maobaolong
26dd972adb
[FEAT]Support reset prefix cache by specified device ( #15003 )
2025-03-19 10:54:41 -07:00
Murali Andoorveedu
61c7a1b856
[V1] Minor V1 async engine test refactor ( #15075 )
...
Create Release / Create Release (push) Has been cancelled
Signed-off-by: andoorve <murali.andoorveedu@mail.utoronto.ca >
Co-authored-by: andoorve <murali.andoorveedu@mail.utoronto.ca >
2025-03-19 10:37:17 -07:00
Alessandro Sangiorgi
374ee287d8
[Frontend] Remove custom_cache_manager ( #13791 )
...
Signed-off-by: fulvius31 <asangior@redhat.com >
2025-03-20 00:13:50 +08:00
Kero Liang
a4d83661d7
[Misc] Update the "the first vLLM China Meetup" slides link to point to the first page ( #15134 )
...
Signed-off-by: imkero <kerorek@outlook.com >
2025-03-19 15:07:39 +00:00
Jan Kaniecki
8363cd093d
[Bugfix] Adjust mllama to regional compilation ( #15112 )
...
Signed-off-by: Jan Kaniecki <jkaniecki@habana.ai >
2025-03-19 07:57:25 -07:00
Aaron Pham
6c5a3195db
[Misc][Benchmark] Add support for different tokenizer_mode ( #15040 )
...
Signed-off-by: Aaron Pham <contact@aarnphm.xyz >
Signed-off-by: Russell Bryant <rbryant@redhat.com >
Co-authored-by: Russell Bryant <rbryant@redhat.com >
2025-03-19 14:56:50 +00:00
Marc-Alexandre Côté
073d1ed354
[Doc] Update tip info on using latest transformers when creating a custom Dockerfile ( #15070 )
2025-03-19 13:33:40 +00:00
Cyrus Leung
3d446433ec
[Bugfix] Fix size calculation of processing cache ( #15114 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-03-19 05:53:19 -07:00
Cyrus Leung
1fe0fd12d3
[Misc] Avoid unnecessary HF do_rescale warning when passing dummy data ( #15107 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-03-19 03:42:31 -07:00
Roger Wang
dafb4e504a
[V1][Bugfix] Fix oracle for device checking ( #15104 )
...
Signed-off-by: Roger Wang <ywang@roblox.com >
2025-03-19 18:35:32 +08:00
Kunshang Ji
68cf1601d3
[CI][Intel GPU] update XPU dockerfile and CI script ( #15109 )
...
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
2025-03-19 01:29:25 -07:00
Cyrus Leung
61f412187d
[Bugfix] Re-enable Gemma3 for V1 ( #14980 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-03-18 23:58:22 -07:00
Woosuk Kwon
05ccd0aa35
[V1] Ensure using int64 for sampled token ids ( #15065 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-03-18 23:52:19 -07:00
Cyrus Leung
f690372b68
[Core] Update dtype detection and defaults ( #14858 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-03-19 13:49:33 +08:00
Brayden Zhong
8b3e94a357
[Model] Remove duplicated message check in Mistral chat completion request ( #15069 )
...
Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca >
2025-03-19 05:09:32 +00:00
Julien Denize
437f9162d0
[Model] Pixtral: Remove layer instantiation duplication ( #15053 )
...
Signed-off-by: Julien Denize <julien.denize@mistral.ai >
2025-03-19 10:34:03 +08:00
Cody Yu
4f065f12f5
[Misc][V1] Skip device checking if not available ( #15061 )
...
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com >
2025-03-18 19:33:43 -07:00
Jennifer Zhao
228b768db6
[Doc] Minor v1_user_guide update ( #15064 )
...
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com >
2025-03-18 16:10:45 -07:00
Chujie Zheng
027827cc1d
fix long dtype in topk sampling ( #15049 )
2025-03-18 15:57:31 -07:00
Alexander Matveev
72a8639b68
[V1] TPU - CI/CD use smaller model ( #15054 )
...
Signed-off-by: Alexander Matveev <amatveev@redhat.com >
2025-03-18 21:39:21 +00:00
Woosuk Kwon
99abb8b650
[V1][Spec Decode] Optimize Rejection Sampler with Triton Kernels ( #14930 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-03-18 14:31:54 -07:00
Russell Bryant
3a1e648158
[V1] Refactor Structured Output for multiple backends ( #14694 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-03-18 19:49:15 +00:00
Jee Jee Li
46c759c165
[Bugfix] Fix LoRA extra vocab size ( #15047 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-03-18 09:40:29 -07:00
Isotr0py
179a619c21
[Bugfix] Fix broken CPU quantization due to triton import ( #15038 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-03-18 08:57:39 -07:00
yury-tokpanov
452e8fd968
[MODEL] Add support for Zamba2 models ( #13185 )
...
Signed-off-by: Yury Tokpanov <yury@zyphra.com >
Signed-off-by: Quentin Anthony <qganthony@yahoo.com >
Co-authored-by: Quentin Anthony <qganthony@yahoo.com >
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-03-18 08:56:21 -07:00
ekuznetsov139
8b793f7ec6
MI325 configs, fused_moe_kernel bugfix ( #14987 )
...
Signed-off-by: Eugene Kuznetsov <eugene.kuznetsov@amd.com >
2025-03-18 08:05:18 -07:00
Nicolò Lucchesi
af35d3a3cc
[TPU][V1][Bugfix] Fix chunked prefill with padding ( #15037 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-03-18 07:34:45 -07:00
Simon Mo
3b457143d2
[Bugfix] Register serializers for V0 MQ Engine ( #15009 )
...
Signed-off-by: simon-mo <simon.mo@hey.com >
2025-03-18 09:14:47 -04:00
Cyrus Leung
ab656f2c2f
[Bugfix] Loosen type check to avoid errors in V1 ( #15021 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-03-18 12:54:40 +00:00
Serena
64fc2193dc
[Misc][Docs] fix the comments of KV_T and CACHE_T in CALL_RESHAPE_AND_CACHE_XX macros ( #14347 )
2025-03-18 05:50:19 -07:00
Sebastian Schoennenbeck
dd732028f5
[Bugfix][Frontend] Fix validation of logprobs in ChatCompletionRequest ( #14352 )
...
Signed-off-by: Sebastian Schönnenbeck <sebastian.schoennenbeck@comma-soft.com >
2025-03-18 05:50:05 -07:00
hoshi-hiyouga
414919138b
[Bugfix] torchrun compatibility ( #14899 )
...
Signed-off-by: hiyouga <hiyouga@buaa.edu.cn >
Signed-off-by: youkaichao <youkaichao@gmail.com >
Co-authored-by: youkaichao <youkaichao@gmail.com >
2025-03-18 05:49:27 -07:00
Jee Jee Li
db7c8ca910
[Misc] Embedding model support LoRA ( #14935 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-03-18 12:07:00 +00:00
Patrick von Platen
f863ffc965
[Mistral-Small 3.1] Update docs and tests ( #14977 )
...
Signed-off-by: Roger Wang <ywang@roblox.com >
Co-authored-by: Roger Wang <ywang@roblox.com >
2025-03-18 03:29:42 -07:00
Varun Sundar Rabindranath
400d483e87
[Kernels] LoRA - Retire SGMV and BGMV Kernels ( #14685 )
...
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com >
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com >
2025-03-18 09:47:53 +00:00
Shanshan Shen
d1695758b2
[Doc][V1] Fix V1 APC doc ( #14920 )
2025-03-18 08:15:46 +00:00
Liangfu Chen
53a0cf8b95
[Neuron] trim attention kernel tests to fit trn1.2x instance ( #14988 )
...
Signed-off-by: Liangfu Chen <liangfc@amazon.com >
2025-03-18 15:05:52 +08:00
Tristan Leclercq
5eeabc2a44
[Bugfix] Fix bnb quantization for models with both HF-format and Mistral-format weights ( #14950 )
2025-03-17 23:27:26 +00:00
Alexander Matveev
18551e820c
[V1] TPU - Fix CI/CD runner ( #14974 )
2025-03-17 21:07:07 +00:00
Robert Shaw
e41e160263
[V1] Guard Against Main Thread Usage ( #14972 )
...
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com >
2025-03-17 13:23:02 -07:00
Cyrus Leung
b89fb2a4a1
[CI/Build] Use AutoModelForImageTextToText to load VLMs in tests ( #14945 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-03-17 18:35:17 +00:00
Roger Wang
5340b0e221
[Bugfix] Fix interface for Olmo2 on V1 ( #14976 )
...
Signed-off-by: Roger Wang <ywang@roblox.com >
2025-03-17 11:26:38 -07:00
Roger Wang
37e3806132
[Bugfix] Make Gemma3 MM V0 only for now ( #14971 )
...
Create Release / Create Release (push) Has been cancelled
Signed-off-by: Roger Wang <ywang@roblox.com >
2025-03-17 10:04:21 -07:00
Aaron Pham
c0efdd655b
[Fix][Structured Output] using vocab_size to construct matcher ( #14868 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
Signed-off-by: Aaron Pham <contact@aarnphm.xyz >
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com >
Co-authored-by: Russell Bryant <rbryant@redhat.com >
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com >
2025-03-17 11:42:45 -04:00
Quentin
aaaec52ad9
[Bugfix][Model] Mixtral: use unused head_dim config argument ( #14961 )
...
Signed-off-by: Quentin Torroba <quentin.torroba@mistral.ai >
2025-03-17 07:44:18 -07:00
Tyler Michael Smith
e1eb45d397
[Bugfix] Fix precommit - line too long in pixtral.py ( #14960 )
...
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com >
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-03-17 07:18:50 -07:00
Simon Mo
89fca671fb
[V1] Default MLA to V1 ( #14921 )
...
Signed-off-by: simon-mo <simon.mo@hey.com >
2025-03-17 06:54:40 -07:00
Patrick von Platen
d20b0c139c
Add patch merger ( #14957 )
2025-03-17 06:47:50 -07:00
Cyrus Leung
166a168b0f
[Doc] Fix misleading log during multi-modal profiling ( #14955 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-03-17 06:14:32 -07:00
vllmellm
2bb0e1a799
[Bugfix][ROCm] running new process using spawn method for rocm in tests. ( #14810 )
...
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com >
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
Co-authored-by: TJian <tunjian.tan@embeddedllm.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-03-17 11:33:35 +00:00
Cyrus Leung
6eaf1e5c52
[Misc] Add --seed option to offline multi-modal examples ( #14934 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-03-17 03:00:17 -07:00
Cyrus Leung
868a8c5b2c
[Bugfix] Fix Ultravox on V1 ( #14929 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-03-17 17:15:20 +08:00
iefgnoix
b4ad56c1bd
[V1][TPU] Apply the ragged paged attention kernel fix and remove the padding. ( #14846 )
...
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com >
2025-03-17 01:48:28 -07:00
kushanam
69698f257e
fix minor miscalled method ( #14327 )
2025-03-17 01:47:58 -07:00
Lu Fang
cd0cd85102
[MISC] More AMD unused var clean up ( #14926 )
...
Signed-off-by: Lu Fang <lufang@fb.com >
2025-03-17 16:40:41 +08:00
Russell Bryant
0a74bfce9c
setup.py: drop assumption about local main branch ( #14692 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-03-17 01:37:42 -07:00
Chen Zhang
dd3b865854
[Doc] Add vLLM Beijing meetup slide ( #14938 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
2025-03-17 16:29:36 +08:00
Yan Ma
9b87a579aa
[Misc][XPU] Use None as device capacity for XPU ( #14932 )
...
Signed-off-by: yan ma <yan.ma@intel.com >
2025-03-17 01:22:14 -07:00
Cyrus Leung
b539222d4e
[V1] Remove input cache client ( #14864 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: Roger Wang <ywang@roblox.com >
Co-authored-by: Roger Wang <ywang@roblox.com >
2025-03-16 23:42:06 -07:00
Lily Liu
8d6cf89526
[V1] [Spec Decode] Support random sampling for spec decode ( #13933 )
...
Create Release / Create Release (push) Has been cancelled
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-03-16 22:00:20 -07:00
Simon Mo
583a9778e0
[Benchmark] Do not save detailed info to json by default ( #14879 )
...
Signed-off-by: simon-mo <simon.mo@hey.com >
2025-03-16 21:48:11 -07:00
Sibi
a73e183e36
[Misc] Replace os environ to monkeypatch in test suite ( #14516 )
...
Signed-off-by: sibi <85477603+t-sibiraj@users.noreply.github.com >
Signed-off-by: Aaron Pham <contact@aarnphm.xyz >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: Aaron Pham <contact@aarnphm.xyz >
2025-03-16 20:35:57 -07:00
Lucas Wilkinson
1e799b7ec1
[BugFix] Fix MLA + V1 + TP==1 causing reinitialization of cuda context ( #14910 )
2025-03-17 03:35:37 +00:00
Woosuk Kwon
7f6c5ee06c
[V1][Minor] Add __repr__ to ConstantList ( #14907 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-03-16 20:20:15 -07:00
Woosuk Kwon
faa0275730
[V1] Optimize the overhead of rewinding ( #14905 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-03-16 20:19:30 -07:00
Cyrus Leung
8a5a9b70d7
[CI/Build] Update defaults for test reproducibility ( #14893 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-03-17 10:38:15 +08:00
Robert Shaw
bb3aeddfaf
[CI] Nightly Tests ( #14898 )
...
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com >
Signed-off-by: rshaw@neuralmagic.com <robertgshaw2@gmail.com >
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com >
2025-03-17 02:06:43 +00:00
Robert Shaw
aecc780dba
[V1] Enable Entrypoints Tests ( #14903 )
2025-03-16 17:56:16 -07:00
Vadim Gimpelson
90df7f23aa
[Doc] Add guidance for using ccache with pip install -e . in doc ( #14901 )
2025-03-16 23:10:04 +00:00
Rui Qiao
b9b5bdfc7d
[Misc] Catching Ray Compiled Graph PP test failures for V1 ( #14847 )
2025-03-16 15:46:42 -07:00
Woosuk Kwon
31060b2757
[V1][BugFix] Detect interleaved sliding window attention ( #14896 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-03-16 14:53:53 -07:00
Nick Hill
fc1f67715d
[BugFix][V1] Fix overhead related to bad_words sampling when not in use ( #14894 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-03-16 14:53:34 -07:00
Cyrus Leung
f6137adbcb
Revert "[Bugfix] Limit profiling run sequence length by max_model_len ( #14785 ) ( #14892 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-03-16 09:13:46 -07:00
Cyrus Leung
e53b1350f2
[Bugfix] Explicitly disable Phi-4-multimodal in V1 ( #14889 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-03-16 09:05:40 -07:00
Kyle Sayers
d30aa7e9e6
[Bugfix] Limit profiling run sequence length by max_model_len ( #14785 )
...
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com >
2025-03-16 07:44:19 -07:00
Lily Liu
d1ad2a57af
[V1] [Spec Decode] Fix ngram tests ( #14878 )
2025-03-16 00:29:22 -07:00
Nick Hill
b82662d952
[BugFix] Fix torch distributed stateless PG backend init ( #14870 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-03-15 20:26:19 -07:00
Simon Mo
71c1e07107
[Kernel] Add more tuned configs ( #14877 )
...
Signed-off-by: simon-mo <simon.mo@hey.com >
2025-03-15 20:25:03 -07:00
Roger Wang
b30c75dda4
[V1] Remove V0 fallback for mistral-tokenizer ( #14873 )
...
Signed-off-by: Roger Wang <ywang@roblox.com >
2025-03-15 20:21:11 -07:00
Isotr0py
def232e122
[VLM] Clean up Phi-4-MM ViT implementation ( #14812 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-03-15 18:53:52 -07:00
Roger Wang
3453b964a3
[Misc][Doc] Minor benchmark README update ( #14874 )
...
Signed-off-by: Roger Wang <ywang@roblox.com >
2025-03-16 09:46:17 +08:00
Rémi Delacourt
61c6a5a796
[VLM] Merged multi-modal processor for Pixtral ( #12211 )
...
Signed-off-by: remi <remi@mistral.ai >
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-03-15 06:28:27 -07:00
Jun Duan
74bc397b0a
[Core] Expose API endpoint /is_sleeping ( #14312 )
...
Signed-off-by: Jun Duan <jun.duan.phd@outlook.com >
2025-03-15 06:28:14 -07:00
Kunshang Ji
f58aea002c
[CI][Intel GPU] refine intel GPU ci docker build ( #14860 )
...
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
2025-03-15 11:58:53 +00:00
Cyrus Leung
3556a41434
[VLM] Limit multimodal input cache by memory ( #14805 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-03-15 02:52:05 -07:00
Bryan Lu
9ed6ee92d6
[Bugfix] EAGLE output norm bug ( #14464 )
...
Signed-off-by: Bryan Lu <yuzhelu@amazon.com >
2025-03-15 06:50:33 +00:00
Russell Bryant
ee3778d5fc
[Build/CI] Upgrade jinja2 to get 3 moderate CVE fixes ( #14839 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-03-15 05:38:19 +00:00
Jennifer Zhao
aaacf17324
[Doc] V1 user guide ( #13991 )
...
Signed-off-by: Jennifer Zhao <7443418+JenZhao@users.noreply.github.com >
Signed-off-by: Roger Wang <ywang@roblox.com >
Signed-off-by: Jennifer Zhao <ai.jenniferzhao@gmail.com >
Co-authored-by: Jennifer Zhao <7443418+JenZhao@users.noreply.github.com >
Co-authored-by: Jennifer Zhao <JenZhao@users.noreply.github.com >
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com >
Co-authored-by: Roger Wang <ywang@roblox.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-03-14 22:17:59 -07:00
Aaron Pham
4c7629cae9
[V1][Structured Output] calculate vocab_size eagerly ( #14851 )
...
Signed-off-by: Aaron Pham <contact@aarnphm.xyz >
2025-03-14 22:09:51 -07:00
Jee Jee Li
e0fdfa1608
[CI/Build] Delete LoRA bias test ( #14849 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-03-14 22:09:25 -07:00
Lucas Wilkinson
5952d8ab61
[Attention] Get rid of mla cache alignment ( #14842 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-03-15 05:08:25 +00:00
Li, Jiang
a2ae496589
[CPU] Support FP8 KV cache ( #14741 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com >
2025-03-14 22:07:36 -07:00
Simon Mo
877e352262
[Docs] Add new East Coast vLLM Meetup slides to README and meetups.md ( #14852 )
2025-03-14 22:06:38 -07:00
Robert Shaw
d4d93db2c5
[V1] V1 Enablement Oracle ( #13726 )
...
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com >
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com >
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com >
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com >
Co-authored-by: Michael Goin <michael@neuralmagic.com >
2025-03-14 22:02:20 -07:00
Lu Fang
8c0d15d5c5
[Misc][Easy] Annotate unused vars in the csrc files ( #14798 )
...
Signed-off-by: Lu Fang <lufang@fb.com >
2025-03-15 12:40:09 +08:00
Isotr0py
97ac781c62
[Misc] Remove misleading message in gemma2 and gemma3 ( #14850 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-03-14 21:35:12 -07:00
Russell Bryant
776dcec8fe
Disable outlines cache by default ( #14837 )
2025-03-15 03:57:55 +00:00
Tyler Michael Smith
ccf02fcbae
Revert "[Model] Mamba2 Prefill Performance Tweaks: Fixing Flurry of U… ( #14848 )
2025-03-14 20:45:42 -07:00
DefTruth
acaea3bb07
[Bugfix][V1] Fix flashinfer sampling ( #14815 )
2025-03-14 20:42:38 -07:00
Liangfu Chen
9f37422779
[Neuron][CI] update docker run command ( #14829 )
...
Signed-off-by: Liangfu Chen <liangfc@amazon.com >
2025-03-14 18:51:35 -07:00
yarongmu-google
dd344e0342
[Bugfix] Fix torch_xla in V0 which can't handle None seed introduced … ( #14844 )
...
Signed-off-by: Yarong Mu <ymu@google.com >
2025-03-15 00:41:15 +00:00
Yuan Tang
54a8804455
[Doc] More neutral K8s deployment guide ( #14084 )
...
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com >
2025-03-14 16:12:36 -07:00
Russell Bryant
bbd94a19fc
[Build/CI] Upgrade aiohttp to incldue CVE fix ( #14840 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-03-14 23:11:28 +00:00
Russell Bryant
233ffce1eb
[Build/CI] Move ninja to common deps ( #14835 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-03-14 21:25:28 +00:00
Richard Liu
40677783aa
[CI] Add TPU v1 test ( #14834 )
...
Signed-off-by: Richard Liu <ricliu@google.com >
2025-03-14 17:13:30 -04:00
Michael Goin
14f301b541
Update to torch==2.6.0 ( #12721 )
...
Signed-off-by: mgoin <michael@neuralmagic.com >
Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: luka <luka@neuralmagic.com >
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk >
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-03-14 16:58:30 -04:00
Russell Bryant
46f98893dd
[V1] Fix model parameterization for structured output tests ( #14833 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-03-14 20:55:18 +00:00
Chih-Chieh Yang
fe66b34728
[Model] Mamba2 Prefill Performance Tweaks: Fixing Flurry of Unnecessary Memory Copies ( #14778 )
...
Signed-off-by: Chih-Chieh-Yang <7364402+cyang49@users.noreply.github.com >
2025-03-14 16:36:18 -04:00
Alexei-V-Ivanov-AMD
270a5da495
Re-enable the AMD Entrypoints Test ( #14711 )
...
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com >
2025-03-14 12:18:13 -07:00
Kevin H. Luu
7097b4cc1c
[release] Remove log cleanup commands from TPU job ( #14838 )
2025-03-14 11:59:52 -07:00
Yajie Wang
977a16772c
[Bugfix][Kernel]: Fix AllSpark kernel compilation errors and enable for CUDA < 12.0 ( #14430 )
...
Signed-off-by: wyj371990 <wyj371990@alibaba-inc.com >
2025-03-14 09:55:14 -07:00
daniel-salib
73deea2fdb
[Frontend] track server_load ( #13950 )
2025-03-14 09:53:17 -07:00
Mark McLoughlin
9d2b4a70f4
[V1][Metrics] Updated list of deprecated metrics in v0.8 ( #14695 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-03-15 00:45:25 +08:00
Russell Bryant
0b0d6421b2
[Frontend] Fix log message to use http vs https ( #14774 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-03-14 09:21:09 -07:00
Russell Bryant
1140991a7b
[V1] Fix vocab size calculation for structured output ( #14826 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-03-14 09:18:38 -07:00
Cyrus Leung
613c5bb945
[Bugfix] Fix Aria test loading ( #14823 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-03-14 09:11:23 -07:00
Guillaume Calmettes
fd8e055ffb
[BugFix]: properly catch templating error when preprocess input ( #13976 )
...
Signed-off-by: Guillaume Calmettes <gcalmettes@scaleway.com >
2025-03-14 05:58:34 -07:00
Cyrus Leung
ab93f1360f
[VLM] Various cleanup and fixes ( #14806 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-03-14 05:58:19 -07:00
DefTruth
40253bab44
[Bugfix][W8A8] fixed cutlass block fp8 binding ( #14796 )
2025-03-14 03:32:42 -07:00
Woosuk Kwon
c77620d22d
[V1][Minor] Minor code cleanup for scheduling metrics ( #14800 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-03-14 08:21:28 +00:00
Jee Jee Li
989ecd2007
[Misc] Gemma3ForConditionalGeneration supports LoRA ( #14797 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-03-14 01:07:30 -07:00
WeiCheng
54cc46f3eb
[Bugfix] Fix small typo in the example of Streaming delimiter ( #14793 )
2025-03-14 08:05:17 +00:00
Cyrus Leung
601bd3268e
[Misc] Clean up type annotation for SupportsMultiModal ( #14794 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-03-14 00:59:56 -07:00
Li Wang
09269b3127
[BugFix]Fix performance serving benchmark when enable profiling ( #14737 )
...
Signed-off-by: wangli <wangli858794774@gmail.com >
2025-03-14 07:02:05 +00:00
Thien Tran
27b50f1fe6
[Bugfix][Kernel][CPU] Fix num_tokens in CPU rotary embedding kernel ( #14667 )
...
Signed-off-by: Thien Tran <gau.nernst@yahoo.com.sg >
2025-03-13 23:47:49 -07:00
Lucas Wilkinson
9532c49836
[Attention] MLA get rid of materialization ( #14770 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-03-13 23:39:02 -07:00
Roger Wang
0c2af17c76
[CI] Fix missing example model id in processor test ( #14787 )
...
Signed-off-by: Roger Wang <ywang@roblox.com >
2025-03-14 13:52:15 +08:00
Jennifer Zhao
a6e0d096dd
[Feature] Add visionarena offline support for benchmark_throughput ( #14654 )
...
Signed-off-by: Jennifer Zhao <7443418+JenZhao@users.noreply.github.com >
Signed-off-by: Jennifer Zhao <ai.jenniferzhao@gmail.com >
Co-authored-by: Jennifer Zhao <7443418+JenZhao@users.noreply.github.com >
Co-authored-by: Jennifer Zhao <JenZhao@users.noreply.github.com >
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com >
2025-03-14 04:07:54 +00:00
Liangfu Chen
d3d4956261
[Neuron] flatten test parameterization for neuron attention kernels ( #14712 )
2025-03-13 20:46:56 -07:00
Nick Hill
4059adc31b
[Misc][Minor] Simplify SamplingParams.__post_init__() ( #14772 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-03-14 11:44:20 +08:00
Kevin H. Luu
f1f632d9ec
[ci] Reduce number of tests in fastcheck ( #14782 )
2025-03-13 20:43:45 -07:00
Thien Tran
95d680b862
[Bugfix][IPEX] Add VLLM_CPU_MOE_PREPACK to allow disabling MoE prepack when CPU does not support it ( #14681 )
...
Signed-off-by: Thien Tran <gau.nernst@yahoo.com.sg >
2025-03-13 20:43:18 -07:00
Thomas Parnell
fb4c7f8ef0
[Kernel] [V1] Further optimizations to ROCm (Triton) Backend to better handle GQA. ( #14431 )
...
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com >
Co-authored-by: Jan van Lunteren <jvl@zurich.ibm.com >
Co-authored-by: Burkhard Ringlein <ngl@zurich.ibm.com >
Co-authored-by: Chih-Chieh Yang <chih.chieh.yang@ibm.com >
2025-03-13 20:42:27 -07:00
Varun Sundar Rabindranath
0b1cfa6180
[Kernel] LoRA - Enable CUDAGraphs for V1 ( #14626 )
...
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com >
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com >
2025-03-13 20:42:04 -07:00
Woosuk Kwon
32ef4983cd
[V1] Temporarily disable FlashInfer Rejection Sampler ( #14788 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-03-13 20:40:35 -07:00
Roger Wang
ad19c8a003
[V1] Move OOM check into sampler run ( #14728 )
...
Signed-off-by: Roger Wang <ywang@roblox.com >
Co-authored-by: Simon Mo <simon.mo@hey.com >
2025-03-13 20:40:23 -07:00
Jeff Daily
2a602b055a
forward fix PR 14245, restore build on ROCm 6.2 ( #14709 )
...
Signed-off-by: Jeff Daily <jeff.daily@amd.com >
2025-03-13 20:40:15 -07:00
Alexander Matveev
7888e1d0a3
[V1] TPU - Enable prefix caching by default ( #14773 )
2025-03-13 20:40:05 -07:00
Chen Zhang
60c872d4b6
[Doc] Fix small typo in Transformers fallback ( #14791 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
2025-03-13 20:33:12 -07:00
yasu52
3fb17d26c8
[Doc] Fix typo in documentation ( #14783 )
...
Signed-off-by: yasu52 <tsuguro4649@gmail.com >
2025-03-13 20:33:09 -07:00
Lucas Wilkinson
d47807ba08
[Attention] Remove slow setattr in MLA ( #14769 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-03-13 21:31:14 +00:00
afeldman-nm
02fcaa3d0a
[V1] Detokenizer: Respect Stop Tokens + not include_stop_str_in_output ( #14624 )
...
Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com >
2025-03-13 19:07:34 +00:00
Aaron Pham
8a4a2efc6f
[V1][Core] using cached vocab_size for Structured Outputs ( #14630 )
...
Signed-off-by: Aaron Pham <contact@aarnphm.xyz >
2025-03-13 11:39:28 -07:00
Cyrus Leung
8e9ffd37d6
[Misc] Clean up processor tests ( #14771 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-03-13 18:25:37 +00:00
Woosuk Kwon
01b3fd0af7
[V1][Minor] Minor enhancements on scheduler ( #14732 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-03-13 08:53:22 -07:00
Cyrus Leung
f53a0586b9
[Bugfix] Fix prompt format of GLM4V ( #14539 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-03-13 11:37:17 +00:00
Isotr0py
b1cc4dfef5
[VLM] Support loading InternVideo2.5 models as original InternVLChatModel ( #14738 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-03-13 03:10:02 -07:00
Cyrus Leung
382403921f
[VLM] Support pan-and-scan for Gemma3 multi-modal processor ( #14672 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: Roger Wang <ywang@roblox.com >
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
Co-authored-by: Roger Wang <ywang@roblox.com >
2025-03-13 02:23:12 -07:00
Jee Jee Li
a73122de96
[Bugfix] fix benchmark moe ( #14653 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-03-13 16:12:42 +08:00
Jee Jee Li
bd44b812cb
[CI/Build] Delete ultravox LoRA test ( #14730 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-03-13 07:57:39 +00:00
Szymon Ożóg
55211b01e8
[Bugfix] Fix chunked prefill for GGUF ( #14666 )
...
Signed-off-by: SzymonOzog <szymon.ozog@aleph-alpha.com >
2025-03-13 07:19:03 +00:00
Kyle Sayers
5d043c1685
[Quant] Bamba SupportsQuant ( #14698 )
...
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com >
2025-03-13 04:57:05 +00:00
Kyle Sayers
36d1ccb286
[Quant] BartModel SupportsQuant ( #14699 )
...
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com >
2025-03-13 04:55:59 +00:00
Siyuan Liu
1bc3b739c4
[V1][TPU] Add assertion on multi-step-scheduler ( #14707 )
...
Signed-off-by: Siyuan Liu <lsiyuan@google.com >
2025-03-12 21:37:58 -07:00
Mathis Felardos
1bd32bc8dd
[Config][Disaggregated] Add timeout configuration for the torch.store and add KVTransferConfig.kv_connector_extra_config ( #14367 )
...
Signed-off-by: Mathis Felardos <mathis@mistral.ai >
2025-03-12 20:15:20 -07:00
TY-AMD
128bf75283
[BugFix][TritonMLA] Process weights after model loading for GGUF ( #14555 )
...
Signed-off-by: TianyuanWu <Tianyuan.Wu@amd.com >
2025-03-12 20:14:36 -07:00
Gregory Shtrasberg
a94a699c3f
[ROCm][FP8] Fix for adjustments needed only for fnuz ( #14689 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
2025-03-12 20:14:04 -07:00
Richard Liu
ab426ec9c0
Add ray[data] as tpu dependency ( #14691 )
...
Signed-off-by: <ricliu@google.com >
Signed-off-by: Richard Liu <ricliu@google.com >
2025-03-12 20:13:48 -07:00
Joe Runde
165290d357
[bugfix] fixup warning message for plugged schedulers for v1 ( #14700 )
...
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com >
2025-03-12 20:12:13 -07:00
Kevin H. Luu
ce20124671
[release] Add force remove for TPU logs ( #14697 )
2025-03-12 22:35:18 +00:00
Woosuk Kwon
53be4a8634
[V1] Allow sliding window + prefix caching ( #13069 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-03-12 11:21:19 -07:00
Nick Hill
f5d3acd474
[BugFix][V1] Fix parallel sampling finishing/aborts ( #14512 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-03-12 10:29:48 -07:00
TJian
916836bbfb
[FEAT] [ROCm] [Embedding] Add encoder-only model support into ROCm Flash Attention to enable embedding models. ( #14664 )
...
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com >
2025-03-12 09:31:19 -07:00
Sage Moore
d9f83d6206
[ROCm] Enable chunked prefill/paged attention in MLA on ROCm ( #14316 )
...
Signed-off-by: Sage Moore <sage@neuralmagic.com >
2025-03-12 15:51:20 +00:00
ameyanjarlekar
4a754fcf15
[Bugfix] Missing thumbnail from NVLM-D processor ( #14633 )
...
Signed-off-by: ameyanjarlekar <aanjarlekar@nvidia.com >
2025-03-12 08:50:49 -07:00
Woosuk Kwon
c0c25e25fa
[Model] Add support for Gemma 3 ( #14660 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
Signed-off-by: Roger Wang <ywang@roblox.com >
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Co-authored-by: Roger Wang <ywang@roblox.com >
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-03-12 08:36:33 -07:00
Sage Moore
45f3f3f59e
[ROCm][Bugfix] Ensure that the moe_wna16_gemm kernel is not built on ROCm platforms. ( #14629 )
...
Signed-off-by: Sage Moore <sage@neuralmagic.com >
2025-03-12 08:00:28 -04:00
Li, Jiang
ff47aab056
[CPU] Upgrade CPU backend to torch-2.6 ( #13381 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com >
Co-authored-by: Isotr0py <2037008807@qq.com >
2025-03-12 10:41:13 +00:00
Pavani Majety
debd6bbf09
[Kernel] Add ModelOpt FP4 Checkpoint Support ( #12520 )
...
Signed-off-by: Pavani Majety <pmajety@nvidia.com >
2025-03-12 05:13:11 +00:00
Benjamin Chislett
5c538c37b2
[V1][Bugfix][Spec Decode] Fix incorrect outputs in V1 speculative decoding due to batch indexing ( #14645 )
...
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai >
2025-03-11 22:12:41 -07:00
Szymon Ożóg
e22ee1e7a2
[Kernel] GGUF MoE kernel ( #14613 )
...
Signed-off-by: SzymonOzog <szymon.ozog@aleph-alpha.com >
2025-03-12 03:33:27 +00:00
Isotr0py
e392d85831
[Core] Refactor QKVCrossParallelLinear implementation to support BNB 4-bit quantization ( #14545 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-03-11 20:12:52 -07:00
Aaron Pham
77a318bd01
[V1][Core] Support MistralTokenizer for Structured Output ( #14625 )
...
Signed-off-by: Aaron Pham <contact@aarnphm.xyz >
2025-03-12 10:40:09 +08:00
Farzad Abdolhosseini
80e78d02ac
[Model] Extend Ultravox to accept audio longer than 30s ( #13631 )
...
Signed-off-by: Farzad Abdolhosseini <farzad@fixie.ai >
2025-03-12 10:27:10 +08:00
Jennifer Zhao
4a42b9f5d6
[Doc] Update benchmarks README ( #14646 )
...
Signed-off-by: Jennifer Zhao <7443418+JenZhao@users.noreply.github.com >
Co-authored-by: Jennifer Zhao <7443418+JenZhao@users.noreply.github.com >
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com >
2025-03-11 19:23:04 -07:00
Joe Runde
47532cd9f4
[core][V1] pluggable scheduler ( #14466 )
...
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com >
2025-03-12 01:15:15 +00:00
Randy Chen
36e0c8f7da
[Feature] Add vllm bench CLI ( #13993 )
...
Signed-off-by: Randy Chen <acad.randyjhc@gmail.com >
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com >
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com >
2025-03-12 00:31:48 +00:00
Kevin H. Luu
9f583e360c
[release] Add commands to clean up logs on TPU release node ( #14642 )
2025-03-12 00:14:50 +00:00
Cody Yu
b706d898af
[Bugfix][V1][PP] Only warmup sampler at last PP rank ( #14643 )
...
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com >
2025-03-11 23:40:07 +00:00
iefgnoix
863d315c86
[V1][TPU] Pad the block_table.shape[1] so the ragged paged attention can handle correctly ( #14597 )
2025-03-11 19:12:26 -04:00
Richard Liu
d374f04a33
Fix run_tpu_test ( #14641 )
...
Signed-off-by: <ricliu@google.com >
Signed-off-by: Richard Liu <ricliu@google.com >
2025-03-11 21:14:33 +00:00
Russell Bryant
61a01b27a7
[V1] Delay all xgrammar usage until needed ( #14616 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-03-11 20:21:33 +00:00
Yang.Tao
53056731fd
fix some typos : supported_head_sizes ( #14627 )
2025-03-11 10:38:24 -07:00
Russell Bryant
4cbf286794
[V1] Remove cache from StructuredOutputManager ( #14622 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-03-11 10:36:07 -07:00
Kunshang Ji
c6e14a61ab
[Hardware][Intel GPU] upgrade IPEX dependency to 2.6.10. ( #14564 )
...
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
2025-03-11 17:11:47 +00:00
Lucas Wilkinson
07b4b7a37f
[BugFix/Build] Fix sparse kernels not getting built on hopper ( #14572 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-03-11 17:09:03 +00:00
Dilip Gowda Bhagavan
07964e2f30
docs: Add documentation for s390x cpu implementation ( #14198 )
...
Signed-off-by: Dilip Gowda Bhagavan <dilip.bhagavan@ibm.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-03-11 17:02:17 +00:00
Russell Bryant
4bf82d4b90
[V1] Add regex structured output support with xgrammar ( #14590 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-03-11 23:03:44 +08:00
Richard Liu
9ab326713f
Uninstall dependencies before installing requirements/tpu.txt ( #14586 )
...
Signed-off-by: <ricliu@google.com >
Signed-off-by: Richard Liu <ricliu@google.com >
2025-03-11 08:01:35 -07:00
Cyrus Leung
af295e9b01
[Bugfix] Update --hf-overrides for Alibaba-NLP/gte-Qwen2 ( #14609 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-03-11 07:59:43 -07:00
Jeff Daily
a1c8f3796c
dynamic distpatch of fp8 kernels ( #14245 )
...
Signed-off-by: Jeff Daily <jeff.daily@amd.com >
2025-03-11 10:54:56 -04:00
Russell Bryant
08a1a1121d
benchmarks: simplify test jsonschema ( #14567 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-03-11 13:39:30 +00:00
Isotr0py
1477ffc381
[VLM] Cleanup siglip legacy code and fix broken paligemma multimodal processor ( #14602 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-03-11 11:27:36 +00:00
yexin(叶鑫)
70b808fe1a
[Perf]:Optimize qwen2-vl to reduce cudaMemcpyAsync ( #14377 )
...
Signed-off-by: cynthieye <987073381@qq.com >
2025-03-11 07:39:56 +00:00
Isotr0py
63d635d179
[Misc] Correct deepseek-vl2 chat template ( #14558 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-03-11 04:37:11 +00:00
Roger Wang
1fc973c0b5
[V1][Core] Fix memory issue with logits & sampling ( #14508 )
...
Signed-off-by: Roger Wang <ywang@roblox.com >
Co-authored-by: Varun Sundar Rabindranath <3337719+varun-sundar-rabindranath@users.noreply.github.com >
2025-03-11 04:03:41 +00:00
Concurrensee
c982ac5722
[Bugfix] Fix FP16 overflow for DeepSeek V2 ( #13232 )
...
Signed-off-by: Yida Wu <yida.wu@amd.com >
2025-03-10 20:46:59 -07:00
Cody Yu
4290b704ff
[V1][PP] Do not block engine core when no requests to schedule ( #14585 )
...
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com >
2025-03-10 19:48:24 -07:00
Liangfu Chen
c91b64f749
[neuron] add reshape_and_cache ( #14391 )
2025-03-10 18:37:29 -07:00
gnovack
d6123170d5
[Neuron] Add Neuron device communicator for vLLM v1 ( #14085 )
2025-03-10 18:37:04 -07:00
Cody Yu
485afdd3cb
[MISC][V1] Handle exception of current_platform.get_device_name() in arg_utils ( #14379 )
...
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com >
2025-03-10 20:42:11 -04:00
Jinzhen Lin
90e88ab756
[Kernel] moe wna16 cuda kernel ( #13321 )
...
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com >
Co-authored-by: mgoin <mgoin64@gmail.com >
2025-03-10 20:12:40 -04:00
Russell Bryant
04421dff8a
[V1] Prevent xgrammar from breaking TPU support ( #14575 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-03-10 23:06:19 +00:00
Russell Bryant
432d6dad15
Fix typo in benchmark_serving_structured_output.py ( #14566 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-03-10 14:58:58 -07:00
Varun Sundar Rabindranath
5ff0d32580
[V1] LoRA - Add triton kernels for V1 ( #13096 )
...
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com >
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com >
2025-03-10 17:27:53 -04:00
Woosuk Kwon
0967110e42
[Minor] Update the tqdm bar for parallel sampling ( #14571 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-03-10 14:23:48 -07:00
Simon Mo
fb0acb6c72
[Perf] Improve MLA on V1 ( #14540 )
...
Signed-off-by: simon-mo <simon.mo@hey.com >
2025-03-10 12:06:58 -07:00
Chauncey
92b0ce2ac7
[Bugfix][v1] fixed llava-hf/llava-1.5-7b-hf is broken on V1 ( #14554 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-03-10 18:24:51 +00:00
Harry Mellor
bc2d4473bf
[Docs] Make installation URLs nicer ( #14556 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-03-10 10:43:08 -07:00
Harry Mellor
3b352a2f92
Correct capitalisation: VLLM -> vLLM ( #14562 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-03-10 16:36:21 +00:00
Roger Wang
dea985aef0
[V1][Bugfix] Fix handing of second_per_grid_ts for Qwen2-VL & Qwen2.5-VL ( #14548 )
...
Signed-off-by: Roger Wang <ywang@roblox.com >
2025-03-10 16:03:11 +00:00
Harry Mellor
39be30351f
Correct capitalisation: Github -> GitHub ( #14561 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-03-10 15:53:33 +00:00
Cyrus Leung
001a9c7b0d
[Doc] Update PaliGemma note to a warning ( #14565 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-03-10 15:02:28 +00:00
Szymon Ożóg
89cdaa83e7
[Kernel] Add more dtype support for GGUF kernels ( #14043 )
...
Signed-off-by: SzymonOzog <szymon.ozog@aleph-alpha.com >
Signed-off-by: SzymonOzog <szymon.ozog@gmail.com >
2025-03-10 07:30:04 -07:00
Chauncey
b0746fae3d
[Frontend] support image embeds ( #13955 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-03-10 12:36:03 +00:00
Harry Mellor
60a98b2de5
[Docs] Mention model_impl arg when explaining Transformers fallback ( #14552 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-03-10 12:13:10 +00:00
Chauncey
460f553a6d
[Misc] Add log information for handle_process_request. ( #14130 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-03-10 08:40:50 +00:00
Jennifer Zhao
1253b15774
[Feature] Consolidate performance benchmark datasets ( #14036 )
...
Signed-off-by: Jennifer Zhao <7443418+JenZhao@users.noreply.github.com >
Signed-off-by: Roger Wang <ywang@roblox.com >
Co-authored-by: Jennifer Zhao <7443418+JenZhao@users.noreply.github.com >
Co-authored-by: Roger Wang <ywang@roblox.com >
2025-03-10 07:23:11 +00:00
Martin Hoyer
dc74613fa2
[Bugfix] Wrong requirements path - rocm ( #14527 )
...
Signed-off-by: Martin Hoyer <mhoyer@redhat.com >
2025-03-10 02:49:46 +00:00
Yanyi Liu
a21076ed3a
[Misc] Ensure out-of-tree quantization method recognize by cli args ( #14328 )
...
Signed-off-by: liuyanyi <wolfsonliu@163.com >
2025-03-09 12:13:31 +00:00
Chengji Yao
212007b168
[Hardware][TPU] Fix the recompiling issue in logits processor after warmup ( #14510 )
...
Signed-off-by: Chengji Yao <chengjiyao@google.com >
2025-03-09 05:44:39 -04:00
Isotr0py
fb16eea48b
[Bugfix] Revert QKVCrossParallelLinear usage in Mllama to keep BNB quantization work ( #14498 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-03-09 04:47:45 +00:00
Yuchen Yan
73ae0b44e9
[Bugfix] Fix tqdm progress bar when SamplingParams.n > 1 ( #12428 )
...
Signed-off-by: Yuchen Yan <740987012@qq.com >
2025-03-08 20:14:53 -08:00
Jiayi Yao
6d7f037748
[Feat] Support chunked prefill for LMCache connector ( #14505 )
...
Signed-off-by: YaoJiayi <120040070@link.cuhk.edu.cn >
2025-03-08 19:30:06 -08:00
iefgnoix
10f7552789
[V1][TPU] Remove unnecessary padding for running on TPU. ( #14467 )
2025-03-08 21:56:04 -05:00
Lucas Wilkinson
b0d541947a
[Attention] Default to FlashMLA backend for MLA ( #14451 )
...
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com >
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com >
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com >
2025-03-08 18:18:39 -08:00
Robert Shaw
5f0b53c6ea
Revert "[V1][Core] Fix memory issue with logits & sampling" ( #14504 )
...
Signed-off-by: Roger Wang <ywang@roblox.com >
Co-authored-by: Roger Wang <ywang@roblox.com >
2025-03-08 17:43:37 -08:00
22quinn
eb8b5eb183
[V1] Support bad_words in sampler ( #13376 )
...
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
2025-03-08 14:50:26 -08:00
Cyrus Leung
9513290032
[Misc] Upgrade to Python 3.9 typing for additional directories ( #14492 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-03-08 17:35:50 +00:00
Russell Bryant
0d5e73d30e
Update CODEOWNERS for structured output ( #14496 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-03-08 17:19:51 +00:00
Isotr0py
609ef61fea
[Bugfix] Fix profiling OOM and decouple encoder multimodal profiling ( #14361 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-03-08 16:52:34 +00:00
Lucas Wilkinson
db84f5eb3b
[Bugfix] DeepSeek Accuracy ( #14476 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-03-08 16:47:03 +00:00
Harry Mellor
206e2577fa
Move requirements into their own directory ( #12547 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-03-08 16:44:35 +00:00
Cyrus Leung
e02883c400
[Misc] Don't run ruff at all on 3rd party libs ( #14493 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-03-08 07:16:40 -08:00
Russell Bryant
9085aabd62
[benchmarks] Add option to use unique jsonschema for each request ( #14457 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-03-08 06:36:39 -08:00
Roger Wang
8d5aa466fb
[V1][Core] Fix memory issue with logits & sampling ( #13776 )
...
Signed-off-by: Roger Wang <ywang@roblox.com >
2025-03-08 06:11:04 -08:00
Aaron Pham
0b7f06b447
[Misc] add use_tqdm_on_load to reduce logs ( #14407 )
...
Signed-off-by: Aaron Pham <contact@aarnphm.xyz >
2025-03-08 05:57:46 -08:00
Isotr0py
03fe18ae0f
[VLM] Add TP support for Phi-4-MM ( #14453 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-03-08 05:57:14 -08:00
Alexander Matveev
cb8bdfade2
[V1] TPU - Add tensor parallel support via Ray ( #13618 )
...
Signed-off-by: Alexander Matveev <amatveev@redhat.com >
2025-03-08 08:19:38 -05:00
Cyrus Leung
33f227e16b
[CI/Build] Use a fixed seed to avoid flaky tests ( #14480 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-03-08 11:30:09 +00:00
Harry Mellor
cfd0ae8234
Add RLHF document ( #14482 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-03-08 09:51:39 +00:00
Lucas Wilkinson
7caff01a7b
[Build/BugFix] Fix hopper 12.8 build ( #14354 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com >
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com >
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com >
2025-03-08 08:11:56 +00:00
Harry Mellor
be0b399d74
Add training doc signposting to TRL ( #14439 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-03-08 07:35:07 +00:00
Jee Jee Li
b8b0ccbd2d
[Bugfix] Make the deviceprofiler include LoRA memory. ( #14469 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-03-08 07:12:22 +00:00
Robin
c908a07f57
[Doc] Added QwQ-32B to the supported models list in the reasoning out… ( #14479 )
...
Signed-off-by: WangErXiao <863579016@qq.com >
2025-03-08 07:07:32 +00:00
Robin
7b6fd6e486
[Doc]add doc for Qwen models tool calling ( #14478 )
...
Signed-off-by: WangErXiao <863579016@qq.com >
2025-03-08 06:58:46 +00:00
Harry Mellor
47512b3200
Default to generation_config from model ( #12622 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-03-08 14:46:15 +08:00
Roger Meier
3b9c6c6947
[CI/Build] refactor: set timezone of container to UTC ( #12888 )
...
Signed-off-by: Roger Meier <r.meier@siemens.com >
2025-03-07 22:42:01 -08:00
Aviv Keshet
4aae667668
[core] add extra_args to SamplingParams ( #13300 )
...
Signed-off-by: Aviv Keshet <akeshet@scaledcognition.com >
2025-03-08 14:41:18 +08:00
Cody Yu
9f3bc0f58c
[MISC][V1] Register process killing handler only in the main thread ( #14380 )
...
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com >
2025-03-07 22:40:06 -08:00
Mathis Felardos
980385f8c1
[Bugfix][Disaggregated] Add a check in send_kv_caches_and_hidden_states and fix the reshape of the KVCache ( #14369 )
...
Signed-off-by: Mathis Felardos <mathis@mistral.ai >
2025-03-07 22:39:31 -08:00
Tyler Michael Smith
ca7a2d5f28
Revert "[Perf] Reduce MLA CPU overheads in V1 ( #14384 )" ( #14471 )
2025-03-07 22:18:53 -08:00
Tyler Michael Smith
333681408f
[Bugfix][V1] Handle MLA in kv_cache_interface ( #14462 )
...
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com >
2025-03-07 22:18:25 -08:00
afeldman-nm
ef64044079
[V1] Prompt logprobs + APC compatibility; prompt logprobs reqs cannot fill APC ( #13949 )
2025-03-08 01:48:12 +00:00
yarongmu-google
66e16a038e
[Bugfix] Fix torch_xla which can't handle None seed introduced in #14274 ( #14459 )
...
Signed-off-by: Yarong Mu <ymu@google.com >
2025-03-07 23:17:04 +00:00
Mark McLoughlin
e1f0835ae0
[V1][Metrics] Fix traceback with preemptions+LoRA ( #14220 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-03-07 15:36:16 -05:00
Nick Hill
8ed5421aaa
[V1] Eagerly remove finished requests from the batch ( #14388 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-03-07 10:56:00 -08:00
youkaichao
c6359e8ca6
[v1] torch.compile integration explanation ( #14437 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com >
2025-03-08 01:55:50 +08:00
Jee Jee Li
952a074980
[Misc] Add Phi4-MM example ( #14343 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-03-07 17:28:52 +00:00
Jinzhen Lin
d0feea31c7
[Kernel] optimize performance of gptq marlin kernel when n is small ( #14138 )
...
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com >
2025-03-07 11:53:38 -05:00
Jeremy Arnold
58abe35455
[Benchmarks] Make detokenization optional in benchmark scripts ( #11697 )
...
Signed-off-by: Jeremy Arnold <Jeremy.Arnold@amd.com >
2025-03-07 08:09:00 -08:00
York-RDWang
f7ebad2307
[Doc] Update prefix_caching.md to match the example image ( #14420 )
2025-03-07 15:29:00 +00:00
Aaron Pham
80e9afb5bc
[V1][Core] Support for Structured Outputs ( #12388 )
...
Signed-off-by: Aaron Pham <contact@aarnphm.xyz >
Signed-off-by: Russell Bryant <rbryant@redhat.com >
Co-authored-by: Russell Bryant <rbryant@redhat.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
2025-03-07 07:19:11 -08:00
iefgnoix
1e3598edeb
Use the optimized block sizes after tuning the kernel. ( #14329 )
2025-03-07 13:25:13 +00:00
Harry Mellor
f7a6bd0fa1
Fix missing kv_caches and attn_metadata in OpenVINOCausalLM ( #14271 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-03-07 12:30:42 +00:00
Aleksandr Malyshev
0ca3b8e01c
[BUGFIX] Skip tokenization support for throughput benchmark ( #12712 )
...
Signed-off-by: root <root@banff-cyxtera-s73-5.ctr.dcgpu >
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com >
Co-authored-by: root <root@banff-cyxtera-s73-5.ctr.dcgpu >
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com >
2025-03-07 02:51:47 -08:00
மனோஜ்குமார் பழனிச்சாமி
cc10281498
[Misc] Set default value of seed to None ( #14274 )
...
Signed-off-by: மனோஜ்குமார் பழனிச்சாமி <smartmanoj42857@gmail.com >
2025-03-07 10:40:01 +00:00
Cyrus Leung
05fb6718f0
[Bugfix] Clean up multi-modal processors ( #14417 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-03-07 10:33:38 +00:00
Jee Jee Li
12c29a881f
[Bugfix] Further clean up LoRA test ( #14422 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-03-07 10:30:55 +00:00
Peng Li
70da0c0748
correct wrong markdown syntax ( #14414 )
...
Signed-off-by: vincent-pli <justdoit.pli@gmail.com >
2025-03-07 08:01:18 +00:00
Cyrus Leung
c1588a2c94
[GH] Auto-apply multi-modality label to relevant PRs ( #14402 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-03-07 15:26:32 +08:00
Ilya Lavrenov
8ca7a71df7
OpenVINO: added CPU-like conditions ( #14338 )
...
Signed-off-by: Ilya Lavrenov <ilya.lavrenov@intel.com >
2025-03-06 22:24:49 -08:00
Isotr0py
63137cd922
[Build] Add nightly wheel fallback when latest commit wheel unavailable ( #14358 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-03-06 22:10:57 -08:00
Jee Jee Li
ddd1ef66ec
[Bugfix] Fix JambaForCausalLM LoRA ( #14370 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-03-06 22:05:47 -08:00
Lucas Wilkinson
e5e03c2c1b
[BugFix] Illegal Memory Access in the blockwise cutlass fp8 GEMMs ( #14396 )
2025-03-06 21:56:06 -08:00
Luka Govedič
e1744502c2
[FP8] Refactor apply_fp8_linear and apply_fp8_linear_generic into an object ( #14390 )
...
Signed-off-by: luka <luka@neuralmagic.com >
2025-03-07 05:20:16 +00:00
Lucas Wilkinson
dae6896977
[Perf] Reduce MLA CPU overheads in V1 ( #14384 )
...
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com >
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-03-06 19:59:14 -08:00
Brayden Zhong
c34eeec58d
[Bugfix] Correctly call cudaProfilerStop in benchmarks script ( #14183 )
...
Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca >
2025-03-07 00:42:49 +00:00
Daniel Li
ad60bbb2b2
[Doc] Fix a typo ( #14385 )
2025-03-06 16:31:52 -08:00
Chengji Yao
0578e5a462
[Hardware][TPU]Enable ragged paged attention kernel and resolve recompilation issue ( #14310 )
...
Signed-off-by: Chengji Yao <chengjiyao@google.com >
2025-03-06 23:31:05 +00:00
Michael Goin
04222984f8
[Docs] Add nsight guide to profiling docs ( #14298 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-03-06 14:19:58 -08:00
Michael Goin
6832707e90
[V1][Bugfix] Standardize quantized kv cache rejection for attention backends ( #14221 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-03-06 14:18:29 -08:00
Michael Goin
6b2ef5cd17
[Bug] Fix Attention when ignored in by quant_method ( #14313 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-03-06 14:18:06 -08:00
Tyler Michael Smith
958adce478
[Bugfix] Fix use_direct_call condition in FusedMoE layer for ( #14382 )
...
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com >
2025-03-06 14:17:21 -08:00
Tyler Michael Smith
99b0915d3b
[Kernel] Add needs_fixed_stride_order tag to most GEMMs ( #14306 )
...
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com >
2025-03-06 14:17:09 -08:00
Thomas Parnell
8ca2b21c98
[CI] Disable spawn when running V1 Test ( #14345 )
...
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com >
2025-03-06 21:52:46 +00:00
Michael Goin
d9292786e1
[CI/Build] Use uv python for docker rather than ppa:deadsnakes/ppa ( #13569 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-03-06 16:08:36 -05:00
Tyler Michael Smith
cc2f9b32c8
[Distributed] Add enable_expert_parallel arg ( #14305 )
...
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com >
2025-03-06 18:54:45 +00:00
Himanshu Jaju
cd579352bf
[V1] Do not detokenize if sampling param detokenize is False ( #14224 )
...
Signed-off-by: Himanshu Jaju <hj@mistral.ai >
Signed-off-by: Nick Hill <nhill@redhat.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
2025-03-06 10:40:24 -08:00
Ying Zhong
9f1710f1ac
Fix mla prefill context performance ( #13897 )
...
Signed-off-by: ZhongYingMatrix <zhongyingmatrix@gmail.com >
2025-03-06 09:35:49 -08:00
Thomas Parnell
e642ec962c
Add authors to license header. ( #14371 )
...
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com >
Co-authored-by: Burkhard Ringlein <ngl@zurich.ibm.com >
Co-authored-by: Jan van Lunteren <jvl@zurich.ibm.com >
2025-03-06 08:43:09 -08:00
Dilip Gowda Bhagavan
ada19210a3
Adding cpu inference with VXE ISA for s390x architecture ( #12613 )
...
Signed-off-by: Dilip Gowda Bhagavan <dilip.bhagavan@ibm.com >
Signed-off-by: Rishika Kedia <rishika.kedia@in.ibm.com >
Co-authored-by: Rishika Kedia <rishika.kedia@in.ibm.com >
2025-03-06 08:40:53 -08:00
Harry Mellor
bf0560bda9
Reinstate best_of for V0 ( #14356 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-03-06 08:34:22 -08:00
youkaichao
151b08e0fe
[RLHF] use worker_extension_cls for compatibility with V0 and V1 ( #14185 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com >
2025-03-07 00:32:46 +08:00
Jitse Klomp
81b2f4a45f
[Doc] Fix date typo in README.md ( #14366 )
...
Signed-off-by: Jitse Klomp <jitse.klomp@conclusionxforce.nl >
2025-03-06 08:29:57 -08:00
Cyrus Leung
82551ad616
[Core] Don't use cache during multi-modal profiling ( #14336 )
2025-03-06 08:03:31 -08:00
courage17340
caac5c2e59
[Bugfix][Core] fix abort_seq_group and memory leak when n>1 ( #14326 )
...
Signed-off-by: courage17340 <courage17340@163.com >
2025-03-06 23:59:32 +08:00
Thomas Parnell
6bd1dd9d26
[Kernel] [V1] Improved performance for V1 Triton (ROCm) backend ( #14152 )
2025-03-06 07:39:16 -08:00
Irina Yuryeva
4f27044aab
[Doc] Correct beam_search using in generative_models.md ( #14363 )
2025-03-06 15:37:10 +00:00
Yanyi Liu
0ddc991f5c
[Doc] Update reasoning with stream example to use OpenAI library ( #14077 )
...
Signed-off-by: liuyanyi <wolfsonliu@163.com >
2025-03-06 13:20:37 +00:00
Nicolò Lucchesi
fa82b93853
[Frontend][Docs] Transcription API streaming ( #13301 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-03-06 10:39:35 +00:00
Nicolò Lucchesi
69ff99fdcd
[Core] Optimizing cross-attention QKVParallelLinear computation ( #12325 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
Signed-off-by: NickLucche <nick@nlucches-4xa100.c.openshift-330514.internal>
Co-authored-by: NickLucche <nick@nlucches-4xa100.c.openshift-330514.internal>
2025-03-06 09:37:26 +00:00
lkchen
5d802522a7
[V1][VLM][Pixtral-HF] Support Pixtral-HF on V1 ( #14275 )
...
Signed-off-by: Linkun Chen <github@lkchen.net >
2025-03-06 08:58:41 +00:00
kYLe
1769928079
[Model] Update Paligemma multimodal processing with PromptUpdate ( #14015 )
...
Signed-off-by: Kyle Huang <kylhuang@nvidia.com >
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-03-06 08:31:38 +00:00
Pavani Majety
ed6ea06577
[Hardware] Update the flash attn tag to support Blackwell ( #14244 )
2025-03-05 22:01:37 -08:00
Nicolò Lucchesi
5ee10e990d
[Bugfix][CI] ALiBi test case in xformers multi_query_kv_attention ( #11301 )
2025-03-05 20:00:53 -08:00
Varun Sundar Rabindranath
3dbd2d813a
[V1] LoRA - Enable more V1 tests ( #14315 )
...
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com >
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com >
2025-03-06 11:55:42 +08:00
Ce Gao
f5f7f00cd9
[Bugfix][Structured Output] Support outlines engine with reasoning outputs for DeepSeek R1 ( #14114 )
2025-03-06 03:49:20 +00:00
Rui Qiao
abcc61e0af
[misc] Mention ray list nodes command to troubleshoot ray issues ( #14318 )
...
Signed-off-by: Rui Qiao <ruisearch42@gmail.com >
2025-03-06 02:00:36 +00:00
Lucas Wilkinson
f6bb18fd9a
[BugFix] MLA + V1, illegal memory access and accuracy issues ( #14253 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-03-05 17:10:13 -08:00
Yuan Tang
71eaf8969b
[Build] Add UV_HTTP_TIMEOUT to avoid timeout during installation ( #13850 )
...
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com >
2025-03-05 17:09:29 -08:00
Michael Goin
ca100c90fe
Add benchmark for DeepGEMM and vLLM Block FP8 Dense GEMM ( #13917 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-03-05 17:08:51 -08:00
Russell Bryant
ffad94397d
[CI/Build] Use spawn multiprocessing mode for V1 test pipeline ( #14243 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-03-05 17:08:02 -08:00
Lucas Wilkinson
4dacaa4a83
[BugFix] Fix prefix caching V0 MLA ( #14255 )
...
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com >
Co-authored-by: Ying Zhong <zhongyingmatrix@gmail.com >
2025-03-05 17:07:42 -08:00
Tyler Michael Smith
a7ea35aa67
[Bugfix] Remove num_tokens_across_dp ( #14302 )
...
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com >
2025-03-05 23:55:55 +00:00
pyc96
1e3e76b6cc
[Bugfix] Fix DeepSeek MTP crash when using TP1ModelRunner with CUDA graph due to shape mismatch ( #14237 )
...
Signed-off-by: pyc96 <pychen96@gmail.com >
2025-03-05 22:22:40 +00:00
Lu Fang
53ea6ad830
[V1][Easy] Add empty allowed_token_ids in the v1 sampler test ( #14308 )
...
Signed-off-by: Lu Fang <lufang@fb.com >
2025-03-05 21:41:18 +00:00
Serena
1b7624bf5c
[misc] Add FlashMLA as a new option of VLLM_ATTENTION_BACKEND env ( #14267 )
2025-03-05 21:28:50 +00:00
Nick Hill
ac60dc7fe1
[V1][BugFix] Fix for mixed top_k batch ( #14301 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
Co-authored-by: Ye Cao <caoye.cao@alibaba-inc.com >
2025-03-05 20:43:04 +00:00
Vincent
a4f1ee35d6
Deprecate best_of Sampling Parameter in anticipation for vLLM V1 ( #13997 )
...
Signed-off-by: vincent-4 <vincentzhongy+githubvincent4@gmail.com >
Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-03-05 20:22:43 +00:00
Nick Hill
a32c8669ca
[V1][Minor] Remove obsolete FIXME comment ( #14304 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-03-05 11:59:23 -08:00
Simon Mo
ca2ca8de57
[Docs] Add Meta Slides ( #14297 )
...
Signed-off-by: simon-mo <simon.mo@hey.com >
2025-03-05 08:30:23 -08:00
Isotr0py
f71b00a19e
[Bugfix] Fix broken vision language example ( #14292 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-03-05 15:57:10 +00:00
DaividFrank
8f808cf86e
prefix_caching.md: Fixed typo ( #14293 )
...
Signed-off-by: Daivid Savernin-Frenk <daivid.frank@TurboNext.ai >
2025-03-05 15:43:13 +00:00
Jee Jee Li
7bab4bb048
[Misc] Add Qwen2MoeForCausalLM moe tuning support ( #14276 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-03-05 23:11:29 +08:00
Isotr0py
e17e4488bd
[LoRA] Remove linear hack outside transformers backend ( #14177 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-03-05 15:06:28 +00:00
Robert Shaw
257e200a25
[V1][Frontend] Add Testing For V1 Runtime Parameters ( #14159 )
...
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com >
2025-03-05 14:18:55 +00:00
Zhe Zhang
47d4a7e004
Small update for external_launcher backend docs ( #14288 )
2025-03-05 21:30:00 +08:00
Cyrus Leung
7f89a594dd
[Doc] [3/N] Refer code examples for common cases in dev multimodal processor ( #14278 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-03-05 12:29:50 +00:00
Iacopo Poli
961644e6a8
[Doc] Update nginx guide: remove privileged from vllm container run and add target GPU ID ( #14217 )
...
Signed-off-by: Iacopo Poli <iacopo@lighton.ai >
2025-03-05 11:44:10 +00:00
Lu Fang
8d6cd32b7b
[Bugfix][V1] Fix allowed_token_ids for v1 Sampler ( #14169 )
...
Signed-off-by: Lu Fang <lufang@fb.com >
2025-03-05 08:49:44 +00:00
Roger Wang
ec79b67c77
[Misc][V1] Avoid using envs.VLLM_USE_V1 in mm processing ( #14256 )
...
Signed-off-by: Roger Wang <ywang@roblox.com >
2025-03-05 07:37:16 +00:00
Benjamin Chislett
32985bed7c
[Frontend] Allow return_tokens_as_token_ids to be passed as a request param ( #14066 )
...
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai >
2025-03-05 06:30:40 +00:00
Michael Goin
dae9ec464c
Temporarily disable test_awq_gemm_opcheck ( #14251 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-03-05 06:10:35 +00:00
youkaichao
6eaf93020d
[platforms] improve rocm debugging info ( #14257 )
2025-03-04 21:32:18 -08:00
Tyler Michael Smith
72c62eae5f
[V1] EP/TP MoE + DP Attention ( #13931 )
2025-03-04 21:27:26 -08:00
Congcong Chen
0a995d5434
[Model] New model support for Phi-4-multimodal-instruct ( #14119 )
2025-03-04 20:57:01 -08:00
Cody Yu
ade3f7d988
[V1][Bugfix] Do not reset prefix caching metrics ( #14235 )
2025-03-05 04:39:13 +00:00
rainkert
0df25101d6
[Bugfix] Fix gptq_marlin for deepseek-v3 ( #13750 )
...
Signed-off-by: dangshunya <dangshunya@baichuan-inc.com >
Co-authored-by: dangshunya <dangshunya@baichuan-inc.com >
2025-03-05 12:25:53 +08:00
Michael Goin
e123aafdf0
Disable GPTQ AllSpark kernels for CUDA Compiler < 12.0 ( #14157 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-03-05 12:25:24 +08:00
Nishidha
5b143d33be
Moved numba from common requirements to cuda/rocm specific requirements ( #14199 )
...
Signed-off-by: Nishidha Panpaliya <nishidha.panpaliya@partner.ibm.com >
2025-03-05 12:25:00 +08:00
youkaichao
eb59b5a6cb
[misc] announce china meetup ( #14248 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com >
2025-03-05 10:33:50 +08:00
Michael Goin
fbfc3ee37e
[V1][TPU] TPU multimodal model support for ragged attention ( #14158 )
...
Signed-off-by: Michael Goin <mgoin64@gmail.com >
2025-03-04 19:58:48 -05:00
Sage Moore
3e1d223626
[ROCm] Disable a few more kernel tests that are broken on ROCm ( #14145 )
...
Signed-off-by: Sage Moore <sage@neuralmagic.com >
2025-03-04 23:37:55 +00:00
Tyler Michael Smith
4f5b059f14
Clean up unused padding_idx variables across many model definitions ( #13240 )
...
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com >
2025-03-04 21:27:00 +00:00
Kuntai Du
288ca110f6
[Security] Serialize using safetensors instead of pickle in Mooncake Pipe ( #14228 )
...
Signed-off-by: KuntaiDu <kuntai@uchicago.edu >
2025-03-04 21:10:32 +00:00
Mark McLoughlin
c2bd2196fc
[v1][Metrics] Add design doc ( #12745 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com >
2025-03-04 20:36:55 +00:00
Michael Goin
550c7ba3dc
[Docs] Update Dockerfile dependency image ( #14215 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-03-04 20:22:11 +00:00
Harry Mellor
e5b2f1601a
[Frontend] Do prompt_logprobs clamping for chat as well as completions ( #14225 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-03-04 20:13:06 +00:00
Harry Mellor
9badee53de
Fix performance when --generation-config is not None ( #14223 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-03-04 20:59:22 +01:00
Siyuan Liu
beebf4742a
[TPU][Profiler] Support start_profile/stop_profile in TPU worker ( #13988 )
...
Signed-off-by: Siyuan Liu <lsiyuan@google.com >
Co-authored-by: mgoin <mgoin64@gmail.com >
2025-03-04 14:40:06 -05:00
kushanam
f89978ad7c
add cutlass support for blackwell fp8 gemm ( #13798 )
2025-03-04 07:55:07 -08:00
lkchen
b3cf368d79
[V1][Molmo] Fix get_multimodal_embeddings() in molmo.py ( #14161 )
2025-03-04 15:43:59 +00:00
Mark McLoughlin
c8525f06fc
[V0][Metrics] Deprecate some questionable request time metrics ( #14135 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-03-04 15:11:33 +00:00
Nick Hill
5db6b2c961
[V1][BugFix] Fix remaining sync engine client shutdown errors/hangs ( #13869 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-03-04 15:06:47 +00:00
Michael Goin
6247bae6c6
[Bugfix] Restrict MacOS CPU detection ( #14210 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-03-04 22:25:27 +08:00
youkaichao
3610fb4930
[doc] add "Failed to infer device type" to faq ( #14200 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com >
2025-03-04 20:47:06 +08:00
youkaichao
71c4b40562
[sleep mode] error out with expandable_segments ( #14189 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com >
2025-03-04 18:54:19 +08:00
youkaichao
ac65bc92df
[platform] add debug logging during inferring the device type ( #14195 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com >
2025-03-04 18:39:16 +08:00
Michael Goin
f78c0be80a
Fix benchmark_moe.py tuning for CUDA devices ( #14164 )
2025-03-03 21:11:03 -08:00
Zhanwen Chen
66233af7b6
Use math.prod instead of np.prod for trivial ops ( #14142 )
2025-03-03 21:09:22 -08:00
Rui Qiao
bf13d40972
[core] Pass all driver env vars to ray workers unless excluded ( #14099 )
...
Signed-off-by: Rui Qiao <ruisearch42@gmail.com >
2025-03-04 11:44:17 +08:00
Cody Yu
989f4f430c
[Misc] Remove lru_cache in NvmlCudaPlatform ( #14156 )
...
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com >
2025-03-04 11:09:34 +08:00
Divakar Verma
bb5b640359
[core] moe fp8 block quant tuning support ( #14068 )
...
Signed-off-by: Divakar Verma <divakar.verma@amd.com >
2025-03-04 01:30:23 +00:00
Travis Johnson
c060b71408
[Model] Add support for GraniteMoeShared models ( #13313 )
...
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2025-03-04 08:04:52 +08:00
iefgnoix
79e4937c65
[v1] Add comments to the new ragged paged attention Pallas kernel ( #14155 )
...
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2025-03-03 23:00:55 +00:00
Qubitium-ModelCloud
cd1d3c3df8
[Docs] Add GPTQModel ( #14056 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
Co-authored-by: mgoin <mgoin64@gmail.com >
2025-03-03 21:59:09 +00:00
Michael Goin
19d98e0c7d
[Kernel] Optimize moe intermediate_cache usage ( #13625 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-03-03 16:29:53 -05:00
Michael Goin
2b04c209ee
[Bugfix] Allow shared_experts skip quantization for DeepSeekV2/V3 ( #14100 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-03-03 14:20:24 -07:00
Mark McLoughlin
ae122b1cbd
[WIP][[V1][Metrics] Implement max_num_generation_tokens, request_params_n, and request_params_max_tokens metrics ( #14055 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-03-03 19:04:45 +00:00
Nick Hill
872db2be0e
[V1] Simplify stats logging ( #14082 )
...
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-03-03 10:34:14 -08:00
Mark McLoughlin
2dfdfed8a0
[V0][Metrics] Deprecate some KV/prefix cache metrics ( #14136 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-03-03 18:25:46 +00:00
Mark McLoughlin
c41d27156b
[V0][Metrics] Remove unimplemented vllm:tokens_total ( #14134 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-03-03 17:50:22 +00:00
Harry Mellor
91373a0d15
Fix head_dim not existing in all model configs (Transformers backend) ( #14141 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-03-03 17:48:11 +00:00
TJian
848a6438ae
[ROCm] Faster Custom Paged Attention kernels ( #12348 )
2025-03-03 09:24:45 -08:00
Harry Mellor
98175b2816
Improve the docs for TransformersModel ( #14147 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-03-03 17:03:05 +00:00
Mark McLoughlin
4167252eaf
[V1] Refactor parallel sampling support ( #13774 )
...
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2025-03-03 08:15:27 -08:00
Cody Yu
f35f8e2242
[Build] Make sure local main branch is synced when VLLM_USE_PRECOMPILED=1 ( #13921 )
...
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com >
2025-03-03 16:43:14 +08:00
Mengqing Cao
b87c21fc89
[Misc][Platform] Move use allgather to platform ( #14010 )
...
Signed-off-by: Mengqing Cao <cmq0113@163.com >
2025-03-03 15:40:04 +08:00
wang.yuqi
e584b85afd
[Misc] duplicate code in deepseek_v2 ( #14106 )
2025-03-03 14:10:11 +08:00
Sheng Yao
09e56f9262
[Bugfix] Explicitly include "omp.h" for MacOS to avoid installation failure ( #14051 )
2025-03-02 17:35:01 -08:00
Harry Mellor
cf069aa8aa
Update deprecated Python 3.8 typing ( #13971 )
2025-03-02 17:34:51 -08:00
Ce Gao
bf33700ecd
[v0][structured output] Support reasoning output ( #12955 )
...
Signed-off-by: Ce Gao <cegao@tensorchord.ai >
2025-03-02 14:49:42 -05:00
qux-bbb
bc6ccb9878
[Doc] Source building add clone step ( #14086 )
...
Signed-off-by: qux-bbb <1147635419@qq.com >
2025-03-02 10:59:50 +00:00
Jun Duan
82fbeae92b
[Misc] Accurately capture the time of loading weights ( #14063 )
...
Signed-off-by: Jun Duan <jun.duan.phd@outlook.com >
2025-03-01 17:20:30 -08:00
Jee Jee Li
cc5e8f6db8
[Model] Add LoRA support for TransformersModel ( #13770 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-03-02 09:17:34 +08:00
Chen Zhang
d54990da47
[v1] Add __repr__ to KVCacheBlock to avoid recursive print ( #14081 )
2025-03-01 20:46:02 +00:00
Chen Zhang
b9f1d4294e
[v1][Bugfix] Only cache blocks that are not in the prefix cache ( #14073 )
2025-03-01 08:25:54 +00:00
Sage Moore
b28246f6ff
[ROCm][V1][Bugfix] Add get_builder_cls method to the ROCmAttentionBackend class ( #14065 )
...
Signed-off-by: Sage Moore <sage@neuralmagic.com >
2025-03-01 07:18:32 +00:00
Woosuk Kwon
3b5567a209
[V1][Minor] Do not print attn backend twice ( #13985 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-03-01 07:09:14 +00:00
Isotr0py
fdcc405346
[Doc] Consolidate whisper and florence2 examples ( #14050 )
2025-02-28 22:49:15 -08:00
Kuntai Du
8994dabc22
[Documentation] Add more deployment guide for Kubernetes deployment ( #13841 )
...
Signed-off-by: KuntaiDu <kuntai@uchicago.edu >
Signed-off-by: Kuntai Du <kuntai@uchicago.edu >
2025-03-01 06:44:24 +00:00
Li, Jiang
02296f420d
[Bugfix][V1][Minor] Fix shutting_down flag checking in V1 MultiprocExecutor ( #14053 )
2025-02-28 22:31:01 -08:00
YajieWang
6a92ff93e1
[Misc][Kernel]: Add GPTQAllSpark Quantization ( #12931 )
2025-02-28 22:30:59 -08:00
Jee Jee Li
6a84164add
[Bugfix] Add file lock for ModelScope download ( #14060 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-03-01 06:10:28 +00:00
Brayden Zhong
f64ffa8c25
[Docs] Add pipeline_parallel_size to optimization docs ( #14059 )
...
Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca >
2025-03-01 05:43:54 +00:00
Luka Govedič
bd56c983d6
[torch.compile] Fix RMSNorm + quant fusion in the non-cutlass-fp8 case, rename RedundantReshapesPass to NoopEliminationPass ( #10902 )
...
Signed-off-by: luka <luka@neuralmagic.com >
2025-02-28 16:20:11 -07:00
Rui Qiao
084bbac8cc
[core] Bump ray to 2.43 ( #13994 )
...
Signed-off-by: Rui Qiao <ruisearch42@gmail.com >
2025-02-28 21:47:44 +00:00
Chen Zhang
28943d36ce
[v1] Move block pool operations to a separate class ( #13973 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com >
2025-02-28 20:53:31 +00:00
Andrey Talman
b526ca6726
Add RELEASE.md ( #13926 )
...
Signed-off-by: atalman <atalman@fb.com >
2025-02-28 12:25:50 -08:00
Chen Zhang
e7bd944e08
[v1] Cleanup the BlockTable in InputBatch ( #13977 )
...
Signed-off-by: Chen Zhang <zhangch99@outlook.com >
2025-02-28 19:03:16 +00:00
iefgnoix
c3b6559a10
[V1][TPU] Integrate the new ragged paged attention kernel with vLLM v1 on TPU ( #13379 )
...
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com >
Signed-off-by: mgoin <mgoin64@gmail.com >
Co-authored-by: mgoin <mgoin64@gmail.com >
2025-02-28 11:01:36 -07:00
Harry Mellor
4be4b26cb7
Fix entrypoint tests for embedding models ( #14052 )
2025-02-28 08:56:44 -08:00
Brayden Zhong
2aed2c9fa7
[Doc] Fix ROCm documentation ( #14041 )
...
Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca >
2025-02-28 16:42:07 +00:00
Yang Liu
9b61dd41e7
[Bugfix] Initialize attention bias on the same device as Query/Key/Value for QwenVL Series ( #14031 )
2025-02-28 07:36:08 -08:00
Cyrus Leung
f7bee5c815
[VLM][Bugfix] Enable specifying prompt target via index ( #14038 )
2025-02-28 07:35:55 -08:00
Jee Jee Li
e0734387fb
[Bugfix] Fix MoeWNA16Method activation ( #14024 )
2025-02-28 15:22:42 +00:00
Harry Mellor
f58f8b5c96
Update AutoAWQ docs ( #14042 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-02-28 15:20:29 +00:00
Thibault Schueller
b3f7aaccd0
[V1][Minor] Restore V1 compatibility with LLMEngine class ( #13090 )
2025-02-28 00:52:25 -08:00
Kacper Pietkun
b91660ddb8
[Hardware][Intel-Gaudi] Regional compilation support ( #13213 )
2025-02-28 00:51:49 -08:00
Harry Mellor
76c89fcadd
Use smaller embedding model when not testing model specifically ( #13891 )
2025-02-28 00:50:43 -08:00
Mathis Felardos
b9e41734c5
[Bugfix][Disaggregated] patch the inflight batching on the decode node in SimpleConnector to avoid hangs in SimpleBuffer (nccl based) ( #13987 )
...
Signed-off-by: Mathis Felardos <mathis@mistral.ai >
2025-02-28 07:53:45 +00:00
Cyrus Leung
1088f06242
[Doc] Move multimodal Embedding API example to Online Serving page ( #14017 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-02-28 07:12:04 +00:00
Travis Johnson
73e0225ee9
[Bugfix] Check that number of images matches number of <|image|> tokens with mllama ( #13911 )
...
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com >
2025-02-28 04:00:45 +00:00
Roger Wang
6c85da3a18
[V1]SupportsV0Only protocol for model definitions ( #13959 )
...
Signed-off-by: Roger Wang <ywang@roblox.com >
2025-02-27 20:02:15 -05:00
Jee Jee Li
67fc426845
[Misc] Print FusedMoE detail info ( #13974 )
2025-02-27 18:53:13 -05:00
Benjamin Chislett
9804145cac
[Model][Speculative Decoding] Expand DeepSeek MTP code to support k > n_predict ( #13626 )
...
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai >
2025-02-27 15:28:08 -08:00
Lucas Wilkinson
2e94b9cfbb
[Attention] Flash MLA for V1 ( #13867 )
...
Signed-off-by: Yang Chen <yangche@fb.com >
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com >
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Co-authored-by: Yang Chen <yangche@fb.com >
2025-02-27 23:03:41 +00:00
qli88
8294773e48
[core] Perf improvement for DSv3 on AMD GPUs ( #13718 )
...
Signed-off-by: qli88 <qiang.li2@amd.com >
2025-02-27 22:14:30 +00:00
Woosuk Kwon
cd813c6d4d
[V1][Minor] Minor cleanup for GPU Model Runner ( #13983 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-02-27 13:11:40 -08:00
Sage Moore
38acae6e97
[ROCm] Fix the Kernels, Core, and Prefix Caching AMD CI groups ( #13970 )
...
Signed-off-by: Sage Moore <sage@neuralmagic.com >
2025-02-27 20:31:47 +00:00
Cyrus Leung
a2dd48c386
[VLM] Deprecate legacy input mapper for OOT multimodal models ( #13979 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-02-27 19:14:55 +00:00
dependabot[bot]
126f6beeb4
Bump azure/setup-helm from 4.2.0 to 4.3.0 ( #13742 )
...
Signed-off-by: dependabot[bot] <support@github.com >
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-02-27 19:04:10 +00:00
Yang Chen
58d1b2aa77
[Attention] MLA support for V1 ( #13789 )
...
Signed-off-by: Yang Chen <yangche@fb.com >
2025-02-27 13:14:17 -05:00
Cyrus Leung
f1579b229d
[VLM] Generalized prompt updates for multi-modal processor ( #13964 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-02-27 17:44:25 +00:00
Isotr0py
7864875879
[Bugfix] Fix qwen2.5-vl overflow issue ( #13968 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-02-27 17:30:39 +00:00
Noam Gat
1dd422b64a
Update LMFE version to v0.10.11 to support new versions of transforme… ( #13930 )
2025-02-27 17:16:12 +00:00
Rui Qiao
06c8f8d885
[bugfix] Fix profiling for RayDistributedExecutor ( #13945 )
...
Signed-off-by: Rui Qiao <ruisearch42@gmail.com >
2025-02-28 01:01:21 +08:00
Harry Mellor
5677c9bb3e
Deduplicate .pre-commit-config.yaml's exclude ( #13967 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-02-27 16:27:47 +00:00
王博伟
512d77d582
Update quickstart.md ( #13958 )
2025-02-27 16:05:11 +00:00
Szymon Ożóg
7f0be2aa24
[Model] Deepseek GGUF support ( #13167 )
2025-02-27 02:08:35 -08:00
Isotr0py
edf309ebbe
[VLM] Support multimodal inputs for Florence-2 models ( #13320 )
2025-02-27 02:06:41 -08:00
Michael Goin
788f284b53
Fix test_block_fp8.py test for MoE ( #13915 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-02-27 18:00:00 +08:00
Yang Zheng
4b1d141f49
[PP] Correct cache size check ( #13873 )
...
Signed-off-by: Yang Zheng <zhengy.gator@gmail.com >
2025-02-27 17:47:29 +08:00
Chauncey
10c3b8c1cf
[Misc] fixed 'required' is an invalid argument for positionals ( #13948 )
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com >
2025-02-27 09:06:49 +00:00
Brayden Zhong
a7f37314b7
[CI/Build] Add examples/ directory to be labelled by mergify ( #13944 )
...
Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca >
2025-02-27 08:24:11 +00:00
Mark McLoughlin
cd711c48b2
[V1][Metrics] Handle preemptions ( #13169 )
2025-02-26 20:04:59 -08:00
Sage Moore
378b3ef6f8
[ROCm][V1] Update reshape_and_cache to properly work with CUDA graph padding ( #13922 )
2025-02-26 20:04:12 -08:00
Rui Qiao
c9944acbf9
[misc] Rename Ray ADAG to Compiled Graph ( #13928 )
2025-02-26 20:03:28 -08:00
Michael Goin
ca377cf1b9
Use CUDA 12.4 as default for release and nightly wheels ( #12098 )
2025-02-26 19:06:37 -08:00
ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟
a31614e386
[ROCm][Quantization][Kernel] Use FP8 FNUZ when OCP flag is 0 or undefined ( #13851 )
...
Signed-off-by: Hollow Man <hollowman@opensuse.org >
2025-02-27 10:39:10 +08:00
Lucas Wilkinson
f95903909f
[Kernel] FlashMLA integration ( #13747 )
...
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com >
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-02-27 10:35:08 +08:00
Woosuk Kwon
b382a7f28f
[BugFix] Make FP8 Linear compatible with torch.compile ( #13918 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-02-26 13:48:55 -08:00
Wallas Henrique
4cb6fa0a9c
[Bugfix] Backend option to disable xgrammar any_whitespace ( #12744 )
...
Signed-off-by: Wallas Santos <wallashss@ibm.com >
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com >
Co-authored-by: Joe Runde <Joseph.Runde@ibm.com >
2025-02-26 10:52:34 -08:00
Chauncey
d08b285adf
[Misc] fixed qwen_vl_utils parameter error ( #13906 )
2025-02-26 08:31:53 -08:00
Chenyaaang
b27122acc2
[TPU] use torch2.6 with whl package ( #13860 )
...
Signed-off-by: Chenyaaang <llccyy1212@gmail.com >
2025-02-26 08:18:54 -05:00
Cyrus Leung
934bb99c71
[Bugfix] Update expected token counts for Ultravox tests ( #13895 )
2025-02-26 04:56:50 -08:00
Joe Runde
3f808cc044
[Bugfix] Do not crash V0 engine on input errors ( #13101 )
...
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com >
2025-02-26 19:07:29 +08:00
Brayden Zhong
ec8a5e5386
[Misc]: Add support for goodput on guided benchmarking + TPOT calculation refactor ( #13736 )
...
Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca >
2025-02-26 19:06:47 +08:00
Florian Greinacher
215bf150a6
[Bugfix] Handle None parameters in Mistral function calls. ( #13786 )
2025-02-26 03:06:21 -08:00
Harry Mellor
0ecdd98031
Add comments on accessing kv_cache and attn_metadata ( #13887 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-02-26 18:41:02 +08:00
Cyrus Leung
7b700ec8c8
[Bugfix] Add test example for Ultravox v0.5 ( #13890 )
2025-02-26 02:31:43 -08:00
Roger Wang
7ca1da020f
[Misc] Fix input processing for Ultravox ( #13871 )
2025-02-25 23:56:34 -08:00
Jee Jee Li
5157338ed9
[Misc] Improve LoRA spelling ( #13831 )
2025-02-25 23:43:01 -08:00
Seth Kimmel
e206b54331
[v0][Core] Use xgrammar shared context to avoid copy overhead for offline engine ( #13837 )
...
Signed-off-by: Seth Kimmel <seth.kimmel3@gmail.com >
2025-02-26 14:58:24 +08:00
Sage Moore
1d35662e6d
[ROCm] Disable chunked prefill/prefix caching when running MLA on non-cuda platforms ( #13844 )
...
Signed-off-by: Sage Moore <sage@neuralmagic.com >
2025-02-26 14:56:58 +08:00
Albert
e656f638de
[Doc] fix the incorrect module path of tensorize_vllm_model ( #13863 )
2025-02-25 22:56:19 -08:00
Harry Mellor
145944cb94
Improve pipeline partitioning ( #13839 )
2025-02-25 18:53:56 -08:00
Henry Tsang
094b7d9496
[Kernel][Build/CI] Bump CUTLASS to 3.8 and add initializers for cutlass epilogues ( #13797 )
2025-02-25 18:52:03 -08:00
Chenguang Li
e1fe7591f2
[Misc]Code Cleanup ( #13859 )
...
Signed-off-by: noemotiovon <noemotiovon@gmail.com >
Co-authored-by: noemotiovon <noemotiovon@gmail.com >
2025-02-26 10:44:30 +08:00
Lily Liu
5629f26df7
[V1][Spec Decode] Change Spec Decode Rejection Sampling API ( #13729 )
2025-02-25 18:14:48 -08:00
Rui Qiao
9ba28043b5
[misc] Show driver IP info when Ray fails to allocate driver worker ( #13858 )
...
Signed-off-by: Rui Qiao <ruisearch42@gmail.com >
2025-02-26 09:53:43 +08:00
Harry Mellor
24679788ed
DeepSeek V2/V3/R1 only place lm_head on last pp rank ( #13833 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-02-26 01:24:57 +00:00
Michael Goin
07c4353057
[Model] Support Grok1 ( #13795 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-02-26 01:07:12 +00:00
Harry Mellor
34e3494e70
Fix failing MyGemma2Embedding test ( #13820 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-02-25 12:33:03 -08:00
Liangfu Chen
f75aa72732
[Neuron] Add custom_ops for neuron backend ( #13246 )
...
Signed-off-by: Liangfu Chen <liangfc@amazon.com >
Co-authored-by: George Novack <gnovack@amazon.com >
Co-authored-by: Aoyu Zhang <aoyuzhan@amazon.com >
2025-02-25 11:47:49 -08:00
Chen1022
340e39e387
Fix string parsing error ( #13825 )
2025-02-25 08:20:29 -08:00
Cyrus Leung
f4133ce4e5
[Bugfix] Revert inspection code in #13743 ( #13832 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-02-26 00:18:50 +08:00
Wen Sun
6522d55b6f
Fix /v1/audio/transcriptions Bad Request Error ( #13811 )
2025-02-25 06:03:33 -08:00
Isotr0py
6ff518626c
[Bugfix] Fix deepseek-vl2 inference with more than 2 images ( #13818 )
2025-02-25 06:03:02 -08:00
Nichols A. Romero
fa82074167
[Bugfix] Flush TunableOp results before worker processes are destroyed. ( #13623 )
...
Signed-off-by: Nichols A. Romero <nick.romero@amd.com >
2025-02-25 11:08:20 +00:00
Junlin Zhou
75e9d49796
[Bugfix] Initialize attention bias on the same device as Query/Key/Value ( #13468 )
2025-02-25 02:13:09 -08:00
Chen1022
32c3b6bfd1
[Misc]Clarify Error Handling for Non-existent Model Paths and HF Repo IDs ( #13724 )
...
Signed-off-by: Chen-0210 <chenjincong11@gmail.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2025-02-25 10:12:19 +00:00
Jee Jee Li
37b6cb4985
[CI/Build] Fix V1 LoRA failure ( #13767 )
2025-02-25 02:01:15 -08:00
Gregory Shtrasberg
aabeb2688f
[ROCm][Quantization][Kernel] Using HIP FP8 header ( #12593 )
2025-02-25 00:39:59 -08:00
Jiayi Yao
2f42a4888c
[Feature] Support KV cache offloading and disagg prefill with LMCache connector. ( #12953 )
2025-02-25 00:38:42 -08:00
Rui Qiao
3173c3b34e
[misc] Clean up ray compiled graph type hints ( #13731 )
2025-02-25 00:37:08 -08:00
Shanshan Shen
2d87d7d1ac
[Bugfix] Modify modelscope api usage in transformer_utils ( #13807 )
2025-02-25 00:36:07 -08:00
Russell Bryant
aab392774b
[Core] xgrammar: Expand list of unsupported jsonschema keywords ( #13783 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2025-02-25 08:21:25 +00:00
Cyrus Leung
6724e79164
[Misc] Check that the model can be inspected upon registration ( #13743 )
2025-02-25 00:18:19 -08:00
Varun Sundar Rabindranath
03f48b3db6
[Core] LoRA V1 - Add add/pin/list/remove_lora functions ( #13705 )
2025-02-25 00:18:02 -08:00
Michael Goin
4d251ad00e
Fix CompressedTensorsWNA16MoE with grouped scales ( #13769 )
2025-02-25 00:17:14 -08:00
Michael Goin
18e505930d
[Bugfix] Support MLA for CompressedTensorsWNA16 ( #13725 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-02-25 06:10:31 +00:00
Lucas Wilkinson
4a8cfc7551
[Bugfix] Fix deepseek-v2 error: "missing 1 required positional argument: 'residual'" ( #13802 )
2025-02-24 20:33:59 -08:00
Mark McLoughlin
bc32bc73aa
[V1][Metrics] Implement vllm:lora_requests_info metric ( #13504 )
2025-02-24 20:01:33 -08:00
wangxiyuan
ab1091d5f2
[Misc][Attention][Quantization] init property earlier ( #13733 )
...
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com >
2025-02-25 03:19:30 +00:00
Tyler Michael Smith
1e15aaef56
[Bugfix][Quantization] Fix FP8 + EP ( #13784 )
...
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com >
2025-02-25 10:54:17 +08:00
cjackal
51010a1807
[Misc] set single whitespace between log sentences ( #13771 )
...
Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com >
2025-02-25 10:26:12 +08:00
Eli Boyarski
7196a3b1db
[Doc] arg_utils.py: fixed a typo ( #13785 )
2025-02-24 18:23:04 -08:00
Harry Mellor
cdc1fa12eb
Remove unused kwargs from model definitions ( #13555 )
2025-02-24 17:13:52 -08:00
Robert Shaw
f61528d46d
[Misc][Chore] Clean Up AsyncOutputProcessing Logs ( #13780 )
2025-02-24 16:39:07 -08:00
Robert Shaw
1f0ae3ed0a
[Misc] Clean Up EngineArgs.create_engine_config ( #13734 )
...
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com >
2025-02-24 13:52:21 -05:00
Michael Goin
db986c19ea
Fix precommit fail in fused_moe intermediate_cache2 chunking ( #13772 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-02-24 09:25:47 -08:00
Roger Wang
227578480d
Revert "[V1][Core] Fix memory issue with logits & sampling" ( #13775 )
2025-02-24 09:16:05 -08:00
afeldman-nm
befc402d34
[V1] V1 engine implements parallel sampling (AsyncLLM and LLMEngine) ( #10980 )
...
Signed-off-by: Andrew Feldman <afeldman@neuralmagic.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
2025-02-24 08:29:41 -08:00
Nicolò Lucchesi
444b0f0f62
[Misc][Docs] Raise error when flashinfer is not installed and VLLM_ATTENTION_BACKEND is set ( #12513 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2025-02-24 10:43:21 -05:00
Zhonghua Deng
ccc00515fd
[BugFix] Illegal memory access for MoE On H20 ( #13693 )
2025-02-24 07:37:32 -08:00
Jongseok Park
781096e385
Expert Parallelism (EP) Support for DeepSeek V2 ( #12583 )
2025-02-24 07:33:20 -08:00
Roger Meier
7940d8a6a7
[CI/Build] add python-json-logger to requirements-common ( #12842 )
2025-02-24 06:10:33 -08:00
Roger Meier
c0e3ecd6d2
[Bugfix] fix(logging): add missing opening square bracket ( #13011 )
2025-02-24 06:10:25 -08:00
Mengqing Cao
23eca9cf68
[model][refactor] remove cuda hard code in models and layers ( #13658 )
2025-02-24 06:10:14 -08:00
Roger Wang
437b76ff59
[V1][Core] Fix memory issue with logits & sampling ( #13721 )
2025-02-24 06:10:06 -08:00
Kevin H. Luu
f90a375593
[ci] Add logic to change model to S3 path only when S3 CI env var is on ( #13727 )
...
Signed-off-by: <>
Co-authored-by: EC2 Default User <ec2-user@ip-172-31-63-253.us-west-2.compute.internal >
2025-02-24 06:32:11 +00:00
Huy Do
e7ef74e26e
Fix some issues with benchmark data output ( #13641 )
...
Signed-off-by: Huy Do <huydhn@gmail.com >
2025-02-24 10:23:18 +08:00
Nick Hill
cbae7af552
[V1][BugFix] Fix engine core client shutdown hangs ( #13298 )
...
Even though ZMQ context.destroy() is meant to close open sockets before terminating the context, it appears to be necessary to do this explicitly or else it can hang in the context.term() method.
Close zmq sockets explicitly before terminating context, make shutdown of client resource more robust, shut down engine core process prior to terminating zmq context.
Signed-off-by: Nick Hill <nhill@redhat.com >
2025-02-23 13:07:43 -08:00
youkaichao
eb24dc4a45
[v1] torchrun compatibility ( #13642 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com >
2025-02-23 22:47:24 +08:00
Roger Wang
9bebc9512f
[Misc] Deprecate --dataset from benchmark_serving.py ( #13708 )
...
Signed-off-by: Roger Wang <ywang@roblox.com >
2025-02-23 13:32:20 +00:00
Nick Hill
5a2ba16f5c
[Core][Distributed] Use IPC (domain socket) ZMQ socket for local comms ( #13688 )
2025-02-23 02:54:29 -08:00
Isotr0py
ba5106e519
[LMM] Implement merged multimodal processor for whisper ( #13278 )
2025-02-23 01:46:03 -08:00
Kyle Sayers
d5ca2110f1
[Quant] BaiChuan SupportsQuant ( #13710 )
2025-02-22 19:21:15 -08:00
Kevin H. Luu
2c5e637b57
[ci] Use env var to control whether to use S3 bucket in CI ( #13634 )
2025-02-22 19:19:45 -08:00
Andy Lo
322d2a27d6
[BugFix] Minor: logger import in attention backend ( #13706 )
...
Signed-off-by: Andy Lo <andy@mistral.ai >
2025-02-22 16:51:13 -08:00
Roger Wang
82e0d601fc
[CI/Build] Fix pre-commit errors from #13571 ( #13709 )
...
Signed-off-by: Roger Wang <ywang@roblox.com >
2025-02-22 16:50:38 -08:00
Daniele
78ac0f591d
[CI/Build] fix uv caching in Dockerfile ( #13611 )
2025-02-22 08:25:20 -08:00
Yan Ma
b56155e7f3
[XPU]fix setuptools version for xpu ( #13548 )
2025-02-22 08:05:35 -08:00
Helena Kloosterman
382f66fb08
[Bugfix] Fix boolean conversion for OpenVINO env variable ( #13615 )
2025-02-22 08:04:12 -08:00
Cyrus Leung
8354f6640c
[Doc] Dockerfile instructions for optional dependencies and dev transformers ( #13699 )
2025-02-22 06:04:31 -08:00
Gregory Shtrasberg
c904fdddf6
[ROCm] Apply FP8 weights padding to values not divisible by 512 bytes on ROCm ( #13231 )
2025-02-22 05:54:38 -08:00
Sage Moore
558db8083c
[V1][Kernel] Refactor the prefix_prefill kernel so that the caller no longer has to pass in the context lengths ( #13095 )
2025-02-22 05:25:41 -08:00
Kaixi Hou
e109e598c7
[NVIDIA] Support nvfp4 cutlass gemm ( #13571 )
2025-02-22 05:24:05 -08:00
Keyun Tong
8db1b9d0a1
Support SSL Key Rotation in HTTP Server ( #13495 )
2025-02-22 05:17:44 -08:00
youkaichao
2382ad29d1
[ci] fix linter ( #13701 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com >
2025-02-22 20:28:59 +08:00
youkaichao
3e472d882a
[core] set up data parallel communication ( #13591 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com >
2025-02-22 19:28:59 +08:00
Cyrus Leung
7f6bae561c
[CI/Build] Fix pre-commit errors ( #13696 )
2025-02-22 00:31:26 -08:00
Jee Jee Li
105b8ce4c0
[Misc] Reduce LoRA-related static variable ( #13166 )
2025-02-22 00:21:30 -08:00
Mark McLoughlin
2cb8c1540e
[Metrics] Add --show-hidden-metrics-for-version CLI arg ( #13295 )
2025-02-22 00:20:45 -08:00
Mark McLoughlin
1cd981da4f
[V1][Metrics] Support vllm:cache_config_info ( #13299 )
2025-02-22 00:20:00 -08:00
Yu Chin Fabian Lim
fca20841c2
Correction to TP logic for Mamba Mixer 2 when Num Groups not divisible by TP Size ( #13660 )
2025-02-22 00:19:10 -08:00
Jennifer Zhao
da31b5333e
[Bugfix] V1 Memory Profiling: V0 Sampler Integration without Rejection Sampler ( #13594 )
...
Signed-off-by: Jennifer Zhao <7443418+JenZhao@users.noreply.github.com >
Co-authored-by: Roger Wang <ywang@roblox.com >
2025-02-22 00:08:29 -08:00
Lu Fang
bb78fb318e
[v1] Support allowed_token_ids in v1 Sampler ( #13210 )
...
Signed-off-by: Lu Fang <lufang@fb.com >
2025-02-22 14:13:05 +08:00
Robin
8aca27fa11
[Bugfix] Fix benchmark script bug: inaccurate stats for vllm backend when max_model_len < input_len + output_len ( #13691 )
...
Signed-off-by: WangErXiao <863579016@qq.com >
2025-02-22 14:10:38 +08:00
Dipika Sikka
95c617e04b
[Misc] Bump compressed-tensors ( #13619 )
2025-02-21 22:09:04 -08:00
Shane A
9a1f1da5d1
[Bugfix][Model] OLMo 2: split qkv correctly for GQA and MQA ( #13687 )
2025-02-21 22:07:45 -08:00
Gordon Wong
68d630a0c7
[ROCM] fix native attention function call ( #13650 )
2025-02-21 22:07:04 -08:00
Jun Duan
68d535ef44
[Misc] Capture and log the time of loading weights ( #13666 )
2025-02-21 22:06:34 -08:00
Robin
c6ed93860f
[Bugfix][API Server] Fix invalid usage of 'ge' and 'le' in port valid… ( #13672 )
2025-02-21 22:05:28 -08:00
Keyun Tong
0ffdf8ce0c
[HTTP Server] Make model param optional in request ( #13568 )
2025-02-21 21:55:50 -08:00
Yuan Tang
8c0dd3d4df
docs: Add a note on full CI run in contributing guide ( #13646 )
2025-02-21 21:53:59 -08:00
Isotr0py
ada7c780d5
[Misc] Fix yapf linting tools etc not running on pre-commit ( #13695 )
...
Signed-off-by: Isotr0py <2037008807@qq.com >
2025-02-22 13:10:43 +08:00
Lucas Wilkinson
288cc6c234
[Attention] MLA with chunked prefill ( #12639 )
...
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com >
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Co-authored-by: Patrick Horn <patrick.horn@gmail.com >
Co-authored-by: simon-mo <xmo@berkeley.edu >
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com >
2025-02-21 15:30:12 -08:00
John Zheng
900edbfa48
fix typo of grafana dashboard, with correct datasource ( #13668 )
...
Signed-off-by: John Zheng <john.zheng@hp.com >
2025-02-21 18:21:05 +00:00
Isotr0py
b2c3fc5d65
[Bugfix][CPU] Fix cpu all-reduce using native pytorch implementation ( #13586 )
2025-02-20 22:24:17 -08:00
leoneo
839b27c6cc
[Kernel]Add streamK for block-quantized CUTLASS kernels ( #12978 )
2025-02-20 22:14:24 -08:00
Kevin H. Luu
34ad27fe83
[ci] Fix metrics test model path ( #13635 )
2025-02-20 22:12:10 -08:00
Gabriel Marinho
1c3c975766
[FEATURE] Enables /score endpoint for embedding models ( #12846 )
2025-02-20 22:09:47 -08:00
Szymon Ożóg
1cdc88614a
Missing comment explaining VDR variable in GGUF kernels ( #13290 )
2025-02-20 22:06:54 -08:00
Nick Hill
31aa045c11
[V1][Sampler] Avoid an operation during temperature application ( #13587 )
2025-02-20 22:05:56 -08:00
Roger Wang
a30c093502
[Bugfix] Add mm_processor_kwargs to chat-related protocols ( #13644 )
2025-02-20 22:04:33 -08:00
Harry Mellor
c7b07a95a6
Use pre-commit to update requirements-test.txt ( #13617 )
2025-02-20 22:03:27 -08:00
Kaixi Hou
27a09dc52c
[NVIDIA] Fix an issue to use current stream for the nvfp4 quant ( #13632 )
2025-02-20 22:01:48 -08:00
Edwin Hernandez
981f3c831e
[Misc] Adding script to setup ray for multi-node vllm deployments ( #12913 )
2025-02-20 21:16:40 -08:00
Kante Yin
44c33f01f3
Add llmaz as another integration ( #13643 )
...
Signed-off-by: kerthcet <kerthcet@gmail.com >
2025-02-21 03:52:40 +00:00
Lingfan Yu
33170081f1
[Neuron][Kernel] Vectorize KV cache load in FlashPagedAttention to maximize DMA bandwidth ( #13245 )
...
Signed-off-by: Lingfan Yu <lingfany@amazon.com >
2025-02-20 17:45:45 -08:00
Michael Goin
71face8540
[Bugfix] Fix max_num_batched_tokens for MLA ( #13620 )
...
Signed-off-by: mgoin <mgoin64@gmail.com >
2025-02-20 17:45:20 -08:00
Joe Runde
bfbc0b32c6
[Frontend] Add backend-specific options for guided decoding ( #13505 )
...
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com >
2025-02-20 15:07:58 -05:00
ajayvohra2005
6a417b8600
fix neuron performance issue ( #13589 )
2025-02-20 10:59:36 -08:00
Woosuk Kwon
d3ea50113c
[V1][Minor] Print KV cache size in token counts ( #13596 )
...
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2025-02-20 09:24:31 -08:00
Harry Mellor
34aad515c8
Update pre-commit's isort version to remove warnings ( #13614 )
2025-02-20 08:00:14 -08:00