Kuntai Du
|
9053d0b134
|
[Doc] Fix wrong github link in LMCache examples (#17274)
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
|
2025-04-28 03:09:11 +00:00 |
|
Michael Goin
|
cb3f2d8d10
|
[Bugfix] Fix Mistral3 spatial merge error (#17270)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-04-27 19:40:05 -07:00 |
|
TherLF
|
c12df53b60
|
[Bugfix] Fix cutlass dispatch for fp8/int8 to properly invoke M<=16 c… (#16751)
Signed-off-by: Ther-LF <2639852836@qq.com>
|
2025-04-27 19:38:42 -07:00 |
|
Lennart K. M. Schulz
|
d1aeea7553
|
[Bugfix] Fix missing ARG in Dockerfile for arm64 platforms (#17261)
Signed-off-by: lkm-schulz <44176356+lkm-schulz@users.noreply.github.com>
|
2025-04-27 19:38:14 -07:00 |
|
Lucas Wilkinson
|
d8bccde686
|
[BugFix] Fix vllm_flash_attn install issues (#17267)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Aaron Pham <contact@aarnphm.xyz>
|
2025-04-27 17:27:56 -07:00 |
|
Lily Liu
|
20e489eaa1
|
[V1][Spec Decode] Make eagle compatible with prefix caching. (#17137)
Signed-off-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>
|
2025-04-27 09:29:43 -07:00 |
|
Cyrus Leung
|
4213475ec7
|
[Metrics] Fix minor inconsistencies in bucket progression (#17262)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-04-27 16:19:39 +00:00 |
|
Reid
|
d92879baf6
|
[doc] Add feature status legend (#17257)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-04-27 08:17:02 -07:00 |
|
cascade
|
690fe019f0
|
[Feature] support sequence parallelism using compilation pass (#16155)
Signed-off-by: cascade812 <cascade812@outlook.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2025-04-27 06:29:35 -07:00 |
|
Kaixi Hou
|
ed7a29d9f8
|
[NVIDIA] Support Cutlass MLA for Blackwell GPUs (#16032)
Signed-off-by: kaixih <kaixih@nvidia.com>
|
2025-04-27 06:29:21 -07:00 |
|
Alex Brooks
|
756848e79e
|
[Bugfix] Fix Lora Name Parsing (#17196)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-04-27 20:33:09 +08:00 |
|
Flex Wang
|
18445edd0f
|
[Misc] Change buckets of histogram_iteration_tokens to [1, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8096] to represent number of tokens (#17033)
Signed-off-by: sfc-gh-zhwang <flex.wang@snowflake.com>
|
2025-04-27 12:30:53 +00:00 |
|
Jade Zheng
|
30215ca61f
|
[MISC] Use string annotation types for class definitions (#17244)
Signed-off-by: Jade Zheng <zheng.shoujian@outlook.com>
|
2025-04-27 08:39:57 +00:00 |
|
Chen Zhang
|
838cedade7
|
[Bugfix] Get a specific type of layer from forward context (#17222)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-04-27 00:58:05 -07:00 |
|
Jee Jee Li
|
4283a28c2f
|
[Bugfix] Fix QWen2 VL multimodal mapping (#17240)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-04-27 05:53:23 +00:00 |
|
Cyrus Leung
|
93a126fbc7
|
[Misc] Make cached tokenizer pickle-compatible (#17048)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-04-27 13:05:00 +08:00 |
|
rasmith
|
8e4b351a0c
|
[Kernel][Triton][FP8] Adding fp8 and variable length sequence support to Triton FAv2 kernel (#12591)
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
|
2025-04-27 00:35:08 +00:00 |
|
Happy
|
9869453c42
|
Update test_flash_attn.py (#17102)
Signed-off-by: ShuaibinLi <lishuaibin@live.cn>
|
2025-04-26 22:17:35 +00:00 |
|
Reid
|
3642c59aa8
|
[CI/Build] remove -t for run-lm-eval-gsm-hf-baseline.sh (#16271)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-04-26 18:25:05 +00:00 |
|
Woosuk Kwon
|
43eea2953b
|
[Minor] Fix lint error in main branch (#17233)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-04-26 11:10:14 -07:00 |
|
Kero Liang
|
de7eb10ce4
|
[Bugfix] Fix Qwen2.5-Omni M-RoPE position ids generation (#16878)
Signed-off-by: imkero <kerorek@outlook.com>
|
2025-04-26 10:41:35 -07:00 |
|
Ning Xie
|
fd11a325b8
|
[MISC] rename interval to max_recent_requests (#14285)
|
2025-04-26 16:59:18 +00:00 |
|
Lu Fang
|
4d17e20310
|
Disable the torch.compile cache checks when VLLM_DISABLE_COMPILE_CACHE=1 (#16573)
Signed-off-by: Lu Fang <lufang@fb.com>
|
2025-04-26 09:17:58 -07:00 |
|
changjun.lee
|
10fd1d7380
|
[Bugfix] fix error due to an uninitialized tokenizer when using skip_tokenizer_init with num_scheduler_steps (#9276)
Signed-off-by: changjun.lee <pord7457@gmail.com>
|
2025-04-26 11:51:17 -04:00 |
|
Russell Bryant
|
52b4f4a8d7
|
[Docs] Update structured output doc for V1 (#17135)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-04-26 15:12:18 +00:00 |
|
Aaron Pham
|
e782e0a170
|
[Chore] added stubs for vllm_flash_attn during development mode (#17228)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
|
2025-04-26 07:45:26 -07:00 |
|
Ning Xie
|
dc2ceca5c5
|
[BUGFIX] use random for NONE_HASH only when PYTHONHASHSEED not set (#17088)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2025-04-26 14:34:24 +00:00 |
|
Russell Bryant
|
f8acd01ff7
|
[V1] Add structural_tag support using xgrammar (#17085)
|
2025-04-26 14:06:37 +00:00 |
|
Agata Dobrzyniewicz
|
c48334d405
|
[Hardware][Intel-Gaudi] Update hpu-extension and update bucketing system for HPU device (#17186)
Signed-off-by: Agata Dobrzyniewicz <adobrzyniewicz@habana.ai>
|
2025-04-26 05:55:14 -07:00 |
|
Cyrus Leung
|
909fdaf152
|
[Bugfix] Fix standard models tests (#17217)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-04-26 02:26:41 -07:00 |
|
Isotr0py
|
8c1c926d00
|
[Bugfix] Fix missing int type for -n in multi-image example (#17223)
|
2025-04-26 08:49:52 +00:00 |
|
Nick Hill
|
df6f3ce883
|
[Core] Remove prompt string from engine core data structures (#17214)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-04-25 23:41:05 -07:00 |
|
Woosuk Kwon
|
513f074766
|
[CI/test] Fix Eagle Correctness Test (#17209)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-04-25 23:40:36 -07:00 |
|
Nick Hill
|
b07bf83c7d
|
[BugFix] Avoid race conditions in zero-copy tensor transmission (#17203)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-04-26 06:00:07 +00:00 |
|
Zijing Liu
|
53e8cf53a4
|
[V1][Metrics] Allow V1 AsyncLLM to use custom logger (#14661)
Signed-off-by: Zijing Liu <liuzijing2014@gmail.com>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-04-25 22:05:40 -07:00 |
|
Charlie Fu
|
54271bb766
|
[ROCm][Misc] Follow-ups for Skinny Gemms on ROCm. (#17011)
Signed-off-by: charlifu <charlifu@amd.com>
|
2025-04-25 22:05:10 -07:00 |
|
Shu Wang
|
9e96f56efb
|
Allocate kv_cache with stride order (#16605)
Signed-off-by: shuw <shuw@nvidia.com>
|
2025-04-25 22:03:31 -07:00 |
|
Woosuk Kwon
|
b278911229
|
[Minor][Models] Fix Return Types of Llama & Eagle (#17220)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-04-25 21:54:47 -07:00 |
|
yarongmu-google
|
7bd0c7745c
|
[Doc] Minor fix for the vLLM TPU setup page (#17206)
Signed-off-by: Yarong Mu <ymu@google.com>
|
2025-04-26 04:39:56 +00:00 |
|
Woosuk Kwon
|
1cf0719ebd
|
[Minor][Spec Decode] Add use_eagle to SpeculativeConfig (#17213)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-04-25 21:08:15 -07:00 |
|
Reid
|
537d5ee025
|
[doc] add Anything LLM integration (#17216)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-04-25 21:03:23 -07:00 |
|
Lu Fang
|
c8e5be35f7
|
[MISC][AMD] Add unused annotation to rocm kernel file (#17097)
Signed-off-by: Lu Fang <lufang@fb.com>
|
2025-04-25 20:33:35 -07:00 |
|
James Wu
|
a6e72e1e4f
|
[Bugfix] [pytorch] Patch AOTAutogradCache._get_shape_env (#17142)
Signed-off-by: James Wu <jjwu@meta.com>
|
2025-04-26 11:28:20 +08:00 |
|
Yihua Cheng
|
5e83a7277f
|
[v1] [P/D] Adding LMCache KV connector for v1 (#16625)
|
2025-04-26 03:03:38 +00:00 |
|
rasmith
|
68af5f6c5c
|
[AMD][FP8][BugFix] Remove V1 check in arg_utils.py for FP8 since it is not necessary (#17215)
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
|
2025-04-25 19:55:05 -07:00 |
|
Chen Zhang
|
8de2901fea
|
[Bugfix] gemma[2,3] interleaved attention when sliding window is disabled (#17180)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-04-25 19:53:51 -07:00 |
|
Rui Qiao
|
c53e0730cb
|
[Misc] Refine ray_serve_deepseek example (#17204)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
|
2025-04-25 16:06:59 -07:00 |
|
Benjamin Chislett
|
a0e619e62a
|
[V1][Spec Decode] EAGLE-3 Support (#16937)
Signed-off-by: Bryan Lu <yuzhelu@amazon.com>
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
Co-authored-by: Bryan Lu <yuzhelu@amazon.com>
|
2025-04-25 15:43:07 -07:00 |
|
Nick Hill
|
70116459c3
|
[BugFix][Frontend] Fix LLM.chat() tokenization (#16081)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-04-25 22:20:05 +00:00 |
|
Christian Heimes
|
65e262b93b
|
Fix Python packaging edge cases (#17159)
Signed-off-by: Christian Heimes <christian@python.org>
|
2025-04-26 06:15:07 +08:00 |
|