Kero Liang
|
de7eb10ce4
|
[Bugfix] Fix Qwen2.5-Omni M-RoPE position ids generation (#16878)
Signed-off-by: imkero <kerorek@outlook.com>
|
2025-04-26 10:41:35 -07:00 |
|
Ning Xie
|
fd11a325b8
|
[MISC] rename interval to max_recent_requests (#14285)
|
2025-04-26 16:59:18 +00:00 |
|
Lu Fang
|
4d17e20310
|
Disable the torch.compile cache checks when VLLM_DISABLE_COMPILE_CACHE=1 (#16573)
Signed-off-by: Lu Fang <lufang@fb.com>
|
2025-04-26 09:17:58 -07:00 |
|
changjun.lee
|
10fd1d7380
|
[Bugfix] fix error due to an uninitialized tokenizer when using skip_tokenizer_init with num_scheduler_steps (#9276)
Signed-off-by: changjun.lee <pord7457@gmail.com>
|
2025-04-26 11:51:17 -04:00 |
|
Russell Bryant
|
52b4f4a8d7
|
[Docs] Update structured output doc for V1 (#17135)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-04-26 15:12:18 +00:00 |
|
Aaron Pham
|
e782e0a170
|
[Chore] added stubs for vllm_flash_attn during development mode (#17228)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
|
2025-04-26 07:45:26 -07:00 |
|
Ning Xie
|
dc2ceca5c5
|
[BUGFIX] use random for NONE_HASH only when PYTHONHASHSEED not set (#17088)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2025-04-26 14:34:24 +00:00 |
|
Russell Bryant
|
f8acd01ff7
|
[V1] Add structural_tag support using xgrammar (#17085)
|
2025-04-26 14:06:37 +00:00 |
|
Agata Dobrzyniewicz
|
c48334d405
|
[Hardware][Intel-Gaudi] Update hpu-extension and update bucketing system for HPU device (#17186)
Signed-off-by: Agata Dobrzyniewicz <adobrzyniewicz@habana.ai>
|
2025-04-26 05:55:14 -07:00 |
|
Cyrus Leung
|
909fdaf152
|
[Bugfix] Fix standard models tests (#17217)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-04-26 02:26:41 -07:00 |
|
Isotr0py
|
8c1c926d00
|
[Bugfix] Fix missing int type for -n in multi-image example (#17223)
|
2025-04-26 08:49:52 +00:00 |
|
Nick Hill
|
df6f3ce883
|
[Core] Remove prompt string from engine core data structures (#17214)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-04-25 23:41:05 -07:00 |
|
Woosuk Kwon
|
513f074766
|
[CI/test] Fix Eagle Correctness Test (#17209)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-04-25 23:40:36 -07:00 |
|
Nick Hill
|
b07bf83c7d
|
[BugFix] Avoid race conditions in zero-copy tensor transmission (#17203)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-04-26 06:00:07 +00:00 |
|
Zijing Liu
|
53e8cf53a4
|
[V1][Metrics] Allow V1 AsyncLLM to use custom logger (#14661)
Signed-off-by: Zijing Liu <liuzijing2014@gmail.com>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-04-25 22:05:40 -07:00 |
|
Charlie Fu
|
54271bb766
|
[ROCm][Misc] Follow-ups for Skinny Gemms on ROCm. (#17011)
Signed-off-by: charlifu <charlifu@amd.com>
|
2025-04-25 22:05:10 -07:00 |
|
Shu Wang
|
9e96f56efb
|
Allocate kv_cache with stride order (#16605)
Signed-off-by: shuw <shuw@nvidia.com>
|
2025-04-25 22:03:31 -07:00 |
|
Woosuk Kwon
|
b278911229
|
[Minor][Models] Fix Return Types of Llama & Eagle (#17220)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-04-25 21:54:47 -07:00 |
|
yarongmu-google
|
7bd0c7745c
|
[Doc] Minor fix for the vLLM TPU setup page (#17206)
Signed-off-by: Yarong Mu <ymu@google.com>
|
2025-04-26 04:39:56 +00:00 |
|
Woosuk Kwon
|
1cf0719ebd
|
[Minor][Spec Decode] Add use_eagle to SpeculativeConfig (#17213)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-04-25 21:08:15 -07:00 |
|
Reid
|
537d5ee025
|
[doc] add Anything LLM integration (#17216)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-04-25 21:03:23 -07:00 |
|
Lu Fang
|
c8e5be35f7
|
[MISC][AMD] Add unused annotation to rocm kernel file (#17097)
Signed-off-by: Lu Fang <lufang@fb.com>
|
2025-04-25 20:33:35 -07:00 |
|
James Wu
|
a6e72e1e4f
|
[Bugfix] [pytorch] Patch AOTAutogradCache._get_shape_env (#17142)
Signed-off-by: James Wu <jjwu@meta.com>
|
2025-04-26 11:28:20 +08:00 |
|
Yihua Cheng
|
5e83a7277f
|
[v1] [P/D] Adding LMCache KV connector for v1 (#16625)
|
2025-04-26 03:03:38 +00:00 |
|
rasmith
|
68af5f6c5c
|
[AMD][FP8][BugFix] Remove V1 check in arg_utils.py for FP8 since it is not necessary (#17215)
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
|
2025-04-25 19:55:05 -07:00 |
|
Chen Zhang
|
8de2901fea
|
[Bugfix] gemma[2,3] interleaved attention when sliding window is disabled (#17180)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-04-25 19:53:51 -07:00 |
|
Rui Qiao
|
c53e0730cb
|
[Misc] Refine ray_serve_deepseek example (#17204)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
|
2025-04-25 16:06:59 -07:00 |
|
Benjamin Chislett
|
a0e619e62a
|
[V1][Spec Decode] EAGLE-3 Support (#16937)
Signed-off-by: Bryan Lu <yuzhelu@amazon.com>
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
Co-authored-by: Bryan Lu <yuzhelu@amazon.com>
|
2025-04-25 15:43:07 -07:00 |
|
Nick Hill
|
70116459c3
|
[BugFix][Frontend] Fix LLM.chat() tokenization (#16081)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-04-25 22:20:05 +00:00 |
|
Christian Heimes
|
65e262b93b
|
Fix Python packaging edge cases (#17159)
Signed-off-by: Christian Heimes <christian@python.org>
|
2025-04-26 06:15:07 +08:00 |
|
Cyrus Leung
|
43faa0461a
|
[Bugfix] Fix hybrid model tests (#17182)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-04-25 15:14:37 -07:00 |
|
Daniel Li
|
48cb2109b6
|
[V1] Move usage stats to worker and start logging TPU hardware (#16211)
|
2025-04-25 14:06:01 -06:00 |
|
Russell Bryant
|
a5450f11c9
|
[Security] Use safe serialization and fix zmq setup for mooncake pipe (#17192)
Signed-off-by: Shangming Cai <caishangming@linux.alibaba.com>
Co-authored-by: Shangming Cai <caishangming@linux.alibaba.com>
|
2025-04-25 16:53:23 +00:00 |
|
Cyrus Leung
|
9d98ab5ec6
|
[Misc] Inline Molmo requirements (#17190)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-04-25 16:41:44 +00:00 |
|
Reid
|
df5c879527
|
[doc] update wrong hf model links (#17184)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-04-25 16:40:54 +00:00 |
|
Harry Mellor
|
423e9f1cbe
|
Use Transformers helper get_text_config() instead of checking for text_config (#17105)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-04-25 08:47:35 -07:00 |
|
Harry Mellor
|
0bd7f8fca5
|
Bump Transformers to 4.51.3 (#17116)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-04-25 08:34:34 -07:00 |
|
Jasmond L
|
d5615af9ae
|
[Bugfix] Fix Mistral ChatCompletionRequest Body Exception (#16769)
Signed-off-by: Jasmond Loh <Jasmond.Loh@hotmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-04-25 07:26:30 -07:00 |
|
Cyrus Leung
|
19dcc02a72
|
[Bugfix] Fix mistral model tests (#17181)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-04-25 06:03:34 -07:00 |
|
Alex Brooks
|
7feae92c1f
|
[Doc] Move todo out of beam search docstring (#17183)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
|
2025-04-25 04:44:58 -07:00 |
|
Michael Yao
|
f851b84266
|
[Doc] Add two links to disagg_prefill.md (#17168)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
|
2025-04-25 10:23:57 +00:00 |
|
Lu Fang
|
fc966e9cc6
|
Only turn on FastIncrementalDetokenizer when tokenizers >= 0.21.1 (#17158)
|
2025-04-25 17:10:32 +08:00 |
|
Michael Yao
|
ef19e67d2c
|
[Doc] Add headings to improve gptqmodel.md (#17164)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
|
2025-04-25 01:13:13 -07:00 |
|
rasmith
|
a41351f363
|
[Quantization][FP8] Add support for FP8 models with input_scale for output projection and QK quantization (#15734)
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: Luka Govedič <lgovedic@redhat.com>
|
2025-04-25 00:45:02 -07:00 |
|
Sangyeon Cho
|
6aae216b4e
|
[Bugfix] remove fallback in guided_json (int range, patterns) (#16725)
Signed-off-by: csy1204 <josang1204@gmail.com>
Co-authored-by: 조상연[플레이스 AI] <sang-yeon.cho@navercorp.com>
|
2025-04-25 06:54:43 +00:00 |
|
yexin(叶鑫)
|
b22980a1dc
|
[Perf]Optimize rotary_emb implementation to use Triton operator for improved inference performance (#16457)
Signed-off-by: cynthieye <yexin93@qq.com>
Co-authored-by: MagnetoWang <magnetowang@outlook.com>
|
2025-04-25 14:52:28 +08:00 |
|
Lucas Wilkinson
|
881f735827
|
[Misc] Benchmark Serving Script Support Appending Results (#17028)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
|
2025-04-24 22:53:55 -07:00 |
|
Mengqing Cao
|
2f54045508
|
[Bugfix][Misc] Use TritonPlaceholderModule to defensively import triton (#15099)
Signed-off-by: Mengqing Cao <cmq0113@163.com>
|
2025-04-24 22:51:02 -07:00 |
|
Lifu Huang
|
5aa6efb9a5
|
[Misc] Clean up redundant code in uniproc_executor.py (#16762)
Signed-off-by: Lifu Huang <lifu.hlf@gmail.com>
|
2025-04-24 22:49:30 -07:00 |
|
Harry Mellor
|
6ca0234478
|
Move missed SchedulerConfig args into scheduler config group in EngineArgs (#17131)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-04-24 22:48:53 -07:00 |
|