Andrey Talman
b526ca6726
Add RELEASE.md (#13926)
Signed-off-by: atalman <atalman@fb.com>
2025-02-28 12:25:50 -08:00

Chen Zhang
e7bd944e08
[v1] Cleanup the BlockTable in InputBatch (#13977)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-02-28 19:03:16 +00:00

iefgnoix
c3b6559a10
[V1][TPU] Integrate the new ragged paged attention kernel with vLLM v1 on TPU (#13379)
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-02-28 11:01:36 -07:00

Harry Mellor
4be4b26cb7
Fix entrypoint tests for embedding models (#14052)
2025-02-28 08:56:44 -08:00

Brayden Zhong
2aed2c9fa7
[Doc] Fix ROCm documentation (#14041)
Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
2025-02-28 16:42:07 +00:00

Yang Liu
9b61dd41e7
[Bugfix] Initialize attention bias on the same device as Query/Key/Value for QwenVL Series (#14031)
2025-02-28 07:36:08 -08:00

Cyrus Leung
f7bee5c815
[VLM][Bugfix] Enable specifying prompt target via index (#14038)
2025-02-28 07:35:55 -08:00

Jee Jee Li
e0734387fb
[Bugfix] Fix MoeWNA16Method activation (#14024)
2025-02-28 15:22:42 +00:00

Harry Mellor
f58f8b5c96
Update AutoAWQ docs (#14042)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-02-28 15:20:29 +00:00

Thibault Schueller
b3f7aaccd0
[V1][Minor] Restore V1 compatibility with LLMEngine class (#13090)
2025-02-28 00:52:25 -08:00

Kacper Pietkun
b91660ddb8
[Hardware][Intel-Gaudi] Regional compilation support (#13213)
2025-02-28 00:51:49 -08:00

Harry Mellor
76c89fcadd
Use smaller embedding model when not testing model specifically (#13891)
2025-02-28 00:50:43 -08:00

Mathis Felardos
b9e41734c5
[Bugfix][Disaggregated] patch the inflight batching on the decode node in SimpleConnector to avoid hangs in SimpleBuffer (nccl based) (#13987)
Signed-off-by: Mathis Felardos <mathis@mistral.ai>
2025-02-28 07:53:45 +00:00

Cyrus Leung
1088f06242
[Doc] Move multimodal Embedding API example to Online Serving page (#14017)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-02-28 07:12:04 +00:00

Travis Johnson
73e0225ee9
[Bugfix] Check that number of images matches number of <|image|> tokens with mllama (#13911)
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
2025-02-28 04:00:45 +00:00

Roger Wang
6c85da3a18
[V1]SupportsV0Only protocol for model definitions (#13959)
Signed-off-by: Roger Wang <ywang@roblox.com>
2025-02-27 20:02:15 -05:00

Jee Jee Li
67fc426845
[Misc] Print FusedMoE detail info (#13974)
2025-02-27 18:53:13 -05:00

Benjamin Chislett
9804145cac
[Model][Speculative Decoding] Expand DeepSeek MTP code to support k > n_predict (#13626)
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
2025-02-27 15:28:08 -08:00

Lucas Wilkinson
2e94b9cfbb
[Attention] Flash MLA for V1 (#13867)
Signed-off-by: Yang Chen <yangche@fb.com>
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Yang Chen <yangche@fb.com>
2025-02-27 23:03:41 +00:00

qli88
8294773e48
[core] Perf improvement for DSv3 on AMD GPUs (#13718)
Signed-off-by: qli88 <qiang.li2@amd.com>
2025-02-27 22:14:30 +00:00

Woosuk Kwon
cd813c6d4d
[V1][Minor] Minor cleanup for GPU Model Runner (#13983)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-02-27 13:11:40 -08:00

Sage Moore
38acae6e97
[ROCm] Fix the Kernels, Core, and Prefix Caching AMD CI groups (#13970)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
2025-02-27 20:31:47 +00:00

Cyrus Leung
a2dd48c386
[VLM] Deprecate legacy input mapper for OOT multimodal models (#13979)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-02-27 19:14:55 +00:00

dependabot[bot]
126f6beeb4
Bump azure/setup-helm from 4.2.0 to 4.3.0 (#13742)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-02-27 19:04:10 +00:00

Yang Chen
58d1b2aa77
[Attention] MLA support for V1 (#13789)
Signed-off-by: Yang Chen <yangche@fb.com>
2025-02-27 13:14:17 -05:00

Cyrus Leung
f1579b229d
[VLM] Generalized prompt updates for multi-modal processor (#13964)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-02-27 17:44:25 +00:00

Isotr0py
7864875879
[Bugfix] Fix qwen2.5-vl overflow issue (#13968)
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-02-27 17:30:39 +00:00

Noam Gat
1dd422b64a
Update LMFE version to v0.10.11 to support new versions of transforme… (#13930)
2025-02-27 17:16:12 +00:00

Rui Qiao
06c8f8d885
[bugfix] Fix profiling for RayDistributedExecutor (#13945)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
2025-02-28 01:01:21 +08:00

Harry Mellor
5677c9bb3e
Deduplicate .pre-commit-config.yaml's exclude (#13967)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-02-27 16:27:47 +00:00

王博伟
512d77d582
Update quickstart.md (#13958)
2025-02-27 16:05:11 +00:00

Szymon Ożóg
7f0be2aa24
[Model] Deepseek GGUF support (#13167)
2025-02-27 02:08:35 -08:00

Isotr0py
edf309ebbe
[VLM] Support multimodal inputs for Florence-2 models (#13320)
2025-02-27 02:06:41 -08:00

Michael Goin
788f284b53
Fix test_block_fp8.py test for MoE (#13915)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-02-27 18:00:00 +08:00

Yang Zheng
4b1d141f49
[PP] Correct cache size check (#13873)
Signed-off-by: Yang Zheng <zhengy.gator@gmail.com>
2025-02-27 17:47:29 +08:00

Chauncey
10c3b8c1cf
[Misc] fixed 'required' is an invalid argument for positionals (#13948)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2025-02-27 09:06:49 +00:00

Brayden Zhong
a7f37314b7
[CI/Build] Add examples/ directory to be labelled by mergify (#13944)
Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
2025-02-27 08:24:11 +00:00

Mark McLoughlin
cd711c48b2
[V1][Metrics] Handle preemptions (#13169)
2025-02-26 20:04:59 -08:00

Sage Moore
378b3ef6f8
[ROCm][V1] Update reshape_and_cache to properly work with CUDA graph padding (#13922)
2025-02-26 20:04:12 -08:00

Rui Qiao
c9944acbf9
[misc] Rename Ray ADAG to Compiled Graph (#13928)
2025-02-26 20:03:28 -08:00

Michael Goin
ca377cf1b9
Use CUDA 12.4 as default for release and nightly wheels (#12098)
2025-02-26 19:06:37 -08:00

ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟
a31614e386
[ROCm][Quantization][Kernel] Use FP8 FNUZ when OCP flag is 0 or undefined (#13851)
Signed-off-by: Hollow Man <hollowman@opensuse.org>
2025-02-27 10:39:10 +08:00

Lucas Wilkinson
f95903909f
[Kernel] FlashMLA integration (#13747)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-02-27 10:35:08 +08:00

Woosuk Kwon
b382a7f28f
[BugFix] Make FP8 Linear compatible with torch.compile (#13918)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-02-26 13:48:55 -08:00

Wallas Henrique
4cb6fa0a9c
[Bugfix] Backend option to disable xgrammar any_whitespace (#12744)
Signed-off-by: Wallas Santos <wallashss@ibm.com>
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Co-authored-by: Joe Runde <Joseph.Runde@ibm.com>
2025-02-26 10:52:34 -08:00

Chauncey
d08b285adf
[Misc] fixed qwen_vl_utils parameter error (#13906)
2025-02-26 08:31:53 -08:00

Chenyaaang
b27122acc2
[TPU] use torch2.6 with whl package (#13860)
Signed-off-by: Chenyaaang <llccyy1212@gmail.com>
2025-02-26 08:18:54 -05:00

Cyrus Leung
934bb99c71
[Bugfix] Update expected token counts for Ultravox tests (#13895)
2025-02-26 04:56:50 -08:00

Joe Runde
3f808cc044
[Bugfix] Do not crash V0 engine on input errors (#13101)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2025-02-26 19:07:29 +08:00

Brayden Zhong
ec8a5e5386
[Misc]: Add support for goodput on guided benchmarking + TPOT calculation refactor (#13736)
Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
2025-02-26 19:06:47 +08:00