Woosuk Kwon
|
3b5567a209
|
[V1][Minor] Do not print attn backend twice (#13985)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-03-01 07:09:14 +00:00 |
|
Isotr0py
|
fdcc405346
|
[Doc] Consolidate whisper and florence2 examples (#14050)
|
2025-02-28 22:49:15 -08:00 |
|
Kuntai Du
|
8994dabc22
|
[Documentation] Add more deployment guide for Kubernetes deployment (#13841)
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
Signed-off-by: Kuntai Du <kuntai@uchicago.edu>
|
2025-03-01 06:44:24 +00:00 |
|
Li, Jiang
|
02296f420d
|
[Bugfix][V1][Minor] Fix shutting_down flag checking in V1 MultiprocExecutor (#14053)
|
2025-02-28 22:31:01 -08:00 |
|
YajieWang
|
6a92ff93e1
|
[Misc][Kernel]: Add GPTQAllSpark Quantization (#12931)
|
2025-02-28 22:30:59 -08:00 |
|
Jee Jee Li
|
6a84164add
|
[Bugfix] Add file lock for ModelScope download (#14060)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-03-01 06:10:28 +00:00 |
|
Brayden Zhong
|
f64ffa8c25
|
[Docs] Add pipeline_parallel_size to optimization docs (#14059)
Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
|
2025-03-01 05:43:54 +00:00 |
|
Luka Govedič
|
bd56c983d6
|
[torch.compile] Fix RMSNorm + quant fusion in the non-cutlass-fp8 case, rename RedundantReshapesPass to NoopEliminationPass (#10902)
Signed-off-by: luka <luka@neuralmagic.com>
|
2025-02-28 16:20:11 -07:00 |
|
Rui Qiao
|
084bbac8cc
|
[core] Bump ray to 2.43 (#13994)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
|
2025-02-28 21:47:44 +00:00 |
|
Chen Zhang
|
28943d36ce
|
[v1] Move block pool operations to a separate class (#13973)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
|
2025-02-28 20:53:31 +00:00 |
|
Andrey Talman
|
b526ca6726
|
Add RELEASE.md (#13926)
Signed-off-by: atalman <atalman@fb.com>
|
2025-02-28 12:25:50 -08:00 |
|
Chen Zhang
|
e7bd944e08
|
[v1] Cleanup the BlockTable in InputBatch (#13977)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-02-28 19:03:16 +00:00 |
|
iefgnoix
|
c3b6559a10
|
[V1][TPU] Integrate the new ragged paged attention kernel with vLLM v1 on TPU (#13379)
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
|
2025-02-28 11:01:36 -07:00 |
|
Harry Mellor
|
4be4b26cb7
|
Fix entrypoint tests for embedding models (#14052)
|
2025-02-28 08:56:44 -08:00 |
|
Brayden Zhong
|
2aed2c9fa7
|
[Doc] Fix ROCm documentation (#14041)
Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
|
2025-02-28 16:42:07 +00:00 |
|
Yang Liu
|
9b61dd41e7
|
[Bugfix] Initialize attention bias on the same device as Query/Key/Value for QwenVL Series (#14031)
|
2025-02-28 07:36:08 -08:00 |
|
Cyrus Leung
|
f7bee5c815
|
[VLM][Bugfix] Enable specifying prompt target via index (#14038)
|
2025-02-28 07:35:55 -08:00 |
|
Jee Jee Li
|
e0734387fb
|
[Bugfix] Fix MoeWNA16Method activation (#14024)
|
2025-02-28 15:22:42 +00:00 |
|
Harry Mellor
|
f58f8b5c96
|
Update AutoAWQ docs (#14042)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-02-28 15:20:29 +00:00 |
|
Thibault Schueller
|
b3f7aaccd0
|
[V1][Minor] Restore V1 compatibility with LLMEngine class (#13090)
|
2025-02-28 00:52:25 -08:00 |
|
Kacper Pietkun
|
b91660ddb8
|
[Hardware][Intel-Gaudi] Regional compilation support (#13213)
|
2025-02-28 00:51:49 -08:00 |
|
Harry Mellor
|
76c89fcadd
|
Use smaller embedding model when not testing model specifically (#13891)
|
2025-02-28 00:50:43 -08:00 |
|
Mathis Felardos
|
b9e41734c5
|
[Bugfix][Disaggregated] patch the inflight batching on the decode node in SimpleConnector to avoid hangs in SimpleBuffer (nccl based) (#13987)
Signed-off-by: Mathis Felardos <mathis@mistral.ai>
|
2025-02-28 07:53:45 +00:00 |
|
Cyrus Leung
|
1088f06242
|
[Doc] Move multimodal Embedding API example to Online Serving page (#14017)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-02-28 07:12:04 +00:00 |
|
Travis Johnson
|
73e0225ee9
|
[Bugfix] Check that number of images matches number of <|image|> tokens with mllama (#13911)
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
|
2025-02-28 04:00:45 +00:00 |
|
Roger Wang
|
6c85da3a18
|
[V1]SupportsV0Only protocol for model definitions (#13959)
Signed-off-by: Roger Wang <ywang@roblox.com>
|
2025-02-27 20:02:15 -05:00 |
|
Jee Jee Li
|
67fc426845
|
[Misc] Print FusedMoE detail info (#13974)
|
2025-02-27 18:53:13 -05:00 |
|
Benjamin Chislett
|
9804145cac
|
[Model][Speculative Decoding] Expand DeepSeek MTP code to support k > n_predict (#13626)
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
|
2025-02-27 15:28:08 -08:00 |
|
Lucas Wilkinson
|
2e94b9cfbb
|
[Attention] Flash MLA for V1 (#13867)
Signed-off-by: Yang Chen <yangche@fb.com>
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Yang Chen <yangche@fb.com>
|
2025-02-27 23:03:41 +00:00 |
|
qli88
|
8294773e48
|
[core] Perf improvement for DSv3 on AMD GPUs (#13718)
Signed-off-by: qli88 <qiang.li2@amd.com>
|
2025-02-27 22:14:30 +00:00 |
|
Woosuk Kwon
|
cd813c6d4d
|
[V1][Minor] Minor cleanup for GPU Model Runner (#13983)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-02-27 13:11:40 -08:00 |
|
Sage Moore
|
38acae6e97
|
[ROCm] Fix the Kernels, Core, and Prefix Caching AMD CI groups (#13970)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-02-27 20:31:47 +00:00 |
|
Cyrus Leung
|
a2dd48c386
|
[VLM] Deprecate legacy input mapper for OOT multimodal models (#13979)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-02-27 19:14:55 +00:00 |
|
dependabot[bot]
|
126f6beeb4
|
Bump azure/setup-helm from 4.2.0 to 4.3.0 (#13742)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
|
2025-02-27 19:04:10 +00:00 |
|
Yang Chen
|
58d1b2aa77
|
[Attention] MLA support for V1 (#13789)
Signed-off-by: Yang Chen <yangche@fb.com>
|
2025-02-27 13:14:17 -05:00 |
|
Cyrus Leung
|
f1579b229d
|
[VLM] Generalized prompt updates for multi-modal processor (#13964)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-02-27 17:44:25 +00:00 |
|
Isotr0py
|
7864875879
|
[Bugfix] Fix qwen2.5-vl overflow issue (#13968)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-02-27 17:30:39 +00:00 |
|
Noam Gat
|
1dd422b64a
|
Update LMFE version to v0.10.11 to support new versions of transforme… (#13930)
|
2025-02-27 17:16:12 +00:00 |
|
Rui Qiao
|
06c8f8d885
|
[bugfix] Fix profiling for RayDistributedExecutor (#13945)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
|
2025-02-28 01:01:21 +08:00 |
|
Harry Mellor
|
5677c9bb3e
|
Deduplicate .pre-commit-config.yaml's exclude (#13967)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-02-27 16:27:47 +00:00 |
|
王博伟
|
512d77d582
|
Update quickstart.md (#13958)
|
2025-02-27 16:05:11 +00:00 |
|
Szymon Ożóg
|
7f0be2aa24
|
[Model] Deepseek GGUF support (#13167)
|
2025-02-27 02:08:35 -08:00 |
|
Isotr0py
|
edf309ebbe
|
[VLM] Support multimodal inputs for Florence-2 models (#13320)
|
2025-02-27 02:06:41 -08:00 |
|
Michael Goin
|
788f284b53
|
Fix test_block_fp8.py test for MoE (#13915)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-02-27 18:00:00 +08:00 |
|
Yang Zheng
|
4b1d141f49
|
[PP] Correct cache size check (#13873)
Signed-off-by: Yang Zheng <zhengy.gator@gmail.com>
|
2025-02-27 17:47:29 +08:00 |
|
Chauncey
|
10c3b8c1cf
|
[Misc] fixed 'required' is an invalid argument for positionals (#13948)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-02-27 09:06:49 +00:00 |
|
Brayden Zhong
|
a7f37314b7
|
[CI/Build] Add examples/ directory to be labelled by mergify (#13944)
Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
|
2025-02-27 08:24:11 +00:00 |
|
Mark McLoughlin
|
cd711c48b2
|
[V1][Metrics] Handle preemptions (#13169)
|
2025-02-26 20:04:59 -08:00 |
|
Sage Moore
|
378b3ef6f8
|
[ROCm][V1] Update reshape_and_cache to properly work with CUDA graph padding (#13922)
|
2025-02-26 20:04:12 -08:00 |
|
Rui Qiao
|
c9944acbf9
|
[misc] Rename Ray ADAG to Compiled Graph (#13928)
|
2025-02-26 20:03:28 -08:00 |
|