Jee Jee Li
|
12c29a881f
|
[Bugfix] Further clean up LoRA test (#14422)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-03-07 10:30:55 +00:00 |
|
Ilya Lavrenov
|
8ca7a71df7
|
OpenVINO: added CPU-like conditions (#14338)
Signed-off-by: Ilya Lavrenov <ilya.lavrenov@intel.com>
|
2025-03-06 22:24:49 -08:00 |
|
Jee Jee Li
|
ddd1ef66ec
|
[Bugfix] Fix JambaForCausalLM LoRA (#14370)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-03-06 22:05:47 -08:00 |
|
Luka Govedič
|
e1744502c2
|
[FP8] Refactor apply_fp8_linear and apply_fp8_linear_generic into an object (#14390)
Signed-off-by: luka <luka@neuralmagic.com>
|
2025-03-07 05:20:16 +00:00 |
|
Himanshu Jaju
|
cd579352bf
|
[V1] Do not detokenize if sampling param detokenize is False (#14224)
Signed-off-by: Himanshu Jaju <hj@mistral.ai>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-03-06 10:40:24 -08:00 |
|
Harry Mellor
|
bf0560bda9
|
Reinstate best_of for V0 (#14356)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-03-06 08:34:22 -08:00 |
|
Thomas Parnell
|
6bd1dd9d26
|
[Kernel] [V1] Improved performance for V1 Triton (ROCm) backend (#14152)
|
2025-03-06 07:39:16 -08:00 |
|
Nicolò Lucchesi
|
fa82b93853
|
[Frontend][Docs] Transcription API streaming (#13301)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-03-06 10:39:35 +00:00 |
|
kYLe
|
1769928079
|
[Model] Update Paligemma multimodal processing with PromptUpdate (#14015)
Signed-off-by: Kyle Huang <kylhuang@nvidia.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-03-06 08:31:38 +00:00 |
|
Nicolò Lucchesi
|
5ee10e990d
|
[Bugfix][CI] ALiBi test case in xformers multi_query_kv_attention (#11301)
|
2025-03-05 20:00:53 -08:00 |
|
Varun Sundar Rabindranath
|
3dbd2d813a
|
[V1] LoRA - Enable more V1 tests (#14315)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
|
2025-03-06 11:55:42 +08:00 |
|
Lucas Wilkinson
|
f6bb18fd9a
|
[BugFix] MLA + V1, illegal memory access and accuracy issues (#14253)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-03-05 17:10:13 -08:00 |
|
Lu Fang
|
53ea6ad830
|
[V1][Easy] Add empty allowed_token_ids in the v1 sampler test (#14308)
Signed-off-by: Lu Fang <lufang@fb.com>
|
2025-03-05 21:41:18 +00:00 |
|
Vincent
|
a4f1ee35d6
|
Deprecate best_of Sampling Parameter in anticipation for vLLM V1 (#13997)
Signed-off-by: vincent-4 <vincentzhongy+githubvincent4@gmail.com>
Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Brayden Zhong <b8zhong@uwaterloo.ca>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-03-05 20:22:43 +00:00 |
|
Robert Shaw
|
257e200a25
|
[V1][Frontend] Add Testing For V1 Runtime Parameters (#14159)
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
|
2025-03-05 14:18:55 +00:00 |
|
Benjamin Chislett
|
32985bed7c
|
[Frontend] Allow return_tokens_as_token_ids to be passed as a request param (#14066)
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
|
2025-03-05 06:30:40 +00:00 |
|
Michael Goin
|
dae9ec464c
|
Temporarily disable test_awq_gemm_opcheck (#14251)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-03-05 06:10:35 +00:00 |
|
Tyler Michael Smith
|
72c62eae5f
|
[V1] EP/TP MoE + DP Attention (#13931)
|
2025-03-04 21:27:26 -08:00 |
|
Congcong Chen
|
0a995d5434
|
[Model] New model support for Phi-4-multimodal-instruct (#14119)
|
2025-03-04 20:57:01 -08:00 |
|
Nick Hill
|
5db6b2c961
|
[V1][BugFix] Fix remaining sync engine client shutdown errors/hangs (#13869)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-03-04 15:06:47 +00:00 |
|
Travis Johnson
|
c060b71408
|
[Model] Add support for GraniteMoeShared models (#13313)
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-03-04 08:04:52 +08:00 |
|
Mark McLoughlin
|
ae122b1cbd
|
[WIP][V1][Metrics] Implement max_num_generation_tokens, request_params_n, and request_params_max_tokens metrics (#14055)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
|
2025-03-03 19:04:45 +00:00 |
|
TJian
|
848a6438ae
|
[ROCm] Faster Custom Paged Attention kernels (#12348)
|
2025-03-03 09:24:45 -08:00 |
|
Cody Yu
|
f35f8e2242
|
[Build] Make sure local main branch is synced when VLLM_USE_PRECOMPILED=1 (#13921)
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
|
2025-03-03 16:43:14 +08:00 |
|
Harry Mellor
|
cf069aa8aa
|
Update deprecated Python 3.8 typing (#13971)
|
2025-03-02 17:34:51 -08:00 |
|
Ce Gao
|
bf33700ecd
|
[v0][structured output] Support reasoning output (#12955)
Signed-off-by: Ce Gao <cegao@tensorchord.ai>
|
2025-03-02 14:49:42 -05:00 |
|
Jee Jee Li
|
cc5e8f6db8
|
[Model] Add LoRA support for TransformersModel (#13770)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-03-02 09:17:34 +08:00 |
|
YajieWang
|
6a92ff93e1
|
[Misc][Kernel]: Add GPTQAllSpark Quantization (#12931)
|
2025-02-28 22:30:59 -08:00 |
|
Luka Govedič
|
bd56c983d6
|
[torch.compile] Fix RMSNorm + quant fusion in the non-cutlass-fp8 case, rename RedundantReshapesPass to NoopEliminationPass (#10902)
Signed-off-by: luka <luka@neuralmagic.com>
|
2025-02-28 16:20:11 -07:00 |
|
Chen Zhang
|
28943d36ce
|
[v1] Move block pool operations to a separate class (#13973)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
|
2025-02-28 20:53:31 +00:00 |
|
Chen Zhang
|
e7bd944e08
|
[v1] Cleanup the BlockTable in InputBatch (#13977)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2025-02-28 19:03:16 +00:00 |
|
Harry Mellor
|
4be4b26cb7
|
Fix entrypoint tests for embedding models (#14052)
|
2025-02-28 08:56:44 -08:00 |
|
Cyrus Leung
|
f7bee5c815
|
[VLM][Bugfix] Enable specifying prompt target via index (#14038)
|
2025-02-28 07:35:55 -08:00 |
|
Harry Mellor
|
76c89fcadd
|
Use smaller embedding model when not testing model specifically (#13891)
|
2025-02-28 00:50:43 -08:00 |
|
Travis Johnson
|
73e0225ee9
|
[Bugfix] Check that number of images matches number of <|image|> tokens with mllama (#13911)
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
|
2025-02-28 04:00:45 +00:00 |
|
Sage Moore
|
38acae6e97
|
[ROCm] Fix the Kernels, Core, and Prefix Caching AMD CI groups (#13970)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-02-27 20:31:47 +00:00 |
|
Cyrus Leung
|
f1579b229d
|
[VLM] Generalized prompt updates for multi-modal processor (#13964)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-02-27 17:44:25 +00:00 |
|
Isotr0py
|
edf309ebbe
|
[VLM] Support multimodal inputs for Florence-2 models (#13320)
|
2025-02-27 02:06:41 -08:00 |
|
Michael Goin
|
788f284b53
|
Fix test_block_fp8.py test for MoE (#13915)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-02-27 18:00:00 +08:00 |
|
Mark McLoughlin
|
cd711c48b2
|
[V1][Metrics] Handle preemptions (#13169)
|
2025-02-26 20:04:59 -08:00 |
|
Rui Qiao
|
c9944acbf9
|
[misc] Rename Ray ADAG to Compiled Graph (#13928)
|
2025-02-26 20:03:28 -08:00 |
|
Lucas Wilkinson
|
f95903909f
|
[Kernel] FlashMLA integration (#13747)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-02-27 10:35:08 +08:00 |
|
Wallas Henrique
|
4cb6fa0a9c
|
[Bugfix] Backend option to disable xgrammar any_whitespace (#12744)
Signed-off-by: Wallas Santos <wallashss@ibm.com>
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Co-authored-by: Joe Runde <Joseph.Runde@ibm.com>
|
2025-02-26 10:52:34 -08:00 |
|
Cyrus Leung
|
934bb99c71
|
[Bugfix] Update expected token counts for Ultravox tests (#13895)
|
2025-02-26 04:56:50 -08:00 |
|
Joe Runde
|
3f808cc044
|
[Bugfix] Do not crash V0 engine on input errors (#13101)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
|
2025-02-26 19:07:29 +08:00 |
|
Florian Greinacher
|
215bf150a6
|
[Bugfix] Handle None parameters in Mistral function calls. (#13786)
|
2025-02-26 03:06:21 -08:00 |
|
Cyrus Leung
|
7b700ec8c8
|
[Bugfix] Add test example for Ultravox v0.5 (#13890)
|
2025-02-26 02:31:43 -08:00 |
|
Roger Wang
|
7ca1da020f
|
[Misc] Fix input processing for Ultravox (#13871)
|
2025-02-25 23:56:34 -08:00 |
|
Jee Jee Li
|
5157338ed9
|
[Misc] Improve LoRA spelling (#13831)
|
2025-02-25 23:43:01 -08:00 |
|
Harry Mellor
|
145944cb94
|
Improve pipeline partitioning (#13839)
|
2025-02-25 18:53:56 -08:00 |
|