Roger Wang
|
2f0a0a17a4
|
[V1] Refactor model executable interface for multimodal models (#10570)
Signed-off-by: Roger Wang <ywang@roblox.com>
|
2024-11-26 20:46:11 +00:00 |
|
Michael Goin
|
7576cd38df
|
[Bugfix] Check bnb_4bit_quant_storage for bitsandbytes (#10642)
|
2024-11-26 12:29:00 -08:00 |
|
Michael Goin
|
9a99273b48
|
[Bugfix] Fix using -O[0,3] with LLM entrypoint (#10677)
Signed-off-by: mgoin <michael@neuralmagic.com>
|
2024-11-26 10:44:01 -08:00 |
|
Conroy Cheers
|
f5792c7c4a
|
[Hardware][NVIDIA] Add non-NVML CUDA mode for Jetson (#9735)
Signed-off-by: Conroy Cheers <conroy@corncheese.org>
|
2024-11-26 10:26:28 -08:00 |
|
Murali Andoorveedu
|
db66e018ea
|
[Bugfix] Fix for Spec model TP + Chunked Prefill (#10232)
Signed-off-by: andoorve <37849411+andoorve@users.noreply.github.com>
Signed-off-by: Sourashis Roy <sroy@roblox.com>
Co-authored-by: Sourashis Roy <sroy@roblox.com>
|
2024-11-26 09:11:16 -08:00 |
|
Kunshang Ji
|
1f6584ee85
|
[V1] Enable profile for LLMEngine (#10665)
|
2024-11-26 10:36:45 +00:00 |
|
youkaichao
|
334d64d1e8
|
[ci] add vllm_test_utils (#10659)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-26 00:20:04 -08:00 |
|
Cyrus Leung
|
940635343a
|
[Misc] Remove outdated init protocols (#10655)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-11-26 14:55:00 +08:00 |
|
Sage Moore
|
9a88f89799
|
custom allreduce + torch.compile (#10121)
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
|
2024-11-25 22:00:16 -08:00 |
|
Ricky Xu
|
519e8e4182
|
[v1] EngineArgs for better config handling for v1 (#10382)
Signed-off-by: rickyx <rickyx@anyscale.com>
|
2024-11-25 21:09:43 -08:00 |
|
Sanket Kale
|
a6760f6456
|
[Feature] vLLM ARM Enablement for AARCH64 CPUs (#9228)
Signed-off-by: Sanket Kale <sanketk.kale@fujitsu.com>
Co-authored-by: Sanket Kale <sanketk.kale@fujitsu.com>
Co-authored-by: mgoin <michael@neuralmagic.com>
|
2024-11-25 18:32:39 -08:00 |
|
youkaichao
|
45ac4ff270
|
[bugfix] fix aria model and add torch.compile (#10645)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-25 18:32:09 -08:00 |
|
youkaichao
|
6e9ff050c8
|
[misc] do not read HOST_IP (#10644)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-25 17:04:50 -08:00 |
|
Shane A
|
9db713a1dc
|
[Model] Add OLMo November 2024 model (#10503)
|
2024-11-25 17:26:40 -05:00 |
|
Cyrus Leung
|
1b583cfefa
|
[Doc] Fix typos in docs (#10636)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-11-25 10:15:45 -08:00 |
|
Cyrus Leung
|
cf73f0c95e
|
[Model] Enable optional prefix when loading embedding models (#10639)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-11-25 18:14:33 +00:00 |
|
zhou fan
|
b1d920531f
|
[Model]: Add support for Aria model (#10514)
Signed-off-by: xffxff <1247714429@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
|
2024-11-25 18:10:55 +00:00 |
|
Simon Mo
|
452a4e80c3
|
[Docs] Add Snowflake Slides (#10641)
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2024-11-25 09:34:46 -08:00 |
|
Wallas Henrique
|
c27df94e1f
|
[Bugfix] Fix chunked prefill with model dtype float32 on Turing Devices (#9850)
Signed-off-by: Wallas Santos <wallashss@ibm.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
|
2024-11-25 12:23:32 -05:00 |
|
Chauncey
|
d04b13a380
|
[Bug]: Authorization ignored when root_path is set (#10606)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2024-11-25 16:21:41 +00:00 |
|
fzyzcjy
|
2b0879bfc2
|
Super tiny little typo fix (#10633)
|
2024-11-25 13:08:30 +00:00 |
|
Cyrus Leung
|
ed46f14321
|
[Model] Support is_causal HF config field for Qwen2 model (#10621)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-11-25 09:51:20 +00:00 |
|
youkaichao
|
05d1f8c9c6
|
[misc] move functions to config.py (#10624)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-25 09:27:30 +00:00 |
|
youkaichao
|
25d806e953
|
[misc] add torch.compile compatibility check (#10618)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-24 23:40:08 -08:00 |
|
youkaichao
|
65813781a2
|
[torch.compile] add warning for unsupported models (#10622)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-24 23:27:51 -08:00 |
|
Jee Jee Li
|
7c2134beda
|
[torch.compile] force inductor threads (#10620)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2024-11-24 23:04:21 -08:00 |
|
Cyrus Leung
|
a30a605d21
|
[Doc] Add encoder-based models to Supported Models page (#10616)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-11-25 06:34:07 +00:00 |
|
youkaichao
|
571841b7fc
|
[torch.compile] support encoder based models (#10613)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-25 05:24:33 +00:00 |
|
Mengqing Cao
|
7ea3cd7c3e
|
[Refactor][MISC] del redundant code in ParallelConfig.postinit (#10614)
Signed-off-by: MengqingCao <cmq0113@163.com>
|
2024-11-25 05:14:56 +00:00 |
|
Maximilien de Bayser
|
214efc2c3c
|
Support Cross encoder models (#10400)
Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: Flavia Beo <flavia.beo@ibm.com>
Co-authored-by: Flavia Beo <flavia.beo@ibm.com>
|
2024-11-24 18:56:20 -08:00 |
|
Zhuohan Li
|
49628fe13e
|
[Doc] Update README.md with Ray Summit talk links (#10610)
|
2024-11-24 16:45:09 -08:00 |
|
youkaichao
|
e4fbb14414
|
[doc] update the code to add models (#10603)
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2024-11-24 11:21:40 -08:00 |
|
youkaichao
|
c055747867
|
[model][utils] add extract_layer_index utility function (#10599)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-23 22:22:54 -08:00 |
|
youkaichao
|
eda2b3589c
|
Revert "Print running script to enhance CI log readability" (#10601)
|
2024-11-23 21:31:47 -08:00 |
|
Jee Jee Li
|
1c445dca51
|
[CI/Build] Print running script to enhance CI log readability (#10594)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2024-11-24 03:57:13 +00:00 |
|
Jee Jee Li
|
1700c543a5
|
[Bugfix] Fix LoRA weight sharding (#10450)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2024-11-23 17:23:17 -08:00 |
|
Jee Jee Li
|
17d8fc1806
|
[bugfix] Fix example/tensorize_vllm_model tests (#10595)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2024-11-23 17:22:33 -08:00 |
|
Isotr0py
|
04668ebe7a
|
[Bugfix] Avoid import AttentionMetadata explicitly in Mllama (#10593)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2024-11-23 18:12:20 +00:00 |
|
Nishidha
|
651f6c31ac
|
For ppc64le, disabled tests for now and addressed space issues (#10538)
|
2024-11-23 09:33:53 +00:00 |
|
JiHuazhong
|
86a44fb896
|
[Platforms] Refactor openvino code (#10573)
Signed-off-by: statelesshz <hzji210@gmail.com>
|
2024-11-22 22:23:12 -08:00 |
|
Isotr0py
|
4cfe5d2bca
|
[Bugfix] multi_modal_kwargs broadcast for CPU tensor parallel (#10541)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2024-11-22 21:25:46 -08:00 |
|
Cyrus Leung
|
c8acd80548
|
[2/N] handling placeholders in merged multi-modal processor (#10485)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-11-22 21:25:09 -08:00 |
|
Ricky Xu
|
4634a89d18
|
Prefix Cache Aware Scheduling [1/n] (#10128)
Signed-off-by: rickyx <rickyx@anyscale.com>
|
2024-11-22 21:15:55 -08:00 |
|
kliuae
|
7c25fe45a6
|
[AMD] Add support for GGUF quantization on ROCm (#10254)
|
2024-11-22 21:14:49 -08:00 |
|
Michael Goin
|
02a43f82a9
|
Update default max_num_batch_tokens for chunked prefill to 2048 (#10544)
|
2024-11-22 21:14:19 -08:00 |
|
Chen Wu
|
cfea9c04ef
|
[Model] Fix Baichuan BNB online quantization (#10572)
Signed-off-by: Chen Wu <cntryroa@gmail.com>
|
2024-11-22 21:13:59 -08:00 |
|
Varun Vinayak Shenoy
|
7d8ffb344f
|
[Bugfix] Internal Server Error when tool_choice is incorrect. (#10567)
Signed-off-by: Varun Shenoy <varun.vinayak.shenoy@oracle.com>
|
2024-11-22 21:13:29 -08:00 |
|
youkaichao
|
4aba6e3d1a
|
[core] gemma2 full context length support (#10584)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-22 20:13:54 -08:00 |
|
Tyler Michael Smith
|
978b39744b
|
[Misc] Add pynccl wrappers for all_gather and reduce_scatter (#9432)
|
2024-11-22 22:14:03 -05:00 |
|
Russell Bryant
|
ebda51968b
|
[Core] Fix broken log configuration (#10458)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2024-11-23 10:23:51 +08:00 |
|