fzyzcjy
|
2b0879bfc2
|
Super tiny little typo fix (#10633)
|
2024-11-25 13:08:30 +00:00 |
|
Cyrus Leung
|
ed46f14321
|
[Model] Support is_causal HF config field for Qwen2 model (#10621)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-11-25 09:51:20 +00:00 |
|
youkaichao
|
05d1f8c9c6
|
[misc] move functions to config.py (#10624)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-25 09:27:30 +00:00 |
|
youkaichao
|
25d806e953
|
[misc] add torch.compile compatibility check (#10618)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-24 23:40:08 -08:00 |
|
youkaichao
|
65813781a2
|
[torch.compile] add warning for unsupported models (#10622)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-24 23:27:51 -08:00 |
|
Jee Jee Li
|
7c2134beda
|
[torch.compile] force inductor threads (#10620)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2024-11-24 23:04:21 -08:00 |
|
Cyrus Leung
|
a30a605d21
|
[Doc] Add encoder-based models to Supported Models page (#10616)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-11-25 06:34:07 +00:00 |
|
youkaichao
|
571841b7fc
|
[torch.compile] support encoder based models (#10613)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-25 05:24:33 +00:00 |
|
Mengqing Cao
|
7ea3cd7c3e
|
[Refactor][MISC] del redundant code in ParallelConfig.postinit (#10614)
Signed-off-by: MengqingCao <cmq0113@163.com>
|
2024-11-25 05:14:56 +00:00 |
|
Maximilien de Bayser
|
214efc2c3c
|
Support Cross encoder models (#10400)
Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: Flavia Beo <flavia.beo@ibm.com>
Co-authored-by: Flavia Beo <flavia.beo@ibm.com>
|
2024-11-24 18:56:20 -08:00 |
|
Zhuohan Li
|
49628fe13e
|
[Doc] Update README.md with Ray Summit talk links (#10610)
|
2024-11-24 16:45:09 -08:00 |
|
youkaichao
|
e4fbb14414
|
[doc] update the code to add models (#10603)
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2024-11-24 11:21:40 -08:00 |
|
youkaichao
|
c055747867
|
[model][utils] add extract_layer_index utility function (#10599)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-23 22:22:54 -08:00 |
|
youkaichao
|
eda2b3589c
|
Revert "Print running script to enhance CI log readability" (#10601)
|
2024-11-23 21:31:47 -08:00 |
|
Jee Jee Li
|
1c445dca51
|
[CI/Build] Print running script to enhance CI log readability (#10594)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2024-11-24 03:57:13 +00:00 |
|
Jee Jee Li
|
1700c543a5
|
[Bugfix] Fix LoRA weight sharding (#10450)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2024-11-23 17:23:17 -08:00 |
|
Jee Jee Li
|
17d8fc1806
|
[bugfix] Fix example/tensorize_vllm_model tests (#10595)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2024-11-23 17:22:33 -08:00 |
|
Isotr0py
|
04668ebe7a
|
[Bugfix] Avoid import AttentionMetadata explicitly in Mllama (#10593)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2024-11-23 18:12:20 +00:00 |
|
Nishidha
|
651f6c31ac
|
For ppc64le, disabled tests for now and addressed space issues (#10538)
|
2024-11-23 09:33:53 +00:00 |
|
JiHuazhong
|
86a44fb896
|
[Platforms] Refactor openvino code (#10573)
Signed-off-by: statelesshz <hzji210@gmail.com>
|
2024-11-22 22:23:12 -08:00 |
|
Isotr0py
|
4cfe5d2bca
|
[Bugfix] multi_modal_kwargs broadcast for CPU tensor parallel (#10541)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2024-11-22 21:25:46 -08:00 |
|
Cyrus Leung
|
c8acd80548
|
[2/N] handling placeholders in merged multi-modal processor (#10485)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-11-22 21:25:09 -08:00 |
|
Ricky Xu
|
4634a89d18
|
Prefix Cache Aware Scheduling [1/n] (#10128)
Signed-off-by: rickyx <rickyx@anyscale.com>
|
2024-11-22 21:15:55 -08:00 |
|
kliuae
|
7c25fe45a6
|
[AMD] Add support for GGUF quantization on ROCm (#10254)
|
2024-11-22 21:14:49 -08:00 |
|
Michael Goin
|
02a43f82a9
|
Update default max_num_batch_tokens for chunked prefill to 2048 (#10544)
|
2024-11-22 21:14:19 -08:00 |
|
Chen Wu
|
cfea9c04ef
|
[Model] Fix Baichuan BNB online quantization (#10572)
Signed-off-by: Chen Wu <cntryroa@gmail.com>
|
2024-11-22 21:13:59 -08:00 |
|
Varun Vinayak Shenoy
|
7d8ffb344f
|
[Bugfix] Internal Server Error when tool_choice is incorrect. (#10567)
Signed-off-by: Varun Shenoy <varun.vinayak.shenoy@oracle.com>
|
2024-11-22 21:13:29 -08:00 |
|
youkaichao
|
4aba6e3d1a
|
[core] gemma2 full context length support (#10584)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-22 20:13:54 -08:00 |
|
Tyler Michael Smith
|
978b39744b
|
[Misc] Add pynccl wrappers for all_gather and reduce_scatter (#9432)
|
2024-11-22 22:14:03 -05:00 |
|
Russell Bryant
|
ebda51968b
|
[Core] Fix broken log configuration (#10458)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2024-11-23 10:23:51 +08:00 |
|
Travis Johnson
|
9195dbdbca
|
[Bugfix][Frontend] Update Llama Chat Templates to also support Non-Tool use (#10164)
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
|
2024-11-23 10:17:38 +08:00 |
|
youkaichao
|
d559979c54
|
[bugfix] fix cpu tests (#10585)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-22 17:34:03 -08:00 |
|
Zhonghua Deng
|
d345f409b7
|
[V1] EngineCore supports profiling (#10564)
Signed-off-by: Abatom <abzhonghua@gmail.com>
|
2024-11-22 17:16:15 -08:00 |
|
Russell Bryant
|
28598f3939
|
[Core] remove temporary local variables in LLMEngine.__init__ (#10577)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2024-11-22 16:22:53 -08:00 |
|
zixuanzhang226
|
948c859571
|
support bitsandbytes quantization with qwen model (#10549)
Signed-off-by: Ubuntu <zixuanzhang@bytedance.com>
|
2024-11-22 16:16:14 -08:00 |
|
Ricky Xu
|
97814fbf0f
|
[v1] Refactor KVCacheManager for more hash input than token ids (#10507)
Signed-off-by: rickyx <rickyx@anyscale.com>
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
Co-authored-by: Cody Yu <hao.yu.cody@gmail.com>
|
2024-11-22 23:27:25 +00:00 |
|
youkaichao
|
eebad39f26
|
[torch.compile] support all attention backends (#10558)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-22 14:04:42 -08:00 |
|
youkaichao
|
db100c5cde
|
[bugfix] fix full graph tests (#10581)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-22 10:02:14 -08:00 |
|
Noam Gat
|
11fcf0e066
|
Remove token-adding chat embedding params (#10551)
Signed-off-by: Noam Gat <noamgat@gmail.com>
|
2024-11-21 23:59:47 -08:00 |
|
Isotr0py
|
b6374e09b0
|
[Bugfix] Fix Phi-3 BNB quantization with tensor parallel (#9948)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2024-11-22 15:01:56 +08:00 |
|
youkaichao
|
a111d0151f
|
[platforms] absorb worker cls difference into platforms folder (#10555)
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2024-11-21 21:00:32 -08:00 |
|
Woosuk Kwon
|
446c7806b2
|
[Minor] Fix line-too-long (#10563)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2024-11-21 19:40:40 -08:00 |
|
youkaichao
|
33e0a2540a
|
[9/N] torch.compile LLM usage (#10552)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-21 19:13:31 -08:00 |
|
Simon Mo
|
aed074860a
|
[Benchmark] Add new H100 machine (#10547)
|
2024-11-21 18:27:20 -08:00 |
|
Michael Goin
|
9afa014552
|
Add small example to metrics.rst (#10550)
|
2024-11-21 23:43:43 +00:00 |
|
Woosuk Kwon
|
46fe9b46d8
|
[Minor] Revert change in offline inference example (#10545)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2024-11-21 21:28:16 +00:00 |
|
youkaichao
|
cf656f5a02
|
[misc] improve error message (#10553)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-21 13:13:17 -08:00 |
|
Yunmeng
|
edec3385b6
|
[CI][Installation] Avoid uploading CUDA 11.8 wheel (#10535)
Signed-off-by: simon-mo <simon.mo@hey.com>
Co-authored-by: simon-mo <simon.mo@hey.com>
|
2024-11-21 13:03:58 -08:00 |
|
Woosuk Kwon
|
f9310cbd0c
|
[V1] Fix Compilation config & Enable CUDA graph by default (#10528)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2024-11-21 12:53:39 -08:00 |
|
youkaichao
|
7560ae5caf
|
[8/N] enable cli flag without a space (#10529)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-21 12:30:42 -08:00 |
|