Woosuk Kwon
|
3a765bd5e1
|
Temporarily enforce eager mode for GPTQ models (#2154)
|
2023-12-17 01:51:12 -08:00 |
|
Woosuk Kwon
|
c3372e87be
|
Remove dependency on CuPy (#2152)
|
2023-12-17 01:49:07 -08:00 |
|
Woosuk Kwon
|
e1d5402238
|
Fix all-reduce memory usage (#2151)
|
2023-12-17 01:44:45 -08:00 |
|
Woosuk Kwon
|
3d1cfbfc74
|
[Minor] Delete Llama tokenizer warnings (#2146)
|
2023-12-16 22:05:18 -08:00 |
|
Woosuk Kwon
|
37ca558103
|
Optimize model execution with CUDA graph (#1926)
Co-authored-by: Chen Shen <scv119@gmail.com>
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
|
2023-12-16 21:12:08 -08:00 |
|
Roy
|
eed74a558f
|
Simplify weight loading logic (#2133)
|
2023-12-16 12:41:23 -08:00 |
|
Woosuk Kwon
|
2acd76f346
|
[ROCm] Temporarily remove GPTQ ROCm support (#2138)
|
2023-12-15 17:13:58 -08:00 |
|
CHU Tianxiang
|
0fbfc4b81b
|
Add GPTQ support (#916)
|
2023-12-15 03:04:22 -08:00 |
|
Yunfeng Bai
|
c06170cc8e
|
Add a flag to include stop string in output text (#1976)
|
2023-12-15 00:45:58 -08:00 |
|
mezuzza
|
6774bd50b0
|
Fix typing in AsyncLLMEngine & add toml to requirements-dev (#2100)
|
2023-12-14 00:19:41 -08:00 |
|
Woosuk Kwon
|
31c1f3255e
|
Bump up to v0.2.5 (#2095)
|
2023-12-13 23:56:15 -08:00 |
|
Antoni Baum
|
21d93c140d
|
Optimize Mixtral with expert parallelism (#2090)
|
2023-12-13 23:55:07 -08:00 |
|
Woosuk Kwon
|
f1c8520146
|
[BugFix] Fix input positions for long context with sliding window (#2088)
|
2023-12-13 12:28:13 -08:00 |
|
Woosuk Kwon
|
518369d78c
|
Implement lazy model loader (#2044)
|
2023-12-12 22:21:45 -08:00 |
|
Woosuk Kwon
|
30bad5c492
|
Fix peak memory profiling (#2031)
|
2023-12-12 22:01:53 -08:00 |
|
Megha Agarwal
|
6428f1d051
|
Support MPT with GQA (#1938)
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2023-12-12 10:16:05 -08:00 |
|
Woosuk Kwon
|
cb3f30c600
|
Upgrade transformers version to 4.36.0 (#2046)
|
2023-12-11 18:39:14 -08:00 |
|
Woosuk Kwon
|
31d2ab4aff
|
Remove python 3.10 requirement (#2040)
|
2023-12-11 12:26:42 -08:00 |
|
Woosuk Kwon
|
4dd4b5c538
|
Bump up to v0.2.4 (#2034)
|
2023-12-11 11:49:39 -08:00 |
|
Woosuk Kwon
|
6120e5aaea
|
Fix import error msg for megablocks (#2038)
|
2023-12-11 11:40:56 -08:00 |
|
Woosuk Kwon
|
81ce2a4b26
|
[Minor] Fix type annotation in Mixtral (#2036)
|
2023-12-11 11:32:39 -08:00 |
|
Woosuk Kwon
|
b9bcdc7158
|
Change the load format to pt for Mixtral (#2028)
|
2023-12-11 10:32:17 -08:00 |
|
Woosuk Kwon
|
4ff0203987
|
Minor fixes for Mixtral (#2015)
|
2023-12-11 09:16:15 -08:00 |
|
Pierre Stock
|
b5f882cc98
|
Mixtral 8x7B support (#2011)
Co-authored-by: Pierre Stock <p@mistral.ai>
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
|
2023-12-11 01:09:15 -08:00 |
|
Simon Mo
|
2e8fc0d4c3
|
Fix completion API echo and logprob combo (#1992)
|
2023-12-10 13:20:30 -08:00 |
|
wbn
|
dacaf5a400
|
Replace head_mapping params with num_kv_heads to attention kernel. (#1997)
Co-authored-by: wangguoya <wangguoya@baidu.com>
Co-authored-by: Yang Zhao <zhaoyangstar@foxmail.com>
|
2023-12-10 10:12:53 -08:00 |
|
Woosuk Kwon
|
24cde76a15
|
[Minor] Add comment on skipping rope caches (#2004)
|
2023-12-10 10:04:12 -08:00 |
|
Jin Shang
|
1aa1361510
|
Fix OpenAI server completion_tokens referenced before assignment (#1996)
|
2023-12-09 21:01:21 -08:00 |
|
Woosuk Kwon
|
fe470ae5ad
|
[Minor] Fix code style for baichuan (#2003)
|
2023-12-09 19:24:29 -08:00 |
|
Jun Gao
|
3a8c2381f7
|
Fix for KeyError on Loading LLaMA (#1978)
|
2023-12-09 15:59:57 -08:00 |
|
firebook
|
2b981012a6
|
Fix Baichuan2-7B-Chat (#1987)
|
2023-12-08 09:38:36 -08:00 |
|
TJian
|
6ccc0bfffb
|
Merge EmbeddedLLM/vllm-rocm into vLLM main (#1836)
Co-authored-by: Philipp Moritz <pcmoritz@gmail.com>
Co-authored-by: Amir Balwel <amoooori04@gmail.com>
Co-authored-by: root <kuanfu.liu@akirakan.com>
Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: kuanfu <kuanfu.liu@embeddedllm.com>
Co-authored-by: miloice <17350011+kliuae@users.noreply.github.com>
|
2023-12-07 23:16:52 -08:00 |
|
Jie Li
|
ebede26ebf
|
Make InternLM follow rope_scaling in config.json (#1956)
Co-authored-by: lijie8 <lijie8@sensetime.com>
|
2023-12-07 08:32:08 -08:00 |
|
dancingpipi
|
1d9b737e05
|
Support ChatGLMForConditionalGeneration (#1932)
Co-authored-by: shujunhua1 <shujunhua1@jd.com>
|
2023-12-05 10:52:48 -08:00 |
|
Roy
|
60dc62dc9e
|
add custom server params (#1868)
|
2023-12-03 12:59:18 -08:00 |
|
Woosuk Kwon
|
0f90effc66
|
Bump up to v0.2.3 (#1903)
|
2023-12-03 12:27:47 -08:00 |
|
Woosuk Kwon
|
464dd985e3
|
Fix num_gpus when TP > 1 (#1852)
|
2023-12-03 12:24:30 -08:00 |
|
Woosuk Kwon
|
9b294976a2
|
Add PyTorch-native implementation of custom layers (#1898)
|
2023-12-02 21:18:40 -08:00 |
|
Simon Mo
|
5313c2cb8b
|
Add Production Metrics in Prometheus format (#1890)
|
2023-12-02 16:37:44 -08:00 |
|
Woosuk Kwon
|
5f09cbdb63
|
Fix broken sampler tests (#1896)
Co-authored-by: Antoni Baum <antoni.baum@protonmail.com>
|
2023-12-02 16:06:17 -08:00 |
|
Jerry
|
f86bd6190a
|
Fix the typo in SamplingParams' docstring (#1886)
|
2023-12-01 02:06:36 -08:00 |
|
Woosuk Kwon
|
e5452ddfd6
|
Normalize head weights for Baichuan 2 (#1876)
|
2023-11-30 20:03:58 -08:00 |
|
Woosuk Kwon
|
d06980dfa7
|
Fix Baichuan tokenizer error (#1874)
|
2023-11-30 18:35:50 -08:00 |
|
Adam Brusselback
|
66785cc05c
|
Support chat template and echo for chat API (#1756)
|
2023-11-30 16:43:13 -08:00 |
|
Roy
|
d27f4bae39
|
Fix rope cache key error (#1867)
|
2023-11-30 08:29:28 -08:00 |
|
Jee Li
|
63b2206ad0
|
Avoid multiple instantiations of the RoPE class (#1828)
|
2023-11-29 23:06:27 -08:00 |
|
Woosuk Kwon
|
27feead2f8
|
Refactor Worker & InputMetadata (#1843)
|
2023-11-29 22:16:37 -08:00 |
|
Michael McCulloch
|
c782195662
|
Disable Logs Requests should Disable Logging of requests. (#1779)
Co-authored-by: Michael McCulloch <mjm.gitlab@fastmail.com>
|
2023-11-29 21:50:02 -08:00 |
|
Woosuk Kwon
|
a9e4574261
|
Refactor Attention (#1840)
|
2023-11-29 15:37:31 -08:00 |
|
FlorianJoncour
|
0229c386c5
|
Better integration with Ray Serve (#1821)
Co-authored-by: FlorianJoncour <florian@zetta-sys.com>
|
2023-11-29 13:25:43 -08:00 |
|