Commit Graph

298 Commits

Author SHA1 Message Date
Dong-Yong Lee
e11222333f fix: bug fix when penalties are negative (#913)
Co-authored-by: dongyong-lee <dongyong.lee@navercorp.com>
2023-09-01 00:37:17 +09:00
Aman Gupta Karmani
28873a2799 Improve _prune_hidden_states micro-benchmark (#707) 2023-08-31 13:28:43 +09:00
JFDuan
0d93f15694 Accelerate LLaMA model loading (#234) 2023-08-30 01:00:13 -07:00
lplcor
becd7a56f1 Enable request body OpenAPI spec for OpenAI endpoints (#865) 2023-08-29 21:54:08 -07:00
Aman Gupta Karmani
75471386de use flash-attn via xformers (#877) 2023-08-29 21:52:13 -07:00
Zhuohan Li
d2b2eed67c [Fix] Fix a condition for ignored sequences (#867) 2023-08-27 23:00:56 -07:00
Antoni Baum
4b6f069b6f Add support for CodeLlama (#854) 2023-08-25 12:44:07 -07:00
Woosuk Kwon
791d79de32 Bump up the version to v0.1.4 (#846)
2023-08-25 12:28:00 +09:00
Woosuk Kwon
94d2f59895 Set replacement=True in torch.multinomial (#858) 2023-08-25 12:22:01 +09:00
wenjun93
75c0ca9d43 Clean up code (#844) 2023-08-23 16:44:15 -07:00
Woosuk Kwon
2a4ec90854 Fix for breaking changes in xformers 0.0.21 (#834) 2023-08-23 17:44:21 +09:00
Woosuk Kwon
d64bf1646c Implement approximate GELU kernels (#828) 2023-08-23 07:43:21 +09:00
Wen Sun
eedac9dba0 fix: revert code to avoid no attribute problem (#827) 2023-08-22 11:55:16 -07:00
shunxing1234
ad5f2fe34c Add support for aquila (#663)
* add aquila

Signed-off-by: ftgreat <ftgreat@163.com>

* fix some bug

Signed-off-by: shunxing1234 <xw747777271@gmail.com>

* delete pdb

Signed-off-by: shunxing1234 <xw747777271@gmail.com>

* fix bugs

Signed-off-by: shunxing1234 <xw747777271@gmail.com>

* fix bugs

Signed-off-by: shunxing1234 <xw747777271@gmail.com>

* delete whitespace

Signed-off-by: shunxing1234 <xw747777271@gmail.com>

* format

* fix order

---------

Signed-off-by: ftgreat <ftgreat@163.com>
Signed-off-by: shunxing1234 <xw747777271@gmail.com>
Co-authored-by: ftgreat <ftgreat@163.com>
2023-08-22 00:13:36 -07:00
zhaoyang-star
4f8584756d Fix mqa is false case in gpt_bigcode (#806) 2023-08-21 22:22:06 -07:00
wangcx18
0c04ce3234 Fix typo in sampling_params.py (#788) 2023-08-18 10:12:46 +09:00
Xinyu Yang
73b3de79ea explicitly del state (#784) 2023-08-17 12:56:04 -07:00
Abraham-Xu
d1744376ae Align with huggingface Top K sampling (#753) 2023-08-15 16:44:33 -07:00
Ikko Eltociear Ashimine
805de738f6 Fix typo in tokenizer.py (#750)
conjuction -> conjunction
2023-08-14 22:26:36 -07:00
WanMok
e06f504a76 Supports tokens and arrays of tokens as inputs to the OpenAI completion API (#715) 2023-08-11 12:14:34 -07:00
WRH
462ae5220a [Fix] unwantted bias in InternLM Model (#740) 2023-08-11 11:40:37 -07:00
Nicolas Basile
66c54aa9c3 Check the max prompt length for the OpenAI completions API (#472) 2023-08-08 17:43:49 -07:00
Jia Guoqing
735ecfff61 add internlm model (#528) 2023-08-08 16:35:06 -07:00
Qing
a57d13cc96 add QWen-7b (#685)
Co-authored-by: wq.chu <wq.chu@tianrang-inc.com>
2023-08-08 13:50:38 -07:00
Wen Sun
621980bdc0 fix: incorrect bigcode attention heads num (#676) 2023-08-04 10:35:22 -07:00
Zhuohan Li
aa84c92ef6 Bump up version to 0.1.3 (#657) 2023-08-02 16:46:53 -07:00
Zhuohan Li
f7389f4763 [Doc] Add Baichuan 13B to supported models (#656) 2023-08-02 16:45:12 -07:00
Woosuk Kwon
55fe8a81ec Refactor scheduler (#658) 2023-08-02 16:42:01 -07:00
YHPeter
e8ddc08ec8 [BUG FIX] upgrade fschat version to 0.2.23 (#650)
Co-authored-by: hao.yu <hao.yu@cn-c017.server.mila.quebec>
2023-08-02 14:05:59 -07:00
Zhuohan Li
1b0bd0fe8a Add Falcon support (new) (#592) 2023-08-02 14:04:39 -07:00
Lily Liu
20044cab7a Fix log message in scheduler (#652) 2023-08-02 13:35:10 -07:00
Song
64f23c2900 fix baichuan for different position embedding for 7b and 13b models (#643) 2023-08-01 22:22:51 -07:00
Qing
d4c7755ca8 fix biachuan-7b tp (#598)
Co-authored-by: wq.chu <wq.chu@tianrang-inc.com>
2023-08-01 15:41:36 -07:00
Chaofan Lin
aa39e42c5a fix doc (#622) 2023-07-31 13:11:57 -07:00
Fang li
953f28cf9a fix ModuleNotFoundError (#599)
Co-authored-by: fangli <fangli@tencent.com>
2023-07-29 20:52:41 -07:00
Xudong Zhang
c0d00f5be6 [Fix] fix import error of RayWorker (#604) (#605) 2023-07-27 23:37:40 -07:00
Zhuohan Li
58a072be15 [Fix] Add model sequence length into model config (#575) 2023-07-25 23:46:30 -07:00
Zhuohan Li
82ad323dee [Fix] Add chat completion Example and simplify dependencies (#576) 2023-07-25 23:45:48 -07:00
MoeedDar
2d867b55fa fixed tensor parallel is not defined (#564) 2023-07-25 14:16:51 -07:00
Zhuohan Li
7d5a155e4a [Fix] Fix GPTBigcoder for distributed execution (#503) 2023-07-24 18:36:33 -07:00
leegohi04517
1dde34e0f8 GPTJConfig has no attribute rotary. (#532) 2023-07-24 11:29:30 -07:00
Zhuohan Li
6fc2a38b11 Add support for LLaMA-2 (#505) 2023-07-20 11:38:27 -07:00
Antoni Baum
c487a221ee Fix bad assert in initialize_cluster if PG already exists (#526) 2023-07-19 23:17:12 -07:00
Antoni Baum
9925c17940 Ray placement group support (#397) 2023-07-19 22:49:31 -07:00
Ricardo Lu
8c4b2592fb fix: enable trust-remote-code in api server & benchmark. (#509) 2023-07-19 17:06:15 -07:00
Massimiliano Pronesti
16c3e295a8 fix(ray_utils): ignore re-init error (#465) 2023-07-19 17:01:19 -07:00
Song
bda41c70dd hotfix attn alibi wo head mapping (#496)
Co-authored-by: oliveryuan <oliveryuan@basemind.com>
2023-07-18 11:31:48 -07:00
MoeedDar
328d231c17 Fixed old name reference for max_seq_len 2023-07-18 16:47:59 +01:00
Lily Liu
b4b195b360 fix max seq len (#489) 2023-07-17 23:20:20 -07:00
codethazine
20b0d88d16 Add support for baichuan (#365) 2023-07-17 13:50:55 -07:00