biondizzle/vllm

Author	SHA1	Message	Date
Cyrus Leung	7fd3949a0b	[Frontend][Core] Move `merge_async_iterators` to utils (#4026)	2024-04-12 05:30:54 +00:00
Jee Li	1096717ae9	[Core] Support LoRA on quantized models (#4012)	2024-04-11 21:02:44 -07:00
Michael Feil	c2b4a1bce9	[Doc] Add typing hints / mypy types cleanup (#3816) Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>	2024-04-11 17:17:21 -07:00
Nick Hill	e46a60aa4c	[BugFix] Fix handling of stop strings and stop token ids (#3672)	2024-04-11 15:34:12 -07:00
Antoni Baum	1e96c3341a	Add extra punica sizes to support bigger vocabs (#4015)	2024-04-11 22:18:57 +00:00
Dylan Hawk	95e7d4a97c	Fix echo/logprob OpenAI completion bug (#3441) Co-authored-by: Dylan Hawk <dylanwawk@gmail.com>	2024-04-11 22:15:50 +00:00
youkaichao	559eb852f8	[Core] init_distributed_environment align with init_process_group(#4014) [Core][Distributed] make init_distributed_environment compatible with init_process_group (#4014)	2024-04-11 14:00:48 -07:00
Antoni Baum	a10d3056da	[Core] Set `linear_weights` directly on the layer (#3977)	2024-04-11 16:35:51 -04:00
bigPYJ1151	8afca50889	[Hardware][Intel] Isolate CPUModelRunner and ModelRunner for better maintenance (#3824)	2024-04-11 11:56:49 -07:00
fuchen.ljl	08ccee1e83	punica fix-bgmv-kernel-640 (#4007)	2024-04-11 08:59:26 -07:00
Roger Wang	c1dc547129	[Kernel] Fused MoE Config for Mixtral 8x22 (#4002)	2024-04-11 07:50:00 -07:00
youkaichao	f3d0bf7589	[Doc][Installation] delete python setup.py develop (#3989)	2024-04-11 03:33:02 +00:00
Kunshang Ji	e9da5a40c6	[Misc] Add indirection layer for custom ops (#3913)	2024-04-10 20:26:07 -07:00
SangBin Cho	e42df7227d	[Test] Add xformer and flash attn tests (#3961) Co-authored-by: Simon Mo <simon.mo@hey.com>	2024-04-11 03:09:50 +00:00
youkaichao	caada5e50a	[Core][Model] torch.compile for layernorm in commandr (#3985) [Core][Model] Use torch.compile to accelerate layernorm in commandr (#3985)	2024-04-11 01:48:26 +00:00
SangBin Cho	67b4221a61	[Core][5/N] Fully working chunked prefill e2e (#3884)	2024-04-10 17:56:48 -07:00
youkaichao	63e7176f26	[Core][Refactor] move parallel_utils into vllm/distributed (#3950) [WIP][Core][Refactor] move vllm/model_executor/parallel_utils into vllm/distributed and vllm/device_communicators (#3950)	2024-04-10 15:33:30 -07:00
Travis Johnson	934d3662f7	[Bugfix] handle hf_config with architectures == None (#3982) Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com> Co-authored-by: Simon Mo <simon.mo@hey.com>	2024-04-10 22:28:25 +00:00
Frαnçois	92cd2e2f21	[Doc] Fix getting stared to use publicly available model (#3963)	2024-04-10 18:05:52 +00:00
Daniel E Marasco	e4c4072c94	[Bugfix] Remove key sorting for `guided_json` parameter in OpenAi compatible Server (#3945)	2024-04-10 10:15:51 -07:00
youkaichao	e35397468f	[Doc] Add doc to state our model support policy (#3948) Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>	2024-04-10 17:03:02 +00:00
James Whedbee	8b317c6dd0	[Model][AMD] ROCm support for 256 head dims for Gemma (#3972)	2024-04-10 08:12:00 -07:00
Woosuk Kwon	bd3c144e0b	[Bugfix][ROCm] Add numba to Dockerfile.rocm (#3962)	2024-04-10 07:37:17 -07:00
Travis Johnson	0258b7a94b	[Bugfix] handle prompt_logprobs in _apply_min_tokens_penalty (#3876) Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>	2024-04-10 01:39:56 -07:00
胡译文	b3104b2a10	[Bugfix] Fix logits processor when prompt_logprobs is not None (#3899)	2024-04-10 00:09:36 -07:00
zhaotyer	c2e00af523	[Bugfix] fix utils.py/merge_dict func TypeError: 'type' object is not subscriptable (#3955) Co-authored-by: tianyi_zhao <tianyi.zhao@transwarp.io>	2024-04-10 04:49:11 +00:00
Zedong Peng	c013d32c75	[Benchmark] Add cpu options to bench scripts (#3915)	2024-04-09 21:30:03 -07:00
Jee Li	11dd6ebb89	[Misc] Avoid loading incorrect LoRA config (#3777)	2024-04-09 19:47:15 -07:00
Juan Villamizar	6c0b04515f	[ROCm][Hardware][AMD] Use Triton Kernel for default FA on ROCm (#3643) Co-authored-by: jpvillam <jpvillam@amd.com> Co-authored-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2024-04-09 15:10:47 -07:00
Junichi Sato	e23a43aef8	[Bugfix] Fix KeyError on loading GPT-NeoX (#3925)	2024-04-09 12:11:31 -07:00
Cade Daniel	e7c7067b45	[Misc] [Core] Implement RFC "Augment BaseExecutor interfaces to enable hardware-agnostic speculative decoding" (#3837)	2024-04-09 11:44:15 -07:00
youkaichao	6d592eb430	[Core] separate distributed_init from worker (#3904)	2024-04-09 08:49:02 +00:00
Roy	d036198e23	[BugFix][Model] Fix commandr RoPE max_position_embeddings (#3919)	2024-04-09 06:17:21 +08:00
Matt Wong	59a6abf3c9	[Hotfix][CI/Build][Kernel] CUDA 11.8 does not support layernorm optimizations (#3782)	2024-04-08 14:31:02 -07:00
Kiran R	bc0c0192d1	[Bugfix] Enable Proper `attention_bias` Usage in Llama Model Configuration (#3767) Co-authored-by: roy <jasonailu87@gmail.com>	2024-04-08 19:42:35 +00:00
egortolmachev	f46864d68d	[Bugfix] Added Command-R GPTQ support (#3849) Co-authored-by: Egor Tolmachev <t333ga@gmail.com>	2024-04-08 14:59:38 +00:00
ywfang	b4543c8f6b	[Model] add minicpm (#3893)	2024-04-08 18:28:36 +08:00
Isotr0py	0ce0539d47	[Bugfix] Fix Llava inference with Tensor Parallelism. (#3883)	2024-04-07 22:54:13 +08:00
youkaichao	2f19283549	[Core] latency optimization (#3890)	2024-04-06 19:14:06 -07:00
youkaichao	95baec828f	[Core] enable out-of-tree model register (#3871)	2024-04-06 17:11:41 -07:00
youkaichao	e4be7d70bb	[CI/Benchmark] add more iteration and use median for robust latency benchmark (#3889)	2024-04-06 21:32:30 +00:00
Isotr0py	54951ac4bf	[Bugfix] Fix incorrect output on OLMo models in Tensor Parallelism (#3869)	2024-04-05 12:02:09 -07:00
SangBin Cho	18de883489	[Chunked Prefill][4/n] Chunked prefill scheduler. (#3853)	2024-04-05 10:17:58 -07:00
Thomas Parnell	1d7c940d74	Add option to completion API to truncate prompt tokens (#3144)	2024-04-05 10:15:42 -07:00
Woosuk Kwon	cfaf49a167	[Misc] Define common requirements (#3841)	2024-04-05 00:39:17 -07:00
Noam Gat	9edec652e2	[Bugfix] Fixing requirements.txt (#3865)	2024-04-04 23:46:01 -07:00
Cade Daniel	e0dd4d3589	[Misc] Fix linter issues in examples/fp8/quantizer/quantize.py (#3864)	2024-04-04 21:57:33 -07:00
Cade Daniel	e5043a3e75	[Misc] Add pytest marker to opt-out of global test cleanup (#3863)	2024-04-04 21:54:16 -07:00
youkaichao	d03d64fd2e	[CI/Build] refactor dockerfile & fix pip cache [CI/Build] fix pip cache with vllm_nccl & refactor dockerfile to build wheels (#3859)	2024-04-04 21:53:16 -07:00
Sean Gallen	78107fa091	[Doc]Add asynchronous engine arguments to documentation. (#3810) Co-authored-by: Simon Mo <simon.mo@hey.com> Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>	2024-04-04 21:52:01 -07:00

1087 Commits