Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

27ca23dc00 Remove exclude_unset in streaming response (#3143) Seonghyeon 2024-03-02 02:59:06 +09:00
54d3544784 Fix: Output text is always truncated in some models (#3016) Sherry 2024-03-01 15:52:22 +08:00
703e42ee4b Add guided decoding for OpenAI API server (#2819) felixzhu555 2024-02-29 14:13:08 -08:00
29a8d6a554 [Fix] Don't deep-copy LogitsProcessors when copying SamplingParams (#3099) Nick Hill 2024-02-29 11:20:42 -08:00
2c08ff23c0 Fix building from source on WSL (#3112) Billy Cao 2024-03-01 03:13:58 +08:00
bfdcfa6a05 Support starcoder2 architecture (#3089) Seonghyeon 2024-02-29 17:51:48 +09:00
9289e577ec add cache_config's info to prometheus metrics. (#3100) Allen.Dou 2024-02-29 14:15:18 +08:00
a6d471c759 Fix: AttributeError in OpenAI-compatible server (#3018) Jae-Won Chung 2024-02-29 01:04:07 -05:00
01a5d18a53 Add Support for 2/3/8-bit GPTQ Quantization Models (#2330) CHU Tianxiang 2024-02-29 13:52:23 +08:00
929b4f2973 Add LoRA support for Gemma (#3050) Woosuk Kwon 2024-02-28 13:03:28 -08:00
3b7178cfa4 [Neuron] Support inference with transformers-neuronx (#2569) Liangfu Chen 2024-02-28 09:34:34 -08:00
e46fa5d52e Restrict prometheus_client >= 0.18.0 to prevent errors when importing pkgs (#3070) Allen.Dou 2024-02-28 13:38:26 +08:00
a8683102cc multi-lora documentation fix (#3064) Ganesh Jagadeesan 2024-02-28 00:26:15 -05:00
71bcaf99e2 Enable GQA support in the prefix prefill kernels (#3007) Tao He 2024-02-27 17:14:31 +08:00
8b430d7dea [Minor] Fix StableLMEpochForCausalLM -> StableLmForCausalLM (#3046) Woosuk Kwon 2024-02-26 20:23:50 -08:00
e0ade06d63 Support logit bias for OpenAI API (#3027) Dylan Hawk 2024-02-26 19:51:53 -08:00
4bd18ec0c7 [Minor] Fix type annotation in fused moe (#3045) Woosuk Kwon 2024-02-26 19:44:29 -08:00
2410e320b3 fix get_ip error in pure ipv6 environment (#2931) Jingru 2024-02-27 11:22:16 +08:00
48a8f4a7fd Support Orion model (#2539) 张大成 2024-02-27 11:17:06 +08:00
4dd6416faf Fix stablelm (#3038) Roy 2024-02-27 10:31:10 +08:00
c1c0d00b88 Don't use cupy when enforce_eager=True (#3037) Roy 2024-02-27 09:33:38 +08:00
d9f726c4d0 [Minor] Remove unused config files (#3039) Roy 2024-02-27 09:25:22 +08:00
d6e4a130b0 [Minor] Remove gather_cached_kv kernel (#3043) Woosuk Kwon 2024-02-26 15:00:54 -08:00
cfc15a1031 Optimize Triton MoE Kernel (#2979) Philipp Moritz 2024-02-26 13:48:56 -08:00
70f3e8e3a1 Add LogProbs for Chat Completions in OpenAI (#2918) Jared Moore 2024-02-25 18:39:34 -08:00
ef978fe411 Port metrics from aioprometheus to prometheus_client (#2730) Harry Mellor 2024-02-25 19:54:00 +00:00
f7c1234990 [Fix] Fix assertion on YaRN model len (#2984) Woosuk Kwon 2024-02-23 12:57:48 -08:00
57f044945f Fix nvcc not found in vllm-openai image (#2781) zhaoyang-star 2024-02-23 06:25:07 +08:00
4caf7044e0 Include tokens from prompt phase in counter_generation_tokens (#2802) Ronen Schaffer 2024-02-23 00:00:12 +02:00
6f32cddf1c Remove Flash Attention in test env (#2982) Woosuk Kwon 2024-02-22 09:58:29 -08:00
c530e2cfe3 [FIX] Fix a bug in initializing Yarn RoPE (#2983) 44670 2024-02-22 17:40:05 +08:00
fd5dcc5c81 Optimize GeGLU layer in Gemma (#2975) Woosuk Kwon 2024-02-21 20:17:52 -08:00
93dc5a2870 chore(vllm): codespell for spell checking (#2820) Massimiliano Pronesti 2024-02-22 02:56:01 +00:00
95529e3253 Use Llama RMSNorm custom op for Gemma (#2974) Woosuk Kwon 2024-02-21 18:28:23 -08:00
344020c926 Migrate MistralForCausalLM to LlamaForCausalLM (#2868) Roy 2024-02-22 10:25:05 +08:00
5574081c49 Added early stopping to completion APIs (#2939) Mustafa Eyceoz 2024-02-21 21:24:01 -05:00
d7f396486e Update comment (#2934) Ronen Schaffer 2024-02-22 04:18:37 +02:00
8fbd84bf78 Bump up version to v0.3.2 (#2968) [tag: v0.3.2] Zhuohan Li 2024-02-21 11:47:25 -08:00
7d2dcce175 Support per-request seed (#2514) Nick Hill 2024-02-21 11:47:00 -08:00
dc903e70ac [ROCm] Upgrade transformers to v4.38.0 (#2967) Woosuk Kwon 2024-02-21 09:46:57 -08:00
a9c8212895 [FIX] Add Gemma model to the doc (#2966) Zhuohan Li 2024-02-21 09:46:15 -08:00
c20ecb6a51 Upgrade transformers to v4.38.0 (#2965) Woosuk Kwon 2024-02-21 09:38:03 -08:00
5253edaacb Add Gemma model (#2964) Xiang Xu 2024-02-21 09:34:30 -08:00
017d9f1515 Add metrics to RequestOutput (#2876) Antoni Baum 2024-02-20 21:55:57 -08:00
181b27d881 Make vLLM logging formatting optional (#2877) Antoni Baum 2024-02-20 14:38:55 -08:00
63e2a6419d [FIX] Fix beam search test (#2930) Zhuohan Li 2024-02-20 14:37:39 -08:00
264017a2bf [ROCm] include gfx908 as supported (#2792) James Whedbee 2024-02-19 19:58:59 -06:00
e433c115bc Fix vllm:prompt_tokens_total metric calculation (#2869) Ronen Schaffer 2024-02-19 09:55:41 +02:00
86fd8bb0ac Add warning to prevent changes to benchmark api server (#2858) Simon Mo 2024-02-18 21:36:19 -08:00
ab3a5a8259 Support OLMo models. (#2832) Isotr0py 2024-02-19 13:05:15 +08:00
a61f0521b8 [Test] Add basic correctness test (#2908) Zhuohan Li 2024-02-18 16:44:50 -08:00
537c9755a7 [Minor] Small fix to make distributed init logic in worker looks cleaner (#2905) Zhuohan Li 2024-02-18 14:39:00 -08:00
786b7f18a5 Add code-revision config argument for Hugging Face Hub (#2892) Mark Mozolewski 2024-02-17 22:36:53 -08:00
8f36444c4f multi-LoRA as extra models in OpenAI server (#2775) jvmncs 2024-02-17 15:00:48 -05:00
185b2c29e2 Defensively copy sampling_params (#2881) Nick Hill 2024-02-17 11:18:04 -08:00
5f08050d8d Bump up to v0.3.1 (#2887) [tag: v0.3.1] Woosuk Kwon 2024-02-16 15:05:18 -08:00
64da65b322 Prefix Caching- fix t4 triton error (#2517) shiyi.c_98 2024-02-16 14:17:55 -08:00
5255d99dc5 [ROCm] Dockerfile fix for flash-attention build (#2885) Hongxia Yang 2024-02-15 13:22:39 -05:00
4f2ad11135 Fix DeciLM (#2883) Philipp Moritz 2024-02-14 22:29:57 -08:00
d7afab6d3a [BugFix] Fix GC bug for LLM class (#2882) Woosuk Kwon 2024-02-14 22:17:44 -08:00
31348dff03 Align LoRA code between Mistral and Mixtral (fixes #2875) (#2880) Philipp Moritz 2024-02-14 16:00:43 -08:00
25e86b6a61 Don't use cupy NCCL for AMD backends (#2855) Woosuk Kwon 2024-02-14 12:30:44 -08:00
4efbac6d35 Migrate AquilaForCausalLM to LlamaForCausalLM (#2867) Roy 2024-02-15 04:30:24 +08:00
87069ccf68 Fix docker python version (#2845) Nikola Borisov 2024-02-14 10:17:57 -08:00
7e45107f51 [Fix] Fix memory profiling when GPU is used by multiple processes (#2863) Woosuk Kwon 2024-02-13 19:52:34 -08:00
0c48b37c31 Fix internlm after https://github.com/vllm-project/vllm/pull/2860 (#2861) Philipp Moritz 2024-02-13 18:01:15 -08:00
7eacffd951 Migrate InternLMForCausalLM to LlamaForCausalLM (#2860) Philipp Moritz 2024-02-13 17:12:05 -08:00
2a543d6efe Add LoRA support for Mixtral (#2831) Terry 2024-02-13 15:55:45 -08:00
317b29de0f Remove Yi model definition, please use LlamaForCausalLM instead (#2854) Philipp Moritz 2024-02-13 14:22:22 -08:00
a463c333dd Use CuPy for CUDA graphs (#2811) Woosuk Kwon 2024-02-13 11:32:06 -08:00
ea356004d4 Revert "Refactor llama family models (#2637)" (#2851) Philipp Moritz 2024-02-13 09:24:59 -08:00
5c976a7e1a Refactor llama family models (#2637) Roy 2024-02-13 16:09:23 +08:00
f964493274 [CI] Ensure documentation build is checked in CI (#2842) Simon Mo 2024-02-12 22:53:07 -08:00
a4211a4dc3 Serving Benchmark Refactoring (#2433) Roger Wang 2024-02-12 22:53:00 -08:00
563836496a Refactor 2 awq gemm kernels into m16nXk32 (#2723) Rex 2024-02-12 11:02:17 -08:00
4ca2c358b1 Add documentation section about LoRA (#2834) Philipp Moritz 2024-02-12 08:24:45 -08:00
0580aab02f [ROCm] support Radeon™ 7900 series (gfx1100) without using flash-attention (#2768) Hongxia Yang 2024-02-11 02:14:37 -05:00
3711811b1d Disable custom all reduce by default (#2808) Woosuk Kwon 2024-02-08 09:58:03 -08:00
65b89d16ee [Ray] Integration compiled DAG off by default (#2471) SangBin Cho 2024-02-09 02:57:25 +09:00
931746bc6d Add documentation on how to do incremental builds (#2796) Philipp Moritz 2024-02-07 14:42:02 -08:00
c81dddb45c [ROCm] Fix build problem resulted from previous commit related to FP8 kv-cache support (#2790) Hongxia Yang 2024-02-07 01:36:59 -05:00
fe6d09ae61 [Minor] More fix of test_cache.py CI test failure (#2750) Lily Liu 2024-02-06 11:38:38 -08:00
ed70c70ea3 modelscope: fix issue when model parameter is not a model id but path of the model. (#2489) liuyhwangyh 2024-02-07 01:57:15 +08:00
f0d4e14557 Add fused top-K softmax kernel for MoE (#2769) Woosuk Kwon 2024-02-05 17:38:02 -08:00
2ccee3def6 [ROCm] Fixup arch checks for ROCM (#2627) Douglas Lehr 2024-02-05 16:59:09 -06:00
b92adec8e8 Set local logging level via env variable (#2774) Lukas 2024-02-05 23:26:50 +01:00
56f738ae9b [ROCm] Fix some kernels failed unit tests (#2498) Hongxia Yang 2024-02-05 17:25:36 -05:00
72d3a30c63 [Minor] Fix benchmark_latency script (#2765) Woosuk Kwon 2024-02-05 12:45:37 -08:00
c9b45adeeb Require triton >= 2.1.0 (#2746) whyiug 2024-02-05 15:07:36 +08:00
5a6c81b051 Remove eos tokens from output by default (#2611) Rex 2024-02-04 14:32:42 -08:00
51cd22ce56 set&get llm internal tokenizer instead of the TokenizerGroup (#2741) dancingpipi 2024-02-05 06:25:36 +08:00
5ed704ec8c docs: fix langchain (#2736) Massimiliano Pronesti 2024-02-04 03:17:55 +01:00
4abf6336ec Add one example to run batch inference distributed on Ray (#2696) Cheng Su 2024-02-02 15:41:42 -08:00
0e163fce18 Fix default length_penalty to 1.0 (#2667) zspo 2024-02-02 07:59:39 +08:00
96b6f475dd Remove hardcoded device="cuda" to support more devices (#2503) Kunshang Ji 2024-02-02 07:46:39 +08:00
c410f5d020 Use revision when downloading the quantization config file (#2697) Pernekhan Utemuratov 2024-02-01 15:41:58 -08:00
bb8c697ee0 Update README for meetup slides (#2718) Simon Mo 2024-02-01 14:56:53 -08:00
b9e96b17de fix python 3.8 syntax (#2716) Simon Mo 2024-02-01 14:00:58 -08:00
923797fea4 Fix compile error when using rocm (#2648) zhaoyang-star 2024-02-02 01:35:09 +08:00
cd9e60c76c Add Internlm2 (#2666) Fengzhe Zhou 2024-02-02 01:27:40 +08:00
