Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

4ea1f9678d [BugFix] Resolved Issues For LinearMethod --> QuantConfig (#4418) Robert Shaw 2024-04-27 14:35:33 -04:00
ba4be44c32 [BugFix] Fix return type of executor execute_model methods (#4402) Nick Hill 2024-04-27 11:17:45 -07:00
d6e520e170 [Core] Support offline use of local cache for models (#4374) Prashant Gupta 2024-04-27 09:59:55 -07:00
81661da7b2 [BugFix] Fix min_tokens when eos_token_id is None (#4389) Nick Hill 2024-04-27 09:52:46 -07:00
dfea173148 [Bugfix] Abort requests when the connection to /v1/completions is interrupted (#4363) Ruoyu Qin 2024-04-28 00:48:37 +08:00
7134303cbb [Bugfix][Core] Fix get decoding config from ray (#4335) Roy 2024-04-27 19:30:08 +08:00
3da24c2df7 [Model] Phi-3 4k sliding window temp. fix (#4380) Caio Mendes 2024-04-27 07:08:15 -03:00
eefeb16464 [Kernel] Full Tensor Parallelism for LoRA Layers (#3524) Austin Veselka 2024-04-27 02:03:48 -05:00
18d23f642a [ROCm][Hardware][AMD] Enable group query attention for triton FA (#4406) Hongxia Yang 2024-04-27 02:37:40 -04:00
87f545ba6f [Misc] Fix logger format typo (#4396) Roy 2024-04-27 13:45:02 +08:00
8947bc3c15 [Frontend][Bugfix] Disallow extra fields in OpenAI API (#4355) Cyrus Leung 2024-04-27 13:08:24 +08:00
12628d3c78 [Kernel] Optimize FP8 support for MoE kernel / Mixtral via static scales (#4343) Philipp Moritz 2024-04-26 21:49:59 -07:00
258a2c58d0 [Core] Introduce DistributedGPUExecutor abstract class (#4348) Nick Hill 2024-04-26 21:14:26 -07:00
aba47be3fe [Misc] add RFC issue template (#4401) youkaichao 2024-04-26 15:47:45 -07:00
a62aaf1df5 [Misc][Refactor] Generalize linear_method to be quant_method (#4373) Cody Yu 2024-04-26 13:41:14 -07:00
603ad84815 [Core] Refactoring sampler and support prompt logprob for chunked prefill (#4309) SangBin Cho 2024-04-26 22:02:02 +09:00
a88081bf76 [CI] Disable non-lazy string operation on logging (#4326) SangBin Cho 2024-04-26 16:16:58 +09:00
2f30e7c72f [Frontend] Add --log-level option to api server (#4377) Norman Mu 2024-04-25 22:36:01 -07:00
a74dee9b62 [Bugfix] Fix parameter name in get_tokenizer (#4107) Cyrus Leung 2024-04-26 10:10:48 +08:00
cf29b7eda4 [ROCm][Hardware][AMD][Doc] Documentation update for ROCm (#4376) Hongxia Yang 2024-04-25 21:12:25 -04:00
efffb63f58 [Core] Move function tracing setup to util function (#4352) Nick Hill 2024-04-25 16:45:12 -07:00
15e7c675b0 [Core] Add shutdown() method to ExecutorBase (#4349) Nick Hill 2024-04-25 16:32:48 -07:00
b6dcb4d442 [Misc] Fix flash attention backend log (#4368) Roy 2024-04-26 03:43:32 +08:00
b5b4a398a7 [Mypy] Typing lora folder (#4337) SangBin Cho 2024-04-26 04:13:50 +09:00
f4bc4de1b1 [Core]refactor aqlm quant ops (#4351) Kunshang Ji 2024-04-25 19:03:56 +00:00
bd7a8eef25 [Doc] README Phi-3 name fix. (#4372) Caio Mendes 2024-04-25 14:32:00 -03:00
7ee82bef1e [CI/Build] Adding functionality to reset the node's GPUs before processing. (#4213) Alexei-V-Ivanov-AMD 2024-04-25 11:37:20 -05:00
fbf152d976 [Bugfix][Model] Refactor OLMo model to support new HF format in transformers 4.40.0 (#4324) Isotr0py 2024-04-26 00:35:56 +08:00
479d69fad0 [Core] Move ray_utils.py from engine to executor package (#4347) Nick Hill 2024-04-24 23:52:22 -07:00
96e90fdeb3 [Model] Adds Phi-3 support (#4298) Caio Mendes 2024-04-25 00:06:57 -03:00
a395a638c2 [Misc] Use public API in benchmark_throughput (#4300) zifeitong 2024-04-24 14:10:24 -07:00
2768884ac4 [Doc] Add note for docker user (#4340) youkaichao 2024-04-24 14:09:44 -07:00
aae08249ac [Bugfix] Fix marlin kernel crash on H100 (#4218) alexm-nm 2024-04-24 13:35:01 -04:00
7923dcad12 [Misc] Update ShareGPT Dataset Sampling in Serving Benchmark (#4279) Roger Wang 2024-04-24 09:49:13 -07:00
3cd9b5bb2d [Core][Distributed] use existing torch.cuda.device (#4318) youkaichao 2024-04-24 09:00:20 -07:00
468d761b32 [Misc] Reduce supported Punica dtypes (#4304) (tag: v0.4.1) Woosuk Kwon 2024-04-23 18:54:33 -07:00
e4bf860a54 [CI][Build] change pynvml to nvidia-ml-py (#4302) youkaichao 2024-04-23 18:33:12 -07:00
91f50a6fe2 [Core][Distributed] use cpu/gloo to initialize pynccl (#4248) youkaichao 2024-04-23 18:32:19 -07:00
79a268c4ab [BUG] fixed fp8 conflict with aqlm (#4307) Robert Shaw 2024-04-23 21:26:33 -04:00
eace8bf0b9 [Kernel] FP8 support for MoE kernel / Mixtral (#4244) Philipp Moritz 2024-04-23 18:18:23 -07:00
1e8f4252aa [Bugfix][Frontend] Raise exception when file-like chat template fails to be opened (#4292) Cyrus Leung 2024-04-24 02:19:03 +08:00
2b7949c1c2 AQLM CUDA support (#3287) James Fleming 2024-04-23 13:59:33 -04:00
62b5166bd4 [CI] Add ccache for wheel builds job (#4281) Simon Mo 2024-04-23 09:51:41 -07:00
d86285a4a4 [Core][Logging] Add last frame information for better debugging (#4278) youkaichao 2024-04-23 09:45:52 -07:00
d87f39e9a9 [Bugfix] Add init_cached_hf_modules to RayWorkerWrapper (#4286) DefTruth 2024-04-24 00:28:35 +08:00
d3c8180ac4 [Bugfix] Fixing max token error message for openai compatible server (#4016) Jack Gordley 2024-04-23 12:06:29 +01:00
62b8aebc6f [Speculative decoding 7/9] Speculative decoding end-to-end correctness tests. (#3951) Cade Daniel 2024-04-23 01:02:36 -07:00
050f285ff6 [Core] Scheduling optimization 2 (#4280) SangBin Cho 2024-04-23 17:02:11 +09:00
8f2ea22bde [Core] Some simplification of WorkerWrapper changes (#4183) Nick Hill 2024-04-23 00:49:08 -07:00
0ae11f78ab [Mypy] Part 3 fix typing for nested directories for most of directory (#4161) SangBin Cho 2024-04-23 13:32:44 +09:00
34128a697e Fix autodoc directives (#4272) Harry Mellor 2024-04-23 02:53:01 +01:00
c1b4e4157c [Core][Distributed] use absolute path for library file (#4271) youkaichao 2024-04-22 17:21:48 -07:00
ceaf4ed003 [Doc] Update the SkyPilot doc with serving and Llama-3 (#4276) Zhanghao Wu 2024-04-22 15:34:31 -07:00
ad8d696a99 [Core] Scheduler perf fix (#4270) SangBin Cho 2024-04-23 06:11:06 +09:00
3d925165f2 Add example scripts to documentation (#4225) Harry Mellor 2024-04-22 17:36:54 +01:00
1543680691 [Bugfix] Ensure download_weights_from_hf(..) inside loader is using the revision parameter (#4217) alexm-nm 2024-04-22 12:10:48 -04:00
077f0a2e8a [Frontend] Enable support for CPU backend in AsyncLLMEngine. (#3993) Tao He 2024-04-22 17:19:51 +08:00
e73ed0f1c6 [Bugfix] Fix type annotations in CPU model runner (#4256) Woosuk Kwon 2024-04-22 00:54:16 -07:00
296cdf8ac7 [Misc] Add vision language model support to CPU backend (#3968) Isotr0py 2024-04-22 15:44:16 +08:00
747b1a7147 [Core][Distributed] fix _is_full_nvlink detection (#4233) youkaichao 2024-04-21 23:04:16 -07:00
95e5b087cf [AMD][Hardware][Misc][Bugfix] xformer cleanup and light navi logic and CI fixes and refactoring (#4129) Hongxia Yang 2024-04-22 00:57:24 -04:00
a37d815b83 Make initialization of tokenizer and detokenizer optional (#3748) GeauxEric 2024-04-21 15:06:46 -07:00
7f2593b164 [Doc]: Update the doc of adding new models (#4236) xiaoji 2024-04-22 00:57:08 +08:00
fe7d648fe5 Don't show default value for flags in EngineArgs (#4223) Harry Mellor 2024-04-21 17:15:28 +01:00
cc74b2b232 Updating lm-format-enforcer version and adding links to decoding libraries in docs (#4222) Noam Gat 2024-04-20 11:33:16 +03:00
91528575ec [Frontend] multiple sampling params support (#3570) nunjunj 2024-04-20 00:11:57 -07:00
a22cdea371 [Kernel][FP8] Initial support with dynamic per-tensor scaling (#4118) Cody Yu 2024-04-19 21:28:57 -07:00
682789d402 Fix missing docs and out of sync EngineArgs (#4219) Harry Mellor 2024-04-20 04:51:33 +01:00
138485a82d [Bugfix] Add fix for JSON whitespace (#4189) Ayush Rautwar 2024-04-19 23:49:22 -04:00
bc9df1571b Pass tokenizer_revision when getting tokenizer in openai serving (#4214) Chirag Jain 2024-04-20 05:43:56 +05:30
15b86408a8 [Misc] add nccl in collect env (#4211) youkaichao 2024-04-19 12:44:51 -07:00
7be4f5628f [Bugfix][Core] Restore logging of stats in the async engine (#4150) Ronen Schaffer 2024-04-19 18:08:26 +03:00
8f20fc04bf [Misc] fix docstrings (#4191) Uranus 2024-04-19 16:18:33 +08:00
221d93ecbf Bump version of 0.4.1 (#4177) Simon Mo 2024-04-19 01:00:22 -07:00
d17c8477f1 [Bugfix] Fix LoRA loading check (#4138) Jee Li 2024-04-19 15:59:54 +08:00
a134ef6f5e Support eos_token_id from generation_config.json (#4182) Simon Mo 2024-04-18 21:13:36 -07:00
8a7a3e4436 [Core] add an option to log every function call to for debugging hang/crash in distributed inference (#4079) youkaichao 2024-04-18 16:15:12 -07:00
8f9c28fd40 [Bugfix] Fix CustomAllreduce nvlink topology detection (#3974) Adam Tilghman 2024-04-18 15:32:47 -07:00
cd2f63fb36 [CI/CD] add neuron docker and ci test scripts (#3571) Liangfu Chen 2024-04-18 15:26:01 -07:00
87fa80c91f [Misc] Bump transformers to latest version (#4176) Nick Hill 2024-04-18 14:36:39 -07:00
e1bb2fd52d [Bugfix] Support logprobs when using guided_json and other constrained decoding fields (#4149) James Whedbee 2024-04-18 16:12:55 -05:00
705578ae14 [Docs] document that Meta Llama 3 is supported (#4175) Simon Mo 2024-04-18 10:55:48 -07:00
e8cc7967ff [Bugfix][Kernel] allow non-power-of-two head sizes in prefix prefill (#4128) Michał Moskal 2024-04-18 00:51:28 -07:00
53b018edcb [Bugfix] Get available quantization methods from quantization registry (#4098) Michael Goin 2024-04-18 03:21:55 -04:00
66ded03067 Allow model to be served under multiple names (#2894) Harry Mellor 2024-04-18 08:16:26 +01:00
6dc1fc9cfe [Core] nccl integrity check and test (#4155) youkaichao 2024-04-17 22:28:52 -07:00
533d2a1f39 [Typing] Mypy typing part 2 (#4043) SangBin Cho 2024-04-18 09:28:43 +09:00
a53222544c [Kernel] Add punica dimension for Swallow-MS-7B LoRA (#4134) Shoichi Uchinami 2024-04-18 02:02:45 +09:00
fe3b5bbc23 [Bugfix] fix output parsing error for trtllm backend (#4137) Elinx 2024-04-17 19:07:23 +08:00
8438e0569e [Core] RayWorkerVllm --> WorkerWrapper to reduce duplication (#4024) youkaichao 2024-04-17 01:34:33 -07:00
11d652bd4f [CI] Move CPU/AMD tests to after wait (#4123) Cade Daniel 2024-04-16 22:53:26 -07:00
d150e4f89f [Misc] [CI] Fix CI failure caught after merge (#4126) Cade Daniel 2024-04-16 17:56:01 -07:00
e95cd87959 [Speculative decoding 6/9] Integrate speculative decoding with LLMEngine (#3894) Cade Daniel 2024-04-16 13:09:21 -07:00
69e1d2fb69 [Core] Refactor model loading code (#4097) Antoni Baum 2024-04-16 11:34:39 -07:00
05434764cd LM Format Enforcer Guided Decoding Support (#3868) Noam Gat 2024-04-16 08:54:57 +03:00
4e7ee664e2 [Core] Fix engine-use-ray broken (#4105) SangBin Cho 2024-04-16 14:24:53 +09:00
37e84a403d [Typing] Fix Sequence type GenericAlias only available after Python 3.9. (#4092) SangBin Cho 2024-04-16 06:47:31 +09:00
4695397dcf [Bugfix] Fix ray workers profiling with nsight (#4095) Ricky Xu 2024-04-15 14:24:45 -07:00
d619ae2d19 [Doc] Add better clarity for tensorizer usage (#4090) Sanger Steel 2024-04-15 16:28:25 -04:00
eb46fbfda2 [Core] Simplifications to executor classes (#4071) Nick Hill 2024-04-15 13:05:09 -07:00