Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

7cbd9ec7a9 [Model] Initialize support for InternVL2 series models (#6514) Isotr0py 2024-07-29 18:16:30 +08:00
3eeb148f46 [Misc] Pass cutlass_fp8_supported correctly in fbgemm_fp8 (#6871) Elsa Granger 2024-07-28 23:13:49 +08:00
b1366a9534 Add Nemotron to PP_SUPPORTED_MODELS (#6863) Michael Goin 2024-07-27 18:05:17 -04:00
75acdaa4b6 [Kernel] Increase precision of GPTQ/AWQ Marlin kernel (#6795) Alexander Matveev 2024-07-27 17:52:33 -04:00
fad5576c58 [TPU] Reduce compilation time & Upgrade PyTorch XLA version (#6856) Woosuk Kwon 2024-07-27 10:28:33 -07:00
f954d0715c [Docs] Add RunLLM chat widget (#6857) Chenggang Wu 2024-07-27 09:24:46 -07:00
1ad86acf17 [Model] Initial support for BLIP-2 (#5920) Cyrus Leung 2024-07-27 19:53:07 +08:00
ecb33a28cb [CI/Build][Doc] Update CI and Doc for VLM example changes (#6860) Roger Wang 2024-07-27 02:54:14 -07:00
a57d75821c [bugfix] make args.stream work (#6831) Wang Ran (汪然) 2024-07-27 17:07:02 +08:00
925de97e05 [Bugfix] Fix VLM example typo (#6859) Roger Wang 2024-07-26 23:24:08 -07:00
aa46953a20 [Misc][VLM][Doc] Consolidate offline examples for vision language models (#6858) Roger Wang 2024-07-26 22:44:13 -07:00
593e79e733 [Bugfix] torch.set_num_threads() in multiproc_gpu_executor (#6802) Travis Johnson 2024-07-26 23:15:20 -06:00
c53041ae3b [Doc] Add missing mock import to docs conf.py (#6834) Harry Mellor 2024-07-27 05:47:33 +01:00
52f07e3dec [Hardware][TPU] Implement tensor parallelism with Ray (#5871) Woosuk Kwon 2024-07-26 20:54:27 -07:00
14dbd5a767 [Model] H2O Danube3-4b (#6451) Joe 2024-07-26 20:47:50 -07:00
ed94e4f427 [Bugfix][Model] Jamba assertions and no chunked prefill by default for Jamba (#6784) tomeras91 2024-07-27 06:45:31 +03:00
3c3012398e [Doc] add VLLM_TARGET_DEVICE=neuron to documentation for neuron (#6844) omrishiv 2024-07-26 20:20:16 -07:00
ced36cd89b [ROCm] Upgrade PyTorch nightly version (#6845) Woosuk Kwon 2024-07-26 20:16:13 -07:00
969d032265 [Bugfix]: Fix Tensorizer test failures (#6835) Sanger Steel 2024-07-26 23:02:25 -04:00
55712941e5 [Bug Fix] Illegal memory access, FP8 Llama 3.1 405b (#6852) Lucas Wilkinson 2024-07-26 22:27:44 -04:00
981b0d5673 [Frontend] Factor out code for running uvicorn (#6828) Cyrus Leung 2024-07-27 09:58:25 +08:00
d09b94ca58 [TPU] Support collective communications in XLA devices (#6813) Woosuk Kwon 2024-07-26 18:45:57 -07:00
bb5494676f enforce eager mode with bnb quantization temporarily (#6846) chenqianfzh 2024-07-26 18:32:20 -07:00
b5f49ee55b Update README.md (#6847) Gurpreet Singh Dhami 2024-07-26 20:26:45 -04:00
150a1ffbfd [Doc] Update SkyPilot doc for wrong indents and instructions for update service (#4283) Zhanghao Wu 2024-07-26 17:39:10 -04:00
281977bd6e [Doc] Add Nemotron to supported model docs (#6843) Michael Goin 2024-07-26 17:32:44 -04:00
3bbb4936dc [Hardware] [Intel] Enable Multiprocessing and tensor parallel in CPU backend and update documentation (#6125) Li, Jiang 2024-07-27 04:50:10 +08:00
aa4867791e [Misc][TPU] Support TPU in initialize_ray_cluster (#6812) Woosuk Kwon 2024-07-26 12:39:49 -07:00
71734f1bf2 [Build/CI][ROCm] Minor simplification to Dockerfile.rocm (#6811) Woosuk Kwon 2024-07-26 12:28:32 -07:00
50704f52c4 [Bugfix][Kernel] Promote another index to int64_t (#6838) Tyler Michael Smith 2024-07-26 14:41:04 -04:00
07278c37dd [Model] Support Nemotron models (Nemotron-3, Nemotron-4, Minitron) (#6611) Michael Goin 2024-07-26 14:33:42 -04:00
85ad7e2d01 [doc][debugging] add known issues for hangs (#6816) youkaichao 2024-07-25 21:48:05 -07:00
89a84b0bb7 [Core] Use array to speedup padding (#6779) Peng Guanwen 2024-07-26 12:31:31 +08:00
084a01fd35 [Bugfix] [Easy] Fixed a bug in the multiprocessing GPU executor. (#6770) Anthony Platanios 2024-07-26 00:25:35 -04:00
062a1d0fab Fix ReplicatedLinear weight loading (#6793) QQSong 2024-07-25 19:24:58 -07:00
2eb9f4ff26 [ci] Mark tensorizer as soft fail and separate from grouped test (#6810) Kevin H. Luu 2024-07-25 18:08:33 -07:00
443c7cf4cf [ci][distributed] fix flaky tests (#6806) youkaichao 2024-07-25 17:44:09 -07:00
1adddb14bf [Core] Fix ray forward_dag error mssg (#6792) SangBin Cho 2024-07-25 16:53:25 -07:00
b7215de2c5 [Docs] Publish 5th meetup slides (#6799) Woosuk Kwon 2024-07-25 16:47:55 -07:00
f3ff63c3f4 [doc][distributed] improve multinode serving doc (#6804) youkaichao 2024-07-25 15:38:32 -07:00
cd7edc4e87 [Bugfix] Fix empty (nullptr) channelwise scales when loading wNa16 using compressed tensors (#6798) Lucas Wilkinson 2024-07-25 18:05:09 -04:00
6a1e25b151 [Doc] Add documentations for nightly benchmarks (#6412) Kuntai Du 2024-07-25 11:57:16 -07:00
95db75de64 [Bugfix] Add synchronize to prevent possible data race (#6788) Tyler Michael Smith 2024-07-25 13:40:01 -04:00
65b1f121c8 [Bugfix] Fix kv_cache_dtype=fp8 without scales for FP8 checkpoints (#6761) Michael Goin 2024-07-25 12:46:15 -04:00
889da130e7 [ Misc ] fp8-marlin channelwise via compressed-tensors (#6524) Robert Shaw 2024-07-25 09:46:04 -07:00
b75e314fff [Bugfix] Add image placeholder for OpenAI Compatible Server of MiniCPM-V (#6787) Alphi 2024-07-26 00:42:49 +08:00
316a41ac1d [Bugfix] Fix encoding_format in examples/openai_embedding_client.py (#6755) Chang Su 2024-07-24 22:48:07 -07:00
0310029a2f [Bugfix] Fix awq_marlin and gptq_marlin flags (#6745) Alexander Matveev 2024-07-25 01:34:11 -04:00
309aaef825 [Bugfix] Fix decode tokens w. CUDA graph (#6757) Cody Yu 2024-07-24 22:33:56 -07:00
9e169a4c61 [Model] Adding support for MiniCPM-V (#4087) Alphi 2024-07-25 11:59:30 +08:00
5689e256ba [Frontend] Represent tokens with identifiable strings (#6626) Evan Z. Liu 2024-07-24 18:51:00 -07:00
740374d456 [core][distributed] fix zmq hang (#6759) youkaichao 2024-07-24 17:37:12 -07:00
d88c458f44 [Doc][AMD][ROCm]Added tips to refer to mi300x tuning guide for mi300x users (#6754) Hongxia Yang 2024-07-24 17:32:57 -04:00
421e218b37 [Bugfix] Bump transformers to 4.43.2 (#6752) Michael Goin 2024-07-24 16:22:16 -04:00
5448f67635 [Core] Tweaks to model runner/input builder developer APIs (#6712) Antoni Baum 2024-07-24 12:17:12 -07:00
0e63494cf3 Add fp8 support to reshape_and_cache_flash (#6667) Antoni Baum 2024-07-24 11:36:52 -07:00
ee812580f7 [Frontend] split run_server into build_server and run_server (#6740) Daniele 2024-07-24 19:36:04 +02:00
40468b13fa [Bugfix] Miscalculated latency lead to time_to_first_token_seconds inaccurate. (#6686) Allen.Dou 2024-07-24 23:58:42 +08:00
2cf0df3381 [Bugfix] Fix speculative decode seeded test (#6743) Nick Hill 2024-07-24 08:58:31 -07:00
545146349c Adding f-string to validation error which is missing (#6748) LF Marques 2024-07-24 16:55:53 +01:00
f4f8a9d892 [Bugfix]fix modelscope compatible issue (#6730) liuyhwangyh 2024-07-24 20:04:46 +08:00
b570811706 [Build/CI] Update run-amd-test.sh. Enable Docker Hub login. (#6711) Alexei-V-Ivanov-AMD 2024-07-24 07:01:14 -05:00
ccc4a73257 [Docs][ROCm] Detailed instructions to build from source (#6680) Woosuk Kwon 2024-07-24 01:07:23 -07:00
0a740a11ba [Bugfix] Fix token padding for chameleon (#6724) Roger Wang 2024-07-24 01:05:09 -07:00
c882a7f5b3 [SpecDecoding] Update MLPSpeculator CI tests to use smaller model (#6714) Nick Hill 2024-07-24 00:34:22 -07:00
5e8ca973eb [Bugfix] fix flashinfer cudagraph capture for PP (#6708) William Lin 2024-07-23 18:49:44 -07:00
87525fab92 [bitsandbytes]: support read bnb pre-quantized model (#5753) dongmao zhang 2024-07-23 16:45:09 -07:00
2f808e69ab [Bugfix] StatLoggers: cache spec decode metrics when they get collected. (#6645) Thomas Parnell 2024-07-24 01:05:05 +02:00
01c16ede6b [CI] Add smoke test for non-uniform AutoFP8 quantization (#6702) Michael Goin 2024-07-23 18:45:12 -04:00
72fc704803 [build] relax wheel size limit (#6704) youkaichao 2024-07-23 14:03:49 -07:00
1bedf210e3 Bump transformers version for Llama 3.1 hotfix and patch Chameleon (#6690) Roger Wang 2024-07-23 13:47:48 -07:00
507ef787d8 [Model] Pipeline Parallel Support for DeepSeek v2 (#6519) Travis Johnson 2024-07-23 13:22:09 -06:00
58f53034ad [Frontend] Add Usage data in each chunk for chat_serving. #6540 (#6652) Yehoshua Cohen 2024-07-23 21:41:55 +03:00
0eb0757bef [Misc] Add ignored layers for fp8 quantization (#6657) Michael Goin 2024-07-23 14:04:04 -04:00
38c4b7e863 Bump version to 0.5.3.post1 (#6696) v0.5.3.post1 Simon Mo 2024-07-23 10:08:59 -07:00
a112a84aad [BugFix] Fix RoPE error in Llama 3.1 (#6693) Woosuk Kwon 2024-07-23 09:46:05 -07:00
461089a21a [Bugfix] Fix a log error in chunked prefill (#6694) Woosuk Kwon 2024-07-23 09:27:58 -07:00
71950af726 [doc][distributed] fix doc argument order (#6691) youkaichao 2024-07-23 08:55:33 -07:00
cb1362a889 [Docs] Announce llama3.1 support (#6688) Woosuk Kwon 2024-07-23 08:18:15 -07:00
bb2fc08072 Bump version to v0.5.3 (#6674) v0.5.3 Simon Mo 2024-07-23 00:00:08 -07:00
3eda4ec780 support ignore patterns in model loader (#6673) Simon Mo 2024-07-22 23:59:42 -07:00
22fa2e35cb [VLM][Model] Support image input for Chameleon (#6633) Roger Wang 2024-07-22 23:50:48 -07:00
c5201240a4 [misc] only tqdm for first rank (#6672) youkaichao 2024-07-22 21:57:27 -07:00
97234be0ec [Misc] Manage HTTP connections in one place (#6600) Cyrus Leung 2024-07-23 12:32:02 +08:00
c051bfe4eb [doc][distributed] doc for setting up multi-node environment (#6529) youkaichao 2024-07-22 21:22:09 -07:00
9e0b558a09 [Misc] Support FP8 kv cache scales from compressed-tensors (#6528) Michael Goin 2024-07-23 00:11:50 -04:00
e519ae097a add tqdm when loading checkpoint shards (#6569) zhaotyer 2024-07-23 11:48:01 +08:00
7c2749a4fd [misc] add start loading models for users information (#6670) youkaichao 2024-07-22 20:08:02 -07:00
729171ae58 [Misc] Enable chunked prefill by default for long context models (#6666) Woosuk Kwon 2024-07-22 20:03:13 -07:00
c5e8330997 [Bugfix] Fix null modules_to_not_convert in FBGEMM Fp8 quantization (#6665) Cheng Li 2024-07-22 19:25:05 -07:00
e0c15758b8 [Core] Modulize prepare input and attention metadata builder (#6596) Cody Yu 2024-07-22 17:45:24 -07:00
bdf5fd1386 [Misc] Remove deprecation warning for beam search (#6659) Woosuk Kwon 2024-07-22 17:21:58 -07:00
5a96ee52a3 [ci][build] add back vim in docker (#6661) youkaichao 2024-07-22 16:26:29 -07:00
42c7f66a38 [Core] Support dynamically loading Lora adapter from HuggingFace (#6234) Jiaxin Shan 2024-07-22 15:42:40 -07:00
69d5ae38dc [ci] Use different sccache bucket for CUDA 11.8 wheel build (#6656) Kevin H. Luu 2024-07-22 14:20:41 -07:00
fea59c7712 [Bugfix][Kernel] Use int64_t for indices in fp8 quant kernels (#6649) Tyler Michael Smith 2024-07-22 16:08:30 -04:00
739b61a348 [Frontend] Refactor prompt processing (#4028) Cyrus Leung 2024-07-23 01:13:53 +08:00
89c1c6a196 [Bugfix] Fix vocab_size field access in llava_next.py (#6624) Jae-Won Chung 2024-07-22 01:02:51 -04:00
42de2cefcb [Misc] Add a wrapper for torch.inference_mode (#6618) Woosuk Kwon 2024-07-21 18:43:11 -07:00
c9eef37f32 [Model] Initial Support for Chameleon (#5770) Roger Wang 2024-07-21 17:37:51 -07:00
