Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

789937af2e [Doc] [SpecDecode] Update MLPSpeculator documentation (#7100) Thomas Parnell 2024-08-06 01:29:43 +02:00
dfb1a15dcb [ci][frontend] deduplicate tests (#7101) youkaichao 2024-08-05 15:59:22 -07:00
4db5176d97 bump version to v0.5.4 (#7139) v0.5.4 Simon Mo 2024-08-05 14:39:48 -07:00
4cf1dc39be [Bugfix][CI/Build] Fix CUTLASS FetchContent (#7171) Tyler Michael Smith 2024-08-05 17:22:57 -04:00
6e4852ce28 [CI/Build] Suppress divide-by-zero and missing return statement warnings (#7001) Tyler Michael Smith 2024-08-05 16:00:01 -04:00
8571ac4672 [Kernel] Update CUTLASS to 3.5.1 (#7085) Tyler Michael Smith 2024-08-05 15:13:43 -04:00
997cf78308 [Misc] Fix typo in GroupCoordinator.recv() (#7167) Rui Qiao 2024-08-05 11:10:16 -07:00
57f560aa23 [BugFix] Use args.trust_remote_code (#7121) Aditya Paliwal 2024-08-05 09:26:14 -07:00
003f8ee128 [BugFix] Use IP4 localhost form for zmq bind (#7163) Nick Hill 2024-08-05 08:41:03 -07:00
e9630458c7 [SpecDecode] Support FlashInfer in DraftModelRunner (#6926) Bongwon Jang 2024-08-06 00:05:05 +09:00
82a1b1a82b [Speculative decoding] Add periodic log with time spent in proposal/scoring/verification (#6963) Cade Daniel 2024-08-05 01:46:44 -07:00
c0d8f1636c [Model] SiglipVisionModel ported from transformers (#6942) Jungho Christopher Cho 2024-08-05 15:22:12 +09:00
cc08fc7225 [Frontend] Reapply "Factor out code for running uvicorn" (#7095) Cyrus Leung 2024-08-05 11:40:51 +08:00
7b86e7c9cd [Model] Add multi-image support for minicpmv (#7122) Alphi 2024-08-05 09:23:17 +08:00
f80ab3521c Clean up remaining Punica C information (#7027) Jee Jee Li 2024-08-05 06:37:08 +08:00
16a1cc9bb2 [misc][distributed] improve libcudart.so finding (#7127) youkaichao 2024-08-04 11:31:51 -07:00
b1c9aa3daa [Bugfix] [SpecDecode] Default speculative_draft_tensor_parallel_size to 1 when using MLPSpeculator (#7105) Thomas Parnell 2024-08-04 16:13:18 +02:00
179a6a36f2 [Model]Refactor MiniCPMV (#7020) Jee Jee Li 2024-08-04 16:12:41 +08:00
83c644fe7e [core][misc] simply output processing with shortcut code path (#7117) youkaichao 2024-08-04 00:22:19 -07:00
9fadc7b7a0 [misc] add zmq in collect env (#7119) youkaichao 2024-08-03 22:03:46 -07:00
654bc5ca49 Support for guided decoding for offline LLM (#6878) Yihuan Bu 2024-08-03 23:12:09 -04:00
825b044863 [Frontend] Warn if user max_model_len is greater than derived max_model_len (#7080) Jeff Fialho 2024-08-03 20:01:38 -03:00
44dcb52e39 [ci][test] finalize fork_new_process_for_each_test (#7114) youkaichao 2024-08-03 10:44:53 -07:00
67d745cc68 [CI] Temporarily turn off H100 performance benchmark (#7104) Kuntai Du 2024-08-02 23:52:44 -07:00
99d7cabd7b [LoRA] ReplicatedLinear support LoRA (#7081) Jee Jee Li 2024-08-03 13:40:19 +08:00
fb2c1c86c1 [Bugfix] Fix block table for seqs that have prefix cache hits (#7018) Zach Zheng 2024-08-02 22:38:15 -07:00
0c25435daa [Model] Refactor and decouple weight loading logic for InternVL2 model (#7067) Isotr0py 2024-08-03 13:36:14 +08:00
a0d164567c [ci][distributed] disable ray dag tests (#7099) youkaichao 2024-08-02 22:32:04 -07:00
04e5583425 [ci][distributed] merge distributed test commands (#7097) youkaichao 2024-08-02 21:33:53 -07:00
8c025fa703 [Frontend] Factor out chat message parsing (#7055) Cyrus Leung 2024-08-03 12:31:27 +08:00
69ea15e5cc [ci][distributed] shorten wait time if server hangs (#7098) youkaichao 2024-08-02 21:05:16 -07:00
ed812a73fa [ Frontend ] Multiprocessing for OpenAI Server with zeromq (#6883) Robert Shaw 2024-08-02 21:27:28 -04:00
708989341e [misc] add a flag to enable compile (#7092) youkaichao 2024-08-02 16:18:45 -07:00
22e718ff1a [Misc] Revive to use loopback address for driver IP (#7091) Rui Qiao 2024-08-02 15:50:00 -07:00
05308891e2 [Core] Pipeline parallel with Ray ADAG (#6837) Rui Qiao 2024-08-02 13:55:40 -07:00
a8d604ca2a [Misc] Disambiguate quantized types via a new ScalarType (#6396) Lucas Wilkinson 2024-08-02 16:51:58 -04:00
b482b9a5b1 [CI/Build] Add support for Python 3.12 (#7035) Michael Goin 2024-08-02 16:51:22 -04:00
806949514a [ci] set timeout for test_oot_registration.py (#7082) youkaichao 2024-08-02 10:03:24 -07:00
c16eaac500 [Hardware][Intel CPU] Update torch 2.4.0 for CPU backend (#6931) Jie Fu (傅杰) 2024-08-02 23:55:58 +08:00
db35186391 [Core] Comment out unused code in sampler (#7023) Peng Guanwen 2024-08-02 15:58:26 +08:00
660dea1235 [cuda][misc] remove error_on_invalid_device_count_status (#7069) youkaichao 2024-08-02 00:14:21 -07:00
cf2a1a4d9d Fix tracing.py (#7065) Bongwon Jang 2024-08-02 15:28:00 +09:00
252357793d [ci][distributed] try to fix pp test (#7054) youkaichao 2024-08-01 22:03:12 -07:00
3bb4b1e4cd [mypy] Speed up mypy checking (#7056) Cyrus Leung 2024-08-02 10:49:43 +08:00
954f7305a1 [Kernel] Fix input for flashinfer prefill wrapper. (#7008) Lily Liu 2024-08-01 18:44:16 -07:00
6ce01f3066 [Performance] Optimize get_seqs (#7051) Woosuk Kwon 2024-08-01 18:29:52 -07:00
6a11fdfbb8 [CI/Build][Bugfix] Fix CUTLASS header-only line (#7034) Tyler Michael Smith 2024-08-01 16:51:15 -04:00
805a8a75f2 [Misc] Support attention logits soft-capping with flash-attn (#7022) Woosuk Kwon 2024-08-01 13:14:37 -07:00
562e580abc Update run-amd-test.sh (#7044) omkar kakarparthi 2024-08-01 15:12:37 -05:00
fc912e0886 [Models] Support Qwen model with PP (#6974) Murali Andoorveedu 2024-08-01 12:40:43 -07:00
f4fd390f5d [Bugfix] Lower gemma's unloaded_params exception to warning (#7002) Michael Goin 2024-08-01 15:01:07 -04:00
fb3db61688 [CI/Build] Remove sparseml requirement from testing (#7037) Michael Goin 2024-08-01 15:00:51 -04:00
2dd34371a6 [Bugfix] Fix RMSNorm forward in InternViT attention qk_layernorm (#6992) Isotr0py 2024-08-02 03:00:28 +08:00
7e0861bd0b [CI/Build] Update PyTorch to 2.4.0 (#6951) Sage Moore 2024-08-01 11:11:24 -07:00
a72a424b3e [Build/CI] Fixing Docker Hub quota issue. (#7043) Alexei-V-Ivanov-AMD 2024-08-01 13:07:37 -05:00
c8a7e93273 [core][scheduler] simplify and improve scheduler (#6867) youkaichao 2024-07-31 23:51:09 -07:00
3c10591ef2 [Bugfix] Set SamplingParams.max_tokens for OpenAI requests if not provided by user (#6954) zifeitong 2024-07-31 21:13:34 -07:00
0437492ea9 PP comm optimization: replace send with partial send + allgather (#6695) Aurick Qiao 2024-07-31 20:15:42 -07:00
630dd9e0ae [Bugfix][Model] Skip loading lm_head weights if using tie_word_embeddings (#6758) Travis Johnson 2024-07-31 20:49:11 -06:00
23993a7997 [Bugfix][TPU] Do not use torch.Generator for TPUs (#6981) Woosuk Kwon 2024-07-31 18:50:28 -07:00
1d2e7fb73f [Model] Pipeline parallel support for Qwen2 (#6924) xuyi 2024-08-01 09:49:51 +08:00
7ecee34321 [Kernel][RFC] Refactor the punica kernel based on Triton (#5036) Jee Jee Li 2024-08-01 08:12:24 +08:00
7eb0cb4a14 Revert "[Frontend] Factor out code for running uvicorn" (#7012) Simon Mo 2024-07-31 16:34:26 -07:00
a0dce9383a [Misc] Add compressed-tensors to optimized quant list (#7006) Michael Goin 2024-07-31 17:40:44 -04:00
35e9c12bfa [Kernel] Tuned int8 Cutlass Kernels for SM75 (T4) (#6996) Varun Sundar Rabindranath 2024-07-31 17:40:32 -04:00
93548eb37e [Kernel] Enable FP8 Cutlass for Ada Lovelace (#6950) Varun Sundar Rabindranath 2024-07-31 17:40:22 -04:00
460c1884e3 [Bugfix] Support cpu offloading with fp8 quantization (#6960) Michael Goin 2024-07-31 15:47:46 -04:00
bd70013407 [MISC] Introduce pipeline parallelism partition strategies (#6920) Cody Yu 2024-07-31 12:02:17 -07:00
2ee8d3ba55 [Model] use FusedMoE layer in Jamba (#6935) Avshalom Manevich 2024-07-31 22:00:24 +03:00
daed30c4a9 [Bugfix] Fix feature size calculation for LLaVA-NeXT (#6982) Cyrus Leung 2024-07-31 23:46:17 +08:00
2f4e108f75 [Bugfix] Clean up MiniCPM-V (#6939) Alphi 2024-07-31 22:39:19 +08:00
6512937de1 Support W4A8 quantization for vllm (#5218) HandH1998 2024-07-31 21:55:21 +08:00
c0644cf9ce [Bugfix] fix logit processor excceed vocab size issue (#6927) Fei 2024-07-31 01:16:01 -07:00
533d1932d2 [Bugfix][TPU] Set readonly=True for non-root devices (#6980) Woosuk Kwon 2024-07-31 00:19:28 -07:00
9f0e69b653 [CI/Build] Fix mypy errors (#6968) Cyrus Leung 2024-07-31 10:49:48 +08:00
f230cc2ca6 [Bugfix] Fix broadcasting logic for multi_modal_kwargs (#6836) Cyrus Leung 2024-07-31 10:38:45 +08:00
da1f7cc12a [mypy] Enable following imports for some directories (#6681) Cyrus Leung 2024-07-31 10:38:03 +08:00
c32ab8be1a [Speculative decoding] Add serving benchmark for llama3 70b + speculative decoding (#6964) Cade Daniel 2024-07-30 17:53:21 -07:00
fb4f530bf5 [CI] [nightly benchmark] Do not re-download sharegpt dataset if exists (#6706) Cade Daniel 2024-07-30 16:28:49 -07:00
79319cedfa [Nightly benchmarking suite] Remove pkill python from run benchmark suite (#6965) Cade Daniel 2024-07-30 16:28:05 -07:00
40c27a7cbb [Build] Temporarily Disable Kernels and LoRA tests (#6961) Simon Mo 2024-07-30 14:59:48 -07:00
6ca8031e71 [core][misc] improve free_finished_seq_groups (#6865) youkaichao 2024-07-30 14:32:12 -07:00
d7a299edaa [Kernel] Remove scaled_fp8_quant kernel padding footgun (#6842) Tyler Michael Smith 2024-07-30 16:37:01 -04:00
052b6f8ca4 [Bugfix] Fix tensorizer memory profiling bug during testing (#6881) Sanger Steel 2024-07-30 14:48:50 -04:00
5895b24677 [OpenVINO] Updated OpenVINO requirements and build docs (#6948) Ilya Lavrenov 2024-07-30 22:33:01 +04:00
cbbc904470 [Kernel] Squash a few more warnings (#6914) Tyler Michael Smith 2024-07-30 13:50:42 -04:00
5cf9254a9c [BugFix] Fix use of per-request seed with pipeline parallel (#6698) Nick Hill 2024-07-30 10:40:08 -07:00
f058403683 [Doc] Super tiny fix doc typo (#6949) fzyzcjy 2024-07-31 00:14:03 +08:00
c66c7f86ac [Bugfix] Fix PaliGemma MMP (#6930) Roger Wang 2024-07-30 02:20:57 -07:00
6e063ea35b [TPU] Fix greedy decoding (#6933) Woosuk Kwon 2024-07-30 02:06:29 -07:00
af647fb8b3 [Kernel] Tuned int8 kernels for Ada Lovelace (#6848) Varun Sundar Rabindranath 2024-07-29 22:24:58 -04:00
61a97c32f6 [Kernel] Fix marlin divide-by-zero warnings (#6904) Tyler Michael Smith 2024-07-29 21:26:07 -04:00
4fbf4aa128 [ci] GHA workflow to remove ready label upon "/notready" comment (#6921) Kevin H. Luu 2024-07-29 17:03:45 -07:00
aae6d36f7e [Kernel] Remove unused variables in awq/gemm_kernels.cu (#6908) Tyler Michael Smith 2024-07-29 20:01:17 -04:00
9f69d8245a [Frontend] New allowed_token_ids decoding request parameter (#6753) Nick Hill 2024-07-29 16:37:27 -07:00
9a7e2d0534 [Bugfix] Allow vllm to still work if triton is not installed. (#6786) Thomas Parnell 2024-07-29 23:51:27 +02:00
7f8d612d24 [TPU] Support tensor parallelism in async llm engine (#6891) Earthwalker 2024-07-30 03:42:21 +08:00
60d1c6e584 [Kernel] Fix deprecation function warnings squeezellm quant_cuda_kernel (#6901) Tyler Michael Smith 2024-07-29 12:59:02 -04:00
db9e5708a9 [Core] Reduce unnecessary compute when logprobs=None (#6532) Peng Guanwen 2024-07-30 00:47:31 +08:00
766435e660 [Kernel] Tuned FP8 Kernels for Ada Lovelace (#6677) Varun Sundar Rabindranath 2024-07-29 11:42:35 -04:00