Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

3c6325f0fc [core][distributed] custom allreduce when pp size > 1 (#6117) youkaichao 2024-07-03 14:41:32 -07:00
47f0954af0 [Kernel] Expand FP8 support to Ampere GPUs using FP8 Marlin (#5975) Michael Goin 2024-07-03 13:38:00 -04:00
7cd2ebb025 [Bugfix] Fix compute_logits in Jamba (#6093) Roger Wang 2024-07-03 00:32:35 -07:00
f1c78138aa [Doc] Fix Mock Import (#6094) Roger Wang 2024-07-03 00:13:56 -07:00
3a86b54fb0 [VLM][Frontend] Proper Image Prompt Formatting from OpenAI API (#6091) Roger Wang 2024-07-02 23:41:23 -07:00
f666207161 [misc][distributed] error on invalid state (#6092) youkaichao 2024-07-02 23:37:29 -07:00
d830656a97 [BugFix] Avoid unnecessary Ray import warnings (#6079) Nick Hill 2024-07-02 23:09:40 -07:00
d18bab3587 [CI] Fix base url doesn't strip "/" (#6087) SangBin Cho 2024-07-03 13:31:25 +09:00
9831aec49f [Core] Dynamic image size support for VLMs (#5276) Cyrus Leung 2024-07-03 11:34:00 +08:00
482045ee77 [hardware][misc] introduce platform abstraction (#6080) youkaichao 2024-07-02 20:12:22 -07:00
9d6a8daa87 [Model] Jamba support (#4115) Mor Zusman 2024-07-03 02:11:29 +03:00
ee93f4f92a [CORE] Quantized lm-head Framework (#4442) Qubitium-ModelCloud 2024-07-03 06:25:17 +08:00
7c008c51a9 [ Misc ] Refactor MoE to isolate Fp8 From Mixtral (#5970) Robert Shaw 2024-07-02 17:54:35 -04:00
4d26d806e1 Update conftest.py (#6076) Robert Shaw 2024-07-02 16:14:22 -04:00
c5832d2ae9 [Core] Pipeline Parallel Support (#4412) Murali Andoorveedu 2024-07-02 10:58:08 -07:00
15aba081f3 [Speculative Decoding] MLPSpeculator Tensor Parallel support (1/2) (#6050) Sirej Dua 2024-07-02 07:20:29 -07:00
31354e563f [Doc] Reinstate doc dependencies (#6061) Cyrus Leung 2024-07-02 18:53:16 +08:00
98d6682cd1 [VLM] Remove image_input_type from VLM config (#5852) xwjiang2010 2024-07-02 00:57:09 -07:00
2c37540aa6 [Frontend] Add template related params to request (#5709) danieljannai21 2024-07-02 09:01:57 +03:00
3476ed0809 [Core] Optimize block_manager_v2 vs block_manager_v1 (to make V2 default) (#5602) Alexander Matveev 2024-07-01 23:10:37 -04:00
54600709b6 [Model] Changes to MLPSpeculator to support tie_weights and input_scale (#5965) Thomas Parnell 2024-07-02 01:40:02 +02:00
e373853e12 [Frontend] Relax api url assertion for openai benchmarking (#6046) James Whedbee 2024-07-01 18:39:10 -05:00
c87ebc3ef9 [BugFix] Ensure worker model loop is always stopped at the right time (#5987) Nick Hill 2024-07-01 16:17:58 -07:00
c4059ea54f [Bugfix] Add explicit end_forward calls to flashinfer (#6044) Antoni Baum 2024-07-01 16:08:58 -07:00
8e0817c262 [Bugfix][Doc] Fix Doc Formatting (#6048) Roger Wang 2024-07-01 15:09:11 -07:00
83bdcb6ac3 add FAQ doc under 'serving' (#5946) ning.zhang 2024-07-01 14:11:36 -07:00
12a59959ed [Bugfix] adding chunking mechanism to fused_moe to handle large inputs (#6029) Avshalom Manevich 2024-07-02 00:08:29 +03:00
dec6fc6f3b [Bugfix] Use RayActorError for older versions of Ray in RayTokenizerGroupPool (#6039) Antoni Baum 2024-07-01 13:12:40 -07:00
8893130b63 [doc][misc] further lower visibility of simple api server (#6041) youkaichao 2024-07-01 10:50:56 -07:00
bb60326836 [Misc] update benchmark backend for scalellm (#6018) zhyncs 2024-07-02 01:20:33 +08:00
4050d646e5 [doc][misc] remove deprecated api server in doc (#6037) youkaichao 2024-07-01 09:52:43 -07:00
d76084c12f [ CI ] Re-enable Large Model LM Eval (#6031) Robert Shaw 2024-07-01 12:40:45 -04:00
80ca1e6a3a [Speculative Decoding 2/2 ] Integrate typical acceptance sampler into Spec Decode Worker (#5348) sroy745 2024-07-01 00:33:05 -07:00
614aa51203 [misc][cuda] use nvml to avoid accidentally cuda initialization (#6007) youkaichao 2024-06-30 20:07:34 -07:00
af9ad46fca [ Misc ] Refactor w8a8 to use process_weights_after_load (Simplify Weight Loading) (#5940) Robert Shaw 2024-06-30 19:06:27 -04:00
7836fdcc11 [Misc] Fix get_min_capability (#5971) Dipika Sikka 2024-06-30 16:15:16 -04:00
deacb7ec44 [ CI ] Temporarily Disable Large LM-Eval Tests (#6005) Robert Shaw 2024-06-30 14:56:56 -04:00
f5e73c9f1b [Lora] Use safetensor keys instead of adapter_config.json to find unexpected modules. (#5909) SangBin Cho 2024-07-01 02:11:15 +09:00
c6c240aa0a [Frontend]: Support base64 embedding (#5935) llmpros 2024-06-30 08:53:00 -07:00
2be6955a3f [ci][distributed] fix device count call youkaichao 2024-06-30 01:06:13 -07:00
9d47f64eb6 [CI/Build] [3/3] Reorganize entrypoints tests (#5966) Cyrus Leung 2024-06-30 12:58:49 +08:00
cff6a1fec1 [CI/Build] Reuse code for checking output consistency (#5988) Cyrus Leung 2024-06-30 11:44:25 +08:00
bcc6a09b63 [CI/Build] Temporarily Remove Phi3-Vision from TP Test (#5989) Roger Wang 2024-06-29 18:18:31 -07:00
9def10664e [Bugfix][CI/Build][Hardware][AMD] Install matching torchvision to fix AMD tests (#5949) Matt Wong 2024-06-29 14:47:58 -05:00
75aa1442db [ CI/Build ] LM Eval Harness Based CI Testing (#5838) Robert Shaw 2024-06-29 13:04:30 -04:00
99397da534 [CI/Build] Add TP test for vision models (#5892) Cyrus Leung 2024-06-29 23:45:54 +08:00
8dbfcd35bf [ CI/Build ] Added E2E Test For Compressed Tensors (#5839) Robert Shaw 2024-06-29 09:12:58 -04:00
f7dac83d95 [Kernel] Raise an exception in MoE kernel if the batch size is larger then 65k (#5939) Cody Yu 2024-06-29 06:04:20 -07:00
7c01f70641 [Core] Optimize SequenceStatus.is_finished by switching to IntEnum (#5974) Antoni Baum 2024-06-29 05:47:53 -07:00
51e971d39e [Bugfix] Support eos_token_id from config.json (#5954) Cyrus Leung 2024-06-29 19:19:02 +08:00
329df38f1a [Misc] Update Phi-3-Vision Example (#5981) Roger Wang 2024-06-28 23:34:29 -07:00
580353da93 [Bugfix] Fix precisions in Gemma 1 (#5913) Woosuk Kwon 2024-06-28 20:10:21 -07:00
ba4994443a [Kernel] Add punica dimensions for Granite 3b and 8b (#5930) Joe Runde 2024-06-28 20:48:25 -06:00
906a19cdb0 [Misc] Extend vLLM Metrics logging API (#5925) William Lin 2024-06-28 19:36:06 -07:00
c4bca740e8 [Bugfix] fix missing last itl in openai completions benchmark (#5926) mcalman 2024-06-28 22:34:42 -04:00
7f83f40dee [Bugfix][TPU] Fix pad slot id (#5977) Woosuk Kwon 2024-06-28 18:55:17 -07:00
54814fd85b [Bugfix][TPU] Fix TPU sampler output (#5978) Woosuk Kwon 2024-06-28 18:14:16 -07:00
7041de4384 [Kernel] Flashinfer for prefill & decode, with Cudagraph support for decode (#4628) Lily Liu 2024-06-28 15:28:49 -07:00
6a62cb82cc [Bugfix] Fix Engine Failing After Invalid Request - AsyncEngineDeadError (#5963) Robert Shaw 2024-06-28 17:46:30 -04:00
5d2a1a9cf0 Unmark more files as executable (#5962) Tyler Michael Smith 2024-06-28 17:34:56 -04:00
4bf35ed9ae [Bugfix] Only add Attention.kv_scale if kv cache quantization is enabled (#5936) Michael Goin 2024-06-28 17:12:40 -04:00
be0b3af9e0 Support Deepseek-V2 (#4650) wangding zeng 2024-06-29 04:24:57 +08:00
2cd402e169 [ Bugfix ] Enabling Loading Models With Fused QKV/MLP on Disk with FP8 (#5921) Robert Shaw 2024-06-28 14:43:49 -04:00
b185230744 [ Misc ] Remove fp8_shard_indexer from Col/Row Parallel Linear (Simplify Weight Loading) (#5928) Robert Shaw 2024-06-28 13:49:57 -04:00
6a2d659d28 [Bugfix] Fix compute datatype for cutlass 3.x epilogues (#5931) Tyler Michael Smith 2024-06-28 13:10:34 -04:00
b2c620230a [Spec Decode] Introduce DraftModelRunner (#5799) Cody Yu 2024-06-28 09:17:51 -07:00
b90d8cd832 [Distributed] Make it clear that % should not be in tensor dict keys. (#5927) xwjiang2010 2024-06-28 08:20:22 -07:00
3b752a6555 [CI/Build] [2/3] Reorganize entrypoints tests (#5904) Cyrus Leung 2024-06-28 22:59:18 +08:00
ec1ad0046c [Bugfix] Better error message for MLPSpeculator when num_speculative_tokens is set too high (#5894) Thomas Parnell 2024-06-28 16:42:17 +02:00
57f09a419c [Hardware][Intel] OpenVINO vLLM backend (#5379) Ilya Lavrenov 2024-06-28 17:50:16 +04:00
5932634409 Unmark fused_moe config json file as executable (#5960) Tyler Michael Smith 2024-06-28 09:36:12 -04:00
5cbe8d155c [Core] Registry for processing model inputs (#5214) Cyrus Leung 2024-06-28 20:09:56 +08:00
0d0e3a42ac [Bugfix][Hardware][Intel CPU] Fix unpassed multi_modal_kwargs for CPU runner (#5956) Isotr0py 2024-06-28 20:03:41 +08:00
74d55c065b [VLM][BugFix] Make sure that multi_modal_kwargs can broadcast properly with ring buffer. (#5905) xwjiang2010 2024-06-28 00:29:13 -07:00
f136da15e1 [Hardware][TPU] Optimize KV cache swapping (#5878) Woosuk Kwon 2024-06-27 21:12:13 -07:00
c3dde367f1 [Kernel][ROCm][AMD] fused_moe Triton configs v2 for mi300X (#5932) Divakar Verma 2024-06-27 15:41:08 -05:00
64e8d2a783 [core][misc] remove logical block (#5882) youkaichao 2024-06-27 13:34:55 -07:00
79c92c7c8a [Model] Add Gemma 2 (#5908) Woosuk Kwon 2024-06-27 13:33:56 -07:00
736ed38849 [CI/Build] Fix Args for _get_logits_warper in Sampler Test (#5922) Roger Wang 2024-06-27 11:43:04 -07:00
365791ff81 [BugFix] Fix min_tokens behaviour for multiple eos tokens (#5849) Nick Hill 2024-06-27 11:31:11 -07:00
691e29ecf3 [BugFix] Fix MLPSpeculator handling of num_speculative_tokens (#5876) Nick Hill 2024-06-27 10:59:33 -07:00
3fd02bda51 [doc][misc] add note for Kubernetes users (#5916) youkaichao 2024-06-27 10:07:07 -07:00
98cf2ed678 [Model][Bugfix] Implicit model flags and reenable Phi-3-Vision (#5896) Cyrus Leung 2024-06-28 00:08:10 +08:00
e9d32d077d [CI/Build] [1/3] Reorganize entrypoints tests (#5526) Cyrus Leung 2024-06-27 20:43:17 +08:00
2061f0b8a7 [Bugfix] Fix img_sizes Parsing in Phi3-Vision (#5888) Roger Wang 2024-06-27 01:29:24 -07:00
96354d6a29 [Model] Add base class for LoRA-supported models (#5018) Cyrus Leung 2024-06-27 16:03:04 +08:00
d12af207d2 [VLM][Bugfix] Make sure that multi_modal_kwargs is broadcasted properly (#5880) xwjiang2010 2024-06-27 00:15:24 -07:00
6eabc6cb0e [Doc] Add note about context length in Phi-3-Vision example (#5887) Cyrus Leung 2024-06-27 14:20:01 +08:00
2110557dab [BugFix] Fix cuda graph for MLPSpeculator (#5875) Nick Hill 2024-06-26 21:12:10 -07:00
b9e84259e9 [Misc] Add example for LLaVA-NeXT (#5879) Roger Wang 2024-06-26 17:57:16 -07:00
294104c3f9 [doc] update usage of env var to avoid conflict (#5873) youkaichao 2024-06-26 14:57:12 -07:00
38a1674abb Support CPU inference with VSX PowerPC ISA (#5652) Chip Kerchner 2024-06-26 17:53:04 -04:00
f5c8628fdc [Bugfix][TPU] Fix CPU cache allocation (#5869) Woosuk Kwon 2024-06-26 13:42:40 -07:00
cbc53b6b8d [Hardware][TPU] Support parallel sampling & Swapping (#5855) Woosuk Kwon 2024-06-26 11:07:49 -07:00
c54269d967 [Frontend] Add tokenize/detokenize endpoints (#5054) sasha0552 2024-06-26 16:54:22 +00:00
5bfd1bbc98 [Kernel] Adding bias epilogue support for cutlass_scaled_mm (#5560) Luka Govedič 2024-06-26 11:16:00 -04:00
6984c02a27 [CI/Build] Refactor image test assets (#5821) Cyrus Leung 2024-06-26 16:02:34 +08:00
3439c5a8e3 [Bugfix][TPU] Fix KV cache size calculation (#5860) Woosuk Kwon 2024-06-26 00:58:23 -07:00
6806998bf9 [Bugfix] Fix embedding to support 2D inputs (#5829) Woosuk Kwon 2024-06-26 00:15:22 -07:00
515080ad2f [bugfix][distributed] fix shm broadcast when the queue size is full (#5801) youkaichao 2024-06-25 21:56:02 -07:00
