Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

3aa7b6cf66 [Misc][Doc] Add Example of using OpenAI Server with VLM (#5832) Roger Wang 2024-06-25 20:34:25 -07:00
dda4811591 [Core] Refactor Worker and ModelRunner to consolidate control plane communication (#5408) Stephanie Wang 2024-06-25 20:30:03 -07:00
82079729cc [Bugfix] Fix assertion in NeuronExecutor (#5841) aws-patlange 2024-06-25 19:52:10 -07:00
c2a8ac75e0 [CI/Build] Add E2E tests for MLPSpeculator (#5791) Thomas Parnell 2024-06-26 01:04:08 +01:00
f178e56c68 [Hardware][TPU] Raise errors for unsupported sampling params (#5850) Woosuk Kwon 2024-06-25 16:58:23 -07:00
dd793d1de5 [Hardware][AMD][CI/Build][Doc] Upgrade to ROCm 6.1, Dockerfile improvements, test fixes (#5422) Matt Wong 2024-06-25 17:56:15 -05:00
bc34937d68 [Hardware][TPU] Refactor TPU backend (#5831) Woosuk Kwon 2024-06-25 15:25:52 -07:00
dd248f7675 [Misc] Update w4a16 compressed-tensors support to include w8a16 (#5794) Dipika Sikka 2024-06-25 15:23:35 -04:00
d9b34baedd [CI/Build] Add unit testing for FlexibleArgumentParser (#5798) Michael Goin 2024-06-25 15:18:03 -04:00
c18ebfdd71 [doc][distributed] add both gloo and nccl tests (#5834) youkaichao 2024-06-25 12:10:28 -07:00
67882dbb44 [Core] Add fault tolerance for RayTokenizerGroupPool (#5748) Antoni Baum 2024-06-25 10:15:10 -07:00
7b99314301 [Misc] Remove useless code in cpu_worker (#5824) Jie Fu (傅杰) 2024-06-26 00:41:36 +08:00
2ce5d6688b [Speculative Decoding] Support draft model on different tensor-parallel size than target model (#5414) Woo-Yeon Lee 2024-06-25 18:56:06 +09:00
f23871e9ee [Doc] Add notice about breaking changes to VLMs (#5818) Cyrus Leung 2024-06-25 16:25:03 +08:00
e9de9dd551 [ci] Remove aws template (#5757) Kevin H. Luu 2024-06-24 21:09:02 -07:00
ba991d5c84 [Bugfix] Fix FlexibleArgumentParser replaces _ with - for actual args (#5795) Chang Su 2024-06-24 16:01:19 -07:00
1744cc99ba [Doc] Add Phi-3-medium to list of supported models (#5788) Michael Goin 2024-06-24 13:48:55 -04:00
e72dc6cb35 [Doc] Add "Suggest edit" button to doc pages (#5789) Michael Goin 2024-06-24 13:26:17 -04:00
c246212952 [doc][faq] add warning to download models for every nodes (#5783) youkaichao 2024-06-24 00:37:42 -07:00
edd5fe5fa2 [Bugfix] Add phi3v resize for dynamic shape and fix torchvision requirement (#5772) Isotr0py 2024-06-24 12:11:53 +08:00
5d4d90536f [Distributed] Add send and recv helpers (#5719) Murali Andoorveedu 2024-06-23 17:42:28 -04:00
6c916ac8a8 [BugFix] [Kernel] Add Cutlass2x fallback kernels (#5744) Varun Sundar Rabindranath 2024-06-24 02:37:11 +05:30
832ea88fcb [core][distributed] improve shared memory broadcast (#5754) youkaichao 2024-06-22 10:00:43 -07:00
8c00f9c15d [Docs][TPU] Add installation tip for TPU (#5761) Woosuk Kwon 2024-06-21 23:09:40 -07:00
0cbc1d2b4f [Bugfix] Fix pin_lora error in TPU executor (#5760) Woosuk Kwon 2024-06-21 22:25:14 -07:00
ff9ddbceee [Misc] Remove #4789 workaround left in vllm/entrypoints/openai/run_batch.py (#5756) zifeitong 2024-06-21 20:33:12 -07:00
9c62db07ed [Model] Support Qwen-VL and Qwen-VL-Chat models with text-only inputs (#5710) Jie Fu (傅杰) 2024-06-22 10:07:08 +08:00
cf90ae0123 [CI][Hardware][Intel GPU] add Intel GPU(XPU) ci pipeline (#5616) Kunshang Ji 2024-06-22 08:09:34 +08:00
f5dda63eb5 [LoRA] Add support for pinning lora adapters in the LRU cache (#5603) rohithkrn 2024-06-21 15:42:46 -07:00
7187507301 [ci][test] fix ca test in main (#5746) youkaichao 2024-06-21 14:04:26 -07:00
f1e72cc19a [BugFix] exclude version 1.15.0 for modelscope (#5668) zhyncs 2024-06-22 03:15:48 +08:00
5b15bde539 [Doc] Documentation on supported hardware for quantization methods (#5745) Michael Goin 2024-06-21 12:44:29 -04:00
bd620b01fb [Kernel][CPU] Add Quick gelu to CPU (#5717) Roger Wang 2024-06-20 23:39:40 -07:00
d9a252bc8e [Core][Distributed] add shm broadcast (#5399) youkaichao 2024-06-20 22:12:35 -07:00
67005a07bc [Bugfix] Add fully sharded layer for QKVParallelLinearWithLora (#5665) Jee Li 2024-06-21 12:46:28 +08:00
c35e4a3dd7 [BugFix] Fix test_phi3v.py (#5725) Chang Su 2024-06-20 21:45:34 -07:00
1f5674218f [Kernel] Add punica dimension for Qwen2 LoRA (#5441) Jinzhen Lin 2024-06-21 08:55:41 +08:00
b12518d3cf [Model] MLPSpeculator speculative decoding support (#4947) Joshua Rosenkranz 2024-06-20 20:23:12 -04:00
6c5b7af152 [distributed][misc] use fork by default for mp (#5669) youkaichao 2024-06-20 17:06:34 -07:00
8065a7e220 [Frontend] Add FlexibleArgumentParser to support both underscore and dash in names (#5718) Michael Goin 2024-06-20 19:00:13 -04:00
3f3b6b2150 [Bugfix] Fix the CUDA version check for FP8 support in the CUTLASS kernels (#5715) Tyler Michael Smith 2024-06-20 14:36:10 -04:00
a7dcc62086 [Kernel] Update Cutlass int8 kernel configs for SM80 (#5275) Varun Sundar Rabindranath 2024-06-20 19:03:21 +05:30
ad137cd111 [Model] Port over CLIPVisionModel for VLMs (#5591) Roger Wang 2024-06-20 04:52:09 -07:00
111af1fa2c [Kernel] Update Cutlass int8 kernel configs for SM90 (#5514) Varun Sundar Rabindranath 2024-06-20 12:07:08 +05:30
1b2eaac316 [Bugfix][Doc] Fix Duplicate Explicit Target Name Errors (#5703) Roger Wang 2024-06-19 23:10:47 -07:00
3730a1c832 [Misc] Improve conftest (#5681) Cyrus Leung 2024-06-20 10:09:21 +08:00
949e49a685 [ci] Limit num gpus if specified for A100 (#5694) Kevin H. Luu 2024-06-19 16:30:03 -07:00
4a30d7e3cc [Misc] Add per channel support for static activation quantization; update w8a8 schemes to share base classes (#5650) Dipika Sikka 2024-06-19 18:06:44 -04:00
e83db9e7e3 [Doc] Update docker references (#5614) Rafael Vasquez 2024-06-19 18:01:45 -04:00
78687504f7 [Bugfix] AsyncLLMEngine hangs with asyncio.run (#5654) zifeitong 2024-06-19 13:57:12 -07:00
d571ca0108 [ci][distributed] add tests for custom allreduce (#5689) youkaichao 2024-06-19 13:16:04 -07:00
afed90a034 [Frontend][Bugfix] Fix preemption_mode -> preemption-mode for CLI arg in arg_utils.py (#5688) Michael Goin 2024-06-19 14:41:42 -04:00
3ee5c4bca5 [ci] Add A100 queue into AWS CI template (#5648) Kevin H. Luu 2024-06-19 07:42:13 -07:00
e9c2732b97 [CI/Build] Add tqdm to dependencies (#5680) Cyrus Leung 2024-06-19 22:37:33 +08:00
d8714530d1 [Misc]Add param max-model-len in benchmark_latency.py (#5629) DearPlanet 2024-06-19 18:19:08 +08:00
7d46c8d378 [Bugfix] Fix sampling_params passed incorrectly in Phi3v example (#5684) Isotr0py 2024-06-19 17:58:32 +08:00
da971ec7a5 [Model] Add FP8 kv cache for Qwen2 (#5656) Michael Goin 2024-06-19 05:38:26 -04:00
3eea74889f [misc][distributed] use 127.0.0.1 for single-node (#5619) youkaichao 2024-06-19 01:05:00 -07:00
f758aed0e8 [Bugfix][CI/Build][AMD][ROCm]Fixed the cmake build bug which generate garbage on certain devices (#5641) Hongxia Yang 2024-06-19 02:21:29 -04:00
e5150f2c28 [Bugfix] Added test for sampling repetition penalty bug. (#5659) Thomas Parnell 2024-06-19 08:03:55 +02:00
59a1eb59c9 [Bugfix] Fix Phi-3 Long RoPE scaling implementation (#5628) Shukant Pal 2024-06-18 18:46:38 -07:00
6820724e51 [Bugfix] Fix w8a8 benchmarks for int8 case (#5643) Tyler Michael Smith 2024-06-18 20:33:25 -04:00
b23ce92032 [Bugfix] Fix CUDA version check for mma warning suppression (#5642) Tyler Michael Smith 2024-06-18 19:48:49 -04:00
2bd231a7b7 [Doc] Added cerebrium as Integration option (#5553) milo157 2024-06-18 18:56:59 -04:00
8a173382c8 [Bugfix] Fix for inconsistent behaviour related to sampling and repetition penalties (#5639) Thomas Parnell 2024-06-18 23:18:37 +02:00
07feecde1a [Model] LoRA support added for command-r (#5178) sergey-tinkoff 2024-06-18 21:01:21 +03:00
19091efc44 [ci] Setup Release pipeline and build release wheels with cache (#5610) Kevin H. Luu 2024-06-18 11:00:36 -07:00
95db455e7f [Misc] Add channel-wise quantization support for w8a8 dynamic per token activation quantization (#5542) Dipika Sikka 2024-06-18 12:45:05 -04:00
7879f24dcc [Misc] Add OpenTelemetry support (#4687) Ronen Schaffer 2024-06-18 19:17:03 +03:00
13db4369d9 [ci] Deprecate original CI template (#5624) Kevin H. Luu 2024-06-18 07:26:20 -07:00
4ad7b53e59 [CI/Build][Misc] Update Pytest Marker for VLMs (#5623) Roger Wang 2024-06-18 06:10:04 -07:00
f0cc0e68e3 [Misc] Remove import from transformers logging (#5625) Chang Su 2024-06-18 05:12:19 -07:00
db5ec52ad7 [bugfix][distributed] improve p2p capability test (#5612) youkaichao 2024-06-18 00:21:05 -07:00
114d7270ff [CI] Avoid naming different metrics with the same name in performance benchmark (#5615) Kuntai Du 2024-06-17 21:37:18 -07:00
32c86e494a [Misc] Fix typo (#5618) Cyrus Leung 2024-06-18 11:58:30 +08:00
8eadcf0b90 [misc][typo] fix typo (#5620) youkaichao 2024-06-17 20:54:57 -07:00
5002175e80 [Kernel] Add punica dimensions for Granite 13b (#5559) Joe Runde 2024-06-17 21:54:11 -06:00
daef218b55 [Model] Initialize Phi-3-vision support (#4986) Isotr0py 2024-06-18 10:34:33 +08:00
fa9e385229 [Speculative Decoding 1/2 ] Add typical acceptance sampling as one of the sampling techniques in the verifier (#5131) sroy745 2024-06-17 19:29:09 -07:00
26e1188e51 [Fix] Use utf-8 encoding in entrypoints/openai/run_batch.py (#5606) zifeitong 2024-06-17 16:16:10 -07:00
a3e8a05d4c [Bugfix] Fix KV head calculation for MPT models when using GQA (#5142) Bruce Fontaine 2024-06-17 15:26:41 -07:00
e441bad674 [Optimization] use a pool to reuse LogicalTokenBlock.token_ids (#5584) youkaichao 2024-06-17 15:08:05 -07:00
1b44aaf4e3 [bugfix][distributed] fix 16 gpus local rank arrangement (#5604) youkaichao 2024-06-17 14:35:04 -07:00
9e4e6fe207 [CI] the readability of benchmarking and prepare for dashboard (#5571) Kuntai Du 2024-06-17 11:41:08 -07:00
ab66536dbf [CI/BUILD] Support non-AVX512 vLLM building and testing (#5574) Jie Fu (傅杰) 2024-06-18 02:36:10 +08:00
728c4c8a06 [Hardware][Intel GPU] Add Intel GPU(XPU) inference backend (#3814) Kunshang Ji 2024-06-18 02:01:25 +08:00
1f12122b17 [Misc] use AutoTokenizer for benchmark serving when vLLM not installed (#5588) zhyncs 2024-06-18 00:40:35 +08:00
890d8d960b [Kernel] compressed-tensors marlin 24 support (#5435) Dipika Sikka 2024-06-17 12:32:48 -04:00
9e74d9d003 Correct alignment in the seq_len diagram. (#5592) Charles Riggins 2024-06-18 00:05:33 +08:00
9333fb8eb9 [Model] Rename Phi3 rope scaling type (#5595) Amit Garg 2024-06-17 09:04:14 -07:00
e2b85cf86a Fix w8a8 benchmark and add Llama-3-8B (#5562) Cody Yu 2024-06-16 23:48:06 -07:00
845a3f26f9 [Doc] add debugging tips for crash and multi-node debugging (#5581) youkaichao 2024-06-16 19:08:01 -07:00
f07d513320 [build][misc] limit numpy version (#5582) youkaichao 2024-06-16 16:07:01 -07:00
4a6769053a [CI][BugFix] Flip is_quant_method_supported condition (#5577) Michael Goin 2024-06-16 10:07:34 -04:00
f31c1f90e3 Add basic correctness 2 GPU tests to 4 GPU pipeline (#5518) Antoni Baum 2024-06-16 00:48:02 -07:00
3ce2c050dd [Fix] Correct OpenAI batch response format (#5554) zifeitong 2024-06-15 16:57:54 -07:00
1c0afa13c5 [BugFix] Don't start a Ray cluster when not using Ray (#5570) Nick Hill 2024-06-15 16:30:51 -07:00
d919ecc771 add gptq_marlin test for bug report https://github.com/vllm-project/vllm/issues/5088 (#5145) Alexander Matveev 2024-06-15 13:38:16 -04:00
e691918e3b [misc] Do not allow to use lora with chunked prefill. (#5538) SangBin Cho 2024-06-15 23:59:36 +09:00
81fbb3655f [CI/Build] Test both text and token IDs in batched OpenAI Completions API (#5568) Cyrus Leung 2024-06-15 19:29:42 +08:00
