biondizzle/vllm

Commit Graph

0e9164b40a [mypy] Enable type checking for test directory (#5017) Cyrus Leung 2024-06-15 12:45:31 +08:00
1b8a0d71cf [Core][Bugfix]: fix prefix caching for blockv2 (#5364) leiwen83 2024-06-15 08:23:56 +08:00
bd7efe95d0 Add ccache to amd (#5555) Simon Mo 2024-06-14 19:18:22 -05:00
f5bb85b435 [Core][Distributed] improve p2p cache generation (#5528) youkaichao 2024-06-14 14:47:45 -07:00
28c145eb57 [Bugfix] Fix typo in Pallas backend (#5558) Woosuk Kwon 2024-06-14 14:40:09 -07:00
e2afb03c92 [Bugfix] Enable loading FP8 checkpoints for gpt_bigcode models (#5460) Thomas Parnell 2024-06-14 22:28:11 +02:00
6e2527a7cb [Doc] Update documentation on Tensorizer (#5471) Sanger Steel 2024-06-14 14:27:57 -04:00
cdab68dcdb [Docs] Add ZhenFund as a Sponsor (#5548) Simon Mo 2024-06-14 13:17:21 -05:00
d1c3d7d139 [misc][distributed] fix benign error in is_in_the_same_node (#5512) youkaichao 2024-06-14 10:59:28 -07:00
77490c6f2f [Core] Remove duplicate processing in async engine (#5525) Cyrus Leung 2024-06-15 01:04:42 +08:00
48f589e18b [mis] fix flaky test of test_cuda_device_count_stateless (#5546) youkaichao 2024-06-14 10:02:23 -07:00
348616ac4b [Kernel] Suppress mma.sp warning on CUDA 12.5 and later (#5401) Tyler Michael Smith 2024-06-14 13:02:00 -04:00
15985680e2 [ Misc ] Rs/compressed tensors cleanup (#5432) Robert Shaw 2024-06-14 13:01:46 -04:00
d74674bbd9 [Misc] Fix arg names (#5524) Allen.Dou 2024-06-15 00:47:44 +08:00
703475f6c2 [Kernel] Fix CUTLASS 3.x custom broadcast load epilogue (#5516) Tyler Michael Smith 2024-06-14 12:30:15 -04:00
d47af2bc02 [CI/Build] Disable LLaVA-NeXT CPU test (#5529) Cyrus Leung 2024-06-15 00:27:30 +08:00
319ad7f1d3 [CI/Build][Misc] Add CI that benchmarks vllm performance on those PRs with perf-benchmarks label (#5073) Kuntai Du 2024-06-13 22:36:20 -07:00
0f0d8bc065 bump version to v0.5.0.post1 (#5522) Simon Mo 2024-06-13 21:42:06 -05:00
55d6361b13 [Misc] Fix arg names in quantizer script (#5507) Allen.Dou 2024-06-14 10:02:53 +08:00
cd9c0d65d9 [Hardware][Intel] Support CPU inference with AVX2 ISA (#5452) Jie Fu (傅杰) 2024-06-14 07:22:24 +08:00
50eed24d25 Add cuda_device_count_stateless (#5473) (tag: v0.5.0.post1) Antoni Baum 2024-06-13 16:06:49 -07:00
e38042d4af [Kernel] Disable CUTLASS kernels for fp8 (#5505) Tyler Michael Smith 2024-06-13 16:38:05 -04:00
33e3b37242 [CI/Build] Disable test_fp8.py (#5508) Tyler Michael Smith 2024-06-13 16:37:48 -04:00
1696efe6c9 [misc] fix format.sh (#5511) youkaichao 2024-06-13 12:09:16 -07:00
6b0511a57b Revert "[Core] Remove unnecessary copies in flash attn backend" (#5478) Antoni Baum 2024-06-13 11:22:50 -07:00
a8fda4f661 Seperate dev requirements into lint and test (#5474) Antoni Baum 2024-06-13 11:22:41 -07:00
30299a41fa [MISC] Remove FP8 warning (#5472) Cody Yu 2024-06-13 11:22:30 -07:00
85657b5607 [Kernel] Factor out epilogues from cutlass kernels (#5391) Tyler Michael Smith 2024-06-13 14:22:19 -04:00
0ce7b952f8 [Doc] Update LLaVA docs (#5437) Cyrus Leung 2024-06-14 02:22:07 +08:00
39873476f8 [CI/Build] Simplify OpenAI server setup in tests (#5100) Cyrus Leung 2024-06-14 02:21:53 +08:00
03dccc886e [Misc] Add vLLM version getter to utils (#5098) Cyrus Leung 2024-06-14 02:21:39 +08:00
a65634d3ae [Docs] Add 4th meetup slides (#5509) Woosuk Kwon 2024-06-13 10:18:26 -07:00
80aa7e91fc [Hardware][Intel] Optimize CPU backend and add more performance tips (#4971) Li, Jiang 2024-06-14 00:33:14 +08:00
bd43973522 [Kernel] Tune Qwen2MoE kernel configurations with tp2,4 (#5497) wenyujin333 2024-06-14 00:01:10 +08:00
23ec72fa03 [CI/Build][REDO] Add is_quant_method_supported to control quantization test configurations (#5466) Michael Goin 2024-06-13 11:18:08 -04:00
c2637a613b [Kernel] w4a16 support for compressed-tensors (#5385) Dipika Sikka 2024-06-13 10:19:56 -04:00
88407532e7 [Bugfix]if the content is started with ":"(response of ping), client should i… (#5303) Wang, Yi 2024-06-13 11:16:41 +08:00
916d219d62 [ci] Use sccache to build images (#5419) Kevin H. Luu 2024-06-12 17:58:12 -07:00
ea3890a5f0 [Core][Distributed] code deduplication in tp&pp with coordinator(#5293) youkaichao 2024-06-12 17:27:08 -07:00
2135cacb45 [Bugfix] Fix wrong multi_modal_input format for CPU runner (#5451) Isotr0py 2024-06-13 07:20:18 +08:00
7d19de2e9c [Frontend] Add "input speed" to tqdm postfix alongside output speed (#5425) Michael Goin 2024-06-12 18:42:12 -04:00
94a07bbdd8 [Bugfix] Fix typo in scheduler.py (requeset -> request) (#5470) Michael Goin 2024-06-12 17:59:44 -04:00
b8d4dfff9c [Doc] Update debug docs (#5438) Cyrus Leung 2024-06-13 05:49:31 +08:00
622d45128c [misc] add hint for AttributeError (#5462) youkaichao 2024-06-12 14:46:35 -07:00
51602eefd3 [Frontend] [Core] Support for sharded tensorized models (#4990) Travis Johnson 2024-06-12 15:13:52 -06:00
5cc50a531f [Bugfix] TYPE_CHECKING for MultiModalData (#5444) Arthur Kim 2024-06-13 06:08:52 +09:00
5985e3427d [Kernel] Vectorized FP8 quantize kernel (#5396) Cody Yu 2024-06-12 14:07:26 -07:00
8b82a89997 [ci] Add AMD, Neuron, Intel tests for AWS CI and turn off default soft fail for GPU tests (#5464) Kevin H. Luu 2024-06-12 14:00:18 -07:00
c3c2903e72 [Bugfix] Add device assertion to TorchSDPA (#5402) Li, Jiang 2024-06-13 03:58:53 +08:00
1a8bfd92d5 [Hardware] Initial TPU integration (#5292) Woosuk Kwon 2024-06-12 11:53:03 -07:00
847cdcca1c [CI] Upgrade codespell version. (#5381) SangBin Cho 2024-06-13 02:06:14 +09:00
e3c12bf6d2 Revert "[CI/Build] Add is_quant_method_supported to control quantization test configurations" (#5463) Simon Mo 2024-06-12 12:03:24 -05:00
3dd6853bc8 [CI/Build] Add is_quant_method_supported to control quantization test configurations (#5253) Michael Goin 2024-06-12 12:58:02 -04:00
8f89d72090 [Doc] add common case for long waiting time (#5430) (tag: v0.5.0) youkaichao 2024-06-11 11:12:13 -07:00
99dac099ab [Core][Doc] Default to multiprocessing for single-node distributed case (#5230) Nick Hill 2024-06-11 11:10:41 -07:00
c4bd03c7c5 [Core][Distributed] add same-node detection (#5369) youkaichao 2024-06-11 10:53:59 -07:00
dcbf4286af [Frontend] Customizable RoPE theta (#5197) sasha0552 2024-06-11 17:42:26 +00:00
00e6a2dc53 [Bugfix] fix lora_dtype value type in arg_utils.py (#5398) Ali Panahi 2024-06-11 10:40:23 -07:00
2e02311a1b [Bugfix] Fix MultiprocessingGPUExecutor.check_health when world_size == 1 (#5254) Junichi Sato 2024-06-12 02:38:07 +09:00
89ec06c33b [Docs] [Spec decode] Fix docs error in code example (#5427) Cade Daniel 2024-06-11 10:31:56 -07:00
9fde251bf0 [Doc] Add an automatic prefix caching section in vllm documentation (#5324) Kuntai Du 2024-06-11 10:24:59 -07:00
4c2ffb28ff [Speculative decoding] Initial spec decode docs (#5400) Cade Daniel 2024-06-11 10:15:40 -07:00
246598a6b1 [CI] docfix (#5410) SangBin Cho 2024-06-11 17:28:50 +09:00
8bab4959be [Misc] Remove VLLM_BUILD_WITH_NEURON env variable (#5389) Woosuk Kwon 2024-06-11 00:37:56 -07:00
3c4cebf751 [Doc][Typo] Fixing Missing Comma (#5403) Roger Wang 2024-06-11 00:20:28 -07:00
d8f31f2f8b [Doc] add debugging tips (#5409) youkaichao 2024-06-10 23:21:43 -07:00
640052b069 [Bugfix][Frontend] Cleanup "fix chat logprobs" (#5026) Cyrus Leung 2024-06-11 13:36:46 +08:00
351d5e7b82 [Bugfix] OpenAI entrypoint limits logprobs while ignoring server defined --max-logprobs (#5312) maor-ps 2024-06-11 05:30:31 +03:00
a008629807 [Misc] Various simplifications and typing fixes (#5368) Nick Hill 2024-06-10 19:29:02 -07:00
76477a93b7 [ci] Fix Buildkite agent path (#5392) Kevin H. Luu 2024-06-10 18:58:07 -07:00
77c87beb06 [Doc] Add documentation for FP8 W8A8 (#5388) Michael Goin 2024-06-10 20:55:12 -04:00
114332b88e Bump version to v0.5.0 (#5384) Simon Mo 2024-06-10 17:56:06 -05:00
cb77ad836f [Docs] Alphabetically sort sponsors (#5386) Woosuk Kwon 2024-06-10 13:17:19 -07:00
856c990041 [Docs] Add Docs on Limitations of VLM Support (#5383) Roger Wang 2024-06-10 09:53:50 -07:00
c5602f0baa [ci] Mount buildkite agent on Docker container to upload benchmark results (#5330) Kevin H. Luu 2024-06-10 09:22:34 -07:00
f7f9c5f97b [ci] Use small_cpu_queue for doc build (#5331) Kevin H. Luu 2024-06-10 09:21:11 -07:00
2c0d933594 [Bugfix] Fix LLaVA-NeXT (#5380) Cyrus Leung 2024-06-10 23:38:47 +08:00
774d1035e4 [Feature][Frontend]: Continued stream_options implementation also in CompletionRequest (#5319) Itay Etelis 2024-06-10 17:22:09 +03:00
6b29d6fe70 [Model] Initial support for LLaVA-NeXT (#4199) Cyrus Leung 2024-06-10 20:47:15 +08:00
0bfa1c4f13 [Misc] Improve error message when LoRA parsing fails (#5194) Cyrus Leung 2024-06-10 19:38:49 +08:00
c81da5f56d [misc][typo] fix typo (#5372) youkaichao 2024-06-10 02:51:02 -07:00
68bc81703e [Frontend][Misc] Enforce Pixel Values as Input Type for VLMs in API Server (#5374) Roger Wang 2024-06-10 02:13:39 -07:00
5884c2b454 [Misc] Update to comply with the new compressed-tensors config (#5350) Dipika Sikka 2024-06-09 23:49:46 -04:00
45f92c00cf [Bugfix] Fix KeyError: 1 When Using LoRA adapters (#5164) Bla_ckB 2024-06-10 06:23:14 +07:00
5467ac3196 [Kernel][Misc] Use TORCH_LIBRARY instead of PYBIND11_MODULE for custom ops (#5047) bnellnm 2024-06-09 16:23:30 -04:00
5d7e3d0176 [mis][ci/test] fix flaky test in test_sharded_state_loader.py (#5361) youkaichao 2024-06-08 20:50:14 -07:00
0373e1837e [Core][CUDA Graph] add output buffer for cudagraph (#5074) youkaichao 2024-06-08 19:14:43 -07:00
c09dade2a2 [Misc][Breaking] Change FP8 checkpoint format from act_scale -> input_scale (#5353) Michael Goin 2024-06-08 13:54:05 -04:00
8ea5e44a43 [CI/Test] improve robustness of test (vllm_runner) (#5357) youkaichao 2024-06-08 01:59:20 -07:00
9fb900f90c [CI/Test] improve robustness of test (hf_runner) (#5347) youkaichao 2024-06-07 22:31:32 -07:00
c96fc06747 [ROCm][AMD] Use pytorch sdpa math backend to do naive attention (#4965) Hongxia Yang 2024-06-07 22:13:12 -04:00
b3376e5c76 [Misc] Add args for selecting distributed executor to benchmarks (#5335) Benjamin Kitor 2024-06-07 18:20:16 -07:00
e69ded7d1c [Bug Fix] Fix the support check for FP8 CUTLASS (#5352) Cheng Li 2024-06-07 17:42:05 -07:00
767c727a81 fix DbrxFusedNormAttention missing cache_config (#5340) Calvinn Ng 2024-06-08 05:10:21 +08:00
6840a71610 [Misc] Remove unused cuda_utils.h in CPU backend (#5345) Jie Fu (傅杰) 2024-06-08 05:09:13 +08:00
7a9cb294ae [Frontend] Add OpenAI Vision API Support (#5237) Roger Wang 2024-06-07 11:23:32 -07:00
ca3ea51bde [Kernel] Dynamic Per-Token Activation Quantization (#5037) Dipika Sikka 2024-06-07 12:36:26 -04:00
dc49fb892c Addition of lacked ignored_seq_groups in _schedule_chunked_prefill (#5296) limingshu 2024-06-07 21:35:42 +08:00
18a277b52d Remove Ray health check (#4693) Antoni Baum 2024-06-07 03:01:56 -07:00
8d75fe48ca [Kernel] Switch fp8 layers to use the CUTLASS kernels (#5183) Tyler Michael Smith 2024-06-07 04:42:35 -04:00
