Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

ebce310b74 [Model] Snowflake arctic model implementation (#4652) Hao Zhang 2024-05-09 15:37:14 -07:00
be0c5180ac [Bugfix] Add logs for all model dtype casting (#4717) Michael Goin 2024-05-09 14:36:25 -04:00
cea64430f6 [Bugfix] Update grafana.json (#4711) Robert Shaw 2024-05-09 11:10:13 -06:00
a3c124570a [Bugfix] Fix CLI arguments in OpenAI server docs (#4709) Cyrus Leung 2024-05-10 00:53:14 +08:00
ff5abcd746 [ROCm] Add support for Punica kernels on AMD GPUs (#3140) kliuae 2024-05-10 00:19:50 +08:00
0ee535b294 [Misc] Set block size at initialization & Fix test_model_runner (#4705) Woosuk Kwon 2024-05-09 09:04:59 -07:00
190bc838e1 [Misc] Remove unnecessary ModelRunner imports (#4703) Woosuk Kwon 2024-05-09 00:17:17 -07:00
f12b20decc [Frontend] Move async logic outside of constructor (#4674) Cyrus Leung 2024-05-09 13:48:33 +08:00
16bc0a098f [Frontend] add tok/s speed metric to llm class when using tqdm (#4400) Mahmoud Ashraf 2024-05-09 08:02:31 +03:00
e288df0632 [Bugfix] Fine-tune gptq_marlin configs to be more similar to marlin (#4626) alexm-nm 2024-05-08 20:14:31 -04:00
8b9241be3a [Speculative decoding] [Bugfix] Fix overallocation in ngram + spec logprobs (#4672) Cade Daniel 2024-05-08 16:24:46 -07:00
f942efb5a3 [Dynamic Spec Decoding] Auto-disable by the running queue size (#4592) Cody Yu 2024-05-08 14:44:00 -07:00
89579a201f [Misc] Use vllm-flash-attn instead of flash-attn (#4686) Woosuk Kwon 2024-05-08 13:15:34 -07:00
230c4b38c1 [CI/Test] fix swap test for multi gpu (#4689) youkaichao 2024-05-08 13:14:02 -07:00
20cfcdec99 [Core][Optimization] change python dict to pytorch tensor for blocks to swap (#4659) youkaichao 2024-05-08 12:07:05 -07:00
ad932a221d [Core] Faster startup for LoRA enabled models (#4634) Antoni Baum 2024-05-08 10:33:18 -07:00
5510cf0e8a [Misc] Add get_name method to attention backends (#4685) Woosuk Kwon 2024-05-08 09:59:31 -07:00
0f9a6e3d22 [Bugfix][Kernel] allow non-power-of-2 for prefix prefill with alibi (#4573) DefTruth 2024-05-09 00:19:58 +08:00
f6a593093a [CI] Make mistral tests pass (#4596) SangBin Cho 2024-05-09 00:44:35 +09:00
d7740ea4dc [Core] Optimize sampler get_logprobs (#4594) SangBin Cho 2024-05-09 00:42:28 +09:00
cc466a3290 [Core][Distributed] support cpu&device in broadcast tensor dict (#4660) youkaichao 2024-05-07 19:34:47 -07:00
8344f7742b [Bug fix][Core] fixup ngram not setup correctly (#4551) leiwen83 2024-05-08 02:40:18 +08:00
469f85c782 [Core][Optimization] change copy-on-write from dict[int, list] to list (#4648) youkaichao 2024-05-07 11:06:32 -07:00
10760da800 [Bugfix] Fixed error in slice_lora_b for MergedQKVParallelLinearWithLora (#4609) Austin Veselka 2024-05-07 12:59:07 -05:00
478aed5827 [Build/CI] Fixing 'docker run' to re-enable AMD CI tests. (#4642) Alexei-V-Ivanov-AMD 2024-05-07 11:23:17 -05:00
63575bc2e1 [Core][Optimization] change python dict to pytorch tensor (#4607) youkaichao 2024-05-06 21:30:27 -07:00
a98187cf72 [Kernel] Make static FP8 scaling more robust (#4570) Philipp Moritz 2024-05-06 17:39:28 -07:00
bd99d22629 Update lm-format-enforcer to 0.10.1 (#4631) Noam Gat 2024-05-07 02:51:59 +03:00
19cb4716ee [CI] Add retry for agent lost (#4633) Cade Daniel 2024-05-06 16:18:57 -07:00
e186d37cb1 [CI] use ccache actions properly in release workflow (#4629) Simon Mo 2024-05-06 15:23:36 -07:00
323f27b904 [Bugfix] Fix asyncio.Task not being subscriptable (#4623) Cyrus Leung 2024-05-07 00:31:05 +08:00
0650e5935b Disable cuda version check in vllm-openai image (#4530) zhaoyang-star 2024-05-06 07:58:55 +08:00
c7f2cf2b7f [CI] Reduce wheel size by not shipping debug symbols (#4602) v0.4.2 Simon Mo 2024-05-04 21:28:58 -07:00
8d8357c8ed bump version to v0.4.2 (#4600) Simon Mo 2024-05-04 17:09:49 -07:00
4302987069 [Bugfix] Fix inappropriate content of model_name tag in Prometheus metrics (#3937) DearPlanet 2024-05-05 06:39:34 +08:00
021b1a2ab7 [CI] check size of the wheels (#4319) Simon Mo 2024-05-04 13:44:36 -07:00
2a052011ca [Kernel] Support MoE Fp8 Checkpoints for Mixtral (Static Weights with Dynamic/Static Activations) (#4527) Michael Goin 2024-05-04 14:45:16 -04:00
36fb68f947 [Doc] Chunked Prefill Documentation (#4580) SangBin Cho 2024-05-04 16:18:00 +09:00
bc8ad68455 [Misc][Refactor] Introduce ExecuteModelData (#4540) Cody Yu 2024-05-03 17:47:07 -07:00
344bf7cd2d [Misc] add installation time env vars (#4574) youkaichao 2024-05-03 15:55:56 -07:00
ab50275111 [Speculative decoding] Support target-model logprobs (#4378) Cade Daniel 2024-05-03 15:52:01 -07:00
43c413ec57 [Kernel] Use flashinfer for decoding (#4353) Lily Liu 2024-05-03 15:51:27 -07:00
f8e7adda21 Fix/async chat serving (#2727) Sebastian Schoennenbeck 2024-05-03 20:04:14 +02:00
7e65477e5e [Bugfix] Allow "None" or "" to be passed to CLI for string args that default to None (#4586) Michael Goin 2024-05-03 13:32:21 -04:00
3521ba4f25 [Core][Model runner refactoring 1/N] Refactor attn metadata term (#4518) SangBin Cho 2024-05-04 02:20:12 +09:00
2d7bce9cd5 [Doc] add env vars to the doc (#4572) youkaichao 2024-05-02 22:13:49 -07:00
ce3f1eedf8 [Misc] remove chunk detected debug logs (#4571) DefTruth 2024-05-03 12:48:08 +08:00
808632d3b4 [BugFix] Prevent the task of _force_log from being garbage collected (#4567) Yang, Bo 2024-05-02 18:35:18 -07:00
344a5d0c33 [Core][Distributed] enable allreduce for multiple tp groups (#4566) youkaichao 2024-05-02 17:32:33 -07:00
0f8a91401c [Core] Ignore infeasible swap requests. (#4557) SangBin Cho 2024-05-03 06:31:20 +09:00
9b5c9f9484 [CI/Build] AMD CI pipeline with extended set of tests. (#4267) Alexei-V-Ivanov-AMD 2024-05-02 14:29:07 -05:00
32881f3f31 [kernel] fix sliding window in prefix prefill Triton kernel (#4405) Michał Moskal 2024-05-02 11:23:37 -07:00
5b8a7c1cb0 [Misc] centralize all usage of environment variables (#4548) youkaichao 2024-05-02 11:13:25 -07:00
1ff0c73a79 [BugFix] Include target-device specific requirements.txt in sdist (#4559) Mark McLoughlin 2024-05-02 18:52:51 +01:00
5ad60b0cbd [Misc] Exclude the tests directory from being packaged (#4552) Hu Dong 2024-05-03 01:50:25 +08:00
fb087af52e [mypy][7/N] Cover all directories (#4555) SangBin Cho 2024-05-03 02:47:41 +09:00
7038e8b803 [Kernel] Support running GPTQ 8-bit models in Marlin (#4533) alexm-nm 2024-05-02 12:56:22 -04:00
2a85f93007 [Core][Distributed] enable multiple tp group (#4512) youkaichao 2024-05-01 21:28:21 -07:00
cf8cac8c70 [mypy][6/N] Fix all the core subdirectory typing (#4450) SangBin Cho 2024-05-02 12:01:00 +09:00
5e401bce17 [CI]Add regression tests to ensure the async engine generates metrics (#4524) Ronen Schaffer 2024-05-02 05:57:12 +03:00
0d62fe58db [Bug fix][Core] assert num_new_tokens == 1 fails when SamplingParams.n is not 1 and max_tokens is large & Add tests for preemption (#4451) SangBin Cho 2024-05-02 11:24:13 +09:00
b8afa8b95a [MISC] Rework logger to enable pythonic custom logging configuration to be provided (#4273) Danny Guinther 2024-05-01 20:34:40 -04:00
826b82a260 [Misc] Fix expert_ids shape in MoE (#4517) Woosuk Kwon 2024-05-01 16:47:59 -07:00
c9d852d601 [Misc] Remove Mixtral device="cuda" declarations (#4543) Philipp Moritz 2024-05-01 16:30:52 -07:00
6ef09b08f8 [Core][Distributed] fix pynccl del error (#4508) youkaichao 2024-05-01 15:23:06 -07:00
3a922c1e7e [Bugfix][Core] Fix and refactor logging stats (#4336) Roy 2024-05-02 04:08:14 +08:00
c47ba4aaa9 [Bugfix] Add validation for seed (#4529) sasha0552 2024-05-01 19:31:22 +00:00
24bb4fe432 [Kernel] Update fused_moe tuning script for FP8 (#4457) Philipp Moritz 2024-05-01 11:47:38 -07:00
a657bfc48a [Core] Add multiproc_worker_utils for multiprocessing-based workers (#4357) Nick Hill 2024-05-01 11:41:59 -07:00
24750f4cad [Core] Enable prefix caching with block manager v2 enabled (#4142) leiwen83 2024-05-02 02:20:32 +08:00
b38e42fbca [Speculative decoding] Add ngram prompt lookup decoding (#4237) leiwen83 2024-05-02 02:13:03 +08:00
8b798eec75 [CI/Build][Bugfix] VLLM_USE_PRECOMPILED should skip compilation (#4534) Travis Johnson 2024-05-01 12:01:50 -06:00
69909126a7 [Bugfix] Use random seed if seed is -1 (#4531) sasha0552 2024-05-01 17:41:17 +00:00
e491c7e053 [Doc] update(example model): for OpenAI compatible serving (#4503) Frαnçois 2024-05-01 19:14:16 +02:00
4dc8026d86 [Bugfix] Fix 307 Redirect for /metrics (#4523) Robert Shaw 2024-05-01 12:14:13 -04:00
a88bb9b032 [Bugfix] Fix the fp8 kv_cache check error that occurs when failing to obtain the CUDA version. (#4173) AnyISalIn 2024-05-02 00:11:03 +08:00
6f1df80436 [Test] Add ignore_eos test (#4519) SangBin Cho 2024-05-01 21:45:42 +09:00
d6f4bd7cdd [Misc]Add customized information for models (#4132) Jee Li 2024-05-01 12:18:14 +08:00
c3845d82dc Allow user to define whitespace pattern for outlines (#4305) Robert Caulk 2024-05-01 05:48:39 +02:00
a822eb3413 [Misc] fix typo in block manager (#4453) Pastel！ 2024-05-01 11:41:32 +08:00
f458112e8a [Misc][Typo] type annotation fix (#4495) harrywu 2024-05-01 11:21:39 +08:00
2e240c69a9 [Core] Centralize GPU Worker construction (#4419) Nick Hill 2024-04-30 18:06:34 -07:00
ee37328da0 Unable to find Punica extension issue during source code installation (#4494) fuchen.ljl 2024-05-01 08:42:09 +08:00
6ad58f42c5 fix_tokenizer_snapshot_download_bug (#4493) fuchen.ljl 2024-05-01 07:38:50 +08:00
dd1a50a8bc [Bugfix][Minor] Make ignore_eos effective (#4468) Li, Jiang 2024-05-01 07:33:33 +08:00
715c2d854d [Frontend] [Core] Tensorizer: support dynamic num_readers, update version (#4467) Alpay Ariyak 2024-04-30 19:32:13 -04:00
a494140433 [Frontend] Support complex message content for chat completions endpoint (#3467) Florian Greinacher 2024-05-01 01:28:46 +02:00
111815d482 [Kernel] Support Fp8 Checkpoints (Dynamic + Static) (#4332) Robert Shaw 2024-04-30 17:46:12 -04:00
b31a1fb63c [Doc] add visualization for multi-stage dockerfile (#4456) Prashant Gupta 2024-04-30 10:41:59 -07:00
4bb53e2dde [BugFix] fix num_lookahead_slots missing in async executor (#4165) leiwen83 2024-05-01 01:12:59 +08:00
26f2fb5113 [Core]Refactor gptq_marlin ops (#4466) Kunshang Ji 2024-04-30 12:14:47 +00:00
fa32207842 [Bugfix][Kernel] Fix compute_type for MoE kernel (#4463) Woosuk Kwon 2024-04-29 22:05:40 -07:00
d627a3d837 [Misc] Upgrade to torch==2.3.0 (#4454) Michael Goin 2024-04-29 20:05:47 -04:00
f4f921b7f1 [Core][Distributed] use cpu group to broadcast metadata in cpu (#4444) youkaichao 2024-04-29 13:52:22 -07:00
ac5ccf0156 [CI] hotfix: soft fail neuron test (#4458) Simon Mo 2024-04-29 12:50:01 -07:00
73c8d677e5 [Kernel] Marlin Expansion: Support AutoGPTQ Models with Marlin (#3922) Robert Shaw 2024-04-29 12:35:34 -04:00
df29793dc7 [mypy][5/N] Support all typing on model executor (#4427) SangBin Cho 2024-04-29 11:01:26 +09:00
03dd7d52bf [CI] clean docker cache for neuron (#4441) Simon Mo 2024-04-28 16:32:07 -07:00
bf480c5302 Add more Prometheus metrics (#2764) Ronen Schaffer 2024-04-29 01:59:33 +03:00
9c7306ac11 [Misc] fix typo in llm_engine init logging (#4428) DefTruth 2024-04-28 18:58:30 +08:00
