biondizzle/vllm

Commit Graph

388596c914 [Misc][Utils] allow get_open_port to be called for multiple times (#5333) youkaichao 2024-06-06 22:15:11 -07:00
baa15a9ec3 [Feature][Frontend]: Add support for stream_options in ChatCompletionRequest (#5135) Itay Etelis 2024-06-07 06:29:24 +03:00
15063741e3 [Misc] Missing error message for custom ops import (#5282) Jie Fu (傅杰) 2024-06-07 11:17:21 +08:00
ccdc490dda [Core] Change LoRA embedding sharding to support loading methods (#5038) Antoni Baum 2024-06-06 19:07:57 -07:00
a31cab7556 [Core] Avoid copying prompt/output tokens if no penalties are used (#5289) Antoni Baum 2024-06-06 18:12:00 -07:00
828da0d44e [Frontend] enable passing multiple LoRA adapters at once to generate() (#5300) Matthew Goldey 2024-06-06 16:48:13 -04:00
abe855d637 [Kernel] Retune Mixtral 8x22b configs for FP8 on H100 (#5294) Philipp Moritz 2024-06-06 09:29:29 -07:00
4efff036f0 Bugfix: fix broken of download models from modelscope (#5233) liuyhwangyh 2024-06-07 00:28:10 +08:00
89c920785f [CI/Build] Update vision tests (#5307) Cyrus Leung 2024-06-06 18:17:18 +08:00
7b0a0dfb22 [Frontend][Core] Update Outlines Integration from FSM to Guide (#4109) Breno Faria 2024-06-06 01:49:12 +02:00
3a6ae1d33c [CI] Disable flash_attn backend for spec decode (#5286) Simon Mo 2024-06-05 17:49:27 -05:00
8f1729b829 [Docs] Add Ray Summit CFP (#5295) Simon Mo 2024-06-05 17:25:18 -05:00
6a7c7711a2 [Misc] Skip for logits_scale == 1.0 (#5291) Woosuk Kwon 2024-06-05 15:19:02 -07:00
0f83ddd4d7 [Bugfix][Frontend/Core] Don't log exception when AsyncLLMEngine gracefully shuts down. (#5290) Alex Wu 2024-06-05 15:18:12 -07:00
065aff6c16 [Bugfix] Make EngineArgs use named arguments for config construction (#5285) Michael Goin 2024-06-05 18:16:56 -04:00
3d33e372a1 [BugFix] Fix log message about default max model length (#5284) Nick Hill 2024-06-05 14:53:16 -07:00
faf71bcd4b [Speculative Decoding] Add ProposerWorkerBase abstract class (#5252) Nick Hill 2024-06-05 14:53:05 -07:00
f270a39537 [Docs] Add Sequoia as sponsors (#5287) Simon Mo 2024-06-05 13:02:56 -05:00
51a08e7d8f [Kernel] Re-tune Mixtral MoE configurations for FP8 on H100 (#5238) Philipp Moritz 2024-06-05 10:59:14 -07:00
eb8fcd2666 [BugFix] Apply get_cached_tokenizer to the tokenizer setter of LLM (#5207) DriverSong 2024-06-06 01:59:02 +08:00
5563a4dea8 [Model] Correct Mixtral FP8 checkpoint loading (#5231) Cody Yu 2024-06-05 10:58:50 -07:00
ccd4f129e8 [Kernel] Add GPU architecture guards to the CUTLASS w8a8 kernels to reduce binary size (#5157) Tyler Michael Smith 2024-06-05 13:44:15 -04:00
02cc3b51a7 [misc] benchmark_serving.py -- add ITL results and tweak TPOT results (#5263) Tyler Michael Smith 2024-06-05 13:17:51 -04:00
d5b1eb081e [CI] Add nightly benchmarks (#5260) Simon Mo 2024-06-05 11:42:08 -05:00
f0a500545f [Frontend] OpenAI API server: Add add_special_tokens to ChatCompletionRequest (default False) (#5278) tomeras91 2024-06-05 19:32:58 +03:00
c65146e75e [Misc] Fix docstring of get_attn_backend (#5271) Woosuk Kwon 2024-06-05 09:18:59 -07:00
41ca62cf03 [Misc] Add CustomOp interface for device portability (#5255) Woosuk Kwon 2024-06-05 09:18:19 -07:00
974fc9b845 [Bugfix] Fix prompt_logprobs when SamplingParams.detokenize is set to True (#5226) zifeitong 2024-06-04 19:37:28 -07:00
fee4dcc33a [Misc] update collect env (#5261) youkaichao 2024-06-04 15:29:09 -07:00
650a4cc55e [Misc] Add transformers version to collect_env.py (#5259) Michael Goin 2024-06-04 15:52:28 -04:00
9ca62d8668 [CI] mark AMD test as softfail to prevent blockage (#5256) Simon Mo 2024-06-04 13:34:53 -05:00
45c35f0d58 [CI/Build] Reducing CPU CI execution time (#5241) Li, Jiang 2024-06-05 01:26:40 +08:00
9ba093b4f4 [CI/Build] Simplify model loading for HfRunner (#5251) Cyrus Leung 2024-06-05 01:09:19 +08:00
27208be66e [Kernel] Add back batch size 1536 and 3072 to MoE tuning (#5242) Woosuk Kwon 2024-06-04 09:58:47 -07:00
87d5abef75 [Bugfix] Fix a bug caused by pip install setuptools>=49.4.0 for CPU backend (#5249) Jie Fu (傅杰) 2024-06-05 00:57:51 +08:00
ec784b2526 [CI/Build] Add inputs tests (#5215) Cyrus Leung 2024-06-04 12:01:46 +08:00
a58f24e590 [Bugfix] Fix torch.compile() error when using MultiprocessingGPUExecutor (#5229) zifeitong 2024-06-03 20:55:50 -07:00
f42a006b15 [Bugfix]: During testing, use pytest monkeypatch for safely overriding the env var that indicates the vLLM backend (#5210) afeldman-nm 2024-06-03 23:32:57 -04:00
3a434b07ed [Kernel] Enhance MoE benchmarking & tuning script (#4921) Woosuk Kwon 2024-06-03 20:06:59 -07:00
bd0e7802e0 [Bugfix] Add warmup for prefix caching example (#5235) Zhuohan Li 2024-06-03 19:36:41 -07:00
06b2550cbb [Bugfix] Support prompt_logprobs==0 (#5217) Toshiki Kataoka 2024-06-04 09:59:30 +09:00
f775a07e30 [FRONTEND] OpenAI tools support named functions (#5032) Breno Faria 2024-06-04 01:25:29 +02:00
4f0d17c05c New CI template on AWS stack (#5110) Kevin H. Luu 2024-06-03 16:16:43 -07:00
10c38e3e46 [Misc]: Implement CPU/GPU swapping in BlockManagerV2 (#3834) Kaiyang Chen 2024-06-04 04:37:11 +08:00
cafb8e06c5 [CI/BUILD] enable intel queue for longer CPU tests (#4113) Yuan 2024-06-04 01:39:50 +08:00
cbb2f59cc8 [Kernel] Pass a device pointer into the quantize kernel for the scales (#5159) Tyler Michael Smith 2024-06-03 12:52:30 -04:00
0ab278ca31 [Core] Remove unnecessary copies in flash attn backend (#5138) Antoni Baum 2024-06-03 09:39:31 -07:00
7a64d24aad [Core] Support image processor (#4197) Cyrus Leung 2024-06-03 13:56:41 +08:00
dfbe60dc62 [Misc] Simplify code and fix type annotations in conftest.py (#5118) Cyrus Leung 2024-06-03 07:05:50 +08:00
a66cf40b20 [Kernel][ROCm][AMD] enable fused topk_softmax kernel for moe layer (#4927) Divakar Verma 2024-06-02 16:13:26 -05:00
f790ad3c50 [Frontend][OpenAI] Support for returning max_model_len on /v1/models response (#4643) Avinash Raj 2024-06-02 13:36:13 +05:30
ed59a7ed23 Update test_ignore_eos (#4898) Simon Mo 2024-06-01 21:21:53 -05:00
044793d8df [BugFix] Prevent LLM.encode for non-generation Models (#5184) Robert Shaw 2024-06-01 19:35:41 -04:00
c2d6d2f960 [Bugfix]: Fix issues related to prefix caching example (#5177) (#5180) Daniil Arapov 2024-06-02 01:53:52 +03:00
8279078e21 [Bugfix] Remove deprecated @abstractproperty (#5174) Zhuohan Li 2024-06-01 15:40:25 -07:00
b9c0605a8e [Feature][Kernel] Support bitsandbytes quantization and QLoRA (#4776) chenqianfzh 2024-06-01 13:51:10 -07:00
37464a0f74 [Bugfix] Fix call to init_logger in openai server (#4765) Nadav Shmayovits 2024-06-01 20:18:50 +03:00
c354072828 [Minor] Fix the path typo in loader.py: save_sharded_states.py -> save_sharded_state.py (#5151) Ye Cao 2024-06-02 01:11:22 +08:00
f081c3ce4b [Kernel] Update Cutlass fp8 configs (#5144) Varun Sundar Rabindranath 2024-06-01 14:16:07 +05:30
260d119e86 [Kernel] Refactor CUTLASS kernels to always take scales that reside on the GPU (#5137) Tyler Michael Smith 2024-06-01 02:45:32 -04:00
a360ff80bb [CI/Build] CMakeLists: build all extensions' cmake targets at the same time (#5034) Daniele 2024-06-01 06:06:45 +02:00
1197e02141 [Build] Guard against older CUDA versions when building CUTLASS 3.x kernels (#5168) (tag: v0.4.3) Tyler Michael Smith 2024-05-31 20:21:38 -04:00
657579113f [Doc] Add checkmark for GPTBigCodeForCausalLM LoRA support (#5171) Nick Hill 2024-05-31 17:20:19 -07:00
e9899fb7a4 [Model] Enable FP8 QKV in MoE and refine kernel tuning script (#5039) Cody Yu 2024-05-31 14:29:19 -07:00
a377f0bd5e [Misc]: optimize eager mode host time (#4196) functionxu123 2024-05-31 13:14:50 +08:00
e9d3aa04f6 Revert "[Kernel] Marlin_24: Ensure the mma.sp instruction is using the ::ordered_metadata modifier (introduced with PTX 8.5)" (#5149) Simon Mo 2024-05-31 00:00:26 -05:00
a22dea54d3 [Model] Support MAP-NEO model (#5081) SnowDist 2024-05-31 10:24:41 +08:00
533c217792 Fix cutlass sm_90a vesrion in CMakeList simon-mo 2024-05-31 02:13:01 +00:00
6d21fa1cad [Kernel] Marlin_24: Ensure the mma.sp instruction is using the ::ordered_metadata modifier (introduced with PTX 8.5) (#5136) Alexander Matveev 2024-05-30 22:02:11 -04:00
b35be5403f [Bugfix] Avoid Warnings in SparseML Activation Quantization (#5120) Robert Shaw 2024-05-30 17:04:37 -07:00
45a1a69b98 [Build] Disable sm_90a in cu11 (#5141) Simon Mo 2024-05-30 16:37:16 -05:00
87a658c812 Bump version to v0.4.3 (#5046) Simon Mo 2024-05-30 13:13:46 -05:00
429d89720e add doc about serving option on dstack (#3074) Chansung Park 2024-05-31 02:11:07 +09:00
a9bcc7afb2 [Doc] Use intersphinx and update entrypoints docs (#5125) Cyrus Leung 2024-05-31 00:59:23 +08:00
d79d9eaaff [Misc] remove duplicate definition of seq_lens_tensor in model_runner.py (#5129) Hyunsung Lee 2024-05-30 22:56:19 +09:00
f758505c73 [CI/Build] increase wheel size limit to 200 MB (#5130) youkaichao 2024-05-30 06:29:48 -07:00
d910816c73 [Bugfix] Automatically Detect SparseML models (#5119) Robert Shaw 2024-05-30 05:58:37 -07:00
87d41c849d [BUGFIX] [FRONTEND] Correct chat logprobs (#5029) Breno Faria 2024-05-30 11:52:14 +02:00
e07aff9e52 [CI/Build] Docker cleanup functionality for amd servers (#5112) omkar kakarparthi 2024-05-29 22:27:39 -05:00
5bf185a1c4 [Bugfix] gptq_marlin: Ensure g_idx_sort_indices is not a Parameter (#5108) Alexander Matveev 2024-05-29 20:30:18 -04:00
4fbcb0f27e [Doc][Build] update after removing vllm-nccl (#5103) youkaichao 2024-05-29 16:51:18 -07:00
7c3604fb68 [Bugfix] logprobs is not compatible with the OpenAI spec #4795 (#5031) Itay Etelis 2024-05-30 02:13:22 +03:00
b1c255630d [Core] Avoid the need to pass None values to Sequence.inputs (#5099) Cyrus Leung 2024-05-30 07:05:01 +08:00
eb6c50cdc2 [Bugfix][CI/Build] Fix codespell failing to skip files in git diff (#5097) Cyrus Leung 2024-05-30 07:02:54 +08:00
eecd864388 [Bugfix][CI/Build] Fix test and improve code for merge_async_iterators (#5096) Cyrus Leung 2024-05-30 07:02:25 +08:00
ae495c74ea [Doc]Replace deprecated flag in readme (#4526) Ronen Schaffer 2024-05-30 01:26:33 +03:00
4238bc82f2 [Core] Cross-attention KV caching and memory-management (towards eventual encoder/decoder model support) (#4837) afeldman-nm 2024-05-29 12:09:13 -04:00
594392d27a [Core][Distributed] improve p2p access check (#4992) youkaichao 2024-05-29 04:29:07 -07:00
18c1f16d86 [Bugfix] Fix arguments passed to Sequence in stop checker test (#5092) Cyrus Leung 2024-05-29 15:16:41 +08:00
5bd3c65072 [Core][Optimization] remove vllm-nccl (#5091) youkaichao 2024-05-28 22:13:52 -07:00
616e600e0b [Misc] add gpu_memory_utilization arg (#5079) Marut Pandya 2024-05-28 17:16:18 -07:00
dfba529b40 [Bugfix] Remove the last EOS token unless explicitly specified (#5077) Junichi Sato 2024-05-29 09:15:35 +09:00
5ae5ed1e60 [Core] Consolidate prompt arguments to LLM engines (#4328) Cyrus Leung 2024-05-29 04:29:31 +08:00
290f4ada2b [Docs] Add Dropbox as sponsors (#5089) Simon Mo 2024-05-28 12:29:09 -05:00
dd8de11f0a [Kernel][ROCm][AMD] Add fused_moe Triton configs for MI300X (#4951) Divakar Verma 2024-05-28 11:03:23 -05:00
9ba415588a [BugFix] Fix Embedding Models with TP>1 (#5075) Robert Shaw 2024-05-28 08:32:42 -07:00
d4f3985907 [Core] Sliding window for block manager v2 (#4545) Michał Moskal 2024-05-27 19:07:07 -07:00
890aa93d27 [Model] Add support for falcon-11B (#5069) Isotr0py 2024-05-28 07:41:43 +08:00
fbdb7b3ee2 [Core] Allow AQLM on Pascal (#5058) sasha0552 2024-05-27 22:26:14 +00:00
1102bef219 [Bugfix / Core] Prefix Caching Guards (merged with main) (#4846) Zhuohan Li 2024-05-27 15:18:17 -07:00
