biondizzle/vllm

Commit Graph

308cc5e21e [ci] fix slow tests (#10698) youkaichao 2024-11-27 09:26:14 -08:00
9e0a147d50 [V1] Update interface for mistral-format Pixtral (#10703) Roger Wang 2024-11-27 04:26:27 -08:00
418cb3b93f [Bugfix][Hardware][CPU] Fix intel-omp version to avoid segfault (#10700) Li, Jiang 2024-11-27 19:55:38 +08:00
1209261e93 [Model] Support telechat2 (#10311) shunxing12345 2024-11-27 19:32:35 +08:00
e2251109c7 [Kernel] Remove if-else with identical branches in marlin 2:4 (#10687) Tyler Michael Smith 2024-11-27 01:55:32 -05:00
15cc2a9f1a [Misc]Further reduce BNB static variable (#10597) Jee Jee Li 2024-11-27 14:54:12 +08:00
e85250b1d1 [Hardware][Gaudi]add get_name method for HPUAttentionBackend (#10667) Kunshang Ji 2024-11-27 14:49:40 +08:00
cfb3bf25fb [bugfix] fix the default value of llm_int8_threshold in BitsAndBytesConfig (#10657) yansh97 2024-11-27 13:55:23 +08:00
1bf905ddaa [Bugfix][SpecDecode] apply sampling parameters to target probabilities for consistency in rejection sampling. (#10198) jeongin601 2024-11-27 14:07:30 +09:00
0a4d968500 [V1] Update interface for idefics3 (#10680) Roger Wang 2024-11-26 18:04:01 -08:00
0a71900bc9 Remove hard-dependencies of Speculative decode to CUDA workers (#10587) Chendi.Xue 2024-11-26 19:57:11 -06:00
2f0a0a17a4 [V1] Refactor model executable interface for multimodal models (#10570) Roger Wang 2024-11-26 12:46:11 -08:00
7576cd38df [Bugfix] Check bnb_4bit_quant_storage for bitsandbytes (#10642) Michael Goin 2024-11-26 15:29:00 -05:00
9a99273b48 [Bugfix] Fix using -O[0,3] with LLM entrypoint (#10677) Michael Goin 2024-11-26 13:44:01 -05:00
f5792c7c4a [Hardware][NVIDIA] Add non-NVML CUDA mode for Jetson (#9735) Conroy Cheers 2024-11-27 05:26:28 +11:00
db66e018ea [Bugfix] Fix for Spec model TP + Chunked Prefill (#10232) Murali Andoorveedu 2024-11-26 09:11:16 -08:00
1f6584ee85 [V1] Enable profile for LLMEngine (#10665) Kunshang Ji 2024-11-26 18:36:45 +08:00
334d64d1e8 [ci] add vllm_test_utils (#10659) youkaichao 2024-11-26 00:20:04 -08:00
940635343a [Misc] Remove outdated init protocols (#10655) Cyrus Leung 2024-11-26 14:55:00 +08:00
9a88f89799 custom allreduce + torch.compile (#10121) Sage Moore 2024-11-26 00:00:16 -06:00
519e8e4182 [v1] EngineArgs for better config handling for v1 (#10382) Ricky Xu 2024-11-25 21:09:43 -08:00
a6760f6456 [Feature] vLLM ARM Enablement for AARCH64 CPUs (#9228) Sanket Kale 2024-11-26 08:02:39 +05:30
45ac4ff270 [bugfix] fix aria model and add torch.compile (#10645) youkaichao 2024-11-25 18:32:09 -08:00
6e9ff050c8 [misc] do not read HOST_IP (#10644) youkaichao 2024-11-25 17:04:50 -08:00
9db713a1dc [Model] Add OLMo November 2024 model (#10503) Shane A 2024-11-25 14:26:40 -08:00
1b583cfefa [Doc] Fix typos in docs (#10636) Cyrus Leung 2024-11-26 02:15:45 +08:00
cf73f0c95e [Model] Enable optional prefix when loading embedding models (#10639) Cyrus Leung 2024-11-26 02:14:33 +08:00
b1d920531f [Model]: Add support for Aria model (#10514) zhou fan 2024-11-26 02:10:55 +08:00
452a4e80c3 [Docs] Add Snowflake Slides (#10641) Simon Mo 2024-11-25 09:34:46 -08:00
c27df94e1f [Bugfix] Fix chunked prefill with model dtype float32 on Turing Devices (#9850) Wallas Henrique 2024-11-25 14:23:32 -03:00
d04b13a380 [Bug]: Authorization ignored when root_path is set (#10606) Chauncey 2024-11-26 00:21:41 +08:00
2b0879bfc2 Super tiny little typo fix (#10633) fzyzcjy 2024-11-25 21:08:30 +08:00
ed46f14321 [Model] Support is_causal HF config field for Qwen2 model (#10621) Cyrus Leung 2024-11-25 17:51:20 +08:00
05d1f8c9c6 [misc] move functions to config.py (#10624) youkaichao 2024-11-25 01:27:30 -08:00
25d806e953 [misc] add torch.compile compatibility check (#10618) youkaichao 2024-11-24 23:40:08 -08:00
65813781a2 [torch.compile] add warning for unsupported models (#10622) youkaichao 2024-11-24 23:27:51 -08:00
7c2134beda [torch.compile] force inductor threads (#10620) Jee Jee Li 2024-11-25 15:04:21 +08:00
a30a605d21 [Doc] Add encoder-based models to Supported Models page (#10616) Cyrus Leung 2024-11-25 14:34:07 +08:00
571841b7fc [torch.compile] support encoder based models (#10613) youkaichao 2024-11-24 21:24:33 -08:00
7ea3cd7c3e [Refactor][MISC] del redundant code in ParallelConfig.postinit (#10614) Mengqing Cao 2024-11-25 13:14:56 +08:00
214efc2c3c Support Cross encoder models (#10400) Maximilien de Bayser 2024-11-24 23:56:20 -03:00
49628fe13e [Doc] Update README.md with Ray Summit talk links (#10610) Zhuohan Li 2024-11-24 16:45:09 -08:00
e4fbb14414 [doc] update the code to add models (#10603) youkaichao 2024-11-24 11:21:40 -08:00
c055747867 [model][utils] add extract_layer_index utility function (#10599) youkaichao 2024-11-23 22:22:54 -08:00
eda2b3589c Revert "Print running script to enhance CI log readability" (#10601) youkaichao 2024-11-23 21:31:47 -08:00
1c445dca51 [CI/Build] Print running script to enhance CI log readability (#10594) Jee Jee Li 2024-11-24 11:57:13 +08:00
1700c543a5 [Bugfix] Fix LoRA weight sharding (#10450) Jee Jee Li 2024-11-24 09:23:17 +08:00
17d8fc1806 [bugfix] Fix example/tensorize_vllm_model tests (#10595) Jee Jee Li 2024-11-24 09:22:33 +08:00
04668ebe7a [Bugfix] Avoid import AttentionMetadata explicitly in Mllama (#10593) Isotr0py 2024-11-24 02:12:20 +08:00
651f6c31ac For ppc64le, disabled tests for now and addressed space issues (#10538) Nishidha 2024-11-23 15:03:53 +05:30
86a44fb896 [Platforms] Refactor openvino code (#10573) JiHuazhong 2024-11-23 14:23:12 +08:00
4cfe5d2bca [Bugfix] multi_modal_kwargs broadcast for CPU tensor parallel (#10541) Isotr0py 2024-11-23 13:25:46 +08:00
c8acd80548 [2/N] handling placeholders in merged multi-modal processor (#10485) Cyrus Leung 2024-11-23 13:25:09 +08:00
4634a89d18 Prefix Cache Aware Scheduling [1/n] (#10128) Ricky Xu 2024-11-22 21:15:55 -08:00
7c25fe45a6 [AMD] Add support for GGUF quantization on ROCm (#10254) kliuae 2024-11-23 13:14:49 +08:00
02a43f82a9 Update default max_num_batch_tokens for chunked prefill to 2048 (#10544) Michael Goin 2024-11-23 00:14:19 -05:00
cfea9c04ef [Model] Fix Baichuan BNB online quantization (#10572) Chen Wu 2024-11-23 13:13:59 +08:00
7d8ffb344f [Bugfix] Internal Server Error when tool_choice is incorrect. (#10567) Varun Vinayak Shenoy 2024-11-22 21:13:29 -08:00
4aba6e3d1a [core] gemma2 full context length support (#10584) youkaichao 2024-11-22 20:13:54 -08:00
978b39744b [Misc] Add pynccl wrappers for all_gather and reduce_scatter (#9432) Tyler Michael Smith 2024-11-22 22:14:03 -05:00
ebda51968b [Core] Fix broken log configuration (#10458) Russell Bryant 2024-11-22 21:23:51 -05:00
9195dbdbca [Bugfix][Frontend] Update Llama Chat Templates to also support Non-Tool use (#10164) Travis Johnson 2024-11-22 19:17:38 -07:00
d559979c54 [bugfix] fix cpu tests (#10585) youkaichao 2024-11-22 17:34:03 -08:00
d345f409b7 [V1] EngineCore supports profiling (#10564) Zhonghua Deng 2024-11-23 09:16:15 +08:00
28598f3939 [Core] remove temporary local variables in LLMEngine.__init__ (#10577) Russell Bryant 2024-11-22 19:22:53 -05:00
948c859571 support bitsandbytes quantization with qwen model (#10549) zixuanzhang226 2024-11-22 16:16:14 -08:00
97814fbf0f [v1] Refactor KVCacheManager for more hash input than token ids (#10507) Ricky Xu 2024-11-22 15:27:25 -08:00
eebad39f26 [torch.compile] support all attention backends (#10558) youkaichao 2024-11-22 14:04:42 -08:00
db100c5cde [bugfix] fix full graph tests (#10581) youkaichao 2024-11-22 10:02:14 -08:00
11fcf0e066 Remove token-adding chat embedding params (#10551) Noam Gat 2024-11-22 09:59:47 +02:00
b6374e09b0 [Bugfix] Fix Phi-3 BNB quantization with tensor parallel (#9948) Isotr0py 2024-11-22 15:01:56 +08:00
a111d0151f [platforms] absorb worker cls difference into platforms folder (#10555) youkaichao 2024-11-21 21:00:32 -08:00
446c7806b2 [Minor] Fix line-too-long (#10563) Woosuk Kwon 2024-11-21 19:40:40 -08:00
33e0a2540a [9/N] torch.compile LLM usage (#10552) youkaichao 2024-11-21 19:13:31 -08:00
aed074860a [Benchmark] Add new H100 machine (#10547) Simon Mo 2024-11-21 18:27:20 -08:00
9afa014552 Add small example to metrics.rst (#10550) Michael Goin 2024-11-21 18:43:43 -05:00
46fe9b46d8 [Minor] Revert change in offline inference example (#10545) Woosuk Kwon 2024-11-21 13:28:16 -08:00
cf656f5a02 [misc] improve error message (#10553) youkaichao 2024-11-21 13:13:17 -08:00
edec3385b6 [CI][Installation] Avoid uploading CUDA 11.8 wheel (#10535) Yunmeng 2024-11-22 05:03:58 +08:00
f9310cbd0c [V1] Fix Compilation config & Enable CUDA graph by default (#10528) Woosuk Kwon 2024-11-21 12:53:39 -08:00
7560ae5caf [8/N] enable cli flag without a space (#10529) youkaichao 2024-11-21 12:30:42 -08:00
e7a8341c7c [Bugfix] Allow token ID-only inputs in Qwen2-Audio (#10536) Cyrus Leung 2024-11-22 02:09:43 +08:00
c51e397fe8 [Misc] Suppress duplicated logging regarding multimodal input pipeline (#10530) Roger Wang 2024-11-21 09:21:31 -08:00
2385b60d83 [Kernel] Register punica ops directly (#10522) Jee Jee Li 2024-11-22 01:18:11 +08:00
da7e702c6f [Bug]: When apply continue_final_message for OpenAI server, the "echo":false is ignored (#10180) Chauncey 2024-11-22 00:24:32 +08:00
4d676f0852 [Bugfix] Embedding model pooling_type equals ALL and multi input's bug (#10494) Xiaoyu Zhang 2024-11-21 22:40:02 +08:00
d5ec121f95 [Model] Expose dynamic_image_size as mm_processor_kwargs for InternVL2 models (#10518) Isotr0py 2024-11-21 22:20:08 +08:00
8a93a598d9 fix the issue that len(tokenizer(prompt)["input_ids"]) > prompt_len (#10524) Wang, Yi 2024-11-21 19:15:36 +08:00
1cfde82ffd [Model] Add Support for Multimodal Granite Models (#10291) Alex Brooks 2024-11-21 03:46:20 -07:00
f0e0238016 [Doc] fix a small typo in docstring of llama_tool_parser (#10513) Zhong Qishuai 2024-11-21 17:05:23 +08:00
aaddce5d26 [platforms] improve error message for unspecified platforms (#10520) youkaichao 2024-11-20 23:07:56 -08:00
3430857b64 [Misc] Increase default video fetch timeout (#10495) Cyrus Leung 2024-11-21 15:06:42 +08:00
8b0fe06c89 [torch.compile] Inductor code caching fix (#10273) Luka Govedič 2024-11-21 00:44:57 -05:00
9d827170a3 [Platforms] Add device_type in Platform (#10508) Mengqing Cao 2024-11-21 12:44:20 +08:00
6c1208d083 [Core] Add Sliding Window Support with Flashinfer (#10462) Pavani Majety 2024-11-20 19:56:47 -08:00
388ee3de66 [torch.compile] limit inductor threads and lazy import quant (#10482) youkaichao 2024-11-20 18:36:33 -08:00
2f77b6cfec [TPU] Implement prefix caching for TPUs (#10307) Woosuk Kwon 2024-11-20 13:54:15 -08:00
c68f7ede6a [Bugfix]: allow extra fields in requests to openai compatible server (#10463) Guillaume Calmettes 2024-11-20 22:42:21 +01:00
0cd3d9717e [7/N] torch.compile, reduce compilation time (#10460) youkaichao 2024-11-20 11:20:38 -08:00
5f1d6af2b6 [perf bench] H200 development (#9768) Simon Mo 2024-11-20 11:06:56 -08:00
