Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

a11f326528 [V1] Initial support of multimodal models for V1 re-arch (#10699) Roger Wang 2024-12-08 04:50:51 -08:00
fd57d2b534 [torch.compile] allow candidate compile sizes (#10984) youkaichao 2024-12-08 03:05:21 -08:00
7be15d9356 [core][misc] remove use_dummy driver for _run_workers (#10920) youkaichao 2024-12-07 12:06:08 -08:00
1b62745b1d [core][executor] simplify instance id (#10976) youkaichao 2024-12-07 09:33:45 -08:00
78029b34ed [BugFix][Kernel]: fix illegal memory access in causal_conv1d when conv_states is None (#10928) zhou fan 2024-12-08 01:21:18 +08:00
c889d5888b [Doc] Explicitly state that PP isn't compatible with speculative decoding yet (#10975) Cyrus Leung 2024-12-08 01:20:49 +08:00
39e227c7ae [Model] Update multi-modal processor to support Mantis(LLaVA) model (#10711) Cyrus Leung 2024-12-08 01:10:05 +08:00
1c768fe537 [Doc] Explicitly state that InternVL 2.5 is supported (#10978) Cyrus Leung 2024-12-08 00:58:02 +08:00
bf0e382e16 [Model] Composite weight loading for multimodal Qwen2 (#10944) Cyrus Leung 2024-12-07 22:22:52 +08:00
b26b4cd03c [Misc][LoRA] Refactor and clean MergedQKVParallelLinearWithLora implementation (#10958) Isotr0py 2024-12-07 18:33:49 +08:00
f13cf9ad50 [Build] Fix for the Wswitch-bool clang warning (#10060) Gregory Shtrasberg 2024-12-07 04:03:44 -05:00
955fa9533a [3/N] Support and implement merged input processor for LLaVA model (#10676) Cyrus Leung 2024-12-07 16:50:58 +08:00
acf092d348 [Bugfix] Fix test-pipeline.yaml (#10973) Jee Jee Li 2024-12-07 12:08:54 +08:00
69d357ba12 [Core] Cleanup startup logging a bit (#10961) Russell Bryant 2024-12-06 21:30:23 -05:00
dcdc3fafe5 [ci] fix broken tests (#10956) youkaichao 2024-12-06 11:25:47 -08:00
c05cfb67da [misc] fix typo (#10960) youkaichao 2024-12-06 11:25:20 -08:00
7406274041 [Doc] add KubeAI to serving integrations (#10837) Sam Stoelinga 2024-12-06 09:03:56 -08:00
8b59631855 [Core] Support Lark grammars for XGrammar (#10870) Michael Goin 2024-12-06 10:34:29 -05:00
a1887f2c96 [torch.compile] fix deprecated code (#10948) youkaichao 2024-12-06 03:01:23 -08:00
222f5b082a [CI/Build] Fix broken multimodal test (#10950) Cyrus Leung 2024-12-06 18:41:23 +08:00
b031a455a9 [torch.compile] add logging for compilation time (#10941) youkaichao 2024-12-06 02:07:15 -08:00
db87eb6c67 [torch.compile] use size tuning for specific sizes (#10933) youkaichao 2024-12-05 20:30:41 -08:00
9743d64e4e [ci][build] add tests for python only compilation (#10915) youkaichao 2024-12-05 08:54:47 -08:00
a43065272f [Misc][Gaudi] Avoid torch.compile and enable lazy collectives (#10897) Konrad Zawora 2024-12-05 17:47:46 +01:00
998eeafe58 [CI/Build] Bump test transformers version (#10106) Isotr0py 2024-12-06 00:05:52 +08:00
571da8fc43 [Misc][LoRA] Clean up the function interface of Punica (#10917) Jee Jee Li 2024-12-05 21:22:28 +08:00
39c89e71a8 [Misc] Update llama 3.2 template to support system prompt with images (#10901) Travis Johnson 2024-12-04 22:54:06 -07:00
1f958a7d52 [Bugfix] Fix BNB loader target_modules (#10720) Jee Jee Li 2024-12-05 13:20:26 +08:00
aa39a8e175 [Doc] Create a new "Usage" section (#10827) Cyrus Leung 2024-12-05 11:19:35 +08:00
8d370e91cb [Bugfix] Fallback to outlines for complex json schemas (#10899) Michael Goin 2024-12-04 22:14:06 -05:00
7883c2bbe7 [benchmark] Make H100 benchmark optional (#10908) Kevin H. Luu 2024-12-04 17:02:17 -08:00
2a56e1264f [V1] Fix when max_model_len is not divisible by block_size (#10903) Woosuk Kwon 2024-12-04 16:54:05 -08:00
e4c34c23de [CI/Build] improve python-only dev setup (#9621) Daniele 2024-12-04 22:48:13 +01:00
82eb5ea8f3 Benchmark serving structured output (#10880) Chendi.Xue 2024-12-04 15:28:21 -06:00
10398b4706 [Model] Consolidate ViTs attention implementation without mask (#10893) Isotr0py 2024-12-05 02:11:08 +08:00
01d079fd8e [LoRA] Change lora_tokenizers capacity (#10796) Xin Yang 2024-12-04 09:40:16 -08:00
c92acb9693 [ci/build] Update vLLM postmerge ECR repo (#10887) Kevin H. Luu 2024-12-04 01:01:20 -08:00
8db957ee3a [bugfix] fixed parameter “n” when set parameter “bestof” > 1 (#10854) jianzheng 2024-12-04 16:48:22 +08:00
c9ca4fce3f [ci/build] Job to build and push release image (#10877) Kevin H. Luu 2024-12-03 23:02:40 -08:00
fa2dea61df [ci/build] Change queue name for Release jobs (#10875) Kevin H. Luu 2024-12-03 23:02:16 -08:00
b5b647b084 Drop ROCm load format check (#10767) wangxiyuan 2024-12-04 12:32:21 +08:00
d2bd88b122 [CI/Build] Replace mean with torch.all in test_pynccl.py (#10876) Tyler Michael Smith 2024-12-03 22:23:21 -05:00
381ac93bb5 [Benchmark] Benchmark structured output with datasets (#10557) Chendi.Xue 2024-12-03 18:21:06 -06:00
a061fe601e [Build][Bugfix] Using the correct type hint (#10866) Gregory Shtrasberg 2024-12-03 15:47:55 -05:00
7c32b6861e [Frontend] correctly record prefill and decode time metrics (#10853) tomeras91 2024-12-03 21:13:31 +02:00
7090c27bb2 [Bugfix] Only require XGrammar on x86 (#10865) Michael Goin 2024-12-03 13:32:21 -05:00
2f2cdc745a [MISC][XPU] quick fix for XPU CI (#10859) Yan Ma 2024-12-04 01:16:31 +08:00
3bc94cab69 [V1] VLM - Run the mm_mapper preprocessor in the frontend process (#10640) Alexander Matveev 2024-12-03 05:33:10 -05:00
f6084f6324 [Speculative Decoding] Move indices to device before filtering output (#10850) Yang Zheng 2024-12-03 17:01:39 +08:00
9323a3153b [Core][Performance] Add XGrammar support for guided decoding and set it as default (#10785) Aaron Pham 2024-12-03 02:17:00 -05:00
3257d449fa [Misc] Remove deprecated names (#10817) Cyrus Leung 2024-12-03 14:52:57 +08:00
ef51831ee8 [Doc] Add github links for source code references (#10672) Russell Bryant 2024-12-03 01:46:07 -05:00
dc5ce861bf [torch.compile] remove compilation_context and simplify code (#10838) youkaichao 2024-12-02 22:19:02 -08:00
21fe7b481a [core][distributed] add pynccl broadcast (#10843) youkaichao 2024-12-02 20:53:23 -08:00
a4cf256159 [Bugfix] Fix QKVParallelLinearWithShardedLora bias bug (#10844) Jee Jee Li 2024-12-03 12:10:29 +08:00
d746268e92 [Model] support bitsandbytes quantization with minicpm model (#10842) zixuanzhang226 2024-12-02 19:06:41 -08:00
4433195ab7 [Bugfix] Prevent benchmark_throughput.py from using duplicated random prompts (#10753) Michael Goin 2024-12-02 21:26:15 -05:00
4c05edb33a [Model] Add TP and BNB quantization support to LlavaMultiModalProjector (#10834) Isotr0py 2024-12-03 07:06:09 +08:00
9b14d978aa Fix openvino on GPU (#10793) Jani Monoses 2024-12-02 20:52:19 +02:00
519cc6ca12 [Misc][XPU] Avoid torch compile for XPU platform (#10747) Yan Ma 2024-12-03 01:53:55 +08:00
b45f0d7946 [Misc][LoRA] Move the implementation of lora bias to punica.py (#10829) Jee Jee Li 2024-12-03 01:53:36 +08:00
a4c4daf364 [misc] use out argument for flash attention (#10822) youkaichao 2024-12-02 02:50:10 -08:00
e95f275f57 [CI/Build] Update mistral_common version for tests and docs (#10825) Cyrus Leung 2024-12-02 18:26:10 +08:00
ef31eabc68 [Model]: add some tests for aria model (#10770) zhou fan 2024-12-02 13:36:36 +08:00
995a148575 [doc]Update config docstring (#10732) wangxiyuan 2024-12-02 12:14:45 +08:00
63a164172d [misc] remove xverse modeling file (#10814) youkaichao 2024-12-01 19:27:13 -08:00
e25810ae29 Fill TorchSDPAAttentionMetadata seq_lens_field for prefill (#10799) Maximilien de Bayser 2024-12-01 23:05:32 -03:00
073a4bd1c0 [Kernel] Use out arg in flash_attn_varlen_func (#10811) Woosuk Kwon 2024-12-01 17:55:39 -08:00
b7954776fd [core] Avoid metrics log noise when idle - include speculative decodi… (#10809) cduk 2024-12-02 02:49:48 +01:00
b18c9bbaba [Model] Add BNB support to Llava and Pixtral-HF (#10795) Isotr0py 2024-12-02 09:31:09 +08:00
0590ec3fd9 [Core] Implement disagg prefill by StatelessProcessGroup (#10502) Kuntai Du 2024-12-01 19:01:00 -06:00
c11f172187 [Misc] Adding MMMU-Pro vision dataset to serving benchmark (#10804) Roger Wang 2024-12-01 00:47:05 -08:00
169a0ff911 [doc] add warning about comparing hf and vllm outputs (#10805) youkaichao 2024-12-01 00:41:38 -08:00
d2f058e76c [Misc] Rename embedding classes to pooling (#10801) Cyrus Leung 2024-12-01 14:36:51 +08:00
f877a7d12a [Misc] Improve type annotations for support_torch_compile (#10763) Cyrus Leung 2024-12-01 09:48:35 +08:00
133707123e [Model] Replace embedding models with pooling adapter (#10769) Cyrus Leung 2024-12-01 08:02:54 +08:00
7e4bbda573 [doc] format fix (#10789) wangxiyuan 2024-11-30 19:38:40 +08:00
e7cfc4ef4c [Interleaved ATTN] Support for Mistral-8B (#10591) Patrick von Platen 2024-11-30 08:45:50 +01:00
16ee07f22a [Model] Refactor Molmo weights loading to use AutoWeightsLoader (#10771) Isotr0py 2024-11-30 12:19:14 +08:00
40bc242579 [Bugfix] Fix OpenVino/Neuron driver_worker init (#10779) Nicolò Lucchesi 2024-11-30 05:07:13 +01:00
661175bc82 [platform] Add verify_quantization in platform. (#10757) wangxiyuan 2024-11-29 23:22:21 +08:00
3132aac043 [Bugfix] Fix Idefics3 bug (#10778) Jee Jee Li 2024-11-29 21:56:46 +08:00
c82b432d4a [Misc] typo find in sampling_metadata.py (#10740) wang.yuqi 2024-11-29 13:17:57 +08:00
fa6ecb9aa7 [Model] Clean up MiniCPMV (#10751) Cyrus Leung 2024-11-29 12:47:06 +08:00
c83919c7a6 [Model] Add Internlm2 LoRA support (#5064) Isotr0py 2024-11-29 01:29:04 +08:00
98f47f2a40 [V1] Optimize the CPU overheads in FlashAttention custom op (#10733) Woosuk Kwon 2024-11-28 09:01:02 -08:00
8c1e77fb58 [Kernel] Update vllm-flash-attn version to reduce CPU overheads (#10742) Woosuk Kwon 2024-11-28 08:31:28 -08:00
5fc5ce0fe4 [Model] Added GLM-4 series hf format model support vllm==0.6.4 (#10561) sixgod 2024-11-28 22:53:31 +08:00
3ed5e73146 [TPU] Update requirements-tpu (#10726) Richard Liu 2024-11-28 02:30:48 -08:00
9a8bff0285 [Kernel] Update vllm-flash-attn version (#10736) Woosuk Kwon 2024-11-28 02:25:59 -08:00
a79b122400 [V1] Do not allocate beyond the max_model_len (#10730) Woosuk Kwon 2024-11-28 00:13:15 -08:00
d9b4b3f069 [Bug][CLI] Allow users to disable prefix caching explicitly (#10724) Ricky Xu 2024-11-27 23:59:28 -08:00
278be671a3 [Doc] Update model in arch_overview.rst to match comment (#10701) 罗泽轩 2024-11-28 15:58:39 +08:00
70dc14fbd0 [Model] support bitsandbytes quantization with minicpm3 model (#10682) zixuanzhang226 2024-11-27 23:58:02 -08:00
cb4e1c3f3a [misc] upgrade filelock version (#10731) youkaichao 2024-11-27 19:54:58 -08:00
395b1c7454 [Frontend] don't block event loop in tokenization (preprocess) in OpenAI compatible server (#10635) tomeras91 2024-11-27 23:21:10 +02:00
9b4b150395 [Bugfix] Ignore lm_head when loading embedding models (#10719) Cyrus Leung 2024-11-28 03:05:29 +08:00
197b4484a3 [Bugfix][Mamba] Fix Multistep on Mamba-like models (#10705) Mor Zusman 2024-11-27 21:02:27 +02:00
b98c62ba49 [Bugfix] Fix GGUF inference with FP16 unquantized checkpoint (#10675) Isotr0py 2024-11-28 02:43:17 +08:00
c411def234 [torch.compile] fix shape specialization (#10722) youkaichao 2024-11-27 10:16:10 -08:00
