Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm


Commit Graph

2b5bf20988 [torch.compile] Adding torch compile annotations to some models (#9876) Yongzao 2024-11-01 15:25:47 +08:00
93a76dd21d [Model] Support bitsandbytes for MiniCPMV (#9891) Michael Goin 2024-11-01 01:31:56 -04:00
566cd27797 [torch.compile] rework test plans (#9866) youkaichao 2024-10-31 22:20:17 -07:00
37a4947dcd [Bugfix] Fix layer skip logic with bitsandbytes (#9887) Michael Goin 2024-11-01 01:12:44 -04:00
96e0c9cbbd [torch.compile] directly register custom op (#9896) youkaichao 2024-10-31 21:56:09 -07:00
031a7995f3 [Bugfix][Frontend] Reject guided decoding in multistep mode (#9892) Joe Runde 2024-10-31 19:09:46 -06:00
b63c64d95b [ci/build] Configure dependabot to update pip dependencies (#9811) Kevin H. Luu 2024-10-31 12:55:38 -10:00
9fb12f7848 [BugFix][Kernel] Fix Illegal memory access in causal_conv1d in H100 (#9838) Mor Zusman 2024-10-31 22:06:25 +02:00
55650c83a0 [Bugfix] Fix illegal memory access error with chunked prefill, prefix caching, block manager v2 and xformers enabled together (#9532) sasha0552 2024-10-31 18:46:36 +00:00
77f7ef2908 [CI/Build] Adding a forced docker system prune to clean up space (#9849) Alexei-V-Ivanov-AMD 2024-10-31 12:02:58 -05:00
16b8f7a86f [CI/Build] Add Model Tests for Qwen2-VL (#9846) Alex Brooks 2024-10-31 10:10:52 -06:00
5608e611c2 [Doc] Update Qwen documentation (#9869) Jee Jee Li 2024-10-31 16:54:18 +08:00
3ea2dc2ec4 [Misc] Remove deprecated arg for cuda graph capture (#9864) Roger Wang 2024-10-31 00:22:07 -07:00
d087bf863e [Model] Support quantization of Qwen2VisionTransformer (#9817) Michael Goin 2024-10-31 01:41:20 -04:00
890ca36072 Revert "[Bugfix] Use host argument to bind to interface (#9798)" (#9852) Kevin H. Luu 2024-10-30 15:44:51 -10:00
abbfb6134d [Misc][OpenAI] deprecate max_tokens in favor of new max_completion_tokens field for chat completion endpoint (#9837) Guillaume Calmettes 2024-10-31 02:15:56 +01:00
64384bbcdf [torch.compile] upgrade tests (#9858) youkaichao 2024-10-30 16:34:22 -07:00
00d91c8a2c [CI/Build] Simplify exception trace in api server tests (#9787) Yongzao 2024-10-31 05:52:05 +08:00
c2cd1a2142 [doc] update pp support (#9853) youkaichao 2024-10-30 13:36:51 -07:00
c787f2d81d [Neuron] Update Dockerfile.neuron to fix build failure (#9822) Harsha vardhan manoj Bikki 2024-10-30 12:22:02 -07:00
33d257735f [Doc] link bug for multistep guided decoding (#9843) Joe Runde 2024-10-30 11:28:29 -06:00
3b3f1e7436 [Bugfix][core] replace heartbeat with pid check (#9818) Joe Runde 2024-10-30 10:34:07 -06:00
9ff4511e43 [Misc] Add chunked-prefill support on FlashInfer. (#9781) Elfie Guo 2024-10-30 09:33:53 -07:00
81f09cfd80 [Model] Support math-shepherd-mistral-7b-prm model (#9697) Went-Liang 2024-10-31 00:33:42 +08:00
cc98f1e079 [CI/Build] VLM Test Consolidation (#9372) Alex Brooks 2024-10-30 10:32:17 -06:00
211fe91aa8 [TPU] Correctly profile peak memory usage & Upgrade PyTorch XLA (#9438) Woosuk Kwon 2024-10-30 02:41:38 -07:00
6aa6020f9b [Misc] Specify minimum pynvml version (#9827) Jee Jee Li 2024-10-30 14:05:43 +08:00
ff5ed6e1bc [torch.compile] rework compile control with piecewise cudagraph (#9715) youkaichao 2024-10-29 23:03:49 -07:00
7b0365efef [Doc] Add the DCO to CONTRIBUTING.md (#9803) Russell Bryant 2024-10-30 01:22:23 -04:00
04a3ae0aca [Bugfix] Fix multi nodes TP+PP for XPU (#8884) Yan Ma 2024-10-30 12:34:45 +08:00
62fac4b9aa [ci/build] Pin CI dependencies version with pip-compile (#9810) Kevin H. Luu 2024-10-29 17:34:55 -10:00
226688bd61 [Bugfix][VLM] Make apply_fp8_linear work with >2D input (#9812) Michael Goin 2024-10-29 22:49:44 -04:00
64cb1cdc3f Update README.md (#9819) Lily Liu 2024-10-29 17:28:43 -07:00
1ab6f6b4ad [core][distributed] fix custom allreduce in pytorch 2.5 (#9815) youkaichao 2024-10-29 17:06:24 -07:00
bc73e9821c [Bugfix] Fix prefix strings for quantized VLMs (#9772) Michael Goin 2024-10-29 19:02:59 -04:00
8d7724104a [Docs] Add notes about Snowflake Meetup (#9814) Simon Mo 2024-10-29 15:19:02 -07:00
882a1ad0de [Model] tool calling support for ibm-granite/granite-20b-functioncalling (#8339) Will Eaton 2024-10-29 18:07:37 -04:00
67bdf8e523 [Bugfix][Frontend] Guard against bad token ids (#9634) Joe Runde 2024-10-29 16:13:20 -05:00
0ad216f575 [MISC] Set label value to timestamp over 0, to keep track of recent history (#9777) Kunjan 2024-10-29 12:52:19 -07:00
7585ec996f [CI/Build] mergify: fix rules for ci/build label (#9804) Russell Bryant 2024-10-29 15:24:42 -04:00
ab6f981671 [CI][Bugfix] Skip chameleon for transformers 4.46.1 (#9808) Michael Goin 2024-10-29 14:12:43 -04:00
ac3d748dba [Model] Add LlamaEmbeddingModel as an embedding Implementation of LlamaModel (#9806) Junichi Sato 2024-10-30 02:40:35 +09:00
0ce7798f44 [Misc]: Typo fix: Renaming classes (casualLM -> causalLM) (#9801) yannicks1 2024-10-29 18:39:20 +01:00
0f43387157 [Bugfix] Use host argument to bind to interface (#9798) Sven Seeberg 2024-10-29 18:37:59 +01:00
08600ddc68 Fix the log to correct guide user to install modelscope (#9793) tastelikefeet 2024-10-30 01:36:59 +08:00
74fc2d77ae [Misc] Add metrics for request queue time, forward time, and execute time (#9659) 科英 2024-10-30 01:32:56 +08:00
622b7ab955 [Hardware] using current_platform.seed_everything (#9785) wangshuai09 2024-10-29 22:47:44 +08:00
09500f7dde [Model] Add BNB quantization support for Mllama (#9720) Isotr0py 2024-10-29 20:20:02 +08:00
ef7865b4f9 [Frontend] re-enable multi-modality input in the new beam search implementation (#9427) Zhong Qishuai 2024-10-29 19:49:47 +08:00
eae3d48181 [Bugfix] Use temporary directory in registry (#9721) Cyrus Leung 2024-10-29 13:08:20 +08:00
e74f2d448c [Doc] Specify async engine args in docs (#9726) Cyrus Leung 2024-10-29 13:07:57 +08:00
7a4df5f200 [Model][LoRA]LoRA support added for Qwen (#9622) Jee Jee Li 2024-10-29 12:14:07 +08:00
c5d7fb9ddc [Doc] fix third-party model example (#9771) Russell Bryant 2024-10-28 22:39:21 -04:00
76ed5340f0 [torch.compile] add deepseek v2 compile (#9775) youkaichao 2024-10-28 14:35:17 -07:00
97b61bfae6 [misc] avoid circular import (#9765) youkaichao 2024-10-28 13:51:23 -07:00
aa0addb397 Adding "torch compile" annotations to moe models (#9758) Yongzao 2024-10-29 04:49:56 +08:00
5f8d8075f9 [Model][VLM] Add multi-video support for LLaVA-Onevision (#8905) litianjian 2024-10-29 02:04:10 +08:00
8b0e4f2ad7 [CI/Build] Adopt Mergify for auto-labeling PRs (#9259) Russell Bryant 2024-10-28 12:38:09 -04:00
2adb4409e0 [Bugfix] Fix ray instance detect issue (#9439) Yan Ma 2024-10-28 15:13:03 +08:00
feb92fbe4a Fix beam search eos (#9627) Robert Shaw 2024-10-28 02:59:37 -04:00
32176fee73 [torch.compile] support moe models (#9632) youkaichao 2024-10-27 21:58:04 -07:00
4e2d95e372 [Hardware][ROCM] using current_platform.is_rocm (#9642) wangshuai09 2024-10-28 12:07:00 +08:00
34a9941620 [Bugfix] Fix load config when using bools (#9533) madt2709 2024-10-27 10:46:41 -07:00
e130c40e4e Fix cache management in "Close inactive issues and PRs" actions workflow (#9734) Harry Mellor 2024-10-27 17:30:03 +00:00
3cb07a36a2 [Misc] Upgrade to pytorch 2.5 (#9588) bnellnm 2024-10-27 05:44:24 -04:00
8549c82660 [core] cudagraph output with tensor weak reference (#9724) youkaichao 2024-10-27 00:19:28 -07:00
67a6882da4 [Misc] SpecDecodeWorker supports profiling (#9719) 科英 2024-10-27 12:18:03 +08:00
6650e6a930 [Model] Add classification Task with Qwen2ForSequenceClassification (#9704) kakao-kevin-us 2024-10-27 02:53:35 +09:00
07e981fdf4 [Frontend] Bad words sampling parameter (#9717) Vasiliy Alekseev 2024-10-26 19:29:38 +03:00
55137e8ee3 Fix: MI100 Support By Bypassing Custom Paged Attention (#9560) ErkinSagiroglu 2024-10-26 13:12:57 +01:00
5cbdccd151 [Hardware][openvino] is_openvino --> current_platform.is_openvino (#9716) Mengqing Cao 2024-10-26 18:59:06 +08:00
067e77f9a8 [Bugfix] Steaming continuous_usage_stats default to False (#9709) Sam Stoelinga 2024-10-25 22:05:47 -07:00
6567e13724 [Bugfix] Fix crash with llama 3.2 vision models and guided decoding (#9631) Travis Johnson 2024-10-25 16:42:56 -06:00
228cfbd03f [Doc] Improve quickstart documentation (#9256) Rafael Vasquez 2024-10-25 17:32:10 -04:00
ca0d92227e [Bugfix] Fix compressed_tensors_moe bad config.strategy (#9677) Michael Goin 2024-10-25 15:40:33 -04:00
9645b9f646 [V1] Support sliding window attention (#9679) Woosuk Kwon 2024-10-24 22:20:37 -07:00
a6f3721861 [Model] add a lora module for granite 3.0 MoE models (#9673) Will Johnson 2024-10-25 01:00:17 -04:00
9f7b4ba865 [ci/Build] Skip Chameleon for transformers 4.46.0 on broadcast test #9675 (#9676) Kevin H. Luu 2024-10-24 17:59:00 -10:00
c91ed47c43 [Bugfix] Remove xformers requirement for Pixtral (#9597) Michael Goin 2024-10-24 18:38:05 -04:00
59449095ab [Performance][Kernel] Fused_moe Performance Improvement (#9384) Charlie Fu 2024-10-24 17:37:52 -05:00
e26d37a185 [Log][Bugfix] Fix default value check for image_url.detail (#9663) Michael Goin 2024-10-24 13:44:38 -04:00
722d46edb9 [Model] Compute Llava Next Max Tokens / Dummy Data From Gridpoints (#9650) Alex Brooks 2024-10-24 11:42:24 -06:00
c866e0079d [CI/Build] Fix VLM test failures when using transformers v4.46 (#9666) Cyrus Leung 2024-10-25 01:40:40 +08:00
d27cfbf791 [torch.compile] Adding torch compile annotations to some models (#9641) Yongzao 2024-10-25 00:31:42 +08:00
de662d32b5 Increase operation per run limit for "Close inactive issues and PRs" workflow (#9661) Harry Mellor 2024-10-24 17:17:45 +01:00
f58454968f [Bugfix]Disable the post_norm layer of the vision encoder for LLaVA models (#9653) litianjian 2024-10-24 22:52:07 +08:00
b979143d5b [Doc] Move additional tips/notes to the top (#9647) Cyrus Leung 2024-10-24 17:43:59 +08:00
ad6f78053e [torch.compile] expanding support and fix allgather compilation (#9637) Yongzao 2024-10-24 16:32:15 +08:00
295a061fb3 [Kernel] add kernel for FATReLU (#9610) Jee Jee Li 2024-10-24 16:18:27 +08:00
8a02cd045a [torch.compile] Adding torch compile annotations to some models (#9639) Yongzao 2024-10-24 15:54:57 +08:00
4fdc581f9e [core] simplify seq group code (#9569) youkaichao 2024-10-24 00:16:44 -07:00
3770071eb4 [V1][Bugfix] Clean up requests when aborted (#9629) Woosuk Kwon 2024-10-23 23:33:22 -07:00
836e8ef6ee [Bugfix] Fix PP for ChatGLM and Molmo (#9422) Cyrus Leung 2024-10-24 14:12:05 +08:00
056a68c7db [XPU] avoid triton import for xpu (#9440) Yan Ma 2024-10-24 13:14:00 +08:00
33bab41060 [Bugfix]: Make chat content text allow type content (#9358) Vinay R Damodaran 2024-10-24 01:05:49 -04:00
b7df53cd42 [Bugfix] Use "vision_model" prefix for MllamaVisionModel (#9628) Michael Goin 2024-10-23 22:07:44 -04:00
bb01f2915e [Bugfix][Model] Fix Mllama SDPA illegal memory access for batched multi-image (#9626) Michael Goin 2024-10-23 22:03:44 -04:00
b548d7a5f4 [CI/Build] Add bot to close stale issues and PRs (#9436) Russell Bryant 2024-10-23 18:45:26 -04:00
fc6c274626 [Model] Add Qwen2-Audio model support (#9248) Yunfei Chu 2024-10-24 01:54:22 +08:00
150b779081 [Frontend] Enable Online Multi-image Support for MLlama (#9393) Alex Brooks 2024-10-23 11:28:57 -06:00