Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

0c6f5023c3 [V1] Scheduler Refactoring [1/N] - Add Scheduler Interface (#15250) Woosuk Kwon 2025-03-20 17:50:43 -07:00
06dd08256f Enforce that TP > 1 is not supported for Mamba2 if Quantization is Enabled. (#14617) Yu Chin Fabian Lim 2025-03-21 08:44:37 +08:00
2b22290ce0 [V1] Add flag to disable cascade attention (#15243) Woosuk Kwon 2025-03-20 15:24:16 -07:00
d8e82bc06d [Bugfix] fix V1 Engine crash while handling requests with duplicate request id (#15043) Jason 2025-03-21 01:01:02 +08:00
086b56824c [ci] feat: make the test_torchrun_example run with tp=2, external_dp=2 (#15172) Chi Zhang 2025-03-21 00:30:04 +08:00
5a0905ba2a Replace misc issues with link to forum (#15226) Harry Mellor 2025-03-20 15:18:20 +00:00
a8f12a63fd Fix env vars for running Ray distributed backend on GKE (#15166) Richard Liu 2025-03-20 07:59:33 -07:00
69ae2380c6 Add user forum to README (#15220) Harry Mellor 2025-03-20 14:39:51 +00:00
27261e40a6 [Bugfix] Multi-video inference on LLaVA-Onevision (#15082) Cyrus Leung 2025-03-20 22:10:45 +08:00
e3f813c33b [macOS] Ugrade pytorch to 2.6.0 (#15129) Quang-Linh LE 2025-03-20 09:22:40 +01:00
c607a2652b Fixing Imprecise Type Annotations (#15192) Wang Ran (汪然) 2025-03-20 16:19:55 +08:00
3d45e3d749 [release] Tag vllm-cpu with latest upon new version released (#15193) Kevin H. Luu 2025-03-20 01:19:10 -07:00
742369d35a [Frontend][Bugfix] support prefill decode disaggregation on deepseek (#14824) billishyahao 2025-03-20 15:00:33 +08:00
bfe2fe0af4 typo: Update config.py (#15189) Wang Ran (汪然) 2025-03-20 14:31:21 +08:00
a8652f4f0f Enable CUDA graph support for llama 3.2 vision (#14917) Matt Ritter 2025-03-19 23:29:16 -07:00
2f726b241e [Doc] Update README.md (#15187) Cyrus Leung 2025-03-20 13:25:58 +08:00
a597a57595 [Attention] Flash Attention 3 - fp8 (#14570) Mickaël Seznec 2025-03-20 06:14:20 +01:00
ae65f3e237 [Misc]fixed disable these http request logs (#14754) Chauncey 2025-03-20 12:53:40 +08:00
34868b106a [Doc] Update Mistral Small 3.1/Pixtral example (#15184) Roger Wang 2025-03-19 21:46:06 -07:00
1f16b7fe74 [Core][V0] Add guidance backend for structured output (#14589) Russell Bryant 2025-03-20 00:33:51 -04:00
b88be22165 [Benchmark] Allow oversample request in benchmark dataset (#15170) Jennifer Zhao 2025-03-19 21:32:58 -07:00
d8c6d7d6b5 [V1][TPU] Support V1 Sampler for ragged attention (#14227) Nicolò Lucchesi 2025-03-20 05:00:39 +01:00
40828ce5fe fix "Total generated tokens:" is 0 if using --backend tgi and --endpo… (#14673) Wang, Yi 2025-03-20 11:56:16 +08:00
ffa443afed [Bugfix] Fix embedding assignment for InternVL-based models (#15086) Cyrus Leung 2025-03-20 11:40:13 +08:00
70e500cad9 Fix broken tests (#14713) Jovan Sardinha 2025-03-19 19:06:49 -07:00
4cb1c05c9e [Doc] Clarify run vllm only on one node in distributed inference (#15148) Rui Qiao 2025-03-19 18:55:59 -07:00
c47aafa37c [BugFix] Lazily import XgrammarBackend to avoid early cuda init (#15171) Nick Hill 2025-03-19 18:30:43 -07:00
cfbca8a2f2 [V1] TPU - Tensor parallel MP support (#15059) Alexander Matveev 2025-03-19 20:55:18 -04:00
0fe5609874 [Docs] Annouce Ollama and Singapore Meetups (#15161) Simon Mo 2025-03-19 16:18:04 -07:00
22d33baca2 [FrontEnd][Perf] merge_async_iterators fast-path for single-prompt requests (#15150) Nick Hill 2025-03-19 14:04:41 -07:00
b0e96aaebb [V1][TPU] Change kv cache shape. (#15145) iefgnoix 2025-03-19 12:16:42 -07:00
8310e0b59b simple bugfix: Update stats.py (#15139) Wang Ran (汪然) 2025-03-20 02:26:27 +08:00
26dd972adb [FEAT]Support reset prefix cache by specified device (#15003) maobaolong 2025-03-20 01:54:41 +08:00
61c7a1b856 [V1] Minor V1 async engine test refactor (#15075) v0.8.1 Murali Andoorveedu 2025-03-19 10:37:17 -07:00
374ee287d8 [Frontend] Remove custom_cache_manager (#13791) Alessandro Sangiorgi 2025-03-19 11:13:50 -05:00
a4d83661d7 [Misc] Update the "the first vLLM China Meetup" slides link to point to the first page (#15134) Kero Liang 2025-03-19 23:07:39 +08:00
8363cd093d [Bugfix] Adjust mllama to regional compilation (#15112) Jan Kaniecki 2025-03-19 15:57:25 +01:00
6c5a3195db [Misc][Benchmark] Add support for different tokenizer_mode (#15040) Aaron Pham 2025-03-19 10:56:50 -04:00
073d1ed354 [Doc] Update tip info on using latest transformers when creating a custom Dockerfile (#15070) Marc-Alexandre Côté 2025-03-19 09:33:40 -04:00
3d446433ec [Bugfix] Fix size calculation of processing cache (#15114) Cyrus Leung 2025-03-19 20:53:19 +08:00
1fe0fd12d3 [Misc] Avoid unnecessary HF do_rescale warning when passing dummy data (#15107) Cyrus Leung 2025-03-19 18:42:31 +08:00
dafb4e504a [V1][Bugfix] Fix oracle for device checking (#15104) Roger Wang 2025-03-19 03:35:32 -07:00
68cf1601d3 [CI][Intel GPU] update XPU dockerfile and CI script (#15109) Kunshang Ji 2025-03-19 01:29:25 -07:00
61f412187d [Bugfix] Re-enable Gemma3 for V1 (#14980) Cyrus Leung 2025-03-19 14:58:22 +08:00
05ccd0aa35 [V1] Ensure using int64 for sampled token ids (#15065) Woosuk Kwon 2025-03-18 23:52:19 -07:00
f690372b68 [Core] Update dtype detection and defaults (#14858) Cyrus Leung 2025-03-19 13:49:33 +08:00
8b3e94a357 [Model] Remove duplicated message check in Mistral chat completion request (#15069) Brayden Zhong 2025-03-19 01:09:32 -04:00
437f9162d0 [Model] Pixtral: Remove layer instantiation duplication (#15053) Julien Denize 2025-03-19 03:34:03 +01:00
4f065f12f5 [Misc][V1] Skip device checking if not available (#15061) Cody Yu 2025-03-18 19:33:43 -07:00
228b768db6 [Doc] Minor v1_user_guide update (#15064) Jennifer Zhao 2025-03-18 16:10:45 -07:00
027827cc1d fix long dtype in topk sampling (#15049) Chujie Zheng 2025-03-19 06:57:31 +08:00
72a8639b68 [V1] TPU - CI/CD use smaller model (#15054) Alexander Matveev 2025-03-18 17:39:21 -04:00
99abb8b650 [V1][Spec Decode] Optimize Rejection Sampler with Triton Kernels (#14930) Woosuk Kwon 2025-03-18 14:31:54 -07:00
3a1e648158 [V1] Refactor Structured Output for multiple backends (#14694) Russell Bryant 2025-03-18 15:49:15 -04:00
966f933ee1 [Bugfix] Fix LoRA extra vocab size (#15047) v0.8.0 Jee Jee Li 2025-03-19 00:40:29 +08:00
1a504aff6c [Bugfix] Fix broken CPU quantization due to triton import (#15038) Isotr0py 2025-03-18 23:57:39 +08:00
01ca85bbd8 [MODEL] Add support for Zamba2 models (#13185) yury-tokpanov 2025-03-18 08:56:21 -07:00
d82b9487ea [Bugfix] Register serializers for V0 MQ Engine (#15009) Simon Mo 2025-03-18 06:14:47 -07:00
be13281d4b [Bugfix] Loosen type check to avoid errors in V1 (#15021) Cyrus Leung 2025-03-18 20:54:40 +08:00
54e084f7fb [Bugfix] torchrun compatibility (#14899) hoshi-hiyouga 2025-03-18 20:49:27 +08:00
9e8f089d08 [Kernels] LoRA - Retire SGMV and BGMV Kernels (#14685) Varun Sundar Rabindranath 2025-03-18 05:47:53 -04:00
46c759c165 [Bugfix] Fix LoRA extra vocab size (#15047) Jee Jee Li 2025-03-19 00:40:29 +08:00
179a619c21 [Bugfix] Fix broken CPU quantization due to triton import (#15038) Isotr0py 2025-03-18 23:57:39 +08:00
452e8fd968 [MODEL] Add support for Zamba2 models (#13185) yury-tokpanov 2025-03-18 08:56:21 -07:00
8b793f7ec6 MI325 configs, fused_moe_kernel bugfix (#14987) ekuznetsov139 2025-03-18 08:05:18 -07:00
af35d3a3cc [TPU][V1][Bugfix] Fix chunked prefill with padding (#15037) Nicolò Lucchesi 2025-03-18 15:34:45 +01:00
3b457143d2 [Bugfix] Register serializers for V0 MQ Engine (#15009) Simon Mo 2025-03-18 06:14:47 -07:00
ab656f2c2f [Bugfix] Loosen type check to avoid errors in V1 (#15021) Cyrus Leung 2025-03-18 20:54:40 +08:00
64fc2193dc [Misc][Docs] fix the comments of KV_T and CACHE_T in CALL_RESHAPE_AND_CACHE_XX macros (#14347) Serena 2025-03-18 20:50:19 +08:00
dd732028f5 [Bugfix][Frontend] Fix validation of logprobs in ChatCompletionRequest (#14352) Sebastian Schoennenbeck 2025-03-18 13:50:05 +01:00
414919138b [Bugfix] torchrun compatibility (#14899) hoshi-hiyouga 2025-03-18 20:49:27 +08:00
db7c8ca910 [Misc] Embedding model support LoRA (#14935) Jee Jee Li 2025-03-18 20:07:00 +08:00
f863ffc965 [Mistral-Small 3.1] Update docs and tests (#14977) Patrick von Platen 2025-03-18 11:29:42 +01:00
400d483e87 [Kernels] LoRA - Retire SGMV and BGMV Kernels (#14685) Varun Sundar Rabindranath 2025-03-18 05:47:53 -04:00
d1695758b2 [Doc][V1] Fix V1 APC doc (#14920) Shanshan Shen 2025-03-18 16:15:46 +08:00
53a0cf8b95 [Neuron] trim attention kernel tests to fit trn1.2x instance (#14988) Liangfu Chen 2025-03-18 00:05:52 -07:00
5eeabc2a44 [Bugfix] Fix bnb quantization for models with both HF-format and Mistral-format weights (#14950) Tristan Leclercq 2025-03-18 00:27:26 +01:00
18551e820c [V1] TPU - Fix CI/CD runner (#14974) Alexander Matveev 2025-03-17 17:07:07 -04:00
16e9064f84 [V1] Guard Against Main Thread Usage (#14972) Robert Shaw 2025-03-17 16:23:02 -04:00
e41e160263 [V1] Guard Against Main Thread Usage (#14972) Robert Shaw 2025-03-17 16:23:02 -04:00
5ac1a8e6e4 [Bugfix] Fix interface for Olmo2 on V1 (#14976) Roger Wang 2025-03-17 11:26:38 -07:00
b89fb2a4a1 [CI/Build] Use AutoModelForImageTextToText to load VLMs in tests (#14945) Cyrus Leung 2025-03-18 02:35:17 +08:00
5340b0e221 [Bugfix] Fix interface for Olmo2 on V1 (#14976) Roger Wang 2025-03-17 11:26:38 -07:00
37e3806132 [Bugfix] Make Gemma3 MM V0 only for now (#14971) v0.8.0rc2 Roger Wang 2025-03-17 10:04:21 -07:00
c0efdd655b [Fix][Structured Output] using vocab_size to construct matcher (#14868) Aaron Pham 2025-03-17 11:42:45 -04:00
aaaec52ad9 [Bugfix][Model] Mixtral: use unused head_dim config argument (#14961) Quentin 2025-03-17 15:44:18 +01:00
e1eb45d397 [Bugfix] Fix precommit - line too long in pixtral.py (#14960) Tyler Michael Smith 2025-03-17 10:18:50 -04:00
89fca671fb [V1] Default MLA to V1 (#14921) Simon Mo 2025-03-17 06:54:40 -07:00
d20b0c139c Add patch merger (#14957) Patrick von Platen 2025-03-17 14:47:50 +01:00
166a168b0f [Doc] Fix misleading log during multi-modal profiling (#14955) Cyrus Leung 2025-03-17 21:14:32 +08:00
2bb0e1a799 [Bugfix][ROCm] running new process using spawn method for rocm in tests. (#14810) vllmellm 2025-03-17 19:33:35 +08:00
6eaf1e5c52 [Misc] Add --seed option to offline multi-modal examples (#14934) Cyrus Leung 2025-03-17 18:00:17 +08:00
868a8c5b2c [Bugfix] Fix Ultravox on V1 (#14929) Cyrus Leung 2025-03-17 17:15:20 +08:00
b4ad56c1bd [V1][TPU] Apply the ragged paged attention kernel fix and remove the padding. (#14846) iefgnoix 2025-03-17 01:48:28 -07:00
69698f257e fix minor miscalled method (#14327) kushanam 2025-03-17 01:47:58 -07:00
cd0cd85102 [MISC] More AMD unused var clean up (#14926) Lu Fang 2025-03-17 01:40:41 -07:00
0a74bfce9c setup.py: drop assumption about local main branch (#14692) Russell Bryant 2025-03-17 04:37:42 -04:00
dd3b865854 [Doc] Add vLLM Beijing meetup slide (#14938) Chen Zhang 2025-03-17 16:29:36 +08:00
9b87a579aa [Misc][XPU] Use None as device capacity for XPU (#14932) Yan Ma 2025-03-17 16:22:14 +08:00
b539222d4e [V1] Remove input cache client (#14864) Cyrus Leung 2025-03-17 14:42:06 +08:00
