biondizzle/vllm

Commit Graph

fc601665eb [Misc] Update disaggregation benchmark scripts and test logs (#11456) Jiaxin Shan 2024-12-24 22:58:48 -08:00
9832e5572a [V1] Unify VLLM_ENABLE_V1_MULTIPROCESSING handling in RayExecutor (#11472) Rui Qiao 2024-12-24 19:49:46 -08:00
3f3e92e1f2 [Model] Automatic conversion of classification and reward models (#11469) Cyrus Leung 2024-12-25 02:22:22 +08:00
409475a827 [Bugfix] Fix issues in CPU build Dockerfile. Fixes #9182 (#11435) Yuan Tang 2024-12-24 11:53:28 -05:00
196c34b0ac [Misc] Move weights mapper (#11443) Jee Jee Li 2024-12-24 21:05:25 +08:00
5c7963249d [attn][tiny fix] fix attn backend in MultiHeadAttention (#11463) Mengqing Cao 2024-12-24 20:39:36 +08:00
461cde2080 [OpenVINO] Fixed installation conflicts (#11458) Ilya Lavrenov 2024-12-24 15:38:21 +04:00
7a5286cc04 [Bugfix][Hardware][CPU] Fix CPU input_positions creation for text-only inputs with mrope (#11434) Isotr0py 2024-12-24 17:59:51 +08:00
b1b1038fbd [Bugfix] Fix Qwen2-VL LoRA weight loading (#11430) Jee Jee Li 2024-12-24 17:56:10 +08:00
9edca6bf8f [Frontend] Online Pooling API (#11457) Cyrus Leung 2024-12-24 17:54:30 +08:00
4f074fbf53 [Misc]Suppress irrelevant exception stack trace information when CUDA… (#11438) dpxa 2024-12-24 16:43:39 +08:00
a491d6f535 [V1] TP Ray executor (#11107) Rui Qiao 2024-12-23 15:00:12 -08:00
32aa2059ad [Docs] Convert rST to MyST (Markdown) (#11145) Rafael Vasquez 2024-12-23 17:35:38 -05:00
94d545a1a1 [Doc] Fix typo in the help message of '--guided-decoding-backend' (#11440) yansh97 2024-12-24 04:20:44 +08:00
60fb4f3bcf [Bugfix] Add kv cache scales to gemma2.py (#11269) Michael Goin 2024-12-23 14:30:45 -05:00
63afbe9215 [CI] Expand OpenAI test_chat.py guided decoding tests (#11048) Michael Goin 2024-12-23 13:35:38 -05:00
8cef6e02dc [Misc] add w8a8 asym models (#11075) Dipika Sikka 2024-12-23 13:33:20 -05:00
b866cdbd05 [Misc] Add assertion and helpful message for marlin24 compressed models (#11388) Dipika Sikka 2024-12-23 13:23:38 -05:00
2e726680b3 [Bugfix] torch nightly version in ROCm installation guide (#11423) Yuan Tang 2024-12-23 12:20:22 -05:00
5bfb30a529 [Bugfix] Fix CFGGuide and use outlines for grammars that can't convert to GBNF (#11389) Michael Goin 2024-12-23 10:06:20 -05:00
e51719ae72 mypy type checking for vllm/worker (#11418) Lucas Tucker 2024-12-23 07:55:49 -06:00
f30581c518 [misc][perf] remove old code (#11425) youkaichao 2024-12-23 00:01:08 -08:00
048fc57a0f [CI] Unboock H100 Benchmark (#11419) Simon Mo 2024-12-22 14:17:43 -08:00
f1d1bf6288 [Bugfix] Fix fully sharded LoRAs with Mixtral (#11390) Jason T. Greene 2024-12-22 09:25:10 -06:00
72d9c316d3 [cd][release] fix race conditions (#11407) youkaichao 2024-12-22 00:39:11 -08:00
4a9139780a [cd][release] add pypi index for every commit and nightly build (#11404) youkaichao 2024-12-21 23:53:44 -08:00
29c748930e [CI] Fix flaky entrypoint tests (#11403) Roger Wang 2024-12-21 21:08:44 -08:00
c2d1b075ba [Bugfix] Fix issues for Pixtral-Large-Instruct-2411 (#11393) Roger Wang 2024-12-21 02:15:03 -08:00
584f0ae40d [V1] Make AsyncLLMEngine v1-v0 opaque (#11383) Ricky Xu 2024-12-20 23:14:08 -08:00
51ff216d85 [Bugfix] update should_ignore_layer (#11354) George 2024-12-21 01:36:23 -05:00
dd2b5633dd [V1][Bugfix] Skip hashing empty or None mm_data (#11386) Woosuk Kwon 2024-12-21 14:22:21 +09:00
47a0b615b4 Add ray[default] to wget to run distributed inference out of box (#11265) Jiaxin Shan 2024-12-20 13:54:55 -08:00
5d2248d81a [doc] explain nccl requirements for rlhf (#11381) youkaichao 2024-12-20 13:00:56 -08:00
d573aeadcc [Bugfix] Don't log OpenAI field aliases as ignored (#11378) Michael Goin 2024-12-20 14:03:50 -05:00
995f56236b [Core] Loading model from S3 using RunAI Model Streamer as optional loader (#10192) omer-dayan 2024-12-20 18:46:24 +02:00
7c7aa37c69 [CI/Build] fix pre-compiled wheel install for exact tag (#11373) Daniele 2024-12-20 17:14:40 +01:00
04139ade59 [V1] Fix profiling for models with merged input processor (#11370) Roger Wang 2024-12-20 04:04:21 -08:00
1ecc645b8f [doc] backward compatibility for 0.6.4 (#11359) youkaichao 2024-12-19 21:33:53 -08:00
c954f21ac0 [misc] add early error message for custom ops (#11355) youkaichao 2024-12-19 21:18:25 -08:00
86c2d8fd1c [Bugfix] Fix spec decoding when seed is none in a batch (#10863) Wallas Henrique 2024-12-20 02:15:31 -03:00
b880ffb87e [Misc] Add tqdm progress bar during graph capture (#11349) Michael Goin 2024-12-19 23:35:18 -05:00
7801f56ed7 [ci][gh200] dockerfile clean up (#11351) youkaichao 2024-12-19 18:13:06 -08:00
48edab8041 [Bugfix][Hardware][POWERPC] Fix auto dtype failure in case of POWER10 (#11331) Akash kaothalkar 2024-12-20 07:02:07 +05:30
a985f7af9f [CI] Adding CPU docker pipeline (#11261) Yuan 2024-12-20 03:46:55 +08:00
e461c262f0 [Misc] Remove unused vllm/block.py (#11336) yangzhibin 2024-12-20 01:54:24 +08:00
276738ce0f [Bugfix] Fix broken CPU compressed-tensors test (#11338) Isotr0py 2024-12-20 01:37:31 +08:00
cdf22afdda [Misc] Clean up and consolidate LRUCache (#11339) Cyrus Leung 2024-12-20 00:59:32 +08:00
e24113a8fe [Model] Refactor Qwen2-VL to use merged multimodal processor (#11258) Isotr0py 2024-12-20 00:28:00 +08:00
7379b3d4b2 [V1] Fix multimodal profiling for Molmo (#11325) Roger Wang 2024-12-19 08:27:22 -08:00
6c7f881541 [Model] Add JambaForSequenceClassification model (#10860) Yehoshua Cohen 2024-12-19 16:48:06 +02:00
a0f7d53beb [Bugfix] Cleanup Pixtral HF code (#11333) Cyrus Leung 2024-12-19 21:22:00 +08:00
5aef49806d [Feature] Add load generation config from model (#11164) Yanyi Liu 2024-12-19 18:50:38 +08:00
98356735ac [misc] benchmark_throughput : Add LoRA (#11267) Varun Sundar Rabindranath 2024-12-19 02:43:16 -05:00
f26c4aeecb [Misc] Optimize ray worker initialization time (#11275) Rui Qiao 2024-12-18 23:38:02 -08:00
8936316d58 [Kernel] Refactor Cutlass c3x (#10049) Varun Sundar Rabindranath 2024-12-19 02:00:18 -05:00
6142ef0ada [VLM] Merged multimodal processor for Qwen2-Audio (#11303) Cyrus Leung 2024-12-19 14:14:17 +08:00
c6b0a7d3ba [V1] Simplify prefix caching logic by removing num_evictable_computed_blocks (#11310) Chen Zhang 2024-12-18 20:17:12 -08:00
a30482f054 [CI] Expand test_guided_generate to test all backends (#11313) Michael Goin 2024-12-18 23:00:38 -05:00
17ca964273 [Model] IBM Granite 3.1 (#11307) Travis Johnson 2024-12-18 20:27:24 -07:00
5a9da2e6e9 [Bugfix][Build/CI] Fix sparse CUTLASS compilation on CUDA [12.0, 12.2) (#11311) Tyler Michael Smith 2024-12-18 21:43:30 -05:00
fdea8ec167 [V1] VLM - enable processor cache by default (#11305) Alexander Matveev 2024-12-18 18:54:46 -05:00
ca5f54a9b9 [Bugfix] fix minicpmv test (#11304) Joe Runde 2024-12-18 10:34:26 -08:00
f954fe0e65 [FIX] update openai version (#11287) Kunshang Ji 2024-12-19 02:17:05 +08:00
362cff1eb3 [CI][Misc] Remove Github Action Release Workflow (#11274) Simon Mo 2024-12-18 10:16:53 -08:00
996aa70f00 [Bugfix] Fix broken phi3-v mm_processor_kwargs tests (#11263) Isotr0py 2024-12-19 02:16:40 +08:00
60508ffda9 [Kernel]: Cutlass 2:4 Sparsity + FP8/Int8 Quant Support (#10995) Dipika Sikka 2024-12-18 09:57:16 -05:00
f04e407e6b [MISC][XPU]update ipex link for CI fix (#11278) Yan Ma 2024-12-18 14:34:23 +08:00
8b79f9e107 [Bugfix] Fix guided decoding with tokenizer mode mistral (#11046) Wallas Henrique 2024-12-18 03:34:08 -03:00
866fa4550d [Bugfix] Restore support for larger block sizes (#11259) Konrad Zawora 2024-12-18 01:39:07 +01:00
bf8717ebae [V1] Prefix caching for vision language models (#11187) Cody Yu 2024-12-17 16:37:59 -08:00
c77eb8a33c [Bugfix] Set temperature=0.7 in test_guided_choice_chat (#11264) Michael Goin 2024-12-17 19:34:06 -05:00
2d1b9baa8f [Bugfix] Fix request cancellation without polling (#11190) (tag: v0.6.5) Joe Runde 2024-12-17 13:26:32 -07:00
f9ecbb18bf [Misc] Allow passing logits_soft_cap for xformers backend (#11252) Isotr0py 2024-12-17 16:37:04 +08:00
02222a0256 [Misc] Kernel Benchmark for RMSNorm (#11241) Roger Wang 2024-12-16 22:57:02 -08:00
2bfdbf2a36 [V1][Core] Use weakref.finalize instead of atexit (#11242) Tyler Michael Smith 2024-12-17 01:11:33 -05:00
e88db68cf5 [Platform] platform agnostic for EngineArgs initialization (#11225) wangxiyuan 2024-12-17 14:11:06 +08:00
59c9b6ebeb [V1][VLM] Proper memory profiling for image language models (#11210) Roger Wang 2024-12-16 22:10:57 -08:00
66d4b16724 [Frontend] Add OpenAI API support for input_audio (#11027) kYLe 2024-12-17 00:09:58 -06:00
0064f697d3 [CI] Add test case with JSON schema using references + use xgrammar by default with OpenAI parse (#10935) Michael Goin 2024-12-16 22:39:58 -05:00
35bae114a8 fix gh200 tests on main (#11246) youkaichao 2024-12-16 17:22:38 -08:00
88a412ed3d [torch.compile] fast inductor (#11108) youkaichao 2024-12-16 16:15:22 -08:00
c301616ed2 [ci][tests] add gh200 tests (#11244) youkaichao 2024-12-16 15:53:18 -08:00
35ffa682b1 [Docs] hint to enable use of GPU performance counters in profiling tools for multi-node distributed serving (#11235) bk-TurbaAI 2024-12-16 23:20:39 +01:00
551603feff [core] overhaul memory profiling and fix backward compatibility (#10511) youkaichao 2024-12-16 13:32:25 -08:00
efbce85f4d [misc] Layerwise profile updates (#10242) Varun Sundar Rabindranath 2024-12-16 13:14:57 -05:00
2ca830dbaa [Doc] Reorder vision language examples in alphabet order (#11228) Isotr0py 2024-12-16 19:23:33 +08:00
d927dbcd88 [Model] Refactor Ultravox to use merged input processor (#11198) Isotr0py 2024-12-16 18:09:53 +08:00
bddbbcb132 [Model] Support Cohere2ForCausalLM (Cohere R7B) (#11203) Jani Monoses 2024-12-16 11:56:19 +02:00
b3b1526f03 WIP: [CI/Build] simplify Dockerfile build for ARM64 / GH200 (#11212) cennn 2024-12-16 17:20:49 +08:00
17138af7c4 [Bugfix] Fix the default value for temperature in ChatCompletionRequest (#11219) yansh97 2024-12-16 16:15:40 +08:00
69ba344de8 [Bugfix] Fix block size validation (#10938) chenqianfzh 2024-12-15 16:38:40 -08:00
da6f409246 Update deploying_with_k8s.rst (#10922) AlexHe99 2024-12-16 08:33:58 +08:00
25ebed2f8c [V1][Minor] Cache np arange to reduce input preparation overhead (#11214) Woosuk Kwon 2024-12-15 13:33:00 -08:00
d263bd9df7 [Core] Support disaggregated prefill with Mooncake Transfer Engine (#10884) shangmingc 2024-12-16 05:28:18 +08:00
38e599d6a8 [Doc] add documentation for disaggregated prefilling (#11197) Kuntai Du 2024-12-15 13:31:16 -06:00
96d673e0f8 [Bugfix] Fix error handling of unsupported sliding window (#11213) Cyrus Leung 2024-12-16 01:59:42 +08:00
b10609e6a1 [Misc] Clean up multi-modal processor (#11207) Cyrus Leung 2024-12-15 14:30:28 +08:00
a1c02058ba [torch.compile] allow tracking forward time (#11081) youkaichao 2024-12-14 19:45:00 -08:00
15859f2357 [[Misc]Upgrade bitsandbytes to the latest version 0.45.0 (#11201) Jee Jee Li 2024-12-15 11:03:06 +08:00
886936837c [Performance][Core] Optimize the performance of evictor v1 and v2 by applying a priority queue and lazy deletion (#7209) Sungjae Lee 2024-12-15 04:38:10 +09:00
