Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

371f7e4ca2 [Doc] Fix broken links and unlinked docs, add shortcuts to home sidebar (#18627) Cyrus Leung 2025-05-24 01:22:40 +08:00
15b45ffb9a [Doc] Avoid documenting dynamic / internal modules (#18626) Cyrus Leung 2025-05-24 00:58:02 +08:00
273cb3b4d9 [Doc] Fix top-level API links/docs (#18621) Cyrus Leung 2025-05-24 00:46:56 +08:00
8ddd1cf26a [Doc] fix list formatting (#18624) David Xia 2025-05-23 12:41:17 -04:00
6550114c9c [v1] Redo "Support multiple KV cache groups in GPU model runner (#17945)" (#18593) Chen Zhang 2025-05-24 00:39:47 +08:00
9520a989df [Docs] Change mkdocs to not use directory urls (#18622) Michael Goin 2025-05-23 12:33:21 -04:00
3d28ad343f Fix figures in design doc (#18612) Harry Mellor 2025-05-23 17:09:54 +01:00
6a7988c55b Refactor pplx init logic to make it modular (prepare for deepep) (#18200) youkaichao 2025-05-23 23:43:43 +08:00
022d8abe29 [Doc] Use a different color for the announcement (#18616) Cyrus Leung 2025-05-23 23:25:03 +08:00
5221815a00 [Doc] Fix markdown list indentation for MkDocs rendering (#18620) Hyogeun Oh (오효근) 2025-05-24 00:23:21 +09:00
1068556b2c [Bugfix][Build/CI] Fixup CUDA compiler version check for CUDA_SUPPORTED_ARCHS (#18579) Simon Mo 2025-05-23 07:43:58 -07:00
2cd1fa4556 [Misc] add Haystack integration (#18601) Reid 2025-05-23 21:21:19 +08:00
d4c2919760 Include private attributes in API documentation (#18614) Harry Mellor 2025-05-23 15:18:31 +02:00
6220f3c6b0 [Bugfix] Fix transformers model impl ignored for mixtral quant (#18602) Tristan Leclercq 2025-05-23 14:54:13 +02:00
52fb23f47e Fix examples with code blocks in docs (#18609) Harry Mellor 2025-05-23 14:53:44 +02:00
6dd51c7ef1 [CI/Build] Fix V1 flag being set in entrypoints tests (#18598) Cyrus Leung 2025-05-23 20:51:53 +08:00
2edb533af2 Replace {func} with mkdocs style links (#18610) Harry Mellor 2025-05-23 14:51:38 +02:00
38a95cb4a8 [Doc] Fix indent of contributing to vllm (#18611) Hyogeun Oh (오효근) 2025-05-23 21:50:07 +09:00
cd821ea5d2 [CI] fix kv_cache_type argument (#18594) Ning Xie 2025-05-23 19:49:18 +08:00
7ab056c273 [Hardware][CPU] Update intel_extension_for_pytorch 2.7.0 and move to requirements/cpu.txt (#18542) Kay Yan 2025-05-23 19:38:42 +08:00
6526e05111 Add myself as docs code owner (#18605) Harry Mellor 2025-05-23 13:08:31 +02:00
e493e48524 [V0][Bugfix] Fix parallel sampling performance regression when guided decoding is enabled (#17731) Madeesh Kannan 2025-05-23 12:38:23 +02:00
4ce64e2df4 [Bugfix][Model] Fix baichuan model loader for tp (#18597) Mengqing Cao 2025-05-23 17:39:05 +08:00
fbb13a2c15 Revert "[V1] [Bugfix] eagle bugfix and enable correct lm_head for multimodal (#18034)" (#18600) Cyrus Leung 2025-05-23 17:18:22 +08:00
a1fe24d961 Migrate docs from Sphinx to MkDocs (#18145) Harry Mellor 2025-05-23 11:09:53 +02:00
d0bc2f810b [Bugfix] Add half type support in reshape_and_cache_cpu_impl on x86 cpu platform (#18430) Yuqi Zhang 2025-05-23 01:41:37 -07:00
b046cf792d [Feature][V1]: suupports cached_tokens in response usage (#18149) Chauncey 2025-05-23 16:41:03 +08:00
54af915949 [Doc] Update quickstart and install for cu128 using --torch-backend=auto (#18505) Michael Goin 2025-05-23 04:36:37 -04:00
71ea614d4a [Feature]Add async tensor parallelism using compilation pass (#17882) cascade 2025-05-23 01:03:34 -07:00
4c611348a7 [V1] [Bugfix] eagle bugfix and enable correct lm_head for multimodal (#18034) RonaldBXu 2025-05-23 00:37:18 -07:00
60cad94b86 [Hardware] correct method signatures for HPU,ROCm,XPU (#18551) Ning Xie 2025-05-23 13:31:59 +08:00
9c1baa5bc6 [Misc] Replace cuda hard code with current_platform (#16983) Shanshan Shen 2025-05-23 12:38:50 +08:00
4be2255c81 [Bugfix][Benchmarks] Fix a benchmark of deepspeed-mii backend to use api_key (#17291) Teruaki Ishizaki 2025-05-23 13:30:47 +09:00
ed5d408255 [Neuron] Remove bypass on EAGLEConfig and add a test (#18514) aws-elaineyz 2025-05-22 21:26:32 -07:00
583507d130 [Spec Decode] Make EAGLE3 draft token ID mapping optional (#18488) Benjamin Chislett 2025-05-22 23:17:39 -04:00
e44d8ce8c7 [Bugfix] Set KVTransferConfig.engine_id in post_init (#18576) lkchen 2025-05-22 19:54:42 -07:00
93ecb8139c [BugFix] Increase TP execute_model timeout (#18558) Nick Hill 2025-05-22 19:22:11 -07:00
fae453f8ce [Misc] refactor: simplify input validation and num_requests handling in _convert_v1_inputs (#18482) CYJiang 2025-05-23 10:15:32 +08:00
4b0da7b60e Enable hybrid attention models for Transformers backend (#18494) Harry Mellor 2025-05-23 04:12:08 +02:00
c6b636f9fb [V1][Spec Decoding] Use model_loader.get_model() to load models (#18273) Mark McLoughlin 2025-05-23 03:05:44 +01:00
04eb88dc80 Re-submit: Fix: Proper RGBA -> RGB conversion for PIL images. (#18569) Chenheli Hua 2025-05-22 18:59:18 -07:00
46791e1b4b [AMD] [P/D] Compute num gpus for ROCm correctly in run_accuracy_test.sh (#18568) rasmith 2025-05-22 20:45:35 -05:00
c32e249a23 [Frontend] [Core] Add Tensorizer support for V1, LoRA adapter serialization and deserialization (#17926) Sanger Steel 2025-05-22 21:44:18 -04:00
c91fe7b1b9 [Frontend][Bug Fix] Update llama4 pythonic jinja template and llama4_pythonic parser (#17917) Kai Wu 2025-05-22 16:44:08 -07:00
a04720bc36 [V1][Spec Decode][Bugfix] Load quantize weights for EAGLE (#18290) Ekagra Ranjan 2025-05-22 18:17:33 -04:00
7b9d832c80 [Tool] Add NIXL installation script (#18172) lkchen 2025-05-22 14:33:16 -07:00
6e588da0f4 [Build/CI] Fix CUDA 11.8 build (#17679) Tyler Michael Smith 2025-05-22 15:13:54 -04:00
f8d2cc5f55 [Compile][Platform] Make PiecewiseBackend pluggable and extendable (#18076) Mengqing Cao 2025-05-23 03:11:53 +08:00
721fb9b181 [Platform] Move platform check to right place (#18470) wangxiyuan 2025-05-23 03:11:28 +08:00
1f3a1200e4 [Bugfix] make test_openai_schema.py pass (#18224) David Xia 2025-05-22 14:34:06 -04:00
54631f8262 [Misc] Call ndarray.tobytes() directly instead of ndarray.data.tobytes() (#18347) Lukas Geiger 2025-05-22 17:00:13 +01:00
cb506ecb5a [Misc] improve Automatic Prefix Caching example (#18554) Reid 2025-05-22 22:50:46 +08:00
93f71673ce [BugFix][CPU] Fix x86 SHM distributed module initialization (#18536) Li, Jiang 2025-05-22 22:35:00 +08:00
3f505233fd [Doc] Add stream flag for chat completion example (#18524) Calvin Chen 2025-05-22 22:07:10 +08:00
4e04eceb58 [Bugfix] Use random hidden states in dummy sampler run (#18543) Bowen Wang 2025-05-22 06:48:56 -07:00
71075029f2 [Doc] Support --stream arg in openai_completion_client.py script (#18388) CYJiang 2025-05-22 21:20:17 +08:00
ca86a7cf6e [CI/Build] Update bamba test model location (#18544) Harry Mellor 2025-05-22 15:01:07 +02:00
a35a494745 [Bugfix] Add kwargs to RequestOutput __init__ to be forward compatible (#18513) lkchen 2025-05-22 05:24:43 -07:00
f6037d1907 [Bugfix] Fix MRoPE Errors in the Qwen-VL Model When Processing Pure Text (#18526) 燃 2025-05-22 20:22:53 +08:00
fa72f9a812 Order sequence ids + config update to support specifying custom quantization layers (#18279) aws-elaineyz 2025-05-22 02:20:36 -07:00
ebed81fbf5 Update default neuron config for speculation (#18274) aws-elaineyz 2025-05-22 02:18:55 -07:00
e2d7d31244 [Neuron] Update Dockerfile.neuron to use latest neuron release (2.23) (#18512) Satyajith Chilappagari 2025-05-22 02:17:34 -07:00
23b67b37b2 [Doc] Fix invalid JSON in example args (#18527) Cyrus Leung 2025-05-22 15:11:46 +08:00
db5a29ba19 [Bugfix] Fix LoRA test (#18518) Jee Jee Li 2025-05-22 12:48:53 +08:00
51797775c3 [Bugfix][Model] Make Olmo2Model weight loading return loaded weights (#18504) Shane A 2025-05-21 21:17:03 -07:00
cf5984b2fe [BugFix][DP] Send DP wave completion only from dp_rank==0 (#18502) Nick Hill 2025-05-21 20:25:25 -07:00
d022115cc6 [Bugfix] Inconsistent token calculation compared to HF in llava family (#18479) youngrok cha 2025-05-22 12:21:47 +09:00
acb54ca8e1 Intialize io_thread_pool attribute in the beginning. (#18331) Rabi Mishra 2025-05-22 08:51:14 +05:30
6e0fd34d3c [CI] Fix race condition with StatelessProcessGroup.barrier (#18506) Russell Bryant 2025-05-21 23:19:13 -04:00
176d62e4ea [MISC] update project urls in pyproject.toml (#18519) Ning Xie 2025-05-22 11:17:34 +08:00
20bd6f4d2e [FalconH1] Fix output dtype in RMSNorm fallback path for Falcon-H1 (e.g. 0.5B) (#18500) Dhia Eddine Rhaiem 2025-05-22 06:23:59 +04:00
1f079540db [Bugfix] Consistent ascii handling in tool parsers (#17704) Sebastian Schoennenbeck 2025-05-21 22:41:23 +02:00
94d8ec8d2b [FEAT][ROCm] Upgrade AITER MLA v1 backend (#18338) vllmellm 2025-05-22 01:34:28 +08:00
bb0a311213 Revert "[v1] Support multiple KV cache groups in GPU model runner (#17945) (#18459) Mark McLoughlin 2025-05-21 18:25:23 +01:00
dd5fa7e04f [ROCm][Kernel][V1] Enable AMD Radeon GPU Custom Paged Attention on v1 (#17004) Hosang 2025-05-21 11:35:00 -04:00
2b16104557 [Misc] Update deprecation message for --enable-reasoning (#18404) Hyogeun Oh (오효근) 2025-05-21 23:33:11 +09:00
371376f996 [Build] fix Dockerfile shell (#18402) Kebe 2025-05-21 22:32:06 +08:00
c6c10ca920 [Bugfix] Reduce moe_sum test size to avoid OOM (#18484) bnellnm 2025-05-21 09:46:39 -04:00
c154d89306 [Doc] fix arg docstring in linear layers (#18410) GiantCroc 2025-05-21 21:45:57 +08:00
eca18691d2 [MODEL] FalconH1 (#18406) Dhia Eddine Rhaiem 2025-05-21 15:59:06 +04:00
61acfc45bc [Bugfix][Failing Test] Fix test_events.py (#18460) Rabi Mishra 2025-05-21 17:27:28 +05:30
107f5fc4cb [Misc] refactor disaggregated-prefill-v1 example (#18474) Reid 2025-05-21 19:10:14 +08:00
907f935de9 [V1] Fix general plugins not loaded in engine for multiproc (#18326) Yong Hoon Shin 2025-05-21 01:21:49 -07:00
5d7f545204 [Frontend] deprecate --device arg (#18399) Kebe 2025-05-21 16:21:17 +08:00
cd8dfc6dfc [Misc] MultiConnector._connectors type (#18423) Nicolò Lucchesi 2025-05-21 07:48:43 +02:00
d06dd72ba9 [Bugfix][Failing Test] Fix nixl connector test when promt size < block size (#18429) wwl2755 2025-05-21 00:41:44 -05:00
ad0012a0ac Revert "[Bugfix] Fix MRoPE Errors in the Qwen-VL Model When Processing Pure Text (#18407)" (#18456) Cyrus Leung 2025-05-21 13:39:22 +08:00
92247c522e [Bug] Fix moe_sum signature (#18440) bnellnm 2025-05-21 01:37:08 -04:00
0c15c2e486 [Bugfix] config.head_dim is now explicitly set to None (#18432) Gregory Shtrasberg 2025-05-21 00:04:33 -04:00
3b17ea26e4 [TPU] Re-enable the Pallas MoE kernel (#18025) Michael Goin 2025-05-20 22:52:27 -04:00
23baa2180b fix:Build torch wheel inline rather than picking from nightly (#18351) Dilip Gowda Bhagavan 2025-05-21 03:52:24 +05:30
980a172474 [Kernel] update comment for KV shape in unified triton attn (#18099) Percy 2025-05-20 13:19:34 -05:00
e1f5a71ed7 [Model] use AutoWeightsLoader for bloom (#18300) Calvin Chen 2025-05-21 00:40:05 +08:00
f4a8a37465 [Minor] Rename quantization nvfp4 to modelopt_fp4 (#18356) Michael Goin 2025-05-20 12:08:37 -04:00
8f55962a7f [Misc] refactor prompt embedding examples (#18405) Reid 2025-05-20 23:26:12 +08:00
be48360c1f [Bugfix] Fix MRoPE Errors in the Qwen-VL Model When Processing Pure Text (#18407) 燃 2025-05-20 21:59:48 +08:00
86847700d7 [CI] Add mteb testing to test the accuracy of the embedding model (#17175) wang.yuqi 2025-05-20 21:51:12 +08:00
d6c86d09ae Update cpu.txt (#18398) 汪志鹏 2025-05-20 18:53:23 +08:00
6b35cb10a0 [Misc] Add LoRA code owner (#18387) Jee Jee Li 2025-05-20 18:27:30 +08:00
1b1e8e05ff [doc] update env variable export (#18391) Reid 2025-05-20 16:53:27 +08:00
