Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

561f38dc3c [Bugfix] Improve EPLB config validation error message (#24524) Tyler Michael Smith 2025-09-09 20:32:36 -04:00
73e688cb79 [ROCm][Feature] Enable Pipeline Parallelism with Ray Compiled Graph on ROCm (#24275) Charlie Fu 2025-09-09 18:27:35 -05:00
fb1a8f932a [Benchmark] Add option to skip oversampling in benchmark (#24457) Ekagra Ranjan 2025-09-09 18:00:17 -04:00
0dc9cbb527 [Benchmark] Update bench doc with mtbench, blazedit, spec bench (#24450) Ekagra Ranjan 2025-09-09 17:15:41 -04:00
b5fb3005a8 [Log] Use a relative path in debug-level logs to distinguish files with identical names (#23846) Jiangyun Zhu 2025-09-10 04:46:35 +08:00
15de5ff9ea [Feature] Disallow FlashMLA on Blackwell (#24521) Wentao Ye 2025-09-09 14:59:34 -04:00
b8a93076d3 [CI] execute all piecewise compilation tests together (#24502) (tag: v0.10.2rc1) Jiangyun Zhu 2025-09-10 02:05:25 +08:00
c3f9773b2c [TPU] Fix tpu structured decoding in mixed batches (#24458) Chenyaaang 2025-09-09 23:34:25 +05:30
3707cb2505 [Docs] Gemma3n transcriptions endpoint support (#24512) Nicolò Lucchesi 2025-09-09 20:03:32 +02:00
920ed46b09 [Misc] bump outlines_core to fix the version conflicts with outlines >= 1.2.0 (#24368) Kazuhiro Serizawa 2025-09-10 02:59:46 +09:00
15cb047e25 Extend renderer with embedding support and integrate completion endpoint (#24405) Flora Feng 2025-09-09 10:46:46 -07:00
9ad0688e43 [Bugfix] Fix hidden_size for multimodal classification model (#24501) Jee Jee Li 2025-09-10 01:37:25 +08:00
b9a1c4c8a2 [ROCm][CI/Build] Sync ROCm dockerfiles with the ROCm fork (#24279) Gregory Shtrasberg 2025-09-09 12:21:56 -04:00
1aa427fdc1 [Kernels] Add Flash Linear Attention Kernels (#24518) youkaichao 2025-09-10 00:04:41 +08:00
1c63a16b65 [Core] Run garbage collector after CUDA graph capture to fix throughput regression (#24128) Micah Williamson 2025-09-09 09:38:10 -05:00
922d3b401b [Bugfix] Handle the edge case in detokenizer where processed tokens contain both stop str and eos token (#23938) d.transposed 2025-09-09 16:30:24 +02:00
19332c0479 [Model] Systematic support for fp32 head, pooling models part (#23810) wang.yuqi 2025-09-09 22:29:50 +08:00
a55cf41a09 [Compilation][WideEP] Enable Piecewise CUDAGraph for DeepEPHT (#24123) Wentao Ye 2025-09-09 10:21:10 -04:00
6fb2788163 [CI/Build][Doc] Fully deprecate old bench scripts for serving / throughput / latency (#24411) Ye (Charlotte) Qi 2025-09-09 03:02:35 -07:00
3d2a2de8f7 [RL] fast weight update with zmq + ipc handles (#24295) Weixiao Huang 2025-09-09 16:57:46 +08:00
1116590b16 [gpt-oss] Validate gpt-oss python tool during initialization (#23856) Chen Zhang 2025-09-09 01:37:48 -07:00
ccb97338af [Misc] Add Codex settings to gitignore (#24493) Roger Wang 2025-09-09 01:25:44 -07:00
45c9cb5835 [Misc] Add claude settings to gitignore (#24492) Ye (Charlotte) Qi 2025-09-09 01:14:45 -07:00
e283976f3a [Performance][MM] Building the inverse permutation in O(n) time in Qwen2_5_VisionTransformer (#24443) WeiQing Chen 2025-09-09 15:24:11 +08:00
46876dff32 [Doc]: fixing typos to improve docs (#24480) Didier Durand 2025-09-09 08:06:04 +02:00
1823a00d67 [Misc] Support bench serve long context (#24373) Ming Yang 2025-09-08 22:53:10 -07:00
ed16d0f26f [Doc] mention fpdb for multiprocess breakpoints (#24452) Mickaël Seznec 2025-09-09 06:46:45 +02:00
0cdd213641 [Misc] Improve Worker process title and logging prefix (#22205) 22quinn 2025-09-08 21:43:48 -07:00
948dd3443b [Bugfix] Fix Apertus HF repo name (#24447) Cyrus Leung 2025-09-09 12:40:29 +08:00
b2f7745774 Add data_parallel_size to VllmConfig string representation (#24298) cong-meta 2025-09-08 21:35:18 -07:00
82dfb12e52 [Core] Use sha256 bytes instead of BlockHash to reduce GC overhead (#23673) Zebing Lin 2025-09-09 00:34:37 -04:00
bba1042c6f [Flashinfer] Support Flashinfer TRTLLM FP8-qkv BF16/FP16-out Attention Kernel (#23647) elvischenv 2025-09-09 11:53:07 +08:00
b6fbc15634 [BugFix][Model] Fix Ernie4.5-VL hanging on long inputs (#24074) CSWYF3634076 2025-09-09 11:37:16 +08:00
3e0d4a3475 Move KVTransferConfig from config/__init__.py to config/kv_transfer.py (#24434) Harry Mellor 2025-09-09 04:30:32 +01:00
562663a044 Bump actions/github-script from 7.0.1 to 8.0.0 (#24413) dependabot[bot] 2025-09-09 03:12:44 +00:00
ed1623a88a Bump actions/stale from 9.1.0 to 10.0.0 (#24412) dependabot[bot] 2025-09-09 03:11:20 +00:00
13b89bd823 [doc] update vllm serve cli args documentation (#24329) cjackal 2025-09-09 12:07:58 +09:00
22a0070530 Bump actions/setup-python from 5.4.0 to 6.0.0 (#24414) dependabot[bot] 2025-09-09 02:54:58 +00:00
170129eb28 [gpt-oss] Harmony changes with container tool support (#23386) zhiweiz 2025-09-08 19:03:50 -07:00
955c624915 [Bugfix][Wide EP] Fix redundant work when using DeepEP, TP Attn, and EP MoE (#24134) Tyler Michael Smith 2025-09-08 22:01:51 -04:00
4f87abdcc6 Update reviewers for modelopt related files (#24468) Zhiyu 2025-09-08 18:53:13 -07:00
6910b56da2 [CI] Add nightly multiarch manifests to dockerhub (#24102) Sahithi Chigurupati 2025-09-08 18:18:09 -07:00
e10fef0883 [Hardware][IBM Z] Fix Outlines Core issue for s390x (#24034) R3hankhan 2025-09-09 05:20:34 +05:30
e680723eba [Bugfix] Disable the statslogger if the api_server_count is greater than 1 (#22227) Chauncey 2025-09-09 06:28:03 +08:00
620db1fc58 [Attention] FlashAttention MLA cudagraph support (#23958) Matthew Bonanni 2025-09-08 15:05:26 -07:00
41183c1fe0 [Spec Decode] Fix offline spec_decode.py (#24257) Ekagra Ranjan 2025-09-08 16:44:13 -04:00
43d9ad03ba [Model loader]: support multi-thread model weight loading (#23928) Yang Kaiyong 2025-09-09 02:49:39 +08:00
7be141b2c5 [CI] Enable encoder model compilation test (#24442) Jiangyun Zhu 2025-09-09 02:48:06 +08:00
8d7f39b48c [Model] Remove quantized mixtral (#24437) Jee Jee Li 2025-09-09 02:02:14 +08:00
cd08636926 [Spec Decode][Benchmark] Add Blitzedit dataset (#23605) Ekagra Ranjan 2025-09-08 13:32:52 -04:00
3feeeb9fea [Spec Decode][Benchmark] Add Spec Bench Dataset for benchmarking (#23563) Ekagra Ranjan 2025-09-08 13:32:42 -04:00
6f4a82f8b5 [Model] Enable BNB support for qwen2_5_omni_thinker (#24420) Jee Jee Li 2025-09-09 00:37:08 +08:00
c44797a4d6 [Docs]add eplb_config param use docs (#24213) rongfu.leng 2025-09-09 00:36:57 +08:00
55be93baf5 [Doc]: fix 2 hyperlinks leading to Ray site after they changed Ray's doc structure (#24438) Didier Durand 2025-09-08 18:36:54 +02:00
717fc00e98 [Docs] Move feature compatibility tables to README (#24431) Harry Mellor 2025-09-08 14:45:14 +01:00
01dfb5e982 [Frontend] User-provided uuids for medias in chat. (RFC #22044) (#23449) Chenheli Hua 2025-09-08 06:42:20 -07:00
03dd652c16 Move KVEventsConfig from config/__init__.py to config/kv_events.py (#24433) Harry Mellor 2025-09-08 14:41:27 +01:00
9cd76b71ab [Misc] Terratorch related fixes (#24337) Christian Pinto 2025-09-08 15:40:26 +02:00
e041314184 [Bugfix] Fix mamba2 prefill chunking (#23279) tomeras91 2025-09-08 14:42:41 +03:00
5e537f45b4 [Bugfix] Fix get_quant_config when using modelscope (#24421) Li Wang 2025-09-08 19:03:02 +08:00
c2a8b08fcd [Doc] Fix issues in integrations/llamastack.md (#24428) Michael Yao 2025-09-08 17:28:32 +08:00
f4962a6d55 [Doc]: fix typos in Python comments (#24417) Didier Durand 2025-09-08 09:22:16 +02:00
2f0b833a05 [Docs] Fix a tip indentation and typo (#24419) Michael Yao 2025-09-08 15:19:40 +08:00
425b04b8f4 [gpt-oss][Responses API] Fix the function call id format (#24409) Chauncey 2025-09-08 14:49:52 +08:00
60f0843ef8 [Model] Remove unnecessary CUDA sync of Qwen2VL image and video preprocess (#24334) Chatcharin Sangbutsarakum 2025-09-08 13:11:12 +07:00
8a46602606 [Model] Remove unnecessary CUDA sync of GLM-4.1V image and video preprocess (#24332) Chatcharin Sangbutsarakum 2025-09-08 13:10:54 +07:00
61aa4b2901 [P/D] Add a shutdown method to the Connector API (#22699) Chauncey 2025-09-08 14:07:00 +08:00
8c892b1831 [Doc] Fix UTF-8 encoding issues in documentation generation on Windows (#24361) Al-Ekram Elahee Hridoy 2025-09-07 23:33:52 -06:00
3bca396f79 [CI/Build] Fix local image inputs in test_pixtral.py (#24401) Chenheli Hua 2025-09-07 20:31:35 -07:00
3a3e91bdfe [CI/Build] Disable flaky test_structured_output tests (#24404) 22quinn 2025-09-07 19:51:59 -07:00
b3d7e3c845 [Sampler] Support returning all prompt logprobs (#23868) Xingyu Liu 2025-09-07 19:34:31 -07:00
67841317d1 [xpu] upgrade ipex/python3.12 for xpu (#23830) Yan Ma 2025-09-08 10:07:16 +08:00
86173ad593 [Kernel] Support decode context parallelism on Blackwell with CUTLASS MLA (#24385) Ming Yang 2025-09-07 18:27:12 -07:00
795b6951cd Add @luccafong to codeowner for spec decode (#24397) Lucia Fang 2025-09-07 17:30:27 -07:00
2e5d21378d Skip MM Encoder for non-first PP ranks (#24387) Woosuk Kwon 2025-09-07 09:38:35 -07:00
0661cb9df3 Add renderer-based prompt processing for embedding and classification endpoints (#24356) Flora Feng 2025-09-07 01:26:48 -07:00
105d3d62ef [TPU] Remove TopKTopPSampler dependency for TPU sampler (#24391) Woosuk Kwon 2025-09-07 01:12:36 -07:00
62f66be1f7 [Bugfix] Fix Qwen3-coder moe tuned config (#24072) Jee Jee Li 2025-09-07 13:19:46 +08:00
81c53ef55c [Misc] collect flashinfer version in collect_env.py (#24378) Ye (Charlotte) Qi 2025-09-06 20:30:41 -07:00
75334956c2 QWEN3 Thinking Fused MoE kernels Optimization configs (#24330) Saman A. Pour 2025-09-06 20:18:54 -07:00
77aec83b8c [Benchmark] add benchmark for custom activation op (#23908) Jiangyun Zhu 2025-09-07 11:12:05 +08:00
e67597545b [CI][Fix] deterministic seed for flaky CI runs on structured outputs (#24380) Aaron Pham 2025-09-06 23:10:40 -04:00
37a6fa95fd Migrate Qwen2 inputs to TensorSchema (#23475) Benji Beck 2025-09-06 20:07:31 -07:00
558f0907dc [attention][DCP] use AttentionImpl.need_to_return_lse_for_decode (#24372) youkaichao 2025-09-07 09:18:59 +08:00
4172235ab7 [V0 deprecation] Deprecate V0 Neuron backend (#21159) Woosuk Kwon 2025-09-06 16:15:18 -07:00
848562bd49 break execute_model in gpu_model_runner into sub-functions for custom scopes (#24265) Bangsheng Tang 2025-09-06 14:02:47 -07:00
e68dc2f014 [Bugfix] Fix unstable silu_mul+nvfp4 quant fusion test (#24370) elvischenv 2025-09-07 04:39:34 +08:00
a3645ed94d [Frontend][Responses API] Support reporting tool output tokens and fix reasoning token count (#24285) Ye (Charlotte) Qi 2025-09-06 13:27:15 -07:00
fb691ee4e7 [Fix] [gpt-oss] fix non-tool calling path for chat completion (#24324) Aaron Pham 2025-09-06 15:10:32 -04:00
6024d115cd Lora bias(enable_lora_bias) deprecate warning (#24339) Ashwin Phadke 2025-09-06 22:12:19 +05:30
7555d6b34a [Bugfix] Fix test_mixtral_moe (#24371) Jee Jee Li 2025-09-07 00:32:03 +08:00
00a4e56d8d [Bugfix] Fix broken deepseek fp8 TP weights loading (#24367) Isotr0py 2025-09-07 00:23:12 +08:00
0eadaeff7e [Bugfix] Avoid uninitialized usage of azp_val when AZP is false. (#24335) mohankku 2025-09-06 08:17:03 -07:00
0077c8634e Add @benchislett to codeowner for spec decode and structured outputs (#24362) Benjamin Chislett 2025-09-06 10:03:35 -04:00
b121ca22ad [CI] Disable flaky structured output test from CI (#24366) Roger Wang 2025-09-06 06:31:56 -07:00
eddaafc1c7 [Multimodal] Improve max video embedding length estimation in V1 (#24312) Roger Wang 2025-09-06 02:33:19 -07:00
305a1cc0d2 refactor: Turn GPUModelRunner.inputs_embeds to a CpuGpuBuffer (#24345) Andrew Sansom 2025-09-06 01:01:23 -05:00
6d6c6b05d3 [New Model]: google/embeddinggemma-300m (#24318) wang.yuqi 2025-09-06 13:58:36 +08:00
53b19ccdd5 [Core] Allow disabling TP sharding for parallel Linear layer (#23024) Isotr0py 2025-09-06 13:53:58 +08:00
6432739ef1 [Bugfix] Catch and log invalid token ids in detokenizer (#24351) Nick Hill 2025-09-05 22:30:22 -07:00
