Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

Select branches

cmm

main

ci/build/22474

submission

v0.1.0

v0.1.1

v0.1.2

v0.1.3

v0.1.4

v0.1.5

v0.1.6

v0.1.7

v0.10.0

v0.10.0rc1

v0.10.0rc2

v0.10.1

v0.10.1.1

v0.10.1rc1

v0.10.2

v0.10.2rc1

v0.10.2rc2

v0.10.2rc3

v0.11.0

v0.11.0rc1

v0.11.0rc2

v0.11.0rc3

v0.11.0rc4

v0.11.0rc5

v0.11.0rc6

v0.11.1

v0.11.1rc0

v0.11.1rc1

v0.11.1rc2

v0.11.1rc3

v0.11.1rc4

v0.11.1rc5

v0.11.1rc6

v0.11.1rc7

v0.11.2

v0.12.0

v0.13.0

v0.13.0rc1

v0.13.0rc2

v0.13.0rc3

v0.13.0rc4

v0.14.0

v0.14.0rc0

v0.14.0rc1

v0.14.0rc2

v0.14.1

v0.15.0

v0.15.0rc0

v0.15.0rc1

v0.15.0rc2

v0.15.0rc3

v0.15.1

v0.15.1rc0

v0.15.1rc1

v0.15.2rc0

v0.16.0

v0.16.0rc0

v0.16.0rc1

v0.16.0rc2

v0.16.0rc3

v0.16.1rc0

v0.17.0

v0.17.0rc0

v0.17.0rc1

v0.17.1

v0.17.1rc0

v0.17.2rc0

v0.18.0

v0.18.0rc0

v0.18.0rc1

v0.18.0rc2

v0.18.1

v0.18.1rc0

v0.18.2rc0

v0.19.0

v0.19.0rc0

v0.19.0rc1

v0.19.1rc0

v0.2.0

v0.2.1

v0.2.1.post1

v0.2.2

v0.2.3

v0.2.4

v0.2.5

v0.2.6

v0.2.7

v0.3.0

v0.3.1

v0.3.2

v0.3.3

v0.4.0

v0.4.0.post1

v0.4.1

v0.4.2

v0.4.3

v0.5.0

v0.5.0.post1

v0.5.1

v0.5.2

v0.5.3

v0.5.3.post1

v0.5.4

v0.5.5

v0.6.0

v0.6.1

v0.6.1.post1

v0.6.1.post2

v0.6.2

v0.6.3

v0.6.3.post1

v0.6.4

v0.6.4.post1

v0.6.5

v0.6.6

v0.6.6.post1

v0.7.0

v0.7.1

v0.7.2

v0.7.3

v0.8.0

v0.8.0rc1

v0.8.0rc2

v0.8.1

v0.8.2

v0.8.3

v0.8.3rc1

v0.8.4

v0.8.5

v0.8.5.post1

v0.9.0

v0.9.0.1

v0.9.1

v0.9.1rc1

v0.9.1rc2

v0.9.2

v0.9.2rc1

v0.9.2rc2

bf214ca226 [Misc] Fix examples openai_pooling_client.py (#24853) wang.yuqi 2025-09-15 19:57:30 +08:00
2e41f5abca [XPU] Set consistent default KV cache layout (#24745) Nicolò Lucchesi 2025-09-15 12:09:34 +02:00
bc0f6059a2 [UT] enhance free kv cache block queue popleft_n (#24220) Ning Xie 2025-09-15 18:04:37 +08:00
8de261b04a [P/D]kv_output_aggregator support P TP > D TP (#23917) Chao Lei 2025-09-15 17:36:06 +08:00
a0d8b9738d [Misc] Own KVConnectors installation (#24867) Nicolò Lucchesi 2025-09-15 11:21:09 +02:00
59e17dd4a0 [Misc] rename interval to max_recent_requests (#24229) Ning Xie 2025-09-15 17:18:42 +08:00
4979eb79da [Doc]: fix typos in various files (#24821) Didier Durand 2025-09-15 10:08:52 +02:00
a8c0f59973 [Bugfix] MiDashengLM model contact error under concurrent testing (#24738) bingchen-mi 2025-09-15 14:38:12 +08:00
f4a948f33f [Frontend] Skip stop in reasoning content (#14550) Ce Gao 2025-09-15 14:04:55 +08:00
3f3313981c [kv cache] update num_free_blocks in the end (#24228) Ning Xie 2025-09-15 13:15:12 +08:00
78818dd1b0 [Docs] Have a try to improve frameworks/streamlit.md (#24841) Michael Yao 2025-09-15 12:50:36 +08:00
8e5cdcda4e [Hybrid Allocator] Support Pipeline Parallel (#23974) Chen Zhang 2025-09-14 15:55:17 -07:00
90f3f7d73e [Spec Decoding]Support Spec Decoding Metrics in DP Mode (#24049) wuhang 2025-09-15 05:11:09 +08:00
6dc8da5dc1 [Chore] Remove ipex_ops warning (#24835) Robert Shaw 2025-09-14 15:41:53 -04:00
79cbcab871 Force use C++17 globally to avoid compilation error (#24823) FengjinChen 2025-09-15 03:30:10 +08:00
ff68035932 [Benchmarks] Throw usage error when using dataset-name random and dataset-path together (#24819) Ye (Charlotte) Qi 2025-09-14 10:50:01 -07:00
1177dd53e9 fix type of sampling rate for encode_base64 (#24826) co63oc 2025-09-15 00:17:16 +08:00
fc2dbcda8b [Perf] Fix DeepGEMM Contiguous Layout Issue, 5.5% Throughput Improvement (#24783) Wentao Ye 2025-09-14 11:20:17 -04:00
fec347dee1 [Misc] Improve s3_utils type hints with BaseClient (#24825) Hyogeun Oh (오효근) 2025-09-14 21:11:14 +09:00
cc3173ae98 [Multi Modal][Performance] Fused Q,K's apply_rope into one (#24511) Wenlong Wang 2025-09-14 01:10:21 -07:00
3e903b6cb4 [Chore] Minor simplification for non-PP path (#24810) Woosuk Kwon 2025-09-13 17:41:36 -07:00
973c9d01da [Minor] Simplify duplicative device check for cuda (#24793) Victor Ziliang Peng 2025-09-13 11:28:38 -07:00
26b999c71a [CI Failure] Fix test_flashinfer_cutlass_mxfp4_mxfp8_fused_moe (#24750) Michael Goin 2025-09-13 03:29:19 -04:00
e925187f6d Merge branch 'main' into wye-refactor-quant-folder (ci/build/22474) yewentao256 2025-09-13 07:38:47 -07:00
15b8fef453 Remove redundant assignment in xfer_buffers, This is a little fix (#24732) TaoYu Chen 2025-09-13 16:11:59 +08:00
cfa3234a5b [CI][Spec Decode] Adjust threshold for flaky ngram spec decoding test again (#24771) Wenlong Wang 2025-09-13 00:45:11 -07:00
41ae4a1eab [Doc]: fix typos in various files (#24798) Didier Durand 2025-09-13 09:43:33 +02:00
4dad72f0d9 [Misc] Correct an outdated comment. (#24765) Russell Bryant 2025-09-13 03:34:53 -04:00
59d7ffc17f [CI Failure] Fix test_flashinfer_cutlass_mxfp4_mxfp8_fused_moe (#24750) Michael Goin 2025-09-13 03:29:19 -04:00
1da0f1441d [Core][Multimodal] Cache supports_kw (#24773) Lukas Geiger 2025-09-13 08:27:04 +01:00
98229db244 [Kernels][DP/EP] Optimize Silu Kernel for R1 (#24054) Elvir Crnčević 2025-09-13 09:17:27 +02:00
dbeee3844c [Perf] Use NVIDIA hardware-accelerated instruction for float to fp8_e4m3 quantization (#24757) elvischenv 2025-09-13 15:16:24 +08:00
30498f2a65 [Doc]: Remove 404 hyperlinks (#24785) Rakesh Asapanna 2025-09-13 12:45:41 +05:30
abc7989adc [Docs] Remove Neuron install doc as backend no longer exists (#24396) Harry Mellor 2025-09-13 08:15:03 +01:00
9a8966bcc2 [Docs] Fix warnings in mkdocs build (continued) (#24791) Hyogeun Oh (오효근) 2025-09-13 16:13:44 +09:00
5febdc8750 [Chore] Remove unused batched RoPE op & kernel (#24789) Woosuk Kwon 2025-09-13 00:08:20 -07:00
99bfef841f [Bugfix] Fix GPUModelRunner has no attribute lora_manager (#24762) Jee Jee Li 2025-09-13 14:55:14 +08:00
da3fa78dc9 [Compilation Bug] Fix Inductor Graph Output with Shape Issue (#24772) (v0.10.2rc3) Wentao Ye 2025-09-12 17:23:05 -04:00
bbb70036cb Enable conversion of multimodal models to pooling tasks (#24451) Maximilien de Bayser 2025-09-12 00:30:41 -03:00
89da8d9d09 [Qwen3Next] Fixes the cuda graph capture conditions under large batch sizes (#24660) (#24667) Tao He 2025-09-13 06:31:32 +08:00
01085b134d [Qwen3-Next] MoE configs for H100 TP=1,2 and TP2/EP (#24739) Elvir Crnčević 2025-09-12 16:54:04 +02:00
66160a9943 [BugFix] Fix Qwen3-Next PP (#24709) Nick Hill 2025-09-11 23:35:04 -07:00
eaca762c18 [Qwen3-Next] MoE configs for H20 TP=1,2,4,8 (#24707) Jee Jee Li 2025-09-12 10:06:26 +08:00
89e08d6d18 [Model] Add Olmo3 model implementation (#24534) Shane A 2025-09-12 20:26:21 -07:00
7f2ea7074e [Frontend][Multimodal] Allow skipping media data when UUIDs are provided. (#23950) Chenheli Hua 2025-09-12 19:16:06 -07:00
4fdd6f5cbf [Core] Support async scheduling with uniproc executor (#24219) Nick Hill 2025-09-12 16:34:28 -07:00
8226dd56bf [Qwen3Next] Fixes the cuda graph capture conditions under large batch sizes (#24660) (#24667) Tao He 2025-09-13 06:31:32 +08:00
5fe643fc26 Add FLASHINFER_MLA to backend selector test (#24753) Matthew Bonanni 2025-09-12 18:30:07 -04:00
7ba32aa60b [Attention][FlashInfer] Enable FP8 FlashInfer (TRTLLM) MLA decode (#24705) Matthew Bonanni 2025-09-12 17:45:53 -04:00
c89ed8de43 Invert pattern order to make sure that out_proj layers are identified (#24781) Alexandre Marques 2025-09-12 17:45:29 -04:00
3beadc2f25 [Compilation Bug] Fix Inductor Graph Output with Shape Issue (#24772) Wentao Ye 2025-09-12 17:23:05 -04:00
bc636f21a6 [Benchmark] Allow arbitrary headers to be passed to benchmarked endpoints (#23937) Clayton Coleman 2025-09-12 16:57:53 -04:00
017354c0ef [CI] Trigger BC Linter when labels are added/removed (#24767) Zhewen Li 2025-09-12 11:44:36 -07:00
1e3e56abfc Merge branch 'main' into wye-refactor-quant-folder Wentao Ye 2025-09-12 14:17:56 -04:00
010acc6e1e [Bugfix] Fix incompatibility between #20452 and #24548 (#24754) Cyrus Leung 2025-09-13 02:17:29 +08:00
c8c42597ab [CI] Speed up model unit tests in CI (#24253) afeldman-nm 2025-09-12 13:36:50 -04:00
9d2a44606d [UX] Remove AsyncLLM torch profiler disabled log (#24609) Michael Goin 2025-09-12 13:08:44 -04:00
f17c075884 [Model] Switch to Fused RMSNorm in GLM-4.1V model (#24733) Samit 2025-09-13 00:12:23 +08:00
b0d1213ac3 [Models] Prevent CUDA sync in Qwen2.5-VL (#24741) Lukas Geiger 2025-09-12 17:03:55 +01:00
57f94e88ea [Models] Optimise and simplify _validate_and_reshape_mm_tensor (#24742) Lukas Geiger 2025-09-12 16:37:37 +01:00
684b6870e1 [Bugfix][Frontend] Fix --enable-log-outputs does not match the documentation (#24626) Kebe 2025-09-13 00:01:24 +09:00
1facf77094 Merge branch 'main' into wye-refactor-quant-folder yewentao256 2025-09-12 08:00:41 -07:00
a5b84f1cbf [Core] Shared memory based object store for Multimodal data caching and IPC (#20452) dongluw 2025-09-12 10:54:17 -04:00
9f04d9d55f [Qwen3-Next] MoE configs for H100 TP=1,2 and TP2/EP (#24739) Elvir Crnčević 2025-09-12 16:54:04 +02:00
4d7c1d531b [Bugfix] Fix MRoPE dispatch on XPU (#24724) Yan Ma 2025-09-12 21:43:56 +08:00
41f17bf290 [Docs] Fix warnings in mkdocs build (continued) (#24740) Hyogeun Oh (오효근) 2025-09-12 22:43:15 +09:00
bcb06d7baf [Doc]: fix typos in various files (#24726) Didier Durand 2025-09-12 15:43:12 +02:00
0377802c20 [Multimodal] Remove legacy multimodal fields in favor of MultiModalFeatureSpec (#24548) Flora Feng 2025-09-12 06:42:23 -07:00
72fc8aa412 [Multi Modal] Add FA3 in VIT (#24347) Wenlong Wang 2025-09-12 06:27:24 -07:00
fdb09c77d6 [sleep mode] save memory for on-the-fly quantization (#24731) youkaichao 2025-09-12 19:25:19 +08:00
7a1c4025f1 [Kernel] [CPU] refactor cpu_attn.py:_run_sdpa_forward for better memory access (#24701) Ignacio Sica 2025-09-12 08:23:07 -03:00
60a0951924 [Bugfix] Fix BNB name match (#24735) Jee Jee Li 2025-09-12 19:12:01 +08:00
64d90c3e4f [Misc][gpt-oss] Add gpt-oss label to PRs that mention harmony or related to builtin tool call (#24717) Chen Zhang 2025-09-12 03:57:07 -07:00
59d5d2c736 [CI/Build] Skip prompt embeddings tests on V1-only CPU backend (#24721) Li, Jiang 2025-09-12 18:51:01 +08:00
d21a36f5f9 [CI] Add ci_envs for convenient local testing (#24630) wang.yuqi 2025-09-12 16:52:25 +08:00
561a0baee0 [CI] Fix flaky test v1/worker/test_gpu_model_runner.py::test_kv_cache_stride_order (#24640) Chen Zhang 2025-09-12 00:49:09 -07:00
f592b3174b [BugFix] Fix Qwen3-Next PP (#24709) Nick Hill 2025-09-11 23:35:04 -07:00
7920de0a2a [Bugfix] Fix MRoPE dispatch on CPU (#24712) Li, Jiang 2025-09-12 12:56:31 +08:00
ddcec289c7 Fix implementation divergence for BLOOM models between vLLM and HuggingFace when using prompt embeds (#24686) Andrew Sansom 2025-09-11 23:35:48 -05:00
e090b7b45b Enable conversion of multimodal models to pooling tasks (#24451) Maximilien de Bayser 2025-09-12 00:30:41 -03:00
6a50eaa0d3 [DOCs] Update ROCm installation docs section (#24691) Gregory Shtrasberg 2025-09-11 23:02:53 -04:00
12a8414d81 [Qwen3-Next] MoE configs for H20 TP=1,2,4,8 (#24707) Jee Jee Li 2025-09-12 10:06:26 +08:00
880c741bb6 [Bugfix] fixes the causal_conv1d_update kernel update non-speculative decoding cases (#24680) (v0.10.2rc2) Tao He 2025-09-12 09:16:43 +08:00
40b6c9122b [V1] feat:add engine v1 tracing (#20372) RichardoMu 2025-09-12 08:10:39 +08:00
2e6bc46821 [Startup] Make DeepGEMM warmup scale with max-num-batched-tokens (#24693) Lucas Wilkinson 2025-09-11 20:10:19 -04:00
fcba05c435 [Bug] Fix Layer weight_block_size Assertion Issue (#24674) Wentao Ye 2025-09-11 19:47:59 -04:00
7a30fa8708 [Doc] Clarify cudagraph capture size logic and default behavior in scheduler (#18698) Zazzle516 2025-09-12 07:18:09 +08:00
f82f7a8990 [Qwen3-Next] MOE configs for H100 TP4 (#24699) Chen Zhang 2025-09-11 15:45:52 -07:00
c3aea10dc8 [Perf] Use upstream CUTLASS for SM90 Block FP8 kernel (#23280) Michael Goin 2025-09-11 18:43:14 -04:00
d4fd2768ef [Bugfix][Attention] Fix FlashInfer MLA block size logic (#24692) Matthew Bonanni 2025-09-11 18:39:42 -04:00
7a70a71892 [Qwen3-Next] Add B200 MoE configs for Qwen3-next (#24698) Vadim Gimpelson 2025-09-12 02:34:58 +04:00
7d4651997a [CI/Build] Add bc-linter to vLLM CI (#21234) Zhewen Li 2025-09-11 15:34:36 -07:00
569bf1c9c0 [Qwen3-Next] MoE configs for H200 TP=1,2,4 (#24695) Woosuk Kwon 2025-09-11 14:38:16 -07:00
1ec20355f5 [Bugfix] Set VLLM_ALLREDUCE_USE_SYMM_MEM default to False (#24696) Wentao Ye 2025-09-11 17:32:27 -04:00
e42af78b18 [flashinfer] [kernel] support for fp8 kv cache for trtllm prefill attention (#24197) Xiaozhu Meng 2025-09-11 14:20:09 -07:00
074854b24f [Kernel][B200] mxfp4 fused cutlass moe (#23696) Duncan Moss 2025-09-11 14:04:56 -07:00
79ac59f32e Update Spec Decode metrics to include drafted and accepted token throughput (#24127) Andrew Xia 2025-09-11 12:58:43 -07:00
b971f91504 [BugFix] Fix tokenize asyncio task leak (#24677) Nick Hill 2025-09-11 12:44:04 -07:00
c733bd5e87 [Qwen3-Next] Add MoE Config for H200 (#24688) Woosuk Kwon 2025-09-11 12:40:15 -07:00
a892b259b4 [Doc] Remove Useless Comments (#24687) Wentao Ye 2025-09-11 15:25:47 -04:00