biondizzle/vllm

Commit Graph

60af7b967b [Releases] [ROCm] Enable Nightly Docker Image and Wheel Releases for ROCm (#37283) TJian 2026-03-27 00:32:25 +08:00
bdc1719eb9 [ROCm][CI] Fix AITER state leak in shared_fused_moe_routed_transform test (#38137) Andreas Karatzas 2026-03-26 11:26:46 -05:00
0aac2048bf [Bugfix] Restore CUDA graph persistent buffers for FP8 FlashMLA decode (#35175) haosdent 2026-03-27 00:13:39 +08:00
cb2263218e [Bugfix][Minor] Fix potential NameError in mamba backend selector and misc typos (#35886) Chuan (Richard) Li 2026-03-26 08:59:24 -07:00
e054f152fa [CI] Add batch invariant test for b200 (#38014) Wentao Ye 2026-03-26 11:54:54 -04:00
0f5b526040 [Fix] Remove unused packing_position_embedding from PaddleOCRVL for better checkpoint compatibility (#38232) zhang-prog 2026-03-26 23:34:49 +08:00
be1a85b7a2 Revert "[MoE Kernel] Flashinfer nvfp4 cutedsl moe kernel integration" (#38050) (#38169) Zhewen Li 2026-03-26 07:59:09 -07:00
2e225f7bd2 [Renderer] Consolidate factory methods (#38218) Cyrus Leung 2026-03-26 20:19:22 +08:00
757eafcf37 [bug-fix] GLM OCR Patch Merger context_dim (#37962) Jared Wen 2026-03-26 20:11:21 +08:00
dcdc145893 [CI] Reorganize scoring tests (#38207) wang.yuqi 2026-03-26 20:07:01 +08:00
f2d16207c7 [ROCm][CI] Fix flaky GPTQ compile correctness test (#38161) Andreas Karatzas 2026-03-26 06:57:00 -05:00
37a83007fe [ROCm][CI] Fix wvSplitKrc mock argument order in test_rocm_unquantized_gemm (#38167) Andreas Karatzas 2026-03-26 06:54:59 -05:00
9fdc0f3aeb merge khluu 2026-03-26 02:17:52 -07:00
bf5eec638d [Refactor] Remove unused utils (#38153) Wentao Ye 2026-03-26 05:08:19 -04:00
b1cb1d3d2c DOC: Documentation pages fixes (#38125) Mateusz Sokół 2026-03-26 09:55:42 +01:00
6ae8bbd0c2 [XPU] Disable xpu graph by default (#38193) Kunshang Ji 2026-03-26 16:53:45 +08:00
a9213c0ffe [Doc] Fix outdated reference to CUDAGraphManager (#38209) Cyrus Leung 2026-03-26 16:52:38 +08:00
502c41a8f6 [Model] Use helper function to run MM processors with token inputs (where applicable) (#38018) Cyrus Leung 2026-03-26 16:44:04 +08:00
05d96d7991 merge Vadim Gimpelson 2026-03-26 12:21:47 +04:00
52069012fe [Bugfix] Fix DeepGemm E8M0 accuracy degradation for Qwen3.5 FP8 on Blackwell (#38083) Vadim Gimpelson 2026-03-26 12:21:47 +04:00
71161e8b63 [cpu][ci] remove soft-fail for Arm CI and add quant model tests (#37691) Fadi Arafeh 2026-03-26 07:03:31 +00:00
38de822310 [Model] Add torch.compile support for InternVL vision encoder (#38049) Terry Gao 2026-03-25 23:52:29 -07:00
2bfbdca23c [Bugfix] Fix benchmark_fused_collective.py (#38082) Jee Jee Li 2026-03-26 14:51:00 +08:00
2908094567 Add /v1/chat/completions/batch endpoint for batched chat completions (#38011) Matej Rojec 2026-03-26 05:13:33 +01:00
e6bf9f15ec [Bugfix][CI] Fix Marlin FP8 Linear Kernel for Compressed Tensors Format (#38092) BadrBasowid 2026-03-26 12:11:43 +08:00
144030c84e Relocate Encoder CUDA graph manager (#38116) Woosuk Kwon 2026-03-25 20:52:12 -07:00
e2db2b4234 [Tool Parser][1/3] Pass tools to ToolParser constructor (#38029) Flora Feng 2026-03-25 22:29:06 -04:00
87f05d6880 [Revert] Remove DeepGEMM availability check in DeepseekV32IndexerMetadataBuilder (#38076) Chauncey 2026-03-26 09:43:51 +08:00
36f6aede23 [Misc] Optimized check to encapsulate both CUDA and ROCm platforms (#34549) Andreas Karatzas 2026-03-25 20:43:07 -05:00
9704a5c310 Disable dual stream execution of input projection for Qwen3 (#38152) Xin Yang 2026-03-25 18:20:39 -07:00
74056039b7 Fix minimax m2.5 nvfp4 kv scales weight loading (#37214) Wei Zhao 2026-03-25 20:48:06 -04:00
d7d51a7ee5 [Bugfix] Fix Qwen3.5-FP8 Weight Loading Error on TPU (#37348) Jacob Platin 2026-03-25 17:46:01 -07:00
3c3c084240 Various Transformers v5 fixes (#38127) Harry Mellor 2026-03-26 00:10:08 +00:00
7b54f60db0 [Cohere] Enable Cohere-Transcribe (#38120) Ekagra Ranjan 2026-03-25 19:13:51 -04:00
a0e8c74005 [ROCm]: Update rope+kvcache fusion conditions and disable custom op by default (#36716) Rohan Potdar 2026-03-25 15:58:44 -05:00
70a2152830 [MultiModal] add support for numpy array embeddings (#38119) Guillaume Guy 2026-03-25 15:13:04 -05:00
978fc18bf0 [ROCm] Utilize persistent MLA kernel from AITER (#36574) Sathish Sanjeevi 2026-03-25 12:00:42 -07:00
7d6917bef5 [ROCm] Fix MoE kernel test failures on gfx950 (#37833) Andreas Karatzas 2026-03-25 13:46:40 -05:00
e38817fadb [Core][KV Connector] Remove use of num_cached_tokens in error handling (#38096) Mark McLoughlin 2026-03-25 18:20:48 +00:00
72cad44d3c [Frontend] Move APIServerProcessManager target server fn (#38115) Nick Hill 2026-03-25 11:14:41 -07:00
ba2f0acc2d [Misc] Reorganize inputs (#35182) Cyrus Leung 2026-03-26 01:22:54 +08:00
678b3c99e8 [MoE Kernel] Flashinfer nvfp4 cutedsl moe kernel integration (#38050) Yongye Zhu 2026-03-25 13:16:40 -04:00
bf4cc9ed2d [2/n] Migrate per_token_group_quant to torch stable ABI (#36058) mikaylagawarecki 2026-03-25 13:15:13 -04:00
1ac2ef2e53 [CI/Docs] Improve aarch64/DGX Spark support for dev setup (#38057) Ben Browning 2026-03-25 12:24:42 -04:00
6e37c46b35 [compile] Add some more startup tests for top models (#38046) Richard Zou 2026-03-25 12:02:22 -04:00
1bf2ddd0ee [Refactor] Rename WAITING_FOR_FSM to WAITING_FOR_STRUCTURED_OUTPUT_GRAMMAR (#38048) Wentao Ye 2026-03-25 11:41:44 -04:00
e7221180e1 [Kernel] Optimize SM120 CUTLASS blockwise FP8 GEMM (#37970) Necofish 2026-03-25 23:20:04 +08:00
4a76ad12e0 [Bugfix] Preserve CUDA arch suffix (a/f) for SM12x — fixes NVFP4 NaN on desktop Blackwell (#37725) RobTand 2026-03-25 11:18:25 -04:00
d7e93e13fb [Feature] EPLB Support for GPU Model Runner v2 (#37488) Wentao Ye 2026-03-25 11:16:39 -04:00
cd7643015e [Feature] Support per-draft-model MoE backend via --speculative-config (#37880) Andrii Skliar 2026-03-25 15:31:52 +01:00
a1a2566447 [Docs] Add guide for editing agent instruction files (#37819) Ben Browning 2026-03-25 09:54:09 -04:00
b745e8b5d3 [KVTransfer][Mooncake] Add heterogeneous TP support for disaggregated P/D in MooncakeConnector (#36869) yjz 2026-03-25 21:24:07 +08:00
d215d1efca [Mypy] Better fixes for the mypy issues in vllm/config (#37902) Harry Mellor 2026-03-25 13:14:43 +00:00
34d317dcec [CPU][UX][Perf] Enable tcmalloc by default (#37607) Fadi Arafeh 2026-03-25 12:39:57 +00:00
7ac48fd357 [Model] Add AutoWeightsLoader support for jais (#38074) grYe99 2026-03-25 20:38:40 +08:00
d6bb2a9d9a Fix Plamo 2/3 & LFM2 for Transformers v5 (#38090) Harry Mellor 2026-03-25 12:29:49 +00:00
1e673a43ce Better weight tying check for multimodal models (#38035) Harry Mellor 2026-03-25 12:07:23 +00:00
04417ecd5f [ROCm][CI] Rename filepath test to point to correct file (#38102) Andreas Karatzas 2026-03-25 07:05:46 -05:00
242c93f744 [Docs] Adds vllm-musa to custom_op.md (#37840) R0CKSTAR 2026-03-25 19:54:36 +08:00
a889b7f584 [Bugfix] Pass drafter quant_config to ParallelLMHead in Eagle3 (#37280) Matthias Gehre 2026-03-25 12:42:58 +01:00
ba2910f73a Fix offline mode test for Transformers v5 (#38095) Harry Mellor 2026-03-25 11:39:48 +00:00
f262a62aa1 [ROCm][CI] Fix flaky Cohere/OpenAI embedding parity test (#37616) Andreas Karatzas 2026-03-25 05:55:51 -05:00
9ac2fcafbb [CI] Fix realtime WebSocket timeout deadlock and unhandled model validation errors (#37483) Andreas Karatzas 2026-03-25 05:24:33 -05:00
e9ae3f8077 [Hardware][XPU] Align memory usage with cuda on xpu (#37029) Kunshang Ji 2026-03-25 18:14:29 +08:00
04cec4f927 [ROCm][CI] Increase OpenAPI schema test timeouts (#38088) Andreas Karatzas 2026-03-25 05:06:58 -05:00
14771f7150 [XPU] support MLA model on Intel GPU (#37143) Kunshang Ji 2026-03-25 17:43:42 +08:00
189ddefbfd [ROCm] Attention selector reordering (#36702) Gregory Shtrasberg 2026-03-25 04:42:56 -05:00
09c3dc9186 [Revert] Remove CUDA torch fallbacks for fp8_mqa_logits/fp8_paged_mqa_logits_torch function (#37968) Chauncey 2026-03-25 14:19:37 +08:00
42e9547976 [ROCm][Test] Fix ROCM_AITER_UNIFIED_ATTN attn+quant fusion test (#37640) vllmellm 2026-03-25 13:06:15 +08:00
a32783bb35 [Bugfix] Fix IndexError when accessing prev_tool_call_arr in OpenAIToolParser (#37958) Chauncey 2026-03-25 12:06:21 +08:00
9d0351c91d [Docs] Add Encoder (ViT) CUDA Graphs section to CUDA Graphs design doc (#37914) Baorun (Lauren) Mu 2026-03-24 22:53:24 -04:00
ccbc5ac449 [Bugfix] Fix mock.patch resolution failure for standalone_compile.FakeTensorMode on Python <= 3.10 (#37158) Dimitrios Bariamis 2026-03-17 21:13:06 +01:00
a93a53f8a1 [Performance] Auto-enable prefetch on NFS with RAM guard (#37673) Artem Perevedentsev 2026-03-25 02:31:14 +02:00
679c6a3ecc [Bugfix][ROCm][MoE] Fix mxfp4 oracle regressions from #37128 (#37787) Andreas Karatzas 2026-03-24 19:17:33 -05:00
8bbb7c7f20 [ROCm][CI][PD] Add Hybrid SSM integration tests to CI (#37924) Andreas Karatzas 2026-03-24 18:58:39 -05:00
af945615b5 [release] Move the rest of release jobs to release queue (#38044) Kevin H. Luu 2026-03-24 16:40:58 -07:00
82580b10ac [Perf] Disable inductor runtime asserts by default for serving perfor… (#37485) Terry Gao 2026-03-24 16:37:51 -07:00
a0d487b2e1 nano_nemotron_vl: suppress readonly torch.from_numpy() warning in image and video resize paths (#37903) Netanel Haber 2026-03-25 01:25:56 +02:00
b73b5b0629 Make microbatch optimization (DBO) work with general models (#37926) Junhao 2026-03-24 17:40:08 -04:00
0f0e03890e [UX] Add flashinfer-cubin as CUDA default dep (#37233) Michael Goin 2026-03-24 22:13:08 +01:00
4b53740d7f [MRV2] Fix for DS v3.2 (#38030) Woosuk Kwon 2026-03-24 14:03:24 -07:00
4e824d1c83 [Model Runner V2][Minor] Simplify PP logic (#38031) Nick Hill 2026-03-24 13:57:17 -07:00
0c1809c806 Add Ubuntu 24.04 support for Docker builds (#35386) amey asgaonkar 2026-03-24 13:34:44 -07:00
8c47fdfdb1 [FlexAttention] allow custom mask mod (#37692) liangel-02 2026-03-24 16:03:24 -04:00
54b0578ada [Bugfix] Pass hf_token through config loading paths for gated model support (#37920) Javier De Jesus 2026-03-24 20:22:05 +01:00
89f572dbc0 [BugFix] fix VLLM_USE_STANDALONE_COMPILE=0 (#38015) Richard Zou 2026-03-24 15:08:26 -04:00
71a4a2fbd0 [BugFix] Fix order of compile logging (#38012) Richard Zou 2026-03-24 14:58:18 -04:00
935c46dd9b [Model] Add Granite 4.0 1B speech to supported models (#38019) Nick Cao 2026-03-24 14:23:41 -04:00
057fc94cbd [Bugfix] Fix structured output crash on CPU due to pin_memory=True (#37706) Willy Hardy 2026-03-24 13:44:17 -04:00
b58c5f28aa docs: fix broken offline inference paths in documentation (#37998) Vineeta Tiwari 2026-03-24 23:05:14 +05:30
c07e2ca6e0 Fix Mamba state corruption from referencing stale block table entries (#37728) (#37728) (#37728) Ming Yang 2026-03-24 10:29:59 -07:00
4df5fa7439 [Bugfix] Force continuous usage stats when CLI override is enabled (#37923) Dhruv Singal 2026-03-24 10:29:50 -07:00
a5416bc52e [XPU] Support Intel XPU hardware information collection in usage stats (#37964) sihao_li 2026-03-25 01:29:17 +08:00
b3601da6e7 [Mypy] Fix mypy for vllm/model_executor (except vllm/model_executor/layers) (#37904) Harry Mellor 2026-03-24 17:14:01 +00:00
dc78c2c933 [Core] add option to schedule requests based on full ISL (#37307) Dan Blanaru 2026-03-24 18:01:12 +01:00
4731884796 [Feature] limit thinking tokens (hard limit) (#20859) Sungjae Lee 2026-03-25 01:53:07 +09:00
8de5261e69 Update new contributor message (#37999) Harry Mellor 2026-03-24 16:01:41 +00:00
1b6cb920e6 [Deprecate] Deprecate pooling multi task support. (#37956) wang.yuqi 2026-03-24 22:07:47 +08:00
352b90c4a4 [Bugfix] Add replacement of _compute_slot_mapping_kernel on CPU (#37987) Li, Jiang 2026-03-24 22:00:20 +08:00
1c0aabdeb0 [Bugfix] Suppress spurious CPU KV cache warning in launch render (#37911) Sage 2026-03-24 14:36:18 +02:00
