Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

14acf429ac [EPLB] Remove main waits in case of slow EPLB (#36271) Ilya Markov 2026-03-24 12:50:44 +01:00
ce57fd5557 [Docs] Fix build (#37991) Harry Mellor 2026-03-24 10:20:49 +00:00
2e67fa756d Fix tool_parser_cls type annotation from Callable to type[ToolParser] (#37957) Flora Feng 2026-03-24 01:58:27 -04:00
e3c6c10cad [KV Offload] Refactor CPU offloading: pluggable CachePolicy, remove Backend abstraction, restructure into cpu/ package (#37874) Ronen Schaffer 2026-03-24 07:02:51 +02:00
16a664df24 [Frontend][Bugfix] Pass default_chat_template_kwargs to AnthropicServingMessages (#37899) jetxa 2026-03-24 13:00:12 +08:00
7281199a8c [release] Move agent queue to Release cluster queues (#37783) Kevin H. Luu 2026-03-23 20:36:47 -07:00
b2dd75eb48 Downsize CPU jobs to use small queue (#37913) Kevin H. Luu 2026-03-23 20:36:37 -07:00
c59a132f96 [V0 Deprecation] Refactor kv cache from list to element (#37487) Wentao Ye 2026-03-23 23:10:11 -04:00
de99d91ece [ROCm][CI] Split Entrypoints Integration (API Server 1) into 3 jobs (#37906) Andreas Karatzas 2026-03-23 20:48:37 -05:00
83c9d525b6 [CI] Add batch invariant test: Block FP8 + small MOE (#37895) Wentao Ye 2026-03-23 21:16:14 -04:00
8f4824b664 [Model Runner V2] Gather multimodal embeddings before draft model postprocess (#37932) Giancarlo Delfin 2026-03-23 18:14:13 -07:00
56777b5c89 [Test] E2E Nemotron-3-Super tests (#36803) roikoren755 2026-03-24 02:49:56 +02:00
2488a82f89 [CI] Split V1 Others into 3 separate jobs (#37016) Kevin H. Luu 2026-03-23 15:44:38 -07:00
dc6908ac6a [Bugfix] Register VLLM_BATCH_INVARIANT in envs.py to fix spurious unknown env var warning (#35007) Ranran 2026-03-23 17:31:14 -05:00
e85f8f0932 [Bug][MoE] Strengthen _supports_current_device() checks in the TRTLLM FP8, NVFP4, and FlashInfer CuteDSL MoE experts (#36728) yzong-rh 2026-03-23 17:02:57 -04:00
5bf3c42d4c [Bug][MoE] Fix TRTLLM NVFP4 Routing Kernel Precision (#36725) Robert Shaw 2026-03-23 21:19:06 +01:00
38364a7e32 [Sparse24] [Deprecation] Remove Sparse24 CT integration and kernels (#36799) Kyle Sayers 2026-03-23 16:03:29 -04:00
fafe76b4af [Async][Spec Decoding] Zero-bubble async scheduling + spec decoding (#32951) Matthew Bonanni 2026-03-23 15:37:22 -04:00
ffb5b32b5f [MRV2] Consider spec decoding in warmup (#37812) Woosuk Kwon 2026-03-23 10:45:43 -07:00
91fd695b75 [CI] split Entrypoints Integration (API Server 1) into 3 jobs (#37882) Kunshang Ji 2026-03-24 01:37:56 +08:00
1cbbcfe8a3 [CI][PD] Add Hybrid SSM integration tests to CI (#37657) Nicolò Lucchesi 2026-03-23 16:58:19 +01:00
aceadb5ee1 Use lazy graph module during split_module to defer recompile() (#37609) Angela Yi 2026-03-23 08:21:29 -07:00
ec2280611a [Bugfix] Fix RoBERTa position_ids accumulation on CUDA graph padding (#37884) Yufeng He 2026-03-23 23:15:12 +08:00
7151ae6528 [Bugfix] RoBERTa position_id accumulation in CUDA graph padding region (#37873) yanghui1-arch 2026-03-23 22:59:21 +08:00
45bd5c8e75 [Mypy] Fix mypy for vllm/config (#37808) Wentao Ye 2026-03-23 10:33:59 -04:00
10a1018c12 [ROCm] fix sleep mode not releasing GPU memory problem on ROCm (#37533) Zhaodong Bing 2026-03-23 21:07:19 +08:00
aec2dc6c0d [Bugfix][LoRA] Fix incorrect LoRA Log (#37877) Jee Jee Li 2026-03-23 19:42:52 +08:00
7938d12119 [Bugfix] Fix CPU backend crash in KV cache block zeroing (#37550) DorBernsohn 2026-03-23 13:35:45 +02:00
debd6e768c [XPU][MoE Refactor] Refactor xpu mxfp4 support into oracle (#37784) Kunshang Ji 2026-03-23 19:10:41 +08:00
9ace378a63 [Frontend][Responses API] Fix arrival_time recording for TTFT on initial request (#37498) Andrew Xia 2026-03-23 02:58:08 -07:00
27d5ee3e6f [FP8]add FP8 WoQ kernel abstraction. (#32929) Kunshang Ji 2026-03-23 17:47:47 +08:00
35141a7eed [Misc]Update gitignore (#37863) wangxiyuan 2026-03-23 16:14:10 +08:00
e99fb98867 [ROCm] Fix fused_moe_fake signature mismatch and other AITER bugs (#36100) Chuan (Richard) Li 2026-03-23 00:48:31 -07:00
a16133a0f1 [Perf] [Bugfix] Fix Triton autotuning in inference for Qwen3.5 (#37338) Artem Perevedentsev 2026-03-23 09:37:58 +02:00
54ab804e87 [Bugfix] Store Qwen3Next A_log in fp32 (#37810) Hojin Yang 2026-03-23 16:36:57 +09:00
02e6efe56d [Bugfix] JAIS: Only apply ALiBi when position_embedding_type='alibi' (#37820) r266-tech 2026-03-23 15:36:34 +08:00
410d300893 [ROCm][Refactor] Enable AWQMarlinConfig on ROCm to use choose_mp_linear_kernel (#36505) Matthias Gehre 2026-03-23 08:36:08 +01:00
d3fe857135 update doc for online fp8 quantization (#37851) Yan Ma 2026-03-23 13:19:03 +08:00
f85e479e66 [Feature] ViT Full CUDA Graph (#35963) Baorun (Lauren) Mu 2026-03-23 01:01:10 -04:00
1f0d210641 [CI/Build][LoRA] Update Qwen35 LoRA testing (#37816) Jee Jee Li 2026-03-23 12:55:49 +08:00
3bbe2e1e6e [Test] Consolidate tool parser unit tests to tests/tool_parsers (#37834) Ben Browning 2026-03-23 00:24:25 -04:00
6e04e79326 always use embed&token_classify for bge-m3 (#37632) Augusto Yao 2026-03-23 11:10:57 +08:00
e7767eccae Fix AudioFlamingo3/MusicFlamingo HF parity and RoTE handling (#37643) Lasha Koroshinadze 2026-03-22 22:29:07 -04:00
43877a620b [MRV2] Enable PP CUDA graph test (#37830) Woosuk Kwon 2026-03-22 16:30:25 -07:00
63f49b8bd4 [Model Runner V2] Enable piecewise CUDA graphs for pipeline parallelism (#35162) zhanqiuhu 2026-03-22 16:48:25 -04:00
a5e9d511de [MRV2] Use FP64 for Gumbel noise (#37798) Woosuk Kwon 2026-03-22 12:28:10 -07:00
c058ff44d4 [Bigfix]fix lora test by pass padded size back to the layer (#37811) Yongye Zhu 2026-03-22 15:20:13 -04:00
ce9b1d76cf [MRV2] Skip hidden states allocation for PW CUDA graphs (#37818) Woosuk Kwon 2026-03-22 11:47:21 -07:00
e74c17e153 Enable NemotronHPuzzle + NemotronHMTP (#37803) Netanel Haber 2026-03-22 17:13:58 +02:00
eaf4978621 [Test] Only Run MLA model when user explicitly set for batch invariance (#37719) Wentao Ye 2026-03-22 09:09:12 -04:00
77d24c4bfe [Bug] Fix fp8 deepgemm batch invariant (#37718) Wentao Ye 2026-03-22 08:57:20 -04:00
b3e846017d [Model Runner V2] Support multi-modal embeddings for spec decode model (#36097) Giancarlo Delfin 2026-03-22 02:48:43 -07:00
cd1242d82a [ROCm][CI] Stabilize ROCm speech-to-text translation test with lower min acc threshold (#37723) Andreas Karatzas 2026-03-22 04:32:08 -05:00
4383f1532e [MoE] Move PF Methods to Folder (#35927) Robert Shaw 2026-03-22 04:42:59 -04:00
6eedec6e36 [ROCm][CI] Make some duplicated tests optional so that they are only evaluated in our nightly (#37780) Andreas Karatzas 2026-03-22 03:03:18 -05:00
ffc8531524 [ROCm][CI] Added missing resampy dependency for MM audio tests (#37778) Andreas Karatzas 2026-03-22 03:02:41 -05:00
6ecba840d7 [ROCm][CI] get_cu_count was renamed to num_compute_units in #35042 (#37764) Andreas Karatzas 2026-03-22 03:02:21 -05:00
3b06c55c78 [ROCm][CI] Fix MEGA_AOT_ARTIFACT fallback when PyTorch < 2.10.0 lacks AOT support (#37763) Andreas Karatzas 2026-03-22 03:02:03 -05:00
b050700462 [Perf] Optimize glm4.xv VIT (#37779) Yang Liu 2026-03-22 02:12:34 -04:00
5dac719b2b [Bugfix] Handle libsndfile sf_error(NULL) race condition in audio fallback (#37782) Andreas Karatzas 2026-03-22 00:37:29 -05:00
c862481c02 [CI] Skip ISAAC multimodal tests due to broken upstream HF model weights (#37781) Andreas Karatzas 2026-03-22 00:23:32 -05:00
c86b17cfe6 [ROCm][CI] Add large_gpu_mark to test_max_tokens_none for ROCm (#37717) Andreas Karatzas 2026-03-21 23:25:16 -05:00
66f927f205 [Bugfix] Fix pooling non-determinism from pinned prompt_lens aliasing (#37775) Andreas Karatzas 2026-03-21 22:22:24 -05:00
e78bc74268 [ROCm][CI] close missing quote in kernels/moe block in run-amd-test.sh (#37774) Andreas Karatzas 2026-03-21 20:42:34 -05:00
6b2fa3a762 [MoE] Move FlashInfer CuteDSL experts into fused_moe/experts/ (#37759) Robert Shaw 2026-03-21 19:15:16 -04:00
eeee5b262d [Quantization][Deprecation] Remove PTPC FP8 (#32700) Robert Shaw 2026-03-21 18:10:16 -04:00
5ad0446572 Revert "Consolidate AWQ quantization into single awq_marlin.py file" (#37768) Robert Shaw 2026-03-21 17:20:41 -04:00
8cc700dd6a Consolidate AWQ quantization into single awq_marlin.py file Robert Shaw 2026-03-21 17:09:17 -04:00
80b70884eb Add tensor IPC transfer mechanism for multimodal data (#32104) Brandon Pelfrey 2026-03-21 13:10:20 -07:00
61e381dcf0 [Perf] Add SM 10.3 (B300/GB300) all-reduce communicator tuning (#37756) Mohammad Miadh Angkad 2026-03-22 03:43:47 +08:00
88f1b374f5 [Core] Enable allreduce fusion by default for SM 10.3 (B300/GB300) (#37755) Mohammad Miadh Angkad 2026-03-22 03:40:37 +08:00
298e510848 [Hybrid] calling get_mamba_groups() once at MambaCopyBuffers.create() (#37318) (tag: v0.18.1rc0) Francesco Fusco 2026-03-21 10:29:43 +01:00
3982bc2cd0 [ROCm] Enable DeepEP ROCm as all2allbackend for AMD GPUs. (#34692) Chaitanya Sri Krishna Lolla 2026-03-21 13:02:31 +05:30
02eec7ecbe [ROCm][CI] Update GSM8K eval config to use fp8-and-mixed models list (MI355) (#37721) Andreas Karatzas 2026-03-21 02:27:12 -05:00
17ee641c45 [Responses API] Add kv_transfer_params for PD disaggregation (#37424) Bongwoo Bak 2026-03-21 14:48:54 +09:00
0d50fa1db6 [ROCm][CI] Mark gemma3 as large GPU test to avoid OOM on MI250 (#37610) Andreas Karatzas 2026-03-20 23:57:25 -05:00
1fa1e53a73 Revert "[compile] Initialize passes at VllmBackend init" (#37733) Simon Mo 2026-03-20 21:35:49 -07:00
3ffa52009f [ROCm][CI] Guard CudaPlatform/RocmPlatform imports to fix test collection on cross-platform builds (#37617) Andreas Karatzas 2026-03-20 22:58:58 -05:00
87bd91892f [MoE Refactor] Mxfp4 oracle rebased (#37128) Yongye Zhu 2026-03-20 22:37:04 -05:00
c7f98b4d0a [Frontend] Remove librosa from audio dependency (#37058) Isotr0py 2026-03-21 11:36:15 +08:00
1c472f8fe1 Add get_device_uuid for rocm (#37694) tmm77 2026-03-20 23:33:16 -04:00
c57d38d603 elastic_ep: Fix issues with repeated scale up/down cycles (#37131) Itay Alroy 2026-03-21 01:13:02 +02:00
e5ed6c6c13 [BugFix] Allow qk_nope_head_dim=192 in FlashInfer MLA backend checks (#37475) Kaihang Jiang 2026-03-20 18:14:55 -04:00
b3d0b37908 [Refactor] Remove unused dead code (#36171) Wentao Ye 2026-03-20 18:12:51 -04:00
85f671b8e1 [Model Runner V2] Support Streaming Inputs (#37028) Santino Ramos 2026-03-20 13:42:25 -07:00
8bc6b5cdb0 [ROCm][CI] Setting some mi325_4 tests back to optional (in parity with upstream) (#37711) Andreas Karatzas 2026-03-20 14:25:08 -05:00
4f16ebbbd3 [Bugfix] Disable monolithic TRTLLM MoE for Renormalize routing (#37591) (#37605) Vadim Gimpelson 2026-03-20 23:19:26 +04:00
12fd17eb51 [compile] Initialize passes at VllmBackend init (#35216) Angela Yi 2026-03-20 11:40:33 -07:00
37aadf6237 [Model] Update Kimi-K25 and Isaac processors to fit HF-style (#37693) Cyrus Leung 2026-03-21 02:30:22 +08:00
d7d2b5e405 [Bugfix] Disable --calculate-kv-scales for hybrid GDN/Mamba+Attention… (#37565) Le Yang 2026-03-21 02:28:34 +08:00
6ec5e9fd37 refactor: abstract deepgemm support into platform (#37519) SherryC41 2026-03-21 01:54:08 +08:00
e1d85e5c24 [Attention] Support distinguishing between short extends and decodes (#37303) Lucas Wilkinson 2026-03-20 10:49:36 -07:00
79eb9369c5 fix CUDAGraph memory being counted twice (#37426) Peter Pan 2026-03-21 01:36:32 +08:00
e80cfe575d [MRV2] Avoid recompilation of _gather_block_tables_kernel (#37645) Woosuk Kwon 2026-03-20 10:31:45 -07:00
d0532bf38d [Perf] Eliminate redundant SparseMatrix creation in gpt_oss_triton_kernels (#37683) Xin Yang 2026-03-20 10:28:41 -07:00
fb4e8bf442 [ROCm][CI] Fix accuracy for llama-nemotron-vl pooling tests (#37613) Andreas Karatzas 2026-03-20 12:16:59 -05:00
6ade4bc5a5 Fix various config related issues for Transformers v5 (#37681) Harry Mellor 2026-03-20 16:30:12 +00:00
2e089b96a8 [compile] Add compiled artifact counter for VLLM_USE_MEGA_AOT_ARTIFACT=1. (#37589) Zhengxu Chen 2026-03-20 12:22:46 -04:00
880be2b1b8 [Metrics] Some small refactoring for better maintainability (#33898) Martin Hickey 2026-03-20 16:11:34 +00:00
c0f5fae601 [compile] Fix aot test failures with torch 2.12. (#37604) Zhengxu Chen 2026-03-20 12:06:29 -04:00
