Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

e568cf88bc [UX] Infer dtype for local checkpoint (#36218) Isotr0py 2026-03-11 16:50:04 +08:00
098d844731 [NIXL][1/N] Refactor kernel_block_size detection (#35752) Nicolò Lucchesi 2026-03-11 09:11:23 +01:00
a40ee486f2 [Bugfix] Add Multiple of 16 block_size to triton fallback on rocm Attention to support qwen3_5 (#35923) JartX 2026-03-11 08:45:57 +01:00
eac2dc2b41 AITER MLA backend: Avoid CPU sync in _build_decode (#35765) pschlan-amd 2026-03-11 08:25:00 +01:00
d5080aeaa4 [Refactor] Remove deadcode in Responses API serving (#36726) Flora Feng 2026-03-11 03:11:41 -04:00
f22d6e0267 [Hardware][NIXL] set default kv buffer type for different platform (#36438) liuzhenwei 2026-03-11 13:19:28 +08:00
76c6e6da08 [XPU] Support block fp8 moe by fallback to TritonExpert on XPU (#36458) Kunshang Ji 2026-03-11 12:54:09 +08:00
4184653775 feat: add RISC-V support for CPU backend (v2) (#36578) typer-J 2026-03-11 12:51:39 +08:00
4aaaf8c8ce feat(spec_decode): fuse EAGLE step slot mapping and metadata updates (#33503) Sladyn 2026-03-10 21:35:33 -07:00
4bf533623b [Doc] Fix duplicate words in comments (#36713) Hongbin Guo 2026-03-11 12:28:31 +08:00
5f77ef15ae [Misc][Attention] Clean up unused method in CPU_ATTN (#36673) Matthew Bonanni 2026-03-11 00:27:22 -04:00
7d6abdd022 [Fix] Use torch.empty for output in attention+quant fusion (#31785) elvischenv 2026-03-11 12:26:14 +08:00
a8ff2cca92 [Perf] Optimize scheduler overhead for PD disaggregation, around 5% E2E perf improvement (#35781) Wentao Ye 2026-03-11 00:25:30 -04:00
42fadebecb [Model] Add support for moonshotai/Kimi-Audio-7B-Instruct (#36127) tunglinwood 2026-03-11 12:24:48 +08:00
a197eda9c3 Add tuned H100 MoE configs for LFM2 8B and 24B (#36699) tianshu-Michael-yu 2026-03-10 21:22:02 -07:00
1ff2393897 [ci] Bound nvidia-cudnn-frontend version (#36719) Kevin H. Luu 2026-03-10 21:17:35 -07:00
5bec0b0ba3 [DSV3.2][MTP] Optimize Indexer MTP handling (#36723) Benjamin Chislett 2026-03-11 00:16:56 -04:00
82b110d50e [ci] Bound nvidia-cudnn-frontend version (#36719) Kevin H. Luu 2026-03-10 21:17:35 -07:00
9040cd40af [DSV3.2][MTP] Optimize Indexer MTP handling (#36723) Benjamin Chislett 2026-03-11 00:16:56 -04:00
fa0d353acf [Bugfix] Surface exceptions from non-blocking execute_model in UniProcExecutor to avoid DP deadlocks (#35194) fangyuchu 2026-03-11 11:22:21 +08:00
b386bb3d7c fix bugs when token_classify & classify run concurrently (#36614) Augusto Yao 2026-03-11 11:16:34 +08:00
fe714dd507 [openapi server] log exception in exception handler(2/N) (#36201) Ning Xie 2026-03-11 11:16:30 +08:00
8ab3d7427c [Bugfix] Fix DeepSeek V3.2 OOM during CG memory profiling (#36691) Matthew Bonanni 2026-03-10 23:01:07 -04:00
6da1310f91 [Bug] Fix TRTLLM Block FP8 MoE Monolithic (#36296) Wei Zhao 2026-03-10 22:04:47 -04:00
84e436ed1c [Bug] Fix TRTLLM Block FP8 MoE Monolithic (#36296) Wei Zhao 2026-03-10 22:04:47 -04:00
81939e7733 [ROCm][CI] Making some tests optional to reduce workload (#36090) Andreas Karatzas 2026-03-10 18:45:27 -05:00
195d1ca3e8 [Minor] Enhance error message for TRTLLM decode uniformity check (#36609) Woosuk Kwon 2026-03-10 15:38:45 -07:00
8d983d7cd6 [Model Runner V2] Add initial CI tests (#36041) Nick Hill 2026-03-10 14:55:21 -07:00
65b2f405dc [Core] Simplify core kv-cache blocks initialization logic (#36521) Nick Hill 2026-03-10 13:20:02 -07:00
bc46be5daf Revert "add nemotron v3 reasoning parser (#36393)" khluu 2026-03-10 11:47:09 -07:00
2a68464c5b [Test] test_async_scheduling.py improvements (#36340) Nick Hill 2026-03-10 11:17:26 -07:00
8e39d39fd4 add nemotron v3 reasoning parser (#36393) Shaun Kotek 2026-03-10 00:11:41 +02:00
bdd8981dab [compile] Apply stored functorch config while finalizing loaded artifacts. (#36582) Zhengxu Chen 2026-03-10 12:34:35 -04:00
f088a831dd [Model Runner V2] Use unpadded num_tokens for PW CUDA graph attn metadata (#36626) Woosuk Kwon 2026-03-10 09:30:56 -07:00
46fa044cc1 [BUGFIX][Mamba][Qwen3.5] Zero freed SSM cache blocks on GPU (#35219) Vadim Gimpelson 2026-03-10 14:32:20 +04:00
ab43e37158 Fix: Re-Enable EP for trtllm MoE FP8 backend (#36494) amirkl94 2026-03-10 08:11:27 +02:00
f45d010120 Fix/resupport nongated fused moe triton (#36412) Shaun Kotek 2026-03-09 20:01:18 +02:00
244b922088 [Bugfix] Fix passing of activation_type to trtllm fused MoE NVFP4 and FP8 (#36017) amitz-nv 2026-03-05 00:23:51 +02:00
f83b933b84 [CI] Bump mypy version to 1.19.1 (#36104) v0.17.1rc0 Harry Mellor 2026-03-10 16:18:28 +00:00
82f3f30e26 [ROCm][Perf] Enable sparse_mla's cudagraph on ROCm platform (#35719) Pleaplusone 2026-03-11 00:14:35 +08:00
9095cbbfb6 [Bugfix][Sparse MLA] report indexer CG support properly (#36519) Matthew Bonanni 2026-03-10 12:14:31 -04:00
721ae79f50 Improvements to wvSplitKrc skinny GEMM solution (#34304) Hashem Hashemi 2026-03-10 09:14:27 -07:00
aefc59f088 FunASR model bugfix (#36633) AllenDou 2026-03-10 23:14:21 +08:00
d88f28da05 Fix hf_override_fn when it modifies model_type (#35200) Harry Mellor 2026-03-10 15:03:18 +00:00
106ff69c4e feat(kv-offload): Strategy A — StoreReusedOffloadingManager gates CPU stores on reuse frequency (#35342) Srinivasoo7 2026-03-10 09:43:40 -05:00
ca5fb4bbd8 [Bugfix] Avoid merging empty-only partitions into splitting-op subgraphs (#36595) Jiangyun Zhu 2026-03-10 22:39:01 +08:00
cf88b23749 fix: check HTTP status in batch read_file to prevent silent failures (#36397) Alvin Tang 2026-03-10 22:22:40 +08:00
a3189a08b0 [Model] Consolidate score logic by introduce score_type (#36479) wang.yuqi 2026-03-10 21:32:25 +08:00
409c4e632d [Misc] fix typo: homogenous-> homogeneous (2 lines change) (#36508) SoluMilken 2026-03-10 21:25:37 +08:00
8850738b70 [Bugfix] Fix processor signature (#36630) Raushan Turganbay 2026-03-10 14:20:47 +01:00
234860399b [Frontend][Core] Revert "Add shutdown timeout" (#34730 and #36270) (#36628) Mark McLoughlin 2026-03-10 13:20:41 +00:00
c88510083b Fix Qwen2.5-VL test for Transformers v5 (#36532) Harry Mellor 2026-03-10 12:05:34 +00:00
4ff8c3c8f9 [BUGFIX][Mamba][Qwen3.5] Zero freed SSM cache blocks on GPU (#35219) Vadim Gimpelson 2026-03-10 14:32:20 +04:00
507ddbe992 feat(grpc): extract gRPC servicer into smg-grpc-servicer package, add --grpc flag to vllm serve (#36169) Chang Su 2026-03-10 03:29:59 -07:00
ddbb0d230a [Model Runner V2] Fix mm input embeddings lookup (#36588) Nick Hill 2026-03-10 00:24:58 -07:00
9efc3bdcd6 [Model Runner V2] Fix _compute_slot_mappings_kernel for chunked prefill (#36580) Nick Hill 2026-03-10 00:23:42 -07:00
156e33553c Fix: Re-Enable EP for trtllm MoE FP8 backend (#36494) amirkl94 2026-03-10 08:11:27 +02:00
d0cd736caa [Bugfix] Fix RuntimeError: Already borrowed that degrades VLM serving throughput under concurrent load. (#36557) hallerite 2026-03-09 22:30:51 -07:00
195c997203 Fix LFM2 MoE test for Transformers v5 (#36534) Harry Mellor 2026-03-10 05:29:17 +00:00
04b67d8f62 Remove unused disable_fallback field (#36546) Zhuohan Li 2026-03-09 20:56:54 -07:00
7279374f91 [Perf] Compute maxsim in worker side, reducing redundant copies, 2.7% E2E throughput improvement (#36159) Wentao Ye 2026-03-09 23:55:58 -04:00
006aea17d7 [BugFix] Remove incorrect assert in split_decodes_and_prefills (#36553) Woosuk Kwon 2026-03-09 20:02:02 -07:00
0836be3b03 [Model] Add HyperCLOVAX-SEED-Think-32B vision-language model support (#31471) Hojin Yang 2026-03-10 11:59:19 +09:00
4e95ec111c [Bugfix] Fix Qwen3-Next in_proj_ba weight sharding with TP > 1 (#36242) Ajay Anubolu 2026-03-09 19:16:26 -07:00
179547d62c [ROCm][CI] Fix ROCm GPT-OSS Eval test group (#36179) Andreas Karatzas 2026-03-09 19:55:20 -05:00
f85b4eda3a [bugfix] fix nvlink for nixl/ucx (#36475) youkaichao 2026-03-10 07:49:47 +08:00
2a194ddd72 [Model Runner V2] Add model_state inputs to CUDA graph capture (#36544) Woosuk Kwon 2026-03-09 15:14:51 -07:00
203a7f27da add nemotron v3 reasoning parser (#36393) Shaun Kotek 2026-03-10 00:11:41 +02:00
483463f735 [MRV2] Extensible CG dispatch rework (#35959) Lucas Wilkinson 2026-03-09 16:58:45 -04:00
4e571ce643 [MTP][Misc] Clean up dead code (#36507) Matthew Bonanni 2026-03-09 14:43:06 -04:00
4ff9b045fe [ROCm][CI] Prep Tests For Change To ROCM_ATTN As New Default Backend On ROCm (#36025) Micah Williamson 2026-03-09 13:27:55 -05:00
3fd03f1ec2 [BE] Rename should_torch_compile_mm_vit to should_torch_compile_mm_encoder (#36281) Lucas Kabela 2026-03-09 11:22:05 -07:00
10a5f4d53d [Model Runner V2] Use NamedTuple for execute_model_state (#35930) Woosuk Kwon 2026-03-09 11:17:34 -07:00
fe0c085c28 [Docs] Remove the reo beacon (#36528) Simon Mo 2026-03-09 11:16:50 -07:00
8d6b3d5dda [Misc] Refactored 5 duplicate helper functions that were copied-pasted across multiple parsers (#36436) Taneem Ibrahim 2026-03-09 14:14:11 -04:00
4b87ffbefb [torch.compile] Rename compile_ranges_split_points to compile_ranges_endpoints (#36027) Copilot 2026-03-09 18:04:40 +00:00
fa028207aa Fix/resupport nongated fused moe triton (#36412) Shaun Kotek 2026-03-09 20:01:18 +02:00
d460a18fc6 [Docs] Expand --allowed-media-domains security guidance with threat details (#36506) Russell Bryant 2026-03-09 13:43:42 -04:00
6e956d9eca [Model Runner V2] Add dummy profile_cudagraph_memory API (#36520) Woosuk Kwon 2026-03-09 10:20:13 -07:00
1e0f917b34 [ROCm][CI] Fix logprob divergence for TitanML/tiny-mixtral under AITER rms_norm (#36101) Andreas Karatzas 2026-03-09 12:07:44 -05:00
c174d54f86 [ROCm][CI] Fix ROCm attention backend validation for head sizes, block sizes, and compute capability checks (#36292) Andreas Karatzas 2026-03-09 12:02:41 -05:00
55d27cca55 [Misc] fix typo: dependant -> dependent (2 lines change) (#36511) SoluMilken 2026-03-10 01:00:12 +08:00
580864d81e [Attention][Perf][Kernel] Replace torch.cat with vectorized CUDA kernel MLA query concat - DeepSeek-V3.2 (#34917) Roberto L. Castro 2026-03-09 17:50:36 +01:00
2b28b9b269 [Attention][Perf] Optimize cp_gather_and_upconvert_fp8_kv_cache - DeepSeek-v3.2 (#35290) Roberto L. Castro 2026-03-09 17:46:57 +01:00
70485a11bd [ROCM] Optimize the fused_topk_bias to use aiter instead of fallback torch ops. (#36253) Taoyu Zhu 2026-03-10 00:30:35 +08:00
74a9f54cdb [CI] Fix edge case that could lead to broken docs builds on main (#36515) Harry Mellor 2026-03-09 16:06:19 +00:00
00c4cb5606 [Bugfix] Clear stale CG keys after memory profiling (#36416) Matthew Bonanni 2026-03-09 11:56:00 -04:00
941e52c298 [Refactor] Simplify chat_completion_full_generator for tool parsers (#35634) Wentao Ye 2026-03-09 11:33:46 -04:00
be292b7c14 [Bug] Fix pooling model benchmark script (#36300) Wentao Ye 2026-03-09 11:17:45 -04:00
77a73458e3 Reapply [Attention] Refactor check_and_update_config (#35122) Matthew Bonanni 2026-03-09 10:17:14 -04:00
5578f2a4d3 Support online use_audio_in_video (#36319) Tianyu Guo 2026-03-09 22:16:44 +08:00
3ec2115015 [Frontend] Move warmup into Renderer (#36482) Cyrus Leung 2026-03-09 21:03:21 +08:00
b0906d8b02 [MM Encoder] Default to use TORCH_SDPA backend for ViT on Volta/Turing GPU (#36472) Isotr0py 2026-03-09 18:43:44 +08:00
aaf5fa9abf [ci] Bound openai dependency to 2.24.0 (#36471) Kevin H. Luu 2026-03-09 03:43:26 -07:00
f96c3ab08c [Deprecation][1/2] Remove items deprecated in v0.18 (#36470) Cyrus Leung 2026-03-09 18:43:23 +08:00
dc6b578466 [Kernel] Add fused_sigmoid_gating_delta_rule_update kernel for Qwen3 Next (#35777) Xin Yang 2026-03-08 23:41:01 -07:00
1bc9c77f6d [XPU] Add test script of PD disaggregation (#36434) liuzhenwei 2026-03-09 13:50:27 +08:00
65a4da1504 [Frontend] Add Support for MM Encoder/Decoder Beam Search (Online Transcriptions) (#36160) Alex Brooks 2026-03-08 23:46:23 -06:00
217f27598d [Bugfix] Avoid to replace non-tensor members in cpu model runner (#36430) Li, Jiang 2026-03-09 13:06:28 +08:00
fff3711a24 [Frontend][2/n] Improve pooling entrypoints | embed. (#36110) wang.yuqi 2026-03-09 11:42:19 +08:00
