biondizzle/vllm

Commit Graph

d2f4a71cd5 [Bugfix] Kimi-K2 grouped_topk usage for Flashinfer monolithic kernels. (#33858) Pavani Majety 2026-02-05 01:32:10 -08:00
2abd97592f [KV Connector][Metrics] Do not count local prefix cache hits in connector queries (#30522) Mark McLoughlin 2026-02-05 07:57:27 +00:00
6abb0454ad [Perf] Optimize the performance of structured output + reasoning (#33557) Chauncey 2026-02-05 15:45:29 +08:00
db6f71d4c9 [CI/Build] Fix CPU CI test case title (#33870) Li, Jiang 2026-02-05 15:07:14 +08:00
fd03538bf9 [CPU][BugFix] Allow w8a8 oneDNN quantized matmul to support 3D inputs (#33727) Fadi Arafeh 2026-02-05 06:26:09 +00:00
1f70313e59 [Bugfix] Fix ScoreMultiModalParam multi-document scoring returning single result (#33837) Andreas Karatzas 2026-02-05 00:17:00 -06:00
07daee132b [CI/Build] Parallelize CPU CI tests (#33778) Li, Jiang 2026-02-05 13:53:48 +08:00
9595afda18 [2/N] move responses/serving _make_response_output_items logic to parser (#33281) Andrew Xia 2026-02-05 00:46:15 -05:00
c1395f72cd [CI][AMD][BugFix] Ensure VLLM_ROCM_USE_AITER is set so test_rocm_aiter_topk.py can run correctly (#33840) rasmith 2026-02-04 23:05:48 -06:00
007b183d74 [docs] fix unintentional misspellings (#33863) rinbaro 2026-02-04 20:50:59 -08:00
add9f1fbd9 [Minor] Include StreamingInput in inputs package (#33856) Nick Hill 2026-02-04 20:38:20 -08:00
e3bf79ffa0 Revert "[Attention][FA3] Update FA3 to include new swizzle optimization" (#33841) Luka Govedič 2026-02-04 22:54:27 -05:00
fb1270f1f8 [CI][Bugfix]: return McpCall for built-in MCP tools in non-streaming mode (#32762) Andreas Karatzas 2026-02-04 21:14:06 -06:00
72bb24e2db [release] Minor fixes to release annotation (#33849) Kevin H. Luu 2026-02-04 18:07:35 -08:00
a7be77beef [Bugfix] fix DeepSeek R1 with CUTLASS MLA Broken on B200 (#33637) Chauncey 2026-02-05 09:28:36 +08:00
bbe0574d8e [Bugfix] Disable TRTLLM attention when KV transfer is enabled (#33192) v0.15.2rc0 zhanqiuhu 2026-02-04 19:49:18 -05:00
4d9513537d [CI][torch.compile] Reduce e2e fusion test time (#33293) Luka Govedič 2026-02-04 19:09:03 -05:00
439afa4eea feat: Add ColBERT late interaction model support (#33686) Ilya Boytsov 2026-02-05 01:05:13 +01:00
fa4e0fb028 [Core] Don't schedule spec tokens with prefill chunks (#33652) Nick Hill 2026-02-04 15:40:22 -08:00
ce498a6d61 Change the type signature of MixtureOfExperts.expert_weights to MutableSequence[Sequence[Tensor]] (#33573) Sage Moore 2026-02-04 14:02:46 -08:00
9f14c9224d Revert "[torch.compile] Significantly speed up cold start times" (#33820) Richard Zou 2026-02-04 13:59:59 -08:00
535de06cb1 [Model] Add transcription support for Qwen3-Omni (#29828) Muhammad Hashmi 2026-02-04 13:17:47 -08:00
4292c90a2a [Bugfix] Support RotaryEmbedding CustomOp for gpt-oss (#33800) Simon Danielsson 2026-02-04 21:17:41 +01:00
6e98f6d8b6 Implement zero-copy GQA for multimodal and CPU (#33732) Taeksang Kim 2026-02-05 05:11:39 +09:00
2f6d17cb2f [rocm][ray] Fix: Unify Ray device visibility handling across CUDA and ROCm (#33308) kourosh hakhamaneshi 2026-02-04 10:09:14 -08:00
192ad4648b [Bugfix] Fix interns1-pro initialization and PP (#33793) Isotr0py 2026-02-05 01:54:45 +08:00
0e92298622 [Misc] Delay deprecation of CommonAttentionMetadata properties (#33801) Lucas Wilkinson 2026-02-04 09:41:57 -07:00
87d9a26166 [Bugfix] Fix ubatch wrapper num_tokens calculate (#33694) jiangkuaixue123 2026-02-05 00:41:45 +08:00
80f921ba4b [Bugfix] Fix normalize still being passed to PoolerConfig (#33794) Cyrus Leung 2026-02-04 23:56:02 +08:00
711edaf0d0 [Perf] Optimize spec decoding + async scheduling, 1.5% Throughput improvement (#33612) Wentao Ye 2026-02-04 09:34:32 -05:00
1d367a738e [Bugfix][ROCm] Include float8_e4m3fnuz in NCCL Dtype Dispatching (#33713) Micah Williamson 2026-02-04 07:36:29 -06:00
32a02c7ca2 Apply #33621 to main (#33758) Cyrus Leung 2026-02-04 21:35:39 +08:00
f67ee8b859 [Perf] Optimize chat completion streaming performance (#33782) Chauncey 2026-02-04 20:30:36 +08:00
e57ef99b40 [Model] Apply #32631 for recent models (#33785) Cyrus Leung 2026-02-04 20:23:01 +08:00
f8516a1ab9 [Bugfix][Model] Fix audio-in-video support for Qwen2.5-Omni and Qwen3-Omni (#33605) Yueqian Lin 2026-02-04 07:15:29 -05:00
824058076c [PERF] Change GDN Attention State Layout from [N, HV, K, V] to [N, HV, V, K] (#33291) Vadim Gimpelson 2026-02-04 15:20:52 +04:00
8e32690869 [KV Connector][BugFix] scheduler: Delay freeing blocks of aborted async loads (#32255) Or Ozeri 2026-02-04 13:16:34 +02:00
a208439537 [compile] Remove runner type from ignored caching factor list. (#33712) Zhengxu Chen 2026-02-04 05:56:45 -05:00
bcd2f74c0d [compile] Clean up AOT compile bypass on evaluate_guards. (#33578) Zhengxu Chen 2026-02-04 05:12:53 -05:00
f79f777803 [XPU][2/N] add support unquantized moe support for xpu (#33659) Kunshang Ji 2026-02-04 18:12:25 +08:00
4c8d1bf361 use ORJSONResponse when available to improve the efficiency of request process (#33548) Augusto Yao 2026-02-04 18:04:11 +08:00
061da6bcf7 [XPU] remove common path warning log (#33769) Kunshang Ji 2026-02-04 16:40:17 +08:00
4403e3ed4c [Metrics] Add labeled prompt token metrics for P/D disaggregation (#33290) zhanqiuhu 2026-02-04 02:46:48 -05:00
08e094997e [Hardware][AMD][CI] Refactor AMD tests to properly use BuildKite parallelism (#32745) Matt 2026-02-04 00:51:33 -06:00
d88a1df699 [Deprecation] Deprecate profiling envs (#33722) Wentao Ye 2026-02-04 00:58:21 -05:00
90d74ebaa4 [Deprecation] Remove _get_data_parser in MM processor (#33757) Cyrus Leung 2026-02-04 13:51:52 +08:00
45f8fd6f97 [Feature] Enable TRITON_ATTN for Batch Invariance (#33688) Frank Wang 2026-02-03 21:27:34 -08:00
5e1e0a0fbd [Refactor] Remove unused dead code (#33718) Wentao Ye 2026-02-04 00:25:11 -05:00
eb5ed20743 [Bugfix] Define router_logits_dtype for remaining MoE models (#33737) Michael Goin 2026-02-04 00:24:14 -05:00
2647163674 Save startup benchmark results as a list of values (#33629) Huy Do 2026-02-03 20:37:51 -08:00
9fb27dd3b3 [MM] Align the prefix of MMEncoderAttention with Attention (#33750) Shanshan Shen 2026-02-04 12:07:30 +08:00
4dffc5e044 [CPU] Split attention dispatch by head_dim alignment (#32161) R3hankhan 2026-02-04 09:07:15 +05:30
e1bf04b6c2 [1/N] Initial Implementation of Parser for ResponsesAPI (#32712) Andrew Xia 2026-02-03 21:59:03 -05:00
02080179a3 [Bugfix] Fix torchrun PP broadcast deadlock with async scheduling (#33701) Isotr0py 2026-02-04 10:17:37 +08:00
1b8fe6f7c4 [Frontend][4/n] Make pooling entrypoints request schema consensus | ScoreRequest (#33060) wang.yuqi 2026-02-04 09:48:40 +08:00
1892993bc1 [BugFix][Spec Decoding] Fix negative accepted tokens metric crash (#33729) v0.15.1rc1 v0.15.1 Nick Hill 2026-02-03 15:34:41 -08:00
7d98f09b1c cherry pick Michael Goin 2026-02-03 16:26:51 -05:00
daa2784bb9 [Bugfix] Disable RoutingMethodType.[Renormalize,RenormalizeNaive] TRTLLM per-tensor FP8 MoE (#33620) Michael Goin 2026-02-03 05:37:15 -05:00
52ee21021a [BugFix][Spec Decoding] Fix negative accepted tokens metric crash (#33729) Nick Hill 2026-02-03 15:34:41 -08:00
655efb3e69 [Dependency] Remove comments of ray in dependency files (#33351) Wentao Ye 2026-02-03 18:30:47 -05:00
bd8da29a66 [Bugfix] Fix sparse MLA metadata building (#33579) Matthew Bonanni 2026-02-03 18:29:48 -05:00
2a99c5a6c8 [Bugfix] Disable TRTLLM FP8 MoE if router_logits_dtype==float32 and routing_method!=DeepSeekV3 (#33613) Michael Goin 2026-02-03 16:26:51 -05:00
3f7662d650 [Voxtral Realtime] Change name (#33716) Patrick von Platen 2026-02-03 22:03:28 +01:00
a372f3f40a [MISC] Fix Tensor Parallelism for Quantized Mamba Models with n_groups=1 (#33257) Vadim Gimpelson 2026-02-04 00:10:31 +04:00
61e632aea1 Turn @config into a dataclass_transform (#31541) Harry Mellor 2026-02-03 17:40:59 +00:00
b1bb18de8d [torch.compile] Significantly speed up cold start times (#33641) Richard Zou 2026-02-03 09:12:11 -08:00
2267cb1cfd [Attention][FA3] Update FA3 to include new swizzle optimization (#23465) Lucas Wilkinson 2026-02-03 09:08:47 -07:00
0d6ccf68fa [P/D] rework mooncake connector and introduce its bootstrap server (#31034) dtc 2026-02-04 00:08:25 +08:00
18e7cbbb15 [Bugfix] Fix startup hang for Granite Speech (#33699) Cyrus Leung 2026-02-03 23:57:56 +08:00
f0d5251715 [Voxtral models] Skip warm-up to skip confusing error message in warm-up (#33576) Patrick von Platen 2026-02-03 16:22:34 +01:00
5c4f2dd6ef [MM] Pass prefix parameter to MMEncoderAttention (#33674) Shanshan Shen 2026-02-03 22:47:41 +08:00
f3d8a34671 [Bugfix] Do not add extra \n for image-only cases when constructing multimodal text prompts. (#33647) wang.yuqi 2026-02-03 22:43:47 +08:00
4bc913aeec Feat/add nemotron nano v3 tests (#33345) shaharmor98 2026-02-03 15:52:49 +02:00
fbb3cf6981 [Bugfix][Async][Connector] avoid vllm-side double free during async scheduling + request abort + async KV cache transfer (#33377) Kuntai Du 2026-02-03 21:50:15 +08:00
2df2b3499d Document NixlConnector backend selection via kv_connector_extra_config (#33552) Krish Gupta 2026-02-03 19:19:59 +05:30
2a8d84e66d Fix Gemma3n audio encoder for Transformers v5 (#33673) Harry Mellor 2026-02-03 13:49:49 +00:00
a3acfa1071 [Models] Intern-S1-Pro (#33636) zxy 2026-02-03 21:49:45 +08:00
be8168ff88 Fix Gemma3 GGUF for Transformers v5 (#33683) Harry Mellor 2026-02-03 12:36:53 +00:00
f6af34626d Fix offline test for Transformers v5 (#33682) Harry Mellor 2026-02-03 12:07:24 +00:00
ceab70c89d [Bugfix] fix qwen3-asr response error (#33644) Song Zhixin 2026-02-03 19:33:56 +08:00
52683ccbe1 [Misc] Update default image format of encode_base64 (#33656) Cyrus Leung 2026-02-03 19:13:16 +08:00
e346e2d056 [Bugfix] Disable RoutingMethodType.[Renormalize,RenormalizeNaive] TRTLLM per-tensor FP8 MoE (#33620) Michael Goin 2026-02-03 05:37:15 -05:00
83449a5ff0 [Refactor] Clean up pooling serial utils (#33665) Cyrus Leung 2026-02-03 18:29:18 +08:00
e4bf6ed90d [torch.compile] Don't do the fast moe cold start optimization if there is speculative decoding (#33624) Richard Zou 2026-02-02 19:38:49 -08:00
dad2d6a590 [Bugfix][Model] Fix DeepSeek-OCR-2 chat template to include BOS token (#33642) Lucas Hänke de Cansino 2026-02-03 09:35:58 +01:00
611b18757e [torch.compile] Speed up MOE handling in forward_context (#33184) Richard Zou 2026-01-27 18:17:54 -05:00
9cd2cce17d [torch.compile] Don't do the fast moe cold start optimization if there is speculative decoding (#33624) v0.15.1rc0 Richard Zou 2026-02-02 19:38:49 -08:00
eec3546bba [Misc][Build] Lazy load cv2 in nemotron_parse.py (#33189) Kiersten Stokes 2026-01-29 00:55:50 -06:00
7c023baf58 Patch Protobuf for CVE 2026-0994 (#33619) zaristei2 2026-02-03 00:03:14 -08:00
099a787ee2 Patch aiohttp for CVE-2025-69223 (#33621) zaristei2 2026-02-03 00:02:39 -08:00
32e84fa1ff [CI/Build] Investigate torchrun distributed tests hanging issue (#33650) Isotr0py 2026-02-03 15:49:17 +08:00
fd9c83d0e0 [torch.compile] Document the workaround to standalone_compile failing (#33571) Richard Zou 2026-02-02 23:16:55 -08:00
b95cc5014d [Misc] Remove deprecated VLLM_ALL2ALL_BACKEND environment variable (#33535) 杨朱 · Kiki 2026-02-03 15:01:59 +08:00
61397891ce [Minor] Some code simplification in scheduler.py (#33597) Nick Hill 2026-02-02 23:00:00 -08:00
ef248ff740 [Misc] Remove deprecated profiler environment variables (#33536) 杨朱 · Kiki 2026-02-03 14:58:44 +08:00
e10604480b [XPU][1/N] Deprecate ipex and switch to vllm-xpu-kernels for xpu platform (#33379) Kunshang Ji 2026-02-03 14:46:10 +08:00
bf001da4bf [Bugfix] Interleaved thinking keeps compatibility with reasoning_content (#33635) Chauncey 2026-02-03 14:46:05 +08:00
a0a984ac2e [CI/Build] Remove hardcoded America/Los_Angeles timezone from Dockerfiles (#33553) 杨朱 · Kiki 2026-02-03 14:32:39 +08:00
f1cb9b5544 Fix quantized Falcon-H1 model loading issues (#32728) Shengliang Xu 2026-02-02 22:31:27 -08:00
4c4b6f7a97 [Frontend] Add sampling parameters to Responses API (#32609) Daniel Mescheder 2026-02-03 06:51:10 +01:00
