Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

c4d859c274 [Bugfix] Skip out-of-stage layers in get_layers_from_vllm_config for pipeline parallel (#36243) Tushar Shetty 2026-03-09 09:10:16 +05:30
747431044d feat(attention): extract KV-cache update from FlexAttention backend (#36263) cong-or 2026-03-09 03:40:12 +00:00
d62856b928 [Misc] Move processors to transformers_utils (#35953) Cyrus Leung 2026-03-09 11:31:39 +08:00
bd2659a566 Increase Flexibility for OOV Multimodal Token Handling (#34858) Alex Brooks 2026-03-08 21:30:49 -06:00
90512b2e8b fix: Use iterator as not to store all the file loads in memory at once (#36149) Shaun Kotek 2026-03-09 05:25:21 +02:00
dcf8862fd4 [Examples][1/n] Resettle basic examples. (#35579) wang.yuqi 2026-03-09 11:22:53 +08:00
43aa389231 [Bugfix] Fix CPU OMP autobind assertion to use local_world_size (#35815) Weiguang Li 2026-03-09 11:07:29 +08:00
384425f84e [Dependency] Remove default ray dependency (#36170) Wentao Ye 2026-03-08 23:06:22 -04:00
a0f44bb616 Allow markdownlint to run locally (#36398) Harry Mellor 2026-03-09 03:05:24 +00:00
fde4771bbd [XPU][Doc] update xpu document about triton dependency/conflict issue. (#36301) Kunshang Ji 2026-03-09 10:09:22 +08:00
e5ff140216 [cudagraph] fix cudagraph warning in deepseekv32 (#28044) Jiangyun Zhu 2026-03-09 08:27:41 +08:00
0a6a3a1290 Add support for ModelOpt MXFP8 MoE models (#35986) danisereb 2026-03-08 22:00:05 +02:00
4497431df6 [Frontend] Add GPU-less render serving path (vllm launch render) (#36166) Sage 2026-03-08 17:35:09 +02:00
b7332b058c [Model] Nano Nemotron VL - fast media preprocessing (#35657) nvnbagrov 2026-03-08 12:04:05 +02:00
40077ea3de [CI] fix flaky empty responses and add diagnostic assertions in vision chat tests (#36341) Andreas Karatzas 2026-03-08 00:42:24 -06:00
5d6aae4577 [LMCache MP Patch]: Race Condition + Duplicated Block Ids (#35831) Samuel Shen 2026-03-07 13:52:48 -08:00
63298ee173 [Bugfix][LMCache][KVConnector] fix potential memory leak in LMCache multiprocess mode (#35931) Roy Huang 2026-03-07 13:52:35 -08:00
2dde535df1 [compile] Split compile/warmup monitoring (#36098) Richard Zou 2026-03-07 16:52:11 -05:00
379689d533 [Perf] Support FP8 KV cache for Flashinfer MLA Sparse (#35891) Wei Zhao 2026-03-07 16:51:54 -05:00
a6be75dbd2 [Core] NGram GPU Implementation compatible with Async Scheduler (#29184) PatchyTIS 2026-03-08 05:51:37 +08:00
ee54f9cdb9 [ROCm][CI] Accept Different But Valid Output for test_olmoe_tp (#35224) Micah Williamson 2026-03-07 15:50:52 -06:00
fc4657756f [ROCm][CI] Enable AITER for failing test_gpt_oss test case on MI355 (#36174) Micah Williamson 2026-03-07 15:50:17 -06:00
eebd14651f [CI] Enable Crosslayer KV layout tests for ROCm platforms (#35416) qli88 2026-03-07 15:49:56 -06:00
ebb9cc5f2b [UX][Startup] Account for CUDA graphs during memory profiling (#30515) Matthew Bonanni 2026-03-07 16:49:23 -05:00
85f50eb41f Adding support to Sarvam's MoE models (#33942) rahul-sarvam 2026-03-08 01:16:24 +08:00
5261223c2d [Misc] Remove duplicate parser registration (#36303) Taneem Ibrahim 2026-03-07 08:37:01 -06:00
00b814ba5a [V0 Deprecation] Remove unused swap_space parameter (#36216) lif 2026-03-07 22:09:55 +08:00
ee8a29511f [Bugfix] Fix compressed-tensors quantization failure for DeepSeek-R1 on MI300x (#36247) vllmellm 2026-03-07 17:26:59 +08:00
755356b3d1 feat: expose media_io_kwargs at runtime (#34778) milesial 2026-03-06 20:27:04 -08:00
58928475e4 [ROCm][CI] Making entrypoints more deterministic on ROCm (#36293) Andreas Karatzas 2026-03-06 21:04:40 -06:00
1a9718085c Fix CUDA graph decode capture crash in AITER FlashAttention (#36042) Mengtao (Martin) Yuan 2026-03-06 18:12:07 -08:00
7eb524e64c refine vllm bench throughput --backend hf (#35971) Kunshang Ji 2026-03-07 10:10:33 +08:00
c7f32e08c2 [BugFix] Avoid ignored trust_remote_code warnings (#36290) Nick Hill 2026-03-06 17:24:18 -08:00
b354686524 [Model Runner V2] Fix warmup for pipeline parallel (#36280) Nick Hill 2026-03-06 16:58:51 -08:00
6a18d8789b [Core] Fix benign error log during normal shutdown (#36270) Nick Hill 2026-03-06 16:39:21 -08:00
24a03915f5 mla: don't update kv cache on dummy forwards (#36282) Itay Alroy 2026-03-07 02:36:00 +02:00
b5e34e1fca [ROCm][CI] Fixing yaml file for external amd-ci signal (#36284) Andreas Karatzas 2026-03-06 18:30:39 -06:00
ce8546a12b [docs][torch.compile] Add fusions.md — kernel/operator fusion reference page (#35538) Copilot 2026-03-06 23:55:06 +00:00
b31e9326a7 Bound openai to under 2.25.0 (tags: v0.17.0rc1, v0.17.0) khluu 2026-03-06 13:04:15 -08:00
e346c08560 [Release] Include source distribution (sdist) in PyPI uploads (#35136) Doug Smith 2026-03-05 04:43:50 -05:00
b7a423cb01 [BUGFIX]Fix Qwen-Omni models audio max_token_per_item estimation error leading to encoder_cache_size is 0 (#35994) Avery Miao 2026-03-06 01:16:29 +08:00
fa78ec8a72 [Bugfix] Fix Qwen-VL tokenizer implementation (#36140) Cyrus Leung 2026-03-06 00:07:19 +08:00
9a474ce7a4 [XPU] bump vllm-xpu-kernels to v0.1.3 (#35984) Kunshang Ji 2026-03-04 18:23:31 +08:00
c188749bcd [ROCm] Support MLA with nhead<16 and FP8 KV cache for TP=8 (Kimi K2.5/Linear) (#35850) Chuan (Richard) Li 2026-03-06 12:24:03 -08:00
225d1090a0 Enabling some B200-specific tests on MI355 (#35253) Alexei-V-Ivanov-AMD 2026-03-06 13:27:20 -06:00
f3c6c9c9d7 [CustomOp] CustomOp FusedRMSNormGated (#35877) eellison 2026-03-06 13:53:37 -05:00
26bd43b52d Revert "[BugFix] Fix engine hanging after KV cache initialization fai… (#36262) Nick Hill 2026-03-06 08:28:09 -08:00
6b625a8807 [Bugfix] Quickfix followups to busy loop removal in #28053 (#36068) Travis Johnson 2026-03-06 09:13:05 -07:00
54756b6109 [compile] Stop unconditionally patching constrain_to_fx_strides (#36152) Richard Zou 2026-03-06 10:17:27 -05:00
39f9ea0da4 [Bugfix] Fix cudagraph_mode:FULL dispatch (This does not impact FULL_AND_PIECEWISE (default)) (#36165) Raphaël Rialland 2026-03-06 15:15:31 +01:00
e4ae148a78 [Refactor] Modular video loader backend refactoring (#35202) Isotr0py 2026-03-06 22:06:59 +08:00
1d0c0d209c [Misc] Lazy import registered processors (#36024) Isotr0py 2026-03-06 22:06:45 +08:00
fcb73f306c [bugfix] add api process rank in default multimodal request (#36150) Chenguang Zheng 2026-03-06 20:00:09 +08:00
e2090bf3af [CI] Fix startup error test (#36230) Harry Mellor 2026-03-06 11:50:28 +00:00
2a00d3241f [CI][MM] Gate vision encoder attention mask to MiniCPM only, fixing Aria regression (#36206) Andreas Karatzas 2026-03-06 03:17:08 -06:00
10f4db4dbe [Frontend] Add Support for MM Encoder/Decoder Beam Search (Offline) (#36153) Alex Brooks 2026-03-06 02:16:56 -07:00
5b3ba94ab4 [Core][KVConnector] Support HMA+NixlConnector (#35758) Nicolò Lucchesi 2026-03-06 08:51:21 +01:00
90f3c01fa4 [Spec Decode][KV Connector] Fix KV transfer in PD + speculative decoding (#35158) zhanqiuhu 2026-03-06 02:50:44 -05:00
807d680337 [ROCm][CI] Fix tool use test stability - disable skinny GEMM, prefix caching, eliminate batch variance (#35553) Andreas Karatzas 2026-03-06 01:15:12 -06:00
5afb387bd4 Change "following fields were present in the request but ignored" log from warn to debug (#36173) Tyler Michael Smith 2026-03-06 01:15:46 -05:00
43e77e59ab [BugFix] avoid infinite loop with VLLM_PORT and get_open_ports_list (#36191) Walter Beller-Morales 2026-03-06 01:15:29 -05:00
00bd08edee [Security] Respect user trust_remote_code setting in NemotronVL and KimiK25 (#36192) Russell Bryant 2026-03-06 01:15:19 -05:00
43f10573c9 [Bugfix] Fix misleading context length error messages (#36197) Ajay Anubolu 2026-03-05 22:15:12 -08:00
86e1060b17 [Bugfix] Fix inner_dp_world initialization order for multi-node TP (#35892) Yongye Zhu 2026-03-06 01:04:44 -05:00
27066d1b2b [Frontend][Core] Add shutdown timeout - allowing in-flight requests to finish (#34730) Mark McLoughlin 2026-03-06 06:04:31 +00:00
57c84ff129 perf: add __slots__ to KVCacheBlock (#36164) cong-or 2026-03-06 06:04:09 +00:00
e68de8adc0 docs: fix wrong cc in int8.md (#36209) Xiang Shi 2026-03-06 14:01:02 +08:00
a1ffa56a1e [CI] Fix bge-m3 similarity reference values after *Defination* typo fix (#36208) Andreas Karatzas 2026-03-05 23:07:29 -06:00
0a208d1f54 [BugFix] Fix engine hanging after KV cache initialization failure (#35478) Shiyan Deng 2026-03-05 20:58:09 -08:00
03a49bb8f0 [Feature] Add --distributed-timeout-seconds CLI option (#36047) Shiyan Deng 2026-03-05 20:57:51 -08:00
8e87cc57f1 [Bug] Fix a corner case in _process_simple_streaming_events (#34754) Shiyan Deng 2026-03-05 20:57:32 -08:00
6dd302653f [Misc] Rename group_mm_kwargs_by_modality -> group_and_batch_mm_kwargs (#36158) Cyrus Leung 2026-03-06 12:32:48 +08:00
de00ebeac4 [Bugfix] Fix simple Mistral-Small example (#36156) Cyrus Leung 2026-03-06 12:25:11 +08:00
639680d220 [ROCm][CI] Adding missing dependencies for Multi-modal models tests (#36177) Andreas Karatzas 2026-03-05 22:23:10 -06:00
c5362c739f Reenable features for ROCm attention backends (#36185) Rohan Potdar 2026-03-05 22:21:06 -06:00
0a49676fb0 cpu: aarch64: Upgrade OneDNN for aarch64 to add support for int8 matmul (#36147) Nikhil Gupta 2026-03-06 03:48:59 +00:00
c012a8c477 Don't fire ray compatibility webhook when PR or branch is not provided (#36088) Jeffrey Wang 2026-03-05 16:42:21 -08:00
ebed80a7c8 [Performance] Extract KV-cache update from TreeAttention backend (#35384) Dor Huri 2026-03-06 02:22:43 +02:00
a73af584fe [Model Runner V2] Fix warmup for very small kvcache and/or blocksizes (#36176) Nick Hill 2026-03-05 14:48:10 -08:00
a97954b6a8 [compile] Consistent compiler config for saved/loaded vllm backends. (#35810) Zhengxu Chen 2026-03-05 15:08:12 -05:00
a911f4dd20 [Model] Add support for OLMo Hybrid (#32550) Yanhong Li 2026-03-05 11:51:06 -08:00
5395471d29 [CI] Add explicit permissions to macOS smoke test workflow (#35775) Russell Bryant 2026-03-05 14:08:48 -05:00
a57c877f18 [BugFix] Fallback from FA4->FA2 for Batch Invariance (#36059) Frank Wang 2026-03-05 11:05:56 -08:00
f917020983 [Perf] Optimize FusedMoEModularKernel output tensor using torch.empty (#35794) Xin Yang 2026-03-05 10:47:53 -08:00
86483ca774 [Bugfix] Disable FlashInfer TRTLLM BF16 path for non-gated MoE (#36146) tomeras91 2026-03-05 19:49:05 +02:00
b93a9e6f6d ParakeetProjection.norm = RMSNorm instead of nn.LayerNorm (#36133) Netanel Haber 2026-03-05 19:29:30 +02:00
d8839ef7d9 [XPU] Enable ModelRunnerV2 on XPU (#36078) Xinyu Chen 2026-03-06 01:19:18 +08:00
e998fa76b9 [BUGFIX]Fix Qwen-Omni models audio max_token_per_item estimation error leading to encoder_cache_size is 0 (#35994) Avery Miao 2026-03-06 01:16:29 +08:00
6a895197fa [Bugfix][CI] fix typos (#34934) Jiayi Yan 2026-03-06 01:05:46 +08:00
8c760b6ab6 [ROCm] Refactor ROCm attention backend selection logic (#35246) Sage Moore 2026-03-05 08:51:26 -08:00
3ee68590c7 refactor funasr model. (#36108) AllenDou 2026-03-06 00:07:37 +08:00
7196348157 [Bugfix] Fix Qwen-VL tokenizer implementation (#36140) Cyrus Leung 2026-03-06 00:07:19 +08:00
176c799f4c [openai api] log exception in exception handler (1/N) (#31164) Ning Xie 2026-03-06 00:00:12 +08:00
612e7729c2 [KVConnector] Scheduler: Fix num_computed_tokens after async KV load (#34616) Or Ozeri 2026-03-05 16:25:15 +02:00
ecde7af9c4 Fix import that was moved in Transformers 5.2.0 (#36120) Harry Mellor 2026-03-05 13:59:44 +00:00
8df523351f [Docs] Only build docs if documentation or ready labels are present (#36135) Harry Mellor 2026-03-05 13:58:16 +00:00
b03ff6a96b [CI] Stabilize test_no_args_tool_call and add ROCm-specific server args (#36107) Andreas Karatzas 2026-03-05 07:52:49 -06:00
ed81d5edd1 [Bugfix] Fix RunAI streamer crash with S3-hosted model paths (#35976) Ajay Anubolu 2026-03-05 04:14:20 -08:00
3c23ac840e [Bugfix] Fix mypy errors in hermes_tool_parser.py (#36114) Shiyan Deng 2026-03-05 03:37:47 -08:00
a708ef5944 [Misc] Fix SyntaxWarning - invalid escape sequence '\e' (#36020) cjackal 2026-03-05 19:55:31 +09:00
