Commit Graph

  • c4d859c274 [Bugfix] Skip out-of-stage layers in get_layers_from_vllm_config for pipeline parallel (#36243) Tushar Shetty 2026-03-09 09:10:16 +05:30
  • 747431044d feat(attention): extract KV-cache update from FlexAttention backend (#36263) cong-or 2026-03-09 03:40:12 +00:00
  • d62856b928 [Misc] Move processors to transformers_utils (#35953) Cyrus Leung 2026-03-09 11:31:39 +08:00
  • bd2659a566 Increase Flexibility for OOV Multimodal Token Handling (#34858) Alex Brooks 2026-03-08 21:30:49 -06:00
  • 90512b2e8b fix: Use iterator as not to store all the file loads in memory at once (#36149) Shaun Kotek 2026-03-09 05:25:21 +02:00
  • dcf8862fd4 [Examples][1/n] Resettle basic examples. (#35579) wang.yuqi 2026-03-09 11:22:53 +08:00
  • 43aa389231 [Bugfix] Fix CPU OMP autobind assertion to use local_world_size (#35815) Weiguang Li 2026-03-09 11:07:29 +08:00
  • 384425f84e [Dependency] Remove default ray dependency (#36170) Wentao Ye 2026-03-08 23:06:22 -04:00
  • a0f44bb616 Allow markdownlint to run locally (#36398) Harry Mellor 2026-03-09 03:05:24 +00:00
  • fde4771bbd [XPU][Doc] update xpu document about triton dependency/conflict issue. (#36301) Kunshang Ji 2026-03-09 10:09:22 +08:00
  • e5ff140216 [cudagraph] fix cudagraph warning in deepseekv32 (#28044) Jiangyun Zhu 2026-03-09 08:27:41 +08:00
  • 0a6a3a1290 Add support for ModelOpt MXFP8 MoE models (#35986) danisereb 2026-03-08 22:00:05 +02:00
  • 4497431df6 [Frontend] Add GPU-less render serving path (vllm launch render) (#36166) Sage 2026-03-08 17:35:09 +02:00
  • b7332b058c [Model] Nano Nemotron VL - fast media preprocessing (#35657) nvnbagrov 2026-03-08 12:04:05 +02:00
  • 40077ea3de [CI] fix flaky empty responses and add diagnostic assertions in vision chat tests (#36341) Andreas Karatzas 2026-03-08 00:42:24 -06:00
  • 5d6aae4577 [LMCache MP Patch]: Race Condition + Duplicated Block Ids (#35831) Samuel Shen 2026-03-07 13:52:48 -08:00
  • 63298ee173 [Bugfix][LMCache][KVConnector] fix potential memory leak in LMCache multiprocess mode (#35931) Roy Huang 2026-03-07 13:52:35 -08:00
  • 2dde535df1 [compile] Split compile/warmup monitoring (#36098) Richard Zou 2026-03-07 16:52:11 -05:00
  • 379689d533 [Perf] Support FP8 KV cache for Flashinfer MLA Sparse (#35891) Wei Zhao 2026-03-07 16:51:54 -05:00
  • a6be75dbd2 [Core] NGram GPU Implementation compatible with Async Scheduler (#29184) PatchyTIS 2026-03-08 05:51:37 +08:00
  • ee54f9cdb9 [ROCm][CI] Accept Different But Valid Output for test_olmoe_tp (#35224) Micah Williamson 2026-03-07 15:50:52 -06:00
  • fc4657756f [ROCm][CI] Enable AITER for failing test_gpt_oss test case on MI355 (#36174) Micah Williamson 2026-03-07 15:50:17 -06:00
  • eebd14651f [CI] Enable Crosslayer KV layout tests for ROCm platforms (#35416) qli88 2026-03-07 15:49:56 -06:00
  • ebb9cc5f2b [UX][Startup] Account for CUDA graphs during memory profiling (#30515) Matthew Bonanni 2026-03-07 16:49:23 -05:00
  • 85f50eb41f Adding support to Sarvam's MoE models (#33942) rahul-sarvam 2026-03-08 01:16:24 +08:00
  • 5261223c2d [Misc] Remove duplicate parser registration (#36303) Taneem Ibrahim 2026-03-07 08:37:01 -06:00
  • 00b814ba5a [V0 Deprecation] Remove unused swap_space parameter (#36216) lif 2026-03-07 22:09:55 +08:00
  • ee8a29511f [Bugfix] Fix compressed-tensors quantization failure for DeepSeek-R1 on MI300x (#36247) vllmellm 2026-03-07 17:26:59 +08:00
  • 755356b3d1 feat: expose media_io_kwargs at runtime (#34778) milesial 2026-03-06 20:27:04 -08:00
  • 58928475e4 [ROCm][CI] Making entrypoints more deterministic on ROCm (#36293) Andreas Karatzas 2026-03-06 21:04:40 -06:00
  • 1a9718085c Fix CUDA graph decode capture crash in AITER FlashAttention (#36042) Mengtao (Martin) Yuan 2026-03-06 18:12:07 -08:00
  • 7eb524e64c refine vllm bench throughput --backend hf (#35971) Kunshang Ji 2026-03-07 10:10:33 +08:00
  • c7f32e08c2 [BugFix] Avoid ignored trust_remote_code warnings (#36290) Nick Hill 2026-03-06 17:24:18 -08:00
  • b354686524 [Model Runner V2] Fix warmup for pipeline parallel (#36280) Nick Hill 2026-03-06 16:58:51 -08:00
  • 6a18d8789b [Core] Fix benign error log during normal shutdown (#36270) Nick Hill 2026-03-06 16:39:21 -08:00
  • 24a03915f5 mla: don't update kv cache on dummy forwards (#36282) Itay Alroy 2026-03-07 02:36:00 +02:00
  • b5e34e1fca [ROCm][CI] Fixing yaml file for external amd-ci signal (#36284) Andreas Karatzas 2026-03-06 18:30:39 -06:00
  • ce8546a12b [docs][torch.compile] Add fusions.md — kernel/operator fusion reference page (#35538) Copilot 2026-03-06 23:55:06 +00:00
  • b31e9326a7 Bound openai to under 2.25.0 (tags: v0.17.0rc1, v0.17.0) khluu 2026-03-06 13:04:15 -08:00
  • e346c08560 [Release] Include source distribution (sdist) in PyPI uploads (#35136) Doug Smith 2026-03-05 04:43:50 -05:00
  • b7a423cb01 [BUGFIX]Fix Qwen-Omni models audio max_token_per_item estimation error leading to encoder_cache_size is 0 (#35994) Avery Miao 2026-03-06 01:16:29 +08:00
  • fa78ec8a72 [Bugfix] Fix Qwen-VL tokenizer implementation (#36140) Cyrus Leung 2026-03-06 00:07:19 +08:00
  • 9a474ce7a4 [XPU] bump vllm-xpu-kernels to v0.1.3 (#35984) Kunshang Ji 2026-03-04 18:23:31 +08:00
  • c188749bcd [ROCm] Support MLA with nhead<16 and FP8 KV cache for TP=8 (Kimi K2.5/Linear) (#35850) Chuan (Richard) Li 2026-03-06 12:24:03 -08:00
  • 225d1090a0 Enabling some B200-specific tests on MI355 (#35253) Alexei-V-Ivanov-AMD 2026-03-06 13:27:20 -06:00
  • f3c6c9c9d7 [CustomOp] CustomOp FusedRMSNormGated (#35877) eellison 2026-03-06 13:53:37 -05:00
  • 26bd43b52d Revert "[BugFix] Fix engine hanging after KV cache initialization fai… (#36262) Nick Hill 2026-03-06 08:28:09 -08:00
  • 6b625a8807 [Bugfix] Quickfix followups to busy loop removal in #28053 (#36068) Travis Johnson 2026-03-06 09:13:05 -07:00
  • 54756b6109 [compile] Stop unconditionally patching constrain_to_fx_strides (#36152) Richard Zou 2026-03-06 10:17:27 -05:00
  • 39f9ea0da4 [Bugfix] Fix cudagraph_mode:FULL dispatch (This does not impact FULL_AND_PIECEWISE (default)) (#36165) Raphaël Rialland 2026-03-06 15:15:31 +01:00
  • e4ae148a78 [Refactor] Modular video loader backend refactoring (#35202) Isotr0py 2026-03-06 22:06:59 +08:00
  • 1d0c0d209c [Misc] Lazy import registered processors (#36024) Isotr0py 2026-03-06 22:06:45 +08:00
  • fcb73f306c [bugfix] add api process rank in default multimodal request (#36150) Chenguang Zheng 2026-03-06 20:00:09 +08:00
  • e2090bf3af [CI] Fix startup error test (#36230) Harry Mellor 2026-03-06 11:50:28 +00:00
  • 2a00d3241f [CI][MM] Gate vision encoder attention mask to MiniCPM only, fixing Aria regression (#36206) Andreas Karatzas 2026-03-06 03:17:08 -06:00
  • 10f4db4dbe [Frontend] Add Support for MM Encoder/Decoder Beam Search (Offline) (#36153) Alex Brooks 2026-03-06 02:16:56 -07:00
  • 5b3ba94ab4 [Core][KVConnector] Support HMA+NixlConnector (#35758) Nicolò Lucchesi 2026-03-06 08:51:21 +01:00
  • 90f3c01fa4 [Spec Decode][KV Connector] Fix KV transfer in PD + speculative decoding (#35158) zhanqiuhu 2026-03-06 02:50:44 -05:00
  • 807d680337 [ROCm][CI] Fix tool use test stability - disable skinny GEMM, prefix caching, eliminate batch variance (#35553) Andreas Karatzas 2026-03-06 01:15:12 -06:00
  • 5afb387bd4 Change "following fields were present in the request but ignored" log from warn to debug (#36173) Tyler Michael Smith 2026-03-06 01:15:46 -05:00
  • 43e77e59ab [BugFix] avoid infinite loop with VLLM_PORT and get_open_ports_list (#36191) Walter Beller-Morales 2026-03-06 01:15:29 -05:00
  • 00bd08edee [Security] Respect user trust_remote_code setting in NemotronVL and KimiK25 (#36192) Russell Bryant 2026-03-06 01:15:19 -05:00
  • 43f10573c9 [Bugfix] Fix misleading context length error messages (#36197) Ajay Anubolu 2026-03-05 22:15:12 -08:00
  • 86e1060b17 [Bugfix] Fix inner_dp_world initialization order for multi-node TP (#35892) Yongye Zhu 2026-03-06 01:04:44 -05:00
  • 27066d1b2b [Frontend][Core] Add shutdown timeout - allowing in-flight requests to finish (#34730) Mark McLoughlin 2026-03-06 06:04:31 +00:00
  • 57c84ff129 perf: add __slots__ to KVCacheBlock (#36164) cong-or 2026-03-06 06:04:09 +00:00
  • e68de8adc0 docs: fix wrong cc in int8.md (#36209) Xiang Shi 2026-03-06 14:01:02 +08:00
  • a1ffa56a1e [CI] Fix bge-m3 similarity reference values after *Defination* typo fix (#36208) Andreas Karatzas 2026-03-05 23:07:29 -06:00
  • 0a208d1f54 [BugFix] Fix engine hanging after KV cache initialization failure (#35478) Shiyan Deng 2026-03-05 20:58:09 -08:00
  • 03a49bb8f0 [Feature] Add --distributed-timeout-seconds CLI option (#36047) Shiyan Deng 2026-03-05 20:57:51 -08:00
  • 8e87cc57f1 [Bug] Fix a corner case in _process_simple_streaming_events (#34754) Shiyan Deng 2026-03-05 20:57:32 -08:00
  • 6dd302653f [Misc] Rename group_mm_kwargs_by_modality -> group_and_batch_mm_kwargs (#36158) Cyrus Leung 2026-03-06 12:32:48 +08:00
  • de00ebeac4 [Bugfix] Fix simple Mistral-Small example (#36156) Cyrus Leung 2026-03-06 12:25:11 +08:00
  • 639680d220 [ROCm][CI] Adding missing dependencies for Multi-modal models tests (#36177) Andreas Karatzas 2026-03-05 22:23:10 -06:00
  • c5362c739f Reenable features for ROCm attention backends (#36185) Rohan Potdar 2026-03-05 22:21:06 -06:00
  • 0a49676fb0 cpu: aarch64: Upgrade OneDNN for aarch64 to add support for int8 matmul (#36147) Nikhil Gupta 2026-03-06 03:48:59 +00:00
  • c012a8c477 Don't fire ray compatibility webhook when PR or branch is not provided (#36088) Jeffrey Wang 2026-03-05 16:42:21 -08:00
  • ebed80a7c8 [Performance] Extract KV-cache update from TreeAttention backend (#35384) Dor Huri 2026-03-06 02:22:43 +02:00
  • a73af584fe [Model Runner V2] Fix warmup for very small kvcache and/or blocksizes (#36176) Nick Hill 2026-03-05 14:48:10 -08:00
  • a97954b6a8 [compile] Consistent compiler config for saved/loaded vllm backends. (#35810) Zhengxu Chen 2026-03-05 15:08:12 -05:00
  • a911f4dd20 [Model] Add support for OLMo Hybrid (#32550) Yanhong Li 2026-03-05 11:51:06 -08:00
  • 5395471d29 [CI] Add explicit permissions to macOS smoke test workflow (#35775) Russell Bryant 2026-03-05 14:08:48 -05:00
  • a57c877f18 [BugFix] Fallback from FA4->FA2 for Batch Invariance (#36059) Frank Wang 2026-03-05 11:05:56 -08:00
  • f917020983 [Perf] Optimize FusedMoEModularKernel output tensor using torch.empty (#35794) Xin Yang 2026-03-05 10:47:53 -08:00
  • 86483ca774 [Bugfix] Disable FlashInfer TRTLLM BF16 path for non-gated MoE (#36146) tomeras91 2026-03-05 19:49:05 +02:00
  • b93a9e6f6d ParakeetProjection.norm = RMSNorm instead of nn.LayerNorm (#36133) Netanel Haber 2026-03-05 19:29:30 +02:00
  • d8839ef7d9 [XPU] Enable ModelRunnerV2 on XPU (#36078) Xinyu Chen 2026-03-06 01:19:18 +08:00
  • e998fa76b9 [BUGFIX]Fix Qwen-Omni models audio max_token_per_item estimation error leading to encoder_cache_size is 0 (#35994) Avery Miao 2026-03-06 01:16:29 +08:00
  • 6a895197fa [Bugfix][CI] fix typos (#34934) Jiayi Yan 2026-03-06 01:05:46 +08:00
  • 8c760b6ab6 [ROCm] Refactor ROCm attention backend selection logic (#35246) Sage Moore 2026-03-05 08:51:26 -08:00
  • 3ee68590c7 refactor funasr model. (#36108) AllenDou 2026-03-06 00:07:37 +08:00
  • 7196348157 [Bugfix] Fix Qwen-VL tokenizer implementation (#36140) Cyrus Leung 2026-03-06 00:07:19 +08:00
  • 176c799f4c [openai api] log exception in exception handler (1/N) (#31164) Ning Xie 2026-03-06 00:00:12 +08:00
  • 612e7729c2 [KVConnector] Scheduler: Fix num_computed_tokens after async KV load (#34616) Or Ozeri 2026-03-05 16:25:15 +02:00
  • ecde7af9c4 Fix import that was moved in Transformers 5.2.0 (#36120) Harry Mellor 2026-03-05 13:59:44 +00:00
  • 8df523351f [Docs] Only build docs if documentation or ready labels are present (#36135) Harry Mellor 2026-03-05 13:58:16 +00:00
  • b03ff6a96b [CI] Stabilize test_no_args_tool_call and add ROCm-specific server args (#36107) Andreas Karatzas 2026-03-05 07:52:49 -06:00
  • ed81d5edd1 [Bugfix] Fix RunAI streamer crash with S3-hosted model paths (#35976) Ajay Anubolu 2026-03-05 04:14:20 -08:00
  • 3c23ac840e [Bugfix] Fix mypy errors in hermes_tool_parser.py (#36114) Shiyan Deng 2026-03-05 03:37:47 -08:00
  • a708ef5944 [Misc] Fix SyntaxWarning - invalid escape sequence '\e' (#36020) cjackal 2026-03-05 19:55:31 +09:00