Commit Graph

  • 5dfc5abe94 [ROCm] [Release] Change the package from aiter to amd-aiter (#35198) TJian 2026-03-03 15:13:39 +08:00
  • 8fa68a8ce4 Fix TYPE_CHECKING stub defaults in envs.py to match actual runtime defaults (#35645) lin-shh 2026-03-03 00:59:43 -05:00
  • 35a6f0bfe2 [Misc] Fix typos in comments: explict→explicit, paramaters→parameters (#35648) lin-shh 2026-03-03 00:59:14 -05:00
  • 3a6cbf16e2 [MISC] Removed unused function find_all_indices() from tool_parsers/utils.py (#35683) Taneem Ibrahim 2026-03-02 23:58:42 -06:00
  • f44d1ddc8c [BugFix] Fix cmake based incremental install (wrong vllm install dir) (#35773) Lucas Wilkinson 2026-03-03 00:58:16 -05:00
  • 48a54c1e0d [CI/Build] Trigger processor tests on registry update (#35824) Cyrus Leung 2026-03-03 13:55:57 +08:00
  • 8b9e8b7454 [ROCm][CI] Fix Assertion Logic For test_gpt_oss (#35806) Micah Williamson 2026-03-02 23:08:04 -06:00
  • c21d0039ec [Refactor] Fix maxsim cuda platform and add cli to control it (#35427) Wentao Ye 2026-03-02 23:48:31 -05:00
  • 7d8bbe6f42 [CI/Build] Automatically patch video metadata for multimodal processor test (#35822) Isotr0py 2026-03-03 12:27:45 +08:00
  • 25e02647c2 [Core] Add optional flags to check for repetitive token patterns in engine output (#35451) aykoppol 2026-03-02 20:23:25 -08:00
  • a0a5178ab4 [Model Runner V2] Use ModelState.prepare_attn() for cuda graph capture [5/N] (#35774) Woosuk Kwon 2026-03-02 20:06:27 -08:00
  • 8ea8ba275e [V0 deprecation] Remove Swin model (#35821) Isotr0py 2026-03-03 12:03:41 +08:00
  • 4f85bae9d6 [Docs][Model Runner V2] Add Design Docs (#35819) Woosuk Kwon 2026-03-02 19:58:14 -08:00
  • 0a7165fd71 [ModelRunnerV2] Rename sampler functions and variables for clarity (#35459) Andy Lo 2026-03-03 04:48:56 +01:00
  • 6521ccf286 [CI] Temporarily Disable Nightly Failures (#35770) Robert Shaw 2026-03-02 20:49:13 -05:00
  • 8ebd872f50 [Tool Parser] Fix Qwen3Coder streaming parameter loss with speculative decode (#35615) Martin Vit 2026-03-03 02:40:37 +01:00
  • 168ee03e1c [Model Runner V2][Perf] align dummy_run tokens to uniform decode for dp cudagraph (#35376) zhrrr 2026-03-03 09:10:47 +08:00
  • 9dd656f0ea [XPU][NIXL] Add GPUDirect RDMA support for XPU (#35270) liuzhenwei 2026-03-03 08:42:49 +08:00
  • c8b678e53e [Model] Add support for nvidia/llama-nemotron-rerank-vl-1b-v2 (#35735) Jakub Zakrzewski 2026-03-03 01:32:14 +01:00
  • 18c29c746b [ROCm][CI] Fix backslash-continuation in pytest marker re-quoting and treat exit code 5 as success (#35798) Andreas Karatzas 2026-03-02 18:07:51 -06:00
  • 96fc09503a [All Reduce] Change default backend of Flashinfer All Reduce to trtllm (#35793) Hanjie Qiu 2026-03-02 18:57:38 -05:00
  • 1b82b433fc [Bugfix] Fix MM processor test for Qwen3.5 (#35797) Roger Wang 2026-03-02 15:05:08 -08:00
  • 9319044ee9 [MoE][Perf] Wrap DSV3 QKVAProj GEMM in custom op for torch.compile (#35751) Robert Shaw 2026-03-02 18:03:49 -05:00
  • c42dc402c1 clean unused cudagraph_batch_sizes (#35552) Boyuan Feng 2026-03-02 14:00:16 -08:00
  • fa6a6be519 [Bugfix] Fix missing sequence_lengths in qwen3_omni_moe_thinker (#35741) Ye (Charlotte) Qi 2026-03-02 13:11:56 -08:00
  • cad21918e3 [BUG] Fix rlhf_async example (#35788) Aaron Hao 2026-03-02 12:36:40 -08:00
  • 53700bf49b [ci] Add Ray compatibility check informational CI job (#34672) Jeffrey Wang 2026-03-02 12:06:16 -08:00
  • a13d8c03c9 [KVConnector] Auto-downgrade to PIECEWISE cudagraph mode for layerwise async ops (#31057) Yashwant Bezawada 2026-03-02 14:04:47 -06:00
  • 9433acb8df [Spec Decode] Add hidden states extraction system (#33736) Fynn Schmitt-Ulms 2026-03-02 14:29:09 -05:00
  • d1a6e96d9e [torch.compile] Improve cold and warm start compile tests (#35709) Richard Zou 2026-03-02 14:27:06 -05:00
  • 2a9e3347e9 [BugFix][Model]Fix the garbled code in Ernie4.5-VL caused by fast_moe_cold_start (#35587) CSWYF3634076 2026-03-03 02:56:33 +08:00
  • cc0d565f40 [CI/Build] Enable Qwen3.5 tests on CI (#35763) Isotr0py 2026-03-03 01:43:53 +08:00
  • 358e4d5ba7 [CI][HPU] Pin vllm commit compatible with vllm-gaudi - HPU tests (#35307) Patryk Wolsza 2026-03-02 18:02:26 +01:00
  • 792a74b973 [Doc] Improve UX of --enable-log-requests (#35723) Cyrus Leung 2026-03-03 00:24:09 +08:00
  • 4034c3d32e [Core] Move test utility to test file (#35672) Turner Jabbour 2026-03-02 08:56:03 -07:00
  • 7560d674c9 [CI] Fix mypy for vllm/device allocator (#35518) Martin Hickey 2026-03-02 15:53:18 +00:00
  • d9c7730877 [Performance] Extract kv update ops from MLA attention backends (#34627) ElizaWszola 2026-03-02 16:43:19 +01:00
  • ada4f4fadd [Fix Bug]num_active_loras always equals to zero (#34119) Runkai Tao 2026-03-02 10:17:46 -05:00
  • 7e9149d9a9 [Docs] Add breadcrumbs for better UX (#35749) Harry Mellor 2026-03-02 14:31:54 +00:00
  • 87c98b0236 [MyPy][BugFix] Check profiler is assigned before calling start() on it (#35505) Martin Hickey 2026-03-02 13:23:42 +00:00
  • de7dd634b9 Fix unresolved-import errors when using Astral's ty by removing src.root (#35681) Tyler Michael Smith 2026-03-02 05:26:47 -05:00
  • 9a87b0578f [Feat] Supports Anthropic Messages count_tokens API (#35588) Chauncey 2026-03-02 17:48:54 +08:00
  • 510bc9e1df [Misc] Cleanup useless current_platform import (#35715) wangxiyuan 2026-03-02 17:36:54 +08:00
  • cbd361fd46 [CPU][Distributed] Fix Enable _CPUSHMDistributed only when TP/PP ranks share the same SHM group name (#34169) Charles Ashby 2026-03-02 04:34:35 -05:00
  • c212202d93 [Misc] Bound NIXL upper bound version (#35495) Nicolò Lucchesi 2026-03-02 09:57:07 +01:00
  • ec27b36b4b [CI] Defining extended V1 e2e + engine tests (#35580) Andreas Karatzas 2026-03-02 02:10:54 -06:00
  • 3fd1d4ec2c [Rocm][CI] Fix LM Eval Large Models (H100) test group (#34750) Charlie Fu 2026-03-02 01:43:38 -06:00
  • cb21972a97 [Kernel] Integrate SM100 MXFP8 blockscaled grouped MM and quant kernels (#34448) EdalatiAli 2026-03-02 02:31:19 -05:00
  • c34963f138 [ROCm][CI] Disable skinny GEMMs in language model standard tests to fix non-determinism (#35152) Andreas Karatzas 2026-03-02 01:04:18 -06:00
  • f26650d649 [ROCm] add amd-quark package in requirements for rocm to use quantized models (#35658) Hongxia Yang 2026-03-02 01:02:43 -05:00
  • 92f5d0f070 [XPU] fix mxfp4 activation type (#35691) Kunshang Ji 2026-03-02 11:48:39 +08:00
  • a60985b07e Fix deprecated v1 config tests (#35327) Jesse Cai 2026-03-01 17:32:03 -08:00
  • 8b5014d3dd [Attention] FA4 integration (#32974) Lucas Wilkinson 2026-03-01 18:44:57 -05:00
  • 57a96e26c9 Revert "[Bugfix] Disable TRTLLM attention with KV transfer enabled (#33192)" (#34832) zhanqiuhu 2026-03-01 17:32:37 -05:00
  • e82fbeec7b [torch.compile] Undo the fast_moe_cold_start hack in torch>=2.11 (#35475) Richard Zou 2026-03-01 16:44:22 -05:00
  • 6290470843 [Bugfix] Fix dtype mismatch in RMSNormGated.forward_native() during torch.compile (#35256) haosdent 2026-03-02 04:14:46 +08:00
  • 72f4d16262 [Model Runner V2] Use block table apis for capture inputs (#35671) Woosuk Kwon 2026-03-01 10:31:13 -08:00
  • 5a435507d8 fix(mxfp4): return is_monolithic=False when LoRA is enabled for Triton backend (#35382) Seungho Yoon 2026-03-01 23:59:30 +09:00
  • 59d7af9c6c [MISC] Fixing a null reference by removing parallel_utils from mypy EXCLUDE (#35630) Taneem Ibrahim 2026-03-01 08:26:44 -06:00
  • bbf81f9a92 [Mamba1] - Kernel Level Chunk Alignment for Prefix Caching (#34798) Asaf Gardin 2026-03-01 14:40:23 +02:00
  • da543d1abe [Model Runner V2] Minor refactoring for EncoderRunner (#35628) Woosuk Kwon 2026-03-01 00:15:39 -08:00
  • 87d319c52f [AMD][CI] Support Triton attention with ExampleConnector (#34931) Ryan Rock 2026-03-01 01:58:07 -06:00
  • a9ec392c86 Fix typo: implictly -> implicitly in isaac.py docstring (#35646) lin-shh 2026-03-01 02:34:37 -05:00
  • afd089f231 [Bugfix][Model] Fix Qwen3.5/Qwen3Next ignoring --dtype flag on older GPUs (#35617) lailoo 2026-03-01 11:27:37 +08:00
  • 3ecd0bf9fc Add TMA support to fused_moe_lora kernel (#32195) gnovack 2026-02-28 18:55:25 -08:00
  • e3eb146f7a [Model Runner V2] Add ModelStateInterface [4/N] (#35621) Woosuk Kwon 2026-02-28 13:19:45 -08:00
  • 95a395dbec [Bugfix] Fix Anthropic API base64 image handling in Messages endpoint (#35557) Martin Vit 2026-02-28 21:57:08 +01:00
  • e94b263bd6 [Chore] Cleanup BNB utilization dead code (#35620) Isotr0py 2026-03-01 03:22:41 +08:00
  • e113a30113 [Deprecation] Deprecate code in 0.17 as scheduled (#35441) Wentao Ye 2026-02-28 12:32:37 -05:00
  • 1dafb29f91 [Benchmark] Avoid unnecessary video download in MMVU (#35618) Cyrus Leung 2026-03-01 01:07:02 +08:00
  • 49b9ae32e9 [Fix] Avoid sending image input to other PP ranks (#35405) emricksini-h 2026-02-28 17:14:29 +01:00
  • 63d7972f13 Fix Qwen3_5MTP packed_modules_mapping for gate_up_proj (#35581) cwazai 2026-02-28 22:50:55 +08:00
  • c68e69f144 custom dataset img support base64 (#35280) flutist 2026-02-28 19:49:52 +08:00
  • 7e08c22b8c [Feat] Add CUDA torch fallbacks for fp8_mqa_logits/fp8_paged_mqa_logits_torch function (#35271) Chauncey 2026-02-28 18:12:00 +08:00
  • 8e75d88554 add io_process_plugin for sparse embedding (#34214) Augusto Yao 2026-02-28 17:16:37 +08:00
  • 0892d1ab1f [Feature]Supports Anthropic Thinking Block (#33671) Mario Hong 2026-02-28 17:02:33 +08:00
  • 7600642eae Add padding support to wvSplitK solution for skinny GEMMs (#33762) Hashem Hashemi 2026-02-28 01:02:05 -08:00
  • 1e69c04887 [ROCm][CI] Parametrize vision score tests across attention backends with per-backend tolerances (#35571) Andreas Karatzas 2026-02-28 02:59:26 -06:00
  • 4292e3b807 [Benchmark] Improve UX of sweep scripts (#35600) Cyrus Leung 2026-02-28 16:36:02 +08:00
  • 24d6ea8afd [Benchmark] Rename SLA Finder to Workload Explorer (#35586) Cyrus Leung 2026-02-28 15:31:55 +08:00
  • 57c86c0741 [Misc] Change logging level from info to debug for tool parser import (#35575) Chauncey 2026-02-28 14:51:35 +08:00
  • 06254d4cbb [CI] add trainer_send_weights for MockWeightTransferEngine (#35589) Chauncey 2026-02-28 14:47:43 +08:00
  • f5d1281c9d [ROCm][CI] Expose tests to AMD production CI and fix amdsmi heap corruption (#35071) Andreas Karatzas 2026-02-27 23:57:31 -06:00
  • 94029ffaf0 [ROCm] Derive device capability from GCN arch string without CUDA init (#35069) Andreas Karatzas 2026-02-27 23:55:28 -06:00
  • 88e8525f2e [ROCm][CI] Adding infiniband mappings for moriio tests (#35170) Andreas Karatzas 2026-02-27 23:53:28 -06:00
  • b2d8b422b2 [EPLB] Enforce sync eplb for NCCL-based all2all backend (#35212) Ilya Markov 2026-02-28 06:47:12 +01:00
  • 1d5ab5d603 [Bugfix] Move chat completion response_format validation to Pydantic model_validator (#35510) Umut Polat 2026-02-28 08:26:19 +03:00
  • 7b346ba8ed [Bugfix] Propagate compilation_time from workers to main process for TP>1 (#35503) Huy Do 2026-02-27 21:03:22 -08:00
  • dea268336f [1/N] Elastic EP Milestone 2 (#34861) Itay Alroy 2026-02-28 06:46:42 +02:00
  • 90805ff464 [CI/Build] CPU release supports both of AVX2 and AVX512 (#35466) Ma Jian 2026-02-28 12:35:21 +08:00
  • 2562e0271e [MTP] Validate that MTP weights are actually loaded (#35548) Matthew Bonanni 2026-02-27 23:27:40 -05:00
  • fd68cd132b [Bugfix] Fixes for SLA finder (#35537) Cyrus Leung 2026-02-28 12:20:55 +08:00
  • 0edf101d2b [ROCm] Add stablelm Head Size 80 To Supported Head Sizes For ROCM_ATTN (#35527) Micah Williamson 2026-02-27 22:16:34 -06:00
  • d5b6f3ba36 [ROCm][Quantization] Add Composable Kernel (CK) backend support for M… (#34301) Douglas Lehr 2026-02-27 21:37:01 -06:00
  • 1a014a0a93 [Model Runner V2] Move MM encoder to Model States [3/N] (#35564) Woosuk Kwon 2026-02-27 18:32:38 -08:00
  • 86ac7bcf84 [Model Runner V2] Support pooling models (#35120) Woosuk Kwon 2026-02-27 18:03:01 -08:00
  • 405f28d38d [Misc] Clean up ResponsesRequest model validators (#35531) Umut Polat 2026-02-28 04:19:21 +03:00
  • 5323672bc2 [misc] cleanup one level of error stack when nixl fails to initialize (#35517) youkaichao 2026-02-28 08:42:37 +08:00
  • a201ad72d8 [Refactor][Kernel] Add global helper to deduplicate vectorized memory ops (#35105) Roberto L. Castro 2026-02-28 01:28:17 +01:00
  • e3691988d0 [ROCm]: fix aiter rope functionalization (#35533) Rohan Potdar 2026-02-27 16:42:30 -06:00