Commit Graph

  • d6704dd099 Fix MiniMax-M2 rmsnorm precision and remove useless code (#27627) Roger Young 2025-10-29 21:01:05 +08:00
  • ecca3fee76 [Frontend] Add vllm bench sweep to CLI (#27639) Cyrus Leung 2025-10-29 20:59:48 +08:00
  • 9a0d2f0d92 [CI/Build] Skip cpu offloading test on AMD (#27690) Zhewen Li 2025-10-29 05:55:51 -07:00
  • ad3ec89532 [VLM] Add Qwen3-VL generation test (#25185) Isotr0py 2025-10-29 20:19:37 +08:00
  • 3481e40743 [chore] Remove models weight on S3 logic (#27725) Kevin H. Luu 2025-10-29 03:29:49 -07:00
  • 5e72216d17 Feature/video support in random mm dataset (#25963) Eugene Khvedchenya 2025-10-29 12:24:52 +02:00
  • 1a33aacf82 [Misc] Raise error for missing video metadata in MultiModalDataParser (#27664) Isotr0py 2025-10-29 18:06:42 +08:00
  • 7ba6aa8f56 [Fix] import get_kv_cache_torch_dtype error in LMCacheConnector integration (#27670) Yue Zhang 2025-10-29 18:03:54 +08:00
  • ab2eb27b74 [Frontend] [gpt-oss] Mcp type bug (#27689) Alec S 2025-10-29 06:01:32 -04:00
  • 3c7fefdeba [Frontend] [gpt-oss] Tool json call parsing error retry (#27675) Alec S 2025-10-29 05:42:44 -04:00
  • 1891cf605a [Bugfix] Fix modular kernel tests (#27707) bnellnm 2025-10-29 04:14:33 -04:00
  • 8df98c2161 [perf] Enable concurrent execution of "shared_experts" and "selected_experts" in qwen3-next (#27578) Jiangyun Zhu 2025-10-29 16:12:54 +08:00
  • 4fb8771cc0 [CI/Build] Move pre-commit only scripts to tools/pre_commit (#27657) Cyrus Leung 2025-10-29 16:04:33 +08:00
  • 413ef7a3b4 [Speculators] Move tests + fix integration (#27308) Dipika Sikka 2025-10-29 03:54:21 -04:00
  • 8b62495076 [Bugfix] Fix non-contiguous tensor error in rocm_unquantized_gemm_impl (#27605) Zhewen Li 2025-10-29 00:00:15 -07:00
  • 83fd49b1fc [CI/Build][Bugfix]Fix Quantized Models Test on AMD (#27712) Zhewen Li 2025-10-28 23:27:30 -07:00
  • a4a4f0f617 [KV Connector] Update lmcache connector with latest compatibility (#27681) Shaoting 2025-10-28 22:38:37 -07:00
  • 0d8161b075 [Model] Fix Qwen3VL and Qwen3Omni after torch.compile changes (#27705) Lukas Geiger 2025-10-29 05:28:20 +00:00
  • d2c33c397a [NIXL][XPU] update name of nixl wheel (#27631) liuzhenwei 2025-10-29 12:43:29 +08:00
  • f6d5f5888c [Build] Revert triton_kernels requirements (#27659) Varun Sundar Rabindranath 2025-10-29 00:07:09 -04:00
  • 9007bf57e6 Revert "Install pre-built xformers-0.0.32.post2 built with pt-2.9.0" (#27714) Simon Mo 2025-10-28 20:58:01 -07:00
  • f257544709 Install pre-built xformers-0.0.32.post2 built with pt-2.9.0 (#27598) [tag: v0.11.1rc4] Huy Do 2025-10-28 19:39:15 -07:00
  • 0b51c9bd8b [Core] Early return in SlidingWindowManager.remove_skipped_blocks (#27673) Jialin Ouyang 2025-10-28 18:32:33 -07:00
  • d3ab240f39 [Bug] Fix deepep low latency use nvlink by default (#27677) Wentao Ye 2025-10-28 19:53:12 -04:00
  • 94666612a9 [Misc][qwen2_5_vl][torch.compile] Enable supports_torch_compile on generic nn.Module and demonstrate speedup on Qwen Vision model (#23207) Lucas Kabela 2025-10-28 15:36:43 -07:00
  • 4fe5895361 [AsyncScheduling] Make async overlap work with logprobs (#27615) Nick Hill 2025-10-28 15:35:54 -07:00
  • 111faf1118 [Core] Scheduler: Publish connector events after output (#25875) Or Ozeri 2025-10-28 23:01:33 +02:00
  • 6afc28a9ba [Test] Batch Invariant: Unit test using parameterized backend (#27478) Wentao Ye 2025-10-28 16:51:35 -04:00
  • 141e6a0505 [Misc] Make reorder batch also separate extends (#27367) Lucas Wilkinson 2025-10-29 01:55:10 +08:00
  • 130aa8cbcf Add load pattern configuration guide to benchmarks (#26886) Matvei Pashkovskii 2025-10-28 19:49:15 +02:00
  • e3d8186666 [compile] Add fallback path to AOT compile when serialization fails. (#27350) Zhengxu Chen 2025-10-28 12:54:26 -04:00
  • f5710ef02a [Misc] Make LayerBlockType a Literal instead of Enum (#27658) Cyrus Leung 2025-10-29 00:23:35 +08:00
  • a8c02fb5bf [Bugfix][CI] Fix v1 attention backend tests and add CI coverage (#26597) Mohammad Miadh Angkad 2025-10-28 23:42:05 +08:00
  • 02af36df36 [Bugfix] Fix allocation & free logic of SingleWriterShmRingBuffer (#27117) Kero Liang 2025-10-28 23:01:24 +08:00
  • e88bdd60d9 [FLA] Introduce Kimi Delta Attention(KDA) to VLLM (#27654) Zhiyuan Li 2025-10-28 22:56:28 +08:00
  • 05e034f085 [nit]: Fix import for the lmcache integration (#27600) Samuel Shen 2025-10-28 07:40:55 -07:00
  • 936643a868 [BugFix] Also consider RAY_EXPERIMENTAL_NOSET_* when storing compilation cache (#27294) ℍ𝕠𝕝𝕝𝕠𝕨 𝕄𝕒𝕟 2025-10-28 16:22:28 +02:00
  • b186149e8e [Bugfix][Frontend] validate arg priority in frontend LLM class before add request (#27596) Junpu Fan 2025-10-28 07:02:43 -07:00
  • 2abbd351ef [Core] Enable async scheduling for external_launcher mode (#27394) 22quinn 2025-10-28 06:52:47 -07:00
  • 446912d1cb fix: allow HuggingFace standard chat template params via **kwargs (#27622) wangln19 2025-10-28 21:12:34 +08:00
  • a00d6254e9 [compile] Disable dynamo guards check for AOT compilation. (#27288) Zhengxu Chen 2025-10-28 08:58:12 -04:00
  • 05181cc57f [Hybrid] Add mamba_block_size to Engine Args (#27289) Asaf Joseph Gardin 2025-10-28 14:54:24 +02:00
  • 259504e147 [compile] Add enable_prompt_embeds to compile hash. (#27285) Zhengxu Chen 2025-10-28 08:46:03 -04:00
  • 0484b64248 [Bug] Fix shape issue for eplb expert weights (#27589) Wentao Ye 2025-10-28 08:44:05 -04:00
  • f58d9b6404 [Misc] Separate out utils.counter and move utils.Device to engine (#27588) Cyrus Leung 2025-10-28 20:20:46 +08:00
  • 44b5ce956d [Bugfix] In LongRoPE, decide short vs long based on max_model_len (#27431) Matthew Bonanni 2025-10-28 08:00:56 -04:00
  • 7a865f2325 [V0 Deprecation] Remove vestigial V0 logits_processors.py file (#27601) Nick Hill 2025-10-28 04:17:45 -07:00
  • 2fa90bda27 Fix a robust parsing issue in KimiK2ToolParser that causes IndexError (#27565) wangln19 2025-10-28 19:11:50 +08:00
  • 0291fbf65c [CI/Build] Fix amd model executor test (#27612) Zhewen Li 2025-10-28 01:58:11 -07:00
  • b46e4a06f1 [Core][Bookkeeping Optimization] Update against numpy view of is_token_ids tensor (#27618) Jialin Ouyang 2025-10-28 01:13:10 -07:00
  • d34f5fe939 [Bugfix][CPU] Fallback oneDNN linear to torch linear to fix half gemm support on legecy platforms (#27526) Li, Jiang 2025-10-28 14:25:44 +08:00
  • bdb01a38fe [Hardware][AMD][Model] Triton MoE tuning configs for GLM-4.6 for MI300X (#27323) Eric Yue 2025-10-28 13:58:06 +08:00
  • 5b3c35a68e [ROCm] [Doc] Update ROCm installation docs (#27327) vllmellm 2025-10-28 13:00:50 +08:00
  • 61fbfe5274 [Bugfix] fixed inconsistent finish_reason handling between V0 and V1 engines (#27555) Chauncey 2025-10-28 10:18:08 +08:00
  • 255e34ca50 [Stability fix] turn off HMA allocator when connector is set (#27592) Kuntai Du 2025-10-27 18:32:23 -07:00
  • a8d2e326ec [Bugfix][CI] Fix config resolving logic with remote models (#27610) Roger Wang 2025-10-27 17:48:32 -07:00
  • 53a56e658b [gpt-oss][2/N] Support input_messages in responsesRequest (#26962) Andrew Xia 2025-10-27 16:15:49 -07:00
  • 69f064062b Code quality improvements: version update, type annotation enhancement, and enum usage simplification (#27581) usberkeley 2025-10-28 01:50:22 +08:00
  • 921e78f4bb [ROCm] Update AITER branch for ROCm base docker (#27586) Micah Williamson 2025-10-27 12:22:33 -05:00
  • 6ebffafbb6 [Misc] Clean up more utils (#27567) Cyrus Leung 2025-10-27 23:30:38 +08:00
  • 3b96f85c36 [Chore]: Stream tokens vs characters in tool call parser tests (#26513) Ben Browning 2025-10-27 11:06:25 -04:00
  • 23ad820553 fixing mm placeholder replacement issue with gemma3 (#27538) tingtinggithub 2025-10-27 07:34:01 -07:00
  • 5d3be3ba4c [Bugfix][LoRA][FusedMoE] Select MxFP4 Backend based on LoRA Enablement (#27487) Varun Sundar Rabindranath 2025-10-27 10:32:50 -04:00
  • 4f882be4a0 [Model] Siglip2 Model Support (#27566) Yu Jiaqi 2025-10-27 21:57:37 +08:00
  • 9273754222 [Hybrid] Added supports_mamba_prefix_caching Protocol (#27339) Asaf Joseph Gardin 2025-10-27 15:05:20 +02:00
  • f4e8154076 [Kernel] Enable moe LoRA kernel support FP16 (#27468) Jee Jee Li 2025-10-27 19:48:37 +08:00
  • a663f6ae64 [cpu][perf] Fix low CPU utilization with VLLM_CPU_OMP_THREADS_BIND on AArch64 (#27415) Fadi Arafeh 2025-10-27 11:14:55 +00:00
  • a4fc21895e [Bugfix] Fixed when return_token_ids=False, the first event still contains prompt_token_ids. (#27561) Chauncey 2025-10-27 19:06:43 +08:00
  • a3e8611da5 [Bugfix] Limit the default value of max_model_len when it is not specified by users (#27556) Shanshan Shen 2025-10-27 18:16:20 +08:00
  • 7c2bdb83dc [Misc] Clean up utils (#27552) Cyrus Leung 2025-10-27 17:05:40 +08:00
  • 9932ed6a83 [Kernel] Adding split_K implementation for fused_moe_lora (#27291) Danielle Robinson 2025-10-27 02:05:24 -07:00
  • 2d631d28c6 [Doc] Slight improvement to M2 and beyond (#27554) Jee Jee Li 2025-10-27 17:02:10 +08:00
  • b368382964 [Model] Deprecate merge_by_field_config=False (#27551) Cyrus Leung 2025-10-27 16:43:00 +08:00
  • a806c14cc7 [Performance][LoRA] add context varying params to 'do_not_specialize' in fused moe lora (#27445) gnovack 2025-10-26 23:31:55 -07:00
  • 181bf5bbde [Docs] reemove the incorrect enable_reasoning parameter (#27550) yyzxw 2025-10-27 14:17:19 +08:00
  • cbd5e07a51 [Model] Use merge_by_field_config for MM models (Qwen series) (#27546) Cyrus Leung 2025-10-27 13:38:05 +08:00
  • 63b22e0dbb [Model][Bugfix] fix ernie45 moe 300B SharedFusedMoE output tuple (#27316) CSWYF3634076 2025-10-27 11:53:31 +08:00
  • 5980604c44 Fix MiniMax-M2 copyright (#27537) Roger Young 2025-10-27 11:29:51 +08:00
  • 361a7463d3 fix m2 test (#27536) youkaichao 2025-10-27 01:04:36 +08:00
  • 720af6ab79 [Model][MiniMax-M2] Support MiniMax-M2 Model (#27535) Roger Young 2025-10-27 00:59:11 +08:00
  • 55cba4a05c [CI/Build] Update causal-conv1d installation (#27529) Cyrus Leung 2025-10-26 22:14:22 +08:00
  • c7abff2990 Revert "[CI/Build] Use CPU for mm processing test on CI (#27522)" (#27531) Cyrus Leung 2025-10-26 19:44:27 +08:00
  • 71b1c8b667 [Chore]:Extract math and argparse utilities to separate modules (#27188) Yeshwanth N 2025-10-26 16:33:32 +05:30
  • 8fb7b2fab9 [Doc] Fix links to GH projects (#27530) Cyrus Leung 2025-10-26 17:55:51 +08:00
  • be7b55a83d [Doc] Remove Molmo warning (#27527) Cyrus Leung 2025-10-26 16:22:52 +08:00
  • 315b860abe [bugfix]fix empty prompts for async-engine mode in benchmark throughput (#27494) Lucia Fang 2025-10-26 01:16:35 -07:00
  • 87c41c26ad [Bugfix] Fix processor initialization for model from modelscope instead of HF (#27461) rongfu.leng 2025-10-26 15:44:31 +08:00
  • 65d2cf9511 [BUGFIX][ROCM] ViT FlashAttention on ROCm (no GFX9) and contiguous on qwen3vl ROCm TORCH_SDPA (#27190) JartX 2025-10-26 08:08:52 +01:00
  • d63cd9ff10 [CI/Build] Use CPU for mm processing test on CI (#27522) Isotr0py 2025-10-26 13:09:18 +08:00
  • 66a168a197 [CI/Build] Refactor processing tests (#27470) Cyrus Leung 2025-10-26 00:14:30 +08:00
  • a99564ac5b [Attention] Add missing kv cache scale setup (#27490) Matthew Bonanni 2025-10-25 03:12:49 -04:00
  • 4c5f632165 [Misc] Simplify max tokens in multimodal registry (#27500) Cyrus Leung 2025-10-25 14:56:01 +08:00
  • b853540388 [Core][Hybrid allocator + kv connector 1/n] Enable hybrid allocator + KV cache connector (#25712) Kuntai Du 2025-10-24 23:34:18 -07:00
  • 56ed7609a9 Revert "[Misc] Remove use of CUDA_VISIBLE_DEVICES for device selectio… (#27502) Zhuohan Li 2025-10-24 22:31:43 -07:00
  • 29c9cb8007 [CI] Add tests for cudagraph (#27391) Jiangyun Zhu 2025-10-25 10:37:33 +08:00
  • 83f478bb19 [KVConnector] Migrate the LMCache integration code to be vLLM native (#25542) [tag: v0.11.1rc3] Yihua Cheng 2025-10-24 17:23:53 -07:00
  • 269c4db0a4 [Misc][DP] Guard mxfp4 implementation selection (#27484) Varun Sundar Rabindranath 2025-10-24 19:29:24 -04:00
  • 52efc34ebf [Log] Optimize Startup Log (#26740) Wentao Ye 2025-10-24 19:27:04 -04:00
  • d95d0f4b98 [Distributed] Basic set of configuration for large EP deployment on GB200 (#27328) Pengchao Wang 2025-10-24 14:16:44 -07:00
  • 0402428200 [Perf][Async Scheduling] Remove CPU->GPU sync in dummy_run (#27455) Lehua Ding 2025-10-25 04:45:36 +08:00