Commit Graph

  • 09ad3b76b3 [Bug] Fix attention_backend arg string parsing (#30534) Michael Goin 2025-12-12 10:40:50 -05:00
  • dc13c99eed fix(gguf): Disable bfloat16 for GGUF on blackwell device (#30408) Christina Norman 2025-12-12 09:10:12 -06:00
  • 3e34adcdfb [DeepSeek V3.2] Proper drop_thinking logic (#30490) Vladislav Nosivskoy 2025-12-12 18:01:06 +03:00
  • 3e41992fec [Attention] Use sparse prefill kernel for fp8 kv-cache in DeepSeek-v3.2 (#27532) Lucas Wilkinson 2025-12-12 08:57:47 -05:00
  • 91401c7a26 [Bugfix] Fix CMakeLists Environment Variable (#21804) 吴坎 2025-12-12 18:54:52 +08:00
  • f90319d5d1 [Bugfix] Schedule failure due to wrong get_image_size_with_most_features (#29692) Jaehwang Jung 2025-12-12 19:27:20 +09:00
  • 302b2c1eb9 [CI/Build][AMD] Fix ref_dynamic_per_token_quant reference implementation on ROCm. (#30291) rasmith 2025-12-12 03:30:23 -06:00
  • 8f8fda261a [Bugfix] Multiple fixes for gpt-oss Chat Completion prompting (#28729) Ben Browning 2025-12-11 23:59:53 -05:00
  • fe1787107e [compile] Parse compile range cache keys as Range during cache loading. (#30516) Zhengxu Chen 2025-12-11 23:30:51 -05:00
  • 783644e4ac [ROCm][CI] Skip multi-GPU speculative decoding tests when insufficient GPUs available (#30527) Andreas Karatzas 2025-12-11 21:54:56 -06:00
  • 197473c4e7 [CI/Build] Use spawn subprocess for ROCm (#30272) Ryan Rock 2025-12-11 21:33:17 -06:00
  • 947dfda9c2 [LMCache] Relax lmcache version requirement (#30425) Nick Hill 2025-12-11 19:18:47 -08:00
  • 9f2fc16a69 [Bugfix][Model] Fix Afmoe rope_parameters issue (#30505) Michael Goin 2025-12-11 21:53:57 -05:00
  • 6a6fc41c79 gptq marlin quantization support for fused moe with lora (#30254) Bhanu Prakash Voutharoja 2025-12-12 13:27:22 +11:00
  • f355ad5412 [CPU][FIX] Fix build failures on Arm CPUs with torch nightly (#30481) Fadi Arafeh 2025-12-12 02:09:25 +00:00
  • 042da73244 [Core] Refactor _build_attention_metadata (#29628) Lucas Wilkinson 2025-12-11 20:54:12 -05:00
  • b5945d49c0 [ROCm][CI] Use mi325_4 agent pool for V1 e2e tests (#30526) Andreas Karatzas 2025-12-11 19:37:24 -06:00
  • ba80926681 [CI/Build][AMD] Skip test_cutlass_w4a8_moe tests on ROCm since they require cutlass_pack_scale_fp8 (#30508) rasmith 2025-12-11 19:02:19 -06:00
  • 0ab23c2b2b [fix] fix SM check for Flashinfer TRTLLM MOE (#30314) jiahanc 2025-12-11 17:00:58 -08:00
  • 48661d275f [CI/Build][AMD] Skip tests in test_fusions_e2e and test_dbo_dp_ep_gsm8k that require non-existing imports for ROCm (#30417) rasmith 2025-12-11 18:24:20 -06:00
  • d527cf0b3d [FIX]Patch run-cluster.sh (fix for #28328) (#30002) Ev Lacey 2025-12-11 15:36:31 -08:00
  • 2cc5affc38 [ROCM][CI] Fix AMD Examples Test Group (#30276) Concurrensee 2025-12-11 17:03:54 -06:00
  • a00d88973d [EPLB] Support EPLB w/ NVFP4 (#29804) Andrew Briand 2025-12-11 16:59:40 -06:00
  • 61249b177d [Refactor] Remove useless syncwarp (#30510) Wentao Ye 2025-12-11 17:43:41 -05:00
  • c817b14151 [Perf] Optimize deepgemm experts initialization, 3.9% TTFT improvement (#30494) Wentao Ye 2025-12-11 17:28:34 -05:00
  • 3efdc3feae [Docs][CPU backend] Add pre-built Arm CPU Docker images (#30491) ioana ghiban 2025-12-11 23:03:29 +01:00
  • 0efd9f867c [Core] Whisper Enable Encoder Batching (#29421) Nicolò Lucchesi 2025-12-11 22:06:51 +01:00
  • 90d6cf921f [BugFix][MM]support VLLM_RANDOMIZE_DP_DUMMY_INPUTS (#30472) Xingyu Liu 2025-12-11 13:00:15 -08:00
  • cf3eacfe58 Standardise get_rope to use rope_parameters["partial_rotary_factor"], not rotary_dim (#30389) Harry Mellor 2025-12-11 20:45:23 +00:00
  • 92fea56fd1 [compile] Stop one-off setting enable_aot_compile and use context manager instead. (#30503) Zhengxu Chen 2025-12-11 15:28:03 -05:00
  • e458270a95 [Misc] Add mcp to requirements (#30474) Ye (Charlotte) Qi 2025-12-11 12:06:09 -08:00
  • 72aaac5b66 [ROCm][Bugfix] Add MLACommonMetadata to allowed attention types for speculative decoding (#30430) Andreas Karatzas 2025-12-11 13:25:01 -06:00
  • 0e71eaa644 [Feature] AWQ marlin quantization support for fused moe with lora (#30442) 汪志鹏 2025-12-12 02:03:32 +08:00
  • 8781cd6b88 Add Eagle and Eagle3 support to Transformers modeling backend (#30340) Harry Mellor 2025-12-11 17:02:10 +00:00
  • aa3c250c48 [IMPROVEMENT] Change MistralReasoningParser behavior (#30391) Julien Denize 2025-12-11 17:53:26 +01:00
  • 305b168a9f [CI] refine more logic when generating and using nightly wheels & indices, add cuda130 build for aarch64, specify correct manylinux version (#30341) Shengqi Chen 2025-12-12 00:42:30 +08:00
  • 93db3256a4 Give pooling examples better names (#30488) Harry Mellor 2025-12-11 16:22:58 +00:00
  • 17cb540248 [Docs][CPU Backend] Add nightly and per revision pre-built Arm CPU wheels (#30402) ioana ghiban 2025-12-11 16:57:10 +01:00
  • 97a042f3bc Make the httpx logger less annoying when Transformers v5 is installed (#30480) Harry Mellor 2025-12-11 15:44:56 +00:00
  • 3a3b06ee70 [Misc] Improve error message for is_multimodal (#30483) Cyrus Leung 2025-12-11 22:39:51 +08:00
  • f4417f8449 [KVConnector] Add KV events to KV Connectors (#28309) Martin Hickey 2025-12-11 14:30:29 +00:00
  • a11f4a81e0 [Misc][PCP&DCP] relocate PCP feature check (#30050) Qiu 2025-12-11 19:36:18 +08:00
  • 853611bb18 Fix typo of endpoint name in CLI args docs (#30473) Kenichi Maehashi 2025-12-11 20:07:56 +09:00
  • d917747c95 [Bugfix] Fix task still being passed in tests/benchmarks (#30476) Cyrus Leung 2025-12-11 18:33:55 +08:00
  • a5f9fb5960 [Deprecation] Deprecate --convert reward; use --convert embed instead. (#30463) wang.yuqi 2025-12-11 18:18:25 +08:00
  • 4515eb1a0b [Fix] Update lazy loading of video loader backend (#30444) jeremyteboul 2025-12-11 02:14:57 -08:00
  • 13d63b65e0 [Deprecation] Remove missed fallback for embed_input_ids (#30469) Cyrus Leung 2025-12-11 18:06:36 +08:00
  • b4e8b91278 [Fix]fix import error from lmcache (#30376) wz1qqx 2025-12-11 17:23:52 +08:00
  • 6299628d32 [bugfix] fix MiniMaxM2ReasoningParser streaming output not separating reasoning_content. (#29882) Rei. 2025-12-11 17:05:08 +08:00
  • fba8906930 [perf] Use direct copy (broadcast) instead of cat for k_nope/k_pe in MLA prefill (#29710) Ming Yang 2025-12-11 00:20:45 -08:00
  • d02d1043de fix: enhance human_readable_int function (#30337) Ning Xie 2025-12-11 15:30:33 +08:00
  • 979f50efd0 [Deprecation] Remove fallbacks for embed_input_ids and embed_multimodal (#30458) Cyrus Leung 2025-12-11 14:58:23 +08:00
  • 36c9ce2554 Ensure minimum frames for GLM 4.6V compatibility (#30285) gh-wf 2025-12-11 00:26:49 -05:00
  • 1a516557e1 [Doc] Add Baidu Kunlun XPU support (#30455) xyDong0223 2025-12-11 12:52:17 +08:00
  • d6464f2679 [Chore] Fix torch precision warning (#30428) Wentao Ye 2025-12-10 23:05:56 -05:00
  • 7e24e5d4d6 [Deprecation] Remove deprecated task, seed and MM settings (#30397) Cyrus Leung 2025-12-11 11:59:39 +08:00
  • 5a87d8b9b1 [Deprecation] Remove deprecated plugin and compilation fields for v0.13 release (#30396) Cyrus Leung 2025-12-11 11:59:35 +08:00
  • d1e1fb4363 [Bugfix] Fix grouped_topk pytorch impl when num_experts can't be grouped properly (#29439) Divakar Verma 2025-12-10 21:47:18 -06:00
  • b51255f369 [ROCm] Fix broken import in platform attention backend dispatching (#30432) Andreas Karatzas 2025-12-10 19:12:58 -06:00
  • b4054c8ab4 Revert "[CI] Add Async Eplb nightly CI tests (#29385)" (#30431) Sage Moore 2025-12-10 16:48:35 -08:00
  • 25221b44bb Add more docs for regex (#30106) Xu Song 2025-12-11 08:12:21 +08:00
  • 8580919ac3 [Bugfix] fix confusing OOM errors during v1 init (#28051) shivampr 2025-12-10 15:17:41 -08:00
  • 166ac3c94d fix(shm): Add memory barriers for cross-process shared memory visibility (#30407) Christina Norman 2025-12-10 17:01:19 -06:00
  • b9e0951f96 [docs] Improve wide-EP performance + benchmarking documentation (#27933) Seiji Eicher 2025-12-10 17:15:54 -05:00
  • fcb894222f [Docs] Update EPLB docs (#30426) Michael Goin 2025-12-10 15:56:51 -05:00
  • 6ccb7baeb1 [LMCache] Fix breakage due to new LMCache version (#30216) Nick Hill 2025-12-10 11:52:01 -08:00
  • eea41804a4 [bug] Fix "Current vLLM config is not set." warnings when FlashInfer attention is used (#30241) Po-Han Huang (NVIDIA) 2025-12-11 03:18:51 +08:00
  • 9f042ba26b [Perf] Enable environment cache in EngineCore to enable the feature for UniProcExecutor as well (#29289) Jialin Ouyang 2025-12-10 11:13:01 -08:00
  • e72d65b959 [Deprecation] Remove tokenizer setter (#30400) Cyrus Leung 2025-12-11 03:10:58 +08:00
  • a9e4106f28 [P/D] KV Load Failure Recovery/Abort Configuration (#26813) Will Eaton 2025-12-10 14:00:52 -05:00
  • e8e8cd73e5 [Bugfix] Fix HunyuanOCR cross-image contamination in batch processing (#30344) Anker 2025-12-10 19:09:31 +01:00
  • 253305d5b2 [Chore] Delay recent deprecations (#30398) Cyrus Leung 2025-12-11 01:48:38 +08:00
  • 794a7875ee [Misc] Consistent case for vllm bench serve results (#30403) Matthew Bonanni 2025-12-10 12:44:02 -05:00
  • 2dcbac9077 [Docs] Generate full list of metrics in user docs (#30388) Mark McLoughlin 2025-12-10 16:09:34 +00:00
  • aacf0abf8b [BugFix] Fix AttributeError: 'MergedColumnParallelLinear' object has no attribute 'weight_scale' (#30399) Lucas Wilkinson 2025-12-10 10:59:23 -05:00
  • c756fb6781 [Core] Whisper enable FULL_DECODE_ONLY CudaGraph (#30072) Nicolò Lucchesi 2025-12-10 15:14:24 +01:00
  • d017bceb08 [BugFix] Fix minimax m2 model rotary_dim (#30384) Roger Young 2025-12-10 20:58:50 +08:00
  • cebda2a4af [CPU] Support for Whisper (#30062) Aditya Tewari 2025-12-10 12:58:42 +00:00
  • 53d2420b44 [Bugfix] tpu_model_runner: set vllm config context when calling reset_dynamo_cache() (#30331) Daniele 2025-12-10 13:58:35 +01:00
  • 9db78f34dc [Bugfix] Fix the issue where DeepSeek v3.2 cannot use structured_output (#30371) Chauncey 2025-12-10 16:30:16 +08:00
  • 434ac76a7c [cpu][ci] Add CPU Attention Tests for Neon Backend (#30347) Fadi Arafeh 2025-12-10 05:37:35 +00:00
  • ed7af3178a [ROCm][CI] Attempt to fix the failures under a subgroup of the e2e test group (#29358) Andreas Karatzas 2025-12-09 23:33:13 -06:00
  • 180345807f [CMake][Build]: Remove unused ACL CMake env variables (#30339) Radu Salavat 2025-12-09 20:27:19 -08:00
  • d007387aa7 [Bugfix] Cache added_vocab to avoid per-token overhead (#30351) Mingliang Li 2025-12-10 12:05:51 +08:00
  • 3bdd426636 Fix typos in comments across multiple files (#30345) Wilson Wu 2025-12-10 12:05:28 +08:00
  • 06462392e4 [bugfix][quantization] fix quark qwen3 kv_cache quantization (#30308) haoyangli-amd 2025-12-10 11:24:12 +08:00
  • 7d80c73d42 [CI] Reduce Flakiness For test_spec_decode.py::test_suffix_decoding_acceptance (#30367) (tag: v0.13.0rc1) Micah Williamson 2025-12-09 20:35:49 -06:00
  • b75f826fca [CI/Build][AMD] Skip quantization kernels tests that require CUTLASS or e4m3fn when not supported by platform (#30020) rasmith 2025-12-09 20:28:37 -06:00
  • c3487aca34 [responsesAPI][6] Fix multi turn MCP tokenization (#30230) Andrew Xia 2025-12-09 18:13:13 -08:00
  • abe93bce59 [Attention] Make seq_lens_cpu optional in CommonAttentionMetadata to enable true async spec-decode (#29624) Lucas Wilkinson 2025-12-09 20:18:10 -05:00
  • 2e7035dd8c [Bugfix] Fix fp8 DeepGemm compilation issues (#30336) ElizaWszola 2025-12-10 02:17:25 +01:00
  • 4c2e10ea19 [Bugfix] Fix cuda graph sizes when running with speculative decoding (#30330) PatrykSaffer 2025-12-10 01:47:07 +01:00
  • 03b5f940fd [V1][Spec Decode] Optimize Medusa proposer to avoid GPU-CPU sync (#29723) dongbo910220 2025-12-10 08:15:01 +08:00
  • 2e7054da06 Improve wvsplitK tile and balance heuristics. (#29937) Hashem Hashemi 2025-12-09 15:51:32 -08:00
  • 3c680f4a17 [Rocm][torch.compile] Adding layernorm + fp8 block quant and silu + fp8 block quant for Aiter (#25693) Charlie Fu 2025-12-09 16:39:26 -06:00
  • fccd532587 [Quantization] FP8 Weight Reloading for Quantized RL Rollout (#28480) Kyle Sayers 2025-12-09 16:54:32 -05:00
  • 00e5cbb967 [MoE][Refactor] Remove most arguments to FusedMoEMethodBase.apply (#29066) bnellnm 2025-12-09 16:48:25 -05:00
  • 7618dc973d [CI/Build] Make test_mha_attn.py run on correct platform only and check for flash_attn_varlen_func in layer.py (#29145) rasmith 2025-12-09 14:18:17 -06:00
  • f8dacc66b6 Bump actions/stale from 10.1.0 to 10.1.1 (#30234) dependabot[bot] 2025-12-09 20:12:14 +00:00
  • 7cab92fd45 Bump actions/checkout from 6.0.0 to 6.0.1 (#30233) dependabot[bot] 2025-12-09 20:03:16 +00:00