Commit Graph

  • 682566b18e [Bug] Refactor max_num_batched_tokens to account for drafting (#34898) Benjamin Chislett 2026-02-22 11:18:46 -05:00
  • b9c2a565cc [Spec Decode] Defer clearing KV connector metadata for EAGLE3 speculative decode + prefill / decode disagg setup (#34529) qizixi 2026-02-22 08:08:32 -08:00
  • dd8c3a7fb2 [ROCm][CI] Fix realtime test timeouts caused by aiter JIT compilation delays (#35052) Andreas Karatzas 2026-02-22 04:07:18 -06:00
  • a8a47c17b6 [ROCm][CI] Fix flaky embedding chat test by using tolerance-based comparison (#35050) Andreas Karatzas 2026-02-22 03:03:44 -06:00
  • 40f88d8318 [Bugfix] Fix Qwen3/Qwen3.5 Reasoning Parser (#34779) Roger Wang 2026-02-21 23:15:35 -08:00
  • 2cbf9656ce [Model Runner V2] Enable CUDA graph for Eagle3 (#35040) Woosuk Kwon 2026-02-21 21:42:50 -08:00
  • 30132cd144 Fix apply_top_k_top_p_triton called by non-cuda logits Tensor (#35030) Xiao Li 2026-02-21 21:11:54 -08:00
  • cbd95a2dd1 [Benchmark] Use sns.relplot for plotting (#35027) Cyrus Leung 2026-02-22 12:26:48 +08:00
  • 970861ac0c [New Model] Add ColModernVBERT (#34558) Athrael Soju 2026-02-22 04:23:41 +00:00
  • d24bdd7c4b [CI] Bump mteb version to mteb[bm25s]>=2, <3 for pooling model unit tests (#34961) Wentao Ye 2026-02-21 23:23:24 -05:00
  • d403c1da1c [CI] Stabilizing ROCm amd-ci signal and minor name fix in upstream (#35008) Andreas Karatzas 2026-02-21 22:01:10 -06:00
  • b71fbd06e2 [Model Runner V2] Support attention group (#35036) Woosuk Kwon 2026-02-21 16:42:53 -08:00
  • 74d90b1ce4 [Model Bash][DSR1] Add selective dynamic shape marking for CustomOp (#34900) Vadim Gimpelson 2026-02-22 04:28:01 +04:00
  • a4047d4ea9 [Model Runner V2] Support Eagle3 (no CUDA graph) (#35029) Woosuk Kwon 2026-02-21 12:55:24 -08:00
  • 965fe45935 [CI/Build] Fix gRPC version mismatch (#35013) Cyrus Leung 2026-02-22 03:14:41 +08:00
  • 98b0205c3c [Frontend] Add automatic language detection for Whisper transcription (#34342) Roman 2026-02-21 13:49:41 +01:00
  • 272b535ab3 [Bugfix] Gate 256-bit instructions to CUDA 12.9+ (#34791) Huy Do 2026-02-21 04:48:14 -08:00
  • f74f1572ca [Benchmark] Improve benchmarks (#35012) Cyrus Leung 2026-02-21 18:31:58 +08:00
  • bebfe55b1c [Doc] Fix example of eagle3 (#34960) petrpechman 2026-02-21 10:57:53 +01:00
  • 820d7815eb [Core] Minor structured-output related scheduler optimization (#34765) Nick Hill 2026-02-21 01:38:28 -08:00
  • ab6f3487a6 [PD] Change kv_load_failure_policy Default from "recompute" to "fail" (#34896) Nicolò Lucchesi 2026-02-21 10:34:57 +01:00
  • 8dc8a99b56 [ROCm] Enable bitsandbytes quantization support on ROCm (#34688) BADAOUI Abdennacer 2026-02-21 09:34:55 +01:00
  • 2aab2bb543 [ROCM] Optimize ROCM_AITER_FA spec decode eagle performance (#34541) jennyyyyzhen 2026-02-20 20:32:05 -08:00
  • 54254f7a61 [ROCm][CI] Fix spec decode logprobs flakiness and parametrize tree attention backends (#34599) Andreas Karatzas 2026-02-20 22:25:23 -06:00
  • cf93c1a128 [ROCm][AITER] Fix aiter paged_attention_v1 decode for sliding window and head_size < 64 (#34570) Andreas Karatzas 2026-02-20 22:25:07 -06:00
  • 89358f0d35 [CI] Fix ColBERT HF comparison tests on AMD CI + refactor (#34567) Andreas Karatzas 2026-02-20 22:12:05 -06:00
  • a0fe7ea2f0 [feat] Add per-block extra_keys to KV events (#33304) zhongdaor-nv 2026-02-20 21:11:40 -07:00
  • 991d6bff38 [CI][MCP][Harmony] Heavy refactoring Harmony & MCP response tests and stabilizing with deterministic test infrastructure (#33949) Andreas Karatzas 2026-02-20 22:03:32 -06:00
  • 5719a4e4e6 [Frontend] Support multimodal inputs for late-interaction scoring (ColQwen3) + NewModel: nvidia/nemotron-colembed (#34574) Kata Coder 2026-02-21 13:01:40 +09:00
  • 11be2c74dc [Realtime] Add Qwen3-ASR realtime streaming support (#34613) pougetat 2026-02-20 19:59:42 -08:00
  • 7a5adad480 [Kernel] Optimize sample_recovered_tokens_kernel (#34974) Xin Yang 2026-02-20 19:59:06 -08:00
  • 59c6233297 Support prompt_embeds for pooling requests in output processor (#34904) Li 2026-02-20 22:57:38 -05:00
  • d38cd3dde5 [Misc] Fix mypy errors in vllm/profiler and remove from exclude list (#34959) Taneem Ibrahim 2026-02-20 21:56:33 -06:00
  • ded333fb9b [ROCm][Bugfix]: Only save unpadded sizes for shared_experts in MoERunner to fix rmsnorm pad fusion (#34636) Rohan Potdar 2026-02-20 21:56:16 -06:00
  • 9d7577b2bd [Kernel] [Helion] [9/N] Canonicalize GPU variant names to base model names (#34928) Yanan Cao 2026-02-20 19:55:51 -08:00
  • e739c29ea4 [CI/Build] Add opentelemetry libs in default vllm build (requirements/common.txt) (#34466) Vlad Tiberiu Mihailescu 2026-02-20 19:54:55 -08:00
  • a55caf6ae9 [LoRA] Support Quantized Adapters (#30286) yugong333 2026-02-20 19:54:35 -08:00
  • 0e22cd618b Revert "[Llama4,Quantization] Simplify and generalize logic for Q/K permutations in quantized self-attn layers " (#34997) Lucas Wilkinson 2026-02-20 20:19:19 -05:00
  • ea5f903f80 Bump Flashinfer Version and Re-enable DeepSeek NVFP4 AR+Norm Fusion (#34899) Wei Zhao 2026-02-20 16:37:31 -05:00
  • 0632ed8778 [AMD][CI] Fix test_custom_allreduce for A100 testgroup (#34735) Ryan Rock 2026-02-20 15:33:04 -06:00
  • aaefc58ee0 [CI] Revert PRs 34818 and 33600 (#34979) Lucas Wilkinson 2026-02-20 16:25:50 -05:00
  • f24b2de3d3 [Test] Add FP8 KV Cache Testing for MLA Backends (#34473) Wei Zhao 2026-02-20 13:51:58 -05:00
  • fac1507f03 [CI] Remove failing prime-rl integration test (#34843) Michael Goin 2026-02-20 13:17:42 -05:00
  • f863994084 [compile] Fix torch.compile time discrepancy in logging. (#34912) Zhengxu Chen 2026-02-20 11:47:14 -05:00
  • e4a5d8c653 [compile] Move torch_aot_compile directory under torch_compile_cache (#34831) Zhengxu Chen 2026-02-20 11:46:45 -05:00
  • a6d0299c75 [Kernel] [Helion] [6/N] Add num_tokens dimension to silu_mul autotuning and dispatching (#34185) Yanan Cao 2026-02-20 08:36:51 -08:00
  • 6ce80f7071 Ensure that MkDocs v2 does not get installed (#34958) Harry Mellor 2026-02-20 15:38:11 +00:00
  • 1fe462168c [perf] Avoid dtype promotion sync in mamba_get_block_table_tensor (#34870) Huamin Li 2026-02-20 06:21:56 -08:00
  • ed31a020ee [Refactor] Extract Harmony streaming SSE event builders into streaming_events.py (#34909) Flora Feng 2026-02-20 09:20:46 -05:00
  • f9ac19204f [V0 Deprecation] Remove unused MM placeholders in request output (#34944) Cyrus Leung 2026-02-20 22:19:23 +08:00
  • 59965affbd [BUGFIX] Fix _dummy_run missing prepare_inputs_event synchronization (#34866) Vadim Gimpelson 2026-02-20 17:54:27 +04:00
  • b1c4f0b265 [Kernel] Optimize grouped topk kernel (#34206) Xin Yang 2026-02-20 01:34:45 -08:00
  • 8de7c636cc [Bugfix][Hardware][AMD] Fix ROCM_AITER_FA speculative decoding support (#32877) Kevin McKay 2026-02-20 00:25:46 -06:00
  • 059779231f [Minor] Add logging when using MXFP4 MXFP8 TRTLLM backend (#34916) Frank Wang 2026-02-19 22:07:57 -08:00
  • ea37530b47 [Models] LFM2: Support LoRA (#34921) tianshu-Michael-yu 2026-02-19 22:07:23 -08:00
  • f5432e35a3 [ROCm][CI] Loosen RemoteOpenAIServer Startup Timeout (#34922) Micah Williamson 2026-02-19 23:37:49 -06:00
  • 07cab212f0 [Misc] Add deprecated environment variable utilities (#33677) 杨朱 · Kiki 2026-02-20 13:33:25 +08:00
  • 0c1dc42748 [CI][AMD][BugFix][P/D] Add default_vllm_config to test_moriio_connector.py so tests pass (#33739) rasmith 2026-02-19 23:32:40 -06:00
  • 676f82ae81 Add validation to reject non-text content in system messages (#34072) Varun Chawla 2026-02-19 21:30:33 -08:00
  • 81bfc21a6a [Model Bash]: Improve FP8 Oracle for Config Specific Kernel Selection (#34260) Elizabeth Thomas 2026-02-19 23:29:08 -06:00
  • 4e2c7caf2d [Bugfix] Add regression test for MoE quant_config under torch.compile (#34335) Matthias Gehre 2026-02-20 06:27:26 +01:00
  • d9e62c03eb [Quark] Fix MoE fp8 activation scale handling on mi300 (#34386) Bowen Bao 2026-02-19 21:27:14 -08:00
  • a1a2d79442 [ci] Use the right tag for CPU arm64 image (#34915) Kevin H. Luu 2026-02-19 19:59:15 -08:00
  • ac900c89bb [Refactor] Implement output type check in LLM (#34794) Cyrus Leung 2026-02-20 11:57:55 +08:00
  • 76df6072ff [Core] Fix state names in pause_scheduler() (#34840) Mark McLoughlin 2026-02-20 01:21:46 +00:00
  • 16f24e8797 [CI] Add GPT-OSS Eval job for H100 (#34359) Michael Goin 2026-02-19 20:14:54 -05:00
  • 40b2f1c3d9 [Model Runner V2] Minor CPU optimizations (#34856) Nick Hill 2026-02-19 16:05:37 -08:00
  • 648951a9c3 [Bugfix] Fix benchmark_fused_collective crash on CustomOp init (#34665) Mayank Ketkar 2026-02-19 16:01:00 -08:00
  • f72061a19a [UX] More descriptive reasons in is_supported_config for MoE (#34908) Michael Goin 2026-02-19 18:20:52 -05:00
  • 662205d34e [Bugfix] Fix Basic Models Test (#34818) Matthew Bonanni 2026-02-19 17:49:07 -05:00
  • 4fb8beefaa [Bugfix] Fix cutlass fp8 kernel on hopper for Qwen3.5 (#34914) Roger Wang 2026-02-19 13:34:55 -08:00
  • 304319c4ed Change targets for AMD build in the "CI" pipeline (#34918) Alexei-V-Ivanov-AMD 2026-02-19 15:26:53 -06:00
  • c683d11c94 [Refactor] Deprecate head_first for chunk_gated_delta_rule (#34263) Wentao Ye 2026-02-19 13:23:49 -05:00
  • 3eff45d793 Revert "[NemotronH] Do not force router to run in fp32 (#34582)" (#34808) roikoren755 2026-02-19 19:47:05 +02:00
  • 4685a630a2 [Model Bash][DeepSeekR1] Remove Shared Expert Clone (#34344) Robert Shaw 2026-02-19 10:56:14 -05:00
  • ee1d25f199 [Llama4,Quantization] Simplify and generalize logic for Q/K permutations in quantized self-attn layers (#34471) Eldar Kurtić 2026-02-19 16:55:41 +01:00
  • 6fff24f30f [Bugfix] Qwen3.5 kv-scale weight remapping (#34719) Linda 2026-02-19 13:13:37 +01:00
  • 23210a911e [CI/Build] Try to make beam search test less flaky (#34885) Cyrus Leung 2026-02-19 19:16:58 +08:00
  • 1391378861 [Bugfix] Fix edge case in UUID data parsing (#34884) Cyrus Leung 2026-02-19 18:24:30 +08:00
  • f6220f9877 [ROCm][Test] Fix beam search determinism failures from batch-size-dependent FP divergence and removed wrong marker (#34878) Andreas Karatzas 2026-02-19 02:25:26 -06:00
  • 2df2bb27b0 [ROCm][CI] Removing all blocking labels from MI355 until stable infra (#34879) Andreas Karatzas 2026-02-19 01:53:08 -06:00
  • f75b61a9e9 [Voxtral Realtime] Fix engine crash on empty multimodal embeddings (#34862) Tal Nir 2026-02-19 02:21:47 -05:00
  • 7f51e93864 [Bug] Fix DeepSeek V3 weight loading caused by incorrect prefix (#34876) Wei Zhao 2026-02-19 02:20:30 -05:00
  • 4611af1663 [Bugfix] Add Quant Config to Llava Next Projector (#34847) Alex Brooks 2026-02-19 00:18:23 -07:00
  • ad5aa6bd9f fix(docs): fix typos in comments and docstrings (#34836) Manrique Vargas 2026-02-19 02:17:41 -05:00
  • 9681068cf9 [Frontend] Fix reasoning_tokens for text-based parsers in Responses API (#33513) Jaeyeon Kim(김재연) 2026-02-19 08:16:41 +01:00
  • b6101d384d Deprecate test-pipeline.yaml (#34864) Kevin H. Luu 2026-02-18 18:15:27 -08:00
  • 5fcb0cdd68 [Model Runner V2] Use FP32 for Gumbel Noise (#34854) Woosuk Kwon 2026-02-18 17:07:37 -08:00
  • c878b43b64 [Model Runner V2] Remove unnecessary copies in PW CUDA graph capture (#34849) Woosuk Kwon 2026-02-18 15:52:50 -08:00
  • 2b84ac669c [CI][AMD][BugFix] Use torch.testing.assert_close instead of assert torch.allclose in test_rocm_skinny_gemms.py (#34181) rasmith 2026-02-18 17:10:19 -06:00
  • 11d3976b88 [Model Runner V2] support piecewise & mixed cudagraph (#32771) zhrrr 2026-02-19 07:03:17 +08:00
  • 40da9625a1 [MoE Refactor] Convert mxfp4 marlin into modular kernel format (#34588) Yongye Zhu 2026-02-18 17:37:14 -05:00
  • 8d9babd4de Fix empty tool_call_id in Anthropic messages API tool result conversion (#34745) Flora Feng 2026-02-18 17:31:59 -05:00
  • e99ba957ec [BUG] Fixing Weight Sync unit test (#34841) Aaron Hao 2026-02-18 14:20:10 -08:00
  • 64ac1395e8 [Docs] Clean up speculators docs (#34065) Kyle Sayers 2026-02-18 16:48:11 -05:00
  • 61cf087680 [Bugfix] Fix lora tests (#34834) Cyrus Leung 2026-02-19 05:22:31 +08:00
  • 847a57cd12 [Bugfix][MoE Kernel] Fix incorrect routing selection for models without expert groups (e.g., MiniMax-M2.1) (#34673) Wenlong Wang 2026-02-18 13:03:24 -08:00
  • fcd6ac97ed [CI][AMD][BugFix] Skip tests in test_unquantized_backend_selection that should not run on ROCm (#34655) rasmith 2026-02-18 14:00:40 -06:00
  • 95be2a7f22 [Model Runner V2] Minor simplification for DCP (#34786) Woosuk Kwon 2026-02-18 11:04:53 -08:00
  • 0e60c925cf [Bugfix] Remove assert causing hipErrorStreamCaptureUnsupported (#34455) Jaden Mathias 2026-02-18 13:54:54 -05:00