Commit Graph

  • bcc6f67447 [Bugfix] Use null block (0) for padded block table entries (#35431) SandishKumarHN 2026-03-30 14:02:51 -07:00
  • 1fc69f59bb [Bug fix][Quantization] Fix dummy weight loading (#38478) Asaf Gardin 2026-03-30 23:38:02 +03:00
  • d9c7db18da [ROCm][CI] Pin test_hybrid test to TRITON_ATTN on ROCm (#38381) Micah Williamson 2026-03-30 15:26:46 -05:00
  • 12701e8af2 [EPLB] Optmize eplb mapping and record in router for prefill (#36261) Ilya Markov 2026-03-30 21:48:33 +02:00
  • a26e8dc7ff [Bugfix][MLA] Change default SM100 MLA prefill backend back to TRT-LLM (#38562) v0.18.1 Matthew Bonanni 2026-03-30 12:51:24 -04:00
  • 599e7359a3 release push khluu 2026-03-30 12:41:59 -07:00
  • 494636b29d [Feat][Spec Decode] DFlash (#36847) Benjamin Chislett 2026-03-30 15:03:15 -04:00
  • ab1a6a43fa [3/n] Migrate cutlass/scaled_mm_entry.cu torch stable ABI (#37221) mikaylagawarecki 2026-03-30 14:20:13 -04:00
  • d0cf73ce42 [release] Move the rest of release jobs to release queue (#38044) Kevin H. Luu 2026-03-24 16:40:58 -07:00
  • f0a5c5973b Add Ubuntu 24.04 support for Docker builds (#35386) amey asgaonkar 2026-03-24 13:34:44 -07:00
  • b7e4b88987 [release] Move agent queue to Release cluster queues (#37783) Kevin H. Luu 2026-03-23 20:36:47 -07:00
  • b5e608258e [Refactor] Unify engine process monitoring in engine manager and add Ray backend support (#35862) fangyuchu 2026-03-31 01:16:09 +08:00
  • 2c734ed0e0 [Bugfix][MLA] Change default SM100 MLA prefill backend back to TRT-LLM (#38562) Matthew Bonanni 2026-03-30 12:51:24 -04:00
  • 3b1dbaad4e [HMA]Fix corner case when hybrid page_size can not be evenly divided issue (blk_size=64,tp=4) (#37467) Chendi.Xue 2026-03-30 11:47:30 -05:00
  • b4a2f3ac36 [NVIDIA] Bugfix NVFP4 DGX Spark and RTX50 (#38423) Johnny 2026-03-30 18:36:18 +02:00
  • 8e6293e838 [Mamba] Add stochastic rounding support (#35753) roikoren755 2026-03-30 19:33:49 +03:00
  • dbdd9ae067 [ROCm][Bugfix] fix exception related to trust_remote_code for MiniMax-M2.1-MXFP4 (#37698) Hongxia Yang 2026-03-30 11:49:23 -04:00
  • e8b055a5ac [Bugfix] Handle ParallelLMHead in compressed-tensors get_quant_method (#37291) Matthias Gehre 2026-03-30 16:30:52 +02:00
  • 246dc7d864 [Misc] Add @tomeras91 as a maintainer of Nemotron related code + mamba block (#38547) tomeras91 2026-03-30 16:12:17 +03:00
  • 7c3f88b2a8 [Bugfix] Remove false-positive format mismatch warnings in FLA ops (#38255) Thomas Parnell 2026-03-30 14:32:26 +02:00
  • 6557f4937f [Bugfix][CPU] Skip set_num_threads after thread binding (#38535) Li, Jiang 2026-03-30 20:13:00 +08:00
  • 677424c7ac [Core][CI] Add opt-in media URL caching via VLLM_MEDIA_CACHE (#37123) Andreas Karatzas 2026-03-30 06:58:53 -05:00
  • 1031c84c36 Fix ambiguous num_blocks for hybrid attn mamba (#37236) Collin McCarthy 2026-03-30 04:09:45 -07:00
  • 7e76af14fa [Bugfix][Frontend] Return 400 for corrupt/truncated image inputs instead of 500 (#38253) aliialsaeedii 2026-03-30 12:26:46 +02:00
  • 3683fe6c06 [Bugfix] Fix shared-object aliasing in n>1 streaming with tool calls (#38158) yzong-rh 2026-03-30 06:12:13 -04:00
  • cc06b4e86b [Mamba][Bugfix] Raise on insufficient cache blocks instead of silently capping cudagraph sizes (#38270) Nicolò Lucchesi 2026-03-30 11:41:50 +02:00
  • 03ac6ca895 [ROCm] [DOC] Update the Documentation to include ROCm Nightly Wheel support (#38457) TJian 2026-03-30 17:25:46 +08:00
  • 90b29e5302 [CI] Fix Ernie4.5-VL initialization test (#38429) haosdent 2026-03-28 22:43:24 +08:00
  • a08b7733fd [CI] Fix SPLADE pooler test broken by #38139 (#38495) haosdent 2026-03-30 15:48:33 +08:00
  • 85c0950b1f [ROCm] Enable MORI EP for unquantized MoE with AITER backend (#37529) Tan Pin Siang 2026-03-30 15:19:33 +08:00
  • 57861ae48d (security) Fix SSRF in batch runner download_bytes_from_url (#38482) Juan Pérez de Algaba 2026-03-30 09:10:01 +02:00
  • ac30a8311e [Bugfix][Model] Fix PixtralForConditionalGeneration LoRA (#36963) Jee Jee Li 2026-03-30 14:59:42 +08:00
  • 63babd17f1 [Model][Quantization] Add GGUF support for MiniMax-M2.1 (#36965) PikaPikachu 2026-03-30 14:24:06 +08:00
  • fec5aeca12 [ci] Soft fail and disable retry for AMD build image job (#38505) Kevin H. Luu 2026-03-29 23:05:26 -07:00
  • d816834c1a [MoE] Add RoutingMethodType.Simulated to TRT-LLM FP8/NVFP4 kernel allowlists (#38329) Jaewon 2026-03-29 22:53:43 -07:00
  • 92f0db57a8 [Misc] Always use forward_mulmat for Conv3d on newer versions of torch. (#38487) Roger Wang 2026-03-29 22:39:41 -07:00
  • bea23536f6 [CI] Add temperature=0.0, reduce max_tokens, and add debug prints to audio_in_video tests (#38492) Andreas Karatzas 2026-03-30 00:36:45 -05:00
  • c133f33746 Add @ZJY0516 to CODEOWNERS (#38497) Jiangyun Zhu 2026-03-30 12:10:00 +08:00
  • a6db99ba02 [Bugfix] Support multi-type params parsing for DeepSeek v3.2 (#33703) Stanislav Kirillov 2026-03-30 06:07:28 +02:00
  • 4f2ed5fddb [ROCm][CI] Enable hybrid chunked prefill test (#38317) Andreas Karatzas 2026-03-29 21:30:26 -05:00
  • d28d86e8a3 [QeRL] Fix online quantized reloading (#38442) Kyle Sayers 2026-03-29 16:56:41 -04:00
  • 995dea1354 [Perf] Remove redundant device copies for CPU-only pooling token IDs, 48.9% E2E throughput improvement (#38139) Wentao Ye 2026-03-29 14:12:50 -04:00
  • 8c0b6267d7 [Transformers v5] fix missing pixtral/voxtral multimodal dispatch (#38410) allgather 2026-03-29 02:59:06 -07:00
  • a45d96ff42 [CI] Skip failing test (#38369) Nicolò Lucchesi 2026-03-27 21:25:19 +01:00
  • 7693c8eabf Fix attribute error in isaac_patch_hf_runner (#37685) Harry Mellor 2026-03-20 14:49:40 +00:00
  • 43cc5138e5 [ROCm][CI] Fix cross-attention dispatch for encoder-decoder models (#38450) Andreas Karatzas 2026-03-29 00:08:03 -05:00
  • 5b8c30d62b [Spec Decode, BugFix] Propagate norm_before_fc from Eagle3 speculator (#38111) Shubhra Pandit 2026-03-28 20:42:06 -04:00
  • d39b8daf5f [Feature] Add Qwen3-ForcedAligner support via token classification pooling (#35367) haosdent 2026-03-29 08:27:52 +08:00
  • fafca38adc [BugFix][Frontend] apply task instruction as system prompt in cohere v2/embed (#38362) Walter Beller-Morales 2026-03-28 14:30:54 -04:00
  • aa4eb0db78 [CI]revert initialize_model context manager (#38426) Kunshang Ji 2026-03-29 00:56:50 +08:00
  • af89140efc [ROCm][CI] Fix UV install in Dockerfile.rocm to detect curl failures and retry (#38415) Andreas Karatzas 2026-03-28 11:47:42 -05:00
  • b2bc736b12 [CI] Fix Ernie4.5-VL initialization test (#38429) haosdent 2026-03-28 22:43:24 +08:00
  • 58c959a767 [Misc]: clean up non-core lint issues (#37049) whyiug 2026-03-28 22:28:16 +08:00
  • bda3eda82d [Bugfix] Disallow renderer_num_workers > 1 with mm processor cache (#38418) Bvicii 2026-03-28 06:32:52 -07:00
  • 2bf5b70ae8 [CI Bugfix] Pre-download missing FlashInfer headers in Docker build (#38391) Michael Goin 2026-03-28 14:09:00 +01:00
  • 6dad4c5722 [Test] Fix flaky race condition in test_abort_final_step (#38414) yzong-rh 2026-03-28 05:06:56 -04:00
  • 171775f306 Fix Device Index for ROCm Ray Workers in MoE Benchmark (#38108) Liwen 2026-03-28 08:27:11 +00:00
  • 58a249bc61 [ROCm] [Release] Update ROCm variant from rocm700 to rocm721 (#38413) TJian 2026-03-28 14:07:03 +08:00
  • 148a5c1226 [Bugfix]fix output Nan/Inf in marlin if dtype=float16 (#33972) IriKa 2026-03-28 07:36:08 +08:00
  • b69bf2f0b1 [Perf] Use torch compile to fuse pack topk in trtllm moe (#37695) Wei Zhao 2026-03-27 19:30:46 -04:00
  • 88149b635e Add nvidia h800 moe config (#31201) rongfu.leng 2026-03-28 07:28:48 +08:00
  • 83a4df049d [ROCm][Documentation] update quickstart and installation to include rocm nightly docker tips (#38367) Hongxia Yang 2026-03-27 19:20:19 -04:00
  • 731285c939 [ROCm][CI/Build] ROCm 7.2.1 release version; torch 2.10; triton 3.6 (#38252) Gregory Shtrasberg 2026-03-27 18:03:12 -05:00
  • 97d19197bc [NVIDIA] Fix DGX Spark logic (#38126) Johnny 2026-03-27 23:26:07 +01:00
  • 7624525bf6 cherry-pick [Bugfix] Restore prepare_fp8_layer_for_marlin removed by merge conflict resolution Vadim Gimpelson 2026-03-27 14:35:05 -07:00
  • d1b4f10b19 cherry-pick [CI Bugfix] Pre-download missing FlashInfer headers in Docker build Michael Goin 2026-03-27 14:34:30 -07:00
  • 384e4d5f48 [Model Runner V2] Rebuild attention metadata before eagle decode full… (#38311) Giancarlo Delfin 2026-03-27 13:46:42 -07:00
  • 44a6528028 [CI] Skip failing test (#38369) Nicolò Lucchesi 2026-03-27 21:25:19 +01:00
  • 648edcf729 [QeRL] Compose online quantization with quantized reloading (#38032) Kyle Sayers 2026-03-27 16:22:33 -04:00
  • 7ba425e916 Add short flag -sc for --speculative-config argument (#38380) Michael Goin 2026-03-27 20:04:22 +01:00
  • b8665383df [ROCm] Fix GPT-OSS import for triton 3.6 (#37453) Gregory Shtrasberg 2026-03-27 13:00:57 -05:00
  • 0e9358c11d {ROCm]: gpt-oss fusion/padding fixes (#38043) Rohan Potdar 2026-03-27 11:19:15 -05:00
  • 21d2b53f88 Remove need for explicit \n in docstring lists for --help formatting (#38350) Harry Mellor 2026-03-27 15:38:00 +00:00
  • 98e7f223b9 enable skipping of SW attention layers when using FP8 KV cache (#33695) Jonas M. Kübler 2026-03-27 14:25:02 +01:00
  • b111f8a61f fix(security): Add VLLM_MAX_N_SEQUENCES environment variable and enforce limit (#37952) Juan Pérez de Algaba 2026-03-27 14:02:10 +01:00
  • 497e234d38 [EPLB] Cleanup the transfer logic for the various eplb maps (#34520) Sage Moore 2026-03-27 02:18:46 -07:00
  • 6287e7fa20 [P/D] Mooncake: Add unit tests and minor fixes for mooncake connector (#36946) dtc 2026-03-27 16:26:40 +08:00
  • 84e439a9cb [CI/Build] Move nightly wheel index generation to a single post-build step (#38322) Shengqi Chen 2026-03-27 15:44:18 +08:00
  • a1746ff9ec [Doc] Clarify Helm chart location in deployment guide (#38328) Yuichiro Utsumi 2026-03-27 16:43:02 +09:00
  • aee4c14689 [Bugfix] Fix Hermes tool parser when stream interval > 1 (#38168) Flora Feng 2026-03-27 02:42:26 -04:00
  • 0ae89f18fd [Refactor] Move FusedMoE hidden_size roundup to quant_method (#34285) Bowen Bao 2026-03-26 23:38:26 -07:00
  • c2b17d71af [CI] Add xpu auto-label rule for Intel GPU/XPU PRs (#38320) wenjun liu 2026-03-27 14:22:38 +08:00
  • becaed6ec8 [CPU] Support CT W4A16 on CPU MP kernel (#38219) Li, Jiang 2026-03-27 14:15:28 +08:00
  • a8eab8f30d [Model] Extract GatedDeltaNetAttention into shared layer for Qwen3Next and Qwen3.5 (#37975) Xiaoshuang Wang 2026-03-27 14:13:21 +08:00
  • 2babac0bed [frontend] dump openai responses type by alias (#38262) cjackal 2026-03-27 14:58:20 +09:00
  • 7cc302dd87 [kv_offload+HMA][7/N]: Support register_kv_caches for hybrid models (#37853) Or Ozeri 2026-03-27 08:38:33 +03:00
  • 999dfc1622 [Bugfix] Offload blocking tokenizer ops to shared thread pool to unblock event loop (#34789) Bvicii 2026-03-26 22:17:00 -07:00
  • d86060122a [CI/Build] enable Intel XPU test flow with prebuilt image (#37447) wenjun liu 2026-03-27 09:16:04 +08:00
  • f73bcb1c51 Various Transformers v5 config fixes (#38247) Harry Mellor 2026-03-26 23:06:59 +00:00
  • 28048bd6b0 [Bugfix] Add missing f-string prefix in xgrammar choices error message (#38162) yzong-rh 2026-03-26 17:43:03 -04:00
  • c32e97602d [Model Runner V2] Enable forcing a specific acceptance rate during rejection sampling (#38045) Giancarlo Delfin 2026-03-26 13:38:12 -07:00
  • 0904b6550d Fix multi-node allreduce fusion (#38136) Wei Zhao 2026-03-26 16:24:36 -04:00
  • f26fcdfb9e [Bugfix][ROCm] Fix lru_cache on paged_mqa_logits_module (#37547) Stig-Arne Grönroos 2026-03-26 21:01:05 +02:00
  • bc9c6fbbe6 [ROCm] [Bugfix] [Release] Fix nightly rocm release pipeline (#38263) TJian 2026-03-27 02:47:10 +08:00
  • bff9a1c266 [ROCm][CI] Override PYTORCH_ROCM_ARCH with detected GPU arch in test containers (#38165) Andreas Karatzas 2026-03-26 13:33:45 -05:00
  • db01535e2b [ROCm][CI] Add uv pip compile workflow for rocm-test.txt lockfile (#37930) Andreas Karatzas 2026-03-26 12:44:01 -05:00
  • a4cf9b22ba [ROCM][Bugfix] Use correct stride in cp_mha_gather_cache_kernel for hybrid model (#37228) (#37228) jennyyyyzhen 2026-03-26 10:33:39 -07:00
  • 9c3ae04bfe [ROCm][CI] Add LM Eval Qwen3.5 Models test for MI355 (#38155) Andreas Karatzas 2026-03-26 11:51:18 -05:00
  • a8e48a7b85 [CI] Fix conch kernel crash on 3D input by reshaping to 2D before GEMM (#38178) Andreas Karatzas 2026-03-26 11:46:03 -05:00
  • b9dbc5c4ab [Mamba][APC] Add test case to compare apc outputs (#34977) Divakar Verma 2026-03-26 12:40:35 -04:00