Commit Graph

  • 73f48ce559 [Kernel] [Helion] Use warning_once in get_gpu_name to prevent log spam (#38743) Yanan Cao 2026-04-01 21:30:31 -07:00
  • 3aab680e3e [ROCm][Bugfix] Fix ROCm runtime failure due to missing symbol (#38750) Gregory Shtrasberg 2026-04-01 23:30:11 -05:00
  • 5a2d420c17 [Bugfix] Use dedicated MM processor cache in /tokenize to prevent sender-cache pollution (#38545) Sergey Zinchenko 2026-04-02 07:14:49 +03:00
  • 5f96f9aff1 [Perf] DSV3.2 Indexer Fused Weights Projection (#38684) Benjamin Chislett 2026-04-01 23:34:49 -04:00
  • 694449050f Fix multiline-format string for python 3.10 (#38739) Luka Govedič 2026-04-01 23:19:35 -04:00
  • 6241521dd2 [BugFix] Fix precommit breakage due to conflicting in-flight merges (#38759) Nick Hill 2026-04-01 15:35:06 -07:00
  • 1785dc5501 Revert "[Bugfix] Fix Qwen3CoderToolParser anyOf/oneOf type resolution for nullable params (#37831)" (#38751) Kevin H. Luu 2026-04-01 15:34:28 -07:00
  • 54500546ac [Bugfix] Preserve original ImportError in gRPC server entrypoint (#38673) Chang Su 2026-04-01 15:16:44 -07:00
  • cfad6a509c Revert "[Bugfix] Restrict TRTLLM attention to SM100, fixing GB300 (SM103) hang (#38730)" khluu 2026-04-01 15:14:58 -07:00
  • de5e6c44c6 [Feat][Executor] Introduce RayExecutorV2 (#36836) Jeffrey Wang 2026-04-01 14:34:29 -07:00
  • cb268e4e55 [Refactor] Simplify FutureWrapper in MultiprocExecutor (#38644) yzong-rh 2026-04-01 17:28:26 -04:00
  • c284a6671c [Bugfix] Restrict TRTLLM attention to SM100, fixing GB300 (SM103) hang (#38730) v0.19.0rc1 Stefano Castagnetta 2026-04-01 21:08:40 +02:00
  • 3a30a1a6a8 [Misc] Rename think_start_str/think_end_str to reasoning_start_str/reasoning_end_str (#38242) Chauncey 2026-04-02 00:56:45 +08:00
  • 29982d48b3 (security) Enforce frame limit in VideoMediaIO (#38636) Juan Pérez de Algaba 2026-04-01 12:23:45 +02:00
  • 6183cae1bd [Bugfix] Restrict TRTLLM attention to SM100, fixing GB300 (SM103) hang (#38730) Stefano Castagnetta 2026-04-01 21:08:40 +02:00
  • c09ad767cd Feature/silu block quant fusion v1 (#32996) Monishver 2026-04-01 11:50:43 -07:00
  • c9a9db0e02 [Compile] Fix nvfp4 compile warning (#38573) Wentao Ye 2026-04-01 14:28:57 -04:00
  • cbe7d18096 [Misc] Rename think_start_str/think_end_str to reasoning_start_str/reasoning_end_str (#38242) Chauncey 2026-04-02 00:56:45 +08:00
  • db5d0719e1 [Kernel] Add MXFP8 to Marlin GEMM/MoE and refactor Mxfp8LinearOp (#34664) Michael Goin 2026-04-01 18:41:42 +02:00
  • dc0428ebb8 [NIXL][BUG] Fix Triton heterogeneous TP (#37940) yzong-rh 2026-04-01 11:23:15 -04:00
  • 148c2072ec Add ibm-granite/granite-vision-3.3-2b to supported models documentation (#38714) Jesus Talavera 2026-04-01 17:22:25 +02:00
  • 2f5c3c1ec0 [Misc] Fix docstring typo: buildin -> builtin (#38722) majianhan 2026-04-01 22:39:46 +08:00
  • fa246d5231 Fix shape comment in extract_hidden_states example (#38723) Fynn Schmitt-Ulms 2026-04-01 10:29:33 -04:00
  • 7cf56a59a2 [MoE Refactor] Make SharedExperts class for use with DefaultMoERunner (#35153) bnellnm 2026-04-01 09:44:08 -04:00
  • 5e30e9b9a9 [Bugfix] Revert "Zero-init MLA attention output buffers to prevent NaN from CUDA graph padding" (#38359) Elvir Crnčević 2026-04-01 15:11:10 +02:00
  • 582340f273 [Bugfix] Fix Qwen3CoderToolParser anyOf/oneOf type resolution for nullable params (#37831) 손세정 2026-04-01 21:22:29 +09:00
  • 992368522f [KVTransfer] Fix TpKVTopology.is_kv_replicated equality case (#38179) yjz 2026-04-01 18:41:49 +08:00
  • 58ee614221 (security) Enforce frame limit in VideoMediaIO (#38636) Juan Pérez de Algaba 2026-04-01 12:23:45 +02:00
  • f9f6a9097a Add verified label to trigger pre-commit (#38708) Harry Mellor 2026-04-01 10:31:02 +01:00
  • c75a313824 [Perf] triton bilinear_pos_embed kernel for ViT (#37948) Zhanda Zhu 2026-04-01 04:52:02 -04:00
  • 4f6eed3bd4 [Core] Simplify multimodal masking (#34246) Lukas Geiger 2026-04-01 09:18:22 +01:00
  • 1dbbafd3f3 [Feat][v1] Simple yet General CPU KV Cache Offloading (#37160) v0.19.0rc0 Yifan Qiao 2026-03-31 17:58:37 -07:00
  • 0ee3b7fc3d [Bugfix][MLA] Add logits size budget to sparse indexer prefill chunking (#36178) Lucas Wilkinson 2026-04-01 00:15:53 -04:00
  • 268bed9cf3 [Bugfix][Async] Fix async spec decoding with hybrid models (#38556) Matthew Bonanni 2026-03-31 11:08:54 -04:00
  • bcc0fdd0f3 [CI] fix LM Eval Qwen3.5 Models (B200) (#38632) Jiangyun Zhu 2026-03-31 21:20:08 +08:00
  • 69b8bd4b33 [CI Failure] pin colmodernvbert revision (#38612) wang.yuqi 2026-03-31 18:54:54 +08:00
  • 36d7f19897 [CPU] Support head_size 512 in cpu_attn (#38676) Li, Jiang 2026-04-01 13:42:27 +08:00
  • 2d725b89c5 [Bugfix] Lazy import diskcache to avoid sqlite3/libstdc++ ImportError at startup (#38649) Jeffrey Wang 2026-03-31 22:31:20 -07:00
  • ef53395e2c [bugfix] do not add extra linebreak for score/rerank with chat template (#38617) Augusto Yao 2026-04-01 12:50:07 +08:00
  • eb47454987 [Bugfix][MLA] Add logits size budget to sparse indexer prefill chunking (#36178) Lucas Wilkinson 2026-04-01 00:15:53 -04:00
  • 116f4be405 [1/N][Cleanup] Standardize on use of is_quantized_kv_cache (#38659) Matthew Bonanni 2026-04-01 00:08:01 -04:00
  • 7b01d97a22 [Perf] Optimize mean pooling using chunks and index_add, 5.9% E2E throughput improvement (#38559) Wentao Ye 2026-03-31 23:54:58 -04:00
  • 17b72fd1c8 Fix priority preemption regression test in scheduler (#37051) HarshRathva 2026-04-01 09:06:12 +05:30
  • c49497726b [ROCm][perf] Shuffle KV cache to use paged_attention_common (#32914) Samu Tamminen 2026-04-01 06:30:19 +03:00
  • cb0b443274 [Misc] Add 20 regression tests for 11 tool parser bug fixes (#38172) Ben Browning 2026-03-31 23:00:31 -04:00
  • 40bb175027 [vLLM IR] 1/N Implement IR skeleton and rms_norm op (#33825) Luka Govedič 2026-03-31 22:15:05 -04:00
  • 0fab52f0aa Fix NaN from stale FP4 scale padding in create_fp4_scale_tensor (#38148) Elvir Crnčević 2026-04-01 04:14:59 +02:00
  • 91e4521f9f [Feat][v1] Simple yet General CPU KV Cache Offloading (#37160) Yifan Qiao 2026-03-31 17:58:37 -07:00
  • 31a719bcd3 [ROCm][perf] fix Aiter sparse MLA with MTP>1 (#37887) Stig-Arne Grönroos 2026-04-01 02:22:23 +03:00
  • 2e56975657 Generative Scoring (#34539) Vedant V Jhaveri 2026-03-31 16:02:11 -07:00
  • 36f1dc19ae feat(grpc): add periodic stats logging and servicer log forwarding (#38333) Chang Su 2026-03-31 15:50:07 -07:00
  • 3dc01ef352 [Quantization] Consolidate dummy format logic into DummyModelLoader (#38637) Asaf Gardin 2026-04-01 01:20:45 +03:00
  • cc671cb110 [Kernel] [Helion] [17/N] Add Helion kernel torch.compile support (#38592) Yanan Cao 2026-03-31 14:06:42 -07:00
  • 856589ed9a [Refactor] Remove dead code in kv connector and model runner (#38383) Wentao Ye 2026-03-31 17:05:23 -04:00
  • 517b769b58 [Perf] Fix DBO overlap: capture DeepEP event before yield (#38451) czhu-cohere 2026-03-31 13:38:59 -07:00
  • d9b90a07ac [MoE Refactor] Migrate Unquantized to Full Oracle Flow (#36286) yzong-rh 2026-03-31 15:43:33 -04:00
  • 598190aac3 [fix] Remove trtllm ragged mla prefills (#36540) Olya Kozlova 2026-03-31 21:30:27 +02:00
  • b779eb3363 [Model] Sync upstream BT=chunk_size fix for GDN chunk_fwd_kernel_o, simplify warmup to single pass (#38343) Xu Jinyang 2026-04-01 03:03:24 +08:00
  • 077a9a8e37 [torch.compile] Refactor Attention Quant Fusion Pass and Remove Boilerplate (#37373) BadrBasowid 2026-04-01 02:15:50 +08:00
  • 07edd551cc [CI/Build] Resolve a dependency deadlock when installing the test dependencies used in CI (#37766) Run Yu 2026-03-31 11:05:14 -07:00
  • 7c080dd3c5 [4/n] Migrate FP4/W4A8 CUTLASS kernels to torch stable ABI (#37503) mikaylagawarecki 2026-03-31 13:21:13 -04:00
  • 0dd25a44ea [Quantization][Autoround][XPU] Add W4A16 Support (#37986) Yi Liu 2026-04-01 00:48:24 +08:00
  • 3896e021a0 [Bugfix] Fix FusedMoE weight loading with padded hidden dimensions (#37010) SandishKumarHN 2026-03-31 09:22:26 -07:00
  • b6e636c12c [Fix] handle PaddleOCR-VL image processor max_pixels across Transformers v4/v5 (#38629) v0.18.2rc0 zhang-prog 2026-03-31 23:50:41 +08:00
  • f1ff50c86c [Bugfix] clamp dA_cumsum differences to prevent Inf in Mamba2 SSD kernels (#37501) Jingu Kang 2026-04-01 00:35:51 +09:00
  • 757068dc65 [Bugfix][Async] Fix async spec decoding with hybrid models (#38556) Matthew Bonanni 2026-03-31 11:08:54 -04:00
  • 7337ff7f03 [Docs] PD with Nixl compat matrix (#38628) Nicolò Lucchesi 2026-03-31 17:01:21 +02:00
  • 5869f69c5f [Online Quant] [QeRL] Minor code cleanup (#38574) Kyle Sayers 2026-03-31 10:56:43 -04:00
  • 4dfad17ed1 replace cuda_device_count_stateless() to current_platform.device_count() (#37841) wliao2 2026-03-31 07:32:54 -07:00
  • e8057c00bc [CI] Avoid concurrent docker pull in intel XPU CI runners to prevent rate limit issues (#38594) wenjun liu 2026-03-31 22:23:18 +08:00
  • 7430389669 [Bugfix][CI] Skip flaky test_eagle test (#38566) Nicolò Lucchesi 2026-03-31 15:42:37 +02:00
  • 202f147cf2 Fix MLA runs when use_inductor_graph_partition=True (#38631) ElizaWszola 2026-03-31 15:37:43 +02:00
  • ea7bfde6e4 [CI] fix LM Eval Qwen3.5 Models (B200) (#38632) Jiangyun Zhu 2026-03-31 21:20:08 +08:00
  • d71a15041f [XPU]move testing dependencies from Dockerfile to xpu-test.in (#38596) sihao_li 2026-03-31 20:49:43 +08:00
  • abdbb68386 [EPLB] Add alternative communication for EPLB weight exchange (#33176) Ilya Markov 2026-03-31 14:17:12 +02:00
  • 0c63739135 [EPD] update EPD script arguments (#36742) liuzhenwei 2026-03-31 20:02:09 +08:00
  • 719735d6c5 [CI Failure] pin colmodernvbert revision (#38612) wang.yuqi 2026-03-31 18:54:54 +08:00
  • aae3e688f8 Fix document of torchrun_example.py (#31113) Maosheng Liao 2026-03-31 18:54:23 +08:00
  • 7d65463528 [WIP][CI][Bugfix] Fix test_run_eagle_dp (#38584) Matthew Bonanni 2026-03-31 06:30:25 -04:00
  • 8278825b57 DOC: TPU mention fix (#38129) Mateusz Sokół 2026-03-31 12:27:56 +02:00
  • acf7292bf2 [Misc] Move --grpc CLI argument into make_arg_parser (#38570) Chang Su 2026-03-31 03:24:05 -07:00
  • ce884756f0 [Feature]: add presence_penalty and frequency_penalty fields to Responses API (#38613) Chauncey 2026-03-31 16:45:57 +08:00
  • d9d21eb8e3 [Frontend][3/n] Improve pooling entrypoints | scoring. (#28631) wang.yuqi 2026-03-31 15:52:00 +08:00
  • f09daea261 [CPU] Support int8 compute mode in CPU AWQ (#35697) Yintong Lu 2026-03-31 15:27:37 +08:00
  • 42318c840b [ci] Remove benchmarks job (#38611) Kevin H. Luu 2026-03-30 23:46:21 -07:00
  • 1ac6694297 [OOT] Add OOT support for linear kernel. (#37989) zhangyiming 2026-03-31 14:33:21 +08:00
  • 12449f9492 [Bugfix][CPU] Skip set_num_threads after thread binding (#38535) Li, Jiang 2026-03-30 20:13:00 +08:00
  • 6cc7abdc66 [kv_offload+HMA] Fix num_blocks with different per-layer page sizes and improve assert message (#38554) Kfir Toledo 2026-03-31 09:00:40 +03:00
  • d53cb9cb8e [Tool Parser][2/3] Use self.tools instead of request.tools in tool parsers (#38189) Flora Feng 2026-03-31 01:41:36 -04:00
  • 44eef0ca1e vLLM Benchmark Suite perf regression after PR#32723 (#38576) Louie Tsai 2026-03-30 22:23:17 -07:00
  • b9cdc85207 [ROCm][CI] Fix Whisper translation test attention backend selection (#38508) Andreas Karatzas 2026-03-31 00:21:49 -05:00
  • b92312dfd7 [CI] Fix SPLADE pooler test broken by #38139 (#38495) haosdent 2026-03-30 15:48:33 +08:00
  • 3e802e8786 [Mypy] Fix adjust_request typing (#38264) Flora Feng 2026-03-31 00:21:18 -04:00
  • 350af48e14 [KVConnector] Remove redundant method KVConnectorOutput::merge() (#38546) Martin Hickey 2026-03-31 05:11:02 +01:00
  • e31915063d [Bugfix] Fix for builtins (forward fix of pytorch/177558) (#37234) Lucas Kabela 2026-03-30 18:08:11 -07:00
  • 29e48707e8 [Refactor] Consolidate Tool type alias in tool_parsers/utils.py (#38265) Flora Feng 2026-03-30 20:55:51 -04:00
  • 4ac227222f [Bugfix][DCP] Fix CUDA graph capture for Decode Context Parallelism (#36070) sungsoo ha 2026-03-30 17:20:43 -07:00
  • bb51d5b40d Add @vadiklyutiy as committer (#38589) Vadim Gimpelson 2026-03-31 03:50:04 +04:00
  • 93b3ec1585 feat(attention): extract KV-cache update from FlashAttentionDiffKV ba… (#36466) Prathmesh Bhatt 2026-03-30 16:16:09 -07:00
  • e812bf70bd Restore non-hf processor path for Nano-Nemotron-VL (bypass call_hf_processor_mm_only) - fixes #38018 (#38567) Netanel Haber 2026-03-31 00:56:52 +03:00