Commit Graph

  • d7ff22204a [Misc] Add mooncake-transfer-engine to kv_connectors requirements (#34826) Teng Ma 2026-02-19 02:26:24 +08:00
  • c0bd8b13da [Bugfix] Redo Qwen3.5/Qwen3-Next GDN projector fusion (#34697) Isotr0py 2026-02-19 01:46:53 +08:00
  • caeb887bf6 [Bugfix] Fix NVFP4 TRTLLM MoE non-gated support; add gsm8k for Nemotron-3-Nano FP8+NVFP4 (#34725) Michael Goin 2026-02-18 12:39:22 -05:00
  • 6b3166a7c7 [CI][Bugfix] Fix multinode test script (#34820) Ilya Markov 2026-02-18 17:45:10 +01:00
  • 25e2e136ef [CI] temporarily disable multi-node tests (#34825) Robert Shaw 2026-02-18 11:32:44 -05:00
  • 6874638bc4 [Model Bash] DeepSeek R1 BF16 Min Latency QKV A GEMM (0.5% E2E Speedup) (#34758) Robert Shaw 2026-02-18 10:42:36 -05:00
  • e24663c5a9 Add unit tests for fp8 output fusion of triton_attn (#34228) Burkhard Ringlein 2026-02-18 12:22:49 +01:00
  • c50e105a88 [Model Runner V2] Avoid prepare prefill kernel launch overhead (#34780) Nick Hill 2026-02-18 00:49:21 -08:00
  • a766b30349 [Renderer] Deprecate code paths for old input processing (#34775) Cyrus Leung 2026-02-18 16:35:04 +08:00
  • 1faa8cb73c [Quantization] - Added uses_meta_device_weights to quant config (#34645) Asaf Joseph Gardin 2026-02-18 09:43:44 +02:00
  • e89a91d927 [Bugfix] fix activation in cpu_fused_moe_torch call (#34696) Marek Michalowski 2026-02-18 07:39:46 +00:00
  • 909b147197 [Bugfix] Fix prefix creation for Qwen3.5 (#34723) Michael Goin 2026-02-18 02:39:15 -05:00
  • a88b3be7c4 [Bugfix] Fix quant RMS norm fusion for quantization with TMA-aligned scales (#33255) ElizaWszola 2026-02-18 08:35:04 +01:00
  • a49ea5a58f [Model Runner V2] A bit more PP simplification (#34766) Nick Hill 2026-02-17 21:39:07 -08:00
  • 30ebe0dc3c [CI/Build] Remove use of skip_v1 (#34699) Cyrus Leung 2026-02-18 12:19:11 +08:00
  • cef65f0715 [ROCm][CI] Removed hard-coded attn backend requirement for Qwen VL (#34753) Andreas Karatzas 2026-02-17 21:59:53 -06:00
  • 6f3b2047ab [Core] Fix SSRF bypass via backslash-@ URL parsing inconsistency (#34743) Russell Bryant 2026-02-17 22:53:35 -05:00
  • 02e8f26cea [torch.compile] Turn on silu+fp4 quant fusion by default for O1+ (#34718) Luka Govedič 2026-02-17 22:29:15 -05:00
  • 4a00a511bb [BugFix] [Build] fix string literals comparison in indexer_k_quant_and_cache calling site (#34653) Hongxia Yang 2026-02-17 22:19:41 -05:00
  • a0d8d944e2 [Renderer] Move MM Hash parsing into Renderer (#34711) Cyrus Leung 2026-02-18 11:18:55 +08:00
  • df3f537a66 [CI] Remove unused precompiled wheel args from image build (#34767) Amr Mahdi 2026-02-17 18:58:18 -08:00
  • 7743152957 [Attention] Refactor check_and_update_config (#33600) Matthew Bonanni 2026-02-17 20:06:54 -05:00
  • ab33d2a629 [Feature] Decode Context Parallel support for GPU model runner v2 (#34179) Wentao Ye 2026-02-17 19:27:15 -05:00
  • be3af2d29e [Model Runner V2] Further simplification for PP (#34724) Woosuk Kwon 2026-02-17 15:18:18 -08:00
  • c656ba3b4d [Kernel] Triton-based Top-k and Top-p sampler kernels (#33538) Jongseok Park 2026-02-17 15:14:30 -08:00
  • dc5fa77a4e [Bugfix][MTP][Sparse MLA] Allow sparse MLA with MTP to run with FULL cudagraphs (#34457) Matthew Bonanni 2026-02-17 14:01:27 -05:00
  • 1e4a084c8e [CI] Fix flaky test_parsable_context (#34717) Flora Feng 2026-02-17 13:42:52 -05:00
  • 7967e854da [BugFix] Fix sp tests (#34716) Richard Zou 2026-02-17 12:07:56 -05:00
  • 6bd6d0c3c1 Fixed whisper CPU test that does not spawn properly. (#34324) almayne 2026-02-17 14:46:23 +00:00
  • 8e962fef5f [CI][Nixl] Add CrossLayer KV layout tests (#34615) Nicolò Lucchesi 2026-02-17 14:35:40 +01:00
  • 574fe75245 [Renderer] Move InputPreprocessor into Renderer (2/2) (#34560) Cyrus Leung 2026-02-17 21:29:01 +08:00
  • c61a98f529 [CI][BugFix] ShellCheck cleanup to remove baseline and preserve runtime behavior (#34514) junuxyz 2026-02-17 21:22:56 +09:00
  • 28bffe9466 Fix docs build warning (#34686) Harry Mellor 2026-02-17 10:31:40 +00:00
  • ad65177a19 [Bugfix] Fix 'remove_instance_endpoint' method logic in disagg_proxy_demo (#32922) ChenqianCao 2026-02-17 18:06:53 +08:00
  • d44a5b6c47 Remove dead bitsandbytes CxB code from 8-bit inference path (#34633) Tim Dettmers 2026-02-17 04:49:14 -05:00
  • 1d65283e95 Revert "[Models] Fuse Qwen3.5 GDN's qkvz_proj and ba_proj" (#34683) Jiangyun Zhu 2026-02-17 17:29:27 +08:00
  • c464b57374 [Ray] Propagate third-party env vars to Ray workers via prefix matching (#34383) kourosh hakhamaneshi 2026-02-17 01:08:42 -08:00
  • c5c38e152a [CI] Fix bake config artifact path for AMI rebuild pipeline (#34656) Amr Mahdi 2026-02-16 22:39:44 -08:00
  • d00df624f3 [Model Runner V2] Minor refactoring for penalties (#34662) Woosuk Kwon 2026-02-16 21:43:00 -08:00
  • 9752da9d9c [Model Runner V2] Minor simplification for BadWordsState (#34669) Woosuk Kwon 2026-02-16 21:27:24 -08:00
  • 04925b2202 [Model Runner V2] Minor cleanup for PP (#34666) Woosuk Kwon 2026-02-16 19:15:31 -08:00
  • d74278fb67 [Model Runner V2] Fix unintended CPU-GPU sync in make_dummy (#34667) Woosuk Kwon 2026-02-16 19:00:29 -08:00
  • b68fd899d1 [Bugfix] Fix fused MoE int32 overflow in stride*offset without perf regression (#34507) haosdent 2026-02-17 09:58:49 +08:00
  • 0b5f9b7204 [CI] Enable mypy import following for vllm/v1/kv_offload (#34639) Aneesh Puttur 2026-02-16 20:58:15 -05:00
  • 9a8853f781 [Core] Pipeline Parallel support for Model Runner V2 (#33960) zhanqiuhu 2026-02-16 20:48:16 -05:00
  • 387a1898d9 [Model Runner V2] support bad_words sampling param (#33433) zhrrr 2026-02-17 08:36:06 +08:00
  • 3b30e61507 [NemotronH] Do not force router to run in fp32 (#34582) roikoren755 2026-02-16 20:15:32 +02:00
  • 824f9e8f3c Targeting the MI355 agent pool with all existing tests (#34629) Alexei-V-Ivanov-AMD 2026-02-16 11:02:27 -06:00
  • 6cc403e67d [Bugfix][CI] Fix flaky entrypoints/openai/test_response_api_with_harmony.py::test_function_calling[openai/gpt-oss-20b] (#34624) Nicolò Lucchesi 2026-02-16 17:11:07 +01:00
  • 72d5951d02 [Bugfix] Treat generation_config max_tokens as default not ceiling (#34063) Almog Tavor 2026-02-16 17:58:24 +02:00
  • a3205beffb [CI] Enable mypy coverage for individual excluded files (#34292) Lucas Kabela 2026-02-16 07:34:29 -08:00
  • 6930becd45 (bugfix): Fixed encode in LLM entrypoint for IOProcessr plugin prompts (#34618) Christian Pinto 2026-02-16 15:33:55 +00:00
  • 03a8770a6d [ROCm][CI] Fix plugins test group; updating terratorch and dependencies (#34589) Andreas Karatzas 2026-02-16 09:33:42 -06:00
  • bc56a1d56e [Bugfix] Fix ARC touch KeyError for non-ready T1 blocks in kv offload (#34576) Yiqi Xue 2026-02-16 07:33:19 -08:00
  • ec7d9e6745 Fix call to moe_mk in modelopt MoE modules (required for LoRA) (#34575) danisereb 2026-02-16 17:33:09 +02:00
  • 3bb4e4311c [Models] Fuse Qwen3.5 GDN's qkvz_proj and ba_proj (#34492) Isotr0py 2026-02-16 23:32:51 +08:00
  • 08f8c198ae [CI] Disable precompiled wheel path in CI image builds (#34606) Amr Mahdi 2026-02-16 07:14:43 -08:00
  • a21cedf4ff Bump lm-eval version for Transformers v5 compatibility (#33994) Harry Mellor 2026-02-16 14:24:35 +01:00
  • 3ef74cde5d [CI][Tracing] Fix race condition by adding server readiness check (#34364) emricksini-h 2026-02-16 13:57:39 +01:00
  • cd81cdb399 [Scheduler][ASR] Fix CrossAttn blocks per-request for Variable length encoder inputs (#31058) Ekagra Ranjan 2026-02-16 06:08:44 -05:00
  • 1e828573b4 [CI][Metrics] Stabilize tests with polling and subprocess guards (#34566) Andreas Karatzas 2026-02-16 04:52:02 -06:00
  • a5ccc85c8c [Bugfix] Fix Dynamo unexpected keyword argument (#34320) Samu Tamminen 2026-02-16 11:32:30 +02:00
  • b5475d0534 Revert "[Misc] fix qwen3.5 config" (#34610) Roger Wang 2026-02-16 01:06:05 -08:00
  • 9521002f0a [Misc] fix qwen3.5 config (#34604) JJJYmmm 2026-02-16 16:25:38 +08:00
  • ec17bdd894 [Renderer] Move InputPreprocessor into Renderer (1.5/2) (#34598) Cyrus Leung 2026-02-16 15:46:33 +08:00
  • bb59c90248 [CI] Write bake config to temp directory instead of repo root (#34569) Amr Mahdi 2026-02-15 22:15:47 -08:00
  • 5bff999d12 [Bugfix] Add method to swap quant_method on FusedMoE to fix LoRA issues (#34453) bnellnm 2026-02-15 23:10:50 -05:00
  • bb85929aa6 [BugFix] Fix Python 3.13 FlashMLA import error (#34548) Lucas Wilkinson 2026-02-15 20:09:18 -08:00
  • 5653021094 [Doc] Add Mistral-7b-v0.3 model to the batch invariance validated model (#34584) Parth Bansal 2026-02-16 05:09:00 +01:00
  • 974d829b05 [CI][Frontend] Return 422 instead of 500 for invalid Anthropic tool_choice (#34590) Andreas Karatzas 2026-02-15 22:06:48 -06:00
  • 91ac5d9bfd [CI/Build] Enable tests for recent day-0 new models (#34585) Isotr0py 2026-02-16 10:17:04 +08:00
  • 23d825aba1 [torch.compile] Disable ar-rms fusion for ds3-fp4 & DP, fix CI test (#34392) Luka Govedič 2026-02-15 09:33:57 -05:00
  • f07a128413 [CPU][ARM] Add ARM BF16 cross-compilation support and improve documen… (#33079) Maryam Tahhan 2026-02-15 14:33:08 +00:00
  • 71cd89264f [MM Encoder] Add Triton ViT attention backend (#32183) Isotr0py 2026-02-15 22:32:47 +08:00
  • 19fab44152 [Doc] Update Encoder-Decoder models support doc with Florence-2 (#34581) Isotr0py 2026-02-15 20:18:57 +08:00
  • 79c7e09235 [KV Connector] Add temporary, off-by-default VLLM_DISABLE_REQUEST_ID_RANDOMIZATION workaround (#34415) Seiji Eicher 2026-02-14 23:26:10 -08:00
  • 79f3fab05a [Bugfix] Handle num_expert_group=None in flashinfer block-scale FP8 MoE (#34494) haosdent 2026-02-15 15:25:46 +08:00
  • 604b9eaec5 [BUGFIX] Fix accuracy regression for NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4 with TP>1 (#34476) Vadim Gimpelson 2026-02-15 11:25:17 +04:00
  • 50dbd6c9e6 [bugfix] Fix critical bug when reporting for all paths where handler.create_error_response is used (#34516) Stanislav Kirillov 2026-02-15 12:24:25 +05:00
  • 98bcc6ca59 [CI][Entrypoints] Validate detokenize token IDs to prevent int64 overflow causing 500 (#34468) Andreas Karatzas 2026-02-15 01:08:38 -06:00
  • f13e86d8dd [Kernels] Fix Helion GPU utils to use platform-agnostic device name API (#34537) Andreas Karatzas 2026-02-14 22:29:23 -06:00
  • 9ca768c740 [Model Runner V2] Minor cleanup for Sampler (#34563) Woosuk Kwon 2026-02-14 18:29:03 -08:00
  • d5fe3f702c [Hybrid] Enable mamba prefix cache "align" mode with async scheduling (#33997) Thomas Parnell 2026-02-14 22:15:56 +01:00
  • 73391a1baa [Renderer] Move InputPreprocessor into Renderer (1/2) (#34510) Cyrus Leung 2026-02-15 02:14:21 +08:00
  • b3c14229b0 [ROCm][CI] Guard sparse MLA backend imports for ROCm compatibility in tests (#34538) Andreas Karatzas 2026-02-14 09:32:09 -06:00
  • 2f186635cb [Bugfix] Fix Qwen3.5 config loading (#34554) Roger Wang 2026-02-14 03:56:11 -08:00
  • 342a7cda2d [Misc] Update tests and examples for Prithvi/Terratorch models (#34416) Christian Pinto 2026-02-14 07:03:51 +00:00
  • d1ea65d0a1 [new model] add COLQwen3 code & Inference (#34398) Kata Coder 2026-02-14 13:15:19 +09:00
  • de42abb366 [CI] Heavy refactoring of Voxtral multimodal audio model tests (#34294) Andreas Karatzas 2026-02-13 22:04:29 -06:00
  • 60ca7981bc Add explicit validation error for tool calls. (#34438) Julien Denize 2026-02-14 05:04:01 +01:00
  • 0ef5b9147b fix: use __annotations__ instead of get_type_hints() for dynamic kwargs detection (#34527) Christian S. Perone 2026-02-14 04:03:37 +00:00
  • ed242652d7 [bug] Make sure get_modality_with_max_tokens is deterministic (#34533) Shiyan Deng 2026-02-13 20:02:59 -08:00
  • b37b679770 [Feature][Perf] Support Selective CPU Weight Offloading (#34535) Wei Zhao 2026-02-13 23:02:24 -05:00
  • a0638d052d [Bugfix] Fix ROCm UVA CPU weight offloading broken by #32993 (#34543) Andreas Karatzas 2026-02-13 22:01:42 -06:00
  • c027541eaf [Hybrid] Enable spec decoding in mamba cache align mode (#33705) Harry Huang 2026-02-14 05:02:28 +08:00
  • fd267bc7b7 [Bugfix]: Fix structured output in multi-turn gpt-oss (#34454) Ben Browning 2026-02-13 14:12:48 -05:00
  • bfaa559305 Revert "[Bugfix] Fix fused MoE IMA (sans chunking) by using int64 for strides" (#34530) Michael Goin 2026-02-13 13:35:29 -05:00
  • 87789c8364 [Misc] vLLM's --enforce-eager should turn off compile and cudagraphs only (#34523) Richard Zou 2026-02-13 12:52:20 -05:00
  • bcd65c1f6a [Bugfix] Replace c10::optional with std::optional in topk kernel (#34467) Pushpinder Singh 2026-02-13 08:30:23 -08:00
  • 59d53066d8 [Feature] Support CPU Offloading without Pytorch Pinned Memory that leads to doubled allocation (#32993) Wei Zhao 2026-02-13 11:11:26 -05:00