Commit Graph

  • aa84e43ccb [Pixtral] Enable Pixtral language model support Eagle3 (#37182) Rémi Delacourt 2026-03-20 16:50:15 +01:00
  • 5e806bcf54 [Bugfix] Fix ConchLinearKernel channelwise quantization (group_size=-1) (#37329) Matthias Gehre 2026-03-20 16:32:21 +01:00
  • 56a62c310c [Bugfix] Reject channelwise quantization (group_size <= 0) in ExllamaLinearKernel (#37331) Matthias Gehre 2026-03-20 16:31:57 +01:00
  • 1779c09898 [ROCm] Enable wvSplitK skinny GEMM kernel for RDNA4/gfx1x decode (#34709) L.B.R. 2026-03-20 15:11:23 +00:00
  • 44eea10f68 [ROCm][Quantization] make quark ocp mx dtype parser robust for weight-only quantization (#36232) xuebwang-amd 2026-03-20 23:10:03 +08:00
  • 8b6c6b9505 [Model] Add LFM2-ColBERT-350M support (#37528) Ilya Boytsov 2026-03-20 15:57:57 +01:00
  • 9f6d9dd371 Fix attribute error in isaac_patch_hf_runner (#37685) Harry Mellor 2026-03-20 14:49:40 +00:00
  • dd20ee4e3e [UX] Enable torch_profiler_with_stack (#37571) Jee Jee Li 2026-03-20 19:17:26 +08:00
  • 0523449c9c [Misc] Use logger.info_once for auto tool choice log message (#37661) Chauncey 2026-03-20 18:40:36 +08:00
  • b4c1aef21c [Refactor] Relocate tests from tests/v1/entrypoints/ to tests/entrypoints/ (#37500) Flora Feng 2026-03-20 05:50:34 -04:00
  • 6050b93bed [Refactor] Move serve entrypoint tests under tests/entrypoints/serve/ (#37595) Flora Feng 2026-03-20 05:10:47 -04:00
  • 5a4a179591 [ROCm][CI] Fix granite_speech test for gfx90a by selecting compatible attention backend (#37611) Andreas Karatzas 2026-03-20 04:07:26 -05:00
  • 37cd9fc107 [ROCm][CI] Remove deepep DBO tests on gfx90a (#37614) Andreas Karatzas 2026-03-20 04:07:07 -05:00
  • 9cfd4ebb5e [ROCm][CI] Update GSM8K eval config to use fp8-and-mixed models list (#37619) Andreas Karatzas 2026-03-20 04:06:53 -05:00
  • ed359c497a [Model] Deprecate the score task (this will not affect users). (#37537) wang.yuqi 2026-03-20 16:07:56 +08:00
  • dcee9be95a [Model Runner V2] Fix draft logits not populated during cudagraph replay (#37639) Giancarlo Delfin 2026-03-20 00:43:47 -07:00
  • bd8c4c0752 [CI] Removing deprecated rlhf examples reference (#37585) Andreas Karatzas 2026-03-20 02:20:33 -05:00
  • 0140eafb15 [Bug] Fix FlashInfer allreduce fusion workspace uninitialized error (#37461) Wei Zhao 2026-03-20 03:09:21 -04:00
  • bdf6a0a57b [XPU] bump vllm-xpu-kernels to v0.1.4 (#37641) Kunshang Ji 2026-03-20 15:04:38 +08:00
  • 0674d1fee7 [PluggableLayer][MM] Add PluggableLayer for CustomQwen2Decoder (#37293) Wangbei25 2026-03-20 14:24:07 +08:00
  • 30108fc8b0 [Model] Refactor Step3-VL processor to HF style (#37579) Cyrus Leung 2026-03-20 14:05:08 +08:00
  • e2d1c8b5e8 [Refactor] Relocate entrypoint tests to match serving code structure (#37593) Flora Feng 2026-03-20 01:31:23 -04:00
  • 6951fcd44f [XPU] Automatically detect target platform as XPU in build. (#37634) Huanxing 2026-03-20 13:30:15 +08:00
  • 39474513f6 [Model Runner V2] fix draft attention metadata generation (#37364) Giancarlo Delfin 2026-03-19 21:05:15 -07:00
  • 638a872d77 fix(xpu): Re-compute compile ranges after platform-specific config updates (#37523) Yuxiang Liang 2026-03-20 11:52:35 +08:00
  • 9040151fe1 [V0 Deprecation] Deprecate --disable-frontend-multiprocessing (#37612) Flora Feng 2026-03-19 23:31:43 -04:00
  • 8fbe3f303f [Bugfix][LoRA] Fix Qwen35 LoRA (#36976) Jee Jee Li 2026-03-20 11:09:32 +08:00
  • ea2c148fa7 [compile][graph_partition]Add tensor size handling (#36038) Xiao 2026-03-19 19:55:25 -07:00
  • 47b7af0d87 [Feat] Enable CompressedTensorW4A8Int for XPU (#37207) Tianmu Li 2026-03-19 19:34:28 -07:00
  • 269bf46d99 fix: disambiguate multimodal prefix cache keys (#36708) tianshu-Michael-yu 2026-03-19 19:33:20 -07:00
  • e5a77a5015 [CI] Update mergify tool-calling label paths (#37478) Flora Feng 2026-03-19 22:22:23 -04:00
  • ca1ac1a4b4 Fix DP coordinator ZMQ port TOCTOU (#37452) Itay Alroy 2026-03-20 02:58:31 +02:00
  • 4ca3fa6bb4 [ROCm][Bugfix] fix cache block size mismatch for aiter unified attention (#37606) Divakar Verma 2026-03-19 20:00:08 -04:00
  • be12afd284 [Bugfix] Fix Deepseekv32 tool parser when stream interval > 1 (#36056) Flora Feng 2026-03-19 19:51:25 -04:00
  • df3c0291a3 [Bug] Fix EmbedIOprocessor "classify" <-> "embed" (#37573) Wentao Ye 2026-03-19 19:40:10 -04:00
  • 2be1a0f74b [Refactor] Remove dead code in pooling model (#37572) Wentao Ye 2026-03-19 19:39:43 -04:00
  • 4120a05ff1 Fix AttributeError in Qwen3.5 GDN layers with quantized models (#37448) Jim Smith 2026-03-19 19:21:14 -04:00
  • 98ff042917 [CI][BugFix][AMD] Don't set VLLM_ROCM_USE_AITER anymore in test_rocm_aiter_topk since it's not necessary (#36996) rasmith 2026-03-19 18:12:45 -05:00
  • bcf2be9612 [cherry-pick][Bugfix] Disable monolithic TRTLLM MoE for Renormalize routing (#37591)#37605 v0.18.0 khluu 2026-03-19 15:06:38 -07:00
  • b55156eae9 [Performance] Enable Triton autotuning disk cache by default (#37188) Artem Perevedentsev 2026-03-19 23:36:28 +02:00
  • 112944fab9 test Qwen/Qwen3-4B-Instruct-2507 for unbacked (#36064) Laith Sakka 2026-03-19 14:28:45 -07:00
  • 91be5f9be3 [MoE Refactor] Rename "naive" all2all backend (#36294) bnellnm 2026-03-19 15:50:34 -04:00
  • 4ee847e400 Comment fix for async rl example (#35244) Aaron Hao 2026-03-19 12:46:07 -07:00
  • 040a505ff5 [ROCm][CI] Cleaning and restructuring amd-ci legacy pipeline (#34839) Andreas Karatzas 2026-03-19 14:30:58 -05:00
  • 9279c59a0e [MoE Refactor] DefaultMoERunner simplification (#33049) bnellnm 2026-03-19 15:07:44 -04:00
  • 7454096199 [Log] Log once in local node by default (#37568) Wentao Ye 2026-03-19 15:04:59 -04:00
  • fb8b5e05fc [CI] Add retry with 4x backoff to HTTP fetches for transient failures (#37218) Andreas Karatzas 2026-03-19 14:00:20 -05:00
  • e5d96dc8fc Fix SpeculatorsConfig now that PreTrainedConfig is a dataclass in Transformers (#37574) Harry Mellor 2026-03-19 18:04:40 +00:00
  • daa05bf340 [Bugfix] Fix AttributeError when serving MXFP8 models with DeepGEMM installed (#37358) EdalatiAli 2026-03-19 13:58:33 -04:00
  • 7769b58307 [torch.compile][BE][Multimodal] Remove requirement to set_model_tag to avoid cache conflict (#37345) Lucas Kabela 2026-03-19 10:26:12 -07:00
  • 2f9f946b22 [P/D] AnthropicMessages add kv_transfer_params for PD disaggregation (#37535) Chauncey 2026-03-20 00:41:20 +08:00
  • 2890aecce5 [CPU][UX] Do not crash when tcmalloc/libiomp are not ldpreloaded (#37561) Fadi Arafeh 2026-03-19 16:35:45 +00:00
  • 34f093b417 [CI] Gate pre-commit on ready label or number of contributions (#37544) Harry Mellor 2026-03-19 16:21:57 +00:00
  • 4dce8321a9 Run MacOS smoke test on daily cron job instead of every commit (#37567) Harry Mellor 2026-03-19 16:19:50 +00:00
  • 657855ab41 [Misc] Cleanup more configs and processors (#37560) Cyrus Leung 2026-03-19 23:45:23 +08:00
  • e27b8ba3d1 [Bug] Fix fp8 trtllm MoE modular kernel supported routing methods (#37346) Wei Zhao 2026-03-19 11:43:06 -04:00
  • 40b8363b45 [MRV2] Use fp32 for draft logits (#37526) Woosuk Kwon 2026-03-19 08:41:21 -07:00
  • 8b10e4fb31 [1/n] Migrate permute_cols to libtorch stable ABI (#31509) mikaylagawarecki 2026-03-19 11:27:26 -04:00
  • 104605cbf2 Remove deprecated reasoning_content message field(part-2) (#37480) Ifta khairul Alam Adil 2026-03-19 16:20:08 +01:00
  • 96266f119b [LoRA] Minor improvements to LoRA log (#37557) Jee Jee Li 2026-03-19 23:18:06 +08:00
  • 7c0cf3bcd0 Cap the number of API servers to 1 when using Elastic EP. (#37466) Sage Moore 2026-03-19 07:42:57 -07:00
  • 572b432913 Stop bench CLI from recursively casting all configs to dict (#37559) Harry Mellor 2026-03-19 14:04:03 +00:00
  • 9515c20868 [Misc] Clean up processing logic (#37541) Cyrus Leung 2026-03-19 21:30:20 +08:00
  • c63ca2b2e6 [Bugfix] Add Kimi-K2.5 reasoning/tool parser aliases and tool_call_id support (#37438) DorBernsohn 2026-03-19 15:08:00 +02:00
  • a32eaf5bb2 [CI] Merge cleanup_pr_body.yml and reminder_comment.yml (#37552) Harry Mellor 2026-03-19 12:55:07 +00:00
  • e390742c59 Fix KV Offloading + MLA AssertionError by using num_kv_heads=1 in cpu… (#37536) XueLiang Yang 2026-03-19 20:05:07 +08:00
  • 7a6ebcbfcf [Model] Remove unnecessary get_language_model (#37545) Cyrus Leung 2026-03-19 20:00:36 +08:00
  • c7bc12c20f [CI/Build] Split out MM pooling tests (#37542) Cyrus Leung 2026-03-19 19:36:11 +08:00
  • f9e2a38386 [Docs] Reorganize pooling docs. (#35592) wang.yuqi 2026-03-19 19:25:47 +08:00
  • 4426447bba Don't log exc_info when vLLM tries to download a file that doesn't exist (#37458) Harry Mellor 2026-03-19 10:38:29 +00:00
  • 3322e26420 [Bugfix] Avoid more OpenMP thread reallocation in CPU torch compile (#37538) Li, Jiang 2026-03-19 18:24:39 +08:00
  • 765e461065 [Bugfix] Fix Nemotron Parse loading (#37407) Cyrus Leung 2026-03-19 17:55:29 +08:00
  • 6a9cceb219 [Bugfix][ROCm] Fix MoRI + AITER FP8 dispatch compatibility for defer_input_quant (#37418) Duyi-Wang 2026-03-19 17:49:27 +08:00
  • 199f914183 fix(cpu): add null check for aligned_alloc in ScratchPadManager (#37369) yassha 2026-03-19 10:45:06 +01:00
  • ca21483bf9 [MISC] fix pin_memory=torch.cuda.is_available(), use is_pin_memory_available (#37415) Kunshang Ji 2026-03-19 17:23:24 +08:00
  • da70c87e81 [CI] Fix wrong path test file, missing rlhf_async_new_apis.py (#37532) TJian 2026-03-19 17:21:55 +08:00
  • 0b6d52629f Support temporal compression for Nemotron-3-VL videos (#36808) Collin McCarthy 2026-03-19 01:02:19 -07:00
  • d3cc379567 [Perf] Fix slow hasattr in CUDAGraphWrapper.__getattr__ (#37425) Ziming Huang 2026-03-19 15:43:48 +08:00
  • 354cd580d5 fix(anthropic): remove non-standard 'data: [DONE]' from Anthropic streaming (#37510) cdpath 2026-03-19 15:23:35 +08:00
  • d49f273144 [SSM/Mamba] Follow-up: N-1 prefill for P/D disaggregation (#37310) zhanqiuhu 2026-03-19 03:22:00 -04:00
  • b21d384304 [Refactor] Relocate endpoint tests to mirror serving code directory structure (#37504) Flora Feng 2026-03-19 03:19:36 -04:00
  • e3126cd107 [ROCm] issue management - request information for bug issues on ROCm (#37009) Hongxia Yang 2026-03-18 23:51:29 -04:00
  • e37ff5b5c8 [Perf] Optimize token_embed for pooling models, 1.0% token throughput improvement (#37347) Wentao Ye 2026-03-18 22:27:51 -04:00
  • 6accb21f2a [bug] Fix deadlock with pause resume and collective_rpc (#37024) Aaron Hao 2026-03-18 18:49:02 -07:00
  • 89138b21cc [Bugfix] Zero-init MLA attention output buffers to prevent NaN from CUDA graph padding (#37442) v0.18.0rc2 Elvir Crnčević 2026-03-19 01:28:37 +01:00
  • 6edd43de3c [Bugfix][ROCm] Fix worker startup OOM on ROCm by skipping unreliable cudagraph memory profiling (#36720) JartX 2026-03-17 22:55:34 +01:00
  • 053f3b6309 [Model Runner V2] Spec decode rejection sampler logprobs support (#37237) Giancarlo Delfin 2026-03-18 18:36:27 -07:00
  • 5f82706a21 [BUG] Exclude SKIP_TENSORS from get_layer_size() + new weight sync example for dpep (#37334) Aaron Hao 2026-03-18 17:45:10 -07:00
  • c32a58cc2a [EPLB] Simplify EPLB rearrange by only returning one map (#36267) Sage Moore 2026-03-18 17:34:00 -07:00
  • ef2c4f778d [Bugfix] Zero-init MLA attention output buffers to prevent NaN from CUDA graph padding (#37442) Elvir Crnčević 2026-03-19 01:28:37 +01:00
  • 9dade5da3a [XPU]Unify xpu test dependencies in dockerfile.xpu (#36477) sihao_li 2026-03-19 08:12:07 +08:00
  • 828f862acb [Bugfix] Expand quantization method support in perf metrics (#37231) Thillai Chithambaram 2026-03-18 19:54:19 -04:00
  • 577df69b26 [Bugfix] Fix KV scales inconsistency in fp8 MLA & FlashInfer kv_cache_dtype "auto" leading to gibberish (#37054) Andy Lo 2026-03-18 23:07:29 +00:00
  • 04244fd0e1 [Model Runner V2] Spec decode rejection sampler greedy support (#37238) Giancarlo Delfin 2026-03-18 15:59:03 -07:00
  • 9482b0b085 [Bugfix] Remove assertion for NVFP4 scale dynamic range (#37465) Michael Goin 2026-03-18 23:37:49 +01:00
  • 5bc1da147f [LoRA][BugFix] Fix skipped LoRA adapters for Mistral3 (#36928) Woosuk Kwon 2026-03-18 15:34:19 -07:00
  • 0091017188 fix(worker): optimize swap_states to copy only active token prefixes (#34733) Philip Ottesen 2026-03-18 22:59:27 +01:00
  • 0d81a1fe61 [V0 Deprecation] Deprecate virtual engine (#37195) Wentao Ye 2026-03-18 17:30:14 -04:00
  • 6ae4c8d6fc chunk parakeet into 30s clips to prevent OOMs on long audios (#36671) Netanel Haber 2026-03-18 23:22:24 +02:00
  • a913b612d8 [Bugfix] Fix ROCm crash in qwen3_next multi-stream events (#36795) (#37427) JartX 2026-03-18 21:06:31 +01:00