Commit Graph

  • cd775bdbe0 [Tests] Replace flaky sleep with polling in test_background_cancel (#32986) 7. Sun 2026-01-24 16:39:07 +00:00
  • da5e7b12be [MLA] Fuse cat and quant for fp8 kv-cache (#32950) Lucas Wilkinson 2026-01-24 09:03:02 -07:00
  • 719ac592ed Update CPU doc according to feedback (#32963) Louie Tsai 2026-01-24 08:02:44 -08:00
  • 1209b784f2 [Bugfix]: resolve torch.compile cache conflict between mm_encoder_tp_modes (#32842) Hiroken. 2026-01-24 22:45:14 +08:00
  • 5fa0f6efa9 [EncoderCacheManager] Remove unnecessary copy (#32800) Lukas Geiger 2026-01-24 14:28:57 +00:00
  • bc0d291bfe feat: Complete LoRA support for MiniMaxM2 Fixes #32736 (#32763) david guan 2026-01-24 20:48:46 +08:00
  • 9ad7f89f55 [Models]: Make Multimodal config implicit in ViT implementation (#31972) Isotr0py 2026-01-24 20:34:26 +08:00
  • 6450b536a6 [Bugfix] Fix E2E latency calculation and add warmup support in mm_processor benchmark (#32646) Hiroken. 2026-01-24 18:31:41 +08:00
  • 0f19427db5 [Perf] Cache exc.errors() result in validation exception handler (#32984) 7. Sun 2026-01-24 10:01:35 +00:00
  • 51931c5c9a [UX] Deduplicate sampling parameter startup logs (#32953) Cyrus Leung 2026-01-24 17:37:28 +08:00
  • 06b557ecd9 feat(benchmark): add encoder forward pass benchmarking to mm-processor (#31655) Reagan Lee 2026-01-24 00:24:44 -08:00
  • 81c2a889ce [Doc] Ignore typo check on doc (#32999) Roger Wang 2026-01-23 23:52:22 -08:00
  • 8edaf38570 [Models] Add SharedFusedMoE support to Qwen3MoE (#32082) Isotr0py 2026-01-24 15:36:31 +08:00
  • 5c86a89805 [docs] Update governance process links (#32995) Roy Wang 2026-01-24 15:32:44 +08:00
  • 0ccecf8833 [Tests] Standardize RNG seed utility across test files (#32982) 7. Sun 2026-01-24 06:47:14 +00:00
  • 0b9a735e11 [Tests] Clarify pytest skip reasons with actionable context (#32981) 7. Sun 2026-01-24 06:38:50 +00:00
  • 14d03b8ddb [Perf] Cache xpu_get_mem_info() result to avoid duplicate calls (#32983) 7. Sun 2026-01-24 04:56:23 +00:00
  • d0cbac5827 [Dev UX] Add auto-detection for VLLM_PRECOMPILED_WHEEL_VARIANT during install (#32948) Michael Goin 2026-01-23 22:15:17 -05:00
  • c0d820457a Auth_token added in documentation as it is required (#32988) ruizcrp 2026-01-24 04:03:05 +01:00
  • 97ef11dd34 [ROCm][ViT] Enable Flash Attention Triton backend on RDNA3/RDNA4 (#32944) monajafi-amd 2026-01-23 19:03:07 -07:00
  • ecc3dd66cc [Bugfix] Fix FusedMoE LoRA kernel offs_token out of bound value (#32279) Xin Yang 2026-01-23 17:41:35 -08:00
  • 7e1f10d562 [Core][Bugfix] allow graceful worker termination (#32965) Joe Runde 2026-01-23 18:28:45 -07:00
  • a28b94e6ef [Performance] Split FlashAttn attention and cache update (#25954) ElizaWszola 2026-01-24 02:28:06 +01:00
  • 0118cdcc02 [fix] add VLLM_OBJECT_STORAGE_SHM_BUFFER_NAME to compile factors (#32912) dolpm 2026-01-23 14:53:10 -08:00
  • d7de043d55 [CI] fix version comparison and exclusion patterns in upload-release-wheels.sh (#32971) v0.14.1 Shengqi Chen 2026-01-23 14:21:49 -08:00
  • 136c499f6e [CI] fix version comparison and exclusion patterns in upload-release-wheels.sh (#32971) Shengqi Chen 2026-01-23 14:21:49 -08:00
  • ebd0a17e0e [Bugfix] Fix missing is_layer_skipped check for FusedMoE in AWQConfig (#32935) joninco 2026-01-23 17:19:56 -05:00
  • 37c9859fab [Refactor] Clean up unused variables & func (#32692) Wentao Ye 2026-01-23 17:04:25 -05:00
  • 4561f13985 [Refactor] Rename gptq_marlin to marlin to match MoE (#32952) Michael Goin 2026-01-23 16:48:12 -05:00
  • 6cc6d92be5 [CI][AMD][BugFix] Update wvSplitK (and other skinny_gemm wrappers) to ensure tensors passed will be made contiguous for the kernel (#32831) rasmith 2026-01-23 15:35:48 -06:00
  • dfab5f3764 [Bug] Fix benchmark script moe_permute_unpermute (#32949) Wentao Ye 2026-01-23 16:18:56 -05:00
  • 586a57ad7e fix: Add glm4_moe_lite to MLA detection (#32614) Markus / Mark 2026-01-23 21:38:57 +01:00
  • 3a41459501 [cudagraphs] Refactor cudagraph capture loop (#32946) Lucas Wilkinson 2026-01-23 13:22:20 -07:00
  • 8518b30447 [Model Runner V2] Add KV Connector support (#32742) Nick Hill 2026-01-23 10:49:17 -08:00
  • 2d6b537157 [Bugfix][CI] Fix pre-commit (#32956) Matthew Bonanni 2026-01-23 13:26:56 -05:00
  • 68b0a6c1ba [CI][torch nightlies] Use main Dockerfile with flags for nightly torch tests (#30443) Orion Reblitz-Richardson 2026-01-23 08:22:56 -10:00
  • 5206e5e28c [V1][Hybrid] Mamba Prefix Caching with align mode (#30877) Harry Huang 2026-01-24 01:56:48 +08:00
  • fec9da0af4 [Model] Enable LoRA support for internvl2 (#32397) Matteo Fari 2026-01-23 18:39:01 +01:00
  • bbbd696af9 [torch.compile][CI] Add back attn fusion on hopper/ada (#32940) Luka Govedič 2026-01-23 11:49:20 -05:00
  • 9b77bb790d [Frontend] add logprob, compression_rate to 'verbose_json' features (#31059) sangbumlikeagod 2026-01-24 01:35:13 +09:00
  • 305e53ade8 [Hardware][AMD][CI][Bugfix] Fix Kernels Attention Cache test (#32904) Matt 2026-01-23 10:24:26 -06:00
  • 1cb4341fbc [ROCm][PD] Remove unused moriio connector proxy code (#32939) Mark McLoughlin 2026-01-23 15:59:04 +00:00
  • 1fb648bf10 [Bugfix] Fix FP8 MoE EP Weight Loading for ModelOpt Llama4 (#32886) baonudesifeizhai 2026-01-23 10:31:48 -05:00
  • 7e22309755 [Misc] Postpone torch_profiler deprecation (#32867) Nicolò Lucchesi 2026-01-23 15:39:48 +01:00
  • 90c2007932 [Bugfix] Disable tma_aligned_scales in test_fusions_e2e (#32916) Xin Yang 2026-01-23 06:34:30 -08:00
  • d95d650762 [Bugfix] Fix getting vision features in Transformer Multimodal backend (#32933) Raushan Turganbay 2026-01-23 14:34:48 +01:00
  • 13d8746c54 [Feature]: Remove DtoH Copy for lfm2_vl On Default Stream (#32815) tianshu-Michael-yu 2026-01-23 05:20:30 -08:00
  • 10e94c84f6 [CPU][Feat] Update PyTorch to v2.10 for CPU Backend (#32869) Fadi Arafeh 2026-01-23 13:13:06 +00:00
  • 243e78c20f [Benchmark][Bugfix] Fix race condition when starting server for sweep benchmark (#32927) Isotr0py 2026-01-23 20:11:18 +08:00
  • aac0b817fa [CPU Backend][BugFix] Fix failing CPU MoE test (#32876) Fadi Arafeh 2026-01-23 12:06:51 +00:00
  • 05f3d714db [Frontend][3/n] Make pooling entrypoints request schema consensus | EmbedRequest & ClassifyRequest (#32905) wang.yuqi 2026-01-23 20:03:44 +08:00
  • 3f3f89529d [Voxtral] Add new streaming arch (#32861) Patrick von Platen 2026-01-23 12:41:52 +01:00
  • 4dc11b06d3 [Bugfix] Fix Whisper/encoder-decoder GPU memory leak (#32789) Nicolò Lucchesi 2026-01-22 11:50:37 +01:00
  • 2bd95d803a [Misc] Bump opencv-python dependency version to 4.13 (#32668) Isotr0py 2026-01-22 23:51:15 +08:00
  • f46d576c54 [Misc] Replace urllib's urlparse with urllib3's parse_url (#32746) Isotr0py 2026-01-22 16:37:15 +08:00
  • 5da4c7d789 [CI/Build][CPU] Fix failed pooling tests and macos smoke test (#32907) Li, Jiang 2026-01-23 18:48:20 +08:00
  • 160c6fa387 [Misc] Add get_name to missing AttentionBackends (#32698) Nicolò Lucchesi 2026-01-23 11:35:44 +01:00
  • a8eb1182f1 [CI][Models] Add VLM Support for Sequence Classification Conversion (#32885) Andreas Karatzas 2026-01-23 02:22:51 -06:00
  • fa6e599a61 [Bugfix] Fix _CPU_MOE_ACT AssertionError when vLLM config not set (#32777) Karan Bansal 2026-01-23 13:52:37 +05:30
  • 7ef5873752 [CI] Fix mypy for vllm/v1/structured_output (#32722) Wentao Ye 2026-01-22 22:55:51 -05:00
  • 5e4e0e51f4 [torch.compile] Compile CustomOp.forward_native for SiluAndMul and QuantFP8 to avoid raw torch ops inside opaque custom ops (#32806) Luka Govedič 2026-01-22 22:52:26 -05:00
  • f61c9da711 [BugFix] deepseek_v32_encoding: Replace asserts with proper exceptions (#32884) Rishabh Saini 2026-01-22 22:44:11 -05:00
  • 7fe255889e [Misc] Log vLLM logo when starting server (#32796) Nick Hill 2026-01-22 19:15:12 -08:00
  • dc917cceb8 [MoE Refactor] Move select_experts from FusedMoEQuantMethod -> FusedMoE (#31996) bnellnm 2026-01-22 18:21:35 -05:00
  • fc56f4a071 [BugFix] Fix invalid flashinfer_fused_moe_blockscale_fp8 op registration (#32855) Fadi Arafeh 2026-01-22 22:27:40 +00:00
  • d08b356ee0 [Perf] Create TMA-aligned input scale tensor for DeepGemm on Hopper (#32619) Xin Yang 2026-01-22 12:47:04 -08:00
  • f744810184 [Refactor] Remove unused tpu files (#32610) Wentao Ye 2026-01-22 15:35:18 -05:00
  • 44f08af3a7 Add llmcompressor fp8 kv-cache quant (per-tensor and per-attn_head) (#30141) Eldar Kurtić 2026-01-22 21:29:57 +01:00
  • 955b43a5a5 [Bugfix][Attention] Explicitly report support for kv_cache_dtype bfloat16 (#32795) Matthew Bonanni 2026-01-22 14:05:18 -05:00
  • 744ef30484 [CPU Backend] [Perf] Accelerate tensor-parallel/data-parallel inference across NUMA domains on Arm (#32792) Fadi Arafeh 2026-01-22 18:55:23 +00:00
  • 300622e609 [CI][Attention] Add more CI dependencies for attention tests (#32487) Matthew Bonanni 2026-01-22 13:44:56 -05:00
  • 69d09fdd6c [Feature] Add --ssl-ciphers CLI argument for TLS cipher control (#30937) RickyChen / 陳昭儒 2026-01-23 01:53:24 +08:00
  • 3a63be0faa Support custom URI schemes and trace handlers for profiler (#32393) David Ramon Prados 2026-01-22 12:45:40 -05:00
  • 803e3f3f68 [UX] Default api_server_count to dp_size if not specified (#32525) Tyler Michael Smith 2026-01-22 12:35:35 -05:00
  • 70917b1c55 [MISC] Add .cursor to .gitignore (#32868) Vadim Gimpelson 2026-01-22 21:27:13 +04:00
  • c517d8c934 [Hardware][AMD][CI][Bugfix] Fix regressions from deprecated env vars (#32837) Matt 2026-01-22 10:59:15 -06:00
  • fc37187a51 [Bugfix] ModelScope is supported when downloading LoRA models. (#32844) Xu Jinyang 2026-01-23 00:33:21 +08:00
  • ff365eea94 Support bge-m3 sparse embeddings and colbert embeddings (#14526) Maximilien de Bayser 2026-01-22 12:52:57 -03:00
  • 444e2e7e1f [Misc] Bump opencv-python dependency version to 4.13 (#32668) Isotr0py 2026-01-22 23:51:15 +08:00
  • bc14663e6a [Cleanup] Move scheduler get_routed_experts logic to separate method (#32706) Nick Hill 2026-01-22 07:46:00 -08:00
  • 654a71fc3c [torch.compile] Improve Cold Start for MoEs (#32805) Richard Zou 2026-01-22 10:44:40 -05:00
  • 15e302dfce [Misc][BE] Turn on strict type coverage for vllm/compilation (#31756) Lucas Kabela 2026-01-22 07:12:26 -08:00
  • d117a4d1a9 [Frontend] Introduce Renderer for processing chat messages (using ModelConfig) (#30200) Cyrus Leung 2026-01-22 20:44:22 +08:00
  • 421012b63a OffloadingConnector: Support kernel_block_size != block_size (#30692) Or Ozeri 2026-01-22 14:30:04 +02:00
  • 841d53aaa8 [Frontend] add prompt_cache_key for openresponses (#32824) Chauncey 2026-01-22 19:34:14 +08:00
  • 1752262e96 [CI] refactor release pipeline config into groups (#32833) Shengqi Chen 2026-01-22 03:27:21 -08:00
  • ea6102b85d [Bugfix] Fix Whisper/encoder-decoder GPU memory leak (#32789) Nicolò Lucchesi 2026-01-22 11:50:37 +01:00
  • 328cbb2773 [Frontend][2/n] Make pooling entrypoints request schema consensus | ChatRequest (#32574) wang.yuqi 2026-01-22 18:32:44 +08:00
  • 64e3d67ac0 Enable Cross layers KV cache layout at NIXL Connector (#30207) liranschour 2026-01-22 12:12:58 +02:00
  • 098b2d66fe [Benchmark] Don't default to temperature==0 in vllm bench serve (#32723) Nick Hill 2026-01-22 02:03:15 -08:00
  • 8ebf271bb6 [Misc] Replace urllib's urlparse with urllib3's parse_url (#32746) Isotr0py 2026-01-22 16:37:15 +08:00
  • 49a1262267 [AMD][ROCm] MoRI EP: a high-performance all2all backend (#28664) Alex Sun 2026-01-22 16:33:18 +08:00
  • 2b8a38b6d6 [Model] Extend collect_children and no_init_weights contexts (#32757) Cyrus Leung 2026-01-22 16:20:27 +08:00
  • 1bf1a34b19 [bench] add start_times field to vllm bench serve json result (#32667) Kebe 2026-01-22 16:10:14 +09:00
  • a810299838 [ROCm][CI][Docs] Add comment explaining TRITON_ATTN fallback for ROCm (#32835) Andreas Karatzas 2026-01-22 00:11:09 -06:00
  • eb1629da24 [ROCm][CI] Fix AITER test flakiness by using explicit attention backend (#32346) Andreas Karatzas 2026-01-21 23:55:25 -06:00
  • 019e2c3b7c [ROCm][CI] Lower Acceptance Len Threshold For test_draft_model_quantization (#32731) Micah Williamson 2026-01-21 23:47:33 -06:00
  • f5fdec8ce2 Upgrade transformers-4.57.5 (#32287) Huy Do 2026-01-21 21:19:19 -08:00
  • 1579c9b5fd [Llama.py -> mistral.py] Extract mistral-only relevant code into separate file (#32780) Patrick von Platen 2026-01-22 06:14:57 +01:00
  • 889722f3bf [FlashMLA] Update FlashMLA to expose new arguments (#32810) Lucas Wilkinson 2026-01-21 22:02:39 -07:00