Commit Graph

  • 981cadb35c [Bugfix][Kernel] fix merge attn states when both prefix and suffix are empty (#28181) courage17340 2025-11-06 17:52:13 +08:00
  • c3ee80a01a [V0 deprecation]clean up is_v1_supported_oracle (#28116) wangxiyuan 2025-11-06 16:05:32 +08:00
  • 3755c14532 [CPU] Enable torch profiling (#28130) Aditya Tewari 2025-11-06 07:32:05 +00:00
  • 201dc98acc Fix hard-coded parameter name in gemma3n.py (#27946) Seungduk Kim 2025-11-06 16:07:36 +09:00
  • a404e2c0f1 Patch Mistral Tokenizer (#28146) Julien Denize 2025-11-06 07:43:16 +01:00
  • e31946f86e [flashinfer] fix FI all2all with FI cutlass moe (#28166) Xiaozhu Meng 2025-11-05 21:52:16 -08:00
  • bde5039325 [CI] Add compile/test_multimodal_compile.py to CI (#28151) gmagogsfm 2025-11-05 21:41:47 -08:00
  • d72299d47b Make the cv2 dependency optional (#27780) Jacob Zhong 2025-11-06 13:08:55 +08:00
  • 80679f108f [Core][MM] Use non-blocking CPU-GPU copy of multimodal data (#28141) Lukas Geiger 2025-11-06 04:05:12 +00:00
  • 43ecd0a900 [Chore] Clean up deepseek v2/v3 config copy (#28055) Isotr0py 2025-11-06 11:46:30 +08:00
  • 07d614511f [Misc] Remove the duplicate code (#28111) Chauncey 2025-11-06 10:07:47 +08:00
  • f948ab6945 [CI Failure] nm-testing/Qwen2-0.5B-Instruct-FP8-SkipQKV was removed from HF. Skip it in tests (#28170) Vadim Gimpelson 2025-11-06 05:22:13 +04:00
  • d71af5f502 [Feature] Enable TP + EP shared_experts overlap with router, 3.7% E2E performance improvement (#28164) Wentao Ye 2025-11-05 20:21:08 -05:00
  • 90189c71a9 [Bug] Fix env string "0" same to True (#28159) Wentao Ye 2025-11-05 20:04:20 -05:00
  • d79d9f0780 [Bug] Fix cpu disable shared_experts VLLM_DISABLE_SHARED_EXPERTS_STREAM (#28157) Wentao Ye 2025-11-05 20:03:09 -05:00
  • b6a248bdd7 [PERF] Decouple projections from GDN custom op. Attempt 2 (#28083) Vadim Gimpelson 2025-11-06 05:01:12 +04:00
  • 1767658559 [Debugging] Add annotation for easier trace analysis (#22496) Dayeol Lee 2025-11-05 16:52:52 -08:00
  • efe73e9b57 [Core][Hybrid allocator + connector 2/n] Unify remove_skipped_blocks by get_last_useful_token (#25431) Kuntai Du 2025-11-05 16:12:00 -08:00
  • 0b8e871e5e [CI/Build] Fix test_defaults_with_usage_context in AMD CI (#27926) Zhewen Li 2025-11-05 15:40:24 -08:00
  • 5ee93a5956 [CI/Build] Update checking logic in cutlass_group_gemm_supported (#27948) Zhewen Li 2025-11-05 15:40:10 -08:00
  • e15601789b [Feature]: Add corrupted request metric to V1 metrics system. (#27306) Snehlata 2025-11-06 03:15:29 +05:30
  • 65ac8d8dc4 [Docs] Add guide to debugging vLLM-torch.compile integration (#28094) Richard Zou 2025-11-05 16:31:46 -05:00
  • ffb08379d8 [Chore] Remove Nemotron-Nano-VL config copy (#28126) Isotr0py 2025-11-06 04:06:45 +08:00
  • e04492449e [Hardware][IBM Z] Optimize s390x Dockerfile (#28023) R3hankhan 2025-11-06 00:55:44 +05:30
  • 518ec6b722 [Docs] Clean up README_TUNING.md (#28088) Michael Yao 2025-11-06 03:01:34 +08:00
  • 802748bddb [Bugfix] Fix Qwen3-Reranker-8B load (#28117) wang.yuqi 2025-11-06 02:33:50 +08:00
  • faedbb4d4f [Feature] Extend batch invariant torch.compile to B200 (#27856) Paul Zhang 2025-11-05 13:04:49 -05:00
  • 40db194446 [CI]: Add LMCacheConnector Unit Tests (#27852) Samuel Shen 2025-11-05 09:45:57 -08:00
  • c765f0b443 [FlashInfer] Avoid FlashInfer block_size 16 + head_size 256 on blackwell (#27994) Chen Zhang 2025-11-05 09:25:32 -08:00
  • 002b07c4b2 [Bugfix] vLLM should check Inductor config for compile cache enablement status (#27637) gmagogsfm 2025-11-05 09:22:44 -08:00
  • 752ddeacaa [Core] add support for reasoning parser plugins (#28075) Walter Beller-Morales 2025-11-05 12:15:06 -05:00
  • c18f88c6ca [Kernel] Fuse computation of g and beta for Gated Delta Net (#28095) Jiangyun Zhu 2025-11-06 01:14:55 +08:00
  • 6fd0df8132 [misc] add vLLM Beijing Meetup (#28127) Jiaju Zhang 2025-11-06 01:12:59 +08:00
  • 3f5a4b6473 [Bugfix] Validate custom logits processor xargs for online serving (#27560) Isotr0py 2025-11-06 00:53:33 +08:00
  • 6cae1e5332 [ROCm][MLA] Support block-size > 1 for AITER MLA backend (#27224) Pleaplusone 2025-11-05 23:43:02 +08:00
  • 80c9275348 Enabling cooperative multi-gpu tests on multi-gpu nodes (#27986) Alexei-V-Ivanov-AMD 2025-11-05 09:35:49 -06:00
  • e50c454672 [BugFix] Support EP/DP + EPLB with MTP (#25311) Ilya Markov 2025-11-05 16:22:17 +01:00
  • 5d16d0fa62 [DCP] check return_lse for all layers in dcp (#27929) Chen Zhang 2025-11-05 06:27:25 -08:00
  • 0606bea2b6 add kimi reasoning parser (#28128) bigmoyan 2025-11-05 21:48:33 +08:00
  • 6e97eccf5d [XPU] Enable custom routing functions in IPEX for Llama4 (#28004) Frost Mitchell 2025-11-05 08:39:57 -05:00
  • 6ab183813c [Graph Partition][Cache] Use inductor partition ops config (#27702) Boyuan Feng 2025-11-05 05:04:48 -08:00
  • 6b7a81185d Bugfix: Cutlass FP8 FusedMoE bad scaling factors (#27255) amirkl94 2025-11-05 13:06:06 +02:00
  • b57789b62b Fix excessive logging noise by reducing the log level of the MinimaxM2ToolParser import success message (#27635) Eric Yue 2025-11-05 19:03:51 +08:00
  • 377061d481 [Misc] fix import error for DeepSeekR1ReasoningParser (#28114) Chauncey 2025-11-05 19:02:32 +08:00
  • 86dca07d9b [Hybrid allocator + kv connector] revert connector test changes related to hybrid allocator (#28011) Kuntai Du 2025-11-05 02:36:31 -08:00
  • 16b37f3119 [bugfix] fix wrong dcp_local_seq_lens calc (#27518) Qiu 2025-11-05 17:58:13 +08:00
  • 0976711f3b [Refactor] to simplify and extract the shared logic between chat completion and responses (#27961) Chauncey 2025-11-05 15:46:39 +08:00
  • e261d37c9a [Refactor] Lazy-loaded reasoning_parser (#28092) Chauncey 2025-11-05 15:37:02 +08:00
  • b7cbc25416 [Model, Core] Support Granite Speech & LoRA for STT (#24455) Alex Brooks 2025-11-05 00:33:48 -07:00
  • d43ad5a757 [BugFix] Fix DCP Assert (AssertionError: DCP not support reorder_batch_threshold > 1 now.) (#28100) Lucas Wilkinson 2025-11-05 01:54:43 -05:00
  • 0ff05e3770 [Bugfix] Fix encoder-only model support for transformers backend (#28021) Isotr0py 2025-11-05 14:24:41 +08:00
  • 428bc7bf1c [V0 deprecation] Remove VLLM_USE_V1 usage in most modules (#27955) wangxiyuan 2025-11-05 12:51:16 +08:00
  • 878fd5a16f [CI/Build] Enable some fixed tests in AMD CI (#28078) Zhewen Li 2025-11-04 19:15:59 -08:00
  • 18b39828d9 [XPU] Add gpt-oss model support for Intel GPU (#27786) Kunshang Ji 2025-11-05 10:17:23 +08:00
  • 4ea62b77f5 [Qwen3-Next] MOE configs for A100-SXM4-80GB TP4 TP8 (#27740) tou 2025-11-05 09:25:09 +08:00
  • d4e547bb7e Revert "[PERF] Decouple projections from GDN custom op" (#28080) Vadim Gimpelson 2025-11-05 03:58:23 +04:00
  • 2d977a7a9e [ROCm] gemm_a16w16 upstreaming (#26969) Aleksandr Malyshev 2025-11-04 13:01:00 -08:00
  • 1fb4217a05 [Multimodal] Make MediaConnector extensible. (#27759) Chenheli Hua 2025-11-04 10:28:01 -08:00
  • 611c86ea3c Added disable rule to track files under benchmarks/lib (#28048) nadavkluger 2025-11-04 20:18:43 +02:00
  • dc937175d4 [ROCm][Perf] New design on ROCm AITER MHA backend Implementation (#25763) Pleaplusone 2025-11-05 02:05:33 +08:00
  • 2f1cc8cef1 Remove deprecated --rope-scaling and --rope-theta (#28006) Harry Mellor 2025-11-04 10:01:56 -08:00
  • 938a81692e [AsyncScheduling] Don't schedule past request max_tokens (#27922) Nick Hill 2025-11-04 09:06:28 -08:00
  • c9f66da8fd [PerfFix] Avoid separate thread for MP executor shm spin (#28012) Nick Hill 2025-11-04 08:33:55 -08:00
  • 05cae69f0f [model] Add support for openPangu_Ultra_MoE (#27521) yt0428 2025-11-05 00:17:20 +08:00
  • 5fd8f02ea9 [PERF] Decouple projections from GDN custom op (#27512) Vadim Gimpelson 2025-11-04 20:11:41 +04:00
  • 97e3dda84b [Perf] SM100 - add swap AB optimization to CUTLASS FP8 GEMM (#27284) lyrisz 2025-11-04 07:49:25 -08:00
  • 5a0a6dfd55 [BugFix] Fix incorrect preallocated sampled_token_ids tensor size (#28025) Nick Hill 2025-11-04 07:38:16 -08:00
  • 938772af03 [Kernels] Isolate modular kernel code from FusedMoEMethodBase subclasses. (#27123) bnellnm 2025-11-04 08:59:45 -05:00
  • e4ee658672 [Model] add optimal triton fused moe configs for NemotronH MoE (#27967) tomeras91 2025-11-04 14:59:43 +02:00
  • 77f8001f53 [Model][Bugfix] fix pipeline parallelism support for NemotronH (#27968) tomeras91 2025-11-04 14:28:36 +02:00
  • 300a265978 [Core] Enable StatLogger in LLMEngine (#28020) Zhuohan Li 2025-11-04 04:13:35 -08:00
  • 03c4c4aa9d Support using Int4PreshuffledTensor after loading (#26066) Jerry Zhang 2025-11-04 03:00:57 -08:00
  • 2ec401bc39 Load tuned fused_moe_lora shrink and expand kernel configs separately (#27435) yugong333 2025-11-04 02:27:35 -08:00
  • 4022a9d279 [BugFix][Performance] Restore flashinfer autotuning for all scenarios (#27904) Varun Sundar Rabindranath 2025-11-04 02:56:21 -05:00
  • 53f6e81dfd [CI/Build] Fix OpenAI API correctness on AMD CI (#28022) Zhewen Li 2025-11-03 23:20:50 -08:00
  • 43a6acfb7d [Model] fix ernie45 reasoning_parser (#27973) CSWYF3634076 2025-11-04 15:16:46 +08:00
  • 58279c60b5 [KV Connector] Make KVCacheConfig an explicit constructor argument (#27887) Mark McLoughlin 2025-11-04 07:00:49 +00:00
  • 2f84ae1f27 [CI/Build] Update LM Eval Version in AMD CI (#27944) Zhewen Li 2025-11-03 22:36:40 -08:00
  • f32cbc9a0c [CPU]Improve dynamic 4bit moe performance (#27240) xiangze-arm 2025-11-04 14:33:23 +08:00
  • 7e4be74104 [Bug] Batch invariant: Fix flash attn MLA RuntimeError: scheduler_metadata must have shape (metadata_size) (#27884) Wentao Ye 2025-11-04 01:05:55 -05:00
  • 380ba6816d [Metrics] Enable sleep state metric outside of dev mode (#27867) Mark McLoughlin 2025-11-04 04:35:36 +00:00
  • 14a125a06d [NIXL][XPU] Pin NIXL version to 0.7.0 (#27849) liuzhenwei 2025-11-04 11:28:35 +08:00
  • c02fccdbd2 [Refactor] Lazy import tool_parser (#27974) Chauncey 2025-11-04 10:10:10 +08:00
  • 6ddae74054 [LoRA] Lora shrink swizzle (#27694) li2haipeng 2025-11-03 17:30:20 -08:00
  • b13a447546 [Bugfix][ROCm] Fix ViT rotary embeddings for torch.compile compatibility on ROCm (#27748) vllmellm 2025-11-04 09:12:19 +08:00
  • 7956b0c0bc Remove the tpu docker image nightly build. (#27997) QiliangCui 2025-11-03 16:35:54 -08:00
  • 3758757377 [Bugfix] Fix MoE Routing Simulation (#28002) Tyler Michael Smith 2025-11-03 17:26:49 -05:00
  • ccd3e55e51 [Bugfix][plugin] fla crash on plugin (#27322) Hank_ 2025-11-04 05:27:03 +08:00
  • 01baefe674 Add TP parameter to attention tests (#27683) Matthew Bonanni 2025-11-03 16:04:40 -05:00
  • 786030721e [Docs] add runai_streamer_sharded to LoadConfig (#27937) Ning Xie 2025-11-04 04:35:16 +08:00
  • 145c00a4d3 [Bugfix] change FlashMLA reorder_batch_threshold (#27777) Matthew Bonanni 2025-11-03 15:17:10 -05:00
  • 55011aef24 [Bugfix][Qwen][Multimodal] Move Qwen2_5_vl sdpa to custom op and reenable compile (#27764) Lucas Kabela 2025-11-03 11:12:15 -08:00
  • a4398fbb5e [Feature][Benchmarks] Support inf burstiness (#26941) Sophie du Couédic 2025-11-03 19:33:17 +01:00
  • 2c19d96777 [Spec Decode] Integrate Suffix Decoding from Arctic Inference (#25784) Aurick Qiao 2025-11-03 09:23:31 -08:00
  • 4bc400f47e [CI/Testing] Add basic single node dual batch overlap test (#27235) Lucas Wilkinson 2025-11-04 02:00:46 +09:00
  • cac4c10ef0 [BUG] Make 'binary' default option for saving torch compile artifacts when using standalone_compile (#27616) ahao-anyscale 2025-11-03 08:13:51 -08:00
  • f7d2946e99 [Bugfix] Skip gs:// model paths for speculator detection (#27846) pwschuurman 2025-11-03 06:31:03 -08:00
  • 294c805f1d Early exit for MoE LoRA kernels (#27131) gnovack 2025-11-03 04:22:17 -08:00
  • 40b69e33e7 [Model] Add PaddleOCR-VL Model Support (#27758) zhang-prog 2025-11-03 19:04:22 +08:00
  • 32257297dd [CI/Build] Remove the flaky gpt-oss lora test (#27966) Jee Jee Li 2025-11-03 16:50:06 +08:00