Commit Graph

  • 883131544f [Bugfix] Update import path for bc_linter_include (#24766) Mohammad Miadh Angkad 2025-09-18 04:33:11 +08:00
  • ee5fd49150 [Misc] Update owners for KV connector and V1 offloading (#25041) Yihua Cheng 2025-09-17 12:37:29 -07:00
  • 7ae9887542 [V1] Logits processor docs (#22919) afeldman-nm 2025-09-17 14:53:12 -04:00
  • e3db5ebb66 [CI Bugfix] Fix failing test_model_load_with_params tests due to tokenizer refactor (#25086) Michael Goin 2025-09-17 14:15:05 -04:00
  • 9d442b7c48 [V0 Deprecation] Remove V0 tests in test_sequence.py (#25088) Woosuk Kwon 2025-09-17 11:08:45 -07:00
  • eb68c2dcd9 [CI] Revert back prepare_prompts and check_answers (#25087) Woosuk Kwon 2025-09-17 11:03:16 -07:00
  • 8b32464ac1 Change log level from info to debug for IOProcessor (#24999) Michael Goin 2025-09-17 13:21:28 -04:00
  • 99cc41ad50 [V0 Deprecation] Remove unused output processor util (#25023) Woosuk Kwon 2025-09-17 09:50:07 -07:00
  • d6a518fdde Remove unused find_cuda_init helper script (#25044) Simon Mo 2025-09-17 09:47:40 -07:00
  • 4aa8c7b047 cleanup: remove adapter commons (#25045) Simon Mo 2025-09-17 09:46:29 -07:00
  • 4b946d693e [V0 Deprecation] Remove V0 Core tests (#25082) Woosuk Kwon 2025-09-17 09:32:42 -07:00
  • 087c6ffc92 [CI Bugfix] Fix failing test_invalid_env (#25078) Michael Goin 2025-09-17 11:28:58 -04:00
  • 4a2d33e371 [Docs] vllm/benchmarks/datasets.py fix docstring param format. (#24970) samzong 2025-09-17 23:11:51 +08:00
  • 8f3616f422 Remove old cutlass mla (#23961) Matthew Bonanni 2025-09-17 10:31:43 -04:00
  • 47f670b03b [Docs] improve code formatting and comments for eliminate griffe build warning. (#25010) samzong 2025-09-17 22:31:20 +08:00
  • dd6a910aac [Bugfix][Qwen3-Next] fixes the varlen issue in qwen3-next's MTP implementation. (#24957) Tao He 2025-09-17 21:59:09 +08:00
  • 1b962e2457 [fix] lora benchmarks pass no_lora_flag_cpu (#23774) dolpm 2025-09-17 06:22:25 -07:00
  • bfe9380161 Apply fixes for CUDA 13 (#24599) Aidyn-A 2025-09-17 17:15:42 +04:00
  • 9fccd04e30 [Bugfix] Fix Stream usage in CPU model runner and OneDNN kernel check (#25046) Li, Jiang 2025-09-17 20:54:02 +08:00
  • 252ada5559 Add RADIO Vision Encoder Support to vLLM (#24595) danielafrimi 2025-09-17 15:53:30 +03:00
  • e120533d7a [Misc] Avoid use of deprecated AutoModelForVision2Seq (#25065) Cyrus Leung 2025-09-17 20:19:15 +08:00
  • 2b85697031 [BugFix] enable DOTALL to match multi-line tool_call parameters in extract_tool_call_required_streaming (#24668) Shijun Yin 2025-09-17 17:21:18 +08:00
  • 544fe76b95 [Frontend] Support returning all prompt logprobs (#24956) Chauncey 2025-09-17 17:03:52 +08:00
  • bb58dc8c20 [DP] Create placement groups by ray_device_key (#25026) Xinyu Chen 2025-09-17 16:57:25 +08:00
  • 0fb2551c23 [Docs] Fix griffe warning in base_static_graph.py (#25018) Michael Yao 2025-09-17 16:49:19 +08:00
  • 6c47f6bfa4 [Core] Remove tokenizer group in vLLM (#24078) Zhuohan Li 2025-09-17 01:42:59 -07:00
  • c15309a730 [Model] Apply SharedFusedMoE to glm4_moe. (#24849) whx 2025-09-17 16:02:31 +08:00
  • 4a9375fe9d [Model] Pass param prefix to LLMHead (#24862) whx 2025-09-17 16:01:27 +08:00
  • 03191cd8f0 [Core][MultiModalHasher] Hash images without converting image mode (#24969) Lukas Geiger 2025-09-17 08:57:34 +01:00
  • b77bf34e53 [EPLB] Support EPLB for Mixtral Model (#22842) rouchenzi 2025-09-17 00:27:34 -07:00
  • dd39baf717 [XPU] Fix xpu model runner call torch.cuda APIs (#25011) Kunshang Ji 2025-09-17 14:45:25 +08:00
  • 43a62c51be Add more documentation and improve usability of lognormal dist (benchmark_serving_multi_turn) (#23255) Daniel Serebrenik 2025-09-17 08:53:17 +03:00
  • ca2d1925ef [Rocm] [quantization] Fix quark ptpc moe and add test case (#24649) haoyangli-amd 2025-09-17 13:15:13 +08:00
  • 0f7acdd73c [Model] Support Qwen3-VL Model Series (#24727) Roger Wang 2025-09-16 22:01:04 -07:00
  • 5801e49776 [V0 Deprecation] Remove MQLLMEngine (#25019) Woosuk Kwon 2025-09-16 21:29:27 -07:00
  • 58d4c705a8 [Core] Get num_encoder_tokens from scheduler config (#24989) Russell Bryant 2025-09-16 23:59:07 -04:00
  • ea3de5ef0d [misc] fix typo in value error (#24995) Prashant Gupta 2025-09-16 20:58:38 -07:00
  • 67532a1a68 [UX] Remove "quantization is not fully optimized yet" log (#25012) Michael Goin 2025-09-16 23:57:51 -04:00
  • 5672ba90bd [Docs] fix invalid doc link (#25017) yyzxw 2025-09-17 11:53:23 +08:00
  • dd83a157f1 [UX] Enforce valid choices for envs like VLLM_ATTENTION_BACKEND, etc (#24761) Michael Goin 2025-09-16 23:42:23 -04:00
  • 5a411ef6c4 [Benchmarks] Add MMVU video dataset support and clean up deprecated datasets (#24719) Isotr0py 2025-09-17 11:29:43 +08:00
  • eeb135eb87 [Core] Use CpuGpuBuffer for block table tensors (#24795) Nick Hill 2025-09-16 19:18:06 -07:00
  • 3059b9cc6b [Doc] Add --force-overwrite option to generate_cmake_presets.py (#24375) elvischenv 2025-09-17 09:45:29 +08:00
  • 64ad551878 Removes source compilation of nixl dependency (#24874) Benjamin Bartels 2025-09-17 02:33:18 +01:00
  • cef32104b4 [FP8] Extend per-token-group quantization support to QuantFP8 (#24342) Tahsin Tunan 2025-09-17 07:31:06 +06:00
  • 493b10f8bf [CI] GPT-OSS GPQA eval test for Blackwell (#24920) Michael Goin 2025-09-16 21:13:21 -04:00
  • d119fc8614 [CI][Bugfix] Fix failing Blackwell test (#24993) Matthew Bonanni 2025-09-16 18:55:02 -04:00
  • dbebb7f812 [Perf] Reuse workspace for FP8+FP4 Marlin MoE (#20500) Michael Goin 2025-09-16 17:45:10 -04:00
  • 3053a22b33 fp8 kv cache support fix for torch.compile (#22758) Aleksandr Malyshev 2025-09-16 14:27:11 -07:00
  • 02d4b85454 Use kwargs for long lists of EngineCoreRequest arguments in tests and fix extra kwargs (#24987) Andrew Sansom 2025-09-16 16:06:56 -05:00
  • 86daa875fe [gpt-oss][1][bugfix] fix streaming final output (#24466) Andrew Xia 2025-09-16 12:56:16 -07:00
  • dcf2f3ec06 [ROCm] Add dependencies for ROCm (#24900) Concurrensee 2025-09-16 14:49:06 -05:00
  • 218454b9b2 [MISC] Add code owners of vllm/v1 to vllm/v1/core (#24928) Chen Zhang 2025-09-16 12:07:34 -07:00
  • f4d6eb95cf [gpt-oss][1b] streaming add item id, content id (#24788) Andrew Xia 2025-09-16 11:41:12 -07:00
  • cd1f885bcf Directly get max encoder len from VLLM config in V1 (#24866) Sugar 2025-09-17 01:52:31 +08:00
  • d593cf28fa [Misc] Add removed encoder-decoder models to previously supported models list (#24961) Isotr0py 2025-09-17 01:46:46 +08:00
  • faa7a5daac [Bugfix] Fix unable to run encoder model when disable_hybrid_kv_cache_manager is true (#24571) lianyibo 2025-09-17 01:36:58 +08:00
  • 567939953b [Core/DBO][1/N] Add Dual-Batch Overlap mechanism to VLLM (#23693) Sage Moore 2025-09-16 09:21:48 -07:00
  • 08369289af [Core][MultiModalHasher] Don't convert memoryviews to bytes during hashing (#24925) Lukas Geiger 2025-09-16 16:32:47 +01:00
  • 73cfb3c5ee [Model] Clean up and simplify Mamba2 Metadata Usage in both V0 and V1 (#24331) Chih-Chieh Yang 2025-09-16 10:53:43 -04:00
  • 4e5affeaa1 [CI] Add Decode Context Parallelism (DCP) test to CI (#24487) Ming Yang 2025-09-16 06:21:28 -07:00
  • e4f0b4cd96 (doc): set cmake c++ compatible standard when building on MacOS CPU. (#23483) TeeKen Lau 2025-09-16 23:08:46 +10:00
  • de3e53a75b feat: Add Grafana and Perces monitoring dashboards for vLLM (#23498) liangwen12year 2025-09-16 08:53:40 -04:00
  • 85e0df1392 [Docs] move benchmarks README to contributing guides (#24820) Ye (Charlotte) Qi 2025-09-16 05:52:57 -07:00
  • 0faf3cc3e8 Move SpeculativeConfig from config/__init__.py to config/speculative.py (#24904) Harry Mellor 2025-09-16 12:51:35 +01:00
  • 7ea5c73ad7 [Feat][EPLB] A novel static EPLB placement strategy for MoE models. (#23745) Chen Bruce 2025-09-16 18:55:16 +08:00
  • 27fcfe7bcf [Mamba] Support TP>1 with quantization for mamba2 mixer in case n_groups % tp_size == 0 (#24593) tomeras91 2025-09-16 13:51:01 +03:00
  • 68dbde5dbb [Bugfix] remove duplicate tokens streamed in required tool choice streaming (#23312) Cheng Kuan Yong Jason 2025-09-16 15:16:32 +08:00
  • 04ad0dc275 [benchmark] Add triton version in the moe tuned config (#24769) Jee Jee Li 2025-09-16 14:10:54 +08:00
  • 238c4c1705 [QWEN NEXT] Fused MoE kernels Optimization configs (#24924) Saman A. Pour 2025-09-15 22:06:03 -07:00
  • 8c54610265 [Bug] [Spec Dec]: Fix kv_cache dtype mismatch for Eagle3 drafter on FP8 target (#24505) vllmellm 2025-09-16 12:45:38 +08:00
  • 17871983a2 [Bugfix] Fix sequence parallelism bug when enable pipeline parallelism (#24021) cascade 2025-09-15 21:32:32 -07:00
  • 759ef49b15 Remove V0 Encoder-Decoder Support (#24907) Woosuk Kwon 2025-09-15 21:17:14 -07:00
  • 5206ab20ba [XPU] Fix circular import error. (#24927) Kunshang Ji 2025-09-16 11:35:36 +08:00
  • 0af3ce1355 Upgrade flashinfer to 0.3.1 (#24470) Lu Fang 2025-09-15 19:36:09 -07:00
  • e1279ef00f [Docs] Update instructions for how to using existing torch binary (#24892) Richard Zou 2025-09-15 22:25:50 -04:00
  • 2942970d44 [Metrics] Hide deprecated metrics with gpu_ prefix (#24245) Mark McLoughlin 2025-09-16 03:15:57 +01:00
  • 3c96e7b8a1 [CI] Small Accuracy Eval Test for Deepseek Model (#24259) Wentao Ye 2025-09-15 22:14:50 -04:00
  • b42566f440 [Bug] Fix is_flashmla_supported Check Error (#24774) Wentao Ye 2025-09-15 22:10:55 -04:00
  • d96e11167d Add pytest-cov and .coveragerc (#24778) Reza Barazesh 2025-09-15 22:08:46 -04:00
  • 2891603efd [ROCm][Bugfix] Fix the case where there's bias (#24895) Gregory Shtrasberg 2025-09-15 22:05:12 -04:00
  • de2cc3d867 [Deprecation] Remove DeepGEMM Old Symbol Wrapper (#24902) Wentao Ye 2025-09-15 22:03:29 -04:00
  • e95084308b Updated CODEOWNERS for flashinfer, mla, fused_moe (#24906) Michael Goin 2025-09-15 22:01:28 -04:00
  • 7f6f2c1182 HuggingFace -> Hugging Face in Integration with Hugging Face docs (#24889) Sergio Paniego Blanco 2025-09-16 02:28:35 +02:00
  • 5bcc153d7b [Compile] Fix noop_elimination pass and add tests for noop_elimination (#24880) Jiangyun Zhu 2025-09-16 07:33:18 +08:00
  • 45bfa49cb8 [Tests] fix initialization of kv hash in tests (#24273) Mickaël Seznec 2025-09-15 23:48:27 +02:00
  • fd2f10546c [ci] fix wheel names for arm wheels (#24898) Simon Mo 2025-09-15 14:39:08 -07:00
  • e757a629e7 [Bug] Fix Cutlass Scaled MM Compilation Error (#24887) Wentao Ye 2025-09-15 17:21:17 -04:00
  • aae725af7c [Performance] Remove redundant clone() calls in cutlass_mla (#24891) Alexander Matveev 2025-09-15 16:21:53 -04:00
  • 73df49ef3a [gpt-oss][1a] create_responses stream outputs BaseModel type, api server is SSE still (#24759) Andrew Xia 2025-09-15 13:08:08 -07:00
  • 25aba2b6a3 [gpt-oss] Add IncompleteDetails to ResponsesRepsonse (#24561) Andrew Xia 2025-09-15 13:07:55 -07:00
  • 94b03f88dd Bump Flashinfer to 0.3.1 (#24868) Benjamin Bartels 2025-09-15 20:45:55 +01:00
  • 49bfc538e4 Update num_tokens_across_dp to use nccl instead of gloo (#24105) Sage Moore 2025-09-15 12:05:48 -07:00
  • a0b26701c9 [Transform] Deterministic Hadacore Transforms (#24106) Kyle Sayers 2025-09-15 19:59:31 +01:00
  • c4afdb69cc Move MultiModalConfig from config/__init__.py to config/multimodal.py (#24659) Harry Mellor 2025-09-15 18:43:16 +01:00
  • b834b4cbf1 [USAGE] Improve error handling for weight initialization in Unquantized… (#20321) Rafael Marcelino Koike 2025-09-15 12:45:49 -04:00
  • 740f0647b1 Reinstate existing torch script (#24729) Harry Mellor 2025-09-15 17:43:40 +01:00
  • 01413e0cf5 Fp8 paged attention update (#22222) xiao-llm 2025-09-15 10:43:26 -04:00
  • 0e219cd50b [Bugfix] Fix GLM4.1V multimodal processor with compatability for Transformers v4.56 (#24822) Isotr0py 2025-09-15 20:45:06 +08:00
  • 72c99f2a75 [Model]: support Ling2.0 (#24627) ant-yy 2025-09-15 20:09:30 +08:00