Commit Graph

  • 561f38dc3c [Bugfix] Improve EPLB config validation error message (#24524) Tyler Michael Smith 2025-09-09 20:32:36 -04:00
  • 73e688cb79 [ROCm][Feature] Enable Pipeline Parallelism with Ray Compiled Graph on ROCm (#24275) Charlie Fu 2025-09-09 18:27:35 -05:00
  • fb1a8f932a [Benchmark] Add option to skip oversampling in benchmark (#24457) Ekagra Ranjan 2025-09-09 18:00:17 -04:00
  • 0dc9cbb527 [Benchmark] Update bench doc with mtbench, blazedit, spec bench (#24450) Ekagra Ranjan 2025-09-09 17:15:41 -04:00
  • b5fb3005a8 [Log] Use a relative path in debug-level logs to distinguish files with identical names (#23846) Jiangyun Zhu 2025-09-10 04:46:35 +08:00
  • 15de5ff9ea [Feature] Disallow FlashMLA on Blackwell (#24521) Wentao Ye 2025-09-09 14:59:34 -04:00
  • b8a93076d3 [CI] execute all piecewise compilation tests together (#24502) (tag: v0.10.2rc1) Jiangyun Zhu 2025-09-10 02:05:25 +08:00
  • c3f9773b2c [TPU] Fix tpu structured decoding in mixed batches (#24458) Chenyaaang 2025-09-09 23:34:25 +05:30
  • 3707cb2505 [Docs] Gemma3n transcriptions endpoint support (#24512) Nicolò Lucchesi 2025-09-09 20:03:32 +02:00
  • 920ed46b09 [Misc] bump outlines_core to fix the version conflicts with outlines >= 1.2.0 (#24368) Kazuhiro Serizawa 2025-09-10 02:59:46 +09:00
  • 15cb047e25 Extend renderer with embedding support and integrate completion endpoint (#24405) Flora Feng 2025-09-09 10:46:46 -07:00
  • 9ad0688e43 [Bugfix] Fix hidden_size for multimodal classification model (#24501) Jee Jee Li 2025-09-10 01:37:25 +08:00
  • b9a1c4c8a2 [ROCm][CI/Build] Sync ROCm dockerfiles with the ROCm fork (#24279) Gregory Shtrasberg 2025-09-09 12:21:56 -04:00
  • 1aa427fdc1 [Kernels] Add Flash Linear Attention Kernels (#24518) youkaichao 2025-09-10 00:04:41 +08:00
  • 1c63a16b65 [Core] Run garbage collector after CUDA graph capture to fix throughput regression (#24128) Micah Williamson 2025-09-09 09:38:10 -05:00
  • 922d3b401b [Bugfix] Handle the edge case in detokenizer where processed tokens contain both stop str and eos token (#23938) d.transposed 2025-09-09 16:30:24 +02:00
  • 19332c0479 [Model] Systematic support for fp32 head, pooling models part (#23810) wang.yuqi 2025-09-09 22:29:50 +08:00
  • a55cf41a09 [Compilation][WideEP] Enable Piecewise CUDAGraph for DeepEPHT (#24123) Wentao Ye 2025-09-09 10:21:10 -04:00
  • 6fb2788163 [CI/Build][Doc] Fully deprecate old bench scripts for serving / throughput / latency (#24411) Ye (Charlotte) Qi 2025-09-09 03:02:35 -07:00
  • 3d2a2de8f7 [RL] fast weight update with zmq + ipc handles (#24295) Weixiao Huang 2025-09-09 16:57:46 +08:00
  • 1116590b16 [gpt-oss] Validate gpt-oss python tool during initialization (#23856) Chen Zhang 2025-09-09 01:37:48 -07:00
  • ccb97338af [Misc] Add Codex settings to gitignore (#24493) Roger Wang 2025-09-09 01:25:44 -07:00
  • 45c9cb5835 [Misc] Add claude settings to gitignore (#24492) Ye (Charlotte) Qi 2025-09-09 01:14:45 -07:00
  • e283976f3a [Performance][MM] Building the inverse permutation in O(n) time in Qwen2_5_VisionTransformer (#24443) WeiQing Chen 2025-09-09 15:24:11 +08:00
  • 46876dff32 [Doc]: fixing typos to improve docs (#24480) Didier Durand 2025-09-09 08:06:04 +02:00
  • 1823a00d67 [Misc] Support bench serve long context (#24373) Ming Yang 2025-09-08 22:53:10 -07:00
  • ed16d0f26f [Doc] mention fpdb for multiprocess breakpoints (#24452) Mickaël Seznec 2025-09-09 06:46:45 +02:00
  • 0cdd213641 [Misc] Improve Worker process title and logging prefix (#22205) 22quinn 2025-09-08 21:43:48 -07:00
  • 948dd3443b [Bugfix] Fix Apertus HF repo name (#24447) Cyrus Leung 2025-09-09 12:40:29 +08:00
  • b2f7745774 Add data_parallel_size to VllmConfig string representation (#24298) cong-meta 2025-09-08 21:35:18 -07:00
  • 82dfb12e52 [Core] Use sha256 bytes instead of BlockHash to reduce GC overhead (#23673) Zebing Lin 2025-09-09 00:34:37 -04:00
  • bba1042c6f [Flashinfer] Support Flashinfer TRTLLM FP8-qkv BF16/FP16-out Attention Kernel (#23647) elvischenv 2025-09-09 11:53:07 +08:00
  • b6fbc15634 [BugFix][Model] Fix Ernie4.5-VL hanging on long inputs (#24074) CSWYF3634076 2025-09-09 11:37:16 +08:00
  • 3e0d4a3475 Move KVTransferConfig from config/__init__.py to config/kv_transfer.py (#24434) Harry Mellor 2025-09-09 04:30:32 +01:00
  • 562663a044 Bump actions/github-script from 7.0.1 to 8.0.0 (#24413) dependabot[bot] 2025-09-09 03:12:44 +00:00
  • ed1623a88a Bump actions/stale from 9.1.0 to 10.0.0 (#24412) dependabot[bot] 2025-09-09 03:11:20 +00:00
  • 13b89bd823 [doc] update vllm serve cli args documentation (#24329) cjackal 2025-09-09 12:07:58 +09:00
  • 22a0070530 Bump actions/setup-python from 5.4.0 to 6.0.0 (#24414) dependabot[bot] 2025-09-09 02:54:58 +00:00
  • 170129eb28 [gpt-oss] Harmony changes with container tool support (#23386) zhiweiz 2025-09-08 19:03:50 -07:00
  • 955c624915 [Bugfix][Wide EP] Fix redundant work when using DeepEP, TP Attn, and EP MoE (#24134) Tyler Michael Smith 2025-09-08 22:01:51 -04:00
  • 4f87abdcc6 Update reviewers for modelopt related files (#24468) Zhiyu 2025-09-08 18:53:13 -07:00
  • 6910b56da2 [CI] Add nightly multiarch manifests to dockerhub (#24102) Sahithi Chigurupati 2025-09-08 18:18:09 -07:00
  • e10fef0883 [Hardware][IBM Z] Fix Outlines Core issue for s390x (#24034) R3hankhan 2025-09-09 05:20:34 +05:30
  • e680723eba [Bugfix] Disable the statslogger if the api_server_count is greater than 1 (#22227) Chauncey 2025-09-09 06:28:03 +08:00
  • 620db1fc58 [Attention] FlashAttention MLA cudagraph support (#23958) Matthew Bonanni 2025-09-08 15:05:26 -07:00
  • 41183c1fe0 [Spec Decode] Fix offline spec_decode.py (#24257) Ekagra Ranjan 2025-09-08 16:44:13 -04:00
  • 43d9ad03ba [Model loader]: support multi-thread model weight loading (#23928) Yang Kaiyong 2025-09-09 02:49:39 +08:00
  • 7be141b2c5 [CI] Enable encoder model compilation test (#24442) Jiangyun Zhu 2025-09-09 02:48:06 +08:00
  • 8d7f39b48c [Model] Remove quantized mixtral (#24437) Jee Jee Li 2025-09-09 02:02:14 +08:00
  • cd08636926 [Spec Decode][Benchmark] Add Blitzedit dataset (#23605) Ekagra Ranjan 2025-09-08 13:32:52 -04:00
  • 3feeeb9fea [Spec Decode][Benchmark] Add Spec Bench Dataset for benchmarking (#23563) Ekagra Ranjan 2025-09-08 13:32:42 -04:00
  • 6f4a82f8b5 [Model] Enable BNB support for qwen2_5_omni_thinker (#24420) Jee Jee Li 2025-09-09 00:37:08 +08:00
  • c44797a4d6 [Docs]add eplb_config param use docs (#24213) rongfu.leng 2025-09-09 00:36:57 +08:00
  • 55be93baf5 [Doc]: fix 2 hyperlinks leading to Ray site after they changed Ray's doc structure (#24438) Didier Durand 2025-09-08 18:36:54 +02:00
  • 717fc00e98 [Docs] Move feature compatibility tables to README (#24431) Harry Mellor 2025-09-08 14:45:14 +01:00
  • 01dfb5e982 [Frontend] User-provided uuids for medias in chat. (RFC #22044) (#23449) Chenheli Hua 2025-09-08 06:42:20 -07:00
  • 03dd652c16 Move KVEventsConfig from config/__init__.py to config/kv_events.py (#24433) Harry Mellor 2025-09-08 14:41:27 +01:00
  • 9cd76b71ab [Misc] Terratorch related fixes (#24337) Christian Pinto 2025-09-08 15:40:26 +02:00
  • e041314184 [Bugfix] Fix mamba2 prefill chunking (#23279) tomeras91 2025-09-08 14:42:41 +03:00
  • 5e537f45b4 [Bugfix] Fix get_quant_config when using modelscope (#24421) Li Wang 2025-09-08 19:03:02 +08:00
  • c2a8b08fcd [Doc] Fix issues in integrations/llamastack.md (#24428) Michael Yao 2025-09-08 17:28:32 +08:00
  • f4962a6d55 [Doc]: fix typos in Python comments (#24417) Didier Durand 2025-09-08 09:22:16 +02:00
  • 2f0b833a05 [Docs] Fix a tip indentation and typo (#24419) Michael Yao 2025-09-08 15:19:40 +08:00
  • 425b04b8f4 [gpt-oss][Responses API] Fix the function call id format (#24409) Chauncey 2025-09-08 14:49:52 +08:00
  • 60f0843ef8 [Model] Remove unnecessary CUDA sync of Qwen2VL image and video preprocess (#24334) Chatcharin Sangbutsarakum 2025-09-08 13:11:12 +07:00
  • 8a46602606 [Model] Remove unnecessary CUDA sync of GLM-4.1V image and video preprocess (#24332) Chatcharin Sangbutsarakum 2025-09-08 13:10:54 +07:00
  • 61aa4b2901 [P/D] Add a shutdown method to the Connector API (#22699) Chauncey 2025-09-08 14:07:00 +08:00
  • 8c892b1831 [Doc] Fix UTF-8 encoding issues in documentation generation on Windows (#24361) Al-Ekram Elahee Hridoy 2025-09-07 23:33:52 -06:00
  • 3bca396f79 [CI/Build] Fix local image inputs in test_pixtral.py (#24401) Chenheli Hua 2025-09-07 20:31:35 -07:00
  • 3a3e91bdfe [CI/Build] Disable flaky test_structured_output tests (#24404) 22quinn 2025-09-07 19:51:59 -07:00
  • b3d7e3c845 [Sampler] Support returning all prompt logprobs (#23868) Xingyu Liu 2025-09-07 19:34:31 -07:00
  • 67841317d1 [xpu] upgrade ipex/python3.12 for xpu (#23830) Yan Ma 2025-09-08 10:07:16 +08:00
  • 86173ad593 [Kernel] Support decode context parallelism on Blackwell with CUTLASS MLA (#24385) Ming Yang 2025-09-07 18:27:12 -07:00
  • 795b6951cd Add @luccafong to codeowner for spec decode (#24397) Lucia Fang 2025-09-07 17:30:27 -07:00
  • 2e5d21378d Skip MM Encoder for non-first PP ranks (#24387) Woosuk Kwon 2025-09-07 09:38:35 -07:00
  • 0661cb9df3 Add renderer-based prompt processing for embedding and classification endpoints (#24356) Flora Feng 2025-09-07 01:26:48 -07:00
  • 105d3d62ef [TPU] Remove TopKTopPSampler dependency for TPU sampler (#24391) Woosuk Kwon 2025-09-07 01:12:36 -07:00
  • 62f66be1f7 [Bugfix] Fix Qwen3-coder moe tuned config (#24072) Jee Jee Li 2025-09-07 13:19:46 +08:00
  • 81c53ef55c [Misc] collect flashinfer version in collect_env.py (#24378) Ye (Charlotte) Qi 2025-09-06 20:30:41 -07:00
  • 75334956c2 QWEN3 Thinking Fused MoE kernels Optimization configs (#24330) Saman A. Pour 2025-09-06 20:18:54 -07:00
  • 77aec83b8c [Benchmark] add benchmark for custom activation op (#23908) Jiangyun Zhu 2025-09-07 11:12:05 +08:00
  • e67597545b [CI][Fix] deterministic seed for flaky CI runs on structured outputs (#24380) Aaron Pham 2025-09-06 23:10:40 -04:00
  • 37a6fa95fd Migrate Qwen2 inputs to TensorSchema (#23475) Benji Beck 2025-09-06 20:07:31 -07:00
  • 558f0907dc [attention][DCP] use AttentionImpl.need_to_return_lse_for_decode (#24372) youkaichao 2025-09-07 09:18:59 +08:00
  • 4172235ab7 [V0 deprecation] Deprecate V0 Neuron backend (#21159) Woosuk Kwon 2025-09-06 16:15:18 -07:00
  • 848562bd49 break execute_model in gpu_model_runner into sub-functions for custom scopes (#24265) Bangsheng Tang 2025-09-06 14:02:47 -07:00
  • e68dc2f014 [Bugfix] Fix unstable silu_mul+nvfp4 quant fusion test (#24370) elvischenv 2025-09-07 04:39:34 +08:00
  • a3645ed94d [Frontend][Responses API] Support reporting tool output tokens and fix reasoning token count (#24285) Ye (Charlotte) Qi 2025-09-06 13:27:15 -07:00
  • fb691ee4e7 [Fix] [gpt-oss] fix non-tool calling path for chat completion (#24324) Aaron Pham 2025-09-06 15:10:32 -04:00
  • 6024d115cd Lora bias(enable_lora_bias) deprecate warning (#24339) Ashwin Phadke 2025-09-06 22:12:19 +05:30
  • 7555d6b34a [Bugfix] Fix test_mixtral_moe (#24371) Jee Jee Li 2025-09-07 00:32:03 +08:00
  • 00a4e56d8d [Bugfix] Fix broken deepseek fp8 TP weights loading (#24367) Isotr0py 2025-09-07 00:23:12 +08:00
  • 0eadaeff7e [Bugfix] Avoid uninitialized usage of azp_val when AZP is false. (#24335) mohankku 2025-09-06 08:17:03 -07:00
  • 0077c8634e Add @benchislett to codeowner for spec decode and structured outputs (#24362) Benjamin Chislett 2025-09-06 10:03:35 -04:00
  • b121ca22ad [CI] Disable flaky structured output test from CI (#24366) Roger Wang 2025-09-06 06:31:56 -07:00
  • eddaafc1c7 [Multimodal] Improve max video embedding length estimation in V1 (#24312) Roger Wang 2025-09-06 02:33:19 -07:00
  • 305a1cc0d2 refactor: Turn GPUModelRunner.inputs_embeds to a CpuGpuBuffer (#24345) Andrew Sansom 2025-09-06 01:01:23 -05:00
  • 6d6c6b05d3 [New Model]: google/embeddinggemma-300m (#24318) wang.yuqi 2025-09-06 13:58:36 +08:00
  • 53b19ccdd5 [Core] Allow disabling TP sharding for parallel Linear layer (#23024) Isotr0py 2025-09-06 13:53:58 +08:00
  • 6432739ef1 [Bugfix] Catch and log invalid token ids in detokenizer (#24351) Nick Hill 2025-09-05 22:30:22 -07:00