Commit Graph

  • 04b5f9802d [CI] Raise VLLM_MAX_SIZE_MB to 500 due to failing Build wheel - CUDA 12.9 (#26722) Michael Goin 2025-10-14 13:52:05 -04:00
  • efc8f7d814 Update coveragerc and add codecov.yml for path fixes (#26435) Reza Barazesh 2025-10-14 12:45:06 -04:00
  • 6d87a2838c [Config] Remove Unused Environment Variable VLLM_DISABLE_PAD_FOR_CUDAGRAPH (#26743) Wentao Ye 2025-10-14 11:47:49 -04:00
  • e6cdbd6792 Revert "[issues template] Encourage the author implement their own ideas" (#26814) wang.yuqi 2025-10-14 23:37:34 +08:00
  • df850c4912 [Feature][Responses API] Stream Function Call - harmony (#24317) Chauncey 2025-10-14 23:31:43 +08:00
  • 720394de43 [KVConnector][Metrics] Aggregate scheduler-side KVConnectorStats (#26046) Qier Li 2025-10-14 10:38:07 -04:00
  • 88a49745af [issues template] Encourage the author implement their own ideas (#26671) wang.yuqi 2025-10-14 22:32:36 +08:00
  • ca683a2a72 use combo kernel to fuse qk-norm and qk-rope (#26682) Boyuan Feng 2025-10-14 06:40:59 -07:00
  • e9f1b8c9e9 Adjusted the model order of the model registration file (#26798) 汪志鹏 2025-10-14 21:26:11 +08:00
  • ea97940d6c [DCP] Support Decode Context Parallel (DCP) for GQA with FlashAttention (#24864) Jaya Yuan 2025-10-14 21:07:50 +08:00
  • fdd32750f0 [CI/Build] Cleanup LoRA test (#26752) Jee Jee Li 2025-10-14 20:06:35 +08:00
  • c715ba3735 [Feature] Change vllm.py with pydantic validation (#26726) Vladislav Bronzov 2025-10-14 14:00:54 +02:00
  • 9c4cb68339 [Chore] Remove SupportsV0Only interface and update supported models docs (#26783) Cyrus Leung 2025-10-14 19:55:10 +08:00
  • 780eb03d9b [CI] Fix test_tool_id_kimi_k2 (#26787) Chauncey 2025-10-14 18:27:07 +08:00
  • ef9676a1f1 [Doc] ruff format some Python examples (#26767) Cyrus Leung 2025-10-14 18:21:53 +08:00
  • 70b1b330e1 Don't allow typos to fix by default (#26785) Harry Mellor 2025-10-14 11:05:15 +01:00
  • d1d063a588 [Chore] Use max_transformers_version for Qwen-VL test (#26792) Cyrus Leung 2025-10-14 18:03:46 +08:00
  • 7e6edb1469 [NIXL][HeteroTP] Enable KV transfer from HND prefill to NHD decode (#26556) Chendi.Xue 2025-10-14 04:46:05 -05:00
  • 74704d4553 [Model] Use merge_by_field_config for MM models (O-P) (#26776) Cyrus Leung 2025-10-14 17:42:45 +08:00
  • d2f816d6ff [Bugfix] Standardize merging multimodal embeddings (#26771) Cyrus Leung 2025-10-14 17:36:21 +08:00
  • 577d498212 [Plugin] Make plugin group clear (#26757) wangxiyuan 2025-10-14 15:49:59 +08:00
  • fd85c9f426 [Bugfix][FE]: Always include usage with --enable-force-include-usage (#20983) Max Wittig 2025-10-14 09:17:39 +02:00
  • d32c611f45 [CI/Build] Use 127.0.0.1 instead of localhost in utils (#26750) Ye (Charlotte) Qi 2025-10-14 00:04:00 -07:00
  • 01ad27faff [Model][Bugfix]fix ernie45 load failed due to ernie45 eplb code (#26684) CSWYF3634076 2025-10-14 14:55:23 +08:00
  • 481545b397 scheduler.py: Update the name of the default scheduler. (#26758) Ryan Li 2025-10-14 14:52:21 +08:00
  • d3cc8427c0 [ci] Adding the test-amd.yaml for test definitions for the AMD backend. (alternative PR) (#26718) Alexei-V-Ivanov-AMD 2025-10-14 01:10:23 -05:00
  • 4821ac1b4d [CI] [ROCm] Automate CC list for ROCm related issue (#26753) vllmellm 2025-10-14 13:57:26 +08:00
  • 4497c8f821 Fix lora tests failure in TPU CI due to the removal of LoRA bias (#26723) XiongfeiWei 2025-10-13 22:04:23 -07:00
  • 2e36cdbe2b [Docs] Add a start tag to build.inc.md (#26747) Michael Yao 2025-10-14 12:51:55 +08:00
  • fe3edb4cf0 Add support for the /rerank endpoint in vllm bench serve (#26602) Maximilien de Bayser 2025-10-14 01:25:43 -03:00
  • 29350922c6 [Feature][Quantization] auto_round format add support for regex (#24024) Heng Guo 2025-10-14 11:03:16 +08:00
  • 8ae169286f [torch.compile] Unwrap fused_marlin_moe custom op (#26739) Varun Sundar Rabindranath 2025-10-13 22:22:16 -04:00
  • 8a0af6a561 [build][torch.compile] upgrade depyf version (#26702) youkaichao 2025-10-14 10:12:09 +08:00
  • cfded80793 [Easy] Fix env type check errors from VLLM_DEBUG_LOG_API_SERVER_RESPONSE (#26742) Jialin Ouyang 2025-10-13 18:46:44 -07:00
  • b59dd19b55 [compile] Enable sequence parallelism for full cuda graph without specifying compile sizes (#26681) Angela Yi 2025-10-13 18:15:34 -07:00
  • 3e051bda82 [UX] Replace VLLM_ALL2ALL_BACKEND with --all2all-backend (#26732) Michael Goin 2025-10-13 21:12:52 -04:00
  • 8317f72354 [Misc][DP] support customized aggregated logger for dp (#24354) Lucia Fang 2025-10-13 17:45:59 -07:00
  • d8bebb008a Add tests for chunked prefill and prefix cache with causal pooling models (#26526) Maximilien de Bayser 2025-10-13 20:45:04 -03:00
  • 35bc22f23c [ResponseAPI] Further polish message serialization and unit tests (#26728) Jialin Ouyang 2025-10-13 16:31:35 -07:00
  • fa96fb9c70 Pruning kernel Core Tests (#26727) Fardin Hoque 2025-10-13 16:08:18 -07:00
  • e3fdb627d9 [FrontEnd] UNREVERT CompilationConfig overhaul (#20283): deprecate use_inductor in favor of backend, simplify custom_ops (#26502) Morrison Turnansky 2025-10-13 18:47:16 -04:00
  • 7200a21cd1 [Bug] Fix Assertion error DeepEP/csrc/kernels/intranode.cu:928: 'false and Unsupported type' (#26532) Wentao Ye 2025-10-13 18:26:37 -04:00
  • 577c72a227 [CI Perf]Prune Tests in kernel/mamba (#26538) Fardin Hoque 2025-10-13 15:22:31 -07:00
  • 314285d4f2 [CI] Fix mypy for vllm/distributed (#26593) Wentao Ye 2025-10-13 16:02:24 -04:00
  • d2a7938582 [Frontend][1/N] Improve all pooling task | Support FP16 Embedding Base64 (Still uses fp32 by default). (#26414) wang.yuqi 2025-10-14 03:06:43 +08:00
  • 89342ce4c0 [Quantization] [Performance] Enable Marlin GEMM kernels for the calibration-free RTN-based quantization (#26051) Alex Kogan 2025-10-13 14:52:54 -04:00
  • f89f599395 [CI][Release][Arm64]: Build arm64 release for gpu arch 8.9 (#26698) Yibo Cai 2025-10-14 02:42:12 +08:00
  • e251e457c5 [Log] Optimize Startup Log (#26601) Wentao Ye 2025-10-13 14:06:57 -04:00
  • afc47e4de7 [Model] Use merge_by_field_config for MM models (M-N) (#26710) Cyrus Leung 2025-10-14 01:27:01 +08:00
  • e3b90c1ba2 [Bugfix][Speculative Decoding] Extend Eagle quantization config fix to llama_eagle.py (#26590) Rahul Tuli 2025-10-13 22:47:13 +05:30
  • 134f70b3ed [Bugfix][Rocm] fix qr error when different inp shape (#25892) haoyangli-amd 2025-10-14 01:04:21 +08:00
  • a1b2d658ee [CI/Build] upgrade compressed-tensors to 0.12.2 to address LGPLv3 (#26501) Sangyeon Cho 2025-10-14 01:58:33 +09:00
  • 5c7fe25491 [Misc] Separate prompt logging to debug (#26713) Aleksei Tsvetkov 2025-10-13 19:04:18 +03:00
  • 53c9a7cee2 [P/D] [NixlConnector] kv load recovery integration (#26171) Will Eaton 2025-10-13 11:48:04 -04:00
  • 0d21b9b51e [UX] Speedup DeepGEMM warmup with heuristics (#25619) Michael Goin 2025-10-13 10:59:27 -04:00
  • 10214b6935 [FEATURE]: Use pydantic validation in multimodal.py config (#26629) Anand Roy 2025-10-13 20:26:59 +05:30
  • 4a61950f4d [Hardware][CPU] Disable torch.compile for RISC-V to prevent APIError (#26693) ihb2032 2025-10-13 22:56:01 +08:00
  • 3263799056 [unrevert] Add batch invariant kernel override for FlashInfer backend [2/n] (#26373) Bram Wasti 2025-10-13 07:24:53 -07:00
  • 8e67b2557a [Bugfix] Fix out of bound index issue for Jina-embedding-v3 RoPE with cuda graph (#26687) Isotr0py 2025-10-13 18:21:48 +08:00
  • 4073c82c4e [ResponseAPI] Simplify input/output message serialization (#26620) Jialin Ouyang 2025-10-13 02:59:15 -07:00
  • 767c3ab869 [Model][0/N] Improve all pooling task | clean up (#25817) wang.yuqi 2025-10-13 16:44:50 +08:00
  • 4f207c7174 Ignore large reformatting PRs in git blame (#26690) Harry Mellor 2025-10-13 09:20:47 +01:00
  • 782505ed8e [Model] Add reasoning_parser and tool_parser for Ernie45 thinking (#25027) CSWYF3634076 2025-10-13 15:55:20 +08:00
  • 98f30b8cba [Model] Fix Skywork R1V mlp (#26673) Jee Jee Li 2025-10-13 13:42:17 +08:00
  • 3cd36660f7 docs: wrong command in structured_outputs README (#26677) yihong 2025-10-13 11:59:01 +08:00
  • 46ad73955a [FIX] Throwing an exception when the model does not support pool tasks (#25840) (#25855) yyzxw 2025-10-13 11:56:21 +08:00
  • 41f3884438 [Bugfix][Core]Fix block table out-of-range issue in priority scheduling (#26661) quanliu 2025-10-13 09:25:42 +08:00
  • 60e419c1ee [Misc] cache result of disable_inplace (#26666) bnellnm 2025-10-12 20:17:50 -04:00
  • 7ef6052804 [CI/Build] Add tool to build vllm-tpu wheel (#19165) Michael Goin 2025-10-12 18:25:40 -04:00
  • 4fca1a1bd2 [easy] fix pre commit error on trunk (#26665) Huamin Li 2025-10-12 14:25:34 -07:00
  • a6049be73c [Models][Qwen3VL] Speedup fast_pos_embed_interpolate (#26647) Lukas Geiger 2025-10-12 18:20:07 +01:00
  • 18ed7746ea [Feature] Add support for naver/splade-v3 (BERT-based sparse embedding model) (#26339) gjgjos 2025-10-13 02:00:52 +09:00
  • 8fcaaf6a16 Update Optional[x] -> x | None and Union[x, y] to x | y (#26633) Harry Mellor 2025-10-12 17:51:31 +01:00
  • 9bb38130cb [Bugfix] Fix GPU_ID issue in test script (#26442) Chendi.Xue 2025-10-12 06:39:05 -05:00
  • b91d8db873 [Bugfix][DCP] Set default CUDAGraphMode to PIECEWISE for DCP (#26574) Jaya Yuan 2025-10-12 17:58:38 +08:00
  • 045b396d09 [Bugfix][CI/Build] Fix failing Mteb CI (#26638) Isotr0py 2025-10-12 17:42:42 +08:00
  • 76852017ea [MISC] Rename the torch profiler filename as instance_id+rank_id for merging the Profiler results of each Rank (#25867) wang.yuqi 2025-10-12 17:29:08 +08:00
  • 82e64c7a20 [PERF] [Qwen3-next] Speed up gated RMSNorm (#26207) Vadim Gimpelson 2025-10-12 12:27:50 +04:00
  • 4ca204055e Add @noooop to codeowner for pooling models (#26652) wang.yuqi 2025-10-12 14:04:44 +08:00
  • c5c8f5ea59 [EPLB] Support ernie4.5-moe (#22100) Haisheng Chen 2025-10-11 19:40:47 -07:00
  • 01653a917b [compile] Fix inductor partition config (#26645) Angela Yi 2025-10-11 14:03:14 -07:00
  • 0cd103e7cb CP: make correct_attn_out robust to 4‑D views and fix Triton arg binding (#26509) Huamin Li 2025-10-11 13:50:57 -07:00
  • 5be7ca1b99 [Benchmark] Support Infinity API (#26641) Cyrus Leung 2025-10-12 01:45:32 +08:00
  • f0a30a067b [Bugfix] Fix qwen-moe packed_modules_mapping (#26634) Jee Jee Li 2025-10-11 23:21:33 +08:00
  • 9d6cff3ede [Bugfix][Qwen3VL] fix deepstack in qwen3vl (#26626) JJJYmmm 2025-10-11 20:58:33 +08:00
  • a25f2adee9 [compile] Add patched_fused_scaled_matmul_reduce_scatter (#26604) Angela Yi 2025-10-11 05:44:43 -07:00
  • d0bed837ac [Refactor]Reduce duplicate code in serving_chat (#26627) Chauncey 2025-10-11 20:04:49 +08:00
  • f7ee69868a [CPU] fix the issue when the node is '-' cause json decode error. (#26562) muzian666 2025-10-11 20:04:04 +08:00
  • d2a71530c1 Add EAGLE-3 Speculative Decoding Support for Qwen3 MoE (#26485) Rahul Tuli 2025-10-11 15:44:41 +05:30
  • 086609de64 fix(nix): Allow local oneDNN path to fix vLLM CPU build failure (#26401) ihb2032 2025-10-11 17:12:16 +08:00
  • 727144bed1 [Refactor]: Use M-RoPE interface directly while defining model class instead of maintaining model specific M-RoPE implementation in mrope.py (#24172) dsinghvi 2025-10-11 12:51:04 +05:30
  • 55392bc879 [Bugfix][Multi Modal] Fix incorrect Molmo image processing (#26563) sangho.lee 2025-10-11 00:28:23 -05:00
  • ddaff2938e [MM] Move Qwen3Omni MRoPE impl to model file (#26608) Roger Wang 2025-10-10 22:17:24 -07:00
  • 27ed39a347 [XPU] Upgrade NIXL to remove CUDA dependency (#26570) liuzhenwei 2025-10-11 13:15:23 +08:00
  • 8f8474fbe3 [CI/Build] Fix ppc64le CPU build and tests (#22443) Nishidha Panpaliya 2025-10-11 10:34:42 +05:30
  • be067861c6 [Frontend] Improve the performance of is_reasoning_end (#25735) Chauncey 2025-10-11 10:43:39 +08:00
  • 5bc26c438d [BugFix] Make penalties and bad_words work with async scheduling (#26467) Nick Hill 2025-10-10 16:27:04 -07:00
  • eef921f45e AOT Compilation for torch.compile (Bundled) (#24274) Zhengxu Chen 2025-10-10 19:02:11 -04:00
  • e317414ce1 Cache the environment variable check for batch invariance (#26510) Bram Wasti 2025-10-10 15:47:34 -07:00
  • 949cb0170d [BugFix] Fix async scheduling + request preemption (#26385) Nick Hill 2025-10-10 13:29:57 -07:00