Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

04b5f9802d [CI] Raise VLLM_MAX_SIZE_MB to 500 due to failing Build wheel - CUDA 12.9 (#26722) Michael Goin 2025-10-14 13:52:05 -04:00
efc8f7d814 Update coveragerc and add codecov.yml for path fixes (#26435) Reza Barazesh 2025-10-14 12:45:06 -04:00
6d87a2838c [Config] Remove Unused Environment Variable VLLM_DISABLE_PAD_FOR_CUDAGRAPH (#26743) Wentao Ye 2025-10-14 11:47:49 -04:00
e6cdbd6792 Revert "[issues template] Encourage the author implement their own ideas" (#26814) wang.yuqi 2025-10-14 23:37:34 +08:00
df850c4912 [Feature][Responses API] Stream Function Call - harmony (#24317) Chauncey 2025-10-14 23:31:43 +08:00
720394de43 [KVConnector][Metrics] Aggregate scheduler-side KVConnectorStats (#26046) Qier Li 2025-10-14 10:38:07 -04:00
88a49745af [issues template] Encourage the author implement their own ideas (#26671) wang.yuqi 2025-10-14 22:32:36 +08:00
ca683a2a72 use combo kernel to fuse qk-norm and qk-rope (#26682) Boyuan Feng 2025-10-14 06:40:59 -07:00
e9f1b8c9e9 Adjusted the model order of the model registration file (#26798) 汪志鹏 2025-10-14 21:26:11 +08:00
ea97940d6c [DCP] Support Decode Context Parallel (DCP) for GQA with FlashAttention (#24864) Jaya Yuan 2025-10-14 21:07:50 +08:00
fdd32750f0 [CI/Build] Cleanup LoRA test (#26752) Jee Jee Li 2025-10-14 20:06:35 +08:00
c715ba3735 [Feature] Change vllm.py with pydantic validation (#26726) Vladislav Bronzov 2025-10-14 14:00:54 +02:00
9c4cb68339 [Chore] Remove SupportsV0Only interface and update supported models docs (#26783) Cyrus Leung 2025-10-14 19:55:10 +08:00
780eb03d9b [CI] Fix test_tool_id_kimi_k2 (#26787) Chauncey 2025-10-14 18:27:07 +08:00
ef9676a1f1 [Doc] ruff format some Python examples (#26767) Cyrus Leung 2025-10-14 18:21:53 +08:00
70b1b330e1 Don't allow typos to fix by default (#26785) Harry Mellor 2025-10-14 11:05:15 +01:00
d1d063a588 [Chore] Use max_transformers_version for Qwen-VL test (#26792) Cyrus Leung 2025-10-14 18:03:46 +08:00
7e6edb1469 [NIXL][HeteroTP] Enable KV transfer from HND prefill to NHD decode (#26556) Chendi.Xue 2025-10-14 04:46:05 -05:00
74704d4553 [Model] Use merge_by_field_config for MM models (O-P) (#26776) Cyrus Leung 2025-10-14 17:42:45 +08:00
d2f816d6ff [Bugfix] Standardize merging multimodal embeddings (#26771) Cyrus Leung 2025-10-14 17:36:21 +08:00
577d498212 [Plugin] Make plugin group clear (#26757) wangxiyuan 2025-10-14 15:49:59 +08:00
fd85c9f426 [Bugfix][FE]: Always include usage with --enable-force-include-usage (#20983) Max Wittig 2025-10-14 09:17:39 +02:00
d32c611f45 [CI/Build] Use 127.0.0.1 instead of localhost in utils (#26750) Ye (Charlotte) Qi 2025-10-14 00:04:00 -07:00
01ad27faff [Model][Bugfix]fix ernie45 load failed due to ernie45 eplb code (#26684) CSWYF3634076 2025-10-14 14:55:23 +08:00
481545b397 scheduler.py: Update the name of the default scheduler. (#26758) Ryan Li 2025-10-14 14:52:21 +08:00
d3cc8427c0 [ci] Adding the test-amd.yaml for test definitions for the AMD backend. (alternative PR) (#26718) Alexei-V-Ivanov-AMD 2025-10-14 01:10:23 -05:00
4821ac1b4d [CI] [ROCm] Automate CC list for ROCm related issue (#26753) vllmellm 2025-10-14 13:57:26 +08:00
4497c8f821 Fix lora tests failure in TPU CI due to the removal of LoRA bias (#26723) XiongfeiWei 2025-10-13 22:04:23 -07:00
2e36cdbe2b [Docs] Add a start tag to build.inc.md (#26747) Michael Yao 2025-10-14 12:51:55 +08:00
fe3edb4cf0 Add support for the /rerank endpoint in vllm bench serve (#26602) Maximilien de Bayser 2025-10-14 01:25:43 -03:00
29350922c6 [Feature][Quantization] auto_round format add support for regex (#24024) Heng Guo 2025-10-14 11:03:16 +08:00
8ae169286f [torch.compile] Unwrap fused_marlin_moe custom op (#26739) Varun Sundar Rabindranath 2025-10-13 22:22:16 -04:00
8a0af6a561 [build][torch.compile] upgrade depyf version (#26702) youkaichao 2025-10-14 10:12:09 +08:00
cfded80793 [Easy] Fix env type check errors from VLLM_DEBUG_LOG_API_SERVER_RESPONSE (#26742) Jialin Ouyang 2025-10-13 18:46:44 -07:00
b59dd19b55 [compile] Enable sequence parallelism for full cuda graph without specifying compile sizes (#26681) Angela Yi 2025-10-13 18:15:34 -07:00
3e051bda82 [UX] Replace VLLM_ALL2ALL_BACKEND with --all2all-backend (#26732) Michael Goin 2025-10-13 21:12:52 -04:00
8317f72354 [Misc][DP] support customized aggregated logger for dp (#24354) Lucia Fang 2025-10-13 17:45:59 -07:00
d8bebb008a Add tests for chunked prefill and prefix cache with causal pooling models (#26526) Maximilien de Bayser 2025-10-13 20:45:04 -03:00
35bc22f23c [ResponseAPI] Further polish message serialization and unit tests (#26728) Jialin Ouyang 2025-10-13 16:31:35 -07:00
fa96fb9c70 Pruning kernel Core Tests (#26727) Fardin Hoque 2025-10-13 16:08:18 -07:00
e3fdb627d9 [FrontEnd] UNREVERT CompilationConfig overhaul (#20283): deprecate use_inductor in favor of backend, simplify custom_ops (#26502) Morrison Turnansky 2025-10-13 18:47:16 -04:00
7200a21cd1 [Bug] Fix Assertion error DeepEP/csrc/kernels/intranode.cu:928: 'false and Unsupported type' (#26532) Wentao Ye 2025-10-13 18:26:37 -04:00
577c72a227 [CI Perf]Prune Tests in kernel/mamba (#26538) Fardin Hoque 2025-10-13 15:22:31 -07:00
314285d4f2 [CI] Fix mypy for vllm/distributed (#26593) Wentao Ye 2025-10-13 16:02:24 -04:00
d2a7938582 [Frontend][1/N] Improve all pooling task | Support FP16 Embedding Base64 (Still uses fp32 by default). (#26414) wang.yuqi 2025-10-14 03:06:43 +08:00
89342ce4c0 [Quantization] [Performance] Enable Marlin GEMM kernels for the calibration-free RTN-based quantization (#26051) Alex Kogan 2025-10-13 14:52:54 -04:00
f89f599395 [CI][Release][Arm64]: Build arm64 release for gpu arch 8.9 (#26698) Yibo Cai 2025-10-14 02:42:12 +08:00
e251e457c5 [Log] Optimize Startup Log (#26601) Wentao Ye 2025-10-13 14:06:57 -04:00
afc47e4de7 [Model] Use merge_by_field_config for MM models (M-N) (#26710) Cyrus Leung 2025-10-14 01:27:01 +08:00
e3b90c1ba2 [Bugfix][Speculative Decoding] Extend Eagle quantization config fix to llama_eagle.py (#26590) Rahul Tuli 2025-10-13 22:47:13 +05:30
134f70b3ed [Bugfix][Rocm] fix qr error when different inp shape (#25892) haoyangli-amd 2025-10-14 01:04:21 +08:00
a1b2d658ee [CI/Build] upgrade compressed-tensors to 0.12.2 to address LGPLv3 (#26501) Sangyeon Cho 2025-10-14 01:58:33 +09:00
5c7fe25491 [Misc] Separate prompt logging to debug (#26713) Aleksei Tsvetkov 2025-10-13 19:04:18 +03:00
53c9a7cee2 [P/D] [NixlConnector] kv load recovery integration (#26171) Will Eaton 2025-10-13 11:48:04 -04:00
0d21b9b51e [UX] Speedup DeepGEMM warmup with heuristics (#25619) Michael Goin 2025-10-13 10:59:27 -04:00
10214b6935 [FEATURE]: Use pydantic validation in multimodal.py config (#26629) Anand Roy 2025-10-13 20:26:59 +05:30
4a61950f4d [Hardware][CPU] Disable torch.compile for RISC-V to prevent APIError (#26693) ihb2032 2025-10-13 22:56:01 +08:00
3263799056 [unrevert] Add batch invariant kernel override for FlashInfer backend [2/n] (#26373) Bram Wasti 2025-10-13 07:24:53 -07:00
8e67b2557a [Bugfix] Fix out of bound index issue for Jina-embedding-v3 RoPE with cuda graph (#26687) Isotr0py 2025-10-13 18:21:48 +08:00
4073c82c4e [ResponseAPI] Simplify input/output message serialization (#26620) Jialin Ouyang 2025-10-13 02:59:15 -07:00
767c3ab869 [Model][0/N] Improve all pooling task | clean up (#25817) wang.yuqi 2025-10-13 16:44:50 +08:00
4f207c7174 Ignore large reformatting PRs in git blame (#26690) Harry Mellor 2025-10-13 09:20:47 +01:00
782505ed8e [Model] Add reasoning_parser and tool_parser for Ernie45 thinking (#25027) CSWYF3634076 2025-10-13 15:55:20 +08:00
98f30b8cba [Model] Fix Skywork R1V mlp (#26673) Jee Jee Li 2025-10-13 13:42:17 +08:00
3cd36660f7 docs: wrong command in structured_outputs README (#26677) yihong 2025-10-13 11:59:01 +08:00
46ad73955a [FIX] Throwing an exception when the model does not support pool tasks (#25840) (#25855) yyzxw 2025-10-13 11:56:21 +08:00
41f3884438 [Bugfix][Core]Fix block table out-of-range issue in priority scheduling (#26661) quanliu 2025-10-13 09:25:42 +08:00
60e419c1ee [Misc] cache result of disable_inplace (#26666) bnellnm 2025-10-12 20:17:50 -04:00
7ef6052804 [CI/Build] Add tool to build vllm-tpu wheel (#19165) Michael Goin 2025-10-12 18:25:40 -04:00
4fca1a1bd2 [easy] fix pre commit error on trunk (#26665) Huamin Li 2025-10-12 14:25:34 -07:00
a6049be73c [Models][Qwen3VL] Speedup fast_pos_embed_interpolate (#26647) Lukas Geiger 2025-10-12 18:20:07 +01:00
18ed7746ea [Feature] Add support for naver/splade-v3 (BERT-based sparse embedding model) (#26339) gjgjos 2025-10-13 02:00:52 +09:00
8fcaaf6a16 Update Optional[x] -> x | None and Union[x, y] to x | y (#26633) Harry Mellor 2025-10-12 17:51:31 +01:00
9bb38130cb [Bugfix] Fix GPU_ID issue in test script (#26442) Chendi.Xue 2025-10-12 06:39:05 -05:00
b91d8db873 [Bugfix][DCP] Set default CUDAGraphMode to PIECEWISE for DCP (#26574) Jaya Yuan 2025-10-12 17:58:38 +08:00
045b396d09 [Bugfix][CI/Build] Fix failing Mteb CI (#26638) Isotr0py 2025-10-12 17:42:42 +08:00
76852017ea [MISC] Rename the torch profiler filename as instance_id+rank_id for merging the Profiler results of each Rank (#25867) wang.yuqi 2025-10-12 17:29:08 +08:00
82e64c7a20 [PERF] [Qwen3-next] Speed up gated RMSNorm (#26207) Vadim Gimpelson 2025-10-12 12:27:50 +04:00
4ca204055e Add @noooop to codeowner for pooling models (#26652) wang.yuqi 2025-10-12 14:04:44 +08:00
c5c8f5ea59 [EPLB] Support ernie4.5-moe (#22100) Haisheng Chen 2025-10-11 19:40:47 -07:00
01653a917b [compile] Fix inductor partition config (#26645) Angela Yi 2025-10-11 14:03:14 -07:00
0cd103e7cb CP: make correct_attn_out robust to 4‑D views and fix Triton arg binding (#26509) Huamin Li 2025-10-11 13:50:57 -07:00
5be7ca1b99 [Benchmark] Support Infinity API (#26641) Cyrus Leung 2025-10-12 01:45:32 +08:00
f0a30a067b [Bugfix] Fix qwen-moe packed_modules_mapping (#26634) Jee Jee Li 2025-10-11 23:21:33 +08:00
9d6cff3ede [Bugfix][Qwen3VL] fix deepstack in qwen3vl (#26626) JJJYmmm 2025-10-11 20:58:33 +08:00
a25f2adee9 [compile] Add patched_fused_scaled_matmul_reduce_scatter (#26604) Angela Yi 2025-10-11 05:44:43 -07:00
d0bed837ac [Refactor]Reduce duplicate code in serving_chat (#26627) Chauncey 2025-10-11 20:04:49 +08:00
f7ee69868a [CPU] fix the issue when the node is '-' cause json decode error. (#26562) muzian666 2025-10-11 20:04:04 +08:00
d2a71530c1 Add EAGLE-3 Speculative Decoding Support for Qwen3 MoE (#26485) Rahul Tuli 2025-10-11 15:44:41 +05:30
086609de64 fix(nix): Allow local oneDNN path to fix vLLM CPU build failure (#26401) ihb2032 2025-10-11 17:12:16 +08:00
727144bed1 [Refactor]: Use M-RoPE interface directly while defining model class instead of maintaining model specific M-RoPE implementation in mrope.py (#24172) dsinghvi 2025-10-11 12:51:04 +05:30
55392bc879 [Bugfix][Multi Modal] Fix incorrect Molmo image processing (#26563) sangho.lee 2025-10-11 00:28:23 -05:00
ddaff2938e [MM] Move Qwen3Omni MRoPE impl to model file (#26608) Roger Wang 2025-10-10 22:17:24 -07:00
27ed39a347 [XPU] Upgrade NIXL to remove CUDA dependency (#26570) liuzhenwei 2025-10-11 13:15:23 +08:00
8f8474fbe3 [CI/Build] Fix ppc64le CPU build and tests (#22443) Nishidha Panpaliya 2025-10-11 10:34:42 +05:30
be067861c6 [Frontend] Improve the performance of is_reasoning_end (#25735) Chauncey 2025-10-11 10:43:39 +08:00
5bc26c438d [BugFix] Make penalties and bad_words work with async scheduling (#26467) Nick Hill 2025-10-10 16:27:04 -07:00
eef921f45e AOT Compilation for torch.compile (Bundled) (#24274) Zhengxu Chen 2025-10-10 19:02:11 -04:00
e317414ce1 Cache the environment variable check for batch invariance (#26510) Bram Wasti 2025-10-10 15:47:34 -07:00
949cb0170d [BugFix] Fix async scheduling + request preemption (#26385) Nick Hill 2025-10-10 13:29:57 -07:00
