Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

811ac13d03 [Core] Factor out common logic for MM budget calculation (#22228) Cyrus Leung 2025-08-05 14:54:55 +08:00
e79a12fc3a [UX] Fail if an invalid attention backend is specified (#22217) Michael Goin 2025-08-05 02:54:52 -04:00
cdfd6871a5 [Bugfix] Misaligned params in TreeAttentionImpl (#22226) Cyrus Leung 2025-08-05 13:40:09 +08:00
4b3e4474d7 Optimize configuration access with LRU cache in custom ops (#22204) ZiTian.Zhao 2025-08-05 12:43:24 +08:00
bd3db7f469 [Misc] log more detailed message for ensure_model_parallel_initialized (#22144) Ning Xie 2025-08-05 10:36:55 +08:00
29b97c0995 [Doc] add backend to doc string of initialize_model_parallel (#22142) Ning Xie 2025-08-05 10:36:20 +08:00
7b455cf1c0 [Misc] Remove pass_config from CompilationConfig dump_json excluded (#21911) elvischenv 2025-08-05 10:17:18 +08:00
8a6e108e76 fix: kimi_k2 return empty tool call list (#22149) tlipoca9 2025-08-05 10:15:31 +08:00
d7b28f3415 [Log] DeepGEMM Update Log for Unaligned Problem Size (#22208) Wentao Ye 2025-08-04 22:13:19 -04:00
6fa41e0c32 self.gate dtype update for GLM-4.5 (#22203) Yuxuan Zhang 2025-08-05 10:12:38 +08:00
031ca762d7 [ROCm][Bugfix] Compilation passes fix (#22202) Gregory Shtrasberg 2025-08-04 22:12:28 -04:00
6ad6b8e115 [FEAT] Refactor ROPE into module (#22192) TJian 2025-08-04 19:12:16 -07:00
f4f4e7ef27 [V0 deprecation][P/D] Deprecate v0 KVConnectorBase code (1/2) (#21785) lkchen 2025-08-04 19:11:33 -07:00
5ea71ff46f [V1] reduce block size for tree attention correctness test to fix 'ou… (#22207) Giancarlo Delfin 2025-08-04 19:11:06 -07:00
7175817637 Revert "[Bugfix] V1 Fix the cursor leakage issue during request scheduling." (#22223) Woosuk Kwon 2025-08-04 18:37:06 -07:00
2dffac464c [Bugfix] V1 Fix the cursor leakage issue during request scheduling. (#21173) PiteXChen 2025-08-05 09:34:10 +08:00
bdcb42e45d [NVIDIA] Auto detect modelopt quant and fix DSR1-FP4 weight loading (#22073) Po-Han Huang (NVIDIA) 2025-08-05 09:02:55 +08:00
c09efff976 [Bugfix][V1][P/D]Fix the uneven polling issue in the toy proxy for P2pNcclConnector (#21819) Zhonghua Deng 2025-08-05 04:17:05 +08:00
309c1bb822 [Bug] Update auto_tune.sh to separate benchmarking and profiling. (#21629) ericehanley 2025-08-04 10:12:06 -05:00
9af654cc38 [Responses API] Ignore store=True and process the request by default (#22185) Woosuk Kwon 2025-08-04 05:12:48 -07:00
a5fff3bd49 Fix Arcee model weight loading: Add custom load_weights (#21725) Raghav Ravishankar 2025-08-04 16:39:56 +05:30
1539ced93a [Doc] Update pooling model docs (#22186) Cyrus Leung 2025-08-04 18:37:06 +08:00
54de71d0df [Sampler] Support returning all logprobs or logits (#21792) 22quinn 2025-08-04 03:04:12 -07:00
fed5849d3f [Bugfix] Fix failing GGUF models test (#22174) Isotr0py 2025-08-04 16:27:02 +08:00
c1b4eb048a [feat] move WEIGHT_SCALE_SUPPORTED into raise block to accelerate RLHF weight loading (#21164) Weixiao Huang 2025-08-04 15:43:06 +08:00
a7b8788d2c [Misc] Modify the organization of GLM series (#22171) Jee Jee Li 2025-08-04 14:51:20 +08:00
8ecb3e9e93 [CI Bugfix] Fix wNa16 kernel not found for test_shared_storage_connector_hashes (#22163) Tyler Michael Smith 2025-08-04 01:19:04 -04:00
e5949e5ae0 Remove index_put from MM embeddings merging (#22105) Chenxi Yang 2025-08-03 22:15:14 -07:00
49bcd893e7 [refactor] improve ConstantList exception specificity (#22156) ZiTian.Zhao 2025-08-04 13:14:49 +08:00
aa7012eb6d Add tree attention backend for v1 (part 1) (#20401) Giancarlo Delfin 2025-08-03 22:13:26 -07:00
c2e75b3c11 remove duplicate code within cleanup_dist_env_and_memory (#22147) Ning Xie 2025-08-04 11:03:58 +08:00
0d7db16a92 [PD] add test for chat completions endpoint (#21925) Abirdcfly 2025-08-04 10:57:03 +08:00
845420ac2c [RLHF] Fix torch.dtype not serializable in example (#22158) 22quinn 2025-08-03 19:43:33 -07:00
e27d25a0dc [fix] fix correct assertion syntax error in attention utils. (#22154) ZiTian.Zhao 2025-08-04 10:24:02 +08:00
6f5478298d Use aiohttp connection pool for benchmarking (#21981) Seiji Eicher 2025-08-03 19:23:32 -07:00
6a39ba85fe [Bugfix] Fix failing multimodal standard test (#22153) Isotr0py 2025-08-04 03:04:38 +08:00
d3c18c9cb0 fuse fp32 for GLM-4.5 e_score_correction_bias (#22143) Yuxuan Zhang 2025-08-04 00:04:54 +08:00
83f7bbb318 Add chat doc in quick start (#21213) TankNee 2025-08-03 22:47:55 +08:00
b5dfb94fa0 [CI/Build][Bugfix] Fix Qwen2.5 tests in CPU CI via fallback silu_and_mul to torch native implementation (#22145) Li, Jiang 2025-08-03 20:34:04 +08:00
6d98843b31 [Responses API] Disable response store by default (#22137) Woosuk Kwon 2025-08-03 04:04:21 -07:00
aefeea0fde [V1] [P/D] Refactor KV Connector Path (#21980) David Ben-David 2025-08-03 14:03:40 +03:00
24d1dffbeb [executor] feat: add supports_pp attr to executors (#21786) H 2025-08-03 03:04:45 -07:00
7de45db9a5 [Misc] update doc comment for send (#22026) Ning Xie 2025-08-03 15:55:20 +08:00
789562c28c Support CUTLASS NVFP4 (w4a4) for Blackwell Geforce GPUs (SM120) (#21309) Roberto L. Castro 2025-08-03 09:54:22 +02:00
3f36c325fa [Benchmark] Support ready check timeout in vllm bench serve (#21696) Ye (Charlotte) Qi 2025-08-03 00:52:38 -07:00
3dddbf1f25 [Misc] Add tensor schema test coverage for multimodal models (#21754) Isotr0py 2025-08-03 15:52:14 +08:00
337eb23bcc [Fix] Fix llama4 modelopt weight loading error (#22107) jiahanc 2025-08-03 00:50:34 -07:00
2ff46b8826 [Misc] Bump ray to 2.48.0 (#22123) Rui Qiao 2025-08-02 19:42:00 -07:00
554df8a6a2 Revert "[compile][startup] Disable C++ compilation of symbolic shapes" (#22122) Xiao 2025-08-02 09:03:30 -07:00
73e1b9b1d4 [xpu]support moe models on XPU platform (#21643) Yan Ma 2025-08-02 22:49:08 +08:00
4abfd8796f [V1] [Hybrid] Validate compatibility of attention backend batch reordering at init time (#21557) Thomas Parnell 2025-08-02 14:29:40 +02:00
f5d0f4784f [Frontend] Improve error message for too many mm items (#22114) Cyrus Leung 2025-08-02 17:20:38 +08:00
b690e34824 [Model] Mamba2 preallocate SSM output tensor to avoid d2d copy overhead (#21075) Chih-Chieh Yang 2025-08-02 04:59:34 -04:00
25373b6c6c for glm-4.1V update (#22000) Yuxuan Zhang 2025-08-02 16:46:57 +08:00
58eee5f2e0 [PERF] Use faster way of decode in tokenizer: avoid useless list-to-list conversion (#20000) Vadim Gimpelson 2025-08-02 12:43:52 +04:00
067c34a155 docs: remove deprecated disable-log-requests flag (#22113) Roger Wang 2025-08-02 00:19:48 -07:00
c64861d63c [Bugfix] Mamba2 remove bugged initial state condition in chunk scan (#22034) Chih-Chieh Yang 2025-08-02 02:55:57 -04:00
8564dc9448 Fix test_kv_sharing_fast_prefill flakiness (#22038) Yong Hoon Shin 2025-08-01 23:55:34 -07:00
4ac8437352 [Misc] Getting and passing ray runtime_env to workers (#22040) Rui Qiao 2025-08-01 23:54:40 -07:00
d3a6f2120b [FEAT][ROCm] Enable running Flash Attention as ViT attn backend for Qwen-VL models on ROCm platform. (#22069) vllmellm 2025-08-02 14:53:18 +08:00
0edaf752d7 [Attention][DBO] Add support for "splitting" the CommonAttentionMetadata (#21153) Sage Moore 2025-08-01 19:47:53 -07:00
6e8d8c4afb [Test] Add Unit Test for Batched DeepGEMM (#21559) Wentao Ye 2025-08-01 22:45:46 -04:00
8d524ce79f [BugFix] Improve internal DP load balancing (#21617) Nick Hill 2025-08-02 03:45:27 +01:00
9f9c38c392 [Speculators][Speculative Decoding] Add Qwen Eagle3 Support (#21835) Dipika Sikka 2025-08-01 22:43:37 -04:00
a65f46be5e [Misc] DeepGemmExperts : Avoid JIT generation in the hot-path (#21955) Varun Sundar Rabindranath 2025-08-02 08:12:03 +05:30
57393715e8 [Misc] VLLM_TARGET_DEVICE.lower() (#22101) Nicolò Lucchesi 2025-08-02 04:41:40 +02:00
ee2eb6ecd8 [Model] Qwen2.5 VL SiLU-and-Mul (#22066) vllmellm 2025-08-02 10:34:37 +08:00
23322431c8 [V1][CUDA] Full cudagraph support for FlashInfer (#21367) fhl2000 2025-08-02 09:49:34 +08:00
3654847db5 feat: Add Support GPTQ Quantization MOE on ROCM vllm serve (#21733) JartX 2025-08-02 03:12:19 +02:00
eefbf4a68b [Perf] Optimize reshape_and_cache_flash CUDA Kernel (#22036) Wentao Ye 2025-08-01 19:18:51 -04:00
88faa466d7 [CI] Initial tests for SM100 Blackwell runner (#21877) Michael Goin 2025-08-01 19:18:38 -04:00
881e1af43a [BugFix] Harden distributed DP startup (#21538) Nick Hill 2025-08-01 22:40:45 +01:00
d84b97a3e3 Add lora test for tp>1 case for TPU. (#21970) XiongfeiWei 2025-08-01 11:56:08 -07:00
d331759488 Introduce RayPPCommunicator for ray-based PP (#21660) Rui Qiao 2025-08-01 11:50:58 -07:00
9659bc7f27 [compile][startup] Disable C++ compilation of symbolic shapes (#20836) Animesh Jain 2025-08-01 10:38:52 -07:00
3277e8f9e1 Fix pre-commit failure for SECURTIY.md (#22102) Michael Goin 2025-08-01 13:36:07 -04:00
8d705996df [Misc] Minor enhancement of benchmark_moe (#22068) Jee Jee Li 2025-08-02 01:35:30 +08:00
38c8bce8b6 Enable headless models for pooling in the Transformers backend (#21767) Harry Mellor 2025-08-01 18:31:29 +01:00
ac45c44d98 [Bugfix] [Performance] DeepEPHighThroughput + DeepSeek : Quant before Dispatch (#21837) Varun Sundar Rabindranath 2025-08-01 22:44:38 +05:30
d6664664b4 security policy: take 1 (#21119) Huzaifa Sidhpurwala 2025-08-01 21:09:49 +04:00
b879ecd6e2 [Bugfix] fix when skip tokenizer init (#21922) rongfu.leng 2025-08-02 01:09:36 +08:00
3f8e952179 [Bugfix] Fix glm4.1v video inference issue (#22067) Isotr0py 2025-08-02 00:33:30 +08:00
326a1b001d Improve documentation of ModelConfig.try_get_generation_config to prevent future confusion (#21526) Harry Mellor 2025-08-01 17:32:27 +01:00
2d7b09b998 Deprecate --disable-log-requests and replace with --enable-log-requests (#21739) Harry Mellor 2025-08-01 17:16:37 +01:00
97608dc276 [Docs] use uv in CPU installation docs (#22089) David Xia 2025-08-01 10:55:55 -04:00
3146519add [BugFix] Don't change title of top-level process (#22032) Nick Hill 2025-08-01 15:37:55 +01:00
8026a335a1 [BugFix] Update AttnFusionPass cache key (#21947) Richard Zou 2025-08-01 10:11:29 -04:00
a59cd9d9f7 [Refactor] Fix Compile Warning #1444-D (#21462) Wentao Ye 2025-08-01 09:10:30 -04:00
5c54d9759d [Bugfix][PD] set max_completion_tokens=1 if req has this value (#21841) Abirdcfly 2025-08-01 21:08:45 +08:00
0a6d305e0f feat(multimodal): Add customizable background color for RGBA to RGB conversion (#22052) Gamhang 2025-08-01 21:07:33 +08:00
f81c1bb055 [Bugfix] Check NVIDIA artifactory is accessible before using flashinfer cubin kernels (#21893) Michael Goin 2025-08-01 08:28:45 -04:00
fb0e0d46fc Fix get_kwargs for case where type hint is list[Union[str, type]] (#22016) Harry Mellor 2025-08-01 13:26:42 +01:00
26b5f7bd2a [BUG] [ROCm] Fix import bug on ROCm (#22083) TJian 2025-08-01 05:25:20 -07:00
dfbc1f8880 [Speculative Decoding] Add speculators config support (#21345) Dipika Sikka 2025-08-01 08:25:18 -04:00
87c94bc879 Revert "Update sampling_metadata.py (#21937)" (#22088) Harry Mellor 2025-08-01 13:24:46 +01:00
28b18cc741 [Quantization] Enable BNB support for InternS1 (#21953) Jee Jee Li 2025-08-01 19:09:54 +08:00
4931486988 [Doc] Added warning of speculating with draft model (#22047) WeiQing Chen 2025-08-01 17:11:56 +08:00
0f81b310db [Misc] Remove upper bound in openai package version (#22060) Woosuk Kwon 2025-08-01 02:11:40 -07:00
e6680f9e25 [Bugfix] Add log prefix in non-dp mode engine core (#21889) wuhang 2025-08-01 17:04:16 +08:00
27a145e893 [Doc] Add example for Step3-VL (#22061) Roger Wang 2025-08-01 01:35:49 -07:00