Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

992e5c3d34 Merge similar examples in offline_inference into single basic example (#12737) Harry Mellor 2025-02-20 12:53:51 +00:00
b69692a2d8 [Kernel] LoRA - Refactor sgmv kernels (#13110) Varun Sundar Rabindranath 2025-02-20 17:58:06 +05:30
a64a84433d [2/n][ci] S3: Use full model path (#13564) Kevin H. Luu 2025-02-20 01:20:15 -08:00
aa1e62d0db [ci] Fix spec decode test (#13600) Kevin H. Luu 2025-02-20 00:56:00 -08:00
497bc83124 [CI/Build] Use uv in the Dockerfile (#13566) Michael Goin 2025-02-20 02:05:44 -05:00
3738e6fa80 [API Server] Add port number range validation (#13506) Yuan Tang 2025-02-20 02:05:13 -05:00
0023cd2b9d [ROCm] MI300A compile targets deprecation (#13560) Gregory Shtrasberg 2025-02-20 02:05:00 -05:00
041e294716 [Misc] add mm_processor_kwargs to extra_body for Qwen2.5-VL (#13533) 燃 2025-02-20 15:04:30 +08:00
9621667874 [Misc] Warn if the vLLM version can't be retrieved (#13501) Alex Brooks 2025-02-19 23:24:48 -07:00
8c755c3b6d [bugfix] spec decode worker get tp group only when initialized (#13578) Simon Mo 2025-02-19 20:46:28 -08:00
ba81163997 [core] add sleep and wake up endpoint and v1 support (#12987) youkaichao 2025-02-20 12:41:17 +08:00
0d243f2a54 [ROCm][MoE] mi300 mixtral8x7B perf for specific BS (#13577) Divakar Verma 2025-02-19 22:01:02 -06:00
88f6ba3281 [ci] Add AWS creds for AMD (#13572) Kevin H. Luu 2025-02-19 19:56:06 -08:00
512368e34a [Misc] Qwen2.5 VL support LoRA (#13261) Jee Jee Li 2025-02-20 10:37:55 +08:00
473f51cfd9 [3/n][CI] Load Quantization test models with S3 (#13570) Kevin H. Luu 2025-02-19 18:12:30 -08:00
a4c402a756 [BugFix] Avoid error traceback in logs when V1 LLM terminates (#13565) Nick Hill 2025-02-19 16:49:01 -08:00
550d97eb58 [Misc] Avoid calling unnecessary hf_list_repo_files for local model path (#13348) Isotr0py 2025-02-20 02:57:48 +08:00
fbbe1fbac6 [MISC] Logging the message about Ray teardown (#13502) Cody Yu 2025-02-19 09:40:50 -08:00
01c184b8f3 Fix copyright year to auto get current year (#13561) Wilson Wu 2025-02-20 00:55:34 +08:00
ad5a35c21b [doc] clarify multi-node serving doc (#13558) youkaichao 2025-02-19 22:32:17 +08:00
5ae9f26a5a [Bugfix] Fix device ordinal for multi-node spec decode (#13269) shangmingc 2025-02-19 22:13:15 +08:00
377d10bd14 [VLM][Bugfix] Pass processor kwargs properly on init (#13516) Cyrus Leung 2025-02-19 21:13:50 +08:00
52ce14d31f [doc] clarify profiling is only for developers (#13554) youkaichao 2025-02-19 20:55:58 +08:00
81dabf24a8 [CI/Build] force writing version file (#13544) Daniele 2025-02-19 11:48:03 +01:00
423330263b [Feature] Pluggable platform-specific scheduler (#13161) Yannick Schnider 2025-02-19 10:16:38 +01:00
caf7ff4456 [V1][Core] Generic mechanism for handling engine utility (#13060) Nick Hill 2025-02-19 01:09:22 -08:00
f525c0be8b [Model][Speculative Decoding] DeepSeek MTP spec decode (#12755) Lucia Fang 2025-02-19 01:06:23 -08:00
983a40a8bb [Bugfix] Fix Positive Feature Layers in Llava Models (#13514) Alex Brooks 2025-02-19 01:50:07 -07:00
fdc5df6f54 use device param in load_model method (#13037) Zhe Zhang 2025-02-19 16:05:02 +08:00
3b05cd4555 [perf-benchmark] Fix ECR path for premerge benchmark (#13512) Kevin H. Luu 2025-02-18 23:56:11 -08:00
d5d214ac7f [1/n][CI] Load models in CI from S3 instead of HF (#13205) Kevin H. Luu 2025-02-18 23:34:59 -08:00
fd84857f64 [Doc] Add clarification note regarding paligemma (#13511) Roger Wang 2025-02-18 22:24:03 -08:00
8aada19dfc [ROCm][MoE configs] mi325 mixtral & mi300 qwen_moe (#13503) Divakar Verma 2025-02-19 00:23:24 -06:00
9aa95b0e6a [perf-benchmark] Allow premerge ECR (#13509) Kevin H. Luu 2025-02-18 21:13:41 -08:00
d0a7a2769d [Hardware][Gaudi][Feature] Support Contiguous Cache Fetch (#12139) Yu-Zhou 2025-02-19 11:40:19 +08:00
00b69c2d27 [Misc] Remove dangling references to --use-v2-block-manager (#13492) Harry Mellor 2025-02-19 03:37:26 +00:00
4c82229898 [V1][Spec Decode] Optimize N-gram matching with Numba (#13365) Woosuk Kwon 2025-02-18 13:19:58 -08:00
c8d70e2437 Pin Ray version to 2.40.0 (#13490) Woosuk Kwon 2025-02-18 12:50:31 -08:00
30172b4947 [V1] Optimize handling of sampling metadata and req_ids list (#13244) Nick Hill 2025-02-18 12:15:33 -08:00
a4d577b379 [V1][Tests] Adding additional testing for multimodal models to V1 (#13308) Murali Andoorveedu 2025-02-18 09:53:14 -08:00
7b203b7694 [misc] fix debugging code (#13487) youkaichao 2025-02-19 01:37:11 +08:00
4fb8142a0e [V1][PP] Enable true PP with Ray executor (#13472) Woosuk Kwon 2025-02-18 09:15:32 -08:00
a02c86b4dd [CI/Build] migrate static project metadata from setup.py to pyproject.toml (#8772) Daniele 2025-02-18 17:02:49 +01:00
3809458456 [Bugfix] Fix invalid rotary embedding unit test (#13431) Liangfu Chen 2025-02-18 03:52:03 -08:00
d3231cb436 [Bugfix] Handle content type with optional parameters (#13383) zifeitong 2025-02-18 03:29:13 -08:00
435b502a6e [ROCm] Make amdsmi import optional for other platforms (#13460) Cyrus Leung 2025-02-18 19:15:56 +08:00
29fc5772c4 [Bugfix] Remove noisy error logging during local model loading (#13458) Isotr0py 2025-02-18 19:15:48 +08:00
2358ca527b [Doc]: Improve feature tables (#13224) Harry Mellor 2025-02-18 10:52:39 +00:00
8cf97f8661 [Bugfix] Fix failing transformers dynamic module resolving with spawn multiproc method (#13403) Isotr0py 2025-02-18 18:25:53 +08:00
e2603fefb8 [Bugfix] Ensure LoRA path from the request can be included in err msg (#13450) Yuan Tang 2025-02-18 03:19:15 -05:00
b53d79983c Add outlines fallback when JSON schema has enum (#13449) Michael Goin 2025-02-18 01:49:41 -05:00
9915912f7f [V1][PP] Fix & Pin Ray version in requirements-cuda.txt (#13436) Woosuk Kwon 2025-02-17 21:58:06 -08:00
d1b649f1ef [Quant] Aria SupportsQuant (#13416) Kyle Sayers 2025-02-18 00:51:09 -05:00
ac19b519ed [core] fix sleep mode in pytorch 2.6 (#13456) youkaichao 2025-02-18 13:48:10 +08:00
a1074b3efe [Bugfix] Only print out chat template when supplied (#13444) Yuan Tang 2025-02-18 00:43:31 -05:00
00294e1bc6 [Quant] Arctic SupportsQuant (#13366) Kyle Sayers 2025-02-18 00:35:09 -05:00
88787bce1d [Quant] Molmo SupportsQuant (#13336) Kyle Sayers 2025-02-18 00:34:47 -05:00
932b51cedd [v1] fix parallel config rank (#13445) youkaichao 2025-02-18 12:33:45 +08:00
7c7adf81fc [ROCm] fix get_device_name for rocm (#13438) Divakar Verma 2025-02-17 22:07:12 -06:00
67ef8f666a [Model] Enable quantization support for transformers backend (#12960) Isotr0py 2025-02-18 11:52:47 +08:00
efbe854448 [Misc] Remove dangling references to SamplingType.BEAM (#13402) Harry Mellor 2025-02-18 03:52:35 +00:00
b3942e157e [Bugfix][CI][V1] Work around V1 + CUDA Graph + torch._scaled_mm fallback issue (#13425) Tyler Michael Smith 2025-02-17 19:32:48 -05:00
cd4a72a28d [V1][Spec decode] Move drafter to model runner (#13363) Woosuk Kwon 2025-02-17 15:40:12 -08:00
6ac485a953 [V1][PP] Fix intermediate tensor values (#13417) Cody Yu 2025-02-17 13:37:45 -08:00
4c21ce9eba [V1] Get input tokens from scheduler (#13339) Woosuk Kwon 2025-02-17 11:01:07 -08:00
ce77eb9410 [Bugfix] Fix VLLM_USE_MODELSCOPE issue (#13384) r.4ntix 2025-02-17 22:22:01 +08:00
30513d1cb6 [Bugfix] fix xpu communicator (#13368) Yan Ma 2025-02-17 20:59:18 +08:00
1f69c4a892 [Model] Support Mamba2 (Codestral Mamba) (#9292) Tyler Michael Smith 2025-02-17 07:17:50 -05:00
7b623fca0b [VLM] Check required fields before initializing field config in DictEmbeddingItems (#13380) Cyrus Leung 2025-02-17 17:36:07 +08:00
238dfc8ac3 [MISC] tiny fixes (#13378) Mengqing Cao 2025-02-17 16:57:13 +08:00
45186834a0 Run v1 benchmark and integrate with PyTorch OSS benchmark database (#13068) Huy Do 2025-02-17 00:16:32 -08:00
f857311d13 Fix spelling error in index.md (#13369) yankooo 2025-02-17 14:53:20 +08:00
46cdd59577 [Feature][Spec Decode] Simplify the use of Eagle Spec Decode (#12304) shangmingc 2025-02-17 11:32:26 +08:00
2010f04c17 [V1][Misc] Avoid unnecessary log output (#13289) Jee Jee Li 2025-02-17 11:26:24 +08:00
69e1d23e1e [V1][BugFix] Clean up rejection sampler & Fix warning msg (#13362) Woosuk Kwon 2025-02-16 12:25:29 -08:00
d67cc21b78 [Bugfix][Platform][CPU] Fix cuda platform detection on CPU backend edge case (#13358) Isotr0py 2025-02-17 02:55:27 +08:00
e18227b04a [V1][PP] Cache Intermediate Tensors (#13353) Woosuk Kwon 2025-02-16 10:02:27 -08:00
7b89386553 [V1][BugFix] Add __init__.py to v1/spec_decode/ (#13359) Woosuk Kwon 2025-02-16 09:39:08 -08:00
da833b0aee [Docs] Change myenv to vllm. Update python_env_setup.inc.md (#13325) 凌 2025-02-17 00:04:21 +08:00
5d2965b7d7 [Bugfix] Fix 2 Node and Spec Decode tests (#13341) Cyrus Leung 2025-02-16 22:20:22 +08:00
a0231b7c25 [platform] add base class for communicators (#13208) youkaichao 2025-02-16 22:14:22 +08:00
124776ebd5 [ci] skip failed tests for flashinfer (#13352) youkaichao 2025-02-16 22:09:15 +08:00
b7d309860e [V1] Update doc and examples for H2O-VL (#13349) Roger Wang 2025-02-16 02:35:54 -08:00
dc0f7ccf8b [BugFix] Enhance test_pos_encoding to support execution on multi-devices (#13187) wchen61 2025-02-16 16:59:49 +08:00
d3d547e057 [Bugfix] Pin xgrammar to 0.1.11 (#13338) Michael Goin 2025-02-15 22:42:25 -05:00
12913d17ba [Quant] Add SupportsQuant to phi3 and clip (#13104) Kyle Sayers 2025-02-15 22:28:33 -05:00
80f63a3966 [V1][Spec Decode] Ngram Spec Decode (#12193) Lily Liu 2025-02-15 18:05:11 -08:00
367cb8ce8c [Doc] [2/N] Add Fuyu E2E example for multimodal processor (#13331) Cyrus Leung 2025-02-15 23:06:23 +08:00
54ed913f34 [ci/build] update flashinfer (#13323) youkaichao 2025-02-15 21:33:13 +08:00
9206b3d7ec [V1][PP] Run engine busy loop with batch queue (#13064) Cody Yu 2025-02-15 03:59:01 -08:00
ed0de3e4b8 [AMD] [Model] DeepSeek tunings (#13199) rasmith 2025-02-15 05:58:09 -06:00
2ad1bc7afe [V1][Metrics] Add iteration_tokens_total histogram from V0 (#13288) Mark McLoughlin 2025-02-15 11:56:19 +00:00
7fdaaf48ef [Bugfix] Fix qwen2.5-vl image processor (#13286) Isotr0py 2025-02-15 19:00:11 +08:00
067fa2255b [Bugfix]Fix search start_index of stop_checker (#13280) Xu Song 2025-02-15 13:39:42 +08:00
9076325677 [BugFix] Don't scan entire cache dir when loading model (#13302) Nick Hill 2025-02-14 21:33:31 -08:00
97a3d6d995 [Bugfix] Massage MLA's usage of flash attn for RoCM (#13310) Tyler Michael Smith 2025-02-15 00:33:25 -05:00
579d7a63b2 [Bugfix][Docs] Fix offline Whisper (#13274) Nicolò Lucchesi 2025-02-15 06:32:37 +01:00
c9f9d5b397 [Bugfix][AMD] Update torch_bindings so that scaled_fp4_quant isn't build on ROCm (#13235) Sage Moore 2025-02-14 20:30:42 -08:00
0c73026844 [V1][PP] Fix memory profiling in PP (#13315) Woosuk Kwon 2025-02-14 20:17:25 -08:00
6a854c7a2b [V1][Sampler] Don't apply temp for greedy-only (#13311) Nick Hill 2025-02-14 18:10:53 -08:00
