Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

324960a95c [TPU][CI] Update torchxla version in requirement-tpu.txt (#12422) Siyuan Liu 2025-01-24 23:23:03 -08:00
f1fc0510df [Misc] Add FA2 support to ViT MHA layer (#12355) Isotr0py 2025-01-25 15:07:35 +08:00
bf21481dde [ROCm][MoE] MI300 tuned configs Mixtral-8x(7B,22B) | fp16, fp8 (#12408) Divakar Verma 2025-01-24 22:17:19 -06:00
fb30ee92ee [Bugfix] Fix BLIP-2 processing (#12412) Cyrus Leung 2025-01-25 11:42:42 +08:00
221d388cc5 [Bugfix][Kernel] Fix moe align block issue for mixtral (#12413) ElizaWszola 2025-01-24 20:49:28 -05:00
3132a933b6 [Bugfix][Kernel] FA3 Fix - RuntimeError: This flash attention build only supports pack_gqa (for build size reasons). (#12405) Lucas Wilkinson 2025-01-24 15:20:59 -05:00
df5dafaa5b [Misc] Remove deprecated code (#12383) Cyrus Leung 2025-01-25 03:45:20 +08:00
ab5bbf5ae3 [Bugfix][Kernel] Fix CUDA 11.8 being broken by FA3 build (#12375) Lucas Wilkinson 2025-01-24 10:27:59 -05:00
3bb8e2c9a2 [Misc] Enable proxy support in benchmark script (#12356) Junichi Sato 2025-01-24 23:58:26 +09:00
e784c6b998 [ci/build] sync default value for wheel size (#12398) youkaichao 2025-01-24 17:54:29 +08:00
9a0f3bdbe5 [Hardware][Gaudi][Doc] Add missing step in setup instructions (#12382) Mohit Deopujari 2025-01-24 01:43:49 -08:00
c7c9851036 [ci/build] fix wheel size check (#12396) youkaichao 2025-01-24 17:31:25 +08:00
3c818bdb42 [Misc] Use VisionArena Dataset for VLM Benchmarking (#12389) Roger Wang 2025-01-24 00:22:04 -08:00
6dd94dbe94 [perf] fix perf regression from #12253 (#12380) youkaichao 2025-01-24 11:34:27 +08:00
0e74d797ce [V1] Increase default batch size for H100/H200 (#12369) Woosuk Kwon 2025-01-23 19:19:55 -08:00
55ef66edf4 Update compressed-tensors version (#12367) Dipika Sikka 2025-01-23 22:19:42 -05:00
5e5630a478 [Bugfix] Path join when building local path for S3 clone (#12353) omer-dayan 2025-01-24 05:06:07 +02:00
d3d6bb13fb Set weights_only=True when using torch.load() (#12366) Russell Bryant 2025-01-23 21:17:30 -05:00
24b0205f58 [V1][Frontend] Coalesce bunched RequestOutputs (#12298) Nick Hill 2025-01-23 17:17:41 -08:00
c5cffcd0cd [Docs] Update spec decode + structured output in compat matrix (#12373) Russell Bryant 2025-01-23 20:15:52 -05:00
682b55bc07 [Docs] Add meetup slides (#12345) Woosuk Kwon 2025-01-23 14:10:03 -08:00
9726ad676d [Misc] Fix OpenAI API Compatibility Issues in Benchmark Script (#12357) Junichi Sato 2025-01-24 07:02:13 +09:00
eb5cb5e528 [BugFix] Fix parameter names and process_after_weight_loading for W4A16 MoE Group Act Order (#11528) Dipika Sikka 2025-01-23 16:40:33 -05:00
2cbeedad09 [Docs] Document Phi-4 support (#12362) Isotr0py 2025-01-24 03:18:51 +08:00
2c85529bfc [TPU] Update TPU CI to use torchxla nightly on 20250122 (#12334) Siyuan Liu 2025-01-23 10:50:16 -08:00
e97f802b2d [FP8][Kernel] Dynamic kv cache scaling factors computation (#11906) Gregory Shtrasberg 2025-01-23 13:04:03 -05:00
6e650f56a1 [torch.compile] decouple compile sizes and cudagraph sizes (#12243) youkaichao 2025-01-24 02:01:30 +08:00
3f50c148fd [core] add wake_up doc and some sanity check (#12361) youkaichao 2025-01-24 02:00:50 +08:00
8c01b8022c [Bugfix] Fix broken internvl2 inference with v1 (#12360) Isotr0py 2025-01-24 01:20:33 +08:00
99d01a5e3d [V1] Simplify M-RoPE (#12352) Roger Wang 2025-01-23 07:13:23 -08:00
d07efb31c5 [Doc] Troubleshooting errors during model inspection (#12351) Cyrus Leung 2025-01-23 22:46:58 +08:00
978b45f399 [Kernel] Flash Attention 3 Support (#12093) Lucas Wilkinson 2025-01-23 09:45:48 -05:00
c5b4b11d7f [Bugfix] Fix k_proj's bias for whisper self attention (#12342) Isotr0py 2025-01-23 18:15:33 +08:00
8ae5ff2009 [Hardware][Gaudi][BugFix] Fix dataclass error due to triton package update (#12338) liuzhenwei 2025-01-23 16:35:46 +08:00
511627445e [doc] explain common errors around torch.compile (#12340) youkaichao 2025-01-23 14:56:02 +08:00
f0ef37233e [V1] Add uncache_blocks (#12333) Cody Yu 2025-01-22 20:19:21 -08:00
7551a34032 [Docs] Document vulnerability disclosure process (#12326) Russell Bryant 2025-01-22 22:44:09 -05:00
01a55941f5 [Docs] Update FP8 KV Cache documentation (#12238) Michael Goin 2025-01-22 22:18:09 -05:00
8d7aa9de71 [Bugfix] Fixing AMD LoRA CI test. (#12329) Alexei-V-Ivanov-AMD 2025-01-22 20:53:02 -06:00
68c4421b6d [AMD][Quantization] Add TritonScaledMMLinearKernel since int8 is broken for AMD (#12282) rasmith 2025-01-22 18:10:37 -06:00
aea94362c9 [Frontend][V1] Online serving performance improvements (#12287) Nick Hill 2025-01-22 14:22:12 -08:00
7206ce4ce1 [Core] Support reset_prefix_cache (#12284) Cody Yu 2025-01-22 10:52:27 -08:00
96f6a7596f [Bugfix] Fix HPU multiprocessing executor (#12167) Konrad Zawora 2025-01-22 19:07:07 +01:00
84bee4bd5c [Misc] Improve the readability of BNB error messages (#12320) Jee Jee Li 2025-01-23 00:56:54 +08:00
fc66dee76d [Misc] Fix the error in the tip for the --lora-modules parameter (#12319) Robin 2025-01-23 00:48:41 +08:00
6609cdf019 [Doc] Add docs for prompt replacement (#12318) Cyrus Leung 2025-01-22 22:56:29 +08:00
16366ee8bb [Bugfix][VLM] Fix mixed-modality inference backward compatibility for V0 (#12313) Roger Wang 2025-01-22 05:06:36 -08:00
528dbcac7d [Model][Bugfix]: correct Aria model output (#12309) zhou fan 2025-01-22 19:39:19 +08:00
cd7b6f0857 [VLM] Avoid unnecessary tokenization (#12310) Cyrus Leung 2025-01-22 19:08:31 +08:00
68ad4e3a8d [Core] Support fully transparent sleep mode (#11743) youkaichao 2025-01-22 14:39:32 +08:00
4004f144f3 [Build] update requirements of no-device (#12299) Mengqing Cao 2025-01-22 14:29:31 +08:00
66818e5b63 [core] separate builder init and builder prepare for each batch (#12253) youkaichao 2025-01-22 14:13:52 +08:00
222a9dc350 [Benchmark] More accurate TPOT calc in benchmark_serving.py (#12288) Nick Hill 2025-01-21 21:46:14 -08:00
cbdc4ad5a5 [Ci/Build] Fix mypy errors on main (#12296) Cyrus Leung 2025-01-22 12:06:54 +08:00
016e3676e7 [CI] add docker volume prune to neuron CI (#12291) Liangfu Chen 2025-01-21 18:47:49 -08:00
64ea24d0b3 [ci/lint] Add back default arg for pre-commit (#12279) Kevin H. Luu 2025-01-21 17:15:27 -08:00
df76e5af26 [VLM] Simplify post-processing of replacement info (#12269) Cyrus Leung 2025-01-22 08:48:13 +08:00
09ccc9c8f7 [Documentation][AMD] Add information about prebuilt ROCm vLLM docker for perf validation purpose (#12281) Hongxia Yang 2025-01-21 18:49:22 -05:00
69196a9bc7 [BUGFIX] When skip_tokenize_init and multistep are set, execution crashes (#12277) Aleksandr Malyshev 2025-01-21 15:30:46 -08:00
2acba47d9b [bugfix] moe tuning. rm is_navi() (#12273) Divakar Verma 2025-01-21 16:47:32 -06:00
9c485d9e25 [Core] Free CPU pinned memory on environment cleanup (#10477) Jani Monoses 2025-01-21 21:56:41 +02:00
fa9ee08121 [Misc] Set default backend to SDPA for get_vit_attn_backend (#12235) wangxiyuan 2025-01-22 03:52:11 +08:00
347eeebe3b [Misc] Remove experimental dep from tracing.py (#12007) Adrian Cole 2025-01-21 11:51:55 -08:00
18fd4a8331 [Bugfix] Multi-sequence broken (#11898) Andy Lo 2025-01-21 19:51:35 +00:00
132a132100 [v1][stats][1/n] Add RequestStatsUpdate and RequestStats types (#10907) Ricky Xu 2025-01-21 11:51:13 -08:00
1e60f87bb3 [Kernel] fix moe_align_block_size error condition (#12239) Jinzhen Lin 2025-01-22 02:30:28 +08:00
9705b90bcf [Bugfix] fix race condition that leads to wrong order of token returned (#10802) Jannis Schönleber 2025-01-21 18:47:04 +01:00
3aec49e56f [ci/build] update nightly torch for gh200 test (#12270) youkaichao 2025-01-21 23:03:17 +08:00
c64612802b [Platform] improve platforms getattr (#12264) Mengqing Cao 2025-01-21 22:42:41 +08:00
9a7c3a0042 Remove pytorch comments for outlines + compressed-tensors (#12260) Thomas Parnell 2025-01-21 14:49:08 +01:00
b197a5ccfd [V1][Bugfix] Fix data item ordering in mixed-modality inference (#12259) Roger Wang 2025-01-21 05:18:43 -08:00
c81081fece [torch.compile] transparent compilation with more logging (#12246) youkaichao 2025-01-21 19:32:55 +08:00
a94eee4456 [Bugfix] Fix mm_limits access for merged multi-modal processor (#12252) Cyrus Leung 2025-01-21 18:09:39 +08:00
f2e9f2a3be [Misc] Remove redundant TypeVar from base model (#12248) Cyrus Leung 2025-01-21 16:40:39 +08:00
1f1542afa9 [Misc]Add BNB quantization for PaliGemmaForConditionalGeneration (#12237) Jee Jee Li 2025-01-21 15:49:08 +08:00
96912550c8 [Misc] Rename MultiModalInputsV2 -> MultiModalInputs (#12244) Cyrus Leung 2025-01-21 15:31:19 +08:00
2fc6944c5e [ci/build] disable failed and flaky tests (#12240) youkaichao 2025-01-21 13:25:03 +08:00
5fe6bf29d6 [BugFix] Fix GGUF tp>1 when vocab_size is not divisible by 64 (#12230) Nicolò Lucchesi 2025-01-21 05:23:14 +01:00
d4b62d4641 [AMD][Build] Porting dockerfiles from the ROCm/vllm fork (#11777) Gregory Shtrasberg 2025-01-20 23:22:23 -05:00
ecf67814f1 Add quantization and guided decoding CODEOWNERS (#12228) Michael Goin 2025-01-20 20:23:40 -05:00
750f4cabfa [Kernel] optimize moe_align_block_size for cuda graph and large num_experts (e.g. DeepSeek-V3) (#12222) Jinzhen Lin 2025-01-21 08:42:16 +08:00
06a760d6e8 [bugfix] catch xgrammar unsupported array constraints (#12210) Cheng Kuan Yong Jason 2025-01-21 08:42:02 +08:00
da7512215f [misc] add cuda runtime version to usage data (#12190) youkaichao 2025-01-21 08:31:01 +08:00
af69a6aded fix: update platform detection for M-series arm based MacBook processors (#12227) Işık 2025-01-20 22:23:28 +00:00
7bd3630067 [Misc] Update CODEOWNERS (#12229) Roger Wang 2025-01-20 14:19:09 -08:00
96663699b2 [CI] Pass local python version explicitly to pre-commit mypy.sh (#12224) Chen Zhang 2025-01-20 23:49:18 +08:00
18572e3384 [Bugfix] Fix HfExampleModels.find_hf_info (#12223) Cyrus Leung 2025-01-20 23:35:36 +08:00
86bfb6dba7 [Misc] Pass attention to impl backend (#12218) wangxiyuan 2025-01-20 23:25:28 +08:00
5f0ec3935a [V1] Remove _get_cache_block_size (#12214) Chen Zhang 2025-01-20 21:54:16 +08:00
c222f47992 [core][bugfix] configure env var during import vllm (#12209) youkaichao 2025-01-20 19:35:59 +08:00
170eb35079 [misc] print a message to suggest how to bypass commit hooks (#12217) youkaichao 2025-01-20 18:06:24 +08:00
b37d82791e [Model] Upgrade Aria to transformers 4.48 (#12203) Cyrus Leung 2025-01-20 17:58:48 +08:00
3127e975fb [CI/Build] Make pre-commit faster (#12212) Cyrus Leung 2025-01-20 17:36:24 +08:00
4001ea1266 [CI/Build] Remove dummy CI steps (#12208) Cyrus Leung 2025-01-20 16:41:57 +08:00
5c89a29c22 [misc] add placeholder format.sh (#12206) youkaichao 2025-01-20 16:04:49 +08:00
59a0192fb9 [Core] Interface for accessing model from VllmRunner (#10353) Cyrus Leung 2025-01-20 15:00:59 +08:00
83609791d2 [Model] Add Qwen2 PRM model support (#12202) Isotr0py 2025-01-20 14:59:46 +08:00
0974c9bc5c [Bugfix] Fix incorrect types in LayerwiseProfileResults (#12196) Yuan Tang 2025-01-20 01:59:20 -05:00
d2643128f7 [DOC] Add missing docstring in LLMEngine.add_request() (#12195) Yuan Tang 2025-01-20 01:59:00 -05:00
c5c06209ec [DOC] Fix typo in docstring and assert message (#12194) Yuan Tang 2025-01-20 01:58:29 -05:00
