Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

16422ea76f [misc][plugin] add plugin system implementation (#7426) youkaichao 2024-08-13 16:24:17 -07:00
373538f973 [Misc] compressed-tensors code reuse (#7277) Kyle Sayers 2024-08-13 19:05:15 -04:00
33e5d7e6b6 [frontend] spawn engine process from api server process (#7484) youkaichao 2024-08-13 15:40:17 -07:00
c5c7768264 Announce NVIDIA Meetup (#7483) Simon Mo 2024-08-13 14:28:36 -07:00
b1e5afc3e7 [Misc] Update awq and awq_marlin to use vLLMParameters (#7422) Dipika Sikka 2024-08-13 17:08:20 -04:00
d3bdfd3ab9 [Misc] Update Fused MoE weight loading (#7334) Dipika Sikka 2024-08-13 14:57:45 -04:00
fb377d7e74 [Misc] Update gptq_marlin to use new vLLMParameters (#7281) Dipika Sikka 2024-08-13 14:30:11 -04:00
181abbc27d [Misc] Update LM Eval Tolerance (#7473) Dipika Sikka 2024-08-13 14:28:14 -04:00
00c3d68e45 [Frontend][Core] Add plumbing to support audio language models (#7446) Peter Salas 2024-08-13 10:39:33 -07:00
e20233d361 Revert "[Doc] Update supported_hardware.rst (#7276)" (#7467) Woosuk Kwon 2024-08-13 01:37:08 -07:00
d6e634f3d7 [TPU] Suppress import custom_ops warning (#7458) Woosuk Kwon 2024-08-13 00:30:30 -07:00
4d2dc5072b [hardware] unify usage of is_tpu to current_platform.is_tpu() (#7102) youkaichao 2024-08-13 00:16:42 -07:00
7025b11d94 [Bugfix] Fix weight loading for Chameleon when TP>1 (#7410) Cyrus Leung 2024-08-13 13:33:41 +08:00
5469146bcc [ci] Remove fast check cancel workflow (#7455) Kevin H. Luu 2024-08-12 21:19:51 -07:00
97a6be95ba [Misc] improve logits processors logging message (#7435) Andrew Wang 2024-08-12 19:29:34 -07:00
9ba85bc152 [mypy] Misc. typing improvements (#7417) Cyrus Leung 2024-08-13 09:20:20 +08:00
198d6a2898 [Core] Shut down aDAG workers with clean async llm engine exit (#7224) Rui Qiao 2024-08-12 17:57:16 -07:00
774cd1d3bf [CI/Build] bump minimum cmake version (#6999) Daniele 2024-08-13 01:29:20 +02:00
91294d56e1 [Bugfix] Handle PackageNotFoundError when checking for xpu version (#7398) sasha0552 2024-08-12 23:07:20 +00:00
a046f86397 [Core/Bugfix] Add FP8 K/V Scale and dtype conversion for prefix/prefill Triton Kernel (#7208) jon-chuang 2024-08-12 15:47:41 -07:00
4ddc4743d7 [Core] Consolidate GB constant and enable float GB arguments (#7416) Cyrus Leung 2024-08-13 05:14:14 +08:00
6aa33cb2dd [Misc] Use scalar type to dispatch to different gptq_marlin kernels (#7323) Lucas Wilkinson 2024-08-12 14:40:13 -04:00
1137f343aa [ci] Cancel fastcheck when PR is ready (#7433) Kevin H. Luu 2024-08-12 10:59:14 -07:00
9b3e2edd30 [ci] Cancel fastcheck run when PR is marked ready (#7427) Kevin H. Luu 2024-08-12 10:56:52 -07:00
65950e8f58 [ci] Entrypoints run upon changes in vllm/ (#7423) Kevin H. Luu 2024-08-12 10:18:03 -07:00
cfba4def5d [Bugfix] Fix logit soft cap in flash-attn backend (#7425) Woosuk Kwon 2024-08-12 09:58:28 -07:00
d2bc4510a4 [CI/Build] bump Dockerfile.neuron image base, use public ECR (#6832) Daniele 2024-08-12 18:53:35 +02:00
24154f8618 [Frontend] Disallow passing model as both argument and option (#7347) Cyrus Leung 2024-08-12 20:58:34 +08:00
e6e42e4b17 [Core][VLM] Support image embeddings as input (#6613) Roger Wang 2024-08-12 01:16:06 -07:00
ec2affa8ae [Kernel] Flashinfer correctness fix for v0.1.3 (#7319) Lily Liu 2024-08-12 00:59:17 -07:00
86ab567bae [CI/Build] Minor refactoring for vLLM assets (#7407) Roger Wang 2024-08-11 19:41:52 -07:00
f020a6297e [Docs] Update readme (#7316) Simon Mo 2024-08-11 17:13:37 -07:00
6c8e595710 [misc] add commit id in collect env (#7405) youkaichao 2024-08-11 15:40:48 -07:00
02b1988b9f [Doc] building vLLM with VLLM_TARGET_DEVICE=empty (#7403) tomeras91 2024-08-12 00:38:17 +03:00
386087970a [CI/Build] build on empty device for better dev experience (#4773) tomeras91 2024-08-11 23:09:44 +03:00
c08e2b3086 [core] [2/N] refactor worker_base input preparation for multi-step (#7387) William Lin 2024-08-11 08:50:08 -07:00
4fb7b52a2c Updating LM Format Enforcer version to v0.10.6 (#7189) Noam Gat 2024-08-11 15:11:50 +03:00
90bab18f24 [TPU] Use mark_dynamic to reduce compilation time (#7340) Woosuk Kwon 2024-08-10 18:12:22 -07:00
4c5d8e8ea9 [Bugfix] Fix phi3v batch inference when images have different aspect ratio (#7392) Isotr0py 2024-08-11 00:19:33 +08:00
baa240252e [Core] Fix edge case in chunked prefill + block manager v2 (#7380) Cade Daniel 2024-08-09 16:48:49 -07:00
999ef0b917 [Misc] Add numpy implementation of compute_slot_mapping (#7377) Antoni Baum 2024-08-09 15:52:29 -07:00
5c6c54d67a [Bugfix] Fix PerTensorScaleParameter weight loading for fused models (#7376) Dipika Sikka 2024-08-09 17:23:46 -04:00
933790c209 [Core] Add span metrics for model_forward, scheduler and sampler time (#7089) Mahesh Keralapura 2024-08-09 13:55:13 -07:00
70d268a399 [Bugfix] Fix ITL recording in serving benchmark (#7372) Roger Wang 2024-08-09 10:00:00 -07:00
249b88228d [Frontend] Support embeddings in the run_batch API (#7132) Pooya Davoodi 2024-08-09 09:48:21 -07:00
74af2bbd90 [Bugfix] Fix reinit procedure in ModelInputForGPUBuilder (#7360) Alexander Matveev 2024-08-09 12:35:49 -04:00
fc7b8d1eef [Performance] e2e overheads reduction: Small followup diff (#7364) Alexander Matveev 2024-08-09 11:49:36 -04:00
67abdbb42f [VLM][Doc] Add stop_token_ids to InternVL example (#7354) Isotr0py 2024-08-09 22:51:04 +08:00
07ab160741 [Model][Jamba] Mamba cache single buffer (#6739) Mor Zusman 2024-08-09 17:07:06 +03:00
b4e9528f95 [Core] Streamline stream termination in AsyncLLMEngine (#7336) Nick Hill 2024-08-09 00:06:36 -07:00
57b7be0e1c [Speculative decoding] [Multi-Step] decouple should_modify_greedy_probs_inplace (#6971) William Lin 2024-08-08 22:42:45 -07:00
99b4cf5f23 [Bugfix] Fix speculative decoding with MLPSpeculator with padded vocabulary (#7218) Travis Johnson 2024-08-08 23:08:46 -06:00
e02ac55617 [Performance] Optimize e2e overheads: Reduce python allocations (#7162) Alexander Matveev 2024-08-09 00:34:28 -04:00
73388c07a4 [TPU] Fix dockerfile.tpu (#7331) Woosuk Kwon 2024-08-08 20:24:58 -07:00
7eb4a51c5f [Core] Support serving encoder/decoder models (#7258) Cyrus Leung 2024-08-09 10:39:41 +08:00
0fa14907da [TPU] Add Load-time W8A16 quantization for TPU Backend (#7005) Siyuan Liu 2024-08-08 18:35:49 -07:00
5923532e15 Add Skywork AI as Sponsor (#7314) Simon Mo 2024-08-08 13:59:57 -07:00
a049b107e2 [Misc] Temporarily resolve the error of BitAndBytes (#7308) Jee Jee Li 2024-08-09 04:42:58 +08:00
8334c39f37 [Bugfix] Fix new Llama3.1 GGUF model loading (#7269) Isotr0py 2024-08-09 04:42:44 +08:00
e904576743 [CI/Build] Dockerfile.cpu improvements (#7298) Daniele 2024-08-08 21:24:52 +02:00
e14fb22e59 [Doc] Put collect_env issue output in a <detail> block (#7310) Michael Goin 2024-08-08 14:22:49 -04:00
782e53ab59 [Bugfix][fast] Fix the get_num_blocks_touched logic (#6849) Zach Zheng 2024-08-08 10:43:30 -07:00
21b9c49aa3 [Frontend] Kill the server on engine death (#6594) Joe Runde 2024-08-08 10:47:48 -06:00
5fb4a3f678 [Bugfix][Kernel] Increased atol to fix failing tests (#7305) Luka Govedič 2024-08-08 12:16:13 -04:00
757ac70a64 [Model] Rename MiniCPMVQwen2 to MiniCPMV2.6 (#7273) Jee Jee Li 2024-08-08 22:02:41 +08:00
6dffa4b0a6 [Bugfix] Fix LoRA with PP (#7292) Murali Andoorveedu 2024-08-08 00:02:27 -07:00
48abee9e54 [Frontend] remove max_num_batched_tokens limit for lora (#7288) Cherilyn Buren 2024-08-08 14:17:29 +08:00
746709642c [Misc] Fix typos in scheduler.py (#7285) Rui Qiao 2024-08-07 17:06:01 -07:00
e53dfd3eaf [Kernel] Fix Flashinfer Correctness (#7284) Lily Liu 2024-08-07 16:26:52 -07:00
6d94420246 [Doc] Update supported_hardware.rst (#7276) Michael Goin 2024-08-07 17:21:50 -04:00
fc1493a01e [FrontEnd] Make merge_async_iterators is_cancelled arg optional (#7282) Nick Hill 2024-08-07 13:35:14 -07:00
311f743831 [Bugfix] Fix gptq failure on T4s (#7264) Lucas Wilkinson 2024-08-07 16:05:37 -04:00
469b3bc538 [ci] Make building wheels per commit optional (#7278) Kevin H. Luu 2024-08-07 11:34:25 -07:00
5223199e03 [Bugfix][FP8] Fix dynamic FP8 Marlin quantization (#7219) Michael Goin 2024-08-07 14:23:12 -04:00
fde47d3bc2 [BugFix] Fix frontend multiprocessing hang (#7217) Maximilien de Bayser 2024-08-07 15:09:36 -03:00
0e12cd67a8 [Doc] add online speculative decoding example (#7243) Stas Bekman 2024-08-07 09:58:02 -07:00
80cbe10c59 [OpenVINO] migrate to latest dependencies versions (#7251) Ilya Lavrenov 2024-08-07 20:49:10 +04:00
b764547616 [Bugfix] Fix input processor for InternVL2 model (#7164) Isotr0py 2024-08-08 00:32:07 +08:00
ab0f5e2823 Fixes typo in function name (#7275) Rafael Vasquez 2024-08-07 12:29:27 -04:00
564985729a [ BugFix ] Move zmq frontend to IPC instead of TCP (#7222) Robert Shaw 2024-08-07 12:24:56 -04:00
0f7052bc7e [Misc] Refactor linear layer weight loading; introduce BasevLLMParameter and weight_loader_v2 (#5874) Dipika Sikka 2024-08-07 12:17:58 -04:00
639159b2a6 [distributed][misc] add specialized method for cuda platform (#7249) youkaichao 2024-08-07 08:54:52 -07:00
66d617e343 [Frontend] Gracefully handle missing chat template and fix CI failure (#7238) Cyrus Leung 2024-08-07 17:12:05 +08:00
7b261092de [BUGFIX]: top_k is expected to be an integer. (#7227) Atilla Akkuş 2024-08-07 10:32:16 +03:00
2385c8f374 [Doc] Mock new dependencies for documentation (#7245) Roger Wang 2024-08-06 23:43:03 -07:00
9a3f49ae07 [BugFix] Overhaul async request cancellation (#7111) Nick Hill 2024-08-06 22:21:41 -07:00
f9a5600649 [Bugfix] Fix GPTQ and GPTQ Marlin CPU Offloading (#7225) Michael Goin 2024-08-06 21:34:26 -04:00
fd95e026e0 [Core] Subclass ModelRunner to support cross-attention & encoder sequences (towards eventual encoder/decoder model support) (#4942) afeldman-nm 2024-08-06 16:51:47 -04:00
660470e5a3 [Core] Optimize evictor-v2 performance (#7193) xiaobochen123 2024-08-07 03:34:25 +08:00
8d59dbb000 [Kernel] Add per-tensor and per-token AZP epilogues (#5941) Luka Govedič 2024-08-06 14:17:08 -04:00
5c60c8c423 [SpecDecode] [Minor] Fix spec decode sampler tests (#7183) Lily Liu 2024-08-06 10:40:32 -07:00
00afc78590 [Bugfix] add gguf dependency (#7198) Katarzyna Papis 2024-08-06 19:08:35 +02:00
541c1852d3 [ BugFix ] Fix ZMQ when VLLM_PORT is set (#7205) Robert Shaw 2024-08-06 12:26:26 -04:00
a3bbbfa1d8 [BugFix] Fix DeepSeek remote code (#7178) Dipika Sikka 2024-08-06 11:16:53 -04:00
1f26efbb3a [Model] Support SigLIP encoder and alternative decoders for LLaVA models (#7153) Cyrus Leung 2024-08-06 16:55:31 +08:00
9118217f58 [LoRA] Relax LoRA condition (#7146) Jee Jee Li 2024-08-06 09:57:25 +08:00
e3c664bfcb [Build] Add initial conditional testing spec (#6841) Simon Mo 2024-08-05 17:39:22 -07:00
360bd67cf0 [Core] Support loading GGUF model (#5191) Isotr0py 2024-08-06 07:54:23 +08:00
ef527be06c [MISC] Use non-blocking transfer in prepare_input (#7172) Cody Yu 2024-08-05 16:41:27 -07:00
89b8db6bb2 [Bugfix] Specify device when loading LoRA and embedding tensors (#7129) Jacob Schein 2024-08-05 16:35:47 -07:00
