Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

635b897246 [distributed] remove pynccl's redundant stream (#11744) cennn 2025-01-05 23:09:11 +08:00
4068f4b5b5 [MISC] Replace c10::optional with std::optional (#11730) Lu Fang 2025-01-04 17:20:34 -08:00
47831430cc [Bugfix][V1] Fix test_kv_cache_utils.py (#11738) Jee Jee Li 2025-01-05 00:07:59 +08:00
65c08928c2 [Model] Remove unnecessary weight initialization logic (#11736) Cyrus Leung 2025-01-04 23:46:21 +08:00
ba214dffbe [Bugfix] Fix precision error in LLaVA-NeXT (#11735) Cyrus Leung 2025-01-04 23:45:57 +08:00
eed11ebee9 [VLM] Merged multi-modal processors for LLaVA-NeXT-Video and LLaVA-OneVision (#11717) Cyrus Leung 2025-01-04 19:40:53 +08:00
300acb8347 [Core][Bugfix] Use correct device to initialize GPU data during CUDA-graph-capture (#11233) Yan Burman 2025-01-04 08:50:16 +02:00
d91457d529 [V1] Add kv cache utils tests. (#11513) xcnick 2025-01-04 14:49:46 +08:00
fbf2564554 [V1] Add RayExecutor support for AsyncLLM (api server) (#11712) Kunshang Ji 2025-01-04 14:41:31 +08:00
d1d49397e7 Update bnb.md with example for OpenAI (#11718) Alberto Ferrer 2025-01-04 00:29:02 -06:00
9c93636d84 Update tool_calling.md (#11701) Hust_YangXian 2025-01-04 14:16:30 +08:00
e5d7ed0c53 [V1] log GPU blocks num for MultiprocExecutor (#11656) WangErXiao 2025-01-04 08:13:12 +08:00
ad0d567e1c [V1] Chore: cruft removal (#11724) Robert Shaw 2025-01-03 18:25:02 -05:00
bf0d97d786 Update requirements-tpu.txt to support python 3.9 and 3.11 (#11695) Michael Goin 2025-01-03 17:36:46 -05:00
a655eb3025 [Misc]Add BNB quantization for Qwen2VL (#11719) Jee Jee Li 2025-01-04 06:19:02 +08:00
1543914c04 [V1] Improve TP>1 Error Handling + Stack Trace (#11721) Robert Shaw 2025-01-03 16:29:11 -05:00
61fed92c7e [Bugfix] Fix ColumnParallelLinearWithLoRA slice (#11708) ZincCat 2025-01-03 13:02:34 -08:00
80c751e7f6 [V1] Simplify Shutdown (#11659) Robert Shaw 2025-01-03 12:25:38 -05:00
e1a5c2f0a1 [Model] Whisper model implementation (#11280) Aurick Qiao 2025-01-03 03:39:19 -05:00
fd3a62a122 [perf-benchmark] Fix dependency for steps in benchmark pipeline (#11710) Kevin H. Luu 2025-01-03 13:38:37 +07:00
07064cb1d4 [Bugfix] Check chain_speculative_sampling before calling it (#11673) Lu Fang 2025-01-02 16:58:56 -08:00
2f1e8e8f54 Update default max_num_batch_tokens for chunked prefill (#11694) Sachin Varghese 2025-01-02 19:25:53 -05:00
68d37809b9 [Misc] Minimum requirements for SageMaker compatibility (#11576) Nathan Azrak 2025-01-03 10:59:25 +11:00
5dba257506 Resolve race conditions in Marlin kernel (#11493) wchen61 2025-01-03 06:58:56 +08:00
187e32997c [Bugfix] Change kv scaling factor by param json on nvidia gpu (#11688) bjmsong 2025-01-03 05:11:39 +08:00
b55ed6ef8a [V1][Minor] Optimize token_ids_cpu copy (#11692) Woosuk Kwon 2025-01-03 04:04:58 +09:00
2f385183f3 [Bugfix] Free cross attention block table for preempted-for-recompute sequence group. (#10013) Kathy Yu 2025-01-02 10:28:09 -08:00
84c35c374a According to vllm.EngineArgs, the name should be distributed_executor_backend (#11689) Chunyang Wen 2025-01-03 02:14:16 +08:00
8c38ee7007 [VLM] Merged multi-modal processor for LLaVA-NeXT (#11682) Cyrus Leung 2025-01-03 00:39:27 +08:00
b6087a6bee [mypy] Pass type checking in vllm/inputs (#11680) Tobias Pitters 2025-01-02 17:18:15 +01:00
23c1b10a4c [VLM][Bugfix] Multi-modal processor compatible with V1 multi-input (#11674) Cyrus Leung 2025-01-02 17:00:00 +08:00
a115ac46b5 [VLM] Move supported limits and max tokens to merged multi-modal processor (#11669) Cyrus Leung 2025-01-01 23:44:42 +08:00
73001445fb [V1] Implement Cascade Attention (#11635) Woosuk Kwon 2025-01-01 21:56:46 +09:00
6d70198b17 [Doc] Fix typo (#11666) Kazuhiro Serizawa 2025-01-01 17:10:10 +09:00
f962f426bc [Misc] Replace space with - in the file names (#11667) Lu Fang 2024-12-31 23:39:30 -08:00
11d8a091c6 [Misc] Optimize Qwen2-VL LoRA test (#11663) Jee Jee Li 2025-01-01 14:42:23 +08:00
365801fedd [VLM] Add max-count checking in data parser for single image models (#11661) Cyrus Leung 2025-01-01 14:15:21 +08:00
4db72e57f6 [Bugfix][Refactor] Unify model management in frontend (#11660) Joe Runde 2024-12-31 18:21:51 -08:00
0c6f998554 [Benchmark] Add benchmark script for CPU offloading (#11533) Yihua Cheng 2024-12-31 18:10:55 -06:00
e7c7c5e822 [V1][VLM] V1 support for selected single-image models. (#11632) Roger Wang 2024-12-31 13:17:22 -08:00
8c3230d8c1 [V1] Simpify vision block hash for prefix caching by removing offset from hash (#11646) Chen Zhang 2024-12-31 16:56:01 +08:00
2c5718809b [Bugfix] Move the _touch(computed_blocks) call in the allocate_slots method to after the check for allocating new blocks. (#11565) sakunkun 2024-12-31 14:29:04 +08:00
82c49d3260 [Misc][LoRA] Support Rank Stabilized LoRA (RSLoRA) (#6909) John Giorgi 2024-12-31 01:15:58 -05:00
74fa1d123c [Bugfix] Fix OpenAI parallel sampling when using xgrammar (#11637) Michael Goin 2024-12-30 22:43:54 -05:00
a2a40bcd0d [Model][LoRA]LoRA support added for MolmoForCausalLM (#11439) Matthias Vogler 2024-12-31 02:33:06 +01:00
ccb1aabcca [benchmark] Remove dependency for H100 benchmark step (#11572) Kevin H. Luu 2024-12-30 12:27:07 -08:00
36e7670045 [Bugfix] Validate and concatenate image embeddings in MiniCPMVBaseModel (#11631) whyiug 2024-12-31 02:51:04 +08:00
5886aa496e [V1] [6/N] API Server: Better Shutdown (#11586) Robert Shaw 2024-12-30 10:51:02 -05:00
8d9b6721e7 [VLM] Abstract out multi-modal data parsing in merged processor (#11620) Cyrus Leung 2024-12-30 23:01:35 +08:00
b12e87f942 [platforms] enable platform plugins (#11602) youkaichao 2024-12-30 20:24:45 +08:00
5dbf854553 [CI/Build][CPU] Fix CPU CI by lazy importing triton FP8 kernels (#11618) Li, Jiang 2024-12-30 18:17:04 +08:00
970d6d0776 [Build][Kernel] Update CUTLASS to v3.6.0 (#11607) Tyler Michael Smith 2024-12-30 04:22:13 -05:00
628ec6c17b [Docker] bump up neuron sdk v2.21 (#11593) Liangfu Chen 2024-12-29 21:46:14 -08:00
3682e33f9f [v1] fix compilation cache (#11598) youkaichao 2024-12-30 12:24:12 +08:00
0aa38d16f5 Remove print statement in DeepseekScalingRotaryEmbedding (#11604) Michael Goin 2024-12-29 15:16:46 -05:00
faef77c0d6 [Misc] KV cache transfer connector registry (#11481) Kuntai Du 2024-12-29 10:08:09 -06:00
dba4d9dec6 [v1][bugfix] fix cudagraph with inplace buffer assignment (#11596) youkaichao 2024-12-29 17:03:49 +08:00
32b4c63f02 [Doc] Convert list tables to MyST (#11594) Cyrus Leung 2024-12-29 15:56:22 +08:00
4fb8e329fd [V1] [5/N] API Server: unify Detokenizer and EngineCore input (#11545) Robert Shaw 2024-12-28 15:51:57 -05:00
328841d002 [bugfix] interleaving sliding window for cohere2 model (#11583) youkaichao 2024-12-29 00:55:42 +08:00
d427e5cfda [Doc] Minor documentation fixes (#11580) Cyrus Leung 2024-12-28 21:53:59 +08:00
42bb201fd6 [V1][Minor] Set pin_memory=False for token_ids_cpu tensor (#11581) Woosuk Kwon 2024-12-28 22:33:12 +09:00
59d6bb4c86 [Hardware][AMD]: Replace HIPCC version with more precise ROCm version (#11515) hj-wei 2024-12-28 19:17:35 +08:00
b7dcc003dc [Model] Remove hardcoded image tokens ids from Pixtral (#11582) Roger Wang 2024-12-28 02:54:23 -08:00
d34be24bb1 [Model] Support InternLM2 Reward models (#11571) Isotr0py 2024-12-28 14:14:10 +08:00
b5cbe8eeb3 [Bugfix] Last token measurement fix (#11376) Rajveer Bachkaniwala 2024-12-27 22:34:46 -05:00
df04dffade [V1] [4/N] API Server: ZMQ/MP Utilities (#11541) Robert Shaw 2024-12-27 20:45:08 -05:00
a60731247f [Doc] Update mllama example based on official doc (#11567) Chen Zhang 2024-12-28 08:31:10 +08:00
ac79799403 [Bugfix] Fix for ROCM compressed tensor support (#11561) Selali 2024-12-27 12:12:11 -08:00
dde1fa18c9 [Misc] Improve BNB loader to handle mixture of sharded and merged weights with same suffix (#11566) Isotr0py 2024-12-28 03:45:13 +08:00
0240402c46 [Misc]Add BNB quantization for MolmoForCausalLM (#11551) Jee Jee Li 2024-12-28 02:48:24 +08:00
55509c2114 [MODEL] LoRA support for Jamba model (#11209) ErezSC42 2024-12-27 19:58:21 +02:00
101418096f [VLM] Support caching in merged multi-modal processor (#11396) Cyrus Leung 2024-12-28 01:22:48 +08:00
5ce4627a7e [Doc] Add xgrammar in doc (#11549) Chen1022 2024-12-27 21:05:10 +08:00
7af553ea30 [Misc] Abstract the logic for reading and writing media content (#11527) Cyrus Leung 2024-12-27 19:21:23 +08:00
2c9b8ea2b0 [Bugfix] Fix TeleChat2ForCausalLM weights mapper (#11546) Jee Jee Li 2024-12-27 18:39:15 +08:00
d003f3ea39 Update deploying_with_k8s.md with AMD ROCm GPU example (#11465) AlexHe99 2024-12-27 18:00:04 +08:00
6c6f7fe8a8 [Platform] Move model arch check to platform (#11503) Mengqing Cao 2024-12-27 16:45:25 +08:00
2339d59f92 [BugFix] Fix quantization for all other methods (#11547) (tag: v0.6.6.post1) Robert Shaw 2024-12-27 01:23:29 -05:00
1b875a0ef3 [V1][3/N] API Server: Reduce Task Switching + Handle Abort Properly (#11534) Robert Shaw 2024-12-27 00:19:21 -05:00
eb881ed006 [misc] fix typing (#11540) youkaichao 2024-12-27 11:05:08 +08:00
46d4359450 [CI] Fix broken CI (#11543) Robert Shaw 2024-12-26 21:49:16 -05:00
81b979f2a8 [V1] Fix yapf (#11538) Woosuk Kwon 2024-12-27 09:47:10 +09:00
371d04d39b [V1] Use FlashInfer Sampling Kernel for Top-P & Top-K Sampling (#11394) Woosuk Kwon 2024-12-27 09:32:38 +09:00
0c0c2015c5 Update openai_compatible_server.md (#11536) Robert Shaw 2024-12-26 19:26:18 -05:00
82d24f7aac [Docs] Document Deepseek V3 support (#11535) Simon Mo 2024-12-26 16:21:56 -08:00
f49777ba62 Deepseek v3 (#11502) (tag: v0.6.6) Simon Mo 2024-12-26 16:09:44 -08:00
55fb97f7bd [2/N] API Server: Avoid ulimit footgun (#11530) Robert Shaw 2024-12-26 18:43:05 -05:00
2072924d14 [Model] [Quantization] Support deepseek_v3 w8a8 fp8 block-wise quantization (#11523) Michael Goin 2024-12-26 18:33:30 -05:00
720b10fdc6 [1/N] API Server (Remove Proxy) (#11529) Robert Shaw 2024-12-26 18:03:43 -05:00
b85a977822 [Doc] Add video example to openai client for multimodal (#11521) Isotr0py 2024-12-27 01:31:29 +08:00
eec906d811 [Misc] Add placeholder module (#11501) Cyrus Leung 2024-12-26 21:12:51 +08:00
f57ee5650d [Model] Modify MolmoForCausalLM MLP (#11510) Jee Jee Li 2024-12-26 21:12:05 +08:00
dcb1a944d4 [V1] Adding min tokens/repetition/presence/frequence penalties to V1 sampler (#10681) sroy745 2024-12-26 02:02:58 -08:00
7492a36207 [Doc] Add QVQ and QwQ to the list of supported models (#11509) Roger Wang 2024-12-26 01:44:32 -08:00
aa25985bd1 [Misc][LoRA] Fix LoRA weight mapper (#11495) Jee Jee Li 2024-12-26 15:52:48 +08:00
dbeac95dbb Mypy checking for vllm/compilation (#11496) Lucas Tucker 2024-12-25 23:04:07 -06:00
51a624bf02 [Misc] Move some multimodal utils to modality-specific modules (#11494) Cyrus Leung 2024-12-26 12:23:20 +08:00
6ad909fdda [Doc] Improve GitHub links (#11491) Cyrus Leung 2024-12-26 06:49:26 +08:00
b689ada91e [Frontend] Enable decord to load video from base64 (#11492) Cyrus Leung 2024-12-26 00:33:55 +08:00
