Commit Graph - vllm

biondizzle/vllm

Commit Graph

8c6de96ea1 [Model] Explicit interface for vLLM models and support OOT embedding models (#9108) Cyrus Leung 2024-10-07 14:10:35 +08:00
18b296fdb2 [core] remove beam search from the core (#9105) youkaichao 2024-10-06 22:47:04 -07:00
c8f26bb636 [BugFix][Core] Fix BlockManagerV2 when Encoder Input is None (#9103) sroy745 2024-10-06 20:52:42 -07:00
487678d046 [Bugfix][Hardware][CPU] Fix CPU model input for decode (#9044) Isotr0py 2024-10-07 10:14:27 +08:00
cb3b2b9ba4 [Bugfix] Fix incorrect updates to num_computed_tokens in multi-step scheduling (#9038) Varun Sundar Rabindranath 2024-10-06 15:48:11 -04:00
fdf59d30ea [Bugfix] fix tool_parser error handling when serve a model not support it (#8709) Yanyi Liu 2024-10-06 20:51:08 +08:00
b22b798471 [Model] PP support for embedding models and update docs (#9090) Cyrus Leung 2024-10-06 16:35:27 +08:00
f22619fe96 [Misc] Remove user-facing error for removed VLM args (#9104) Cyrus Leung 2024-10-06 16:33:52 +08:00
168cab6bbf [Frontend] API support for beam search (#9087) Brendan Wong 2024-10-05 23:39:03 -07:00
23fea8714a [Bugfix] Fix try-catch conditions to import correct Flash Attention Backend in Draft Model (#9101) TJian 2024-10-05 22:00:04 -07:00
f4dd830e09 [core] use forward context for flash infer (#9097) youkaichao 2024-10-05 19:37:31 -07:00
5df1834895 [Bugfix] Fix order of arguments matters in config.yaml (#8960) Andy Dai 2024-10-05 10:35:11 -07:00
cfadb9c687 [Bugfix] Deprecate registration of custom configs to huggingface (#9083) Chen Zhang 2024-10-05 06:56:40 -07:00
15986f598c [Model] Support Gemma2 embedding model (#9004) Xin Yang 2024-10-04 23:57:05 -07:00
53b3a33027 [Bugfix] Fixes Phi3v & Ultravox Multimodal EmbeddingInputs (#8979) hhzhang16 2024-10-04 22:05:37 -07:00
dac914b0d6 [Bugfix] use blockmanagerv1 for encoder-decoder (#9084) Chen Zhang 2024-10-04 21:45:38 -07:00
a95354a36e [Doc] Update README.md with Ray summit slides (#9088) Zhuohan Li 2024-10-04 19:54:45 -07:00
663874e048 [torch.compile] improve allreduce registration (#9061) youkaichao 2024-10-04 16:43:50 -07:00
cc90419e89 [Hardware][Neuron] Add on-device sampling support for Neuron (#8746) Chongming Ni 2024-10-04 16:42:20 -07:00
27302dd584 [Misc] Fix CI lint (#9085) Cody Yu 2024-10-04 16:07:54 -07:00
0cc566ca8f [Misc] Add random seed for prefix cache benchmark (#9081) Andy Dai 2024-10-04 14:58:57 -07:00
05c531be47 [Misc] Improved prefix cache example (#9077) Andy Dai 2024-10-04 14:38:42 -07:00
fbb74420e7 [CI] Update performance benchmark: upgrade trt-llm to r24.07, and add SGLang (#7412) Kuntai Du 2024-10-04 14:01:44 -07:00
05d686432f [Kernel] Zero point support in fused MarlinMoE kernel + AWQ Fused MoE (#8973) ElizaWszola 2024-10-04 20:34:44 +02:00
0dcc8cbe5a Adds truncate_prompt_tokens param for embeddings creation (#8999) Flávia Béo 2024-10-04 15:31:40 -03:00
26aa325f4f [Core][VLM] Test registration for OOT multimodal models (#8717) Roger Wang 2024-10-04 10:38:25 -07:00
e5dc713c23 [Hardware][PowerPC] Make oneDNN dependency optional for Power (#9039) Varad Ahirwadkar 2024-10-04 22:54:42 +05:30
36eecfbddb Remove AMD Ray Summit Banner (#9075) Simon Mo 2024-10-04 10:17:16 -07:00
9ade8bbc8d [Model] add a bunch of supported lora modules for mixtral (#9008) Prashant Gupta 2024-10-04 09:24:40 -07:00
22482e495e [Bugfix] Flash attention arches not getting set properly (#9062) Lucas Wilkinson 2024-10-04 11:43:15 -04:00
3d826d2c52 [Bugfix] Reshape the dimensions of the input image embeddings in Qwen2VL (#9071) whyiug 2024-10-04 22:34:58 +08:00
0e36fd4909 [Misc] Move registry to its own file (#9064) Cyrus Leung 2024-10-04 18:01:37 +08:00
0f6d7a9a34 [Models] Add remaining model PP support (#7168) Murali Andoorveedu 2024-10-03 19:56:58 -07:00
303d44790a [Misc] Enable multi-step output streaming by default (#9047) Michael Goin 2024-10-03 22:55:42 -04:00
aeb37c2a72 [CI/Build] Per file CUDA Archs (improve wheel size and dev build times) (#8845) Lucas Wilkinson 2024-10-03 22:55:25 -04:00
3dbb215b38 [Frontend][Feature] support tool calling for internlm/internlm2_5-7b-chat model (#8405) 代君 2024-10-04 10:36:39 +08:00
2838d6b38e [Bugfix] Weight loading fix for OPT model (#9042) Domen Vreš 2024-10-04 01:53:29 +02:00
91add85ec4 Fix failing spec decode test (#9054) sroy745 2024-10-03 16:07:29 -07:00
9aaf14c62e [misc] add forward context for attention (#9029) youkaichao 2024-10-03 12:09:42 -07:00
63e39937f9 [Frontend] [Neuron] Parse literals out of override-neuron-config (#8959) xendo 2024-10-03 20:02:07 +02:00
f5d72b2fc6 [Core] Make BlockSpaceManagerV2 the default BlockManager to use. (#8678) sroy745 2024-10-03 09:44:21 -07:00
83caf35e08 [BugFix] Enforce Mistral ToolCall id constraint when using the Mistral tool call parser (#9020) Guillaume Calmettes 2024-10-03 10:44:52 +02:00
01843c89b8 [Misc] log when using default MoE config (#8971) Divakar Verma 2024-10-02 23:31:07 -05:00
19a4dd0990 [Bugfix] example template should not add parallel_tool_prompt if tools is none (#9007) Travis Johnson 2024-10-02 21:04:17 -06:00
18c2e30c57 [Doc] Update Granite model docs (#9025) Nick Hill 2024-10-03 03:42:24 +01:00
19f0d25796 [Model] Adding Granite MoE. (#8206) Shawn Tan 2024-10-02 21:33:57 -04:00
f58d4fccc9 [OpenVINO] Enable GPU support for OpenVINO vLLM backend (#8192) Sergey Shlyapnikov 2024-10-03 01:50:01 +04:00
afb050b29d [Core] CUDA Graphs for Multi-Step + Chunked-Prefill (#8645) Varun Sundar Rabindranath 2024-10-02 15:44:39 -04:00
7f60520deb [Misc] Update Default Image Mapper Error Log (#8977) Alex Brooks 2024-10-02 05:44:38 -06:00
563649aafe [Core] Combined support for multi-step scheduling, chunked prefill & prefix caching (#8804) afeldman-nm 2024-10-02 03:52:20 -04:00
1570203864 [Spec Decode] (1/2) Remove batch expansion (#8839) Lily Liu 2024-10-01 16:04:42 -07:00
22f5851b80 Update benchmark_serving.py to read and write json-datasets, results in UTF8, for better compatibility with Windows (#8997) vlsav 2024-10-01 21:07:06 +03:00
4f341bd4bf [Doc] Update list of supported models (#8987) Cyrus Leung 2024-10-02 00:35:39 +08:00
35bd215168 [Core] [Frontend] Priority scheduling for embeddings and in the OpenAI-API (#8965) Sebastian Schoennenbeck 2024-10-01 11:58:06 +02:00
1fe0a4264a [Bugfix] Fix Token IDs Reference for MiniCPM-V When Images are Provided With No Placeholders (#8991) Alex Brooks 2024-10-01 03:52:44 -06:00
bc4eb65b54 [Bugfix] Fix Fuyu tensor parallel inference (#8986) Isotr0py 2024-10-01 17:51:41 +08:00
82f3937e59 [Misc] add process_weights_after_loading for DummyLoader (#8969) Divakar Verma 2024-09-30 22:46:41 -05:00
7da2487591 [torch.compile] fix tensor alias (#8982) youkaichao 2024-09-30 20:40:48 -07:00
aaccca2b4d [CI/Build] Fix machete generated kernel files ordering (#8976) Kevin H. Luu 2024-09-30 20:33:12 -07:00
062c89e7c9 [Frontend][Core] Move guided decoding params into sampling params (#8252) Joe Runde 2024-09-30 19:34:25 -06:00
bce324487a [CI][SpecDecode] Fix spec decode tests, use flash attention backend for spec decode CI tests. (#8975) Lily Liu 2024-09-30 17:51:40 -07:00
1425a1bcf9 [ci] Add CODEOWNERS for test directories (#8795) Kevin H. Luu 2024-09-30 17:47:08 -07:00
1cabfcefb6 [Misc] Adjust max_position_embeddings for LoRA compatibility (#8957) Jee Jee Li 2024-09-30 20:57:39 +08:00
be76e5aabf [Core] Make scheduling policy settable via EngineArgs (#8956) Sebastian Schoennenbeck 2024-09-30 14:28:44 +02:00
2ae25f79cf [Model] Expose InternVL2 max_dynamic_patch as a mm_processor_kwarg (#8946) Isotr0py 2024-09-30 13:01:20 +08:00
8e60afa15e [Model][LoRA]LoRA support added for MiniCPMV2.6 (#8943) Jee Jee Li 2024-09-30 12:31:55 +08:00
b6d7392579 [Misc][CI/Build] Include cv2 via mistral_common[opencv] (#8951) Roger Wang 2024-09-29 21:28:26 -07:00
e01ab595d8 [Model] support input embeddings for qwen2vl (#8856) whyiug 2024-09-30 11:16:10 +08:00
f13a07b1f8 [Kernel][Model] Varlen prefill + Prefill chunking support for mamba kernels and Jamba model (#8533) Mor Zusman 2024-09-30 00:35:58 +03:00
6c9ba48fde [Frontend] Added support for HF's new continue_final_message parameter (#8942) danieljannai21 2024-09-29 20:59:47 +03:00
1fb9c1b0bf [Misc] Fix typo in BlockSpaceManagerV1 (#8944) juncheoll 2024-09-30 00:05:54 +09:00
31f46a0d35 [BugFix] Fix seeded random sampling with encoder-decoder models (#8870) Nick Hill 2024-09-29 10:43:14 +01:00
3d49776bbb [Model][LoRA]LoRA support added for MiniCPMV2.5 (#7199) Jee Jee Li 2024-09-29 14:59:45 +08:00
bc2ef1f77c [Model] Support Qwen2.5-Math-RM-72B (#8896) Zilin Zhu 2024-09-29 12:19:39 +08:00
2e7fe7e79f [Build/CI] Set FETCHCONTENT_BASE_DIR to one location for better caching (#8930) Tyler Michael Smith 2024-09-28 23:13:01 -04:00
26a68d5d7e [CI/Build] Add test decorator for minimum GPU memory (#8925) Cyrus Leung 2024-09-29 10:50:51 +08:00
d081da0064 [Bugfix] Fix Marlin MoE act order when is_k_full == False (#8741) ElizaWszola 2024-09-29 03:19:40 +02:00
5bf8789b2a [Bugfix] Block manager v2 with preemption and lookahead slots (#8824) sroy745 2024-09-28 18:17:45 -07:00
d1537039ce [Core] Improve choice of Python multiprocessing method (#8823) Russell Bryant 2024-09-28 21:17:07 -04:00
cc276443b5 [doc] organize installation doc and expose per-commit docker (#8931) youkaichao 2024-09-28 17:48:41 -07:00
e585b583a9 [Bugfix] Support testing prefill throughput with benchmark_serving.py --hf-output-len 1 (#8891) Chen Zhang 2024-09-28 11:51:22 -07:00
090e945e36 [Frontend] Make beam search emulator temperature modifiable (#8928) Edouard B. 2024-09-28 20:30:21 +02:00
e1a3f5e831 [CI/Build] Update models tests & examples (#8874) Cyrus Leung 2024-09-29 00:54:35 +08:00
19d02ff938 [Bugfix] Fix PP for Multi-Step (#8887) Varun Sundar Rabindranath 2024-09-28 11:52:46 -04:00
39d3f8d94f [Bugfix] Fix code for downloading models from modelscope (#8443) tastelikefeet 2024-09-28 23:24:12 +08:00
b0298aa8cc [Misc] Remove vLLM patch of BaichuanTokenizer (#8921) Cyrus Leung 2024-09-28 16:11:25 +08:00
260024a374 [Bugfix][Intel] Fix XPU Dockerfile Build (#7824) Tyler Titsworth 2024-09-27 23:45:50 -07:00
d86f6b2afb [misc] fix wheel name (#8919) youkaichao 2024-09-27 22:10:44 -07:00
bd429f2b75 [Core] Priority-based scheduling in async engine (#8850) Sebastian Schoennenbeck 2024-09-28 00:07:10 +02:00
18e60d7d13 [misc][distributed] add VLLM_SKIP_P2P_CHECK flag (#8911) youkaichao 2024-09-27 14:27:56 -07:00
c2ec430ab5 [Core] Multi-Step + Single Step Prefills via Chunked Prefill code path (#8378) Varun Sundar Rabindranath 2024-09-27 16:32:07 -04:00
c5d55356f9 [Bugfix] fix for deepseek w4a16 (#8906) Lucas Wilkinson 2024-09-27 15:12:34 -04:00
172d1cd276 [Kernel] AQ AZP 4/4: Integrate asymmetric quantization to linear method (#7271) Luka Govedič 2024-09-27 14:25:10 -04:00
a9b15c606f [torch.compile] use empty tensor instead of None for profiling (#8875) youkaichao 2024-09-27 08:11:32 -07:00
8df2dc3c88 [TPU] Update pallas.py to support trillium (#8871) Brittany 2024-09-27 01:16:55 -07:00
6d792d2f31 [Bugfix][VLM] Fix Fuyu batching inference with max_num_seqs>1 (#8892) Isotr0py 2024-09-27 16:15:58 +08:00
0e088750af [MISC] Fix invalid escape sequence '\' (#8830) Peter Pan 2024-09-27 16:13:25 +08:00
dc4e3df5c2 [misc] fix collect env (#8894) youkaichao 2024-09-27 00:26:38 -07:00
3b00b9c26c [Core] renamePromptInputs and inputs (#8876) Cyrus Leung 2024-09-27 11:35:15 +08:00
344cd2b6f4 [Feature] Add support for Llama 3.1 and 3.2 tool use (#8343) Maximilien de Bayser 2024-09-26 21:01:42 -03:00
