Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

415f76a9cb Support mistral interleaved attn (#9414) Patrick von Platen 2024-10-16 15:28:30 +02:00
cf1d62a644 [Model] Support SDPA attention for Molmo vision backbone (#9410) Isotr0py 2024-10-16 19:52:01 +08:00
59230ef32b [Misc] Consolidate example usage of OpenAI client for multimodal models (#9412) Roger Wang 2024-10-16 04:20:51 -07:00
cee711fdbb [Core] Rename input data types (#8688) Cyrus Leung 2024-10-16 18:49:37 +08:00
1de76a0e55 [CI/Build] Test VLM embeddings (#9406) Cyrus Leung 2024-10-16 17:44:30 +08:00
7abba39ee6 [Model] VLM2Vec, the first multimodal embedding model in vLLM (#9303) Cyrus Leung 2024-10-16 14:31:00 +08:00
7e7eae338d [Misc] Standardize RoPE handling for Qwen2-VL (#9250) Cyrus Leung 2024-10-16 13:56:17 +08:00
ed920135c8 [Bugfix] Molmo text-only input bug fix (#9397) Reza Salehi 2024-10-15 21:56:09 -07:00
717a5f82cd [Bugfix][CI/Build] Fix CUDA 11.8 Build (#9386) Lucas Wilkinson 2024-10-15 20:15:21 -04:00
ba30942240 [Bugfix] Fix vLLM UsageInfo and logprobs None AssertionError with empty token_ids (#9034) Chang Su 2024-10-15 15:40:43 -07:00
22f8a69549 [Misc] Directly use compressed-tensors for checkpoint definitions (#8909) Michael Goin 2024-10-15 18:40:25 -04:00
5d264f4ab8 pass ignore_eos parameter to all benchmark_serving calls (#9349) Grace Ho 2024-10-15 13:30:44 -07:00
e9d517f276 [BugFix] Fix chat API continuous usage stats (#9357) Nick Hill 2024-10-15 07:19:48 +01:00
55e081fbad [Bugfix] Update InternVL input mapper to support image embeds (#9351) hhzhang16 2024-10-14 21:29:19 -07:00
8e836d982a [Doc] Fix code formatting in spec_decode.rst (#9348) Michael Goin 2024-10-15 00:29:11 -04:00
44eaa5a5d9 [Frontend] Clarify model_type error messages (#9345) Steve Grubb 2024-10-15 00:29:01 -04:00
169b530607 [Bugfix] Clean up some cruft in mamba.py (#9343) Tyler Michael Smith 2024-10-14 20:24:25 -04:00
f0fe4fe86d [Model] Make llama3.2 support multiple and interleaved images (#9095) Xiang Xu 2024-10-14 15:24:26 -07:00
4d31cd424b [Frontend] merge beam search implementations (#9296) Brendan Wong 2024-10-14 15:05:52 -07:00
473e7b3606 [TPU] Fix TPU SMEM OOM by Pallas paged attention kernel (#9350) Woosuk Kwon 2024-10-14 15:02:06 -07:00
fd47e57f4b [Docs] Remove PDF build from Readtehdocs (#9347) (tag: v0.6.3) Simon Mo 2024-10-14 11:57:47 -07:00
203ab8f80f [CI/Build] setuptools-scm fixes (#8900) Daniele 2024-10-14 20:34:47 +02:00
4141608c6a [Hardware][intel GPU] add async output process for xpu (#8897) Kunshang Ji 2024-10-15 02:23:33 +08:00
dfe43a2071 [Model] Molmo vLLM Integration (#9016) Reza Salehi 2024-10-14 07:56:24 -07:00
16b24e7dcd [Bugfix] Bandaid fix for speculative decoding tests (#9327) Tyler Michael Smith 2024-10-13 19:02:11 -04:00
f519902c52 [CI] Fix merge conflict (#9317) Lily Liu 2024-10-12 23:41:23 -07:00
250e26a63e [Bugfix]Fix MiniCPM's LoRA bug (#9286) Jee Jee Li 2024-10-13 00:36:47 +08:00
2b184ddd4f [Misc][Installation] Improve source installation script and doc (#9309) Yunmeng 2024-10-13 00:36:40 +08:00
00298e092c [Bugfix] Fix bug of xformer prefill for encoder-decoder (#9026) Xiang Xu 2024-10-12 00:00:43 -07:00
89feb4c84d [SpecDec] Remove Batch Expansion (2/3) (#9298) Lily Liu 2024-10-11 22:13:37 -07:00
ec10cb8511 [BugFix] Fix tool call finish reason in streaming case (#9209) Maximilien de Bayser 2024-10-11 22:24:26 -03:00
d11b46f3a5 [bugfix] fix f-string for error (#9295) Prashant Gupta 2024-10-11 17:03:48 -07:00
c6cf9295e1 [Bugfix] Sets is_first_step_output for TPUModelRunner (#9202) Allen Wang 2024-10-11 15:28:10 -05:00
de9fb4bef8 [Bugfix][CI/Build] Fix docker build where CUDA archs < 7.0 are being detected (#9254) Lucas Wilkinson 2024-10-11 15:57:39 -04:00
8baf85e4e9 [Doc] Compatibility matrix for mutual exclusive features (#8512) Wallas Henrique 2024-10-11 15:18:50 -03:00
1a1823871d [Doc] Remove outdated comment to avoid misunderstanding (#9287) homeffjy 2024-10-12 02:02:03 +08:00
6cf1167c1a [Model] Add GLM-4v support and meet vllm==0.6.2 (#9242) sixgod 2024-10-12 01:36:13 +08:00
f710090d8e [Kernel] adding fused moe kernel config for L40S TP4 (#9245) Burkhard Ringlein 2024-10-11 08:54:22 -07:00
7342a7d7f8 [Model] Support Mamba (#6484) Tyler Michael Smith 2024-10-11 11:40:06 -04:00
df3dcdf49d [Bugfix] Fix priority in multiprocessing engine (#9277) Sebastian Schoennenbeck 2024-10-11 17:35:35 +02:00
36ea79079b [Misc][LoRA] Support loading LoRA weights for target_modules in reg format (#9275) Jee Jee Li 2024-10-11 20:31:21 +08:00
e808156f30 [Misc] Collect model support info in a single process per model (#9233) Cyrus Leung 2024-10-11 19:08:11 +08:00
cbc2ef5529 [misc] hide best_of from engine (#9261) youkaichao 2024-10-10 21:30:44 -07:00
94bf9ae4e9 [Misc] Fix sampling from sonnet for long context case (#9235) Andy Dai 2024-10-10 17:33:16 -07:00
f990bab2a4 [Doc][Neuron] add note to neuron documentation about resolving triton issue (#9257) omrishiv 2024-10-10 16:36:32 -07:00
e00c094f15 [torch.compile] generic decorators (#9258) youkaichao 2024-10-10 15:54:23 -07:00
a78c6ba7c8 [ci/build] Add placeholder command for custom models test (#9262) Kevin H. Luu 2024-10-10 15:45:09 -07:00
fb870fd491 Bump actions/setup-python from 3 to 5 (#9195) dependabot[bot] 2024-10-10 13:30:46 -07:00
270953bafb Bump actions/checkout from 3 to 4 (#9196) dependabot[bot] 2024-10-10 13:30:35 -07:00
9cc811c4ff Bump actions/github-script from 6 to 7 (#9197) dependabot[bot] 2024-10-10 13:30:24 -07:00
e4d652ea3e [torch.compile] integration with compilation control (#9058) youkaichao 2024-10-10 12:39:36 -07:00
78c0b4166c Suggest codeowners for the core componenets (#9210) Simon Mo 2024-10-10 12:29:24 -07:00
21efb603f5 [CI/Build] Make the Dockerfile.cpu file's PIP_EXTRA_INDEX_URL Configurable as a Build Argument (#9252) jordanyono 2024-10-10 14:18:18 -04:00
055f3270d4 [Doc] Improve debugging documentation (#9204) Rafael Vasquez 2024-10-10 13:48:51 -04:00
18511aeda6 [Bugfix] Fix Machete unittests failing with NotImplementedError (#9218) Lucas Wilkinson 2024-10-10 13:39:56 -04:00
83ea5c72b9 [OpenVINO] Use torch 2.4.0 and newer optimim version (#9121) Ilya Lavrenov 2024-10-10 21:18:58 +04:00
04de9057ab [Model] support input image embedding for minicpmv (#9237) whyiug 2024-10-10 23:00:47 +08:00
07c11cf4d4 [Bugfix] Fix lm_head weights tying with lora for llama (#9227) Isotr0py 2024-10-10 21:11:56 +08:00
f3a507f1d3 [Core] Add an environment variable which needs to be set explicitly to allow BlockSpaceManagerV1 (#9149) sroy745 2024-10-09 23:17:17 -07:00
a64e7b9407 [Bugfix] Machete garbage results for some models (large K dim) (#9212) Lucas Wilkinson 2024-10-10 02:16:17 -04:00
ce00231a8b [Bugfix] Fix Weight Loading Multiple GPU Test - Large Models (#9213) Michael Goin 2024-10-10 02:15:40 -04:00
de895f1697 [misc] improve model support check in another process (#9208) youkaichao 2024-10-09 21:58:27 -07:00
cf25b93bdd [Core] Fix invalid args to _process_request (#9201) Russell Bryant 2024-10-10 00:10:09 -04:00
d5fbb8706d [CI/Build] Update Dockerfile install+deploy image to ubuntu 22.04 (#9130) Michael Goin 2024-10-09 14:51:47 -04:00
cdca8994bd [CI/Build] mypy: check vllm/entrypoints (#9194) Russell Bryant 2024-10-09 13:15:28 -04:00
ca77dd7a44 [Hardware][CPU] Support AWQ for CPU backend (#7515) Li, Jiang 2024-10-10 00:28:08 +08:00
7dea289066 Add Dependabot configuration for GitHub Actions updates (#1217) Ewout ter Hoeven 2024-10-09 17:16:26 +02:00
cfaa6008e6 [Bugfix] Access get_vocab instead of vocab in tool parsers (#9188) Cyrus Leung 2024-10-09 22:59:57 +08:00
21906a6f50 [Bugfix] Fix lora loading for Compressed Tensors in #9120 (#9179) Ahmad Fahadh Ilyas 2024-10-09 05:10:44 -07:00
dc4aea677a [Doc] Fix VLM prompt placeholder sample bug (#9170) Jiangtao Hu 2024-10-09 16:59:42 +08:00
c8627cd41b [ci][test] use load dummy for testing (#9165) youkaichao 2024-10-09 00:38:40 -07:00
8bfaa4e31e [Bugfix] fix composite weight loading and EAGLE weight loading (#9160) Cyrus Leung 2024-10-09 15:36:55 +08:00
0b5b5d767e [Frontend] Log the maximum supported concurrency (#8831) AlpinDale 2024-10-09 07:03:14 +00:00
cdc72e3c80 [Model] Remap FP8 kv_scale in CommandR and DBRX (#9174) Hui Liu 2024-10-08 23:43:06 -07:00
7627172bf4 [Bugfix][Doc] Report neuron error in output (#9159) Joe Rowell 2024-10-09 06:43:34 +01:00
480b7f40cf [Misc] Improve validation errors around best_of and n (#9167) Travis Johnson 2024-10-08 22:54:48 -06:00
acce7630c1 Update link to KServe deployment guide (#9173) Yuan Tang 2024-10-08 23:58:49 -04:00
ffc4b27ea8 Add classifiers in setup.py (#9171) Yuan Tang 2024-10-08 22:30:48 -04:00
2f4117c38e support bitsandbytes quantization with more models (#9148) chenqianfzh 2024-10-08 18:52:19 -07:00
9ba0bd6aa6 Add lm-eval directly to requirements-test.txt (#9161) Michael Goin 2024-10-08 21:22:31 -04:00
2a131965a8 mypy: check additional directories (#9162) Russell Bryant 2024-10-08 18:08:22 -04:00
bd37b9fbe2 [Bugfix] Try to handle older versions of pytorch (#9086) bnellnm 2024-10-08 17:28:12 -04:00
de24046fcd [Doc] Improve contributing and installation documentation (#9132) Rafael Vasquez 2024-10-08 16:22:08 -04:00
1874c6a1b0 [Doc] Update vlm.rst to include an example on videos (#9155) Sayak Paul 2024-10-08 23:42:29 +05:30
9a94ca4a5d [Bugfix] fix OpenAI API server startup with --disable-frontend-multiprocessing (#8537) Daniele 2024-10-08 18:38:40 +02:00
cfba685bd4 [CI/Build] Add examples folder into Docker image so that we can leverage the templates*.jinja when serving models (#8758) Peter Pan 2024-10-09 00:37:34 +08:00
069d3bd8d0 [Frontend] Add Early Validation For Chat Template / Tool Call Parser (#9151) Alex Brooks 2024-10-08 08:31:26 -06:00
a3691b6b5e [Core][Frontend] Add Support for Inference Time mm_processor_kwargs (#9131) Alex Brooks 2024-10-08 08:12:56 -06:00
8c746226c9 [Frontend] API support for beam search for MQLLMEngine (#9117) Brendan Wong 2024-10-07 22:51:43 -07:00
e1faa2a598 [misc] improve ux on readme (#9147) youkaichao 2024-10-07 22:26:25 -07:00
80b57f00d5 [Intel GPU] Fix xpu decode input (#9145) Kunshang Ji 2024-10-08 11:51:14 +08:00
04c12f8157 [misc] update utils to support comparing multiple settings (#9140) youkaichao 2024-10-07 19:51:49 -07:00
8eeb857084 Add Slack to README (#9137) Simon Mo 2024-10-07 17:06:21 -07:00
fa45513a51 [misc] fix comment and variable name (#9139) youkaichao 2024-10-07 16:07:05 -07:00
c0d9a98d0c [Doc] Include performance benchmark in README (#9135) Kuntai Du 2024-10-07 15:04:06 -07:00
e0dbdb013d [CI/Build] Add linting for github actions workflows (#7876) Russell Bryant 2024-10-07 17:18:10 -04:00
93cf74a8a7 [Doc]: Add deploying_with_k8s guide (#8451) TimWang 2024-10-08 04:31:45 +08:00
151ef4efd2 [Model] Support NVLM-D and fix QK Norm in InternViT (#9045) Cyrus Leung 2024-10-07 19:55:12 +08:00
f19da64871 [Core] Refactor GGUF parameters packing and forwarding (#8859) Isotr0py 2024-10-07 18:01:46 +08:00
4f95ffee6f [Hardware][CPU] Cross-attention and Encoder-Decoder models support on CPU backend (#9089) Isotr0py 2024-10-07 14:50:35 +08:00
