Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

bb7991aa29 [V1] Add missing tokenizer options for Detokenizer (#10288) Roger Wang 2024-11-13 03:02:56 -08:00
d909acf9fe [Model][LoRA]LoRA support added for idefics3 (#10281) B-201 2024-11-13 17:25:59 +08:00
b6dde33019 [Core] Flashinfer - Remove advance step size restriction (#10282) Pavani Majety 2024-11-13 00:29:32 -08:00
1b886aa104 [Model] Adding Support for Qwen2VL as an Embedding Model. Using MrLight/dse-qwen2-2b-mrl-v1 (#9944) Austin Veselka 2024-11-13 02:28:13 -06:00
3945c82346 [Model] Add support for Qwen2-VL video embeddings input & multiple image embeddings input with varied resolutions (#10221) 电脑星人 2024-11-13 15:07:22 +08:00
032fcf16ae [Doc] Fix typo in arg_utils.py (#10264) Xin Yang 2024-11-12 21:54:52 -08:00
56a955e774 Bump to compressed-tensors v0.8.0 (#10279) Dipika Sikka 2024-11-13 00:54:10 -05:00
bbd3e86926 [V1] Support VLMs with fine-grained scheduling (#9871) Woosuk Kwon 2024-11-12 20:53:13 -08:00
0d4ea3fb5c [core][distributed] use tcp store directly (#10275) youkaichao 2024-11-12 17:36:08 -08:00
112fa0bbe5 [V1] Fix CI tests on V1 engine (#10272) Woosuk Kwon 2024-11-12 16:17:20 -08:00
377b74fe87 Revert "[ci][build] limit cmake version" (#10271) youkaichao 2024-11-12 15:06:48 -08:00
18081451f9 [doc] improve debugging doc (#10270) youkaichao 2024-11-12 14:43:52 -08:00
96ae0eaeb2 [doc] fix location of runllm widget (#10266) youkaichao 2024-11-12 14:34:39 -08:00
1f55e05713 [V1] Enable Inductor when using piecewise CUDA graphs (#10268) Woosuk Kwon 2024-11-12 13:39:56 -08:00
8a06428c70 [LoRA] Adds support for bias in LoRA (#5733) Umesh 2024-11-12 11:08:40 -08:00
b41fb9d3b1 [Encoder Decoder] Update Mllama to run with both FlashAttention and XFormers (#9982) sroy745 2024-11-12 10:53:57 -08:00
7c65527918 [V1] Use pickle for serializing EngineCoreRequest & Add multimodal inputs to EngineCoreRequest (#10245) Woosuk Kwon 2024-11-12 08:57:14 -08:00
47db6ec831 [Frontend] Add per-request number of cached token stats (#10174) zifeitong 2024-11-12 08:42:28 -08:00
176fcb1c71 [Bugfix] Fix QwenModel argument (#10262) Jie Fu (傅杰) 2024-11-13 00:36:51 +08:00
a838ba7254 [Misc]Fix Idefics3Model argument (#10255) Jee Jee Li 2024-11-12 21:07:11 +08:00
36c513a076 [BugFix] Do not raise a ValueError when tool_choice is set to the supported none option and tools are not defined. (#10000) Guillaume Calmettes 2024-11-12 12:13:46 +01:00
d201d41973 [CI][CPU]refactor CPU tests to allow to bind with different cores (#10222) Yuan 2024-11-12 18:07:32 +08:00
3a28f18b0b [doc] explain the class hierarchy in vLLM (#10240) youkaichao 2024-11-11 22:56:44 -08:00
812c981fa0 Splitting attention kernel file (#10091) Aleksandr Malyshev 2024-11-11 22:55:07 -08:00
7f5edb5900 [Misc][LoRA] Replace hardcoded cuda device with configurable argument (#10223) Jee Jee Li 2024-11-12 11:10:15 +08:00
eea55cca5b [1/N] torch.compile user interface design (#10237) youkaichao 2024-11-11 18:01:06 -08:00
9cdba9669c [Doc] Update help text for --distributed-executor-backend (#10231) Russell Bryant 2024-11-11 20:55:09 -05:00
d1c6799b88 [doc] update debugging guide (#10236) youkaichao 2024-11-11 15:21:12 -08:00
6ace6fba2c [V1] AsyncLLM Implementation (#9826) Robert Shaw 2024-11-11 18:05:38 -05:00
08f93e7439 Make shutil rename in python_only_dev (#10233) Nikolai Shcheglov 2024-11-11 16:29:19 -06:00
9d5b4e4dea [V1] Enable custom ops with piecewise CUDA graphs (#10228) Woosuk Kwon 2024-11-11 11:58:07 -08:00
8a7fe47d32 [misc][distributed] auto port selection and disable tests (#10226) youkaichao 2024-11-11 11:54:59 -08:00
4800339c62 Add docs on serving with Llama Stack (#10183) Yuan Tang 2024-11-11 14:28:55 -05:00
fe15729a2b [V1] Use custom ops for piecewise CUDA graphs (#10227) Woosuk Kwon 2024-11-11 11:26:48 -08:00
330e82d34a [v1][torch.compile] support managing cudagraph buffer (#10203) youkaichao 2024-11-11 11:10:27 -08:00
d7a4f2207b [V1] Do not use inductor for piecewise CUDA graphs (#10225) Woosuk Kwon 2024-11-11 11:05:57 -08:00
f9dadfbee3 [V1] Fix detokenizer ports (#10224) Woosuk Kwon 2024-11-11 10:42:07 -08:00
25144ceed0 Bump actions/setup-python from 5.2.0 to 5.3.0 (#10209) dependabot[bot] 2024-11-11 17:24:10 +00:00
e6de9784d2 [core][distributed] add stateless process group (#10216) youkaichao 2024-11-11 09:02:14 -08:00
36fc439de0 [Doc] fix doc string typo in block_manager swap_out function (#10212) Yangcheng Li 2024-11-12 00:53:07 +08:00
874f551b36 [Metrics] add more metrics (#4464) harrywu 2024-11-12 00:17:38 +08:00
2cebda42bb [Bugfix][Hardware][CPU] Fix broken encoder-decoder CPU runner (#10218) Isotr0py 2024-11-11 20:37:58 +08:00
5fb1f935b0 [V1] Allow tokenizer_mode and trust_remote_code for Detokenizer (#10211) Roger Wang 2024-11-11 02:01:18 -08:00
36e4acd02a [LoRA][Kernel] Remove the unused libentry module (#10214) Jee Jee Li 2024-11-11 17:43:23 +08:00
58170d6503 [Hardware][CPU] Add embedding models support for CPU backend (#10193) Isotr0py 2024-11-11 16:54:28 +08:00
9804ac7c7c Bump the patch-update group with 5 updates (#10210) dependabot[bot] 2024-11-11 07:22:40 +00:00
f89d18ff74 [6/N] pass whole config to inner model (#10205) youkaichao 2024-11-10 22:41:46 -08:00
f0f2e5638e [doc] improve debugging code (#10206) youkaichao 2024-11-10 17:49:40 -08:00
ad9a78bf64 [Doc] Fix typo error in vllm/entrypoints/openai/cli_args.py (#10196) yansh97 2024-11-11 08:14:22 +08:00
73b9083e99 [misc] improve cloudpickle registration and tests (#10202) youkaichao 2024-11-10 16:10:53 -08:00
20cf2f553c [Misc] small fixes to function tracing file path (#9543) Shawn Du 2024-11-11 07:21:06 +08:00
bfb7d61a7c [doc] Polish the integration with huggingface doc (#10195) Yongzao 2024-11-11 02:22:04 +08:00
19682023b6 [Doc] Fix typo error in CONTRIBUTING.md (#10190) FuryMartin 2024-11-10 15:47:24 +08:00
9fa4bdde9d [ci][build] limit cmake version (#10188) youkaichao 2024-11-09 16:27:26 -08:00
51c2e1fcef [CI/Build] Split up models tests (#10069) Cyrus Leung 2024-11-10 03:39:14 +08:00
b09895a618 [Frontend][Core] Override HF config.json via CLI (#5836) Krishna Mandal 2024-11-09 08:19:27 -08:00
d88bff1b96 [Frontend] add add_request_id middleware (#9594) cjackal 2024-11-09 19:18:29 +09:00
9e37266420 bugfix: fix the bug that stream generate not work (#2756) Zhao Yingzhuo 2024-11-09 18:09:48 +08:00
8a4358ecb5 [doc] explaining the integration with huggingface (#10173) youkaichao 2024-11-09 01:02:54 -08:00
bd46357ad9 [bugfix] fix broken tests of mlp speculator (#10177) youkaichao 2024-11-09 00:04:50 -08:00
f192aeba74 [Bugfix] Enable some fp8 and quantized fullgraph tests (#10171) bnellnm 2024-11-09 03:01:27 -05:00
8e1529dc57 [CI/Build] Add run-hpu-test.sh script (#10167) Chendi.Xue 2024-11-09 00:26:52 -06:00
1a95f10ee7 [5/N] pass the whole config to model (#9983) youkaichao 2024-11-08 22:17:28 -08:00
49d2a41a86 [Doc] Adjust RunLLM location (#10176) Cyrus Leung 2024-11-09 12:07:10 +08:00
47672f38b5 [CI/Build] Fix VLM broadcast tests tensor_parallel_size passing (#10161) Isotr0py 2024-11-09 12:02:59 +08:00
f83feccd7f [Bugfix] Ignore GPTQ quantization of Qwen2-VL visual module (#10169) Michael Goin 2024-11-08 22:36:46 -05:00
e0191a95d8 [0/N] Rename MultiModalInputs to MultiModalKwargs (#10040) Cyrus Leung 2024-11-09 11:31:02 +08:00
d7edca1dee [CI/Build] Adding timeout in CPU CI to avoid CPU test queue blocking (#6892) Li, Jiang 2024-11-09 11:27:11 +08:00
127c07480e [Kernel][Triton] Add Triton implementation for scaled_mm_triton to support fp8 and int8 SmoothQuant, symmetric case (#9857) rasmith 2024-11-08 18:59:22 -06:00
10b67d865d [Bugfix] SymIntArrayRef expected to contain concrete integers (#10170) bnellnm 2024-11-08 17:44:18 -05:00
4f93dfe952 [torch.compile] Fuse RMSNorm with quant (#9138) Luka Govedič 2024-11-08 16:20:08 -05:00
e1b5a82179 Rename vllm.logging to vllm.logging_utils (#10134) Florian Zimmermeister 2024-11-08 21:53:24 +01:00
87713c6053 [CI/Build] Ignore .gitignored files for shellcheck (#10162) Luka Govedič 2024-11-08 14:53:36 -05:00
b5815c8413 [V1] Fix non-cudagraph op name (#10166) Woosuk Kwon 2024-11-08 10:23:04 -08:00
6b30471586 [Misc] Improve Web UI (#10090) Rafael Vasquez 2024-11-08 12:51:04 -05:00
f6778620a9 Disable spec-decode + chunked-prefill for draft models with tensor parallelism > 1 (#10136) sroy745 2024-11-08 07:56:18 -08:00
0535e5fe6c Fix edge case Mistral tokenizer (#10152) Patrick von Platen 2024-11-08 16:42:27 +01:00
b489fc3c91 [CI/Build] Update CPU tests to include all "standard" tests (#5481) Cyrus Leung 2024-11-08 23:30:04 +08:00
208ce622c7 [V1]Enable APC by default only for text models (#10148) Roger Wang 2024-11-08 06:39:41 -08:00
1ff4aed5bd [Model] Expose size to Idefics3 as mm_processor_kwargs (#10146) Isotr0py 2024-11-08 17:56:58 +08:00
f10797c0ce [Bugfix][XPU] Fix xpu tp by introducing XpuCommunicator (#10144) Yan Ma 2024-11-08 17:41:03 +08:00
f4c2187e29 [Misc] Fix typo in #5895 (#10145) Cyrus Leung 2024-11-08 17:07:01 +08:00
aea6ad629f Add hf_transfer to testing image (#10096) Michael Goin 2024-11-08 03:35:25 -05:00
da07a9ead7 Fixes a typo about 'max_decode_seq_len' which causes crashes with cuda graph. (#9285) Tao He 2024-11-08 13:31:28 +08:00
3a7f15a398 [Doc] Move CONTRIBUTING to docs site (#9924) Russell Bryant 2024-11-08 00:15:12 -05:00
7371749d54 [Misc] Fix ImportError causing by triton (#9493) Mengqing Cao 2024-11-08 13:08:51 +08:00
ad39bd640c [Bugfix] Add error handling when server cannot respond any valid tokens (#5895) DearPlanet 2024-11-08 12:58:37 +08:00
40d0e7411d [Doc] Update FAQ links in spec_decode.rst (#9662) whyiug 2024-11-08 12:44:58 +08:00
6bb52b0f97 [CI/Build] Give PR cleanup job PR write access (#10139) Russell Bryant 2024-11-07 23:10:20 -05:00
201fc07730 [V1] Prefix caching (take 2) (#9972) Cody Yu 2024-11-07 17:34:44 -08:00
42b4f46b71 [V1] Add all_token_ids attribute to Request (#10135) Woosuk Kwon 2024-11-07 17:08:24 -08:00
073a472728 [Misc] report relevant env vars in collect_env.py tool (#9293) Jiangtao Hu 2024-11-07 16:14:01 -08:00
93bff421bc Bump actions/checkout from 4.2.1 to 4.2.2 (#9746) dependabot[bot] 2024-11-07 21:44:58 +00:00
28b2877d30 Online video support for VLMs (#10020) litianjian 2024-11-08 04:25:59 +08:00
97b8475beb Bump actions/setup-python from 5.2.0 to 5.3.0 (#9745) dependabot[bot] 2024-11-07 18:55:35 +00:00
a2f1f3b089 [CI/Build] Automate PR body text cleanup (#10082) Russell Bryant 2024-11-07 13:26:28 -05:00
3be5b26a76 [CI/Build] Add shell script linting using shellcheck (#7925) Russell Bryant 2024-11-07 13:17:29 -05:00
de0e61a323 [CI/Build] Always run mypy (#10122) Russell Bryant 2024-11-07 11:43:16 -05:00
9d43afcc53 [Feature] [Spec decode]: Combine chunked prefill with speculative decoding (#9291) Nicolò Lucchesi 2024-11-07 17:15:14 +01:00
ae62fd17c0 [Frontend] Tool calling parser for Granite 3.0 models (#9027) Maximilien de Bayser 2024-11-07 12:09:02 -03:00
