Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

7a3a83e3b8 [CI/Build] Move model-specific multi-modal processing tests (#11934) Cyrus Leung 2025-01-11 13:50:05 +08:00
c32a7c7c0c [Bugfix] fused_experts_impl wrong compute type for float32 (#11921) shaochangxu 2025-01-11 13:49:39 +08:00
2118d0565c [Bugfix][SpecDecode] Adjust Eagle model architecture to align with intended design (#11672) Sungjae Lee 2025-01-11 13:49:38 +09:00
899136b857 [ci] fix broken distributed-tests-4-gpus (#11937) youkaichao 2025-01-11 09:07:24 +08:00
c9f09a4fe8 [mypy] Fix mypy warnings in api_server.py (#11941) Fred Reiss 2025-01-10 17:04:58 -08:00
d45cbe70f5 [Bugfix] Check that number of images matches number of <|image|> tokens with mllama (#11939) Travis Johnson 2025-01-10 16:26:00 -07:00
8a579408f3 [Misc] Update benchmark_prefix_caching.py fixed example usage (#11920) minmin 2025-01-11 04:39:22 +08:00
46fa98ccad [Misc] Clean up debug code in Deepseek-V3 (#11930) Isotr0py 2025-01-11 03:19:15 +08:00
aa1e77a19c [Hardware][CPU] Support MOE models on x86 CPU (#11831) Li, Jiang 2025-01-11 00:07:58 +08:00
5959564f94 Doc fix in benchmark_long_document_qa_throughput.py (#11933) Kuntai Du 2025-01-10 23:51:43 +08:00
f33e033e27 [Docs] Fix docstring in get_ip function (#11932) Kuntai Du 2025-01-10 23:51:02 +08:00
482cdc494e [Doc] Rename offline inference examples (#11927) Harry Mellor 2025-01-10 15:50:29 +00:00
20410b2fda [platform] support custom torch.compile backend key (#11318) wangxiyuan 2025-01-10 23:46:51 +08:00
12664ddda5 [Doc] [1/N] Initial guide for merged multi-modal processor (#11925) Cyrus Leung 2025-01-10 22:30:25 +08:00
241ad7b301 [ci] Fix sampler tests (#11922) youkaichao 2025-01-10 20:45:33 +08:00
d85c47d6ad Replace "online inference" with "online serving" (#11923) Harry Mellor 2025-01-10 12:05:56 +00:00
ef725feafc [platform] support pytorch custom op pluggable (#11328) wangxiyuan 2025-01-10 18:02:38 +08:00
d907be7dc7 [misc] remove python function call for custom activation op (#11885) cennn 2025-01-10 17:18:25 +08:00
d53575a5f0 [ci] fix gh200 tests (#11919) youkaichao 2025-01-10 16:25:17 +08:00
61af633256 [BUGFIX] Fix UnspecifiedPlatform package name (#11916) Kunshang Ji 2025-01-10 16:20:46 +08:00
ac2f3f7fee [Bugfix] Validate lora adapters to avoid crashing server (#11727) Joe Runde 2025-01-10 00:56:36 -07:00
cf5f000d21 [torch.compile] Hide KV cache behind torch.compile boundary (#11677) Chen Zhang 2025-01-10 13:14:42 +08:00
3de2b1eafb [Doc] Show default pooling method in a table (#11904) Cyrus Leung 2025-01-10 11:25:20 +08:00
b844b99ad3 [VLM] Enable tokenized inputs for merged multi-modal processor (#11900) Cyrus Leung 2025-01-10 11:24:00 +08:00
c3cf54dda4 [Doc][5/N] Move Community and API Reference to the bottom (#11896) Cyrus Leung 2025-01-10 11:10:12 +08:00
36f5303578 [Docs] Add Modal to deployment frameworks (#11907) Charles Frye 2025-01-09 15:26:37 -08:00
9a228348d2 [Misc] Provide correct Pixtral-HF chat template (#11891) Cyrus Leung 2025-01-10 01:19:37 +08:00
bd82872211 [ci]try to fix flaky multi-step tests (#11894) youkaichao 2025-01-09 22:47:29 +08:00
405eb8e396 [platform] Allow platform specify attention backend (#11609) wangxiyuan 2025-01-09 21:46:50 +08:00
65097ca0af [Doc] Add model development API Reference (#11884) Cyrus Leung 2025-01-09 17:43:40 +08:00
1d967acb45 [Bugfix] fix beam search input errors and latency benchmark script (#11875) Ye (Charlotte) Qi 2025-01-09 01:36:39 -08:00
0bd1ff4346 [Bugfix] Override dunder methods of placeholder modules (#11882) Cyrus Leung 2025-01-09 17:02:53 +08:00
310aca88c9 [perf]fix current stream (#11870) youkaichao 2025-01-09 15:18:21 +08:00
a732900efc [Doc] Intended links Python multiprocessing library (#11878) Guspan Tanadi 2025-01-09 12:39:39 +07:00
d848800e88 [Misc] Move print_*_once from utils to logger (#11298) Cyrus Leung 2025-01-09 12:48:12 +08:00
730e9592e9 [Doc] Recommend uv and python 3.12 for quickstart guide (#11849) Michael Goin 2025-01-08 22:37:48 -05:00
1fe554bac3 treat do_lower_case in the same way as the sentence-transformers library (#11815) Maximilien de Bayser 2025-01-09 00:05:43 -03:00
615e4a5401 [CI] Turn on basic correctness tests for V1 (#10864) Tyler Michael Smith 2025-01-08 21:20:44 -05:00
3db0cafdf1 [Docs] Add Google Cloud Meetup (#11864) Simon Mo 2025-01-08 12:38:28 -08:00
526de822d5 [Kernel][Triton][AMD] Use block size heuristic for avg 2.8x speedup for int8 models (#11698) rasmith 2025-01-08 14:23:15 -06:00
56fe4c297c [TPU][Quantization] TPU W8A8 (#11785) Robert Shaw 2025-01-08 14:33:29 -05:00
47de8821d3 [Misc]add some explanations for BlockHashType (#11847) WangErXiao 2025-01-09 02:21:30 +08:00
5984499e47 [Doc] Expand Multimodal API Reference (#11852) Cyrus Leung 2025-01-09 01:14:14 +08:00
ca47e176af [Misc] Move some model utils into vision file (#11848) Cyrus Leung 2025-01-09 01:04:46 +08:00
78f4590b60 [Bugfix][XPU] fix silu_and_mul (#11823) Yan Ma 2025-01-09 00:11:50 +08:00
2f7024987e [CI/Build][Bugfix] Fix CPU CI image clean up (#11836) Li, Jiang 2025-01-08 23:18:28 +08:00
6cd40a5bfe [Doc][4/N] Reorganize API Reference (#11843) Cyrus Leung 2025-01-08 21:34:44 +08:00
aba8d6ee00 [Doc] Move examples into categories (#11840) Harry Mellor 2025-01-08 13:09:53 +00:00
2a0596bc48 [VLM] Reorganize profiling/processing-related code (#11812) Cyrus Leung 2025-01-08 18:59:58 +08:00
f12141170a [torch.compile] consider relevant code in compilation cache (#11614) youkaichao 2025-01-08 18:46:43 +08:00
cfd3219f58 [Hardware][Apple] Native support for macOS Apple Silicon (#11696) Wallas Henrique 2025-01-08 05:35:49 -03:00
a1b2b8606e [Docs] Update sponsor name: 'Novita' to 'Novita AI' (#11833) Simon Mo 2025-01-07 23:05:46 -08:00
ad9f1aa679 [doc] update wheels url (#11830) youkaichao 2025-01-08 14:36:49 +08:00
889e662eae [misc] improve memory profiling (#11809) youkaichao 2025-01-08 14:36:03 +08:00
ef68eb28d8 [Bug] Fix pickling of ModelConfig when RunAI Model Streamer is used (#11825) Cyrus Leung 2025-01-08 13:40:09 +08:00
259abd8953 [Docs] reorganize sponsorship page (#11639) Simon Mo 2025-01-07 21:16:08 -08:00
f645eb6954 [Bugfix] Add checks for LoRA and CPU offload (#11810) Jee Jee Li 2025-01-08 13:08:48 +08:00
f4923cb8bc [OpenVINO] Fixed Docker.openvino build (#11732) Ilya Lavrenov 2025-01-08 09:08:30 +04:00
b640b19cc0 Fixed docker build for ppc64le (#11518) Nishidha 2025-01-08 10:35:37 +05:30
dc71af0a71 Remove the duplicate imports of MultiModalKwargs and PlaceholderRange… (#11824) WangErXiao 2025-01-08 12:09:25 +08:00
4d29e91be8 [Misc] sort torch profiler table by kernel timing (#11813) Divakar Verma 2025-01-07 20:57:04 -06:00
91445c7bc8 [Bugfix] Fix image input for Pixtral-HF (#11741) Cyrus Leung 2025-01-08 10:17:16 +08:00
5950f555a1 [Doc] Group examples into categories (#11782) Harry Mellor 2025-01-08 01:20:12 +00:00
a4e2b26856 [Bugfix] Significant performance drop on CPUs with --num-scheduler-steps > 1 (#11794) Jie Fu (傅杰) 2025-01-08 08:15:50 +08:00
973f5dc581 [Doc]Add documentation for using EAGLE in vLLM (#11417) sroy745 2025-01-07 11:19:12 -08:00
c994223d56 [Bugfix] update the prefix for qwen2 (#11795) jiangjiadi 2025-01-08 02:36:34 +08:00
869579a702 [optimization] remove python function call for custom op (#11750) youkaichao 2025-01-08 01:04:28 +08:00
c0efe92d8b [Doc] Add note to gte-Qwen2 models (#11808) Cyrus Leung 2025-01-07 21:50:58 +08:00
d9fa1c05ad [doc] update how pip can install nightly wheels (#11806) youkaichao 2025-01-07 21:42:58 +08:00
2de197bdd4 [V1] Support audio language models on V1 (#11733) Roger Wang 2025-01-07 03:47:36 -08:00
869e829b85 [doc] add doc to explain how to use uv (#11773) youkaichao 2025-01-07 18:41:17 +08:00
8f37be38eb [Bugfix] Comprehensively test and fix LLaVA-NeXT feature size calculation (#11800) Cyrus Leung 2025-01-07 18:25:02 +08:00
8082ad7950 [V1][Doc] Update V1 support for LLaVa-NeXT-Video (#11798) Roger Wang 2025-01-07 01:55:39 -08:00
1e4ce295ae [CI][CPU] adding build number to docker image name (#11788) Yuan 2025-01-07 15:28:01 +08:00
ce1917fcf2 [Doc] Create a vulnerability management team (#9925) Russell Bryant 2025-01-07 01:57:32 -05:00
e512f76a89 fix init error for MessageQueue when n_local_reader is zero (#11768) XiaobingZhang 2025-01-07 14:12:48 +08:00
898cdf033e [CI] Fix neuron CI and run offline tests (#11779) Liangfu Chen 2025-01-06 21:36:10 -08:00
0f3f3c86ec [Bugfix] Update attention interface in Whisper (#11784) Roger Wang 2025-01-06 20:36:24 -08:00
b278557935 [Kernel][LoRA]Punica prefill kernels fusion (#11234) Jee Jee Li 2025-01-07 12:01:39 +08:00
8ceffbf315 [Doc][3/N] Reorganize Serving section (#11766) Cyrus Leung 2025-01-07 11:20:01 +08:00
d93d2d74fd [XPU] Make pp group initilized for pipeline-parallelism (#11648) YiSheng5 2025-01-07 11:09:58 +08:00
d0169e1b0f [Model] Future-proof Qwen2-Audio multi-modal processor (#11776) Cyrus Leung 2025-01-07 11:05:17 +08:00
08fb75c72e [Bugfix] Fix LLaVA-NeXT feature size precision error (for real) (#11772) Cyrus Leung 2025-01-07 09:10:54 +08:00
91b361ae89 [V1] Extend beyond image modality and support mixed-modality inference with Llava-OneVision (#11685) Roger Wang 2025-01-06 11:58:16 -08:00
e20c92bb61 [Kernel] Move attn_type to Attention.__init__() (#11690) Chen Zhang 2025-01-07 00:11:28 +08:00
32c9eff2ff [Bugfix][V1] Fix molmo text-only inputs (#11676) Jee Jee Li 2025-01-06 23:22:25 +08:00
4ca5d40adc [doc] explain how to add interleaving sliding window support (#11771) youkaichao 2025-01-06 21:57:44 +08:00
9279b9f83d [Bugfix] Fix max image size for LLaVA-Onevision (#11769) Roger Wang 2025-01-06 05:48:53 -08:00
ee77fdb5de [Doc][2/N] Reorganize Models and Usage sections (#11755) Cyrus Leung 2025-01-06 21:40:31 +08:00
996357e480 [VLM] Separate out profiling-related logic (#11746) Cyrus Leung 2025-01-06 16:02:21 +08:00
2a622d704a k8s-config: Update the secret to use stringData (#11679) Suraj Deshmukh 2025-01-06 00:01:22 -08:00
9c749713f6 [mypy] Forward pass function type hints in lora (#11740) Lucas Tucker 2025-01-06 01:59:36 -06:00
022c5c6944 [V1] Refactor get_executor_cls (#11754) Rui Qiao 2025-01-05 23:59:16 -08:00
f8fcca100b [Misc] Fix typo for valid_tool_parses (#11753) Rui Qiao 2025-01-05 23:12:38 -08:00
06bfb51963 [V1] Add BlockTable class (#11693) Woosuk Kwon 2025-01-06 14:24:42 +09:00
408e560015 [Bugfix] Remove block size constraint (#11723) Cody Yu 2025-01-05 20:49:55 -08:00
402d378360 [Doc] [1/N] Reorganize Getting Started section (#11645) Cyrus Leung 2025-01-06 10:18:33 +08:00
9e764e7b10 [distributed] remove pynccl's redundant change_state (#11749) cennn 2025-01-06 09:05:48 +08:00
33fc1e2e86 [Frontend] Improve StreamingResponse Exception Handling (#11752) Robert Shaw 2025-01-05 16:35:01 -05:00
eba17173d3 fix: [doc] fix typo (#11751) Lancer 2025-01-06 00:48:16 +08:00