Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

540c0368b1 [Model] Initialize Fuyu-8B support (#3924) Isotr0py 2024-07-14 13:27:14 +08:00
fb6af8bc08 [ Misc ] Apply MoE Refactor to Deepseekv2 To Support Fp8 (#6417) Robert Shaw 2024-07-13 23:03:58 -04:00
eeceadaecc [Misc] Add deprecation warning for beam search (#6402) Woosuk Kwon 2024-07-13 11:52:22 -07:00
babf52dade [ Misc ] More Cleanup of Marlin (#6359) Robert Shaw 2024-07-13 06:21:37 -04:00
9da4aad44b Updating LM Format Enforcer version to v10.3 (#6411) Noam Gat 2024-07-13 13:09:12 +03:00
41708e5034 [ci] try to add multi-node tests (#6280) youkaichao 2024-07-12 21:51:48 -07:00
d80aef3776 [Docs] Clean up latest news (#6401) Woosuk Kwon 2024-07-12 19:36:53 -07:00
e1684a766a [Bugfix] Fix hard-coded value of x in context_attention_fwd (#6373) Thomas Parnell 2024-07-13 03:30:54 +02:00
a27f87da34 [Doc] Fix Typo in Doc (#6392) Saliya Ekanayake 2024-07-12 17:48:23 -07:00
16ff6bd58c [ci] Fix wording for GH bot (#6398) Kevin H. Luu 2024-07-12 16:34:37 -07:00
f8f9ff57ee [Bugfix][TPU] Fix megacore setting for v5e-litepod (#6397) Woosuk Kwon 2024-07-12 15:59:47 -07:00
6bc9710f6e Fix release pipeline's dir permission (#6391) Simon Mo 2024-07-12 15:52:43 -07:00
111fc6e7ec [Misc] Add generated git commit hash as vllm.__commit__ (#6386) Michael Goin 2024-07-12 18:52:15 -04:00
75f64d8b94 [Bugfix] Fix illegal memory access in FP8 MoE kernel (#6382) Cody Yu 2024-07-12 14:33:33 -07:00
21b2dcedab Fix release pipeline's -e flag (#6390) Simon Mo 2024-07-12 14:08:04 -07:00
07b35af86d Fix interpolation in release pipeline (#6389) Simon Mo 2024-07-12 14:03:39 -07:00
bb1a784b05 Fix release-pipeline.yaml (#6388) Simon Mo 2024-07-12 14:00:57 -07:00
d719ba24c5 Build some nightly wheels by default (#6380) Simon Mo 2024-07-12 13:56:59 -07:00
aa48e502fb [MISC] Upgrade dependency to PyTorch 2.3.1 (#5327) Cody Yu 2024-07-12 12:04:26 -07:00
4dbebd03cc [ci] Add GHA workflows to enable full CI run (#6381) Kevin H. Luu 2024-07-12 11:36:26 -07:00
b75bce1008 [ci] Add grouped tests & mark tests to run by default for fastcheck pipeline (#6365) Kevin H. Luu 2024-07-12 09:58:38 -07:00
b039cbbce3 [Misc] add fixture to guided processor tests (#6341) Yihuan Bu 2024-07-12 12:55:39 -04:00
f9d25c2519 [Build/CI] Checking/Waiting for the GPU's clean state (#6379) Alexei-V-Ivanov-AMD 2024-07-12 11:42:24 -05:00
024ad87cdc [Bugfix] Fix dtype mismatch in PaliGemma (#6367) Cyrus Leung 2024-07-12 23:22:18 +08:00
aea19f0989 [ Misc ] Support Models With Bias in compressed-tensors integration (#6356) Robert Shaw 2024-07-12 11:11:29 -04:00
f7160d946a [Misc][Bugfix] Update transformers for tokenizer issue (#6364) Roger Wang 2024-07-12 01:40:07 -07:00
6047187cd8 [ Misc ] Remove separate bias add (#6353) Robert Shaw 2024-07-12 01:06:09 -04:00
b6c16cf8ff [ROCm][AMD] unify CUDA_VISIBLE_DEVICES usage in cuda/rocm (#6352) Hongxia Yang 2024-07-12 00:30:46 -04:00
d26a8b3f1f [CI/Build] (2/2) Switching AMD CI to store images in Docker Hub (#6350) adityagoel14 2024-07-12 00:26:26 -04:00
d59eb98489 [Model][Phi3-Small] Remove scipy from blocksparse_attention (#6343) Michael Goin 2024-07-11 22:47:17 -04:00
adf32e0a0f [Bugfix] Fix usage stats logging exception warning with OpenVINO (#6349) Helena Kloosterman 2024-07-12 04:47:00 +02:00
2b0fb53481 [distributed][misc] be consistent with pytorch for libcudart.so (#6346) youkaichao 2024-07-11 19:35:17 -07:00
d6ab528997 [Misc] Remove flashinfer warning, add flashinfer tests to CI (#6351) Lily Liu 2024-07-11 18:32:06 -07:00
7ed6a4f0e1 [ BugFix ] Prompt Logprobs Detokenization (#6223) Robert Shaw 2024-07-11 18:02:29 -04:00
a4feba929b [CI/Build] Add nightly benchmarking for tgi, tensorrt-llm and lmdeploy (#5362) Kuntai Du 2024-07-11 13:28:38 -07:00
2d23b42d92 [doc] update pipeline parallel in readme (#6347) youkaichao 2024-07-11 11:38:40 -07:00
1df43de9bb [bug fix] Fix llava next feature size calculation. (#6339) xwjiang2010 2024-07-11 10:21:10 -07:00
52b7fcb35a Benchmark: add H100 suite (#6047) Simon Mo 2024-07-11 09:17:07 -07:00
b675069d74 [ Misc ] Refactor Marlin Python Utilities (#6082) Robert Shaw 2024-07-11 11:40:11 -04:00
55f692b46e [BugFix] get_and_reset only when scheduler outputs are not empty (#6266) Mor Zusman 2024-07-11 17:40:20 +03:00
8a1415cf77 [Bugfix] GPTBigCodeForCausalLM: Remove lm_head from supported_lora_modules. (#6326) Thomas Parnell 2024-07-11 16:05:59 +02:00
546b101fa0 [BugFix]: fix engine timeout due to request abort (#6255) pushan 2024-07-11 21:46:31 +08:00
3963a5335b [Misc] refactor(config): clean up unused code (#6320) aniaan 2024-07-11 17:39:07 +08:00
c4774eb841 [Bugfix] Fix snapshot download in serving benchmark (#6318) Roger Wang 2024-07-11 00:04:05 -07:00
fc17110bbe [BugFix]: set outlines pkg version (#6262) Lim Xiang Yang 2024-07-11 12:37:11 +08:00
439c84581a [Doc] Update description of vLLM support for CPUs (#6003) Jie Fu (傅杰) 2024-07-11 12:15:29 +08:00
99ded1e1c4 [Doc] Remove comments incorrectly copied from another project (#6286) daquexian 2024-07-11 01:05:26 +01:00
997df46a32 [Bugfix][Neuron] Fix soft prompt method error in NeuronExecutor (#6313) Woosuk Kwon 2024-07-10 16:39:02 -07:00
ae151d73be [Speculative Decoding] Enabling bonus token in speculative decoding for KV cache based models (#5765) sroy745 2024-07-10 16:02:47 -07:00
44cc76610d [Bugfix] Fix OpenVINOExecutor abstractmethod error (#6296) sangjune.park 2024-07-11 02:03:32 +09:00
b422d4961a [CI/Build] Enable mypy typing for remaining folders (#6268) Benjamin Muskalla 2024-07-10 16:15:55 +02:00
c38eba3046 [Bugfix] MLPSpeculator: Use ParallelLMHead in tie_weights=False case. (#6303) Thomas Parnell 2024-07-10 15:04:07 +02:00
e72ae80b06 [Bugfix] Support 2D input shape in MoE layer (#6287) Woosuk Kwon 2024-07-10 06:03:16 -07:00
8a924d2248 [Doc] Guide for adding multi-modal plugins (#6205) Cyrus Leung 2024-07-10 14:55:34 +08:00
5ed3505d82 [Bugfix][TPU] Add prompt adapter methods to TPUExecutor (#6279) Woosuk Kwon 2024-07-09 19:30:56 -07:00
da78caecfa [core][distributed] zmq fallback for broadcasting large objects (#6183) youkaichao 2024-07-09 18:49:11 -07:00
2416b26e11 [Speculative Decoding] Medusa Implementation with Top-1 proposer (#4978) Abhinav Goyal 2024-07-10 07:04:02 +05:30
d3a245138a [Bugfix]fix and needs_scalar_to_array logic check (#6238) Baoyuan Qi 2024-07-10 07:43:24 +08:00
673dd4cae9 [Docs] Docs update for Pipeline Parallel (#6222) Murali Andoorveedu 2024-07-09 16:24:58 -07:00
4d6ada947c [CORE] Adding support for insertion of soft-tuned prompts (#4645) Swapnil Parekh 2024-07-09 16:26:36 -04:00
a0550cbc80 Add support for multi-node on CI (#5955) Kevin H. Luu 2024-07-09 12:56:56 -07:00
08c5bdecae [Bugfix][TPU] Fix outlines installation in TPU Dockerfile (#6256) Woosuk Kwon 2024-07-09 02:56:06 -07:00
5d5b4c5fe5 [Bugfix][TPU] Add missing None to model input (#6245) Woosuk Kwon 2024-07-09 00:21:37 -07:00
70c232f85a [core][distributed] fix ray worker rank assignment (#6235) youkaichao 2024-07-08 21:31:44 -07:00
a3c9435d93 [hardware][cuda] use device id under CUDA_VISIBLE_DEVICES for get_device_capability (#6216) youkaichao 2024-07-08 20:02:15 -07:00
4f0e0ea131 Add FlashInfer to default Dockerfile (#6172) Simon Mo 2024-07-08 13:38:03 -07:00
ddc369fba1 [Bugfix] Mamba cache Cuda Graph padding (#6214) tomeras91 2024-07-08 21:25:51 +03:00
185ad31f37 [Bugfix] use diskcache in outlines _get_guide #5436 (#6203) Eric 2024-07-09 02:23:24 +08:00
543aa48573 [Kernel] Correctly invoke prefill & decode kernels for cross-attention (towards eventual encoder/decoder model support) (#4888) afeldman-nm 2024-07-08 13:12:15 -04:00
f7a8fa39d8 [Kernel] reloading fused_moe config on the last chunk (#6210) Avshalom Manevich 2024-07-08 18:00:38 +03:00
717f4bcea0 Feature/add benchmark testing (#5947) Haichuan 2024-07-08 15:52:06 +08:00
16620f439d do not exclude object field in CompletionStreamResponse (#6196) kczimm 2024-07-07 21:32:57 -05:00
3b08fe2b13 [misc][frontend] log all available endpoints (#6195) youkaichao 2024-07-07 15:11:12 -07:00
abfe705a02 [ Misc ] Support Fp8 via llm-compressor (#6110) Robert Shaw 2024-07-07 16:42:11 -04:00
333306a252 add benchmark for fix length input and output (#5857) Haichuan 2024-07-07 15:42:13 +08:00
6206dcb29e [Model] Add PaliGemma (#5189) Roger Wang 2024-07-06 18:25:50 -07:00
9389380015 [Doc] Move guide for multimodal model and other improvements (#6168) Cyrus Leung 2024-07-06 17:18:59 +08:00
175c43eca4 [Doc] Reorganize Supported Models by Type (#6167) Roger Wang 2024-07-05 22:59:36 -07:00
bc96d5c330 Move release wheel env var to Dockerfile instead (#6163) Simon Mo 2024-07-05 17:19:53 -07:00
f0250620dd Fix release wheel build env var (#6162) Simon Mo 2024-07-05 16:24:31 -07:00
2de490d60f Update wheel builds to strip debug (#6161) Simon Mo 2024-07-05 14:51:25 -07:00
79d406e918 [Docs] Fix readthedocs for tag build (#6158) (tag: v0.5.1) Simon Mo 2024-07-05 12:44:40 -07:00
abad5746a7 bump version to v0.5.1 (#6157) Simon Mo 2024-07-05 12:04:51 -07:00
e58294ddf2 [Bugfix] Add verbose error if scipy is missing for blocksparse attention (#5695) JGSweets 2024-07-05 12:41:01 -05:00
f1e15da6fe [Frontend] Continuous usage stats in OpenAI completion API (#5742) jvlunteren 2024-07-05 19:37:09 +02:00
0097bb1829 [Bugfix] Use templated datasource in grafana.json to allow automatic imports (#6136) Christian Rohmann 2024-07-05 18:49:47 +02:00
ea4b570483 [VLM] Cleanup validation and update docs (#6149) Cyrus Leung 2024-07-05 13:49:38 +08:00
a41357e941 [VLM] Improve consistency between feature size calculation and dummy data for profiling (#6146) Roger Wang 2024-07-04 18:29:47 -07:00
ae96ef8fbd [VLM] Calculate maximum number of multi-modal tokens by model (#6121) Cyrus Leung 2024-07-05 07:37:23 +08:00
69ec3ca14c [Kernel][Model] logits_soft_cap for Gemma2 with flashinfer (#6051) Lily Liu 2024-07-04 16:35:51 -07:00
81d7a50f24 [Hardware][Intel CPU] Adding intel openmp tunings in Docker file (#6008) Yuan 2024-07-05 06:22:12 +08:00
27902d42be [misc][doc] try to add warning for latest html (#5979) youkaichao 2024-07-04 09:57:09 -07:00
56b325e977 [ROCm][AMD][Model]Adding alibi slopes support in ROCm triton flash attention and naive flash attention (#6043) Gregory Shtrasberg 2024-07-04 01:19:38 -04:00
3dd507083f [CI/Build] Cleanup VLM tests (#6107) Cyrus Leung 2024-07-04 09:58:18 +08:00
0ed646b7aa [Distributed][Core] Support Py39 and Py38 for PP (#6120) Murali Andoorveedu 2024-07-03 17:52:29 -07:00
1dab9bc8a9 [Bugfix] set OMP_NUM_THREADS to 1 by default for multiprocessing (#6109) Travis Johnson 2024-07-03 17:56:59 -06:00
3de6e6a30e [core][distributed] support n layers % pp size != 0 (#6115) youkaichao 2024-07-03 16:40:31 -07:00
966fe72141 [doc][misc] bump up py version in installation doc (#6119) youkaichao 2024-07-03 15:52:04 -07:00
62963d129e [ Misc ] Clean Up CompressedTensorsW8A8 (#6113) Robert Shaw 2024-07-03 18:50:08 -04:00
d9e98f42e4 [vlm] Remove vision language config. (#6089) xwjiang2010 2024-07-03 15:14:16 -07:00
