Commit Graph - vllm - Gitea: Git with a cup of tea

biondizzle/vllm

Commit Graph

6d917d0eeb Enable mypy checking on V1 code (#11105) Mark McLoughlin 2024-12-14 17:54:04 +00:00
93abf23a64 [VLM] Fully dynamic prompt replacement in merged input processor (#11199) Cyrus Leung 2024-12-15 01:52:18 +08:00
9c3dadd1c9 [Frontend] Add logits_processors as an extra completion argument (#11150) Brad Hilton 2024-12-14 09:46:42 -07:00
3cb5769883 [Misc] Minor improvements to the readability of PunicaWrapperBase (#11200) Jee Jee Li 2024-12-15 00:38:27 +08:00
ea7bd68d10 [V1][Bugfix] Fix V1 TP trust-remote-code (#11182) Tyler Michael Smith 2024-12-14 03:21:23 -05:00
48259264a4 [Core] Update outlines and increase its threadpool size (#11140) Russell Bryant 2024-12-14 02:46:18 -05:00
24a3d12b82 update compressed-tensors to latest version (#11183) dhuangnm 2024-12-13 22:22:44 -05:00
9855aea21b [Bugfix][V1] Re-compute an entire block when fully cache hit (#11186) Cody Yu 2024-12-13 17:08:23 -08:00
4b5b8a6a3b [V1][Bugfix] Fix EngineCoreProc profile (#11185) Tyler Michael Smith 2024-12-13 20:02:35 -05:00
4863e5fba5 [Core] V1: Use multiprocessing by default (#11074) Russell Bryant 2024-12-13 19:27:32 -05:00
0d8451c3a4 [Distributed] Allow the placement group more time to wait for resources to be ready (#11138) Jiaxin Shan 2024-12-13 12:17:37 -08:00
0a56bcc03d [Bugfix][Hardware][CPU] Enable Gemma2 with SDPA on CPU backend (#11169) Jani Monoses 2024-12-13 20:00:40 +02:00
0920ab9131 [Doc] Reorganize online pooling APIs (#11172) Cyrus Leung 2024-12-14 00:22:22 +08:00
238c0d93b4 [Misc] Add tokenizer_mode param to benchmark_serving.py (#11174) Alexander Matveev 2024-12-13 11:19:10 -05:00
5b0ed8391d [Bugfix] using len(tokenizer) instead of tokenizer.vocab_size in AllowedTokenIdsLogitsProcessor (#11156) zhangjf 2024-12-13 23:56:19 +08:00
c31d4a57a6 [Core] support LoRA and prompt adapter in content-based hashing for Block Manager v2 prefix caching (#8240) Sungjae Lee 2024-12-14 00:51:25 +09:00
d1fa714cb1 [Refactor]A simple device-related refactor (#11163) Chenguang Li 2024-12-13 21:39:00 +08:00
969da7d70b [V1][VLM] Fix edge case bug for InternVL2 (#11165) Roger Wang 2024-12-13 03:09:30 -08:00
eeec9e3390 [Frontend] Separate pooling APIs in offline inference (#11129) Cyrus Leung 2024-12-13 18:40:07 +08:00
f93bf2b189 [Bugfix][CI][CPU] add missing datasets package to requirements-cpu.txt (#11159) Li, Jiang 2024-12-13 16:50:35 +08:00
7cd7409142 PaliGemma 2 support (#11142) Jani Monoses 2024-12-13 09:40:07 +02:00
be39e3cd18 [core] clean up cudagraph batchsize padding logic (#10996) youkaichao 2024-12-12 22:57:50 -08:00
34f1a806d5 [Bugfix][V1] Fix 'NoneType' object has no attribute 'hash_value' (#11157) Cody Yu 2024-12-12 22:30:06 -08:00
00c1bde5d8 [ROCm][AMD] Disable auto enabling chunked prefill on ROCm (#11146) Gregory Shtrasberg 2024-12-13 00:31:26 -05:00
3989a79824 [Bugfix] Update starcoder2 to remap k/v scale names for kv_cache quantization (#11148) Dipika Sikka 2024-12-13 00:07:20 -05:00
1efce68605 [Bugfix] Use runner_type instead of task in GritLM (#11144) Pooya Davoodi 2024-12-12 20:09:53 -08:00
30870b4f66 [torch.compile] Dynamic fp8 + rms_norm fusion (#10906) Luka Govedič 2024-12-12 22:19:23 -05:00
78ed8f57d8 [Misc][V1] Fix type in v1 prefix caching (#11151) Cody Yu 2024-12-12 16:57:40 -08:00
db6c264a1e [Bugfix] Fix value unpack error of simple connector for KVCache transfer. (#11058) shangmingc 2024-12-13 05:19:17 +08:00
9f3974a319 Fix logging of the vLLM Config (#11143) Jeremy Arnold 2024-12-12 14:05:57 -06:00
2c97eca1ff [Misc] Validate grammar and fail early (#11119) Cody Yu 2024-12-12 10:34:26 -08:00
5d712571af [Bugfix] Quick fix to make Pixtral-HF load correctly again after 39e227c7ae. (#11024) Jeff Cook 2024-12-12 11:09:20 -07:00
d4d5291cc2 fix(docs): typo in helm install instructions (#11141) Ramon Ziai 2024-12-12 18:36:32 +01:00
4816d20aa4 [V1] Fix torch profiling for offline inference (#11125) Roger Wang 2024-12-12 07:51:53 -08:00
85362f028c [Misc][LoRA] Ensure Lora Adapter requests return adapter name (#11094) Jiaxin Shan 2024-12-12 01:25:16 -08:00
62de37a38e [core][distributed] initialization from StatelessProcessGroup (#10986) youkaichao 2024-12-12 01:04:19 -08:00
8195824206 [Hardware][Intel-Gaudi] Enable LoRA support for Intel Gaudi (HPU) (#10565) Sanju C Sudhakaran 2024-12-12 13:39:28 +05:30
f092153fbe [V1] Use more persistent buffers to optimize input preparation overheads (#11111) Woosuk Kwon 2024-12-11 23:14:20 -08:00
1da8f0e1dd [Model] Add support for embedding model GritLM (#10816) Pooya Davoodi 2024-12-11 22:39:16 -08:00
ccede2b264 [Core] cleanup zmq ipc sockets on exit (#11115) Russell Bryant 2024-12-11 22:12:24 -05:00
24a36d6d5f Update link to LlamaStack remote vLLM guide in serving_with_llamastack.rst (#11112) Yuan Tang 2024-12-11 21:39:21 -05:00
8fb26dac61 [Docs] Add media kit (#11121) Simon Mo 2024-12-11 17:33:11 -08:00
7439a8b5fc [Bugfix] Multiple fixes to tool streaming with hermes and mistral (#10979) Clayton 2024-12-11 17:10:12 -08:00
4e11683368 [V1] VLM preprocessor hashing (#11020) Alexander Matveev 2024-12-11 19:55:30 -05:00
452a723bf2 [V1][Core] Remove should_shutdown to simplify core process termination (#11113) Tyler Michael Smith 2024-12-11 18:34:54 -05:00
d1e21a979b [CI/Build] Split up VLM tests (#11083) Cyrus Leung 2024-12-12 06:18:16 +08:00
72ff3a9686 [core] Bump ray to use _overlap_gpu_communication in compiled graph tests (#10410) Rui Qiao 2024-12-11 11:36:35 -08:00
66aaa7722d [torch.compile] remove graph logging in ci (#11110) youkaichao 2024-12-11 10:59:50 -08:00
d643c2aba1 [V1] Use input_ids as input for text-only models (#11032) Woosuk Kwon 2024-12-11 10:49:23 -08:00
91642db952 [torch.compile] use depyf to dump torch.compile internals (#10972) youkaichao 2024-12-11 10:43:05 -08:00
fd22220687 [Doc] Installed version of llmcompressor for int8/fp8 quantization (#11103) bingps 2024-12-11 23:43:24 +08:00
b2f775456e [CI/Build] Enable prefix caching test for AMD (#11098) hissu-hyvarinen 2024-12-11 17:23:37 +02:00
cad5c0a6ed [Doc] Update docs to refer to pooling models (#11093) Cyrus Leung 2024-12-11 21:36:27 +08:00
8f10d5e393 [Misc] Split up pooling tasks (#10820) Cyrus Leung 2024-12-11 17:28:00 +08:00
40766ca1b8 [Bugfix]: Clamp -inf logprob values in prompt_logprobs (#11073) Rafael Vasquez 2024-12-11 04:27:39 -05:00
2e32f5d28d [Bugfix] Fix Idefics3 fails during multi-image inference (#11080) B-201 2024-12-11 17:27:07 +08:00
61b1d2f6ae [Core] v1: Use atexit to handle engine core client shutdown (#11076) Russell Bryant 2024-12-11 04:26:36 -05:00
9974fca047 [ci/build] Fix entrypoints test and pin outlines version (#11088) Kevin H. Luu 2024-12-11 01:01:53 -08:00
3fb4b4f163 [ci/build] Fix AMD CI dependencies (#11087) Kevin H. Luu 2024-12-11 00:39:53 -08:00
2e33fe4191 [CI/Build] Check transformers v4.47 (#10991) Cyrus Leung 2024-12-11 13:02:02 +08:00
e39400a4b6 Fix streaming for granite tool call when <|tool_call|> is present (#11069) Maximilien de Bayser 2024-12-11 01:51:40 -03:00
ffa48c9146 [Model] PP support for Mamba-like models (#10992) Mor Zusman 2024-12-11 04:53:37 +02:00
d5c5154fcf [Misc] LoRA + Chunked Prefill (#9057) Aurick Qiao 2024-12-10 21:09:20 -05:00
9a93973708 [Bugfix] Fix Mamba multistep (#11071) Tyler Michael Smith 2024-12-10 19:16:22 -05:00
134810b3d9 [V1][Bugfix] Always set enable_chunked_prefill = True for V1 (#11061) Woosuk Kwon 2024-12-10 14:41:23 -08:00
75f89dc44c [torch.compile] add a flag to track batchsize statistics (#11059) youkaichao 2024-12-10 12:40:52 -08:00
e739194926 [Core] Update to outlines >= 0.1.8 (#10576) Russell Bryant 2024-12-10 15:08:16 -05:00
250ee65d72 [BUG] Remove token param #10921 (#11022) Flávia Béo 2024-12-10 14:38:15 -03:00
9b9cef3145 [Bugfix] Backport request id validation to v0 (#11036) Joe Runde 2024-12-10 09:38:23 -07:00
d05f88679b [Misc][LoRA] Add PEFTHelper for LoRA (#11003) Jee Jee Li 2024-12-10 19:12:01 +08:00
beb16b2c81 [Bugfix] Handle <|tool_call|> token in granite tool parser (#11039) Travis Johnson 2024-12-10 03:27:11 -07:00
fe2e10c71b Add example of helm chart for vllm deployment on k8s (#9199) Maxime Fournioux 2024-12-10 10:19:27 +01:00
82c73fd510 [Bugfix] cuda error running llama 3.2 (#11047) Gene Der Su 2024-12-09 23:41:11 -08:00
bfd610430c Update README.md (#11034) Diego Marinho 2024-12-10 18:08:10 +11:00
e35879c276 [Bugfix] Fix xgrammar failing to read a vocab_size from LlavaConfig on PixtralHF. (#11043) Jeff Cook 2024-12-09 23:54:22 -07:00
ebf778061d monitor metrics of tokens per step using cudagraph batchsizes (#11031) youkaichao 2024-12-09 22:35:36 -08:00
28b3a1c7e5 [V1] Multiprocessing Tensor Parallel Support for v1 (#9856) Tyler Michael Smith 2024-12-10 01:28:14 -05:00
bc192a2b09 [Pixtral] Improve loading (#11040) Patrick von Platen 2024-12-10 07:09:32 +01:00
980ad394a8 [Frontend] Use request id from header (#10968) Joe Runde 2024-12-09 22:46:29 -07:00
391d7b2763 [Bugfix] Fix usage of deprecated decorator (#11025) Cyrus Leung 2024-12-10 13:45:47 +08:00
d1f6d1c8af [Model] Add has_weight to RMSNorm and re-enable weights loading tracker for Mamba (#10739) Isotr0py 2024-12-10 10:23:07 +08:00
6d525288c1 [Docs] Add dedicated tool calling page to docs (#10554) Michael Goin 2024-12-09 20:15:34 -05:00
6faec54505 [V1] Do not store None in self.generators (#11038) Woosuk Kwon 2024-12-09 15:08:19 -08:00
5ed5d5f128 Build tpu image in release pipeline (#10936) Richard Liu 2024-12-09 15:07:48 -08:00
b63ba84832 [ROCm][bugfix] scpecilative decoding worker class (#11035) Gregory Shtrasberg 2024-12-09 17:00:29 -05:00
9c6459e4cb [Neuron] Upgrade neuron to 2.20.2 (#11016) xendo 2024-12-09 22:53:24 +01:00
1a2f8fb828 [v1] fix use compile sizes (#11000) youkaichao 2024-12-09 13:47:24 -08:00
cbcbdb1ceb [Bugfix][Hardware][Gaudi] Bump vllm_hpu_extension version (#11028) Konrad Zawora 2024-12-09 22:21:06 +01:00
a811dd6608 [Model] merged input processor for Phi-3-Vision models (#10977) Isotr0py 2024-12-10 04:55:10 +08:00
ca871491ed [Misc][LoRA] Abstract PunicaWrapper (#10955) Jee Jee Li 2024-12-10 04:54:44 +08:00
3b61cb450d [V1] Further reduce CPU overheads in flash-attn (#10989) Woosuk Kwon 2024-12-09 12:38:46 -08:00
edc4fa3188 [ci/build] Recompile CI dependencies list with Python 3.12 (#11013) Kevin H. Luu 2024-12-09 11:46:58 -08:00
25b79d9fd3 [V1] Input Batch Relocation (#10962) Varun Sundar Rabindranath 2024-12-09 12:33:41 -05:00
aea2fc38c3 [Platform] Move async output check to platform (#10768) wangxiyuan 2024-12-10 01:24:46 +08:00
e691b26f6f [Core] Require xgrammar >= 0.1.6 (#11021) Russell Bryant 2024-12-09 11:44:27 -05:00
c690357928 [V1] Fix Detokenizer loading in AsyncLLM (#10997) Roger Wang 2024-12-09 08:27:10 -08:00
d1c2e15eb3 [torch.compile] add dynamo time tracking (#11005) youkaichao 2024-12-08 23:09:04 -08:00
af7c4a92e6 [Doc][V1] Add V1 support column for multimodal models (#10998) Roger Wang 2024-12-08 22:29:16 -08:00
46004e83a2 [misc] clean up and unify logging (#10999) youkaichao 2024-12-08 17:28:27 -08:00
43b05fa314 [torch.compile][misc] fix comments (#10993) youkaichao 2024-12-08 11:18:18 -08:00
