Commit Graph

3022 Commits

Author SHA1 Message Date
Daniele
a2c71c5405 [CI/Build] remove .github from .dockerignore, add dirty repo check (#9375)
Some checks failed
Create Release / Create Release (push) Has been cancelled
Create Release / Build Wheel (11.8, ubuntu-20.04, 3.10, 2.4.0) (push) Has been cancelled
Create Release / Build Wheel (11.8, ubuntu-20.04, 3.11, 2.4.0) (push) Has been cancelled
Create Release / Build Wheel (11.8, ubuntu-20.04, 3.12, 2.4.0) (push) Has been cancelled
Create Release / Build Wheel (11.8, ubuntu-20.04, 3.8, 2.4.0) (push) Has been cancelled
Create Release / Build Wheel (11.8, ubuntu-20.04, 3.9, 2.4.0) (push) Has been cancelled
Create Release / Build Wheel (12.1, ubuntu-20.04, 3.10, 2.4.0) (push) Has been cancelled
Create Release / Build Wheel (12.1, ubuntu-20.04, 3.11, 2.4.0) (push) Has been cancelled
Create Release / Build Wheel (12.1, ubuntu-20.04, 3.12, 2.4.0) (push) Has been cancelled
Create Release / Build Wheel (12.1, ubuntu-20.04, 3.8, 2.4.0) (push) Has been cancelled
Create Release / Build Wheel (12.1, ubuntu-20.04, 3.9, 2.4.0) (push) Has been cancelled
v0.6.3.post1
2024-10-17 10:25:06 -07:00
Kuntai Du
81ede99ca4 [Core] Deprecating block manager v1 and make block manager v2 default (#8704)
Removing the block manager v1. This is the initial piece of prefix-caching-centric design. In order to achieve prefix-caching-centric design, we need to simplify the code path so that we only use v2 block manager (which has much higher performance on prefix caching).
2024-10-17 11:38:15 -05:00
Li, Jiang
5eda21e773 [Hardware][CPU] compressed-tensor INT8 W8A8 AZP support (#9344) 2024-10-17 12:21:04 -04:00
Woosuk Kwon
8e1cddcd44 [TPU] Call torch._sync(param) during weight loading (#9437) 2024-10-17 09:00:11 -07:00
sasha0552
5e443b594f [Bugfix] Allow prefill of assistant response when using mistral_common (#9446) 2024-10-17 15:06:37 +00:00
Lucas Wilkinson
9d30a056e7 [misc] CUDA Time Layerwise Profiler (#8337)
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
2024-10-17 10:36:09 -04:00
Cyrus Leung
390be74649 [Misc] Print stack trace using logger.exception (#9461) 2024-10-17 13:55:48 +00:00
Lucas Wilkinson
e312e52b44 [Kernel] Add Exllama as a backend for compressed-tensors (#9395) 2024-10-17 09:48:26 -04:00
Yuan Tang
dbfa8d31d5 Add notes on the use of Slack (#9442) 2024-10-17 04:46:46 +00:00
rasmith
92d86da217 [BugFix] [Kernel] Fix GPU SEGV occurring in int8 kernels (#9391) 2024-10-17 01:34:06 +00:00
Tyler Michael Smith
c3fab5f769 [Bugfix][Kernel] Prevent integer overflow in fp8 dynamic per-token quantize kernel (#9425) 2024-10-16 23:46:06 +00:00
Russell Bryant
776dbd74f1 [CI/Build] mypy: Resolve some errors from checking vllm/engine (#9267)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
2024-10-16 22:55:59 +00:00
Lily Liu
8345045833 [Performance][Spec Decode] Optimize ngram lookup performance (#9333) 2024-10-16 13:37:45 -06:00
Junhao Li
5b8a1fde84 [Model][Bugfix] Add FATReLU activation and support for openbmb/MiniCPM-S-1B-sft (#9396) 2024-10-16 16:40:24 +00:00
Mor Zusman
fb60ae9b91 [Kernel][Model] Improve continuous batching for Jamba and Mamba (#9189) 2024-10-16 12:12:43 -04:00
Patrick von Platen
415f76a9cb Support mistral interleaved attn (#9414) 2024-10-16 13:28:30 +00:00
Isotr0py
cf1d62a644 [Model] Support SDPA attention for Molmo vision backbone (#9410) 2024-10-16 11:52:01 +00:00
Roger Wang
59230ef32b [Misc] Consolidate example usage of OpenAI client for multimodal models (#9412)
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-10-16 11:20:51 +00:00
Cyrus Leung
cee711fdbb [Core] Rename input data types (#8688) 2024-10-16 10:49:37 +00:00
Cyrus Leung
1de76a0e55 [CI/Build] Test VLM embeddings (#9406) 2024-10-16 09:44:30 +00:00
Cyrus Leung
7abba39ee6 [Model] VLM2Vec, the first multimodal embedding model in vLLM (#9303) 2024-10-16 14:31:00 +08:00
Cyrus Leung
7e7eae338d [Misc] Standardize RoPE handling for Qwen2-VL (#9250) 2024-10-16 13:56:17 +08:00
Reza Salehi
ed920135c8 [Bugfix] Molmo text-only input bug fix (#9397)
Co-authored-by: sanghol <sanghol@allenai.org>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-10-16 04:56:09 +00:00
Lucas Wilkinson
717a5f82cd [Bugfix][CI/Build] Fix CUDA 11.8 Build (#9386) 2024-10-16 00:15:21 +00:00
Chang Su
ba30942240 [Bugfix] Fix vLLM UsageInfo and logprobs None AssertionError with empty token_ids (#9034)
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
2024-10-15 15:40:43 -07:00
Michael Goin
22f8a69549 [Misc] Directly use compressed-tensors for checkpoint definitions (#8909)
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2024-10-15 15:40:25 -07:00
Grace Ho
5d264f4ab8 pass ignore_eos parameter to all benchmark_serving calls (#9349) 2024-10-15 13:30:44 -07:00
Nick Hill
e9d517f276 [BugFix] Fix chat API continuous usage stats (#9357) 2024-10-14 23:19:48 -07:00
hhzhang16
55e081fbad [Bugfix] Update InternVL input mapper to support image embeds (#9351) 2024-10-14 21:29:19 -07:00
Michael Goin
8e836d982a [Doc] Fix code formatting in spec_decode.rst (#9348) 2024-10-14 21:29:11 -07:00
Steve Grubb
44eaa5a5d9 [Frontend] Clarify model_type error messages (#9345) 2024-10-14 21:29:01 -07:00
Tyler Michael Smith
169b530607 [Bugfix] Clean up some cruft in mamba.py (#9343) 2024-10-15 00:24:25 +00:00
Xiang Xu
f0fe4fe86d [Model] Make llama3.2 support multiple and interleaved images (#9095) 2024-10-14 15:24:26 -07:00
Brendan Wong
4d31cd424b [Frontend] merge beam search implementations (#9296) 2024-10-14 15:05:52 -07:00
Woosuk Kwon
473e7b3606 [TPU] Fix TPU SMEM OOM by Pallas paged attention kernel (#9350) 2024-10-14 15:02:06 -07:00
Simon Mo
fd47e57f4b [Docs] Remove PDF build from Readtehdocs (#9347)
Some checks failed
Create Release / Create Release (push) Has been cancelled
Create Release / Build Wheel (11.8, ubuntu-20.04, 3.10, 2.4.0) (push) Has been cancelled
Create Release / Build Wheel (11.8, ubuntu-20.04, 3.11, 2.4.0) (push) Has been cancelled
Create Release / Build Wheel (11.8, ubuntu-20.04, 3.12, 2.4.0) (push) Has been cancelled
Create Release / Build Wheel (11.8, ubuntu-20.04, 3.8, 2.4.0) (push) Has been cancelled
Create Release / Build Wheel (11.8, ubuntu-20.04, 3.9, 2.4.0) (push) Has been cancelled
Create Release / Build Wheel (12.1, ubuntu-20.04, 3.10, 2.4.0) (push) Has been cancelled
Create Release / Build Wheel (12.1, ubuntu-20.04, 3.11, 2.4.0) (push) Has been cancelled
Create Release / Build Wheel (12.1, ubuntu-20.04, 3.12, 2.4.0) (push) Has been cancelled
Create Release / Build Wheel (12.1, ubuntu-20.04, 3.8, 2.4.0) (push) Has been cancelled
Create Release / Build Wheel (12.1, ubuntu-20.04, 3.9, 2.4.0) (push) Has been cancelled
v0.6.3
2024-10-14 11:57:47 -07:00
Daniele
203ab8f80f [CI/Build] setuptools-scm fixes (#8900) 2024-10-14 11:34:47 -07:00
Kunshang Ji
4141608c6a [Hardware][intel GPU] add async output process for xpu (#8897) 2024-10-14 12:23:33 -06:00
Reza Salehi
dfe43a2071 [Model] Molmo vLLM Integration (#9016)
Co-authored-by: sanghol <sanghol@allenai.org>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-10-14 07:56:24 -07:00
Tyler Michael Smith
16b24e7dcd [Bugfix] Bandaid fix for speculative decoding tests (#9327) 2024-10-13 23:02:11 +00:00
Lily Liu
f519902c52 [CI] Fix merge conflict (#9317) 2024-10-13 06:41:23 +00:00
Jee Jee Li
250e26a63e [Bugfix]Fix MiniCPM's LoRA bug (#9286) 2024-10-12 09:36:47 -07:00
Yunmeng
2b184ddd4f [Misc][Installation] Improve source installation script and doc (#9309)
Co-authored-by: youkaichao <youkaichao@126.com>
2024-10-12 09:36:40 -07:00
Xiang Xu
00298e092c [Bugfix] Fix bug of xformer prefill for encoder-decoder (#9026) 2024-10-12 15:00:43 +08:00
Lily Liu
89feb4c84d [SpecDec] Remove Batch Expansion (2/3) (#9298) 2024-10-12 05:13:37 +00:00
Maximilien de Bayser
ec10cb8511 [BugFix] Fix tool call finish reason in streaming case (#9209)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
2024-10-11 18:24:26 -07:00
Prashant Gupta
d11b46f3a5 [bugfix] fix f-string for error (#9295)
Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com>
2024-10-11 17:03:48 -07:00
Allen Wang
c6cf9295e1 [Bugfix] Sets is_first_step_output for TPUModelRunner (#9202) 2024-10-11 13:28:10 -07:00
Lucas Wilkinson
de9fb4bef8 [Bugfix][CI/Build] Fix docker build where CUDA archs < 7.0 are being detected (#9254) 2024-10-11 15:57:39 -04:00
Wallas Henrique
8baf85e4e9 [Doc] Compatibility matrix for mutual exclusive features (#8512)
Signed-off-by: Wallas Santos <wallashss@ibm.com>
2024-10-11 11:18:50 -07:00