rasmith
|
92d86da217
|
[BugFix] [Kernel] Fix GPU SEGV occurring in int8 kernels (#9391)
|
2024-10-17 01:34:06 +00:00 |
|
Tyler Michael Smith
|
c3fab5f769
|
[Bugfix][Kernel] Prevent integer overflow in fp8 dynamic per-token quantize kernel (#9425)
|
2024-10-16 23:46:06 +00:00 |
|
Russell Bryant
|
776dbd74f1
|
[CI/Build] mypy: Resolve some errors from checking vllm/engine (#9267)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2024-10-16 22:55:59 +00:00 |
|
Lily Liu
|
8345045833
|
[Performance][Spec Decode] Optimize ngram lookup performance (#9333)
|
2024-10-16 13:37:45 -06:00 |
|
Junhao Li
|
5b8a1fde84
|
[Model][Bugfix] Add FATReLU activation and support for openbmb/MiniCPM-S-1B-sft (#9396)
|
2024-10-16 16:40:24 +00:00 |
|
Mor Zusman
|
fb60ae9b91
|
[Kernel][Model] Improve continuous batching for Jamba and Mamba (#9189)
|
2024-10-16 12:12:43 -04:00 |
|
Patrick von Platen
|
415f76a9cb
|
Support mistral interleaved attn (#9414)
|
2024-10-16 13:28:30 +00:00 |
|
Isotr0py
|
cf1d62a644
|
[Model] Support SDPA attention for Molmo vision backbone (#9410)
|
2024-10-16 11:52:01 +00:00 |
|
Roger Wang
|
59230ef32b
|
[Misc] Consolidate example usage of OpenAI client for multimodal models (#9412)
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-10-16 11:20:51 +00:00 |
|
Cyrus Leung
|
cee711fdbb
|
[Core] Rename input data types (#8688)
|
2024-10-16 10:49:37 +00:00 |
|
Cyrus Leung
|
1de76a0e55
|
[CI/Build] Test VLM embeddings (#9406)
|
2024-10-16 09:44:30 +00:00 |
|
Cyrus Leung
|
7abba39ee6
|
[Model] VLM2Vec, the first multimodal embedding model in vLLM (#9303)
|
2024-10-16 14:31:00 +08:00 |
|
Cyrus Leung
|
7e7eae338d
|
[Misc] Standardize RoPE handling for Qwen2-VL (#9250)
|
2024-10-16 13:56:17 +08:00 |
|
Reza Salehi
|
ed920135c8
|
[Bugfix] Molmo text-only input bug fix (#9397)
Co-authored-by: sanghol <sanghol@allenai.org>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2024-10-16 04:56:09 +00:00 |
|
Lucas Wilkinson
|
717a5f82cd
|
[Bugfix][CI/Build] Fix CUDA 11.8 Build (#9386)
|
2024-10-16 00:15:21 +00:00 |
|
Chang Su
|
ba30942240
|
[Bugfix] Fix vLLM UsageInfo and logprobs None AssertionError with empty token_ids (#9034)
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
|
2024-10-15 15:40:43 -07:00 |
|
Michael Goin
|
22f8a69549
|
[Misc] Directly use compressed-tensors for checkpoint definitions (#8909)
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-10-15 15:40:25 -07:00 |
|
Grace Ho
|
5d264f4ab8
|
pass ignore_eos parameter to all benchmark_serving calls (#9349)
|
2024-10-15 13:30:44 -07:00 |
|
Nick Hill
|
e9d517f276
|
[BugFix] Fix chat API continuous usage stats (#9357)
|
2024-10-14 23:19:48 -07:00 |
|
hhzhang16
|
55e081fbad
|
[Bugfix] Update InternVL input mapper to support image embeds (#9351)
|
2024-10-14 21:29:19 -07:00 |
|
Michael Goin
|
8e836d982a
|
[Doc] Fix code formatting in spec_decode.rst (#9348)
|
2024-10-14 21:29:11 -07:00 |
|
Steve Grubb
|
44eaa5a5d9
|
[Frontend] Clarify model_type error messages (#9345)
|
2024-10-14 21:29:01 -07:00 |
|
Tyler Michael Smith
|
169b530607
|
[Bugfix] Clean up some cruft in mamba.py (#9343)
|
2024-10-15 00:24:25 +00:00 |
|
Xiang Xu
|
f0fe4fe86d
|
[Model] Make llama3.2 support multiple and interleaved images (#9095)
|
2024-10-14 15:24:26 -07:00 |
|
Brendan Wong
|
4d31cd424b
|
[Frontend] merge beam search implementations (#9296)
|
2024-10-14 15:05:52 -07:00 |
|
Woosuk Kwon
|
473e7b3606
|
[TPU] Fix TPU SMEM OOM by Pallas paged attention kernel (#9350)
|
2024-10-14 15:02:06 -07:00 |
|
Simon Mo
|
fd47e57f4b
|
[Docs] Remove PDF build from Readtehdocs (#9347)
v0.6.3
|
2024-10-14 11:57:47 -07:00 |
|
Daniele
|
203ab8f80f
|
[CI/Build] setuptools-scm fixes (#8900)
|
2024-10-14 11:34:47 -07:00 |
|
Kunshang Ji
|
4141608c6a
|
[Hardware][intel GPU] add async output process for xpu (#8897)
|
2024-10-14 12:23:33 -06:00 |
|
Reza Salehi
|
dfe43a2071
|
[Model] Molmo vLLM Integration (#9016)
Co-authored-by: sanghol <sanghol@allenai.org>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2024-10-14 07:56:24 -07:00 |
|
Tyler Michael Smith
|
16b24e7dcd
|
[Bugfix] Bandaid fix for speculative decoding tests (#9327)
|
2024-10-13 23:02:11 +00:00 |
|
Lily Liu
|
f519902c52
|
[CI] Fix merge conflict (#9317)
|
2024-10-13 06:41:23 +00:00 |
|
Jee Jee Li
|
250e26a63e
|
[Bugfix]Fix MiniCPM's LoRA bug (#9286)
|
2024-10-12 09:36:47 -07:00 |
|
Yunmeng
|
2b184ddd4f
|
[Misc][Installation] Improve source installation script and doc (#9309)
Co-authored-by: youkaichao <youkaichao@126.com>
|
2024-10-12 09:36:40 -07:00 |
|
Xiang Xu
|
00298e092c
|
[Bugfix] Fix bug of xformer prefill for encoder-decoder (#9026)
|
2024-10-12 15:00:43 +08:00 |
|
Lily Liu
|
89feb4c84d
|
[SpecDec] Remove Batch Expansion (2/3) (#9298)
|
2024-10-12 05:13:37 +00:00 |
|
Maximilien de Bayser
|
ec10cb8511
|
[BugFix] Fix tool call finish reason in streaming case (#9209)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
|
2024-10-11 18:24:26 -07:00 |
|
Prashant Gupta
|
d11b46f3a5
|
[bugfix] fix f-string for error (#9295)
Signed-off-by: Prashant Gupta <prashantgupta@us.ibm.com>
|
2024-10-11 17:03:48 -07:00 |
|
Allen Wang
|
c6cf9295e1
|
[Bugfix] Sets is_first_step_output for TPUModelRunner (#9202)
|
2024-10-11 13:28:10 -07:00 |
|
Lucas Wilkinson
|
de9fb4bef8
|
[Bugfix][CI/Build] Fix docker build where CUDA archs < 7.0 are being detected (#9254)
|
2024-10-11 15:57:39 -04:00 |
|
Wallas Henrique
|
8baf85e4e9
|
[Doc] Compatibility matrix for mutual exclusive features (#8512)
Signed-off-by: Wallas Santos <wallashss@ibm.com>
|
2024-10-11 11:18:50 -07:00 |
|
homeffjy
|
1a1823871d
|
[Doc] Remove outdated comment to avoid misunderstanding (#9287)
|
2024-10-11 18:02:03 +00:00 |
|
sixgod
|
6cf1167c1a
|
[Model] Add GLM-4v support and meet vllm==0.6.2 (#9242)
|
2024-10-11 17:36:13 +00:00 |
|
Burkhard Ringlein
|
f710090d8e
|
[Kernel] adding fused moe kernel config for L40S TP4 (#9245)
|
2024-10-11 08:54:22 -07:00 |
|
Tyler Michael Smith
|
7342a7d7f8
|
[Model] Support Mamba (#6484)
|
2024-10-11 15:40:06 +00:00 |
|
Sebastian Schoennenbeck
|
df3dcdf49d
|
[Bugfix] Fix priority in multiprocessing engine (#9277)
|
2024-10-11 15:35:35 +00:00 |
|
Jee Jee Li
|
36ea79079b
|
[Misc][LoRA] Support loading LoRA weights for target_modules in reg format (#9275)
|
2024-10-11 12:31:21 +00:00 |
|
Cyrus Leung
|
e808156f30
|
[Misc] Collect model support info in a single process per model (#9233)
|
2024-10-11 11:08:11 +00:00 |
|
youkaichao
|
cbc2ef5529
|
[misc] hide best_of from engine (#9261)
Co-authored-by: Brendan Wong <bjwpokemon@gmail.com>
|
2024-10-10 21:30:44 -07:00 |
|
Andy Dai
|
94bf9ae4e9
|
[Misc] Fix sampling from sonnet for long context case (#9235)
|
2024-10-11 00:33:16 +00:00 |
|