Cyrus Leung
|
1f26efbb3a
|
[Model] Support SigLIP encoder and alternative decoders for LLaVA models (#7153)
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
|
2024-08-06 16:55:31 +08:00 |
|
Jee Jee Li
|
9118217f58
|
[LoRA] Relax LoRA condition (#7146)
|
2024-08-06 01:57:25 +00:00 |
|
Simon Mo
|
e3c664bfcb
|
[Build] Add initial conditional testing spec (#6841)
|
2024-08-05 17:39:22 -07:00 |
|
Isotr0py
|
360bd67cf0
|
[Core] Support loading GGUF model (#5191)
Co-authored-by: Michael Goin <michael@neuralmagic.com>
|
2024-08-05 17:54:23 -06:00 |
|
Cody Yu
|
ef527be06c
|
[MISC] Use non-blocking transfer in prepare_input (#7172)
|
2024-08-05 23:41:27 +00:00 |
|
Jacob Schein
|
89b8db6bb2
|
[Bugfix] Specify device when loading LoRA and embedding tensors (#7129)
Co-authored-by: Jacob Schein <jacobschein@Jacobs-MacBook-Pro-2.local>
|
2024-08-05 16:35:47 -07:00 |
|
Thomas Parnell
|
789937af2e
|
[Doc] [SpecDecode] Update MLPSpeculator documentation (#7100)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2024-08-05 23:29:43 +00:00 |
|
youkaichao
|
dfb1a15dcb
|
[ci][frontend] deduplicate tests (#7101)
|
2024-08-05 15:59:22 -07:00 |
|
Simon Mo
|
4db5176d97
|
bump version to v0.5.4 (#7139)
v0.5.4
|
2024-08-05 14:39:48 -07:00 |
|
Tyler Michael Smith
|
4cf1dc39be
|
[Bugfix][CI/Build] Fix CUTLASS FetchContent (#7171)
|
2024-08-05 14:22:57 -07:00 |
|
Tyler Michael Smith
|
6e4852ce28
|
[CI/Build] Suppress divide-by-zero and missing return statement warnings (#7001)
|
2024-08-05 16:00:01 -04:00 |
|
Tyler Michael Smith
|
8571ac4672
|
[Kernel] Update CUTLASS to 3.5.1 (#7085)
|
2024-08-05 15:13:43 -04:00 |
|
Rui Qiao
|
997cf78308
|
[Misc] Fix typo in GroupCoordinator.recv() (#7167)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
|
2024-08-05 11:10:16 -07:00 |
|
Aditya Paliwal
|
57f560aa23
|
[BugFix] Use args.trust_remote_code (#7121)
|
2024-08-05 09:26:14 -07:00 |
|
Nick Hill
|
003f8ee128
|
[BugFix] Use IP4 localhost form for zmq bind (#7163)
|
2024-08-05 08:41:03 -07:00 |
|
Bongwon Jang
|
e9630458c7
|
[SpecDecode] Support FlashInfer in DraftModelRunner (#6926)
|
2024-08-05 08:05:05 -07:00 |
|
Cade Daniel
|
82a1b1a82b
|
[Speculative decoding] Add periodic log with time spent in proposal/scoring/verification (#6963)
|
2024-08-05 08:46:44 +00:00 |
|
Jungho Christopher Cho
|
c0d8f1636c
|
[Model] SiglipVisionModel ported from transformers (#6942)
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2024-08-05 06:22:12 +00:00 |
|
Cyrus Leung
|
cc08fc7225
|
[Frontend] Reapply "Factor out code for running uvicorn" (#7095)
|
2024-08-04 20:40:51 -07:00 |
|
Alphi
|
7b86e7c9cd
|
[Model] Add multi-image support for minicpmv (#7122)
Co-authored-by: hezhihui <hzh7269@modelbest.cn>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2024-08-05 09:23:17 +08:00 |
|
Jee Jee Li
|
f80ab3521c
|
Clean up remaining Punica C information (#7027)
|
2024-08-04 15:37:08 -07:00 |
|
youkaichao
|
16a1cc9bb2
|
[misc][distributed] improve libcudart.so finding (#7127)
|
2024-08-04 11:31:51 -07:00 |
|
Thomas Parnell
|
b1c9aa3daa
|
[Bugfix] [SpecDecode] Default speculative_draft_tensor_parallel_size to 1 when using MLPSpeculator (#7105)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2024-08-04 07:13:18 -07:00 |
|
Jee Jee Li
|
179a6a36f2
|
[Model]Refactor MiniCPMV (#7020)
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2024-08-04 08:12:41 +00:00 |
|
youkaichao
|
83c644fe7e
|
[core][misc] simply output processing with shortcut code path (#7117)
|
2024-08-04 00:22:19 -07:00 |
|
youkaichao
|
9fadc7b7a0
|
[misc] add zmq in collect env (#7119)
|
2024-08-03 22:03:46 -07:00 |
|
Yihuan Bu
|
654bc5ca49
|
Support for guided decoding for offline LLM (#6878)
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2024-08-04 03:12:09 +00:00 |
|
Jeff Fialho
|
825b044863
|
[Frontend] Warn if user max_model_len is greater than derived max_model_len (#7080)
Signed-off-by: Jefferson Fialho <jfialho@ibm.com>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
|
2024-08-03 16:01:38 -07:00 |
|
youkaichao
|
44dcb52e39
|
[ci][test] finalize fork_new_process_for_each_test (#7114)
|
2024-08-03 10:44:53 -07:00 |
|
Kuntai Du
|
67d745cc68
|
[CI] Temporarily turn off H100 performance benchmark (#7104)
|
2024-08-02 23:52:44 -07:00 |
|
Jee Jee Li
|
99d7cabd7b
|
[LoRA] ReplicatedLinear support LoRA (#7081)
|
2024-08-02 22:40:19 -07:00 |
|
Zach Zheng
|
fb2c1c86c1
|
[Bugfix] Fix block table for seqs that have prefix cache hits (#7018)
|
2024-08-02 22:38:15 -07:00 |
|
Isotr0py
|
0c25435daa
|
[Model] Refactor and decouple weight loading logic for InternVL2 model (#7067)
|
2024-08-02 22:36:14 -07:00 |
|
youkaichao
|
a0d164567c
|
[ci][distributed] disable ray dag tests (#7099)
|
2024-08-02 22:32:04 -07:00 |
|
youkaichao
|
04e5583425
|
[ci][distributed] merge distributed test commands (#7097)
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2024-08-02 21:33:53 -07:00 |
|
Cyrus Leung
|
8c025fa703
|
[Frontend] Factor out chat message parsing (#7055)
|
2024-08-02 21:31:27 -07:00 |
|
youkaichao
|
69ea15e5cc
|
[ci][distributed] shorten wait time if server hangs (#7098)
|
2024-08-02 21:05:16 -07:00 |
|
Robert Shaw
|
ed812a73fa
|
[ Frontend ] Multiprocessing for OpenAI Server with zeromq (#6883)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Co-authored-by: Joe Runde <Joseph.Runde@ibm.com>
Co-authored-by: Joe Runde <joe@joerun.de>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
|
2024-08-02 18:27:28 -07:00 |
|
youkaichao
|
708989341e
|
[misc] add a flag to enable compile (#7092)
|
2024-08-02 16:18:45 -07:00 |
|
Rui Qiao
|
22e718ff1a
|
[Misc] Revive to use loopback address for driver IP (#7091)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
|
2024-08-02 15:50:00 -07:00 |
|
Rui Qiao
|
05308891e2
|
[Core] Pipeline parallel with Ray ADAG (#6837)
Support pipeline-parallelism with Ray accelerated DAG.
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
|
2024-08-02 13:55:40 -07:00 |
|
Lucas Wilkinson
|
a8d604ca2a
|
[Misc] Disambiguate quantized types via a new ScalarType (#6396)
|
2024-08-02 13:51:58 -07:00 |
|
Michael Goin
|
b482b9a5b1
|
[CI/Build] Add support for Python 3.12 (#7035)
|
2024-08-02 13:51:22 -07:00 |
|
youkaichao
|
806949514a
|
[ci] set timeout for test_oot_registration.py (#7082)
|
2024-08-02 10:03:24 -07:00 |
|
Jie Fu (傅杰)
|
c16eaac500
|
[Hardware][Intel CPU] Update torch 2.4.0 for CPU backend (#6931)
|
2024-08-02 08:55:58 -07:00 |
|
Peng Guanwen
|
db35186391
|
[Core] Comment out unused code in sampler (#7023)
|
2024-08-02 00:58:26 -07:00 |
|
youkaichao
|
660dea1235
|
[cuda][misc] remove error_on_invalid_device_count_status (#7069)
|
2024-08-02 00:14:21 -07:00 |
|
Bongwon Jang
|
cf2a1a4d9d
|
Fix tracing.py (#7065)
|
2024-08-01 23:28:00 -07:00 |
|
youkaichao
|
252357793d
|
[ci][distributed] try to fix pp test (#7054)
|
2024-08-01 22:03:12 -07:00 |
|
Cyrus Leung
|
3bb4b1e4cd
|
[mypy] Speed up mypy checking (#7056)
|
2024-08-01 19:49:43 -07:00 |
|