Murali Andoorveedu
|
0ed646b7aa
|
[Distributed][Core] Support Py39 and Py38 for PP (#6120)
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
|
2024-07-03 17:52:29 -07:00 |
|
Travis Johnson
|
1dab9bc8a9
|
[Bugfix] set OMP_NUM_THREADS to 1 by default for multiprocessing (#6109)
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
|
2024-07-03 16:56:59 -07:00 |
|
youkaichao
|
3de6e6a30e
|
[core][distributed] support n layers % pp size != 0 (#6115)
|
2024-07-03 16:40:31 -07:00 |
|
youkaichao
|
966fe72141
|
[doc][misc] bump up py version in installation doc (#6119)
|
2024-07-03 15:52:04 -07:00 |
|
Robert Shaw
|
62963d129e
|
[ Misc ] Clean Up CompressedTensorsW8A8 (#6113)
|
2024-07-03 22:50:08 +00:00 |
|
xwjiang2010
|
d9e98f42e4
|
[vlm] Remove vision language config. (#6089)
Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2024-07-03 22:14:16 +00:00 |
|
youkaichao
|
3c6325f0fc
|
[core][distributed] custom allreduce when pp size > 1 (#6117)
|
2024-07-03 14:41:32 -07:00 |
|
Michael Goin
|
47f0954af0
|
[Kernel] Expand FP8 support to Ampere GPUs using FP8 Marlin (#5975)
|
2024-07-03 17:38:00 +00:00 |
|
Roger Wang
|
7cd2ebb025
|
[Bugfix] Fix compute_logits in Jamba (#6093)
|
2024-07-03 00:32:35 -07:00 |
|
Roger Wang
|
f1c78138aa
|
[Doc] Fix Mock Import (#6094)
|
2024-07-03 00:13:56 -07:00 |
|
Roger Wang
|
3a86b54fb0
|
[VLM][Frontend] Proper Image Prompt Formatting from OpenAI API (#6091)
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2024-07-02 23:41:23 -07:00 |
|
youkaichao
|
f666207161
|
[misc][distributed] error on invalid state (#6092)
|
2024-07-02 23:37:29 -07:00 |
|
Nick Hill
|
d830656a97
|
[BugFix] Avoid unnecessary Ray import warnings (#6079)
|
2024-07-03 14:09:40 +08:00 |
|
SangBin Cho
|
d18bab3587
|
[CI] Fix base url doesn't strip "/" (#6087)
|
2024-07-02 21:31:25 -07:00 |
|
Cyrus Leung
|
9831aec49f
|
[Core] Dynamic image size support for VLMs (#5276)
Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: ywang96 <ywang@roblox.com>
Co-authored-by: xwjiang2010 <87673679+xwjiang2010@users.noreply.github.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
|
2024-07-02 20:34:00 -07:00 |
|
youkaichao
|
482045ee77
|
[hardware][misc] introduce platform abstraction (#6080)
|
2024-07-02 20:12:22 -07:00 |
|
Mor Zusman
|
9d6a8daa87
|
[Model] Jamba support (#4115)
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
Co-authored-by: Erez Schwartz <erezs@ai21.com>
Co-authored-by: Mor Zusman <morz@ai21.com>
Co-authored-by: tomeras91 <57313761+tomeras91@users.noreply.github.com>
Co-authored-by: Tomer Asida <tomera@ai21.com>
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com>
Co-authored-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
|
2024-07-02 23:11:29 +00:00 |
|
Qubitium-ModelCloud
|
ee93f4f92a
|
[CORE] Quantized lm-head Framework (#4442)
Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>
Co-authored-by: ZX <zx@lbx.dev>
|
2024-07-02 22:25:17 +00:00 |
|
Robert Shaw
|
7c008c51a9
|
[ Misc ] Refactor MoE to isolate Fp8 From Mixtral (#5970)
Co-authored-by: Robert Shaw <rshaw@neuralmagic>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
|
2024-07-02 21:54:35 +00:00 |
|
Robert Shaw
|
4d26d806e1
|
Update conftest.py (#6076)
|
2024-07-02 20:14:22 +00:00 |
|
Murali Andoorveedu
|
c5832d2ae9
|
[Core] Pipeline Parallel Support (#4412)
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
|
2024-07-02 10:58:08 -07:00 |
|
Sirej Dua
|
15aba081f3
|
[Speculative Decoding] MLPSpeculator Tensor Parallel support (1/2) (#6050)
Co-authored-by: Sirej Dua <sirej.dua@databricks.com>
Co-authored-by: Sirej Dua <Sirej Dua>
|
2024-07-02 07:20:29 -07:00 |
|
Cyrus Leung
|
31354e563f
|
[Doc] Reinstate doc dependencies (#6061)
|
2024-07-02 10:53:16 +00:00 |
|
xwjiang2010
|
98d6682cd1
|
[VLM] Remove image_input_type from VLM config (#5852)
Signed-off-by: Xiaowei Jiang <xwjiang2010@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2024-07-02 07:57:09 +00:00 |
|
danieljannai21
|
2c37540aa6
|
[Frontend] Add template related params to request (#5709)
|
2024-07-01 23:01:57 -07:00 |
|
Alexander Matveev
|
3476ed0809
|
[Core] Optimize block_manager_v2 vs block_manager_v1 (to make V2 default) (#5602)
|
2024-07-01 20:10:37 -07:00 |
|
Thomas Parnell
|
54600709b6
|
[Model] Changes to MLPSpeculator to support tie_weights and input_scale (#5965)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Co-authored-by: Joshua Rosenkranz <jmrosenk@us.ibm.com>
|
2024-07-01 16:40:02 -07:00 |
|
James Whedbee
|
e373853e12
|
[Frontend] Relax api url assertion for openai benchmarking (#6046)
|
2024-07-01 23:39:10 +00:00 |
|
Nick Hill
|
c87ebc3ef9
|
[BugFix] Ensure worker model loop is always stopped at the right time (#5987)
|
2024-07-01 16:17:58 -07:00 |
|
Antoni Baum
|
c4059ea54f
|
[Bugfix] Add explicit end_forward calls to flashinfer (#6044)
|
2024-07-01 23:08:58 +00:00 |
|
Roger Wang
|
8e0817c262
|
[Bugfix][Doc] Fix Doc Formatting (#6048)
|
2024-07-01 15:09:11 -07:00 |
|
ning.zhang
|
83bdcb6ac3
|
add FAQ doc under 'serving' (#5946)
|
2024-07-01 14:11:36 -07:00 |
|
Avshalom Manevich
|
12a59959ed
|
[Bugfix] adding chunking mechanism to fused_moe to handle large inputs (#6029)
|
2024-07-01 21:08:29 +00:00 |
|
Antoni Baum
|
dec6fc6f3b
|
[Bugfix] Use RayActorError for older versions of Ray in RayTokenizerGroupPool (#6039)
|
2024-07-01 20:12:40 +00:00 |
|
youkaichao
|
8893130b63
|
[doc][misc] further lower visibility of simple api server (#6041)
Co-authored-by: Simon Mo <simon.mo@hey.com>
|
2024-07-01 10:50:56 -07:00 |
|
zhyncs
|
bb60326836
|
[Misc] update benchmark backend for scalellm (#6018)
|
2024-07-01 10:20:33 -07:00 |
|
youkaichao
|
4050d646e5
|
[doc][misc] remove deprecated api server in doc (#6037)
|
2024-07-01 12:52:43 -04:00 |
|
Robert Shaw
|
d76084c12f
|
[ CI ] Re-enable Large Model LM Eval (#6031)
|
2024-07-01 12:40:45 -04:00 |
|
sroy745
|
80ca1e6a3a
|
[Speculative Decoding 2/2 ] Integrate typical acceptance sampler into Spec Decode Worker (#5348)
|
2024-07-01 00:33:05 -07:00 |
|
youkaichao
|
614aa51203
|
[misc][cuda] use nvml to avoid accidentally cuda initialization (#6007)
|
2024-06-30 20:07:34 -07:00 |
|
Robert Shaw
|
af9ad46fca
|
[ Misc ] Refactor w8a8 to use process_weights_after_load (Simplify Weight Loading) (#5940)
Co-authored-by: Robert Shaw <rshaw@neuralmagic>
|
2024-06-30 23:06:27 +00:00 |
|
Dipika Sikka
|
7836fdcc11
|
[Misc] Fix get_min_capability (#5971)
|
2024-06-30 20:15:16 +00:00 |
|
Robert Shaw
|
deacb7ec44
|
[ CI ] Temporarily Disable Large LM-Eval Tests (#6005)
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic>
|
2024-06-30 11:56:56 -07:00 |
|
SangBin Cho
|
f5e73c9f1b
|
[Lora] Use safetensor keys instead of adapter_config.json to find unexpected modules. (#5909)
Co-authored-by: sang <sangcho@anyscale.com>
|
2024-06-30 17:11:15 +00:00 |
|
llmpros
|
c6c240aa0a
|
[Frontend]: Support base64 embedding (#5935)
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2024-06-30 23:53:00 +08:00 |
|
youkaichao
|
2be6955a3f
|
[ci][distributed] fix device count call
[ci][distributed] fix some cuda init that makes it necessary to use spawn (#5991)
|
2024-06-30 08:06:13 +00:00 |
|
Cyrus Leung
|
9d47f64eb6
|
[CI/Build] [3/3] Reorganize entrypoints tests (#5966)
|
2024-06-30 12:58:49 +08:00 |
|
Cyrus Leung
|
cff6a1fec1
|
[CI/Build] Reuse code for checking output consistency (#5988)
|
2024-06-30 11:44:25 +08:00 |
|
Roger Wang
|
bcc6a09b63
|
[CI/Build] Temporarily Remove Phi3-Vision from TP Test (#5989)
|
2024-06-30 09:18:31 +08:00 |
|
Matt Wong
|
9def10664e
|
[Bugfix][CI/Build][Hardware][AMD] Install matching torchvision to fix AMD tests (#5949)
|
2024-06-29 12:47:58 -07:00 |
|