Cyrus Leung
|
1de76a0e55
|
[CI/Build] Test VLM embeddings (#9406)
|
2024-10-16 09:44:30 +00:00 |
|
Daniele
|
203ab8f80f
|
[CI/Build] setuptools-scm fixes (#8900)
|
2024-10-14 11:34:47 -07:00 |
|
Tyler Michael Smith
|
7342a7d7f8
|
[Model] Support Mamba (#6484)
|
2024-10-11 15:40:06 +00:00 |
|
Kevin H. Luu
|
a78c6ba7c8
|
[ci/build] Add placeholder command for custom models test (#9262)
|
2024-10-10 15:45:09 -07:00 |
|
youkaichao
|
e4d652ea3e
|
[torch.compile] integration with compilation control (#9058)
|
2024-10-10 12:39:36 -07:00 |
|
sroy745
|
f3a507f1d3
|
[Core] Add an environment variable which needs to be set explicitly to allow BlockSpaceManagerV1 (#9149)
|
2024-10-10 14:17:17 +08:00 |
|
Li, Jiang
|
ca77dd7a44
|
[Hardware][CPU] Support AWQ for CPU backend (#7515)
|
2024-10-09 10:28:08 -06:00 |
|
youkaichao
|
c8627cd41b
|
[ci][test] use load dummy for testing (#9165)
|
2024-10-09 00:38:40 -07:00 |
|
Michael Goin
|
9ba0bd6aa6
|
Add lm-eval directly to requirements-test.txt (#9161)
|
2024-10-08 18:22:31 -07:00 |
|
Isotr0py
|
4f95ffee6f
|
[Hardware][CPU] Cross-attention and Encoder-Decoder models support on CPU backend (#9089)
|
2024-10-07 06:50:35 +00:00 |
|
Kuntai Du
|
fbb74420e7
|
[CI] Update performance benchmark: upgrade trt-llm to r24.07, and add SGLang (#7412)
|
2024-10-04 14:01:44 -07:00 |
|
Murali Andoorveedu
|
0f6d7a9a34
|
[Models] Add remaining model PP support (#7168)
Signed-off-by: Muralidhar Andoorveedu <muralidhar.andoorveedu@centml.ai>
Signed-off-by: Murali Andoorveedu <muralidhar.andoorveedu@centml.ai>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-10-04 10:56:58 +08:00 |
|
Lily Liu
|
1570203864
|
[Spec Decode] (1/2) Remove batch expansion (#8839)
|
2024-10-01 16:04:42 -07:00 |
|
Lily Liu
|
bce324487a
|
[CI][SpecDecode] Fix spec decode tests, use flash attention backend for spec decode CI tests. (#8975)
|
2024-10-01 00:51:40 +00:00 |
|
Kevin H. Luu
|
1425a1bcf9
|
[ci] Add CODEOWNERS for test directories (#8795)
Signed-off-by: kevin <kevin@anyscale.com>
|
2024-10-01 00:47:08 +00:00 |
|
Cyrus Leung
|
e1a3f5e831
|
[CI/Build] Update models tests & examples (#8874)
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2024-09-28 09:54:35 -07:00 |
|
Tyler Titsworth
|
260024a374
|
[Bugfix][Intel] Fix XPU Dockerfile Build (#7824)
Signed-off-by: tylertitsworth <tyler.titsworth@intel.com>
Co-authored-by: youkaichao <youkaichao@126.com>
|
2024-09-27 23:45:50 -07:00 |
|
youkaichao
|
d86f6b2afb
|
[misc] fix wheel name (#8919)
|
2024-09-27 22:10:44 -07:00 |
|
Luka Govedič
|
172d1cd276
|
[Kernel] AQ AZP 4/4: Integrate asymmetric quantization to linear method (#7271)
|
2024-09-27 14:25:10 -04:00 |
|
Nick Hill
|
4b377d6feb
|
[BugFix] Fix test breakages from transformers 4.45 upgrade (#8829)
|
2024-09-26 16:46:43 -07:00 |
|
Kevin H. Luu
|
d9cfbc891e
|
[ci] Soft fail Entrypoints, Samplers, LoRA, Decoder-only VLM (#8872)
Signed-off-by: kevin <kevin@anyscale.com>
|
2024-09-26 15:02:16 -07:00 |
|
bnellnm
|
300da09177
|
[Kernel] Fullgraph and opcheck tests (#8479)
|
2024-09-25 08:35:52 -06:00 |
|
Hongxia Yang
|
1c046447a6
|
[CI/Build][Bugfix][Doc][ROCm] CI fix and doc update after ROCm 6.2 upgrade (#8777)
|
2024-09-25 22:26:37 +08:00 |
|
Alexey Kondratiev(AMD)
|
2940afa04e
|
[CI/Build] Removing entrypoints/openai/test_embedding.py test from ROCm build (#8670)
|
2024-09-20 10:27:44 -07:00 |
|
Alexey Kondratiev(AMD)
|
6cb748e190
|
[CI/Build] Re-enabling Entrypoints tests on ROCm, excluding ones that fail (#8551)
|
2024-09-19 13:06:32 -07:00 |
|
Alexander Matveev
|
7c7714d856
|
[Core][Bugfix][Perf] Introduce MQLLMEngine to avoid asyncio OH (#8157)
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
Co-authored-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-neuralmagic@users.noreply.github.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
|
2024-09-18 13:56:58 +00:00 |
|
Alexey Kondratiev(AMD)
|
09deb4721f
|
[CI/Build] Excluding kernels/test_gguf.py from ROCm (#8520)
|
2024-09-17 16:40:29 -07:00 |
|
sroy745
|
1009e93c5d
|
[Encoder decoder] Add cuda graph support during decoding for encoder-decoder models (#7631)
|
2024-09-17 07:35:01 -07:00 |
|
youkaichao
|
99aa4eddaf
|
[torch.compile] register allreduce operations as custom ops (#8526)
|
2024-09-16 22:57:57 -07:00 |
|
Simon Mo
|
5478c4b41f
|
[perf bench] set timeout to debug hanging (#8516)
|
2024-09-16 14:30:02 -07:00 |
|
ywfang
|
8a0cf1ddc3
|
[Model] support minicpm3 (#8297)
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-09-14 14:50:26 +00:00 |
|
Cyrus Leung
|
a84e598e21
|
[CI/Build] Reorganize models tests (#7820)
|
2024-09-13 10:20:06 -07:00 |
|
Nick Hill
|
551ce01078
|
[Core] Add engine option to return only deltas or final output (#7381)
|
2024-09-12 12:02:00 -07:00 |
|
Joe Runde
|
f2e263b801
|
[Bugfix] Offline mode fix (#8376)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
|
2024-09-12 11:11:57 -07:00 |
|
Lily Liu
|
775f00f81e
|
[Speculative Decoding] Test refactor (#8317)
Co-authored-by: youkaichao <youkaichao@126.com>
|
2024-09-11 14:07:34 -07:00 |
|
Alexey Kondratiev(AMD)
|
aea02f30de
|
[CI/Build] Excluding test_moe.py from AMD Kernels tests for investigation (#8373)
|
2024-09-11 18:31:41 +00:00 |
|
Li, Jiang
|
0b952af458
|
[Hardware][Intel] Support compressed-tensor W8A8 for CPU backend (#7257)
|
2024-09-11 09:46:46 -07:00 |
|
sumitd2
|
02751a7a42
|
Fix ppc64le buildkite job (#8309)
|
2024-09-10 12:58:34 -07:00 |
|
Alexey Kondratiev(AMD)
|
f421f3cefb
|
[CI/Build] Enabling kernels tests for AMD, ignoring some of them that fail (#8130)
|
2024-09-10 11:51:15 -07:00 |
|
Dipika Sikka
|
6cd5e5b07e
|
[Misc] Fused MoE Marlin support for GPTQ (#8217)
|
2024-09-09 23:02:52 -04:00 |
|
sumitd2
|
b962ee1470
|
ppc64le: Dockerfile fixed, and a script for buildkite (#8026)
|
2024-09-07 11:18:40 -07:00 |
|
Cyrus Leung
|
288a938872
|
[Doc] Indicate more information about supported modalities (#8181)
|
2024-09-05 10:51:53 +00:00 |
|
Kevin H. Luu
|
ba262c4e5a
|
[ci] Mark LoRA test as soft-fail (#8160)
Signed-off-by: kevin <kevin@anyscale.com>
|
2024-09-04 20:33:12 -07:00 |
|
Kyle Mistele
|
e02ce498be
|
[Feature] OpenAI-Compatible Tools API + Streaming for Hermes & Mistral models (#5649)
Co-authored-by: constellate <constellate@1-ai-appserver-staging.codereach.com>
Co-authored-by: Kyle Mistele <kyle@constellate.ai>
|
2024-09-04 13:18:13 -07:00 |
|
alexeykondrat
|
d1dec64243
|
[CI/Build][ROCm] Enabling LoRA tests on ROCm (#7369)
Co-authored-by: Simon Mo <simon.mo@hey.com>
|
2024-09-04 11:57:54 -07:00 |
|
Cody Yu
|
2ad2e5608e
|
[MISC] Consolidate FP8 kv-cache tests (#8131)
|
2024-09-04 18:53:25 +00:00 |
|
TimWang
|
ccd7207191
|
chore: Update check-wheel-size.py to read MAX_SIZE_MB from env (#8103)
|
2024-09-03 23:17:05 -07:00 |
|
Roger Wang
|
5231f0898e
|
[Frontend][VLM] Add support for multiple multi-modal items (#8049)
|
2024-08-31 16:35:53 -07:00 |
|
Michael Goin
|
af59df0a10
|
Remove faulty Meta-Llama-3-8B-Instruct-FP8.yaml lm-eval test (#7961)
|
2024-08-28 19:19:17 -04:00 |
|
youkaichao
|
ce6bf3a2cf
|
[torch.compile] avoid Dynamo guard evaluation overhead (#7898)
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2024-08-28 16:10:12 -07:00 |
|