Richard Liu
|
5ed5d5f128
|
Build tpu image in release pipeline (#10936)
Signed-off-by: Richard Liu <ricliu@google.com>
Co-authored-by: Kevin H. Luu <kevin@anyscale.com>
|
2024-12-09 23:07:48 +00:00 |
|
Cyrus Leung
|
39e227c7ae
|
[Model] Update multi-modal processor to support Mantis(LLaVA) model (#10711)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-07 17:10:05 +00:00 |
|
Jee Jee Li
|
acf092d348
|
[Bugfix] Fix test-pipeline.yaml (#10973)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2024-12-07 12:08:54 +08:00 |
|
youkaichao
|
9743d64e4e
|
[ci][build] add tests for python only compilation (#10915)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-12-05 08:54:47 -08:00 |
|
Kevin H. Luu
|
7883c2bbe7
|
[benchmark] Make H100 benchmark optional (#10908)
|
2024-12-04 17:02:17 -08:00 |
|
Kevin H. Luu
|
c92acb9693
|
[ci/build] Update vLLM postmerge ECR repo (#10887)
|
2024-12-04 09:01:20 +00:00 |
|
Kevin H. Luu
|
c9ca4fce3f
|
[ci/build] Job to build and push release image (#10877)
|
2024-12-04 15:02:40 +08:00 |
|
Kevin H. Luu
|
fa2dea61df
|
[ci/build] Change queue name for Release jobs (#10875)
|
2024-12-04 15:02:16 +08:00 |
|
Yan Ma
|
2f2cdc745a
|
[MISC][XPU] quick fix for XPU CI (#10859)
Signed-off-by: yan ma <yan.ma@intel.com>
|
2024-12-03 17:16:31 +00:00 |
|
Jee Jee Li
|
a4cf256159
|
[Bugfix] Fix QKVParallelLinearWithShardedLora bias bug (#10844)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2024-12-03 12:10:29 +08:00 |
|
Yan Ma
|
519cc6ca12
|
[Misc][XPU] Avoid torch compile for XPU platform (#10747)
Signed-off-by: yan ma <yan.ma@intel.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
|
2024-12-02 17:53:55 +00:00 |
|
Kuntai Du
|
0590ec3fd9
|
[Core] Implement disagg prefill by StatelessProcessGroup (#10502)
This PR provides initial support for single-node disaggregated prefill in 1P1D scenario.
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
Co-authored-by: ApostaC <yihua98@uchicago.edu>
Co-authored-by: YaoJiayi <120040070@link.cuhk.edu.cn>
|
2024-12-01 19:01:00 -06:00 |
|
Cyrus Leung
|
133707123e
|
[Model] Replace embedding models with pooling adapter (#10769)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-01 08:02:54 +08:00 |
|
Ricky Xu
|
519e8e4182
|
[v1] EngineArgs for better config handling for v1 (#10382)
Signed-off-by: rickyx <rickyx@anyscale.com>
|
2024-11-25 21:09:43 -08:00 |
|
youkaichao
|
eda2b3589c
|
Revert "Print running script to enhance CI log readability" (#10601)
|
2024-11-23 21:31:47 -08:00 |
|
Jee Jee Li
|
1c445dca51
|
[CI/Build] Print running script to enhance CI log readability (#10594)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2024-11-24 03:57:13 +00:00 |
|
Jee Jee Li
|
1700c543a5
|
[Bugfix] Fix LoRA weight sharding (#10450)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2024-11-23 17:23:17 -08:00 |
|
Nishidha
|
651f6c31ac
|
For ppc64le, disabled tests for now and addressed space issues (#10538)
|
2024-11-23 09:33:53 +00:00 |
|
kliuae
|
7c25fe45a6
|
[AMD] Add support for GGUF quantization on ROCm (#10254)
|
2024-11-22 21:14:49 -08:00 |
|
Simon Mo
|
aed074860a
|
[Benchmark] Add new H100 machine (#10547)
|
2024-11-21 18:27:20 -08:00 |
|
Yunmeng
|
edec3385b6
|
[CI][Installation] Avoid uploading CUDA 11.8 wheel (#10535)
Signed-off-by: simon-mo <simon.mo@hey.com>
Co-authored-by: simon-mo <simon.mo@hey.com>
|
2024-11-21 13:03:58 -08:00 |
|
youkaichao
|
388ee3de66
|
[torch.compile] limit inductor threads and lazy import quant (#10482)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-20 18:36:33 -08:00 |
|
Simon Mo
|
5f1d6af2b6
|
[perf bench] H200 development (#9768)
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2024-11-20 11:06:56 -08:00 |
|
Li, Jiang
|
63f1fde277
|
[Hardware][CPU] Support chunked-prefill and prefix-caching on CPU (#10355)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2024-11-20 10:57:39 +00:00 |
|
Kevin H. Luu
|
ed701ca963
|
[ci/build] Combine nightly and optional (#10465)
|
2024-11-19 21:36:03 -08:00 |
|
Yuan
|
b4614656b8
|
[CI][CPU] adding numa node number as container name suffix (#10441)
Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>
|
2024-11-19 13:16:43 +00:00 |
|
Chendi.Xue
|
905d0f0af4
|
[CI/Build] Fix IDC hpu [Device not found] issue (#10384)
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
|
2024-11-17 14:58:22 +08:00 |
|
Simon Mo
|
02dbf30e9a
|
[Build] skip renaming files for release wheels pipeline (#9671)
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2024-11-14 23:31:52 -08:00 |
|
Cyrus Leung
|
b40cf6402e
|
[Model] Support Qwen2 embeddings and use tags to select model tests (#10184)
|
2024-11-14 20:23:09 -08:00 |
|
Cyrus Leung
|
972112d82f
|
[Bugfix] Fix unable to load some models (#10312)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-11-14 16:55:54 -08:00 |
|
Cyrus Leung
|
675d603400
|
[CI/Build] Make shellcheck happy (#10285)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-11-14 09:47:53 +00:00 |
|
Isotr0py
|
03025c023f
|
[CI/Build] Fix CPU CI online inference timeout (#10314)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2024-11-14 16:45:32 +08:00 |
|
Yuan
|
d201d41973
|
[CI][CPU]refactor CPU tests to allow to bind with different cores (#10222)
Signed-off-by: Yuan Zhou <yuan.zhou@intel.com>
|
2024-11-12 10:07:32 +00:00 |
|
Robert Shaw
|
6ace6fba2c
|
[V1] AsyncLLM Implementation (#9826)
Signed-off-by: Nick Hill <nickhill@us.ibm.com>
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2024-11-11 23:05:38 +00:00 |
|
Isotr0py
|
2cebda42bb
|
[Bugfix][Hardware][CPU] Fix broken encoder-decoder CPU runner (#10218)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2024-11-11 12:37:58 +00:00 |
|
Isotr0py
|
58170d6503
|
[Hardware][CPU] Add embedding models support for CPU backend (#10193)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2024-11-11 08:54:28 +00:00 |
|
Cyrus Leung
|
51c2e1fcef
|
[CI/Build] Split up models tests (#10069)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-11-09 11:39:14 -08:00 |
|
Chendi.Xue
|
8e1529dc57
|
[CI/Build] Add run-hpu-test.sh script (#10167)
Signed-off-by: Chendi.Xue <chendi.xue@intel.com>
|
2024-11-09 06:26:52 +00:00 |
|
Li, Jiang
|
d7edca1dee
|
[CI/Build] Adding timeout in CPU CI to avoid CPU test queue blocking (#6892)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-11-09 03:27:11 +00:00 |
|
Cyrus Leung
|
b489fc3c91
|
[CI/Build] Update CPU tests to include all "standard" tests (#5481)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-11-08 23:30:04 +08:00 |
|
Russell Bryant
|
3be5b26a76
|
[CI/Build] Add shell script linting using shellcheck (#7925)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2024-11-07 18:17:29 +00:00 |
|
Li, Jiang
|
a4b3e0c1e9
|
[Hardware][CPU] Update torch 2.5 (#9911)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2024-11-07 04:43:08 +00:00 |
|
youkaichao
|
719c1ca468
|
[core][distributed] add stateless_init_process_group (#10072)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-06 16:42:09 -08:00 |
|
Aaron Pham
|
21063c11c7
|
[CI/Build] drop support for Python 3.8 EOL (#8464)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
|
2024-11-06 07:11:55 +00:00 |
|
youkaichao
|
4be3a45158
|
[distributed] add function to create ipc buffers directly (#10064)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-05 22:35:03 -08:00 |
|
Peter Salas
|
ffc0f2b47a
|
[Model][OpenVINO] Fix regressions from #8346 (#10045)
Signed-off-by: Peter Salas <peter@fixie.ai>
|
2024-11-06 04:19:15 +00:00 |
|
Michael Goin
|
02462465ea
|
[CI] Prune tests/models/decoder_only/language/* tests (#9940)
Signed-off-by: mgoin <michael@neuralmagic.com>
|
2024-11-05 16:02:23 -05:00 |
|
hissu-hyvarinen
|
5208dc7a20
|
[Bugfix][CI/Build][Hardware][AMD] Shard ID parameters in AMD tests running parallel jobs (#9279)
Signed-off-by: Hissu Hyvarinen <hissu.hyvarinen@amd.com>
|
2024-11-04 11:37:46 -08:00 |
|
Robert Shaw
|
1c45f4c385
|
[CI] Basic Integration Test For TPU (#9968)
Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>
|
2024-11-04 11:34:26 -08:00 |
|
Alexei-V-Ivanov-AMD
|
77f7ef2908
|
[CI/Build] Adding a forced docker system prune to clean up space (#9849)
|
2024-11-01 01:02:58 +08:00 |
|