Shang Wang
|
33156f56e0
|
[docker] A follow-up patch to fix #30913: [docker] install cuda13 version of lmcache and nixl (#31775)
Signed-off-by: Shang Wang <shangw@nvidia.com>
|
2026-01-07 23:47:02 -08:00 |
|
Seiji Eicher
|
3c98c2d21b
|
[CI/Build] Allow user to configure NVSHMEM version via ENV or command line (#30732)
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2026-01-05 15:56:08 -08:00 |
|
Qidong Su
|
af1b07b0c5
|
[docker] install cuda13 version of lmcache and nixl (#30913)
Signed-off-by: Qidong Su <soodoshll@gmail.com>
|
2026-01-05 12:50:39 -08:00 |
|
Nick Cao
|
d7e05ac743
|
[docker] Fix downloading sccache on aarch64 platform (#30070)
Signed-off-by: Nick Cao <nickcao@nichi.co>
|
2025-12-23 21:36:33 -08:00 |
|
Amr Mahdi
|
c0a88df7f7
|
[docker] Allow kv_connectors install to fail on arm64 (#30806)
Signed-off-by: Amr Mahdi <amrmahdi@meta.com>
|
2025-12-16 16:41:57 -08:00 |
|
Amr Mahdi
|
ff21a0fc85
|
[docker] Restructure Dockerfile for more efficient and cache-friendly builds (#30626)
Signed-off-by: Amr Mahdi <amrmahdi@meta.com>
|
2025-12-15 18:52:19 -08:00 |
|
Noa Neria
|
6366c098d7
|
Validating Runai Model Streamer Integration with S3 Object Storage (#29320)
Signed-off-by: Noa Neria <noa@run.ai>
|
2025-12-04 18:04:43 +08:00 |
|
Shengqi Chen
|
1109f98288
|
[CI] fix docker image build by specifying merge-base commit id when downloading pre-compiled wheels (#29930)
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
|
2025-12-03 14:08:19 -08:00 |
|
Amr Mahdi
|
f5d3d93c40
|
[docker] Build CUDA kernels in separate Docker stage for faster rebuilds (#29452)
Signed-off-by: Amr Mahdi <amrmahdi@meta.com>
|
2025-12-03 11:41:53 +00:00 |
|
Benjamin Bartels
|
2d613de9ae
|
[CI/Build] Fixes missing runtime dependencies (#29822)
Signed-off-by: bbartels <benjamin@bartels.dev>
|
2025-12-02 10:21:49 -08:00 |
|
Andrii Skliar
|
a5345bf49d
|
[BugFix] Fix plan API Mismatch when using latest FlashInfer (#29426)
Signed-off-by: Andrii Skliar <askliar@askliar-mlt.client.nvidia.com>
Co-authored-by: Andrii Skliar <askliar@askliar-mlt.client.nvidia.com>
|
2025-11-27 11:34:59 -08:00 |
|
Alec
|
c4c0354eec
|
[CI/Build] allow user modify pplx and deepep ref by ENV or command line (#29131)
Signed-off-by: alec-flowers <aflowers@nvidia.com>
|
2025-11-26 17:41:16 +00:00 |
|
汪志鹏
|
7012d8b45e
|
[Docker] Optimize Dockerfile: consolidate apt-get and reduce image size by ~200MB (#29060)
Signed-off-by: princepride <wangzhipeng628@gmail.com>
|
2025-11-24 19:54:00 -07:00 |
|
Benjamin Bartels
|
4d6afcaddc
|
[CI/Build] Moves to cuda-base runtime image while retaining minimal JIT dependencies (#29270)
Signed-off-by: bbartels <benjamin@bartels.dev>
Signed-off-by: Benjamin Bartels <benjamin@bartels.dev>
|
2025-11-24 11:40:54 -08:00 |
|
Benjamin Bartels
|
eb5352a770
|
[CI/build] Removes source compilation from runtime image (#26966)
Signed-off-by: bbartels <benjamin@bartels.dev>
|
2025-11-22 10:23:09 -08:00 |
|
Cyrus Leung
|
9452863088
|
Revert "Revert #28875 (#29159)" (#29179)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-11-21 04:27:43 -08:00 |
|
Cyrus Leung
|
4d7231e774
|
Revert #28875 (#29159)
|
2025-11-21 01:40:17 -08:00 |
|
Qidong Su
|
698024ecce
|
[Doc] update installation guide regarding aarch64+cuda pytorch build (#28875)
Signed-off-by: Qidong Su <soodoshll@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-11-20 19:40:25 -08:00 |
|
Harry Mellor
|
811df41ee9
|
Update Flashinfer from v0.4.1 to v0.5.2 (#27952)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-11-07 16:24:42 -08:00 |
|
Huy Do
|
ba33e8830d
|
Reapply "Install pre-built xformers-0.0.32.post2 built with pt-2.9.0" (#27768)
Signed-off-by: Huy Do <huydhn@gmail.com>
|
2025-10-30 10:22:30 -07:00 |
|
Benjamin Bartels
|
17d055f527
|
[Feat] Adds runai distributed streamer (#27230)
Signed-off-by: bbartels <benjamin@bartels.dev>
Signed-off-by: Benjamin Bartels <benjamin@bartels.dev>
Co-authored-by: omer-dayan <omdayan@nvidia.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-10-29 21:09:10 -07:00 |
|
Simon Mo
|
9007bf57e6
|
Revert "Install pre-built xformers-0.0.32.post2 built with pt-2.9.0" (#27714)
|
2025-10-28 20:58:01 -07:00 |
|
Huy Do
|
f257544709
|
Install pre-built xformers-0.0.32.post2 built with pt-2.9.0 (#27598)
Signed-off-by: Huy Do <huydhn@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2025-10-28 19:39:15 -07:00 |
|
Huy Do
|
becb7de40b
|
Update PyTorch to 2.9.0+cu129 (#24994)
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-10-21 17:20:18 -04:00 |
|
Harry Mellor
|
bd66b8529b
|
[CI] Install pre-release version of apache-tvm-ffi for flashinfer (#27262)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-21 14:23:56 +00:00 |
|
jiahanc
|
41d3071918
|
[NVIDIA] [Perf] Update to leverage flashinfer trtllm FP4 MOE throughput kernel (#26714)
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-10-16 16:20:25 -07:00 |
|
Michael Goin
|
04b5f9802d
|
[CI] Raise VLLM_MAX_SIZE_MB to 500 due to failing Build wheel - CUDA 12.9 (#26722)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-10-14 10:52:05 -07:00 |
|
Michael Goin
|
c9d33c60dc
|
[UX] Add FlashInfer as default CUDA dependency (#26443)
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2025-10-09 14:10:02 -07:00 |
|
elvischenv
|
5e49c3e777
|
Bump Flashinfer to v0.4.0 (#26326)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
|
2025-10-08 23:58:44 -07:00 |
|
pwschuurman
|
0d7c3cb51d
|
Update Dockerfile and install runai-model-streamer[gcs] package (#26464)
Signed-off-by: Peter Schuurman <psch@google.com>
|
2025-10-08 23:48:51 -07:00 |
|
Simon Mo
|
8229280a9c
|
[Misc] Define EP kernel arch list in Dockerfile (#25635)
Signed-off-by: Simon Mo <simon.mo@hey.com>
|
2025-10-07 00:05:33 +00:00 |
|
Tyler Michael Smith
|
27edd2aeb4
|
[Build/CI] Revert back to Ubuntu 20.04, install python 3.12 with uv (#26103)
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
|
2025-10-02 22:21:01 -07:00 |
|
Cyrus Leung
|
d00d652998
|
[CI/Build] Replace vllm.entrypoints.openai.api_server entrypoint with vllm serve command (#25967)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-02 10:04:57 -07:00 |
|
Huy Do
|
d4e7a1152d
|
Update base image to 22.04 (jammy) (#26065)
Signed-off-by: Huy Do <huydhn@gmail.com>
|
2025-10-02 05:48:04 -07:00 |
|
youkaichao
|
9360d34fa1
|
update to latest deepgemm for dsv3.2 (#25871)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-09-29 17:51:43 +08:00 |
|
Clayton Coleman
|
5546acb463
|
[Bug]: Set LD_LIBRARY_PATH to include the 'standard' CUDA location (#25766)
Signed-off-by: Clayton Coleman <smarterclayton@gmail.com>
|
2025-09-27 13:36:28 -04:00 |
|
Cyrus Leung
|
d346ec695e
|
[CI/Build] Consolidate model loader tests and requirements (#25765)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-09-26 21:45:20 -07:00 |
|
Michael Goin
|
92da847cf5
|
Add flashinfer-build.sh and register precompiled cu128 wheel in Dockerfile (#25782)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-09-26 18:54:09 -07:00 |
|
Michael Goin
|
cf89202855
|
[CI] Fix FlashInfer AOT in release docker image (#25730)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-09-26 14:11:40 -07:00 |
|
Benjamin Bartels
|
64ad551878
|
Removes source compilation of nixl dependency (#24874)
Signed-off-by: bbartels <benjamin@bartels.dev>
Signed-off-by: Benjamin Bartels <benjamin@bartels.dev>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Daniele <36171005+dtrifiro@users.noreply.github.com>
|
2025-09-17 01:33:18 +00:00 |
|
Simon Mo
|
fd2f10546c
|
[ci] fix wheel names for arm wheels (#24898)
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-09-15 14:39:08 -07:00 |
|
Benjamin Bartels
|
94b03f88dd
|
Bump Flashinfer to 0.3.1 (#24868)
Signed-off-by: bbartels <benjamin@bartels.dev>
|
2025-09-15 12:45:55 -07:00 |
|
Daniele
|
2f5e5c18de
|
[CI/Build] bump timm dependency (#24189)
Signed-off-by: Daniele Trifirò <dtrifiro@redhat.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
|
2025-09-10 06:20:59 -07:00 |
|
Po-Han Huang (NVIDIA)
|
78336a0c3e
|
Upgrade FlashInfer to v0.3.0 (#24086)
Signed-off-by: Po-Han Huang <pohanh@nvidia.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
|
2025-09-04 09:49:20 -07:00 |
|
Lucas Wilkinson
|
402759d472
|
[Attention] FlashAttn MLA (#14258)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
Co-authored-by: Matthew Bonanni <mbonanni001@gmail.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
|
2025-09-04 02:47:59 -07:00 |
|
dongbo910220
|
4ba0c587ba
|
FIX: Add libnuma-dev to Dockerfile for dev stage (#20388)
Signed-off-by: dongbo910220 <1275604947@qq.com>
|
2025-09-03 07:17:20 -07:00 |
|
Jee Jee Li
|
dc1a53186d
|
[Kernel] Update DeepGEMM to latest commit (#23915)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2025-09-01 02:38:04 -07:00 |
|
weiliang
|
ae067888d6
|
Update Flashinfer to 0.2.14.post1 (#23537)
Signed-off-by: Siyuan Fu <siyuanf@nvidia.com>
Signed-off-by: siyuanf <siyuanf@nvidia.com>
Signed-off-by: Weiliang Liu <weiliangl@nvidia.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Siyuan Fu <siyuanf@nvidia.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-08-25 18:30:44 -07:00 |
|
Michael Goin
|
f6818a92cb
|
[UX] Move Dockerfile DeepGEMM install to tools/install_deepgemm.sh (#23360)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-08-22 20:52:50 -06:00 |
|
Zhewen Li
|
0483fabc74
|
[CI/Build] add EP dependencies to docker (#21976)
Co-authored-by: Simon Mo <simon.mo@hey.com>
|
2025-08-22 13:34:40 -07:00 |
|