Amr Mahdi
|
12b38c0f45
|
[CI/Build] Allow mounting AWS credentials for sccache S3 auth (#35912)
Signed-off-by: Amr Mahdi <amrmahdi@meta.com>
|
2026-03-03 14:30:47 -08:00 |
|
Tyler Michael Smith
|
eb19955c37
|
[WideEP] Remove pplx all2all backend (#33724)
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-02-26 14:30:10 -08:00 |
|
Seungmin Kim
|
160424a937
|
[Bugfix] Fix CUDA compatibility path setting for both datacenter and consumer NVIDIA GPUs (#33992)
Signed-off-by: Seungmin Kim <8457324+ehfd@users.noreply.github.com>
Signed-off-by: Andrew Mello <19512127+88plug@users.noreply.github.com>
Co-authored-by: 88plug <19512127+88plug@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2026-02-25 18:15:51 -08:00 |
|
Wei Zhao
|
ea5f903f80
|
Bump Flashinfer Version and Re-enable DeepSeek NVFP4 AR+Norm Fusion (#34899)
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2026-02-20 13:37:31 -08:00 |
|
zifeitong
|
52181baaea
|
Update DeepGEMM version pin in Dockerfile to match #32479 (#33935)
Signed-off-by: Zifei Tong <zifeitong@gmail.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2026-02-07 05:30:22 -08:00 |
|
Dimitrios Bariamis
|
207c3a0c20
|
Fix RoutingMethodType logic (#33919)
Signed-off-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
|
2026-02-06 14:03:34 -08:00 |
|
杨朱 · Kiki
|
a0a984ac2e
|
[CI/Build] Remove hardcoded America/Los_Angeles timezone from Dockerfiles (#33553)
Signed-off-by: carlory <baofa.fan@daocloud.io>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
|
2026-02-02 22:32:39 -08:00 |
|
Dimitrios Bariamis
|
f0bca83ee4
|
Add support for Mistral Large 3 inference with Flashinfer MoE (#33174)
Signed-off-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com>
Co-authored-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2026-01-30 22:48:27 -08:00 |
|
Pengchao Wang
|
2515bbd027
|
[CI/Build][BugFix] fix cuda/compat loading order issue in docker build (#33116)
Signed-off-by: Pengchao Wang <wpc@fb.com>
Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com>
|
2026-01-29 00:19:05 -08:00 |
|
Orion Reblitz-Richardson
|
68b0a6c1ba
|
[CI][torch nightlies] Use main Dockerfile with flags for nightly torch tests (#30443)
Signed-off-by: Orion Reblitz-Richardson <orionr@meta.com>
Signed-off-by: Orion Reblitz-Richardson <orionr@gmail.com>
Co-authored-by: Kevin H. Luu <khluu000@gmail.com>
|
2026-01-23 10:22:56 -08:00 |
|
elvischenv
|
808d6fd7b9
|
Bump Flashinfer to v0.6.1 (#30993)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
|
2026-01-21 08:49:50 -08:00 |
|
Mritunjay Kumar Sharma
|
9e078d0582
|
[CI/Build][Docker] Add centralized version manifest for Docker builds (#31492)
Signed-off-by: Mritunjay Sharma <mritunjay.sharma@chainguard.dev>
|
2026-01-17 13:45:30 +00:00 |
|
emricksini-h
|
2a60ac91d0
|
[Improvement] Persist CUDA compat libraries paths to prevent reset on apt-get (#30784)
Signed-off-by: emricksini-h <emrick.birivoutin@hcompany.ai>
|
2026-01-13 14:35:05 -08:00 |
|
Shang Wang
|
33156f56e0
|
[docker] A follow-up patch to fix #30913: [docker] install cuda13 version of lmcache and nixl (#31775)
Signed-off-by: Shang Wang <shangw@nvidia.com>
|
2026-01-07 23:47:02 -08:00 |
|
Seiji Eicher
|
3c98c2d21b
|
[CI/Build] Allow user to configure NVSHMEM version via ENV or command line (#30732)
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2026-01-05 15:56:08 -08:00 |
|
Qidong Su
|
af1b07b0c5
|
[docker] install cuda13 version of lmcache and nixl (#30913)
Signed-off-by: Qidong Su <soodoshll@gmail.com>
|
2026-01-05 12:50:39 -08:00 |
|
Nick Cao
|
d7e05ac743
|
[docker] Fix downloading sccache on aarch64 platform (#30070)
Signed-off-by: Nick Cao <nickcao@nichi.co>
|
2025-12-23 21:36:33 -08:00 |
|
Amr Mahdi
|
c0a88df7f7
|
[docker] Allow kv_connectors install to fail on arm64 (#30806)
Signed-off-by: Amr Mahdi <amrmahdi@meta.com>
|
2025-12-16 16:41:57 -08:00 |
|
Amr Mahdi
|
ff21a0fc85
|
[docker] Restructure Dockerfile for more efficient and cache-friendly builds (#30626)
Signed-off-by: Amr Mahdi <amrmahdi@meta.com>
|
2025-12-15 18:52:19 -08:00 |
|
Noa Neria
|
6366c098d7
|
Validating Runai Model Streamer Integration with S3 Object Storage (#29320)
Signed-off-by: Noa Neria <noa@run.ai>
|
2025-12-04 18:04:43 +08:00 |
|
Shengqi Chen
|
1109f98288
|
[CI] fix docker image build by specifying merge-base commit id when downloading pre-compiled wheels (#29930)
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
|
2025-12-03 14:08:19 -08:00 |
|
Amr Mahdi
|
f5d3d93c40
|
[docker] Build CUDA kernels in separate Docker stage for faster rebuilds (#29452)
Signed-off-by: Amr Mahdi <amrmahdi@meta.com>
|
2025-12-03 11:41:53 +00:00 |
|
Benjamin Bartels
|
2d613de9ae
|
[CI/Build] Fixes missing runtime dependencies (#29822)
Signed-off-by: bbartels <benjamin@bartels.dev>
|
2025-12-02 10:21:49 -08:00 |
|
Andrii Skliar
|
a5345bf49d
|
[BugFix] Fix plan API Mismatch when using latest FlashInfer (#29426)
Signed-off-by: Andrii Skliar <askliar@askliar-mlt.client.nvidia.com>
Co-authored-by: Andrii Skliar <askliar@askliar-mlt.client.nvidia.com>
|
2025-11-27 11:34:59 -08:00 |
|
Alec
|
c4c0354eec
|
[CI/Build] allow user modify pplx and deepep ref by ENV or command line (#29131)
Signed-off-by: alec-flowers <aflowers@nvidia.com>
|
2025-11-26 17:41:16 +00:00 |
|
汪志鹏
|
7012d8b45e
|
[Docker] Optimize Dockerfile: consolidate apt-get and reduce image size by ~200MB (#29060)
Signed-off-by: princepride <wangzhipeng628@gmail.com>
|
2025-11-24 19:54:00 -07:00 |
|
Benjamin Bartels
|
4d6afcaddc
|
[CI/Build] Moves to cuda-base runtime image while retaining minimal JIT dependencies (#29270)
Signed-off-by: bbartels <benjamin@bartels.dev>
Signed-off-by: Benjamin Bartels <benjamin@bartels.dev>
|
2025-11-24 11:40:54 -08:00 |
|
Benjamin Bartels
|
eb5352a770
|
[CI/build] Removes source compilation from runtime image (#26966)
Signed-off-by: bbartels <benjamin@bartels.dev>
|
2025-11-22 10:23:09 -08:00 |
|
Cyrus Leung
|
9452863088
|
Revert "Revert #28875 (#29159)" (#29179)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-11-21 04:27:43 -08:00 |
|
Cyrus Leung
|
4d7231e774
|
Revert #28875 (#29159)
|
2025-11-21 01:40:17 -08:00 |
|
Qidong Su
|
698024ecce
|
[Doc] update installation guide regarding aarch64+cuda pytorch build (#28875)
Signed-off-by: Qidong Su <soodoshll@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-11-20 19:40:25 -08:00 |
|
Harry Mellor
|
811df41ee9
|
Update Flashinfer from v0.4.1 to v0.5.2 (#27952)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-11-07 16:24:42 -08:00 |
|
Huy Do
|
ba33e8830d
|
Reapply "Install pre-built xformers-0.0.32.post2 built with pt-2.9.0" (#27768)
Signed-off-by: Huy Do <huydhn@gmail.com>
|
2025-10-30 10:22:30 -07:00 |
|
Benjamin Bartels
|
17d055f527
|
[Feat] Adds runai distributed streamer (#27230)
Signed-off-by: bbartels <benjamin@bartels.dev>
Signed-off-by: Benjamin Bartels <benjamin@bartels.dev>
Co-authored-by: omer-dayan <omdayan@nvidia.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-10-29 21:09:10 -07:00 |
|
Simon Mo
|
9007bf57e6
|
Revert "Install pre-built xformers-0.0.32.post2 built with pt-2.9.0" (#27714)
|
2025-10-28 20:58:01 -07:00 |
|
Huy Do
|
f257544709
|
Install pre-built xformers-0.0.32.post2 built with pt-2.9.0 (#27598)
Signed-off-by: Huy Do <huydhn@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2025-10-28 19:39:15 -07:00 |
|
Huy Do
|
becb7de40b
|
Update PyTorch to 2.9.0+cu129 (#24994)
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-10-21 17:20:18 -04:00 |
|
Harry Mellor
|
bd66b8529b
|
[CI] Install pre-release version of apache-tvm-ffi for flashinfer (#27262)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-21 14:23:56 +00:00 |
|
jiahanc
|
41d3071918
|
[NVIDIA] [Perf] Update to leverage flashinfer trtllm FP4 MOE throughput kernel (#26714)
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-10-16 16:20:25 -07:00 |
|
Michael Goin
|
04b5f9802d
|
[CI] Raise VLLM_MAX_SIZE_MB to 500 due to failing Build wheel - CUDA 12.9 (#26722)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-10-14 10:52:05 -07:00 |
|
Michael Goin
|
c9d33c60dc
|
[UX] Add FlashInfer as default CUDA dependency (#26443)
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2025-10-09 14:10:02 -07:00 |
|
elvischenv
|
5e49c3e777
|
Bump Flashinfer to v0.4.0 (#26326)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
|
2025-10-08 23:58:44 -07:00 |
|
pwschuurman
|
0d7c3cb51d
|
Update Dockerfile and install runai-model-streamer[gcs] package (#26464)
Signed-off-by: Peter Schuurman <psch@google.com>
|
2025-10-08 23:48:51 -07:00 |
|
Simon Mo
|
8229280a9c
|
[Misc] Define EP kernel arch list in Dockerfile (#25635)
Signed-off-by: Simon Mo <simon.mo@hey.com>
|
2025-10-07 00:05:33 +00:00 |
|
Tyler Michael Smith
|
27edd2aeb4
|
[Build/CI] Revert back to Ubuntu 20.04, install python 3.12 with uv (#26103)
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
|
2025-10-02 22:21:01 -07:00 |
|
Cyrus Leung
|
d00d652998
|
[CI/Build] Replace vllm.entrypoints.openai.api_server entrypoint with vllm serve command (#25967)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-02 10:04:57 -07:00 |
|
Huy Do
|
d4e7a1152d
|
Update base image to 22.04 (jammy) (#26065)
Signed-off-by: Huy Do <huydhn@gmail.com>
|
2025-10-02 05:48:04 -07:00 |
|
youkaichao
|
9360d34fa1
|
update to latest deepgemm for dsv3.2 (#25871)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-09-29 17:51:43 +08:00 |
|
Clayton Coleman
|
5546acb463
|
[Bug]: Set LD_LIBRARY_PATH to include the 'standard' CUDA location (#25766)
Signed-off-by: Clayton Coleman <smarterclayton@gmail.com>
|
2025-09-27 13:36:28 -04:00 |
|
Cyrus Leung
|
d346ec695e
|
[CI/Build] Consolidate model loader tests and requirements (#25765)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-09-26 21:45:20 -07:00 |
|