yzds
|
ac201a0eaf
|
[Feature] Support Decode Context Parallel (DCP) for MLA (#23734)
Signed-off-by: hongchao <hongchao@msh.team>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: hongchao <hongchao@msh.team>
Co-authored-by: youkaichao <youkaichao@gmail.com>
|
2025-09-06 13:24:05 +08:00 |
|
Rafael Vasquez
|
c954c6629c
|
[CI] Add timeouts to tests (#24260)
Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-09-05 17:26:22 -07:00 |
|
elvischenv
|
adc3ddb430
|
[Bugfix][Misc] Fix silu_and_mul_nvfp4_quant issue and extract common utils for nvfp4 kernel source files (#23727)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-09-04 14:25:45 -07:00 |
|
Matthew Bonanni
|
2fd1a40a54
|
[CI/Build] Disable SiluMul NVFP4 quant fusion tests (#24121)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2025-09-02 16:50:28 -07:00 |
|
Christian Pinto
|
1cb39dbcdd
|
[Misc] IO Processor plugins for pooling models (#22820)
Signed-off-by: Christian Pinto <christian.pinto@ibm.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: Max de Bayser <mbayser@br.ibm.com>
|
2025-08-31 23:07:12 -07:00 |
|
Isotr0py
|
ff0e59d83a
|
[CI/Build] Improve Tensor Schema tests speed by avoid engine core initialization (#23357)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-08-31 22:52:20 -07:00 |
|
Huy Do
|
67c14906aa
|
Update PyTorch to 2.8.0 (#20358)
Signed-off-by: Huy Do <huydhn@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-08-29 18:57:35 +08:00 |
|
Jee Jee Li
|
b4f9e9631c
|
[CI/Build] Clean up LoRA test (#23890)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-08-28 23:28:35 -07:00 |
|
elvischenv
|
16a45b3a28
|
[NVIDIA] Support SiluMul + NVFP4 quant fusion (#23671)
Signed-off-by: jindih <jindih@nvidia.com>
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Co-authored-by: jindih <jindih@nvidia.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Luka Govedic <lgovedic@redhat.com>
|
2025-08-28 19:36:50 +00:00 |
|
Jean Schmidt
|
0583578f42
|
[ci] breaks down V1 Test into 3 groups of approx 30 minutes runtime (#23757)
Signed-off-by: Jean Schmidt <contato@jschmidt.me>
|
2025-08-28 08:59:19 -07:00 |
|
Alex
|
f48a9af892
|
[CI] make all multi-gpu weight loading tests run nightly (#23792)
Signed-off-by: Alex Yun <alexyun04@gmail.com>
|
2025-08-27 21:27:36 -07:00 |
|
nvjullin
|
f66673a39d
|
[Kernel] Added flashinfer fp8 per-tensor gemms (#22895)
Signed-off-by: Julien Lin <jullin@nvidia.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-08-26 06:54:04 -07:00 |
|
Michael Goin
|
906e461ed6
|
[CI Fix] Pin deepep and pplx tags in tools/ep_kernels/, gate multigpu tests (#23568)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-08-25 18:29:00 -07:00 |
|
Zhewen Li
|
0483fabc74
|
[CI/Build] add EP dependencies to docker (#21976)
Co-authored-by: Simon Mo <simon.mo@hey.com>
|
2025-08-22 13:34:40 -07:00 |
|
22quinn
|
480bdf5a7b
|
[Core] Support custom executor qualname (#23314)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
|
2025-08-22 09:40:54 +08:00 |
|
Lain
|
f8ce022948
|
add tg-mxfp4-moe-test (#22540)
Signed-off-by: siyuanf <siyuanf@nvidia.com>
Signed-off-by: Siyuan Fu <siyuanf@nvidia.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-08-21 17:05:47 +00:00 |
|
Cyrus Leung
|
2461d9e562
|
[CI/Build] Split out mm processor tests (#23260)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-08-20 20:05:20 -07:00 |
|
Yong Hoon Shin
|
dfd2382039
|
[torch.compile] Support conditional torch.compile per module (#22269)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
|
2025-08-20 16:52:59 +00:00 |
|
Michael Goin
|
0f4f0191d8
|
[CI/Build] Replace lm-eval gsm8k tests with faster implementation (#23002)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-08-19 15:07:30 -07:00 |
|
amirkl94
|
a38b8af4c3
|
[NVIDIA] Add SM100 Flashinfer Cutlass MoE fp8 backend (#22357)
Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com>
|
2025-08-19 18:01:53 -04:00 |
|
22quinn
|
f7cf5b512e
|
[Frontend] Add /collective_rpc API endpoint (#23075)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
|
2025-08-19 17:29:32 +00:00 |
|
elvischenv
|
03752dba8f
|
[NVIDIA] Support Flashinfer TRTLLM FP8-q/kv/out Attention Kernel (#21716)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-08-19 08:22:15 -04:00 |
|
Robert Shaw
|
6603288736
|
[CI][V0 Deprecation] Removed V0 Only Chunked Prefill and Prefix Caching Tests (#22871)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-08-18 17:39:01 -07:00 |
|
afeldman-nm
|
bf7f470b22
|
[V1] Logits processors extensibility (#19912)
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Signed-off-by: Andrew Feldman <afeld2012@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Andrew Feldman <afeld2012@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-08-16 12:59:17 -07:00 |
|
bnellnm
|
8ad7285ea2
|
[Kernels] Clean up FusedMoeMethodBase and modular kernel setup. Remove extra arguments from modular kernel methods. (#22035)
Signed-off-by: Bill Nell <bnell@redhat.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2025-08-15 14:46:00 -04:00 |
|
Harry Mellor
|
e8b40c7fa2
|
[CI] Remove duplicated docs build from buildkite (#22924)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-08-15 05:58:06 -07:00 |
|
nvjullin
|
279a5f31b3
|
[Kernel] Add nvfp4 gemm flashinfer backends (#22346)
Signed-off-by: Julien Lin <jullin@nvidia.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
|
2025-08-14 16:03:55 -04:00 |
|
Woosuk Kwon
|
71683ca6f6
|
[V0 Deprecation] Remove multi-step scheduling (#22138)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
|
2025-08-12 20:18:39 -07:00 |
|
Cyrus Leung
|
ebf7605b0d
|
[Misc] Move tensor schema tests (#22612)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-08-11 00:15:27 -07:00 |
|
22quinn
|
b799f4b9ea
|
[CI/Build] Fix tensorizer test for load_format change (#22583)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
|
2025-08-10 19:30:00 -07:00 |
|
Thomas Parnell
|
8a0ffd6285
|
Remove mamba_ssm from vLLM requirements; install inside test container using --no-build-isolation (#22541)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2025-08-08 23:05:32 -07:00 |
|
elvischenv
|
83156c7b89
|
[NVIDIA] Support Flashinfer TRT-LLM Prefill Attention Kernel (#22095)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
|
2025-08-05 02:45:34 -07:00 |
|
lkchen
|
f4f4e7ef27
|
[V0 deprecation][P/D] Deprecate v0 KVConnectorBase code (1/2) (#21785)
Signed-off-by: Linkun Chen <github@lkchen.net>
|
2025-08-04 19:11:33 -07:00 |
|
Isotr0py
|
3dddbf1f25
|
[Misc] Add tensor schema test coverage for multimodal models (#21754)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-08-03 00:52:14 -07:00 |
|
Michael Goin
|
88faa466d7
|
[CI] Initial tests for SM100 Blackwell runner (#21877)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-08-01 16:18:38 -07:00 |
|
Charent
|
ad57f23f6a
|
[Bugfix] Fix: Fix multi loras with tp >=2 and LRU cache (#20873)
Signed-off-by: charent <19562666+charent@users.noreply.github.com>
|
2025-07-31 19:48:13 -07:00 |
|
Ilya Markov
|
6e672daf62
|
Add FlashInfer allreduce RMSNorm Quant fusion (#21069)
Signed-off-by: ilmarkov <imarkov@redhat.com>
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Co-authored-by: ilmarkov <imarkov@redhat.com>
|
2025-07-31 13:58:38 -07:00 |
|
Alexei-V-Ivanov-AMD
|
0780bb5783
|
Removing amdproduction Tests (#22027)
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
|
2025-07-31 09:53:27 -07:00 |
|
Simon Mo
|
452b2a3180
|
[ci] mark blackwell test optional for now (#21878)
|
2025-07-29 18:03:27 -07:00 |
|
Simon Mo
|
0d0cc9e150
|
[ci] add b200 test placeholder (#21866)
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-07-29 17:11:50 -07:00 |
|
Reza Barazesh
|
37efc63b64
|
[V0 deprecation] Guided decoding (#21347)
Signed-off-by: Reza Barazesh <rezabarazesh@meta.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-07-29 03:15:30 -07:00 |
|
Michael Goin
|
afa2607596
|
[CI] Parallelize Kernels MoE Test (#21764)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-07-28 18:56:24 -07:00 |
|
Robert Shaw
|
d5b981f8b1
|
[DP] Internal Load Balancing Per Node [one-pod-per-node] (#21238)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2025-07-23 20:57:32 -07:00 |
|
Ming Yang
|
772ce5af97
|
[Misc] Add dummy maverick test to CI (#21324)
Signed-off-by: Ming Yang <minos.future@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-07-23 20:22:42 -07:00 |
|
Nick Hill
|
316b1bf706
|
[Tests] Add tests for headless internal DP LB (#21450)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-07-23 07:49:25 -07:00 |
|
Alexei-V-Ivanov-AMD
|
107111a859
|
Changing "amdproduction" allocation. (#21409)
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
|
2025-07-22 20:48:31 -07:00 |
|
Cyrus Leung
|
c401c64b4c
|
[CI/Build] Fix model executor tests (#21387)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-07-22 20:25:37 -07:00 |
|
Michael Goin
|
005ae9be6c
|
Fix bad lm-eval fork (#21318)
|
2025-07-21 10:47:51 -07:00 |
|
Seiji Eicher
|
d1fb65bde3
|
Enable v1 metrics tests (#20953)
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
|
2025-07-20 03:22:02 +00:00 |
|
Woosuk Kwon
|
dd572c0ab3
|
[V0 Deprecation] Remove V0 Spec Decode workers (#21152)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-07-18 21:47:50 -07:00 |
|