Commit Graph

359 Commits

Author SHA1 Message Date
yzds
ac201a0eaf [Feature] Support Decode Context Parallel (DCP) for MLA (#23734)
Signed-off-by: hongchao <hongchao@msh.team>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: hongchao <hongchao@msh.team>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2025-09-06 13:24:05 +08:00
Rafael Vasquez
c954c6629c [CI] Add timeouts to tests (#24260)
Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-09-05 17:26:22 -07:00
elvischenv
adc3ddb430 [Bugfix][Misc] Fix silu_and_mul_nvfp4_quant issue and extract common utils for nvfp4 kernel source files (#23727)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-09-04 14:25:45 -07:00
Matthew Bonanni
2fd1a40a54 [CI/Build] Disable SiluMul NVFP4 quant fusion tests (#24121)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-09-02 16:50:28 -07:00
Christian Pinto
1cb39dbcdd [Misc] IO Processor plugins for pooling models (#22820)
Signed-off-by: Christian Pinto <christian.pinto@ibm.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: Max de Bayser <mbayser@br.ibm.com>
2025-08-31 23:07:12 -07:00
Isotr0py
ff0e59d83a [CI/Build] Improve Tensor Schema tests speed by avoid engine core initialization (#23357)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-08-31 22:52:20 -07:00
Huy Do
67c14906aa Update PyTorch to 2.8.0 (#20358)
Signed-off-by: Huy Do <huydhn@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-08-29 18:57:35 +08:00
Jee Jee Li
b4f9e9631c [CI/Build] Clean up LoRA test (#23890)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-08-28 23:28:35 -07:00
elvischenv
16a45b3a28 [NVIDIA] Support SiluMul + NVFP4 quant fusion (#23671)
Signed-off-by: jindih <jindih@nvidia.com>
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Co-authored-by: jindih <jindih@nvidia.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Luka Govedic <lgovedic@redhat.com>
2025-08-28 19:36:50 +00:00
Jean Schmidt
0583578f42 [ci] breaks down V1 Test into 3 groups of approx 30 minutes runtime (#23757)
Signed-off-by: Jean Schmidt <contato@jschmidt.me>
2025-08-28 08:59:19 -07:00
Alex
f48a9af892 [CI] make all multi-gpu weight loading tests run nightly (#23792)
Signed-off-by: Alex Yun <alexyun04@gmail.com>
2025-08-27 21:27:36 -07:00
nvjullin
f66673a39d [Kernel] Added flashinfer fp8 per-tensor gemms (#22895)
Signed-off-by: Julien Lin <jullin@nvidia.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-08-26 06:54:04 -07:00
Michael Goin
906e461ed6 [CI Fix] Pin deepep and pplx tags in tools/ep_kernels/, gate multigpu tests (#23568)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-08-25 18:29:00 -07:00
Zhewen Li
0483fabc74 [CI/Build] add EP dependencies to docker (#21976)
Co-authored-by: Simon Mo <simon.mo@hey.com>
2025-08-22 13:34:40 -07:00
22quinn
480bdf5a7b [Core] Support custom executor qualname (#23314)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
2025-08-22 09:40:54 +08:00
Lain
f8ce022948 add tg-mxfp4-moe-test (#22540)
Signed-off-by: siyuanf <siyuanf@nvidia.com>
Signed-off-by: Siyuan Fu <siyuanf@nvidia.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-08-21 17:05:47 +00:00
Cyrus Leung
2461d9e562 [CI/Build] Split out mm processor tests (#23260)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-08-20 20:05:20 -07:00
Yong Hoon Shin
dfd2382039 [torch.compile] Support conditional torch.compile per module (#22269)
Signed-off-by: Yong Hoon Shin <yhshin@meta.com>
2025-08-20 16:52:59 +00:00
Michael Goin
0f4f0191d8 [CI/Build] Replace lm-eval gsm8k tests with faster implementation (#23002)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-08-19 15:07:30 -07:00
amirkl94
a38b8af4c3 [NVIDIA] Add SM100 Flashinfer Cutlass MoE fp8 backend (#22357)
Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com>
2025-08-19 18:01:53 -04:00
22quinn
f7cf5b512e [Frontend] Add /collective_rpc API endpoint (#23075)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
2025-08-19 17:29:32 +00:00
elvischenv
03752dba8f [NVIDIA] Support Flashinfer TRTLLM FP8-q/kv/out Attention Kernel (#21716)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-08-19 08:22:15 -04:00
Robert Shaw
6603288736 [CI][V0 Deprecation] Removed V0 Only Chunked Prefill and Prefix Caching Tests (#22871)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-08-18 17:39:01 -07:00
afeldman-nm
bf7f470b22 [V1] Logits processors extensibility (#19912)
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
Signed-off-by: Andrew Feldman <afeld2012@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Andrew Feldman <afeld2012@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-08-16 12:59:17 -07:00
bnellnm
8ad7285ea2 [Kernels] Clean up FusedMoeMethodBase and modular kernel setup. Remove extra arguments from modular kernel methods. (#22035)
Signed-off-by: Bill Nell <bnell@redhat.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-08-15 14:46:00 -04:00
Harry Mellor
e8b40c7fa2 [CI] Remove duplicated docs build from buildkite (#22924)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-08-15 05:58:06 -07:00
nvjullin
279a5f31b3 [Kernel] Add nvfp4 gemm flashinfer backends (#22346)
Signed-off-by: Julien Lin <jullin@nvidia.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-08-14 16:03:55 -04:00
Woosuk Kwon
71683ca6f6 [V0 Deprecation] Remove multi-step scheduling (#22138)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
2025-08-12 20:18:39 -07:00
Cyrus Leung
ebf7605b0d [Misc] Move tensor schema tests (#22612)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-08-11 00:15:27 -07:00
22quinn
b799f4b9ea [CI/Build] Fix tensorizer test for load_format change (#22583)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
2025-08-10 19:30:00 -07:00
Thomas Parnell
8a0ffd6285 Remove mamba_ssm from vLLM requirements; install inside test container using --no-build-isolation (#22541)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2025-08-08 23:05:32 -07:00
elvischenv
83156c7b89 [NVIDIA] Support Flashinfer TRT-LLM Prefill Attention Kernel (#22095)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
2025-08-05 02:45:34 -07:00
lkchen
f4f4e7ef27 [V0 deprecation][P/D] Deprecate v0 KVConnectorBase code (1/2) (#21785)
Signed-off-by: Linkun Chen <github@lkchen.net>
2025-08-04 19:11:33 -07:00
Isotr0py
3dddbf1f25 [Misc] Add tensor schema test coverage for multimodal models (#21754)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Isotr0py <2037008807@qq.com>
2025-08-03 00:52:14 -07:00
Michael Goin
88faa466d7 [CI] Initial tests for SM100 Blackwell runner (#21877)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-08-01 16:18:38 -07:00
Charent
ad57f23f6a [Bugfix] Fix: Fix multi loras with tp >=2 and LRU cache (#20873)
Signed-off-by: charent <19562666+charent@users.noreply.github.com>
2025-07-31 19:48:13 -07:00
Ilya Markov
6e672daf62 Add FlashInfer allreduce RMSNorm Quant fusion (#21069)
Signed-off-by: ilmarkov <imarkov@redhat.com>
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Co-authored-by: ilmarkov <imarkov@redhat.com>
2025-07-31 13:58:38 -07:00
Alexei-V-Ivanov-AMD
0780bb5783 Removing amdproduction Tests (#22027)
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
2025-07-31 09:53:27 -07:00
Simon Mo
452b2a3180 [ci] mark blackwell test optional for now (#21878) 2025-07-29 18:03:27 -07:00
Simon Mo
0d0cc9e150 [ci] add b200 test placeholder (#21866)
Signed-off-by: simon-mo <simon.mo@hey.com>
2025-07-29 17:11:50 -07:00
Reza Barazesh
37efc63b64 [V0 deprecation] Guided decoding (#21347)
Signed-off-by: Reza Barazesh <rezabarazesh@meta.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-07-29 03:15:30 -07:00
Michael Goin
afa2607596 [CI] Parallelize Kernels MoE Test (#21764)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-07-28 18:56:24 -07:00
Robert Shaw
d5b981f8b1 [DP] Internal Load Balancing Per Node [one-pod-per-node] (#21238)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
2025-07-23 20:57:32 -07:00
Ming Yang
772ce5af97 [Misc] Add dummy maverick test to CI (#21324)
Signed-off-by: Ming Yang <minos.future@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-07-23 20:22:42 -07:00
Nick Hill
316b1bf706 [Tests] Add tests for headless internal DP LB (#21450)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-07-23 07:49:25 -07:00
Alexei-V-Ivanov-AMD
107111a859 Changing "amdproduction" allocation. (#21409)
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
2025-07-22 20:48:31 -07:00
Cyrus Leung
c401c64b4c [CI/Build] Fix model executor tests (#21387)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-07-22 20:25:37 -07:00
Michael Goin
005ae9be6c Fix bad lm-eval fork (#21318) 2025-07-21 10:47:51 -07:00
Seiji Eicher
d1fb65bde3 Enable v1 metrics tests (#20953)
Some checks failed
Create Release / Create Release (push) Has been cancelled
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
2025-07-20 03:22:02 +00:00
Woosuk Kwon
dd572c0ab3 [V0 Deprecation] Remove V0 Spec Decode workers (#21152)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-07-18 21:47:50 -07:00