Commit Graph

1082 Commits

Author SHA1 Message Date
Woosuk Kwon
5c65a72bb1 [V0 Deprecation] Remove more V0 tests (#25117)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-17 22:05:25 -07:00
Woosuk Kwon
2fc24e94f9 [V0 Deprecation] Remove V0 Tracing & Metrics tests (#25115)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-17 19:40:44 -07:00
elvischenv
e6585ddb45 [Bugfix] Fix accuracy issue for silu_mul + nvfp4 quant fusion kernel (#24833)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-09-17 16:37:23 -07:00
Michael Goin
9f882d8791 Disable failing GPT-OSS Eval (Blackwell) for now (#25107)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-09-17 15:36:00 -07:00
Woosuk Kwon
4b946d693e [V0 Deprecation] Remove V0 Core tests (#25082)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-17 09:32:42 -07:00
Woosuk Kwon
5801e49776 [V0 Deprecation] Remove MQLLMEngine (#25019)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
2025-09-16 21:29:27 -07:00
Michael Goin
493b10f8bf [CI] GPT-OSS GPQA eval test for Blackwell (#24920)
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-09-16 18:13:21 -07:00
Ming Yang
4e5affeaa1 [CI] Add Decode Context Parallelism (DCP) test to CI (#24487)
Signed-off-by: Ming Yang <minos.future@gmail.com>
2025-09-16 21:21:28 +08:00
Woosuk Kwon
759ef49b15 Remove V0 Encoder-Decoder Support (#24907)
Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>
2025-09-15 21:17:14 -07:00
Jiangyun Zhu
5bcc153d7b [Compile] Fix noop_elimination pass and add tests for noop_elimination (#24880)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
2025-09-15 23:33:18 +00:00
Simon Mo
fd2f10546c [ci] fix wheel names for arm wheels (#24898)
Signed-off-by: simon-mo <simon.mo@hey.com>
2025-09-15 14:39:08 -07:00
Didier Durand
4979eb79da [Doc]: fix typos in various files (#24821)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
2025-09-15 01:08:52 -07:00
afeldman-nm
c8c42597ab [CI] Speed up model unit tests in CI (#24253)
Signed-off-by: Andrew Feldman <afeldman@redhat.com>
2025-09-12 10:36:50 -07:00
dongluw
a5b84f1cbf [Core] Shared memory based object store for Multimodal data caching and IPC (#20452)
Signed-off-by: donglu <donglu@cohere.com>
2025-09-12 07:54:17 -07:00
RichardoMu
40b6c9122b [V1] feat:add engine v1 tracing (#20372)
Signed-off-by: Mu Huai <tianbowen.tbw@antgroup.com>
Signed-off-by: Ye Zhang <zhysishu@gmail.com>
Signed-off-by: RichardoMu <44485717+RichardoMrMu@users.noreply.github.com>
Signed-off-by: simon-mo <simon.mo@hey.com>
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
Co-authored-by: Mu Huai <tianbowen.tbw@antgroup.com>
Co-authored-by: Ye Zhang <zhysishu@gmail.com>
Co-authored-by: Benjamin Bartels <benjamin@bartels.dev>
Co-authored-by: simon-mo <simon.mo@hey.com>
Co-authored-by: 瑜琮 <ly186375@antfin.com>
Co-authored-by: Aaron Pham <contact@aarnphm.xyz>
Co-authored-by: 22quinn <33176974+22quinn@users.noreply.github.com>
2025-09-11 17:10:39 -07:00
wang.yuqi
fd1ce98cdd [CI] Split mteb test from Language Models Test (#24634)
Signed-off-by: wang.yuqi <noooop@126.com>
2025-09-11 06:37:51 -07:00
Fanli Lin
0cd72a7b72 [XPU] add missing dependency tblib for XPU CI (#24639)
Signed-off-by: Fanli Lin <fanli.lin@intel.com>
2025-09-11 11:22:33 +00:00
wang.yuqi
a8b0361c92 [CI] Split pooling from entrypoints Test (#24632)
Signed-off-by: wang.yuqi <noooop@126.com>
2025-09-11 01:53:09 -07:00
Xingyu Liu
0fc36463e0 [CI]Add transformers_utils to Async Engine, Inputs, Utils, Worker Test (#24615)
Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com>
2025-09-11 01:52:10 -07:00
Jerry Zhang
2048c4e379 [torchao] Support quantization configs using module swap (#21982)
Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>
2025-09-10 23:53:24 -07:00
Hanjie Qiu
dcb28a332b [Kernel] Flashinfer MLA (trtllm-gen) decode kernel integration (#21078)
Signed-off-by: hjjq <hanjieq@nvidia.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-09-10 15:31:10 -07:00
Russell Bryant
37e8182bfe [v1] Add Whisper model support (encoder-decoder) (#21088)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: NickLucche <nlucches@redhat.com>
2025-09-10 13:53:35 -07:00
wang.yuqi
bd98842c8a [CI] Add PPL test for generation models (#24485)
Signed-off-by: wang.yuqi <noooop@126.com>
2025-09-10 06:16:39 -07:00
Ye (Charlotte) Qi
492196ed0e [CI/Build] split true unit tests to Entrypoints Unit Tests (#24418)
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
2025-09-10 06:16:07 -07:00
Jiangyun Zhu
b8a93076d3 [CI] execute all piecewise compilation tests together (#24502)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
2025-09-09 11:05:25 -07:00
Sahithi Chigurupati
6910b56da2 [CI] Add nightly multiarch manifests to dockerhub (#24102)
Signed-off-by: Sahithi Chigurupati <chigurupati.sahithi@gmail.com>
Signed-off-by: Simon Mo <simon.mo@hey.com>
Signed-off-by: simon-mo <simon.mo@hey.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
2025-09-09 01:18:09 +00:00
Woosuk Kwon
4172235ab7 [V0 deprecation] Deprecate V0 Neuron backend (#21159)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-09-06 16:15:18 -07:00
yzds
ac201a0eaf [Feature] Support Decode Context Parallel (DCP) for MLA (#23734)
Signed-off-by: hongchao <hongchao@msh.team>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: hongchao <hongchao@msh.team>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2025-09-06 13:24:05 +08:00
Rafael Vasquez
c954c6629c [CI] Add timeouts to tests (#24260)
Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-09-05 17:26:22 -07:00
Louie Tsai
006e7a34ae Adding int4 and int8 models for CPU benchmarking (#23709)
Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
2025-09-05 20:08:50 +08:00
elvischenv
adc3ddb430 [Bugfix][Misc] Fix silu_and_mul_nvfp4_quant issue and extract common utils for nvfp4 kernel source files (#23727)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
2025-09-04 14:25:45 -07:00
Kunshang Ji
16ded21eeb [XPU] support Triton Attention backend on Intel GPU (#24149)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2025-09-04 20:41:08 +08:00
Lucas Wilkinson
402759d472 [Attention] FlashAttn MLA (#14258)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
Co-authored-by: Matthew Bonanni <mbonanni001@gmail.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
2025-09-04 02:47:59 -07:00
Didier Durand
02d411fdb2 [Doc]: fix typos in Python comments (#24115)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
2025-09-02 21:14:07 -07:00
youkaichao
42dc59dbac Update release pipeline post PyTorch 2.8.0 update (#24073)
Signed-off-by: Huy Do <huydhn@gmail.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Huy Do <huydhn@gmail.com>
2025-09-03 10:09:19 +08:00
Matthew Bonanni
2fd1a40a54 [CI/Build] Disable SiluMul NVFP4 quant fusion tests (#24121)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2025-09-02 16:50:28 -07:00
Christian Pinto
1cb39dbcdd [Misc] IO Processor plugins for pooling models (#22820)
Signed-off-by: Christian Pinto <christian.pinto@ibm.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: Max de Bayser <mbayser@br.ibm.com>
2025-08-31 23:07:12 -07:00
Isotr0py
ff0e59d83a [CI/Build] Improve Tensor Schema tests speed by avoid engine core initialization (#23357)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-08-31 22:52:20 -07:00
Huy Do
67c14906aa Update PyTorch to 2.8.0 (#20358)
Signed-off-by: Huy Do <huydhn@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-08-29 18:57:35 +08:00
Li, Jiang
ad39106b16 [CPU] Enable data parallel for CPU backend (#23903)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-08-29 02:19:58 -07:00
Jee Jee Li
b4f9e9631c [CI/Build] Clean up LoRA test (#23890)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-08-28 23:28:35 -07:00
Simon Mo
27e88cee74 chore: build release image by default (#23852)
Signed-off-by: Codex <codex@openai.com>
2025-08-28 13:17:15 -07:00
elvischenv
16a45b3a28 [NVIDIA] Support SiluMul + NVFP4 quant fusion (#23671)
Signed-off-by: jindih <jindih@nvidia.com>
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Co-authored-by: jindih <jindih@nvidia.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Luka Govedic <lgovedic@redhat.com>
2025-08-28 19:36:50 +00:00
Jean Schmidt
0583578f42 [ci] breaks down V1 Test into 3 groups of approx 30 minutes runtime (#23757)
Signed-off-by: Jean Schmidt <contato@jschmidt.me>
2025-08-28 08:59:19 -07:00
Li, Jiang
67cee40da0 [CI/Build][Bugfix] Fix Qwen VL tests on CPU (#23818)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-08-28 11:57:05 +00:00
Alex
f48a9af892 [CI] make all multi-gpu weight loading tests run nightly (#23792)
Signed-off-by: Alex Yun <alexyun04@gmail.com>
2025-08-27 21:27:36 -07:00
Eli Uriegas
3c0ef769ba ci: Add arm64 docker build to release pipeline (#23210)
Signed-off-by: Eli Uriegas <eliuriegas@meta.com>
Signed-off-by: Eli Uriegas <1700823+seemethere@users.noreply.github.com>
2025-08-27 10:41:48 -07:00
Kunshang Ji
fce10dbed5 [XPU] Add xpu torch.compile support (#22609)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2025-08-27 05:33:27 +00:00
Didier Durand
7c04779afa [Doc]: fix various spelling issues in multiple files (#23636)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
2025-08-26 14:05:29 +00:00
nvjullin
f66673a39d [Kernel] Added flashinfer fp8 per-tensor gemms (#22895)
Signed-off-by: Julien Lin <jullin@nvidia.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-08-26 06:54:04 -07:00