14386 Commits

Author SHA1 Message Date
Augusto Yao
8e75d88554 add io_process_plugin for sparse embedding (#34214)
Signed-off-by: augusto.yjh <augusto.yjh@antgroup.com>
Signed-off-by: Augusto Yao <augusto.yjh@antgroup.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2026-02-28 09:16:37 +00:00
Mario Hong
0892d1ab1f [Feature]Supports Anthropic Thinking Block (#33671)
Signed-off-by: mariohong <mariohong128@gmail.com>
Co-authored-by: zetaohong <i-hongzetao@stepfun.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
2026-02-28 09:02:33 +00:00
Hashem Hashemi
7600642eae Add padding support to wvSplitK solution for skinny GEMMs (#33762)
Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com>
2026-02-28 09:02:05 +00:00
Andreas Karatzas
1e69c04887 [ROCm][CI] Parametrize vision score tests across attention backends with per-backend tolerances (#35571)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-02-28 08:59:26 +00:00
Cyrus Leung
4292e3b807 [Benchmark] Improve UX of sweep scripts (#35600)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-02-28 00:36:02 -08:00
Cyrus Leung
24d6ea8afd [Benchmark] Rename SLA Finder to Workload Explorer (#35586)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-02-27 23:31:55 -08:00
Chauncey
57c86c0741 [Misc] Change logging level from info to debug for tool parser import (#35575)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2026-02-28 14:51:35 +08:00
Chauncey
06254d4cbb [CI] add trainer_send_weights for MockWeightTransferEngine (#35589)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2026-02-28 06:47:43 +00:00
Andreas Karatzas
f5d1281c9d [ROCm][CI] Expose tests to AMD production CI and fix amdsmi heap corruption (#35071)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-02-28 13:57:31 +08:00
Andreas Karatzas
94029ffaf0 [ROCm] Derive device capability from GCN arch string without CUDA init (#35069)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-02-28 13:55:28 +08:00
Andreas Karatzas
88e8525f2e [ROCm][CI] Adding infiniband mappings for moriio tests (#35170)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-02-28 13:53:28 +08:00
Ilya Markov
b2d8b422b2 [EPLB] Enforce sync eplb for NCCL-based all2all backend (#35212)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
2026-02-28 05:47:12 +00:00
Umut Polat
1d5ab5d603 [Bugfix] Move chat completion response_format validation to Pydantic model_validator (#35510)
Signed-off-by: umut-polat <52835619+umut-polat@users.noreply.github.com>
2026-02-27 21:26:19 -08:00
Huy Do
7b346ba8ed [Bugfix] Propagate compilation_time from workers to main process for TP>1 (#35503)
Signed-off-by: Huy Do <huydhn@gmail.com>
2026-02-28 05:03:22 +00:00
Itay Alroy
dea268336f [1/N] Elastic EP Milestone 2 (#34861)
Signed-off-by: Yongji Wu <wuyongji317@gmail.com>
Signed-off-by: Itay Alroy <ialroy@nvidia.com>
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Signed-off-by: Ron Tourgeman <rtourgeman@nvidia.com>
Co-authored-by: Yongji Wu <wuyongji317@gmail.com>
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Co-authored-by: Ron Tourgeman <rtourgeman@nvidia.com>
2026-02-28 04:46:42 +00:00
Ma Jian
90805ff464 [CI/Build] CPU release supports both of AVX2 and AVX512 (#35466)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
Co-authored-by: jiang1.li <jiang1.li@intel.com>
2026-02-28 04:35:21 +00:00
Matthew Bonanni
2562e0271e [MTP] Validate that MTP weights are actually loaded (#35548)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2026-02-28 12:27:40 +08:00
Cyrus Leung
fd68cd132b [Bugfix] Fixes for SLA finder (#35537)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-02-27 20:20:55 -08:00
Micah Williamson
0edf101d2b [ROCm] Add stablelm Head Size 80 To Supported Head Sizes For ROCM_ATTN (#35527)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
2026-02-28 12:16:34 +08:00
Douglas Lehr
d5b6f3ba36 [ROCm][Quantization] Add Composable Kernel (CK) backend support for M… (#34301)
Signed-off-by: Doug Lehr <douglehr@amd.com>
Signed-off-by: Douglas Lehr <91553416+dllehr-amd@users.noreply.github.com>
Signed-off-by: Douglas Lehr <Doug.Lehr@amd.com>
Co-authored-by: Doug Lehr <douglehr@amd.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Rohan Potdar <66227218+Rohan138@users.noreply.github.com>
2026-02-28 03:37:01 +00:00
Woosuk Kwon
1a014a0a93 [Model Runner V2] Move MM encoder to Model States [3/N] (#35564)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
2026-02-27 18:32:38 -08:00
Woosuk Kwon
86ac7bcf84 [Model Runner V2] Support pooling models (#35120)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
2026-02-27 18:03:01 -08:00
Umut Polat
405f28d38d [Misc] Clean up ResponsesRequest model validators (#35531)
Signed-off-by: umut-polat <52835619+umut-polat@users.noreply.github.com>
2026-02-28 01:19:21 +00:00
youkaichao
5323672bc2 [misc] cleanup one level of error stack when nixl fails to initialize (#35517)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2026-02-28 08:42:37 +08:00
Roberto L. Castro
a201ad72d8 [Refactor][Kernel] Add global helper to deduplicate vectorized memory ops (#35105)
Signed-off-by: LopezCastroRoberto <rocastro@redhat.com>
Signed-off-by: LopezCastroRoberto <roberto.lopez.castro@udc.es>
Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com>
2026-02-27 16:28:17 -08:00
Rohan Potdar
e3691988d0 [ROCm]: fix aiter rope functionalization (#35533)
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
2026-02-27 22:42:30 +00:00
Gregory Shtrasberg
9fa6c68fa6 [ROCm] Enabling encoder and encoder-decoder on ROCm and AITER unified backends (#35334)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2026-02-27 21:32:55 +00:00
Aaron Hao
2ce6f3cf67 [Feat][RL][2/2] Native Weight Syncing API: IPC (#34171)
Signed-off-by: hao-aaron <ahao@anyscale.com>
Signed-off-by: Aaron Hao <ahao@anyscale.com>
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
2026-02-27 13:45:21 -07:00
Jakub Zakrzewski
1f3dbd95fd [Bugfix][Model] Fix gpt-oss batch invariance (#35404)
Signed-off-by: Jakub Zakrzewski <jzakrzewski@nvidia.com>
2026-02-27 20:41:24 +00:00
Lucas Wilkinson
1d532f9d8f [DP] Only use DP padding when cudagraphs are actually used (#34102)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2026-02-27 15:14:31 -05:00
Lucas Kabela
234a65b781 [Bugfix] Add monkeypatch to prevent race condition from writing (#35420)
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
2026-02-27 14:51:36 -05:00
SteadfastAsArt
2decec9856 [Transformers backend] Ignore MTP weights when num_nextn_predict_layers=0 (#34888)
Signed-off-by: SteadfastAsArt <695488173@qq.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-02-27 19:39:23 +00:00
Zhengxu Chen
29b35477b0 [compile] Fix caching error over pytree slice node. (#35308)
Signed-off-by: zhxchen17 <zhxchen17@fb.com>
2026-02-27 19:34:16 +00:00
Nick Hill
b1d9f5372d [Model Runner V2] Warmup kernels (#35172)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
2026-02-27 10:43:30 -08:00
Raushan Turganbay
fd6de37fca [BugFix] Fix 3D rope in transformers backend (#35097)
Signed-off-by: raushan <raushan@huggingface.co>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-02-27 18:34:49 +00:00
Netanel Haber
c8aca0c9e1 Support parakeet as audio encoder for nemotron-nano-vl (#35100)
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
2026-02-27 11:07:38 -07:00
Martin Hickey
b602e4f299 [Doc] Fix link to Llama chat template for usability (#35525)
Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2026-02-27 17:51:09 +00:00
Huamin Li
157722da75 [perf] Use pinned memory for async H2D transfer in do_mamba_copy_block (#35480)
Signed-off-by: Huamin Li <3ericli@gmail.com>
2026-02-28 01:50:37 +08:00
Nick Hill
1d897ff04f [Misc] Fill in some v1 CODEOWNERS gaps (#35524)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
2026-02-27 09:34:37 -08:00
fort726
905d76b51d [Model] Add huggingface skt/A.X-K1 model (#32407)
Signed-off-by: Sungwan(Alex) Kim <sw0726.kim@sktelecom.com>
Signed-off-by: fort726 <38447663+fort726@users.noreply.github.com>
Co-authored-by: Sungwan(Alex) Kim <sw0726.kim@sktelecom.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
2026-02-27 09:26:02 -08:00
Yanan Cao
9098ce690c [Kernel] [Helion] [7/N] Use HOP to represent Helion Kernel call to enable fx tracing and pattern matching (#34390)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
2026-02-27 09:21:35 -08:00
Nick Hill
876312f0b5 [Core] Fix gpu_worker.py pre-commit errors (#35312)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
2026-02-27 07:54:24 -08:00
Boyuan Feng
5de98abc12 Add @BoyuanFeng to CODEOWNERS (#35317)
Signed-off-by: Boyuan Feng <boyuan@meta.com>
2026-02-27 15:53:47 +00:00
Koushik Dutta
9251ed5c4f [Bugfix] Handle case when kimi ends reasoning with a tool call (#33646)
Signed-off-by: Koushik Dutta <koushd@gmail.com>
Co-authored-by: mondaylord <20212010046@fudan.edu.cn>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
2026-02-27 14:58:28 +00:00
Yueqian Lin
e8249378e4 [Bugfix] Fix check_interleaved_audio_video false positive for batched non-interleaved requests (#35487)
Signed-off-by: linyueqian <linyueqian@outlook.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
2026-02-27 06:48:25 -08:00
haosdent
6d4f9d3ad5 [Bugfix] Fix DCP + FA3 crash due to missing num_splits in _forward_with_dcp (#35082)
Signed-off-by: haosdent <haosdent@gmail.com>
2026-02-27 22:27:06 +08:00
Harry Mellor
fbe3f0120a Revert "Add GlmOcrConfig for GLM-OCR model type recognition" (#35512)
2026-02-27 06:13:27 -08:00
Jason Li
66c1751d13 [compile] Cleanup: Remove unnecessary +rms_norm forcing for sequence parallelism (#35410)
Signed-off-by: jasonlizhengjian <jasonlizhengjian@gmail.com>
2026-02-27 08:36:37 -05:00
Tib
6467b635b6 [Bugfix] Add missing activation attr to RMSNormGated (#35423)
Signed-off-by: tibG <naps@qubes.milou>
Co-authored-by: tibG <naps@qubes.milou>
2026-02-27 12:53:35 +00:00
Max Hu
9c3fe9936b Flashinfer cuDNN backend for Qwen3 VL ViT attention (#34580)
Signed-off-by: Max Hu <maxhu@nvidia.com>
Signed-off-by: Max Hu <hyoung2991@gmail.com>
Co-authored-by: Max Hu <maxhu@nvidia.com>
Co-authored-by: Shang Wang <shangw@nvidia.com>
2026-02-27 20:20:23 +08:00