Commit Graph

14213 Commits

Author SHA1 Message Date
rasmith
b188bab441 [CI][AMD][BugFix] Add torch.cuda.set_device to test_punica_ops so punica kernels execute on same device as tensor (#34985)
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
2026-02-25 19:18:00 +00:00
Lucas Wilkinson
15d76f74e2 Revert "[Misc] Enable weights loading tracking for quantized models" (#35309)
2026-02-25 09:20:15 -08:00
Andreas Karatzas
8fd6975479 [ROCm][CI] Disable skinny GEMMs in multimodal tests to fix non-deterministic results (#35049)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-02-25 16:48:37 +00:00
pushkar
5d18bf8b32 [Bugfix] Fix Harmony preamble visibility in Responses API (#32114)
Signed-off-by: Pushkar Patel <git@thepushkarp.com>
Signed-off-by: pupa <pupa@users.noreply.github.com>
2026-02-25 08:08:16 -08:00
haosdent
0788ff0a15 [Bugfix] Gracefully disable AllReduceFusionPass on GPUs without multicast support (#35085)
Signed-off-by: haosdent <haosdent@gmail.com>
2026-02-25 07:31:45 -08:00
Chendi.Xue
d72b0be33c [XPU]Fix for Qwen-OMNI crash (#35249)
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
2026-02-25 07:31:07 -08:00
Bhoomit
42489e43c2 [Misc][LoRA] Increase max vocab size limit to 258048 in logits processor (#34773)
Signed-off-by: Bhoomit Vasani <vbhoomit@amazon.com>
2026-02-25 23:30:55 +08:00
Mario Hong
af5e6afa0a [Bugfix] Fix step3p5 reasoning with interleaved thinking (#34211)
Signed-off-by: mariohong <mariohong128@gmail.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
2026-02-25 15:13:01 +00:00
Benjamin Chislett
ee59a7c615 [Tests] Add GSM8k check to SpecDec E2E tests (#34772)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
2026-02-25 07:51:14 -05:00
Joao Gante
709eadbb0b Doc link typo (#35281)
Signed-off-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-02-25 03:00:31 -08:00
Harry Mellor
90fc7f9109 Fix custom processors that use deleted behaviour for Transformers v5 (#35107)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-02-25 02:36:21 -08:00
Yanwen Lin
675ec59aa9 [Bugfix][CPU] Fix basic unit tests failing in CPU platforms (#34677)
Signed-off-by: Yanwen Lin <lyw1124278064@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-02-25 08:36:15 +00:00
Yanwen Lin
80e60a6133 [Doc] Suggest "--managed-python" flag when installing python using uv (#33069)
Signed-off-by: Yanwen Lin <lyw1124278064@gmail.com>
2026-02-25 08:19:43 +00:00
jonoillar
26e722f906 [DOC][BugFix] Specfiy build dependency installation (#34513)
Signed-off-by: Jon OILLARBURU <jon.oillarburu@multiversecomputing.com>
Co-authored-by: Jon OILLARBURU <jon.oillarburu@multiversecomputing.com>
2026-02-25 08:04:06 +00:00
lichuang
2c619e5e3f [Docs]Fix documentation formatting in architecture overview (#34679)
Signed-off-by: codedump <lichuang1982@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-02-25 08:00:15 +00:00
Simon Mo
8a685be8d9 docs: document committer proposal process in governance (#35225)
Signed-off-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-25 07:58:48 +00:00
Laura Wang
2465071510 [Perf] Add opt-in SM100 Oink RMSNorm custom-op path (#31828)
Signed-off-by: Laura Wang <3700467+Laurawly@users.noreply.github.com>
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>
2026-02-24 23:01:53 -08:00
wenshuai
cd43673668 [Perf] Optimize FP8 gemm of sm120. (#34424)
Signed-off-by: wenshuai <wenshuai@xiaomi.com>
2026-02-24 22:25:24 -08:00
Xinyu Chen
35d44b4557 [XPU]Support CUDAGraph on XPU Platform (#34482)
Signed-off-by: Xinyu Chen <xinyu1.chen@intel.com>
Co-authored-by: chzhang <chaojun.zhang@intel.com>
Co-authored-by: zhenwei-intel <zhenwei.liu@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
2026-02-24 22:22:52 -08:00
Kunshang Ji
8ad54a991b [Platform] Add current_platform.num_compute_units interface (#35042)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
Signed-off-by: Kunshang Ji <jikunshang95@gmail.com>
2026-02-24 22:22:49 -08:00
Kunshang Ji
92510edc32 remove cuda check in top_k_top_p_triton kernel (#35011)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2026-02-24 22:22:31 -08:00
Isotr0py
a6c137521c [Misc] Add shard_id validation for MergedColumnLinear (#35055)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-02-24 22:12:28 -08:00
Isotr0py
4572a06afe [Misc] Enable weights loading tracking for quantized models (#35074)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-02-24 22:11:03 -08:00
Zhengxu Chen
5cc29cfb8b [compile] Improve error message during artifacts load failure. (#35115)
Signed-off-by: zhxchen17 <zhxchen17@fb.com>
2026-02-24 22:01:09 -08:00
Chen Zhang
8fae54faff [Linear Attention] fix bug for linear attention + prefix caching + reset_prefix_cache (#35157)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2026-02-24 22:00:19 -08:00
Harry Mellor
f7967577f5 Remove requirement to use --hf-overrides for DeepseekVLV2ForCausalLM (#35203)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-02-24 22:00:06 -08:00
pks
af770b8e7b [Bugfix] Fix AttributeError when passing StructuredOutputsParams to CompletionRequest (#35237)
Signed-off-by: Patrick Simianer <patrick@lilt.com>
2026-02-24 22:00:03 -08:00
Andreas Karatzas
2ff3e436ad [Responses][CI] Filter negative token IDs in schema fuzz test to avoid 500 errors (#35231)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-02-25 05:52:44 +00:00
Jhao-Ting Chen
c2c4c4611a [FIX] fused moe with lora shared expert dual stream (1.07x otps) (#34933)
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2026-02-25 04:40:45 +00:00
Rohan Potdar
f38f8c9742 [ROCm]: Enable customop and rope+kvcache fusion for AITER RoPE (#35180)
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
2026-02-25 04:36:40 +00:00
Flora Feng
ec1d30c0f6 [Responses] Decouple SSE event helpers from Harmony context (#35148)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
2026-02-24 20:05:25 -08:00
Pooya Davoodi
e3b2324ec4 [Frontend] Use init_app_state and FrontendArgs in run_batch (#32967)
Signed-off-by: Pooya Davoodi <pooya.davoodi@parasail.io>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2026-02-24 19:40:39 -08:00
Nick Hill
dbf0da817a [Core] Cleanup engine pause/sleep logic (#34528)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
2026-02-24 19:33:34 -08:00
Xin Yang
3bbb2046ff [Bugfix] Fix expert_ids padding values in moe_align_block_size kernel (#35161)
Signed-off-by: Xin Yang <xyangx@amazon.com>
2026-02-24 17:14:24 -08:00
yugong333
576fe50333 Adding Nemotron fp8 Triton MoE Config (#34674)
Signed-off-by: Yu Gong <yu3.gong@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2026-02-24 15:56:38 -08:00
Hashem Hashemi
a0e50a4260 Convert wvSplitKQ to 16x16 MFMA in prep for mi4xx. (#34100)
Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com>
2026-02-24 23:35:21 +00:00
Benjamin Chislett
9fa5b25a23 [Bug][DSV3.2] Always prepare metadata for DeepGEMM Sparse Attention (#35075)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
2026-02-24 14:55:22 -08:00
Robert Shaw
ea97750414 [CI] Fix Distributed Tests (#35236)
Signed-off-by: Robert Shaw <robertgshaw2@gmail.com>
2026-02-24 22:31:56 +00:00
Andreas Karatzas
067c5d9ad1 [ROCm][CI] Added MI325 mirrors (#34923)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-02-24 13:37:15 -08:00
Benjamin Chislett
f5972a872f [Model][Spec Decode] Nemotron-H MTP and Mamba Speculative Decoding Support (#33726)
Signed-off-by: Shahar Mor <smor@nvidia.com>
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Shahar Mor <smor@nvidia.com>
Co-authored-by: Roi Koren <roik@nvidia.com>
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>
2026-02-24 09:49:56 -08:00
Matthew Bonanni
a9e15e040d Add @MatthewBonanni to CODEOWNERS (#35207)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2026-02-24 10:45:10 -07:00
Lucas Wilkinson
542ca66357 Revert "[CI/Build] Remove redundant OpenTelemetry pip install from CI configs" (#35211)
2026-02-24 09:26:42 -08:00
Cyrus Leung
fc8456c336 [CI/Build] Fix kernels test location (#35205)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-02-24 09:20:34 -08:00
Wentao Ye
9ce8fad2a9 [Perf] Optimize Python Slice for Structured Output using islice instead of [:] (#33593)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-02-24 09:02:36 -08:00
Harry Mellor
c38b8d5a31 Remove padding_index from models that don't use it for better Transformers v5 compatibility (#35189)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-02-24 08:04:46 -08:00
Robert Shaw
60da0e1544 [CI] Remove Duplicated Tests (#35199)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
2026-02-24 23:53:30 +08:00
danisereb
9609b1f18d Integrate flashinfer mm_mxfp8 in ModelOpt MXFP8 (#35053)
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>
2026-02-24 08:45:13 -07:00
danisereb
a0c7081695 Fix fallback to default tactic (flashinfer autotuner) with trtllm_fp4_block_scale_moe (#35088)
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>
2026-02-24 07:25:44 -08:00
R3hankhan
34ce0ffd1f [CPU][Perf] Accelerate Attention head for s390x using vector intrinsics (#34434)
Signed-off-by: Rehan Khan <Rehan.Khan7@ibm.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
2026-02-24 07:25:39 -08:00
Robin Nabel
0de5333989 Fix GLM4 parser tests (#34905)
Signed-off-by: Robin Nabel <opensource@nabel.co>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
2026-02-24 22:27:42 +08:00