Bvicii
|
999dfc1622
|
[Bugfix] Offload blocking tokenizer ops to shared thread pool to unblock event loop (#34789)
Signed-off-by: Bvicii <yizhanhuang2002@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2026-03-26 22:17:00 -07:00 |
|
wenjun liu
|
d86060122a
|
[CI/Build] enable Intel XPU test flow with prebuilt image (#37447)
Signed-off-by: wendyliu235 <wenjun.liu@intel.com>
|
2026-03-26 18:16:04 -07:00 |
|
Harry Mellor
|
f73bcb1c51
|
Various Transformers v5 config fixes (#38247)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-26 23:06:59 +00:00 |
|
yzong-rh
|
28048bd6b0
|
[Bugfix] Add missing f-string prefix in xgrammar choices error message (#38162)
Signed-off-by: Yifan Zong <yzong@redhat.com>
|
2026-03-26 21:43:03 +00:00 |
|
Giancarlo Delfin
|
c32e97602d
|
[Model Runner V2] Enable forcing a specific acceptance rate during rejection sampling (#38045)
Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai>
|
2026-03-26 13:38:12 -07:00 |
|
Wei Zhao
|
0904b6550d
|
Fix multi-node allreduce fusion (#38136)
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
Co-authored-by: root <root@theia0053.lyris.clusters.nvidia.com>
|
2026-03-26 20:24:36 +00:00 |
|
Stig-Arne Grönroos
|
f26fcdfb9e
|
[Bugfix][ROCm] Fix lru_cache on paged_mqa_logits_module (#37547)
Signed-off-by: Stig-Arne Grönroos <stig-arne.gronroos@amd.com>
|
2026-03-26 19:01:05 +00:00 |
|
TJian
|
bc9c6fbbe6
|
[ROCm] [Bugfix] [Release] Fix nightly rocm release pipeline (#38263)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
|
2026-03-26 18:47:10 +00:00 |
|
Andreas Karatzas
|
bff9a1c266
|
[ROCm][CI] Override PYTORCH_ROCM_ARCH with detected GPU arch in test containers (#38165)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-26 18:33:45 +00:00 |
|
Andreas Karatzas
|
db01535e2b
|
[ROCm][CI] Add uv pip compile workflow for rocm-test.txt lockfile (#37930)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-26 12:44:01 -05:00 |
|
jennyyyyzhen
|
a4cf9b22ba
|
[ROCM][Bugfix] Use correct stride in cp_mha_gather_cache_kernel for hybrid model (#37228)
Signed-off-by: jennyyyyzhen <yzhen@hmc.edu>
Co-authored-by: yZhen <yZhen@fb.com>
|
2026-03-26 10:33:39 -07:00 |
|
Andreas Karatzas
|
9c3ae04bfe
|
[ROCm][CI] Add LM Eval Qwen3.5 Models test for MI355 (#38155)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-26 16:51:18 +00:00 |
|
Andreas Karatzas
|
a8e48a7b85
|
[CI] Fix conch kernel crash on 3D input by reshaping to 2D before GEMM (#38178)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-26 11:46:03 -05:00 |
|
Divakar Verma
|
b9dbc5c4ab
|
[Mamba][APC] Add test case to compare apc outputs (#34977)
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
|
2026-03-26 16:40:35 +00:00 |
|
TJian
|
60af7b967b
|
[Releases] [ROCm] Enable Nightly Docker Image and Wheel Releases for ROCm (#37283)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Co-authored-by: Hongxia Yang <hongxiay.yang@amd.com>
|
2026-03-26 16:32:25 +00:00 |
|
Andreas Karatzas
|
bdc1719eb9
|
[ROCm][CI] Fix AITER state leak in shared_fused_moe_routed_transform test (#38137)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-26 09:26:46 -07:00 |
|
haosdent
|
0aac2048bf
|
[Bugfix] Restore CUDA graph persistent buffers for FP8 FlashMLA decode (#35175)
Signed-off-by: haosdent <haosdent@gmail.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-03-26 16:13:39 +00:00 |
|
Chuan (Richard) Li
|
cb2263218e
|
[Bugfix][Minor] Fix potential NameError in mamba backend selector and misc typos (#35886)
Signed-off-by: Li <chuali@amd.com>
|
2026-03-26 11:59:24 -04:00 |
|
Wentao Ye
|
e054f152fa
|
[CI] Add batch invariant test for b200 (#38014)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-03-26 11:54:54 -04:00 |
|
zhang-prog
|
0f5b526040
|
[Fix] Remove unused packing_position_embedding from PaddleOCRVL for better checkpoint compatibility (#38232)
Signed-off-by: zhangyue66 <zhangyue66@baidu.com>
|
2026-03-26 15:34:49 +00:00 |
|
Zhewen Li
|
be1a85b7a2
|
Revert "[MoE Kernel] Flashinfer nvfp4 cutedsl moe kernel integration" (#38050) (#38169)
Co-authored-by: Zhewen Li <zhewenli@inferact.ai>
|
2026-03-26 07:59:09 -07:00 |
|
Cyrus Leung
|
2e225f7bd2
|
[Renderer] Consolidate factory methods (#38218)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-03-26 12:19:22 +00:00 |
|
Jared Wen
|
757eafcf37
|
[bug-fix] GLM OCR Patch Merger context_dim (#37962)
Signed-off-by: JaredforReal <w13431838023@gmail.com>
|
2026-03-26 05:11:21 -07:00 |
|
wang.yuqi
|
dcdc145893
|
[CI] Reorganize scoring tests (#38207)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2026-03-26 12:07:01 +00:00 |
|
Andreas Karatzas
|
f2d16207c7
|
[ROCm][CI] Fix flaky GPTQ compile correctness test (#38161)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-26 19:57:00 +08:00 |
|
Andreas Karatzas
|
37a83007fe
|
[ROCm][CI] Fix wvSplitKrc mock argument order in test_rocm_unquantized_gemm (#38167)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-26 19:54:59 +08:00 |
|
Wentao Ye
|
bf5eec638d
|
[Refactor] Remove unused utils (#38153)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-03-26 17:08:19 +08:00 |
|
Mateusz Sokół
|
b1cb1d3d2c
|
DOC: Documentation pages fixes (#38125)
Signed-off-by: Mateusz Sokół <mat646@gmail.com>
|
2026-03-26 16:55:42 +08:00 |
|
Kunshang Ji
|
6ae8bbd0c2
|
[XPU] Disable xpu graph by default (#38193)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
|
2026-03-26 01:53:45 -07:00 |
|
Cyrus Leung
|
a9213c0ffe
|
[Doc] Fix outdated reference to CUDAGraphManager (#38209)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-03-26 01:52:38 -07:00 |
|
Cyrus Leung
|
502c41a8f6
|
[Model] Use helper function to run MM processors with token inputs (where applicable) (#38018)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-03-26 16:44:04 +08:00 |
|
Vadim Gimpelson
|
52069012fe
|
[Bugfix] Fix DeepGemm E8M0 accuracy degradation for Qwen3.5 FP8 on Blackwell (#38083)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
|
2026-03-26 01:21:47 -07:00 |
|
Fadi Arafeh
|
71161e8b63
|
[cpu][ci] remove soft-fail for Arm CI and add quant model tests (#37691)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
|
2026-03-26 07:03:31 +00:00 |
|
Terry Gao
|
38de822310
|
[Model] Add torch.compile support for InternVL vision encoder (#38049)
Signed-off-by: tianrengao <terrygao87@gmail.com>
|
2026-03-25 23:52:29 -07:00 |
|
Jee Jee Li
|
2bfbdca23c
|
[Bugfix] Fix benchmark_fused_collective.py (#38082)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2026-03-25 23:51:00 -07:00 |
|
Matej Rojec
|
2908094567
|
Add /v1/chat/completions/batch endpoint for batched chat completions (#38011)
Signed-off-by: Matej Rojec <64556640+MatejRojec@users.noreply.github.com>
|
2026-03-26 12:13:33 +08:00 |
|
BadrBasowid
|
e6bf9f15ec
|
[Bugfix][CI] Fix Marlin FP8 Linear Kernel for Compressed Tensors Format (#38092)
Signed-off-by: BadrBasowid <Badr.Basowid@gmail.com>
Signed-off-by: BadrBasowid <61441185+BadrBasowid@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-03-25 21:11:43 -07:00 |
|
Woosuk Kwon
|
144030c84e
|
Relocate Encoder CUDA graph manager (#38116)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
|
2026-03-25 20:52:12 -07:00 |
|
Flora Feng
|
e2db2b4234
|
[Tool Parser][1/3] Pass tools to ToolParser constructor (#38029)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
|
2026-03-26 10:29:06 +08:00 |
|
Chauncey
|
87f05d6880
|
[Revert] Remove DeepGEMM availability check in DeepseekV32IndexerMetadataBuilder (#38076)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2026-03-26 01:43:51 +00:00 |
|
Andreas Karatzas
|
36f6aede23
|
[Misc] Optimized check to encapsulate both CUDA and ROCm platforms (#34549)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-26 09:43:07 +08:00 |
|
Xin Yang
|
9704a5c310
|
Disable dual stream execution of input projection for Qwen3 (#38152)
Signed-off-by: Xin Yang <xyangx@amazon.com>
|
2026-03-26 01:20:39 +00:00 |
|
Wei Zhao
|
74056039b7
|
Fix minimax m2.5 nvfp4 kv scales weight loading (#37214)
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
|
2026-03-26 00:48:06 +00:00 |
|
Jacob Platin
|
d7d51a7ee5
|
[Bugfix] Fix Qwen3.5-FP8 Weight Loading Error on TPU (#37348)
Signed-off-by: Jacob Platin <jacobplatin@google.com>
|
2026-03-26 00:46:01 +00:00 |
|
Harry Mellor
|
3c3c084240
|
Various Transformers v5 fixes (#38127)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-26 00:10:08 +00:00 |
|
Ekagra Ranjan
|
7b54f60db0
|
[Cohere] Enable Cohere-Transcribe (#38120)
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
|
2026-03-25 16:13:51 -07:00 |
|
Rohan Potdar
|
a0e8c74005
|
[ROCm]: Update rope+kvcache fusion conditions and disable custom op by default (#36716)
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
|
2026-03-25 20:58:44 +00:00 |
|
Guillaume Guy
|
70a2152830
|
[MultiModal] add support for numpy array embeddings (#38119)
Signed-off-by: guillaume_guy <guillaume.guy@airbnb.com>
Signed-off-by: Guillaume Guy <guillaume.c.guy@gmail.com>
Co-authored-by: guillaume_guy <guillaume.guy@airbnb.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2026-03-25 20:13:04 +00:00 |
|
Sathish Sanjeevi
|
978fc18bf0
|
[ROCm] Utilize persistent MLA kernel from AITER (#36574)
Signed-off-by: Sathish Sanjeevi <sathish.krishnan.p.s@gmail.com>
|
2026-03-26 03:00:42 +08:00 |
|
Andreas Karatzas
|
7d6917bef5
|
[ROCm] Fix MoE kernel test failures on gfx950 (#37833)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Matthew Wong <Matthew.Wong2@amd.com>
|
2026-03-25 13:46:40 -05:00 |
|