Micah Williamson
|
0edf101d2b
|
[ROCm] Add stablelm Head Size 80 To Supported Head Sizes For ROCM_ATTN (#35527)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
|
2026-02-28 12:16:34 +08:00 |
|
Douglas Lehr
|
d5b6f3ba36
|
[ROCm][Quantization] Add Composable Kernel (CK) backend support for M… (#34301)
Signed-off-by: Doug Lehr <douglehr@amd.com>
Signed-off-by: Douglas Lehr <91553416+dllehr-amd@users.noreply.github.com>
Signed-off-by: Douglas Lehr <Doug.Lehr@amd.com>
Co-authored-by: Doug Lehr <douglehr@amd.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Rohan Potdar <66227218+Rohan138@users.noreply.github.com>
|
2026-02-28 03:37:01 +00:00 |
|
Woosuk Kwon
|
1a014a0a93
|
[Model Runner V2] Move MM encoder to Model States [3/N] (#35564)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
|
2026-02-27 18:32:38 -08:00 |
|
Woosuk Kwon
|
86ac7bcf84
|
[Model Runner V2] Support pooling models (#35120)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
|
2026-02-27 18:03:01 -08:00 |
|
Umut Polat
|
405f28d38d
|
[Misc] Clean up ResponsesRequest model validators (#35531)
Signed-off-by: umut-polat <52835619+umut-polat@users.noreply.github.com>
|
2026-02-28 01:19:21 +00:00 |
|
youkaichao
|
5323672bc2
|
[misc] cleanup one level of error stack when nixl fails to initialize (#35517)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2026-02-28 08:42:37 +08:00 |
|
Roberto L. Castro
|
a201ad72d8
|
[Refactor][Kernel] Add global helper to deduplicate vectorized memory ops (#35105)
Signed-off-by: LopezCastroRoberto <rocastro@redhat.com>
Signed-off-by: LopezCastroRoberto <roberto.lopez.castro@udc.es>
Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com>
|
2026-02-27 16:28:17 -08:00 |
|
Rohan Potdar
|
e3691988d0
|
[ROCm]: fix aiter rope functionalization (#35533)
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
|
2026-02-27 22:42:30 +00:00 |
|
Gregory Shtrasberg
|
9fa6c68fa6
|
[ROCm] Enabling encoder and encoder-decoder on ROCm and AITER unified backends (#35334)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
|
2026-02-27 21:32:55 +00:00 |
|
Aaron Hao
|
2ce6f3cf67
|
[Feat][RL][2/2] Native Weight Syncing API: IPC (#34171)
Signed-off-by: hao-aaron <ahao@anyscale.com>
Signed-off-by: Aaron Hao <ahao@anyscale.com>
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
|
2026-02-27 13:45:21 -07:00 |
|
Jakub Zakrzewski
|
1f3dbd95fd
|
[Bugfix][Model] Fix gpt-oss batch invariance (#35404)
Signed-off-by: Jakub Zakrzewski <jzakrzewski@nvidia.com>
|
2026-02-27 20:41:24 +00:00 |
|
Lucas Wilkinson
|
1d532f9d8f
|
[DP] Only use DP padding when cudagraphs are actually used (#34102)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2026-02-27 15:14:31 -05:00 |
|
Lucas Kabela
|
234a65b781
|
[Bugfix] Add monkeypatch to prevent race condition from writing (#35420)
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
|
2026-02-27 14:51:36 -05:00 |
|
SteadfastAsArt
|
2decec9856
|
[Transformers backend] Ignore MTP weights when num_nextn_predict_layers=0 (#34888)
Signed-off-by: SteadfastAsArt <695488173@qq.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-02-27 19:39:23 +00:00 |
|
Zhengxu Chen
|
29b35477b0
|
[compile] Fix caching error over pytree slice node. (#35308)
Signed-off-by: zhxchen17 <zhxchen17@fb.com>
|
2026-02-27 19:34:16 +00:00 |
|
Nick Hill
|
b1d9f5372d
|
[Model Runner V2] Warmup kernels (#35172)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-02-27 10:43:30 -08:00 |
|
Raushan Turganbay
|
fd6de37fca
|
[BugFix] Fix 3D rope in transformers backend (#35097)
Signed-off-by: raushan <raushan@huggingface.co>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-02-27 18:34:49 +00:00 |
|
Netanel Haber
|
c8aca0c9e1
|
Support parakeet as audio encoder for nemotron-nano-vl (#35100)
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2026-02-27 11:07:38 -07:00 |
|
Martin Hickey
|
b602e4f299
|
[Doc] Fix link to Llama chat template for usability (#35525)
Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2026-02-27 17:51:09 +00:00 |
|
Huamin Li
|
157722da75
|
[perf] Use pinned memory for async H2D transfer in do_mamba_copy_block (#35480)
Signed-off-by: Huamin Li <3ericli@gmail.com>
|
2026-02-28 01:50:37 +08:00 |
|
Nick Hill
|
1d897ff04f
|
[Misc] Fill in some v1 CODEOWNERS gaps (#35524)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-02-27 09:34:37 -08:00 |
|
fort726
|
905d76b51d
|
[Model] Add huggingface skt/A.X-K1 model (#32407)
Signed-off-by: Sungwan(Alex) Kim <sw0726.kim@sktelecom.com>
Signed-off-by: fort726 <38447663+fort726@users.noreply.github.com>
Co-authored-by: Sungwan(Alex) Kim <sw0726.kim@sktelecom.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
|
2026-02-27 09:26:02 -08:00 |
|
Yanan Cao
|
9098ce690c
|
[Kernel] [Helion] [7/N] Use HOP to represent Helion Kernel call to enable fx tracing and pattern matching (#34390)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
|
2026-02-27 09:21:35 -08:00 |
|
Nick Hill
|
876312f0b5
|
[Core] Fix gpu_worker.py pre-commit errors (#35312)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-02-27 07:54:24 -08:00 |
|
Boyuan Feng
|
5de98abc12
|
Add @BoyuanFeng to CODEOWNERS (#35317)
Signed-off-by: Boyuan Feng <boyuan@meta.com>
|
2026-02-27 15:53:47 +00:00 |
|
Koushik Dutta
|
9251ed5c4f
|
[Bugfix] Handle case when kimi ends reasoning with a tool call (#33646)
Signed-off-by: Koushik Dutta <koushd@gmail.com>
Co-authored-by: mondaylord <20212010046@fudan.edu.cn>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2026-02-27 14:58:28 +00:00 |
|
Yueqian Lin
|
e8249378e4
|
[Bugfix] Fix check_interleaved_audio_video false positive for batched non-interleaved requests (#35487)
Signed-off-by: linyueqian <linyueqian@outlook.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2026-02-27 06:48:25 -08:00 |
|
haosdent
|
6d4f9d3ad5
|
[Bugfix] Fix DCP + FA3 crash due to missing num_splits in _forward_with_dcp (#35082)
Signed-off-by: haosdent <haosdent@gmail.com>
|
2026-02-27 22:27:06 +08:00 |
|
Harry Mellor
|
fbe3f0120a
|
Revert "Add GlmOcrConfig for GLM-OCR model type recognition" (#35512)
|
2026-02-27 06:13:27 -08:00 |
|
Jason Li
|
66c1751d13
|
[compile] Cleanup: Remove unnecessary +rms_norm forcing for sequence parallelism (#35410)
Signed-off-by: jasonlizhengjian <jasonlizhengjian@gmail.com>
|
2026-02-27 08:36:37 -05:00 |
|
Tib
|
6467b635b6
|
[Bugfix] Add missing activation attr to RMSNormGated (#35423)
Signed-off-by: tibG <naps@qubes.milou>
Co-authored-by: tibG <naps@qubes.milou>
|
2026-02-27 12:53:35 +00:00 |
|
Max Hu
|
9c3fe9936b
|
Flashinfer cuDNN backend for Qwen3 VL ViT attention (#34580)
Signed-off-by: Max Hu <maxhu@nvidia.com>
Signed-off-by: Max Hu <hyoung2991@gmail.com>
Co-authored-by: Max Hu <maxhu@nvidia.com>
Co-authored-by: Shang Wang <shangw@nvidia.com>
|
2026-02-27 20:20:23 +08:00 |
|
Umut Polat
|
b66a74649e
|
[Bugfix] Replace assert with ValueError for response_format validation in completions endpoint (#35456)
Signed-off-by: umut-polat <52835619+umut-polat@users.noreply.github.com>
|
2026-02-27 08:01:06 +00:00 |
|
Wang Xingran
|
07bdabef03
|
[Bugfix] Use 'sum' reduction instead of 'avg' in Async TP reduce-scatter (#33088)
Signed-off-by: Xingran Wang <wangxingran123456@outlook.com>
Signed-off-by: Hongjian Zhang <hirokenovo@gmail.com>
Co-authored-by: Hongjian Zhang <hirokenovo@gmail.com>
|
2026-02-27 07:06:08 +00:00 |
|
Chengyi Nie
|
a572baff5e
|
[Model Performance] Add Qwen3MoE tuned MoE configs for H200 (#35457)
Signed-off-by: Chengyi Nie <cnie@roblox.com>
Co-authored-by: Chengyi Nie <cnie@roblox.com>
|
2026-02-27 13:51:14 +08:00 |
|
zofia
|
516cf26698
|
[Bug] correct out dtype of rms_norm_gated native path (#35369)
Signed-off-by: Zhu, Zufang <zufang.zhu@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
|
2026-02-27 05:19:51 +00:00 |
|
Jiangyun Zhu
|
487e5c51f7
|
[Bugfix] disable allreduce_rms_fusion by default when pp size > 1 (#35424)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
|
2026-02-27 04:18:52 +00:00 |
|
Daniel Huang
|
1a8c71674e
|
[BugFix] Repo utils debug print patch (#35434)
Signed-off-by: Daniel Huang <daniel1.huang@intel.com>
|
2026-02-27 03:50:56 +00:00 |
|
Wentao Ye
|
062b789632
|
[Bug] Fix outdated links in source code (#35314)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-02-27 03:50:46 +00:00 |
|
gnovack
|
a532c83849
|
use 'max_active_experts' for moe lora input size (#33197)
Signed-off-by: gnovack <gnovack@amazon.com>
|
2026-02-27 03:50:43 +00:00 |
|
Jee Jee Li
|
1e5ad9b74f
|
[Bugfix] Fix Qwen3NextForCausalLM packed_modules_mapping (#35413)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2026-02-26 19:46:30 -08:00 |
|
Nicolò Lucchesi
|
cabdaa7619
|
[Misc] Move GPUModelRunner.prepare_kernel_block_sizes to utils (#35400)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2026-02-27 11:42:51 +08:00 |
|
Chenyaaang
|
06be53563b
|
[Core]Extract is_last_rank in Ray for tpu to override (#33012)
Signed-off-by: Chenyaaang <chenyangli@google.com>
|
2026-02-27 03:18:52 +00:00 |
|
Angela Yi
|
c29ee9c326
|
[compile] Invalidate cache for cpu flags (#35119)
Signed-off-by: angelayi <yiangela7@gmail.com>
|
2026-02-27 02:54:11 +00:00 |
|
daniel-salib
|
d43048ce05
|
[Bugfix] Emit reasoning_part events in simple streaming path for Resp… (#35184)
Signed-off-by: Daniel Salib <danielsalib@meta.com>
|
2026-02-27 09:49:06 +08:00 |
|
Michael Goin
|
4fec53cfcb
|
[CI] Actually run tests/kernels/quantization/test_block_fp8.py in CI (#34274)
|
2026-02-26 17:58:03 -07:00 |
|
roikoren755
|
38c498b8e3
|
[Performance] Cublas Bf16 Gate with Fp32 Output (#35121)
Signed-off-by: Roi Koren <roik@nvidia.com>
|
2026-02-26 16:51:28 -08:00 |
|
Andrii Skliar
|
56a6371706
|
[Update] Use FlashInfer fast_decode_plan directly instead of replication (#34687)
Signed-off-by: Andrii <askliar@nvidia.com>
Co-authored-by: Andrii <askliar@nvidia.com>
|
2026-02-26 16:31:43 -08:00 |
|
Pavani Majety
|
6283021142
|
[Bugfix] Fix KV Scale loading for MLA Models (#35430)
Signed-off-by: Pavani Majety <pmajety@nvidia.com>
|
2026-02-26 23:38:19 +00:00 |
|
Aleksandr Malyshev
|
01923eec70
|
[ROCm][Quantization] GPT OSS Upstream MoE wmxfp4_afp8 with static scales (#30357)
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com>
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>
|
2026-02-26 16:50:16 -06:00 |
|