Andreas Karatzas
|
2ff3e436ad
|
[Responses][CI] Filter negative token IDs in schema fuzz test to avoid 500 errors (#35231)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-02-25 05:52:44 +00:00 |
|
Jhao-Ting Chen
|
c2c4c4611a
|
[FIX] fused moe with lora shared expert dual stream (1.07x otps) (#34933)
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2026-02-25 04:40:45 +00:00 |
|
Rohan Potdar
|
f38f8c9742
|
[ROCm]: Enable customop and rope+kvcache fusion for AITER RoPE (#35180)
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
|
2026-02-25 04:36:40 +00:00 |
|
Flora Feng
|
ec1d30c0f6
|
[Responses] Decouple SSE event helpers from Harmony context (#35148)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
|
2026-02-24 20:05:25 -08:00 |
|
Pooya Davoodi
|
e3b2324ec4
|
[Frontend] Use init_app_state and FrontendArgs in run_batch (#32967)
Signed-off-by: Pooya Davoodi <pooya.davoodi@parasail.io>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2026-02-24 19:40:39 -08:00 |
|
Nick Hill
|
dbf0da817a
|
[Core] Cleanup engine pause/sleep logic (#34528)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-02-24 19:33:34 -08:00 |
|
Xin Yang
|
3bbb2046ff
|
[Bugfix] Fix expert_ids padding values in moe_align_block_size kernel (#35161)
Signed-off-by: Xin Yang <xyangx@amazon.com>
|
2026-02-24 17:14:24 -08:00 |
|
yugong333
|
576fe50333
|
Adding Nemotron fp8 Triton MoE Config (#34674)
Signed-off-by: Yu Gong <yu3.gong@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2026-02-24 15:56:38 -08:00 |
|
Hashem Hashemi
|
a0e50a4260
|
Convert wvSplitKQ to 16x16 MFMA in prep for mi4xx. (#34100)
Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com>
|
2026-02-24 23:35:21 +00:00 |
|
Benjamin Chislett
|
9fa5b25a23
|
[Bug][DSV3.2] Always prepare metadata for DeepGEMM Sparse Attention (#35075)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
|
2026-02-24 14:55:22 -08:00 |
|
Robert Shaw
|
ea97750414
|
[CI] Fix Distributed Tests (#35236)
Signed-off-by: Robert Shaw <robertgshaw2@gmail.com>
|
2026-02-24 22:31:56 +00:00 |
|
Andreas Karatzas
|
067c5d9ad1
|
[ROCm][CI] Added MI325 mirrors (#34923)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-02-24 13:37:15 -08:00 |
|
Benjamin Chislett
|
f5972a872f
|
[Model][Spec Decode] Nemotron-H MTP and Mamba Speculative Decoding Support (#33726)
Signed-off-by: Shahar Mor <smor@nvidia.com>
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Shahar Mor <smor@nvidia.com>
Co-authored-by: Roi Koren <roik@nvidia.com>
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2026-02-24 09:49:56 -08:00 |
|
Matthew Bonanni
|
a9e15e040d
|
Add @MatthewBonanni to CODEOWNERS (#35207)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-02-24 10:45:10 -07:00 |
|
Lucas Wilkinson
|
542ca66357
|
Revert "[CI/Build] Remove redundant OpenTelemetry pip install from CI configs" (#35211)
|
2026-02-24 09:26:42 -08:00 |
|
Cyrus Leung
|
fc8456c336
|
[CI/Build] Fix kernels test location (#35205)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-24 09:20:34 -08:00 |
|
Wentao Ye
|
9ce8fad2a9
|
[Perf] Optimize Python Slice for Structured Output using islice instead of [:] (#33593)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-02-24 09:02:36 -08:00 |
|
Harry Mellor
|
c38b8d5a31
|
Remove padding_index from models that don't use it for better Transformers v5 compatibility (#35189)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-02-24 08:04:46 -08:00 |
|
Robert Shaw
|
60da0e1544
|
[CI] Remove Duplicated Tests (#35199)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
|
2026-02-24 23:53:30 +08:00 |
|
danisereb
|
9609b1f18d
|
Integrate flashinfer mm_mxfp8 in ModelOpt MXFP8 (#35053)
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>
|
2026-02-24 08:45:13 -07:00 |
|
danisereb
|
a0c7081695
|
Fix fallback to default tactic (flashinfer autotuner) with trtllm_fp4_block_scale_moe (#35088)
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>
|
2026-02-24 07:25:44 -08:00 |
|
R3hankhan
|
34ce0ffd1f
|
[CPU][Perf] Accelerate Attention head for s390x using vector intrinsics (#34434)
Signed-off-by: Rehan Khan <Rehan.Khan7@ibm.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
|
2026-02-24 07:25:39 -08:00 |
|
Robin Nabel
|
0de5333989
|
Fix GLM4 parser tests (#34905)
Signed-off-by: Robin Nabel <opensource@nabel.co>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
|
2026-02-24 22:27:42 +08:00 |
|
Eldar Kurtić
|
a87cc50859
|
[Attn,KV-cache] Use per-head scales in the attention selector (#34281)
Signed-off-by: Your Name <you@example.com>
Signed-off-by: Eldar Kurtic <research@neuralmagic.com>
Co-authored-by: Eldar Kurtic <research@neuralmagic.com>
Co-authored-by: Your Name <you@example.com>
|
2026-02-24 09:02:43 -05:00 |
|
Cyrus Leung
|
761e63e541
|
[Frontend] Always pass supported_tasks to validation (#35186)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-24 04:16:33 -08:00 |
|
Isotr0py
|
d12d201409
|
[Bugfix] Fix failing FunASR processor test (#35111)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-02-24 04:13:45 -08:00 |
|
eustlb
|
b3ad37c5db
|
[glm-asr] change defaults dummy audio size (#35108)
Signed-off-by: Eustache Le Bihan <eulebihan@gmail.com>
|
2026-02-24 04:13:33 -08:00 |
|
Wentao Ye
|
14561fabfd
|
[Perf] Optimize pooling model redundant copy, 1.8% throughput improvement (#35127)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-02-24 04:13:11 -08:00 |
|
Zhengxu Chen
|
c77f3e1207
|
[compile] Save aot compile artifacts atomically. (#35117)
Signed-off-by: zhxchen17 <zhxchen17@fb.com>
|
2026-02-24 04:11:01 -08:00 |
|
Dor Huri
|
012dee9233
|
[Feature] Add LoRA tower/connector support for Llama 4 Vision (mllama4) (#35147)
Signed-off-by: dorhuri123 <dor.huri1@live.biu.ac.il>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
|
2026-02-24 04:10:32 -08:00 |
|
Tugsbayasgalan Manlaibaatar
|
f1c664545b
|
Make voxtral compile friendly (#33959)
Signed-off-by: Tugsbayasgalan Manlaibaatar <tmanlaibaatar@fb.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
|
2026-02-24 09:33:35 +01:00 |
|
Xin Yang
|
c870eb9e0f
|
[LoRA] Update LoRA expand kernel block_n calculation (#32621)
Signed-off-by: Xin Yang <xyangx@amazon.com>
|
2026-02-23 23:17:53 -08:00 |
|
BadrBasowid
|
6af03f2394
|
[Refactor] [1/N] Reorganize kernel abstraction directory (#34055)
Signed-off-by: BadrBasowid <badr.basowid@gmail.com>
Co-authored-by: vllmellm <vllm.ellm@embeddedllm.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
|
2026-02-24 06:47:22 +00:00 |
|
Vlad Tiberiu Mihailescu
|
1a6cf39dec
|
[CI/Build] Remove redundant OpenTelemetry pip install from CI configs (#35032)
Signed-off-by: Vlad Mihailescu <vtmihailescu@gmail.com>
|
2026-02-23 22:24:11 -08:00 |
|
Nicolò Lucchesi
|
f91808ae0d
|
[MM] Allow audio chunking for offline LLM (#34628)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2026-02-23 21:04:28 -08:00 |
|
Vadim Gimpelson
|
33a0d43c71
|
[BUGFIX][Qwen3.5] Hardcode mlp.gate as not quantizable (#35156)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
|
2026-02-23 19:42:24 -08:00 |
|
pschlan-amd
|
80d93fd6da
|
gpu_model_runner: Cache is_encoder_decoder from model config (#35099)
Signed-off-by: Patrick Schlangen <pschlan@amd.com>
|
2026-02-23 19:08:34 -08:00 |
|
Jia Guo
|
ec85340531
|
[Quantization] Support FP8 MoE bias for models like GPT-OSS (#34906)
Signed-off-by: jasperjiaguo <jasperg662@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-02-23 19:07:47 -08:00 |
|
Rohan Potdar
|
2ff4e51152
|
[ROCm] AITER fused RoPE+KVCache (#33443)
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
Signed-off-by: charlifu <charlifu@amd.com>
Signed-off-by: Rohan Potdar <66227218+Rohan138@users.noreply.github.com>
Co-authored-by: charlifu <charlifu@amd.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Douglas Lehr <91553416+dllehr-amd@users.noreply.github.com>
|
2026-02-23 19:06:00 -08:00 |
|
Asaf Gardin
|
95642441d0
|
[Mamba1] - Change supports_update_block_table to True (#35054)
Signed-off-by: Josephasafg <ajgard7@gmail.com>
|
2026-02-23 19:05:57 -08:00 |
|
Xin Yang
|
a7c9f7b7ec
|
[Bugfix] Fix lora_ids in FusedMoE LoRA test (#35135)
Signed-off-by: Xin Yang <xyangx@amazon.com>
|
2026-02-23 21:49:25 -05:00 |
|
Michael Goin
|
a4bd661fb3
|
[Perf] Enable FlashInfer DeepGEMM swapAB on SM90 by default (#34924)
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2026-02-23 17:34:41 -08:00 |
|
Michael Goin
|
3ef9fd0f98
|
[Bugfix] Fix DSV3 kernels breaking _C and _moe_C on unsupported arches (#35123)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2026-02-23 17:11:27 -08:00 |
|
Michael Goin
|
22a97e6613
|
[Perf] Improve default triton fused moe configs (#34846)
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2026-02-23 16:01:28 -08:00 |
|
Aaron Hao
|
596ed1f02e
|
[RL] Validation for pause_mode='keep' (#34992)
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
|
2026-02-23 16:30:56 -05:00 |
|
Nicolò Lucchesi
|
b8d8b7e934
|
[Misc] Monitor interface changes (#35113)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2026-02-23 17:14:51 +00:00 |
|
Harry Mellor
|
28c5e69ba0
|
Enforce that model is the first positional arg when --served-model-name is used (#34973)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-02-23 08:38:05 -08:00 |
|
Harry Mellor
|
864167d376
|
Fix custom processors that use deleted import for Transformers v5 (#35101)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-02-23 08:38:00 -08:00 |
|
haosdent
|
a2ba6a5244
|
[Bugfix] Fix prefix caching for Mamba 'all' mode (Nemotron models) (#34874)
Signed-off-by: haosdent <haosdent@gmail.com>
|
2026-02-23 17:31:51 +01:00 |
|
Harry Mellor
|
c4f38696f7
|
Use Xet high performance mode for Transformers v5 (#35098)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-02-23 08:19:30 -08:00 |
|