Vadim Gimpelson
|
7624525bf6
|
cherry-pick [Bugfix] Restore prepare_fp8_layer_for_marlin removed by merge conflict resolution
Signed-off-by: khluu <khluu000@gmail.com>
Co-authored-by: vadiklyutiy <vgimpelson@nvidia.com>
#38398
|
2026-03-27 14:49:47 -07:00 |
|
khluu
|
9fdc0f3aeb
|
merge
Signed-off-by: khluu <khluu000@gmail.com>
|
2026-03-26 02:17:52 -07:00 |
|
Vadim Gimpelson
|
05d96d7991
|
merge
Signed-off-by: khluu <khluu000@gmail.com>
|
2026-03-26 01:25:41 -07:00 |
|
Dimitrios Bariamis
|
ccbc5ac449
|
[Bugfix] Fix mock.patch resolution failure for standalone_compile.FakeTensorMode on Python <= 3.10 (#37158)
Signed-off-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com>
Co-authored-by: Dimitrios Bariamis <12195802+dbari@users.noreply.github.com>
(cherry picked from commit 1204cf0a9d)
|
2026-03-24 17:59:17 -07:00 |
|
khluu
|
bcf2be9612
|
[cherry-pick][Bugfix] Disable monolithic TRTLLM MoE for Renormalize routing (#37591) #37605
Signed-off-by: khluu <khluu000@gmail.com>
|
2026-03-19 15:06:38 -07:00 |
|
Elvir Crnčević
|
89138b21cc
|
[Bugfix] Zero-init MLA attention output buffers to prevent NaN from CUDA graph padding (#37442)
Signed-off-by: Elvir Crncevic <elvircrn@gmail.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
(cherry picked from commit ef2c4f778d)
|
2026-03-18 18:44:16 -07:00 |
|
JartX
|
6edd43de3c
|
[Bugfix][ROCm] Fix worker startup OOM on ROCm by skipping unreliable cudagraph memory profiling (#36720)
Signed-off-by: JartX <sagformas@epdcenter.es>
(cherry picked from commit e8f9dbc369)
|
2026-03-18 18:43:52 -07:00 |
|
khluu
|
262ddd0d81
|
[cherry-pick][Bugfix] Fix EP weight filter breaking EPLB and NVFP4 accuracy #37322
Signed-off-by: khluu <khluu000@gmail.com>
|
2026-03-18 01:48:32 -07:00 |
|
Li, Jiang
|
e60c1674b3
|
[Bugfix] Avoid OpenMP thread reallocation in CPU torch compile (#37391)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
(cherry picked from commit 261801242f)
|
2026-03-18 01:41:42 -07:00 |
|
Roy Wang
|
faa80947f5
|
[Performance] Add --enable-ep-weight-filter CLI option (#37351)
Signed-off-by: esmeetu <jasonailu87@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
(cherry picked from commit 761e0aa7a0)
|
2026-03-18 01:41:25 -07:00 |
|
Terry Gao
|
eeabf740bb
|
[Custom Ops] Add functional + out variant for scaled_fp4_quant (#34389)
Signed-off-by: tianrengao <terrygao87@gmail.com>
(cherry picked from commit 3e6a1e1686)
|
2026-03-18 01:41:09 -07:00 |
|
Elvir Crnčević
|
cdcffafef8
|
Fix eplb nvfp4 experts hook (#37217)
Signed-off-by: Elvir Crncevic <elvircrn@gmail.com>
Signed-off-by: Elvir Crncevic <elvir@anthropic.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
(cherry picked from commit fd4d96302a)
|
2026-03-18 01:40:57 -07:00 |
|
Walter Beller-Morales
|
4d22667c32
|
[Feature][Frontend] add support for Cohere Embed v2 API (#37074)
Signed-off-by: walterbm <walter.beller.morales@gmail.com>
(cherry picked from commit 061980c36a)
|
2026-03-16 22:05:47 -07:00 |
|
Andreas Karatzas
|
1fe3932c8b
|
[ROCm] Fix AttributeError for torch.compiler.skip_all_guards_unsafe on older PyTorch (#37219)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
(cherry picked from commit 54a62a79f7)
|
2026-03-16 21:03:49 -07:00 |
|
zhanqiuhu
|
2dccb38f73
|
[Bugfix][MultiConnector] Fix MultiConnector for SupportsHMA sub-connectors (#36549)
|
2026-03-16 20:51:04 +00:00 |
|
Kunshang Ji
|
d157216093
|
[BUGFIX][Mamba] Use uint64 for address in KVBlockZeroer (#37197)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
|
2026-03-16 21:39:56 +01:00 |
|
Matthew Bonanni
|
93f3c8e531
|
[Misc] Add float16 to CacheDType (#37199)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-03-16 13:24:48 -07:00 |
|
Flora Feng
|
dfa8852db2
|
[Refactor] Consolidate GPT-OSS reasoning parser tests (#36915)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
Signed-off-by: Flora Feng <4florafeng@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-03-16 15:53:07 -04:00 |
|
Lucas Kabela
|
714c6e0eab
|
[torch.compile][BE] Modify cudagraph callable to check for is_forward_context_set (#36288)
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
|
2026-03-16 19:42:34 +00:00 |
|
Sage
|
0fefd00e6c
|
[Bugfix] Fix render server crash for quantized models on CPU-only hosts (#37215)
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
|
2026-03-16 18:59:01 +00:00 |
|
Nicolò Lucchesi
|
f5c081d432
|
[PD][Nixl] Add support for hybrid SSM-FA models (#36687)
|
2026-03-16 19:58:06 +01:00 |
|
Matthew Bonanni
|
c88ea8338b
|
[MTP][Sparse MLA] Take advantage of native MTP support in indexer when possible (#36982)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-03-16 13:51:21 -04:00 |
|
Max de Bayser
|
9f9ecff4cd
|
Add simple granite4 tool parser (#36827)
Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
|
2026-03-16 10:49:09 -07:00 |
|
haosdent
|
ca1954d58c
|
[Bugfix] Disable cross-layer KV cache for MLA attention backends (#37090)
Signed-off-by: haosdent <haosdent@gmail.com>
Co-authored-by: Or Ozeri <oro@il.ibm.com>
|
2026-03-16 19:03:10 +02:00 |
|
Chauncey
|
6682c231fa
|
[Bugfix] Add error handling for FINISHED_ERROR in OpenAIServing (#37148)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2026-03-16 16:27:47 +00:00 |
|
Itay Etelis
|
5ae685c1c8
|
[Bugfix] Relax TRTLLM KV cache contiguity assertion for cross-layer layout (#34158)
Signed-off-by: Itay Etelis <itay.etelis@ibm.com>
Co-authored-by: Itay Etelis <itay.etelis@ibm.com>
|
2026-03-16 11:20:51 -04:00 |
|
Yuanheng Zhao
|
8d8855fdae
|
[Bugfix] Add safety check and fallback for null scaling factor (#36106)
Signed-off-by: Yuanheng Zhao <jonathan.zhaoyh@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-16 14:27:29 +00:00 |
|
Benjamin Bartels
|
0e5a9382af
|
[Bugfix] accept redacted thinking blocks in Anthropic messages (#36992)
Signed-off-by: Benjamin Bartels <benjaminba@tiglab-ubuntu.ilab.local>
Signed-off-by: bbartels <benjamin@bartels.dev>
Co-authored-by: Benjamin Bartels <benjaminba@tiglab-ubuntu.ilab.local>
|
2026-03-16 22:01:57 +08:00 |
|
Fynn Schmitt-Ulms
|
04bf5a35fa
|
[Spec Decode] Update extract_hidden_states to use deferred kv_connector clear (#37013)
|
2026-03-16 14:53:45 +01:00 |
|
Tianyu Guo
|
43a73f853b
|
Remove unused EVS functions in qwen3_vl.py (#37183)
Signed-off-by: Tianyu Guo <guoty9@mail2.sysu.edu.cn>
|
2026-03-16 13:09:09 +00:00 |
|
Julien Denize
|
ffbc2e5bdb
|
Patch Mistral config (#37104)
Signed-off-by: juliendenize <julien.denize@mistral.ai>
|
2026-03-16 12:22:18 +00:00 |
|
Lukas Geiger
|
f9e6db3034
|
[Models][Qwen3 ViT] Keep max_seqlen on CPU to prevent D2H sync (#37139)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-03-16 12:11:59 +00:00 |
|
Artem Perevedentsev
|
f5e59ee7a6
|
[Performance] Add prefetch for checkpoints to OS page cache (#36012)
Signed-off-by: Artem Perevedentsev <aperevedents@nvidia.com>
|
2026-03-16 11:32:02 +00:00 |
|
Robin Nabel
|
bf9a185395
|
GLM4 tool parser: fix streaming mode (#35208)
Signed-off-by: Robin Nabel <opensource@nabel.co>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
|
2026-03-16 18:48:52 +08:00 |
|
Harry Mellor
|
ad041c79db
|
Fix text only inputs for MRoPE models with the Transformers modelling backend (#37055)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-16 10:31:16 +00:00 |
|
Kunshang Ji
|
747b068136
|
[Hardware] Replace memory related torch.cuda APIs (#37031)
Signed-off-by: Kunshang Ji <jikunshang95@gmail.com>
|
2026-03-16 10:24:48 +00:00 |
|
Harry Mellor
|
122f75d939
|
Fix pipeline parallel with multimodal models with the Transformers modelling backend (#37057)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-16 10:20:37 +00:00 |
|
Roy Wang
|
0115e957d4
|
[Frontend][Misc] Remove unused log in /is_sleeping (#37093)
Signed-off-by: esmeetu <jasonailu87@gmail.com>
|
2026-03-16 17:46:28 +08:00 |
|
haosdent
|
116ed130f4
|
[Bugfix] Fix GDN attention crash with mixed decode/spec-decode batches (#34871)
Signed-off-by: haosdent <haosdent@gmail.com>
|
2026-03-16 10:30:23 +01:00 |
|
Vadim Gimpelson
|
8374387bd8
|
[FlashInfer] Revert block_size 16 + head_size 256 workaround on Blackwell (#36987)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
|
2026-03-16 09:04:29 +00:00 |
|
Isotr0py
|
912fbe9555
|
[Bugfix] Fix Qwen2.5-Omni/Qwen3-Omni use_audio_in_video with multi-video inputs (#37147)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-03-16 08:56:06 +00:00 |
|
Laith Sakka
|
52131f88d9
|
use skip_all_guards_unsafe to drop global_state and torch_function_mode_stack guards instead of previous hacks (#36204)
Signed-off-by: Laith Sakka <lsakka@meta.com>
|
2026-03-16 08:52:31 +00:00 |
|
Roy Wang
|
821eb80c0d
|
[Performance][Model Loader] Skip non-local expert weights during EP model loading (#37136)
Signed-off-by: esmeetu <jasonailu87@gmail.com>
|
2026-03-16 01:33:36 -07:00 |
|
Andreas Karatzas
|
911355e216
|
[ROCm] Fix KV copy methods and auto-select attention backend for ROCm (#36845)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-16 16:07:27 +08:00 |
|
Chauncey
|
8d3f8f485e
|
[Bugfix] fix Qwen3.5 tool calling bug (#36774)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2026-03-16 15:38:42 +08:00 |
|
Woosuk Kwon
|
96efb91480
|
[Model Runner V2] Fix processed logits in sample() (#37144)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
|
2026-03-16 00:35:49 -07:00 |
|
leo-cf-tian
|
2754231ba3
|
[Kernel] Add FlashInfer MoE A2A Kernel (#36022)
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
Signed-off-by: Leo Tian <lctian@nvidia.com>
Co-authored-by: wzhao18 <wzhao18.sz@gmail.com>
Co-authored-by: Stefano Castagnetta <scastagnetta@nvidia.com>
Co-authored-by: root <root@lyris0267.lyris.clusters.nvidia.com>
|
2026-03-15 23:45:32 -07:00 |
|
bigshanedogg
|
2390d44209
|
[Model] Add HyperCLOVAX-SEED-Think-14B language model support (#37107)
Signed-off-by: bigshanedogg <bigshane319@gmail.com>
|
2026-03-16 06:40:05 +00:00 |
|
Li, Jiang
|
7362b4450a
|
[Bugfix] Avoid LD_PRELOAD check on MacOS (#37145)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2026-03-15 23:31:44 -07:00 |
|
Andreas Karatzas
|
57a314d155
|
[CI][Bugfix] Fix 500 errors from priority overflow and TemplateError subclasses in schema fuzz tests (#37127)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-16 05:27:21 +00:00 |
|