Lucas Wilkinson | c7914d30f9 | 2026-02-11 07:07:56 -08:00
Reapply [Attention][FA3] Update FA3 to include new swizzle optimization (#34043)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>

Adam Binford | 1b8756562e | 2026-02-11 05:14:28 -08:00
Responses harmony system message structured (#34268)
Signed-off-by: Adam Binford <adamq43@gmail.com>

Linda | 275e0d2a99 | 2026-02-11 12:38:11 +00:00
[NVIDIA][test] Tests for flashinfer TRTLLM BF16 MoE (#33715)
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
Co-authored-by: Pavani Majety <pmajety@nvidia.com>

Harry Mellor | 0f5e55e7a8 | 2026-02-11 12:30:37 +00:00
Make JAIS compatible with Transformers v5 (#34264)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

Harry Mellor | 1e9204bff3 | 2026-02-11 04:13:23 -08:00
Make Qwen3VL compatible with Transformers v5 (#34262)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.io>

Li, Jiang | 05339a7b20 | 2026-02-11 19:07:23 +08:00
[Bugfix][CPU] Fix llama4 inference on CPU (#34321)
Signed-off-by: jiang1.li <jiang1.li@intel.com>

Harry Mellor | 40b8f55358 | 2026-02-11 02:56:02 -08:00
[Docs] Reduce time spent generating API docs (#34255)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

Seiji Eicher | 5045d5c983 | 2026-02-11 02:25:04 -08:00
Patch protobuf for CVE-2026-0994 (#34253)
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Co-authored-by: Kevin H. Luu <khluu000@gmail.com>

Nick Hill | e09546cf05 | 2026-02-11 11:03:24 +01:00
[Frontend] Exploit tokenizers "new stream" in FastIncrementalDetokenizer (#34217)
Signed-off-by: Nick Hill <nickhill123@gmail.com>

Tianqi Ren | 786806dd44 | 2026-02-11 09:03:41 +00:00
[Doc] Update Marlin support matrix for Turing (#34319)
Signed-off-by: Tianqi Ren <tianqi.r@outlook.com>

Nick Hill | 79504027ef | 2026-02-11 00:30:09 -08:00
[Misc] Bump fastsafetensors version for latest fixes (#34273)
Signed-off-by: Nick Hill <nickhill123@gmail.com>

Luka Govedič | addac0e653 | 2026-02-11 00:30:00 -08:00
[torch.compile] Enable AR+rms fusion by default available for -O2 (#34299)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>

Cyrus Leung | 675a22ed66 | 2026-02-11 00:29:51 -08:00
[Chore] Move BaseRenderer to base.py (#34308)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

Kunshang Ji | cb9574eb85 | 2026-02-11 00:27:15 -08:00
[XPU][9/N] clean up existing ipex code/doc (#34111)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>

AllenDou | 21dfb842d7 | 2026-02-11 07:37:09 +00:00
[model] support FunASR model (#33247)
Signed-off-by: zixiao <shunli.dsl@alibaba-inc.com>
Co-authored-by: zixiao <shunli.dsl@alibaba-inc.com>

R3hankhan | d1b837f0ae | 2026-02-11 14:41:42 +08:00
[CPU] Enable FP16 (Half dtype) support for s390x (#34116)
Signed-off-by: Rehan Khan <Rehan.Khan7@ibm.com>

Roger Wang | 0b20469c62 | 2026-02-10 21:37:14 -08:00
[Bugfix] Fix weight naming in Qwen3.5 (#34313)
Signed-off-by: Roger Wang <hey@rogerw.io>

Tyler Michael Smith | d7982daff5 | 2026-02-11 05:15:52 +00:00
[Bugfix] Fix fused MoE IMA (sans chunking) by using int64 for strides (#34279)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

Robert Shaw | 9b17c57460 | 2026-02-11 05:00:00 +00:00
[ModelBash][DSR1 NVFp4] Removed Bf16 Bias Cast (#34298)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>

Hashem Hashemi | 1b3540e6c6 | 2026-02-11 03:59:14 +00:00
Threshold fix wvSplitk for occasional CI fails (#34013)
Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com>

Matthias Gehre | 7a048ee65f | 2026-02-11 03:58:56 +00:00
[Bugfix] Fix benchmark_moe.py inplace assertion with torch >= 2.9 (#34149)
Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>

Cyrus Leung | c9a1923bb4 | 2026-02-10 19:47:39 -08:00
[Plugin] Simplify IO Processor Plugin interface (#34236)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

zofia | b482f71e9f | 2026-02-11 03:33:59 +00:00
[XPU][7/N] enable xpu fp8 moe (#34202)
Signed-off-by: Zhu, Zufang <zufang.zhu@intel.com>

Дзержи́нский | 1485396abb | 2026-02-10 19:31:51 -08:00
[Kernel] Apply 256bit LDG/STG To Activation Kernels (#33022)
Signed-off-by: Dzerzhinsky <256908701+AstroVoyager7@users.noreply.github.com>
Signed-off-by: Дзержи́нский <256908701+AstroVoyager7@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>

Kebe | 5ee5c86eeb | 2026-02-10 19:31:36 -08:00
[Bugfix][DeepSeek-V3.2] fix fp8 kvcache type cast (#33884)
Signed-off-by: Kebe <mail@kebe7jun.com>

Cyrus Leung | b5dcb372e4 | 2026-02-10 19:29:29 -08:00
[Misc] Clean up validation logic in input processor (#34144)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

Tyler Michael Smith | 066c6da6a0 | 2026-02-10 19:15:43 -08:00
[WideEP] Fix nvfp4 DeepEP High Throughput All2All backend (#33738)
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>

Richard Zou | e30cedd44b | 2026-02-10 19:15:40 -08:00
[torch.compile] Stop doing unnecessary FakeTensorProp in PiecewiseCompileInterpreter (#34093)
Signed-off-by: Richard Zou <zou3519@gmail.com>

Cyrus Leung | 3bcd494ef4 | 2026-02-11 11:10:12 +08:00
[Redo] Add --trust-remote-code to dataset bench args (#34251)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

tianshu-Michael-yu | 0e725a7d22 | 2026-02-11 11:07:51 +08:00
[Bugfix] Fix Worker.load_model context-manager composition for sleep mode (#34021)
Signed-off-by: tianshu.yu <tianshuyu.formal@gmail.com>

Lucas Wilkinson | ba0511fd80 | 2026-02-10 18:29:49 -08:00
[Misc] Add run one batch script that supports profiling (#32968)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>

Micah Williamson | 4a1550d22d | 2026-02-11 01:08:11 +00:00
[ROCm][CI] Fix test_sequence_parallel.py location in AMD CI pipeline (#34280)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>

bnellnm | d1481ba783 | 2026-02-10 19:51:07 -05:00
[MoE Refactor] Introduce MoERunner abstraction and move execution logic from FusedMoE to DefaultMoERunner (#32344)
Signed-off-by: Bill Nell <bnell@redhat.com>

7. Sun | dc6de33c3d | 2026-02-11 00:45:28 +00:00
[CI] Add pip caching to cleanup_pr_body workflow (#32979)
Signed-off-by: 7. Sun <jhao.sun@gmail.com>

Tyler Michael Smith | c4b9e6778f | 2026-02-10 15:13:20 -08:00
[Misc] Add pre-commit hook to catch boolean ops in with-statements (#34271)
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

Richard Zou | 341eed3d30 | 2026-02-10 18:02:31 -05:00
[torch.compile] Disable recursive pre_grad_passes (#34092)
Signed-off-by: Richard Zou <zou3519@gmail.com>

Zhengkai Zhang | 6f2f59f2b3 | 2026-02-10 14:52:43 -08:00
[Misc][Spec Decode] support different load config for draft model (#34022)
Signed-off-by: zzhengkai <zzhengkai@devgpu049.ldc1.facebook.com>
Co-authored-by: zzhengkai <zzhengkai@devgpu049.ldc1.facebook.com>

Ilya Markov | bb2fc8b5e7 | 2026-02-10 22:34:47 +00:00
[BugFix] Fix async EPLB hang with DeepEP LL all2all backend (#32860)
Signed-off-by: ilmarkov <markovilya197@gmail.com>

Ilya Markov | 67132945bb | 2026-02-10 22:19:10 +00:00
[Perf] Move eplb rebalance algo to async thread (#30888)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>

Gregory Shtrasberg | f0ca0671c7 | 2026-02-10 15:45:38 -06:00
[Feature] Warn about unrecognized environment variables (#33581)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>

Pavani Majety | 578977bb5e | 2026-02-10 16:18:43 -05:00
[SM100] Resubmit FMHA FP8 prefill for MLA (#31195)
Signed-off-by: Pavani Majety <pmajety@nvidia.com>

Roger Wang | 9615575afc | 2026-02-10 13:12:31 -08:00
[Bugfix] Fix mamba cache dtype for Qwen3.5 (#34200)
Signed-off-by: Roger Wang <hey@rogerw.io>

Matthew Bonanni | 4293c00b84 | 2026-02-10 16:04:07 -05:00
[Benchmarks] Fix attention benchmark smoke test (#34269)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>

J Seppänen | 506ad7d7c1 | 2026-02-10 20:38:17 +00:00
[Bugfix] Fix weights offloading for sleep mode (#32947)
Signed-off-by: Jarno Seppänen <jseppanen@nvidia.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>

Reagan Lee | fdd6f2ad58 | 2026-02-10 19:44:31 +00:00
Convert online APIs to use Renderer (#34084)
Signed-off-by: Reagan Lee <“reaganjlee@gmail.com”>
Co-authored-by: Reagan Lee <“reaganjlee@gmail.com”>

Qi Wang | 33bcd3dc3b | 2026-02-10 18:55:35 +00:00
[Misc] Introduce ec_both role EC (encoder cache) connector (#34182)
Signed-off-by: Qi Wang <qiwa@nvidia.com>

Michael Goin | 1f5febb4b8 | 2026-02-10 10:35:58 -08:00
[UX nit] Fix non-default api_server_count message (#34152)
Signed-off-by: mgoin <mgoin64@gmail.com>

Andy Lo | ae871ca923 | 2026-02-10 18:18:30 +00:00
Minor cleanup for Voxtral (#34247)
Signed-off-by: Andy Lo <andy@mistral.ai>

Woosuk Kwon | a2443de5fa | 2026-02-10 08:55:22 -08:00
[Model Runner V2] Use pinned memory for write_contents (#34222)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>

Harry Mellor | f84a2a8f31 | 2026-02-10 16:34:43 +00:00
[Docs] Speed up build environment set-up (#34240)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>