Richard Zou
|
e30cedd44b
|
[torch.compile] Stop doing unnecessary FakeTensorProp in PiecewiseCompileInterpreter (#34093)
Signed-off-by: Richard Zou <zou3519@gmail.com>
|
2026-02-10 19:15:40 -08:00 |
|
Cyrus Leung
|
3bcd494ef4
|
[Redo] Add --trust-remote-code to dataset bench args (#34251)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-11 11:10:12 +08:00 |
|
tianshu-Michael-yu
|
0e725a7d22
|
[Bugfix] Fix Worker.load_model context-manager composition for sleep mode (#34021)
Signed-off-by: tianshu.yu <tianshuyu.formal@gmail.com>
|
2026-02-11 11:07:51 +08:00 |
|
Lucas Wilkinson
|
ba0511fd80
|
[Misc] Add run one batch script that supports profiling (#32968)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2026-02-10 18:29:49 -08:00 |
|
Micah Williamson
|
4a1550d22d
|
[ROCm][CI] Fix test_sequence_parallel.py location in AMD CI pipeline (#34280)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
|
2026-02-11 01:08:11 +00:00 |
|
bnellnm
|
d1481ba783
|
[MoE Refactor] Introduce MoERunner abstraction and move execution logic from FusedMoE to DefaultMoERunner (#32344)
Signed-off-by: Bill Nell <bnell@redhat.com>
|
2026-02-10 19:51:07 -05:00 |
|
7. Sun
|
dc6de33c3d
|
[CI] Add pip caching to cleanup_pr_body workflow (#32979)
Signed-off-by: 7. Sun <jhao.sun@gmail.com>
|
2026-02-11 00:45:28 +00:00 |
|
Tyler Michael Smith
|
c4b9e6778f
|
[Misc] Add pre-commit hook to catch boolean ops in with-statements (#34271)
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-02-10 15:13:20 -08:00 |
|
Richard Zou
|
341eed3d30
|
[torch.compile] Disable recursive pre_grad_passes (#34092)
Signed-off-by: Richard Zou <zou3519@gmail.com>
|
2026-02-10 18:02:31 -05:00 |
|
Zhengkai Zhang
|
6f2f59f2b3
|
[Misc][Spec Decode] support different load config for draft model (#34022)
Signed-off-by: zzhengkai <zzhengkai@devgpu049.ldc1.facebook.com>
Co-authored-by: zzhengkai <zzhengkai@devgpu049.ldc1.facebook.com>
|
2026-02-10 14:52:43 -08:00 |
|
Ilya Markov
|
bb2fc8b5e7
|
[BugFix] Fix async EPLB hang with DeepEP LL all2all backend (#32860)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
|
2026-02-10 22:34:47 +00:00 |
|
Ilya Markov
|
67132945bb
|
[Perf] Move eplb rebalance algo to async thread (#30888)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
|
2026-02-10 22:19:10 +00:00 |
|
Gregory Shtrasberg
|
f0ca0671c7
|
[Feature] Warn about unrecognized environment variables (#33581)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
|
2026-02-10 15:45:38 -06:00 |
|
Pavani Majety
|
578977bb5e
|
[SM100] Resubmit FMHA FP8 prefill for MLA (#31195)
Signed-off-by: Pavani Majety <pmajety@nvidia.com>
|
2026-02-10 16:18:43 -05:00 |
|
Roger Wang
|
9615575afc
|
[Bugfix] Fix mamba cache dtype for Qwen3.5 (#34200)
Signed-off-by: Roger Wang <hey@rogerw.io>
|
2026-02-10 13:12:31 -08:00 |
|
Matthew Bonanni
|
4293c00b84
|
[Benchmarks] Fix attention benchmark smoke test (#34269)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-02-10 16:04:07 -05:00 |
|
J Seppänen
|
506ad7d7c1
|
[Bugfix] Fix weights offloading for sleep mode (#32947)
Signed-off-by: Jarno Seppänen <jseppanen@nvidia.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2026-02-10 20:38:17 +00:00 |
|
Reagan Lee
|
fdd6f2ad58
|
Convert online APIs to use Renderer (#34084)
Signed-off-by: Reagan Lee <reaganjlee@gmail.com>
Co-authored-by: Reagan Lee <reaganjlee@gmail.com>
|
2026-02-10 19:44:31 +00:00 |
|
Qi Wang
|
33bcd3dc3b
|
[Misc] Introduce ec_both role EC (encoder cache) connector (#34182)
Signed-off-by: Qi Wang <qiwa@nvidia.com>
|
2026-02-10 18:55:35 +00:00 |
|
Michael Goin
|
1f5febb4b8
|
[UX nit] Fix non-default api_server_count message (#34152)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2026-02-10 10:35:58 -08:00 |
|
Andy Lo
|
ae871ca923
|
Minor cleanup for Voxtral (#34247)
Signed-off-by: Andy Lo <andy@mistral.ai>
|
2026-02-10 18:18:30 +00:00 |
|
Woosuk Kwon
|
a2443de5fa
|
[Model Runner V2] Use pinned memory for write_contents (#34222)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
|
2026-02-10 08:55:22 -08:00 |
|
Harry Mellor
|
f84a2a8f31
|
[Docs] Speed up build environment set-up (#34240)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-02-10 16:34:43 +00:00 |
|
Vadim Gimpelson
|
000214c4bb
|
[BUGFIX] Fix accuracy bugs in Qwen3-Next MTP (#34077)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
|
2026-02-10 10:57:11 -05:00 |
|
junuxyz
|
c5a66d1697
|
[Core][BugFix] Fix PP KV cache sharding memory validation (#33698)
Signed-off-by: junuxyz <216036880+junuxyz@users.noreply.github.com>
|
2026-02-10 10:46:24 -05:00 |
|
Roberto L. Castro
|
afdce12c89
|
[Perf][Kernel] Add faster topKperRow decode kernel for DeepSeek-V3.2 sparse attention (#33680)
Signed-off-by: LopezCastroRoberto <rocastro@redhat.com>
Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
|
2026-02-10 10:29:52 -05:00 |
|
Zhengxu Chen
|
82e11973cc
|
[compile] Enable AOT compile with 2.10 in trunk. (#34155)
Signed-off-by: Zhengxu Chen <zhxchen17@meta.com>
|
2026-02-10 23:24:42 +08:00 |
|
xuebwang-amd
|
b129136c7a
|
[ROCm][Quantization] GPT_OSS in amd-quark format model loading and emulations (#29008)
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2026-02-10 10:08:05 -05:00 |
|
mgazz
|
599e4335a4
|
Support benchmarking of Geospatial models (#33922)
Signed-off-by: Michele Gazzetti <michele.gazzetti1@ibm.com>
|
2026-02-10 07:04:16 -08:00 |
|
Fan Yang
|
a1946570d8
|
add --insecure arg to the vllm bench to skip TLS (#34026)
Signed-off-by: Fan Yang <yan9fan@meta.com>
Co-authored-by: Fan Yang <yan9fan@meta.com>
|
2026-02-10 22:23:52 +08:00 |
|
Harry Mellor
|
d0bc520569
|
Bump mamba-ssm version in CI for Transformers v5 compatibility (#34233)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-02-10 14:46:01 +01:00 |
|
Krish Gupta
|
748625cdaf
|
[V1][BugFix] Fix EAGLE3 encoder cache miss with disable_chunked_mm_input (#34220)
Signed-off-by: KrxGu <krishom70@gmail.com>
|
2026-02-10 13:05:32 +00:00 |
|
Harry Mellor
|
61413973e8
|
Stop testing for slow tokenizers as they will not exist soon (#34235)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-02-10 12:08:20 +00:00 |
|
Phúc H. Lê Khắc
|
94de871546
|
[Misc] allow specify is_mm_prefix_lm in hf_config (#34215)
|
2026-02-10 11:16:21 +00:00 |
|
tc-mb
|
e042d7e685
|
Add flagos in MiniCPM-o (#34126)
Signed-off-by: tc-mb <caitianchi@modelbest.cn>
Signed-off-by: Vincent-Xiao <vincent.xiao.me@gmail.com>
Co-authored-by: Vincent-Xiao <vincent.xiao.me@gmail.com>
|
2026-02-10 02:51:48 -08:00 |
|
Roger Wang
|
ae4e280602
|
[Bugfix] Fix FI kernel chunk_gated_delta_rule output shape for Qwen3.5 (#34219)
Signed-off-by: Roger Wang <hey@rogerw.io>
|
2026-02-10 10:41:24 +00:00 |
|
zzaebok
|
cbea11c9f0
|
[Docs] Fix format error in KV load failure recovery doc (#34137)
Signed-off-by: Jaebok Lee <jaebok9541@naver.com>
|
2026-02-10 02:16:26 -08:00 |
|
Cyrus Leung
|
2c32558a3c
|
[Bugfix] Fix --trust-remote-code conflict (#34218)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-10 00:29:10 -08:00 |
|
Zetong Li
|
5f970120f0
|
[Bugfix] Fix memory inconsistency in cross-process shared memory (#32022)
Signed-off-by: Zetong Li <slippersss@126.com>
|
2026-02-10 08:22:03 +00:00 |
|
Cyrus Leung
|
998e2d91f8
|
Revert #34208 (#34216)
|
2026-02-09 23:59:04 -08:00 |
|
Wentao Ye
|
e1060a71a1
|
[Perf] Optimize detokenizer python logic (#32975)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2026-02-09 23:54:41 -08:00 |
|
Chen Zhang
|
97fa8f6590
|
[BugFix] Avoid prefix cache hit in the same schedule step for mamba layers (#29387)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2026-02-10 07:41:16 +00:00 |
|
wang.yuqi
|
dab1de9f38
|
[Frontend][CI] Consolidate instrumentator entrypoints (#34123)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2026-02-10 07:30:19 +00:00 |
|
Balaxxe
|
8d48d0a9d9
|
[Bugfix] Sort hf_weights_files in fastsafetensors_weights_iterator to match #33491 (#34190)
Signed-off-by: Balaxxe <136368465+jaim12005@users.noreply.github.com>
|
2026-02-09 23:06:30 -08:00 |
|
Andrew Xia
|
9608844f96
|
[responsesAPI] fix simpleContext streaming output_messages (#34188)
Signed-off-by: Andrew Xia <axia@meta.com>
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
|
2026-02-09 22:53:07 -08:00 |
|
Cyrus Leung
|
f69b903b4c
|
[Bugfix] Add --trust-remote-code to dataset bench args (#34208)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-09 22:37:50 -08:00 |
|
Lucas Wilkinson
|
81e217fe6b
|
[Bugfix] Fix DP Attention Padding in Dummy Run (#34187)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Benjamin Chislett <bchislett@nvidia.com>
|
2026-02-10 05:29:39 +00:00 |
|
Cyrus Leung
|
ab97bcf662
|
[CI/Build] Relax test_mcp_tool_call (#34204)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-10 05:18:57 +00:00 |
|
Cyrus Leung
|
25e48a3aae
|
[Doc] Update usage of --limit-mm-per-prompt (#34148)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-09 21:12:13 -08:00 |
|
Roger Wang
|
8a5e0e2b2b
|
[Bugfix][Core] Fix CPU memory leak from Request reference cycle in prefix caching (#34183)
Signed-off-by: Roger Wang <hey@rogerw.io>
|
2026-02-10 13:03:32 +08:00 |
|