Kevin H. Luu
|
db14f61f2d
|
[ci] Refactor CI file structure (#29343)
|
2025-12-08 17:25:43 -09:00 |
|
Micah Williamson
|
78c7503364
|
[ROCm][CI] Skip NVIDIA-Only Prime-RL Test in AMD CI (#29420)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
|
2025-12-09 02:14:02 +00:00 |
|
Christina Norman
|
e41312a2f5
|
[Bugfix] Skip generation config fallback for GGUF to prevent multi-process hang (#30209)
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
|
2025-12-09 01:52:43 +00:00 |
|
Yanan Cao
|
7b35011ad1
|
Mark qwen2_5_vl as xfail (#30283)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
|
2025-12-09 01:14:10 +00:00 |
|
Zhewen Li
|
ae339b1a67
|
[Bugfix] Fix DeepGEMM after #29546 (#30267)
Signed-off-by: zhewenli <zhewenli@meta.com>
Signed-off-by: Zhewen Li <zhewenli@meta.com>
|
2025-12-09 01:05:27 +00:00 |
|
Wentao Ye
|
0ee6416f67
|
[Perf] Optimize group_topk kernel, 1.9% Throughput improvement, 2.1% TPOT improvement (#30159)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-12-08 19:44:01 -05:00 |
|
Wentao Ye
|
d9417096d1
|
[Feature] Batch invariant: Enable TRITON_MLA without prefix-caching (#29125)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-12-08 19:31:57 -05:00 |
|
Ming Yang
|
9d6235ca9a
|
[moe] Allow disabling DP chunking (#29936)
Signed-off-by: Ming Yang <minos.future@gmail.com>
|
2025-12-09 00:29:36 +00:00 |
|
Victor Ziliang Peng
|
f1599ca55d
|
feat(metrics): Add prefill KV compute metric excluding cached tokens (#30189)
Signed-off-by: Ziliang Peng <ziliang@character.ai>
|
2025-12-09 00:08:48 +00:00 |
|
Ming Yang
|
60d17251c9
|
[Disagg] Support large batch size in proxy server and update NixlConnector doc for DP (#28782)
Signed-off-by: Ming Yang <minos.future@gmail.com>
|
2025-12-09 00:01:08 +00:00 |
|
Lain
|
1fb632fdb6
|
[Perf] Improve fp8 quant in mla; replace ReduceSum with ReduceScatterSum (#29795)
Signed-off-by: Siyuan Fu <siyuanf@nvidia.com>
|
2025-12-08 15:02:34 -08:00 |
|
Charlie Fu
|
6af70e11a0
|
[ROCm][CI] Fix test_max_len.py for Rocm (#29916)
Signed-off-by: charlifu <charlifu@amd.com>
Signed-off-by: Charlie Fu <Charlie.Fu@amd.com>
|
2025-12-08 16:58:30 -05:00 |
|
roikoren755
|
ae0f69b16a
|
Add SpecDec support to selective_state_update (#29488)
Signed-off-by: Roi Koren <roik@nvidia.com>
|
2025-12-08 16:45:18 -05:00 |
|
Dmitry Tokarev
|
799804d140
|
Bump nvshmem to 3.3.24 and fix CUDA 13 installation (#30149)
Signed-off-by: Dmitry Tokarev <dtokarev@nvidia.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-12-08 20:24:34 +00:00 |
|
Vasiliy Kuznetsov
|
0d402d2600
|
online fp8 quant with streaming weight post-processing (#29196)
Signed-off-by: vasiliy <vasiliy@fb.com>
|
2025-12-08 20:15:10 +00:00 |
|
Johnny Yang
|
d1b5e7afbf
|
[TPU] Bump tpu-inference to 0.12.0 (#30221)
Signed-off-by: Johnny Yang <johnnyyang@google.com>
|
2025-12-08 20:10:10 +00:00 |
|
shaharmor98
|
fcd5306f65
|
Add latent MoE support (#30203)
Signed-off-by: Shahar Mor <smor@nvidia.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2025-12-08 17:35:01 +00:00 |
|
weiguihua2
|
398a596ed2
|
[MP executor] fix get device count for multi node of mp executor feature (#30042)
Signed-off-by: weiguihua2 <weiguihua2@huawei.com>
|
2025-12-09 01:33:48 +08:00 |
|
Jee Jee Li
|
67312cad11
|
[Misc] Split the LoRA code (#30253)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-12-09 00:59:31 +08:00 |
|
Laith Sakka
|
87aee9ed2b
|
Add evaluate_guards option to DynamicShapesConfig (#27432)
Signed-off-by: Laith Sakka <lsakka@meta.com>
|
2025-12-08 10:46:15 -05:00 |
|
Daniel Cámpora
|
184076c3fe
|
[DeepSeek v3.2] Make top-k work for any logit values. (#27568)
Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-12-08 06:55:58 -08:00 |
|
Ye (Charlotte) Qi
|
eb1051fb95
|
[ROCm] Guard group quant RMS norm fusion patterns (#30239)
|
2025-12-08 14:44:48 +00:00 |
|
Jee Jee Li
|
80433e225e
|
[LoRA] Reduce the loading time of MoE LoRA (#30243)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-12-08 13:29:47 +00:00 |
|
Harry Mellor
|
5c2433a6f3
|
Add tip for mypy and markdownlint to the pre-commit comment (#30259)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-12-08 13:11:51 +00:00 |
|
Simon Mo
|
77072e93b3
|
[docs] governance documents (#24801)
Signed-off-by: simon-mo <simon.mo@hey.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: Simon Mo <simon.mo@hey.com>
Co-authored-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-12-08 12:06:20 +00:00 |
|
wang.yuqi
|
2e660c2434
|
[Frontend] Binary embedding response does not return metadata by setting encoding_format to bytes_only. (#30249)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-12-08 12:01:21 +00:00 |
|
Shiming Zhang
|
408cf42f67
|
[CI] Prevents triggering of an inactive issue/PR check for forked repository. (#29654)
Signed-off-by: Shiming Zhang <wzshiming@hotmail.com>
|
2025-12-08 10:29:14 +00:00 |
|
wang.yuqi
|
9e77ffca3f
|
[Model][7/N] Improve all pooling task | Deprecation as_reward_model. Extract hidden states prefer using new multi-vector retrieval API (#26686)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2025-12-08 08:10:09 +00:00 |
|
Dazhi Jiang
|
bcb6f5947f
|
[Perf] Remove sync point in vit torch sdpa attn backend (#30232)
Signed-off-by: Dazhi Jiang <dazhi_jiang@163.com>
|
2025-12-08 07:12:42 +00:00 |
|
Zhiyu
|
cd00c443d2
|
[Misc] Rename TensorRT Model Optimizer to Model Optimizer (#30091)
Signed-off-by: Zhiyu Cheng <zhiyuc@nvidia.com>
|
2025-12-08 07:05:27 +00:00 |
|
Jiangyun Zhu
|
d143271234
|
[Bugfix] fix fuse_allreduce_rms when tp =1 (#30178)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
|
2025-12-08 06:43:47 +00:00 |
|
Zhiwei
|
c6df05ebb4
|
[ROCm] [Fused Moe EP] Use binary expert mask for aiter fused moe kernel (#29773)
Signed-off-by: ZhiweiYan-96 <zhiwei.yan@amd.com>
|
2025-12-08 05:23:46 +00:00 |
|
Nick Hill
|
d726a7b0ed
|
[BugFix] Unblock use of LoRA with data parallel mode (#30220)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-12-08 12:21:05 +08:00 |
|
Zhijian Jiang
|
344b50d525
|
Address comment to mergify.yml in #30117 (#30219)
Signed-off-by: Zhijian Jiang <Zhijian.Jiang@outlook.com>
|
2025-12-08 11:26:25 +08:00 |
|
Andrew Xia
|
735284ed86
|
[responsesAPI][7] Browser, Container MCP tools for non harmony models (#29989)
Signed-off-by: Andrew Xia <axia@meta.com>
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-12-08 10:04:03 +08:00 |
|
daniel-salib
|
444f0e3f33
|
[Frontend] Add MCP type support infrastructure to Responses API (#30054)
Signed-off-by: Daniel Salib <danielsalib@meta.com>
|
2025-12-08 10:02:52 +08:00 |
|
ElizaWszola
|
af0444bf40
|
[Performance] Fused blockwise quant RMS norm (#27883)
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: yewentao256 <zhyanwentao@126.com>
|
2025-12-07 16:38:04 +00:00 |
|
Lucas Wilkinson
|
0044c4038c
|
[BugFix][DeepSeek-V3.2] Fix backend selection logic for Blackwell (#30195)
|
2025-12-07 10:53:51 -05:00 |
|
Isotr0py
|
b952f4d3c3
|
[v1] Add PrefixLM support to FlexAttention backend (#27938)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-12-07 15:51:36 +00:00 |
|
Wentao Ye
|
541a2ef892
|
[Perf] Deepgemm fused layout kernel for activations, 4.3% throughput improvement, 10.7% TTFT improvement. (#29546)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-12-07 20:31:14 +08:00 |
|
Jee Jee Li
|
b0f4866a77
|
[CI/Build]Temporary workaround for test_default_mm_loras timeout (#30202)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-12-07 20:27:11 +08:00 |
|
Jinzhen Lin
|
879ddb09c3
|
[Kernel][MoE] optimize moe_align_block_size (#29642)
Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-12-07 01:58:47 -08:00 |
|
Yifan Qiao
|
1b0482b9d1
|
[Misc][Core] Remove unused req_index increment in scheduler (#30176)
Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu>
|
2025-12-07 08:39:21 +00:00 |
|
Cyrus Leung
|
e83b7e379c
|
Revert "[Renderer] Separate out RendererConfig from ModelConfig (#30145)" (#30199)
|
2025-12-07 00:00:22 -08:00 |
|
Cyrus Leung
|
27f4c2fd46
|
[Renderer] Separate out RendererConfig from ModelConfig (#30145)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-12-06 23:15:42 -08:00 |
|
Luke
|
a49d813fa8
|
Lazy loading to avoid importing all files (#29716)
Signed-off-by: Luke <yq0536@gmail.com>
|
2025-12-07 07:13:14 +00:00 |
|
Wentao Ye
|
17eb25e327
|
[Perf] Enable cuda graph for deepepHT, 5.3% throughput improvement, 4.4% TTFT improvement (#29558)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-12-07 04:44:50 +00:00 |
|
jeremyteboul
|
dce6d229f7
|
Support multiple image/audio embeddings per requests (#29988)
Signed-off-by: Jeremy Teboul <jeremyteboul@fb.com>
Co-authored-by: Jeremy Teboul <jeremyteboul@fb.com>
|
2025-12-07 04:34:24 +00:00 |
|
Yanan Cao
|
cbedb703cc
|
[Frontend] Remove confusing -O.xx flag error (#30169)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
|
2025-12-07 02:53:42 +00:00 |
|
AuruTus
|
8d3da4c79d
|
[MISC]: change NIXL compatibility hash logging level to debug (#30182)
|
2025-12-07 00:21:03 +00:00 |
|