Wentao Ye
|
2f4226fe52
|
[CI] Fix pre-commit mypy issue in main (#36049)
|
2026-03-04 18:13:12 -08:00 |
|
nkm-meta
|
792cbd64ca
|
Add platform method to enable custom collective ops registration (#34760)
Signed-off-by: Naina Kuruballi Mahesh <nainakm@meta.com>
|
2026-03-05 00:50:32 +00:00 |
|
Zhengxu Chen
|
2ed4722e26
|
[compile] Reduce log spam from compile. (#36044)
Signed-off-by: zhxchen17 <zhxchen17@fb.com>
|
2026-03-05 00:48:36 +00:00 |
|
Nick Hill
|
a3299c3d1d
|
[Model Runner V2] Misc code simplification (#35941)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-03-04 15:26:35 -08:00 |
|
Andreas Karatzas
|
6c21a0c2d7
|
[ROCm][CI] Added MI325 mirrors (stage C) (#35239)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-04 14:48:46 -08:00 |
|
Shanshan Shen
|
562339abc3
|
[Misc] Support OOT linear method registering (#35981)
Signed-off-by: shen-shanshan <467638484@qq.com>
|
2026-03-04 22:25:56 +00:00 |
|
amitz-nv
|
d7adcadb9b
|
[Bugfix] Fix passing of activation_type to trtllm fused MoE NVFP4 and FP8 (#36017)
Signed-off-by: amitz-nv <203509407+amitz-nv@users.noreply.github.com>
|
2026-03-04 22:23:51 +00:00 |
|
Simon Mo
|
f678c3f61a
|
[RL] [Weight Sync] Guard IPC update-info pickle deserialization behind insecure serialization flag (#35928)
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
|
2026-03-04 17:05:32 -05:00 |
|
Thomas Parnell
|
be0a3f7570
|
[Bugfix] Fix race in non-blocking num_accepted_tokens GPU->CPU copy (#36013)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-03-04 13:52:44 -08:00 |
|
Harry Mellor
|
17dc9c7fc9
|
[CI] Bump mypy version (#34950)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-04 20:55:11 +00:00 |
|
fenypatel99
|
7eca859110
|
Add PyTorch profiler schedule support with warmup/active iterations (#35240)
|
2026-03-04 12:53:38 -08:00 |
|
Russell Bryant
|
636ee223ac
|
[Docs] Document security risks of GPT-OSS Python tool (#35139)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2026-03-04 20:27:31 +00:00 |
|
Robert Shaw
|
b7d59ffce2
|
[UX] Remove NoOpOffloader log (#35678)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
|
2026-03-04 12:13:40 -08:00 |
|
Richard Zou
|
5569f5218d
|
[torch.compile] Stop lazily compiling (#35472)
Signed-off-by: Richard Zou <zou3519@gmail.com>
|
2026-03-04 12:13:17 -08:00 |
|
Davina Zaman
|
138d891d7f
|
[Docs] Clarify structured outputs configuration for Qwen3 reasoning mode (#32441)
Signed-off-by: Davina Zaman <davzaman@users.noreply.github.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-04 11:44:39 -08:00 |
|
Stefano Castagnetta
|
d7166e74c1
|
[CI] Add Blackwell AsyncTP correctness test (#35871)
Signed-off-by: Stefano Castagnetta <scastagnetta@nvidia.com>
|
2026-03-04 19:41:21 +00:00 |
|
Nick Hill
|
417fd28fb1
|
[Model Runner V2] Fix pooling (#36019)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-03-04 10:53:17 -08:00 |
|
tomeras91
|
7faba503c4
|
[Kernel][Mamba] Optimize Mamba2 SSD prefill Triton kernels (#35397)
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
|
2026-03-04 19:47:17 +01:00 |
|
Hyunkyun Moon
|
bc6be89d16
|
[Frontend] Add vllm launch command for GPU-less preprocessing serving (#34551)
Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>
|
2026-03-04 18:41:52 +00:00 |
|
Maxime Grenu
|
32224f568a
|
docs: update CPU Docker images to reference Docker Hub instead of AWS ECR (#34882)
Signed-off-by: Maxime Grenu <69890511+cluster2600@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-04 10:31:35 -08:00 |
|
Abhishek Mathukiya
|
f3dc292e9f
|
docs: add version requirement note for --profiler-config flag (#32454)
Signed-off-by: abhishkh <mathukiya.a@northeastern.edu>
|
2026-03-04 18:13:54 +00:00 |
|
Chen
|
138c5fa186
|
[Docs] Add RunPod GPU deployment guide for vLLM (#34531)
Signed-off-by: lisperz <zhuchen200245@163.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-04 10:11:34 -08:00 |
|
Russell Bryant
|
2f2c1d73a7
|
[Docs] Upgrade dynamic LoRA warning to admonition block (#35218)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2026-03-04 10:01:42 -08:00 |
|
Bhuminjay Soni
|
fb3e78ab09
|
[Feature][CI]: compare func & no_func outputs in test_functionalization.py (#35481)
Signed-off-by: Bhuminjay <bhuminjaysoni@gmail.com>
Signed-off-by: Bhuminjay Soni <Soni5Happy@gmail.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2026-03-04 18:01:16 +00:00 |
|
Michael Yao
|
fd3bfe74c9
|
[Docs] Update design/multiprocessing.md (#30677)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
|
2026-03-04 17:58:59 +00:00 |
|
tc-mb
|
bfdb512f11
|
fix minicpmo4.5: fix attn_mask in vit attn && fix resampler pos_emb i… (#34127)
Signed-off-by: tc-mb <caitianchi@modelbest.cn>
Co-authored-by: hezhihui <hezhihui@modelbest.cn>
|
2026-03-04 17:46:17 +00:00 |
|
Sage
|
d25c1ec3c9
|
docs(cpu): Clarify pre-built wheels requirement for CPU Python-only build (#35090)
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
|
2026-03-04 17:45:35 +00:00 |
|
Xing Liu
|
7cc6058ac6
|
[Doc] Add MTP docs and update speculative decoding guidance (#35197)
Signed-off-by: liuxing <945764858@qq.com>
|
2026-03-04 17:23:34 +00:00 |
|
Manrique Vargas
|
28028dff2f
|
fix(docs): use static rdzv backend in multi-node troubleshooting script (#34784)
Signed-off-by: machov <mv1742@nyu.edu>
|
2026-03-04 17:15:35 +00:00 |
|
Dr Alex Mitre
|
3417ba5648
|
docs: add README for logits_processor examples (#35933)
|
2026-03-04 17:09:19 +00:00 |
|
Yan Ma
|
58cfe0dc44
|
Fix phi4-mm and remove cuda binding (#35964)
Signed-off-by: Yan Ma <yan.ma@intel.com>
|
2026-03-05 01:08:05 +08:00 |
|
simone-dotolo
|
e86221deb6
|
[Doc] Fix GPU Worker count in Process Count Summary (#36000)
Signed-off-by: simone-dotolo <simonedotolo@libero.it>
Signed-off-by: simone-dotolo <84937474+simone-dotolo@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-03-04 17:03:14 +00:00 |
|
Netanel Haber
|
289fc48ab7
|
Use MMEncoderAttention (=use FlashAttention) instead of torch.sdpa in radio.py (#35653)
|
2026-03-04 08:43:13 -08:00 |
|
Christian Pinto
|
2f2212e6cc
|
Split generic IO Processor plugins tests from Terratorch specific ones (#35756)
Signed-off-by: Christian Pinto <christian.pinto@ibm.com>
|
2026-03-05 00:01:03 +08:00 |
|
Nicolò Lucchesi
|
18e01a0a10
|
[Misc] Add --attention-backend auto option (#35738)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2026-03-04 15:12:27 +00:00 |
|
sungsoo ha
|
6cb901093f
|
[Core] Add All-to-All communication backend for DCP (#34883)
Signed-off-by: Sungsoo Ha <sungsooh@nvidia.com>
Signed-off-by: sungsoo ha <hasungsoo@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-04 10:01:57 -05:00 |
|
Cyrus Leung
|
ead7bde1ab
|
[Bugfix] Make kaldi_native_fbank optional (#35996)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-03-04 06:47:32 -08:00 |
|
Qi Wang
|
6aa6ad8992
|
[BugFix] Fix implicit and incorrect assumption on ECConnector is_producer (#34783)
Signed-off-by: Qi Wang <qiwa@nvidia.com>
|
2026-03-04 15:01:30 +01:00 |
|
Raghavan
|
c8c3935b70
|
[Bugfix][Model] Fix FP8 k_scale/v_scale not loaded for Qwen3-MoE (#35656)
Signed-off-by: raghavan <oneraghavan@gmail.com>
|
2026-03-04 13:15:38 +00:00 |
|
Ronen Schaffer
|
bb6888b8b1
|
[Bugfix][CPUOffloadingManager] Prevent eviction of already-stored blocks in LRU/ARC prepare_store() (#35846)
Signed-off-by: Ronen Schaffer <ronen.schaffer@ibm.com>
|
2026-03-04 14:25:33 +02:00 |
|
Taneem Ibrahim
|
1aaec59d79
|
[MISC] fixed tool_parser mypy errors (#35640)
Signed-off-by: Taneem Ibrahim <taneem.ibrahim@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-04 12:23:12 +00:00 |
|
pougetat
|
1659b2e058
|
[Feature] Add basic metrics for /realtime endpoint (#35500)
Signed-off-by: Thomas Pouget-Abadie <thomaspou@microsoft.com>
Signed-off-by: pougetat <thomas.pougetabadie@gmail.com>
Co-authored-by: Thomas Pouget-Abadie <thomaspou@microsoft.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-03-04 19:56:32 +08:00 |
|
haosdent
|
d6e04f4c43
|
[Bugfix] Cap FULL decode cudagraph sizes for Mamba/hybrid models (#34094) (#34571)
Signed-off-by: haosdent <haosdent@gmail.com>
Co-authored-by: zjy0516 <riverclouds.zhu@qq.com>
|
2026-03-04 11:56:22 +01:00 |
|
Kunshang Ji
|
a8f66cbde8
|
[XPU] bump vllm-xpu-kernels to v0.1.3 (#35984)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
|
2026-03-04 18:23:31 +08:00 |
|
Kunshang Ji
|
16d2ad1d38
|
[Hardware] Replace torch.cuda.empty_cache with torch.accelerator.empty_cache (#30681)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
Signed-off-by: Kunshang Ji <jikunshang95@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-04 09:49:47 +00:00 |
|
Chuan (Richard) Li
|
5dc3538736
|
[ROCm][Bugfix] Fall back from CK MXFP4 MoE when GEMM dimensions are unsupported (#35893)
Signed-off-by: Li <chuali@amd.com>
|
2026-03-04 08:30:54 +00:00 |
|
Nathan Price
|
36bf213181
|
[Bugfix] Add missing dynamic_arg_dims for Qwen3-ASR torch.compile (#35869)
Signed-off-by: Nathan Price <nathan@abridge.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-03-04 08:29:01 +00:00 |
|
Joe Runde
|
6f0dd93801
|
[Core] Remove busy loop from idle buffer readers (#28053)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: Travis Johnson <tsjohnso@us.ibm.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
|
2026-03-04 07:44:20 +00:00 |
|
Andrii Skliar
|
5d199ac8f2
|
Support Audio Extraction from MP4 Video for Nemotron Nano VL (#35539)
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
Signed-off-by: Andrii Skliar <askliar@nvidia.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: Andrii <askliar@nvidia.com>
Co-authored-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
Co-authored-by: Andrii Skliar <askliar@oci-nrt-cs-001-vscode-01.cm.cluster>
Co-authored-by: Andrii <askliar@nvidia.com>
Co-authored-by: root <root@pool0-03748.cm.cluster>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: root <root@pool0-02416.cm.cluster>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: wangxiyuan <wangxiyuan1007@gmail.com>
Co-authored-by: root <root@pool0-04880.cm.cluster>
|
2026-03-03 23:20:33 -08:00 |
|
Komal Kumar Teru
|
9e0f44bec4
|
[cohere][fix][spec-decode]: fix crash when allowed_token_ids is set without penalties (#35654)
Signed-off-by: kkt-cohere <komal@cohere.com>
|
2026-03-03 23:20:15 -08:00 |
|