Christian Munley
|
48e376a007
|
qwen3coder tool parser fix anyOf double encoded parameters (#36032)
Signed-off-by: Christian Munley <cmunley@nvidia.com>
|
2026-03-05 09:06:57 +00:00 |
|
Isotr0py
|
21eb2c3372
|
[Chore] Correct MTP models test registry ordering (#36115)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-03-05 08:55:04 +00:00 |
|
Seiji Eicher
|
e2b31243c0
|
[Docs] Update CacheConfig block_size docstring to remove inaccurate limit when using CUDA (#35632)
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
|
2026-03-05 06:24:08 +00:00 |
|
Martin Hickey
|
c3598d02fa
|
[Misc] Remove deprecated items that are due for removal (#36006)
Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>
|
2026-03-05 06:14:50 +00:00 |
|
Benjamin Chislett
|
57c629e9c1
|
[Bugfix] Fix block_size for hybrid model MTP (#36036)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
|
2026-03-05 06:10:54 +00:00 |
|
zihaoanllm
|
d106bf39f5
|
[Doc] Add Parallel Draft Models (#35973)
Signed-off-by: <zihaoan2@amd.com>
Signed-off-by: zihaoanllm <zihaoan2@amd.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-05 05:44:07 +00:00 |
|
Yanan Cao
|
b0651021e5
|
[Kernel] [Helion] [11/N] Retune configs for silu_mul_fp8 (#36062)
|
2026-03-04 21:25:59 -08:00 |
|
Hanjun Cho
|
f600d5192e
|
[Bugfix] Fix score layer quantization for sequence classification models - Qwen3 (VL) Reranker (#35849)
Signed-off-by: Hanjun Cho <gkswns0531@gmail.com>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2026-03-04 20:57:20 -08:00 |
|
Tianmu Li
|
8e7820131e
|
[Perf] Use dummy M for weight prepacking on x86 (#35890)
Signed-off-by: Li, Tianmu <tianmu.li@intel.com>
|
2026-03-05 04:56:49 +00:00 |
|
Andrii Skliar
|
0a12cea25f
|
Order config.py in Lexicographical order (#35866)
Signed-off-by: Andrii Skliar <askliar@nvidia.com>
Co-authored-by: Andrii Skliar <askliar@nvidia.com>
|
2026-03-04 20:56:47 -08:00 |
|
Zhengxu Chen
|
dd6dbd93f8
|
[compile] Fix extra cache save on warm start. (#35921)
Signed-off-by: zhxchen17 <zhxchen17@fb.com>
|
2026-03-05 12:56:30 +08:00 |
|
Harry Mellor
|
26366009c5
|
[CI] Don't leave docs preview comment on closed PRs (#36087)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-05 04:51:46 +00:00 |
|
Nick Hill
|
16c472abe7
|
[Core] Move ray-specific WorkerWrapperBase methods to RayWorkerWrapper (#35328)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-03-05 12:11:59 +08:00 |
|
daje0601
|
3b23d57c96
|
[Model] Add LoRA support for Whisper models (#29856)
Signed-off-by: daje0601 <englishmt4118@gmail.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
|
2026-03-05 10:38:25 +08:00 |
|
Wentao Ye
|
2f4226fe52
|
[CI] Fix pre-commit mypy issue in main (#36049)
|
2026-03-04 18:13:12 -08:00 |
|
nkm-meta
|
792cbd64ca
|
Add platform method to enable custom collective ops registration (#34760)
Signed-off-by: Naina Kuruballi Mahesh <nainakm@meta.com>
|
2026-03-05 00:50:32 +00:00 |
|
Zhengxu Chen
|
2ed4722e26
|
[compile] Reduce log spam from compile. (#36044)
Signed-off-by: zhxchen17 <zhxchen17@fb.com>
|
2026-03-05 00:48:36 +00:00 |
|
Nick Hill
|
a3299c3d1d
|
[Model Runner V2] Misc code simplification (#35941)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-03-04 15:26:35 -08:00 |
|
Andreas Karatzas
|
6c21a0c2d7
|
[ROCm][CI] Added MI325 mirrors (stage C) (#35239)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-04 14:48:46 -08:00 |
|
Shanshan Shen
|
562339abc3
|
[Misc] Support OOT linear method registering (#35981)
Signed-off-by: shen-shanshan <467638484@qq.com>
|
2026-03-04 22:25:56 +00:00 |
|
amitz-nv
|
d7adcadb9b
|
[Bugfix] Fix passing of activation_type to trtllm fused MoE NVFP4 and FP8 (#36017)
Signed-off-by: amitz-nv <203509407+amitz-nv@users.noreply.github.com>
|
2026-03-04 22:23:51 +00:00 |
|
Simon Mo
|
f678c3f61a
|
[RL] [Weight Sync] Guard IPC update-info pickle deserialization behind insecure serialization flag (#35928)
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
|
2026-03-04 17:05:32 -05:00 |
|
Thomas Parnell
|
be0a3f7570
|
[Bugfix] Fix race in non-blocking num_accepted_tokens GPU->CPU copy (#36013)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-03-04 13:52:44 -08:00 |
|
Harry Mellor
|
17dc9c7fc9
|
[CI] Bump mypy version (#34950)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-04 20:55:11 +00:00 |
|
fenypatel99
|
7eca859110
|
Add PyTorch profiler schedule support with warmup/active iterations (#35240)
|
2026-03-04 12:53:38 -08:00 |
|
Russell Bryant
|
636ee223ac
|
[Docs] Document security risks of GPT-OSS Python tool (#35139)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2026-03-04 20:27:31 +00:00 |
|
Robert Shaw
|
b7d59ffce2
|
[UX] Remove NoOpOffloader log (#35678)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
|
2026-03-04 12:13:40 -08:00 |
|
Richard Zou
|
5569f5218d
|
[torch.compile] Stop lazily compiling (#35472)
Signed-off-by: Richard Zou <zou3519@gmail.com>
|
2026-03-04 12:13:17 -08:00 |
|
Davina Zaman
|
138d891d7f
|
[Docs] Clarify structured outputs configuration for Qwen3 reasoning mode (#32441)
Signed-off-by: Davina Zaman <davzaman@users.noreply.github.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-04 11:44:39 -08:00 |
|
Stefano Castagnetta
|
d7166e74c1
|
[CI] Add Blackwell AsyncTP correctness test (#35871)
Signed-off-by: Stefano Castagnetta <scastagnetta@nvidia.com>
|
2026-03-04 19:41:21 +00:00 |
|
Nick Hill
|
417fd28fb1
|
[Model Runner V2] Fix pooling (#36019)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-03-04 10:53:17 -08:00 |
|
tomeras91
|
7faba503c4
|
[Kernel][Mamba] Optimize Mamba2 SSD prefill Triton kernels (#35397)
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
|
2026-03-04 19:47:17 +01:00 |
|
Hyunkyun Moon
|
bc6be89d16
|
[Frontend] Add vllm launch command for GPU-less preprocessing serving (#34551)
Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>
|
2026-03-04 18:41:52 +00:00 |
|
Maxime Grenu
|
32224f568a
|
docs: update CPU Docker images to reference Docker Hub instead of AWS ECR (#34882)
Signed-off-by: Maxime Grenu <69890511+cluster2600@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-04 10:31:35 -08:00 |
|
Abhishek Mathukiya
|
f3dc292e9f
|
docs: add version requirement note for --profiler-config flag (#32454)
Signed-off-by: abhishkh <mathukiya.a@northeastern.edu>
|
2026-03-04 18:13:54 +00:00 |
|
Chen
|
138c5fa186
|
[Docs] Add RunPod GPU deployment guide for vLLM (#34531)
Signed-off-by: lisperz <zhuchen200245@163.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-04 10:11:34 -08:00 |
|
Russell Bryant
|
2f2c1d73a7
|
[Docs] Upgrade dynamic LoRA warning to admonition block (#35218)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2026-03-04 10:01:42 -08:00 |
|
Bhuminjay Soni
|
fb3e78ab09
|
[Feature][CI]: compare func & no_func outputs in test_functionalization.py (#35481)
Signed-off-by: Bhuminjay <bhuminjaysoni@gmail.com>
Signed-off-by: Bhuminjay Soni <Soni5Happy@gmail.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2026-03-04 18:01:16 +00:00 |
|
Michael Yao
|
fd3bfe74c9
|
[Docs] Update design/multiprocessing.md (#30677)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
|
2026-03-04 17:58:59 +00:00 |
|
tc-mb
|
bfdb512f11
|
fix minicpmo4.5: fix attn_mask in vit attn && fix resampler pos_emb i… (#34127)
Signed-off-by: tc-mb <caitianchi@modelbest.cn>
Co-authored-by: hezhihui <hezhihui@modelbest.cn>
|
2026-03-04 17:46:17 +00:00 |
|
Sage
|
d25c1ec3c9
|
docs(cpu): Clarify pre-built wheels requirement for CPU Python-only build (#35090)
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
|
2026-03-04 17:45:35 +00:00 |
|
Xing Liu
|
7cc6058ac6
|
[Doc] Add MTP docs and update speculative decoding guidance (#35197)
Signed-off-by: liuxing <945764858@qq.com>
|
2026-03-04 17:23:34 +00:00 |
|
Manrique Vargas
|
28028dff2f
|
fix(docs): use static rdzv backend in multi-node troubleshooting script (#34784)
Signed-off-by: machov <mv1742@nyu.edu>
|
2026-03-04 17:15:35 +00:00 |
|
Dr Alex Mitre
|
3417ba5648
|
docs: add README for logits_processor examples (#35933)
|
2026-03-04 17:09:19 +00:00 |
|
Yan Ma
|
58cfe0dc44
|
Fix phi4-mm and remove cuda binding (#35964)
Signed-off-by: Yan Ma <yan.ma@intel.com>
|
2026-03-05 01:08:05 +08:00 |
|
simone-dotolo
|
e86221deb6
|
[Doc] Fix GPU Worker count in Process Count Summary (#36000)
Signed-off-by: simone-dotolo <simonedotolo@libero.it>
Signed-off-by: simone-dotolo <84937474+simone-dotolo@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-03-04 17:03:14 +00:00 |
|
Netanel Haber
|
289fc48ab7
|
Use MMEncoderAttention (=use FlashAttention) instead of torch.sdpa in radio.py (#35653)
|
2026-03-04 08:43:13 -08:00 |
|
Christian Pinto
|
2f2212e6cc
|
Split generic IO Processor plugins tests from Terratorch specific ones (#35756)
Signed-off-by: Christian Pinto <christian.pinto@ibm.com>
|
2026-03-05 00:01:03 +08:00 |
|
Nicolò Lucchesi
|
18e01a0a10
|
[Misc] Add --attention-backend auto option (#35738)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2026-03-04 15:12:27 +00:00 |
|
sungsoo ha
|
6cb901093f
|
[Core] Add All-to-All communication backend for DCP (#34883)
Signed-off-by: Sungsoo Ha <sungsooh@nvidia.com>
Signed-off-by: sungsoo ha <hasungsoo@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-04 10:01:57 -05:00 |
|