Kunshang Ji
|
fde4771bbd
|
[XPU][Doc] update xpu document about triton dependency/conflict issue. (#36301)
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
|
2026-03-09 02:09:22 +00:00 |
|
Wei Zhao
|
379689d533
|
[Perf] Support FP8 KV cache for Flashinfer MLA Sparse (#35891)
|
2026-03-07 13:51:54 -08:00 |
|
rahul-sarvam
|
85f50eb41f
|
Adding support to Sarvam's MoE models (#33942)
Signed-off-by: rahul-sarvam <140298821+rahul-sarvam@users.noreply.github.com>
|
2026-03-08 01:16:24 +08:00 |
|
lif
|
00b814ba5a
|
[V0 Deprecation] Remove unused swap_space parameter (#36216)
Signed-off-by: majiayu000 <1835304752@qq.com>
Co-authored-by: mcelrath
|
2026-03-07 22:09:55 +08:00 |
|
Copilot
|
ce8546a12b
|
[docs][torch.compile] Add fusions.md — kernel/operator fusion reference page (#35538)
Signed-off-by: ProExpertProg <luka.govedic@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: ProExpertProg <11367180+ProExpertProg@users.noreply.github.com>
Co-authored-by: ProExpertProg <luka.govedic@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2026-03-06 23:55:06 +00:00 |
|
Andreas Karatzas
|
807d680337
|
[ROCm][CI] Fix tool use test stability - disable skinny GEMM, prefix caching, eliminate batch variance (#35553)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-06 15:15:12 +08:00 |
|
Xiang Shi
|
e68de8adc0
|
docs: fix wrong cc in int8.md (#36209)
Signed-off-by: Xiang Shi <realkevin@tutanota.com>
|
2026-03-06 06:01:02 +00:00 |
|
Rohan Potdar
|
c5362c739f
|
Reenable features for ROCm attention backends (#36185)
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
|
2026-03-05 20:21:06 -08:00 |
|
Yanhong Li
|
a911f4dd20
|
[Model] Add support for OLMo Hybrid (#32550)
|
2026-03-05 14:51:06 -05:00 |
|
Jiayi Yan
|
6a895197fa
|
[Bugfix][CI] fix typos (#34934)
Signed-off-by: 1195343015 <1195343015@qq.com>
Signed-off-by: Jiayi Yan <66017932+1195343015@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-05 17:05:46 +00:00 |
|
Sage Moore
|
8c760b6ab6
|
[ROCm] Refactor ROCm attention backend selection logic (#35246)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2026-03-05 10:51:26 -06:00 |
|
Harry Mellor
|
8df523351f
|
[Docs] Only build docs if documentation or ready labels are present (#36135)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-05 13:58:16 +00:00 |
|
Kunshang Ji
|
66a2209645
|
[Hardware] Replace torch.cuda.synchronize() api with torch.accelerator.synchronize (#36085)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
|
2026-03-05 10:36:39 +00:00 |
|
Paco Xu
|
7493c51c55
|
[Docs] add Dynamo/aibrix integration and kubeai/aks link (#32767)
Signed-off-by: Paco Xu <paco.xu@daocloud.io>
|
2026-03-05 17:39:50 +08:00 |
|
Reagan Lee
|
ac773bbe80
|
[Docs] Update docs to include mm processor + encoder benchmarks (#34083)
Signed-off-by: Reagan <reaganjlee@gmail.com>
|
2026-03-05 01:38:25 -08:00 |
|
zihaoanllm
|
d106bf39f5
|
[Doc] Add Parallel Draft Models (#35973)
Signed-off-by: <zihaoan2@amd.com>
Signed-off-by: zihaoanllm <zihaoan2@amd.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-05 05:44:07 +00:00 |
|
Russell Bryant
|
636ee223ac
|
[Docs] Document security risks of GPT-OSS Python tool (#35139)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2026-03-04 20:27:31 +00:00 |
|
Davina Zaman
|
138d891d7f
|
[Docs] Clarify structured outputs configuration for Qwen3 reasoning mode (#32441)
Signed-off-by: Davina Zaman <davzaman@users.noreply.github.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-04 11:44:39 -08:00 |
|
Maxime Grenu
|
32224f568a
|
docs: update CPU Docker images to reference Docker Hub instead of AWS ECR (#34882)
Signed-off-by: Maxime Grenu <69890511+cluster2600@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-04 10:31:35 -08:00 |
|
Abhishek Mathukiya
|
f3dc292e9f
|
docs: add version requirement note for --profiler-config flag (#32454)
Signed-off-by: abhishkh <mathukiya.a@northeastern.edu>
|
2026-03-04 18:13:54 +00:00 |
|
Chen
|
138c5fa186
|
[Docs] Add RunPod GPU deployment guide for vLLM (#34531)
Signed-off-by: lisperz <zhuchen200245@163.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-04 10:11:34 -08:00 |
|
Russell Bryant
|
2f2c1d73a7
|
[Docs] Upgrade dynamic LoRA warning to admonition block (#35218)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2026-03-04 10:01:42 -08:00 |
|
Michael Yao
|
fd3bfe74c9
|
[Docs] Update design/multiprocessing.md (#30677)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
|
2026-03-04 17:58:59 +00:00 |
|
Sage
|
d25c1ec3c9
|
docs(cpu): Clarify pre-built wheels requirement for CPU Python-only build (#35090)
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
|
2026-03-04 17:45:35 +00:00 |
|
Xing Liu
|
7cc6058ac6
|
[Doc] Add MTP docs and update speculative decoding guidance (#35197)
Signed-off-by: liuxing <945764858@qq.com>
|
2026-03-04 17:23:34 +00:00 |
|
Manrique Vargas
|
28028dff2f
|
fix(docs): use static rdzv backend in multi-node troubleshooting script (#34784)
Signed-off-by: machov <mv1742@nyu.edu>
|
2026-03-04 17:15:35 +00:00 |
|
simone-dotolo
|
e86221deb6
|
[Doc] Fix GPU Worker count in Process Count Summary (#36000)
Signed-off-by: simone-dotolo <simonedotolo@libero.it>
Signed-off-by: simone-dotolo <84937474+simone-dotolo@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-03-04 17:03:14 +00:00 |
|
AllenDou
|
c1d963403c
|
[model] support FireRedASR2 (#35727)
Signed-off-by: zixiao <shunli.dsl@alibaba-inc.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: zixiao <shunli.dsl@alibaba-inc.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-03-03 19:41:30 -08:00 |
|
Shanshan Shen
|
77e6dcbbfa
|
[PluggableLayer][MM] Add PluggableLayer for RelPosAttention (#33753)
Signed-off-by: shen-shanshan <467638484@qq.com>
|
2026-03-03 19:41:27 -08:00 |
|
Robert Shaw
|
97995f6376
|
[MoE Refactor] Create MK for TRTLLM Kernels (#32564)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>
Signed-off-by: Robert Shaw <robertgshaw2@gmail.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>
|
2026-03-03 10:39:50 -08:00 |
|
Woosuk Kwon
|
4f85bae9d6
|
[Docs][Model Runner V2] Add Design Docs (#35819)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
|
2026-03-02 19:58:14 -08:00 |
|
Jakub Zakrzewski
|
c8b678e53e
|
[Model] Add support for nvidia/llama-nemotron-rerank-vl-1b-v2 (#35735)
Signed-off-by: Jakub Zakrzewski <jzakrzewski@nvidia.com>
|
2026-03-03 08:32:14 +08:00 |
|
Lucas Wilkinson
|
8b5014d3dd
|
[Attention] FA4 integration (#32974)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2026-03-01 23:44:57 +00:00 |
|
Augusto Yao
|
8e75d88554
|
add io_process_plugin for sparse embedding (#34214)
Signed-off-by: augusto.yjh <augusto.yjh@antgroup.com>
Signed-off-by: Augusto Yao <augusto.yjh@antgroup.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2026-02-28 09:16:37 +00:00 |
|
Cyrus Leung
|
4292e3b807
|
[Benchmark] Improve UX of sweep scripts (#35600)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-28 00:36:02 -08:00 |
|
Cyrus Leung
|
24d6ea8afd
|
[Benchmark] Rename SLA Finder to Workload Explorer (#35586)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-27 23:31:55 -08:00 |
|
Cyrus Leung
|
fd68cd132b
|
[Bugfix] Fixes for SLA finder (#35537)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-27 20:20:55 -08:00 |
|
Micah Williamson
|
0edf101d2b
|
[ROCm] Add stablelm Head Size 80 To Supported Head Sizes For ROCM_ATTN (#35527)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
|
2026-02-28 12:16:34 +08:00 |
|
Gregory Shtrasberg
|
9fa6c68fa6
|
[ROCm] Enabling encoder and encoder-decoder on ROCm and AITER unified backends (#35334)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
|
2026-02-27 21:32:55 +00:00 |
|
Martin Hickey
|
b602e4f299
|
[Doc] Fix link to Llama chat template for usability (#35525)
Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2026-02-27 17:51:09 +00:00 |
|
fort726
|
905d76b51d
|
[Model] Add huggingface skt/A.X-K1 model (#32407)
Signed-off-by: Sungwan(Alex) Kim <sw0726.kim@sktelecom.com>
Signed-off-by: fort726 <38447663+fort726@users.noreply.github.com>
Co-authored-by: Sungwan(Alex) Kim <sw0726.kim@sktelecom.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
|
2026-02-27 09:26:02 -08:00 |
|
Boyuan Feng
|
5de98abc12
|
Add @BoyuanFeng to CODEOWNERS (#35317)
Signed-off-by: Boyuan Feng <boyuan@meta.com>
|
2026-02-27 15:53:47 +00:00 |
|
Wentao Ye
|
062b789632
|
[Bug] Fix outdated links in source code (#35314)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-02-27 03:50:46 +00:00 |
|
Tyler Michael Smith
|
eb19955c37
|
[WideEP] Remove pplx all2all backend (#33724)
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-02-26 14:30:10 -08:00 |
|
Jakub Zakrzewski
|
111d869069
|
[Model] Add nvidia/llama-nemotron-embed-vl-1b-v2 multimodal embedding model (#35297)
Signed-off-by: Jakub Zakrzewski <jzakrzewski@nvidia.com>
|
2026-02-26 14:17:17 +00:00 |
|
Jiangyun Zhu
|
ab87f85231
|
[Model] Ring 2.5 (#35102)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
|
2026-02-26 02:17:11 -08:00 |
|
Cyrus Leung
|
d3a51da92a
|
[Benchmark] Simplify SLA scan (#35306)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-25 22:35:41 -08:00 |
|
Seungmin Kim
|
160424a937
|
[Bugfix] Fix CUDA compatibility path setting for both datacenter and consumer NVIDIA GPUs (#33992)
Signed-off-by: Seungmin Kim <8457324+ehfd@users.noreply.github.com>
Signed-off-by: Andrew Mello <19512127+88plug@users.noreply.github.com>
Co-authored-by: 88plug <19512127+88plug@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2026-02-25 18:15:51 -08:00 |
|
Joao Gante
|
709eadbb0b
|
Doc link typo (#35281)
Signed-off-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-02-25 03:00:31 -08:00 |
|
Yanwen Lin
|
80e60a6133
|
[Doc] Suggest "--managed-python" flag when installing python using uv (#33069)
Signed-off-by: Yanwen Lin <lyw1124278064@gmail.com>
|
2026-02-25 08:19:43 +00:00 |
|