Woosuk Kwon
4147910f1e
[Model Runner V2] Move mrope_positions buffer to MRopeState (#32532)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2026-01-17 20:09:48 -08:00

Karan Bansal
3055232ba0
[Feature] Add FIPS 140-3 compliant hash algorithm option for multimodal hashing (#32386)
Signed-off-by: Karan Bansal <karanb192@gmail.com>
2026-01-18 11:02:01 +08:00

Shengqi Chen
965765aef9
[build] fix cu130 related release pipeline steps and publish as nightly image (#32522)
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
2026-01-17 18:36:11 -08:00

Mritunjay Kumar Sharma
9e078d0582
[CI/Build][Docker] Add centralized version manifest for Docker builds (#31492)
Signed-off-by: Mritunjay Sharma <mritunjay.sharma@chainguard.dev>
2026-01-17 13:45:30 +00:00

Guofang.Tang
2b99f210f5
[Misc] Fix typo: seperator -> separator in flashmla_sparse.py (#32411)
Signed-off-by: Guofang Tang <tinggofun@gmail.com>
Co-authored-by: Guofang Tang <tinggofun@gmail.com>
2026-01-17 12:18:30 +00:00

Kim Hee Su
1646fea672
[Model] Molmo2: Enable quantized weight mapping for vision backbone (#32385)
Signed-off-by: kimheesu <wlskaka4@gmail.com>
2026-01-17 09:33:05 +00:00

Paul Pak
d3317bbba4
[Models] Lfm2Moe: minor name changes for resolving lora conflicts (#29063)
Signed-off-by: Paul Pak <paulpak58@gmail.com>
2026-01-16 22:12:55 -08:00

Shengqi Chen
8e61425ee6
[CI] Implement uploading to PyPI and GitHub in the release pipeline, enable release image building for CUDA 13.0 (#31032)
2026-01-17 04:52:33 +00:00

Matthew Bonanni
2e7c89e708
Revert "[Attention][MLA] Make FLASHINFER_MLA the default MLA backen… (#32484)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2026-01-17 04:42:39 +00:00

vanshil shah
037a6487af
apply _validate_input to MistralTokenizer token-id chat prompts (#32448)
Signed-off-by: Vanshil Shah <vanshilshah@gmail.com>
2026-01-17 03:23:45 +00:00

Simon Mo
5a3050a089
[Docs][Governance] Add @robertshaw2-redhat to lead maintainers group (#32498)
Co-authored-by: Claude <noreply@anthropic.com>
2026-01-16 18:35:49 -08:00

Chenyaaang
484e22bc18
[TPU][Core] Enable Pipeline Parallelism on TPU backend (#28506)
Signed-off-by: Chenyaaang <chenyangli@google.com>
2026-01-16 15:29:20 -08:00

Lucas Wilkinson
ca21288080
[CI] Fix OOM in Hopper Fusion E2E Tests (H100) (#32489)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2026-01-16 21:27:16 +00:00

Andrew Xia
4c82b6fac7
[responsesAPI] allow tuning include_stop_str_in_output (#32383)
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
2026-01-16 21:14:40 +00:00

Xin Yang
a884bc62d6
[LoRA] Update LoRA expand kernel heuristic (#32425)
Signed-off-by: Xin Yang <xyangx@amazon.com>
2026-01-16 18:38:07 +00:00

Hashem Hashemi
7a1030431a
Atomics Reduce Counting Optimization for SplitK Skinny GEMMs. (#29843)
Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com>
2026-01-16 11:45:04 -06:00

Wentao Ye
9fd918e510
[CI] Update deepgemm to newer version (#32479)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2026-01-17 01:18:05 +08:00

Ilya Markov
c9a533079c
[EPLB][BugFix]Possible deadlock fix (#32418)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
2026-01-16 09:11:01 -05:00

rasmith
6ca4f400d8
[CI][AMD] Skip test_permute_cols since the kernel is not used and not built for ROCm (#32444)
Signed-off-by: Randall Smith <ransmith@amd.com>
2026-01-16 16:22:53 +08:00

Cyrus Leung
180e981d56
[Chore] Replace swish with silu (#32459)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-16 08:22:45 +00:00

Micah Williamson
b84c426a8c
[ROCm][CI] Skip Qwen3-30B-A3B-MXFP4A16 Eval Test On Non-CUDA Platforms (#32460)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
2026-01-16 00:17:44 -08:00

Rabi Mishra
b66b0d6abb
fix(rocm): Enable non-gated MoE (is_act_and_mul=False) support on ROCm (#32244)
Signed-off-by: rabi <ramishra@redhat.com>
2026-01-16 15:31:10 +08:00

Hongxin Xu
03da3b52ef
[Bugfix] Refactor to support DP parallel in R3 (#32306)
Signed-off-by: xhx1022 <1737006628@qq.com>
Co-authored-by: arlenxu <arlenxu@tencent.com>
2026-01-16 15:13:58 +08:00

Lucas Wilkinson
14ce524249
[CI] Breakup h200 tests (#30499)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2026-01-16 06:23:22 +00:00

wang.yuqi
4ae77dfd42
[Frontend][1/n] Make pooling entrypoints request schema consensus | CompletionRequest (#32395)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-01-16 06:17:04 +00:00

XiongfeiWei
73f635a75f
[Bug] Add TPU backend option (#32438)
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
2026-01-16 05:17:12 +00:00

cjackal
35bf5d08e8
[bugfix] Fix online serving crash when text type response_format is received (#26822)
Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com>
Signed-off-by: j0shuajun <59368606+j0shuajun@users.noreply.github.com>
Co-authored-by: j0shuajun <59368606+j0shuajun@users.noreply.github.com>
2026-01-16 12:23:54 +08:00

Kebe
5de6dd0662
[Bugfix] [DeepSeek-V3.2] fix sparse_attn_indexer padding (#32175)
Signed-off-by: Kebe <mail@kebe7jun.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>
2026-01-16 03:21:55 +00:00

ltd0924
709502558c
[Model] Add Step3vl 10b (#32329)
Signed-off-by: luotingdan <luotingdan@stepfun.com>
Signed-off-by: ltd0924 <32387785+ltd0924@users.noreply.github.com>
Co-authored-by: luotingdan <luotingdan@stepfun.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
2026-01-15 19:04:16 -08:00

Micah Williamson
46f8a982b1
[ROCm][CI] Enable AITER Unified Attention On ROCm For gpt-oss Test (#32431)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
2026-01-16 00:55:57 +00:00

Matthew Bonanni
bcf2333cd6
[CI] Fix LM Eval Large Models (H100) (#32423)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2026-01-16 00:52:49 +00:00

Michael Goin
83239ff19a
Add thread_n=64 support to Marlin MoE (#32360)
Signed-off-by: mgoin <mgoin64@gmail.com>
2026-01-15 16:45:44 -08:00

TomerBN-Nvidia
c277fbdf31
[Feat] Support non-gated MoE with Marlin, NVFP4 CUTLASS, FP8, INT8, compressed-tensors (#32257)
Signed-off-by: Tomer Natan <tbarnatan@computelab-frontend-8.nvidia.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Tomer Natan <tbarnatan@computelab-frontend-8.nvidia.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Tomer Natan <tbarnatan@ipp1-1429.ipp1a1.colossus.nvidia.com>
2026-01-15 16:15:05 -08:00

Wentao Ye
aca5c51487
[Refactor] Remove unused file (#32422)
2026-01-15 15:59:38 -07:00

Yongye Zhu
31c29257c8
[MoE Refactor][17/N] Apply Refactor to Bf16 (#31827)
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2026-01-15 12:53:40 -08:00

Aleksandr Malyshev
8c11001ba2
[ROCM] DSfp4 mla projection gemms weight dynamic quantization (#32238)
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com>
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>
2026-01-15 14:13:08 -06:00

Richard Zou
bd292be0c0
[BugFix] Python file source reading can fail on UnicodeDecodeError (#32416)
Signed-off-by: Richard Zou <zou3519@gmail.com>
2026-01-15 20:01:41 +00:00

TJian
41c544f78a
[ROCm] [CI] [Release] Rocm wheel pipeline with sccache (#32264)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
2026-01-16 02:56:18 +08:00

Michael Goin
1be5a73571
[UX] Use kv_offloading_backend=native by default (#32421)
Signed-off-by: mgoin <mgoin64@gmail.com>
2026-01-15 18:55:11 +00:00

Lucas Wilkinson
c36ba69bda
[BugFix] Fix assert x_s.shape[-1] == x_q.shape[-1] // group_shape[1] in Blackwell Quantized MoE Test (#32362)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-01-15 10:19:12 -08:00

Matthias Gehre
047413375c
[Attention][AMD] Make flash-attn optional (#30361)
Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>
2026-01-15 17:18:24 +00:00

smit kadvani
74e4bb1c5a
fixing podman build issue (#32131)
Signed-off-by: Smit Kadvani <smit.kadvani@gmail.com>
Co-authored-by: Smit Shaileshbhai Kadvani <kadvani@meta.com>
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>
2026-01-15 11:07:08 -06:00

Wentao Ye
b34474bf2c
[Feature] Support async scheduling + PP (#32359)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2026-01-15 12:06:23 -05:00

Woosuk Kwon
6218034dd7
[Model Runner V2] Support FlashInfer backend & Fix CUDA Graph bug [1/2] (#32348)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2026-01-15 08:59:23 -08:00

Pleaplusone
77c16df31d
[ROCm][Bugfix] Disable hip sampler to fix deepseek's accuracy issue on ROCm (#32413)
Signed-off-by: ganyi <ygan@amd.com>
2026-01-15 16:35:47 +00:00

Pleaplusone
130d6c9514
[ROCm][Perf] Enable shuffle kv cache layout and assembly paged attention kernel for AiterFlashAttentionBackend (#29887)
Signed-off-by: ganyi <ygan@amd.com>
2026-01-15 15:29:53 +00:00

Dipika Sikka
361dfdc9d8
[Quant] Support MXFP4 W4A16 for compressed-tensors MoE models (#32285)
Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2026-01-15 07:25:55 -08:00

Matthew Bonanni
8ebfacaa75
[Attention][MLA] Make FLASHINFER_MLA the default MLA backend on Blackwell, and TRTLLM the default prefill (#32339)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2026-01-15 09:49:57 -05:00

brian033
b89275d018
[ROCm] Improve error handling while loading quantized model on gfx120… (#31715)
Signed-off-by: brian033 <85883730+brian033@users.noreply.github.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
2026-01-15 04:16:00 -08:00

Cyrus Leung
28459785ff
[3/N] Group together media-related code (#32406)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-15 11:52:12 +00:00