Daniel Mescheder
|
cdd03d25d3
|
[CI/Build] Fix dependency conflict between model-hosting-container-standards and starlette (#32560)
Signed-off-by: Daniel Mescheder <dmesch@amazon.com>
Co-authored-by: Daniel Mescheder <dmesch@amazon.com>
|
2026-01-19 03:27:08 -08:00 |
|
Nicolò Lucchesi
|
74c583bc50
|
[Core] Whisper support torch.compile (#30385)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2026-01-19 10:02:31 +00:00 |
|
Andreas Karatzas
|
c0a350ca73
|
[ROCm][CI] Add ROCm attention backend support for EAGLE DP tests (#32363)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-01-19 09:57:54 +00:00 |
|
Yuxuan Zhang
|
71832ba71e
|
[GLM-4.7] GLM Model support for GLM-Lite (#31386)
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>
Signed-off-by: Yuxuan Zhang <2448370773@qq.com>
|
2026-01-19 01:18:38 -08:00 |
|
Matt
|
11bbf86f6a
|
[CI][Hardware][AMD] Fix test_rotary_embedding_mla_cache_fused (#32408)
Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com>
|
2026-01-19 08:25:47 +00:00 |
|
Hyunkyun Moon
|
3c8740aacb
|
[Frontend] Add render endpoints for prompt preprocessing (#32473)
Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>
Signed-off-by: Hyunkyun Moon <mhg5303@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-01-19 12:21:46 +08:00 |
|
Alex Brooks
|
7518a3dc65
|
[CI/Build] Use Common Event Map Fixture in Harmony / MCP Server Tests (#32531)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
|
2026-01-19 04:05:51 +00:00 |
|
honglyua
|
976af2f314
|
[BugFix] Fix embed_input_ids argument error of QwenVLForConditionalGeneration (#32462)
|
2026-01-19 03:06:02 +00:00 |
|
Woosuk Kwon
|
9a1f16da1e
|
[Model Runner V2] Refactor update_states (#32562)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2026-01-18 17:32:42 -08:00 |
|
Woosuk Kwon
|
bb1848cd62
|
[Model Runner V2] Support VLM (#32546)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2026-01-18 16:58:51 -08:00 |
|
Vadim Gimpelson
|
6101a26dc9
|
[BUGFIX] Fix degenerate strides in TRTLLM query tensors for FlashInfer backend. Fixes issue #32353 (#32417)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
|
2026-01-18 16:57:32 -08:00 |
|
Iryna Boiko
|
f5d1740030
|
[Bugfix] Add OOT backend option (#32471)
Signed-off-by: Iryna Boiko <iboiko@habana.ai>
|
2026-01-18 22:20:39 +00:00 |
|
Wentao Ye
|
eebc58df0c
|
[Refactor] Remove unused cutlass moe problem size function (#32047)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-01-18 12:46:59 -08:00 |
|
Wentao Ye
|
16de822c71
|
[Refactor] Remove unused file pallas_kv_cache_update.py (#32433)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-01-18 12:46:39 -08:00 |
|
Deming
|
5480c6b1fa
|
[Doc] Correct comment for _jobs dict in OffloadingConnectorWorker (#32556)
|
2026-01-18 12:46:00 -08:00 |
|
Andrey Khalyavin
|
ba29ab441e
|
Use the same memory for workspace13 and fused_output. (#31531)
Signed-off-by: Andrey Khalyavin <halyavin@yandex-team.ru>
|
2026-01-18 19:14:22 +00:00 |
|
Robert Shaw
|
afc3622602
|
[CI] Move Distributed Tests from H200 -> H100 (#32555)
|
2026-01-18 10:25:23 -08:00 |
|
bnellnm
|
327a02d8db
|
[MoE Refactor] Separate Router into OO Classes (#30623)
Signed-off-by: Bill Nell <bnell@redhat.com>
|
2026-01-18 11:40:49 -05:00 |
|
tjp_zju
|
2f03035a61
|
"refactor: refactor_repeated_interfaces" (#32486)
Signed-off-by: tom-zju <tanjianpingzju1990@gmail.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2026-01-18 22:07:01 +08:00 |
|
Isotr0py
|
38bf2ffb21
|
[Bugfix] Fix GLM-ASR audio encoder RoPE dim (#32540)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-01-18 19:17:59 +08:00 |
|
Li Xie
|
c826c72a96
|
[Model] Support Step1 Model (#32511)
Signed-off-by: xieli <xieli@stepfun.com>
|
2026-01-18 10:20:46 +00:00 |
|
Canlin Guo
|
fe36bf5e80
|
[Model] Remove the unnecessary dtype conversion in MiniCPM (#32523)
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
|
2026-01-18 08:07:28 +00:00 |
|
Woosuk Kwon
|
963dc0b865
|
[Model Runner V2] Minor optimization for eagle input processing (#32535)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2026-01-17 21:55:17 -08:00 |
|
Isotr0py
|
8cc26acd8b
|
[Performance] Improve Triton prefill attention kernel's performance (#32403)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-01-17 20:19:59 -08:00 |
|
Robert Shaw
|
4a6af8813f
|
[MoE Refactor] Move Test Impl into Test Dirs (#32129)
Signed-off-by: Robert Shaw <rshaw@neuralmagic.com>
Co-authored-by: Robert Shaw <rshaw@neuralmagic.com>
|
2026-01-18 12:16:59 +08:00 |
|
Woosuk Kwon
|
4147910f1e
|
[Model Runner V2] Move mrope_positions buffer to MRopeState (#32532)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2026-01-17 20:09:48 -08:00 |
|
Karan Bansal
|
3055232ba0
|
[Feature] Add FIPS 140-3 compliant hash algorithm option for multimodal hashing (#32386)
Signed-off-by: Karan Bansal <karanb192@gmail.com>
|
2026-01-18 11:02:01 +08:00 |
|
Shengqi Chen
|
965765aef9
|
[build] fix cu130 related release pipeline steps and publish as nightly image (#32522)
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
|
2026-01-17 18:36:11 -08:00 |
|
Mritunjay Kumar Sharma
|
9e078d0582
|
[CI/Build][Docker] Add centralized version manifest for Docker builds (#31492)
Signed-off-by: Mritunjay Sharma <mritunjay.sharma@chainguard.dev>
|
2026-01-17 13:45:30 +00:00 |
|
Guofang.Tang
|
2b99f210f5
|
[Misc] Fix typo: seperator -> separator in flashmla_sparse.py (#32411)
Signed-off-by: Guofang Tang <tinggofun@gmail.com>
Co-authored-by: Guofang Tang <tinggofun@gmail.com>
|
2026-01-17 12:18:30 +00:00 |
|
Kim Hee Su
|
1646fea672
|
[Model] Molmo2: Enable quantized weight mapping for vision backbone (#32385)
Signed-off-by: kimheesu <wlskaka4@gmail.com>
|
2026-01-17 09:33:05 +00:00 |
|
Paul Pak
|
d3317bbba4
|
[Models] Lfm2Moe: minor name changes for resolving lora conflicts (#29063)
Signed-off-by: Paul Pak <paulpak58@gmail.com>
|
2026-01-16 22:12:55 -08:00 |
|
Shengqi Chen
|
8e61425ee6
|
[CI] Implement uploading to PyPI and GitHub in the release pipeline, enable release image building for CUDA 13.0 (#31032)
|
2026-01-17 04:52:33 +00:00 |
|
Matthew Bonanni
|
2e7c89e708
|
Revert "[Attention][MLA] Make FLASHINFER_MLA the default MLA backen… (#32484)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-01-17 04:42:39 +00:00 |
|
vanshil shah
|
037a6487af
|
apply _validate_input to MistralTokenizer token-id chat prompts (#32448)
Signed-off-by: Vanshil Shah <vanshilshah@gmail.com>
|
2026-01-17 03:23:45 +00:00 |
|
Simon Mo
|
5a3050a089
|
[Docs][Governance] Add @robertshaw2-redhat to lead maintainers group (#32498)
Co-authored-by: Claude <noreply@anthropic.com>
|
2026-01-16 18:35:49 -08:00 |
|
Chenyaaang
|
484e22bc18
|
[TPU][Core] Enable Pipeline Parallelism on TPU backend (#28506)
Signed-off-by: Chenyaaang <chenyangli@google.com>
|
2026-01-16 15:29:20 -08:00 |
|
Lucas Wilkinson
|
ca21288080
|
[CI] Fix OOM in Hopper Fusion E2E Tests (H100) (#32489)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2026-01-16 21:27:16 +00:00 |
|
Andrew Xia
|
4c82b6fac7
|
[responsesAPI] allow tuning include_stop_str_in_output (#32383)
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
|
2026-01-16 21:14:40 +00:00 |
|
Xin Yang
|
a884bc62d6
|
[LoRA] Update LoRA expand kernel heuristic (#32425)
Signed-off-by: Xin Yang <xyangx@amazon.com>
|
2026-01-16 18:38:07 +00:00 |
|
Hashem Hashemi
|
7a1030431a
|
Atomics Reduce Counting Optimization for SplitK Skinny GEMMs. (#29843)
Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com>
|
2026-01-16 11:45:04 -06:00 |
|
Wentao Ye
|
9fd918e510
|
[CI] Update deepgemm to newer version (#32479)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-01-17 01:18:05 +08:00 |
|
Ilya Markov
|
c9a533079c
|
[EPLB][BugFix]Possible deadlock fix (#32418)
Signed-off-by: ilmarkov <markovilya197@gmail.com>
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
|
2026-01-16 09:11:01 -05:00 |
|
rasmith
|
6ca4f400d8
|
[CI][AMD] Skip test_permute_cols since the kernel is not used and not built for ROCm (#32444)
Signed-off-by: Randall Smith <ransmith@amd.com>
|
2026-01-16 16:22:53 +08:00 |
|
Cyrus Leung
|
180e981d56
|
[Chore] Replace swish with silu (#32459)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-01-16 08:22:45 +00:00 |
|
Micah Williamson
|
b84c426a8c
|
[ROCm][CI] Skip Qwen3-30B-A3B-MXFP4A16 Eval Test On Non-CUDA Platforms (#32460)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
|
2026-01-16 00:17:44 -08:00 |
|
Rabi Mishra
|
b66b0d6abb
|
fix(rocm): Enable non-gated MoE (is_act_and_mul=False) support on ROCm (#32244)
Signed-off-by: rabi <ramishra@redhat.com>
|
2026-01-16 15:31:10 +08:00 |
|
Hongxin Xu
|
03da3b52ef
|
[Bugfix] Refactor to support DP parallel in R3 (#32306)
Signed-off-by: xhx1022 <1737006628@qq.com>
Co-authored-by: arlenxu <arlenxu@tencent.com>
|
2026-01-16 15:13:58 +08:00 |
|
Lucas Wilkinson
|
14ce524249
|
[CI] Breakup h200 tests (#30499)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2026-01-16 06:23:22 +00:00 |
|
wang.yuqi
|
4ae77dfd42
|
[Frontend][1/n] Make pooling entrypoints request schema consensus | CompletionRequest (#32395)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2026-01-16 06:17:04 +00:00 |
|