danisereb
|
dea63512bb
|
Add config file for fused MoE for Nemotron (TP4, B200) (#34411)
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>
|
2026-02-12 06:09:55 -08:00 |
|
Douglas Lehr
|
8a798be929
|
[ROCm] Enable MXFP4 MoE weight pre-shuffling on gfx950 and update aiter (#34192)
Signed-off-by: Doug Lehr <douglehr@amd.com>
Co-authored-by: Doug Lehr <douglehr@amd.com>
Co-authored-by: Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com>
Co-authored-by: tjtanaavllm <tunjian.tan@amd.com>
|
2026-02-12 05:06:33 -08:00 |
|
Cyrus Leung
|
fb455ed547
|
[V0 Deprecation] Remove code related to per-request logits processors (#34400)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-12 20:44:28 +08:00 |
|
baonudesifeizhai
|
f5897613fb
|
Fix Mistral config remap to accept compressed-tensors quantization #34028 (#34104)
Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com>
|
2026-02-12 08:22:06 +00:00 |
|
Louie Tsai
|
55a1a9563a
|
Vllm CPU benchmark suite improvement (#34128)
Signed-off-by: louie-tsai <louie.tsai@intel.com>
|
2026-02-12 16:04:44 +08:00 |
|
AllenDou
|
386bfe5d08
|
[bugfix] refactor FunASR's _get_data_parser (#34397)
Signed-off-by: zixiao <shunli.dsl@alibaba-inc.com>
Co-authored-by: zixiao <shunli.dsl@alibaba-inc.com>
|
2026-02-12 07:26:49 +00:00 |
|
Kyle Sayers
|
e9cd691132
|
[Bugfix] Fix Sparse24 Compressed Tensors models (#33446)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2026-02-11 23:15:16 -08:00 |
|
Yichuan Wang
|
80f2ba6ea6
|
Fix DeepSeek-OCR tensor validation for all size variants (#34085)
Co-authored-by: Cursor <cursoragent@cursor.com>
|
2026-02-11 22:50:23 -08:00 |
|
Lucas Wilkinson
|
136b0bfa59
|
[BugFix] Fix DP chunking (#34379)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Bill Nell <bnell@redhat.com>
Co-authored-by: Bill Nell <bnell@redhat.com>
|
2026-02-12 06:44:03 +00:00 |
|
Cyrus Leung
|
b96f7314b4
|
[Refactor] Pass Renderer to Input Processor (#34329)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-11 19:38:11 -08:00 |
|
Cyrus Leung
|
ced2a92f40
|
[Refactor] Move validation to params definitions (#34362)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-11 19:33:15 -08:00 |
|
Runkai Tao
|
e1d97c38f8
|
[Bug Fix] Fix naive_block_assignment always defaulting to False due to arg misalignment (#33848)
Signed-off-by: Runkai Tao <rt572@physics.rutgers.edu>
|
2026-02-12 11:30:57 +08:00 |
|
Michael Goin
|
ec12d39d44
|
[Bugfix] Fix MTP accuracy for GLM-5 (#34385)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2026-02-12 11:08:19 +08:00 |
|
Michael Goin
|
ff1f83b056
|
[Refactor] Replace activation: str with MoEActivation enum (#33843)
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
|
2026-02-11 17:29:32 -08:00 |
|
Kevin H. Luu
|
83b47f67b1
|
[ci] Integrate AMD tests into CI (#33626)
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>
Signed-off-by: khluu <khluu000@gmail.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
|
2026-02-12 08:54:17 +08:00 |
|
Micah Williamson
|
fb7b30c716
|
[ROCm][CI] Revert Test Groups From mi325_8 to mi325_1 Agent Pool In AMD CI (#34384)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
|
2026-02-11 15:52:34 -08:00 |
|
bnellnm
|
31d992d215
|
[Bugfix] Fix some issues with MoERunner PR #32344 (#34371)
Signed-off-by: Bill Nell <bnell@redhat.com>
|
2026-02-11 14:33:14 -08:00 |
|
Wei Zhao
|
5aff2699bd
|
Fix CI failure - Flashinfer Kernel tests (#34316)
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
|
2026-02-11 14:17:16 -08:00 |
|
Raushan Turganbay
|
527ca32197
|
[Bugfix] Fix more multimodal tests for transformers V5 (#34334)
Signed-off-by: raushan <raushan@huggingface.co>
|
2026-02-11 22:02:05 +01:00 |
|
Junseo Park
|
5458eb835d
|
[Bugfix] send None sentinel on final commit so server properly sends transcription.done (#33963)
Signed-off-by: pjs102793 <pjs102793@naver.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
|
2026-02-11 21:01:53 +00:00 |
|
Tomas Ruiz
|
144d9b7cc8
|
[Benchmarks] Reduce ready checker log verbosity (#34349)
Signed-off-by: Tomas Ruiz <tomas.ruiz.te@gmail.com>
|
2026-02-11 20:57:57 +00:00 |
|
elvischenv
|
83e26c834e
|
[GPT-OSS] Remove unnecessary contiguous (#34337)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
|
2026-02-11 15:29:29 -05:00 |
|
TJian
|
5001211369
|
[ROCm] [CI] fix test_unrecognized_env (#34350)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
|
2026-02-11 18:50:44 +00:00 |
|
Eldar Kurtić
|
11c7ace340
|
[Bugfix] Enable attn quantization of Llama-4 by correctly permuting scales for rope (int8, fp8) (#34243)
Signed-off-by: Your Name <you@example.com>
Co-authored-by: Your Name <you@example.com>
|
2026-02-11 13:24:22 -05:00 |
|
Xinyu Dong
|
be7f3d5d20
|
[Bugfix] fix default is_neox_style is True for deepseek (#34353)
Signed-off-by: dongxinyu03 <dongxinyu03@baidu.com>
|
2026-02-11 18:20:45 +00:00 |
|
Isotr0py
|
0ab06100f4
|
[Multimodal] Expose mm_processor_kwargs for DummyInputsBuilder (#34330)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-02-11 09:37:40 -08:00 |
|
Xinyu Chen
|
ffb3d553cc
|
[Model Runner V2] Init cuda graph pool when necessary (#33217)
Signed-off-by: Xinyu Chen <xinyu1.chen@intel.com>
|
2026-02-11 09:12:13 -08:00 |
|
junuxyz
|
fa7e0bfacf
|
[CI][BugFix] Fix silent failure in shellcheck hook and baseline exist… (#32458)
Signed-off-by: junuxyz <216036880+junuxyz@users.noreply.github.com>
|
2026-02-11 17:03:48 +00:00 |
|
SorenDreano
|
48134a2c22
|
[Docs] Fix typo ("defult") and double spacing (#34348)
Signed-off-by: SorenDreano <71752785+SorenDreano@users.noreply.github.com>
Co-authored-by: Soren Dreano <soren@numind.ai>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-02-11 09:02:27 -08:00 |
|
kliuae
|
64f570ab56
|
[ROCm] [aiter] Split KV cache update for AiterFlashAttention (#33681)
Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com>
|
2026-02-11 16:26:44 +00:00 |
|
Rohan Potdar
|
fd618871b4
|
[Bugfix]: Fix ROCm fusion attn test; use AttentionBackend utils to create kv cache (#33948)
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
|
2026-02-11 11:12:05 -05:00 |
|
Harry Mellor
|
67a42b5a44
|
Don't try and run GLM-ASR with remote code (#34352)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-02-11 08:09:40 -08:00 |
|
Lucas Wilkinson
|
c7914d30f9
|
Reapply [Attention][FA3] Update FA3 to include new swizzle optimization (#34043)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2026-02-11 07:07:56 -08:00 |
|
Adam Binford
|
1b8756562e
|
Responses harmony system message structured (#34268)
Signed-off-by: Adam Binford <adamq43@gmail.com>
|
2026-02-11 05:14:28 -08:00 |
|
Linda
|
275e0d2a99
|
[NVIDIA][test] Tests for flashinfer TRTLLM BF16 MoE (#33715)
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
Co-authored-by: Pavani Majety <pmajety@nvidia.com>
|
2026-02-11 12:38:11 +00:00 |
|
Harry Mellor
|
0f5e55e7a8
|
Make JAIS compatible with Transformers v5 (#34264)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-02-11 12:30:37 +00:00 |
|
Harry Mellor
|
1e9204bff3
|
Make Qwen3VL compatible with Transformers v5 (#34262)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2026-02-11 04:13:23 -08:00 |
|
Li, Jiang
|
05339a7b20
|
[Bugfix][CPU] Fix llama4 inference on CPU (#34321)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2026-02-11 19:07:23 +08:00 |
|
Harry Mellor
|
40b8f55358
|
[Docs] Reduce time spent generating API docs (#34255)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-02-11 02:56:02 -08:00 |
|
Seiji Eicher
|
5045d5c983
|
Patch protobuf for CVE-2026-0994 (#34253)
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Co-authored-by: Kevin H. Luu <khluu000@gmail.com>
|
2026-02-11 02:25:04 -08:00 |
|
Nick Hill
|
e09546cf05
|
[Frontend] Exploit tokenizers "new stream" in FastIncrementalDetokenizer (#34217)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-02-11 11:03:24 +01:00 |
|
Tianqi Ren
|
786806dd44
|
[Doc] Update Marlin support matrix for Turing (#34319)
Signed-off-by: Tianqi Ren <tianqi.r@outlook.com>
|
2026-02-11 09:03:41 +00:00 |
|
Nick Hill
|
79504027ef
|
[Misc] Bump fastsafetensors version for latest fixes (#34273)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-02-11 00:30:09 -08:00 |
|
Luka Govedič
|
addac0e653
|
[torch.compile] Enable AR+rms fusion by default available for -O2 (#34299)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
|
2026-02-11 00:30:00 -08:00 |
|
Cyrus Leung
|
675a22ed66
|
[Chore] Move BaseRenderer to base.py (#34308)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-11 00:29:51 -08:00 |
|
Kunshang Ji
|
cb9574eb85
|
[XPU][9/N] clean up existing ipex code/doc (#34111)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
|
2026-02-11 00:27:15 -08:00 |
|
AllenDou
|
21dfb842d7
|
[model] support FunASR model (#33247)
Signed-off-by: zixiao <shunli.dsl@alibaba-inc.com>
Co-authored-by: zixiao <shunli.dsl@alibaba-inc.com>
|
2026-02-11 07:37:09 +00:00 |
|
R3hankhan
|
d1b837f0ae
|
[CPU] Enable FP16 (Half dtype) support for s390x (#34116)
Signed-off-by: Rehan Khan <Rehan.Khan7@ibm.com>
|
2026-02-11 14:41:42 +08:00 |
|
Roger Wang
|
0b20469c62
|
[Bugfix] Fix weight naming in Qwen3.5 (#34313)
Signed-off-by: Roger Wang <hey@rogerw.io>
|
2026-02-10 21:37:14 -08:00 |
|
Tyler Michael Smith
|
d7982daff5
|
[Bugfix] Fix fused MoE IMA (sans chunking) by using int64 for strides (#34279)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-02-11 05:15:52 +00:00 |
|