Cyrus Leung
fc22cae4ac
[CI/Build] Update video URLs for testing (#34446)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-02-12 18:15:36 -08:00
Yanan Cao
96161fe978
[Kernel] [Helion] [4/N] Add silu_mul_fp8 Helion kernel (#33373)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
2026-02-12 18:13:12 -08:00
Jaewon
4453ba8d9e
[Core] Profiler improvements and lazy initialization (#33198)
Signed-off-by: Jaewon Lee <jaewon@meta.com>
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>
2026-02-12 16:16:38 -08:00
Jaewon
aa181c923b
[Core] Add sleep level 0 mode with enqueue/wait pattern (#33195)
Signed-off-by: Jaewon Lee <jaewon@meta.com>
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>
2026-02-12 16:16:25 -08:00
Alec S
be7370daf3
[Frontend] Enable generic structured_outputs for responses API (#33709)
Signed-off-by: Alec Solder <alecs@fb.com>
Co-authored-by: Alec Solder <alecs@fb.com>
2026-02-12 16:15:48 -08:00
Mengtao (Martin) Yuan
9ea1f598ce
Use paged_attention_v1 for sliding window decode in rocm_aiter_fa (#34378)
Signed-off-by: Martin Yuan <myuan@meta.com>
Co-authored-by: Martin Yuan <myuan@meta.com>
2026-02-12 16:14:43 -08:00
amitz-nv
f120bd42d3
[Kernel] Support Flashinfer trtllm fused MoE non gated FP8 & NVFP4 (#33506)
Signed-off-by: amitz-nv <203509407+amitz-nv@users.noreply.github.com>
2026-02-12 13:06:58 -08:00
Hashem Hashemi
fac4e96940
small adjustment to wvSplitKrc (#34410)
Signed-off-by: Hashem Hashemi <hashem.hashemi@amd.com>
2026-02-12 20:26:36 +00:00
Michael Goin
6d4e27ce29
[Bugfix] Enforce DeepGEMM when using sparse_attn_indexer on CUDA (#34374)
Signed-off-by: mgoin <mgoin64@gmail.com>
2026-02-12 12:08:06 -08:00
Andreas Karatzas
4c078fa546
[ROCm][CI] Pin TorchCodec to v0.10.0 for ROCm compatibility (#34447)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-02-12 18:47:34 +00:00
Patrick von Platen
6c0baee610
[Voxtral Realtime] Refactor & Improve buffering logic (#34428)
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-02-12 09:46:43 -08:00
Patrick von Platen
1100a97621
[Voxstral Realtime] Enable tests (#33803)
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
2026-02-12 09:43:24 -08:00
xuebwang-amd
766e167821
[ROCm][quantization] improve OCP weight quant parser robust (#34431)
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
2026-02-12 09:40:19 -08:00
Isotr0py
becbe24808
[Bugfix] Remove broken raw url GGUF model loading support (#34433)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-02-12 09:40:01 -08:00
Harry Mellor
679ca5d8d3
Fix MoE for the Transformers modelling backend (#34436)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-02-12 09:29:42 -08:00
Matthew Bonanni
f2c47886fd
[Attention] Add FlashInfer Sparse MLA backend (#33451)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
2026-02-12 17:21:54 +00:00
Nicolò Lucchesi
334c715e0f
[Docs] Spec decoding docs warning removal (#34439)
Signed-off-by: NickLucche <nlucches@redhat.com>
2026-02-12 09:01:51 -08:00
Aaron Hao
7b5a8b4a9d
[BUG] Reset running requests when clearing cache for pause/resume (#34382)
Signed-off-by: hao-aaron <ahao@anyscale.com>
2026-02-12 16:19:13 +00:00
danisereb
dea63512bb
Add config file for fused MoE for Nemotron (TP4, B200) (#34411)
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>
2026-02-12 06:09:55 -08:00
Douglas Lehr
8a798be929
[ROCm] Enable MXFP4 MoE weight pre-shuffling on gfx950 and update aiter (#34192)
Signed-off-by: Doug Lehr <douglehr@amd.com>
Co-authored-by: Doug Lehr <douglehr@amd.com>
Co-authored-by: Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com>
Co-authored-by: tjtanaavllm <tunjian.tan@amd.com>
2026-02-12 05:06:33 -08:00
Cyrus Leung
fb455ed547
[V0 Deprecation] Remove code related to per-request logits processors (#34400)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-02-12 20:44:28 +08:00
baonudesifeizhai
f5897613fb
Fix Mistral config remap to accept compressed-tensors quantization #34028 (#34104)
Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com>
2026-02-12 08:22:06 +00:00
Louie Tsai
55a1a9563a
Vllm CPU benchmark suite improvement (#34128)
Signed-off-by: louie-tsai <louie.tsai@intel.com>
2026-02-12 16:04:44 +08:00
AllenDou
386bfe5d08
[bugfix] refactor FunASR's _get_data_parser (#34397)
Signed-off-by: zixiao <shunli.dsl@alibaba-inc.com>
Co-authored-by: zixiao <shunli.dsl@alibaba-inc.com>
2026-02-12 07:26:49 +00:00
Kyle Sayers
e9cd691132
[Bugfix] Fix Sparse24 Compressed Tensors models (#33446)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2026-02-11 23:15:16 -08:00
Yichuan Wang
80f2ba6ea6
Fix DeepSeek-OCR tensor validation for all size variants (#34085)
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-11 22:50:23 -08:00
Lucas Wilkinson
136b0bfa59
[BugFix] Fix DP chunking (#34379)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Bill Nell <bnell@redhat.com>
Co-authored-by: Bill Nell <bnell@redhat.com>
2026-02-12 06:44:03 +00:00
Cyrus Leung
b96f7314b4
[Refactor] Pass Renderer to Input Processor (#34329)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-02-11 19:38:11 -08:00
Cyrus Leung
ced2a92f40
[Refactor] Move validation to params definitions (#34362)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-02-11 19:33:15 -08:00
Runkai Tao
e1d97c38f8
[Bug Fix] Fix naive_block_assignment always defaulting to False due to arg misalignment (#33848)
Signed-off-by: Runkai Tao <rt572@physics.rutgers.edu>
2026-02-12 11:30:57 +08:00
Michael Goin
ec12d39d44
[Bugfix] Fix MTP accuracy for GLM-5 (#34385)
Signed-off-by: mgoin <mgoin64@gmail.com>
2026-02-12 11:08:19 +08:00
Michael Goin
ff1f83b056
[Refactor] Replace activation: str with MoEActivation enum (#33843)
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
2026-02-11 17:29:32 -08:00
Kevin H. Luu
83b47f67b1
[ci] Integrate AMD tests into CI (#33626)
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>
Signed-off-by: khluu <khluu000@gmail.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
2026-02-12 08:54:17 +08:00
Micah Williamson
fb7b30c716
[ROCm][CI] Revert Test Groups From mi325_8 to mi325_1 Agent Pool In AMD CI (#34384)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
2026-02-11 15:52:34 -08:00
bnellnm
31d992d215
[Bugfix] Fix some issues with MoERunner PR #32344 (#34371)
Signed-off-by: Bill Nell <bnell@redhat.com>
2026-02-11 14:33:14 -08:00
Wei Zhao
5aff2699bd
Fix CI failure - Flashinfer Kernel tests (#34316)
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
2026-02-11 14:17:16 -08:00
Raushan Turganbay
527ca32197
[Bugfix] Fix more multimodal tests for transformers V5 (#34334)
Signed-off-by: raushan <raushan@huggingface.co>
2026-02-11 22:02:05 +01:00
Junseo Park
5458eb835d
[Bugfix] send None sentinel on final commit so server properly sends transcription.done (#33963)
Signed-off-by: pjs102793 <pjs102793@naver.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
2026-02-11 21:01:53 +00:00
Tomas Ruiz
144d9b7cc8
[Benchmarks] Reduce ready checker log verbosity (#34349)
Signed-off-by: Tomas Ruiz <tomas.ruiz.te@gmail.com>
2026-02-11 20:57:57 +00:00
elvischenv
83e26c834e
[GPT-OSS] Remove unnecessary contiguous (#34337)
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
2026-02-11 15:29:29 -05:00
TJian
5001211369
[ROCm] [CI] fix test_unrecognized_env (#34350)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
2026-02-11 18:50:44 +00:00
Eldar Kurtić
11c7ace340
[Bugfix] Enable attn quantization of Llama-4 by correctly permuting scales for rope (int8, fp8) (#34243)
Signed-off-by: Your Name <you@example.com>
Co-authored-by: Your Name <you@example.com>
2026-02-11 13:24:22 -05:00
Xinyu Dong
be7f3d5d20
[Bugfix] fix default is_neox_style is True for deepseek (#34353)
Signed-off-by: dongxinyu03 <dongxinyu03@baidu.com>
2026-02-11 18:20:45 +00:00
Isotr0py
0ab06100f4
[Multimodal] Expose mm_processor_kwargs for DummyInputsBuilder (#34330)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-02-11 09:37:40 -08:00
Xinyu Chen
ffb3d553cc
[Model Runner V2] Init cuda graph pool when necessary (#33217)
Signed-off-by: Xinyu Chen <xinyu1.chen@intel.com>
2026-02-11 09:12:13 -08:00
junuxyz
fa7e0bfacf
[CI][BugFix] Fix silent failure in shellcheck hook and baseline exist… (#32458)
Signed-off-by: junuxyz <216036880+junuxyz@users.noreply.github.com>
2026-02-11 17:03:48 +00:00
SorenDreano
48134a2c22
[Docs] Fix typo ("defult") and double spacing (#34348)
Signed-off-by: SorenDreano <71752785+SorenDreano@users.noreply.github.com>
Co-authored-by: Soren Dreano <soren@numind.ai>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-02-11 09:02:27 -08:00
kliuae
64f570ab56
[ROCm] [aiter] Split KV cache update for AiterFlashAttention (#33681)
Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com>
2026-02-11 16:26:44 +00:00
Rohan Potdar
fd618871b4
[Bugfix]: Fix ROCm fusion attn test; use AttentionBackend utils to create kv cache (#33948)
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
2026-02-11 11:12:05 -05:00
Harry Mellor
67a42b5a44
Don't try and run GLM-ASR with remote code (#34352)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-02-11 08:09:40 -08:00