haosdent
a7f341c323
[Bugfix] Fix MRotaryEmbedding missing truncate attr with YaRN scaling ( #35080 )
...
Signed-off-by: haosdent <haosdent@gmail.com >
2026-02-23 16:05:52 +00:00
Robert Shaw
d13ece38d7
[CI] Skip Responses API ( #34990 )
...
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
2026-02-23 07:46:45 -08:00
Mark McLoughlin
5cc7c4452e
[Metrics] Add Prometheus counters for Model FLOPs Utilization (MFU) ( #30950 )
...
Export the existing Model FLOPs Utilization (MFU) metrics via Prometheus.
`--enable-mfu-metrics` is required for these to be exposed.
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2026-02-23 15:01:07 +00:00
Eldar Kurtić
b95bb6927f
[kv-cache, ct] Use compressed-tensors as a source of ground-truth for quant strategies ( #34254 )
...
Signed-off-by: Your Name <you@example.com >
Co-authored-by: Your Name <you@example.com >
2026-02-23 07:37:55 -07:00
Cyrus Leung
392645454b
[Refactor] Decouple TimingContext from InputProcessingContext ( #35083 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-23 14:15:50 +00:00
Eldar Kurtić
1e8438a89a
[Llama4,CI] Bring back Llama-4 bug fixes, and also fix Maverick tests ( #35033 )
...
Signed-off-by: Eldar Kurtic <you@example.com >
Co-authored-by: Eldar Kurtic <you@example.com >
2026-02-23 09:04:34 -05:00
Robert Shaw
8435b2e049
[ModelBash][DSV3] Add TRTLLM DSV3 Router GEMM kernel (6% B1 Speedup) ( #34302 )
...
Signed-off-by: Robert Shaw <robshaw@redhat.com >
Co-authored-by: Robert Shaw <robshaw@redhat.com >
2026-02-23 14:02:26 +00:00
Yan Ma
b1b5e045df
[XPU] allow TORCH_SDPA/TRITON_ATTN as XPU vit Backend ( #35010 )
...
Signed-off-by: Yan Ma <yan.ma@intel.com >
2026-02-23 05:06:44 -08:00
Andreas Karatzas
5f68464f92
[ROCm][CI] Fix spec decode profile assertion and logprob test determinism ( #35043 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-23 05:05:54 -08:00
Vincent Gimenes
aa08a30fc9
[CLEANING] Remove unused disable_by_batch_size from SpeculativeConfig ( #35060 )
...
Signed-off-by: Vincent Gimenes <vincent.gimenes@gmail.com >
2026-02-23 05:05:36 -08:00
Wentao Ye
7f40e9e516
[Refactor] Remove dead private func _fp8_perm and _extract_mask_for_item ( #35068 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-02-23 05:05:20 -08:00
Harry Mellor
103e614b14
Fix pipeline parallel with embed scaling in the Transformers modelling backend ( #35094 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-23 05:04:47 -08:00
Neil Schemenauer
54e2f83d0a
[Feature] Lazy import for the "mistral" tokenizer module. ( #34651 )
...
Signed-off-by: Neil Schemenauer <nas@arctrix.com >
2026-02-23 00:43:01 -08:00
Gabe Goodhart
e631f8e78e
fix: Apply embedding_multiplier to inputs_embeds ( #34813 )
...
Signed-off-by: Gabe Goodhart <ghart@us.ibm.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2026-02-23 00:42:46 -08:00
Martin Hickey
e97c46a92d
[BugFix]: Fix local mypy issues ( #34739 )
...
Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-23 00:40:29 -08:00
Jee Jee Li
7291d1b288
[Bugfix] Fix kernel benchmark ( #33752 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2026-02-22 21:18:08 -08:00
Cyrus Leung
987506bca6
[Refactor] Simplify dummy data generation ( #35025 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-22 20:55:27 -08:00
Woosuk Kwon
c645e9a214
[Model Runner V2] Remove propose_draft method ( #35070 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
2026-02-22 18:27:12 -08:00
Nick Hill
944ffb5968
[Model Runner V2][Minor] Remove redundant do_spec_decode field ( #35039 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
Co-authored-by: Woosuk Kwon <woosuk@inferact.ai >
2026-02-22 16:18:04 -08:00
qizixi
2bcf71b9c0
[Spec Decode] Reduce TP communication for speculative decoding draft token generation ( #34049 )
...
Signed-off-by: qizixi <qizixi@meta.com >
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com >
2026-02-22 14:59:16 -08:00
tacos8me
b7892a3bef
[Model] Add NVFP4 quantization support for Step3.5-Flash ( #34478 )
...
Signed-off-by: tacos8me <ian@cloudhabit.com >
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2026-02-22 12:30:46 -07:00
Benjamin Chislett
682566b18e
[Bug] Refactor max_num_batched_tokens to account for drafting ( #34898 )
...
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com >
2026-02-22 11:18:46 -05:00
qizixi
b9c2a565cc
[Spec Decode] Defer clearing KV connector metadata for EAGLE3 speculative decode + prefill / decode disagg setup ( #34529 )
...
Signed-off-by: qizixi <qizixi@meta.com >
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com >
2026-02-22 08:08:32 -08:00
Andreas Karatzas
dd8c3a7fb2
[ROCm][CI] Fix realtime test timeouts caused by aiter JIT compilation delays ( #35052 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-22 10:07:18 +00:00
Andreas Karatzas
a8a47c17b6
[ROCm][CI] Fix flaky embedding chat test by using tolerance-based comparison ( #35050 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-22 09:03:44 +00:00
Roger Wang
40f88d8318
[Bugfix] Fix Qwen3/Qwen3.5 Reasoning Parser ( #34779 )
...
Signed-off-by: Roger Wang <hey@rogerw.io >
2026-02-21 23:15:35 -08:00
Woosuk Kwon
2cbf9656ce
[Model Runner V2] Enable CUDA graph for Eagle3 ( #35040 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
2026-02-21 21:42:50 -08:00
Xiao Li
30132cd144
Fix apply_top_k_top_p_triton called by non-cuda logits Tensor ( #35030 )
...
Signed-off-by: Xiao Li <ilx@meta.com >
2026-02-21 21:11:54 -08:00
Cyrus Leung
cbd95a2dd1
[Benchmark] Use sns.relplot for plotting ( #35027 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-21 20:26:48 -08:00
Athrael Soju
970861ac0c
[New Model] Add ColModernVBERT ( #34558 )
...
Signed-off-by: Athrael Soju <athrael.soju@gmail.com >
Signed-off-by: athrael-soju <athrael-soju@users.noreply.github.com >
2026-02-22 12:23:41 +08:00
Wentao Ye
d24bdd7c4b
[CI] Bump mteb version to mteb[bm25s]>=2, <3 for pooling model unit tests ( #34961 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-02-21 20:23:24 -08:00
Andreas Karatzas
d403c1da1c
[CI] Stabilizing ROCm amd-ci signal and minor name fix in upstream ( #35008 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-22 04:01:10 +00:00
Woosuk Kwon
b71fbd06e2
[Model Runner V2] Support attention group ( #35036 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
2026-02-21 16:42:53 -08:00
Vadim Gimpelson
74d90b1ce4
[Model Bash][DSR1] Add selective dynamic shape marking for CustomOp ( #34900 )
...
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com >
2026-02-21 19:28:01 -05:00
Woosuk Kwon
a4047d4ea9
[Model Runner V2] Support Eagle3 (no CUDA graph) ( #35029 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
2026-02-21 12:55:24 -08:00
Cyrus Leung
965fe45935
[CI/Build] Fix gRPC version mismatch ( #35013 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-21 12:14:41 -07:00
Roman
98b0205c3c
[Frontend] Add automatic language detection for Whisper transcription ( #34342 )
...
Signed-off-by: space_check <roman.vuskov@rwth-aachen.de >
Signed-off-by: Roman <45857014+spacecheck@users.noreply.github.com >
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com >
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com >
2026-02-21 04:49:41 -08:00
Huy Do
272b535ab3
[Bugfix] Gate 256-bit instructions to CUDA 12.9+ ( #34791 )
...
Signed-off-by: Huy Do <huydhn@gmail.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2026-02-21 04:48:14 -08:00
Cyrus Leung
f74f1572ca
[Benchmark] Improve benchmarks ( #35012 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-21 10:31:58 +00:00
petrpechman
bebfe55b1c
[Doc] Fix example of eagle3 ( #34960 )
...
Signed-off-by: Petr Pechman <petr.pechman@firma.seznam.cz >
Co-authored-by: Petr Pechman <petr.pechman@firma.seznam.cz >
2026-02-21 09:57:53 +00:00
Nick Hill
820d7815eb
[Core] Minor structured-output related scheduler optimization ( #34765 )
...
Signed-off-by: Nick Hill <nickhill123@gmail.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2026-02-21 01:38:28 -08:00
Nicolò Lucchesi
ab6f3487a6
[PD] Change kv_load_failure_policy Default from "recompute" to "fail" ( #34896 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2026-02-21 01:34:57 -08:00
BADAOUI Abdennacer
8dc8a99b56
[ROCm] Enable bitsandbytes quantization support on ROCm ( #34688 )
...
Signed-off-by: badaoui <abdennacerbadaoui0@gmail.com >
2026-02-21 00:34:55 -08:00
jennyyyyzhen
2aab2bb543
[ROCM] Optimize ROCM_AITER_FA spec decode eagle performance ( #34541 )
...
Signed-off-by: jennyyyyzhen <yzhen@hmc.edu >
2026-02-20 20:32:05 -08:00
Andreas Karatzas
54254f7a61
[ROCm][CI] Fix spec decode logprobs flakiness and parametrize tree attention backends ( #34599 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-20 20:25:23 -08:00
Andreas Karatzas
cf93c1a128
[ROCm][AITER] Fix aiter paged_attention_v1 decode for sliding window and head_size < 64 ( #34570 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-20 20:25:07 -08:00
Andreas Karatzas
89358f0d35
[CI] Fix ColBERT HF comparison tests on AMD CI + refactor ( #34567 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-20 20:12:05 -08:00
zhongdaor-nv
a0fe7ea2f0
[feat] Add per-block extra_keys to KV events ( #33304 )
...
Signed-off-by: zhongdaor-nv <zhongdaor@nvidia.com >
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2026-02-20 20:11:40 -08:00
Andreas Karatzas
991d6bff38
[CI][MCP][Harmony] Heavy refactoring Harmony & MCP response tests and stabilizing with deterministic test infrastructure ( #33949 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-20 20:03:32 -08:00
Kata Coder
5719a4e4e6
[Frontend] Support multimodal inputs for late-interaction scoring (ColQwen3) + NewModel: nvidia/nemotron-colembed ( #34574 )
...
Signed-off-by: craftsangjae <craftsangjae@gmail.com >
2026-02-20 20:01:40 -08:00