Lucas Wilkinson
8b5014d3dd
[Attention] FA4 integration ( #32974 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com >
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com >
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com >
2026-03-01 23:44:57 +00:00
Augusto Yao
8e75d88554
add io_process_plugin for sparse embedding ( #34214 )
...
Signed-off-by: augusto.yjh <augusto.yjh@antgroup.com >
Signed-off-by: Augusto Yao <augusto.yjh@antgroup.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2026-02-28 09:16:37 +00:00
Cyrus Leung
4292e3b807
[Benchmark] Improve UX of sweep scripts ( #35600 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-28 00:36:02 -08:00
Cyrus Leung
24d6ea8afd
[Benchmark] Rename SLA Finder to Workload Explorer ( #35586 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-27 23:31:55 -08:00
Cyrus Leung
fd68cd132b
[Bugfix] Fixes for SLA finder ( #35537 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-27 20:20:55 -08:00
Micah Williamson
0edf101d2b
[ROCm] Add stablelm Head Size 80 To Supported Head Sizes For ROCM_ATTN ( #35527 )
...
Signed-off-by: Micah Williamson <micah.williamson@amd.com >
2026-02-28 12:16:34 +08:00
Gregory Shtrasberg
9fa6c68fa6
[ROCm] Enabling encoder and encoder-decoder on ROCm and AITER unified backends ( #35334 )
...
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com >
2026-02-27 21:32:55 +00:00
Martin Hickey
b602e4f299
[Doc] Fix link to Llama chat template for usability ( #35525 )
...
Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2026-02-27 17:51:09 +00:00
fort726
905d76b51d
[Model] Add huggingface skt/A.X-K1 model ( #32407 )
...
Signed-off-by: Sungwan(Alex) Kim <sw0726.kim@sktelecom.com >
Signed-off-by: fort726 <38447663+fort726@users.noreply.github.com >
Co-authored-by: Sungwan(Alex) Kim <sw0726.kim@sktelecom.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com >
2026-02-27 09:26:02 -08:00
Boyuan Feng
5de98abc12
Add @BoyuanFeng to CODEOWNERS ( #35317 )
...
Signed-off-by: Boyuan Feng <boyuan@meta.com >
2026-02-27 15:53:47 +00:00
Wentao Ye
062b789632
[Bug] Fix outdated links in source code ( #35314 )
...
Signed-off-by: yewentao256 <zhyanwentao@126.com >
2026-02-27 03:50:46 +00:00
Tyler Michael Smith
eb19955c37
[WideEP] Remove pplx all2all backend ( #33724 )
...
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com >
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com >
2026-02-26 14:30:10 -08:00
Jakub Zakrzewski
111d869069
[Model] Add nvidia/llama-nemotron-embed-vl-1b-v2 multimodal embedding model ( #35297 )
...
Signed-off-by: Jakub Zakrzewski <jzakrzewski@nvidia.com >
2026-02-26 14:17:17 +00:00
Jiangyun Zhu
ab87f85231
[Model] Ring 2.5 ( #35102 )
...
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com >
2026-02-26 02:17:11 -08:00
Cyrus Leung
d3a51da92a
[Benchmark] Simplify SLA scan ( #35306 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-25 22:35:41 -08:00
Seungmin Kim
160424a937
[Bugfix] Fix CUDA compatibility path setting for both datacenter and consumer NVIDIA GPUs ( #33992 )
...
Signed-off-by: Seungmin Kim <8457324+ehfd@users.noreply.github.com >
Signed-off-by: Andrew Mello <19512127+88plug@users.noreply.github.com >
Co-authored-by: 88plug <19512127+88plug@users.noreply.github.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2026-02-25 18:15:51 -08:00
Joao Gante
709eadbb0b
Doc link typo ( #35281 )
...
Signed-off-by: Joao Gante <joaofranciscocardosogante@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-02-25 03:00:31 -08:00
Yanwen Lin
80e60a6133
[Doc] Suggest "--managed-python" flag when installing python using uv ( #33069 )
...
Signed-off-by: Yanwen Lin <lyw1124278064@gmail.com >
2026-02-25 08:19:43 +00:00
jonoillar
26e722f906
[DOC][BugFix] Specfiy build dependency installation ( #34513 )
...
Signed-off-by: Jon OILLARBURU <jon.oillarburu@multiversecomputing.com >
Co-authored-by: Jon OILLARBURU <jon.oillarburu@multiversecomputing.com >
2026-02-25 08:04:06 +00:00
lichuang
2c619e5e3f
[Docs]Fix documentation formatting in architecture overview ( #34679 )
...
Signed-off-by: codedump <lichuang1982@gmail.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-25 08:00:15 +00:00
Simon Mo
8a685be8d9
docs: document committer proposal process in governance ( #35225 )
...
Signed-off-by: Simon Mo <simon.mo@hey.com >
Co-authored-by: Cursor <cursoragent@cursor.com >
2026-02-25 07:58:48 +00:00
Harry Mellor
f7967577f5
Remove requirement to use --hf-overrides for DeepseekVLV2ForCausalLM ( #35203 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-24 22:00:06 -08:00
Nicolò Lucchesi
f91808ae0d
[MM] Allow audio chunking for offline LLM ( #34628 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2026-02-23 21:04:28 -08:00
Mark McLoughlin
5cc7c4452e
[Metrics] Add Prometheus counters for Model FLOPs Utilization (MFU) ( #30950 )
...
Export the existing Model FLOPs Utilization (MFU) metrics via Prometheus.
`--enable-mfu-metrics` is required for these to be exposed.
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Signed-off-by: Mark McLoughlin <markmc@redhat.com >
2026-02-23 15:01:07 +00:00
Cyrus Leung
987506bca6
[Refactor] Simplify dummy data generation ( #35025 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-22 20:55:27 -08:00
Athrael Soju
970861ac0c
[New Model] Add ColModernVBERT ( #34558 )
...
Signed-off-by: Athrael Soju <athrael.soju@gmail.com >
Signed-off-by: athrael-soju <athrael-soju@users.noreply.github.com >
2026-02-22 12:23:41 +08:00
petrpechman
bebfe55b1c
[Doc] Fix example of eagle3 ( #34960 )
...
Signed-off-by: Petr Pechman <petr.pechman@firma.seznam.cz >
Co-authored-by: Petr Pechman <petr.pechman@firma.seznam.cz >
2026-02-21 09:57:53 +00:00
Nicolò Lucchesi
ab6f3487a6
[PD] Change kv_load_failure_policy Default from "recompute" to "fail" ( #34896 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2026-02-21 01:34:57 -08:00
BADAOUI Abdennacer
8dc8a99b56
[ROCm] Enable bitsandbytes quantization support on ROCm ( #34688 )
...
Signed-off-by: badaoui <abdennacerbadaoui0@gmail.com >
2026-02-21 00:34:55 -08:00
Kata Coder
5719a4e4e6
[Frontend] Support multimodal inputs for late-interaction scoring (ColQwen3) + NewModel: nvidia/nemotron-colembed ( #34574 )
...
Signed-off-by: craftsangjae <craftsangjae@gmail.com >
2026-02-20 20:01:40 -08:00
Kyle Sayers
64ac1395e8
[Docs] Clean up speculators docs ( #34065 )
...
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com >
2026-02-18 13:48:11 -08:00
Cyrus Leung
a766b30349
[Renderer] Deprecate code paths for old input processing ( #34775 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-18 00:35:04 -08:00
Matthew Bonanni
dc5fa77a4e
[Bugfix][MTP][Sparse MLA] Allow sparse MLA with MTP to run with FULL cudagraphs ( #34457 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2026-02-17 14:01:27 -05:00
Harry Mellor
a21cedf4ff
Bump lm-eval version for Transformers v5 compatibility ( #33994 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-16 05:24:35 -08:00
Parth Bansal
5653021094
[Doc] Add Mistral-7b-v0.3 model to the batch invariance validated model ( #34584 )
...
Signed-off-by: Parth Bansal <parthbansal127@gmail.com >
2026-02-16 12:09:00 +08:00
Maryam Tahhan
f07a128413
[CPU][ARM] Add ARM BF16 cross-compilation support and improve documen… ( #33079 )
...
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com >
Co-authored-by: Li, Jiang <jiang1.li@intel.com >
2026-02-15 06:33:08 -08:00
Isotr0py
19fab44152
[Doc] Update Encoder-Decoder models support doc with Florence-2 ( #34581 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-02-15 04:18:57 -08:00
Kata Coder
d1ea65d0a1
[new model] add COLQwen3 code & Inference ( #34398 )
...
Signed-off-by: craftsangjae <craftsangjae@gmail.com >
Signed-off-by: katacoder <craftsangjae@gmail.com >
2026-02-14 12:15:19 +08:00
Ilya Boytsov
071d863e20
Extend ColBERT support to non-standard BERT backbones ( #34170 )
...
Signed-off-by: Ilya Boytsov <ilya.boytsov@aleph-alpha.com >
2026-02-13 09:53:09 +00:00
myselvess
bcf0731aa0
[New Model] support new model ovis2.6 ( #34426 )
...
Signed-off-by: myselvess <23743269+myselvess@users.noreply.github.com >
2026-02-13 00:12:45 -08:00
Matthew Bonanni
f2c47886fd
[Attention] Add FlashInfer Sparse MLA backend ( #33451 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com >
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com >
2026-02-12 17:21:54 +00:00
Nicolò Lucchesi
334c715e0f
[Docs] Spec decoding docs warning removal ( #34439 )
...
Signed-off-by: NickLucche <nlucches@redhat.com >
2026-02-12 09:01:51 -08:00
Louie Tsai
55a1a9563a
Vllm CPU benchmark suite improvement ( #34128 )
...
Signed-off-by: louie-tsai <louie.tsai@intel.com >
2026-02-12 16:04:44 +08:00
Tianqi Ren
786806dd44
[Doc] Update Marlin support matrix for Turing ( #34319 )
...
Signed-off-by: Tianqi Ren <tianqi.r@outlook.com >
2026-02-11 09:03:41 +00:00
Kunshang Ji
cb9574eb85
[XPU][9/N] clean up existing ipex code/doc ( #34111 )
...
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com >
2026-02-11 00:27:15 -08:00
AllenDou
21dfb842d7
[model] support FunASR model ( #33247 )
...
Signed-off-by: zixiao <shunli.dsl@alibaba-inc.com >
Co-authored-by: zixiao <shunli.dsl@alibaba-inc.com >
2026-02-11 07:37:09 +00:00
Cyrus Leung
c9a1923bb4
[Plugin] Simplify IO Processor Plugin interface ( #34236 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-10 19:47:39 -08:00
bnellnm
d1481ba783
[MoE Refactor] Introduce MoERunner abstraction and move execution logic from FusedMoE to DefaultMoERunner ( #32344 )
...
Signed-off-by: Bill Nell <bnell@redhat.com >
2026-02-10 19:51:07 -05:00
Cyrus Leung
25e48a3aae
[Doc] Update usage of --limit-mm-per-prompt ( #34148 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-09 21:12:13 -08:00
Michael Goin
5e75a14a66
[Doc] Add DCP support to attention backend doc ( #33936 )
2026-02-09 18:33:43 -05:00