Cyrus Leung
|
119f00630b
|
[Renderer] Clean up renderer code (#26216)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-04 17:05:29 +00:00 |
|
Isotr0py
|
a42d2df75f
|
[Frontend] Cache chat template kwargs resolution (#26227)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-10-04 15:32:30 +00:00 |
|
Li, Jiang
|
5c057e068f
|
[CPU] Refine batch reorder of CPU attention backend (#26096)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-10-04 21:54:35 +08:00 |
|
Thomas Parnell
|
ed3aeb25a4
|
[V1] [Hybrid] Remove code to override default CUDA graph configuration (#26226)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2025-10-04 13:47:48 +00:00 |
|
yuafng
|
86ee949128
|
Fix tensor device and dtype placement in Qwen2VL model (#26219)
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Yuanfeng Li <yuanfengli@meta.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2025-10-04 06:41:39 -07:00 |
|
Cyrus Leung
|
4570535ec4
|
[Model] CLIP Embedding Support (#26010)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-04 06:21:42 -07:00 |
|
Nicolò Lucchesi
|
2a6dc67eb5
|
[Bugfix] Fix _reqs_to_process leak on abort (#26012)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-10-04 11:39:31 +00:00 |
|
Yannick Schnider
|
f05fea1f5e
|
[Core] Enable decode of context length equal to max model length (#26168)
Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>
|
2025-10-04 09:59:26 +00:00 |
|
Luca Soldaini
|
d0df145c2a
|
Add Olmo 3 reasoning parser (#26054)
Signed-off-by: Luca Soldaini <luca@soldaini.net>
|
2025-10-04 17:48:29 +08:00 |
|
Cyrus Leung
|
1838cd4860
|
Revert "Add batch invariant kernel override for FlashInfer backend [2/n]" (#26220)
|
2025-10-04 02:45:08 -07:00 |
|
Huamin Li
|
7d6b03381e
|
[CI Failure] fix_test_auto_prefix_cache_support (#26053)
Signed-off-by: Huamin Li <3ericli@gmail.com>
|
2025-10-04 02:44:49 -07:00 |
|
Cyrus Leung
|
7c2e91c4e0
|
[Misc] Remove unused executor.apply_model (#26215)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-04 01:45:53 -07:00 |
|
Cyrus Leung
|
736fbf4c89
|
[Misc] Require merge_by_field_config argument (#26214)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-04 01:40:14 -07:00 |
|
Cyrus Leung
|
44ea85137a
|
[Model] Support nested structures for TensorSchema (#26212)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-04 01:20:32 -07:00 |
|
Harry Mellor
|
d3d649efec
|
Support expert parallel in Transformers backend (#26162)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-10-04 04:35:04 +00:00 |
|
Stan Wozniak
|
ea507c3a93
|
[V1] [Hybrid] Mamba2 Automatic Prefix Caching (#25752)
Signed-off-by: Stanislaw Wozniak <stw@zurich.ibm.com>
Signed-off-by: Thomas Ortner <boh@zurich.ibm.com>
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Co-authored-by: Thomas Ortner <boh@zurich.ibm.com>
Co-authored-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2025-10-04 06:34:22 +02:00 |
|
Fadi Arafeh
|
9705fba7b7
|
[cpu][perf] Accelerate unquantized-linear for AArch64 through oneDNN/ACL and weight prepack (#25948)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
|
2025-10-04 12:16:38 +08:00 |
|
Bram Wasti
|
2f7dbc9b42
|
Add batch invariant kernel override for FlashInfer backend [2/n] (#25769)
Signed-off-by: Bram Wasti <bwasti@meta.com>
Signed-off-by: Bram Wasti <bwasti@fb.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2025-10-03 19:49:30 -07:00 |
|
Ben Browning
|
ea25a76c05
|
[BugFix] Use async Mistral Tokenizer in Chat Completions (#26134)
Signed-off-by: Ben Browning <bbrownin@redhat.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-10-04 09:42:08 +08:00 |
|
Roger Wang
|
67bc0c003e
|
[Bugfix] Fix qwen3 vl dummy data generation with overrides (#26193)
Signed-off-by: Roger Wang <hey@rogerw.io>
|
2025-10-04 01:40:20 +00:00 |
|
Eugene Khvedchenya
|
5a05f26603
|
Fix issue of using only the part of video frame [Nemotron Nano] (#26186)
Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com>
|
2025-10-04 00:21:00 +00:00 |
|
Varun Sundar Rabindranath
|
7ef40bb983
|
[GPTOSS][DP/EP][Marlin] Enable GPTOSS DP/EP using Marlin kernels (#25488)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
|
2025-10-03 20:13:13 -04:00 |
|
Wentao Ye
|
767cbb011d
|
[CI] Fix Pre-commit Mypy Error (#26181)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 16:08:03 -07:00 |
|
Angela Yi
|
7cfa4b24bf
|
[BugFix] Fix de-functionalization pass for rotary_embedding (#23953)
Signed-off-by: angelayi <yiangela7@gmail.com>
|
2025-10-03 15:44:18 -07:00 |
|
Sergei Skvortsov
|
b71fcd4905
|
[Misc] Add penalties sampling parameters to serve tool (#25974)
Signed-off-by: Sergei Skvortsov <sergeyskv@nebius.com>
Co-authored-by: Sergei Skvortsov <sergeyskv@nebius.com>
|
2025-10-03 15:43:14 -07:00 |
|
Sahithi Chigurupati
|
75003f34e8
|
[CI] Push multiarch manifests as nightly builds (#25764)
Signed-off-by: Sahithi Chigurupati <chigurupati.sahithi@gmail.com>
|
2025-10-03 15:42:55 -07:00 |
|
Bowen Bao
|
78b8015a4d
|
[Bugfix] Relax tokenizer regex for mixtral to include 'tokenizer.model' (#25964)
Signed-off-by: Bowen Bao <bowenbao@amd.com>
|
2025-10-03 18:31:59 -04:00 |
|
Andrew Xia
|
831b124151
|
[responsesAPI] add better error messaging for long prompts (#25724)
Signed-off-by: Andrew Xia <axia@meta.com>
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
|
2025-10-03 14:33:13 -07:00 |
|
Wentao Ye
|
c1ffcb55da
|
[Refactor] Optimize FP8 MOE Backend Choice and Log (#26044)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-10-03 15:23:42 -06:00 |
|
Corey Lowman
|
0879736aab
|
[Perf] Remove hardcoded num_warps=1 (#26183)
Signed-off-by: Corey Lowman <clowman1993@gmail.com>
|
2025-10-03 20:38:50 +00:00 |
|
Pavani Majety
|
a26917332f
|
[Quantization/NVFP4] Speed up TRTLLM NVFP4 MOE weight loading and fix K/V scale loading for MLA Attn (#25968)
Signed-off-by: Pavani Majety <pmajety@nvidia.com>
|
2025-10-03 19:35:06 +00:00 |
|
Nikhil G
|
cd9e5b8340
|
Fix V1 engine serialization error with Ray distributed executor (#26148)
Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
|
2025-10-03 18:39:45 +00:00 |
|
Matthew Bonanni
|
300a59c4c3
|
Avoid division by zero in cache DS MLA kernel (#26174)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2025-10-03 17:35:17 +00:00 |
|
Harry Mellor
|
d76541a6c5
|
Stop mergify from keeping stale PRs alive (#26169)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-03 16:42:34 +00:00 |
|
Chendi.Xue
|
dd96465fd7
|
[BugFix][QWEN-VL]fix wrong apply_rotary_emb_torch selection introduced by #24642 (#26123)
Signed-off-by: Chendi Xue <Chendi.Xue@intel.com>
Signed-off-by: Chendi.Xue <chendi.xue@intel.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2025-10-03 08:52:26 -07:00 |
|
Jun Jiang
|
4f8f47e87e
|
Fix undefined symbol: cutlass_moe_mm_sm100 (#26098)
Signed-off-by: Jun Jiang <jasl9187@hotmail.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-10-03 15:48:32 +00:00 |
|
Cyrus Leung
|
d78fda7cda
|
[Renderer] Move Processor out of LLMEngine (#26165)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-03 15:08:22 +00:00 |
|
Aleksandr Samarin
|
73a99cc2a5
|
[Model] Fixed stream generator for gpt-oss + spec-decoding (#26027)
Signed-off-by: Aleksandr Samarin <astrlrd@nebius.com>
|
2025-10-03 13:43:41 +00:00 |
|
Xiang Si
|
adae0c1f43
|
[CI/Build] do not enforce precompilation on tpu ci tests (#25992)
Signed-off-by: Xiang Si <sixiang@google.com>
|
2025-10-03 13:38:42 +00:00 |
|
whx
|
cbf9221992
|
[Model] Supplement to PR 24862: Pass param prefix to LLMHead (#25805)
Signed-off-by: whx-sjtu <2952154980@qq.com>
|
2025-10-03 21:34:53 +08:00 |
|
Paul Pak
|
5f42fc53b6
|
[backends][short_conv] CUDA graph piecewise edits (#24215)
Signed-off-by: Paul Pak <paulpak58@gmail.com>
|
2025-10-03 12:59:48 +00:00 |
|
Yannick Schnider
|
8ee846c27c
|
[Bugfix] Re-enable prefill of max model length (#24446)
Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>
|
2025-10-03 14:13:34 +02:00 |
|
Yang Liu
|
812b7f54a8
|
[Renderer] Move Processor out of AsyncLLM (#24138)
Signed-off-by: Yang <lymailforjob@gmail.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-10-03 11:29:45 +00:00 |
|
Sage Moore
|
5f2cacdb1e
|
Quick fix for IMA with the Prefix Prefill kernel during graph capture (#25983)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2025-10-03 11:28:22 +00:00 |
|
Egor
|
aa5053e3fe
|
[Doc] Fixed shape description for fused_batched_moe.py (#25668)
Signed-off-by: Egor <e.a.krivov@gmail.com>
|
2025-10-03 04:00:23 -07:00 |
|
Wenlong Wang
|
79aa244678
|
[Multi Modal] Configurable MM Profiling (#25631)
Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-03 03:59:10 -07:00 |
|
kyt
|
2ed3f20dba
|
[openai] Fix missing tool usage check (system message) (#24768)
Signed-off-by: kyt <eluban4532@gmail.com>
|
2025-10-03 18:55:44 +08:00 |
|
Nicolò Lucchesi
|
48f309029a
|
[NIXL][Misc] Expose metrics from NIXL for logging to CLI (#25388)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-10-03 10:47:59 +00:00 |
|
Thomas Parnell
|
0e93ac0b3a
|
[CI] Fix distributed hybrid tests in CI (#26155)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2025-10-03 09:14:18 +00:00 |
|
Yannick Schnider
|
5446ad1d24
|
[test utils] correct wrong typing (#26159)
Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com>
|
2025-10-03 02:11:49 -07:00 |
|