Jeremy Arnold
|
58abe35455
|
[Benchmarks] Make detokenization optional in benchmark scripts (#11697)
Signed-off-by: Jeremy Arnold <Jeremy.Arnold@amd.com>
|
2025-03-07 08:09:00 -08:00 |
|
York-RDWang
|
f7ebad2307
|
[Doc] Update prefix_caching.md to match the example image (#14420)
|
2025-03-07 15:29:00 +00:00 |
|
Aaron Pham
|
80e9afb5bc
|
[V1][Core] Support for Structured Outputs (#12388)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-03-07 07:19:11 -08:00 |
|
iefgnoix
|
1e3598edeb
|
Use the optimized block sizes after tuning the kernel. (#14329)
|
2025-03-07 13:25:13 +00:00 |
|
Harry Mellor
|
f7a6bd0fa1
|
Fix missing kv_caches and attn_metadata in OpenVINOCausalLM (#14271)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-03-07 12:30:42 +00:00 |
|
Aleksandr Malyshev
|
0ca3b8e01c
|
[BUGFIX] Skip tokenization support for throughput benchmark (#12712)
Signed-off-by: root <root@banff-cyxtera-s73-5.ctr.dcgpu>
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com>
Co-authored-by: root <root@banff-cyxtera-s73-5.ctr.dcgpu>
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>
|
2025-03-07 02:51:47 -08:00 |
|
மனோஜ்குமார் பழனிச்சாமி
|
cc10281498
|
[Misc] Set default value of seed to None (#14274)
Signed-off-by: மனோஜ்குமார் பழனிச்சாமி <smartmanoj42857@gmail.com>
|
2025-03-07 10:40:01 +00:00 |
|
Cyrus Leung
|
05fb6718f0
|
[Bugfix] Clean up multi-modal processors (#14417)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-07 10:33:38 +00:00 |
|
Jee Jee Li
|
12c29a881f
|
[Bugfix] Further clean up LoRA test (#14422)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-03-07 10:30:55 +00:00 |
|
Peng Li
|
70da0c0748
|
correct wrong markdown syntax (#14414)
Signed-off-by: vincent-pli <justdoit.pli@gmail.com>
|
2025-03-07 08:01:18 +00:00 |
|
Cyrus Leung
|
c1588a2c94
|
[GH] Auto-apply multi-modality label to relevant PRs (#14402)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-07 15:26:32 +08:00 |
|
Ilya Lavrenov
|
8ca7a71df7
|
OpenVINO: added CPU-like conditions (#14338)
Signed-off-by: Ilya Lavrenov <ilya.lavrenov@intel.com>
|
2025-03-06 22:24:49 -08:00 |
|
Isotr0py
|
63137cd922
|
[Build] Add nightly wheel fallback when latest commit wheel unavailable (#14358)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-03-06 22:10:57 -08:00 |
|
Jee Jee Li
|
ddd1ef66ec
|
[Bugfix] Fix JambaForCausalLM LoRA (#14370)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-03-06 22:05:47 -08:00 |
|
Lucas Wilkinson
|
e5e03c2c1b
|
[BugFix] Illegal Memory Access in the blockwise cutlass fp8 GEMMs (#14396)
|
2025-03-06 21:56:06 -08:00 |
|
Luka Govedič
|
e1744502c2
|
[FP8] Refactor apply_fp8_linear and apply_fp8_linear_generic into an object (#14390)
Signed-off-by: luka <luka@neuralmagic.com>
|
2025-03-07 05:20:16 +00:00 |
|
Lucas Wilkinson
|
dae6896977
|
[Perf] Reduce MLA CPU overheads in V1 (#14384)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-03-06 19:59:14 -08:00 |
|
Brayden Zhong
|
c34eeec58d
|
[Bugfix] Correctly call cudaProfilerStop in benchmarks script (#14183)
Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
|
2025-03-07 00:42:49 +00:00 |
|
Daniel Li
|
ad60bbb2b2
|
[Doc] Fix a typo (#14385)
|
2025-03-06 16:31:52 -08:00 |
|
Chengji Yao
|
0578e5a462
|
[Hardware][TPU]Enable ragged paged attention kernel and resolve recompilation issue (#14310)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
|
2025-03-06 23:31:05 +00:00 |
|
Michael Goin
|
04222984f8
|
[Docs] Add nsight guide to profiling docs (#14298)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-03-06 14:19:58 -08:00 |
|
Michael Goin
|
6832707e90
|
[V1][Bugfix] Standardize quantized kv cache rejection for attention backends (#14221)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-03-06 14:18:29 -08:00 |
|
Michael Goin
|
6b2ef5cd17
|
[Bug] Fix Attention when ignored in by quant_method (#14313)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-03-06 14:18:06 -08:00 |
|
Tyler Michael Smith
|
958adce478
|
[Bugfix] Fix use_direct_call condition in FusedMoE layer for (#14382)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2025-03-06 14:17:21 -08:00 |
|
Tyler Michael Smith
|
99b0915d3b
|
[Kernel] Add needs_fixed_stride_order tag to most GEMMs (#14306)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2025-03-06 14:17:09 -08:00 |
|
Thomas Parnell
|
8ca2b21c98
|
[CI] Disable spawn when running V1 Test (#14345)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2025-03-06 21:52:46 +00:00 |
|
Michael Goin
|
d9292786e1
|
[CI/Build] Use uv python for docker rather than ppa:deadsnakes/ppa (#13569)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-03-06 16:08:36 -05:00 |
|
Tyler Michael Smith
|
cc2f9b32c8
|
[Distributed] Add enable_expert_parallel arg (#14305)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2025-03-06 18:54:45 +00:00 |
|
Himanshu Jaju
|
cd579352bf
|
[V1] Do not detokenize if sampling param detokenize is False (#14224)
Signed-off-by: Himanshu Jaju <hj@mistral.ai>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-03-06 10:40:24 -08:00 |
|
Ying Zhong
|
9f1710f1ac
|
Fix mla prefill context performance (#13897)
Signed-off-by: ZhongYingMatrix <zhongyingmatrix@gmail.com>
|
2025-03-06 09:35:49 -08:00 |
|
Thomas Parnell
|
e642ec962c
|
Add authors to license header. (#14371)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
Co-authored-by: Burkhard Ringlein <ngl@zurich.ibm.com>
Co-authored-by: Jan van Lunteren <jvl@zurich.ibm.com>
|
2025-03-06 08:43:09 -08:00 |
|
Dilip Gowda Bhagavan
|
ada19210a3
|
Adding cpu inference with VXE ISA for s390x architecture (#12613)
Signed-off-by: Dilip Gowda Bhagavan <dilip.bhagavan@ibm.com>
Signed-off-by: Rishika Kedia <rishika.kedia@in.ibm.com>
Co-authored-by: Rishika Kedia <rishika.kedia@in.ibm.com>
|
2025-03-06 08:40:53 -08:00 |
|
Harry Mellor
|
bf0560bda9
|
Reinstate best_of for V0 (#14356)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-03-06 08:34:22 -08:00 |
|
youkaichao
|
151b08e0fe
|
[RLHF] use worker_extension_cls for compatibility with V0 and V1 (#14185)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-03-07 00:32:46 +08:00 |
|
Jitse Klomp
|
81b2f4a45f
|
[Doc] Fix date typo in README.md (#14366)
Signed-off-by: Jitse Klomp <jitse.klomp@conclusionxforce.nl>
|
2025-03-06 08:29:57 -08:00 |
|
Cyrus Leung
|
82551ad616
|
[Core] Don't use cache during multi-modal profiling (#14336)
|
2025-03-06 08:03:31 -08:00 |
|
courage17340
|
caac5c2e59
|
[Bugfix][Core] fix abort_seq_group and memory leak when n>1 (#14326)
Signed-off-by: courage17340 <courage17340@163.com>
|
2025-03-06 23:59:32 +08:00 |
|
Thomas Parnell
|
6bd1dd9d26
|
[Kernel] [V1] Improved performance for V1 Triton (ROCm) backend (#14152)
|
2025-03-06 07:39:16 -08:00 |
|
Irina Yuryeva
|
4f27044aab
|
[Doc] Correct beam_search using in generative_models.md (#14363)
|
2025-03-06 15:37:10 +00:00 |
|
Yanyi Liu
|
0ddc991f5c
|
[Doc] Update reasoning with stream example to use OpenAI library (#14077)
Signed-off-by: liuyanyi <wolfsonliu@163.com>
|
2025-03-06 13:20:37 +00:00 |
|
Nicolò Lucchesi
|
fa82b93853
|
[Frontend][Docs] Transcription API streaming (#13301)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-03-06 10:39:35 +00:00 |
|
Nicolò Lucchesi
|
69ff99fdcd
|
[Core] Optimizing cross-attention QKVParallelLinear computation (#12325)
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: NickLucche <nick@nlucches-4xa100.c.openshift-330514.internal>
Co-authored-by: NickLucche <nick@nlucches-4xa100.c.openshift-330514.internal>
|
2025-03-06 09:37:26 +00:00 |
|
lkchen
|
5d802522a7
|
[V1][VLM][Pixtral-HF] Support Pixtral-HF on V1 (#14275)
Signed-off-by: Linkun Chen <github@lkchen.net>
|
2025-03-06 08:58:41 +00:00 |
|
kYLe
|
1769928079
|
[Model] Update Paligemma multimodal processing with PromptUpdate (#14015)
Signed-off-by: Kyle Huang <kylhuang@nvidia.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-03-06 08:31:38 +00:00 |
|
Pavani Majety
|
ed6ea06577
|
[Hardware] Update the flash attn tag to support Blackwell (#14244)
|
2025-03-05 22:01:37 -08:00 |
|
Nicolò Lucchesi
|
5ee10e990d
|
[Bugfix][CI] ALiBi test case in xformers multi_query_kv_attention (#11301)
|
2025-03-05 20:00:53 -08:00 |
|
Varun Sundar Rabindranath
|
3dbd2d813a
|
[V1] LoRA - Enable more V1 tests (#14315)
Signed-off-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
|
2025-03-06 11:55:42 +08:00 |
|
Ce Gao
|
f5f7f00cd9
|
[Bugfix][Structured Output] Support outlines engine with reasoning outputs for DeepSeek R1 (#14114)
|
2025-03-06 03:49:20 +00:00 |
|
Rui Qiao
|
abcc61e0af
|
[misc] Mention ray list nodes command to troubleshoot ray issues (#14318)
Signed-off-by: Rui Qiao <ruisearch42@gmail.com>
|
2025-03-06 02:00:36 +00:00 |
|
Lucas Wilkinson
|
f6bb18fd9a
|
[BugFix] MLA + V1, illegal memory access and accuracy issues (#14253)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-03-05 17:10:13 -08:00 |
|