Isotr0py
|
609ef61fea
|
[Bugfix] Fix profiling OOM and decouple encoder multimodal profiling (#14361)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-03-08 16:52:34 +00:00 |
|
Lucas Wilkinson
|
db84f5eb3b
|
[Bugfix] DeepSeek Accuracy (#14476)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-03-08 16:47:03 +00:00 |
|
Harry Mellor
|
206e2577fa
|
Move requirements into their own directory (#12547)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-03-08 16:44:35 +00:00 |
|
Cyrus Leung
|
e02883c400
|
[Misc] Don't run ruff at all on 3rd party libs (#14493)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-08 07:16:40 -08:00 |
|
Russell Bryant
|
9085aabd62
|
[benchmarks] Add option to use unique jsonschema for each request (#14457)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-03-08 06:36:39 -08:00 |
|
Roger Wang
|
8d5aa466fb
|
[V1][Core] Fix memory issue with logits & sampling (#13776)
Signed-off-by: Roger Wang <ywang@roblox.com>
|
2025-03-08 06:11:04 -08:00 |
|
Aaron Pham
|
0b7f06b447
|
[Misc] add use_tqdm_on_load to reduce logs (#14407)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
|
2025-03-08 05:57:46 -08:00 |
|
Isotr0py
|
03fe18ae0f
|
[VLM] Add TP support for Phi-4-MM (#14453)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-03-08 05:57:14 -08:00 |
|
Alexander Matveev
|
cb8bdfade2
|
[V1] TPU - Add tensor parallel support via Ray (#13618)
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
|
2025-03-08 08:19:38 -05:00 |
|
Cyrus Leung
|
33f227e16b
|
[CI/Build] Use a fixed seed to avoid flaky tests (#14480)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-08 11:30:09 +00:00 |
|
Harry Mellor
|
cfd0ae8234
|
Add RLHF document (#14482)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-03-08 09:51:39 +00:00 |
|
Lucas Wilkinson
|
7caff01a7b
|
[Build/BugFix] Fix hopper 12.8 build (#14354)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2025-03-08 08:11:56 +00:00 |
|
Harry Mellor
|
be0b399d74
|
Add training doc signposting to TRL (#14439)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-03-08 07:35:07 +00:00 |
|
Jee Jee Li
|
b8b0ccbd2d
|
[Bugfix] Make the deviceprofiler include LoRA memory. (#14469)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-03-08 07:12:22 +00:00 |
|
Robin
|
c908a07f57
|
[Doc] Added QwQ-32B to the supported models list in the reasoning out… (#14479)
Signed-off-by: WangErXiao <863579016@qq.com>
|
2025-03-08 07:07:32 +00:00 |
|
Robin
|
7b6fd6e486
|
[Doc] Add doc for Qwen models tool calling (#14478)
Signed-off-by: WangErXiao <863579016@qq.com>
|
2025-03-08 06:58:46 +00:00 |
|
Harry Mellor
|
47512b3200
|
Default to generation_config from model (#12622)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-03-08 14:46:15 +08:00 |
|
Roger Meier
|
3b9c6c6947
|
[CI/Build] refactor: set timezone of container to UTC (#12888)
Signed-off-by: Roger Meier <r.meier@siemens.com>
|
2025-03-07 22:42:01 -08:00 |
|
Aviv Keshet
|
4aae667668
|
[core] add extra_args to SamplingParams (#13300)
Signed-off-by: Aviv Keshet <akeshet@scaledcognition.com>
|
2025-03-08 14:41:18 +08:00 |
|
Cody Yu
|
9f3bc0f58c
|
[MISC][V1] Register process killing handler only in the main thread (#14380)
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
|
2025-03-07 22:40:06 -08:00 |
|
Mathis Felardos
|
980385f8c1
|
[Bugfix][Disaggregated] Add a check in send_kv_caches_and_hidden_states and fix the reshape of the KVCache (#14369)
Signed-off-by: Mathis Felardos <mathis@mistral.ai>
|
2025-03-07 22:39:31 -08:00 |
|
Tyler Michael Smith
|
ca7a2d5f28
|
Revert "[Perf] Reduce MLA CPU overheads in V1 (#14384)" (#14471)
|
2025-03-07 22:18:53 -08:00 |
|
Tyler Michael Smith
|
333681408f
|
[Bugfix][V1] Handle MLA in kv_cache_interface (#14462)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2025-03-07 22:18:25 -08:00 |
|
afeldman-nm
|
ef64044079
|
[V1] Prompt logprobs + APC compatibility; prompt logprobs reqs cannot fill APC (#13949)
|
2025-03-08 01:48:12 +00:00 |
|
yarongmu-google
|
66e16a038e
|
[Bugfix] Fix torch_xla which can't handle None seed introduced in #14274 (#14459)
Signed-off-by: Yarong Mu <ymu@google.com>
|
2025-03-07 23:17:04 +00:00 |
|
Mark McLoughlin
|
e1f0835ae0
|
[V1][Metrics] Fix traceback with preemptions+LoRA (#14220)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
|
2025-03-07 15:36:16 -05:00 |
|
Nick Hill
|
8ed5421aaa
|
[V1] Eagerly remove finished requests from the batch (#14388)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-03-07 10:56:00 -08:00 |
|
youkaichao
|
c6359e8ca6
|
[v1] torch.compile integration explanation (#14437)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-03-08 01:55:50 +08:00 |
|
Jee Jee Li
|
952a074980
|
[Misc] Add Phi4-MM example (#14343)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-03-07 17:28:52 +00:00 |
|
Jinzhen Lin
|
d0feea31c7
|
[Kernel] optimize performance of gptq marlin kernel when n is small (#14138)
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
|
2025-03-07 11:53:38 -05:00 |
|
Jeremy Arnold
|
58abe35455
|
[Benchmarks] Make detokenization optional in benchmark scripts (#11697)
Signed-off-by: Jeremy Arnold <Jeremy.Arnold@amd.com>
|
2025-03-07 08:09:00 -08:00 |
|
York-RDWang
|
f7ebad2307
|
[Doc] Update prefix_caching.md to match the example image (#14420)
|
2025-03-07 15:29:00 +00:00 |
|
Aaron Pham
|
80e9afb5bc
|
[V1][Core] Support for Structured Outputs (#12388)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-03-07 07:19:11 -08:00 |
|
iefgnoix
|
1e3598edeb
|
Use the optimized block sizes after tuning the kernel. (#14329)
|
2025-03-07 13:25:13 +00:00 |
|
Harry Mellor
|
f7a6bd0fa1
|
Fix missing kv_caches and attn_metadata in OpenVINOCausalLM (#14271)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-03-07 12:30:42 +00:00 |
|
Aleksandr Malyshev
|
0ca3b8e01c
|
[BUGFIX] Skip tokenization support for throughput benchmark (#12712)
Signed-off-by: root <root@banff-cyxtera-s73-5.ctr.dcgpu>
Signed-off-by: Aleksandr Malyshev <maleksan@amd.com>
Co-authored-by: root <root@banff-cyxtera-s73-5.ctr.dcgpu>
Co-authored-by: Aleksandr Malyshev <maleksan@amd.com>
|
2025-03-07 02:51:47 -08:00 |
|
மனோஜ்குமார் பழனிச்சாமி
|
cc10281498
|
[Misc] Set default value of seed to None (#14274)
Signed-off-by: மனோஜ்குமார் பழனிச்சாமி <smartmanoj42857@gmail.com>
|
2025-03-07 10:40:01 +00:00 |
|
Cyrus Leung
|
05fb6718f0
|
[Bugfix] Clean up multi-modal processors (#14417)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-07 10:33:38 +00:00 |
|
Jee Jee Li
|
12c29a881f
|
[Bugfix] Further clean up LoRA test (#14422)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-03-07 10:30:55 +00:00 |
|
Peng Li
|
70da0c0748
|
correct wrong markdown syntax (#14414)
Signed-off-by: vincent-pli <justdoit.pli@gmail.com>
|
2025-03-07 08:01:18 +00:00 |
|
Cyrus Leung
|
c1588a2c94
|
[GH] Auto-apply multi-modality label to relevant PRs (#14402)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-07 15:26:32 +08:00 |
|
Ilya Lavrenov
|
8ca7a71df7
|
OpenVINO: added CPU-like conditions (#14338)
Signed-off-by: Ilya Lavrenov <ilya.lavrenov@intel.com>
|
2025-03-06 22:24:49 -08:00 |
|
Isotr0py
|
63137cd922
|
[Build] Add nightly wheel fallback when latest commit wheel unavailable (#14358)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-03-06 22:10:57 -08:00 |
|
Jee Jee Li
|
ddd1ef66ec
|
[Bugfix] Fix JambaForCausalLM LoRA (#14370)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-03-06 22:05:47 -08:00 |
|
Lucas Wilkinson
|
e5e03c2c1b
|
[BugFix] Illegal Memory Access in the blockwise cutlass fp8 GEMMs (#14396)
|
2025-03-06 21:56:06 -08:00 |
|
Luka Govedič
|
e1744502c2
|
[FP8] Refactor apply_fp8_linear and apply_fp8_linear_generic into an object (#14390)
Signed-off-by: luka <luka@neuralmagic.com>
|
2025-03-07 05:20:16 +00:00 |
|
Lucas Wilkinson
|
dae6896977
|
[Perf] Reduce MLA CPU overheads in V1 (#14384)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-03-06 19:59:14 -08:00 |
|
Brayden Zhong
|
c34eeec58d
|
[Bugfix] Correctly call cudaProfilerStop in benchmarks script (#14183)
Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
|
2025-03-07 00:42:49 +00:00 |
|
Daniel Li
|
ad60bbb2b2
|
[Doc] Fix a typo (#14385)
|
2025-03-06 16:31:52 -08:00 |
|
Chengji Yao
|
0578e5a462
|
[Hardware][TPU] Enable ragged paged attention kernel and resolve recompilation issue (#14310)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
|
2025-03-06 23:31:05 +00:00 |
|