Roger Wang
|
dea985aef0
|
[V1][Bugfix] Fix handing of second_per_grid_ts for Qwen2-VL & Qwen2.5-VL (#14548)
Signed-off-by: Roger Wang <ywang@roblox.com>
|
2025-03-10 16:03:11 +00:00 |
|
Harry Mellor
|
39be30351f
|
Correct capitalisation: Github -> GitHub (#14561)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-03-10 15:53:33 +00:00 |
|
Cyrus Leung
|
001a9c7b0d
|
[Doc] Update PaliGemma note to a warning (#14565)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-10 15:02:28 +00:00 |
|
Szymon Ożóg
|
89cdaa83e7
|
[Kernel] Add more dtype support for GGUF kernels (#14043)
Signed-off-by: SzymonOzog <szymon.ozog@aleph-alpha.com>
Signed-off-by: SzymonOzog <szymon.ozog@gmail.com>
|
2025-03-10 07:30:04 -07:00 |
|
Chauncey
|
b0746fae3d
|
[Frontend] support image embeds (#13955)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-03-10 12:36:03 +00:00 |
|
Harry Mellor
|
60a98b2de5
|
[Docs] Mention model_impl arg when explaining Transformers fallback (#14552)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-03-10 12:13:10 +00:00 |
|
Chauncey
|
460f553a6d
|
[Misc] Add log information for handle_process_request. (#14130)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-03-10 08:40:50 +00:00 |
|
Jennifer Zhao
|
1253b15774
|
[Feature] Consolidate performance benchmark datasets (#14036)
Signed-off-by: Jennifer Zhao <7443418+JenZhao@users.noreply.github.com>
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Jennifer Zhao <7443418+JenZhao@users.noreply.github.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2025-03-10 07:23:11 +00:00 |
|
Martin Hoyer
|
dc74613fa2
|
[Bugfix] Wrong requirements path - rocm (#14527)
Signed-off-by: Martin Hoyer <mhoyer@redhat.com>
|
2025-03-10 02:49:46 +00:00 |
|
Yanyi Liu
|
a21076ed3a
|
[Misc] Ensure out-of-tree quantization method recognize by cli args (#14328)
Signed-off-by: liuyanyi <wolfsonliu@163.com>
|
2025-03-09 12:13:31 +00:00 |
|
Chengji Yao
|
212007b168
|
[Hardware][TPU] Fix the recompiling issue in logits processor after warmup (#14510)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
|
2025-03-09 05:44:39 -04:00 |
|
Isotr0py
|
fb16eea48b
|
[Bugfix] Revert QKVCrossParallelLinear usage in Mllama to keep BNB quantization work (#14498)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-03-09 04:47:45 +00:00 |
|
Yuchen Yan
|
73ae0b44e9
|
[Bugfix] Fix tqdm progress bar when SamplingParams.n > 1 (#12428)
Signed-off-by: Yuchen Yan <740987012@qq.com>
|
2025-03-08 20:14:53 -08:00 |
|
Jiayi Yao
|
6d7f037748
|
[Feat] Support chunked prefill for LMCache connector (#14505)
Signed-off-by: YaoJiayi <120040070@link.cuhk.edu.cn>
|
2025-03-08 19:30:06 -08:00 |
|
iefgnoix
|
10f7552789
|
[V1][TPU] Remove unnecessary padding for running on TPU. (#14467)
|
2025-03-08 21:56:04 -05:00 |
|
Lucas Wilkinson
|
b0d541947a
|
[Attention] Default to FlashMLA backend for MLA (#14451)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2025-03-08 18:18:39 -08:00 |
|
Robert Shaw
|
5f0b53c6ea
|
Revert "[V1][Core] Fix memory issue with logits & sampling" (#14504)
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2025-03-08 17:43:37 -08:00 |
|
22quinn
|
eb8b5eb183
|
[V1] Support bad_words in sampler (#13376)
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-03-08 14:50:26 -08:00 |
|
Cyrus Leung
|
9513290032
|
[Misc] Upgrade to Python 3.9 typing for additional directories (#14492)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-08 17:35:50 +00:00 |
|
Russell Bryant
|
0d5e73d30e
|
Update CODEOWNERS for structured output (#14496)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-03-08 17:19:51 +00:00 |
|
Isotr0py
|
609ef61fea
|
[Bugfix] Fix profiling OOM and decouple encoder multimodal profiling (#14361)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-03-08 16:52:34 +00:00 |
|
Lucas Wilkinson
|
db84f5eb3b
|
[Bugfix] DeepSeek Accuracy (#14476)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-03-08 16:47:03 +00:00 |
|
Harry Mellor
|
206e2577fa
|
Move requirements into their own directory (#12547)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-03-08 16:44:35 +00:00 |
|
Cyrus Leung
|
e02883c400
|
[Misc] Don't run ruff at all on 3rd party libs (#14493)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-08 07:16:40 -08:00 |
|
Russell Bryant
|
9085aabd62
|
[benchmarks] Add option to use unique jsonschema for each request (#14457)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2025-03-08 06:36:39 -08:00 |
|
Roger Wang
|
8d5aa466fb
|
[V1][Core] Fix memory issue with logits & sampling (#13776)
Signed-off-by: Roger Wang <ywang@roblox.com>
|
2025-03-08 06:11:04 -08:00 |
|
Aaron Pham
|
0b7f06b447
|
[Misc] add use_tqdm_on_load to reduce logs (#14407)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
|
2025-03-08 05:57:46 -08:00 |
|
Isotr0py
|
03fe18ae0f
|
[VLM] Add TP support for Phi-4-MM (#14453)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-03-08 05:57:14 -08:00 |
|
Alexander Matveev
|
cb8bdfade2
|
[V1] TPU - Add tensor parallel support via Ray (#13618)
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
|
2025-03-08 08:19:38 -05:00 |
|
Cyrus Leung
|
33f227e16b
|
[CI/Build] Use a fixed seed to avoid flaky tests (#14480)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-03-08 11:30:09 +00:00 |
|
Harry Mellor
|
cfd0ae8234
|
Add RLHF document (#14482)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-03-08 09:51:39 +00:00 |
|
Lucas Wilkinson
|
7caff01a7b
|
[Build/BugFix] Fix hopper 12.8 build (#14354)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2025-03-08 08:11:56 +00:00 |
|
Harry Mellor
|
be0b399d74
|
Add training doc signposting to TRL (#14439)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-03-08 07:35:07 +00:00 |
|
Jee Jee Li
|
b8b0ccbd2d
|
[Bugfix] Make the deviceprofiler include LoRA memory. (#14469)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-03-08 07:12:22 +00:00 |
|
Robin
|
c908a07f57
|
[Doc] Added QwQ-32B to the supported models list in the reasoning out… (#14479)
Signed-off-by: WangErXiao <863579016@qq.com>
|
2025-03-08 07:07:32 +00:00 |
|
Robin
|
7b6fd6e486
|
[Doc]add doc for Qwen models tool calling (#14478)
Signed-off-by: WangErXiao <863579016@qq.com>
|
2025-03-08 06:58:46 +00:00 |
|
Harry Mellor
|
47512b3200
|
Default to generation_config from model (#12622)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-03-08 14:46:15 +08:00 |
|
Roger Meier
|
3b9c6c6947
|
[CI/Build] refactor: set timezone of container to UTC (#12888)
Signed-off-by: Roger Meier <r.meier@siemens.com>
|
2025-03-07 22:42:01 -08:00 |
|
Aviv Keshet
|
4aae667668
|
[core] add extra_args to SamplingParams (#13300)
Signed-off-by: Aviv Keshet <akeshet@scaledcognition.com>
|
2025-03-08 14:41:18 +08:00 |
|
Cody Yu
|
9f3bc0f58c
|
[MISC][V1] Register process killing handler only in the main thread (#14380)
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
|
2025-03-07 22:40:06 -08:00 |
|
Mathis Felardos
|
980385f8c1
|
[Bugfix][Disaggregated] Add a check in send_kv_caches_and_hidden_states and fix the reshape of the KVCache (#14369)
Signed-off-by: Mathis Felardos <mathis@mistral.ai>
|
2025-03-07 22:39:31 -08:00 |
|
Tyler Michael Smith
|
ca7a2d5f28
|
Revert "[Perf] Reduce MLA CPU overheads in V1 (#14384)" (#14471)
|
2025-03-07 22:18:53 -08:00 |
|
Tyler Michael Smith
|
333681408f
|
[Bugfix][V1] Handle MLA in kv_cache_interface (#14462)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2025-03-07 22:18:25 -08:00 |
|
afeldman-nm
|
ef64044079
|
[V1] Prompt logprobs + APC compatibility; prompt logprobs reqs cannot fill APC (#13949)
|
2025-03-08 01:48:12 +00:00 |
|
yarongmu-google
|
66e16a038e
|
[Bugfix] Fix torch_xla which can't handle None seed introduced in #14274 (#14459)
Signed-off-by: Yarong Mu <ymu@google.com>
|
2025-03-07 23:17:04 +00:00 |
|
Mark McLoughlin
|
e1f0835ae0
|
[V1][Metrics] Fix traceback with preemptions+LoRA (#14220)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
|
2025-03-07 15:36:16 -05:00 |
|
Nick Hill
|
8ed5421aaa
|
[V1] Eagerly remove finished requests from the batch (#14388)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-03-07 10:56:00 -08:00 |
|
youkaichao
|
c6359e8ca6
|
[v1] torch.compile integration explanation (#14437)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2025-03-08 01:55:50 +08:00 |
|
Jee Jee Li
|
952a074980
|
[Misc] Add Phi4-MM example (#14343)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-03-07 17:28:52 +00:00 |
|
Jinzhen Lin
|
d0feea31c7
|
[Kernel] optimize performance of gptq marlin kernel when n is small (#14138)
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
|
2025-03-07 11:53:38 -05:00 |
|