Nicolò Lucchesi
|
fa3bba2a53
|
[TPU][V1] Enable Top-P (#16843)
Signed-off-by: NickLucche <nlucches@redhat.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
|
2025-04-22 00:46:07 +00:00 |
|
Michael Goin
|
986537f1c3
|
[V1] V1 FlashInfer Attention (#16684)
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Aurick Qiao <qiao@aurick.net>
|
2025-04-22 00:38:41 +00:00 |
|
Nicolò Lucchesi
|
210207525e
|
[TPU][V1] Capture multimodal encoder during model compilation (#15051)
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: NickLucche <nlucches@redhat.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Siyuan Liu <lsiyuan@google.com>
|
2025-04-21 18:36:59 -06:00 |
|
Michael Goin
|
71eda0bb76
|
Update Qwen1.5-MoE-W4A16-compressed-tensors.yaml (#16946)
|
2025-04-21 18:35:32 -06:00 |
|
Chengji Yao
|
471fe65630
|
[TPU][V1] Implicitly adjust page size when there's SMEM OOM (#16871)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
|
2025-04-21 15:43:13 -06:00 |
|
Woosuk Kwon
|
3a0fba5cf4
|
[V1][Spec Decode] Handle draft tokens beyond max_model_len (#16087)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-04-21 12:38:50 -07:00 |
|
Chanh Nguyen
|
299ebb62b2
|
[Core] Speed up decode by remove synchronizing operation in sampler (#16436)
Signed-off-by: Chanh Nguyen <cnguyen@linkedin.com>
Co-authored-by: Chanh Nguyen <cnguyen@linkedin.com>
|
2025-04-21 18:18:22 +00:00 |
|
David Xia
|
f728ab8e35
|
[Doc] mention how to install in CPU editable mode (#16923)
Signed-off-by: David Xia <david@davidxia.com>
|
2025-04-21 17:45:51 +00:00 |
|
David Xia
|
63e26fff78
|
[doc] install required python3-dev apt package (#16888)
Signed-off-by: David Xia <david@davidxia.com>
|
2025-04-21 16:15:18 +00:00 |
|
Yan Ma
|
fe3462c774
|
[XPU][Bugfix] minor fix for XPU (#15591)
Signed-off-by: yan ma <yan.ma@intel.com>
|
2025-04-22 00:02:57 +08:00 |
|
Kartik Ramesh
|
3b34fd5273
|
Raise error for data-parallel with benchmark_throughput (#16737)
Signed-off-by: Kartik Ramesh <kartikx2000@gmail.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
|
2025-04-21 23:51:43 +08:00 |
|
Isotr0py
|
55d6d3fdb8
|
[Bugfix] Fix GLM rotary_dim issue and support v1 (#16912)
Signed-off-by: isotr0py <2037008807@qq.com>
|
2025-04-21 14:26:34 +00:00 |
|
Shanshan Shen
|
7272bfae77
|
[Misc] Refactor platform to get device specific stream and event (#14411)
Signed-off-by: shen-shanshan <467638484@qq.com>
|
2025-04-21 21:25:49 +08:00 |
|
wangxiyuan
|
d9ac9e3dc5
|
[Misc] fix collect_env version parse (#15267)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
|
2025-04-21 20:29:40 +08:00 |
|
Han Zhang
|
d41faaf9df
|
Restore buffers when wake up from level 2 sleep (#16564) (#16889)
Signed-off-by: Han <zh950713@gmail.com>
|
2025-04-21 20:18:28 +08:00 |
|
Alex Brooks
|
b34f33438a
|
[Doc] Split dummy_processor_inputs() in Multimodal Docs (#16915)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
|
2025-04-21 11:10:01 +00:00 |
|
Yang Fan
|
26c0406555
|
[Bugfix] Fix distributed bug in Qwen2.5-VL & Qwen2.5-Omni (#16907)
|
2025-04-21 10:25:21 +00:00 |
|
Woosuk Kwon
|
4c41278b77
|
[CI/CD][V1] Add spec decode tests to CI (#16900)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-04-20 22:37:16 -07:00 |
|
qizixi
|
bb3605db85
|
[Bugfix] Fix v1/spec_decode/test_ngram.py (#16895)
Signed-off-by: qizixi <qizixi@meta.com>
|
2025-04-20 20:54:29 -07:00 |
|
Richard Zou
|
fe742aef5a
|
[easy] Pass compile_fx only the config patches (#16845)
Signed-off-by: rzou <zou3519@gmail.com>
|
2025-04-20 12:25:19 +08:00 |
|
Harry Mellor
|
4b07d36891
|
Improve configs - CacheConfig (#16835)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-04-20 12:25:04 +08:00 |
|
Staszek Paśko
|
87aaadef73
|
Serialize tensors using int8 views (#16866)
Signed-off-by: Staszek Pasko <staszek@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-04-19 10:28:34 -07:00 |
|
Richard Zou
|
682e0b6d2f
|
Log how much time loading a compiled artifact takes (#16848)
Signed-off-by: rzou <zou3519@gmail.com>
|
2025-04-19 16:50:46 +00:00 |
|
Reid
|
d6195a748b
|
[doc] update hyperlink (#16877)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-04-19 16:40:38 +00:00 |
|
Cyrus Leung
|
205d84aaa9
|
[VLM] Clean up models (#16873)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-04-19 12:13:06 +00:00 |
|
Roger Wang
|
5124f5bf51
|
[Model] Qwen2.5-Omni Cleanup (#16872)
|
2025-04-19 09:37:02 +00:00 |
|
Isotr0py
|
83f3c3bd91
|
[Model] Refactor Phi-4-multimodal to use merged processor and support V1 (#15477)
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-04-19 02:26:11 -07:00 |
|
vie-serendipity
|
d9737ca1c6
|
[V1][Misc] stop update prefix cache stats when logs_stats is disabled (#16460)
Signed-off-by: vie-serendipity <2733147505@qq.com>
|
2025-04-19 02:25:19 -07:00 |
|
Nicolò Lucchesi
|
9d4ca19d50
|
[Misc] Benchmarks for audio models (#16505)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-04-19 02:24:14 -07:00 |
|
Nicolò Lucchesi
|
2ef0dc53b8
|
[Frontend] Add sampling params to v1/audio/transcriptions endpoint (#16591)
Signed-off-by: Jannis Schönleber <joennlae@gmail.com>
Signed-off-by: NickLucche <nlucches@redhat.com>
Co-authored-by: Jannis Schönleber <joennlae@gmail.com>
|
2025-04-19 07:03:54 +00:00 |
|
Divakar Verma
|
1d4680fad2
|
[rocm][MI300] llama4 maverick fp8 moe config tp8 (#16847)
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
|
2025-04-19 06:21:43 +00:00 |
|
Yang Fan
|
2c1bd848a6
|
[Model][VLM] Add Qwen2.5-Omni model support (thinker only) (#15130)
Signed-off-by: fyabc <suyang.fy@alibaba-inc.com>
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Xiong Wang <wangxiongts@163.com>
|
2025-04-18 23:14:36 -07:00 |
|
omrishiv
|
5c9121203c
|
[release] Publish neuron docker image (#16733)
Signed-off-by: omrishiv <327609+omrishiv@users.noreply.github.com>
|
2025-04-18 17:11:25 -07:00 |
|
Justin Ho
|
490b1698a5
|
[Doc] Updated Llama section in tool calling docs to have llama 3.2 config info (#16857)
Signed-off-by: jmho <jaylenho734@gmail.com>
|
2025-04-18 23:28:53 +00:00 |
|
Reid
|
5a5e29de88
|
[Misc] refactor examples series - Chat Completion Client With Tools (#16829)
Signed-off-by: reidliu41 <reid201711@gmail.com>
Co-authored-by: reidliu41 <reid201711@gmail.com>
|
2025-04-18 23:24:42 +00:00 |
|
wang.yuqi
|
3d3ab3689f
|
[New Model]: Snowflake Arctic Embed (Family) (#16649)
|
2025-04-18 08:11:57 -07:00 |
|
Harry Mellor
|
686623c5e7
|
Fix nullable_kvs fallback (#16837)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-04-18 05:58:39 -07:00 |
|
Cyrus Leung
|
aadb656562
|
[Misc] Clean up Kimi-VL (#16833)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-04-18 05:15:09 -07:00 |
|
Jonghyun Choe
|
87e067de41
|
[Model] use AutoWeightsLoader for BigCode, GPT-J (#16823)
Signed-off-by: Jonghyun Choe <andy.choe729@gmail.com>
|
2025-04-18 10:42:41 +00:00 |
|
Michael Yao
|
26507f8973
|
[Docs] Fix a link and grammar issue in production-stack.md (#16809)
Signed-off-by: windsonsea <haifeng.yao@daocloud.io>
|
2025-04-18 06:42:58 +00:00 |
|
Nathan Weinberg
|
9c1d5b456d
|
[Doc] add podman setup instructions for official image (#16796)
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
|
2025-04-18 06:10:49 +00:00 |
|
Lucia Fang
|
e31045f95c
|
[Bugfix] fix pp for llama4 (#16746)
Signed-off-by: Lu Fang <fanglu@fb.com>
|
2025-04-18 13:51:30 +08:00 |
|
Luka Govedič
|
aaec845f8e
|
[ROCm] [Attention] Cleanup ROCm output passing (#16431)
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
|
2025-04-18 05:46:45 +00:00 |
|
rongfu.leng
|
7bdfd29a35
|
[Misc] add collect_env to cli and docker image (#16759)
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>
|
2025-04-17 22:13:35 -07:00 |
|
Harry Mellor
|
e78587a64c
|
Improve-mm-and-pooler-and-decoding-configs (#16789)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-04-17 22:13:32 -07:00 |
|
Lucas Wilkinson
|
7eb4255628
|
[BugFix] Accuracy fix for llama4 int4 - improperly casted scales (#16801)
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
|
2025-04-17 22:13:29 -07:00 |
|
Michael Goin
|
6a0f547561
|
Add hardware print to TPU V1 test (#16792)
|
2025-04-17 22:13:26 -07:00 |
|
Shanshan Shen
|
30ed81b7ca
|
[V1][Structured Output] Minor modification to _validate_structured_output() (#16748)
Signed-off-by: shen-shanshan <467638484@qq.com>
|
2025-04-18 13:12:54 +08:00 |
|
Chauncey
|
7a4a5de729
|
[Misc] Update outdated note: LMCache now supports chunked prefill (#16697)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-04-18 05:12:42 +00:00 |
|
Cyrus Leung
|
c16fb5dae8
|
[Doc] Improve help examples for --compilation-config (#16729)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-04-17 21:22:34 -07:00 |
|