maang-h
|
57e729e874
|
[Doc]: Update OpenAI-Compatible Server documents (#12082)
|
2025-01-15 16:07:45 +00:00 |
|
maang-h
|
87054a57ab
|
[Doc]: Update the Json Example of the Engine Arguments document (#12045)
|
2025-01-14 17:03:04 +00:00 |
|
Jie Fu (傅杰)
|
a4e2b26856
|
[Bugfix] Significant performance drop on CPUs with --num-scheduler-steps > 1 (#11794)
|
2025-01-07 16:15:50 -08:00 |
|
Cyrus Leung
|
ee77fdb5de
|
[Doc][2/N] Reorganize Models and Usage sections (#11755)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-01-06 21:40:31 +08:00 |
|
youkaichao
|
b12e87f942
|
[platforms] enable platform plugins (#11602)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-12-30 20:24:45 +08:00 |
|
Rafael Vasquez
|
32aa2059ad
|
[Docs] Convert rST to MyST (Markdown) (#11145)
Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>
|
2024-12-23 22:35:38 +00:00 |
|
yansh97
|
94d545a1a1
|
[Doc] Fix typo in the help message of '--guided-decoding-backend' (#11440)
|
2024-12-23 20:20:44 +00:00 |
|
omer-dayan
|
995f56236b
|
[Core] Loading model from S3 using RunAI Model Streamer as optional loader (#10192)
Signed-off-by: OmerD <omer@run.ai>
|
2024-12-20 16:46:24 +00:00 |
|
Yanyi Liu
|
5aef49806d
|
[Feature] Add load generation config from model (#11164)
Signed-off-by: liuyanyi <wolfsonliu@163.com>
Signed-off-by: Yanyi Liu <wolfsonliu@163.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2024-12-19 10:50:38 +00:00 |
|
Alexander Matveev
|
fdea8ec167
|
[V1] VLM - enable processor cache by default (#11305)
Signed-off-by: Alexander Matveev <alexm@neuralmagic.com>
|
2024-12-18 18:54:46 -05:00 |
|
Konrad Zawora
|
866fa4550d
|
[Bugfix] Restore support for larger block sizes (#11259)
Signed-off-by: Konrad Zawora <kzawora@habana.ai>
|
2024-12-17 16:39:07 -08:00 |
|
Cody Yu
|
bf8717ebae
|
[V1] Prefix caching for vision language models (#11187)
Signed-off-by: Cody Yu <hao.yu.cody@gmail.com>
|
2024-12-17 16:37:59 -08:00 |
|
wangxiyuan
|
e88db68cf5
|
[Platform] platform agnostic for EngineArgs initialization (#11225)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
|
2024-12-16 22:11:06 -08:00 |
|
youkaichao
|
551603feff
|
[core] overhaul memory profiling and fix backward compatibility (#10511)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-12-16 13:32:25 -08:00 |
|
chenqianfzh
|
69ba344de8
|
[Bugfix] Fix block size validation (#10938)
|
2024-12-15 16:38:40 -08:00 |
|
Brad Hilton
|
9c3dadd1c9
|
[Frontend] Add logits_processors as an extra completion argument (#11150)
Signed-off-by: Brad Hilton <brad.hilton.nw@gmail.com>
|
2024-12-14 16:46:42 +00:00 |
|
Gregory Shtrasberg
|
00c1bde5d8
|
[ROCm][AMD] Disable auto enabling chunked prefill on ROCm (#11146)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
|
2024-12-13 05:31:26 +00:00 |
|
Alexander Matveev
|
4e11683368
|
[V1] VLM preprocessor hashing (#11020)
Signed-off-by: Roger Wang <ywang@roblox.com>
Signed-off-by: Alexander Matveev <alexm@neuralmagic.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2024-12-12 00:55:30 +00:00 |
|
Cyrus Leung
|
cad5c0a6ed
|
[Doc] Update docs to refer to pooling models (#11093)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-11 13:36:27 +00:00 |
|
Cyrus Leung
|
8f10d5e393
|
[Misc] Split up pooling tasks (#10820)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-11 01:28:00 -08:00 |
|
Woosuk Kwon
|
134810b3d9
|
[V1][Bugfix] Always set enable_chunked_prefill = True for V1 (#11061)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2024-12-10 14:41:23 -08:00 |
|
Roger Wang
|
a11f326528
|
[V1] Initial support of multimodal models for V1 re-arch (#10699)
Signed-off-by: Roger Wang <ywang@roblox.com>
|
2024-12-08 12:50:51 +00:00 |
|
youkaichao
|
fd57d2b534
|
[torch.compile] allow candidate compile sizes (#10984)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-12-08 11:05:21 +00:00 |
|
Russell Bryant
|
69d357ba12
|
[Core] Cleanup startup logging a bit (#10961)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2024-12-07 02:30:23 +00:00 |
|
Cyrus Leung
|
aa39a8e175
|
[Doc] Create a new "Usage" section (#10827)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-05 11:19:35 +08:00 |
|
Aaron Pham
|
9323a3153b
|
[Core][Performance] Add XGrammar support for guided decoding and set it as default (#10785)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Signed-off-by: mgoin <michael@neuralmagic.com>
Co-authored-by: mgoin <michael@neuralmagic.com>
|
2024-12-03 15:17:00 +08:00 |
|
Kuntai Du
|
0590ec3fd9
|
[Core] Implement disagg prefill by StatelessProcessGroup (#10502)
This PR provides initial support for single-node disaggregated prefill in the 1P1D scenario.
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
Co-authored-by: ApostaC <yihua98@uchicago.edu>
Co-authored-by: YaoJiayi <120040070@link.cuhk.edu.cn>
|
2024-12-01 19:01:00 -06:00 |
|
Ricky Xu
|
d9b4b3f069
|
[Bug][CLI] Allow users to disable prefix caching explicitly (#10724)
Signed-off-by: rickyx <rickyx@anyscale.com>
|
2024-11-27 23:59:28 -08:00 |
|
Michael Goin
|
9a99273b48
|
[Bugfix] Fix using -O[0,3] with LLM entrypoint (#10677)
Signed-off-by: mgoin <michael@neuralmagic.com>
|
2024-11-26 10:44:01 -08:00 |
|
Ricky Xu
|
519e8e4182
|
[v1] EngineArgs for better config handling for v1 (#10382)
Signed-off-by: rickyx <rickyx@anyscale.com>
|
2024-11-25 21:09:43 -08:00 |
|
Wallas Henrique
|
c27df94e1f
|
[Bugfix] Fix chunked prefill with model dtype float32 on Turing Devices (#9850)
Signed-off-by: Wallas Santos <wallashss@ibm.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
|
2024-11-25 12:23:32 -05:00 |
|
youkaichao
|
25d806e953
|
[misc] add torch.compile compatibility check (#10618)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-24 23:40:08 -08:00 |
|
youkaichao
|
a111d0151f
|
[platforms] absorb worker cls difference into platforms folder (#10555)
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2024-11-21 21:00:32 -08:00 |
|
youkaichao
|
7560ae5caf
|
[8/N] enable cli flag without a space (#10529)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-21 12:30:42 -08:00 |
|
Cyrus Leung
|
b4be5a8adb
|
[Bugfix] Enforce no chunked prefill for embedding models (#10470)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-11-20 05:12:51 +00:00 |
|
youkaichao
|
803f37eaaa
|
[6/N] torch.compile rollout to users (#10437)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-19 10:09:03 -08:00 |
|
Russell Bryant
|
5390d6664f
|
[Doc] Add the start of an arch overview page (#10368)
|
2024-11-19 09:52:11 +00:00 |
|
Cyrus Leung
|
32e46e000f
|
[Frontend] Automatic detection of chat content format from AST (#9919)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-11-16 13:35:40 +08:00 |
|
shangmingc
|
f2056f726d
|
[Misc] Fix some help info of arg_utils to improve readability (#10362)
|
2024-11-15 12:40:30 +00:00 |
|
Xin Yang
|
26908554b2
|
[Doc] Remove float32 choice from --lora-dtype (#10348)
Signed-off-by: Xin Yang <xyang19@gmail.com>
|
2024-11-15 10:22:57 +00:00 |
|
Cyrus Leung
|
2ac6d0e75b
|
[Misc] Consolidate pooler config overrides (#10351)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-11-15 06:59:00 +00:00 |
|
Cyrus Leung
|
972112d82f
|
[Bugfix] Fix unable to load some models (#10312)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-11-14 16:55:54 -08:00 |
|
Xin Yang
|
032fcf16ae
|
[Doc] Fix typo in arg_utils.py (#10264)
Signed-off-by: Xin Yang <xyang19@gmail.com>
|
2024-11-12 21:54:52 -08:00 |
|
Umesh
|
8a06428c70
|
[LoRA] Adds support for bias in LoRA (#5733)
Signed-off-by: Umesh Deshpande <udeshpa@us.ibm.com>
Co-authored-by: Umesh Deshpande <udeshpa@us.ibm.com>
|
2024-11-12 11:08:40 -08:00 |
|
Russell Bryant
|
9cdba9669c
|
[Doc] Update help text for --distributed-executor-backend (#10231)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2024-11-12 09:55:09 +08:00 |
|
youkaichao
|
73b9083e99
|
[misc] improve cloudpickle registration and tests (#10202)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-11 00:10:53 +00:00 |
|
Krishna Mandal
|
b09895a618
|
[Frontend][Core] Override HF config.json via CLI (#5836)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-11-09 16:19:27 +00:00 |
|
Flávia Béo
|
aa9078fa03
|
Adds method to read the pooling types from model's files (#9506)
Signed-off-by: Flavia Beo <flavia.beo@ibm.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Co-authored-by: Max de Bayser <mbayser@br.ibm.com>
|
2024-11-07 08:42:40 +00:00 |
|
Konrad Zawora
|
a02a50e6e5
|
[Hardware][Intel-Gaudi] Add Intel Gaudi (HPU) inference backend (#6143)
Signed-off-by: yuwenzho <yuwen.zhou@intel.com>
Signed-off-by: Chendi.Xue <chendi.xue@intel.com>
Signed-off-by: Bob Zhu <bob.zhu@intel.com>
Signed-off-by: zehao-intel <zehao.huang@intel.com>
Signed-off-by: Konrad Zawora <kzawora@habana.ai>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
Co-authored-by: Sanju C Sudhakaran <scsudhakaran@habana.ai>
Co-authored-by: Michal Adamczyk <madamczyk@habana.ai>
Co-authored-by: Marceli Fylcek <mfylcek@habana.ai>
Co-authored-by: Himangshu Lahkar <49579433+hlahkar@users.noreply.github.com>
Co-authored-by: Vivek Goel <vgoel@habana.ai>
Co-authored-by: yuwenzho <yuwen.zhou@intel.com>
Co-authored-by: Dominika Olszewska <dolszewska@habana.ai>
Co-authored-by: barak goldberg <149692267+bgoldberg-habana@users.noreply.github.com>
Co-authored-by: Michal Szutenberg <37601244+szutenberg@users.noreply.github.com>
Co-authored-by: Jan Kaniecki <jkaniecki@habana.ai>
Co-authored-by: Agata Dobrzyniewicz <160237065+adobrzyniewicz-habana@users.noreply.github.com>
Co-authored-by: Krzysztof Wisniewski <kwisniewski@habana.ai>
Co-authored-by: Dudi Lester <160421192+dudilester@users.noreply.github.com>
Co-authored-by: Ilia Taraban <tarabanil@gmail.com>
Co-authored-by: Chendi.Xue <chendi.xue@intel.com>
Co-authored-by: Michał Kuligowski <mkuligowski@habana.ai>
Co-authored-by: Jakub Maksymczuk <jmaksymczuk@habana.ai>
Co-authored-by: Tomasz Zielinski <85164140+tzielinski-habana@users.noreply.github.com>
Co-authored-by: Sun Choi <schoi@habana.ai>
Co-authored-by: Iryna Boiko <iboiko@habana.ai>
Co-authored-by: Bob Zhu <41610754+czhu15@users.noreply.github.com>
Co-authored-by: hlin99 <73271530+hlin99@users.noreply.github.com>
Co-authored-by: Zehao Huang <zehao.huang@intel.com>
Co-authored-by: Andrzej Kotłowski <Andrzej.Kotlowski@intel.com>
Co-authored-by: Yan Tomsinsky <73292515+Yantom1@users.noreply.github.com>
Co-authored-by: Nir David <ndavid@habana.ai>
Co-authored-by: Yu-Zhou <yu.zhou@intel.com>
Co-authored-by: Ruheena Suhani Shaik <rsshaik@habana.ai>
Co-authored-by: Karol Damaszke <kdamaszke@habana.ai>
Co-authored-by: Marcin Swiniarski <mswiniarski@habana.ai>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Jacek Czaja <jacek.czaja@intel.com>
Co-authored-by: Jacek Czaja <jczaja@habana.ai>
Co-authored-by: Yuan <yuan.zhou@outlook.com>
|
2024-11-06 01:09:10 -08:00 |
|
Chauncey
|
ac6b8f19b9
|
[Frontend] Multi-Modality Support for Loading Local Image Files (#9915)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2024-11-04 15:34:57 +00:00 |
|