Jeremy Arnold
9f3974a319
Fix logging of the vLLM Config ( #11143 )
2024-12-12 12:05:57 -08:00
Cyrus Leung
8f10d5e393
[Misc] Split up pooling tasks ( #10820 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2024-12-11 01:28:00 -08:00
youkaichao
ebf778061d
monitor metrics of tokens per step using cudagraph batchsizes ( #11031 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com >
2024-12-09 22:35:36 -08:00
Cyrus Leung
391d7b2763
[Bugfix] Fix usage of deprecated decorator ( #11025 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2024-12-10 13:45:47 +08:00
youkaichao
46004e83a2
[misc] clean up and unify logging ( #10999 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com >
2024-12-08 17:28:27 -08:00
youkaichao
b031a455a9
[torch.compile] add logging for compilation time ( #10941 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com >
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
2024-12-06 10:07:15 +00:00
Xin Yang
01d079fd8e
[LoRA] Change lora_tokenizers capacity ( #10796 )
...
Signed-off-by: Xin Yang <xyang19@gmail.com >
2024-12-04 17:40:16 +00:00
Aaron Pham
9323a3153b
[Core][Performance] Add XGrammar support for guided decoding and set it as default ( #10785 )
...
Signed-off-by: Aaron Pham <contact@aarnphm.xyz >
Signed-off-by: mgoin <michael@neuralmagic.com >
Co-authored-by: mgoin <michael@neuralmagic.com >
2024-12-03 15:17:00 +08:00
Cyrus Leung
3257d449fa
[Misc] Remove deprecated names ( #10817 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2024-12-03 06:52:57 +00:00
Cyrus Leung
d2f058e76c
[Misc] Rename embedding classes to pooling ( #10801 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2024-12-01 14:36:51 +08:00
Mor Zusman
197b4484a3
[Bugfix][Mamba] Fix Multistep on Mamba-like models ( #10705 )
...
Signed-off-by: mzusman <mor.zusmann@gmail.com >
2024-11-27 19:02:27 +00:00
Ricky Xu
519e8e4182
[v1] EngineArgs for better config handling for v1 ( #10382 )
...
Signed-off-by: rickyx <rickyx@anyscale.com >
2024-11-25 21:09:43 -08:00
Russell Bryant
28598f3939
[Core] remove temporary local variables in LLMEngine.__init__ ( #10577 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2024-11-22 16:22:53 -08:00
youkaichao
803f37eaaa
[6/N] torch.compile rollout to users ( #10437 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com >
2024-11-19 10:09:03 -08:00
Travis Johnson
272e31c0bd
[Bugfix] Guard for negative counter metrics to prevent crash ( #10430 )
...
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com >
2024-11-19 04:57:10 +00:00
Cyrus Leung
32e46e000f
[Frontend] Automatic detection of chat content format from AST ( #9919 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2024-11-16 13:35:40 +08:00
Cyrus Leung
b311efd0bd
[Misc] Fix import error in tensorizer tests and cleanup some code ( #10349 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2024-11-15 09:34:17 +00:00
Cyrus Leung
0b8bb86bf1
[1/N] Initial prototype for multi-modal processor ( #10044 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2024-11-13 12:39:03 +00:00
harrywu
874f551b36
[Metrics] add more metrics ( #4464 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Co-authored-by: Robert Shaw <rshaw@neuralmagic.com >
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk >
2024-11-12 00:17:38 +08:00
Krishna Mandal
b09895a618
[Frontend][Core] Override HF config.json via CLI ( #5836 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk >
2024-11-09 16:19:27 +00:00
Konrad Zawora
a02a50e6e5
[Hardware][Intel-Gaudi] Add Intel Gaudi (HPU) inference backend ( #6143 )
...
Signed-off-by: yuwenzho <yuwen.zhou@intel.com >
Signed-off-by: Chendi.Xue <chendi.xue@intel.com >
Signed-off-by: Bob Zhu <bob.zhu@intel.com >
Signed-off-by: zehao-intel <zehao.huang@intel.com >
Signed-off-by: Konrad Zawora <kzawora@habana.ai >
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com >
Co-authored-by: Sanju C Sudhakaran <scsudhakaran@habana.ai >
Co-authored-by: Michal Adamczyk <madamczyk@habana.ai >
Co-authored-by: Marceli Fylcek <mfylcek@habana.ai >
Co-authored-by: Himangshu Lahkar <49579433+hlahkar@users.noreply.github.com >
Co-authored-by: Vivek Goel <vgoel@habana.ai >
Co-authored-by: yuwenzho <yuwen.zhou@intel.com >
Co-authored-by: Dominika Olszewska <dolszewska@habana.ai >
Co-authored-by: barak goldberg <149692267+bgoldberg-habana@users.noreply.github.com >
Co-authored-by: Michal Szutenberg <37601244+szutenberg@users.noreply.github.com >
Co-authored-by: Jan Kaniecki <jkaniecki@habana.ai >
Co-authored-by: Agata Dobrzyniewicz <160237065+adobrzyniewicz-habana@users.noreply.github.com >
Co-authored-by: Krzysztof Wisniewski <kwisniewski@habana.ai >
Co-authored-by: Dudi Lester <160421192+dudilester@users.noreply.github.com >
Co-authored-by: Ilia Taraban <tarabanil@gmail.com >
Co-authored-by: Chendi.Xue <chendi.xue@intel.com >
Co-authored-by: Michał Kuligowski <mkuligowski@habana.ai >
Co-authored-by: Jakub Maksymczuk <jmaksymczuk@habana.ai >
Co-authored-by: Tomasz Zielinski <85164140+tzielinski-habana@users.noreply.github.com >
Co-authored-by: Sun Choi <schoi@habana.ai >
Co-authored-by: Iryna Boiko <iboiko@habana.ai >
Co-authored-by: Bob Zhu <41610754+czhu15@users.noreply.github.com >
Co-authored-by: hlin99 <73271530+hlin99@users.noreply.github.com >
Co-authored-by: Zehao Huang <zehao.huang@intel.com >
Co-authored-by: Andrzej Kotłowski <Andrzej.Kotlowski@intel.com >
Co-authored-by: Yan Tomsinsky <73292515+Yantom1@users.noreply.github.com >
Co-authored-by: Nir David <ndavid@habana.ai >
Co-authored-by: Yu-Zhou <yu.zhou@intel.com >
Co-authored-by: Ruheena Suhani Shaik <rsshaik@habana.ai >
Co-authored-by: Karol Damaszke <kdamaszke@habana.ai >
Co-authored-by: Marcin Swiniarski <mswiniarski@habana.ai >
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu >
Co-authored-by: Jacek Czaja <jacek.czaja@intel.com >
Co-authored-by: Jacek Czaja <jczaja@habana.ai >
Co-authored-by: Yuan <yuan.zhou@outlook.com >
2024-11-06 01:09:10 -08:00
Aaron Pham
21063c11c7
[CI/Build] drop support for Python 3.8 EOL ( #8464 )
...
Signed-off-by: Aaron Pham <contact@aarnphm.xyz >
2024-11-06 07:11:55 +00:00
Cyrus Leung
bbc3619dc8
[Core] Make encoder-decoder inputs a nested structure to be more composable ( #9604 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2024-11-05 10:07:31 +08:00
tomeras91
ac04a97a9f
[Frontend] Add max_tokens prometheus metric ( #9881 )
...
Signed-off-by: Tomer Asida <tomera@ai21.com >
2024-11-04 22:53:24 +00:00
youkaichao
e893795443
[2/N] executor pass the complete config to worker/modelrunner ( #9938 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
2024-11-02 07:35:05 -07:00
youkaichao
18bd7587b7
[1/N] pass the complete config from engine to executor ( #9933 )
...
Signed-off-by: youkaichao <youkaichao@gmail.com >
2024-11-01 13:51:57 -07:00
Joe Runde
031a7995f3
[Bugfix][Frontend] Reject guided decoding in multistep mode ( #9892 )
...
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com >
2024-11-01 01:09:46 +00:00
Went-Liang
81f09cfd80
[Model] Support math-shepherd-mistral-7b-prm model ( #9697 )
...
Signed-off-by: Went-Liang <wenteng_liang@163.com >
2024-10-30 09:33:42 -07:00
Joe Runde
67bdf8e523
[Bugfix][Frontend] Guard against bad token ids ( #9634 )
...
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com >
2024-10-29 14:13:20 -07:00
科英
74fc2d77ae
[Misc] Add metrics for request queue time, forward time, and execute time ( #9659 )
2024-10-29 10:32:56 -07:00
Vasiliy Alekseev
07e981fdf4
[Frontend] Bad words sampling parameter ( #9717 )
...
Signed-off-by: Vasily Alexeev <alvasian@yandex.ru >
2024-10-26 16:29:38 +00:00
youkaichao
4fdc581f9e
[core] simplify seq group code ( #9569 )
...
Co-authored-by: Zhuohan Li <zhuohan123@gmail.com >
2024-10-24 00:16:44 -07:00
Vinay R Damodaran
33bab41060
[Bugfix]: Make chat content text allow type content ( #9358 )
...
Signed-off-by: Vinay Damodaran <vrdn@hey.com >
2024-10-24 05:05:49 +00:00
Tyler Michael Smith
e5ac6a4199
[Bugfix] Fix divide by zero when serving Mamba models ( #9617 )
...
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com >
2024-10-23 16:40:43 +00:00
yulei
b17046e298
[BugFix] Fix metrics error for --num-scheduler-steps > 1 ( #8234 )
2024-10-22 15:43:03 -07:00
Ronen Schaffer
cd5601ac37
[BugFix] Prevent exporting duplicate OpenTelemetry spans ( #9017 )
2024-10-22 11:11:53 -07:00
youkaichao
76a5e13270
[core] move parallel sampling out from vllm core ( #9302 )
2024-10-22 00:31:44 +00:00
Kunjan
9bb10a7d27
[MISC] Add lora requests to metrics ( #9477 )
...
Co-authored-by: Kunjan Patel <kunjanp_google_com@vllm.us-central1-a.c.kunjanp-gke-dev-2.internal>
2024-10-18 20:50:18 +00:00
Cyrus Leung
051eaf6db3
[Model] Add user-configurable task for models that support both generation and embedding ( #9424 )
2024-10-18 11:31:58 -07:00
Kuntai Du
81ede99ca4
[Core] Deprecating block manager v1 and make block manager v2 default ( #8704 )
...
Remove block manager v1. This is the first piece of the prefix-caching-centric design. To get there, the code path has to be simplified so that only the v2 block manager is used (it has much higher performance on prefix caching).
2024-10-17 11:38:15 -05:00
Russell Bryant
776dbd74f1
[CI/Build] mypy: Resolve some errors from checking vllm/engine ( #9267 )
...
Signed-off-by: Russell Bryant <rbryant@redhat.com >
2024-10-16 22:55:59 +00:00
Cyrus Leung
cee711fdbb
[Core] Rename input data types ( #8688 )
2024-10-16 10:49:37 +00:00
youkaichao
cbc2ef5529
[misc] hide best_of from engine ( #9261 )
...
Co-authored-by: Brendan Wong <bjwpokemon@gmail.com >
2024-10-10 21:30:44 -07:00
Alex Brooks
a3691b6b5e
[Core][Frontend] Add Support for Inference Time mm_processor_kwargs ( #9131 )
...
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com >
2024-10-08 14:12:56 +00:00
Varun Sundar Rabindranath
cb3b2b9ba4
[Bugfix] Fix incorrect updates to num_computed_tokens in multi-step scheduling ( #9038 )
...
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com >
2024-10-06 12:48:11 -07:00
Roger Wang
26aa325f4f
[Core][VLM] Test registration for OOT multimodal models ( #8717 )
...
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk >
2024-10-04 10:38:25 -07:00
Lily Liu
1570203864
[Spec Decode] (1/2) Remove batch expansion ( #8839 )
2024-10-01 16:04:42 -07:00
Joe Runde
062c89e7c9
[Frontend][Core] Move guided decoding params into sampling params ( #8252 )
...
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com >
Co-authored-by: Nick Hill <nickhill@us.ibm.com >
2024-10-01 09:34:25 +08:00
Sebastian Schoennenbeck
bd429f2b75
[Core] Priority-based scheduling in async engine ( #8850 )
2024-09-27 15:07:10 -07:00
Varun Sundar Rabindranath
c2ec430ab5
[Core] Multi-Step + Single Step Prefills via Chunked Prefill code path ( #8378 )
...
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com >
2024-09-27 13:32:07 -07:00