dpxa
|
4f074fbf53
|
[Misc] Suppress irrelevant exception stack trace information when CUDA… (#11438)
Co-authored-by: shiquan <shiquan>
|
2024-12-24 08:43:39 +00:00 |
|
Rafael Vasquez
|
32aa2059ad
|
[Docs] Convert rST to MyST (Markdown) (#11145)
Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>
|
2024-12-23 22:35:38 +00:00 |
|
Yanyi Liu
|
5aef49806d
|
[Feature] Add load generation config from model (#11164)
Signed-off-by: liuyanyi <wolfsonliu@163.com>
Signed-off-by: Yanyi Liu <wolfsonliu@163.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2024-12-19 10:50:38 +00:00 |
|
Russell Bryant
|
4863e5fba5
|
[Core] V1: Use multiprocessing by default (#11074)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2024-12-13 16:27:32 -08:00 |
|
Cyrus Leung
|
eeec9e3390
|
[Frontend] Separate pooling APIs in offline inference (#11129)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-13 10:40:07 +00:00 |
|
Cyrus Leung
|
8f10d5e393
|
[Misc] Split up pooling tasks (#10820)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-11 01:28:00 -08:00 |
|
Cyrus Leung
|
391d7b2763
|
[Bugfix] Fix usage of deprecated decorator (#11025)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-10 13:45:47 +08:00 |
|
youkaichao
|
fd57d2b534
|
[torch.compile] allow candidate compile sizes (#10984)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-12-08 11:05:21 +00:00 |
|
Cyrus Leung
|
3257d449fa
|
[Misc] Remove deprecated names (#10817)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-03 06:52:57 +00:00 |
|
Cyrus Leung
|
d2f058e76c
|
[Misc] Rename embedding classes to pooling (#10801)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-01 14:36:51 +08:00 |
|
Michael Goin
|
9a99273b48
|
[Bugfix] Fix using -O[0,3] with LLM entrypoint (#10677)
Signed-off-by: mgoin <michael@neuralmagic.com>
|
2024-11-26 10:44:01 -08:00 |
|
Maximilien de Bayser
|
214efc2c3c
|
Support Cross encoder models (#10400)
Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: Flavia Beo <flavia.beo@ibm.com>
Co-authored-by: Flavia Beo <flavia.beo@ibm.com>
|
2024-11-24 18:56:20 -08:00 |
|
Woosuk Kwon
|
446c7806b2
|
[Minor] Fix line-too-long (#10563)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2024-11-21 19:40:40 -08:00 |
|
youkaichao
|
33e0a2540a
|
[9/N] torch.compile LLM usage (#10552)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-21 19:13:31 -08:00 |
|
Cyrus Leung
|
32e46e000f
|
[Frontend] Automatic detection of chat content format from AST (#9919)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-11-16 13:35:40 +08:00 |
|
Cyrus Leung
|
b311efd0bd
|
[Misc] Fix import error in tensorizer tests and cleanup some code (#10349)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-11-15 09:34:17 +00:00 |
|
Cyrus Leung
|
2ac6d0e75b
|
[Misc] Consolidate pooler config overrides (#10351)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-11-15 06:59:00 +00:00 |
|
Cyrus Leung
|
972112d82f
|
[Bugfix] Fix unable to load some models (#10312)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-11-14 16:55:54 -08:00 |
|
Robert Shaw
|
6ace6fba2c
|
[V1] AsyncLLM Implementation (#9826)
Signed-off-by: Nick Hill <nickhill@us.ibm.com>
Signed-off-by: rshaw@neuralmagic.com <rshaw@neuralmagic.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2024-11-11 23:05:38 +00:00 |
|
Krishna Mandal
|
b09895a618
|
[Frontend][Core] Override HF config.json via CLI (#5836)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-11-09 16:19:27 +00:00 |
|
Joe Runde
|
d58268c56a
|
[V1] Make v1 more testable (#9888)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
|
2024-11-06 11:57:35 -08:00 |
|
Chauncey
|
ac6b8f19b9
|
[Frontend] Multi-Modality Support for Loading Local Image Files (#9915)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2024-11-04 15:34:57 +00:00 |
|
Roger Wang
|
3ea2dc2ec4
|
[Misc] Remove deprecated arg for cuda graph capture (#9864)
Signed-off-by: Roger Wang <ywang@roblox.com>
|
2024-10-31 07:22:07 +00:00 |
|
Went-Liang
|
81f09cfd80
|
[Model] Support math-shepherd-mistral-7b-prm model (#9697)
Signed-off-by: Went-Liang <wenteng_liang@163.com>
|
2024-10-30 09:33:42 -07:00 |
|
Woosuk Kwon
|
6c5af09b39
|
[V1] Implement vLLM V1 [1/N] (#9289)
|
2024-10-22 01:24:07 -07:00 |
|
Cyrus Leung
|
051eaf6db3
|
[Model] Add user-configurable task for models that support both generation and embedding (#9424)
|
2024-10-18 11:31:58 -07:00 |
|
Nick Hill
|
1ffc8a7362
|
[BugFix] Typing fixes to RequestOutput.prompt and beam search (#9473)
|
2024-10-18 07:19:53 +00:00 |
|
Russell Bryant
|
cdca8994bd
|
[CI/Build] mypy: check vllm/entrypoints (#9194)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2024-10-09 17:15:28 +00:00 |
|
Alex Brooks
|
a3691b6b5e
|
[Core][Frontend] Add Support for Inference Time mm_processor_kwargs (#9131)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
|
2024-10-08 14:12:56 +00:00 |
|
Brendan Wong
|
8c746226c9
|
[Frontend] API support for beam search for MQLLMEngine (#9117)
|
2024-10-08 05:51:43 +00:00 |
|
youkaichao
|
18b296fdb2
|
[core] remove beam search from the core (#9105)
|
2024-10-07 05:47:04 +00:00 |
|
Cyrus Leung
|
f22619fe96
|
[Misc] Remove user-facing error for removed VLM args (#9104)
|
2024-10-06 01:33:52 -07:00 |
|
Brendan Wong
|
168cab6bbf
|
[Frontend] API support for beam search (#9087)
Co-authored-by: youkaichao <youkaichao@126.com>
|
2024-10-05 23:39:03 -07:00 |
|
Joe Runde
|
062c89e7c9
|
[Frontend][Core] Move guided decoding params into sampling params (#8252)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Co-authored-by: Nick Hill <nickhill@us.ibm.com>
|
2024-10-01 09:34:25 +08:00 |
|
danieljannai21
|
6c9ba48fde
|
[Frontend] Added support for HF's new continue_final_message parameter (#8942)
|
2024-09-29 17:59:47 +00:00 |
|
Edouard B.
|
090e945e36
|
[Frontend] Make beam search emulator temperature modifiable (#8928)
Co-authored-by: Eduard Balzin <nfunctor@yahoo.fr>
|
2024-09-28 11:30:21 -07:00 |
|
Cyrus Leung
|
3b00b9c26c
|
[Core] rename PromptInputs and inputs (#8876)
|
2024-09-26 20:35:15 -07:00 |
|
Simon Mo
|
4f1ba0844b
|
Revert "rename PromptInputs and inputs with backward compatibility (#8760)" (#8810)
|
2024-09-25 10:36:26 -07:00 |
|
Cyrus Leung
|
28e1299e60
|
rename PromptInputs and inputs with backward compatibility (#8760)
|
2024-09-25 09:36:47 -07:00 |
|
Archit Patke
|
6da1ab6b41
|
[Core] Adding Priority Scheduling (#5958)
|
2024-09-24 19:50:50 -07:00 |
|
Andy
|
2529d09b5a
|
[Frontend] Batch inference for llm.chat() API (#8648)
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Roger Wang <136131678+ywang96@users.noreply.github.com>
|
2024-09-24 09:44:11 -07:00 |
|
Simon Mo
|
3185fb0cca
|
Revert "[Core] Rename PromptInputs to PromptType, and inputs to prompt" (#8750)
|
2024-09-24 05:45:20 +00:00 |
|
youkaichao
|
0250dd68c5
|
re-implement beam search on top of vllm core (#8726)
Co-authored-by: Brendan Wong <bjwpokemon@gmail.com>
|
2024-09-23 22:08:12 -07:00 |
|
Alex Brooks
|
9b8c8ba119
|
[Core][Frontend] Support Passing Multimodal Processor Kwargs (#8657)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
|
2024-09-23 07:44:48 +00:00 |
|
Cyrus Leung
|
0057894ef7
|
[Core] Rename PromptInputs and inputs (#8673)
|
2024-09-20 19:00:54 -07:00 |
|
Patrick von Platen
|
a54ed80249
|
[Model] Add mistral function calling format to all models loaded with "mistral" format (#8515)
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2024-09-17 17:50:37 +00:00 |
|
sroy745
|
1009e93c5d
|
[Encoder decoder] Add cuda graph support during decoding for encoder-decoder models (#7631)
|
2024-09-17 07:35:01 -07:00 |
|
Nick Hill
|
551ce01078
|
[Core] Add engine option to return only deltas or final output (#7381)
|
2024-09-12 12:02:00 -07:00 |
|
Cyrus Leung
|
8c054b7a62
|
[Frontend] Clean up type annotations for mistral tokenizer (#8314)
|
2024-09-10 16:49:11 +00:00 |
|
William Lin
|
12dd715807
|
[misc] [doc] [frontend] LLM torch profiler support (#7943)
|
2024-09-06 17:48:48 -07:00 |
|