Cyrus Leung
|
db7db4aab9
|
[Misc] Consolidate ModelConfig code related to HF config (#10104)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-11-07 06:00:21 +00:00 |
|
Konrad Zawora
|
a02a50e6e5
|
[Hardware][Intel-Gaudi] Add Intel Gaudi (HPU) inference backend (#6143)
Signed-off-by: yuwenzho <yuwen.zhou@intel.com>
Signed-off-by: Chendi.Xue <chendi.xue@intel.com>
Signed-off-by: Bob Zhu <bob.zhu@intel.com>
Signed-off-by: zehao-intel <zehao.huang@intel.com>
Signed-off-by: Konrad Zawora <kzawora@habana.ai>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
Co-authored-by: Sanju C Sudhakaran <scsudhakaran@habana.ai>
Co-authored-by: Michal Adamczyk <madamczyk@habana.ai>
Co-authored-by: Marceli Fylcek <mfylcek@habana.ai>
Co-authored-by: Himangshu Lahkar <49579433+hlahkar@users.noreply.github.com>
Co-authored-by: Vivek Goel <vgoel@habana.ai>
Co-authored-by: yuwenzho <yuwen.zhou@intel.com>
Co-authored-by: Dominika Olszewska <dolszewska@habana.ai>
Co-authored-by: barak goldberg <149692267+bgoldberg-habana@users.noreply.github.com>
Co-authored-by: Michal Szutenberg <37601244+szutenberg@users.noreply.github.com>
Co-authored-by: Jan Kaniecki <jkaniecki@habana.ai>
Co-authored-by: Agata Dobrzyniewicz <160237065+adobrzyniewicz-habana@users.noreply.github.com>
Co-authored-by: Krzysztof Wisniewski <kwisniewski@habana.ai>
Co-authored-by: Dudi Lester <160421192+dudilester@users.noreply.github.com>
Co-authored-by: Ilia Taraban <tarabanil@gmail.com>
Co-authored-by: Chendi.Xue <chendi.xue@intel.com>
Co-authored-by: Michał Kuligowski <mkuligowski@habana.ai>
Co-authored-by: Jakub Maksymczuk <jmaksymczuk@habana.ai>
Co-authored-by: Tomasz Zielinski <85164140+tzielinski-habana@users.noreply.github.com>
Co-authored-by: Sun Choi <schoi@habana.ai>
Co-authored-by: Iryna Boiko <iboiko@habana.ai>
Co-authored-by: Bob Zhu <41610754+czhu15@users.noreply.github.com>
Co-authored-by: hlin99 <73271530+hlin99@users.noreply.github.com>
Co-authored-by: Zehao Huang <zehao.huang@intel.com>
Co-authored-by: Andrzej Kotłowski <Andrzej.Kotlowski@intel.com>
Co-authored-by: Yan Tomsinsky <73292515+Yantom1@users.noreply.github.com>
Co-authored-by: Nir David <ndavid@habana.ai>
Co-authored-by: Yu-Zhou <yu.zhou@intel.com>
Co-authored-by: Ruheena Suhani Shaik <rsshaik@habana.ai>
Co-authored-by: Karol Damaszke <kdamaszke@habana.ai>
Co-authored-by: Marcin Swiniarski <mswiniarski@habana.ai>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Jacek Czaja <jacek.czaja@intel.com>
Co-authored-by: Jacek Czaja <jczaja@habana.ai>
Co-authored-by: Yuan <yuan.zhou@outlook.com>
|
2024-11-06 01:09:10 -08:00 |
|
Aaron Pham
|
21063c11c7
|
[CI/Build] drop support for Python 3.8 EOL (#8464)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
|
2024-11-06 07:11:55 +00:00 |
|
Chauncey
|
93dee88f6b
|
[Misc] vllm CLI flags should be ordered for better user readability (#10017)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2024-11-05 18:59:56 +08:00 |
|
youkaichao
|
af7380d83b
|
[torch.compile] fix cpu broken code (#9947)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-01 23:35:47 -07:00 |
|
sroy745
|
a78dd3303e
|
[Encoder Decoder] Add flash_attn kernel support for encoder-decoder models (#9559)
|
2024-11-01 23:22:49 -07:00 |
|
youkaichao
|
96e0c9cbbd
|
[torch.compile] directly register custom op (#9896)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-10-31 21:56:09 -07:00 |
|
Alex Brooks
|
cc98f1e079
|
[CI/Build] VLM Test Consolidation (#9372)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
|
2024-10-30 09:32:17 -07:00 |
|
youkaichao
|
ff5ed6e1bc
|
[torch.compile] rework compile control with piecewise cudagraph (#9715)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-10-29 23:03:49 -07:00 |
|
wangshuai09
|
622b7ab955
|
[Hardware] using current_platform.seed_everything (#9785)
Signed-off-by: wangshuai09 <391746016@qq.com>
|
2024-10-29 14:47:44 +00:00 |
|
wangshuai09
|
4e2d95e372
|
[Hardware][ROCM] using current_platform.is_rocm (#9642)
Signed-off-by: wangshuai09 <391746016@qq.com>
|
2024-10-28 04:07:00 +00:00 |
|
madt2709
|
34a9941620
|
[Bugfix] Fix load config when using bools (#9533)
|
2024-10-27 13:46:41 -04:00 |
|
youkaichao
|
8549c82660
|
[core] cudagraph output with tensor weak reference (#9724)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-10-27 00:19:28 -07:00 |
|
Mengqing Cao
|
5cbdccd151
|
[Hardware][openvino] is_openvino --> current_platform.is_openvino (#9716)
|
2024-10-26 10:59:06 +00:00 |
|
Mengqing Cao
|
2394962d70
|
[Hardware][XPU] using current_platform.is_xpu (#9605)
|
2024-10-23 08:28:21 +00:00 |
|
xendo
|
9dbcce84a7
|
[Neuron] [Bugfix] Fix neuron startup (#9374)
Co-authored-by: Jerzy Zagorski <jzagorsk@amazon.com>
|
2024-10-22 12:51:41 +00:00 |
|
wangshuai09
|
3ddbe25502
|
[Hardware][CPU] using current_platform.is_cpu (#9536)
|
2024-10-22 00:50:43 -07:00 |
|
Travis Johnson
|
b729901139
|
[Bugfix]: serialize config by value for --trust-remote-code (#6751)
Signed-off-by: Travis Johnson <tsjohnso@us.ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2024-10-21 19:46:24 -07:00 |
|
Nick Hill
|
9d9186be97
|
[Frontend] Reduce frequency of client cancellation checking (#7959)
|
2024-10-21 13:28:10 -07:00 |
|
Cyrus Leung
|
051eaf6db3
|
[Model] Add user-configurable task for models that support both generation and embedding (#9424)
|
2024-10-18 11:31:58 -07:00 |
|
Luka Govedič
|
0f41fbe5a3
|
[torch.compile] Fine-grained CustomOp enabling mechanism (#9300)
|
2024-10-17 18:36:37 +00:00 |
|
Wallas Henrique
|
8baf85e4e9
|
[Doc] Compatibility matrix for mutual exclusive features (#8512)
Signed-off-by: Wallas Santos <wallashss@ibm.com>
|
2024-10-11 11:18:50 -07:00 |
|
Alex Brooks
|
a3691b6b5e
|
[Core][Frontend] Add Support for Inference Time mm_processor_kwargs (#9131)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
|
2024-10-08 14:12:56 +00:00 |
|
Brendan Wong
|
8c746226c9
|
[Frontend] API support for beam search for MQLLMEngine (#9117)
|
2024-10-08 05:51:43 +00:00 |
|
Cyrus Leung
|
8c6de96ea1
|
[Model] Explicit interface for vLLM models and support OOT embedding models (#9108)
|
2024-10-07 06:10:35 +00:00 |
|
youkaichao
|
18b296fdb2
|
[core] remove beam search from the core (#9105)
|
2024-10-07 05:47:04 +00:00 |
|
Brendan Wong
|
168cab6bbf
|
[Frontend] API support for beam search (#9087)
Co-authored-by: youkaichao <youkaichao@126.com>
|
2024-10-05 23:39:03 -07:00 |
|
Andy Dai
|
5df1834895
|
[Bugfix] Fix order of arguments matters in config.yaml (#8960)
|
2024-10-05 17:35:11 +00:00 |
|
Cyrus Leung
|
26a68d5d7e
|
[CI/Build] Add test decorator for minimum GPU memory (#8925)
|
2024-09-29 02:50:51 +00:00 |
|
Russell Bryant
|
d1537039ce
|
[Core] Improve choice of Python multiprocessing method (#8823)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
Co-authored-by: youkaichao <youkaichao@126.com>
|
2024-09-29 09:17:07 +08:00 |
|
Tyler Michael Smith
|
e2f6f26e86
|
[Bugfix] Fix print_warning_once's line info (#8867)
|
2024-09-26 16:18:26 -07:00 |
|
Russell Bryant
|
b05f5c9238
|
[Core] Allow IPv6 in VLLM_HOST_IP with zmq (#8575)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2024-09-23 12:15:41 -07:00 |
|
Alex Brooks
|
9b8c8ba119
|
[Core][Frontend] Support Passing Multimodal Processor Kwargs (#8657)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
|
2024-09-23 07:44:48 +00:00 |
|
Huazhong Ji
|
ca2b628b3c
|
[MISC] rename CudaMemoryProfiler to DeviceMemoryProfiler (#8703)
|
2024-09-22 10:44:09 -07:00 |
|
Kunshang Ji
|
d4bf085ad0
|
[MISC] add support custom_op check (#8557)
Co-authored-by: youkaichao <youkaichao@126.com>
|
2024-09-20 19:03:55 -07:00 |
|
Cyrus Leung
|
6ffa3f314c
|
[CI/Build] Avoid CUDA initialization (#8534)
|
2024-09-18 10:38:11 +00:00 |
|
sroy745
|
1009e93c5d
|
[Encoder decoder] Add cuda graph support during decoding for encoder-decoder models (#7631)
|
2024-09-17 07:35:01 -07:00 |
|
Simon Mo
|
546034b466
|
[refactor] remove triton based sampler (#8524)
|
2024-09-16 20:04:48 -07:00 |
|
Nick Hill
|
acd5511b6d
|
[BugFix] Fix clean shutdown issues (#8492)
|
2024-09-16 09:33:46 -07:00 |
|
Kevin Lin
|
295c4730a8
|
[Misc] Raise error when using encoder/decoder model with cpu backend (#8355)
|
2024-09-12 05:45:24 +00:00 |
|
Jiaxin Shan
|
db3bf7c991
|
[Core] Support load and unload LoRA in api server (#6566)
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
|
2024-09-05 18:10:33 -07:00 |
|
Kaunil Dhruv
|
058344f89a
|
[Frontend]-config-cli-args (#7737)
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Kaunil Dhruv <kaunil_dhruv@intuit.com>
|
2024-08-30 08:21:02 -07:00 |
|
bnellnm
|
c166e7e43e
|
[Bugfix] Allow ScalarType to be compiled with pytorch 2.3 and add checks for registering FakeScalarType and dynamo support. (#7886)
|
2024-08-27 23:13:45 -04:00 |
|
youkaichao
|
b74a125800
|
[ci] try to log process using the port to debug the port usage (#7711)
|
2024-08-20 17:41:12 -07:00 |
|
Cyrus Leung
|
3f674a49b5
|
[VLM][Core] Support profiling with multiple multi-modal inputs per prompt (#7126)
|
2024-08-14 17:55:42 +00:00 |
|
Woosuk Kwon
|
d6e634f3d7
|
[TPU] Suppress import custom_ops warning (#7458)
|
2024-08-13 00:30:30 -07:00 |
|
youkaichao
|
4d2dc5072b
|
[hardware] unify usage of is_tpu to current_platform.is_tpu() (#7102)
|
2024-08-13 00:16:42 -07:00 |
|
Cyrus Leung
|
9ba85bc152
|
[mypy] Misc. typing improvements (#7417)
|
2024-08-13 09:20:20 +08:00 |
|
sasha0552
|
91294d56e1
|
[Bugfix] Handle PackageNotFoundError when checking for xpu version (#7398)
|
2024-08-12 16:07:20 -07:00 |
|
Cyrus Leung
|
4ddc4743d7
|
[Core] Consolidate GB constant and enable float GB arguments (#7416)
|
2024-08-12 14:14:14 -07:00 |
|