Cyrus Leung
|
0920ab9131
|
[Doc] Reorganize online pooling APIs (#11172)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-14 00:22:22 +08:00 |
|
Sungjae Lee
|
c31d4a57a6
|
[Core] support LoRA and prompt adapter in content-based hashing for Block Manager v2 prefix caching (#8240)
|
2024-12-13 07:51:25 -08:00 |
|
Cyrus Leung
|
eeec9e3390
|
[Frontend] Separate pooling APIs in offline inference (#11129)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-13 10:40:07 +00:00 |
|
youkaichao
|
be39e3cd18
|
[core] clean up cudagraph batchsize padding logic (#10996)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-12-13 06:57:50 +00:00 |
|
Pooya Davoodi
|
1efce68605
|
[Bugfix] Use runner_type instead of task in GritLM (#11144)
Signed-off-by: Pooya Davoodi <pooya.davoodi@parasail.io>
|
2024-12-13 04:09:53 +00:00 |
|
Luka Govedič
|
30870b4f66
|
[torch.compile] Dynamic fp8 + rms_norm fusion (#10906)
Signed-off-by: luka <luka@neuralmagic.com>
Co-authored-by: Varun Sundar Rabindranath <varun@neuralmagic.com>
|
2024-12-13 03:19:23 +00:00 |
|
Cody Yu
|
78ed8f57d8
|
[Misc][V1] Fix type in v1 prefix caching (#11151)
|
2024-12-13 00:57:40 +00:00 |
|
Jiaxin Shan
|
85362f028c
|
[Misc][LoRA] Ensure Lora Adapter requests return adapter name (#11094)
Signed-off-by: Jiaxin Shan <seedjeffwan@gmail.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
|
2024-12-12 09:25:16 +00:00 |
|
youkaichao
|
62de37a38e
|
[core][distributed] initialization from StatelessProcessGroup (#10986)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-12-12 09:04:19 +00:00 |
|
Pooya Davoodi
|
1da8f0e1dd
|
[Model] Add support for embedding model GritLM (#10816)
Signed-off-by: Pooya Davoodi <pooya.davoodi@parasail.io>
|
2024-12-12 06:39:16 +00:00 |
|
Alexander Matveev
|
4e11683368
|
[V1] VLM preprocessor hashing (#11020)
Signed-off-by: Roger Wang <ywang@roblox.com>
Signed-off-by: Alexander Matveev <alexm@neuralmagic.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2024-12-12 00:55:30 +00:00 |
|
Cyrus Leung
|
d1e21a979b
|
[CI/Build] Split up VLM tests (#11083)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-12 06:18:16 +08:00 |
|
Cyrus Leung
|
8f10d5e393
|
[Misc] Split up pooling tasks (#10820)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-11 01:28:00 -08:00 |
|
Cyrus Leung
|
2e33fe4191
|
[CI/Build] Check transformers v4.47 (#10991)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-11 05:02:02 +00:00 |
|
Mor Zusman
|
ffa48c9146
|
[Model] PP support for Mamba-like models (#10992)
Signed-off-by: mzusman <mor.zusmann@gmail.com>
|
2024-12-10 21:53:37 -05:00 |
|
Aurick Qiao
|
d5c5154fcf
|
[Misc] LoRA + Chunked Prefill (#9057)
|
2024-12-11 10:09:20 +08:00 |
|
Jee Jee Li
|
d05f88679b
|
[Misc][LoRA] Add PEFTHelper for LoRA (#11003)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2024-12-10 11:12:01 +00:00 |
|
youkaichao
|
ebf778061d
|
monitor metrics of tokens per step using cudagraph batchsizes (#11031)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-12-09 22:35:36 -08:00 |
|
Tyler Michael Smith
|
28b3a1c7e5
|
[V1] Multiprocessing Tensor Parallel Support for v1 (#9856)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2024-12-10 06:28:14 +00:00 |
|
Isotr0py
|
a811dd6608
|
[Model] merged input processor for Phi-3-Vision models (#10977)
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2024-12-09 12:55:10 -08:00 |
|
Jee Jee Li
|
ca871491ed
|
[Misc][LoRA] Abstract PunicaWrapper (#10955)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2024-12-09 12:54:44 -08:00 |
|
youkaichao
|
fd57d2b534
|
[torch.compile] allow candidate compile sizes (#10984)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-12-08 11:05:21 +00:00 |
|
zhou fan
|
78029b34ed
|
[BugFix][Kernel]: fix illegal memory access in causal_conv1d when conv_states is None (#10928)
Signed-off-by: xffxff <1247714429@qq.com>
|
2024-12-08 01:21:18 +08:00 |
|
Cyrus Leung
|
c889d5888b
|
[Doc] Explicitly state that PP isn't compatible with speculative decoding yet (#10975)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-07 17:20:49 +00:00 |
|
Cyrus Leung
|
39e227c7ae
|
[Model] Update multi-modal processor to support Mantis(LLaVA) model (#10711)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-07 17:10:05 +00:00 |
|
Cyrus Leung
|
955fa9533a
|
[3/N] Support and implement merged input processor for LLaVA model (#10676)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2024-12-07 00:50:58 -08:00 |
|
Cyrus Leung
|
222f5b082a
|
[CI/Build] Fix broken multimodal test (#10950)
|
2024-12-06 10:41:23 +00:00 |
|
youkaichao
|
9743d64e4e
|
[ci][build] add tests for python only compilation (#10915)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-12-05 08:54:47 -08:00 |
|
Isotr0py
|
998eeafe58
|
[CI/Build] Bump test transformers version (#10106)
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-05 16:05:52 +00:00 |
|
Jee Jee Li
|
571da8fc43
|
[Misc][LoRA] Clean up the function interface of Punica (#10917)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2024-12-05 13:22:28 +00:00 |
|
Michael Goin
|
8d370e91cb
|
[Bugfix] Fallback to outlines for complex json schemas (#10899)
Signed-off-by: mgoin <michael@neuralmagic.com>
|
2024-12-05 11:14:06 +08:00 |
|
Xin Yang
|
01d079fd8e
|
[LoRA] Change lora_tokenizers capacity (#10796)
Signed-off-by: Xin Yang <xyang19@gmail.com>
|
2024-12-04 17:40:16 +00:00 |
|
Tyler Michael Smith
|
d2bd88b122
|
[CI/Build] Replace mean with torch.all in test_pynccl.py (#10876)
Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2024-12-04 03:23:21 +00:00 |
|
Alexander Matveev
|
3bc94cab69
|
[V1] VLM - Run the mm_mapper preprocessor in the frontend process (#10640)
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Michael Goin <michael@neuralmagic.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
|
2024-12-03 10:33:10 +00:00 |
|
Aaron Pham
|
9323a3153b
|
[Core][Performance] Add XGrammar support for guided decoding and set it as default (#10785)
Signed-off-by: Aaron Pham <contact@aarnphm.xyz>
Signed-off-by: mgoin <michael@neuralmagic.com>
Co-authored-by: mgoin <michael@neuralmagic.com>
|
2024-12-03 15:17:00 +08:00 |
|
youkaichao
|
dc5ce861bf
|
[torch.compile] remove compilation_context and simplify code (#10838)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-12-03 06:19:02 +00:00 |
|
youkaichao
|
21fe7b481a
|
[core][distributed] add pynccl broadcast (#10843)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-12-03 04:53:23 +00:00 |
|
Jee Jee Li
|
b45f0d7946
|
[Misc][LoRA] Move the implementation of lora bias to punica.py (#10829)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2024-12-02 17:53:36 +00:00 |
|
zhou fan
|
ef31eabc68
|
[Model]: add some tests for aria model (#10770)
Signed-off-by: xffxff <1247714429@qq.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
|
2024-12-02 05:36:36 +00:00 |
|
Woosuk Kwon
|
073a4bd1c0
|
[Kernel] Use out arg in flash_attn_varlen_func (#10811)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2024-12-01 17:55:39 -08:00 |
|
Kuntai Du
|
0590ec3fd9
|
[Core] Implement disagg prefill by StatelessProcessGroup (#10502)
This PR provides initial support for single-node disaggregated prefill in 1P1D scenario.
Signed-off-by: KuntaiDu <kuntai@uchicago.edu>
Co-authored-by: ApostaC <yihua98@uchicago.edu>
Co-authored-by: YaoJiayi <120040070@link.cuhk.edu.cn>
|
2024-12-01 19:01:00 -06:00 |
|
Cyrus Leung
|
d2f058e76c
|
[Misc] Rename embedding classes to pooling (#10801)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-01 14:36:51 +08:00 |
|
Cyrus Leung
|
133707123e
|
[Model] Replace embedding models with pooling adapter (#10769)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-12-01 08:02:54 +08:00 |
|
Cyrus Leung
|
fa6ecb9aa7
|
[Model] Clean up MiniCPMV (#10751)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2024-11-29 04:47:06 +00:00 |
|
sixgod
|
5fc5ce0fe4
|
[Model] Added GLM-4 series hf format model support vllm==0.6.4 (#10561)
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2024-11-28 14:53:31 +00:00 |
|
Woosuk Kwon
|
a79b122400
|
[V1] Do not allocate beyond the max_model_len (#10730)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2024-11-28 00:13:15 -08:00 |
|
Ricky Xu
|
d9b4b3f069
|
[Bug][CLI] Allow users to disable prefix caching explicitly (#10724)
Signed-off-by: rickyx <rickyx@anyscale.com>
|
2024-11-27 23:59:28 -08:00 |
|
tomeras91
|
395b1c7454
|
[Frontend] don't block event loop in tokenization (preprocess) in OpenAI compatible server (#10635)
Signed-off-by: Tomer Asida <tomera@ai21.com>
|
2024-11-27 13:21:10 -08:00 |
|
Mor Zusman
|
197b4484a3
|
[Bugfix][Mamba] Fix Multistep on Mamba-like models (#10705)
Signed-off-by: mzusman <mor.zusmann@gmail.com>
|
2024-11-27 19:02:27 +00:00 |
|
youkaichao
|
308cc5e21e
|
[ci] fix slow tests (#10698)
Signed-off-by: youkaichao <youkaichao@gmail.com>
|
2024-11-27 09:26:14 -08:00 |
|