Akshat Tripathi
8bddb73512
[Hardware][CPU] Multi-LoRA implementation for the CPU backend (#11100)
Signed-off-by: Akshat Tripathi <akshat@krai.ai>
Signed-off-by: Oleg Mosalov <oleg@krai.ai>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Oleg Mosalov <oleg@krai.ai>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
2025-01-12 13:01:52 +00:00

Isotr0py
f967e51f38
[Model] Initialize support for Deepseek-VL2 models (#11578)
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-01-12 00:17:24 -08:00

Nicolò Lucchesi
d697dc01b4
[Bugfix] Fix RobertaModel loading (#11940)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-01-11 14:05:09 +00:00

Cyrus Leung
a991f7d508
[Doc] Basic guide for writing unit tests for new models (#11951)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-11 21:27:24 +08:00

Cyrus Leung
7a3a83e3b8
[CI/Build] Move model-specific multi-modal processing tests (#11934)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-11 13:50:05 +08:00

youkaichao
899136b857
[ci] fix broken distributed-tests-4-gpus (#11937)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-01-11 09:07:24 +08:00

Li, Jiang
aa1e77a19c
[Hardware][CPU] Support MOE models on x86 CPU (#11831)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-01-10 11:07:58 -05:00

Harry Mellor
482cdc494e
[Doc] Rename offline inference examples (#11927)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-01-10 23:50:29 +08:00

youkaichao
241ad7b301
[ci] Fix sampler tests (#11922)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-01-10 20:45:33 +08:00

Harry Mellor
d85c47d6ad
Replace "online inference" with "online serving" (#11923)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-01-10 12:05:56 +00:00

Joe Runde
ac2f3f7fee
[Bugfix] Validate lora adapters to avoid crashing server (#11727)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-01-10 15:56:36 +08:00

Chen Zhang
cf5f000d21
[torch.compile] Hide KV cache behind torch.compile boundary (#11677)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-01-10 13:14:42 +08:00

Cyrus Leung
b844b99ad3
[VLM] Enable tokenized inputs for merged multi-modal processor (#11900)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-10 03:24:00 +00:00

Cyrus Leung
9a228348d2
[Misc] Provide correct Pixtral-HF chat template (#11891)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-09 10:19:37 -07:00

youkaichao
bd82872211
[ci]try to fix flaky multi-step tests (#11894)
Signed-off-by: youkaichao <youkaichao@gmail.com>
2025-01-09 14:47:29 +00:00

wangxiyuan
405eb8e396
[platform] Allow platform specify attention backend (#11609)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: Mengqing Cao <cmq0113@163.com>
Co-authored-by: Mengqing Cao <cmq0113@163.com>
2025-01-09 21:46:50 +08:00

Cyrus Leung
0bd1ff4346
[Bugfix] Override dunder methods of placeholder modules (#11882)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-09 09:02:53 +00:00

Maximilien de Bayser
1fe554bac3
treat do_lower_case in the same way as the sentence-transformers library (#11815)
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
2025-01-09 11:05:43 +08:00

Tyler Michael Smith
615e4a5401
[CI] Turn on basic correctness tests for V1 (#10864)
2025-01-08 21:20:44 -05:00

Robert Shaw
56fe4c297c
[TPU][Quantization] TPU W8A8 (#11785)
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-01-08 19:33:29 +00:00

Harry Mellor
aba8d6ee00
[Doc] Move examples into categories (#11840)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-01-08 13:09:53 +00:00

Cyrus Leung
2a0596bc48
[VLM] Reorganize profiling/processing-related code (#11812)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-08 18:59:58 +08:00

youkaichao
889e662eae
[misc] improve memory profiling (#11809)
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2025-01-08 06:36:03 +00:00

Cyrus Leung
8f37be38eb
[Bugfix] Comprehensively test and fix LLaVA-NeXT feature size calculation (#11800)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-07 18:25:02 +08:00

Jee Jee Li
b278557935
[Kernel][LoRA]Punica prefill kernels fusion (#11234)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: Abatom <abzhonghua@gmail.com>
Co-authored-by: Zhonghua Deng <abatom@163.com>
2025-01-07 04:01:39 +00:00

Cyrus Leung
08fb75c72e
[Bugfix] Fix LLaVA-NeXT feature size precision error (for real) (#11772)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-07 01:10:54 +00:00

Roger Wang
91b361ae89
[V1] Extend beyond image modality and support mixed-modality inference with Llava-OneVision (#11685)
Signed-off-by: Roger Wang <ywang@roblox.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-06 19:58:16 +00:00

Chen Zhang
e20c92bb61
[Kernel] Move attn_type to Attention.__init__() (#11690)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
2025-01-07 00:11:28 +08:00

Jee Jee Li
32c9eff2ff
[Bugfix][V1] Fix molmo text-only inputs (#11676)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-01-06 15:22:25 +00:00

Cyrus Leung
996357e480
[VLM] Separate out profiling-related logic (#11746)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-06 16:02:21 +08:00

Rui Qiao
022c5c6944
[V1] Refactor get_executor_cls (#11754)
2025-01-06 07:59:16 +00:00

cennn
9e764e7b10
[distributed] remove pynccl's redundant change_state (#11749)
2025-01-06 09:05:48 +08:00

cennn
635b897246
[distributed] remove pynccl's redundant stream (#11744)
2025-01-05 23:09:11 +08:00

Jee Jee Li
47831430cc
[Bugfix][V1] Fix test_kv_cache_utils.py (#11738)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-01-04 16:07:59 +00:00

Cyrus Leung
ba214dffbe
[Bugfix] Fix precision error in LLaVA-NeXT (#11735)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-04 23:45:57 +08:00

Cyrus Leung
eed11ebee9
[VLM] Merged multi-modal processors for LLaVA-NeXT-Video and LLaVA-OneVision (#11717)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-04 11:40:53 +00:00

Yan Burman
300acb8347
[Core][Bugfix] Use correct device to initialize GPU data during CUDA-graph-capture (#11233)
Signed-off-by: Yan Burman <yanburman@users.noreply.github.com>
Signed-off-by: Ido Asraff <idoa@atero.ai>
2025-01-04 14:50:16 +08:00

xcnick
d91457d529
[V1] Add kv cache utils tests. (#11513)
Signed-off-by: xcnick <xcnick0412@gmail.com>
2025-01-04 14:49:46 +08:00

Robert Shaw
80c751e7f6
[V1] Simplify Shutdown (#11659)
2025-01-03 17:25:38 +00:00

Aurick Qiao
e1a5c2f0a1
[Model] Whisper model implementation (#11280)
Co-authored-by: Aurick Qiao <aurick.qiao@snowflake.com>
2025-01-03 16:39:19 +08:00

Cyrus Leung
8c38ee7007
[VLM] Merged multi-modal processor for LLaVA-NeXT (#11682)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-01-02 16:39:27 +00:00

Cyrus Leung
a115ac46b5
[VLM] Move supported limits and max tokens to merged multi-modal processor (#11669)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
2025-01-01 15:44:42 +00:00

Woosuk Kwon
73001445fb
[V1] Implement Cascade Attention (#11635)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-01-01 21:56:46 +09:00

Jee Jee Li
11d8a091c6
[Misc] Optimize Qwen2-VL LoRA test (#11663)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-01-01 14:42:23 +08:00

Cyrus Leung
365801fedd
[VLM] Add max-count checking in data parser for single image models (#11661)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Roger Wang <ywang@roblox.com>
Co-authored-by: Roger Wang <ywang@roblox.com>
2024-12-31 22:15:21 -08:00

Joe Runde
4db72e57f6
[Bugfix][Refactor] Unify model management in frontend (#11660)
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
2025-01-01 02:21:51 +00:00

Roger Wang
e7c7c5e822
[V1][VLM] V1 support for selected single-image models. (#11632)
Signed-off-by: Roger Wang <ywang@roblox.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Isotr0py <2037008807@qq.com>
2024-12-31 21:17:22 +00:00

Chen Zhang
8c3230d8c1
[V1] Simpify vision block hash for prefix caching by removing offset from hash (#11646)
2024-12-31 08:56:01 +00:00

sakunkun
2c5718809b
[Bugfix] Move the _touch(computed_blocks) call in the allocate_slots method to after the check for allocating new blocks. (#11565)
2024-12-31 06:29:04 +00:00

John Giorgi
82c49d3260
[Misc][LoRA] Support Rank Stabilized LoRA (RSLoRA) (#6909)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2024-12-30 22:15:58 -08:00