| Author | Commit | Message | Date |
|---|---|---|---|
| youkaichao | `c055747867` | [model][utils] add extract_layer_index utility function (#10599) | 2024-11-23 22:22:54 -08:00 |
| youkaichao | `eebad39f26` | [torch.compile] support all attention backends (#10558) | 2024-11-22 14:04:42 -08:00 |
| Isotr0py | `c4e464333e` | [Misc] Add uninitialized params tracking for AutoWeightsLoader (#10327) | 2024-11-18 09:07:46 +08:00 |
| Roger Wang | `643ecf7b11` | [V1] Refactor model executable interface for all text-only language models (#10374) | 2024-11-17 05:18:46 +00:00 |
| youkaichao | `f89d18ff74` | [6/N] pass whole config to inner model (#10205) | 2024-11-11 06:41:46 +00:00 |
| youkaichao | `1a95f10ee7` | [5/N] pass the whole config to model (#9983) | 2024-11-09 14:17:28 +08:00 |
| Joe Runde | `d58268c56a` | [V1] Make v1 more testable (#9888) | 2024-11-06 11:57:35 -08:00 |
| Aaron Pham | `21063c11c7` | [CI/Build] drop support for Python 3.8 EOL (#8464) | 2024-11-06 07:11:55 +00:00 |
| Yongzao | `2b5bf20988` | [torch.compile] Adding torch compile annotations to some models (#9876); Co-authored-by: youkaichao | 2024-11-01 00:25:47 -07:00 |
| Murali Andoorveedu | `0f6d7a9a34` | [Models] Add remaining model PP support (#7168); Co-authored-by: DarkLight1337 | 2024-10-04 10:56:58 +08:00 |
| Jee Jee Li | `e497b8aeff` | [Misc] Skip loading extra bias for Qwen2-MOE GPTQ models (#8329) | 2024-09-10 20:59:19 -04:00 |
| afeldman-nm | `428dd1445e` | [Core] Logprobs support in Multi-step (#7652) | 2024-08-29 19:19:08 -07:00 |
| Zijian Hu | `f4fc7337bf` | [Bugfix] support tie_word_embeddings for all models (#5724) | 2024-08-19 20:00:04 -07:00 |
| Dipika Sikka | `d3bdfd3ab9` | [Misc] Update Fused MoE weight loading (#7334) | 2024-08-13 14:57:45 -04:00 |
| Cyrus Leung | `7025b11d94` | [Bugfix] Fix weight loading for Chameleon when TP>1 (#7410) | 2024-08-13 05:33:41 +00:00 |
| xuyi | `1d2e7fb73f` | [Model] Pipeline parallel support for Qwen2 (#6924) | 2024-07-31 18:49:51 -07:00 |
| Robert Shaw | `fb6af8bc08` | [Misc] Apply MoE Refactor to Deepseekv2 To Support Fp8 (#6417) | 2024-07-13 20:03:58 -07:00 |
| Woosuk Kwon | `e72ae80b06` | [Bugfix] Support 2D input shape in MoE layer (#6287) | 2024-07-10 09:03:16 -04:00 |
| Qubitium-ModelCloud | `ee93f4f92a` | [CORE] Quantized lm-head Framework (#4442); Co-authored-by: Robert Shaw, ZX | 2024-07-02 22:25:17 +00:00 |
| Robert Shaw | `7c008c51a9` | [Misc] Refactor MoE to isolate Fp8 From Mixtral (#5970); Co-authored-by: Michael Goin | 2024-07-02 21:54:35 +00:00 |
| Murali Andoorveedu | `c5832d2ae9` | [Core] Pipeline Parallel Support (#4412) | 2024-07-02 10:58:08 -07:00 |
| Cody Yu | `a3a73ab069` | [Misc] Load FP8 kv-cache scaling factors from checkpoints (#4893). The 2nd PR for #4532: supports loading FP8 kv-cache scaling factors from an FP8 checkpoint (with `.kv_scale` parameter) | 2024-05-22 13:28:20 -07:00 |
| eigenLiu | `48d5985a08` | Sync huggingface modifications of qwen Moe model (#4774) | 2024-05-17 09:43:19 -07:00 |
| Woosuk Kwon | `0fca3cdcf2` | [Misc] Enhance attention selector (#4751) | 2024-05-13 10:47:25 -07:00 |
| Cody Yu | `a62aaf1df5` | [Misc][Refactor] Generalize linear_method to be quant_method (#4373) | 2024-04-26 16:41:14 -04:00 |
| Antoni Baum | `69e1d2fb69` | [Core] Refactor model loading code (#4097) | 2024-04-16 11:34:39 -07:00 |
| youkaichao | `63e7176f26` | [Core][Refactor] move parallel_utils into vllm/distributed (#3950) | 2024-04-10 15:33:30 -07:00 |
| wenyujin333 | `d6ea427f04` | [Model] Add support for Qwen2MoeModel (#3346) | 2024-03-28 15:19:59 +00:00 |