[Doc] Link to RFC for pooling optimizations (#21806)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
This commit is contained in:
@@ -7,9 +7,9 @@ These models use a [Pooler][vllm.model_executor.layers.pooler.Pooler] to extract
|
|||||||
before returning them.
|
before returning them.
|
||||||
|
|
||||||
!!! note
|
!!! note
|
||||||
We currently support pooling models primarily as a matter of convenience.
|
We currently support pooling models primarily as a matter of convenience. This is not guaranteed to have any performance improvement over using HF Transformers / Sentence Transformers directly.
|
||||||
As shown in the [Compatibility Matrix](../features/compatibility_matrix.md), most vLLM features are not applicable to
|
|
||||||
pooling models as they only work on the generation or decode stage, so performance may not improve as much.
|
We are now planning to optimize pooling models in vLLM. Please comment on <gh-issue:21796> if you have any suggestions!
|
||||||
|
|
||||||
## Configuration
|
## Configuration
|
||||||
|
|
||||||
|
|||||||
Reference in New Issue
Block a user