[Doc] Fix typo in documentation (#14783)

Signed-off-by: yasu52 <tsuguro4649@gmail.com>
Author: yasu52
Committed: 2025-03-13 20:33:09 -07:00 (by GitHub)
Parent: d47807ba08
Commit: 3fb17d26c8
13 changed files with 19 additions and 19 deletions


@@ -1,6 +1,6 @@
# Reinforcement Learning from Human Feedback
-Reinforcement Learning from Human Feedback (RLHF) is a technique that fine-tunes language models using human-generated preference data to align model outputs with desired behaviours.
+Reinforcement Learning from Human Feedback (RLHF) is a technique that fine-tunes language models using human-generated preference data to align model outputs with desired behaviors.
vLLM can be used to generate the completions for RLHF. The best way to do this is with libraries like [TRL](https://github.com/huggingface/trl), [OpenRLHF](https://github.com/OpenRLHF/OpenRLHF) and [verl](https://github.com/volcengine/verl).
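For context, the snippet below is a minimal sketch (not part of this commit) of how vLLM's offline API can generate completions that an RLHF library such as TRL, OpenRLHF, or verl would then score; the model name, prompts, and sampling settings are illustrative placeholders.

```python
# Minimal sketch: generating completions with vLLM's offline API.
# The model name and prompts here are placeholders, not part of the commit above.
from vllm import LLM, SamplingParams

# Prompts that an RLHF training library would typically supply.
prompts = [
    "Explain reinforcement learning in one sentence.",
    "Summarize why preference data is useful for alignment.",
]

sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=128)

# Load a model and generate completions that a reward model could later score.
llm = LLM(model="facebook/opt-125m")
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(output.prompt, output.outputs[0].text)
```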