9 Commits

Author SHA1 Message Date
Woosuk Kwon
86ac7bcf84 [Model Runner V2] Support pooling models (#35120)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
2026-02-27 18:03:01 -08:00
Nick Hill
40b2f1c3d9 [Model Runner V2] Minor CPU optimizations (#34856)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
2026-02-19 16:05:37 -08:00
Nick Hill
e535d90deb [ModelRunner V2] Misc minor simplifications and optimizations (#33467)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
2026-02-01 22:17:14 +00:00
Nick Hill
6bf3b46d78 [ModelRunner V2] Misc code simplification and cleanup (#33266)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
2026-01-28 14:41:23 -08:00
Woosuk Kwon
d471b2aff0 [Model Runner V2] Support num NaNs in logits (#30187)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-12-09 10:00:49 -08:00
Woosuk Kwon
ae0ce1be27 [Model Runner V2][BugFix] Keep reference to GPU tensors in AsyncOutput (#29623)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-11-27 12:38:53 -08:00
Woosuk Kwon
7f12c82fa6 [Model Runner V2] Change bookkeeping logic in preparation for spec decoding (#29194)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-11-23 09:42:52 -08:00
Woosuk Kwon
1bed891f72 [Chore] Fix pre-commit error after #25266 (#29190)
2025-11-21 10:21:40 -08:00
Woosuk Kwon
30b44a1598 GPU Model Runner V2 (#25266)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-11-21 08:20:55 -08:00