Commit Graph

12 Commits

Author SHA1 Message Date
Woosuk Kwon
cec418b5df [Model Runner V2] Change Numba AoT to JIT (#29328)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-11-24 09:34:37 -08:00
Woosuk Kwon
cc313cb73d [Model Runner V2] Implement Single-step Eagle 1 (#29300)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-11-24 09:32:27 -08:00
Woosuk Kwon
3e1ad40655 [Model Runner V2] Add apply_temperature option to gumbel_sample (#29276)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-11-23 14:13:00 -08:00
Woosuk Kwon
62d54ba46d [Model Runner V2] Optimize CUDA graph capture time (#29275)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-11-23 11:15:32 -08:00
Woosuk Kwon
b004c00418 [Model Runner V2] Support spec decoding [1/N] (#29274)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-11-23 10:09:06 -08:00
Woosuk Kwon
7f12c82fa6 [Model Runner V2] Change bookkeeping logic in preparation for spec decoding (#29194)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-11-23 09:42:52 -08:00
Woosuk Kwon
20ee418adc [Model Runner V2] Minor fix for cudagraph_utils (#29256)
2025-11-22 20:12:50 -08:00
Woosuk Kwon
e9056056fb [Model Runner V2] Limit cudagraph size to max decode batch size (#29221)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-11-21 20:21:35 -08:00
Wentao Ye
1d34eb11e0 [CI] Bug: Fix triton import issue (#29202)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-11-21 17:14:49 -08:00
Woosuk Kwon
e9af6ba62a [Model Runner V2] Optimize Gumbel Sampling Kernel (#29210)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-11-21 15:52:28 -08:00
Woosuk Kwon
1bed891f72 [Chore] Fix pre-commit error after #25266 (#29190)
2025-11-21 10:21:40 -08:00
Woosuk Kwon
30b44a1598 GPU Model Runner V2 (#25266)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
2025-11-21 08:20:55 -08:00