| Author | Commit | Message | Date |
|---|---|---|---|
| Woosuk Kwon | cec418b5df | [Model Runner V2] Change Numba AoT to JIT (#29328) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> | 2025-11-24 09:34:37 -08:00 |
| Woosuk Kwon | cc313cb73d | [Model Runner V2] Implement Single-step Eagle 1 (#29300) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> | 2025-11-24 09:32:27 -08:00 |
| Woosuk Kwon | 3e1ad40655 | [Model Runner V2] Add apply_temperature option to gumbel_sample (#29276) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> | 2025-11-23 14:13:00 -08:00 |
| Woosuk Kwon | 62d54ba46d | [Model Runner V2] Optimize CUDA graph capture time (#29275) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> | 2025-11-23 11:15:32 -08:00 |
| Woosuk Kwon | b004c00418 | [Model Runner V2] Support spec decoding [1/N] (#29274) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> | 2025-11-23 10:09:06 -08:00 |
| Woosuk Kwon | 7f12c82fa6 | [Model Runner V2] Change bookkeeping logic in preparation for spec decoding (#29194) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> | 2025-11-23 09:42:52 -08:00 |
| Woosuk Kwon | 20ee418adc | [Model Runner V2] Minor fix for cudagraph_utils (#29256) | 2025-11-22 20:12:50 -08:00 |
| Woosuk Kwon | e9056056fb | [Model Runner V2] Limit cudagraph size to max decode batch size (#29221) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> | 2025-11-21 20:21:35 -08:00 |
| Wentao Ye | 1d34eb11e0 | [CI] Bug: Fix triton import issue (#29202) Signed-off-by: yewentao256 <zhyanwentao@126.com> | 2025-11-21 17:14:49 -08:00 |
| Woosuk Kwon | e9af6ba62a | [Model Runner V2] Optimize Gumbel Sampling Kernel (#29210) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> | 2025-11-21 15:52:28 -08:00 |
| Woosuk Kwon | 1bed891f72 | [Chore] Fix pre-commit error after #25266 (#29190) | 2025-11-21 10:21:40 -08:00 |
| Woosuk Kwon | 30b44a1598 | GPU Model Runner V2 (#25266) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> | 2025-11-21 08:20:55 -08:00 |