Cyrus Leung | 64bc09ba27 | 2025-11-30 17:31:12 +08:00
[Core] Enable inputs_embeds_size separate from hidden_size (#29741)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

Woosuk Kwon | 6afc0ffaf6 | 2025-11-29 00:41:01 -08:00
[Model Runner V2] Add sample/ directory and reorganize files (#29719)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>

Woosuk Kwon | 4a80ad0a25 | 2025-11-28 20:27:16 -08:00
[Model Runner V2] Don't use UVA buffer for prefill_len (#29713)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>

Woosuk Kwon | ca1b1e7296 | 2025-11-28 19:49:17 -08:00
[Model Runner V2] Refactor prefill token preparation (#29712)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>

Woosuk Kwon | 1dcafb3dea | 2025-11-28 17:53:17 -08:00
[Model Runner V2] Support penalties using bin counts (#29703)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>

Woosuk Kwon | da3222f371 | 2025-11-27 00:09:41 -08:00
[Model Runner V2] Implement multi-step Eagle with CUDA graph (#29559)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>

Woosuk Kwon | ee80aee1ca | 2025-11-26 20:10:12 -08:00
[Model Runner V2] Minor cleanup for build_attn_metadata (#29576)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>

Woosuk Kwon | 0aeb698b77 | 2025-11-26 19:47:17 -08:00
[Model Runner V2] Minor code cleanup (#29570)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>

Woosuk Kwon | f32c7d6f54 | 2025-11-24 13:54:59 -08:00
[Model Runner V2] Simplify Eagle bookkeeping with num_rejected (#29347)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>

Woosuk Kwon | cec418b5df | 2025-11-24 09:34:37 -08:00
[Model Runner V2] Change Numba AoT to JIT (#29328)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>

Woosuk Kwon | cc313cb73d | 2025-11-24 09:32:27 -08:00
[Model Runner V2] Implement Single-step Eagle 1 (#29300)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>

Woosuk Kwon | 62d54ba46d | 2025-11-23 11:15:32 -08:00
[Model Runner V2] Optimize CUDA graph capture time (#29275)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>

Woosuk Kwon | b004c00418 | 2025-11-23 10:09:06 -08:00
[Model Runner V2] Support spec decoding [1/N] (#29274)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>

Woosuk Kwon | 7f12c82fa6 | 2025-11-23 09:42:52 -08:00
[Model Runner V2] Change bookkeeping logic in preparation for spec decoding (#29194)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>

Woosuk Kwon | 20ee418adc | 2025-11-22 20:12:50 -08:00
[Model Runner V2] Minor fix for cudagraph_utils (#29256)

Woosuk Kwon | 1bed891f72 | 2025-11-21 10:21:40 -08:00
[Chore] Fix pre-commit error after #25266 (#29190)

Woosuk Kwon | 30b44a1598 | 2025-11-21 08:20:55 -08:00
GPU Model Runner V2 (#25266)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>