Zhewen Li
|
93c8672ceb
|
[Bugfix] Fix spec decode memory regression after #28549 (#28819)
Signed-off-by: zhewenli <zhewenli@meta.com>
|
2025-11-20 19:05:50 +08:00 |
|
Shreyas Kulkarni
|
95ae50b7d1
|
[Quantization] [Eagle] Add complete quantization support to the draft model in Eagle (#28435)
Signed-off-by: Shreyas Kulkarni <shreyas.gp269@gmail.com>
|
2025-11-17 15:01:34 -08:00 |
|
Eldar Kurtić
|
e439c784fa
|
Add support for Eagle with separate lm-head and embed_tokens layers (#28549)
Signed-off-by: Eldar Kurtic <8884008+eldarkurtic@users.noreply.github.com>
|
2025-11-15 06:12:02 -08:00 |
|
Harry Mellor
|
97d1c99302
|
Rename clashing method names for vLLM model protocol (#27583)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-11-12 19:14:33 -08:00 |
|
Rahul Tuli
|
e3b90c1ba2
|
[Bugfix][Speculative Decoding] Extend Eagle quantization config fix to llama_eagle.py (#26590)
Signed-off-by: Rahul Tuli <rtuli@redhat.com>
|
2025-10-13 17:17:13 +00:00 |
|
Harry Mellor
|
8fcaaf6a16
|
Update Optional[x] -> x | None and Union[x, y] to x | y (#26633)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-12 09:51:31 -07:00 |
|
Harry Mellor
|
d6953beb91
|
Convert formatting to use ruff instead of yapf + isort (#26247)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-10-05 07:06:22 -07:00 |
|
Tyler Michael Smith
|
a5354b3ed2
|
[Bugfix][WideEP] Apply TP Attn + EP MoE fix to other models (#24982)
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
|
2025-09-27 14:22:28 +00:00 |
|
Cyrus Leung
|
27d7638b94
|
[Bugfix] Merge MM embeddings by index instead of token IDs (#16229)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: NickLucche <nlucches@redhat.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2025-09-27 08:15:12 +00:00 |
|
jiahanc
|
d5944d5146
|
[Speculators][Speculative Decoding] Fix gpt-oss eagle3 accuracy issue (#25406)
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
|
2025-09-23 15:44:35 -04:00 |
|
Chen Ding
|
1a0a04dae9
|
[Perf] Optimize memory peak during EAGLE model loading. (#24585)
Signed-off-by: Chen Ding <candy.dc@alibaba-inc.com>
|
2025-09-19 03:31:16 +00:00 |
|
zhiweiz
|
9e0726e5bf
|
[Meta] Official Eagle mm support, first enablement on llama4 (#20788)
Signed-off-by: morgendave <morgendave@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.me>
|
2025-07-31 10:35:07 -07:00 |
|
Benjamin Chislett
|
3465b87ef8
|
[Bugfix] Fix EAGLE vocab embedding construction for Llama 70B (#19033)
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
|
2025-06-05 19:10:08 -07:00 |
|
Simon Mo
|
02f0c7b220
|
[Misc] Add SPDX-FileCopyrightText (#19100)
Signed-off-by: simon-mo <simon.mo@hey.com>
|
2025-06-03 11:20:17 -07:00 |
|
Mark McLoughlin
|
c6b636f9fb
|
[V1][Spec Decoding] Use model_loader.get_model() to load models (#18273)
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
|
2025-05-23 02:05:44 +00:00 |
|
Harry Mellor
|
26d0419309
|
Update deprecated type hinting in models (#18132)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-05-14 22:06:50 -07:00 |
|
Ekagra Ranjan
|
418d2f8bfb
|
[V1][Spec Decode] Share input embedding of target model with EAGLE draft model to free ~1GB for llama 3 model (#17326)
Co-authored-by: root <root@ekagra-8xh100.us-east5-a.c.serving-efficiency-poc.internal>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-05-14 12:31:46 -07:00 |
|
Bryan Lu
|
70788bdbdc
|
[V1][Spec Decode] Apply torch.compile & cudagraph to EAGLE (#17211)
Signed-off-by: Bryan Lu <yuzhelu@amazon.com>
|
2025-04-29 21:10:00 +00:00 |
|
Woosuk Kwon
|
b278911229
|
[Minor][Models] Fix Return Types of Llama & Eagle (#17220)
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
|
2025-04-25 21:54:47 -07:00 |
|
Benjamin Chislett
|
a0e619e62a
|
[V1][Spec Decode] EAGLE-3 Support (#16937)
Signed-off-by: Bryan Lu <yuzhelu@amazon.com>
Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>
Co-authored-by: Bryan Lu <yuzhelu@amazon.com>
|
2025-04-25 15:43:07 -07:00 |
|
Lily Liu
|
e8224f3dca
|
[V1][Spec Decode] Eagle Model loading (#16035)
Signed-off-by: LiuXiaoxuanPKU <lilyliupku@gmail.com>
|
2025-04-10 11:21:48 -07:00 |
|