biondizzle/vllm
d215d1efca7a18eb2a19007f229bbb070bfbee93
vllm/vllm/v1/spec_decode
Matthias Gehre a889b7f584 [Bugfix] Pass drafter quant_config to ParallelLMHead in Eagle3 (#37280)
Signed-off-by: Matthias Gehre <matthias.gehre@amd.com>
2026-03-25 11:42:58 +00:00
__init__.py
[V1][BugFix] Add __init__.py to v1/spec_decode/ (#13359)
2025-02-16 09:39:08 -08:00
draft_model.py
[Spec Decode] Unified Parallel Drafting (#32887)
2026-02-05 12:37:18 -05:00
eagle.py
[Bugfix] Pass drafter quant_config to ParallelLMHead in Eagle3 (#37280)
2026-03-25 11:42:58 +00:00
extract_hidden_states.py
[Async][Spec Decoding] Zero-bubble async scheduling + spec decoding (#32951)
2026-03-23 15:37:22 -04:00
medusa.py
[CI] Enable mypy import following for vllm/spec_decode (#33282)
2026-01-30 06:43:32 +00:00
metadata.py
[V1][spec decode] return logprobs for spec decoding (#26060)
2025-10-22 22:59:59 -07:00
metrics.py
[Metrics] Some small refactoring for better maintainability (#33898)
2026-03-20 16:11:34 +00:00
ngram_proposer_gpu.py
[Bugfix] dtype mismatch in ngram gpu propose (#37246)
2026-03-17 05:19:55 +00:00
ngram_proposer.py
[Performance] Split FlashAttn attention and cache update (#25954)
2026-01-23 17:28:06 -08:00
suffix_decoding.py
[CI] Enable mypy import following for vllm/spec_decode (#33282)
2026-01-30 06:43:32 +00:00
utils.py
[Async][Spec Decoding] Zero-bubble async scheduling + spec decoding (#32951)
2026-03-23 15:37:22 -04:00