biondizzle/vllm - vllm

Author	SHA1	Message	Date
Benjamin Chislett	af3162d3aa	[Spec Decode] Unified Parallel Drafting (#32887 ) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>	2026-02-05 12:37:18 -05:00
Eldar Kurtić	44f08af3a7	Add llmcompressor fp8 kv-cache quant (per-tensor and per-attn_head) (#30141 ) Signed-off-by: Eldar Kurtic <8884008+eldarkurtic@users.noreply.github.com> Signed-off-by: eldarkurtic <8884008+eldarkurtic@users.noreply.github.com>	2026-01-22 13:29:57 -07:00
Hanjie Qiu	5f9679a43b	[Spec Decode] Add support for EAGLE3 heads that do not use_aux_hidden_states (#27688 ) Signed-off-by: hjjq <hanjieq@nvidia.com> Signed-off-by: Benjamin Chislett <bchislett@nvidia.com> Co-authored-by: Benjamin Chislett <bchislett@nvidia.com>	2025-11-24 20:13:12 -05:00
Izzy Putterman	02f5903b84	Eagle: MM Cuda Graphs with MRope (#28896 ) Signed-off-by: Izzy Putterman <iputterman@nvidia.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-11-19 15:01:05 -05:00
Shreyas Kulkarni	95ae50b7d1	[Quantization] [Eagle] Add complete quantization support to the draft model in Eagle (#28435 ) Signed-off-by: Shreyas Kulkarni <shreyas.gp269@gmail.com>	2025-11-17 15:01:34 -08:00
Eldar Kurtić	e439c784fa	Add support for Eagle with separate lm-head and embed_tokens layers (#28549 ) Signed-off-by: Eldar Kurtic <8884008+eldarkurtic@users.noreply.github.com>	2025-11-15 06:12:02 -08:00
Harry Mellor	97d1c99302	Rename clashing method names for vLLM model protocol (#27583 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-12 19:14:33 -08:00
Jee Jee Li	9d1c474704	[LoRA][1/N]Remove LoRA extra vocab (#28382 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-11-11 11:06:21 -08:00
Harry Mellor	8fcaaf6a16	Update `Optional[x]` -> `x | None` and `Union[x, y]` to `x | y` (#26633 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-12 09:51:31 -07:00
Lucas Wilkinson	29255cfc3b	[Spec-Decode] Support piecewise cudagraphs for Eagle head (#25109 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com>	2025-10-10 01:20:31 -04:00
Rahul Tuli	cf4cd6c24f	Add: Support for multiple hidden layers in Eagle3 (#26164 ) Signed-off-by: Rahul Tuli <rtuli@redhat.com>	2025-10-09 07:30:50 +00:00
Rahul Tuli	05f6846ede	Support llama3 eagle3 head with llama4 verifier (#25961 ) Signed-off-by: rahul-tuli <rtuli@redhat.com> Signed-off-by: Rahul Tuli <rtuli@redhat.com>	2025-10-06 13:56:08 -04:00
Harry Mellor	d6953beb91	Convert formatting to use `ruff` instead of `yapf` + `isort` (#26247 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-05 07:06:22 -07:00
Rahul Tuli	145ac73317	[Bugfix][Speculative Decoding] Fix Eagle3 quantization config issue (#25883 ) Signed-off-by: Rahul Tuli <rtuli@redhat.com>	2025-09-29 11:37:20 -04:00
Tyler Michael Smith	a5354b3ed2	[Bugfix][WideEP] Apply TP Attn + EP MoE fix to other models (#24982 ) Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>	2025-09-27 14:22:28 +00:00
Cyrus Leung	27d7638b94	[Bugfix] Merge MM embeddings by index instead of token IDs (#16229 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: NickLucche <nlucches@redhat.com> Signed-off-by: Roger Wang <hey@rogerw.io> Co-authored-by: NickLucche <nlucches@redhat.com> Co-authored-by: Roger Wang <hey@rogerw.io>	2025-09-27 08:15:12 +00:00
WeiQing Chen	f1d53d150c	[Multimodal][Speculative Decoding]Eagle Eagle3 mm support, enablement on qwen2.5vl (#22872 ) Signed-off-by: Junhong <liujunhong11@huawei.com> Signed-off-by: Junhong Liu <98734602+LJH-LBJ@users.noreply.github.com> Co-authored-by: Junhong <liujunhong11@huawei.com> Co-authored-by: LJH-LBJ <98734602+LJH-LBJ@users.noreply.github.com>	2025-09-27 03:35:47 +00:00
jiahanc	d5944d5146	[Speculators][Speculative Decoding] Fix gpt-oss eagle3 accuracy issue (#25406 ) Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>	2025-09-23 15:44:35 -04:00
Woosuk Kwon	1c3ffdbecc	[V0 Deprecation] Remove V0 sampling metadata (#25345 ) Signed-off-by: Woosuk Kwon <woosuk@thinkingmachines.ai>	2025-09-21 10:37:11 -07:00
whx	4a9375fe9d	[Model] Pass param prefix to LLMHead (#24862 ) Signed-off-by: whx-sjtu <2952154980@qq.com>	2025-09-17 16:01:27 +08:00
vllmellm	8c54610265	[Bug] [Spec Dec]: Fix kv_cache dtype mismatch for Eagle3 drafter on FP8 target (#24505 ) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>	2025-09-16 04:45:38 +00:00
Wenlong Wang	53b42f4102	[BugFix][Spec Decode] Fix out-of-range index triggered by eagle3; re-enable test for LlamaForCausalLMEagle3 (#24392 ) Signed-off-by: wwl2755 <wangwenlong2755@gmail.com>	2025-09-09 21:24:23 -07:00
Dipika Sikka	dfbc1f8880	[Speculative Decoding] Add `speculators` config support (#21345 )	2025-08-01 08:25:18 -04:00
zhiweiz	9e0726e5bf	[Meta] Official Eagle mm support, first enablement on llama4 (#20788 ) Signed-off-by: morgendave <morgendave@gmail.com> Co-authored-by: Roger Wang <hey@rogerw.me>	2025-07-31 10:35:07 -07:00
Benjamin Chislett	3465b87ef8	[Bugfix] Fix EAGLE vocab embedding construction for Llama 70B (#19033 ) Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>	2025-06-05 19:10:08 -07:00
Simon Mo	02f0c7b220	[Misc] Add SPDX-FileCopyrightText (#19100 ) Signed-off-by: simon-mo <simon.mo@hey.com>	2025-06-03 11:20:17 -07:00
Benjamin Chislett	1bc86a3da1	[Bugfix] Fix EAGLE3 broken logits (#18909 ) Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>	2025-05-31 19:58:07 -07:00
Benjamin Chislett	583507d130	[Spec Decode] Make EAGLE3 draft token ID mapping optional (#18488 ) Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-05-22 20:17:39 -07:00
Mark McLoughlin	c6b636f9fb	[V1][Spec Decoding] Use model_loader.get_model() to load models (#18273 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2025-05-23 02:05:44 +00:00
Harry Mellor	26d0419309	Update deprecated type hinting in `models` (#18132 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-14 22:06:50 -07:00
Ekagra Ranjan	418d2f8bfb	[V1][Spec Decode] Share input embedding of target model with EAGLE draft model to free ~1GB for llama 3 model (#17326 ) Co-authored-by: root <root@ekagra-8xh100.us-east5-a.c.serving-efficiency-poc.internal> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-05-14 12:31:46 -07:00
qizixi	39c0813a7f	[V1][Spec Decode] Apply torch.compile & cudagraph to EAGLE3 (#17504 ) Signed-off-by: qizixi <qizixi@meta.com>	2025-05-01 16:19:30 -07:00
Bryan Lu	70788bdbdc	[V1][Spec Decode] Apply torch.compile & cudagraph to EAGLE (#17211 ) Signed-off-by: Bryan Lu <yuzhelu@amazon.com>	2025-04-29 21:10:00 +00:00
Woosuk Kwon	b278911229	[Minor][Models] Fix Return Types of Llama & Eagle (#17220 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-04-25 21:54:47 -07:00
Benjamin Chislett	a0e619e62a	[V1][Spec Decode] EAGLE-3 Support (#16937 ) Signed-off-by: Bryan Lu <yuzhelu@amazon.com> Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai> Co-authored-by: Bryan Lu <yuzhelu@amazon.com>	2025-04-25 15:43:07 -07:00

35 Commits