Isotr0py
63b1da76ba
[Chore]: Reorganize gguf utils functions under transformers_utils ( #29891 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-12-02 17:33:23 +00:00
Harry Mellor
951445a52d
Remove default values from InitVars so that they're not stored ( #29859 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-12-02 12:16:37 +00:00
Julien Denize
d8c6210eea
Add Mistral Large 3 and Ministral 3 ( #29757 )
...
Signed-off-by: Julien Denize <julien.denize@mistral.ai >
Signed-off-by: Julien Denize <40604584+juliendenize@users.noreply.github.com >
Signed-off-by: Mickael Seznec <mickael@mistral.ai >
Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored-by: Roger Wang <hey@rogerw.io >
Co-authored-by: Mickael Seznec <mickael@mistral.ai >
2025-12-02 10:29:00 +00:00
Boyuan Feng
70fb77b4dc
[BugFix] add max-num-batched-token to scheduler hash ( #29829 )
...
Signed-off-by: Boyuan Feng <boyuan@meta.com >
2025-12-02 08:55:02 +00:00
Wei Wei
fc95521ba5
[Misc] Throw error on unintended access to scheduler_config.max_model_len ( #29771 )
...
Signed-off-by: Wei Wei <wwei6@meta.com >
2025-12-02 10:58:44 +08:00
Nengjun Ma
eaf81485ed
[Ascend]: Fixed the issue where OOT Platform vllm-ascend could not enable SP in Eager mode ( #28935 )
...
Signed-off-by: leo-pony <nengjunma@outlook.com >
2025-12-01 15:02:18 -05:00
shivampr
cabc77cc86
[Core][Observability] Add KV cache residency metrics ( #27793 )
...
Introduces three new Prometheus histograms for fine-grained observability of KV cache residency behavior:
- vllm:kv_block_lifetime_seconds: total lifetime from allocation to free
- vllm:kv_block_idle_before_evict_seconds: idle duration before eviction
- vllm:kv_block_reuse_gap_seconds: time between consecutive reuses of the same block
These metrics help operators analyze KV cache efficiency, reuse patterns, and eviction timing beyond simple utilization rates.
The implementation uses monotonic timestamps for accuracy and 1% sampling for minimal overhead (~48 bytes/block); it is fully thread-safe, with zero runtime cost when disabled.
Two new runtime flags are introduced:
- --kv-cache-metrics: enable KV cache residency metrics
- --kv-cache-metrics-sample: control the sampling ratio (default: 0.01)
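To illustrate what the three histograms measure, the following is a minimal sketch and not the code from this commit. The class name BlockResidencySampler, its hook methods, and the plain lists standing in for Prometheus histograms are all hypothetical. It also assumes that freeing and evicting a block are the same event, and unlike the implementation described above it is not thread-safe.

```python
import random
import time


class BlockResidencySampler:
    """Hypothetical sampled tracker for KV block residency timings.

    Illustration only: names and structure are assumptions, not vLLM's API.
    """

    def __init__(self, sample_rate=0.01, clock=time.monotonic, rng=None):
        if not 0.0 <= sample_rate <= 1.0:
            raise ValueError("sample_rate must be in [0, 1]")
        self.sample_rate = sample_rate
        self.clock = clock  # monotonic clock: immune to wall-clock jumps
        self.rng = rng or random.Random()
        self._tracked = {}  # block_id -> [alloc_time, last_access_time]
        # Plain lists stand in for the three histograms.
        self.lifetimes = []
        self.idle_before_evict = []
        self.reuse_gaps = []

    def on_alloc(self, block_id):
        # Track only a sampled fraction of blocks to bound the overhead.
        if self.rng.random() < self.sample_rate:
            now = self.clock()
            self._tracked[block_id] = [now, now]

    def on_access(self, block_id):
        state = self._tracked.get(block_id)
        if state is None:
            return  # block was not sampled
        now = self.clock()
        self.reuse_gaps.append(now - state[1])  # gap since previous use
        state[1] = now

    def on_free(self, block_id):
        state = self._tracked.pop(block_id, None)
        if state is None:
            return  # block was not sampled
        now = self.clock()
        self.lifetimes.append(now - state[0])          # allocation to free
        self.idle_before_evict.append(now - state[1])  # last use to free
```

With sample_rate=0.0 no block is ever tracked, so the per-event cost reduces to one random draw or one dictionary lookup; a real implementation would presumably skip the hooks entirely when the feature is disabled.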
Signed-off-by: Shivam <shivamprasad91@gmail.com >
2025-12-01 18:27:53 +00:00
FredericOdermatt
5d43f7372e
[Doc] Update description disable_any_whitespace ( #29784 )
...
Signed-off-by: Frederic Odermatt <frederic.odermatt@44ai.ch >
2025-12-01 16:48:33 +00:00
Cyrus Leung
f0a28bf661
[Misc] Unify tokenizer registration ( #29767 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-12-01 11:34:58 +00:00
Xingyu Liu
21c2627934
[Misc]Remove redundant hidden_size property in ModelConfig ( #29749 )
...
Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-11-30 17:14:23 +00:00
Cyrus Leung
64bc09ba27
[Core] Enable inputs_embeds_size separate from hidden_size ( #29741 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-11-30 17:31:12 +08:00
Zhengxu Chen
6173682b6e
[compile] Include enable_sleep_mode into caching factors. ( #29696 )
...
Signed-off-by: zhxchen17 <zhxchen17@fb.com >
2025-11-29 07:58:38 +08:00
Yanan Cao
3461e7efd8
[Frontend] Remap -O to -cc commandline flag ( #29557 )
...
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com >
Co-authored-by: Claude <noreply@anthropic.com >
2025-11-28 21:51:12 +00:00
wang.yuqi
f4b76056ee
Improve enable chunked_prefill & prefix_caching logic. ( #26623 )
...
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io >
Signed-off-by: wang.yuqi <noooop@126.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-11-27 22:05:48 -08:00
Cyrus Leung
a24ea5414b
[Deprecation] Advance deprecation status ( #29617 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-11-27 19:04:58 +00:00
Injae Ryou
0840abdd24
[BugFix] Optional tokenizer argument when loading GGUF models ( #29582 )
...
Signed-off-by: Injae Ryou <injaeryou@gmail.com >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-11-27 16:53:10 +00:00
Matthew Bonanni
fc1d8be3dc
[Attention] Update attention imports ( #29540 )
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com >
2025-11-27 11:19:09 -05:00
Didier Durand
66d3d5422c
[Doc]: fixing typos in diverse files ( #29492 )
...
Signed-off-by: Didier Durand <durand.didier@gmail.com >
2025-11-27 07:15:50 -08:00
Morrison Turnansky
0838b52e2e
[Frontend][torch.compile] CompilationConfig Overhaul ( #20283 ): Set up -O infrastructure ( #26847 )
...
Signed-off-by: morrison-turnansky <mturnans@redhat.com >
Signed-off-by: adabeyta <aabeyta@redhat.com >
Signed-off-by: Morrison Turnansky <mturnans@redhat.com >
Co-authored-by: adabeyta <aabeyta@redhat.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-11-27 01:55:58 -08:00
George D. Torres
56531b79cc
[Misc] Add backup hash algorithm for FIPS constrained environments ( #28795 )
...
Signed-off-by: George D. Torres <gdavtor@gmail.com >
Signed-off-by: George D. Torres <41129492+geodavic@users.noreply.github.com >
Signed-off-by: Russell Bryant <rbryant@redhat.com >
Co-authored-by: Russell Bryant <rbryant@redhat.com >
2025-11-26 00:50:22 +00:00
Lucia Fang
d8819c88eb
fix assertion for single world use case (uni) ( #29429 )
...
Signed-off-by: Lu Fang <fanglu@fb.com >
Co-authored-by: Lucia (Lu) Fang <fanglu@meta.com >
2025-11-26 00:14:23 +00:00
Zhengxu Chen
0abc79482a
[caching] Add enable_prompt_embeds and cpu_offload_gb to compile hashes. ( #29435 )
...
Signed-off-by: zhxchen17 <zhxchen17@fb.com >
2025-11-25 21:46:41 +00:00
Harry Mellor
a1f2676879
Scheduled removal of override_pooler_config and disable_log_requests ( #29402 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-11-25 16:08:57 +00:00
Injae Ryou
794029f012
[Feature]: Improve GGUF loading from HuggingFace user experience like repo_id:quant_type ( #29137 )
...
Signed-off-by: Injae Ryou <injaeryou@gmail.com >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-11-25 14:28:53 +00:00
Harry Mellor
51fc9e017a
Scheduled removal of CompilationConfig.use_inductor ( #29323 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-11-25 12:55:42 +00:00
Icey
888152bf87
Allow oot custom compiler extension via CompilerInterface ( #28623 )
...
Signed-off-by: wxsIcey <1790571317@qq.com >
Signed-off-by: Mengqing Cao <cmq0113@163.com >
Signed-off-by: Icey <1790571317@qq.com >
Co-authored-by: Mengqing Cao <cmq0113@163.com >
2025-11-25 15:25:15 +08:00
Isotr0py
92effb07a4
[Model] Add HunyuanOCR support ( #29327 )
...
Signed-off-by: manayang <jackmanayang@gmail.com >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Signed-off-by: Roger Wang <hey@rogerw.io >
Co-authored-by: sergeywang <sergeywang@tencent.com >
Co-authored-by: manayang <jackmanayang@gmail.com >
Co-authored-by: manayang <manayang@tencent.com >
Co-authored-by: Roger Wang <hey@rogerw.io >
2025-11-25 03:28:51 +00:00
Maryam Tahhan
87185c88d5
[Bugfix] Make deprecated --task embedding consistent with `--runner… ( #29312 )
...
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com >
2025-11-25 03:19:52 +00:00
Harry Mellor
a4ad43ad5a
Scheduled removal of ParallelConfig's direct child EPLB fields ( #29324 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-11-25 01:58:58 +00:00
Laith Sakka
7a228b5305
Add option to use unbacked, and backed size obl dynamic shapes for more sound compilation. ( #26199 )
...
Signed-off-by: Laith Sakka <lsakka@meta.com >
2025-11-24 10:12:41 -05:00
WeiQing Chen
2601f18a82
[EPLB] Optimize EPLB for Async Rearrange Experts ( #22179 )
...
Signed-off-by: David Chen <530634352@qq.com >
Co-authored-by: SunChenxiang123 <1291824390@qq.com >
2025-11-24 09:08:29 -05:00
Didier Durand
eca7a8fb59
[Doc]: fix typos in various files ( #29230 )
...
Signed-off-by: Didier Durand <durand.didier@gmail.com >
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-11-24 11:10:48 +00:00
Roger Wang
0ff70821c9
[Core] Deprecate xformers ( #29262 )
...
Signed-off-by: Roger Wang <hey@rogerw.io >
2025-11-24 04:18:55 +00:00
Cyrus Leung
5a4802588e
[Misc] Further clean up chunked prefill and prefix caching init ( #29186 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-11-22 19:34:15 +08:00
Lucas Wilkinson
30d6466238
[BugFix] Fix Eagle IndexError: list index out of range for even num_speculative_tokens ( #29102 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-11-22 00:47:05 +00:00
Julien Denize
57430fc95c
Default model load/config/tokenizer to mistral format if relevant files exist ( #28659 )
...
Signed-off-by: Julien Denize <julien.denize@mistral.ai >
Signed-off-by: Julien Denize <40604584+juliendenize@users.noreply.github.com >
Signed-off-by: mgoin <mgoin64@gmail.com >
Signed-off-by: Michael Goin <mgoin64@gmail.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: mgoin <mgoin64@gmail.com >
2025-11-21 13:58:59 -08:00
Cyrus Leung
d7219bcda3
[Misc] Move dynamic seed initialization to EngineArgs ( #29165 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2025-11-21 15:27:44 +00:00
Boyuan Feng
8c25f9cfb6
[BugFix] skip combo kernel on cpu ( #29129 )
...
Signed-off-by: Boyuan Feng <boyuan@meta.com >
2025-11-21 11:50:59 +08:00
Jee Jee Li
9875be6431
[LoRA][2/2]Remove LoRA extra vocab ( #28545 )
...
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com >
2025-11-21 09:46:43 +08:00
Cyrus Leung
20e4497be2
[V0 Deprecation] Remove num_lookahead_slots ( #29000 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2025-11-20 06:39:10 +00:00
Isotr0py
64192d5624
[Bugfix] Revert custom attention mask for gemma3-mm ( #28995 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-11-20 13:23:22 +08:00
Lucas Wilkinson
8f4f77a727
[BugFix] Fix false assertion with spec-decode=[2,4,..] and TP>2 ( #29036 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-11-19 13:43:54 -08:00
Qiu
2fd893b4ce
[Feature] Prefill Context Parallel (PCP) basic support ( #28718 )
...
Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com >
Signed-off-by: FENP <yuanyongjie.yyj@antgroup.com >
Signed-off-by: LookAround <lixushi@huawei.com >
Signed-off-by: Jingchun Gao <gaojingchun1@huawei.com >
Signed-off-by: zhenwenqi2024 <zhenwenqi_2022@qq.com >
Co-authored-by: FENP <yuanyongjie.yyj@antgroup.com >
Co-authored-by: LookAround <lixushi@huawei.com >
Co-authored-by: Jingchun Gao <gaojingchun1@huawei.com >
Co-authored-by: zhenwenqi2024 <zhenwenqi_2022@qq.com >
Co-authored-by: Jingchun Gao <63247409+gjc0824@users.noreply.github.com >
2025-11-19 15:52:44 -05:00
Harry Mellor
a8b70304d6
Update rope_scaling to rope_parameters in preparation for Transformers v5 ( #28542 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2025-11-19 09:06:36 -08:00
vnadathur
1ffe934c8a
[torch.compile] caching of config fields should be opt-out by default ( #26468 )
...
Signed-off-by: vnadathur <glvikramn@gmail.com >
Signed-off-by: WorldExplored <srreyansh.sethi@gmail.com >
Signed-off-by: Srreyansh Sethi <srreyansh.sethi@gmail.com >
Signed-off-by: Srreyansh Sethi <107075589+WorldExplored@users.noreply.github.com >
Co-authored-by: WorldExplored <srreyansh.sethi@gmail.com >
Co-authored-by: Srreyansh Sethi <107075589+worldexplored@users.noreply.github.com >
Co-authored-by: vnadathur <236933696+vnadathur@users.noreply.github.com >
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
2025-11-19 06:13:54 -08:00
Tova Movshovitz
ba558c029a
[config] Expose get_total_num_hidden_layers() in ModelConfig ( #28961 )
...
Signed-off-by: tovam <tovam@pliops.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2025-11-19 11:37:11 +00:00
Li, Jiang
20852c8f4c
[CPU] Refactor CPU WNA16 ( #28826 )
...
Signed-off-by: jiang1.li <jiang1.li@intel.com >
2025-11-19 10:32:00 +08:00
Luciano Martins
c2612371ad
[Model] Add Gemma3 GGUF multimodal support ( #27772 )
...
Signed-off-by: Luciano Martins <lucianommartins@users.noreply.github.com >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Luciano Martins <lucianommartins@users.noreply.github.com >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2025-11-18 08:56:29 -08:00
Ronald
d8874c61a5
[Core] Async Scheduling X Spec Decoding Compatibility ( #24799 )
...
Signed-off-by: Ronald1995 <ronaldautomobile@163.com >
Signed-off-by: Nick Hill <nhill@redhat.com >
Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com >
Co-authored-by: Nick Hill <nhill@redhat.com >
Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com >
2025-11-17 12:16:20 -08:00
Lucas Wilkinson
64e39d667c
[BugFix] Temporary fix for IMA with MTP = 2 and full-cg ( #28315 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
2025-11-17 09:41:22 -05:00