Julien Denize
6a6108511f
[BUGFIX] Fix regex pattern for Mistral Tool Call (#29918)
Signed-off-by: juliendenize <julien.denize@mistral.ai>
(cherry picked from commit 1b1e35aaf9)
2025-12-02 15:08:47 -08:00
Julien Denize
9057fc2f1b
[BUGFIX] llama_4_scaling wrongly passed to DeepseekAttention (#29908)
Signed-off-by: juliendenize <julien.denize@mistral.ai>
(cherry picked from commit 5e5646e206)
2025-12-02 15:08:34 -08:00
Chauncey
a05b580540
[Bugfix] fix --scheduling-policy=priority & n>1 crashes engine (#29764)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
(cherry picked from commit 0a9caca9f5)
2025-12-02 15:08:24 -08:00
Sage Moore
b6ae5aeca6
[Bugfix][EPLB] Prevent user-provided EPLB config from being overwritten with defaults (#29911)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
(cherry picked from commit e6f114ac25)
2025-12-02 15:08:06 -08:00
jthomson04
5c7c09af8f
[Perf] Avoid pageable HtoD transfer in MinTokensLogitsProcessor (#29826)
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
(cherry picked from commit 1528e079e2)
2025-12-02 14:57:40 -08:00
Benjamin Bartels
7f718169d1
[CI/Build] Fixes missing runtime dependencies (#29822)
Signed-off-by: bbartels <benjamin@bartels.dev>
(cherry picked from commit 2d613de9ae)
2025-12-02 12:33:30 -08:00
Matthew Bonanni
339e84ce86
[Bugfix] Fix DeepSeek R1 MTP weight loading (#29545)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Benjamin Chislett <bchislett@nvidia.com>
(cherry picked from commit 51c57b51dd)
2025-12-02 12:33:18 -08:00
Cyrus Leung
34a8559be7
[Chore] Use tokenizer.encode and tokenizer.decode directly (#29851)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
(cherry picked from commit 68ffbca7e4)
2025-12-02 12:32:14 -08:00
Harry Mellor
85fb2e3120
Remove default values from InitVars so that they're not stored (#29859)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
(cherry picked from commit 951445a52d)
2025-12-02 12:32:06 -08:00
Julien Denize
d8c6210eea
Add Mistral Large 3 and Ministral 3 (#29757)
Signed-off-by: Julien Denize <julien.denize@mistral.ai>
Signed-off-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>
Signed-off-by: Mickael Seznec <mickael@mistral.ai>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Mickael Seznec <mickael@mistral.ai>
2025-12-02 10:29:00 +00:00
Louie Tsai
8bbcf8b6e7
[vLLM Benchmark Suite] Add default parameters section and update CPU benchmark cases (#29381)
Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
Signed-off-by: Louie Tsai <louie.tsai@intel.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Li, Jiang <bigpyj64@gmail.com>
2025-12-02 09:00:23 +00:00
Boyuan Feng
70fb77b4dc
[BugFix] add max-num-batched-token to scheduler hash (#29829)
Signed-off-by: Boyuan Feng <boyuan@meta.com>
2025-12-02 08:55:02 +00:00
杰兮
48d15a32aa
[CI] Fix Bad_words test for tokenizer encode/decode asymmetry (#28193)
Signed-off-by: zhyajie <yajizhan@amd.com>
Co-authored-by: zhyajie <yajizhan@amd.com>
2025-12-02 00:02:12 -08:00
Boyuan Feng
3b221cb661
[BugFix] respect VLLM_LOGGING_LEVEL in logger (#29761)
Signed-off-by: Boyuan Feng <boyuan@meta.com>
2025-12-02 07:49:16 +00:00
Wushi Dong
0037b5746a
[Core] Eliminate redundant is_encoder_decoder lookups (20-40us/step) (#29800)
Signed-off-by: Wushi Dong <dongws@meta.com>
2025-12-02 07:08:07 +00:00
Harry Mellor
f5b0846ba0
Fix some Transformers nightly tests (#29802)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-12-02 07:05:27 +00:00
Zhang Xiangze
13ea39bc09
[CPU] Parallelize over tokens in int4 moe (#29600)
Signed-off-by: Zhang Xiangze <Xiangze.Zhang@arm.com>
2025-12-02 06:21:39 +00:00
Shengqi Chen
4b612664fd
[CI] Renovation of nightly wheel build & generation (take 2) (#29838)
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
2025-12-01 22:17:10 -08:00
Cyrus Leung
653591d5e7
[Chore] Move tokenizer initialization methods (#29793)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-02 13:33:37 +08:00
Divakar Verma
e2fbfc955e
[CI][AMD] spec_decode:eagle skip FLASH_ATTN for deepseek on ROCm (#29827)
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
2025-12-02 05:27:46 +00:00
Divakar Verma
a690fb5bd6
[CI][ROCm] Fix test_correctness_sliding_window (#29243)
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-12-02 04:53:27 +00:00
usberkeley
81fe3f82af
[BugFix] Fix index error in ngram_proposer (#29779)
Signed-off-by: Bradley <bradley.b.pitt@gmail.com>
2025-12-02 04:48:11 +00:00
Zuyi Zhao
53bf71b0f0
[Misc] Update conftest for entrypoints/sagemaker test folder (#29799)
Signed-off-by: Zuyi Zhao <zhaozuy@amazon.com>
2025-12-01 18:56:39 -09:00
Johnny Yang
f441d36cee
Add missing return in _check_vllm_model_embed_input_ids (#29834)
Signed-off-by: Johnny Yang <johnnyyang@google.com>
2025-12-01 19:22:50 -08:00
Seiji Eicher
22274b2184
[Misc] Add ReplicaId to Ray metrics (#24267)
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
Co-authored-by: rongfu.leng <1275177125@qq.com>
2025-12-02 03:21:44 +00:00
Wei Wei
fc95521ba5
[Misc] Throw error on unintended access to scheduler_config.max_model_len (#29771)
Signed-off-by: Wei Wei <wwei6@meta.com>
2025-12-02 10:58:44 +08:00
Zhuohan Li
d0cd728907
[Core] Support resetting all running requests' KV while calling reset_prefix_cache (#28827)
Signed-off-by: Zhuohan Li <zhuohan123@gmail.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-12-02 02:25:05 +00:00
Andrew Xia
fa8804ad9c
[responsesAPI][4] fix responseOutputItem Kimi K2 thinking bug (#29555)
Signed-off-by: Andrew Xia <axia@fb.com>
Co-authored-by: Andrew Xia <axia@fb.com>
2025-12-02 02:11:35 +00:00
Divakar Verma
4b40924998
[ROCm] Fallback pytorch GELU with tanh approximation to GELU() (#29244)
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
Signed-off-by: Divakar Verma <137818590+divakar-amd@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-02 02:02:22 +00:00
Hendrik Holtmann
c0dfc89485
SM120 / NVFP4: add device guard and runtime SM dispatch to cutlass_scaled_fp4_mm (#29711)
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
2025-12-01 17:24:18 -08:00
Nick Hill
44822d7ff2
[BugFix] Preserve spec decoding uniform decode when scheduling (#29759)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-12-01 17:15:52 -08:00
Alexei-V-Ivanov-AMD
342c4f1472
Updated CI mirror 2025-11-25 (#29434)
Signed-off-by: Alexei V. Ivanov <alexei.ivanov@amd.com>
Signed-off-by: Alexei-V-Ivanov-AMD <156011006+Alexei-V-Ivanov-AMD@users.noreply.github.com>
Co-authored-by: Kevin H. Luu <khluu000@gmail.com>
2025-12-01 23:44:33 +00:00
Kevin H. Luu
1336a1ea24
Revert #29787 and #29690 (#29815)
2025-12-01 13:42:03 -08:00
Nengjun Ma
eaf81485ed
[Ascend]: Fixed the issue where OOT Platform vllm-ascend could not enable SP in Eager mode (#28935)
Signed-off-by: leo-pony <nengjunma@outlook.com>
2025-12-01 15:02:18 -05:00
Finbarr Timbers
38caf7fa1a
Update FAQ on interleaving sliding windows support (#29796)
Signed-off-by: Finbarr Timbers <finbarrtimbers@gmail.com>
2025-12-01 19:15:19 +00:00
shivampr
cabc77cc86
[Core][Observability] Add KV cache residency metrics (#27793)
Introduces three new Prometheus histograms for fine-grained observability of KV cache residency behavior:
- vllm:kv_block_lifetime_seconds: total lifetime of a block, from allocation to free
- vllm:kv_block_idle_before_evict_seconds: idle duration before eviction
- vllm:kv_block_reuse_gap_seconds: time between consecutive reuses of the same block
These metrics help operators analyze KV cache efficiency, reuse patterns, and eviction timing beyond simple utilization rates.
The implementation uses monotonic timestamps for accuracy and 1% sampling to keep overhead minimal (~48 bytes/block); it is fully thread-safe and has zero runtime cost when disabled.
Two new runtime flags are introduced:
- --kv-cache-metrics: enable KV cache residency metrics
- --kv-cache-metrics-sample: control the sampling ratio (default: 0.01)
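The three histograms above can be illustrated with a small sampled per-block tracker. This is a minimal Python sketch under stated assumptions, not the vLLM implementation: the class `KVResidencyTracker`, its `on_allocate`/`on_access`/`on_free` hooks, and the use of plain lists in place of Prometheus histograms are all hypothetical. It assumes the sampling decision is made once when a block is allocated, that the reuse gap is measured between consecutive accesses (not from allocation), and that idle time before eviction counts from the last access, or from allocation if the block was never accessed.

```python
import random
import threading
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class _SampledBlock:
    # Timestamps for one sampled block (hypothetical structure).
    allocated_at: float
    last_access: Optional[float] = None


class KVResidencyTracker:
    """Hypothetical sampled tracker; plain lists stand in for histograms."""

    def __init__(
        self,
        sample_rate: float = 0.01,
        clock: Callable[[], float] = time.monotonic,
        seed: Optional[int] = None,
    ) -> None:
        self.sample_rate = sample_rate
        self._clock = clock
        self._rng = random.Random(seed)
        self._lock = threading.Lock()  # hooks may be called from several threads
        self._live: Dict[int, _SampledBlock] = {}
        self.lifetime: List[float] = []
        self.idle_before_evict: List[float] = []
        self.reuse_gap: List[float] = []

    def on_allocate(self, block_id: int) -> None:
        # Decide once, at allocation, whether this block is tracked.
        if self._rng.random() >= self.sample_rate:
            return
        now = self._clock()
        with self._lock:
            self._live[block_id] = _SampledBlock(allocated_at=now)

    def on_access(self, block_id: int) -> None:
        now = self._clock()
        with self._lock:
            block = self._live.get(block_id)
            if block is None:
                return  # block was not sampled
            if block.last_access is not None:
                # Gap between this access and the previous one.
                self.reuse_gap.append(now - block.last_access)
            block.last_access = now

    def on_free(self, block_id: int, evicted: bool = False) -> None:
        now = self._clock()
        with self._lock:
            block = self._live.pop(block_id, None)
            if block is None:
                return
            self.lifetime.append(now - block.allocated_at)
            if evicted:
                # Idle since last access, or since allocation if never accessed.
                last = block.last_access if block.last_access is not None else block.allocated_at
                self.idle_before_evict.append(now - last)


# Usage with a fake clock so the numbers are deterministic.
fake_now = [0.0]
tracker = KVResidencyTracker(sample_rate=1.0, clock=lambda: fake_now[0])
tracker.on_allocate(7)
fake_now[0] = 1.0
tracker.on_access(7)
fake_now[0] = 3.0
tracker.on_access(7)
fake_now[0] = 10.0
tracker.on_free(7, evicted=True)
print(tracker.lifetime, tracker.reuse_gap, tracker.idle_before_evict)
# prints [10.0] [2.0] [7.0]
```

Injecting the clock keeps the sketch testable; outside tests you would keep the `time.monotonic` default, which, unlike wall-clock time, never goes backwards.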
Signed-off-by: Shivam <shivamprasad91@gmail.com>
2025-12-01 18:27:53 +00:00
Kevin H. Luu
ec7035c9d4
[ci] Make distributed 8 gpus test optional (#29801)
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>
2025-12-01 10:22:05 -08:00
knlnguyen1802
fc6acc88ca
[Bugfix] Missing cached item in the MultiModalReceiverCache (#28525)
Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>
Co-authored-by: Chenguang Zheng <645327136@qq.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-12-01 10:18:07 -08:00
BADAOUI Abdennacer
d0985c5feb
[Hardware][AMD] Remove ROCm skip conditions for transformers backend tests (#29782)
Signed-off-by: badaoui <abdennacerbadaoui0@gmail.com>
2025-12-02 02:03:13 +08:00
sangbumlikeagod
092bb73b8a
[Frontend] add 'verbose_json' and 'timestamp' feature on Whisper Transcription/Translation (#24209)
Signed-off-by: sangbumlikeagod <oironese@naver.com>
Signed-off-by: sangbumlikeagod <98077576+sangbumlikeagod@users.noreply.github.com>
2025-12-01 18:19:17 +01:00
FredericOdermatt
5d43f7372e
[Doc] Update description disable_any_whitespace (#29784)
Signed-off-by: Frederic Odermatt <frederic.odermatt@44ai.ch>
2025-12-01 16:48:33 +00:00
Shengqi Chen
37593deb02
[CI] fix url-encoding behavior in nightly metadata generation (#29787)
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
2025-12-01 23:17:20 +08:00
Liu Jinyi
f5516039c5
[Doc] fix heading levels (#29783)
Signed-off-by: KKKZOZ <kkkzoz@qq.com>
2025-12-01 14:49:22 +00:00
Shengqi Chen
36db0a35e4
[CI] Renovation of nightly wheel build & generation (#29690)
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
2025-12-01 21:25:39 +08:00
Marcin Ostrowski
5cfa967efa
[Bugfix] TypeError: 'NoneType' object is not callable (#29414)
Signed-off-by: Marcin Ostrowski <marcinx.ostrowski@intel.com>
2025-12-01 13:16:44 +00:00
Isotr0py
b95db244ee
[v1] Add real sliding window calculation to FlexAttention direct BlockMask building (#26015)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com>
Co-authored-by: baonudesifeizhai <baonudesifeizhai@gmail.com>
2025-12-01 13:12:51 +00:00
Zhengxu Chen
ad9d656bfa
[multimodal][test] Reduce memory utilization for test_siglip to avoid OOM (#29504)
Signed-off-by: zhxchen17 <zhxchen17@fb.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-12-01 20:41:48 +08:00
Fanli Lin
f37e8938d2
[XPU] Fix AWQ skipped layer detection in IPEX quantization (#29774)
Signed-off-by: Fanli Lin <fanli.lin@intel.com>
2025-12-01 12:00:52 +00:00
Cyrus Leung
f0a28bf661
[Misc] Unify tokenizer registration (#29767)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-12-01 11:34:58 +00:00
Mickaël Seznec
86e178f7c4
[crashfix] Eagle + multimodal can crash on mm cache miss (#29750)
Signed-off-by: Mickael Seznec <mickael@mistral.ai>
Co-authored-by: Roger Wang <hey@rogerw.io>
2025-12-01 17:29:33 +08:00