biondizzle/vllm - vllm - Gitea

Author	SHA1	Message	Date
Boyuan Feng	70fb77b4dc	[BugFix] add max-num-batched-token to scheduler hash (#29829 ) Signed-off-by: Boyuan Feng <boyuan@meta.com>	2025-12-02 08:55:02 +00:00
Boyuan Feng	3b221cb661	[BugFix] respect VLLM_LOGGING_LEVEL in logger (#29761 ) Signed-off-by: Boyuan Feng <boyuan@meta.com>	2025-12-02 07:49:16 +00:00
Wushi Dong	0037b5746a	[Core] Eliminate redundant is_encoder_decoder lookups (20-40us/step) (#29800 ) Signed-off-by: Wushi Dong <dongws@meta.com>	2025-12-02 07:08:07 +00:00
Harry Mellor	f5b0846ba0	Fix some Transformers nightly tests (#29802 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-12-02 07:05:27 +00:00
Shengqi Chen	4b612664fd	[CI] Renovation of nightly wheel build & generation (take 2) (#29838 ) Signed-off-by: Shengqi Chen <harry-chen@outlook.com>	2025-12-01 22:17:10 -08:00
Cyrus Leung	653591d5e7	[Chore] Move tokenizer initialization methods (#29793 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-02 13:33:37 +08:00
usberkeley	81fe3f82af	[BugFix] Fix index error in ngram_proposer (#29779 ) Signed-off-by: Bradley <bradley.b.pitt@gmail.com>	2025-12-02 04:48:11 +00:00
Johnny Yang	f441d36cee	Add missing return in _check_vllm_model_embed_input_ids (#29834 ) Signed-off-by: Johnny Yang <johnnyyang@google.com>	2025-12-01 19:22:50 -08:00
Seiji Eicher	22274b2184	[Misc] Add ReplicaId to Ray metrics (#24267 ) Signed-off-by: Seiji Eicher <seiji@anyscale.com> Co-authored-by: rongfu.leng <1275177125@qq.com>	2025-12-02 03:21:44 +00:00
Wei Wei	fc95521ba5	[Misc] Throw error on unintended access to scheduler_config.max_model_len (#29771 ) Signed-off-by: Wei Wei <wwei6@meta.com>	2025-12-02 10:58:44 +08:00
Zhuohan Li	d0cd728907	[Core] Support reseting all running requests' KV while calling `reset_prefix_cache` (#28827 ) Signed-off-by: Zhuohan Li <zhuohan123@gmail.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-12-02 02:25:05 +00:00
Andrew Xia	fa8804ad9c	[responsesAPI][4] fix responseOutputItem Kimi K2 thinking bug (#29555 ) Signed-off-by: Andrew Xia <axia@fb.com> Co-authored-by: Andrew Xia <axia@fb.com>	2025-12-02 02:11:35 +00:00
Divakar Verma	4b40924998	[ROCm] Fallback pytorch GELU with tanh approximation to GELU() (#29244 ) Signed-off-by: Divakar Verma <divakar.verma@amd.com> Signed-off-by: Divakar Verma <137818590+divakar-amd@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-12-02 02:02:22 +00:00
Nick Hill	44822d7ff2	[BugFix] Preserve spec decoding uniform decode when scheduling (#29759 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-12-01 17:15:52 -08:00
Kevin H. Luu	1336a1ea24	Revert #29787 and #29690 (#29815 )	2025-12-01 13:42:03 -08:00
Nengjun Ma	eaf81485ed	[Ascend]: Fixed the issue where OOT Platform vllm-ascend could not enable SP in Eager mode (#28935 ) Signed-off-by: leo-pony <nengjunma@outlook.com>	2025-12-01 15:02:18 -05:00
shivampr	cabc77cc86	[Core][Observability] Add KV cache residency metrics (#27793 ) Introduces three new Prometheus histograms for fine-grained observability of KV cache residency behavior: `vllm:kv_block_lifetime_seconds` (total lifetime from allocation to free), `vllm:kv_block_idle_before_evict_seconds` (idle duration before eviction), and `vllm:kv_block_reuse_gap_seconds` (time between consecutive reuses of the same block). These metrics help operators analyze KV cache efficiency, reuse patterns, and eviction timing beyond simple utilization rates. Implementation uses monotonic timestamps for accuracy, 1% sampling for minimal overhead (~48 bytes/block), and is fully thread-safe with zero runtime cost when disabled. Two new runtime flags are introduced: `--kv-cache-metrics` (enable KV cache residency metrics) and `--kv-cache-metrics-sample` (control sampling ratio, default: 0.01). Signed-off-by: Shivam <shivamprasad91@gmail.com>	2025-12-01 18:27:53 +00:00
knlnguyen1802	fc6acc88ca	[Bugfix] Missing cached item in the MultiModalReceiverCache (#28525 ) Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com> Co-authored-by: Chenguang Zheng <645327136@qq.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-12-01 10:18:07 -08:00
sangbumlikeagod	092bb73b8a	[Frontend] add 'verbose_json' and 'timestamp' feature on Whisper Transcription/Translation (#24209 ) Signed-off-by: sangbumlikeagod <oironese@naver.com> Signed-off-by: sangbumlikeagod <98077576+sangbumlikeagod@users.noreply.github.com>	2025-12-01 18:19:17 +01:00
FredericOdermatt	5d43f7372e	[Doc] Update description disable_any_whitespace (#29784 ) Signed-off-by: Frederic Odermatt <frederic.odermatt@44ai.ch>	2025-12-01 16:48:33 +00:00
Shengqi Chen	36db0a35e4	[CI] Renovation of nightly wheel build & generation (#29690 ) Signed-off-by: Shengqi Chen <harry-chen@outlook.com>	2025-12-01 21:25:39 +08:00
Isotr0py	b95db244ee	[v1] Add real sliding window calculation to FlexAttention direct BlockMask building (#26015 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com> Co-authored-by: baonudesifeizhai <baonudesifeizhai@gmail.com>	2025-12-01 13:12:51 +00:00
Fanli Lin	f37e8938d2	[XPU] Fix AWQ skipped layer detection in IPEX quantization (#29774 ) Signed-off-by: Fanli Lin <fanli.lin@intel.com>	2025-12-01 12:00:52 +00:00
Cyrus Leung	f0a28bf661	[Misc] Unify tokenizer registration (#29767 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-01 11:34:58 +00:00
Mickaël Seznec	86e178f7c4	[crashfix] Eagle + multimodal can crash on mm cache miss (#29750 ) Signed-off-by: Mickael Seznec <mickael@mistral.ai> Co-authored-by: Roger Wang <hey@rogerw.io>	2025-12-01 17:29:33 +08:00
daniel-salib	014ece97c7	[Frontend] Add tool filtering support to ToolServer (#29224 ) Signed-off-by: Daniel Salib <danielsalib@meta.com> Co-authored-by: Chauncey <chaunceyjiang@gmail.com>	2025-12-01 08:03:57 +00:00
wang.yuqi	62de4f4257	[Frontend] Resettle pooling entrypoints (#29634 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>	2025-12-01 15:30:43 +08:00
Yifei Zhang	1ab8fc8197	Make PyTorch profiler gzip and CUDA time dump configurable (#29568 ) Signed-off-by: Yifei Zhang <yifei.zhang1992@outlook.com>	2025-12-01 04:30:46 +00:00
Shu Wang	f72a817bdf	[MoE] CuteDSL MoE with Nvfp4 DeepEP dispatch (#27141 ) Signed-off-by: Shu Wang <shuw@nvidia.com> Signed-off-by: Shu Wang. <shuw@nvidia.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: root <root@umbriel-b200-017.ipp4a1.colossus.nvidia.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-11-30 16:05:32 -08:00
Woosuk Kwon	ec38a7368d	[Model Runner V2] Use packed mask for prompt bin counts (#29756 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-11-30 14:15:42 -08:00
Xingyu Liu	21c2627934	[Misc]Remove redundant hidden_size property in ModelConfig (#29749 ) Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-30 17:14:23 +00:00
Omer Ullman Argov	39d28108f4	[Feat] Support non-gated activations in NVFP4 modelopt path (#29004 )	2025-11-30 11:02:40 -05:00
Harry Mellor	cd719de5cb	Fix RoPE failures in Transformers nightly (#29700 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-30 14:29:32 +00:00
Pleaplusone	8c363ed666	[ROCm][Attention] Sliding window support for `AiterFlashAttentionBackend` (#29234 ) Signed-off-by: ganyi <ygan@amd.com>	2025-11-30 11:31:50 +00:00
Cyrus Leung	64bc09ba27	[Core] Enable `inputs_embeds_size` separate from `hidden_size` (#29741 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-30 17:31:12 +08:00
Isotr0py	47539cfd3e	[Bugfix] Fix mismatched nvfp4 gemm output shape (#29742 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-11-30 09:15:01 +00:00
Cyrus Leung	2afcec4dec	[Misc] Update `TokenizerLike` interface and move `get_cached_tokenizer` (#29730 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-30 14:59:47 +08:00
朝	9381b5cde0	[Doc]: Fix typo in fused_moe layer (#29731 ) Signed-off-by: BowTen <bowten@qq.com>	2025-11-29 22:29:13 -08:00
Vensen	66b5840287	[Bugfix][sleepmode][fp8 kv cache]: Fix FP8 KV cache + sleep(level=2) gibberish output (#28783 ) Signed-off-by: vensen <vensenmu@gmail.com> Co-authored-by: TJian <tunjian.tan@embeddedllm.com>	2025-11-30 14:24:25 +08:00
Huamin Li	82c795d6f2	Fix AttributeError about _use_fi_prefill (#29734 ) Signed-off-by: Huamin Li <3ericli@gmail.com>	2025-11-30 06:04:55 +00:00
Isotr0py	e1464c3a08	[Quantization] Enable compressed-tensors AWQ for Turing GPU (#29732 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-11-30 06:04:28 +00:00
Xin Yang	a491b0911b	[LoRA] Support FusedMoE LoRA Triton kernel for mxfp4 (#29708 ) Signed-off-by: Xin Yang <xyangx@amazon.com> Signed-off-by: Xin Yang <105740670+xyang16@users.noreply.github.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2025-11-30 10:37:25 +08:00
Jinzhen Lin	1656ad3704	[Kernel][Quantization] add w4a8 support for marlin kernel (#24722 ) Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Michael Goin <mgoin@redhat.com>	2025-11-29 07:19:33 -08:00
Cyrus Leung	fa59fe417f	[Chore] Move `detokenizer_utils` to `vllm/tokenizers` (#29727 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-29 06:25:17 -08:00
Cyrus Leung	fe3398fab2	[Chore] Enable passing `tokenizer=None` into MM processor (#29724 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-29 06:25:10 -08:00
Cyrus Leung	34a984274e	[Misc] Refactor tokenizer interface (#29693 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-29 04:02:21 -08:00
Woosuk Kwon	f223ed4181	[Model Runner V2] Fuse penalties and temperature into single kernel (#29720 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-11-29 02:29:16 -08:00
Didier Durand	04a797cd0e	[Doc]: fixing typos in various files. (#29717 ) Signed-off-by: Didier Durand <durand.didier@gmail.com>	2025-11-29 01:15:39 -08:00
Woosuk Kwon	6afc0ffaf6	[Model Runner V2] Add sample/ directory and reorganize files (#29719 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-11-29 00:41:01 -08:00
Jee Jee Li	39e63dec7c	[LoRA] Cleanup LoRA unused code (#29611 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-11-28 22:52:58 -08:00
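
The KV cache residency metrics commit (cabc77cc86) describes sampled, monotonic-clock timing of block lifetime, idle time before eviction, and reuse gaps. The sketch below illustrates that idea only; it is not vLLM's implementation. The class name `KVResidencyTracker`, its methods, and the plain lists standing in for Prometheus histograms are all hypothetical, and free and eviction are collapsed into a single event for brevity.

```python
import random
import time
from dataclasses import dataclass


@dataclass
class BlockTimes:
    allocated_at: float    # monotonic timestamp at allocation
    last_access_at: float  # monotonic timestamp of the most recent use


class KVResidencyTracker:
    """Hypothetical tracker; names are illustrative, not vLLM's API."""

    def __init__(self, sample_rate=0.01, clock=time.monotonic, rng=None):
        self.sample_rate = sample_rate  # fraction of blocks to track
        self.clock = clock              # monotonic clock, injectable for tests
        self.rng = rng or random.Random()
        self.tracked = {}               # block_id -> BlockTimes (sampled only)
        # Plain lists stand in for the three histograms.
        self.lifetimes = []
        self.idle_before_evict = []
        self.reuse_gaps = []

    def on_alloc(self, block_id):
        # Sample at allocation so untracked blocks cost one dict miss later.
        if self.rng.random() < self.sample_rate:
            now = self.clock()
            self.tracked[block_id] = BlockTimes(now, now)

    def on_access(self, block_id):
        times = self.tracked.get(block_id)
        if times is None:
            return  # block was not sampled
        now = self.clock()
        self.reuse_gaps.append(now - times.last_access_at)
        times.last_access_at = now

    def on_evict(self, block_id):
        times = self.tracked.pop(block_id, None)
        if times is None:
            return  # block was not sampled
        now = self.clock()
        self.lifetimes.append(now - times.allocated_at)
        self.idle_before_evict.append(now - times.last_access_at)
```

Passing `sample_rate=1.0` and a fake `clock` makes the tracker deterministic, which is convenient for unit tests; the 0.01 default mirrors the sampling ratio stated in the commit message.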

8271 Commits