biondizzle/vllm - vllm

Author	SHA1	Message	Date
Michael Goin	30a3e5af69	[CI] Add Qwen3 MoE NVFP4 to Blackwell lm-eval (#26316 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-10-07 10:36:15 -07:00
fxmarty-amd	a38c1bfe09	[ci] Rename `test_mxfp4_moe.py` to `test_ocp_mx_moe.py` (#26364 ) Signed-off-by: Felix Marty <Felix.Marty@amd.com>	2025-10-07 09:52:24 -07:00
Paul Pak	320feae6f5	[Model] Lfm2Moe (#26344 ) Signed-off-by: Paul Pak <paulpak58@gmail.com>	2025-10-07 16:03:05 +00:00
Cyrus Leung	1e4ecca1d0	[V0 Deprecation] Remove `VLLM_USE_V1` from tests (#26341 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-07 15:42:31 +00:00
Cyrus Leung	c0a7b89d8e	[Misc] Move `LRUCache` into its own file (#26342 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-07 15:08:40 +00:00
antrec	6f59beaf0b	[Model] Add support for ModernBertForTokenClassification (#26340 ) Signed-off-by: Antoine Recanati Le Goat <antoine.recanati@sancare.fr> Signed-off-by: antrec <antoine.recanati@gmail.com> Co-authored-by: Antoine Recanati Le Goat <antoine.recanati@sancare.fr> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-10-07 14:29:19 +00:00
fxmarty-amd	41f1cf38f2	[Feature][OCP MX] Support mxfp6 and mixed mxfp6-mxfp4 (#21166 )	2025-10-07 09:35:26 -04:00
Isotr0py	08d26a1b7e	[Model] Use `merge_by_field_config` for MM models (Ovis family) (#26308 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2025-10-07 12:54:22 +00:00
fhl2000	63773a6200	[Docs] add docs for cuda graph v1 (#24374 ) Signed-off-by: fhl <2410591650@qq.com> Signed-off-by: fhl2000 <63384265+fhl2000@users.noreply.github.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-10-07 05:25:05 -07:00
Sergio Paniego Blanco	883b42896a	Add TRL example notebook to RLHF docs (#26346 ) Signed-off-by: sergiopaniego <sergiopaniegoblanco@gmail.com>	2025-10-07 11:31:28 +00:00
Daniel Cámpora	e1098ced95	Add topk logits torch op for DS3.2. (#25945 ) Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com> Signed-off-by: Daniel Cámpora <961215+dcampora@users.noreply.github.com> Co-authored-by: youkaichao <youkaichao@gmail.com>	2025-10-07 10:07:32 +00:00
Grant Holmes (Ren)	d100d78eb3	Optimize KV cache distribution for asymmetric pipeline parallelism (#25164 ) Signed-off-by: gholmes829 <g.holmes429@gmail.com>	2025-10-07 09:20:30 +00:00
Cyrus Leung	7e4cd070b0	[V0 Deprecation] Remove `VLLM_USE_V1` from docs and scripts (#26336 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-07 16:46:44 +08:00
Snehlata	46b0779996	[BugFix] Update KV block hash type from BlockHash to ExternalBlockHash in kv_events_subscriber - #26264 (#26265 ) Signed-off-by: atalhens <sneh.lata@nutanix.com>	2025-10-07 08:42:28 +00:00
Ayush Satyam	de342585ff	[Model] Define merge_by_field_config MM interface (R-T) (#26260 ) Signed-off-by: Ayush Satyam <ayushsatyam146@gmail.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-07 16:10:55 +08:00
Andrew Xia	185d8ed44f	[responsesAPI][bugfix] serialize harmony messages (#26185 ) Signed-off-by: Andrew Xia <axia@meta.com> Co-authored-by: Ye (Charlotte) Qi <yeq@meta.com>	2025-10-07 07:07:53 +00:00
Cyrus Leung	d9836d4517	[Deprecation] Deprecate `LLM.set_tokenizer` (#26333 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-07 06:50:57 +00:00
Ayush Satyam	5f7e8a916a	[Model] Define merge_by_field_config MM interface (U-Z) (#26261 ) Signed-off-by: Ayush Satyam <ayushsatyam146@gmail.com> Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-07 06:45:49 +00:00
ahao-anyscale	4dbdf4a294	[BUG] Fix file parsing for load_format runai_streamer_sharded (#26324 ) Signed-off-by: ahao-anyscale <ahao@anyscale.com>	2025-10-07 11:23:07 +08:00
Michael Goin	c6873c4e6d	[UX] Support nested dicts in hf_overrides (#25727 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-10-07 11:19:16 +08:00
Sage Moore	2111b4643c	[Core] Simplify the Dp padding/should ubatch coordination logic (#25768 ) Signed-off-by: Sage Moore <sage@neuralmagic.com> Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-10-07 01:57:49 +00:00
Sage Moore	c50901f3b9	[Docs][DBO] Add initial doc that describes the DBO implementation (#26024 ) Signed-off-by: Sage Moore <sage@neuralmagic.com>	2025-10-07 00:47:28 +00:00
Simon Mo	8229280a9c	[Misc] Define EP kernel arch list in Dockerfile (#25635 ) Signed-off-by: Simon Mo <simon.mo@hey.com>	2025-10-07 00:05:33 +00:00
Benjamin Chislett	f77df94647	[Perf] Add decode full-graph support to FlashInfer-MLA backend (#26313 ) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>	2025-10-06 23:03:49 +00:00
Gregory Shtrasberg	f231e5bc21	[ROCm] Split AITER unified attention into its own backend (#25507 ) Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>	2025-10-06 22:49:23 +00:00
Benjamin Chislett	2161efe978	[Bugfix] Allow skipping MoE in NVFP4 (fix for MTP) (#25987 ) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>	2025-10-06 16:16:30 -04:00
Varun Sundar Rabindranath	f23b4c04fd	[BugFix] Pad input buffers in _dummy_run (#26209 ) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-10-06 16:07:51 -04:00
Varun Sundar Rabindranath	93540958b8	[Docs] Fix broken table in moe_kernel_features doc (#26314 ) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-10-06 15:58:05 -04:00
Cyrus Leung	44b9af5bb2	[Benchmark] Enable MM Embedding benchmarks (#26310 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-06 19:51:58 +00:00
Raushan Turganbay	7cd95dc8a3	[Bugfix] Fix gemma3 with transformers backend (#23178 ) Signed-off-by: raushan <raushan@huggingface.co> Signed-off-by: Raushan Turganbay <raushan@huggingface.co> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-06 18:42:32 +00:00
Crefeda Rodrigues	c02058c222	Add bias handling to CPUFusedMOE kernel (#26289 ) Signed-off-by: Crefeda Rodrigues <crefeda.rodrigues@arm.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: Crefeda Rodrigues <65665931+cfRod@users.noreply.github.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Sharif Inamdar <Sharif.Inamdar@arm.com> Co-authored-by: Isotr0py <2037008807@qq.com>	2025-10-06 18:39:10 +00:00
7mile	b2ea5ba677	[Bugfix][Spec Decode] Fix wrong valid_mask for padded speculation when chunked prefill occurs (#26231 ) Signed-off-by: seven-mile <i@7li.moe> Signed-off-by: Benjamin Chislett <bchislett@nvidia.com> Co-authored-by: Benjamin Chislett <bchislett@nvidia.com>	2025-10-06 18:24:22 +00:00
Karan Goel	824a3f403f	[Misc] auto_tune: kill specific vllm process (#26304 ) Signed-off-by: Karan Goel <karangoel@google.com>	2025-10-06 18:02:51 +00:00
Rahul Tuli	05f6846ede	Support llama3 eagle3 head with llama4 verifier (#25961 ) Signed-off-by: rahul-tuli <rtuli@redhat.com> Signed-off-by: Rahul Tuli <rtuli@redhat.com>	2025-10-06 13:56:08 -04:00
Michael Goin	20db99cc69	[CI Bugfix] Make sure TRTLLM attention is available in test_blackwell_moe (#26188 ) Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-10-06 13:50:11 -04:00
Yannick Schnider	6431be808f	[Tests] conftest: Extending VllmRunner and HfRunner to accept token_ids as input (#26295 ) Signed-off-by: Yannick Schnider <yannick.schnider1@ibm.com> Signed-off-by: Yannick Schnider <Yannick.Schnider1@ibm.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-10-06 17:19:34 +00:00
Matthew Bonanni	4727a8afa7	[Attention] Remove unused reorder_batch method (#24463 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2025-10-06 13:13:39 -04:00
tomeras91	b8f603cebe	[Model] EVS support for nano_nemotron_vl (#26269 ) Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com> Signed-off-by: tomeras91 <57313761+tomeras91@users.noreply.github.com> Signed-off-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Eugene Khvedchenia <ekhvedchenia@nvidia.com>	2025-10-07 00:23:37 +08:00
Chatcharin Sangbutsarakum	fc679696f8	Fix `DotsOCR` tensor type (#26281 ) Signed-off-by: what_in_the_nim <chatcharinsang@gmail.com>	2025-10-06 12:23:43 +00:00
Raushan Turganbay	ab5e7d93f4	[Bugfix] Fix mrope in Transformers Backend (#26087 ) Signed-off-by: raushan <raushan@huggingface.co> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-06 11:40:50 +00:00
Harry Mellor	0340f45553	Support expert parallel load balancing in Transformers backend (#26287 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-06 11:20:16 +00:00
Cyrus Leung	19a00eb210	[Model] Use `merge_by_field_config` for MM models (Llava family) (#26280 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-06 09:45:26 +00:00
Cyrus Leung	391612e78b	[Frontend] Consolidate tokenizer init code (#26276 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-06 09:34:52 +00:00
abhisheksheth28	77c95f72f7	[Doc] add KAITO to integrations (#25521 ) Signed-off-by: "Abhishek Sheth" <absheth@microsoft.com>	2025-10-06 17:30:03 +08:00
Aritra Roy Gosthipaty	59f30d0448	[Docs] Edit HF Inference Endpoints documentation (#26275 ) Signed-off-by: Aritra Roy Gosthipaty <aritra.born2fly@gmail.com> Signed-off-by: ariG23498 <aritra.born2fly@gmail.com>	2025-10-06 10:13:09 +01:00
Roger Wang	43c146ca42	[Misc] Clean up unnecessary E501 ignore (#26274 ) Signed-off-by: Roger Wang <hey@rogerw.io>	2025-10-06 07:29:18 +00:00
Yasmin Moslem	7c2ec0fe87	[Benchmarking] Add disable_shuffle option for dataset loading (#26258 ) Signed-off-by: Yasmin Moslem <48152713+ymoslem@users.noreply.github.com>	2025-10-06 07:05:44 +00:00
dependabot[bot]	039b6bade3	Bump actions/stale from 10.0.0 to 10.1.0 (#26272 ) Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2025-10-06 07:01:21 +00:00
Harry Mellor	6c04638214	Fix per file ruff ignores related to line length (#26262 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-06 05:12:40 +00:00
wuhang	91ac7f764d	[CI][gpt-oss] Enable python tool tests in CI (#24315 ) Signed-off-by: wuhang <wuhang6@huawei.com>	2025-10-06 04:20:06 +00:00

10215 Commits