biondizzle/vllm - vllm

Author	SHA1	Message	Date
danisereb	dea63512bb	Add config file for fused MoE for Nemotron (TP4, B200) (#34411) Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>	2026-02-12 06:09:55 -08:00
Douglas Lehr	8a798be929	[ROCm] Enable MXFP4 MoE weight pre-shuffling on gfx950 and update aiter (#34192) Signed-off-by: Doug Lehr <douglehr@amd.com> Co-authored-by: Doug Lehr <douglehr@amd.com> Co-authored-by: Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com> Co-authored-by: tjtanaavllm <tunjian.tan@amd.com>	2026-02-12 05:06:33 -08:00
Cyrus Leung	fb455ed547	[V0 Deprecation] Remove code related to per-request logits processors (#34400) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-12 20:44:28 +08:00
baonudesifeizhai	f5897613fb	Fix Mistral config remap to accept compressed-tensors quantization #34028 (#34104) Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com>	2026-02-12 08:22:06 +00:00
Louie Tsai	55a1a9563a	Vllm CPU benchmark suite improvement (#34128) Signed-off-by: louie-tsai <louie.tsai@intel.com>	2026-02-12 16:04:44 +08:00
AllenDou	386bfe5d08	[bugfix] refactor FunASR's _get_data_parser (#34397) Signed-off-by: zixiao <shunli.dsl@alibaba-inc.com> Co-authored-by: zixiao <shunli.dsl@alibaba-inc.com>	2026-02-12 07:26:49 +00:00
Kyle Sayers	e9cd691132	[Bugfix] Fix Sparse24 Compressed Tensors models (#33446) Signed-off-by: Kyle Sayers <kylesayrs@gmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2026-02-11 23:15:16 -08:00
Yichuan Wang	80f2ba6ea6	Fix DeepSeek-OCR tensor validation for all size variants (#34085) Co-authored-by: Cursor <cursoragent@cursor.com>	2026-02-11 22:50:23 -08:00
Lucas Wilkinson	136b0bfa59	[BugFix] Fix DP chunking (#34379) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Bill Nell <bnell@redhat.com> Co-authored-by: Bill Nell <bnell@redhat.com>	2026-02-12 06:44:03 +00:00
Cyrus Leung	b96f7314b4	[Refactor] Pass Renderer to Input Processor (#34329) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-11 19:38:11 -08:00
Cyrus Leung	ced2a92f40	[Refactor] Move validation to params definitions (#34362) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-11 19:33:15 -08:00
Runkai Tao	e1d97c38f8	[Bug Fix] Fix `naive_block_assignment` always defaulting to False due to arg misalignment (#33848) Signed-off-by: Runkai Tao <rt572@physics.rutgers.edu>	2026-02-12 11:30:57 +08:00
Michael Goin	ec12d39d44	[Bugfix] Fix MTP accuracy for GLM-5 (#34385) Signed-off-by: mgoin <mgoin64@gmail.com>	2026-02-12 11:08:19 +08:00
Michael Goin	ff1f83b056	[Refactor] Replace `activation: str` with `MoEActivation` enum (#33843) Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Michael Goin <mgoin64@gmail.com>	2026-02-11 17:29:32 -08:00
Kevin H. Luu	83b47f67b1	[ci] Integrate AMD tests into CI (#33626) Signed-off-by: Kevin H. Luu <khluu000@gmail.com> Signed-off-by: khluu <khluu000@gmail.com> Co-authored-by: TJian <tunjian.tan@embeddedllm.com>	2026-02-12 08:54:17 +08:00
Micah Williamson	fb7b30c716	[ROCm][CI] Revert Test Groups From mi325_8 to mi325_1 Agent Pool In AMD CI (#34384) Signed-off-by: Micah Williamson <micah.williamson@amd.com>	2026-02-11 15:52:34 -08:00
bnellnm	31d992d215	[Bugfix] Fix some issues with MoERunner PR #32344 (#34371) Signed-off-by: Bill Nell <bnell@redhat.com>	2026-02-11 14:33:14 -08:00
Wei Zhao	5aff2699bd	Fix CI failure - Flashinfer Kernel tests (#34316) Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>	2026-02-11 14:17:16 -08:00
Raushan Turganbay	527ca32197	[Bugfix] Fix more multimodal tests for transformers V5 (#34334) Signed-off-by: raushan <raushan@huggingface.co>	2026-02-11 22:02:05 +01:00
Junseo Park	5458eb835d	[Bugfix] send None sentinel on final commit so server properly sends transcription.done (#33963) Signed-off-by: pjs102793 <pjs102793@naver.com> Co-authored-by: Nick Hill <nickhill123@gmail.com>	2026-02-11 21:01:53 +00:00
Tomas Ruiz	144d9b7cc8	[Benchmarks] Reduce ready checker log verbosity (#34349) Signed-off-by: Tomas Ruiz <tomas.ruiz.te@gmail.com>	2026-02-11 20:57:57 +00:00
elvischenv	83e26c834e	[GPT-OSS] Remove unnecessary contiguous (#34337) Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>	2026-02-11 15:29:29 -05:00
TJian	5001211369	[ROCm] [CI] fix test_unrecognized_env (#34350) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>	2026-02-11 18:50:44 +00:00
Eldar Kurtić	11c7ace340	[Bugfix] Enable attn quantization of Llama-4 by correctly permuting scales for rope (int8, fp8) (#34243) Signed-off-by: Your Name <you@example.com> Co-authored-by: Your Name <you@example.com>	2026-02-11 13:24:22 -05:00
Xinyu Dong	be7f3d5d20	[Bugfix] fix default is_neox_style is True for deepseek (#34353) Signed-off-by: dongxinyu03 <dongxinyu03@baidu.com>	2026-02-11 18:20:45 +00:00
Isotr0py	0ab06100f4	[Multimodal] Expose `mm_processor_kwargs` for `DummyInputsBuilder` (#34330) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-02-11 09:37:40 -08:00
Xinyu Chen	ffb3d553cc	[Model Runner V2] Init cuda graph pool when necessary (#33217) Signed-off-by: Xinyu Chen <xinyu1.chen@intel.com>	2026-02-11 09:12:13 -08:00
junuxyz	fa7e0bfacf	[CI][BugFix] Fix silent failure in shellcheck hook and baseline exist… (#32458) Signed-off-by: junuxyz <216036880+junuxyz@users.noreply.github.com>	2026-02-11 17:03:48 +00:00
SorenDreano	48134a2c22	[Docs] Fix typo ("defult") and double spacing (#34348) Signed-off-by: SorenDreano <71752785+SorenDreano@users.noreply.github.com> Co-authored-by: Soren Dreano <soren@numind.ai> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-02-11 09:02:27 -08:00
kliuae	64f570ab56	[ROCm] [aiter] Split KV cache update for AiterFlashAttention (#33681) Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com>	2026-02-11 16:26:44 +00:00
Rohan Potdar	fd618871b4	[Bugfix]: Fix ROCm fusion attn test; use AttentionBackend utils to create kv cache (#33948) Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>	2026-02-11 11:12:05 -05:00
Harry Mellor	67a42b5a44	Don't try and run GLM-ASR with remote code (#34352) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-11 08:09:40 -08:00
Lucas Wilkinson	c7914d30f9	Reapply [Attention][FA3] Update FA3 to include new swizzle optimization (#34043) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2026-02-11 07:07:56 -08:00
Adam Binford	1b8756562e	Responses harmony system message structured (#34268) Signed-off-by: Adam Binford <adamq43@gmail.com>	2026-02-11 05:14:28 -08:00
Linda	275e0d2a99	[NVIDIA][test] Tests for flashinfer TRTLLM BF16 MoE (#33715) Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com> Co-authored-by: Pavani Majety <pmajety@nvidia.com>	2026-02-11 12:38:11 +00:00
Harry Mellor	0f5e55e7a8	Make JAIS compatible with Transformers v5 (#34264) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-11 12:30:37 +00:00
Harry Mellor	1e9204bff3	Make Qwen3VL compatible with Transformers v5 (#34262) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Signed-off-by: Roger Wang <hey@rogerw.io> Co-authored-by: Roger Wang <hey@rogerw.io>	2026-02-11 04:13:23 -08:00
Li, Jiang	05339a7b20	[Bugfix][CPU] Fix llama4 inference on CPU (#34321) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2026-02-11 19:07:23 +08:00
Harry Mellor	40b8f55358	[Docs] Reduce time spent generating API docs (#34255) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-11 02:56:02 -08:00
Seiji Eicher	5045d5c983	Patch protobuf for CVE-2026-0994 (#34253) Signed-off-by: Seiji Eicher <seiji@anyscale.com> Co-authored-by: Kevin H. Luu <khluu000@gmail.com>	2026-02-11 02:25:04 -08:00
Nick Hill	e09546cf05	[Frontend] Exploit tokenizers "new stream" in FastIncrementalDetokenizer (#34217) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-02-11 11:03:24 +01:00
Tianqi Ren	786806dd44	[Doc] Update Marlin support matrix for Turing (#34319) Signed-off-by: Tianqi Ren <tianqi.r@outlook.com>	2026-02-11 09:03:41 +00:00
Nick Hill	79504027ef	[Misc] Bump `fastsafetensors` version for latest fixes (#34273) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-02-11 00:30:09 -08:00
Luka Govedič	addac0e653	[torch.compile] Enable AR+rms fusion by default available for `-O2` (#34299) Signed-off-by: Luka Govedič <lgovedic@redhat.com>	2026-02-11 00:30:00 -08:00
Cyrus Leung	675a22ed66	[Chore] Move `BaseRenderer` to `base.py` (#34308) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-11 00:29:51 -08:00
Kunshang Ji	cb9574eb85	[XPU][9/N] clean up existing ipex code/doc (#34111) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2026-02-11 00:27:15 -08:00
AllenDou	21dfb842d7	[model] support FunASR model (#33247) Signed-off-by: zixiao <shunli.dsl@alibaba-inc.com> Co-authored-by: zixiao <shunli.dsl@alibaba-inc.com>	2026-02-11 07:37:09 +00:00
R3hankhan	d1b837f0ae	[CPU] Enable FP16 (Half dtype) support for s390x (#34116) Signed-off-by: Rehan Khan <Rehan.Khan7@ibm.com>	2026-02-11 14:41:42 +08:00
Roger Wang	0b20469c62	[Bugfix] Fix weight naming in Qwen3.5 (#34313) Signed-off-by: Roger Wang <hey@rogerw.io>	2026-02-10 21:37:14 -08:00
Tyler Michael Smith	d7982daff5	[Bugfix] Fix fused MoE IMA (sans chunking) by using int64 for strides (#34279) Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-11 05:15:52 +00:00

13868 Commits