1126 Commits

Author SHA1 Message Date
Michael Goin
db5d0719e1 [Kernel] Add MXFP8 to Marlin GEMM/MoE and refactor Mxfp8LinearOp (#34664)
Signed-off-by: mgoin <mgoin64@gmail.com>
2026-04-01 09:41:42 -07:00
Lukas Geiger
4f6eed3bd4 [Core] Simplify multimodal masking (#34246)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
2026-04-01 01:18:22 -07:00
wliao2
4dfad17ed1 replace cuda_device_count_stateless() to current_platform.device_count() (#37841)
Signed-off-by: Liao, Wei <wei.liao@intel.com>
Signed-off-by: wliao2 <wei.liao@intel.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
2026-03-31 22:32:54 +08:00
wang.yuqi
719735d6c5 [CI Failure] pin colmodernvbert revision (#38612)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-03-31 10:54:54 +00:00
wang.yuqi
d9d21eb8e3 [Frontend][3/n] Improve pooling entrypoints | scoring. (#28631)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-03-31 07:52:00 +00:00
Micah Williamson
d9c7db18da [ROCm][CI] Pin test_hybrid test to TRITON_ATTN on ROCm (#38381)
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
2026-03-30 20:26:46 +00:00
Benjamin Chislett
494636b29d [Feat][Spec Decode] DFlash (#36847)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
2026-03-30 15:03:15 -04:00
haosdent
a08b7733fd [CI] Fix SPLADE pooler test broken by #38139 (#38495)
Signed-off-by: haosdent <haosdent@gmail.com>
2026-03-30 07:48:33 +00:00
haosdent
d39b8daf5f [Feature] Add Qwen3-ForcedAligner support via token classification pooling (#35367)
Signed-off-by: haosdent <haosdent@gmail.com>
2026-03-29 00:27:52 +00:00
haosdent
b2bc736b12 [CI] Fix Ernie4.5-VL initialization test (#38429)
Signed-off-by: haosdent <haosdent@gmail.com>
2026-03-28 22:43:24 +08:00
Nicolò Lucchesi
44a6528028 [CI] Skip failing test (#38369)
Signed-off-by: NickLucche <nlucches@redhat.com>
2026-03-27 13:25:19 -07:00
Divakar Verma
b9dbc5c4ab [Mamba][APC] Add test case to compare apc outputs (#34977)
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
2026-03-26 16:40:35 +00:00
wang.yuqi
dcdc145893 [CI] Reorganize scoring tests (#38207)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-03-26 12:07:01 +00:00
Ekagra Ranjan
7b54f60db0 [Cohere] Enable Cohere-Transcribe (#38120)
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
2026-03-25 16:13:51 -07:00
Cyrus Leung
ba2f0acc2d [Misc] Reorganize inputs (#35182)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-03-25 10:22:54 -07:00
Harry Mellor
d215d1efca [Mypy] Better fixes for the mypy issues in vllm/config (#37902)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-25 06:14:43 -07:00
Nick Cao
935c46dd9b [Model] Add Granite 4.0 1B speech to supported models (#38019)
Signed-off-by: Nick Cao <ncao@redhat.com>
2026-03-24 18:23:41 +00:00
Lasha Koroshinadze
e7767eccae Fix AudioFlamingo3/MusicFlamingo HF parity and RoTE handling (#37643)
Signed-off-by: Lasha <26011196+lashahub@users.noreply.github.com>
2026-03-23 10:29:07 +08:00
Andreas Karatzas
c862481c02 [CI] Skip ISAAC multimodal tests due to broken upstream HF model weights (#37781)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-22 13:23:32 +08:00
Andreas Karatzas
0d50fa1db6 [ROCm][CI] Mark gemma3 as large GPU test to avoid OOM on MI250 (#37610)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-21 12:57:25 +08:00
Isotr0py
c7f98b4d0a [Frontend] Remove librosa from audio dependency (#37058)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-03-21 11:36:15 +08:00
Andreas Karatzas
fb4e8bf442 [ROCm][CI] Fix accuracy for llama-nemotron-vl pooling tests (#37613)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-20 10:16:59 -07:00
Ilya Boytsov
8b6c6b9505 [Model] Add LFM2-ColBERT-350M support (#37528)
Signed-off-by: Ilya Boytsov <ilyaboytsov1805@gmail.com>
2026-03-20 14:57:57 +00:00
Harry Mellor
9f6d9dd371 Fix attribute error in isaac_patch_hf_runner (#37685)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-20 14:49:40 +00:00
Andreas Karatzas
5a4a179591 [ROCm][CI] Fix granite_speech test for gfx90a by selecting compatible attention backend (#37611)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-20 17:07:26 +08:00
Cyrus Leung
765e461065 [Bugfix] Fix Nemotron Parse loading (#37407)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-03-19 09:55:29 +00:00
Cyrus Leung
99267c23ca [2/3] Refactor InternVL-based processors (#37324)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-03-18 22:22:19 +08:00
Andreas Karatzas
eaf7c9b976 [CI] Fix PaddleOCR-VL HF test failure due to create_causal_mask API rename (#37328)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-18 09:44:12 +00:00
Athrael Soju
c0745a851a [Model] Add ColQwen3.5 4.5B support (#36887)
Signed-off-by: Athrael Soju <athrael.soju@gmail.com>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-03-17 21:17:02 +00:00
Ekagra Ranjan
b5ca9c3557 [Models] Cohere ASR (#35809)
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
2026-03-17 21:04:17 +00:00
Cyrus Leung
51f0acda79 [Model] Remove unused handle_oov_mm_token (#37321)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-03-17 19:44:52 +00:00
Isotr0py
a836524d20 [Chore] Replace all base64 usages with faster pybase64 package (#37290)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-03-17 14:44:19 +00:00
Harry Mellor
ecfcdd2ce4 Fix Phi3 test that fails with Transformers v5 (#37298)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-17 14:29:24 +00:00
Cyrus Leung
f340324335 [1/2] Move InternVL-based processors (#37260)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-03-17 21:50:56 +08:00
EdalatiAli
e5b807607c [Quant][Feature] Support online MXFP8 quantization for MoE and dense models (#35448)
Signed-off-by: EdalatiAli <aliedalati@cohere.com>
2026-03-16 18:07:39 -04:00
Raushan Turganbay
55e6d3d5c0 [Bugfix] Make siglip/clip compatible with transformers v5 (#37200)
Signed-off-by: raushan <raushan@huggingface.co>
2026-03-16 16:48:18 +00:00
Isotr0py
912fbe9555 [Bugfix] Fix Qwen2.5-Omni/Qwen3-Omni use_audio_in_video with multi-video inputs (#37147)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-03-16 08:56:06 +00:00
bigshanedogg
2390d44209 [Model] Add HyperCLOVAX-SEED-Think-14B language model support (#37107)
Signed-off-by: bigshanedogg <bigshane319@gmail.com>
2026-03-16 06:40:05 +00:00
Andreas Karatzas
d4c57863f7 [ROCm][CI] Fix engine teardown and text normalization to stabilize voxtral test (#37138)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-03-16 04:49:31 +00:00
Harry Mellor
0005d2a3c9 Use Transformers v5 WeightRenaming for Transformers modeling backend (#31545)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-13 20:49:08 +00:00
Isotr0py
abf61aaa8e [Bugfix] Fix Qwen2.5-omni/Qwen3-omni mm_processor cache for audio_in_video request (#36800)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-03-13 18:16:05 +00:00
whyiug
1ce13cf992 [Model] Add support for BERT-like Chinese ERNIE pooling models (#36385)
Signed-off-by: whyiug <whyiug@hotmail.com>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-03-13 03:23:53 +00:00
Nikita
10f08dedfa [Model] Add ColPali late interaction model for multi-modal retrieval (#36818)
Signed-off-by: Nikita Sukharev <kaonael@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2026-03-13 02:18:57 +00:00
Marc Sun
c973ecdead [bnb] Skip moe + bnb test (#36896)
Signed-off-by: Marc Sun <marc@huggingface.co>
2026-03-12 18:03:25 +00:00
Kunshang Ji
53ec16a705 [Hardware] Replace torch.cuda.device_count/current_device/set_device API (#36145)
Signed-off-by: Kunshang Ji <jikunshang95@gmail.com>
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2026-03-12 07:57:47 -07:00
István Ketykó
00726c74c9 [Bugfix][Model] Fix DeepSeek-OCR TensorSchema crash on empty images_crop (#36670)
Signed-off-by: István Ketykó <istvan.ketyko@gmail.com>
2026-03-12 15:35:54 +08:00
wang.yuqi
6ecabe4936 [CI Failure] Fix Language Models Test (Extended Pooling) daily CI Failure (#36761)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
2026-03-12 12:22:05 +08:00
Harry Mellor
65986db6ba Make Gemma and Gemma 2 accept inputs_embeds like Gemma 3 (#36787)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-11 18:12:43 +00:00
Harry Mellor
5efa206a8c Fix ExaoneMoeMTP test that never ran in Transformers v4 (#36792)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-03-11 17:10:23 +00:00
Cyrus Leung
196802dfa6 [Misc] Clean up renderers (#36770)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-03-11 16:39:29 +00:00