Commit Graph

12630 Commits

Author SHA1 Message Date
Andreas Karatzas
f2b6dfd237 [ROCm][CI] Fix language generation test accuracy by disabling HF flash_sdp and mem_efficient_sdp (#31597)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-01-05 02:17:05 +00:00
Andreas Karatzas
89f1f25310 [CI] Skip Phi-MoE test due to old API util (#31632)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-01-05 08:52:07 +08:00
Nick Hill
b53b89fdb3 [BugFix] Async scheduling: handle model forward errors more cleanly (#31611)
Signed-off-by: njhill <nickhill123@gmail.com>
2026-01-04 11:04:37 -08:00
Ning Xie
6522721d17 [misc] Sort uvicorn log level description according to verbosity (#31137)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2026-01-04 18:45:37 +00:00
Yuxuan Zhang
0d4044edd8 fix no think of GLM-4.5 / GLM-4.7 (#31449)
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>
2026-01-04 11:43:00 +08:00
Reagan Lee
41ab179738 [Docs] Fix argparse include path for mm-processor benchmark (#31654)
Signed-off-by: Reagan <reaganjlee@gmail.com>
2026-01-04 03:31:29 +00:00
Robert Shaw
268b1c55ad [MoE Refactor][13/N] Convert FI to Use PFNoEP (#31533)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Robert Shaw <robertgshaw2@gmail.com>
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2026-01-03 12:26:36 -08:00
Andreas Karatzas
4f9ce35afe [CI][Bugfix] Fix token counting in chunked prefill compl test (#31630)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-01-03 14:28:49 +08:00
jeremyteboul
97a01308e9 Improve HF qwen3_omni: preserve audio_sample_rate in kwargs restructuring (#29255)
Signed-off-by: Jeremy Teboul <jeremyteboul@fb.com>
Co-authored-by: Jeremy Teboul <jeremyteboul@fb.com>
2026-01-03 04:31:09 +00:00
Xingyu Liu
0eee877f67 [Core] Parse vLLM engine required fields from hf_config to model_arch_config (#28454)
Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com>
Signed-off-by: Xingyu Liu <38244988+charlotte12l@users.noreply.github.com>
2026-01-02 15:13:15 -08:00
Alfred
a0e9ee83c7 [Benchmark] Fix OOM during MoE kernel tuning for large models (#31604)
Signed-off-by: Alfred <massif0601@gmail.com>
2026-01-02 22:24:51 +00:00
Yongye Zhu
a3f2f40947 [MoE Refactor] Explicit construct mk for flashinfer bf16 kernel (#31504)
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2026-01-02 13:54:50 -08:00
Yongye Zhu
5a468ff7c7 [MoE Refactor] Split invoke_fused_moe_kernel (#31050)
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2026-01-02 13:47:15 -08:00
Andreas Karatzas
6ef770df7c [MoE] Fix output_shape calculation in Attention layer to handle 3D query inputs (#31596)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-01-02 15:46:23 +00:00
Nick Hill
bd877162eb [BugFix] Support online dense model DP without overhead (#30739)
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: njhill <nickhill123@gmail.com>
2026-01-02 23:36:38 +08:00
Xinyu Chen
08f425bad1 CustomOp: test forward dispatch for grouped_topk (#31530)
Signed-off-by: Xinyu Chen <xinyu1.chen@intel.com>
2026-01-02 10:04:01 -05:00
labAxiaoming
a01f2faedf Add multimodal input method in the documentation (#31601)
Signed-off-by: xiaoming <1259730330@qq.com>
2026-01-02 12:43:30 +00:00
Kyuyeun Kim
cc410e8644 [Bugfix] Fix weight_loader v1 block scale (#31103)
Signed-off-by: Kyuyeun Kim <kyuyeunk@google.com>
2026-01-02 13:14:10 +08:00
Kevin McKay
825c2dc133 [Bugfix][Hardware][AMD] Fix last_page_len calculation in AITER MLA decode (#31282)
Signed-off-by: c0de128 <kevin.mckay@outlook.com>
2026-01-01 21:14:00 -08:00
Vaibhav Sourirajan
1f43c121d5 Remove unused use_marlin variable in Mxfp4MoEMethod (#31549)
Signed-off-by: vaibhav sourirajan <vs2787@columbia.edu>
2026-01-01 21:13:36 -08:00
Tmn07
ca179d0f64 [Bugfix] Fix activation quantization for compressed-tensors W4A16 (#31572)
Signed-off-by: Tmn07 <tmn0796@gmail.com>
2026-01-01 21:13:22 -08:00
Andreas Karatzas
013b54088c [ROCm][CI] Fix ModernBERT token classification test (#31612)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-01-02 04:19:08 +00:00
Jay Hemnani
5ac55eb30f [Model] Enable LoRA support for tower and connector in LLaVA (#31513)
Signed-off-by: Jay Hemnani <jayhemnani9910@gmail.com>
Co-authored-by: Jay Hemnani <jayhemnani9910@gmail.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-01-01 19:32:39 -08:00
Benjamin Chislett
ea53ca5e85 [Bugfix] Fix block size used in EAGLE slot mapping (#31540)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
2026-01-01 19:32:30 -08:00
zhima771
27864a851c feat: support LoRA for DeepSeek-OCR(Language Model part) (#31569)
Signed-off-by: zhima771 <15836938703@163.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2026-01-01 19:32:11 -08:00
Andreas Karatzas
5cc4876630 [ROCm][CI] Fix failure in Language Models Tests (Extra Standard) by reducing agent pool size (#31553)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-01-01 19:29:42 -08:00
Kevin McKay
5fff44064b [Bugfix] Replace BaseException with specific exceptions in FLA utils (#31590)
Signed-off-by: c0de128 <kevin.mckay@outlook.com>
2026-01-01 19:27:54 -08:00
Reagan Lee
1f5b7c41c3 Add Multimodal Processor Benchmark (#29105)
Signed-off-by: Reagan Lee <reaganjlee@gmail.com>
Signed-off-by: Reagan <reaganjlee@gmail.com>
2026-01-01 19:26:53 -08:00
Ekagra Ranjan
adcf682fc7 [Audio] Improve Audio Inference Scripts (offline/online) (#29279)
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
2025-12-31 23:34:18 +00:00
Andreas Karatzas
21de6d4b02 [CI][Bugfix] Fix token counting in chunked prefill streaming test (#31565)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2025-12-31 23:05:14 +00:00
Nick Hill
6c2cfb62ff [BugFix] Fix async scheduling for pooling models (#31584)
Signed-off-by: njhill <nickhill123@gmail.com>
2025-12-31 14:48:51 -08:00
Fanjiang Ye
d8da76f3b7 [Bugfix] Fix BAGEL online serving for text and image understanding (#31546)
Signed-off-by: Dylan1229 <yvanphys@gmail.com>
Signed-off-by: UED <zxr3611244710@gmail.com>
Signed-off-by: mr-ye-cao <yecaoyc2019@gmail.com>
Co-authored-by: UED <zxr3611244710@gmail.com>
Co-authored-by: mr-ye-cao <yecaoyc2019@gmail.com>
Co-authored-by: Mr-Ye-Cao <60802056+Mr-Ye-Cao@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-12-31 14:46:10 -08:00
baonudesifeizhai
d722e9e614 Add GLM-ASR multimodal support (#31436)
Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com>
Signed-off-by: baonudesifeizhai <85092850+baonudesifeizhai@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-12-31 23:12:24 +08:00
Andreas Karatzas
cf16342d43 [ROCm][CI] Update MiniCPM model test: MiniCPM3-4B to MiniCPM4.1-8B and simplify attention backend testing (#31551)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2025-12-31 00:12:01 -08:00
Wentao Ye
357d435c54 [Bug] Fix log issue with \n (#31390)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2025-12-30 21:16:55 -08:00
danisereb
108a2728f7 Add get_expert_mapping to NemotronHModel (for LoRA support) (#31539)
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>
2025-12-30 21:09:03 -08:00
TJian
578c8f51f6 [CI] [Critical] [CUDA] Fix duplicated test name (#31562)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
2025-12-30 21:01:09 -08:00
maang-h
b4bb5f312f [Core] Remove unused num_tokens parameter from _init_model_kwargs (#31517)
Signed-off-by: maang <maang_h@163.com>
2025-12-30 20:47:23 -08:00
SameerAsal
70e1acefcd [BugFix] Fix NUMA node validation in CPU platform (#31520)
Signed-off-by: SameerAsal <SameerAsal@users.noreply.github.com>
Co-authored-by: SameerAsal <SameerAsal@users.noreply.github.com>
2025-12-31 04:06:49 +00:00
Qiu
84f6cd741b [Mics] add pcp basic support to MoE model (#31003) 2025-12-30 20:01:29 -08:00
B-201
ecd49ce7e6 [Fix] Align fused moe lora_b shape with peft (#31534)
Signed-off-by: bk-201 <joy25810@foxmail.com>
2025-12-31 09:44:59 +08:00
Amr Mahdi
e1ee11b2a5 Add docker buildx bake configuration (#31477)
Signed-off-by: Amr Mahdi <amrmahdi@meta.com>
2025-12-31 01:08:54 +00:00
vintipandey
04147dcfa7 [Bugfix]Fix pooling model always disabled due to incorrect PP rank check (#31505)
Signed-off-by: vintipandey <vinti.pandey@gmail.com>
2025-12-30 11:27:10 -08:00
JartX
07728bf5cd [BugFix] add select_gemm_impl on CompressedTensorsWNA16MoEMethod to support LoRA (#31453)
Signed-off-by: JartX <sagformas@epdcenter.es>
2025-12-30 11:20:15 -08:00
yt0428
3f52fa5aa2 [Model] Add support for openPangu moe model (#28775)
Signed-off-by: yuantao <2422264527@qq.com>
Signed-off-by: yt0428 <51468697+yt0428@users.noreply.github.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-12-30 08:11:38 -08:00
Li, Jiang
7157596103 [CPU] Disable async schedule on CPU (#31525)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-12-30 12:34:08 +00:00
Nicolò Lucchesi
ab1af6aa3e [CI][NIXL] Split DPEP tests (#31491)
Signed-off-by: NickLucche <nlucches@redhat.com>
2025-12-30 07:26:12 -05:00
Pleaplusone
1a834df2d4 [ROCm][Bugfix] Fix accuracy issue on fmoe when VLLM_ROCM_USE_AITER_FUSION_SHARED_EXPERTS enabled (#31523)
Signed-off-by: ganyi <ygan@amd.com>
2025-12-30 09:21:49 +00:00
Kevin
51085c2aeb [Frontend] add continue_final_message parameter to /embeddings endpoint (#31497)
Signed-off-by: Kevin P-W <140451262+kevin-pw@users.noreply.github.com>
2025-12-30 07:21:13 +00:00
Roger Feng
3d973764ce [xpu] [bugfix] upgrade to latest oneccl in dockerfile (#31522)
Signed-off-by: roger feng <roger.feng@intel.com>
2025-12-30 14:52:28 +08:00