Andreas Karatzas
|
f2b6dfd237
|
[ROCm][CI] Fix language generation test accuracy by disabling HF flash_sdp and mem_efficient_sdp (#31597)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-01-05 02:17:05 +00:00 |
|
Andreas Karatzas
|
89f1f25310
|
[CI] Skip Phi-MoE test due to old API util (#31632)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-01-05 08:52:07 +08:00 |
|
Nick Hill
|
b53b89fdb3
|
[BugFix] Async scheduling: handle model forward errors more cleanly (#31611)
Signed-off-by: njhill <nickhill123@gmail.com>
|
2026-01-04 11:04:37 -08:00 |
|
Ning Xie
|
6522721d17
|
[misc] Sort uvicorn log level description according to verbosity (#31137)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2026-01-04 18:45:37 +00:00 |
|
Yuxuan Zhang
|
0d4044edd8
|
fix no think of GLM-4.5 / GLM-4.7 (#31449)
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>
|
2026-01-04 11:43:00 +08:00 |
|
Reagan Lee
|
41ab179738
|
[Docs] Fix argparse include path for mm-processor benchmark (#31654)
Signed-off-by: Reagan <reaganjlee@gmail.com>
|
2026-01-04 03:31:29 +00:00 |
|
Robert Shaw
|
268b1c55ad
|
[MoE Refactor][13/N] Convert FI to Use PFNoEP (#31533)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Robert Shaw <robertgshaw2@gmail.com>
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
|
2026-01-03 12:26:36 -08:00 |
|
Andreas Karatzas
|
4f9ce35afe
|
[CI][Bugfix] Fix token counting in chunked prefill compl test (#31630)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-01-03 14:28:49 +08:00 |
|
jeremyteboul
|
97a01308e9
|
Improve HF qwen3_omni: preserve audio_sample_rate in kwargs restructuring (#29255)
Signed-off-by: Jeremy Teboul <jeremyteboul@fb.com>
Co-authored-by: Jeremy Teboul <jeremyteboul@fb.com>
|
2026-01-03 04:31:09 +00:00 |
|
Xingyu Liu
|
0eee877f67
|
[Core] Parse vLLM engine required fields from hf_config to model_arch_config (#28454)
Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com>
Signed-off-by: Xingyu Liu <38244988+charlotte12l@users.noreply.github.com>
|
2026-01-02 15:13:15 -08:00 |
|
Alfred
|
a0e9ee83c7
|
[Benchmark] Fix OOM during MoE kernel tuning for large models (#31604)
Signed-off-by: Alfred <massif0601@gmail.com>
|
2026-01-02 22:24:51 +00:00 |
|
Yongye Zhu
|
a3f2f40947
|
[MoE Refactor] Explicit construct mk for flashinfer bf16 kernel (#31504)
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2026-01-02 13:54:50 -08:00 |
|
Yongye Zhu
|
5a468ff7c7
|
[MoE Refactor] Split invoke_fused_moe_kernel (#31050)
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2026-01-02 13:47:15 -08:00 |
|
Andreas Karatzas
|
6ef770df7c
|
[MoE] Fix output_shape calculation in Attention layer to handle 3D query inputs (#31596)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-01-02 15:46:23 +00:00 |
|
Nick Hill
|
bd877162eb
|
[BugFix] Support online dense model DP without overhead (#30739)
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: njhill <nickhill123@gmail.com>
|
2026-01-02 23:36:38 +08:00 |
|
Xinyu Chen
|
08f425bad1
|
CustomOp: test forward dispatch for grouped_topk (#31530)
Signed-off-by: Xinyu Chen <xinyu1.chen@intel.com>
|
2026-01-02 10:04:01 -05:00 |
|
labAxiaoming
|
a01f2faedf
|
Add multimodal input method in the documentation (#31601)
Signed-off-by: xiaoming <1259730330@qq.com>
|
2026-01-02 12:43:30 +00:00 |
|
Kyuyeun Kim
|
cc410e8644
|
[Bugfix] Fix weight_loader v1 block scale (#31103)
Signed-off-by: Kyuyeun Kim <kyuyeunk@google.com>
|
2026-01-02 13:14:10 +08:00 |
|
Kevin McKay
|
825c2dc133
|
[Bugfix][Hardware][AMD] Fix last_page_len calculation in AITER MLA decode (#31282)
Signed-off-by: c0de128 <kevin.mckay@outlook.com>
|
2026-01-01 21:14:00 -08:00 |
|
Vaibhav Sourirajan
|
1f43c121d5
|
Remove unused use_marlin variable in Mxfp4MoEMethod (#31549)
Signed-off-by: vaibhav sourirajan <vs2787@columbia.edu>
|
2026-01-01 21:13:36 -08:00 |
|
Tmn07
|
ca179d0f64
|
[Bugfix] Fix activation quantization for compressed-tensors W4A16 (#31572)
Signed-off-by: Tmn07 <tmn0796@gmail.com>
|
2026-01-01 21:13:22 -08:00 |
|
Andreas Karatzas
|
013b54088c
|
[ROCm][CI] Fix ModernBERT token classification test (#31612)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-01-02 04:19:08 +00:00 |
|
Jay Hemnani
|
5ac55eb30f
|
[Model] Enable LoRA support for tower and connector in LLaVA (#31513)
Signed-off-by: Jay Hemnani <jayhemnani9910@gmail.com>
Co-authored-by: Jay Hemnani <jayhemnani9910@gmail.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
|
2026-01-01 19:32:39 -08:00 |
|
Benjamin Chislett
|
ea53ca5e85
|
[Bugfix] Fix block size used in EAGLE slot mapping (#31540)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
|
2026-01-01 19:32:30 -08:00 |
|
zhima771
|
27864a851c
|
feat: support LoRA for DeepSeek-OCR(Language Model part) (#31569)
Signed-off-by: zhima771 <15836938703@163.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
|
2026-01-01 19:32:11 -08:00 |
|
Andreas Karatzas
|
5cc4876630
|
[ROCm][CI] Fix failure in Language Models Tests (Extra Standard) by reducing agent pool size (#31553)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-01-01 19:29:42 -08:00 |
|
Kevin McKay
|
5fff44064b
|
[Bugfix] Replace BaseException with specific exceptions in FLA utils (#31590)
Signed-off-by: c0de128 <kevin.mckay@outlook.com>
|
2026-01-01 19:27:54 -08:00 |
|
Reagan Lee
|
1f5b7c41c3
|
Add Multimodal Processor Benchmark (#29105)
Signed-off-by: Reagan Lee <reaganjlee@gmail.com>
Signed-off-by: Reagan <reaganjlee@gmail.com>
|
2026-01-01 19:26:53 -08:00 |
|
Ekagra Ranjan
|
adcf682fc7
|
[Audio] Improve Audio Inference Scripts (offline/online) (#29279)
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
|
2025-12-31 23:34:18 +00:00 |
|
Andreas Karatzas
|
21de6d4b02
|
[CI][Bugfix] Fix token counting in chunked prefill streaming test (#31565)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2025-12-31 23:05:14 +00:00 |
|
Nick Hill
|
6c2cfb62ff
|
[BugFix] Fix async scheduling for pooling models (#31584)
Signed-off-by: njhill <nickhill123@gmail.com>
|
2025-12-31 14:48:51 -08:00 |
|
Fanjiang Ye
|
d8da76f3b7
|
[Bugfix] Fix BAGEL online serving for text and image understanding (#31546)
Signed-off-by: Dylan1229 <yvanphys@gmail.com>
Signed-off-by: UED <zxr3611244710@gmail.com>
Signed-off-by: mr-ye-cao <yecaoyc2019@gmail.com>
Co-authored-by: UED <zxr3611244710@gmail.com>
Co-authored-by: mr-ye-cao <yecaoyc2019@gmail.com>
Co-authored-by: Mr-Ye-Cao <60802056+Mr-Ye-Cao@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-12-31 14:46:10 -08:00 |
|
baonudesifeizhai
|
d722e9e614
|
Add GLM-ASR multimodal support (#31436)
Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com>
Signed-off-by: baonudesifeizhai <85092850+baonudesifeizhai@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-12-31 23:12:24 +08:00 |
|
Andreas Karatzas
|
cf16342d43
|
[ROCm][CI] Update MiniCPM model test: MiniCPM3-4B to MiniCPM4.1-8B and simplify attention backend testing (#31551)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2025-12-31 00:12:01 -08:00 |
|
Wentao Ye
|
357d435c54
|
[Bug] Fix log issue with \n (#31390)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2025-12-30 21:16:55 -08:00 |
|
danisereb
|
108a2728f7
|
Add get_expert_mapping to NemotronHModel (for LoRA support) (#31539)
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>
|
2025-12-30 21:09:03 -08:00 |
|
TJian
|
578c8f51f6
|
[CI] [Critical] [CUDA] Fix duplicated test name (#31562)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
|
2025-12-30 21:01:09 -08:00 |
|
maang-h
|
b4bb5f312f
|
[Core] Remove unused num_tokens parameter from _init_model_kwargs (#31517)
Signed-off-by: maang <maang_h@163.com>
|
2025-12-30 20:47:23 -08:00 |
|
SameerAsal
|
70e1acefcd
|
[BugFix] Fix NUMA node validation in CPU platform (#31520)
Signed-off-by: SameerAsal <SameerAsal@users.noreply.github.com>
Co-authored-by: SameerAsal <SameerAsal@users.noreply.github.com>
|
2025-12-31 04:06:49 +00:00 |
|
Qiu
|
84f6cd741b
|
[Mics] add pcp basic support to MoE model (#31003)
|
2025-12-30 20:01:29 -08:00 |
|
B-201
|
ecd49ce7e6
|
[Fix] Align fused moe lora_b shape with peft (#31534)
Signed-off-by: bk-201 <joy25810@foxmail.com>
|
2025-12-31 09:44:59 +08:00 |
|
Amr Mahdi
|
e1ee11b2a5
|
Add docker buildx bake configuration (#31477)
Signed-off-by: Amr Mahdi <amrmahdi@meta.com>
|
2025-12-31 01:08:54 +00:00 |
|
vintipandey
|
04147dcfa7
|
[Bugfix]Fix pooling model always disabled due to incorrect PP rank check (#31505)
Signed-off-by: vintipandey <vinti.pandey@gmail.com>
|
2025-12-30 11:27:10 -08:00 |
|
JartX
|
07728bf5cd
|
[BugFix] add select_gemm_impl on CompressedTensorsWNA16MoEMethod to support LoRA (#31453)
Signed-off-by: JartX <sagformas@epdcenter.es>
|
2025-12-30 11:20:15 -08:00 |
|
yt0428
|
3f52fa5aa2
|
[Model] Add support for openPangu moe model (#28775)
Signed-off-by: yuantao <2422264527@qq.com>
Signed-off-by: yt0428 <51468697+yt0428@users.noreply.github.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-12-30 08:11:38 -08:00 |
|
Li, Jiang
|
7157596103
|
[CPU] Disable async schedule on CPU (#31525)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-12-30 12:34:08 +00:00 |
|
Nicolò Lucchesi
|
ab1af6aa3e
|
[CI][NIXL] Split DPEP tests (#31491)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-12-30 07:26:12 -05:00 |
|
Pleaplusone
|
1a834df2d4
|
[ROCm][Bugfix] Fix accuracy issue on fmoe when VLLM_ROCM_USE_AITER_FUSION_SHARED_EXPERTS enabled (#31523)
Signed-off-by: ganyi <ygan@amd.com>
|
2025-12-30 09:21:49 +00:00 |
|
Kevin
|
51085c2aeb
|
[Frontend] add continue_final_message parameter to /embeddings endpoint (#31497)
Signed-off-by: Kevin P-W <140451262+kevin-pw@users.noreply.github.com>
|
2025-12-30 07:21:13 +00:00 |
|
Roger Feng
|
3d973764ce
|
[xpu] [bugfix] upgrade to latest oneccl in dockerfile (#31522)
Signed-off-by: roger feng <roger.feng@intel.com>
|
2025-12-30 14:52:28 +08:00 |
|