Ekagra Ranjan
|
adcf682fc7
|
[Audio] Improve Audio Inference Scripts (offline/online) (#29279)
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>
|
2025-12-31 23:34:18 +00:00 |
|
Andreas Karatzas
|
21de6d4b02
|
[CI][Bugfix] Fix token counting in chunked prefill streaming test (#31565)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2025-12-31 23:05:14 +00:00 |
|
Nick Hill
|
6c2cfb62ff
|
[BugFix] Fix async scheduling for pooling models (#31584)
Signed-off-by: njhill <nickhill123@gmail.com>
|
2025-12-31 14:48:51 -08:00 |
|
Fanjiang Ye
|
d8da76f3b7
|
[Bugfix] Fix BAGEL online serving for text and image understanding (#31546)
Signed-off-by: Dylan1229 <yvanphys@gmail.com>
Signed-off-by: UED <zxr3611244710@gmail.com>
Signed-off-by: mr-ye-cao <yecaoyc2019@gmail.com>
Co-authored-by: UED <zxr3611244710@gmail.com>
Co-authored-by: mr-ye-cao <yecaoyc2019@gmail.com>
Co-authored-by: Mr-Ye-Cao <60802056+Mr-Ye-Cao@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-12-31 14:46:10 -08:00 |
|
baonudesifeizhai
|
d722e9e614
|
Add GLM-ASR multimodal support (#31436)
Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com>
Signed-off-by: baonudesifeizhai <85092850+baonudesifeizhai@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-12-31 23:12:24 +08:00 |
|
Andreas Karatzas
|
cf16342d43
|
[ROCm][CI] Update MiniCPM model test: MiniCPM3-4B to MiniCPM4.1-8B and simplify attention backend testing (#31551)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2025-12-31 00:12:01 -08:00 |
|
Wentao Ye
|
357d435c54
|
[Bug] Fix log issue with \n (#31390)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2025-12-30 21:16:55 -08:00 |
|
danisereb
|
108a2728f7
|
Add get_expert_mapping to NemotronHModel (for LoRA support) (#31539)
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>
|
2025-12-30 21:09:03 -08:00 |
|
TJian
|
578c8f51f6
|
[CI] [Critical] [CUDA] Fix duplicated test name (#31562)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
|
2025-12-30 21:01:09 -08:00 |
|
maang-h
|
b4bb5f312f
|
[Core] Remove unused num_tokens parameter from _init_model_kwargs (#31517)
Signed-off-by: maang <maang_h@163.com>
|
2025-12-30 20:47:23 -08:00 |
|
SameerAsal
|
70e1acefcd
|
[BugFix] Fix NUMA node validation in CPU platform (#31520)
Signed-off-by: SameerAsal <SameerAsal@users.noreply.github.com>
Co-authored-by: SameerAsal <SameerAsal@users.noreply.github.com>
|
2025-12-31 04:06:49 +00:00 |
|
Qiu
|
84f6cd741b
|
[Misc] add pcp basic support to MoE model (#31003)
|
2025-12-30 20:01:29 -08:00 |
|
B-201
|
ecd49ce7e6
|
[Fix] Align fused moe lora_b shape with peft (#31534)
Signed-off-by: bk-201 <joy25810@foxmail.com>
|
2025-12-31 09:44:59 +08:00 |
|
Amr Mahdi
|
e1ee11b2a5
|
Add docker buildx bake configuration (#31477)
Signed-off-by: Amr Mahdi <amrmahdi@meta.com>
|
2025-12-31 01:08:54 +00:00 |
|
vintipandey
|
04147dcfa7
|
[Bugfix]Fix pooling model always disabled due to incorrect PP rank check (#31505)
Signed-off-by: vintipandey <vinti.pandey@gmail.com>
|
2025-12-30 11:27:10 -08:00 |
|
JartX
|
07728bf5cd
|
[BugFix] add select_gemm_impl on CompressedTensorsWNA16MoEMethod to support LoRA (#31453)
Signed-off-by: JartX <sagformas@epdcenter.es>
|
2025-12-30 11:20:15 -08:00 |
|
yt0428
|
3f52fa5aa2
|
[Model] Add support for openPangu moe model (#28775)
Signed-off-by: yuantao <2422264527@qq.com>
Signed-off-by: yt0428 <51468697+yt0428@users.noreply.github.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-12-30 08:11:38 -08:00 |
|
Li, Jiang
|
7157596103
|
[CPU] Disable async schedule on CPU (#31525)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-12-30 12:34:08 +00:00 |
|
Nicolò Lucchesi
|
ab1af6aa3e
|
[CI][NIXL] Split DPEP tests (#31491)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-12-30 07:26:12 -05:00 |
|
Pleaplusone
|
1a834df2d4
|
[ROCm][Bugfix] Fix accuracy issue on fmoe when VLLM_ROCM_USE_AITER_FUSION_SHARED_EXPERTS enabled (#31523)
Signed-off-by: ganyi <ygan@amd.com>
|
2025-12-30 09:21:49 +00:00 |
|
Kevin
|
51085c2aeb
|
[Frontend] add continue_final_message parameter to /embeddings endpoint (#31497)
Signed-off-by: Kevin P-W <140451262+kevin-pw@users.noreply.github.com>
|
2025-12-30 07:21:13 +00:00 |
|
Roger Feng
|
3d973764ce
|
[xpu] [bugfix] upgrade to latest oneccl in dockerfile (#31522)
Signed-off-by: roger feng <roger.feng@intel.com>
|
2025-12-30 14:52:28 +08:00 |
|
Nick Hill
|
3b312fb792
|
[Minor] Various small code cleanups/simplifications (#31508)
Signed-off-by: njhill <nickhill123@gmail.com>
|
2025-12-29 22:42:06 -08:00 |
|
ZT-AIA
|
f84bf7d79b
|
Add Loraconfig parameter to get_punica_wrapper function (#31408)
Signed-off-by: ZT-AIA <1028681969@qq.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-12-29 22:27:31 -08:00 |
|
Roy Wang
|
99dcf5dcc5
|
Migrate meetups & sponsors [2/N] (#31500)
Signed-off-by: esmeetu <jasonailu87@gmail.com>
|
2025-12-30 04:26:15 +00:00 |
|
Hojin Yang
|
dc837bc23e
|
feat(frontend): add --default-chat-template-kwargs CLI argument (#31343)
Signed-off-by: effortprogrammer <yhjhoward7@gmail.com>
|
2025-12-30 03:38:47 +00:00 |
|
Nick Hill
|
e54ee3ea33
|
[Core] Deduplicate generate/encode logic in AsyncLLM (#31510)
Signed-off-by: njhill <nickhill123@gmail.com>
|
2025-12-30 10:42:45 +08:00 |
|
wangln19
|
358bfd315c
|
fix: update kimi k2 tool parser logic (#31207)
Signed-off-by: wangln19 <wanglinian@dev.wanglinian.msh-dev.svc.cluster.local>
Signed-off-by: Wang Linian <wanglinian@stu.pku.edu.cn>
Co-authored-by: wangln19 <wanglinian@dev.wanglinian.msh-dev.svc.cluster.local>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
|
2025-12-30 10:01:58 +08:00 |
|
Sage
|
39512aba72
|
[Prefix Cache] Include lora_name in BlockStored event for deterministic KV-cache reconstruction (#27577)
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
Co-authored-by: Sage <80211083+sagiahrac@users.noreply.github.com>
|
2025-12-30 00:17:16 +00:00 |
|
qli88
|
0f35429a0c
|
[CI]Test Group 'NixlConnector PD accuracy tests' is fixed (#31460)
Signed-off-by: qli88 <qiang.li2@amd.com>
|
2025-12-29 23:48:56 +00:00 |
|
Alexei-V-Ivanov-AMD
|
d63b969675
|
[CI/ROCm] Fixing "V1 Test attention (H100)" test group. (#31187)
Signed-off-by: DCCS-4560 <alivanov@chi-mi325x-pod1-108.ord.vultr.cpe.ice.amd.com>
Co-authored-by: DCCS-4560 <alivanov@chi-mi325x-pod1-108.ord.vultr.cpe.ice.amd.com>
Co-authored-by: root <root@chi-mi325x-pod1-108.ord.vultr.cpe.ice.amd.com>
|
2025-12-29 16:53:59 -05:00 |
|
Robert Shaw
|
56f516254c
|
[Bugfix][ROCm] Fix Static Quant Issue (#31502)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
|
2025-12-29 13:27:55 -08:00 |
|
Robert Shaw
|
9152a30d8f
|
[MoE Refactor][12/N] Marlin Fp8 MoE Pure Function (#31499)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
|
2025-12-29 13:27:00 -08:00 |
|
Nick Hill
|
c2ff33cc8c
|
[Core] Enable async scheduling by default (#27614)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2025-12-29 13:20:55 -07:00 |
|
chunxiaozheng
|
b12cb38398
|
implements register kv caches in lmcache connector (#31397)
Signed-off-by: idellzheng <idellzheng@tencent.com>
|
2025-12-29 11:13:42 -08:00 |
|
Roger Young
|
5bc664110f
|
Optimize QKNorm for MiniMax-M2/M2.1 (#31493)
Signed-off-by: xuebi <xuebi@minimaxi.com>
Co-authored-by: xuebi <xuebi@minimaxi.com>
|
2025-12-29 16:30:18 +00:00 |
|
RickyChen / 陳昭儒
|
b3a2bdf1ac
|
[Feature] Add offline FastAPI documentation support for air-gapped environments (#30184)
Signed-off-by: rickychen-infinirc <ricky.chen@infinirc.com>
Signed-off-by: RickyChen / 陳昭儒 <ricky.chen@infinirc.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-12-29 16:22:39 +00:00 |
|
Harry Mellor
|
e37e7349e6
|
Replace nn.ConvNd with vLLM's ConvNdLayer for Transformers modeling backend (#31498)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-12-29 16:20:01 +00:00 |
|
Roy Wang
|
b5d2d71d26
|
Migrate doc to website: Hardware Plugins (1/N) (#31496)
Signed-off-by: esmeetu <jasonailu87@gmail.com>
|
2025-12-29 15:55:20 +00:00 |
|
Harry Mellor
|
decc244767
|
[Docs] Use relative md links instead of absolute html links for cross referencing (#31494)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-12-29 13:33:44 +00:00 |
|
amittell
|
9c884faa95
|
[Bugfix] Preserve tool call id/type/name in streaming finish chunk (#31438)
Signed-off-by: amittell <mittell@me.com>
Signed-off-by: Alex Mittell <mittell@me.com>
|
2025-12-29 21:10:52 +08:00 |
|
Chauncey
|
48d5ca4e8b
|
[CI] fix test_chat_truncation_content_not_null test (#31488)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-12-29 12:47:08 +00:00 |
|
twj
|
bf73a3e4d7
|
[Bugfix][Frontend] Fix Jina reranker multimodal input compatibility (#31445)
Signed-off-by: tianwenjing <tianwenjing@jfgenius.com>
Signed-off-by: twj <151701930+twjww@users.noreply.github.com>
Co-authored-by: tianwenjing <tianwenjing@jfgenius.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-12-29 01:13:18 -08:00 |
|
Andreas Karatzas
|
3ecfdc3776
|
[ROCm][GPTQ][Bugfix] Fix GPTQ GEMM kernel output zeroing race condition (#30719)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2025-12-29 01:13:14 -08:00 |
|
Andreas Karatzas
|
45c1ca1ca1
|
[ROCm][CI] Skip DeepGemm-dependent test on ROCm platform (#31462)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2025-12-29 16:31:10 +09:00 |
|
Li, Jiang
|
17347daaa2
|
[CI/Build][CPU] Update CPU CI test cases (#31466)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-12-29 14:17:52 +08:00 |
|
Mamy Ratsimbazafy
|
b9793e6a8c
|
Add Fused MoE Triton kernels for GLM-4.5-Air, GLM-4.5v, GLM-4.6v on 2x RTX Pro 6000 (#31407)
Signed-off-by: Mamy Ratsimbazafy <mamy_github@numforge.co>
|
2025-12-28 08:38:33 -08:00 |
|
Jzz1943
|
0b6b701050
|
[Model] Add tuned triton fused_moe configs for Qwen3Moe on B200 (#31448)
Signed-off-by: Zhongze Jiang <jiangzhongze.jzz@ant-intl.com>
|
2025-12-28 08:38:07 -08:00 |
|
Nick Hill
|
094fcce250
|
[BugFix] Re-fix async multimodal cpu tensor race condition (#31373)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: njhill <nickhill123@gmail.com>
|
2025-12-28 03:05:08 -08:00 |
|
Andreas Karatzas
|
573dd0e6f0
|
[ROCm] Migrate xgrammar to upstream release (#31327)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2025-12-28 00:08:29 -08:00 |
|