Robert Shaw
|
af8fd73051
|
[MoE Refactor][14/N] Clean Up FI Quant Config Smuggling (#31593)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
|
2026-01-06 15:47:04 +00:00 |
|
Robert Shaw
|
d3e477c013
|
[MoE Refactor] Add Temporary Integration Tests - H100/B200 (#31759)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
|
2026-01-06 10:34:17 -05:00 |
|
wang.yuqi
|
96860af655
|
[Model] rename use_pad_token to use_sep_token (#31784)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2026-01-06 14:16:04 +00:00 |
|
Lucas Wilkinson
|
e0327c9db2
|
[Attention][1/n] Remove usage of deprecated seq_lens_cpu and num_computed_tokens_cpu CommonAttentionMetadata properties (#31773)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2026-01-06 04:05:17 -08:00 |
|
wang.yuqi
|
43d384bab4
|
[CI] Increase the MTEB_EMBED_TOL threshold to 5e-4. (#31797)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2026-01-06 19:30:05 +08:00 |
|
Isotr0py
|
ee2e69d6cd
|
[Bugfix][CI/Build] Fix failing pooling models test due to Triton kernel accuracy diff (#31776)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-01-06 00:44:22 -08:00 |
|
Kevin McKay
|
1fb0209bbc
|
[Bugfix][Hardware][AMD] Fix exception types in AITER MLA FP8 check (#31177)
Signed-off-by: c0de128 <kevin.mckay@outlook.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
|
2026-01-06 14:10:59 +08:00 |
|
John Calderon
|
2f4e6548ef
|
[Bugfix] vLLM produces invalid UTF-8 tokens and “�” (#28874)
Signed-off-by: John Calderon <jcalderon@nvidia.com>
Co-authored-by: Benjamin Chislett <bchislett@nvidia.com>
|
2026-01-06 00:23:00 +00:00 |
|
Wentao Ye
|
af9a7ec255
|
[Bug] Revert torch warning fix (#31585)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-01-05 22:31:21 +00:00 |
|
Matthew Bonanni
|
276e03b92c
|
[CI][DeepSeek] Add nightly DeepSeek R1 lm_eval tests on H200 (#30356)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2026-01-05 17:17:59 -05:00 |
|
Nick Hill
|
32f4e4db00
|
[Cleanup] Remove deprecated fields from CachedRequestData class (#31734)
Signed-off-by: njhill <nickhill123@gmail.com>
|
2026-01-05 21:07:14 +00:00 |
|
amitz-nv
|
ee21291825
|
[Model] Nemotron Parse 1.1 Support (#30864)
Signed-off-by: amitz-nv <203509407+amitz-nv@users.noreply.github.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2026-01-05 13:00:14 -08:00 |
|
Isotr0py
|
51e38a8e30
|
[Misc] Enable Paligemma's PrefixLM attention mask computation (#31725)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-01-06 03:31:49 +08:00 |
|
Or Ozeri
|
d8e38d4939
|
Triton Attention: Support cross-layers blocks (#30687)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
|
2026-01-05 19:29:16 +00:00 |
|
Isotr0py
|
6aa5b18e1d
|
[v1] Add encoder-only/cross attention support to Triton Attention backend (#31406)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-01-06 00:00:23 +08:00 |
|
wang.yuqi
|
911d38ed99
|
[Model] Let more models to support the score template. (#31335)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2026-01-05 11:54:26 +00:00 |
|
wangxiyuan
|
bb4337b34c
|
[Platform] Deprecate seed_everything (#31659)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
|
2026-01-04 18:34:04 -08:00 |
|
Isotr0py
|
367856de14
|
[CI/Build] Revive skipped reward models e2e test (#31665)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-01-05 02:33:46 +00:00 |
|
Andreas Karatzas
|
f2b6dfd237
|
[ROCm][CI] Fix language generation test accuracy by disabling HF flash_sdp and mem_efficient_sdp (#31597)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-01-05 02:17:05 +00:00 |
|
Andreas Karatzas
|
89f1f25310
|
[CI] Skip Phi-MoE test due to old API util (#31632)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-01-05 08:52:07 +08:00 |
|
Andreas Karatzas
|
4f9ce35afe
|
[CI][Bugfix] Fix token counting in chunked prefill compl test (#31630)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-01-03 14:28:49 +08:00 |
|
jeremyteboul
|
97a01308e9
|
Improve HF qwen3_omni: preserve audio_sample_rate in kwargs restructuring (#29255)
Signed-off-by: Jeremy Teboul <jeremyteboul@fb.com>
Co-authored-by: Jeremy Teboul <jeremyteboul@fb.com>
|
2026-01-03 04:31:09 +00:00 |
|
Xingyu Liu
|
0eee877f67
|
[Core] Parse vLLM engine required fields from hf_config to model_arch_config (#28454)
Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com>
Signed-off-by: Xingyu Liu <38244988+charlotte12l@users.noreply.github.com>
|
2026-01-02 15:13:15 -08:00 |
|
Nick Hill
|
bd877162eb
|
[BugFix] Support online dense model DP without overhead (#30739)
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: njhill <nickhill123@gmail.com>
|
2026-01-02 23:36:38 +08:00 |
|
Xinyu Chen
|
08f425bad1
|
CustomOp: test forward dispatch for grouped_topk (#31530)
Signed-off-by: Xinyu Chen <xinyu1.chen@intel.com>
|
2026-01-02 10:04:01 -05:00 |
|
Andreas Karatzas
|
013b54088c
|
[ROCm][CI] Fix ModernBERT token classification test (#31612)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-01-02 04:19:08 +00:00 |
|
Andreas Karatzas
|
21de6d4b02
|
[CI][Bugfix] Fix token counting in chunked prefill streaming test (#31565)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2025-12-31 23:05:14 +00:00 |
|
baonudesifeizhai
|
d722e9e614
|
Add GLM-ASR multimodal support (#31436)
Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com>
Signed-off-by: baonudesifeizhai <85092850+baonudesifeizhai@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-12-31 23:12:24 +08:00 |
|
Andreas Karatzas
|
cf16342d43
|
[ROCm][CI] Update MiniCPM model test: MiniCPM3-4B to MiniCPM4.1-8B and simplify attention backend testing (#31551)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2025-12-31 00:12:01 -08:00 |
|
B-201
|
ecd49ce7e6
|
[Fix] Align fused moe lora_b shape with peft (#31534)
Signed-off-by: bk-201 <joy25810@foxmail.com>
|
2025-12-31 09:44:59 +08:00 |
|
yt0428
|
3f52fa5aa2
|
[Model] Add support for openPangu moe model (#28775)
Signed-off-by: yuantao <2422264527@qq.com>
Signed-off-by: yt0428 <51468697+yt0428@users.noreply.github.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-12-30 08:11:38 -08:00 |
|
Nicolò Lucchesi
|
ab1af6aa3e
|
[CI][NIXL] Split DPEP tests (#31491)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-12-30 07:26:12 -05:00 |
|
ZT-AIA
|
f84bf7d79b
|
Add Loraconfig parameter to get_punica_wrapper function (#31408)
Signed-off-by: ZT-AIA <1028681969@qq.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-12-29 22:27:31 -08:00 |
|
Hojin Yang
|
dc837bc23e
|
feat(frontend): add --default-chat-template-kwargs CLI argument (#31343)
Signed-off-by: effortprogrammer <yhjhoward7@gmail.com>
|
2025-12-30 03:38:47 +00:00 |
|
wangln19
|
358bfd315c
|
fix: update kimi k2 tool parser logic (#31207)
Signed-off-by: wangln19 <wanglinian@dev.wanglinian.msh-dev.svc.cluster.local>
Signed-off-by: Wang Linian <wanglinian@stu.pku.edu.cn>
Co-authored-by: wangln19 <wanglinian@dev.wanglinian.msh-dev.svc.cluster.local>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
|
2025-12-30 10:01:58 +08:00 |
|
Sage
|
39512aba72
|
[Prefix Cache] Include lora_name in BlockStored event for deterministic KV-cache reconstruction (#27577)
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
Co-authored-by: Sage <80211083+sagiahrac@users.noreply.github.com>
|
2025-12-30 00:17:16 +00:00 |
|
Alexei-V-Ivanov-AMD
|
d63b969675
|
[CI/ROCm] Fixing "V1 Test attention (H100)" test group. (#31187)
Signed-off-by: DCCS-4560 <alivanov@chi-mi325x-pod1-108.ord.vultr.cpe.ice.amd.com>
Co-authored-by: DCCS-4560 <alivanov@chi-mi325x-pod1-108.ord.vultr.cpe.ice.amd.com>
Co-authored-by: root <root@chi-mi325x-pod1-108.ord.vultr.cpe.ice.amd.com>
|
2025-12-29 16:53:59 -05:00 |
|
amittell
|
9c884faa95
|
[Bugfix] Preserve tool call id/type/name in streaming finish chunk (#31438)
Signed-off-by: amittell <mittell@me.com>
Signed-off-by: Alex Mittell <mittell@me.com>
|
2025-12-29 21:10:52 +08:00 |
|
Chauncey
|
48d5ca4e8b
|
[CI] fix test_chat_truncation_content_not_null test (#31488)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-12-29 12:47:08 +00:00 |
|
twj
|
bf73a3e4d7
|
[Bugfix][Frontend] Fix Jina reranker multimodal input compatibility (#31445)
Signed-off-by: tianwenjing <tianwenjing@jfgenius.com>
Signed-off-by: twj <151701930+twjww@users.noreply.github.com>
Co-authored-by: tianwenjing <tianwenjing@jfgenius.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-12-29 01:13:18 -08:00 |
|
Andreas Karatzas
|
45c1ca1ca1
|
[ROCm][CI] Skip DeepGemm-dependent test on ROCm platform (#31462)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2025-12-29 16:31:10 +09:00 |
|
Boyuan Feng
|
2f12cd32c0
|
[BugFix] Fix cache issue in compilation_config (#31376)
Signed-off-by: Boyuan Feng <boyuan@meta.com>
|
2025-12-27 09:30:39 -05:00 |
|
Isotr0py
|
3d024985ab
|
[CI/Build] Ignore max transformers version for more common tests (#31401)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-12-27 13:06:26 +00:00 |
|
baonudesifeizhai
|
8711b21676
|
Fix/get raw stream patch #30905 (#30912)
Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com>
Signed-off-by: baonudesifeizhai <85092850+baonudesifeizhai@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2025-12-26 20:08:47 -08:00 |
|
Yifan Qiao
|
52bf066516
|
[Core][Hybrid allocator + connector] Support hybrid allocator + kv cache connector (#30166)
Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu>
Co-authored-by: KuntaiDu <kuntai@uchicago.edu>
|
2025-12-26 18:25:46 -08:00 |
|
Kunshang Ji
|
5326c89803
|
[XPU][CI]skip test_preprocess_error_handling due to fork/spawn issue (#31381)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
|
2025-12-26 21:40:44 +00:00 |
|
Jee Jee Li
|
ce1eafd1a5
|
[Core] Initialize LoRA support for tower and connector in multi-modal models (#26674)
Signed-off-by: bk-201 <joy25810@foxmail.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: prashanth058 <prashanth.dannamaneni@uipath.com>
Co-authored-by: bk-201 <joy25810@foxmail.com>
Co-authored-by: prashanth058 <prashanth.dannamaneni@uipath.com>
Co-authored-by: Anexdeus <5142168@mail.ru>
|
2025-12-26 04:48:20 -08:00 |
|
Andreas Karatzas
|
c79dbfa9ad
|
[CI] Fix flaky vision beam search test with flexible semantic validation (#31324)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2025-12-26 04:39:32 +00:00 |
|
Isotr0py
|
2cd94259c8
|
[CI/Build] Ignore max transformers version skipping for initialization tests (#30619)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-12-26 10:50:32 +08:00 |
|
oscardev256
|
b7165d53c6
|
Feature/isaac 0.1 (#28367)
Signed-off-by: oscardev256 <42308241+oscardev256@users.noreply.github.com>
Signed-off-by: Oscar Gonzalez <ogonzal6@alumni.jh.edu>
Signed-off-by: Yang <lymailforjob@gmail.com>
Co-authored-by: Yang <lymailforjob@gmail.com>
|
2025-12-25 18:49:11 -08:00 |
|