wangln19 | 358bfd315c | 2025-12-30 10:01:58 +08:00
fix: update kimi k2 tool parser logic (#31207)
Signed-off-by: wangln19 <wanglinian@dev.wanglinian.msh-dev.svc.cluster.local>
Signed-off-by: Wang Linian <wanglinian@stu.pku.edu.cn>
Co-authored-by: wangln19 <wanglinian@dev.wanglinian.msh-dev.svc.cluster.local>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>

Sage | 39512aba72 | 2025-12-30 00:17:16 +00:00
[Prefix Cache] Include lora_name in BlockStored event for deterministic KV-cache reconstruction (#27577)
Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>
Co-authored-by: Sage <80211083+sagiahrac@users.noreply.github.com>

qli88 | 0f35429a0c | 2025-12-29 23:48:56 +00:00
[CI]Test Group 'NixlConnector PD accuracy tests' is fixed (#31460)
Signed-off-by: qli88 <qiang.li2@amd.com>

Alexei-V-Ivanov-AMD | d63b969675 | 2025-12-29 16:53:59 -05:00
[CI/ROCm] Fixing "V1 Test attention (H100)" test group. (#31187)
Signed-off-by: DCCS-4560 <alivanov@chi-mi325x-pod1-108.ord.vultr.cpe.ice.amd.com>
Signed-off-by: <>
Co-authored-by: DCCS-4560 <alivanov@chi-mi325x-pod1-108.ord.vultr.cpe.ice.amd.com>
Co-authored-by: root <root@chi-mi325x-pod1-108.ord.vultr.cpe.ice.amd.com>

Robert Shaw | 56f516254c | 2025-12-29 13:27:55 -08:00
[Bugfix][ROCm] Fix Static Quant Issue (#31502)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>

Robert Shaw | 9152a30d8f | 2025-12-29 13:27:00 -08:00
[MoE Refactor][12/N] Marlin Fp8 MoE Pure Function (#31499)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>

Nick Hill | c2ff33cc8c | 2025-12-29 13:20:55 -07:00
[Core] Enable async scheduling by default (#27614)
Signed-off-by: Nick Hill <nickhill123@gmail.com>

chunxiaozheng | b12cb38398 | 2025-12-29 11:13:42 -08:00
implements register kv caches in lmcache connector (#31397)
Signed-off-by: idellzheng <idellzheng@tencent.com>

Roger Young | 5bc664110f | 2025-12-29 16:30:18 +00:00
Optimize QKNorm for MiniMax-M2/M2.1 (#31493)
Signed-off-by: xuebi <xuebi@minimaxi.com>
Co-authored-by: xuebi <xuebi@minimaxi.com>

RickyChen / 陳昭儒 | b3a2bdf1ac | 2025-12-29 16:22:39 +00:00
[Feature] Add offline FastAPI documentation support for air-gapped environments (#30184)
Signed-off-by: rickychen-infinirc <ricky.chen@infinirc.com>
Signed-off-by: RickyChen / 陳昭儒 <ricky.chen@infinirc.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

Harry Mellor | e37e7349e6 | 2025-12-29 16:20:01 +00:00
Replace nn.ConvNd with vLLM's ConvNdLayer for Transformers modeling backend (#31498)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

Roy Wang | b5d2d71d26 | 2025-12-29 15:55:20 +00:00
Migrate doc to website: Hardware Plugins (1/N) (#31496)
Signed-off-by: esmeetu <jasonailu87@gmail.com>

Harry Mellor | decc244767 | 2025-12-29 13:33:44 +00:00
[Docs] Use relative md links instead of absolute html links for cross referencing (#31494)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

amittell | 9c884faa95 | 2025-12-29 21:10:52 +08:00
[Bugfix] Preserve tool call id/type/name in streaming finish chunk (#31438)
Signed-off-by: amittell <mittell@me.com>
Signed-off-by: Alex Mittell <mittell@me.com>

Chauncey | 48d5ca4e8b | 2025-12-29 12:47:08 +00:00
[CI] fix test_chat_truncation_content_not_null test (#31488)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>

twj | bf73a3e4d7 | 2025-12-29 01:13:18 -08:00
[Bugfix][Frontend] Fix Jina reranker multimodal input compatibility (#31445)
Signed-off-by: tianwenjing <tianwenjing@jfgenius.com>
Signed-off-by: twj <151701930+twjww@users.noreply.github.com>
Co-authored-by: tianwenjing <tianwenjing@jfgenius.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Andreas Karatzas | 3ecfdc3776 | 2025-12-29 01:13:14 -08:00
[ROCm][GPTQ][Bugfix] Fix GPTQ GEMM kernel output zeroing race condition (#30719)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>

Andreas Karatzas | 45c1ca1ca1 | 2025-12-29 16:31:10 +09:00
[ROCm][CI] Skip DeepGemm-dependent test on ROCm platform (#31462)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>

Li, Jiang | 17347daaa2 | 2025-12-29 14:17:52 +08:00
[CI/Build][CPU] Update CPU CI test cases (#31466)
Signed-off-by: jiang1.li <jiang1.li@intel.com>

Mamy Ratsimbazafy | b9793e6a8c | 2025-12-28 08:38:33 -08:00
Add Fused MoE Triton kernels for GLM-4.5-Air, GLM-4.5v, GLM-4.6v on 2x RTX Pro 6000 (#31407)
Signed-off-by: Mamy Ratsimbazafy <mamy_github@numforge.co>

Jzz1943 | 0b6b701050 | 2025-12-28 08:38:07 -08:00
[Model] Add tuned triton fused_moe configs for Qwen3Moe on B200 (#31448)
Signed-off-by: Zhongze Jiang <jiangzhongze.jzz@ant-intl.com>

Nick Hill | 094fcce250 | 2025-12-28 03:05:08 -08:00
[BugFix] Re-fix async multimodal cpu tensor race condition (#31373)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: njhill <nickhill123@gmail.com>

Andreas Karatzas | 573dd0e6f0 | 2025-12-28 00:08:29 -08:00
[ROCm] Migrate xgrammar to upstream release (#31327)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>

Andreas Karatzas | f70368867e | 2025-12-28 16:06:05 +08:00
[ROCm][CI] Add TorchCodec source build for transcription tests (#31323)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>

Andreas Karatzas | 96142f2094 | 2025-12-28 04:15:14 +00:00
[ROCm][CI] Added perceptron lib in requirements for isaac multi-modal test (#31441)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>

Boyuan Feng | 62def07d67 | 2025-12-28 11:20:02 +08:00
[BugFix] register quant scale tensors as buffer (#31395)
Signed-off-by: Boyuan Feng <boyuan@meta.com>

yitingdc | b326598e97 | 2025-12-28 03:19:47 +00:00
add tip for VLLM_USE_PRECOMPILED arg to reduce docker build time (#31385)
Signed-off-by: yiting.jiang <yiting.jiang@daocloud.io>

Robert Shaw | 727c41f3fd | 2025-12-27 20:22:48 +00:00
[MoE Refactor][10/N] Cleanup Fp8 Process Weights After Loading (#31169)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>

Boyuan Feng | 2f12cd32c0 | 2025-12-27 09:30:39 -05:00
[BugFix] Fix cache issue in compilation_config (#31376)
Signed-off-by: Boyuan Feng <boyuan@meta.com>

Isotr0py | 40a8756224 | 2025-12-27 13:42:02 +00:00
[Chore]: Remove HF format Phi4-MM examples (#31405)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

Isotr0py | 3d024985ab | 2025-12-27 13:06:26 +00:00
[CI/Build] Ignore max transformers version for more common tests (#31401)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

baonudesifeizhai | 8711b21676 | 2025-12-26 20:08:47 -08:00
Fix/get raw stream patch #30905 (#30912)
Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com>
Signed-off-by: baonudesifeizhai <85092850+baonudesifeizhai@users.noreply.github.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>

Yifan Qiao | 52bf066516 | 2025-12-26 18:25:46 -08:00
[Core][Hybrid allocator + connector] Support hybrid allocator + kv cache connector (#30166)
Signed-off-by: Yifan Qiao <yifanqiao@berkeley.edu>
Co-authored-by: KuntaiDu <kuntai@uchicago.edu>

Kunshang Ji | 5326c89803 | 2025-12-26 21:40:44 +00:00
[XPU][CI]skip test_preprocess_error_handling due to fork/spawn issue (#31381)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>

Xinyu Chen | 87f1b8ca2c | 2025-12-26 12:44:29 -05:00
CustomOp: Unify aiter impl into GroupedTopk (#31221)
Signed-off-by: Xinyu Chen <xinyu1.chen@intel.com>

rongfu.leng | 887e900b77 | 2025-12-26 23:48:15 +08:00
[Docs] Add profiler user docs for http request (#31370)
Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>

Patrick von Platen | 48e744976c | 2025-12-26 04:48:24 -08:00
[Mistral common] Ensure all functions are imported from the top & only use public methods (#31138)
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>

Jee Jee Li | ce1eafd1a5 | 2025-12-26 04:48:20 -08:00
[Core] Initialize LoRA support for tower and connector in multi-modal models (#26674)
Signed-off-by: bk-201 <joy25810@foxmail.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: prashanth058 <prashanth.dannamaneni@uipath.com>
Co-authored-by: bk-201 <joy25810@foxmail.com>
Co-authored-by: prashanth058 <prashanth.dannamaneni@uipath.com>
Co-authored-by: Anexdeus <5142168@mail.ru>

Harry Mellor | 0b544e6476 | 2025-12-26 12:47:41 +00:00
[Docs] Fix some snippets (#31378)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

Jee Jee Li | c3666f56fd | 2025-12-26 05:10:39 +00:00
[Misc] Fix Qwen2-MoE shared_expert_gate (#31339)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

Andreas Karatzas | c79dbfa9ad | 2025-12-26 04:39:32 +00:00
[CI] Fix flaky vision beam search test with flexible semantic validation (#31324)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>

Shinichi Hemmi | 9ee05cbe7f | 2025-12-26 11:41:33 +08:00
Support LoRA and GPTQModel for PLaMo 2/3 (#31322)
Signed-off-by: Shinichi Hemmi <50256998+Alnusjaponica@users.noreply.github.com>

Ning Xie | 3b8f31b362 | 2025-12-26 10:55:56 +08:00
[benchmark] use model card root instead of id (#31329)
Signed-off-by: Andy Xie <andy.xning@gmail.com>

Isotr0py | 2cd94259c8 | 2025-12-26 10:50:32 +08:00
[CI/Build] Ignore max transformers version skipping for initialization tests (#30619)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

oscardev256 | b7165d53c6 | 2025-12-25 18:49:11 -08:00
Feature/isaac 0.1 (#28367)
Signed-off-by: oscardev256 <42308241+oscardev256@users.noreply.github.com>
Signed-off-by: Oscar Gonzalez <ogonzal6@alumni.jh.edu>
Signed-off-by: Yang <lymailforjob@gmail.com>
Co-authored-by: Yang <lymailforjob@gmail.com>

Nick Hill | 81786c8774 | 2025-12-25 23:01:02 +00:00
[BugFix] Fix async scheduling + reasoning with struct output (#31332)
Signed-off-by: Nick Hill <nickhill123@gmail.com>

Stan Wozniak | f1531d9f2a | 2025-12-25 20:54:06 +00:00
[Hybrid] Mamba2 prefix cache blocks freeing for running requests (#28047)
Signed-off-by: Stanislaw Wozniak <stw@zurich.ibm.com>
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>

SongHe | 2d6001f491 | 2025-12-25 14:07:15 +00:00
[Model][Ernie4.5-VL] Support video metadata for timestamp rendering (#31274)
Signed-off-by: dengsonghe <dengsonghe@baidu.com>
Co-authored-by: dengsonghe <dengsonghe@baidu.com>

Amir Samani | 030fc44914 | 2025-12-25 19:10:03 +08:00
use the same stream for cuda graph capture and replay for NCCL (#29207)
Signed-off-by: Amir Samani <asamani@nvidia.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>

Isotr0py | 2532f437ee | 2025-12-25 02:26:34 -08:00
[Doc] Add troubleshooting for Triton PTX error about undefined gpu-name (#31338)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>