Commit Graph

13306 Commits

Author SHA1 Message Date
wangln19
2d7053438a fix: preserve native tool call ID in multi-turn tool calling (#32768)
Signed-off-by: wanglinian <wanglinian@stu.pku.edu.cn>
Signed-off-by: wangln19 <96399074+wangln19@users.noreply.github.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Isotr0py <2037008807@qq.com>
2026-01-27 10:22:35 +08:00
Robert Shaw
5a93b9162b [MoE Refactor] Integrate Naive Prepare Finalize into MK (#32567)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: Amir Klein <203507526+amirkl94@users.noreply.github.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: amirkl94 <203507526+amirkl94@users.noreply.github.com>
2026-01-27 01:28:02 +00:00
Woosuk Kwon
6d86fde09c [Model Runner V2] Remove UvaBufferPool for cpu->gpu copy (#33055)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: Nick Hill <nhill@redhat.com>
2026-01-26 16:47:35 -08:00
XiongfeiWei
510ed1e8d3 [Bugfix][TPU] Return a Default fp8 MoE Backend (#32908)
Signed-off-by: Xiongfei Wei <isaacwxf23@gmail.com>
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2026-01-26 18:46:11 -05:00
Pengchao Wang
8caffd92df [Bugfix][MXFP4] Call trtllm_fp4_block_scale_moe with kwargs (#33104)
Signed-off-by: Pengchao Wang <wpc@fb.com>
2026-01-26 15:13:18 -08:00
dolpm
58a05b0ca1 [fix] CPUDNNLGEMMHandler pointer baked into inductor artifact (#32913)
Signed-off-by: dolpm <34420038+dolpm@users.noreply.github.com>
2026-01-26 16:59:44 -05:00
Jared Wen
6ee7f18f33 [Logging] add --disable-access-log-for-endpoints CLI option (#30011)
Add a new CLI option --disable-access-log-for-endpoints to suppress
uvicorn access logs for specified endpoints (e.g., /health, /metrics, /ping).

This addresses the need to reduce log noise in production environments
where health check endpoints are frequently polled by load balancers or
monitoring systems, generating excessive log entries that obscure
meaningful request logs.

Fixes #29982

Signed-off-by: JaredforReal <w13431838023@gmail.com>
2026-01-26 21:49:03 +00:00
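Note on 6ee7f18f33: the commit message above describes suppressing uvicorn access logs for selected endpoints. The following is a minimal sketch of one way such suppression could work, using a standard-library logging filter. It is not the code from this commit. The class name `EndpointAccessLogFilter`, the helper `make_access_record`, and the assumption that the request path is the third element of the log record's args (the layout uvicorn's access logger is understood to use) are all illustrative.

```python
import logging


class EndpointAccessLogFilter(logging.Filter):
    """Hypothetical filter: drop access-log records for excluded paths."""

    def __init__(self, excluded_paths):
        super().__init__()
        self.excluded_paths = frozenset(excluded_paths)

    def filter(self, record: logging.LogRecord) -> bool:
        args = record.args
        # Assumed args layout:
        # (client_addr, method, full_path, http_version, status_code)
        if isinstance(args, tuple) and len(args) >= 3:
            # Ignore any query string when matching the path.
            path = str(args[2]).split("?", 1)[0]
            return path not in self.excluded_paths
        # Records with an unexpected shape are kept.
        return True


def make_access_record(path: str) -> logging.LogRecord:
    """Build a record shaped like an access-log entry (for demonstration)."""
    return logging.LogRecord(
        name="uvicorn.access",
        level=logging.INFO,
        pathname="",
        lineno=0,
        msg='%s - "%s %s HTTP/%s" %d',
        args=("127.0.0.1:5000", "GET", path, "1.1", 200),
        exc_info=None,
    )


# Attach the filter to the logger named "uvicorn.access".
access_filter = EndpointAccessLogFilter(["/health", "/metrics", "/ping"])
logging.getLogger("uvicorn.access").addFilter(access_filter)
```

With the filter attached, records for /health, /metrics, and /ping would be dropped before reaching any handler, while records for other paths pass through unchanged.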
Wentao Ye
8f987883cb [Refactor] Remove unused _moe_permute function (#33108)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2026-01-26 16:06:45 -05:00
Kevin H. Luu
ebe0ba91db [ci] Sync test areas with test-pipeline.yaml and enable new pipeline generator (#33080)
Signed-off-by: Kevin H. Luu <khluu000@gmail.com>
Signed-off-by: khluu <khluu000@gmail.com>
Co-authored-by: Kevin Luu <khluu@Kevins-MacBook-Pro.local>
2026-01-26 12:28:20 -08:00
Robert Shaw
43a013c3a2 [Bugfix] Fix Dtypes for Pynccl Wrapper (#33030)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
2026-01-26 20:09:32 +00:00
Cyrus Leung
c25dbee40d [Model] Bump transformers version for test registry (#33100)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-26 18:53:22 +00:00
Nicolò Lucchesi
19ab0f7ce5 [Bugfix] Fix Voxtral streaming slot_mapping (#33073)
Signed-off-by: NickLucche <nlucches@redhat.com>
2026-01-26 10:40:40 -08:00
danielafrimi
67fe677c53 [FIX] Always support TP > 4 for FP4 Gemm (#31099)
Signed-off-by: dafrimi <dafrimi@nvidia.com>
Co-authored-by: root <root@gpu-51.slurm-workers-slurm.slurm.svc.cluster.local>
2026-01-26 11:04:20 -07:00
Andy Lo
d56afd45fd Remove unused logic in models/mistral.py (#33095)
Signed-off-by: Andy Lo <andy@mistral.ai>
2026-01-26 09:01:52 -08:00
Chauncey
a2393ed496 [CI] Fix AssertionError: MCP tool call not found in output_messages (#33093)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2026-01-26 15:19:57 +00:00
Pleaplusone
be6931ee27 [ROCm][Bugfix] Fix ptpc scale load issue for fused shared expert path in deepseek mtp (#33018)
Signed-off-by: ganyi <ygan@amd.com>
2026-01-26 23:19:04 +08:00
Chauncey
9ef3b718d9 [Bugfix] Fix Can't instantiate abstract class DeepseekV32IndexerBackend (#33052)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
2026-01-26 06:44:02 -08:00
Yuxuan Zhang
bb17e8f11c [GLM-OCR] GLM-OCR with MTP Support (#33005)
Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-01-26 06:24:43 -08:00
Cyrus Leung
dcd80206b7 [Chore] Update type annotation of input_ids in model forward (#33063)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-26 06:02:10 -08:00
danisereb
f4a0921c9c [Performance] Tune Mamba selective scan kernel for B200 (#32873)
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2026-01-26 05:56:54 -08:00
VihaanThat
208c56256f [Feature] Add LoRA support for Gemma3 vision components (#32764)
2026-01-26 13:56:40 +00:00
Alex Brooks
9ac818a551 [Misc] HF Hub LoRA Resolver (#20320)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
2026-01-26 13:56:32 +00:00
Itay Etelis
6ca2c91b96 [Model] Use mm_position to compute mrope positions for Qwen3-Omni (#33010)
Signed-off-by: Itay Etelis <itay.etelis@ibm.com>
Co-authored-by: Itay Etelis <itay.etelis@ibm.com>
2026-01-26 13:48:07 +00:00
cwazai
e33192b269 [lora/moe] Improve fused MoE‑LoRA kernel indexing and memory access (#32770)
Signed-off-by: 陈建华 <1647430658@qq.com>
Signed-off-by: Yanwen Lin <lyw1124278064@gmail.com>
Signed-off-by: kimheesu <wlskaka4@gmail.com>
Signed-off-by: Divakar Verma <divakar.verma@amd.com>
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Signed-off-by: ganyi <ygan@amd.com>
Signed-off-by: whx-sjtu <2952154980@qq.com>
Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
Signed-off-by: Xin Yang <xyangx@amazon.com>
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com>
Signed-off-by: knlnguyen1802 <knlnguyen1802@gmail.com>
Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com>
Signed-off-by: Ifta khairul Alam Adil <25082512+ikaadil@users.noreply.github.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
Signed-off-by: Huy Do <huydhn@gmail.com>
Signed-off-by: Micah Williamson <micah.williamson@amd.com>
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
Signed-off-by: Kebe <mail@kebe7jun.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Alex Sun <alex.s@amd.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Signed-off-by: Liran Schour <lirans@il.ibm.com>
Signed-off-by: liranschour <liranschour@users.noreply.github.com>
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: NickLucche <nlucches@redhat.com>
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Signed-off-by: Or Ozeri <oro@il.ibm.com>
Signed-off-by: Lucas Kabela <lucaskabela@meta.com>
Signed-off-by: Richard Zou <zou3519@gmail.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
Signed-off-by: AuYang <459461160@qq.com>
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Signed-off-by: rickychen-infinirc <ricky.chen@infinirc.com>
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
Signed-off-by: Eldar Kurtic <8884008+eldarkurtic@users.noreply.github.com>
Signed-off-by: eldarkurtic <8884008+eldarkurtic@users.noreply.github.com>
Signed-off-by: Bill Nell <bnell@redhat.com>
Signed-off-by: RishabhSaini <rishabhsaini01@gmail.com>
Signed-off-by: Luka Govedič <lgovedic@redhat.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Signed-off-by: Karan Bansal <karanb192@gmail.com>
Signed-off-by: jiang1.li <jiang1.li@intel.com>
Signed-off-by: Li, Jiang <bigpyj64@gmail.com>
Signed-off-by: wang.yuqi <noooop@126.com>
Signed-off-by: Tianshu Yu <tianshuyu.formal@gmail.com>
Signed-off-by: raushan <raushan@huggingface.co>
Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com>
Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: sangbumlikeagod <oironese@naver.com>
Signed-off-by: sangbumlikeagod <98077576+sangbumlikeagod@users.noreply.github.com>
Signed-off-by: Matteo Fari <matteofari06@gmail.com>
Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com>
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
Signed-off-by: Orion Reblitz-Richardson <orionr@meta.com>
Signed-off-by: Orion Reblitz-Richardson <orionr@gmail.com>
Signed-off-by: marksverdhei <marksverdhei@hotmail.com>
Signed-off-by: Markus / Mark <46672778+marksverdhei@users.noreply.github.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Randall Smith <ransmith@amd.com>
Signed-off-by: jon <joninco@bullpoint.org>
Signed-off-by: dolpm <34420038+dolpm@users.noreply.github.com>
Signed-off-by: ElizaWszola <ewszola@redhat.com>
Signed-off-by: Luka Govedič <luka.govedic@gmail.com>
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Signed-off-by: mohammad najafi <mohammad.najafi@amd.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: 7. Sun <jhao.sun@gmail.com>
Signed-off-by: esmeetu <jasonailu87@gmail.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Signed-off-by: Reagan <reaganjlee@gmail.com>
Signed-off-by: Reagan Lee <96998476+reaganjlee@users.noreply.github.com>
Signed-off-by: Hongjian Zhang <zhanghongjian@xiaohongshu.com>
Signed-off-by: Xingran Wang <wangxingran123456@outlook.com>
Signed-off-by: Hiroken. <105287758+HirokenOvo@users.noreply.github.com>
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
Signed-off-by: Louie Tsai <louie.tsai@intel.com>
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
Signed-off-by: Joshua Deng <joshuakdeng@gmail.com>
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Signed-off-by: LopezCastroRoberto <rocastro@redhat.com>
Signed-off-by: JJJYmmm <92386084+JJJYmmm@users.noreply.github.com>
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
Signed-off-by: cwazai <38356712+cwazai@users.noreply.github.com>
Co-authored-by: Yanwen Lin <lyw1124278064@gmail.com>
Co-authored-by: Kim Hee Su <wlskaka4@gmail.com>
Co-authored-by: Divakar Verma <137818590+divakar-amd@users.noreply.github.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Pleaplusone <ygan@amd.com>
Co-authored-by: whx <56632993+whx-sjtu@users.noreply.github.com>
Co-authored-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>
Co-authored-by: danisereb <daserebrenik@nvidia.com>
Co-authored-by: Yanan Cao <gmagogsfm@users.noreply.github.com>
Co-authored-by: Xin Yang <105740670+xyang16@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
Co-authored-by: Matt <156021403+mawong-amd@users.noreply.github.com>
Co-authored-by: knlnguyen1802 <knlnguyen1802@gmail.com>
Co-authored-by: Lucain <lucainp@gmail.com>
Co-authored-by: Ifta khairul Alam Adil <25082512+ikaadil@users.noreply.github.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Huy Do <huydhn@gmail.com>
Co-authored-by: Micah Williamson <micah.williamson@amd.com>
Co-authored-by: Andreas Karatzas <akaratza@amd.com>
Co-authored-by: Matthew Wong <Matthew.Wong2@amd.com>
Co-authored-by: Kebe <mail@kebe7jun.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Co-authored-by: Alex Sun <minchsun@amd.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: liranschour <liranschour@users.noreply.github.com>
Co-authored-by: Or Ozeri <or@ozery.com>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
Co-authored-by: Shengqi Chen <harry-chen@outlook.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
Co-authored-by: Or Ozeri <oro@il.ibm.com>
Co-authored-by: Lucas Kabela <lucaskabela@meta.com>
Co-authored-by: Richard Zou <zou3519@users.noreply.github.com>
Co-authored-by: Maximilien de Bayser <maxdebayser@gmail.com>
Co-authored-by: Xu Jinyang <72930776+AuYang261@users.noreply.github.com>
Co-authored-by: Vadim Gimpelson <156319763+vadiklyutiy@users.noreply.github.com>
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Co-authored-by: David Ramon Prados <davidramon3@hotmail.es>
Co-authored-by: RickyChen / 陳昭儒 <ricky.chen@infinirc.com>
Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>
Co-authored-by: Fadi Arafeh <115173828+fadara01@users.noreply.github.com>
Co-authored-by: Eldar Kurtić <8884008+eldarkurtic@users.noreply.github.com>
Co-authored-by: bnellnm <49004751+bnellnm@users.noreply.github.com>
Co-authored-by: Rishabh Saini <rishabhsaini01@gmail.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Karan Bansal <karanb192@users.noreply.github.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: tianshu-Michael-yu <101950379+tianshu-Michael-yu@users.noreply.github.com>
Co-authored-by: Raushan Turganbay <raushan@huggingface.co>
Co-authored-by: baonudesifeizhai <85092850+baonudesifeizhai@users.noreply.github.com>
Co-authored-by: Mark McLoughlin <markmc@redhat.com>
Co-authored-by: sangbumlikeagod <98077576+sangbumlikeagod@users.noreply.github.com>
Co-authored-by: Matteo Fari <matteofari06@gmail.com>
Co-authored-by: Harry Huang <vastrockhuang162@gmail.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
Co-authored-by: Orion Reblitz-Richardson <orionr@gmail.com>
Co-authored-by: Kevin H. Luu <khluu000@gmail.com>
Co-authored-by: Markus / Mark <46672778+marksverdhei@users.noreply.github.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: rasmith <Randall.Smith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
Co-authored-by: joninco <joninco@bullpoint.org>
Co-authored-by: dolpm <34420038+dolpm@users.noreply.github.com>
Co-authored-by: ElizaWszola <ewszola@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <varunsundar08@gmail.com>
Co-authored-by: Luka Govedič <luka.govedic@gmail.com>
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: Joe Runde <Joseph.Runde@ibm.com>
Co-authored-by: monajafi-amd <mohammad.najafi@amd.com>
Co-authored-by: ruizcrp <ruiz.crp@gmail.com>
Co-authored-by: Shengqi Chen <i@harrychen.xyz>
Co-authored-by: 7. Sun <jhao.sun@gmail.com>
Co-authored-by: Roy Wang <jasonailu87@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Reagan Lee <96998476+reaganjlee@users.noreply.github.com>
Co-authored-by: Hiroken. <105287758+HirokenOvo@users.noreply.github.com>
Co-authored-by: Xingran Wang <wangxingran123456@outlook.com>
Co-authored-by: david guan <102001211+Chenhao-Guan@users.noreply.github.com>
Co-authored-by: Lukas Geiger <lukas.geiger94@gmail.com>
Co-authored-by: Louie Tsai <louie.tsai@intel.com>
Co-authored-by: Maryam Tahhan <mtahhan@redhat.com>
Co-authored-by: Joshua Deng <91448271+joshuadeng@users.noreply.github.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
Co-authored-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com>
Co-authored-by: JJJYmmm <92386084+JJJYmmm@users.noreply.github.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2026-01-26 04:56:34 -08:00
Cyrus Leung
61274bdef5 [Doc] Further update multi-modal impl doc (#33065)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-26 10:54:20 +00:00
ltd0924
b40db4dfec [StepVL] add step vl offline example (#33054)
Signed-off-by: luotingdan <luotingdan@stepfun.com>
Co-authored-by: luotingdan <luotingdan@stepfun.com>
2026-01-26 01:00:32 -08:00
Cyrus Leung
11b556878b [Refactor] Use data parser for matching data items to multi-modal UUIDs (#32955)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-01-26 15:00:28 +08:00
Danielle Robinson
ee484b3f4b Set splitk=1 for fused-moe-lora expand kernel (#32882)
Signed-off-by: Danielle Robinson <dmmaddix@amazon.com>
Co-authored-by: Danielle Robinson <dmmaddix@amazon.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2026-01-25 22:52:34 -08:00
Woosuk Kwon
a9b53dd435 [Model Runner V2] Add LoRAState to consolidate lora logic (#33062)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
2026-01-25 22:21:12 -08:00
Robert Shaw
254db42ede [Tests] Remove Duplicates (#33032)
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
2026-01-26 05:23:54 +00:00
ltd0924
105d104576 [StepVL] support close img patch (#32923)
Signed-off-by: luotingdan <luotingdan@stepfun.com>
Signed-off-by: ltd0924 <32387785+ltd0924@users.noreply.github.com>
Co-authored-by: luotingdan <luotingdan@stepfun.com>
2026-01-25 20:56:39 -08:00
Lucas Wilkinson
566cdb6cfb [CI] Fix MHA attention test failure (AttributeError when model_config is None in ViT attention backend) (#33033)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2026-01-25 19:49:53 -08:00
Woosuk Kwon
2f0d3ba745 [Model Runner V2] Minor simplification for finish_requests (#33048)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
2026-01-25 18:35:02 -08:00
Woosuk Kwon
edf927bc9f [Model Runner V2] Fix slot_mapping after #25954 (#33046)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
Co-authored-by: Woosuk Kwon <woosuk@inferact.ai>
2026-01-25 18:29:49 -08:00
Andreas Karatzas
22aeb43007 [Bugfix][VLM] Fix transformers backend embed_multimodal for Qwen2.5-VL profiling (#32969)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
2026-01-26 08:34:05 +08:00
Itay Etelis
a698e8e7ad [Model] Use mm_position to compute mrope positions for Qwen2.5-Omni (#32772)
Signed-off-by: Itay Etelis <itay.etelis@ibm.com>
Co-authored-by: Itay Etelis <itay.etelis@ibm.com>
2026-01-25 20:15:53 +08:00
zhanqiuhu
151e5451c2 [Doc] Add Qwen2.5 models to batch invariance tested models (#33016)
Signed-off-by: Zhanqiu Hu <zh338@cornell.edu>
2026-01-25 09:20:46 +00:00
Jee Jee Li
73b243463b [BugFix] Add env variable to control PDL in LoRA (#32836)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2026-01-25 16:32:30 +08:00
JJJYmmm
7e67df5570 [Bugfix] fix encoder cache hang in Qwen3VL (#32684)
Signed-off-by: JJJYmmm <92386084+JJJYmmm@users.noreply.github.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-01-25 05:17:31 +00:00
7. Sun
ff6c1da4e6 [Docs] Fix Apple silicon include path in CPU installation docs (#32977)
Signed-off-by: 7. Sun <jhao.sun@gmail.com>
2026-01-25 01:51:49 +00:00
Roberto L. Castro
fcb9df99bd [Perf][Kernel] Optimize FP4 quantization kernels (SM100F) (#32520)
Signed-off-by: LopezCastroRoberto <rocastro@redhat.com>
2026-01-24 18:45:27 -07:00
TJian
1ebdff412a [DOC] [ROCm] Update doc for v0.14.1 (#32998)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
2026-01-25 09:13:21 +08:00
Joshua Deng
91601ff478 [Feature] add session based streaming input support to v1 (#28973)
Signed-off-by: Joshua Deng <joshuakdeng@gmail.com>
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Roger Wang <hey@rogerw.io>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Nick Hill <nickhill123@gmail.com>
2026-01-24 12:06:28 -08:00
yugong333
d4dbb7af63 Using max_loras + 1 to construct grid in fused_moe_lora (#32277)
Signed-off-by: Yu Gong <yu3.gong@gmail.com>
2026-01-24 12:39:30 -05:00
Maryam Tahhan
203d0bc0c2 [CPU] Improve CPU Docker build (#30953)
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
Co-authored-by: Li, Jiang <jiang1.li@intel.com>
2026-01-24 17:08:24 +00:00
Fadi Arafeh
17ab54de81 [CPU Backend][BugFix] Fix failing Darwin pipelines (#33002)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
2026-01-24 17:02:22 +00:00
7. Sun
cd775bdbe0 [Tests] Replace flaky sleep with polling in test_background_cancel (#32986)
Signed-off-by: 7. Sun <jhao.sun@gmail.com>
2026-01-24 16:39:07 +00:00
Lucas Wilkinson
da5e7b12be [MLA] Fuse cat and quant for fp8 kv-cache (#32950)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2026-01-24 16:03:02 +00:00
Louie Tsai
719ac592ed Update CPU doc according to feedback (#32963)
Signed-off-by: Tsai, Louie <louie.tsai@intel.com>
Signed-off-by: Louie Tsai <louie.tsai@intel.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-01-24 16:02:44 +00:00
Hiroken.
1209b784f2 [Bugfix]: resolve torch.compile cache conflict between mm_encoder_tp_modes (#32842)
Signed-off-by: Hongjian Zhang <zhanghongjian@xiaohongshu.com>
Signed-off-by: Xingran Wang <wangxingran123456@outlook.com>
Co-authored-by: Xingran Wang <wangxingran123456@outlook.com>
2026-01-24 14:45:14 +00:00