Congcong Chen
|
2c11a738b3
|
[Model] New model support for microsoft/Phi-4-mini-flash-reasoning (#20702)
Signed-off-by: Congcong Chen <congcongchen@microsoft.com>
|
2025-07-12 06:02:10 -07:00 |
|
Isotr0py
|
147afb448b
|
[Bugfix] Replace unavailable video url in multimodal test (#20854)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-07-12 05:25:39 +00:00 |
|
Isotr0py
|
01cae37713
|
[CI/Build] Ensure compatability with Transformers v4.53 (#20541)
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2025-07-11 20:53:07 -07:00 |
|
yurhett
|
11c0198615
|
[Bugfix] Fix tensor parallel issue in Qwen3 reranker weight loading (#20682)
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
|
2025-07-11 20:52:43 -07:00 |
|
Trevor Morris
|
a8593237c0
|
Add pynccl all-gatherv and reducescatterv (#20154)
Signed-off-by: Trevor Morris <tmorris@nvidia.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
|
2025-07-11 18:59:23 -07:00 |
|
Ilya Markov
|
fc0f41d10a
|
Integration SM100 FlashInfer fused allreduce RMSNorm (#20691)
Signed-off-by: ilmarkov <imarkov@redhat.com>
Co-authored-by: ilmarkov <imarkov@redhat.com>
|
2025-07-11 18:58:15 -07:00 |
|
bigmoyan
|
5f0af36af5
|
Update kimi-k2 tool calling docs, enable unit tests (#20821)
Signed-off-by: wangzhengtao <wangzhengtao@moonshot.cn>
Co-authored-by: wangzhengtao <wangzhengtao@moonshot.cn>
Co-authored-by: wangzhengtao <wangzhengtao@msh.team>
|
2025-07-11 20:16:14 +00:00 |
|
Isotr0py
|
0d21b2664c
|
[Bugfix] Fix OOM in language generation test (#20814)
Signed-off-by: Isotr0py <2037008807@qq.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-07-11 11:21:52 -07:00 |
|
Varun Sundar Rabindranath
|
53fa457391
|
[Misc] Add unit tests for MoE ModularKernel combinations + Profiling utility (#20449)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
|
2025-07-11 07:51:46 -07:00 |
|
QiliangCui
|
b4f0b5f9aa
|
Temporarily suspend google/gemma-3-1b-it. (#20722)
Signed-off-by: Qiliang Cui <derrhein@gmail.com>
|
2025-07-11 11:21:26 +00:00 |
|
Cyrus Leung
|
cbd14ed561
|
[Bugfix] Refactor /invocations to be task-agnostic (#20764)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-07-11 03:20:54 -07:00 |
|
Pavani Majety
|
7bd4c37ae7
|
[Core] Add Flashinfer TRTLLM Backend for Flashinfer decode path (SM100). (#19825)
Signed-off-by: Pavani Majety <pmajety@nvidia.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: shuw <shuw@nvidia.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
|
2025-07-11 09:23:23 +00:00 |
|
Jee Jee Li
|
8020e98c9f
|
[Quantization][1/N] MoE support BNB-Inflight Quantization (#20061)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-07-11 08:01:13 +00:00 |
|
Luka Govedič
|
762be26a8e
|
[Bugfix] Upgrade depyf to 0.19 and streamline custom pass logging (#20777)
Signed-off-by: Luka Govedic <lgovedic@redhat.com>
Signed-off-by: luka <lgovedic@redhat.com>
|
2025-07-11 00:15:22 -07:00 |
|
Luka Govedič
|
31d5c1797f
|
[Perf][fp8] Use CustomOp abstraction for fp8 quant for better perf (#19830)
Signed-off-by: Luka Govedic <lgovedic@redhat.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
|
2025-07-11 04:56:28 +00:00 |
|
Wentao Ye
|
e2de455c34
|
[Feature] Integrate SM100 DeepGEMM support (#20087)
|
2025-07-10 20:18:05 -07:00 |
|
Alexander Matveev
|
5b032352cc
|
[Attention] MLA - Flashinfer Ragged Prefill (#20034)
|
2025-07-10 20:17:47 -07:00 |
|
Michael Goin
|
922f316441
|
[Model] Support HF format of minimax (#20211)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-07-11 02:55:21 +00:00 |
|
bigmoyan
|
0cf893cae1
|
Add kimi-k2 tool parser (#20789)
Signed-off-by: wangzhengtao <wangzhengtao@moonshot.cn>
Co-authored-by: wangzhengtao <wangzhengtao@moonshot.cn>
Co-authored-by: wangzhengtao <wangzhengtao@msh.team>
|
2025-07-11 10:36:23 +08:00 |
|
Varun Sundar Rabindranath
|
f0c98cae27
|
[Misc] MoE ModularKernel : Introduce TopKWeightAndReduce (#20648)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
|
2025-07-10 14:40:38 -07:00 |
|
Varun Sundar Rabindranath
|
fdadb6f43a
|
[Bugfix] Fused MoE Modular Kernel chunking loop (#20392)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
|
2025-07-10 20:31:10 +00:00 |
|
Alex Brooks
|
41060c6e08
|
[Core] Add Support for Default Modality Specific LoRAs [generate / chat completions] (#19126)
Signed-off-by: Alex-Brooks <Alex.Brooks@ibm.com>
|
2025-07-10 21:09:37 +01:00 |
|
Nathan Hoos
|
d6902ce79f
|
[V0][V1][Core] Add outlines integration for V1, and update V0 integration. (#15975)
Signed-off-by: Nathan Hoos <thwackyy.y@gmail.com>
|
2025-07-10 15:30:26 -04:00 |
|
Sanger Steel
|
5e53c89a74
|
[Bugfix] [CI] Fix Tensorizer LoRA test (#20760)
Signed-off-by: Sanger Steel <sangersteel@gmail.com>
|
2025-07-10 19:07:06 +00:00 |
|
shineran96
|
4bed167768
|
[Model][VLM] Support JinaVL Reranker (#20260)
Signed-off-by: shineran96 <shinewang96@gmail.com>
|
2025-07-10 10:43:43 -07:00 |
|
Asher
|
b140416abf
|
[Model] Add reason parser for Hunyuan A13B Model. (#20625)
Signed-off-by: Asher Zhang <asherszhang@tencent.com>
|
2025-07-10 16:33:26 +00:00 |
|
Isotr0py
|
77f77a951e
|
[Misc] Clean up mark to fork process in BNB tests (#20692)
Signed-off-by: Isotr0py <2037008807@qq.com>
|
2025-07-10 13:59:40 +00:00 |
|
Michael Goin
|
be1e128dfb
|
[CI Bugfix] Skip failing Tensorizer+LoRA test (#20724)
|
2025-07-10 21:15:03 +09:00 |
|
Jee Jee Li
|
7571a4a7e5
|
[CI/Build] Fix Basic Models Test (#20728)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
|
2025-07-10 09:57:19 +00:00 |
|
Chauncey
|
8f2720def9
|
[Frontend] Support Tool Calling with both tool_choice='required' and $defs. (#20629)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-07-10 13:56:35 +08:00 |
|
Yiming
|
cd587c93ef
|
[BugFix]: Properly set engine_id when using multi connector (#19487)
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: leiyiming <leiyiming@kingsoft.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
|
2025-07-09 20:32:44 +00:00 |
|
fxmarty-amd
|
332d4cb17b
|
[Feature][Quantization] MXFP4 support for MOE models (#17888)
Signed-off-by: Felix Marty <felmarty@amd.com>
Signed-off-by: Bowen Bao <bowenbao@amd.com>
Signed-off-by: Felix Marty <Felix.Marty@amd.com>
Co-authored-by: Bowen Bao <bowenbao@amd.com>
|
2025-07-09 13:19:02 -07:00 |
|
Tuan, Hoang-Trong
|
47043eb678
|
[Kernel] Triton implementation of causal-conv1d for Mamba-based models (#18218)
Signed-off-by: Tuan M. Hoang-Trong <tmhoangt@us.ibm.com>
Co-authored-by: Tuan M. Hoang-Trong <tmhoangt@us.ibm.com>
Co-authored-by: Tyler Michael Smith <tysmith@redhat.com>
Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>
|
2025-07-09 12:53:55 -07:00 |
|
Li, Jiang
|
e59ba9e142
|
[CI/Build] Enlarge tolerance for a CPU multi-modal test (#20684)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-07-09 17:48:52 +00:00 |
|
Liangliang Ma
|
a3e4e85ece
|
[XPU][CI] enhance xpu test support (#20652)
Signed-off-by: Ma, Liangliang <liangliang.ma@intel.com>
Co-authored-by: zhenwei-intel <zhenweiliu@habana.ai>
|
2025-07-09 16:53:09 +00:00 |
|
Chengji Yao
|
eb58f5953d
|
[TPU][Bugfix] fix test_pallas (#20666)
Signed-off-by: Chengji Yao <chengjiyao@google.com>
|
2025-07-09 09:32:48 -07:00 |
|
Sanger Steel
|
4ac9c33f78
|
[Bugfix] Fix handling of Tensorizer arguments for LoadConfig (#20643)
Signed-off-by: Sanger Steel <sangersteel@gmail.com>
|
2025-07-09 15:36:37 +00:00 |
|
Chauncey
|
2155e95ef1
|
[Bugfix] Fix the issue where reasoning_content is None when Thinkng is enabled and tool_choice is set to 'required'. (#20662)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-07-09 07:39:58 +00:00 |
|
Dmitry Rogozhkin
|
e760fcef22
|
[XPU] Use spawn with XPU multiprocessing (#20649)
Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
|
2025-07-09 00:34:28 -07:00 |
|
B-201
|
6bbf1795b7
|
[Misc] Fix the size of batched_dummy_mm_inputs in profile_run (#20434)
Signed-off-by: bk-201 <joy25810@foxmail.com>
|
2025-07-08 20:15:44 -07:00 |
|
kourosh hakhamaneshi
|
baed180aa0
|
[tech debt] Revisit lora request model checker (#20636)
Signed-off-by: Kourosh Hakhamaneshi <kourosh@anyscale.com>
|
2025-07-09 09:42:41 +08:00 |
|
QiliangCui
|
d8ee5a2ca4
|
[TPU][Bugfix] disable phi-3 test (#20632)
Signed-off-by: Qiliang Cui <derrhein@gmail.com>
|
2025-07-08 23:14:26 +00:00 |
|
wang.yuqi
|
baba0389f7
|
[CI] Increase the threshold of the MTEB RERANK tests (#20615)
Signed-off-by: wang.yuqi <noooop@126.com>
|
2025-07-08 08:10:11 -07:00 |
|
Nicolò Lucchesi
|
71d1d75b7a
|
[PD][Nixl] Remote consumer READ timeout for clearing request blocks (#20139)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-07-08 08:56:40 +01:00 |
|
Sanger Steel
|
72d14d0eed
|
[Frontend] [Core] Integrate Tensorizer in to S3 loading machinery, allow passing arbitrary arguments during save/load (#19619)
Signed-off-by: Sanger Steel <sangersteel@gmail.com>
Co-authored-by: Eta <esyra@coreweave.com>
|
2025-07-07 22:47:43 -07:00 |
|
Li, Jiang
|
7721ef1786
|
[CI/Build][CPU] Fix CPU CI and remove all CPU V0 files (#20560)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
|
2025-07-07 22:13:44 -07:00 |
|
Chauncey
|
93b9d9f499
|
[Bugfix]: Fix messy code when using logprobs (#19209)
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
|
2025-07-08 11:02:15 +08:00 |
|
Ming Yang
|
afb7cff1b9
|
[Bugfix] Fix Maverick correctness by filling zero to cache space in cutlass_moe (#20167)
Signed-off-by: Ming Yang <yming@meta.com>
|
2025-07-08 01:07:22 +00:00 |
|
Anton
|
e601efcb10
|
[Misc] Add fully interleaved support for multimodal 'string' content format (#14047)
Signed-off-by: drobyshev.anton <drobyshev.anton@wb.ru>
Co-authored-by: drobyshev.anton <drobyshev.anton@wb.ru>
|
2025-07-07 19:43:08 +00:00 |
|
ztang2370
|
a37d75bbec
|
[Front-end] microbatch tokenization (#19334)
Signed-off-by: zt2370 <ztang2370@gmail.com>
|
2025-07-07 17:54:10 +01:00 |
|