Jee Jee Li
3380ed5e11
[Doc] Add llama4 LoRA tag (#28825)
Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>
2025-11-17 14:08:48 +08:00

Jay Caldwell
6f37419244
[Bugfix][Model] Prevent special token leakage in KimiK2ToolParser streaming mode (#28543)
Signed-off-by: Jscaldwell55 <jay.s.caldwell@gmail.com>
2025-11-17 13:54:46 +08:00

Xiake Sun
60e089f0b9
[ROCm][Qwen3-32B] Fix AITER MHA accuracy issue cause by #25763 (#28670)
Signed-off-by: Xiake Sun <xiake.sun@amd.com>
2025-11-16 20:52:11 -08:00

liuzhenwei
d64429bb36
[NIXL][XPU] update install script of NIXL (#28778)
Signed-off-by: zhenwei-intel <zhenwei.liu@intel.com>
2025-11-17 03:01:33 +00:00

jiahanc
561253b37f
[Performance][Fix] update nvfp4 code to support renorm routing (#28569)
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-11-16 18:02:42 -08:00

Nick Hill
80b6080ddc
[BugFix] Fix async scheduling + chunked prefill + preemption (#28787)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-11-17 06:46:46 +08:00

amirkl94
03ee48111d
Feature: Support Relu2 in FusedMoE fp8 cutlass path (#27261)
2025-11-16 13:39:44 -05:00

Lukas Geiger
5a87076d6e
[Model][QwenVL] Optimize Qwen2_5_VisionAttention q,k preparation (#28769)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-11-16 17:37:15 +00:00

Ning Xie
ac1daf3233
fix comment typo (#28802)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
2025-11-16 17:03:21 +00:00

Didier Durand
63fed55506
[Doc]: fix typos in various files (#28811)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
2025-11-16 14:30:06 +00:00

Anna Shors
8d259fad6c
Fix gpt oss weight loading with EP + bf16 (#28765)
Signed-off-by: ashors1 <ashors@nvidia.com>
2025-11-16 13:12:45 +00:00

scottzh8
3bc1175798
[Bugfix] Fix host and port join for ipv6 in bench serve (#28679)
Signed-off-by: Scott Zhang <scottzh@fb.com>
Co-authored-by: Scott Zhang <scottzh@fb.com>
2025-11-16 10:20:57 +00:00

Dezhan
af02c40970
Fixed gpt-oss _load_weights_other() parameter position bug (#28715)
Co-authored-by: Dezhan Tu <dztu@meta.com>
2025-11-16 09:46:29 +00:00

Lucia Fang
b316ac6589
[V1] Support MP Executor for multi node distributed inference (#23691)
Signed-off-by: Lu Fang <fanglu@fb.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Signed-off-by: Lucia Fang <fanglu@fb.com>
Signed-off-by: Lucia Fang <116399278+luccafong@users.noreply.github.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
2025-11-16 09:01:21 +00:00

wang.yuqi
a55b64635c
[Model] Allow users to control skip reading cache per request. (#28194)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
Signed-off-by: wang.yuqi <noooop@126.com>
2025-11-16 00:04:50 -08:00

ai-jz
d231876ce3
[Benchmark] Fix client seed synchronization in multi-turn benchmark (#28512)
Signed-off-by: ai-jz <aijz.xplr@gmail.com>
2025-11-16 15:04:32 +08:00

Bram Wasti
f849ee739c
Adding a benchmark for batch invariance (#28161)
Signed-off-by: Bram Wasti <bwasti@meta.com>
Signed-off-by: Bram Wasti <bwasti@fb.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-11-16 13:22:17 +08:00

Lucas Wilkinson
be263f7645
[BugFix] Fix AssertionError: DCP not support reorder_batch_threshold > 1 now. (#28751)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
2025-11-15 22:35:06 +00:00

Didier Durand
2bb4435cb7
[Doc]: fix typos in various files (#28567)
Signed-off-by: Didier Durand <durand.didier@gmail.com>
2025-11-15 19:27:50 +00:00

Lukas Geiger
07cadab27a
[Model][Qwen3VL] Cache positional embedding indices (#28475)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
2025-11-15 19:03:09 +00:00

Nick Hill
637f292196
[CI] Fix broken pipeline (#28781)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-11-15 08:44:14 -08:00

Eldar Kurtić
e439c784fa
Add support for Eagle with separate lm-head and embed_tokens layers (#28549)
Signed-off-by: Eldar Kurtic <8884008+eldarkurtic@users.noreply.github.com>
2025-11-15 06:12:02 -08:00

hwhaokun
085a525332
[Model] Fix lmhead init bug of bailing_moe (#28777)
Signed-off-by: hwhaokun <haokun0405@163.com>
Co-authored-by: zhaozx-cn <zhaozx2116@163.com>
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>
2025-11-15 05:44:12 -08:00

Cyrus Leung
89d3679221
[Doc] Fix failing doc build (#28772)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-15 05:33:27 -08:00

tingtinggithub
cb15ee28db
Allow Gemma3 to take image embeddings (#28483)
Signed-off-by: tingtinggithub <streamttt@gmail.com>
2025-11-15 04:18:08 -08:00

Angela Yi
f36292dbee
[compile] Enable sequence parallelism matching w/o custom ops enabled (#27126)
Signed-off-by: angelayi <yiangela7@gmail.com>
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Signed-off-by: ProExpertProg <lgovedic@redhat.com>
Co-authored-by: Luka Govedič <lgovedic@redhat.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
Co-authored-by: Luka Govedič <luka.govedic@gmail.com>
2025-11-15 11:46:12 +00:00

Vadim Gimpelson
173b356abf
[PERF] Remove TRTLLM Gen attn kernel limitation max_seq_len <=131072 (#28755)
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>
2025-11-15 15:43:41 +05:30

Cyrus Leung
638e4196d1
[Misc] Make SchedulerConfig.max_model_len init-only (#28733)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2025-11-15 01:59:31 -08:00

Zhewen Li
1ec978c209
[Kernel][Moe Configs] llama4 maverick fp8 moe config tp8 on mi325 (#28709)
Signed-off-by: Zhewen Li <zhewenli@meta.com>
2025-11-15 01:10:48 -08:00

Jane (Yuan) Xu
74b5267d3a
Use narrow over indexing in hadacore_transform to prep for ABI stable (#28756)
Signed-off-by: Jane Xu <janeyx@meta.com>
2025-11-15 01:10:15 -08:00

Zhuohan Li
dd6ac1c2bb
[RL] [V1] Remove unused device argument from reset_kv_cache (#28766)
Signed-off-by: Zhuohan Li <zhuohan123@gmail.com>
2025-11-14 23:59:42 -08:00

Cyrus Leung
98b4d389ed
[Redo] #26368 (#28771)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Co-authored-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
2025-11-14 22:47:41 -08:00

Varun Sundar Rabindranath
6965ef436f
[Performance][DeepGEMM] Estimate expected_m (#28694)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2025-11-15 13:52:14 +08:00

Chendi.Xue
c9e665852a
[NIXL] heterogeneous block_size support (#26759)
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
Signed-off-by: Chendi.Xue <chendi.xue@intel.com>
Co-authored-by: Nicolò Lucchesi <nicolo.lucchesi@gmail.com>
2025-11-14 21:51:32 -08:00

Mohammad Othman
363aaeef0f
Fix IntermediateTensors initialization and add type hints (#28743)
Signed-off-by: Mohammad Othman <Mo@MohammadOthman.com>
Co-authored-by: Mohammad Othman <Mo@MohammadOthman.com>
2025-11-15 04:31:36 +00:00

Nick Hill
ac86bff8cb
Revert "[Core] Performance: Use list[np.ndarray] instead of list[list… (#28773)
2025-11-14 20:24:00 -08:00

Michael Goin
edfe498189
[Bugfix] Build hadacore kernels on >SM90 (#28748)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-11-14 19:51:05 -08:00

Lukas Geiger
f05d474c8a
[Model][Qwen3VL] Use mm_position to compute mrope positions (#28730)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2025-11-14 19:45:11 -08:00

QiliangCui
9fc81ec765
[TPU] Fix import error in tpu launch (#28758)
Signed-off-by: Qiliang Cui <derrhein@gmail.com>
2025-11-15 00:58:32 +00:00

Jialin Ouyang
186352b270
[Core] Performance: Use list[np.ndarray] instead of list[list[int]] for output tokens for GC optimization (#26368)
Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>
2025-11-14 16:04:04 -08:00

Nick Hill
58e61e56b7
[Test] Rework e2e async scheduling tests (#28744)
Signed-off-by: Nick Hill <nhill@redhat.com>
2025-11-14 16:01:09 -08:00

Gregory Shtrasberg
75f01b9d3c
[ROCm][CI/Build] Upgrade to ROCm 7.1 and AITER main (#28753)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-11-14 15:53:21 -08:00

rasmith
ba041d980b
[Log] Save profiler results to file instead of stdout (#28144)
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
2025-11-14 23:26:39 +00:00

Thomas Parnell
e0c910bb89
[Hybrid] [Kernel] Fix chunk scan kernel when BLOCK_SIZE_DSTATE > 128 (#28295)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2025-11-14 22:55:42 +00:00

Benjamin Chislett
bf3ffb61e6
[Bugfix] Fix ChunkedLocalAttention CUDA Graph setting (#28739)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
2025-11-14 14:14:46 -08:00

Alexander Matveev
e5c78956c0
[Bugfix] Fix incorrect use of hidden_states for shared_experts due to do_naive_dispatch_combine (#28740)
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
2025-11-14 14:13:46 -08:00

Laith Sakka
2e0ad629b0
Avoid bytecode hook and simplify TorchCompileWrapperWithCustomDipatch (#25110)
Signed-off-by: Laith Sakka <lsakka@meta.com>
2025-11-14 14:11:10 -08:00

Gregory Shtrasberg
5a84b76b86
[ROCm][CI/Build] Change install location of uv (#28741)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
2025-11-14 21:34:18 +00:00

Marcin Ostrowski
0de4f217ab
[Bugfix] TypeError: 'NoneType' object is not callable (#27410)
Signed-off-by: Marcin Ostrowski <marcinx.ostrowski@intel.com>
2025-11-14 21:13:53 +00:00

Michael Goin
f08eab2acc
[CI] Fix macos smoke test uv cache issue (#28736)
Signed-off-by: mgoin <mgoin64@gmail.com>
2025-11-14 13:29:55 -07:00