Douglas Lehr
|
ec8ab9d254
|
[ROCm] Add dynamic mxfp4 quantization for DeepSeek V2 projection layers (#34157)
Signed-off-by: Doug Lehr <douglehr@amd.com>
Signed-off-by: Douglas Lehr <91553416+dllehr-amd@users.noreply.github.com>
Co-authored-by: Doug Lehr <douglehr@amd.com>
Co-authored-by: Rohan Potdar <66227218+Rohan138@users.noreply.github.com>
Co-authored-by: Gregory Shtrasberg <156009573+gshtras@users.noreply.github.com>
|
2026-02-26 10:00:49 -06:00 |
|
Wentao Ye
|
05972ea7e5
|
[Refactor] Remove dead or duplicate func utils or variables (#35318)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-02-26 10:57:56 -05:00 |
|
Jakub Zakrzewski
|
111d869069
|
[Model] Add nvidia/llama-nemotron-embed-vl-1b-v2 multimodal embedding model (#35297)
Signed-off-by: Jakub Zakrzewski <jzakrzewski@nvidia.com>
|
2026-02-26 14:17:17 +00:00 |
|
stingoChen
|
7fea7250a4
|
[Bug] Fix missing <think> tag after tool call in MiniMax 2.1 (#35352)
Signed-off-by: 冬马 <chenxinke@cai-inc.com>
Co-authored-by: 冬马 <chenxinke@cai-inc.com>
|
2026-02-26 22:11:07 +08:00 |
|
Cyrus Leung
|
845ee348ef
|
[Misc] Standardize handling of mm_processor_kwargs.size (#35284)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-26 13:05:46 +00:00 |
|
Asaf Gardin
|
ec13e549d3
|
[Bugfix] Fix uint32 overflow in Mamba selective scan state pointer arithmetic (#35275)
Signed-off-by: Josephasafg <ajgard7@gmail.com>
|
2026-02-26 12:22:06 +00:00 |
|
Li-Yongwen
|
c6ca51598a
|
[Bugfix] fix device_name for routing replay (#34336)
Signed-off-by: liyongwen <1310439159@qq.com>
|
2026-02-26 12:18:38 +00:00 |
|
Yueqian Lin
|
c0615a296d
|
[Bugfix] Fix Qwen2.5-Omni and Qwen3-Omni mixed-modality embed regression (#35368)
Signed-off-by: linyueqian <linyueqian@outlook.com>
|
2026-02-26 11:58:23 +00:00 |
|
Harry Mellor
|
01914445b0
|
Remove bc-lint (#35274)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-02-26 03:01:01 -08:00 |
|
Kunshang Ji
|
5281713e11
|
[XPU] use fixed UMD version in dockerfile.xpu (#35392)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
|
2026-02-26 18:54:55 +08:00 |
|
HZY
|
32693db8ce
|
[Bugfix] [Qwen3.5]Fix Qwen3.5 FP8 quantization: tuple shard_id weight loading (#35289)
Signed-off-by: daowu.hzy <daowu.hzy@alibaba-inc.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-02-26 18:26:15 +08:00 |
|
Akash kaothalkar
|
e03ddcfbd4
|
[Hardware][Powerpc]Enable prefix caching and chunked prefill for ppc64le (#35081)
Signed-off-by: Akash kaothalkar <akash.kaothalkar@ibm.com>
Co-authored-by: Akash kaothalkar <akash.kaothalkar@ibm.com>
|
2026-02-26 10:21:24 +00:00 |
|
Sophie du Couédic
|
02acd16861
|
[Benchmarks] Plot benchmark timeline and requests statistics (#35220)
Signed-off-by: Sophie du Couédic <sop@zurich.ibm.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
|
2026-02-26 02:17:43 -08:00 |
|
Jiangyun Zhu
|
ab87f85231
|
[Model] Ring 2.5 (#35102)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
|
2026-02-26 02:17:11 -08:00 |
|
Krish Gupta
|
3827c8c55a
|
[Test] Add tests for n parameter in chat completions API (#35283)
Signed-off-by: KrxGu <krishom70@gmail.com>
Tag: v0.16.1rc0
|
2026-02-26 09:14:07 +00:00 |
|
Kevin McKay
|
ade81f17fe
|
[Bugfix][Hardware][AMD] Gate FP4 ops on gfx950 to prevent MI300X crash (#35250)
Signed-off-by: c0de128 <kevin.mckay@outlook.com>
|
2026-02-26 16:11:07 +08:00 |
|
Gregory Shtrasberg
|
6042e66cd5
|
[ROCm] Add extra step in config initialization to populate custom ops before compilation config init (#34848)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
|
2026-02-26 16:05:40 +08:00 |
|
Chaojun Zhang
|
9f9a675b23
|
[XPU][8/N] Fix kernel bugs in XPU LoRA and MOE LORA (#34115)
Signed-off-by: chzhang <chaojun.zhang@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
|
2026-02-26 15:46:44 +08:00 |
|
Ofir Zafrir
|
a07c4c5939
|
[BugFix][XPU] Fix speculative decoding on Intel XPU due to bug with IGC_ForceOCLSIMDWidth=16 (#35298)
Signed-off-by: Ofir Zafrir <ofir.zafrir@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
|
2026-02-26 07:15:16 +00:00 |
|
Cyrus Leung
|
d3a51da92a
|
[Benchmark] Simplify SLA scan (#35306)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-25 22:35:41 -08:00 |
|
Flora Feng
|
186ea22efe
|
[Misc][Harmony] Move Responses API only harmony utils to responses/harmony.py (#35339)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
|
2026-02-26 14:35:16 +08:00 |
|
Daniele
|
4a9c07a0a2
|
[BugFix] anthropic/serving_messages: fix tool call arguments streaming (#34887)
Signed-off-by: Daniele Trifirò <dtrifiro@redhat.com>
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>
|
2026-02-26 05:39:48 +00:00 |
|
Jason Li
|
9d37941017
|
[torch.compile] Sequence Parallelism threshold compile ranges (#28672)
Signed-off-by: jasonlizhengjian <jasonlizhengjian@gmail.com>
Signed-off-by: Jason Li <jasonlizhengjian@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2026-02-26 05:00:12 +00:00 |
|
Fadi Arafeh
|
4171ff6dd9
|
[CPU][Feat] Enable KleidiAI INT8_W4A8 for all input dtypes (#34890)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
|
2026-02-26 05:00:10 +00:00 |
|
Woosuk Kwon
|
13025e71e8
|
[Model Runner V2] Add coding style guide (#35325)
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>
|
2026-02-25 20:42:40 -08:00 |
|
Hanjie Qiu
|
71dfce6aa6
|
[Kernel] Refactor FlashInfer allreduce for mnnvl backend (#34109)
Signed-off-by: hjjq <50634613+hjjq@users.noreply.github.com>
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
Co-authored-by: wzhao18 <wzhao18.sz@gmail.com>
Co-authored-by: Wei Zhao <51183510+wzhao18@users.noreply.github.com>
|
2026-02-26 03:17:20 +00:00 |
|
hujiaxin0
|
2aa4140402
|
openpangu-vl support video input (#34134)
Signed-off-by: hujiaxin <524446785@qq.com>
Signed-off-by: Emilie1001 <79921183+Emilie1001@users.noreply.github.com>
Co-authored-by: Emilie1001 <79921183+Emilie1001@users.noreply.github.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-02-26 03:08:09 +00:00 |
|
Roberto L. Castro
|
86c3b5a808
|
[BugFix] Fix fp4 quant kernel on CUDA 12.8 (#35210)
Signed-off-by: LopezCastroRoberto <rocastro@redhat.com>
|
2026-02-25 18:32:50 -08:00 |
|
Seungmin Kim
|
160424a937
|
[Bugfix] Fix CUDA compatibility path setting for both datacenter and consumer NVIDIA GPUs (#33992)
Signed-off-by: Seungmin Kim <8457324+ehfd@users.noreply.github.com>
Signed-off-by: Andrew Mello <19512127+88plug@users.noreply.github.com>
Co-authored-by: 88plug <19512127+88plug@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2026-02-25 18:15:51 -08:00 |
|
Lucas Wilkinson
|
9511a3f8ee
|
[Bugfix] Fix AttributeError in SMControlContextManager (#35338)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2026-02-25 18:01:10 -08:00 |
|
Michael Goin
|
de527e1cec
|
[UX] Add --moe-backend arg for explicit kernel selection (#33807)
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2026-02-25 17:44:44 -08:00 |
|
Yongye Zhu
|
1976356ee6
|
[MoE Refactor] MXFP4 Cutlass Experts to MK (#34542)
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
|
2026-02-25 17:32:39 -08:00 |
|
Michael Goin
|
cbf8f7028c
|
[UX] Add --performance-mode {balanced,interactivity,throughput} (#34936)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2026-02-25 17:28:31 -08:00 |
|
Ming Yang
|
6831650c40
|
[offloader] v2: Hide weight onloading latency via prefetching (#29941)
Signed-off-by: Ming Yang <minos.future@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2026-02-25 17:20:59 -08:00 |
|
Andreas Karatzas
|
ed42507f6d
|
[ROCm][CI] Amending deletion of AMD mirror (#35322)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-02-25 14:17:56 -08:00 |
|
Andreas Karatzas
|
9571e99945
|
[ROCm][CI] Extending attention backend coverage for Eagle spec decode tests (#35265)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-02-25 14:16:18 -08:00 |
|
Elizabeth Thomas
|
c97234c08b
|
fix(mxfp4): Disable monolithic path for TRITON backend with EP (#34270)
Signed-off-by: Elizabeth Thomas <email2eliza@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2026-02-25 13:33:42 -08:00 |
|
rasmith
|
b188bab441
|
[CI][AMD][BugFix] Add torch.cuda.set_device to test_punica_ops so punica kernels execute on same device as tensor (#34985)
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
|
2026-02-25 19:18:00 +00:00 |
|
Lucas Wilkinson
|
15d76f74e2
|
Revert "[Misc] Enable weights loading tracking for quantized models" (#35309)
|
2026-02-25 09:20:15 -08:00 |
|
Andreas Karatzas
|
8fd6975479
|
[ROCm][CI] Disable skinny GEMMs in multimodal tests to fix non-deterministic results (#35049)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-02-25 16:48:37 +00:00 |
|
pushkar
|
5d18bf8b32
|
[Bugfix] Fix Harmony preamble visibility in Responses API (#32114)
Signed-off-by: Pushkar Patel <git@thepushkarp.com>
Signed-off-by: pupa <pupa@users.noreply.github.com>
|
2026-02-25 08:08:16 -08:00 |
|
haosdent
|
0788ff0a15
|
[Bugfix] Gracefully disable AllReduceFusionPass on GPUs without multicast support (#35085)
Signed-off-by: haosdent <haosdent@gmail.com>
|
2026-02-25 07:31:45 -08:00 |
|
Chendi.Xue
|
d72b0be33c
|
[XPU]Fix for Qwen-OMNI crash (#35249)
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
|
2026-02-25 07:31:07 -08:00 |
|
Bhoomit
|
42489e43c2
|
[Misc][LoRA] Increase max vocab size limit to 258048 in logits processor (#34773)
Signed-off-by: Bhoomit Vasani <vbhoomit@amazon.com>
|
2026-02-25 23:30:55 +08:00 |
|
Mario Hong
|
af5e6afa0a
|
[Bugfix] Fix step3p5 reasoning with interleaved thinking (#34211)
Signed-off-by: mariohong <mariohong128@gmail.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
|
2026-02-25 15:13:01 +00:00 |
|
Benjamin Chislett
|
ee59a7c615
|
[Tests] Add GSM8k check to SpecDec E2E tests (#34772)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
|
2026-02-25 07:51:14 -05:00 |
|
Joao Gante
|
709eadbb0b
|
Doc link typo (#35281)
Signed-off-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-02-25 03:00:31 -08:00 |
|
Harry Mellor
|
90fc7f9109
|
Fix custom processors that use deleted behaviour for Transformers v5 (#35107)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-02-25 02:36:21 -08:00 |
|
Yanwen Lin
|
675ec59aa9
|
[Bugfix][CPU] Fix basic unit tests failing in CPU platforms (#34677)
Signed-off-by: Yanwen Lin <lyw1124278064@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-02-25 08:36:15 +00:00 |
|
Yanwen Lin
|
80e60a6133
|
[Doc] Suggest "--managed-python" flag when installing python using uv (#33069)
Signed-off-by: Yanwen Lin <lyw1124278064@gmail.com>
|
2026-02-25 08:19:43 +00:00 |
|