Aaron Hao
|
2ce6f3cf67
|
[Feat][RL][2/2] Native Weight Syncing API: IPC (#34171)
Signed-off-by: hao-aaron <ahao@anyscale.com>
Signed-off-by: Aaron Hao <ahao@anyscale.com>
Signed-off-by: ahao-anyscale <ahao@anyscale.com>
|
2026-02-27 13:45:21 -07:00 |
|
Lucas Wilkinson
|
1d532f9d8f
|
[DP] Only use DP padding when cudagraphs are actually used (#34102)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2026-02-27 15:14:31 -05:00 |
|
Zhengxu Chen
|
29b35477b0
|
[compile] Fix caching error over pytree slice node. (#35308)
Signed-off-by: zhxchen17 <zhxchen17@fb.com>
|
2026-02-27 19:34:16 +00:00 |
|
Huamin Li
|
157722da75
|
[perf] Use pinned memory for async H2D transfer in do_mamba_copy_block (#35480)
Signed-off-by: Huamin Li <3ericli@gmail.com>
|
2026-02-28 01:50:37 +08:00 |
|
fort726
|
905d76b51d
|
[Model] Add huggingface skt/A.X-K1 model (#32407)
Signed-off-by: Sungwan(Alex) Kim <sw0726.kim@sktelecom.com>
Signed-off-by: fort726 <38447663+fort726@users.noreply.github.com>
Co-authored-by: Sungwan(Alex) Kim <sw0726.kim@sktelecom.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
|
2026-02-27 09:26:02 -08:00 |
|
Yanan Cao
|
9098ce690c
|
[Kernel] [Helion] [7/N] Use HOP to represent Helion Kernel call to enable fx tracing and pattern matching (#34390)
Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>
|
2026-02-27 09:21:35 -08:00 |
|
Yueqian Lin
|
e8249378e4
|
[Bugfix] Fix check_interleaved_audio_video false positive for batched non-interleaved requests (#35487)
Signed-off-by: linyueqian <linyueqian@outlook.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2026-02-27 06:48:25 -08:00 |
|
Max Hu
|
9c3fe9936b
|
Flashinfer cuDNN backend for Qwen3 VL ViT attention (#34580)
Signed-off-by: Max Hu <maxhu@nvidia.com>
Signed-off-by: Max Hu <hyoung2991@gmail.com>
Co-authored-by: Max Hu <maxhu@nvidia.com>
Co-authored-by: Shang Wang <shangw@nvidia.com>
|
2026-02-27 20:20:23 +08:00 |
|
Umut Polat
|
b66a74649e
|
[Bugfix] Replace assert with ValueError for response_format validation in completions endpoint (#35456)
Signed-off-by: umut-polat <52835619+umut-polat@users.noreply.github.com>
|
2026-02-27 08:01:06 +00:00 |
|
gnovack
|
a532c83849
|
use 'max_active_experts' for moe lora input size (#33197)
Signed-off-by: gnovack <gnovack@amazon.com>
|
2026-02-27 03:50:43 +00:00 |
|
Nicolò Lucchesi
|
cabdaa7619
|
[Misc] Move GPUModelRunner.prepare_kernel_block_sizes to utils (#35400)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2026-02-27 11:42:51 +08:00 |
|
daniel-salib
|
d43048ce05
|
[Bugfix] Emit reasoning_part events in simple streaming path for Resp… (#35184)
Signed-off-by: Daniel Salib <danielsalib@meta.com>
|
2026-02-27 09:49:06 +08:00 |
|
Michael Goin
|
4fec53cfcb
|
[CI] Actually run tests/kernels/quantization/test_block_fp8.py in CI (#34274)
|
2026-02-26 17:58:03 -07:00 |
|
Andrii Skliar
|
56a6371706
|
[Update] Use FlashInfer fast_decode_plan directly instead of replication (#34687)
Signed-off-by: Andrii <askliar@nvidia.com>
Co-authored-by: Andrii <askliar@nvidia.com>
|
2026-02-26 16:31:43 -08:00 |
|
Tyler Michael Smith
|
eb19955c37
|
[WideEP] Remove pplx all2all backend (#33724)
Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-02-26 14:30:10 -08:00 |
|
Lucia Fang
|
0f2f24c8b2
|
[Bugfix] Fix MessageQueue connect_ip for cross-node data parallelism (#35429)
Signed-off-by: Lu Fang <fanglu@fb.com>
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>
|
2026-02-26 22:08:16 +00:00 |
|
不做了睡大觉
|
967572dd5f
|
fix(reasoning): Qwen3ReasoningParser returns truncated output as reasoning (#35230)
Signed-off-by: stakeswky <stakeswky@users.noreply.github.com>
Co-authored-by: stakeswky <stakeswky@users.noreply.github.com>
|
2026-02-26 20:30:45 +00:00 |
|
Lucas Wilkinson
|
5e58bdc711
|
[Bugfix] Remove erroneous lower bound on LoRA vocab size constraint (#35354)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2026-02-26 18:44:50 +00:00 |
|
Yiliu Dong
|
d940607629
|
[Core] Support min_tokens with speculative decoding (#32642)
Signed-off-by: qianlihuang <yiliu.dong@qq.com>
Co-authored-by: qianlihuang <yiliu.dong@qq.com>
|
2026-02-26 12:31:28 -05:00 |
|
Wentao Ye
|
99c7892c5b
|
[Perf] Optimize maxsim scores computation for pooling models, 13.9% E2E throughput improvement (#35330)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-02-26 17:14:54 +00:00 |
|
Jakub Zakrzewski
|
111d869069
|
[Model] Add nvidia/llama-nemotron-embed-vl-1b-v2 multimodal embedding model (#35297)
Signed-off-by: Jakub Zakrzewski <jzakrzewski@nvidia.com>
|
2026-02-26 14:17:17 +00:00 |
|
Cyrus Leung
|
845ee348ef
|
[Misc] Standardize handling of mm_processor_kwargs.size (#35284)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-26 13:05:46 +00:00 |
|
Yueqian Lin
|
c0615a296d
|
[Bugfix] Fix Qwen2.5-Omni and Qwen3-Omni mixed-modality embed regression (#35368)
Signed-off-by: linyueqian <linyueqian@outlook.com>
|
2026-02-26 11:58:23 +00:00 |
|
Jiangyun Zhu
|
ab87f85231
|
[Model] Ring 2.5 (#35102)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
|
2026-02-26 02:17:11 -08:00 |
|
Krish Gupta
|
3827c8c55a
|
[Test] Add tests for n parameter in chat completions API (#35283)
Signed-off-by: KrxGu <krishom70@gmail.com>
|
2026-02-26 09:14:07 +00:00 |
|
Chaojun Zhang
|
9f9a675b23
|
[XPU][8/N] Fix kernel bugs in XPU LoRA and MOE LORA (#34115)
Signed-off-by: chzhang <chaojun.zhang@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
|
2026-02-26 15:46:44 +08:00 |
|
Cyrus Leung
|
d3a51da92a
|
[Benchmark] Simplify SLA scan (#35306)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-02-25 22:35:41 -08:00 |
|
Flora Feng
|
186ea22efe
|
[Misc][Harmony] Move Responses API only harmony utils to responses/harmony.py (#35339)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
|
2026-02-26 14:35:16 +08:00 |
|
Jason Li
|
9d37941017
|
[torch.compile] Sequence Parallelism threshold compile ranges (#28672)
Signed-off-by: jasonlizhengjian <jasonlizhengjian@gmail.com>
Signed-off-by: Jason Li <jasonlizhengjian@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>
|
2026-02-26 05:00:12 +00:00 |
|
Hanjie Qiu
|
71dfce6aa6
|
[Kernel] Refactor FlashInfer allreduce for mnnvl backend (#34109)
Signed-off-by: hjjq <50634613+hjjq@users.noreply.github.com>
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
Co-authored-by: wzhao18 <wzhao18.sz@gmail.com>
Co-authored-by: Wei Zhao <51183510+wzhao18@users.noreply.github.com>
|
2026-02-26 03:17:20 +00:00 |
|
Seungmin Kim
|
160424a937
|
[Bugfix] Fix CUDA compatibility path setting for both datacenter and consumer NVIDIA GPUs (#33992)
Signed-off-by: Seungmin Kim <8457324+ehfd@users.noreply.github.com>
Signed-off-by: Andrew Mello <19512127+88plug@users.noreply.github.com>
Co-authored-by: 88plug <19512127+88plug@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2026-02-25 18:15:51 -08:00 |
|
Michael Goin
|
de527e1cec
|
[UX] Add --moe-backend arg for explicit kernel selection (#33807)
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2026-02-25 17:44:44 -08:00 |
|
Yongye Zhu
|
1976356ee6
|
[MoE Refactor] MXFP4 Cutlass Experts to MK (#34542)
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
|
2026-02-25 17:32:39 -08:00 |
|
Ming Yang
|
6831650c40
|
[offloader] v2: Hide weight onloading latency via prefetching (#29941)
Signed-off-by: Ming Yang <minos.future@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2026-02-25 17:20:59 -08:00 |
|
Andreas Karatzas
|
9571e99945
|
[ROCm][CI] Extending attention backend coverage for Eagle spec decode tests (#35265)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-02-25 14:16:18 -08:00 |
|
Elizabeth Thomas
|
c97234c08b
|
fix(mxfp4): Disable monolithic path for TRITON backend with EP (#34270)
Signed-off-by: Elizabeth Thomas <email2eliza@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2026-02-25 13:33:42 -08:00 |
|
rasmith
|
b188bab441
|
[CI][AMD][BugFix] Add torch.cuda.set_device to test_punica_ops so punica kernels execute on same device as tensor (#34985)
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
|
2026-02-25 19:18:00 +00:00 |
|
Andreas Karatzas
|
8fd6975479
|
[ROCm][CI] Disable skinny GEMMs in multimodal tests to fix non-deterministic results (#35049)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-02-25 16:48:37 +00:00 |
|
pushkar
|
5d18bf8b32
|
[Bugfix] Fix Harmony preamble visibility in Responses API (#32114)
Signed-off-by: Pushkar Patel <git@thepushkarp.com>
Signed-off-by: pupa <pupa@users.noreply.github.com>
|
2026-02-25 08:08:16 -08:00 |
|
Bhoomit
|
42489e43c2
|
[Misc][LoRA] Increase max vocab size limit to 258048 in logits processor (#34773)
Signed-off-by: Bhoomit Vasani <vbhoomit@amazon.com>
|
2026-02-25 23:30:55 +08:00 |
|
Mario Hong
|
af5e6afa0a
|
[Bugfix] Fix step3p5 reasoning with interleaved thinking (#34211)
Signed-off-by: mariohong <mariohong128@gmail.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
|
2026-02-25 15:13:01 +00:00 |
|
Benjamin Chislett
|
ee59a7c615
|
[Tests] Add GSM8k check to SpecDec E2E tests (#34772)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
|
2026-02-25 07:51:14 -05:00 |
|
Yanwen Lin
|
675ec59aa9
|
[Bugfix][CPU] Fix basic unit tests failing in CPU platforms (#34677)
Signed-off-by: Yanwen Lin <lyw1124278064@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-02-25 08:36:15 +00:00 |
|
Laura Wang
|
2465071510
|
[Perf] Add opt-in SM100 Oink RMSNorm custom-op path (#31828)
Signed-off-by: Laura Wang <3700467+Laurawly@users.noreply.github.com>
Co-authored-by: Lu Fang <30275821+houseroad@users.noreply.github.com>
|
2026-02-24 23:01:53 -08:00 |
|
Kunshang Ji
|
8ad54a991b
|
[Platform] Add current_platform.num_compute_units interface (#35042)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
Signed-off-by: Kunshang Ji <jikunshang95@gmail.com>
|
2026-02-24 22:22:49 -08:00 |
|
Chen Zhang
|
8fae54faff
|
[Linear Attention] fix bug for linear attention + prefix caching + reset_prefix_cache (#35157)
Signed-off-by: Chen Zhang <zhangch99@outlook.com>
|
2026-02-24 22:00:19 -08:00 |
|
Harry Mellor
|
f7967577f5
|
Remove requirement to use --hf-overrides for DeepseekVLV2ForCausalLM (#35203)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-02-24 22:00:06 -08:00 |
|
Andreas Karatzas
|
2ff3e436ad
|
[Responses][CI] Filter negative token IDs in schema fuzz test to avoid 500 errors (#35231)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-02-25 05:52:44 +00:00 |
|
Rohan Potdar
|
f38f8c9742
|
[ROCm]: Enable customop and rope+kvcache fusion for AITER RoPE (#35180)
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
|
2026-02-25 04:36:40 +00:00 |
|
Flora Feng
|
ec1d30c0f6
|
[Responses] Decouple SSE event helpers from Harmony context (#35148)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
|
2026-02-24 20:05:25 -08:00 |
|