Maral
|
2e9034c998
|
[W8A8 Block Linear Refactor][2/N] Remove W8A8Fp8BlockLinearOp and adopt Fp8 block linear kernel selections. (#33892)
Signed-off-by: maral <maralbahari.98@gmail.com>
Signed-off-by: Maral <maralbahari.98@gmail.com>
|
2026-04-09 08:50:39 +08:00 |
|
Benjamin Chislett
|
8332078cfd
|
[Bugfix] FlashInfer MXINT4 MoE crashes, missing do_finalize (#39315)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
Signed-off-by: Benjamin Chislett <chislett.ben@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2026-04-08 20:36:33 -04:00 |
|
Richard Zou
|
ba4a78eb5d
|
[torch.compile] Allow usage of Opaque Objects in PyTorch 2.11 (#39286)
Signed-off-by: Richard Zou <zou3519@gmail.com>
|
2026-04-08 23:21:10 +00:00 |
|
Kai Song
|
f3c7941ec8
|
[Bugfix] Fix EP precision for Qwen3.5, Qwen3-Next (#39181)
Signed-off-by: Song Kai <songkai05@baidu.com>
|
2026-04-09 01:47:48 +04:00 |
|
Wentao Ye
|
3352bf8b03
|
[CI Bug] Fix pre-commit issue in main (#39347)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-04-08 14:10:05 -07:00 |
|
triangleXIV
|
7c94ae16c6
|
[BugFix] --max-model-len=-1 causes over-limit requests to hang and starve the entire service (#39102)
Signed-off-by: triangle14 <y1019026570@gmail.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: mgoin <mgoin64@gmail.com>
|
2026-04-08 14:03:17 -07:00 |
|
Rishi Puri
|
ad05edfbca
|
tests/v1/e2e/spec_decode: assert async scheduling is used (#39206)
Signed-off-by: Rishi Puri <riship@nvidia.com>
Signed-off-by: Rishi Puri <puririshi98@berkeley.edu>
Signed-off-by: sfeng33 <4florafeng@gmail.com>
Co-authored-by: Benjamin Chislett <chislett.ben@gmail.com>
Co-authored-by: Flora Feng <4florafeng@gmail.com>
|
2026-04-08 20:30:03 +00:00 |
|
Wentao Ye
|
2018137242
|
[Feature] Batch invariant nvfp4 linear support (#39322)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2026-04-08 16:29:13 -04:00 |
|
Jackmin801
|
a776a48b1c
|
[MoE] Move DEEP_GEMM into experts/ subdirectory (#39005)
Signed-off-by: Jackmin801 <ongjackm@gmail.com>
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
|
2026-04-08 19:23:08 +00:00 |
|
Ben Browning
|
8477fe427d
|
[Tool] adjust_request to reasoning parser, and Gemma4 fixes (#39027)
Signed-off-by: Ben Browning <bbrownin@redhat.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
|
2026-04-08 19:04:04 +00:00 |
|
Lain
|
e24e0a43a4
|
[Attention] relax the head dim 512 and paged kv for sm90+FA4 (#38835)
Signed-off-by: Siyuan Fu <siyuanf@nvidia.com>
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
Co-authored-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2026-04-08 18:23:18 +00:00 |
|
Roberto L. Castro
|
b55d830ec7
|
[Perf][Kernel] Persistent TopK scheduler: unified CUDAGraph-safe kernel with dynamic per-row dispatch - DeepSeek-V3.2 DSA decode (#37421)
Signed-off-by: LopezCastroRoberto <rocastro@redhat.com>
Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>
|
2026-04-08 13:35:57 -04:00 |
|
Shengqi Chen
|
75e01a39a1
|
[Feature] NUMA binding support for GPU workers (#38635)
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
Co-authored-by: Jason Li <jasonlizhengjian@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
|
2026-04-08 09:55:24 -07:00 |
|
Or Ozeri
|
512c5eb455
|
[kv_offload+HMA][5/N]: Track group block hashes and block IDs (#37109)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
|
2026-04-08 19:50:28 +03:00 |
|
Flora Feng
|
13151a4df4
|
[Bugfix] Fix Gemma4 streaming tool call corruption for split boolean/number values (#39114)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
|
2026-04-08 16:46:27 +00:00 |
|
Gregory Shtrasberg
|
56c976c1b5
|
[ROCm] Enable fused_silu_mul_block_quant on ROCm (#38817)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
|
2026-04-08 11:23:32 -05:00 |
|
Frederik Gossen
|
d74a306c4b
|
[Core] Use tuple_return in split_module for tuple-conformant subgraphs (#38752)
Signed-off-by: Frederik Gossen <frgossen@meta.com>
Co-authored-by: Boyuan Feng <boyuan@meta.com>
|
2026-04-08 09:09:58 -07:00 |
|
Gregory Shtrasberg
|
0e9f0a516c
|
[ROCm][CI-Build] Cherry pick triton BUFFER_OPS fix and update AITER (#38580)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
|
2026-04-08 10:38:03 -05:00 |
|
haosdent
|
8904fc4d19
|
[Bugfix] Fix V1 logprobs empty strings for multi-byte UTF-8 tokens when logprobs > 0 (#34875)
Signed-off-by: haosdent <haosdent@gmail.com>
|
2026-04-08 15:30:00 +00:00 |
|
nemanjaudovic
|
1a2c17634e
|
[Bugfix] Add missing ASRDataset import and CLI args in benchmarks/throughput.py (#38114)
Signed-off-by: nemanjaudovic <nudovic@amd.com>
|
2026-04-08 13:53:53 +00:00 |
|
Matthew Bonanni
|
308cec5864
|
[FlashAttention] Symlink FA4 instead of copying when using VLLM_FLASH_ATTN_SRC_DIR (#38814)
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
|
2026-04-08 12:04:34 +00:00 |
|
wang.yuqi
|
4e2ab1861d
|
[CI Failure] pin nomic-embed-text-v1 revision (#39292)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2026-04-08 11:43:06 +00:00 |
|
JartX
|
140cbb1186
|
[Bugfix] Cuda Clean up scales Kvcache fp8/int8_per_token_head (#39224)
Signed-off-by: JartX <sagformas@epdcenter.es>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
|
2026-04-08 04:08:04 -07:00 |
|
Kevin H. Luu
|
6155bbd1dd
|
[Bugfix][Docs] Fix ReadTheDocs build crash from mocked torch decorator (#39284)
Signed-off-by: khluu <khluu000@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
2026-04-08 09:43:01 +00:00 |
|
rasmith
|
78434b923c
|
[CI][AMD][BugFix][Kernel] Cast induction variable to int64 on MI350 for chunk_gated_delta_rule_fwd_kernel_h_blockdim64 to avoid illegal memory access (#39087)
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
|
2026-04-08 16:57:18 +08:00 |
|
Michael Goin
|
2488d1dca2
|
[Docs] Update README (#39251)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2026-04-08 11:34:07 +08:00 |
|
yoke
|
d734445fcd
|
[Bugfix][Frontend] Fix Gemma4 streaming HTML duplication after tool calls (#38909)
Signed-off-by: yoke233 <yoke2012@gmail.com>
|
2026-04-08 11:03:54 +08:00 |
|
Flora Feng
|
927975ead8
|
[Parser] Migrate response api streaming to unified parser (#38755)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
Signed-off-by: Andrew Xia <axia@meta.com>
|
2026-04-08 10:09:00 +08:00 |
|
Flora Feng
|
9ea7d670d8
|
[Bugfix] Fix Qwen3 tool parser for Responses API tools (#38848)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
|
2026-04-08 10:08:51 +08:00 |
|
Varun Sundar Rabindranath
|
7b80cd8ac3
|
[Docs] Add Phi-4-reasoning-vision to supported models + examples (#39232)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
|
2026-04-08 02:02:26 +00:00 |
|
Andrey Talman
|
2111997f96
|
[release 2.11] Update to torch 2.11 (#34644)
|
2026-04-07 18:55:48 -07:00 |
|
Flora Feng
|
5af684c319
|
[CI] Add reasoning parser tests to CI (#37025)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
|
2026-04-08 00:57:36 +00:00 |
|
Md. Mekayel Anik
|
d521dcdbcc
|
docs: clarify SMT and OMP acronyms in CpuPlatform (#39085)
|
2026-04-07 17:42:07 -07:00 |
|
Giancarlo Delfin
|
5daf62271d
|
[Model Runner V2] Fuse probabilistic rejection sample kernels (#38496)
Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai>
|
2026-04-07 17:37:37 -07:00 |
|
zofia
|
ad3304425b
|
[XPU] add xpu backend implementation of mxfp8 quant (#38682)
Signed-off-by: Zhu, Zufang <zufang.zhu@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
|
2026-04-08 08:30:35 +08:00 |
|
Lucas Wilkinson
|
70406eb1dc
|
[Attention][V0 Deprecation] Deprecate accept output buffer (#39125)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2026-04-07 17:14:58 -04:00 |
|
Yubo Wang
|
08bfedc152
|
[Bugfix] Fix extract_hidden_states crash with quantized KV cache dtype (#39160)
Signed-off-by: Yubo Wang <yubowang2019@gmail.com>
|
2026-04-07 11:18:33 -07:00 |
|
Flora Feng
|
0102bd2f4c
|
[Parser] Pass request.tools to tool parser (#38860)
Signed-off-by: sfeng33 <4florafeng@gmail.com>
|
2026-04-08 01:36:21 +08:00 |
|
rasmith
|
83d09d36b5
|
[CI][Bugfix][AMD] Ensure weights are created when emulating OCP MXFP4 (#36993)
Signed-off-by: Randall Smith <Randall.Smith@amd.com>
|
2026-04-08 00:37:16 +08:00 |
|
Chendi.Xue
|
92b9afeecd
|
[XPU] Quick fix for TritonMLA to remove cuda hardcode (#39088)
Signed-off-by: Chendi Xue <chendi.xue@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
|
2026-04-08 00:17:58 +08:00 |
|
Jinzhen Lin
|
7310555482
|
[Bugfix] Fix marlin nvfp4 rescaling (#37502)
Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com>
|
2026-04-07 08:57:17 -07:00 |
|
ibifrost
|
96b5004b71
|
[KVConnector] Support 3FS KVConnector (#37636)
Signed-off-by: wuchenxin <wuchenxin.wcx@alibaba-inc.com>
Signed-off-by: ibifrost <47308427+ibifrost@users.noreply.github.com>
Co-authored-by: Simon Mo <simon.mo@hey.com>
|
2026-04-07 15:46:00 +00:00 |
|
kkyyxhll
|
98e1a43af7
|
[Bugfix][Quantization] Fix PerTensorScale loading with tuple shard_id in MergedColumnParallelLinear (#38517)
Signed-off-by: loukang <loukang@xiaohongshu.com>
|
2026-04-07 11:16:26 -04:00 |
|
maobaolong
|
729eb59f60
|
[KVConnector]: prioritize external connector over internal registry (#38301)
Signed-off-by: baoloongmao <baoloongmao@tencent.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
|
2026-04-07 15:03:11 +00:00 |
|
Ilya Boytsov
|
6e1100889e
|
fix(test): recompute Jina ColBERT rotary inv_freq cleared by transformers v5 weight loader (#39176)
Signed-off-by: Ilya Boytsov <ilyaboytsov1805@gmail.com>
|
2026-04-07 22:40:55 +08:00 |
|
Harry Mellor
|
edcc37a8ce
|
Fix Mistral yarn warning in Transformers v5 (#37292)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>
|
2026-04-07 13:23:33 +00:00 |
|
Harry Mellor
|
79df4a794d
|
Automatically add links to API docs for matching strings in docs (#37434)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-04-07 21:21:18 +08:00 |
|
Ronen Schaffer
|
7c139ab23f
|
[KV Offload] Clean up ARC/LRU refactoring leftovers: group ARC tests and fix stale comment (#38217)
Signed-off-by: Ronen Schaffer <ronen.schaffer@ibm.com>
|
2026-04-07 15:14:45 +03:00 |
|
Wei Zhao
|
0be9516ea4
|
[Bug] Fix Trtllm Fp8 MoE Weight Shuffle Memory Fragmentation (#39054)
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>
|
2026-04-07 08:04:08 -04:00 |
|
Kyle Mylonakis
|
7b9de7c892
|
[Bugfix] Correct mistake in chained comparison in static assert logic (#38699)
Signed-off-by: Kyle Mylonakis <kyle@protopia.ai>
|
2026-04-07 18:24:39 +08:00 |
|