Cyrus Leung
|
6dd302653f
|
[Misc] Rename group_mm_kwargs_by_modality -> group_and_batch_mm_kwargs (#36158)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-03-06 12:32:48 +08:00 |
|
Cyrus Leung
|
de00ebeac4
|
[Bugfix] Fix simple Mistral-Small example (#36156)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-03-05 20:25:11 -08:00 |
|
Andreas Karatzas
|
639680d220
|
[ROCm][CI] Adding missing dependencies for Multi-modal models tests (#36177)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-06 12:23:10 +08:00 |
|
Rohan Potdar
|
c5362c739f
|
Reenable features for ROCm attention backends (#36185)
Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>
|
2026-03-05 20:21:06 -08:00 |
|
Nikhil Gupta
|
0a49676fb0
|
cpu: aarch64: Upgrade OneDNN for aarch64 to add support for int8 matmul (#36147)
Signed-off-by: Nikhil Gupta <nikhil.gupta2@arm.com>
|
2026-03-06 03:48:59 +00:00 |
|
Jeffrey Wang
|
c012a8c477
|
Don't fire ray compatibility webhook when PR or branch is not provided (#36088)
Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>
|
2026-03-06 00:42:21 +00:00 |
|
Dor Huri
|
ebed80a7c8
|
[Performance] Extract KV-cache update from TreeAttention backend (#35384)
Signed-off-by: dorhuri123 <dor.huri1@live.biu.ac.il>
|
2026-03-06 00:22:43 +00:00 |
|
Nick Hill
|
a73af584fe
|
[Model Runner V2] Fix warmup for very small kvcache and/or blocksizes (#36176)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-03-05 14:48:10 -08:00 |
|
Zhengxu Chen
|
a97954b6a8
|
[compile] Consistent compiler config for saved/loaded vllm backends. (#35810)
Signed-off-by: zhxchen17 <zhxchen17@fb.com>
|
2026-03-05 15:08:12 -05:00 |
|
Yanhong Li
|
a911f4dd20
|
[Model] Add support for OLMo Hybrid (#32550)
|
2026-03-05 14:51:06 -05:00 |
|
Russell Bryant
|
5395471d29
|
[CI] Add explicit permissions to macOS smoke test workflow (#35775)
Signed-off-by: Russell Bryant <rbryant@redhat.com>
|
2026-03-05 19:08:48 +00:00 |
|
Frank Wang
|
a57c877f18
|
[BugFix] Fallback from FA4->FA2 for Batch Invariance (#36059)
Signed-off-by: frankwang28 <frank.wbb@hotmail.com>
|
2026-03-05 14:05:56 -05:00 |
|
Xin Yang
|
f917020983
|
[Perf] Optimize FusedMoEModularKernel output tensor using torch.empty (#35794)
Signed-off-by: Xin Yang <xyangx@amazon.com>
|
2026-03-05 13:47:53 -05:00 |
|
tomeras91
|
86483ca774
|
[Bugfix] Disable FlashInfer TRTLLM BF16 path for non-gated MoE (#36146)
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
|
2026-03-05 09:49:05 -08:00 |
|
Netanel Haber
|
b93a9e6f6d
|
ParakeetProjection.norm = RMSNorm instead of nn.LayerNorm (#36133)
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
|
2026-03-05 17:29:30 +00:00 |
|
Xinyu Chen
|
d8839ef7d9
|
[XPU] Enable ModelRunnerV2 on XPU (#36078)
Signed-off-by: Xinyu Chen <xinyu1.chen@intel.com>
|
2026-03-05 17:19:18 +00:00 |
|
Avery Miao
|
e998fa76b9
|
[BUGFIX]Fix Qwen-Omni models audio max_token_per_item estimation error leading to encoder_cache_size is 0 (#35994)
Signed-off-by: Miao, Avery <avery.miao@intel.com>
|
2026-03-05 09:16:29 -08:00 |
|
Jiayi Yan
|
6a895197fa
|
[Bugfix][CI] fix typos (#34934)
Signed-off-by: 1195343015 <1195343015@qq.com>
Signed-off-by: Jiayi Yan <66017932+1195343015@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-05 17:05:46 +00:00 |
|
Sage Moore
|
8c760b6ab6
|
[ROCm] Refactor ROCm attention backend selection logic (#35246)
Signed-off-by: Sage Moore <sage@neuralmagic.com>
|
2026-03-05 10:51:26 -06:00 |
|
AllenDou
|
3ee68590c7
|
refactor funasr model. (#36108)
Signed-off-by: zixiao <shunli.dsl@alibaba-inc.com>
Co-authored-by: zixiao <shunli.dsl@alibaba-inc.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-03-05 08:07:37 -08:00 |
|
Cyrus Leung
|
7196348157
|
[Bugfix] Fix Qwen-VL tokenizer implementation (#36140)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2026-03-05 08:07:19 -08:00 |
|
Ning Xie
|
176c799f4c
|
[openai api] log exception in exception handler (1/N) (#31164)
Signed-off-by: Andy Xie <andy.xning@gmail.com>
|
2026-03-05 16:00:12 +00:00 |
|
Or Ozeri
|
612e7729c2
|
[KVConnector] Scheduler: Fix num_computed_tokens after async KV load (#34616)
Signed-off-by: Or Ozeri <oro@il.ibm.com>
|
2026-03-05 14:25:15 +00:00 |
|
Harry Mellor
|
ecde7af9c4
|
Fix import that was moved in Transformers 5.2.0 (#36120)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-05 13:59:44 +00:00 |
|
Harry Mellor
|
8df523351f
|
[Docs] Only build docs if documentation or ready labels are present (#36135)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-05 13:58:16 +00:00 |
|
Andreas Karatzas
|
b03ff6a96b
|
[CI] Stabilize test_no_args_tool_call and add ROCm-specific server args (#36107)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2026-03-05 21:52:49 +08:00 |
|
Ajay Anubolu
|
ed81d5edd1
|
[Bugfix] Fix RunAI streamer crash with S3-hosted model paths (#35976)
Signed-off-by: AjAnubolu <anuboluajay@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
|
2026-03-05 12:14:20 +00:00 |
|
Shiyan Deng
|
3c23ac840e
|
[Bugfix] Fix mypy errors in hermes_tool_parser.py (#36114)
Signed-off-by: Shiyan Deng <dsy842974287@meta.com>
|
2026-03-05 11:37:47 +00:00 |
|
cjackal
|
a708ef5944
|
[Misc] Fix SyntaxWarning - invalid escape sequence '\e' (#36020)
Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com>
|
2026-03-05 10:55:31 +00:00 |
|
Kunshang Ji
|
66a2209645
|
[Hardware] Replace torch.cuda.synchronize() api with torch.accelerator.synchronize (#36085)
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
|
2026-03-05 10:36:39 +00:00 |
|
Doug Smith
|
0bfa229bf1
|
[Release] Include source distribution (sdist) in PyPI uploads (#35136)
Signed-off-by: dougbtv <dosmith@redhat.com>
Co-authored-by: Daniele Trifirò <dtrifiro@redhat.com>
|
2026-03-05 01:43:50 -08:00 |
|
Paco Xu
|
7493c51c55
|
[Docs] add Dynamo/aibrix integration and kubeai/aks link (#32767)
Signed-off-by: Paco Xu <paco.xu@daocloud.io>
|
2026-03-05 17:39:50 +08:00 |
|
Reagan Lee
|
ac773bbe80
|
[Docs] Update docs to include mm processor + encoder benchmarks (#34083)
Signed-off-by: Reagan <reaganjlee@gmail.com>
|
2026-03-05 01:38:25 -08:00 |
|
Christian Munley
|
48e376a007
|
qwen3coder tool parser fix anyOf double encoded parameters (#36032)
Signed-off-by: Christian Munley <cmunley@nvidia.com>
|
2026-03-05 09:06:57 +00:00 |
|
Isotr0py
|
21eb2c3372
|
[Chore] Correct MTP models test registry ordering (#36115)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
|
2026-03-05 08:55:04 +00:00 |
|
Seiji Eicher
|
e2b31243c0
|
[Docs] Update CacheConfig block_size docstring to remove inaccurate limit when using CUDA (#35632)
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
|
2026-03-05 06:24:08 +00:00 |
|
Martin Hickey
|
c3598d02fa
|
[Misc] Remove deprecated items that are due for removal (#36006)
Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>
|
2026-03-05 06:14:50 +00:00 |
|
Benjamin Chislett
|
57c629e9c1
|
[Bugfix] Fix block_size for hybrid model MTP (#36036)
Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>
|
2026-03-05 06:10:54 +00:00 |
|
zihaoanllm
|
d106bf39f5
|
[Doc] Add Parallel Draft Models (#35973)
Signed-off-by: <zihaoan2@amd.com>
Signed-off-by: zihaoanllm <zihaoan2@amd.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-05 05:44:07 +00:00 |
|
Yanan Cao
|
b0651021e5
|
[Kernel] [Helion] [11/N] Retune configs for silu_mul_fp8 (#36062)
|
2026-03-04 21:25:59 -08:00 |
|
Hanjun Cho
|
f600d5192e
|
[Bugfix] Fix score layer quantization for sequence classification models - Qwen3 (VL) Reranker (#35849)
Signed-off-by: Hanjun Cho <gkswns0531@gmail.com>
Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2026-03-04 20:57:20 -08:00 |
|
Tianmu Li
|
8e7820131e
|
[Perf] Use dummy M for weight prepacking on x86 (#35890)
Signed-off-by: Li, Tianmu <tianmu.li@intel.com>
|
2026-03-05 04:56:49 +00:00 |
|
Andrii Skliar
|
0a12cea25f
|
Order config.py in Lexicographical order (#35866)
Signed-off-by: Andrii Skliar <askliar@nvidia.com>
Co-authored-by: Andrii Skliar <askliar@nvidia.com>
|
2026-03-04 20:56:47 -08:00 |
|
Zhengxu Chen
|
dd6dbd93f8
|
[compile] Fix extra cache save on warm start. (#35921)
Signed-off-by: zhxchen17 <zhxchen17@fb.com>
|
2026-03-05 12:56:30 +08:00 |
|
Harry Mellor
|
26366009c5
|
[CI] Don't leave docs preview comment on closed PRs (#36087)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2026-03-05 04:51:46 +00:00 |
|
Nick Hill
|
16c472abe7
|
[Core] Move ray-specific WorkerWrapperBase methods to RayWorkerWrapper (#35328)
Signed-off-by: Nick Hill <nickhill123@gmail.com>
|
2026-03-05 12:11:59 +08:00 |
|
daje0601
|
3b23d57c96
|
[Model] Add LoRA support for Whisper models (#29856)
Signed-off-by: daje0601 <englishmt4118@gmail.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
|
2026-03-05 10:38:25 +08:00 |
|
Wentao Ye
|
2f4226fe52
|
[CI] Fix pre-commit mypy issue in main (#36049)
|
2026-03-04 18:13:12 -08:00 |
|
nkm-meta
|
792cbd64ca
|
Add platform method to enable custom collective ops registration (#34760)
Signed-off-by: Naina Kuruballi Mahesh <nainakm@meta.com>
|
2026-03-05 00:50:32 +00:00 |
|
Zhengxu Chen
|
2ed4722e26
|
[compile] Reduce log spam from compile. (#36044)
Signed-off-by: zhxchen17 <zhxchen17@fb.com>
|
2026-03-05 00:48:36 +00:00 |
|