541a2ef892  2025-12-07 20:31:14 +08:00  Wentao Ye
  [Perf] Deepgemm fused layout kernel for activations, 4.3% throughput improvement, 10.7% TTFT improvement. (#29546)
  Signed-off-by: yewentao256 <zhyanwentao@126.com>

879ddb09c3  2025-12-07 01:58:47 -08:00  Jinzhen Lin
  [Kernel][MoE] Optimize moe_align_block_size (#29642)
  Signed-off-by: Jinzhen Lin <jinzhen.ljz@antgroup.com>
  Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>

e83b7e379c  2025-12-07 00:00:22 -08:00  Cyrus Leung
  Revert "[Renderer] Separate out RendererConfig from ModelConfig (#30145)" (#30199)

27f4c2fd46  2025-12-06 23:15:42 -08:00  Cyrus Leung
  [Renderer] Separate out RendererConfig from ModelConfig (#30145)
  Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

671427efbf  2025-12-06 13:40:02 +00:00  Cyrus Leung
  [Model] Move multimodal_cpu_fields definition to field config (#30181)
  Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

c46b932df2  2025-12-06 07:57:28 +00:00  Cyrus Leung
  [Chore] Deprecate SupportsMultiModal.merge_by_field_config (#30170)
  Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

e858bc4d14  2025-12-05 20:55:43 -08:00  Peter Salas
  [Model] Add support for transformer-based Ultravox v0.7 projector (#30089)
  Signed-off-by: Peter Salas <peter@fixie.ai>

e3fbb6f152  2025-12-05 20:55:09 -08:00  Dongjie Zou
  Fix #30092: Kimi-Linear model loading failure with missing indexer_rotary_emb (#30093)
  Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com>

c4d62618ca  2025-12-05 20:54:38 -08:00  yuttian1
  Fix AWQ MoE marlin check issue in marlin_utils.py for AMD backend (#30102)
  Signed-off-by: yuttian1 <yuttian@amd.com>

dc839ad03d  2025-12-05 20:52:11 -08:00  rasmith
  [CI/Build][AMD][Quantization] Fix test_int8_kernel.py by updating int8_utils to use hip.libdevice.round (#30151)
  Signed-off-by: Randall Smith <ransmith@amd.com>
  Co-authored-by: Randall Smith <ransmith@amd.com>

7b5575fa7d  2025-12-05 16:42:12 -05:00  Wentao Ye
  [Bug] Fix "vLLM config is not set" error (#29999)
  Signed-off-by: yewentao256 <zhyanwentao@126.com>

962d703818  2025-12-05 19:57:26 +00:00  Divakar Verma
  [Bugfix][llama4_eagle] Fix missing 'lm_head' attribute (#29926)
  Signed-off-by: Divakar Verma <divakar.verma@amd.com>

66e674cdd5  2025-12-05 09:48:43 -08:00  Matthew Bonanni
  [Attention][UX][1/N] Add AttentionConfig and change attention env vars to CLI arguments (#26315)
  Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
  Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>
  Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
  Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
  Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>

0d8a7d8a26  2025-12-05 22:02:09 +08:00  Yi Liu
  [Compressed Tensors] Add XPU wNa16 support (#29484)
  Signed-off-by: yiliu30 <yi4.liu@intel.com>

3628bcaaf2  2025-12-05 11:01:16 +00:00  Zhiwei
  [ROCm][MXFP4] Infer w4a4 quant method in rocm aiter fused moe (#29775)
  Signed-off-by: ZhiweiYan-96 <zhiwei.yan@amd.com>

6038b1b04b  2025-12-05 00:34:33 -08:00  amitz-nv
  [Frontend][Model] Add 'float16' to possible mamba cache dtype values, override mamba SSM cache dtype value for NemotronH (#29978)
  Signed-off-by: amitz-nv <203509407+amitz-nv@users.noreply.github.com>

4470ee2f90  2025-12-05 00:03:17 +00:00  Alexander Matveev
  [Perf] Enable separate shared_experts stream only for CUDA (#30085)
  Signed-off-by: Alexander Matveev <amatveev@redhat.com>

e10c84e06a  2025-12-04 18:42:49 +00:00  Harry Mellor
  Access partial_rotary_factor from rope_parameters (#29966)
  Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

652ba93da3  2025-12-04 18:17:49 +00:00  Jee Jee Li
  [Bugfix] Fix FP8 MoE LoRA (#29890)
  Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>

6dcb07f676  2025-12-04 17:34:06 +00:00  Tao Yun
  Support qwen3-vl handling requests with embeddings (#30037)
  Signed-off-by: taoyun <1069423820@qq.com>
  Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
  Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>

b286a311c2  2025-12-04 17:21:24 +00:00  Cyrus Leung
  [Chore] Deprecate merge_by_field_config arg (#30035)
  Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

9998ea5b57  2025-12-04 13:44:50 +00:00  Harry Mellor
  Delete HF version of Phi 4 MM (#30049)
  Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

74c4d80c6c  2025-12-04 13:44:15 +00:00  wang.yuqi
  [Model][6/N] Improve all pooling task | Support chunked prefill with ALL pooling (#27145)
  Signed-off-by: wang.yuqi <noooop@126.com>
  Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
  Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

68eb5c8d97  2025-12-04 08:21:19 +00:00  Cyrus Leung
  [Misc] Move functions into PoolingMetadata (#30027)
  Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

3f1b03739a  2025-12-04 08:20:24 +00:00  TJian
  [ROCm][Bugfix] compute_attn_mask_seqlen for qwen3 omni (#29974)
  Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>

9ae2f60374  2025-12-04 06:22:20 +00:00  Cyrus Leung
  [Misc] Various cleanups for MM input processing (#29970)
  Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

2902c34826  2025-12-03 20:49:00 +00:00  bnellnm
  [Kernels] Remove BatchedTritonOrDeepGemmExperts and default fallback to Triton (#29929)
  Signed-off-by: Bill Nell <bnell@redhat.com>
  Signed-off-by: bnellnm <49004751+bnellnm@users.noreply.github.com>
  Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>

19bee6d12d  2025-12-03 18:04:59 +00:00  Varun Sundar Rabindranath
  [Performance][DP/EP] Add silu_mul_per_token_group_quant_fp8_colmajor kernel (#29470)
  Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
  Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
  Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>

b294e28db2  2025-12-03 11:00:56 +00:00  HDCharles
  [Refactor] CTMoEMethods to use QuantizationArgs (#28871)
  Signed-off-by: HDCharles <charlesdavidhernandez@gmail.com>
  Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
  Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>

42c1949643  2025-12-03 10:33:46 +00:00  Tsukasa OI
  [Bugfix][Quantization] Support BF16 tensors on GGUF (#29948)
  Signed-off-by: Tsukasa OI <floss_llm@irq.a4lg.com>

a21cd9ed23  2025-12-03 10:05:10 +00:00  Isotr0py
  [Bugfix] Fix incorrect image_grid_thw rank for HunyuanOCR from missing merge_by_field_config=True (#29950)
  Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>

5e5646e206  2025-12-02 14:51:20 -08:00  Julien Denize
  [Bugfix] llama_4_scaling wrongly passed to DeepseekAttention (#29908)
  Signed-off-by: juliendenize <julien.denize@mistral.ai>

6fc5841db1  2025-12-02 21:49:44 +00:00  Harry Mellor
  Fix some more Transformers nightly tests (#29872)
  Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

a2b053dc85  2025-12-02 19:28:35 +00:00  Navanit Dubey
  feat(model): Add BitsAndBytes quantization support for Qwen3-Omni-MoE (#29896)
  Signed-off-by: navanit-git <navanitdubey@gmail.com>

1d93f11675  2025-12-02 13:48:08 -05:00  Matthew Bonanni
  [Attention][CUDAGraph] Remove CG padding from attention backends (#29352)
  Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>

0ec8422171  2025-12-02 16:03:52 +00:00  Isotr0py
  [Bugfix] Fix incorrect channel order for idefics3 in edge case (#29881)
  Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
  Signed-off-by: Isotr0py <2037008807@qq.com>
  Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

51c57b51dd  2025-12-02 15:52:18 +00:00  Matthew Bonanni
  [Bugfix] Fix DeepSeek R1 MTP weight loading (#29545)
  Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
  Co-authored-by: Benjamin Chislett <bchislett@nvidia.com>

68ffbca7e4  2025-12-02 12:30:40 +00:00  Cyrus Leung
  [Chore] Use tokenizer.encode and tokenizer.decode directly (#29851)
  Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

d8c6210eea  2025-12-02 10:29:00 +00:00  Julien Denize
  Add Mistral Large 3 and Ministral 3 (#29757)
  Signed-off-by: Julien Denize <julien.denize@mistral.ai>
  Signed-off-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>
  Signed-off-by: Mickael Seznec <mickael@mistral.ai>
  Signed-off-by: Roger Wang <hey@rogerw.io>
  Co-authored-by: Roger Wang <hey@rogerw.io>
  Co-authored-by: Mickael Seznec <mickael@mistral.ai>

f5b0846ba0  2025-12-02 07:05:27 +00:00  Harry Mellor
  Fix some Transformers nightly tests (#29802)
  Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

653591d5e7  2025-12-02 13:33:37 +08:00  Cyrus Leung
  [Chore] Move tokenizer initialization methods (#29793)
  Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

f441d36cee  2025-12-01 19:22:50 -08:00  Johnny Yang
  Add missing return in _check_vllm_model_embed_input_ids (#29834)
  Signed-off-by: Johnny Yang <johnnyyang@google.com>

4b40924998  2025-12-02 02:02:22 +00:00  Divakar Verma
  [ROCm] Fall back PyTorch GELU with tanh approximation to GELU() (#29244)
  Signed-off-by: Divakar Verma <divakar.verma@amd.com>
  Signed-off-by: Divakar Verma <137818590+divakar-amd@users.noreply.github.com>
  Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

092bb73b8a  2025-12-01 18:19:17 +01:00  sangbumlikeagod
  [Frontend] Add 'verbose_json' and 'timestamp' features to Whisper Transcription/Translation (#24209)
  Signed-off-by: sangbumlikeagod <oironese@naver.com>
  Signed-off-by: sangbumlikeagod <98077576+sangbumlikeagod@users.noreply.github.com>

f37e8938d2  2025-12-01 12:00:52 +00:00  Fanli Lin
  [XPU] Fix AWQ skipped-layer detection in IPEX quantization (#29774)
  Signed-off-by: Fanli Lin <fanli.lin@intel.com>

f72a817bdf  2025-11-30 16:05:32 -08:00  Shu Wang
  [MoE] CuteDSL MoE with Nvfp4 DeepEP dispatch (#27141)
  Signed-off-by: Shu Wang <shuw@nvidia.com>
  Signed-off-by: Michael Goin <mgoin64@gmail.com>
  Co-authored-by: root <root@umbriel-b200-017.ipp4a1.colossus.nvidia.com>
  Co-authored-by: Michael Goin <mgoin64@gmail.com>

21c2627934  2025-11-30 17:14:23 +00:00  Xingyu Liu
  [Misc] Remove redundant hidden_size property in ModelConfig (#29749)
  Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com>
  Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

39d28108f4  2025-11-30 11:02:40 -05:00  Omer Ullman Argov
  [Feat] Support non-gated activations in NVFP4 modelopt path (#29004)

64bc09ba27  2025-11-30 17:31:12 +08:00  Cyrus Leung
  [Core] Enable inputs_embeds_size separate from hidden_size (#29741)
  Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>

47539cfd3e  2025-11-30 09:15:01 +00:00  Isotr0py
  [Bugfix] Fix mismatched nvfp4 gemm output shape (#29742)
  Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>