realliujiaxu
|
d2c919dcc2
|
[bugfix] fix bug when top_logprobs=0 with spec decoding (#30059)
Signed-off-by: realliujiaxu <realliujiaxu@163.com>
|
2025-12-12 09:03:35 -08:00 |
|
Benjamin Bartels
|
f3237f3f6b
|
[Frontend] Fixes anthropic streaming message_start usage nesting (#30266)
Signed-off-by: bbartels <benjamin@bartels.dev>
|
2025-12-12 16:28:54 +00:00 |
|
jvlunteren
|
9c0ee995a8
|
[Kernel] Support CUDA Graphs in 3D Triton Attention Kernel (#28306)
Signed-off-by: Jan van Lunteren <jvl@zurich.ibm.com>
Signed-off-by: jvlunteren <161835099+jvlunteren@users.noreply.github.com>
Co-authored-by: Thomas Parnell <tom.parnell@gmail.com>
Co-authored-by: Thomas Parnell <tpa@zurich.ibm.com>
|
2025-12-12 16:55:40 +01:00 |
|
Michael Goin
|
09ad3b76b3
|
[Bug] Fix attention_backend arg string parsing (#30534)
Signed-off-by: mgoin <mgoin64@gmail.com>
|
2025-12-12 08:40:50 -07:00 |
|
Christina Norman
|
dc13c99eed
|
fix(gguf): Disable bfloat16 for GGUF on blackwell device (#30408)
Signed-off-by: Christina <truffle@gmail.com>
Signed-off-by: Isotr0py <2037008807@qq.com>
Signed-off-by: Christina Norman <christina@example.com>
Co-authored-by: Isotr0py <isotr0py@users.noreply.github.com>
Co-authored-by: Isotr0py <2037008807@qq.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-12-12 23:10:12 +08:00 |
|
Vladislav Nosivskoy
|
3e34adcdfb
|
[DeepSeek V3.2] Proper drop_thinking logic (#30490)
Signed-off-by: Vladislav Nosivskoy <vladnosiv@gmail.com>
|
2025-12-12 15:01:06 +00:00 |
|
Lucas Wilkinson
|
3e41992fec
|
[Attention] Use sparse prefill kernel for fp8 kv-cache in DeepSeek-v3.2 (#27532)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-12-12 05:57:47 -08:00 |
|
吴坎
|
91401c7a26
|
[Bugfix] Fix CMakeLists Environment Variable (#21804)
Signed-off-by: wu-kan <github@wu-kan.com>
Signed-off-by: 吴坎 <github@wu-kan.cn>
Signed-off-by: wu-kan <github@wu-kan.cn>
|
2025-12-12 10:54:52 +00:00 |
|
Jaehwang Jung
|
f90319d5d1
|
[Bugfix] Schedule failure due to wrong get_image_size_with_most_features (#29692)
|
2025-12-12 02:27:20 -08:00 |
|
rasmith
|
302b2c1eb9
|
[CI/Build][AMD] Fix ref_dynamic_per_token_quant reference implementation on ROCm. (#30291)
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
|
2025-12-12 09:30:23 +00:00 |
|
Ben Browning
|
8f8fda261a
|
[Bugfix] Multiple fixes for gpt-oss Chat Completion prompting (#28729)
Signed-off-by: Ben Browning <bbrownin@redhat.com>
Co-authored-by: Chauncey <chaunceyjiang@gmail.com>
|
2025-12-12 12:59:53 +08:00 |
|
Zhengxu Chen
|
fe1787107e
|
[compile] Parse compile range cache keys as Range during cache loading. (#30516)
Signed-off-by: zhxchen17 <zhxchen17@fb.com>
|
2025-12-12 04:30:51 +00:00 |
|
Andreas Karatzas
|
783644e4ac
|
[ROCm][CI] Skip multi-GPU speculative decoding tests when insufficient GPUs available (#30527)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2025-12-12 03:54:56 +00:00 |
|
Ryan Rock
|
197473c4e7
|
[CI/Build] Use spawn subprocess for ROCm (#30272)
Signed-off-by: Ryan Rock <ryan.rock@amd.com>
|
2025-12-12 03:33:17 +00:00 |
|
Nick Hill
|
947dfda9c2
|
[LMCache] Relax lmcache version requirement (#30425)
Signed-off-by: Nick Hill <nhill@redhat.com>
|
2025-12-11 18:18:47 -09:00 |
|
Michael Goin
|
9f2fc16a69
|
[Bugfix][Model] Fix Afmoe rope_parameters issue (#30505)
Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-12-12 02:53:57 +00:00 |
|
Bhanu Prakash Voutharoja
|
6a6fc41c79
|
gptq marlin quantization support for fused moe with lora (#30254)
Signed-off-by: Bhanu068 <voutharoja.bhanu06@gmail.com>
|
2025-12-12 02:27:22 +00:00 |
|
Fadi Arafeh
|
f355ad5412
|
[CPU][FIX] Fix build failures on Arm CPUs with torch nightly (#30481)
Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>
|
2025-12-12 02:09:25 +00:00 |
|
Lucas Wilkinson
|
042da73244
|
[Core] Refactor _build_attention_metadata (#29628)
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>
|
2025-12-11 17:54:12 -08:00 |
|
Andreas Karatzas
|
b5945d49c0
|
[ROCm][CI] Use mi325_4 agent pool for V1 e2e tests (#30526)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2025-12-12 01:37:24 +00:00 |
|
rasmith
|
ba80926681
|
[CI/Build][AMD] Skip test_cutlass_w4a8_moe tests on ROCm since they require cutlass_pack_scale_fp8 (#30508)
Signed-off-by: Randall Smith <ransmith@amd.com>
Signed-off-by: Michael Goin <mgoin64@gmail.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
|
2025-12-12 01:02:19 +00:00 |
|
jiahanc
|
0ab23c2b2b
|
[fix] fix SM check for Flashinfer TRTLLM MOE (#30314)
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
|
2025-12-12 01:00:58 +00:00 |
|
rasmith
|
48661d275f
|
[CI/Build][AMD] Skip tests in test_fusions_e2e and test_dbo_dp_ep_gsm8k that require non-existing imports for ROCm (#30417)
Signed-off-by: Randall Smith <ransmith@amd.com>
Co-authored-by: Randall Smith <ransmith@amd.com>
|
2025-12-12 00:24:20 +00:00 |
|
Ev Lacey
|
d527cf0b3d
|
[FIX] Patch run-cluster.sh (fix for #28328) (#30002)
Signed-off-by: elacey <elacey@nvidia.com>
Signed-off-by: Ev Lacey <github@everettlacey.com>
|
2025-12-11 23:36:31 +00:00 |
|
Concurrensee
|
2cc5affc38
|
[ROCM][CI] Fix AMD Examples Test Group (#30276)
Signed-off-by: Yida Wu <yida.wu@amd.com>
Signed-off-by: Yida <yida.wu@amd.com>
|
2025-12-11 18:03:54 -05:00 |
|
Andrew Briand
|
a00d88973d
|
[EPLB] Support EPLB w/ NVFP4 (#29804)
Signed-off-by: Andrew Briand <abriand@nvidia.com>
Co-authored-by: Andrew Briand <abriand@nvidia.com>
|
2025-12-11 22:59:40 +00:00 |
|
Wentao Ye
|
61249b177d
|
[Refactor] Remove useless syncwarp (#30510)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
|
2025-12-11 17:43:41 -05:00 |
|
Wentao Ye
|
c817b14151
|
[Perf] Optimize deepgemm experts initialization, 3.9% TTFT improvement (#30494)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
Co-authored-by: li-jinpeng <3332126450@qq.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
|
2025-12-11 17:28:34 -05:00 |
|
ioana ghiban
|
3efdc3feae
|
[Docs][CPU backend] Add pre-built Arm CPU Docker images (#30491)
Signed-off-by: Ioana Ghiban <ioana.ghiban@arm.com>
|
2025-12-11 22:03:29 +00:00 |
|
Nicolò Lucchesi
|
0efd9f867c
|
[Core] Whisper Enable Encoder Batching (#29421)
Signed-off-by: NickLucche <nlucches@redhat.com>
|
2025-12-11 21:06:51 +00:00 |
|
Xingyu Liu
|
90d6cf921f
|
[BugFix][MM] Support VLLM_RANDOMIZE_DP_DUMMY_INPUTS (#30472)
Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-12-11 21:00:15 +00:00 |
|
Harry Mellor
|
cf3eacfe58
|
Standardise get_rope to use rope_parameters["partial_rotary_factor"], not rotary_dim (#30389)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-12-11 20:45:23 +00:00 |
|
Zhengxu Chen
|
92fea56fd1
|
[compile] Stop one-off setting enable_aot_compile and use context manager instead. (#30503)
Signed-off-by: zhxchen17 <zhxchen17@fb.com>
|
2025-12-11 20:28:03 +00:00 |
|
Ye (Charlotte) Qi
|
e458270a95
|
[Misc] Add mcp to requirements (#30474)
Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>
|
2025-12-11 20:06:09 +00:00 |
|
Andreas Karatzas
|
72aaac5b66
|
[ROCm][Bugfix] Add MLACommonMetadata to allowed attention types for speculative decoding (#30430)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
|
2025-12-11 19:25:01 +00:00 |
|
汪志鹏
|
0e71eaa644
|
[Feature] AWQ marlin quantization support for fused moe with lora (#30442)
Signed-off-by: princepride <wangzhipeng628@gmail.com>
|
2025-12-11 18:03:32 +00:00 |
|
Harry Mellor
|
8781cd6b88
|
Add Eagle and Eagle3 support to Transformers modeling backend (#30340)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-12-11 17:02:10 +00:00 |
|
Julien Denize
|
aa3c250c48
|
[IMPROVEMENT] Change MistralReasoningParser behavior (#30391)
Signed-off-by: juliendenize <julien.denize@mistral.ai>
Signed-off-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
|
2025-12-11 17:53:26 +01:00 |
|
Shengqi Chen
|
305b168a9f
|
[CI] refine more logic when generating and using nightly wheels & indices, add cuda130 build for aarch64, specify correct manylinux version (#30341)
Signed-off-by: Shengqi Chen <harry-chen@outlook.com>
|
2025-12-12 00:42:30 +08:00 |
|
Harry Mellor
|
93db3256a4
|
Give pooling examples better names (#30488)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-12-11 16:22:58 +00:00 |
|
ioana ghiban
|
17cb540248
|
[Docs][CPU Backend] Add nightly and per revision pre-built Arm CPU wheels (#30402)
Signed-off-by: Ioana Ghiban <ioana.ghiban@arm.com>
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-12-11 15:57:10 +00:00 |
|
Harry Mellor
|
97a042f3bc
|
Make the httpx logger less annoying when Transformers v5 is installed (#30480)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
|
2025-12-11 15:44:56 +00:00 |
|
Cyrus Leung
|
3a3b06ee70
|
[Misc] Improve error message for is_multimodal (#30483)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-12-11 06:39:51 -08:00 |
|
Martin Hickey
|
f4417f8449
|
[KVConnector] Add KV events to KV Connectors (#28309)
Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>
|
2025-12-11 15:30:29 +01:00 |
|
Qiu
|
a11f4a81e0
|
[Misc][PCP&DCP] relocate PCP feature check (#30050)
Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
|
2025-12-11 03:36:18 -08:00 |
|
Kenichi Maehashi
|
853611bb18
|
Fix typo of endpoint name in CLI args docs (#30473)
Signed-off-by: Kenichi Maehashi <maehashi@preferred.jp>
|
2025-12-11 11:07:56 +00:00 |
|
Cyrus Leung
|
d917747c95
|
[Bugfix] Fix task still being passed in tests/benchmarks (#30476)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-12-11 10:33:55 +00:00 |
|
wang.yuqi
|
a5f9fb5960
|
[Deprecation] Deprecate --convert reward, use --convert embed instead. (#30463)
Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io>
|
2025-12-11 10:18:25 +00:00 |
|
jeremyteboul
|
4515eb1a0b
|
[Fix] Update lazy loading of video loader backend (#30444)
Signed-off-by: Jeremy Teboul <jeremyteboul@fb.com>
Co-authored-by: Jeremy Teboul <jeremyteboul@fb.com>
|
2025-12-11 10:14:57 +00:00 |
|
Cyrus Leung
|
13d63b65e0
|
[Deprecation] Remove missed fallback for embed_input_ids (#30469)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
|
2025-12-11 10:06:36 +00:00 |
|