Harry Mellor
be8168ff88
Fix Gemma3 GGUF for Transformers v5 (#33683)
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-02-03 12:36:53 +00:00
Harry Mellor
f6af34626d
Fix offline test for Transformers v5 (#33682)
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-02-03 12:07:24 +00:00
Song Zhixin
ceab70c89d
[Bugfix] fix qwen3-asr response error (#33644)
...
Signed-off-by: jesse <szxfml@gmail.com>
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2026-02-03 03:33:56 -08:00
Cyrus Leung
52683ccbe1
[Misc] Update default image format of encode_base64 (#33656)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-02-03 03:13:16 -08:00
Michael Goin
e346e2d056
[Bugfix] Disable RoutingMethodType.[Renormalize,RenormalizeNaive] TRTLLM per-tensor FP8 MoE (#33620)
...
Signed-off-by: mgoin <mgoin64@gmail.com>
2026-02-03 10:37:15 +00:00
Cyrus Leung
83449a5ff0
[Refactor] Clean up pooling serial utils (#33665)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-02-03 10:29:18 +00:00
Lucas Hänke de Cansino
dad2d6a590
[Bugfix][Model] Fix DeepSeek-OCR-2 chat template to include BOS token (#33642)
...
Signed-off-by: l4b4r4b4b4 <lucas.cansino@mail.de>
2026-02-03 00:35:58 -08:00
Isotr0py
32e84fa1ff
[CI/Build] Investigate torchrun distributed tests hanging issue (#33650)
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-02-03 15:49:17 +08:00
Richard Zou
fd9c83d0e0
[torch.compile] Document the workaround to standalone_compile failing (#33571)
...
Signed-off-by: Richard Zou <zou3519@gmail.com>
2026-02-03 07:16:55 +00:00
杨朱 · Kiki
b95cc5014d
[Misc] Remove deprecated VLLM_ALL2ALL_BACKEND environment variable (#33535)
...
Signed-off-by: carlory <baofa.fan@daocloud.io>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 15:01:59 +08:00
Nick Hill
61397891ce
[Minor] Some code simplification in scheduler.py (#33597)
...
Signed-off-by: Nick Hill <nickhill123@gmail.com>
2026-02-03 15:00:00 +08:00
杨朱 · Kiki
ef248ff740
[Misc] Remove deprecated profiler environment variables (#33536)
...
Signed-off-by: carlory <baofa.fan@daocloud.io>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-03 14:58:44 +08:00
Kunshang Ji
e10604480b
[XPU][1/N] Deprecate ipex and switch to vllm-xpu-kernels for xpu platform (#33379)
...
Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>
2026-02-02 22:46:10 -08:00
Chauncey
bf001da4bf
[Bugfix] Interleaved thinking keeps compatibility with reasoning_content (#33635)
...
Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>
Co-authored-by: Koushik Dutta <koushd@gmail.com>
2026-02-03 06:46:05 +00:00
杨朱 · Kiki
a0a984ac2e
[CI/Build] Remove hardcoded America/Los_Angeles timezone from Dockerfiles (#33553)
...
Signed-off-by: carlory <baofa.fan@daocloud.io>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-02 22:32:39 -08:00
Shengliang Xu
f1cb9b5544
Fix quantized Falcon-H1 model loading issues (#32728)
...
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
2026-02-02 22:31:27 -08:00
Daniel Mescheder
4c4b6f7a97
[Frontend] Add sampling parameters to Responses API (#32609)
...
Signed-off-by: Daniel Mescheder <dmesch@amazon.com>
Co-authored-by: Daniel Mescheder <dmesch@amazon.com>
2026-02-03 13:51:10 +08:00
Roger Wang
10546f925a
[Bugfix] Fix mm budget setting for Qwen Omni models (#33634)
...
Signed-off-by: Roger Wang <hey@rogerw.io>
2026-02-03 04:56:25 +00:00
Radu Salavat
e69c990c21
[Feature][CPU Backend]: Optimize ARM vectorization backend (#30329)
...
Signed-off-by: Radu Salavat <radu.salavat@arm.com>
2026-02-02 20:17:56 -08:00
Richard Zou
5eac9a1b34
[torch.compile] Don't do the fast moe cold start optimization if there is speculative decoding (#33624)
...
Signed-off-by: Richard Zou <zou3519@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2026-02-03 03:38:49 +00:00
Nathan Weinberg
1b60b45d0d
[CI/Build] add directions for CPU image upload to Docker Hub (#32032)
...
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
Signed-off-by: Nathan Weinberg <31703736+nathan-weinberg@users.noreply.github.com>
Co-authored-by: Li, Jiang <bigpyj64@gmail.com>
2026-02-03 02:48:06 +00:00
Dezhan
4b3803d180
[BugFix] DPMetadata raises assert error for dense model (#32739)
...
Co-authored-by: Dezhan Tu <dztu@meta.com>
2026-02-03 00:56:44 +00:00
Patrick von Platen
5019c59dd2
[Voxtral Realtime] Introduce global log mel max (#33574)
...
Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-02-02 17:01:47 -05:00
Lain
089cd4f002
fix cutlass_3x_gemm_fp8_blockwise on sm103a (#32224)
...
Signed-off-by: Siyuan Fu <siyuanf@nvidia.com>
Co-authored-by: Pavani Majety <pmajety@nvidia.com>
2026-02-02 11:47:46 -08:00
Vasiliy Kuznetsov
0130223bd9
fix memory for online fp8 quantization with streaming weight load (#31914)
...
Signed-off-by: vasiliy <vasiliy@fb.com>
2026-02-02 14:17:42 -05:00
Matthew Bonanni
5d1aef3004
[UX] Format attention backend log line (#33570)
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2026-02-02 18:57:12 +00:00
yugong333
ffe1fc7a28
Reduce the kernel overhead when num of active loras is smaller than max loras. Multiple cuda graphs are captured for each num of active-loras. (#32005)
...
Signed-off-by: Yu Gong <yu3.gong@gmail.com>
2026-02-02 12:30:06 -05:00
Harry Mellor
8b7346d5f1
Update huggingface-hub again (#33567)
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-02-02 09:20:54 -08:00
Harry Mellor
6141ebe0dd
Remove incorrect tokenizer info test (#33565)
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2026-02-02 17:11:44 +00:00
Yang Liu
199e3cb476
[Model] Use mm_position to compute mrope positions for GLM-4.xV (#33039)
...
Signed-off-by: Yang <lymailforjob@gmail.com>
2026-02-02 16:55:48 +00:00
Matthew Bonanni
9f8cb81b44
[CI] Add DeepSeek V3.2 nightly eval (#33566)
...
Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>
2026-02-02 16:10:02 +00:00
Cyrus Leung
d7e17aaacd
[Refactor] Move profiling methods to MM budget (#33559)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-02-02 23:27:00 +08:00
Kebe
528e9b1490
[Feature][Core] Support Fabric detection to adapt the MNNVL protocol for the GB series (#33540)
...
Signed-off-by: Kebe <mail@kebe7jun.com>
Signed-off-by: youkaichao <youkaichao@gmail.com>
Co-authored-by: Thomas Vegas <tvegas@nvidia.com>
Co-authored-by: youkaichao <youkaichao@gmail.com>
2026-02-02 22:55:46 +08:00
shanjiaz
d95b4be47a
move spec decode slow test to test_areas.yaml (#33365)
...
Signed-off-by: shanjiaz <zsjwpianpian@gmail.com>
2026-02-02 06:28:36 -08:00
Isotr0py
4061dcf4c5
[Bugfix] Enable Kimi k25 processor test (#33562)
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-02-02 14:25:25 +00:00
danielafrimi
0aca8b8c62
[MoE] Enable Shared/Routed Overlap For Latent MoE (Nemotron-H) (#32790)
...
Signed-off-by: dafrimi <dafrimi@nvidia.com>
2026-02-02 09:18:50 -05:00
Rabi Mishra
9eb58f8cf1
fix[ROCm]: Remove unconditional aiter import (#32902)
...
Signed-off-by: rabi <ramishra@redhat.com>
2026-02-02 22:10:02 +08:00
Cyrus Leung
b10d05b8a8
[Model] Use explicit types in get_generation_prompt (#33551)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-02-02 12:38:49 +00:00
Borushiki
b398e5c819
Update get_expert_mapping to include self parameter (#33525)
...
Signed-off-by: Borushiki <38628261+Otsutsukii@users.noreply.github.com>
2026-02-02 20:29:07 +08:00
Grzegorz K. Karch
78061ef584
Fix accessing hidden_act from model config (#32686)
...
Signed-off-by: Grzegorz Karch <gkarch@nvidia.com>
2026-02-02 11:11:33 +00:00
Nicolò Lucchesi
528b3076af
[CI][Bugfix] Fix flaky tests/v1/kv_connector/unit/test_multi_connector.py::test_multi_example_connector_consistency (#33555)
...
Signed-off-by: NickLucche <nlucches@redhat.com>
2026-02-02 03:01:29 -08:00
Cyrus Leung
a502831d36
[Chore] Remove redundant input parsing methods (#33542)
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
2026-02-02 10:50:47 +00:00
Komal Kumar Teru
ba871fb788
[Misc] support arbitrary MM datasets in spec dec bench (#33486)
...
Signed-off-by: kkt-cohere <komal@cohere.com>
Signed-off-by: Komal Kumar Teru <162363718+kkt-cohere@users.noreply.github.com>
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>
2026-02-02 08:49:48 +00:00
R3hankhan
ab374786c7
[CPU][IBM Z][Dockerfile] Fix IBM Z builds (#33243)
...
Signed-off-by: Rehan Khan <Rehan.Khan7@ibm.com>
2026-02-01 23:41:29 -08:00
RED
808dd87b30
[Model] Support DeepSeek-OCR-2 (#33165)
...
Signed-off-by: liuli <ll407707@alibaba-inc.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: liuli <ll407707@alibaba-inc.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2026-02-02 06:24:10 +00:00
Andy Lo
beb8899482
Fix mistral sliding window parsing (#33521)
...
Signed-off-by: Andy Lo <andy@mistral.ai>
2026-02-02 05:08:04 +00:00
Sawyer Bowerman
ce88756b96
[Doc]: update paths for Offline/Online/Others example sections (#33494)
...
Signed-off-by: Sawyer Bowerman <sbowerma@redhat.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-02-02 03:56:53 +00:00
Paco Xu
a3154a6092
[Doc] add missing model entries in supported_models.md (#33220)
...
Signed-off-by: Paco Xu <paco.xu@daocloud.io>
2026-02-02 03:37:25 +00:00
jack
7c036432fc
[Bugfix] GLM-4 tool parser: incremental string streaming (#33218)
...
Signed-off-by: QwertyJack <7554089+QwertyJack@users.noreply.github.com>
Co-authored-by: QwertyJack <7554089+QwertyJack@users.noreply.github.com>
2026-02-02 11:13:31 +08:00
Robert Shaw
318b120766
[Nightly CI] Remove CT Model (#33530)
...
Signed-off-by: Robert Shaw <robshaw@redhat.com>
Co-authored-by: Robert Shaw <robshaw@redhat.com>
2026-02-01 19:09:09 -08:00