Isotr0py
3bb4e4311c
[Models] Fuse Qwen3.5 GDN's qkvz_proj and ba_proj ( #34492 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-02-16 07:32:51 -08:00
Amr Mahdi
08f8c198ae
[CI] Disable precompiled wheel path in CI image builds ( #34606 )
...
Signed-off-by: Amr Mahdi <amrmahdi@meta.com >
2026-02-16 15:14:43 +00:00
Harry Mellor
a21cedf4ff
Bump lm-eval version for Transformers v5 compatibility ( #33994 )
...
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com >
2026-02-16 05:24:35 -08:00
emricksini-h
3ef74cde5d
[CI][Tracing] Fix race condition by adding server readiness check ( #34364 )
...
Attempt to resolve #34284 : "Metrics Tracing (2GPU)" fails with a
segmentation fault.
Signed-off-by: emricksini-h <emrick.birivoutin@hcompany.ai >
2026-02-16 12:57:39 +00:00
Ekagra Ranjan
cd81cdb399
[Scheduler][ASR] Fix CrossAttn blocks per-request for Variable length encoder inputs ( #31058 )
...
Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com >
Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com >
2026-02-16 11:08:44 +00:00
Andreas Karatzas
1e828573b4
[CI][Metrics] Stabilize tests with polling and subprocess guards ( #34566 )
...
test_abort_metrics_reset is flaky due to hardware-dependent
fixed sleeps: replace fixed sleeps with polling.
test_metrics_exist_run_batch passes even when the engine crashes
on startup (false positive): add subprocess lifecycle guards.
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-16 10:52:02 +00:00
Samu Tamminen
a5ccc85c8c
[Bugfix] Fix Dynamo unexpected keyword argument ( #34320 )
...
Signed-off-by: Samu Tamminen <stammine@amd.com >
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com >
2026-02-16 01:32:30 -08:00
Roger Wang
b5475d0534
Revert "[Misc] fix qwen3.5 config" ( #34610 )
2026-02-16 01:06:05 -08:00
JJJYmmm
9521002f0a
[Misc] fix qwen3.5 config ( #34604 )
2026-02-16 00:25:38 -08:00
Cyrus Leung
ec17bdd894
[Renderer] Move InputPreprocessor into Renderer (1.5/2) ( #34598 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
2026-02-15 23:46:33 -08:00
Amr Mahdi
bb59c90248
[CI] Write bake config to temp directory instead of repo root ( #34569 )
...
Signed-off-by: Amr Mahdi <amrmahdi@meta.com >
2026-02-15 22:15:47 -08:00
bnellnm
5bff999d12
[Bugfix] Add method to swap quant_method on FusedMoE to fix LoRA issues ( #34453 )
...
Signed-off-by: Bill Nell <bnell@redhat.com >
2026-02-15 20:10:50 -08:00
Lucas Wilkinson
bb85929aa6
[BugFix] Fix Python 3.13 FlashMLA import error ( #34548 )
...
Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com >
Co-authored-by: Jee Jee Li <pandaleefree@gmail.com >
2026-02-15 20:09:18 -08:00
Parth Bansal
5653021094
[Doc] Add Mistral-7b-v0.3 model to the batch invariance validated model ( #34584 )
...
Signed-off-by: Parth Bansal <parthbansal127@gmail.com >
2026-02-16 12:09:00 +08:00
Andreas Karatzas
974d829b05
[CI][Frontend] Return 422 instead of 500 for invalid Anthropic tool_choice ( #34590 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-15 20:06:48 -08:00
Isotr0py
91ac5d9bfd
[CI/Build] Enable tests for recent day-0 new models ( #34585 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-02-15 18:17:04 -08:00
Luka Govedič
23d825aba1
[torch.compile] Disable ar-rms fusion for ds3-fp4 & DP, fix CI test ( #34392 )
...
Signed-off-by: Luka Govedič <lgovedic@redhat.com >
Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com >
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-02-15 06:33:57 -08:00
Maryam Tahhan
f07a128413
[CPU][ARM] Add ARM BF16 cross-compilation support and improve documen… ( #33079 )
...
Signed-off-by: Maryam Tahhan <mtahhan@redhat.com >
Co-authored-by: Li, Jiang <jiang1.li@intel.com >
2026-02-15 06:33:08 -08:00
Isotr0py
71cd89264f
[MM Encoder] Add Triton ViT attention backend ( #32183 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-02-15 06:32:47 -08:00
Isotr0py
19fab44152
[Doc] Update Encoder-Decoder models support doc with Florence-2 ( #34581 )
...
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-02-15 04:18:57 -08:00
Seiji Eicher
79c7e09235
[KV Connector] Add temporary, off-by-default VLLM_DISABLE_REQUEST_ID_RANDOMIZATION workaround ( #34415 )
...
Signed-off-by: Seiji Eicher <seiji@anyscale.com >
2026-02-14 23:26:10 -08:00
haosdent
79f3fab05a
[Bugfix] Handle num_expert_group=None in flashinfer block-scale FP8 MoE ( #34494 )
...
Signed-off-by: haosdent <haosdent@gmail.com >
2026-02-14 23:25:46 -08:00
Vadim Gimpelson
604b9eaec5
[BUGFIX] Fix accuracy regression for NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4 with TP>1 ( #34476 )
...
Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com >
2026-02-14 23:25:17 -08:00
Stanislav Kirillov
50dbd6c9e6
[bugfix] Fix critical bug when reporting for all paths where handler.create_error_response is used ( #34516 )
...
Signed-off-by: Stanislav Kirillov <stas@nebius.com >
Co-authored-by: Stanislav Kirillov <stas@nebius.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2026-02-14 23:24:25 -08:00
Andreas Karatzas
98bcc6ca59
[CI][Entrypoints] Validate detokenize token IDs to prevent int64 overflow causing 500 ( #34468 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-14 23:08:38 -08:00
Andreas Karatzas
f13e86d8dd
[Kernels] Fix Helion GPU utils to use platform-agnostic device name API ( #34537 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-14 20:29:23 -08:00
Woosuk Kwon
9ca768c740
[Model Runner V2] Minor cleanup for Sampler ( #34563 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
2026-02-14 18:29:03 -08:00
Thomas Parnell
d5fe3f702c
[Hybrid] Enable mamba prefix cache "align" mode with async scheduling ( #33997 )
...
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com >
2026-02-14 13:15:56 -08:00
Cyrus Leung
73391a1baa
[Renderer] Move InputPreprocessor into Renderer (1/2) ( #34510 )
...
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk >
Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2026-02-14 10:14:21 -08:00
Andreas Karatzas
b3c14229b0
[ROCm][CI] Guard sparse MLA backend imports for ROCm compatibility in tests ( #34538 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-14 07:32:09 -08:00
Roger Wang
2f186635cb
[Bugfix] Fix Qwen3.5 config loading ( #34554 )
...
Signed-off-by: Roger Wang <hey@rogerw.io >
2026-02-14 03:56:11 -08:00
Christian Pinto
342a7cda2d
[Misc] Update tests and examples for Prithvi/Terratorch models ( #34416 )
...
Signed-off-by: Christian Pinto <christian.pinto@ibm.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2026-02-13 23:03:51 -08:00
Kata Coder
d1ea65d0a1
[new model] add COLQwen3 code & Inference ( #34398 )
...
Signed-off-by: craftsangjae <craftsangjae@gmail.com >
Signed-off-by: katacoder <craftsangjae@gmail.com >
2026-02-14 12:15:19 +08:00
Andreas Karatzas
de42abb366
[CI] Heavy refactoring of Voxtral multimodal audio model tests ( #34294 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-13 20:04:29 -08:00
Julien Denize
60ca7981bc
Add explicit validation error for tool calls. ( #34438 )
...
Signed-off-by: juliendenize <julien.denize@mistral.ai >
2026-02-13 20:04:01 -08:00
Christian S. Perone
0ef5b9147b
fix: use __annotations__ instead of get_type_hints() for dynamic kwargs detection ( #34527 )
...
Signed-off-by: Christian S. Perone <christian.perone@gmail.com >
Signed-off-by: Christian S. Perone <perone@users.noreply.github.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
2026-02-13 20:03:37 -08:00
Shiyan Deng
ed242652d7
[bug] Make sure get_modality_with_max_tokens is deterministic ( #34533 )
...
Signed-off-by: Shiyan Deng <dsy842974287@meta.com >
2026-02-13 20:02:59 -08:00
Wei Zhao
b37b679770
[Feature][Perf] Support Selective CPU Weight Offloading ( #34535 )
...
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com >
2026-02-13 20:02:24 -08:00
Andreas Karatzas
a0638d052d
[Bugfix] Fix ROCm UVA CPU weight offloading broken by #32993 ( #34543 )
...
Signed-off-by: Andreas Karatzas <akaratza@amd.com >
2026-02-13 20:01:42 -08:00
Harry Huang
c027541eaf
[Hybrid] Enable spec decoding in mamba cache align mode ( #33705 )
...
Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com >
2026-02-13 13:02:28 -08:00
Ben Browning
fd267bc7b7
[Bugfix]: Fix structured output in multi-turn gpt-oss ( #34454 )
...
Signed-off-by: Ben Browning <bbrownin@redhat.com >
Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com >
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk >
2026-02-13 11:12:48 -08:00
Michael Goin
bfaa559305
Revert "[Bugfix] Fix fused MoE IMA (sans chunking) by using int64 for strides" ( #34530 )
2026-02-13 10:35:29 -08:00
Richard Zou
87789c8364
[Misc] vLLM's --enforce-eager should turn off compile and cudagraphs only ( #34523 )
...
Signed-off-by: Richard Zou <zou3519@gmail.com >
2026-02-13 09:52:20 -08:00
Pushpinder Singh
bcd65c1f6a
[Bugfix] Replace c10::optional with std::optional in topk kernel ( #34467 )
...
Signed-off-by: Pushpinder Singh <pushpindersingh135@gmail.com >
2026-02-13 08:30:23 -08:00
Wei Zhao
59d53066d8
[Feature] Support CPU Offloading without Pytorch Pinned Memory that leads to doubled allocation ( #32993 )
...
Signed-off-by: wzhao18 <wzhao18.sz@gmail.com >
Co-authored-by: Michael Goin <mgoin64@gmail.com >
2026-02-13 08:11:26 -08:00
LoganJane
4a9952ec1b
[Bugfix] Add quant_config in ViT of Kimi-K2.5 ( #34501 )
...
Signed-off-by: LoganJane <LoganJane73@hotmail.com >
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn >
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn >
2026-02-13 16:05:34 +00:00
Roger Wang
1dae7b7843
[Bugfix] Exclude language_model_only key from MM AOT compile hash but include in model one ( #34508 )
...
Signed-off-by: Roger Wang <hey@rogerw.io >
2026-02-13 13:59:00 +00:00
Roger Wang
5885e330ef
[Misc] Port Qwen3.5 Configs ( #34512 )
...
Signed-off-by: Roger Wang <hey@rogerw.io >
2026-02-13 05:24:25 -08:00
Ilya Boytsov
071d863e20
Extend ColBERT support to non-standard BERT backbones ( #34170 )
...
Signed-off-by: Ilya Boytsov <ilya.boytsov@aleph-alpha.com >
2026-02-13 09:53:09 +00:00
Woosuk Kwon
0916e7960b
[GDN] Use CPU tensors to build GDN metadata ( #34498 )
...
Signed-off-by: Woosuk Kwon <woosuk@inferact.ai >
2026-02-13 01:24:45 -08:00