biondizzle/vllm - vllm

Author	SHA1	Message	Date
Isotr0py	3bb4e4311c	[Models] Fuse Qwen3.5 GDN's qkvz_proj and ba_proj (#34492 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-02-16 07:32:51 -08:00
Amr Mahdi	08f8c198ae	[CI] Disable precompiled wheel path in CI image builds (#34606 ) Signed-off-by: Amr Mahdi <amrmahdi@meta.com>	2026-02-16 15:14:43 +00:00
Harry Mellor	a21cedf4ff	Bump `lm-eval` version for Transformers v5 compatibility (#33994 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-02-16 05:24:35 -08:00
emricksini-h	3ef74cde5d	[CI][Tracing] Fix race condition by adding server readiness check (#34364 ) Attempt to resolve #34284: "Metrics Tracing (2GPU)" fails with a segmentation fault. Signed-off-by: emricksini-h <emrick.birivoutin@hcompany.ai>	2026-02-16 12:57:39 +00:00
Ekagra Ranjan	cd81cdb399	[Scheduler][ASR] Fix CrossAttn blocks per-request for Variable length encoder inputs (#31058 ) Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com> Co-authored-by: Nicolò Lucchesi <nlucches@redhat.com>	2026-02-16 11:08:44 +00:00
Andreas Karatzas	1e828573b4	[CI][Metrics] Stabilize tests with polling and subprocess guards (#34566 ) test_abort_metrics_reset is flaky due to hardware-dependent fixed sleeps: replace fixed sleeps with polling. test_metrics_exist_run_batch passes even when the engine crashes on startup (false positive): add subprocess lifecycle guards. Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-02-16 10:52:02 +00:00
Samu Tamminen	a5ccc85c8c	[Bugfix] Fix Dynamo unexpected keyword argument (#34320 ) Signed-off-by: Samu Tamminen <stammine@amd.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2026-02-16 01:32:30 -08:00
Roger Wang	b5475d0534	Revert "[Misc] fix qwen3.5 config" (#34610 )	2026-02-16 01:06:05 -08:00
JJJYmmm	9521002f0a	[Misc] fix qwen3.5 config (#34604 )	2026-02-16 00:25:38 -08:00
Cyrus Leung	ec17bdd894	[Renderer] Move InputPreprocessor into Renderer (1.5/2) (#34598 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-02-15 23:46:33 -08:00
Amr Mahdi	bb59c90248	[CI] Write bake config to temp directory instead of repo root (#34569 ) Signed-off-by: Amr Mahdi <amrmahdi@meta.com>	2026-02-15 22:15:47 -08:00
bnellnm	5bff999d12	[Bugfix] Add method to swap quant_method on FusedMoE to fix LoRA issues (#34453 ) Signed-off-by: Bill Nell <bnell@redhat.com>	2026-02-15 20:10:50 -08:00
Lucas Wilkinson	bb85929aa6	[BugFix] Fix Python 3.13 FlashMLA import error (#34548 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2026-02-15 20:09:18 -08:00
Parth Bansal	5653021094	[Doc] Add Mistral-7b-v0.3 model to the batch invariance validated model (#34584 ) Signed-off-by: Parth Bansal <parthbansal127@gmail.com>	2026-02-16 12:09:00 +08:00
Andreas Karatzas	974d829b05	[CI][Frontend] Return 422 instead of 500 for invalid Anthropic tool_choice (#34590 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-02-15 20:06:48 -08:00
Isotr0py	91ac5d9bfd	[CI/Build] Enable tests for recent day-0 new models (#34585 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-02-15 18:17:04 -08:00
Luka Govedič	23d825aba1	[torch.compile] Disable ar-rms fusion for ds3-fp4 & DP, fix CI test (#34392 ) Signed-off-by: Luka Govedič <lgovedic@redhat.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-02-15 06:33:57 -08:00
Maryam Tahhan	f07a128413	[CPU][ARM] Add ARM BF16 cross-compilation support and improve documen… (#33079 ) Signed-off-by: Maryam Tahhan <mtahhan@redhat.com> Co-authored-by: Li, Jiang <jiang1.li@intel.com>	2026-02-15 06:33:08 -08:00
Isotr0py	71cd89264f	[MM Encoder] Add Triton ViT attention backend (#32183 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-02-15 06:32:47 -08:00
Isotr0py	19fab44152	[Doc] Update Encoder-Decoder models support doc with Florence-2 (#34581 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-02-15 04:18:57 -08:00
Seiji Eicher	79c7e09235	[KV Connector] Add temporary, off-by-default `VLLM_DISABLE_REQUEST_ID_RANDOMIZATION` workaround (#34415 ) Signed-off-by: Seiji Eicher <seiji@anyscale.com>	2026-02-14 23:26:10 -08:00
haosdent	79f3fab05a	[Bugfix] Handle num_expert_group=None in flashinfer block-scale FP8 MoE (#34494 ) Signed-off-by: haosdent <haosdent@gmail.com>	2026-02-14 23:25:46 -08:00
Vadim Gimpelson	604b9eaec5	[BUGFIX] Fix accuracy regression for NVIDIA-Nemotron-3-Nano-30B-A3B-NVFP4 with TP>1 (#34476 ) Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>	2026-02-14 23:25:17 -08:00
Stanislav Kirillov	50dbd6c9e6	[bugfix] Fix critical bug when reporting for all paths where handler.create_error_response is used (#34516 ) Signed-off-by: Stanislav Kirillov <stas@nebius.com> Co-authored-by: Stanislav Kirillov <stas@nebius.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2026-02-14 23:24:25 -08:00
Andreas Karatzas	98bcc6ca59	[CI][Entrypoints] Validate detokenize token IDs to prevent int64 overflow causing 500 (#34468 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-02-14 23:08:38 -08:00
Andreas Karatzas	f13e86d8dd	[Kernels] Fix Helion GPU utils to use platform-agnostic device name API (#34537 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-02-14 20:29:23 -08:00
Woosuk Kwon	9ca768c740	[Model Runner V2] Minor cleanup for Sampler (#34563 ) Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>	2026-02-14 18:29:03 -08:00
Thomas Parnell	d5fe3f702c	[Hybrid] Enable mamba prefix cache "align" mode with async scheduling (#33997 ) Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>	2026-02-14 13:15:56 -08:00
Cyrus Leung	73391a1baa	[Renderer] Move InputPreprocessor into Renderer (1/2) (#34510 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2026-02-14 10:14:21 -08:00
Andreas Karatzas	b3c14229b0	[ROCm][CI] Guard sparse MLA backend imports for ROCm compatibility in tests (#34538 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-02-14 07:32:09 -08:00
Roger Wang	2f186635cb	[Bugfix] Fix Qwen3.5 config loading (#34554 ) Signed-off-by: Roger Wang <hey@rogerw.io>	2026-02-14 03:56:11 -08:00
Christian Pinto	342a7cda2d	[Misc] Update tests and examples for Prithvi/Terratorch models (#34416 ) Signed-off-by: Christian Pinto <christian.pinto@ibm.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2026-02-13 23:03:51 -08:00
Kata Coder	d1ea65d0a1	[new model] add COLQwen3 code & Inference (#34398 ) Signed-off-by: craftsangjae <craftsangjae@gmail.com> Signed-off-by: katacoder <craftsangjae@gmail.com>	2026-02-14 12:15:19 +08:00
Andreas Karatzas	de42abb366	[CI] Heavy refactoring of Voxtral multimodal audio model tests (#34294 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-02-13 20:04:29 -08:00
Julien Denize	60ca7981bc	Add explicit validation error for tool calls. (#34438 ) Signed-off-by: juliendenize <julien.denize@mistral.ai>	2026-02-13 20:04:01 -08:00
Christian S. Perone	0ef5b9147b	fix: use `__annotations__` instead of `get_type_hints()` for dynamic `kwargs` detection (#34527 ) Signed-off-by: Christian S. Perone <christian.perone@gmail.com> Signed-off-by: Christian S. Perone <perone@users.noreply.github.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2026-02-13 20:03:37 -08:00
Shiyan Deng	ed242652d7	[bug] Make sure get_modality_with_max_tokens is deterministic (#34533 ) Signed-off-by: Shiyan Deng <dsy842974287@meta.com>	2026-02-13 20:02:59 -08:00
Wei Zhao	b37b679770	[Feature][Perf] Support Selective CPU Weight Offloading (#34535 ) Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>	2026-02-13 20:02:24 -08:00
Andreas Karatzas	a0638d052d	[Bugfix] Fix ROCm UVA CPU weight offloading broken by #32993 (#34543 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-02-13 20:01:42 -08:00
Harry Huang	c027541eaf	[Hybrid] Enable spec decoding in mamba cache align mode (#33705 ) Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com>	2026-02-13 13:02:28 -08:00
Ben Browning	fd267bc7b7	[Bugfix]: Fix structured output in multi-turn gpt-oss (#34454 ) Signed-off-by: Ben Browning <bbrownin@redhat.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2026-02-13 11:12:48 -08:00
Michael Goin	bfaa559305	Revert "[Bugfix] Fix fused MoE IMA (sans chunking) by using int64 for strides" (#34530 )	2026-02-13 10:35:29 -08:00
Richard Zou	87789c8364	[Misc] vLLM's --enforce-eager should turn off compile and cudagraphs only (#34523 ) Signed-off-by: Richard Zou <zou3519@gmail.com>	2026-02-13 09:52:20 -08:00
Pushpinder Singh	bcd65c1f6a	[Bugfix] Replace c10::optional with std::optional in topk kernel (#34467 ) Signed-off-by: Pushpinder Singh <pushpindersingh135@gmail.com>	2026-02-13 08:30:23 -08:00
Wei Zhao	59d53066d8	[Feature] Support CPU Offloading without Pytorch Pinned Memory that leads to doubled allocation (#32993 ) Signed-off-by: wzhao18 <wzhao18.sz@gmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2026-02-13 08:11:26 -08:00
LoganJane	4a9952ec1b	[Bugfix] Add quant_config in ViT of Kimi-K2.5 (#34501 ) Signed-off-by: LoganJane <LoganJane73@hotmail.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-02-13 16:05:34 +00:00
Roger Wang	1dae7b7843	[Bugfix] Exclude `language_model_only` key from MM AOT compile hash but include in model one (#34508 ) Signed-off-by: Roger Wang <hey@rogerw.io>	2026-02-13 13:59:00 +00:00
Roger Wang	5885e330ef	[Misc] Port Qwen3.5 Configs (#34512 ) Signed-off-by: Roger Wang <hey@rogerw.io>	2026-02-13 05:24:25 -08:00
Ilya Boytsov	071d863e20	Extend ColBERT support to non-standard BERT backbones (#34170 ) Signed-off-by: Ilya Boytsov <ilya.boytsov@aleph-alpha.com>	2026-02-13 09:53:09 +00:00
Woosuk Kwon	0916e7960b	[GDN] Use CPU tensors to build GDN metadata (#34498 ) Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>	2026-02-13 01:24:45 -08:00

13960 Commits