biondizzle/vllm - vllm - Gitea

Author	SHA1	Message	Date
Andreas Karatzas	f2b6dfd237	[ROCm][CI] Fix language generation test accuracy by disabling HF flash_sdp and mem_efficient_sdp (#31597 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-01-05 02:17:05 +00:00
Andreas Karatzas	89f1f25310	[CI] Skip Phi-MoE test due to old API util (#31632 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-01-05 08:52:07 +08:00
Nick Hill	b53b89fdb3	[BugFix] Async scheduling: handle model forward errors more cleanly (#31611 ) Signed-off-by: njhill <nickhill123@gmail.com>	2026-01-04 11:04:37 -08:00
Ning Xie	6522721d17	[misc] Sort uvicorn log level description according to verbosity (#31137 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2026-01-04 18:45:37 +00:00
Yuxuan Zhang	0d4044edd8	fix no think of GLM-4.5 / GLM-4.7 (#31449 ) Signed-off-by: zRzRzRzRzRzRzR <2448370773@qq.com>	2026-01-04 11:43:00 +08:00
Reagan Lee	41ab179738	[Docs] Fix argparse include path for mm-processor benchmark (#31654 ) Signed-off-by: Reagan <reaganjlee@gmail.com>	2026-01-04 03:31:29 +00:00
Robert Shaw	268b1c55ad	[MoE Refactor][13/N] Convert FI to Use PFNoEP (#31533 ) Signed-off-by: Robert Shaw <robshaw@redhat.com> Signed-off-by: Robert Shaw <robertgshaw2@gmail.com> Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com> Co-authored-by: Robert Shaw <robshaw@redhat.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>	2026-01-03 12:26:36 -08:00
Andreas Karatzas	4f9ce35afe	[CI][Bugfix] Fix token counting in chunked prefill compl test (#31630 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-01-03 14:28:49 +08:00
jeremyteboul	97a01308e9	Improve HF qwen3_omni: preserve audio_sample_rate in kwargs restructuring (#29255 ) Signed-off-by: Jeremy Teboul <jeremyteboul@fb.com> Co-authored-by: Jeremy Teboul <jeremyteboul@fb.com>	2026-01-03 04:31:09 +00:00
Xingyu Liu	0eee877f67	[Core] Parse vLLM engine required fields from hf_config to model_arch_config (#28454 ) Signed-off-by: Xingyu Liu <charlotteliu12x@gmail.com> Signed-off-by: Xingyu Liu <38244988+charlotte12l@users.noreply.github.com>	2026-01-02 15:13:15 -08:00
Alfred	a0e9ee83c7	[Benchmark] Fix OOM during MoE kernel tuning for large models (#31604 ) Signed-off-by: Alfred <massif0601@gmail.com>	2026-01-02 22:24:51 +00:00
Yongye Zhu	a3f2f40947	[MoE Refactor] Explicit construct mk for flashinfer bf16 kernel (#31504 ) Signed-off-by: Yongye Zhu <zyy1102000@gmail.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2026-01-02 13:54:50 -08:00
Yongye Zhu	5a468ff7c7	[MoE Refactor] Split `invoke_fused_moe_kernel` (#31050 ) Signed-off-by: Yongye Zhu <zyy1102000@gmail.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2026-01-02 13:47:15 -08:00
Andreas Karatzas	6ef770df7c	[MoE] Fix output_shape calculation in Attention layer to handle 3D query inputs (#31596 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-01-02 15:46:23 +00:00
Nick Hill	bd877162eb	[BugFix] Support online dense model DP without overhead (#30739 ) Signed-off-by: Nick Hill <nhill@redhat.com> Signed-off-by: njhill <nickhill123@gmail.com>	2026-01-02 23:36:38 +08:00
Xinyu Chen	08f425bad1	CustomOp: test forward dispatch for grouped_topk (#31530 ) Signed-off-by: Xinyu Chen <xinyu1.chen@intel.com>	2026-01-02 10:04:01 -05:00
labAxiaoming	a01f2faedf	Add multimodal input method in the documentation (#31601 ) Signed-off-by: xiaoming <1259730330@qq.com>	2026-01-02 12:43:30 +00:00
Kyuyeun Kim	cc410e8644	[Bugfix] Fix weight_loader v1 block scale (#31103 ) Signed-off-by: Kyuyeun Kim <kyuyeunk@google.com>	2026-01-02 13:14:10 +08:00
Kevin McKay	825c2dc133	[Bugfix][Hardware][AMD] Fix last_page_len calculation in AITER MLA decode (#31282 ) Signed-off-by: c0de128 <kevin.mckay@outlook.com>	2026-01-01 21:14:00 -08:00
Vaibhav Sourirajan	1f43c121d5	Remove unused `use_marlin` variable in `Mxfp4MoEMethod` (#31549 ) Signed-off-by: vaibhav sourirajan <vs2787@columbia.edu>	2026-01-01 21:13:36 -08:00
Tmn07	ca179d0f64	[Bugfix] Fix activation quantization for compressed-tensors W4A16 (#31572 ) Signed-off-by: Tmn07 <tmn0796@gmail.com>	2026-01-01 21:13:22 -08:00
Andreas Karatzas	013b54088c	[ROCm][CI] Fix ModernBERT token classification test (#31612 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-01-02 04:19:08 +00:00
Jay Hemnani	5ac55eb30f	[Model] Enable LoRA support for tower and connector in LLaVA (#31513 ) Signed-off-by: Jay Hemnani <jayhemnani9910@gmail.com> Co-authored-by: Jay Hemnani <jayhemnani9910@gmail.com> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2026-01-01 19:32:39 -08:00
Benjamin Chislett	ea53ca5e85	[Bugfix] Fix block size used in EAGLE slot mapping (#31540 ) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>	2026-01-01 19:32:30 -08:00
zhima771	27864a851c	feat: support LoRA for DeepSeek-OCR(Language Model part) (#31569 ) Signed-off-by: zhima771 <15836938703@163.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com>	2026-01-01 19:32:11 -08:00
Andreas Karatzas	5cc4876630	[ROCm][CI] Fix failure in Language Models Tests (Extra Standard) by reducing agent pool size (#31553 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-01-01 19:29:42 -08:00
Kevin McKay	5fff44064b	[Bugfix] Replace BaseException with specific exceptions in FLA utils (#31590 ) Signed-off-by: c0de128 <kevin.mckay@outlook.com>	2026-01-01 19:27:54 -08:00
Reagan Lee	1f5b7c41c3	Add Multimodal Processor Benchmark (#29105 ) Signed-off-by: Reagan Lee <reaganjlee@gmail.com> Signed-off-by: Reagan <reaganjlee@gmail.com>	2026-01-01 19:26:53 -08:00
Ekagra Ranjan	adcf682fc7	[Audio] Improve Audio Inference Scripts (offline/online) (#29279 ) Signed-off-by: Ekagra Ranjan <3116519+ekagra-ranjan@users.noreply.github.com>	2025-12-31 23:34:18 +00:00
Andreas Karatzas	21de6d4b02	[CI][Bugfix] Fix token counting in chunked prefill streaming test (#31565 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2025-12-31 23:05:14 +00:00
Nick Hill	6c2cfb62ff	[BugFix] Fix async scheduling for pooling models (#31584 ) Signed-off-by: njhill <nickhill123@gmail.com>	2025-12-31 14:48:51 -08:00
Fanjiang Ye	d8da76f3b7	[Bugfix] Fix BAGEL online serving for text and image understanding (#31546 ) Signed-off-by: Dylan1229 <yvanphys@gmail.com> Signed-off-by: UED <zxr3611244710@gmail.com> Signed-off-by: mr-ye-cao <yecaoyc2019@gmail.com> Co-authored-by: UED <zxr3611244710@gmail.com> Co-authored-by: mr-ye-cao <yecaoyc2019@gmail.com> Co-authored-by: Mr-Ye-Cao <60802056+Mr-Ye-Cao@users.noreply.github.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-12-31 14:46:10 -08:00
baonudesifeizhai	d722e9e614	Add GLM-ASR multimodal support (#31436 ) Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com> Signed-off-by: baonudesifeizhai <85092850+baonudesifeizhai@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-12-31 23:12:24 +08:00
Andreas Karatzas	cf16342d43	[ROCm][CI] Update MiniCPM model test: MiniCPM3-4B to MiniCPM4.1-8B and simplify attention backend testing (#31551 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2025-12-31 00:12:01 -08:00
Wentao Ye	357d435c54	[Bug] Fix log issue with `\n` (#31390 ) Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>	2025-12-30 21:16:55 -08:00
danisereb	108a2728f7	Add get_expert_mapping to NemotronHModel (for LoRA support) (#31539 ) Signed-off-by: Daniel Serebrenik <daserebrenik@nvidia.com>	2025-12-30 21:09:03 -08:00
TJian	578c8f51f6	[CI] [Critical] [CUDA] Fix duplicated test name (#31562 ) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-12-30 21:01:09 -08:00
maang-h	b4bb5f312f	[Core] Remove unused `num_tokens` parameter from `_init_model_kwargs` (#31517 ) Signed-off-by: maang <maang_h@163.com>	2025-12-30 20:47:23 -08:00
SameerAsal	70e1acefcd	[BugFix] Fix NUMA node validation in CPU platform (#31520 ) Signed-off-by: SameerAsal <SameerAsal@users.noreply.github.com> Co-authored-by: SameerAsal <SameerAsal@users.noreply.github.com>	2025-12-31 04:06:49 +00:00
Qiu	84f6cd741b	[Mics] add pcp basic support to MoE model (#31003 )	2025-12-30 20:01:29 -08:00
B-201	ecd49ce7e6	[Fix] Align fused moe lora_b shape with peft (#31534 ) Signed-off-by: bk-201 <joy25810@foxmail.com>	2025-12-31 09:44:59 +08:00
Amr Mahdi	e1ee11b2a5	Add docker buildx bake configuration (#31477 ) Signed-off-by: Amr Mahdi <amrmahdi@meta.com>	2025-12-31 01:08:54 +00:00
vintipandey	04147dcfa7	[Bugfix]Fix pooling model always disabled due to incorrect PP rank check (#31505 ) Signed-off-by: vintipandey <vinti.pandey@gmail.com>	2025-12-30 11:27:10 -08:00
JartX	07728bf5cd	[BugFix] add select_gemm_impl on CompressedTensorsWNA16MoEMethod to support LoRA (#31453 ) Signed-off-by: JartX <sagformas@epdcenter.es>	2025-12-30 11:20:15 -08:00
yt0428	3f52fa5aa2	[Model] Add support for openPangu moe model (#28775 ) Signed-off-by: yuantao <2422264527@qq.com> Signed-off-by: yt0428 <51468697+yt0428@users.noreply.github.com> Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>	2025-12-30 08:11:38 -08:00
Li, Jiang	7157596103	[CPU] Disable async schedule on CPU (#31525 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2025-12-30 12:34:08 +00:00
Nicolò Lucchesi	ab1af6aa3e	[CI][NIXL] Split DPEP tests (#31491 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2025-12-30 07:26:12 -05:00
Pleaplusone	1a834df2d4	[ROCm][Bugfix] Fix accuracy issue on fmoe when `VLLM_ROCM_USE_AITER_FUSION_SHARED_EXPERTS` enabled (#31523 ) Signed-off-by: ganyi <ygan@amd.com>	2025-12-30 09:21:49 +00:00
Kevin	51085c2aeb	[Frontend] add continue_final_message parameter to /embeddings endpoint (#31497 ) Signed-off-by: Kevin P-W <140451262+kevin-pw@users.noreply.github.com>	2025-12-30 07:21:13 +00:00
Roger Feng	3d973764ce	[xpu] [bugfix] upgrade to latest oneccl in dockerfile (#31522 ) Signed-off-by: roger feng <roger.feng@intel.com>	2025-12-30 14:52:28 +08:00

12630 Commits