biondizzle/vllm - vllm

Author	SHA1	Message	Date
Harry Mellor	e5d96dc8fc	Fix `SpeculatorsConfig` now that `PreTrainedConfig` is a `dataclass` in Transformers (#37574 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-19 18:04:40 +00:00
EdalatiAli	daa05bf340	[Bugfix] Fix AttributeError when serving MXFP8 models with DeepGEMM installed (#37358 ) Signed-off-by: EdalatiAli <aliedalati@cohere.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-03-19 17:58:33 +00:00
Lucas Kabela	7769b58307	[torch.compile][BE][Multimodal] Remove requirement to set_model_tag to avoid cache conflict (#37345 ) Signed-off-by: Lucas Kabela <lucaskabela@meta.com>	2026-03-19 17:26:12 +00:00
Chauncey	2f9f946b22	[P/D] AnthropicMessages add kv_transfer_params for PD disaggregation (#37535 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2026-03-19 16:41:20 +00:00
Fadi Arafeh	2890aecce5	[CPU][UX] Do not crash when tcmalloc/libiomp are not ldpreloaded (#37561 ) Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>	2026-03-19 16:35:45 +00:00
Harry Mellor	34f093b417	[CI] Gate pre-commit on `ready` label or number of contributions (#37544 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-19 16:21:57 +00:00
Harry Mellor	4dce8321a9	Run MacOS smoke test on daily `cron` job instead of every commit (#37567 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-19 16:19:50 +00:00
Cyrus Leung	657855ab41	[Misc] Cleanup more configs and processors (#37560 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-03-19 15:45:23 +00:00
Wei Zhao	e27b8ba3d1	[Bug] Fix fp8 trtllm MoE modular kernel supported routing methods (#37346 ) Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>	2026-03-19 11:43:06 -04:00
Woosuk Kwon	40b8363b45	[MRV2] Use fp32 for draft logits (#37526 ) Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>	2026-03-19 08:41:21 -07:00
mikaylagawarecki	8b10e4fb31	[1/n] Migrate permute_cols to libtorch stable ABI (#31509 ) Signed-off-by: Mikayla Gawarecki <mikaylagawarecki@gmail.com>	2026-03-19 11:27:26 -04:00
Ifta khairul Alam Adil	104605cbf2	Remove deprecated reasoning_content message field(part-2) (#37480 ) Signed-off-by: JartX <sagformas@epdcenter.es> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com> Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Philip Ottesen <phiott256@gmail.com> Signed-off-by: Woosuk Kwon <woosuk@inferact.ai> Signed-off-by: Michael Goin <mgoin64@gmail.com> Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai> Signed-off-by: Andy Lo <andy@mistral.ai> Signed-off-by: Thillai Chithambaram <thillaichithambaram.a@gmail.com> Signed-off-by: sihao.li <sihao.li@intel.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: JartX <sagformas@epdcenter.es> Co-authored-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: Philip Ottesen <phiott256@gmail.com> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Giancarlo Delfin <32987265+TheEpicDolphin@users.noreply.github.com> Co-authored-by: Andy Lo <andy@mistral.ai> Co-authored-by: Thillai Chithambaram <79466435+thillai-c@users.noreply.github.com> Co-authored-by: sihao_li <165983188+1643661061leo@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-19 15:20:08 +00:00
Jee Jee Li	96266f119b	[LoRA] Minor improvements to LoRA log (#37557 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2026-03-19 15:18:06 +00:00
Sage Moore	7c0cf3bcd0	Cap the number of API servers to 1 when using Elastic EP. (#37466 ) Signed-off-by: Sage Moore <sage@neuralmagic.com>	2026-03-19 10:42:57 -04:00
Harry Mellor	572b432913	Stop bench CLI from recursively casting all configs to `dict` (#37559 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-19 14:04:03 +00:00
Cyrus Leung	9515c20868	[Misc] Clean up processing logic (#37541 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-03-19 13:30:20 +00:00
DorBernsohn	c63ca2b2e6	[Bugfix] Add Kimi-K2.5 reasoning/tool parser aliases and tool_call_id support (#37438 ) Signed-off-by: DorBernsohn <dor.bernsohn@gmail.com>	2026-03-19 21:08:00 +08:00
Harry Mellor	a32eaf5bb2	[CI] Merge `cleanup_pr_body.yml` and `reminder_comment.yml` (#37552 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-19 12:55:07 +00:00
XueLiang Yang	e390742c59	Fix KV Offloading + MLA AssertionError by using num_kv_heads=1 in cpu… (#37536 ) Signed-off-by: xueliangyang-oeuler <yxl546827391@gmail.com> Co-authored-by: xueliangyang-oeuler <yxl546827391@gmail.com>	2026-03-19 12:05:07 +00:00
Cyrus Leung	7a6ebcbfcf	[Model] Remove unnecessary `get_language_model` (#37545 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-03-19 20:00:36 +08:00
Cyrus Leung	c7bc12c20f	[CI/Build] Split out MM pooling tests (#37542 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-03-19 11:36:11 +00:00
wang.yuqi	f9e2a38386	[Docs] Reorganize pooling docs. (#35592 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io> Signed-off-by: wang.yuqi <noooop@126.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-19 11:25:47 +00:00
Harry Mellor	4426447bba	Don't log `exc_info` when vLLM tries to download a file that doesn't exist (#37458 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-19 10:38:29 +00:00
Li, Jiang	3322e26420	[Bugfix] Avoid more OpenMP thread reallocation in CPU torch compile (#37538 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2026-03-19 10:24:39 +00:00
Cyrus Leung	765e461065	[Bugfix] Fix Nemotron Parse loading (#37407 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-03-19 09:55:29 +00:00
Duyi-Wang	6a9cceb219	[Bugfix][ROCm] Fix MoRI + AITER FP8 dispatch compatibility for defer_input_quant (#37418 ) Signed-off-by: Duyi-Wang <duyi.wang@amd.com>	2026-03-19 09:49:27 +00:00
yassha	199f914183	fix(cpu): add null check for aligned_alloc in ScratchPadManager (#37369 ) Signed-off-by: yassha <50112520+yassha@users.noreply.github.com>	2026-03-19 17:45:06 +08:00
Kunshang Ji	ca21483bf9	[MISC] fix pin_memory=torch.cuda.is_available(), use is_pin_memory_available (#37415 ) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2026-03-19 09:23:24 +00:00
TJian	da70c87e81	[CI] Fix wrong path test file, missing `rlhf_async_new_apis.py` (#37532 ) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>	2026-03-19 02:21:55 -07:00
Collin McCarthy	0b6d52629f	Support temporal compression for Nemotron-3-VL videos (#36808 ) Signed-off-by: Collin McCarthy <cmccarthy@nvidia.com>	2026-03-19 08:02:19 +00:00
Ziming Huang	d3cc379567	[Perf] Fix slow hasattr in CUDAGraphWrapper.__getattr__ (#37425 ) Signed-off-by: 智鸣 <hzm414167@alibaba-inc.com>	2026-03-19 15:43:48 +08:00
cdpath	354cd580d5	fix(anthropic): remove non-standard 'data: [DONE]' from Anthropic streaming (#37510 ) Signed-off-by: cdpath <cdpath@outlook.com>	2026-03-19 07:23:35 +00:00
zhanqiuhu	d49f273144	[SSM/Mamba] Follow-up: N-1 prefill for P/D disaggregation (#37310 )	2026-03-19 08:22:00 +01:00
Flora Feng	b21d384304	[Refactor] Relocate endpoint tests to mirror serving code directory structure (#37504 ) Signed-off-by: sfeng33 <4florafeng@gmail.com>	2026-03-19 07:19:36 +00:00
Hongxia Yang	e3126cd107	[ROCm] issue management - request information for bug issues on ROCm (#37009 ) Signed-off-by: Hongxia Yang <hongxiay.yang@amd.com>	2026-03-19 03:51:29 +00:00
Wentao Ye	e37ff5b5c8	[Perf] Optimize token_embed for pooling models, 1.0% token throughput improvement (#37347 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-03-19 10:27:51 +08:00
Aaron Hao	6accb21f2a	[bug] Fix deadlock with pause resume and collective_rpc (#37024 ) Signed-off-by: hao-aaron <ahao@anyscale.com>	2026-03-19 01:49:02 +00:00
Giancarlo Delfin	053f3b6309	[Model Runner V2] Spec decode rejection sampler logprobs support (#37237 ) Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai>	2026-03-19 01:36:27 +00:00
Aaron Hao	5f82706a21	[BUG] Exclude SKIP_TENSORS from get_layer_size() + new weight sync example for dpep (#37334 ) Signed-off-by: ahao-anyscale <ahao@anyscale.com>	2026-03-19 00:45:10 +00:00
Sage Moore	c32a58cc2a	[EPLB] Simplify EPLB rearrange by only returning one map (#36267 ) Signed-off-by: Sage Moore <sage@neuralmagic.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2026-03-18 20:34:00 -04:00
Elvir Crnčević	ef2c4f778d	[Bugfix] Zero-init MLA attention output buffers to prevent NaN from CUDA graph padding (#37442 ) Signed-off-by: Elvir Crncevic <elvircrn@gmail.com> Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>	2026-03-19 00:28:37 +00:00
sihao_li	9dade5da3a	[XPU]Unify xpu test dependencies in dockerfile.xpu (#36477 ) Signed-off-by: sihao.li <sihao.li@intel.com>	2026-03-19 08:12:07 +08:00
Thillai Chithambaram	828f862acb	[Bugfix] Expand quantization method support in perf metrics (#37231 ) Signed-off-by: Thillai Chithambaram <thillaichithambaram.a@gmail.com>	2026-03-18 23:54:19 +00:00
Andy Lo	577df69b26	[Bugfix] Fix KV scales inconsistency in fp8 MLA & FlashInfer kv_cache_dtype "auto" leading to gibberish (#37054 ) Signed-off-by: Andy Lo <andy@mistral.ai>	2026-03-18 23:07:29 +00:00
Giancarlo Delfin	04244fd0e1	[Model Runner V2] Spec decode rejection sampler greedy support (#37238 ) Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai>	2026-03-18 15:59:03 -07:00
Michael Goin	9482b0b085	[Bugfix] Remove assertion for NVFP4 scale dynamic range (#37465 ) Signed-off-by: Michael Goin <mgoin64@gmail.com>	2026-03-18 15:37:49 -07:00
Woosuk Kwon	5bc1da147f	[LoRA][BugFix] Fix skipped LoRA adapters for Mistral3 (#36928 ) Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>	2026-03-18 22:34:19 +00:00
Philip Ottesen	0091017188	fix(worker): optimize swap_states to copy only active token prefixes (#34733 ) Signed-off-by: Philip Ottesen <phiott256@gmail.com>	2026-03-18 14:59:27 -07:00
Wentao Ye	0d81a1fe61	[V0 Deprecation] Deprecate virtual engine (#37195 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-03-18 14:30:14 -07:00
Netanel Haber	6ae4c8d6fc	chunk parakeet into 30s clips to prevent OOMs on long audios (#36671 ) Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>	2026-03-18 14:22:24 -07:00

15037 Commits