biondizzle/vllm - vllm - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
mikaylagawarecki	8b10e4fb31	[1/n] Migrate permute_cols to libtorch stable ABI (#31509 ) Signed-off-by: Mikayla Gawarecki <mikaylagawarecki@gmail.com>	2026-03-19 11:27:26 -04:00
Ifta khairul Alam Adil	104605cbf2	Remove deprecated reasoning_content message field(part-2) (#37480 ) Signed-off-by: JartX <sagformas@epdcenter.es> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com> Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Philip Ottesen <phiott256@gmail.com> Signed-off-by: Woosuk Kwon <woosuk@inferact.ai> Signed-off-by: Michael Goin <mgoin64@gmail.com> Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai> Signed-off-by: Andy Lo <andy@mistral.ai> Signed-off-by: Thillai Chithambaram <thillaichithambaram.a@gmail.com> Signed-off-by: sihao.li <sihao.li@intel.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: JartX <sagformas@epdcenter.es> Co-authored-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: Philip Ottesen <phiott256@gmail.com> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Giancarlo Delfin <32987265+TheEpicDolphin@users.noreply.github.com> Co-authored-by: Andy Lo <andy@mistral.ai> Co-authored-by: Thillai Chithambaram <79466435+thillai-c@users.noreply.github.com> Co-authored-by: sihao_li <165983188+1643661061leo@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-19 15:20:08 +00:00
Jee Jee Li	96266f119b	[LoRA] Minor improvements to LoRA log (#37557 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2026-03-19 15:18:06 +00:00
Sage Moore	7c0cf3bcd0	Cap the number of API servers to 1 when using Elastic EP. (#37466 ) Signed-off-by: Sage Moore <sage@neuralmagic.com>	2026-03-19 10:42:57 -04:00
Harry Mellor	572b432913	Stop bench CLI from recursively casting all configs to `dict` (#37559 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-19 14:04:03 +00:00
Cyrus Leung	9515c20868	[Misc] Clean up processing logic (#37541 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-03-19 13:30:20 +00:00
DorBernsohn	c63ca2b2e6	[Bugfix] Add Kimi-K2.5 reasoning/tool parser aliases and tool_call_id support (#37438 ) Signed-off-by: DorBernsohn <dor.bernsohn@gmail.com>	2026-03-19 21:08:00 +08:00
Harry Mellor	a32eaf5bb2	[CI] Merge `cleanup_pr_body.yml` and `reminder_comment.yml` (#37552 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-19 12:55:07 +00:00
XueLiang Yang	e390742c59	Fix KV Offloading + MLA AssertionError by using num_kv_heads=1 in cpu… (#37536 ) Signed-off-by: xueliangyang-oeuler <yxl546827391@gmail.com> Co-authored-by: xueliangyang-oeuler <yxl546827391@gmail.com>	2026-03-19 12:05:07 +00:00
Cyrus Leung	7a6ebcbfcf	[Model] Remove unnecessary `get_language_model` (#37545 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-03-19 20:00:36 +08:00
Cyrus Leung	c7bc12c20f	[CI/Build] Split out MM pooling tests (#37542 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-03-19 11:36:11 +00:00
wang.yuqi	f9e2a38386	[Docs] Reorganize pooling docs. (#35592 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io> Signed-off-by: wang.yuqi <noooop@126.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-19 11:25:47 +00:00
Harry Mellor	4426447bba	Don't log `exc_info` when vLLM tries to download a file that doesn't exist (#37458 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-19 10:38:29 +00:00
Li, Jiang	3322e26420	[Bugfix] Avoid more OpenMP thread reallocation in CPU torch compile (#37538 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2026-03-19 10:24:39 +00:00
Cyrus Leung	765e461065	[Bugfix] Fix Nemotron Parse loading (#37407 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-03-19 09:55:29 +00:00
Duyi-Wang	6a9cceb219	[Bugfix][ROCm] Fix MoRI + AITER FP8 dispatch compatibility for defer_input_quant (#37418 ) Signed-off-by: Duyi-Wang <duyi.wang@amd.com>	2026-03-19 09:49:27 +00:00
yassha	199f914183	fix(cpu): add null check for aligned_alloc in ScratchPadManager (#37369 ) Signed-off-by: yassha <50112520+yassha@users.noreply.github.com>	2026-03-19 17:45:06 +08:00
Kunshang Ji	ca21483bf9	[MISC] fix pin_memory=torch.cuda.is_available(), use is_pin_memory_available (#37415 ) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2026-03-19 09:23:24 +00:00
TJian	da70c87e81	[CI] Fix wrong path test file, missing `rlhf_async_new_apis.py` (#37532 ) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>	2026-03-19 02:21:55 -07:00
Collin McCarthy	0b6d52629f	Support temporal compression for Nemotron-3-VL videos (#36808 ) Signed-off-by: Collin McCarthy <cmccarthy@nvidia.com>	2026-03-19 08:02:19 +00:00
Ziming Huang	d3cc379567	[Perf] Fix slow hasattr in CUDAGraphWrapper.__getattr__ (#37425 ) Signed-off-by: 智鸣 <hzm414167@alibaba-inc.com>	2026-03-19 15:43:48 +08:00
cdpath	354cd580d5	fix(anthropic): remove non-standard 'data: [DONE]' from Anthropic streaming (#37510 ) Signed-off-by: cdpath <cdpath@outlook.com>	2026-03-19 07:23:35 +00:00
zhanqiuhu	d49f273144	[SSM/Mamba] Follow-up: N-1 prefill for P/D disaggregation (#37310 )	2026-03-19 08:22:00 +01:00
Flora Feng	b21d384304	[Refactor] Relocate endpoint tests to mirror serving code directory structure (#37504 ) Signed-off-by: sfeng33 <4florafeng@gmail.com>	2026-03-19 07:19:36 +00:00
Hongxia Yang	e3126cd107	[ROCm] issue management - request information for bug issues on ROCm (#37009 ) Signed-off-by: Hongxia Yang <hongxiay.yang@amd.com>	2026-03-19 03:51:29 +00:00
Wentao Ye	e37ff5b5c8	[Perf] Optimize token_embed for pooling models, 1.0% token throughput improvement (#37347 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-03-19 10:27:51 +08:00
Aaron Hao	6accb21f2a	[bug] Fix deadlock with pause resume and collective_rpc (#37024 ) Signed-off-by: hao-aaron <ahao@anyscale.com>	2026-03-19 01:49:02 +00:00
Giancarlo Delfin	053f3b6309	[Model Runner V2] Spec decode rejection sampler logprobs support (#37237 ) Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai>	2026-03-19 01:36:27 +00:00
Aaron Hao	5f82706a21	[BUG] Exclude SKIP_TENSORS from get_layer_size() + new weight sync example for dpep (#37334 ) Signed-off-by: ahao-anyscale <ahao@anyscale.com>	2026-03-19 00:45:10 +00:00
Sage Moore	c32a58cc2a	[EPLB] Simplify EPLB rearrange by only returning one map (#36267 ) Signed-off-by: Sage Moore <sage@neuralmagic.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2026-03-18 20:34:00 -04:00
Elvir Crnčević	ef2c4f778d	[Bugfix] Zero-init MLA attention output buffers to prevent NaN from CUDA graph padding (#37442 ) Signed-off-by: Elvir Crncevic <elvircrn@gmail.com> Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Matthew Bonanni <mbonanni@redhat.com>	2026-03-19 00:28:37 +00:00
sihao_li	9dade5da3a	[XPU]Unify xpu test dependencies in dockerfile.xpu (#36477 ) Signed-off-by: sihao.li <sihao.li@intel.com>	2026-03-19 08:12:07 +08:00
Thillai Chithambaram	828f862acb	[Bugfix] Expand quantization method support in perf metrics (#37231 ) Signed-off-by: Thillai Chithambaram <thillaichithambaram.a@gmail.com>	2026-03-18 23:54:19 +00:00
Andy Lo	577df69b26	[Bugfix] Fix KV scales inconsistency in fp8 MLA & FlashInfer kv_cache_dtype "auto" leading to gibberish (#37054 ) Signed-off-by: Andy Lo <andy@mistral.ai>	2026-03-18 23:07:29 +00:00
Giancarlo Delfin	04244fd0e1	[Model Runner V2] Spec decode rejection sampler greedy support (#37238 ) Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai>	2026-03-18 15:59:03 -07:00
Michael Goin	9482b0b085	[Bugfix] Remove assertion for NVFP4 scale dynamic range (#37465 ) Signed-off-by: Michael Goin <mgoin64@gmail.com>	2026-03-18 15:37:49 -07:00
Woosuk Kwon	5bc1da147f	[LoRA][BugFix] Fix skipped LoRA adapters for Mistral3 (#36928 ) Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>	2026-03-18 22:34:19 +00:00
Philip Ottesen	0091017188	fix(worker): optimize swap_states to copy only active token prefixes (#34733 ) Signed-off-by: Philip Ottesen <phiott256@gmail.com>	2026-03-18 14:59:27 -07:00
Wentao Ye	0d81a1fe61	[V0 Deprecation] Deprecate virtual engine (#37195 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-03-18 14:30:14 -07:00
Netanel Haber	6ae4c8d6fc	chunk parakeet into 30s clips to prevent OOMs on long audios (#36671 ) Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>	2026-03-18 14:22:24 -07:00
JartX	a913b612d8	[Bugfix] Fix ROCm crash in qwen3_next multi-stream events (#36795 ) (#37427 ) Signed-off-by: JartX <sagformas@epdcenter.es>	2026-03-18 16:06:31 -04:00
Harry Mellor	5ce2d10e4a	Fix models which use `layer_type_validation` for Transformers v5 (#37398 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-18 18:41:51 +00:00
Chengyu Fang	738d0a281f	[Bugfix] Fix incorrect use of merge_size in Qwen3-VL video timestamp calculation (#37439 ) Signed-off-by: chengyufang <cnyvfang@outlook.com>	2026-03-18 11:36:34 -07:00
youkaichao	70b81c4f3d	[bugfix][async scheduling] fix extra cuda context in device 0 with EP/DP (#37449 ) Signed-off-by: youkaichao <youkaichao@gmail.com>	2026-03-18 18:32:30 +00:00
Cyrus Leung	7476d148db	[Model] Remove unnecessary processor definition for Nemotron Parse (#37456 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-03-18 18:25:13 +00:00
Cyrus Leung	f3732bd931	[Misc] Clean up model registry (#37457 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-03-18 18:24:44 +00:00
Wentao Ye	0ef7f79054	[Perf] Add tuned triton moe config for Qwen3.5 H200, 9.9% E2E throughput improvement (#37340 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-03-18 14:18:34 -04:00
Or Ozeri	5dd8df0701	[kv_offload+HMA][2/N]: Support multiple KV groups in GPULoadStoreSpec (#36642 ) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2026-03-18 19:26:40 +02:00
Harry Mellor	39bfb57b7c	Add API docs link if the CLI arg is a config class (#37432 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-18 17:19:35 +00:00
RonaldBXu	c9d838fc33	Adding deterministic lora benchmarking to vLLM Bench (#36057 ) Signed-off-by: Ubuntu <ubuntu@ip-172-31-43-201.ap-northeast-1.compute.internal> Signed-off-by: Ronald Xu <ronaldxu@amazon.com>	2026-03-18 16:02:03 +00:00

15177 Commits