biondizzle/vllm - vllm - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Yuxiang Liang	638a872d77	fix(xpu): Re-compute compile ranges after platform-specific config updates (#37523 ) Signed-off-by: Yuxiang Liang <yuxiang.liang@intel.com> Signed-off-by: Yuxiang Liang <yuliang@habana.ai> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-03-20 03:52:35 +00:00
Flora Feng	9040151fe1	[V0 Deprecation] Deprecate --disable-frontend-multiprocessing (#37612 ) Signed-off-by: sfeng33 <4florafeng@gmail.com>	2026-03-20 11:31:43 +08:00
Jee Jee Li	8fbe3f303f	[Bugfix][LoRA] Fix Qwen35 LoRA (#36976 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2026-03-20 11:09:32 +08:00
Xiao	ea2c148fa7	[compile][graph_partition]Add tensor size handling (#36038 ) Signed-off-by: Xiao Fu <xiaofu@meta.com>	2026-03-19 19:55:25 -07:00
Tianmu Li	47b7af0d87	[Feat] Enable CompressedTensorW4A8Int for XPU (#37207 ) Signed-off-by: Li, Tianmu <tianmu.li@intel.com>	2026-03-20 02:34:28 +00:00
tianshu-Michael-yu	269bf46d99	fix: disambiguate multimodal prefix cache keys (#36708 ) Signed-off-by: tianshu.yu <tianshuyu.formal@gmail.com>	2026-03-20 10:33:20 +08:00
Flora Feng	e5a77a5015	[CI] Update mergify tool-calling label paths (#37478 ) Signed-off-by: sfeng33 <4florafeng@gmail.com>	2026-03-20 02:22:23 +00:00
Itay Alroy	ca1ac1a4b4	Fix DP coordinator ZMQ port TOCTOU (#37452 ) Signed-off-by: Itay Alroy <ialroy@nvidia.com>	2026-03-20 00:58:31 +00:00
Divakar Verma	4ca3fa6bb4	[ROCm][Bugfix] fix cache block size mismatch for aiter unified attention (#37606 ) Signed-off-by: Divakar Verma <divakar.verma@amd.com>	2026-03-20 00:00:08 +00:00
Flora Feng	be12afd284	[Bugfix] Fix Deepseekv32 tool parser when stream interval > 1 (#36056 )	2026-03-19 19:51:25 -04:00
Wentao Ye	df3c0291a3	[Bug] Fix EmbedIOprocessor "classify" <-> "embed" (#37573 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-03-20 07:40:10 +08:00
Wentao Ye	2be1a0f74b	[Refactor] Remove dead code in pooling model (#37572 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-03-20 07:39:43 +08:00
Jim Smith	4120a05ff1	Fix AttributeError in Qwen3.5 GDN layers with quantized models (#37448 ) Signed-off-by: Jim Smith <jim@joshua8.ai> Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: mgoin <mgoin64@gmail.com> Co-authored-by: Xin Yang <105740670+xyang16@users.noreply.github.com>	2026-03-19 19:21:14 -04:00
rasmith	98ff042917	[CI][BugFix][AMD] Don't set VLLM_ROCM_USE_AITER anymore in test_rocm_aiter_topk since it's not necessary (#36996 ) Signed-off-by: Randall Smith <Randall.Smith@amd.com>	2026-03-20 07:12:45 +08:00
Artem Perevedentsev	b55156eae9	[Performance] Enable Triton autotuning disk cache by default (#37188 ) Signed-off-by: Artem Perevedentsev <aperevedents@nvidia.com>	2026-03-19 17:36:28 -04:00
Laith Sakka	112944fab9	test Qwen/Qwen3-4B-Instruct-2507 for unbacked (#36064 ) Signed-off-by: Laith Sakka <lsakka@meta.com>	2026-03-19 17:28:45 -04:00
bnellnm	91be5f9be3	[MoE Refactor] Rename "naive" all2all backend (#36294 ) Signed-off-by: Bill Nell <bnell@redhat.com>	2026-03-19 15:50:34 -04:00
Aaron Hao	4ee847e400	Comment fix for async rl example (#35244 ) Signed-off-by: hao-aaron <ahao@anyscale.com>	2026-03-19 19:46:07 +00:00
Andreas Karatzas	040a505ff5	[ROCm][CI] Cleaning and restructuring amd-ci legacy pipeline (#34839 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-19 14:30:58 -05:00
bnellnm	9279c59a0e	[MoE Refactor] DefaultMoERunner simplification (#33049 ) Signed-off-by: Bill Nell <bnell@redhat.com>	2026-03-19 15:07:44 -04:00
Wentao Ye	7454096199	[Log] Log once in local node by default (#37568 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-03-19 12:04:59 -07:00
Andreas Karatzas	fb8b5e05fc	[CI] Add retry with 4x backoff to HTTP fetches for transient failures (#37218 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-19 19:00:20 +00:00
Harry Mellor	e5d96dc8fc	Fix `SpeculatorsConfig` now that `PreTrainedConfig` is a `dataclass` in Transformers (#37574 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-19 18:04:40 +00:00
EdalatiAli	daa05bf340	[Bugfix] Fix AttributeError when serving MXFP8 models with DeepGEMM installed (#37358 ) Signed-off-by: EdalatiAli <aliedalati@cohere.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-03-19 17:58:33 +00:00
Lucas Kabela	7769b58307	[torch.compile][BE][Multimodal] Remove requirement to set_model_tag to avoid cache conflict (#37345 ) Signed-off-by: Lucas Kabela <lucaskabela@meta.com>	2026-03-19 17:26:12 +00:00
Chauncey	2f9f946b22	[P/D] AnthropicMessages add kv_transfer_params for PD disaggregation (#37535 ) Signed-off-by: chaunceyjiang <chaunceyjiang@gmail.com>	2026-03-19 16:41:20 +00:00
Fadi Arafeh	2890aecce5	[CPU][UX] Do not crash when tcmalloc/libiomp are not ldpreloaded (#37561 ) Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>	2026-03-19 16:35:45 +00:00
Harry Mellor	34f093b417	[CI] Gate pre-commit on `ready` label or number of contributions (#37544 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-19 16:21:57 +00:00
Harry Mellor	4dce8321a9	Run MacOS smoke test on daily `cron` job instead of every commit (#37567 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-19 16:19:50 +00:00
Cyrus Leung	657855ab41	[Misc] Cleanup more configs and processors (#37560 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-03-19 15:45:23 +00:00
Wei Zhao	e27b8ba3d1	[Bug] Fix fp8 trtllm MoE modular kernel supported routing methods (#37346 ) Signed-off-by: wzhao18 <wzhao18.sz@gmail.com>	2026-03-19 11:43:06 -04:00
Woosuk Kwon	40b8363b45	[MRV2] Use fp32 for draft logits (#37526 ) Signed-off-by: Woosuk Kwon <woosuk@inferact.ai>	2026-03-19 08:41:21 -07:00
mikaylagawarecki	8b10e4fb31	[1/n] Migrate permute_cols to libtorch stable ABI (#31509 ) Signed-off-by: Mikayla Gawarecki <mikaylagawarecki@gmail.com>	2026-03-19 11:27:26 -04:00
Ifta khairul Alam Adil	104605cbf2	Remove deprecated reasoning_content message field(part-2) (#37480 ) Signed-off-by: JartX <sagformas@epdcenter.es> Signed-off-by: Ifta Khairul Alam Adil <ikaadil007@gmail.com> Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> Signed-off-by: yewentao256 <zhyanwentao@126.com> Signed-off-by: Philip Ottesen <phiott256@gmail.com> Signed-off-by: Woosuk Kwon <woosuk@inferact.ai> Signed-off-by: Michael Goin <mgoin64@gmail.com> Signed-off-by: Giancarlo Delfin <gdelfin@inferact.ai> Signed-off-by: Andy Lo <andy@mistral.ai> Signed-off-by: Thillai Chithambaram <thillaichithambaram.a@gmail.com> Signed-off-by: sihao.li <sihao.li@intel.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: JartX <sagformas@epdcenter.es> Co-authored-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com> Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com> Co-authored-by: Philip Ottesen <phiott256@gmail.com> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Giancarlo Delfin <32987265+TheEpicDolphin@users.noreply.github.com> Co-authored-by: Andy Lo <andy@mistral.ai> Co-authored-by: Thillai Chithambaram <79466435+thillai-c@users.noreply.github.com> Co-authored-by: sihao_li <165983188+1643661061leo@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-19 15:20:08 +00:00
Jee Jee Li	96266f119b	[LoRA] Minor improvements to LoRA log (#37557 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com>	2026-03-19 15:18:06 +00:00
Sage Moore	7c0cf3bcd0	Cap the number of API servers to 1 when using Elastic EP. (#37466 ) Signed-off-by: Sage Moore <sage@neuralmagic.com>	2026-03-19 10:42:57 -04:00
Harry Mellor	572b432913	Stop bench CLI from recursively casting all configs to `dict` (#37559 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-19 14:04:03 +00:00
Cyrus Leung	9515c20868	[Misc] Clean up processing logic (#37541 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-03-19 13:30:20 +00:00
DorBernsohn	c63ca2b2e6	[Bugfix] Add Kimi-K2.5 reasoning/tool parser aliases and tool_call_id support (#37438 ) Signed-off-by: DorBernsohn <dor.bernsohn@gmail.com>	2026-03-19 21:08:00 +08:00
Harry Mellor	a32eaf5bb2	[CI] Merge `cleanup_pr_body.yml` and `reminder_comment.yml` (#37552 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-19 12:55:07 +00:00
XueLiang Yang	e390742c59	Fix KV Offloading + MLA AssertionError by using num_kv_heads=1 in cpu… (#37536 ) Signed-off-by: xueliangyang-oeuler <yxl546827391@gmail.com> Co-authored-by: xueliangyang-oeuler <yxl546827391@gmail.com>	2026-03-19 12:05:07 +00:00
Cyrus Leung	7a6ebcbfcf	[Model] Remove unnecessary `get_language_model` (#37545 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-03-19 20:00:36 +08:00
Cyrus Leung	c7bc12c20f	[CI/Build] Split out MM pooling tests (#37542 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-03-19 11:36:11 +00:00
wang.yuqi	f9e2a38386	[Docs] Reorganize pooling docs. (#35592 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io> Signed-off-by: wang.yuqi <noooop@126.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-19 11:25:47 +00:00
Harry Mellor	4426447bba	Don't log `exc_info` when vLLM tries to download a file that doesn't exist (#37458 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-19 10:38:29 +00:00
Li, Jiang	3322e26420	[Bugfix] Avoid more OpenMP thread reallocation in CPU torch compile (#37538 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2026-03-19 10:24:39 +00:00
Cyrus Leung	765e461065	[Bugfix] Fix Nemotron Parse loading (#37407 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-03-19 09:55:29 +00:00
Duyi-Wang	6a9cceb219	[Bugfix][ROCm] Fix MoRI + AITER FP8 dispatch compatibility for defer_input_quant (#37418 ) Signed-off-by: Duyi-Wang <duyi.wang@amd.com>	2026-03-19 09:49:27 +00:00
yassha	199f914183	fix(cpu): add null check for aligned_alloc in ScratchPadManager (#37369 ) Signed-off-by: yassha <50112520+yassha@users.noreply.github.com>	2026-03-19 17:45:06 +08:00
Kunshang Ji	ca21483bf9	[MISC] fix pin_memory=torch.cuda.is_available(), use is_pin_memory_available (#37415 ) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2026-03-19 09:23:24 +00:00

15309 Commits