biondizzle/vllm - vllm

Author	SHA1	Message	Date
Cyrus Leung	6dd302653f	[Misc] Rename `group_mm_kwargs_by_modality -> group_and_batch_mm_kwargs` (#36158) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-03-06 12:32:48 +08:00
Cyrus Leung	de00ebeac4	[Bugfix] Fix simple Mistral-Small example (#36156) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-03-05 20:25:11 -08:00
Andreas Karatzas	639680d220	[ROCm][CI] Adding missing dependencies for Multi-modal models tests (#36177) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-06 12:23:10 +08:00
Rohan Potdar	c5362c739f	Reenable features for ROCm attention backends (#36185) Signed-off-by: Rohan138 <rohanpotdar138@gmail.com>	2026-03-05 20:21:06 -08:00
Nikhil Gupta	0a49676fb0	cpu: aarch64: Upgrade OneDNN for aarch64 to add support for int8 matmul (#36147) Signed-off-by: Nikhil Gupta <nikhil.gupta2@arm.com>	2026-03-06 03:48:59 +00:00
Jeffrey Wang	c012a8c477	Don't fire ray compatibility webhook when PR or branch is not provided (#36088) Signed-off-by: Jeffrey Wang <jeffreywang@anyscale.com>	2026-03-06 00:42:21 +00:00
Dor Huri	ebed80a7c8	[Performance] Extract KV-cache update from TreeAttention backend (#35384) Signed-off-by: dorhuri123 <dor.huri1@live.biu.ac.il>	2026-03-06 00:22:43 +00:00
Nick Hill	a73af584fe	[Model Runner V2] Fix warmup for very small kvcache and/or blocksizes (#36176) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-03-05 14:48:10 -08:00
Zhengxu Chen	a97954b6a8	[compile] Consistent compiler config for saved/loaded vllm backends. (#35810) Signed-off-by: zhxchen17 <zhxchen17@fb.com>	2026-03-05 15:08:12 -05:00
Yanhong Li	a911f4dd20	[Model] Add support for OLMo Hybrid (#32550)	2026-03-05 14:51:06 -05:00
Russell Bryant	5395471d29	[CI] Add explicit permissions to macOS smoke test workflow (#35775) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2026-03-05 19:08:48 +00:00
Frank Wang	a57c877f18	[BugFix] Fallback from FA4->FA2 for Batch Invariance (#36059) Signed-off-by: frankwang28 <frank.wbb@hotmail.com>	2026-03-05 14:05:56 -05:00
Xin Yang	f917020983	[Perf] Optimize FusedMoEModularKernel output tensor using torch.empty (#35794) Signed-off-by: Xin Yang <xyangx@amazon.com>	2026-03-05 13:47:53 -05:00
tomeras91	86483ca774	[Bugfix] Disable FlashInfer TRTLLM BF16 path for non-gated MoE (#36146) Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>	2026-03-05 09:49:05 -08:00
Netanel Haber	b93a9e6f6d	ParakeetProjection.norm = RMSNorm instead of nn.LayerNorm (#36133) Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>	2026-03-05 17:29:30 +00:00
Xinyu Chen	d8839ef7d9	[XPU] Enable ModelRunnerV2 on XPU (#36078) Signed-off-by: Xinyu Chen <xinyu1.chen@intel.com>	2026-03-05 17:19:18 +00:00
Avery Miao	e998fa76b9	[BUGFIX]Fix Qwen-Omni models audio max_token_per_item estimation error leading to encoder_cache_size is 0 (#35994) Signed-off-by: Miao, Avery <avery.miao@intel.com>	2026-03-05 09:16:29 -08:00
Jiayi Yan	6a895197fa	[Bugfix][CI] fix typos (#34934) Signed-off-by: 1195343015 <1195343015@qq.com> Signed-off-by: Jiayi Yan <66017932+1195343015@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-05 17:05:46 +00:00
Sage Moore	8c760b6ab6	[ROCm] Refactor ROCm attention backend selection logic (#35246) Signed-off-by: Sage Moore <sage@neuralmagic.com>	2026-03-05 10:51:26 -06:00
AllenDou	3ee68590c7	refactor funasr model. (#36108) Signed-off-by: zixiao <shunli.dsl@alibaba-inc.com> Co-authored-by: zixiao <shunli.dsl@alibaba-inc.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-03-05 08:07:37 -08:00
Cyrus Leung	7196348157	[Bugfix] Fix Qwen-VL tokenizer implementation (#36140) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2026-03-05 08:07:19 -08:00
Ning Xie	176c799f4c	[openai api] log exception in exception handler (1/N) (#31164) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2026-03-05 16:00:12 +00:00
Or Ozeri	612e7729c2	[KVConnector] Scheduler: Fix num_computed_tokens after async KV load (#34616) Signed-off-by: Or Ozeri <oro@il.ibm.com>	2026-03-05 14:25:15 +00:00
Harry Mellor	ecde7af9c4	Fix import that was moved in Transformers 5.2.0 (#36120) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-05 13:59:44 +00:00
Harry Mellor	8df523351f	[Docs] Only build docs if `documentation` or `ready` labels are present (#36135) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-05 13:58:16 +00:00
Andreas Karatzas	b03ff6a96b	[CI] Stabilize test_no_args_tool_call and add ROCm-specific server args (#36107) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-05 21:52:49 +08:00
Ajay Anubolu	ed81d5edd1	[Bugfix] Fix RunAI streamer crash with S3-hosted model paths (#35976) Signed-off-by: AjAnubolu <anuboluajay@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-05 12:14:20 +00:00
Shiyan Deng	3c23ac840e	[Bugfix] Fix mypy errors in hermes_tool_parser.py (#36114) Signed-off-by: Shiyan Deng <dsy842974287@meta.com>	2026-03-05 11:37:47 +00:00
cjackal	a708ef5944	[Misc] Fix SyntaxWarning - invalid escape sequence '\e' (#36020) Signed-off-by: cjackal <44624812+cjackal@users.noreply.github.com>	2026-03-05 10:55:31 +00:00
Kunshang Ji	66a2209645	[Hardware] Replace `torch.cuda.synchronize()` api with `torch.accelerator.synchronize` (#36085) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2026-03-05 10:36:39 +00:00
Doug Smith	0bfa229bf1	[Release] Include source distribution (sdist) in PyPI uploads (#35136) Signed-off-by: dougbtv <dosmith@redhat.com> Co-authored-by: Daniele Trifirò <dtrifiro@redhat.com>	2026-03-05 01:43:50 -08:00
Paco Xu	7493c51c55	[Docs] add Dynamo/aibrix integration and kubeai/aks link (#32767) Signed-off-by: Paco Xu <paco.xu@daocloud.io>	2026-03-05 17:39:50 +08:00
Reagan Lee	ac773bbe80	[Docs] Update docs to include mm processor + encoder benchmarks (#34083) Signed-off-by: Reagan <reaganjlee@gmail.com>	2026-03-05 01:38:25 -08:00
Christian Munley	48e376a007	qwen3coder tool parser fix anyOf double encoded parameters (#36032) Signed-off-by: Christian Munley <cmunley@nvidia.com>	2026-03-05 09:06:57 +00:00
Isotr0py	21eb2c3372	[Chore] Correct MTP models test registry ordering (#36115) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-03-05 08:55:04 +00:00
Seiji Eicher	e2b31243c0	[Docs] Update `CacheConfig` block_size docstring to remove inaccurate limit when using CUDA (#35632) Signed-off-by: Seiji Eicher <seiji@anyscale.com>	2026-03-05 06:24:08 +00:00
Martin Hickey	c3598d02fa	[Misc] Remove deprecated items that are due for removal (#36006) Signed-off-by: Martin Hickey <martin.hickey@ie.ibm.com>	2026-03-05 06:14:50 +00:00
Benjamin Chislett	57c629e9c1	[Bugfix] Fix block_size for hybrid model MTP (#36036) Signed-off-by: Benjamin Chislett <bchislett@nvidia.com>	2026-03-05 06:10:54 +00:00
zihaoanllm	d106bf39f5	[Doc] Add Parallel Draft Models (#35973) Signed-off-by: <zihaoan2@amd.com> Signed-off-by: zihaoanllm <zihaoan2@amd.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-05 05:44:07 +00:00
Yanan Cao	b0651021e5	[Kernel] [Helion] [11/N] Retune configs for silu_mul_fp8 (#36062)	2026-03-04 21:25:59 -08:00
Hanjun Cho	f600d5192e	[Bugfix] Fix score layer quantization for sequence classification models - Qwen3 (VL) Reranker (#35849) Signed-off-by: Hanjun Cho <gkswns0531@gmail.com> Co-authored-by: wang.yuqi <yuqi.wang@daocloud.io>	2026-03-04 20:57:20 -08:00
Tianmu Li	8e7820131e	[Perf] Use dummy M for weight prepacking on x86 (#35890) Signed-off-by: Li, Tianmu <tianmu.li@intel.com>	2026-03-05 04:56:49 +00:00
Andrii Skliar	0a12cea25f	Order `config.py` in Lexicographical order (#35866) Signed-off-by: Andrii Skliar <askliar@nvidia.com> Co-authored-by: Andrii Skliar <askliar@nvidia.com>	2026-03-04 20:56:47 -08:00
Zhengxu Chen	dd6dbd93f8	[compile] Fix extra cache save on warm start. (#35921) Signed-off-by: zhxchen17 <zhxchen17@fb.com>	2026-03-05 12:56:30 +08:00
Harry Mellor	26366009c5	[CI] Don't leave docs preview comment on closed PRs (#36087) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-03-05 04:51:46 +00:00
Nick Hill	16c472abe7	[Core] Move ray-specific WorkerWrapperBase methods to RayWorkerWrapper (#35328) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-03-05 12:11:59 +08:00
daje0601	3b23d57c96	[Model] Add LoRA support for Whisper models (#29856) Signed-off-by: daje0601 <englishmt4118@gmail.com> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2026-03-05 10:38:25 +08:00
Wentao Ye	2f4226fe52	[CI] Fix pre-commit mypy issue in main (#36049)	2026-03-04 18:13:12 -08:00
nkm-meta	792cbd64ca	Add platform method to enable custom collective ops registration (#34760) Signed-off-by: Naina Kuruballi Mahesh <nainakm@meta.com>	2026-03-05 00:50:32 +00:00
Zhengxu Chen	2ed4722e26	[compile] Reduce log spam from compile. (#36044) Signed-off-by: zhxchen17 <zhxchen17@fb.com>	2026-03-05 00:48:36 +00:00

14539 Commits