biondizzle/vllm - vllm

Author	SHA1	Message	Date
Shengqi Chen	136c499f6e	[CI] fix version comparison and exclusion patterns in upload-release-wheels.sh (#32971 ) Signed-off-by: Shengqi Chen <harry-chen@outlook.com>	2026-01-23 22:21:49 +00:00
joninco	ebd0a17e0e	[Bugfix] Fix missing is_layer_skipped check for FusedMoE in AWQConfig (#32935 ) Signed-off-by: jon <joninco@bullpoint.org>	2026-01-23 17:19:56 -05:00
Wentao Ye	37c9859fab	[Refactor] Clean up unused variables & func (#32692 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-01-23 17:04:25 -05:00
Michael Goin	4561f13985	[Refactor] Rename `gptq_marlin` to `marlin` to match MoE (#32952 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2026-01-23 16:48:12 -05:00
rasmith	6cc6d92be5	[CI][AMD][BugFix] Update wvSplitK (and other skinny_gemm wrappers) to ensure tensors passed will be made contiguous for the kernel (#32831 ) Signed-off-by: Randall Smith <ransmith@amd.com> Co-authored-by: Randall Smith <ransmith@amd.com>	2026-01-23 13:35:48 -08:00
Wentao Ye	dfab5f3764	[Bug] Fix benchmark script `moe_permute_unpermute` (#32949 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-01-23 16:18:56 -05:00
Markus / Mark	586a57ad7e	fix: Add glm4_moe_lite to MLA detection (#32614 ) Signed-off-by: marksverdhei <marksverdhei@hotmail.com> Signed-off-by: Markus / Mark <46672778+marksverdhei@users.noreply.github.com> Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2026-01-23 12:38:57 -08:00
Lucas Wilkinson	3a41459501	[cudagraphs] Refactor cudagraph capture loop (#32946 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2026-01-23 13:22:20 -07:00
Nick Hill	8518b30447	[Model Runner V2] Add KV Connector support (#32742 ) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-01-23 10:49:17 -08:00
Matthew Bonanni	2d6b537157	[Bugfix][CI] Fix pre-commit (#32956 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-01-23 10:26:56 -08:00
Orion Reblitz-Richardson	68b0a6c1ba	[CI][torch nightlies] Use main Dockerfile with flags for nightly torch tests (#30443 ) Signed-off-by: Orion Reblitz-Richardson <orionr@meta.com> Signed-off-by: Orion Reblitz-Richardson <orionr@gmail.com> Co-authored-by: Kevin H. Luu <khluu000@gmail.com>	2026-01-23 10:22:56 -08:00
Harry Huang	5206e5e28c	[V1][Hybrid] Mamba Prefix Caching with align mode (#30877 ) Signed-off-by: huanghaoyan.hhy <huanghaoyan.hhy@alibaba-inc.com> Signed-off-by: Chen Zhang <zhangch99@outlook.com> Co-authored-by: Chen Zhang <zhangch99@outlook.com>	2026-01-23 09:56:48 -08:00
Matteo Fari	fec9da0af4	[Model] Enable LoRA support for internvl2 (#32397 ) Signed-off-by: Matteo Fari <matteofari06@gmail.com>	2026-01-24 01:39:01 +08:00
Luka Govedič	bbbd696af9	[torch.compile][CI] Add back attn fusion on hopper/ada (#32940 ) Signed-off-by: Luka Govedič <lgovedic@redhat.com>	2026-01-23 16:49:20 +00:00
sangbumlikeagod	9b77bb790d	[Frontend] add logprob, compression_rate to 'verbose_json' features (#31059 ) Signed-off-by: sangbumlikeagod <oironese@naver.com> Signed-off-by: sangbumlikeagod <98077576+sangbumlikeagod@users.noreply.github.com>	2026-01-23 16:35:13 +00:00
Matt	305e53ade8	[Hardware][AMD][CI][Bugfix] Fix Kernels Attention Cache test (#32904 ) Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com>	2026-01-23 16:24:26 +00:00
Mark McLoughlin	1cb4341fbc	[ROCm][PD] Remove unused moriio connector proxy code (#32939 ) Signed-off-by: Mark McLoughlin <markmc@redhat.com>	2026-01-23 15:59:04 +00:00
baonudesifeizhai	1fb648bf10	[Bugfix] Fix FP8 MoE EP Weight Loading for ModelOpt Llama4 (#32886 ) Signed-off-by: baonudesifeizhai <baonudesifeizhai@gmail.com>	2026-01-23 10:31:48 -05:00
Nicolò Lucchesi	7e22309755	[Misc] Postpone torch_profiler deprecation (#32867 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-01-23 14:39:48 +00:00
Xin Yang	90c2007932	[Bugfix] Disable tma_aligned_scales in test_fusions_e2e (#32916 ) Signed-off-by: Xin Yang <xyangx@amazon.com>	2026-01-23 14:34:30 +00:00
Raushan Turganbay	d95d650762	[Bugfix] Fix getting vision features in Transformer Multimodal backend (#32933 ) Signed-off-by: raushan <raushan@huggingface.co>	2026-01-23 13:34:48 +00:00
tianshu-Michael-yu	13d8746c54	[Feature]: Remove DtoH Copy for lfm2_vl On Default Stream (#32815 ) Signed-off-by: Tianshu Yu <tianshuyu.formal@gmail.com>	2026-01-23 13:20:30 +00:00
Fadi Arafeh	10e94c84f6	[CPU][Feat] Update PyTorch to v2.10 for CPU Backend (#32869 ) Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com> Co-authored-by: Li, Jiang <jiang1.li@intel.com>	2026-01-23 21:13:06 +08:00
Isotr0py	243e78c20f	[Benchmark][Bugfix] Fix race condition when starting server for sweep benchmark (#32927 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>	2026-01-23 12:11:18 +00:00
Fadi Arafeh	aac0b817fa	[CPU Backend][BugFix] Fix failing CPU MoE test (#32876 ) Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>	2026-01-23 12:06:51 +00:00
wang.yuqi	05f3d714db	[Frontend][3/n] Make pooling entrypoints request schema consensus | EmbedRequest & ClassifyRequest (#32905 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io> Signed-off-by: wang.yuqi <noooop@126.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-01-23 12:03:44 +00:00
Patrick von Platen	3f3f89529d	[Voxtral] Add new streaming arch (#32861 ) Signed-off-by: Patrick von Platen <patrick.v.platen@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-01-23 12:41:52 +01:00
Li, Jiang	5da4c7d789	[CI/Build][CPU] Fix failed pooling tests and macos smoke test (#32907 ) Signed-off-by: jiang1.li <jiang1.li@intel.com> Signed-off-by: Li, Jiang <bigpyj64@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-01-23 10:48:20 +00:00
Nicolò Lucchesi	160c6fa387	[Misc] Add `get_name` to missing AttentionBackends (#32698 ) Signed-off-by: NickLucche <nlucches@redhat.com>	2026-01-23 10:35:44 +00:00
Andreas Karatzas	a8eb1182f1	[CI][Models] Add VLM Support for Sequence Classification Conversion (#32885 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-01-23 16:22:51 +08:00
Karan Bansal	fa6e599a61	[Bugfix] Fix _CPU_MOE_ACT AssertionError when vLLM config not set (#32777 ) Signed-off-by: Karan Bansal <karanb192@gmail.com>	2026-01-23 08:22:37 +00:00
Wentao Ye	7ef5873752	[CI] Fix mypy for `vllm/v1/structured_output` (#32722 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-01-23 11:55:51 +08:00
Luka Govedič	5e4e0e51f4	[torch.compile] Compile `CustomOp.forward_native` for `SiluAndMul` and `QuantFP8` to avoid raw torch ops inside opaque custom ops (#32806 ) Signed-off-by: Luka Govedič <lgovedic@redhat.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2026-01-22 19:52:26 -08:00
Rishabh Saini	f61c9da711	[BugFix] deepseek_v32_encoding: Replace asserts with proper exceptions (#32884 ) Signed-off-by: RishabhSaini <rishabhsaini01@gmail.com>	2026-01-23 03:44:11 +00:00
Nick Hill	7fe255889e	[Misc] Log vLLM logo when starting server (#32796 ) Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-01-23 11:15:12 +08:00
bnellnm	dc917cceb8	[MoE Refactor] Move `select_experts` from `FusedMoEQuantMethod` -> `FusedMoE` (#31996 ) Signed-off-by: Bill Nell <bnell@redhat.com>	2026-01-22 18:21:35 -05:00
Fadi Arafeh	fc56f4a071	[BugFix] Fix invalid flashinfer_fused_moe_blockscale_fp8 op registration (#32855 ) Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>	2026-01-22 22:27:40 +00:00
Xin Yang	d08b356ee0	[Perf] Create TMA-aligned input scale tensor for DeepGemm on Hopper (#32619 ) Signed-off-by: Xin Yang <xyangx@amazon.com>	2026-01-22 15:47:04 -05:00
Wentao Ye	f744810184	[Refactor] Remove unused tpu files (#32610 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2026-01-22 15:35:18 -05:00
Eldar Kurtić	44f08af3a7	Add llmcompressor fp8 kv-cache quant (per-tensor and per-attn_head) (#30141 ) Signed-off-by: Eldar Kurtic <8884008+eldarkurtic@users.noreply.github.com> Signed-off-by: eldarkurtic <8884008+eldarkurtic@users.noreply.github.com>	2026-01-22 13:29:57 -07:00
Matthew Bonanni	955b43a5a5	[Bugfix][Attention] Explicitly report support for kv_cache_dtype bfloat16 (#32795 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-01-22 19:05:18 +00:00
Fadi Arafeh	744ef30484	[CPU Backend] [Perf] Accelerate tensor-parallel/data-parallel inference across NUMA domains on Arm (#32792 ) Signed-off-by: Fadi Arafeh <fadi.arafeh@arm.com>	2026-01-22 18:55:23 +00:00
Matthew Bonanni	300622e609	[CI][Attention] Add more CI dependencies for attention tests (#32487 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com>	2026-01-22 18:44:56 +00:00
RickyChen / 陳昭儒	69d09fdd6c	[Feature] Add --ssl-ciphers CLI argument for TLS cipher control (#30937 ) Signed-off-by: rickychen-infinirc <ricky.chen@infinirc.com>	2026-01-22 09:53:24 -08:00
David Ramon Prados	3a63be0faa	Support custom URI schemes and trace handlers for profiler (#32393 )	2026-01-22 09:45:40 -08:00
Tyler Michael Smith	803e3f3f68	[UX] Default api_server_count to dp_size if not specified (#32525 ) Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com>	2026-01-22 17:35:35 +00:00
Vadim Gimpelson	70917b1c55	[MISC] Add .cursor to .gitignore (#32868 ) Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>	2026-01-22 17:27:13 +00:00
Matt	c517d8c934	[Hardware][AMD][CI][Bugfix] Fix regressions from deprecated env vars (#32837 ) Signed-off-by: Matthew Wong <Matthew.Wong2@amd.com>	2026-01-23 00:59:15 +08:00
Xu Jinyang	fc37187a51	[Bugfix] ModelScope is supported when downloading LORA models. (#32844 ) Signed-off-by: AuYang <459461160@qq.com>	2026-01-22 16:33:21 +00:00
Maximilien de Bayser	ff365eea94	Support bge-m3 sparse embeddings and colbert embeddings (#14526 ) Signed-off-by: Max de Bayser <mbayser@br.ibm.com> Signed-off-by: Max de Bayser <maxdebayser@gmail.com>	2026-01-22 23:52:57 +08:00


14386 Commits