biondizzle/vllm - vllm

Author	SHA1	Message	Date
mobicham	96846bb360	Fix TorchAOConfig skip layers (#19265 ) Signed-off-by: mobicham <hicham@mobiuslabs.com>	2025-06-12 22:22:53 +08:00
Jee Jee Li	73e2e0118f	[Quantization] Improve AWQ logic (#19431 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-06-12 11:02:11 +00:00
rasmith	2e090bd5df	[AMD][Kernel][BugFix] fix test_rocm_compressed_tensors_w8a8 for rocm (#19509 ) Signed-off-by: Randall Smith <Randall.Smith@amd.com>	2025-06-12 07:14:24 +00:00
Brayden Zhong	3f6341bf7f	Add Triton Fused MoE kernel config for E=16 on B200 (#19518 ) Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>	2025-06-12 04:31:51 +00:00
Varun Sundar Rabindranath	e5d35d62f5	[BugFix] Force registration of w8a8_block_fp8_matmul_deepgemm via lazy import (#19514 ) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-06-12 04:28:12 +00:00
Ning Xie	2f1c19b245	[CI] change spell checker from codespell to typos (#18711 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-06-11 19:57:10 -07:00
bnellnm	29fa5cac1c	[Kernels] Add activation chunking logic to FusedMoEModularKernel (#19168 ) Signed-off-by: Bill Nell <bnell@redhat.com>	2025-06-11 12:53:10 -04:00
Ximingwang-09	3c8694eabe	Fix some typo (#19475 ) Signed-off-by: ximing.wxm <ximing.wxm@antgroup.com> Co-authored-by: ximing.wxm <ximing.wxm@antgroup.com>	2025-06-11 10:36:04 +00:00
artetaout	b8e809a057	[Kernel] Support deep_gemm for linear methods (#19085 ) Signed-off-by: artetaout <lulala341@gmail.com>	2025-06-11 15:14:45 +08:00
Junhao Li	2d40665fe8	Add fused MOE config for Qwen3 30B A3B on B200 (#19455 ) Signed-off-by: Junhao Li <junhao@ubicloud.com>	2025-06-11 13:43:46 +08:00
wang.yuqi	3952731e8f	[New Model]: Support Qwen3 Embedding & Reranker (#19260 )	2025-06-10 20:07:30 -07:00
Xu Wenqing	22c3c0aa4a	Add H20-3e fused MoE kernel tuning configs for Qwen3-235B-A22B-FP8 (#19401 ) Signed-off-by: 许文卿 <xwq391974@alibaba-inc.com>	2025-06-11 07:23:57 +08:00
py-andy-c	33f8dba7c6	[Model] use AutoWeightsLoader for commandr (#19399 ) Signed-off-by: py-andy-c <pychen1017@gmail.com>	2025-06-10 22:42:21 +00:00
Jee Jee Li	b6553be1bc	[Misc] Slight improvement of the BNB (#19418 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Isotr0py <2037008807@qq.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-06-10 13:51:49 +00:00
Varun Sundar Rabindranath	5cf2daea9a	[Misc] Fixes and Optimizations for DeepEP + DeepGEMM combination. (#19298 ) Signed-off-by: Varun <vsundarr@redhat.com> Co-authored-by: Varun <vsundarr@redhat.com>	2025-06-09 10:50:39 -04:00
Dipika Sikka	c123bc33f9	[Quantization] Add compressed-tensors NVFP4 support (#18312 )	2025-06-08 09:05:55 -04:00
Xu Wenqing	989dcee981	Add H20-3e fused MoE kernel tuning configs for Qwen3-235B-A22B (#19315 ) Signed-off-by: Xu Wenqing <xuwq1993@qq.com>	2025-06-08 16:07:02 +08:00
ElizaWszola	84166fee97	[Kernel] Integrate CUTLASS MoE kernel with PPLX (#18762 ) Signed-off-by: ElizaWszola <ewszola@redhat.com> Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-06-06 18:26:11 -07:00
Jee Jee Li	7661e92ef8	[Model] Optimize nemotron_h implementation (#19249 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-06-06 10:05:14 +00:00
Dipika Sikka	94870359cd	[Quantization] Bump compressed-tensors version; update NVFP4A16 test model (#19224 ) Signed-off-by: Dipika Sikka <dipikasikka1@gmail.com>	2025-06-06 01:21:54 -07:00
Benjamin Chislett	3465b87ef8	[Bugfix] Fix EAGLE vocab embedding construction for Llama 70B (#19033 ) Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>	2025-06-05 19:10:08 -07:00
Jerry Zhang	c8134bea15	Fix AOPerModuleConfig name changes (#18869 ) Signed-off-by: Jerry Zhang <jerryzh168@gmail.com>	2025-06-05 18:51:32 -07:00
Luis Vega	cb6d572e85	[Model] NemotronH support (#18863 ) Signed-off-by: Luis Vega <2478335+vegaluisjose@users.noreply.github.com> Co-authored-by: Luis Vega <2478335+vegaluisjose@users.noreply.github.com>	2025-06-05 21:29:28 +00:00
Chiyue Wei	61059bee40	[Hardware][NVIDIA] FP4 MoE kernel optimization (#19110 ) Signed-off-by: Chiyue Wei <chiyuew@nvidia.com> Co-authored-by: Chiyue Wei <chiyuew@nvidia.com>	2025-06-05 09:48:26 -07:00
Xu Wenqing	ec89524f50	Add H20-3e fused MoE kernel tuning configs for DeepSeek-R1/V3 (#19205 )	2025-06-05 16:38:54 +00:00
Varun Sundar Rabindranath	c3fd4d669a	[Kernel] Integrate batched/masked deepgemm kernel (#19111 ) Signed-off-by: Varun <vsundarr@redhat.com> Co-authored-by: Varun <vsundarr@redhat.com>	2025-06-04 21:59:18 +00:00
Lain	5f2cd251d2	Sm100 blockwise fp8 swap ab (#18564 )	2025-06-04 07:48:45 -07:00
Cyrus Leung	01dc9a76db	[CI/Build][Bugfix] Ensure compatibility with transformers 4.52 (#18678 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-06-04 04:49:20 -07:00
wang.yuqi	35cf32df30	Improve the output precision of embedding models (#19092 )	2025-06-04 11:48:57 +00:00
Vadim Gimpelson	5d6d1adf15	[KERNEL] Sampler. CUDA kernel for applying repetition penalty (#18437 )	2025-06-03 21:13:01 -07:00
Varun Sundar Rabindranath	fa98d77773	[Kernel] DeepEP dispatch-combine kernel integration (#18434 ) Signed-off-by: Varun <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-06-03 12:30:02 -07:00
Simon Mo	02f0c7b220	[Misc] Add SPDX-FileCopyrightText (#19100 ) Signed-off-by: simon-mo <simon.mo@hey.com>	2025-06-03 11:20:17 -07:00
Isotr0py	ec2dcd80bc	[Misc] Update `WeightsMapper` for qwen2-vl/qwen2.5-vl (#19054 ) Signed-off-by: Isotr0py <2037008807@qq.com>	2025-06-03 09:08:20 +00:00
汪志鹏	1282bd812e	Add tarsier model support (#18985 ) Signed-off-by: 汪志鹏 <wangzhipeng628@gmail.com>	2025-06-03 13:13:13 +08:00
Tyler Michael Smith	8a57872b2a	[Bugfix][EP+DP] Use pplx-kernel internode instead of intranode (#19034 ) Signed-off-by: Tyler Michael Smith <tysmith@redhat.com> Signed-off-by: Tyler Michael Smith <tyler@neuralmagic.com>	2025-06-03 11:36:51 +08:00
Siyuan Liu	9112b443a0	[Hardware][TPU] Initial support of model parallelism with single worker using SPMD (#18011 ) Signed-off-by: Siyuan Liu <lsiyuan@google.com> Co-authored-by: Hossein Sarshar <hossein.sarshar@gmail.com> Co-authored-by: Chengji Yao <chengjiyao@google.com>	2025-06-03 00:06:20 +00:00
jennyyyyzhen	ebb1ec9318	[Model] enable data parallel for Llama4 vision encoder (#18368 ) Signed-off-by: yzhen <yzhen@devgpu093.cco2.facebook.com> Co-authored-by: yZhen <yZhen@fb.com> Co-authored-by: yzhen <yzhen@devgpu093.cco2.facebook.com>	2025-06-02 19:22:54 +08:00
22quinn	9760fd8f6a	[Core] Support inplace model weights loading (#18745 ) Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>	2025-06-02 17:38:50 +08:00
Isotr0py	a35ca765a5	[LoRA] Support dynamically initialize `packed_modules_mapping` for VLM with arbitrary components (#18987 ) Signed-off-by: isotr0py <2037008807@qq.com> Signed-off-by: Isotr0py <2037008807@qq.com>	2025-06-01 11:06:57 +08:00
Benjamin Chislett	1bc86a3da1	[Bugfix] Fix EAGLE3 broken logits (#18909 ) Signed-off-by: Benjamin Chislett <benjamin.chislett@centml.ai>	2025-05-31 19:58:07 -07:00
Charlie Fu	306d60401d	[ROCm][Kernel] Add gfx950 support for skinny gemms (#18010 ) Signed-off-by: charlifu <charlifu@amd.com>	2025-05-31 07:40:05 -07:00
vllmellm	0f5e0d567e	[FEAT][ROCm] Add AITER grouped topk for DeepSeekV2 (#18825 ) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>	2025-05-31 03:39:31 -07:00
Satyajith Chilappagari	2a50ef5760	[Neuron] Add Multi-Modal model support for Neuron (#18921 ) Signed-off-by: Satyajith Chilappagari <satchill@amazon.com> Co-authored-by: Ashraf Mahgoub <ashymahg@amazon.com> Co-authored-by: Rohith Nallamaddi <nalrohit@amazon.com> Co-authored-by: FeliciaLuo <luof@amazon.com> Co-authored-by: Elaine Zhao <elaineyz@amazon.com>	2025-05-31 10:39:11 +00:00
rongfu.leng	7f21e8052b	[Misc] add group_size is -1 in awq quantization (#18910 ) Signed-off-by: rongfu.leng <rongfu.leng@daocloud.io>	2025-05-30 17:34:22 +00:00
Isotr0py	5a8641638a	[VLM] Add PP support and fix GPTQ inference for Ovis models (#18958 ) Signed-off-by: isotr0py <2037008807@qq.com> Signed-off-by: Isotr0py <2037008807@qq.com>	2025-05-30 17:11:44 +00:00
Shawn Huang	e1fadf1197	[Feature] minicpm eagle support (#18943 ) Signed-off-by: huangyuxiang03 <huangyx0321@gmail.com> Co-authored-by: huangyuxiang03 <huangyx0321@gmail.com>	2025-05-30 06:45:56 -07:00
Lukas Geiger	c3bb9f2331	[Model] Use in-place adds in SigLIP (#18922 ) Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>	2025-05-30 17:12:59 +08:00
Cyrus Leung	4f4a6b844a	[Deprecation] Remove mean pooling default for `Qwen2EmbeddingModel` (#18913 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-30 06:53:37 +00:00
Michael Goin	4d0a1541be	[Bugfix] Remove NVFP4 scales assertions to fix load_format=dummy (#18861 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-05-30 13:37:36 +08:00
iLeGend	3987e2ae96	[Model] Use AutoWeightsLoader for mamba2 (#18918 ) Signed-off-by: iLeGend <824040212@qq.com>	2025-05-30 04:50:10 +00:00


1928 Commits