Commit Graph

1539 Commits

Author SHA1 Message Date
Anna Shors
6eb745d9bd Add truncate arg to yarn to match openai implementation of gpt-oss (#28244)
Signed-off-by: ashors1 <ashors@nvidia.com>
Co-authored-by: Chen Zhang <zhangch99@outlook.com>
2025-11-20 18:53:50 +08:00
Wentao Ye
2c52c7fd9a [Bug] Fix torch dynamo warning `Dynamo detected a call to a functools.lru_cache` (#29038)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-11-20 16:52:23 +08:00
Shengliang Xu
a8c536829c Consolidate Nvidia ModelOpt quant config handling for all quantization methods (#28076)
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
2025-11-19 22:39:36 -05:00
Wentao Ye
5031cd5d55 [Refactor] Optimize select_experts (#28069)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-11-19 18:53:15 -05:00
JartX
8e38e99829 [Feature] EPLB on Qwen3VLMoe and CompressedTensorsWNA16MoEMethod (#28849)
2025-11-19 18:30:08 -05:00
Max Hu
cb0a7b4bea [Bugfix] Move flashinfer kernel check into `__init__` function of `FusedMoE` (#29018)
Signed-off-by: Max Hu <hyoung2991@gmail.com>
2025-11-19 21:54:15 +00:00
Yongye Zhu
88f5b19f0b [DeepSeek] Fix DeepSeek V3.2 Rope Embedding (#28968)
Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>
2025-11-19 16:30:04 -05:00
Shu Wang
613abb50d5 [MoE] Nvfp4 Masked Gemm: Add flashinfer grouped_gemm_nt_masked (#25990)
Signed-off-by: Shu Wang. <shuw@nvidia.com>
Signed-off-by: mgoin <mgoin64@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-11-19 13:29:06 -08:00
Wentao Ye
1607e664f0 [Bug] Fix Batch Invariant MLA test (#28967)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-11-19 21:18:32 +00:00
Qiu
2fd893b4ce [Feature] Prefill Context Parallel (PCP) basic support (#28718)
Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>
Signed-off-by: FENP <yuanyongjie.yyj@antgroup.com>
Signed-off-by: LookAround <lixushi@huawei.com>
Signed-off-by: Jingchun Gao <gaojingchun1@huawei.com>
Signed-off-by: zhenwenqi2024 <zhenwenqi_2022@qq.com>
Co-authored-by: FENP <yuanyongjie.yyj@antgroup.com>
Co-authored-by: LookAround <lixushi@huawei.com>
Co-authored-by: Jingchun Gao <gaojingchun1@huawei.com>
Co-authored-by: zhenwenqi2024 <zhenwenqi_2022@qq.com>
Co-authored-by: Jingchun Gao <63247409+gjc0824@users.noreply.github.com>
2025-11-19 15:52:44 -05:00
杰兮
9d2d561257 [Bugfix] Fix precision corruption when shared_experts_stream=None (#28942)
Signed-off-by: zhyajie <yajizhan@amd.com>
Co-authored-by: zhyajie <yajizhan@amd.com>
2025-11-19 19:30:57 +00:00
Robert Shaw
fe69f331f8 [Kernels] Improve H200 Fused MoE Config (#28992)
Signed-off-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
2025-11-19 19:23:54 +00:00
Harry Mellor
a8b70304d6 Update rope_scaling to rope_parameters in preparation for Transformers v5 (#28542)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-19 09:06:36 -08:00
Shanshan Shen
d44e9df7d4 [Model][Mamba] Add selector for mamba attention backend and make it pluggable for other devices (#26487)
Signed-off-by: shen-shanshan <467638484@qq.com>
2025-11-19 16:24:55 +00:00
Chen Bruce
da2f6800e0 [Feat][Perf] Enable deepep-low-latency with round-robin expert placement. (#28449)
Signed-off-by: bruceszchen <bruceszchen@tencent.com>
Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-19 13:46:24 +01:00
Xin Yang
468a8d72ba [Bugfix] Fix FusedMoEModularKernel for triton backend (#28913)
Signed-off-by: Xin Yang <xyangx@amazon.com>
2025-11-19 13:05:22 +08:00
Li, Jiang
20852c8f4c [CPU] Refactor CPU WNA16 (#28826)
Signed-off-by: jiang1.li <jiang1.li@intel.com>
2025-11-19 10:32:00 +08:00
tomeras91
1395461f5f [Hybrid][torch.compile] Refactor mamba2 forward to avoid obscuring linear projections under custom op (#28587)
Signed-off-by: Tomer Asida <57313761+tomeras91@users.noreply.github.com>
2025-11-18 16:49:36 -08:00
Varun Sundar Rabindranath
9912b8ccb8 [Build] Add OpenAI triton_kernels (#28788)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2025-11-18 16:45:20 -08:00
Isotr0py
e4bb2684bc [Models] Replace all nn.Conv2d with vLLM's Conv2dLayer (#28842)
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-11-18 18:56:04 +00:00
Luciano Martins
c2612371ad [Model] Add Gemma3 GGUF multimodal support (#27772)
Signed-off-by: Luciano Martins <lucianommartins@users.noreply.github.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Luciano Martins <lucianommartins@users.noreply.github.com>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-11-18 08:56:29 -08:00
Canlin Guo
b9489f51e1 [Model][Perf] Use cos and sin cache in QwenVL (#28798)
Signed-off-by: gcanlin <canlinguosdu@gmail.com>
2025-11-18 11:51:54 +00:00
Wentao Ye
3ddcf46011 [Refactor] Remove Unused Func in Batch Invariant (#28881)
Signed-off-by: yewentao256 <zhyanwentao@126.com>
2025-11-17 20:29:29 -08:00
xuebwang-amd
d0a73620cc [ROCm][Quantization] add apply_vllm_mapper in quark config for models like gpt-oss (#28638)
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-11-18 11:16:45 +08:00
Zhewen Li
f8b19c0ffd [Bugfix] Fix GPT-OSS on AMD after #28603 (#28816)
Signed-off-by: zhewenli <zhewenli@meta.com>
2025-11-17 13:15:26 -05:00
jiahanc
561253b37f [Performance][Fix] update nvfp4 code to support renorm routing (#28569)
Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-11-16 18:02:42 -08:00
amirkl94
03ee48111d Feature: Support Relu2 in FusedMoE fp8 cutlass path (#27261)
2025-11-16 13:39:44 -05:00
Zhewen Li
1ec978c209 [Kernel][Moe Configs] llama4 maverick fp8 moe config tp8 on mi325 (#28709)
Signed-off-by: Zhewen Li <zhewenli@meta.com>
2025-11-15 01:10:48 -08:00
Varun Sundar Rabindranath
6965ef436f [Performance][DeepGEMM] Estimate expected_m (#28694)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2025-11-15 13:52:14 +08:00
Thomas Parnell
e0c910bb89 [Hybrid] [Kernel] Fix chunk scan kernel when BLOCK_SIZE_DSTATE > 128 (#28295)
Signed-off-by: Thomas Parnell <tpa@zurich.ibm.com>
2025-11-14 22:55:42 +00:00
Alexander Matveev
e5c78956c0 [Bugfix] Fix incorrect use of hidden_states for shared_experts due to do_naive_dispatch_combine (#28740)
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
2025-11-14 14:13:46 -08:00
Andrey Khalyavin
fd4555089a [BugFix] Fix misprint introduced by modular_kernel refactoring. (#28728)
Signed-off-by: Andrey Khalyavin <halyavin@yandex-team.ru>
2025-11-14 10:58:18 -08:00
TJian
a425dc256e [Bugfix] [ROCm] [AITER]: Fix aiter block quant not compatible with torch compile dynamo (#28716)
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
2025-11-14 10:30:50 -08:00
Duncan Moss
3f8a874065 [Kernels] Enable FlashInfer FP8 Blockscale on SM90 (for TEP DSR1) (#27134)
Signed-off-by: Duncan Moss <djm.moss@gmail.com>
Co-authored-by: Robert Shaw <114415538+robertgshaw2-redhat@users.noreply.github.com>
Co-authored-by: Wentao Ye <44945378+yewentao256@users.noreply.github.com>
2025-11-14 08:02:44 -08:00
Shanshan Shen
41b92f7d38 [Model][MM] Extract conv layer as CustomOp (#28455)
Signed-off-by: shen-shanshan <467638484@qq.com>
Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn>
Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn>
2025-11-14 19:16:13 +08:00
haoyangli-amd
0b25498990 [Misc] add ignore mapper for quark quantization (#28275)
Signed-off-by: Haoyang Li <lihaoyang0109@gmail.com>
2025-11-14 05:56:35 +00:00
Hank_
4d5943bda6 [quantization][config] enable overriding existing quant_config (#28510)
Signed-off-by: Hank <hcc.mayday@gmail.com>
Co-authored-by: Michael Goin <mgoin64@gmail.com>
2025-11-14 01:24:10 +00:00
Varun Sundar Rabindranath
fe1cd7704d [Performance][B200] silu_mul_quant: pack scales in int32 (#28358)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2025-11-13 10:16:55 -08:00
zofia
c47b6c85ac [XPU] add sym params to IPEXConfig (#28611)
Signed-off-by: Zhu, Zufang <zufang.zhu@intel.com>
2025-11-13 11:35:04 +00:00
Zijing Liu
5e973209aa [BugFix] Fix type error when assigning a triton kernel tensor to a torch.nn.Parameter (#28603)
Signed-off-by: Zijing Liu <liuzijing2014@gmail.com>
2025-11-13 11:30:04 +00:00
Jiangyun Zhu
fa183e9271 [Bugfix] fix kimi-linear crash (#28445)
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
2025-11-13 07:59:58 +00:00
Lucia Fang
7e082bc14e Support DeepEP for Kimi-k2-thinking through enabling gemm selection for compressed-tensor marlin wna16 (#28574)
Signed-off-by: Lu Fang <fanglu@fb.com>
2025-11-12 21:40:45 -08:00
wangxiyuan
2dacd57394 [platform] Move get_cu_count to utils (#27005)
Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
2025-11-13 08:48:47 +08:00
Alexander Matveev
69d0e90313 [MoE][Kernel][Perf] Improve Shared Expert Stream Overlap (#28406)
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
2025-11-12 23:37:24 +00:00
vllmellm
d8140b9833 [ROCM] Fix ROCm warnings, environment flag access, and GEMM kernel naming for consistency in _aiter_ops.py (#28464)
Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>
2025-11-12 21:46:57 +00:00
Varun Sundar Rabindranath
74a9a9faad [Performance][B200] Fix deepgemm prologue (#27897)
Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>
2025-11-12 13:13:03 -08:00
PerryZhang01
a1e7fa362a [EPLB][ROCm]: support EPLB for ROCm backend (#27731)
Signed-off-by: Perry Zhang <perzhang@amd.com>
Co-authored-by: Perry Zhang <perzhang@amd.com>
2025-11-12 18:16:35 +00:00
Alexander Matveev
f76e85c299 [Performance][Hopper] Avoid M dim padding to 4x for most cases (due to cuda graphs paddings) (#28492)
Signed-off-by: Alexander Matveev <amatveev@redhat.com>
2025-11-12 10:51:43 -05:00
Harry Mellor
54aecd9ed5 Fix pre-commit (and XPU) on main (#28556)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
2025-11-12 06:13:41 -08:00
Lukas Geiger
ac0bb2c307 [Core] Cache vllm_is_batch_invariant (#28304)
Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
2025-11-12 05:03:01 +00:00