biondizzle/vllm - vllm

Author	SHA1	Message	Date
Amir Samani	030fc44914	use the same stream for cuda graph catpure and replay for NCCL (#29207 ) Signed-off-by: Amir Samani <asamani@nvidia.com> Signed-off-by: youkaichao <youkaichao@gmail.com> Co-authored-by: youkaichao <youkaichao@gmail.com>	2025-12-25 19:10:03 +08:00
Cyrus Leung	09dc7c690c	[Chore][1/2] Drop `v0.14` deprecations (#31285 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-24 09:54:01 -08:00
Kevin McKay	ec58c10ce1	[Misc] Fix quantization-related typos (#31116 ) Signed-off-by: c0de128 <kevin.mckay@outlook.com>	2025-12-21 21:13:48 -08:00
Wentao Ye	3bd8335bd0	[Refactor] Refactor for `DeepGemmQuantScaleFMT` using cache (#30898 ) Signed-off-by: yewentao256 <zhyanwentao@126.com>	2025-12-19 13:50:39 -07:00
Cyrus Leung	2497228ad4	[Chore] Factor out logic for requesting initial memory (#30868 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-17 07:32:17 -08:00
danielafrimi	7b966ae2ba	[Fix]Load kv-cache dtype from hf_quant_config.json automatically (fix for reverted PR) (#30785 ) Signed-off-by: <> Co-authored-by: root <root@gpu-937.slurm-workers-slurm.slurm.svc.cluster.local>	2025-12-17 01:56:38 -08:00
Ye (Charlotte) Qi	a100152288	[Kernels][FI] Skip trtllm attention when num_kv_heads=1 (#30842 ) Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>	2025-12-17 01:54:21 -08:00
jiahanc	254a7f8fd6	[Perf] Do FP4 quant before All gather on flashinfer trtllmgen MOE (#30014 ) Signed-off-by: jiahanc <173873397+jiahanc@users.noreply.github.com>	2025-12-16 13:01:48 -08:00
Robert Shaw	e2ed238885	Revert "[Fix]Load kv-cache dtype from hf_quant_config.json automatically" (#30653 )	2025-12-14 19:33:41 -05:00
ElizaWszola	994acec0cc	[Bugfix] Fix fusion for VL models (#30244 ) Signed-off-by: ElizaWszola <ewszola@redhat.com>	2025-12-14 21:22:37 +08:00
Roberto L. Castro	4fa7ce46f3	[Feature] Add SM103 (Blackwell Ultra) Support to vLLM (#30484 ) Signed-off-by: LopezCastroRoberto <robertol.c510@gmail.com> Signed-off-by: Roberto L. Castro <38211239+LopezCastroRoberto@users.noreply.github.com> Co-authored-by: youkaichao <youkaichao@gmail.com>	2025-12-12 19:34:23 -08:00
danielafrimi	6ec0d8dbe4	[Fix]Load kv-cache dtype from hf_quant_config.json automatically (#29980 ) Signed-off-by: Daniel Afrimi <dafrimi@nvidia.com>	2025-12-12 11:27:47 -08:00
Po-Han Huang (NVIDIA)	eea41804a4	[bug] Fix "Current vLLM config is not set." warnings when FlashInfer attention is used (#30241 ) Signed-off-by: Po-Han Huang <pohanh@nvidia.com>	2025-12-10 11:18:51 -08:00
Zhewen Li	ae339b1a67	[Bugfix] Fix DeepGEMM after #29546 (#30267 ) Signed-off-by: zhewenli <zhewenli@meta.com> Signed-off-by: Zhewen Li <zhewenli@meta.com>	2025-12-09 01:05:27 +00:00
wang.yuqi	2e660c2434	[Frontend] Binary embedding response does not return metadata by setting encoding_format to bytes_only. (#30249 ) Signed-off-by: wang.yuqi <yuqi.wang@daocloud.io> Signed-off-by: wang.yuqi <noooop@126.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-12-08 12:01:21 +00:00
ElizaWszola	af0444bf40	[Performance] Fused blockwise quant RMS norm (#27883 ) Signed-off-by: ElizaWszola <ewszola@redhat.com> Signed-off-by: yewentao256 <zhyanwentao@126.com> Co-authored-by: yewentao256 <zhyanwentao@126.com>	2025-12-07 16:38:04 +00:00
Yanan Cao	cbedb703cc	[Frontend] Remove confusing -O.xx flag error (#30169 ) Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>	2025-12-07 02:53:42 +00:00
Matthew Bonanni	66e674cdd5	[Attention][UX][1/N] Add AttentionConfig and change attention env vars to CLI arguments (#26315 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com>	2025-12-05 09:48:43 -08:00
Max Hu	c2894d3883	[Feature] Add Layer-wise NVTX Support (#29990 ) Signed-off-by: Max Hu <hyoung2991@gmail.com> Signed-off-by: Max Hu <maxhu@nvidia.com> Co-authored-by: Max Hu <maxhu@nvidia.com>	2025-12-05 11:20:07 +00:00
Yanan Cao	62b3333448	[Frontend] Remove deprecated -O.xx flag (#29991 ) Signed-off-by: Yanan Cao <gmagogsfm@gmail.com>	2025-12-05 00:47:22 -08:00
amitz-nv	6038b1b04b	[Frontend][Model] Add 'float16' to possible mamba cache dtype values, override mamba SSM cache dtype value for NemotronH (#29978 ) Signed-off-by: amitz-nv <203509407+amitz-nv@users.noreply.github.com>	2025-12-05 00:34:33 -08:00
Cyrus Leung	9ae2f60374	[Misc] Various cleanups for MM input processing (#29970 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-04 06:22:20 +00:00
Elizabeth Thomas	b5407869c8	[Bugfix] Respect VLLM_CONFIGURE_LOGGING value (#28671 ) Signed-off-by: Elizabeth Thomas <email2eliza@gmail.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Signed-off-by: Roger Wang <hey@rogerw.io> Signed-off-by: Jane Xu <janeyx@meta.com> Signed-off-by: Nick Hill <nhill@redhat.com> Signed-off-by: Johnny Yang <johnnyyang@google.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: bruceszchen <bruceszchen@tencent.com> Co-authored-by: Roger Wang <hey@rogerw.io> Co-authored-by: Jane (Yuan) Xu <31798555+janeyx99@users.noreply.github.com> Co-authored-by: Nick Hill <nhill@redhat.com> Co-authored-by: Johnny Yang <24908445+jcyang43@users.noreply.github.com>	2025-12-03 22:00:52 +00:00
Lumis Chen	9bcf92295a	[Core] Add xxHash as a high-performance hash option for accelerating prefix caching (#29163 ) Signed-off-by: LuminolT <lumischen01@gmail.com> Signed-off-by: Lumis Chen <lumischen01@gmail.com> Co-authored-by: Russell Bryant <rbryant@redhat.com>	2025-12-03 16:06:57 +00:00
Cyrus Leung	7675ba30de	[Misc] Remove redundant `ClassRegistry` (#29681 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Cyrus Leung <cyrus.tl.leung@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-11-28 15:24:47 -08:00
Yanan Cao	3461e7efd8	[Frontend] Remap -O to -cc commandline flag (#29557 ) Signed-off-by: Yanan Cao <gmagogsfm@gmail.com> Co-authored-by: Claude <noreply@anthropic.com>	2025-11-28 21:51:12 +00:00
Didier Durand	fae6943068	[Doc]: fixing typos in multiple files. (#29685 ) Signed-off-by: Didier Durand <durand.didier@gmail.com>	2025-11-28 08:41:41 -08:00
Cyrus Leung	953d9c820b	[mypy] Pass type checking for `vllm/utils` and `vllm/v1/pool` (#29666 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-28 20:40:47 +08:00
Cyrus Leung	33b06a6f24	[Misc] Remove redundant attention var constants (#29650 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-28 04:35:19 -08:00
Morrison Turnansky	0838b52e2e	[Frontend][torch.compile] CompilationConfig Overhaul (#20283 ): Set up -O infrastructure (#26847 ) Signed-off-by: morrison-turnansky <mturnans@redhat.com> Signed-off-by: adabeyta <aabeyta@redhat.com> Signed-off-by: Morrison Turnansky <mturnans@redhat.com> Co-authored-by: adabeyta <aabeyta@redhat.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-11-27 01:55:58 -08:00
Michael Goin	8d6a89dffd	[UX] Suppress gloo log spam (#29250 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-11-25 17:19:35 -08:00
George D. Torres	56531b79cc	[Misc] Add backup hash algorithm for FIPS constrained environments (#28795 ) Signed-off-by: George D. Torres <gdavtor@gmail.com> Signed-off-by: George D. Torres <41129492+geodavic@users.noreply.github.com> Signed-off-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: Russell Bryant <rbryant@redhat.com>	2025-11-26 00:50:22 +00:00
Harry Mellor	a1f2676879	Scheduled removal of `override_pooler_config` and `disable_log_requests` (#29402 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-11-25 16:08:57 +00:00
Nick Hill	db2906108a	[Misc] Streamline unique id generation (#29375 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-11-25 08:30:11 +00:00
Roger Wang	0ff70821c9	[Core] Deprecate `xformers` (#29262 ) Signed-off-by: Roger Wang <hey@rogerw.io>	2025-11-24 04:18:55 +00:00
Jialin Ouyang	30b9c67743	Revert "[Redo] #26368 (#28771 )" (#29121 ) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>	2025-11-20 21:27:45 -08:00
Pleaplusone	06c20c9904	[ROCm] Add AMD GPU support on Deepseek v3.2 and SparseMLA (#26670 ) Signed-off-by: ganyi <ygan@amd.com>	2025-11-20 02:54:01 -08:00
Nick Hill	9ccef8e333	[Misc] Colorize logs (#29017 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-11-19 19:26:04 -05:00
Jialin Ouyang	537cc635c7	[GC Debugger] Simply and improve GC Debugger Utils (#29029 ) Signed-off-by: Jialin Ouyang <Jialin.Ouyang@gmail.com>	2025-11-20 00:10:22 +00:00
Alexander Matveev	3aaa94ac99	[Performance] Reduce DeepGEMM N dim restriction from 128 to 64 multiplier (#28687 ) Signed-off-by: Alexander Matveev <amatveev@redhat.com> Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: mgoin <mgoin64@gmail.com>	2025-11-19 15:47:13 -08:00
Shu Wang	613abb50d5	[MoE] Nvfp4 Masked Gemm: Add flashinfer grouped_gemm_nt_masked (#25990 ) Signed-off-by: Shu Wang. <shuw@nvidia.com> Signed-off-by: mgoin <mgoin64@gmail.com> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-11-19 13:29:06 -08:00
杰兮	9d2d561257	[Bugfix] Fix precision corruption when shared_experts_stream=None (#28942 ) Signed-off-by: zhyajie <yajizhan@amd.com> Co-authored-by: zhyajie <yajizhan@amd.com>	2025-11-19 19:30:57 +00:00
Varun Sundar Rabindranath	9912b8ccb8	[Build] Add OpenAI triton_kernels (#28788 ) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-11-18 16:45:20 -08:00
Vadim Gimpelson	173b356abf	[PERF] Remove TRTLLM Gen attn kernel limitation `max_seq_len <=131072` (#28755 ) Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>	2025-11-15 15:43:41 +05:30
Cyrus Leung	511a6b611d	[Config] Clean up SchedulerConfig initialization (#28665 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-14 22:41:02 +08:00
Jingchun Gao	4516d44b7f	[DCP] Support Decode Context Parallel (DCP) for GQA with Flashinfer (#25438 ) Signed-off-by: gaojc <1055866782@qq.com> Signed-off-by: Jingchun Gao <gaojingchun1@huawei.com> Signed-off-by: Jingchun Gao <63247409+gjc0824@users.noreply.github.com> Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com> Co-authored-by: gaojingchun (A) <g00955623@china.huawei.com> Co-authored-by: Jingchun Gao <gaojingchun1@huawei.com> Co-authored-by: QiuChunshuo <qiuchunshuo@huawei.com>	2025-11-14 11:24:10 +00:00
Cyrus Leung	01bea115c4	[Misc] Remove `warn_for_unimplemented_methods` (#28613 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-11-14 11:10:10 +08:00
Varun Sundar Rabindranath	fe1cd7704d	[Performance][B200] silu_mul_quant: pack scales in int32 (#28358 ) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-11-13 10:16:55 -08:00
Pleaplusone	7dca0c90cb	[BugFix][ROCm] Fix `get_cu_count` missing variable error (#28608 ) Signed-off-by: ganyi <ygan@amd.com>	2025-11-13 05:18:56 +00:00
Michael Goin	a543e678b4	[Bugfix] Fix SM100 gpt-oss regression due to faulty attn sink support (#28561 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-11-12 19:40:59 -07:00

221 Commits