biondizzle/vllm - vllm - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Michael Goin	eb4205fee5	[UX] Integrate DeepGEMM into vLLM wheel via CMake (#37980 ) Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Claude <noreply@anthropic.com>	2026-04-08 18:56:32 -07:00
wangxiyuan	35141a7eed	[Misc]Update gitignore (#37863 ) Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>	2026-03-23 01:14:10 -07:00
Harry Mellor	e39257a552	Add `AGENTS.md` (#36877 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-03-12 10:20:50 -07:00
Lucas Wilkinson	8b5014d3dd	[Attention] FA4 integration (#32974 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: Matthew Bonanni <mbonanni@redhat.com> Co-authored-by: Tyler Michael Smith <tyler@neuralmagic.com>	2026-03-01 23:44:57 +00:00
Li, Jiang	05339a7b20	[Bugfix][CPU] Fix llama4 inference on CPU (#34321 ) Signed-off-by: jiang1.li <jiang1.li@intel.com>	2026-02-11 19:07:23 +08:00
Vadim Gimpelson	70917b1c55	[MISC] Add .cursor to .gitignore (#32868 ) Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>	2026-01-22 17:27:13 +00:00
Lucas Wilkinson	889722f3bf	[FlashMLA] Update FlashMLA to expose new arguments (#32810 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2026-01-21 22:02:39 -07:00
Chang Su	791b2fc30a	[grpc] Support gRPC server entrypoint (#30190 ) Signed-off-by: Chang Su <chang.s.su@oracle.com> Signed-off-by: njhill <nickhill123@gmail.com> Signed-off-by: Nick Hill <nickhill123@gmail.com> Co-authored-by: njhill <nickhill123@gmail.com> Co-authored-by: Simon Mo <simon.mo@hey.com>	2026-01-07 23:24:46 -08:00
Varun Sundar Rabindranath	9912b8ccb8	[Build] Add OpenAI triton_kernels (#28788 ) Signed-off-by: Varun Sundar Rabindranath <vsundarr@redhat.com> Co-authored-by: Varun Sundar Rabindranath <vsundarr@redhat.com>	2025-11-18 16:45:20 -08:00
nadavkluger	611c86ea3c	Added disable rule to track files under benchmarks/lib (#28048 ) Signed-off-by: Nadav Kluger <nadav.k@fmr.ai>	2025-11-04 18:18:43 +00:00
Andrew Xia	bfe0b4bd2a	[ez] add uv lock to gitignore (#27212 ) Signed-off-by: Andrew Xia <axia@fb.com> Co-authored-by: Andrew Xia <axia@fb.com>	2025-10-21 00:37:44 +00:00
Roger Wang	ccb97338af	[Misc] Add Codex settings to gitignore (#24493 ) Signed-off-by: Roger Wang <hey@rogerw.me> Co-authored-by: Roger Wang <hey@rogerw.me>	2025-09-09 01:25:44 -07:00
Ye (Charlotte) Qi	45c9cb5835	[Misc] Add claude settings to gitignore (#24492 ) Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>	2025-09-09 01:14:45 -07:00
Jee Jee Li	48f4636927	[Misc] Ignore ep_kernels_workspace (#22807 ) Signed-off-by: Jee Jee Li <pandaleefree@gmail.com>	2025-08-15 05:58:03 -07:00
Harry Mellor	767e63b860	[Docs] Improve docs navigation (#22720 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-12 04:25:55 -07:00
Yongye Zhu	e789cad6b8	[gpt-oss] triton kernel mxfp4 (#22421 ) Signed-off-by: <zyy1102000@gmail.com> Signed-off-by: Yongye Zhu <zyy1102000@gmail.com>	2025-08-08 08:24:07 -07:00
Harry Mellor	3482fd7e4e	[Doc] Add engine args back in to the docs (#20674 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-07-10 08:02:40 -07:00
Ning Xie	2f1c19b245	[CI] change spell checker from codespell to typos (#18711 ) Signed-off-by: Andy Xie <andy.xning@gmail.com>	2025-06-11 19:57:10 -07:00
Cyrus Leung	82e2339b06	[Doc] Move examples and further reorganize user guide (#18666 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-05-26 07:38:04 -07:00
Harry Mellor	a1fe24d961	Migrate docs from Sphinx to MkDocs (#18145 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-23 02:09:53 -07:00
Harry Mellor	d6484ef3c3	Add full API docs and improve the UX of navigating them (#17485 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-05-03 19:42:43 -07:00
Lucas Wilkinson	d8bccde686	[BugFix] Fix vllm_flash_attn install issues (#17267 ) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> Co-authored-by: Jee Jee Li <pandaleefree@gmail.com> Co-authored-by: Aaron Pham <contact@aarnphm.xyz>	2025-04-27 17:27:56 -07:00
Christian Heimes	65e262b93b	Fix Python packaging edge cases (#17159 ) Signed-off-by: Christian Heimes <christian@python.org>	2025-04-26 06:15:07 +08:00
DefTruth	a6481525b8	[misc] ignore marlin_moe_wna16 local gen codes (#16760 ) Signed-off-by: DefTruth <qiustudent_r@163.com>	2025-04-17 17:15:14 +08:00
Lucas Wilkinson	dccf535f8e	[V1] Enable V1 Fp8 cache for FA3 in the oracle (#15191 ) Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com> Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com>	2025-03-23 15:07:04 -07:00
Aaron Pham	80e9afb5bc	[V1][Core] Support for Structured Outputs (#12388 ) Signed-off-by: Aaron Pham <contact@aarnphm.xyz> Signed-off-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: Russell Bryant <rbryant@redhat.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: Nick Hill <nhill@redhat.com>	2025-03-07 07:19:11 -08:00
Harry Mellor	5950f555a1	[Doc] Group examples into categories (#11782 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-01-08 09:20:12 +08:00
Rafael Vasquez	32aa2059ad	[Docs] Convert rST to MyST (Markdown) (#11145 ) Signed-off-by: Rafael Vasquez <rafvasq21@gmail.com>	2024-12-23 22:35:38 +00:00
Russell Bryant	3be5b26a76	[CI/Build] Add shell script linting using shellcheck (#7925 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2024-11-07 18:17:29 +00:00
Russell Bryant	e0dbdb013d	[CI/Build] Add linting for github actions workflows (#7876 ) Signed-off-by: Russell Bryant <rbryant@redhat.com>	2024-10-07 21:18:10 +00:00
Tyler Michael Smith	2e7fe7e79f	[Build/CI] Set FETCHCONTENT_BASE_DIR to one location for better caching (#8930 )	2024-09-29 03:13:01 +00:00
Daniele	2467b642dd	[CI/Build] fix setuptools-scm usage (#8771 )	2024-09-24 12:38:12 -07:00
Daniele	ee5f34b1c2	[CI/Build] use setuptools-scm to set __version__ (#4738 ) Co-authored-by: youkaichao <youkaichao@126.com>	2024-09-23 09:44:26 -07:00
Luka Govedič	71c60491f2	[Kernel] Build flash-attn from source (#8245 )	2024-09-20 23:27:10 -07:00
Lucas Wilkinson	5288c06aa0	[Kernel] (1/N) Machete - Hopper Optimized Mixed Precision Linear Kernel (#7174 )	2024-08-20 07:09:33 -06:00
Michael Goin	855866caa9	[Kernel] Add tuned triton configs for ExpertsInt8 (#7601 )	2024-08-16 11:37:01 -07:00
Michael Goin	111fc6e7ec	[Misc] Add generated git commit hash as `vllm.__commit__` (#6386 )	2024-07-12 22:52:15 +00:00
Harry Mellor	3d925165f2	Add example scripts to documentation (#4225 ) Co-authored-by: Harry Mellor <hmellor@oxts.com>	2024-04-22 16:36:54 +00:00
Adrian Abeyta	2ff767b513	Enable scaled FP8 (e4m3fn) KV cache on ROCm (AMD GPU) (#3290 ) Co-authored-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com> Co-authored-by: HaiShaw <hixiao@gmail.com> Co-authored-by: AdrianAbeyta <Adrian.Abeyta@amd.com> Co-authored-by: Matthew Wong <Matthew.Wong2@amd.com> Co-authored-by: root <root@gt-pla-u18-08.pla.dcgpu> Co-authored-by: mawong-amd <156021403+mawong-amd@users.noreply.github.com> Co-authored-by: ttbachyinsda <ttbachyinsda@outlook.com> Co-authored-by: guofangze <guofangze@kuaishou.com> Co-authored-by: Michael Goin <mgoin64@gmail.com> Co-authored-by: jacobthebanana <50071502+jacobthebanana@users.noreply.github.com> Co-authored-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2024-04-03 14:15:55 -07:00
Woosuk Kwon	1cb0cc2975	[FIX] Make `flash_attn` optional (#3269 )	2024-03-08 10:52:20 -08:00
Woosuk Kwon	2daf23ab0c	Separate attention backends (#3005 )	2024-03-07 01:45:50 -08:00
Harry Mellor	2709c0009a	Support OpenAI API server in `benchmark_serving.py` (#2172 )	2024-01-18 20:34:08 -08:00
TJian	6ccc0bfffb	Merge EmbeddedLLM/vllm-rocm into vLLM main (#1836 ) Co-authored-by: Philipp Moritz <pcmoritz@gmail.com> Co-authored-by: Amir Balwel <amoooori04@gmail.com> Co-authored-by: root <kuanfu.liu@akirakan.com> Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com> Co-authored-by: kuanfu <kuanfu.liu@embeddedllm.com> Co-authored-by: miloice <17350011+kliuae@users.noreply.github.com>	2023-12-07 23:16:52 -08:00
Woosuk Kwon	e3e79e9e8a	Implement AWQ quantization support for LLaMA (#1032 ) Co-authored-by: Robert Irvine <robert@seamlessml.com> Co-authored-by: root <rirv938@gmail.com> Co-authored-by: Casper <casperbh.96@gmail.com> Co-authored-by: julian-q <julianhquevedo@gmail.com>	2023-09-16 00:03:37 -07:00
Zhuohan Li	2cf1a333b6	[Doc] Documentation for distributed inference (#261 )	2023-06-26 11:34:23 -07:00
Zhuohan Li	a255885f83	Add logo and polish readme (#156 )	2023-06-19 16:31:13 +08:00
Woosuk Kwon	376725ce74	[PyPI] Packaging for PyPI distribution (#140 )	2023-06-05 20:03:14 -07:00
Woosuk Kwon	19d2899439	Add initial sphinx docs (#120 )	2023-05-22 17:02:44 -07:00
Zhuohan Li	4858f3bb45	Add an option to launch cacheflow without ray (#51 )	2023-04-30 15:42:17 +08:00
Woosuk Kwon	84eee24e20	Collect system stats in scheduler & Add scripts for experiments (#30 )	2023-04-12 15:03:49 -07:00

1 2

52 Commits