biondizzle/vllm - vllm - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Maral	2e9034c998	[W8A8 Block Linear Refactor][2/N] Remove W8A8Fp8BlockLinearOp and adopt Fp8 block linear kernel selections. (#33892 ) Signed-off-by: maral <maralbahari.98@gmail.com> Signed-off-by: Maral <maralbahari.98@gmail.com>	2026-04-09 08:50:39 +08:00
wliao2	4dfad17ed1	replace cuda_device_count_stateless() to current_platform.device_count() (#37841 ) Signed-off-by: Liao, Wei <wei.liao@intel.com> Signed-off-by: wliao2 <wei.liao@intel.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>	2026-03-31 22:32:54 +08:00
Andreas Karatzas	4ed51308c8	[CI] Fix GPU memory leak when RemoteOpenAIServer fails to start in __init__ (#37230 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-17 09:08:08 -07:00
Andreas Karatzas	d4c57863f7	[ROCm][CI] Fix engine teardown and text normalization to stabilize voxtral test (#37138 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-16 04:49:31 +00:00
Nick Hill	b373b5102a	[Tests] Shutdown test `RemoteVLLMServer` cleanly (#36950 ) Recent PR #33949 changed the teardown logic of the RemoteVLLMServer test utility class to send SIGTERM to all vllm (sub)processes at once, which breaks the clean/coordinated shutdown logic that assumes only the top-level process will receive a signal (for example when running in a container that's shut down). This caused a bunch of errors and stacktraces in some test logs, even though those tests still pass. We should still attempt a normal shutdown and only kill other procs if they are still running after a few seconds. Example: tests/v1/distributed/test_external_lb_dp.py::test_external_lb_completion_streaming Signed-off-by: Nick Hill <nickhill123@gmail.com>	2026-03-13 07:32:55 +00:00
Sage	802f306cd1	[Tests] Skip model weight download for render-only test server (#36813 ) Signed-off-by: Sage Ahrac <sagiahrak@gmail.com>	2026-03-12 06:24:42 +00:00
Andreas Karatzas	1e0f917b34	[ROCm][CI] Fix logprob divergence for TitanML/tiny-mixtral under AITER rms_norm (#36101 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-09 12:07:44 -05:00
Andreas Karatzas	807d680337	[ROCm][CI] Fix tool use test stability - disable skinny GEMM, prefix caching, eliminate batch variance (#35553 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-03-06 15:15:12 +08:00
Hyunkyun Moon	bc6be89d16	[Frontend] Add vllm launch command for GPU-less preprocessing serving (#34551 ) Signed-off-by: HyunKyun Moon <mhg5303@gmail.com>	2026-03-04 18:41:52 +00:00
Andreas Karatzas	f5d1281c9d	[ROCm][CI] Expose tests to AMD production CI and fix amdsmi heap corruption (#35071 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-02-28 13:57:31 +08:00
Itay Alroy	dea268336f	[1/N] Elastic EP Milestone 2 (#34861 ) Signed-off-by: Yongji Wu <wuyongji317@gmail.com> Signed-off-by: Itay Alroy <ialroy@nvidia.com> Signed-off-by: Tyler Michael Smith <tlrmchlsmth@gmail.com> Signed-off-by: Ron Tourgeman <rtourgeman@nvidia.com> Co-authored-by: Yongji Wu <wuyongji317@gmail.com> Co-authored-by: Tyler Michael Smith <tlrmchlsmth@gmail.com> Co-authored-by: Ron Tourgeman <rtourgeman@nvidia.com>	2026-02-28 04:46:42 +00:00
Andreas Karatzas	9571e99945	[ROCm][CI] Extending attention backend coverage for Eagle spec decode tests (#35265 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-02-25 14:16:18 -08:00
BadrBasowid	6af03f2394	[Refactor] [1/N] Reorganize kernel abstraction directory (#34055 ) Signed-off-by: BadrBasowid <badr.basowid@gmail.com> Co-authored-by: vllmellm <vllm.ellm@embeddedllm.com> Co-authored-by: TJian <tunjian.tan@embeddedllm.com>	2026-02-24 06:47:22 +00:00
Andreas Karatzas	991d6bff38	[CI][MCP][Harmony] Heavy refactoring Harmony & MCP response tests and stabilizing with deterministic test infrastructure (#33949 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-02-20 20:03:32 -08:00
Micah Williamson	f5432e35a3	[ROCm][CI] Loosen RemoteOpenAIServer Startup Timeout (#34922 ) Signed-off-by: Micah Williamson <micah.williamson@amd.com>	2026-02-20 05:37:49 +00:00
Andreas Karatzas	fb1270f1f8	[CI][Bugfix]: return McpCall for built-in MCP tools in non-streaming mode (#32762 ) Signed-off-by: Andreas Karatzas <akaratza@amd.com>	2026-02-05 11:14:06 +08:00
Harry Mellor	2eb673a088	Add flake8-implicit-str-concat rules to Ruff (#33191 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2026-01-28 04:56:10 +00:00
7. Sun	0ccecf8833	[Tests] Standardize RNG seed utility across test files (#32982 ) Signed-off-by: 7. Sun <jhao.sun@gmail.com>	2026-01-24 06:47:14 +00:00
vllmellm	148117ea2e	[Refactor] Make FP8 Linear Ops use kernel abstraction (#27814 ) Signed-off-by: vllmellm <vllm.ellm@embeddedllm.com>	2026-01-20 14:48:20 +08:00
Lucas Wilkinson	6cdf015c3c	[Misc] Fix `Current vLLM config is not set.` warnings, assert to avoid issues in the future (#31747 ) Signed-off-by: Lucas Wilkinson <lwilkins@redhat.com> Signed-off-by: Lucas Wilkinson <LucasWilkinson@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2026-01-08 15:20:49 -08:00
Vadim Gimpelson	bc0a5a0c08	[CI] Add Qwen3-Next-FP8 to Blackwell model tests (#31049 ) Signed-off-by: Vadim Gimpelson <vadim.gimpelson@gmail.com>	2025-12-23 17:21:50 -08:00
Cyrus Leung	7e24e5d4d6	[Deprecation] Remove deprecated task, seed and MM settings (#30397 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-10 19:59:39 -08:00
Charlie Fu	6af70e11a0	[ROCm][CI] Fix test_max_len.py for Rocm (#29916 ) Signed-off-by: charlifu <charlifu@amd.com> Signed-off-by: Charlie Fu <Charlie.Fu@amd.com>	2025-12-08 16:58:30 -05:00
Cyrus Leung	653591d5e7	[Chore] Move tokenizer initialization methods (#29793 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-12-02 13:33:37 +08:00
Vensen	66b5840287	[Bugfix][sleepmode][fp8 kv cache]: Fix FP8 KV cache + sleep(level=2) gibberish output (#28783 ) Signed-off-by: vensen <vensenmu@gmail.com> Co-authored-by: TJian <tunjian.tan@embeddedllm.com>	2025-11-30 14:24:25 +08:00
Kevin H. Luu	c64c0b78de	[chore] Move the rest of wikimedia url to S3 (#28921 ) Signed-off-by: Kevin H. Luu <khluu000@gmail.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-11-18 09:44:18 -08:00
Benjamin Bartels	1e88fb751b	Adds anthropic /v1/messages endpoint to openai api_server (#27882 ) Signed-off-by: bbartels <benjamin@bartels.dev> Signed-off-by: Benjamin Bartels <benjamin@bartels.dev>	2025-11-01 12:45:42 -07:00
Cyrus Leung	7c2bdb83dc	[Misc] Clean up utils (#27552 ) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>	2025-10-27 09:05:40 +00:00
Jiangyun Zhu	29c9cb8007	[CI] Add tests for cudagraph (#27391 ) Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>	2025-10-25 02:37:33 +00:00
RED	c9461e05a4	Support Anthropic API /v1/messages Endpoint (#22627 ) Signed-off-by: liuli <ll407707@alibaba-inc.com> Co-authored-by: liuli <ll407707@alibaba-inc.com> Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk> Co-authored-by: Michael Goin <mgoin64@gmail.com>	2025-10-22 09:13:18 -07:00
iAmir97	7a6c8c3fa1	[Chore] Separate out `vllm.utils.network_utils` (#27164 ) Signed-off-by: iAmir97 <Amir.balwel@embeddedllm.com> Co-authored-by: iAmir97 <Amir.balwel@embeddedllm.com>	2025-10-19 03:06:32 -07:00
Isotr0py	6ac5e06f7c	[Chore] Clean up pytorch helper functions in `vllm.utils` (#26908 ) Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: isotr0py <2037008807@qq.com>	2025-10-18 09:48:22 -07:00
iAmir97	1d165d6d85	[Chore] Separate out `vllm.utils.mem_utils` (#27143 ) Signed-off-by: iAmir97 <Amir.balwel@embeddedllm.com> Signed-off-by: iAmir97 <71513472+iAmir97@users.noreply.github.com> Co-authored-by: iAmir97 <Amir.balwel@embeddedllm.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2025-10-18 10:06:59 +00:00
Luka Govedič	bd7157a071	[torch.compile] Enable attention and allreduce fusion without custom ops enabled (#24604 ) Signed-off-by: Luka Govedič <lgovedic@redhat.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-10-17 08:10:23 -06:00
Ye (Charlotte) Qi	d32c611f45	[CI/Build] Use 127.0.0.1 instead of localhost in utils (#26750 ) Signed-off-by: Ye (Charlotte) Qi <yeq@meta.com>	2025-10-14 07:04:00 +00:00
Harry Mellor	8fcaaf6a16	Update `Optional[x]` -> `x | None` and `Union[x, y]` to `x | y` (#26633 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-12 09:51:31 -07:00
Harry Mellor	d6953beb91	Convert formatting to use `ruff` instead of `yapf` + `isort` (#26247 ) Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-10-05 07:06:22 -07:00
Kunshang Ji	143844fa43	[XPU]Fix xpu spec decoding UTs, avoid using cuda graph (#25847 ) Signed-off-by: Kunshang Ji <kunshang.ji@intel.com>	2025-09-29 05:15:10 +00:00
Michael Goin	f708bd4904	[CI] Add E2E Blackwell Quantized MoE Test (#25723 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-09-26 12:23:00 -07:00
Matthew Bonanni	3468f17ebe	[V0 deprecation] Remove _VLLM_V1 suffixes from attention backend names (#25489 ) Signed-off-by: Matthew Bonanni <mbonanni@redhat.com> Signed-off-by: Matthew Bonanni <mbonanni001@gmail.com>	2025-09-25 17:37:50 +00:00
Woosuk Kwon	eb68c2dcd9	[CI] Revert back prepare_prompts and check_answers (#25087 ) Signed-off-by: Woosuk Kwon <woosuk.kwon@berkeley.edu>	2025-09-17 11:03:16 -07:00
Nick Hill	4db4426404	[CI] Fail subprocess tests with root-cause error (#23795 ) Signed-off-by: Nick Hill <nhill@redhat.com>	2025-09-10 13:53:21 -07:00
nvjullin	37241077d5	[Misc] Removed force_fp8_e4m3fnuz from FP8LinearOp (#23725 ) Signed-off-by: Julien Lin <jullin@nvidia.com> Signed-off-by: Luka Govedič <ProExpertProg@users.noreply.github.com> Co-authored-by: Luka Govedič <ProExpertProg@users.noreply.github.com>	2025-09-04 09:25:40 -04:00
Michael Goin	906e461ed6	[CI Fix] Pin deepep and pplx tags in tools/ep_kernels/, gate multigpu tests (#23568 ) Signed-off-by: mgoin <mgoin64@gmail.com>	2025-08-25 18:29:00 -07:00
bigmoyan	582bbe6bd7	[Fix] correct tool_id for kimi-k2 when use tool_choice=required (#21259 ) Co-authored-by: wangzhengtao <wangzhengtao@msh.team>	2025-08-20 12:59:54 -07:00
afeldman-nm	bf7f470b22	[V1] Logits processors extensibility (#19912 ) Signed-off-by: Andrew Feldman <afeldman@redhat.com> Signed-off-by: Andrew Feldman <afeld2012@gmail.com> Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com> Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Nick Hill <nhill@redhat.com> Co-authored-by: Andrew Feldman <afeld2012@gmail.com> Co-authored-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>	2025-08-16 12:59:17 -07:00
yyweiss	baece8c3d2	[Frontend] Add unix domain socket support (#18097 ) Signed-off-by: <yyweiss@gmail.com> Signed-off-by: yyw <yyweiss@gmail.com>	2025-08-08 16:23:44 -07:00
TJian	1ee5ead5f8	[ROCm] [V1] [SpecDec] Enable Speculative Decoding on ROCm V1 Engine (#21496 ) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>	2025-08-07 19:13:17 -07:00
Ilya Markov	6e672daf62	Add FlashInfer allreduce RMSNorm Quant fusion (#21069 ) Signed-off-by: ilmarkov <imarkov@redhat.com> Signed-off-by: ilmarkov <markovilya197@gmail.com> Co-authored-by: ilmarkov <imarkov@redhat.com>	2025-07-31 13:58:38 -07:00
Dmitry Rogozhkin	e760fcef22	[XPU] Use spawn with XPU multiprocessing (#20649 ) Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>	2025-07-09 00:34:28 -07:00

123 Commits