[ROCm][CI] Mark gemma3 as large GPU test to avoid OOM on MI250 (#37610)
Signed-off-by: Andreas Karatzas <akaratza@amd.com>
@@ -39,8 +39,7 @@
#####################################################################################################################################
# #
# IMPORTANT: #
# * Currently AMD CI has MI300 agents, MI325 agents, and MI355 agents. Of those, AMD is using mostly MI325 and MI355. AMD team #
# is actively working on enabling more MI300 machines. All upcoming feature improvements are tracked in: #
# * Currently AMD CI has MI250 agents, MI325 agents, and MI355 agents. All upcoming feature improvements are tracked in: #
# https://github.com/vllm-project/vllm/issues/34994 #
# #
#-----------------------------------------------------------------------------------------------------------------------------------#
@@ -49,13 +48,15 @@
# * [Pytorch Nightly Dependency Override Check]: if this test fails, it means the nightly torch version is not compatible with #
# some of the dependencies. Please check the error message and add the package to #
# whitelist in `/vllm/tools/pre_commit/generate_nightly_torch_test.py`. #
# * [Entrypoints Integration Test (LLM)]: #
# * [Entrypoints Integration (LLM)]: #
# - {`pytest -v -s entrypoints/llm/test_generate.py`}: It needs a clean process #
# - {`pytest -v -s entrypoints/offline_mode`}: Needs to avoid interference with other tests #
# * [V1 Test e2e + engine]: The test uses 4 GPUs, but we schedule it on 8-GPU machines for stability. See discussion here: #
# https://github.com/vllm-project/vllm/pull/31040 #
# * [V1 others]: #
# - Split the tests to avoid interference #
# * [Engine / Engine (1 GPU) / e2e Scheduling / e2e Core / V1 e2e / Spec Decode / V1 Sample + Logits / V1 Core + KV + Metrics]: #
# - Previously a single "V1 Test e2e + engine" step, now split across multiple groups. #
# - V1 e2e (2/4 GPUs) uses 4 GPUs but is scheduled on 8-GPU machines for stability. See: #
# https://github.com/vllm-project/vllm/pull/31040 #
# * [V1 Sample + Logits / V1 Core + KV + Metrics / V1 others (CPU)]: #
# - Previously a single "V1 others" step, now split to avoid interference. #
# - Integration test for streaming correctness (requires special branch for __harness__ lib). #
# * [V1 others (CPU)]: Split the tests to avoid interference #
# * [PyTorch Compilation Unit Tests]: Run unit tests defined directly under `compile/`, not including subdirectories, which #
@@ -83,9 +84,9 @@
# run plamo2 model in vLLM. #
# * [Language Models Test (Extended Generation)]: Install fast path packages for testing against transformers (mamba, conv1d) #
# and to run plamo2 model in vLLM. #
# * [Multi-Modal Models (Standard)]: #
# * [Multi-Modal Models (Standard) 1-4]: #
# - Do NOT remove `VLLM_WORKER_MULTIPROC_METHOD=spawn` setting as ROCm requires this for certain models to function. #
# * [Transformers Nightly Models Test]: Whisper needs `VLLM_WORKER_MULTIPROC_METHOD=spawn` to avoid deadlock. #
# * [Transformers Nightly Models]: Whisper needs `VLLM_WORKER_MULTIPROC_METHOD=spawn` to avoid deadlock. #
# * [Plugin Tests (2 GPUs)]: #
# - {`pytest -v -s entrypoints/openai/test_oot_registration.py`}: It needs a clean process #
# - {`pytest -v -s models/test_oot_registration.py`}: It needs a clean process #
@@ -94,11 +95,11 @@
# - There is some Tensor Parallelism related processing logic in LoRA that requires multi-GPU testing for validation. #
# - {`pytest -v -s -x lora/test_gptoss_tp.py`}: Disabled for now because MXFP4 backend on non-cuda platform doesn't support #
# LoRA yet. #
# * [Distributed Tests (GPU_TAG)]: Don't test llama model here, it seems hf implementation is buggy. See: #
# https://github.com/vllm-project/vllm/pull/5689 #
# * [Distributed Tests (GPU_TAG)]: Some old E2E tests were removed in https://github.com/vllm-project/vllm/pull/33293 in #
# favor of new tests in fusions_e2e. We avoid replicating the new jobs in #
# this file as it's deprecated. #
# * [Distributed Tests (NxGPUs)(HW-TAG)]: Don't test llama model here, it seems hf implementation is buggy. See: #
# https://github.com/vllm-project/vllm/pull/5689 #
# * [Distributed Tests (NxGPUs)(HW-TAG)]: Some old E2E tests were removed in https://github.com/vllm-project/vllm/pull/33293 #
# in favor of new tests in fusions_e2e. We avoid replicating the new jobs in #
# this file as it's deprecated. #
# #
#####################################################################################################################################
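The `VLLM_WORKER_MULTIPROC_METHOD=spawn` requirement called out in the comment block can be sketched as below. This is a simplified stand-in, not vLLM's actual worker setup: the env-var name matches the one in the diff, but the context-building code around it is illustrative.

```python
import multiprocessing as mp
import os

# GPU runtimes such as ROCm's HIP generally do not survive fork() once the
# device has been initialized, which is why the CI steps above must run
# workers with the "spawn" start method instead of the default "fork".
os.environ.setdefault("VLLM_WORKER_MULTIPROC_METHOD", "spawn")

# Read the env var back and build a matching multiprocessing context;
# processes created via ctx.Process then use that start method.
method = os.environ["VLLM_WORKER_MULTIPROC_METHOD"]
ctx = mp.get_context(method)
```

With "spawn", each worker starts from a fresh interpreter, so no inherited GPU state can deadlock the child process.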
@@ -220,7 +220,10 @@ VLM_TEST_SETTINGS = {
|
||||
vllm_runner_kwargs={
|
||||
"model_impl": "transformers",
|
||||
},
|
||||
marks=[pytest.mark.core_model],
|
||||
marks=[
|
||||
pytest.mark.core_model,
|
||||
*([large_gpu_mark(min_gb=80)] if current_platform.is_rocm() else []),
|
||||
],
|
||||
),
|
||||
"idefics3-transformers": VLMTestInfo(
|
||||
models=["HuggingFaceTB/SmolVLM-256M-Instruct"],
|
||||
|
||||