(APIServer pid=1) INFO 04-15 22:38:39 [utils.py:299]
(APIServer pid=1) INFO 04-15 22:38:39 [utils.py:299] █ █ █▄ ▄█
(APIServer pid=1) INFO 04-15 22:38:39 [utils.py:299] ▄▄ ▄█ █ █ █ ▀▄▀ █ version 0.19.0
(APIServer pid=1) INFO 04-15 22:38:39 [utils.py:299] █▄█▀ █ █ █ █ model nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4
(APIServer pid=1) INFO 04-15 22:38:39 [utils.py:299] ▀▀ ▀▀▀▀▀ ▀▀▀▀▀ ▀ ▀
(APIServer pid=1) INFO 04-15 22:38:39 [utils.py:299]
(APIServer pid=1) INFO 04-15 22:38:39 [utils.py:233] non-default args: {'model_tag': 'nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4', 'enable_auto_tool_choice': True, 'tool_call_parser': 'qwen3_coder', 'host': '0.0.0.0', 'model': 'nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4', 'trust_remote_code': True, 'max_model_len': 1048576, 'enforce_eager': True, 'attention_backend': 'TRITON_ATTN', 'reasoning_parser': 'super_v3', 'reasoning_parser_plugin': '/opt/super_v3_reasoning_parser.py', 'tensor_parallel_size': 8, 'disable_custom_all_reduce': True, 'gpu_memory_utilization': 0.96, 'enable_prefix_caching': False, 'mamba_ssm_cache_dtype': 'float16', 'enable_chunked_prefill': True, 'disable_hybrid_kv_cache_manager': False, 'async_scheduling': True, 'max_cudagraph_capture_size': 128, 'kv_transfer_config': KVTransferConfig(kv_connector='LMCacheConnectorV1', engine_id='dea40998-1518-4361-a31f-884d3c1c1e74', kv_buffer_device='cuda', kv_buffer_size=1000000000.0, kv_role='kv_both', kv_rank=None, kv_parallel_size=1, kv_ip='127.0.0.1', kv_port=14579, kv_connector_extra_config={}, kv_connector_module_path=None, enable_permute_local_kv=False, kv_load_failure_policy='fail')}
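The non-default args above correspond roughly to a `vllm serve` invocation like the following. This is a reconstruction from the logged argument dict, not the exact command that was run; the flag spellings assume a recent vLLM CLI, and flags not shown in the log are omitted:

```shell
# Hypothetical reconstruction of the launch command from the logged non-default args.
vllm serve nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4 \
  --host 0.0.0.0 \
  --trust-remote-code \
  --max-model-len 1048576 \
  --enforce-eager \
  --tensor-parallel-size 8 \
  --disable-custom-all-reduce \
  --gpu-memory-utilization 0.96 \
  --enable-auto-tool-choice \
  --tool-call-parser qwen3_coder \
  --reasoning-parser super_v3 \
  --kv-transfer-config '{"kv_connector": "LMCacheConnectorV1", "kv_role": "kv_both", "kv_buffer_size": 1e9}'
```

Note that the log also shows `'disable_hybrid_kv_cache_manager': False`, which matters for the worker failures later in this log.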
(APIServer pid=1) WARNING 04-15 22:38:39 [envs.py:1744] Unknown vLLM environment variable detected: VLLM_ROUTER_GATEWAY_PORT_80_TCP_ADDR
(APIServer pid=1) WARNING 04-15 22:38:39 [envs.py:1744] Unknown vLLM environment variable detected: VLLM_ROUTER_GATEWAY_PORT
(APIServer pid=1) WARNING 04-15 22:38:39 [envs.py:1744] Unknown vLLM environment variable detected: VLLM_ROUTER_GATEWAY_PORT_9091_TCP_PORT
(APIServer pid=1) WARNING 04-15 22:38:39 [envs.py:1744] Unknown vLLM environment variable detected: VLLM_ROUTER_GATEWAY_PORT_9091_TCP
(APIServer pid=1) WARNING 04-15 22:38:39 [envs.py:1744] Unknown vLLM environment variable detected: VLLM_ROUTER_GATEWAY_SERVICE_PORT
(APIServer pid=1) WARNING 04-15 22:38:39 [envs.py:1744] Unknown vLLM environment variable detected: VLLM_ROUTER_GATEWAY_PORT_9091_TCP_PROTO
(APIServer pid=1) WARNING 04-15 22:38:39 [envs.py:1744] Unknown vLLM environment variable detected: VLLM_ROUTER_GATEWAY_PORT_80_TCP
(APIServer pid=1) WARNING 04-15 22:38:39 [envs.py:1744] Unknown vLLM environment variable detected: VLLM_ROUTER_GATEWAY_PORT_80_TCP_PORT
(APIServer pid=1) WARNING 04-15 22:38:39 [envs.py:1744] Unknown vLLM environment variable detected: VLLM_ROUTER_GATEWAY_SERVICE_PORT_HTTP_MONITORING
(APIServer pid=1) WARNING 04-15 22:38:39 [envs.py:1744] Unknown vLLM environment variable detected: VLLM_ROUTER_GATEWAY_SERVICE_PORT_LISTENER_80
(APIServer pid=1) WARNING 04-15 22:38:39 [envs.py:1744] Unknown vLLM environment variable detected: VLLM_ROUTER_GATEWAY_SERVICE_HOST
(APIServer pid=1) WARNING 04-15 22:38:39 [envs.py:1744] Unknown vLLM environment variable detected: VLLM_ROUTER_GATEWAY_PORT_80_TCP_PROTO
(APIServer pid=1) WARNING 04-15 22:38:39 [envs.py:1744] Unknown vLLM environment variable detected: VLLM_ROUTER_GATEWAY_PORT_9091_TCP_ADDR
(APIServer pid=1) A new version of the following files was downloaded from https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4:
(APIServer pid=1) - configuration_nemotron_h.py
(APIServer pid=1) . Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
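As the download warning suggests, the remote code files (`configuration_nemotron_h.py` here) can be pinned so they do not change between restarts. With the vLLM CLI that would look something like the sketch below; the `--revision`/`--code-revision` flags are assumed from recent vLLM releases and `<commit-sha>` is a placeholder for a revision you have audited:

```shell
# Pin the Hugging Face repo (weights and remote code) to a fixed, audited commit.
vllm serve nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4 \
  --trust-remote-code \
  --revision <commit-sha> \
  --code-revision <commit-sha>
```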
(APIServer pid=1) INFO 04-15 22:38:46 [model.py:549] Resolved architecture: NemotronHForCausalLM
(APIServer pid=1) WARNING 04-15 22:38:46 [model.py:2176] User-specified max_model_len (1048576) is greater than the derived max_model_len (max_position_embeddings=262144.0 or model_max_length=None in model's config.json). VLLM_ALLOW_LONG_MAX_MODEL_LEN must be used with extreme caution. If the model uses relative position encoding (RoPE), positions exceeding derived_max_model_len lead to nan. If the model uses absolute position encoding, positions exceeding derived_max_model_len will cause a CUDA array out-of-bounds error.
(APIServer pid=1) INFO 04-15 22:38:46 [model.py:1678] Using max model len 1048576
(APIServer pid=1) INFO 04-15 22:38:46 [cache.py:227] Using fp8 data type to store kv cache. It reduces the GPU memory footprint and boosts the performance. Meanwhile, it may cause accuracy drop without a proper scaling factor.
(APIServer pid=1) INFO 04-15 22:38:46 [scheduler.py:238] Chunked prefill is enabled with max_num_batched_tokens=8192.
(APIServer pid=1) INFO 04-15 22:38:46 [config.py:281] Setting attention block size to 1056 tokens to ensure that attention page size is >= mamba page size.
(APIServer pid=1) INFO 04-15 22:38:46 [config.py:312] Padding mamba page size by 0.19% to ensure that mamba page size and attention page size are exactly equal.
(APIServer pid=1) WARNING 04-15 22:38:46 [modelopt.py:381] Detected ModelOpt fp8 checkpoint (quant_algo=FP8). Please note that the format is experimental and could change.
(APIServer pid=1) WARNING 04-15 22:38:46 [modelopt.py:998] Detected ModelOpt NVFP4 checkpoint. Please note that the format is experimental and could change in future.
(APIServer pid=1) INFO 04-15 22:38:46 [vllm.py:790] Asynchronous scheduling is enabled.
(APIServer pid=1) WARNING 04-15 22:38:46 [vllm.py:848] Enforce eager set, disabling torch.compile and CUDAGraphs. This is equivalent to setting -cc.mode=none -cc.cudagraph_mode=none
(APIServer pid=1) WARNING 04-15 22:38:46 [vllm.py:859] Inductor compilation was disabled by user settings, optimizations settings that are only active during inductor compilation will be ignored.
(APIServer pid=1) INFO 04-15 22:38:46 [vllm.py:1025] Cudagraph is disabled under eager mode
(APIServer pid=1) INFO 04-15 22:38:51 [compilation.py:290] Enabled custom fusions: norm_quant, act_quant, allreduce_rms
(EngineCore pid=277) INFO 04-15 22:38:58 [core.py:105] Initializing a V1 LLM engine (v0.19.0) with config: model='nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4', speculative_config=None, tokenizer='nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=1048576, download_dir=None, load_format=auto, tensor_parallel_size=8, pipeline_parallel_size=1, data_parallel_size=1, decode_context_parallel_size=1, dcp_comm_backend=ag_rs, disable_custom_all_reduce=True, quantization=modelopt_mixed, enforce_eager=True, enable_return_routed_experts=False, kv_cache_dtype=fp8_e4m3, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser='super_v3', reasoning_parser_plugin='/opt/super_v3_reasoning_parser.py', enable_in_reasoning=False), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None, kv_cache_metrics=False, kv_cache_metrics_sample=0.01, cudagraph_metrics=False, enable_layerwise_nvtx_tracing=False, enable_mfu_metrics=False, enable_mm_processor_stats=False, enable_logging_iteration_details=False), seed=0, served_model_name=nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4, enable_prefix_caching=False, enable_chunked_prefill=True, pooler_config=None, compilation_config={'mode': <CompilationMode.NONE: 0>, 'debug_dump_path': None, 'cache_dir': '', 'compile_cache_save_format': 'binary', 'backend': 'inductor', 'custom_ops': ['all'], 'splitting_ops': [], 'compile_mm_encoder': False, 'cudagraph_mm_encoder': False, 'encoder_cudagraph_token_budgets': [], 'encoder_cudagraph_max_images_per_batch': 0, 'compile_sizes': [], 'compile_ranges_endpoints': [128, 8192], 'inductor_compile_config': {'enable_auto_functionalized_v2': False, 'size_asserts': False, 
'alignment_asserts': False, 'scalar_asserts': False, 'combo_kernels': True, 'benchmark_combo_kernel': True}, 'inductor_passes': {}, 'cudagraph_mode': <CUDAGraphMode.NONE: 0>, 'cudagraph_num_of_warmups': 0, 'cudagraph_capture_sizes': [], 'cudagraph_copy_inputs': False, 'cudagraph_specialize_lora': True, 'use_inductor_graph_partition': False, 'pass_config': {'fuse_norm_quant': True, 'fuse_act_quant': True, 'fuse_attn_quant': False, 'enable_sp': False, 'fuse_gemm_comms': False, 'fuse_allreduce_rms': True}, 'max_cudagraph_capture_size': 0, 'dynamic_shapes_config': {'type': <DynamicShapesType.BACKED: 'backed'>, 'evaluate_guards': False, 'assume_32_bit_indexing': False}, 'local_cache_dir': None, 'fast_moe_cold_start': True, 'static_all_moe_layers': []}
(EngineCore pid=277) INFO 04-15 22:38:58 [multiproc_executor.py:134] DP group leader: node_rank=0, node_rank_within_dp=0, master_addr=127.0.0.1, mq_connect_ip=10.244.248.111 (local), world_size=8, local_world_size=8
(Worker pid=348) INFO 04-15 22:39:03 [parallel_state.py:1400] world_size=8 rank=0 local_rank=0 distributed_init_method=tcp://127.0.0.1:36625 backend=nccl
(Worker pid=415) INFO 04-15 22:39:07 [parallel_state.py:1400] world_size=8 rank=1 local_rank=1 distributed_init_method=tcp://127.0.0.1:36625 backend=nccl
(Worker pid=483) INFO 04-15 22:39:11 [parallel_state.py:1400] world_size=8 rank=2 local_rank=2 distributed_init_method=tcp://127.0.0.1:36625 backend=nccl
(Worker pid=556) INFO 04-15 22:39:15 [parallel_state.py:1400] world_size=8 rank=3 local_rank=3 distributed_init_method=tcp://127.0.0.1:36625 backend=nccl
(Worker pid=629) INFO 04-15 22:39:19 [parallel_state.py:1400] world_size=8 rank=4 local_rank=4 distributed_init_method=tcp://127.0.0.1:36625 backend=nccl
(Worker pid=702) INFO 04-15 22:39:23 [parallel_state.py:1400] world_size=8 rank=5 local_rank=5 distributed_init_method=tcp://127.0.0.1:36625 backend=nccl
(Worker pid=775) INFO 04-15 22:39:27 [parallel_state.py:1400] world_size=8 rank=6 local_rank=6 distributed_init_method=tcp://127.0.0.1:36625 backend=nccl
(Worker pid=848) INFO 04-15 22:39:31 [parallel_state.py:1400] world_size=8 rank=7 local_rank=7 distributed_init_method=tcp://127.0.0.1:36625 backend=nccl
(Worker pid=348) INFO 04-15 22:39:31 [pynccl.py:111] vLLM is using nccl==2.28.9
(Worker pid=348) INFO 04-15 22:39:36 [parallel_state.py:1716] rank 0 in world size 8 is assigned as DP rank 0, PP rank 0, PCP rank 0, TP rank 0, EP rank 0, EPLB rank N/A
(Worker_TP0 pid=348) INFO 04-15 22:39:37 [gpu_model_runner.py:4735] Starting to load model nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4...
(Worker_TP0 pid=348) INFO 04-15 22:39:38 [__init__.py:261] Selected FlashInferFP8ScaledMMLinearKernel for ModelOptFp8LinearMethod
(Worker_TP0 pid=348) INFO 04-15 22:39:38 [deep_gemm.py:115] DeepGEMM E8M0 enabled on current platform.
(Worker_TP0 pid=348) INFO 04-15 22:39:38 [nvfp4_utils.py:85] Using NvFp4LinearBackend.FLASHINFER_CUTLASS for NVFP4 GEMM
(Worker_TP0 pid=348) INFO 04-15 22:39:38 [nvfp4.py:256] Using 'FLASHINFER_TRTLLM' NvFp4 MoE backend out of potential backends: ['FLASHINFER_TRTLLM', 'FLASHINFER_CUTEDSL', 'FLASHINFER_CUTLASS', 'VLLM_CUTLASS', 'MARLIN'].
(Worker_TP1 pid=415) INFO 04-15 22:39:38 [cuda.py:274] Using AttentionBackendEnum.TRITON_ATTN backend.
(Worker_TP0 pid=348) INFO 04-15 22:39:38 [cuda.py:274] Using AttentionBackendEnum.TRITON_ATTN backend.
(Worker_TP4 pid=629) INFO 04-15 22:39:38 [cuda.py:274] Using AttentionBackendEnum.TRITON_ATTN backend.
(Worker_TP2 pid=483) INFO 04-15 22:39:38 [cuda.py:274] Using AttentionBackendEnum.TRITON_ATTN backend.
(Worker_TP6 pid=775) INFO 04-15 22:39:38 [cuda.py:274] Using AttentionBackendEnum.TRITON_ATTN backend.
(Worker_TP7 pid=848) INFO 04-15 22:39:38 [cuda.py:274] Using AttentionBackendEnum.TRITON_ATTN backend.
(Worker_TP3 pid=556) INFO 04-15 22:39:38 [cuda.py:274] Using AttentionBackendEnum.TRITON_ATTN backend.
(Worker_TP5 pid=702) INFO 04-15 22:39:38 [cuda.py:274] Using AttentionBackendEnum.TRITON_ATTN backend.
(Worker_TP1 pid=415) INFO 04-15 22:41:36 [weight_utils.py:581] Time spent downloading weights for nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4: 116.048954 seconds
(Worker_TP0 pid=348)
Loading safetensors checkpoint shards: 0% Completed | 0/17 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 6% Completed | 1/17 [00:01<00:20, 1.29s/it]
Loading safetensors checkpoint shards: 12% Completed | 2/17 [00:03<00:23, 1.60s/it]
Loading safetensors checkpoint shards: 18% Completed | 3/17 [00:04<00:21, 1.52s/it]
Loading safetensors checkpoint shards: 24% Completed | 4/17 [00:06<00:20, 1.58s/it]
Loading safetensors checkpoint shards: 29% Completed | 5/17 [00:07<00:17, 1.49s/it]
Loading safetensors checkpoint shards: 35% Completed | 6/17 [00:08<00:16, 1.47s/it]
Loading safetensors checkpoint shards: 41% Completed | 7/17 [00:10<00:14, 1.42s/it]
Loading safetensors checkpoint shards: 47% Completed | 8/17 [00:11<00:12, 1.39s/it]
Loading safetensors checkpoint shards: 53% Completed | 9/17 [00:13<00:11, 1.40s/it]
Loading safetensors checkpoint shards: 59% Completed | 10/17 [00:14<00:10, 1.47s/it]
Loading safetensors checkpoint shards: 65% Completed | 11/17 [00:16<00:08, 1.45s/it]
Loading safetensors checkpoint shards: 71% Completed | 12/17 [00:17<00:07, 1.44s/it]
Loading safetensors checkpoint shards: 76% Completed | 13/17 [00:18<00:05, 1.43s/it]
Loading safetensors checkpoint shards: 82% Completed | 14/17 [00:20<00:04, 1.43s/it]
Loading safetensors checkpoint shards: 88% Completed | 15/17 [00:21<00:02, 1.30s/it]
Loading safetensors checkpoint shards: 100% Completed | 17/17 [00:21<00:00, 1.26s/it]
(Worker_TP0 pid=348) INFO 04-15 22:41:59 [default_loader.py:384] Loading weights took 21.38 seconds
(Worker_TP0 pid=348) INFO 04-15 22:41:59 [flashinfer_utils.py:238] Padding intermediate size from 336 to 384 for up/down projection weights.
(Worker_TP0 pid=348) INFO 04-15 22:41:59 [nvfp4.py:401] Using MoEPrepareAndFinalizeNoDPEPMonolithic
(Worker_TP0 pid=348) WARNING 04-15 22:41:59 [kv_cache.py:94] Checkpoint does not provide a q scaling factor. Setting it to k_scale. This only matters for FP8 Attention backends (flash-attn or flashinfer).
(Worker_TP0 pid=348) WARNING 04-15 22:41:59 [kv_cache.py:108] Using KV cache scaling factor 1.0 for fp8_e4m3. If this is unintended, verify that k/v_scale scaling factors are properly set in the checkpoint.
(Worker_TP0 pid=348) INFO 04-15 22:42:01 [gpu_model_runner.py:4820] Model loading took 10.4 GiB memory and 142.225157 seconds
(Worker_TP0 pid=348) INFO 04-15 22:42:10 [gpu_worker.py:436] Available KV cache memory: 158.16 GiB
(EngineCore pid=277) INFO 04-15 22:42:11 [kv_cache_utils.py:1319] GPU KV cache size: 13,819,872 tokens
(EngineCore pid=277) INFO 04-15 22:42:11 [kv_cache_utils.py:1324] Maximum concurrency for 1,048,576 tokens per request: 78.68x
(Worker_TP7 pid=848) ERROR 04-15 22:42:11 [multiproc_executor.py:949] WorkerProc hit an exception.
(Worker_TP7 pid=848) ERROR 04-15 22:42:11 [multiproc_executor.py:949] Traceback (most recent call last):
(Worker_TP7 pid=848) ERROR 04-15 22:42:11 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 944, in worker_busy_loop
(Worker_TP7 pid=848) ERROR 04-15 22:42:11 [multiproc_executor.py:949] output = func(*args, **kwargs)
(Worker_TP7 pid=848) ERROR 04-15 22:42:11 [multiproc_executor.py:949] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP7 pid=848) ERROR 04-15 22:42:11 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/worker_base.py", line 306, in initialize_from_config
(Worker_TP7 pid=848) ERROR 04-15 22:42:11 [multiproc_executor.py:949] self.worker.initialize_from_config(kv_cache_config) # type: ignore
(Worker_TP7 pid=848) ERROR 04-15 22:42:11 [multiproc_executor.py:949] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP7 pid=848) ERROR 04-15 22:42:11 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_TP7 pid=848) ERROR 04-15 22:42:11 [multiproc_executor.py:949] return func(*args, **kwargs)
(Worker_TP7 pid=848) ERROR 04-15 22:42:11 [multiproc_executor.py:949] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP7 pid=848) ERROR 04-15 22:42:11 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 527, in initialize_from_config
(Worker_TP7 pid=848) ERROR 04-15 22:42:11 [multiproc_executor.py:949] ensure_kv_transfer_initialized(self.vllm_config, kv_cache_config)
(Worker_TP7 pid=848) ERROR 04-15 22:42:11 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/vllm/distributed/kv_transfer/kv_transfer_state.py", line 67, in ensure_kv_transfer_initialized
(Worker_TP7 pid=848) ERROR 04-15 22:42:11 [multiproc_executor.py:949] _KV_CONNECTOR_AGENT = KVConnectorFactory.create_connector(
(Worker_TP7 pid=848) ERROR 04-15 22:42:11 [multiproc_executor.py:949] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP7 pid=848) ERROR 04-15 22:42:11 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/vllm/distributed/kv_transfer/kv_connector/factory.py", line 59, in create_connector
(Worker_TP7 pid=848) ERROR 04-15 22:42:11 [multiproc_executor.py:949] raise ValueError(
(Worker_TP7 pid=848) ERROR 04-15 22:42:11 [multiproc_executor.py:949] ValueError: Connector LMCacheConnectorV1 does not support HMA but HMA is enabled. Please set --disable-hybrid-kv-cache-manager.
(Worker_TP6 pid=775) ERROR 04-15 22:42:11 [multiproc_executor.py:949] WorkerProc hit an exception.
(Worker_TP6 pid=775) ERROR 04-15 22:42:11 [multiproc_executor.py:949] Traceback (most recent call last):
(Worker_TP6 pid=775) ERROR 04-15 22:42:11 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 944, in worker_busy_loop
(Worker_TP6 pid=775) ERROR 04-15 22:42:11 [multiproc_executor.py:949] output = func(*args, **kwargs)
(Worker_TP6 pid=775) ERROR 04-15 22:42:11 [multiproc_executor.py:949] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6 pid=775) ERROR 04-15 22:42:11 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/worker_base.py", line 306, in initialize_from_config
(Worker_TP6 pid=775) ERROR 04-15 22:42:11 [multiproc_executor.py:949] self.worker.initialize_from_config(kv_cache_config) # type: ignore
(Worker_TP6 pid=775) ERROR 04-15 22:42:11 [multiproc_executor.py:949] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6 pid=775) ERROR 04-15 22:42:11 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_TP6 pid=775) ERROR 04-15 22:42:11 [multiproc_executor.py:949] return func(*args, **kwargs)
(Worker_TP6 pid=775) ERROR 04-15 22:42:11 [multiproc_executor.py:949] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6 pid=775) ERROR 04-15 22:42:11 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 527, in initialize_from_config
(Worker_TP6 pid=775) ERROR 04-15 22:42:11 [multiproc_executor.py:949] ensure_kv_transfer_initialized(self.vllm_config, kv_cache_config)
(Worker_TP6 pid=775) ERROR 04-15 22:42:11 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/vllm/distributed/kv_transfer/kv_transfer_state.py", line 67, in ensure_kv_transfer_initialized
(Worker_TP6 pid=775) ERROR 04-15 22:42:11 [multiproc_executor.py:949] _KV_CONNECTOR_AGENT = KVConnectorFactory.create_connector(
(Worker_TP6 pid=775) ERROR 04-15 22:42:11 [multiproc_executor.py:949] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6 pid=775) ERROR 04-15 22:42:11 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/vllm/distributed/kv_transfer/kv_connector/factory.py", line 59, in create_connector
(Worker_TP6 pid=775) ERROR 04-15 22:42:11 [multiproc_executor.py:949] raise ValueError(
(Worker_TP6 pid=775) ERROR 04-15 22:42:11 [multiproc_executor.py:949] ValueError: Connector LMCacheConnectorV1 does not support HMA but HMA is enabled. Please set --disable-hybrid-kv-cache-manager.
(Worker_TP6 pid=775) ERROR 04-15 22:42:11 [multiproc_executor.py:949] Traceback (most recent call last):
(Worker_TP6 pid=775) ERROR 04-15 22:42:11 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 944, in worker_busy_loop
(Worker_TP6 pid=775) ERROR 04-15 22:42:11 [multiproc_executor.py:949] output = func(*args, **kwargs)
(Worker_TP6 pid=775) ERROR 04-15 22:42:11 [multiproc_executor.py:949] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6 pid=775) ERROR 04-15 22:42:11 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/worker_base.py", line 306, in initialize_from_config
(Worker_TP6 pid=775) ERROR 04-15 22:42:11 [multiproc_executor.py:949] self.worker.initialize_from_config(kv_cache_config) # type: ignore
(Worker_TP6 pid=775) ERROR 04-15 22:42:11 [multiproc_executor.py:949] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6 pid=775) ERROR 04-15 22:42:11 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_TP6 pid=775) ERROR 04-15 22:42:11 [multiproc_executor.py:949] return func(*args, **kwargs)
(Worker_TP6 pid=775) ERROR 04-15 22:42:11 [multiproc_executor.py:949] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6 pid=775) ERROR 04-15 22:42:11 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 527, in initialize_from_config
(Worker_TP6 pid=775) ERROR 04-15 22:42:11 [multiproc_executor.py:949] ensure_kv_transfer_initialized(self.vllm_config, kv_cache_config)
(Worker_TP6 pid=775) ERROR 04-15 22:42:11 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/vllm/distributed/kv_transfer/kv_transfer_state.py", line 67, in ensure_kv_transfer_initialized
(Worker_TP6 pid=775) ERROR 04-15 22:42:11 [multiproc_executor.py:949] _KV_CONNECTOR_AGENT = KVConnectorFactory.create_connector(
(Worker_TP6 pid=775) ERROR 04-15 22:42:11 [multiproc_executor.py:949] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP6 pid=775) ERROR 04-15 22:42:11 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/vllm/distributed/kv_transfer/kv_connector/factory.py", line 59, in create_connector
(Worker_TP6 pid=775) ERROR 04-15 22:42:11 [multiproc_executor.py:949] raise ValueError(
(Worker_TP6 pid=775) ERROR 04-15 22:42:11 [multiproc_executor.py:949] ValueError: Connector LMCacheConnectorV1 does not support HMA but HMA is enabled. Please set --disable-hybrid-kv-cache-manager.
(Worker_TP6 pid=775) ERROR 04-15 22:42:11 [multiproc_executor.py:949]
(Worker_TP3 pid=556) ERROR 04-15 22:42:11 [multiproc_executor.py:949] WorkerProc hit an exception.
(Worker_TP3 pid=556) ERROR 04-15 22:42:11 [multiproc_executor.py:949] Traceback (most recent call last):
(Worker_TP3 pid=556) ERROR 04-15 22:42:11 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 944, in worker_busy_loop
(Worker_TP3 pid=556) ERROR 04-15 22:42:11 [multiproc_executor.py:949] output = func(*args, **kwargs)
(Worker_TP3 pid=556) ERROR 04-15 22:42:11 [multiproc_executor.py:949] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP3 pid=556) ERROR 04-15 22:42:11 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/worker_base.py", line 306, in initialize_from_config
(Worker_TP3 pid=556) ERROR 04-15 22:42:11 [multiproc_executor.py:949] self.worker.initialize_from_config(kv_cache_config) # type: ignore
(Worker_TP3 pid=556) ERROR 04-15 22:42:11 [multiproc_executor.py:949] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP3 pid=556) ERROR 04-15 22:42:11 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_TP3 pid=556) ERROR 04-15 22:42:11 [multiproc_executor.py:949] return func(*args, **kwargs)
(Worker_TP3 pid=556) ERROR 04-15 22:42:11 [multiproc_executor.py:949] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP3 pid=556) ERROR 04-15 22:42:11 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 527, in initialize_from_config
(Worker_TP3 pid=556) ERROR 04-15 22:42:11 [multiproc_executor.py:949] ensure_kv_transfer_initialized(self.vllm_config, kv_cache_config)
(Worker_TP3 pid=556) ERROR 04-15 22:42:11 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/vllm/distributed/kv_transfer/kv_transfer_state.py", line 67, in ensure_kv_transfer_initialized
(Worker_TP3 pid=556) ERROR 04-15 22:42:11 [multiproc_executor.py:949] _KV_CONNECTOR_AGENT = KVConnectorFactory.create_connector(
(Worker_TP3 pid=556) ERROR 04-15 22:42:11 [multiproc_executor.py:949] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP3 pid=556) ERROR 04-15 22:42:11 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/vllm/distributed/kv_transfer/kv_connector/factory.py", line 59, in create_connector
(Worker_TP3 pid=556) ERROR 04-15 22:42:11 [multiproc_executor.py:949] raise ValueError(
(Worker_TP3 pid=556) ERROR 04-15 22:42:11 [multiproc_executor.py:949] ValueError: Connector LMCacheConnectorV1 does not support HMA but HMA is enabled. Please set --disable-hybrid-kv-cache-manager.
(Worker_TP3 pid=556) ERROR 04-15 22:42:11 [multiproc_executor.py:949] Traceback (most recent call last):
(Worker_TP3 pid=556) ERROR 04-15 22:42:11 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 944, in worker_busy_loop
(Worker_TP3 pid=556) ERROR 04-15 22:42:11 [multiproc_executor.py:949] output = func(*args, **kwargs)
(Worker_TP3 pid=556) ERROR 04-15 22:42:11 [multiproc_executor.py:949] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP3 pid=556) ERROR 04-15 22:42:11 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/worker_base.py", line 306, in initialize_from_config
(Worker_TP3 pid=556) ERROR 04-15 22:42:11 [multiproc_executor.py:949] self.worker.initialize_from_config(kv_cache_config) # type: ignore
(Worker_TP3 pid=556) ERROR 04-15 22:42:11 [multiproc_executor.py:949] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP3 pid=556) ERROR 04-15 22:42:11 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_TP3 pid=556) ERROR 04-15 22:42:11 [multiproc_executor.py:949] return func(*args, **kwargs)
(Worker_TP3 pid=556) ERROR 04-15 22:42:11 [multiproc_executor.py:949] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP3 pid=556) ERROR 04-15 22:42:11 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 527, in initialize_from_config
(Worker_TP3 pid=556) ERROR 04-15 22:42:11 [multiproc_executor.py:949] ensure_kv_transfer_initialized(self.vllm_config, kv_cache_config)
(Worker_TP3 pid=556) ERROR 04-15 22:42:11 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/vllm/distributed/kv_transfer/kv_transfer_state.py", line 67, in ensure_kv_transfer_initialized
(Worker_TP3 pid=556) ERROR 04-15 22:42:11 [multiproc_executor.py:949] _KV_CONNECTOR_AGENT = KVConnectorFactory.create_connector(
(Worker_TP3 pid=556) ERROR 04-15 22:42:11 [multiproc_executor.py:949] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP3 pid=556) ERROR 04-15 22:42:11 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/vllm/distributed/kv_transfer/kv_connector/factory.py", line 59, in create_connector
(Worker_TP3 pid=556) ERROR 04-15 22:42:11 [multiproc_executor.py:949] raise ValueError(
(Worker_TP3 pid=556) ERROR 04-15 22:42:11 [multiproc_executor.py:949] ValueError: Connector LMCacheConnectorV1 does not support HMA but HMA is enabled. Please set --disable-hybrid-kv-cache-manager.
(Worker_TP3 pid=556) ERROR 04-15 22:42:11 [multiproc_executor.py:949]
(Worker_TP0 pid=348) ERROR 04-15 22:42:11 [multiproc_executor.py:949] WorkerProc hit an exception.
(Worker_TP0 pid=348) ERROR 04-15 22:42:11 [multiproc_executor.py:949] Traceback (most recent call last):
(Worker_TP0 pid=348) ERROR 04-15 22:42:11 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 944, in worker_busy_loop
(Worker_TP0 pid=348) ERROR 04-15 22:42:11 [multiproc_executor.py:949] output = func(*args, **kwargs)
(Worker_TP0 pid=348) ERROR 04-15 22:42:11 [multiproc_executor.py:949] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=348) ERROR 04-15 22:42:11 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/worker_base.py", line 306, in initialize_from_config
(Worker_TP0 pid=348) ERROR 04-15 22:42:11 [multiproc_executor.py:949] self.worker.initialize_from_config(kv_cache_config) # type: ignore
(Worker_TP0 pid=348) ERROR 04-15 22:42:11 [multiproc_executor.py:949] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=348) ERROR 04-15 22:42:11 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_TP0 pid=348) ERROR 04-15 22:42:11 [multiproc_executor.py:949] return func(*args, **kwargs)
(Worker_TP0 pid=348) ERROR 04-15 22:42:11 [multiproc_executor.py:949] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=348) ERROR 04-15 22:42:11 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 527, in initialize_from_config
(Worker_TP0 pid=348) ERROR 04-15 22:42:11 [multiproc_executor.py:949] ensure_kv_transfer_initialized(self.vllm_config, kv_cache_config)
(Worker_TP0 pid=348) ERROR 04-15 22:42:11 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/vllm/distributed/kv_transfer/kv_transfer_state.py", line 67, in ensure_kv_transfer_initialized
(Worker_TP0 pid=348) ERROR 04-15 22:42:11 [multiproc_executor.py:949] _KV_CONNECTOR_AGENT = KVConnectorFactory.create_connector(
(Worker_TP0 pid=348) ERROR 04-15 22:42:11 [multiproc_executor.py:949] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=348) ERROR 04-15 22:42:11 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/vllm/distributed/kv_transfer/kv_connector/factory.py", line 59, in create_connector
(Worker_TP0 pid=348) ERROR 04-15 22:42:11 [multiproc_executor.py:949] raise ValueError(
(Worker_TP0 pid=348) ERROR 04-15 22:42:11 [multiproc_executor.py:949] ValueError: Connector LMCacheConnectorV1 does not support HMA but HMA is enabled. Please set --disable-hybrid-kv-cache-manager.
(Worker_TP4 pid=629) ERROR 04-15 22:42:11 [multiproc_executor.py:949] WorkerProc hit an exception.
(Worker_TP4 pid=629) ERROR 04-15 22:42:11 [multiproc_executor.py:949] Traceback (most recent call last):
(Worker_TP4 pid=629) ERROR 04-15 22:42:11 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 944, in worker_busy_loop
(Worker_TP4 pid=629) ERROR 04-15 22:42:11 [multiproc_executor.py:949] output = func(*args, **kwargs)
(Worker_TP4 pid=629) ERROR 04-15 22:42:11 [multiproc_executor.py:949] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=629) ERROR 04-15 22:42:11 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/worker_base.py", line 306, in initialize_from_config
(Worker_TP4 pid=629) ERROR 04-15 22:42:11 [multiproc_executor.py:949] self.worker.initialize_from_config(kv_cache_config) # type: ignore
(Worker_TP4 pid=629) ERROR 04-15 22:42:11 [multiproc_executor.py:949] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=629) ERROR 04-15 22:42:11 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_TP4 pid=629) ERROR 04-15 22:42:11 [multiproc_executor.py:949] return func(*args, **kwargs)
(Worker_TP4 pid=629) ERROR 04-15 22:42:11 [multiproc_executor.py:949] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=629) ERROR 04-15 22:42:11 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 527, in initialize_from_config
(Worker_TP4 pid=629) ERROR 04-15 22:42:11 [multiproc_executor.py:949] ensure_kv_transfer_initialized(self.vllm_config, kv_cache_config)
(Worker_TP4 pid=629) ERROR 04-15 22:42:11 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/vllm/distributed/kv_transfer/kv_transfer_state.py", line 67, in ensure_kv_transfer_initialized
(Worker_TP4 pid=629) ERROR 04-15 22:42:11 [multiproc_executor.py:949] _KV_CONNECTOR_AGENT = KVConnectorFactory.create_connector(
(Worker_TP4 pid=629) ERROR 04-15 22:42:11 [multiproc_executor.py:949] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP4 pid=629) ERROR 04-15 22:42:11 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/vllm/distributed/kv_transfer/kv_connector/factory.py", line 59, in create_connector
(Worker_TP4 pid=629) ERROR 04-15 22:42:11 [multiproc_executor.py:949] raise ValueError(
(Worker_TP4 pid=629) ERROR 04-15 22:42:11 [multiproc_executor.py:949] ValueError: Connector LMCacheConnectorV1 does not support HMA but HMA is enabled. Please set --disable-hybrid-kv-cache-manager.
(Worker_TP2 pid=483) ERROR 04-15 22:42:11 [multiproc_executor.py:949] WorkerProc hit an exception.
(Worker_TP2 pid=483) ERROR 04-15 22:42:11 [multiproc_executor.py:949] Traceback (most recent call last):
(Worker_TP2 pid=483) ERROR 04-15 22:42:11 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 944, in worker_busy_loop
(Worker_TP2 pid=483) ERROR 04-15 22:42:11 [multiproc_executor.py:949] output = func(*args, **kwargs)
(Worker_TP2 pid=483) ERROR 04-15 22:42:11 [multiproc_executor.py:949] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=483) ERROR 04-15 22:42:11 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/worker_base.py", line 306, in initialize_from_config
(Worker_TP2 pid=483) ERROR 04-15 22:42:11 [multiproc_executor.py:949] self.worker.initialize_from_config(kv_cache_config) # type: ignore
(Worker_TP2 pid=483) ERROR 04-15 22:42:11 [multiproc_executor.py:949] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=483) ERROR 04-15 22:42:11 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_TP2 pid=483) ERROR 04-15 22:42:11 [multiproc_executor.py:949] return func(*args, **kwargs)
(Worker_TP2 pid=483) ERROR 04-15 22:42:11 [multiproc_executor.py:949] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=483) ERROR 04-15 22:42:11 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 527, in initialize_from_config
(Worker_TP2 pid=483) ERROR 04-15 22:42:11 [multiproc_executor.py:949] ensure_kv_transfer_initialized(self.vllm_config, kv_cache_config)
(Worker_TP2 pid=483) ERROR 04-15 22:42:11 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/vllm/distributed/kv_transfer/kv_transfer_state.py", line 67, in ensure_kv_transfer_initialized
(Worker_TP2 pid=483) ERROR 04-15 22:42:11 [multiproc_executor.py:949] _KV_CONNECTOR_AGENT = KVConnectorFactory.create_connector(
(Worker_TP2 pid=483) ERROR 04-15 22:42:11 [multiproc_executor.py:949] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP2 pid=483) ERROR 04-15 22:42:11 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/vllm/distributed/kv_transfer/kv_connector/factory.py", line 59, in create_connector
(Worker_TP2 pid=483) ERROR 04-15 22:42:11 [multiproc_executor.py:949] raise ValueError(
(Worker_TP2 pid=483) ERROR 04-15 22:42:11 [multiproc_executor.py:949] ValueError: Connector LMCacheConnectorV1 does not support HMA but HMA is enabled. Please set --disable-hybrid-kv-cache-manager.
(Worker_TP5 pid=702) ERROR 04-15 22:42:11 [multiproc_executor.py:949] WorkerProc hit an exception.
(Worker_TP5 pid=702) ERROR 04-15 22:42:11 [multiproc_executor.py:949] Traceback (most recent call last):
(Worker_TP5 pid=702) ERROR 04-15 22:42:11 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 944, in worker_busy_loop
(Worker_TP5 pid=702) ERROR 04-15 22:42:11 [multiproc_executor.py:949] output = func(*args, **kwargs)
(Worker_TP5 pid=702) ERROR 04-15 22:42:11 [multiproc_executor.py:949] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP5 pid=702) ERROR 04-15 22:42:11 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/worker_base.py", line 306, in initialize_from_config
(Worker_TP5 pid=702) ERROR 04-15 22:42:11 [multiproc_executor.py:949] self.worker.initialize_from_config(kv_cache_config) # type: ignore
(Worker_TP5 pid=702) ERROR 04-15 22:42:11 [multiproc_executor.py:949] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP5 pid=702) ERROR 04-15 22:42:11 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_TP5 pid=702) ERROR 04-15 22:42:11 [multiproc_executor.py:949] return func(*args, **kwargs)
(Worker_TP5 pid=702) ERROR 04-15 22:42:11 [multiproc_executor.py:949] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP5 pid=702) ERROR 04-15 22:42:11 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 527, in initialize_from_config
(Worker_TP5 pid=702) ERROR 04-15 22:42:11 [multiproc_executor.py:949] ensure_kv_transfer_initialized(self.vllm_config, kv_cache_config)
(Worker_TP5 pid=702) ERROR 04-15 22:42:11 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/vllm/distributed/kv_transfer/kv_transfer_state.py", line 67, in ensure_kv_transfer_initialized
(Worker_TP5 pid=702) ERROR 04-15 22:42:11 [multiproc_executor.py:949] _KV_CONNECTOR_AGENT = KVConnectorFactory.create_connector(
(Worker_TP5 pid=702) ERROR 04-15 22:42:11 [multiproc_executor.py:949] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP5 pid=702) ERROR 04-15 22:42:11 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/vllm/distributed/kv_transfer/kv_connector/factory.py", line 59, in create_connector
(Worker_TP5 pid=702) ERROR 04-15 22:42:11 [multiproc_executor.py:949] raise ValueError(
(Worker_TP5 pid=702) ERROR 04-15 22:42:11 [multiproc_executor.py:949] ValueError: Connector LMCacheConnectorV1 does not support HMA but HMA is enabled. Please set --disable-hybrid-kv-cache-manager.
(Worker_TP1 pid=415) ERROR 04-15 22:42:11 [multiproc_executor.py:949] WorkerProc hit an exception.
(Worker_TP1 pid=415) ERROR 04-15 22:42:11 [multiproc_executor.py:949] Traceback (most recent call last):
(Worker_TP1 pid=415) ERROR 04-15 22:42:11 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 944, in worker_busy_loop
(Worker_TP1 pid=415) ERROR 04-15 22:42:11 [multiproc_executor.py:949] output = func(*args, **kwargs)
(Worker_TP1 pid=415) ERROR 04-15 22:42:11 [multiproc_executor.py:949] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=415) ERROR 04-15 22:42:11 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/worker_base.py", line 306, in initialize_from_config
(Worker_TP1 pid=415) ERROR 04-15 22:42:11 [multiproc_executor.py:949] self.worker.initialize_from_config(kv_cache_config) # type: ignore
(Worker_TP1 pid=415) ERROR 04-15 22:42:11 [multiproc_executor.py:949] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=415) ERROR 04-15 22:42:11 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(Worker_TP1 pid=415) ERROR 04-15 22:42:11 [multiproc_executor.py:949] return func(*args, **kwargs)
(Worker_TP1 pid=415) ERROR 04-15 22:42:11 [multiproc_executor.py:949] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=415) ERROR 04-15 22:42:11 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 527, in initialize_from_config
(Worker_TP1 pid=415) ERROR 04-15 22:42:11 [multiproc_executor.py:949] ensure_kv_transfer_initialized(self.vllm_config, kv_cache_config)
(Worker_TP1 pid=415) ERROR 04-15 22:42:11 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/vllm/distributed/kv_transfer/kv_transfer_state.py", line 67, in ensure_kv_transfer_initialized
(Worker_TP1 pid=415) ERROR 04-15 22:42:11 [multiproc_executor.py:949] _KV_CONNECTOR_AGENT = KVConnectorFactory.create_connector(
(Worker_TP1 pid=415) ERROR 04-15 22:42:11 [multiproc_executor.py:949] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=415) ERROR 04-15 22:42:11 [multiproc_executor.py:949] File "/usr/local/lib/python3.12/dist-packages/vllm/distributed/kv_transfer/kv_connector/factory.py", line 59, in create_connector
(Worker_TP1 pid=415) ERROR 04-15 22:42:11 [multiproc_executor.py:949] raise ValueError(
(Worker_TP1 pid=415) ERROR 04-15 22:42:11 [multiproc_executor.py:949] ValueError: Connector LMCacheConnectorV1 does not support HMA but HMA is enabled. Please set --disable-hybrid-kv-cache-manager.
(EngineCore pid=277) ERROR 04-15 22:42:11 [core.py:1108] EngineCore failed to start.
(EngineCore pid=277) ERROR 04-15 22:42:11 [core.py:1108] Traceback (most recent call last):
(EngineCore pid=277) ERROR 04-15 22:42:11 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1082, in run_engine_core
(EngineCore pid=277) ERROR 04-15 22:42:11 [core.py:1108] engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore pid=277) ERROR 04-15 22:42:11 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=277) ERROR 04-15 22:42:11 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=277) ERROR 04-15 22:42:11 [core.py:1108] return func(*args, **kwargs)
(EngineCore pid=277) ERROR 04-15 22:42:11 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=277) ERROR 04-15 22:42:11 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 848, in __init__
(EngineCore pid=277) ERROR 04-15 22:42:11 [core.py:1108] super().__init__(
(EngineCore pid=277) ERROR 04-15 22:42:11 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 124, in __init__
(EngineCore pid=277) ERROR 04-15 22:42:11 [core.py:1108] kv_cache_config = self._initialize_kv_caches(vllm_config)
(EngineCore pid=277) ERROR 04-15 22:42:11 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=277) ERROR 04-15 22:42:11 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=277) ERROR 04-15 22:42:11 [core.py:1108] return func(*args, **kwargs)
(EngineCore pid=277) ERROR 04-15 22:42:11 [core.py:1108] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=277) ERROR 04-15 22:42:11 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 280, in _initialize_kv_caches
(EngineCore pid=277) ERROR 04-15 22:42:11 [core.py:1108] self.model_executor.initialize_from_config(kv_cache_configs)
(EngineCore pid=277) ERROR 04-15 22:42:11 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 117, in initialize_from_config
(EngineCore pid=277) ERROR 04-15 22:42:11 [core.py:1108] self.collective_rpc("initialize_from_config", args=(kv_cache_configs,))
(EngineCore pid=277) ERROR 04-15 22:42:11 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 397, in collective_rpc
(EngineCore pid=277) ERROR 04-15 22:42:11 [core.py:1108] return aggregate(get_response())
(EngineCore pid=277) ERROR 04-15 22:42:11 [core.py:1108] ^^^^^^^^^^^^^^
(EngineCore pid=277) ERROR 04-15 22:42:11 [core.py:1108] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 380, in get_response
(EngineCore pid=277) ERROR 04-15 22:42:11 [core.py:1108] raise RuntimeError(
(EngineCore pid=277) ERROR 04-15 22:42:11 [core.py:1108] RuntimeError: Worker failed with error 'Connector LMCacheConnectorV1 does not support HMA but HMA is enabled. Please set --disable-hybrid-kv-cache-manager.', please check the stack trace above for the root cause
(Worker_TP3 pid=556) WARNING 04-15 22:42:11 [multiproc_executor.py:871] WorkerProc was terminated
(Worker_TP0 pid=348) WARNING 04-15 22:42:11 [multiproc_executor.py:871] WorkerProc was terminated
(Worker_TP6 pid=775) WARNING 04-15 22:42:11 [multiproc_executor.py:871] WorkerProc was terminated
(Worker_TP7 pid=848) WARNING 04-15 22:42:11 [multiproc_executor.py:871] WorkerProc was terminated
(Worker_TP4 pid=629) WARNING 04-15 22:42:11 [multiproc_executor.py:871] WorkerProc was terminated
(Worker_TP2 pid=483) WARNING 04-15 22:42:11 [multiproc_executor.py:871] WorkerProc was terminated
(Worker_TP1 pid=415) WARNING 04-15 22:42:11 [multiproc_executor.py:871] WorkerProc was terminated
(Worker_TP5 pid=702) WARNING 04-15 22:42:11 [multiproc_executor.py:871] WorkerProc was terminated
(EngineCore pid=277) ERROR 04-15 22:42:14 [multiproc_executor.py:273] Worker proc VllmWorker-3 died unexpectedly, shutting down executor.
(EngineCore pid=277) Process EngineCore:
(EngineCore pid=277) Traceback (most recent call last):
(EngineCore pid=277) File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore pid=277) self.run()
(EngineCore pid=277) File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
(EngineCore pid=277) self._target(*self._args, **self._kwargs)
(EngineCore pid=277) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1112, in run_engine_core
(EngineCore pid=277) raise e
(EngineCore pid=277) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 1082, in run_engine_core
(EngineCore pid=277) engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore pid=277) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=277) File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=277) return func(*args, **kwargs)
(EngineCore pid=277) ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=277) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 848, in __init__
(EngineCore pid=277) super().__init__(
(EngineCore pid=277) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 124, in __init__
(EngineCore pid=277) kv_cache_config = self._initialize_kv_caches(vllm_config)
(EngineCore pid=277) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=277) File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore pid=277) return func(*args, **kwargs)
(EngineCore pid=277) ^^^^^^^^^^^^^^^^^^^^^
(EngineCore pid=277) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 280, in _initialize_kv_caches
(EngineCore pid=277) self.model_executor.initialize_from_config(kv_cache_configs)
(EngineCore pid=277) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 117, in initialize_from_config
(EngineCore pid=277) self.collective_rpc("initialize_from_config", args=(kv_cache_configs,))
(EngineCore pid=277) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 397, in collective_rpc
(EngineCore pid=277) return aggregate(get_response())
(EngineCore pid=277) ^^^^^^^^^^^^^^
(EngineCore pid=277) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 380, in get_response
(EngineCore pid=277) raise RuntimeError(
(EngineCore pid=277) RuntimeError: Worker failed with error 'Connector LMCacheConnectorV1 does not support HMA but HMA is enabled. Please set --disable-hybrid-kv-cache-manager.', please check the stack trace above for the root cause
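All eight workers abort with the same root cause: LMCacheConnectorV1 does not support the hybrid KV-cache manager (HMA), which this launch left enabled. The error message itself names the remedy: relaunch with `--disable-hybrid-kv-cache-manager`. A minimal sketch of the adjusted launch command, keeping only the model, parallelism, and KV-transfer flags visible in the startup banner above (the remaining flags from the original invocation are elided and would need to be carried over as-is):

```shell
# Relaunch with the hybrid KV-cache manager disabled, as the
# ValueError requests; LMCacheConnectorV1 does not support HMA.
vllm serve nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4 \
    --tensor-parallel-size 8 \
    --disable-hybrid-kv-cache-manager \
    --kv-transfer-config '{"kv_connector": "LMCacheConnectorV1", "kv_role": "kv_both"}'
    # ...plus the remaining flags from the original invocation
```

Note that the original run passed `'disable_hybrid_kv_cache_manager': False` (see the non-default args in the banner), which is what tripped the connector's compatibility check during `initialize_from_config`.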
(APIServer pid=1) Traceback (most recent call last):
(APIServer pid=1) File "/usr/local/bin/vllm", line 10, in <module>
(APIServer pid=1) sys.exit(main())
(APIServer pid=1) ^^^^^^
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/main.py", line 75, in main
(APIServer pid=1) args.dispatch_function(args)
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/serve.py", line 122, in cmd
(APIServer pid=1) uvloop.run(run_server(args))
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 96, in run
(APIServer pid=1) return __asyncio.run(
(APIServer pid=1) ^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
(APIServer pid=1) return runner.run(main)
(APIServer pid=1) ^^^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
(APIServer pid=1) return self._loop.run_until_complete(task)
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 48, in wrapper
(APIServer pid=1) return await main
(APIServer pid=1) ^^^^^^^^^^
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 670, in run_server
(APIServer pid=1) await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 684, in run_server_worker
(APIServer pid=1) async with build_async_engine_client(
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=1) return await anext(self.gen)
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 100, in build_async_engine_client
(APIServer pid=1) async with build_async_engine_client_from_engine_args(
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=1) return await anext(self.gen)
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 136, in build_async_engine_client_from_engine_args
(APIServer pid=1) async_llm = AsyncLLM.from_vllm_config(
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 225, in from_vllm_config
(APIServer pid=1) return cls(
(APIServer pid=1) ^^^^
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 154, in __init__
(APIServer pid=1) self.engine_core = EngineCoreClient.make_async_mp_client(
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(APIServer pid=1) return func(*args, **kwargs)
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 130, in make_async_mp_client
(APIServer pid=1) return AsyncMPClient(*client_args)
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(APIServer pid=1) return func(*args, **kwargs)
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 887, in __init__
(APIServer pid=1) super().__init__(
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 535, in __init__
(APIServer pid=1) with launch_core_engines(
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/lib/python3.12/contextlib.py", line 144, in __exit__
(APIServer pid=1) next(self.gen)
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 998, in launch_core_engines
(APIServer pid=1) wait_for_engine_startup(
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 1057, in wait_for_engine_startup
(APIServer pid=1) raise RuntimeError(
(APIServer pid=1) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
/usr/lib/python3.12/multiprocessing/resource_tracker.py:279: UserWarning: resource_tracker: There appear to be 8 leaked shared_memory objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
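The worker error above names the fix directly: the LMCacheConnectorV1 KV connector does not support the hybrid KV-cache manager (HMA), so the engine must be started with `--disable-hybrid-kv-cache-manager`. A minimal sketch of the adjusted invocation, keeping a representative subset of the non-default args logged at startup (the full flag set from the header would be carried over unchanged in practice):

```shell
# Re-run the same serve command, adding the flag the error message asks for.
# --disable-hybrid-kv-cache-manager turns off the HMA path that
# LMCacheConnectorV1 does not support.
vllm serve nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4 \
  --host 0.0.0.0 \
  --tensor-parallel-size 8 \
  --max-model-len 1048576 \
  --gpu-memory-utilization 0.96 \
  --kv-transfer-config '{"kv_connector": "LMCacheConnectorV1", "kv_role": "kv_both"}' \
  --disable-hybrid-kv-cache-manager
```

Note that the startup log shows `'disable_hybrid_kv_cache_manager': False` among the non-default args, which is consistent with the hybrid manager being active when the worker failed.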