Verifying MACA 3.2.1 + vLLM 0.11.0 Installation
This guide walks through verifying a vLLM 0.11.0 environment built from source against MACA 3.2.1 and PyTorch 2.6. It documents the system and software specifications, the installation steps, and the verification results, so you can confirm that your own environment is set up correctly before moving on to heavier workloads.
Task Overview
This Level 1 environment verification task focuses on ensuring that the core components are correctly installed and functioning. The primary goal is to validate the installation of vLLM from source using MACA 3.2.1 and a PyTorch 2.6 image. This involves a series of steps, from configuring the environment to running inference tests. Proper validation at this stage is crucial for the stability and performance of subsequent tasks and applications.
- Task Level: Level 1 - Environment and Basic Verification
- Task Description: Verify vLLM source installation using MACA 3.2.1 + PyTorch 2.6 image.
- Submission Date: 2025-12-03
Environment Setup
Before diving into the installation process, it's essential to understand the environment in which vLLM will operate. This includes both system and software specifications. A well-documented environment setup ensures reproducibility and helps in troubleshooting any issues that may arise during or after installation. Below are the detailed specifications of the environment used for this verification.
System Information
The system information provides an overview of the hardware and operating system on which the validation is performed. This includes details about the OS, kernel, and architecture, which are critical for ensuring compatibility and performance.
| Item | Value |
|---|---|
| OS | Ubuntu 24.04.1 LTS |
| Kernel | 5.15.0-58-generic |
| Architecture | x86_64 |
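To record these details reproducibly, a short standard-library Python sketch like the following collects the same information; note that `platform.freedesktop_os_release()` requires Python 3.10 or newer.

```python
import platform

# Gather the OS, kernel, and architecture details reported above.
os_release = platform.freedesktop_os_release()  # Python 3.10+
print(f"OS: {os_release.get('PRETTY_NAME', 'unknown')}")
print(f"Kernel: {platform.release()}")
print(f"Architecture: {platform.machine()}")
```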
MACA Environment
The MACA environment details the versions of the installed MACA components: the MACA stack itself, the mx-smi management tool, the GPU driver, and the GPU BIOS. These components are integral to the functionality of the MetaX GPUs and must be correctly configured.
| Item | Value |
|---|---|
| MACA Version | 3.2.1.10 |
| MX-SMI Version | 2.2.9 |
| Driver Version | 3.0.11 |
| BIOS Version | 1.27.5.0 |
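These values are typically read from MetaX's `mx-smi` management CLI. As a rough illustration (the exact invocation and output format depend on your mx-smi release, so treat this as a sketch rather than a reference), it can be driven from Python like this:

```python
import shutil
import subprocess

# Run mx-smi (MetaX's GPU management CLI) and print its report.
# The default output includes driver and device information; exact
# formatting varies between mx-smi releases.
if shutil.which("mx-smi"):
    result = subprocess.run(["mx-smi"], capture_output=True, text=True)
    print(result.stdout)
else:
    print("mx-smi not found -- is MACA on PATH?")
```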
GPU Information
The GPU information is perhaps the most crucial part of the environment setup, especially for a library like vLLM that heavily relies on GPU resources. It details the GPU model, memory, and status, ensuring that the GPU is correctly recognized and available for computations. The MetaX C500 GPUs are specifically designed for high-performance computing and AI workloads, making them ideal for vLLM.
| Item | Value |
|---|---|
| GPU Model | MetaX C500 |
| Memory | 15.2 GB (sGPU) |
| GPU State | Available |
Python Environment
The Python environment forms the software foundation for vLLM. It includes the Python version, PyTorch version, and CUDA availability. vLLM requires specific versions of these components to function correctly, and any discrepancies can lead to compatibility issues. In this setup, Python 3.10.10 and PyTorch 2.6.0+metax3.2.1.3 are used, with CUDA enabled to leverage GPU acceleration. Ensuring CUDA is available is crucial for GPU-accelerated computations, which significantly speed up vLLM's inference tasks.
| Item | Value |
|---|---|
| Python | 3.10.10 |
| PyTorch | 2.6.0+metax3.2.1.3 |
| CUDA Available | True |
vLLM Version
The versions of vLLM and its MetaX plugin are critical for ensuring compatibility and accessing specific features. The table below outlines the exact versions used in this validation.
| Package Name | Version |
|---|---|
| vllm | 0.11.1.dev0+gb8b302cde.d20251203.empty |
| vllm_metax | 0.11.0+gg8bbcb9.d20251203.maca3.2.1.10.torch2.6 |
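One quick way to confirm these versions after installation is to query the package metadata from Python. The distribution names below are taken from the table above and may need adjusting if your environment registers them differently:

```python
from importlib.metadata import PackageNotFoundError, version

# Print the installed versions of vLLM and the MetaX plugin.
for pkg in ("vllm", "vllm_metax"):
    try:
        print(f"{pkg}: {version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")
```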
Installation Process
The installation process involves several key steps, from configuring environment variables to building vLLM and its MetaX plugin. Each step is crucial for ensuring a smooth and successful installation. Below is a detailed breakdown of the steps taken to install vLLM and its dependencies.
1. Environment Variable Configuration
Configuring environment variables is the first step in setting up the environment for vLLM. These variables define the paths to essential libraries and tools, ensuring that the system can locate and use them correctly. Incorrectly configured environment variables can lead to runtime errors and prevent vLLM from functioning correctly. The following environment variables were configured:
```bash
export MACA_PATH=/opt/maca
export CUCC_PATH=${MACA_PATH}/tools/cu-bridge
export CUDA_PATH=/root/cu-bridge/CUDA_DIR
export CUCC_CMAKE_ENTRY=2
export PATH=${MACA_PATH}/mxgpu_llvm/bin:${MACA_PATH}/bin:${CUCC_PATH}/tools:${CUCC_PATH}/bin:${PATH}
export LD_LIBRARY_PATH=${MACA_PATH}/lib:${MACA_PATH}/ompi/lib:${MACA_PATH}/mxgpu_llvm/lib:${LD_LIBRARY_PATH}
export VLLM_INSTALL_PUNICA_KERNELS=1
```
- `MACA_PATH`: specifies the root directory of the MACA installation.
- `CUCC_PATH`: points to the CUCC (CUDA Compatibility Compiler) cu-bridge tools.
- `CUDA_PATH`: defines the path to the CUDA directory used by cu-bridge.
- `CUCC_CMAKE_ENTRY`: sets the entry point for CUCC CMake.
- `PATH`: adds the MACA and cu-bridge binary directories to the system's executable search path.
- `LD_LIBRARY_PATH`: adds the MACA shared-library directories.
- `VLLM_INSTALL_PUNICA_KERNELS`: enables the installation of the Punica kernels for vLLM.
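As a quick sanity check (a minimal sketch, not part of the official setup), you can confirm from Python that the key variables are set and point at real directories:

```python
import os

# Spot-check that the key MACA variables from step 1 are set and that
# the paths they point to exist.
for var in ("MACA_PATH", "CUCC_PATH", "CUDA_PATH"):
    path = os.environ.get(var)
    status = "ok" if path and os.path.isdir(path) else "MISSING"
    print(f"{var}={path} [{status}]")
```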
2. Cloning and Building vLLM (empty device)
The next step involves cloning the vLLM repository from GitHub and building it for an empty device. This ensures that vLLM is compiled against the correct CUDA and PyTorch versions. The use_existing_torch.py script ensures that vLLM uses the existing PyTorch installation instead of attempting to install its own version. This is crucial for maintaining compatibility with the MACA environment.
```bash
git clone --depth 1 --branch v0.11.0 https://github.com/vllm-project/vllm
cd vllm
python use_existing_torch.py
pip install -r requirements/build.txt
VLLM_TARGET_DEVICE=empty pip install -v . --no-build-isolation
```
- Cloning the vLLM repository from GitHub.
- Navigating into the vllm directory.
- Running use_existing_torch.py to ensure compatibility with the existing PyTorch installation.
- Installing build requirements from requirements/build.txt.
- Installing vLLM with the VLLM_TARGET_DEVICE environment variable set to empty and disabling build isolation.
3. Cloning and Building vLLM-MetaX Plugin
The vLLM-MetaX plugin integrates vLLM with the MetaX hardware, allowing it to leverage the specialized features of the MetaX GPUs. This step involves cloning the vLLM-MetaX repository, installing its dependencies, and initializing the plugin. The use_existing_metax.py script ensures that the plugin is built against the correct MetaX libraries.
```bash
git clone --depth 1 --branch v0.11.0-dev https://github.com/MetaX-MACA/vLLM-metax
cd vLLM-metax
python use_existing_metax.py
pip install -r requirements/build.txt
pip install . -v --no-build-isolation
vllm_metax_init # Initialize the plugin
```
- Cloning the vLLM-MetaX repository from GitHub.
- Navigating into the vLLM-metax directory.
- Running use_existing_metax.py to ensure compatibility with MetaX libraries.
- Installing build requirements from requirements/build.txt.
- Installing the vLLM-MetaX plugin with verbose output and disabling build isolation.
- Initializing the plugin using vllm_metax_init.
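After installation, you can confirm that the plugin is discoverable. vLLM finds platform plugins through the `vllm.platform_plugins` entry-point group (the group name is visible in the import-test log later in this report), so a short sketch like the following should list the MetaX plugin; the `group=` keyword requires Python 3.10+:

```python
from importlib.metadata import entry_points

# List everything registered under vLLM's platform-plugin entry-point
# group; the MetaX plugin should appear here after installation.
for ep in entry_points(group="vllm.platform_plugins"):
    print(f"{ep.name} -> {ep.value}")
```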
Verification Results
After completing the installation, it's essential to verify that all components are functioning correctly. This involves running tests to check PyTorch GPU support, vLLM import, and offline inference. These tests provide confidence that the environment is set up correctly and ready for more complex tasks.
PyTorch GPU Test
The PyTorch GPU test verifies that PyTorch can detect and utilize the MetaX GPU. This is a fundamental check to ensure that GPU acceleration is available for vLLM. The output provides information about the PyTorch version, CUDA availability, device count, device name, and device memory. A simple computation test confirms that the GPU can perform calculations correctly.
```
PyTorch version: 2.6.0+metax3.2.1.3
CUDA available: True
Device count: 1
Device name: MetaX C500
Device memory: 15.2 GB
GPU computation test: PASSED
```
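The report shows only the test output; a minimal sketch that would produce output of this shape, using standard PyTorch APIs (the matrix sizes and tolerance here are arbitrary choices, not taken from the original script), looks like this:

```python
import torch

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"Device count: {torch.cuda.device_count()}")
print(f"Device name: {torch.cuda.get_device_name(0)}")
total = torch.cuda.get_device_properties(0).total_memory
print(f"Device memory: {total / 1024**3:.1f} GB")

# Simple computation test: multiply two matrices on the GPU and
# verify the result against the CPU reference.
a = torch.randn(256, 256, device="cuda")
b = torch.randn(256, 256, device="cuda")
ok = torch.allclose((a @ b).cpu(), a.cpu() @ b.cpu(), atol=1e-3)
print(f"GPU computation test: {'PASSED' if ok else 'FAILED'}")
```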
vLLM Import Test
The vLLM import test checks whether vLLM can be imported successfully into a Python script. This ensures that the vLLM library and its dependencies are correctly installed and accessible. The output also shows the available plugins for vLLM, including the MetaX plugin.
```
INFO: Available plugins for group vllm.platform_plugins:
INFO: - metax -> vllm_metax:register
INFO: Platform plugin metax is activated
vLLM import successful!
```
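A minimal import-test sketch: simply importing vllm triggers plugin discovery, which emits the log lines above.

```python
# Importing vllm runs plugin discovery, producing the
# "Platform plugin metax is activated" log lines shown above.
import vllm

print(f"vLLM version: {vllm.__version__}")
print("vLLM import successful!")
```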
Offline Inference Test
The offline inference test is a more comprehensive validation step that involves running vLLM to generate text from a pre-trained model. This test verifies the entire pipeline, from model loading to text generation. The Qwen3-0.6B model is used for this test, and the output is evaluated to ensure that vLLM is producing coherent and contextually relevant text.
- Model: Qwen3-0.6B
- Test Status: ✅ PASSED
Test Code
The test code uses the vllm library to load the Qwen3-0.6B model and generate text based on a set of prompts. The SamplingParams class is used to configure the text generation parameters, such as temperature and maximum tokens.
```python
from vllm import LLM, SamplingParams

llm = LLM(model='/mnt/moark-models/Qwen3-0.6B', trust_remote_code=True)
sampling_params = SamplingParams(temperature=0.7, top_p=0.95, max_tokens=100)
prompts = [
    'Hello, my name is',
    'The capital of China is',
    'Write a simple Python function to add two numbers:',
]
outputs = llm.generate(prompts, sampling_params)

# Print each prompt with its completion (excerpts shown in the table below).
for output in outputs:
    print(f"Prompt: {output.prompt!r}")
    print(f"Generated: {output.outputs[0].text!r}")
```
Inference Output Example
The inference output provides examples of the text generated by vLLM for each prompt. This output is manually reviewed to ensure the generated text is coherent and contextually relevant.
| Prompt | Generated Text (Excerpt) |
|---|---|
| "Hello, my name is" | "Josh and I'm in the middle of a project..." |
| "The capital of China is" | "the most important and the most significant external factor..." |
| "Write a simple Python function..." | "add_two_numbers(a, b). The function should add a and b..." |
Performance Metrics
Performance metrics provide insights into the efficiency of vLLM. These metrics include the time taken to load the model, the memory used by the KV cache, and the estimated throughput. A high throughput indicates that vLLM is processing requests efficiently. The ~227 tokens/s estimate is consistent with the logged rate of 2.27 prompts/s multiplied by the 100-token output limit per prompt.
- Model loading: 1.12 GiB, 0.89 seconds
- KV cache memory: 10.84 GiB
- GPU KV cache size: 101,440 tokens
- Estimated throughput: ~227 tokens/s output
Key Log Extracts
Key log extracts provide detailed information about the vLLM initialization and inference process. These logs are useful for troubleshooting and understanding the internal workings of vLLM. The log extracts below show the successful initialization of the V1 LLM engine, the use of Flash Attention, and the memory allocation for the KV cache.
```
INFO: Initializing a V1 LLM engine (v0.10.0) with config:
- model='/mnt/moark-models/Qwen3-0.6B'
- dtype=torch.bfloat16
- tensor_parallel_size=1
INFO: Using Maca version of flash attention, which only supports version 2.
INFO: Using Flash Attention backend on V1 engine.
INFO: Loading weights took 0.66 seconds
INFO: Model loading took 1.1201 GiB and 0.889975 seconds
INFO: torch.compile takes 54.81 s in total
INFO: Available KV cache memory: 10.84 GiB
INFO: GPU KV cache size: 101,440 tokens
Processed prompts: 100%|██████████| 3/3 [00:01<00:00, 2.27it/s]
=== vLLM MACA Inference Test PASSED! ===
```
Conclusion
Based on the verification results, the following conclusions can be drawn:
- ✅ MACA 3.2.1 environment installed successfully.
- ✅ PyTorch 2.6 + MACA integration is normal.
- ✅ vLLM v0.11.0 source code installed successfully.
- ✅ vLLM-MetaX MACA plugin installed successfully.
- ✅ MetaX C500 GPU recognition is normal.
- ✅ Offline inference test passed.
Level 1 environment verification task completed successfully!
This validation confirms that the environment is correctly set up and ready for further development and deployment of vLLM-based applications. The successful installation of vLLM and its MetaX plugin, together with the passing verification tests, provides a solid foundation for future tasks.
Attachments
- This report in Markdown format.
- Verification scripts and logs.
For further reading and a deeper understanding of vLLM, see the official **vLLM documentation** at https://docs.vllm.ai.