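How do you stress-test NVIDIA GPUs, and which tools need to be deployed first? This article walks through the base environment setup and the installation of the required tooling.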
NVIDIA Data Center GPU Manager (DCGM)
Base environment:
- Ubuntu 22.04, kernel 5.15.0-119-generic
- NVIDIA GPU
- Mellanox
- GitHub access (through a proxy where direct access is restricted)
1. Base environment
1.1 Configure the APT sources
- Back up the existing sources.list (an existing backup is not overwritten)
[ -f /etc/apt/sources.list ] && cp -n /etc/apt/sources.list /etc/apt/sources.list.bak
- Write the Aliyun mirror entries for Ubuntu 22.04 (jammy)
cat <<'EOF' > /etc/apt/sources.list
deb https://mirrors.aliyun.com/ubuntu/ jammy main restricted universe multiverse
deb-src https://mirrors.aliyun.com/ubuntu/ jammy main restricted universe multiverse
deb https://mirrors.aliyun.com/ubuntu/ jammy-security main restricted universe multiverse
deb-src https://mirrors.aliyun.com/ubuntu/ jammy-security main restricted universe multiverse
deb https://mirrors.aliyun.com/ubuntu/ jammy-updates main restricted universe multiverse
deb-src https://mirrors.aliyun.com/ubuntu/ jammy-updates main restricted universe multiverse
# deb https://mirrors.aliyun.com/ubuntu/ jammy-proposed main restricted universe multiverse
# deb-src https://mirrors.aliyun.com/ubuntu/ jammy-proposed main restricted universe multiverse
deb https://mirrors.aliyun.com/ubuntu/ jammy-backports main restricted universe multiverse
deb-src https://mirrors.aliyun.com/ubuntu/ jammy-backports main restricted universe multiverse
EOF
- Refresh the package index
sudo apt update
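The mirror entries above are hard-coded for jammy, so writing them on a different Ubuntu release would leave APT pointing at the wrong suite. A minimal sketch of a guard that could be run before overwriting sources.list is shown here; `codename_matches` is a hypothetical helper name, and the optional second argument exists only so the check can be pointed at a test file instead of /etc/os-release.

```shell
# Hypothetical helper: succeeds only if the release codename equals the expected one.
# $1 = expected codename (e.g. jammy)
# $2 = path to an os-release file (assumed default: /etc/os-release)
codename_matches() {
    local expected="$1"
    local os_release="${2:-/etc/os-release}"
    local actual
    # Source the file inside a subshell so its variables do not leak into the caller.
    actual="$( . "$os_release" && printf '%s' "${VERSION_CODENAME:-}" )"
    [ "$actual" = "$expected" ]
}

# Usage (not executed here):
#   codename_matches jammy || { echo "not an Ubuntu 22.04 (jammy) system" >&2; exit 1; }
```

Running the guard first keeps the backup-and-overwrite steps from being applied to a host that is not on 22.04.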
1.2 Kernel packages
Note: the kernel-related packages must match the version of the currently running kernel exactly.
apt install linux-image-5.15.0-119-generic linux-headers-5.15.0-119-generic linux-tools-5.15.0-119-generic linux-cloud-tools-5.15.0-119-generic
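Since the package names must track the running kernel, one option is to derive them from `uname -r` instead of typing the version by hand. The sketch below assumes the same four package families as the command above; `kernel_packages` is a hypothetical helper name, not part of any tool.

```shell
# Hypothetical helper: prints the four kernel package names for a given release.
# $1 = kernel release string (assumed default: the running kernel from `uname -r`)
kernel_packages() {
    local release="${1:-$(uname -r)}"
    printf 'linux-image-%s linux-headers-%s linux-tools-%s linux-cloud-tools-%s\n' \
        "$release" "$release" "$release" "$release"
}

# Usage (as root, not executed here):
#   apt install $(kernel_packages)
```

On the environment described in this article, `kernel_packages` with no argument would expand to the same names as the explicit command, provided the host is actually booted into 5.15.0-119-generic.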
2. NVIDIA DCGM
- Installation
# 1. Download the CUDA repository keyring package
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
# 2. Install the keyring package (this registers the NVIDIA repository)
dpkg -i cuda-keyring_1.1-1_all.deb
# 3. Refresh the package index so the new repository is picked up
apt update
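# 4. Install the DCGM packages; watch for version dependencies with packages installed later.
# Use `apt-cache madison <package>` to list all available versions.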
apt install datacenter-gpu-manager-4-cuda12
apt install datacenter-gpu-manager-4-multinode-cuda12
apt install datacenter-gpu-manager-4-dev
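To act on the version-dependency note above, it can help to extract just the version column from `apt-cache madison` and then pin the packages explicitly. The following is a small sketch under that assumption; `madison_versions` is a hypothetical helper, and it relies only on the `package | version | source` line layout that madison prints.

```shell
# Hypothetical helper: reads `apt-cache madison` output on stdin and prints
# each distinct version once, preserving the order in which madison listed them.
madison_versions() {
    awk -F'|' 'NF >= 3 {
        gsub(/^[ \t]+|[ \t]+$/, "", $2)   # trim whitespace around the version field
        if (!seen[$2]++) print $2          # skip versions already printed
    }'
}

# Usage (not executed here; <version> is a placeholder to fill in from the list):
#   apt-cache madison datacenter-gpu-manager-4-cuda12 | madison_versions
#   apt install datacenter-gpu-manager-4-cuda12=<version>
```

Pinning all three DCGM packages to versions from the same release is one way to avoid mismatches between them.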


