
NVIDIA [nvbandwidth]

This article is part 8 of the nvidia series.

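How do you stress-test the performance of NVIDIA GPUs, and which tools do you need to deploy? This article covers the base environment and then the build and installation of nvbandwidth, NVIDIA's tool for bandwidth measurements on NVIDIA GPUs.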

NVIDIA [Performance Stress Testing]

Base environment:

  • Ubuntu 22.04, kernel 5.15.0-119-generic
  • NVIDIA GPU
  • Mellanox
  • Network access to GitHub (through a proxy where GitHub is not directly reachable)
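
The environment listed above can be spot-checked before going any further. The following is a minimal sketch and not part of the original setup: the helper names `os_version_id` and `has_pci_vendor` are hypothetical, and it assumes that `/etc/os-release` exists and that `lspci` (from pciutils) is installed.

```shell
#!/usr/bin/env bash
# Preflight sketch. Helper names are hypothetical, not from any NVIDIA tool.

# Print VERSION_ID from an os-release style file (default: /etc/os-release).
os_version_id() {
  local file="${1:-/etc/os-release}"
  sed -n 's/^VERSION_ID="\{0,1\}\([^"]*\)"\{0,1\}$/\1/p' "$file"
}

# Succeed if the lspci-style text on stdin mentions the given vendor.
has_pci_vendor() {
  grep -qi -- "$1"
}

# Example run: each check only reports, it never aborts.
if [ -r /etc/os-release ]; then
  echo "OS version: $(os_version_id)"
fi
if command -v lspci >/dev/null 2>&1; then
  for vendor in NVIDIA Mellanox; do
    if lspci | has_pci_vendor "$vendor"; then
      echo "$vendor device: found"
    else
      echo "$vendor device: not found"
    fi
  done
fi
```

Matching on the vendor string is deliberately loose. It only confirms that a device is visible on the PCI bus, not that its driver is loaded.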

1. Base Environment

1.1 Configure the APT Sources

  1. Back up the existing sources.list (cp -n leaves an existing backup untouched)
[ -f /etc/apt/sources.list ] && cp -n /etc/apt/sources.list /etc/apt/sources.list.bak
  2. Write the Aliyun mirror sources for Ubuntu 22.04 (jammy)
cat <<'EOF' > /etc/apt/sources.list
deb https://mirrors.aliyun.com/ubuntu/ jammy main restricted universe multiverse
deb-src https://mirrors.aliyun.com/ubuntu/ jammy main restricted universe multiverse

deb https://mirrors.aliyun.com/ubuntu/ jammy-security main restricted universe multiverse
deb-src https://mirrors.aliyun.com/ubuntu/ jammy-security main restricted universe multiverse

deb https://mirrors.aliyun.com/ubuntu/ jammy-updates main restricted universe multiverse
deb-src https://mirrors.aliyun.com/ubuntu/ jammy-updates main restricted universe multiverse

# deb https://mirrors.aliyun.com/ubuntu/ jammy-proposed main restricted universe multiverse
# deb-src https://mirrors.aliyun.com/ubuntu/ jammy-proposed main restricted universe multiverse

deb https://mirrors.aliyun.com/ubuntu/ jammy-backports main restricted universe multiverse
deb-src https://mirrors.aliyun.com/ubuntu/ jammy-backports main restricted universe multiverse
EOF
  3. Refresh the package index
sudo apt update
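
After rewriting the sources file it can help to confirm what is actually enabled. The sketch below is illustrative only: `count_active_entries` and `all_entries_use_host` are hypothetical helper names, not part of apt, and the script only reads the file.

```shell
#!/usr/bin/env bash
# Sketch: count enabled APT entries and confirm they point at one mirror host.
# Function names are hypothetical helpers, not part of apt.

# Count non-comment "deb" / "deb-src" lines in a sources.list style file.
count_active_entries() {
  grep -cE '^(deb|deb-src)[[:space:]]' "$1"
}

# Succeed only if every active entry in the file references the given host.
all_entries_use_host() {
  local file="$1" host="$2"
  ! grep -E '^(deb|deb-src)[[:space:]]' "$file" | grep -vqF "//$host/"
}

list="${1:-/etc/apt/sources.list}"
if [ -r "$list" ]; then
  echo "active entries: $(count_active_entries "$list")"
  if all_entries_use_host "$list" "mirrors.aliyun.com"; then
    echo "all entries use mirrors.aliyun.com"
  else
    echo "some entries use a different host"
  fi
fi
```

With the file written in step 2, the count should be 8: four suites, each with a deb and a deb-src line, while the commented jammy-proposed lines are ignored.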

1.2 Kernel Packages

apt install linux-image-5.15.0-119-generic linux-headers-5.15.0-119-generic linux-tools-5.15.0-119-generic linux-cloud-tools-5.15.0-119-generic
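
Installing the kernel packages does not change the running kernel until the machine boots into it. As a small sketch (the `kernel_matches` helper is a hypothetical name for this example), the running release can be compared with the one this article targets:

```shell
#!/usr/bin/env bash
# Sketch: compare the running kernel with the release this article targets.

EXPECTED_KERNEL="5.15.0-119-generic"   # taken from the article's environment

# Succeed if the given kernel release equals the expected one.
kernel_matches() {
  [ "$1" = "$EXPECTED_KERNEL" ]
}

running="$(uname -r)"
if kernel_matches "$running"; then
  echo "running kernel $running matches"
else
  echo "running kernel is $running; reboot into $EXPECTED_KERNEL if it was just installed"
fi
```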

2. nvbandwidth

Dependencies

  1. CUDA toolkit: version 11.x or above. The multinode version requires the 12.3 toolkit and the 550 driver or above.
  2. A compiler that supports C++17. GCC 7.x or above is one option.
  3. CMake 3.20 or above. CMake 3.24 or newer is encouraged.
  4. The Boost program_options library (the install script in the next section pulls it in as libboost-program-options-dev).
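
The minimum versions above can be checked in one pass before building. This is a sketch under assumptions: `version_ge` and `report` are hypothetical helper names, the comparison relies on GNU `sort -V`, and the nvcc version is parsed from the "release X.Y" text that `nvcc --version` prints.

```shell
#!/usr/bin/env bash
# Sketch: report whether the build dependencies meet the minimum versions.

# Succeed if version $1 is greater than or equal to version $2 (GNU sort -V).
version_ge() {
  [ "$(printf '%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Print "name: ok|too old|missing" for a tool, given its detected version.
report() {
  local name="$1" found="$2" min="$3"
  if [ -z "$found" ]; then
    echo "$name: missing (need >= $min)"
  elif version_ge "$found" "$min"; then
    echo "$name: ok ($found >= $min)"
  else
    echo "$name: too old ($found < $min)"
  fi
}

# Detect versions; a missing tool yields an empty string.
cmake_ver="$(cmake --version 2>/dev/null | sed -n '1s/[^0-9.]//gp')"
gcc_ver="$(gcc -dumpfullversion -dumpversion 2>/dev/null || true)"
nvcc_ver="$(nvcc --version 2>/dev/null | sed -n 's/.*release \([0-9.]*\).*/\1/p')"

report cmake "$cmake_ver" 3.20
report gcc   "$gcc_ver"   7.0
report nvcc  "$nvcc_ver"  11.0
```

For a multinode build the nvcc threshold would be 12.3 instead of 11.0. The driver version is not checked here.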

Build and Install

  • Single-node build: cmake ..
  • Multinode build: cmake -DMULTINODE=1 ..
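
The two configure commands above differ only in one flag, so a script can select between them with a variable. A minimal sketch, where `BUILD_MODE` and `cmake_args_for` are hypothetical names introduced for this example; it prints the command rather than running it:

```shell
#!/usr/bin/env bash
# Sketch: choose the CMake arguments from a build mode.
# BUILD_MODE and cmake_args_for are hypothetical names for this example.

cmake_args_for() {
  case "$1" in
    single)    echo ".." ;;
    multinode) echo "-DMULTINODE=1 .." ;;
    *)         echo "unknown build mode: $1" >&2; return 1 ;;
  esac
}

BUILD_MODE="${BUILD_MODE:-single}"
echo "cmake $(cmake_args_for "$BUILD_MODE")"
```

Inside the build directory, the printed command is what would be executed for the chosen mode.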
  1. Create the script nvbandwidth-install.sh
#!/usr/bin/env bash
set -euo pipefail

TAG="v0.8"
REPO="https://github.com/NVIDIA/nvbandwidth.git"
BIN_NAME="nvbandwidth"
PREFIX="/usr/bin"

WORKDIR="$(mktemp -d /tmp/nvbandwidth.XXXXXX)"

cleanup() {
  rm -rf "$WORKDIR"
}
trap cleanup EXIT

echo "==> Working directory: $WORKDIR"

echo "==> Install build dependencies"
apt update
apt install -y \
  build-essential \
  libboost-program-options-dev \
  cmake \
  git

echo "==> Check cmake version"
cmake_ver=$(cmake --version | sed -n '1p' | sed 's/[^0-9.]//g')
if [ "$(printf '%s\n' 3.20 "$cmake_ver" | sort -V | head -n1)" != "3.20" ]; then
  echo "ERROR: cmake >= 3.20 required, current: $cmake_ver"
  exit 1
fi

echo "==> Check CUDA"
if ! command -v nvcc >/dev/null 2>&1; then
  echo "ERROR: nvcc not found (CUDA required)"
  exit 1
fi

echo "==> Clone nvbandwidth tag $TAG"
git clone --branch "$TAG" --depth 1 "$REPO" "$WORKDIR/nvbandwidth"

echo "==> Build"
cd "$WORKDIR/nvbandwidth"
mkdir -p build
cd build
# build the executable for single-node:
cmake ..
# build multinode version of nvbandwidth
# cmake -DMULTINODE=1 ..
make -j"$(nproc)"

echo "==> Install binary to $PREFIX"
install -m 0755 "$BIN_NAME" "$PREFIX/$BIN_NAME"

echo "==> Verify"
"$PREFIX/$BIN_NAME" --help || true

echo "==> Build finished, source cleaned"
  2. Run the script as root (it calls apt and installs the binary into /usr/bin)
bash nvbandwidth-install.sh
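
Once the script has finished, a quick smoke test confirms that the binary runs against the GPUs. The sketch below assumes the `-l` (list testcases) and `-t` (run a testcase) options and the `device_to_device_memcpy_read_ce` testcase described in the nvbandwidth README. `NVBW`, `TESTCASE`, and `run_testcase` are hypothetical names for this example.

```shell
#!/usr/bin/env bash
# Sketch: smoke-test the installed binary. Assumes the -l and -t options
# described in the nvbandwidth README; check `nvbandwidth --help` to confirm.

NVBW="${NVBW:-/usr/bin/nvbandwidth}"   # install path used by the script above
TESTCASE="${TESTCASE:-device_to_device_memcpy_read_ce}"

# Run one testcase, or return 127 if the binary is not installed.
run_testcase() {
  if [ ! -x "$NVBW" ]; then
    echo "nvbandwidth not found at $NVBW; run the install script first" >&2
    return 127
  fi
  "$NVBW" -t "$1"
}

if [ -x "$NVBW" ]; then
  "$NVBW" -l                  # list the available testcases
  run_testcase "$TESTCASE"    # run a single testcase
else
  echo "skipping smoke test: $NVBW is not installed"
fi
```

Running a single testcase first keeps the check short. Running nvbandwidth without `-t` is the way to get the full set of results.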
