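- GPU setup on the Windows 10 Ubuntu subsystem (WSL 2) and a machine-learning performance comparison with native Windows 10
- GPU driver, CUDA, and cuDNN installation on the Ubuntu subsystem
- CUDA and cuDNN installation on Windows 10
- TensorFlow installation
- Performance comparison
- Windows 10 results
- Ubuntu subsystem results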
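GPU setup on the Windows 10 Ubuntu subsystem (WSL 2) and a machine-learning performance comparison with native Windows 10
WSL 2 runs a real Linux kernel inside a lightweight utility virtual machine (VM), using current virtualization technology.
This post walks through installing the GPU driver, CUDA, and cuDNN for the Ubuntu subsystem on Windows 10, then compares its machine-learning performance against native Windows 10 on the same GPU.
Both systems use TensorFlow 2.7.0, CUDA 11.2, and cuDNN 8.1.
Hardware: AMD Ryzen 7 5800H CPU, NVIDIA RTX 3050 Ti GPU.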
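GPU driver, CUDA, and cuDNN installation on the Ubuntu subsystem
For installing the subsystem itself, see:
https://blog.csdn.net/qq_33371133/article/details/107955261
Ubuntu GPU driver: this is a pitfall. Do not download and install the Linux GPU driver inside the subsystem. Instead, download and install the NVIDIA driver with WSL and CUDA support on the Windows 10 side; it replaces the existing Windows 10 driver. Download link: https://developer.nvidia.com/cuda/wsl/download
CUDA and cuDNN: follow https://zhuanlan.zhihu.com/p/72298520, skipping its graphics-driver installation steps. Install CUDA 11.2 and cuDNN 8.1.
Check that the driver is installed:
On Windows 10: run nvidia-smi in cmd.
In the Ubuntu subsystem: run nvidia-smi in a terminal.
The two outputs should agree.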
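The check above can also be scripted. The following is a minimal sketch, assuming nvidia-smi is on the PATH and accepts its --query-gpu CSV options; the helper names parse_gpu_query and query_gpus are illustrative and not part of any NVIDIA tool. Running it once on Windows 10 and once in the subsystem, then comparing the printed lines, is one way to confirm both sides see the same GPU and driver.

```python
import shutil
import subprocess


def parse_gpu_query(csv_text):
    """Parse 'name, driver_version' CSV lines into (name, driver) tuples."""
    gpus = []
    for line in csv_text.strip().splitlines():
        parts = [field.strip() for field in line.split(",")]
        if len(parts) == 2:
            gpus.append((parts[0], parts[1]))
    return gpus


def query_gpus():
    """Return [(gpu_name, driver_version), ...], or [] if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"],
        capture_output=True, text=True, check=False,
    )
    if result.returncode != 0:
        return []
    return parse_gpu_query(result.stdout)


if __name__ == "__main__":
    for name, driver in query_gpus():
        print(f"{name}: driver {driver}")
```

An empty result means nvidia-smi was not found or returned an error, which in the subsystem usually points back to the Windows-side driver step.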
CUDA and cuDNN installation on Windows 10
Follow https://blog.csdn.net/qq_37296487/article/details/83028394, skipping the driver installation step.
Install CUDA 11.2 and cuDNN 8.1.
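TensorFlow installation
Install with pip on both Windows 10 and the subsystem:
pip install tensorflow
To get the 2.7.0 release used in this post rather than whatever is newest, pin the version: pip install tensorflow==2.7.0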
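After installing, it helps to confirm that TensorFlow actually sees the GPU before running any benchmark. The sketch below is one way to do that; the function name tf_gpu_report is made up for illustration, and it assumes a TensorFlow 2.x release recent enough to provide tf.sysconfig.get_build_info. The CUDA and cuDNN entries are read with .get because they are only present on GPU builds.

```python
import importlib.util


def tf_gpu_report():
    """Summarize what the installed TensorFlow build can see."""
    if importlib.util.find_spec("tensorflow") is None:
        return {"tensorflow": None}  # TensorFlow is not installed
    import tensorflow as tf

    build = tf.sysconfig.get_build_info()  # dict describing how TF was built
    return {
        "tensorflow": tf.__version__,
        "built_with_cuda": tf.test.is_built_with_cuda(),
        "cuda_version": build.get("cuda_version"),
        "cudnn_version": build.get("cudnn_version"),
        "gpus": [gpu.name for gpu in tf.config.list_physical_devices("GPU")],
    }


if __name__ == "__main__":
    for key, value in tf_gpu_report().items():
        print(f"{key}: {value}")
```

An empty gpus list on either system suggests the driver or the CUDA and cuDNN libraries were not found, and training would silently fall back to the CPU.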
Performance comparison
The task is CNN classification on the CIFAR-10 dataset, trained for 10 epochs.
Both systems run the same code:
import tensorflow as tf
from tensorflow.keras import datasets, layers, models

# Load CIFAR-10 and scale pixel values to [0, 1].
(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()
train_images, test_images = train_images / 255.0, test_images / 255.0

# Label names, for reference only (not used during training).
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']

# Three convolutional layers followed by a small dense classifier.
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10))  # raw logits, one per class
model.summary()

model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

# Train for 10 epochs, validating on the test split after each epoch.
history = model.fit(train_images, train_labels, epochs=10,
                    validation_data=(test_images, test_labels))
Windows 10 results
CPU utilization: 20-30%
GPU utilization peaks at 100%
Epoch 1/10
1563/1563 [==============================] - 10s 5ms/step - loss: 1.4984 - accuracy: 0.4489 - val_loss: 1.2695 - val_accuracy: 0.5335
Epoch 2/10
1563/1563 [==============================] - 6s 4ms/step - loss: 1.1277 - accuracy: 0.5993 - val_loss: 1.1031 - val_accuracy: 0.6096
Epoch 3/10
1563/1563 [==============================] - 6s 4ms/step - loss: 0.9731 - accuracy: 0.6575 - val_loss: 0.9546 - val_accuracy: 0.6614
Epoch 4/10
1563/1563 [==============================] - 6s 4ms/step - loss: 0.8774 - accuracy: 0.6927 - val_loss: 0.9079 - val_accuracy: 0.6830
Epoch 5/10
1563/1563 [==============================] - 7s 4ms/step - loss: 0.8131 - accuracy: 0.7149 - val_loss: 0.8627 - val_accuracy: 0.6948
Epoch 6/10
1563/1563 [==============================] - 6s 4ms/step - loss: 0.7506 - accuracy: 0.7373 - val_loss: 0.8729 - val_accuracy: 0.6972
Epoch 7/10
1563/1563 [==============================] - 6s 4ms/step - loss: 0.7097 - accuracy: 0.7509 - val_loss: 0.8597 - val_accuracy: 0.7012
Epoch 8/10
1563/1563 [==============================] - 6s 4ms/step - loss: 0.6689 - accuracy: 0.7643 - val_loss: 0.8671 - val_accuracy: 0.7026
Epoch 9/10
1563/1563 [==============================] - 6s 4ms/step - loss: 0.6318 - accuracy: 0.7782 - val_loss: 0.8412 - val_accuracy: 0.7122
Epoch 10/10
1563/1563 [==============================] - 6s 4ms/step - loss: 0.5992 - accuracy: 0.7890 - val_loss: 0.8743 - val_accuracy: 0.7061
Ubuntu subsystem results
CPU utilization: 20-30%
GPU utilization peaks at 100%
Roughly the same as on Windows 10.
Epoch 1/10
1563/1563 [==============================] - 11s 5ms/step - loss: 1.5182 - accuracy: 0.4468 - val_loss: 1.3254 - val_accuracy: 0.5321
Epoch 2/10
1563/1563 [==============================] - 7s 4ms/step - loss: 1.1464 - accuracy: 0.5937 - val_loss: 1.1226 - val_accuracy: 0.6122
Epoch 3/10
1563/1563 [==============================] - 7s 5ms/step - loss: 0.9849 - accuracy: 0.6550 - val_loss: 0.9455 - val_accuracy: 0.6695
Epoch 4/10
1563/1563 [==============================] - 7s 4ms/step - loss: 0.8819 - accuracy: 0.6905 - val_loss: 0.9230 - val_accuracy: 0.6782
Epoch 5/10
1563/1563 [==============================] - 7s 4ms/step - loss: 0.8085 - accuracy: 0.7167 - val_loss: 0.8923 - val_accuracy: 0.6935
Epoch 6/10
1563/1563 [==============================] - 7s 5ms/step - loss: 0.7483 - accuracy: 0.7376 - val_loss: 0.8511 - val_accuracy: 0.7101
Epoch 7/10
1563/1563 [==============================] - 7s 5ms/step - loss: 0.6985 - accuracy: 0.7561 - val_loss: 0.8586 - val_accuracy: 0.7066
Epoch 8/10
1563/1563 [==============================] - 7s 4ms/step - loss: 0.6536 - accuracy: 0.7702 - val_loss: 0.8609 - val_accuracy: 0.7061
Epoch 9/10
1563/1563 [==============================] - 7s 5ms/step - loss: 0.6108 - accuracy: 0.7855 - val_loss: 0.8639 - val_accuracy: 0.7188
Epoch 10/10
1563/1563 [==============================] - 7s 4ms/step - loss: 0.5790 - accuracy: 0.7963 - val_loss: 0.8540 - val_accuracy: 0.7163
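Comparison of results
I initially assumed the WSL 2 Ubuntu subsystem would support the GPU poorly, since WSL 1 cannot see the GPU at all. To my surprise, the subsystem can drive the GPU to 100% utilization with only a modest performance loss: averaging the epoch times logged above gives about 7.4 s per epoch in the subsystem versus about 6.5 s on Windows 10, roughly one second slower. I am still looking into the cause and will update this post later.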
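The per-epoch averages can be recomputed directly from the two logs. The snippet below is a small sketch that uses the whole-second epoch durations printed by Keras in the runs above; because the logs round to whole seconds, the resulting means are coarse. The first epoch is also reported separately, on the assumption that it includes one-time start-up cost.

```python
from statistics import mean

# Whole-second epoch durations copied from the two training logs above.
windows_epochs = [10, 6, 6, 6, 7, 6, 6, 6, 6, 6]
wsl_epochs = [11, 7, 7, 7, 7, 7, 7, 7, 7, 7]


def summarize(epoch_seconds):
    """Return (mean over all epochs, mean excluding the first epoch)."""
    return mean(epoch_seconds), mean(epoch_seconds[1:])


win_all, win_steady = summarize(windows_epochs)
wsl_all, wsl_steady = summarize(wsl_epochs)

print(f"Windows 10: {win_all:.1f} s/epoch overall, {win_steady:.1f} s after the first epoch")
print(f"WSL 2:      {wsl_all:.1f} s/epoch overall, {wsl_steady:.1f} s after the first epoch")
print(f"Slowdown:   {(wsl_all / win_all - 1) * 100:.0f}% overall")
```

With these inputs the overall means are 6.5 s on Windows 10 and 7.4 s in the subsystem, a slowdown of about 14%.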
Issue: could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
When running the CNN in the subsystem, TensorFlow printed the following message:
2021-11-25 18:10:44.356599: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:925] could not open file to read NUMA node: /sys/bus/pci/devices/0000:01:00.0/numa_node
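Reference: https://forums.developer.nvidia.com/t/numa-error-running-tensorflow-on-jetson-tx2/56119/4
In short, the reply there says:
Do not swear on the forum.
The NUMA message is a harmless warning.
TensorFlow runs correctly despite the warning.
So it can simply be ignored.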
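If the message is distracting, it can be hidden rather than just ignored. The line in the log above is tagged I (info), so raising TensorFlow's C++ log threshold with the TF_CPP_MIN_LOG_LEVEL environment variable should filter it. This is a sketch only; the helper name quiet_tensorflow_logs is hypothetical, and the variable has to be set before TensorFlow is imported.

```python
import os


def quiet_tensorflow_logs(min_level="1"):
    """Hide TensorFlow C++ log lines below the given level.

    "0" shows everything, "1" hides INFO, "2" also hides WARNING,
    "3" also hides ERROR. Must run before `import tensorflow`.
    """
    os.environ["TF_CPP_MIN_LOG_LEVEL"] = str(min_level)


quiet_tensorflow_logs("1")
# import tensorflow as tf  # import only after the variable is set
```

Level "1" is the mildest setting that drops info lines; higher levels also hide warnings and errors that may matter when debugging a GPU setup.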
Questions are welcome.
Original: https://blog.csdn.net/BNGary/article/details/121539396
Author: 宋甘
Title: GPU driver, CUDA, and cuDNN installation on the Windows 10 Ubuntu subsystem, machine-learning performance comparison with Windows 10 GPU, and the numa_node issue