How to Install and Configure CUDA on Ubuntu 14.04: Steps and Commands

2015-07-05 20:10:17 Source: 中存储网

First, the system I am installing on is Ubuntu 14.04.1.

1. Pre-installation checks

Check the system as described in reference link 1.

Run the command:

~$ lspci | grep -i nvidia
03:00.0 3D controller: NVIDIA Corporation GK110GL [Tesla K20c] (rev a1)
04:00.0 VGA compatible controller: NVIDIA Corporation GK106GL [Quadro K4000] (rev a1)
04:00.1 Audio device: NVIDIA Corporation GK106 HDMI Audio Controller (rev a1)

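This shows two GPUs, a K20 and a K4000. The Audio device at 04:00.1 is the HDMI audio controller of the K4000 card (it sits at the same PCI address, 04:00), not a separate sound card.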
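
The same reading of the lspci output can be automated. The sketch below is only an illustration: the function name count_nvidia_gpus is made up for this example, and it assumes that GPUs are listed as "VGA compatible controller" or "3D controller", as they are in the output above.

```shell
# Count NVIDIA GPUs in lspci-style text read from stdin.
# Only "VGA compatible controller" and "3D controller" entries are counted;
# "Audio device" entries are the HDMI audio function of a card.
# (count_nvidia_gpus is a hypothetical helper name, not part of any tool.)
count_nvidia_gpus() {
    grep -i 'nvidia' | grep -c -E 'VGA compatible controller|3D controller'
}

# Typical use on a live system (needs lspci from pciutils):
#   lspci | count_nvidia_gpus
```

For the output shown above, it prints 2.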

Next, check the architecture and the distribution release:

~$ uname -m && cat /etc/*release
x86_64
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=14.04
DISTRIB_CODENAME=trusty
DISTRIB_DESCRIPTION="Ubuntu 14.04.1 LTS"
NAME="Ubuntu"
VERSION="14.04.1 LTS, Trusty Tahr"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 14.04.1 LTS"
VERSION_ID="14.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"

As you can see, the machine runs 64-bit (x86_64) Ubuntu 14.04.

Then use gcc --version to check whether the gcc version meets the requirements listed in link 1:

~$ gcc --version
gcc (Ubuntu 4.8.2-19ubuntu1) 4.8.2
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
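
If you would rather compare the version in a script than by eye, something like the following might do. It is a sketch under assumptions: both function names are invented here, the comparison relies on GNU sort -V, and the REQUIRED value in the usage comment is a placeholder that has to be replaced with the number from the table in link 1.

```shell
# Extract the version number from the first line of `gcc --version` output
# read from stdin (the last field, e.g. "4.8.2").
gcc_version_from_banner() {
    head -n 1 | awk '{ print $NF }'
}

# Succeed if dotted version $1 is greater than or equal to $2.
# Relies on GNU sort -V for version ordering.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Usage on a live system. REQUIRED is a placeholder, not a value from
# the article; take the real minimum from reference link 1:
#   REQUIRED=4.8.2
#   version_ge "$(gcc --version | gcc_version_from_banner)" "$REQUIRED" && echo "gcc OK"
```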

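With the checks done, go to NVIDIA's website (reference link 3) and download CUDA, which also provides the driver. I downloaded the deb package for Ubuntu 14.04.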

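2. Installation
Installing from the deb package is fairly simple. The installer warns that the result may be unstable, but I have not run into any errors while using it.

First, install the required libraries as described in reference link 2.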

sudo apt-get install freeglut3-dev build-essential libx11-dev libxmu-dev libxi-dev libgl1-mesa-glx libglu1-mesa libglu1-mesa-dev

Then follow the procedure from the official site.

$ sudo dpkg -i cuda-repo-<distro>_<version>_<architecture>.deb
$ sudo apt-get update
$ sudo apt-get install cuda

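The download may take a long time, but that is fine; just let it run.

If no errors come up, the installation is done.

During installation, the following message is printed: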

*** Please reboot your computer and verify that the nvidia graphics driver is loaded. ***
*** If the driver fails to load, please use the NVIDIA graphics driver .run installer ***
*** to get into a stable state.

I ignored it. The message suggests that the .run installer gives a more stable result, but my setup works without problems.

3. Configuring the environment

My system is 64-bit, so I added the following lines to .bashrc:

export PATH=/usr/local/cuda-6.5/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-6.5/lib64:$LD_LIBRARY_PATH

After editing the file, run

~$ source .bashrc

so that the change takes effect immediately.
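
One side effect of plain export lines is that every extra source .bashrc prepends the same directories again, and an empty LD_LIBRARY_PATH ends up with a trailing colon. A possible variant for .bashrc that avoids both is sketched below; prepend_once and CUDA_HOME are names chosen for this example only, and the prefix is the /usr/local/cuda-6.5 path used in this article.

```shell
# Prepend directory $2 to the colon-separated list $1 unless it is already
# in the list; print the resulting list. (prepend_once is a hypothetical
# helper written for this example.)
prepend_once() {
    case ":$1:" in
        *":$2:"*) printf '%s\n' "$1" ;;
        *)        printf '%s\n' "$2${1:+:$1}" ;;
    esac
}

CUDA_HOME=/usr/local/cuda-6.5   # install prefix used in this article
PATH=$(prepend_once "$PATH" "$CUDA_HOME/bin")
LD_LIBRARY_PATH=$(prepend_once "${LD_LIBRARY_PATH:-}" "$CUDA_HOME/lib64")
export PATH LD_LIBRARY_PATH
```

Sourcing the file several times then leaves PATH and LD_LIBRARY_PATH unchanged after the first time.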

4. Installing the samples

Once the environment is configured, you can run:

$ cuda-install-samples-6.5.sh <dir>

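This copies the CUDA samples into the directory dir (into an NVIDIA_CUDA-6.5_Samples subdirectory). The command does nothing more than copy files.

Then enter that directory and run make to build the samples. The build takes quite a while, so be prepared to wait.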
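
Since the build is slow, running make with several parallel jobs may shorten the wait. This is a sketch rather than something from the original procedure: build_samples is a made-up helper, nproc comes from GNU coreutils, and I am assuming the samples Makefile tolerates make's standard -j option.

```shell
# Build the samples in directory $1 with one make job per CPU core.
# (build_samples is a hypothetical helper; -j and -C are standard make options.)
build_samples() {
    dir=$1
    if [ ! -f "$dir/Makefile" ]; then
        echo "no Makefile found in $dir" >&2
        return 1
    fi
    jobs=$(nproc 2>/dev/null || echo 1)   # fall back to a single job
    make -C "$dir" -j"$jobs"
}

# Example, using the samples directory from this article:
#   build_samples ~/install/NVIDIA_CUDA-6.5_Samples
```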

5. Verifying the installation

5.1. Driver verification

First, verify that the NVIDIA driver was installed successfully.

~$ cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module  340.29  Thu Jul 31 20:23:19 PDT 2014
GCC version:  gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1)
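
To pull just the driver version number out of that file, for example to log it, a small sed filter may be enough. The function name driver_version is invented for this sketch, and it assumes the "Kernel Module  <version>" layout shown in the output above.

```shell
# Print the driver version from /proc/driver/nvidia/version-style text on
# stdin, i.e. the number that follows "Kernel Module".
# (driver_version is a hypothetical helper name.)
driver_version() {
    sed -n 's/.*Kernel Module  *\([0-9][0-9.]*\).*/\1/p'
}

# Usage on a machine with the driver loaded:
#   driver_version < /proc/driver/nvidia/version
```

For the output shown above, it prints 340.29.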

5.2. Toolkit verification

Verify that the CUDA toolkit was installed successfully.

~$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2014 NVIDIA Corporation
Built on Thu_Jul_17_21:41:27_CDT_2014
Cuda compilation tools, release 6.5, V6.5.12
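
The release number can also be extracted from this output, which helps confirm that the nvcc found on PATH is the 6.5 one and not an older install. A sketch, with nvcc_release as a made-up function name and the "release X.Y," wording assumed from the output above:

```shell
# Print the toolkit release (e.g. "6.5") from `nvcc -V` output on stdin.
# (nvcc_release is a hypothetical helper name.)
nvcc_release() {
    sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}

# Usage on a live system:
#   [ "$(nvcc -V | nvcc_release)" = "6.5" ] && echo "toolkit 6.5 is on PATH"
```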

5.3. Device detection

Use deviceQuery, which was built together with the CUDA samples, to verify. deviceQuery is located in <cuda_sample_install_path>/bin/x86_64/linux/release. My output is shown below; both GPUs were detected.

~/install/NVIDIA_CUDA-6.5_Samples/bin/x86_64/linux/release$ ./deviceQuery
./deviceQuery Starting...
 CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 2 CUDA Capable device(s)
 
Device 0: "Tesla K20c"
CUDA Driver Version / Runtime Version          6.5 / 6.5
CUDA Capability Major/Minor version number:    3.5
Total amount of global memory:                4800 MBytes (5032706048 bytes)
(13) Multiprocessors, (192) CUDA Cores/MP:    2496 CUDA Cores
GPU Clock rate:                                706 MHz (0.71 GHz)
Memory Clock rate:                            2600 Mhz
Memory Bus Width:                              320-bit
L2 Cache Size:                                1310720 bytes
Maximum Texture Dimension Size (x,y,z)        1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
Total amount of constant memory:              65536 bytes
Total amount of shared memory per block:      49152 bytes
Total number of registers available per block: 65536
Warp size:                                    32
Maximum number of threads per multiprocessor:  2048
Maximum number of threads per block:          1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch:                          2147483647 bytes
Texture alignment:                            512 bytes
Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
Run time limit on kernels:                    No
Integrated GPU sharing Host Memory:            No
Support host page-locked memory mapping:      Yes
Alignment requirement for Surfaces:            Yes
Device has ECC support:                        Enabled
Device supports Unified Addressing (UVA):      Yes
Device PCI Bus ID / PCI location ID:          3 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
 
Device 1: "Quadro K4000"
CUDA Driver Version / Runtime Version          6.5 / 6.5
CUDA Capability Major/Minor version number:    3.0
Total amount of global memory:                3071 MBytes (3220504576 bytes)
( 4) Multiprocessors, (192) CUDA Cores/MP:    768 CUDA Cores
GPU Clock rate:                                811 MHz (0.81 GHz)
Memory Clock rate:                            2808 Mhz
Memory Bus Width:                              192-bit
L2 Cache Size:                                393216 bytes
Maximum Texture Dimension Size (x,y,z)        1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
Total amount of constant memory:              65536 bytes
Total amount of shared memory per block:      49152 bytes
Total number of registers available per block: 65536
Warp size:                                    32
Maximum number of threads per multiprocessor:  2048
Maximum number of threads per block:          1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch:                          2147483647 bytes
Texture alignment:                            512 bytes
Concurrent copy and kernel execution:          Yes with 1 copy engine(s)
Run time limit on kernels:                    Yes
Integrated GPU sharing Host Memory:            No
Support host page-locked memory mapping:      Yes
Alignment requirement for Surfaces:            Yes
Device has ECC support:                        Disabled
Device supports Unified Addressing (UVA):      Yes
Device PCI Bus ID / PCI location ID:          4 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
> Peer access from Tesla K20c (GPU0) -> Quadro K4000 (GPU1) : No
> Peer access from Quadro K4000 (GPU1) -> Tesla K20c (GPU0) : No
 
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 6.5, CUDA Runtime Version = 6.5, NumDevs = 2, Device0 = Tesla K20c, Device1 = Quadro K4000
Result = PASS
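
The two lines that matter in this long output are "Detected N CUDA Capable device(s)" and "Result = PASS". A scripted check of both could look like the sketch below; device_query_ok is a name invented for this example, and the line formats are assumed to match the output above.

```shell
# Succeed if deviceQuery output on stdin reports exactly $1 devices and
# ends with "Result = PASS". (device_query_ok is a hypothetical helper.)
device_query_ok() {
    expected=$1
    awk -v want="$expected" '
        /^Detected [0-9]+ CUDA Capable device/ { found = $2 }
        /^Result = PASS/                       { pass = 1 }
        END { exit !(pass && found == want) }
    '
}

# Usage from the release directory, expecting two GPUs:
#   ./deviceQuery | device_query_ok 2 && echo "both GPUs visible"
```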

With that, CUDA is installed successfully.

References
1.http://docs.nvidia.com/cuda/cuda-getting-started-guide-for-linux/index.html
3.https://developer.nvidia.com/cuda-downloads