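1. Driver preparation
Download the matching driver package from the NVIDIA website (https://nvid.nvidia.com/; signing in requires an NVIDIA account tied to a purchase), or obtain it from a third-party source.
2. Useful NVIDIA links
GPU and driver version compatibility lookup: https://docs.nvidia.com/grid/gpus-supported-by-vgpu.html
The support matrix below is reproduced from that page. Some entries there carry footnoted restrictions, so check the page itself for the details.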
GPU | vGPU 17 | vGPU 16 | vGPU 15 | vGPU 14 | vGPU 13 | vGPU 12 | vGPU 11 | vGPU 10 | vGPU 9 | vGPU 8 | vGPU 7 | vGPU 6 | vGPU 5 | GRID 4 | GRID 3 | GRID 2 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
NVIDIA A800 PCIe 80GB | – | – | ✔ | – | – | – | – | – | – | – | – | – | – | – | – | – |
NVIDIA A800 PCIe 80GB liquid cooled | – | – | ✔ | – | – | – | – | – | – | – | – | – | – | – | – | – |
NVIDIA A800 HGX 80GB | – | – | ✔ | – | – | – | – | – | – | – | – | – | – | – | – | – |
NVIDIA A100 HGX 80GB | – | – | ✔ | ✔ | ✔ | ✔ | – | – | – | – | – | – | – | – | – | – |
NVIDIA A100 PCIe 80GB | – | – | ✔ | ✔ | ✔ | – | – | – | – | – | – | – | – | – | – | – |
NVIDIA A100 PCIe 80GB liquid cooled | – | – | ✔ | – | – | – | – | – | – | – | – | – | – | – | – | – |
NVIDIA A100X | – | – | ✔ | ✔ | – | – | – | – | – | – | – | – | – | – | – | – |
NVIDIA A100 HGX 40GB | – | – | ✔ | ✔ | ✔ | ✔ | ✔ | – | – | – | – | – | – | – | – | – |
NVIDIA A100 PCIe 40GB | – | – | ✔ | ✔ | ✔ | ✔ | ✔ | – | – | – | – | – | – | – | – | – |
NVIDIA A40 | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | – | – | – | – | – | – | – | – | – | – |
NVIDIA A30 | – | – | ✔ | ✔ | ✔ | – | – | – | – | – | – | – | – | – | – | – |
NVIDIA A30X | – | – | ✔ | ✔ | – | – | – | – | – | – | – | – | – | – | – | – |
NVIDIA A16 | ✔ | ✔ | ✔ | ✔ | ✔ | – | – | – | – | – | – | – | – | – | – | – |
NVIDIA A10 | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | – | – | – | – | – | – | – | – | – | – |
NVIDIA A2 | ✔ | ✔ | ✔ | ✔ | – | – | – | – | – | – | – | – | – | – | – | – |
NVIDIA H800 PCIe 80GB | – | – | ✔ | – | – | – | – | – | – | – | – | – | – | – | – | – |
NVIDIA H100 PCIe 80GB | – | – | ✔ | – | – | – | – | – | – | – | – | – | – | – | – | – |
NVIDIA L40S | ✔ | ✔ | – | – | – | – | – | – | – | – | – | – | – | – | – | – |
NVIDIA L40 | ✔ | ✔ | ✔ | – | – | – | – | – | – | – | – | – | – | – | – | – |
NVIDIA L20 | ✔ | ✔ | – | – | – | – | – | – | – | – | – | – | – | – | – | – |
NVIDIA L20 liquid cooled | ✔ | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – |
NVIDIA L4 | ✔ | ✔ | ✔ | – | – | – | – | – | – | – | – | – | – | – | – | – |
NVIDIA L2 | ✔ | ✔ | – | – | – | – | – | – | – | – | – | – | – | – | – | – |
NVIDIA RTX 5000 Ada | ✔ | ✔ | – | – | – | – | – | – | – | – | – | – | – | – | – | – |
NVIDIA RTX 5880 Ada | ✔ | – | – | – | – | – | – | – | – | – | – | – | – | – | – | – |
NVIDIA RTX 6000 Ada | ✔ | ✔ | ✔ | – | – | – | – | – | – | – | – | – | – | – | – | – |
NVIDIA RTX A6000 | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | – | – | – | – | – | – | – | – | – | – |
NVIDIA RTX A5500 | ✔ | ✔ | ✔ | ✔ | – | – | – | – | – | – | – | – | – | – | – | – |
NVIDIA RTX A5000 | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | – | – | – | – | – | – | – | – | – | – |
Quadro RTX 8000 | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | – | – | – | – | – | – |
Quadro RTX 8000 passive | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | – | – | – | – | – | – | – | – |
Quadro RTX 6000 | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | – | – | – | – | – | – |
Quadro RTX 6000 passive | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | – | – | – | – | – | – | – | – |
Tesla V100 | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | – | – | – | – |
Tesla T4 | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | – | – | – | – | – |
Tesla P100 | – | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | – | – | – |
Tesla P40 | – | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | – | – | – |
Tesla P6 | – | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | – | – | – |
Tesla P4 | – | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | – | – | – |
Tesla M60 | – | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ |
Tesla M10 | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | – | – |
Tesla M6 | – | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ |
GRID K2 | – | – | – | – | – | – | – | – | – | – | – | – | – | ✔ | ✔ | ✔ |
GRID K1 | – | – | – | – | – | – | – | – | – | – | – | – | – | ✔ | ✔ | ✔ |
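Server and GPU compatibility lookup: https://www.nvidia.cn/data-center/resources/vgpu-certified-servers/
The most complete collection of NVIDIA vGPU links: http://vgpu.com.cn/ (all related documentation can be found there)
NVIDIA GRID driver version lookup:
https://docs.nvidia.com/grid/get-grid-version.html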
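3. Enable SSH on vSphere
Before installing the vGPU driver on vSphere, enable the SSH service on ESXi. It is needed to transfer the VIB driver package and to run the command-line steps later on.
The ESXi host must be added to vCenter, because some vGPU settings can only be changed from vCenter.
Prerequisites: power off all virtual machines on the ESXi host and put the host into maintenance mode. Enable SR-IOV in the BIOS, then allow SR-IOV in the ESXi PCI hardware configuration. After enabling SR-IOV for the required GPU, reboot the system.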
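The prerequisites above can also be prepared from the ESXi shell. The following is a minimal sketch, not a verified procedure: the helper names (`run`, `vm_ids`, `shutdown_all_vms`, `enter_maintenance_mode`) and the `DRY_RUN` switch are made up for illustration, and `power.shutdown` assumes VMware Tools is running in each guest. By default the script only prints the state-changing commands.

```shell
#!/bin/sh
# Sketch only: prepare an ESXi host before the vGPU driver install.
# DRY_RUN defaults to 1, so state-changing commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Read `vim-cmd vmsvc/getallvms` output on stdin and print the VM IDs
# (numeric first column; the header line is skipped).
vm_ids() {
    awk 'NR > 1 && $1 ~ /^[0-9]+$/ { print $1 }'
}

# Ask every VM on the host to shut down its guest OS.
shutdown_all_vms() {
    vim-cmd vmsvc/getallvms | vm_ids | while read -r id; do
        run vim-cmd vmsvc/power.shutdown "$id"
    done
}

# Put the host into maintenance mode and show the resulting state.
enter_maintenance_mode() {
    run esxcli system maintenanceMode set --enable true
    run esxcli system maintenanceMode get
}
```

On the host, review the printed commands first, then set `DRY_RUN=0` and call `shutdown_all_vms` followed by `enter_maintenance_mode`. The BIOS-side SR-IOV setting still has to be changed in the server firmware.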
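4. Upload the driver package
Use WinSCP to copy the relevant files from the Host directory of the driver archive to /tmp on the ESXi host. The files in that directory differ by driver version. For the long-term support branch vGPU 13.x (recommended for ESXi 7.0, does not support ESXi 8.0), the Host directory contains a single driver package whose name starts with NVD-VGPU. For the newer production branch, vGPU 15.x and later (supports ESXi 8.0), there is also a file whose name starts with nvd-gpu-mgmt-daemon. Do not unzip the NVD-VGPU driver zip file from the Host directory; copy it over as is.
The nvd-gpu-mgmt file was packaged twice in release 15.3, so nvd-gpu-mgmt-daemon_525.125.03-0.0.0000_21816754-package.zip must be unzipped first; then copy the extracted nvd-gpu-mgmt-daemon_525.125.03-0.0.0000_21816754.zip to the host. The install file we need is the one whose name does not end in package. For earlier releases such as 15.2, copy the file over without unzipping; the 16.0 driver can also be used directly. Check the files before installing.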
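The naming rule described above (a file ending in `-package.zip` is an outer wrapper, anything else is copied unchanged) can be checked before uploading. This is a small sketch under that assumption only; the function names `needs_unzip` and `inner_zip_name` are hypothetical and not part of any NVIDIA or VMware tool.

```shell
#!/bin/sh
# Sketch only: decide whether a host package can be copied to ESXi as is.

# Succeeds (exit 0) when the file name ends in "-package.zip", i.e. it is
# the outer wrapper and must be unzipped before uploading.
needs_unzip() {
    case "$1" in
        *-package.zip) return 0 ;;
        *) return 1 ;;
    esac
}

# Print the name of the inner zip expected inside the wrapper:
# the same name with the "-package" suffix removed.
inner_zip_name() {
    printf '%s\n' "${1%-package.zip}.zip"
}
```

For example, `inner_zip_name nvd-gpu-mgmt-daemon_525.125.03-0.0.0000_21816754-package.zip` prints the name of the file that should end up in /tmp. If you prefer a command line over WinSCP, `scp <file> root@<esxi-host>:/tmp/` does the same transfer once SSH is enabled; the host name is a placeholder.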
5. Install the GPU driver on the host
5.1 Before installing the driver, check that the GPU is detected correctly. SSH to the ESXi host and run: lspci | grep NVIDIA
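The detection check can be wrapped so that it fails loudly when no GPU is visible. A minimal sketch, assuming only that NVIDIA devices appear in the `lspci` output with "NVIDIA" somewhere in the line; `count_nvidia` and `check_gpu_present` are illustrative names.

```shell
#!/bin/sh
# Sketch only: confirm that ESXi sees at least one NVIDIA device.

# Count lines mentioning NVIDIA in `lspci` output read from stdin.
# `|| true` keeps the exit status at 0 when the count is zero.
count_nvidia() {
    grep -i -c nvidia || true
}

# Run lspci on the host and report the result; returns 1 if nothing is found.
check_gpu_present() {
    n=$(lspci | count_nvidia)
    if [ "$n" -gt 0 ]; then
        echo "found $n NVIDIA device(s)"
    else
        echo "no NVIDIA device found; check seating, BIOS and PCI settings" >&2
        return 1
    fi
}
```

Calling `check_gpu_present` on the host before the install avoids debugging a driver problem that is really a hardware detection problem.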
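5.2 The server must have SR-IOV and Monitor/Mwait enabled in the BIOS.
Log in to the ESXi web interface, put the host into maintenance mode, then install the driver.
For vGPU 13.x: esxcli software vib install -d /tmp/NVD-VGPU*.zip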
5.3 For vGPU 15.x and later, run two commands:
esxcli software vib install -d /tmp/NVD-VGPU*.zip
esxcli software vib install -d /tmp/nvd-gpu-mgmt-daemon*.zip
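The install commands rely on the wildcard matching exactly one file, and `esxcli software vib install -d` expects an absolute path to the depot zip. The sketch below adds that check before installing. `single_match` and `install_depot` are hypothetical helper names, the sketch assumes paths without spaces, and with the default `DRY_RUN=1` the esxcli command is only printed.

```shell
#!/bin/sh
# Sketch only: install a depot zip after confirming the glob is unambiguous.
DRY_RUN="${DRY_RUN:-1}"

# Expand a glob pattern and print the single matching file.
# Fails when there are zero matches or more than one.
single_match() {
    # Unquoted on purpose so the shell expands the pattern.
    set -- $1
    if [ "$#" -ne 1 ] || [ ! -f "$1" ]; then
        echo "expected exactly one matching file, got: $*" >&2
        return 1
    fi
    printf '%s\n' "$1"
}

# Print (dry run) or execute the esxcli install for one depot pattern.
install_depot() {
    depot=$(single_match "$1") || return 1
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ esxcli software vib install -d $depot"
    else
        esxcli software vib install -d "$depot"
    fi
}
```

On a vGPU 15.x or later host this would be called twice, `install_depot '/tmp/NVD-VGPU*.zip'` and then `install_depot '/tmp/nvd-gpu-mgmt-daemon*.zip'`; on vGPU 13.x only the first call applies. Quoting the pattern lets the helper, rather than the calling shell, do the expansion.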
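5.4 Wait a few minutes until the installation is reported as complete, then switch the GPU to passthrough mode and reboot the host to verify that the driver works. Note: this step is mandatory, otherwise the next steps cannot be carried out. The author learned this the hard way, at real cost.
After the reboot, run nvidia-smi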
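Besides `nvidia-smi`, the installed package and the loaded kernel module can be checked after the reboot. This is a sketch under assumptions: the package name is matched loosely on "nvidia" or "nvd" because the exact VIB name varies by release, and `nvidia_vibs` and `verify_host_driver` are illustrative names.

```shell
#!/bin/sh
# Sketch only: post-reboot checks for the host driver.

# Filter `esxcli software vib list` output (stdin) down to NVIDIA packages.
# `|| true` keeps the exit status at 0 when nothing matches.
nvidia_vibs() {
    grep -i -E 'nvidia|nvd' || true
}

# Run the three checks on the host: installed package, loaded module, GPU status.
verify_host_driver() {
    esxcli software vib list | nvidia_vibs
    vmkload_mod -l | grep -i nvidia
    nvidia-smi
}
```

If `nvidia_vibs` prints nothing or `nvidia-smi` cannot communicate with the driver, recheck the install step before continuing.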
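5.5 Pay special attention to two things: (1) whether the ECC field shows Off; (2) by default vSphere uses vSGA mode rather than vGPU mode, which is why an Xorg entry appears at the bottom of the output. This is changed in a later step.
If the ECC field in the nvidia-smi output shows 0 instead of Off, ECC is currently enabled. Not every vGPU type can be used with ECC. If you are not sure whether your environment needs ECC, disable it first.
5.6 Disable ECC from the command line. Note that enabling or disabling ECC requires a host reboot.
Run the following command to disable ECC, then reboot the host: nvidia-smi -e 0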
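On a host with several GPUs it can help to see which ones have ECC enabled before turning it off. The sketch below assumes the host's `nvidia-smi` supports `--query-gpu` with the `ecc.mode.current` field and prints lines such as `0, Enabled`; `ecc_enabled_gpus` and `disable_ecc` are hypothetical helper names, and with the default `DRY_RUN=1` nothing is changed.

```shell
#!/bin/sh
# Sketch only: find GPUs with ECC enabled and disable ECC on them.
DRY_RUN="${DRY_RUN:-1}"

# Read "index, mode" CSV lines on stdin and print the indexes whose
# ECC mode is exactly "Enabled".
ecc_enabled_gpus() {
    awk -F', *' '$2 == "Enabled" { print $1 }'
}

# Print (dry run) or execute `nvidia-smi -i <index> -e 0` for each index.
disable_ecc() {
    for idx in "$@"; do
        if [ "$DRY_RUN" = "1" ]; then
            echo "+ nvidia-smi -i $idx -e 0"
        else
            nvidia-smi -i "$idx" -e 0
        fi
    done
}
```

A possible use on the host: `nvidia-smi --query-gpu=index,ecc.mode.current --format=csv,noheader | ecc_enabled_gpus` lists the affected indexes, which can then be passed to `disable_ecc`. The change takes effect only after the host reboot.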
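5.7 After the driver is installed, take the host out of maintenance mode.
6. Configure the GPU sharing mode in vCenter
6.1 Log in to vCenter and go to Host > Configure > Graphics. Edit the host graphics settings. The default is Shared, which is vSGA; switch to "Shared Direct" to use vGPU.
6.2 Change the default from Shared to Shared Direct, then restart the Xorg service.
Once the change has taken effect, the Xorg entry no longer appears and vGPU is ready to use.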
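The same switch can be sketched from the ESXi shell instead of the vCenter UI. This assumes the host exposes `esxcli graphics host` and an `/etc/init.d/xorg` service script, which may differ between ESXi releases; `run` and `set_shared_direct` are illustrative names and the default `DRY_RUN=1` only prints the commands.

```shell
#!/bin/sh
# Sketch only: switch the host default graphics type to Shared Direct.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# "SharedPassthru" is the CLI value for the UI option "Shared Direct".
set_shared_direct() {
    run esxcli graphics host set --default-type SharedPassthru
    run /etc/init.d/xorg stop
    run /etc/init.d/xorg start
    run esxcli graphics host get
}
```

After running it with `DRY_RUN=0`, the output of `esxcli graphics host get` should report the new default type, and `nvidia-smi` should no longer list an Xorg process.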
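7. Assign GPU resources to virtual machines
7.1 Edit the virtual machine and add a new PCI device. Choose a Q, B, or A series vGPU according to your license.
The framebuffer size must be consistent: all virtual machines on one host should use the same size, for example all 2Q or all 4Q. Mixing sizes, such as 2Q for VM 1 and 4Q for VM 2, causes GPU resource scheduling problems.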
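The consistency rule is easy to check when the planned profiles are written down as a list. A minimal sketch: `profiles_consistent` is a hypothetical helper, and the profile names in the usage note are illustrative rather than taken from a real host.

```shell
#!/bin/sh
# Sketch only: check that every planned vGPU profile on a host is identical.

# Takes profile names as arguments; succeeds only if all of them are equal.
# Reports the first mismatch on stderr.
profiles_consistent() {
    first=$1
    for p in "$@"; do
        if [ "$p" != "$first" ]; then
            echo "mismatch: $p differs from $first" >&2
            return 1
        fi
    done
    return 0
}
```

For example, `profiles_consistent grid_t4-2q grid_t4-2q` succeeds, while `profiles_consistent grid_t4-2q grid_t4-4q` fails and names the offending profile.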
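7.2 The vGPU types to choose from are as follows:
Each physical GPU supports a different set of vGPU types. vGPU performance is determined mainly by the allocated framebuffer and the vGPU type, that is, the virtual GPU model assigned to the virtual machine. vGPU types come with several suffixes:
1. Q-series vGPU types target designers and power users (vDWS, virtual workstation).
2. B-series vGPU types target power users (vPC, virtual PC).
3. A-series vGPU types target virtual application users (vApps, virtual applications, somewhat similar to remote apps).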
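The suffix list above can be turned into a small lookup. This sketch only covers the three series named in the text and only profile names that end in a digit followed by the series letter; `vgpu_series` is a hypothetical helper and the example profile name is illustrative.

```shell
#!/bin/sh
# Sketch only: map a vGPU profile name to the series described above.

# Looks at the last two characters: a digit followed by q, b or a.
vgpu_series() {
    case "$1" in
        *[0-9][qQ]) echo "Q-series: virtual workstation (vDWS)" ;;
        *[0-9][bB]) echo "B-series: virtual PC (vPC)" ;;
        *[0-9][aA]) echo "A-series: virtual applications (vApps)" ;;
        *) echo "unknown"; return 1 ;;
    esac
}
```

`vgpu_series grid_t4-2q` reports the Q series, which tells you which license type the virtual machine will need. Any other suffix is reported as unknown rather than guessed.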
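8. Install the driver package in the guest OS
8.1 After the vGPU has been attached to the Win10 virtual machine, boot it and install the GPU driver package.
Install the guest driver. Be sure to use the driver of the matching version from the same package, since a different version may cause problems. The Win10 driver package is used as the example here.
After installing the driver, reboot the virtual machine. A black screen in the console view is normal at this point. Log in over Remote Desktop (RDP) instead.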