qgpureset (1) - Linux Manuals
NAME
qgpureset - reset GPU error counts
SYNOPSIS
qgpureset -H host -g gpuid -p -v
DESCRIPTION
The qgpureset command requests that a MOM reset the ECC error counts on one of its Nvidia GPUs. The counts are reset by sending a GPU Control batch request to the batch server. This operation requires PBS Operator or Manager privilege. It also requires that Torque be configured with --enable-nvidia-gpu.
OPTIONS
- -H host
  Specifies the host within the cluster on which the GPU is located. The argument is the name of a host that is a member of the cluster of hosts managed by the server.
- -g gpuid
  Specifies the ID of the GPU.
- -p
  Resets the GPU's permanent ECC error count.
- -v
  Resets the GPU's volatile ECC error count.
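The options above might be combined as in the following sketch. The helper name reset_gpu_ecc, the QGPURESET override variable, the host name node01 and the GPU ID 0 are all hypothetical and used only for illustration. The sketch passes one of -p or -v per call, because this page does not say whether the two may be given together.

```shell
# Hypothetical wrapper around qgpureset (not part of Torque).
# Usage: reset_gpu_ecc HOST GPUID permanent|volatile
# QGPURESET may name a substitute command, e.g. for a dry run.
reset_gpu_ecc() {
    host=$1
    gpuid=$2
    kind=$3
    case "$kind" in
        permanent) flag=-p ;;   # -p: permanent ECC error count
        volatile)  flag=-v ;;   # -v: volatile ECC error count
        *)
            echo "usage: reset_gpu_ecc HOST GPUID permanent|volatile" >&2
            return 2
            ;;
    esac
    # Only the options documented on this page are passed through.
    "${QGPURESET:-qgpureset}" -H "$host" -g "$gpuid" "$flag"
}
```

Calling reset_gpu_ecc node01 0 volatile would then run qgpureset -H node01 -g 0 -v, assuming node01 is a host managed by the server and 0 is a valid GPU ID on it.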
OPERANDS
None
STANDARD ERROR
The qgpureset command writes a diagnostic message to standard error for each error occurrence.
EXIT STATUS
Upon successful processing of all the operands presented to the qgpureset command, the exit status is zero.
If the qgpureset command fails to process any operand, the command exits with a value greater than zero.
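A script can branch on the exit status described above. The following is a minimal sketch, not part of Torque: the function name run_reset and the QGPURESET override variable are hypothetical, and the messages it prints are its own, not output of qgpureset.

```shell
# Hypothetical helper: run qgpureset with the given arguments and
# report the outcome based on the exit status.
# QGPURESET may name a substitute command for testing.
run_reset() {
    if "${QGPURESET:-qgpureset}" "$@"; then
        # Exit status zero: the request was processed successfully.
        echo "reset ok"
    else
        # Exit status greater than zero: the command failed.
        status=$?
        echo "reset failed (exit status $status)" >&2
        return "$status"
    fi
}
```

For example, run_reset -H node01 -g 0 -v (with node01 and 0 as placeholder values) returns the same exit status as the underlying command, so callers can chain it with && or ||.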