qgpureset (1) - Linux Manuals

NAME


 qgpureset - reset GPU error counts

SYNOPSIS


 qgpureset -H host -g gpuid -p -v

DESCRIPTION

The qgpureset command requests that a MOM reset the ECC error counts on one of its NVIDIA GPUs. The error counts are reset by sending a GPU Control batch request to the batch server.

Resetting the GPU error counts requires PBS Operator or Manager privilege. It also requires that Torque be configured with --enable-nvidia-gpu.

OPTIONS

-H host
Specifies the host within the cluster on which the GPU is located. The argument is the name of a host that is a member of the cluster of hosts managed by the server.
-g gpuid
Specifies the ID of the GPU.
-p
Resets the GPU's permanent ECC error count.
-v
Resets the GPU's volatile ECC error count.
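
The options above might be combined as in the following sketch. The host name node01 and GPU ID 0 are hypothetical placeholders, not values taken from this manual. The script only builds and prints the command lines so that they can be reviewed before being run with Operator or Manager privilege.

```shell
#!/bin/sh
# Hypothetical values: replace with a real host name and GPU ID from your cluster.
host="node01"
gpuid=0

# Command line that resets the volatile ECC error count (-v).
reset_volatile="qgpureset -H $host -g $gpuid -v"

# Command line that resets the permanent ECC error count (-p).
reset_permanent="qgpureset -H $host -g $gpuid -p"

# Print the commands for review instead of executing them.
echo "$reset_volatile"
echo "$reset_permanent"
```

Printing the commands first is a cautious choice, since a reset error count cannot be read back afterwards.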

OPERANDS

None

STANDARD ERROR

The qgpureset command writes a diagnostic message to standard error for each error occurrence.

EXIT STATUS

Upon successful processing of the request, the qgpureset command exits with a value of zero.

If the qgpureset command fails to process the request, it exits with a value greater than zero.
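
A script can act on this exit status. The following is a minimal sketch, assuming a POSIX shell. The helper name report_reset, the host node01, and GPU ID 0 are hypothetical and are not part of Torque.

```shell
#!/bin/sh
# report_reset is a hypothetical helper, not part of Torque.
# It runs the command given as its arguments and reports on the exit status.
report_reset() {
    if "$@"; then
        # Exit status zero: the command succeeded.
        echo "reset ok"
    else
        # Exit status greater than zero: the command failed.
        status=$?
        echo "reset failed (exit status $status)" >&2
        return "$status"
    fi
}

# Example call (needs a Torque installation configured with --enable-nvidia-gpu;
# node01 and 0 are placeholder values):
# report_reset qgpureset -H node01 -g 0 -p
```

The helper passes the original exit status back to its caller, so it can be used inside larger maintenance scripts without hiding failures.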

SEE ALSO

pbs_mom(8B) and pbs_server(8B)