perf_event_open (2) - Linux Manuals
NAME
perf_event_open - set up performance monitoring
SYNOPSIS
#include <linux/perf_event.h>
#include <linux/hw_breakpoint.h>

int perf_event_open(struct perf_event_attr *attr,
                    pid_t pid, int cpu, int group_fd,
                    unsigned long flags);
Note: There is no glibc wrapper for this system call; see NOTES.
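Because glibc provides no wrapper, callers typically invoke the raw system call through syscall(2). The sketch below follows the common convention of naming the wrapper after the system call. The helper init_cpu_clock_attr is hypothetical and chooses a software CPU-clock event limited to user space purely for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/perf_event.h> /* struct perf_event_attr, PERF_* constants */
#include <string.h>           /* memset() */
#include <sys/syscall.h>      /* SYS_perf_event_open */
#include <sys/types.h>        /* pid_t */
#include <unistd.h>           /* syscall() */

/* Thin wrapper around the raw system call, since glibc provides none.
 * Returns a new file descriptor, or -1 with errno set. */
static long perf_event_open(struct perf_event_attr *attr, pid_t pid,
                            int cpu, int group_fd, unsigned long flags)
{
    return syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/* Hypothetical helper: zero the structure, record its size, and request a
 * software CPU-clock counter restricted to user space. */
void init_cpu_clock_attr(struct perf_event_attr *attr)
{
    memset(attr, 0, sizeof *attr);
    attr->size = sizeof *attr;       /* tells the kernel which attr layout is in use */
    attr->type = PERF_TYPE_SOFTWARE;
    attr->config = PERF_COUNT_SW_CPU_CLOCK;
    attr->exclude_kernel = 1;        /* user space only, so fewer privileges are needed */
}
```

A caller would then use, for example, perf_event_open(&attr, 0, -1, -1, 0) to measure the calling thread on any CPU.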
DESCRIPTION
Given a list of parameters, perf_event_open() returns a file descriptor for use in subsequent system calls (read(2), mmap(2), prctl(2), fcntl(2), etc.).
A call to perf_event_open() creates a file descriptor that allows measuring performance information. Each file descriptor corresponds to one event that is measured; these can be grouped together to measure multiple events simultaneously.
Events can be enabled and disabled in two ways: via ioctl(2) and via prctl(2). When an event is disabled, it does not count or generate overflows, but it continues to exist and maintains its count value.
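The prctl(2) route acts on all events created by the calling process at once, using the PR_TASK_PERF_EVENTS_DISABLE and PR_TASK_PERF_EVENTS_ENABLE operations. The following is a minimal sketch; the helper name prctl_pause_demo and the choice of a software task-clock event are illustrative assumptions. It checks the behavior described above: once disabled, the event keeps its value but stops counting.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/prctl.h>        /* prctl(), PR_TASK_PERF_EVENTS_* */
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

static long perf_event_open(struct perf_event_attr *attr, pid_t pid,
                            int cpu, int group_fd, unsigned long flags)
{
    return syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}

static void spin(void)
{
    for (volatile unsigned long i = 0; i < 10000000UL; i++)
        ;                     /* burn some CPU time */
}

/* Hypothetical helper: opens an enabled task-clock counter, pauses all of
 * this process's events with prctl(2), and reads the value right after
 * pausing and again after more work. Returns 0 on success, -1 on error. */
int prctl_pause_demo(uint64_t *paused, uint64_t *later)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof attr);
    attr.size = sizeof attr;
    attr.type = PERF_TYPE_SOFTWARE;
    attr.config = PERF_COUNT_SW_TASK_CLOCK;  /* nanoseconds of CPU time */
    attr.exclude_kernel = 1;
    /* attr.disabled stays 0, so the event starts counting immediately. */

    int fd = (int)perf_event_open(&attr, 0, -1, -1, 0);
    if (fd == -1)
        return -1;

    int rc = -1;
    spin();
    /* Disables every event created by this process, not only fd. */
    if (prctl(PR_TASK_PERF_EVENTS_DISABLE, 0, 0, 0, 0) == -1)
        goto out;
    if (read(fd, paused, sizeof *paused) != (ssize_t)sizeof *paused)
        goto out_enable;
    spin();                                  /* not counted while disabled */
    if (read(fd, later, sizeof *later) != (ssize_t)sizeof *later)
        goto out_enable;
    rc = 0;
out_enable:
    prctl(PR_TASK_PERF_EVENTS_ENABLE, 0, 0, 0, 0);
out:
    close(fd);
    return rc;
}
```

If the call succeeds, *paused and *later should be equal, since the disabled event retains its count without advancing.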
Events come in two flavors: counting and sampling. A counting event is used to count the aggregate number of events that occur; in general, counting event results are gathered with a read(2) call. A sampling event periodically writes measurements to a buffer that can then be accessed via mmap(2).
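A counting event typically follows this pattern: open it disabled, reset and enable it with ioctl(2), run the code of interest, disable it, and collect the total with read(2). The sketch below assumes a hardware instructions event; the helper name count_loop_instructions and the busy loop are illustrative. Hardware events may be unavailable (for example, in virtual machines without a virtual PMU), in which case the helper returns -1.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>        /* ioctl(), PERF_EVENT_IOC_* via perf_event.h */
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

static long perf_event_open(struct perf_event_attr *attr, pid_t pid,
                            int cpu, int group_fd, unsigned long flags)
{
    return syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/* Hypothetical helper: counts user-space instructions retired by the
 * calling thread during a busy loop. Returns 0 and stores the count,
 * or -1 on error. */
int count_loop_instructions(uint64_t *count)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof attr);
    attr.size = sizeof attr;
    attr.type = PERF_TYPE_HARDWARE;
    attr.config = PERF_COUNT_HW_INSTRUCTIONS;
    attr.disabled = 1;        /* create stopped; start explicitly below */
    attr.exclude_kernel = 1;  /* user space only */
    attr.exclude_hv = 1;

    int fd = (int)perf_event_open(&attr, 0, -1, -1, 0);  /* this thread, any CPU */
    if (fd == -1)
        return -1;

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    for (volatile unsigned long i = 0; i < 1000000UL; i++)
        ;                     /* the work being measured */
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    /* With read_format == 0, read(2) returns a single u64 value. */
    ssize_t n = read(fd, count, sizeof *count);
    close(fd);
    return n == (ssize_t)sizeof *count ? 0 : -1;
}
```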
Arguments
The pid and cpu arguments allow specifying which process and CPU to monitor:
- pid == 0 and cpu == -1
- This measures the calling process/thread on any CPU.
- pid == 0 and cpu >= 0
- This measures the calling process/thread only when running on the specified CPU.
- pid > 0 and cpu == -1
- This measures the specified process/thread on any CPU.
- pid > 0 and cpu >= 0
- This measures the specified process/thread only when running on the specified CPU.
- pid == -1 and cpu >= 0
- This measures all processes/threads on the specified CPU. This requires CAP_PERFMON (since Linux 5.8) or CAP_SYS_ADMIN capability or a /proc/sys/kernel/perf_event_paranoid value of less than 1.
- pid == -1 and cpu == -1
- This setting is invalid and will return an error.
When pid is greater than zero, permission to perform this system call is governed by CAP_PERFMON (since Linux 5.9) and a ptrace access mode PTRACE_MODE_READ_REALCREDS check on older Linux versions; see ptrace(2).
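The pid/cpu combinations and the privilege rules above can be exercised with a small sketch. The helper names open_cpu_clock and read_perf_event_paranoid are hypothetical, and the software CPU-clock event is chosen only because it works on machines without hardware counters. Which calls succeed depends on the capabilities of the caller and the current perf_event_paranoid setting.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/perf_event.h>
#include <stdio.h>            /* fopen(), fscanf() */
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

static long perf_event_open(struct perf_event_attr *attr, pid_t pid,
                            int cpu, int group_fd, unsigned long flags)
{
    return syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/* Hypothetical helper: reads /proc/sys/kernel/perf_event_paranoid.
 * Returns 0 and stores the level, or -1 if it cannot be read. */
int read_perf_event_paranoid(int *level)
{
    FILE *f = fopen("/proc/sys/kernel/perf_event_paranoid", "r");
    if (f == NULL)
        return -1;
    int ok = fscanf(f, "%d", level) == 1;
    fclose(f);
    return ok ? 0 : -1;
}

/* Hypothetical helper: opens a user-space CPU-clock counter for a pid/cpu pair.
 *   (0, -1)   calling thread on any CPU
 *   (pid, -1) that thread on any CPU; subject to CAP_PERFMON or a ptrace check
 *   (-1, cpu) every thread on one CPU; needs CAP_PERFMON, CAP_SYS_ADMIN,
 *             or a paranoid level below 1
 *   (-1, -1)  invalid, always fails
 * Returns the event fd, or -1 with errno set. */
int open_cpu_clock(pid_t pid, int cpu)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof attr);
    attr.size = sizeof attr;
    attr.type = PERF_TYPE_SOFTWARE;
    attr.config = PERF_COUNT_SW_CPU_CLOCK;
    attr.exclude_kernel = 1;
    return (int)perf_event_open(&attr, pid, cpu, -1, 0);
}
```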
The group_fd argument allows event groups to be created. An event group has one event which is the group leader. The leader is created first, with group_fd = -1. The rest of the group members are created with subsequent perf_event_open() calls with group_fd set to the file descriptor of the group leader. (A single event on its own is created with group_fd = -1 and is considered to be a group with only one member.) An event group is scheduled onto the CPU as a unit: it will be put onto the CPU only if all of the events in the group can be put onto the CPU. This means that the values of the member events can be meaningfully compared with each other (added, divided to get ratios, and so on), since they have counted events for the same set of executed instructions.
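As an illustration of grouping, the sketch below makes CPU cycles the leader and instructions a member, so that their ratio (instructions per cycle) compares counts over the same execution. It relies on read_format = PERF_FORMAT_GROUP, a perf_event_attr field, so that a single read(2) on the leader returns every member's value. It also uses PERF_IOC_FLAG_GROUP so each ioctl(2) applies to the whole group. The helper names open_hw and measure_ipc_group are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

static long perf_event_open(struct perf_event_attr *attr, pid_t pid,
                            int cpu, int group_fd, unsigned long flags)
{
    return syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/* Opens one user-space hardware event for the calling thread. */
static int open_hw(uint64_t config, int group_fd, int start_disabled)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof attr);
    attr.size = sizeof attr;
    attr.type = PERF_TYPE_HARDWARE;
    attr.config = config;
    attr.disabled = start_disabled;         /* only the leader starts disabled */
    attr.exclude_kernel = 1;
    attr.exclude_hv = 1;
    attr.read_format = PERF_FORMAT_GROUP;   /* one read(2) returns all members */
    return (int)perf_event_open(&attr, 0, -1, group_fd, 0);
}

/* Returns 0 and stores both counts, or -1 on error. */
int measure_ipc_group(uint64_t *cycles, uint64_t *instructions)
{
    int leader = open_hw(PERF_COUNT_HW_CPU_CYCLES, -1, 1);   /* group_fd = -1 */
    if (leader == -1)
        return -1;
    int member = open_hw(PERF_COUNT_HW_INSTRUCTIONS, leader, 0);
    if (member == -1) {
        close(leader);
        return -1;
    }

    ioctl(leader, PERF_EVENT_IOC_RESET, PERF_IOC_FLAG_GROUP);
    ioctl(leader, PERF_EVENT_IOC_ENABLE, PERF_IOC_FLAG_GROUP);
    for (volatile unsigned long i = 0; i < 1000000UL; i++)
        ;                                   /* the work being measured */
    ioctl(leader, PERF_EVENT_IOC_DISABLE, PERF_IOC_FLAG_GROUP);

    /* Layout with only PERF_FORMAT_GROUP: { u64 nr; u64 values[nr]; },
     * leader first, then members in creation order. */
    uint64_t buf[3];
    ssize_t n = read(leader, buf, sizeof buf);
    close(member);
    close(leader);
    if (n != (ssize_t)sizeof buf || buf[0] != 2)
        return -1;
    *cycles = buf[1];
    *instructions = buf[2];
    return 0;
}
```

Dividing *instructions by *cycles then gives an IPC figure for the loop, as long as cycles is nonzero.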
The flags argument is formed by ORing together zero or more of the following values:
- PERF_FLAG_FD_CLOEXEC (since Linux 3.14)
- This flag enables the close-on-exec flag for the created event file descriptor, so that the file descriptor is automatically closed on execve(2). Setting the close-on-exec flag at creation time, rather than later with fcntl(2), avoids potential race conditions where the calling thread invokes perf_event_open() and fcntl(2) at the same time as another thread calls fork(2) then execve(2).
- PERF_FLAG_FD_NO_GROUP
- This flag tells the event to ignore the group_fd parameter except for the purpose of setting up output redirection using the PERF_FLAG_FD_OUTPUT flag.
- PERF_FLAG_FD_OUTPUT (broken since Linux 2.6.35)
- This flag re-routes the event's sampled output to instead be included in the mmap buffer of the event specified by group_fd.
- PERF_FLAG_PID_CGROUP (since Linux 2.6.39)
- This flag activates per-container system-wide monitoring. A container is an abstraction that isolates a set of resources for finer-grained control (CPUs, memory, etc.). In this mode, the event is measured only if the thread running on the monitored CPU belongs to the designated container (cgroup). The cgroup is identified by passing a file descriptor opened on its directory in the cgroupfs filesystem. For instance, if the cgroup to monitor is called test, then a file descriptor opened on /dev/cgroup/test (assuming cgroupfs is mounted on /dev/cgroup) must be passed as the pid parameter. cgroup monitoring is available only for system-wide events and may therefore require extra permissions.
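The PERF_FLAG_PID_CGROUP and PERF_FLAG_FD_CLOEXEC flags can be combined as in the sketch below. The cgroup directory file descriptor is passed in place of pid, a concrete CPU is given, and close-on-exec is requested at creation time. The helper names open_cgroup_counter and open_self_counter_cloexec are hypothetical, and the software CPU-clock event is an arbitrary choice. Cgroup mode usually needs the extra permissions mentioned above.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>            /* open(), fcntl(), O_* and FD_CLOEXEC */
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

static long perf_event_open(struct perf_event_attr *attr, pid_t pid,
                            int cpu, int group_fd, unsigned long flags)
{
    return syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/* User-space software CPU-clock counter; any event type would do here. */
static void init_attr(struct perf_event_attr *attr)
{
    memset(attr, 0, sizeof *attr);
    attr->size = sizeof *attr;
    attr->type = PERF_TYPE_SOFTWARE;
    attr->config = PERF_COUNT_SW_CPU_CLOCK;
    attr->exclude_kernel = 1;
}

/* Counts on one CPU, but only while a thread of the cgroup whose directory
 * is cgroup_dir runs there. Returns the event fd, or -1 on error. */
int open_cgroup_counter(const char *cgroup_dir, int cpu)
{
    int cgfd = open(cgroup_dir, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
    if (cgfd == -1)
        return -1;

    struct perf_event_attr attr;
    init_attr(&attr);
    /* In cgroup mode the pid argument carries the directory fd, and cpu
     * must be >= 0 because cgroup events are system-wide (per CPU). */
    int fd = (int)perf_event_open(&attr, cgfd, cpu, -1,
                                  PERF_FLAG_PID_CGROUP | PERF_FLAG_FD_CLOEXEC);
    close(cgfd);              /* the kernel resolves the cgroup during the call */
    return fd;
}

/* Ordinary per-thread counter whose fd is close-on-exec from creation. */
int open_self_counter_cloexec(void)
{
    struct perf_event_attr attr;
    init_attr(&attr);
    return (int)perf_event_open(&attr, 0, -1, -1, PERF_FLAG_FD_CLOEXEC);
}
```

Following the example above, open_cgroup_counter("/dev/cgroup/test", 0) would count on CPU 0 for the test cgroup, assuming cgroupfs is mounted on /dev/cgroup.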
The perf_event_attr structure provides detailed configuration information for the event being created.
struct perf_event_attr {