I’d like to use hardware performance counters, specifically on x86 CPUs, to measure cache misses or branch mispredictions. Performance counters are heavily used in advanced profilers like Intel VTune. Please don’t confuse these with the performance counters provided by the Windows operating system.
To use these counters in a C/C++ program, one may use PAPI: http://icl.cs.utk.edu/papi/
This makes it easy to use performance counters, but only on Linux. PAPI once supported Windows, but no longer does.
Has anyone recently tried PAPI or another API to use hardware performance counters on Windows?
However, Windows prohibits user-mode applications from executing the RDPMC instruction (the x86 instruction that reads a performance counter) by setting CR4.PCE to 0. Presumably, this is done because the meaning of each counter is determined by model-specific registers (MSRs), which are only accessible in kernel mode. In other words, unless you’re a kernel-mode module (e.g. a device driver), you are going to get a “privileged instruction” trap if you attempt to execute this instruction.
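To make the rule concrete, here is a minimal sketch in C of the check the CPU applies in protected mode before running RDPMC. It is only a model of the decision, not code that touches CR4 (user mode cannot read it). The helper name `rdpmc_permitted` and the macro `CR4_PCE_MASK` are made up for illustration; the only hardware fact assumed is that PCE is bit 8 of CR4.

```c
#include <assert.h>
#include <stdint.h>

/* CR4.PCE (Performance-Monitoring Counter Enable) is bit 8 of CR4. */
#define CR4_PCE_MASK (UINT64_C(1) << 8)

/*
 * Hypothetical helper (not part of any real API): returns 1 if RDPMC
 * would execute at the given current privilege level (CPL) in protected
 * mode, 0 if the CPU would raise a fault instead.
 */
static int rdpmc_permitted(uint64_t cr4, unsigned cpl)
{
    assert(cpl <= 3);                 /* x86 privilege levels are 0..3 */
    if (cpl == 0)
        return 1;                     /* kernel mode: always allowed */
    return (cr4 & CR4_PCE_MASK) != 0; /* other rings: only if PCE is set */
}
```

With the PCE bit cleared, as described above for Windows, `rdpmc_permitted(cr4, 3)` is 0 for a user-mode caller while `rdpmc_permitted(cr4, 0)` is still 1 for a driver.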
If you’re writing a user-mode application, your only option (as @Christopher mentioned in the comments) is to write a kernel module that executes the RDPMC instruction for you, at the cost of a user-to-kernel transition on every call, and to enable test signing on your machine so that your presumably self-signed “driver” can be loaded. This means you can’t easily distribute the app, but it will work for in-house tuning.
Answered By – Rom