Refactor PAPI support, and add profiling of multithreaded GC