Cactus Code Thorn TAUClocks
Thorn Author(s)     : Maciej Brodowicz
                      Erik Schnetter
Thorn Maintainer(s) : Maciej Brodowicz
                      Erik Schnetter
-----------------------------------------------------------------

Purpose of the thorn:

TAUClocks registers TAU timers as Cactus clocks with the flesh.
Existing Cactus timers can then be used to output this information
for individual scheduled routines and give a precise profile of how
well a Cactus simulation is performing.

TAU homepage:       http://www.cs.uoregon.edu/research/tau/home.php
PAPI homepage:      http://icl.cs.utk.edu/papi/
PCL homepage:       http://www.fz-juelich.de/jsc/PCL/
mpiP homepage:      http://mpip.sourceforge.net/
perfctr repository: http://user.it.uu.se/~mikpe/linux/perfctr/

Requirements:

TAU attempts to combine several performance profiling methods,
approaches and libraries and make them available through an easily
accessible API (in addition to source-level instrumentation, which
currently doesn't seem to work well with Cactus). Many of the most
useful low-overhead profiling techniques require hardware support
(hardware event counters, high-resolution timers), which varies
significantly from architecture to architecture, making portability
a nightmare. To complicate things further, access to those hardware
resources needs to be controlled and coordinated by the OS, but most
OSes are not configured to provide this support "out of the box".
For Linux running on x86-compatibles, the required set of kernel
patches is supplied by the perfctr project; AIX on Power processors
uses the PMAPI kernel extensions from IBM, etc. In all cases, the
upgrades of the system software/kernel must be performed by
administrators (who are typically overly cautious about potential
destabilization of their production systems and hence reluctant to
introduce changes).

The kernel-enhancing packages usually provide simple libraries to
facilitate access to the added profiling features (the good news
here is that if they are missing, average users may still compile
them themselves). While it is possible to use them directly, their
interfaces are not standardized and frequently require in-depth
knowledge of processor features. Also, porting to other platforms
would invariably require changes to the application's profiling
source code. Attempts to mitigate this problem resulted in the
creation of third-party libraries providing a uniform API on all
supported architectures, most notably PAPI from ICL at UT and PCL
from FZ Juelich. Note that TAU requires at least one of them to take
advantage of hardware-augmented profiling.

Questions to the TAU developers:

1. How can unsupported counters be detected? We know how to detect
   unsupported PAPI counters, but what about others, e.g.
   LINUX_TIMERS?

2. When are LINUX_TIMERS supported? Do they need to be explicitly
   enabled when TAU is configured? Is there a macro LINUX_TIMERS or
   TAU_LINUX_TIMERS that indicates this?

3. How can rdtsc be accessed? (P_WALL_CLOCK_TIME)

4. The macro TAU_GET_FUNC_VALS returns uninitialised pointers. It
   first creates a list of all counters, leaving the fields for
   unsupported counters undefined. This list is then copied into a
   second list, but the second list is too short, so that there are
   both undefined fields and missing fields.

5. Is there a better way than environment variables to select
   counters? If not, how can one ask TAU to evaluate the environment
   variables selecting the counters?

6. TAU seems to require many different installations, depending on
   what features it supports. We generally require support for MPI,
   OpenMP, and PAPI. We probably don't want the tools for automatic
   source code instrumentation; especially OPARI seems to require
   symbols in the executable that we don't have. Should we install
   our own versions?
   Or should we bother the system administrators to support our
   version? (Will require installed versions without OPARI.)

7. For PAPI, how can we detect when there are conflicting counters
   selected? If there are conflicting counters, all counter results
   seem to be zero. (papi_event_chooser)

8. How can we use multiplexing? (Currently not supported by TAU.)

9. Should we maybe just define timers and let the user define events
   via environment variables? In this case, how can we detect
   unsupported or conflicting settings? (papi_event_chooser)

10. Using the TauMpi library apparently initialises TAU so early
    that it reads the environment variables which determine the
    counters before TAUClocks has set them; TAUClocks sets these
    too late.

Hints from Sameer:

Memory allocation tracking:
   -DPROFILE_MEMORY: mallinfo() values for every function
   TAU_track_memory(): same thing, but every 10 seconds
   TAU_change_memory_interval(): change interval
   TAU_track_memory_here(): same thing, but only at these locations

malloc/free: leak detectors
   TAU_new / TAU_delete
   same for Fortran
   maybe combine with Cactus make system to replace new/delete etc.

context events / atomic events

call PMPI_Init instead of MPI_Init (same with MPI_Finalize)
   TAU_profile_set_node (RANK)
   TAU_profile_init
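
Sketch for questions 5 and 10:

The ordering problem in question 10 suggests that the counter
selection has to be in the environment before TAU is initialised,
i.e. before anything that triggers TAU's start-up. The following is
only a minimal sketch of that idea, not the thorn's actual code.
It assumes that TAU reads its counter selection from environment
variables named COUNTER1, COUNTER2, ...; that naming is an assumption
here and should be checked against the installed TAU version. The
helper select_counters() is hypothetical and is not part of TAU or
TAUClocks. The counter names used in the usage comment are the ones
mentioned in the questions above.

```c
#define _POSIX_C_SOURCE 200112L  /* needed for setenv() */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of TAU or TAUClocks.
 *
 * Exports names[0..n-1] as the environment variables
 * COUNTER1..COUNTERn. The variable naming scheme is an ASSUMPTION
 * about how TAU selects counters; verify it for your TAU build.
 *
 * Returns 0 on success, -1 if a name is missing or empty or if
 * setenv() fails.
 *
 * Usage (must run before TAU is initialised):
 *
 *   const char *const names[] = { "LINUX_TIMERS", "P_WALL_CLOCK_TIME" };
 *   if (select_counters(names, 2) != 0) { ...report the error... }
 */
int select_counters(const char *const *names, int n)
{
    char var[32];

    assert(names != NULL && n >= 0);

    for (int i = 0; i < n; ++i) {
        if (names[i] == NULL || strlen(names[i]) == 0)
            return -1;

        /* Build the variable name: COUNTER1, COUNTER2, ... */
        snprintf(var, sizeof var, "COUNTER%d", i + 1);

        /* Third argument 1: overwrite a value set by the user. */
        if (setenv(var, names[i], 1) != 0)
            return -1;
    }
    return 0;
}
```

This only puts the values into the environment. It does not check
whether a counter is supported or whether two counters conflict, so
questions 1, 7 and 9 remain open. Whether a user's own setting should
be overwritten is a design choice; pass 0 as the last argument of
setenv() to keep it.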