The Java VM has a built-in “garbage collection” (GC) feature that clears and compacts the Java memory space (the heap). This helps prevent out-of-memory (OOM) conditions and improves application performance.
A word of caution is in order, however, when using applications like Picard Tools or GATK on the HPCC. For the last several months, staff have been classifying the brief spawning of GC threads as “overutilization” whenever their monitoring scripts happen to intersect with a minor or major GC event. Correctly applied, “overutilization” is a condition where the number of CPUs used over the course of a job exceeds the number requested; it is (or should be) a reflection of effective load. Users so tagged get their jobs killed and their accounts flagged for additional monitoring. Some have had all of their jobs killed, even the non-offending ones, and in severe cases accounts have been suspended. Unfortunately, users have no practical way of anticipating how many GC threads will spawn, and these “overutilization events” are caught irregularly, since GC happens briefly and only at sporadic intervals.
Testing shows that these GC threads spawn for only a second or two at a time, at most. Overall processor utilization across the job run is therefore still roughly 1:1, assuming everything else has been specified correctly. So how do you deal with this dilemma?
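You can observe this behavior in your own jobs by asking the JVM to log each collection and its duration. A minimal sketch of such an invocation — `your-app.jar` is a placeholder for your actual application (e.g., a Picard or GATK jar), and the flag shown is the JDK 8-era form (JDK 9+ uses `-Xlog:gc` instead):

```shell
# Print a line for every GC event, including how long it paused the application.
# Each pause is typically well under a second, matching the 1:1 utilization claim.
java -verbose:gc -jar your-app.jar
```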
Here are some alternatives:
- Increase your heap size at run time (-Xms and -Xmx). A larger heap fills more slowly, so GC will run less often.
- Specify -XX:+UseParallelGC: with this collector, multiple threads are restricted to “minor” (young-generation) GC events, and “major” GC is performed serially.
- Specify -XX:ParallelGCThreads=#, where “#” is the number of threads to use for parallel GC collection. If you fix the number of GC threads here, you can size your resource request to the load manager (i.e., ppn) accordingly.
- Run the application locally if you can. Many of the labs I deal with have done exactly that.
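To make option #3 concrete, the idea is to bound the JVM’s peak thread count so it never exceeds the cores you requested. A sketch for a PBS/Torque-style submission script — the jar name is a placeholder, and the heap sizes are illustrative, not recommendations:

```shell
#!/bin/bash
#PBS -l nodes=1:ppn=4    # request 4 cores from the load manager

# One application thread plus 3 parallel GC threads = 4 threads at peak,
# which matches the ppn=4 request, so GC bursts cannot trip the
# overutilization monitors.
java -Xms8g -Xmx8g \
     -XX:+UseParallelGC \
     -XX:ParallelGCThreads=3 \
     -jar your-app.jar
```

Setting -Xms equal to -Xmx also avoids heap-resizing overhead, which combines naturally with option #1.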
Of the options above, #3 appears to be the most viable and reliable alternative if you wish to continue running these applications on the HPCC. Options #1 and #2 may still get your run flagged, and #4 requires resources some labs simply do not have.
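Whichever flags you settle on, it is worth verifying from inside the JVM that they actually took effect. A minimal sketch (the class name GcCheck is my own) that lists the active collectors using the standard management beans — with -XX:+UseParallelGC you should see the parallel (“scavenge”) collector names rather than the default collector’s:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Prints the name and collection count of each garbage collector the
// running JVM is using, so you can confirm which GC was actually selected.
public class GcCheck {
    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + " (collections so far: " + gc.getCollectionCount() + ")");
        }
    }
}
```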