As Java becomes more popular for bioinformatics applications (GATK and picardTools come to mind), a growing number of users have received warnings from the HPCC about "over-utilization". These warnings can result in killed jobs and, in some cases, suspended accounts. Many users had been running these applications without incident until fairly recently.
The problem stems from built-in functionality of the Java VM, called "Garbage Collection" (GC), that reclaims and compacts Java heap (memory) space. When certain portions of the heap become full, GC is activated. GC events are usually divided into "minor" and "major" collections, depending on the region of the heap being collected (minor collections cover the young generation, major collections the old generation). Depending on the collector in use, GC may run in parallel (with multiple threads) or serially (in a single thread). By default, Java applications on the HPCC use parallel, multi-threaded GC for at least some of these operations, and with the default JVM settings users cannot accurately predict how many threads will be used. Extensive testing and observation of many of these programs show that a GC event can spawn many threads, but only for a couple of seconds, and such momentary bursts can occur several times over the course of a job. Thus, depending on the tools the HPCC uses to detect over-utilization, and on how often and when those tools run, a job may be flagged as "over-utilizing". Once a user has been flagged, that user's processes often receive closer scrutiny, leading to more warnings.
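To see why a job can spawn far more GC threads than it requested, it helps to look at how a HotSpot JVM picks its default number of parallel GC threads. The sketch below reproduces the heuristic described in Oracle's HotSpot GC tuning guide, assuming a HotSpot JVM: with N visible hardware threads it uses N GC threads when N is 8 or less, and 8 + (N - 8) * 5/8 beyond that (roughly 5/8 of N on large machines). The class name GcThreadEstimate is hypothetical, and other JVM versions or vendors may differ, so treat the result as an estimate. The important detail is that N is the number of cores the JVM can see, which on a shared node may be every core rather than your "ppn" request, unless the scheduler confines the job to a cpuset.

```java
// Hypothetical helper: estimates HotSpot's default -XX:ParallelGCThreads value.
// Assumption: follows the heuristic in Oracle's HotSpot GC tuning guide
// (N threads for N <= 8, 8 + (N - 8) * 5/8 above that); other JVMs may differ.
class GcThreadEstimate {

    /** Estimated default number of parallel GC threads for a given CPU count. */
    static int defaultParallelGcThreads(int cpus) {
        if (cpus <= 8) {
            return cpus;
        }
        // Integer arithmetic, mirroring 8 + (N - 8) * 5 / 8.
        return 8 + (cpus - 8) * 5 / 8;
    }

    public static void main(String[] args) {
        // Processors this JVM can see; on a shared node this may be every core,
        // not just the cores the job requested.
        int visible = Runtime.getRuntime().availableProcessors();
        System.out.println("Visible processors: " + visible);
        System.out.println("Estimated default GC threads: " + defaultParallelGcThreads(visible));

        // Illustrative node sizes.
        for (int n : new int[] {4, 8, 16, 32, 64}) {
            System.out.println(n + " cores -> " + defaultParallelGcThreads(n) + " GC threads");
        }
    }
}
```

For example, on a 32-core node the estimate is 23 GC threads, even for a job that asked for a single core. You can compare the estimate with what your JVM actually chooses by running `java -XX:+PrintFlagsFinal -version` and looking for the ParallelGCThreads line.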
Whether or not "over-utilization" is an accurate label in these cases, this presents a problem for people who want to use Java bioinformatics applications. To address it, I propose the following remedies:
- Since GC is triggered only when part of the heap fills to a certain level, you can make GC events less frequent by increasing the maximum and initial (minimum) heap sizes (-Xmx and -Xms, respectively) on the Java command line. I recommend using substantially more than you are used to.
- Make sure the "mem" request in your job script meets or exceeds the "-Xmx" value. The JVM also uses some memory outside the heap, so leave a little headroom.
- You may also use the following additional flags: "-XX:+UseParallelGC" and "-XX:ParallelGCThreads=#", where "#" is the number of GC threads.
- When using "-XX:ParallelGCThreads=#", make sure to increase your "ppn" request accordingly: add at LEAST 1 to the number of GC threads and use that as your "ppn" value. For example, with "-XX:ParallelGCThreads=4", request at least "ppn=5".
- If you specify "-XX:ParallelGCThreads=#", you should assume that this number of GC threads will be applied PER PROCESS THREAD. For example, 4 application threads with 4 parallel GC threads each will result in 16 GC threads plus the 4 main process threads, or 20 threads in total (so request "ppn=20").
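The sizing rules above can be combined into a small calculation. Under the per-process-thread assumption in the last point, a job running T application threads with G GC threads each needs T * (G + 1) cores, which reduces to G + 1 for a single-threaded tool. The sketch below uses hypothetical names (JobSizer, headroomGb) to turn that into a "ppn" value, a "mem" request slightly above -Xmx, and matching JVM options. The 2 GB headroom is an arbitrary assumption, setting -Xms equal to -Xmx is one reasonable choice rather than a requirement, and the resource line assumes a PBS/Torque-style scheduler.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that applies the job-sizing rules above.
class JobSizer {

    /** Cores to request, assuming each application thread gets its own GC threads: T * (G + 1). */
    static int ppn(int appThreads, int gcThreads) {
        return appThreads * (gcThreads + 1);
    }

    /** "mem" request in GB: the -Xmx heap plus assumed headroom for non-heap JVM memory. */
    static int memGb(int heapGb, int headroomGb) {
        return heapGb + headroomGb;
    }

    /** JVM options: matching -Xms/-Xmx, the parallel collector, and a capped GC thread count. */
    static List<String> jvmOptions(int heapGb, int gcThreads) {
        return Arrays.asList(
                "-Xms" + heapGb + "g",
                "-Xmx" + heapGb + "g",
                "-XX:+UseParallelGC",
                "-XX:ParallelGCThreads=" + gcThreads);
    }

    public static void main(String[] args) {
        int appThreads = 4;  // threads the tool itself is told to use
        int gcThreads = 4;   // value passed to -XX:ParallelGCThreads
        int heapGb = 16;     // illustrative; size this for your own data
        int headroomGb = 2;  // assumption: slack for thread stacks, class metadata, etc.

        // Assumes a PBS/Torque-style resource request.
        System.out.println("#PBS -l nodes=1:ppn=" + ppn(appThreads, gcThreads)
                + ",mem=" + memGb(heapGb, headroomGb) + "gb");
        System.out.println("JVM options: " + String.join(" ", jvmOptions(heapGb, gcThreads)));
    }
}
```

With the values shown it prints "#PBS -l nodes=1:ppn=20,mem=18gb", matching the 20-thread example above, followed by "JVM options: -Xms16g -Xmx16g -XX:+UseParallelGC -XX:ParallelGCThreads=4".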
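The "-XX:+UseParallelGC" flag should restrict "major" collections to a single thread, although "minor" collections will still run in multiple threads. The "-XX:ParallelGCThreads=#" flag should further limit the number of threads used for parallel collection. (On newer JVMs, from JDK 7u4 onward, "-XX:+UseParallelGC" also enables parallel collection of the old generation by default, so major collections may be multi-threaded as well.)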
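Because a HotSpot JVM refuses to start if a boolean "-XX:" flag is missing its "+" or "-", and because the collector actually in use is not always obvious, it can be worth confirming what a running JVM is doing. The sketch below uses the standard java.lang.management API to print the arguments the JVM was started with and each garbage collector's name, collection count, and accumulated time. The class name GcCheck is hypothetical. With "-XX:+UseParallelGC", HotSpot typically reports collectors named "PS Scavenge" (minor) and "PS MarkSweep" (major); names differ for other collectors and JVMs.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic: reports the JVM flags and collectors actually in effect.
class GcCheck {

    /** One line per collector: name, number of collections, total time spent (ms). */
    static List<String> collectorSummary() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, " + gc.getCollectionTime() + " ms");
        }
        return lines;
    }

    public static void main(String[] args) {
        // Flags passed to this JVM, e.g. -XX:+UseParallelGC -XX:ParallelGCThreads=4
        System.out.println("JVM arguments: "
                + ManagementFactory.getRuntimeMXBean().getInputArguments());

        // Allocate short-lived garbage so that at least one minor collection is likely.
        // Storing chunks in an array keeps the JIT from optimizing the allocations away.
        byte[][] recent = new byte[16][];
        for (int i = 0; i < 200_000; i++) {
            recent[i % recent.length] = new byte[1024];
        }

        for (String line : collectorSummary()) {
            System.out.println(line);
        }
    }
}
```

Running it as, say, `java -XX:+UseParallelGC -XX:ParallelGCThreads=4 GcCheck` shows whether the flags were accepted and which collectors ran, without having to wait for a full pipeline job.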
Based on my earlier experiments, I generally do NOT recommend the "-XX:+UseSerialGC" flag: although it restricts all GC to a single thread, it causes a significant performance hit.
If none of the suggestions above works, try to run your Java application locally if at all possible. Some tools, such as picardTools, generally run well on a local Linux box, while others, such as GATK, are impractical to run without substantial local server resources.