
Blog

Found this today while I was working on an RNA-seq exercise.  It's a very nice SAMTools tutorial written by Ethan Cerami:

http://biobits.org/samtools_primer.html

It offers a detailed complement to the example RNA-seq pipeline tutorial by Jason Woods (featuring TopHat and cufflinks).
 

[Figure: anatomy of a SAMTools tview screen]


A test of the backup generator system for the MPS building was conducted between 10-11am on Monday, January 16th.  This test was very important, since there was a lag between the time the house power went down and the generator power kicked in.  That meant that this test would help us evaluate the effectiveness of the server room UPS, the response of the facility cooling systems, and the behavior of the rack transfer switches.

All systems operated exactly as intended.  The UPS provided uninterrupted power during the switch between house and generator power.  In addition, single-corded servers were switched from house to backup power by the rack transfer switches without interruption.  Finally, the cooling system switched from the cooling tower to the on-board compressor units within a couple of minutes, so climate conditions stayed within the optimal windows throughout the test.

With the successful completion of these tests, the MPS Server Room facility is now truly open to receiving equipment. If you are interested in housing server equipment in this facility, please contact John Johnston for more information.

Upgrades to the MPS Server Room are now complete.  A tour of the facility for affiliated faculty and personnel took place on August 22nd, and final heat/load testing is underway.  Once that is complete, we will be able to announce an opening/availability date.  We hope to be back in the facility right around the time classes begin.  

Humidification has been added to both the existing Liebert and the new 5-ton capacity unit.  Variable fan speed control and reheater units have also been added to the existing Liebert so that high humidity can effectively be removed from the facility.  In addition, hot aisle containment has been added to the racks to make cooling more efficient.

Watch this space for information on when the facility has been officially reopened.

MPS Server Room Update

At a recent meeting concerning the status of MPS Server Room upgrades, we were apprised that the current target ETA for the completion of climate control fixes is now set as late July 2016.  It is hoped that we can get the upgrades in-place and tested prior to the commencement of Fall Semester. If you have any questions, do not hesitate to send me an email.

MPS Server Room Status

After discovering several issues related to networking and climate control deficiencies in the MPS Server Room, equipment was moved to a temporary space in Biochemistry pending the completion of various fixes.  To date, a separate VLAN has been established by MSU Network Services for the facility.  This should mitigate disruptions propagating through the network by unruly machines in the MPS and PSS buildings, by isolating equipment in the managed facility.  Problems with the UPS backup appear also to have been rectified.  However, cooling system modifications are needed before stable climate control conditions can be maintained consistently and reliably.  

Cooling system upgrades are expected to take 3 to 6 months to complete, after which equipment will be relocated back to the facility.

If your lab has rack-mounted server equipment and needs a place to host it, limited space is available on a rack in BMB.  I am actively monitoring the climate conditions in the space and maintaining the relocated equipment there.  Space is very limited, so if you think you're interested, please send me an email.

Viking and MySQL

The "viking" server used to provide MySQL service for OrthoMCL runs on the HPCC had a disk crash recently.  Fortunately, everything of importance was backed up and the system has been restored.  If you are a biological researcher at MSU and need MySQL server access for bioinformatics data analysis, please contact me for more information.  I can generally have you set up in about one day.  Storage space is somewhat limited, and I do not provide backend web hosting support or long-term data storage.  The MySQL service is for active bioinformatics analysis only.

Google Genomics

The latest "cloud" service offering for biological researchers is "Google Genomics".  For a fee, researchers can store and analyze data, perform variant searches, and execute a number of other bioinformatics tasks.  I would be interested in hearing from anyone who has used this service about their impressions, especially as compared to iPlant, Amazon, etc.  Drop me a line if you have anything you'd be willing to share.

Overview

Sysdig is a very useful open source application for Linux that provides realtime information on system state, activity, and performance.  This article briefly summarizes some of its useful features and potential administration capabilities (note: you will need root or sudo privileges to run many of these commands).

Useful One-Liners

Realtime monitoring of all user commands:

sysdig -pc -c spy_users 

Show every time a file is opened under a particular directory (here, any path containing /data):

sysdig evt.type=open and fd.name contains /data

Dump system activity to a binary file, and read it later:

sysdig -w trace.scap
sysdig -r trace.scap

List all sysdig "chisels":

sysdig -cl

Example chisel list:

List processes that have a high number of file handles:

sysdig -c fdcount_by proc.name "fd.type=file"

See top directories with high volume of I/O activity:

sysdig -c fdbytes_by fd.directory "fd.type=file"

Show files with highest levels of I/O in bytes:

sysdig -c topfiles_bytes

More Information

 

As some of you may be aware, iCER/HPCC have been placing limits on the software they will install for users.  In some cases, users making requests are being asked for the "numbers" of people who will be using the new software title or upgrade. Unfortunately, some bioinformatics software (particularly pipelines) can be unduly complex for many users to install correctly.  To that end, I have been installing software packages for users on request.

Custom/private software installation requires that you provide me with access to a portion of either your /mnt/home or /mnt/research directory spaces.  Research spaces are often preferable, since the installs can be shared with all the members of a particular lab or working group.  As part of the installation process, I can set up a custom private modules system that allows you to use your custom installs without preventing you from also using HPCC modules.  Essentially, your private modules will be checked FIRST for software titles on load, and the HPCC modules second.
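
One common way to get this kind of "private first" lookup order is to place the private modulefile directory ahead of the system directories in MODULEPATH.  The following is only a sketch, assuming an Environment Modules or Lmod setup; the directory name and the helper function prepend_modulepath are hypothetical, and this is not necessarily the exact mechanism used for installs on the HPCC:

```shell
# Hypothetical location of a lab's private modulefiles; adjust to your own space.
PRIVATE_MODULES="/mnt/research/mylab/modulefiles"

# Put a directory at the front of MODULEPATH so it is searched before
# the system module directories.  Does nothing if it is already listed.
prepend_modulepath() {
    case ":${MODULEPATH:-}:" in
        *":$1:"*) ;;                                       # already present
        *) MODULEPATH="$1${MODULEPATH:+:$MODULEPATH}" ;;   # prepend
    esac
    export MODULEPATH
}

prepend_modulepath "$PRIVATE_MODULES"
```

Where the module command is available, "module use /mnt/research/mylab/modulefiles" has the same effect of prepending the directory.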

If you are interested, please send me an email with your request. 

Registration is now open for the 2015 Plant Biotechnology for Health and Sustainability Symposium to be held October 9 & 10 in 1200 Molecular Plant Sciences.

This Symposium is part of an interdisciplinary effort at MSU to develop training opportunities for graduate students in plant biotechnology.

For more information, please visit the symposium website.

iPlant Updates & Issues

Updates

iPlant Atmosphere has made a new user interface available that substantially improves the ability to manage images and instances.  Some of the new features include:

  • Users can edit existing images by adding tags, changing descriptions, etc. Previously, once an image was created, no changes were possible to even descriptive elements without launching the image and generating a brand new imaging request.
     
  • It is now possible to modify the content of images by launching them, making your changes/additions, and then requesting a new image as a VERSION of the original.  Once that "version" imaging is complete, you can choose to launch a particular named and tagged image, and then further choose which specific version you want.  This reduces the proliferation of brand new whole images, all with different names.
     
  • Active instances have new options such as "redeploy" (regenerate this running instance from scratch).  In the past, one would have to destroy the instance and launch a brand new one to get back to the base image.
     
  • Overall flow and navigation is much improved.

Please note that the above is not a comprehensive list of new features.

Issues

I've noticed over the last couple of weeks that new instance launches are continually failing.  Resources are allocated and the image begins building, only to stall at the final deployment stages.  After a time, the stall turns into an "active deployment error" and the launch must be reinitiated.  I have also heard similar reports of this from other users.  

Whereas technical support and response times were much improved early this year, I have recently experienced very poor response times.  Upon filing a request for assistance for an "active deploy error" (one of four in the last two days), I did not even receive confirmation of my support ticket until about 10 hours later, and I'm still waiting for an actual response.  Again, several users have reported to me that iPlant support response over the last couple of months has been abysmal, so much so that they are actively seeking alternatives.

iPlant is an extremely valuable and useful resource, and I certainly hope they get these problems rectified shortly.

I have finally received a message from Tech Support.  Once I have a resolution, I will create a new post.

 

 

 

Linux Command-Phu

Here is an interesting "one-liner" (a powerful, one-line Linux command) for graphing your command usage in a terminal using ASCII:

history|awk '{print $2}'|sort|uniq -c|sort -rn|head -20|awk '!max{max=$1;}{r="";i=s=60*$1/max;while(i-->0)r=r"#";printf "%15s %5d %s %s",$2,$1,r,"\n";}'

The result will look something like this:

             ls   251 ############################################################ 
             cd   133 ################################
           less   123 ############################## 
           perl    89 ###################### 
         screen    87 ##################### 
             rm    54 ############# 
           grep    54 ############# 
            top    45 ########### 
         logout    25 ###### 
            pwd    23 ###### 
           exit    23 ###### 
         module    16 ####
           exec    14 #### 
           bash    10 ### 
             wc     8 ## 
              }     7 ## 
    fasta_merge     4 # 
          print     3 # 
             if     3 # 
     gff3_merge     3 # 

You can alter the portion "head -20" to use whatever value you like (the version shown gives the top 20 only).  
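
For readability, the same pipeline can be unpacked into a small shell function.  This is a sketch rather than a replacement for the one-liner: the function name cmd_histogram and its optional "top N" argument are hypothetical, and it reads one command name per line on standard input so it can be fed from history or any other source:

```shell
# cmd_histogram [N]
# Reads one command name per line on stdin and prints the N most
# frequent (default 20) with an ASCII bar scaled to 60 characters.
cmd_histogram() {
    top="${1:-20}"
    sort | uniq -c | sort -rn | head -n "$top" |
    awk -v width=60 '
        NR == 1 { max = $1 }            # first row holds the largest count
        {
            bar = ""
            # rounds partial bars up, like the original while loop
            for (i = 0; i < width * $1 / max; i++) bar = bar "#"
            printf "%15s %5d %s\n", $2, $1, bar
        }'
}

# Usage, equivalent to the one-liner with "head -20":
#   history | awk '{print $2}' | cmd_histogram 20
```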

Here's an example on another machine using "50" (i.e. top 50 commands):

             ls   220 ############################################################ 
             cd   178 ################################################# 
             ll   106 ############################# 
           more    78 ###################### 
           exit    63 ################## 
             rm    60 ################# 
        intel10    29 ######## 
         module    28 ######## 
           find    28 ######## 
             mv    24 ####### 
            ssh    19 ###### 
            vim    17 ##### 
            cat    14 #### 
      ./usearch    12 #### 
          qstat    10 ### 
          mysql     8 ### 
          mkdir     8 ### 
             cp     7 ## 
           grep     6 ## 
            pwd     4 ## 
           ascp     4 ## 
          chmod     4 ## 
           wget     3 # 
          touch     3 # 
            scp     3 # 
         python     3 # 
    history|awk     3 # 
             df     3 # 
          chown     3 # 
             wc     2 # 
        usearch     2 # 
            tar     2 # 
           make     2 # 
             ln     2 # 
           ITSx     2 # 
        intel14     2 # 
        firefox     2 # 
             du     2 # 
          clear     2 # 
        chimera     2 # 
       checkjob     2 # 
          which     1 # 
      wgscelera     1 # 
          watch     1 # 
            top     1 # 
           stat     1 # 
          rsync     1 # 
          rmdir     1 # 
          quota     1 # 
           qsub     1 #

Fall Seminar Series

The following seminars are being offered this Fall 2015 and may be of interest to the MSU biological research community:

If you have a seminar you would like publicized that would be of interest to the MSU biology community and don't see it here, please drop me a line with the details.

Rosalind is "a platform for learning bioinformatics and programming through problem solving."  Rosalind features a "Python Village" where biologists can learn the fundamentals of Python for solving various biological problems.  

The "Problems" section of the "Bioinformatics Stronghold" offers a variety of bioinformatics problems that require the use of programming to solve.  Users are presented with a particular problem (e.g., counting nucleotides in a sequence) and asked to solve it using their Python programming skills within a set time limit.  Answers are uploaded and verified for correctness.  Users can challenge themselves and track their progress through these exercises.  It is an interesting and useful way to sharpen and test your skills.

NGS Wikibook

A free Wikibook on NextGen Sequence Analysis (NGS) has been released.  You can access it here: 

https://en.wikibooks.org/wiki/Next_Generation_Sequencing_%28NGS%29

Although it is somewhat lacking in information on "Big Data" issues, it does provide basic "howto" tutorials on genome assembly, variant analysis, short read alignments, and GWAS. This can be a nice supplement as you begin your ventures into bioinformatics.

A new version of InterProScan (iprscan) has been released (version 5.14.53.0).  It is important to ensure you are using the latest version, especially if you utilize the InterPro lookup service.  Lookup service connectivity is frequently deprecated by the authors for older versions once the new version has had time to propagate to the user base.

For those that use only the local lookup, the latest version incorporates a significant upgrade to Pfam (28.0) as well as 166 new features.  One important improvement: the ability to use a non-default version of the "interproscan.properties" file.  Previous versions, if used on the HPCC (for example), required the user to copy the entire installation directory to their home or research space in order to alter the properties file to meet their requirements.  For version 5.14.53.0, simply copy the batch file "interproscan.sh" to a location where you have write permissions and add the line:

-Dsystem.interproscan.properties=/path/to/customised/properties/file \

...to the top of the Java options, directly below the "$JAVA" \ line.  For example, the "interproscan.sh" file prior to editing:

"$JAVA" \
-XX:+UseParallelGC -XX:ParallelGCThreads=2 -XX:+AggressiveOpts \
-XX:+UseFastAccessorMethods -Xms128M -Xmx2048M \
-jar  interproscan-5.jar $@ -u $USER_DIR

And after adding the required line:

"$JAVA" \
-Dsystem.interproscan.properties=/path/to/customised/properties/file \
-XX:+UseParallelGC -XX:ParallelGCThreads=2 -XX:+AggressiveOpts \
-XX:+UseFastAccessorMethods -Xms128M -Xmx2048M \
-jar  interproscan-5.jar $@ -u $USER_DIR

Obviously, you should alter the portion "/path/to/customised/properties/file" to match your actual path.

IMPORTANT:  If running on the HPCC, you will also need to update the path to the "interproscan-5.jar" file.  For example:

"$JAVA" \
-Dsystem.interproscan.properties=/path/to/customised/properties/file \
-XX:+UseParallelGC -XX:ParallelGCThreads=2 -XX:+AggressiveOpts \
-XX:+UseFastAccessorMethods -Xms128M -Xmx2048M \
-jar  /opt/software/iprscan/5.14.53.0/interproscan-5.jar $@ -u $USER_DIR

Now run from the new batch file, for example:

./interproscan.sh -i proteins_test.fasta -f tsv
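
If you prefer not to edit the copy by hand, the insertion of the properties line can be scripted.  The following is a minimal sketch, assuming the launcher contains a line reading exactly "$JAVA" \ as in the listings above; the function name add_properties_flag and the paths in the usage comment are hypothetical.  It does not touch the "interproscan-5.jar" path, which still needs to be updated by hand on the HPCC:

```shell
# add_properties_flag LAUNCHER PROPERTIES_FILE
# Prints a copy of LAUNCHER with the -Dsystem.interproscan.properties
# option inserted directly below the line that reads exactly:  "$JAVA" \
add_properties_flag() {
    launcher="$1"    # path to the original interproscan.sh
    props="$2"       # path to your customised properties file
    awk -v props="$props" '
        { print }
        $0 == "\"$JAVA\" \\" {
            print "-Dsystem.interproscan.properties=" props " \\"
        }
    ' "$launcher"
}

# Usage (hypothetical paths):
#   add_properties_flag /opt/software/iprscan/5.14.53.0/interproscan.sh \
#       "$HOME/iprscan/interproscan.properties" > "$HOME/iprscan/interproscan.sh"
#   chmod +x "$HOME/iprscan/interproscan.sh"
```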

As of this writing (August 21, 2015), the HPCC has NOT updated InterProScan. If you want to install this in your home directory, feel free to drop me a line for assistance.