PaperBLAST provides an interactive search tool that links protein sequences to published and open-access scientific articles.
From the website:
"PaperBLAST builds a database of protein sequences that are linked to scientific articles. These links come from automated text searches against the articles in EuropePMC and from manually-curated information from GeneRIF, UniProtKB/Swiss-Prot, BRENDA, CAZy (as made available by dbCAN), CharProtDB, MetaCyc, EcoCyc, REBASE, and the Fitness Browser. Given this database and a protein sequence query, PaperBLAST uses protein-protein BLAST to find similar sequences with E < 0.001. "
The PaperBLAST database links 380,859 different protein sequences to 918,163 scientific articles. Searches against EuropePMC were last performed on September 3, 2018.
Visit the PaperBLAST website.
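To make the E < 0.001 cutoff concrete, here is a minimal Python sketch of that kind of filter applied to BLAST tabular output (the standard 12-column "outfmt 6" layout, where the e-value is the 11th column). This is not PaperBLAST's own code; the function name and the sample hits are made up for illustration.

```python
# Illustrative only: this is NOT PaperBLAST's code, just the same kind of
# e-value filter applied to BLAST tabular (outfmt 6) lines.
E_VALUE_CUTOFF = 0.001


def filter_hits(tabular_lines, cutoff=E_VALUE_CUTOFF):
    """Return (subject_id, evalue) pairs whose e-value is below the cutoff."""
    kept = []
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        evalue = float(fields[10])  # 11th column of outfmt 6 is the e-value
        if evalue < cutoff:
            kept.append((fields[1], evalue))  # 2nd column is the subject id
    return kept


# Hypothetical hits, invented for this example.
SAMPLE_HITS = [
    "query1\tprotA\t98.5\t200\t3\t0\t1\t200\t1\t200\t1e-120\t410",
    "query1\tprotB\t35.0\t150\t90\t4\t10\t160\t5\t150\t0.5\t30.1",
    "query1\tprotC\t42.1\t180\t100\t2\t5\t185\t8\t190\t2e-05\t55.2",
]

if __name__ == "__main__":
    for subject, evalue in filter_hits(SAMPLE_HITS):
        print(subject, evalue)
```

With the sample data, only protA and protC pass the filter; protB, with an e-value of 0.5, is dropped.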
Found this today while I was working on an RNA-seq exercise. It's a very nice SAMTools tutorial written by Ethan Cerami:
It offers a detailed complement to the example RNA-seq pipeline tutorial by Jason Woods (featuring TopHat and Cufflinks).
Anatomy of a SAMTools tview screen:
A test of the backup generator system for the MPS building was conducted between 10 and 11am on Monday, January 16th. This test was very important because there is a lag between the time house power goes down and the time generator power kicks in. The test therefore let us evaluate the effectiveness of the server room UPS, the response of the facility cooling systems, and the behavior of the rack transfer switches.
All systems operated exactly as intended. The UPS provided uninterrupted power during the switch between house and generator power. In addition, single-corded servers were successfully switched from house to backup power by the rack transfer switches, without interruption. Finally, the cooling system successfully switched from the cooling tower to on-board compressor units within a couple of minutes, meaning that climate conditions were maintained within the optimal windows during the test.
With the successful completion of these tests, the MPS Server Room facility is now truly open to receiving equipment. If you are interested in housing server equipment in this facility, please contact John Johnston for more information.
Upgrades to the MPS Server Room are now complete. A tour of the facility for affiliated faculty and personnel took place on August 22nd, and final heat/load testing is underway. Once that is complete, we will be able to announce an opening/availability date. We hope to be back in the facility right around the time classes begin.
Humidification has been added to both the existing Liebert and the new 5-ton capacity unit. Variable fan speed control and reheater units have also been added to the existing Liebert so that high humidity can effectively be removed from the facility. In addition, hot aisle containment has been added to the racks to make cooling more efficient.
Watch this space for information on when the facility has been officially reopened.
At a recent meeting on the status of MPS Server Room upgrades, we were apprised that the target date for completing the climate control fixes is now late July 2016. We hope to have the upgrades in place and tested before the start of Fall Semester. If you have any questions, do not hesitate to send me an email.
After discovering several issues related to networking and climate control deficiencies in the MPS Server Room, equipment was moved to a temporary space in Biochemistry pending the completion of various fixes. To date, a separate VLAN has been established by MSU Network Services for the facility. This should mitigate disruptions propagating through the network by unruly machines in the MPS and PSS buildings, by isolating equipment in the managed facility. Problems with the UPS backup appear also to have been rectified. However, cooling system modifications are needed before stable climate control conditions can be maintained consistently and reliably.
Cooling system upgrades are expected to take 3 to 6 months to complete, after which equipment will be relocated back to the facility.
If your lab has rack-mounted server equipment and needs a place to host it, limited space is available on a rack in BMB. I am actively monitoring the climate conditions in the space and maintaining the relocated equipment there. Space is very limited, so if you think you're interested, please send me an email.
Viking and MySQL
The "viking" server used to provide MySQL service for OrthoMCL runs on the HPCC had a disk crash recently. Fortunately, everything of importance was backed up and the system has been restored. If you are a biological researcher at MSU and need MySQL server access for bioinformatics data analysis, please contact me for more information. I can generally have you set up in about 1 day. Storage space is somewhat limited, and I do not provide backend web hosting support or long-term data storage. The MySQL service is for active bioinformatics analysis only.
The latest "cloud" service offering for biological researchers is "Google Genomics". For a fee, researchers can store and analyze data, perform variant searches, and execute a number of other bioinformatics tasks. I would be interested in hearing from anyone who has experience using this service, and in your impressions of it, especially as compared to iPlant, Amazon, etc. Drop me a line if you have anything you'd be willing to share.
Sysdig is a very useful open source application for Linux that provides realtime information on system state, activity, and performance. This article briefly summarizes some of its useful features and potential administration capabilities (note that you will need root or sudo privileges to run many of these commands).
Realtime monitoring of all user commands:
Show every file under a particular directory:
Dump system activity to a binary file, and read it later:
List all sysdig "chisels":
Example chisel list:
List processes that have a high number of file handles:
See top directories with high volume of I/O activity:
Show files with highest levels of I/O in bytes:
As some of you may be aware, iCER/HPCC have been placing limits on the software they will install for users. In some cases, users making requests are being asked for the "numbers" of people who will be using the new software title or upgrade. Unfortunately, some bioinformatics software (particularly pipelines) can be unduly complex for many users to install correctly. To that end, I have been installing software packages for users on request.
Custom/private software installation requires that you provide me with access to a portion of either your /mnt/home or /mnt/research directory spaces. Research spaces are often preferable, since the installs can be shared with all the members of a particular lab or working group. As part of the installation process, I can set up a custom private modules system that allows you to use your custom installs without preventing you from also using HPCC modules. Essentially, your private modules will be checked FIRST for software titles on load, and the HPCC modules second.
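The "private modules first" behavior described above generally comes down to search order: module systems such as Environment Modules and Lmod read a colon-separated MODULEPATH variable, and directories listed earlier take precedence. The Python sketch below only illustrates that ordering. It is not the actual setup script, and the directory names in it are hypothetical.

```python
import os


def prepend_module_path(private_dir, current=None):
    """Return a MODULEPATH string with private_dir listed first.

    Any existing copy of private_dir is removed so it appears only once.
    """
    if current is None:
        current = os.environ.get("MODULEPATH", "")
    # Drop empty entries and any earlier occurrence of the private directory.
    parts = [p for p in current.split(":") if p and p != private_dir]
    return ":".join([private_dir] + parts)


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    private = "/mnt/research/mylab/modulefiles"
    system = "/opt/modules/Core:/opt/modules/Bio"
    print(prepend_module_path(private, system))
```

Because the private directory comes first, a module name found there is picked up before a same-named module in the system directories, while everything else still resolves from the system directories as usual.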
If you are interested, please send me an email with your request.
Registration is now open for the 2015 Plant Biotechnology for Health and Sustainability Symposium to be held October 9 & 10 in 1200 Molecular Plant Sciences.
This Symposium is part of an interdisciplinary effort at MSU to develop training opportunities for graduate students in plant biotechnology.
For more information, please visit the symposium website.
iPlant Atmosphere has made a new user interface available which substantially increases the ability to manage images and instances. Some of the new features include:
- Users can edit existing images by adding tags, changing descriptions, etc. Previously, once an image was created, no changes were possible to even descriptive elements without launching the image and generating a brand new imaging request.
- It is now possible to modify the content of images by launching them, making your changes/additions, and then requesting a new image as a VERSION of the original. Once that "version" imaging is complete, you can choose to launch a particular named and tagged image, and then further choose which specific version you want. This reduces the proliferation of brand new whole images, all with different names.
- Active instances have new options such as "redeploy" (regenerate this running instance from scratch). In the past, one would have to destroy the instance and launch a brand new one to get back to the base image.
- Overall flow and navigation is much improved.
Please note that the above is not a comprehensive list of new features.
I've noticed over the last couple of weeks that new instance launches are continually failing. Resources are allocated and the image begins building, only to stall at the final deployment stages. After a time, the stall turns into an "active deployment error" and the launch must be reinitiated. I have also heard similar reports of this from other users.
Whereas technical support and response times were much improved early this year, I have recently experienced very poor response times. Upon filing a request for assistance for an "active deploy error" (one of 4 in the last 2 days), I did not even receive confirmation of my support ticket until about 10 hours later, and I'm still waiting for an actual response. Again, several users have reported to me that iPlant support response over the last couple of months has been abysmal - so much so that they are actively seeking alternatives.
iPlant is an extremely valuable and useful resource, and I certainly hope they get these problems rectified shortly.
I have finally received a message from Tech Support. Once I have a resolution, I will create a new post.
Here is an interesting "one-liner" (a powerful, one-line Linux command) for graphing your command usage in a terminal using ASCII:
The result will look something like this:
You can alter the portion "head -20" to use whatever value you like (the version shown gives the top 20 only).
Here's an example on another machine using "50" (i.e. top 50 commands):
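For readers who prefer Python, the same idea (count how often each command appears, keep the top N, and draw a bar for each) can be sketched as follows. This is not the one-liner from the post, just a rough equivalent. It assumes your shell history lives in ~/.bash_history with one command per line, and the function name, bar width, and "#" bar character are arbitrary choices.

```python
import os
from collections import Counter


def command_histogram(history_lines, top=20, width=40):
    """Return ASCII bar-chart rows for the most frequently used commands."""
    counts = Counter()
    for line in history_lines:
        words = line.split()
        if words:
            counts[words[0]] += 1  # first word of the line is the command
    most_common = counts.most_common(top)
    if not most_common:
        return []
    biggest = most_common[0][1]
    label_width = max(len(name) for name, _ in most_common)
    rows = []
    for name, count in most_common:
        # Scale each bar relative to the most frequently used command.
        bar = "#" * max(1, round(count * width / biggest))
        rows.append(f"{name.ljust(label_width)} {str(count).rjust(4)} {bar}")
    return rows


if __name__ == "__main__":
    # Assumption: bash history file in the default location.
    path = os.path.expanduser("~/.bash_history")
    if os.path.exists(path):
        with open(path, errors="replace") as handle:
            for row in command_histogram(handle, top=20):
                print(row)
```

Changing the `top` argument plays the same role as changing "head -20" in the shell version.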
The following seminars are being offered this Fall 2015 and may be of interest to the MSU biological research community:
- Plant Biology Seminar Series - Room 101 Biochemistry Bldg., Mondays at 4:00pm
- Science at the Edge - 1400 Biomedical and Physical Sciences Bldg., Fridays at 11:30am
- Plant, Soil and Microbial Sciences Seminar Series - A149 Plant and Soil Science Bldg., Thursdays at 4:00pm
- Transcription Journal Club - 502 Biochemistry Bldg., Fridays at 1:00pm
If you have a seminar you would like publicized that would be of interest to the MSU biology community and don't see it here, please drop me a line with the details.
Rosalind is "a platform for learning bioinformatics and programming through problem solving." Rosalind features a "Python Village" where biologists can learn the fundamentals of Python for solving various biological problems.
The "Problems" section of the "Bioinformatics Stronghold" offers a variety of bioinformatics problems that require programming to solve. Users are presented with a particular problem (e.g., counting nucleotides in a sequence) and asked to solve it using their Python programming skills within a set time limit. Answers are uploaded and verified for correctness. Users can challenge themselves and track their progress through these exercises. An interesting and useful way to sharpen and test your skills.
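To give a flavor of the exercises, here is a minimal Python sketch for the nucleotide-counting example mentioned above. It is one possible approach, not an official Rosalind solution, and the function name and test sequence are made up.

```python
from collections import Counter


def count_nucleotides(dna):
    """Return the counts of A, C, G and T (in that order) in a DNA string."""
    counts = Counter(dna.upper())
    # Counter returns 0 for bases that never appear in the sequence.
    return tuple(counts[base] for base in "ACGT")


if __name__ == "__main__":
    print(count_nucleotides("AACGTT"))  # prints (2, 1, 1, 2)
```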
A free Wikibook on NextGen Sequence Analysis (NGS) has been released. You can access it here:
Although it is somewhat lacking in information on "Big Data" issues, it does provide basic "howto" tutorials on genome assembly, variant analysis, short read alignments, and GWAS. It can be a nice supplement as you begin your ventures into bioinformatics.