GATK in the Cloud – The Phosphorus March Meetup
This month’s Big Data Meetup centered on GATK, a suite of software tools developed and used by the Broad Institute for scaling genomic pipelines. Geraldine A. van der Auwera, Ph.D., Associate Director of Outreach and Communications at the Broad Institute, spoke at length about GATK as a means of solving many of the problems that typically accompany genomic pipeline expansion.
The Broad Institute is a biomedical and genomic research center located in Massachusetts that partners with MIT and Harvard. Its genomics platform comprises a team of professionals tasked with developing and optimizing the tools researchers use to conduct their work. GATK, which stands for “Genome Analysis Toolkit,” offers a wide range of tools with a focus on genotyping and variant discovery. The toolkit is engineered for projects of any size, thanks to its high-performance computing features and powerful processing ability.
Even when compared to internet giants such as Facebook, YouTube, and Netflix, Broad generates an astounding amount of data every day: approximately 17 terabytes! In total, Broad manages 45 petabytes (PB) of data, which is sizeable next to Facebook’s 37 PB, for example. Dr. van der Auwera covered in detail the various pipelining solutions GATK provides for managing data at this scale, including the toolkit’s use of the Workflow Description Language (WDL), the Cromwell execution engine, the Workbench Platform, the FireCloud collaborative analysis platform, and more. As if this breadth of resources weren’t sufficient, Broad also promises additional developments coming soon, including GATK4, which is advertised as a “completely new, streamlined engine.”
Dr. van der Auwera’s talk was followed by spirited discussion and questions from the audience that teased out further specifics of Broad’s technological models. Following tradition, attendees of the Big Data Meetup were treated to a veritable cornucopia of beer selections and novel deep-dish and hand-tossed pizza concoctions.
A similarly merry time beckons you to our next Big Data Meetup! Follow us on our Meetup page to keep up to date. For more info on GATK, visit the Broad Institute’s GATK website. You can also check out the video of Dr. van der Auwera’s talk, below.
Please feel free to share some of your own interesting resources or suggestions in the comment section at the bottom of the page!