A Meetup with ADAM
December’s Big Data Genomics Meetup at Phosphorus HQ was yet another wonderful opportunity for great minds to convene, contemplate the latest in big data trends, and enjoy pizza with new friends.
The Meetup was led by Justin Paschall, a scientific software engineer from the AMPLab at UC Berkeley. Justin spoke at length about ADAM, a framework developed in the Apache Spark ecosystem to replace the ad hoc pipelines used to interpret massive amounts of genomic and transcriptomic (DNA and RNA) sequencing data.
ADAM addresses the brittle gluing-together of workflows that tends to happen when teams try to work around major bottlenecks in scalability, long-term stability, and speed. ADAM leverages the Avro and Parquet frameworks to provide a more consistent and effective means of persisting, sharing, and processing core genomic data. By contrast, genomic data is currently represented in a variety of legacy formats, most of which were not designed with scalability or Big Data in mind. (For instance, the VCF format can be difficult to parse and was designed more for human readability than for large-scale data processing.)
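To make the VCF point a little more concrete, here is a minimal Python sketch of what it takes to turn a single VCF data line into a typed record. This is not ADAM's API or anything shown in the talk; the names `VariantRecord`, `parse_info`, and `parse_vcf_line` are hypothetical, and the sketch only handles the eight fixed columns (no header, no per-sample genotype columns, no type conversion of INFO values).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union

# INFO values are either "key=value" strings or bare flags (stored as True).
InfoValue = Union[str, bool]


@dataclass
class VariantRecord:
    """Hypothetical typed view of the eight fixed VCF columns."""
    chrom: str
    pos: int
    variant_id: Optional[str]
    ref: str
    alts: List[str]
    qual: Optional[float]
    filters: List[str]
    info: Dict[str, InfoValue]


def parse_info(field: str) -> Dict[str, InfoValue]:
    """Split the semicolon-delimited INFO column into a dict."""
    info: Dict[str, InfoValue] = {}
    if field == ".":  # "." means the column is empty
        return info
    for entry in field.split(";"):
        if not entry:
            continue
        key, sep, value = entry.partition("=")
        # Entries without "=" are flags, e.g. "DB"
        info[key] = value if sep else True
    return info


def parse_vcf_line(line: str) -> VariantRecord:
    """Parse one tab-separated VCF data line (not a '#' header line)."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 8:
        raise ValueError("expected at least 8 tab-separated columns")
    chrom, pos, vid, ref, alt, qual, flt, info = cols[:8]
    return VariantRecord(
        chrom=chrom,
        pos=int(pos),
        variant_id=None if vid == "." else vid,
        ref=ref,
        alts=[] if alt == "." else alt.split(","),
        qual=None if qual == "." else float(qual),
        filters=[] if flt == "." else flt.split(";"),
        info=parse_info(info),
    )


# Example data line in the style of the VCF specification's sample file.
line = "20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14;AF=0.5;DB;H2"
record = parse_vcf_line(line)
print(record.chrom, record.pos, record.alts, record.info)
```

Even this toy version has to juggle three different delimiters, the "." placeholder for missing values, and flags that carry no value. A schema-based format declares those types once up front, which is the kind of consistency the paragraph above attributes to Avro and Parquet.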
ADAM is open source and came out of the same UC Berkeley AMPLab that developed Apache Spark. It re-envisions the way genomic analysis uses clusters of computers to solve Big Data challenges in science and biomedicine, bringing greater efficiency and parallelization, shorter development time, and advantages for end users.
Justin also outlined the motivations behind the use of Apache Spark in genomics, gave details about Spark-based genomics APIs, shared strategies to integrate Spark into larger bioinformatics pipelines, and provided steps for getting involved in the BDG developer and user communities.
At the end of an illuminating presentation, participants were treated to a variety of beers, ranging from the crowd-pleasing concoctions of Brooklyn Brewery, to the ever-exotic ambrosia of Heineken.
For more info on ADAM, check out the project’s GitHub page. You can also check out the video of Justin Paschall’s talk, below.
Please feel free to share some of your own interesting resources or suggestions in the comment section at the bottom of the page!