An enterprise-ready software platform for low-pass sequencing analysis

Joe Pickrell
Published in The Gencove Blog
Oct 8, 2019

Over the past two years, we at Gencove have launched low-pass sequencing products as a superior alternative to genotyping arrays across species, with applications ranging from genome-wide association studies to molecular breeding.

During this time, we’ve seen dramatic decreases in the cost and operational complexity of implementing sequencing workflows. Companies like Illumina and BGI continue to drive down the cost to sequence a base of DNA, while library preparation kits are now built with cost-effectiveness and automation in mind (see, e.g., offerings from seqWell, Twist, and iGenomX).

With these advances, a small investment in implementing low-pass sequencing can turn any lab into a high-throughput genome center. For example, we run over 1,500 samples at once in a single two-day run of an Illumina NovaSeq machine. To process this volume of low-pass sequencing data, we developed custom imputation tools and optimized software pipelines.

Today we are pleased to make our low-pass analysis tools and pipelines broadly available via a new cloud-based platform. The platform is available as a web app at gencove.com, and can also be driven programmatically through our API and command line interface.

In building out this infrastructure, we prioritized features that make implementing low-pass sequencing straightforward:

Accuracy. For all of the species we support, we’ve created haplotype reference panels that allow for rapid and accurate genotype imputation across the entire genome. In benchmarks, our tools are substantially faster and more accurate than existing options, and we continue to add data to our reference panels over time. Additionally, we work with customers to implement custom reference panels for specific populations or species of interest.

Average imputation accuracy and runtimes in 2Mb chunks of the human genome. NA12878 was held out from the 1000 Genomes Phase 3 reference panel and then imputed back. PPA: concordance with the Genome-in-a-Bottle gold standard at sites where the gold standard reports at least one non-reference allele. NPA: concordance at sites where the gold standard reports two reference alleles.
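To make the two metrics in the caption concrete, here is a minimal sketch of how PPA and NPA could be computed from a pair of genotype vectors. It is an illustration, not the benchmarking code behind the figure: the function name `ppa_npa`, the encoding of each genotype as a count of non-reference alleles (0, 1, or 2), and the reading of "concordance" as an exact genotype match are all assumptions made for this example.

```python
from typing import Sequence, Tuple


def ppa_npa(truth: Sequence[int], imputed: Sequence[int]) -> Tuple[float, float]:
    """Return (PPA, NPA) for genotypes coded as non-reference allele counts.

    Assumptions for this sketch (not taken from the benchmark itself):
    - genotypes are 0, 1, or 2 copies of the non-reference allele;
    - a site is concordant only if the imputed genotype equals the truth.
    """
    if len(truth) != len(imputed):
        raise ValueError("truth and imputed must cover the same sites")

    pos_total = pos_match = 0  # sites where truth has >= 1 non-reference allele
    neg_total = neg_match = 0  # sites where truth is homozygous reference

    for t, g in zip(truth, imputed):
        if t > 0:
            pos_total += 1
            pos_match += int(t == g)
        else:
            neg_total += 1
            neg_match += int(t == g)

    # Report NaN rather than dividing by zero when a class has no sites.
    ppa = pos_match / pos_total if pos_total else float("nan")
    npa = neg_match / neg_total if neg_total else float("nan")
    return ppa, npa


if __name__ == "__main__":
    # Toy data: four homozygous-reference sites and two variant sites.
    gold = [0, 0, 0, 0, 1, 2]
    calls = [0, 0, 0, 1, 1, 2]
    ppa, npa = ppa_npa(gold, calls)
    print(f"PPA={ppa:.2f} NPA={npa:.2f}")
```

Reporting the two numbers separately matters because most sites in a genome are homozygous reference, so a single overall concordance figure would be dominated by those sites and could hide errors at variant sites.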

Usability. The platform can be accessed via our web app, but for high-throughput users we developed an easy-to-use command line interface. Alignment, variant calling, and downstream analyses such as ancestry analysis or polygenic score profiling can all be launched with a single command.

Reliability. We include automated quality control metrics to quickly detect and fail poor-quality samples, and we run our infrastructure on high-reliability AWS services. The platform is built with compliance in mind, and is HIPAA-compliant for companies that handle sensitive medical data.

We’re excited to work with our current and new partners to advance our mission to make genome sequencing technologies accessible and interpretable. If there are features you’d like to see in the next release of the platform, please reach out!
