Doing the sequence data dance
20 Jul 2018 by Evoluted New Media
Next-generation sequencing (NGS) has ushered in a revolution of powerful genomic insights. So powerful, in fact, that the sheer amount of data produced has become a challenge. Here, Nicole Rose gives us a few ways through the data maze…
While advances in technology continue to make NGS easier, faster and cheaper, the sheer volume of data that is generated by sequencing techniques can be overwhelming. Bioinformatics and data management tools are available to make analysis and interpretation more manageable. In this article, we look at how digital solutions can improve NGS data management.
Increasing the speed, read length and throughput of DNA and RNA sequencing assays has enabled scientists to study the genomes of a diverse range of organisms that would otherwise have taken decades to investigate using conventional sequencing methods. Given that genome sequencing can now be accomplished in a matter of hours using the latest sequencing technologies, NGS has inspired novel applications not previously possible.1
NGS offers an alternative approach to traditional Sanger sequencing, giving researchers the ability to study more genetic information in less time. There are several NGS platforms to choose from: some instruments are capable of analysing millions of DNA fragments in parallel, while others (whether by design or because of capacity limitations) analyse far fewer reads in parallel. In either case, each sequence is analysed multiple times for greater depth of coverage and accuracy.2
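As a rough illustration of what depth of coverage means in practice, the expected mean coverage is commonly estimated as the total number of sequenced bases divided by the genome size. The sketch below assumes uniform read length and ignores duplicates and unmapped reads; the function names and the example figures are illustrative, not taken from any particular platform.

```python
import math


def mean_coverage(read_count: int, read_length: int, genome_size: int) -> float:
    """Expected mean depth: total sequenced bases divided by genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return read_count * read_length / genome_size


def reads_needed(target_coverage: float, read_length: int, genome_size: int) -> int:
    """Smallest number of reads that reaches the target mean coverage."""
    if read_length <= 0:
        raise ValueError("read_length must be positive")
    return math.ceil(target_coverage * genome_size / read_length)


# Illustrative figures: 600 million reads of 150 bases against a 3 Gb genome.
depth = mean_coverage(600_000_000, 150, 3_000_000_000)
print(f"Estimated mean coverage: {depth:.1f}x")
```

Real runs rarely achieve perfectly even coverage, so an estimate like this is best treated as a planning figure rather than a guarantee for any given region.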
Bioinformatic analyses are then used to map the individual sequences to a reference genome if one is available, or to piece together overlapping fragments into a new assembly if no reference is available. These approaches can be used to study entire genomes, whether known or novel, or specific areas of interest such as exomes or selected genes.
NGS is allowing researchers to probe fundamental cellular processes, including DNA replication, transcription, translation and methylation. Current applications include de novo genome assembly, DNA, RNA and epigenome sequencing, DNA methylation, and chromatin immunoprecipitation and sequencing (ChIP-seq).3 Each of these opens the door towards building a better understanding of genetic variation, transcriptomics, epigenomics, regulatory studies and diagnostics.
The impact of NGS applications is now beginning to be felt in the clinic, allowing biomedical scientists to better understand and discover treatments for a wide range of disorders.4 Technology is creating a new era of personalised medicine, where custom targeted gene sequencing, whole genome sequencing and multi-faceted bioinformatics tools are enabling more tailored medical care.5,6 Thanks to NGS, the goal of using genomic analysis to make clinical assessments for disease prevention, diagnosis and management is rapidly becoming a reality.
However, NGS comes with a major bottleneck that scientists and technology developers around the world are working to overcome. So much data is generated from sequencing workflows that the time taken for data analysis, interpretation and management actually exceeds that required for data generation in the first place.1 Bioinformatics tools are being developed to alleviate this issue, improving current time constraints and data organisation. Here, we offer some tips on how to improve NGS data management and sharing capabilities, in order to overcome this important challenge.
Define processes using digital solutions
Given the type and scale of data generated, NGS workflows present unique challenges in terms of process organisation, forcing labs to approach NGS implementation differently from other lab techniques. Furthermore, technologies and capabilities continue to evolve, meaning that data management solutions must offer scalability and flexibility to keep pace with the field. Defining these processes so that they can be measured quantitatively or qualitatively is essential to support high-quality results and high levels of throughput.
Strategies for implementing effective NGS workflows can include setting standards, in a similar way to the Six Sigma philosophy, which allows labs to organise and measure the success of daily activities. Standards can also help laboratories to set expectations with collaborators and customers to ensure reliable and consistent performance. Implementing a laboratory information management system (LIMS) to standardise operating procedures and set up sample tracking from step to step, for example, can offer significant benefits for labs when managing processes, automation and data flow. Setting up processes in this way delivers added structure for maximum throughput, data quality and organisational efficiency.
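To make the idea of step-to-step sample tracking concrete, here is a minimal sketch of how a system might record each sample's progress and reject steps that fall outside the defined procedure. It is not how any particular LIMS works: the step names, the `Sample` class and the `advance` method are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical workflow order; a real lab would define its own procedure.
WORKFLOW = ["received", "extraction", "library_prep", "sequencing", "analysis"]


@dataclass
class Sample:
    sample_id: str
    history: list = field(default_factory=list)  # (step, UTC timestamp) pairs

    @property
    def current_step(self):
        """Name of the most recent completed step, or None if nothing is recorded."""
        return self.history[-1][0] if self.history else None

    def advance(self, step: str) -> None:
        """Record the next step, refusing anything that skips or repeats the order."""
        done = len(self.history)
        expected = WORKFLOW[done] if done < len(WORKFLOW) else None
        if step != expected:
            raise ValueError(f"{self.sample_id}: expected {expected!r}, got {step!r}")
        self.history.append((step, datetime.now(timezone.utc)))


sample = Sample("S-001")
sample.advance("received")
sample.advance("extraction")
print(sample.sample_id, "is at step:", sample.current_step)
```

Because every transition is timestamped, the same record can later feed turnaround-time metrics without any extra data entry.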
Track and measure outcomes for effective workflows
Once a lab has developed processes by defining expectations for the delivery of quality results in specific workflows, benchmarks can be used to drive lab productivity and efficiency. Defining tasks using an integrated digital platform simplifies the monitoring of metrics for each step in a process. Automatically tracking and measuring the effectiveness of lab processes using real-world data in this way helps to optimise workflow efficiency and boost the quality of outcomes.
NGS labs rely heavily on robust monitoring procedures in order to refine activities and services. This includes verifying and validating instrumentation, putting checks in place for sample and reagent preparation and use, and ensuring adherence to strict quality control protocols. While many labs simply want to get straight to the results, it is important to evaluate the details that underpin data generation. LIMS platforms enable easy collection, monitoring and assessment of a wealth of laboratory information to compare different samples and workflows, including quality control results, workflow updates, sequencing run metrics and bioinformatics pipeline flow. This can help labs optimise processes and deliver results in a more timely fashion.
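The kind of automated monitoring described above can be sketched as a simple comparison of sequencing run metrics against acceptance thresholds. Everything in this example is an assumption made for illustration: the metric names, the threshold values and the run records are invented, and each lab would set its own criteria.

```python
from statistics import mean

# Illustrative minimum thresholds only; not recommendations.
THRESHOLDS = {"pct_q30": 80.0, "yield_gb": 100.0}


def failed_metrics(run: dict) -> list:
    """Return the names of metrics that fall below their threshold."""
    return [name for name, minimum in THRESHOLDS.items() if run[name] < minimum]


def summarise(runs: list) -> dict:
    """Aggregate per-run metrics into a single lab-wide summary."""
    flagged = {}
    for run in runs:
        failures = failed_metrics(run)
        if failures:
            flagged[run["run_id"]] = failures
    return {
        "runs": len(runs),
        "mean_pct_q30": mean(run["pct_q30"] for run in runs),
        "flagged": flagged,
    }


# Made-up run records standing in for data a LIMS would collect.
runs = [
    {"run_id": "RUN-01", "pct_q30": 92.0, "yield_gb": 120.0},
    {"run_id": "RUN-02", "pct_q30": 76.0, "yield_gb": 110.0},
    {"run_id": "RUN-03", "pct_q30": 90.0, "yield_gb": 95.0},
]
report = summarise(runs)
print(report["flagged"])
```

Keeping the thresholds in one place makes it straightforward to tighten or relax them as a lab's benchmarks change.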
Manage change and communicate value
Building the necessary infrastructure and acquiring the right resources to support evolving NGS workflows can be a substantial undertaking for any organisation, involving changes in working habits, responsibilities and operating procedures. Collecting metrics to monitor lab processes can help stakeholders adjust. By tracking performance data, labs can ensure that the NGS processes they put in place are robust and will produce the highest quality NGS results. LIMS offer an ideal platform to aggregate this data, and can offer a lab-wide view of workflow productivity that can subsequently be used to guide process improvements.
In order to effectively manage infrastructure changes within labs, strong communication is key. Stakeholders who understand the goals of individual laboratories, and how existing workflows impact on users and the results they generate, will be better placed to drive more effective processes. In this way, LIMS can help to guide organisational changes by enabling the sharing of ideas, facilitating stronger collaboration, and ultimately informing better decision-making.
Transforming NGS for the future
Given the sheer volume of data that is generated by sequencing technologies, managing NGS workflows can be complex and time-consuming. However, by implementing digital solutions, such as the use of a LIMS to support data collection and organisation, labs can run more efficiently and overcome process bottlenecks. LIMS can also improve lab productivity by guiding process improvements, streamlining data management and enabling better communication. Ensuring effective systems for data management using LIMS platforms can help to simplify the application of NGS techniques, driving a better understanding of diseases and supporting the development of potential treatments.
References:
1. Zhang J et al. The impact of next-generation sequencing on genomics. J Genet Genomics. 2011;38(3):95-109. doi:10.1016/j.jgg.2011.02.003
2. Behjati S, Tarpey PS. What is next generation sequencing? Arch Dis Child Educ Pract Ed. 2013;98(6):236-238. doi:10.1136/archdischild-2013-304340
3. Park ST, Kim J. Trends in next-generation sequencing and a new era for whole genome sequencing. Int Neurourol J. 2016;20:76-83. doi:10.5213/inj.1632742.371
4. Koboldt DC et al. The next-generation sequencing revolution and its impact on genomics. Cell. 2013;155(1). doi:10.1016/j.cell.2013.09.006
5. Lam M et al. Precision oncology using a clinician-directed, tailored approach to molecular profiling. Asia Pac J Clin Oncol. 2018;14(1):84-90. doi:10.1111/ajco.12787
6. García-García G et al. Assessment of the latest NGS enrichment capture methods in clinical context. Sci Rep. 2016;6. doi:10.1038/srep20948
Nicole Rose is the Genomics Application Manager for Platform for Science at Thermo Fisher Scientific.