Panagiotis Moulos Research Group

Next Generation Sequencing (NGS) is considered the most promising technology for achieving the long-discussed goal of personalized medicine, as it can provide the sequence of whole genomes and many other genomic insights in a short time and at rapidly decreasing cost. RNA-Seq, an NGS technique for the high-resolution measurement of gene expression patterns, is gradually becoming the standard tool for transcriptomic studies in biological research and is quickly replacing older genomic methods, as it yields a wider dynamic range of measurements and additional information on events such as alternative splicing and alternative isoform expression. At the same time, ChIP-Seq, which measures DNA-protein interactions on a genome-wide scale, has become the standard technique for the study of transcription factor binding profiles as well as epigenetic modifications. Other NGS applications include the detection of structural genome variations related to evolutionary mechanisms, genetic diseases or cancer through Whole Exome Sequencing (WES) and Whole Genome Sequencing (WGS).

NGS is one of the most data-intensive high-throughput techniques in biomedical research, yielding vast amounts of raw data which require special management and statistical analysis. Although considerable progress has been made in the development of statistical algorithms for all the aforementioned techniques (RNA-Seq, ChIP-Seq, WES, WGS), little effort has been devoted to combining the advantages of the individual computational algorithms available for each technique into integrated results which more accurately reflect true experimental outcomes. Such outcomes include, among others, high-confidence DNA-protein interaction sites, robust lists of differentially expressed genes and genetic variant calls characterized by both higher statistical power and fewer false positives and false negatives. As a result, a significant amount of research time is spent comparing results from different methods to identify which one fits the data best. However, each algorithm has advantages and disadvantages, and there is no clear winner across all experimental setups. This trend contributes to an ever-increasing repertoire of computational methods developed with little focus on either usability or sustainability, leading to a graveyard of otherwise valuable algorithms and leaving biomedical researchers more and more frustrated. This confusing landscape is further complicated by recent advances such as single-cell sequencing, which make fast turnaround times in data analysis more imperative than ever.