My latest research focuses on the mRNA life cycle. Until now, decay and
translation of canonical mRNAs were mostly considered two distinct processes,
and a key assumption behind gene expression profiling methods (RNA-Seq,
microarrays and RT-PCR) was that canonical mRNAs exist largely as
full-length molecules in cells. By developing Akron-Seq, a novel in vivo
approach that simultaneously captures the native 3’ ends of capped RNAs and
the native 5’ ends of polyadenylated RNAs, we found that this assumption is
not accurate: canonical human mRNAs are subject to repeated, cotranslational,
ribosome-phased, endonucleolytic cuts in a process that we termed
ribothrypsis. By integrating orthogonal large-scale datasets, we found that
ribothrypsis is initiated by a ribosome stall that triggers an endonucleolytic
cleavage and results in propagating cleavages by upstream ribosomes. Our
results uncovered RNA G-quadruplexes as ribothrypsis triggers, and showed that
ribothrypsis is a conserved process with a key regulatory role in mRNA
translation in humans. Moreover, mRNA fragments, the residuals of
ribothrypsis, are abundant in living cells, with critical implications for the
interpretation of RNA experiments such as RNA-Seq. In light of this discovery, we will need to
redefine what the functional mRNA molecule is, and discover how the function
of regulatory elements on mRNAs is affected.
During my PhD, I focused on miRNA-mediated post-transcriptional gene regulation. First, I wanted to understand how miRNAs recognize their target genes and then, using this information, to decipher miRNA function in cells. I developed one of the most accurate miRNA target prediction programs, DIANA-microT, which was found to be the most precise program in an independent evaluation published in Nature. The program was integrated into a complete suite and a web server which, even after several years, attracts hundreds of users daily. I later applied these tools to the prediction of host gene targets of viral miRNAs and linked miRNA editing to Epstein-Barr virus latency. In parallel, I used systems biology to integrate miRNA targeting into biological pathways and to associate gene expression profiles with miRNA function. Once Ago2 high-throughput CLIP-sequencing data became available, changing the miRNA landscape and revealing in vivo miRNA binding sites, I was among the first to use machine learning on the novel datasets to decipher the determinants of miRNA binding in a data-driven approach. Using this, I managed, for the first time, to successfully integrate miRNA targeting within the coding sequence of mRNAs. To associate miRNA targeting with function, I mined and integrated large-scale sequencing and bibliographic data into TarBase, one of the most widely accessed miRNA resources globally. The computational tools I developed have had, and continue to have, a substantial impact on the community: they have been cited by thousands of researchers and are accessed by hundreds of users daily.
As a postdoctoral fellow at Penn, I employed novel computational approaches to design and analyze state-of-the-art RNA-Seq datasets and in vivo cross-linking of RNA-protein complexes coupled with sequencing (CLIP-Seq) datasets to study small RNAs and RNA binding proteins. In the lab we followed a combined computational and experimental approach and were among the first labs to successfully apply CLIP-Seq. Using this protocol, I studied the biogenesis and function of piRNAs, whose function in the germline is critical for protecting genome integrity by silencing transposable elements. By developing advanced analysis tools and integrating orthogonal high-dimensional datasets, I revealed the mechanisms governing piRNA biogenesis, showed the function of Mili, Miwi, MOV10L1 and BmPAPI in the process and identified their critical role in spermiogenesis. Applying the tools I developed to deep small RNA immunoprecipitation data, I revealed a pre-miRNA surveillance system that is critical for the quality control of miRNA synthesis. Recently, in an article in Nature, we revealed an unexpected function of the Piwi ribonucleoprotein complexes that is paramount for germ cell specification: a stochastic, sequence-dependent but not sequence-specific piRNA adhesion mechanism that traps maternally deposited mRNAs in the germ plasm. In parallel, I worked to identify the role of RNA binding proteins in Amyotrophic Lateral Sclerosis (ALS) and employed RNA-Seq, CLIP-Seq and Ribo-Seq to reveal that the RNA binding protein FUS, implicated in ALS pathology, regulates genes coding for RNA binding proteins in neurons by binding to their highly conserved introns. In addition, by identifying the in vivo RNA binding targets of TAF15, I revealed the effect of TAF15 on the expression and splicing of the neuronal transcriptome.
In the context of the work described above, I developed an efficient open-source programming framework and an advanced analysis suite named CLIPSeqTools that can serve as the basis for the design and development of advanced analysis tools for high-throughput sequencing datasets.