
Based on: MicroRNA identification based on sequence and structure alignment

A Modified miRAlign Approach to Finding MicroRNAs in the Chicken Genome. Presented by Neeta Jain, Nehar Arora, and Jeff Bonis. Based on: "MicroRNA identification based on sequence and structure alignment" by Xiaowo Wang†, Jing Zhang†, Fei Li, Jin Gu, Tao He, Xuegong Zhang and Yanda Li.


Presentation Transcript


  1. A Modified miRAlign Approach to Finding MicroRNAs in the Chicken Genome • Presented by: Neeta Jain, Nehar Arora, and Jeff Bonis • Based on: "MicroRNA identification based on sequence and structure alignment" by Xiaowo Wang†, Jing Zhang†, Fei Li, Jin Gu, Tao He, Xuegong Zhang and Yanda Li

  2. Outline • Introduction • Motivation • Methods • Results • Conclusion

  3. Introduction • What are miRNAs and why are they important? • miRNAs are non-coding RNAs about 22 nt long • They are derived from ~70 nt precursors, which typically have a hairpin structure • Importance of miRNAs: they regulate the expression of target genes via complementary base-pair interactions.

  4. Motivation • miRNAs are short (~22 nt) and more conserved in secondary structure than in primary sequence • Hence, conventional sequence alignment methods such as BLAST can only find relatively close homologues • Several steps of miRAlign can be replaced with alternative tools, and the resulting increase or decrease in performance should be evaluated • Prof. Joan at the Delaware Biotechnology Institute is working on identifying miRNAs in the chicken genome, but the secondary structure information has not yet been exploited

  5. Methods • Data • Reference sets • miRBase Registry Version 8.0 (http://microrna.sanger.ac.uk/sequences) • MicroRNA Registry Version 5.0 was previously used • 1300 animal miRNAs from six species and their precursors (1104) composed our raw training set, Train_All • Train_Sub_1: miRNAs from all six animal species except G. gallus • Train_Sub_2: miRNAs from all six animal species except G. gallus and C. elegans • Genomic sequences • Only the chicken genome (G. gallus) was used.
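The training subsets described on this slide can be sketched as a filter on species. The sketch below is a minimal illustration, assuming records carry miRBase-style IDs whose leading code names the species (`gga` for G. gallus, `cel` for C. elegans). The record list, the placeholder sequences, and the helper names `species_prefix` and `build_subset` are hypothetical and not taken from the presentation.

```python
# Minimal sketch: derive Train_Sub_1 and Train_Sub_2 from Train_All by
# excluding species. IDs follow the miRBase naming style ("gga-..." etc.);
# the sequences here are placeholders, not real miRNA sequences.

def species_prefix(mirna_id: str) -> str:
    """Return the species code that precedes the first hyphen."""
    return mirna_id.split("-", 1)[0]


def build_subset(records, excluded_prefixes):
    """Keep every (id, sequence) record whose species is not excluded."""
    excluded = set(excluded_prefixes)
    return [rec for rec in records if species_prefix(rec[0]) not in excluded]


# Hypothetical stand-in for Train_All.
train_all = [
    ("hsa-let-7a", "ACGU"),
    ("gga-let-7a", "ACGU"),
    ("cel-let-7", "ACGU"),
    ("dme-let-7", "ACGU"),
]

train_sub_1 = build_subset(train_all, {"gga"})         # no G. gallus
train_sub_2 = build_subset(train_all, {"gga", "cel"})  # no G. gallus, no C. elegans
```

Holding out the chicken entries in this way lets the search be scored against miRNAs that the training set has never seen.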

  6. Methods (contd)

  7. Methods (cont.) • Preprocessing • Known precursors from the training set are aligned against the chicken genome with BLAT (instead of BLAST) • The resulting hits are used as candidate pre-miRNAs • We experienced difficulty extracting flanking sequences
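The flanking-sequence extraction mentioned on this slide can be sketched as follows. This is a sketch under stated assumptions, not the presenters' procedure: hit coordinates are taken to be 0-based and half-open on the forward strand of the target, the flank size of 50 nt is an arbitrary illustrative value, and the function names are hypothetical.

```python
# Minimal sketch: cut a genomic hit out of a chromosome sequence together
# with its flanks, clamping at the chromosome ends and reverse-complementing
# minus-strand hits. Coordinates are assumed 0-based, half-open [start, end).

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_with_flanks(chrom_seq: str, start: int, end: int,
                        strand: str = "+", flank: int = 50) -> str:
    """Return chrom_seq[start:end] extended by `flank` bases on each side."""
    if not 0 <= start < end <= len(chrom_seq):
        raise ValueError("hit coordinates fall outside the chromosome")
    left = max(0, start - flank)                # clamp at the 5' end
    right = min(len(chrom_seq), end + flank)    # clamp at the 3' end
    window = chrom_seq[left:right]
    return reverse_complement(window) if strand == "-" else window
```

For example, `extract_with_flanks("AAAACCCCGGGGTTTT", 4, 8, "+", 2)` returns the four-base hit plus two bases on each side.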

  8. Experiment (contd) • “Modified” miRAlign (1.) Secondary structure prediction • Both the candidate sequence and its reverse complement are analyzed by RNAfold to predict hairpins. • Alternatively, sequences were also analyzed in parallel by mFold to predict their secondary structures. • Only hairpins with an MFE lower than -20 kcal/mol are retained. (2.) Pairwise sequence alignment • Sequences from the previous step are aligned pairwise to all the ~22 nt known miRNA sequences from the training set • The sequence similarity score between the candidate and each known mature miRNA is calculated by CLUSTALW. • If the score exceeds a user-defined threshold (default = 70), the candidate/known-miRNA pair is kept for further analysis
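The two filters on this slide can be sketched as below. The MFE values are assumed to have been predicted beforehand by an external tool such as RNAfold. The similarity function is a deliberately simplified stand-in (best ungapped percent identity), not the CLUSTALW score used in the presentation, and all function and variable names are hypothetical.

```python
# Minimal sketch of steps (1.) and (2.): keep hairpins with MFE below
# -20 kcal/mol, then keep candidate/known-miRNA pairs whose similarity
# score exceeds the threshold (default 70).

MFE_CUTOFF = -20.0      # kcal/mol, from the slide
SIM_THRESHOLD = 70.0    # default threshold, from the slide


def passes_mfe(mfe: float) -> bool:
    """True when the predicted hairpin is stable enough to retain."""
    return mfe < MFE_CUTOFF


def best_ungapped_identity(candidate: str, mirna: str) -> float:
    """Stand-in score: best percent identity of `mirna` slid along
    `candidate` without gaps (an assumption, not the CLUSTALW score)."""
    n, m = len(candidate), len(mirna)
    if m == 0 or n < m:
        return 0.0
    best = 0
    for offset in range(n - m + 1):
        window = candidate[offset:offset + m]
        best = max(best, sum(a == b for a, b in zip(window, mirna)))
    return 100.0 * best / m


def keep_pairs(candidates, known_mirnas):
    """candidates: iterable of (id, sequence, mfe); known_mirnas: id -> sequence.
    Returns (candidate_id, mirna_id, score) for every pair that passes both filters."""
    kept = []
    for cand_id, seq, mfe in candidates:
        if not passes_mfe(mfe):
            continue
        for mirna_id, mirna_seq in known_mirnas.items():
            score = best_ungapped_identity(seq, mirna_seq)
            if score > SIM_THRESHOLD:
                kept.append((cand_id, mirna_id, score))
    return kept
```

Running the cheap MFE check before the all-against-all comparison keeps the number of pairwise alignments down.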

  9. Methods (contd) (3.) Checking the miRNA’s position on the stem-loop • The miRNA should not be located on the terminal loop of the hairpin • Omitted because the offsets of the known mature miRNAs within their pre-miRNAs were unavailable: • The miRNA should be located on the same arm of the hairpin as its known homologues • The position of the potential miRNA on the hairpin should not differ too much from that of its known homologues (chosen delta_len = 15)
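The three positional conditions can be sketched on a dot-bracket structure string of the kind RNAfold produces. This is an illustrative sketch with several assumptions: the structure is a single stem-loop, any overlap with the terminal loop counts as a failure, and "position" is measured as the start offset from the 5' end of the hairpin. The function names and the `known_arm` / `known_start` parameters are hypothetical.

```python
# Minimal sketch of step (3.) on a dot-bracket hairpin. Coordinates are
# 0-based, half-open [start, end) relative to the 5' end of the hairpin.

def terminal_loop(structure: str):
    """Return (start, end) of the unpaired run between the last '(' and the
    first ')' of a single stem-loop."""
    last_open = structure.rfind("(")
    first_close = structure.find(")")
    if last_open == -1 or first_close == -1 or last_open > first_close:
        raise ValueError("structure is not a single stem-loop")
    return last_open + 1, first_close


def arm_of(structure: str, start: int, end: int):
    """Return '5p' or '3p' for the arm holding [start, end), or None when
    the span touches the terminal loop."""
    loop_start, loop_end = terminal_loop(structure)
    if end <= loop_start:
        return "5p"
    if start >= loop_end:
        return "3p"
    return None


def position_ok(structure, start, end, known_arm=None, known_start=None,
                delta_len=15):
    """Apply the non-loop check, then the arm and offset checks when the
    known homologue's arm and start offset are available."""
    arm = arm_of(structure, start, end)
    if arm is None:
        return False
    if known_arm is not None and arm != known_arm:
        return False
    if known_start is not None and abs(start - known_start) > delta_len:
        return False
    return True
```

Leaving `known_arm` and `known_start` as `None` reproduces the reduced check described on the slide, where only the non-loop condition is applied.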

  10. Methods (contd) (4.) RNA secondary structure alignment • RNAforester computes a pairwise structure alignment and gives a similarity score • The score is the sum over all base and base-pair matches, insertions, and deletions. • The normalized similarity score of structures C and m is given as: [equation not included in the transcript] • An alternative structure alignment program, SimTree, transforms the structures into labeled trees, then computes the distance between them and assigns a normalized score.

  11. Methods (contd) (5.) Total similarity score • After aligning all potential homologue pairs, a total similarity score (tss) is assigned to each candidate sequence: [equation not included in the transcript] • where C is the candidate sequence and R is the set composed of all C’s

  12. Results • The search for miRNAs in the chicken genome proved somewhat difficult. BLAT was used instead of BLAST because of time constraints • For secondary structure prediction, mFold predicted a lower MFE than RNAfold, on average • T-Coffee could be used for pairwise sequence alignment instead of CLUSTALW, but is about N-times slower

  13. Results (cont.) • The requirements for the position of the mature miRNA on the stem-loop were reduced • Only the non-loop condition was checked • The orientation (5’ vs. 3’) of the known pre-miRNAs was needed to check arm location and hairpin length • It was previously found that over 97.5% of known animal miRNAs met the non-stringent hairpin length difference cutoff of 15 • For secondary structure alignments, SimTree was used along with the original RNAforester. • SimTree uses tree alignment methods similar to those of RNAforester

  14. Conclusion • Final results are still under analysis • Future work: • Perform the primary sequence steps first, then the secondary structure filter steps • Primary sequence filters provide a greater reduction in the candidate set than the secondary structure-based filters • Additional secondary structure prediction, primary sequence alignment, and secondary structure alignment tools could be evaluated • Different combinations of these tools could also lead to better performance • Tertiary structure tools could supplement or replace some of the filtering steps

  15. THANK YOU Questions ??
