ChIP-Seq

Giulio Pavesi, University of Milano (giulio.pavesi@unimi.it)

Epigenetics • Modern experimental techniques and technologies allow for the genome-wide study of different types of histone modifications, shedding light on the role of each one

“ChIP” • With the “right” antibody, we can extract (“immunoprecipitate”) from living cells the protein of interest together with the DNA it is bound to • We can then try to identify which DNA regions were bound by the protein • This can be done for transcription factors • It can also be done for histones, separately for each modification

After ChIP • Identification of the DNA fragments bound by the protein, by: • PCR • Microarray hybridization • Sequencing

Many cells, hence many copies of the same region bound by the protein

ChIP-Seq: histone ChIP vs. TF ChIP

So, if we find that a region has been sequenced many times, we can suppose that it was bound by the protein, but…

Only a short stretch of each extracted DNA fragment can be sequenced, at one end or at both ends (“single-end” vs “paired-end” sequencing): no more than 35 bp in earlier runs, 50 to 75 bp now • Thus, the original regions have to be “reconstructed” • …and, once again, bioinformaticians can be of help…
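One common way to "reconstruct" the original region from a single-end read is to extend the read, in the direction of its strand, up to an estimated fragment length. The sketch below illustrates that idea only; the function name `extend_read`, the 0-based half-open coordinates and the default fragment length of 200 bp are assumptions made for illustration, not values taken from the slides.

```python
def extend_read(start, read_len, strand, fragment_len=200, chrom_len=None):
    """Infer the (start, end) interval of the fragment a read came from.

    Coordinates are 0-based and half-open. `start` is the leftmost mapped
    position of the read. `fragment_len` is an assumed average fragment size.
    """
    if strand == "+":
        # Forward-strand read: the fragment extends to the right of the read start.
        frag_start = start
        frag_end = start + fragment_len
    else:
        # Reverse-strand read: the fragment extends to the left of the read end.
        frag_end = start + read_len
        frag_start = frag_end - fragment_len

    # Clip the interval to the chromosome boundaries.
    frag_start = max(0, frag_start)
    if chrom_len is not None:
        frag_end = min(frag_end, chrom_len)
    return frag_start, frag_end


if __name__ == "__main__":
    print(extend_read(1000, 50, "+"))
    print(extend_read(1000, 50, "-"))
```

With paired-end data the extension step is not needed, since the two mates delimit the fragment directly.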

The ChIP-Seq pipeline

“Peak finding”, in a perfect world • Peaks: How tall? How wide? How strongly enriched?
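In the "perfect world" case, peak finding reduces to piling up the fragments and reporting the stretches where the pile is tall enough. The following is a minimal sketch of that idea, assuming fragments are given as 0-based half-open `(start, end)` intervals on a single chromosome; `coverage`, `call_peaks` and the `min_height` threshold are hypothetical names chosen here, not part of any specific tool.

```python
def coverage(fragments, chrom_len):
    """Per-base fragment depth, computed with a difference array."""
    diff = [0] * (chrom_len + 1)
    for start, end in fragments:
        diff[start] += 1   # depth rises where a fragment begins
        diff[end] -= 1     # and falls just past where it ends
    cov = []
    depth = 0
    for delta in diff[:chrom_len]:
        depth += delta
        cov.append(depth)
    return cov


def call_peaks(cov, min_height):
    """Return (start, end, height) for each run with depth >= min_height."""
    peaks = []
    start = None
    for pos, depth in enumerate(cov):
        if depth >= min_height and start is None:
            start = pos
        elif depth < min_height and start is not None:
            peaks.append((start, pos, max(cov[start:pos])))
            start = None
    if start is not None:
        # A peak that runs to the end of the chromosome.
        peaks.append((start, len(cov), max(cov[start:])))
    return peaks


if __name__ == "__main__":
    frags = [(2, 6), (3, 7), (4, 8), (12, 14)]
    cov = coverage(frags, 16)
    for start, end, height in call_peaks(cov, min_height=2):
        print(f"peak {start}-{end}: height={height}, width={end - start}")
```

Height and width fall out of the pileup directly; how strongly a peak is enriched can only be judged against some expectation, which a fixed threshold like this does not provide.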

“Peak finding”
• The main issue: the sequenced DNA sample (apart from sequencing errors/artifacts) contains a lot of “noise”
• Sample “contamination”: the DNA of the PhD student performing the experiment
• DNA shearing is not uniform: open chromatin regions tend to be fragmented more easily and thus are more likely to be sequenced
• Repetitive sequences might be artificially enriched due to inaccuracies in the genome assembly
• Over-amplification: what you see is a single DNA fragment amplified many times, not a truly enriched region
• As yet unknown problems, which nevertheless seem to produce “noisy” sequencing runs and ruin the experiment

Dealing with “noise”
• A solution is to perform a “control” experiment, and use it to filter the results:
  • “Input DNA”: a portion of the DNA sample removed before IP
  • “Mock IP DNA”: DNA obtained from IP without antibodies
  • “Non-specific IP DNA”: DNA obtained from IP using an antibody against a protein not known to be involved in DNA binding, e.g. immunoglobulin G
  • “Knock out IP DNA”: perform IP as in the main experiment, but on “knock out” cells that do not express the TF studied (cannot be done for histones…)
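One simple way to use a control is to compare, region by region, the read count in the ChIP sample with the count expected from the control after scaling for sequencing depth. The sketch below assumes a Poisson model for the expected count, which is a common but not universal choice; the function names, the pseudocount and the example numbers are illustrative assumptions, not taken from any particular peak finder.

```python
import math


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), by summing the lower tail.

    Note: computing 1 - CDF loses precision for very small p-values;
    this is acceptable for a sketch but not for production use.
    """
    if k <= 0:
        return 1.0
    term = math.exp(-lam)      # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i        # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def enrichment(chip_count, control_count, chip_total, control_total,
               pseudocount=1.0):
    """Fold enrichment and Poisson p-value of a region against a control.

    `chip_total` and `control_total` are the library sizes (mapped reads).
    The pseudocount (an assumption) avoids a zero expectation when the
    control has no reads in the region.
    """
    scale = chip_total / control_total
    expected = (control_count + pseudocount) * scale
    fold = chip_count / expected
    p_value = poisson_sf(chip_count, expected)
    return fold, p_value


if __name__ == "__main__":
    fold, p = enrichment(40, 4, 1_000_000, 2_000_000)
    print(f"fold enrichment = {fold:.1f}, p = {p:.3g}")
```

A region that is also tall in the control (for example open chromatin or a repeat) gets a large expected count and therefore a modest fold enrichment and an unremarkable p-value.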

Twelve different peak-finding methods were introduced in less than two years • Each one answers the questions (how tall, how wide, how enriched) in a different way and computes significance (p- or q-values) with a different strategy • They output the list of “enriched regions” from your experiment
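Because thousands of candidate regions are tested at once, p-values are usually corrected for multiple testing, and the corrected values are often reported as q-values. As one illustration, the sketch below applies the Benjamini-Hochberg procedure; this is an assumption about how a q-value might be obtained, since the individual methods use different strategies, and the function name is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return qvalues


if __name__ == "__main__":
    region_pvalues = [0.01, 0.04, 0.03, 0.005]
    for p, q in zip(region_pvalues, benjamini_hochberg(region_pvalues)):
        print(f"p = {p:<6} q = {q:.3f}")
```

Regions whose adjusted value falls below a chosen cutoff would then form the reported list of enriched regions.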

Histone modifications at transcribed regions [Figure: read count (peak height) vs. expression level, from high to low]

Insulator regions Enhancer regions

Detecting unannotated transcripts and TSSs

Not only methylations…
• Wang et al., Nature Genetics 40(7), 2008
• 18 histone acetylations: H2AK5ac, H2BK5ac, H2BK12ac, H2BK20ac, H2BK120ac, H3K4ac, H3K9ac, H3K14ac, H3K18ac, H3K23ac, H3K27ac…
• And intersection with methylation maps

“patterns” of modifications associated with promoters of genes

Histones in different cell types
• Mikkelsen et al., Nature 448, 553-560, 2007
• ChIP-Seq in three mouse cell types:
  • Embryonic Stem (ES) cells
  • Neural Progenitor Cells (NPCs)
  • Embryonic Fibroblasts (MEFs)
• Produced over 4G sequences
• For histone modifications: H3K4me3, H3K9me3, H3K27me3, H3K36me3, H4K20me3
• And RNA polymerase II

H3K4me3+H3K27me3 is a “bivalent” chromatin mark that poises genes for lineage-specific activation or repression [Figure panels: Neural TF, Housekeeping, Neurogenesis TF, Adipogenesis TF, NP marker gene]

Allele-specific modifications

Histone Data at UCSC
• The ENCODE project is producing whole-genome maps of:
  • Transcription factors
  • DNA methylation
  • CTCF/insulators
  • Pol II/III binding
  • Histone modifications
• …across several different cell lines
• As data are produced, they are loaded as “tracks” accessible and retrievable through the UCSC genome browser

What next? • From the signature histone modifications, “characterize” genomic regions: • Promoter? • Enhancer? • Transcribed? • Repressed?
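As a toy illustration of "characterizing" a region from its histone signature, the sketch below maps the set of marks found at a region to a label with a few hand-written rules. The rules are simplified, commonly cited associations (H3K4me3 with promoters, H3K4me1 with enhancers, H3K36me3 with transcribed regions, H3K27me3 and H3K9me3 with repression, H3K4me3 plus H3K27me3 as bivalent) and are assumptions made here for illustration. The function name `characterize` is hypothetical as well.

```python
def characterize(marks):
    """Assign a coarse label to a region from the histone marks found there.

    `marks` is any iterable of mark names, e.g. ["H3K4me3", "H3K27me3"].
    Rules are checked in order, so the most specific pattern wins.
    """
    marks = set(marks)
    if {"H3K4me3", "H3K27me3"} <= marks:
        return "bivalent promoter"
    if "H3K4me3" in marks:
        return "promoter"
    if "H3K4me1" in marks:
        return "enhancer"
    if "H3K36me3" in marks:
        return "transcribed"
    if "H3K27me3" in marks or "H3K9me3" in marks:
        return "repressed"
    return "unclassified"


if __name__ == "__main__":
    regions = {
        "region_1": ["H3K4me3"],
        "region_2": ["H3K4me3", "H3K27me3"],
        "region_3": ["H3K36me3"],
        "region_4": ["H3K27me3"],
    }
    for name, marks in regions.items():
        print(name, "->", characterize(marks))
```

Hard-coded rules like these only convey the idea; learning the combinations of marks from the data is the more general approach.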

Cell-type-specific promoter and enhancer states and associated functional enrichments.
