The following steps suggest a standard way to normalize deep sequencing data. You can use different combinations of BEADS commands to achieve what you want.
Before applying BEADS to normalize the data, users are expected to have mapped their sequence reads and identified enrichment regions using software of their choice. We recommend visually examining the enrichment regions to ensure that most true signals are reasonably captured.
BEADS requires the user to generate a mappability track and a set of mappable fragments sampled across the genome under the same conditions used in read mapping. See the separate instructions for details.
1) Extend mapped reads to expected fragment size
For example, if reads are 35-mers and expected fragment size is 200 bp:
beads extend -threePrime 165 reads.gff > reads.ext.gff
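The value 165 in the example is the expected fragment size minus the read length (200 - 35). A minimal sketch of that arithmetic follows; the variable names read_len, frag_size and ext are illustrative only and are not part of BEADS.

```shell
# Illustrative variable names; only the arithmetic comes from the example above.
read_len=35     # length of each mapped read, in bp
frag_size=200   # expected fragment size, in bp

# Number of bases to add at the 3' end of each read.
ext=$((frag_size - read_len))
echo "$ext"

# The result can then be passed to the command shown above:
#   beads extend -threePrime "$ext" reads.gff > reads.ext.gff
```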
2) Get GC-count for each extended read
beads getGC ref_genome.2bit reads.ext.gff > reads.ext.plusGC.gff
3) Estimate background GC distribution in reads
3.1) Retain only reads in background (i.e. those that do not overlap with enrichment regions)
beads mask enrichment_locations.gff reads.ext.plusGC.gff > reads.bg.ext.plusGC.gff
3.2) Construct GC distribution using reads in background
beads gcHist reads.bg.ext.plusGC.gff > reads.bg.gcHist.dat
4) Estimate GC distribution in reference genome
4.1) Sample fragments across the entire reference genome and get GC information (See instructions)
4.2) Retain only fragments sampled in background regions corresponding to sequence reads
beads mask enrichment_locations.gff genome_fragments.plusGC.gff > genome.bg.plusGC.gff
4.3) Construct genomic GC distribution using fragments in background
beads gcHist genome.bg.plusGC.gff > genome.bg.gcHist.dat
5) Weigh each read according to its GC-count
beads gcWeigh reads.ext.plusGC.gff reads.bg.gcHist.dat genome.bg.gcHist.dat > reads.gcw.gff
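This page does not spell out how gcWeigh combines the two histograms. One plausible reading, offered here as an assumption rather than a description of the BEADS implementation, is that a read is weighted by the ratio of the background genomic frequency to the background read frequency at its GC count. The sketch below illustrates that ratio on made-up numbers; the two-column "GC_count fraction" layout is hypothetical and is not the actual gcHist output format.

```shell
# Hypothetical histograms as "GC_count fraction" pairs (NOT the real gcHist format).
reads_hist=$(printf '80 0.10\n100 0.50\n120 0.40\n')
genome_hist=$(printf '80 0.20\n100 0.50\n120 0.30\n')

# Assumed rule: weight(gc) = genome_fraction(gc) / read_fraction(gc).
# The read histogram is fed first, then a separator line, then the genome histogram.
weights=$(
  { printf '%s\n' "$reads_hist"; echo '---'; printf '%s\n' "$genome_hist"; } |
  awk '
    $1 == "---" { second = 1; next }          # switch to the genome histogram
    !second     { reads[$1] = $2; next }      # remember read fractions
    ($1 in reads) && reads[$1] > 0 { printf "%s %.2f\n", $1, $2 / reads[$1] }
  '
)
echo "$weights"
```

Under this assumed rule, GC counts that are over-represented among background reads relative to the genome receive weights below 1, and under-represented ones receive weights above 1.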
6) Collect tag counts at regular intervals across the genome
beads tagCount -base 50 reads.gcw.gff > reads.gcw.binned.50bp.gff
7) Apply mappability adjustment
7.1) Prepare mappability track for reference genome (See instructions)
7.2) Adjust for mappability variations
beads mapCorr mappability_track.binned.50bp.binned.gff reads.gcw.binned.50bp.gff -maxMap 400 > reads.gcw-map.binned.50bp.gff
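Steps 1 to 7 above can be chained in a small wrapper, which may be convenient when the same corrections have to be applied to more than one data set. The sketch below is only a convenience wrapper: the function name beads_correct and its three arguments are hypothetical, while the beads commands, options and argument order are copied from the steps above. It assumes that ref_genome.2bit, genome_fragments.plusGC.gff (step 4.1) and the binned mappability track (step 7.1) already exist in the working directory.

```shell
# Hypothetical wrapper, not part of BEADS. Stops at the first failing command.
beads_correct() {
  local name=$1       # input file is "$name.gff", e.g. "reads"
  local enriched=$2   # enrichment regions, e.g. enrichment_locations.gff
  local ext=$3        # 3' extension in bp, e.g. 165

  # Steps 1-2: extend reads and attach GC counts.
  beads extend -threePrime "$ext" "$name.gff" > "$name.ext.gff" &&
  beads getGC ref_genome.2bit "$name.ext.gff" > "$name.ext.plusGC.gff" &&
  # Step 3: background GC distribution in reads.
  beads mask "$enriched" "$name.ext.plusGC.gff" > "$name.bg.ext.plusGC.gff" &&
  beads gcHist "$name.bg.ext.plusGC.gff" > "$name.bg.gcHist.dat" &&
  # Step 4: background GC distribution in the genome (rewritten on every call).
  beads mask "$enriched" genome_fragments.plusGC.gff > genome.bg.plusGC.gff &&
  beads gcHist genome.bg.plusGC.gff > genome.bg.gcHist.dat &&
  # Steps 5-7: GC weighting, binning, mappability adjustment.
  beads gcWeigh "$name.ext.plusGC.gff" "$name.bg.gcHist.dat" genome.bg.gcHist.dat > "$name.gcw.gff" &&
  beads tagCount -base 50 "$name.gcw.gff" > "$name.gcw.binned.50bp.gff" &&
  beads mapCorr mappability_track.binned.50bp.binned.gff "$name.gcw.binned.50bp.gff" -maxMap 400 > "$name.gcw-map.binned.50bp.gff"
}

# Example call (requires beads and the input files to be present):
#   beads_correct reads enrichment_locations.gff 165
```

The bin size (50) and -maxMap value (400) are kept fixed as in the example commands; which enrichment regions to pass for a control data set is not stated on this page and is left to the user.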
8) Divide by control data
If you wish to combine several sets of control data into a single master control, each control data set must first be GC and mappability corrected separately. The resulting tag-count tracks can then be summed into one track using the sumTagCounts command. Tag counts in all files must be collected at the same positions.
beads divide reads.gcw-map.binned.50bp.gff control.gcw-map.binned.50bp.gff > reads.gcw-map-div.binned.50bp.gff
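Because tag counts must be collected at the same positions in both files, it may be worth confirming this before running divide. The sketch below assumes tab-delimited GFF files in the standard column layout (column 1 sequence name, column 4 start, column 5 end); the helper name same_positions is hypothetical and is not a BEADS command.

```shell
# Hypothetical helper, not part of BEADS: succeeds only if both GFF files
# list identical (sequence, start, end) triples in the same order.
same_positions() {
  tmp_a=$(mktemp) && tmp_b=$(mktemp) || return 2
  # Drop comment lines, keep only sequence name, start and end.
  grep -v '^#' "$1" | cut -f1,4,5 > "$tmp_a"
  grep -v '^#' "$2" | cut -f1,4,5 > "$tmp_b"
  cmp -s "$tmp_a" "$tmp_b"
  status=$?
  rm -f "$tmp_a" "$tmp_b"
  return "$status"
}

# Example use with the file names from step 8:
#   if same_positions reads.gcw-map.binned.50bp.gff control.gcw-map.binned.50bp.gff; then
#     beads divide reads.gcw-map.binned.50bp.gff control.gcw-map.binned.50bp.gff > reads.gcw-map-div.binned.50bp.gff
#   else
#     echo "bin positions differ between sample and control" >&2
#   fi
```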
Copyright © 2010 Nicole Cheung (The Gurdon Institute, University of Cambridge)