University of Cambridge The Gurdon Institute

BEADS


BEADS: Bias Elimination Algorithm for Deep Sequencing

Normalize your data with BEADS

The following steps outline a standard way to normalize deep sequencing data. Other combinations of BEADS commands can be used to suit different needs.

Before applying BEADS, users should have mapped their sequence reads and identified enrichment regions using software of their choice. We recommend visually examining the enrichment regions to ensure that most true signals are reasonably captured.


Prerequisites

BEADS requires the user to generate a mappability track and a set of mappable fragments sampled across the genome under the same conditions used in read mapping. See the instructions for details.


Steps to normalize sequencing data

1) Extend mapped reads to expected fragment size

    For example, if reads are 35-mers and the expected fragment size is 200 bp, extend each read by 200 - 35 = 165 bp at its 3' end:

beads extend -threePrime 165 reads.gff > reads.ext.gff
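
The extension length is the expected fragment size minus the read length. As a rough illustration of what a 3' extension does to an interval, the Python sketch below assumes 1-based inclusive GFF coordinates and strand-aware extension. The helper `extend_three_prime` is hypothetical and is not part of BEADS.

```python
def extend_three_prime(start, end, strand, extension):
    """Extend a 1-based inclusive interval at its 3' end (hypothetical helper)."""
    if strand == "+":
        # On the plus strand the 3' end is the higher coordinate.
        return start, end + extension
    # On the minus strand the 3' end is the lower coordinate; clamp at 1.
    return max(1, start - extension), end


read_length = 35
fragment_size = 200
extension = fragment_size - read_length  # value passed to -threePrime

plus_read = extend_three_prime(1000, 1034, "+", extension)
minus_read = extend_three_prime(1000, 1034, "-", extension)
print(extension, plus_read, minus_read)
```

Both extended intervals span 200 bp, matching the expected fragment size.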

2) Get the GC count for each extended read

beads getGC ref_genome.2bit reads.ext.gff > reads.ext.plusGC.gff
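
Conceptually, the GC count of a fragment is the number of G and C bases in its sequence. The Python sketch below shows that count on a plain string. It assumes the fragment sequence has already been fetched from the reference genome, and the function name `gc_count` is illustrative only.

```python
def gc_count(sequence):
    """Count G and C bases, ignoring case (soft-masked bases are included)."""
    seq = sequence.upper()
    return seq.count("G") + seq.count("C")


# Lower-case (soft-masked) bases and N are handled without special cases.
fragment = "ACGTGGCCatgcNN"
print(gc_count(fragment))
```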

3) Estimate background GC distribution in reads

    3.1) Retain only reads in background (i.e. those that do not overlap with enrichment regions)

beads mask enrichment_locations.gff reads.ext.plusGC.gff > reads.bg.ext.plusGC.gff

    3.2) Construct GC distribution using reads in background

beads gcHist reads.bg.ext.plusGC.gff > reads.bg.gcHist.dat
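
Steps 3.1 and 3.2 can be pictured as a filter followed by a tally. The Python sketch below is a simplified illustration, assuming reads are given as (chromosome, start, end, GC count) tuples with inclusive coordinates. The function names and the tuple layout are assumptions and do not reflect the BEADS file formats.

```python
from collections import Counter


def overlaps(a_start, a_end, b_start, b_end):
    """True if two inclusive intervals share at least one position."""
    return a_start <= b_end and b_start <= a_end


def background_gc_hist(reads, enriched):
    """Tally GC counts of reads that overlap no enrichment region."""
    hist = Counter()
    for chrom, start, end, gc in reads:
        in_enriched = any(
            chrom == e_chrom and overlaps(start, end, e_start, e_end)
            for e_chrom, e_start, e_end in enriched
        )
        if not in_enriched:
            hist[gc] += 1
    return hist


reads = [
    ("chrI", 100, 299, 80),
    ("chrI", 500, 699, 95),    # overlaps the enrichment region, so it is masked
    ("chrI", 5000, 5199, 80),
    ("chrII", 550, 749, 80),   # same coordinates, different chromosome
]
enriched = [("chrI", 600, 900)]
hist = background_gc_hist(reads, enriched)
print(dict(hist))
```

The linear scan over enrichment regions keeps the sketch short; a real data set would call for sorted intervals or an interval tree.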

4) Estimate GC distribution in reference genome

    4.1) Sample fragments across the entire reference genome and get GC information (See instructions)

    4.2) Retain only fragments sampled in background regions corresponding to sequence reads

beads mask enrichment_locations.gff genome_fragments.plusGC.gff > genome.bg.plusGC.gff

    4.3) Construct genomic GC distribution using fragments in background

beads gcHist genome.bg.plusGC.gff > genome.bg.gcHist.dat

5) Weight each read according to its GC count

beads gcWeigh reads.ext.plusGC.gff reads.bg.gcHist.dat genome.bg.gcHist.dat > reads.gcw.gff
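
One way to picture the weighting is as a ratio of two normalized histograms: GC counts that are over-represented among background reads relative to the genomic background receive a weight below 1, and under-represented ones a weight above 1. The Python sketch below assumes exactly that ratio. This is an assumption about what `gcWeigh` computes, and the dictionary-based histograms stand in for the `.dat` files, whose format is not described here.

```python
def gc_weights(read_hist, genome_hist):
    """Weight per GC count = genomic frequency / read frequency (assumed)."""
    read_total = sum(read_hist.values())
    genome_total = sum(genome_hist.values())
    if read_total == 0 or genome_total == 0:
        raise ValueError("histograms must not be empty")
    weights = {}
    for gc, read_n in read_hist.items():
        if read_n == 0:
            continue  # no reads with this GC count, nothing to weight
        genome_freq = genome_hist.get(gc, 0) / genome_total
        read_freq = read_n / read_total
        weights[gc] = genome_freq / read_freq
    return weights


# Hypothetical histograms: GC count -> number of background fragments.
read_hist = {60: 100, 80: 200, 100: 100}
genome_hist = {60: 300, 80: 400, 100: 300}
weights = gc_weights(read_hist, genome_hist)
print(weights)
```

In this toy example, reads with a GC count of 80 are over-represented (50% of reads versus 40% of genomic fragments), so they are down-weighted.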

6) Collect tag counts at regular intervals across the genome

beads tagCount -base 50 reads.gcw.gff > reads.gcw.binned.50bp.gff
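
As an illustration of collecting weighted tag counts at regular intervals, the Python sketch below samples positions at multiples of the interval (50 bp here) and sums the weights of all extended reads covering each position. Whether `tagCount` samples positions in exactly this way is an assumption, and the (start, end, weight) tuples are a simplification of the GFF records.

```python
from collections import defaultdict


def tag_counts(reads, base=50):
    """Sum read weights at every sampled position covered by a read."""
    counts = defaultdict(float)
    for start, end, weight in reads:
        # First multiple of `base` at or after the read start.
        first = -(-start // base) * base
        for pos in range(first, end + 1, base):
            counts[pos] += weight
    return dict(counts)


# Two overlapping extended reads with GC-derived weights.
reads = [(101, 300, 1.2), (151, 350, 0.8)]
counts = tag_counts(reads, base=50)
for pos in sorted(counts):
    print(pos, counts[pos])
```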

7) Apply mappability adjustment

    7.1) Prepare mappability track for reference genome (See instructions)

    7.2) Adjust for mappability variations

beads mapCorr mappability_track.binned.50bp.binned.gff reads.gcw.binned.50bp.gff -maxMap 400 > reads.gcw-map.binned.50bp.gff
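
The idea behind a mappability adjustment is that positions where fewer fragments can be uniquely mapped will show fewer tags for technical reasons alone. The Python sketch below shows one plausible form of such a correction, scaling each tag count by the ratio of the maximum mappability score to the local score. This page does not define `-maxMap`; treating it as the score of a fully mappable position, and dropping positions with a score of zero, are both assumptions made for illustration.

```python
def map_correct(counts, mappability, max_map=400):
    """Scale tag counts by max_map / local mappability score (assumed form)."""
    corrected = {}
    for pos, count in counts.items():
        score = mappability.get(pos, 0)
        if score <= 0:
            continue  # unmappable position: no reliable estimate
        corrected[pos] = count * max_map / score
    return corrected


# Hypothetical tag counts and mappability scores at the same positions.
counts = {150: 1.2, 200: 2.0, 250: 2.0}
mappability = {150: 400, 200: 200, 250: 0}
corrected = map_correct(counts, mappability, max_map=400)
print(corrected)
```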

8) Divide by control data

    If you wish to combine several sets of control data into a single master control, each control data set must first be GC and mappability corrected separately. Several tag-count tracks can then be summed into one track using the sumTagCounts command. Tag counts in all files must be collected at the same positions.

beads divide reads.gcw-map.binned.50bp.gff control.gcw-map.binned.50bp.gff > reads.gcw-map-div.binned.50bp.gff
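
The requirement that all tracks share the same positions follows from the arithmetic: summing and dividing are done position by position. The Python sketch below illustrates this with dictionaries mapping position to tag count. The function names are illustrative, and skipping positions where the control count is zero is an assumption, not documented BEADS behaviour.

```python
def sum_tag_counts(tracks):
    """Sum several tag-count tracks collected at identical positions."""
    positions = set(tracks[0])
    for track in tracks[1:]:
        if set(track) != positions:
            raise ValueError("tracks must share the same positions")
    return {pos: sum(t[pos] for t in tracks) for pos in sorted(positions)}


def divide(sample, control):
    """Divide sample by control per position, skipping zero controls."""
    if set(sample) != set(control):
        raise ValueError("tracks must share the same positions")
    return {
        pos: sample[pos] / control[pos]
        for pos in sorted(sample)
        if control[pos] > 0
    }


# Two corrected control tracks combined into a master control.
control_a = {150: 1.0, 200: 2.0}
control_b = {150: 1.0, 200: 0.0}
master = sum_tag_counts([control_a, control_b])

sample = {150: 4.0, 200: 1.0}
ratio = divide(sample, master)
print(master, ratio)
```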


Last updated 29 Mar 2011 by Nicole Cheung.


Copyright © 2010 Nicole Cheung (The Gurdon Institute, University of Cambridge)