University of Cambridge The Gurdon Institute

BEADS

BEADS: Bias Elimination Algorithm for Deep Sequencing


BEADS requires two inputs from the user: a mappability track, and a set of mappable fragments sampled across the genome, which is used to estimate the genomic GC distribution. Both must be created with the same software and criteria that are used to map the experimental reads. You only need to generate them once, unless the read mapping conditions change.


How to generate a mappability track

1) Create an exhaustive set of simulated reads

To generate a mappability track for a given chromosome, first simulate short sequence reads by shredding the whole chromosomal sequence into overlapping fragments at 1-bp resolution. The simulated reads should be the same length as the experimental sequence reads. For example, if the sequence reads are generally 35 bp long:

beads shred ref_genome.fa -length 35 -step 1 > genome_reads.fastq
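To make the shredding step concrete, here is a minimal Python sketch of the same idea. It is not the BEADS implementation: the function name `shred`, the read naming scheme, and the constant placeholder quality string are assumptions made for illustration only.

```python
def shred(name, sequence, length=35, step=1):
    """Yield FASTQ records for every window of `length` bp, moving `step` bp at a time."""
    for start in range(0, len(sequence) - length + 1, step):
        read = sequence[start:start + length]
        # Hypothetical read ID: chromosome name plus 1-based start position.
        read_id = f"{name}:{start + 1}"
        # Simulated reads have no real base qualities, so use a constant placeholder.
        quality = "I" * length
        yield f"@{read_id}\n{read}\n+\n{quality}\n"


# Toy example: a 10-bp sequence shredded into 4-bp reads at 1-bp resolution.
records = list(shred("chrTest", "ACGTACGTAC", length=4, step=1))
```

A sequence of N bp yields N - length + 1 reads at a step of 1, so the simulated read file for a whole chromosome is large.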

2) Read mapping

Once you have the simulated reads in FASTQ format, process them exactly as you would process experimental sequence reads: map them onto the reference genome with the same software and the same parameters that you use for your real data.

The resulting mapped read file should be converted to GFF format if necessary [sample mapped read file].
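If your mapper does not write GFF directly, the conversion is mostly a matter of rearranging columns. The sketch below formats one mapped read as a nine-column, tab-separated GFF line with 1-based inclusive coordinates. The helper name `to_gff_line` and the `source` and `feature` labels are placeholders, and which fields BEADS actually reads is an assumption not confirmed here, so check the result against the sample mapped read file.

```python
def to_gff_line(chrom, start, length, strand, source="mapper", feature="read"):
    """Format one mapped read as a GFF line (1-based, inclusive coordinates)."""
    end = start + length - 1
    # Columns: seqname, source, feature, start, end, score, strand, frame, attributes.
    fields = [chrom, source, feature, str(start), str(end), ".", strand, ".", "."]
    return "\t".join(fields)


# A 35-bp read mapped to the plus strand of chr1, starting at position 101.
line = to_gff_line("chr1", 101, 35, "+")
```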

3) Extend mappable reads

In real data, each read can come from either the positive or the negative strand. All mapped reads should therefore be extended to the expected fragment size separately in each direction. For example, with 35-bp reads and an expected fragment size of 200 bp, extend by 165 bp:

beads extend -threePrime 165 mapped_reads.gff > mapped_reads.ext.forward.gff

beads extend -fivePrime 165 mapped_reads.gff > mapped_reads.ext.reverse.gff

Then add the extended reads together:

cat mapped_reads.ext.forward.gff mapped_reads.ext.reverse.gff > mapped_reads.ext.gff

The resulting file contains all fragments that can possibly be mapped onto the genome if present in the data.
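The coordinate arithmetic behind the two extensions can be sketched in Python as follows. This is an illustration under assumptions, not BEADS code: the function `extend` is hypothetical, and the handling of strand and of the chromosome start shown here may differ from what `beads extend` does.

```python
def extend(start, end, three_prime=0, five_prime=0, strand="+"):
    """Return (start, end) after extending a read at its 3' and/or 5' end."""
    if strand == "+":
        new_start, new_end = start - five_prime, end + three_prime
    else:
        # On the minus strand the 3' end is the lower coordinate.
        new_start, new_end = start - three_prime, end + five_prime
    # Coordinates are 1-based, so clip at the start of the chromosome.
    return max(1, new_start), new_end


read = (1000, 1034)  # a 35-bp read on the plus strand
forward = extend(*read, three_prime=165)  # fragment if the read is the 5' end
reverse = extend(*read, five_prime=165)   # fragment if the read is the 3' end
```

Both results span 200 bp (35 + 165), one downstream and one upstream of the read, which is why the two extended files are concatenated.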

4) Collect mappability values at regular intervals across the genome

Mappability can then be quantified by the number of overlapping fragments at a given genomic location:

beads tagCount -base 50 mapped_reads.ext.gff > mappability_track.binned.50bp.gff
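As a rough illustration of the counting, the Python sketch below counts how many extended fragments cover one position in every 50-bp interval. The function `tag_count` is hypothetical, and sampling a single position per interval is an assumption: whether BEADS reads the count at one position or summarises the whole bin is not stated here.

```python
def tag_count(fragments, chrom_length, step=50):
    """Count fragments covering each sampled position (1, 1 + step, ...)."""
    counts = {}
    for position in range(1, chrom_length + 1, step):
        # A fragment covers the position if start <= position <= end (inclusive).
        counts[position] = sum(1 for start, end in fragments if start <= position <= end)
    return counts


# Three 200-bp fragments on a toy 300-bp chromosome.
fragments = [(1, 200), (30, 229), (120, 319)]
counts = tag_count(fragments, chrom_length=300, step=50)
```

The nested scan is quadratic and only suitable for small examples; a real implementation would sort the fragments or use a sweep over start and end events.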




How to sample mappable fragments across the reference genome

1) Sample fragments across the genome

To estimate the genomic GC distribution, we make use of the nucleotide composition information of fragments sampled at regular intervals across the entire genome. For example, to sample fragments of 200 bp in length at a 50-bp interval:

beads sampleGenomeGC ref_genome.2bit -length 200 -step 50 > genome_fragments.plusGC.gff
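The sampling itself is a sliding window with a GC count per window. A minimal Python sketch, assuming the sequence is already available as a plain string (the command above reads a .2bit file instead) and using a hypothetical function name `sample_gc`:

```python
def sample_gc(sequence, length=200, step=50):
    """Yield (start, end, gc_fraction) for windows sampled every `step` bp."""
    sequence = sequence.upper()
    for offset in range(0, len(sequence) - length + 1, step):
        window = sequence[offset:offset + length]
        gc = sum(window.count(base) for base in "GC")
        # Report 1-based inclusive coordinates, as in GFF.
        yield offset + 1, offset + length, gc / length


# Toy example: 4-bp windows every 4 bp over a 12-bp sequence.
windows = list(sample_gc("GGCCAATTGGCC", length=4, step=4))
```

In this sketch, ambiguous bases such as N count towards the window length but not towards GC; how BEADS treats them is not covered here.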

2) Select mappable fragments

After sampling fragments from the genome, we need to make sure that every sampled fragment is mappable, so that the set represents only the parts of the genome that are accessible to the mapping procedure. Constructing the mappability track already produced a mapped read file containing all mappable reads for each chromosome (see Step 2 above). Use it to select only those fragments whose start locations overlap with mappable reads, one chromosome at a time:

beads selectMappable genome_fragments.plusGC.gff mappable_reads.chr1.gff > genome_mappable_fragments.plusGC.chr1.gff
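The selection rule can be sketched in Python as an interval lookup. This is one possible reading of "start locations overlap with mappable reads", namely that a fragment is kept when its start position falls inside at least one mappable read; the function names are hypothetical and the actual BEADS criterion may be stricter.

```python
from bisect import bisect_right


def merge_intervals(intervals):
    """Merge overlapping or adjacent (start, end) intervals into a sorted list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def select_mappable(fragments, mappable_reads):
    """Keep fragments whose start position lies inside a mappable read."""
    merged = merge_intervals(mappable_reads)
    starts = [start for start, _ in merged]
    selected = []
    for fragment in fragments:
        # Find the last merged interval that begins at or before the fragment start.
        index = bisect_right(starts, fragment[0]) - 1
        if index >= 0 and fragment[0] <= merged[index][1]:
            selected.append(fragment)
    return selected


# Toy data for one chromosome: positions 37 to 99 are not covered by any read.
reads = [(1, 35), (2, 36), (100, 134)]
fragments = [(1, 200), (51, 250), (101, 300)]
kept = select_mappable(fragments, reads)
```

Merging the reads first keeps each lookup logarithmic, which matters because the simulated read set covers the chromosome at 1-bp resolution.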


Last updated 29 Mar 2011 by Nicole Cheung.

Copyright © 2010 Nicole Cheung (The Gurdon Institute, University of Cambridge)