BEADS requires the user to generate a mappability track and a set of mappable fragments sampled across the genome, which are used to estimate the genomic GC distribution. Both must be created with the same mapping software and criteria used for the experimental reads. You only need to generate them once, unless the read-mapping conditions change.
To generate a mappability track for a given chromosome, we first need to simulate short sequence reads by shredding the whole chromosomal sequence into overlapping fragments at 1-bp resolution, so that one read starts at every position. The simulated reads should be the same length as the experimental sequence reads. For example, if the experimental reads are generally 35 bp long:
beads shred ref_genome.fa -length 35 -step 1 > genome_reads.fastq
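The shredding step is a sliding window. The Python sketch below is not BEADS code. `shred` and `to_fastq` are hypothetical helpers that move a fixed-length window along a sequence one step at a time and emit one FASTQ record per position. The read names and the constant quality character are placeholders chosen only for illustration. At step 1, a sequence of length L yields L - k + 1 reads of length k.

```python
def shred(seq, length=35, step=1):
    """Yield (start, read) for every window of `length` bp, moving `step` bp at a time.

    `start` is 0-based; windows that would run past the end of `seq` are skipped.
    """
    for start in range(0, len(seq) - length + 1, step):
        yield start, seq[start:start + length]


def to_fastq(chrom, seq, length=35, step=1, qual_char="I"):
    """Format shredded reads as FASTQ text.

    Hypothetical naming scheme: '@<chrom>:<1-based start>'. The quality string is a
    constant placeholder because simulated reads have no real base qualities.
    """
    records = []
    for start, read in shred(seq, length, step):
        records.append(f"@{chrom}:{start + 1}\n{read}\n+\n{qual_char * len(read)}")
    return "\n".join(records) + ("\n" if records else "")


if __name__ == "__main__":
    # Toy 10-bp "chromosome" shredded into 4-bp reads: 10 - 4 + 1 = 7 reads.
    print(to_fastq("chrTest", "ACGTACGTAC", length=4, step=1), end="")
```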
After obtaining the simulated reads in FASTQ format, process them exactly as you would experimental sequence reads. Map them onto the reference genome with the same software and parameters that you use for your real data.
If necessary, convert the resulting mapped-read file to GFF format.
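If your mapper's output has been turned into BED, the conversion is mainly a coordinate shift. BED intervals are 0-based and half-open, while GFF intervals are 1-based and closed. The sketch below is hypothetical (`bed_to_gff`). It assumes BED6 input, with strand in column 6, and writes standard nine-column GFF. The `source` and `feature` values are placeholders, and it is only an assumption that BEADS reads these columns. Compare the output against a sample BEADS input file before relying on it.

```python
def bed_to_gff(line, source="mapper", feature="read"):
    """Convert one BED6 line to a nine-column GFF line.

    BED is 0-based, half-open; GFF is 1-based, closed, so only the start shifts by one.
    `source` and `feature` are placeholder values (assumption, not BEADS requirements).
    """
    fields = line.rstrip("\n").split("\t")
    chrom, start, end, strand = fields[0], int(fields[1]), int(fields[2]), fields[5]
    # GFF columns: seqid, source, type, start, end, score, strand, phase, attributes
    return "\t".join([chrom, source, feature, str(start + 1), str(end),
                      ".", strand, ".", "."])


if __name__ == "__main__":
    # A 35-bp read covering BED [99, 134) becomes GFF 100..134.
    print(bed_to_gff("chr1\t99\t134\tread1\t0\t+"))
```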
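In reality, a read could come from either the positive or the negative strand. Therefore, all mapped reads should be extended to the expected fragment size (e.g. 200 bp) separately in each direction. With 35-bp reads, this means extending each read by 165 bp (200 bp minus the 35-bp read length):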
beads extend -threePrime 165 mapped_reads.gff > mapped_reads.ext.forward.gff
beads extend -fivePrime 165 mapped_reads.gff > mapped_reads.ext.reverse.gff
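The coordinate arithmetic behind the two extensions can be sketched as follows. `extend_read` is a hypothetical helper. It assumes that `-threePrime` extends a read past its 3' end and `-fivePrime` past its 5' end, measured relative to the read's own strand. This reading of the flags is inferred from the output file names, so check it against the BEADS documentation. For a 35-bp positive-strand read at positions 1000 to 1034, either extension yields a 200-bp fragment.

```python
def extend_read(start, end, strand, three_prime=0, five_prime=0):
    """Extend a 1-based, closed read interval past its 3' and/or 5' end.

    Assumed semantics: directions are relative to the read's strand. On '+', the
    3' end is the higher coordinate; on '-', it is the lower one. The start is
    clipped at 1. Clipping at the chromosome end would need its length and is
    omitted here.
    """
    if strand == "+":
        new_start, new_end = start - five_prime, end + three_prime
    elif strand == "-":
        new_start, new_end = start - three_prime, end + five_prime
    else:
        raise ValueError(f"unexpected strand: {strand!r}")
    return max(1, new_start), new_end


if __name__ == "__main__":
    print(extend_read(1000, 1034, "+", three_prime=165))  # "forward" fragment
    print(extend_read(1000, 1034, "+", five_prime=165))   # "reverse" fragment
```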
Then combine the two sets of extended reads into a single file:
cat mapped_reads.ext.forward.gff mapped_reads.ext.reverse.gff > mapped_reads.ext.gff
The resulting file contains every fragment that could be mapped onto the genome if it were present in the data.
Mappability can then be quantified as the number of overlapping fragments at each genomic location, here summarised in 50-bp bins:
beads tagCount -base 50 mapped_reads.ext.gff > mappability_track.binned.50bp.gff
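One way to picture the mappability calculation is as read depth over the extended fragments. The sketch below computes per-base depth with a difference array and then averages it within 50-bp bins. `coverage` and `bin_mean` are hypothetical names. Averaging within each bin is an assumed summary, and `beads tagCount` may aggregate differently, for example by counting fragments per bin.

```python
def coverage(fragments, chrom_length):
    """Per-base number of overlapping fragments on one chromosome.

    `fragments` are 1-based, closed (start, end) intervals; depth[i] is the depth at
    position i + 1. Parts of a fragment outside 1..chrom_length are ignored.
    """
    diff = [0] * (chrom_length + 1)  # difference array: +1 at start, -1 just after end
    for start, end in fragments:
        s = max(start, 1) - 1        # 0-based inclusive start
        e = min(end, chrom_length)   # 0-based exclusive end
        if s < e:
            diff[s] += 1
            diff[e] -= 1
    depth, running = [], 0
    for i in range(chrom_length):
        running += diff[i]
        depth.append(running)
    return depth


def bin_mean(depth, bin_size=50):
    """Mean depth in consecutive, non-overlapping bins (assumed summary statistic)."""
    means = []
    for i in range(0, len(depth), bin_size):
        window = depth[i:i + bin_size]
        means.append(sum(window) / len(window))
    return means


if __name__ == "__main__":
    frags = [(1, 100), (51, 150)]
    print(bin_mean(coverage(frags, 200), 50))  # [1.0, 2.0, 1.0, 0.0]
```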
To estimate the genomic GC distribution, we use the nucleotide composition of fragments sampled at regular intervals across the entire genome. For example, to sample 200-bp fragments at 50-bp intervals:
beads sampleGenomeGC ref_genome.2bit -length 200 -step 50 > genome_fragments.plusGC.gff
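The sampling step can be sketched as a sliding window that records the GC fraction of a 200-bp fragment every 50 bp. This sketch uses a hypothetical function, `sample_gc`, and works on a plain Python string rather than a .2bit file. It ignores N and other ambiguity codes when computing the fraction. That is an assumed way of handling them, not necessarily what BEADS does.

```python
def sample_gc(seq, length=200, step=50):
    """Yield (start, end, gc_fraction) for windows of `length` bp every `step` bp.

    Coordinates are 1-based and closed. The GC fraction is computed over A/C/G/T only
    (assumption: N and other ambiguity codes are ignored); windows with no A/C/G/T
    bases are skipped. Lower-case (soft-masked) bases are counted like upper-case.
    """
    seq = seq.upper()
    for i in range(0, len(seq) - length + 1, step):
        window = seq[i:i + length]
        gc = window.count("G") + window.count("C")
        acgt = gc + window.count("A") + window.count("T")
        if acgt:
            yield i + 1, i + length, gc / acgt


if __name__ == "__main__":
    toy = "GC" * 100 + "AT" * 100  # 400 bp: GC-only half, then AT-only half
    for frag in sample_gc(toy, length=200, step=50):
        print(frag)
```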
After sampling fragments from the genome, we also need to make sure that all sampled fragments are mappable, so that they represent only the parts of the genome that are accessible to the mapping procedure. The mapped-read file produced while constructing the mappability track contains all mappable reads for each chromosome (see above, Step 2). We can use it to keep only those fragments whose start positions overlap with mappable reads, one chromosome at a time:
beads selectMappable genome_fragments.plusGC.gff mappable_reads.chr1.gff > genome_mappable_fragments.plusGC.chr1.gff
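The selection rule keeps a fragment only if its start position falls inside some mapped read. It can be sketched with merged intervals and a binary search. `merge_intervals` and `select_mappable` are hypothetical helpers that work on 1-based, closed intervals for a single chromosome. They illustrate the rule described above and are not meant to reproduce `beads selectMappable` exactly.

```python
from bisect import bisect_right


def merge_intervals(intervals):
    """Merge overlapping or directly adjacent 1-based, closed intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def select_mappable(fragments, reads):
    """Keep fragments (start, end, ...) whose start lies inside at least one read."""
    merged = merge_intervals(reads)
    starts = [s for s, _ in merged]
    kept = []
    for frag in fragments:
        pos = frag[0]
        i = bisect_right(starts, pos) - 1  # last merged interval starting at or before pos
        if i >= 0 and merged[i][1] >= pos:
            kept.append(frag)
    return kept


if __name__ == "__main__":
    reads = [(100, 134), (120, 154), (300, 334)]
    frags = [(90, 289, 0.4), (101, 300, 0.5), (160, 359, 0.6), (334, 533, 0.3)]
    print(select_mappable(frags, reads))
```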
Copyright © 2010 Nicole Cheung (The Gurdon Institute, University of Cambridge)