University of Cambridge The Gurdon Institute

BEADS: Bias Elimination Algorithm for Deep Sequencing

BEADS Manual

Before running BEADS for normalization, users should have already mapped sequenced reads onto the reference genome and identified enrichment regions using software of their choice.

BEADS works primarily with files in GFF format. Note that the chromosome name in the first column must not start with "chr" [sample file]. You can strip a leading "chr" from your file with, for example, the Unix utility sed by typing the following in a terminal:

sed -e 's/^chr//g' in.with_leading_chr.gff > out.without_leading_chr.gff
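
As a quick sanity check, the sed command can be tried on a small made-up GFF file before it is run on real data. The file names and the two records below are invented for illustration only.

```shell
# Build a toy GFF file (tab-separated, nine columns) with "chr"-prefixed names.
tmpdir=$(mktemp -d)
printf 'chrI\tsample\tread\t100\t135\t.\t+\t.\t.\n'  > "$tmpdir/toy.gff"
printf 'chrX\tsample\tread\t200\t235\t.\t-\t.\t.\n' >> "$tmpdir/toy.gff"

# Strip the leading "chr" from the start of every line (same command as above).
sed -e 's/^chr//g' "$tmpdir/toy.gff" > "$tmpdir/toy.nochr.gff"

# List the distinct chromosome names that remain in the first column.
chrom_names=$(cut -f1 "$tmpdir/toy.nochr.gff" | sort -u | paste -s -d ' ' -)
echo "$chrom_names"
```

If any name printed here still starts with "chr", the file is not yet ready for BEADS.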


Key Commands


extend
beads extend [-threePrime INT] [-fivePrime INT] in.reads.gff > out.reads.ext.gff
Extend mapped reads at the three-prime or five-prime end. in.reads.gff is a mapped read file in GFF format. [sample input ; sample output]
Options:
-threePrime INT   Number of base pairs to extend from the three-prime end. [0]
-fivePrime INT   Number of base pairs to extend from the five-prime end. [0]

getGC
beads getGC in.ref_genome.2bit in.reads.gff > out.reads.plusGC.gff
Get the GC count for each read. in.ref_genome.2bit is the sequence file of the reference genome in 2bit format. [sample output]

mask
beads mask in.locations.gff in.reads.gff > out.reads.masked.gff
Retain only reads that do not overlap any of the specified locations.

gcHist
beads gcHist in.reads.plusGC.gff > out.reads.gcHist.dat
Construct the GC distribution from the GC information in the input file, which is the output of the getGC command. To construct the GC distribution from the noise elements only, first use the mask command to remove all reads that overlap enrichment regions. [sample output]

gcWeigh
beads gcWeigh in.reads.plusGC.gff in.reads.gcHist.dat in.genome.gcHist.dat > out.reads.gcw.gff
Weight reads using the GC distributions of the sequenced data and of the genome. [sample output]

tagCount
beads tagCount [-base INT] in.reads.gcw.gff > out.reads.gcw.binned.gff
Collect tag-count information across the genome. [sample output]
Option:
-base INT   Resolution in base pairs at which tag counts are collected. [50]

mapCorr
beads mapCorr in.mappability_track.binned.gff in.reads.gcw.binned.gff [-maxMap INT] [-C INT] > out.gcw-map.binned.gff
Correct for mappability variations. The tag counts in the two input files (in.mappability_track.binned.gff and in.reads.gcw.binned.gff) must be collected at the same positions (i.e. using the same resolution for the tagCount command). Instructions to generate a mappability track can be found here.
Options:
-maxMap INT   Maximum mappability value. Should be set to 2*fragment_length. [400]
-C INT   Mappability cutoff value. Locations with mappability lower than this value are ignored. [100]
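
The -maxMap guideline is simple arithmetic. A minimal sketch follows, assuming a fragment length of 200 bp (which reproduces the default of 400). The wrapper function name is hypothetical, and the options are placed after the input files as in the usage line above.

```shell
# Assumed average fragment length in base pairs (illustrative value).
fragment_length=200

# The manual recommends -maxMap = 2 * fragment_length.
max_map=$((2 * fragment_length))
echo "maxMap = $max_map"

# Hypothetical wrapper: run mapCorr with the derived -maxMap value.
# Arguments: mappability track, GC-weighted binned reads, output file.
run_map_corr() {
    beads mapCorr "$1" "$2" -maxMap "$max_map" > "$3"
}
```

For a library with a different fragment length, only fragment_length needs to change.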

divide
beads divide in.reads.gcw-map.binned.gff in.control.gcw-map.binned.gff [-log2] [-F FLOAT] [-C FLOAT] > out.reads.gcw-map-div.binned.gff
Divide sequenced data by control input data. Both input files should be GC and mappability corrected. To use several sets of control data together as a master control, each control data set must be GC and mappability corrected separately. Several tag-count tracks can then be summed into one track using the sumTagCounts command. The tag counts in all files must be collected at the same positions.
Options:
-log2   Transform the fold change to log-2 scale.
-F FLOAT   Scaling factor to apply to the treatment (in.reads.gcw-map.binned.gff). The default is the control-to-treatment ratio of total tag counts, calculated empirically from the two input files.
-C FLOAT   Input tag-count cutoff value. Locations with input control tag counts lower than this value are ignored. [0]
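
One way the key commands might be chained for a single sample is sketched below. The order of the steps is inferred from the input and output file names in the usage lines and is an assumption, as are the function name, the working file names, and the idea that the genome GC histogram is supplied ready-made. All options are left at their defaults except the three-prime extension, which is passed in by the caller.

```shell
# Hypothetical helper: GC- and mappability-correct one sample.
# Arguments:
#   $1 mapped reads (GFF)          $2 reference genome (2bit)
#   $3 enrichment regions (GFF)    $4 binned mappability track (GFF)
#   $5 genome GC histogram (.dat)  $6 output prefix
#   $7 number of base pairs to extend at the three-prime end
beads_correct_sample() {
    local reads="$1" genome="$2" regions="$3" mappability="$4"
    local genome_hist="$5" out="$6" ext="$7"

    # 1. Extend reads, then annotate each read with its GC count.
    beads extend -threePrime "$ext" "$reads" > "$out.ext.gff" || return 1
    beads getGC "$genome" "$out.ext.gff" > "$out.plusGC.gff" || return 1

    # 2. Build the sample GC histogram from noise only: mask reads that
    #    overlap enrichment regions before calling gcHist.
    beads mask "$regions" "$out.plusGC.gff" > "$out.masked.gff" || return 1
    beads gcHist "$out.masked.gff" > "$out.gcHist.dat" || return 1

    # 3. Weight reads by GC content, then bin at the default resolution.
    beads gcWeigh "$out.plusGC.gff" "$out.gcHist.dat" "$genome_hist" \
        > "$out.gcw.gff" || return 1
    beads tagCount "$out.gcw.gff" > "$out.gcw.binned.gff" || return 1

    # 4. Correct for mappability.
    beads mapCorr "$mappability" "$out.gcw.binned.gff" \
        > "$out.gcw-map.binned.gff"
}

# Example (not run here; file names are placeholders): correct the
# treatment and the control the same way, then divide one by the other.
#   beads_correct_sample chip.gff genome.2bit peaks.gff map.binned.gff genome.gcHist.dat chip 100
#   beads_correct_sample input.gff genome.2bit peaks.gff map.binned.gff genome.gcHist.dat input 100
#   beads divide chip.gcw-map.binned.gff input.gcw-map.binned.gff -log2 > chip.gcw-map-div.binned.gff
```

Keeping every intermediate file makes it easy to inspect each stage, at the cost of disk space.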

Other Commands


sumTagCounts
beads sumTagCounts in.1.binned.gff in.2.binned.gff [in.3.binned.gff ...] [-weight] > out.summed.binned.gff
Add up the tag-count values at each position across multiple input files. Tag counts must be collected at the same positions in all input files.
Option:
-weight   Scale the tag counts in each input file by its total tag count relative to that of the first input file.
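
Combining several controls into a master control, as described under divide, could look like the following sketch. It assumes each control track has already been GC and mappability corrected and binned at the same positions as the treatment. The function and output file names are hypothetical.

```shell
# Hypothetical helper: sum corrected control tracks, then divide the
# treatment track by the summed control on a log-2 scale.
# Arguments: treatment track, output prefix, then two or more control tracks.
beads_divide_by_master_control() {
    local treatment="$1" out="$2"
    shift 2
    if [ "$#" -lt 2 ]; then
        echo "need at least two control tracks" >&2
        return 1
    fi
    beads sumTagCounts "$@" > "$out.master_control.binned.gff" || return 1
    beads divide "$treatment" "$out.master_control.binned.gff" -log2 \
        > "$out.gcw-map-div.binned.gff"
}
```

The -weight option is left out here; whether to scale each control by its total tag count is a choice for the user.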

sampleGenomeGC
beads sampleGenomeGC in.ref_genome.2bit [-length INT] [-step INT] > out.fragments.plusGC.gff
Sample fragments with GC information from the reference genome at regular intervals. in.ref_genome.2bit is the sequence file of the reference genome in 2bit format.
Options:
-length INT   Length in base pairs of the fragments to generate. [200]
-step INT   Distance in base pairs between two fragments. [50]
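
The output of sampleGenomeGC has the same plusGC form that gcHist reads, so one plausible way to obtain the genome histogram expected by gcWeigh is sketched below. That this is the intended route is an assumption based on the file names in the usage lines; the function and output file names are hypothetical.

```shell
# Hypothetical helper: build a genome-wide GC histogram.
# Arguments: reference genome (2bit), fragment length in bp, output prefix.
beads_genome_gc_hist() {
    local genome="$1" length="$2" out="$3"
    # Sample fragments of the given length at the default step.
    beads sampleGenomeGC "$genome" -length "$length" \
        > "$out.fragments.plusGC.gff" || return 1
    # Summarise their GC counts as a histogram.
    beads gcHist "$out.fragments.plusGC.gff" > "$out.genome.gcHist.dat"
}
```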

shred
beads shred in.ref_genome.fa [-length INT] [-step INT] > out.fragments.fastq
Shred the genome into fragments and output them in FASTQ format.
Options:
-length INT   Length in base pairs of the fragments to generate. [35]
-step INT   Distance in base pairs between two fragments. [1]

selectMappable
beads selectMappable in.fragments.gff in.mappable_reads.chr1.gff > out.mappable_fragments.chr1.gff
Retain only fragments whose start locations overlap a mappable read. This program processes only one chromosome at a time: fragments on any chromosome other than the one in the mappable read file are ignored. [sample inputs: fragment file ; mappable read file]
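
Because selectMappable handles one chromosome at a time, a whole-genome file of mappable reads has to be split first. A minimal sketch using awk follows, assuming a tab-separated GFF file with the chromosome name in column 1. The function names and output file names are hypothetical.

```shell
# Split a GFF file into one file per chromosome: <prefix>.<chrom>.gff.
# Comment lines starting with "#" and empty lines are skipped.
split_gff_by_chrom() {
    awk -F '\t' -v prefix="$2" '
        /^#/ || NF == 0 { next }
        { print > (prefix "." $1 ".gff") }
    ' "$1"
}

# Hypothetical driver: run selectMappable once per chromosome file.
# Arguments: fragment file, whole-genome mappable reads, output prefix.
select_mappable_all() {
    local fragments="$1" mappable="$2" out="$3" f chrom
    split_gff_by_chrom "$mappable" "$out.mappable" || return 1
    for f in "$out".mappable.*.gff; do
        # Recover the chromosome name from the per-chromosome file name.
        chrom=${f#"$out".mappable.}
        chrom=${chrom%.gff}
        beads selectMappable "$fragments" "$f" \
            > "$out.mappable_fragments.$chrom.gff" || return 1
    done
}
```

The per-chromosome outputs can be concatenated afterwards if a single file is needed.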

bed2gff
beads bed2gff in.bed > out.gff
Convert BED to GFF format.

gff2bedGraph
beads gff2bedGraph in.gff > out.bedGraph
Convert GFF to BedGraph format. Note that some genome browsers, such as Affymetrix IGB, can display BedGraph files but require the file extension to be '.wig' instead of '.bedGraph'.

gff2wig
beads gff2wig in.gff > out.wig
Convert GFF to WIG format.

sam2gff
beads sam2gff [-C INT] in.sam > out.gff
Convert SAM to GFF format.
Option:
-C INT   Mapping quality cutoff. [10]


2bit sequence file

2bit is a file format that stores DNA sequences and supports efficient random access. A FASTA file can be converted to a 2bit file using the faToTwoBit program from the BLAT suite.
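
A minimal sketch of the conversion follows, assuming faToTwoBit is installed and on the PATH. The wrapper function name is hypothetical; faToTwoBit itself takes the input FASTA file followed by the output 2bit file.

```shell
# Hypothetical helper: convert a FASTA genome to 2bit for use with
# getGC and sampleGenomeGC.
# Arguments: input FASTA file, output 2bit file.
fasta_to_2bit() {
    if ! command -v faToTwoBit >/dev/null 2>&1; then
        echo "faToTwoBit not found on PATH" >&2
        return 1
    fi
    faToTwoBit "$1" "$2"
}
```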

To download sequences of common genomes in 2bit format, click here.


Last updated 29 Mar 2011 by Nicole Cheung.

Copyright © 2010 Nicole Cheung (The Gurdon Institute, University of Cambridge)