Before running BEADS for normalization, users should have already mapped sequenced reads onto the reference genome and identified enrichment regions using software of their choice.
BEADS primarily works with files in GFF format. Please note that the chromosome name in the first column should not start with "chr" [sample file]. You can remove a leading "chr" from your file with, for example, the Unix utility sed by typing the following in a terminal:
sed -e 's/^chr//g' in.with_leading_chr.gff > out.without_leading_chr.gff
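As a quick sanity check, the sketch below runs the same sed command on a small made-up GFF file and counts the lines that still start with "chr". The records and the temporary directory are illustrative assumptions, not part of BEADS.

```shell
# Illustrative records only; real input would be your own mapped-read GFF file.
workdir=$(mktemp -d)
printf 'chrI\tsrc\tread\t100\t135\t.\t+\t.\t.\n'  > "$workdir/in.with_leading_chr.gff"
printf 'chrX\tsrc\tread\t200\t235\t.\t-\t.\t.\n' >> "$workdir/in.with_leading_chr.gff"

# Same sed command as above, applied to the sample file.
sed -e 's/^chr//g' "$workdir/in.with_leading_chr.gff" > "$workdir/out.without_leading_chr.gff"

# Count lines that still begin with "chr"; 0 means the first column is clean.
remaining=$(grep -c '^chr' "$workdir/out.without_leading_chr.gff")
echo "lines still starting with chr: $remaining"
```

Header or comment lines that begin with "#" are left untouched by this substitution, since it only matches "chr" at the very start of a line.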
extend | beads extend [-threePrime INT] [-fivePrime INT] in.reads.gff > out.reads.ext.gff | |
Extend mapped reads at the three-prime or five-prime end. in.reads.gff is a mapped read file in GFF format. [sample input ; sample output] | ||
Options: | ||
-threePrime INT | Number of basepairs to extend from the three-prime end. [0] | |
-fivePrime INT | Number of basepairs to extend from the five-prime end. [0] | |
getGC | beads getGC in.ref_genome.2bit in.reads.gff > out.reads.plusGC.gff | |
Get GC-count for each read. in.ref_genome.2bit is the sequence file of the reference genome in 2bit format. [sample output] | ||
mask | beads mask in.locations.gff in.reads.gff > out.reads.masked.gff | |
Retain only reads that are not overlapping with any of the specified locations. | ||
gcHist | beads gcHist in.reads.plusGC.gff > out.reads.gcHist.dat | |
Construct the GC distribution using the GC information in the input file. The input file is the output of the getGC command. If you want to construct the GC distribution using only the noise elements, you will first need to mask out all reads that overlap with enrichment regions using the mask command. [sample output] | ||
gcWeigh | beads gcWeigh in.reads.plusGC.gff in.reads.gcHist.dat in.genome.gcHist.dat > out.reads.gcw.gff | |
Weigh reads using the GC distributions of the sequenced data and the genome. [sample output] | ||
tagCount | beads tagCount [-base INT] in.reads.gcw.gff > out.reads.gcw.binned.gff | |
Collect tag-count information across the genome. [sample output] | ||
Option: | ||
-base INT | Resolution in basepair at which tag counts are collected. [50] | |
mapCorr | beads mapCorr in.mappability_track.binned.gff in.reads.gcw.binned.gff [-maxMap INT] [-C INT] > out.gcw-map.binned.gff | |
Correct for mappability variations. Tag counts of the two input files (in.mappability_track.binned.gff and in.reads.gcw.binned.gff) must be collected at the same positions (i.e. using the same resolution for the tagCount command). Instructions to generate a mappability track can be found here. | ||
Options: | ||
-maxMap INT | Maximum mappability value. Should be set to 2*fragment_length. [400] | |
-C INT | Mappability cutoff value. Locations with mappability lower than this value will be ignored. [100] | |
divide | beads divide in.reads.gcw-map.binned.gff in.control.gcw-map.binned.gff [-log2] [-F FLOAT] [-C FLOAT] > out.reads.gcw-map-div.binned.gff | |
Divide sequenced data by control input data. Both input files should be GC and mappability corrected. If you wish to combine several sets of control data into a master control, each control data set must be GC and mappability corrected separately. The resulting tag-count tracks can then be summed into one track using the sumTagCounts command. Tag counts of all files must be collected at the same positions. | ||
Options: | ||
-log2 | Log transform fold change into log-2 scale. | |
-F FLOAT | Force the scaling factor applied to the treatment (in.reads.gcw-map.binned.gff). The default value is the control-to-treatment total tag-count ratio, calculated empirically from the two input files. | |
-C FLOAT | Input tag-count cutoff value. Locations with input control tag counts lower than this value will be ignored. [0] |
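To make the default scaling factor concrete, here is a minimal sketch that computes a control-to-treatment total tag-count ratio with awk. It assumes, without confirmation from BEADS itself, that the tag count of each bin sits in the GFF score column (column 6). The two tracks are made-up data, and the exact computation inside BEADS may differ.

```shell
# Made-up binned tracks; column 6 (GFF score) is ASSUMED to hold the tag count.
workdir=$(mktemp -d)
printf 'I\tbeads\tbin\t1\t50\t4\t.\t.\t.\nI\tbeads\tbin\t51\t100\t6\t.\t.\t.\n' \
    > "$workdir/treatment.binned.gff"
printf 'I\tbeads\tbin\t1\t50\t10\t.\t.\t.\nI\tbeads\tbin\t51\t100\t10\t.\t.\t.\n' \
    > "$workdir/control.binned.gff"

# First file: sum treatment counts. Second file: sum control counts.
# The ratio control/treatment is the kind of value that could be passed to -F.
scale=$(awk -F'\t' '
    NR == FNR { treat += $6; next }
              { ctrl  += $6 }
    END       { printf "%.3f", ctrl / treat }
' "$workdir/treatment.binned.gff" "$workdir/control.binned.gff")

echo "control-to-treatment ratio: $scale"
```

Computing the ratio by hand like this is mainly useful for checking what a forced -F value would mean relative to the empirical default.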
sumTagCounts | beads sumTagCounts in.1.binned.gff in.2.binned.gff [in.3.binned.gff ...] [-weight] > out.summed.binned.gff | |
Add up tag-count values at each position of multiple input files. Tag counts must be collected at the same positions in all input files. | ||
Option: | ||
-weight | Scale tag counts in each input file by its total tag counts relative to that of the first input file. | |
sampleGenomeGC | beads sampleGenomeGC in.ref_genome.2bit [-length INT] [-step INT] > out.fragments.plusGC.gff | |
Sample fragments with GC information from the reference genome at regular intervals. in.ref_genome.2bit is the sequence file of the reference genome in 2bit format. | ||
Options: | ||
-length INT | Length of fragments to generate in basepair. [200] | |
-step INT | Distance between two fragments in basepair. [50] | |
shred | beads shred in.ref_genome.fa [-length INT] [-step INT] > out.fragments.fastq | |
Shred the genome into fragments and output in FASTQ format. | ||
Options: | ||
-length INT | Length of fragments to generate in basepair. [35] | |
-step INT | Distance between two fragments in basepair. [1] | |
selectMappable | beads selectMappable in.fragments.gff in.mappable_reads.chr1.gff > out.mappable_fragments.chr1.gff | |
Retain only fragments whose start locations overlap with any mappable read. This program processes only one chromosome at a time. Fragments that belong to any chromosome other than the one specified in the mappable read file will be ignored. [sample inputs: fragment file ; mappable read file] | ||
bed2gff | beads bed2gff in.bed > out.gff | |
Convert BED to GFF format. | ||
gff2bedGraph | beads gff2bedGraph in.gff > out.bedGraph | |
Convert GFF to BedGraph format. Note that some genome browsers, such as Affymetrix IGB, can display BedGraph files but require the file extension to be '.wig' instead of '.bedGraph'. | ||
gff2wig | beads gff2wig in.gff > out.wig | |
Convert GFF to WIG format. | ||
sam2gff | beads sam2gff [-C INT] in.sam > out.gff | |
Convert SAM to GFF format. | ||
Options: | ||
-C INT | Mapping quality cutoff. [10] |
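The commands above can be chained into one normalization run. The script below is a sketch of one plausible ordering, not an official BEADS workflow. The file prefixes (reads, control, genome), the fragment length of 200 bp, and the helper functions step and normalize_sample are all assumptions made for illustration. It also assumes that the output of sampleGenomeGC can be fed to gcHist to obtain the genome GC distribution. By default it only prints the commands; set DRY_RUN=0 to execute them.

```shell
# Print commands by default; set DRY_RUN=0 to actually run beads.
DRY_RUN=${DRY_RUN:-1}

# Assumed example fragment length; -maxMap should be 2*fragment_length.
fragment_length=200
max_map=$((2 * fragment_length))

# Run one step, or only print it in dry-run mode.
# Usage: step OUTPUT_FILE COMMAND ARGS...
step() {
    out=$1; shift
    if [ "$DRY_RUN" = 1 ]; then
        echo "$* > $out"
    else
        "$@" > "$out"
    fi
}

# GC- and mappability-correct one data set. $1 is a hypothetical file prefix.
normalize_sample() {
    s=$1
    step "$s.plusGC.gff"         beads getGC in.ref_genome.2bit "$s.gff" &&
    step "$s.gcHist.dat"         beads gcHist "$s.plusGC.gff" &&
    step "$s.gcw.gff"            beads gcWeigh "$s.plusGC.gff" "$s.gcHist.dat" genome.gcHist.dat &&
    step "$s.gcw.binned.gff"     beads tagCount "$s.gcw.gff" &&
    step "$s.gcw-map.binned.gff" beads mapCorr in.mappability_track.binned.gff \
                                 "$s.gcw.binned.gff" -maxMap "$max_map"
}

# Genome GC distribution (assumes gcHist accepts sampleGenomeGC output).
step genome.plusGC.gff beads sampleGenomeGC in.ref_genome.2bit
step genome.gcHist.dat beads gcHist genome.plusGC.gff

# Correct treatment and control separately, then divide.
normalize_sample reads
normalize_sample control
step reads.gcw-map-div.binned.gff beads divide reads.gcw-map.binned.gff control.gcw-map.binned.gff
```

Optional steps such as extend, or mask before gcHist to restrict the GC distribution to noise elements, are left out here and would slot in before getGC and gcHist respectively.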
2bit is a file format that stores DNA sequences and supports efficient random access. A FASTA file can be converted to 2bit format using the faToTwoBit program from the BLAT suite.
To download sequences of common genomes in 2bit format, click here.
Copyright © 2010 Nicole Cheung (The Gurdon Institute, University of Cambridge)