How to install the wrapper scripts:
# Create a folder for scripts.
mkdir -p ~/bin
# Install the wrappers for the EMBOSS aligners.
curl http://data.biostarhandbook.com/align/global-align.sh > ~/bin/global-align.sh
curl http://data.biostarhandbook.com/align/local-align.sh > ~/bin/local-align.sh
# Make the scripts executable.
chmod +x ~/bin/*-align.sh
Usage:
global-align.sh THISLINE ISALIGNED
or on files:
global-align.sh file1.fa file2.fa
in both cases you may pass additional parameters like:
global-align.sh THISLINE ISALIGNED -gapopen 9 -gapextend 5
Tip: Replace the tool needle
with the tool named stretcher
for a less rigorous but better performing algorithm.
Usage:
local-align.sh THISLINE ISALIGNED
or on files:
local-align.sh file1.fa file2.fa
Tip: Replace the tool water
with the tool named matcher
for a less rigorous but better performing algorithm.
Generates perfect coverage data:
# Get the script
curl -O http://data.biostarhandbook.com/align/perfect_coverage.py
# Usage:
cat genome.fa | python perfect_coverage.py
Produces the files R1.fq
and R2.fq
Sweeps through the bowtie2 parameters space and attempts to evaluate the parameters' effect on alignment rates.
Outputs the number of concordant, discordant and one time alignments.
Based on a script by Fan Song submitted for an assignment. Fan Song writes:
The parameters that I think will influence the results the most are -D, -R, -N, -L, and -i."
The range for these parameters are
D: [5, 15]
, give up extending after R: [1, 3]
, for reads w/ repetitive seeds, try N: [0, 1]
, max number of mismatches in seed alignment; can be 0 or 1 (0)L: [15, 25]
, length of seed substrings; must be >3, <32 (22)i1: [0, 1]
, interval between seed substrings w/r/t read len (S,1,1.15)i2: [0.5, 2.5]
with a increment of 0.25for a total of `10365`` combinations:
Get the script
curl -O http://data.biostarhandbook.com/align/bowtie2-parameter-sweep.sh
Usage:
bash bowtie2-parameter-sweep.sh
or to set an error rate of 10%
:
bash bowtie2-parameter-sweep.sh 0.1
This script simulates a sequencing run from an accession number.
You cam mutate the genome genome by hand - or as it is in the script. Then run this tool generate alignments and visualize these alignments in IGV.
A typical run will be:
*** Obtain data for AF086833 and save it as refs/REFERENCE.fa.
*** Index refs/REFERENCE.fa with several tools.
*** Mutating refs/REFERENCE.fa to refs/GENOME.fa
*** Generating simulated reads from: refs/GENOME.fa
*** Aligning the reads into: results.bam