Installation

How to install the wrapper scripts:

# Create a folder for scripts.
mkdir -p ~/bin

# Install the wrappers for the EMBOSS aligners.
curl http://data.biostarhandbook.com/align/global-align.sh > ~/bin/global-align.sh
curl http://data.biostarhandbook.com/align/local-align.sh > ~/bin/local-align.sh

# Make the scripts executable.
chmod +x ~/bin/*-align.sh
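
The usage examples below call the wrappers by bare name, which only works if ~/bin is on your PATH. A minimal sketch of how that could be checked and fixed for the current session, assuming a POSIX-style shell (the ~/.bashrc line in the comment is an assumption about your setup, not part of the original instructions):

```shell
# Add ~/bin to PATH for the current session if it is not already there.
case ":$PATH:" in
    *":$HOME/bin:"*) ;;                      # already on PATH, nothing to do
    *) export PATH="$HOME/bin:$PATH" ;;
esac

# To make the change permanent, append the export line to your shell
# startup file, for example ~/.bashrc (assumption: bash is your shell):
# echo 'export PATH="$HOME/bin:$PATH"' >> ~/.bashrc
```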

global-align.sh

Usage:

global-align.sh THISLINE ISALIGNED

or on files:

global-align.sh file1.fa file2.fa

In both cases you may pass additional parameters, for example:

global-align.sh THISLINE ISALIGNED -gapopen 9 -gapextend 5

Tip: Replace the tool needle with the tool named stretcher inside the script for a less rigorous but faster algorithm.
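
One way the swap in the tip could be done without editing the original wrapper is to write a modified copy. This is only a sketch: swap_tool and the output name global-stretcher.sh are hypothetical, and it assumes the wrapper calls the tool by the literal name needle and that stretcher accepts the same options the wrapper passes.

```shell
# swap_tool OLD NEW SRC DST
# Copy SRC to DST, replacing every occurrence of OLD with NEW,
# and make the copy executable.
swap_tool() {
    sed "s/$1/$2/g" "$3" > "$4" && chmod +x "$4"
}

# Only attempt the swap if the wrapper is installed in ~/bin.
if [ -f "$HOME/bin/global-align.sh" ]; then
    swap_tool needle stretcher "$HOME/bin/global-align.sh" "$HOME/bin/global-stretcher.sh"
fi
```

Inspect the generated copy before relying on it, since a plain text substitution also changes the word wherever else it appears, for example in comments.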

local-align.sh

Usage:

local-align.sh THISLINE ISALIGNED

or on files:

local-align.sh file1.fa file2.fa

Tip: Replace the tool water with the tool named matcher inside the script for a less rigorous but faster algorithm.

perfect_coverage.py

Generates perfect coverage data:

# Get the script
curl -O http://data.biostarhandbook.com/align/perfect_coverage.py

# Usage:
cat genome.fa | python perfect_coverage.py

Produces the files R1.fq and R2.fq.
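
A quick sanity check on the result is to confirm that both files hold the same number of reads. The sketch below assumes the usual four-line FASTQ records; count_reads is a hypothetical helper and not part of perfect_coverage.py.

```shell
# count_reads FILE: print the number of reads in a FASTQ file
# (assumes four lines per record, i.e. sequences are not wrapped).
count_reads() {
    awk 'END { print int(NR / 4) }' "$1"
}

# Report the read counts if both files exist in the current directory.
if [ -f R1.fq ] && [ -f R2.fq ]; then
    echo "R1.fq: $(count_reads R1.fq) reads"
    echo "R2.fq: $(count_reads R2.fq) reads"
fi
```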

bowtie2-parameter-sweep.sh

Sweeps through the bowtie2 parameter space and attempts to evaluate each parameter's effect on alignment rates.

Outputs the number of concordant, discordant, and one-time alignments.

Based on a script by Fan Song submitted for an assignment. Fan Song writes:

"The parameters that I think will influence the results the most are -D, -R, -N, -L, and -i."

The ranges for these parameters are:

D: [5, 15], give up extending after <int> failed extends in a row (15)
R: [1, 3], for reads w/ repetitive seeds, try <int> sets of seeds (2)
N: [0, 1], max number of mismatches in seed alignment; can be 0 or 1 (0)
L: [15, 25], length of seed substrings; must be >3, <32 (22)
i1: [0, 1], interval between seed substrings w/r/t read len (S,1,1.15)
i2: [0.5, 2.5] with an increment of 0.25

for a total of 10365 combinations.
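
To illustrate what such a sweep looks like, here is a minimal sketch that enumerates a grid with nested loops and writes one bowtie2 command line per combination. It is not the downloaded script: the index name refs/REFERENCE.fa, the read files R1.fq and R2.fq, and the output file commands.txt are assumptions. Because no step is stated for i1, the sketch keeps the constant term of -i fixed at 1 and varies only the coefficient, so the count it prints need not equal the total quoted above.

```shell
# Sketch only: enumerate a grid of bowtie2 settings and print one
# command line per combination instead of running bowtie2.
export LC_ALL=C   # make seq print decimals with a dot

count=0
for D in $(seq 5 15); do
  for R in $(seq 1 3); do
    for N in 0 1; do
      for L in $(seq 15 25); do
        for I in $(seq 0.5 0.25 2.5); do
          count=$((count + 1))
          # Index and read file names are placeholders (assumptions).
          echo bowtie2 -D "$D" -R "$R" -N "$N" -L "$L" -i "S,1,$I" \
               -x refs/REFERENCE.fa -1 R1.fq -2 R2.fq
        done
      done
    done
  done
done > commands.txt

echo "Generated $count bowtie2 command lines in commands.txt"
```

Writing the commands to a file first makes it easy to inspect the grid, or to run only a subset, before committing to the full sweep.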

Get the script:

curl -O http://data.biostarhandbook.com/align/bowtie2-parameter-sweep.sh

Usage:

bash bowtie2-parameter-sweep.sh

or to set an error rate of 10%:

bash bowtie2-parameter-sweep.sh 0.1

simulate-experimental-data.sh

This script simulates a sequencing run from an accession number.

You can mutate the genome by hand, or let the script do it as written. Then run this tool to generate alignments and visualize them in IGV.

A typical run prints:

*** Obtain data for AF086833 and save it as refs/REFERENCE.fa.
*** Index refs/REFERENCE.fa with several tools.
*** Mutating refs/REFERENCE.fa to refs/GENOME.fa
*** Generating simulated reads from: refs/GENOME.fa
*** Aligning the reads into: results.bam