Installation

How to install the wrapper scripts:

# Create a folder for scripts.
mkdir -p ~/bin

# Install the wrappers for the EMBOSS aligners.
curl http://data.biostarhandbook.com/align/global-align.sh > ~/bin/global-align.sh
curl http://data.biostarhandbook.com/align/local-align.sh > ~/bin/local-align.sh

# Make the scripts executable.
chmod +x ~/bin/*-align.sh
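
The usage examples below call the scripts by name, which only works if ~/bin is on your PATH. A minimal sketch, assuming a POSIX-style shell, that appends the folder only when it is missing:

```shell
# Append ~/bin to PATH unless it is already listed there.
case ":$PATH:" in
    *":$HOME/bin:"*) ;;                      # already present, nothing to do
    *) export PATH="$PATH:$HOME/bin" ;;      # add it for the current session
esac
```

To make the change permanent you would typically place these lines in your shell startup file.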

global-align.sh

Usage:

global-align.sh THISLINE ISALIGNED

or on files:

global-align.sh file1.fa file2.fa

In both cases you may pass additional parameters, for example:

global-align.sh THISLINE ISALIGNED -gapopen 9 -gapextend 5

Tip: To trade some rigor for better performance, edit the script and replace the tool needle with the tool named stretcher.
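
The contents of the wrapper are not shown here, so the following is only a sketch of how such a wrapper could accept either literal sequences or files. The function names as_input and global_align are hypothetical. The sketch relies on the EMBOSS asis:: prefix for literal sequences and on the general EMBOSS qualifiers -auto and -stdout.

```shell
# Turn one argument into an EMBOSS input specification:
# an existing file is passed through unchanged, anything else
# is treated as a literal sequence via the asis:: prefix.
as_input() {
    if [ -f "$1" ]; then
        echo "$1"
    else
        echo "asis::$1"
    fi
}

# Hypothetical wrapper body (not the actual global-align.sh).
# Extra arguments such as -gapopen 9 -gapextend 5 are forwarded as-is.
global_align() {
    a=$(as_input "$1")
    b=$(as_input "$2")
    shift 2
    needle -auto -stdout -asequence "$a" -bsequence "$b" "$@"
}
```

With this layout, swapping needle for stretcher (or water for matcher in a local aligner) is a one-word change inside the function.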

local-align.sh

Usage:

local-align.sh THISLINE ISALIGNED

or on files:

local-align.sh file1.fa file2.fa

Tip: To trade some rigor for better performance, edit the script and replace the tool water with the tool named matcher.

perfect_coverage.py

Generates perfect coverage data:

# Get the script
curl -O http://data.biostarhandbook.com/align/perfect_coverage.py

# Usage:
cat genome.fa | python perfect_coverage.py

Produces the files R1.fq and R2.fq.
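
A quick sanity check on the output is to confirm that both files hold the same number of reads. A minimal sketch, assuming standard four-line FASTQ records; count_reads is a hypothetical helper, not part of the script:

```shell
# Count reads in a FASTQ file, assuming exactly four lines per record.
count_reads() {
    lines=$(wc -l < "$1")
    echo $((lines / 4))
}
```

Running count_reads R1.fq and count_reads R2.fq should print the same number for properly paired data.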

bowtie2-parameter-sweep.sh

Sweeps through the bowtie2 parameter space and attempts to evaluate each parameter's effect on alignment rates.

Outputs the number of concordant, discordant and one-time alignments.
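
How the script collects these numbers is not shown here. One possible approach, sketched under the assumption that the counts come from the alignment summary that bowtie2 prints to standard error for paired-end runs, is to pull the first field from the matching lines. The helper name summary_counts is hypothetical.

```shell
# Read a bowtie2 paired-end summary on stdin and print two numbers:
# pairs aligned concordantly exactly 1 time, and pairs aligned discordantly 1 time.
summary_counts() {
    awk 'BEGIN { c = 0; d = 0 }
         /aligned concordantly exactly 1 time/ { c = $1 }
         /aligned discordantly 1 time/         { d = $1 }
         END { print c, d }'
}
```

Since the summary goes to standard error, a run would need to redirect it, for example with 2> summary.txt, before feeding the file to summary_counts.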

Based on a script by Fan Song submitted for an assignment. Fan Song writes:

"The parameters that I think will influence the results the most are -D, -R, -N, -L, and -i."

The ranges chosen for these parameters give a total of 10365 combinations.

Get the script:

curl -O http://data.biostarhandbook.com/align/bowtie2-parameter-sweep.sh

Usage:

bash bowtie2-parameter-sweep.sh

or to set an error rate of 10%:

bash bowtie2-parameter-sweep.sh 0.1
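
The idea of a sweep can be sketched with nested loops over the five options. The values below are placeholders chosen for illustration and are not the ranges used by the script. The index name refs/GENOME and the function name sweep_commands are likewise assumptions. The sketch prints the commands instead of running them.

```shell
# Print one bowtie2 command per parameter combination (dry run).
# All values are illustrative placeholders, not the script's real ranges.
sweep_commands() {
    for D in 5 15; do
        for R in 1 2; do
            for N in 0 1; do
                for L in 20 22; do
                    for I in S,1,0.50 S,1,0.75; do
                        echo "bowtie2 -D $D -R $R -N $N -L $L -i $I -x refs/GENOME -1 R1.fq -2 R2.fq"
                    done
                done
            done
        done
    done
}
```

Calling sweep_commands lists every combination. Piping the list to a shell would execute the runs, which gets expensive quickly as the ranges grow.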

simulate-experimental-data.sh

This script simulates a sequencing run from an accession number.

You can mutate the genome by hand, or let the script do it. Then run this tool to generate alignments, and visualize them in IGV.

A typical run will be:

*** Obtain data for AF086833 and save it as refs/REFERENCE.fa.
*** Index refs/REFERENCE.fa with several tools.
*** Mutating refs/REFERENCE.fa to refs/GENOME.fa
*** Generating simulated reads from: refs/GENOME.fa
*** Aligning the reads into: results.bam
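
The stages in this log can be sketched as a small pipeline. The tool choices here (efetch, bwa, samtools, wgsim) are assumptions made for illustration and may differ from what the script actually runs. The mutation step is a plain copy that you would edit by hand. The sketch only prints the commands unless DRY_RUN is set to 0.

```shell
# Dry run by default: print each step instead of executing it.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        eval "$*"
    fi
}

ACC=AF086833
REF=refs/REFERENCE.fa
GENOME=refs/GENOME.fa

# Hypothetical outline of the stages; not the actual script.
simulate_steps() {
    run "mkdir -p refs"
    run "efetch -db nuccore -id $ACC -format fasta > $REF"   # obtain the data
    run "bwa index $REF"                                     # index for the aligner
    run "samtools faidx $REF"                                # index for IGV
    run "cp $REF $GENOME"                                    # placeholder: mutate GENOME by hand
    run "wgsim -N 10000 $GENOME R1.fq R2.fq > /dev/null"     # simulate paired reads
    run "bwa mem $REF R1.fq R2.fq | samtools sort -o results.bam"
    run "samtools index results.bam"
}
```

Calling simulate_steps prints the plan. Note that the reads are simulated from the mutated genome but aligned against the original reference, which is what makes the introduced changes visible in IGV.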