# Installation How to install the wrapper scripts: # Create a folder for scripts. mkdir -p ~/bin # Install the wrappers for the EMBOSS aligners. curl http://data.biostarhandbook.com/align/global-align.sh > ~/bin/global-align.sh curl http://data.biostarhandbook.com/align/local-align.sh > ~/bin/local-align.sh # Make the scripts executable. chmod +x ~/bin/*-align.sh # global-align.sh Usage: global-align.sh THISLINE ISALIGNED or on files: global-align.sh file1.fa file2.fa in both cases you may pass additional parameters like: global-align.sh THISLINE ISALIGNED -gapopen 9 -gapextend 5 Tip: Replace the tool `needle` with the tool named `stretcher` for a less rigorous but better performing algorithm. # local-align.sh Usage: local-align.sh THISLINE ISALIGNED or on files: local-align.sh file1.fa file2.fa Tip: Replace the tool `water` with the tool named `matcher` for a less rigorous but better performing algorithm. # perfect_coverage.py Generates perfect coverage data: # Get the script curl -O http://data.biostarhandbook.com/align/perfect_coverage.py # Usage: cat genome.fa | python perfect_coverage.py Produces the files `R1.fq` and `R2.fq` # bowtie2-parameter-sweep.sh Sweeps through the bowtie2 parameters space and attempts to evaluate the parameters' effect on alignment rates. Outputs the number of concordant, discordant and one time alignments. Based on a script by Fan Song submitted for an assignment. Fan Song writes: >The parameters that I think will influence the results the most > are -D, -R, -N, -L, and -i." The range for these parameters are * `D: [5, 15]`, give up extending after failed extends in a row (15) * `R: [1, 3]`, for reads w/ repetitive seeds, try sets of seeds (2) * `N: [0, 1]`, max number of mismatches in seed alignment; can be 0 or 1 (0) * `L: [15, 25]`, length of seed substrings; must be >3, <32 (22) * `i1: [0, 1]`, interval between seed substrings w/r/t read len (S,1,1.15) * `i2: [0.5, 2.5]` with a increment of 0.25 for a total of `10365`` combinations: Get the script curl -O http://data.biostarhandbook.com/align/bowtie2-parameter-sweep.sh Usage: bash bowtie2-parameter-sweep.sh or to set an error rate of `10%`: bash bowtie2-parameter-sweep.sh 0.1 # simulate-exprimental-data.sh This script simulates a sequencing run from an accession number. You cam mutate the genome genome by hand - or as it is in the script. Then run this tool generate alignments and visualize these alignments in IGV. A typical run will be: *** Obtain data for AF086833 and save it as refs/REFERENCE.fa. *** Index refs/REFERENCE.fa with several tools. *** Mutating refs/REFERENCE.fa to refs/GENOME.fa *** Generating simulated reads from: refs/GENOME.fa *** Aligning the reads into: results.bam