Input Files

Scripture modules take alignment BAM/SAM files as input. Any alignment software, such as TopHat or GSNAP, can be used to produce them. The BAM or SAM files must be sorted and indexed before Scripture can process them; you can use samtools or Picard tools to sort and index SAM/BAM files.
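A minimal sketch of sorting and indexing with samtools, assuming samtools 1.3 or later (where `samtools sort` takes the output file with `-o`, accepts SAM or BAM input, and picks the output format from the file extension). The helper name `sort_and_index` and the file names are illustrative, not part of Scripture. Picard's SortSam and BuildBamIndex tools can do the same job.

```bash
#!/usr/bin/env bash
# Hypothetical helper: coordinate-sort a SAM/BAM file and index the result.
# Assumes samtools >= 1.3 is on PATH.
sort_and_index() {
    local input="$1"
    local output="$2"
    # Refuse to run on a missing input file.
    if [ ! -f "$input" ]; then
        echo "sort_and_index: input '$input' not found" >&2
        return 1
    fi
    # Sort by coordinate (the default order); a .bam extension gives BAM output.
    samtools sort -o "$output" "$input" || return 1
    # Create the .bai index next to the sorted BAM.
    samtools index "$output"
}

# Example:
# sort_and_index input.sam input.sorted.bam
```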

Example commands:

Filter out reads that map to rRNA from the FASTQ files:

Build a Bowtie index of the ribosomal RNA sequences for the species of interest.

bowtie-build ribosomal_RNA.fa rRNA

Use this index to filter out all reads that map at least once to a ribosomal sequence.

bowtie -q -k 1 --best --maxins 1000 --un input_filtered_rRNA.fq rRNA -1 input.left.fq -2 input.right.fq --sam input_ribosomal_mappings.sam
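After filtering, the two mate files passed on to TopHat (input_filtered_rRNA_1.fq and input_filtered_rRNA_2.fq) should contain the same number of reads. The sketch below is one way to check this, assuming standard four-line FASTQ records with no wrapped sequence lines; the helper names `count_fastq_reads` and `check_mates_match` are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: number of reads in a FASTQ file,
# assuming every record is exactly four lines.
count_fastq_reads() {
    local lines
    lines=$(wc -l < "$1") || return 1
    echo $(( lines / 4 ))
}

# Hypothetical helper: print both counts and succeed only if they are equal.
check_mates_match() {
    local n1 n2
    n1=$(count_fastq_reads "$1") || return 1
    n2=$(count_fastq_reads "$2") || return 1
    echo "$1: $n1 reads; $2: $n2 reads"
    [ "$n1" -eq "$n2" ]
}

# Example:
# check_mates_match input_filtered_rRNA_1.fq input_filtered_rRNA_2.fq
```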

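Align the rRNA-filtered reads to the mouse genome with TopHat version 1.4.1. For paired-end input, Bowtie's --un option writes the unaligned mates to two files with _1 and _2 inserted before the extension, so the files to align are input_filtered_rRNA_1.fq and input_filtered_rRNA_2.fq.

(Optional: supply a GTF file and a transcriptome index to guide TopHat.)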

Genome version: mm9

tophat --mate-inner-dist 250 --prefer-multihits --GTF genes.gtf --max-multihits 15 --transcriptome-index genes_index --library-type fr-secondstrand --segment-length 20 --output-dir tophat_to_genome mm9.nonrandom.bowtie input_filtered_rRNA_1.fq input_filtered_rRNA_2.fq
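TopHat writes its alignments to accepted_hits.bam inside the --output-dir directory (tophat_to_genome here). Before handing this file to Scripture, it is worth confirming that it is coordinate-sorted and then indexing it. This is a hedged sketch: it assumes the sort order is recorded as SO:coordinate on the @HD header line (where the SAM specification places it), requires samtools on PATH, and uses hypothetical helper names.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads a SAM header on stdin and succeeds only if the
# @HD line declares coordinate sort order (SO:coordinate).
header_is_coordinate_sorted() {
    awk -F '\t' '
        $1 == "@HD" { for (i = 2; i <= NF; i++) if ($i == "SO:coordinate") found = 1 }
        END { exit (found ? 0 : 1) }
    '
}

# Hypothetical helper: index a TopHat BAM only if its header says it is sorted.
prepare_tophat_bam() {
    local bam="$1"
    [ -f "$bam" ] || { echo "missing $bam" >&2; return 1; }
    if samtools view -H "$bam" | header_is_coordinate_sorted; then
        samtools index "$bam"
    else
        echo "$bam is not coordinate-sorted; sort it before indexing" >&2
        return 1
    fi
}

# Example:
# prepare_tophat_bam tophat_to_genome/accepted_hits.bam
```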

Generate a paired-end BAM file (if one has not been generated beforehand, Scripture will generate it).

Scripture first generates a Pairedend.bam file from your input BAM file and then uses it in subsequent steps. If you run the reconstruction one chromosome at a time (recommended, since it is faster), create the paired-end BAM file first with the following command; otherwise, each per-chromosome run will try to create it.

java -jar CreatePairedBamFile.jar -alignment <input.bam> -strand <first|second|unstranded>
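A sketch of the per-chromosome workflow: create the paired-end BAM file once, then list the reference sequences that have mapped reads and start one reconstruction per chromosome. It assumes an indexed input BAM (so that samtools idxstats can report per-sequence counts) and that CreatePairedBamFile.jar is in the working directory. The Scripture reconstruction command itself is not given in this section, so it appears only as a commented placeholder; the helper names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads `samtools idxstats` output on stdin and prints the
# names of reference sequences with at least one mapped read, skipping the
# "*" line that idxstats uses for unplaced reads.
list_chromosomes() {
    awk -F '\t' '$1 != "*" && $3 > 0 { print $1 }'
}

# Hypothetical driver: build the paired-end BAM once, then loop over chromosomes.
run_per_chromosome() {
    local bam="$1" strand="$2"
    java -jar CreatePairedBamFile.jar -alignment "$bam" -strand "$strand" || return 1
    samtools idxstats "$bam" | list_chromosomes | while read -r chr; do
        echo "reconstructing $chr"
        # Placeholder: run the Scripture reconstruction for "$chr" here.
    done
}

# Example:
# run_per_chromosome input.bam second
```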