Scripture

Scripture Transcriptome assembly

The Scripture reconstruction module builds a transcriptome from an aligned full-length RNA-Seq library and an assembled genome. It is implemented in Java and requires Java 1.6 or higher. The module can be run on the entire genome or on one chromosome at a time. The input file formats are described in the Input preparation section.

Usage:

Entire genome:

java -jar Scripture.beta.v3.jar -task reconstruct -alignment <BAM/SAM alignment file> -genome <Path to genome FASTA file> -out <Output file name> <Options>

One chromosome at a time:

java -jar Scripture.beta.v3.jar -task reconstruct -alignment <BAM/SAM alignment file> -genome <Path to genome FASTA file> -out <Output file name> <Options> -chr <chromosome name>
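To run the one-chromosome-at-a-time form over several chromosomes, the command can be generated once per chromosome. The Java sketch below only builds and prints the argument lists. The jar, BAM, FASTA and chromosome names are hypothetical, and giving each run its own -out name is an assumption made so that runs do not overwrite each other's files.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper that builds one Scripture "reconstruct" command per chromosome.
 * File names and the per-chromosome output naming are assumptions for illustration.
 */
class ScriptureCommands {

    // Returns the argument list for a single-chromosome run.
    static List<String> reconstructCommand(String jar, String alignment, String genome,
                                           String outPrefix, String chr) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-jar");
        cmd.add(jar);
        cmd.add("-task");
        cmd.add("reconstruct");
        cmd.add("-alignment");
        cmd.add(alignment);
        cmd.add("-genome");
        cmd.add(genome);
        cmd.add("-out");
        // Assumption: one output name per chromosome so separate runs do not collide.
        cmd.add(outPrefix + "." + chr);
        cmd.add("-chr");
        cmd.add(chr);
        return cmd;
    }

    public static void main(String[] args) {
        String[] chromosomes = {"chr1", "chr2", "chrX"}; // hypothetical chromosome names
        for (String chr : chromosomes) {
            List<String> cmd = reconstructCommand("Scripture.beta.v3.jar", "sample.sorted.bam",
                    "genome.fa", "sample", chr);
            // Only prints the command; new ProcessBuilder(cmd).inheritIO().start() would run it.
            System.out.println(String.join(" ", cmd));
        }
    }
}
```

Returning a List<String> rather than a single string keeps each argument separate, which is the form java.lang.ProcessBuilder expects and avoids shell quoting problems with paths that contain spaces.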

Arguments:

-alignment <BAM/SAM file path> Path to the alignment file.
-genome <genome FASTA file path> Path to the FASTA file of the assembled genome. Note: the chromosome names in the alignment file and the genome file must match.
-out <output file name> Specifies the output name. The output files (which ones are written depends on the parameters used) use this name as a prefix.
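Mismatched chromosome names (for example chr1 in the alignment and 1 in the FASTA) are easy to miss, so a quick check before running Scripture can save a failed run. This Java sketch assumes you already have the SAM header lines (for example from samtools view -H) and the FASTA header lines as lists of strings. It takes the SN: values of @SQ header lines and treats the first whitespace-delimited word after '>' as the FASTA sequence name (a common convention, and an assumption here). The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class ChromosomeNameCheck {

    // Collects reference names from SAM header lines of the form "@SQ\tSN:<name>\tLN:<length>".
    static Set<String> samReferenceNames(List<String> samHeaderLines) {
        Set<String> names = new LinkedHashSet<>();
        for (String line : samHeaderLines) {
            if (!line.startsWith("@SQ")) continue;
            for (String field : line.split("\t")) {
                if (field.startsWith("SN:")) names.add(field.substring(3));
            }
        }
        return names;
    }

    // Collects FASTA sequence names: the first whitespace-delimited word after '>'.
    static Set<String> fastaSequenceNames(List<String> fastaLines) {
        Set<String> names = new LinkedHashSet<>();
        for (String line : fastaLines) {
            if (!line.startsWith(">")) continue;
            String[] tokens = line.substring(1).trim().split("\\s+");
            if (!tokens[0].isEmpty()) names.add(tokens[0]);
        }
        return names;
    }

    // Returns alignment reference names that have no sequence of the same name in the FASTA.
    static Set<String> missingFromGenome(List<String> samHeaderLines, List<String> fastaLines) {
        Set<String> missing = samReferenceNames(samHeaderLines);
        missing.removeAll(fastaSequenceNames(fastaLines));
        return missing;
    }

    public static void main(String[] args) {
        // Hypothetical example input.
        List<String> header = Arrays.asList("@HD\tVN:1.6\tSO:coordinate",
                "@SQ\tSN:chr1\tLN:248956422", "@SQ\tSN:chrM\tLN:16569");
        List<String> fasta = Arrays.asList(">chr1 assembled chromosome 1", "ACGTACGT", ">MT", "GATC");
        System.out.println(missingFromGenome(header, fasta)); // prints [chrM]
    }
}
```

An empty result means every reference in the alignment header has a same-named sequence in the genome FASTA; it does not check sequence lengths.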

Options:

-strand Specifies whether the library is unstranded or, if stranded, which mate is in the direction of transcription. Values: first, second, unstranded. Default is unstranded.
-coverage The percentage drop in coverage allowed. Default is 0.2 (that is, 20%).
-alpha The p-value threshold used to remove transcripts that are not significant. Default is 0.01.
-minSpliceReads The minimum number of spliced reads required to support any intron. Default is 3.
-minSplicePercent Removes any transcript with an asymmetric distribution of spliced reads across its introns: every intron of a transcript must be supported by at least this fraction of the average number of spliced reads supporting the entire transcript. Default is 0.005 (that is, 0.5%).
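One reading of the -minSpliceReads and -minSplicePercent descriptions, sketched in Java: a transcript is kept only if each of its introns has at least minSpliceReads spliced reads and at least minSplicePercent times the average spliced-read count over the transcript's introns. Taking the average as the mean over introns, and applying both checks per transcript, are assumptions; this is an illustration, not Scripture's code, and the class name is hypothetical.

```java
/**
 * Hypothetical illustration of the splice-support filters described in the option list.
 * Assumption: both thresholds are checked against every intron of a transcript.
 */
class SpliceFilter {

    static boolean passes(int[] intronSpliceCounts, int minSpliceReads, double minSplicePercent) {
        if (intronSpliceCounts.length == 0) return true; // single-exon transcript: no introns to test
        double sum = 0;
        for (int c : intronSpliceCounts) sum += c;
        double average = sum / intronSpliceCounts.length;
        for (int c : intronSpliceCounts) {
            if (c < minSpliceReads) return false;             // intron has too few spliced reads
            if (c < minSplicePercent * average) return false; // intron far below the transcript average
        }
        return true;
    }

    public static void main(String[] args) {
        // Defaults from the option list: 3 reads and a fraction of 0.005.
        System.out.println(passes(new int[]{40, 35, 50}, 3, 0.005)); // true
        System.out.println(passes(new int[]{40, 2, 50}, 3, 0.005));  // false: 2 < 3
        // Average is 1005 / 3 = 335: the 5-read intron passes at 0.005 (threshold 1.675)
        // but fails at 0.05 (threshold 16.75).
        System.out.println(passes(new int[]{500, 5, 500}, 3, 0.005)); // true
        System.out.println(passes(new int[]{500, 5, 500}, 3, 0.05));  // false
    }
}
```

The last two calls show why the fraction matters: the same lopsided transcript survives or is removed depending only on -minSplicePercent.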

Output:

Scripture writes its reconstructions to two output files:

  1. output.name.scripture.paths.bed -- The best reconstructions using the paired-end data in the input file. The score of each BED entry is a modified FPKM value.

  2. output.name.connected.bed -- Reconstructions joined using paired-end reads. A drop in coverage can split a transcript into disconnected reconstructions; when paired-end reads span the gap, those pieces can be connected.
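Both output files are BED files, so downstream filtering only needs the standard BED columns. This Java sketch assumes tab-separated lines with at least the first five standard columns (chrom, start, end, name, score) and reads the score as a decimal number, since this page describes it as a modified FPKM rather than an integer BED score. The class name, example lines and score threshold are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ScriptureBedReader {

    // Minimal view of one BED line: only the first five standard BED columns are kept.
    static final class Reconstruction {
        final String chrom;
        final long start;   // 0-based start, as in BED
        final long end;     // end coordinate (exclusive)
        final String name;
        final double score; // described on this page as a modified FPKM value

        Reconstruction(String chrom, long start, long end, String name, double score) {
            this.chrom = chrom;
            this.start = start;
            this.end = end;
            this.name = name;
            this.score = score;
        }
    }

    // Parses BED lines, skipping blank, comment, "track" and "browser" lines.
    static List<Reconstruction> parse(List<String> bedLines) {
        List<Reconstruction> result = new ArrayList<>();
        for (String line : bedLines) {
            if (line.isEmpty() || line.startsWith("#")
                    || line.startsWith("track") || line.startsWith("browser")) {
                continue;
            }
            String[] f = line.split("\t");
            if (f.length < 5) continue; // assumption: lines without a score column are ignored
            result.add(new Reconstruction(f[0], Long.parseLong(f[1]), Long.parseLong(f[2]),
                    f[3], Double.parseDouble(f[4])));
        }
        return result;
    }

    // Keeps reconstructions whose score is at least minScore.
    static List<Reconstruction> withScoreAtLeast(List<Reconstruction> all, double minScore) {
        List<Reconstruction> kept = new ArrayList<>();
        for (Reconstruction r : all) {
            if (r.score >= minScore) kept.add(r);
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical lines; for a real file, read them with
        // java.nio.file.Files.readAllLines(java.nio.file.Paths.get("output.name.scripture.paths.bed")).
        List<String> lines = Arrays.asList(
                "track name=example",
                "chr1\t11873\t14409\tgene_a\t12.5\t+",
                "chr1\t20000\t25000\tgene_b\t0.8\t-");
        // Print the name and score of each reconstruction scoring at least 1.0.
        for (Reconstruction r : withScoreAtLeast(parse(lines), 1.0)) {
            System.out.println(r.name + " " + r.score);
        }
    }
}
```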

Input preparation

Scripture modules take BAM/SAM alignment files as input. Any alignment software, for example TopHat or GSNAP, can be used to produce them. BAM or SAM files must be sorted and indexed before they can be processed; see samtools or Picard for tools to sort and index SAM/BAM files. More details are on the input page.
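Sorting itself is done with samtools or Picard, but it can help to confirm that a SAM file is coordinate-sorted before running Scripture. This Java sketch (hypothetical class name) checks that alignment lines are grouped by reference (RNAME, column 3), that positions (POS, column 4) never decrease within a reference, and that unplaced reads (RNAME *) come last. It does not check that references follow the @SQ header order, and it does not create the index, which is still required.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SamSortCheck {

    // Returns true if alignment lines are grouped by reference and ordered by
    // non-decreasing POS within each reference. Header lines (starting with '@') are skipped.
    static boolean isCoordinateSorted(List<String> samLines) {
        Set<String> finished = new HashSet<>(); // references whose block has already ended
        String currentRef = null;
        long previousPos = -1;
        for (String line : samLines) {
            if (line.isEmpty() || line.startsWith("@")) continue;
            String[] f = line.split("\t");
            if (f.length < 4) throw new IllegalArgumentException("Not a SAM alignment line: " + line);
            String ref = f[2];               // RNAME column
            long pos = Long.parseLong(f[3]); // POS column (1-based, 0 when unavailable)
            if (!ref.equals(currentRef)) {
                if ("*".equals(currentRef)) return false; // unplaced reads must come last
                if (finished.contains(ref)) return false; // reference split into two blocks
                if (currentRef != null) finished.add(currentRef);
                currentRef = ref;
            } else if (pos < previousPos) {
                return false;                             // position went backwards
            }
            previousPos = pos;
        }
        return true;
    }

    public static void main(String[] args) {
        // Hypothetical records: chr1 at 100 then chr1 at 50 is out of order.
        List<String> lines = java.util.Arrays.asList(
                "@HD\tVN:1.6\tSO:coordinate",
                "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
                "r2\t0\tchr1\t50\t60\t4M\t*\t0\t0\tACGT\tIIII");
        System.out.println(isCoordinateSorted(lines)); // prints false
    }
}
```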