*********************
* NOTE: As of 21-Aug-2012 the help needs to be rewritten. So use the following with care.
*********************

This section gives an overview of the work done for the accessions (aka samples). In general, an accession goes through each processing step only once. However, there are times when the same processing step is run multiple times with different parameters. In that case all related steps appear in one large column, with sub-columns for the differences.

For each processing step that results in a FastA or FastQ file there are three common, untitled lines at the top, optionally followed by a fourth and a fifth:

  1. Number of reads, contigs or scaffolds.
  2. Number of bases in the reads, contigs or scaffolds.
  3. The minimum and maximum length of the reads, contigs or scaffolds. If there is a middle number inside square brackets then this is the mean length. E.g., '30-[139]-150' indicates that the shortest read is 30 base pairs, the longest is 150 and the mean length is 139.
  4. If there is a fourth line that starts with 'cutoff' then the count of reads and the number of bases cover only sequences of that cutoff length or greater. Otherwise all reads are considered. A 'cutoff' is most often used with contig-creating programs such as 'ABySS' or 'Trinity', which tend to create many small and less interesting contigs.
  5. There may be a fifth line with a link to the FastA/FastQ files.
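
As an illustration of how the summary lines above relate to the sequences, here is a minimal Python sketch. It is not the code that produces this report; the function names, the rounding of the mean and the exact wording of the 'cutoff' line are assumptions made for the example.

```python
from statistics import mean


def fasta_lengths(text):
    """Yield the sequence length of each record in FastA-formatted text."""
    length = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            # A header starts a new record; emit the previous one, if any.
            if length is not None:
                yield length
            length = 0
        elif line and length is not None:
            length += len(line)
    if length is not None:
        yield length


def summarize_lengths(lengths, cutoff=None):
    """Return the summary lines for a list of sequence lengths.

    If `cutoff` is given, only sequences of that length or greater count.
    """
    lengths = list(lengths)
    if cutoff is not None:
        lengths = [n for n in lengths if n >= cutoff]
    if not lengths:
        raise ValueError("no sequences left to summarize")
    lines = [
        str(len(lengths)),                                          # 1. count
        str(sum(lengths)),                                          # 2. bases
        f"{min(lengths)}-[{round(mean(lengths))}]-{max(lengths)}",  # 3. range
    ]
    if cutoff is not None:
        # Assumed wording; the report may format this line differently.
        lines.append(f"cutoff {cutoff}")
    return lines


example = ">a\nACGT\nAC\n>b\nACG\n>c\nACGTACGTAC\n"
for line in summarize_lengths(fasta_lengths(example), cutoff=5):
    print(line)
```

With the cutoff applied, the 3-base record is left out of the count, the base total and the length range.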

For processing steps that result in a BAM output file there are four common lines:

  1. Number of reads, contigs or scaffolds [untitled].
  2. Percent of 'properly paired' [%PP] and 'singleton' [%Si] reads as given by the 'samtools flagstat' program.
  3. Percent of 'mapped' [%Map] and 'unmapped' [%Un] reads as given by the 'samtools idxstats' program.
  4. A link to the BAM file.
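
The mapped and unmapped percentages can be reproduced from 'samtools idxstats' output, which is tab-separated with one line per reference: name, sequence length, number of mapped read-segments and number of unmapped read-segments. The Python sketch below shows one plausible way to turn that into %Map and %Un. The function name is hypothetical, and the assumption that the report divides by the sum of both columns is ours, not something stated here.

```python
def idxstats_percentages(idxstats_text):
    """Return (percent_mapped, percent_unmapped) from idxstats output text."""
    mapped = 0
    unmapped = 0
    for line in idxstats_text.splitlines():
        if not line.strip():
            continue
        # Columns: reference name, reference length, mapped, unmapped.
        _name, _ref_len, n_mapped, n_unmapped = line.split("\t")
        mapped += int(n_mapped)
        unmapped += int(n_unmapped)
    total = mapped + unmapped
    if total == 0:
        return 0.0, 0.0
    return 100.0 * mapped / total, 100.0 * unmapped / total


# Made-up numbers; the final '*' line holds reads with no reference assigned.
example_idxstats = "chr1\t1000\t70\t5\nchr2\t500\t20\t0\n*\t0\t0\t5\n"
pct_map, pct_un = idxstats_percentages(example_idxstats)
print(f"%Map {pct_map:.1f}  %Un {pct_un:.1f}")
```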

Specific sections:
Unaligned: These are the raw reads from the sequencer. Unless something went amiss in the sequencing run, the length range should have the same lower and upper limits; e.g., '100-100' for a 100-base run. Because these reads come as many 'small' (~2 GB) files separated by lane and read direction, you will have to go to the Unaligned directory in each accession to get them.

Unaligned_filtered: We run Trimmomatic and/or fastx_clipper to remove adapters and to clip poor-quality bases from both the 5' and 3' ends of the Unaligned reads. Any reads below a minimum length are discarded. The Unaligned_filtered reads are the ones most often used for further processing. The filtered reads for a sample are combined into one large file per read direction. You can access these via the given link, the 'Unaligned_filtered' directories below or the per-sample directory.
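
To make the clipping and length filter concrete, here is a deliberately simplified Python sketch. It is not how Trimmomatic or fastx_clipper are implemented and it does no adapter removal; the function name, the quality threshold, the minimum length and the Phred+33 quality encoding are all assumptions chosen for illustration.

```python
def trim_read(seq, qual, min_quality=20, min_length=30, phred_offset=33):
    """Clip low-quality bases from both ends of one read.

    Returns (trimmed_seq, trimmed_qual), or None if the trimmed read is
    shorter than `min_length` and should be discarded.
    """
    scores = [ord(c) - phred_offset for c in qual]
    start = 0
    end = len(scores)
    # Walk in from the 5' end past bases below the threshold.
    while start < end and scores[start] < min_quality:
        start += 1
    # Walk in from the 3' end in the same way.
    while end > start and scores[end - 1] < min_quality:
        end -= 1
    if end - start < min_length:
        return None
    return seq[start:end], qual[start:end]


# '#' encodes quality 2 and 'I' encodes quality 40 under Phred+33.
print(trim_read("ACGTACGTAC", "##IIIIII##", min_length=5))
print(trim_read("ACGTACGTAC", "##IIIIII##", min_length=7))
```

The first call keeps the six high-quality bases in the middle of the read; the second discards the read because what remains is below the minimum length.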