* NOTE: As of 21-Aug-2012 the help needs to be rewritten. So use the following with care.
This section gives an overview of the work done for the accessions (a.k.a. samples). In general,
an accession goes through each processing step only once. However, there are times when
the same processing step is run multiple times with different parameters. In that case,
all related runs are shown in one large column with sub-columns for the differences.
For each processing step that results in a FastA or FastQ file
there are three common and untitled lines at the top:
Number of reads, contigs or scaffolds.
Number of bases in the reads, contigs or scaffolds.
The minimum and maximum length of the reads, contigs or scaffolds.
If there is a middle number inside braces, it
is the mean length. E.g., '30-{139}-150' indicates that the shortest read is 30 base pairs, the longest
150 and the mean length is 139.
If there is a fourth line that starts with 'cutoff', then the count of reads and the number of bases cover only
sequences of that cutoff length or greater. Otherwise all reads are counted. A 'cutoff' is most often used with
contig-building programs such as 'ABySS' or 'Trinity', which tend to create many small and
less interesting contigs.
There may be a fifth line with a link to the FastA/FastQ files.
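As an illustration only, the three summary lines could be reproduced from a FastA file roughly as follows. This is a minimal Python sketch, not the code the pipeline itself uses: the function names 'read_fasta_lengths' and 'summarize' are hypothetical, and the 'min-{mean}-max' layout and the rounding of the mean are assumptions based on the description above.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta_lengths(lines: Iterable[str]) -> Iterator[int]:
    """Yield the length of each sequence in FastA-formatted lines."""
    length = None  # stays None until the first '>' header is seen
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length  # finish the previous record
            length = 0
        elif line and length is not None:
            length += len(line)  # sequences may span several lines
    if length is not None:
        yield length  # finish the last record


def summarize(lengths: Iterable[int], cutoff: int = 0) -> Tuple[int, int, str]:
    """Return (count, bases, 'min-{mean}-max') for sequences >= cutoff.

    The braced mean and its rounding are assumptions, not the
    pipeline's documented behaviour.
    """
    kept = [n for n in lengths if n >= cutoff]
    if not kept:
        return 0, 0, "0-0"
    mean = round(sum(kept) / len(kept))
    return len(kept), sum(kept), f"{min(kept)}-{{{mean}}}-{max(kept)}"
```

Calling 'summarize' with a non-zero 'cutoff' mimics the 'cutoff' line: short contigs are dropped before the count, the base total and the length range are computed.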
For processing steps that result in a BAM output file, there are four common lines.
Number of reads, contigs or scaffolds [untitled].
Percent of 'properly paired' [%PP] and 'singleton' [%Si] reads as given by the 'samtools flagstat' program.
Percent of 'mapped' [%Map] and 'unmapped' [%Un] reads as given by the 'samtools idxstats' program.
A link to the BAM file.
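For readers who want to check the %Map and %Un figures themselves, the sketch below shows one way to derive them from the text printed by 'samtools idxstats', which reports four tab-separated fields per reference: name, length, mapped count and unmapped count. The function name 'mapped_percentages' is hypothetical, and whether the pipeline sums the columns in exactly this way is an assumption.

```python
from typing import Tuple


def mapped_percentages(idxstats_text: str) -> Tuple[float, float]:
    """Return (%Map, %Un) computed from 'samtools idxstats' output.

    Assumption: both percentages are taken over the sum of the
    mapped and unmapped columns across all lines, including the
    final '*' line that holds reads with no reference assigned.
    """
    mapped = unmapped = 0
    for line in idxstats_text.splitlines():
        fields = line.split("\t")
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        mapped += int(fields[2])
        unmapped += int(fields[3])
    total = mapped + unmapped
    if total == 0:
        return 0.0, 0.0
    return 100.0 * mapped / total, 100.0 * unmapped / total
```

The input text can be captured by running 'samtools idxstats' on an indexed BAM file and passing its standard output to the function.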
Unaligned_filtered We run Trimmomatic and/or fastx_clipper to remove adapters and
to clip poor-quality bases from both the 5' and 3' ends of the Unaligned reads. Any reads below a
minimum length are discarded. The Unaligned_filtered reads are the ones most often used for further processing.
The filtered reads for a sample are combined into one large file per read
direction. You can access these via the given link,
via the 'Unaligned_filtered' directories below or via the per-sample directory.
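The idea behind the clipping and length filter can be sketched in a few lines of Python. This is a deliberately simplified illustration and not the algorithm used by Trimmomatic or fastx_clipper: adapter removal is left out, and the function name 'trim_read' and the default thresholds 'min_qual' and 'min_len' are hypothetical.

```python
from typing import List, Optional, Tuple


def trim_read(seq: str, quals: List[int], min_qual: int = 20,
              min_len: int = 30) -> Optional[Tuple[str, List[int]]]:
    """Clip low-quality bases from both ends; drop reads that end up too short.

    'quals' holds one numeric quality score per base. The thresholds
    are illustrative defaults, not the pipeline's settings.
    """
    start, end = 0, len(seq)
    # Walk in from the 5' end past bases below the quality threshold.
    while start < end and quals[start] < min_qual:
        start += 1
    # Walk in from the 3' end in the same way.
    while end > start and quals[end - 1] < min_qual:
        end -= 1
    if end - start < min_len:
        return None  # read is discarded
    return seq[start:end], quals[start:end]
```

A read that returns None here corresponds to one that would be absent from the Unaligned_filtered files.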
Unaligned These are the raw reads from the sequencer. Unless something was amiss in the
sequencing run, the length range should have the same lower and upper limits; e.g., '100-100' for a
100-base run. Because these reads come in many 'small' (~2 GB) files separated by lane and read direction,
you will have to go to the Unaligned directory in each accession if you want them.