Output

kallisto_output

Contains the final counts files. The pipeline currently generates three counts files:

  • kallisto_gene_counts_rounded.csv
  • kallisto_gene_counts_unrounded.csv
  • kallisto_transcript_counts.csv

The transcript counts file is essentially the raw kallisto output. It is useful if the user wants the greatest level of control or has a specific need for transcript-level quantification.

The unrounded gene counts table is the transcript counts table with the transcript counts aggregated to the gene level. As is, it will not be usable by standard differential expression tools, because the counts are not necessarily whole numbers. Users can round the counts as they see fit to get something that looks like a traditional gene counts table.
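The aggregation described above can be illustrated with a minimal sketch. It assumes a transcript-to-gene mapping is available as a dictionary and that counts are simply summed per gene; the names `aggregate_to_genes`, `transcript_counts`, and `transcript_to_gene` are hypothetical, the data are made up, and the pipeline's actual aggregation code may differ.

```python
from collections import defaultdict


def aggregate_to_genes(transcript_counts, transcript_to_gene):
    """Sum transcript-level counts for one sample into gene-level counts."""
    gene_counts = defaultdict(float)
    for transcript, count in transcript_counts.items():
        # Each transcript contributes its (possibly fractional) count to its gene.
        gene_counts[transcript_to_gene[transcript]] += count
    return dict(gene_counts)


# Made-up values for a single sample; kallisto counts are often fractional.
transcript_counts = {"tx1": 10.25, "tx2": 4.5, "tx3": 7.0}
transcript_to_gene = {"tx1": "GENE1", "tx2": "GENE1", "tx3": "GENE2"}

gene_counts = aggregate_to_genes(transcript_counts, transcript_to_gene)
print(gene_counts)
```

Because the transcript counts are fractional, the summed gene counts are generally fractional as well, which is why this table is called unrounded.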

The rounded gene counts table is the de facto standard output: it is the table used by edgeR inside the pipeline and is probably the most familiar format for most users. It is identical to the unrounded gene counts table, except that each count is rounded to the nearest whole number.
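For users who want to round the unrounded table themselves, the following is a minimal sketch using only the Python standard library. It assumes the CSV has a header row, a gene identifier in the first column, and one count column per sample; that layout, the function name `round_counts`, and the tie-breaking rule (exact halves round up) are assumptions, since the pipeline's own rounding rule is not documented here.

```python
import csv
import io
import math


def round_counts(rows):
    """Round every count to the nearest whole number.

    `rows` yields a header row, then one row per gene with the gene
    identifier first and the per-sample counts after it (assumed layout).
    """
    rows = iter(rows)
    rounded = [next(rows)]  # keep the header unchanged
    for gene, *counts in rows:
        # floor(x + 0.5) sends exact halves upward; this tie rule is an assumption.
        rounded.append([gene] + [math.floor(float(c) + 0.5) for c in counts])
    return rounded


# Made-up data standing in for kallisto_gene_counts_unrounded.csv.
example = io.StringIO("gene,sample_a,sample_b\nGENE1,10.4,0.5\nGENE2,3.6,7.0\n")
result = round_counts(csv.reader(example))
for row in result:
    print(row)
```

To apply this to a real file, pass `csv.reader(open(path, newline=""))` instead of the in-memory example.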

edgeR_output

Contains the edgeR differential expression results for each comparison, as well as a file with all comparisons combined, combined_edgeR.tsv. The FDRs will not match between the individual and combined files, because the FDR is always calculated at the level of the file. That is, the FDR in each individual file is calculated across the tests in that comparison only, while the FDR in combined_edgeR.tsv is calculated across all comparisons jointly.
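The effect of the file-level FDR can be seen in a small sketch. It assumes the FDR is a Benjamini-Hochberg adjustment, which is edgeR's default multiple-testing correction; whether the pipeline keeps that default is not stated here. The function name, comparison names, and p-values are made up for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up p-values for two hypothetical comparisons.
a_vs_b = [0.01, 0.04]
c_vs_d = [0.03, 0.20]

per_file = benjamini_hochberg(a_vs_b)           # adjusted within one file
combined = benjamini_hochberg(a_vs_b + c_vs_d)  # adjusted across both files

print(per_file)
print(combined[:2])
```

The same two p-values receive larger adjusted values in the combined calculation, because the correction now accounts for four tests instead of two.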

Note

To stay consistent with the lab’s existing base differential expression code, the log fold change is defined such that a negative log fold change for a comparison x_vs_y means higher counts in y than in x.
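A small sketch of the sign convention, assuming the log fold change for x_vs_y behaves like log2(x / y). edgeR estimates the log fold change from a fitted model rather than from a plain ratio, so the function below, its name, and its pseudocount are hypothetical and only illustrate the direction of the sign.

```python
import math


def log2_fold_change(mean_x, mean_y, pseudocount=1.0):
    """Illustrative log2(x / y) for a comparison named x_vs_y (assumed formula)."""
    # The pseudocount avoids division by zero; it is not taken from the pipeline.
    return math.log2((mean_x + pseudocount) / (mean_y + pseudocount))


# Higher counts in y than in x give a negative value for x_vs_y.
lfc = log2_fold_change(mean_x=7, mean_y=31)
print(lfc)
```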

Note

The standard edgeR differential expression built into the pipeline is suboptimal in a few ways. First, it does not support including covariates in the differential expression model. Second, it does not filter out samples with low alignment (< 30%). Finally, Lior Pachter (a co-author of the kallisto software) strongly advises against using kallisto, rounding the gene counts, and then using edgeR, which is exactly what the pipeline does. The differential expression portion of the pipeline was originally written using Sleuth (which also more easily supports automated sample filtering), but people seemed to be put off by the output not looking like that of edgeR and DESeq. It might be a good idea to add Sleuth back in as the pipeline default and let people run their own edgeR or DESeq analysis if they so choose.
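Since the built-in edgeR step does not filter samples with low alignment, users running their own analysis may want to do so first. The sketch below assumes the per-sample alignment percentages have already been collected into a dictionary (for example, by reading them off the MultiQC report); the function name, sample names, and values are hypothetical.

```python
def samples_passing_alignment(alignment_percent, threshold=30.0):
    """Return the names of samples whose alignment is at least `threshold` percent."""
    return [name for name, pct in alignment_percent.items() if pct >= threshold]


# Made-up alignment percentages for three hypothetical samples.
alignment_percent = {"sample_a": 85.2, "sample_b": 12.7, "sample_c": 30.0}

kept = samples_passing_alignment(alignment_percent)
print(kept)
```

The count columns for any dropped samples would then be removed from the counts table before running differential expression.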

multiqc_report.html

Standard QC output regarding alignment percentage, numbers of reads, sequence quality, etc.