Setup

Preprocessing

Before running the pipeline, make sure that each plate has both a fastq file and a welltags file, and that there is a single universal vectags file shared by all plates.

Configuration

experiment_name

Follows the common Xavier Lab naming convention <date_initials> (e.g. 20190213_MK). It determines the Google bucket location of the experiment's output files: genomics_xavier_bucket/TFSeq/<experiment_name>.
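As a rough illustration of how the experiment name maps to an output location, the sketch below builds the bucket path from it. The helper name output_location is hypothetical, the gs:// prefix is taken from the example paths in this guide, and the check that the name looks like an 8-digit date plus initials is an assumption about the <date_initials> convention, not something the pipeline is known to enforce.

```python
import re

# Bucket prefix taken from this guide; the helper itself is hypothetical.
OUTPUT_ROOT = "gs://genomics_xavier_bucket/TFSeq"
# Assumed shape of <date_initials>: YYYYMMDD, an underscore, then letters (e.g. 20190213_MK).
NAME_PATTERN = re.compile(r"^\d{8}_[A-Za-z]+$")


def output_location(experiment_name: str) -> str:
    """Return the bucket directory where outputs for experiment_name would go."""
    if not NAME_PATTERN.match(experiment_name):
        raise ValueError(
            f"expected <date_initials> such as 20190213_MK, got {experiment_name!r}"
        )
    return f"{OUTPUT_ROOT}/{experiment_name}"


print(output_location("20190213_MK"))
```

Validating the name up front is a design choice: a typo in experiment_name would otherwise silently send outputs to an unexpected directory.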

plate_info_file

The plate_info_file contains three tab-separated columns: fastq filepath, plate name, and welltags filepath. Each row corresponds to one plate, for example:

plate_info_file
gs://genomics_xavier_bucket/TFSeq/20190213_MK/Plate1/PLATE1_S1_L001_R1_001.fastq.gz  Plate1  gs://genomics_xavier_bucket/TFSeq/20190213_MK/Plate1/welltags.csv
gs://genomics_xavier_bucket/TFSeq/20190213_MK/Plate2/PLATE2_S2_L001_R1_001.fastq.gz  Plate2  gs://genomics_xavier_bucket/TFSeq/20190213_MK/Plate2/welltags.csv
gs://genomics_xavier_bucket/TFSeq/20190213_MK/Plate3/PLATE3_S3_L001_R1_001.fastq.gz  Plate3  gs://genomics_xavier_bucket/TFSeq/20190213_MK/Plate3/welltags.csv
gs://genomics_xavier_bucket/TFSeq/20190213_MK/Plate4/PLATE4_S4_L001_R1_001.fastq.gz  Plate4  gs://genomics_xavier_bucket/TFSeq/20190213_MK/Plate4/welltags.csv
gs://genomics_xavier_bucket/TFSeq/20190213_MK/Plate5/PLATE5_S5_L001_R1_001.fastq.gz  Plate5  gs://genomics_xavier_bucket/TFSeq/20190213_MK/Plate5/welltags.csv
gs://genomics_xavier_bucket/TFSeq/20190213_MK/Plate6/PLATE6_S6_L001_R1_001.fastq.gz  Plate6  gs://genomics_xavier_bucket/TFSeq/20190213_MK/Plate6/welltags.csv
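To make the row structure concrete, here is a minimal Python sketch that parses plate_info_file contents into one record per plate. The Plate type and read_plate_info function are hypothetical illustrations; the pipeline's own parser may behave differently. It assumes the column order given above and rejects any row that does not have exactly three tab-separated fields.

```python
import csv
import io
from typing import List, NamedTuple


class Plate(NamedTuple):
    """One row of plate_info_file, in the column order described above."""
    fastq: str
    name: str
    welltags: str


def read_plate_info(text: str) -> List[Plate]:
    """Parse plate_info_file contents into Plate records, one per row."""
    plates = []
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    for row_number, row in enumerate(reader, start=1):
        # A blank line parses as an empty row; a space-separated line parses as one field.
        if len(row) != 3:
            raise ValueError(
                f"row {row_number}: expected 3 tab-separated columns, got {len(row)}"
            )
        plates.append(Plate(*row))
    return plates


example = (
    "gs://genomics_xavier_bucket/TFSeq/20190213_MK/Plate1/PLATE1_S1_L001_R1_001.fastq.gz"
    "\tPlate1\tgs://genomics_xavier_bucket/TFSeq/20190213_MK/Plate1/welltags.csv\n"
)
for plate in read_plate_info(example):
    print(plate.name, plate.fastq, plate.welltags)
```

To check a local copy of the file, pass it the result of open("plate_info_file.tsv").read().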

This file must be a tab-separated (.tsv) file for the pipeline to interpret it properly. One way to check this is to run the following command in bash:

cat -A path/to/plate_info_file.tsv

The output should show ^I (cat's marker for a tab) between the columns and $ at the end of each row. Make sure there is no extra $ at the end of the file, i.e. no trailing empty line.

Once you have created a valid plate_info_file, we recommend uploading it to the Google Cloud directory that holds the rest of your experiment's files, to keep related files together.

The plate names can be anything you like, since they are only used to organize the output, but they will typically be Plate1, Plate2, and so on. Likewise, the fastq and welltags filepaths can point to any Google bucket locations, but we recommend the layout shown above: one directory per plate, containing that plate's fastq and welltags files.
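For the recommended one-directory-per-plate layout, the file can be generated rather than typed by hand, which avoids accidental spaces in place of tabs. The sketch below is a hypothetical helper, build_plate_info: it assumes each plate directory sits directly under the experiment directory and contains a file named welltags.csv, and it takes the fastq file name for each plate from you rather than guessing it, since those names come from the sequencing run.

```python
from typing import Dict


def build_plate_info(experiment_dir: str, fastq_names: Dict[str, str]) -> str:
    """Build plate_info_file text for the one-directory-per-plate layout.

    fastq_names maps each plate name to the fastq file name inside that plate's directory.
    """
    rows = []
    for plate, fastq in fastq_names.items():
        plate_dir = f"{experiment_dir.rstrip('/')}/{plate}"
        # Column order: fastq filepath, plate name, welltags filepath.
        rows.append("\t".join([f"{plate_dir}/{fastq}", plate, f"{plate_dir}/welltags.csv"]))
    # Exactly one trailing newline: each row ends in $ with no extra empty line.
    return "\n".join(rows) + "\n"


text = build_plate_info(
    "gs://genomics_xavier_bucket/TFSeq/20190213_MK",
    {
        "Plate1": "PLATE1_S1_L001_R1_001.fastq.gz",
        "Plate2": "PLATE2_S2_L001_R1_001.fastq.gz",
    },
)
print(text, end="")
```

Writing the returned text to plate_info_file.tsv should give output from cat -A that matches the check described above.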

vectags_file

The vectags_file is a universal file providing vectags for every plate.