Setup
Preprocessing
Before running the pipeline, make sure that each plate has both a fastq file and a welltags file, and that there is a single universal vectags file.
Configuration
experiment_name
Follows the common Xavier Lab naming convention of <date_initials> (for example, 20190213_MK). This will help set the Google bucket location of the output files for the experiment (genomics_xavier_bucket/TFSeq/<experiment_name>).
plate_info_file
The plate_info_file contains three columns: fastq filepath, plate name, and welltags filepath. Each row corresponds to one plate:
plate_info_file
gs://genomics_xavier_bucket/TFSeq/20190213_MK/Plate1/PLATE1_S1_L001_R1_001.fastq.gz Plate1 gs://genomics_xavier_bucket/TFSeq/20190213_MK/Plate1/welltags.csv
gs://genomics_xavier_bucket/TFSeq/20190213_MK/Plate2/PLATE2_S2_L001_R1_001.fastq.gz Plate2 gs://genomics_xavier_bucket/TFSeq/20190213_MK/Plate2/welltags.csv
gs://genomics_xavier_bucket/TFSeq/20190213_MK/Plate3/PLATE3_S3_L001_R1_001.fastq.gz Plate3 gs://genomics_xavier_bucket/TFSeq/20190213_MK/Plate3/welltags.csv
gs://genomics_xavier_bucket/TFSeq/20190213_MK/Plate4/PLATE4_S4_L001_R1_001.fastq.gz Plate4 gs://genomics_xavier_bucket/TFSeq/20190213_MK/Plate4/welltags.csv
gs://genomics_xavier_bucket/TFSeq/20190213_MK/Plate5/PLATE5_S5_L001_R1_001.fastq.gz Plate5 gs://genomics_xavier_bucket/TFSeq/20190213_MK/Plate5/welltags.csv
gs://genomics_xavier_bucket/TFSeq/20190213_MK/Plate6/PLATE6_S6_L001_R1_001.fastq.gz Plate6 gs://genomics_xavier_bucket/TFSeq/20190213_MK/Plate6/welltags.csv
This file must be tab-separated (a .tsv file) to be interpreted properly by the pipeline. One way to check this is to run the following command in bash:
cat -A path/to/plate_info_file.tsv
This should print ^I (the marker for a tab) between the columns and $ at the end of each row. Make sure there is no extra $ on a line of its own at the end of the file, which would indicate a trailing blank line.
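The same check can be automated rather than read by eye. The following is a minimal sketch, not part of the pipeline: check_plate_info is a hypothetical helper name, and it only assumes the format described above (exactly three non-empty, tab-separated columns per row and no blank lines). It also flags Windows line endings, which is an extra precaution rather than a documented requirement.

```shell
# Hypothetical helper (not part of the pipeline): verify that a
# plate_info_file has exactly three non-empty, tab-separated columns
# on every line and contains no blank lines.
check_plate_info() {
    awk -F'\t' '
        # A carriage return at the end of a line means Windows line endings.
        /\r$/ { printf "line %d: Windows line ending\n", NR; bad = 1 }
        # Blank lines have 0 fields; space-separated lines have 1 field.
        NF != 3 {
            printf "line %d: expected 3 tab-separated columns, found %d\n", NR, NF
            bad = 1
            next
        }
        $1 == "" || $2 == "" || $3 == "" {
            printf "line %d: empty column\n", NR
            bad = 1
        }
        # Exit status is 0 when no problem was found, 1 otherwise.
        END { exit bad + 0 }
    ' "$1"
}
```

For example, `check_plate_info path/to/plate_info_file.tsv && echo "format looks OK"` prints the confirmation only when every row passes; otherwise the offending line numbers are listed.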
Once you have created a valid plate_info_file, we recommend uploading it to the Google Cloud directory that holds the rest of your files, so that related files stay together. See here for an example.
Note that the plate names can be anything you like (they are used only to organize the output), but they will typically be Plate1, Plate2, …. Likewise, the fastq and welltags filepaths can point to any Google bucket locations, but we recommend organizing them as shown above: one directory per plate, containing that plate's fastq and welltags files.
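If your files follow the layout shown in the example above, the plate_info_file can be generated instead of typed by hand, which avoids accidental spaces in place of tabs. This is only a sketch under stated assumptions: make_plate_info is a hypothetical helper name, and the fastq naming pattern (PLATE<n>_S<n>_L001_R1_001.fastq.gz) and welltags.csv filename are copied from the example rows and may not match your own sequencing output.

```shell
# Hypothetical helper (not part of the pipeline): print plate_info_file
# rows for plates 1..N, assuming the directory layout and the fastq
# naming pattern from the example above.
make_plate_info() {
    experiment_name=$1   # e.g. 20190213_MK
    n_plates=$2          # number of plates in the experiment
    base="gs://genomics_xavier_bucket/TFSeq/${experiment_name}"
    i=1
    while [ "$i" -le "$n_plates" ]; do
        # Columns: fastq filepath, plate name, welltags filepath,
        # separated by literal tab characters.
        printf '%s\t%s\t%s\n' \
            "${base}/Plate${i}/PLATE${i}_S${i}_L001_R1_001.fastq.gz" \
            "Plate${i}" \
            "${base}/Plate${i}/welltags.csv"
        i=$((i + 1))
    done
}
```

Running `make_plate_info 20190213_MK 6 > plate_info_file.tsv` would write six rows matching the example. Adjust the paths in the function if your fastq files are named differently.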
vectags_file
The vectags_file is a universal file providing vectags for every plate.