Google Cloud Setup¶

General Information¶

Any pipeline we use will likely rely on three products within the Google Cloud ecosystem: Compute Engine for genomics and microbiome (link TBA), Cloud Storage for genomics and microbiome (link TBA), and the Container Registry. As the names imply, the pipeline performs computations on Compute Engine, stores data in Cloud Storage, and uses Docker images stored in the Container Registry. If you are at the Broad Institute, you should be able to follow this link and sign in to the platform with your Broad Gmail account. The Xavier Lab has two projects, genomics-xavier and microbiome-xavier, and all activities take place under these projects. Billing is also organized at the project level.

Web Access¶

Google Cloud can be controlled through its web interface. Use the links above and below.

Create instance
Create bucket
Please follow the naming convention.
Cost: No need to worry unless you are using >1 TB of bucket or hard disk storage, or running >16 cores or >64 GB of memory on average per month. If so, contact Ariel.

Command Line Access¶

Authentication¶

In order to access Google Cloud from the command line, you will need to authenticate and configure your Google Cloud account. You can do that through the following steps:

Log on to the Broad cluster.

Run the following shell commands and follow the prompts:

# load the Google Cloud SDK (software development kit)
use .google-cloud-sdk

# authenticate (this should only be necessary the first time you use the Google Cloud SDK)
gcloud auth login

# set the Google Cloud project (pick one: genomics-xavier or microbiome-xavier)
gcloud config set project <genomics-xavier or microbiome-xavier>

Congratulations! At this point, you should be able to interact with Google Cloud. It may be useful to check out the documentation for the Google Cloud command line tools (gcloud, plus gsutil for Cloud Storage) here.
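To confirm that the setup worked, a quick check along the following lines may help. This is a minimal sketch, not part of the original instructions: `check_gcloud_setup` is a hypothetical helper name, and it assumes that `gcloud` and `gsutil` are on your PATH once the SDK is loaded.

```shell
# Hypothetical helper: print the active account, the configured project,
# and the Cloud Storage buckets visible in that project.
check_gcloud_setup() {
  # Bail out early if the SDK has not been loaded in this shell.
  if ! command -v gcloud >/dev/null 2>&1; then
    echo "gcloud not found; load the Google Cloud SDK first" >&2
    return 1
  fi
  # Show the credentialed account(s); the active one is marked.
  gcloud auth list
  # Show the project set with `gcloud config set project`.
  gcloud config get-value project
  # List the Cloud Storage buckets in that project.
  gsutil ls
}
```

Calling `check_gcloud_setup` after authenticating should show your Broad account and the project you selected.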

Recommended Usages¶

Naming convention. Please name everything (instances, disks, snapshots, images, networks, buckets, service accounts, etc.) starting with your name. This identifies ownership and reduces the chance of operating on someone else's resources by mistake. Prefixing instance names with ‘a’ is also recommended, so they stay at the top of the instance list when running pipelines flood it.
Machine types. Please use a machine type that fits your needs. Feel free to use more CPUs and memory if necessary, but remember to shut down or resize the machine when you finish. The minimal machine size is recommended for simple single-threaded analysis.
Automatic shutdown. Here’s a “selfstop” script that shuts down the instance running it. Example usage in bash:
```
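# run the analysis, then stop the instance (R needs --no-save when reading from stdin)
( R --no-save < analysis.R &> analysis.log; selfstop ) &
# detach the job so it keeps running after you log out
disown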
```
Use auto-delete buckets for temporary file transfer. The buckets expire-1d and expire-7d automatically delete files after 1 and 7 days, respectively.
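The automatic shutdown pattern above could also be wrapped in a small reusable function that logs the output, always runs the shutdown step, and preserves the exit status of the analysis. The following is a sketch only: `run_then` is a hypothetical helper, not part of the selfstop script, and the usage example assumes `selfstop` and `Rscript` are on your PATH.

```shell
# Hypothetical helper: run a command with its output logged, then always run
# a follow-up command (for example selfstop), and return the original status.
run_then() {
  log=$1      # file that receives stdout and stderr of the command
  after=$2    # command to run afterwards, e.g. selfstop
  shift 2     # the remaining arguments are the command to run
  "$@" > "$log" 2>&1
  status=$?
  "$after"
  return "$status"
}

# Example (commented out so that pasting this block does not stop anything):
#   run_then analysis.log selfstop Rscript analysis.R &
#   disown
```

Keeping the follow-up command as a parameter makes it possible to try the wrapper with a harmless command such as `true` before trusting it with a real shutdown.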

Q&A¶

Root access. On the official image:
```
sudo bash
```
Disk expansion:
1. Stop your instance.
2. In the Google Compute Engine web interface, go to Disks (left).
3. Click the disk you want to expand.
4. Click Edit.
5. Change the size and save.
Other non-urgent questions. Please raise an issue in this documentation's repo or the relevant repo.
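The disk expansion steps above can likely also be carried out from the command line with `gcloud compute disks resize`. The following is a sketch under assumptions: `resize_disk` and the `DRY_RUN` variable are hypothetical names introduced here, and the disk name and zone in the usage example are placeholders.

```shell
# Hypothetical helper: resize a persistent disk to a new size in GB.
# Set DRY_RUN to a non-empty value to print the command instead of running it.
resize_disk() {
  disk=$1; zone=$2; new_gb=$3
  # Accept only a whole number of gigabytes.
  case $new_gb in
    ''|*[!0-9]*) echo "size must be a whole number of GB" >&2; return 2 ;;
  esac
  # ${DRY_RUN:+echo} expands to "echo" only when DRY_RUN is set and non-empty.
  ${DRY_RUN:+echo} gcloud compute disks resize "$disk" "--zone=$zone" "--size=${new_gb}GB"
}

# Example (placeholder disk and zone), printing the command without running it:
#   DRY_RUN=1
#   resize_disk alice-data us-east1-b 200
```

Persistent disks can be enlarged but not shrunk. Depending on the image and disk type, the filesystem may still need to be grown inside the instance after the resize.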

Notes¶

Christian has noticed that using wildcard matching within gsutil does not work when using Z shell. He knows it works in bash, but can’t speak to any other shell options.
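A likely explanation is that zsh tries to expand the wildcard against local files itself and, by default, aborts with a "no matches found" error when nothing matches, so gsutil never sees the pattern. Bash passes an unmatched pattern through unchanged. Quoting the URL should work in either shell. In the sketch below, the bucket name `alice-scratch` is a placeholder.

```shell
# Placeholder bucket and pattern, for illustration only.
pattern='gs://alice-scratch/*.fastq.gz'

# Quoted, the pattern reaches the program unchanged, in bash and in zsh.
printf '%s\n' "$pattern"

# With gsutil available, the same quoting applies:
#   gsutil ls 'gs://alice-scratch/*.fastq.gz'
# zsh-only alternative that disables globbing for a single command:
#   noglob gsutil ls gs://alice-scratch/*.fastq.gz
```

The `printf` line prints `gs://alice-scratch/*.fastq.gz`, which shows that the shell left the wildcard for the program to interpret.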