Google Cloud Setup

General Information

Any pipeline we use will likely rely on three products within the Google Cloud ecosystem: Compute Engine for genomics and microbiome (link TBA), Cloud Storage for genomics and microbiome (link TBA), and the Container Registry. As the names imply, the pipeline runs computations on Compute Engine, stores data in Cloud Storage, and uses Docker images stored in the Container Registry. If you are at the Broad Institute, you should be able to follow this link and sign in to the platform with your Broad Gmail account. The Xavier Lab has projects called genomics-xavier and microbiome-xavier, and all activity takes place under these projects. Billing is also handled at the project level.

Web Access

Google Cloud can be controlled through its web interface. Use the links in the General Information section above.

Command Line Access

Authentication

In order to access Google Cloud from the command line, you will need to authenticate and configure your Google Cloud account. You can do that through the following steps:

  1. Log in to the Broad cluster.

  2. Run the following shell commands and follow the prompts:

    # load the Google Cloud SDK (software development kit)
    use .google-cloud-sdk
    
    # authenticate (should only be necessary the first time you use the Google Cloud SDK)
    gcloud auth login
    
    # set the default project: replace the placeholder with one of the two project names
    gcloud config set project <genomics-xavier or microbiome-xavier>
    

Congratulations! At this point, you should be able to interact with Google Cloud. It may be useful to check out the documentation for the Google Cloud command-line tools (gcloud, plus gsutil for Cloud Storage) here.
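To confirm that the setup worked, something like the following sketch may help. It wraps three standard read-only commands in a helper; the function name check_gcloud_setup is made up for this example and is not part of the SDK. It assumes the SDK has already been loaded as in step 2.

```shell
# Hypothetical helper (not part of the SDK): verify account, project, and storage access.
check_gcloud_setup() {
    # Bail out early if the SDK has not been loaded in this shell.
    if ! command -v gcloud >/dev/null 2>&1; then
        echo "gcloud not found; load the SDK first (use .google-cloud-sdk)" >&2
        return 1
    fi
    gcloud auth list                 # credentialed accounts; the active one is marked
    gcloud config get-value project  # project currently set as the default
    gsutil ls                        # Cloud Storage buckets visible in that project
}
```

Run check_gcloud_setup after authenticating. If the project printed is not the one you expect, repeat the gcloud config set project step.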

Q&A

  • Root access. On the official image, open a root shell with:

    sudo bash
    
  • Other non-urgent questions. Please raise an issue in this documentation or in the relevant repo.

Notes

  • Christian has noticed that using wildcard matching within gsutil does not work when using Z shell. He knows it works in bash, but can’t speak to any other shell options.
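
A likely explanation, offered as an assumption rather than a confirmed diagnosis: by default zsh tries to expand an unquoted wildcard itself and stops with a "no matches found" error when no local file matches, so gsutil never sees the pattern. Bash passes an unmatched pattern through unchanged. Quoting the URL should behave the same in both shells. The bucket and path below are hypothetical, as is the function name list_reads.

```shell
# Hypothetical bucket and path, for illustration only.
pattern='gs://example-bucket/reads/*.fastq.gz'

# Quoting keeps the shell from expanding the wildcard,
# so gsutil receives the literal pattern and matches it against the bucket.
list_reads() {
    gsutil ls "$pattern"
}

# zsh-only alternative: the noglob precommand modifier disables globbing for one command:
#   noglob gsutil ls gs://example-bucket/reads/*.fastq.gz
```

Calling list_reads then lists whatever objects match the pattern in the bucket.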