Terra Setup
In this section we go over how to set up various elements of Terra, and when and why a user might need to do so.
What is Terra?
Terra is an online platform created by the Broad Institute for organizing and running computational jobs written in WDL and executed with Cromwell. The creators of Terra maintain extensive documentation of their own.
There are a few core concepts about Terra that are helpful to know up front:
Workspaces
Workspaces are a way of organizing all the potential pieces of an analysis (data, workflows, results, etc.). For our purposes, we use workspaces primarily to separate the host and microbiome pipelines. This is useful because each workspace is associated with its own cost object (linked to a Google Billing Account). Users should not need to worry much about workspaces; they should simply be granted access to the one matching their role in the lab (host or microbiome).
Methods
A method is a WDL script that contains all the instructions to execute within a given pipeline. There should be one method associated with each pipeline, but a workspace can contain many methods. Methods are uploaded to FireCloud via the method repository and are then linked to a workspace.
When and why to use Terra
We have found Terra useful for running pre-existing computational pipelines because it allows a large number of jobs to run simultaneously, is more reliable than hosting our own Cromwell server, and has some convenient features (e.g. cost tracking). WDL pipelines included in the Xavier Lab Computation Pipelines repository <https://gitlab.com/xavier-lab-computation/pipelines> should all have corresponding methods linked to the correct workspace.
Setting Up Billing
There is an existing set of instructions for setting up a billing account in FireCloud (the old version of Terra), and setting up Terra billing is likely the same, though we have not confirmed this. Terra billing is linked to Google Cloud project billing, so the host and microbiome groups have separate billing accounts. Users likely will not have to worry about the details of setting up billing; it matters only if you are setting up a new workspace or if the lab's billing structure changes in the future. You can see billing projects by clicking on your account name in the upper right corner and navigating to Billing.
Creating and Adding to Groups
You can see your groups by clicking on your account name in the upper right corner and navigating to Groups. We have already created the genomics-xavier and microbiome-xavier groups, each of which has permission to access its respective workspace. If someone new needs to be added to a group, click on the group name and you will see an Add User button.
We suggest associating only one group with a given workspace. It appears that if multiple groups are linked to a workspace, users must be added to every one of those groups before they can run pipelines in that workspace.
Accessing Google Cloud Buckets
When a Terra workspace is created, it is automatically given its own Google bucket that it can access. However, we often find it useful to access pre-existing Google buckets as well. For this to work, you must give your Terra group (whichever one is associated with your workspace) access to the Google bucket in question. This has likely already been done for existing methods (pipelines), but you may need to do it yourself if you are adding a new method:
1. On Terra, go to your Groups page (as described in the section above). Each group has an associated email address, which should look like <group_name>@Terra.org.
2. Go to your Google Cloud account and find the bucket(s) that need to be accessed by your method. Select the bucket with the checkbox on the left side of the screen and click SHOW INFO PANEL in the top right. Paste the email address associated with your group in the text box under Add members, and choose Storage/Storage Admin as the role. Your Terra group should now have full access to the Google bucket in question.
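The bucket-sharing steps above can also be scripted. The following is a minimal Python sketch, assuming the `google-cloud-storage` library is installed and that your credentials are allowed to edit the bucket's IAM policy. It assumes the Terra group email can be granted access as a Google group (the `group:` member prefix) and uses `roles/storage.admin`, the role ID behind "Storage Admin" in the console. The helper names (`terra_group_member`, `add_member_to_bindings`, `grant_bucket_access`) are hypothetical and are not part of Terra or of the library.

```python
# Sketch only: grant a Terra group Storage Admin on an existing Google bucket.
# Assumes google-cloud-storage is installed and the caller may edit bucket IAM.

STORAGE_ADMIN_ROLE = "roles/storage.admin"


def terra_group_member(group_email: str) -> str:
    """Format a Terra group email as an IAM member (assumes it is a Google group)."""
    return f"group:{group_email}"


def add_member_to_bindings(bindings: list, role: str, member: str) -> list:
    """Add `member` to the unconditional binding for `role`, in place.

    Creates the binding if none exists, and returns the same list.
    """
    for binding in bindings:
        if binding["role"] == role and "condition" not in binding:
            members = set(binding.get("members", ()))
            members.add(member)
            binding["members"] = members
            return bindings
    bindings.append({"role": role, "members": {member}})
    return bindings


def grant_bucket_access(bucket_name: str, group_email: str,
                        role: str = STORAGE_ADMIN_ROLE) -> None:
    """Read-modify-write the bucket's IAM policy to include the Terra group."""
    # Imported here so the pure helpers above work without the library installed.
    from google.cloud import storage

    bucket = storage.Client().bucket(bucket_name)
    policy = bucket.get_iam_policy(requested_policy_version=3)
    add_member_to_bindings(policy.bindings, role, terra_group_member(group_email))
    bucket.set_iam_policy(policy)
```

For example, `grant_bucket_access("my-existing-bucket", "<group_name>@Terra.org")`, where the bucket name is a placeholder. Fetching the current policy and adding to it, rather than writing a fresh policy, avoids removing members who already have access to the bucket.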