Running the metagenomics workflow
Before starting
- The CIDR metagenomics workflow must be started during a sequencing experiment or after one has completed. Do not activate the pipeline until the sequencing experiment has started in MinKNOW and is producing reads (see MinKNOW setup - Lab Protocol).
- Ensure the SSD is inserted into one of the rear USB 3.1 ports, has been mounted, and the encryption key has been entered successfully. Confirm the disk is mounted by navigating to it in the Ubuntu file explorer.
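If you prefer to confirm the mount from a terminal rather than the file explorer, a minimal Python sketch is shown below. It assumes the SSD is mounted at /media/grid/metagenomics (inferred from the reports path used later in this guide); adjust the path if your mount point differs. The function name `is_mounted` is illustrative and is not part of the workflow.

```python
import os


def is_mounted(path):
    """Return True if `path` exists and is a mount point, otherwise False."""
    return os.path.ismount(path)


if __name__ == "__main__":
    # Assumed mount point; change this if the SSD is mounted elsewhere.
    mount_point = "/media/grid/metagenomics"
    if is_mounted(mount_point):
        print(f"{mount_point} is mounted")
    else:
        print(f"{mount_point} is NOT mounted - check the SSD and encryption key")
```

This only checks that something is mounted at the path. It does not verify that the encryption key was accepted.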
Starting a run
- Double-click the Metagenomics Launcher icon on the GridION desktop. The CIDR Metagenomics Launcher should appear alongside a terminal window.
Known issues
The 'geocryptfs error not found...' error can be ignored as it is not essential to the workflow.
If a sample is repeated, append a repeat suffix to the Lab ID (_2), e.g. 123mre123456_2
- Select the number of samples to be analysed from the dropdown.
- You can populate the launcher using one of the methods below:
  - Fill out the fields on the form for each sample to be analysed.
  - Load a pre-existing TSV - see example.
Field descriptions:
Field | Description |
---|---|
MinKNOW experiment ID | The exact name matching the experiment name on MinKNOW entered by the user when initiating a sequencing run. This is populated automatically from the /data directory. |
MinKNOW sample ID | The exact name matching the sample name on MinKNOW entered by the user when initiating a sequencing run. This is populated automatically from the /data/{experiment_id}/ directory. |
ONT barcode | The ONT library index/barcode used. Green colour indicates the barcode directory has been validated. |
Lab/Sample ID | The unique lab accession number for the sample. This data is encrypted before transmission. If repeating a sample, append _n to the ID. |
Anonymised identifier | An anonymised identifier linked to the sample hospital number. |
Collection date | Collection date of the sample. |
Sample Class | The class of sample loaded. |
Sample type | The methodology used to collect the sample. |
Operator | Arbitrary identifier of the user operating the sequencer. |
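The repeat-suffix convention for Lab IDs can be sketched as follows. This is an illustration only: the helper `repeat_lab_id` is hypothetical and not part of the Launcher, and it assumes the first repeat is numbered _2, as in the example 123mre123456_2.

```python
def repeat_lab_id(lab_id, existing_ids):
    """Return `lab_id` if unused, otherwise `lab_id` with the next free _n suffix.

    The first repeat is numbered _2, matching the example in this guide.
    """
    if lab_id not in existing_ids:
        return lab_id
    n = 2
    while f"{lab_id}_{n}" in existing_ids:
        n += 1
    return f"{lab_id}_{n}"


if __name__ == "__main__":
    used = {"123mre123456"}
    print(repeat_lab_id("123mre123456", used))
```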
Note
- Option 1 will generate a sample sheet stored in the metagenomics/sample_sheets directory. This can be reused if a repeat run is required, or if quick edits need to be made to a set of samples without filling out the fields again.
- Filling the 'filename suffix' field will save the sample sheet with an appended string of your choosing, to help identify your run's metadata in the 'sample_sheets' folder.
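For readers who want to prepare or inspect a sample sheet outside the Launcher, the sketch below writes and reads a tab-separated file with Python's standard csv module. The column names in `FIELDS` are placeholders derived from the field descriptions above; the actual headers expected by the Launcher are not documented here, so check them against the example TSV before reusing this.

```python
import csv

# Hypothetical column names; replace with the headers from the example TSV.
FIELDS = [
    "experiment_id", "sample_id", "barcode", "lab_id",
    "anonymised_id", "collection_date", "sample_class",
    "sample_type", "operator",
]


def write_sample_sheet(path, rows):
    """Write a list of dicts (one per sample) to a tab-separated file."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS, delimiter="\t")
        writer.writeheader()
        writer.writerows(rows)


def read_sample_sheet(path):
    """Read a tab-separated sample sheet back into a list of dicts."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))
```

Reading a sheet back with `read_sample_sheet` before loading it into the Launcher is a quick way to spot a missing column or a stray tab.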
- With the metadata form filled, select the run parameter check boxes.
Parameter | Description |
---|---|
Force overwrite | The exact name matching the experiment name on MinKNOW entered by the user when initiating a sequencing run. This is populated automatically from the /data directory. |
mSCAPE prompt | After the sequencing and analysis run has completed, open the mSCAPE uploader for user input. No data is uploaded without express per-sample authorisation. |
Disable data ingest sleep | Not for real-time analysis! Analyse all data immediately - do not wait for it to be generated by the sequencer. |
Known issues
Wait until the sequencer has reported producing reads in MinKNOW before launching the pipeline. The workflow will display errors in red if no reads have been found.
The NTC will exhibit the same 'error' behaviour, as no reads are present in the corresponding barcode folder. We are working on functionality to circumvent this.
You can stop the analysis or close the Launcher window at any point by closing the terminal window, using the X in the top right corner.
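To check whether a barcode folder already contains reads before launching, something like the sketch below could be used. It assumes reads are written as FASTQ files with the extensions listed in `READ_SUFFIXES`; both the extensions and the helper name `count_read_files` are assumptions, and the folder to inspect depends on your MinKNOW output settings.

```python
from pathlib import Path

# Assumed read file extensions; adjust to match your MinKNOW output format.
READ_SUFFIXES = (".fastq", ".fastq.gz", ".fq", ".fq.gz")


def count_read_files(directory):
    """Count read files anywhere under `directory`; return 0 if it is missing."""
    root = Path(directory)
    if not root.is_dir():
        return 0
    return sum(
        1
        for path in root.rglob("*")
        if path.is_file() and path.name.endswith(READ_SUFFIXES)
    )
```

A count of zero for a sample barcode suggests the sequencer has not yet produced reads for it. A count of zero for the NTC barcode is expected.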
- Click on Launch pipeline and click OK to start analysis.
- After a minute, the terminal window accompanying the workflow launcher should start displaying log outputs from the workflow. See below for an example.
- ~40 minutes after launching the sequencing experiment alongside the metagenomics workflow, the first reports will be available in /media/grid/metagenomics/reports/{sample_name}/{timepoint}. See below for a guide on how to access this.
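As a rough way to see which reports exist so far for a sample, the sketch below walks the reports/{sample_name}/{timepoint} layout described above and lists any PDF files it finds. The helper name `list_reports` is hypothetical, and the naming of the time-point folders and report files is not assumed.

```python
from pathlib import Path


def list_reports(reports_root, sample_name):
    """Map each time-point folder name to a sorted list of its PDF file names."""
    sample_dir = Path(reports_root) / sample_name
    if not sample_dir.is_dir():
        return {}
    return {
        timepoint.name: sorted(pdf.name for pdf in timepoint.glob("*.pdf"))
        for timepoint in sorted(sample_dir.iterdir())
        if timepoint.is_dir()
    }
```

For example, `list_reports("/media/grid/metagenomics/reports", "my_sample")` returns an empty dictionary until the first time-point folder has been created.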
Success!
We have now run the CIDR metagenomics workflow. The workflow will run for ~24 hours, generating PDF reports at the 0.5, 1, 2, 16 and 24 hour time-points.