TOSTADAS → Toolkit for Open Sequence Triage, Annotation and DAtabase Submission 🧬 💻

PATHOGEN ANNOTATION AND SUBMISSION PIPELINE

For the complete TOSTADAS documentation, please see the Wiki

Overview

A portable, open-source pipeline designed to streamline submission of pathogen genomic data to public repositories. Reducing barriers to timely data submission increases the value of public repositories for both public health decision making and scientific research. TOSTADAS facilitates routine sequence submission by standardizing and automating:

Metadata Validation
Genome Annotation
File Submission

TOSTADAS is designed to be flexible, modular, and pathogen agnostic, allowing users to customize their submission of raw read data, assembled genomes, or both. The current release has been tested with sequence data from poxviruses and select bacteria. Testing for additional pathogens is planned for future releases.

Installation and Quick Start

❗ Note: If you are a CDC user, please follow the set-up instructions found here: CDC User Guide

For non-CDC users, please follow the instructions below.

1. Clone the repository to your local machine

git clone https://github.com/CDCgov/tostadas.git

❗ Note: If you already have Nextflow installed in your local environment, skip ahead to step 5.

2. Install mamba and add it to your PATH

2a. Install mamba

❗ Note: If you already have mamba installed in your local environment, skip ahead to step 3 (Create and activate a conda environment).

# Miniforge3 ships with mamba; the separate Mambaforge installer has been retired
curl -L -O "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
bash "Miniforge3-$(uname)-$(uname -m).sh" -b -p "$HOME/mambaforge"

2b. Add mamba to PATH:

export PATH="$HOME/mambaforge/bin:$PATH"

3. Create and activate a conda environment

3a. Create an empty conda environment

conda create --name tostadas

This conda environment will be used to install Nextflow.

3b. Activate the environment

conda activate tostadas

Verify which environment is active by running conda env list. The active environment is marked with an asterisk (*).

4. Install Nextflow using mamba and the bioconda channel

mamba install -c bioconda nextflow
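
Before moving on to step 5, it can help to confirm that the tools from steps 2 to 4 are visible in the current shell. The following is a minimal sketch and is not part of TOSTADAS: the helper name check_tool is made up for illustration, and it relies only on the POSIX command -v builtin.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with TOSTADAS): report whether a
# command can be found on the current PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found:   $1"
    else
        echo "missing: $1"
        return 1
    fi
}

# Tools installed in steps 2-4. A "missing" line usually means the PATH
# export from step 2b or the environment activation from step 3b was skipped.
for tool in mamba conda nextflow; do
    check_tool "$tool" || true
done
```

If nextflow is reported as missing, re-run conda activate tostadas and check again before continuing.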

5. Update the default submissions config file with your NCBI username and password, and run the following nextflow command to execute the scripts with default parameters and the local run environment:

# update this config file (you don't have to use vim)
vim conf/submission_config.yaml
# test command for virus reads
nextflow run main.nf -profile test,<singularity|docker|conda> --species virus

The pipeline outputs appear in tostadas/test_output.

6. Start running your own analysis

Annotate and submit viral reads

nextflow run main.nf -profile <docker|singularity> --species virus --submission --annotation --genbank true --sra true --biosample true --output_dir <path/to/output/dir/> --meta_path <path/to/metadata_file.xlsx> --submission_config <path/to/submission_config.yaml>

Annotate and submit bacterial reads

nextflow run main.nf -profile <docker|singularity> --species bacteria --submission --annotation --genbank true --sra true --biosample true --meta_path <path/to/metadata_file.xlsx> --submission_config <path/to/submission_config.yaml> --download_bakta_db --bakta_db_type <light|full> --output_dir <path/to/output/dir/>

Refer to the wiki for more information on input parameters and use cases.
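
Both commands above take a metadata file and a submission config as paths, and a typo in either is easiest to catch before Nextflow starts. A small sketch of such a pre-launch check follows. The function name check_inputs and the placeholder paths are hypothetical, and the script only tests that the files exist and are readable; it does not validate their contents.

```shell
#!/bin/sh
# Hypothetical pre-launch check (not part of TOSTADAS): confirm that the
# files passed to --meta_path and --submission_config exist and are readable.
check_inputs() {
    status=0
    for path in "$@"; do
        if [ ! -f "$path" ] || [ ! -r "$path" ]; then
            echo "cannot read: $path" >&2
            status=1
        fi
    done
    return "$status"
}

# Placeholder paths; substitute your own files.
META_PATH="path/to/metadata_file.xlsx"
SUBMISSION_CONFIG="path/to/submission_config.yaml"

if check_inputs "$META_PATH" "$SUBMISSION_CONFIG"; then
    echo "inputs look readable; ready to run nextflow"
else
    echo "fix the paths above before launching the pipeline" >&2
fi
```

The same two variables can then be reused for --meta_path and --submission_config in the nextflow run command, so the checked paths and the launched paths cannot drift apart.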

7. Custom metadata validation and custom BioSample package

TOSTADAS defaults to the Pathogen.cl.1.0 (Pathogen: clinical or host-associated; version 1.0) NCBI BioSample package for submissions to the BioSample repository. To submit using a different BioSample package, do the following:

Change the package name in conf/submission_config.yaml to one of the available NCBI BioSample packages.
Add the necessary fields for your BioSample package to your input Excel file.
Add those fields as keys to the JSON file (assets/custom_meta_fields/example_custom_fields.json) and provide key info as needed:
replace_empty_with: TOSTADAS will replace any empty cells with this value. (Example application: NCBI expects some value for any mandatory field, so if a cell is empty you may want to change it to "Not Provided".)
new_field_name: TOSTADAS will replace the field name in your metadata Excel file with this value. (Example application: you get weekly metadata Excel files that specify 'animal_environment' but NCBI expects 'animal_env'; you can specify this once in the JSON file and it will be changed on every run.)
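
As an illustration of the two options described above, the snippet below writes a small custom fields file for the 'animal_environment' example. It is a sketch under assumptions: only the option names replace_empty_with and new_field_name come from this README. The overall layout (one top-level key per metadata column) and the file name my_custom_fields.json are guesses, so copy the real structure from assets/custom_meta_fields/example_custom_fields.json before relying on it.

```shell
#!/bin/sh
# Write a hypothetical custom fields file. The layout is an assumption;
# check assets/custom_meta_fields/example_custom_fields.json for the real schema.
cat > my_custom_fields.json <<'EOF'
{
  "animal_environment": {
    "replace_empty_with": "Not Provided",
    "new_field_name": "animal_env"
  }
}
EOF

# Count the lines that mention either option, as a quick sanity check.
grep -c -e 'replace_empty_with' -e 'new_field_name' my_custom_fields.json
```

A file like this would then be passed to the pipeline through --custom_fields_file.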

Submit to a custom BioSample package

nextflow run main.nf -profile <docker|singularity> --species virus --submission --annotation --genbank true --sra true --biosample true --output_dir <path/to/output/dir/> --meta_path <path/to/metadata_file.xlsx> --submission_config <path/to/submission_config.yaml> --custom_fields_file <path/to/metadata_custom_fields.json>

Get in Touch

If you need to report a bug, suggest new features, or just say “thanks”, open an issue and we’ll try to get back to you as soon as possible!

Acknowledgements

Contributors

Tools

The submission portion of this pipeline was adapted from SeqSender. For more information on this tool, please refer to its GitHub page: SeqSender

Resources

🔗 NCBI Submission Guidelines: https://submit.ncbi.nlm.nih.gov/sarscov2/sra/#step6

🔗 SeqSender Documentation: https://github.com/CDCgov/seqsender

🔗 Liftoff Documentation: https://github.com/agshumate/Liftoff

🔗 VADR Documentation: https://github.com/ncbi/vadr.git

🔗 Bakta Documentation: https://github.com/oschwengers/bakta

🔗 RepeatMasker Documentation: https://www.repeatmasker.org/
