Why does my colabfold_search run take so long? #245

Open
spark159 opened this issue Jul 28, 2024 · 3 comments

@spark159

Caution: Please only report your issue related to the installation on your local PC or macOS. If you can get the help message by colabfold_batch --help or run a test prediction successfully, your installation is successful. Requests or questions regarding ColabFold features should be directed to ColabFold repo's issues.


What is your installation issue?

I tried to run colabfold_search on a SLURM cluster, but it takes more than 2 days even though the input FASTA contains just a single entry.

Computational environment

I used these job allocations:
#SBATCH -c 10 # Requested cores
#SBATCH --time=2-00:00 # Runtime in D-HH:MM format
#SBATCH --partition=medium # Partition to run in
#SBATCH --mem=100GB # Requested Memory
#SBATCH -o %j.out # File to which STDOUT will be written, including job ID (%j)
#SBATCH -e %j.err # File to which STDERR will be written, including job ID (%j)

To Reproduce

And this is my colabfold_search command:
colabfold_search \
--use-env 1 \
--use-templates 0 \
--db-load-mode 2 \
--mmseqs mmseqs \
--threads 4 \
${input_path} \
${database_path} \
${output_path}
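
Note that the allocation above requests 10 cores (`-c 10`) while the command passes `--threads 4`. This is unlikely to explain a two-day runtime by itself, but the two can be kept in sync. The following is a minimal sketch, assuming the job runs under SLURM with `-c`/`--cpus-per-task` set so that `SLURM_CPUS_PER_TASK` is exported; `pick_threads` is a hypothetical helper name, not part of ColabFold.

```shell
#!/usr/bin/env bash
# Hypothetical helper: choose a thread count from the SLURM allocation.
# SLURM exports SLURM_CPUS_PER_TASK when -c/--cpus-per-task is requested;
# fall back to 4 (the value used in the original command) when it is unset.
pick_threads() {
    echo "${SLURM_CPUS_PER_TASK:-4}"
}

echo "threads to request: $(pick_threads)"
```

The result could then be passed as `--threads "$(pick_threads)"` in place of the hard-coded `--threads 4`.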

And my FASTA input is simply this:

>S1_S8
NHIIIPSYASWFDYNCIHVIERRALPEFFNGKNKSKTPEIYLAYRNFMIDTYRLNPQEYLTSTACRRNLTGDVCAVMRVHAFLEQWGLVNYQVDPESRPMAMGPPPTPHFNVLADTPSGLVPLHLRSPQVPAAQQMLNFPEKNKEKPVDLQNFGLRTDIYSKKTLAKSKGASAGREWTEQETLLLLEALEMYKDDWNKVSEHVGSRTQDECILHFLRLPIEDPYLENSDASLGPLAYQPVPFSQSGNPVMSTVAFLASVVDPRVASAAAKAALEEFSRVREEVPLELVEAHVKKVQEAAR:PLCTLLDWQDSLAKRCVCVSNTIRSLSFVPGNDFEMSKHPGLLLILGKLILLHHKHPERKQAPLTYEKEEEQDQGVSCNKVEWWWDCLEMLRENTLVTLANISGQLDLSPYPESICLPVLDGLLHWAVCPSAEAQDPFSTLGPNAVLSPQRLVLETLSKLSIQDNNVDLILATPPFSRLEKLYSTMVRFLSDRKNPVCREMAVVLLANLAQGDSLAARAIAVQKGSIGNLLGFLEDSLAATQFQQSQASLLHMQNPPFEPTSVDMMRRAARALLALAKVDENHSEFTLYESRLLDISVSPLMNSLVSQVICDVLFLIGQS

Expected behavior

I expected a short run time of a few hours, but the job ran for more than 2 days and was cancelled.
I am attaching the log file, too.
42792350.txt

Thank you for your help in advance!

@YoshitakaMo
Owner

This issue is not about the installation itself. The runtime of colabfold_search depends largely on the machine environment: the file system, the storage (a > 2 TB SSD is highly recommended for best performance), the RAM (> 768 GB for best performance), and whether or not vmtouch is used. If your job ran on a shared supercomputer whose file system is RAID or network-mounted, the calculation will be very slow.
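
To see which kind of file system the databases actually sit on, something like the following can be run on the compute node. It is a rough sketch, assuming GNU coreutils `stat` and a `database_path` variable like the one in the command above; `fs_type` and `classify_fs` are hypothetical helper names, and the list of network file systems is illustrative rather than exhaustive.

```shell
#!/usr/bin/env bash
# Print the file system type that holds a path (GNU coreutils `stat -f`).
# Prints "unknown" if this `stat` does not support the options used here.
fs_type() {
    stat -f -c %T -- "$1" 2>/dev/null || echo unknown
}

# Rough classification of a type name as reported by `stat -f -c %T`.
# The pattern list is an assumption and certainly incomplete.
classify_fs() {
    case "$1" in
        nfs*|lustre|gpfs|cifs|smb*|ceph|afs) echo network ;;
        *) echo local-or-unknown ;;
    esac
}

# Default to the current directory when database_path is not set.
db_dir="${database_path:-.}"
db_fs="$(fs_type "$db_dir")"
echo "$db_dir: $db_fs ($(classify_fs "$db_fs"))"
```

For local disks, `lsblk -d -o NAME,ROTA` additionally shows whether a device is rotational (1, an HDD) or not (0, typically an SSD).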

@spark159
Author

Thank you so much for your kind response!

Just a few more questions.

Do you have any suggestions for speeding up the run in my current environment?
What is the most important parameter for determining performance?
Is it probably RAM (> 768 GB)?

Thank you!

@YoshitakaMo
Owner

In my experience, the most important factor is the file system and whether an SSD is used. If the sequence databases are placed on an SSD connected by a SATA cable, colabfold_search returns the result in 30 to 60 minutes even if the machine has only 64 GB of RAM (as on my desktop running Ubuntu 22.04). However, using an HDD or a network-mounted drive will slow the calculation by more than 10 times. If the sequence databases can be fully cached in RAM (> 768 GB) on the first run of colabfold_search, subsequent runs will be extremely fast, on par with the MMseqs2 web server.
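
Whether full caching is even possible on a given node can be estimated by comparing the on-disk size of the databases with the installed RAM. The sketch below assumes Linux (`/proc/meminfo`) and a `database_path` variable as in the original command; the helper names are hypothetical. On a SLURM node, the job's `--mem` limit may be a tighter bound than the physical RAM reported here.

```shell
#!/usr/bin/env bash
# Total size in KiB of everything under a directory.
dir_size_kib() {
    du -sk -- "$1" | awk '{print $1}'
}

# Total physical RAM in KiB, read from /proc/meminfo (Linux only).
mem_total_kib() {
    awk '/^MemTotal:/ {print $2}' /proc/meminfo 2>/dev/null
}

# Succeeds when the first size (KiB) is no larger than the second (KiB).
fits_in_ram() {
    [ "$1" -le "$2" ]
}

# Default to the current directory when database_path is not set.
db_dir="${database_path:-.}"
db_kib="$(dir_size_kib "$db_dir")"
ram_kib="$(mem_total_kib)"
ram_kib="${ram_kib:-0}"   # treat an unreadable /proc/meminfo as 0 KiB

if fits_in_ram "$db_kib" "$ram_kib"; then
    echo "databases (${db_kib} KiB) could fit in RAM (${ram_kib} KiB)"
else
    echo "databases (${db_kib} KiB) exceed RAM (${ram_kib} KiB); expect repeated disk reads"
fi
```

If the databases do not fit, the advice above about fast local SSD storage matters most, since every search then has to read from disk again.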
