Why does my colabfold_search run take so long? #245

spark159 · 2024-07-28T15:26:39Z

Caution: Please only report your issue related to the installation on your local PC or macOS. If you can get the help message by colabfold_batch --help or run a test prediction successfully, your installation is successful. Requests or questions regarding ColabFold features should be directed to ColabFold repo's issues.

What is your installation issue?

I tried to run colabfold_search on a SLURM cluster, but it takes more than 2 days, even though the input FASTA contains just a single entry.

Computational environment

I used these job allocations:
#SBATCH -c 10 # Requested cores
#SBATCH --time=2-00:00 # Runtime in D-HH:MM format
#SBATCH --partition=medium # Partition to run in
#SBATCH --mem=100GB # Requested Memory
#SBATCH -o %j.out # File to which STDOUT will be written, including job ID (%j)
#SBATCH -e %j.err # File to which STDERR will be written, including job ID (%j)

To Reproduce

And this is my colabfold_search command:
colabfold_search \
  --use-env 1 \
  --use-templates 0 \
  --db-load-mode 2 \
  --mmseqs mmseqs \
  --threads 4 \
  ${input_path} \
  ${database_path} \
  ${output_path}
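One detail in the job above: it requests 10 cores (`-c 10`) but passes `--threads 4`, so six allocated cores sit idle. This will not fix slow database I/O, but a minimal batch-script sketch that ties the thread count to the allocation could look like this. It assumes Slurm exports `SLURM_CPUS_PER_TASK` when `-c` is requested and `SLURM_JOB_ID` inside a batch job; `run_search` is a hypothetical helper name, and `input_path`, `database_path` and `output_path` are assumed to be set elsewhere in the script.

```shell
#!/bin/bash
#SBATCH -c 10   # Requested cores (other #SBATCH lines as in the original job script)

# Hypothetical helper: run colabfold_search with one thread per allocated core.
# Falls back to 4 threads (the original setting) when not running under Slurm.
run_search() {
  local input="$1" db="$2" out="$3"
  local threads="${SLURM_CPUS_PER_TASK:-4}"
  colabfold_search \
    --use-env 1 \
    --use-templates 0 \
    --db-load-mode 2 \
    --mmseqs mmseqs \
    --threads "$threads" \
    "$input" "$db" "$out"
}

# Only launch the search inside a Slurm job (sbatch sets SLURM_JOB_ID).
if [[ -n "${SLURM_JOB_ID:-}" ]]; then
  run_search "${input_path}" "${database_path}" "${output_path}"
fi
```

The trailing backslashes keep the multi-line command a single shell command; without them, each option line would be run as a separate command.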

And my FASTA input is simply this:

>S1_S8
NHIIIPSYASWFDYNCIHVIERRALPEFFNGKNKSKTPEIYLAYRNFMIDTYRLNPQEYLTSTACRRNLTGDVCAVMRVHAFLEQWGLVNYQVDPESRPMAMGPPPTPHFNVLADTPSGLVPLHLRSPQVPAAQQMLNFPEKNKEKPVDLQNFGLRTDIYSKKTLAKSKGASAGREWTEQETLLLLEALEMYKDDWNKVSEHVGSRTQDECILHFLRLPIEDPYLENSDASLGPLAYQPVPFSQSGNPVMSTVAFLASVVDPRVASAAAKAALEEFSRVREEVPLELVEAHVKKVQEAAR:PLCTLLDWQDSLAKRCVCVSNTIRSLSFVPGNDFEMSKHPGLLLILGKLILLHHKHPERKQAPLTYEKEEEQDQGVSCNKVEWWWDCLEMLRENTLVTLANISGQLDLSPYPESICLPVLDGLLHWAVCPSAEAQDPFSTLGPNAVLSPQRLVLETLSKLSIQDNNVDLILATPPFSRLEKLYSTMVRFLSDRKNPVCREMAVVLLANLAQGDSLAARAIAVQKGSIGNLLGFLEDSLAATQFQQSQASLLHMQNPPFEPTSVDMMRRAARALLALAKVDENHSEFTLYESRLLDISVSPLMNSLVSQVICDVLFLIGQS

Expected behavior

I expected a short run time of a few hours, but it took more than 2 days and the job was cancelled.
I am also attaching the log file.
42792350.txt

Thank you for your help in advance!


YoshitakaMo · 2024-07-28T16:13:06Z

This issue is not about the installation itself. The speed of colabfold_search depends largely on the machine environment, such as the file system, storage (> 2 TB SSD is highly recommended for best performance), RAM (> 768 GB for best performance), and whether or not vmtouch is used. If your job runs on a shared supercomputer whose file system is RAID-based or network-mounted, the search will be very slow.
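To see where the databases actually live on a cluster, a quick check of the file system type can help. Below is a minimal sketch, assuming GNU coreutils `stat` on Linux; `fs_type` and `classify_fs` are hypothetical helper names, and the list of network/parallel file system names is an assumption to extend for your own cluster.

```shell
#!/bin/bash
# Hypothetical helpers for checking where the ColabFold databases live.

# Print the type of the file system holding PATH (e.g. "ext2/ext3", "xfs", "nfs").
fs_type() {
  stat -f -c %T "$1"
}

# Classify a file system type name as "network" or "local".
# The network/parallel file system names below are an assumption, not exhaustive.
classify_fs() {
  case "$1" in
    nfs|nfs4|cifs|smb2|smbfs|lustre|gpfs|ceph) echo network ;;
    *) echo local ;;
  esac
}
```

Usage: `classify_fs "$(fs_type "$database_path")"`. This cannot detect RAID; for the disk type, `lsblk -d -o NAME,ROTA` reports whether each local disk is rotational (1 = HDD, 0 = SSD).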

spark159 · 2024-07-28T19:02:24Z

Thank you so much for your kind response!

Just a few more questions.

Do you have any suggestions for speeding up the run in my current environment?
What is the most important factor in determining performance?
Is it RAM (> 768 GB)?

Thank you!

YoshitakaMo · 2024-07-29T02:41:52Z

In my experience, the most important factors are the file system and the use of an SSD. If the sequence databases are placed on an SSD connected via SATA, colabfold_search returns the result in 30-60 minutes even if the machine has only 64 GB of RAM (on my Ubuntu 22.04 desktop). However, using an HDD or a network-mounted drive slows the calculation down by a factor of more than 10. If the sequence databases can be fully cached in RAM (> 768 GB) during the first run of colabfold_search, subsequent runs will be extremely fast, on par with the MMseqs2 web server.
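Whether the "fully cached in RAM" case applies depends on the database size relative to the node's memory. A rough check, as a minimal sketch assuming Linux (`/proc/meminfo`) and GNU `du`; `db_fits_in_ram` is a hypothetical helper name, and fitting in total RAM is only a lower bound, since MMseqs2 and other jobs also need memory.

```shell
#!/bin/bash
# Hypothetical helper: does DIR (the ColabFold database directory) fit in total RAM?
# Compares the disk usage of DIR with MemTotal from /proc/meminfo (Linux only).
# Prints both sizes in GiB and returns 0 if the database is smaller, 1 otherwise.
db_fits_in_ram() {
  local dir="$1"
  local db_kib mem_kib
  db_kib=$(du -sk -- "$dir" | cut -f1)                     # disk usage in KiB
  mem_kib=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)   # total RAM in KiB
  echo "database: $((db_kib / 1048576)) GiB, RAM: $((mem_kib / 1048576)) GiB"
  (( db_kib < mem_kib ))
}
```

If it fits, `vmtouch -t` can read the database files into the page cache before the search; `vmtouch -l` additionally locks them in memory, which usually requires elevated privileges. On a shared cluster the cache stays on one node and can be evicted by other jobs.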
