BFD Downloads

BFD is available from the following two mirrors. The archives were created at different times, so their checksums do not match each other. The hashes of the individual files inside the archives are identical, however, and are listed below.
  1. Mirror Google Cloud
    MD5 Hash
    6a634dc6eb105c2e9b4cba7bbae93412
    Byte Size
    291649557441
  2. Mirror GWDG
    MD5 Hash
    4b53fc6ca77c78fbc433948fb47e08c6
    Byte Size
    291649557551
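
A quick first check after downloading is to compare the archive's size on disk with the byte size listed for the mirror it came from. The following is a minimal sketch only; the mirror keys and the function name are hypothetical labels chosen for illustration, and the sizes are copied from the list above.

```python
import os

# Byte sizes copied from the mirror list above.
# The dictionary keys are illustrative labels, not official mirror names.
EXPECTED_SIZE = {
    "google_cloud": 291649557441,
    "gwdg": 291649557551,
}


def missing_bytes(path, mirror):
    """Return how many bytes are still missing from the file at `path`.

    0 means the size matches the listed value. A positive value suggests an
    incomplete download; a negative value means the file is larger than
    expected.
    """
    return EXPECTED_SIZE[mirror] - os.path.getsize(path)
```

A result of 0 only shows that the size is right; it does not replace the MD5 comparison.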

Bioinformatic Methods

BFD was created by clustering 2.5 billion protein sequences from UniProt/TrEMBL+Swiss-Prot, Metaclust, and the Soil Reference Catalog and Marine Eukaryotic Reference Catalog assembled by Plass.

We clustered sequences that could be aligned to a longer sequence with at least 90% of their residues and a sequence identity of at least 30%, using Linclust/MMseqs2 with --cov-mode 1 --min-seq-id 0.3.

We removed all clusters with fewer than three sequences and turned the database into an HH-suite3 database using the Uniclust pipeline.
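
The two criteria described above can be illustrated with a small sketch. This is not the Linclust/MMseqs2 implementation; it only restates the thresholds as plain Python. The function names, the argument names, and the assumption that each cluster's member list includes its representative are all hypothetical.

```python
def can_join_cluster(aligned_residues, member_length, seq_identity,
                     min_coverage=0.9, min_identity=0.3):
    """Check the stated thresholds for one shorter sequence.

    aligned_residues: residues of the shorter sequence covered by the alignment
    member_length:    total length of the shorter sequence
    seq_identity:     fraction of identical residues in the alignment (0..1)
    """
    coverage = aligned_residues / member_length
    return coverage >= min_coverage and seq_identity >= min_identity


def drop_small_clusters(clusters, min_size=3):
    """Keep only clusters with at least `min_size` sequences.

    `clusters` maps a representative id to a list of member ids
    (assumed here to include the representative itself).
    """
    return {rep: members for rep, members in clusters.items()
            if len(members) >= min_size}
```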

File Format


Each sequence entry in the database carries an identifier from its source: UniProt, JGI, NCBI, or OM-RGC. JGI data can be accessed by entering the first part of the FASTA identifier in the organism field of the following URL. For the example header below, that part is RifCSPlowO2_12: https://genome.jgi.doe.gov/portal/pages/dynamicOrganismDownload.jsf?organism=RifCSPlowO2_12
 >RifCSPlowO2_12_1023861.scaffolds.fasta_scaffold367679_1 # 24 # 428 # -1 # ID=367679_1;partial=00;start_type=ATG;rbs_motif=None;rbs_spacer=None;gc_cont=0.435
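
As a sketch of how the organism name might be extracted from such a header and turned into the portal URL, consider the following. It rests on two assumptions drawn only from the single example above: that the organism name is everything before the last "_<number>" preceding the first dot, and that the "#"-separated fields follow a Prodigal-style layout (start, end, strand, attributes). Both function names are hypothetical, and other JGI identifiers may not follow the same pattern.

```python
from urllib.parse import urlencode

JGI_PORTAL = "https://genome.jgi.doe.gov/portal/pages/dynamicOrganismDownload.jsf"


def parse_jgi_header(header):
    """Split a JGI-style FASTA header into its parts (layout assumed)."""
    fields = [part.strip() for part in header.strip().lstrip(">").split("#")]
    identifier = fields[0]
    # Assumption: drop everything from the first "." onward, then drop the
    # trailing "_<number>" to obtain the organism name.
    organism = identifier.split(".", 1)[0].rsplit("_", 1)[0]
    # Assumption: the last field is a ";"-separated list of key=value pairs.
    attributes = dict(item.split("=", 1)
                      for item in fields[4].split(";") if "=" in item)
    return {
        "identifier": identifier,
        "organism": organism,
        "start": int(fields[1]),
        "end": int(fields[2]),
        "strand": int(fields[3]),
        "attributes": attributes,
    }


def jgi_url(organism):
    """Build the portal URL with the organism field filled in."""
    return JGI_PORTAL + "?" + urlencode({"organism": organism})
```

Applied to the example header, `parse_jgi_header(...)["organism"]` yields `RifCSPlowO2_12`, and `jgi_url` reproduces the URL given above.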

Consistency Check

After downloading, please check that the contents of the extracted archive match the following MD5 hashes:

bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_a3m.ffdata
2dc0f09adabbcf1965ed578e0b2ab07e
bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_a3m.ffindex
476941cf4a964d96fb3b68a82fe734d1
bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_cs219.ffdata
4bb63ac9c3a3dd088cf654df1f548d53
bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_cs219.ffindex
26d48869efdb50d036e2fb9056a0ae9d
bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_hhm.ffdata
9bd2da8a8adbcc30801f0221d0dc1987
bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_hhm.ffindex
799f308b20627088129847709f1abed6
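
One way to run this check is sketched below, using only Python's standard library. The hashes are copied from the list above; the function names and the assumption that all six files sit in one directory are illustrative. Files are read in chunks because the data files are far too large to load into memory at once.

```python
import hashlib
from pathlib import Path

PREFIX = "bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt_"

# File suffix -> MD5 hash, copied from the list above.
EXPECTED_MD5 = {
    PREFIX + "a3m.ffdata": "2dc0f09adabbcf1965ed578e0b2ab07e",
    PREFIX + "a3m.ffindex": "476941cf4a964d96fb3b68a82fe734d1",
    PREFIX + "cs219.ffdata": "4bb63ac9c3a3dd088cf654df1f548d53",
    PREFIX + "cs219.ffindex": "26d48869efdb50d036e2fb9056a0ae9d",
    PREFIX + "hhm.ffdata": "9bd2da8a8adbcc30801f0221d0dc1987",
    PREFIX + "hhm.ffindex": "799f308b20627088129847709f1abed6",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(directory, expected=EXPECTED_MD5):
    """Return a list of (file name, problem) pairs; empty if all files match."""
    problems = []
    for name, want in expected.items():
        path = Path(directory) / name
        if not path.is_file():
            problems.append((name, "missing"))
        elif md5_of_file(path) != want:
            problems.append((name, "md5 mismatch"))
    return problems
```

Calling `find_mismatches("/path/to/bfd")` (a placeholder path) returns an empty list when every file is present and matches.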

We recommend downloading the BFD with aria2c.
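
A possible aria2c invocation can be assembled as in the sketch below. The URL is a placeholder, since the actual mirror links are not reproduced here, and the helper name and the connection count are arbitrary choices for illustration. The options used (`--continue`, `--max-connection-per-server`, `--split`, `--checksum`) are standard aria2c options; the checksum must be the archive hash of the mirror actually used.

```python
import shlex


def aria2c_command(url, md5=None, connections=4):
    """Build an aria2c argument list for a resumable, multi-connection download."""
    cmd = [
        "aria2c",
        "--continue=true",                            # resume a partial download
        f"--max-connection-per-server={connections}",
        f"--split={connections}",                     # download in parallel pieces
    ]
    if md5:
        # Ask aria2c to verify the finished file against the archive MD5.
        cmd.append(f"--checksum=md5={md5}")
    cmd.append(url)
    return cmd


# Placeholder URL for illustration only; substitute the real mirror link.
example = aria2c_command("https://example.org/bfd.tar.gz",
                         md5="6a634dc6eb105c2e9b4cba7bbae93412")
print(shlex.join(example))
```

The printed line can be pasted into a shell, or the list can be passed to `subprocess.run` directly.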

License

All files are available under a Creative Commons Attribution 4.0 International License.