The AllFam database is a resource for classifying allergens into protein
families. The database merges allergen data from the WHO/IUIS Allergen
Nomenclature Database, supplemented by data from AllergenOnline, with protein
family definitions from the Pfam database.
See About AllFam for information on the AllFam team and how to
cite AllFam. See References for a list of references related
to AllFam and the underlying databases.
In the last two decades, hundreds of allergens have been identified, cloned and
sequenced. This wealth of data now enables us to classify allergens into families of
evolutionarily related proteins that are defined by sequence and structural similarity.
Members of the same family often have similar physico-chemical characteristics and
biochemical functions. Usually, only members of the same protein family show
cross-reactivity at the IgE and T cell level. However, it has to be stressed that neither
are all members of an allergen-containing protein family allergenic, nor are all allergenic
members of a protein family cross-reactive.
Employing different methods, several groups have shown that most allergens belong to a
limited set of protein families, thus disproving the assumption that every protein can
become an allergen (see References). Hence, the
classification of allergenic proteins based on biochemical and structural similarities
including the comparison of allergenic and non-allergenic members of a protein family will
lead to new insights into factors that contribute to allergenicity. In addition, these data
will provide the foundation for elucidating the structural basis of allergenic
Construction of AllFam
Figure 1 shows an overview of the algorithm used in constructing the
AllFam database. Briefly, the procedure is as follows:
- Merging data from the IUIS Allergen Nomenclature database and AllergenOnline:
- Sequences from AllergenOnline are matched with corresponding IUIS-approved allergens
from the same species by using a sequence identity threshold of 50% (applying a higher
threshold for very short fragments to exclude biologically insignificant short matches).
- Comparison with Pfam family definitions:
- Single sequences representing each allergen are compared with all Hidden Markov Models
(HMMs) representing protein families in Pfam using the thresholds for significant hits
defined in Pfam.
- Translation of found Pfam domains into AllFam families:
- This process is described in more detail below (What is the
difference between Pfam and AllFam families?).
- BLAST search of unclassified sequences against already classified ones.
- Most allergen sequences not classified by searching against Pfam HMMs are short
fragments obtained by N-terminal sequencing or mass spectrometry. These sequences are
compared with a database of already classified sequences using blastp and assigned to
the AllFam family of the most similar hit if the alignment yields a significant score.
Figure 1: Flow chart showing the process of generating the AllFam database.
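The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used to build AllFam: the Pfam and BLAST searches are assumed to have been run already, the names `same_allergen`, `classify`, `BlastHit` and `PFAM_TO_ALLFAM` are hypothetical, the short-fragment cutoff values are invented placeholders, and an E-value cutoff stands in for the "significant score" criterion, which the text does not specify. Only the 50% identity threshold and the four Pfam-to-AllFam pairs are taken from this page.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Small excerpt of a Pfam-to-AllFam lookup, using only pairs named on this page.
PFAM_TO_ALLFAM = {
    "PF03952": "AF031",  # enolase, N-terminal domain
    "PF00113": "AF031",  # enolase, C-terminal domain
    "PF00187": "AF043",  # hevein-like domain
    "PF00182": "AF043",  # chitinase domain
}


def same_allergen(identity: float, aligned_length: int,
                  short_length: int = 30, short_identity: float = 0.9) -> bool:
    """Step 1: decide whether an AllergenOnline sequence matches an IUIS allergen.

    The 50% threshold comes from the text; short_length and short_identity
    are hypothetical placeholders for the stricter short-fragment rule.
    """
    threshold = short_identity if aligned_length < short_length else 0.5
    return identity >= threshold


@dataclass
class BlastHit:
    subject_family: str  # AllFam family of the already classified subject
    evalue: float        # assumed significance measure (not stated in the text)


def classify(pfam_hits: Iterable[str], blast_hits: Iterable[BlastHit],
             max_evalue: float = 1e-3) -> Optional[str]:
    """Steps 2 to 4: Pfam hits first, then fall back to the best BLAST hit."""
    families = {PFAM_TO_ALLFAM[p] for p in pfam_hits if p in PFAM_TO_ALLFAM}
    if len(families) == 1:
        return families.pop()
    if families:
        raise ValueError(f"ambiguous family assignment: {sorted(families)}")
    # No usable Pfam hit: use the most similar already classified sequence.
    significant = [h for h in blast_hits if h.evalue <= max_evalue]
    if not significant:
        return None  # sequence stays unclassified
    return min(significant, key=lambda h: h.evalue).subject_family
```

For example, `classify(["PF03952", "PF00113"], [])` yields `AF031`, while a short fragment with no Pfam hit is assigned the family of its best significant BLAST hit, or left unclassified if there is none.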
What is the difference between Pfam and AllFam families?
AllFam's allergen family classification is based on protein family definitions from
Pfam. In contrast to Pfam domains, AllFam families are defined so that each allergen is
assigned to a single AllFam family and that related allergens are grouped into the same
AllFam family. This is achieved by the following steps:
- Merging of closely-related Pfam families into single AllFam families:
- Many families whose members show a low degree of sequence conservation cannot be
represented by a single Hidden Markov Model (HMM), but instead by a group of HMMs, each
of which represents only some family members. In AllFam, these groups of HMMs are merged
into single families. For instance, members of the EF-hand family
(AllFam:AF007) contain highly conserved calcium
binding sites flanked by helices with a much lower degree of conservation. They are
represented in Pfam by multiple HMMs, six of which matched allergen sequences and, hence,
were merged into a single AllFam family.
- Merging of constituent domains of multi-domain proteins into single families:
- Some domains always occur together and are therefore merged into a single AllFam
family. For example, enolases consist of an N-terminal
(Pfam:PF03952) and a C-terminal
(Pfam:PF00113) domain, which are combined into a single
AllFam family: AF031.
- Classifying all constituent domains of different multi-domain proteins that share
a particular domain into a single family:
- Figure 2 shows the domain composition of members of AllFam family
AF043: Hevein-like and class I/II chitinase. The
family is defined by the hevein-like domain (Pfam:PF00187),
which is present in proteins with varying domain compositions. However, this AllFam family
also comprises class II chitinases, which lack a hevein domain. Otherwise, the chitinase
domain (Pfam:PF00182) would appear in two different AllFam families.
Figure 2: Pfam domain compositions of members of AllFam family
AF043: Hevein-like and class I/II chitinase.
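One way to picture the last two merging steps is as a grouping problem: Pfam domains that occur together in at least one protein end up in the same group, and groups that share a domain are joined. The sketch below uses a union-find structure for this. It is a hypothetical illustration, not a description of how the AllFam curators actually define families; the function name `group_domains` and the input format (one list of Pfam accessions per protein) are assumptions.

```python
from typing import Dict, List, Sequence


def group_domains(architectures: Sequence[Sequence[str]]) -> List[List[str]]:
    """Group Pfam domains that are linked through shared proteins."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        # Return the representative of x's group, compressing the path.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for domains in architectures:
        if not domains:
            continue
        first = domains[0]
        find(first)  # register single-domain proteins as well
        for other in domains[1:]:
            union(first, other)

    groups: Dict[str, set] = {}
    for domain in list(parent):
        groups.setdefault(find(domain), set()).add(domain)
    return sorted(sorted(g) for g in groups.values())


# Domain compositions mentioned in the text.
example = [
    ["PF00187"],             # hevein-like domain only
    ["PF00187", "PF00182"],  # class I chitinase: hevein-like + chitinase
    ["PF00182"],             # class II chitinase: no hevein domain
    ["PF03952", "PF00113"],  # enolase: N- and C-terminal domains
]
print(group_domains(example))
```

With this input, the hevein-like and chitinase domains fall into one group (corresponding to AF043) because class I chitinases contain both, and the two enolase domains form a second group (corresponding to AF031).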
You can search AllFam for various types of data:
- Allergen names
- IUIS-approved allergens are named by an abbreviation of the genus name (3-4 letters),
an abbreviation of the species name (1-2 letters) and a number, each item separated by a
blank (e.g. Der p 1 for the first allergen from
Dermatophagoides pteronyssinus). Non-IUIS allergens from AllergenOnline received
genus and species abbreviations according to the IUIS nomenclature, followed by a short
biochemical name or other unambiguous identifier (e.g. Hor v nsLTP-1 for the type 1
non-specific lipid transfer protein from Hordeum vulgare).
- Allergen sources
- Search for allergen sources by species abbreviation, scientific name or common name.
For instance, search for peanut allergens by using "Ara h", "Arachis",
"Arachis hypogaea", or "peanut" as search terms.
- Family names
- Search for AllFam family names, names of the corresponding Pfam domains or keywords
associated with each family.
- AllFam family accession numbers
- AllFam accession numbers have the format AFddd, i.e. the letters 'AF' followed by a three-digit number.
- Pfam domain accession numbers
- Pfam accession numbers have the format PFddddd, i.e. the letters 'PF' followed by a five-digit number.
All searches are case-insensitive, i.e. searches for 'bet v 1', 'Bet v 1' and 'BET V 1'
are equivalent. Multiple words are combined by AND, i.e. all words searched for have
to be found in order to yield a result.
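The search rules can be illustrated with a minimal sketch. It assumes that each 'd' in AFddd and PFddddd stands for one digit and that query words are matched as substrings of a record's text; the page does not say whether the site matches whole words, so that part is a guess. The function names `query_kind` and `matches` are hypothetical and do not reflect the actual AllFam implementation.

```python
import re

# Accession formats as described above: 'AF' + 3 digits, 'PF' + 5 digits.
ALLFAM_ACCESSION = re.compile(r"AF\d{3}", re.IGNORECASE)
PFAM_ACCESSION = re.compile(r"PF\d{5}", re.IGNORECASE)


def query_kind(query: str) -> str:
    """Classify a query as an AllFam accession, a Pfam accession, or free text."""
    query = query.strip()
    if ALLFAM_ACCESSION.fullmatch(query):
        return "allfam_accession"
    if PFAM_ACCESSION.fullmatch(query):
        return "pfam_accession"
    return "text"


def matches(query: str, record_text: str) -> bool:
    """Case-insensitive AND search: every query word must occur in the record."""
    words = query.casefold().split()
    haystack = record_text.casefold()
    # An empty query matches nothing (a design choice of this sketch).
    return bool(words) and all(word in haystack for word in words)
```

Under these assumptions, `matches("bet v 1", ...)`, `matches("Bet v 1", ...)` and `matches("BET V 1", ...)` give the same result for any record, and a query such as "peanut birch" only matches records containing both words.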