Mascot search overview
Mascot is a powerful search engine which uses mass spectrometry data to identify proteins from primary sequence databases.
While a number of similar programs are available, Mascot is unique in that it integrates all of the proven methods of searching. These search methods can be categorised as follows:
- Peptide Mass Fingerprint, in which the only experimental data are peptide mass values (tutorial)
- Sequence Query, in which peptide mass data are combined with amino acid sequence and composition information; this is a super-set of a sequence tag query (more information)
- MS/MS Ion Search, using uninterpreted MS/MS data from one or more peptides (tutorial)
MS/MS data can be searched against both Fasta files and spectral libraries.
The general approach for all types of search is to take a small sample of the protein of interest and digest it with a proteolytic enzyme, such as trypsin. The resulting digest mixture is analysed by mass spectrometry.
Different types of mass spectrometer have different capabilities. A simple instrument will measure a set of molecular weights for the intact mixture of peptides. An instrument with MS/MS capability can additionally provide structural information by recording the fragment ion spectrum of a peptide. Usually, the digest mixture will be separated by chromatography prior to MS/MS analysis, so that MS/MS spectra from individual peptides can be measured.
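To illustrate the kind of structural information a fragment ion spectrum carries, the following minimal sketch computes singly charged b- and y-ion m/z values for a peptide. It is an illustration under stated assumptions, not Mascot's implementation: the function name `fragment_ions` is hypothetical, only unmodified residues and the two most common ion series are considered, and the monoisotopic masses are standard values rounded to five decimal places.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056    # H2O, monoisotopic
PROTON = 1.007276   # mass of a proton


def fragment_ions(peptide):
    """Return (b_ions, y_ions) as lists of singly charged m/z values.

    b-ions contain the N-terminal residues, y-ions the C-terminal
    residues plus one water. Hypothetical helper for illustration only.
    """
    masses = [MONO[aa] for aa in peptide]
    n = len(masses)
    b_ions = [sum(masses[:i]) + PROTON for i in range(1, n)]
    y_ions = [sum(masses[-i:]) + WATER + PROTON for i in range(1, n)]
    return b_ions, y_ions


if __name__ == "__main__":
    b, y = fragment_ions("SAMPLER")
    for i, (bi, yi) in enumerate(zip(b, y), start=1):
        print(f"b{i} = {bi:.4f}    y{i} = {yi:.4f}")
```

The differences between consecutive ions in one series correspond to residue masses, which is why a fragment spectrum constrains the peptide sequence.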
The experimental mass values are then compared with calculated peptide mass or fragment ion mass values, obtained by applying cleavage rules to the entries in a comprehensive primary sequence database. By using an appropriate scoring algorithm, the closest match or matches can be identified. If the "unknown" protein is present in the sequence database, then the aim is to pull out that precise entry. If the sequence database does not contain the unknown protein, then the aim is to pull out those entries which exhibit the closest homology, often equivalent proteins from related species.
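The comparison described above can be sketched in a few lines. The example below applies a simplified trypsin cleavage rule (cleave after K or R, but not before P) to a database sequence, calculates monoisotopic peptide masses, and reports which experimental masses fall within a tolerance. All function names and the default tolerance are assumptions made for illustration; missed cleavages and modifications are ignored, and the simple tolerance test stands in for Mascot's scoring algorithm, which is not reproduced here.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free termini


def tryptic_digest(sequence):
    """Split a protein sequence after K or R, except when followed by P."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER


def match_masses(observed, sequence, tol=0.01):
    """Pair each observed mass with calculated peptides within +/- tol Da."""
    calculated = {p: peptide_mass(p) for p in tryptic_digest(sequence)}
    return [(m, p) for m in observed
            for p, c in calculated.items() if abs(m - c) <= tol]


if __name__ == "__main__":
    entry = "AKRPGKSAMPLER"  # made-up sequence, not a real database entry
    print(tryptic_digest(entry))
    print(match_masses([217.1426], entry))
```

A real search repeats this calculation for every entry in the database and ranks entries by how well their calculated masses explain the experimental data.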
The sequence databases that can be searched on the Matrix Science free, public Mascot server are:
Fasta sequence databases:
SwissProt is a high quality, curated protein database. Sequences are non-redundant, rather than non-identical. SwissProt is ideal for peptide mass fingerprint searches and MS/MS searches of well characterised organisms, where it isn’t essential to match every single spectrum.
EMBL EST divisions contain "single-pass" cDNA sequences, or Expressed Sequence Tags, from a number of organisms. During a Mascot search, the nucleic acid sequences are translated in all six reading frames. There are 10 divisions:
- Environmental_EST
- Fungi_EST
- Human_EST
- Invertebrates_EST
- Mammals_EST
- Mus_EST
- Plants_EST
- Prokaryotes_EST
- Rodents_EST
- Vertebrates_EST
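As an illustration of what translating a nucleic acid sequence in all six reading frames involves, the sketch below translates three offsets on the forward strand and three on the reverse complement using the standard genetic code. The function names and frame labels are hypothetical, and details such as the handling of stop codons, ambiguous bases, or alternative genetic codes in Mascot itself are not covered; here stop codons are shown as `*` and unrecognised codons as `X`.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frame_translation(dna):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to translations."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for label, protein in six_frame_translation("ATGGCCTAA").items():
        print(label, protein)
```

Each of the six translations can then be treated like a protein sequence for digestion and mass matching.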
contaminants is a database of common contaminants compiled by the Max Planck Institute of Biochemistry, Martinsried.
cRAP is a database of common contaminants compiled by the Global Proteome Machine Organization.
Selected UniProt proteomes
Database name | Organism | Taxonomy ID | UniProt proteome ID | Coverage |
---|---|---|---|---|
UP6548_A_thaliana | Arabidopsis thaliana (Mouse-ear cress) (Strain: cv. Columbia) | 3702 | UP000006548 | 99.6% |
UP9136_B_taurus | Bos taurus (Bovine) (Strain: Hereford) | 9913 | UP000009136 | 98.0% |
UP1940_C_elegans | Caenorhabditis elegans (Strain: Bristol N2) | 6239 | UP000001940 | 99.7% |
UP6906_C_reinhardtii | Chlamydomonas reinhardtii (Chlamydomonas smithii) (Strain: CC-503) | 3055 | UP000006906 | 96.0% |
UP437_D_rerio | Danio rerio (Zebrafish) (Brachydanio rerio) (Strain: Tuebingen) | 7955 | UP000000437 | 96.9% |
UP2195_D_discoideum | Dictyostelium discoideum (Slime mold) (Strain: AX4) | 44689 | UP000002195 | 96.0% |
UP803_D_melanogaster | Drosophila melanogaster (Fruit fly) (Strain: Berkeley) | 7227 | UP000000803 | 99.3% |
UP625_E_coli_K12 | Escherichia coli (strain K12) (Strain: K12 / MG1655 / ATCC 47076) | 83333 | UP000000625 | 100.0% |
UP219602_F_oxysporum | Fusarium oxysporum f. sp. radicis-cucumerinum (Strain: Forc016) | 327505 | UP000219602 | 98.5% |
UP5640_H_sapiens | Homo sapiens (Human) | 9606 | UP000005640 | 99.5% |
UP589_M_musculus | Mus musculus (Mouse) (Strain: C57BL/6J) | 10090 | UP000000589 | 99.7% |
UP808_M_pneumoniae | Mycoplasma pneumoniae (strain ATCC 29342 / M129) | 272634 | UP000000808 | 75.9% |
UP59680_O_sativa | Oryza sativa subsp. japonica (Rice) (Strain: cv. Nipponbare) | 39947 | UP000059680 | 87.0% |
UP8311_R_communis | Ricinus communis (Castor bean) | 3988 | UP000008311 | 90.5% |
UP2494_R_norvegicus | Rattus norvegicus (Rat) (Strain: Brown Norway) | 10116 | UP000002494 | 97.8% |
UP2311_S_cerevisiae | Saccharomyces cerevisiae (strain ATCC 204508 / S288c) (Baker’s yeast) | 559292 | UP000002311 | 98.9% |
UP2485_S_pombe | Schizosaccharomyces pombe (strain 972 / ATCC 24843) (Fission yeast) | 284812 | UP000002485 | 97.8% |
UP8227_S_scrofa | Sus scrofa (Pig) (Strain: Duroc) | 9823 | UP000008227 | 96.2% |
UP241690_T_harzianum | Trichoderma harzianum CBS 226.95 | 983964 | UP000241690 | 98.6% |
UP5226_T_rubripes | Takifugu rubripes (Japanese pufferfish) (Fugu rubripes) | 31033 | UP000005226 | 95.2% |
UP279841_T_thermophilus | Thermus thermophilus | 274 | UP000279841 | 85.5% |
UP186698_X_laevis | Xenopus laevis (African clawed frog) (Strain: J) | 8355 | UP000186698 | 95.6% |
UP7305_Z_mays | Zea mays (Maize) (Strain: cv. B73) | 4577 | UP000007305 | 96.4% |
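In every row of the table, the database name appears to be the UniProt proteome ID with its leading zeros removed, followed by an abbreviated organism name. Assuming that pattern holds, a database name can be mapped back to its proteome ID as sketched below; the function name is hypothetical and the rule is inferred from the table rather than documented.

```python
import re


def proteome_id_from_db_name(db_name):
    """Recover a UniProt proteome ID (UP + 9 digits) from a database name.

    Assumes names of the form 'UP<number>_<organism>', as in the table.
    """
    match = re.match(r"UP(\d+)_", db_name)
    if match is None:
        raise ValueError(f"unexpected database name: {db_name!r}")
    return "UP" + match.group(1).zfill(9)


if __name__ == "__main__":
    print(proteome_id_from_db_name("UP5640_H_sapiens"))
```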