Mascot search overview

Mascot is a powerful search engine which uses mass spectrometry data to identify proteins from primary sequence databases.

While a number of similar programs are available, Mascot is unique in that it integrates all of the proven methods of searching. These different search methods can be categorised as follows:

  • Peptide Mass Fingerprint, in which the only experimental data are peptide mass values
  • Sequence Query, in which peptide mass data are combined with amino acid sequence and composition information; this is a superset of a sequence tag query
  • MS/MS Ion Search, which uses uninterpreted MS/MS data from one or more peptides

MS/MS data can be searched against both Fasta files and spectral libraries.
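A spectral library search compares an experimental MS/MS spectrum with previously acquired, identified spectra rather than with values calculated from sequence. One generic way to measure how alike two spectra are is the cosine (normalised dot product) of their binned intensities. The sketch below illustrates that general technique only; it is not a description of how Mascot scores spectral library matches, and the 1 Da bin width and the function names are assumptions.

```python
import math


def bin_spectrum(peaks, bin_width=1.0):
    """Sum the intensities of (m/z, intensity) peaks into bins of bin_width."""
    bins = {}
    for mz, intensity in peaks:
        key = int(mz // bin_width)
        bins[key] = bins.get(key, 0.0) + intensity
    return bins


def cosine_similarity(query, library, bin_width=1.0):
    """Cosine of the angle between two binned spectra: 1.0 = same shape, 0.0 = no shared bins."""
    q = bin_spectrum(query, bin_width)
    lib = bin_spectrum(library, bin_width)
    # Dot product over bins present in the query; missing library bins count as zero
    dot = sum(v * lib.get(k, 0.0) for k, v in q.items())
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in lib.values())
    )
    return dot / norm if norm else 0.0
```

Because the score is normalised, a library spectrum that differs from the query only by an overall intensity scale still scores 1.0.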

The general approach for all types of search is to take a small sample of the protein of interest and digest it with a proteolytic enzyme, such as trypsin. The resulting digest mixture is analysed by mass spectrometry.
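As a minimal sketch of the digestion step, the function below applies the commonly quoted trypsin rule (cleave on the C-terminal side of lysine K or arginine R, but not when the next residue is proline P) to a protein sequence. The function name and the `missed_cleavages` parameter are illustrative assumptions; the enzyme definitions Mascot actually uses are configured on the server and are not reproduced here.

```python
import re


def tryptic_digest(sequence, missed_cleavages=0):
    """Return the peptides produced by cleaving after K/R, except before P."""
    # Index just after each K or R that is not followed by P
    cuts = [m.end() for m in re.finditer(r"[KR](?!P)", sequence)]
    # Boundaries: sequence start, internal cut sites, sequence end (no duplicate end)
    bounds = [0] + [c for c in cuts if c < len(sequence)] + [len(sequence)]
    peptides = []
    for i in range(len(bounds) - 1):
        # Join up to `missed_cleavages` extra consecutive fragments
        for j in range(i + 1, min(i + 2 + missed_cleavages, len(bounds))):
            peptides.append(sequence[bounds[i]:bounds[j]])
    return peptides
```

For example, `tryptic_digest("AKPLRGKR")` keeps `AKP` together (K before P is not cleaved) and yields `["AKPLR", "GK", "R"]`.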

Different types of mass spectrometer have different capabilities. A simple instrument will measure a set of molecular weights for the intact mixture of peptides. An instrument with MS/MS capability can additionally provide structural information by recording the fragment ion spectrum of a peptide. Usually, the digest mixture will be separated by chromatography prior to MS/MS analysis, so that MS/MS spectra from individual peptides can be measured.
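To make "fragment ion spectrum" concrete, the sketch below computes the singly charged b- and y-ion m/z values expected when a peptide breaks along its backbone, plus the [M+H]+ mass of the intact peptide. It assumes standard monoisotopic residue masses, unmodified residues and charge 1+ only; the function names are hypothetical, and a real search also considers other ion series, charge states and modifications.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids, unmodified
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565


def precursor_mh(peptide):
    """[M+H]+ of the intact peptide: residue sum + water + proton."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def b_y_ions(peptide):
    """Singly charged b1..b(n-1) (N-terminal) and y1..y(n-1) (C-terminal) ions."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    n = len(masses)
    b = [sum(masses[:i]) + PROTON for i in range(1, n)]
    y = [sum(masses[n - i:]) + WATER + PROTON for i in range(1, n)]
    return b, y
```

For the dipeptide `GK`, this gives b1 ≈ 58.0287, y1 ≈ 147.1128 and [M+H]+ ≈ 204.1343.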

The experimental mass values are then compared with calculated peptide mass or fragment ion mass values, obtained by applying cleavage rules to the entries in a comprehensive primary sequence database. By using an appropriate scoring algorithm, the closest match or matches can be identified. If the "unknown" protein is present in the sequence database, then the aim is to pull out that precise entry. If the sequence database does not contain the unknown protein, then the aim is to pull out those entries which exhibit the closest homology, often equivalent proteins from related species.
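For a peptide mass fingerprint, the comparison can be sketched as below: each database entry is represented by its calculated peptide masses, and entries are ranked by how many experimental masses fall within a tolerance of any calculated value. This naive match count is a stand-in chosen for illustration only; Mascot itself uses a probability-based score. The function names, the 50 ppm default tolerance and the example entries are hypothetical.

```python
def count_matches(experimental, calculated, tol_ppm=50.0):
    """Count experimental masses lying within tol_ppm of any calculated mass."""
    hits = 0
    for mass in experimental:
        tol = mass * tol_ppm / 1e6  # relative tolerance converted to Da
        if any(abs(mass - c) <= tol for c in calculated):
            hits += 1
    return hits


def rank_entries(experimental, database, tol_ppm=50.0):
    """Rank entries (name -> calculated masses) by match count, best first."""
    scores = {
        name: count_matches(experimental, masses, tol_ppm)
        for name, masses in database.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical experimental masses and database entries, for illustration only
experimental = [1000.50, 1500.70, 2100.00]
database = {
    "PROT_A": [1000.51, 1500.69, 1800.00],
    "PROT_B": [1000.49, 3000.00],
}
print(rank_entries(experimental, database))
```

Here `PROT_A` explains two of the three experimental masses and `PROT_B` one, so `PROT_A` ranks first.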

The sequence databases that can be searched on the Matrix Science free, public Mascot server are:

Fasta sequence databases:

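SwissProt is a high-quality, curated protein database. Its sequences are non-redundant rather than merely non-identical: the products of one gene in a given species are merged into a single entry, rather than only exact duplicate sequences being removed. SwissProt is ideal for peptide mass fingerprint searches and for MS/MS searches of well-characterised organisms, where it isn’t essential to match every single spectrum.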

EMBL EST divisions contain "single-pass" cDNA sequences, or Expressed Sequence Tags, from a number of organisms. During a Mascot search, the nucleic acid sequences are translated in all six reading frames. There are 10 divisions:

  • Environmental_EST
  • Fungi_EST
  • Human_EST
  • Invertebrates_EST
  • Mammals_EST
  • Mus_EST
  • Plants_EST
  • Prokaryotes_EST
  • Rodents_EST
  • Vertebrates_EST
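The six-frame translation applied to EST sequences can be sketched as follows: each strand (the sequence and its reverse complement) is read in three frames, offset by 0, 1 and 2 bases. This assumes the standard genetic code (NCBI translation table 1); rendering unrecognised codons as `X` and stops as `*` are choices made for this sketch, and the function names are hypothetical rather than part of Mascot.

```python
from itertools import product

# Standard genetic code; codons enumerated with bases in TCAG order,
# first base varying slowest
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def translate(dna):
    """Translate complete codons from the start of dna; unknown codons become X."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame(dna):
    """Return translations of frames +1, +2, +3, -1, -2, -3."""
    dna = dna.upper()
    reverse_complement = dna.translate(COMPLEMENT)[::-1]
    return [translate(strand[f:]) for strand in (dna, reverse_complement) for f in range(3)]
```

For example, `six_frame("ATGGCCTAA")` returns `["MA*", "WP", "GL", "LGH", "*A", "RP"]`; each frame is then digested and matched as if it were a protein sequence.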

contaminants is a database of common contaminants compiled by the Max Planck Institute of Biochemistry, Martinsried.

cRAP is a database of common contaminants compiled by the Global Proteome Machine Organization.

Selected UniProt proteomes

Database name | Organism | Taxonomy ID | UniProt proteome ID | Coverage
UP6548_A_thaliana | Arabidopsis thaliana (Mouse-ear cress) (Strain: cv. Columbia) | 3702 | UP000006548 | 99.6%
UP9136_B_taurus | Bos taurus (Bovine) (Strain: Hereford) | 9913 | UP000009136 | 98.0%
UP1940_C_elegans | Caenorhabditis elegans (Strain: Bristol N2) | 6239 | UP000001940 | 99.7%
UP6906_C_reinhardtii | Chlamydomonas reinhardtii (Chlamydomonas smithii) (Strain: CC-503) | 3055 | UP000006906 | 96.0%
UP437_D_rerio | Danio rerio (Zebrafish) (Brachydanio rerio) (Strain: Tuebingen) | 7955 | UP000000437 | 96.9%
UP2195_D_discoideum | Dictyostelium discoideum (Slime mold) (Strain: AX4) | 44689 | UP000002195 | 96.0%
UP803_D_melanogaster | Drosophila melanogaster (Fruit fly) (Strain: Berkeley) | 7227 | UP000000803 | 99.3%
UP625_E_coli_K12 | Escherichia coli (strain K12) (Strain: K12 / MG1655 / ATCC 47076) | 83333 | UP000000625 | 100.0%
UP219602_F_oxysporum | Fusarium oxysporum f. sp. radicis-cucumerinum (Strain: Forc016) | 327505 | UP000219602 | 98.5%
UP5640_H_sapiens | Homo sapiens (Human) | 9606 | UP000005640 | 99.5%
UP589_M_musculus | Mus musculus (Mouse) (Strain: C57BL/6J) | 10090 | UP000000589 | 99.7%
UP808_M_pneumoniae | Mycoplasma pneumoniae (strain ATCC 29342 / M129) | 272634 | UP000000808 | 75.9%
UP59680_O_sativa | Oryza sativa subsp. japonica (Rice) (Strain: cv. Nipponbare) | 39947 | UP000059680 | 87.0%
UP8311_R_communis | Ricinus communis (Castor bean) | 3988 | UP000008311 | 90.5%
UP2494_R_norvegicus | Rattus norvegicus (Rat) (Strain: Brown Norway) | 10116 | UP000002494 | 97.8%
UP2311_S_cerevisiae | Saccharomyces cerevisiae (strain ATCC 204508 / S288c) (Baker’s yeast) | 559292 | UP000002311 | 98.9%
UP2485_S_pombe | Schizosaccharomyces pombe (strain 972 / ATCC 24843) (Fission yeast) | 284812 | UP000002485 | 97.8%
UP8227_S_scrofa | Sus scrofa (Pig) (Strain: Duroc) | 9823 | UP000008227 | 96.2%
UP241690_T_harzianum | Trichoderma harzianum CBS 226.95 | 983964 | UP000241690 | 98.6%
UP5226_T_rubripes | Takifugu rubripes (Japanese pufferfish) (Fugu rubripes) | 31033 | UP000005226 | 95.2%
UP279841_T_thermophilus | Thermus thermophilus | 274 | UP000279841 | 85.5%
UP186698_X_laevis | Xenopus laevis (African clawed frog) (Strain: J) | 8355 | UP000186698 | 95.6%
UP7305_Z_mays | Zea mays (Maize) (Strain: cv. B73) | 4577 | UP000007305 | 96.4%

Spectral library databases:

  • NIST_Mouse_IonTrap
  • NIST_S.cerevesiae_IonTrap
  • PRIDE_Contaminants
  • PRIDE_Human