Several genes are well established as markers for species identification, including nuclear genes (e.g., ribosomal RNA genes), mitochondrial genes (e.g., COI), and chloroplast genes (e.g., rbcL). With NGS, the question asked of a sample is no longer “Are species X, Y, or Z present in the sample?” but “Which species are present in the sample?”
Because all of the sequences obtained can be compared with a specific DNA database, each match between an NGS sequence and a database entry yields a species identification. The result is a list of the species detected rather than a presence/absence result for a few targeted species. Additionally, with appropriate software, the proportion of DNA sequences assigned to each species can be calculated. Because the method is untargeted, even exotic species can be identified.
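The idea of turning database matches into a species list with per-species proportions can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline: the reference sequences and species names are invented, `species_profile` is a hypothetical helper, and exact string matching stands in for the alignment-based comparison that real software would use.

```python
from collections import Counter

# Hypothetical reference database: barcode sequence -> species name.
# All sequences here are invented for illustration only.
REFERENCE_DB = {
    "ACGTACGTAA": "Species A",
    "ACGTTCGTAA": "Species B",
    "GGGTACCTAA": "Species C",
}


def species_profile(reads, reference_db):
    """Return {species: fraction of matched reads}, using exact matching.

    Reads with no database match are ignored (assumption: a real
    workflow would report them separately as unassigned).
    """
    counts = Counter(
        reference_db[read] for read in reads if read in reference_db
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {species: n / total for species, n in counts.items()}


# Five reads: three match Species A, one matches Species C, one is unmatched.
reads = ["ACGTACGTAA", "ACGTACGTAA", "ACGTACGTAA", "GGGTACCTAA", "TTTTTTTTTT"]
profile = species_profile(reads, REFERENCE_DB)

for species, fraction in sorted(profile.items()):
    print(f"{species}: {fraction:.0%}")
# Species A: 75%
# Species C: 25%
```

Note that the output is a list of whatever species matched, not an answer about a predefined target. Species B is in the database but absent from the result because no read matched it.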
The Challenge of Fragmented DNA
DNA-based methods depend on obtaining DNA fragments with sufficient integrity for the analysis. In some products, particularly highly processed ones, ingredient DNA can be highly fragmented or even absent. When DNA is highly fragmented, the DNA-based method used must be able to detect fragments as short as 100 base pairs, or even shorter.
The shorter the DNA fragment to be analyzed, the more difficult it is to differentiate between closely related species. The best strategy is to use a DNA sequencing method that obtains the full nucleotide (A, T, G, C) sequence of the target region. Real-time polymerase chain reaction (PCR), by contrast, reports only a fluorescent signal rather than a sequence, which limits its ability to detect cross-species reactivity. It may therefore produce false positive results, especially in complex food products containing multiple ingredients.
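Why shorter fragments discriminate less well can be shown with a toy example. The sketch below assumes two invented, already aligned 20-base sequences for two hypothetical closely related species that differ at only two positions. It counts how many possible fragments of a given length read identically in both species. The function names are illustrative.

```python
def mismatches(seq_a, seq_b):
    """Count positions where two aligned, equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def indistinguishable_windows(seq_a, seq_b, fragment_length):
    """Count fragment start positions where both sequences are identical."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    starts = range(len(seq_a) - fragment_length + 1)
    return sum(
        seq_a[i:i + fragment_length] == seq_b[i:i + fragment_length]
        for i in starts
    )


# Invented sequences differing at index 4 and index 15 only.
species_x = "ACGTACGTACGTACGTACGT"
species_y = "ACGTTCGTACGTACGAACGT"

print(mismatches(species_x, species_y))                     # 2
print(indistinguishable_windows(species_x, species_y, 5))   # 6
print(indistinguishable_windows(species_x, species_y, 12))  # 0
```

With 5-base fragments, 6 of the 16 possible fragments carry no distinguishing position, so a read from either species would look the same. With 12-base fragments, every fragment spans at least one difference. Real barcode regions are far longer, but the same trade-off applies.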
DNA Barcoding Strategy
The best-known use of DNA sequencing for food authenticity is probably DNA barcoding, a strategy already in use by many regulatory entities in the sector. One of the most widely used barcoding methods is the one for fish-based products, which regulatory bodies in the U.S. and Europe use to identify fish species. However, because this method can identify only a single species per sample, it is not suitable for processed samples that contain multiple ingredients (species).
With NGS, a similar barcoding approach can be used: defined DNA regions are sequenced and the results are compared with the same DNA/species databases used for the classic Sanger DNA sequencing approach.
The DNA Sequence Database
A key point when using a sequence-producing method like NGS is the reliability of the databases used for species identification. In recent years, considerable effort has gone into ensuring the reliability of the DNA sequences these databases contain, including sequencing reference material and adding the results to the database. Analyzing public data with bioinformatic tools is also valuable, so long as the DNA sequence analysis tools are used correctly. Multiple DNA sequence alignments and phylogenetic analyses are crucial for ensuring the reliability of the sequences included in the databases. Because NGS is highly customizable, any lab can build its own DNA database and control its quality.
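As a small illustration of database quality control, the sketch below flags identical reference sequences that are labeled with more than one species. This is a much simpler check than the multiple alignments and phylogenetic analyses described above and does not replace them. The records are invented and `conflicting_entries` is a hypothetical helper.

```python
from collections import defaultdict


def conflicting_entries(records):
    """Return {sequence: sorted species names} for sequences that are
    assigned to more than one species in the reference records."""
    species_by_sequence = defaultdict(set)
    for species, sequence in records:
        # Normalize case so "acgt" and "ACGT" count as the same sequence.
        species_by_sequence[sequence.upper()].add(species)
    return {
        sequence: sorted(names)
        for sequence, names in species_by_sequence.items()
        if len(names) > 1
    }


# Invented (species, sequence) reference records for illustration only.
records = [
    ("Species A", "ACGTACGTAA"),
    ("Species A", "acgtacgtaa"),
    ("Species B", "ACGTACGTAA"),
    ("Species C", "GGGTACCTAA"),
]

conflicts = conflicting_entries(records)
print(conflicts)
# {'ACGTACGTAA': ['Species A', 'Species B']}
```

A flagged sequence is not necessarily an error. Closely related species can genuinely share a barcode, so each conflict is a prompt for manual review rather than automatic removal.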
Wider Availability of NGS
Given the recognition of NGS as a powerful tool, the first workflow for using NGS for species identification in food was announced for the market in November 2018, making the method available to any laboratory working in food production. NGS has also entered standardization, namely at the ISO level, where work has started on defining the minimum requirements for all pre- and post-bioinformatic analyses required during NGS analysis. These requirements cover not only the DNA sequence data itself, which depends on the NGS platform used, but also the definition of the DNA regions to be analyzed and the DNA databases used for species identification.