In addition to the data sources briefly discussed above, food safety professionals may be able to access a number of other structured and unstructured data sources. These include the often large amounts of data that are automatically captured by recording devices in food processing and retail environments (e.g., temperature data for heat treatment steps or refrigerated storage) and employment data (identifying the individuals who perform certain tasks, such as sanitation, on a given day). Unstructured data that could be mined for relevant information include, but are not limited to, video-captured data of facilities and employees.
It is also possible to rapidly acquire, often at no cost (other than computer and personnel time), large sets of metadata associated with samples that have been collected for microbiological or other testing. For example, public data sources provide weather data (temperature, rain events, wind direction and speed, etc.) that can be linked to a sample collection site and a specific sample collection date. These types of data can be used to rapidly determine whether out-of-spec samples (for example, samples positive for a pathogen or indicator organism) are associated with specific weather patterns (for instance, rain in the preceding day(s)), which can help in root cause analysis; for instance, an association with rain may point to roof leaks or other water intrusion as a root cause. The same metadata could also be used for predictive analytics that may show an increased risk of pathogen findings or spoilage events after certain weather patterns, which could trigger enhanced preventive efforts.
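The rain association described above can be sketched in a few lines of Python. This is a minimal illustration, not a validated method: the rainfall values, sample results, the two-day look-back window, and the function names `rain_in_preceding_days` and `positivity_by_rain` are all assumptions made for the example. It only tabulates positivity rates with and without preceding rain; a real analysis would need far more samples and a formal statistical test.

```python
from datetime import date, timedelta


def rain_in_preceding_days(rain_by_date, sample_date, days=2):
    """Return True if any rain was recorded in the `days` days before sampling."""
    return any(
        rain_by_date.get(sample_date - timedelta(days=offset), 0.0) > 0.0
        for offset in range(1, days + 1)
    )


def positivity_by_rain(samples, rain_by_date, days=2):
    """Count positive and total samples, split by whether rain preceded sampling."""
    counts = {True: [0, 0], False: [0, 0]}  # rain flag -> [positives, total]
    for sample_date, is_positive in samples:
        flag = rain_in_preceding_days(rain_by_date, sample_date, days)
        counts[flag][0] += int(is_positive)
        counts[flag][1] += 1
    return {
        ("rain" if flag else "no_rain"): {
            "positives": pos,
            "total": total,
            "rate": pos / total if total else None,
        }
        for flag, (pos, total) in counts.items()
    }


# Hypothetical data: daily rainfall in mm (missing dates are treated as dry)
rain_by_date = {
    date(2024, 5, 1): 0.0,
    date(2024, 5, 2): 12.5,
    date(2024, 5, 3): 0.0,
    date(2024, 5, 4): 0.0,
    date(2024, 5, 5): 0.0,
    date(2024, 5, 6): 3.0,
}

# Hypothetical test results: (collection date, positive for indicator organism?)
samples = [
    (date(2024, 5, 2), False),
    (date(2024, 5, 3), True),
    (date(2024, 5, 4), False),
    (date(2024, 5, 5), False),
    (date(2024, 5, 6), True),
    (date(2024, 5, 7), True),
]

summary = positivity_by_rain(samples, rain_by_date)
print(summary)
```

With real data, the same join (sample date to preceding weather) would typically be followed by a test of association before drawing any root cause conclusion.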
Examples of Approaches in Food
One of the most mature examples of the use of large datasets in food safety is the use of WGS-based subtyping methods by both public health and regulatory agencies. In the U.S., the CDC and state partners are performing WGS on every human clinical Listeria monocytogenes isolate. Similarly, regulatory agencies such as the U.S. FDA are currently performing WGS on foodborne pathogen isolates obtained from foods and food-associated sources. WGS determines the sequence of virtually all 3 million nucleotides (As, Ts, Cs, and Gs) in the Listeria monocytogenes genome, typically with at least 20-fold coverage, thereby creating 60 million data points per genome, which are used for extremely high-resolution subtyping. Use of these WGS tools has significantly improved the ability of public health agencies to detect human listeriosis outbreaks, allowing identification of more outbreaks than was possible with previous subtyping tools (i.e., pulsed-field gel electrophoresis), including smaller outbreaks (with fewer than five cases) that may previously have gone undetected. As these tools are applied to other pathogens, in particular Salmonella, the number of detected outbreaks caused by these pathogens will likely increase considerably.
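The data volume quoted above follows from multiplying genome length by sequencing coverage. The short calculation below makes that explicit; the two input values come from the text, while the variable and function names are chosen here for illustration.

```python
# Values taken from the text above
genome_length_nt = 3_000_000  # approximate L. monocytogenes genome size, in nucleotides
coverage_fold = 20            # minimum sequencing coverage (each position read ~20 times)


def base_calls_per_genome(genome_length, coverage):
    """Approximate number of individual base calls generated for one genome."""
    return genome_length * coverage


total_base_calls = base_calls_per_genome(genome_length_nt, coverage_fold)
print(f"{total_base_calls:,} base calls per genome")
```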
In addition to WGS, metagenomics-based tools also provide large datasets (often gigabases of sequence data), which can help characterize total microbial populations in samples. These tools have allowed for detection of new or previously unrecognized pathogens in clinical and food samples and have been shown to detect pathogens that were missed by traditional microbiological methods. These methods can also facilitate detection and identification of spoilage issues and could be used as untargeted screening tools for raw material streams and ingredients.
Use of geographic information system (GIS)-based datasets to predict and manage food safety risks is also rapidly gaining traction. For example, recent studies have shown how GIS data can be used to predict locations and time intervals that may represent a higher risk for foodborne pathogen contamination in fields.
While big data-based approaches clearly have considerable potential to improve food safety and food quality, a number of challenges remain for industry to take advantage of these tools. Most of these challenges are not unique to this industry, but some of them may be more pronounced. For example, data capture in the food industry is still often manual and often involves paper records that cannot easily be used for data mining. Also, there are few trained data scientists who are familiar with food systems issues (or food systems scientists who can work with large datasets), which further limits the ability of industry to develop and implement effective systems that use large datasets to address food safety and quality issues. Based on these and other challenges, there is a clear need for the industry to prepare to take advantage of big-data tools and solutions for food safety and quality dilemmas.
What Could the Future Bring?
With the rapid advances in both collection and analysis of big data, it can be valuable to speculate on what the medium- and long-term future may look like as these tools are increasingly applied to food safety and quality. For example, the use of WGS for characterization of foodborne pathogen isolates by regulatory and public health agencies in the U.S. has gone hand-in-hand with rapid public release of full sequencing data. This puts industry in a position where it may soon be able to monitor subtype data for human clinical isolates and then rapidly detect possible outbreaks, e.g., through comparisons with subtype data for isolates from processing facilities and with other data (e.g., distribution pathways, purchase patterns). In the processing environment, integration of diverse data sources with historical microbial testing data may allow not only for improved and accelerated root cause analysis, but also for prediction of time intervals that may present lower or higher risk for spoilage or food safety issues. This information could be used to adjust food safety and operational practices in near real-time to include additional barriers and controls, including adjustments to preventative maintenance schedules. Data sources that could be used in these analyses include weather patterns, environmental parameters in a facility (humidity, dew points, etc.), and equipment-related parameters (vibration, flow rates, etc.).
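As a rough illustration of the subtype comparison described above, the sketch below counts nucleotide differences between pre-aligned sequences from clinical and facility isolates and reports closely related pairs. Everything in it is an assumption made for the example: the 10-nucleotide toy sequences, the isolate names, the `max_snps` threshold, and the helper functions `snp_distance` and `close_matches`. Actual WGS-based comparisons operate on whole genomes with dedicated, validated pipelines, and a close match alone does not establish a causal link.

```python
def snp_distance(seq_a, seq_b):
    """Count positions where two aligned sequences carry different unambiguous bases."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a, seq_b)
        if a != b and a in "ACGT" and b in "ACGT"  # skip gaps and ambiguous calls
    )


def close_matches(clinical, facility, max_snps):
    """Return (clinical_id, facility_id, distance) for pairs within max_snps."""
    matches = []
    for clin_id, clin_seq in clinical.items():
        for fac_id, fac_seq in facility.items():
            distance = snp_distance(clin_seq, fac_seq)
            if distance <= max_snps:
                matches.append((clin_id, fac_id, distance))
    return sorted(matches, key=lambda match: match[2])


# Hypothetical toy alignments (real genomes are ~3 million nucleotides long)
clinical = {"clin_01": "ACGTACGTAC", "clin_02": "ACGTTCGAAC"}
facility = {"fac_A": "ACGTACGTAT", "fac_B": "TCGAACCTGC"}

# Hypothetical threshold; appropriate cutoffs depend on organism and method
matches = close_matches(clinical, facility, max_snps=2)
print(matches)
```

In practice, any such match would be one input among several, weighed alongside distribution pathways, purchase patterns, and epidemiological evidence.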