“Big data” is being talked about everywhere, increasingly also in the context of food safety and food quality. For example, while only one symposium covered “big data” at the 2014 annual meeting of the International Association for Food Protection (IAFP), the 2015 IAFP annual meeting included at least four sessions that mentioned “big data” in the session title or abstract. While the potential of big data and data analytics to improve our ability to address food safety and quality issues is increasingly recognized, use of these tools in food safety and quality still appears to be limited. Even where “big data” are used in this space, many may argue that the amount of data involved rarely qualifies as truly big data; rather, these data are often simply large traditional datasets. Although big data may be making their way into food safety and quality only slowly, food science professionals need to critically discuss the impact of big data and associated analytics so that these tools can be implemented in a timely and appropriate manner and used to improve decision making.
Big Data Introduction
While many definitions exist for “big data,” a common definition reads along the lines of “Big data is a broad term for datasets so large or complex that traditional data processing applications are inadequate” (Wikipedia, accessed Aug. 3, 2015). Building on Douglas Laney’s characterization of data by the “3Vs,” a “4V” definition of big data is often used today, which can be summarized as “Big data represents high volume, high velocity, high veracity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery, and process optimization.” “Big data” is also often linked to predictive analytics, in contrast to the more typical use of data in food safety, which focuses on retrospective identification of associations and, increasingly, on real-time or near real-time monitoring of processes. Most uses of large datasets and big data analytics in food safety and quality to date focus on providing improved root cause and retrospective analyses, but the development and use of predictive analytics in food safety is likely to grow quickly in the near future.
Big Data Sources for Food
Many of the early discussions on big data in food safety have focused on the use of genomics data and social media-related information. Whole genome sequencing (WGS)-based subtyping has been used for more than five years to create large datasets that support high-resolution subtype characterization of foodborne pathogens (and spoilage organisms), which allows for better outbreak detection and source attribution. Importantly, WGS data for foodborne pathogens are often rapidly released by public health and regulatory agencies, which allows industry to use these data as well. For example, WGS data for Listeria monocytogenes isolates identified as having been obtained from ice cream in Kansas became publicly available soon after a listeriosis outbreak linked to ice cream (with cases in Kansas) was reported in early 2015. Other omics datasets, such as metagenomics data, have also been used to identify and characterize food spoilage issues. These types of data sources are also likely to become increasingly available to the food industry.
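To make the idea of high-resolution subtyping more concrete, the following is a minimal sketch of one way isolates could be compared: by counting single nucleotide differences between sequences that have already been aligned. It is an illustration only, not the method used by any public health or regulatory agency. The isolate names, the toy sequences, and the cutoff of two differences are all hypothetical, and real WGS analyses work on whole genomes with considerably more involved pipelines.

```python
from itertools import combinations


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count differing positions between two aligned sequences.

    Positions where either sequence has an ambiguous base (N) or a gap (-)
    are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    skip = {"N", "-"}
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != b and a not in skip and b not in skip
    )


def closely_related_pairs(isolates: dict, max_snps: int) -> list:
    """Return (name_a, name_b, distance) for pairs within max_snps differences."""
    pairs = []
    for (name_a, seq_a), (name_b, seq_b) in combinations(isolates.items(), 2):
        distance = snp_distance(seq_a, seq_b)
        if distance <= max_snps:
            pairs.append((name_a, name_b, distance))
    return pairs


# Hypothetical toy data: ten-base fragments standing in for aligned genomes.
isolates = {
    "clinical_1": "ACGTACGTAC",
    "ice_cream_1": "ACGTACGTAT",
    "unrelated_1": "TCGAACCTGC",
}

# The cutoff of 2 is an assumption chosen for this toy example only.
related = closely_related_pairs(isolates, max_snps=2)
```

In this toy dataset, only the clinical isolate and the ice cream isolate fall within the cutoff, which is the kind of signal that could prompt a closer look at a possible common source.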
Use of social media-related information has seen considerable early enthusiasm based on initial reports that suggested that “Google Flu Trends” can allow for early detection of flu outbreaks. Subsequent studies have suggested, however, that this tool may often predict flu outbreaks inaccurately. Nevertheless, a recent CDC report suggests that mining of Yelp reviews can help public health agencies identify restaurant-linked foodborne disease outbreaks that might otherwise have gone undetected. Similarly, sales data, including data from shopper club cards and similar instruments, are available to many retailers and companies and can be used to help detect and identify foodborne disease outbreaks, aiding in rapid initiation of product recalls and other consumer safety actions.
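As a rough illustration of what mining reviews might involve at its simplest, the sketch below flags restaurants that have several reviews mentioning illness-related terms. This is not the approach described in the CDC report. The term list, the review records, and the threshold of two reports are hypothetical, and simple keyword matching of this kind would likely produce false positives that require human follow-up.

```python
from collections import Counter

# Hypothetical term list, chosen only for illustration.
ILLNESS_TERMS = ("food poisoning", "vomit", "diarrhea", "sick after", "stomach cramps")


def mentions_illness(text: str) -> bool:
    """Return True if the review text contains any illness-related term."""
    lowered = text.lower()
    return any(term in lowered for term in ILLNESS_TERMS)


def flag_restaurants(reviews: list, min_reports: int = 2) -> list:
    """Return names of restaurants with at least min_reports illness mentions."""
    counts = Counter(
        review["restaurant"] for review in reviews if mentions_illness(review["text"])
    )
    return sorted(name for name, n in counts.items() if n >= min_reports)


# Hypothetical review records.
reviews = [
    {"restaurant": "Restaurant A", "text": "Got food poisoning here last week."},
    {"restaurant": "Restaurant A", "text": "We were both vomiting the next morning."},
    {"restaurant": "Restaurant B", "text": "Great tacos, friendly staff."},
    {"restaurant": "Restaurant B", "text": "I felt sick after the oysters."},
]

flagged = flag_restaurants(reviews, min_reports=2)
```

Here only Restaurant A reaches the threshold; Restaurant B has a single illness mention and is not flagged. A similar counting approach could in principle be applied to sales records, although the fields involved would differ.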