A novel machine-learning approach could lead to faster identification of the animal source of certain Salmonella outbreaks. Scientists led by researchers at the University of Georgia Center for Food Safety in Griffin, Ga., published research on their methodology in the January 2019 issue of Emerging Infectious Diseases.
“In a foodborne outbreak, quickly identifying the implicated food commodity is necessary to recall contaminated products in a timely manner to prevent more consumers from contracting the pathogen,” says Xiangyu Deng, PhD, an assistant professor of food microbiology at the center who led the project. Meats and other livestock products are major vehicles of Salmonella infection.
Dr. Deng’s methodology is a source attribution tool that draws on the large and growing volume of whole genome sequencing (WGS) data for one of the most prevalent Salmonella serotypes. He used more than 1,000 genomes to predict the animal sources of Salmonella Typhimurium.
“We trained a machine-learning algorithm called Random Forest with many Salmonella Typhimurium genomes from livestock origins. The algorithm learned how to classify Salmonella Typhimurium genomes by their association with livestock animals,” he explains. “Then, we can use the algorithm to predict animal sources of query genomes, which can be outbreak isolates.”
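The train-then-predict workflow Dr. Deng describes can be illustrated with a minimal sketch. This is not the team's code: it is a small, simplified random-forest-style classifier written in plain Python, and the six presence/absence markers, the toy isolates, and every function name are hypothetical. Each tree is grown on a bootstrap sample of isolates and tests a random subset of markers at each split; as a simplification, a node falls back to trying every marker when none of the sampled ones separates its isolates.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, markers):
    """Return the marker whose absent/present split is purest, or None."""
    best_score, best_marker = None, None
    for m in markers:
        absent = [y for x, y in zip(rows, labels) if x[m] == 0]
        present = [y for x, y in zip(rows, labels) if x[m] == 1]
        if not absent or not present:
            continue  # this marker does not separate anything at this node
        score = (len(absent) * gini(absent) + len(present) * gini(present)) / len(labels)
        if best_score is None or score < best_score:
            best_score, best_marker = score, m
    return best_marker

def build_tree(rows, labels, rng, n_try):
    """Grow one tree; a leaf is a label, an inner node is (marker, absent, present)."""
    if len(set(labels)) == 1:
        return labels[0]
    n_markers = len(rows[0])
    marker = best_split(rows, labels, rng.sample(range(n_markers), n_try))
    if marker is None:  # simplification: fall back to trying every marker
        marker = best_split(rows, labels, range(n_markers))
    if marker is None:  # identical genomes with mixed labels: majority leaf
        return Counter(labels).most_common(1)[0][0]
    absent = [(x, y) for x, y in zip(rows, labels) if x[marker] == 0]
    present = [(x, y) for x, y in zip(rows, labels) if x[marker] == 1]
    return (marker,
            build_tree([x for x, _ in absent], [y for _, y in absent], rng, n_try),
            build_tree([x for x, _ in present], [y for _, y in present], rng, n_try))

def train_forest(rows, labels, n_trees=25, n_try=2, seed=0):
    """Train each tree on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = rng.choices(range(len(rows)), k=len(rows))
        forest.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx], rng, n_try))
    return forest

def predict(forest, genome):
    """Predict a source for one query genome by majority vote of the trees."""
    votes = []
    for node in forest:
        while isinstance(node, tuple):
            marker, absent, present = node
            node = present if genome[marker] == 1 else absent
        votes.append(node)
    return Counter(votes).most_common(1)[0][0]

# Hypothetical training data: 1 means a marker is present in the genome.
PATTERNS = {"poultry": [1, 1, 0, 0, 0, 0],
            "bovine": [0, 0, 1, 1, 0, 0],
            "swine": [0, 0, 0, 0, 1, 1]}
rows, labels = [], []
for source, pattern in PATTERNS.items():
    for _ in range(8):  # isolates carrying the typical pattern
        rows.append(list(pattern))
        labels.append(source)
    for m, value in enumerate(pattern):  # isolates missing one typical marker
        if value == 1:
            variant = list(pattern)
            variant[m] = 0
            rows.append(variant)
            labels.append(source)

forest = train_forest(rows, labels)
print(predict(forest, [0, 0, 1, 1, 0, 0]))  # query isolate with the bovine pattern
```

In practice a maintained library implementation would be the sensible choice; the sketch only shows the shape of the workflow, in which labeled genomes go in during training and a query genome comes back with a predicted source.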
By using the machine-learning approach, Dr. Deng’s team identified a small set of about 50 genetic markers that were sufficient for robust livestock source prediction of the pathogen. “This finding may lead to a rapid and scalable source identification tool without analyzing entire genomes,” he says.
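How a large marker set might be narrowed to a small informative one can be sketched as follows. The article does not say how the team ranked markers, so this example swaps in a deliberately simple stand-in: each marker is scored by the drop in Gini impurity from a single split of all isolates on that marker, and the top-scoring markers are kept. The eight markers, the toy isolates, and the function names are all hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def marker_score(rows, labels, m):
    """Drop in Gini impurity from splitting all isolates on marker m."""
    absent = [y for x, y in zip(rows, labels) if x[m] == 0]
    present = [y for x, y in zip(rows, labels) if x[m] == 1]
    if not absent or not present:
        return 0.0  # a marker found in all or no isolates carries no signal
    children = (len(absent) * gini(absent) + len(present) * gini(present)) / len(labels)
    return gini(labels) - children

def top_markers(rows, labels, k):
    """Return the k best-scoring marker indices (ties broken by index)."""
    scores = [(marker_score(rows, labels, m), m) for m in range(len(rows[0]))]
    ranked = sorted(scores, key=lambda s: (-s[0], s[1]))
    return [m for _, m in ranked[:k]]

# Hypothetical data: markers 0-5 track a source, marker 6 is in every
# isolate, and marker 7 is in half of the isolates from each source.
rows, labels = [], []
for source, first in (("poultry", 0), ("bovine", 2), ("swine", 4)):
    for i in range(4):
        row = [0] * 8
        row[first] = row[first + 1] = 1
        row[6] = 1
        row[7] = i % 2
        rows.append(row)
        labels.append(source)

selected = top_markers(rows, labels, k=6)
print(selected)  # [0, 1, 2, 3, 4, 5]
```

The two uninformative markers score at or near zero and drop out, which is the idea behind a reduced panel: a classifier restricted to the selected markers would need far less data per isolate than one that reads the whole genome.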
In commenting on the novel methodology, James S. Dickson, PhD, professor, Department of Animal Science, Inter-Departmental Program in Microbiology, Iowa State University, Ames, says, “Whole genome sequencing is a very powerful tool, but it’s very labor intensive to analyze the results. Using a computer algorithm to identify similar cultures from patients involved in an outbreak allows public health epidemiologists to begin to track a source sooner. This should lead to a quicker identification of the outbreak’s source.”
Dr. Dickson believes the new methodology is a logical next step in using WGS. “Because there’s a massive amount of data, using a computer algorithm to sort it makes a lot of sense and allows the data to be analyzed faster,” he says. “The key is to be sure that the computer knows what to look for in the data, which is what scientists at the University of Georgia are working on. If they can work through all of the details of telling the computer what to look for and what to ignore, then this could be a significant step forward in public health. The technology would not be limited to foodborne diseases but could be adapted for use with other types of human illnesses.”
Dr. Deng notes that, despite its advantages, the approach has some limitations. First, “We have only worked on one serotype,” he says. While Typhimurium is among the most common serotypes in many parts of the world, more than 2,600 different reported serotypes of Salmonella exist.
Second, Dr. Deng’s team has so far explored only a few of the major livestock sources, including poultry, bovine, and swine. Furthermore, some Salmonella strains are generalists in terms of host preference. They can jump among different animals, making source attribution challenging.
Says Dr. Dickson, “As with most innovations, the distance between proof of concept and application can vary widely. I think that they will need to broaden their approach and include international trading partners as well.”