Episode 2: How to select the right data for effective predictive analytics

Predictive analytics sounds very promising: it can underpin a robust, proactive strategy for preventing food safety incidents that damage brand trust and reputation. The technology can also ease a food company’s decision-making process by reducing the stress of uncertainty.

Fully understanding how predictive analytics works, and how it can help food safety leaders in practice, is still a brainteaser. Our latest short video series aims to end this struggle.

In this episode, Giannis Stoitsis, CTO and partner at Agroknow, presents the right data subsets to address the questions raised in our previous episode.

Is it possible to predict, with high confidence, how many food safety incidents we are going to have for each ingredient category in 2020?

There are some data selection decisions to be made:

  • Are we interested in the global picture, using a very large global dataset to build a generic food safety predictor?
  • Or do we need a data subset that is only associated with specific geographical regions of interest?
  • Furthermore, should we split data according to the product categories that they refer to, removing irrelevant product recalls and rejections, so that they do not influence the prediction?
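To make these choices concrete, here is a minimal Python sketch of the three options: keeping the global dataset, narrowing it to a region of interest, and splitting it by product category. The records, field names, and helper functions are hypothetical and only illustrate the idea; they are not the FOODAKAI data model.

```python
from collections import defaultdict

# Hypothetical incident records; field names and values are illustrative only.
incidents = [
    {"year": 2018, "region": "EU", "category": "nuts", "type": "recall"},
    {"year": 2018, "region": "US", "category": "dairy", "type": "recall"},
    {"year": 2019, "region": "EU", "category": "dairy", "type": "rejection"},
    {"year": 2019, "region": "EU", "category": "nuts", "type": "recall"},
    {"year": 2019, "region": "Asia", "category": "seafood", "type": "rejection"},
    {"year": 2017, "region": "US", "category": "nuts", "type": "recall"},
]


def select_subset(records, regions=None, categories=None):
    """Keep records matching the given regions and categories (None keeps all)."""
    return [
        r for r in records
        if (regions is None or r["region"] in regions)
        and (categories is None or r["category"] in categories)
    ]


def split_by_category(records):
    """Group records by product category so each group can be modeled separately."""
    groups = defaultdict(list)
    for r in records:
        groups[r["category"]].append(r)
    return dict(groups)


global_set = select_subset(incidents)               # the global picture
eu_only = select_subset(incidents, regions={"EU"})  # one region of interest
eu_by_category = split_by_category(eu_only)         # one subset per category

for category, rows in eu_by_category.items():
    print(category, len(rows))
```

Each resulting subset can then feed its own predictor, so recalls and rejections from irrelevant categories or regions do not influence the prediction.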

So, get ready for some serious data preparation, splitting, and re-combination work.

A predictive model is typically built and evaluated by splitting the real, historical data into training and testing subsets. The algorithm is parameterized using the training data, and its predictive capabilities are then evaluated using the testing data.
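As a rough illustration of the train/test idea, the sketch below splits hypothetical yearly incident counts chronologically, fits a deliberately naive baseline (the average yearly count per category) on the training years, and measures its error on the held-out year. The numbers, field names, and the baseline itself are assumptions chosen for simplicity, not the algorithm discussed in the video.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical yearly incident counts per ingredient category.
records = [
    {"year": 2016, "category": "nuts", "incidents": 10},
    {"year": 2017, "category": "nuts", "incidents": 14},
    {"year": 2018, "category": "nuts", "incidents": 12},
    {"year": 2019, "category": "nuts", "incidents": 15},
    {"year": 2016, "category": "dairy", "incidents": 20},
    {"year": 2017, "category": "dairy", "incidents": 22},
    {"year": 2018, "category": "dairy", "incidents": 24},
    {"year": 2019, "category": "dairy", "incidents": 21},
]


def chronological_split(rows, first_test_year):
    """Train on earlier years, test on later ones, so no future data leaks in."""
    train = [r for r in rows if r["year"] < first_test_year]
    test = [r for r in rows if r["year"] >= first_test_year]
    return train, test


def fit_baseline(train):
    """'Parameterize' a naive model: the mean yearly count for each category."""
    counts = defaultdict(list)
    for r in train:
        counts[r["category"]].append(r["incidents"])
    return {category: mean(values) for category, values in counts.items()}


def mean_absolute_error(model, test):
    """Average gap between predicted and actual counts on the test rows."""
    return mean(abs(model[r["category"]] - r["incidents"]) for r in test)


train, test = chronological_split(records, first_test_year=2019)
model = fit_baseline(train)
mae = mean_absolute_error(model, test)
print(model, mae)
```

Splitting by year rather than at random mirrors the real task, where past incidents are used to predict a future year.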

How do we generate reliable predictions?

Several data preparation and combination techniques help ensure that the model can generate reliable predictions. Data problems such as missing or inconsistent values then come into play, and there are plenty of techniques to help address them as well.
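Two common techniques of this kind are normalizing inconsistent labels and imputing missing values. The sketch below shows one simple version of each, assuming hypothetical records, a hand-written synonym table, and median imputation per category; real pipelines may well use different rules.

```python
from collections import defaultdict
from statistics import median

# Hypothetical raw records with inconsistent labels and missing counts.
raw = [
    {"category": " Nuts ", "incidents": 12},
    {"category": "nuts", "incidents": None},
    {"category": "NUTS", "incidents": 16},
    {"category": "Dairy", "incidents": 20},
    {"category": "dairy products", "incidents": None},
    {"category": "dairy", "incidents": 24},
]

# Assumed mapping of alternative labels onto one canonical name.
SYNONYMS = {"dairy products": "dairy"}


def normalize_category(label):
    """Trim whitespace, lowercase, and map known synonyms to a canonical label."""
    cleaned = label.strip().lower()
    return SYNONYMS.get(cleaned, cleaned)


def clean(records):
    """Normalize labels, then fill missing counts with the category median."""
    normalized = [
        {**r, "category": normalize_category(r["category"])} for r in records
    ]
    observed = defaultdict(list)
    for r in normalized:
        if r["incidents"] is not None:
            observed[r["category"]].append(r["incidents"])
    medians = {category: median(values) for category, values in observed.items()}
    # A category with no observed values keeps None instead of a guessed number.
    return [
        {
            **r,
            "incidents": medians.get(r["category"])
            if r["incidents"] is None
            else r["incidents"],
        }
        for r in normalized
    ]


cleaned = clean(raw)
for row in cleaned:
    print(row)
```

Normalizing labels before imputing matters here: otherwise " Nuts ", "nuts", and "NUTS" would be treated as three separate categories with their own medians.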

The takeaway to keep in mind is that some serious data processing and management need to take place before we can deploy algorithms.

How are the AI algorithms used over all these data combinations to deliver a reliable and efficient food safety prediction service? Find out in our next video!

If you’d like to discover how FOODAKAI can help your Food Safety & Quality team prevent product recalls by monitoring & predicting risks, schedule a call with us!


“Funded with the support of the European Commission, and more specifically the project CYBELE “FOSTERING PRECISION AGRICULTURE AND LIVESTOCK FARMING THROUGH SECURE ACCESS TO LARGE-SCALE HPC-ENABLED VIRTUAL INDUSTRIAL EXPERIMENTATION ENVIRONMENT EMPOWERING SCALABLE BIG DATA ANALYTICS” (Grant No. 825355)”