NEWS / AI TECHNOLOGY

25 November 2020

What is it?

At Artefact, we are so French that we have decided to apply Machine Learning to croissants. This first of two articles explains how we decided to use CatBoost to predict the sales of “viennoiseries”. The most important features driving sales were the previous weekly sales, whether the product was on promotion, and its price.

We will walk you through some nice feature engineering, including cannibalisation, and explain why you sometimes need to update your target variable. We chose Forecast Accuracy and bias as our evaluation metrics. Our second article will explain how we put this model into production, along with some MLOps best practices.

For who?

  • Data Scientists, ML Engineers and data lovers

Context

Model development

Now that we have a well-defined problem and some goals to achieve, we can finally start writing some nice Python code in our notebooks — let the fun begin!

Data request

As in any data science project, it all starts with data. From experience, we strongly recommend submitting your data request as soon as possible. Don’t be shy about asking for a lot of data, and for each data source be sure to identify a referent: someone you can easily contact with your questions about how the data was collected or how it is structured.

Thanks to the different meetings, we were able to draw up a list of the data we could use:

  • Transactional data including price of products.
  • Promotions: a list of all the future promotions and their associated prices.
  • Product information: different characteristics related to the products.

Exploratory Data Analysis (EDA) and outliers detection


From sales prediction to optimal sales prediction

One challenge led us to update our target variable. Sometimes, due to an unexpected rush of customers or a bad forecast, the department experienced a shortage of products before the end of the day. Two phenomena can then occur: the customer, unable to find their product, either buys nothing or buys a similar good. Based on historical data, we inferred some distribution laws (basic statistics) that helped us model this impact, and we updated our target variable so as to predict not the historical sales but the optimal sales for a particular product.
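As a rough illustration of this idea, here is a minimal sketch in pandas. The function name, column names and the uplift factor are all hypothetical: it simply scales up observed sales on out-of-stock days by the share of demand assumed to be lost after the shortage, to approximate the “optimal” sales the model should learn.

```python
import pandas as pd

# Hypothetical sketch: on days flagged as out-of-stock, scale the observed
# (censored) sales up by the share of demand historically lost after the
# shortage, so the target reflects optimal rather than censored sales.
def rebuild_optimal_sales(df: pd.DataFrame,
                          lost_demand_share: float = 0.2) -> pd.Series:
    """df needs columns 'sales' (units sold) and 'out_of_stock' (bool)."""
    factor = 1.0 / (1.0 - lost_demand_share)
    return df["sales"].where(~df["out_of_stock"], df["sales"] * factor)

sales = pd.DataFrame({
    "sales": [100, 80, 120],
    "out_of_stock": [False, True, False],
})
print(rebuild_optimal_sales(sales).tolist())  # [100.0, 100.0, 120.0]
```

In practice the uplift would come from the inferred distribution laws rather than a single constant, but the principle of replacing censored observations in the target is the same.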

Updating the target variable is tricky because it is really hard to know whether the update made sense. Did you actually improve the quality of the data, or make it worse? One way to quantify our impact was to take sales without any stock-outs, create artificial shortages (for instance, remove all sales after 5 or 6 pm) and then try to reconstruct the sales. This method brings us back to a classic supervised problem that we can evaluate objectively.
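The validation trick above can be sketched as follows. The hourly figures and the “share of sales before 5 pm” are made-up numbers for illustration: we censor a clean day at 17:00, reconstruct the daily total, and measure the reconstruction error against the known truth.

```python
import pandas as pd

# Hypothetical sketch: simulate a 5 pm stock-out on a clean day and check
# how well a simple reconstruction recovers the true daily total.
hourly = pd.DataFrame({
    "hour": range(8, 20),
    "units": [5, 8, 10, 12, 15, 14, 10, 9, 8, 6, 7, 6],
})
true_total = hourly["units"].sum()                 # known ground truth

censored = hourly[hourly["hour"] < 17]             # artificial shortage
observed = censored["units"].sum()

share_before_17 = 0.8                              # assumed from history
reconstructed = observed / share_before_17

error = abs(reconstructed - true_total) / true_total
print(reconstructed, round(error, 3))
```

Because the true total is known, the reconstruction error can be measured objectively, which is exactly what makes this a supervised check on the target update.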

As a result, we were able to predict the optimal sales and prevent our algorithm from learning shortage patterns.

Our models

After having properly cleaned our data, we can finally test a few models.


One model vs many models

To sum up, we used one algorithm, CatBoost, to predict all of our 10,000 time series, one for each product and store combination. But what if an item, or a specific store, has a really particular sales pattern? Would the algorithm identify and learn this pattern?
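A minimal sketch of what this single training table might look like, with made-up numbers: every product × store series goes into one frame, with the features the article highlights (last weekly sales as a lag, promotion flag, price), while product and store stay as categorical columns that CatBoost can handle natively.

```python
import pandas as pd

# Hypothetical sketch: one table for all product x store series.
df = pd.DataFrame({
    "store":   ["A", "A", "A", "B", "B", "B"],
    "product": ["croissant"] * 6,
    "week":    [1, 2, 3, 1, 2, 3],
    "promo":   [0, 1, 0, 0, 0, 1],
    "price":   [1.0, 0.8, 1.0, 1.1, 1.1, 0.9],
    "sales":   [100, 150, 110, 80, 82, 130],
})
df = df.sort_values(["store", "product", "week"])

# "Last weekly sales" as a per-series lag feature.
df["sales_last_week"] = df.groupby(["store", "product"])["sales"].shift(1)
train = df.dropna(subset=["sales_last_week"])

# With CatBoost this would then be roughly:
#   model = CatBoostRegressor().fit(train[features], train["sales"],
#                                   cat_features=["store", "product"])
print(train[["store", "week", "sales_last_week"]].to_string(index=False))
```

Keeping store and product as categorical features is what lets one model serve all 10,000 series instead of training one model per series.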

These questions led us to ask: should we cluster our products and stores, and train one algorithm per cluster? Even though decision-tree algorithms should be able to tackle this challenge, we observed limitations in some specific cases.

Boosting algorithms are iterative algorithms built on weak learners that focus on the biggest errors of the previous iteration. This is obviously a bit oversimplified, but it helps to point out one of their limitations: if you don’t normalise your target variable, the algorithm will “only” focus on the products with big errors, which are more likely to be the ones with the biggest sales. As a result, the algorithm may concentrate on the products or stores with higher sales volumes.
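A toy illustration of this effect, with hypothetical numbers: two products are both missed by 10%, but on the raw scale the big seller’s squared error dwarfs the small seller’s, while after a log transform of the target the two contribute comparably to the loss.

```python
import numpy as np

# Hypothetical illustration: the same 10% relative error on a big seller
# and a small seller, compared on the raw scale and on a log1p scale.
high_true, high_pred = 1000.0, 900.0   # big seller, 10% miss
low_true,  low_pred  = 10.0,   9.0     # small seller, 10% miss

raw_errors = [(high_true - high_pred) ** 2, (low_true - low_pred) ** 2]
log_errors = [(np.log1p(high_true) - np.log1p(high_pred)) ** 2,
              (np.log1p(low_true) - np.log1p(low_pred)) ** 2]

print(raw_errors)   # big seller's error dominates by orders of magnitude
print(log_errors)   # both errors contribute comparably
```

This is one reason target transformation (or per-series normalisation) matters before handing thousands of series of very different volumes to a single boosted model.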

We didn’t find the perfect way to address this challenge, but we observed some improvements by clustering our products and stores by family or by selling frequency.

How to evaluate our model?

  1. Cross Validation
  2. The choice of the metric
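The two metrics named earlier, Forecast Accuracy and bias, can be sketched as below. The exact definitions are assumptions on our part (Forecast Accuracy as 1 minus the weighted absolute percentage error, bias as the total over- or under-forecast relative to actual sales), since the article does not spell out its formulas; the sample values are made up.

```python
import numpy as np

# Assumed definitions: Forecast Accuracy = 1 - WAPE, and bias = total
# over/under-forecast relative to total actual sales.
def forecast_accuracy(actual, forecast):
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return 1.0 - np.abs(forecast - actual).sum() / actual.sum()

def bias(actual, forecast):
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return (forecast - actual).sum() / actual.sum()

actual   = [100, 120, 80]
forecast = [110, 110, 85]
print(round(forecast_accuracy(actual, forecast), 3))  # 0.917
print(round(bias(actual, forecast), 3))               # 0.017
```

A positive bias indicates systematic over-forecasting (waste, in a bakery context) and a negative bias systematic under-forecasting (shortages), which is why the two metrics are useful together.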


Final words: some advice for any data project

Key Takeaways
