Watchfinder Case Study

How can leads coming from TV be identified, so that the process leading to a sale can be understood?

The problem

Watchfinder, one of the premier retailers specialising in premium watches, wanted to optimise their ad spending. As their products are high-value and can be seen as an investment, there is generally no instant conversion: a customer will often have several engagements across different channels before deciding to buy. This makes attributing sales a challenge, as the spot prompting the first visit might be crucial in driving the sale but cannot easily be linked to it. A way to identify individual visits generated by TV is needed so that a sale can be traced back to the first engagement.

The solution

Watchfinder contacted Adalyser to see whether there was a way to identify individual leads as coming from TV. At Adalyser we are working with machine learning to bring new insights to our clients alongside our tried and tested analytic models.

Machine learning is the approach of letting a computer “learn” by giving it data together with the desired result and letting it find patterns in that data (link). We have built a model that takes spot and lead information and tries to classify each lead as coming from TV or not.

A natural question is which information is most important for the correct classification of a lead. In our current model, the time since the spot aired matters most. This is not surprising, as there is a well-defined time window in which direct responses to TV occur; the same window is used in our standard attribution models. The device used is the second most important indicator, followed closely by the source of the lead (i.e. direct, organic search, etc.). Next comes the signal-to-noise ratio, which measures the ratio between the number of leads before and after a spot and is correlated with the impact of the spot. Finally, the operating system (OS) also provides a measurable improvement in the prediction accuracy of the algorithm.
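To illustrate how such feature importances can be read off a tree-based model, here is a minimal sketch using scikit-learn on purely synthetic data. The feature names, distributions and the use of scikit-learn here are illustrative assumptions, not our production model:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 2000

# Illustrative synthetic features for each lead (invented for this sketch).
minutes_since_spot = rng.uniform(0, 30, n)
device = rng.integers(0, 3, n)   # encoded device: phone/tablet/desktop
source = rng.integers(0, 4, n)   # encoded source: direct/organic/...
snr = rng.normal(1.0, 0.5, n)    # signal-to-noise ratio around the spot
os_ = rng.integers(0, 3, n)      # encoded operating system

# Synthetic label: leads shortly after the spot count as "from TV" here.
from_tv = (minutes_since_spot < 8).astype(int)

X = np.column_stack([minutes_since_spot, device, source, snr, os_])
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, from_tv)

# The trained forest exposes a normalised importance score per feature.
for name, imp in zip(
    ["minutes_since_spot", "device", "source", "snr", "os"],
    clf.feature_importances_,
):
    print(f"{name}: {imp:.3f}")
```

Because the synthetic label depends only on the time since the spot, that feature dominates the importance scores, mirroring (in a contrived way) the ranking observed in the real model.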

How accurately can this model classify leads? That depends on the strength of the response driven by the TV spot. For spots that drive little traffic, the number of misclassifications rises. For that reason, the model is only applied to spots with a clear response curve. In those cases the accuracy is greater than 80%, and for the largest-impact spots it can be in excess of 90%.

Using this trained model on the Watchfinder data allowed us to identify a large number of leads that came from TV.

The result

Watchfinder is now able to trace sales back to their first contact and understand the life-cycle of a customer leading up to a sale. This enables Watchfinder to attribute the effect of advertising accurately and optimise their ad spending.

Attributing what fraction of leads originated from TV has been the bread and butter of our analytics. For the first time, we are now able to identify individual sessions that came from TV, and this opens up a range of possible analyses. Watchfinder's analysis is but one example of many new possibilities.

How to use machine learning insights in TV attribution

In this blog post, we show how machine learning can be used to attribute leads. Given information on a TV spot and the website traffic after it has aired, can we accurately identify the leads caused by TV?

At Adalyser, we are at the forefront of using machine learning (ML) to enable more targeted spending. The first question when faced with such a task is what method to use.

A gentle introduction to machine learning

Machine learning offers a variety of approaches that let computers “learn” from data rather than being explicitly instructed how to do something. For this to work, we need example measurements, such as a spot's impact, together with the thing we would like to predict, e.g. the number of leads that will result.

First, we need to train our model: we guess the model's parameters and try to predict the “label” (the quantity we want to predict). Then we change the parameters a little in order to improve the prediction. This process is repeated until our predictions stop improving.

Once we have trained our model, we use it to predict the label on data for which we do not know the correct answer.

Let us try to predict the number of leads generated by a TV spot using the impact of the spot as our measurement. The simplest model fits a straight line between those two variables; let's call them the impact i and the number of leads l. This is shown in the animation, where we add more and more training examples and update our model accordingly.

Once we have trained our model, we can use it to predict the number of leads we expect, given a spot's impact. This is of course a simplistic example. In practice, we would consider many more variables, such as the type of product being sold, the TV channel and the time of day, and use a more complex model than a straight line. This is what we do in the next section.
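As a concrete sketch of this idea, the straight-line model l ≈ a·i + b can be trained with the guess-and-improve loop described above (here via gradient descent on the mean squared error). The impacts and lead counts below are invented purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented training data: spot impact i and resulting leads l.
i = rng.uniform(0, 10, 50)
l = 3.0 * i + 5.0 + rng.normal(0, 1.0, 50)  # true relationship plus noise

# Start from a guess and repeatedly nudge the parameters.
a, b = 0.0, 0.0
lr = 0.01  # how big each nudge is
for _ in range(2000):
    pred = a * i + b
    err = pred - l
    a -= lr * 2 * np.mean(err * i)  # gradient of mean squared error w.r.t. a
    b -= lr * 2 * np.mean(err)      # gradient of mean squared error w.r.t. b

print(f"learned: l = {a:.2f} * i + {b:.2f}")

# Once trained, predict the leads for a new spot's impact.
new_impact = 7.5
print("predicted leads:", a * new_impact + b)
```

After enough iterations the learned parameters settle close to the values used to generate the data, at which point further updates stop helping, just as described above.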

Applying machine learning

When choosing a model, there is an inherent trade-off between simplicity and predictive ability. Simple models cannot represent all real-world effects, but they are robust to noise (random variations in the data). This means such models are likely to predict new data about as well as they fit the training data. More advanced models can represent more complex relationships; however, such flexible methods may merely learn to approximate the data used for training and be bad at predicting other data. This is called overfitting.
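A toy way to see this trade-off is to fit polynomials of different complexity to the same noisy samples. The sine-shaped ground truth below is an arbitrary assumption for illustration and has nothing to do with real lead data:

```python
import numpy as np

rng = np.random.default_rng(2)

# Noisy samples of a simple underlying curve.
x_train = np.sort(rng.uniform(0, 1, 20))
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, 20)
x_test = np.sort(rng.uniform(0, 1, 200))
y_test = np.sin(2 * np.pi * x_test) + rng.normal(0, 0.2, 200)

def mse(coeffs, x, y):
    """Mean squared error of a polynomial fit on the given points."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

simple = np.polyfit(x_train, y_train, 3)    # low-degree: smooth, robust
complex_ = np.polyfit(x_train, y_train, 9)  # high-degree: can chase the noise

print("degree 3:  train", mse(simple, x_train, y_train),
      "test", mse(simple, x_test, y_test))
print("degree 9:  train", mse(complex_, x_train, y_train),
      "test", mse(complex_, x_test, y_test))
```

The high-degree fit is guaranteed to match the training points at least as closely as the low-degree one, yet it will typically do worse on the held-out test points: that gap is overfitting.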

Naive Bayes is a simple classifier that is nonetheless competitive at some tasks, such as text classification (e.g. spam filters). It is also often used as a baseline against which to compare other machine learning techniques. For this reason, we apply it to our data first, even though it is unreasonable to expect good results. After training, the Naive Bayes classifier has an overall accuracy of 73%. While 90% of the leads not from TV were classified correctly, only 43% of the leads from TV were correctly identified. This is not very useful yet, so we need to move to a more sophisticated machine learning technique.
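For illustration, such a Naive Bayes baseline can be set up in a few lines with scikit-learn. The synthetic data below only mimics the general idea (TV-driven leads arriving soon after a spot, with a raised signal-to-noise ratio); it is not Watchfinder's data, and its accuracy will not match the figures above:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(3)
n = 1000

# Synthetic stand-in for lead features: TV leads cluster at short
# times-since-spot, non-TV background traffic is spread out evenly.
from_tv = rng.integers(0, 2, n)
minutes = np.where(from_tv == 1,
                   rng.exponential(4, n),   # TV responses arrive quickly
                   rng.uniform(0, 60, n))   # background traffic is flat
snr = np.where(from_tv == 1, rng.normal(2, 0.5, n), rng.normal(1, 0.5, n))

X = np.column_stack([minutes, snr])
X_tr, X_te, y_tr, y_te = train_test_split(X, from_tv, random_state=0)

# Fit the baseline and report held-out accuracy.
nb = GaussianNB().fit(X_tr, y_tr)
print("Naive Bayes accuracy:", nb.score(X_te, y_te))
```

A baseline like this takes seconds to train, which is exactly why it is worth running before reaching for anything heavier.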

Decision trees are a general-purpose method for prediction and data mining. They are robust to the inclusion of irrelevant variables and naturally handle both numerical and categorical variables. However, they are prone to overfitting. Several methods combat this by building many trees and averaging them in some way; one such method is called a random forest (pun intended).

How accurately can we predict whether a lead was caused by television with a random forest? Using tracking and spot data, 88% of leads in our test data are labelled correctly. As with the Naive Bayes classifier, 91% of leads not from TV are classified correctly. However, in contrast to the Naive Bayes, 81% of leads from TV are also classified correctly. As we only use spot and lead information, this can be used to identify visitors to a website in real time.
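A random forest version of the same kind of sketch, again on invented data, shows how the per-class figures discussed above can be inspected. The scikit-learn usage is an illustrative assumption; the 88%/91%/81% figures come from our internal model, not from this sketch:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
n = 2000

# Invented lead features, as in the earlier sketch.
from_tv = rng.integers(0, 2, n)
minutes = np.where(from_tv == 1, rng.exponential(4, n), rng.uniform(0, 60, n))
device = rng.integers(0, 3, n)
snr = np.where(from_tv == 1, rng.normal(2, 0.5, n), rng.normal(1, 0.5, n))

X = np.column_stack([minutes, device, snr])
X_tr, X_te, y_tr, y_te = train_test_split(X, from_tv, random_state=0)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
pred = rf.predict(X_te)

# Overall accuracy, plus how each class fares separately.
print("overall accuracy:    ", rf.score(X_te, y_te))
print("recall, not-TV leads:", recall_score(y_te, pred, pos_label=0))
print("recall, TV leads:    ", recall_score(y_te, pred, pos_label=1))
```

Reporting recall per class, rather than a single accuracy number, is what reveals whether the model is actually catching the TV leads or just labelling everything as background traffic.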

Can we do better than that? The most versatile machine learning technique is training neural networks. These have been shown to beat humans at complex tasks such as categorising images or recognising handwritten digits. However, for the case of a single lead and spot, a neural network yields the same accuracy of 88%. This suggests there is insufficient information in the tracking data to achieve greater accuracy: some leads that look identical come from different sources, and more information is needed to distinguish them. We will look at such approaches in future blog posts.
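For completeness, a small neural network can be tried on the same kind of synthetic data using scikit-learn's MLPClassifier. The architecture and data here are illustrative assumptions, not the production setup:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(5)
n = 2000

# Invented lead features, as in the earlier sketches.
from_tv = rng.integers(0, 2, n)
minutes = np.where(from_tv == 1, rng.exponential(4, n), rng.uniform(0, 60, n))
snr = np.where(from_tv == 1, rng.normal(2, 0.5, n), rng.normal(1, 0.5, n))
X = np.column_stack([minutes, snr])

X_tr, X_te, y_tr, y_te = train_test_split(X, from_tv, random_state=0)

# Feature scaling matters for neural networks; a small two-layer net
# is plenty for two input features.
net = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(16, 16), max_iter=1000, random_state=0),
)
net.fit(X_tr, y_tr)
print("neural network accuracy:", net.score(X_te, y_te))
```

On data this simple, the extra flexibility of the network buys little over the random forest, echoing the observation above that the ceiling is set by the information in the features, not by the model.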