February 8, 2018

How we predicted conversions using Big Data & Machine Learning

Here at T2O media we love “storydoing” more than storytelling. That is why we decided to explore how Big Data and Machine Learning, two concepts that sound pretty interesting in theory, could be used in our digital media campaigns for the benefit of our clients. In particular, we wanted to answer two questions: how could we improve our clients’ businesses, and how could these technologies help us boost the performance of our communication and advertising efforts?

For some time now, our multidisciplinary team has been evaluating different ways to bring Big Data into our agency’s projects, and we have concluded that Big Data should be a means of reaching our goals rather than the goal itself. What is truly valuable about having access to large amounts of information is how you “put it to work”. That is exactly where Machine Learning algorithms come into play: they can identify patterns and relationships within the vast amount of data users generate through their interactions with our brands and, from there, try to predict the outcome of future brand interactions.


Big Data and Machine Learning: first steps

This is what we did to get the project started:

  • We chose one of our clients, a hotel chain, to conduct the pilot, as they are one of our most innovative partners in travel, a very competitive vertical.
  • We identified our main data source: DoubleClick Campaign Manager was a perfect fit, as we already had data on about 500k users, each described by 110 columns, from the Spanish market alone.
  • We set our goal: predicting a user’s likelihood of conversion, which in this case was booking a room in one of our client’s hotels.


Data preparation, algorithm selection and training

1. We collected the wide array of information at our disposal: user identifier, type of device, geolocation, seasonality and medium.

2. Using Google Cloud Platform’s Dataproc (a managed Apache Spark service, for our more advanced readers), we cleaned and processed all the data to get it ready for the algorithm.

3. After thorough research, we chose Random Forest, an algorithm well suited to classification problems. How does it work? It builds many decision trees, each trained on a random sample of the users and a random subset of the previously chosen variables, and combines their votes to classify each user as a converter or a non-converter.

4. We fine-tuned the model by removing outliers and unusual data that only added noise, such as data produced by test accounts or web crawlers.

We also changed the class weightings so the data would be more balanced: by default there are far more non-converters than converters, but the converters were the users we cared about most, so we adjusted the weights to help the algorithm find our most relevant users.
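One common way to rebalance classes is to weight each class inversely to its frequency. The sketch below shows that heuristic in plain Python. It is an assumption about what “adjusting the weights” could look like, not a record of the exact settings used in the project; the function name `balanced_class_weights` and the label counts are made up for illustration.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes (here, converters) receive a larger weight, so mistakes
    on them cost the learning algorithm more.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Illustrative labels only: 95 non-converters (0) and 5 converters (1).
labels = [0] * 95 + [1] * 5
weights = balanced_class_weights(labels)
print(weights)
```

This is the same formula scikit-learn applies when a classifier is given `class_weight="balanced"`, so in practice the weights rarely need to be computed by hand.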

Similarly, not all columns were equally important. In fact, we found that concentrating on just a few highly relevant variables was far more beneficial than including a large number of columns with user characteristics. In a Big Data context, a “less is more” approach is more effective: reducing the number of branches in a decision tree makes the work of the AI tool a bit easier.


5. We took our algorithm into the “real world”: we configured it to predict, based on the data collected, whether a user would convert the day after their interaction. The result: the algorithm correctly predicted 85% of converters.
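As a rough illustration of the Random Forest idea from step 3 and the “share of converters correctly predicted” figure from step 5, here is a toy sketch in plain Python. It is not the Spark pipeline described above. Each “tree” is reduced to a single split (a stump), the two features (sessions and pages viewed) and all data are invented, and reading the 85% figure as recall on converters is our assumption.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def majority(labels):
    """Most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows, labels, feature):
    """One-split tree: pick the threshold with the lowest weighted Gini."""
    values = sorted({row[feature] for row in rows})
    best = None  # (impurity, threshold, left_label, right_label)
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
        right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or impurity < best[0]:
            best = (impurity, threshold, majority(left), majority(right))
    if best is None:
        # The feature is constant in this sample: always predict the majority.
        label = majority(labels)
        return (feature, float("inf"), label, label)
    return (feature,) + best[1:]


def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train stumps on bootstrap samples, each using one random feature."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample with replacement
        feature = rng.randrange(len(rows[0]))           # random feature choice
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], feature))
    return forest


def predict(forest, row):
    """Majority vote over all trees."""
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return majority(votes)


def recall(y_true, y_pred):
    """Share of actual converters that were predicted as converters."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    actual_pos = sum(y_true)
    return true_pos / actual_pos if actual_pos else 0.0


# Invented data: [sessions, pages_viewed]; label 1 = converter.
rows = [[1, 2], [2, 1], [3, 3], [2, 2], [10, 11], [11, 10], [12, 12], [11, 11]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

forest = fit_forest(rows, labels)
predictions = [predict(forest, row) for row in rows]
print("recall on converters:", recall(labels, predictions))
```

The toy data is trivially separable, so the score it prints says nothing about real campaign data. A production model would grow full trees, sample features at every split, and be evaluated on users it has never seen.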


Random Forest algorithm running on 40,000 users


Applying the algorithm to digital media executions

We take a pragmatic approach to Big Data and Machine Learning, in line with our company’s philosophy: our focus is always the pursuit of results. Accordingly, our objective was to activate all the data compiled in digital environments and bring the algorithm into our media buys.

To achieve this, we generated audience lists in DoubleClick Campaign Manager based on the Machine Learning results: one grouping all users with a high chance of conversion according to the model, and another grouping the users identified as having no chance of conversion.
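Conceptually, building these lists comes down to thresholding the model’s scores. The sketch below assumes the model outputs a conversion probability per user. The function name, the user IDs, and the cut-offs (0.8 and 0.05) are hypothetical, and uploading the lists to an ad platform is out of scope here.

```python
def build_audience_lists(scores, high=0.8, low=0.05):
    """Split user IDs into likely converters and likely non-converters.

    `scores` maps a user ID to a predicted conversion probability.
    Users between the two cut-offs stay in neither list.
    """
    converters = sorted(uid for uid, p in scores.items() if p >= high)
    non_converters = sorted(uid for uid, p in scores.items() if p <= low)
    return converters, non_converters


# Hypothetical model output: user ID -> predicted probability of booking.
scores = {"u1": 0.92, "u2": 0.01, "u3": 0.40, "u4": 0.85, "u5": 0.03}
smart_list, exclusion_list = build_audience_lists(scores)
print(smart_list)      # users to target
print(exclusion_list)  # users to exclude from retargeting
```

Where to place the cut-offs is a business decision: a stricter `high` threshold yields a smaller but more efficient list, while a looser one trades efficiency for reach.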

This last piece of information proved very useful because our client, a hotel chain, had many users who would book a room through an aggregator and visit the hotel website only to check prices or images. We could now identify these users and exclude them from our retargeting efforts, since it makes no sense to try to sell rooms to someone who has already booked through another platform.

We ran these audience lists in DoubleClick’s DSP alongside our traditional lists, and the results were as expected. In our programmatic campaigns, the Smart Lists of converters performed far better than our traditional lists, in some cases delivering 5x the efficiency of similar retargeting efforts on metrics such as CPO. The non-converter lists, on the other hand, confirmed that this kind of audience would not convert.

All these findings provide us with the ability to optimize our media executions in a more intelligent and more automated way than before.

Big Data and Machine Learning help us become more sophisticated and boost the performance of our digital campaigns in many ways, including:

  • Providing a better user experience, as we can show a more customized message based on our learnings.
  • Improving conversion rates, through specific actions for each user type and the stage of the customer journey they are in.
  • Delivering a more efficient media buy, by virtually eliminating the allocation of funds to non-converters.
  • Identifying strengths and weaknesses of our sites by understanding which elements are most influential in users’ decisions.


Big Data and Machine Learning: future and improvements

With this practical case we have witnessed the many advantages these technologies have in store for us. But our learning process doesn’t end here: this project will keep growing and improving every day 😉

We have started to upload our Smart Lists to platforms such as Google Analytics 360, which allows us to reach these audiences in Google AdWords and YouTube, and soon we will also upload them to other DSPs such as MediaMath.

Additionally, we are testing new algorithms such as Deep Neural Networks with TensorFlow, Google’s framework for deep learning.

Lastly, we have expanded the reach of our project by integrating data from more than one touch point: call center data is one of the latest sources of information we are adding to learn how offline users interact and to gradually complete our understanding of our consumers’ behavior.

Working with Big Data and Machine Learning has been key to our development of other services, such as product recommendation engines, qualification of converters (i.e., efforts will differ between a user with $10,000 in her shopping cart and one with only $1,000), and integrated platforms where we can customize our message according to this deeper knowledge of our audience (dynamic landings, etc.).

Making innovation a reality is possible, interesting and, above all, profitable for brands.


Want to learn more? Here are some interesting articles, all about Big Data and Machine Learning: