Late last year, we shared a blog post detailing a new tool to understand changes in conversion. As a quick recap, we wanted to build a way for the business strategy team to quickly identify what could be driving variation in conversion. Our solution combined dbt, survival analysis, and an interactive data visualization with Tableau.
Some impressive stats from our blog post:
However, data projects are not Instagram and shouldn’t be evaluated by likes and shares. If we look at internal usage stats of the tool, the picture is less rosy.
Based on last-viewed data as of Apr. 2020:
- Only 6 unique viewers in the previous month, with none of those users being the intended business user audience (they were mostly data scientists)
- Of the intended business user audience, the most recent view was two months ago. One of the key stakeholders hadn’t even looked at it since it launched in Sept.
This post is not about pointing fingers. Rather, it’s a public reflection on learning from mistakes, something I believe the tech community needs to be more honest about :).
What went wrong?
The model got worse over time
In addition to the 20-odd features in our model, we added a “month” feature corresponding to the creation month for the observation. In theory, this would serve as an “unexplained” catch-all for variance not covered by our 20 features.
In the first few months after launch, the coefficient on this unexplained feature was pretty small. Over time, however, it grew into the largest explanatory feature, rendering our model somewhat pointless.
This is a well-understood effect called model drift, which basically means that a model gets worse over time as the data and the reality behind it change.
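One way to catch this kind of drift early is to track how much of the model's total coefficient weight sits on the catch-all feature each time the model is refit. The snippet below is a minimal sketch of that idea, not what we actually ran: the feature names, the coefficient values, and the 30% threshold are all made up for illustration, and it assumes features are on comparable scales so that coefficient magnitudes can be compared.

```python
def catch_all_share(coefs, catch_all_prefix="month"):
    """Fraction of total absolute coefficient weight held by catch-all features."""
    total = sum(abs(v) for v in coefs.values())
    if total == 0:
        return 0.0
    catch_all = sum(
        abs(v) for name, v in coefs.items() if name.startswith(catch_all_prefix)
    )
    return catch_all / total


def drift_alerts(refits, threshold=0.3):
    """Return the labels of refits whose catch-all share exceeds the threshold."""
    return [label for label, coefs in refits if catch_all_share(coefs) > threshold]


# Hypothetical coefficients from two refits of the same model (illustrative only).
refits = [
    ("2019-10", {"feature_a": 0.8, "feature_b": -0.5, "month": 0.1}),
    ("2020-04", {"feature_a": 0.3, "feature_b": -0.2, "month": 0.9}),
]

alerts = drift_alerts(refits)
print(alerts)
```

A check like this could run alongside each refit and notify the model owners, so the problem surfaces before the dashboard quietly stops being useful.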
The tool never made it into an existing business process
After we finished the tool, we didn’t have a great plan for how it would get integrated into existing business processes. Should we spend 10 minutes at every weekly strategy meeting looking at the results? Or maybe we just automatically email or Slack the results to the relevant people?
A lot of these ideas were discussed, but fell through the cracks. Since people weren’t forced to develop a habit around using the tool, they naturally forgot about it.
What did we learn?
Do the simplest solution first
We probably didn’t need a fully productionized ML solution to improve our understanding of conversion. For example, consider this much simpler solution:
- Collect a dozen candidate features and fit a model offline. You can do this using a fancy ML library, but logistic regression in Excel works fine too
- Pick the top 3 most predictive features and sense-check with a domain expert
- Track those 3 features as KPIs in a line chart
While it doesn’t give the precision of being able to say “Conversion is down 10%, 5% of which is due to feature X, 3% is due to feature Y…”, this simple approach may be enough if all you want to know is if you’re moving in the right general direction. It’s also much easier to interpret and, best of all, can be completed in a week by a junior analyst.
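To make the simple approach concrete, here is a rough sketch in plain Python with no ML library. Everything in it is an assumption for illustration: the data is synthetic, the feature names `f0` to `f11` are placeholders, and ranking by the absolute coefficient on standardized features is just one reasonable way to define “most predictive”.

```python
import math
import random


def standardize(rows):
    """Scale each column to mean 0 and standard deviation 1."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
        for c, m in zip(cols, means)
    ]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def fit_logistic(rows, labels, lr=0.5, epochs=200):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    n, d = len(rows), len(rows[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(rows, labels):
            z = b + sum(wi * xi for wi, xi in zip(w, x))
            err = 1.0 / (1.0 + math.exp(-z)) - y  # predicted probability minus label
            grad_b += err
            for j in range(d):
                grad_w[j] += err * x[j]
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def top_features(names, weights, k=3):
    """Names of the k features with the largest absolute coefficient."""
    ranked = sorted(zip(names, weights), key=lambda pair: abs(pair[1]), reverse=True)
    return [name for name, _ in ranked[:k]]


# Synthetic data: a dozen candidate features, only three of which drive conversion.
random.seed(0)
names = [f"f{i}" for i in range(12)]
true_w = {0: 1.5, 3: -1.2, 7: 1.0}  # made-up effects, for illustration only
rows, labels = [], []
for _ in range(800):
    x = [random.gauss(0, 1) for _ in names]
    z = sum(w * x[i] for i, w in true_w.items())
    labels.append(1 if random.random() < 1 / (1 + math.exp(-z)) else 0)
    rows.append(x)

weights, bias = fit_logistic(standardize(rows), labels)
top3 = top_features(names, weights)
print(top3)
```

The three names that come out are the candidates to sense-check with a domain expert before tracking them as KPIs.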
Modeling is never done
If you really want to invest in the more holistic approach to understanding conversion, don’t make it a quarter-long “sprint”. Build a team around it and treat the final dashboard like you would treat any other feature in your core product.
Revisit your model at least once a quarter and do a feature audit. Data changes, customer behavior changes, and your KPIs should change accordingly.
An additional benefit of having a dedicated full-time team is that you can invest more in internal product marketing and make sure the tool gets used in the right way across the business.
Failure is okay
Rather than forcing people to use a sub-optimal product, or becoming discouraged, it’s better to foster a culture that is honest when something isn’t working.
Most products fail, especially on their first try. Then you learn from those mistakes, dust yourself off and try again.