Your Data Science Problem Is a Product Design Problem

Consider Netflix. You open the app and see a top title in the hero placement, followed by rows of recommendations. The laziest recommendation is the popularity chart: here is what everyone else is watching. This is a self-reinforcing loop. These titles are popular because they are shown prominently. They are shown prominently because they are popular.

If you click on a title from the chart, what does that tell Netflix about you? A large number of people are being served this same impression. Your click could mean you saw the ads elsewhere and are following up. It could mean the thumbnail is compelling. It could mean you like the franchise. It could mean you are bored and this is the path of least resistance. Netflix can take a guess, and that is what all the machine learning is trying to do. Take a guess.

Now consider a different design. What if the subscription charge reloaded a wallet each month? The wallet is used to "purchase" access to content. You could design various microtransaction structures: charging by viewing duration, charging only if you watch past 30% of the content, or letting unspent coins carry over across months. Premium content could require a slightly higher payment.
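To make the wallet idea concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the `Wallet` class, the 100-coin monthly allowance, and the method names are hypothetical. Only the 30% threshold and the carry-over rule come from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class Wallet:
    """Hypothetical coin wallet that a subscription reloads each month."""
    monthly_coins: int = 100                      # assumed allowance per billing cycle
    balance: int = 0
    ledger: list = field(default_factory=list)    # (title, coins) pairs actually charged

    def reload(self) -> None:
        # Unspent coins carry over: the allowance is added to the balance, not reset.
        self.balance += self.monthly_coins

    def charge_if_watched(self, title: str, price: int,
                          fraction_watched: float,
                          threshold: float = 0.30) -> bool:
        # Charge only when the viewer watches past the threshold (30% by default).
        if fraction_watched <= threshold:
            return False
        if price > self.balance:
            raise ValueError("insufficient coins")
        self.balance -= price
        self.ledger.append((title, price))
        return True


wallet = Wallet()
wallet.reload()
wallet.charge_if_watched("Title A", price=10, fraction_watched=0.10)  # sampled, not charged
wallet.charge_if_watched("Title B", price=15, fraction_watched=0.50)  # watched, charged
wallet.reload()                                                       # next month
```

The ledger is the interesting part. It records only the choices a user was willing to pay for, which is a different object from a click log.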

What is the point of adding this friction? You are forcing users to make a meaningful choice. Right now, clicking on something is costless. The user is not actively deciding what they want. With scarcity, even artificially created, you force tradeoffs. And tradeoffs are signal.

Now the data science team has something qualitatively different to work with. They are not observing what someone picked from a free buffet. They are observing what someone chose to spend a limited resource on. They can see how someone trades off one title against another. They can measure the intensity of preference, not just the direction. A person who spends 80% of their monthly coins on a single title is telling you something fundamentally different from a person who spreads them across ten titles. The current system cannot distinguish these two users. The coin system makes the distinction visible.
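One way to put a number on "intensity of preference" is to measure how concentrated a user's spending is. The sketch below uses the Herfindahl-Hirschman index (the sum of squared spend shares), which is a choice made here for illustration and not something the coin design prescribes. The function names and the sample ledgers are hypothetical.

```python
from collections import Counter


def spend_shares(ledger):
    """Turn (title, coins) pairs into each title's share of total spend."""
    totals = Counter()
    for title, coins in ledger:
        totals[title] += coins
    total = sum(totals.values())
    if total == 0:
        return {}
    return {title: coins / total for title, coins in totals.items()}


def concentration(shares):
    """Herfindahl-Hirschman index: 1.0 means all coins went to one title."""
    return sum(share * share for share in shares.values())


# Two hypothetical users who each spent 100 coins in a month.
focused = [("Title A", 80), ("Title B", 10), ("Title C", 10)]
spread = [(f"Title {i}", 10) for i in range(10)]

focused_score = concentration(spend_shares(focused))  # about 0.66
spread_score = concentration(spend_shares(spread))    # about 0.10
```

The two ledgers separate cleanly on this score, which is the distinction the coin system is meant to surface.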

This is not a recommendation for Netflix to adopt this model. Their business is built on growth and frictionless access, and adding transaction costs would conflict with that strategy. The illustration is meant to show what becomes possible when a product is designed to generate richer signals. The wallet protocol creates investment: users think about tomorrow, about accumulation, about the next subscription round. There is a dynamic connection between present choices and future options. And that dynamic connection is itself informative.

You could extend this further. Let users commit coins to vote on what content they want next. Let them signal demand for types of content that do not yet exist on the platform. Now you have a forward-looking demand signal, not just a backward-looking consumption log.
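A coin-backed vote can be tallied very simply. The sketch below is only an illustration under assumed names: `tally_pledges`, the `(user_id, proposal, coins)` record shape, and the sample proposals are all hypothetical. It reports both the coins committed and the number of distinct backers, since a few heavy spenders and many light ones are different kinds of demand.

```python
from collections import defaultdict


def tally_pledges(pledges):
    """Aggregate (user_id, proposal, coins) pledges, ranked by coins committed."""
    coins = defaultdict(int)
    backers = defaultdict(set)
    for user_id, proposal, amount in pledges:
        if amount <= 0:
            continue  # a pledge with no coins behind it carries no signal
        coins[proposal] += amount
        backers[proposal].add(user_id)
    ranked = sorted(coins, key=lambda p: coins[p], reverse=True)
    return [(p, coins[p], len(backers[p])) for p in ranked]


pledges = [
    ("u1", "nature documentaries", 30),
    ("u2", "nature documentaries", 5),
    ("u3", "heist series", 20),
    ("u1", "heist series", 0),
]
demand = tally_pledges(pledges)
```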

The point is not about Netflix or coins or microtransactions. The point is about the relationship between product design and data quality.

The wrong question

Most data science teams are handed a fixed set of behavioral signals and asked to build models on top of them. The signals were not designed for analysis. They were designed for the product's primary function: playing a video, completing a booking, processing a payment. The analytical value is incidental.

If you have ever sat in a review where the personalization team presents an impressive model that still does not quite answer the question you care about, you have encountered this ceiling. The team is good. The models are sophisticated. The data is abundant. And yet the output feels approximate in a way that more iteration does not seem to fix.

The reason is that you are asking the wrong question. The question most platforms ask is: "how do we get more signal out of the data we already collect?" This question has diminishing returns. You can build increasingly sophisticated models on impoverished signals, but the ceiling is set by the signal quality, not the model quality.

The question you should be asking is: "how do we design interactions where meaningful choice is embedded in the product, so that using it naturally reveals the why behind the what?" This is the shift from data science as extraction to data science as design. It raises the ceiling by changing what the product makes visible.

How well you can understand your customers is determined by your product. The product determines the interaction modes users have, and therefore determines what signals can be collected and interpreted. Thumbs up, thumbs down, hide this, report this. These are the limited modes of communication most platforms give their users. A like is a like across all users. It tells you "what" but not "why."

This is not a data science failure. It is a product design failure. You cannot extract a signal that the product never gave users a way to send.

Why "what" thinking works for monopolies and fails for everyone else

The "what" approach works for monopoly platforms. When you control the entire market, anything goes. You can run gigantic deep learning models on billions of behavioral events and the sheer volume compensates for the poverty of each individual signal. You do not need to know why someone clicked. You just need enough clicks to find patterns.

Most platforms are not monopolies. If your platform actually needs to understand what your customers care about, you cannot limit their voices to a thumbs up and a skip button. The product should provide the core service but also create interactions that allow users to provide meaningful signals. Not surveys. Not focus groups. New interaction modes on the platform that augment the experience and generate richer data as a byproduct.

Part of the problem is inherent to certain business models. If a platform needs to sell ads, then less user agency is better. Pixels can be forced onto passive viewers. The platform's incentive is to maximize time spent, not to understand why time is spent. The data science team inherits this impoverished signal environment and is asked to build personalization on top of it. They do their best. But they are working with data that was never designed to carry the information they need.

If this is the ceiling you have been running into, the answer is not a better model. The answer is redesigning the interactions your product generates. That is a different kind of work, and it is the work Ansible Architecture does.

Ansible Architecture designs incentive systems and interaction protocols for platforms and marketplaces.

ansible-arc.com
