Karin Schoefegger – AI for a better world: The Role of PMs

Karin Schoefegger has over 15 years of experience in building AI products. As an expert in her field, she wants to teach people how to build AI products that are innovative, but first and foremost trustworthy. Her talk at the Product at Heart conference starts out with a thought experiment:

Take a moment to let your mind wander. You wake up one morning, 10 years in the future. It’s a world where AI products fit seamlessly into your life. What is one thing your virtual assistant did for you while you were asleep? And how does this make you feel?

Karin acknowledges that there are many emotions that a future with AI can trigger in us. Some people are excited, some are cautious, some fearful. The media nowadays portrays a dystopian future: “If you don’t try out one of those hundreds of tools right now, you will lose your job!”

But AI is here to stay, and Karin wants us to feel comfortable about it. She has vast experience in the field, but the developments of the last year still surprised her, especially the power and speed with which AI has entered our lives. She now sees three big areas where AI is already in place and will continue to accelerate:

  • Team Productivity

  • Back Office Operations

  • Product Offerings

The best starting point to get into AI

So what is the best starting point to get into AI? Do we all need to learn how to code? No, she says, on the contrary. AI already has a diversity problem, and it is necessary that people with diverse perspectives and backgrounds enter the field. This also means that people in roles like product, design, or business need to build more AI knowledge.

We hire smart ML engineers to focus on the tech. But there are many steps involved in building a successful AI product, and no model can live on its own. So in addition to product management skills, we need to learn some specific skills and become curious about how AI differs from other software products.

Good news first: not everything has to change. Product managers will still focus on the interplay of business, UX, and tech, so the famous Venn diagram won’t change. But we need to understand how building AI products is different and which metrics are used to evaluate the quality of those products. A prerequisite for building AI products is becoming data-driven, and that is true both for yourself and your organization. You can’t become AI-first if you are not data-first.

But we don’t need to know it all: when it comes to AI skills, the goal is to gather just enough knowledge to find your way around your current setup. The level of knowledge needed depends on the use case and the maturity of your product. And in a world where knowledge is growing so fast, it is impossible to know everything.

So what is your job as a product manager when working on AI? It’s making sure that we are focused on solving the right issues. Most AI products fail for completely preventable reasons. The main cause is starting from a technical angle (finding a problem for a solution). Product managers need to consider other dimensions: Is it viable? Does the customer want this? Does it add value? And in the end: Is AI really the right solution to my problem? Karin tells us: if you do this, you are one step ahead.


Embrace Tradeoffs

When building AI products, we have to consider the goal of the product and the quality of the model. We need to design systems so they fail gracefully. And since models change over time, we have to pay attention to the outputs.

Karin illustrates this point with a case study where she was asked to build a classification system to identify news videos on YouTube. It seems simple at first, but questions soon arose:

What do we define as news? What data do we use to train our systems? And how is the output being used? These are important questions to ask, because when building AI models, there are always tradeoffs and different metrics to look at depending on the use case. Do we want to measure how many news videos are on the platform? Then we use a recall metric. But if we want to build a new section to show news-related videos, we use a precision metric to make sure that this section only includes the right content.
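To make this distinction concrete, here is a minimal Python sketch (not from the talk, with made-up labels) that computes both metrics for a hypothetical news/not-news classifier and shows why they answer different product questions:

```python
# Toy illustration: precision and recall for a hypothetical news-video classifier.
# Labels: 1 = news video, 0 = not news. Data is invented for this example.

true_labels = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
predictions = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]

true_positives = sum(1 for t, p in zip(true_labels, predictions) if t == 1 and p == 1)
false_positives = sum(1 for t, p in zip(true_labels, predictions) if t == 0 and p == 1)
false_negatives = sum(1 for t, p in zip(true_labels, predictions) if t == 1 and p == 0)

# Recall: of all actual news videos, how many did we find?
# Relevant when estimating how much news content exists on the platform.
recall = true_positives / (true_positives + false_negatives)

# Precision: of everything we labeled "news", how much really is news?
# Relevant when curating a news section that must not show the wrong content.
precision = true_positives / (true_positives + false_positives)

print(f"recall={recall:.2f}, precision={precision:.2f}")
```

In this toy example, the recall of 0.60 says the classifier finds 60% of the actual news videos, while the precision of 0.75 says that 75% of what it surfaces as news really is news: two different answers, matching the two different product questions above.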

So when building AI products, product managers need to decide on the tradeoff between different metrics. We need to know what we optimize for, because we cannot optimize for both. Let’s take a chatbot as an example. Do we optimize for a human-sounding speech pattern, to make it more relatable? Or do we optimize for the accuracy of the given answers? Even explainability and model quality become a tradeoff decision. A higher-quality output often requires more complex models, but those outputs are then more difficult to explain to the user.

These tradeoffs are not technical decisions, so you can’t leave them to the engineering team. These are business decisions, and they also affect your users. Because with every tradeoff, you introduce new risks to your product. So we need to understand the risks in order to assess if we can actually launch an AI product.


Address the Risks

When building AI products, we need to understand where models fail and what we can do about it. Are the failures systemic? And is the system trustworthy enough so it can be launched?

Karin gives two well-known examples of these failures: a tool where people could request a mortgage and another tool where they could apply for a job. In both cases, a specific group of people was left out. Failures like this pose a risk to your product and business, including:

  • Negative effect on brand integrity

  • Negative effect on employee retention & recruiting

  • Limited user growth

  • Liabilities (current & upcoming)

Karin noticed that instead of talking about trustworthy AI, people pay more attention when you talk about risk management. Language matters. But whatever you call it, in the end the solution to mitigate the risks is building a trustworthy AI.

Build a Trustworthy AI

Karin defines trustworthy AI as ethical, lawful, and robust, based on a concept from the European Commission.

When looking at ethical AI, the first question is: How are fairness issues introduced to my system? The first reason is that there is a human involved in every decision when designing these systems. The second reason is that all models are based on data, and this data represents human interactions. So the system represents the input it has received and might amplify it.

We can try to prevent fairness issues by looking at the data and trying to see the humans behind those data points. We can use a human-centered design approach and pay attention to who we design for. An example from Karin: she once built a language learning product based on generative AI. The team assumed that most people would want to learn American or British English, but soon discovered that many people actually wanted to learn Indian English.

The next step is to make sure that you really know where your data comes from. Of course your data sources have to be legal. But there is also a misconception that public data is fairer because it is used by so many. Researchers have found that these open datasets can also be skewed in a certain direction.

It’s important to understand that a system will amplify biases that it picks up from the data. This can have a negative impact on society, but also on your business. So understanding where your model makes systemic mistakes and identifying its blind spots is crucial.

But how do we evaluate whether our models are of good quality? We can look at metrics like precision and recall, or fluency and accuracy. The problem is that “average” metrics hide variance between groups of people. A medical system, for example, has to work for children, adults, and the elderly. Group fairness is just one of many possible ways to measure fairness; there are more than 20 definitions of fairness. So you need to look at multiple fairness metrics and make tradeoffs depending on the use case at hand. Again, the metrics might contradict each other, and you cannot fulfill them all.
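As a rough illustration of how averages can mislead, here is a small Python sketch (hypothetical data, not from the talk) that compares an overall accuracy number with per-group accuracy for a made-up medical classifier serving children, adults, and the elderly:

```python
# Toy illustration: an overall "average" metric can hide how a model performs
# for specific groups. The records below are invented for this example.

from collections import defaultdict

# (group, true_label, predicted_label)
records = [
    ("child", 1, 0), ("child", 0, 0), ("child", 1, 0), ("child", 1, 1),
    ("adult", 1, 1), ("adult", 0, 0), ("adult", 1, 1), ("adult", 0, 0),
    ("elderly", 1, 1), ("elderly", 0, 0), ("elderly", 1, 1), ("elderly", 0, 1),
]

# Overall accuracy across all records
overall_correct = sum(1 for _, t, p in records if t == p)
print(f"overall accuracy: {overall_correct / len(records):.2f}")

# Accuracy broken down by group
per_group = defaultdict(lambda: [0, 0])  # group -> [correct, total]
for group, t, p in records:
    per_group[group][1] += 1
    if t == p:
        per_group[group][0] += 1

for group, (correct, total) in per_group.items():
    print(f"{group}: accuracy {correct / total:.2f}")
```

The overall accuracy of 0.75 looks acceptable, yet the per-group breakdown shows the model is right only half the time for children; slicing quality metrics by group is one simple way to surface such blind spots before deciding on tradeoffs.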

Guide the user towards the right amount of trust

Since there is no way to make your AI system 100% correct, you need to guide your users on when to trust the system. You don’t want your users to have 0% trust, because then you lose them. But you also don’t want them to have too much trust, because they might rely on the model too much.

We therefore need to guide our users to know when to trust the system. Karin advises us to design a system that guides the user’s mental model. A weak example is ChatGPT, where there is a single warning message at the top that is easily forgotten by the user. There are different levers that you can use as a product manager: ideally you guide users to understand how your system works and tell them what to do when there is a system failure. You need to work together with all the other functions in your organization, for example Marketing.

Building trustworthy AI is not only a technical challenge; it needs a cultural shift in organizations in order to build the right models. There is no one-size-fits-all solution, but our best bet is building a diverse team in a psychologically safe environment. People need to feel safe in order to speak up about potential issues, and the more diverse the team, the more issues can potentially be uncovered.

At the end of her talk, Karin encourages us all to be curious about AI, to embrace the tradeoffs, and to address the risks. She reminds us that we (as product leaders) have a pivotal role in building the right AI products. And she urges us to imagine the impact we can have together, building more trustworthy AI products for our users, for our society – and ultimately for a better future.

Watch Karin’s full talk
