Week 2: Asking the Right Questions

Recap

  • Lab assignment due Fri (upload log and written response)

  • Install R/Rstudio, VS code, github education

  • What is D3M?

  • What is an integrated development environment?

Customer Churn

Churn prediction

CustomerID Churn Tenure Preferred Device Satisfaction Score
50001 1 4 Mobile Phone 2
50002 1 Phone 3
50003 0 Phone 3
50004 1 0 Phone 5
50005 0 0 Phone 5
50006 1 0 Computer 5
50007 0 Phone 2
50008 1 Phone 2
50009 0 13 Phone 3
50010 1 Phone 3

ML model produces good prediction

Accuracy = 0.98

Precision = .96

Recall = 0.98

This looks like success

  • High accuracy
  • High precision
  • High recall
  • Clean evaluation

By most technical standards:

This is a very good model.

What question does this model actually answer?

A. Why customers churn
B. Which customers will churn
C. How to prevent churn
D. Whether churn is costly

Now what?

You can predict which customers will stay and which will leave.

How do you make a data-driven decision?

Taking a step back…

We need to make sure our analysis is answering a question

The question is formulated around the decision

The decision should solve a problem

Why should we care about churn?

  • Problem: Customer churn is expensive

  • Goal: We want to reduce churn by retaining more customers

  • Question: Why are customers leaving?

Exploring the data…

  • We find that customers leave when satisfaction falls

  • We then find that satisfaction falls when customers don’t feel valued (measured by survey and focus groups)

  • Do any of our existing programs increase satisfaction? loyalty discounts, mailers, early access?

Decision-oriented question

Which program (loyalty discounts, mailers, early access) increases satisfaction, thereby reducing churn?

Pair up and discuss

  • What business decisions are you familiar with?

  • Were the decisions related to a particular problem?

  • Dissect the business decision into:
    problem, goals, and decision-oriented questions

5 minutes to discuss then be ready to share out

A Framework

for data-driven decision making

Start by understanding the problem

%%{init: {"themeVariables": {"fontSize": "24px"}}}%%
flowchart LR
  G[Goals] ==> P[Problem]
  P ==> Q[Question]
  Q ==> Da[Data]
  Da ==> M[Model]
  M ==> R[Result]
  R ==> D[Decision]

Reality is more complicated

%%{init: {"themeVariables": {"fontSize": "24px"}}}%%
flowchart LR
  G[Goals] ==> P[Problem]
  P ==> Q[Question]
  Q ==> Da[Data]
  Da ==> M[Model]
  M ==> R[Result]
  R ==> D[Decision]
  Da ==> Q

Reality is very complicated

%%{init: {"themeVariables": {"fontSize": "24px"}}}%%
flowchart LR
  G[Goals] ==> P[Problem]
  P ==> Q[Question]
  Q ==> Da[Data]
  Da ==> M[Model]
  M ==> R[Result]
  R ==> D[Decision]
  M ==> Da
  R ==> Q
  R ==> M
  Da ==> Q

Where data/research projects usually fail

  • The decision is unclear

  • The question is vague or misaligned

  • The data don’t measure the concept

  • The unit of observation doesn’t match the claim

What makes a “good” question?

A good empirical question is:

  • Decision-relevant

  • Answerable (with data you have or can acquire)

  • Specific about:

    • Who/what (unit)

    • Where (one branch or whole enterprise)

    • When (timeline)

    • Compared to what (do nothing, )

A company asks: “What would happen to customer churn next year if we offered a retention discount at one location and not others?”

What type of analytics question is this? (Heidari)

A. Descriptive
B. Exploratory
C. Predictive
D. Causal (counterfactual)
E. Mechanistic

Types of Questions (Heidari)

  • Descriptive
    What is happening in the data? (summarize levels, averages, frequencies)

  • Exploratory
    Are there patterns, trends, or relationships worth investigating?

  • Inferential
    What can we conclude about a population based on a sample?

  • Predictive
    Given what we observe, what is likely to happen next?

  • Causal (Counterfactual)
    What would change if we intervene, compared to if we do nothing?

  • Mechanistic
    Through what process or structure does an outcome occur?

What percentage of customers churned last month?

What type of analytics question is this? (Heidari)

A. Descriptive
B. Exploratory
C. Predictive
D. Causal (counterfactual)
E. Mechanistic

Through what process does customer dissatisfaction lead to churn?

What type of analytics question is this? (Heidari)

A. Descriptive
B. Exploratory
C. Predictive
D. Causal (counterfactual)
E. Mechanistic

Is customer churn higher among users with low satisfaction scores?

What type of analytics question is this? (Heidari)

A. Descriptive
B. Exploratory
C. Predictive
D. Causal (counterfactual)
E. Mechanistic

Key Takeaways

  • Strong models do not guarantee good decisions
  • Data analysis should start with the problem and decision, not the data
  • Different decisions require different types of questions
  • Good questions are:
    • decision-relevant
    • answerable with data
    • clear about who, where, when, and compared to what

Reminders and Due Dates

  • Lab assignment due Friday

  • Lab on friday will focus on finding and importing data

    • Start in excel then move to R