portotype.com

TL;DR - this is an article about building Cohorts to analyze and simulate business. If you’re the kind of nerd that would rather jump to the spreadsheets themselves and not read the explanation, you can access them all here.

What are Cohorts?

Your business will evolve. You will launch campaigns, make changes in pricing, hire more people. And you will want to see what’s the effect of those actions. Cause and effect. The easiest way of doing it is to track cohorts.

Cohorts are groups of people that share a common characteristic. That allows us to reduce a lot of people into groups that are comparable. As your business progresses with time, the easiest and broader cohorts to track are time-based cohorts, that is, groups of people whose key characteristic is that they did something in a certain period. For eCommerce, for example, time-based cohorts are defined as the group of people who first bought during a certain week. With cohorts, you can answer the following questions:

The simplest cohort table has two axis, and can be rendered in 2 different ways: Time-elapsed or Real-time.

2 different ways to visualize cohort tables.

cohorts

Time Elapsed

left table

Real Time

right table

Note: cohorts can be done per month, week, or any other period.

How to build cohorts (in Rows)

The following examples are all hosted in Rows.com, the spreadsheet platform our team has built. You can (should!) clone this spreadsheet and play around with it, by clicking the + icon. That will create your own copy on your own account!

Building your first Cohorts

→ You can see the spreadsheet built here. Duplicate and extend it!

  1. You start with a list of all orders in the table Sales data. The minimum information that you need to build a cohort model is a unique identifier for customers (I used the email) and the date of purchase. Believe me, with that you can already build something super powerful. If you want to do other things, you can include spend, number of items on the basket and other information as i did in the example.
  2. Then you need to calculate, for each order on that table, some helper columns. Those are all just a few INDEX MATCH functions and other math tricks.
    • month of the sale (column D);
    • month of cohort: the month of the first purchase of that person, (column E);
    • months elapsed: months elapsed between users first purchase (cohort) and month of the current order (column F); and
    • purchase in month for this client: the number of the purchase of this person, within this same month (column H).
  3. Then you can build your cohort table for time-elapsed “Cohorts (elapsed)”.
    1. On each cell, you will have a COUNTIFS that counts, for each cell, the number of orders for that cohort and elapsed month from distinct individual people.
      • “for that cohort” - the first condition of the COUNTIFS is that the cohort in “sales data” (column F) matches the cohort of the “Cohorts (elapsed)” table (the row of the cell).
      • “for that elapsed month” - we filter only the elapsed month in “sales data” (column G) that matches the elapsed month in the table (the column of the cell).
      • “from distinct individual people” - we filter the number of the order of the Column to be 1, as considering only the first purchase counts only unique people.
      • Note: we lock the cell references of the tables “Sales data” and “Cohorts (elapsed)” as needed, using $ in the formulas. Putting the $ before the column or row locks them. This lets us copy paste any cell within the whole “Cohorts (elapsed)” table to another position and it adapts the references correctly.
    2. I also added an IF function on top (so, IF(COUNTIFS()) to display a “-” on cells that are “impossible”. You can’t have a sale for a month that doesn’t exist.
  4. Then, you can build another table that is a version of “Cohorts elapsed” but where cells are a % of the month 0. This is useful to see what % of people comes back, regardless of how many people purchased for the first time.
  5. Finally, you can plot the last table in a line chart, as I did.

With a cohort like this, you can:

Cohorts with real data

To go one step further, there’s an interesting twist I like to give to cohort analysis, which is to do my cohorts orthogonally. Orthogonal Cohorts are those where variables are independent from each other. So, in a way, instead of doing a big cohort which tracks revenue, you can build several cohorts that track users, orders per user and revenue per order, and then multiply them to build the outcome (revenue).

→ I built this spreadsheet with the much more complete orthogonal cohorts here.

Check how it looks, or duplicate and play with it by clicking +.

With a cohort like this, you will be able to understand:

Cohorts for simulations

To complete this article, I build one last model for simulating purposes. Here I used a complete, orthogonal cohort model.

But instead of using real data, I set the individual cohorts tables as editable cells so that you can play around.

→ Check it here. If you want it, duplicate it and continue to evolve it!

And that’s all about Cohorts for now! If you have any question, post below.