17 May 2023 for R Ladies Philly

Preliminaries: Analyzing Experiments

What do I mean by “experiment”?

  • Take sample from a population of subjects
  • Randomly assign subjects to either treatment or control
  • Measure average outcome for each subjects
  • Compare outcomes between treatment and control groups

This procedure allows you to estimate the causal effect of the treatment on some outcome

What types of experiments have you worked on?

Tell me in the chat while I tell you about my experiment.

Rhea chicks experiment

Koch, Colleen S., et al. “Training Rhea Americana chicks to walk voluntarily across a scale; effect on the handler’s time and the chicks’ weight gain compared with traditional techniques: A pilot study.” Journal of Veterinary Behavior 18 (2017): 69-75.

Protocol:

  • Newly-hatched rhea chicks were randomly assigned to be weighed by:
    • Being scooped up in a bucket involuntarily (standard practice)
    • Being encouraged to walk onto the scale using food
  • Chicks were weighed using the assigned method at 0, 10 and 25 days after hatch

Thanks, Mom!

Example: Rhea chicks experiment

chicks <- read.csv("rhea.tables.grams.20150626.csv")
chicks$treatment <- as.factor(chicks$treatment)
chicks$breeding_group <- as.factor(chicks$breeding_group)
chicks$sex <- as.factor(chicks$sex)
chicks$hatch_date <- as.Date(chicks$hatch_date, format="%m/%d")
chicks$color <- as.factor(chicks$color)
head(chicks)
##           treatment   chick breeding_group hatch_date grams_day0 grams_day10
## 1 Voluntary Walk-On 1667357            non 2023-05-29     362.88      426.38
## 2 Voluntary Walk-On 1670335            non 2023-05-30     408.24      444.53
## 3 Voluntary Walk-On 1680454            non 2023-05-30     399.17      453.60
## 4 Voluntary Walk-On 1660777            non 2023-05-30     449.06      453.60
## 5 Voluntary Walk-On 1680457            non 2023-05-31     430.92      426.38
## 6 Voluntary Walk-On 1667737            non 2023-06-02     435.46      435.46
##   grams_day25 pct_gain sex color
## 1      916.27   114.89   f     b
## 2     1106.78   148.98   m     b
## 3     1161.22   156.00   f     b
## 4     1215.65   168.00   m     b
## 5     1115.86   161.70   m     b
## 6      952.56   118.75   m     w

Chicks data summary

summary(chicks)
##               treatment      chick         breeding_group   hatch_date        
##  Involuntary Bucket:17   Min.   : 167633   breeder:14     Min.   :2023-05-28  
##  Voluntary Walk-On :18   1st Qu.:1667170   non    :21     1st Qu.:2023-05-30  
##                          Median :1667934                  Median :2023-05-30  
##                          Mean   :1626957                  Mean   :2023-06-01  
##                          3rd Qu.:1671362                  3rd Qu.:2023-06-08  
##                          Max.   :1680841                  Max.   :2023-06-10  
##    grams_day0     grams_day10     grams_day25        pct_gain        sex    
##  Min.   :317.5   Min.   :308.4   Min.   : 517.1   Min.   : 16.33   f   :15  
##  1st Qu.:383.3   1st Qu.:408.2   1st Qu.: 821.0   1st Qu.: 99.72   m   :18  
##  Median :403.7   Median :426.4   Median : 997.9   Median :134.69   NA's: 2  
##  Mean   :403.3   Mean   :433.8   Mean   :1033.9   Mean   :138.99            
##  3rd Qu.:426.4   3rd Qu.:453.6   3rd Qu.:1211.1   3rd Qu.:172.34            
##  Max.   :476.3   Max.   :535.2   Max.   :1705.5   Max.   :290.48            
##  color 
##  b:27  
##  w: 8  
##        
##        
##        
## 

Are your experiments this cute?

Rhea chicks

How to think about an experiment

Each subject has two potential outcomes in the experiment: \[Y_i(1) \hspace{1in}{ } Y_i(0)\]

The individual treatment effect is the difference between these two outcomes: \[\tau_i = Y_i(1) - Y_i(0)\] Unfortunately, we can only observe one potential outcome for each subject, so we can never know what \(\tau_i\) is for each subject

Average treatment effect (ATE)

Typically, the goal of an experiment is to estimate the average treatment effect (ATE): \[\tau = \mathbb{E}_i[Y_i(1) - Y_i(0)]\] which we estimate as the difference-in-means \[\widehat\tau = \frac{1}{n_1}\sum_{\textrm{treated}}Y_i(1) - \frac{1}{n_0}\sum_{\textrm{control}}Y_i(0)\] where \(n_1\) and \(n_0\) are the number of treated and control subjects.

Average Treatment effect for chicks

chicks %>%
  group_by(treatment) %>%
  summarize(mean(grams_day25))
## # A tibble: 2 × 2
##   treatment          `mean(grams_day25)`
##   <fct>                            <dbl>
## 1 Involuntary Bucket               1100.
## 2 Voluntary Walk-On                 972.
chicks %>% filter(treatment=="Voluntary Walk-On") %>% 
  summarize(mean(grams_day25)) - 
  chicks %>% filter(treatment=="Involuntary Bucket") %>% 
    summarize(mean(grams_day25))
##   mean(grams_day25)
## 1         -128.1348

Why experiments?

The difference-in-means \(\widehat\tau\) is an unbiased estimate of the average treatment effect (ATE)

Under the assumption that the outcomes for one subject do not affect the other subjects (SUTVA)

In other words, randomization eliminates the possiblity of confounders

Aside: Random sampling

The average treatment effect averages over the population in the experiment.

The ideal experiment randomly samples from the target population.

  • Trivial in engineering experiments
  • Do-able in marketing experiments
  • Unethical in clinical trials

So, there are two types of randomization in the ideal experiment: random sampling to selct subjects and random assignment to treatments

Statistical analysis: t test

The goal of a statistical analysis is to quantify uncertainty. One statistical analysis used for experiments is a t test. Here we have a very wide confidence interval for the difference, so we “aren’t too sure” about the difference.

t.test(grams_day25 ~ treatment, data=chicks)
## 
##  Welch Two Sample t-test
## 
## data:  grams_day25 by treatment
## t = 1.3131, df = 27.536, p-value = 0.2
## alternative hypothesis: true difference in means between group Involuntary Bucket and group Voluntary Walk-On is not equal to 0
## 95 percent confidence interval:
##  -71.90882 328.17843
## sample estimates:
## mean in group Involuntary Bucket  mean in group Voluntary Walk-On 
##                        1099.8465                         971.7117

Statistical analysis: regression

m1 <- lm(grams_day25 ~ treatment, data=chicks)
summary(m1)
## 
## Call:
## lm(formula = grams_day25 ~ treatment, data = chicks)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -582.75 -228.94  -19.15  173.63  605.69 
## 
## Coefficients:
##                            Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                 1099.85      69.18  15.898   <2e-16 ***
## treatmentVoluntary Walk-On  -128.13      96.47  -1.328    0.193    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 285.2 on 33 degrees of freedom
## Multiple R-squared:  0.05075,    Adjusted R-squared:  0.02199 
## F-statistic: 1.764 on 1 and 33 DF,  p-value: 0.1932

Statistical analysis: regression

confint(m1)
##                                2.5 %     97.5 %
## (Intercept)                 959.0988 1240.59415
## treatmentVoluntary Walk-On -324.3981   68.12846

Summary of preliminaries

  • Randomized experiments allow us estimate the average treatment effect (ATE) by a simple difference-in-means calculation
  • We can characterize the uncertainty in our estimate using a t test or a simple regression of the outcome on the treatment

Treatment modifiers

Treatment modifiers

Treatment effects may be different for different groups of subjects.

chicks %>%
  group_by(color, treatment) %>%
  summarize(mean(grams_day25), .groups="keep")
## # A tibble: 4 × 3
## # Groups:   color, treatment [4]
##   color treatment          `mean(grams_day25)`
##   <fct> <fct>                            <dbl>
## 1 b     Involuntary Bucket               1175.
## 2 b     Voluntary Walk-On                1005.
## 3 w     Involuntary Bucket                855.
## 4 w     Voluntary Walk-On                 855.

Difference-in-means for colors

chicks %>% filter(color=="b", treatment=="Voluntary Walk-On") %>% 
  summarize(mean(grams_day25)) - 
  chicks %>% filter(color=="b", treatment=="Involuntary Bucket") %>% 
    summarize(mean(grams_day25))
##   mean(grams_day25)
## 1         -170.1252
chicks %>% filter(color=="w", treatment=="Voluntary Walk-On") %>% 
  summarize(mean(grams_day25)) - 
  chicks %>% filter(color=="w", treatment=="Involuntary Bucket") %>% 
    summarize(mean(grams_day25))
##   mean(grams_day25)
## 1                 0

Regression for analyzing modifiers

We can incorporate modifiers in the regression analysis using an interaction.

m2 <- lm(grams_day25 ~ color + treatment + treatment:color, data=chicks)
summary(m2)
## 
## Call:
## lm(formula = grams_day25 ~ color + treatment + treatment:color, 
##     data = chicks)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -494.77 -191.82    1.94  193.58  530.37 
## 
## Coefficients:
##                                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                        1175.17      75.57  15.551 3.44e-16 ***
## colorw                             -320.14     155.79  -2.055   0.0484 *  
## treatmentVoluntary Walk-On         -170.13     104.94  -1.621   0.1151    
## colorw:treatmentVoluntary Walk-On   170.13     219.39   0.775   0.4440    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 272.5 on 31 degrees of freedom
## Multiple R-squared:  0.1863, Adjusted R-squared:  0.1076 
## F-statistic: 2.367 on 3 and 31 DF,  p-value: 0.08995
confint(m2)
##                                       2.5 %      97.5 %
## (Intercept)                       1021.0503 1329.295848
## colorw                            -637.8703   -2.405846
## treatmentVoluntary Walk-On        -384.1602   43.909719
## colorw:treatmentVoluntary Walk-On -277.3240  617.574479

Some vocabulary

Looking for subgroups with different treatment effects is called effect modification or stratified analysis or heterogeneous treatment effects.

We are estimating conditional average treatment effects (CATEs): \[\mathbb{E}[\tau_i | X] = \mathbb{E}[Y_i(1)-Y_i(0) | X_i]\] which is pronounced “the expected treatment effect given X”.

Aside: Fun with R-formulas

color*treatment is shorthand for color + treatment + treatment:color

m2 <- lm(pct_gain ~ color*treatment, data=chicks) 
summary(m2)
## 
## Call:
## lm(formula = pct_gain ~ color * treatment, data = chicks)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -112.757  -44.038    6.086   36.246  121.473 
## 
## Coefficients:
##                                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                         169.01      17.11   9.878  4.3e-11 ***
## colorw                              -64.39      35.27  -1.826   0.0776 .  
## treatmentVoluntary Walk-On          -40.40      23.76  -1.700   0.0991 .  
## colorw:treatmentVoluntary Walk-On    47.96      49.67   0.966   0.3417    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 61.69 on 31 degrees of freedom
## Multiple R-squared:   0.15,  Adjusted R-squared:  0.06779 
## F-statistic: 1.824 on 3 and 31 DF,  p-value: 0.1633

Aside: More fun with R-formulas

Adding 0 + to the formula eliminates the intercept

m3 <- lm(pct_gain ~ 0 + color*treatment, data=chicks)
summary(m3)
## 
## Call:
## lm(formula = pct_gain ~ 0 + color * treatment, data = chicks)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -112.757  -44.038    6.086   36.246  121.473 
## 
## Coefficients:
##                                   Estimate Std. Error t value Pr(>|t|)    
## colorb                              169.01      17.11   9.878  4.3e-11 ***
## colorw                              104.61      30.85   3.391  0.00191 ** 
## treatmentVoluntary Walk-On          -40.40      23.76  -1.700  0.09907 .  
## colorw:treatmentVoluntary Walk-On    47.96      49.67   0.966  0.34174    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 61.69 on 31 degrees of freedom
## Multiple R-squared:  0.8552, Adjusted R-squared:  0.8366 
## F-statistic: 45.78 on 4 and 31 DF,  p-value: 1.395e-12

Caution!

If the variables we use to define subgroups are affected by the treatment, then the estimates of the subgroup treatment effects are no longer valid.

Typically, we use variables that were collected prior to randomization. Another name for modifier variable is pre-randomization covariates to emphasize this.

There can be multiple modifiers

m4 <- lm(pct_gain ~ color*treatment + sex*treatment, data=chicks) # does the same thing
summary(m4)
## 
## Call:
## lm(formula = pct_gain ~ color * treatment + sex * treatment, 
##     data = chicks)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -105.888  -40.450   -7.452   40.592  110.482 
## 
## Coefficients:
##                                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                        179.998     26.707   6.740 3.09e-07 ***
## colorw                             -70.920     38.305  -1.851   0.0751 .  
## treatmentVoluntary Walk-On         -37.855     36.606  -1.034   0.3103    
## sexm                               -17.860     32.553  -0.549   0.5878    
## colorw:treatmentVoluntary Walk-On   88.334     61.562   1.435   0.1628    
## treatmentVoluntary Walk-On:sexm     -5.833     45.738  -0.128   0.8995    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 63.68 on 27 degrees of freedom
##   (2 observations deleted due to missingness)
## Multiple R-squared:  0.1613, Adjusted R-squared:  0.005978 
## F-statistic: 1.038 on 5 and 27 DF,  p-value: 0.4154
confint(m4)
##                                        2.5 %     97.5 %
## (Intercept)                        125.19958 234.795498
## colorw                            -149.51541   7.675208
## treatmentVoluntary Walk-On        -112.96449  37.253957
## sexm                               -84.65292  48.933420
## colorw:treatmentVoluntary Walk-On  -37.98083 214.649214
## treatmentVoluntary Walk-On:sexm    -99.67978  88.013821

Summary

If we have variables that are unaffected by treatment, we can use them to create subgroups and estimate conditional average treatment effects (CATE) for each subgroup.

Challenges with regression for subgroup analysis

  • For the chicks experiment, we have five potential modifiers, and we don’t know which ones are important.
  • We’ve got two potential modifiers that are continuous hatch_date and grams_day_0.
    • If we put those in the linear regression, we are assuming the relationship between the treatment effect and the modifier is linear.
  • We’ve got a few NAs for sex and lm throws out those incomplete cases.

Causal Forests: A flexible way to identify important modifiers and relate them to treatment effects

First some data munging

We are going to be using some packages that don’t play nicely with factors and data frames. So, let’s create some numeric versions of our potential modifiers.

# **potential modifiers must be numeric** when using the grf package
breeder <- 1*(chicks$breeding_group == "breeder") # 1=breeder, 0=non
hatch_date <- as.numeric(chicks$hatch_date) 
grams_day0 <- chicks$grams_day0 
female <- 1*(chicks$sex == "f") # 1=female, 0=male
brown <- 1*(chicks$color == "b") # 1=b, 2=w

Preliminary: Classification and regression trees

A classification and regression tree (CART) is an alternative to a linear model for regression. Here is a CART predicting grams_day25 from our potential modifiers.

Preliminary: Random forests

A random forest is a collection of trees estimated from different sub-samples of the data. The predictions for each unit are averaging the predictions across trees.

This is an example of bagging which is an ensemble technique.

We can fit a random forest using the grf package. grf stands for Generalized Random Forests.

Y <- chicks$grams_day25 # outcome
W <- chicks$treatment=="Voluntary Walk-On" # treatment
X <-  data.frame(breeder, hatch_date, grams_day0, female, brown)

rf <- regression_forest(X, Y, num.trees = 100, seed=20230517)

Random forest tree 1

plot(get_tree(rf, 1))

Random forest tree 2

plot(get_tree(rf, 4))

Random forest predictions

Random forests aggregate across trees to get the prediction. Unlike lm the predictions don’t necessarily follow a linear pattern.

plot(chicks$hatch_date, rf$predictions, col=chicks$color)

Causal Forests: A flexible way to identify important modifiers and relate them to treatment effects

Causal forests

A causal forest is a random forest build to predict each unit’s treatment effect \(\tau_i = Y_i(1) - Y_i(0)\) as a function of potential modifiers \(X_i\). It allows us to sort through the potential modifiers quickly and identify non-linear relationships relationships.

causal_forest documentation

Causal forest for chick experiment

To fit a causal forest for the chick experiment, we call grf::causal_forest.

The inputs are the modifiers X, the outcomes Y, the treatment W, and the probability of treatment in the experiment W.hat.

cf <- causal_forest(X, Y, W, W.hat = 0.5, seed=19980103) 

Causal forest CATES

The goal of the causal forest is to estimate the conditional average treatment effects (CATEs), i.e.  \[\mathbb{E}[\tau_i|X_i] = \mathbb{E}[Y_i(1)-Y_i(0)|X_i]\] For the Rhea chicks experiment this is the effect of the Voluntary Walk-On treatment versus the Involuntary Bucket on the chick weight at day 25.

data.frame(CATE=cf$predictions,X) 
##         CATE breeder hatch_date grams_day0 female brown
## 1  -139.4173       0      19506     362.88      1     1
## 2  -161.7334       0      19507     408.24      0     1
## 3  -156.8684       0      19507     399.17      1     1
## 4  -164.1048       0      19507     449.06      0     1
## 5  -151.3777       0      19508     430.92      0     1
## 6  -145.4583       0      19510     435.46      0     0
## 7  -143.8213       0      19516     426.38      1     1
## 8  -136.3567       0      19517     385.56      0     1
## 9  -160.1464       0      19517     444.53      1     1
## 10 -165.6790       0      19518     362.88      0     1
## 11 -142.8160       0      19518     403.70      1     1
## 12 -154.8662       1      19505     421.85      1     0
## 13 -132.9799       1      19506     367.42      0     1
## 14 -143.9325       1      19506     399.17      0     1
## 15 -171.1630       1      19507     317.52      1     1
## 16 -122.9478       1      19507     403.70     NA     0
## 17 -124.2064       1      19507     408.24      0     1
## 18 -127.9846       1      19508     367.42     NA     0
## 19 -113.0343       0      19507     381.02      0     1
## 20 -163.6664       0      19507     385.56      0     1
## 21 -153.9435       0      19507     376.49      1     1
## 22 -181.0604       0      19508     399.17      0     1
## 23 -170.4922       0      19508     430.92      0     0
## 24 -127.6462       0      19517     426.38      0     1
## 25 -138.9372       0      19517     381.02      0     1
## 26 -136.0210       0      19517     476.28      1     0
## 27 -155.7796       0      19517     399.17      1     1
## 28 -124.9835       0      19518     385.56      1     1
## 29 -112.2167       1      19505     390.10      0     1
## 30 -160.0674       1      19506     344.74      0     1
## 31 -134.5818       1      19506     412.78      1     1
## 32 -185.2478       1      19506     449.06      1     0
## 33 -153.4530       1      19507     426.38      1     0
## 34 -125.6560       1      19507     444.53      0     1
## 35 -154.8203       1      19508     412.78      1     1

Heterogeneity in predicted CATES

hist(cf$predictions)

Causal forest versus random forest

The random forest we fit before predicts grams_day25 but the causal forest predicts the CATEs

plot(rf$predictions, cf$predictions)

Causal forest CATES versus chick color

stripchart(cf$predictions~chicks$color, vertical=TRUE)

Causal forest CATES versus chick sex

stripchart(cf$predictions~chicks$sex, vertical=TRUE)

Causal forest CATEs versus hatch date

plot(chicks$hatch_date, cf$predictions)

Which chicks should get what?

library(policytree)
rewards <- cbind(control=-get_scores(cf), treat=get_scores(cf))
tree <- policy_tree(X, rewards, min.node.size = 1)
plot(tree, leaf.labels=c("Involuntary Walk-On", "Voluntary Bucket"))

Which predictors are the most important?

It can be hard to figure out which modifiers are “important.” One way to do that is to use grf:best_linear_projection which finds the linear model that fits best to the random forest.

best_linear_projection(cf, X)
## 
## Best linear projection of the conditional average treatment effect.
## Confidence intervals are cluster- and heteroskedasticity-robust (HC3):
## 
##                Estimate  Std. Error t value Pr(>|t|)  
## (Intercept) 761337.0846 559380.7162  1.3610  0.18475  
## breeder       -408.5533    279.8861 -1.4597  0.15591  
## hatch_date     -38.9312     28.6896 -1.3570  0.18603  
## grams_day0      -3.5664      4.2136 -0.8464  0.40477  
## female         276.6646    205.1490  1.3486  0.18867  
## brown         -526.4337    257.6604 -2.0431  0.05091 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Causal forest average treatment effect

grf::average_treatment_effect combines the CATES to obtain an average treatment effect. It is similar to our difference-in-means estimate of -128.1.

average_treatment_effect(cf)
##   estimate    std.err 
## -138.56257   98.13805

Sometimes std.err of the causal forest ATE will be smaller than the difference-in-means ATE. Accounting for variability due to treatment modifiers can make the ATE estimate more precise.

Summary

  • A causal forest builds on a machine learning model called a random forest to predict conditional average treatment effects (CATEs).
  • This allows us to quickly identify treatment modifiers in experiments.

Homework (optional!)

What’s next?

  • Our analysis of the chicks experiment was limited by the small sample size.
    • We didn’t identify any really strong treatment modifiers
    • We didn’t test the quality of our predictions using holdout validation.
  • The grf tutorial walks you through the analysis of two different analyses.
  • You should try it on your own experiments!
  • Causal forests can also be used to analyze observational data for a treatment that was not randomized.
    • If your data includes all potential confounders, then the estimated ATE and CATEs will be causal.

Thank you!