In this lecture we will discuss how to analyze contingency tables in R. Most of the focus will be on 2x2 tables, but we will discuss some methods for rxc tables. In particular, we will discuss

Chi-square test of independence

Let’s consider the dental data again. Recall, that this dataset has two variables

  1. Frequent swimmer (>= 6 hours or <6 hours)
  2. Dental erosin (Yes/No).

Does frequent exposure to chlorinated swimming pool water increase the risk of dental erosion?

dental <- array(c(32,17,118,127), dim=c(2,2),
                dimnames = list(c(">=6hrs","<6hrs"), 
                                c("Erosion (Yes)", "Erosion (No)")))
dental
##        Erosion (Yes) Erosion (No)
## >=6hrs            32          118
## <6hrs             17          127
prop.table(dental) #cell percentages
##        Erosion (Yes) Erosion (No)
## >=6hrs    0.10884354    0.4013605
## <6hrs     0.05782313    0.4319728
prop.table(dental, margin=1) #row percentages
##        Erosion (Yes) Erosion (No)
## >=6hrs     0.2133333    0.7866667
## <6hrs      0.1180556    0.8819444
prop.table(dental, margin=2) #column percentages
##        Erosion (Yes) Erosion (No)
## >=6hrs     0.6530612    0.4816327
## <6hrs      0.3469388    0.5183673
chisq.test(dental) #with continuity correction
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  dental
## X-squared = 4.1405, df = 1, p-value = 0.04187
chisq.test(dental,correct=FALSE) #without continuity correction
## 
##  Pearson's Chi-squared test
## 
## data:  dental
## X-squared = 4.802, df = 1, p-value = 0.02843

Fisher’s Exact Test

Recall that the chi-square test is only valid if the expected cell counts are greater than 5. If this fails for any of the cells, then we need to use another test. One such test is Fisher’s exact test. In the following example, the chi-square test would not be appropriate.

glasses <- array(c(1,5,8,2), dim=c(2,2),
                 dimnames=list(c("Juvenile Delinquent", "Non-delinquent"), c("Wears Glasses", "No Glasses")))

glasses
##                     Wears Glasses No Glasses
## Juvenile Delinquent             1          8
## Non-delinquent                  5          2
glasses.chi <- chisq.test(glasses)
## Warning in chisq.test(glasses): Chi-squared approximation may be incorrect
glasses.chi$expected
##                     Wears Glasses No Glasses
## Juvenile Delinquent         3.375      5.625
## Non-delinquent              2.625      4.375
fisher.test(glasses)
## 
##  Fisher's Exact Test for Count Data
## 
## data:  glasses
## p-value = 0.03497
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  0.0009525702 0.9912282442
## sample estimates:
## odds ratio 
## 0.06464255

McNemar’s Test

McNemar’s test is for comparing proportions with matched pairs data. Let’s take another look at the marijuana sleep study data. Recall that here the question of interest is “Does marijuana use have an effect on sleeping trouble?” 32 subjects who use marijuana were selected and 32 mathced subjects who did not use marijuana were selected and asked if they had trouble sleeping. This results in the following data:

mcnemar <- array(c(4,3,9,16), dim=c(2,2),
                 dimnames=list("Control"=c("Yes", "No"),
                               "Marijuana"=c("Yes","No")))
mcnemar
##        Marijuana
## Control Yes No
##     Yes   4  9
##     No    3 16
mcnemar.test(mcnemar, correct=FALSE)
## 
##  McNemar's Chi-squared test
## 
## data:  mcnemar
## McNemar's chi-squared = 3, df = 1, p-value = 0.08326

Risk Difference

prop.test(dental, correct=FALSE)
## 
##  2-sample test for equality of proportions without continuity
##  correction
## 
## data:  dental
## X-squared = 4.802, df = 1, p-value = 0.02843
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  0.01116222 0.17939334
## sample estimates:
##    prop 1    prop 2 
## 0.2133333 0.1180556

Risk Ratio and Odds Ratio

library(Epi)
## Warning: package 'Epi' was built under R version 3.4.2
## 
## Attaching package: 'Epi'
## The following object is masked from 'package:base':
## 
##     merge.data.frame
twoby2(dental)
## 2 by 2 table analysis: 
## ------------------------------------------------------ 
## Outcome   : Erosion (Yes) 
## Comparing : >=6hrs vs. <6hrs 
## 
##        Erosion (Yes) Erosion (No)    P(Erosion (Yes)) 95% conf. interval
## >=6hrs            32          118              0.2133    0.1550   0.2861
## <6hrs             17          127              0.1181    0.0747   0.1817
## 
##                                    95% conf. interval
##              Relative Risk: 1.8071    1.0510   3.1070
##          Sample Odds Ratio: 2.0259    1.0689   3.8398
## Conditional MLE Odds Ratio: 2.0211    1.0269   4.0989
##     Probability difference: 0.0953    0.0098   0.1794
## 
##              Exact P-value: 0.0296 
##         Asymptotic P-value: 0.0304 
## ------------------------------------------------------

Cochran-Mantel-Haenszel Chi-square Test for Count Data

policy <- array(c(48,96,12,94,55,7,135,53), dim=c(2,2,2),
                dimnames=list(c("Low","High"),
                              c("FA","UFA"),
                              c("Urban","Rural")))
policy
## , , Urban
## 
##      FA UFA
## Low  48  12
## High 96  94
## 
## , , Rural
## 
##      FA UFA
## Low  55 135
## High  7  53
mantelhaen.test(policy, correct=FALSE)
## 
##  Mantel-Haenszel chi-squared test without continuity correction
## 
## data:  policy
## Mantel-Haenszel X-squared = 23.05, df = 1, p-value = 1.578e-06
## alternative hypothesis: true common odds ratio is not equal to 1
## 95 percent confidence interval:
##  2.066744 6.069367
## sample estimates:
## common odds ratio 
##          3.541726
library(DescTools)
## Warning: package 'DescTools' was built under R version 3.4.2
BreslowDayTest(policy)
## 
##  Breslow-Day test on Homogeneity of Odds Ratios
## 
## data:  policy
## X-squared = 0.18297, df = 1, p-value = 0.6688

General rxc tables and CMH tests

For two nominal categorical variables we use the general association line of the CMH test.

strain <- array(c(11,42,30,79,14,63,3,52), dim=c(2,4),
                dimnames=list(c("Y","N"),
                              c("G1","G2","G3","G4")))
strain
##   G1 G2 G3 G4
## Y 11 30 14  3
## N 42 79 63 52
library(vcdExtra)
## Warning: package 'vcdExtra' was built under R version 3.4.2
## Loading required package: vcd
## Warning: package 'vcd' was built under R version 3.4.2
## Loading required package: grid
## Loading required package: gnm
## Warning: package 'gnm' was built under R version 3.4.2
CMHtest(strain)
## Cochran-Mantel-Haenszel Statistics 
## 
##                  AltHypothesis   Chisq Df      Prob
## cor        Nonzero correlation  6.6214  1 0.0100759
## rmeans  Row mean scores differ  6.6214  1 0.0100759
## cmeans  Col mean scores differ 11.3707  3 0.0098812
## general    General association 11.3707  3 0.0098812

For one nominal and one ordinal, we use the row mean scores line of the CMH test.

drug <- array(c(13,29,7,7,21,7),dim=c(2,3),
              dimnames=list(c("Test","Placebo"),
                            c("None","Some","Marked")))
drug
##         None Some Marked
## Test      13    7     21
## Placebo   29    7      7
CMHtest(drug)
## Cochran-Mantel-Haenszel Statistics 
## 
##                  AltHypothesis  Chisq Df       Prob
## cor        Nonzero correlation 12.859  1 0.00033586
## rmeans  Row mean scores differ 12.859  1 0.00033586
## cmeans  Col mean scores differ 12.900  2 0.00158084
## general    General association 12.900  2 0.00158084

For two ordinal categorical variables, we use the cor line of the CMH test.