In this lecture we will discuss how to analyze contingency tables in R. Most of the focus will be on 2x2 tables, but we will discuss some methods for rxc tables. In particular, we will discuss
Let’s consider the dental data again. Recall, that this dataset has two variables
Does frequent exposure to chlorinated swimming pool water increase the risk of dental erosion?
dental <- array(c(32,17,118,127), dim=c(2,2),
dimnames = list(c(">=6hrs","<6hrs"),
c("Erosion (Yes)", "Erosion (No)")))
dental
## Erosion (Yes) Erosion (No)
## >=6hrs 32 118
## <6hrs 17 127
prop.table(dental) #cell percentages
## Erosion (Yes) Erosion (No)
## >=6hrs 0.10884354 0.4013605
## <6hrs 0.05782313 0.4319728
prop.table(dental, margin=1) #row percentages
## Erosion (Yes) Erosion (No)
## >=6hrs 0.2133333 0.7866667
## <6hrs 0.1180556 0.8819444
prop.table(dental, margin=2) #column percentages
## Erosion (Yes) Erosion (No)
## >=6hrs 0.6530612 0.4816327
## <6hrs 0.3469388 0.5183673
chisq.test(dental) #with continuity correction
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: dental
## X-squared = 4.1405, df = 1, p-value = 0.04187
chisq.test(dental,correct=FALSE) #without continuity correction
##
## Pearson's Chi-squared test
##
## data: dental
## X-squared = 4.802, df = 1, p-value = 0.02843
Recall that the chi-square test is only valid if the expected cell counts are greater than 5. If this fails for any of the cells, then we need to use another test. One such test is Fisher’s exact test. In the following example, the chi-square test would not be appropriate.
glasses <- array(c(1,5,8,2), dim=c(2,2),
dimnames=list(c("Juvenile Delinquent", "Non-delinquent"), c("Wears Glasses", "No Glasses")))
glasses
## Wears Glasses No Glasses
## Juvenile Delinquent 1 8
## Non-delinquent 5 2
glasses.chi <- chisq.test(glasses)
## Warning in chisq.test(glasses): Chi-squared approximation may be incorrect
glasses.chi$expected
## Wears Glasses No Glasses
## Juvenile Delinquent 3.375 5.625
## Non-delinquent 2.625 4.375
fisher.test(glasses)
##
## Fisher's Exact Test for Count Data
##
## data: glasses
## p-value = 0.03497
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
## 0.0009525702 0.9912282442
## sample estimates:
## odds ratio
## 0.06464255
McNemar’s test is for comparing proportions with matched pairs data. Let’s take another look at the marijuana sleep study data. Recall that here the question of interest is “Does marijuana use have an effect on sleeping trouble?” 32 subjects who use marijuana were selected and 32 mathced subjects who did not use marijuana were selected and asked if they had trouble sleeping. This results in the following data:
mcnemar <- array(c(4,3,9,16), dim=c(2,2),
dimnames=list("Control"=c("Yes", "No"),
"Marijuana"=c("Yes","No")))
mcnemar
## Marijuana
## Control Yes No
## Yes 4 9
## No 3 16
mcnemar.test(mcnemar, correct=FALSE)
##
## McNemar's Chi-squared test
##
## data: mcnemar
## McNemar's chi-squared = 3, df = 1, p-value = 0.08326
prop.test(dental, correct=FALSE)
##
## 2-sample test for equality of proportions without continuity
## correction
##
## data: dental
## X-squared = 4.802, df = 1, p-value = 0.02843
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## 0.01116222 0.17939334
## sample estimates:
## prop 1 prop 2
## 0.2133333 0.1180556
library(Epi)
## Warning: package 'Epi' was built under R version 3.4.2
##
## Attaching package: 'Epi'
## The following object is masked from 'package:base':
##
## merge.data.frame
twoby2(dental)
## 2 by 2 table analysis:
## ------------------------------------------------------
## Outcome : Erosion (Yes)
## Comparing : >=6hrs vs. <6hrs
##
## Erosion (Yes) Erosion (No) P(Erosion (Yes)) 95% conf. interval
## >=6hrs 32 118 0.2133 0.1550 0.2861
## <6hrs 17 127 0.1181 0.0747 0.1817
##
## 95% conf. interval
## Relative Risk: 1.8071 1.0510 3.1070
## Sample Odds Ratio: 2.0259 1.0689 3.8398
## Conditional MLE Odds Ratio: 2.0211 1.0269 4.0989
## Probability difference: 0.0953 0.0098 0.1794
##
## Exact P-value: 0.0296
## Asymptotic P-value: 0.0304
## ------------------------------------------------------
policy <- array(c(48,96,12,94,55,7,135,53), dim=c(2,2,2),
dimnames=list(c("Low","High"),
c("FA","UFA"),
c("Urban","Rural")))
policy
## , , Urban
##
## FA UFA
## Low 48 12
## High 96 94
##
## , , Rural
##
## FA UFA
## Low 55 135
## High 7 53
mantelhaen.test(policy, correct=FALSE)
##
## Mantel-Haenszel chi-squared test without continuity correction
##
## data: policy
## Mantel-Haenszel X-squared = 23.05, df = 1, p-value = 1.578e-06
## alternative hypothesis: true common odds ratio is not equal to 1
## 95 percent confidence interval:
## 2.066744 6.069367
## sample estimates:
## common odds ratio
## 3.541726
library(DescTools)
## Warning: package 'DescTools' was built under R version 3.4.2
BreslowDayTest(policy)
##
## Breslow-Day test on Homogeneity of Odds Ratios
##
## data: policy
## X-squared = 0.18297, df = 1, p-value = 0.6688
For two nominal categorical variables we use the general association line of the CMH test.
strain <- array(c(11,42,30,79,14,63,3,52), dim=c(2,4),
dimnames=list(c("Y","N"),
c("G1","G2","G3","G4")))
strain
## G1 G2 G3 G4
## Y 11 30 14 3
## N 42 79 63 52
library(vcdExtra)
## Warning: package 'vcdExtra' was built under R version 3.4.2
## Loading required package: vcd
## Warning: package 'vcd' was built under R version 3.4.2
## Loading required package: grid
## Loading required package: gnm
## Warning: package 'gnm' was built under R version 3.4.2
CMHtest(strain)
## Cochran-Mantel-Haenszel Statistics
##
## AltHypothesis Chisq Df Prob
## cor Nonzero correlation 6.6214 1 0.0100759
## rmeans Row mean scores differ 6.6214 1 0.0100759
## cmeans Col mean scores differ 11.3707 3 0.0098812
## general General association 11.3707 3 0.0098812
For one nominal and one ordinal, we use the row mean scores line of the CMH test.
drug <- array(c(13,29,7,7,21,7),dim=c(2,3),
dimnames=list(c("Test","Placebo"),
c("None","Some","Marked")))
drug
## None Some Marked
## Test 13 7 21
## Placebo 29 7 7
CMHtest(drug)
## Cochran-Mantel-Haenszel Statistics
##
## AltHypothesis Chisq Df Prob
## cor Nonzero correlation 12.859 1 0.00033586
## rmeans Row mean scores differ 12.859 1 0.00033586
## cmeans Col mean scores differ 12.900 2 0.00158084
## general General association 12.900 2 0.00158084
For two ordinal categorical variables, we use the cor line of the CMH test.