R_ContingencyTables

In this lecture we will discuss how to analyze contingency tables in R. Most of the focus will be on 2x2 tables, but we will discuss some methods for rxc tables. In particular, we will discuss

Chi-square test of independence
Fisher’s Exact Test
Risk Difference
McNemar’s Test
Relative Risk
Odds Ratio
Cochran-Mantel-Hanzel Test
Breslow-Day Test

Chi-square test of independence

Let’s consider the dental data again. Recall, that this dataset has two variables

Frequent swimmer (>= 6 hours or <6 hours)
Dental erosin (Yes/No).

Does frequent exposure to chlorinated swimming pool water increase the risk of dental erosion?

dental <- array(c(32,17,118,127), dim=c(2,2),
                dimnames = list(c(">=6hrs","<6hrs"), 
                                c("Erosion (Yes)", "Erosion (No)")))
dental

##        Erosion (Yes) Erosion (No)
## >=6hrs            32          118
## <6hrs             17          127

prop.table(dental) #cell percentages

##        Erosion (Yes) Erosion (No)
## >=6hrs    0.10884354    0.4013605
## <6hrs     0.05782313    0.4319728

prop.table(dental, margin=1) #row percentages

##        Erosion (Yes) Erosion (No)
## >=6hrs     0.2133333    0.7866667
## <6hrs      0.1180556    0.8819444

prop.table(dental, margin=2) #column percentages

##        Erosion (Yes) Erosion (No)
## >=6hrs     0.6530612    0.4816327
## <6hrs      0.3469388    0.5183673

chisq.test(dental) #with continuity correction

## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  dental
## X-squared = 4.1405, df = 1, p-value = 0.04187

chisq.test(dental,correct=FALSE) #without continuity correction

## 
##  Pearson's Chi-squared test
## 
## data:  dental
## X-squared = 4.802, df = 1, p-value = 0.02843

Fisher’s Exact Test

Recall that the chi-square test is only valid if the expected cell counts are greater than 5. If this fails for any of the cells, then we need to use another test. One such test is Fisher’s exact test. In the following example, the chi-square test would not be appropriate.

glasses <- array(c(1,5,8,2), dim=c(2,2),
                 dimnames=list(c("Juvenile Delinquent", "Non-delinquent"), c("Wears Glasses", "No Glasses")))

glasses

##                     Wears Glasses No Glasses
## Juvenile Delinquent             1          8
## Non-delinquent                  5          2

glasses.chi <- chisq.test(glasses)

## Warning in chisq.test(glasses): Chi-squared approximation may be incorrect

glasses.chi$expected

##                     Wears Glasses No Glasses
## Juvenile Delinquent         3.375      5.625
## Non-delinquent              2.625      4.375

fisher.test(glasses)

## 
##  Fisher's Exact Test for Count Data
## 
## data:  glasses
## p-value = 0.03497
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  0.0009525702 0.9912282442
## sample estimates:
## odds ratio 
## 0.06464255

McNemar’s Test

McNemar’s test is for comparing proportions with matched pairs data. Let’s take another look at the marijuana sleep study data. Recall that here the question of interest is “Does marijuana use have an effect on sleeping trouble?” 32 subjects who use marijuana were selected and 32 mathced subjects who did not use marijuana were selected and asked if they had trouble sleeping. This results in the following data:

mcnemar <- array(c(4,3,9,16), dim=c(2,2),
                 dimnames=list("Control"=c("Yes", "No"),
                               "Marijuana"=c("Yes","No")))
mcnemar

##        Marijuana
## Control Yes No
##     Yes   4  9
##     No    3 16

mcnemar.test(mcnemar, correct=FALSE)

## 
##  McNemar's Chi-squared test
## 
## data:  mcnemar
## McNemar's chi-squared = 3, df = 1, p-value = 0.08326

Risk Difference

prop.test(dental, correct=FALSE)

## 
##  2-sample test for equality of proportions without continuity
##  correction
## 
## data:  dental
## X-squared = 4.802, df = 1, p-value = 0.02843
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  0.01116222 0.17939334
## sample estimates:
##    prop 1    prop 2 
## 0.2133333 0.1180556

Risk Ratio and Odds Ratio

library(Epi)

## Warning: package 'Epi' was built under R version 3.4.2

## 
## Attaching package: 'Epi'

## The following object is masked from 'package:base':
## 
##     merge.data.frame

twoby2(dental)

## 2 by 2 table analysis: 
## ------------------------------------------------------ 
## Outcome   : Erosion (Yes) 
## Comparing : >=6hrs vs. <6hrs 
## 
##        Erosion (Yes) Erosion (No)    P(Erosion (Yes)) 95% conf. interval
## >=6hrs            32          118              0.2133    0.1550   0.2861
## <6hrs             17          127              0.1181    0.0747   0.1817
## 
##                                    95% conf. interval
##              Relative Risk: 1.8071    1.0510   3.1070
##          Sample Odds Ratio: 2.0259    1.0689   3.8398
## Conditional MLE Odds Ratio: 2.0211    1.0269   4.0989
##     Probability difference: 0.0953    0.0098   0.1794
## 
##              Exact P-value: 0.0296 
##         Asymptotic P-value: 0.0304 
## ------------------------------------------------------

Cochran-Mantel-Haenszel Chi-square Test for Count Data

policy <- array(c(48,96,12,94,55,7,135,53), dim=c(2,2,2),
                dimnames=list(c("Low","High"),
                              c("FA","UFA"),
                              c("Urban","Rural")))
policy

## , , Urban
## 
##      FA UFA
## Low  48  12
## High 96  94
## 
## , , Rural
## 
##      FA UFA
## Low  55 135
## High  7  53

mantelhaen.test(policy, correct=FALSE)

## 
##  Mantel-Haenszel chi-squared test without continuity correction
## 
## data:  policy
## Mantel-Haenszel X-squared = 23.05, df = 1, p-value = 1.578e-06
## alternative hypothesis: true common odds ratio is not equal to 1
## 95 percent confidence interval:
##  2.066744 6.069367
## sample estimates:
## common odds ratio 
##          3.541726

library(DescTools)

## Warning: package 'DescTools' was built under R version 3.4.2

BreslowDayTest(policy)

## 
##  Breslow-Day test on Homogeneity of Odds Ratios
## 
## data:  policy
## X-squared = 0.18297, df = 1, p-value = 0.6688

General rxc tables and CMH tests

For two nominal categorical variables we use the general association line of the CMH test.

strain <- array(c(11,42,30,79,14,63,3,52), dim=c(2,4),
                dimnames=list(c("Y","N"),
                              c("G1","G2","G3","G4")))
strain

##   G1 G2 G3 G4
## Y 11 30 14  3
## N 42 79 63 52

library(vcdExtra)

## Warning: package 'vcdExtra' was built under R version 3.4.2

## Loading required package: vcd

## Warning: package 'vcd' was built under R version 3.4.2

## Loading required package: grid

## Loading required package: gnm

## Warning: package 'gnm' was built under R version 3.4.2

CMHtest(strain)

## Cochran-Mantel-Haenszel Statistics 
## 
##                  AltHypothesis   Chisq Df      Prob
## cor        Nonzero correlation  6.6214  1 0.0100759
## rmeans  Row mean scores differ  6.6214  1 0.0100759
## cmeans  Col mean scores differ 11.3707  3 0.0098812
## general    General association 11.3707  3 0.0098812

For one nominal and one ordinal, we use the row mean scores line of the CMH test.

drug <- array(c(13,29,7,7,21,7),dim=c(2,3),
              dimnames=list(c("Test","Placebo"),
                            c("None","Some","Marked")))
drug

##         None Some Marked
## Test      13    7     21
## Placebo   29    7      7

CMHtest(drug)

## Cochran-Mantel-Haenszel Statistics 
## 
##                  AltHypothesis  Chisq Df       Prob
## cor        Nonzero correlation 12.859  1 0.00033586
## rmeans  Row mean scores differ 12.859  1 0.00033586
## cmeans  Col mean scores differ 12.900  2 0.00158084
## general    General association 12.900  2 0.00158084

For two ordinal categorical variables, we use the cor line of the CMH test.