Hypothesis Testing in R
Key point: Hypothesis testing is one of the workhorses of science. It is how we can draw conclusions or make decisions based on finite samples of data. [stanford - bios221 - chapter6]
Recommended web resource: 生物统计学与R极简手册 (a minimal handbook of biostatistics with R)
Law of large numbers: when the sample size is large enough, the observed frequency of an event approximates its probability.
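A minimal R sketch of the law of large numbers, simulating fair-coin flips (the probability 0.5 and the sample sizes are illustrative assumptions, not from the original text):
# the running frequency of heads approaches the true probability 0.5 as n grows
set.seed(1)
flips = rbinom(10000, size = 1, prob = 0.5)      # 10,000 simulated fair-coin flips
running_freq = cumsum(flips) / seq_along(flips)  # frequency of heads after each flip
running_freq[c(10, 100, 1000, 10000)]            # values get closer and closer to 0.5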
Central limit theorem: in rigorous mathematical form, the central limit theorem states that for large samples the sample mean is approximately normally distributed, regardless of the distribution of the population. If a random variable can be written as the sum of a sequence of independent, identically distributed random variables, the central limit theorem can be applied directly. Used appropriately, it is extremely useful for solving practical problems.
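A minimal R sketch of the central limit theorem, drawing samples from a skewed exponential population (the sample size of 30 and the 5000 replicates are illustrative assumptions):
# means of samples from a clearly non-normal population are approximately normal
set.seed(1)
sample_means = replicate(5000, mean(rexp(30, rate = 1)))  # 5000 sample means, n = 30 each
hist(sample_means, breaks = 50)   # roughly bell-shaped, centered near the population mean 1
qqnorm(sample_means); qqline(sample_means)   # points fall close to a straight line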
1. Common tests fall into parametric tests and non-parametric tests. Parametric tests include t.test; non-parametric tests include wilcox.test and fisher.test (Fisher's exact test).
2. The five steps of hypothesis testing (a minimal worked example is sketched after this list)
1. Design the experiment according to the research question and choose an appropriate test statistic.
2. State the null hypothesis: that there is no difference, no effect.
3. Determine the rejection region.
4. Collect and process the data, and compute the test statistic.
5. Make a decision: if the statistic falls in a small-probability (rejection) region, reject the null hypothesis, on the grounds that a small-probability event should not occur in a single experiment.
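A minimal worked example following these five steps, assuming a hypothetical coin-tossing experiment (39 heads in 50 tosses; the numbers are made up for illustration):
# Steps 1-2: the statistic is the number of heads in 50 tosses; H0: the coin is fair (p = 0.5)
# Step 3: use a two-sided rejection region at the alpha = 0.05 level
# Step 4: collect the data (39 heads observed) and compute the exact binomial p-value
bt = binom.test(39, 50, p = 0.5, alternative = "two.sided")
# Step 5: the p-value is far below 0.05, so we reject the null hypothesis of a fair coin
bt$p.value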
3. The two types of error in hypothesis testing.
|                        | Null hypothesis is true       | Null hypothesis is false       |
| Reject null hypothesis | Type I error (false positive) | True positive                  |
| Do not reject          | True negative                 | Type II error (false negative) |
Which of the two errors to prioritize reducing depends on the situation; there is always a trade-off between them.
For the disclosure and reporting of epidemics such as SARS, the criterion should be relatively lenient, because the cost of a false negative (failing to report a real outbreak) is too high.
For clinical trials of new drugs (e.g. for pregnant women), the criterion should be as strict as possible, because the cost of a false positive (accepting an ineffective or unsafe drug) is too high.
Type I error (false positive):
With a stricter threshold (a smaller p-value cutoff, e.g. 0.01), a higher proportion of the selected results are true, but more true effects are wrongly filtered out.
The false positive rate (FPR) is the same as the significance level α mentioned above; 1 − α is also called the specificity of a test.
Type II error (false negative):
With a looser threshold (a larger p-value cutoff, e.g. 0.05), more true effects are selected in total, but more false ones are wrongly accepted.
The false negative rate (FNR) is sometimes also called β, and 1 − β is the power, sensitivity, or true positive rate of a test.
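A small simulation sketch of these two error rates for a two-sample t-test (the effect size, group size, and number of simulations are illustrative assumptions):
# estimate the Type I error rate (alpha) and the power (1 - beta) by simulation
set.seed(1)
alpha = 0.05
# Type I error: both groups drawn from the same N(0, 1) distribution
p_null = replicate(5000, t.test(rnorm(20), rnorm(20))$p.value)
mean(p_null < alpha)    # close to 0.05, the false positive rate
# Power: the second group's true mean is shifted by 1 standard deviation
p_alt = replicate(5000, t.test(rnorm(20), rnorm(20, mean = 1))$p.value)
mean(p_alt < alpha)     # roughly 0.87 here; 1 minus this estimates beta, the false negative rate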
# Test whether iris sepal length and sepal width differ
# two-sample (Welch) t-test, treating the two columns as independent groups
> t1=t.test(iris$Sepal.Length, iris$Sepal.Width); t1
Welch Two Sample t-test
data: iris$Sepal.Length and iris$Sepal.Width
t = 36.463, df = 225.68, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
2.63544 2.93656
sample estimates:
mean of x mean of y
5.843333 3.057333
> t1$p.value
[1] 1.459543e-96
# paired t-test (a one-sample t-test on the per-flower differences); pairing is usually more efficient when measurements are correlated
> t2=t.test(iris$Sepal.Length - iris$Sepal.Width); t2
One Sample t-test
data: iris$Sepal.Length - iris$Sepal.Width
t = 34.815, df = 149, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
2.627874 2.944126
sample estimates:
mean of x
2.786
> t2$p.value
[1] 1.849554e-73
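For comparison with the parametric tests above, a non-parametric counterpart can be run on the same differences; this Wilcoxon signed-rank example is an illustrative addition, not part of the original analysis:
# non-parametric counterpart of the paired t-test: Wilcoxon signed-rank test on the differences
# (the many ties make R fall back to a normal approximation and print a warning)
w1 = wilcox.test(iris$Sepal.Length - iris$Sepal.Width)
w1$p.value   # also extremely small, consistent with the t-tests above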
Multi-factor, multi-level analysis of variance (ANOVA) in R, including interaction effects.
References: pdf1, pdf2, pdf3
# five factors with interactions among them; A has 4 levels, the others have varying numbers of levels
# simulate the data
set.seed(202112)
data2=data.frame(
value=c(rnorm(100, 70, 15), rnorm(100, 80, 15),
rnorm(100, 90, 15), rnorm(100, 100, 15)),
A=rep(c(1,2,3,4), each=100),   # A should be significant (its level tracks the mean of value)
B=rep(c(1,2,3,4), times=100),
C=rep(c(4,1,2,3, 5), times=80),
D=rep(c(3,4,1,2, 5,6,7,8,9,100), times=40),
E=rep(c(2,3,4,1, 5,6,7,8), times=50)
)
# make B significant: add an offset that depends on the row index modulo 4, which tracks B's level
for(i in 1:400){
data2[i, "value"]=data2[i, "value"] + (i %%4)*2
}
# convert the numeric level codes to factors (required, otherwise they are treated as continuous covariates)
data2$A=as.factor(data2$A)
data2$B=as.factor(data2$B)
data2$C=as.factor(data2$C)
data2$D=as.factor(data2$D)
data2$E=as.factor(data2$E)
str(data2)
dim(data2)
head(data2)
# ANOVA with crossed (interaction) effects
aov2=aov(value~A*B*C*D*E, data2) #very slow(>1 min)
summary(aov2)
# method 2: the equivalent fit via lm() followed by anova()
result=anova(lm(value ~ A*B*C*D*E, data=data2)) #very slow(>1min)
result
Output:
Analysis of Variance Table
Response: value
Df Sum Sq Mean Sq F value Pr(>F)
A 3 45445 15148.3 72.5612 < 2e-16 ***
B 3 2130 710.2 3.4017 0.01844 *
C 4 662 165.5 0.7928 0.53085
D 4 679 169.7 0.8127 0.51808
E 4 1557 389.2 1.8643 0.11739
A:B 9 1799 199.9 0.9573 0.47619
A:C 12 2185 182.1 0.8722 0.57602
B:C 8 1337 167.1 0.8004 0.60276
A:D 12 2468 205.6 0.9850 0.46380
A:E 12 4005 333.7 1.5986 0.09259 .
C:E 16 4218 263.6 1.2629 0.22207
A:B:C 24 6261 260.9 1.2496 0.20081
A:C:E 48 14631 304.8 1.4601 0.03521 *
Residuals 240 50104 208.8
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Multiple comparisons
> TukeyHSD(aov(value~A, data2))
> TukeyHSD(aov(value~A*C*E, data2)) #includes all the interactions
> TukeyHSD(aov2) #many outputs
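A short sketch of how the Tukey output can be inspected further (illustrative; `tk` is just a local name, not from the original text):
# extract and plot the pairwise comparisons for factor A
tk = TukeyHSD(aov(value ~ A, data2))
tk$A        # difference, 95% CI, and adjusted p-value for every pair of levels of A
plot(tk)    # confidence intervals of all pairwise differences; those not crossing 0 are significant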
The permutation test, introduced by R. A. Fisher in the 1930s, is often used to compare means. It is similar in spirit to the bootstrap, but a fundamental difference between the two is that the bootstrap resamples with replacement while the permutation test does not.
Why does the permutation test work? Under H0 the two samples come from the same distribution, so randomly exchanging the elements of the two groups gives a new data set from that same distribution.
Steps of a permutation test:
1) Determine and calculate the initial test statistic, e.g. d0 = mean(X1) - mean(X2).
2) Permute the pooled data, recompute the statistic, and repeat N times (N large enough) to build the approximate null distribution d1, ..., dN.
3) The p-value is the proportion of permuted statistics at least as extreme as the initial one: p = #(|dn| >= |d0|) / N.
If we want the Type I error probability to be 0.05, we conclude that the groups differ if the middle 95% of the values d1, ..., dN does not contain d0.
Both the bootstrap and the permutation test are nonparametric procedures in the sense that they make no specific assumptions about the form of any underlying probability distributions.
Bootstrap methods measure the accuracy of an estimator of a parameter, while permutation tests yield new ways to test certain statistical hypotheses.
# Permutation test: is there a difference between the means of two groups?
Group1 =c(6, 19, 34, 15)
Group2 =c(9, 21, 8, 53, 25)
d0=mean(Group1)-mean(Group2); d0 #-4.7
# pool the two groups
pool=c(Group1, Group2); pool
# after shuffling, the first 4 values form one group and the last 5 the other; compute the difference in means
oneTest=function(){
dt2=sample(pool, size=length(pool), replace = F)
mean(dt2[1:4]) - mean(dt2[5:9])
}
N=10000
set.seed(2021)
dn=replicate(N, oneTest())
# plot the permutation distribution
hist(dn, n=50)
abline(v=c(d0, -d0), lty=2, lwd=2, col="red")
# count how many permutation statistics exceed |d0|
len=length( dn[which( abs(dn) > abs(d0) )] ); len
# compute the p-value
p=len/N; p #[1] 0.7111
# the p-value is large, so we cannot reject the null hypothesis: there is no significant difference between the two group means
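For contrast with the permutation test above, a percentile bootstrap for the same mean difference resamples each group with replacement (an illustrative sketch, not from the original text):
# bootstrap: resample each group WITH replacement and build a 95% CI for the mean difference
set.seed(2021)
boot_diff = replicate(10000,
  mean(sample(Group1, replace = TRUE)) - mean(sample(Group2, replace = TRUE)))
quantile(boot_diff, c(0.025, 0.975))  # wide interval containing 0, consistent with the permutation result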
# Permutation test: is there a difference between the medians of two groups?
SX = c(-2, -1, -1, 0, -2, -1, 0, -1, -2, 100)
SY = c(7, 13, 11, 5, 14, 9, 8, 10, 12, 11)
# the test statistic is the difference in medians
d0=median(SX) - median(SY); d0 #-11.5
length(SX) #10
length(SY) #10
oneTest=function(){
SS=sample( c(SX, SY) )
median(SS[1:10]) - median(SS[11:20])
}
# shuffle the pooled data and recompute the difference in medians; repeat many times
set.seed(202108)
TestNum=10000
result=replicate(TestNum, oneTest() )
# plot the permutation distribution
hist(result, n=50)
abline(v=c(d0, -d0), lty=2, lwd=2, col="red")
# count how many permutation statistics exceed |d0|
len=length( result[abs(result) > abs(d0)] ); len #33
p=len/(TestNum); p #0.0033
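# the p-value is well below 0.05, so we reject the null hypothesis: the two medians differ significantly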
References:
http://web.stanford.edu/class/bios221/book/Chap-Testing.html
https://www.sciencedirect.com/topics/mathematics/permutation-test
https://www.sciencedirect.com/topics/mathematics/bootstrap-method