R Notes: Descriptive Statistical Analysis

 Memo_Cleon 2020-05-01

When analysing data, we usually begin by looking at its basic properties: the sum, mean, standard deviation, standard error, median, quartiles, minimum, maximum, range, skewness, kurtosis, and so on. None of this is hard in R. Many packages provide functions for it, so many that it can be difficult to choose among them. This article covers only two: summary{base} from the base installation and stat.desc{pastecs}. It also uses by(), the function for computing and printing results by group.

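Example: 16 subjects (id) are divided into two groups, A and B (group), with 8 subjects in each; the weight (weight) and height (height) of every subject are measured.
Loading the data: read the Stata file Multivariate into the R data frame ma with the following commands: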

library(foreign)  # provides read.dta() for reading Stata files

ma <- read.dta("D:/Temp/STATA/Multivariate.dta")
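If the Stata file is not at hand, a stand-in data frame with the same layout lets the commands that follow be tried out. The sketch below assumes the four columns are id, group, weight, and height in that order (as the column indexes 3:4 used later suggest). The numeric values are simulated and purely hypothetical; they are not the article's data.

```r
# Hypothetical stand-in for Multivariate.dta: 16 subjects, two groups of 8.
# The column order (id, group, weight, height) is an assumption and the
# values are simulated, so results will differ from the original data.
set.seed(1)
ma <- data.frame(
  id     = 1:16,
  group  = factor(rep(c("A", "B"), each = 8)),
  weight = round(rnorm(16, mean = 60, sd = 8), 1),
  height = round(rnorm(16, mean = 165, sd = 7), 1)
)

str(ma)          # check the column types
table(ma$group)  # 8 subjects in each group
```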

【1】summary
The summary() function from the base installation reports the minimum, maximum, quartiles, and mean of numeric variables, and frequency counts for factors and logical vectors.
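A small illustration of how summary() adapts to the type of its argument may help here. The vectors x, g, and flag are made up for the example.

```r
# Numeric vector: minimum, quartiles, median, mean, maximum
x <- c(1, 2, 3, 4, 5)
summary(x)     # Min 1, 1st Qu. 2, Median 3, Mean 3, 3rd Qu. 4, Max 5

# Factor: frequency of each level
g <- factor(c("A", "A", "B"))
summary(g)     # A = 2, B = 1

# Logical vector: counts of FALSE and TRUE
flag <- c(TRUE, FALSE, TRUE)
summary(flag)  # FALSE = 1, TRUE = 2
```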

Ignoring the grouping for now, basic information on the weight and height of all 16 subjects can be obtained with:

summary(ma["weight"])

summary(ma["height"])

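Or output the basic information for weight and height together:

vars <- c("weight","height")  # named vars so that the stats function var() is not masked

summary(ma[vars])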

Of course, a single command does the same:

summary(ma[c("weight","height")])  # equivalent to summary(ma[3:4]) or summary(ma[-1:-2])

The commands above are only a demonstration. In practice we want the basic information for groups A and B separately, which requires grouped computation. The function by {base} (Apply a Function to a Data Frame Split by Factors) has the form by(data, INDICES, FUN, ..., simplify = TRUE); see help("by") for details. It splits data into several data frames according to INDICES and then applies FUN to each of them.

The grouped computation for this example is:

by(ma[c("weight","height")],ma$group,summary)  # split the weight and height variables of ma by group, run summary on each group, and print the descriptive statistics
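The split-then-apply behaviour of by() can be spelled out by hand. The sketch below is conceptual rather than a copy of how by() is implemented internally, and the small data frame is hypothetical (only the column names follow the article).

```r
# Hypothetical data using the article's column names; values are made up.
ma <- data.frame(
  group  = factor(rep(c("A", "B"), each = 4)),
  weight = c(50, 52, 54, 56, 60, 62, 64, 66),
  height = c(160, 162, 164, 166, 170, 172, 174, 176)
)

# Step 1: split the data frame into one data frame per group.
pieces <- split(ma[c("weight", "height")], ma$group)

# Step 2: apply the function to each piece.
res_manual <- lapply(pieces, summary)

# by() performs both steps in one call.
res_by <- by(ma[c("weight", "height")], ma$group, summary)

res_manual$A  # summary for group A from the manual route
res_by        # the same per-group summaries, printed with group headers
```

The manual route returns an ordinary named list, which can be easier to post-process than the object returned by by().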

【2】stat.desc{pastecs}

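stat.desc(x, basic=TRUE, desc=TRUE, norm=FALSE, p=0.95) is a fairly powerful function that returns a wide range of descriptive measures. x is a data frame or a time series. By default (basic=TRUE, desc=TRUE) the function returns the number of values, null values, and missing values in x, together with the minimum, maximum, range, sum, median, mean, standard error of the mean, confidence interval of the mean at level p, variance, standard deviation, and coefficient of variation. With norm=TRUE (the default is FALSE) it also returns statistics related to the normal distribution: skewness and kurtosis (with their significance criteria) and the result of the Shapiro-Wilk normality test. p=0.95 is the default confidence level used to compute the confidence interval of the mean.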
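To make a few of these quantities concrete, the following base-R sketch computes the standard error of the mean, a t-based confidence interval half-width, and the coefficient of variation by hand. Whether stat.desc() reports CI.mean as exactly this half-width is an assumption here; the vector x is hypothetical.

```r
x <- c(50, 52, 54, 56, 58, 60, 62, 64)  # hypothetical weights
p <- 0.95                               # confidence level
n <- length(x)

se_mean  <- sd(x) / sqrt(n)                         # standard error of the mean
ci_half  <- qt(0.5 + p / 2, df = n - 1) * se_mean   # half-width of the t interval
coef_var <- sd(x) / mean(x)                         # coefficient of variation

c(mean = mean(x), SE.mean = se_mean, CI.half.width = ci_half, coef.var = coef_var)

# Cross-check: mean +/- half-width matches the interval from t.test()
c(lower = mean(x) - ci_half, upper = mean(x) + ci_half)
t.test(x, conf.level = p)$conf.int
```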

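The commands continue from the data-loading commands above:

library(pastecs)  # load the pastecs package; it is not part of the default installation and must first be installed with install.packages("pastecs")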

stat.desc(ma[3:4],norm=TRUE,p=0.95)

The commands for computing by group are:

stat.desc(ma[1:8,3],norm=TRUE)  #weightA

stat.desc(ma[9:16,"weight"],norm=TRUE)  #weightB

stat.desc(ma[1:8,"height"],norm=TRUE)  #heightA

stat.desc(ma[9:16,4],norm=TRUE)  #heightB
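Selecting rows 1:8 and 9:16 only works when the data are sorted by group. A sketch of a variant that selects rows by the value of group instead is shown below; the data frame is hypothetical and deliberately unsorted, and summary() stands in for stat.desc() so that the example runs without extra packages.

```r
# Hypothetical, deliberately unsorted data with the article's column names.
ma <- data.frame(
  id     = 1:6,
  group  = factor(c("A", "B", "A", "B", "A", "B")),
  weight = c(50, 60, 52, 62, 54, 64),
  height = c(160, 170, 162, 172, 164, 174)
)

# Pick rows by group membership rather than by row position.
weightA <- ma[ma$group == "A", "weight"]
weightB <- ma[ma$group == "B", "weight"]

summary(weightA)
summary(weightB)
```

The same subsets can be passed to stat.desc(..., norm=TRUE) in place of summary().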

Of course, by() can also be used to compute and print the results by group directly:

by(ma[3:4],ma$group,stat.desc)  # split columns 3 and 4 of ma by group, run stat.desc on each group, and print the descriptive statistics

To include the normal-distribution statistics as well, use:

by(ma[3:4],ma$group,function(x)stat.desc(x,norm=TRUE))  # split columns 3 and 4 of ma by group, run stat.desc on each group, and print both the basic descriptive statistics and the normal-distribution statistics

About the function stat.desc():

stat.desc{pastecs}: Descriptive statistics on a data frame or time series. Compute a table giving various descriptive statistics about the series in a data frame or in a single/multiple time series

Usage: stat.desc(x, basic=TRUE, desc=TRUE, norm=FALSE, p=0.95)

x: a data frame or a time series

basic: do we have to return basic statistics (by default, it is TRUE)? These are: the number of values (nbr.val), the number of null values (nbr.null), the number of missing values (nbr.na), the minimal value (min), the maximal value (max), the range (range, that is, max-min) and the sum of all non-missing values (sum)

desc: do we have to return various descriptive statistics (by default, it is TRUE)? These are: the median (median), the mean (mean), the standard error on the mean (SE.mean), the confidence interval of the mean (CI.mean) at the p level, the variance (var), the standard deviation (std.dev) and the variation coefficient (coef.var) defined as the standard deviation divided by the mean

norm: do we have to return normal distribution statistics (by default, it is FALSE)? the skewness coefficient g1 (skewness), its significant criterium (skew.2SE, that is, g1/2.SEg1; if skew.2SE > 1, then skewness is significantly different than zero), kurtosis coefficient g2 (kurtosis), its significant criterium (kurt.2SE, same remark than for skew.2SE), the statistic of a Shapiro-Wilk test of normality (normtest.W) and its associated probability (normtest.p)

p: the probability level to use to calculate the confidence interval on the mean (CI.mean). By default, p=0.95
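The skew.2SE criterion described under norm can be sketched in base R as follows. The moment-based estimator of g1 and the standard-error formula used here are common textbook choices and are an assumption; they are not confirmed to be the exact expressions pastecs uses. The vector x is hypothetical.

```r
x <- c(50, 52, 53, 55, 56, 58, 61, 70)  # hypothetical, right-skewed sample
n <- length(x)

# Skewness coefficient g1 (moment-based estimator; assumed form)
g1 <- sum((x - mean(x))^3) / (n * sd(x)^3)

# Standard error of g1 for a sample of size n (assumed form)
se_g1 <- sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

# Criterion g1 / (2 * SE): a magnitude above 1 suggests skewness differs from zero
skew_2se <- g1 / (2 * se_g1)

# Shapiro-Wilk normality test from base R (stats package)
sw <- shapiro.test(x)

c(skewness = g1, skew.2SE = skew_2se,
  normtest.W = unname(sw$statistic), normtest.p = sw$p.value)

abs(skew_2se) > 1  # TRUE would indicate significant skewness under this criterion
```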

END

