Scatter plots in survey sampling

When analyzing survey data, you must account for the stochastic structure of the sample that was selected to obtain the data, and plots and graphics are no exception. The main aim of such studies is to infer how the outcomes of interest behave in …

dplyr and the design effect in survey samples

For those who, like me, are not hardcore R geeks, this trick could be of interest. The dplyr package is very useful for data manipulation and for extracting valuable information from a data frame. For example, suppose you want to count how many humans have …
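The teaser does not show how the post computes the design effect. As a reference point only, this is the standard (Kish) definition. It compares the variance of an estimator under the actual sampling design p with its variance under simple random sampling of the same size n. It also gives the effective sample size implied by that ratio.

```latex
\[
\mathrm{Deff}(\hat{\theta}) = \frac{\operatorname{Var}_{p}(\hat{\theta})}{\operatorname{Var}_{\mathrm{SRS}}(\hat{\theta})},
\qquad
n_{\mathrm{eff}} = \frac{n}{\mathrm{Deff}(\hat{\theta})}
\]
```

A Deff greater than 1 means the design (for example, cluster sampling) loses precision relative to simple random sampling.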

Automatic output format in Rmarkdown

I am writing an Rmarkdown document with plenty of tables, and I want them in a decent format (e.g., via kable). However, I don't want to format them one by one. For example, I have created the following data frame with dplyr:

data2 %>% group_by(uf) %>% summarise(n = n()) %>% arrange(desc(n))

One solution to the output format …
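The teaser stops before the post's own solution. One alternative, offered as a sketch and not as the method used in the post, is the df_print option of rmarkdown's html_document output format. It renders every printed data frame through knitr::kable, so no table has to be wrapped by hand:

```yaml
---
output:
  html_document:
    df_print: kable   # print every data frame or tibble as a kable table
---
```

With this header, evaluating the dplyr pipeline in a chunk should print a kable table directly. The other documented values for df_print are default, tibble and paged.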

Sampling weights and multilevel modeling in R

So much has been said about weighting, but in my personal view of statistical inference, you do have to weight. From a single statistic to a complex model, you have to weight, because the probability measure that induces the variation of the sample comes from an (almost always) complex sampling design that …
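The simplest case shows why the weights come from the design: the Horvitz-Thompson estimator of a population total. It is included here only as background, not as the multilevel approach the post develops. Here s is the sample, U is the population, and pi_k is the inclusion probability of unit k. The weighted (Hajek) mean divides by the sum of the weights.

```latex
\[
\hat{t}_{y,\pi} = \sum_{k \in s} \frac{y_k}{\pi_k} = \sum_{k \in s} d_k\, y_k,
\qquad d_k = \frac{1}{\pi_k},
\qquad
\hat{\bar{y}} = \frac{\sum_{k \in s} d_k\, y_k}{\sum_{k \in s} d_k}
\]
```

Provided every pi_k > 0, the first estimator is design-unbiased for the total t_y, summed over all k in U. Dropping the weights generally forfeits that property.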

Small Area Estimation 101

Small area estimation (SAE) has been a widely used technique in official statistics since the last decade of the past century. When the sample size is too small to provide reliable estimates at a very detailed level, models and auxiliary information should be used without hesitation. In a nutshell, SAE tries to …
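One well-known area-level SAE model, the Fay-Herriot model, makes the role of "models and auxiliary information" concrete. It is given as an illustrative example; the post may use a different model. The direct estimate for area d is combined with a regression prediction built from area-level covariates x_d. The sampling variance psi_d is assumed known.

```latex
\[
\hat{\theta}_d^{\mathrm{DIR}} = \theta_d + e_d,\quad
\theta_d = \mathbf{x}_d^{\top}\boldsymbol{\beta} + u_d,\quad
u_d \sim N(0, \sigma_u^2),\quad e_d \sim N(0, \psi_d)
\]
\[
\tilde{\theta}_d = \gamma_d\, \hat{\theta}_d^{\mathrm{DIR}} + (1 - \gamma_d)\, \mathbf{x}_d^{\top}\hat{\boldsymbol{\beta}},
\qquad
\gamma_d = \frac{\sigma_u^2}{\sigma_u^2 + \psi_d}
\]
```

Areas with small samples (large psi_d) get a small gamma_d, so their estimates lean more heavily on the model. In practice sigma_u^2 is estimated, which yields the EBLUP.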

Multilevel regression with poststratification (Gelman’s MrP) in R – What is this all about?

Multilevel regression with poststratification (MrP) is a useful technique for predicting a parameter of interest within small domains. It models the mean of the variable of interest within poststratification cells and then weights the cell-level predictions by the population counts of those cells. This method (or family of methods) was first proposed by Gelman and Little (1997) and is widely used in political science, where voting intention is …
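The poststratification step can be written as a single weighted average. Here j indexes the poststratification cells (for example, crossings of demographic and geographic categories) that belong to domain d. N_j is the population count of cell j, typically taken from a census. The term hat theta_j is the cell-level prediction from the multilevel model.

```latex
\[
\hat{\theta}_d^{\mathrm{MrP}} = \frac{\sum_{j \in d} N_j\, \hat{\theta}_j}{\sum_{j \in d} N_j}
\]
```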

3PL models viewed through the lens of the total probability theorem (updated)

As I am currently the NPM (National Project Manager) for PISA in Colombia, I must attend several meetings dealing with the proper implementation of this assessment in my country. A few of them are devoted to the analysis of this kind of data (coming from IRT models). As usual, the OECD has hired organizations with high technical standards. The …
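One standard way to reach the 3PL model through the total probability theorem rests on an assumed two-state reading of the response; the post's own argument may differ. Let K be the event that examinee i "knows" item j, with P(K | theta_i) given by a 2PL logistic curve L_j. If the examinee knows the item, the answer is correct with probability 1. Otherwise, the examinee guesses correctly with probability c_j.

```latex
\[
L_j(\theta_i) = \frac{1}{1 + \exp\{-a_j(\theta_i - b_j)\}}
\]
\[
P(Y_{ij} = 1 \mid \theta_i)
= \underbrace{1 \cdot L_j(\theta_i)}_{P(Y=1 \mid K)\,P(K)}
+ \underbrace{c_j\,\{1 - L_j(\theta_i)\}}_{P(Y=1 \mid K^c)\,P(K^c)}
= c_j + (1 - c_j)\, L_j(\theta_i)
\]
```

The result is exactly the usual 3PL item characteristic curve, some parameterizations of which include a scaling constant D in the exponent.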

Computing Sample Size for Variance Estimation

The R package samplesize4surveys contains functions for calculating sample sizes to estimate proportions, means, differences of proportions, and even differences of two means. It also allows the calculation of the sampling error and power for a fixed sample size. Here, four functions are introduced for estimating a population variance and for conducting …

Highlighting R code for the web

When blogging about statistics and R, it is very useful to differentiate the body text from the R code. I used to handle this by highlighting the code, and Pretty R, a tool from Revolution Analytics, was valuable for that. However, as you may know, Microsoft acquired that company, and now this feature (dressing R …

How important is that variable?

When modeling any phenomenon with explanatory variables that are strongly related to the variable of interest, one question arises: which of the auxiliary variables has the greatest influence on the response? I am not writing about significance testing or anything like that. I am just thinking like a researcher who wants to know the ranking of …

Lord’s Paradox in R

In an article titled "A Paradox in the Interpretation of Group Comparisons", published in Psychological Bulletin, Lord (1967) made the following controversial story famous: A university is interested in investigating the effects of the diet its students consume in the campus dining hall. Various types of data were collected, including the weight of each student in …

Sublime Text 3: an alternative to RStudio

It was a Saturday morning; I was lecturing my Item Response Theory class when I decided to run some R scripts to introduce my students to the JAGS syntax and to parameter estimation in a Bayesian logistic regression setup. As usual, I opened RStudio because it was my favorite R …