Understanding relationships
Last updated
Sometimes we are interested neither in describing groups nor in validating the differences between them. Instead, we may be interested in relationships between variables. For example, we might want to know how much income level affects life expectancy, and even compare that to the effects of sex and healthcare expenditure. This is an area of statistics that quickly grows in complexity (see e.g. ). For this short introduction, we will therefore limit ourselves to the simplest methods, intended only to convey the general gist of what they are about.
When evaluating the relationship between two numerical variables (e.g. life expectancy and healthcare spending), the simplest approach is to look at their correlation. Correlation measures the extent to which the variables are linearly related, i.e. whether, when one variable grows by a certain amount, the other grows or shrinks by a proportional amount (e.g. that for every 100 million spent on healthcare, life expectancy increases by a year). Note that, as stated, correlation only accounts for linear relationships, so it would not, for example, be able to model diminishing returns in healthcare spending.
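To make the idea concrete, here is a minimal sketch of the most common correlation measure, the Pearson correlation coefficient, written in plain Python. The function name `pearson_correlation` and all the numbers are made up for illustration; they are not real healthcare data. The first example is a perfectly linear relationship, the second a symmetric U-shaped one, where the variables are clearly related but not linearly.

```python
import math


def pearson_correlation(xs, ys):
    """Return the Pearson correlation coefficient of two numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 values each")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations from the means (covariance, up to a constant factor).
    co_dev = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (variances, up to the same constant factor).
    sq_dev_x = sum((x - mean_x) ** 2 for x in xs)
    sq_dev_y = sum((y - mean_y) ** 2 for y in ys)
    if sq_dev_x == 0 or sq_dev_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return co_dev / math.sqrt(sq_dev_x * sq_dev_y)


# Hypothetical data: spending in arbitrary units, life expectancy in years.
spending = [1.0, 2.0, 3.0, 4.0, 5.0]
linear_life = [70.0, 71.0, 72.0, 73.0, 74.0]  # grows by one year per unit spent

# Hypothetical U-shaped relationship: y depends on x, but not linearly.
u_shaped = [4.0, 1.0, 0.0, 1.0, 4.0]  # equals (x - 3) ** 2

r_linear = pearson_correlation(spending, linear_life)
r_u_shaped = pearson_correlation(spending, u_shaped)
print("linear relationship:", r_linear)
print("U-shaped relationship:", r_u_shaped)
```

The coefficient ranges from -1 to 1. A value near 1 or -1 indicates a strong linear relationship, while a value near 0 only says that there is no linear relationship, not that the variables are unrelated, as the U-shaped example shows.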
Moving on from correlation, one may want to start building a formal model of the data, through which one can posit and verify laws about the world that gave rise to it. While such models can grow increasingly complex (again see e.g. ), it is again good to start with the simplest of them: a linear regression model. Here, the idea is to formally encode the linear relationships that correlation looks for. For example, one might come up with a formula stating that a person's height is 50 cm plus 8 times their hand size in cm (height = 50 + 8 × hand size).
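As a rough sketch of how such a formula could be estimated from data, the following fits a straight line by ordinary least squares in plain Python. The helper names `fit_line` and `predict_height` are hypothetical. The measurements are invented, chosen so that they scatter around the example formula above; they are not real observations.

```python
def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 values each")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sq_dev_x = sum((x - mean_x) ** 2 for x in xs)
    if sq_dev_x == 0:
        raise ValueError("cannot fit a line when all x values are identical")
    co_dev = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = co_dev / sq_dev_x
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Invented measurements in cm (illustrative only).
hand_sizes = [14.0, 15.0, 16.0, 17.0, 18.0]
heights = [163.0, 168.0, 180.0, 184.0, 195.0]

intercept, slope = fit_line(hand_sizes, heights)
print(f"height = {intercept:.1f} + {slope:.1f} * hand size")


def predict_height(hand_size):
    """Predict height in cm from hand size in cm using the fitted line."""
    return intercept + slope * hand_size


print("predicted height for a 16.5 cm hand:", predict_height(16.5))
```

The fitted slope is the model's estimate of how many centimetres of height go with each additional centimetre of hand size, and the intercept is the height the line would predict at a hand size of zero. None of the individual data points needs to lie exactly on the line; least squares picks the line that makes the squared vertical distances to the points as small as possible.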