# Correlational Techniques

## What is Correlation?

• Medical research often seeks to establish relationships between two variables, e.g., smoking and cancer.

• The methods used are called correlational techniques.

• Correlation: Establishes and quantifies the strength and direction of the relationship between two variables. It is expressed as r (the correlation coefficient), which ranges from -1 to +1.

• The relation between two correlated variables forms a bivariate distribution, which is commonly presented graphically in the form of a scattergram.

• Coefficient of determination = r × r = r², i.e., the square of the correlation coefficient. It gives the proportion of variance in one variable that is accounted for by the other.

• Correlation doesn't establish a causal relation between two variables, but merely a statistical association.
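
To make r and the coefficient of determination concrete, here is a minimal Python sketch that computes both from paired observations. The function name `pearson_r` and the data values are made up for illustration only; they do not come from any study.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from the means
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical paired measurements (illustrative numbers only)
exposure = [0, 5, 10, 15, 20]
outcome = [1, 2, 4, 5, 8]

r = pearson_r(exposure, outcome)
r_squared = r * r  # coefficient of determination

print(f"r = {r:.3f}, r^2 = {r_squared:.3f}")
```

A value of r close to +1 here describes only how tightly the points follow a rising straight line; it says nothing about whether one variable causes the other.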

## Scatter Plots:

• A scatterplot or scattergram shows the relationship between two quantitative variables (continuous - interval or ratio data) measured on the same individuals. Values of one variable appear on the horizontal axis and values of the other variable appear on the vertical axis.

• Values of r near 0 indicate a weak linear relationship. r = -1 and r = +1 indicate a perfect negative and a perfect positive linear relationship, respectively.

## Types of Correlation:

### Pearson product-moment correlation:

• Used for interval or ratio scale data.

### Spearman rank-order correlation:

• Used for ordinal scale data.

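Pearson's r measures linear association, while Spearman's coefficient measures monotonic association (a consistently increasing or decreasing trend). Neither is suitable for a non-monotonic relation, such as a U-shaped curve.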
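
The difference between the two coefficients can be sketched in Python. This example assumes Spearman's coefficient is computed as Pearson's r applied to the ranks of the data (with tied values given their average rank); the helper names and the toy data are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)

def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Spearman rank-order correlation: Pearson's r on the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# Monotonic but curved relation: y = x cubed
x_mono = [1, 2, 3, 4, 5, 6]
y_mono = [v ** 3 for v in x_mono]
pearson_mono = pearson_r(x_mono, y_mono)
spearman_mono = spearman_rho(x_mono, y_mono)

# U-shaped (non-monotonic) relation: y = x squared
x_u = [-2, -1, 0, 1, 2]
y_u = [v ** 2 for v in x_u]
pearson_u = pearson_r(x_u, y_u)
spearman_u = spearman_rho(x_u, y_u)

print(f"cubic:    Pearson = {pearson_mono:.3f}, Spearman = {spearman_mono:.3f}")
print(f"U-shaped: Pearson = {pearson_u:.3f}, Spearman = {spearman_u:.3f}")
```

For the curved but steadily rising data, Spearman's coefficient reaches 1 while Pearson's r stays below 1. For the U-shaped data, both are 0 even though y is completely determined by x.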

## Regression:

• Regression: Expresses the functional relationship between two variables, so that the value of one variable can be predicted from knowledge of the other. The value of X is used to predict Y.

• When two variables are correlated, it is possible to predict the value of one of them if the other variable is numerically known.

• A simple linear regression equation may be: Y = a + bX, where X and Y are the two variables.

• X = independent/explanatory variable; Y = dependent variable/response variable.

• Slope (b) is the slope of the regression line, i.e., the rate of change in Y per unit change in X. It is known as the regression coefficient.

• X in the equation is the observed value of the independent variable.

• Intercept (a) is known as the "intercept constant": the point on the Y axis where the regression line intercepts it, i.e., the value of Y when X = 0.
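
The notes do not say how a and b are estimated. A common choice is ordinary least squares, which is assumed in the minimal Python sketch below; the function name `fit_line` and the data points are hypothetical.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line Y = a + bX."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    b = s_xy / s_xx          # slope (regression coefficient)
    a = mean_y - b * mean_x  # intercept (value of Y when X = 0)
    return a, b

# Hypothetical paired observations (illustrative numbers only)
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_line(xs, ys)
predicted = a + b * 6  # predict Y for a new value X = 6

print(f"Y = {a:.1f} + {b:.1f}X; predicted Y at X = 6 is {predicted:.1f}")
```

Predicting at X = 6 extrapolates slightly beyond the observed range of X, which is generally less reliable than predicting within it.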

## Multiple Regression:

• More than one variable is used to predict the expected value of Y, thus increasing the overall percentage of variance in Y that can be accounted for.

• Example: the birth weight of a baby (Y, in grams) can be partly predicted from the number of cigarettes smoked daily by the baby's mother (x1) and by the baby's father (x2): Y = 3385 - 9x1 - 6x2.
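
As a quick worked illustration, the birth-weight equation above can be evaluated directly in Python. The coefficients are taken from the notes' example; the function name and the input values are hypothetical.

```python
def predict_birth_weight(mother_cigs_per_day, father_cigs_per_day):
    """Predicted birth weight in grams from Y = 3385 - 9*x1 - 6*x2."""
    return 3385 - 9 * mother_cigs_per_day - 6 * father_cigs_per_day

# Neither parent smokes: the prediction equals the intercept
non_smokers = predict_birth_weight(0, 0)

# Mother smokes 10 per day, father smokes 20 per day
smokers = predict_birth_weight(10, 20)

print(non_smokers, smokers)
```

Each coefficient gives the predicted change in Y for a one-unit increase in that variable while the other is held constant: here, 9 g less per daily cigarette smoked by the mother and 6 g less per daily cigarette smoked by the father.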

## Z-Test for Correlation:

• If n > 100, or if the standard deviation (s.d.) of the population is known, a z-test is used.
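
The notes do not specify the form of the test statistic. One common large-sample approach, assumed here, is the Fisher z-transformation for testing whether the population correlation is zero: z = atanh(r) × √(n - 3), compared against the standard normal distribution. The function name and the sample values below are hypothetical.

```python
import math

def fisher_z_test(r, n):
    """Large-sample z-test of H0: population correlation = 0.

    Assumes the Fisher transformation; returns (z, two-sided p-value).
    """
    z = math.atanh(r) * math.sqrt(n - 3)
    # Two-sided p-value from the standard normal distribution
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_sided

# Hypothetical sample: r = 0.25 observed in n = 103 subjects
z, p = fisher_z_test(r=0.25, n=103)

print(f"z = {z:.2f}, two-sided p = {p:.4f}")
```

With these illustrative numbers, z exceeds 1.96, so the correlation would be judged statistically significant at the 5% level under the stated assumption.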