Understanding

Data Literacy


  • Correlation


    Definition:
    The correlation coefficient is a statistical measure of the strength and direction of a linear relationship between two variables. Its values range from -1 to 1.
    A correlation coefficient of -1 describes a perfect negative correlation, with values in one series rising as those in the other decline.
    A coefficient of 1 shows a perfect positive correlation: both variables rise or fall together.
    A correlation coefficient of 0 means there is no linear correlation between the two variables.
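
    To make the definition concrete, here is a minimal sketch of how a correlation coefficient could be computed by hand. It assumes the Pearson coefficient is the one meant; the function name pearson_r and the small data series are made up for illustration only.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Deviations of each value from its own series mean.
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cov = sum(a * b for a, b in zip(dx, dy))
    spread = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if spread == 0:
        raise ValueError("correlation is undefined when a series is constant")
    return cov / spread

# Illustrative (made-up) series: one rises with xs, one falls as xs rises.
xs = [1, 2, 3, 4, 5]
rising = [10, 20, 30, 40, 50]
falling = [50, 40, 30, 20, 10]

r_pos = pearson_r(xs, rising)   # perfect positive relationship
r_neg = pearson_r(xs, falling)  # perfect negative relationship
print(r_pos, r_neg)
```

    Real data rarely falls on a perfect line, so values strictly between -1 and 1 are the usual case.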

    Example:
    An example of a positive correlation is: the faster a car is driven, the more fuel it consumes.
    Here is a more detailed example:

    In the image below, one variable (e.g. speed) is on the vertical axis and a second variable (e.g. fuel consumed) is on the horizontal axis.
    Based on our example case, we would expect to see a graph that resembles the Positive Correlation image.



    Causation vs Correlation:
    Correlation indicates a relationship or association between two variables. It does not imply causation.
    Causation implies that one variable directly influences or causes change in another variable.
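
    One way to see the difference is a small simulation in which two variables are strongly correlated only because a third, hidden factor drives both. This is a sketch on synthetic data: the variable names, the noise levels, and the seed are all assumptions chosen for illustration.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cov = sum(a * b for a, b in zip(dx, dy))
    return cov / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 2000

# Hypothetical hidden factor that drives both observed variables.
hidden = [rng.gauss(0, 1) for _ in range(n)]
x = [h + rng.gauss(0, 0.5) for h in hidden]
y = [h + rng.gauss(0, 0.5) for h in hidden]

# x and y never influence each other, yet they are strongly correlated.
r_xy = pearson_r(x, y)

# Remove the hidden factor's contribution; only independent noise remains.
x_rest = [a - h for a, h in zip(x, hidden)]
y_rest = [b - h for b, h in zip(y, hidden)]
r_rest = pearson_r(x_rest, y_rest)

print(round(r_xy, 2), round(r_rest, 2))
```

    Here r_xy is high even though neither x nor y causes the other, and the association largely disappears (r_rest is near 0) once the hidden factor is accounted for.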

    Sample Size:
    The weaker the correlation, the more records (samples) must be analyzed to determine whether the relationship is meaningful (statistically significant).

    Here are example sample sizes needed for correlations of different effect sizes to be statistically significant (significance level of 0.05 and power of 0.8):
    Large effect size (correlation = 0.5): approximately 29 data points.
    Medium effect size (correlation = 0.3): approximately 85 data points.
    Small effect size (correlation = 0.1): approximately 783 data points.
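
    Figures like these can be approximated with a standard formula based on the Fisher z-transform. The sketch below assumes a two-sided test, which the text does not state, and the function name approx_sample_size is hypothetical. Under that assumption the results come out close to the sample sizes listed above.

```python
import math
from statistics import NormalDist

def approx_sample_size(r, alpha=0.05, power=0.8):
    """Approximate n needed to detect correlation r (two-sided test assumed)."""
    if not 0 < abs(r) < 1:
        raise ValueError("r must be non-zero and strictly between -1 and 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_power = NormalDist().inv_cdf(power)          # quantile for desired power
    effect = math.atanh(r)  # Fisher z-transform of the target correlation
    return ((z_alpha + z_power) / effect) ** 2 + 3

# Rounded estimates for the large, medium, and small effect sizes.
sizes = {r: round(approx_sample_size(r)) for r in (0.5, 0.3, 0.1)}
print(sizes)
```

    Because the formula is an approximation, other tools may report values that differ by a data point or two, for example when they round up instead of rounding to the nearest whole number.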


    Statistical Significance:
    In the Sample Size example above, a significance level of 0.05 means you accept a 5% chance of a false positive (reporting a correlation when none exists),
    while a power of 0.8 means you have an 80% chance of detecting a true correlation if it exists.
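
    The meaning of a 0.05 significance level can be checked with a simulation: generate many pairs of unrelated series and count how often a test flags them as correlated. This is a sketch on synthetic data; the helper names, the Fisher z approximation used for the test, the sample size of 30, and the seed are all assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    cov = sum(a * b for a, b in zip(dx, dy))
    return cov / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

def looks_significant(r, n, alpha=0.05):
    """Two-sided test of r against zero using the Fisher z approximation."""
    z = abs(math.atanh(r)) * math.sqrt(n - 3)
    return z > NormalDist().inv_cdf(1 - alpha / 2)

rng = random.Random(1)  # fixed seed so the run is repeatable
n, trials = 30, 2000
false_alarms = 0
for _ in range(trials):
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [rng.gauss(0, 1) for _ in range(n)]  # generated independently of x
    if looks_significant(pearson_r(x, y), n):
        false_alarms += 1

false_positive_rate = false_alarms / trials
print(false_positive_rate)
```

    Since x and y are unrelated by construction, every flagged pair is a false positive, and the observed rate should land near 0.05.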

