Heteroscedasticity refers to the situation in statistical modeling where the variability of a variable is unequal across the range of values of a second variable that predicts it. In simpler terms, the "spread" of the residuals (the difference between observed and predicted values) is not constant across all levels of the independent variable(s).
Understanding Heteroscedasticity
Heteroscedasticity violates one of the key assumptions of ordinary least squares (OLS) regression, which assumes that the variance of the error term is constant across all observations (homoscedasticity). When heteroscedasticity is present, the standard errors of the regression coefficients are unreliable, potentially leading to incorrect inferences about the significance of the predictors.
Identifying Heteroscedasticity
Several methods can be used to identify heteroscedasticity:
- Visual Inspection: Examining a scatterplot of residuals against predicted values or independent variables can reveal a "funnel shape," indicating that the spread of residuals changes with the predictor variable.
- Formal Tests: Statistical tests such as the Breusch-Pagan test, White test, and Goldfeld-Quandt test can formally assess the presence of heteroscedasticity. These tests evaluate whether the variance of the residuals is related to the independent variables.
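As a concrete illustration, here is a minimal numpy-only sketch of the Breusch-Pagan idea on simulated data (all data and parameter values below are illustrative assumptions, not taken from any real dataset): fit OLS, regress the squared residuals on the predictors, and compute the Lagrange multiplier statistic n·R², which under homoscedasticity is approximately chi-squared distributed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data where the error standard deviation grows with x
# (heteroscedasticity by construction).
n = 500
x = rng.uniform(1.0, 10.0, n)
y = 2.0 + 0.5 * x + rng.normal(0.0, 0.4 * x)  # noise sd proportional to x

# Step 1: fit OLS and collect the residuals.
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# Step 2 (Breusch-Pagan): regress the squared residuals on the
# predictors and compute the LM statistic n * R^2.
u2 = resid**2
g, *_ = np.linalg.lstsq(X, u2, rcond=None)
fitted = X @ g
r2 = 1.0 - np.sum((u2 - fitted) ** 2) / np.sum((u2 - u2.mean()) ** 2)
lm = n * r2

# Under homoscedasticity, LM is approximately chi-squared with 1 degree
# of freedom here; the 5% critical value is about 3.84.
print(f"LM statistic = {lm:.1f} (reject homoscedasticity if > 3.84)")
```

In practice one would use a library implementation (e.g. a statistics package's Breusch-Pagan routine) rather than hand-rolling the regression, but the mechanics are exactly these two regressions.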
Consequences of Heteroscedasticity
The presence of heteroscedasticity can have several consequences:
- Inefficient Estimators: While OLS estimators remain unbiased, they are no longer the Best Linear Unbiased Estimators (BLUE), meaning they are not the most efficient estimators available.
- Incorrect Standard Errors: The estimated standard errors of the regression coefficients are biased, leading to incorrect hypothesis tests and confidence intervals. This can result in Type I errors (false positives) or Type II errors (false negatives).
- Invalid Statistical Inferences: Hypothesis testing based on t-statistics and F-statistics may be unreliable, leading to incorrect conclusions about the significance of the predictors.
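The inflation of Type I errors can be demonstrated directly. The following Monte Carlo sketch (a self-contained simulation with assumed parameter values, not a reference implementation) fits a model whose true slope is zero but whose error variance grows with x; a nominal 5% t-test based on classical standard errors rejects noticeably more often than 5%.

```python
import numpy as np

rng = np.random.default_rng(3)

# Monte Carlo: the true slope is zero, but the error variance grows
# sharply with x. Count how often a classical t-test (|t| > 1.96)
# falsely declares the slope significant at the nominal 5% level.
n, reps = 200, 2000
rejections = 0
for _ in range(reps):
    x = rng.uniform(1.0, 10.0, n)
    y = 1.0 + rng.normal(0.0, 0.3 * x**2)  # slope is truly 0
    X = np.column_stack([np.ones(n), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - 2)  # classical variance estimate
    se_slope = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    rejections += abs(beta[1] / se_slope) > 1.96

rate = rejections / reps
print(f"false-positive rate: {rate:.3f} (nominal level: 0.05)")
```

The observed rejection rate exceeds the nominal 5% because the classical standard error understates the true sampling variability of the slope when the noisiest observations also carry the most leverage.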
Addressing Heteroscedasticity
Several methods can be used to address heteroscedasticity:
- Transforming the Data: Applying transformations such as the logarithmic transformation or square root transformation to the dependent variable can sometimes stabilize the variance.
- Weighted Least Squares (WLS): WLS involves weighting each observation by the inverse of its variance. This gives more weight to observations with lower variance and less weight to observations with higher variance.
- Robust Standard Errors: Using robust standard errors (e.g., Huber-White standard errors) provides consistent estimates of the standard errors even in the presence of heteroscedasticity. These are sometimes referred to as heteroscedasticity-consistent standard errors.
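To make the robust-standard-error remedy concrete, here is a minimal numpy sketch of the HC0 (White) sandwich estimator on simulated data (the data-generating values are illustrative assumptions): instead of a single error variance, the "meat" of the sandwich uses each observation's own squared residual.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated heteroscedastic data: noise sd grows with x.
n = 400
x = rng.uniform(1.0, 10.0, n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5 * x)

X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
k = X.shape[1]

XtX_inv = np.linalg.inv(X.T @ X)

# Classical OLS standard errors (assume one constant error variance).
sigma2 = resid @ resid / (n - k)
se_ols = np.sqrt(np.diag(sigma2 * XtX_inv))

# HC0 (Huber-White) robust standard errors: sandwich (X'X)^-1 M (X'X)^-1,
# where the "meat" M = X' diag(e_i^2) X uses per-observation residuals.
meat = X.T @ (X * resid[:, None] ** 2)
cov_hc0 = XtX_inv @ meat @ XtX_inv
se_hc0 = np.sqrt(np.diag(cov_hc0))

print("classical SEs:", se_ols)
print("robust    SEs:", se_hc0)
```

Statistical packages expose the same estimator (and small-sample refinements such as HC1-HC3) as an option on the fitted model, which is usually preferable to the hand-rolled version above.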
Example Scenario
Imagine analyzing the relationship between income and expenditure. It is likely that higher-income individuals have more discretionary income, resulting in a wider range of spending habits compared to lower-income individuals. In this case, the variance of expenditure may increase as income increases, indicating heteroscedasticity.
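This scenario can be simulated to show the pattern numerically. The sketch below generates hypothetical income/expenditure data (all figures are invented for illustration) and confirms that the OLS residuals are far more spread out in the upper income half.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical income/expenditure data: spending noise widens with income.
n = 1000
income = rng.uniform(20_000, 200_000, n)
expenditure = 5_000 + 0.6 * income + rng.normal(0.0, 0.1 * income)

# Fit OLS, then compare residual spread in the lower vs upper income half.
X = np.column_stack([np.ones(n), income])
beta, *_ = np.linalg.lstsq(X, expenditure, rcond=None)
resid = expenditure - X @ beta

median = np.median(income)
sd_low = resid[income <= median].std()
sd_high = resid[income > median].std()
print(f"residual sd, lower-income half:  {sd_low:,.0f}")
print(f"residual sd, higher-income half: {sd_high:,.0f}")
```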