You can judge the skew of a distribution from its box plot by inspecting the position of the median within the box and the relative lengths of the whiskers. Skewness describes the asymmetry of a distribution: whether one tail stretches out farther than the other, leaving most of the data concentrated on the opposite side.
Understanding Skewness in Box Plots
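A box plot displays the five-number summary of a dataset: minimum, first quartile (Q1), median (Q2), third quartile (Q3), and maximum. The box spans Q1 to Q3, the line inside it marks the median, and the whiskers extend from the box toward the minimum and maximum. By observing these components, you can infer the distribution's skewness.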
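As a rough sketch of how these five values can be obtained numerically, the following uses only Python's standard library. The sample values and the helper name `five_number_summary` are illustrative assumptions, and quartile conventions differ between tools, so other software may report slightly different values for Q1 and Q3.

```python
import statistics

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) for a sequence of numbers."""
    values = sorted(data)
    # method="inclusive" interpolates linearly between data points. It is one
    # of several quartile conventions and is chosen here as an assumption.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return values[0], q1, median, q3, values[-1]

# Hypothetical sample with a few large values.
sample = [1, 2, 2, 3, 3, 4, 5, 7, 12]
minimum, q1, median, q3, maximum = five_number_summary(sample)

# Width of each half of the box. A narrower left half means the median
# sits closer to Q1 than to Q3.
left_half = median - q1
right_half = q3 - median

print("five-number summary:", minimum, q1, median, q3, maximum)
print("left half of box:", left_half, "right half of box:", right_half)
```

For this sample the left half of the box is narrower than the right half, so the median line would be drawn nearer to Q1.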
Crucially, if a distribution is skewed, the median usually sits off-center in the box, closer to one edge than the other. The whiskers are often unbalanced as well: one side is short with few or no outliers, while the other is long and may be followed by several outliers.
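Many plotting tools draw whiskers using Tukey's convention: each whisker ends at the most extreme data point within 1.5 × IQR of the box, and anything beyond is plotted as an individual outlier. The sketch below applies that rule to compare the two sides. The function name, the sample data, and the "inclusive" quartile method are assumptions made for illustration, not the behavior of any particular plotting library.

```python
import statistics

def whiskers_and_outliers(data, k=1.5):
    """Whisker ends and outliers per side, using fences at k * IQR."""
    values = sorted(data)
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr

    # Points inside the fences decide where the whiskers end.
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    return {
        "q1": q1,
        "q3": q3,
        "lower_whisker": inside[0],
        "upper_whisker": inside[-1],
        "low_outliers": [v for v in values if v < lower_fence],
        "high_outliers": [v for v in values if v > upper_fence],
    }

# Hypothetical sample with one large value.
result = whiskers_and_outliers([1, 2, 2, 3, 3, 4, 5, 7, 12])

left_length = result["q1"] - result["lower_whisker"]
right_length = result["upper_whisker"] - result["q3"]
print("whisker lengths (left, right):", left_length, right_length)
print("outliers (low, high):", result["low_outliers"], result["high_outliers"])
```

In this sample the right whisker is longer than the left and the only outlier lies on the high side, which matches the imbalance described above.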
Types of Skewness and Their Box Plot Characteristics
A box plot can reveal three basic shapes: skew to the right, skew to the left, and the symmetrical case with no skew.
1. Right (Positive) Skewness
A distribution is right-skewed when its long tail extends to the right side, meaning most of the data is concentrated on the lower end.
- Box Plot Characteristics:
  - The median line is closer to the left side (Q1) of the box.
  - The right whisker is noticeably longer than the left whisker, indicating more spread in the upper half of the data.
  - Outliers, if present, tend to fall on the right side of the distribution.
- Example: Income distribution in a country often exhibits right skewness, as most people earn lower to middle incomes, while a smaller number of individuals earn very high incomes, creating a long tail to the right.
2. Left (Negative) Skewness
A distribution is left-skewed when its long tail extends to the left side, indicating that most of the data is concentrated on the higher end.
- Box Plot Characteristics:
  - The median line is closer to the right side (Q3) of the box.
  - The left whisker is noticeably longer than the right whisker, indicating more spread in the lower half of the data.
  - Outliers, if present, tend to fall on the left side of the distribution.
- Example: Scores on an easy exam might be left-skewed, as most students score high marks and a few score much lower, creating a tail to the left.
3. Symmetrical Distribution
A symmetrical distribution is balanced, with no significant tail on either side.
- Box Plot Characteristics:
  - The median line is approximately in the middle of the box.
  - Both whiskers are roughly equal in length.
  - Outliers, if present, are balanced on both sides.
- Example: The heights of adult males or females often approximate a symmetrical, bell-shaped distribution.
Visual Cues at a Glance
The following table summarizes how to quickly identify skewness from a box plot:
| Skew Type | Median Position (within the box) | Whisker Lengths | Outliers (tendency) |
|---|---|---|---|
| Right (Positive) | Closer to Q1 (left side) | Right whisker significantly longer than the left | More on the right |
| Left (Negative) | Closer to Q3 (right side) | Left whisker significantly longer than the right | More on the left |
| Symmetrical | Roughly in the middle | Approximately equal in length | Balanced on both |
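The first two columns of the table can be turned into a simple heuristic. The sketch below is only one possible encoding: the function name `classify_skew`, the normalization of each cue, and the `tolerance` threshold of 0.1 are all assumptions chosen for illustration, not a standard statistical test. The `low` and `high` arguments are meant to be the whisker ends as drawn on the plot.

```python
def classify_skew(low, q1, median, q3, high, tolerance=0.1):
    """Label a box plot from its median position and whisker lengths."""
    box = q3 - q1
    span = high - low
    if box <= 0 or span <= 0:
        return "undetermined"

    # Positive when the median sits nearer Q1 (right half of box is wider).
    box_cue = ((q3 - median) - (median - q1)) / box
    # Positive when the right whisker is longer than the left one.
    whisker_cue = ((high - q3) - (q1 - low)) / span

    def sign(value):
        if value > tolerance:
            return 1
        if value < -tolerance:
            return -1
        return 0

    signs = {sign(box_cue), sign(whisker_cue)}
    if signs == {0}:
        return "symmetrical"
    if -1 not in signs:
        return "right-skewed"
    if 1 not in signs:
        return "left-skewed"
    return "mixed"  # the two cues point in opposite directions

print(classify_skew(1, 2, 3, 5, 12))
print(classify_skew(0, 7, 9, 10, 11))
print(classify_skew(0, 2.5, 5, 7.5, 10))
```

The three calls print `right-skewed`, `left-skewed`, and `symmetrical` respectively. The "mixed" label covers plots where the median position and the whiskers disagree, in which case a visual reading is ambiguous as well.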
Practical Insights and Considerations
- Visual Assessment: Identifying skewness from a box plot is primarily a visual assessment. It gives a good qualitative picture; for a quantitative measure, use a skewness coefficient such as Pearson's coefficient or the moment coefficient of skewness.
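- Outliers' Influence: The presence and location of outliers can strongly indicate skewness, especially if they extend far in one direction. Note that in the common Tukey-style box plot, outliers do not lengthen the whisker: each whisker stops at the most extreme point within 1.5 × IQR of the box, and points beyond that are drawn individually.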
- Data Interpretation: Recognizing skewness is important because it informs how you interpret the "center" of the data. For skewed distributions, the median is often a more robust measure of central tendency than the mean, as the mean is pulled in the direction of the skew.
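The coefficients named above, and the way the mean is pulled toward the tail, can be illustrated with a short standard-library sketch. The income figures are made up, the function names are hypothetical, and the choice of population moments for the moment coefficient and the sample standard deviation for Pearson's median-based coefficient is an assumption, since textbooks and libraries differ on these details.

```python
import statistics

def moment_skewness(data):
    """Moment coefficient of skewness, m3 / m2**1.5, from population moments.

    Assumes the data is not constant (m2 must be greater than zero).
    """
    mean = statistics.fmean(data)
    n = len(data)
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

def pearson_median_skewness(data):
    """Pearson's median-based coefficient: 3 * (mean - median) / std dev."""
    mean = statistics.fmean(data)
    median = statistics.median(data)
    return 3 * (mean - median) / statistics.stdev(data)

# Hypothetical incomes in thousands, with one very high earner.
incomes = [30, 32, 35, 38, 40, 45, 50, 400]

mean_income = statistics.fmean(incomes)
median_income = statistics.median(incomes)
print("mean:", mean_income, "median:", median_income)
print("moment skewness:", moment_skewness(incomes))
print("Pearson median skewness:", pearson_median_skewness(incomes))
```

With these numbers the mean (83.75) lies far above the median (39.0) because of the single large value, and both coefficients come out positive, consistent with a right-skewed sample.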