About Dataset¶
This dataset focuses on the black-white wage gap in the United States. It provides insights into the disparities in hourly wages between black and white workers, as well as different gender and subgroup breakdowns.
The data is derived from the Economic Policy Institute’s State of Working America Data Library, a reputable source for socio-economic research and analysis.
This dataset contains information about the black-white wage gap in the USA at two levels: the median and the average. It includes data on hourly wages for workers ages 16 and older, adjusted to 2022 dollars.
Dataset Link -Kaggle link
Data Dictionary -¶
year
- Year of data collection.
white_median
- Median hourly wage for white workers.
white_average
- Average hourly wage for white workers.
black_median
- Median hourly wage for black workers.
black_average
- Average hourly wage for black workers.
white_men_median
- Median hourly wage for white male workers.
white_men_average
- Average hourly wage for white male workers.
black_men_median
- Median hourly wage for black male workers.
black_men_average
- Average hourly wage for black male workers.
white_women_median
- Median hourly wage for white female workers.
white_women_average
- Average hourly wage for white female workers.
black_women_median
- Median hourly wage for black female workers.
black_women_average
- Average hourly wage for black female workers.
Installing dependencies¶
👉 Skip this step if the packages are already installed
1. !pip install numpy
2. !pip install pandas
3. !pip install matplotlib
4. !pip install seaborn
5. !pip install scipy
Step -1 Data Preprocessing and Cleaning¶
Importing Required Libraries¶
# Numerical operations
import numpy as np
# Data manipulation
import pandas as pd
# Data visualization
import matplotlib.pyplot as plt
import seaborn as sns
# Suppress warnings
import warnings
warnings.filterwarnings('ignore')
# Perform statistical tests
from scipy.stats import ttest_ind
# Load the dataset
black_white = pd.read_csv(r"C:\Users\Lenovo\Downloads\content\Black-White Wage Gap Data Analysis\black_white_wage_gap.csv")
# Print top 5 rows
black_white.head()
 | year | white_median | white_average | black_median | black_average | white_men_median | white_men_average | black_men_median | black_men_average | white_women_median | white_women_average | black_women_median | black_women_average |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2022 | 24.96 | 34.49 | 19.60 | 25.61 | 27.11 | 39.10 | 20.02 | 27.43 | 22.47 | 29.50 | 19.00 | 23.99 |
1 | 2021 | 25.40 | 34.50 | 19.45 | 25.40 | 27.76 | 38.78 | 20.08 | 26.88 | 22.76 | 29.90 | 18.85 | 24.13 |
2 | 2020 | 25.98 | 34.86 | 19.85 | 26.03 | 28.36 | 39.08 | 20.56 | 27.40 | 23.05 | 30.30 | 19.26 | 24.87 |
3 | 2019 | 24.39 | 32.79 | 18.45 | 24.09 | 27.39 | 36.84 | 19.31 | 25.18 | 22.01 | 28.41 | 18.08 | 23.17 |
4 | 2018 | 23.97 | 32.44 | 17.57 | 23.53 | 26.79 | 36.55 | 18.66 | 24.67 | 21.75 | 28.01 | 17.34 | 22.55 |
# Print last 5 rows
black_white.tail()
 | year | white_median | white_average | black_median | black_average | white_men_median | white_men_average | black_men_median | black_men_average | white_women_median | white_women_average | black_women_median | black_women_average |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
45 | 1977 | 20.00 | 23.38 | 16.23 | 18.93 | 24.94 | 27.66 | 18.70 | 20.84 | 15.33 | 17.57 | 14.06 | 16.91 |
46 | 1976 | 20.06 | 23.47 | 16.25 | 19.20 | 24.32 | 27.54 | 19.19 | 21.57 | 15.42 | 17.79 | 13.97 | 16.73 |
47 | 1975 | 19.96 | 23.30 | 16.15 | 18.46 | 24.68 | 27.37 | 19.15 | 20.60 | 15.32 | 17.45 | 13.41 | 16.14 |
48 | 1974 | 20.04 | 23.21 | 16.07 | 18.36 | 24.55 | 27.34 | 19.02 | 20.84 | 15.22 | 17.23 | 13.46 | 15.68 |
49 | 1973 | 20.53 | 23.72 | 15.96 | 18.61 | 24.98 | 27.93 | 19.29 | 21.09 | 15.36 | 17.57 | 13.38 | 15.83 |
black_white.shape
(50, 13)
# Check info of each column
black_white.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50 entries, 0 to 49
Data columns (total 13 columns):
 #   Column               Non-Null Count  Dtype
---  ------               --------------  -----
 0   year                 50 non-null     int64
 1   white_median         50 non-null     float64
 2   white_average        50 non-null     float64
 3   black_median         50 non-null     float64
 4   black_average        50 non-null     float64
 5   white_men_median     50 non-null     float64
 6   white_men_average    50 non-null     float64
 7   black_men_median     50 non-null     float64
 8   black_men_average    50 non-null     float64
 9   white_women_median   50 non-null     float64
 10  white_women_average  50 non-null     float64
 11  black_women_median   50 non-null     float64
 12  black_women_average  50 non-null     float64
dtypes: float64(12), int64(1)
memory usage: 5.2 KB
# Check for duplicate rows
black_white.duplicated().sum()
0
The cell above shows that the data contains no duplicate rows.
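Beyond duplicates, a few other quick checks are often worth running on a small time series like this one. The sketch below is illustrative only: `basic_checks` is a hypothetical helper, and `sample` is a handful of values copied from the head/tail output above rather than the full file.

```python
import pandas as pd

# A few rows copied from the head/tail output above (not the full dataset).
sample = pd.DataFrame({
    "year": [2022, 2021, 1974, 1973],
    "white_median": [24.96, 25.40, 20.04, 20.53],
    "black_median": [19.60, 19.45, 16.07, 15.96],
})

def basic_checks(df):
    """Return simple data-quality counts (hypothetical helper)."""
    wage_cols = [c for c in df.columns if c != "year"]
    return {
        # Total number of missing cells across all columns
        "missing_values": int(df.isna().sum().sum()),
        # Each year should appear exactly once
        "duplicate_years": int(df["year"].duplicated().sum()),
        # Hourly wages should be strictly positive
        "non_positive_wages": int((df[wage_cols] <= 0).sum().sum()),
    }

checks = basic_checks(sample)
print(checks)
```

On the real data, the same function could be called as `basic_checks(black_white)`.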
Step -2 Data analysis¶
Let's ask some questions of our data.
black_white['year'].unique()
array([2022, 2021, 2020, 2019, 2018, 2017, 2016, 2015, 2014, 2013, 2012, 2011, 2010, 2009, 2008, 2007, 2006, 2005, 2004, 2003, 2002, 2001, 2000, 1999, 1998, 1997, 1996, 1995, 1994, 1993, 1992, 1991, 1990, 1989, 1988, 1987, 1986, 1985, 1984, 1983, 1982, 1981, 1980, 1979, 1978, 1977, 1976, 1975, 1974, 1973], dtype=int64)
plt.figure(figsize=(12, 8))
# Overall median wages
plt.plot(black_white['year'], black_white['white_median'], label='White Median', linestyle='--', marker='o')
plt.plot(black_white['year'], black_white['black_median'], label='Black Median', linestyle='--', marker='o')
# Overall average wages
plt.plot(black_white['year'], black_white['white_average'], label='White Average', linestyle='-', marker='x')
plt.plot(black_white['year'], black_white['black_average'], label='Black Average', linestyle='-', marker='x')
plt.title('Overall Median and Average Wages Over Time')
plt.xlabel('Year')
plt.ylabel('Wage')
plt.legend()
plt.grid(True)
plt.show()
White Workers:¶
- White Average Wage:
- The white average wage shows an overall increasing trend over the years.
- In the rows shown above, it rises from 23.72 in 1973 to 34.49 in 2022 (in 2022 dollars).
- White Median Wage:
- Like the average wage, the white median wage trends upward.
- In the rows shown above, it rises from 20.53 in 1973 to 24.96 in 2022.
Black Workers:¶
- Black Average Wage:
- The black average wage starts at 18.61 in 1973 and reaches 25.61 in 2022.
- Overall, black average wages trend upward.
- Black Median Wage:
- The black median wage starts at 15.96 in 1973 and reaches 19.60 in 2022.
- Like the average wage, the black median wage follows an upward trajectory.
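Ranges read off a chart are easy to misjudge, so it can help to compute them. The following is a minimal sketch, assuming only the ten rows printed by `head()` and `tail()` above; `wage_ranges` is a hypothetical helper, and on the full data the result may differ because the middle years are missing here.

```python
import pandas as pd

# The ten rows printed by head() and tail() above (2018-2022 and 1973-1977).
rows = pd.DataFrame({
    "year": [2022, 2021, 2020, 2019, 2018, 1977, 1976, 1975, 1974, 1973],
    "white_median": [24.96, 25.40, 25.98, 24.39, 23.97, 20.00, 20.06, 19.96, 20.04, 20.53],
    "white_average": [34.49, 34.50, 34.86, 32.79, 32.44, 23.38, 23.47, 23.30, 23.21, 23.72],
    "black_median": [19.60, 19.45, 19.85, 18.45, 17.57, 16.23, 16.25, 16.15, 16.07, 15.96],
    "black_average": [25.61, 25.40, 26.03, 24.09, 23.53, 18.93, 19.20, 18.46, 18.36, 18.61],
})

def wage_ranges(df, columns):
    """Minimum and maximum of each wage column, with the year each occurs."""
    indexed = df.set_index("year")[columns]
    return pd.DataFrame({
        "min": indexed.min(),
        "min_year": indexed.idxmin(),
        "max": indexed.max(),
        "max_year": indexed.idxmax(),
    })

summary = wage_ranges(
    rows, ["white_median", "white_average", "black_median", "black_average"]
)
print(summary)
```

Replacing `rows` with `black_white` would give the ranges for all 50 years.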
Are there any noticeable trends in the wage gap between white and black workers?¶
# Calculate the wage gap
black_white['wage_gap'] = black_white['white_median'] - black_white['black_median']
# Line chart for the wage gap over time
plt.figure(figsize=(12, 8))
plt.plot(black_white['year'], black_white['wage_gap'], label='Wage Gap', marker='o', linestyle='--', color='red')
plt.title('Wage Gap Between White and Black Workers Over Time')
plt.xlabel('Year')
plt.ylabel('Wage Gap')
plt.axhline(y=0, color='black', linestyle='--', linewidth=0.8, label='Zero Gap')
plt.legend()
plt.grid(True)
plt.show()
Wage Gap Trends:¶
- Fluctuations Over Time:
- The median wage gap between white and black workers changes from year to year.
- The line plot shows that the size of the gap varies rather than following a steady path.
- Peak in 2018:
- The wage gap appears to reach its highest point in 2018 (23.97 - 17.57 = 6.40).
- This may reflect specific economic or societal factors that widened the disparity in that year.
- Low in 1979:
- Conversely, the wage gap appears to be at its lowest in 1979.
- Understanding the conditions or policies during that time may provide insights into the factors contributing to a narrower wage gap.
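The peak and low years can also be located programmatically instead of by eye. This sketch assumes only the ten rows printed earlier, so its "low" is the lowest among those rows, not necessarily the lowest in the full series. The `black_white_ratio` column is an addition not present in the original analysis; it expresses the gap relative to the white median.

```python
import pandas as pd

# Median wages from the head()/tail() output above.
rows = pd.DataFrame({
    "year": [2022, 2021, 2020, 2019, 2018, 1977, 1976, 1975, 1974, 1973],
    "white_median": [24.96, 25.40, 25.98, 24.39, 23.97, 20.00, 20.06, 19.96, 20.04, 20.53],
    "black_median": [19.60, 19.45, 19.85, 18.45, 17.57, 16.23, 16.25, 16.15, 16.07, 15.96],
})

gaps = rows.set_index("year").sort_index()
# Dollar gap, as in the notebook
gaps["wage_gap"] = gaps["white_median"] - gaps["black_median"]
# Black median as a share of the white median (assumed extra metric)
gaps["black_white_ratio"] = gaps["black_median"] / gaps["white_median"]

peak_year = gaps["wage_gap"].idxmax()
low_year = gaps["wage_gap"].idxmin()
print(f"Largest gap: {peak_year} ({gaps.loc[peak_year, 'wage_gap']:.2f})")
print(f"Smallest gap: {low_year} ({gaps.loc[low_year, 'wage_gap']:.2f})")
```

With the full `black_white` frame in place of `rows`, the same two calls would confirm or correct the years read from the plot.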
# Calculate the wage gap for men and women separately
black_white['wage_gap_men'] = black_white['white_men_median'] - black_white['black_men_median']
black_white['wage_gap_women'] = black_white['white_women_median'] - black_white['black_women_median']
# Line chart for the wage gap for men and women over time
plt.figure(figsize=(12, 8))
plt.plot(black_white['year'], black_white['wage_gap_men'], label='Wage Gap (Men)', marker='o', linestyle='--', color='blue')
plt.plot(black_white['year'], black_white['wage_gap_women'], label='Wage Gap (Women)', marker='x', linestyle='--', color='green')
plt.title('Wage Gap Between White and Black Men and Women Over Time')
plt.xlabel('Year')
plt.ylabel('Wage Gap')
plt.axhline(y=0, color='black', linestyle='--', linewidth=0.8, label='Zero Gap')
plt.legend()
plt.grid(True)
plt.show()
Wage Gap Between White and Black Men:¶
Trend Over Time:¶
The wage gap between white and black men changes from year to year.
- Peak in 2017:
- The wage gap reaches its highest point in 2017 for men.
This suggests a particularly large difference in median wages between white and black men in that year.
- Low in the mid-1970s:
- The gap for men is smallest in the mid-1970s. Among the rows shown above, the lowest value is in 1976 (24.32 - 19.19 = 5.13), slightly below 1975 (5.53). Understanding the conditions or policies during that time may provide insights into factors contributing to a narrower wage gap.
- Range:
- The wage gap for men ranges from about 5 to 8.3 over the years.
Wage Gap Between White and Black Women:¶
Trend Over Time:¶
The wage gap between white and black women also changes from year to year.
- Peak in 2018:
- The wage gap is highest in 2018 for women (21.75 - 17.34 = 4.41).
As with men, this signals a notable difference in median wages between white and black women during that year.
- Low in the late 1970s:
- Among the rows shown above, the gap for women is smallest in 1977 (15.33 - 14.06 = 1.27).
Analyzing the context of those years may reveal factors contributing to a narrower wage gap.
- Range:
- The wage gap for women ranges from about 1.3 to 4.4 over the years.
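Because men and women have different wage levels, dollar gaps are not directly comparable between them. One option, sketched here as an assumption rather than part of the original analysis, is to express each gap as a percentage of the white median for the same gender. `relative_gap` is a hypothetical helper, and the four rows are copied from the output above.

```python
import pandas as pd

# Four years copied from the head()/tail() output above.
rows = pd.DataFrame({
    "year": [2022, 2018, 1977, 1973],
    "white_men_median": [27.11, 26.79, 24.94, 24.98],
    "black_men_median": [20.02, 18.66, 18.70, 19.29],
    "white_women_median": [22.47, 21.75, 15.33, 15.36],
    "black_women_median": [19.00, 17.34, 14.06, 13.38],
}).set_index("year")

def relative_gap(df, group):
    """Gap as a percentage of the white median for 'men' or 'women'."""
    white = df[f"white_{group}_median"]
    black = df[f"black_{group}_median"]
    return (white - black) / white * 100

pct_gap = pd.DataFrame({g: relative_gap(rows, g) for g in ["men", "women"]})
print(pct_gap.round(1))
```

In the 2022 row, for instance, the gap is roughly 26% of the white median for men and roughly 15% for women.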
Are there significant changes in the median and average wages for white and black men and women over the years?¶
years = black_white['year'].unique()
for subgroup in ['white_men', 'black_men', 'white_women', 'black_women']:
    median_p_values = []
    average_p_values = []
    for year in years:
        # Subset data for the specific year (a single row)
        subset_data = black_white[black_white['year'] == year]
        # NOTE: both samples are the same full column, so this test compares
        # the series with itself and always returns p = 1
        white_values = black_white[f'{subgroup}_median']
        black_values = black_white[f'{subgroup}_median']
        # Perform t-test for median wages
        _, median_p_value = ttest_ind(white_values, black_values, equal_var=False)
        median_p_values.append(median_p_value)
        # Perform t-test for average wages
        # NOTE: each sample here is a single value, so the p-value is undefined (NaN)
        white_values = subset_data[f'{subgroup}_average']
        black_values = subset_data[f'{subgroup}_average']
        _, average_p_value = ttest_ind(white_values, black_values, equal_var=False)
        average_p_values.append(average_p_value)
    # Visualize p-values over the years (one figure per subgroup)
    plt.figure(figsize=(12, 8))
    plt.plot(years, median_p_values, label='Median Wage p-values', marker='o', linestyle='--')
    plt.plot(years, average_p_values, label='Average Wage p-values', marker='x', linestyle='--')
    plt.title(f'Significance of Changes in {subgroup.capitalize()} Wages Over Time')
    plt.xlabel('Year')
    plt.ylabel('p-value')
    plt.axhline(y=0.05, color='red', linestyle='--', linewidth=0.8, label='Significance Level (0.05)')
    plt.legend()
    plt.grid(True)
    plt.show()
Analysis of Median Wages Over Time¶
White Men:
Observation: In the rows shown above, the median wage for white men rises from 24.98 in 1973 to 27.11 in 2022.
Statistical Test: The plotted median p-value is 1 in every year. This is a property of the test as coded, not of the wages: both samples passed to ttest_ind are the same white_men_median column, and a series compared with itself always gives p = 1.
Black Men:
Observation: The median wage for black men rises from 19.29 in 1973 to 20.02 in 2022.
Statistical Test: The p-value is again 1 for the same reason, so it says nothing about whether black men's median wages changed.
White Women:
Observation: The median wage for white women rises from 15.36 in 1973 to 22.47 in 2022.
Statistical Test: The p-value is 1 because black_white['white_women_median'] is compared with itself.
Black Women:
Observation: The median wage for black women rises from 13.38 in 1973 to 19.00 in 2022.
Statistical Test: The p-value is 1 because black_white['black_women_median'] is compared with itself.
Because the dataset holds one observation per year and subgroup, a per-year t-test cannot detect change over time. These p-values should not be read as evidence that wages were stable.
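One way to ask whether a wage series changed over the years is to regress it on the year and test whether the slope differs from zero. The sketch below swaps in `scipy.stats.linregress` for the per-year t-test; it is an alternative technique, not the original method. `trend_table` is a hypothetical helper, and the data are only the ten rows printed earlier. Annual wage series are serially correlated, so the p-values from a simple regression like this tend to be optimistic and should be treated as indicative.

```python
import pandas as pd
from scipy.stats import linregress

# Median wages by subgroup from the head()/tail() output above.
rows = pd.DataFrame({
    "year": [2022, 2021, 2020, 2019, 2018, 1977, 1976, 1975, 1974, 1973],
    "white_men_median": [27.11, 27.76, 28.36, 27.39, 26.79, 24.94, 24.32, 24.68, 24.55, 24.98],
    "black_men_median": [20.02, 20.08, 20.56, 19.31, 18.66, 18.70, 19.19, 19.15, 19.02, 19.29],
    "white_women_median": [22.47, 22.76, 23.05, 22.01, 21.75, 15.33, 15.42, 15.32, 15.22, 15.36],
    "black_women_median": [19.00, 18.85, 19.26, 18.08, 17.34, 14.06, 13.97, 13.41, 13.46, 13.38],
})

def trend_table(df, columns):
    """Fit wage ~ year for each column and collect slope and p-value."""
    records = []
    for col in columns:
        fit = linregress(df["year"], df[col])
        records.append({
            "series": col,
            "slope_per_year": fit.slope,  # dollars per year
            "p_value": fit.pvalue,        # two-sided test of slope == 0
        })
    return pd.DataFrame(records).set_index("series")

trend = trend_table(rows, [c for c in rows.columns if c != "year"])
print(trend.round(4))
```

Running `trend_table(black_white, ...)` on all 50 years would give a better-supported estimate than this ten-row sample.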
# Filter data for the past decade
start_year = black_white['year'].max() - 10
# Copy the slice so a column can be added without a SettingWithCopyWarning
recent_data = black_white[black_white['year'] >= start_year].copy()
# Calculate the wage gap
recent_data['wage_gap'] = recent_data['white_median'] - recent_data['black_median']
# Line chart for the wage gap over the past decade
plt.figure(figsize=(12, 8))
plt.plot(recent_data['year'], recent_data['wage_gap'], label='Wage Gap', marker='o', linestyle='--', color='red')
plt.title('Wage Gap Over the Past Decade')
plt.xlabel('Year')
plt.ylabel('Wage Gap')
plt.axhline(y=0, color='black', linestyle='--', linewidth=0.8, label='Zero Gap')
plt.legend()
plt.grid(True)
plt.show()
Wage Gap Trends Over the Past Decade:¶
- 2012 to 2022:
- The wage gap fluctuates over these years, ranging from about 5.1 to 6.4.
- Notable peaks are observed in 2018 (6.40) and 2020 (6.13), with 2017 close behind (about 6.0); the lowest point is in 2013 (5.1).
How do the median and average wages of white and black workers compare in recent years?¶
# Filter data for recent years
recent_data = black_white[black_white['year'] >= 2015]
# Line chart for median and average wages comparison
plt.figure(figsize=(12, 5))
# Median wages
plt.plot(recent_data['year'], recent_data['white_median'], label='White Median', marker='o', linestyle='--', color='red')
plt.plot(recent_data['year'], recent_data['black_median'], label='Black Median', marker='x', linestyle='--', color='orange')
# Average wages
plt.plot(recent_data['year'], recent_data['white_average'], label='White Average', marker='o', linestyle='-', color='blue')
plt.plot(recent_data['year'], recent_data['black_average'], label='Black Average', marker='x', linestyle='-', color='green')
plt.title('Comparison of Median and Average Wages for White and Black Workers (Recent Years)')
plt.xlabel('Year')
plt.ylabel('Wage')
plt.legend()
plt.grid(True)
plt.show()
Wage Comparison in Recent Years (2015 to 2022):¶
Black Workers:¶
Black Median Wage:¶
Observation: The black median wage fluctuates between about 17.5 and 20.0 from 2015 to 2022.
Trend: The path is uneven rather than steadily upward. In the rows shown above, the median rises from 17.57 in 2018 to 19.85 in 2020 and then holds near 19.5.
Black Average Wage:¶
Observation: The black average wage ranges between about 22.5 and 26.0 during the same period, peaking at 26.03 in 2020.
Trend: Like the median wage, the average wage for black workers rises through 2020 and then levels off.
White Workers:¶
White Median Wage:¶
Observation: The white median wage varies between about 23.0 and 26.0 from 2015 to 2022.
Trend: The median peaks at 25.98 in 2020 and declines to 24.96 by 2022, so there is no consistent upward or downward trend.
White Average Wage:¶
Observation: The white average wage ranges between about 31.0 and 35.0 during the same period.
Trend: The average peaks at 34.86 in 2020 and stays near 34.5 in 2021 and 2022.
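To put numbers on these recent movements, the percentage change between two years can be computed directly. This is a small sketch under stated assumptions: `percent_change` is a hypothetical helper, and only the 2018 and 2022 rows from the output above are used, since the 2015 to 2017 values are not shown in this notebook export.

```python
import pandas as pd

# 2018 and 2022 rows copied from the head() output above.
rows = pd.DataFrame({
    "year": [2022, 2018],
    "white_median": [24.96, 23.97],
    "black_median": [19.60, 17.57],
    "white_average": [34.49, 32.44],
    "black_average": [25.61, 23.53],
}).set_index("year")

def percent_change(df, start_year, end_year):
    """Percentage change of every column between two years."""
    start, end = df.loc[start_year], df.loc[end_year]
    return (end - start) / start * 100

change = percent_change(rows, 2018, 2022)
print(change.round(2))
```

With the full data indexed by year, `percent_change(black_white.set_index('year'), 2015, 2022)` would cover the whole period discussed above.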
Recommendations and Conclusions -¶
Recommendations:¶
- Policy Advocacy:
Advocate for policies that address the observed fluctuations in the wage gap. Consider supporting initiatives that focus on equal pay, anti-discrimination measures, and fair labor practices.
- Industry-Specific Analysis:
Conduct a more detailed analysis of wage trends in specific industries. Identify sectors where wage disparities are more pronounced and work towards implementing targeted interventions or policies within those industries.
- Intersectional Analysis:
Explore the intersectionality of race and gender in wage disparities. Investigate how the wage gap differs for individuals who belong to both marginalized groups and tailor interventions accordingly.
- Educational Initiatives:
Collaborate with educational institutions and organizations to address disparities in educational attainment. Promote programs that enhance access to education and skills training, particularly for communities facing wage gaps.
- Continuous Monitoring:
Implement a system for continuous monitoring of wage disparities. Regularly update and assess the effectiveness of policies and interventions to ensure ongoing progress.
- Economic Conditions:
Stay vigilant about economic conditions and their impact on wages. Advocate for measures that promote economic stability and growth, as these factors can influence overall wage levels.
Conclusions:¶
Dynamic Wage Landscape:
The analysis reveals a dynamic landscape of wage disparities between white and black workers. Fluctuations over the years highlight the sensitivity of wages to external factors.
Persistent Wage Gaps:
While median and average wages have improved for both white and black workers, persistent wage gaps remain. These gaps vary across demographic groups and genders.
Importance of Intersectionality:
The breakdown by race and gender shows the importance of considering multiple dimensions in understanding wage disparities. Intersectionality provides a more nuanced view of the challenges faced by individuals.
Need for Targeted Interventions:
Targeted interventions are crucial to addressing specific areas of concern. Policy changes, educational initiatives, and industry-specific strategies are recommended to effectively narrow wage gaps.
Ongoing Monitoring and Evaluation:
Continuous monitoring and evaluation of wage disparities are essential. Regular assessments will help identify evolving trends and the effectiveness of implemented measures.
Collaborative Efforts:
Addressing wage disparities requires collaborative efforts from policymakers, businesses, educational institutions, and advocacy groups. A multifaceted approach is necessary to achieve meaningful and sustainable change.
Consideration of Historical Context:
Historical context, as indicated by the low wage gap in 1979, should be considered when analyzing current disparities. Understanding historical patterns can provide insights into long-term systemic challenges.
Practice Questions -¶
- Explore long-term trends by looking at data over several decades. Have there been persistent patterns or shifts in the wage gap over this extended period?
- Investigate whether changes in the minimum wage have had a discernible impact on the wage gap between white and black workers.