About the Dataset
This dataset offers a global view of education across countries and regions. It covers out-of-school rates, completion rates, proficiency levels, youth literacy rates, birth rates, and primary and tertiary enrollment, along with unemployment rates. It is a useful resource for researchers, educators, and policymakers who want to assess and improve education systems.
Dataset link: Kaggle link
Data Dictionary
Countries and Areas: Name of the countries and areas.
Latitude: Latitude coordinates of the geographical location.
Longitude: Longitude coordinates of the geographical location.
OOSR_Pre0Primary_Age_Male: Out-of-school rate for pre-primary age males.
OOSR_Pre0Primary_Age_Female: Out-of-school rate for pre-primary age females.
OOSR_Primary_Age_Male: Out-of-school rate for primary age males.
OOSR_Primary_Age_Female: Out-of-school rate for primary age females.
OOSR_Lower_Secondary_Age_Male: Out-of-school rate for lower secondary age males.
OOSR_Lower_Secondary_Age_Female: Out-of-school rate for lower secondary age females.
OOSR_Upper_Secondary_Age_Male: Out-of-school rate for upper secondary age males.
OOSR_Upper_Secondary_Age_Female: Out-of-school rate for upper secondary age females.
Completion_Rate_Primary_Male: Completion rate for primary education among males.
Completion_Rate_Primary_Female: Completion rate for primary education among females.
Completion_Rate_Lower_Secondary_Male: Completion rate for lower secondary education among males.
Completion_Rate_Lower_Secondary_Female: Completion rate for lower secondary education among females.
Completion_Rate_Upper_Secondary_Male: Completion rate for upper secondary education among males.
Completion_Rate_Upper_Secondary_Female: Completion rate for upper secondary education among females.
Grade_2_3_Proficiency_Reading: Proficiency in reading for grade 2-3 students.
Grade_2_3_Proficiency_Math: Proficiency in math for grade 2-3 students.
Primary_End_Proficiency_Reading: Proficiency in reading at the end of primary education.
Primary_End_Proficiency_Math: Proficiency in math at the end of primary education.
Lower_Secondary_End_Proficiency_Reading: Proficiency in reading at the end of lower secondary education.
Lower_Secondary_End_Proficiency_Math: Proficiency in math at the end of lower secondary education.
Youth_15_24_Literacy_Rate_Male: Literacy rate among male youths aged 15-24.
Youth_15_24_Literacy_Rate_Female: Literacy rate among female youths aged 15-24.
Birth_Rate: Birth rate in the respective countries/areas.
Gross_Primary_Education_Enrollment: Gross enrollment in primary education.
Gross_Tertiary_Education_Enrollment: Gross enrollment in tertiary education.
Unemployment_Rate: Unemployment rate in the respective countries/areas.
Installing Dependencies
👉 Skip this step if the packages are already installed.
1. !pip install numpy
2. !pip install pandas
3. !pip install matplotlib
4. !pip install seaborn
Step 1: Data Preprocessing and Cleaning
Importing Required Libraries
# Numerical operations
import numpy as np
# Data manipulation
import pandas as pd
# Data visualization
import matplotlib.pyplot as plt
import seaborn as sns
# Suppress warnings
import warnings
warnings.filterwarnings('ignore')
# Load the dataset
education = pd.read_csv(r"C:\Users\Lenovo\Downloads\content\World Education Data Analysis\Global_Education.csv", encoding='ISO-8859-1')
# Print top 5 rows
education.head()
Countries and areas | Latitude | Longitude | OOSR_Pre0Primary_Age_Male | OOSR_Pre0Primary_Age_Female | OOSR_Primary_Age_Male | OOSR_Primary_Age_Female | OOSR_Lower_Secondary_Age_Male | OOSR_Lower_Secondary_Age_Female | OOSR_Upper_Secondary_Age_Male | ... | Primary_End_Proficiency_Reading | Primary_End_Proficiency_Math | Lower_Secondary_End_Proficiency_Reading | Lower_Secondary_End_Proficiency_Math | Youth_15_24_Literacy_Rate_Male | Youth_15_24_Literacy_Rate_Female | Birth_Rate | Gross_Primary_Education_Enrollment | Gross_Tertiary_Education_Enrollment | Unemployment_Rate | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Afghanistan | 33.939110 | 67.709953 | 0 | 0 | 0 | 0 | 0 | 0 | 44 | ... | 13 | 11 | 0 | 0 | 74 | 56 | 32.49 | 104.0 | 9.7 | 11.12 |
1 | Albania | 41.153332 | 20.168331 | 4 | 2 | 6 | 3 | 6 | 1 | 21 | ... | 0 | 0 | 48 | 58 | 99 | 100 | 11.78 | 107.0 | 55.0 | 12.33 |
2 | Algeria | 28.033886 | 1.659626 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 21 | 19 | 98 | 97 | 24.28 | 109.9 | 51.4 | 11.70 |
3 | Andorra | 42.506285 | 1.521801 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 7.20 | 106.4 | 0.0 | 0.00 |
4 | Angola | 11.202692 | 17.873887 | 31 | 39 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 40.73 | 113.5 | 9.3 | 6.89 |
5 rows × 29 columns
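The absolute Windows path used in the loading cell only exists on the author's machine. Here is a minimal sketch of a more portable loader, assuming the CSV sits wherever you point it. The helper name `load_education` is hypothetical and not part of the original notebook; the encoding is the one used above.

```python
from pathlib import Path

import pandas as pd


def load_education(csv_path):
    """Read the education CSV from any location (hypothetical helper)."""
    path = Path(csv_path).expanduser()
    # Fail early with a clear message instead of a long pandas traceback
    if not path.is_file():
        raise FileNotFoundError(f"CSV not found: {path}")
    # Same encoding as the original loading cell
    return pd.read_csv(path, encoding="ISO-8859-1")


# Example usage (assumes the file is in the current working directory):
# education = load_education("Global_Education.csv")
```

A relative path like this keeps the notebook runnable on other machines as long as the CSV is stored next to it.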
education
Countries and areas | Latitude | Longitude | OOSR_Pre0Primary_Age_Male | OOSR_Pre0Primary_Age_Female | OOSR_Primary_Age_Male | OOSR_Primary_Age_Female | OOSR_Lower_Secondary_Age_Male | OOSR_Lower_Secondary_Age_Female | OOSR_Upper_Secondary_Age_Male | ... | Primary_End_Proficiency_Reading | Primary_End_Proficiency_Math | Lower_Secondary_End_Proficiency_Reading | Lower_Secondary_End_Proficiency_Math | Youth_15_24_Literacy_Rate_Male | Youth_15_24_Literacy_Rate_Female | Birth_Rate | Gross_Primary_Education_Enrollment | Gross_Tertiary_Education_Enrollment | Unemployment_Rate | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Afghanistan | 33.939110 | 67.709953 | 0 | 0 | 0 | 0 | 0 | 0 | 44 | ... | 13 | 11 | 0 | 0 | 74 | 56 | 32.49 | 104.0 | 9.7 | 11.12 |
1 | Albania | 41.153332 | 20.168331 | 4 | 2 | 6 | 3 | 6 | 1 | 21 | ... | 0 | 0 | 48 | 58 | 99 | 100 | 11.78 | 107.0 | 55.0 | 12.33 |
2 | Algeria | 28.033886 | 1.659626 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 21 | 19 | 98 | 97 | 24.28 | 109.9 | 51.4 | 11.70 |
3 | Andorra | 42.506285 | 1.521801 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 7.20 | 106.4 | 0.0 | 0.00 |
4 | Angola | 11.202692 | 17.873887 | 31 | 39 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 40.73 | 113.5 | 9.3 | 6.89 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
197 | Venezuela | 6.423750 | 66.589730 | 14 | 14 | 10 | 10 | 15 | 13 | 28 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 17.88 | 97.2 | 79.3 | 8.80 |
198 | Vietnam | 14.058324 | 108.277199 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 55 | 51 | 86 | 81 | 98 | 98 | 16.75 | 110.6 | 28.5 | 2.01 |
199 | Yemen | 15.552727 | 48.516388 | 96 | 96 | 10 | 21 | 23 | 34 | 46 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 30.45 | 93.6 | 10.2 | 12.91 |
200 | Zambia | 13.133897 | 27.849332 | 0 | 0 | 17 | 13 | 0 | 0 | 0 | ... | 0 | 0 | 5 | 2 | 93 | 92 | 36.19 | 98.7 | 4.1 | 11.43 |
201 | Zimbabwe | 19.015438 | 29.154857 | 60 | 58 | 0 | 0 | 0 | 0 | 45 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 30.68 | 109.9 | 10.0 | 4.95 |
202 rows × 29 columns
education.shape
(202, 29)
The dataset contains 202 observations (rows) and 29 columns.
# Check info of each column
education.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 202 entries, 0 to 201
Data columns (total 29 columns):
 #   Column                                   Non-Null Count  Dtype
---  ------                                   --------------  -----
 0   Countries and areas                      202 non-null    object
 1   Latitude                                 202 non-null    float64
 2   Longitude                                202 non-null    float64
 3   OOSR_Pre0Primary_Age_Male                202 non-null    int64
 4   OOSR_Pre0Primary_Age_Female              202 non-null    int64
 5   OOSR_Primary_Age_Male                    202 non-null    int64
 6   OOSR_Primary_Age_Female                  202 non-null    int64
 7   OOSR_Lower_Secondary_Age_Male            202 non-null    int64
 8   OOSR_Lower_Secondary_Age_Female          202 non-null    int64
 9   OOSR_Upper_Secondary_Age_Male            202 non-null    int64
 10  OOSR_Upper_Secondary_Age_Female          202 non-null    int64
 11  Completion_Rate_Primary_Male             202 non-null    int64
 12  Completion_Rate_Primary_Female           202 non-null    int64
 13  Completion_Rate_Lower_Secondary_Male     202 non-null    int64
 14  Completion_Rate_Lower_Secondary_Female   202 non-null    int64
 15  Completion_Rate_Upper_Secondary_Male     202 non-null    int64
 16  Completion_Rate_Upper_Secondary_Female   202 non-null    int64
 17  Grade_2_3_Proficiency_Reading            202 non-null    int64
 18  Grade_2_3_Proficiency_Math               202 non-null    int64
 19  Primary_End_Proficiency_Reading          202 non-null    int64
 20  Primary_End_Proficiency_Math             202 non-null    int64
 21  Lower_Secondary_End_Proficiency_Reading  202 non-null    int64
 22  Lower_Secondary_End_Proficiency_Math     202 non-null    int64
 23  Youth_15_24_Literacy_Rate_Male           202 non-null    int64
 24  Youth_15_24_Literacy_Rate_Female         202 non-null    int64
 25  Birth_Rate                               202 non-null    float64
 26  Gross_Primary_Education_Enrollment       202 non-null    float64
 27  Gross_Tertiary_Education_Enrollment      202 non-null    float64
 28  Unemployment_Rate                        202 non-null    float64
dtypes: float64(6), int64(22), object(1)
memory usage: 45.9+ KB
The output above shows 1 object column, 22 integer columns, and 6 float columns.
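The same column-type tally can be computed instead of read off the `info()` output. The sketch below uses a tiny stand-in DataFrame named `sample` (hypothetical, built only to keep the example self-contained); on the real data you would apply the same calls to `education`.

```python
import pandas as pd

# Tiny stand-in frame with the same kinds of columns as the real dataset
# (hypothetical excerpt, used only for illustration)
sample = pd.DataFrame({
    "Countries and areas": ["Afghanistan", "Albania"],
    "Latitude": [33.939110, 41.153332],
    "OOSR_Primary_Age_Male": [0, 6],
    "Birth_Rate": [32.49, 11.78],
})

# Count how many columns share each dtype
dtype_counts = sample.dtypes.astype(str).value_counts()
print(dtype_counts)

# Split the column names into numeric and non-numeric groups
numeric_cols = sample.select_dtypes(include="number").columns.tolist()
text_cols = sample.select_dtypes(exclude="number").columns.tolist()
print(numeric_cols)
print(text_cols)
```

Having `numeric_cols` as a list is handy later, for example when a cleaning step should touch only the numeric columns.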
# Checking null values
education.isnull().sum()
Countries and areas                        0
Latitude                                   0
Longitude                                  0
OOSR_Pre0Primary_Age_Male                  0
OOSR_Pre0Primary_Age_Female                0
OOSR_Primary_Age_Male                      0
OOSR_Primary_Age_Female                    0
OOSR_Lower_Secondary_Age_Male              0
OOSR_Lower_Secondary_Age_Female            0
OOSR_Upper_Secondary_Age_Male              0
OOSR_Upper_Secondary_Age_Female            0
Completion_Rate_Primary_Male               0
Completion_Rate_Primary_Female             0
Completion_Rate_Lower_Secondary_Male       0
Completion_Rate_Lower_Secondary_Female     0
Completion_Rate_Upper_Secondary_Male       0
Completion_Rate_Upper_Secondary_Female     0
Grade_2_3_Proficiency_Reading              0
Grade_2_3_Proficiency_Math                 0
Primary_End_Proficiency_Reading            0
Primary_End_Proficiency_Math               0
Lower_Secondary_End_Proficiency_Reading    0
Lower_Secondary_End_Proficiency_Math       0
Youth_15_24_Literacy_Rate_Male             0
Youth_15_24_Literacy_Rate_Female           0
Birth_Rate                                 0
Gross_Primary_Education_Enrollment         0
Gross_Tertiary_Education_Enrollment        0
Unemployment_Rate                          0
dtype: int64
The output above shows that the dataset has no null (NaN) values.
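A null check only counts NaN values. The rows printed earlier contain many zeros (for example, Andorra has 0 for both youth literacy and unemployment), which may be placeholders for unreported figures rather than real measurements. The sketch below counts zeros and converts them to NaN under that assumption. The frame `sample` is a hypothetical three-row excerpt using values from the table above; whether a zero really means "missing" has to be checked against the data source.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row excerpt using values shown in the head() output
sample = pd.DataFrame({
    "Countries and areas": ["Afghanistan", "Albania", "Andorra"],
    "Youth_15_24_Literacy_Rate_Male": [74, 99, 0],
    "Unemployment_Rate": [11.12, 12.33, 0.00],
})

numeric_cols = sample.select_dtypes(include="number").columns

# Share of exact zeros in each numeric column
zero_share = (sample[numeric_cols] == 0).mean()
print(zero_share)

# ASSUMPTION: a zero means "not reported", so treat it as missing.
# Integer columns become float because NaN is a float value.
cleaned = sample.copy()
cleaned[numeric_cols] = cleaned[numeric_cols].replace(0, np.nan)

# The null check now reports the converted zeros
missing_after = cleaned.isnull().sum()
print(missing_after)
```

Some columns can legitimately be zero (an out-of-school rate of 0 is plausible), so it is probably safer to apply this conversion column by column than to the whole frame at once.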
# Check for duplicate rows
education.duplicated().sum()
0
The output above shows that the dataset contains no duplicate rows.
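`duplicated()` only flags rows that match in every column. A country entered twice with slightly different spelling or values would slip through, so a check on the normalized country name alone can be a useful complement. This is a sketch on invented data: the `sample` frame and its deliberately misspelled "albania " entry are hypothetical and do not come from the dataset.

```python
import pandas as pd

# Hypothetical data with one near-duplicate country name
sample = pd.DataFrame({
    "Countries and areas": ["Albania", "albania ", "Algeria"],
    "Birth_Rate": [11.78, 11.78, 24.28],
})

# Exact duplicate rows: none, because the name strings differ
exact_duplicates = sample.duplicated().sum()

# Normalize the names (trim whitespace, lowercase) and check again
normalized_names = sample["Countries and areas"].str.strip().str.lower()
name_duplicates = normalized_names.duplicated().sum()

print(exact_duplicates, name_duplicates)
```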
# Find the summary statistics of our data
education.describe()
Latitude | Longitude | OOSR_Pre0Primary_Age_Male | OOSR_Pre0Primary_Age_Female | OOSR_Primary_Age_Male | OOSR_Primary_Age_Female | OOSR_Lower_Secondary_Age_Male | OOSR_Lower_Secondary_Age_Female | OOSR_Upper_Secondary_Age_Male | OOSR_Upper_Secondary_Age_Female | ... | Primary_End_Proficiency_Reading | Primary_End_Proficiency_Math | Lower_Secondary_End_Proficiency_Reading | Lower_Secondary_End_Proficiency_Math | Youth_15_24_Literacy_Rate_Male | Youth_15_24_Literacy_Rate_Female | Birth_Rate | Gross_Primary_Education_Enrollment | Gross_Tertiary_Education_Enrollment | Unemployment_Rate | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 202.000000 | 202.000000 | 202.000000 | 202.000000 | 202.000000 | 202.000000 | 202.000000 | 202.000000 | 202.000000 | 202.000000 | ... | 202.000000 | 202.000000 | 202.000000 | 202.000000 | 202.000000 | 202.000000 | 202.000000 | 202.000000 | 202.000000 | 202.000000 |
mean | 25.081422 | 55.166928 | 19.658416 | 19.282178 | 5.282178 | 5.569307 | 8.707921 | 8.831683 | 20.292079 | 19.975248 | ... | 10.717822 | 10.376238 | 25.787129 | 24.450495 | 35.801980 | 35.084158 | 18.914010 | 94.942574 | 34.392574 | 6.000000 |
std | 16.813639 | 45.976287 | 25.007604 | 25.171147 | 9.396442 | 10.383092 | 13.258203 | 14.724717 | 21.485592 | 23.140376 | ... | 24.866101 | 22.484423 | 33.181384 | 31.965467 | 45.535186 | 45.249643 | 10.828184 | 29.769338 | 29.978206 | 5.273136 |
min | 0.023559 | 0.824782 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
25% | 11.685062 | 18.665678 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.250000 | 0.250000 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 10.355000 | 97.200000 | 9.000000 | 2.302500 |
50% | 21.207861 | 43.518091 | 9.000000 | 7.000000 | 1.000000 | 1.000000 | 2.000000 | 2.000000 | 15.000000 | 12.000000 | ... | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 17.550000 | 101.850000 | 24.850000 | 4.585000 |
75% | 39.901792 | 77.684945 | 31.000000 | 30.000000 | 6.000000 | 6.750000 | 12.750000 | 10.750000 | 32.750000 | 30.000000 | ... | 0.000000 | 0.000000 | 56.750000 | 50.750000 | 94.000000 | 96.750000 | 27.692500 | 107.300000 | 59.975000 | 8.655000 |
max | 64.963051 | 178.065032 | 96.000000 | 96.000000 | 58.000000 | 67.000000 | 61.000000 | 70.000000 | 84.000000 | 89.000000 | ... | 99.000000 | 89.000000 | 89.000000 | 94.000000 | 100.000000 | 100.000000 | 46.080000 | 142.500000 | 136.600000 | 28.180000 |
8 rows × 28 columns
education.columns
Index(['Countries and areas', 'Latitude ', 'Longitude',
       'OOSR_Pre0Primary_Age_Male', 'OOSR_Pre0Primary_Age_Female',
       'OOSR_Primary_Age_Male', 'OOSR_Primary_Age_Female',
       'OOSR_Lower_Secondary_Age_Male', 'OOSR_Lower_Secondary_Age_Female',
       'OOSR_Upper_Secondary_Age_Male', 'OOSR_Upper_Secondary_Age_Female',
       'Completion_Rate_Primary_Male', 'Completion_Rate_Primary_Female',
       'Completion_Rate_Lower_Secondary_Male',
       'Completion_Rate_Lower_Secondary_Female',
       'Completion_Rate_Upper_Secondary_Male',
       'Completion_Rate_Upper_Secondary_Female',
       'Grade_2_3_Proficiency_Reading', 'Grade_2_3_Proficiency_Math',
       'Primary_End_Proficiency_Reading', 'Primary_End_Proficiency_Math',
       'Lower_Secondary_End_Proficiency_Reading',
       'Lower_Secondary_End_Proficiency_Math',
       'Youth_15_24_Literacy_Rate_Male', 'Youth_15_24_Literacy_Rate_Female',
       'Birth_Rate', 'Gross_Primary_Education_Enrollment',
       'Gross_Tertiary_Education_Enrollment', 'Unemployment_Rate'],
      dtype='object')
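Note that the column list shows `'Latitude '` with a trailing space, so `education['Latitude']` would raise a `KeyError`. Here is a minimal sketch of stripping whitespace from all column names, shown on a hypothetical one-row frame `sample` that reproduces the problem.

```python
import pandas as pd

# Hypothetical frame reproducing the trailing space in 'Latitude '
sample = pd.DataFrame({
    "Countries and areas": ["Albania"],
    "Latitude ": [41.153332],
    "Longitude": [20.168331],
})

# Remove leading and trailing whitespace from every column name
sample.columns = sample.columns.str.strip()

print(sample.columns.tolist())
```

On the real data the equivalent line would be `education.columns = education.columns.str.strip()`, run once right after loading.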
Step 2: Data Analysis
Let's solve some practice questions.
# Columns holding completion rates for each education level and gender
completion_cols = ['Completion_Rate_Primary_Male', 'Completion_Rate_Primary_Female',
                   'Completion_Rate_Lower_Secondary_Male', 'Completion_Rate_Lower_Secondary_Female',
                   'Completion_Rate_Upper_Secondary_Male', 'Completion_Rate_Upper_Secondary_Female']
grouped_data = education.groupby('Countries and areas')
# Compute the average completion rate for each country and education level
completion_rates = grouped_data[completion_cols].mean()
# Sort by the columns in the order listed (later columns only break ties) and keep the top 10
top_10_countries_completion_rates = completion_rates.sort_values(by=completion_cols, ascending=False).head(10)
top_10_countries_completion_rates
Countries and areas | Completion_Rate_Primary_Male | Completion_Rate_Primary_Female | Completion_Rate_Lower_Secondary_Male | Completion_Rate_Lower_Secondary_Female | Completion_Rate_Upper_Secondary_Male | Completion_Rate_Upper_Secondary_Female |
---|---|---|---|---|---|---|
North Korea | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 |
Kazakhstan | 100.0 | 100.0 | 100.0 | 100.0 | 95.0 | 96.0 |
Belarus | 100.0 | 100.0 | 100.0 | 100.0 | 91.0 | 94.0 |
Turkmenistan | 100.0 | 100.0 | 99.0 | 100.0 | 93.0 | 95.0 |
Georgia | 100.0 | 100.0 | 98.0 | 98.0 | 79.0 | 83.0 |
Ukraine | 100.0 | 99.0 | 100.0 | 100.0 | 97.0 | 97.0 |
Cuba | 100.0 | 98.0 | 95.0 | 98.0 | 85.0 | 86.0 |
Kyrgyzstan | 99.0 | 100.0 | 99.0 | 99.0 | 89.0 | 85.0 |
Serbia | 99.0 | 100.0 | 99.0 | 99.0 | 71.0 | 81.0 |
Bosnia and Herzegovina | 99.0 | 100.0 | 97.0 | 97.0 | 92.0 | 92.0 |
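Sorting by a list of columns compares them in order, so the primary male rate dominates and later columns only break ties. That is why Georgia ranks above Ukraine in the table even though Ukraine's secondary rates are higher. One alternative, offered here as a sketch and not as the notebook's method, is to rank by the mean of all six columns. The frame below is a hypothetical three-row excerpt copied from the table above.

```python
import pandas as pd

# Hypothetical excerpt: three rows copied from the top-10 table
completion_rates = pd.DataFrame(
    {
        "Completion_Rate_Primary_Male": [100, 100, 99],
        "Completion_Rate_Primary_Female": [100, 99, 100],
        "Completion_Rate_Lower_Secondary_Male": [98, 100, 99],
        "Completion_Rate_Lower_Secondary_Female": [98, 100, 99],
        "Completion_Rate_Upper_Secondary_Male": [79, 97, 89],
        "Completion_Rate_Upper_Secondary_Female": [83, 97, 85],
    },
    index=pd.Index(["Georgia", "Ukraine", "Kyrgyzstan"], name="Countries and areas"),
)

# Column-by-column ordering, as in the cell above
lexicographic = completion_rates.sort_values(
    by=list(completion_rates.columns), ascending=False
)

# Alternative: average the six rates per country and rank by that score
overall = completion_rates.mean(axis=1).sort_values(ascending=False)

print(lexicographic.index.tolist())
print(overall)
```

With the overall mean, Ukraine comes first and Georgia last among these three. Which ranking is appropriate depends on whether primary completion should outweigh the other levels.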
# Plot the data
plt.figure(figsize=(14, 5))
sns.heatmap(top_10_countries_completion_rates, annot=True, cmap="YlGnBu", fmt=".0f", cbar_kws={'label': 'Average Completion Rate'})
plt.title('Average Completion Rates by Education Level for the Top 10 Countries')
plt.show()