About Dataset¶
Compiled from the National Center for Education Statistics Annual Digest, specifically Table 330.20: Average undergraduate tuition and fees and room and board rates charged for full-time students in degree-granting postsecondary institutions, by control and level of institution and state or jurisdiction.
Dataset Link - Kaggle Link
Data Dictionary -¶
- Year - The Digest year this information comes from
- State - The U.S. state
- Type - Type of university: Private or Public, and in-state or out-of-state. Private colleges charge the same for in-state and out-of-state students
- Length - Whether the college mainly offers 2-year (Associates) or 4-year (Bachelors) programs
- Expense - The expense being described: tuition/fees or on-campus living expenses
- Value - The average cost for this particular expense, in USD ($)
Installing dependency¶
👉 Skip this step if the packages are already installed
1. !pip install numpy
2. !pip install pandas
3. !pip install matplotlib
4. !pip install seaborn
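To decide whether the install commands above are needed, one option is to ask Python which packages it can already import. The sketch below uses only the standard library; the helper name `missing_packages` is an assumption made for illustration and is not part of the original notebook.

```python
import importlib.util

def missing_packages(names):
    """Return the names from `names` that cannot be imported in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# The four libraries this notebook depends on
required = ["numpy", "pandas", "matplotlib", "seaborn"]
missing = missing_packages(required)

if missing:
    print("Install these first:", ", ".join(missing))
else:
    print("All dependencies are already installed.")
```

Only the packages reported as missing need a `!pip install` line.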
Step 1 - Data Preprocessing and Cleaning¶
Importing Required library¶
# Numerical operations
import numpy as np
# Data manipulation
import pandas as pd
#Data Visualization
import matplotlib.pyplot as plt
import seaborn as sns
# Remove warnings
import warnings
warnings.filterwarnings('ignore')
#Load the dataset
us_data=pd.read_csv(r"C:\Users\Lenovo\Downloads\content\US undergrad data\nces330_20.csv")
# Print top 5 rows
us_data.head()
| Year | State | Type | Length | Expense | Value |
---|---|---|---|---|---|---|
0 | 2013 | Alabama | Private | 4-year | Fees/Tuition | 13983 |
1 | 2013 | Alabama | Private | 4-year | Room/Board | 8503 |
2 | 2013 | Alabama | Public In-State | 2-year | Fees/Tuition | 4048 |
3 | 2013 | Alabama | Public In-State | 4-year | Fees/Tuition | 8073 |
4 | 2013 | Alabama | Public In-State | 4-year | Room/Board | 8473 |
# check for shape
us_data.shape
(3548, 6)
The cell above shows that the dataset contains 3548 observations and 6 columns.
# Check info of each column
us_data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3548 entries, 0 to 3547
Data columns (total 6 columns):
 #   Column   Non-Null Count  Dtype
---  ------   --------------  -----
 0   Year     3548 non-null   int64
 1   State    3548 non-null   object
 2   Type     3548 non-null   object
 3   Length   3548 non-null   object
 4   Expense  3548 non-null   object
 5   Value    3548 non-null   int64
dtypes: int64(2), object(4)
memory usage: 166.4+ KB
The output above shows 4 object columns and 2 integer columns.
# Checking null values
us_data.isnull().sum()
Year       0
State      0
Type       0
Length     0
Expense    0
Value      0
dtype: int64
The output above shows that there are no missing values in the dataset.
# Check for duplicates
us_data.duplicated().sum()
0
The output above shows that there are no duplicate rows in the dataset.
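The shape, missing-value, and duplicate checks above can be bundled into one helper so they are easy to rerun after any cleaning step. This is a minimal sketch: `quality_report` is a hypothetical name, and `sample` is a stand-in built from the five rows printed by `us_data.head()` rather than the full CSV.

```python
import pandas as pd

# Stand-in frame: the five rows shown by us_data.head() earlier in the notebook
sample = pd.DataFrame({
    "Year": [2013, 2013, 2013, 2013, 2013],
    "State": ["Alabama"] * 5,
    "Type": ["Private", "Private", "Public In-State", "Public In-State", "Public In-State"],
    "Length": ["4-year", "4-year", "2-year", "4-year", "4-year"],
    "Expense": ["Fees/Tuition", "Room/Board", "Fees/Tuition", "Fees/Tuition", "Room/Board"],
    "Value": [13983, 8503, 4048, 8073, 8473],
})

def quality_report(df):
    """Summarise shape, missing values and duplicate rows in one dictionary."""
    return {
        "rows": df.shape[0],
        "columns": df.shape[1],
        "missing_values": int(df.isnull().sum().sum()),
        "duplicate_rows": int(df.duplicated().sum()),
    }

report = quality_report(sample)
print(report)
```

Calling `quality_report(us_data)` on the full dataset should report the same counts found in the individual checks.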
Step 2 - Data Analysis¶
Let's Check the Distribution of each column¶
us_data
| Year | State | Type | Length | Expense | Value |
---|---|---|---|---|---|---|
0 | 2013 | Alabama | Private | 4-year | Fees/Tuition | 13983 |
1 | 2013 | Alabama | Private | 4-year | Room/Board | 8503 |
2 | 2013 | Alabama | Public In-State | 2-year | Fees/Tuition | 4048 |
3 | 2013 | Alabama | Public In-State | 4-year | Fees/Tuition | 8073 |
4 | 2013 | Alabama | Public In-State | 4-year | Room/Board | 8473 |
... | ... | ... | ... | ... | ... | ... |
3543 | 2021 | Wyoming | Public In-State | 2-year | Fees/Tuition | 3987 |
3544 | 2021 | Wyoming | Public In-State | 4-year | Room/Board | 9799 |
3545 | 2021 | Wyoming | Public Out-of-State | 2-year | Fees/Tuition | 9820 |
3546 | 2021 | Wyoming | Public Out-of-State | 4-year | Fees/Tuition | 14710 |
3547 | 2021 | Wyoming | Public Out-of-State | 4-year | Room/Board | 9799 |
3548 rows × 6 columns
Year
¶
ax=sns.countplot(x='Year',data=us_data)
for label in ax.containers:
    ax.bar_label(label)
plt.title("Various years available in the dataset")
plt.show()
The bar plot shows that the data spans the years 2013 through 2021.
State
¶
plt.figure(figsize=(10,5))
ax=sns.countplot(x='State',data=us_data)
for label in ax.containers:
    ax.bar_label(label)
plt.title("Various states available in the dataset")
plt.xticks(rotation=90)
plt.show()
The plot shows one bar per state and indicates that the dataset covers states from across the US.
Type
¶
plt.figure(figsize=(10,5))
ax=sns.countplot(x='Type',data=us_data)
for label in ax.containers:
    ax.bar_label(label)
plt.title("Various Types of Universities in our data")
plt.xticks(rotation=45)
plt.show()
The Type column has three categories: "Private," "Public In-State," and "Public Out-of-State." The dataset contains 905 "Private" records, 1296 "Public In-State" records, and 1347 "Public Out-of-State" records. These are counts of rows (each row is an average for one state, year, program length, and expense), not counts of universities. The public categories have more rows largely because they include 2-year program records, whereas the private records in this dataset cover only 4-year programs.
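A cross-tabulation of Type against Length makes it easier to see where the row counts per type come from. The sketch below is illustrative only: `sample` holds the ten rows visible in the head and tail of the `us_data` display, so its counts are not the full-dataset counts.

```python
import pandas as pd

columns = ["Year", "State", "Type", "Length", "Expense", "Value"]
# The ten rows visible in the us_data display (first five and last five)
rows = [
    (2013, "Alabama", "Private", "4-year", "Fees/Tuition", 13983),
    (2013, "Alabama", "Private", "4-year", "Room/Board", 8503),
    (2013, "Alabama", "Public In-State", "2-year", "Fees/Tuition", 4048),
    (2013, "Alabama", "Public In-State", "4-year", "Fees/Tuition", 8073),
    (2013, "Alabama", "Public In-State", "4-year", "Room/Board", 8473),
    (2021, "Wyoming", "Public In-State", "2-year", "Fees/Tuition", 3987),
    (2021, "Wyoming", "Public In-State", "4-year", "Room/Board", 9799),
    (2021, "Wyoming", "Public Out-of-State", "2-year", "Fees/Tuition", 9820),
    (2021, "Wyoming", "Public Out-of-State", "4-year", "Fees/Tuition", 14710),
    (2021, "Wyoming", "Public Out-of-State", "4-year", "Room/Board", 9799),
]
sample = pd.DataFrame(rows, columns=columns)

# Number of records for every (Type, Length) combination
counts = pd.crosstab(sample["Type"], sample["Length"])
print(counts)
```

Running `pd.crosstab(us_data["Type"], us_data["Length"])` on the full data gives the same breakdown for all 3548 rows.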
Length
¶
plt.figure(figsize=(10,5))
ax=sns.countplot(x='Length',data=us_data)
for label in ax.containers:
    ax.bar_label(label)
plt.title("Types of program available")
plt.xticks(rotation=45)
plt.show()
The Length column has two categories: 2672 records belong to "4-year" programs and 876 records belong to "2-year" programs.
Expense
¶
plt.figure(figsize=(10,5))
ax=sns.countplot(x='Expense',data=us_data)
for label in ax.containers:
    ax.bar_label(label)
plt.title("Types of Expense")
plt.xticks(rotation=45)
plt.show()
The bar plot shows the two expense types in the dataset: 2198 records are "Fees/Tuition" and 1350 records are "Room/Board."
Value
¶
sns.histplot(us_data['Value'],kde=True)
plt.title("distribution of cost")
plt.show()
Let's ask some question from the data¶
How do the expenses differ between public and private universities in various states?¶
plt.figure(figsize=(14, 5))
sns.barplot(x='State', y='Value', hue='Type', data=us_data[us_data.Expense=='Fees/Tuition'], ci=None)
plt.title('Average Expenses of Tuition fees for Different Types of Universities in Each State')
plt.xticks(rotation=45)
plt.xlabel('State')
plt.ylabel('Average Expense Value')
plt.show()
Comparison of Expenses between Private and Public Universities in Various States:
The bar plot provides a visual representation of the average expenses incurred by private and public universities, categorized by "Public In-State" and "Public Out-of-State," across different states in the US. The analysis reveals the following insights:
Tuition Fees Expense:
Private universities tend to have the highest tuition fees across all states, indicating that they are generally more expensive compared to public universities. "Public Out-of-State" universities exhibit higher tuition fees compared to "Public In-State" institutions, highlighting the potential cost disparity for non-residents. Notably, the expense order for the tuition fees is as follows: Private > Public Out-of-State > Public In-State.
plt.figure(figsize=(14, 5))
sns.barplot(x='State', y='Value', hue='Type', data=us_data[us_data.Expense=='Room/Board'], ci=None)
plt.title('Average Expenses of Room/Board fees for Different Types of Universities in Each State')
plt.xticks(rotation=45)
plt.xlabel('State')
plt.ylabel('Average Expense Value')
plt.show()
Comparison of Expenses between Private and Public Universities in Various States for Room/Board Fees:
The bar plot offers a clear representation of the average expenses incurred by private and public universities, classified into "Public In-State" and "Public Out-of-State," across different states in the US. The analysis yields the following insights:
Room/Board Fees Expense: Private universities tend to have the highest or equivalent room/board expenses across all states, suggesting that they are generally more expensive or equally as expensive as public universities. Notably, the data indicates that the expenses for "Public Out-of-State" and "Public In-State" universities are often comparable, suggesting that the cost of accommodation and meals might not significantly differ for in-state and out-of-state students in certain states. Overall, the expense pattern for room/board is as follows: Private >= Public Out-of-State == Public In-State.
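The observation that in-state and out-of-state room/board costs match can be checked directly by placing the two public types side by side for each year and state. The following is a sketch under stated assumptions: `sample` contains only the three public room/board rows visible in the `us_data` display, and the `same` column is a name introduced here for illustration.

```python
import pandas as pd

columns = ["Year", "State", "Type", "Length", "Expense", "Value"]
# Public room/board rows visible in the us_data display
rows = [
    (2013, "Alabama", "Public In-State", "4-year", "Room/Board", 8473),
    (2021, "Wyoming", "Public In-State", "4-year", "Room/Board", 9799),
    (2021, "Wyoming", "Public Out-of-State", "4-year", "Room/Board", 9799),
]
sample = pd.DataFrame(rows, columns=columns)

# Keep public room/board rows only
room = sample[(sample["Expense"] == "Room/Board") & (sample["Type"] != "Private")]

# One row per (Year, State), one column per public type; drop incomplete pairs
wide = room.pivot_table(index=["Year", "State"], columns="Type", values="Value").dropna()

# True where both public types report the same room/board value
wide["same"] = wide["Public In-State"] == wide["Public Out-of-State"]
print(wide)
```

Applied to the full `us_data`, `wide["same"].mean()` would give the share of state-year pairs where the two values agree.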
What is the average expense for each type of program (2-year and 4-year) across different states?¶
plt.figure(figsize=(14, 5))
sns.barplot(x='State', y='Value', hue='Length', data=us_data,ci=None)
plt.title('Average Expenses of various types of program for Each State')
plt.xticks(rotation=45)
plt.xlabel('State')
plt.ylabel('Average Expense Value')
plt.show()
The plot suggests that the average value for 4-year programs is roughly twice that for 2-year programs in most states. The comparison is not like-for-like, however: the 4-year average mixes tuition and room/board across private and public institutions, while the 2-year records in this dataset cover only tuition at public institutions. Program length is therefore one of several factors behind the gap, not the sole cause.
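A fairer comparison of 2-year and 4-year programs holds the expense and the university type fixed. The sketch below does this for tuition, using the per-type means that this notebook computes later in `expenses_by_type`; the column name `ratio_4yr_to_2yr` is an assumption made for illustration.

```python
import pandas as pd

# Mean values copied from the expenses_by_type table computed later in this notebook
means = pd.DataFrame({
    "Type": ["Private", "Private", "Public In-State", "Public In-State", "Public In-State",
             "Public Out-of-State", "Public Out-of-State", "Public Out-of-State"],
    "Length": ["4-year", "4-year", "2-year", "4-year", "4-year", "2-year", "4-year", "4-year"],
    "Expense": ["Fees/Tuition", "Room/Board", "Fees/Tuition", "Fees/Tuition", "Room/Board",
                "Fees/Tuition", "Fees/Tuition", "Room/Board"],
    "Value": [26662.002198, 10871.720000, 3809.910959, 9010.340686, 10231.495556,
              8404.132420, 23887.908497, 10231.495556],
})

# Compare tuition only, so room/board does not inflate the 4-year figure
tuition = means[means["Expense"] == "Fees/Tuition"]
wide = tuition.pivot(index="Type", columns="Length", values="Value")

# Private has no 2-year rows, so its ratio is NaN
wide["ratio_4yr_to_2yr"] = wide["4-year"] / wide["2-year"]
print(wide.round(2))
```

On these means, 4-year tuition is roughly 2.4 times 2-year tuition for in-state students and roughly 2.8 times for out-of-state students.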
Which states have the highest and lowest expenses for tuition and room/board in 4-year programs?¶
# Select 4-year tuition rows with a single combined boolean mask
tuition_expenses_4year = us_data[(us_data['Length'] == '4-year') & (us_data['Expense'] == 'Fees/Tuition')]
# Highest single recorded value per state, then keep the top state
highest_tuition_expenses_4year = tuition_expenses_4year.groupby('State')['Value'].max().sort_values(ascending=False).head(1)
# Lowest single recorded value per state, then keep the bottom state
lowest_tuition_expenses_4year = tuition_expenses_4year.groupby('State')['Value'].min().sort_values().head(1)
print("State with the highest tuition expense for 4-year programs:")
print(highest_tuition_expenses_4year)
print("State with the lowest tuition expense for 4-year programs:")
print(lowest_tuition_expenses_4year)
State with the highest tuition expense for 4-year programs:
State
Massachusetts    49152
Name: Value, dtype: int64
State with the lowest tuition expense for 4-year programs:
State
Wyoming    3642
Name: Value, dtype: int64
plt.figure(figsize=(14, 5))
sns.barplot(x='State', y='Value', data=us_data[(us_data['Length']=='4-year')&(us_data['Expense']=='Fees/Tuition')],ci=None)
plt.title('Average Expenses of Tuition fees for 4-year program for Each State')
plt.xticks(rotation=90)
plt.xlabel('State')
plt.ylabel('Average Expense Value')
plt.show()
By analyzing the data, we can observe that Massachusetts and Vermont are the highest-expense states for 4-year programs in terms of tuition fees. On the other hand, Wyoming stands out as the state with the lowest expenses for 4-year programs regarding tuition fees. This finding emphasizes the significant variation in educational expenses across different states in the United States, highlighting the diverse financial landscapes that students encounter when pursuing higher education. It underscores the need for comprehensive financial planning and aid programs to address the challenges associated with varying tuition costs in different regions.
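The printed ranking uses each state's single highest or lowest record, while the bar plot shows per-state averages, and the two can disagree. The sketch below illustrates the difference on hypothetical toy data: "State A", "State B", and all values are invented for illustration and do not come from the dataset.

```python
import pandas as pd

# Hypothetical toy data: not taken from the NCES dataset
toy = pd.DataFrame({
    "State": ["State A", "State A", "State B", "State B"],
    "Type": ["Private", "Public In-State", "Private", "Public In-State"],
    "Value": [40000, 5000, 30000, 28000],
})

# Per-state maximum (single most expensive record) and mean (average record)
by_state = toy.groupby("State")["Value"].agg(["max", "mean"])

top_by_max = by_state["max"].idxmax()    # state holding the single highest record
top_by_mean = by_state["mean"].idxmax()  # state with the highest average
print(by_state)
print("Highest by max:", top_by_max)
print("Highest by mean:", top_by_mean)
```

Here the state with the single most expensive record is not the one with the highest average, so it is worth stating which definition of "highest expense" a ranking uses.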
# Select 4-year room/board rows with a single combined boolean mask
room_board_expenses_4year = us_data[(us_data['Length'] == '4-year') & (us_data['Expense'] == 'Room/Board')]
# Highest single recorded value per state, then keep the top state
highest_room_board_expenses_4year = room_board_expenses_4year.groupby('State')['Value'].max().sort_values(ascending=False).head(1)
# Lowest single recorded value per state, then keep the bottom state
lowest_room_board_expenses_4year = room_board_expenses_4year.groupby('State')['Value'].min().sort_values().head(1)
print("State with the highest room/board expense for 4-year programs:")
print(highest_room_board_expenses_4year)
print("State with the lowest room/board expense for 4-year programs:")
print(lowest_room_board_expenses_4year)
State with the highest room/board expense for 4-year programs:
State
Nevada    21602
Name: Value, dtype: int64
State with the lowest room/board expense for 4-year programs:
State
Idaho    4792
Name: Value, dtype: int64
plt.figure(figsize=(14, 5))
sns.barplot(x='State', y='Value', data=us_data[(us_data['Length']=='4-year')&(us_data['Expense']=='Room/Board')],ci=None)
plt.title('Average Expenses for Room/Board for 4-year program for Each State')
plt.xticks(rotation=90)
plt.xlabel('State')
plt.ylabel('Average Expense Value')
plt.show()
Based on the analysis, the District of Columbia and Nevada are the jurisdictions with the highest room/board expenses for 4-year programs, while Idaho has the lowest. This comparison highlights the substantial disparities in living and accommodation costs between states, which students and their families need to account for when planning for higher education. It underscores the value of understanding regional cost variations and their impact on student living arrangements and overall education expenses.
How have the expenses for tuition and room/board changed over the years across different states?¶
expenses_by_year = us_data.groupby(['Year', 'State', 'Expense'])['Value'].mean().reset_index()
expenses_by_year
| Year | State | Expense | Value |
---|---|---|---|---|
0 | 2013 | Alabama | Fees/Tuition | 10844.000000 |
1 | 2013 | Alabama | Room/Board | 8483.000000 |
2 | 2013 | Alaska | Fees/Tuition | 10945.000000 |
3 | 2013 | Alaska | Room/Board | 9039.666667 |
4 | 2013 | Arizona | Fees/Tuition | 10451.400000 |
... | ... | ... | ... | ... |
913 | 2021 | West Virginia | Room/Board | 10670.000000 |
914 | 2021 | Wisconsin | Fees/Tuition | 18432.500000 |
915 | 2021 | Wisconsin | Room/Board | 9858.333333 |
916 | 2021 | Wyoming | Fees/Tuition | 9505.666667 |
917 | 2021 | Wyoming | Room/Board | 9799.000000 |
918 rows × 4 columns
plt.figure(figsize=(12, 8))
sns.lineplot(x='Year', y='Value', style='Expense', data=expenses_by_year)
plt.title('Changes in Expenses for Tuition and Room/Board Over the Years')
plt.xlabel('Year')
plt.ylabel('Expense Value')
plt.show()
The line plot shows an upward trajectory in expense values across the years for both tuition and room/board. This consistent upward trend suggests that the cost of attending university in the United States has been rising steadily, which has significant implications for students, families, and policymakers.
This trend underscores the need for proactive measures to address the affordability and accessibility of higher education. It calls for the development of effective financial aid programs, scholarship opportunities, and policies aimed at mitigating the financial burden on students. Moreover, it emphasizes the importance of strategic planning and budget allocation by educational institutions to ensure the sustainability and affordability of higher education for future generations.
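The upward trend can be quantified as a percentage change between the first and last year for each expense type. The sketch below shows the calculation on hypothetical toy values: the numbers are invented for illustration and are not the dataset's actual averages.

```python
import pandas as pd

# Hypothetical yearly averages: invented for illustration only
toy = pd.DataFrame({
    "Year": [2013, 2013, 2021, 2021],
    "Expense": ["Fees/Tuition", "Room/Board", "Fees/Tuition", "Room/Board"],
    "Value": [10000.0, 8000.0, 12500.0, 9600.0],
})

# One row per year, one column per expense type
yearly = toy.pivot_table(index="Year", columns="Expense", values="Value", aggfunc="mean")

# Percentage change from the earliest to the latest year
first = yearly.loc[yearly.index.min()]
last = yearly.loc[yearly.index.max()]
pct_change = (last - first) / first * 100
print(pct_change)
```

Passing `expenses_by_year` in place of `toy` would apply the same calculation to the real per-year averages.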
Can we identify any trends in the expenses between different types of universities (private and public) for both 2-year and 4-year programs?¶
expenses_by_type = us_data.groupby(['Type', 'Length', 'Expense'])['Value'].mean().reset_index()
expenses_by_type
| Type | Length | Expense | Value |
---|---|---|---|---|
0 | Private | 4-year | Fees/Tuition | 26662.002198 |
1 | Private | 4-year | Room/Board | 10871.720000 |
2 | Public In-State | 2-year | Fees/Tuition | 3809.910959 |
3 | Public In-State | 4-year | Fees/Tuition | 9010.340686 |
4 | Public In-State | 4-year | Room/Board | 10231.495556 |
5 | Public Out-of-State | 2-year | Fees/Tuition | 8404.132420 |
6 | Public Out-of-State | 4-year | Fees/Tuition | 23887.908497 |
7 | Public Out-of-State | 4-year | Room/Board | 10231.495556 |
plt.figure(figsize=(10, 6))
sns.barplot(x='Type', y='Value', hue='Expense', data=expenses_by_type[expenses_by_type['Length']=='4-year'],ci=None)
plt.title('Expenses for Private and Public Universities for 4-Year Programs')
plt.xlabel('University Type')
plt.ylabel('Expense Value')
plt.show()
The bar plot shows how tuition and room/board compare across university types for 4-year programs. For private and public out-of-state types, average tuition (about $26,700 and $23,900) is more than double average room/board (about $10,900 and $10,200). For the public in-state type the order reverses: average room/board (about $10,200) exceeds average tuition (about $9,000). These differences can help prospective students and their families compare the likely costs of each option.
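The tuition-to-room/board relationship described above can be expressed as a ratio per university type. This sketch uses the 4-year means printed in the `expenses_by_type` table; the column name `tuition_to_room_board` is an assumption made for illustration.

```python
import pandas as pd

# 4-year means copied from the expenses_by_type table above
four_year = pd.DataFrame({
    "Type": ["Private", "Private", "Public In-State", "Public In-State",
             "Public Out-of-State", "Public Out-of-State"],
    "Expense": ["Fees/Tuition", "Room/Board"] * 3,
    "Value": [26662.002198, 10871.720000, 9010.340686, 10231.495556,
              23887.908497, 10231.495556],
})

# One row per university type, one column per expense
wide = four_year.pivot(index="Type", columns="Expense", values="Value")

# Ratio above 1 means tuition exceeds room/board; below 1 means the reverse
wide["tuition_to_room_board"] = wide["Fees/Tuition"] / wide["Room/Board"]
print(wide.round(2))
```

A ratio below 1 for Public In-State confirms that room/board is the larger expense for in-state students at 4-year public institutions.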
plt.figure(figsize=(10, 6))
sns.barplot(x='Type', y='Value', hue='Expense', data=expenses_by_type[expenses_by_type['Length']=='2-year'],ci=None)
plt.title('Expenses for Private and Public Universities for 2-Year Programs')
plt.xlabel('University Type')
plt.ylabel('Expense Value')
plt.show()
The bar plot shows that average tuition for the Public Out-of-State type (about $8,400) is more than twice that of the Public In-State type (about $3,800) for 2-year programs, a ratio of roughly 2.2. This gap is useful for students and their families to keep in mind when weighing educational options and their costs.
Recommendation and Conclusions -¶
Recommendations:
Financial Aid Programs: Implement comprehensive financial aid programs and scholarships to support students, particularly in states with high educational expenses.
Cost-Effective Solutions: Foster partnerships with local businesses and organizations to provide cost-effective living arrangements for students, such as affordable housing and meal plans.
Educational Awareness Campaigns: Launch educational awareness campaigns to inform students and families about the long-term financial implications of pursuing higher education, emphasizing the importance of financial planning and budget management.
Policy Reforms: Advocate for policy reforms at the state and federal levels to address the escalating expenses associated with tuition and room/board fees, ensuring the accessibility and affordability of education for all students.
Collaborative Initiatives: Foster collaborative initiatives between educational institutions and government bodies to promote transparent financial policies and cost-effective strategies for students and their families.
Conclusions:
Financial Disparities: The analysis reveals significant financial disparities between private and public universities, highlighting the need for comprehensive financial aid programs and policies to support students in managing educational expenses.
Cost Variations Across States: The report highlights substantial variations in educational expenses across different states, underscoring the importance of regional financial planning and aid programs to address the diverse financial landscapes students encounter.
Increasing Educational Costs: The consistent upward trend in educational expenses over the years indicates the pressing need for proactive measures and policy reforms to mitigate the financial burden on students and their families.
Strategic Financial Planning: The findings underscore the significance of strategic financial planning by educational institutions to ensure the sustainability and affordability of higher education for future generations.
Promoting Accessibility and Affordability: The analysis emphasizes the critical role of collaborative initiatives and policy reforms in promoting accessibility and affordability of education, ensuring equal opportunities for all students, regardless of their financial backgrounds.
Questions -¶
- How do the expenses differ between public and private universities in various states?
- What is the average expense for each type of program (2-year and 4-year) across different states?
- Which states have the highest and lowest expenses for tuition and room/board in 4-year programs?
- How have the expenses for tuition and room/board changed over the years across different states?
- Can we identify any trends in the expenses between different types of universities (private and public) for both 2-year and 4-year programs?
Practice Questions -¶
- How do the trends in university expenses in the US compare with global educational cost trends?
- What are the potential factors contributing to the increasing expenses in 4-year university programs in specific states?
- How does the student enrollment rate in different types of universities (private and public) impact the overall educational expenses in various states?
- Can we identify any correlations between the overall state budget allocation for education and the expenses incurred by students in public universities?
- What are the long-term implications of the financial burden on students and their families in terms of student loan debts and financial stability post-graduation?