About Dataset
The dataset contains information about a group of test subjects and their sleep patterns. Each test subject is identified by a unique "ID", and their age and gender are also recorded. The "Bedtime" and "Wakeup time" features indicate when each subject goes to bed and wakes up, and the "Sleep duration" feature records the total amount of time each subject slept, in hours. The "Sleep efficiency" feature measures the proportion of time in bed that is actually spent asleep. The "REM sleep percentage", "Deep sleep percentage", and "Light sleep percentage" features indicate the percentage of time each subject spent in each sleep stage. The "Awakenings" feature records the number of times each subject woke up during the night. The dataset also includes each subject's caffeine and alcohol consumption in the 24 hours before bedtime, their smoking status, and their exercise frequency.
Dataset Link - Kaggle link
Data Dictionary
ID: Unique identifier for each test subject.
Age: The age of each test subject in years.
Gender: The gender of each test subject, categorized as either male or female.
Bedtime: The time at which each test subject goes to bed each night.
Wakeup time: The time at which each test subject wakes up each morning.
Sleep duration: The total amount of time each test subject sleeps, in hours.
Sleep efficiency: A measure of the proportion of time spent in bed that is actually spent asleep.
REM sleep percentage: The percentage of time spent in rapid eye movement (REM) sleep.
Deep sleep percentage: The percentage of time spent in deep sleep.
Light sleep percentage: The percentage of time spent in light sleep.
Awakenings: The number of times each test subject wakes up during the night.
Caffeine consumption: The amount of caffeine consumed by each subject in the 24 hours prior to bedtime.
Alcohol consumption: The amount of alcohol consumed by each subject in the 24 hours prior to bedtime.
Smoking status: Indicates whether each subject is a smoker or non-smoker.
Exercise frequency: The number of times per week each subject exercises.
Installing Dependencies
Skip this step if the packages are already installed.
!pip install numpy
!pip install pandas
!pip install matplotlib
!pip install seaborn
Step 1 - Data Preprocessing and Cleaning
Importing Required Libraries
# Numerical computing
import numpy as np
# Data manipulation
import pandas as pd
# Data visualization
import matplotlib.pyplot as plt
import seaborn as sns
# Suppress warnings
import warnings
warnings.filterwarnings('ignore')
# Load the dataset (adjust the path to your local copy of Sleep_Efficiency.csv)
data=pd.read_csv(r"C:\Users\Lenovo\Downloads\content\Sleep Efficiency Analysis\Sleep_Efficiency.csv")
# Display the first 5 rows
data.head()
| | ID | Age | Gender | Bedtime | Wakeup time | Sleep duration | Sleep efficiency | REM sleep percentage | Deep sleep percentage | Light sleep percentage | Awakenings | Caffeine consumption | Alcohol consumption | Smoking status | Exercise frequency |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 65 | Female | 2021-03-06 01:00:00 | 2021-03-06 07:00:00 | 6.0 | 0.88 | 18 | 70 | 12 | 0.0 | 0.0 | 0.0 | Yes | 3.0 |
1 | 2 | 69 | Male | 2021-12-05 02:00:00 | 2021-12-05 09:00:00 | 7.0 | 0.66 | 19 | 28 | 53 | 3.0 | 0.0 | 3.0 | Yes | 3.0 |
2 | 3 | 40 | Female | 2021-05-25 21:30:00 | 2021-05-25 05:30:00 | 8.0 | 0.89 | 20 | 70 | 10 | 1.0 | 0.0 | 0.0 | No | 3.0 |
3 | 4 | 40 | Female | 2021-11-03 02:30:00 | 2021-11-03 08:30:00 | 6.0 | 0.51 | 23 | 25 | 52 | 3.0 | 50.0 | 5.0 | Yes | 1.0 |
4 | 5 | 57 | Male | 2021-03-13 01:00:00 | 2021-03-13 09:00:00 | 8.0 | 0.76 | 27 | 55 | 18 | 3.0 | 0.0 | 3.0 | No | 3.0 |
# Check the shape (rows, columns)
data.shape
(452, 15)
The output above shows that the dataset contains 452 observations (rows) and 15 columns.
# Check the data type and non-null count of each column
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 452 entries, 0 to 451
Data columns (total 15 columns):
 #   Column                  Non-Null Count  Dtype
---  ------                  --------------  -----
 0   ID                      452 non-null    int64
 1   Age                     452 non-null    int64
 2   Gender                  452 non-null    object
 3   Bedtime                 452 non-null    object
 4   Wakeup time             452 non-null    object
 5   Sleep duration          452 non-null    float64
 6   Sleep efficiency        452 non-null    float64
 7   REM sleep percentage    452 non-null    int64
 8   Deep sleep percentage   452 non-null    int64
 9   Light sleep percentage  452 non-null    int64
 10  Awakenings              432 non-null    float64
 11  Caffeine consumption    427 non-null    float64
 12  Alcohol consumption     438 non-null    float64
 13  Smoking status          452 non-null    object
 14  Exercise frequency      446 non-null    float64
dtypes: float64(6), int64(5), object(4)
memory usage: 53.1+ KB
The dataset has 4 object (text) columns, 5 integer columns, and 6 float columns. Bedtime and Wakeup time are stored as text and will need to be converted to datetime, and Awakenings, Caffeine consumption, Alcohol consumption, and Exercise frequency have fewer than 452 non-null values.
# Checking null values
data.isnull().sum()
ID                         0
Age                        0
Gender                     0
Bedtime                    0
Wakeup time                0
Sleep duration             0
Sleep efficiency           0
REM sleep percentage       0
Deep sleep percentage      0
Light sleep percentage     0
Awakenings                20
Caffeine consumption      25
Alcohol consumption       14
Smoking status             0
Exercise frequency         6
dtype: int64
Four columns have missing values: Awakenings (20), Caffeine consumption (25), Alcohol consumption (14), and Exercise frequency (6). We could either drop these rows or fill in the missing values; here each gap is filled with the column's mode (most frequent value).
Let's fill the missing values in the Awakenings column.
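The four cells below repeat one pattern: inspect the value counts, take the mode, and fill. As a minimal sketch, the same mode imputation can be written once as a loop. `fill_with_mode` is a hypothetical helper name (not part of pandas), and the demo frame uses made-up values rather than the real dataset.

```python
import numpy as np
import pandas as pd


def fill_with_mode(df, columns):
    """Return a copy of df with NaNs in each listed column replaced by that column's mode."""
    df = df.copy()
    for col in columns:
        # mode() may return several tied values; take the first one
        most_frequent = df[col].mode().iloc[0]
        df[col] = df[col].fillna(most_frequent)
    return df


# Tiny made-up frame for illustration (not taken from the real dataset)
demo = pd.DataFrame({
    "Awakenings": [1.0, 1.0, 0.0, np.nan],
    "Exercise frequency": [3.0, np.nan, 3.0, 0.0],
})
filled = fill_with_mode(demo, ["Awakenings", "Exercise frequency"])
print(filled)
```

On the real data this would be `data = fill_with_mode(data, ['Awakenings', 'Caffeine consumption', 'Alcohol consumption', 'Exercise frequency'])`, which should use the same fill values as the mode outputs shown below (1, 0, 0 and 3). Returning a copy instead of filling in place avoids pandas' chained-assignment pitfalls.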
data.Awakenings.value_counts()
1.0    154
0.0     95
3.0     63
4.0     63
2.0     57
Name: Awakenings, dtype: int64
awakenings_frequent_category=data.Awakenings.mode()
awakenings_frequent_category
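0    1.0
Name: Awakenings, dtype: float64
# Assign the result back rather than calling fillna(inplace=True) on a column,
# which newer pandas versions may not propagate to `data`
data['Awakenings'] = data['Awakenings'].fillna(awakenings_frequent_category.iloc[0])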
data.Awakenings.isna().sum()
0
Fill the missing values in the Caffeine consumption column
data['Caffeine consumption'].value_counts()
0.0      211
50.0     107
25.0      79
75.0      25
200.0      4
100.0      1
Name: Caffeine consumption, dtype: int64
caffeine_consumption_frequent_category=data['Caffeine consumption'].mode()
caffeine_consumption_frequent_category
0    0.0
Name: Caffeine consumption, dtype: float64
data['Caffeine consumption'] = data['Caffeine consumption'].fillna(caffeine_consumption_frequent_category.iloc[0])
data['Caffeine consumption'].isnull().sum()
0
Fill the missing values in the Alcohol consumption column
data['Alcohol consumption'].isnull().sum()
14
data['Alcohol consumption'].value_counts()
0.0    246
1.0     54
3.0     48
2.0     37
5.0     30
4.0     23
Name: Alcohol consumption, dtype: int64
alcohol_consumption_frequent_category=data['Alcohol consumption'].mode()
alcohol_consumption_frequent_category
0    0.0
Name: Alcohol consumption, dtype: float64
data['Alcohol consumption'] = data['Alcohol consumption'].fillna(alcohol_consumption_frequent_category.iloc[0])
data['Alcohol consumption'].isnull().sum()
0
Fill the missing values in the Exercise frequency column
data['Exercise frequency'].isnull().sum()
6
data['Exercise frequency'].value_counts()
3.0    130
0.0    116
1.0     97
2.0     54
4.0     41
5.0      8
Name: Exercise frequency, dtype: int64
exercise_frequency_frequent_category=data['Exercise frequency'].mode()
exercise_frequency_frequent_category
0    3.0
Name: Exercise frequency, dtype: float64
data['Exercise frequency'] = data['Exercise frequency'].fillna(exercise_frequency_frequent_category.iloc[0])
data['Exercise frequency'].isnull().sum()
0
# Check for duplicate rows
data.duplicated().sum()
0
The output confirms that there are no duplicate rows in the dataset.
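Beyond duplicates, a quick plausibility check is possible. Because the three sleep-stage columns describe the same night, they are expected to sum to 100; this is an assumption, although it holds for the five rows shown by data.head(). Sleep efficiency is a proportion, so it should lie in [0, 1]. `check_sleep_columns` is a hypothetical helper; the sample rows are copied from the head output above.

```python
import pandas as pd

# Rows copied from the data.head() output above (sleep columns only)
sample = pd.DataFrame({
    "Sleep efficiency": [0.88, 0.66, 0.89, 0.51, 0.76],
    "REM sleep percentage": [18, 19, 20, 23, 27],
    "Deep sleep percentage": [70, 28, 70, 25, 55],
    "Light sleep percentage": [12, 53, 10, 52, 18],
})


def check_sleep_columns(df):
    """Return (rows whose stage percentages do not sum to 100, rows with efficiency outside [0, 1])."""
    stage_cols = ["REM sleep percentage", "Deep sleep percentage", "Light sleep percentage"]
    stage_total = df[stage_cols].sum(axis=1)
    bad_stages = df[stage_total != 100]
    # between() is inclusive on both ends by default
    bad_efficiency = df[~df["Sleep efficiency"].between(0, 1)]
    return bad_stages, bad_efficiency


bad_stages, bad_efficiency = check_sleep_columns(sample)
print(len(bad_stages), len(bad_efficiency))
```

Running `check_sleep_columns(data)` on the full dataset would list any rows that break either rule.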
data
| | ID | Age | Gender | Bedtime | Wakeup time | Sleep duration | Sleep efficiency | REM sleep percentage | Deep sleep percentage | Light sleep percentage | Awakenings | Caffeine consumption | Alcohol consumption | Smoking status | Exercise frequency |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 65 | Female | 2021-03-06 01:00:00 | 2021-03-06 07:00:00 | 6.0 | 0.88 | 18 | 70 | 12 | 0.0 | 0.0 | 0.0 | Yes | 3.0 |
1 | 2 | 69 | Male | 2021-12-05 02:00:00 | 2021-12-05 09:00:00 | 7.0 | 0.66 | 19 | 28 | 53 | 3.0 | 0.0 | 3.0 | Yes | 3.0 |
2 | 3 | 40 | Female | 2021-05-25 21:30:00 | 2021-05-25 05:30:00 | 8.0 | 0.89 | 20 | 70 | 10 | 1.0 | 0.0 | 0.0 | No | 3.0 |
3 | 4 | 40 | Female | 2021-11-03 02:30:00 | 2021-11-03 08:30:00 | 6.0 | 0.51 | 23 | 25 | 52 | 3.0 | 50.0 | 5.0 | Yes | 1.0 |
4 | 5 | 57 | Male | 2021-03-13 01:00:00 | 2021-03-13 09:00:00 | 8.0 | 0.76 | 27 | 55 | 18 | 3.0 | 0.0 | 3.0 | No | 3.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
447 | 448 | 27 | Female | 2021-11-13 22:00:00 | 2021-11-13 05:30:00 | 7.5 | 0.91 | 22 | 57 | 21 | 0.0 | 0.0 | 0.0 | No | 5.0 |
448 | 449 | 52 | Male | 2021-03-31 21:00:00 | 2021-03-31 03:00:00 | 6.0 | 0.74 | 28 | 57 | 15 | 4.0 | 25.0 | 0.0 | No | 3.0 |
449 | 450 | 40 | Female | 2021-09-07 23:00:00 | 2021-09-07 07:30:00 | 8.5 | 0.55 | 20 | 32 | 48 | 1.0 | 0.0 | 3.0 | Yes | 0.0 |
450 | 451 | 45 | Male | 2021-07-29 21:00:00 | 2021-07-29 04:00:00 | 7.0 | 0.76 | 18 | 72 | 10 | 3.0 | 0.0 | 0.0 | No | 3.0 |
451 | 452 | 18 | Male | 2021-03-17 02:30:00 | 2021-03-17 10:00:00 | 7.5 | 0.63 | 22 | 23 | 55 | 1.0 | 50.0 | 0.0 | No | 1.0 |
452 rows × 15 columns
Convert the Bedtime and Wakeup time columns to datetime format
data['Bedtime']=pd.to_datetime(data['Bedtime'])
data['Wakeup time']=pd.to_datetime(data['Wakeup time'])
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 452 entries, 0 to 451
Data columns (total 15 columns):
 #   Column                  Non-Null Count  Dtype
---  ------                  --------------  -----
 0   ID                      452 non-null    int64
 1   Age                     452 non-null    int64
 2   Gender                  452 non-null    object
 3   Bedtime                 452 non-null    datetime64[ns]
 4   Wakeup time             452 non-null    datetime64[ns]
 5   Sleep duration          452 non-null    float64
 6   Sleep efficiency        452 non-null    float64
 7   REM sleep percentage    452 non-null    int64
 8   Deep sleep percentage   452 non-null    int64
 9   Light sleep percentage  452 non-null    int64
 10  Awakenings              452 non-null    float64
 11  Caffeine consumption    452 non-null    float64
 12  Alcohol consumption     452 non-null    float64
 13  Smoking status          452 non-null    object
 14  Exercise frequency      452 non-null    float64
dtypes: datetime64[ns](2), float64(6), int64(5), object(2)
memory usage: 53.1+ KB
Step 2 - Data Analysis
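After the conversion, Bedtime and Wakeup time carry the same calendar date even when a subject goes to bed before midnight (row 2: bed at 21:30 and wake-up at 05:30, both on 2021-05-25), so subtracting them directly gives a negative interval. A minimal sketch, assuming every subject spends less than 24 hours in bed, wraps the difference modulo 24 hours. `hours_in_bed` is a hypothetical helper, and the sample rows are copied from the output above. For every row displayed above the result equals Sleep duration, which suggests that Sleep duration may record the bedtime-to-wake-up interval rather than time actually asleep.

```python
import pandas as pd

# Bedtime / wake-up pairs copied from rows 0, 2 and 448 of the output above
sample = pd.DataFrame({
    "Bedtime": pd.to_datetime(["2021-03-06 01:00:00", "2021-05-25 21:30:00", "2021-03-31 21:00:00"]),
    "Wakeup time": pd.to_datetime(["2021-03-06 07:00:00", "2021-05-25 05:30:00", "2021-03-31 03:00:00"]),
    "Sleep duration": [6.0, 8.0, 6.0],
})


def hours_in_bed(df):
    """Hours from Bedtime to Wakeup time, wrapping past midnight.

    Both timestamps share one calendar date, so a wake-up that looks earlier
    than the bedtime is assumed to belong to the following morning.
    """
    delta_hours = (df["Wakeup time"] - df["Bedtime"]).dt.total_seconds() / 3600
    return delta_hours % 24  # e.g. -16.0 -> 8.0


sample["Hours in bed"] = hours_in_bed(sample)
print(sample[["Sleep duration", "Hours in bed"]])
```

On the full dataset, `data['Hours in bed'] = hours_in_bed(data)` would add the same column.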
How does sleep duration vary between different age groups and genders?
def agegroup(x):
    # Map an age in years to a coarse age-group label
    if 0 < x <= 12:
        return 'Kid'
    elif 12 < x <= 18:
        return 'Teenager'
    elif 18 < x <= 30:
        return 'Young Adult'
    elif 30 < x <= 40:
        return 'Adult'
    elif 40 < x <= 60:
        return 'Middle'
    else:
        return 'Senior'
data['Agegroup'] = data['Age'].apply(agegroup)
plt.figure(figsize=(10, 6))
sns.boxplot(x="Agegroup", y="Sleep duration", hue="Gender", data=data)
plt.title('Sleep Duration Variation by Age Group and Gender')
plt.xlabel('Age Group')
plt.ylabel('Sleep Duration (hours)')
plt.show()
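The same binning as agegroup() can be sketched with pd.cut, which assigns all ages in one vectorised call. `AGE_BINS` and `AGE_LABELS` are hypothetical names chosen here. One behavioural difference: agegroup() labels ages of 0 or below as 'Senior' (they fall through to the else branch), while pd.cut returns NaN for them.

```python
import numpy as np
import pandas as pd

# Same boundaries as agegroup(): each bin is (lower, upper], i.e. right-inclusive
AGE_BINS = [0, 12, 18, 30, 40, 60, np.inf]
AGE_LABELS = ["Kid", "Teenager", "Young Adult", "Adult", "Middle", "Senior"]

ages = pd.Series([9, 12, 13, 18, 30, 40, 41, 60, 61, 69])
age_groups = pd.cut(ages, bins=AGE_BINS, labels=AGE_LABELS, right=True)
print(pd.DataFrame({"Age": ages, "Agegroup": age_groups}))
```

On the real data this would be `data['Agegroup'] = pd.cut(data['Age'], bins=AGE_BINS, labels=AGE_LABELS)`. Because pd.cut returns an ordered categorical, seaborn draws the groups in age order instead of in order of first appearance.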
Sleep Duration Across Different Age Groups and Genders:
The box plot shows the following patterns (the line inside each box marks the median):
Seniors: The median sleep duration for both males and females is approximately 7.5 hours. One male subject appears to sleep only 5 hours.
Adults: The median sleep duration for adults is also approximately 7.5 hours, with most individuals sleeping between 6 and 9 hours. One female adult sleeps only 5 hours.
Middle-aged adults: Sleep duration for females mostly falls between 6 and 9 hours, and for males between 5.5 and 9 hours.
Young adults: Sleep duration also ranges from 6 to 9 hours, with a similar pattern for male and female subjects.
Kids: Children generally sleep 7 to 9 hours, with a median of about 8.7 hours.
Teenagers: Teenagers also sleep 7 to 9 hours; female teenagers likewise have a median of about 8.7 hours.
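The figures above are read off the box plot, whose centre line is the median rather than the mean. A minimal sketch of a numeric table that reports both, plus group sizes, is shown here. `duration_summary` is a hypothetical helper, and the demo rows are made up for illustration.

```python
import pandas as pd

# Made-up rows for illustration only; on the real data pass `data` instead
demo = pd.DataFrame({
    "Agegroup": ["Senior", "Senior", "Senior", "Adult", "Adult"],
    "Gender": ["Male", "Male", "Female", "Female", "Female"],
    "Sleep duration": [7.5, 5.0, 7.5, 6.0, 8.0],
})


def duration_summary(df):
    """Median, mean and count of Sleep duration per age group and gender."""
    return (
        df.groupby(["Agegroup", "Gender"])["Sleep duration"]
        .agg(["median", "mean", "count"])
        .round(2)
    )


summary = duration_summary(demo)
print(summary)
```

The count column matters here: small groups (for example, kids or teenagers of one gender) can produce medians that shift a lot with a single subject.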
Is there a relationship between sleep efficiency and the amount of REM sleep experienced by the test subjects?
plt.figure(figsize=(8, 6))
plt.scatter(data['Sleep efficiency'], data['REM sleep percentage'], alpha=0.5)
plt.title('Relationship between Sleep Efficiency and REM Sleep')
plt.xlabel('Sleep Efficiency')
plt.ylabel('REM Sleep Percentage')
plt.show()
# Calculate the correlation coefficient
correlation_coef = np.corrcoef(data['Sleep efficiency'],data['REM sleep percentage'] )[0, 1]
print("Correlation Coefficient:", correlation_coef)
Correlation Coefficient: 0.062362454433546856
The Pearson correlation coefficient of about 0.06 indicates essentially no linear relationship between sleep efficiency and REM sleep percentage. With r² ≈ 0.004, the two variables share less than 1% of their variance, so a subject's REM sleep percentage tells us almost nothing about their sleep efficiency.
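np.corrcoef measures only linear association. A common complement is Spearman's rank correlation, which also picks up monotonic but non-linear relationships. The sketch below computes it as the Pearson coefficient of the ranks, which avoids a SciPy dependency. `pearson_and_spearman` is a hypothetical helper, and the toy data is made up to show a case where the two measures differ.

```python
import numpy as np
import pandas as pd


def pearson_and_spearman(x, y):
    """Pearson r on the raw values and Spearman rho computed as Pearson r on ranks."""
    x = pd.Series(x, dtype=float)
    y = pd.Series(y, dtype=float)
    pearson = x.corr(y)                 # Series.corr defaults to Pearson
    spearman = x.rank().corr(y.rank())  # average ranks handle ties
    return pearson, spearman


# A monotonic but non-linear toy relationship
x = np.arange(1, 11)
y = x ** 3
pearson, spearman = pearson_and_spearman(x, y)
print(round(pearson, 3), round(spearman, 3))
```

For the notebook's question, `pearson_and_spearman(data['Sleep efficiency'], data['REM sleep percentage'])` would show whether a monotonic relationship exists even though the linear one is negligible.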
What is the average bedtime and wakeup time for different age groups and genders?
data['bedtime_hours']=data['Bedtime'].dt.hour
data
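def change_bedtime(x):
    # Convert a 24-hour clock hour to a 12-hour clock hour (0 -> 12, 13-23 -> 1-11);
    # the original version returned None for 12 (noon)
    if x == 0:
        return 12
    elif x <= 12:
        return x
    else:
        return x - 12
data['bedtime_hours']=data['bedtime_hours'].apply(change_bedtime)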
data['bedtime_hours'].value_counts()
12    110
10     83
9      73
1      67
2      64
11     55
Name: bedtime_hours, dtype: int64
plt.figure(figsize=(10, 6))
sns.boxplot(x="Agegroup", y="bedtime_hours", hue="Gender", data=data)
plt.title('Bedtime by Age Group and Gender')
plt.xlabel('Age Group')
plt.ylabel('Bedtime hour (12-hour clock)')
plt.legend(bbox_to_anchor=(1, 1), loc='upper left')
plt.show()
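Note a caveat of this plot: change_bedtime() drops AM/PM, so on the y-axis 1 (1 AM) sits below 9 (9 PM) even though it is a later bedtime, and .dt.hour also discards the minutes. A minimal sketch of an ordered alternative expresses each bedtime as signed hours relative to midnight, so evening bedtimes are negative and early-morning ones positive. `bedtime_offset_hours`, `offset_to_clock`, and the `pivot_hour=12` cut-off are hypothetical choices, assuming nobody goes to bed between noon and midnight and intends it as the next day's sleep.

```python
import pandas as pd


def bedtime_offset_hours(bedtimes, pivot_hour=12):
    """Decimal hours relative to midnight that keep evening and early-morning bedtimes in order.

    Times before `pivot_hour` are treated as after midnight (positive values);
    later times are treated as the previous evening (negative values).
    E.g. 21:30 -> -2.5, 00:00 -> 0.0, 01:00 -> 1.0.
    """
    bedtimes = pd.to_datetime(pd.Series(bedtimes))
    hours = bedtimes.dt.hour + bedtimes.dt.minute / 60
    return hours.where(hours < pivot_hour, hours - 24)


def offset_to_clock(offset):
    """Turn an offset from bedtime_offset_hours back into an HH:MM string."""
    total_minutes = int(round(float(offset) * 60)) % (24 * 60)
    return f"{total_minutes // 60:02d}:{total_minutes % 60:02d}"


# Bedtimes taken from rows 0, 2 and 449 of the output above
offsets = bedtime_offset_hours(["2021-03-06 01:00:00", "2021-05-25 21:30:00", "2021-09-07 23:00:00"])
print(offsets.tolist())                 # [1.0, -2.5, -1.0]
print(offset_to_clock(offsets.mean()))  # 23:10
```

With `data['bedtime_offset'] = bedtime_offset_hours(data['Bedtime'])`, the same boxplot call with `y="bedtime_offset"` would order bedtimes chronologically, and group means or medians can be converted back to clock times with offset_to_clock.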
data['wakeuptime_hours']=data['Wakeup time'].dt.hour
data
| | ID | Age | Gender | Bedtime | Wakeup time | Sleep duration | Sleep efficiency | REM sleep percentage | Deep sleep percentage | Light sleep percentage | Awakenings | Caffeine consumption | Alcohol consumption | Smoking status | Exercise frequency | Agegroup | bedtime_hours | wakeuptime_hours |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 65 | Female | 2021-03-06 01:00:00 | 2021-03-06 07:00:00 | 6.0 | 0.88 | 18 | 70 | 12 | 0.0 | 0.0 | 0.0 | Yes | 3.0 | Senior | 1 | 7 |
1 | 2 | 69 | Male | 2021-12-05 02:00:00 | 2021-12-05 09:00:00 | 7.0 | 0.66 | 19 | 28 | 53 | 3.0 | 0.0 | 3.0 | Yes | 3.0 | Senior | 2 | 9 |
2 | 3 | 40 | Female | 2021-05-25 21:30:00 | 2021-05-25 05:30:00 | 8.0 | 0.89 | 20 | 70 | 10 | 1.0 | 0.0 | 0.0 | No | 3.0 | Adult | 9 | 5 |
3 | 4 | 40 | Female | 2021-11-03 02:30:00 | 2021-11-03 08:30:00 | 6.0 | 0.51 | 23 | 25 | 52 | 3.0 | 50.0 | 5.0 | Yes | 1.0 | Adult | 2 | 8 |
4 | 5 | 57 | Male | 2021-03-13 01:00:00 | 2021-03-13 09:00:00 | 8.0 | 0.76 | 27 | 55 | 18 | 3.0 | 0.0 | 3.0 | No | 3.0 | Middle | 1 | 9 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
447 | 448 | 27 | Female | 2021-11-13 22:00:00 | 2021-11-13 05:30:00 | 7.5 | 0.91 | 22 | 57 | 21 | 0.0 | 0.0 | 0.0 | No | 5.0 | Young Adult | 10 | 5 |
448 | 449 | 52 | Male | 2021-03-31 21:00:00 | 2021-03-31 03:00:00 | 6.0 | 0.74 | 28 | 57 | 15 | 4.0 | 25.0 | 0.0 | No | 3.0 | Middle | 9 | 3 |
449 | 450 | 40 | Female | 2021-09-07 23:00:00 | 2021-09-07 07:30:00 | 8.5 | 0.55 | 20 | 32 | 48 | 1.0 | 0.0 | 3.0 | Yes | 0.0 | Adult | 11 | 7 |
450 | 451 | 45 | Male | 2021-07-29 21:00:00 | 2021-07-29 04:00:00 | 7.0 | 0.76 | 18 | 72 | 10 | 3.0 | 0.0 | 0.0 | No | 3.0 | Middle | 9 | 4 |
451 | 452 | 18 | Male | 2021-03-17 02:30:00 | 2021-03-17 10:00:00 | 7.5 | 0.63 | 22 | 23 | 55 | 1.0 | 50.0 | 0.0 | No | 1.0 | Teenager | 2 | 10 |
452 rows × 18 columns
plt.figure(figsize=(10, 6))
sns.boxplot(x="Agegroup", y="wakeuptime_hours", hue="Gender", data=data)
plt.title('Wake-up Time by Age Group and Gender')
plt.xlabel('Age Group')
plt.ylabel('Wake-up hour (24-hour clock)')
plt.legend(bbox_to_anchor=(1, 1), loc='upper left')
plt.show()
Waking Times Across Different Age Groups and Genders:
Seniors: Most individuals in this age group wake up between 8 and 9 AM, suggesting a preference for later wake-up times.
Adults: Both males and females typically wake up between 5 AM and 8:30 AM, a fairly wide spread of wake-up times.
Middle-aged adults: Wake-up times range from 5:30 AM to 8:30 AM, a spread comparable to that of adults.
Young adults: Females wake up between 5:30 AM and 8 AM and males between 6 AM and 8 AM, a fairly consistent pattern across genders.
Kids: Girls generally wake up between 8 AM and 10 AM, later than most other groups.
Teenagers: Male teenagers typically wake up between 8 AM and 10 AM, also suggesting a preference for later mornings.
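The wake-up plot uses .dt.hour, which truncates 5:30 to 5, so the half-hour times quoted above come from reading the box edges by eye. A minimal sketch of an exact table keeps the minutes and reports the median wake-up time per group as HH:MM. `median_wakeup_clock` is a hypothetical helper, and the demo rows are copied from the output above.

```python
import pandas as pd


def median_wakeup_clock(df):
    """Median wake-up time per age group and gender, formatted as HH:MM.

    Uses hour + minute / 60 so half-hour wake-up times are kept instead of
    being truncated to the hour as .dt.hour does.
    """
    wake = pd.to_datetime(df["Wakeup time"])
    decimal_hours = wake.dt.hour + wake.dt.minute / 60
    medians = decimal_hours.groupby([df["Agegroup"], df["Gender"]]).median()

    def to_clock(hours):
        total_minutes = int(round(float(hours) * 60))
        return f"{total_minutes // 60:02d}:{total_minutes % 60:02d}"

    return medians.apply(to_clock)


# Wake-up times, age groups and genders copied from rows 0-3 and 449 of the output above
demo = pd.DataFrame({
    "Wakeup time": ["2021-03-06 07:00:00", "2021-12-05 09:00:00", "2021-05-25 05:30:00",
                    "2021-11-03 08:30:00", "2021-09-07 07:30:00"],
    "Agegroup": ["Senior", "Senior", "Adult", "Adult", "Adult"],
    "Gender": ["Female", "Male", "Female", "Female", "Female"],
})
result = median_wakeup_clock(demo)
print(result)
```

Calling `median_wakeup_clock(data)` on the full dataset would give one exact median per age group and gender.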
Do individuals with higher caffeine consumption experience more frequent awakenings during the night?
sns.countplot(x=data['Caffeine consumption'],hue=data['Awakenings'])
plt.title('Effect of Caffeine consumption on Awakenings during night')
plt.show()
Effect of Caffeine Consumption on Nighttime Awakenings:
The count plot suggests that, at every caffeine level, most individuals wake up only once during the night. This hints that caffeine intake may not be strongly linked to nighttime awakenings. However, the plot shows raw counts and the caffeine groups differ greatly in size (236 subjects at a caffeine consumption of 0 after imputation, versus 4 at 200), so proportions within each caffeine level give a fairer comparison. Mode imputation also added 20 values to the one-awakening category and 25 to the zero-caffeine category, which can exaggerate this pattern. Factors other than caffeine may be worth investigating as influences on sleep disturbances.
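Because the caffeine groups have very different sizes, a row-normalised cross-tabulation compares them on equal footing: each row gives the share of subjects at each Awakenings count within one caffeine level. `awakening_share_by_caffeine` is a hypothetical helper, and the demo rows are made up for illustration.

```python
import pandas as pd


def awakening_share_by_caffeine(df):
    """Share of subjects at each Awakenings count within each caffeine level (rows sum to 1)."""
    return pd.crosstab(df["Caffeine consumption"], df["Awakenings"], normalize="index").round(2)


# Made-up rows for illustration only
demo = pd.DataFrame({
    "Caffeine consumption": [0.0, 0.0, 0.0, 0.0, 50.0, 50.0],
    "Awakenings": [1.0, 1.0, 0.0, 3.0, 1.0, 3.0],
})
shares = awakening_share_by_caffeine(demo)
print(shares)

# Mean number of awakenings per caffeine level, as a single summary number
print(demo.groupby("Caffeine consumption")["Awakenings"].mean())
```

On the real data, `awakening_share_by_caffeine(data)` could be drawn with `sns.heatmap` or a stacked bar chart. Levels with very few subjects (such as 100 or 200) will still give noisy shares.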
How does exercise frequency impact the overall sleep quality and duration of the test subjects?
sns.barplot(x='Exercise frequency', y='Sleep duration', data=data)
plt.title('Exercise Frequency vs. Sleep Duration')
plt.show()
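The question asks about both sleep quality and duration, but the bar plot covers only duration. By default, sns.barplot draws the mean of each group, so a small table of group means reproduces the bar heights and can add a quality measure alongside. The sketch below uses Sleep efficiency as a proxy for quality, which is an assumption. It also reports group sizes, because before imputation only 8 subjects reported exercising 5 times a week. `sleep_by_exercise` is a hypothetical helper, and the demo rows are made up.

```python
import pandas as pd


def sleep_by_exercise(df):
    """Mean Sleep duration and Sleep efficiency per Exercise frequency, with group sizes."""
    return df.groupby("Exercise frequency").agg(
        mean_duration=("Sleep duration", "mean"),
        mean_efficiency=("Sleep efficiency", "mean"),
        subjects=("Sleep duration", "count"),
    )


# Made-up rows for illustration only; on the real data call sleep_by_exercise(data)
demo = pd.DataFrame({
    "Exercise frequency": [0.0, 0.0, 3.0, 3.0],
    "Sleep duration": [6.0, 7.0, 7.5, 8.5],
    "Sleep efficiency": [0.60, 0.70, 0.85, 0.90],
})
table = sleep_by_exercise(demo)
print(table)
```

A matching plot for quality would be `sns.barplot(x='Exercise frequency', y='Sleep efficiency', data=data)`.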