Data Analysis Project by Anushka Sharma, Team edSlash.
This dataset captures detailed information about customer transactions, providing a comprehensive view of purchasing behavior across various demographics, regions, and product categories. It includes features such as customer demographics (age, gender, location), product details (item purchased, category, size, color, and season), and transaction specifics (purchase amount, payment method, shipping type, and discounts applied).
👉 Ignore this step if the libraries are already installed.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv(r"C:\Users\harsh\Downloads\shopping_trends.csv")  # local path on the author's machine; change it to where your copy of the CSV is saved
df.head()
| | Customer ID | Age | Gender | Item Purchased | Category | Purchase Amount (USD) | Location | Size | Color | Season | Review Rating | Subscription Status | Payment Method | Shipping Type | Discount Applied | Promo Code Used | Previous Purchases | Preferred Payment Method | Frequency of Purchases |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 55 | Male | Blouse | Clothing | 53 | Kentucky | L | Gray | Winter | 3.1 | Yes | Credit Card | Express | Yes | Yes | 14 | Venmo | Fortnightly |
| 1 | 2 | 19 | Male | Sweater | Clothing | 64 | Maine | L | Maroon | Winter | 3.1 | Yes | Bank Transfer | Express | Yes | Yes | 2 | Cash | Fortnightly |
| 2 | 3 | 50 | Male | Jeans | Clothing | 73 | Massachusetts | S | Maroon | Spring | 3.1 | Yes | Cash | Free Shipping | Yes | Yes | 23 | Credit Card | Weekly |
| 3 | 4 | 21 | Male | Sandals | Footwear | 90 | Rhode Island | M | Maroon | Spring | 3.5 | Yes | PayPal | Next Day Air | Yes | Yes | 49 | PayPal | Weekly |
| 4 | 5 | 45 | Male | Blouse | Clothing | 49 | Oregon | M | Turquoise | Spring | 2.7 | Yes | Cash | Free Shipping | Yes | Yes | 31 | PayPal | Annually |
df.describe()
| | Customer ID | Age | Purchase Amount (USD) | Review Rating | Previous Purchases |
|---|---|---|---|---|---|
| count | 3900.000000 | 3900.000000 | 3900.000000 | 3900.000000 | 3900.000000 |
| mean | 1950.500000 | 44.068462 | 59.764359 | 3.749949 | 25.351538 |
| std | 1125.977353 | 15.207589 | 23.685392 | 0.716223 | 14.447125 |
| min | 1.000000 | 18.000000 | 20.000000 | 2.500000 | 1.000000 |
| 25% | 975.750000 | 31.000000 | 39.000000 | 3.100000 | 13.000000 |
| 50% | 1950.500000 | 44.000000 | 60.000000 | 3.700000 | 25.000000 |
| 75% | 2925.250000 | 57.000000 | 81.000000 | 4.400000 | 38.000000 |
| max | 3900.000000 | 70.000000 | 100.000000 | 5.000000 | 50.000000 |
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3900 entries, 0 to 3899
Data columns (total 19 columns):
| # | Column | Non-Null Count | Dtype |
|---|---|---|---|
| 0 | Customer ID | 3900 non-null | int64 |
| 1 | Age | 3900 non-null | int64 |
| 2 | Gender | 3900 non-null | object |
| 3 | Item Purchased | 3900 non-null | object |
| 4 | Category | 3900 non-null | object |
| 5 | Purchase Amount (USD) | 3900 non-null | int64 |
| 6 | Location | 3900 non-null | object |
| 7 | Size | 3900 non-null | object |
| 8 | Color | 3900 non-null | object |
| 9 | Season | 3900 non-null | object |
| 10 | Review Rating | 3900 non-null | float64 |
| 11 | Subscription Status | 3900 non-null | object |
| 12 | Payment Method | 3900 non-null | object |
| 13 | Shipping Type | 3900 non-null | object |
| 14 | Discount Applied | 3900 non-null | object |
| 15 | Promo Code Used | 3900 non-null | object |
| 16 | Previous Purchases | 3900 non-null | int64 |
| 17 | Preferred Payment Method | 3900 non-null | object |
| 18 | Frequency of Purchases | 3900 non-null | object |
dtypes: float64(1), int64(4), object(14)
memory usage: 579.0+ KB
All 19 columns report 3900 non-null values, so the dataset has no missing values.
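df.info() shows non-null counts, but an explicit count of missing values per column makes the check easier to read at a glance. The sketch below wraps isna().sum() in a small helper; the name missing_report and the toy DataFrame are illustrative assumptions, not part of the original notebook.

```python
import numpy as np
import pandas as pd


def missing_report(frame: pd.DataFrame) -> pd.Series:
    """Return the number of missing values per column, keeping only columns that have any."""
    counts = frame.isna().sum()
    return counts[counts > 0]


# Hypothetical toy data with one missing Age value (not taken from the dataset)
toy = pd.DataFrame({"Age": [55, 19, np.nan], "Gender": ["Male", "Male", "Female"]})
print(missing_report(toy))
```

On the shopping data, missing_report(df).empty would be expected to be True, which agrees with the info() output above.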
df.dtypes
Customer ID                   int64
Age                           int64
Gender                       object
Item Purchased               object
Category                     object
Purchase Amount (USD)         int64
Location                     object
Size                         object
Color                        object
Season                       object
Review Rating               float64
Subscription Status          object
Payment Method               object
Shipping Type                object
Discount Applied             object
Promo Code Used              object
Previous Purchases            int64
Preferred Payment Method     object
Frequency of Purchases       object
dtype: object
sns.displot(x=df.Age, kde=True, bins=28)
[Plot: histogram of Age with KDE curve, 28 bins]
def agegrp(data):
    # Map a numeric age to an age-group label
    if data < 20:
        return "below 20"
    elif 20 <= data < 30:
        return "20-30"
    elif 30 <= data < 40:
        return "30-40"
    elif 40 <= data < 50:
        return "40-50"
    elif 50 <= data < 60:
        return "50-60"
    elif 60 <= data <= 70:
        return "60-70"
    else:
        # Ages above 70 (none in this dataset) or missing values
        return np.nan
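Applying agegrp row by row works, but pandas can bin a whole column at once with pd.cut. The sketch below is an alternative, not the author's method, and assumes integer ages (as the int64 dtype suggests): the last edge is 71 with right=False so that age 70 still lands in "60-70", matching agegrp. The helper name age_group_vectorized is hypothetical.

```python
import numpy as np
import pandas as pd

# Left-closed bins: [-inf, 20), [20, 30), ..., [60, 71). Assumes integer ages,
# so the right edge 71 keeps age 70 in "60-70" like agegrp does.
AGE_BINS = [-np.inf, 20, 30, 40, 50, 60, 71]
AGE_LABELS = ["below 20", "20-30", "30-40", "40-50", "50-60", "60-70"]


def age_group_vectorized(ages: pd.Series) -> pd.Series:
    """Hypothetical vectorized counterpart of agegrp; ages outside the bins become NaN."""
    return pd.cut(ages, bins=AGE_BINS, labels=AGE_LABELS, right=False)


sample = pd.Series([18, 20, 29, 30, 55, 70, 71])
print(age_group_vectorized(sample).tolist())
```

Used as d1["age group"] = age_group_vectorized(d1.Age), the result is an ordered categorical column, so seaborn plots list the groups in age order rather than in order of first appearance.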
d1 = df.copy()
d1["age group"] = d1.Age.apply(agegrp)
d1
| | Customer ID | Age | Gender | Item Purchased | Category | Purchase Amount (USD) | Location | Size | Color | Season | Review Rating | Subscription Status | Payment Method | Shipping Type | Discount Applied | Promo Code Used | Previous Purchases | Preferred Payment Method | Frequency of Purchases | age group |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 55 | Male | Blouse | Clothing | 53 | Kentucky | L | Gray | Winter | 3.1 | Yes | Credit Card | Express | Yes | Yes | 14 | Venmo | Fortnightly | 50-60 |
| 1 | 2 | 19 | Male | Sweater | Clothing | 64 | Maine | L | Maroon | Winter | 3.1 | Yes | Bank Transfer | Express | Yes | Yes | 2 | Cash | Fortnightly | below 20 |
| 2 | 3 | 50 | Male | Jeans | Clothing | 73 | Massachusetts | S | Maroon | Spring | 3.1 | Yes | Cash | Free Shipping | Yes | Yes | 23 | Credit Card | Weekly | 50-60 |
| 3 | 4 | 21 | Male | Sandals | Footwear | 90 | Rhode Island | M | Maroon | Spring | 3.5 | Yes | PayPal | Next Day Air | Yes | Yes | 49 | PayPal | Weekly | 20-30 |
| 4 | 5 | 45 | Male | Blouse | Clothing | 49 | Oregon | M | Turquoise | Spring | 2.7 | Yes | Cash | Free Shipping | Yes | Yes | 31 | PayPal | Annually | 40-50 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3895 | 3896 | 40 | Female | Hoodie | Clothing | 28 | Virginia | L | Turquoise | Summer | 4.2 | No | Cash | 2-Day Shipping | No | No | 32 | Venmo | Weekly | 40-50 |
| 3896 | 3897 | 52 | Female | Backpack | Accessories | 49 | Iowa | L | White | Spring | 4.5 | No | PayPal | Store Pickup | No | No | 41 | Bank Transfer | Bi-Weekly | 50-60 |
| 3897 | 3898 | 46 | Female | Belt | Accessories | 33 | New Jersey | L | Green | Spring | 2.9 | No | Credit Card | Standard | No | No | 24 | Venmo | Quarterly | 40-50 |
| 3898 | 3899 | 44 | Female | Shoes | Footwear | 77 | Minnesota | S | Brown | Summer | 3.8 | No | PayPal | Express | No | No | 24 | Venmo | Weekly | 40-50 |
| 3899 | 3900 | 52 | Female | Handbag | Accessories | 81 | California | M | Beige | Spring | 3.1 | No | Bank Transfer | Store Pickup | No | No | 33 | Venmo | Quarterly | 50-60 |
3900 rows × 20 columns
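sns.countplot(x=d1["age group"], order=["below 20", "20-30", "30-40", "40-50", "50-60", "60-70"])
[Plot: number of purchases per age group]
Purchases come from every age group, so people of all ages shop regularly. The "below 20" group covers only ages 18 and 19 (the youngest customer is 18), so its bar is expected to be shorter than those of the 10-year groups.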
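To compare age groups with exact numbers rather than bar heights, the counts can be listed in a fixed order. The helper counts_in_order and the AGE_ORDER list below are hypothetical names; the labels are the ones agegrp returns.

```python
import pandas as pd

# Age-group labels in their natural order (same labels as agegrp)
AGE_ORDER = ["below 20", "20-30", "30-40", "40-50", "50-60", "60-70"]


def counts_in_order(groups: pd.Series) -> pd.Series:
    """Count rows per age group, listing every group in AGE_ORDER (0 if a group is absent)."""
    return groups.value_counts().reindex(AGE_ORDER, fill_value=0)


# Hypothetical toy labels, not taken from the dataset
toy_groups = pd.Series(["20-30", "below 20", "20-30", "60-70"])
print(counts_in_order(toy_groups))
```

On the notebook data this would be called as counts_in_order(d1["age group"]); the same AGE_ORDER list can be passed to sns.countplot through its order parameter so the table and the plot line up.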
sns.countplot(x=d1.Gender)
[Plot: count of purchases by Gender]
Male customers made more purchases than female customers.
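A count plot shows which group is larger but not by how much. A small table of counts and percentage shares makes the gap explicit; the helper share_by and the toy rows below are illustrative assumptions.

```python
import pandas as pd


def share_by(frame: pd.DataFrame, column: str) -> pd.DataFrame:
    """Count rows per value of `column` and express each count as a percentage of all counted rows."""
    counts = frame[column].value_counts()
    shares = (counts / counts.sum() * 100).round(1)
    return pd.DataFrame({"count": counts, "share_%": shares})


# Hypothetical toy rows, not taken from the dataset
toy = pd.DataFrame({"Gender": ["Male", "Male", "Female"]})
print(share_by(toy, "Gender"))
```

With the notebook data, share_by(d1, "Gender") would give the exact male and female counts behind the plot; the same helper works for any categorical column.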
plt.figure(figsize=(10,6))
sns.countplot(x=d1.Location)
plt.xticks(rotation=90)
plt.show()
Montana has the highest number of purchases and Rhode Island the lowest.
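With around 50 bars on the x-axis, picking the highest and lowest by eye is error-prone. The sketch below reads them off value_counts() instead; top_and_bottom is a hypothetical helper and the state names in the toy data are placeholders.

```python
import pandas as pd


def top_and_bottom(frame: pd.DataFrame, column: str) -> tuple:
    """Return the most and least frequent values of `column`; on ties only one label is returned."""
    counts = frame[column].value_counts()
    return counts.idxmax(), counts.idxmin()


# Hypothetical toy rows with placeholder locations
toy = pd.DataFrame({"Location": ["State A", "State B", "State A", "State C", "State A", "State B"]})
print(top_and_bottom(toy, "Location"))
```

Because several states may share the same count, it is worth also looking at d1.Location.value_counts().head() and .tail(). Passing order=d1.Location.value_counts().index to sns.countplot sorts the bars so the extremes sit at either end of the plot.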
sns.countplot(x=d1.Category, hue=d1.Gender)
[Plot: count of purchases per Category, split by Gender]
Clothing is the most frequently purchased category.
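The grouped count plot can be backed by a cross-tabulation that gives the exact number of purchases for every category and gender, plus row and column totals. This is a sketch using pd.crosstab with margins=True; category_by_gender and the toy rows are hypothetical.

```python
import pandas as pd


def category_by_gender(frame: pd.DataFrame) -> pd.DataFrame:
    """Cross-tabulate purchases by Category (rows) and Gender (columns) with 'All' totals."""
    return pd.crosstab(frame["Category"], frame["Gender"], margins=True)


# Hypothetical toy rows, not taken from the dataset
toy = pd.DataFrame({
    "Category": ["Clothing", "Clothing", "Footwear", "Accessories", "Clothing"],
    "Gender": ["Male", "Female", "Male", "Male", "Female"],
})
table = category_by_gender(toy)
# Drop the grand-total row before picking the largest category
most_bought = table["All"].drop("All").idxmax()
print(table)
print(most_bought)
```

On the notebook data, category_by_gender(d1) would show whether Clothing leads for both genders or only overall.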
meanprice = d1.groupby("Category")["Purchase Amount (USD)"].mean()  # mean purchase amount per category
meanprice
Category
Accessories    59.838710
Clothing       60.025331
Footwear       60.255426
Outerwear      57.172840
Name: Purchase Amount (USD), dtype: float64
totalpurchase = d1.groupby("Category")["Purchase Amount (USD)"].sum()  # total purchase amount per category
totalpurchase
Category
Accessories     74200
Clothing       104264
Footwear        36093
Outerwear       18524
Name: Purchase Amount (USD), dtype: int64
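From these two outputs, Clothing has the highest total sales (104,264 USD). Footwear, however, has the highest mean purchase amount (about 60.26 USD per purchase), so an average footwear purchase is slightly more expensive than an average purchase in the other categories. The difference is small: the mean purchase amounts of all four categories lie between about 57.17 USD (Outerwear) and 60.26 USD (Footwear).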
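Because total = count × mean for each category, the two outputs above also determine how many purchases fell into each category. The short calculation below recovers those counts from the printed figures only, so it is a consistency check rather than a new query on the data.

```python
# Totals and means copied from the two groupby outputs above
totals = {"Accessories": 74200, "Clothing": 104264, "Footwear": 36093, "Outerwear": 18524}
means = {"Accessories": 59.838710, "Clothing": 60.025331, "Footwear": 60.255426, "Outerwear": 57.172840}

# Number of purchases per category = total amount / mean amount
counts = {cat: round(totals[cat] / means[cat]) for cat in totals}
print(counts)                # {'Accessories': 1240, 'Clothing': 1737, 'Footwear': 599, 'Outerwear': 324}
print(sum(counts.values()))  # 3900
```

The recovered counts sum to 3900, the number of rows in the dataset, and Clothing (1737 purchases) is indeed the most frequent category. In the notebook itself, d1.groupby("Category")["Purchase Amount (USD)"].agg(["count", "sum", "mean"]) would return all three figures in one table.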
sns.countplot(x=d1["Frequency of Purchases"])
plt.xticks(rotation=35)
plt.show()