About Dataset¶
The "vgsales.csv" dataset (Video Game Sales Dataset) records video game sales by platform, genre, publisher, and region. This notebook covers data cleaning, genre and platform analysis, regional sales distributions, top games and publishers, and trends over time. The dataset has no critic scores or review text (see the data dictionary below), so the effect of critic scores on sales and sentiment analysis are outside the scope of this notebook.
Data Dictionary¶
- Rank: Ranking of the game by global sales.
- Name: Game title.
- Platform: Gaming console.
- Year: Release year.
- Genre: Game category.
- Publisher: Game's publisher.
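- NA_Sales: North American sales (in millions of units).
- EU_Sales: European sales (in millions of units).
- JP_Sales: Japanese sales (in millions of units).
- Other_Sales: Sales in regions outside NA, EU, and JP (in millions of units).
- Global_Sales: Worldwide total sales (in millions of units).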
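A quick way to confirm that Global_Sales is the sum of the four regional columns is to compare them row by row. The sketch below hard-codes the first five rows of the dataset rather than reading the CSV; the variable names `sample`, `regional_cols`, and `max_gap` are illustrative.

```python
import pandas as pd

# First five rows of vgsales.csv, hard-coded so the check runs without the file.
sample = pd.DataFrame({
    "Name": ["Wii Sports", "Super Mario Bros.", "Mario Kart Wii",
             "Wii Sports Resort", "Pokemon Red/Pokemon Blue"],
    "NA_Sales": [41.49, 29.08, 15.85, 15.75, 11.27],
    "EU_Sales": [29.02, 3.58, 12.88, 11.01, 8.89],
    "JP_Sales": [3.77, 6.81, 3.79, 3.28, 10.22],
    "Other_Sales": [8.46, 0.77, 3.31, 2.96, 1.00],
    "Global_Sales": [82.74, 40.24, 35.82, 33.00, 31.37],
})

regional_cols = ["NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales"]

# Sum the regional columns per row and measure the largest gap to Global_Sales.
regional_sum = sample[regional_cols].sum(axis=1)
max_gap = (regional_sum - sample["Global_Sales"]).abs().max()
print(f"Largest gap between regional sum and Global_Sales: {max_gap:.2f}")
```

For these rows the gap stays within about 0.01, which is consistent with each column being rounded to two decimals independently.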
Installing dependency¶
👉 Skip this step if the packages are already installed
1. !pip install numpy
2. !pip install pandas
3. !pip install matplotlib
4. !pip install seaborn
Step-1 Data Preprocessing and Data Cleaning¶
Import All Required Libraries¶
# Numerical operations
import numpy as np
# Data manipulation
import pandas as pd
#Data Visualization
import matplotlib.pyplot as plt
import seaborn as sns
# Remove warnings
import warnings
warnings.filterwarnings('ignore')
# Load the Dataset
df = pd.read_csv('vgsales.csv')
df
| | Rank | Name | Platform | Year | Genre | Publisher | NA_Sales | EU_Sales | JP_Sales | Other_Sales | Global_Sales |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | Wii Sports | Wii | 2006.0 | Sports | Nintendo | 41.49 | 29.02 | 3.77 | 8.46 | 82.74 |
1 | 2 | Super Mario Bros. | NES | 1985.0 | Platform | Nintendo | 29.08 | 3.58 | 6.81 | 0.77 | 40.24 |
2 | 3 | Mario Kart Wii | Wii | 2008.0 | Racing | Nintendo | 15.85 | 12.88 | 3.79 | 3.31 | 35.82 |
3 | 4 | Wii Sports Resort | Wii | 2009.0 | Sports | Nintendo | 15.75 | 11.01 | 3.28 | 2.96 | 33.00 |
4 | 5 | Pokemon Red/Pokemon Blue | GB | 1996.0 | Role-Playing | Nintendo | 11.27 | 8.89 | 10.22 | 1.00 | 31.37 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
16593 | 16596 | Woody Woodpecker in Crazy Castle 5 | GBA | 2002.0 | Platform | Kemco | 0.01 | 0.00 | 0.00 | 0.00 | 0.01 |
16594 | 16597 | Men in Black II: Alien Escape | GC | 2003.0 | Shooter | Infogrames | 0.01 | 0.00 | 0.00 | 0.00 | 0.01 |
16595 | 16598 | SCORE International Baja 1000: The Official Game | PS2 | 2008.0 | Racing | Activision | 0.00 | 0.00 | 0.00 | 0.00 | 0.01 |
16596 | 16599 | Know How 2 | DS | 2010.0 | Puzzle | 7G//AMES | 0.00 | 0.01 | 0.00 | 0.00 | 0.01 |
16597 | 16600 | Spirits & Spells | GBA | 2003.0 | Platform | Wanadoo | 0.01 | 0.00 | 0.00 | 0.00 | 0.01 |
16598 rows × 11 columns
# Shape of Dataset
df.shape
(16598, 11)
The dataset is fairly large: it contains 16598 rows and 11 columns.
# getting info
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16598 entries, 0 to 16597
Data columns (total 11 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   Rank          16598 non-null  int64
 1   Name          16598 non-null  object
 2   Platform      16598 non-null  object
 3   Year          16327 non-null  float64
 4   Genre         16598 non-null  object
 5   Publisher     16540 non-null  object
 6   NA_Sales      16598 non-null  float64
 7   EU_Sales      16598 non-null  float64
 8   JP_Sales      16598 non-null  float64
 9   Other_Sales   16598 non-null  float64
 10  Global_Sales  16598 non-null  float64
dtypes: float64(6), int64(1), object(4)
memory usage: 1.4+ MB
The output above shows that the dataset has 1 int column, 4 object columns, and 6 float columns.
# Count fully duplicated rows
df.duplicated().sum()
0
There are no duplicate rows in the dataset.
# Finding null values
df.isnull().sum()
Rank              0
Name              0
Platform          0
Year            271
Genre             0
Publisher        58
NA_Sales          0
EU_Sales          0
JP_Sales          0
Other_Sales       0
Global_Sales      0
dtype: int64
There are 271 missing values in the Year column and 58 in the Publisher column.
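To judge how much data is affected, the missing counts can be expressed as a share of all rows. The sketch below uses only the numbers reported above; on the real DataFrame, `df.isnull().mean() * 100` should give the same figures.

```python
# Missing-value counts and row total as reported by df.isnull().sum() and df.shape.
total_rows = 16598
missing = {"Year": 271, "Publisher": 58}

# Percentage of rows with a missing value, per column.
missing_pct = {col: round(100 * n / total_rows, 2) for col, n in missing.items()}
print(missing_pct)
```

Both shares are small (under 2% of rows), which is worth keeping in mind when choosing between imputing and dropping.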
Handling the null values of Year Column
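# Fill missing years with the column mean, round up, and store as integers.
# Assigning the result back avoids chained in-place modification, which newer pandas versions warn about.
df['Year'] = df['Year'].fillna(df['Year'].mean())
df['Year'] = np.ceil(df['Year']).astype(int)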
df
| | Rank | Name | Platform | Year | Genre | Publisher | NA_Sales | EU_Sales | JP_Sales | Other_Sales | Global_Sales |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | Wii Sports | Wii | 2006 | Sports | Nintendo | 41.49 | 29.02 | 3.77 | 8.46 | 82.74 |
1 | 2 | Super Mario Bros. | NES | 1985 | Platform | Nintendo | 29.08 | 3.58 | 6.81 | 0.77 | 40.24 |
2 | 3 | Mario Kart Wii | Wii | 2008 | Racing | Nintendo | 15.85 | 12.88 | 3.79 | 3.31 | 35.82 |
3 | 4 | Wii Sports Resort | Wii | 2009 | Sports | Nintendo | 15.75 | 11.01 | 3.28 | 2.96 | 33.00 |
4 | 5 | Pokemon Red/Pokemon Blue | GB | 1996 | Role-Playing | Nintendo | 11.27 | 8.89 | 10.22 | 1.00 | 31.37 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
16593 | 16596 | Woody Woodpecker in Crazy Castle 5 | GBA | 2002 | Platform | Kemco | 0.01 | 0.00 | 0.00 | 0.00 | 0.01 |
16594 | 16597 | Men in Black II: Alien Escape | GC | 2003 | Shooter | Infogrames | 0.01 | 0.00 | 0.00 | 0.00 | 0.01 |
16595 | 16598 | SCORE International Baja 1000: The Official Game | PS2 | 2008 | Racing | Activision | 0.00 | 0.00 | 0.00 | 0.00 | 0.01 |
16596 | 16599 | Know How 2 | DS | 2010 | Puzzle | 7G//AMES | 0.00 | 0.01 | 0.00 | 0.00 | 0.01 |
16597 | 16600 | Spirits & Spells | GBA | 2003 | Platform | Wanadoo | 0.01 | 0.00 | 0.00 | 0.00 | 0.01 |
16598 rows × 11 columns
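Filling every missing year with the rounded-up mean puts all imputed rows into one single year, which can distort later counts per year. The toy example below (hypothetical values, not rows from vgsales.csv) shows the effect, together with one possible alternative: keeping the gaps and using pandas' nullable `Int64` dtype.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values), not rows from vgsales.csv.
years = pd.Series([2006, 2008, 2008, np.nan, np.nan, 2009], name="Year")

# Same strategy as the notebook: fill with the mean, rounded up, then cast to int.
fill_value = int(np.ceil(years.mean()))
filled = years.fillna(fill_value).astype(int)
counts_after = filled.value_counts()

# Alternative: keep missing years as <NA> with the nullable integer dtype.
nullable = years.astype("Int64")

print("Fill value:", fill_value)
print("Counts after imputation:", counts_after.to_dict())
print("Missing years kept as <NA>:", int(nullable.isna().sum()))
```

In the toy data the fill year doubles its count after imputation, so any "busiest year" reading should be checked against the non-imputed rows.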
Handling the null values of Publisher Column
df.Publisher.unique()
array(['Nintendo', 'Microsoft Game Studios', 'Take-Two Interactive', 'Sony Computer Entertainment', 'Activision', 'Ubisoft', 'Bethesda Softworks', 'Electronic Arts', 'Sega', 'SquareSoft', 'Atari', '505 Games', 'Capcom', 'GT Interactive', 'Konami Digital Entertainment', 'Sony Computer Entertainment Europe', 'Square Enix', 'LucasArts', 'Virgin Interactive', 'Warner Bros. Interactive Entertainment', 'Universal Interactive', 'Eidos Interactive', 'RedOctane', 'Vivendi Games', 'Enix Corporation', 'Namco Bandai Games', 'Palcom', 'Hasbro Interactive', 'THQ', 'Fox Interactive', 'Acclaim Entertainment', 'MTV Games', 'Disney Interactive Studios', nan, 'Majesco Entertainment', 'Codemasters', 'Red Orb', 'Level 5', 'Arena Entertainment', 'Midway Games', 'JVC', 'Deep Silver', '989 Studios', 'NCSoft', 'UEP Systems', 'Parker Bros.', 'Maxis', 'Imagic', 'Tecmo Koei', 'Valve Software', 'ASCII Entertainment', 'Mindscape', 'Infogrames', 'Unknown', 'Square', 'Valve', 'Activision Value', 'Banpresto', 'D3Publisher', 'Oxygen Interactive', 'Red Storm Entertainment', 'Video System', 'Hello Games', 'Global Star', 'Gotham Games', 'Westwood Studios', 'GungHo', 'Crave Entertainment', 'Hudson Soft', 'Coleco', 'Rising Star Games', 'Atlus', 'TDK Mediactive', 'ASC Games', 'Zoo Games', 'Accolade', 'Sony Online Entertainment', '3DO', 'RTL', 'Natsume', 'Focus Home Interactive', 'Alchemist', 'Black Label Games', 'SouthPeak Games', 'Mastertronic', 'Ocean', 'Zoo Digital Publishing', 'Psygnosis', 'City Interactive', 'Empire Interactive', 'Success', 'Compile', 'Russel', 'Taito', 'Agetec', 'GSP', 'Microprose', 'Play It', 'Slightly Mad Studios', 'Tomy Corporation', 'Sammy Corporation', 'Koch Media', 'Game Factory', 'Titus', 'Marvelous Entertainment', 'Genki', 'Mojang', 'Pinnacle', 'CTO SpA', 'TalonSoft', 'Crystal Dynamics', 'SCi', 'Quelle', 'mixi, Inc', 'Rage Software', 'Ubisoft Annecy', 'Scholastic Inc.', 'Interplay', 'Mystique', 'ChunSoft', 'Square EA', '20th Century Fox Video Games', 'Avanquest Software', 
'Hudson Entertainment', 'Nordic Games', 'Men-A-Vision', 'Nobilis', 'Big Ben Interactive', 'Touchstone', 'Spike', 'Jester Interactive', 'Nippon Ichi Software', 'LEGO Media', 'Quest', 'Illusion Softworks', 'Tigervision', 'Funbox Media', 'Rocket Company', 'Metro 3D', 'Mattel Interactive', 'IE Institute', 'Rondomedia', 'Sony Computer Entertainment America', 'Universal Gamex', 'Ghostlight', 'Wizard Video Games', 'BMG Interactive Entertainment', 'PQube', 'Trion Worlds', 'Laguna', 'Ignition Entertainment', 'Takara', 'Kadokawa Shoten', 'Destineer', 'Enterbrain', 'Xseed Games', 'Imagineer', 'System 3 Arcade Software', 'CPG Products', 'Aruze Corp', 'Gamebridge', 'Midas Interactive Entertainment', 'Jaleco', 'Answer Software', 'XS Games', 'Activision Blizzard', 'Pack In Soft', 'Rebellion', 'Xplosiv', 'Ultravision', 'GameMill Entertainment', 'Wanadoo', 'NovaLogic', 'Telltale Games', 'Epoch', 'BAM! Entertainment', 'Knowledge Adventure', 'Mastiff', 'Tetris Online', 'Harmonix Music Systems', 'ESP', 'TYO', 'Telegames', 'Mud Duck Productions', 'Screenlife', 'Pioneer LDC', 'Magical Company', 'Mentor Interactive', 'Kemco', 'Human Entertainment', 'Avanquest', 'Data Age', 'Electronic Arts Victor', 'Black Bean Games', 'Jack of All Games', '989 Sports', 'Takara Tomy', 'Media Rings', 'Elf', 'Kalypso Media', 'Starfish', 'Zushi Games', 'Jorudan', 'Destination Software, Inc', 'New', 'Brash Entertainment', 'ITT Family Games', 'PopCap Games', 'Home Entertainment Suppliers', 'Ackkstudios', 'Starpath Corp.', 'P2 Games', 'BPS', 'Gathering of Developers', 'NewKidCo', 'Storm City Games', 'CokeM Interactive', 'CBS Electronics', 'Magix', 'Marvelous Interactive', 'Nihon Falcom Corporation', 'Wargaming.net', 'Angel Studios', 'Arc System Works', 'Playmates', 'SNK Playmore', 'Hamster Corporation', 'From Software', 'Nippon Columbia', 'Nichibutsu', 'Little Orbit', 'Conspiracy Entertainment', 'DTP Entertainment', 'Hect', 'Mumbo Jumbo', 'Pacific Century Cyber Works', 'Indie Games', 'Liquid Games', 'NEC', 
'Axela', 'ArtDink', 'Sunsoft', 'Gust', 'SNK', 'NEC Interchannel', 'FuRyu', 'Xing Entertainment', 'ValuSoft', 'Victor Interactive', 'Detn8 Games', 'American Softworks', 'Nordcurrent', 'Bomb', 'Falcom Corporation', 'AQ Interactive', 'CCP', 'Milestone S.r.l.', 'Sears', 'JoWood Productions', 'Seta Corporation', 'On Demand', 'NCS', 'Aspyr', 'Gremlin Interactive Ltd', 'Agatsuma Entertainment', 'Compile Heart', 'Culture Brain', 'Mad Catz', 'Shogakukan', 'Merscom LLC', 'Rebellion Developments', 'Nippon Telenet', 'TDK Core', 'bitComposer Games', 'Foreign Media Games', 'Astragon', 'SSI', 'Kadokawa Games', 'Idea Factory', 'Performance Designed Products', 'Asylum Entertainment', 'Core Design Ltd.', 'PlayV', 'UFO Interactive', 'Idea Factory International', 'Playlogic Game Factory', 'Essential Games', 'Adeline Software', 'Funcom', 'Panther Software', 'Blast! Entertainment Ltd', 'Game Life', 'DSI Games', 'Avalon Interactive', 'Popcorn Arcade', 'Neko Entertainment', 'Vir2L Studios', 'Aques', 'Syscom', 'White Park Bay Software', 'System 3', 'Vatical Entertainment', 'Daedalic', 'EA Games', 'Media Factory', 'Vic Tokai', 'The Adventure Company', 'Game Arts', 'Broccoli', 'Acquire', 'General Entertainment', 'Excalibur Publishing', 'Imadio', 'Swing! Entertainment', 'Sony Music Entertainment', 'Aqua Plus', 'Paradox Interactive', 'Hip Interactive', 'DreamCatcher Interactive', 'Tripwire Interactive', 'Sting', 'Yacht Club Games', 'SCS Software', 'Bigben Interactive', 'Havas Interactive', 'Slitherine Software', 'Graffiti', 'Funsta', 'Telstar', 'U.S. 
Gold', 'DreamWorks Interactive', 'Data Design Interactive', 'MTO', 'DHM Interactive', 'FunSoft', 'SPS', 'Bohemia Interactive', 'Reef Entertainment', 'Tru Blu Entertainment', 'Moss', 'T&E Soft', 'O-Games', 'Aksys Games', 'NDA Productions', 'Data East', 'Time Warner Interactive', 'Gainax Network Systems', 'Daito', 'O3 Entertainment', 'Gameloft', 'Xicat Interactive', 'Simon & Schuster Interactive', 'Valcon Games', 'PopTop Software', 'TOHO', 'HMH Interactive', '5pb', 'Cave', 'CDV Software Entertainment', 'Microids', 'PM Studios', 'Paon', 'Micro Cabin', 'GameTek', 'Benesse', 'Type-Moon', 'Enjoy Gaming ltd.', 'Asmik Corp', 'Interplay Productions', 'Asmik Ace Entertainment', 'inXile Entertainment', 'Image Epoch', 'Phantom EFX', 'Evolved Games', 'responDESIGN', 'Culture Publishers', 'Griffin International', 'Hackberry', 'Hearty Robin', 'Nippon Amuse', 'Origin Systems', 'Seventh Chord', 'Mitsui', 'Milestone', 'Abylight', 'Flight-Plan', 'Glams', 'Locus', 'Warp', 'Daedalic Entertainment', 'Alternative Software', 'Myelin Media', 'Mercury Games', 'Irem Software Engineering', 'Sunrise Interactive', 'Elite', 'Evolution Games', 'Tivola', 'Global A Entertainment', 'Edia', 'Athena', 'Aria', 'Gamecock', 'Tommo', 'Altron', 'Happinet', 'iWin', 'Media Works', 'Fortyfive', 'Revolution Software', 'Imax', 'Crimson Cow', '10TACLE Studios', 'Groove Games', 'Pack-In-Video', 'Insomniac Games', 'Ascaron Entertainment GmbH', 'Asgard', 'Ecole', 'Yumedia', 'Phenomedia', 'HAL Laboratory', 'Grand Prix Games', 'DigiCube', 'Creative Core', 'Kaga Create', 'WayForward Technologies', 'LSP Games', 'ASCII Media Works', 'Coconuts Japan', 'Arika', 'Ertain', 'Marvel Entertainment', 'Prototype', 'TopWare Interactive', 'Phantagram', '1C Company', 'The Learning Company', 'TechnoSoft', 'Vap', 'Misawa', 'Tradewest', 'Team17 Software', 'Yeti', 'Pow', 'Navarre Corp', 'MediaQuest', 'Max Five', 'Comfort', 'Monte Christo Multimedia', 'Pony Canyon', 'Riverhillsoft', 'Summitsoft', 'Milestone S.r.l', 'Playmore', 
'MLB.com', 'Kool Kizz', 'Flashpoint Games', '49Games', 'Legacy Interactive', 'Alawar Entertainment', 'CyberFront', 'Cloud Imperium Games Corporation', 'Societa', 'Virtual Play Games', 'Interchannel', 'Sonnet', 'Experience Inc.', 'Zenrin', 'Iceberg Interactive', 'Ivolgamus', '2D Boy', 'MC2 Entertainment', 'Kando Games', 'Just Flight', 'Office Create', 'Mamba Games', 'Fields', 'Princess Soft', 'Maximum Family Games', 'Berkeley', 'Fuji', 'Dusenberry Martin Racing', 'imageepoch Inc.', 'Big Fish Games', 'Her Interactive', 'Kamui', 'ASK', 'Headup Games', 'KSS', 'Cygames', 'KID', 'Quinrose', 'Sunflowers', 'dramatic create', 'TGL', 'Encore', 'Extreme Entertainment Group', 'Intergrow', 'G.Rev', 'Sweets', 'Kokopeli Digital Studios', 'Number None', 'Nexon', 'id Software', 'BushiRoad', 'Tryfirst', 'Strategy First', '7G//AMES', 'GN Software', "Yuke's", 'Easy Interactive', 'Licensed 4U', 'FuRyu Corporation', 'Lexicon Entertainment', 'Paon Corporation', 'Kids Station', 'GOA', 'Graphsim Entertainment', 'King Records', 'Introversion Software', 'Minato Station', 'Devolver Digital', 'Blue Byte', 'Gaga', 'Yamasa Entertainment', 'Plenty', 'Views', 'fonfun', 'NetRevo', 'Codemasters Online', 'Quintet', 'Phoenix Games', 'Dorart', 'Marvelous Games', 'Focus Multimedia', 'Imageworks', 'Karin Entertainment', 'Aerosoft', 'Technos Japan Corporation', 'Gakken', 'Mirai Shounen', 'Datam Polystar', 'Saurus', 'HuneX', 'Revolution (Japan)', 'Giza10', 'Visco', 'Alvion', 'Mycom', 'Giga', 'Warashi', 'System Soft', 'Sold Out', 'Lighthouse Interactive', 'Masque Publishing', 'RED Entertainment', 'Michaelsoft', 'Media Entertainment', 'New World Computing', 'Genterprise', 'Interworks Unlimited, Inc.', 'Boost On', 'Stainless Games', 'EON Digital Entertainment', 'Epic Games', 'Naxat Soft', 'Ascaron Entertainment', 'Piacci', 'Nitroplus', 'Paradox Development', 'Otomate', 'Ongakukan', 'Commseed', 'Inti Creates', 'Takuyo', 'Interchannel-Holon', 'Rain Games', 'UIG Entertainment'], dtype=object)
The Publisher column has a large number of unique values, which makes the missing entries hard to fill in reliably, so I decided to drop all rows with null values.
df.dropna(inplace = True)
df
| | Rank | Name | Platform | Year | Genre | Publisher | NA_Sales | EU_Sales | JP_Sales | Other_Sales | Global_Sales |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | Wii Sports | Wii | 2006 | Sports | Nintendo | 41.49 | 29.02 | 3.77 | 8.46 | 82.74 |
1 | 2 | Super Mario Bros. | NES | 1985 | Platform | Nintendo | 29.08 | 3.58 | 6.81 | 0.77 | 40.24 |
2 | 3 | Mario Kart Wii | Wii | 2008 | Racing | Nintendo | 15.85 | 12.88 | 3.79 | 3.31 | 35.82 |
3 | 4 | Wii Sports Resort | Wii | 2009 | Sports | Nintendo | 15.75 | 11.01 | 3.28 | 2.96 | 33.00 |
4 | 5 | Pokemon Red/Pokemon Blue | GB | 1996 | Role-Playing | Nintendo | 11.27 | 8.89 | 10.22 | 1.00 | 31.37 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
16593 | 16596 | Woody Woodpecker in Crazy Castle 5 | GBA | 2002 | Platform | Kemco | 0.01 | 0.00 | 0.00 | 0.00 | 0.01 |
16594 | 16597 | Men in Black II: Alien Escape | GC | 2003 | Shooter | Infogrames | 0.01 | 0.00 | 0.00 | 0.00 | 0.01 |
16595 | 16598 | SCORE International Baja 1000: The Official Game | PS2 | 2008 | Racing | Activision | 0.00 | 0.00 | 0.00 | 0.00 | 0.01 |
16596 | 16599 | Know How 2 | DS | 2010 | Puzzle | 7G//AMES | 0.00 | 0.01 | 0.00 | 0.00 | 0.01 |
16597 | 16600 | Spirits & Spells | GBA | 2003 | Platform | Wanadoo | 0.01 | 0.00 | 0.00 | 0.00 | 0.01 |
16540 rows × 11 columns
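Dropping rows is one option. Since the list of publishers above already contains an 'Unknown' entry, another option is to map the missing publishers to that label and keep the rows. This is a sketch of that alternative on toy rows (the game names are hypothetical); it is not what the notebook does.

```python
import numpy as np
import pandas as pd

# Toy rows: game names are hypothetical, publisher labels occur in the dataset.
toy = pd.DataFrame({
    "Name": ["Game A", "Game B", "Game C"],
    "Publisher": ["Nintendo", np.nan, "Unknown"],
})

# Replace missing publishers with the existing 'Unknown' label instead of dropping rows.
filled = toy.assign(Publisher=toy["Publisher"].fillna("Unknown"))
print(filled)
```

This keeps the sales figures of those games in later totals, at the cost of a larger 'Unknown' group.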
Step-2 Data Analysis¶
# Name column
df['Name'].unique()
array(['Wii Sports', 'Super Mario Bros.', 'Mario Kart Wii', ..., 'Plushees', 'Woody Woodpecker in Crazy Castle 5', 'Know How 2'], dtype=object)
df['Name'].value_counts()
Name
Need for Speed: Most Wanted                12
FIFA 14                                     9
Madden NFL 07                               9
Ratatouille                                 9
LEGO Marvel Super Heroes                    9
                                           ..
Nintendo Presents: Crossword Collection     1
TrackMania: Build to Race                   1
DanceDanceRevolution II                     1
Covert Ops: Nuclear Dawn                    1
Know How 2                                  1
Name: count, Length: 11442, dtype: int64
df[['Rank','Name','Genre']].sort_values(by = 'Rank').head(10)
| | Rank | Name | Genre |
---|---|---|---|
0 | 1 | Wii Sports | Sports |
1 | 2 | Super Mario Bros. | Platform |
2 | 3 | Mario Kart Wii | Racing |
3 | 4 | Wii Sports Resort | Sports |
4 | 5 | Pokemon Red/Pokemon Blue | Role-Playing |
5 | 6 | Tetris | Puzzle |
6 | 7 | New Super Mario Bros. | Platform |
7 | 8 | Wii Play | Misc |
8 | 9 | New Super Mario Bros. Wii | Platform |
9 | 10 | Duck Hunt | Shooter |
These are the 10 top-ranked games in the dataset, shown with their genres.
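The value counts above show that the same title can appear in several rows, and the ranking treats each row separately. To rank titles rather than rows, sales can be summed per name. The sketch below uses four rows copied from the Action table shown later in this notebook.

```python
import pandas as pd

# Rows copied from the Action genre table in this notebook.
rows = pd.DataFrame({
    "Name": ["Grand Theft Auto V", "Grand Theft Auto: San Andreas",
             "Grand Theft Auto V", "Grand Theft Auto: Vice City"],
    "Global_Sales": [21.40, 20.81, 16.38, 16.15],
})

# Add up the sales of all rows that share a title, then sort from highest to lowest.
per_title = rows.groupby("Name")["Global_Sales"].sum().sort_values(ascending=False)
print(per_title)
```

On the full DataFrame the same call would be `df.groupby('Name')['Global_Sales'].sum()`.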
# Platform Column
df.Platform.unique()
array(['Wii', 'NES', 'GB', 'DS', 'X360', 'PS3', 'PS2', 'SNES', 'GBA', '3DS', 'PS4', 'N64', 'PS', 'XB', 'PC', '2600', 'PSP', 'XOne', 'GC', 'WiiU', 'GEN', 'DC', 'PSV', 'SAT', 'SCD', 'WS', 'NG', 'TG16', '3DO', 'GG', 'PCFX'], dtype=object)
ax = sns.countplot(data = df,x = 'Platform')
# Write the count on top of each bar
for label in ax.containers:
    ax.bar_label(label)
plt.xticks(rotation = 90)
plt.title('All different Platforms')
plt.show()
Most games were released on PS2 (2159 titles), closely followed by DS (2156 titles).
What are they?
DS: The Nintendo DS is a handheld console with dual screens and touch capabilities, offering unique gameplay experiences.
PS2: The PlayStation 2 is Sony's second console and one of the best-selling consoles of all time, offering a vast game library.
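The count plot measures how many titles a platform has, not how much those titles sold. A possible way to see both at once is a grouped aggregation. The sketch below uses six rows taken from the DataFrame preview above; `summary`, `titles`, and `total_sales` are illustrative names.

```python
import pandas as pd

# Six rows taken from the DataFrame preview earlier in the notebook.
rows = pd.DataFrame({
    "Name": ["Wii Sports", "Super Mario Bros.", "Mario Kart Wii",
             "Wii Sports Resort", "Woody Woodpecker in Crazy Castle 5",
             "Spirits & Spells"],
    "Platform": ["Wii", "NES", "Wii", "Wii", "GBA", "GBA"],
    "Global_Sales": [82.74, 40.24, 35.82, 33.00, 0.01, 0.01],
})

# Number of titles and summed global sales per platform.
summary = rows.groupby("Platform")["Global_Sales"].agg(titles="count", total_sales="sum")
print(summary.sort_values("total_sales", ascending=False))
```

In this small sample GBA has more titles than NES but far lower sales, which is why title counts alone say little about a platform's commercial performance.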
# Year Column
df.groupby(df['Year'])['Name'].count().sort_values(ascending = False).head(10)
Year
2007    1450
2009    1431
2008    1428
2010    1257
2011    1136
2006    1008
2005     936
2002     829
2003     775
2004     744
Name: Name, dtype: int64
sorted_df = df.sort_values(by='Year')
plt.figure(figsize=(12, 6))
# saturation is a proportion of the original color saturation, so 1 (full color) is the upper bound
ax = sns.countplot(data=sorted_df, x='Year', width=0.5, saturation=1)
# Write the count on top of each bar
for label in ax.containers:
    ax.bar_label(label)
plt.xticks(rotation=90)
plt.title('Games Release Years')
plt.tight_layout()
plt.show()
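Most games were released in 2007, 2008, and 2009, while 2020 has only 1 release.
The counts for these years are:
1] 2007 - 1450
2] 2008 - 1428
3] 2009 - 1431
Note that the rows with a missing Year were all filled with the same value (the rounded-up mean), so one year's count is inflated by the imputation.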
# Genre Column
df.Genre.unique()
array(['Sports', 'Platform', 'Racing', 'Role-Playing', 'Puzzle', 'Misc', 'Shooter', 'Simulation', 'Action', 'Fighting', 'Adventure', 'Strategy'], dtype=object)
plt.figure(figsize=(12,4))
ax = sns.countplot(data = df,x = 'Genre')
# Write the count on top of each bar
for label in ax.containers:
    ax.bar_label(label)
# plt.xticks(rotation = 90)
plt.title('Best Game Genres')
plt.show()
- Action is the most common genre, with 3309 games (about 20% of the 16540 rows).
- Sports is the second most common genre, with 2343 games.
# Highest Global Sales According to different Genres
df.groupby('Genre')['Global_Sales'].max().sort_values(ascending = False)
Genre
Sports          82.74
Platform        40.24
Racing          35.82
Role-Playing    31.37
Puzzle          30.26
Misc            29.02
Shooter         28.31
Simulation      24.76
Action          21.40
Fighting        13.04
Adventure       11.18
Strategy         5.45
Name: Global_Sales, dtype: float64
This output gives the global sales figure of the best-selling game in each genre. It shows how high the top seller reaches per genre, but not which game it is; the tables below list the games for the Action and Sports genres.
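`max()` returns only the sales value. To also recover the name of the top seller in each genre, one option is `idxmax()`, which returns the row label of the maximum per group. The sketch below uses six rows taken from the DataFrame preview above; on the full data the same two lines would run against `df`.

```python
import pandas as pd

# Six rows taken from the DataFrame preview earlier in the notebook.
rows = pd.DataFrame({
    "Name": ["Wii Sports", "Super Mario Bros.", "Mario Kart Wii",
             "Wii Sports Resort", "Woody Woodpecker in Crazy Castle 5",
             "SCORE International Baja 1000: The Official Game"],
    "Genre": ["Sports", "Platform", "Racing", "Sports", "Platform", "Racing"],
    "Global_Sales": [82.74, 40.24, 35.82, 33.00, 0.01, 0.01],
})

# Row label of the best-selling game in each genre.
top_idx = rows.groupby("Genre")["Global_Sales"].idxmax()

# Look those rows up to get the name next to the sales figure.
top_per_genre = rows.loc[top_idx, ["Genre", "Name", "Global_Sales"]].set_index("Genre")
print(top_per_genre)
```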
# These are the games based on Action Genre
action = df[df['Genre'] == 'Action'].sort_values(by = 'Rank')
action[['Rank','Name','Year','Publisher','Global_Sales']]
| | Rank | Name | Year | Publisher | Global_Sales |
---|---|---|---|---|---|
16 | 17 | Grand Theft Auto V | 2013 | Take-Two Interactive | 21.40 |
17 | 18 | Grand Theft Auto: San Andreas | 2004 | Take-Two Interactive | 20.81 |
23 | 24 | Grand Theft Auto V | 2013 | Take-Two Interactive | 16.38 |
24 | 25 | Grand Theft Auto: Vice City | 2002 | Take-Two Interactive | 16.15 |
38 | 39 | Grand Theft Auto III | 2001 | Take-Two Interactive | 13.10 |
... | ... | ... | ... | ... | ... |
16564 | 16567 | Original Frisbee Disc Sports: Ultimate & Golf | 2007 | Destination Software, Inc | 0.01 |
16567 | 16570 | Fujiko F. Fujio Characters: Great Assembly! Sl... | 2014 | Namco Bandai Games | 0.01 |
16582 | 16585 | Planet Monsters | 2001 | Titus | 0.01 |
16583 | 16586 | Carmageddon 64 | 1999 | Virgin Interactive | 0.01 |
16589 | 16592 | Chou Ezaru wa Akai Hana: Koi wa Tsuki ni Shiru... | 2016 | dramatic create | 0.01 |
3309 rows × 5 columns
# These are the games based on Sports Genre
sports = df[df['Genre'] == 'Sports'].sort_values(by = 'Rank')
sports[['Rank','Name','Year','Publisher','Global_Sales']]
| | Rank | Name | Year | Publisher | Global_Sales |
---|---|---|---|---|---|
0 | 1 | Wii Sports | 2006 | Nintendo | 82.74 |
3 | 4 | Wii Sports Resort | 2009 | Nintendo | 33.00 |
13 | 14 | Wii Fit | 2007 | Nintendo | 22.72 |
14 | 15 | Wii Fit Plus | 2009 | Nintendo | 22.00 |
77 | 78 | FIFA 16 | 2015 | Electronic Arts | 8.49 |
... | ... | ... | ... | ... | ... |
16576 | 16579 | Rugby Challenge 3 | 2016 | Alternative Software | 0.01 |
16578 | 16581 | Outdoors Unleashed: Africa 3D | 2011 | Mastiff | 0.01 |
16579 | 16582 | PGA European Tour | 2000 | Infogrames | 0.01 |
16581 | 16584 | Fit & Fun | 2011 | Unknown | 0.01 |
16587 | 16590 | Mezase!! Tsuri Master DS | 2009 | Hudson Soft | 0.01 |
2343 rows × 5 columns
# Publisher Column
# df.Publisher.unique()
x = df['Publisher'].value_counts().head(10)
x
Publisher
Electronic Arts                 1351
Activision                       975
Namco Bandai Games               932
Ubisoft                          921
Konami Digital Entertainment     832
THQ                              715
Nintendo                         703
Sony Computer Entertainment      683
Sega                             639
Take-Two Interactive             413
Name: count, dtype: int64
x.index
Index(['Electronic Arts', 'Activision', 'Namco Bandai Games', 'Ubisoft', 'Konami Digital Entertainment', 'THQ', 'Nintendo', 'Sony Computer Entertainment', 'Sega', 'Take-Two Interactive'], dtype='object', name='Publisher')
# Plot the Result
plt.figure(figsize = (8,4))
# plt.bar returns a container of bars, not an Axes
bars = plt.bar(x.index, x.values)
plt.xticks(rotation=60)
# Add labels to the bars
plt.bar_label(bars)
plt.show()
Electronic Arts is the publisher with the most titles in the dataset (1351).
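To put these counts in context, they can be compared with the 16540 rows left after cleaning. The sketch below copies the top-10 counts from the output above and computes their share of all titles.

```python
# Top-10 title counts copied from the value_counts() output above.
top10 = {
    "Electronic Arts": 1351,
    "Activision": 975,
    "Namco Bandai Games": 932,
    "Ubisoft": 921,
    "Konami Digital Entertainment": 832,
    "THQ": 715,
    "Nintendo": 703,
    "Sony Computer Entertainment": 683,
    "Sega": 639,
    "Take-Two Interactive": 413,
}
total_titles = 16540  # rows remaining after dropna()

# Share of all titles published by the top 10, and by Electronic Arts alone.
top10_share = 100 * sum(top10.values()) / total_titles
ea_share = 100 * top10["Electronic Arts"] / total_titles
print(f"Top 10 publishers: {top10_share:.1f}% of titles")
print(f"Electronic Arts:   {ea_share:.1f}% of titles")
```

The ten largest publishers account for roughly half of all titles. This measures the number of releases only, not sales.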
Now Visualize all the Sales Columns One by One using Distribution plot¶
# Create a distribution plot with KDE for NA_Sales
plt.figure(figsize=(6, 4))
sns.histplot(data=df, x='NA_Sales', kde=True, binwidth=0.5) # Adjust the binwidth as needed
plt.title('Distribution of NA Sales')
plt.xlabel('NA Sales')
plt.ylabel('Frequency')
plt.grid(True)
plt.show()
# EU_Sales
# Create a distribution plot with KDE for EU_Sales
plt.figure(figsize=(6, 4))
sns.histplot(data=df, x='EU_Sales', kde=True, binwidth=1) # Adjust the binwidth as needed
plt.title('Distribution of EU Sales')
plt.xlabel('EU Sales')
plt.ylabel('Frequency')
plt.grid(True)
plt.show()
# Create a distribution plot with KDE for JP_Sales
plt.figure(figsize=(6, 4))
sns.histplot(data=df, x='JP_Sales', kde=True, binwidth=0.5) # Adjust the binwidth as needed
plt.title('Distribution of JP Sales')
plt.xlabel('JP Sales')
plt.ylabel('Frequency')
plt.grid(True)
plt.show()
# Create a distribution plot with KDE for Other_Sales
plt.figure(figsize=(10, 6))
sns.histplot(data=df, x='Other_Sales', kde=True, binwidth=0.5) # Adjust the binwidth as needed
plt.title('Distribution of Other Sales')
plt.xlabel('Other Sales')
plt.ylabel('Frequency')
plt.grid(True)
plt.show()
# Total Sales
# Create a distribution plot with KDE for Global_Sales
plt.figure(figsize=(10, 6))
sns.histplot(data=df, x='Global_Sales', kde=True, binwidth=0.5) # Adjust the binwidth as needed
plt.title('Distribution of Global Sales')
plt.xlabel('Global Sales')
plt.ylabel('Frequency')
plt.grid(True)
plt.show()
Descriptive Statistics of Sales Columns¶
df[['NA_Sales','EU_Sales','JP_Sales','Other_Sales','Global_Sales']].describe()
| | NA_Sales | EU_Sales | JP_Sales | Other_Sales | Global_Sales |
---|---|---|---|---|---|
count | 16540.000000 | 16540.000000 | 16540.000000 | 16540.000000 | 16540.000000 |
mean | 0.265079 | 0.146883 | 0.077998 | 0.048191 | 0.538426 |
std | 0.817929 | 0.506129 | 0.309800 | 0.188879 | 1.557424 |
min | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.010000 |
25% | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.060000 |
50% | 0.080000 | 0.020000 | 0.000000 | 0.010000 | 0.170000 |
75% | 0.240000 | 0.110000 | 0.040000 | 0.040000 | 0.480000 |
max | 41.490000 | 29.020000 | 10.220000 | 10.570000 | 82.740000 |
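Because every column has the same number of rows, the ratio of two column means equals the ratio of their totals. That makes it possible to estimate each region's share of global sales directly from the `mean` row above. The sketch below copies those means; the variable names are illustrative.

```python
# Column means copied from the describe() output above.
means = {
    "NA_Sales": 0.265079,
    "EU_Sales": 0.146883,
    "JP_Sales": 0.077998,
    "Other_Sales": 0.048191,
}
global_mean = 0.538426

# Regional share of global sales, in percent (same row count, so means behave like totals).
shares = {region: round(100 * m / global_mean, 1) for region, m in means.items()}
print(shares)
```

North America contributes close to half of global sales in this dataset, with Europe a bit over a quarter. The table also shows strong right skew: the median Global_Sales is 0.17 while the maximum is 82.74.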
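# Sum global sales per year first; passing the raw columns to plt.bar
# would overlay one bar per game and show only the tallest one
yearly_sales = df.groupby('Year')['Global_Sales'].sum()
plt.figure(figsize=(12,6))
plt.bar(yearly_sales.index, yearly_sales.values)
plt.title('Total Global Sales by Year')
plt.xlabel('Years')
plt.ylabel('Global Sales')
plt.show()
This plot shows that global sales were highest between 2006 and 2009 and declined in the years after.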
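# Sum global sales per genre first, for the same reason as in the yearly plot
genre_sales = df.groupby('Genre')['Global_Sales'].sum().sort_values(ascending=False)
plt.figure(figsize=(12,6))
plt.bar(genre_sales.index, genre_sales.values)
plt.title('Total Global Sales by Genre')
plt.xlabel('Genres')
plt.ylabel('Global Sales')
plt.show()
The Action genre has the highest total global sales of all genres.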
df.columns
Index(['Rank', 'Name', 'Platform', 'Year', 'Genre', 'Publisher', 'NA_Sales', 'EU_Sales', 'JP_Sales', 'Other_Sales', 'Global_Sales'], dtype='object')
Conclusion¶
Based on the analysis of the video game sales dataset, several key insights can be drawn:
Platform Trends: The platform with the most game releases is PS2, followed closely by DS. These platforms have been prolific in terms of hosting a wide range of games.
Release Years: The years 2007, 2008, and 2009 saw the highest number of game releases. However, the number of releases significantly declined after 2009, indicating a potential shift in the industry.
Genre Popularity: The action genre has the highest number of game releases, followed by sports. Action games seem to dominate the market in terms of sheer numbers.
Publisher Impact: Electronic Arts emerges as the leading publisher with the most games released. This suggests its strong presence and influence in the gaming industry.
Global Sales Distribution: Distribution plots show that sales are concentrated in the lower range in every region (NA, EU, JP, Other) and globally; the median Global_Sales is 0.17 against a maximum of 82.74. A few games have exceptionally high global sales.
Platform Performance: PS2 and DS have the most titles. This analysis counted releases per platform and did not sum sales per platform, so platform-level sales would need a separate check.
Genre and Global Sales: The action genre has the highest total global sales, followed by the sports and shooter genres. These genres seem to resonate well with players on a global scale.
Yearly Global Sales: Over the years, global game sales have experienced a peak between 2006 and 2009, with a subsequent decline in sales. This suggests a potential shift in gaming trends.
In conclusion, the analysis provides valuable insights into the dynamics of the video game industry. The data highlights the dominance of certain genres, platforms, and publishers, while also showcasing trends in global sales. Understanding these patterns can aid in strategic decision-making and offer insights into the preferences of gamers and the evolving landscape of the gaming market.
Video Game Sales Insights: Best Games and Rankings
This analysis encompasses two key aspects of the Video Game Sales Dataset: identifying top games within each genre based on global sales and understanding game rankings.
Best Games by Genre and Global Sales:
- Genres like Action and Sports have high game counts and significant global sales.
- Electronic Arts stands out as a prolific publisher.
- Examining global sales trends within genres reveals successful games across categories.
Rankings, Platforms, and Trends:
- Rank orders games by global sales, so it reflects sales performance.
- PlayStation 2 (PS2) and Nintendo DS host the largest number of titles.
- The years 2007-2009 have the most releases.
Genre and Publisher Impact:
- Genres like Action and Sports resonate with players and yield top rankings.
- A small group of publishers accounts for a large share of releases; Electronic Arts alone has 1351 titles.
Global Sales and Rankings Nexus:
- In this dataset Rank follows Global_Sales in descending order, so the two are directly linked rather than independent measures.
Strategic Insights:
- Combined insights aid strategic decisions in game development, marketing, and investments.
- Holistic analysis empowers stakeholders to navigate the dynamic gaming industry.