Data Overview: - While Myntra is recognized primarily for fashion, this dataset relates to the company& #39;s online retail operations for Myntra Gifts Ltd., a UK-based division specializing in unique all-occasion giftware. - This dataset spans transactions from December 1, 2009,to December 9, 2011, and includes detailed records of sales made through Myntra Gifts Ltd.’s non-store online platform. - The dataset provides a thorough snapshot of the company& #39;s international online retail activities during this period. - The primary purpose of analyzing this dataset is to extract valuable insights to enhance Myntra Gifts Ltd.’s business strategies.
The dataset consists of 328609 records and 20 columns of a Myntra , each representing different aspects of the transactions.
2.1 Importing Neccessary Libraries
Show Code
import pandas as pdimport numpy as npimport matplotlib.pyplot as pltimport seaborn as sns import cufflinks as cfcf.go_offline()import warningsimport sqlite3from scipy.stats import ttest_indimport scipy.stats as statssns.set(style="whitegrid")warnings.filterwarnings('ignore')
Total_Quantity: Total quantity of items sold in the invoice.
Total_Revenue: Total revenue generated from the invoice.
Year: Year of the transaction.
Month: Month of the transaction.
Quarter: Quarter of the year when the transaction occurred.
Weekend: Indicator if the transaction occurred on a weekend.
InvoiceHour: Hour of the day when the transaction took place.
DayOfWeek: Day of the week when the transaction occurred.
IsWeekend: Boolean indicating if the transaction was on a weekend.
PartOfDay: Time of day (e.g., Morning, Afternoon, Evening).
Avg_Price_Per_Item: Average price per item sold.
Items_Per_Invoice: Average number of items per invoice.
4.0.2 Check Unique Values for each variable.
Show Code
# check unique values for each columnfor i in df.columns.tolist():print("No. Of unique values in ",i,"is",df[i].nunique(),".")
No. Of unique values in InvoiceNo is 16826 .
No. Of unique values in StockCode is 3392 .
No. Of unique values in Description is 3565 .
No. Of unique values in Quantity is 27 .
No. Of unique values in InvoiceDate is 15784 .
No. Of unique values in UnitPrice is 182 .
No. Of unique values in CustomerID is 4191 .
No. Of unique values in Country is 37 .
No. Of unique values in Total_Quantity is 44 .
No. Of unique values in Total_Revenue is 1329 .
No. Of unique values in Year is 2 .
No. Of unique values in Month is 12 .
No. Of unique values in Quarter is 4 .
No. Of unique values in Weekend is 6 .
No. Of unique values in InvoiceHour is 15 .
No. Of unique values in DayOfWeek is 6 .
No. Of unique values in IsWeekend is 2 .
No. Of unique values in PartOfDay is 3 .
No. Of unique values in Avg_Price_Per_Item is 224 .
No. Of unique values in Items_Per_Invoice is 792 .
4.0.3 Check Distribution of the numerical variables.
Show Code
# Distribution of the numerical columnsnumerical_col=df.select_dtypes("number").columnsplt.figure(figsize=(15,10))for i ,col inenumerate(numerical_col): plt.subplot(3,4,i+1) sns.histplot(df[col],kde=True,bins=30) plt.title(col)plt.tight_layout()plt.show()
4.0.4 Outliers Detection
Show Code
# Outliers Detection with Boxplotsplt.figure(figsize=(15,10))for i ,col inenumerate(numerical_col): plt.subplot(3,4,i+1) sns.boxplot(df[col]) plt.title(col)plt.tight_layout()plt.show()
The monthly revenue graph shows a steady increase from February to November 2025, with the lowest revenue around $200K in February and the highest at approximately $600K in November.
Quarterly revenue has grown consistently from Q1 to Q4 2025, starting at around $500K in Q1 and reaching approximately $1.4M in Q4.
Both graphs indicate a positive revenue trend over the year, with a noticeable acceleration in the second half of 2025.
The monthly data highlights November as a peak month, while the quarterly data emphasizes Q4 as the strongest quarter.
5.1.21.2) Which months have the highest and lowest Average Order Value?
Show Code
from calendar import monthfrom re import Mprint("Figure 2: Average Order Value (AOV) by Month")# group by InvoiceNo to get Revenue per orderorders=df.groupby('InvoiceNo').agg({'Total_Revenue':'sum','Month':'first'}).reset_index()# Calculate the AOV by Monthmonthly_aov=orders.groupby('Month')['Total_Revenue'].mean().reset_index()monthly_aov.columns = ['Month', 'AOV']# Sort by AOV (descending)month_order = ['January', 'February', 'March', 'April', 'May', 'June','July', 'August', 'September', 'October', 'November', 'December']monthly_aov_sorted = monthly_aov.sort_values('AOV', ascending=False)monthly_aov_sorted['Month'] = pd.Categorical(monthly_aov_sorted['Month'], categories=month_order, ordered=True)monthly_aov_sorted = monthly_aov_sorted.sort_values('Month')# AOV by month plotplt.figure(figsize=(12, 6))plt.bar(monthly_aov_sorted['Month'], monthly_aov_sorted['AOV'], color='skyblue')plt.title('Average Order Value by Month', fontsize=14)plt.xlabel('Month', fontsize=12)plt.ylabel('AOV ($)', fontsize=12)plt.xticks(rotation=45)plt.grid(axis='y', linestyle='--', alpha=0.7)plt.show()# Top 3 months with highest AOVhighest_aov = monthly_aov_sorted.sort_values('AOV', ascending=False).head(3)print("The Top 3 Highest Average order Value within the Months:\n",highest_aov)# Bottom 3 months with lowest AOVlowest_aov = monthly_aov_sorted.sort_values('AOV', ascending=True).head(3)print("\n The Bottom 3 Lowest Average Order Value within the Months:\n",lowest_aov)
Figure 2: Average Order Value (AOV) by Month
The Top 3 Highest Average order Value within the Months:
Month AOV
10 October 303.047145
11 September 302.164541
1 August 266.125009
The Bottom 3 Lowest Average Order Value within the Months:
Month AOV
6 June 229.601845
2 December 233.439522
8 May 234.698949
5.1.2.1 Insights:
The Average Order Value (AOV) fluctuates throughout the year 2025, with values generally ranging between 150 and 250.
The highest AOV is observed in September and October, peaking at around 250.
The lowest AOV occurs in June and July, dropping to approximately 150.
Overall, there is no consistent upward or downward trend, with peaks and troughs indicating seasonal or other influencing factors.
5.1.31.3) How does the average price per item change across time (monthly or quarterly)?
Show Code
def format_dollars(x, pos):"""Converts values to $1.1K or $1.1M format."""if x >=1e6:returnf'${x/1e6:.1f}M'elif x >=1e3:returnf'${x/1e3:.0f}K'else:returnf'${x:.2f}'
Show Code
print("Figure 3: Monthly and Quarterly Average Price per Item")# Monthly Average Price Per Itemmonthly_avg_unitprice=df.groupby('Month')['UnitPrice'].mean().sort_values()# Quaterly Average Price per Itemquarterly_avg_unitprice = df.groupby('Quarter')['UnitPrice'].mean().sort_index()# Create subplotsfig, (ax1, ax2) = plt.subplots(2, 1, figsize=(15, 10))# Monthly average price plotmonthly_avg_unitprice.plot(kind='line',marker='o', color='teal', ax=ax1, linewidth=2)ax1.set_title('Monthly Average Price per Item', fontsize=14, pad=15)ax1.set_ylabel('Price', fontsize=12)ax1.yaxis.set_major_formatter(FuncFormatter(format_dollars))ax1.grid(axis='y', linestyle='--', alpha=0.6)plt.sca(ax1)plt.xticks(rotation=45, ha='right')# Quarterly Average price plotquarterly_avg_unitprice.plot(kind='bar', color='skyblue', alpha=0.8, ax=ax2)ax2.set_title('Quarterly Average Price per Item', fontsize=14, pad=15)ax2.set_xlabel('Quarter')ax2.set_ylabel('Price', fontsize=12)ax2.yaxis.set_major_formatter(FuncFormatter(format_dollars))ax2.grid(axis='y', linestyle=':', alpha=0.6)plt.sca(ax2)plt.xticks(rotation=45, ha='left')plt.tight_layout()ax1.grid(True)plt.show()
Figure 3: Monthly and Quarterly Average Price per Item
5.1.3.1 Insights:
The monthly average price per item shows a steady increase from July 2025 to February 2026, starting at around $12.00 and rising to approximately $13.25.
The quarterly average price per item remains relatively stable across Q1 to Q4 2025, hovering around $12.00 to $12.50.
The monthly data indicates a noticeable upward trend, particularly from October 2025 onwards.
5.2🧍♂️2. Customer Behavior Analysis
5.2.12.1) What is the distribution of the number of items purchased per invoice (Items_Per_Invoice)?
Show Code
print("Figure 4: Distribution of Items Per Invoice")# plot the Item_Per_Invoice plt.figure(figsize=(12,6))sns.histplot(df['Items_Per_Invoice'],bins=20,kde=True,color='#4e79a7')plt.title('Distribution of Item Per Invoice',fontsize=14)plt.xlabel('Number of Items',fontsize=12)plt.ylabel('Frequency (Count Of Invoices) ',fontsize=12)plt.grid(axis='y')plt.show()# Segment by Customer Behaviour# Top 10% of invoices(heavy buyers)top_10_invoices=df[df['Items_Per_Invoice'] >= df['Items_Per_Invoice'].quantile(0.9)]# Bottom 50% of Invoices (Light buyer)bottom_50_invoices=df[df['Items_Per_Invoice'] <= df['Items_Per_Invoice'].quantile(0.5)]print(f"Top 10 % Invoice: Avg {top_10_invoices['Items_Per_Invoice'].mean():.1f} items")print(f"Bottom 50 % Invoices: Avg {bottom_50_invoices['Items_Per_Invoice'].mean():.1f} items")
Insights: - The top 10% of invoices (the biggest 10% based on items purchased) had an average of 888.8 items per invoice. - The bottom 50% of invoices (the smallest half) had an average of 108.8 items per invoice. - There’s a huge gap between the largest and smallest orders. Just the top 10% of purchases are more than 8 times larger (on average) than half of all purchases. This indicates that a few large invoices are driving a significant portion of total sales, which is typical in many retail or B2B scenarios.
5.2.22.2) How does customer behavior vary between weekdays and weekends?
5.2.2.12.2.a) What is the effect of Averag Order Value in the Weekends and Weekdays?
Show Code
print("Figure 5: Average Order Value (AOV) by Weekend vs Weekday")# Ensure 'IsWeekend' and 'Total_Revenue' columns exist and are correctdf['InvoiceDate'] = pd.to_datetime(df['InvoiceDate'])df['IsWeekend'] = df['InvoiceDate'].dt.dayofweek >=5# Saturday=5, Sunday=6df['Total_Revenue'] = df['Quantity'] * df['UnitPrice']# Calculate AOV per invoiceinvoice_aov = df.groupby(['InvoiceNo', 'IsWeekend']).agg({'Total_Revenue': 'sum'}).reset_index()# Aggregate AOV by weekend/weekdayaov_by_daytype = invoice_aov.groupby('IsWeekend')['Total_Revenue'].mean().reset_index()aov_by_daytype['Day_Type'] = aov_by_daytype['IsWeekend'].map({True: 'Weekend', False: 'Weekday'})# Plotplt.figure(figsize=(10,6))sns.barplot(data=aov_by_daytype, x='Day_Type', y='Total_Revenue', palette='Set2')plt.title('Average Order Value (AOV): Weekend vs Weekday')plt.ylabel('Average Order Value ($)')plt.xlabel('Day Type')plt.grid(axis='y', linestyle='--', alpha=0.7)plt.show()print(aov_by_daytype[['Day_Type', 'Total_Revenue']].rename(columns={'Total_Revenue': 'AOV'}))
Figure 5: Average Order Value (AOV) by Weekend vs Weekday
The average order value (AOV) is higher on weekdays, reaching approximately $250.
The AOV on weekends is lower, at around $225.
There is a noticeable difference of about $25 in AOV between weekdays and weekends.
Show Code
print("Figure 6: Average Order Value (AOV) by Day of Week")import warningswarnings.filterwarnings("ignore", category=FutureWarning)# Ensure proper datetime formattingfrom turtle import positiondf['InvoiceDate'] = pd.to_datetime(df['InvoiceDate'])df['DayOfWeek'] = df['InvoiceDate'].dt.day_name()df['IsWeekend'] = df['InvoiceDate'].dt.dayofweek >=5# Saturday=5, Sunday=6# Total Revenue column (if not already present)df['Total_Revenue'] = df['Quantity'] * df['UnitPrice']# Calculate per-day behaviorday_behavior = df.groupby('DayOfWeek').agg({'InvoiceNo': 'nunique','CustomerID': 'nunique','Total_Revenue': 'sum'}).reset_index()# Add AOV (Average Order Value) and it should be greater than 0day_behavior['AOV'] = day_behavior['Total_Revenue'] / day_behavior['InvoiceNo']day_behavior= day_behavior[day_behavior['AOV'] >0]# Map day typeday_behavior['Day_Type'] = day_behavior['DayOfWeek'].map({'Monday': 'Weekday', 'Tuesday': 'Weekday', 'Wednesday': 'Weekday','Thursday': 'Weekday', 'Friday': 'Weekday','Saturday': 'Weekend', 'Sunday': 'Weekend'})# Ensure correct order on x-axisday_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']day_behavior['DayOfWeek'] = pd.Categorical(day_behavior['DayOfWeek'], categories=day_order, ordered=True)# ploting the figureplt.figure(figsize=(12,6))plot = sns.barplot(data=day_behavior, x='AOV', y='DayOfWeek', palette='Set2',orient='h')# Add value labels on barsfor container in plot.containers: plot.bar_label(container, fmt='%.2f', label_type='edge', padding=3, fontsize=10, color='black')plt.title("📊 Distribution of Average Order Value (AOV) by Day Of Week")plt.xlabel("Average Order Value (AOV)")plt.ylabel("Day of a Week")plt.grid(False)plt.tight_layout()plt.legend()plt.show()
Figure 6: Average Order Value (AOV) by Day of Week
No artists with labels found to put in legend. Note that artists whose label start with an underscore are ignored when legend() is called with no argument.
5.2.2.3 Insgihts:
The highest Average Order Value (AOV) is on Monday at $270.02, while the lowest is on Sunday at $240.20.
AOV tends to be higher during the early part of the week (Monday to Thursday), ranging from $248.06 to $270.02, with a slight decline towards the weekend.
Friday and Saturday show moderate AOVs at $254.74 and $258.56, respectively, before dropping on Sunday.
5.2.2.42.2.b) How does the Order Volume varries within each hour Weeknd and Weekdays?
Show Code
print("Figure 7: Invoice Count by Hour - Weekend vs Weekday")# Ensure datetime and extract necessary time infodf['InvoiceDate'] = pd.to_datetime(df['InvoiceDate'])df['InvoiceHour'] = df['InvoiceDate'].dt.hourdf['IsWeekend'] = df['InvoiceDate'].dt.dayofweek >=5# Sat & Sun# Group by hour and weekend flaghourly_orders = df.groupby(['InvoiceHour', 'IsWeekend'])['InvoiceNo'].nunique().reset_index()# Rename for clarityhourly_orders.columns = ['Hour', 'IsWeekend', 'InvoiceCount']hourly_orders['Day_Type'] = hourly_orders['IsWeekend'].map({True: 'Weekend', False: 'Weekday'})# Plotplt.figure(figsize=(12,6))sns.lineplot(data=hourly_orders, x='Hour', y='InvoiceCount', hue='Day_Type', marker='o')plt.title('Invoice Count by Hour - Weekend vs Weekday')plt.xlabel('Hour of Day')plt.ylabel('Number of Invoices')plt.xticks(range(0, 24))plt.grid(True)plt.legend(title='Day Type')plt.tight_layout()plt.show()
Figure 7: Invoice Count by Hour - Weekend vs Weekday
5.2.2.5 Insights:
Invoice counts on weekdays peak between 9 AM and 12 PM, reaching around 2500, with a significant drop after 3 PM.
Weekend invoice counts are lower, peaking around 10 AM to 1 PM at approximately 500, and remain relatively stable with a gradual decline throughout the day.
Both weekday and weekend invoice counts are minimal from midnight to 6 AM, with a noticeable increase starting around 6 AM.
5.2.32.3) Are there certain months or hours when customers are more likely to make purchases?
Show Code
print("Figure 8: Purchases by Day of Week and Hour of Day")# analyse by the day of the weekfig, axes = plt.subplots(1, 2, figsize=(16, 6)) # Increase width and heightdf['InvoiceDate']=pd.to_datetime(df['InvoiceDate'])# extract the month name from dayofweedf['DayOfWeek']=df['InvoiceDate'].dt.month_name()# summarizing by dayday_summary = df.groupby('DayOfWeek')['InvoiceNo'].nunique()# Plot 1: Barplot of purchases by day of weekaxes[0].bar(day_summary.index,day_summary, color='skyblue',)axes[0].set_title('Number of Purchases by Month')axes[0].set_xlabel('DayOfWeek')axes[0].set_ylabel('Frequency Count')axes[0].set_xticklabels((df['DayOfWeek']), rotation=45)# analyse by the hour of a dayhour_summary = df.groupby('InvoiceHour')['InvoiceNo'].nunique()# Plot 2: Line plot of purchases by houraxes[1].plot(hour_summary.index, hour_summary, marker='o')axes[1].set_title('Number of Purchases by Hour of Day')axes[1].set_xlabel('InvoiceHour')axes[1].set_ylabel('Number of orders')plt.tight_layout()plt.show()
Figure 8: Purchases by Day of Week and Hour of Day
5.2.3.1 Insights:
The number of purchases by day of the week shows significant variation, with peaks around September and November at approximately 2500, while other months like April and June have lower counts around 1000-1500.
The number of purchases by hour of day peaks between 10 AM and 2 PM, reaching around 2500, with a sharp decline after 2 PM to near zero by 6 PM.
Purchase activity is highest during midday hours and certain months, indicating potential seasonal and time-based purchasing trends.
5.3 📦 3. Product Performance Analysis
5.3.13.1) Which products (by Product or StockCode) generate the most revenue and highest volume?
Show Code
print("Figure 9: Top 5 Products by Revenue and Quantity")# Group by StockCode and calculate total revenueproduct_wise_revenue = df.groupby('StockCode')['Total_Revenue'].sum().reset_index()# Sort and pick top 5 product based on revenuetop_5_stock_df = product_wise_revenue.sort_values(by='Total_Revenue', ascending=False).head(5)top_5_stock_series = top_5_stock_df.set_index('StockCode')['Total_Revenue']total_revenue = product_wise_revenue['Total_Revenue'].sum()# Group by Product ["Description"] and calculate Total Quantityprod_wise_qty=df.groupby('Description')['Total_Quantity'].sum().reset_index()# sort and pick top 5 products based on quantitytop_5_prod=prod_wise_qty.sort_values(by='Total_Quantity',ascending=False).head(5)top_5_prod_series=top_5_prod.set_index('Description')['Total_Quantity']total_qty=top_5_prod['Total_Quantity'].sum()# Create subplotsfig, ax = plt.subplots(1, 2, figsize=(15, 5))# --- Plot 1: Horizontal bar chart for top 5 products ---ax[0].barh(top_5_stock_series.index, top_5_stock_series.values, color='lightblue')ax[0].set_title('Top 5 Stockcode based on Revenue')ax[0].set_xlabel('Total_Revenue')ax[0].set_ylabel('StockCode')ax[0].invert_yaxis()ax[0].grid(axis='x')# --- Plot 2: Pie chart for Total quantity share ---pct_contribution=(top_5_prod_series/total_qty) *100ax[1].pie( x=pct_contribution, labels=top_5_prod_series.index, autopct='%1.2f%%', explode=[0.1, 0, 0, 0, 0], shadow=True)ax[1].set_title('Top 5 Product contribution based on Total_Quantity')# Final layoutplt.tight_layout()plt.show()
Figure 9: Top 5 Products by Revenue and Quantity
5.3.1.1 Insights:
The top 5 products by revenue are StockCodes 47566, 85123A, 85099B, 23298, and 84879, with 47566 generating the highest revenue, exceeding 30,000.
The top 5 products by quantity contribution to total quantity are “pack_of_72_retrospot_cake_cases” (24.42%), “assorted_colour_bird_ornament” (22.25%), “jumbo_bag_red_retrospot” (19.80%), “lunch_bag_red_retrospot” (17.58%), and “white_hanging_heart_tlight_holder” (15.94%).
The revenue and quantity contributions highlight a diverse product performance, with “pack_of_72_retrospot_cake_cases” leading in quantity and 47566 leading in revenue.
5.3.23.2) What is the distribution of UnitPrice across different products? Are there any extremely low or high values?
Show Code
import warningswarnings.filterwarnings("ignore", category=FutureWarning)print("Figure 10: Distribution of Unit Price")# Histogram for typical price rangeplt.figure(figsize=(12, 6))sns.histplot(df[df['UnitPrice'] <=10]['UnitPrice'], bins=50, kde=True, color='#1f77b4')plt.title('Distribution of UnitPrice (≤ $10)', fontsize=14)plt.xlabel('Unit Price ($)', fontsize=12)plt.ylabel('Product Count', fontsize=12)plt.grid(alpha=0.3)plt.xlim(0, 10) plt.grid(axis='x')plt.show()# price summary statisticsprice_stats = df['UnitPrice'].describe(percentiles=[0.01, 0.25, 0.5, 0.75, 0.99])print(f"The statistics of UnitePrie Based on different products:\n",price_stats,end='\n')# Low Value Products having UnitPrice Less than $ 0.5 low_price = df[df['UnitPrice'] <0.5] print(f"\nThe Total Count of products having products priced < $0.50 : {len(low_price)}\n")print("The List of Products having UnitPrice (< $0.50) are :\n",low_price[['StockCode', 'Description', 'UnitPrice']].sample(5),end='\n')# Top 1% of products by Unit Pricehigh_price = df[df['UnitPrice'] > df['UnitPrice'].quantile(0.99)]print(f"\nTop 1% products having Unitprice (> ${high_price['UnitPrice'].min():.2f}) are : ")print(high_price[['StockCode', 'Description', 'UnitPrice']] .sort_values('UnitPrice', ascending=False) .drop_duplicates('StockCode').head(5))
Figure 10: Distribution of Unit Price
The statistics of UnitePrie Based on different products:
count 328610.000000
mean 2.200017
std 1.548488
min 0.001000
1% 0.290000
25% 1.250000
50% 1.650000
75% 2.950000
99% 6.750000
max 7.500000
Name: UnitPrice, dtype: float64
The Total Count of products having products priced < $0.50 : 31338
The List of Products having UnitPrice (< $0.50) are :
StockCode Description UnitPrice
175266 23232 wrap_vintage_leaf_design 0.42
68467 21985 pack_of_12_hearts_design_tissues 0.29
2203 22420 lipstick_pen_baby_pink 0.42
5764 22076 6_ribbons_empire 0.39
170679 22706 wrap_cowboys 0.42
Top 1% products having Unitprice (> $6.89) are :
StockCode Description UnitPrice
214994 84880 white_wire_egg_holder 7.5
214718 23112 parisienne_curio_cabinet 7.5
214459 23118 parisienne_jewellery_drawer 7.5
214946 90176C diamante_necklace 7.5
214947 90176D diamante_necklace_purple 7.5
5.4🌐 4. Country-Level Insights:
5.4.14.1) Which countries contribute most to the Total Revenue of Myntra Gift Business?
Show Code
print("Figure 11: Top 5 Countries by Total Revenue")import matplotlib.ticker as ticker# Custom dollar formatterdef format_dollars(x, pos):"""Converts values to $1.1K or $1.1M format."""if x >=1e6:returnf'${x/1e6:.1f}M'elif x >=1e3:returnf'${x/1e3:.0f}K'else:returnf'${x:.2f}'# group by top 5 countries based On 'Total_Revenue'top_5_countries = df.groupby('Country')['Total_Revenue'].sum().sort_values(ascending=False).head(5)# Plot horizontal bar chartplt.figure(figsize=(10, 6))ax=top_5_countries.plot(kind='barh',color='grey', edgecolor='black')# Step 3: Apply the custom formatter to x-axisax.xaxis.set_major_formatter(ticker.FuncFormatter(format_dollars))plt.title('🌍 Top 5 Countries by Total Revenue', fontsize=14)plt.xlabel('Total Revenue', fontsize=12)plt.ylabel('Country', fontsize=12)plt.grid(axis='x', linestyle='--', alpha=0.7)plt.gca().invert_yaxis()plt.tight_layout()plt.show()# Print top countries with formatted revenueprint('📊 The top 5 countries based on Total Revenue:')for country, revenue in top_5_countries.items():print(f'{country:<20} : ${revenue:,.2f}')
Figure 11: Top 5 Countries by Total Revenue
📊 The top 5 countries based on Total Revenue:
United Kingdom : $3,655,815.35
Germany : $137,909.44
France : $121,911.53
EIRE : $105,895.87
Switzerland : $29,790.46
5.4.1.1 Insights:
The United Kingdom dominates total revenue among the top 5 countries, contributing $3,655,815.35, far exceeding the others.
Germany, France, and EIRE follow with significantly lower revenues at $137,909.44, $121,911.53, and $105,895.87 respectively, indicating a steep drop-off after the UK.
Switzerland has the lowest revenue at $29,790.46, highlighting a vast disparity in market performance across these countries.
5.4.24.2) How does pricing differ between countries based on average UnitPrice per country?
Show Code
import warningswarnings.filterwarnings("ignore", category=FutureWarning)print("Figure 12: Unit Price Distribution by Country")import plotly.express as pximport cufflinks as cfcf.go_offline()# Summary of country based on the columns ---> "UnitPrice","Total_Revenue","Quantity"country_stats=df.groupby('Country').agg({'UnitPrice':['mean','median','count'],'Total_Revenue': 'sum','Quantity' : 'sum'}).sort_values(('Total_Revenue','sum'),ascending=False)# Rename the Aggregate columncountry_stats.columns = ['Avg_Price', 'Median_Price', 'Order_Count', 'Total_Revenue', 'Total_Items']# Derived new column such as - "Revenue_per_order" and "Items_per_order"country_stats['Revenue_per_Order'] = country_stats['Total_Revenue'] / country_stats['Order_Count']country_stats['Items_per_Order'] = country_stats['Total_Items'] / country_stats['Order_Count']plot_data = df[df['Country'].isin(country_stats.head(10).index) & (df['UnitPrice'] <=10)]plt.figure(figsize=(12,10))# Create interactive boxplotfig = px.box( plot_data, x='Country', y='UnitPrice', color='Country', color_discrete_sequence=px.colors.sequential.Viridis, title='Unit Price Distribution by Country ', labels={'UnitPrice': 'Unit Price ($)', 'Country': ''}, hover_data=['Description'] # Show product description on hover)fig.show()# print(f'The Overall Summary of Top 10 Countries:')print(country_stats.head(10).reset_index().to_string(index=False))