
[Python][AI] Exploratory Data Analysis (EDA) - Wine Quality Dataset - 3

5hr1rnp 2025. 2. 10. 16:48

 

2025.02.10 - [Development Code/Artificial Intelligence A.I.] - [Python][AI] Exploratory Data Analysis (EDA) - Wine Quality Dataset - 1

2025.02.10 - [Development Code/Artificial Intelligence A.I.] - [Python][AI] Exploratory Data Analysis (EDA) - Wine Quality Dataset - 2

 

In this section, we will visualize the relationships between variables and identify key patterns in the dataset.


Wine Quality Distribution & Correlation Analysis


# Library Version
# pandas    : 2.2.3
# numpy     : 1.23.5
# matplotlib: 3.9.2
# seaborn   : 0.13.2

import matplotlib.pyplot as plt
import seaborn as sns

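# Wine quality distribution visualization
# Assumes red_wine and white_wine are the DataFrames loaded earlier in this series.
# discrete=True gives each integer quality score its own bar; a fixed bins=6
# would bin the two wines differently if their score ranges differ.
plt.figure(figsize=(10, 5))
sns.histplot(red_wine['quality'], discrete=True, kde=True, color='red', label='Red Wine')
sns.histplot(white_wine['quality'], discrete=True, kde=True, color='blue', label='White Wine')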
plt.legend()
plt.title("Wine Quality Distribution (Red & White)")
plt.xlabel("Quality")
plt.ylabel("Count")
plt.grid(True)
plt.show()

# Correlation heatmap for Red Wine
plt.figure(figsize=(12, 8))
# numeric_only=True skips any non-numeric column instead of raising an error
sns.heatmap(red_wine.corr(numeric_only=True), annot=True, cmap='coolwarm', fmt=".2f", linewidths=0.5)
plt.title("Correlation Heatmap (Red Wine)")
plt.show()

# Correlation heatmap for White Wine
plt.figure(figsize=(12, 8))
# numeric_only=True skips any non-numeric column instead of raising an error
sns.heatmap(white_wine.corr(numeric_only=True), annot=True, cmap='coolwarm', fmt=".2f", linewidths=0.5)
plt.title("Correlation Heatmap (White Wine)")
plt.show()

Observations:

  • The distribution of quality scores differs between red and white wines.
  • Mid-range quality (5–6) is the most common, while extremely high-quality wines (8+) are rare.

Wine Quality Distribution
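
Because the two datasets contain different numbers of samples, raw counts can make the shapes hard to compare. As a rough sketch, the share of wines at each quality score can be tabulated side by side. The helper name quality_share and the toy DataFrames below are hypothetical stand-ins for illustration; in practice red_wine and white_wine would be passed in instead.

```python
import pandas as pd


def quality_share(df: pd.DataFrame, column: str = "quality") -> pd.Series:
    """Return the fraction of rows at each quality score, sorted by score."""
    return df[column].value_counts(normalize=True).sort_index()


# Hypothetical toy data standing in for red_wine / white_wine
toy_red = pd.DataFrame({"quality": [5, 5, 6, 6, 6, 7]})
toy_white = pd.DataFrame({"quality": [5, 6, 6, 6, 6, 7, 7, 8]})

# Align both wines on the union of observed scores; a score that never
# occurs for one wine gets a share of 0.
shares = pd.DataFrame({
    "red": quality_share(toy_red),
    "white": quality_share(toy_white),
}).fillna(0.0)

print(shares)
```

Each column sums to 1, so the two wines can be compared directly regardless of sample size.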


 

Observations from the Heatmap:

  • Red Wine:
    • Alcohol has the strongest positive correlation with quality of any feature.
  • White Wine:
    • Alcohol also positively correlates with quality.
    • Volatile acidity shows a negative correlation with quality.
  • Overall Trend:
    • Density and quality show a negative correlation in both red and white wines.

Red Wine Variable Correlation
White Wine Variable Correlation
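
Reading individual cells off a large heatmap is error-prone, so it can help to pull out only the column of correlations with quality and rank it. The following is a minimal sketch, assuming a DataFrame with a numeric quality column; the function name quality_correlations and the toy values are hypothetical and are not taken from the dataset.

```python
import pandas as pd


def quality_correlations(df: pd.DataFrame, target: str = "quality") -> pd.Series:
    """Pearson correlation of every numeric feature with the target,
    ordered from strongest to weakest by absolute value."""
    corr = df.corr(numeric_only=True)[target].drop(target)
    return corr.sort_values(key=lambda s: s.abs(), ascending=False)


# Hypothetical toy data; replace with red_wine or white_wine
toy = pd.DataFrame({
    "alcohol": [9.0, 9.5, 10.0, 11.0, 12.0, 12.5],
    "volatile acidity": [0.70, 0.65, 0.50, 0.45, 0.30, 0.35],
    "density": [0.998, 0.997, 0.9975, 0.996, 0.995, 0.9955],
    "quality": [4, 5, 5, 6, 7, 7],
})

ranked = quality_correlations(toy)
print(ranked)
```

Sorting by absolute value keeps strong negative relationships, such as volatile acidity, near the top alongside strong positive ones.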


Violin Plot Analysis


To analyze key variables in relation to wine quality, we use violin plots.

# Select key features for visualization
features = ['alcohol', 'volatile acidity', 'density', 'sulphates', 'citric acid']

# Violin plot for Red Wine
plt.figure(figsize=(15, 10))
for i, feature in enumerate(features, 1):
    plt.subplot(2, 3, i)
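    # seaborn 0.13 deprecates palette without hue, so map hue to quality and hide the legend
    sns.violinplot(data=red_wine, x='quality', y=feature, hue='quality', palette="Reds", legend=False)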
    plt.title(f"{feature} vs Quality (Red Wine)")
    plt.xlabel("Quality")
    plt.ylabel(feature)
    plt.grid(True)

plt.tight_layout()
plt.show()

# Violin plot for White Wine
plt.figure(figsize=(15, 10))
for i, feature in enumerate(features, 1):
    plt.subplot(2, 3, i)
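    # seaborn 0.13 deprecates palette without hue, so map hue to quality and hide the legend
    sns.violinplot(data=white_wine, x='quality', y=feature, hue='quality', palette="Blues", legend=False)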
    plt.title(f"{feature} vs Quality (White Wine)")
    plt.xlabel("Quality")
    plt.ylabel(feature)
    plt.grid(True)

plt.tight_layout()
plt.show()

Observations from Violin Plots:

  • Alcohol:
    • Higher alcohol content is associated with higher-quality wines.
    • Especially in quality 7–8 wines, alcohol content is noticeably higher.
  • Volatile Acidity:
    • Lower-quality wines (4–5 range) have higher volatile acidity.
    • High volatile acidity is therefore associated with lower quality; the plots show an association, not a causal effect.
  • Density:
    • Lower-quality wines tend to have higher density.
    • This trend is particularly strong in white wines.
  • Sulphates:
    • Wines with quality 7+ tend to have slightly higher sulphate levels.
  • Citric Acid:
    • Higher citric acid concentrations are generally observed in higher-quality wines, but the difference between quality levels is small in some cases.

Distribution of Key Features by Quality (Red Wine)
Distribution of Key Features by Quality (White Wine)
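
The trends read off the violin plots can be cross-checked numerically by taking the median of each feature per quality score. This is only a sketch under stated assumptions: median_by_quality is a hypothetical helper and the toy values are invented for illustration. With the real data, red_wine or white_wine and the features list defined above would be passed in.

```python
import pandas as pd


def median_by_quality(df: pd.DataFrame, features: list, target: str = "quality") -> pd.DataFrame:
    """Median of each selected feature within every quality score."""
    return df.groupby(target)[features].median()


# Hypothetical toy data; replace with red_wine or white_wine
toy = pd.DataFrame({
    "quality": [5, 5, 6, 6, 7, 7],
    "alcohol": [9.2, 9.6, 10.1, 10.5, 11.4, 12.0],
    "volatile acidity": [0.66, 0.70, 0.52, 0.48, 0.36, 0.40],
})

medians = median_by_quality(toy, ["alcohol", "volatile acidity"])
print(medians)
```

A median that rises or falls steadily across quality scores supports the pattern seen in the corresponding violin plot, while a flat median suggests the visual difference is weak.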

 
