In this section, we will visualize the relationships between variables and identify key patterns in the dataset.
Wine Quality Distribution & Correlation Analysis
# Library Version
# pandas : 2.2.3
# numpy : 1.23.5
# matplotlib: 3.9.2
# seaborn : 0.13.2
import matplotlib.pyplot as plt
import seaborn as sns
# Wine quality distribution visualization
plt.figure(figsize=(10, 5))
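# discrete=True centers one bar on each integer score, so red and white share the same bins
sns.histplot(red_wine['quality'], discrete=True, kde=True, color='red', label='Red Wine')
sns.histplot(white_wine['quality'], discrete=True, kde=True, color='blue', label='White Wine')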
plt.legend()
plt.title("Wine Quality Distribution (Red & White)")
plt.xlabel("Quality")
plt.ylabel("Count")
plt.grid(True)
plt.show()
# Correlation heatmap for Red Wine
plt.figure(figsize=(12, 8))
sns.heatmap(red_wine.corr(numeric_only=True), annot=True, cmap='coolwarm', fmt=".2f", linewidths=0.5)
plt.title("Correlation Heatmap (Red Wine)")
plt.show()
# Correlation heatmap for White Wine
plt.figure(figsize=(12, 8))
sns.heatmap(white_wine.corr(numeric_only=True), annot=True, cmap='coolwarm', fmt=".2f", linewidths=0.5)
plt.title("Correlation Heatmap (White Wine)")
plt.show()
Observations:
- The distribution of quality scores differs between red and white wines.
- Mid-range quality (5–6) is the most common, while extremely high-quality wines (8+) are rare.
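Overlaid raw counts can be hard to compare when the two datasets differ in size, so it may help to look at the share of wines at each quality score instead. The following is a minimal sketch, not part of the original analysis: quality_shares is a hypothetical helper, and red_demo and white_demo are tiny made-up frames standing in for red_wine and white_wine.

```python
import pandas as pd

def quality_shares(df: pd.DataFrame, column: str = "quality") -> pd.Series:
    """Return the fraction of rows at each quality score, sorted by score."""
    return df[column].value_counts(normalize=True).sort_index()

# Made-up stand-ins for red_wine and white_wine (illustration only).
red_demo = pd.DataFrame({"quality": [5, 5, 6, 6, 6, 7, 4, 5]})
white_demo = pd.DataFrame({"quality": [6, 6, 5, 7, 6, 8, 5, 6, 6, 7]})

# Align both series on the union of scores; a score missing from one set gets 0.
shares = pd.DataFrame({
    "red": quality_shares(red_demo),
    "white": quality_shares(white_demo),
}).fillna(0.0)

print(shares.round(3))
```

With the real data, passing red_wine and white_wine to the helper would give one comparable column per wine type.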
Observations from the Heatmap:
- Red Wine:
  - Alcohol has a strong positive correlation with quality.
- White Wine:
  - Alcohol also positively correlates with quality.
  - Volatile acidity shows a negative correlation with quality.
- Overall Trend:
  - Density and quality show a negative correlation in both red and white wines.
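Reading individual cells off a large heatmap is error-prone, so the quality column of the correlation matrix can also be pulled out and sorted. This is a rough sketch under assumptions: quality_correlations is a hypothetical helper, and the demo frame holds invented values rather than rows from the dataset.

```python
import pandas as pd

def quality_correlations(df: pd.DataFrame, target: str = "quality") -> pd.Series:
    """Pearson correlation of every numeric column with the target, strongest positive first."""
    corr = df.corr(numeric_only=True)[target].drop(target)
    return corr.sort_values(ascending=False)

# Invented values for illustration only; replace with red_wine or white_wine.
demo = pd.DataFrame({
    "alcohol": [9.0, 9.5, 10.0, 11.0, 12.0, 12.5],
    "volatile acidity": [0.70, 0.65, 0.50, 0.45, 0.30, 0.35],
    "density": [0.9980, 0.9975, 0.9970, 0.9962, 0.9950, 0.9948],
    "quality": [4, 5, 5, 6, 7, 8],
})

ranked = quality_correlations(demo)
print(ranked.round(2))
```

Calling quality_correlations(red_wine) and quality_correlations(white_wine) would list the same numbers shown in the quality row of each heatmap, in ranked order.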
Violin Plot Analysis
To analyze key variables in relation to wine quality, we use violin plots.
# Select key features for visualization
features = ['alcohol', 'volatile acidity', 'density', 'sulphates', 'citric acid']
# Violin plot for Red Wine
plt.figure(figsize=(15, 10))
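for i, feature in enumerate(features, 1):
    plt.subplot(2, 3, i)
    # hue='quality' with legend=False keeps per-score colors; palette without hue is deprecated in seaborn 0.13
    sns.violinplot(data=red_wine, x='quality', y=feature, hue='quality', palette="Reds", legend=False)
    plt.title(f"{feature} vs Quality (Red Wine)")
    plt.xlabel("Quality")
    plt.ylabel(feature)
    plt.grid(True)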
plt.tight_layout()
plt.show()
# Violin plot for White Wine
plt.figure(figsize=(15, 10))
for i, feature in enumerate(features, 1):
    plt.subplot(2, 3, i)
    # hue='quality' with legend=False keeps per-score colors; palette without hue is deprecated in seaborn 0.13
    sns.violinplot(data=white_wine, x='quality', y=feature, hue='quality', palette="Blues", legend=False)
    plt.title(f"{feature} vs Quality (White Wine)")
    plt.xlabel("Quality")
    plt.ylabel(feature)
    plt.grid(True)
plt.tight_layout()
plt.show()
Observations from Violin Plots:
- Alcohol:
  - Higher alcohol content is associated with higher-quality wines.
  - Alcohol content is noticeably higher in wines rated 7–8.
- Volatile Acidity:
  - Lower-quality wines (4–5 range) have higher volatile acidity.
  - High volatile acidity is therefore associated with lower quality, although the plots alone do not establish a causal effect.
- Density:
  - Lower-quality wines tend to have higher density.
  - This trend is particularly strong in white wines.
- Sulphates:
  - Wines with quality 7+ tend to have slightly higher sulphate levels.
- Citric Acid:
  - Higher citric acid concentrations are generally observed in higher-quality wines, but at some quality levels the difference is small.
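The trends read off the violin plots can be cross-checked numerically by taking the median of each feature per quality score. The sketch below is only illustrative: median_by_quality is a hypothetical helper and the demo values are invented, so the printed numbers say nothing about the real dataset.

```python
import pandas as pd

def median_by_quality(df: pd.DataFrame, features: list[str], target: str = "quality") -> pd.DataFrame:
    """Median of each selected feature within every quality score."""
    return df.groupby(target)[features].median()

# Invented values for illustration only; replace with red_wine or white_wine.
demo = pd.DataFrame({
    "quality": [5, 5, 6, 6, 7, 7],
    "alcohol": [9.4, 9.8, 10.2, 10.6, 11.4, 12.0],
    "volatile acidity": [0.62, 0.58, 0.50, 0.46, 0.38, 0.34],
})

summary = median_by_quality(demo, ["alcohol", "volatile acidity"])
print(summary)
```

With the real data, median_by_quality(red_wine, features) would produce one row per quality score, which makes it easy to see whether a median rises or falls steadily across scores.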