
Exploratory Data Analysis (EDA) is the first step in data analysis, where data is visually explored, summary statistics are examined, and patterns and characteristics of the dataset are identified. In this post, we will walk through the step-by-step process of exploring data using the Wine Quality Dataset.
What is EDA?
EDA (Exploratory Data Analysis) is a crucial process for gaining a deeper understanding of a dataset. The primary objectives of EDA include:
- Understanding the distribution and structure of the data
- Detecting missing values and outliers
- Identifying key characteristics and patterns in the data
- Extracting insights necessary for preprocessing and modeling
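As a rough illustration of these objectives, the sketch below computes a few per-column diagnostics with pandas: missing-value counts, basic summary statistics, and an outlier count. The function name `quick_overview`, the toy values, and the choice of the 1.5 × IQR rule for outliers are assumptions made for this example, not part of the dataset or a fixed EDA recipe.

```python
import pandas as pd


def quick_overview(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per numeric column with basic EDA diagnostics."""
    numeric = df.select_dtypes(include="number")
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    # Tukey's rule (an assumed choice): flag values more than
    # 1.5 * IQR below the first or above the third quartile.
    is_outlier = (numeric < q1 - 1.5 * iqr) | (numeric > q3 + 1.5 * iqr)
    return pd.DataFrame(
        {
            "missing": numeric.isna().sum(),
            "mean": numeric.mean(),
            "median": numeric.median(),
            "std": numeric.std(),
            "outliers": is_outlier.sum(),
        }
    )


# Tiny made-up frame, only to show what the summary looks like.
toy = pd.DataFrame(
    {
        "alcohol": [9.4, 9.8, 10.0, 10.5, 11.0, 9.9, 10.2, 25.0],
        "pH": [3.51, 3.20, None, 3.16, 3.30, 3.25, 3.40, 3.35],
    }
)
overview = quick_overview(toy)
print(overview)
```

On a real dataset, the same table gives a quick first view of which columns need cleaning before any plots are drawn.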
Dataset Information
- Dataset Name: Wine Quality Dataset
- Source: UCI Machine Learning Repository
- Dataset Composition:
- Total samples: 6,497 (red wine: 1,599; white wine: 4,898)
- Input variables (11 features): Chemical properties of wine (e.g., acidity, sugar content, pH, etc.)
- Output variable (1 target): Wine quality score (0–10)
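The red and white samples are usually distributed as two separate CSV files, so a first step is to read both and stack them. The sketch below assumes the files are semicolon-separated and that their headers use spaces (for example `fixed acidity`), which we rename to the snake_case names used in this post. The function name `load_wine` and the `wine_type` column are hypothetical helpers, and the in-memory rows are made-up stand-ins so the example runs without downloading anything.

```python
import io

import pandas as pd


def load_wine(red_source, white_source) -> pd.DataFrame:
    """Read the red and white files and stack them into one DataFrame."""
    frames = []
    for source, wine_type in ((red_source, "red"), (white_source, "white")):
        # Assumption: the files are semicolon-separated.
        df = pd.read_csv(source, sep=";")
        # Rename "fixed acidity" -> "fixed_acidity", and so on.
        df.columns = [c.strip().replace(" ", "_") for c in df.columns]
        # Hypothetical helper column so the two sources stay distinguishable.
        df["wine_type"] = wine_type
        frames.append(df)
    return pd.concat(frames, ignore_index=True)


# Made-up stand-ins with only three of the twelve columns.
red_csv = io.StringIO(
    '"fixed acidity";"volatile acidity";"quality"\n'
    "7.4;0.70;5\n"
    "7.8;0.88;5\n"
)
white_csv = io.StringIO(
    '"fixed acidity";"volatile acidity";"quality"\n'
    "7.0;0.27;6\n"
)

wine = load_wine(red_csv, white_csv)
print(wine)
```

With the real files, the same function can be called with file paths instead of `StringIO` objects, and `wine.shape` should then report 6,497 rows.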
Input Variables (Features)
1. Fixed Acidity (fixed_acidity)
- Description: Fixed acidity refers to the concentration of non-volatile acids in wine, primarily including tartaric acid and malic acid.
- Unit: g/dm³
- Role:
- Influences the tartness of the wine
- Essential for maintaining the wine’s freshness
2. Volatile Acidity (volatile_acidity)
- Description: Volatile acidity represents the concentration of acids that can evaporate, primarily acetic acid.
- Unit: g/dm³
- Role:
- High volatile acidity can give the wine a vinegar-like taste, lowering its quality
- Moderate levels add complexity to the wine’s flavor
3. Citric Acid (citric_acid)
- Description: Citric acid is a natural component that helps regulate acidity and contributes to the wine’s freshness.
- Unit: g/dm³
- Role:
- Balances and enhances the wine’s tartness
- Low levels may indicate an aged or lower-quality wine
4. Residual Sugar (residual_sugar)
- Description: Residual sugar refers to the concentration of sugar that remains after fermentation.
- Unit: g/dm³
- Role:
- Directly affects the wine’s sweetness
- Most wines range from 1 g/dm³ to 10 g/dm³, while dessert wines have much higher levels
5. Chlorides (chlorides)
- Description: Chlorides represent the concentration of salts in wine.
- Unit: g/dm³
- Role:
- Affects the saltiness of the wine
- High chloride levels can degrade the wine’s quality
6. Free Sulfur Dioxide (free_sulfur_dioxide)
- Description: The amount of sulfur dioxide (SO₂) present in a chemically free state in the wine.
- Unit: mg/dm³
- Role:
- Prevents oxidation and microbial activity, preserving the wine’s freshness
- Excessive amounts can negatively affect the wine’s flavor
7. Total Sulfur Dioxide (total_sulfur_dioxide)
- Description: The total concentration of free and bound sulfur dioxide in the wine.
- Unit: mg/dm³
- Role:
- Enhances wine preservation
- High levels may cause health concerns for sensitive individuals
8. Density (density)
- Description: The density of the wine, which is close to that of water (about 1.000 g/cm³).
- Unit: g/cm³
- Role:
- Reflects the concentration of residual sugar and alcohol
- Higher sugar content increases the density, while higher alcohol content lowers it, because ethanol is less dense than water
9. pH (pH)
- Description: The pH value measures the acidity or alkalinity of the wine.
- Range: 0–14 (typically between 3 and 4 for wine)
- Role:
- Affects the wine’s freshness and stability
- Lower pH indicates a fresher taste, while higher pH increases oxidation risk
10. Sulphates (sulphates)
- Description: Sulphates (recorded as potassium sulphate in this dataset) are sulfur-containing salts that can contribute to sulfur dioxide levels and so affect the wine’s preservation and aroma.
- Unit: g/dm³
- Role:
- Enhances the wine’s aroma and oxidation resistance
- Excessive sulphates can cause a bitter taste
11. Alcohol (alcohol)
- Description: The alcohol content of the wine.
- Unit: % (by volume)
- Role:
- Directly affects the body and flavor of the wine
- Higher alcohol content is often associated with higher-quality wines
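The feature descriptions above imply a few simple consistency checks that are worth running before any modeling. The sketch below derives bound sulfur dioxide as total minus free (since total SO₂ is defined as free plus bound) and counts rows that break basic physical expectations. The function names, the `bound_sulfur_dioxide` column, and the particular checks are assumptions chosen for illustration, and the three rows are made up, with one of them deliberately invalid.

```python
import pandas as pd


def add_bound_so2(df: pd.DataFrame) -> pd.DataFrame:
    """Add bound SO2, derived as total SO2 minus free SO2."""
    out = df.copy()
    out["bound_sulfur_dioxide"] = (
        out["total_sulfur_dioxide"] - out["free_sulfur_dioxide"]
    )
    return out


def range_violations(df: pd.DataFrame) -> pd.Series:
    """Count rows that break simple physical expectations."""
    checks = {
        # pH is defined on a 0-14 scale.
        "pH outside 0-14": ~df["pH"].between(0, 14),
        # Free SO2 is a part of total SO2, so it cannot exceed it.
        "free SO2 above total SO2": (
            df["free_sulfur_dioxide"] > df["total_sulfur_dioxide"]
        ),
        # Alcohol is a percentage by volume and cannot be negative.
        "negative alcohol": df["alcohol"] < 0,
    }
    return pd.Series({name: int(mask.sum()) for name, mask in checks.items()})


# Made-up rows; the last one is intentionally inconsistent.
toy = pd.DataFrame(
    {
        "free_sulfur_dioxide": [11.0, 25.0, 40.0],
        "total_sulfur_dioxide": [34.0, 67.0, 30.0],
        "pH": [3.51, 3.20, 15.0],
        "alcohol": [9.4, 9.8, 10.0],
    }
)
checked = add_bound_so2(toy)
violations = range_violations(toy)
print(checked["bound_sulfur_dioxide"].tolist())
print(violations)
```

A non-zero count does not say which value is wrong, only that the row deserves a closer look.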
Output Variable (Target)
Quality Score (quality)
- Description: The overall quality score of the wine, rated on a scale of 0 to 10. In practice, the recorded scores fall between 3 and 9.
- Role:
- Used as the target variable for prediction
- The goal of this dataset is to predict wine quality based on its chemical properties
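Since quality is the prediction target, a natural first look is how the scores are distributed and which features move with them. The sketch below is one possible way to do this with pandas; the function name `quality_summary` is hypothetical, Pearson correlation is an assumed choice, and the six rows are invented only to make the example runnable, so they say nothing about the real data.

```python
import pandas as pd


def quality_summary(df: pd.DataFrame, target: str = "quality"):
    """Return (score counts, feature correlations with the target)."""
    # How many wines received each score, ordered by score.
    counts = df[target].value_counts().sort_index()
    numeric = df.select_dtypes(include="number")
    # Pearson correlation of every numeric feature with the target.
    corr = numeric.corr()[target].drop(target)
    # Order by absolute strength so the strongest relationships come first.
    corr = corr.reindex(corr.abs().sort_values(ascending=False).index)
    return counts, corr


# Invented rows for illustration only.
toy = pd.DataFrame(
    {
        "alcohol": [9.0, 9.5, 10.0, 11.0, 12.0, 12.5],
        "volatile_acidity": [0.70, 0.65, 0.50, 0.40, 0.30, 0.35],
        "quality": [5, 5, 5, 6, 7, 7],
    }
)
counts, corr = quality_summary(toy)
print(counts)
print(corr)
```

Because the scores are ordinal integers, a rank-based measure such as Spearman correlation (`numeric.corr(method="spearman")`) may be a reasonable alternative to try.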