๊ฐœ๋ฐœ Code/ํŒŒ์ด์ฌ Python

[Python][pandas] Loading Data - Excel

5hr1rnp 2025. 2. 13. 21:19
๋ฐ˜์‘ํ˜•

What is an Excel File?


An Excel file is a spreadsheet format created by Microsoft Excel, commonly stored with the extensions .xlsx or .xls. Each Excel file consists of multiple sheets, where data is organized into rows and columns.

Excel is one of the most widely used formats in data analysis. Pandas provides the read_excel() function to easily handle Excel files. This guide will cover the basic structure of an Excel file, how to read it using Pandas, and common issues and solutions.

Example Excel File:

# Sheet1
Name		Age	City
KimSeoul	25	Seoul
LeeJeonju	30	Jeonju
SongGoungju	35	Goungju

# Sheet2
Product    Sales
A          100
B          200
C          300

Loading an Excel File with Pandas


The read_excel() function in Pandas reads an Excel file and converts it into a DataFrame.

 

import pandas as pd

# Read an Excel file
df = pd.read_excel('data.xlsx')

# Display first few rows
print(df.head())

# output example
# Name		Age	City
# KimSeoul	25	Seoul
# LeeJeonju	30	Jeonju
# SongGoungju	35	Goungju
 
 

728x90
๋ฐ˜์‘ํ˜•

Key Parameters of read_excel()


1. sheet_name โ€“ Specifying the Sheet to Read

By default, Pandas reads the first sheet (sheet_name=0).

 

df = pd.read_excel('data.xlsx', sheet_name=0)  # Read the first sheet
df_sheet1 = pd.read_excel('data.xlsx', sheet_name='Sheet1')  # Read a specific sheet by name
dfs = pd.read_excel('data.xlsx', sheet_name=None)  # Read all sheets (returns a dictionary)

2. header โ€“ Defining the Header Row

By default, the first row (header=0) is treated as the column names.

df = pd.read_excel('data.xlsx', header=0)  # Use the first row as headers
df_no_header = pd.read_excel('data.xlsx', header=None)  # Read without headers

3. usecols โ€“ Selecting Specific Columns

Define which columns to read using column indices or names.

 

df_columns = pd.read_excel('data.xlsx', usecols='A:C')  # Read columns A to C
df_selected = pd.read_excel('data.xlsx', usecols=['Name', 'Age'])  # Read specific column names
 

4. skiprows โ€“ Skipping Rows

Skip the first n rows when reading the file, useful when there is metadata or descriptions at the top.

 

df_skip = pd.read_excel('data.xlsx', skiprows=2)  # Skip the first 2 rows

5. dtype โ€“ Specifying Column Data Types

Define the data type of specific columns when reading the file.

 

df = pd.read_excel('data.xlsx', dtype={'Age': int})  # Read 'Age' column as integers

Handling Errors in read_excel()


Like read_csv(), read_excel() may encounter errors related to encoding, missing data, or formatting issues. Refer to previous guides on handling errors in Pandas file operations for troubleshooting tips.

By understanding and utilizing these parameters, loading and processing Excel files with Pandas becomes more efficient and flexible.

 

2025.02.11 - [๊ฐœ๋ฐœ Code/ํŒŒ์ด์ฌ Python] - [Python][pandas] Loading Data - CSV

 

[Python][pandas] Loading Data - CSV

What is a CSV File?One of the most commonly used formats in data analysis is CSV (Comma-Separated Values). CSV files store data in a simple text format, where values are separated by commas (or other delimiters).Pandas provides a powerful function, read_cs

5hr1rnp.tistory.com

 

๋ฐ˜์‘ํ˜•