
What is an Excel File?
An Excel file is a spreadsheet format created by Microsoft Excel, commonly stored with the extensions .xlsx or .xls. Each Excel file consists of multiple sheets, where data is organized into rows and columns.
Excel is one of the most widely used formats in data analysis. Pandas provides the read_excel() function to easily handle Excel files. This guide will cover the basic structure of an Excel file, how to read it using Pandas, and common issues and solutions.
Example Excel File:
# Sheet1
Name Age City
KimSeoul 25 Seoul
LeeJeonju 30 Jeonju
SongGoungju 35 Goungju
# Sheet2
Product Sales
A 100
B 200
C 300
Loading an Excel File with Pandas
The read_excel() function in Pandas reads an Excel file and converts it into a DataFrame.
import pandas as pd
# Read an Excel file
df = pd.read_excel('data.xlsx')
# Display first few rows
print(df.head())
# output example
# Name Age City
# KimSeoul 25 Seoul
# LeeJeonju 30 Jeonju
# SongGoungju 35 Goungju
Key Parameters of read_excel()
1. sheet_name โ Specifying the Sheet to Read
By default, Pandas reads the first sheet (sheet_name=0).
df = pd.read_excel('data.xlsx', sheet_name=0) # Read the first sheet
df_sheet1 = pd.read_excel('data.xlsx', sheet_name='Sheet1') # Read a specific sheet by name
dfs = pd.read_excel('data.xlsx', sheet_name=None) # Read all sheets (returns a dictionary)
2. header โ Defining the Header Row
By default, the first row (header=0) is treated as the column names.
df = pd.read_excel('data.xlsx', header=0) # Use the first row as headers
df_no_header = pd.read_excel('data.xlsx', header=None) # Read without headers
3. usecols โ Selecting Specific Columns
Define which columns to read using column indices or names.
df_columns = pd.read_excel('data.xlsx', usecols='A:C') # Read columns A to C
df_selected = pd.read_excel('data.xlsx', usecols=['Name', 'Age']) # Read specific column names
4. skiprows โ Skipping Rows
Skip the first n rows when reading the file, useful when there is metadata or descriptions at the top.
df_skip = pd.read_excel('data.xlsx', skiprows=2) # Skip the first 2 rows
5. dtype โ Specifying Column Data Types
Define the data type of specific columns when reading the file.
df = pd.read_excel('data.xlsx', dtype={'Age': int}) # Read 'Age' column as integers
Handling Errors in read_excel()
Like read_csv(), read_excel() may encounter errors related to encoding, missing data, or formatting issues. Refer to previous guides on handling errors in Pandas file operations for troubleshooting tips.
By understanding and utilizing these parameters, loading and processing Excel files with Pandas becomes more efficient and flexible.
2025.02.11 - [๊ฐ๋ฐ Code/ํ์ด์ฌ Python] - [Python][pandas] Loading Data - CSV
[Python][pandas] Loading Data - CSV
What is a CSV File?One of the most commonly used formats in data analysis is CSV (Comma-Separated Values). CSV files store data in a simple text format, where values are separated by commas (or other delimiters).Pandas provides a powerful function, read_cs
5hr1rnp.tistory.com
'๊ฐ๋ฐ Code > ํ์ด์ฌ Python' ์นดํ ๊ณ ๋ฆฌ์ ๋ค๋ฅธ ๊ธ
[Python][pandas] Parquet ํ์ผ ํฌ๋งท: ๊ณ ์ ๋ฐ์ดํฐ ์ฒ๋ฆฌ์ ์ต์ ํ๋ ์ปฌ๋ผ ์ ์ฅ ๋ฐฉ์ (0) | 2025.02.19 |
---|---|
[Python][pandas] Sorting Data - sort (0) | 2025.02.13 |
[Python][program] CLI ASCII art ๋ฐ๋ ํ์ธ ๋ฉ์ธ์ง ์ฐ๊ธฐ (0) | 2025.02.12 |
[Python][pandas] Loading Data - CSV (0) | 2025.02.11 |
[Python][pandas] Exploring pandas in Depth (0) | 2025.02.11 |