This guide covers various methods for sorting data in Pandas, including the primary sorting functions sort_values() and sort_index(), as well as nlargest(), nsmallest(), reindex(), and the use of the key parameter in sort_values().
1. Sorting with sort_values()
The sort_values() method sorts a DataFrame based on column values. It is the most commonly used sorting function.
import pandas as pd
df = pd.DataFrame({
'A': [3, 1, 2, 4],
'B': [10, 50, 20, 20],
'C': ['b', 'a', 'b', 'c']
})
# 1) Sorting by a single column
df_sorted_single = df.sort_values(by='A')
print(df_sorted_single)
# A B C
# 1 1 50 a
# 2 2 20 b
# 0 3 10 b
# 3 4 20 c
# 2) Sorting by multiple columns
df_sorted_multi = df.sort_values(by=['B', 'A'], ascending=[True, False])
print(df_sorted_multi)
# A B C
# 0 3 10 b
# 3 4 20 c
# 2 2 20 b
# 1 1 50 a
Key Parameters
- by: Column name(s) to sort by.
- ascending: True for ascending order, False for descending order. Can be a list for multiple columns.
- inplace: If True, sorts the original DataFrame in place and returns None. The default (False) returns a new sorted DataFrame.
- na_position: 'first' or 'last' (default) to specify where NaN values should appear.
- key: (Pandas 1.1.0+) A function applied to each sort column before sorting. It receives the column as a Series and must return a Series of the same length.
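To see what na_position does in practice, here is a minimal sketch. The DataFrame df_nan and its values are made up for illustration and are not part of the examples above.

```python
import numpy as np
import pandas as pd

# Hypothetical data: column 'B' contains one missing value.
df_nan = pd.DataFrame({
    'A': [3, 1, 2, 4],
    'B': [10.0, np.nan, 20.0, 5.0],
})

# Default (na_position='last'): the NaN row is placed at the end.
nan_last = df_nan.sort_values(by='B')

# na_position='first' moves the NaN row to the top instead.
nan_first = df_nan.sort_values(by='B', na_position='first')

print(nan_last.index.tolist())   # [3, 0, 2, 1]
print(nan_first.index.tolist())  # [1, 3, 0, 2]
```
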
Example: Sorting Without Case Sensitivity
df.sort_values(by='C', key=lambda col: col.str.lower())
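Column 'C' above only contains lowercase letters, so the effect of key is not visible there. The sketch below uses a small made-up DataFrame (df_case) with mixed-case strings to show the difference.

```python
import pandas as pd

# Hypothetical data with mixed-case strings.
df_case = pd.DataFrame({'C': ['banana', 'Cherry', 'apple']})

# Default sort: uppercase letters compare lower than lowercase ones.
default_order = df_case.sort_values(by='C')['C'].tolist()

# key lowercases the whole column before comparison,
# so the sort ignores case. The stored values are unchanged.
ci_order = df_case.sort_values(
    by='C', key=lambda col: col.str.lower()
)['C'].tolist()

print(default_order)  # ['Cherry', 'apple', 'banana']
print(ci_order)       # ['apple', 'banana', 'Cherry']
```
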
2. Sorting with sort_index()
The sort_index() method sorts a DataFrame based on index labels (row or column names).
df = pd.DataFrame({
'A': [3, 1, 2],
'B': [2, 3, 1]
}, index=['c', 'a', 'b'])
# Sorting by index (ascending)
df_sorted_index = df.sort_index()
print(df_sorted_index)
# A B
# a 1 3
# b 2 1
# c 3 2
# Sorting by index (descending)
df_sorted_index_desc = df.sort_index(ascending=False)
print(df_sorted_index_desc)
# A B
# c 3 2
# b 2 1
# a 1 3
# Sorting by column names
df_sorted_columns = df.sort_index(axis=1)
print(df_sorted_columns)
# A B
# c 3 2
# a 1 3
# b 2 1
Key Parameters
- axis: 0 (default) for row index, 1 for column names.
- ascending: True (default) for ascending order, False for descending.
- inplace: If True, sorts the original DataFrame in place and returns None. The default (False) returns a new sorted DataFrame.
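The difference between the default behavior and inplace=True is easy to miss, so here is a short sketch. The name df_idx is only used for this illustration.

```python
import pandas as pd

df_idx = pd.DataFrame({'A': [3, 1, 2]}, index=['c', 'a', 'b'])

# Without inplace, a new sorted DataFrame is returned
# and df_idx keeps its original order.
sorted_copy = df_idx.sort_index()
print(df_idx.index.tolist())       # ['c', 'a', 'b']
print(sorted_copy.index.tolist())  # ['a', 'b', 'c']

# With inplace=True, df_idx itself is reordered and the call returns None.
result = df_idx.sort_index(inplace=True)
print(result)                      # None
print(df_idx.index.tolist())       # ['a', 'b', 'c']
```

Assigning the result of an inplace=True call back to the variable is a common mistake, because it replaces the DataFrame with None.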
3. Selecting Top or Bottom n Values: nlargest() and nsmallest()
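The nlargest() and nsmallest() methods return the n rows with the largest or smallest values in a specified column, ordered by that column. The result is comparable to sort_values() followed by head(n), but these methods are usually faster because they do not need to sort the entire DataFrame.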
# Selecting top 3 values from column 'B'
df_top3 = df.nlargest(3, 'B')
# A B
# a 1 3
# c 3 2
# b 2 1
# Selecting bottom 3 values from column 'B'
df_bottom3 = df.nsmallest(3, 'B')
# A B
# b 2 1
# c 3 2
# a 1 3
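When several rows tie at the cutoff, the keep parameter of nlargest() and nsmallest() decides which of them are returned. The sketch below uses a made-up DataFrame (df_scores) with a tie in column 'B'.

```python
import pandas as pd

# Hypothetical data: rows 2 and 3 tie on B == 20.
df_scores = pd.DataFrame({
    'A': [3, 1, 2, 4],
    'B': [10, 50, 20, 20],
})

# keep='first' (default): the first tied row in the original order wins.
top2_first = df_scores.nlargest(2, 'B')
print(top2_first.index.tolist())  # [1, 2]

# keep='last': the last tied row wins instead.
top2_last = df_scores.nlargest(2, 'B', keep='last')
print(top2_last.index.tolist())   # [1, 3]

# keep='all': every tied row is kept, so more than n rows may come back.
top2_all = df_scores.nlargest(2, 'B', keep='all')
print(len(top2_all))              # 3
```
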
4. Reindexing with reindex()
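The reindex() method rearranges rows or columns to match a provided list of labels. While not strictly a sorting method, it can be used to reorder a DataFrame. Labels that are not in the list are dropped, and labels in the list that do not exist in the DataFrame produce rows (or columns) filled with NaN.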
# Sorting index manually
sorted_idx = sorted(df.index)
df_reindexed = df.reindex(sorted_idx)
print(df_reindexed)
# A B
# a 1 3
# b 2 1
# c 3 2
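reindex() is most useful when the desired order is not a plain sort. The following sketch applies an arbitrary label order and includes one label ('d') that does not exist, to show how missing labels are handled. The name df_r is used only for this illustration.

```python
import pandas as pd

df_r = pd.DataFrame({'A': [3, 1, 2], 'B': [2, 3, 1]}, index=['c', 'a', 'b'])

# Custom order: 'c' is left out and 'd' does not exist in df_r.
custom = df_r.reindex(['b', 'd', 'a'])
print(custom)
#      A    B
# b  2.0  1.0
# d  NaN  NaN
# a  1.0  3.0

# .loc with a list of existing labels reorders rows without adding NaN.
# It raises KeyError if any label is missing.
strict = df_r.loc[['b', 'a']]
print(strict['A'].tolist())  # [2, 1]
```

Note that the integer columns become float in the reindexed result because NaN was introduced.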
5. Sorting MultiIndex DataFrames
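For DataFrames with a MultiIndex, sort_index() accepts a level parameter that selects which index levels to sort by. The ascending parameter can be a list with one entry per level.
# df must have a MultiIndex with at least two levels
df_multi = df.sort_index(level=[0, 1], ascending=[True, False])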
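The DataFrames used so far have a single-level index, so here is a self-contained sketch with a two-level index. The level names ('group', 'item') and the values are made up for illustration.

```python
import pandas as pd

# Hypothetical two-level index: (group, item).
index = pd.MultiIndex.from_tuples(
    [('y', 1), ('x', 2), ('y', 2), ('x', 1)],
    names=['group', 'item'],
)
df_mi = pd.DataFrame({'value': [10, 20, 30, 40]}, index=index)

# Sort level 0 ascending, then level 1 descending within each group.
df_mi_sorted = df_mi.sort_index(level=[0, 1], ascending=[True, False])

print(df_mi_sorted.index.tolist())
# [('x', 2), ('x', 1), ('y', 2), ('y', 1)]
print(df_mi_sorted['value'].tolist())
# [20, 40, 30, 10]
```

Level names can be passed instead of positions, for example level=['group', 'item'].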
Summary of Sorting Methods
Method | Description |
---|---|
sort_values() | Sorts by column values (ascending/descending, multiple columns, custom sorting with key). |
sort_index() | Sorts by row or column labels. |
nlargest() / nsmallest() | Quickly extracts the top/bottom n rows from a specific column. |
reindex() | Reorders rows/columns based on a given index list. |
MultiIndex Sorting | Uses sort_index(level=...) for fine-grained control over MultiIndex sorting. |
Most sorting tasks can be handled with sort_values() and sort_index(), while nlargest() and nsmallest() are useful for quickly extracting top or bottom values. The key parameter and MultiIndex sorting provide additional flexibility when needed.