
[Python][pandas] Sorting Data - sort

5hr1rnp 2025. 2. 13. 21:27
๋ฐ˜์‘ํ˜•

 

This guide covers various methods for sorting data in Pandas, including the primary sorting functions sort_values() and sort_index(), as well as nlargest(), nsmallest(), reindex(), and the use of the key parameter in sort_values().


1. Sorting with sort_values()


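The sort_values() method sorts a DataFrame by the values in one or more columns and returns a new DataFrame. Each row keeps its original index label, as the outputs below show. It is the method most sorting tasks start with.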

 

import pandas as pd

df = pd.DataFrame({
    'A': [3, 1, 2, 4],
    'B': [10, 50, 20, 20],
    'C': ['b', 'a', 'b', 'c']
})

# 1) Sorting by a single column
df_sorted_single = df.sort_values(by='A')
print(df_sorted_single)

#    A   B  C
# 1  1  50  a
# 2  2  20  b
# 0  3  10  b
# 3  4  20  c

# 2) Sorting by multiple columns
df_sorted_multi = df.sort_values(by=['B', 'A'], ascending=[True, False])
print(df_sorted_multi)

#    A   B  C
# 0  3  10  b
# 3  4  20  c
# 2  2  20  b
# 1  1  50  a

Key Parameters

  • by: Column name(s) to sort by.
  • ascending: True for ascending order, False for descending order. Can be a list for multiple columns.
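  • inplace: True sorts the original DataFrame in place and returns None; the default (False) returns a new DataFrame.
  • na_position: 'first' or 'last' (default) to specify where NaN values should appear.
  • key: (Pandas 1.1.0+) A function applied to each sort column before sorting. It receives the whole column as a Series and must return a Series of the same length.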
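
The na_position and inplace parameters are easiest to see on a column that contains a missing value. The following is a minimal sketch; the `scores` DataFrame and its column names are made up for illustration and are not part of the examples above.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing score
scores = pd.DataFrame({
    'name': ['x', 'y', 'z'],
    'score': [2.0, np.nan, 1.0],
})

# By default, NaN rows are placed last
default_order = scores.sort_values(by='score')

# na_position='first' moves NaN rows to the top instead
nan_first = scores.sort_values(by='score', na_position='first')

# inplace=True reorders `scores` itself and returns None
result = scores.sort_values(by='score', ascending=False, inplace=True)

print(default_order['name'].tolist())  # ['z', 'x', 'y']
print(nan_first['name'].tolist())      # ['y', 'z', 'x']
print(result)                          # None
print(scores['name'].tolist())         # ['x', 'z', 'y']
```

Note that NaN stays last even in the descending sort, because na_position is independent of ascending.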

Example: Case-Insensitive Sorting

df.sort_values(by='C', key=lambda col: col.str.lower())
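
Column 'C' in the DataFrame above holds only lowercase letters, so the key function makes no visible difference there. A small sketch with mixed-case strings shows the effect more clearly; the `fruits` DataFrame is a hypothetical example, not data from this post.

```python
import pandas as pd

# Hypothetical mixed-case data
fruits = pd.DataFrame({'C': ['apple', 'Banana', 'cherry']})

# Default string sorting compares code points, so uppercase letters come first
plain = fruits.sort_values(by='C')

# The key function receives the whole column as a Series and returns
# a lowercased Series that is used only for comparison
folded = fruits.sort_values(by='C', key=lambda col: col.str.lower())

print(plain['C'].tolist())   # ['Banana', 'apple', 'cherry']
print(folded['C'].tolist())  # ['apple', 'Banana', 'cherry']
```

The original values are returned unchanged; only the ordering is affected by the key.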
 

2. Sorting with sort_index()


The sort_index() method sorts a DataFrame based on index labels (row or column names).

 

df = pd.DataFrame({
    'A': [3, 1, 2],
    'B': [2, 3, 1]
}, index=['c', 'a', 'b'])

# Sorting by index (ascending)
df_sorted_index = df.sort_index()
print(df_sorted_index)

#    A  B
# a  1  3
# b  2  1
# c  3  2

# Sorting by index (descending)
df_sorted_index_desc = df.sort_index(ascending=False)
print(df_sorted_index_desc)

#    A  B
# c  3  2
# b  2  1
# a  1  3

# Sorting by column names
df_sorted_columns = df.sort_index(axis=1)
print(df_sorted_columns)

#    A  B
# c  3  2
# a  1  3
# b  2  1

Key Parameters

  • axis: 0 (default) sorts by row labels, 1 sorts by column names.
  • ascending: True (default) for ascending order, False for descending.
  • inplace: True sorts the original DataFrame in place and returns None.
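
In the example above the columns are already in alphabetical order, so axis=1 leaves the output unchanged. Here is a short sketch with columns deliberately out of order; the `wide` DataFrame is an assumed example for illustration.

```python
import pandas as pd

# Hypothetical frame whose columns are not in alphabetical order
wide = pd.DataFrame(
    {'B': [2, 3, 1], 'C': [9, 8, 7], 'A': [3, 1, 2]},
    index=['c', 'a', 'b'],
)

# axis=1 sorts the column labels; row order is untouched
by_columns = wide.sort_index(axis=1)

# axis=0 (default) with ascending=False sorts row labels in reverse
by_rows_desc = wide.sort_index(ascending=False)

print(by_columns.columns.tolist())  # ['A', 'B', 'C']
print(by_columns.index.tolist())    # ['c', 'a', 'b']
print(by_rows_desc.index.tolist())  # ['c', 'b', 'a']
```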


3. Selecting Top or Bottom n Values: nlargest() and nsmallest()


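The nlargest() and nsmallest() methods return the n rows with the largest or smallest values in a specified column, ordered by that column. The result matches sort_values() followed by head(n), but the pandas documentation describes these methods as more performant because they avoid sorting the whole DataFrame.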

 

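# Rows with the 3 largest values in column 'B'
# (df from section 2 has only 3 rows, so all rows are returned, largest first)
df_top3 = df.nlargest(3, 'B')
print(df_top3)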

#    A  B
# a  1  3
# c  3  2
# b  2  1

# Rows with the 3 smallest values in column 'B' (smallest first)
df_bottom3 = df.nsmallest(3, 'B')
print(df_bottom3)

#    A  B
# b  2  1
# c  3  2
# a  1  3
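
Because the DataFrame above has only three rows, asking for three rows returns everything. The following sketch uses a slightly larger, made-up `sales` DataFrame with a tie in it, to show how n limits the result and how the keep parameter of nlargest() treats tied values.

```python
import pandas as pd

# Hypothetical data; 'r' and 't' are tied at qty == 7
sales = pd.DataFrame({
    'item': ['p', 'q', 'r', 's', 't'],
    'qty': [5, 9, 7, 2, 7],
})

# Default keep='first': among tied rows, the earliest one is kept
top2 = sales.nlargest(2, 'qty')

# keep='all' returns every row tied for the last place, so more than n rows
top2_all = sales.nlargest(2, 'qty', keep='all')

# The two smallest quantities, smallest first
bottom2 = sales.nsmallest(2, 'qty')

print(top2['item'].tolist())     # ['q', 'r']
print(len(top2_all))             # 3
print(bottom2['item'].tolist())  # ['s', 'p']
```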
 

4. Reindexing with reindex()


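The reindex() method conforms a DataFrame to a given list of row or column labels, in the order provided. While not strictly a sorting method, it can be used to reorder a DataFrame. Labels in the list that do not exist in the original produce rows or columns filled with NaN.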

 

# Sorting index manually
sorted_idx = sorted(df.index)
df_reindexed = df.reindex(sorted_idx)
print(df_reindexed)

#    A  B
# a  1  3
# b  2  1
# c  3  2
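
reindex() is most useful when the desired order is not a plain sort. The sketch below rebuilds the three-row DataFrame from section 2 and applies an arbitrary label order, a column reorder, and a label ('z') that does not exist; the label lists are chosen purely for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [3, 1, 2], 'B': [2, 3, 1]}, index=['c', 'a', 'b'])

# Arbitrary row order that no sort would produce
custom = df.reindex(['b', 'c', 'a'])

# Reorder columns with the columns= argument
swapped = df.reindex(columns=['B', 'A'])

# 'z' is not in the original index, so its row is filled with NaN
with_missing = df.reindex(['a', 'z'])

print(custom.index.tolist())                     # ['b', 'c', 'a']
print(swapped.columns.tolist())                  # ['B', 'A']
print(bool(with_missing.loc['z'].isna().all()))  # True
```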
 

5. Sorting MultiIndex DataFrames


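For a DataFrame with a MultiIndex, sort_index() accepts a level parameter that selects which index levels to sort by. The ascending parameter can then be a list with one entry per level.

# df_mi is assumed to be a DataFrame with a two-level MultiIndex (the single-level df above will not work here)
df_multi = df_mi.sort_index(level=[0, 1], ascending=[True, False])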
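
For a runnable version of the call above, a DataFrame with a two-level index is needed. This is a minimal sketch; the index values, the level names 'group' and 'num', and the 'value' column are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical two-level index: (group, num)
index = pd.MultiIndex.from_tuples(
    [('b', 1), ('a', 2), ('b', 2), ('a', 1)],
    names=['group', 'num'],
)
df_mi = pd.DataFrame({'value': [10, 20, 30, 40]}, index=index)

# Level 0 ('group') ascending, level 1 ('num') descending
df_multi = df_mi.sort_index(level=[0, 1], ascending=[True, False])

print(df_multi.index.tolist())
# [('a', 2), ('a', 1), ('b', 2), ('b', 1)]
print(df_multi['value'].tolist())
# [20, 40, 30, 10]
```

Levels can also be referenced by name, for example level=['group', 'num'], which tends to read better than positional numbers.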
 

Summary of Sorting Methods


Method                      Description
sort_values()               Sorts by column values (ascending/descending, multiple columns, custom sorting with key).
sort_index()                Sorts by row or column labels.
nlargest() / nsmallest()    Quickly extracts the top/bottom n rows, ordered by a specific column.
reindex()                   Reorders rows/columns based on a given list of labels.
MultiIndex Sorting          Uses sort_index(level=...) for fine-grained control over MultiIndex sorting.

 

Most sorting tasks can be handled with sort_values() and sort_index(), while nlargest() and nsmallest() are useful for quickly extracting top or bottom values. The key parameter and MultiIndex sorting provide additional flexibility when needed.

๋ฐ˜์‘ํ˜•