728x90
๋ฐ˜์‘ํ˜•

๊ฐœ๋ฐœ Code/ํŒŒ์ด์ฌ Python 14

[Python][pandas] DataFrame ํ–‰๋ณ„ ์ˆœํšŒ(iterate) ๋ฐฉ๋ฒ• ์ •๋ฆฌ

Pandas์˜ DataFrame์—์„œ ํ–‰์„ ์ˆœํšŒ(iterate)ํ•ด์•ผ ํ•˜๋Š” ๊ฒฝ์šฐ๊ฐ€ ์ข…์ข… ์žˆ๋‹ค. ํ•˜์ง€๋งŒ Pandas๋Š” ๋ฒกํ„ฐํ™” ์—ฐ์‚ฐ์ด ํ›จ์”ฌ ๋น ๋ฅด๊ธฐ ๋•Œ๋ฌธ์—, ๊ฐ€๋Šฅํ•˜๋ฉด apply() ๊ฐ™์€ ๋ฉ”์„œ๋“œ๋ฅผ ํ™œ์šฉํ•˜๋Š” ๊ฒƒ์ด ์ข‹๋‹ค. ๊ทธ๋ ‡๋‹ค๋ฉด, ์–ธ์ œ ํ–‰์„ ์ˆœํšŒํ•ด์•ผ ํ• ๊นŒ? ๊ทธ๋ฆฌ๊ณ  ์–ด๋–ค ๋ฐฉ๋ฒ•์ด ๊ฐ€์žฅ ํšจ์œจ์ ์ผ๊นŒ? ์ด๋ฒˆ ๊ธ€์—์„œ๋Š” ๋‹ค์–‘ํ•œ ๋ฐฉ๋ฒ•์„ ์ •๋ฆฌํ•ด๋ณธ๋‹ค.1. iterrows() ์‚ฌ์šฉํ•˜๊ธฐiterrows()๋Š” ๊ฐ€์žฅ ๋งŽ์ด ์‚ฌ์šฉ๋˜๋Š” ๋ฐฉ๋ฒ• ์ค‘ ํ•˜๋‚˜์ง€๋งŒ, ์„ฑ๋Šฅ์ด ๋Š๋ฆฌ๋‹ค๋Š” ๋‹จ์ ์ด ์žˆ๋‹ค. ๊ฐ ํ–‰์„ index, Series ํ˜•ํƒœ๋กœ ๋ฐ˜ํ™˜ํ•œ๋‹ค.import pandas as pd# ์˜ˆ์ œ ๋ฐ์ดํ„ฐdata = {'A': [1, 2, 3], 'B': [4, 5, 6]}df = pd.DataFrame(data)# iterrows ์‚ฌ์šฉfor index, row in d..

[Python][pandas] Parquet ํŒŒ์ผ ํฌ๋งท: ๊ณ ์† ๋ฐ์ดํ„ฐ ์ฒ˜๋ฆฌ์— ์ตœ์ ํ™”๋œ ์ปฌ๋Ÿผ ์ €์žฅ ๋ฐฉ์‹

๋ฐ์ดํ„ฐ ๋ถ„์„๊ณผ ๋จธ์‹ ๋Ÿฌ๋‹์„ ํ•˜๋‹ค ๋ณด๋ฉด CSV, JSON, Excel ๋“ฑ์˜ ํŒŒ์ผ ํฌ๋งท์„ ์ž์ฃผ ์‚ฌ์šฉํ•˜๊ฒŒ ๋œ๋‹ค. ํ•˜์ง€๋งŒ ๋Œ€์šฉ๋Ÿ‰ ๋ฐ์ดํ„ฐ๋ฅผ ๋‹ค๋ฃฐ ๋•Œ๋Š” ์†๋„์™€ ์ €์žฅ ํšจ์œจ์„ฑ์ด ์ค‘์š”ํ•œ๋ฐ, ์ด๋Ÿด ๋•Œ Parquet ํฌ๋งท์ด ๊ฐ•๋ ฅํ•œ ๋Œ€์•ˆ์ด ๋  ์ˆ˜ ์žˆ๋‹ค.1. Parquet๋ž€?Apache Parquet๋Š” ์ปฌ๋Ÿผ ๊ธฐ๋ฐ˜ ์ €์žฅ ๋ฐฉ์‹(columnar storage format)์„ ์‚ฌ์šฉํ•˜๋Š” ์˜คํ”ˆ์†Œ์Šค ๋ฐ์ดํ„ฐ ํฌ๋งท์ž„. Hadoop ์ƒํƒœ๊ณ„์—์„œ ๊ฐœ๋ฐœ๋˜์—ˆ์œผ๋ฉฐ, ๋Œ€์šฉ๋Ÿ‰ ๋ฐ์ดํ„ฐ ์ฒ˜๋ฆฌ์™€ ๋ถ„์„ ์„ฑ๋Šฅ์„ ๊ทน๋Œ€ํ™”ํ•˜๋Š” ๋ฐ ์ตœ์ ํ™”๋จ.ํŠน์ง•์ปฌ๋Ÿผ ๊ธฐ๋ฐ˜ ์ €์žฅ(Columnar Storage)CSV๋‚˜ JSON ๊ฐ™์€ ํฌ๋งท์€ ๋ฐ์ดํ„ฐ๋ฅผ ํ–‰(Row) ๋‹จ์œ„๋กœ ์ €์žฅํ•˜์ง€๋งŒ, Parquet์€ ์ปฌ๋Ÿผ(Column) ๋‹จ์œ„๋กœ ์ €์žฅํ•จํŠน์ • ์ปฌ๋Ÿผ๋งŒ ์ฝ์–ด๋„ ๋˜๋ฏ€๋กœ, ๋ถ„์„ ์†๋„๊ฐ€ ํ–ฅ์ƒ๋จ์••์ถ• ๋ฐ ์ธ์ฝ”๋”ฉ(Co..

[Python][pandas] Sorting Data - sort

This guide covers various methods for sorting data in Pandas, including the primary sorting functions sort_values() and sort_index(), as well as nlargest(), nsmallest(), reindex(), and the use of the key parameter in sort_values().1. Sorting with sort_values()The sort_values() method sorts a DataFrame based on column values. It is the most commonly used sorting function. import pandas as pddf = ..

[Python][pandas] Loading Data - Excel

What is an Excel File?An Excel file is a spreadsheet format created by Microsoft Excel, commonly stored with the extensions .xlsx or .xls. Each Excel file consists of multiple sheets, where data is organized into rows and columns.Excel is one of the most widely used formats in data analysis. Pandas provides the read_excel() function to easily handle Excel files. This guide will cover the basic s..

[Python][program] CLI ASCII art ๋ฐœ๋ Œํƒ€์ธ ๋ฉ”์„ธ์ง€ ์“ฐ๊ธฐ

Python์„ ํ™œ์šฉํ•ด ๋ฐœ๋ Œํƒ€์ธ ๋ฐ์ด์— ์ง์ ‘ ๋งŒ๋“  ASCII ์•„ํŠธ ์นด๋“œ๋ฅผ ํ„ฐ๋ฏธ๋„์—์„œ ์ถœ๋ ฅํ•ด๋ณด๋ฉด ์–ด๋–จ๊นŒ? ๋ผ๋Š” ์ƒ๊ฐ์œผ๋กœ ์ž‘์—…์„ ์ง„ํ–‰ํ–ˆ๋‹ค. cowsay๊ฐ€ ๋– ์˜ฌ๋ž๊ณ  ์ด๊ฑธ ํ™œ์šฉํ•ด์„œ ๋งŒ๋“ค์–ด๋ณด๋ฉด ์žฌ๋ฐŒ๊ฒ ๋‹ค ์‹ถ์—ˆ๋‹ค.ํ•˜ํŠธ๋‚˜ ์นด๋“œ ๋ชจ์–‘์„ ์ถœ๋ ฅํ•˜๊ณ , ์›ํ•˜๋Š” ๋ฌธ๊ตฌ๋ฅผ ์ž…๋ ฅ๋ฐ›์•„ ํŠน๋ณ„ํ•œ ๋ฉ”์‹œ์ง€๋ฅผ ๋‚จ๊ธธ ์ˆ˜ ์žˆ๊ฒŒ๋” ์ž‘์—…ํ–ˆ๋‹ค. ์ฃผ์š” ํฌ์ธํŠธ:Python์—์„œ ํ…์ŠคํŠธ๋ฅผ ์ž๋™ ์ค„๋ฐ”๊ฟˆํ•ด์ฃผ๋Š” textwrap ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ ์‚ฌ์šฉASCII ์•„ํŠธ๋ฅผ ๋ฌธ์ž์—ด๋กœ ๊ตฌ์„ฑ์‚ฌ์šฉ์ž ์ž…๋ ฅ์„ ๋ฐ›์•„์„œ ์นด๋“œ ๋‚ด์šฉ์— ๋ฐ˜์˜ํ•˜์ง€๋งŒ ํ•œ๊ธ€๋กœ ์ง„ํ–‰ํ•˜๋ ค๋‹ˆ ์ž๋ฆฌ๊ฐ€ ์ž˜ ๋งž์ง€ ์•Š๋Š” ๋ฌธ์ œ๋Š” ๋‚จ๊ฒจ๋‘์—ˆ๋‹ค. ์˜์–ด๋กœ ์ž‘์„ฑํ•ด ๋ณด์‹œ๋ผ. import textwrapdef generate_valentine_card(message): # ๋ฉ”์‹œ์ง€๋ฅผ ์ค„๋ฐ”๊ฟˆํ•˜์—ฌ ์ •๋ฆฌ (์ตœ๋Œ€ 15์ž ๊ธฐ์ค€ ์ค„๋ฐ”๊ฟˆ) wrappe..

[Python][pandas] Loading Data - CSV

What is a CSV File?One of the most commonly used formats in data analysis is CSV (Comma-Separated Values). CSV files store data in a simple text format, where values are separated by commas (or other delimiters).Pandas provides a powerful function, read_csv(), to easily load CSV files into a DataFrame. In this guide, we will explore what a CSV file is, how to load it using Pandas, key parameters..

[Python][pandas] Exploring pandas in Depth

What is Pandas?  Pandas is an open-source Python library designed for data manipulation and analysis. It was developed by Wes McKinney in 2008 when he saw the need for an efficient and intuitive tool to handle financial data. The name "Pandas" originates from "PANel DAta," reflecting its focus on handling multidimensional data structures.   Built on top of NumPy, Pandas provides a powerful and f..

[Python][numpy] Numpy๋กœ ํšจ์œจ์ ์ธ ๋ฐ์ดํ„ฐ ์ƒ˜ํ”Œ๋ง ๋ฐ ๋‚œ์ˆ˜ ์ƒ์„ฑ

๋ฐ์ดํ„ฐ ๋ถ„์„ ๋ฐ ๋จธ์‹ ๋Ÿฌ๋‹์—์„œ๋Š” ๋‚œ์ˆ˜(random number) ์ƒ์„ฑ๊ณผ ์ƒ˜ํ”Œ๋ง(sampling)์ด ์ž์ฃผ ์‚ฌ์šฉ๋œ๋‹ค. Numpy์˜ np.random ๋ชจ๋“ˆ์„ ํ™œ์šฉํ•˜๋ฉด ๋‹ค์–‘ํ•œ ํ™•๋ฅ  ๋ถ„ํฌ์—์„œ ๋‚œ์ˆ˜๋ฅผ ์ƒ์„ฑํ•˜๊ณ  ๋ฐ์ดํ„ฐ๋ฅผ ์ƒ˜ํ”Œ๋งํ•  ์ˆ˜ ์žˆ๋‹ค.1. ๋‚œ์ˆ˜ ์ƒ์„ฑ์˜ ๊ธฐ๋ณธ1.1 ๊ท ๋“ฑ ๋ถ„ํฌ ๋‚œ์ˆ˜ ์ƒ์„ฑ๊ท ๋“ฑ ๋ถ„ํฌ์—์„œ ๋‚œ์ˆ˜๋ฅผ ์ƒ์„ฑํ•˜๋ ค๋ฉด np.random.rand() ๋˜๋Š” np.random.uniform()์„ ์‚ฌ์šฉํ•˜๋ฉด ๋จimport numpy as np# 0๊ณผ 1 ์‚ฌ์ด์˜ ๋‚œ์ˆ˜ 5๊ฐœ ์ƒ์„ฑrandom_numbers = np.random.rand(5)print(random_numbers)# [0.68210576 0.42857438 0.15101299 0.54555321 0.02568058]# ํŠน์ • ๋ฒ”์œ„(์˜ˆ: 10~20)์—์„œ ๊ท ๋“ฑ ๋ถ„ํฌ ๋‚œ์ˆ˜ 5๊ฐœ..

[Python][numpy] Numpy ๋ฐฐ์—ด ์ €์žฅ ๋ฐ ๋ถˆ๋Ÿฌ์˜ค๊ธฐ

Numpy๋Š” ๋ฐ์ดํ„ฐ๋ฅผ ์ €์žฅํ•˜๊ณ  ๋‹ค์‹œ ๋ถˆ๋Ÿฌ์˜ค๋Š” ๋‹ค์–‘ํ•œ ๋ฐฉ๋ฒ•์„ ์ œ๊ณตํ•œ๋‹ค. ์ด ๊ธ€์—์„œ๋Š” Numpy ๋ฐฐ์—ด์„ ํŒŒ์ผ๋กœ ์ €์žฅํ•˜๊ณ , ๋‹ค์‹œ ๋ถˆ๋Ÿฌ์˜ค๋Š” ๋ฐฉ๋ฒ•์„ ์ •๋ฆฌํ•œ๋‹ค.1. Numpy ๋ฐฐ์—ด ์ €์žฅํ•˜๊ธฐNumpy ๋ฐฐ์—ด์€ ๋‹ค์–‘ํ•œ ํฌ๋งท์œผ๋กœ ์ €์žฅ ๊ฐ€๋Šฅํ•˜๋ฉฐ, ์—ฌ๊ธฐ์„œ๋Š” .npy, .npz, .csv ํฌ๋งท์„ ๋‹ค๋ฃฐ ๊ฒƒ์ž„1.1 .npy ํฌ๋งท์œผ๋กœ ์ €์žฅ.npy ํŒŒ์ผ์€ Numpy์˜ ๊ธฐ๋ณธ์ ์ธ ๋ฐ”์ด๋„ˆ๋ฆฌ ์ €์žฅ ํฌ๋งท์œผ๋กœ, ๋ฐฐ์—ด์˜ ๊ตฌ์กฐ๋ฅผ ๊ทธ๋Œ€๋กœ ์œ ์ง€ํ•˜๋ฉด์„œ ์ €์žฅํ•  ์ˆ˜ ์žˆ์Œimport numpy as nparr = np.array([1, 2, 3, 4, 5])np.save("array.npy", arr)1.2 .npz ํฌ๋งท์œผ๋กœ ์—ฌ๋Ÿฌ ๋ฐฐ์—ด ์ €์žฅ.npz ํŒŒ์ผ์€ ์—ฌ๋Ÿฌ ๊ฐœ์˜ ๋ฐฐ์—ด์„ ํ•œ ๋ฒˆ์— ์ €์žฅํ•  ์ˆ˜ ์žˆ๋Š” ์••์ถ•๋œ Numpy ํฌ๋งท์ž„x = np.arange(10)y = ..

[Python][numpy] Numpy ๊ธฐ์ดˆ๋ถ€ํ„ฐ ํ™œ์šฉ๊นŒ์ง€

Python์—์„œ ๋ฐ์ดํ„ฐ๋ฅผ ๋‹ค๋ฃฐ ๋•Œ ํ•„์ˆ˜์ ์œผ๋กœ ์‚ฌ์šฉ๋˜๋Š” ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ ์ค‘ ํ•˜๋‚˜๊ฐ€ Numpy์ด๋‹ค. ๋Œ€๊ทœ๋ชจ ๋ฐฐ์—ด ๋ฐ ํ–‰๋ ฌ ์—ฐ์‚ฐ์„ ๋น ๋ฅด๊ฒŒ ์ˆ˜ํ–‰ํ•  ์ˆ˜ ์žˆ๋„๋ก ์„ค๊ณ„๋˜์—ˆ์œผ๋ฉฐ, ๊ณผํ•™ ์—ฐ์‚ฐ, ๋จธ์‹ ๋Ÿฌ๋‹, ๋ฐ์ดํ„ฐ ๋ถ„์„ ๋“ฑ์— ํญ๋„“๊ฒŒ ์‚ฌ์šฉ๋œ๋‹ค.1. Numpy๋ž€?Numpy(NumPy, Numerical Python)๋Š” ๋‹ค์ฐจ์› ๋ฐฐ์—ด ๊ฐ์ฒด(ndarray)๋ฅผ ์ง€์›ํ•˜๋ฉฐ, ์ด๋ฅผ ํšจ์œจ์ ์œผ๋กœ ์—ฐ์‚ฐํ•  ์ˆ˜ ์žˆ๋Š” ๋‹ค์–‘ํ•œ ๊ธฐ๋Šฅ์„ ์ œ๊ณตํ•จPython์˜ ๊ธฐ๋ณธ ๋ฆฌ์ŠคํŠธ(list)๋ณด๋‹ค ๋น ๋ฅด๊ณ , ๋ฉ”๋ชจ๋ฆฌ๋ฅผ ํšจ์œจ์ ์œผ๋กœ ์‚ฌ์šฉํ•จNumpy์˜ ํŠน์ง•๊ณ ์† ์—ฐ์‚ฐ: C ์–ธ์–ด๋กœ ๊ตฌํ˜„๋˜์–ด ์žˆ์–ด ๋ฆฌ์ŠคํŠธ๋ณด๋‹ค ๋น ๋ฆ„๋‹ค์ฐจ์› ๋ฐฐ์—ด ์ง€์›: ๋ฒกํ„ฐ, ํ–‰๋ ฌ ์—ฐ์‚ฐ์„ ์‰ฝ๊ฒŒ ์ˆ˜ํ–‰ ๊ฐ€๋Šฅ๋ฐฉ๋Œ€ํ•œ ์ˆ˜ํ•™ ๋ฐ ์„ ํ˜•๋Œ€์ˆ˜ ์—ฐ์‚ฐ ๊ธฐ๋Šฅ: ๋‹ค์–‘ํ•œ ํ•จ์ˆ˜ ์ œ๊ณต๋‹ค๋ฅธ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ์™€์˜ ํ˜ธํ™˜์„ฑ: Pandas, SciPy, TensorFlo..

[Python][pandas] ๋ฐ์ดํ„ฐ ์ •๋ ฌํ•˜๊ธฐ - Sort

์•„๋ž˜ ๊ธ€์€ Pandas์—์„œ DataFrame ๋ฐ์ดํ„ฐ ์ •๋ ฌ๊ณผ ๊ด€๋ จ๋œ ์—ฌ๋Ÿฌ ๋ฉ”์„œ๋“œ๋ฅผ ์ •๋ฆฌํ•œ ๊ฒƒ์ด๋‹ค.  ์ฃผ์š” ๋ฉ”์„œ๋“œ์ธ sort_values(), sort_index() ๋ฟ๋งŒ ์•„๋‹ˆ๋ผ nlargest(), nsmallest(), reindex(), ๊ทธ๋ฆฌ๊ณ  sort_values()์˜ key ํŒŒ๋ผ๋ฏธํ„ฐ๊นŒ์ง€ ์‚ดํŽด๋ณผ ๊ฒƒ์ด๋‹ค.1. DataFrame.sort_values()์—ด(column) ๊ฐ’์„ ๊ธฐ์ค€์œผ๋กœ ์ •๋ ฌํ•  ๋•Œ ์‚ฌ์šฉํ•˜๋Š” ๊ฐ€์žฅ ๊ธฐ๋ณธ์ ์ด๊ณ  ํ•ต์‹ฌ์ ์ธ ๋ฉ”์„œ๋“œ(method)by ํŒŒ๋ผ๋ฏธํ„ฐ์— ๊ธฐ์ค€ ์—ด(๋˜๋Š” ์—ด์˜ ๋ฆฌ์ŠคํŠธ)์„ ์ง€์ •ํ•˜๊ณ , ascending์œผ๋กœ ์˜ค๋ฆ„์ฐจ์ˆœ/๋‚ด๋ฆผ์ฐจ์ˆœ์„ ์„ค์ •ํ•จ # pandas.__version__# 2.2.3import pandas as pddf = pd.DataFrame({ 'A': [3, 1, 2, 4],..

[Python][pandas] ๋ฐ์ดํ„ฐ ๋ถˆ๋Ÿฌ์˜ค๊ธฐ - Excel

Excel ํŒŒ์ผ์ด๋ž€ ?  Excel ํŒŒ์ผ์€ Microsoft Excel์—์„œ ์ƒ์„ฑ๋˜๋Š” ๋ฐ์ดํ„ฐ ํ˜•์‹์œผ๋กœ, ์ผ๋ฐ˜์ ์œผ๋กœ .xlsx ๋˜๋Š” .xls ํ™•์žฅ์ž๋ฅผ ๊ฐ€์ง„๋‹ค. ๊ฐ Excel ํŒŒ์ผ์€ ์—ฌ๋Ÿฌ ๊ฐœ์˜ ์‹œํŠธ(Sheet)๋กœ ๊ตฌ์„ฑ๋˜๋ฉฐ, ์‹œํŠธ ์•ˆ์—๋Š” ํ–‰(Row)๊ณผ ์—ด(Column)๋กœ ์ด๋ฃจ์–ด์ง„ ๋ฐ์ดํ„ฐ๊ฐ€ ์ €์žฅ๋œ๋‹ค.   ๋ฐ์ดํ„ฐ ๋ถ„์„์—์„œ Excel ํŒŒ์ผ์€ ๊ฐ€์žฅ ๋„๋ฆฌ ์‚ฌ์šฉ๋˜๋Š” ๋ฐ์ดํ„ฐ ํ˜•์‹ ์ค‘ ํ•˜๋‚˜๋‹ค. Pandas๋Š” Excel ํŒŒ์ผ์„ ์†์‰ฝ๊ฒŒ ๋‹ค๋ฃฐ ์ˆ˜ ์žˆ๋Š” read_excel() ํ•จ์ˆ˜๋ฅผ ์ œ๊ณตํ•œ๋‹ค. ์ด ๊ธ€์—์„œ๋Š” Excel ํŒŒ์ผ์˜ ๊ธฐ๋ณธ ๊ตฌ์กฐ๋ฅผ ์ดํ•ดํ•˜๊ณ , Pandas๋ฅผ ํ™œ์šฉํ•ด ๋ฐ์ดํ„ฐ๋ฅผ ์ฝ๊ณ  ์ฒ˜๋ฆฌํ•˜๋Š” ๋ฐฉ๋ฒ•, ๊ทธ๋ฆฌ๊ณ  ์ž์ฃผ ๋ฐœ์ƒํ•˜๋Š” ๋ฌธ์ œ์™€ ํ•ด๊ฒฐ ๋ฐฉ๋ฒ•์„ ์‚ดํŽด๋ณธ๋‹ค. Excel ํŒŒ์ผ ์˜ˆ์‹œ:# Sheet1Name Age City๊น€์„œ์šธ 25..

[Python][pandas] ๋ฐ์ดํ„ฐ ๋ถˆ๋Ÿฌ์˜ค๊ธฐ - CSV

CSV๋ž€ ?   ๋ฐ์ดํ„ฐ ๋ถ„์„์—์„œ ๊ฐ€์žฅ ํ”ํžˆ ์ ‘ํ•˜๋Š” ํ˜•์‹ ์ค‘ ํ•˜๋‚˜๊ฐ€ CSV(Comma Separated Values) ํŒŒ์ผ์ด๋‹ค. CSV๋Š” ๋ฐ์ดํ„ฐ๊ฐ€ ์‰ผํ‘œ(๋˜๋Š” ๋‹ค๋ฅธ ๊ตฌ๋ถ„์ž)๋กœ ๊ตฌ๋ถ„๋œ ํ…์ŠคํŠธ ํ˜•์‹์œผ๋กœ ์ €์žฅ๋œ๋‹ค. Pandas์˜ read_csv() ํ•จ์ˆ˜๋Š” ์ด๋Ÿฌํ•œ ํŒŒ์ผ์„ ๊ฐ„๋‹จํžˆ ์ฝ์–ด๋“ค์ผ ์ˆ˜ ์žˆ๋„๋ก ๊ฐ•๋ ฅํ•œ ๊ธฐ๋Šฅ์„ ์ œ๊ณตํ•œ๋‹ค. ์ด๋ฒˆ ๊ธ€์—์„œ๋Š” CSV ํ˜•์‹์— ๋Œ€ํ•œ ๊ฐ„๋‹จํ•œ ์†Œ๊ฐœ์™€ ํ•จ๊ป˜ Pandas๋กœ CSV ํŒŒ์ผ์„ ๋ถˆ๋Ÿฌ์˜ค๋Š” ๋ฐฉ๋ฒ•, ์ž์ฃผ ์‚ฌ์šฉํ•˜๋Š” ์ฃผ์š” ๋งค๊ฐœ๋ณ€์ˆ˜, ๊ทธ๋ฆฌ๊ณ  ์—๋Ÿฌ๋ฅผ ์˜ˆ๋ฐฉํ•˜๊ฑฐ๋‚˜ ํ•ด๊ฒฐํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ์•Œ์•„๋ณด๋„๋ก ํ•˜๊ฒ ๋‹ค.CSV ํŒŒ์ผ ํ˜•์‹์ด๋ž€ ?  CSV ํŒŒ์ผ์€ ๋ฐ์ดํ„ฐ๊ฐ€ ์‰ผํ‘œ(,)๋กœ ๊ตฌ๋ถ„๋˜์–ด ์ €์žฅ๋œ ๋‹จ์ˆœํ•œ ํ…์ŠคํŠธ ํŒŒ์ผ์ด๋‹ค. ๊ฐ ํ–‰์€ ๋ฐ์ดํ„ฐ์˜ ํ•œ ๋ ˆ์ฝ”๋“œ๋ฅผ ๋‚˜ํƒ€๋‚ด๋ฉฐ, ์ฒซ ๋ฒˆ์งธ ํ–‰์€ ์ผ๋ฐ˜์ ์œผ๋กœ ์—ด ์ด๋ฆ„(ํ—ค๋”)์œผ๋กœ ์‚ฌ์šฉ๋œ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด ..

[Python][pandas] pandas ํ†บ์•„๋ณด๊ธฐ

Pandas๋ž€ ?  Pandas๋Š” 2008๋…„ Wes McKinney์— ์˜ํ•ด ๊ฐœ๋ฐœ๋จ. ๋‹น์‹œ ๊ทธ๋Š” ๊ธˆ์œต ๋ฐ์ดํ„ฐ๋ฅผ ์กฐ์ž‘ํ•˜๊ณ  ๋ถ„์„ํ•˜๋Š” ๋ฐ ์žˆ์–ด ํšจ์œจ์ ์ด๊ณ  ์ง๊ด€์ ์ธ ๋„๊ตฌ์˜ ํ•„์š”์„ฑ์„ ๋Š๊ผˆ์Œ. ์ด๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ๊ทธ๋Š” Pandas๋ฅผ ์„ค๊ณ„ํ–ˆ๊ณ , ์ดํ›„ ์˜คํ”ˆ ์†Œ์Šค๋กœ ๊ณต๊ฐœ๋˜์–ด ์ „ ์„ธ๊ณ„ ๋ฐ์ดํ„ฐ ๊ณผํ•™ ์ปค๋ฎค๋‹ˆํ‹ฐ์—์„œ ๋น ๋ฅด๊ฒŒ ์ธ๊ธฐ๋ฅผ ์–ป๊ฒŒ ๋จ. Pandas๋ผ๋Š” ์ด๋ฆ„์€ "PANel DAta"์—์„œ ์œ ๋ž˜๋˜์—ˆ์œผ๋ฉฐ, ์ด๋Š” ๋‹ค์ฐจ์› ๋ฐ์ดํ„ฐ ๊ตฌ์กฐ๋ฅผ ๋‹ค๋ฃจ๋Š” ๋ฐ ์ค‘์ ์„ ๋‘”๋‹ค๋Š” ์˜๋ฏธ๋ฅผ ๋‚ดํฌํ•จ.   Pandas๋Š” NumPy๋ฅผ ๊ธฐ๋ฐ˜์œผ๋กœ ํ•œ ์˜คํ”ˆ ์†Œ์Šค์ด๋ฉฐ ํŒŒ์ด์ฌ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋กœ ๋ฐ์ดํ„ฐ ์กฐ์ž‘๊ณผ ๋ถ„์„์„ ์œ„ํ•ด ์„ค๊ณ„๋œ ํŒŒ์ด์ฌ์˜ ๊ฐ•๋ ฅํ•˜๊ณ  ์œ ์—ฐํ•œ ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ.์ดˆ๋ณด์ž๋ถ€ํ„ฐ ์ˆ™๋ จ๋œ ๋ฐ์ดํ„ฐ ๊ณผํ•™์ž๊นŒ์ง€, ๋ˆ„๊ตฌ๋‚˜ ๋ฐ์ดํ„ฐ๋ฅผ ํšจ์œจ์ ์œผ๋กœ ์ฒ˜๋ฆฌํ•  ์ˆ˜ ์žˆ๋Š” ๋„๊ตฌ๋ฅผ ์ œ๊ณตํ•˜๋ฉฐ, ๋‘ ๊ฐ€์ง€ ์ฃผ์š” ๋ฐ์ดํ„ฐ ๊ตฌ..

728x90
๋ฐ˜์‘ํ˜•