[Python] Pandas :: Data Wrangling, EDA, Frequently Used Data Preprocessing Code

Uploading and Loading Datasets

Uploading a file

from google.colab import files
files.upload()  # opens a file picker; available only inside Google Colab
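In Colab, `files.upload()` returns a dictionary that maps each uploaded file name to its raw bytes. The sketch below turns that dictionary into DataFrames. It assumes only CSV files matter; `frames_from_upload` is a hypothetical helper name, and a fake dictionary stands in for a real upload so the code runs outside Colab:

```python
import io

import pandas as pd


def frames_from_upload(uploaded):
    """Convert a {filename: bytes} dict (the shape files.upload() returns) into DataFrames.

    Hypothetical helper: only names ending in .csv are parsed, everything else is skipped.
    """
    return {
        name: pd.read_csv(io.BytesIO(content))
        for name, content in uploaded.items()
        if name.endswith('.csv')
    }


# In Colab you would write: uploaded = files.upload()
# A fake upload is used here so the sketch runs anywhere.
uploaded = {'data.csv': b'a,b\n1,2\n3,4\n', 'notes.txt': b'ignored'}
frames = frames_from_upload(uploaded)
print(frames['data.csv'].shape)  # (2, 2)
```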

Loading a DataFrame from a CSV file

import pandas as pd

df = pd.read_csv('data.csv')
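`read_csv` takes many options besides the path. Here is a small sketch of a few common ones, using an in-memory file in place of a real `data.csv`. The semicolon separator and the column names are made up for illustration. CSV files saved by Korean-locale Excel often also need `encoding='cp949'`; that option is not exercised here.

```python
import io

import pandas as pd

# An in-memory file stands in for 'data.csv'; the ';' separator is only for illustration
raw = io.StringIO("id;name;score;memo\n1;kim;90;a\n2;lee;85;b\n3;park;;c\n")

df = pd.read_csv(
    raw,
    sep=';',                          # field delimiter (default is ',')
    usecols=['id', 'name', 'score'],  # skip the memo column
    index_col='id',                   # use id as the row index
)
print(df)
print(df['score'].isnull().sum())  # 1: the empty field is read as NaN
```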

Loading from a URL

URL = "https://~~~~ csv"
df = pd.read_csv(URL)

Loading Excel sheets

# data_url: path or URL of the workbook; sheet_name selects which sheet to read
df1 = pd.read_excel(data_url, sheet_name='sheet1')
df2 = pd.read_excel(data_url, sheet_name='sheet2')

Inspecting the DataFrame

Checking missing values

df.isnull().sum()
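Counts alone can be misleading on large tables, so the share of missing values is often reported alongside them. This is a minimal sketch on made-up data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'age': [22, np.nan, 35, np.nan],
    'city': ['Seoul', 'Busan', None, 'Seoul'],
})

missing_count = df.isnull().sum()          # NaN/None per column
missing_ratio = df.isnull().mean() * 100   # share of missing values per column, in %
print(pd.DataFrame({'count': missing_count, 'percent': missing_ratio}))

# Rows that contain at least one missing value
rows_with_missing = df[df.isnull().any(axis=1)]
print(rows_with_missing)
```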

Filling missing values with 0

df.fillna(0, inplace=True)
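Filling every column with 0 can distort columns where 0 is not a sensible value, such as text columns. `fillna` also accepts a dict that maps each column to its own fill value. In the sketch below, the mean for `age` and `'unknown'` for `city` are only example choices:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'age': [20.0, np.nan, 40.0],
    'city': ['Seoul', None, 'Busan'],
})

# One fill value per column; the returned copy is reassigned instead of using inplace=True
df = df.fillna({'age': df['age'].mean(), 'city': 'unknown'})
print(df)
```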

Checking for duplicate values in a column

df[df.duplicated(['Column'])]

Dropping duplicate rows

df = df.drop_duplicates()
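`duplicated` and `drop_duplicates` both take a `keep` argument, and `drop_duplicates` also takes `subset`. Together they decide which rows count as repeats. The sketch below uses toy data to show the difference:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['kim', 'lee', 'kim', 'kim'],
    'score': [90, 85, 90, 70],
})

# Default keep='first': only the later repeats are flagged
print(df[df.duplicated(['name'])])              # rows 2 and 3
# keep=False flags every row that has a duplicate, including the first one
print(df[df.duplicated(['name'], keep=False)])  # rows 0, 2 and 3

# Whole-row duplicates vs. duplicates judged on a subset of columns
full = df.drop_duplicates()                                 # drops row 2 (same name and score as row 0)
by_name = df.drop_duplicates(subset=['name'], keep='last')  # keeps the last row for each name
print(full.index.tolist(), by_name.index.tolist())          # [0, 1, 3] [1, 3]
```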

Checking dimensions (rows, columns)

df.shape

Counting rows that meet a condition

df[(df.column1 == "ABC") & (df.column2 == "DEF")].shape[0]  # number of matching rows
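Since `True` counts as 1, summing the boolean mask gives the same count without building the filtered frame. `value_counts` counts every distinct value in one call. The data below is made up:

```python
import pandas as pd

df = pd.DataFrame({
    'column1': ['ABC', 'ABC', 'XYZ', 'ABC'],
    'column2': ['DEF', 'GHI', 'DEF', 'DEF'],
})

mask = (df.column1 == "ABC") & (df.column2 == "DEF")
print(mask.sum())                    # 2: each True counts as 1
print(df[mask].shape[0])             # 2: same count via the filtered frame
print(df['column1'].value_counts())  # count of every distinct value in column1
```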

Checking DataFrame info (column types, non-null counts)

df.info()

Checking data types

df.dtypes

Checking summary statistics

df.describe()

Changing data types

df = df.astype({'column1': 'int'})
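`astype('int')` raises an error if the column still contains NaN. The sketch below shows two common workarounds on made-up data. The first is pandas' nullable `'Int64'` dtype. The second is `pd.to_numeric(..., errors='coerce')`, which handles strings that are not valid numbers:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'column1': [1.0, np.nan, 3.0], 'column2': ['10', 'n/a', '30']})

# df.astype({'column1': 'int'}) would raise here because of the NaN
df = df.astype({'column1': 'Int64'})  # nullable integer: NaN becomes <NA>

# Strings that cannot be parsed become NaN instead of raising
df['column2'] = pd.to_numeric(df['column2'], errors='coerce')
print(df.dtypes)
```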

Checking frequencies (cross-tabulation)

pd.crosstab(df['column1'], df['column2'])
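`crosstab` can also add totals (`margins=True`) or turn counts into proportions (`normalize`). In this sketch, the `sex` and `survived` columns are invented, Titanic-style data:

```python
import pandas as pd

df = pd.DataFrame({
    'sex': ['male', 'female', 'male', 'female', 'male'],
    'survived': [0, 1, 0, 1, 1],
})

counts = pd.crosstab(df['sex'], df['survived'], margins=True)     # adds an 'All' row and column with totals
rates = pd.crosstab(df['sex'], df['survived'], normalize='index')  # each row sums to 1
print(counts)
print(rates)
```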

Selecting, Adding, and Dropping Data

Selecting only certain columns

df = df[['column1', 'column2', 'column3']]

Selecting rows with specific values

df[(df.column1 == "ABC") & (df.column2 == "DEF")]

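Viewing data while excluding certain values

# column1 values for rows whose column2 is NOT in the list (~ negates the mask); the result is a Series, so df is not overwritten
df.loc[~df['column2'].isin(['value1', 'value2', 'value3']), 'column1']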
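Excluding rows by value and excluding whole columns use different tools. The first is a negated `isin` mask; the second is `drop(columns=...)`. The sketch uses toy data:

```python
import pandas as pd

df = pd.DataFrame({
    'column1': [10, 20, 30, 40],
    'column2': ['a', 'b', 'c', 'd'],
    'column3': [True, False, True, False],
})

# Exclude rows whose column2 value is in the list
kept_rows = df[~df['column2'].isin(['a', 'c'])]

# Exclude columns instead of rows
without_col3 = df.drop(columns=['column3'])

print(kept_rows)
print(without_col3.columns.tolist())  # ['column1', 'column2']
```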

Viewing rows at specific index labels

df.loc[1:2]  # rows whose index labels are 1 through 2 (.loc includes both ends)
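`.loc` slices by label and includes the end label. `.iloc` slices by position and excludes the end, as ordinary Python slices do. A small sketch makes the difference visible:

```python
import pandas as pd

df = pd.DataFrame({'value': ['a', 'b', 'c', 'd']}, index=[0, 1, 2, 3])

by_label = df.loc[1:2]      # label-based: end label 2 is included -> 'b', 'c'
by_position = df.iloc[1:2]  # position-based: end position is excluded -> 'b' only
print(by_label['value'].tolist(), by_position['value'].tolist())

# With a non-default index, .loc uses the labels themselves
df2 = df.copy()
df2.index = [10, 20, 30, 40]
print(df2.loc[20:30, 'value'].tolist())  # ['b', 'c']
```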

Dropping rows

df = df.drop(index=0)  # drop the row whose index label is 0
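`drop` removes rows by index label. Rows that fail a condition are usually removed by keeping the rows you want instead. Afterwards, `reset_index(drop=True)` renumbers the rows. The data and the 35-point threshold below are made up:

```python
import pandas as pd

df = pd.DataFrame({'name': ['kim', 'lee', 'park', 'choi'], 'score': [90, 40, 75, 30]})

df = df.drop(index=[0])  # a list of labels drops several rows at once

# Drop by condition: keep only the rows that pass it
df = df[df['score'] >= 35]

df = df.reset_index(drop=True)  # renumber the index 0..n-1 after dropping
print(df)
```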

Adding a feature column

df["new_column"] = (df['column1'] - (df['column2'] + df['column3'] / 100)) * 100
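The same kind of arithmetic feature is shown below on toy numbers, together with a conditional feature built with `np.where`. The `is_positive` column and its 0 threshold are illustrative assumptions:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'column1': [100.0, 50.0], 'column2': [40.0, 60.0], 'column3': [500.0, 200.0]})

# Arithmetic feature: (column1 - (column2 + column3 / 100)) * 100, computed row by row
df['new_column'] = (df['column1'] - (df['column2'] + df['column3'] / 100)) * 100

# Conditional (categorical) feature; the 0 threshold is arbitrary
df['is_positive'] = np.where(df['new_column'] > 0, 'yes', 'no')
print(df)
```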

Transforming Data

Transposing rows and columns

df1 = df1.T

Renaming columns

df = df.rename(columns={'old_name': 'new_name'})  # rename returns a new DataFrame, so reassign it
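`rename` accepts a mapping, as above. It also accepts a function that is applied to every column label, which helps with cleaning messy headers. The column names below are made up:

```python
import pandas as pd

df = pd.DataFrame({'Old Name': [1, 2], 'Other Col': [3, 4]})

# Mapping: rename specific columns
df = df.rename(columns={'Old Name': 'new_name'})

# Function: applied to every column label
df = df.rename(columns=lambda c: c.strip().lower().replace(' ', '_'))
print(df.columns.tolist())  # ['new_name', 'other_col']
```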

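Visualization

Drawing a box plot

import matplotlib.pyplot as plt
import seaborn as sns

# Matplotlib box plot; outliers (fliers) are drawn as green diamonds
green_diamond = dict(markerfacecolor='g', marker='D')
plt.boxplot(df['column1'], flierprops=green_diamond)
plt.title("box plot")
plt.show()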

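# Seaborn box plot of column2 for each category of column1 (the two-color palette assumes two categories)
sns.boxplot(data=df, x='column1', y='column2', palette=['lightgrey', 'skyblue'])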
plt.show();

Drawing a histogram

sns.displot(df['history'], bins=50)
plt.axis([1000, 3500, 0, 200])  # [x start, x end, y start, y end]
plt.show();

plt.hist(data=df, x='Age')
plt.show();

Drawing a bar plot

df.plot.bar(color=['g', 'purple'], rot=0);

sns.countplot(data=df, x='Pclass', color='green')
plt.show();

Drawing a pie chart

ratio = df['column1'].value_counts(normalize=True)  # normalize=True returns proportions instead of counts
plt.pie(ratio, labels=ratio.index, autopct='%.0f%%', explode=[0, 0.05], colors=['lightgrey', 'skyblue'])  # explode and colors assume exactly two categories
plt.title('column1');
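`value_counts` orders categories by frequency, not by value. Taking the pie labels from `ratio.index` keeps each label on the correct slice, and building `explode` from `len(ratio)` lets the call adapt to any number of categories. This is a sketch on made-up binary data; the `Agg` backend is used only so the code runs without a display:

```python
import matplotlib
matplotlib.use('Agg')  # off-screen backend so the sketch runs headless; omit in Jupyter
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'column1': [0, 1, 1, 0, 1, 1]})

ratio = df['column1'].value_counts(normalize=True)  # proportions, largest first
fig, ax = plt.subplots()
wedges, texts, autotexts = ax.pie(
    ratio,
    labels=ratio.index,                        # labels follow value_counts order, not a hard-coded [0, 1]
    autopct='%.0f%%',
    explode=[0.05] + [0] * (len(ratio) - 1),   # pull out only the largest slice
)
ax.set_title('column1')
print(ratio.to_dict())
```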

Drawing a violin plot

sns.violinplot(data=df, x='column1', y='column2', palette=['lightgrey', 'skyblue'])
plt.show();

Drawing a count plot

sns.countplot(x='column1', data=df)
plt.show()
