[Linear Regression] Predicting a Car's Expected Fuel Economy

DataAnalysis/Model Analysis 2022. 6. 1. 01:04

Loading the data

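import numpy as np  # numerical routines (used later for the square root and rounding)
import pandas as pd  # DataFrame handling
data_df = pd.read_csv('auto-mpg.csv', header=0, engine='python')  # read the dataset from the CSV file
print('Dataset shape: ', data_df.shape)  # print the (rows, columns) of the dataset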

# Drop 'car_name', 'origin', and 'horsepower'. axis=1 drops columns; inplace=False returns a new DataFrame and leaves the original unchanged.
data_df = data_df.drop(['car_name', 'origin', 'horsepower'], axis=1, inplace=False)

Building the linear regression model

from sklearn.linear_model import LinearRegression  # scikit-learn's linear regression model
from sklearn.model_selection import train_test_split  # splits the dataset into training and test sets
from sklearn.metrics import mean_squared_error, r2_score  # metrics for evaluating the model

Y = data_df['mpg']  # use the mpg feature as the dependent variable Y
X = data_df.drop(['mpg'], axis=1, inplace=False)  # use every feature except mpg as the independent variables X (drop by column, original unchanged)

# Split the data 7:3 into training and test sets (test_size=0.3); random_state fixes the random seed so the split is reproducible
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=0)
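To make the 7:3 split and the role of the seed concrete, here is a minimal plain-Python sketch of a seeded hold-out split. It is an illustration only: `split_indices` is a hypothetical helper, and scikit-learn's `train_test_split` uses its own shuffling, so the rows it selects will differ.

```python
import random

def split_indices(n_rows, test_size=0.3, seed=0):
    """Return (train_idx, test_idx) for a shuffled hold-out split.

    Plain-Python illustration; not scikit-learn's algorithm.
    """
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # seeded shuffle: same seed, same order
    n_test = round(n_rows * test_size)    # assumption: round to the nearest row
    return indices[n_test:], indices[:n_test]

train_idx, test_idx = split_indices(10, test_size=0.3, seed=0)
print(len(train_idx), len(test_idx))  # 7 3
```

Calling the function twice with the same seed returns the same split, which is the property random_state provides in the call above.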

lr = LinearRegression()  # create the linear regression model object
lr.fit(X_train, Y_train)  # fit the model on the training data
Y_predict = lr.predict(X_test)  # predict mpg for the test features

mse = mean_squared_error(Y_test, Y_predict)  # mean squared error between the true test values and the predictions
rmse = np.sqrt(mse)  # square root of the MSE, in the same units as mpg
print('MSE : {0:.3f}, RMSE : {1:.3f}'.format(mse, rmse))  # print MSE and RMSE
print('R^2(Variance score) : {0:.3f}'.format(r2_score(Y_test, Y_predict)))  # coefficient of determination
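For reference, the three metrics can be computed by hand. The sketch below uses the standard definitions (MSE as the mean of squared residuals, RMSE as its square root, R^2 as one minus the ratio of residual to total sum of squares). The helper name `regression_metrics` and the toy numbers are made up for illustration and are not results from this dataset.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return (mse, rmse, r2) computed from the standard definitions."""
    n = len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # total sum of squares
    mse = ss_res / n
    rmse = math.sqrt(mse)
    r2 = 1 - ss_res / ss_tot
    return mse, rmse, r2

# Toy values for illustration only
mse_demo, rmse_demo, r2_demo = regression_metrics([3, 5, 7], [2, 5, 8])
print(round(mse_demo, 3), round(rmse_demo, 3), round(r2_demo, 3))  # 0.667 0.816 0.75
```

An R^2 of 1 means the predictions match the test values exactly, while 0 means the model does no better than always predicting the mean.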

print('Y intercept: ', np.round(lr.intercept_, 2))  # intercept of the fitted line, rounded to two decimal places
print('Regression coefficients: ', np.round(lr.coef_, 2))  # coefficient of each feature, rounded to two decimal places

coef = pd.Series(data=np.round(lr.coef_, 2), index=X.columns)  # Series of rounded coefficients indexed by feature name
coef.sort_values(ascending=False)  # ascending=False sorts in descending order (a notebook displays the result when this is the last line of a cell)
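The intercept and coefficients fully determine a prediction: the model outputs the intercept plus the sum of each coefficient times its feature value. A minimal sketch of that arithmetic follows; `predict_linear` is a hypothetical helper and the numbers are placeholders, not the fitted values from this post.

```python
def predict_linear(intercept, coefficients, features):
    """Linear model output: intercept + sum(coefficient_i * feature_i)."""
    return intercept + sum(c * x for c, x in zip(coefficients, features))

# Hypothetical numbers for illustration, not the fitted values from this post
demo_intercept = 1.0
demo_coefs = [2.0, -0.5]
demo_features = [3.0, 4.0]
print(predict_linear(demo_intercept, demo_coefs, demo_features))  # 5.0
```

A positive coefficient raises the predicted mpg as that feature grows, with the other features held fixed, and a negative one lowers it. Because the features are on different scales, the raw magnitudes in the sorted Series are not directly comparable as measures of importance.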

Drawing the charts

import matplotlib.pyplot as plt  # plotting library
import seaborn as sns  # statistical plots built on matplotlib

fig, axs = plt.subplots(figsize=(16, 16), ncols=3, nrows=2)  # 16x16-inch figure with a 2-row, 3-column grid of axes
x_features = ['model_year', 'acceleration', 'displacement', 'weight', 'cylinders']  # the five features to plot
plot_color = ['r', 'b', 'y', 'g', 'r']  # one color per plot
for i, feature in enumerate(x_features):  # plot the regression relationship between mpg and each feature
    row = i // 3
    col = i % 3
    # x is the feature, y is 'mpg', data is the data_df DataFrame; plot_color sets the color and axs[row][col] selects the subplot
    # the five plots are laid out on the 2x3 grid, so the last cell stays empty
    sns.regplot(x=feature, y='mpg', data=data_df, ax=axs[row][col], color=plot_color[i])
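The row and column arithmetic in the loop is an integer division and a remainder, which Python's built-in `divmod` returns together. The sketch below lists the grid cell each plot index maps to; `grid_positions` is a hypothetical helper name used only for this illustration.

```python
def grid_positions(n_items, ncols):
    """Map item indices 0..n_items-1 to (row, col) cells, filling row by row."""
    return [divmod(i, ncols) for i in range(n_items)]

print(grid_positions(5, 3))  # [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1)]
```

With five plots on a 2x3 grid, cell (1, 2) is never used. One option, if the empty frame is distracting, is to hide it with `axs[1][2].set_visible(False)`.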

Predicting the expected fuel economy of a car

print("Enter the details of the car whose fuel economy you want to predict.")
cylinders_1 = int(input("cylinders : "))  # read cylinders
displacement_1 = float(input("displacement : "))  # read displacement
weight_1 = float(input("weight : "))  # read weight
acceleration_1 = float(input("acceleration : "))  # read acceleration (float, so values such as 15.5 are accepted)
model_year_1 = int(input("model_year : "))  # read model_year

# pass the values in the same column order that was used for training
mpg_predict = lr.predict([[cylinders_1, displacement_1, weight_1, acceleration_1, model_year_1]])
print("The expected fuel economy (MPG) of this car is %.2f." % mpg_predict[0])  # predict returns an array, so take its first element
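Since the prediction depends on the values arriving in the training column order, it can help to assemble the input row in one place. The sketch below is one possible approach, not part of the original code: `FEATURE_ORDER` and `build_feature_row` are hypothetical names, and the order is assumed to match `X.columns` after the drops above.

```python
# Assumed to match X.columns after dropping car_name, origin, horsepower, and mpg
FEATURE_ORDER = ['cylinders', 'displacement', 'weight', 'acceleration', 'model_year']

def build_feature_row(raw_values):
    """Convert a dict of raw strings into a single-row 2D list in training order."""
    row = []
    for name in FEATURE_ORDER:
        if name not in raw_values:
            raise ValueError(f"missing value for {name}")
        try:
            row.append(float(raw_values[name]))
        except ValueError:
            raise ValueError(f"{name} must be a number, got {raw_values[name]!r}") from None
    return [row]

# Keys may arrive in any order; the row always follows FEATURE_ORDER
demo_row = build_feature_row({
    'weight': '2500', 'model_year': '80', 'cylinders': '4',
    'acceleration': '15.5', 'displacement': '120',
})
print(demo_row)  # [[4.0, 120.0, 2500.0, 15.5, 80.0]]
```

The returned list has the two-dimensional shape that `lr.predict` expects, so it could be passed to the model directly.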

Source: these notes were written while studying the book 데이터 과학 기반의 파이썬 빅데이터 분석 (by 이지은).
