Precision / Recall Trade-off

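When the task calls for particular emphasis on either precision or recall, you can raise that metric by adjusting the classifier's decision threshold.

The two are complementary metrics, so raising one tends to lower the other. This is called the trade-off.

The lower the threshold, the more samples are predicted as positive, so recall increases.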
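To make the trade-off concrete, here is a minimal sketch in plain Python. The labels and probabilities are made up for illustration and do not come from the post's dataset; `precision_recall` is a hypothetical helper, not a scikit-learn function.

```python
# Hypothetical true labels and positive-class probabilities (made up for illustration).
y_true = [1, 1, 1, 0, 1, 0, 0, 0]
proba = [0.9, 0.7, 0.4, 0.6, 0.25, 0.3, 0.22, 0.05]

def precision_recall(y_true, proba, threshold):
    """Label a sample positive when its probability exceeds the threshold."""
    y_pred = [1 if p > threshold else 0 for p in proba]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

p_high, r_high = precision_recall(y_true, proba, 0.5)
p_low, r_low = precision_recall(y_true, proba, 0.2)
print(f"threshold=0.5 precision={p_high:.2f} recall={r_high:.2f}")
print(f"threshold=0.2 precision={p_low:.2f} recall={r_low:.2f}")
```

With this toy data, dropping the threshold from 0.5 to 0.2 catches every positive (recall goes up) but also lets in more false positives (precision goes down).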

The predict_proba() method returns the predicted probability of each class, which is what the classification decision is based on.

# Precision / recall trade-off

- predict_proba(): a function that returns the probability of each predicted label

pred_pro_result = lr_model.predict_proba(X_test)

pred_pro_result

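Each row holds the predicted probabilities that the sample belongs to class 0 and to class 1. The class with the higher probability becomes the predicted label.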

print('shape',pred_pro_result.shape)

print('result\n', pred_pro_result[:4])



print()

print("*"*50)

print()

y_pred = lr_model.predict(X_test)

print(y_pred)

The first column is the probability of negative (0), and the second column is the probability of positive (1).

result = np.concatenate([pred_pro_result, y_pred.reshape(-1,1)],axis=1)

print('Predictions by probability\n', result[:5])

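The result shows the probabilities next to the resulting 0/1 prediction: when the left column is higher the prediction is 0, and when the right column is higher it is 1.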
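The relationship between predict_proba() and predict() can be checked directly. The sketch below uses synthetic data from make_classification and a freshly fitted model named `demo_model`; both are assumptions standing in for the post's own `X_test` and `lr_model`, which are not reproduced here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data; not the dataset used in the post.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
demo_model = LogisticRegression().fit(X, y)

proba = demo_model.predict_proba(X)  # shape (200, 2); columns follow demo_model.classes_
pred = demo_model.predict(X)

# Each row holds P(class 0) and P(class 1), so every row sums to 1.
rows_sum_to_one = np.allclose(proba.sum(axis=1), 1.0)

# predict() picks the class with the larger probability,
# which in the binary case amounts to a 0.5 cut-off on the positive column.
argmax_pred = demo_model.classes_[np.argmax(proba, axis=1)]
same_as_predict = np.array_equal(argmax_pred, pred)

print(demo_model.classes_, rows_sum_to_one, same_as_predict)
```

Because predict() is effectively fixed at 0.5, changing the threshold means working from the predict_proba() output yourself.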

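- Binarizer class: fit_transform()

Binarizer performs binary variable conversion: it turns a continuous variable into a 0/1 variable based on a threshold. Values greater than the threshold become 1, and all other values become 0.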
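A tiny example makes the cut-off behavior easy to see. The values in `sample_proba` are made up for illustration; note that a value exactly equal to the threshold maps to 0.

```python
import numpy as np
from sklearn.preprocessing import Binarizer

# Made-up probabilities, shaped as a column because Binarizer expects 2-D input.
sample_proba = np.array([[0.10], [0.50], [0.51], [0.90]])

# Values strictly greater than the threshold become 1; the rest become 0.
binarized = Binarizer(threshold=0.5).fit_transform(sample_proba)
print(binarized.ravel())  # [0. 0. 1. 1.]
```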

user_threshold = 0.5

pred_pro_result[:,1].reshape(-1,1)

from sklearn.preprocessing import Binarizer



- Lowering the threshold raises recall and lowers precision.



user_threshold = 0.5

positive_pred_proba = pred_pro_result[:,1].reshape(-1,1)

user_predict = Binarizer(threshold=user_threshold).fit(positive_pred_proba).transform(positive_pred_proba)

display_eval(y_test, user_predict)

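Here the Binarizer class is fit on positive_pred_proba. Binarizer is stateless, so fit() learns nothing from the data; it is called only to follow the usual scikit-learn transformer pattern.

transform() then converts each probability to 0 or 1 using the binarization cut-off (threshold).

You can see that the evaluation results change with the threshold value.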
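The display_eval() helper is not defined in this section. As a rough idea of what such a helper might look like, here is a hypothetical stand-in named `display_eval_sketch` that prints the confusion matrix and the usual binary metrics. The name, the chosen metrics, and the output format are all assumptions, not the author's actual implementation.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

def display_eval_sketch(y_true, y_pred):
    """Hypothetical stand-in: print common binary classification metrics."""
    metrics = {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
    }
    print(confusion_matrix(y_true, y_pred))
    print(", ".join(f"{name}: {value:.4f}" for name, value in metrics.items()))
    return metrics

# Made-up labels: 3 of 4 positives found, plus 1 false positive.
demo_metrics = display_eval_sketch([1, 1, 1, 1, 0, 0], [1, 1, 1, 0, 1, 0])
```

Returning the metrics as a dict as well as printing them makes it easier to compare runs at different thresholds.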

user_threshold = 0.2

positive_pred_proba = pred_pro_result[:,1].reshape(-1,1)

user_predict = Binarizer(threshold=user_threshold).fit(positive_pred_proba).transform(positive_pred_proba)

display_eval(y_test, user_predict)

Lowering the threshold raises recall and lowers precision.

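- precision_recall_curve(true labels, predicted probabilities)

=> Returns the precision and recall values, together with the thresholds at which they were computed.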

from sklearn.metrics import precision_recall_curve



# Extract the predicted probability for label 1

pred_positive_label = lr_model.predict_proba(X_test)[:,1]

# print(pred_positive_label)

precisions, recalls, thresholds = precision_recall_curve(y_test, pred_positive_label)

print('precisions:\n', precisions)

print('recalls:\n', recalls)

print('thresholds:\n', thresholds)
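One detail worth checking is the length of the three returned arrays. The sketch below uses made-up labels and scores (`y_demo`, `scores_demo`) only to inspect the output shapes.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Made-up labels and positive-class scores, used only to inspect the output shapes.
y_demo = np.array([0, 0, 1, 1])
scores_demo = np.array([0.1, 0.4, 0.35, 0.8])

prec, rec, thr = precision_recall_curve(y_demo, scores_demo)

# precision and recall carry one extra trailing element (precision=1, recall=0)
# that has no matching threshold.
print(len(prec), len(rec), len(thr))
print(prec[-1], rec[-1])
```

Since the precision and recall arrays are one element longer than the thresholds array, they have to be sliced to the thresholds' length before being plotted against it.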

- Visualization (how precision and recall change with the threshold)

import matplotlib.pyplot as plt

%matplotlib inline





precisions, recalls, thresholds = precision_recall_curve(y_test, pred_positive_label)



plt.figure(figsize=(15,5))



plt.plot(thresholds, precisions[0:thresholds.shape[0]], linestyle='--', label='precision')

plt.plot(thresholds,recalls[0:thresholds.shape[0]],label='recall')

plt.xlabel('threshold ratio')

plt.ylabel('precision and recall value')

plt.legend()

plt.grid()

plt.show()

