Scikit-Learn: Complete Code Analysis

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from sklearn.metrics import r2_score

Import the required modules.

# #############################################################################
# Generate some sparse data to play with
np.random.seed(42)

n_samples, n_features = 50, 100
X = np.random.randn(n_samples, n_features)

# Decreasing coef w. alternated signs for visualization
idx = np.arange(n_features)
coef = (-1) ** idx * np.exp(-idx / 10)
coef[10:] = 0  # sparsify coef
y = np.dot(X, coef)

# Add noise
y += 0.01 * np.random.normal(size=n_samples)

# Split data in train set and test set
n_samples = X.shape[0]
X_train, y_train = X[:n_samples // 2], y[:n_samples // 2]
X_test, y_test = X[n_samples // 2:], y[n_samples // 2:]

This is the first part of the code. Let's dig into it one piece at a time.

# #############################################################################
# Generate some sparse data to play with
np.random.seed(42)

np.random generates random numbers.

Calling np.random.seed with a non-negative integer fixes the generator's starting state, so the same random numbers come out on every subsequent run.
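As a small illustration of what seeding does, the sketch below draws three values, resets the seed, and draws again. The names first and second are hypothetical and are not part of the original example.

```python
import numpy as np

# Re-seeding with the same integer restarts the generator from the same state.
np.random.seed(42)
first = np.random.randn(3)

np.random.seed(42)
second = np.random.randn(3)

# Both draws are identical because the seed was reset in between.
print(np.array_equal(first, second))
```

Without the second call to np.random.seed, the generator would simply continue its sequence and the two draws would differ.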

n_samples, n_features = 50, 100
n_samples, n_features

This assigns 50 to n_samples and 100 to n_features.

X = np.random.randn(n_samples, n_features)
print(X)
print(X.shape)

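np.random.randn(d0, d1, ...) returns random values drawn from the standard normal distribution. The arguments determine the shape of the returned array.

The values of X are shown in Figure 2. Because n_samples and n_features are 50 and 100, X is a 50 x 100 matrix.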
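To check the shape claim independently, a minimal sketch is shown below. It uses a hypothetical array named demo so that it does not overwrite X, and it prints the sample mean and standard deviation, which should land near 0 and 1 for standard normal draws.

```python
import numpy as np

np.random.seed(42)
demo = np.random.randn(50, 100)  # 50 rows (samples), 100 columns (features)

print(demo.shape)                # (50, 100)
print(demo.mean(), demo.std())   # close to 0 and 1, but not exactly
```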

# Decreasing coef w. alternated signs for visualization
idx = np.arange(n_features)
idx

n_features is 100, so np.arange returns the integers 0 through 99 as an array, which is assigned to idx.

coef = (-1) ** idx * np.exp(-idx / 10)
print(coef)
print(coef.shape)

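This sets the coef values: (-1) ** idx alternates the sign, and np.exp(-idx / 10) makes the magnitude shrink as idx grows.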
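The expression is easier to follow when split into its two factors. The sketch below does this for the first five indices only; signs, magnitudes, and coef_demo are hypothetical names introduced for illustration.

```python
import numpy as np

idx = np.arange(5)                # array([0, 1, 2, 3, 4])
signs = (-1) ** idx               # array([ 1, -1,  1, -1,  1])
magnitudes = np.exp(-idx / 10)    # starts at 1.0 and decreases with idx
coef_demo = signs * magnitudes    # same value as (-1) ** idx * np.exp(-idx / 10)

print(coef_demo)
```

Because ** binds more tightly than *, the original one-line expression evaluates the power first and then multiplies, which is exactly what the split version does.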

coef[10:] = 0  # sparsify coef
coef

This makes coef sparse: every entry from index 10 onward is set to 0, so only the first 10 coefficients remain nonzero.
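One way to confirm the sparsity is to count the nonzero entries after the slice assignment. This is a minimal sketch that rebuilds the coefficients under the hypothetical name coef_demo.

```python
import numpy as np

idx = np.arange(100)
coef_demo = (-1) ** idx * np.exp(-idx / 10)
coef_demo[10:] = 0  # keep only the first 10 coefficients

print(np.count_nonzero(coef_demo))   # 10
print(np.flatnonzero(coef_demo))     # [0 1 2 3 4 5 6 7 8 9]
```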

y = np.dot(X, coef)
print(y)
print(y.shape)

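This creates y. What np.dot computes depends on the dimensions of its arguments.

Here X is 2-D with shape (50, 100) and coef is 1-D with shape (100,), so the result is the matrix-vector product, a 1-D array of 50 values.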
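The dependence on dimensions can be seen with a tiny example. The arrays A and v below are made up for illustration and are unrelated to X and coef.

```python
import numpy as np

A = np.arange(6).reshape(2, 3)   # 2-D, shape (2, 3): [[0, 1, 2], [3, 4, 5]]
v = np.array([1, 0, -1])         # 1-D, shape (3,)

print(np.dot(v, v))              # 1-D with 1-D -> scalar inner product: 2
print(np.dot(A, v))              # 2-D with 1-D -> matrix-vector product: [-2 -2]
print(np.dot(A, A.T).shape)      # 2-D with 2-D -> matrix product: (2, 2)
```

The call np.dot(X, coef) in the example is the second case: a 2-D array times a 1-D array.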

# Add noise
y += 0.01 * np.random.normal(size=n_samples)
y

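Noise is added.

np.random.normal draws random samples from a normal distribution. Its default parameters (loc=0.0, scale=1.0) give the standard normal distribution.

Here 50 samples are drawn to match the length of y, and each is scaled by 0.01.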
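Scaling standard normal samples by 0.01 should be equivalent to drawing directly with scale=0.01. The sketch below checks this with the same seed; noise and same_noise are hypothetical names, and the seed value 0 is arbitrary.

```python
import numpy as np

np.random.seed(0)
noise = 0.01 * np.random.normal(size=50)   # loc=0.0, scale=1.0 by default

np.random.seed(0)
same_noise = np.random.normal(loc=0.0, scale=0.01, size=50)

print(noise.shape)                      # (50,)
print(np.allclose(noise, same_noise))
```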

# Split data in train set and test set
n_samples = X.shape[0]
n_samples

Now the data is split into a train set and a test set.

X.shape is (50, 100), and its first element is assigned to n_samples, so n_samples is 50.

X_train, y_train = X[:n_samples // 2], y[:n_samples // 2]
X_test, y_test = X[n_samples // 2:], y[n_samples // 2:]

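X_train, y_train, X_test, and y_test are split by the expressions above with no overlap: the first 25 rows go to the train set and the remaining 25 rows go to the test set.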
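The no-overlap property follows from slicing at the same midpoint on both sides. A minimal sketch, using row indices instead of the real data (rows, half, train_rows, and test_rows are hypothetical names):

```python
import numpy as np

n = 50
rows = np.arange(n)
half = n // 2   # floor division gives the int 25, which is valid as a slice bound

train_rows, test_rows = rows[:half], rows[half:]

print(len(train_rows), len(test_rows))              # 25 25
print(np.intersect1d(train_rows, test_rows).size)   # 0
```

Using // rather than / matters here, because / would produce the float 25.0, which cannot be used as a slice bound.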

# #############################################################################
# Lasso
from sklearn.linear_model import Lasso

alpha = 0.1
lasso = Lasso(alpha=alpha)

y_pred_lasso = lasso.fit(X_train, y_train).predict(X_test)
r2_score_lasso = r2_score(y_test, y_pred_lasso)
print(lasso)
print("r^2 on test data : %f" % r2_score_lasso)

This is the Lasso code.

# #############################################################################
# Lasso
from sklearn.linear_model import Lasso

Import the required model.

alpha = 0.1
lasso = Lasso(alpha=alpha)

The description of the Lasso model is as shown above.

Here only the $\alpha$ value is adjusted.
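To get a feel for what $\alpha$ controls, the sketch below fits Lasso with three different values on a small synthetic problem and counts the nonzero coefficients. The data (X_demo, y_demo, true_coef), the chosen alpha values, and max_iter=10000 are all assumptions made for this illustration, not part of the original example.

```python
import numpy as np
from sklearn.linear_model import Lasso

np.random.seed(42)
X_demo = np.random.randn(25, 100)
true_coef = np.zeros(100)
true_coef[:10] = (-1) ** np.arange(10) * np.exp(-np.arange(10) / 10)
y_demo = X_demo @ true_coef

nonzero_counts = {}
for a in (0.01, 0.1, 100.0):
    # alpha scales the L1 penalty; a larger alpha shrinks coefficients harder.
    model = Lasso(alpha=a, max_iter=10000).fit(X_demo, y_demo)
    nonzero_counts[a] = np.count_nonzero(model.coef_)

print(nonzero_counts)
```

With a very large alpha the penalty dominates and every coefficient is driven to zero, while a small alpha lets more coefficients stay nonzero.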

y_pred_lasso = lasso.fit(X_train, y_train).predict(X_test)
r2_score_lasso = r2_score(y_test, y_pred_lasso)
print(lasso)
print("r^2 on test data : %f" % r2_score_lasso)

The Lasso model created above is fit to the X_train, y_train data, and then the new input X_test is passed in to predict y.

$R^{2}$ is computed as well. The value came out to 0.658064.
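For reference, r2_score computes $1 - SS_{res} / SS_{tot}$. The sketch below reproduces that by hand on a few made-up numbers (y_true and y_hat are hypothetical and unrelated to the Lasso result above).

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_hat = np.array([2.5, 0.0, 2.0, 8.0])

ss_res = np.sum((y_true - y_hat) ** 2)          # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
r2_manual = 1 - ss_res / ss_tot

print(r2_manual, r2_score(y_true, y_hat))
```

Note the argument order in r2_score: the true values come first and the predictions second, as in the original code.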

# #############################################################################
# ElasticNet
from sklearn.linear_model import ElasticNet

enet = ElasticNet(alpha=alpha, l1_ratio=0.7)

y_pred_enet = enet.fit(X_train, y_train).predict(X_test)
r2_score_enet = r2_score(y_test, y_pred_enet)
print(enet)
print("r^2 on test data : %f" % r2_score_enet)

Next, let's analyze the ElasticNet code.

# #############################################################################
# ElasticNet
from sklearn.linear_model import ElasticNet

Import the required model.

enet = ElasticNet(alpha=alpha, l1_ratio=0.7)

The description of the ElasticNet model is as shown above.

Here only the $\alpha$ value and the l1_ratio value are adjusted.
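l1_ratio mixes the L1 and L2 penalties, and with l1_ratio=1.0 the penalty is pure L1, which is the Lasso penalty. The sketch below checks that on made-up data; X_demo, y_demo, and the three model names are hypothetical.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso

rng = np.random.RandomState(0)
X_demo = rng.randn(25, 100)
y_demo = X_demo[:, :3] @ np.array([1.0, -0.9, 0.8])

# Penalty: alpha * l1_ratio * ||w||_1 + 0.5 * alpha * (1 - l1_ratio) * ||w||_2^2
lasso_demo = Lasso(alpha=0.1).fit(X_demo, y_demo)
enet_l1 = ElasticNet(alpha=0.1, l1_ratio=1.0).fit(X_demo, y_demo)
enet_mix = ElasticNet(alpha=0.1, l1_ratio=0.7).fit(X_demo, y_demo)

print(np.allclose(lasso_demo.coef_, enet_l1.coef_))
print(np.count_nonzero(lasso_demo.coef_), np.count_nonzero(enet_mix.coef_))
```

With l1_ratio=0.7, as in the original code, 70% of the penalty weight goes to the L1 term and the rest to the L2 term.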

y_pred_enet = enet.fit(X_train, y_train).predict(X_test)
r2_score_enet = r2_score(y_test, y_pred_enet)
print(enet)
print("r^2 on test data : %f" % r2_score_enet)

The ElasticNet model created above is fit to the X_train, y_train data, and then the new input X_test is passed in to predict y.

$R^{2}$ is computed as well. The value came out to 0.642515.

m, s, _ = plt.stem(np.where(enet.coef_)[0], enet.coef_[enet.coef_ != 0],
                   markerfmt='x', label='Elastic net coefficients')
plt.setp([m, s], color="#2ca02c")
m, s, _ = plt.stem(np.where(lasso.coef_)[0], lasso.coef_[lasso.coef_ != 0],
                   markerfmt='x', label='Lasso coefficients')
plt.setp([m, s], color='#ff7f0e')
plt.stem(np.where(coef)[0], coef[coef != 0], label='true coefficients',
         markerfmt='bx')

plt.legend(loc='best')
plt.title("Lasso $R^2$: %.3f, Elastic Net $R^2$: %.3f"
          % (r2_score_lasso, r2_score_enet))
plt.show()

Now let's plot the results.

m, s, _ = plt.stem(np.where(enet.coef_)[0], enet.coef_[enet.coef_ != 0],
                   markerfmt='x', label='Elastic net coefficients')
plt.setp([m, s], color="#2ca02c")

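plt.stem is similar to plt.bar, but the stems have no width.

np.where finds the indices that satisfy a condition. np.where(enet.coef_)[0] finds the indices where enet.coef_ is nonzero. Because np.where returns a tuple of index arrays, [0] is added to take the array itself out of the tuple.

enet.coef_[enet.coef_ != 0] selects only the values of enet.coef_ that are nonzero.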
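The two indexing idioms can be tried on a short array. The array w below is made up for illustration and stands in for enet.coef_.

```python
import numpy as np

w = np.array([0.0, 1.5, 0.0, -0.3])

print(np.where(w))       # a 1-tuple holding one index array
print(np.where(w)[0])    # [1 3]  -> positions of the nonzero entries
print(w[w != 0])         # [ 1.5 -0.3]  -> the nonzero values themselves
```

The positions and the values line up one to one, which is why they can be passed to plt.stem as the x and y arguments.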

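markerfmt is the format of the marker drawn at the head of each stem. Here it is set to an x.

m is the markerline, s is the stemlines, and _ is the baseline.

setp() is used for finer adjustments such as line width and color. Here it sets the color of m and s.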
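The unpacking works because plt.stem returns a container of three parts. The sketch below spells the names out in full. It selects the Agg backend so that it can run without a display, which is an assumption for this standalone sketch; in a notebook that already uses %matplotlib inline, those first two lines should be dropped. The array w is made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend for a standalone run
import matplotlib.pyplot as plt
import numpy as np

w = np.array([0.0, 1.5, 0.0, -0.3])
container = plt.stem(np.where(w)[0], w[w != 0], markerfmt='x')

# The container unpacks into the same three parts as m, s, _ in the example.
markerline, stemlines, baseline = container

# setp applies the same property to every artist in the list.
plt.setp([markerline, stemlines], color="#2ca02c")
print(type(container).__name__)
```

The baseline is left out of the setp call, so it keeps its default color, just as in the original code.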

m, s, _ = plt.stem(np.where(lasso.coef_)[0], lasso.coef_[lasso.coef_ != 0],
                   markerfmt='x', label='Lasso coefficients')
plt.setp([m, s], color='#ff7f0e')

The same is done for lasso; only the color of m and s is different.

plt.stem(np.where(coef)[0], coef[coef != 0], label='true coefficients',
         markerfmt='bx')

Finally, a plt.stem plot is drawn from the coef that was defined at the very beginning.

plt.legend(loc='best')
plt.title("Lasso $R^2$: %.3f, Elastic Net $R^2$: %.3f"
          % (r2_score_lasso, r2_score_enet))
plt.show()

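plt.legend displays the legend. With loc='best', the legend is placed automatically at a suitable position so that it does not cover the graph.

plt.title sets the title.

Calling plt.show() displays the figure, and the plot is complete.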

The link to the original code is attached below.

https://scikit-learn.org/stable/auto_examples/linear_model/plot_lasso_and_elasticnet.html#sphx-glr-auto-examples-linear-model-plot-lasso-and-elasticnet-py

Lasso and Elastic Net for Sparse Signals — scikit-learn 0.21.3 documentation


인문계공돌이

Scikit-Learn: Complete Code Analysis - Linear Regression Lasso