Scikit-Learn Code, Fully Analyzed - Linear Regression Ridge

# Author: Fabian Pedregosa -- <fabian.pedregosa@inria.fr>
# License: BSD 3 clause

The source and copyright of the code are as stated above.

import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model
%matplotlib inline

Import the required modules.

# X is the 10x10 Hilbert matrix
X = 1. / (np.arange(1, 11) + np.arange(0, 10)[:, np.newaxis])
y = np.ones(10)

This builds the Hilbert matrix.

The Hilbert matrix is defined as $H_{ij} = \frac{1}{i + j - 1}$, with $i$ and $j$ counted from 1.
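As a quick sanity check on the definition, the sketch below rebuilds the matrix entry by entry from the formula and compares it with the one-line NumPy version. The names n and H are introduced here for illustration only. It also prints the condition number, which is very large for a Hilbert matrix; that near-collinearity is what makes the ridge example interesting.

```python
import numpy as np

n = 10  # illustrative name; the original hard-codes 10

# One-line version from the example
X = 1. / (np.arange(1, n + 1) + np.arange(0, n)[:, np.newaxis])

# Entry-by-entry version straight from H_ij = 1 / (i + j - 1), i and j starting at 1
H = np.array([[1.0 / (i + j - 1) for j in range(1, n + 1)]
              for i in range(1, n + 1)])

print(np.allclose(X, H))   # the two constructions agree
cond = np.linalg.cond(X)   # very large: the columns are nearly collinear
print(cond)
```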

np.arange(1, 11)

np.arange takes the form numpy.arange(start, stop, step, dtype=None).

Here it returns an array of the integers from 1 through 10 (the stop value 11 is excluded).
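A few small calls make the start, stop and step arguments concrete. The variable names a, b and c are only for illustration.

```python
import numpy as np

a = np.arange(1, 11)      # 1, 2, ..., 10 (stop is excluded)
b = np.arange(0, 10, 2)   # step of 2: 0, 2, 4, 6, 8
c = np.arange(3)          # start defaults to 0: 0, 1, 2

print(a, b, c)
```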

np.arange(0, 10)[:, np.newaxis]

np.newaxis adds a dimension to the values 0 through 9, giving a (10 x 1) column.

np.arange(1, 11) + np.arange(0, 10)[:, np.newaxis]

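Adding the two arrays broadcasts them into a (10 x 10) array whose entries are i + j - 1. Taking the reciprocal, 1. / (...), as in the original line then gives the (10 x 10) Hilbert matrix.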
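The same broadcasting step is easier to follow at a smaller size. The sketch below uses a 3 x 3 case; row, col, denom and H3 are names made up for this illustration.

```python
import numpy as np

row = np.arange(1, 4)                  # shape (3,):   [1, 2, 3]
col = np.arange(0, 3)[:, np.newaxis]   # shape (3, 1): [[0], [1], [2]]

# Broadcasting stretches both operands to shape (3, 3) before adding
denom = row + col
print(denom)
# [[1 2 3]
#  [2 3 4]
#  [3 4 5]]

H3 = 1.0 / denom   # the 3 x 3 Hilbert matrix
print(H3.shape)    # (3, 3)
```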

y = np.ones(10)
y

np.ones(10) returns an array of length 10 filled with ones.

The default dtype is float64, so the values are displayed with a decimal point.

# #############################################################################
# Compute paths

n_alphas = 200
alphas = np.logspace(-10, -2, n_alphas)

numpy.logspace returns an array of values evenly spaced on a log scale.

The first two arguments are exponents of base 10, so this generates 200 samples running from 10^-10 to 10^-2.
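To see what the arguments mean, the sketch below compares np.logspace with raising 10 to evenly spaced exponents from np.linspace. The names same and ratios are introduced here for illustration.

```python
import numpy as np

n_alphas = 200
alphas = np.logspace(-10, -2, n_alphas)

# Equivalent construction: evenly spaced exponents, then 10 ** exponent
same = 10.0 ** np.linspace(-10, -2, n_alphas)
print(np.allclose(alphas, same))

# The endpoints are 10^-10 and 10^-2, not -10 and -2
print(alphas[0], alphas[-1])

# Neighbouring values differ by a constant factor rather than a constant step
ratios = alphas[1:] / alphas[:-1]
print(np.allclose(ratios, ratios[0]))
```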

alphas

coefs = []
for a in alphas:
    ridge = linear_model.Ridge(alpha=a, fit_intercept=False)
    ridge.fit(X, y)
    coefs.append(ridge.coef_)

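coefs = [] creates an empty list.

alphas holds 200 values. Each one is assigned to a in turn and used to build a linear_model.Ridge(alpha=a, fit_intercept=False) model.

The intercept is not relevant to this example, so fit_intercept is set to False.

The model is then fit to X and y, and the resulting ridge.coef_ is appended to the coefs list.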
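For intuition about what each fit computes, here is a minimal NumPy sketch. It assumes the standard ridge objective without an intercept, ||y - Xw||^2 + alpha * ||w||^2, whose minimizer solves (X^T X + alpha I) w = X^T y. The helper ridge_closed_form is hypothetical and is not part of scikit-learn; the library's own solver may take a different numerical route, so results should agree only up to solver tolerance.

```python
import numpy as np

def ridge_closed_form(X, y, alpha):
    """Hypothetical helper: solve (X^T X + alpha * I) w = X^T y for w."""
    n_features = X.shape[1]
    A = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)

X = 1. / (np.arange(1, 11) + np.arange(0, 10)[:, np.newaxis])
y = np.ones(10)

alpha = 1e-2
w = ridge_closed_form(X, y, alpha)

# The solution should satisfy the regularized normal equations
residual = (X.T @ X + alpha * np.eye(10)) @ w - X.T @ y
print(w.shape, np.abs(residual).max())
```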

coefs

This is coefs after the loop.

X has 10 features and one coefficient is computed for each, so every array holds 10 values.

Since there are 200 alpha values, the list ends up with 200 such arrays.

# #############################################################################
# Display results

ax = plt.gca()

ax.plot(alphas, coefs)
ax.set_xscale('log')
ax.set_xlim(ax.get_xlim()[::-1])  # reverse axis
plt.xlabel('alpha')
plt.ylabel('weights')
plt.title('Ridge coefficients as a function of the regularization')
plt.axis('tight')
plt.show()

Now let's plot the results.

ax = plt.gca()

plt.gca() returns the current Axes of the figure.

ax.plot(alphas, coefs)
ax.set_xscale('log')

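ax.plot draws line graphs. Because coefs holds 10 values for each alpha, it draws one line per coefficient against alphas.

The alphas are spaced on a log scale, which a default linear x-axis does not match,

so ax.set_xscale('log') switches the x-axis to a log scale.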
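The sketch below isolates the plotting calls on stand-in data, including the ax.set_xlim(ax.get_xlim()[::-1]) line from the full listing, which reverses the x-axis. The array Y is made-up data with the same (200, 10) shape as the coefficient path, and the Agg backend is an assumption added so the sketch runs outside a notebook.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so this runs without a display
import matplotlib.pyplot as plt

x = np.logspace(-10, -2, 200)
# Stand-in data (not ridge coefficients): 200 rows, 10 columns
Y = np.outer(np.log10(x), np.arange(1, 11))

fig, ax = plt.subplots()
lines = ax.plot(x, Y)     # a 2-D y draws one line per column
print(len(lines))         # 10

ax.set_xscale('log')      # evenly space the powers of 10 along the x-axis

left, right = ax.get_xlim()
ax.set_xlim(ax.get_xlim()[::-1])   # swap the limits to reverse the axis
new_left, new_right = ax.get_xlim()
print(new_left > new_right)        # the axis now runs from large to small
```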

plt.xlabel('alpha')
plt.ylabel('weights')
plt.title('Ridge coefficients as a function of the regularization')
plt.axis('tight')
plt.show()

plt.xlabel and plt.ylabel set the labels of the x- and y-axes, here 'alpha' and 'weights'.

plt.title sets the title of the graph.

plt.axis('tight') sets the x- and y-axis limits just wide enough to show all the data.

plt.show displays the graph.

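The graph shows 10 lines, one per coefficient, and as $\alpha$ grows the coefficients shrink toward 0. Because the x-axis is reversed, larger $\alpha$ is on the left.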
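One way to see why the coefficients shrink is through the singular value decomposition. Assuming the standard ridge objective without an intercept, the solution can be written as w(alpha) = V diag(s / (s^2 + alpha)) U^T y, so every component is scaled by a factor that decreases as alpha grows. The sketch below is a NumPy illustration of that formula, not the scikit-learn implementation; path and norms are names introduced here.

```python
import numpy as np

X = 1. / (np.arange(1, 11) + np.arange(0, 10)[:, np.newaxis])
y = np.ones(10)
alphas = np.logspace(-10, -2, 200)

# X = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(X, full_matrices=False)
Uty = U.T @ y

# One row per alpha, one column per coefficient
path = np.array([Vt.T @ (s / (s**2 + a) * Uty) for a in alphas])
print(path.shape)   # (200, 10)

# Overall size of the coefficient vector at the smallest and largest alpha
norms = np.linalg.norm(path, axis=1)
print(norms[0], norms[-1])
```

The first printed norm (smallest alpha) should be far larger than the last (largest alpha), which matches the lines converging toward 0 in the plot.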

The link to the original code is below.

https://scikit-learn.org/stable/auto_examples/linear_model/plot_ridge_path.html#sphx-glr-auto-examples-linear-model-plot-ridge-path-py

Plot Ridge coefficients as a function of the regularization — scikit-learn 0.21.3 documentation



인문계공돌이
