Machine Learning for Asset Managers Ch 6. Feature Importance Analysis (1)

LunaMooN 2021. 11. 18. 23:20

6.1 Motivation

6.2 p-Values

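Drawbacks of p-values
- They rely on strong distributional assumptions; when those fail, Type I and Type II errors become likely
- They break down under multicollinearity: the significance of correlated features cannot be identified reliably
- Given a null hypothesis and an estimate, a p-value is the probability of obtaining a value equal to or more extreme than the estimate. What we actually care about is the probability that the null hypothesis is true given the observed estimate
- They assess significance in-sample only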

6.3 Feature Importance

6.3.1 Mean-Decrease Impurity(MDI)

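Applies to tree-based algorithms
Consider N samples and F features ${\lbrace X_f \rbrace}_{f=1,...,F}$
Purity
- At a node, samples whose value of the chosen feature is below the threshold $\tau$ go to the left branch and the rest go to the right branch
- A branch whose samples all carry the same label is pure; a branch whose labels are uniformly distributed is maximally impure
- The amount by which a split lowers impurity measures the information gained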

$\Delta g[t,f] = i[t] - \frac{N^{(0)}_t}{N_t} i[t^{(0)}] - \frac{N^{(1)}_t}{N_t} i[t^{(1)}]$
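- $i[t]$: the impurity of labels at node $t$ (before the split)
- $i[t^{(0)}]$: the impurity of labels in the left sample; $i[t^{(1)}]$: in the right sample
- $N_t$, $N^{(0)}_t$, $N^{(1)}_t$: the number of samples at node $t$, in the left sample, and in the right sample
- $\Delta g[t,f]$: the impurity before the split minus the impurity after it, where the left and right impurities are weighted by their share of samples
  - At each node, the feature $f$ with the largest decrease is selected
  - A feature's importance is the weighted sum of $\Delta g[t,f]$ over all nodes where that feature was selected
  - The importances of all features sum to 1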
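
The impurity decrease for a single split can be sketched in a few lines. This is only an illustration: the notes do not say which impurity measure $i[t]$ is used, so Gini impurity is assumed here, and the function names `gini` and `impurity_decrease` are hypothetical. Samples with a feature value at or below the threshold are sent left.

```python
from collections import Counter

def gini(labels):
    """Gini impurity (assumed measure): 0 for a pure node, 1 - 1/K for K equally frequent classes."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def impurity_decrease(values, labels, tau):
    """Delta g[t, f] for splitting one node on one feature at threshold tau."""
    left = [y for x, y in zip(values, labels) if x <= tau]
    right = [y for x, y in zip(values, labels) if x > tau]
    n = len(labels)
    # Parent impurity minus the sample-weighted impurity of the two children
    return gini(labels) - len(left) / n * gini(left) - len(right) / n * gini(right)

# Toy node: the feature separates the two classes perfectly at tau = 0.5
x = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
y = [0, 0, 0, 1, 1, 1]
print(impurity_decrease(x, y, tau=0.5))   # both children are pure
print(impurity_decrease(x, y, tau=0.15))  # a worse threshold gains less
```

A tree would evaluate this quantity for every candidate feature and threshold at a node and keep the split with the largest value.
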
How MDI improves on p-values
- It circumvents the need for strong distributional assumptions
- It is derived from a bootstrap of trees
  - This reduces the probability of false positives caused by overfitting
- Tree-based classifiers do not estimate coefficients, so the probability of a particular null hypothesis is not needed
However, like p-values, MDI is computed in-sample only

6.3.2 Mean-Decrease Accuracy(MDA)

Method
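- Step 1: fit the model and compute its cross-validated performance
- Step 2: shuffle the observations of one feature and compute the cross-validated performance of the same model again
- Step 3: compare the performance before and after shuffling
  - If the feature is important, performance drops markedly after shuffling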

MDA is reported as the average of multiple estimates, one per fold of a k-fold cross-validation
If the features are not independent, MDA underestimates their importance
- For example, if two identical features are both important, MDA may judge both unimportant: the effect of shuffling one is offset by the other, unshuffled copy
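
The three MDA steps can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the book's implementation: a toy nearest-centroid classifier stands in for the model, accuracy is the score, folds are formed by index modulo `n_splits`, and all names (`fit_centroids`, `accuracy`, `mda`) are hypothetical.

```python
import random

def fit_centroids(X, y):
    """Toy classifier (assumption): store the per-class mean of every feature."""
    cents = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        cents[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return cents

def accuracy(cents, X, y):
    """Share of rows whose nearest centroid carries the true label."""
    hits = 0
    for x, t in zip(X, y):
        pred = min(cents, key=lambda k: sum((a - b) ** 2 for a, b in zip(x, cents[k])))
        hits += pred == t
    return hits / len(y)

def mda(X, y, n_splits=5, seed=0):
    """Mean decrease in out-of-fold accuracy after shuffling each feature."""
    rng = random.Random(seed)
    n, n_feat = len(X), len(X[0])
    drops = [[] for _ in range(n_feat)]
    for fold in range(n_splits):
        test_idx = [i for i in range(n) if i % n_splits == fold]
        train_idx = [i for i in range(n) if i % n_splits != fold]
        cents = fit_centroids([X[i] for i in train_idx], [y[i] for i in train_idx])
        X_te = [X[i] for i in test_idx]
        y_te = [y[i] for i in test_idx]
        base = accuracy(cents, X_te, y_te)        # Step 1: score before shuffling
        for f in range(n_feat):
            col = [row[f] for row in X_te]
            rng.shuffle(col)                      # Step 2: shuffle one feature
            X_sh = [row[:f] + [v] + row[f + 1:] for row, v in zip(X_te, col)]
            drops[f].append(base - accuracy(cents, X_sh, y_te))  # Step 3: compare
    return [sum(d) / len(d) for d in drops]       # average over the folds

# Synthetic data: feature 0 carries the signal, feature 1 is pure noise
rng = random.Random(42)
y = [i % 2 for i in range(200)]
X = [[rng.gauss(3.0 * t, 1.0), rng.gauss(0.0, 1.0)] for t in y]
print(mda(X, y))
```

On this synthetic data the informative feature should show a large drop in accuracy and the noise feature a drop close to zero.
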

6.4 Probability-Weighted Accuracy

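Alternatives to accuracy
Log-loss is a common choice. A closely related score, the negative average likelihood of the true labels (NegAL), averages likelihoods rather than log-likelihoods:
NegAL = $-N^{-1} \sum_{n=0}^{N-1} \sum_{k=0}^{K-1} {y_{n,k}p_{n,k}}$
- $p_{n,k}$: probability associated with prediction $n$ of label $k$
- $y_{n,k} \in \lbrace {0, 1} \rbrace$: indicator, 1 if observation $n$ has label $k$ and 0 otherwise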

PWA = $\sum_{n=0}^{N-1}{y_n(p_n-K^{-1})} / \sum_{n=0}^{N-1}{(p_n-K^{-1})}$
- $p_n = \max_k \lbrace p_{n,k} \rbrace$
- $y_n \in \lbrace {0, 1} \rbrace$: 1 if the prediction is correct, 0 otherwise
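
Both scores follow directly from the formulas above. The sketch below assumes each prediction is given as a list of class probabilities and each true label as a class index; the function names `neg_al` and `pwa` and the toy data are hypothetical.

```python
def neg_al(y_true, probs):
    """Negative average likelihood of the true labels."""
    return -sum(p[t] for t, p in zip(y_true, probs)) / len(y_true)

def pwa(y_true, probs):
    """Probability-weighted accuracy: correct predictions weighted by confidence above 1/K."""
    n_classes = len(probs[0])
    num = den = 0.0
    for t, p in zip(y_true, probs):
        p_n = max(p)                          # confidence of the predicted label
        y_n = 1 if p.index(p_n) == t else 0   # 1 if the prediction is correct
        num += y_n * (p_n - 1 / n_classes)
        den += p_n - 1 / n_classes
    return num / den  # undefined if every prediction has probability exactly 1/K

# Toy example: three of four predictions are correct; the wrong one has confidence 0.6
y_true = [0, 1, 1, 0]
probs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.55, 0.45]]
print(neg_al(y_true, probs))
print(pwa(y_true, probs))
```

Plain accuracy here is 0.75, while PWA is higher because the single mistake was made with relatively low confidence.
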

6.5 Substitution Effects

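Substitution effects arise when two features share information (that is, when they are highly correlated)
With MDI, two identical features are picked at random with equal probability, so the importance of each is halved
With MDA, the effect of shuffling one feature is offset by the other, so importance is underestimated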

6.5.1 Orthogonalization

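When features are highly codependent, even small changes in the observations can dramatically change the importance estimates
One remedy is to apply PCA and measure MDI or MDA on the principal components, which are free of multicollinearity
- Redundancy that arises from non-linear combinations of features remains unresolved
- The principal components may have no intuitive interpretation
- The principal components are not chosen to maximize out-of-sample performance, so they need not improve it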
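
For intuition, orthogonalization can be sketched for just two features, where PCA reduces to a rotation with a closed-form angle. This is an illustrative special case with hypothetical helper names (`cov`, `orthogonalize_pair`); a real workflow would run PCA over all features.

```python
import math
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Sample covariance of two equally long lists."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def orthogonalize_pair(a, b):
    """Rotate two centered features onto their principal axes (2-feature PCA)."""
    # The rotation angle solves tan(2 * theta) = 2 * cov(a, b) / (var(a) - var(b))
    theta = 0.5 * math.atan2(2 * cov(a, b), cov(a, a) - cov(b, b))
    c, s = math.cos(theta), math.sin(theta)
    ma, mb = mean(a), mean(b)
    pc1 = [c * (x - ma) + s * (y - mb) for x, y in zip(a, b)]
    pc2 = [-s * (x - ma) + c * (y - mb) for x, y in zip(a, b)]
    return pc1, pc2

# Two highly correlated features: b is a noisy copy of a
rng = random.Random(0)
a = [rng.gauss(0, 1) for _ in range(500)]
b = [x + rng.gauss(0, 0.3) for x in a]
pc1, pc2 = orthogonalize_pair(a, b)
print(cov(a, b), cov(pc1, pc2))  # covariance before and after the rotation
```

The rotated components are uncorrelated, so MDI or MDA computed on them is not split between redundant copies, although the caveats listed above still apply.
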

6.5.2 Cluster Feature Importance

The first question is how many clusters to use and which features belong to each
Importance is then computed per cluster rather than per individual feature
Step 1
- Project the observed features into a metric space, ${\lbrace X_f \rbrace}_{f=1,...,F}$
- Find the optimal number of clusters (ONC)
- Compute residual features
  - $D_k$: the subset of feature indices $D = \lbrace {1,...,F} \rbrace$ included in cluster $k$
  - $D_k \subset D$, $||D_k|| > 0$, $D_k \cap D_l = \varnothing$ for $k \neq l$, $\cup_{k=1}^{K} D_k = D$
  - Fit $X_{n, i} = \alpha_i + \sum_{j\in \lbrace {\cup_{l<k} D_l} \rbrace} \beta_{i,j}X_{n,j} + \epsilon_{n,i}$, where $n=1,...,N$ indexes the observations of each feature and $X_i$ is a given feature with $i \in D_k$
  - The fitted residuals $\epsilon_{n,i}$ are the residual features; this step is used when the silhouette scores are inconclusive
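
The residual-feature regression can be sketched without any linear algebra library by using Gram-Schmidt projections, which yield the same residuals as ordinary least squares with an intercept. The helper names (`center`, `dot`, `project_out`, `residual_feature`) and the synthetic features are hypothetical.

```python
import random

def center(v):
    m = sum(v) / len(v)
    return [x - m for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def project_out(v, basis):
    """Remove from v its components along mutually orthogonal basis vectors."""
    for q in basis:
        coef = dot(v, q) / dot(q, q)
        v = [x - coef * y for x, y in zip(v, q)]
    return v

def residual_feature(target, regressors):
    """OLS residuals of `target` on `regressors` plus an intercept."""
    basis = []
    for col in regressors:
        q = project_out(center(col), basis)  # Gram-Schmidt step
        if dot(q, q) > 1e-12:                # skip columns that add no new direction
            basis.append(q)
    # Centering the target plays the role of the intercept alpha_i
    return project_out(center(target), basis)

# x1 and x2 stand for features in earlier clusters; x3 belongs to cluster k
rng = random.Random(1)
x1 = [rng.gauss(0, 1) for _ in range(300)]
x2 = [rng.gauss(0, 1) for _ in range(300)]
x3 = [2.0 + 0.8 * a - 0.5 * b + rng.gauss(0, 0.1) for a, b in zip(x1, x2)]
res = residual_feature(x3, [x1, x2])
print(dot(res, res) / len(res))  # variance left after removing x1 and x2
```

The residual keeps only the part of `x3` that the earlier clusters cannot explain, which is what prevents information from leaking between clusters.
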

Step 2
- Clustered MDI
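  - Compute the MDI of each feature, then sum the MDI values of the features within each cluster formed in Step 1
  - In an ensemble of trees, each tree yields one clustered MDI, so the mean and standard deviation of the clustered MDI can be computed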
- Clustered MDA
  - Instead of shuffling one feature at a time, shuffle all the features in a cluster
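
Both clustered variants are small changes to the per-feature versions. The sketch below assumes the per-tree MDI vectors and the cluster assignment are already available; the numbers and names (`clustered_mdi`, `shuffle_cluster`, `tree_importances`, `clusters`) are made up for illustration. Shuffling each column of the cluster with its own permutation is also an assumption.

```python
import random
import statistics

def clustered_mdi(tree_importances, clusters):
    """Sum per-feature MDI within each cluster, tree by tree; return (mean, std) per cluster."""
    out = {}
    for name, feats in clusters.items():
        per_tree = [sum(imp[f] for f in feats) for imp in tree_importances]
        out[name] = (statistics.mean(per_tree), statistics.stdev(per_tree))
    return out

def shuffle_cluster(X, feats, rng):
    """Return a copy of X with every feature in `feats` shuffled (the clustered MDA step)."""
    X_sh = [row[:] for row in X]
    for f in feats:
        col = [row[f] for row in X]
        rng.shuffle(col)
        for row, v in zip(X_sh, col):
            row[f] = v
    return X_sh

# Hypothetical per-tree MDI vectors for four features (each row sums to 1)
tree_importances = [
    [0.30, 0.25, 0.35, 0.10],
    [0.20, 0.35, 0.30, 0.15],
    [0.28, 0.30, 0.37, 0.05],
]
clusters = {"C0": [0, 1], "C1": [2, 3]}  # hypothetical output of Step 1
print(clustered_mdi(tree_importances, clusters))

X = [[1, 10, 100, 1000], [2, 20, 200, 2000], [3, 30, 300, 3000]]
print(shuffle_cluster(X, clusters["C0"], random.Random(0)))
```

In this toy data the importance of individual features varies noticeably from tree to tree, while the cluster totals are far more stable, which is the motivation for scoring clusters instead of features.
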

Source: Marcos López de Prado, Machine Learning for Asset Managers, Cambridge University Press (2020), pp. 74-91
