maseully_hoit.log

Setting up a GPU environment with CUDA and cuDNN

Tue, 12 Sep 2023 14:14:43 GMT

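Steps for setting up a GPU environment for deep learning training

1. Check the GPU model

The CUDA and cuDNN versions to install depend on the GPU model **→ start by checking the GPU!**

Checking the PC environment before installing avoids compatibility problems.

GPU used here: RTX 4070 Ti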

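2. Install the NVIDIA graphics driver

On the NVIDIA graphics driver download page, search for the driver that matches your GPU and install it.

Once installed, verify the driver with nvidia-smi.

3. Check the compute capability

compute capability

Look up your graphics card → the leftmost column is its compute capability

Then check which CUDA SDK versions support that compute capability.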

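4. Check the CUDA version

Check the CUDA version required by PyTorch and TensorFlow. * On Ubuntu, I found it easier to match versions to TensorFlow.

[Checking the CUDA version for TensorFlow]

The TensorFlow documentation lists the cuDNN and CUDA versions required by each release. * On native Windows, GPU support is only available up to TensorFlow 2.10 (as of 2023-09).

tensorflow-gpu 2.10.0 → cuDNN 8.1 / CUDA 11.2

[Checking the CUDA version for PyTorch]

Check the version that fits your PC environment.

If the current stable version can be installed, go with it; otherwise find an earlier PyTorch version that matches the CUDA version you want to use.

I use PyTorch more often, and PyTorch requires a newer CUDA version than TensorFlow, so I installed CUDA based on PyTorch.

PyTorch 2.0 → CUDA 11.7 or CUDA 11.8

* Check the currently installed CUDA version: nvcc --version. If the toolkit is installed correctly, the command prints the installed version.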

5. Install the CUDA Toolkit

* Finding a compatible version is what matters

Download the CUDA Toolkit release that is compatible with the PyTorch or TensorFlow version you plan to use → I downloaded CUDA Toolkit 11.8.0.

[CUDA Toolkit download]

6. Install cuDNN

Sign up as an NVIDIA developer, then download [NVIDIA cuDNN Archive link]

From the link above, download and install the cuDNN release that is compatible with the installed CUDA version.

Among the cuDNN releases for CUDA 11.8, I installed 8.8 (I wanted a more stable release).

After downloading, unzip the archive → copy everything inside the extracted folder (bin, include, lib) → paste it all into the folder where the CUDA Toolkit is installed. * (Windows) default install path of the CUDA Toolkit: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8

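7. Set the environment variables (Windows)

User variables: apply to the current user only. System variables: apply to every user of the system.

Precedence: user variables > system variables

The Path variable edited here is an exception: it is merged rather than overridden (system Path entries first, then user Path entries), and within each list the entries are searched from top to bottom. * This is worth checking in my case because I use two Python versions plus virtual environments; Python 3.7 is listed above Python 3.9, so Python 3.7 takes priority.

Add the CUDA Toolkit folders to the Path variable (I added them to the user Path): the paths to bin, include, and lib inside the toolkit (still need to confirm whether the lib entry should stop at lib or go down to lib\x64).

Restart after changing the environment variables.

** * I plan to add notes on setting up CUDA on Ubuntu when I get the chance **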

How to use labelImg for image labeling (with YOLO)

Wed, 06 Sep 2023 05:30:07 GMT

Bounding boxes are an essential part of object detection, and training on a custom dataset requires labels for that dataset.

You have to mark by hand where each target object sits in the image → labelImg is used to set each object's coordinates and label.

Installation and launch

Download the files from the labelImg GitHub repository
- git clone or .zip download
  
  $ git clone https://github.com/heartexlabs/labelImg.git
- Move into the downloaded labelImg directory
```
$ cd [download path]/labelImg
```

Install the required libraries. Follow the installation method for your environment; these notes assume Windows 10 with the Python venv module (macOS notes to be added). On Windows, run cmd/PowerShell as administrator.
(1) pip install pyqt5
(2) pip install lxml
(3) pyrcc5 -o libs/resources.py resources.qrc

Run labelImg: (1) go to the labelImg directory (2) run labelImg.py inside it

$ python labelImg.py [image directory path] [predefined class file]

If the labelImg GUI appears, the setup worked.

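Writing label classes

Writing custom label classes (for YOLO)
(1) Add to the predefined classes: ../labelImg/data/predefined_classes.txt holds the classes that are already defined; add the classes you want to this file and save it
(2) Create a new custom class file: write the target classes one per line

How to use labelImg

Save directory: set the directory where the bounding boxes will be saved

Bounding box format: switch to the bounding box format you want to use

Bounding box output [YOLO format]: [class number] [center x] [center y] [width] [height] (center and size are normalized by the image width and height)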
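To make the YOLO line format above concrete, here is a minimal sketch that converts a pixel-coordinate box into one label line. The function name `to_yolo_line` and the corner-style input (xmin, ymin, xmax, ymax) are assumptions for illustration; labelImg writes these files itself, so this is only useful for checking or converting labels by hand.

```python
def to_yolo_line(class_id, xmin, ymin, xmax, ymax, img_w, img_h):
    """Convert a pixel box (corner format) into a YOLO label line.

    Hypothetical helper: YOLO stores the box center and size,
    each divided by the image width or height.
    """
    cx = (xmin + xmax) / 2 / img_w   # normalized center x
    cy = (ymin + ymax) / 2 / img_h   # normalized center y
    w = (xmax - xmin) / img_w        # normalized width
    h = (ymax - ymin) / img_h        # normalized height
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


# A 200x100 px box centered in a 400x200 px image, class 0
print(to_yolo_line(0, 100, 50, 300, 150, 400, 200))
```

Every value after the class number should fall between 0 and 1; a value outside that range usually means the box and the image size do not belong together.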

Shortcuts

Shortcut	Description
Ctrl + u	Load all images from a directory
Ctrl + r	Change the default annotation target directory
Ctrl + s	Save
Ctrl + d	Copy the current label and rect box
Ctrl + Shift + d	Delete the current image
Space	Flag the current image as verified
w	Create a rect box
d	Next image
a	Previous image
del	Delete the selected rect box
Ctrl + + / Ctrl + -	Zoom in / Zoom out

* [View] - [Auto Save mode] : saves automatically

Using Python virtual environments (venv and Anaconda)

Thu, 20 Jul 2023 14:49:36 GMT

Python virtual environment

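Each project needs different library versions → dependencies between libraries differ by version → set up a separate environment per project to avoid conflict errors

venv, part of the Python standard library
Anaconda, which is widely used

Written for Windows 10 ** macOS notes to be added later

venv

A standard library module that ships with Python → no separate installation needed

create

Create with the system Python version

$ python -m venv [env_name]
ex) python -m venv env_tmp

Create with a specific Python version (that interpreter must already be installed; venv does not install Python)

  $ py -[version] -m venv [env_name]
  ex) py -3.10 -m venv env_tmp
  ex) python3.10 -m venv myenv  (mac / Linux)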
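The same environment can also be created from Python through the standard `venv` module, which is what `python -m venv` runs. This is a minimal sketch: the helper name `create_env` is hypothetical, and `with_pip=False` is chosen only so the example runs quickly and offline (the command line installs pip by default).

```python
import tempfile
import venv
from pathlib import Path


def create_env(env_dir, with_pip=False):
    """Create a virtual environment and return the path of its pyvenv.cfg.

    Hypothetical helper around venv.create; the environment uses the
    interpreter that runs this script.
    """
    venv.create(env_dir, with_pip=with_pip)
    return Path(env_dir) / "pyvenv.cfg"


# Create a throwaway environment inside a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    cfg = create_env(Path(tmp) / "env_tmp")
    print(cfg.exists())
```

The `pyvenv.cfg` file records which base interpreter the environment was created from, which is a quick way to confirm the Python version of an existing environment.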

activate

Windows powershell

Running the script in PowerShell can fail because of the execution policy
Check the policy with Get-ExecutionPolicy and change it if needed (notes to follow)

$ env_dir\Scripts\Activate.ps1

Windows cmd

$ env_dir\Scripts\activate.bat

mac / Linux

$ source env_dir/bin/activate

delete

Same as deleting the directory

$ rm -r [env_name]
ex) rm -r env_tmp

deactivate

(env) $ deactivate

install and uninstall

Install and uninstall with pip

$ pip install [package]
$ pip install [package]==version (specific version)

$ pip uninstall [package]

update

$ pip install --upgrade [package]
$ pip install (--upgrade) package==version

requirements

List the libraries installed in the current virtual environment and save the list

Show installed packages
$ pip list

Save the installed libraries and their versions
$ pip freeze > requirements.txt

Install from a previously saved requirements file
$ pip install -r requirements.txt

Adding and removing kernels

Register the virtual environment as a kernel in Jupyter Notebook (Lab). ipykernel is installed together with Jupyter Notebook (Lab); if it is missing, install it

$ pip install ipykernel

Add a kernel

$ python -m ipykernel install --user --name [env name] --display-name '[kernel name shown in Jupyter]'

List kernels

$ jupyter kernelspec list

Remove a kernel

$ jupyter kernelspec uninstall [kernel name]

Anaconda

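A platform that bundles Python-based libraries for data analysis and similar work. It manages virtual environments and libraries efficiently → convenient for managing a project's development environment

However,

It is heavy
Many libraries are not available in their latest versions
Occasionally a library is not available at all (not even on other channels)

conda basics

Check the conda version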

$ conda --version

Update conda

$ conda update conda

create

$ conda create -n(--name) [env_name]
ex) conda create -n env

Or specify the Python version when creating the environment

$ conda create -n [env_name] python=version
ex) conda create -n env python=3.10

activate / deactivate

activate
$ conda activate [env_name]
ex) conda activate env

deactivate
$ conda deactivate

List environments

Shows the virtual environments that currently exist; use whichever of the two commands you prefer

$ conda env list
$ conda info --envs

delete

$ conda env remove -n [env_name]
ex) conda env remove -n env

--all option (of conda remove) : removes the environment together with every library installed in it

$ conda remove -n [env_name] --all

install / uninstall / update

$ conda install [package]
$ conda install package=version (specific version)
ex) conda install numpy=1.20

$ conda remove [package]  (conda uninstall is an alias)

$ conda update [package]

To install into an environment other than the active one

$ conda install -n [target env_name] [package]

copy

Older conda releases have no dedicated way to rename an environment (newer releases provide conda rename) → clone the existing environment under a new name

$ conda create --name(-n) [new_env_name] --clone [old_env_name]

Multivariate Time-series Anomaly Detection via Graph Attention Network (a.k.a MTAD-GAT)

Mon, 08 May 2023 14:23:16 GMT

Multivariate Time-series Anomaly Detection via Graph Attention Network
2020 IEEE International Conference on Data Mining (ICDM)

Introduction

Limitation of earlier methods: they ignore the correlations between time series, which leads to false positives

Real data collection is multivariate, so detecting anomalies in a single univariate time series is not enough to judge whether the system is operating normally. A change in one series does not necessarily mean a system fault ➞ the correlations between the series that make up the system have to be examined

Figure: green = normal / red = abnormal. In the green region, series 2 and 3 rise suddenly → earlier (mostly point-wise) methods would flag this as anomalous → the two series change in a similar way, so this is normal behaviour → once the correlation between series is considered, both series are normal, which is why detecting anomalies with inter-series correlation is meaningful

EncDec-AD (ICML 2016): LSTM encoder-decoder, reconstruction error
telemanom (KDD 2018): LSTM-based prediction
OmniAnomaly (KDD 2019): stochastic recurrent neural network; models the data distribution to capture normal patterns

This paper proposes Multivariate Time-series Anomaly Detection via Graph Attention Network (MTAD-GAT), a model that

captures the correlations between the univariate series inside a multivariate time series
reflects the temporal dependency within each series

Methodology

basic structures

Uses two graph layers
- feature-oriented : captures causal relationships between the time series
- time-oriented : captures temporal dependency

Trains two models jointly
- forecasting-based model : focuses on single-timestamp prediction
- reconstruction-based model : learns a latent representation of the entire time series

A multivariate time series is made up of univariate series. The goal is to detect anomalies at the level of each time series

Two graph attention networks are placed in parallel to model inter-feature correlation and temporal dependency; a GRU (Gated Recurrent Unit) is used to handle long-term dependency

notation

$x \in R^{n \times k}$ : multivariate time-series input
$n$ : number of timestamps (window length)
$k$ : number of dimensions
$y \in R^{n}$ : label; $y_i$ indicates whether the $i^{th}$ timestamp is anomalous or normal
$v_i$ : feature vector of each node

architecture

Source: MTAD-GAT paper

Preprocessing & 1-D Convolution
- normalization (MinMax) → applied to train and test
- cleaning: prediction- and reconstruction-based models are sensitive to noise, so Spectral Residual (SR) is applied to the training set and the values around timestamps flagged as anomalous are replaced with normal values. References: SR-CNN (KDD'19), SR (CVPR'07). SR: subtracting the average spectrum from the target's spectrum leaves only what is distinctive about the target → in anomaly detection, the anomalous part is what remains
  
  My guess is that cleaning is there so that training only sees clean normal data

feature extraction: a 1-D convolution is applied to each series for high-level feature extraction. Convolution is effective at extracting local features inside the time window; it preserves temporal order and embeds the information in the window together

Graph Attention
graph layer → models the relationships between nodes
$\left\{v_1, v_2, \cdots, v_n \right\}$ : node set
$e_{ij} = LeakyReLU(\omega^{T}\cdot(v_i \oplus v_j))$, $\oplus$ : concatenation
attention score $\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{l=1}^{L}\exp(e_{il})}$, $L$ : number of nodes adjacent to node $i$
GAT output $h_i = \sigma(\sum_{j=1}^{L}\alpha_{ij}v_{j})$, $\sigma$ : activation function (sigmoid)
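The three graph attention formulas above can be traced in a few lines of plain Python. This is a minimal sketch, not the paper's implementation: it assumes a complete graph in which every node also attends to itself, a fixed weight vector `w` instead of a learned one, and a LeakyReLU slope of 0.2; the function names are hypothetical.

```python
import math


def leaky_relu(x, slope=0.2):
    # slope is an assumed value; the notes do not state it
    return x if x > 0 else slope * x


def gat_layer(nodes, w):
    """nodes: list of n feature vectors (lists of length m).
    w: weight vector of length 2m applied to the concatenation v_i (+) v_j.
    Returns (outputs, attention), both indexed by node i."""
    outputs, attention = [], []
    for v_i in nodes:
        # e_ij = LeakyReLU(w^T (v_i concat v_j)) for every node j
        scores = [leaky_relu(sum(a * b for a, b in zip(w, v_i + v_j)))
                  for v_j in nodes]
        # alpha_ij = softmax over j (shifted by the max for stability)
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        alphas = [e / sum(exps) for e in exps]
        # h_i = sigmoid(sum_j alpha_ij * v_j)
        mixed = [sum(a * v_j[d] for a, v_j in zip(alphas, nodes))
                 for d in range(len(v_i))]
        outputs.append([1 / (1 + math.exp(-x)) for x in mixed])
        attention.append(alphas)
    return outputs, attention


h, alpha = gat_layer([[0.0, 2.0], [2.0, 0.0]], w=[0.0, 0.0, 0.0, 0.0])
print(alpha[0])  # zero weights give uniform attention over the nodes
```

With all-zero weights every score is equal, so each node simply averages all feature vectors; a trained `w` is what makes some neighbours count more than others.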

- Feature-oriented GAT: there is no prior information on how the individual series in the multivariate time series are related → a complete graph is assumed (every pair of distinct vertices is joined by an edge). Each series is a node and the relationship between two series is an edge → all series are assumed to be related to one another. Feature-oriented GAT output: $h^{feat}$ $(k \times n)$
feature-oriented attention layer $h_1$ : final output

- Time-oriented GAT: aims to capture temporal dependency. The timestamps inside the sliding window are treated as a complete graph; node $x_t$ is represented by the features of every dimension at timestamp $t$
- Final output of the layers: feature-oriented GAT layer $h^{feat}$ : $(k \times n)$, time-oriented GAT layer $h^{time}$ : $(n \times k)$, preprocessed data $\tilde{x}$ : $(n \times k)$ ⇨ the three outputs are concatenated (with $h^{feat}$ transposed to $n \times k$), and the concatenated matrix $(n \times 3k)$ is used as the GRU input, so that different kinds of information are learned together
- Joint Optimization: the paper uses the forecasting and reconstruction models together so that each compensates for the other; the losses of both models are updated at the same time. $Loss= Loss_{for} + Loss_{rec}$
- forecasting-based model: three fully-connected layers; predicts the value at the next timestamp
$Loss_{for} = \sqrt{\sum_{i=1}^{k}(x_{n,i}-\hat{x}_{n,i})^{2}}$
- reconstruction-based model: uses a VAE and aims to learn the marginal distribution of the data over the latent representation $z$. By treating the time series values as variables, the VAE can capture the data distribution of the entire time series

$Loss_{rec} = -E_{q_{\phi}(z|x)}[\log p_{\theta}(x|z)]+D_{KL}(q_{\phi}(z|x)||p_{\theta}(z))$

Inference score: $Score = \sum_{i=1}^{k}\frac{(\hat{x}_{i}-x_{i})^{2}+\gamma(1-p_{i})}{1+\gamma}$

$k$ : the number of features
$\gamma$ : a hyper-parameter to combine the forecasting-based error and the reconstruction-based probability
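The inference score can be computed directly from the formula. A minimal sketch, assuming that `x_hat` holds the forecast values, `p` the per-feature reconstruction probabilities, and that all three lists have length $k$; the function name is hypothetical.

```python
def inference_score(x, x_hat, p, gamma=0.8):
    """Sum over features of ((x_hat_i - x_i)^2 + gamma * (1 - p_i)) / (1 + gamma)."""
    return sum(((xh - xi) ** 2 + gamma * (1 - pi)) / (1 + gamma)
               for xi, xh, pi in zip(x, x_hat, p))


# One feature, forecast off by 1.0, reconstruction probability 0.5
print(inference_score([1.0], [2.0], [0.5], gamma=1.0))  # 0.75
```

A perfect forecast with reconstruction probability 1 scores 0, and either a larger forecasting error or a lower reconstruction probability pushes the score up.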

Experiments

Datasets

SMAP (Soil Moisture Active Passive satellite)
MSL (Mars Science Laboratory rover)
Original dataset

Setting

window size $n = 100$
$\gamma = 0.8$ ; chosen by a grid search on the validation set (from 0.4 to 1.0)
epochs : 100

anomaly or not

In practice individual points rarely matter, and catching every single anomalous point is hard, so an adjusted criterion is used

point : score of 0.5 or higher → anomaly
adjusted strategy → inside a contiguous anomalous segment, if any point is detected, the neighbouring points of that segment are also counted as detected

Reference: Unsupervised Anomaly Detection via Variational Auto-Encoder for Seasonal KPIs in Web Applications (WWW'18)
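The adjusted strategy is easier to follow in code. This is a minimal sketch of the usual point-adjust procedure as I understand it from the cited WWW'18 paper, assuming binary lists `pred` (detections) and `label` (ground truth) of equal length; the function name is hypothetical.

```python
def point_adjust(pred, label):
    """If any point inside a ground-truth anomaly segment is detected,
    mark the whole segment as detected. Points outside segments are untouched."""
    adjusted = list(pred)
    i, n = 0, len(label)
    while i < n:
        if label[i] == 1:
            j = i
            while j < n and label[j] == 1:   # find the end of the segment
                j += 1
            if any(pred[i:j]):               # at least one hit in the segment
                adjusted[i:j] = [1] * (j - i)
            i = j
        else:
            i += 1
    return adjusted


pred = [0, 0, 1, 0, 0, 1, 0]
label = [0, 1, 1, 1, 0, 0, 0]
print(point_adjust(pred, label))  # [0, 1, 1, 1, 0, 1, 0]
```

Metrics such as F1 are then computed on the adjusted predictions; the false positive at index 5 stays a false positive because it lies outside any labelled segment.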

Conclusion

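Models both inter- and intra-time-series relationships
Understanding the relationships between time series → makes it possible to trace root causes
The experiments used no prior knowledge about feature relationships → performance may improve when such knowledge is included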

PaDiM : a Patch Distribution Modeling Framework for Anomaly Detection and Localization

Thu, 16 Mar 2023 09:07:34 GMT

PaDiM : a Patch Distribution Modeling Framework for Anomaly Detection and Localization

International Conference on Pattern Recognition, International Workshops and Challenges 2021 (ICPR 2021)

Abstract

Based on a pre-trained CNN
Detects the regions where anomalies are present
Multivariate Gaussian distribution : used to obtain a probabilistic representation of the normal class

Introduction

Anomaly detection on images usually decides normal versus anomalous for the image as a whole. PaDiM, proposed in this paper, detects anomalies per patch to give more precise and explainable results. Several patch-level methods already exist (e.g. SPADE, Patch SVDD), but according to the paper they have the following limitations

they need a training phase on normal data, or
they need kNN distances computed against the entire training dataset

With kNN in particular, complexity grows linearly with the size of the training dataset

To address these issues the paper proposes Patch Distribution Modeling (PaDiM), which builds on the two papers below

Modeling the Distribution of Normal Data in Pre-Trained Deep Features for Anomaly Detection (International Conference on Pattern Recognition 2020; ICPR 2020)
Sub-Image Anomaly Detection with Deep Pyramid Correspondences (arxiv 2020)

Like those two papers, PaDiM uses a pre-trained model to remove the training time, and it uses a multivariate Gaussian distribution to capture the correlations between the feature dimensions of each patch embedding

Patch Distribution Modeling

Overall PaDiM process

Source: PaDiM paper

Embedding extraction

Extracts features from normal images. The paper uses a pre-trained model, so the training that would fit the network to the dataset is skipped

Pre-trained models used : ResNet18, Wide ResNet-50-2, EfficientNet-B5

PaDiM's feature extraction is similar to SPADE's

Embedded features are extracted from N normal images. As in the reference papers above, features from several layers are used together so that local and global information are combined

Features from three layers are used together: they are resized to the largest feature resolution and concatenated into an embedding vector ➞ the embedding vector carries information from different semantic levels and resolutions

The features from the three layers are combined into one feature, and each of the w×h patch positions has its own mean and covariance

Learning of the normality

Estimates the distribution of the normal embedding vectors: the mean and covariance are computed per patch position, as needed for the Mahalanobis distance

Patch embedding vectors are assumed to be generated by a multivariate Gaussian distribution

Inference : computation of the anomaly map

The Mahalanobis distance is used to decide normal/abnormal

AD_ICPR 2020 : fits a distribution to one extracted feature map per image, then computes the Mahalanobis distance
PaDiM : computes the Mahalanobis distance per patch
SPADE : measures the distance per patch with kNN

An anomaly score is measured per patch ➞ each image gets as many anomaly scores as it has patches, which forms the anomaly map

The anomaly map expresses the degree of anomaly per patch, and the anomaly score of the whole image is the largest score in the map ➞ the most defective part of an image stands for the defect level of the image
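The inference step can be sketched for a handful of patches in plain Python. This is an illustrative sketch only: it assumes each patch position already has its mean and an invertible covariance estimated from normal images, it solves the linear system with a small Gauss-Jordan routine instead of a numerical library, and all function names are hypothetical.

```python
import math


def solve(a, b):
    """Solve a.x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def mahalanobis(x, mean, cov):
    """sqrt((x - mean)^T cov^{-1} (x - mean)); cov is assumed invertible."""
    diff = [xi - mi for xi, mi in zip(x, mean)]
    return math.sqrt(sum(d * y for d, y in zip(diff, solve(cov, diff))))


def anomaly_map_and_score(patches, means, covs):
    """One distance per patch position; the image score is the maximum."""
    amap = [mahalanobis(x, m, c) for x, m, c in zip(patches, means, covs)]
    return amap, max(amap)


identity = [[1.0, 0.0], [0.0, 1.0]]
amap, score = anomaly_map_and_score(
    patches=[[3.0, 4.0], [0.0, 0.0]],
    means=[[0.0, 0.0], [0.0, 0.0]],
    covs=[identity, identity],
)
print(amap, score)
```

With an identity covariance the Mahalanobis distance reduces to the Euclidean distance, which makes the first patch (distance 5) the one that sets the image-level score.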

Anomaly score comparison

SPADE - anomaly score per patch - uses kNN, comparing against all patches (the k nearest)

Multivariate Gaussian & Mahalanobis AD - one anomaly score per image - uses the Mahalanobis distance

PaDiM - anomaly score per patch - uses the Mahalanobis distance, comparing against patches at the same position

Experiments and Conclusion

Experiments

Example results of PaDiM detecting anomalous regions

left column : normal images
middle column : abnormal images (ground truth)
right column : anomaly heatmap obtained by PaDiM (high score : yellow)

The defective regions of abnormal images receive high anomaly scores; normal/abnormal is judged according to where the defect is

The MVTec and Shanghai Tech Campus (STC) datasets are used. For MVTec, random rotation and random crop are applied first ➞ in real datasets the object to detect is often not well aligned

The MVTec results are compared separately for texture classes and object classes. texture class : the image has a similar appearance throughout; object class : contains a central object (with distinct object features)

* The results need further discussion (to be added)

Conclusion

Feature extraction with a pretrained model
Anomaly score measured with the Mahalanobis distance
Patch-level computation makes it possible to locate defects
Usable without additional training

Sub-Image Anomaly Detection with Deep Pyramid Correspondences

Thu, 09 Mar 2023 12:57:23 GMT

Sub-Image Anomaly Detection with Deep Pyramid Correspondences

arXiv 2020

Source: paperswithcode

Ranking on the anomaly detection task with the MVTec dataset

Introduction

Most products are normal, but some may be defective, and it is important to catch the defective ones quickly. Detecting defects in each product calls for a computer vision solution

The paper presents a method for detecting the anomalous regions of an image, called Semantic Pyramid Anomaly Detection (SPADE). It improves the segmentation performance that limited earlier kNN-based methods, and uses a feature pyramid to exploit information at several resolutions

SPADE addresses sub-image anomaly detection and segmentation, aiming to find which part of each image is anomalous

Details of SPADE

Image feature extraction

SPADE does not go through separate training for the task; it extracts image features with a pre-trained model. The paper uses a ResNet trained on ImageNet

Using a pre-trained model saves training time
A pre-trained model was trained on a large dataset, so it should be able to distinguish a wide variety of features

➔ a model pre-trained on a large dataset cuts the training time and provides a model of proven quality for extracting image features

K Nearest Neighbor Normal Image Retrieval

The features of normal data extracted with the pre-trained model should lie close to one another. The feature distribution of the normal images in the test set should resemble that of the training set, whereas the features of abnormal images should lie far from the features of normal data

➔ kNN is used to measure how far apart the features are

Image-level detection

Decides normal/abnormal for a whole image. SPADE applies global pooling to the features that come out of the last convolutional layer to obtain the final feature

* Structure of the last layers of ResNet. Source: Deep Residual Learning for Image Recognition (CVPR 2016)

Global Average Pooling: uses the average of the values in the feature map. Unlike max pooling, which takes the maximum as the representative value, it uses the average over the region, so it focuses on whether something is present in the region rather than where it is

Pixel-level detection

The detection method SPADE focuses on

Unlike at image level, no average pooling is applied and the feature is split into patches (the paper says pixel, but I write patch because the term used in ViT is more familiar to me)

The collection of all patches is called the feature gallery, and these patches are used when measuring the anomaly score

Anomaly Score

$F$ : feature extractor
$x_i$ : image
$f_i$ : extracted feature

$f_i = F(x_i)$

$y$ : test image

Image-level

The distance between image-level features is measured with the Euclidean distance

$N_k(f_y)$ : the k nearest normal image features in the training set

$\mathrm{Anomaly\ score}(y) = \frac{1}{k} \sum_{f \in N_k(f_y)} \| f - f_y \|^{2}$

This is the distance between the normal image features of the training set and the feature of the test image

normal : the mean distance to the k nearest normal features should be small
abnormal : the mean distance should come out large

➔ the farther a feature lies from the normal features of the training set, the more it is judged an anomaly
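The image-level score is the mean squared Euclidean distance to the k nearest training features. A minimal sketch with plain lists as features; the function name and the toy vectors are illustrative assumptions.

```python
def knn_anomaly_score(test_feature, train_features, k):
    """Mean squared Euclidean distance from test_feature to its k nearest
    normal training features."""
    sq_dists = sorted(
        sum((a - b) ** 2 for a, b in zip(f, test_feature))
        for f in train_features
    )
    return sum(sq_dists[:k]) / k


train = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
print(knn_anomaly_score([0.0, 0.0], train, k=2))    # close to the normal features
print(knn_anomaly_score([10.0, 10.0], train, k=2))  # far from every normal feature
```

The pixel-level score has the same shape; only the gallery changes, from one feature per image to one feature per patch.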

Pixel-level

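If the feature map is divided into m patches, each patch is compared against every patch in the gallery. At image level, a normal image whose shape differs from the training images would be judged an anomaly. Likewise, if only patches at the same position were compared, a different part of the same object appearing at that position would give a large distance between feature patches

➔ comparing against all patches allows an image that has been transformed, for example rotated, to still be judged normal

$\mathrm{Anomaly\ score}(y,p) = \frac{1}{k} \sum_{f \in N_k(F(y,p))} \| f - F(y,p) \|^{2}$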

Feature Pyramid Matching

Each layer of a CNN-based model learns different feature information depending on its position in the network

low level feature : local information ↑ / contextual (global) information ↓ (higher-resolution features encode less context)
high level feature : local information ↓ / contextual information ↑

The kind of information carried differs by layer level. The paper uses the features of every level so that each compensates for what the others lack, which improves performance; the final feature encodes both fine-grained local features and global context features

The paper uses the outputs of three layers. To concatenate features of different resolutions, they are resized with bilinear interpolation and then concatenated into a single feature map

Conclusion

SPADE examples on the hazelnut and metal nut classes of the MVTec dataset

Row 1 : anomalous image
Row 2 : the most similar normal image
Row 3 : ground truth of the anomalous image pixels
Row 4 : anomalous image pixels predicted by SPADE

The method searches for the normal images most similar to the test image and predicts the anomalous region by comparing against them

Summary

SPADE aims to detect the defective regions inside an image
Feature extraction with a pre-trained model ➔ no time spent training on normal images, and easy to use
Normal and abnormal are separated with kNN
Both image-level and pixel-level detection are possible
Uses a feature pyramid ➔ draws on varied feature information

* The model that PatchCore, the current SOTA, builds on

GitHub : SPADE github

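Removing Docker containers and images

Fri, 03 Mar 2023 10:32:36 GMT

Removing a Docker container

* A running container is not removed (stop it first, or force removal with docker rm -f)

$ docker rm [container ID]
$ docker rm [container name]

Removing an image

* An image is not removed while a container that uses it still exists

$ docker rmi [image ID]
$ docker rmi [image name]

If only the image name is given, Docker assumes the latest tag, so only [image name]:latest is removed. To remove a specific version, give the image name together with its tag

$ docker rmi [image name]:[tag]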

Docker Container 생성 및 실행

Mon, 13 Feb 2023 12:27:07 GMT

Docker Container 생성

Docker version 확인
```
$ docker -v
```

An image is needed to run the process you want. Docker Hub hosts a wide range of images, and you can also build an image with the settings you need and upload it to Docker Hub

Check Docker images: how to list the images downloaded so far

$ docker images

Entering the command above in a terminal lists the images that currently exist

Docker image pull
ex) tensorflow

$ docker pull tensorflow/tensorflow:2.11.0-gpu

ex) ubuntu

$ docker pull ubuntu:latest

Create a Docker container
$ docker run --gpus all -it -v [path on the host (server)]:[path inside the container] -p [host (server) port, used to connect from outside the container]:[port inside the container] --name [container name] [image name]:[tag] [/bin/bash]

$ docker run --gpus all -it -v /home/user/Desktop:/home/workspace -p 8888:8888 tensorflow/tensorflow:2.10.0-gpu /bin/bash
The order of the options does not matter, as long as they all come before the image name.

--gpus : makes GPUs usable inside the container; takes the GPU indices to assign, or all to expose every GPU
-it : enables keyboard input (so commands can be typed inside the container)
-v : mounts a host folder path onto a container folder path; the two are only linked when -v is given
-p : maps a host port to a container port
/bin/bash : starts a bash shell in the container so commands can be entered freely
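When the same docker run command is reused from a script, assembling the argument list in Python keeps the option-before-image ordering explicit. This is a sketch under stated assumptions: `docker_run_args` is a hypothetical helper that only builds the list and does not start a container; the flags are the ones described above.

```python
def docker_run_args(image, tag, host_dir, container_dir,
                    host_port, container_port,
                    name=None, gpus="all", command="/bin/bash"):
    """Build the argv list for `docker run` (hypothetical helper).

    All options are placed before the image; the command to run inside
    the container comes last.
    """
    args = ["docker", "run", "--gpus", gpus, "-it",
            "-v", f"{host_dir}:{container_dir}",
            "-p", f"{host_port}:{container_port}"]
    if name:
        args += ["--name", name]
    args += [f"{image}:{tag}", command]
    return args


args = docker_run_args("tensorflow/tensorflow", "2.10.0-gpu",
                       "/home/user/Desktop", "/home/workspace", 8888, 8888)
print(" ".join(args))
```

The resulting list can be passed to `subprocess.run(args, check=True)` on a machine where Docker and the NVIDIA container runtime are installed.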
The container name can be changed later
$ docker rename [current name] [new name]

Docker Container 실행

List running containers: shows information on the containers that are currently running; stopped containers are not listed

$ docker ps

List all containers: shows every existing container and its information, including stopped ones

$ docker ps -a

Start and stop a container: a container has to be started before use, and is stopped once all work is done

Start a container
$ docker start [container name]

Stop a container
$ docker stop [container name]

Connecting to a container, ver 01
attach* uses the terminal in which the container was started; a direct connection

$ docker attach [container name]

To leave without stopping the running container: Ctrl+P followed by Ctrl+Q
If you attached with attach and leave with exit, the container stops everything and shuts down

Connecting to a container, ver 02
exec* lets you use the container from a new terminal, separate from the one that started it; it opens a new session inside the container

$ docker exec -it [container name] /bin/bash

exec creates a separate session, so exit leaves that session while the container keeps running

Creating and linking a GitHub repository

Sun, 05 Feb 2023 06:42:26 GMT

GitHub repository

How to create a repository for organizing code once you have a GitHub account

Code is managed by linking the remote GitHub repository with a local repository

1. Clone approach

Create a remote repository on GitHub, then clone it to the local machine. Cloning links the remote and local repositories without git remote add

Create the remote repository: after logging in to GitHub, click the green New button under Your repositories
Enter the repository name and set the visibility of the repo
Selecting Add a README file creates a readme together with the repo, but linking it with an existing local folder afterwards requires extra steps such as fetch
Clone the remote repository locally: the SSH address is shown in the quick-setup information after the repo is created; running git clone with that address in a terminal creates the repo locally

$ git clone [ssh address]

2. local repo를 연동하는 방식

Link a folder that already exists locally to GitHub

Cloning the remote repository is convenient because it needs no initial setup, but uploads and updates are required every time code is changed or added

Linking local to remote is the method I used when I first learned GitHub, so it feels more familiar; I also find it more convenient for uploading to GitHub once a piece of work is finished

Create the remote and local repositories: create the remote repository as described above, and create the local repository in the path where the code will be stored

Connect local and remote: create and switch to the main/master branch while linking. If a main/master branch already exists, check the current branch and then add

$ cd [folder path]
$ git init (git initialization)
$ git remote add origin [ssh address]
$ git checkout -b main (create the main branch and switch to it)
or
$ git branch (check the current branch)
$ git checkout [branch name to switch to]

$ git add . (add every modified file)
or
$ git add filename.txt (add a single file)
$ git status (check the state of the files)
$ git commit -m "[commit name]"
$ git push origin main 
 (git push [remote repo name] [branch name])

* Adding the -u option on the very first push sets the upstream branch, so later pull/push commands use the branch chosen the first time

$ git push -u origin main

Creating a GitHub account and registering an SSH key

Wed, 01 Feb 2023 14:36:00 GMT

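Creating a GitHub account

Go to GitHub (github link) and click Sign up

Fill in the information needed to sign up

email address
password
user name
whether you want to receive news by email

Enter the verification code sent to your email address
Answer the optional questions

You can also skip them with Skip personalization at the bottom of the screen

When the GitHub screen appears, the account has been created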

ssh-key registration

GitHub used to accept an ID and password for these operations, but now relies on public key authentication. It is used to secure communication between the local environment and the remote server - clone, push, etc.

Generating an SSH key creates a pair: a public key and a private key

public key : registered on the git server
private key : plays the role of a password; used to connect to a server on which the public key has been registered

Generating an SSH key

Generate the key with ssh-keygen

$ ssh-keygen -t ed25519 -C "github account email address"

* If ed25519 does not work, switch to the rsa option

$ ssh-keygen -t rsa -b 4096 -C "github account email address"

While the key is generated, prompts for its location and passphrase appear. Enter file in which to save the key (~~/.ssh/id_ed25519): → save location → just press Enter to use the default

Enter passphrase (empty for no passphrase) : → passphrase for the SSH key → press Enter to continue without one

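Check the generated key and register it on GitHub

Use cat to confirm that the key was generated (run it in the directory where the key was saved, ~/.ssh by default)

$ cat id_ed25519.pub

prints the public key; copy it and register it on GitHub

$ cat id_ed25519

prints the private key (never share it)

GitHub key registration page, or [GitHub account - Settings - SSH and GPG keys - New SSH key]

Enter a key title and the public key to register it

Verifying the SSH key

In a terminal, run

$ ssh -T git@github.com

to check that the generated SSH key works

If Enter passphrase for key '~~/.ssh/id_ed25519' : appears, enter the passphrase set earlier

After that, the message You've successfully authenticated confirms that it works

Set the git config so that linking local and GitHub goes smoothly later on

$ git config --global user.email [github email account]
$ git config --global user.name [github user name]

On GitHub, the key shows up with its title and a key icon, confirming that it was registered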

Time Series Anomaly Detection with Multiresolution Ensemble Decoding

Tue, 31 Jan 2023 14:16:09 GMT

Time Series Anomaly Detection with Multiresolution Ensemble Decoding

2021 AAAI Conference on Artificial Intelligence (AAAI-21)

RNN based autoencoder with decoder ensembles

Existing RNN autoencoders are prone to overfitting and error accumulation because of sequential decoding ➜ the paper uses several decoders with different decoding lengths

Introduction

previous recurrent auto-encoder based anomaly detection methods ➜ error accumulation from previous time steps can make it hard to reconstruct long time series

error accumulation : the decoder takes the output of the previous step as input, so errors from earlier steps build up

The paper proposes Multi-Resolution Ensemble Decoding (RAMED), which uses recurrent multi-resolution decoders

RAMED uses decoders with different numbers of time steps to capture varied temporal information

short decoding length : focus on macro temporal characteristics
- trend patterns
- seasonality
long decoding length : focus on more detailed local temporal patterns

Trend and Seasonality example

Source: my notebook

Unlike S-RNN (S-RNN review), lower-resolution temporal information is passed on to the higher-resolution decoder

Contribution

An ensemble of multiple decoders with different decoding lengths
A mechanism for fusing the different multiresolution temporal information

Architecture

Notation

Input time series : $X = [x_1, x_2, \dots, x_{T}]$, where $x_t\in \mathbb{R}^{d}$
Output reconstructed time series : $Y = [y_T, y_{T-1}, \dots, y_1]$, where $y_t \in \mathbb{R}^{d}$
Error : $e(t) = y_t - x_t$
the number of encoders : $L^{(E)}$
the number of decoders : $L^{(D)}$
small noise : $\epsilon \delta$ ($\epsilon = 10^{-4}$, random noise $\delta \sim N(0,1)$)

Multiresolution Ensemble Decoding

Unlike S-RNN, which uses plain RNN units, this paper uses LSTM

The encoding process is the same as in S-RNN: an encoder ensemble built from sparsely connected RNNs. The paper uses three encoders in total

$h^{(E)} = F_{MLP}(concat[h_{T}^{(E_1)}; \dots ;h_{T}^{(E_i)}; \dots ; h_{T}^{(E_{L^{(E)}})} ])$
$F_{MLP}$ : fully-connected layer

Decoding process

The hidden state of the coarser decoder and the hidden state of the finer decoder are concatenated to produce the output

Each decoder uses a different number of steps so that it captures different temporal information

Small noise is added to the input for robustness

decoding length	temporal characteristic
short	macro
long	micro

The decoded outputs of different lengths are made similar to the input time series through dynamic time warping (DTW)

Decoder Lengths

$k$th decoder $D^{(k)}$ reconstructs a time series of length $T^{(k)}$, where $T^{(k)} = \alpha_{k}T$

$\alpha_{k} = \frac{1}{\tau^{k-1}} \in (0, 1]$

단,

$\tau > 1$
- $\tau = 2$ in the model figure
$\alpha_{1} = 1$ and $T^{(1)} = T$
$T^{(L^{(D)})} \geq 2$ ➜ chosen so that the top (coarsest) decoder has at least 2 steps
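The decoder lengths follow directly from $T^{(k)} = T / \tau^{k-1}$. A minimal sketch that lists them and enforces the two-step constraint on the top decoder; the function name is hypothetical, and rounding to a whole number of steps is left out because the notes do not say how it is handled.

```python
def decoder_lengths(T, tau, num_decoders):
    """Return [T^(1), ..., T^(L_D)] with T^(k) = T / tau**(k - 1)."""
    if tau <= 1:
        raise ValueError("tau must be greater than 1")
    lengths = [T / tau ** (k - 1) for k in range(1, num_decoders + 1)]
    if lengths[-1] < 2:
        raise ValueError("the top decoder needs at least 2 steps")
    return lengths


print(decoder_lengths(T=16, tau=2, num_decoders=3))  # [16.0, 8.0, 4.0]
```

For a fixed window length T this also bounds the number of decoders: with T = 16 and tau = 2, a fifth decoder would have a single step and is rejected.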

Coarse-to-Fine Fusion

Because the outputs have different lengths, the ensemble output cannot be formed with a simple average or median. A coarse-to-fine fusion strategy is used instead; an example follows

Two decoders $D^{(k+1)}$ and $D^{(k)}$: $T^{(k)} = \tau T^{(k+1)} > T^{(k+1)}$

* $T^{(k+1)} = \frac{1}{\tau ^{k}} \cdot T = \frac{1}{\tau ^{k-1} \cdot \tau} \cdot T = \frac{1}{\tau}\cdot \frac{1}{\tau ^{k-1}}\cdot T = \frac{1}{\tau} \cdot T^{(k)}$, using $\frac{1}{\tau ^{k-1}}\cdot T = T^{(k)}$

The information of $D^{(k+1)}$ is coarser than that of $D^{(k)}$ ➜ the information of the top decoder is coarser than that of every other decoder

The hidden state of the decoder with $k = L^{(D)}$, which handles the most macro information, is $h_{t-1}^{(k)} = LSTM^{(k)}([y_{t}^{(k)}; h_{t}^{(k)}])$

The hidden states of the remaining decoders, which also use the coarser information while decoding, are computed as follows

The hidden state $h^{(k)}_{t+1}$ of $D^{(k)}$ is related to the slightly coarser information $h^{(k+1)}_{\left \lceil t/\tau \right \rceil}$ of its sibling decoder $D^{(k+1)}$

$\hat{h}_{t}^{(k)} = \beta h_{t+1}^{(k)} + (1-\beta)F'_{MLP}(concat[h_{t+1}^{(k)}; \, h^{(k+1)}_{\left \lceil t/\tau \right \rceil}])$
- $F'_{MLP}$ : two-layer fully-connected network with PReLU (Parametric Rectified Linear Unit)
- $\beta h_{t+1}^{(k)}$ : similar role as the residual connection
$\hat{h}_{t}^{(k)}$ is used as the input of the LSTM cell: $h_{t}^{(k)} = LSTM^{(k)}([y_{t+1}^{(k)} + \epsilon \delta; \; \hat{h}_{t}^{(k)}]), \quad t = T^{(k)}-1, \dots, 1$

the ensemble's reconstructed output $Y = [y_{1}^{(1)}, y_{2}^{(1)}, ... , y_{T}^{(1)}]$

Loss Function

The paper uses two losses together

reconstruction loss

$L_{MSE}(X) = \sum_{t=1}^{T} \left\| y_t^{(1)} - x_t\right\|_{2}^{2}$

Measures how well the output reconstructs the input
multiresolution shape-forcing loss

A loss that pushes the decoder outputs $Y^{(k)}$ to take a shape similar to the input $X$

DTW, which measures the similarity between time series, is used; because DTW takes a hard min and is not differentiable, smoothed DTW (sDTW) is used instead
- Soft-DTW: a Differentiable Loss Function for Time-Series (ICML 2017, Soft-DTW)
  
  $sDTW(X, Y^{(k)}) = -\gamma \log\left(\sum_{A \in \mathit{A'}} e^{-\left< A, C \right>/\gamma}\right)$
- smoothed min operator : $min^{\gamma}\left\{ a_1, ..., a_n \right\} = -\gamma \log \sum_{i=1}^{n} e^{-a_i/\gamma}$; DTW keeps only the minimum-cost path, whereas sDTW weighs every path
  
  matrix $A \in \left\{0,1 \right\}^{T \times T^{(k)}}$ ➜ alignment matrix describing a warping path (warping follows the shortest path)
  
  matrix $C$ : Euclidean distance matrix
- $C_{i,j} = \left\|x_i - y^{(k)}_{j} \right\|$

$\mathit{A'}$ : the set of $T \times T^{(k)}$ binary alignment matrices
$\left<\cdot, \cdot\right>$ : matrix inner product

$L_{shape}(X) = \frac{1}{L^{(D)}-1} \sum_{k=2}^{L^{(D)}}sDTW(X, Y^{(k)})$
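The smoothed DTW above can be evaluated with the usual dynamic-programming recursion, replacing the hard min by the soft min. This is a minimal sketch for univariate series, assuming the cost $C_{i,j} = |x_i - y_j|$ and the standard three-neighbour recursion from the Soft-DTW paper; the function names are hypothetical and no gradient is computed.

```python
import math


def soft_min(values, gamma):
    """-gamma * log(sum(exp(-v / gamma))), shifted by the min for stability."""
    lowest = min(values)
    return lowest - gamma * math.log(
        sum(math.exp(-(v - lowest) / gamma) for v in values))


def soft_dtw(x, y, gamma=1.0):
    """Soft-DTW value between two univariate series of possibly different length."""
    n, m = len(x), len(y)
    inf = float("inf")
    r = [[inf] * (m + 1) for _ in range(n + 1)]
    r[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])          # C_{i,j}
            r[i][j] = cost + soft_min(
                [r[i - 1][j], r[i][j - 1], r[i - 1][j - 1]], gamma)
    return r[n][m]


# Series of different lengths, as with a coarse decoder output vs. the input
print(soft_dtw([0.0, 1.0, 2.0, 3.0], [0.0, 3.0], gamma=0.1))
```

As gamma approaches 0 the soft min approaches the hard min and the value approaches classic DTW; larger gamma spreads weight over more alignment paths, which is what makes the loss differentiable.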

Total loss

$L = \frac{1}{B} \sum_{b=1}^{B}\left(L_{MSE}(X_{b}) + \lambda L_{shape}(X_{b})\right)$

$B$ : batch size ($b=1,2,...,B$)
$\lambda$ : trade-off parameter

Anomaly Score

reconstruction error at time $t$ : $e(t) = y_t - x_t$

The errors $\left\{ e(t) \right\}$ from the validation step are fitted to a normal distribution $N(\mu, \Sigma)$

The probability that $x_t$ in the test set is anomalous is $1 - \frac{1}{\sqrt{(2\pi)^{d} \left|\Sigma \right|}}\exp(-\frac{1}{2}(e(t)-\mu)^{T} \Sigma^{-1}(e(t)-\mu))$

anomaly score of $x_t$ : $s(t) = (e(t) - \mu)^{T} \Sigma^{-1}(e(t) - \mu)$

➜ the larger $s(t)$ is, the lower the density of $e(t)$ under the error distribution of normal data, so $x_t$ is more likely to be anomalous

Conclusion

proposed recurrent ensemble network
Multiresolution decoders exploit information at different time scales
The coarse-to-fine fusion mechanism integrates outputs of different lengths
Prevents overfitting to local information and mitigates the error accumulation that can occur during decoding
probability based anomaly score

Dataset

Datasets used in the paper; $T$ is the window size

Dimension of each dataset

Dataset	Dimension
ECG	bivariate
2D-gesture	bivariate
Power-demand	univariate
Yahoo's S5 Webscope	univariate

* Additional experiments on multivariate time series with three or more dimensions would have made the comparison with other baselines easier

Outlier Detection for Time Series with Recurrent Autoencoder Ensembles

Sat, 28 Jan 2023 17:36:03 GMT

Outlier Detection for Time Series with Recurrent Autoencoder Ensembles

2019 International Joint Conference on Artificial Intelligence (IJCAI-19)

Autoencoder ensemble based Time series anomaly detection method

ensemble ➜ the goal is to improve the overall performance of the model while reducing the influence of the autoencoders that have overfitted

Introduction

previous autoencoder ensemble methods ➜ better suited to non-sequential data than to time series

To build an autoencoder ensemble suited to time series, the paper uses Recurrent Neural Networks (RNNs) ➜ specifically sparsely connected RNNs

The paper proposes two ensemble frameworks

Independent Framework (IF) : trains multiple autoencoders independently
Shared Framework (SF) : trains multiple autoencoders jointly through a shared feature space

➜ aims to improve performance by combining multiple encoders and decoders

Contribution

proposes autoencoders with different structures by means of sparsely-connected recurrent units
ensemble frameworks that make use of multiple autoencoders
applies an autoencoder ensemble method to time series

Autoencoder Ensembles For Time Series

The paper builds its autoencoders from RNNs, which are known to be effective for time series modeling

Notation

Time series $T = \left< s_1, s_2, \ldots, s_C \right>$
Reconstructed time series $\hat{T}^{(i)} = \left< \hat{s}^{(i)}_{C}, \ldots, \hat{s}^{(i)}_{2}, \hat{s}^{(i)}_{1} \right>$

$s_i = (s^{(1)}_i, s^{(2)}_i, \ldots, s^{(k)}_i)$
- each vector $s_i$ represents $k$ features at a time point $t_i$
num of features $k$
time series length $C$

$1 \leq i \leq C$
$k = 1$ ➜ univariate
$k > 1$ ➜ multivariate

Autoencoder Ensembles

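Goal : improve the performance of autoencoder based anomaly detection methods

Even distinct autoencoders end up being the same network if they are fully connected, so it is better to use sparsely-connected autoencoders whose connections are randomly removed

Using sparsely-connected networks with different network structures reduces the variance of the overall reconstruction errors

Both frameworks use the median of the reconstruction errors of the multiple autoencoders as the final reconstruction error, which quantifies how likely an observation in the time series is to be an outlier

Source : S-RNNs paper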

Sparsely-connected RNNs(S-RNNs)

Example of an autoencoder for anomaly detection in time series

Source : S-RNNs paper

$s$ : time series
$h$ : hidden state

RNN unit computation : $h_t = f(s_t, h_{t-1})$ ➜ only considers the previous hidden state

Previous research proposed Recurrent Skip Connection Networks (RSCNs)

➜ consider the previous hidden state and an additional hidden state further in the past

Recurrent residual learning for sequence classification (EMNLP, 2016)

hidden state at time step $t$ : $h_t = \frac{f(s_t, h_{t-1}) + f'(s_t, h_{t-L})}{2}$

$h_t$ considers not only the previous hidden state $h_{t-1}$ but also the hidden state $h_{t-L}$ from $L$ steps earlier (with equal weight)

This paper proposes Sparsely-connected RNNs (S-RNNs)

RSCNs + randomly remove some connections between hidden states ➜ the random connections are generated with sparseness weights

sparseness weight vector $w_t = (w^{(f)}_{t}, w^{(f')}_{t})$
$w^{(f)}_{t} \in \left\{0, 1 \right\}$
$w^{(f')}_{t} \in \left\{0, 1 \right\}$

Each element of the weight vector is 0 or 1
0 ➡ disconnected
1 ➡ connected

$h_t = \frac{f(s_{t}, h_{t-1}) \cdot w^{(f)}_{t} + f'(s_t, h_{t-L}) \cdot w^{(f')}_{t}}{\left\| w_{t} \right\|_{0}}$

$\left\| w_{t} \right\|_{0}$ : the number of non-zero elements in vector $w_t$
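A minimal sketch of the S-RNN hidden state update, assuming scalar hidden states and toy stand-ins for $f$ and $f'$ (the real units are learned RNN cells). The helper names, the skip length default, and the rule that at least one connection is always kept (so the denominator is never 0) are assumptions made for illustration.

```python
import math
import random


def sample_sparseness_weights(num_steps, seed=0):
    """Draw one fixed weight vector w_t per time step."""
    rng = random.Random(seed)
    # Assumption: (0, 0) is excluded so that ||w_t||_0 >= 1.
    choices = [(1, 0), (0, 1), (1, 1)]
    return [rng.choice(choices) for _ in range(num_steps)]


def f(s, h):
    """Toy stand-in for the recurrent unit f(s_t, h_{t-1})."""
    return math.tanh(0.5 * s + 0.5 * h)


def f_skip(s, h):
    """Toy stand-in for the skip unit f'(s_t, h_{t-L})."""
    return math.tanh(0.5 * s - 0.5 * h)


def srnn_hidden_states(series, weights, skip=2):
    """Compute h_t for every step of a scalar series."""
    hidden = []
    for t, s_t in enumerate(series):
        h_prev = hidden[t - 1] if t >= 1 else 0.0
        h_skip = hidden[t - skip] if t >= skip else 0.0
        w_f, w_fs = weights[t]
        total = f(s_t, h_prev) * w_f + f_skip(s_t, h_skip) * w_fs
        hidden.append(total / (w_f + w_fs))  # divide by ||w_t||_0
    return hidden


weights = sample_sparseness_weights(6, seed=1)
states = srnn_hidden_states([0.1, 0.4, 0.2, 0.9, 0.3, 0.5], weights)
```

With every weight set to `(1, 1)` the update reduces to the RSCN average, and with `(1, 0)` it reduces to a plain RNN step.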

RSCN and S-RNN example

In an RSCN every hidden state always considers the hidden state from $L$ steps earlier, whereas in an S-RNN the connections to earlier hidden states are kept or removed at random for each state

Difference between RNNs with dropout and S-RNNs, as described in the paper

S-RNN : the removed connections are fixed throughout the training phase
RNN with dropout : randomly removes connections at every training epoch

S-RNN Autoencoder Ensembles

The paper proposes two different frameworks

Independent Framework(IF)

Uses $N$ S-RNN autoencoders in total. Each autoencoder is trained independently and has its own sparseness weight vectors

Loss function $J_i = \sum^{C}_{t=1}{\left\| s_{t} - \hat{s}_{t}^{(D_i)} \right\|}_{2}^{2}$

$\hat{s}_{t}^{(D_i)}$ : reconstructed vector at time step $t$ from decoder $D_i$

Shared Framework(SF)

In IF there is no interaction between the autoencoders during the training phase, but since every autoencoder ultimately aims to reconstruct the original input, interaction between autoencoders is meaningful ➜ apply multi-task learning

Given $N$ tasks (encoders) that reconstruct the original input data, the output of each encoder is shared through a shared layer

$h^{(E)}_{C}$ : shared layer ➜ concatenates linear combinations of the last hidden states of all encoders

$W^{(E_i)}$ : linear weight matrices

$h^{(E)}_{C} = concatenate(h_{C}^{(E_1)} \cdot W^{(E_1)}, \ldots, h_{C}^{(E_N)} \cdot W^{(E_N)})$

Each decoder uses the concatenated hidden states as its initial hidden state to reconstruct the input time series

All autoencoders are trained jointly with a single loss function

Loss Function

$J = \sum^{N}_{i=1}J_i + \lambda{\left\| h_{C}^{(E)} \right\|}_1$

$\quad = \sum^{N}_{i=1}\sum^{C}_{t=1}{\left\| s_t - \hat{s}_{t}^{(D_i)} \right\|_{2}^{2}} + \lambda{\left\| h_{C}^{(E)} \right\|}_1$

$\lambda$ : weight that controls the L1 regularization

The loss function consists of the sum of the reconstruction errors of all autoencoders and an L1 regularization term

L1 regularization
- used to keep only the key features → weights that are too small become 0, leaving only the important ones
- makes the shared hidden state sparse
- prevents some encoders from overfitting & makes the decoders robust

➜ when the autoencoder encounters an anomalous value, the residual becomes even larger

Anomaly score

For the original time series $T$, one reconstructed time series is produced per autoencoder ($N$ in total), giving $N$ reconstruction errors
reconstruction errors $\left\{ \left\| s_k - \hat{s}_{k}^{(1)} \right\|_{2}^{2}, \left\| s_k - \hat{s}_{k}^{(2)} \right\|_{2}^{2}, \ldots, \left\| s_k - \hat{s}_{k}^{(N)} \right\|_{2}^{2} \right\}$

final anomaly score $anomaly \: score(s_k) = median(\left\{ \left\| s_k - \hat{s}_{k}^{(1)} \right\|_{2}^{2}, \left\| s_k - \hat{s}_{k}^{(2)} \right\|_{2}^{2}, \ldots, \left\| s_k - \hat{s}_{k}^{(N)} \right\|_{2}^{2} \right\})$

The median is used instead of the mean to reduce the influence of reconstruction errors from overfitted autoencoders
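As a small illustration of why the median is preferred, the sketch below scores one observation from three hypothetical reconstructions, one of which is badly off. The function names and the numbers are made up for the example.

```python
from statistics import mean, median


def squared_error(s, s_hat):
    """Squared L2 distance between an observation and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(s, s_hat))


def ensemble_anomaly_score(s_k, reconstructions):
    """Median of the N reconstruction errors for observation s_k."""
    return median(squared_error(s_k, r) for r in reconstructions)


s_k = [1.0, 2.0]
# Reconstructions from N = 3 autoencoders; the last one is a poor fit.
reconstructions = [[1.0, 2.0], [1.0, 3.0], [5.0, 2.0]]
errors = [squared_error(s_k, r) for r in reconstructions]
score = ensemble_anomaly_score(s_k, reconstructions)
mean_score = mean(errors)
```

Here the errors are 0, 1 and 16: the median stays at 1 while the mean is pulled up by the single poor reconstruction.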

Experiments

evaluation metric : PR-AUC, ROC-AUC

SF performs somewhat better than IF on most datasets

Conclusion

uses sparsely-connected RNNs
two RNN based autoencoder ensemble frameworks

* train each autoencoder independently
* train the autoencoders jointly

Outlier Detection with Autoencoder Ensembles

Thu, 26 Jan 2023 12:36:59 GMT

2017 SIAM International Conference on Data Mining (SDM)

Introduction

The authors propose an autoencoder ensemble method : Randomized Neural Network for Outlier Detection (RandNet)

Uses randomly connected autoencoders instead of fully connected ones. Each autoencoder has a different structure and connection density ➜ computational complexity ↓

Deep neural networks risk overfitting and often converge to local optima ➜ even if some structures overfit, the diversity inherent to an ensemble can improve overall efficiency

RandNet

Autoencoder based model ➜ the model is trained so that the reconstructed output is as close as possible to the input ➜ the latent vector should capture the main characteristics of the input ➜ anomalies are reconstructed poorly

fully connected auto-encoder example

Source : RandNet paper

Training data often contains anomalies, which raises the chance of overfitting when training neural network based methods

➜ apply an ensemble method to address overfitting

but combining multiple methods is not guaranteed to beat the best-performing individual model. For an ensemble method to work well, its components must therefore be sufficiently diverse. Combining varied structures should make it easier to capture different patterns

Because fully connected autoencoders would all produce similar outputs, the paper uses randomly connected autoencoders

randomly connected auto-encoder example

Source : RandNet paper

Difference between RandNet and dropout

RandNet : combines the results of NN models with different structures ➜ overfitting of any single structure is not a major problem
Dropout : random connections within a single model ➜ aims to prevent overfitting

Neural Network Structure

autoencoder structure : the number of nodes in each layer is set to a ratio $\alpha$ of the number of nodes in the previous layer, with a minimum of 3 nodes ➜ to prevent the information loss caused by excessive compression in the bottleneck hidden layer

Activation Function

First hidden layer and output layer : Sigmoid function
Remaining layers : ReLU

The goal is to balance the pros and cons of the two activation functions; this combination gave the best performance
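The sizing rule can be sketched as follows. The function name, rounding down with `int`, and the example values of $\alpha$ and depth are assumptions; only the ratio rule and the minimum of 3 nodes come from the notes.

```python
def encoder_layer_sizes(input_dim, alpha, num_layers, min_nodes=3):
    """Shrink each layer to alpha times the previous one, never below min_nodes."""
    sizes = [input_dim]
    for _ in range(num_layers):
        # int() truncation is an assumption; the notes do not say how to round.
        sizes.append(max(min_nodes, int(sizes[-1] * alpha)))
    return sizes


sizes = encoder_layer_sizes(input_dim=100, alpha=0.5, num_layers=6)
```

With these example values the widths shrink by half at each layer until they reach the floor of 3 nodes, where they stay.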

Anomaly score

notation

$n$ : the number of data points
$d$ : data dimension
$m$ : ensemble components

$x_{ij} \in \mathbb{R}^{d}$ : $i$-th ensemble component's $j$-th input data point
$o_{ij} \in \mathbb{R}^{d}$ : autoencoder's reconstructed output

$[OS_i]_j = \sum_{k=1}^{d}([x_{ij}]_k - [o_{ij}]_k)^2$

$i \in 1 ... m, \: j \in 1 ... n$

The final anomaly score of a data point is the median of its scores across the ensemble

Conclusions

In this paper ...

autoencoder ensemble
adaptive data sampling : the sampling size is increased as the training iterations progress

A Deep Neural Network for Unsupervised Anomaly Detection and Diagnosis in Multivariate Time Series Data

Tue, 24 Jan 2023 14:02:50 GMT

2019 AAAI Conference on Artificial Intelligence

multivariate time series anomaly detection method
signature matrix

Uses as input a signature matrix built from the inter-correlation between the individual series of a multivariate time series

Introduction

multivariate time series anomaly detection

previous anomaly detection methods, e.g. distance/clustering methods (kNN), classification methods (OC-SVM), density estimation methods (Deep Autoencoding Gaussian Mixture Model; DAGMM)

➜ they cannot capture temporal dependencies across different time steps

Signature matrices are proposed to address this limitation

Multi-scale (resolution) signature matrices, which represent the information across different time steps at multiple levels, are built in order to exploit temporal information

Contributions

Convolutional encoder : encode the inter-sensor correlations
Attention based ConvLSTM : incorporate temporal patterns
considers correlations among multivariate time series

MSCRED Framework

Multi-Scale Convolutional Recurrent Encoder-Decoder (MSCRED)

encode the spatial information in signature matrices via a convolutional encoder
model the temporal information via an attention based ConvLSTM
reconstruct signature matrices based upon a convolutional decoder

Notation

$X = (x_1, \cdots, x_n)^T \in \mathbb{R}^{n \times T}$

$X$ : multivariate time series
$x_1, \cdots, x_n$ : $n$ time series with length $T$
$w$ : window size

Signature Matrix

Examining the correlation between time series is important for understanding the system status
The Signature Matrix is proposed to represent the inter-correlation between the series of a multivariate time series
focuses on the correlation within a window
the correlation is computed as the inner product of two time series

➜ capture the shape similarities and value scale correlations between two time series
➜ robust to input noise : even if an anomaly exists in a certain interval, that data has little influence on the construction of the signature matrix

signature matrix of an $n$-dimensional multivariate time series up to time $t$ : $M^t \in \mathbb{R}^{n \times n}$
sub-time series $x_i$ and $x_j$ of $X$

$x^{w}_i = (x^{t-w}_i, x^{t-w+1}_i, \cdots, x^{t}_i)$
$x^{w}_j = (x^{t-w}_j, x^{t-w+1}_j, \cdots, x^{t}_j)$

correlation of the two time series $m^{t}_{ij} \in M^t$

$m_{ij}^{t} = \frac{\sum_{\delta =0}^{w} x_{i}^{t-\delta} x_{j}^{t-\delta}}{w}$

Source : MSCRED paper

(a) Multivariate time series example
(b) Signature matrix example
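The formula for $m^t_{ij}$ translates directly into code. The sketch below is a plain-Python illustration with a hypothetical helper name and toy data; it assumes $t \geq w$ so that the window fits inside the series.

```python
def signature_matrix(series, t, w):
    """Build M^t for n series; entry (i, j) is the scaled inner product
    of x_i and x_j over the time steps t-w, ..., t."""
    n = len(series)
    segments = [x[t - w : t + 1] for x in series]  # w + 1 values per series
    return [
        [sum(a * b for a, b in zip(segments[i], segments[j])) / w for j in range(n)]
        for i in range(n)
    ]


# Two toy series of length 4 (hypothetical values).
toy = [[1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0, 1.0]]
m = signature_matrix(toy, t=3, w=2)
```

The result is a symmetric $n \times n$ matrix; building it for several window sizes and stacking the results gives the multi-scale input described in the notes.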

Segment-related sizes used in the paper

hop size : 10
window size : 10, 30, 60. Signature matrices are built with each size and then concatenated; the window size is expected to affect anomaly detection

A total of 5 sets of signature matrices are used

Signature Matrix example

Source : my notebook

Part of a synthetic dataset is used as an example: normal data composed of sinusoids, and anomaly data in the intervals where outliers were injected

Source : my notebook

The data of the interval above represented as signature matrices

➜ the signature matrices of the normal interval and of the abnormal interval visibly differ in shape

Framework

The structure of MSCRED is as follows; overall it follows an autoencoder structure

(a) Convolutional Encoder

The goal is to capture the spatial patterns of the signature matrices (inter-time series correlation patterns). Uses a total of 4 fully convolutional encoder layers built from CNNs

(b) Attention based ConvLSTM

Convolutional lstm network: A machine learning approach for precipitation nowcasting (Shi et al. 2015, NIPS)
- proposed to address the weakness of the standard LSTM, which does not capture spatial characteristics well
- the internal operations of the LSTM are convolutions, so spatio-temporal information can be learned jointly

ConvLSTM can learn spatio-temporal information jointly, but performance still degrades as the sequence length grows ➜ this is mitigated with attention over the signature matrices of previous time steps. The paper sets the step length to 5 and attends over the previous feature maps

The feature maps from each encoder layer are passed through a ConvLSTM layer to produce hidden states that carry temporal information ➜ attention is computed between the hidden states of each time step (reference : last feature map) ➜ feature maps are produced

The final feature map of (b) should contain both spatial and temporal information

(c) Convolutional Decoder

Uses a total of 4 deconvolutional layers to reconstruct the signature matrices used as input

The output of the ConvLSTM layer at each level is concatenated with the output of the previous decoder layer ➜ used as input to the next decoder layer

The paper states that combining the ConvLSTM output with the DeConv output can be expected to give better anomaly detection performance

After all decoder layers, the signature matrices of the last time step $t$ among the inputs are reconstructed

(d) Calculate Residual Matrices

The paper uses 5 time steps as input. At reconstruction time, the signature matrices of the last time step are reconstructed

The residual signature matrices are the difference between the signature matrices at time $t$ and the reconstructed signature matrices ➜ the model is trained with an MSE loss

Conclusion

introduces signature matrices as a new input form instead of the original time series
a signature matrix is built from the inter-correlation between the univariate time series
inter-correlation and temporal dependencies are learned during the training process
poorly reconstructed row/column ➜ used to identify the anomaly root cause

Robust Random Cut Forest Based Anomaly Detection On Streams

Wed, 18 Jan 2023 10:43:48 GMT

2016 Proceedings of The 33rd International Conference on Machine Learning

A Binary Search Tree based algorithm for detecting anomalies in stream data. It adapts Isolation Forest so that it can be applied in a real-time streaming setting

Significance : applies a tree structure to stream data

Key questions of the paper

1) How do we define anomalies?
2) What data structure do we use to efficiently detect anomalies over dynamic data streams?

Differences between IF and RCF

Parts that differ from the original Isolation Forest (IF)

Feature selection
Anomaly score

Feature Selection

Isolation Forest : randomly selects the feature used for the split
Extended Isolation Forest : same as IF
Random Cut Forest : assigns each feature a selection probability according to the range of the feature

Anomaly Score

Isolation Forest : uses the average path length over all trees as the anomaly score

uses 0.5 as the threshold between normal and anomaly

Extended Isolation Forest : same as IF
Random Cut Forest : defines a new anomaly score in terms of the depth change that occurs in the remaining data when a data point is removed from the dataset (a model complexity perspective)

Robust Random Cut Tree

robust random cut tree on point set S

$T(S)$ : tree built from $S$

random choice feature $p$
probability that the $i$-th feature is chosen : $\frac{l_i}{\sum_j l_j}$
$l_i = max_{x\in S} \: x_i - min_{x\in S} \: x_i$
➜ the probability of choosing a feature is determined by the range of its values

randomly select value $q$
choose $X_i$ ~ $Uniform[min_{x\in S} \: x_i, max_{x\in S} \: x_i]$
points smaller than the split point $q$ go to the left branch, larger ones go to the right branch
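One random cut can be sketched as below. This is a stdlib-only illustration, not the `rrcf` library's API: the function name is hypothetical, points equal to the cut value are sent left by assumption, and the points are assumed not to be all identical (otherwise every range is 0 and no feature can be drawn).

```python
import random


def random_cut(points, rng=random):
    """Pick a feature with probability proportional to its range, then cut it."""
    dims = len(points[0])
    lows = [min(p[i] for p in points) for i in range(dims)]
    highs = [max(p[i] for p in points) for i in range(dims)]
    lengths = [hi - lo for lo, hi in zip(lows, highs)]  # l_i
    # Feature i is chosen with probability l_i / sum_j l_j.
    feature = rng.choices(range(dims), weights=lengths, k=1)[0]
    cut = rng.uniform(lows[feature], highs[feature])
    left = [p for p in points if p[feature] <= cut]
    right = [p for p in points if p[feature] > cut]
    return feature, cut, left, right


# The second feature is constant, so its selection probability is 0.
pts = [(0.0, 5.0), (10.0, 5.0), (4.0, 5.0)]
feature, cut, left, right = random_cut(pts, rng=random.Random(0))
```

Weighting by range means a feature that does not vary is never cut, which is the difference from the uniform feature choice of Isolation Forest noted above.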

Anomaly Score

IF measures the anomaly score using the property that anomalies are isolated early in the tree ➜ average path length

RRCF measures the anomaly score from the perspective of model complexity ➜ abnormal point increases model complexity

Displacement(DISP)

$DISP(x, Z)$ : the total change in depth of the remaining data when data point $x$ is removed from dataset $Z$ ➜ the expected value of the depth change over the trees

(a) : before deleting $x$
(b) : after deleting $x$

Removing $x$ decreases the depth of every node in sub-tree c by 1. The depth of sub-tree b, which is not directly connected to $x$, does not change ➜ total depth change caused by $x$ == number of data points under the sibling node of $x$ ➜ the more anomalous $x$ is, the larger the total depth change it causes
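The observation that the displacement in one tree equals the number of points under the sibling of $x$ can be checked on a toy tree. In this sketch a tree is a nested tuple `(left, right)` and a leaf is a label string; this representation and the function names are assumptions made for the example.

```python
def count_leaves(node):
    """Number of data points stored under a node."""
    if not isinstance(node, tuple):
        return 1
    return count_leaves(node[0]) + count_leaves(node[1])


def displacement(tree, x):
    """DISP of leaf x in a single tree, or None if x is not in the tree."""
    if not isinstance(tree, tuple):
        return None
    left, right = tree
    if left == x:
        return count_leaves(right)  # every point under the sibling moves up by 1
    if right == x:
        return count_leaves(left)
    for child in (left, right):
        found = displacement(child, x)
        if found is not None:
            return found
    return None


# "x" is split off at the root; "a", "b", "c" sit deeper in the tree.
toy_tree = ("x", (("a", "b"), "c"))
disp_x = displacement(toy_tree, "x")
disp_a = displacement(toy_tree, "a")
```

The point isolated at the root displaces all three remaining points, while a point deep inside the cluster displaces only its single sibling.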

Collusive Displacement(CODISP)

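Because DISP is affected by the masking problem, the paper proposes an anomaly score that also takes the neighborhood of the anomaly into account

masking : outliers clustered together so that they look normal

Because of masking, if $q$ sits next to an abnormal data point $p$, the $DISP$ of $p$ will be very small. The anomaly score is therefore computed by also considering the colluders that hide the abnormal data: consider the total depth change that occurs when the collusive cluster $C$ around $x$ is removed ➜ but the larger $C$ is, the larger the depth change ➜ to reduce the effect of the size of $C$, $CODISP$ divides $DISP$ by the size of $C$ ➜ but the size of $C$ cannot be known exactly, so the maximum over the candidates is used

$x$ : data point
$Z$ : dataset
$S$ : sub-set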

$CODISP(x, Z, |S|) = \mathbb{E}[\underset{x\in C \subseteq S}{max} \frac{1}{|C|} \sum DISP(x, z)]$

Algorithm

Forget Point

find the node $v$ corresponding to $p$ in tree $T$
remove the parent node of $v$ and put the sibling node $u$ of $v$ in the parent's place (the root-to-$u$ path ↓)
update every sub-tree starting from the new parent $u'$
return modified tree $T'$

Insert Point

(TODO) add notes on the insert point algorithm

Experiments

Experiments use real-time data, preprocessed with the shingling technique

shingling : a preprocessing method that turns a 1-d sequence into n-dimensional vectors ➜ effective for detecting regular changes and filtering small noise ➜ performance can depend on the shingle size

synthetic data example

IF and RRCF are compared on synthetic data with outliers injected in a certain interval

blue is the data used in the experiment, red is the anomaly score

(a) IF
(b) RRCF

The paper stresses the importance of detecting the start point when detecting outliers in stream data

After the end point of an anomaly the system has already returned to the normal state, so detecting the start point is what matters

In this example IF fails to detect the start point of the anomaly, whereas RRCF detects both the start and the end point

Applies a tree based anomaly detection method to time series. Applying it to multivariate time series requires further consideration

paper GitHub
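Shingling amounts to a sliding window over the sequence. A minimal sketch, with a hypothetical function name and a stride of 1 assumed:

```python
def shingle(sequence, size):
    """Turn a 1-d sequence into overlapping windows of `size` consecutive values."""
    return [tuple(sequence[i : i + size]) for i in range(len(sequence) - size + 1)]


shingles = shingle([1, 2, 3, 4, 5], size=3)
```

Each shingle is then treated as one point in a `size`-dimensional space, so a larger shingle size captures longer patterns at the cost of fewer, higher-dimensional points.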

https://github.com/kLabUM/rrcf

Extended Isolation Forest

Mon, 09 Jan 2023 16:13:19 GMT

2021 IEEE Transactions on Knowledge and Data Engineering

Replaces the axis-parallel (vertical/horizontal) splits of Isolation Forest with an extended split method. The goal is to also isolate anomalies lying in regions that Isolation Forest splits poorly

Introduction

The overall flow is similar to Isolation Forest (IF); the random split method that IF uses to build trees is extended

Isolation Forest
- iTree ensemble model
- axis-parallel split ↳ splits perpendicular/parallel to the axes
- random choice feature and random choice value (in feature)
Extended Isolation Forest
- changes the split rule of IF ↳ IF has difficulty detecting anomalies in regions that are hard to separate
- non-axis-parallel split ↳ uses hyperplanes with random slopes to split the data when building the Binary Search Tree (BST)

Motivation

image source : Extended Isolation Forest

Example of applying IF to simple datasets. Left : dataset, right : anomaly score map of the synthetic data

The darker the red, the higher the anomaly score. Regions that should likely be judged anomalous receive scores close to normal (unexpected artifacts). For the sinusoidal data, even the abnormal regions between the curves are all judged normal

The anomaly score map is expected to resemble the shape of the normal data, with the score increasing toward the outside of the circle. Instead, rectangular bands of low anomaly score appear outside the normal range, so data points at the same distance from the center can receive different anomaly scores

The data are located at roughly (0, 10) and (10, 0). The score map marks those locations correctly, but 'ghost clusters' also appear around (0, 0) and (10, 10). As in the previous example, rectangular score patterns form around the clusters. This can raise the chance of false positives and can produce structure that does not exist in the actual data

The empty space between the sine curves, where there is no data, should be judged an anomaly region, but the normal region comes out almost rectangular

Why anomalies are scored as in the maps above ➜ IF only splits in axis-parallel (vertical/horizontal) directions

IF split example

The figure shows that the denser the data in a region, the more splits are made to isolate its points. In particular, many splits are made to isolate the points at the center of the normal region, and regions without data get split along with them. Because of axis-parallel splits, regions without data are partitioned too and become more likely to be judged normal; anomalies located there also become harder to isolate

This shows that denser data requires more splits. A low split density means that the points in that region are easy to isolate, so their anomaly score is high. The area around (10, 10) in (b) should be judged anomalous, but many splits occur there under the influence of the splits made to isolate the data around (0, 10) and (10, 0)

Extended Isolation Forest

split example

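Proposed to solve the problem of anomalous regions being judged normal. When IF selects a feature for a split, it does not take the importance or explanatory power of each feature into account ➜ adding a random slope to the split rule does not change the meaning of a split much, while allowing cuts that separate normal from abnormal better ➜ non-axis-parallel split

EIF split example : splits are made with sloped lines

The IF problem of many splits landing in abnormal regions in order to isolate data in dense regions is resolved to some extent

Splits that cause misjudgment, such as the ghost clusters formed along dense normal data, are reduced. Splits concentrate on the regions where data exists. In (b), the split problem that IF showed at locations such as (10, 10) is reduced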

split rule

The selection of the branch cuts requires two pieces of information

IF : axis-parallel split
- random feature
- random value

EIF : non-axis-parallel split
- random slope
- random intercept

split criteria

➜ $(\vec{x} - \vec{p}) \cdot \vec{n} \leq 0$

dataset dimension : $N$
dataset : $X$
data point : $\vec{x}$
random slope : $\vec{n}$
random intercept : $\vec{p}$

selecting a random slope is the same as choosing a normal vector ➜ draw a random number for each coordinate of $\vec{n}$ from the standard normal distribution $N(0,1)$

selecting a random intercept ➜ draw from a uniform distribution over the range of values present at each branching point

Data points satisfying $(\vec{x} - \vec{p}) \cdot \vec{n} \leq 0$ go to the left branch; the rest go to the right branch

If the vector from the intercept $p$ to the data point $x$ is orthogonal to the normal vector $n$, the inner product is 0

Because the intercept $p$ is restricted to the data available at the branching point, the intercepts tend to accumulate where the data is as the tree gets deeper

branch hyperplane

A branch cut for $N$-dimensional data uses hyperplanes of at most $N-1$ dimensions. The branch cut used depends on the extension level

The figure below shows example branch cuts by extension level for 3-dimensional data

$2^{nd}$ Extension
- splits with a hyperplane that intersects all axes

$1^{st}$ Extension
- in this example the hyperplane is always parallel to one of the 3 axes

$0^{th}$ Extension
- always uses a random slice parallel to two axes
- the lowest extension level coincides with the split of standard IF

The extension-level based splits proposed in the paper can be useful on multi-dimensional datasets whose dimensions have different ranges. Choosing an appropriate extension level can reduce computational overhead

ex) for 3-dimensional data where two dimensions have a much smaller range than the third, standard IF will give the better result

Algorithm

Apart from the split rule, the algorithm is the same as standard IF

Algorithm 1 : build iForest is the same as in IF

Algorithm 2 : build iTree

(left) : IF (right) : EIF

If the input data $X$ is isolated, return it as an external node

select the normal vector $\vec{n}$ used for the split; its coordinates are drawn from the standard Gaussian distribution
randomly select the intercept point $\vec{p}$ within the range of $X$
adjust the coordinates of $\vec{n}$ according to the extension level
assign to the left branch if $(\vec{x} - \vec{p}) \cdot \vec{n} \leq 0$
assign to the right branch if $(\vec{x} - \vec{p}) \cdot \vec{n} > 0$

Algorithm 3 : Path Length is the same as in IF ➜ the average path length of a data point is computed and used to derive the final anomaly score
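The split steps listed above can be sketched as a single function. This is an illustration under stated assumptions rather than the reference implementation: the function name is hypothetical, and the extension level is applied by zeroing $N - 1 - \text{level}$ randomly chosen coordinates of $\vec{n}$, so that level 0 leaves one non-zero coordinate (an axis-parallel cut).

```python
import random


def eif_split(points, extension_level=None, rng=random):
    """Split points with one random hyperplane (x - p) . n <= 0."""
    dims = len(points[0])
    if extension_level is None:
        extension_level = dims - 1  # fully extended by default (assumption)
    # Random slope: each coordinate of n drawn from N(0, 1).
    normal = [rng.gauss(0.0, 1.0) for _ in range(dims)]
    # Zero out coordinates so that only extension_level + 1 remain non-zero.
    for i in rng.sample(range(dims), dims - extension_level - 1):
        normal[i] = 0.0
    # Random intercept: uniform over the range of the data at this node.
    intercept = [
        rng.uniform(min(p[i] for p in points), max(p[i] for p in points))
        for i in range(dims)
    ]
    left, right = [], []
    for p in points:
        dot = sum((x - q) * n for x, q, n in zip(p, intercept, normal))
        (left if dot <= 0 else right).append(p)
    return normal, intercept, left, right


pts = [(0.0, 0.0), (1.0, 2.0), (3.0, 1.0), (4.0, 4.0)]
n0, p0, left0, right0 = eif_split(pts, extension_level=0, rng=random.Random(1))
n1, p1, left1, right1 = eif_split(pts, rng=random.Random(1))
```

At extension level 0 only one coordinate of the normal vector survives, so the test reduces to comparing a single feature with a threshold, as in standard IF.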

Result and Conclusion

Score Map

In the three example cases, the rectangular artifacts are somewhat reduced

For the two blobs synthetic dataset, a connecting pattern between the two clusters still remains

For the sine graph, the EIF score map follows a shape similar to the original data

Conclusion

Identifies regions of the Isolation Forest anomaly score map that are likely to be misjudged ➜ attributes these artifacts to the split method of IF
Extends the split method of standard Isolation Forest
IF : splits with lines parallel to an axis
EIF : splits with randomly sloped lines, which include axis-parallel lines as a special case
Depending on the dataset, it can give more accurate results than IF

Isolation Forest

Thu, 05 Jan 2023 18:14:46 GMT

2008 IEEE International Conference on Data Mining

Introduction

Isolation Forest (IF) separates normal and anomalous points using quantitative properties of anomalies

1) Anomalies are the minority consisting of fewer instances
2) Anomalies have attribute-values that are very different from those of normal instances

➡ few and different : more susceptible to isolation than normal points

abnormal points : isolated closer to the root of the tree
normal points : isolated at the deeper end of the tree

IF separates points with a tree structure and is an ensemble (forest) of such trees. According to the authors the time complexity is linear, so the computational cost grows only in proportion to the amount of data ➞ applicable to large or high-dimensional datasets

Isolation Trees and Forest

Meaning of isolation : separating an instance from the rest of the instances, i.e. a data point ends up alone in its own region ➡ anomalies are more susceptible to isolation than normal data

Isolation tree

IF is a tree ensemble model built from iTrees. Each tree applies random binary splits recursively until every point is isolated

normal point : many splits are needed before isolation ➞ deep
abnormal point : isolated with fewer splits than a normal point ➞ shallow

➡ the number of splits (path length) is used as the measure that separates normal from anomaly

random partitioning example

Source : Isolation Forest (ICDM 2008)

Compared with $x_i$, which lies inside the normal data distribution, the anomaly point $x_o$ is isolated with fewer splits ➡ anomalies are easier to isolate than normal data

Source : Isolation Forest (ICDM 2008)

The number of splits (path length) used to isolate the normal point $x_i$ and the abnormal point $x_o$ converges as the number of iTrees grows ➡ ensembling gives a robust model

Note : unlike existing anomaly detection methods, which generally perform better with larger samples, IF works best when the sampling size is kept small ➞ with large data, using a sub-sample makes little difference in performance while substantially reducing computational complexity

Anomaly Score

dataset $X = \{x_1, ..., x_n\}$
$h(x)$ : path length of point $x$ (number of splits used to isolate it)
$E(h(x))$ : mean path length of $x$ (average over all iTrees)
anomaly score $s(x,n) = 2^{-\frac{E(h(x))}{c(n)}}$
$c(n)$ : average path length of unsuccessful search in a Binary Search Tree (BST), used to normalize $h(x)$ (the average path length of an iTree)

The closer the anomaly score is to 1, the more anomalous the point $x$. $E(h(x))$ approaching 0 means that the average number of splits needed to isolate $x$ approaches 0
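The score can be evaluated numerically with the sketch below. The closed form used for $c(n)$, $2H(n-1) - 2(n-1)/n$ with $H(i) \approx \ln(i) + 0.5772156649$, is taken from the Isolation Forest paper rather than from these notes; the function names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649


def c(n):
    """Average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def anomaly_score(mean_path_length, n):
    """s(x, n) = 2 ** (-E(h(x)) / c(n))."""
    return 2.0 ** (-mean_path_length / c(n))


n = 256
score_shallow = anomaly_score(0.0, n)   # isolated immediately
score_average = anomaly_score(c(n), n)  # path length equal to c(n)
score_deep = anomaly_score(3 * c(n), n)
```

A point whose mean path length equals $c(n)$ scores exactly 0.5, shorter paths push the score toward 1, and longer paths push it toward 0.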

Algorithm

iForest

$X$ : input data
$t$ : number of trees
$\psi$ : sub-sampling size

parameters

sub-sampling size $\psi$
- controls the size of the training data
- a suitable value makes anomaly detection more stable
- choosing too large a size greatly increases processing time and memory without improving performance. The paper sets it to $2^8$ ➞ a size sufficient to detect anomalies

number of trees $t$
- controls the ensemble size
- the paper confirms convergence before 100 trees

height limit of the tree : $logN$ ➞ approximately the average tree height ➞ the height of a complete binary tree with $N$ nodes is $logN$ ➞ why trees are grown only up to the average tree height : anomalies have a shorter average path length than normal points (they will be isolated before $logN$)

iTree

$X$ : input data
$e$ : current tree height
$l$ : height limit

    1. return an external node if the input data is isolated or the tree height reaches the height limit
    4-5. randomly select one attribute of X ➞ q
    6. randomly choose a split point p within the range of q
    7-8. assign data whose value of q is smaller than p to the left branch, and data whose value is greater than or equal to p to the right branch
    9. repeat tree construction until every data point is isolated, storing q and p
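The steps above can be sketched as a short recursive function. This is a simplified illustration, not the paper's pseudocode verbatim: the dictionary node layout, the helper `count_points`, the height limit $\lceil \log_2 N \rceil$ (one reading of "$logN$"), and the early stop when the chosen attribute has a single value are all assumptions.

```python
import math
import random


def build_itree(points, height=0, height_limit=None, rng=random):
    """Recursively build one iTree from a list of equal-length tuples."""
    if height_limit is None:
        # Assumption: ceil(log2(sample size)) as the height limit.
        height_limit = math.ceil(math.log2(max(len(points), 2)))
    if len(points) <= 1 or height >= height_limit:
        return {"size": len(points)}  # external node
    q = rng.randrange(len(points[0]))  # random attribute q
    low = min(x[q] for x in points)
    high = max(x[q] for x in points)
    if low == high:
        # Simplification: stop when the chosen attribute cannot split the data.
        return {"size": len(points)}
    p = rng.uniform(low, high)  # random split point within the range of q
    left = [x for x in points if x[q] < p]
    right = [x for x in points if x[q] >= p]
    return {
        "attr": q,
        "split": p,
        "left": build_itree(left, height + 1, height_limit, rng),
        "right": build_itree(right, height + 1, height_limit, rng),
    }


def count_points(node):
    """Total number of points stored in the external nodes of a tree."""
    if "size" in node:
        return node["size"]
    return count_points(node["left"]) + count_points(node["right"])


sample = [(0.0, 1.0), (2.0, 3.0), (5.0, 1.5), (9.0, 9.0)]
tree = build_itree(sample, rng=random.Random(0))
```

Every point of the sample ends up in exactly one external node; the `size` stored there is what the path length computation uses to account for splits that were cut off by the height limit.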

Evaluation stage

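For the $t$ iTrees, the path length of every data point $x$ is computed, and the anomaly score of each data point is derived from it

PathLength counts the edges that point $x$ traverses from the root node to a leaf node. The anomaly score of $x$ is computed from the average path length over the trees

1. (normal) when x reaches an external node, add to the current path length the path length 
   expected if splitting had continued
4-8. compare the split value with the value of attribute a of x
     if it is smaller than the criterion send x to the left node, otherwise to the right node, and apply the algorithm again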

Conclusion

Detects abnormal points using the 'few and different' property of anomalies ➞ compared with normal points, anomalies are isolated closer to the root of the tree

Builds an effective anomaly detection model from an ensemble of trees grown on subsets of the training data

linear time complexity ➞ applicable to large datasets

Dataset

Datasets used in the paper