ray_park.log

Learning PyTorch - ReLU & Weight Initialization & Dropout & Batch Normalization

Mon, 11 Jul 2022 14:44:57 GMT

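I have worked with PyTorch briefly before. To become comfortable using it in future research, I am organizing what I have learned about PyTorch.

Most of the content is based on the YouTube series '모두를 위한 딥러닝 시즌2' (Deep Learning for Everyone, Season 2).

I assume some basic knowledge of deep learning and Python syntax, and focus mainly on the PyTorch practice code.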

!youtube[https://www.youtube.com/watch?v=KofAX-K4dk4&list=PLQ28Nx3M4JrhkqBVIXg-i5_CVVoS1UzAv&index=14]

For the practice materials with brief explanations, see Github.

1. ReLU (Rectified Linear Unit)

1) Problem of Sigmoid Function

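The sigmoid function has a major weakness known as the Vanishing Gradient Problem.

During backpropagation, gradients are computed and multiplied together layer by layer. The sigmoid's slope is very small near both ends, and its derivative is at most 0.25 (reached at x = 0). As a result, the product of these gradients approaches 0 as the network gets deeper. The gradient "vanishes", which means the influence of the earlier values disappears.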
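Here is a small numeric sketch of this effect. For simplicity, it uses a chain of sigmoids with weight 1 and bias 0 instead of a real network. Autograd confirms that the sigmoid's slope peaks at 0.25. The gradient through 10 stacked sigmoids comes out smaller than 0.25 ** 10.

```python
import torch

# The derivative of sigmoid, s'(x) = s(x) * (1 - s(x)), peaks at x = 0 with value 0.25.
x = torch.tensor(0.0, requires_grad=True)
torch.sigmoid(x).backward()
peak_grad = x.grad.item()

# Chain 10 sigmoids (weights fixed to 1, biases 0 for simplicity) and backpropagate.
z0 = torch.tensor(0.0, requires_grad=True)
z = z0
for _ in range(10):
    z = torch.sigmoid(z)
z.backward()
chained_grad = z0.grad.item()  # product of 10 local slopes, each at most 0.25

print(peak_grad)     # 0.25
print(chained_grad)  # a tiny number, below 0.25 ** 10
```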

2) ReLU

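The activation function proposed to solve this problem is the Rectified Linear Unit (ReLU).

First, the formula and graph shape of the ReLU function are as follows.

$f(x) = \max(0, x)$

In PyTorch, the function is applied as follows.

x = torch.relu(x)  # functional form; torch.nn.ReLU() is the equivalent module

Various other activation functions can be used as well.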

x = torch.sigmoid(x)
x = torch.tanh(x)
x = torch.nn.functional.leaky_relu(x, 0.01)
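A quick check of the functions above on a few sample values. The input tensor is arbitrary.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([-2.0, -0.5, 0.0, 0.5, 2.0])

relu_out = torch.relu(x)                          # max(0, x): negatives clipped to 0
sigmoid_out = torch.sigmoid(x)                    # 1 / (1 + e^-x), strictly between 0 and 1
tanh_out = torch.tanh(x)                          # strictly between -1 and 1
leaky_out = F.leaky_relu(x, negative_slope=0.01)  # 0.01 * x for x < 0, x otherwise

# Module form, convenient inside nn.Sequential
relu_module_out = torch.nn.ReLU()(x)
```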

3) Optimizers in PyTorch

PyTorch provides a variety of optimizers in the torch.optim package:

torch.optim.SGD
torch.optim.Adadelta
torch.optim.Adagrad
torch.optim.Adam # widely used!
torch.optim.SparseAdam
torch.optim.Adamax
torch.optim.ASGD
torch.optim.LBFGS
torch.optim.RMSprop # widely used!
torch.optim.Rprop

I won't go into how each one works here. Instead, the following figure gives a conceptual picture of the various optimizers.
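Every optimizer in torch.optim shares the same interface (zero_grad(), step()), so swapping one for another is a one-line change. Below is a minimal sketch. The toy objective (w - 3)^2, the learning rates, and the step counts are illustrative assumptions and do not come from the lecture.

```python
import torch

def minimize(make_optimizer, steps):
    # Parameter to optimize, starting at 0; the minimum of (w - 3)^2 is at w = 3.
    w = torch.tensor([0.0], requires_grad=True)
    optimizer = make_optimizer([w])
    for _ in range(steps):
        optimizer.zero_grad()
        loss = ((w - 3) ** 2).sum()
        loss.backward()
        optimizer.step()
    return w.item()

# Only the line that builds the optimizer differs between the two runs.
w_sgd = minimize(lambda params: torch.optim.SGD(params, lr=0.1), steps=100)
w_adam = minimize(lambda params: torch.optim.Adam(params, lr=0.05), steps=1000)
print(w_sgd, w_adam)  # both end up close to 3
```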

2. Weight Initialization

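Professor Geoffrey Hinton, known as a pioneer of the AI field, emphasized the importance of weight initialization.

Let's look at why, and at several initialization methods.

1) Why Good Initialization?

First, let's see why weight initialization is important.

So far, our practice code has always initialized the weights randomly. But look at the following figure.

The figure shows error curves on the MNIST and CIFAR10 datasets.

The colors of the curves denote different optimizers. Solid and dashed lines denote different weight initialization schemes.

At a glance, for curves of the same color, the dashed line performs much better. Here N (dashed) denotes Normalized Initialization rather than plain random initialization.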

2) Weight Initialization Methods

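So what is a sensible way to initialize?

First, initializing everything to 0 does not work. If all weights are 0, the gradients that reach earlier layers are also 0. In addition, every unit in a layer receives exactly the same update. The units therefore never become different from one another, and learning does not progress.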
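Here is a minimal sketch of the problem, using a toy 4-3-1 network with random inputs (an assumed setup). With every parameter set to 0, the first layer receives zero gradient on the first step. After several SGD steps, all hidden units still have identical weights, so they behave like a single unit.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(4, 3), nn.Sigmoid(), nn.Linear(3, 1))

# Deliberately bad initialization: every weight and bias set to 0.
for p in net.parameters():
    nn.init.zeros_(p)

x = torch.randn(8, 4)
y = torch.randn(8, 1)
optimizer = torch.optim.SGD(net.parameters(), lr=0.1)

for step in range(20):
    optimizer.zero_grad()
    loss = ((net(x) - y) ** 2).mean()
    loss.backward()
    if step == 0:
        first_w1_grad = net[0].weight.grad.clone()  # gradient reaching the first layer
    optimizer.step()

w1 = net[0].weight.detach()  # (3, 4): one row per hidden unit
w2 = net[2].weight.detach()  # (1, 3)
print(w1)  # all three rows are the same: symmetry is never broken
```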

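In a 2006 paper, Hinton et al. showed that a Deep Neural Network performs much better when it is initialized using Restricted Boltzmann Machines (RBMs).

(1) Restricted Boltzmann Machine

"Restricted" means that there are no connections between nodes within the same layer.

Also, as in the picture above, every node in one layer is connected to every node in the other layer.

This machine has two steps. In the encoding step, an input x (the v layer) produces y (the h layer). In the decoding step, y is mapped back to a reconstruction x'.

Professor Hinton applied this RBM principle to a pre-training step between each pair of adjacent layers.

Pre-training proceeds as follows.

(a) Train the first two layers as an RBM.
(b) Freeze the parameters (weights) between $h_1$ and $x$, and train the $h_1$ layer together with a new $h_2$ layer as an RBM.
Repeat this process up to the last layer.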
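The steps above can be sketched as follows. This is a simplified, hypothetical implementation (Bernoulli units, 1-step contrastive divergence, toy binary data). It is not the code from Hinton's paper, and `RBM`, `greedy_pretrain`, and `to_linear_layers` are illustrative names. The last helper copies the pre-trained weights into `nn.Linear` layers, which would serve as the starting point for fine-tuning.

```python
import torch
import torch.nn as nn

class RBM:
    """Minimal Bernoulli RBM trained with 1-step contrastive divergence (CD-1)."""

    def __init__(self, n_visible, n_hidden, lr=0.1):
        self.W = torch.randn(n_visible, n_hidden) * 0.01  # only v-h links, none within a layer
        self.b_v = torch.zeros(n_visible)
        self.b_h = torch.zeros(n_hidden)
        self.lr = lr

    def encode(self, v):
        # p(h = 1 | v): visible layer -> hidden layer
        return torch.sigmoid(v @ self.W + self.b_h)

    def decode(self, h):
        # p(v = 1 | h): hidden layer -> reconstruction x'
        return torch.sigmoid(h @ self.W.t() + self.b_v)

    def cd1_step(self, v0):
        p_h0 = self.encode(v0)
        h0 = torch.bernoulli(p_h0)  # sample binary hidden states
        p_v1 = self.decode(h0)      # reconstruction
        p_h1 = self.encode(p_v1)
        n = v0.shape[0]
        # statistics driven by the data minus statistics driven by the reconstruction
        self.W += self.lr * (v0.t() @ p_h0 - p_v1.t() @ p_h1) / n
        self.b_v += self.lr * (v0 - p_v1).mean(0)
        self.b_h += self.lr * (p_h0 - p_h1).mean(0)

def greedy_pretrain(data, hidden_sizes, epochs=50):
    # (a) train an RBM on the current input, (b) freeze it and feed its hidden
    # activations to the next RBM, and repeat up to the last layer.
    rbms, x = [], data
    for n_hidden in hidden_sizes:
        rbm = RBM(x.shape[1], n_hidden)
        for _ in range(epochs):
            rbm.cd1_step(x)
        rbms.append(rbm)
        x = rbm.encode(x)
    return rbms

def to_linear_layers(rbms):
    # Copy the pre-trained weights into nn.Linear layers as the start for fine-tuning.
    layers = []
    for rbm in rbms:
        linear = nn.Linear(rbm.W.shape[0], rbm.W.shape[1])
        with torch.no_grad():
            linear.weight.copy_(rbm.W.t())  # nn.Linear stores weight as (out, in)
            linear.bias.copy_(rbm.b_h)
        layers += [linear, nn.Sigmoid()]
    return nn.Sequential(*layers)

torch.manual_seed(0)
toy_data = torch.bernoulli(torch.full((64, 20), 0.3))  # hypothetical binary data
rbms = greedy_pretrain(toy_data, [16, 8])
pretrained = to_linear_layers(rbms)
```

After this, `pretrained` would be trained end to end with backpropagation (fine-tuning).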

These weights are then used for Fine-tuning.

Fine-tuning means starting from the weights already initialized by the RBMs, computing y and the loss, and training with algorithms such as backpropagation.

(2) Xavier Initialization / He Initialization

RBM-based initialization is very complex. Over time, simpler initialization algorithms with good performance were developed.

First, Xavier Initialization, proposed in 2010, initializes the weights from a Normal or Uniform distribution. The spread of that distribution is set by the number of nodes in the layer.

[Xavier Normal Initialization]

$W \sim N(0, Var(W)), \quad Var(W) = \frac{2}{n_{in} + n_{out}}$

Here $n_{in}$ is the number of input nodes of the layer and $n_{out}$ is the number of output nodes.

[Xavier Uniform Initialization]

$W \sim U(-\sqrt{\frac{6}{n_{in} + n_{out}}}, +\sqrt{\frac{6}{n_{in} + n_{out}}})$

He Initialization likewise draws from a normal or uniform distribution, with slightly different formulas.

[He Normal Initialization]

$W \sim N(0, Var(W)), \quad Var(W) = \frac{2}{n_{in}}$

[He Uniform Initialization]

$W \sim U(-\sqrt{\frac{6}{n_{in}}}, +\sqrt{\frac{6}{n_{in}}})$

These are simply the Xavier formulas with the output-node term $n_{out}$ removed.
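The formulas can be checked against PyTorch's built-in initializers. In torch.nn.init, He initialization is called kaiming_normal_/kaiming_uniform_. With mode='fan_in' and nonlinearity='relu', its gain is sqrt(2), which reproduces the He formulas above. The layer sizes below are arbitrary.

```python
import math
import torch
import torch.nn as nn

n_in, n_out = 256, 512
w = torch.empty(n_out, n_in)  # same layout as nn.Linear(n_in, n_out).weight

# Standard deviations / bounds from the formulas above
xavier_std = math.sqrt(2.0 / (n_in + n_out))
xavier_bound = math.sqrt(6.0 / (n_in + n_out))
he_std = math.sqrt(2.0 / n_in)
he_bound = math.sqrt(6.0 / n_in)

torch.manual_seed(0)
nn.init.xavier_normal_(w)
xavier_normal_std = w.std().item()

nn.init.xavier_uniform_(w)
xavier_uniform_max = w.abs().max().item()

nn.init.kaiming_normal_(w, mode='fan_in', nonlinearity='relu')
he_normal_std = w.std().item()

nn.init.kaiming_uniform_(w, mode='fan_in', nonlinearity='relu')
he_uniform_max = w.abs().max().item()

print(xavier_std, xavier_normal_std)  # theory vs. empirical
print(he_std, he_normal_std)
```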

3) Xavier Initialization Implementation

Let's practice Xavier initialization.

The Xavier Initialization function can be applied as simply as this:

torch.nn.init.xavier_uniform_(layer.weight)
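A common way to initialize every Linear layer of a model at once is Module.apply with a small function. Here is a sketch. init_weights is a hypothetical helper name, and zeroing the biases is an extra choice that the practice code below does not make.

```python
import torch
import torch.nn as nn

def init_weights(m):
    # Called on every submodule; only Linear layers are re-initialized.
    if isinstance(m, nn.Linear):
        nn.init.xavier_uniform_(m.weight)
        nn.init.zeros_(m.bias)

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))
model.apply(init_weights)  # recursively applies init_weights to each submodule
```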

For explanations of the other parts of the practice code, see the previous post.

import torch
import torchvision.datasets as dsets
import torchvision.transforms as transforms
import random

# parameters
learning_rate = 0.001
training_epochs = 15
batch_size = 100

# MNIST dataset
mnist_train = dsets.MNIST(root='MNIST_data/',
                          train=True,
                          transform=transforms.ToTensor(),
                          download=True)
mnist_test = dsets.MNIST(root='MNIST_data/',
                        train=False,
                        transform=transforms.ToTensor(),
                        download=True)

# Dataset Loader
data_loader = torch.utils.data.DataLoader(dataset=mnist_train,
                                          batch_size=batch_size,
                                          shuffle=True,
                                          drop_last=True)

# nn layers
linear1 = torch.nn.Linear(784, 256, bias=True)
linear2 = torch.nn.Linear(256, 256, bias=True)
linear3 = torch.nn.Linear(256, 10, bias=True)
relu = torch.nn.ReLU()

# xavier initialization
torch.nn.init.xavier_uniform_(linear1.weight)
torch.nn.init.xavier_uniform_(linear2.weight)
torch.nn.init.xavier_uniform_(linear3.weight)

# model
model = torch.nn.Sequential(linear1, relu, linear2, relu, linear3)

# define cost & optimizer
criterion = torch.nn.CrossEntropyLoss() # softmax is internally computed.
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

total_batch = len(data_loader)
for epoch in range(training_epochs):
    avg_cost = 0

    for X, Y in data_loader:
        # reshape input image into [batch_size by 784]
        # label is not one-hot encoded
        X = X.view(-1, 28 * 28)

        optimizer.zero_grad()
        hypothesis = model(X)
        cost = criterion(hypothesis, Y)
        cost.backward()
        optimizer.step()

        avg_cost += cost / total_batch

    print('Epoch: ', '%04d' % (epoch + 1), 'cost: ', '{:.9f}'.format(avg_cost))

print('Learning finished')

# Test the model using test sets
with torch.no_grad():
    # .data/.targets replace the deprecated .test_data/.test_labels;
    # divide by 255 to match the [0, 1] scaling that ToTensor() applied during training
    X_test = mnist_test.data.view(-1, 28 * 28).float() / 255.
    Y_test = mnist_test.targets

    prediction = model(X_test)
    correct_prediction = torch.argmax(prediction, 1) == Y_test
    accuracy = correct_prediction.float().mean()
    print('Accuracy: ', accuracy.item())

    # Get one and predict
    r = random.randint(0, len(mnist_test) - 1)
    X_single_data = mnist_test.data[r:r + 1].view(-1, 28 * 28).float() / 255.
    Y_single_data = mnist_test.targets[r:r + 1]

    print('Label: ', Y_single_data.item())
    single_prediction = model(X_single_data)
    print('Prediction: ', torch.argmax(single_prediction, 1).item())

The test result is as follows.

The result shows that one simple change, switching the weight initialization method, can improve accuracy considerably.

Learning PyTorch - Multi Layer Perceptron

Mon, 11 Jul 2022 07:38:39 GMT

!youtube[https://www.youtube.com/watch?v=KofAX-K4dk4&list=PLQ28Nx3M4JrhkqBVIXg-i5_CVVoS1UzAv&index=12]

1. Perceptron

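The perceptron is the basic unit of an artificial neural network. Artificial neural networks were modeled on how the brain's "neurons" work. Each neuron works very simply: it receives input signals, and if the signal value exceeds a threshold, it fires and passes the signal on to the next neuron.

Let's look at the perceptron, one such artificial neural network.

A perceptron multiplies the input $x$ by a weight $w$ and adds a bias $b$. It then passes the result through an activation function to produce the output.

Early perceptrons were used as Linear Classifiers.

1) AND, OR Problem

The perceptron was developed in the 1950s. At that time, it was used to solve AND and OR problems.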

[AND Gate]

A	B	result
0	0	0
0	1	0
1	0	0
1	1	1

[OR Gate]

A	B	result
0	0	0
0	1	1
1	0	1
1	1	1

As the figure below shows, a perceptron can solve the AND and OR problems well.
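One way to see this in code: a perceptron with a step activation reproduces both truth tables. The weights are hand-picked (one of many valid choices) rather than learned.

```python
import torch

def perceptron(x, w, b):
    # weighted sum + bias, then a step activation (1 if > 0 else 0)
    return (x @ w + b > 0).float()

X = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])

and_out = perceptron(X, torch.tensor([1., 1.]), -1.5)  # fires only when both inputs are 1
or_out = perceptron(X, torch.tensor([1., 1.]), -0.5)   # fires when at least one input is 1
```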

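As these problems were solved with a Linear Classifier, artificial neural networks began to attract a great deal of attention.

2) XOR Problem

However, Marvin Minsky proved that the perceptron structure cannot solve the XOR problem. The conclusion was that a single Linear Classifier cannot solve XOR, as in the figure above.

The XOR problem is as follows.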

[XOR Gate]

A	B	result
0	0	0
0	1	1
1	0	1
1	1	0

As a result, the field of artificial neural networks fell into a dark age.
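The claim can be illustrated (not proved) by brute force. A grid search over weights and biases for a single step-activation perceptron finds that the best achievable XOR accuracy is 75%, meaning at most 3 of the 4 cases. The grid range is an arbitrary choice.

```python
import itertools
import torch

X = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
Y_xor = torch.tensor([0., 1., 1., 0.])

grid = [i * 0.25 for i in range(-8, 9)]  # candidate values in [-2, 2]
best_acc = 0.0
for w1, w2, b in itertools.product(grid, repeat=3):
    pred = (X @ torch.tensor([w1, w2]) + b > 0).float()
    acc = (pred == Y_xor).float().mean().item()
    best_acc = max(best_acc, acc)

print(best_acc)  # 0.75
```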

XOR Implementation

Let's try to solve the XOR problem with a single perceptron in code.

import torch

X = torch.FloatTensor([[0, 0], [0, 1], [1, 0], [1, 1]])
Y = torch.FloatTensor([[0], [1], [1], [0]])

# nn layers
linear = torch.nn.Linear(2, 1, bias=True) # FC Layer
sigmoid = torch.nn.Sigmoid() # activation function
model = torch.nn.Sequential(linear, sigmoid)

# define cost & optimizer
criterion = torch.nn.BCELoss() # binary cross entropy loss (0/1 classification)
optimizer = torch.optim.SGD(model.parameters(), lr=1)

for step in range(10001):
    optimizer.zero_grad()
    hypothesis = model(X)
    # cost function
    cost = criterion(hypothesis, Y)
    cost.backward()
    optimizer.step()
    if step % 1000 == 0:
        print(step, cost.item())

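The result is as follows.

The loss stops decreasing after about 200 steps. In other words, training does not progress properly.

Running a test gives the following result.

After training, the perceptron model predicts 0.5 for every input x. Thresholding at 0.5 gives the predictions [0, 0, 0, 0], so the accuracy never rises above 50%.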

2. Multi Layer Perceptron (MLP)

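So how can we solve the XOR problem?

A perceptron is only a linear classifier. XOR classification requires a non-linear classifier with more expressive power.

1) Multi Layer Perceptron

Look at the following picture.

In the XOR figure above, the classes could not be separated with a single line. If we draw one more line (a model using two perceptrons), the problem becomes solvable. At the time, however, there was no way to train multiple perceptrons, and so the field of artificial neural networks fell into a dark age.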
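Here is a sketch of the two-line idea with hand-picked (not learned) weights. One hidden perceptron computes OR and the other computes NAND. The output perceptron then ANDs the two, which yields XOR.

```python
import torch

def step(z):
    return (z > 0).float()

X = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])

# Hidden layer: two perceptrons (two "lines")
W1 = torch.tensor([[1., -1.],   # column 0: OR, column 1: NAND
                   [1., -1.]])
b1 = torch.tensor([-0.5, 1.5])
h = step(X @ W1 + b1)

# Output perceptron: AND of the two hidden units
w2 = torch.tensor([1., 1.])
b2 = -1.5
xor_out = step(h @ w2 + b2)
```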

2) Backpropagation

The Backpropagation algorithm propagates the loss (error) from the output end back to the input end and updates the weights along the way. It is still used today, and it is what made it possible to train MLPs.

Backpropagation Implementation

Let's practice the Backpropagation algorithm.

First, import the library and create the data and labels.

import torch

X = torch.FloatTensor([[0, 0], [0, 1], [1, 0], [1, 1]])
Y = torch.FloatTensor([[0], [1], [1], [0]])

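Next, create the parameters of the two layers. To look at backpropagation in detail, the weights and biases are declared separately. The sigmoid function and its derivative are also implemented as functions.

# random initialization (torch.Tensor(2, 2) would contain uninitialized memory)
w1 = torch.randn(2, 2)
b1 = torch.randn(2)
w2 = torch.randn(2, 1)
b2 = torch.randn(1)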

def sigmoid(x):
    # sigmoid function
    return 1.0 / (1.0 + torch.exp(-x))

def sigmoid_prime(x):
    # derivative of the sigmoid function
    return sigmoid(x) * (1 - sigmoid(x))

In an actual implementation, nn.Linear expresses this simply, as follows.

# nn layers
linear1 = torch.nn.Linear(2, 2, bias=True)
linear2 = torch.nn.Linear(2, 1, bias=True)
sigmoid = torch.nn.Sigmoid()
model = torch.nn.Sequential(linear1, sigmoid, linear2, sigmoid)

Because sigmoid is the activation function, it is applied once after each linear layer. linear1 takes 2 inputs and outputs 2 values, and linear2 takes 2 inputs and outputs a single value.

In other words, this is a Multi Layer Perceptron with 2 layers in total. Adding layers creates a deeper MLP, and increasing the number of input/output nodes creates a wider MLP.

Next, declare the cost function and the optimizer.

# define cost & optimizer
criterion = torch.nn.BCELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=1)

Training proceeds as follows.

First, look at code that spells out backpropagation in detail.

learning_rate = 1

for epoch in range(10001):
    # forward propagation (computing the cost)
    l1 = torch.add(torch.matmul(X, w1), b1)
    a1 = sigmoid(l1)
    l2 = torch.add(torch.matmul(a1, w2), b2)
    Y_pred = sigmoid(l2)

    cost = -torch.mean(Y * torch.log(Y_pred) + (1 - Y) * torch.log(1 - Y_pred)) # binary cross entropy loss
    # back propagation (chain rule)
    # Loss derivative
    d_Y_pred = (Y_pred - Y) / (Y_pred * (1.0 - Y_pred) + 1e-7) # add 1e-7 to avoid division by zero
    # Layer 2
    d_l2 = d_Y_pred * sigmoid_prime(l2)
    d_b2 = d_l2
    d_w2 = torch.matmul(torch.transpose(a1, 0, 1), d_b2)

    # Layer 1
    d_a1 = torch.matmul(d_b2, torch.transpose(w2, 0, 1))
    d_l1 = d_a1 * sigmoid_prime(l1)
    d_b1 = d_l1
    d_w1 = torch.matmul(torch.transpose(X, 0, 1), d_b1)

    # Weight update
    w1 = w1 - learning_rate * d_w1
    b1 = b1 - learning_rate * torch.mean(d_b1, 0)
    w2 = w2 - learning_rate * d_w2
    b2 = b2 - learning_rate * torch.mean(d_b2, 0)

    if epoch % 100 == 0:
        print(epoch, cost.item())

In the backpropagation code, the error travels from the output side toward the input side through the derivative function (sigmoid_prime). The weights are then updated by gradient descent.
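The sketch below checks that this chain rule matches what autograd computes, using the same 2-2-1 sigmoid network. It relies on a simplification: for a sigmoid output with mean BCE, dL/dl2 = (Y_pred - Y) / N. It also sums every gradient over the batch consistently, whereas the practice code above sums the weight gradients and averages the bias gradients. float64 keeps the comparison tight.

```python
import torch

torch.manual_seed(0)
X = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]], dtype=torch.float64)
Y = torch.tensor([[0.], [1.], [1.], [0.]], dtype=torch.float64)

w1 = torch.randn(2, 2, dtype=torch.float64, requires_grad=True)
b1 = torch.randn(2, dtype=torch.float64, requires_grad=True)
w2 = torch.randn(2, 1, dtype=torch.float64, requires_grad=True)
b2 = torch.randn(1, dtype=torch.float64, requires_grad=True)

# forward pass
l1 = X @ w1 + b1
a1 = torch.sigmoid(l1)
l2 = a1 @ w2 + b2
Y_pred = torch.sigmoid(l2)
cost = -torch.mean(Y * torch.log(Y_pred) + (1 - Y) * torch.log(1 - Y_pred))
cost.backward()  # autograd gradients land in .grad

# manual chain rule, from the output back to the input
with torch.no_grad():
    N = X.shape[0]
    d_l2 = (Y_pred - Y) / N       # sigmoid + mean BCE simplifies to this
    d_w2 = a1.t() @ d_l2
    d_b2 = d_l2.sum(0)
    d_a1 = d_l2 @ w2.t()
    d_l1 = d_a1 * a1 * (1 - a1)   # sigmoid'(l1) = a1 * (1 - a1)
    d_w1 = X.t() @ d_l1
    d_b1 = d_l1.sum(0)
```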

In PyTorch, the backward() and step() functions implement this long procedure in a few lines.

for epoch in range(10001):
    optimizer.zero_grad()
    hypothesis = model(X)

    # cost function
    cost = criterion(hypothesis, Y)
    cost.backward()
    optimizer.step()

    if epoch % 1000 == 0:
        print('Epoch: {:4d} \\ Cost: {:.4f}'.format(
            epoch, cost.item()
        ))

Test the trained model.

# Accuracy computation
with torch.no_grad():
    hypothesis = model(X)
    predicted = (hypothesis > 0.5).float()
    accuracy = (predicted == Y).float().mean()
    print('Hypothesis: \n', hypothesis.detach().cpu().numpy(),
          '\nCorrect: \n', predicted.detach().cpu().numpy(), '\nAccuracy: ', accuracy.item())

The final result is as follows.

Learning PyTorch - Deep Learning Basics (MNIST)

Mon, 11 Jul 2022 05:59:23 GMT

!youtube[https://www.youtube.com/watch?v=JcNkszxJuak&list=PLQ28Nx3M4JrhkqBVIXg-i5_CVVoS1UzAv&index=10]

1. Basic Approach to Train Deep Neural Network

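Let's look at the basic procedure for building a deep learning model.

1. Design the Neural Network Architecture.
- Design a deep learning model suited to the purpose.
2. After training, check whether the model is overfitting (training loss decreases while validation loss increases).
- If it is not overfitting, increase the model size (deeper or wider).
- If it is overfitting, add methods such as dropout or batch normalization to prevent overfitting.
3. Repeat step '2.'.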

Implementation

Let's walk through building a deep learning model in practice.

1) Import

First, import the required libraries.

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

# For reproducibility
torch.manual_seed(1)

2) Training and Test Dataset

Next, define the training set and the test set.

# training set
x_train = torch.FloatTensor([[1, 2, 1],
                             [1, 3, 2],
                             [1, 3, 4],
                             [1, 5, 5],
                             [1, 7, 5],
                             [1, 2, 5],
                             [1, 6, 6],
                             [1, 7, 7]
                            ]) # (8, 3)
y_train = torch.LongTensor([2, 2, 2, 1, 1, 1, 0, 0]) # (8, )

# test set
x_test = torch.FloatTensor([[2, 1, 1], [3, 1, 2], [3, 3, 4]]) # (3, 3)
y_test = torch.LongTensor([2, 2, 2]) # (3, )

There are 3 classes to classify, 8 training samples, and 3 test samples.

3) Model

Next, let's create a softmax classification model. The details are explained in the previous post.

class SoftmaxClassifierModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(3, 3)
    def forward(self, x):
        return self.linear(x)

model = SoftmaxClassifierModel()

# set up the optimizer
optimizer = optim.SGD(model.parameters(), lr=0.1)

4) Training

Then train as follows.

def train(model, optimizer, x_train, y_train):
    nb_epochs = 20
    for epoch in range(nb_epochs):
        # compute H(x)
        prediction = model(x_train)
        # compute the cost
        cost = F.cross_entropy(prediction, y_train)
        # improve H(x) using the cost
        optimizer.zero_grad()
        cost.backward()
        optimizer.step()

        print('Epoch {:4d} / {} Cost: {:.6f}'.format(
            epoch, nb_epochs, cost.item()
        ))

5) Test (Validation)

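After training, we test on the test dataset to judge whether the model is overfitting.

This section also uses the term validation. Strictly speaking, though, validation is an evaluation performed during training to tune the model's hyperparameters. Testing happens after training has fully finished and evaluates how well the model generalizes.

In other words, the validation set is often a held-out part of the training data, while the test set consists of unseen data (data the model has never seen).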
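One way to carve a validation set out of the training data is torch.utils.data.random_split. The sketch below uses hypothetical random data and an assumed 80/20 split.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Hypothetical data: 1000 samples with 3 features and integer labels
features = torch.randn(1000, 3)
labels = torch.randint(0, 3, (1000,))
full_train = TensorDataset(features, labels)

# Hold out 20% of the training data for validation (fixed seed for reproducibility)
generator = torch.Generator().manual_seed(42)
train_set, val_set = random_split(full_train, [800, 200], generator=generator)
```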

def test(model, optimizer, x_test, y_test):
    prediction = model(x_test)
    predicted_classes = prediction.max(1)[1]
    correct_count = (predicted_classes == y_test).sum().item()
    cost = F.cross_entropy(prediction, y_test)

    print('Accuracy: {}% Cost: {:.6f}'.format(
        correct_count / len(y_test) * 100, cost.item()
    ))

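The model receives the test inputs. We compare its predicted classes (the argmax of the outputs) with the true classes to get the accuracy, and we compute the cost with cross_entropy as before.

6) Run

Simply call the functions defined above to run training and testing.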

[train]

train(model, optimizer, x_train, y_train)

[test]

test(model, optimizer, x_test, y_test)

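The training and test results are as follows.

The cost on the training data keeps decreasing, but the cost on the test data is very high at 1.4. This indicates that overfitting has occurred.

7) Ways to Improve Generalization Performance

To mitigate overfitting, you can apply regularization, tune various hyperparameters, or clean up the data.

The 'learning rate', which we have so far used without much thought, is a typical example of a hyperparameter.

If it is too large, the cost diverges instead of decreasing. If it is too small, the cost decreases too slowly.

In code, adjust the lr value when defining the 'optimizer'.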
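Here is a toy illustration of the effect of lr. The example is assumed: it minimizes f(w) = w^2 starting from w = 1, so each SGD step multiplies w by 1 - 2 * lr.

```python
import torch

def run_sgd(lr, steps=20):
    # Minimize f(w) = w^2 from w = 1; each step computes w <- (1 - 2 * lr) * w
    w = torch.tensor([1.0], requires_grad=True)
    optimizer = torch.optim.SGD([w], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        (w ** 2).sum().backward()
        optimizer.step()
    return abs(w.item())

too_large = run_sgd(lr=1.5)    # factor -2 per step: diverges
good = run_sgd(lr=0.1)         # factor 0.8 per step: converges
too_small = run_sgd(lr=0.001)  # factor 0.998 per step: barely moves
```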

A typical example of Data Preprocessing, which cleans up the data in advance, is 'Standardization' (often loosely called normalization). It uses the following formula.

$x'_{j} = \frac{x_j - \mu_j}{\sigma_j}$

Here $\sigma_j$ is the standard deviation and $\mu_j$ the mean of feature $j$ in the data.

It can be implemented in code as follows.

mu = x_train.mean(dim=0)
sigma = x_train.std(dim=0)
normalized_x_train = (x_train - mu) / sigma
print(normalized_x_train)

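After standardization, each column has mean 0 and standard deviation 1. The values are rescaled, but they are not forced into a normal distribution. Note that the first column of this x_train is always 1, so its standard deviation is 0 and that column becomes NaN (0/0).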
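The sketch below standardizes without producing that NaN. It also follows the usual practice of reusing the training-set statistics for the test set. Replacing a zero standard deviation with 1, so that a constant column is only centered, is one possible choice among several.

```python
import torch

x_train = torch.FloatTensor([[1, 2, 1], [1, 3, 2], [1, 3, 4], [1, 5, 5],
                             [1, 7, 5], [1, 2, 5], [1, 6, 6], [1, 7, 7]])
x_test = torch.FloatTensor([[2, 1, 1], [3, 1, 2], [3, 3, 4]])

# Statistics come from the training set only, then are reused for the test set.
mu = x_train.mean(dim=0)
sigma = x_train.std(dim=0)
sigma = torch.where(sigma > 0, sigma, torch.ones_like(sigma))  # constant column: only center it

normalized_x_train = (x_train - mu) / sigma
normalized_x_test = (x_test - mu) / sigma
```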

2. MNIST

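Let's build a simple deep learning model for the MNIST dataset, the 'Hello World!' of deep learning.

First, it's worth getting to know a package called 'torchvision'.

torchvision is a library for PyTorch that includes popular datasets, models, transforms, and more.

1) Reading Data

Loading a dataset included in torchvision works as follows.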

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision.datasets as dsets
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import random

mnist_train = dsets.MNIST(root="MNIST_data/", train=True, transform=transforms.ToTensor(), download=True)
mnist_test = dsets.MNIST(root="MNIST_data/", train=False, transform=transforms.ToTensor(), download=True)

# parameters
training_epochs = 15
batch_size = 100

data_loader = torch.utils.data.DataLoader(dataset=mnist_train, batch_size=batch_size, shuffle=True, drop_last=True)

The arguments of the MNIST function mean the following.

root : the path where the MNIST data is located.
train : whether to load the training data (True) or the test data (False).
transform : which transform to apply when loading the MNIST images.
- PyTorch generally expects images with values between 0 and 1 in Channel-Height-Width order. Raw images, however, have values from 0 to 255 in Height-Width-Channel order.
- 'transforms.ToTensor()' converts the original image into the form PyTorch accepts.
download : if True, the MNIST data is downloaded when it does not already exist under the root path.

The DataLoader arguments mean the following.

dataset : which data to load.
batch_size : how many training images to fetch at a time.
shuffle : whether to shuffle the order (usually True).
drop_last : whether to drop the data left over after splitting into batch_size chunks (dropped if True).

Inside each epoch, the code iterates over data_loader and uses the 'view' function to reshape each 28 × 28 image into a vector of length 784. This gives a tensor of size (batch_size, 784).

Terminology : Epoch/Batch size/Iteration

Let's summarize terms frequently used with neural networks.

epoch : one forward pass and one backward pass over all training samples is called 1 epoch.
batch size : to reduce training time and use limited memory efficiently, the training samples are split into chunks of 'batch_size' samples, and the forward/backward passes run chunk by chunk. For example, if there are 60000 images in total and we train on 100 at a time, the batch size is 100.
- The larger the batch size, the more memory is needed.
iterations : how many batches have been used for training. In the example above, 1 epoch takes 600 iterations.

$\text{training samples per epoch} = \text{batch size} \times \text{iterations}$ (e.g. $60000 = 100 \times 600$)
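The numbers above can be checked with len(DataLoader), which returns the number of iterations per epoch. The sketch uses zero tensors as stand-ins for the images. The 250-sample case also shows what drop_last changes.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in for MNIST's 60000 training images (only the count matters here)
dataset = TensorDataset(torch.zeros(60000, 1))
loader = DataLoader(dataset, batch_size=100, shuffle=True, drop_last=True)
iterations_per_epoch = len(loader)  # 60000 / 100 = 600

# drop_last only matters when the last batch is incomplete: 250 samples, batch size 100
small = TensorDataset(torch.zeros(250, 1))
n_keep = len(DataLoader(small, batch_size=100, drop_last=False))  # 100, 100, 50 -> 3
n_drop = len(DataLoader(small, batch_size=100, drop_last=True))   # 100, 100 -> 2
```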

2) Training with Softmax Classifier

# MNIST data image of shape 28 * 28 = 784
linear = torch.nn.Linear(784, 10, bias=True)

# define cost/loss & optimizer
criterion = torch.nn.CrossEntropyLoss() # softmax is internally computed
optimizer = torch.optim.SGD(linear.parameters(), lr=0.1)

for epoch in range(training_epochs):
    avg_cost = 0
    total_batch = len(data_loader)

    for X, Y in data_loader:
        # reshape input image into [batch_size by 784]
        # label is not one-hot encoded
        X = X.view(-1, 28 * 28)
        # when using a GPU:
        # X, Y = X.to(device), Y.to(device)

        optimizer.zero_grad()
        hypothesis = linear(X)
        cost = criterion(hypothesis, Y)
        cost.backward()
        optimizer.step()
        avg_cost += cost / total_batch

    print("Epoch: ", "%04d" % (epoch+1), "cost =", "{:.9f}".format(avg_cost))
print('Learning Finished')

The result is shown below.

3) Test & Visualization

Now let's use the test dataset to see whether the model predicts well.

# Test the model using test sets
with torch.no_grad():
    # .data/.targets replace the deprecated .test_data/.test_labels;
    # divide by 255 to match the [0, 1] scaling that ToTensor() applied during training
    X_test = mnist_test.data.view(-1, 28 * 28).float() / 255.
    Y_test = mnist_test.targets

    prediction = linear(X_test)
    correct_prediction = torch.argmax(prediction, 1) == Y_test
    accuracy = correct_prediction.float().mean()
    print('Accuracy: ', accuracy.item())

    # Get one and predict
    r = random.randint(0, len(mnist_test) - 1)
    X_single_data = mnist_test.data[r:r + 1].view(-1, 28 * 28).float() / 255.
    Y_single_data = mnist_test.targets[r:r + 1]

    print('Label: ', Y_single_data.item())
    single_prediction = linear(X_single_data)
    print('Prediction: ', torch.argmax(single_prediction, 1).item())

    plt.imshow(mnist_test.data[r:r + 1].view(28, 28), cmap='Greys', interpolation='nearest')
    plt.show()

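Here, torch.no_grad() tells PyTorch not to compute gradients (no computation graph is recorded). Gradients are not needed at test time. Wrapping the test code in this with block saves memory and computation, and it prevents mistakes such as accidentally building a graph.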
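A minimal check of what torch.no_grad() changes (the tensors are arbitrary):

```python
import torch

w = torch.randn(3, requires_grad=True)
x = torch.randn(3)

y_tracked = (w * x).sum()      # tracked: a graph is built so backward() can run
with torch.no_grad():
    y_untracked = (w * x).sum()  # not tracked: no graph, less memory
```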

The result is as follows.

The model reaches about 88% accuracy on the test data.

Learning PyTorch - Logistic Regression & Softmax Classification

Mon, 11 Jul 2022 04:58:15 GMT

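1. Logistic Regression Overview

The Linear Regression covered earlier examined how well a linear relationship describes the link between numeric explanatory variables and a continuous dependent variable.

Logistic Regression, by contrast, is often used for classification problems. In that setting, the dependent variable $Y$ represents a category (class) rather than a continuous number.

1) Binary Classification

First, consider the case where the dependent variable takes one of two values, so the data is divided into two classes. This is called a binary classification problem.

The following sigmoid function is widely used as the hypothesis. Since sigmoid outputs a value between 0 and 1, it can be interpreted much like a probability.

$H(X) = \frac{1}{1 + e^{-W^T X}}$

Binary cross-entropy, shown below, is widely used as the cost.

$\text{cost}(W) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log{H(x^{(i)})} + (1-y^{(i)}) \log{(1 - H(x^{(i)}))} \right]$

The parameters are then updated with Gradient Descent, as before.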

$W := W - \alpha \frac{\partial}{\partial W} \text{cost}(W) = W - \alpha \nabla_w \text{cost}(W)$
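For sigmoid with binary cross-entropy, the gradient has the closed form $\nabla_W \text{cost}(W) = \frac{1}{m} X^T (H(X) - y)$. The sketch below checks this against autograd on the training data used in this post and then takes one update step. Two simplifying assumptions apply: the bias is omitted, and W is drawn small at random.

```python
import torch

torch.manual_seed(0)
X = torch.tensor([[1., 2.], [2., 3.], [3., 1.], [4., 3.], [5., 3.], [6., 2.]])
y = torch.tensor([[0.], [0.], [0.], [1.], [1.], [1.]])
W = (0.1 * torch.randn(2, 1)).requires_grad_()

H = torch.sigmoid(X @ W)
cost = -(y * torch.log(H) + (1 - y) * torch.log(1 - H)).mean()
cost.backward()

# Closed-form gradient for sigmoid + binary cross-entropy
analytic_grad = X.t() @ (H.detach() - y) / X.shape[0]

# One gradient-descent step: W := W - alpha * grad
alpha = 0.1
with torch.no_grad():
    W_next = W - alpha * W.grad
```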

Implementation

First, run the imports as follows.

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

Setting a seed with the following code makes the same code produce the same results later.

# For reproducibility (set a seed so later runs give the same results)
torch.manual_seed(1)

Training Data

Assume the following x_train and y_train data.

x_data = [[1, 2], [2, 3], [3, 1], [4, 3], [5, 3], [6, 2]] # 6 by 2
y_data = [[0], [0], [0], [1], [1], [1]] # 6 by 1

x_train = torch.FloatTensor(x_data)
y_train = torch.FloatTensor(y_data)

Computing the Hypothesis

PyTorch's 'torch.exp()' function makes it easy to implement the exponential function.

$H(X) = \frac{1}{1 + e^{-W^T X}}$

W = torch.zeros((2, 1), requires_grad=True)
b = torch.zeros(1, requires_grad=True)

hypothesis = 1 / (1 + torch.exp(-(x_train.matmul(W) + b)))

Alternatively, you can simply use the 'torch.sigmoid()' function.

hypothesis = torch.sigmoid(x_train.matmul(W) + b)

Computing the Cost Function

To implement the cost formula, write the following.

$\text{cost}(W) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log{H(x^{(i)})} + (1-y^{(i)}) \log{(1 - H(x^{(i)}))} \right]$

losses = -(y_train * torch.log(hypothesis) + 
            (1 - y_train) * torch.log(1 - hypothesis))
cost = losses.mean()

Alternatively, you can use PyTorch's 'F.binary_cross_entropy()' function.

F.binary_cross_entropy(hypothesis, y_train)
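A quick check that the manual formula and F.binary_cross_entropy agree. The predicted probabilities below are made up.

```python
import torch
import torch.nn.functional as F

y_train = torch.FloatTensor([[0], [0], [0], [1], [1], [1]])
hypothesis = torch.FloatTensor([[0.1], [0.3], [0.4], [0.6], [0.8], [0.9]])  # made-up predictions

losses = -(y_train * torch.log(hypothesis) + (1 - y_train) * torch.log(1 - hypothesis))
manual_cost = losses.mean()
library_cost = F.binary_cross_entropy(hypothesis, y_train)
```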

Whole Training Procedure

Let's look at the whole training procedure and its result.

# initialize the model
W = torch.zeros((2, 1), requires_grad=True)
b = torch.zeros(1, requires_grad=True)

# set up the optimizer
optimizer = optim.SGD([W, b], lr=1) # use SGD

nb_epochs = 1000
for epoch in range(nb_epochs + 1):
    # compute the cost
    hypothesis = torch.sigmoid(x_train.matmul(W) + b)
    cost = F.binary_cross_entropy(hypothesis, y_train)

    # improve H(x) using the cost
    optimizer.zero_grad()
    cost.backward()
    optimizer.step()

    # print a log every 100 epochs
    if epoch % 100 == 0:
        print('Epoch {:4d} / {} Cost: {:.6f}'.format(
            epoch, nb_epochs, cost.item()
        ))

Evaluation

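After training, we should check how well the model works on a test set. Good generalization is the ultimate goal of an AI model.

# no separate x_test is defined in this example, so evaluate on the training data
hypothesis = torch.sigmoid(x_train.matmul(W) + b)
prediction = hypothesis >= torch.FloatTensor([0.5])
correct_prediction = prediction.float() == y_train

Since hypothesis is a real number between 0 and 1, we assign 1 (true) when it is 0.5 or greater and 0 (false) otherwise. This turns it into a binary prediction.

Here, prediction has the BoolTensor data type (torch.bool).

Next, print correct_prediction to see whether the predictions are correct. The final result is as follows.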

Higher Implementation with Class

As with Linear Regression earlier, a class makes the code more efficient.

class BinaryClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(2, 1) # takes 2 inputs
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        return self.sigmoid(self.linear(x))

model = BinaryClassifier()

# set up the optimizer
optimizer = optim.SGD(model.parameters(), lr=1)

nb_epochs = 100
for epoch in range(nb_epochs + 1):
    # compute H(x)
    hypothesis = model(x_train)
    cost = F.binary_cross_entropy(hypothesis, y_train)

    # improve H(x) using the cost
    optimizer.zero_grad()
    cost.backward()
    optimizer.step()

    # print a log every 20 epochs
    if epoch % 20 == 0:
        prediction = hypothesis >= torch.FloatTensor([0.5])
        correct_prediction = prediction.float() == y_train
        accuracy = correct_prediction.sum().item() / len(correct_prediction)
        print('Epoch {:4d} / {} Cost: {:.6f} Accuracy: {:2.2f}%'.format(
            epoch, nb_epochs, cost.item(), accuracy * 100,
        ))

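The result is shown below.

2. Softmax Classification

So far we have looked at the case where the dependent variable has two categories.

In real life, however, cases with three or more classes are far more common.

In that case, we use the Softmax function.

1) Softmax Function

Given scores $z_1, \dots, z_K$, the Softmax Function uses the following formula to compute the probability of each class among several (especially three or more) classes.

$P(\text{class} = i) = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}$

Since the softmax values sum to 1, they can be interpreted as probabilities.

For example, suppose we have the following tensor.

x = torch.FloatTensor([[1, 2], [3, 4]])

PyTorch's 'softmax' function (F.softmax) makes it easy to compute the softmax.

However, pay attention to the dimension. The dim argument sets the axis along which normalization is done. In the example above, with dim=0 each column sums to 1, and with dim=1 each row sums to 1.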
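Here is a sketch of the dim argument on the example tensor. Each row of the dim=1 result is the same because the two rows differ only by a constant shift, and softmax is unchanged when a constant is added to every score.

```python
import torch
import torch.nn.functional as F

x = torch.FloatTensor([[1, 2], [3, 4]])

col_softmax = F.softmax(x, dim=0)  # normalize down each column
row_softmax = F.softmax(x, dim=1)  # normalize across each row

print(col_softmax.sum(dim=0))  # each column sums to 1
print(row_softmax.sum(dim=1))  # each row sums to 1
```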

2) Cross Entropy Loss (Low-level)

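Now let's compute the cross entropy for a multi-class classification problem.

$L = \frac{1}{N} \sum_{n=1}^{N} \sum_{k} -y_{n,k} \log{\hat{y}_{n,k}}$

Here $\hat{y}$ is the predicted probability, and $y$ is the true label expressed as a (one-hot) probability.

Look at the following example code.

z = torch.rand(3, 5, requires_grad=True) # uniformly random in [0, 1)
hypothesis = F.softmax(z, dim=1)
print(hypothesis)

# generate random true labels
y = torch.randint(5, (3,)).long() # low defaults to 0, high is 5 (exclusive), output tensor size is (3,)
print(y)

We create a 3 × 5 tensor and apply softmax along dim=1. This means there are 3 samples and 5 classes.

Accordingly, we generate the true labels y for the 3 samples. Each label is a class index from 0 to 4.

For example, suppose y = tensor([4, 2, 3]) was generated. It can't be used as is, so each label is converted into a one-hot vector: 1 at that index and 0 everywhere else.

y_one_hot = torch.zeros_like(hypothesis)
y_one_hot.scatter_(1, y.unsqueeze(1), 1) # in-place operation
# along dim=1, reshape y from (3,) to (3, 1), then scatter 1s at those indices

cost = (y_one_hot * -torch.log(hypothesis)).sum(dim=1).mean() # cross-entropy
print(cost)

Then compute the cost with cross-entropy.

The full result is as follows.

However, this complicated procedure can easily be replaced by the 'F.cross_entropy' function.

In fact, 'F.cross_entropy' combines two functions. 'F.log_softmax()' applies the log directly to the softmax output, and 'F.nll_loss()' computes the Negative Log Likelihood Loss. The logic is the same as above, but F.cross_entropy takes the raw scores (before softmax) as input.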
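The three ways of computing the loss can be compared directly. The sketch uses random scores with a fixed seed for reproducibility.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
z = torch.rand(3, 5)        # raw scores: 3 samples, 5 classes
y = torch.randint(5, (3,))  # true class index per sample

# 1) manual: softmax -> one-hot -> -log
hypothesis = F.softmax(z, dim=1)
y_one_hot = torch.zeros_like(hypothesis).scatter_(1, y.unsqueeze(1), 1)
manual = (y_one_hot * -torch.log(hypothesis)).sum(dim=1).mean()

# 2) log_softmax followed by nll_loss
two_step = F.nll_loss(F.log_softmax(z, dim=1), y)

# 3) cross_entropy does both at once and takes the raw scores z
one_step = F.cross_entropy(z, y)
```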

Implementation

1) Training with F.cross_entropy

The full training procedure is as follows.

x_train = [[1, 2, 1, 1],
           [2, 1, 3, 2],
           [3, 1, 3, 4],
           [4, 1, 5, 5],
           [1, 7, 5, 5],
           [1, 2, 5, 6],
           [1, 6, 6, 6],
           [1, 7, 7, 7]] # 8 samples, 4 features (input dim)
y_train = [2, 2, 2, 1, 1, 1, 0, 0] # 3 classes
x_train = torch.FloatTensor(x_train)
y_train = torch.LongTensor(y_train) # discrete class indices

# initialize the model
W = torch.zeros((4, 3), requires_grad=True)
b = torch.zeros(3, requires_grad=True) # one bias per class
# set up the optimizer
optimizer = optim.SGD([W, b], lr=0.1)

nb_epochs = 1000
for epoch in range(nb_epochs + 1):
    # compute the cost (manual implementation)
    # hypothesis = F.softmax(x_train.matmul(W) + b, dim=1)
    # y_one_hot = torch.zeros_like(hypothesis)
    # y_one_hot.scatter_(1, y_train.unsqueeze(1), 1)
    # cost = (y_one_hot * -torch.log(hypothesis)).sum(dim=1).mean()

    # compute the cost (with the cross_entropy function)
    z = x_train.matmul(W) + b
    cost = F.cross_entropy(z, y_train)

    # improve H(x) using the cost
    optimizer.zero_grad()
    cost.backward()
    optimizer.step()

    # print a log every 100 epochs
    if epoch % 100 == 0:
        print('Epoch {:4d} / {} Cost: {:.6f}'.format(
            epoch, nb_epochs, cost.item()
        ))

2) High-level Implementation with nn.Module

Now let's look at a high-level implementation using a class.

Everything here has been covered already, so the code should be easy to follow.

class SoftmaxClassifierModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 3) # num of output classes = 3

    def forward(self, x):
        return self.linear(x)

model = SoftmaxClassifierModel()

# set up the optimizer
optimizer = optim.SGD(model.parameters(), lr=0.1)

nb_epochs = 1000
for epoch in range(nb_epochs + 1):
    # compute the cost (earlier versions, kept for comparison)
    # hypothesis = F.softmax(x_train.matmul(W) + b, dim=1)
    # y_one_hot = torch.zeros_like(hypothesis)
    # y_one_hot.scatter_(1, y_train.unsqueeze(1), 1)
    # cost = (y_one_hot * -torch.log(hypothesis)).sum(dim=1).mean()
    # z = x_train.matmul(W) + b
    # cost = F.cross_entropy(z, y_train)

    # compute H(x)
    prediction = model(x_train)

    # compute the cost
    cost = F.cross_entropy(prediction, y_train)

    # improve H(x) using the cost
    optimizer.zero_grad()
    cost.backward()
    optimizer.step()

    # print a log every 100 epochs
    if epoch % 100 == 0:
        print('Epoch {:4d} / {} Cost: {:.6f}'.format(
            epoch, nb_epochs, cost.item()
        ))

Learning PyTorch - Multivariable Linear Regression

Mon, 11 Jul 2022 04:37:36 GMT

1. Multivariable Linear Regression

The Simple Linear Regression covered so far was a basic form of linear regression that makes one prediction from one piece of information. In practice, however, we are often given several pieces of information, so x_train has multiple variables.

1) Data Definition

x_train now takes the form of a 2D matrix.

For example:

x_train = torch.FloatTensor([[73, 80, 75],
                             [93, 88, 93],
                             [89, 91, 90],
                             [96, 98, 100],
                             [73, 66, 70]])
y_train = torch.FloatTensor([[152], [185], [180], [196], [142]])

In this form, x_train is a 5 × 3 tensor and y_train is a 5 × 1 tensor.

2) Hypothesis Function

In this example, the hypothesis function can be written as follows.

$H(x) = w_1 x_1 + w_2 x_2 + w_3 x_3 + b$

Since there are 3 explanatory variables, x_train has 3 columns, and each explanatory variable can be written as $x_1, x_2, x_3$ as above.

However, when there are very many explanatory variables, both writing and computing this becomes very cumbersome.

So we use matrices instead:

$H(\boldsymbol{x}) = \boldsymbol{W} \boldsymbol{x} + \boldsymbol{b}$

In this example, $\boldsymbol{W}$ is 1 × 3, $\boldsymbol{x}$ is 3 × 5, and $\boldsymbol{b}$ is 1 × 5.

Theory usually uses column vectors, which gives the form above. In code, however, note that the weight usually has the shape ('input dim', 'output dim').

So in code we compute 'x_train × W', and all the shapes are transposed (rows and columns swapped).

In code, the matmul function expresses this simply:

hypothesis = x_train.matmul(W) + b

Using matmul is more concise, and the code doesn't need to change when the number of features changes. The computation is also faster.
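The sketch below confirms the shapes. It also shows that matmul computes the same thing as the expanded $w_1 x_1 + w_2 x_2 + w_3 x_3 + b$. The weights are made up, not learned.

```python
import torch

x_train = torch.FloatTensor([[73, 80, 75], [93, 88, 93], [89, 91, 90],
                             [96, 98, 100], [73, 66, 70]])  # (5, 3)
W = torch.FloatTensor([[0.5], [0.3], [0.2]])                # (3, 1), made-up weights
b = torch.FloatTensor([1.0])

hypothesis = x_train.matmul(W) + b  # (5, 3) @ (3, 1) -> (5, 1); b broadcasts

# Same result written out feature by feature
x1, x2, x3 = x_train[:, 0:1], x_train[:, 1:2], x_train[:, 2:3]
explicit = W[0] * x1 + W[1] * x2 + W[2] * x3 + b
```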

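The Cost Function and Gradient Descent steps that follow are the same as in Simple Linear Regression.

In the end, only the training data and the model definition change, which shows how extensible PyTorch is.

3) Full Code with torch.optim

The full code and result are as follows.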

import torch
from torch import optim

# data
x_train = torch.FloatTensor([[73, 80, 75],
                             [93, 88, 93],
                             [89, 91, 90],
                             [96, 98, 100],
                             [73, 66, 70]])
y_train = torch.FloatTensor([[152], [185], [180], [196], [142]])

# initialize the model
W = torch.zeros((3, 1), requires_grad=True)
b = torch.zeros(1, requires_grad=True)

# set up the optimizer
optimizer = optim.SGD([W, b], lr=1e-5)

nb_epochs = 20
for epoch in range(nb_epochs + 1):
    # compute H(x)
    hypothesis = x_train.matmul(W) + b

    # compute the cost
    cost = torch.mean((hypothesis - y_train) ** 2)

    # improve H(x) using the cost
    optimizer.zero_grad()
    cost.backward()
    optimizer.step()

    print('Epoch {:4d}/{} hypothesis: {} Cost: {:.6f}'.format(
        epoch, nb_epochs, hypothesis.squeeze().detach(),
        cost.item()
    ))

The cost gradually decreases, and H(x) approaches the y_train values.

4) nn.Module

You can initialize the model manually as above. In complex tasks, however, declaring every weight and bias by hand can be tedious, so PyTorch provides nn.Module.

Compare the following two pieces of code.

<Manual model initialization>

# initialize the model
W = torch.zeros((3, 1), requires_grad=True)
b = torch.zeros(1, requires_grad=True)

# compute H(x)
hypothesis = x_train.matmul(W) + b

<Using nn.Module>

import torch.nn as nn

class MultivariateLinearRegressionModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(3, 1) # input dim: 3, output dim: 1

    def forward(self, x): # compute the hypothesis
        return self.linear(x)

model = MultivariateLinearRegressionModel()
hypothesis = model(x_train)

# PyTorch computes the gradients automatically via backward()!

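To use modules, you need a basic understanding of Python classes. See the following post for details.

Briefly, a class bundles related data and functions together so they can be reused easily.

In class terms, `model = MultivariateLinearRegressionModel()` creates an instance of the MultivariateLinearRegressionModel class and assigns it to model; calling `model(x_train)` then runs that instance's forward method and returns the hypothesis.

The function named __init__ is the class constructor. A constructor is a method (function) that is called automatically when an object is created.

'self' refers to the object being created; in the code above, the object gets an instance attribute named linear.

Classes can also inherit from other classes. The class definition above reads 'the MultivariateLinearRegressionModel class inherits from the nn.Module class'.

This means the functionality of the nn.Module class can be used in the MultivariateLinearRegressionModel class.

super() is used to call methods of the parent class: it returns a proxy object that forwards method calls to the parent class (here, nn.Module).

So 'super().__init__()' runs nn.Module's constructor, which sets up the internal state (such as the registries of parameters and submodules) that nn.Module's attributes and methods depend on. If the parent constructor needs arguments, pass them to __init__().

At first glance, using a module may look longer and more complicated, but defining a class with nn.Module makes building neural network models much more convenient.

You only need to give nn.Linear the input and output dimensions, and specify in the forward function how the hypothesis is computed.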
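One detail worth knowing: `nn.Linear(3, 1)` stores its weight with shape (out_features, in_features) = (1, 3) and computes `x @ weight.T + bias`, so its layout is the transpose of the (input dim, output dim) W used in the manual version. The sketch below copies the layer's randomly initialized parameters into a manual W and b and compares the outputs; the two sample rows are taken from the x_train data above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)               # make the random initial parameters reproducible

linear = nn.Linear(3, 1)           # input dim 3, output dim 1
print(linear.weight.shape)         # torch.Size([1, 3]) -> (out_features, in_features)
print(linear.bias.shape)           # torch.Size([1])

x = torch.FloatTensor([[73, 80, 75],
                       [93, 88, 93]])

# Manual equivalent: transpose the weight into (input dim, output dim) layout
W = linear.weight.detach().t()     # shape (3, 1)
b = linear.bias.detach()

print(torch.allclose(linear(x), x.matmul(W) + b, atol=1e-4))
```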

5) torch.nn.functional

PyTorch also provides various cost functions through the 'torch.nn.functional' library.

For example, the MSE loss function can be used as simply as this:

import torch.nn.functional as F

# Compute the cost (prediction is the model output, e.g. model(x_train))
cost = F.mse_loss(prediction, y_train)
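A quick check that `F.mse_loss` (with its default `reduction='mean'`) computes the same value as the hand-written MSE from the earlier code. The prediction and target values are made up for illustration.

```python
import torch
import torch.nn.functional as F

prediction = torch.FloatTensor([[150.], [180.], [185.]])   # example model outputs (made up)
y_train = torch.FloatTensor([[152.], [185.], [180.]])     # example targets (made up)

cost_f = F.mse_loss(prediction, y_train)                   # mean of squared errors
cost_manual = torch.mean((prediction - y_train) ** 2)      # (4 + 25 + 25) / 3

print(cost_f.item(), cost_manual.item())                   # 18.0 18.0
```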

6) Full Code

The final code is as follows.

import torch
from torch import optim
import torch.nn as nn
import torch.nn.functional as F

class MultivariateLinearRegressionModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(3, 1) # input dim: 3, output dim: 1

    def forward(self, x): # compute the hypothesis
        return self.linear(x)

# Data
x_train = torch.FloatTensor([[73, 80, 75],
                             [93, 88, 93],
                             [89, 91, 90],
                             [96, 98, 100],
                             [73, 66, 70]])
y_train = torch.FloatTensor([[152], [185], [180], [196], [142]])

# Initialize the model
model = MultivariateLinearRegressionModel()

# Set up the optimizer
optimizer = optim.SGD(model.parameters(), lr=1e-5) # pass the model's parameters!

nb_epochs = 20
for epoch in range(nb_epochs + 1):
    # Compute H(x)
    hypothesis = model(x_train)

    # Compute the cost
    cost = F.mse_loss(hypothesis, y_train)

    # Improve H(x) using the cost
    optimizer.zero_grad()
    cost.backward()
    optimizer.step()

    print('Epoch {:4d}/{} hypothesis: {} Cost: {:.6f}'.format(
        epoch, nb_epochs, hypothesis.squeeze().detach(), cost.item()
    ))

2. Loading Data

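So far we have trained on small amounts of data (data small enough to type in quickly).

Deep learning, however, uses huge amounts of data. (Most datasets provide at least hundreds of thousands of samples.)

Let's look at how PyTorch handles such data.

1) Minibatch Gradient Descent

Having lots of data is a good thing, because it lets the model make robust, well-rounded predictions.

But training on such a huge amount of data at once is nearly impossible: it can be far too slow, or the data may not fit in the hardware's memory.

So complex machine learning models are trained by splitting the data into parts, using a method called Minibatch Gradient Descent.

The whole dataset is split into small, equally sized chunks called minibatches, and training proceeds one minibatch at a time.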
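To make the idea concrete, here is a rough sketch of splitting the 5 samples from the earlier example into minibatches of size 2 by hand: shuffle the indices with `torch.randperm`, then cut them into chunks. The batch size of 2 is just an example value. The last minibatch is smaller because 5 is not divisible by 2.

```python
import torch

x_train = torch.FloatTensor([[73, 80, 75],
                             [93, 88, 93],
                             [89, 91, 90],
                             [96, 98, 100],
                             [73, 66, 70]])
y_train = torch.FloatTensor([[152], [185], [180], [196], [142]])

batch_size = 2
perm = torch.randperm(len(x_train))          # shuffled sample indices

batches = []
for start in range(0, len(x_train), batch_size):
    idx = perm[start:start + batch_size]     # indices belonging to this minibatch
    batches.append((x_train[idx], y_train[idx]))

for xb, yb in batches:
    print(xb.shape, yb.shape)                # (2, 3) (2, 1), (2, 3) (2, 1), (1, 3) (1, 1)
```

In practice you rarely write this loop yourself; the Dataset and DataLoader modules handle the same bookkeeping.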

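With this method, each cost computation does not use all of the data, so the amount of computation per update drops and updates become faster.

However, because not all of the data is used, an update can sometimes move in the wrong direction. As a result, as shown in the figure above, the updates are noisier than in Batch GD.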

2) 'Dataset' module

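Code that processes data samples can become hard to maintain. For better readability and modularity, it is ideal to separate the dataset code from the model training code.

For this, PyTorch provides two data-handling modules: torch.utils.data.Dataset and torch.utils.data.DataLoader.

They let you use not only pre-built datasets but also your own data.

In short, Dataset stores the samples and their labels, and DataLoader wraps a Dataset in an iterable so the samples are easy to access.

Let's start with 'torch.utils.data.Dataset'.

When creating a custom dataset, you must implement the following two magic methods.

__len__() : returns the total number of samples in the dataset
__getitem__() : given an index idx, returns the corresponding input and output data

The code is written as follows.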

import torch
from torch.utils.data import Dataset

class CustomDataset(Dataset):
    def __init__(self):
        self.x_data = [[73, 80, 75],
                       [93, 88, 93],
                       [89, 91, 90],
                       [96, 98, 100],
                       [73, 66, 70]]
        self.y_data = [[152], [185], [180], [196], [142]]

    def __len__(self):
        return len(self.x_data)

    def __getitem__(self, idx):
        x = torch.FloatTensor(self.x_data[idx])
        y = torch.FloatTensor(self.y_data[idx])
        return x, y

dataset = CustomDataset()
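A short usage sketch of the class above (repeated here so the snippet runs on its own): `len()` calls `__len__`, and indexing with `[]` calls `__getitem__`.

```python
import torch
from torch.utils.data import Dataset

class CustomDataset(Dataset):
    def __init__(self):
        self.x_data = [[73, 80, 75],
                       [93, 88, 93],
                       [89, 91, 90],
                       [96, 98, 100],
                       [73, 66, 70]]
        self.y_data = [[152], [185], [180], [196], [142]]

    def __len__(self):
        return len(self.x_data)

    def __getitem__(self, idx):
        x = torch.FloatTensor(self.x_data[idx])
        y = torch.FloatTensor(self.y_data[idx])
        return x, y

dataset = CustomDataset()

print(len(dataset))   # 5, via __len__()
x, y = dataset[1]     # via __getitem__(1)
print(x, y)           # tensor([93., 88., 93.]) tensor([185.])
```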

3) 'DataLoader' module

We use the 'torch.utils.data.DataLoader' module.

To create a DataLoader instance, you specify the following two arguments.

dataset : the dataset created above.
batch_size : the size of each minibatch (usually set to a power of 2).

A commonly used option is 'shuffle'. With shuffle=True, the dataset is reshuffled every epoch, which changes the order in which the data is seen during training.

A DataLoader wraps the Dataset in an iterable object, so you can easily access each minibatch and its index.
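A minimal sketch of what iterating a DataLoader yields. To keep it short, it uses PyTorch's built-in `TensorDataset` in place of the CustomDataset above; for this data it returns the same (x, y) pairs.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

x_train = torch.FloatTensor([[73, 80, 75],
                             [93, 88, 93],
                             [89, 91, 90],
                             [96, 98, 100],
                             [73, 66, 70]])
y_train = torch.FloatTensor([[152], [185], [180], [196], [142]])

dataset = TensorDataset(x_train, y_train)   # stands in for CustomDataset here
dataloader = DataLoader(dataset, batch_size=2, shuffle=True)

print(len(dataloader))   # 3 minibatches per epoch (sizes 2, 2, 1)

for batch_idx, (xb, yb) in enumerate(dataloader):
    print(batch_idx, xb.shape, yb.shape)
```

With 5 samples and batch_size=2, the last minibatch holds a single sample; passing `drop_last=True` to DataLoader would skip that incomplete batch instead.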

4) Full Code with Dataset and DataLoader

Let's look at the full code using Dataset and DataLoader.

import torch
from torch import optim
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset
from torch.utils.data import DataLoader

# Dataset
class CustomDataset(Dataset):
    def __init__(self):
        self.x_data = [[73, 80, 75],
                       [93, 88, 93],
                       [89, 91, 90],
                       [96, 98, 100],
                       [73, 66, 70]]
        self.y_data = [[152], [185], [180], [196], [142]]

    def __len__(self):
        return len(self.x_data)

    def __getitem__(self, idx):
        x = torch.FloatTensor(self.x_data[idx])
        y = torch.FloatTensor(self.y_data[idx])
        return x, y

dataset = CustomDataset()

# DataLoader
dataloader = DataLoader(
    dataset,
    batch_size=2,
    shuffle=True
)

class MultivariateLinearRegressionModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(3, 1) # input dim: 3, output dim: 1

    def forward(self, x): # compute the hypothesis
        return self.linear(x)

# Initialize the model
model = MultivariateLinearRegressionModel()

# Set up the optimizer
optimizer = optim.SGD(model.parameters(), lr=1e-5) # pass the model's parameters!

nb_epochs = 20
for epoch in range(nb_epochs + 1):
    for batch_idx, samples in enumerate(dataloader):
        x_train, y_train = samples

        # Compute H(x)
        prediction = model(x_train)

        # Compute the cost
        cost = F.mse_loss(prediction, y_train)

        # Improve H(x) using the cost
        optimizer.zero_grad()
        cost.backward()
        optimizer.step()

        print('Epoch {:4d}/{} Batch: {}/{} Cost: {:.6f}'.format(
            epoch, nb_epochs, batch_idx+1, len(dataloader), cost.item()
        ))

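enumerate(dataloader) yields the minibatch index and data from the iterable dataloader, and len(dataloader) gives the number of minibatches per epoch.

Getting Familiar with PyTorch - Linear Regression & Gradient Descent

Mon, 11 Jul 2022 04:12:24 GMT

I have used PyTorch briefly before, and I'm writing up these PyTorch notes so that I can use it fluently in my future research.

Most of the content is based on the YouTube series '모두를 위한 딥러닝 시즌2' (Deep Learning for Everyone, Season 2).

I assume basic deep learning knowledge and focus on the PyTorch exercises.

For the practice materials with brief explanations, see the following GitHub repository.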

1. Linear Regression Overview

Linear regression is a basic concept used heavily not only in AI but also in statistics.

In general, linear regression models a response that has a linear relationship with one or more explanatory variables.

With one explanatory variable it is called simple linear regression; with two or more, multivariate linear regression (also called multiple linear regression).

In this post, let's start with simple linear regression.

1) Data Definition

In linear regression, the data is split into two datasets.

Training Dataset : data used to train the model
Test Dataset : data used to check how well the model performs

With PyTorch, the data takes the form of a 'torch.Tensor'. For simple linear regression, the training inputs can be stored in x_train and the outputs in y_train.

x_train = torch.FloatTensor([[1], [2], [3]])
y_train = torch.FloatTensor([[2], [4], [6]])

2) Hypothesis

Linear regression is the process of finding the straight line that best fits the training data.

So the model can be written as follows.

$H(x) = Wx + b$

Written in PyTorch, the hypothesis is:

W = torch.zeros(1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)
hypothesis = x_train * W + b

Here the weight $W$ and bias $b$ are initialized to 0, and requires_grad=True states that these tensors need gradients for training, i.e. that they are the variables we are going to learn.

Passing 1 to torch.zeros gives each tensor the shape (1,), a single value.

3) Compute Loss

Using $W$ and $b$, the model outputs a prediction. How far the model's prediction is from the true answer is called the Loss (or Cost).

There are several ways to define the loss/cost; a common choice is the Mean Squared Error (MSE):

$\text{cost}(W, b) = \frac{1}{m} \sum^m_{i=1} (H(x^{(i)}) - y^{(i)} ) ^2$

In code:

cost = torch.mean((hypothesis - y_train) ** 2)
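As a worked example with the data above and W = b = 0: every prediction is 0, so the initial cost is $(2^2 + 4^2 + 6^2)/3 = 56/3 \approx 18.67$. The sketch below just recomputes that number.

```python
import torch

x_train = torch.FloatTensor([[1], [2], [3]])
y_train = torch.FloatTensor([[2], [4], [6]])

W = torch.zeros(1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)

hypothesis = x_train * W + b                      # all zeros at initialization, shape (3, 1)
cost = torch.mean((hypothesis - y_train) ** 2)    # (4 + 16 + 36) / 3
print(cost.item())
```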

4) Gradient Descent

We initialized the weight and bias to 0; we now repeatedly update them toward values that reduce the cost. This process is called Optimization, and one of the methods most widely used for optimization in machine learning is Gradient Descent.

Let's look at the code.

import torch.optim as optim

optimizer = optim.SGD([W, b], lr=0.01) # Stochastic Gradient Descent

optimizer.zero_grad() # reset the gradients to zero
cost.backward() # compute the gradients
optimizer.step() # update W and b using the computed gradients

The torch.optim library is used to specify the tensors to train ([W, b]) and the learning rate.

5) Full Training Code

The full training process is as follows.
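A minimal sketch that puts the pieces above together, using learning rate 0.01 and 1000 updates to match the numbers in the text. Details such as the logging interval are my own choice and may differ from the code that produced the reported result.

```python
import torch
import torch.optim as optim

# Training data
x_train = torch.FloatTensor([[1], [2], [3]])
y_train = torch.FloatTensor([[2], [4], [6]])

# Initialize the parameters to 0
W = torch.zeros(1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)

# SGD over [W, b] with learning rate 0.01
optimizer = optim.SGD([W, b], lr=0.01)

nb_epochs = 1000
for epoch in range(1, nb_epochs + 1):
    hypothesis = x_train * W + b                     # H(x) = Wx + b
    cost = torch.mean((hypothesis - y_train) ** 2)   # MSE

    optimizer.zero_grad()   # reset the accumulated gradients
    cost.backward()         # compute d(cost)/dW and d(cost)/db
    optimizer.step()        # W, b <- W, b - lr * gradient

    if epoch % 100 == 0:
        print('Epoch {:4d}/{} W: {:.4f}, b: {:.4f} Cost: {:.6f}'.format(
            epoch, nb_epochs, W.item(), b.item(), cost.item()))
```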

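Looking at the training data, the x_train values 1, 2, 3 map to the y_train values 2, 4, 6, so the correct answer should be W = 2 and b = 0.

With the number of weight/bias updates set to 1000, the result (shown in the screenshot above) was W = 1.9708 and b = 0.0664.

2. Gradient Descent

Let's take a closer look at Gradient Descent.

1) Cost Function and Gradient

As mentioned earlier, the difference between the model's predictions and the actual values is called the cost.

For simplicity, ignore the bias and consider only the weight $W$, so that $H(x) = Wx$.

If we use the MSE introduced above, the cost function is a quadratic function of W: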

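$\text{cost}(W) = \frac{1}{m} \sum^m_{i=1} ( H(x^{(i)} ) - y^{(i)} )^2$

Plotted as a curve with W on the horizontal axis and cost on the vertical axis, this quadratic looks like the figure above: a parabola that opens upward and has a single minimum.

Since lowering the cost is the goal of training, we compute the slope (gradient) at the current value and use it to update W.

The gradient is written as follows.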

$\frac{\partial \text{cost}}{\partial W} = \nabla W$
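Working out this derivative for the bias-free MSE cost (with $H(x^{(i)}) = W x^{(i)}$) gives the formula used in the gradient code of this section:

$\nabla W = \frac{\partial}{\partial W} \frac{1}{m} \sum^m_{i=1} ( W x^{(i)} - y^{(i)} )^2 = \frac{2}{m} \sum^m_{i=1} ( W x^{(i)} - y^{(i)} ) x^{(i)}$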

2) Gradient Descent

In Gradient Descent, W is updated as follows.

$W := W - \alpha \nabla W$

Here $\alpha$ is the learning rate. Since W should move in the direction that reduces the cost, we subtract the product of the gradient and the learning rate.

Gradient descent written in code:

gradient = 2 * torch.mean((W * x_train - y_train) * x_train) # d(cost)/dW for cost = mean((Wx - y)^2)
lr = 0.1
W -= lr * gradient
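A quick sanity check, assuming the x_train = [1, 2, 3], y_train = [2, 4, 6] data from above and an arbitrary starting value W = 1: the hand-written gradient formula should agree with what autograd computes through backward(). For W = 1 both come out to -28/3.

```python
import torch

x_train = torch.FloatTensor([[1], [2], [3]])
y_train = torch.FloatTensor([[2], [4], [6]])

W = torch.tensor(1.0, requires_grad=True)   # arbitrary starting value

# Manual gradient: (2/m) * sum((W x - y) x)
manual_grad = 2 * torch.mean((W.detach() * x_train - y_train) * x_train)

# Autograd gradient of the same cost
cost = torch.mean((W * x_train - y_train) ** 2)
cost.backward()

print(manual_grad.item(), W.grad.item())
```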

3) Full Code (with Gradient Descent)

Here we computed the gradient by hand, but the torch.optim library makes it easy to implement gradient descent. (See "5) Full Training Code" in Section 1.)
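For comparison, a minimal full loop that applies the manual update above without torch.optim. The bias is ignored as in this section, and lr = 0.1 and 10 epochs are example values. With this data, each step shrinks the distance to W = 2 by a factor of about 0.067, so W converges within a few epochs.

```python
import torch

x_train = torch.FloatTensor([[1], [2], [3]])
y_train = torch.FloatTensor([[2], [4], [6]])

W = torch.zeros(1)   # no requires_grad: the gradient is computed by hand
lr = 0.1

nb_epochs = 10
for epoch in range(nb_epochs + 1):
    hypothesis = x_train * W                                # H(x) = Wx (no bias)
    cost = torch.mean((hypothesis - y_train) ** 2)
    gradient = 2 * torch.mean((W * x_train - y_train) * x_train)

    print('Epoch {:4d}/{} W: {:.3f}, Cost: {:.6f}'.format(
        epoch, nb_epochs, W.item(), cost.item()))

    W -= lr * gradient                                      # gradient descent step
```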

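Getting Familiar with PyTorch - Tensor

Mon, 11 Jul 2022 03:37:21 GMT

I have used PyTorch briefly before, and I'm writing up these PyTorch notes so that I can use it fluently in my future research.

Most of the content is based on the YouTube series '모두를 위한 딥러닝 시즌 2' (Deep Learning for Everyone, Season 2).

!youtube[https://www.youtube.com/watch?v=St7EhvnFi6c&list=PLQ28Nx3M4JrhkqBVIXg-i5_CVVoS1UzAv&index=2]

For the practice materials with brief explanations, see GitHub.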

1. Vector, Matrix and Tensor

Deep learning uses a unit of data called a 'Tensor'. Conceptually, it is very similar to vectors and matrices.

When handling data, the size (shape) of a tensor matters a great deal, so make sure you understand the concept.

A 0-dimensional tensor is a value with no dimensions, i.e. a scalar.

A 1D tensor is a vector, and a 2D tensor can be viewed as a matrix.

Each additional dimension gives a 3D, 4D, 5D tensor, and so on.

1) 2D Tensor (Typical Simple Setting)

$|t| = (\text{batch size}, \text{dim})$

A typical 2D tensor has the dimensions 'batch size' and 'dimension'.

For data with a batch size of 64 and a dimension of 256, this is called a '64 by 256 2D Tensor'.

2) 3D Tensor (Typical Computer Vision)

Let's look at the 3D tensor commonly used in Computer Vision.

$|t| = (\text{batch size}, \text{width}, \text{height})$

Each image has dimensions (width, height), and many such images stacked together form a 3D tensor.

3) 3D Tensor (Typical Natural Language Processing)

This is also a 3D tensor, but in the typical form of the sequential data used in Natural Language Processing (NLP).

$|t| = (\text{batch size}, \text{length}, \text{dim})$

In CV, each slice of the tensor is one image; in NLP sequential data, each slice is one sentence.

Each sentence is a sequence of length tokens in time order, each token a dim-dimensional vector; stacking batch-size such sentences gives the 3D tensor above.
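A small sketch creating tensors with the three typical shapes above. The concrete sizes (batch 64, 256 dimensions, 28 × 28 images, sentences of 10 tokens) are example values.

```python
import torch

batch_size, dim = 64, 256
t2 = torch.zeros(batch_size, dim)                 # (batch size, dim)

width, height = 28, 28
t_cv = torch.zeros(batch_size, width, height)     # (batch size, width, height)

length = 10
t_nlp = torch.zeros(batch_size, length, dim)      # (batch size, length, dim)

for t in (t2, t_cv, t_nlp):
    print(t.dim(), t.shape)
```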

Implementation

1) Numpy Review, Declaring PyTorch Tensors

Before the PyTorch exercises, let's briefly review NumPy, which anyone who has handled data has probably used at least once.

PyTorch handles vectors, matrices and scalars in a similar way, so anyone who has used NumPy will quickly get comfortable with PyTorch tensors.

(1) 1D, 2D Array with Numpy

In NumPy, a 1D or 2D array is declared as follows.

import numpy as np

t = np.array([0., 1., 2., 3., 4., 5., 6.]) # (7, ) 1d vector

t = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [7., 8., 9.],
              [10., 11., 12.]]) # 4 by 3 matrix

You can also get the number of dimensions (rank) and the size (shape):

t.ndim
t.shape

Python indexing can also be used to access elements or take slices.

The practice results are as follows.
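A runnable sketch of the NumPy operations described above, with the outputs worked out by hand in the comments.

```python
import numpy as np

t = np.array([0., 1., 2., 3., 4., 5., 6.])
print(t.ndim, t.shape)        # 1 (7,)
print(t[0], t[1], t[-1])      # 0.0 1.0 6.0
print(t[2:5], t[4:-1])        # [2. 3. 4.] [4. 5.]
print(t[:2], t[3:])           # [0. 1.] [3. 4. 5. 6.]

m = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [7., 8., 9.],
              [10., 11., 12.]])
print(m.ndim, m.shape)        # 2 (4, 3)
```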

(2) PyTorch Tensor

Once you understand this, declaring tensors in PyTorch will quickly feel familiar.

1D and 2D tensors are declared in PyTorch as follows. (The dtype here is float; many other dtypes exist, such as int, long and bool.)

import torch

t = torch.FloatTensor([0., 1., 2., 3., 4., 5., 6.]) # 1d tensor

m = torch.FloatTensor([[1., 2., 3.],
                       [4., 5., 6.],
                       [7., 8., 9.],
                       [10., 11., 12.]]) # 2d tensor

Rank, shape, slicing and other ways of getting a tensor's size or indexing part of it are almost the same as in NumPy. The lines below use the 1D tensor t.

print(t.dim()) # rank
print(t.shape) # shape
print(t.size()) # shape
print(t[0], t[1], t[-1]) # element
print(t[2:5], t[4:-1]) # slicing
print(t[:2], t[3:]) # slicing

The execution results are as follows.
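The same operations as a self-contained sketch, plus 2D slicing. For a 2D tensor, `m[:, 1]` selects column 1 of every row, and `m[:, :-1]` drops the last column.

```python
import torch

t = torch.FloatTensor([0., 1., 2., 3., 4., 5., 6.])
print(t.dim(), t.shape)        # 1 torch.Size([7])
print(t[2:5], t[4:-1])         # tensor([2., 3., 4.]) tensor([4., 5.])

m = torch.FloatTensor([[1., 2., 3.],
                       [4., 5., 6.],
                       [7., 8., 9.],
                       [10., 11., 12.]])
print(m.dim(), m.size())       # 2 torch.Size([4, 3])
print(m[:, 1])                 # column 1 of every row: 2., 5., 8., 11.
print(m[:, :-1].size())        # torch.Size([4, 2])
```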

2) Broadcasting

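For matrix addition and subtraction, both matrices must have the same size (shape). For matrix multiplication, the last dimension of the first matrix must equal the first dimension of the second.

PyTorch, however, uses Broadcasting to automatically match the sizes of matrices with different shapes.

Because this happens automatically, no error is raised, but it can silently produce results you did not intend, so be careful.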
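A small sketch of broadcasting, including the kind of surprise mentioned above: adding a (1, 2) tensor and a (2, 1) tensor does not raise an error but silently produces a (2, 2) result.

```python
import torch

m1 = torch.FloatTensor([[1, 2]])       # shape (1, 2)
m2 = torch.FloatTensor([[3], [4]])     # shape (2, 1)

# Both are broadcast to (2, 2) before the element-wise addition:
# [[1, 2], [1, 2]] + [[3, 3], [4, 4]]
result = m1 + m2
print(result)                          # values [[4., 5.], [5., 6.]], shape (2, 2)

# A scalar is broadcast to every element
print(torch.FloatTensor([[1, 2]]) + 3) # values [[4., 5.]]
```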

3) Multiplication vs Matrix Multiplication

The ordinary 'mul' function multiplies element-wise, while 'matmul' performs matrix multiplication.

So, for the same two matrices, broadcasting may or may not happen depending on which function you use.

import torch

m1 = torch.FloatTensor([[1, 2], [3, 4]]) # 2 by 2
m2 = torch.FloatTensor([[1], [2]]) # 2 by 1

m1.mul(m2) # element-wise multiplication
m1.matmul(m2) # Matrix Multiplication
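The results of the two calls above, worked out by hand: `mul` broadcasts m2 to [[1, 1], [2, 2]] first, while `matmul` needs no broadcasting because (2, 2) @ (2, 1) is already a valid matrix product.

```python
import torch

m1 = torch.FloatTensor([[1, 2], [3, 4]])   # 2 by 2
m2 = torch.FloatTensor([[1], [2]])         # 2 by 1

elementwise = m1.mul(m2)    # [[1*1, 2*1], [3*2, 4*2]] = [[1., 2.], [6., 8.]], shape (2, 2)
product = m1.matmul(m2)     # [[1*1 + 2*2], [3*1 + 4*2]] = [[5.], [11.]], shape (2, 1)

print(elementwise)
print(product)
```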

4) Mean

By default, the mean function averages all elements of the tensor. Note that it cannot be used on integer dtypes.

t = torch.FloatTensor([[1, 2], [3, 4]])
t.mean() # mean of all elements
t.mean(dim=0) # reduces dim 0 (averages vertically, down each column)
t.mean(dim=1) # reduces dim 1 (averages horizontally, across each row)
t.mean(dim=-1) # reduces the last dimension
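A runnable version with the values spelled out, plus the integer-dtype pitfall: calling mean() on a LongTensor raises a RuntimeError, so cast to float first. (The exact error message varies between PyTorch versions.)

```python
import torch

t = torch.FloatTensor([[1, 2], [3, 4]])
print(t.mean())           # 2.5
print(t.mean(dim=0))      # mean of each column: [2., 3.]
print(t.mean(dim=1))      # mean of each row: [1.5, 3.5]

lt = torch.LongTensor([1, 2])
raised = False
try:
    lt.mean()             # mean() is not defined for integer dtypes
except RuntimeError as e:
    raised = True
    print('Error:', e)
print(lt.float().mean())  # cast to float first -> 1.5
```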

5) Sum

Likewise, sum adds all elements by default; with dim=0 it adds the rows together, and with dim=1 it adds the columns together.

t = torch.FloatTensor([[1, 2], [3, 4]])

t.sum()
t.sum(dim=0) # add the rows together
t.sum(dim=1) # add the columns together
t.sum(dim=-1) # sum over the last dimension
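The same calls with their values, worked out by hand for t = [[1, 2], [3, 4]].

```python
import torch

t = torch.FloatTensor([[1, 2], [3, 4]])

total = t.sum()           # 1 + 2 + 3 + 4 = 10.
col_sums = t.sum(dim=0)   # [1 + 3, 2 + 4] = [4., 6.]
row_sums = t.sum(dim=1)   # [1 + 2, 3 + 4] = [3., 7.]
last = t.sum(dim=-1)      # same as dim=1 for a 2D tensor

print(total, col_sums, row_sums, last)
```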

6) Max and Argmax

Called with no argument, max simply returns the largest element. Called with a dimension, it returns two values: the first is the maximum values, the second is the indices of those maxima (argmax).

t = torch.FloatTensor([[1, 2], [3, 4]])

t.max()

t.max(dim=0) # along dim 0 (down each column): max values and argmax
t.max(dim=1) # along dim 1 (across each row): max values and argmax
t.max(dim=-1) # along the last dimension: max values and argmax
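Unpacking the two return values, with the results worked out for t = [[1, 2], [3, 4]]. If you only need the indices, `argmax` returns them directly.

```python
import torch

t = torch.FloatTensor([[1, 2], [3, 4]])

print(t.max())                     # 4.
values, indices = t.max(dim=0)     # down each column
print(values, indices)             # values [3., 4.], indices [1, 1]
print(t.argmax(dim=1))             # index of the max in each row: [1, 1]
```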

7) View

PyTorch's view function plays the same role as NumPy's reshape.

That is, it changes the shape of a tensor. A dimension written as '-1' is inferred automatically, and the product of all dimensions must stay the same.

t = torch.FloatTensor([[[0, 1, 2],
                        [3, 4, 5]],

                       [[6, 7, 8],
                        [9, 10, 11]]]) # 2 by 2 by 3

t.view([-1, 3])

t.view([-1, 1, 3])

t.view([-1, 3]) turns the (2, 2, 3) 3D tensor into a 2D tensor of shape (inferred, 3), i.e. (4, 3).

Likewise, t.view([-1, 1, 3]) turns it into a 3D tensor of shape (inferred, 1, 3), i.e. (4, 1, 3).

In all three tensors, the product of the dimensions is the same: 12.
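A runnable check of the shapes above. The tensor is built with `torch.arange` instead of typing the values; it holds the same 0 to 11. Asking for a shape whose product is not 12 raises a RuntimeError.

```python
import torch

t = torch.arange(12.).view(2, 2, 3)   # same values 0..11 as above, shape (2, 2, 3)

a = t.view([-1, 3])                   # torch.Size([4, 3])
b = t.view([-1, 1, 3])                # torch.Size([4, 1, 3])
print(a.shape, b.shape)

raised = False
try:
    t.view([-1, 5])                   # 12 is not divisible by 5
except RuntimeError as e:
    raised = True
    print('Error:', e)
```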

8) Squeeze

'Squeeze' literally means 'to squeeze out'. It removes dimensions of size 1, lowering the tensor's rank.

If you specify a dimension, nothing changes when that dimension's size is not 1; if it is 1, it is removed.

ft = torch.FloatTensor([[0], [1], [2]]) # size: (3, 1)

ft.squeeze() # size: (3, )

ft.squeeze(dim=0) # size: (3, 1) -> no change
ft.squeeze(dim=1) # size: (3, ) -> same result as ft.squeeze()
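A runnable check of the shapes above, plus a case with several size-1 dimensions: without an argument, squeeze removes all of them at once.

```python
import torch

ft = torch.FloatTensor([[0], [1], [2]])       # (3, 1)
print(ft.squeeze().shape)                     # torch.Size([3])
print(ft.squeeze(dim=0).shape)                # torch.Size([3, 1]) -> dim 0 has size 3, unchanged
print(ft.squeeze(dim=1).shape)                # torch.Size([3])

z = torch.zeros(1, 3, 1)
print(z.squeeze().shape)                      # torch.Size([3]) -> both size-1 dims removed
```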

9) Unsqueeze

The opposite of squeeze: it inserts a new dimension of size 1 where you want it. You must pass the position of the new dimension as an argument.

ft = torch.Tensor([0, 1, 2]) # size: (3, )

ft.unsqueeze(0) # size: (1, 3)

ft.view(1, -1) # size: (1, 3) -> same result as above

ft.unsqueeze(1) # size: (3, 1)
ft.unsqueeze(-1) # size: (3, 1)
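A runnable check of the unsqueeze shapes, including that `unsqueeze(0)` and `view(1, -1)` give the same tensor here.

```python
import torch

ft = torch.Tensor([0, 1, 2])          # (3,)

a = ft.unsqueeze(0)                   # (1, 3)
b = ft.view(1, -1)                    # (1, 3)
c = ft.unsqueeze(1)                   # (3, 1)
d = ft.unsqueeze(-1)                  # (3, 1): -1 means "after the last dimension"

print(a.shape, b.shape, c.shape, d.shape)
print(torch.equal(a, b), torch.equal(c, d))   # True True
```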

10) Type Casting

You can explicitly change a tensor's datatype.
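A minimal sketch of type casting with `.float()`, `.long()` and `.to()`. The comparison `lt > 2` is used here simply as a convenient way to get a bool tensor.

```python
import torch

lt = torch.LongTensor([1, 2, 3, 4])
print(lt.dtype)                     # torch.int64
print(lt.float())                   # tensor([1., 2., 3., 4.])

bt = lt > 2                         # a comparison produces a bool tensor
print(bt.dtype)                     # torch.bool
print(bt.long())                    # tensor([0, 0, 1, 1])
print(bt.float())                   # tensor([0., 0., 1., 1.])

print(lt.to(torch.float32).dtype)   # torch.float32
```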

11) Concatenating, Stacking

Tensors can be joined with methods such as concatenation and stacking.

Let's look at concatenation first.

With dim=0, tensors are joined along the rows (top to bottom); with dim=1, along the columns (left to right).

x = torch.FloatTensor([[1, 2], [3, 4]])
y = torch.FloatTensor([[5, 6], [7, 8]])
print(torch.cat([x, y], dim=0)) # concatenate along rows -> shape (4, 2)
print(torch.cat([x, y], dim=1)) # concatenate along columns -> shape (2, 4)
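A runnable check of the two concatenations, with the resulting values written out.

```python
import torch

x = torch.FloatTensor([[1, 2], [3, 4]])
y = torch.FloatTensor([[5, 6], [7, 8]])

rows = torch.cat([x, y], dim=0)   # [[1, 2], [3, 4], [5, 6], [7, 8]], shape (4, 2)
cols = torch.cat([x, y], dim=1)   # [[1, 2, 5, 6], [3, 4, 7, 8]], shape (2, 4)

print(rows.shape, cols.shape)
```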

Next, let's look at stacking.

stack joins several tensors more conveniently than cat when you want to pile them up along a new dimension.

For example, to stack several 1D tensors into a 2D tensor with cat, you first have to expand each one with unsqueeze and then join them.

With stack, you simply pass them in a list and they are joined directly.

# For three 1D vectors of shape (2, )
x = torch.FloatTensor([1, 4])
y = torch.FloatTensor([2, 5])
z = torch.FloatTensor([3, 6])

# Stacking
print(torch.stack([x, y, z])) # stacks along dim=0 (as rows) by default
print(torch.stack([x, y, z], dim=1)) # stacks along dim=1 (as columns)

# Concatenating
print(torch.cat([x.unsqueeze(0), y.unsqueeze(0), z.unsqueeze(0) ], dim=0))
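A runnable check that the stack and cat versions above produce the same tensor, with the shapes of both stacking directions.

```python
import torch

x = torch.FloatTensor([1, 4])
y = torch.FloatTensor([2, 5])
z = torch.FloatTensor([3, 6])

s0 = torch.stack([x, y, z])          # [[1, 4], [2, 5], [3, 6]], shape (3, 2)
s1 = torch.stack([x, y, z], dim=1)   # [[1, 2, 3], [4, 5, 6]], shape (2, 3)
c0 = torch.cat([x.unsqueeze(0), y.unsqueeze(0), z.unsqueeze(0)], dim=0)

print(torch.equal(s0, c0))           # True
```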

12) Ones and Zeros

ones creates a tensor whose elements are all 1, and zeros creates one whose elements are all 0.

PyTorch can create ones or zeros with the same size as a given tensor x.

x = torch.FloatTensor([[0, 1, 2], [2, 1, 0]])

print(torch.ones_like(x)) # ones with the same shape as x
print(torch.zeros_like(x)) # zeros with the same shape as x
# the result also gets the same dtype and device (CPU, GPU, etc.) as x
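A short check that the `*_like` functions copy the shape, dtype and device of x. The GPU branch only runs if CUDA is available.

```python
import torch

x = torch.FloatTensor([[0, 1, 2], [2, 1, 0]])
o = torch.ones_like(x)
z = torch.zeros_like(x)
print(o.shape, o.dtype, o.device)          # torch.Size([2, 3]) torch.float32 cpu

# If a GPU is available, the result stays on the same device as the input
if torch.cuda.is_available():
    xg = x.to('cuda')
    print(torch.zeros_like(xg).device)     # cuda:0
```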

13) In-place Operation

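Normally, most PyTorch functions that operate on tensors, including the ones above, allocate new memory for the result and leave the original tensor unchanged.

However, adding '_' after the function name applies the operation to the tensor itself, modifying it in place. (That said, PyTorch's garbage collector is reportedly well designed, so the speed difference is said to be small.)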
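A minimal sketch contrasting `mul` (returns a new tensor) with `mul_` (modifies the tensor in place).

```python
import torch

x = torch.FloatTensor([[1, 2], [3, 4]])

y = x.mul(2.)       # new tensor; x is unchanged
print(x)            # still [[1., 2.], [3., 4.]]

x.mul_(2.)          # in-place: x itself is modified
print(x)            # now [[2., 4.], [6., 8.]]
```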

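14) Other functions to remember (more to be added)

(1) Tensor.item()

Returns the value of the tensor as a standard Python number.

It only works when the tensor is a scalar (holds exactly one element). (For several elements, use tolist() instead.)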
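A short sketch of `item()` versus `tolist()`. Calling `item()` on a tensor with more than one element raises an error; which exception type depends on the PyTorch version, so the sketch catches both RuntimeError and ValueError.

```python
import torch

s = torch.FloatTensor([3.5])
print(s.item(), type(s.item()))   # 3.5 <class 'float'>

v = torch.FloatTensor([1, 2, 3])
print(v.tolist())                 # [1.0, 2.0, 3.0]

raised = False
try:
    v.item()                      # more than one element -> error
except (RuntimeError, ValueError) as e:
    raised = True
    print('Error:', e)
```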

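Getting Familiar with PyTorch - Docker Setup

Sun, 10 Jul 2022 14:28:29 GMT

I have used PyTorch briefly before, and I'm writing up these PyTorch notes so that I can use it fluently in my future research.

Most of the content is based on the YouTube series '모두를 위한 딥러닝 시즌2' (Deep Learning for Everyone, Season 2).

First, let me write down an explanation of Docker.

Docker provides useful features that are also widely used in research, so it is worth understanding the concepts. (Installation instructions are easy to find online. I'll also collect useful commands in this blog's '버전관리' (Version Control) category.)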

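1. Useful Features Docker Provides

Over years of programming and software development, I have countless times reproduced useful code that someone else shared, exactly as written, only to get errors.

The most useful thing about Docker is that it saves the environment in which someone wrote and ran their code in a reusable form, so that when I use it, I can write and run the code in exactly the same virtual environment as the author.

That is, you no longer have to worry about environment variables, installation errors, drivers, registries, library versions and so on.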

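2. What Is Docker?

Docker is a container-based virtualization system.

That may not mean much at first, but it will make sense once we go through it piece by piece.

1) Virtualization

With virtualization, even if your computer has only one physical hard disk, you can partition it so it appears as a C drive and a D drive.

Going further, you can create a virtual computer inside your computer.

On macOS you can use Parallels to run a Windows environment; this is called a virtual machine created through virtualization.

Virtualization lets one server be split into several virtual servers (virtual machines) so that multiple people can use it.

However, virtualization that runs several independent operating systems on one computer adds a lot of overhead and can become very slow.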

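2) Container

So on the Linux side, people asked whether operating systems such as Ubuntu, CentOS and Red Hat could all be handled on top of a single Linux; Docker is the result of that effort.

With Docker, you don't need to run several independent operating systems. Instead, you install Docker on top of a single base Linux operating system, as shown below, and create multiple virtual environments on it.

A container is this lightweight form of virtualization: the operating system is shared as one, and only the other things needed on top of it (library versions and so on!) are packaged together.

'모두를 위한 딥러닝 시즌 2' (Deep Learning for Everyone, Season 2) provides a Docker image with everything needed for the exercises already set up.

With it, there is no need to set up your local environment by matching every library version one by one.

3. Notes on Using Docker on Windows or Mac

Docker can be used on Windows and Mac as well, but it was originally built on Linux. It therefore runs through a separate virtual machine or hypervisor, so performance may not match native Linux, and GPU access may be limited or unavailable.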