<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
        <title>HeyHo.log</title>
        <link>https://velog.io/</link>
        <description>Computer vision, AI</description>
        <lastBuildDate>Tue, 07 Mar 2023 06:20:19 GMT</lastBuildDate>
        <docs>https://validator.w3.org/feed/docs/rss2.html</docs>
        <generator>https://github.com/jpmonette/feed</generator>
        <image>
            <title>HeyHo.log</title>
            <url>https://velog.velcdn.com/images/coma_403/profile/776b7d6f-e409-44f2-89b8-77db116dc50a/image.png</url>
            <link>https://velog.io/</link>
        </image>
        <copyright>Copyright (C) 2019. HeyHo.log. All rights reserved.</copyright>
        <atom:link href="https://v2.velog.io/rss/coma_403" rel="self" type="application/rss+xml"/>
        <item>
            <title><![CDATA[Freezing the weights of specific layers]]></title>
            <link>https://velog.io/@coma_403/%ED%8A%B9%EC%A0%95-Layer%EC%9D%98-weight%EB%A5%BC-%EA%B3%A0%EC%A0%95%EC%8B%9C%ED%82%A4%EA%B8%B0</link>
            <guid>https://velog.io/@coma_403/%ED%8A%B9%EC%A0%95-Layer%EC%9D%98-weight%EB%A5%BC-%EA%B3%A0%EC%A0%95%EC%8B%9C%ED%82%A4%EA%B8%B0</guid>
            <pubDate>Tue, 07 Mar 2023 06:20:19 GMT</pubDate>
            <description><![CDATA[<pre><code class="language-python">import torch
import torch.nn as nn
import torch.optim as optim

# Load a pre-trained model
model = torch.hub.load(&#39;pytorch/vision&#39;, &#39;resnet18&#39;, pretrained=True)

# Freeze all of the pretrained weights
for param in model.parameters():
    param.requires_grad = False

# Then unfreeze only the last residual stage (layer4) for fine-tuning
for param in model.layer4.parameters():
    param.requires_grad = True

# Replace the classifier head (a newly created layer is trainable by default)
num_features = model.fc.in_features
model.fc = nn.Linear(num_features, 2)

# Define a loss function
criterion = nn.CrossEntropyLoss()

# Define an optimizer
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

# Train the model (num_epochs, inputs, and labels are assumed to be defined elsewhere)
for epoch in range(num_epochs):
    # Forward pass
    outputs = model(inputs)
    loss = criterion(outputs, labels)

    # Backward pass for unfrozen layers
    optimizer.zero_grad()
    loss.backward()

    # Update the weights for unfrozen layers
    optimizer.step()
</code></pre>
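<p>A small follow-up sketch (my addition, not part of the original snippet): since the frozen parameters have requires_grad=False, the optimizer can be given only the trainable parameters so it does not track the frozen ones at all.</p>
<pre><code class="language-python"># Continuing from the snippet above: hand the optimizer only the unfrozen parameters
trainable_params = [p for p in model.parameters() if p.requires_grad]
optimizer = optim.SGD(trainable_params, lr=0.001, momentum=0.9)</code></pre>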
]]></description>
        </item>
        <item>
            <title><![CDATA[TensoRF summary]]></title>
            <link>https://velog.io/@coma_403/tensorf-%EC%A0%95%EB%A6%AC</link>
            <guid>https://velog.io/@coma_403/tensorf-%EC%A0%95%EB%A6%AC</guid>
            <pubDate>Tue, 07 Feb 2023 06:46:15 GMT</pubDate>
            <description><![CDATA[<pre><code>
TensorVMSplit(
  (density_plane): ParameterList(
      (0): Parameter containing: [torch.float32 of size 1x16x100x100 (GPU 0)]
      (1): Parameter containing: [torch.float32 of size 1x16x100x100 (GPU 0)]
      (2): Parameter containing: [torch.float32 of size 1x16x100x100 (GPU 0)]
  )
  (density_line): ParameterList(
      (0): Parameter containing: [torch.float32 of size 1x16x100x1 (GPU 0)]
      (1): Parameter containing: [torch.float32 of size 1x16x100x1 (GPU 0)]
      (2): Parameter containing: [torch.float32 of size 1x16x100x1 (GPU 0)]
  )
  (app_plane): ParameterList(
      (0): Parameter containing: [torch.float32 of size 1x48x100x100 (GPU 0)]
      (1): Parameter containing: [torch.float32 of size 1x48x100x100 (GPU 0)]
      (2): Parameter containing: [torch.float32 of size 1x48x100x100 (GPU 0)]
  )
  (app_line): ParameterList(
      (0): Parameter containing: [torch.float32 of size 1x48x100x1 (GPU 0)]
      (1): Parameter containing: [torch.float32 of size 1x48x100x1 (GPU 0)]
      (2): Parameter containing: [torch.float32 of size 1x48x100x1 (GPU 0)]
  )
  (basis_mat): Linear(in_features=144, out_features=27, bias=False)
  (renderModule): MLPRender_Fea(
    (mlp): Sequential(
      (0): Linear(in_features=150, out_features=128, bias=True)
      (1): ReLU(inplace=True)
      (2): Linear(in_features=128, out_features=128, bias=True)
      (3): ReLU(inplace=True)
      (4): Linear(in_features=128, out_features=3, bias=True)
    )
  )
)</code></pre><p>For the first nn.Linear in renderModule, the input channel count is
2 $\times$ viewpe $\times$ 3 +
2 $\times$ feape $\times$ P(=27) +
3 + P(=27),
and the nn.Linear is built with that many input channels.</p>
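<p>A quick sanity check of that formula (my own sketch; viewpe=2 and feape=2 are assumed values that match the 150 input features in the printout above):</p>
<pre><code class="language-python"># Hypothetical sanity check of the first MLPRender_Fea layer width
viewpe, feape, P = 2, 2, 27  # assumed; P matches basis_mat out_features
in_features = 2 * viewpe * 3 + 2 * feape * P + 3 + P
print(in_features)  # 150, matching Linear(in_features=150, ...) above</code></pre>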
]]></description>
        </item>
        <item>
            <title><![CDATA[WGAN]]></title>
            <link>https://velog.io/@coma_403/WGAN</link>
            <guid>https://velog.io/@coma_403/WGAN</guid>
            <pubDate>Thu, 08 Dec 2022 06:01:54 GMT</pubDate>
            <description><![CDATA[<h1 id="--wgan">- WGAN</h1>
<ul>
<li>WGAN proposes using the Wasserstein distance to measure the distance between two distributions.</li>
</ul>
<h2 id="1-why-wasserstein-is-better-than-js-or-kl">1. Why Wasserstein is better than Js or KL?</h2>
<h3 id="1-suppose-that-we-have-two-probaility-distribution-p-and-q">(1) suppose that we have two probaility distribution P and Q</h3>
<p>다음과 같이 distribution이 겹치지 않는 두 개의 joint propability distribution이 있다고 가정해보자.
<img src="https://velog.velcdn.com/images/coma_403/post/e6fdde4b-4e0b-4e9e-95f8-b0032571acb1/image.png" width="40%" height="50%"></p>
<ul>
<li>$\forall(x, y) \in P, x=0$ and $y \sim U(0,1)$</li>
<li>$\forall(x, y) \in Q, x=\theta, 0 \leq \theta \leq 1$, and $y \sim U(0,1)$<h3 id="2-when-theta-neq-0">(2) when $\theta \neq 0$</h3>
Computing the KL and JS divergences under these conditions:</li>
<li>$D_{KL}(P \| Q)=\sum_{x=0, y \sim U(0,1)} 1 \cdot \log \frac{1}{0}=+\infty$</li>
<li>$D_{KL}(Q \| P)=\sum_{x=\theta, y \sim U(0,1)} 1 \cdot \log \frac{1}{0}=+\infty$</li>
<li>$D_{JS}(P, Q)=\frac{1}{2}\left(\sum_{x=0, y \sim U(0,1)} 1 \cdot \log \frac{1}{\frac{1}{2}}+\sum_{x=\theta, y \sim U(0,1)} 1 \cdot \log \frac{1}{\frac{1}{2}}\right)=\log 2$,
so the JS divergence stays at $\log 2$ regardless of $\theta$ and its gradient is 0.</li>
</ul>
<p>For the Wasserstein distance, however, the optimal way to bring the two distributions together is to move one along the line, so: </p>
<ul>
<li>$W(P,Q) = |\theta|$</li>
</ul>
<h3 id="3-when-theta--0">(3) when $\theta = 0$</h3>
<ul>
<li>$D_{KL}(P \| Q)=D_{KL}(Q \| P)=D_{JS}(P, Q)=0$</li>
<li>$W(P,Q) = 0 = |\theta|$</li>
</ul>
<p>Comparing cases (2) and (3), the Wasserstein distance is a much <span style="color:red">smoother</span> distance measure than the KL and JS divergences.</p>
<h2 id="2-kantorovich-rubinstein-duality">2. Kantorovich-Rubinstein duality</h2>
<p>So far we have seen that, when two distributions do not overlap, the Wasserstein distance provides far more meaningful information as a distance measure than the KL and JS divergences.
However, in $W\left(p_r, p_\theta\right)=\inf_{\gamma \sim \Pi\left(p_r, p_\theta\right)} \mathbb{E}_{(x, y) \sim \gamma}[\|x-y\|]$,
${\gamma \sim \Pi\left(p_r, p_\theta\right)}$ ranges over the set of all possible joint distributions,
so computing the Wasserstein distance by <span style="color:red">considering all of them</span> is practically <span style="color:red">impossible</span>.
To solve this primal problem, the <span style="color:red">Kantorovich-Rubinstein duality</span> appears, which converts it into a <span style="color:red">dual problem</span>.</p>
<h3 id="1-highly-intractable-term-in-inf">(1) Highly intractable term in inf</h3>
<p>$W\left(\mathbb{P}_r, \mathbb{P}_g\right)=\inf_{\gamma \in \Pi\left(\mathbb{P}_r, \mathbb{P}_g\right)} \mathbb{E}_{(x, y) \sim \gamma}[\|x-y\|]$ is highly intractable.
Therefore, via the <strong>Kantorovich-Rubinstein duality</strong>, the problem is solved through the following dual problem, using &#39;some&#39; function $f: X \rightarrow R$ that satisfies a Lipschitz continuity condition.</p>
<ul>
<li>$W\left(p_r, p_\theta\right)=\frac{1}{K} \sup_{\|f\|_{L} \leq K} \mathbb{E}_{x \sim p_r}[f(x)]-\mathbb{E}_{x \sim p_\theta}[f(x)]$</li>
</ul>
<h4 id="이렇게-wasserstein-distance를-duaility를-통해서-다르게-정의할-수-있다-그렇다면-f는-무엇인가">이렇게 Wasserstein distance를 duaility를 통해서 다르게 정의할 수 있다. 그렇다면 $f$는 무엇인가?</h4>
<h3 id="2-find-optimal-fx">(2) Find optimal $f(x)$</h3>
<h4 id="f는-말-그대로-mathbbex-sim-p_rfx-mathbbex-sim-p_thetafx의-값을-최대로-만족하는-어떠한-function-f이다">$f$는 말 그대로, $\mathbb{E}<em>{x \sim p_r}[f(x)]-\mathbb{E}</em>{x \sim p_\theta}[f(x)]$의 값을 최대로 만족하는 &#39;어떠한&#39; function $f$이다.</h4>
<p>이러한 <span style="color:red">$f$</span>를 잘 <span style="color:red"><strong>&#39;추정&#39;</strong></span> 하기 위해서 parameter $w$를 가지는 neural network를 사용하여 다음과 같은 수식을 만족시키는 $f_w$를 추정해준다. (neural network 는 universal function approximator 이기 때문에 neural net을 사용하여 $f$를 추정한다.)
$\max <em>{w \in W} \mathbb{E}</em>{x \sim p_r}\left[f_w(x)\right]-\mathbb{E}<em>{x \sim p_\theta}\left[f_w(x)\right] \leq \sup _{|f|</em>{L \leq K}} \mathbb{E}<em>{x \sim p_r}[f(x)]-\mathbb{E}</em>{x \sim p_\theta}[f(x)]=K \cdot W(P_r, P_\theta)$</p>
<p>Since we only need a good estimate of the $f(x)$ attaining the sup, we do not need to know $K$ in detail. </p>
<ul>
<li><p>To find the $f_w(x)$ achieving $\max_{w \in W} \mathbb{E}_{x \sim p_r}\left[f_w(x)\right]-\mathbb{E}_{x \sim p_\theta}\left[f_w(x)\right]$, we compute its gradient.</p>
</li>
<li><p>Using $\nabla_w[f_w(x)-f_w(g_\theta(z))]$, we update the parameters $w$ of $f_w(x)$, which estimates the solution $f(x)$ of the sup problem.</p>
</li>
</ul>
<h3 id="3-generator-update-process">(3) Generator update process</h3>
<h4 id="neural-network를-통해서-f_w를-구한-다음-generator의-parameter-theta를-update-시켜준다">Neural network를 통해서 $f_w$를 구한 다음, Generator의 parameter $\theta$를 update 시켜준다.</h4>
<p>$\begin{aligned} \nabla_\theta W\left(p_r, p_g\right) &amp;=\nabla_\theta\left(\mathbb{E}_{x \sim p_r}\left[f_w(x)\right]-\mathbb{E}_{z \sim Z}\left[f_w\left(g_\theta(z)\right)\right]\right) \\ &amp;=-\mathbb{E}_{z \sim Z}\left[\nabla_\theta f_w\left(g_\theta(z)\right)\right]\end{aligned}$</p>
<h3 id="4-weight-clipping">(4) Weight Clipping</h3>
<p>Recall the original sup problem: the <strong>Kantorovich-Rubinstein duality</strong> takes the primal problem to a dual problem, under the condition that $f$ is Lipschitz continuous. When we estimate $f_w(x)$ with a neural network, $f_w(x)$ must also satisfy Lipschitz continuity. Since the gradient of $f_w$ with respect to its input is determined by the network&#39;s weights, clipping the weights bounds that gradient, which by itself enforces Lipschitz continuity. Therefore, weight clipping is used to satisfy the Lipschitz continuity constraint.</p>
<h3 id="5-total-training-process">(5) Total training process</h3>
<p>Pseudo code of the overall process:</p>
<ol>
<li>First, convert the Earth Mover&#39;s distance optimization problem into its dual problem. Then, with $\theta$ fixed, find $f_w(x)$, the approximate solution of the dual problem, through training.</li>
<li>Backprop the Wasserstein distance to update the generator&#39;s parameters.<img src="https://velog.velcdn.com/images/coma_403/post/564e4612-3c78-40b6-a2cd-3a7924ded417/image.png" width="80%" height="50%">
</li>
</ol>
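<p>To make the pseudo code concrete, below is a minimal WGAN sketch with weight clipping (my own illustration, not the official implementation; the critic and generator architectures, the toy data, and all hyperparameters are assumptions):</p>
<pre><code class="language-python">import torch
import torch.nn as nn

z_dim, clip_value, n_critic = 64, 0.01, 5

critic = nn.Sequential(nn.Linear(2, 128), nn.ReLU(), nn.Linear(128, 1))         # f_w
generator = nn.Sequential(nn.Linear(z_dim, 128), nn.ReLU(), nn.Linear(128, 2))  # g_theta

# The paper uses RMSProp rather than momentum-based optimizers
opt_c = torch.optim.RMSprop(critic.parameters(), lr=5e-5)
opt_g = torch.optim.RMSprop(generator.parameters(), lr=5e-5)

for step in range(1000):
    # 1. Train the critic n_critic times: approximate sup E[f_w(x)] - E[f_w(g(z))]
    for _ in range(n_critic):
        real = torch.randn(256, 2) + 3.0                       # stand-in for p_r samples
        fake = generator(torch.randn(256, z_dim)).detach()
        loss_c = -(critic(real).mean() - critic(fake).mean())
        opt_c.zero_grad()
        loss_c.backward()
        opt_c.step()
        # Weight clipping: enforce the Lipschitz constraint on f_w
        for p in critic.parameters():
            p.data.clamp_(-clip_value, clip_value)
    # 2. Train the generator: descend -E[f_w(g_theta(z))]
    fake = generator(torch.randn(256, z_dim))
    loss_g = -critic(fake).mean()
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()</code></pre>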
]]></description>
        </item>
        <item>
            <title><![CDATA[Lipschitz Continuity Visualization]]></title>
            <link>https://velog.io/@coma_403/Lipshitz-Continous-Visualization</link>
            <guid>https://velog.io/@coma_403/Lipshitz-Continous-Visualization</guid>
            <pubDate>Thu, 08 Dec 2022 03:26:41 GMT</pubDate>
            <description><![CDATA[<p><a href="https://www.desmos.com/calculator/dobs3sfeiv?lang=ko">https://www.desmos.com/calculator/dobs3sfeiv?lang=ko</a></p>
]]></description>
        </item>
        <item>
            <title><![CDATA[One-line summary - VAE]]></title>
            <link>https://velog.io/@coma_403/%ED%95%9C%EC%A4%84-%EC%A0%95%EB%A6%AC-VAE</link>
            <guid>https://velog.io/@coma_403/%ED%95%9C%EC%A4%84-%EC%A0%95%EB%A6%AC-VAE</guid>
            <pubDate>Wed, 07 Dec 2022 05:42:26 GMT</pubDate>
            <description><![CDATA[<p>VAE was the dominant model in image generation in the era before GANs.</p>
<p align="center">
  <img src="https://velog.velcdn.com/images/coma_403/post/a4d6c0a6-8d9a-4264-a0d4-bdbabbe35a30/image.png" alt="VAE architecture">
The VAE architecture.
</p>

<p>One-line summary: the encoder sends the input data x into a latent space and computes the latent mean and variance. From the mean and variance, z is obtained using a stochastic sample $\epsilon \sim N(0,1)$. That z is then fed into the decoder to reconstruct the input data x.</p>
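<p>A minimal sketch of that one-line summary (my own illustration; the layer sizes and the 784-dimensional input are arbitrary assumptions):</p>
<pre><code class="language-python">import torch
import torch.nn as nn

class TinyVAE(nn.Module):
    def __init__(self, x_dim=784, z_dim=16):
        super().__init__()
        self.enc = nn.Linear(x_dim, 128)
        self.mu = nn.Linear(128, z_dim)      # mean of the latent distribution
        self.logvar = nn.Linear(128, z_dim)  # log variance of the latent distribution
        self.dec = nn.Sequential(nn.Linear(z_dim, 128), nn.ReLU(), nn.Linear(128, x_dim))

    def forward(self, x):
        h = torch.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        eps = torch.randn_like(mu)              # stochastic sample, eps ~ N(0, 1)
        z = mu + torch.exp(0.5 * logvar) * eps  # reparameterization trick
        return self.dec(z), mu, logvar          # reconstruction of x, plus mu/logvar

x_hat, mu, logvar = TinyVAE()(torch.rand(8, 784))</code></pre>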
]]></description>
        </item>
        <item>
            <title><![CDATA[Markdown math reference]]></title>
            <link>https://velog.io/@coma_403/Markdown-%EC%88%98%EC%8B%9D-%EB%AA%A8%EC%9D%8C</link>
            <guid>https://velog.io/@coma_403/Markdown-%EC%88%98%EC%8B%9D-%EB%AA%A8%EC%9D%8C</guid>
            <pubDate>Fri, 02 Dec 2022 07:52:18 GMT</pubDate>
            <description><![CDATA[<p><a href="https://ko.wikipedia.org/wiki/%EC%9C%84%ED%82%A4%EB%B0%B1%EA%B3%BC:TeX_%EB%AC%B8%EB%B2%95">https://ko.wikipedia.org/wiki/%EC%9C%84%ED%82%A4%EB%B0%B1%EA%B3%BC:TeX_%EB%AC%B8%EB%B2%95</a>
The syntax needed to write Markdown or TeX equations can be found here.</p>
]]></description>
        </item>
        <item>
            <title><![CDATA[Stanford CS230 Career Advice]]></title>
            <link>https://velog.io/@coma_403/Stanford-CS230-Career-Advice</link>
            <guid>https://velog.io/@coma_403/Stanford-CS230-Career-Advice</guid>
            <pubDate>Fri, 02 Dec 2022 05:03:51 GMT</pubDate>
            <description><![CDATA[<p><a href="https://youtu.be/733m6qBH-jI">https://youtu.be/733m6qBH-jI</a></p>
<p>Career advice in a regular course lecture... top schools really are on another level...</p>
]]></description>
        </item>
        <item>
            <title><![CDATA[GPU memory imbalance when using nn.DataParallel]]></title>
            <link>https://velog.io/@coma_403/nn.dataparrel-%EC%82%AC%EC%9A%A9%EC%8B%9C-GPU-%EB%AA%B0%EB%A6%BC-%ED%98%84%EC%83%81</link>
            <guid>https://velog.io/@coma_403/nn.dataparrel-%EC%82%AC%EC%9A%A9%EC%8B%9C-GPU-%EB%AA%B0%EB%A6%BC-%ED%98%84%EC%83%81</guid>
            <pubDate>Wed, 30 Nov 2022 12:15:51 GMT</pubDate>
            <description><![CDATA[<p>I tried distributing NeRF across GPUs using &#39;model = nn.DataParallel(model).to(device)&#39;.</p>
<ul>
<li>Problem
However, memory piled up on a single GPU, as shown below.
<img src="https://velog.velcdn.com/images/coma_403/post/7941e4a3-7241-4370-8c87-fd676c2a07ca/image.png" alt=""></li>
</ul>
<p>The model&#39;s weights may be used in a distributed way, but the loss value is reportedly computed concentrated on one GPU.</p>
<ul>
<li>Solution
The loss value can reportedly also be computed in parallel across the GPUs; here are links on the topic, and a sketch of the idea follows after this list.
[Link 1] - <a href="https://medium.com/daangn/pytorch-multi-gpu-%ED%95%99%EC%8A%B5-%EC%A0%9C%EB%8C%80%EB%A1%9C-%ED%95%98%EA%B8%B0-27270617936b">https://medium.com/daangn/pytorch-multi-gpu-%ED%95%99%EC%8A%B5-%EC%A0%9C%EB%8C%80%EB%A1%9C-%ED%95%98%EA%B8%B0-27270617936b</a>
[Link 2] - <a href="https://aigong.tistory.com/186">https://aigong.tistory.com/186</a></li>
</ul>
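<p>One workaround discussed in those links, sketched here under my own assumptions (a stand-in model and MSE loss): compute the loss inside forward, so that nn.DataParallel scatters the loss computation across the GPUs as well and only scalar per-replica losses are gathered on GPU 0.</p>
<pre><code class="language-python">import torch
import torch.nn as nn

# Sketch: wrap model + criterion so the loss is computed on each replica
class ModelWithLoss(nn.Module):
    def __init__(self, model):
        super().__init__()
        self.model = model
        self.criterion = nn.MSELoss()

    def forward(self, x, target):
        pred = self.model(x)
        return self.criterion(pred, target)  # one scalar per replica

model = nn.Sequential(nn.Linear(10, 10))     # stand-in model
wrapped = nn.DataParallel(ModelWithLoss(model)).to(0)
inputs = torch.rand(64, 10).to(0)
targets = torch.rand(64, 10).to(0)
loss = wrapped(inputs, targets).mean()       # average the gathered per-replica losses
loss.backward()</code></pre>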
]]></description>
        </item>
        <item>
            <title><![CDATA[A site that converts equation images to text]]></title>
            <link>https://velog.io/@coma_403/%EC%88%98%EC%8B%9D-%EC%9D%B4%EB%AF%B8%EC%A7%80%EB%A5%BC-text%EB%A1%9C-%EB%B3%80%ED%99%98%EC%8B%9C%EC%BC%9C%EC%A3%BC%EB%8A%94-%EC%82%AC%EC%9D%B4%ED%8A%B8</link>
            <guid>https://velog.io/@coma_403/%EC%88%98%EC%8B%9D-%EC%9D%B4%EB%AF%B8%EC%A7%80%EB%A5%BC-text%EB%A1%9C-%EB%B3%80%ED%99%98%EC%8B%9C%EC%BC%9C%EC%A3%BC%EB%8A%94-%EC%82%AC%EC%9D%B4%ED%8A%B8</guid>
            <pubDate>Mon, 21 Nov 2022 04:13:12 GMT</pubDate>
            <description><![CDATA[<p><a href="https://snip.mathpix.com/">https://snip.mathpix.com/</a>
Extremely useful.
<img src="https://velog.velcdn.com/images/coma_403/post/08edc0fd-e9e3-49c1-b781-f9dc5615db48/image.png" alt=""></p>
<ul>
<li>Just upload an image containing an equation and it converts it into text, LaTeX, and various other formats. Genuinely convenient.</li>
</ul>
]]></description>
        </item>
        <item>
            <title><![CDATA[The difference between torch.unsqueeze() and torch.flatten()]]></title>
            <link>https://velog.io/@coma_403/torch.unsqueeze%EC%99%80-torch.flatten%EC%9D%98-%EC%B0%A8%EC%9D%B4</link>
            <guid>https://velog.io/@coma_403/torch.unsqueeze%EC%99%80-torch.flatten%EC%9D%98-%EC%B0%A8%EC%9D%B4</guid>
            <pubDate>Wed, 16 Nov 2022 11:51:13 GMT</pubDate>
            <description><![CDATA[<ol>
<li>A plain torch array</li>
</ol>
<ul>
<li>Code<pre><code class="language-python">import torch
x = torch.linspace(-1,1,20)
x.shape</code></pre>
</li>
<li>Result<img src="https://velog.velcdn.com/images/coma_403/post/04214f29-9f7d-468d-af93-69e76537221b/image.png" alt=""></li>
</ul>
<ol start="2">
<li>torch.unsqueeze()</li>
</ol>
<ul>
<li>Code<pre><code class="language-python">import torch
x = torch.linspace(-1,1,20).unsqueeze(dim=1)
x.shape</code></pre>
</li>
<li>Result
<img src="https://velog.velcdn.com/images/coma_403/post/2e51773e-717d-4b38-8dbe-4dd6933213a4/image.png" alt=""></li>
</ul>
<ol start="3">
<li>torch.flatten()</li>
</ol>
<ul>
<li>코드<pre><code class="language-python">x.flatten()
x.shape</code></pre>
</li>
<li>Result
<img src="https://velog.velcdn.com/images/coma_403/post/ab1e1e47-3fdc-41c7-994e-a71fd94fc527/image.png" alt=""></li>
</ul>
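<p>Since the results above are screenshots, here is the same comparison with the shapes written out (my summary of the three outputs):</p>
<pre><code class="language-python">import torch

x = torch.linspace(-1, 1, 20)
print(x.shape)                             # torch.Size([20])
print(x.unsqueeze(dim=1).shape)            # torch.Size([20, 1]) - adds a size-1 dim
print(x.unsqueeze(dim=1).flatten().shape)  # torch.Size([20]) - back to 1-D</code></pre>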
]]></description>
        </item>
        <item>
            <title><![CDATA[Why do we use optimizer.zero_grad()?]]></title>
            <link>https://velog.io/@coma_403/optimizer.zerograd%EB%8A%94-%EC%99%9C-%EC%82%AC%EC%9A%A9%ED%95%98%EB%8A%94%EA%B0%80</link>
            <guid>https://velog.io/@coma_403/optimizer.zerograd%EB%8A%94-%EC%99%9C-%EC%82%AC%EC%9A%A9%ED%95%98%EB%8A%94%EA%B0%80</guid>
            <pubDate>Wed, 16 Nov 2022 09:31:52 GMT</pubDate>
            <description><![CDATA[<p>Why do we use optimizer.zero_grad()?
The answer is simple: to keep gradient values from accumulating when loss.backward() computes the gradients of tensors.
<img src="https://velog.velcdn.com/images/coma_403/post/e74dc683-f9ff-4e03-b427-49bbbf13819f/image.png" alt="">
After defining $a2 = 2 \times a1$ and running backward(), print(a1.grad) shows a1&#39;s gradient. You can see that a1.grad comes out as tensor(2.).</p>
<p>But if we run a2.backward() again in the same way,
<img src="https://velog.velcdn.com/images/coma_403/post/dbbfd5ba-81fe-497a-9660-c4253fa9ca3f/image.png" alt=""> a1&#39;s gradient now comes out as 4.
Since $a2 = 2 \times a1$, a1.grad should be 2, but because gradient accumulation was not blocked with optimizer.zero_grad(), a1.grad comes out as 4.</p>
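<p>The experiment in the screenshots can be reproduced with a few lines (a small sketch of the same setup):</p>
<pre><code class="language-python">import torch

a1 = torch.tensor(1.0, requires_grad=True)

a2 = 2 * a1
a2.backward()
print(a1.grad)  # tensor(2.)

a2 = 2 * a1
a2.backward()
print(a1.grad)  # tensor(4.) - accumulated, because .grad was never cleared

a1.grad = None  # roughly what optimizer.zero_grad() does for its parameters
a2 = 2 * a1
a2.backward()
print(a1.grad)  # tensor(2.) again</code></pre>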
]]></description>
        </item>
        <item>
            <title><![CDATA[Errors from mixing CPU and GPU in nn.Module]]></title>
            <link>https://velog.io/@coma_403/nn.module%EC%97%90%EC%84%9C-CPU-GPU-%ED%98%BC%ED%95%A9-%EC%82%AC%EC%9A%A9%EC%9C%BC%EB%A1%9C-%EC%9D%B8%ED%95%9C-error</link>
            <guid>https://velog.io/@coma_403/nn.module%EC%97%90%EC%84%9C-CPU-GPU-%ED%98%BC%ED%95%A9-%EC%82%AC%EC%9A%A9%EC%9C%BC%EB%A1%9C-%EC%9D%B8%ED%95%9C-error</guid>
            <pubDate>Wed, 16 Nov 2022 06:51:30 GMT</pubDate>
            <description><![CDATA[<pre><code class="language-python">#all tensors of each operation should be in same device
class Module(nn.Module):
  def __init__(self):
    super().__init__()
    self.network = nn.Sequential(nn.Linear(1000, 1000), nn.Linear(1000, 100))

  def forward(self, x):
    return self.network(x)</code></pre>
<p>With the module defined as above, I ran the following computation.</p>
<pre><code class="language-python">module = Module().to(0)
x = torch.zeros(1000, 1000).to(0)
module(x)</code></pre>
<p><img src="https://velog.velcdn.com/images/coma_403/post/8dff08db-18eb-4478-b12b-a5c4e4ad4ad0/image.png" alt="">
module에서 to(0)연산을 통해 module의 parameter를 GPU로 보냈지만, x는 여전히 cpu에 남아있기 때문에 
&#39;RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!&#39; 에러가 발생하였다.</p>
<pre><code class="language-python">module = Module().to(0)
x = torch.zeros(1000, 1000)
module(x)</code></pre>
<p><img src="https://velog.velcdn.com/images/coma_403/post/a72c3c0e-f14e-4013-8b30-2c768d2273de/image.png" alt=""></p>
<p>After likewise sending x to the GPU, the computation runs without any device error.</p>
]]></description>
        </item>
        <item>
            <title><![CDATA[Turning off the annoying autocomplete window in Colab]]></title>
            <link>https://velog.io/@coma_403/colab%EC%97%90%EC%84%9C-%EC%84%B1%EA%B0%80%EC%8B%A0-%EC%9E%90%EB%8F%99%EC%99%84%EC%84%B1-window-%EB%81%84%EA%B8%B0</link>
            <guid>https://velog.io/@coma_403/colab%EC%97%90%EC%84%9C-%EC%84%B1%EA%B0%80%EC%8B%A0-%EC%9E%90%EB%8F%99%EC%99%84%EC%84%B1-window-%EB%81%84%EA%B8%B0</guid>
            <pubDate>Wed, 16 Nov 2022 06:21:36 GMT</pubDate>
            <description><![CDATA[<p><img src="https://velog.velcdn.com/images/coma_403/post/05443729-f621-4641-a864-d246d1441410/image.png" alt=""></p>
<ul>
<li>When coding in Colab, an autocomplete window pops up with inline documentation and covers the code above it, as shown here. It is quite annoying, but I did not know what this window was called, so I did not even know how to search for a fix.
Searching Google for &#39;colab annoying window&#39; immediately turned up the answer...
<img src="https://velog.velcdn.com/images/coma_403/post/0062d4ae-da2e-4271-9e0a-1cb879bd00c6/image.png" alt=""></li>
</ul>
<p align = "center">
Google really does deliver...
</p>

<p><img src="https://velog.velcdn.com/images/coma_403/post/766de1ae-9ab5-4ff7-96f9-9896f5e12066/image.png" alt=""></p>
<h4 id="1-다음과-같이-상단에-톱니바퀴-모양의-설정-메뉴를-클릭한다">1. 다음과 같이 상단에 톱니바퀴 모양의 설정 메뉴를 클릭한다.</h4>
<p><img src="https://velog.velcdn.com/images/coma_403/post/0e9c4e85-7f8a-4266-bfa0-d1a52612b8b7/image.png" alt=""></p>
<h4 id="2-그-다음-편집기에서-코드-완성-제안을-자동으로-표시-기능을-꺼주도록-한다">2. 그 다음, 편집기에서 &#39;코드 완성 제안을 자동으로 표시&#39; 기능을 꺼주도록 한다.</h4>
<h4 id="출처--httpsstackoverflowcomquestions63696360google-colab-how-to-turn-off-suggestion-window">출처 : <a href="https://stackoverflow.com/questions/63696360/google-colab-how-to-turn-off-suggestion-window">https://stackoverflow.com/questions/63696360/google-colab-how-to-turn-off-suggestion-window</a></h4>
]]></description>
        </item>
        <item>
            <title><![CDATA[Computing midpoints between points]]></title>
            <link>https://velog.io/@coma_403/point%EB%93%A4-%EC%82%AC%EC%9D%B4%EC%9D%98-%EC%A4%91%EC%A0%90-%EA%B3%84%EC%82%B0%ED%95%98%EA%B8%B0</link>
            <guid>https://velog.io/@coma_403/point%EB%93%A4-%EC%82%AC%EC%9D%B4%EC%9D%98-%EC%A4%91%EC%A0%90-%EA%B3%84%EC%82%B0%ED%95%98%EA%B8%B0</guid>
            <pubDate>Tue, 08 Nov 2022 04:42:15 GMT</pubDate>
            <description><![CDATA[<pre><code class="language-python">import torch

x = torch.tensor([1.0,2.0,4.0,10.0,12])
x_mids = 0.5*(a[1:] + a[:-1])
y = [1,1,1,1,1]
y_mids = [1,1,1,1]

plt.plot(x,y, &#39;ro&#39;, label=&#39;org_points&#39;)
plt.plot(x_mids,y_mids, &#39;b*&#39;, label=&#39;mids&#39;)
plt.legend()
plt.show()</code></pre>
<p><img src="https://velog.velcdn.com/images/coma_403/post/76e53d05-753a-4871-ba16-5d5563b2dd14/image.png" alt="">
This computes the midpoints between the given points.</p>
]]></description>
        </item>
        <item>
            <title><![CDATA[NeRF Code Review - def raw2outputs]]></title>
            <link>https://velog.io/@coma_403/NeRF-Code-Review-def-raw2outputs</link>
            <guid>https://velog.io/@coma_403/NeRF-Code-Review-def-raw2outputs</guid>
            <pubDate>Fri, 04 Nov 2022 08:42:49 GMT</pubDate>
            <description><![CDATA[<blockquote>
<p>This lives in the run_nerf.py file.</p>
</blockquote>
<blockquote>
<p>Input</p>
<blockquote>
<p>raw: [N_rand, N_samples, 3+1], the RGB$\sigma$ output estimated by the NeRF network
z_vals: [N_rand, N_samples], the code calls this &#39;integration time&#39; - what is that exactly?
rays_d: [N_rand, 3] direction of each ray
white_bkgd: white-background flag
pytest -&gt; and what is this?</p>
</blockquote>
</blockquote>
<blockquote>
<p>Additional variable notes</p>
<blockquote>
<p>rgb_map: [N_rand, 3] estimated RGB color of a ray
disp_map: [N_rand] disparity map (inverse of the depth map, per the docstring)</p>
</blockquote>
</blockquote>
<h2 id="전체-코드">전체 코드</h2>
<pre><code class="language-python">def raw2outputs(raw, z_vals, rays_d, raw_noise_std=0, white_bkgd=False, pytest=False):
    &quot;&quot;&quot;Transforms model&#39;s predictions to semantically meaningful values.
    Args:
        raw: [num_rays, num_samples along ray, 4]. Prediction from model.
        z_vals: [num_rays, num_samples along ray]. Integration time.
        rays_d: [num_rays, 3]. Direction of each ray.
    Returns:
        rgb_map: [num_rays, 3]. Estimated RGB color of a ray.
        disp_map: [num_rays]. Disparity map. Inverse of depth map.
        acc_map: [num_rays]. Sum of weights along each ray.
        weights: [num_rays, num_samples]. Weights assigned to each sampled color.
        depth_map: [num_rays]. Estimated distance to object.
    &quot;&quot;&quot;
    raw2alpha = lambda raw, dists, act_fn=F.relu: 1.-torch.exp(-act_fn(raw)*dists)

    dists = z_vals[...,1:] - z_vals[...,:-1]
    dists = torch.cat([dists, torch.Tensor([1e10]).expand(dists[...,:1].shape)], -1)  # [N_rays, N_samples]

    dists = dists * torch.norm(rays_d[...,None,:], dim=-1)

    rgb = torch.sigmoid(raw[...,:3])  # [N_rays, N_samples, 3]
    noise = 0.
    if raw_noise_std &gt; 0.:
        noise = torch.randn(raw[...,3].shape) * raw_noise_std

        # Overwrite randomly sampled data if pytest
        if pytest:
            np.random.seed(0)
            noise = np.random.rand(*list(raw[...,3].shape)) * raw_noise_std
            noise = torch.Tensor(noise)

    alpha = raw2alpha(raw[...,3] + noise, dists)  # [N_rays, N_samples]
    # weights = alpha * tf.math.cumprod(1.-alpha + 1e-10, -1, exclusive=True)
    weights = alpha * torch.cumprod(torch.cat([torch.ones((alpha.shape[0], 1)), 1.-alpha + 1e-10], -1), -1)[:, :-1]
    rgb_map = torch.sum(weights[...,None] * rgb, -2)  # [N_rays, 3]

    depth_map = torch.sum(weights * z_vals, -1)
    disp_map = 1./torch.max(1e-10 * torch.ones_like(depth_map), depth_map / torch.sum(weights, -1))
    acc_map = torch.sum(weights, -1)

    if white_bkgd:
        rgb_map = rgb_map + (1.-acc_map[...,None])

    return rgb_map, disp_map, acc_map, weights, depth_map</code></pre>
<h3 id="1-alpha-dists-구하기">1. alpha, dists 구하기</h3>
<pre><code class="language-python">    raw2alpha = lambda raw, dists, act_fn=F.relu: 1.-torch.exp(-act_fn(raw)*dists)

    dists = z_vals[...,1:] - z_vals[...,:-1]
    dists = torch.cat([dists, torch.Tensor([1e10]).expand(dists[...,:1].shape)], -1)  # [N_rays, N_samples]

    dists = dists * torch.norm(rays_d[...,None,:], dim=-1)

    rgb = torch.sigmoid(raw[...,:3])  # [N_rays, N_samples, 3]</code></pre>
<p align="center">
  <img src=https://velog.velcdn.com/images/coma_403/post/660e1275-b8c1-4025-b5b2-2e573a3c42ae/image.png> 
  <img src = https://velog.velcdn.com/images/coma_403/post/0bcc2985-be1a-492e-b087-22a7552a7a80/image.png width="500"</p>


<ul>
<li>raw2alpha = lambda raw, dists, act_fn=F.relu: 1.-torch.exp(-act_fn(raw)*dists)</li>
</ul>
<p>-&gt; These are the weight term and alpha term that appear in 5.2 Hierarchical volume sampling.
$\sigma_i$ denotes the volume density, and $\delta_i$ the distance between adjacent sampling points on a single ray ($t_{i+1} - t_i$).
raw2alpha, as the name says, <span style="color: red">maps raw data to the paper&#39;s alpha.</span>
$$
raw2alpha := 1-\exp{(-ReLU(x) \times dists)}
$$
|Equation|Code|
|------|---|
|$$\alpha_i = 1 - \exp(-\sigma_i\delta_i)$$| raw2alpha = lambda raw, dists, act_fn=F.relu: 1.-torch.exp(-act_fn(raw)*dists)|</p>
<ul>
<li>dists = z_vals[...,1:] - z_vals[...,:-1]<p align="center">
<img src=https://velog.velcdn.com/images/coma_403/post/c69efd74-1c99-49a3-8f89-2ed384c99053/image.png>
<img src=https://velog.velcdn.com/images/coma_403/post/6f044fc0-b457-4e48-9365-13bbdf9ea8ad/image.png>
</p>

</li>
</ul>
<h6 id="출처-httpstowardsdatasciencecomits-nerf-from-nothing-build-a-vanilla-nerf-with-pytorch-7846e4c45666">출처: <a href="https://towardsdatascience.com/its-nerf-from-nothing-build-a-vanilla-nerf-with-pytorch-7846e4c45666">https://towardsdatascience.com/its-nerf-from-nothing-build-a-vanilla-nerf-with-pytorch-7846e4c45666</a></h6>
<p> The distances between the points drawn from the ray via stratified sampling.
Roughly something like that, I think...</p>
<ul>
<li><p>dists = torch.cat([dists, torch.Tensor([1e10]).expand(dists[...,:1].shape)], -1)  # [N_rays, N_samples]</p>
<p> dists = dists * torch.norm(rays_d[...,None,:], dim=-1)</p>
</li>
</ul>
<p>We concatenate 1e10 onto dists (the distances between the stratified sample points computed above), then compute the norm of the ray direction rays_d and multiply it into dists.</p>
<ul>
<li>rgb = torch.sigmoid(raw[...,:3])  # [N_rays, N_samples, 3]
The RGB part of the NeRF model&#39;s RGB + volume density estimate is mapped into 0~1 with a sigmoid. Why map it again here?  <ul>
<li>So I printed the min and max of the NeRF model&#39;s output,<p align="center">
<img src="https://velog.velcdn.com/images/coma_403/post/a5eff39a-dc2f-40ee-af31-a55adc675e72/image.png">
</p></li>
<li>rgb min and max after mapping with torch.sigmoid<p align="center">
<img src="https://velog.velcdn.com/images/coma_403/post/dd5b9309-7a35-47a4-8096-1c7ebff3368e/image.png">
</p></li>
</ul>
You can see the values are mapped to around 0.5.
<br><br></li>
</ul>
<ul>
<li>alpha = raw2alpha(raw[...,3] + noise, dists)<ul>
<li>alpha is computed through the raw2alpha function.</li>
</ul>
</li>
</ul>
<h3 id="2-t_i-rgb-depth-disp-acc-map-구하기">2. $T_i$, RGB, Depth, Disp, acc map 구하기.</h3>
<h4 id="21-weight식-깔끔하게-정리하기">2.1 Weight식 깔끔하게 정리하기.</h4>
<p>우선, Paper에서 $T_i$는 다음과 같이 정의되어 있다.
<img src="https://velog.velcdn.com/images/coma_403/post/e25fbe24-46ef-4b15-985e-1e80fa607bae/image.png" alt=""></p>
<p>Meanwhile, alpha is <img src="https://velog.velcdn.com/images/coma_403/post/210dcd93-eb04-4bf2-984c-31e3652b821a/image.png" alt=""></p>
<p>Don&#39;t the two expressions for $T_i$ and $\alpha_i$ look very similar? Rearranging the alpha equation gives
$$
1-\alpha_i = \exp(-\sigma_i\delta_i)
$$
and writing the product in exponential-sum form,
$$
T_i = \prod_{j = 1}^{i-1}( 1-\alpha_j) = \exp(-\sum_{j = 1}^{i-1} \sigma_j\delta_j)
$$
The code transforms the alpha term this way and performs the $\prod$ with torch.cumprod, finally computing $T_i$.</p>
<pre><code class="language-python">    alpha = raw2alpha(raw[...,3] + noise, dists)  # [N_rays, N_samples]
    # weights = alpha * tf.math.cumprod(1.-alpha + 1e-10, -1, exclusive=True)
    weights = alpha * torch.cumprod(torch.cat([torch.ones((alpha.shape[0], 1)), 1.-alpha + 1e-10], -1), -1)[:, :-1]
    rgb_map = torch.sum(weights[...,None] * rgb, -2)  # [N_rays, 3]

    depth_map = torch.sum(weights * z_vals, -1)
    disp_map = 1./torch.max(1e-10 * torch.ones_like(depth_map), depth_map / torch.sum(weights, -1))
    acc_map = torch.sum(weights, -1)

    if white_bkgd:
        rgb_map = rgb_map + (1.-acc_map[...,None])

    return rgb_map, disp_map, acc_map, weights, depth_map</code></pre>
<ul>
<li><p>weights = alpha * torch.cumprod(torch.cat([torch.ones((alpha.shape[0], 1)), 1.-alpha + 1e-10], -1), -1)[:, :-1]</p>
<ul>
<li>This line, expressed as an equation: </li>
<li><span style="color: red">$$w_i$$</span> $$= T_i(1-\exp(-\sigma_i\delta_i)) = T_i\alpha_i = \prod_{j = 1}^{i-1}( 1-\alpha_j)\,\alpha_i$$</li>
<li>The weights are computed through torch.cumprod (a toy check follows after this list).</li>
<li>The 1e-10 is presumably there to prevent NaN?<p align="center">
<img src=https://velog.velcdn.com/images/coma_403/post/42ecb7d8-7574-4126-82f4-1c834bf2707b/image.png>
<img src=https://velog.velcdn.com/images/coma_403/post/25ca43bb-bffc-4aab-8b3a-b9ec345d44fb/image.png></p>


</li>
</ul>
</li>
</ul>
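<p>A toy check of that cumprod trick (my own example values): prepending a column of ones and dropping the last column makes the cumulative product exclusive, so sample $i$ is weighted by the transmittance of the samples in front of it.</p>
<pre><code class="language-python">import torch

alpha = torch.tensor([[0.1, 0.5, 0.9]])  # alphas along one ray
ones = torch.ones((alpha.shape[0], 1))
T = torch.cumprod(torch.cat([ones, 1. - alpha + 1e-10], -1), -1)[:, :-1]
print(T)          # tensor([[1.0000, 0.9000, 0.4500]]) - exclusive cumprod of (1 - alpha)
print(alpha * T)  # the weights w_i = T_i * alpha_i</code></pre>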
<ul>
<li><p>rgb_map = torch.sum(weights[...,None] * rgb, -2)  # [N_rays, 3]</p>
<ul>
<li>This line, expressed as an equation:</li>
<li>$$\hat{C_c}(r) = \sum_{i = 1}^{N_c} w_ic_i$$</li>
</ul>
</li>
<li><p>depth_map = torch.sum(weights * z_vals, -1)</p>
<ul>
<li>Simply multiplies the weights by the stratified sample depths along the ray and sums.</li>
</ul>
</li>
<li><p>disp_map = 1./torch.max(1e-10 * torch.ones_like(depth_map), depth_map / torch.sum(weights, -1))</p>
<ul>
<li>Computes the disparity map; think of it as inverse depth.</li>
</ul>
</li>
<li><p>acc_map = torch.sum(weights, -1)</p>
<ul>
<li>What is this one? I should look into what acc_map actually does.</li>
</ul>
</li>
</ul>
]]></description>
        </item>
        <item>
            <title><![CDATA[NeRF Code Review - def render_rays]]></title>
            <link>https://velog.io/@coma_403/NeRF-Code-Review-def-renderrays</link>
            <guid>https://velog.io/@coma_403/NeRF-Code-Review-def-renderrays</guid>
            <pubDate>Fri, 04 Nov 2022 07:51:09 GMT</pubDate>
            <description><![CDATA[<h2 id="render_rays-함수에-대해-알아보자">render_rays 함수에 대해 알아보자.</h2>
<p>다음은 render_rays 함수의 전체 code이다. 하나하나 뜯어보도록 하자.</p>
<pre><code class="language-python">def render_rays(ray_batch,
                network_fn,
                network_query_fn,
                N_samples,
                retraw=False,
                lindisp=False,
                perturb=0.,
                N_importance=0,
                network_fine=None,
                white_bkgd=False,
                raw_noise_std=0.,
                verbose=False,
                pytest=False):
    &quot;&quot;&quot;Volumetric rendering.
    Args:
      ray_batch: array of shape [batch_size, ...]. All information necessary
        for sampling along a ray, including: ray origin, ray direction, min
        dist, max dist, and unit-magnitude viewing direction.
      network_fn: function. Model for predicting RGB and density at each point
        in space.
      network_query_fn: function used for passing queries to network_fn.
      N_samples: int. Number of different times to sample along each ray.
      retraw: bool. If True, include model&#39;s raw, unprocessed predictions.
      lindisp: bool. If True, sample linearly in inverse depth rather than in depth.
      perturb: float, 0 or 1. If non-zero, each ray is sampled at stratified
        random points in time.
      N_importance: int. Number of additional times to sample along each ray.
        These samples are only passed to network_fine.
      network_fine: &quot;fine&quot; network with same spec as network_fn.
      white_bkgd: bool. If True, assume a white background.
      raw_noise_std: ...
      verbose: bool. If True, print more debugging info.
    Returns:
      rgb_map: [num_rays, 3]. Estimated RGB color of a ray. Comes from fine model.
      disp_map: [num_rays]. Disparity map. 1 / depth.
      acc_map: [num_rays]. Accumulated opacity along each ray. Comes from fine model.
      raw: [num_rays, num_samples, 4]. Raw predictions from model.
      rgb0: See rgb_map. Output for coarse model.
      disp0: See disp_map. Output for coarse model.
      acc0: See acc_map. Output for coarse model.
      z_std: [num_rays]. Standard deviation of distances along ray for each
        sample.
    &quot;&quot;&quot;
    N_rays = ray_batch.shape[0]
    rays_o, rays_d = ray_batch[:,0:3], ray_batch[:,3:6] # [N_rays, 3] each
    viewdirs = ray_batch[:,-3:] if ray_batch.shape[-1] &gt; 8 else None
    bounds = torch.reshape(ray_batch[...,6:8], [-1,1,2])
    near, far = bounds[...,0], bounds[...,1] # [-1,1]

    t_vals = torch.linspace(0., 1., steps=N_samples)
    if not lindisp:
        z_vals = near * (1.-t_vals) + far * (t_vals)
    else:
        z_vals = 1./(1./near * (1.-t_vals) + 1./far * (t_vals))

    z_vals = z_vals.expand([N_rays, N_samples])

    if perturb &gt; 0.:
        # get intervals between samples
        mids = .5 * (z_vals[...,1:] + z_vals[...,:-1])
        upper = torch.cat([mids, z_vals[...,-1:]], -1)
        lower = torch.cat([z_vals[...,:1], mids], -1)
        # stratified samples in those intervals
        t_rand = torch.rand(z_vals.shape)

        # Pytest, overwrite u with numpy&#39;s fixed random numbers
        if pytest:
            np.random.seed(0)
            t_rand = np.random.rand(*list(z_vals.shape))
            t_rand = torch.Tensor(t_rand)

        z_vals = lower + (upper - lower) * t_rand

    pts = rays_o[...,None,:] + rays_d[...,None,:] * z_vals[...,:,None] # [N_rays, N_samples, 3]


#     raw = run_network(pts)
    raw = network_query_fn(pts, viewdirs, network_fn)
    rgb_map, disp_map, acc_map, weights, depth_map = raw2outputs(raw, z_vals, rays_d, raw_noise_std, white_bkgd, pytest=pytest)

    if N_importance &gt; 0:

        rgb_map_0, disp_map_0, acc_map_0 = rgb_map, disp_map, acc_map

        z_vals_mid = .5 * (z_vals[...,1:] + z_vals[...,:-1])
        z_samples = sample_pdf(z_vals_mid, weights[...,1:-1], N_importance, det=(perturb==0.), pytest=pytest)
        z_samples = z_samples.detach()

        z_vals, _ = torch.sort(torch.cat([z_vals, z_samples], -1), -1)
        pts = rays_o[...,None,:] + rays_d[...,None,:] * z_vals[...,:,None] # [N_rays, N_samples + N_importance, 3]

        run_fn = network_fn if network_fine is None else network_fine
#         raw = run_network(pts, fn=run_fn)
        raw = network_query_fn(pts, viewdirs, run_fn)

        rgb_map, disp_map, acc_map, weights, depth_map = raw2outputs(raw, z_vals, rays_d, raw_noise_std, white_bkgd, pytest=pytest)

    ret = {&#39;rgb_map&#39; : rgb_map, &#39;disp_map&#39; : disp_map, &#39;acc_map&#39; : acc_map}
    if retraw:
        ret[&#39;raw&#39;] = raw
    if N_importance &gt; 0:
        ret[&#39;rgb0&#39;] = rgb_map_0
        ret[&#39;disp0&#39;] = disp_map_0
        ret[&#39;acc0&#39;] = acc_map_0
        ret[&#39;z_std&#39;] = torch.std(z_samples, dim=-1, unbiased=False)  # [N_rays]

    for k in ret:
        if (torch.isnan(ret[k]).any() or torch.isinf(ret[k]).any()) and DEBUG:
            # print(f&quot;! [Numerical Error] {k} contains nan or inf.&quot;)
            print(&#39;what?&#39;)

    return ret</code></pre>
<h2 id="1-ray_batch로-부터-ray_o-rays_d-near-focal-나누기">1. ray_batch로 부터 ray_o, rays_d, near focal 나누기.</h2>
<pre><code class="language-python">    N_rays = ray_batch.shape[0]
    rays_o, rays_d = ray_batch[:,0:3], ray_batch[:,3:6] # [N_rays, 3] each
    viewdirs = ray_batch[:,-3:] if ray_batch.shape[-1] &gt; 8 else None
    bounds = torch.reshape(ray_batch[...,6:8], [-1,1,2])
    near, far = bounds[...,0], bounds[...,1] # [-1,1]</code></pre>
<ul>
<li>ray_batch: the batch of rays built earlier. For lego, its dimension is [N_rand, 11]</li>
<li>Split rays_o and rays_d out of ray_batch.</li>
<li>Split viewdirs out of ray_batch.</li>
<li>bounds = torch.reshape... -&gt; separates the near and far values from each ray.<ul>
<li>np.shape(bounds) = [N_rand,1,2], e.g. [1024,1,2]</li>
</ul>
</li>
<li>Finally, near is stored as an [N_rand,1] tensor of 2s and far as an [N_rand,1] tensor of 6s.</li>
</ul>
<h2 id="2-stratified-sampling">2. Stratified sampling</h2>
<pre><code class="language-python">    t_vals = torch.linspace(0., 1., steps=N_samples)
    if not lindisp:
        z_vals = near * (1.-t_vals) + far * (t_vals)
    else:
        z_vals = 1./(1./near * (1.-t_vals) + 1./far * (t_vals))

    z_vals = z_vals.expand([N_rays, N_samples])

    if perturb &gt; 0.:
        # get intervals between samples
        mids = .5 * (z_vals[...,1:] + z_vals[...,:-1])
        upper = torch.cat([mids, z_vals[...,-1:]], -1)
        lower = torch.cat([z_vals[...,:1], mids], -1)
        # stratified samples in those intervals
        t_rand = torch.rand(z_vals.shape)

        # Pytest, overwrite u with numpy&#39;s fixed random numbers
        if pytest:
            np.random.seed(0)
            t_rand = np.random.rand(*list(z_vals.shape))
            t_rand = torch.Tensor(t_rand)

        z_vals = lower + (upper - lower) * t_rand

    pts = rays_o[...,None,:] + rays_d[...,None,:] * z_vals[...,:,None] # [N_rays, N_samples, 3]


#     raw = run_network(pts)
    raw = network_query_fn(pts, viewdirs, network_fn)
    rgb_map, disp_map, acc_map, weights, depth_map = raw2outputs(raw, z_vals, rays_d, raw_noise_std, white_bkgd, pytest=pytest)</code></pre>
<p>This part performs stratified sampling, the sampling method described in the paper.
$$
t_i \sim \mathcal{U}\left[t_n+\frac{i-1}{N}\left(t_f-t_n\right), t_n+\frac{i}{N}\left(t_f-t_n\right)\right]
$$
|Paper| Code |
|----|----|
|Generate N_samples evenly spaced points from 0 to 1 <br>N evenly-spaced bins|t_vals = torch.linspace(0., 1., steps=N_samples)|
|we use a stratified sampling approach where we partition $[t_n,t_f]$|if not lindisp:<br>z_vals = near * (1.-t_vals) + far * (t_vals)<br>else:<br>z_vals = 1./(1./near * (1.-t_vals) + 1./far * (t_vals))<br>z_vals = z_vals.expand([N_rays, N_samples])|</p>
<ul>
<li>z_vals = lower + (upper - lower) * t_rand - I did not get this part at first. Why is the computation this roundabout? Is it just stratified sampling? (See the sketch after this list.)</li>
</ul>
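<p>A small sketch of what those three lines do (my own example values): lower and upper are per-sample bin bounds, and each z value is jittered uniformly within its own bin, which is exactly the per-bin uniform draw of the equation above.</p>
<pre><code class="language-python">import torch

z_vals = torch.linspace(2., 6., steps=5).unsqueeze(0)  # e.g. near=2, far=6, N_samples=5
mids = .5 * (z_vals[..., 1:] + z_vals[..., :-1])
upper = torch.cat([mids, z_vals[..., -1:]], -1)        # per-sample bin upper bounds
lower = torch.cat([z_vals[..., :1], mids], -1)         # per-sample bin lower bounds
t_rand = torch.rand(z_vals.shape)
z_jittered = lower + (upper - lower) * t_rand          # one uniform draw per bin
print(lower)   # tensor([[2.0000, 2.5000, 3.5000, 4.5000, 5.5000]])
print(upper)   # tensor([[2.5000, 3.5000, 4.5000, 5.5000, 6.0000]])</code></pre>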
<hr>
<h2 id="작성중">작성중</h2>
<pre><code> raw = network_query_fn(pts, viewdirs, network_fn)
 rgb_map, disp_map, acc_map, weights, depth_map = raw2outputs(raw, z_vals, rays_d, raw_noise_std, white_bkgd, pytest=pytest)</code></pre><p>Through raw2outputs (raw: the RGB and density values predicted by the NeRF network for the sampled points), we obtain rgb_map, disp_map, acc_map, and weights as output. Details on raw2outputs are in the NeRF Code Review series: <a href="https://velog.io/@coma_403/NeRF-Code-Review-def-raw2outputs">raw2outputs link</a></p>
<h2 id="3-hierarchical-volume-sampling">3. Hierarchical volume Sampling</h2>
<pre><code class="language-python">    if N_importance &gt; 0:

        rgb_map_0, disp_map_0, acc_map_0 = rgb_map, disp_map, acc_map

        z_vals_mid = .5 * (z_vals[...,1:] + z_vals[...,:-1])
        z_samples = sample_pdf(z_vals_mid, weights[...,1:-1], N_importance, det=(perturb==0.), pytest=pytest)
        z_samples = z_samples.detach()

        z_vals, _ = torch.sort(torch.cat([z_vals, z_samples], -1), -1)
        pts = rays_o[...,None,:] + rays_d[...,None,:] * z_vals[...,:,None] # [N_rays, N_samples + N_importance, 3]

        run_fn = network_fn if network_fine is None else network_fine
#         raw = run_network(pts, fn=run_fn)
        raw = network_query_fn(pts, viewdirs, run_fn)

        rgb_map, disp_map, acc_map, weights, depth_map = raw2outputs(raw, z_vals, rays_d, raw_noise_std, white_bkgd, pytest=pytest)
</code></pre>
<ul>
<li><p>Code walkthrough</p>
<pre><code>  rgb_map_0, disp_map_0, acc_map_0 = rgb_map, disp_map, acc_map</code></pre><p>Save the rgb_map, disp_map, and acc_map obtained from the coarse network.</p>
<pre><code>  z_vals_mid = .5 * (z_vals[...,1:] + z_vals[...,:-1])</code></pre><p>Compute the midpoints of the samples obtained on the ray through coarse sampling.</p>
<pre><code>  z_samples = sample_pdf(z_vals_mid, weights[...,1:-1], N_importance, det=(perturb==0.), pytest=pytest)
  z_samples = z_samples.detach()</code></pre><p>Compute the samples that will go into the fine network via hierarchical sampling.</p>
<pre><code>  z_vals, _ = torch.sort(torch.cat([z_vals, z_samples], -1), -1)
  pts = rays_o[...,None,:] + rays_d[...,None,:] * z_vals[...,:,None] # [N_rays, N_samples + N_importance, 3]</code></pre><p>z_vals = coarse sampled points + fine sampled points;
this defines the combined coarse + fine sample points along the ray.</p>
<pre><code>  run_fn = network_fn if network_fine is None else network_fine

  raw = network_query_fn(pts, viewdirs, run_fn)

  rgb_map, disp_map, acc_map, weights, depth_map = raw2outputs(raw, z_vals, rays_d, raw_noise_std, white_bkgd, pytest=pytest)</code></pre><p>With the $N_c + N_f$ samples as input, the fine network returns rgb_map, disp_map, acc_map, and depth_map as output.</p>
</li>
</ul>
<pre><code class="language-python">    ret = {&#39;rgb_map&#39; : rgb_map, &#39;disp_map&#39; : disp_map, &#39;acc_map&#39; : acc_map}
    if retraw:
        ret[&#39;raw&#39;] = raw
    if N_importance &gt; 0:
        ret[&#39;rgb0&#39;] = rgb_map_0
        ret[&#39;disp0&#39;] = disp_map_0
        ret[&#39;acc0&#39;] = acc_map_0
        ret[&#39;z_std&#39;] = torch.std(z_samples, dim=-1, unbiased=False)  # [N_rays]

    for k in ret:
        if (torch.isnan(ret[k]).any() or torch.isinf(ret[k]).any()) and DEBUG:
            # print(f&quot;! [Numerical Error] {k} contains nan or inf.&quot;)
            print(&#39;what?&#39;)

    return ret</code></pre>
<ul>
<li>Code walkthrough
Store rgb_map, disp_map, and acc_map in a dictionary called ret. Then check for nan/inf to catch numerical errors. Finally, return ret.</li>
</ul>
]]></description>
        </item>
        <item>
            <title><![CDATA[NeRF Code Review - def batchify(fn, chunk)]]></title>
            <link>https://velog.io/@coma_403/NeRF-Code-Review-def-batchifyfn-chunk</link>
            <guid>https://velog.io/@coma_403/NeRF-Code-Review-def-batchifyfn-chunk</guid>
            <pubDate>Fri, 04 Nov 2022 07:50:55 GMT</pubDate>
            <description><![CDATA[<pre><code class="language-python">def batchify(fn, chunk):
    &quot;&quot;&quot;Constructs a version of &#39;fn&#39; that applies to smaller batches.
    &quot;&quot;&quot;
    if chunk is None:
        return fn
    def ret(inputs):
        return torch.cat([fn(inputs[i:i+chunk]) for i in range(0, inputs.shape[0], chunk)], 0)
    return ret</code></pre>
<ol>
<li>if chunk is None:<ul>
<li>If chunk is not set, fn is returned unchanged. The default value stored is 1024*64.</li>
</ul>
</li>
<li>def ret(inputs):
 return torch.cat([fn(inputs[i:i+chunk]) for i in range(0, inputs.shape[0], chunk)], 0)<ul>
<li>fn is the NeRF class object.    </li>
<li>The input is cut into chunk-sized slices that are fed into the NeRF network.<ul>
<li>run_network is executed in this part(?) - to check and revise later</li>
</ul>
</li>
<li>Through fn(inputs[i:i+chunk]), the batched rays go into the NeRF network and RGB estimation proceeds.
<img src="https://velog.velcdn.com/images/coma_403/post/b0ad9b7f-c835-47d4-bce6-bfebbb92e2da/image.png" alt=""></li>
</ul>
</li>
</ol>
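<p>A quick sketch of how batchify behaves on a toy fn (my own example, assuming the batchify defined above):</p>
<pre><code class="language-python">import torch

# fn is applied chunk-wise and the outputs are concatenated back together
double = batchify(lambda x: 2 * x, chunk=4)
out = double(torch.arange(10))  # runs fn on [0:4], [4:8], [8:10]
print(out)                      # tensor([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])</code></pre>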
]]></description>
        </item>
        <item>
            <title><![CDATA[NeRF Code Review - class NeRF(nn.Module)]]></title>
            <link>https://velog.io/@coma_403/NeRF-Code-Review-class-NeRFnn.Module</link>
            <guid>https://velog.io/@coma_403/NeRF-Code-Review-class-NeRFnn.Module</guid>
            <pubDate>Fri, 04 Nov 2022 06:15:38 GMT</pubDate>
            <description><![CDATA[<pre><code class="language-python">class NeRF(nn.Module):
    def __init__(self, D=8, W=256, input_ch=3, input_ch_views=3, output_ch=4, skips=[4], use_viewdirs=False):
        &quot;&quot;&quot; 
        &quot;&quot;&quot;
        super(NeRF, self).__init__()
        self.D = D
        self.W = W
        self.input_ch = input_ch
        self.input_ch_views = input_ch_views
        self.skips = skips
        self.use_viewdirs = use_viewdirs

        self.pts_linears = nn.ModuleList(
            [nn.Linear(input_ch, W)] + [nn.Linear(W, W) if i not in self.skips else nn.Linear(W + input_ch, W) for i in range(D-1)])

        ### Implementation according to the official code release (https://github.com/bmild/nerf/blob/master/run_nerf_helpers.py#L104-L105)
        self.views_linears = nn.ModuleList([nn.Linear(input_ch_views + W, W//2)])

        ### Implementation according to the paper
        # self.views_linears = nn.ModuleList(
        #     [nn.Linear(input_ch_views + W, W//2)] + [nn.Linear(W//2, W//2) for i in range(D//2)])

        if use_viewdirs:
            self.feature_linear = nn.Linear(W, W)
            self.alpha_linear = nn.Linear(W, 1)
            self.rgb_linear = nn.Linear(W//2, 3)
        else:
            self.output_linear = nn.Linear(W, output_ch)

    def forward(self, x):
        input_pts, input_views = torch.split(x, [self.input_ch, self.input_ch_views], dim=-1)
        h = input_pts
        for i, l in enumerate(self.pts_linears):
            h = self.pts_linears[i](h)
            h = F.relu(h)
            if i in self.skips:
                h = torch.cat([input_pts, h], -1)

        if self.use_viewdirs:
            alpha = self.alpha_linear(h)
            feature = self.feature_linear(h)
            h = torch.cat([feature, input_views], -1)

            for i, l in enumerate(self.views_linears):
                h = self.views_linears[i](h)
                h = F.relu(h)

            rgb = self.rgb_linear(h)
            outputs = torch.cat([rgb, alpha], -1)
        else:
            outputs = self.output_linear(h)

        return outputs   </code></pre>
<ol>
<li><pre><code class="language-python">input_pts, input_views = torch.split(x, [self.input_ch,self.input_ch_views], dim=-1) </code></pre>
</li>
</ol>
<ul>
<li>Code interpretation<ul>
<li>input_pts : the ray position information corresponding to rays_o. shape = [1024*64, 63], presumably 60+3(?)</li>
<li>input_views : the ray direction information corresponding to rays_d. shape = [1024*64, 27], presumably 24+3(?)</li>
</ul>
</li>
</ul>
<ol start="2">
<li><pre><code class="language-python">     for i, l in enumerate(self.pts_linears):
         h = self.pts_linears[i](h)
         h = F.relu(h)
         if i in self.skips:
             h = torch.cat([input_pts, h], -1)</code></pre>
</li>
</ol>
<ul>
<li>Code interpretation<ul>
<li>self.pts_linears type: &#39;torch.nn.modules.container.ModuleList&#39;</li>
<li>These are the five fully-connected layers of the NeRF model, up to the layer that additionally re-receives the rays_o information ($\gamma(x)$ in the paper).</li>
<li>ReLU is used as the activation function.</li>
<li>When i falls in self.skips, input_pts (corresponding to rays_o) is concatenated with the fully-connected output h and fed back into the network.
<img src="https://velog.velcdn.com/images/coma_403/post/b7c81952-2de0-476b-aa14-b815b1396451/image.png" alt=""></li>
</ul>
</li>
</ul>
<ol start="3">
<li><pre><code class="language-python">     if self.use_viewdirs:
         alpha = self.alpha_linear(h)
         feature = self.feature_linear(h)
         h = torch.cat([feature, input_views], -1)

         for i, l in enumerate(self.views_linears):
             h = self.views_linears[i](h)
             h = F.relu(h)

         rgb = self.rgb_linear(h)
         outputs = torch.cat([rgb, alpha], -1)
     else:
         outputs = self.output_linear(h)</code></pre>
</li>
</ol>
<ul>
<li><p>Code interpretation</p>
<ul>
<li><p>alpha = self.alpha_linear(h)</p>
<ul>
<li><p><span style="color:red">Volume Density($\sigma$)</span>를 output으로 뽑는다. Paper의 그림으로만 보았을 때, Activation function 없이 바로 feature extraction 하였을 때, Volume density값과 256 dimension의 feature가 exreact 될 것 같은데, <span style="color:red">실제 코드에서는 그렇지 않았다.</span><br>  &#39;Detailed expression&#39; 그림을 참조해서 코드를 설명하면, activation function skip 과정 전 단계에서 input feature가 256, output feature가 1로 뽑히는 것을 확인할 수 있다. Paper 에서도 &#39;volume density $\sigma$ (which is rectified using a ReLU to ensure that the output volume density is nonegative)&#39;라고 명시되어 있다.
<img src="https://velog.velcdn.com/images/coma_403/post/eaafc494-6b2d-4b7c-8837-64e12477030a/image.png" alt=""></p>
</li>
<li><p>feature = self.feature_linear(h)</p>
<ul>
<li>Feature extraction without an activation function. This corresponds to the orange arrow in the paper&#39;s figure.</li>
</ul>
</li>
<li><p>h = self.views_linears[i](h)</p>
<ul>
<li>The ray direction values are concatenated with the 256-dimensional feature and fed into the linear layer, which takes a 283-dimensional input.<h5 id="256feature-dim--24direction-dim---embedded-by-encoding--3original-direction--283">256(feature dim) + 24(direction dim - embedded by encoding) + 3(original direction) = 283</h5>
</li>
</ul>
</li>
<li><p>rgb = self.rgb_linear(h)</p>
<ul>
<li>Computes the 3-dimensional RGB value from the 128-dimensional feature.</li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
</ul>
<p><img src="https://velog.velcdn.com/images/coma_403/post/ad1306ae-5162-4584-8570-fad7e887f57f/image.png" alt=""></p>
]]></description>
        </item>
        <item>
            <title><![CDATA[NeRF code review - def get_embedder (WIP)]]></title>
            <link>https://velog.io/@coma_403/NeRF-code-review-def-getembedder</link>
            <guid>https://velog.io/@coma_403/NeRF-code-review-def-getembedder</guid>
            <pubDate>Thu, 03 Nov 2022 10:08:29 GMT</pubDate>
            <description><![CDATA[<pre><code class="language-python">def get_embedder(multires, i=0):
    if i == -1:
        return nn.Identity(), 3

    embed_kwargs = {
                &#39;include_input&#39; : True,
                &#39;input_dims&#39; : 3,
                &#39;max_freq_log2&#39; : multires-1,
                &#39;num_freqs&#39; : multires,
                &#39;log_sampling&#39; : True,
                &#39;periodic_fns&#39; : [torch.sin, torch.cos],
    }

    embedder_obj = Embedder(**embed_kwargs)
    embed = lambda x, eo=embedder_obj : eo.embed(x)
    return embed, embedder_obj.out_dim</code></pre>
<ul>
<li>multires is the maximum frequency (exponent) of the frequencies used in the encoding.</li>
<li>In the NeRF paper, multires is L=10 when the position information (rays_o) is encoded, and L=4 when the direction information (rays_d) is encoded.
<img src="https://velog.velcdn.com/images/coma_403/post/927e1140-f150-4696-9d06-94c9ec9f466e/image.png" alt=""></li>
</ul>
<ul>
<li>Since positional encoding is fundamentally done with sin and cos, this is expressed as &#39;periodic_fns&#39; : [torch.sin, torch.cos].</li>
</ul>
<pre><code class="language-python">class Embedder:
    def __init__(self, **kwargs):
        self.kwargs = kwargs
        self.create_embedding_fn()

    def create_embedding_fn(self):
        embed_fns = []
        d = self.kwargs[&#39;input_dims&#39;]
        out_dim = 0
        if self.kwargs[&#39;include_input&#39;]:
            embed_fns.append(lambda x : x)
            out_dim += d

        max_freq = self.kwargs[&#39;max_freq_log2&#39;]
        N_freqs = self.kwargs[&#39;num_freqs&#39;]

        if self.kwargs[&#39;log_sampling&#39;]:
            freq_bands = 2.**torch.linspace(0., max_freq, steps=N_freqs)
        else:
            freq_bands = torch.linspace(2.**0., 2.**max_freq, steps=N_freqs)

        for freq in freq_bands:
            for p_fn in self.kwargs[&#39;periodic_fns&#39;]:    #   torch.sin, torch.cos
                embed_fns.append(lambda x, p_fn=p_fn, freq=freq : p_fn(x * freq))   # sin(2^freq * x), cos(2^freq * x)
                out_dim += d

        self.embed_fns = embed_fns
        self.out_dim = out_dim

    def embed(self, inputs):
        return torch.cat([fn(inputs) for fn in self.embed_fns], -1)</code></pre>
<ul>
<li>The Embedder class positionally encodes rays_o and rays_d.</li>
<li>kwargs is a dictionary; the arguments are stored by the parser early in the code. </li>
<li>rays_o and rays_d have 3 channels, so &#39;input_dims&#39; is stored as 3.</li>
<li>include_input:True -&gt; when the positional-encoding functions are appended to embed_fns, this is used to also store the identity (input) function.</li>
<li>max_freq is L-1, corresponding to the last frequency of the encoded functions in the paper.</li>
<li>N_freqs is the number of encoded functions in the paper.</li>
</ul>
<h3 id="--positional-encoding-with-code">- positional Encoding with code</h3>
<p><img src="https://velog.velcdn.com/images/coma_403/post/bedda5d9-2213-4c96-ad8f-4a2294495077/image.png" alt=""></p>
<pre><code class="language-python">        for freq in freq_bands:
            for p_fn in self.kwargs[&#39;periodic_fns&#39;]:    #   torch.sin, torch.cos
                embed_fns.append(lambda x, p_fn=p_fn, freq=freq : p_fn(x * freq))   # sin(2^freq * x), cos(2^freq * x)
                out_dim += d
</code></pre>
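<p>A quick check of the output dimension (my own sketch, assuming the get_embedder and Embedder defined above): each of the 3 input channels gets sin and cos at N_freqs frequencies, plus the raw input itself.</p>
<pre><code class="language-python">import torch

embed_fn, out_dim = get_embedder(10)        # L = 10 for positions (rays_o)
print(out_dim)                              # 63 = 3 + 3 * 2 * 10
print(embed_fn(torch.rand(4, 3)).shape)     # torch.Size([4, 63])

embed_dirs, out_dim_dirs = get_embedder(4)  # L = 4 for directions (rays_d)
print(out_dim_dirs)                         # 27 = 3 + 3 * 2 * 4</code></pre>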
]]></description>
        </item>
        <item>
            <title><![CDATA[NeRF Code Review - ray sampling inside def train() (WIP)]]></title>
            <link>https://velog.io/@coma_403/NeRF-ray-sampling</link>
            <guid>https://velog.io/@coma_403/NeRF-ray-sampling</guid>
            <pubDate>Wed, 02 Nov 2022 07:00:01 GMT</pubDate>
            <description><![CDATA[<h2 id="전체-코드">전체 코드</h2>
<pre><code class="language-python">            if N_rand is not None:
                rays_o, rays_d = get_rays(H, W, K, torch.Tensor(pose))  # (H, W, 3), (H, W, 3)

                if i &lt; args.precrop_iters:
                    dH = int(H//2 * args.precrop_frac)
                    dW = int(W//2 * args.precrop_frac)
                    coords = torch.stack(
                        torch.meshgrid(
                            torch.linspace(H//2 - dH, H//2 + dH - 1, 2*dH), 
                            torch.linspace(W//2 - dW, W//2 + dW - 1, 2*dW)
                        ), -1)
                    if i == start:
                        print(f&quot;[Config] Center cropping of size {2*dH} x {2*dW} is enabled until iter {args.precrop_iters}&quot;)                
                else:
                    coords = torch.stack(torch.meshgrid(torch.linspace(0, H-1, H), torch.linspace(0, W-1, W)), -1)  # (H, W, 2)

                coords = torch.reshape(coords, [-1,2])  # (H * W, 2)
                select_inds = np.random.choice(coords.shape[0], size=[N_rand], replace=False)  # (N_rand,)
                select_coords = coords[select_inds].long()  # (N_rand, 2)
                rays_o = rays_o[select_coords[:, 0], select_coords[:, 1]]  # (N_rand, 3)
                rays_d = rays_d[select_coords[:, 0], select_coords[:, 1]]  # (N_rand, 3)
                batch_rays = torch.stack([rays_o, rays_d], 0)
                target_s = target[select_coords[:, 0], select_coords[:, 1]]  # (N_rand, 3)
                # print(target_s)</code></pre>
<h2 id="1-cropping-부분lego">1. Cropping 부분.(lego)</h2>
<pre><code class="language-python">if N_rand is not None:
   rays_o, rays_d = get_rays(H, W, K, torch.Tensor(pose))  # (H, W, 3), (H, W, 3)

   if i &lt; args.precrop_iters:
        dH = int(H//2 * args.precrop_frac)
        dW = int(W//2 * args.precrop_frac)
        coords = torch.stack(
                 torch.meshgrid(
                 torch.linspace(H//2 - dH, H//2 + dH - 1, 2*dH), 
                 torch.linspace(W//2 - dW, W//2 + dW - 1, 2*dW)
                 ), -1)
        if i == start:
            print(f&quot;[Config] Center cropping of size {2*dH} x {2*dW} is enabled until iter {args.precrop_iters}&quot;)                
   else:
       coords = torch.stack(torch.meshgrid(torch.linspace(0, H-1, H), torch.linspace(0, W-1, W)), -1)  # (H, W, 2)</code></pre>
<p>rays_o and rays_d are returned from get_rays.
Then, when i &lt; args.precrop_iters is true,
a centered crop whose area is 1/4 of the original [H,W,3] image is indexed.</p>
<p>e.g. For lego.blend, center cropping is applied early in training. The original image is [400,400,3].
While i &lt; args.precrop_iters (i.e., until precrop_iters, early training focuses on the center of the image),
<img src="https://velog.velcdn.com/images/coma_403/post/b0d876ce-e19d-4fd5-b017-4e7f0873b792/image.png" alt="">coords is indexed to the central <span style="color: red">square region</span> of the [400,400,3] image, as shown!</p>
<blockquote>
<ul>
<li>As in the figure below, N_rand pixels are randomly selected from the cropped [200 $\times$ 200] region, turned into rays, and fed into the NeRF network as input.<blockquote>
<p><img src="https://velog.velcdn.com/images/coma_403/post/15804ace-45d4-4f6b-a7f5-f79b95c98a80/image.png" alt=""></p>
</blockquote>
</li>
</ul>
</blockquote>
<h2 id="2-ray-random-sampling">2. Ray random sampling.</h2>
<pre><code class="language-python">coords = torch.reshape(coords, [-1,2])  # (H * W, 2)
select_inds = np.random.choice(coords.shape[0], size=[N_rand], replace=False)  # (N_rand,)
select_coords = coords[select_inds].long()  # (N_rand, 2)
rays_o = rays_o[select_coords[:, 0], select_coords[:, 1]]  # (N_rand, 3)
rays_d = rays_d[select_coords[:, 0], select_coords[:, 1]]  # (N_rand, 3)
batch_rays = torch.stack([rays_o, rays_d], 0)
target_s = target[select_coords[:, 0], select_coords[:, 1]]  # (N_rand, 3)
</code></pre>
<p>Early in training, N_rand was taken in from the parser as a user-specified value. N_rand is the number of random rays, and it is used in the code above.
torch.reshape(coords, [-1,2]) changes the [H, W, 2] shape into [H * W, 2], and np.random.choice draws N_rand numbers at random from (0 ~ H * W). These numbers are ray indices, and rays_o, rays_d, and target are selected from the chosen indices.</p>
<blockquote>
<p>rays_o: ray origin position
rays_d: ray direction
target_s: the RGB pixel values of the image at the selected indices</p>
</blockquote>
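<p>A toy version of the pixel selection (my own sketch with small H, W, and N_rand):</p>
<pre><code class="language-python">import numpy as np
import torch

H, W, N_rand = 4, 4, 3
coords = torch.stack(torch.meshgrid(torch.linspace(0, H-1, H), torch.linspace(0, W-1, W)), -1)
coords = torch.reshape(coords, [-1, 2])     # (H*W, 2) pixel coordinates
select_inds = np.random.choice(coords.shape[0], size=[N_rand], replace=False)
select_coords = coords[select_inds].long()  # N_rand random (row, col) pairs
print(select_coords.shape)                  # torch.Size([3, 2])</code></pre>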
]]></description>
        </item>
    </channel>
</rss>