<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
        <title>HeyHo.log</title>
        <link>https://velog.io/</link>
        <description>Computer vision, AI</description>
        <lastBuildDate>Tue, 07 Mar 2023 06:20:19 GMT</lastBuildDate>
        <docs>https://validator.w3.org/feed/docs/rss2.html</docs>
        <generator>https://github.com/jpmonette/feed</generator>
        <image>
            <title>HeyHo.log</title>
            <url>https://velog.velcdn.com/images/coma_403/profile/776b7d6f-e409-44f2-89b8-77db116dc50a/image.png</url>
            <link>https://velog.io/</link>
        </image>
        <copyright>Copyright (C) 2019. HeyHo.log. All rights reserved.</copyright>
        <atom:link href="https://v2.velog.io/rss/coma_403" rel="self" type="application/rss+xml"/>
        <item>
            <title><![CDATA[Freezing the weights of specific layers]]></title>
            <link>https://velog.io/@coma_403/%ED%8A%B9%EC%A0%95-Layer%EC%9D%98-weight%EB%A5%BC-%EA%B3%A0%EC%A0%95%EC%8B%9C%ED%82%A4%EA%B8%B0</link>
            <guid>https://velog.io/@coma_403/%ED%8A%B9%EC%A0%95-Layer%EC%9D%98-weight%EB%A5%BC-%EA%B3%A0%EC%A0%95%EC%8B%9C%ED%82%A4%EA%B8%B0</guid>
            <pubDate>Tue, 07 Mar 2023 06:20:19 GMT</pubDate>
            <description><![CDATA[<pre><code class="language-python">import torch
import torch.nn as nn
import torch.optim as optim

# Load a pre-trained model
model = torch.hub.load(&#39;pytorch/vision&#39;, &#39;resnet18&#39;, pretrained=True)

# Freeze all of the pretrained weights
for param in model.parameters():
    param.requires_grad = False

# Then unfreeze only the last residual stage (layer4) for fine-tuning
for param in model.layer4.parameters():
    param.requires_grad = True

# Replace the classifier head (a newly created layer is trainable by default)
num_features = model.fc.in_features
model.fc = nn.Linear(num_features, 2)

# Define a loss function
criterion = nn.CrossEntropyLoss()

# Define an optimizer
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

# Train the model (num_epochs, inputs, and labels are assumed to be defined elsewhere)
for epoch in range(num_epochs):
    # Forward pass
    outputs = model(inputs)
    loss = criterion(outputs, labels)

    # Backward pass for unfrozen layers
    optimizer.zero_grad()
    loss.backward()

    # Update the weights for unfrozen layers
    optimizer.step()
</code></pre>
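<p>A small follow-up sketch (my addition, not part of the original snippet): since the frozen parameters have requires_grad=False, the optimizer can be given only the trainable parameters so it does not track the frozen ones at all.</p>
<pre><code class="language-python"># Continuing from the snippet above: hand the optimizer only the unfrozen parameters
trainable_params = [p for p in model.parameters() if p.requires_grad]
optimizer = optim.SGD(trainable_params, lr=0.001, momentum=0.9)</code></pre>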
]]></description>
        </item>
        <item>
            <title><![CDATA[TensoRF summary]]></title>
            <link>https://velog.io/@coma_403/tensorf-%EC%A0%95%EB%A6%AC</link>
            <guid>https://velog.io/@coma_403/tensorf-%EC%A0%95%EB%A6%AC</guid>
            <pubDate>Tue, 07 Feb 2023 06:46:15 GMT</pubDate>
            <description><![CDATA[<pre><code>
TensorVMSplit(
  (density_plane): ParameterList(
      (0): Parameter containing: [torch.float32 of size 1x16x100x100 (GPU 0)]
      (1): Parameter containing: [torch.float32 of size 1x16x100x100 (GPU 0)]
      (2): Parameter containing: [torch.float32 of size 1x16x100x100 (GPU 0)]
  )
  (density_line): ParameterList(
      (0): Parameter containing: [torch.float32 of size 1x16x100x1 (GPU 0)]
      (1): Parameter containing: [torch.float32 of size 1x16x100x1 (GPU 0)]
      (2): Parameter containing: [torch.float32 of size 1x16x100x1 (GPU 0)]
  )
  (app_plane): ParameterList(
      (0): Parameter containing: [torch.float32 of size 1x48x100x100 (GPU 0)]
      (1): Parameter containing: [torch.float32 of size 1x48x100x100 (GPU 0)]
      (2): Parameter containing: [torch.float32 of size 1x48x100x100 (GPU 0)]
  )
  (app_line): ParameterList(
      (0): Parameter containing: [torch.float32 of size 1x48x100x1 (GPU 0)]
      (1): Parameter containing: [torch.float32 of size 1x48x100x1 (GPU 0)]
      (2): Parameter containing: [torch.float32 of size 1x48x100x1 (GPU 0)]
  )
  (basis_mat): Linear(in_features=144, out_features=27, bias=False)
  (renderModule): MLPRender_Fea(
    (mlp): Sequential(
      (0): Linear(in_features=150, out_features=128, bias=True)
      (1): ReLU(inplace=True)
      (2): Linear(in_features=128, out_features=128, bias=True)
      (3): ReLU(inplace=True)
      (4): Linear(in_features=128, out_features=3, bias=True)
    )
  )
)</code></pre><p>For the first nn.Linear in renderModule, the input channel count is
2 $\times$ viewpe $\times$ 3 +
2 $\times$ feape $\times$ P(=27) +
3 + P(=27),
and the nn.Linear is built with that many input channels.</p>
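<p>A quick sanity check of that formula (my own sketch; viewpe=2 and feape=2 are assumed values that match the 150 input features in the printout above):</p>
<pre><code class="language-python"># Hypothetical sanity check of the first MLPRender_Fea layer width
viewpe, feape, P = 2, 2, 27  # assumed; P matches basis_mat out_features
in_features = 2 * viewpe * 3 + 2 * feape * P + 3 + P
print(in_features)  # 150, matching Linear(in_features=150, ...) above</code></pre>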
]]></description>
        </item>
        <item>
            <title><![CDATA[WGAN]]></title>
            <link>https://velog.io/@coma_403/WGAN</link>
            <guid>https://velog.io/@coma_403/WGAN</guid>
            <pubDate>Thu, 08 Dec 2022 06:01:54 GMT</pubDate>
            <description><![CDATA[<h1 id="--wgan">- WGAN</h1>
<ul>
<li>WGAN proposes using the Wasserstein distance to measure the distance between two distributions.</li>
</ul>
<h2 id="1-why-wasserstein-is-better-than-js-or-kl">1. Why Wasserstein is better than Js or KL?</h2>
<h3 id="1-suppose-that-we-have-two-probaility-distribution-p-and-q">(1) suppose that we have two probaility distribution P and Q</h3>
<p>다음과 같이 distribution이 겹치지 않는 두 개의 joint propability distribution이 있다고 가정해보자.
<img src="https://velog.velcdn.com/images/coma_403/post/e6fdde4b-4e0b-4e9e-95f8-b0032571acb1/image.png" width="40%" height="50%"></p>
<ul>
<li>$\forall(x, y) \in P, x=0$ and $y \sim U(0,1)$</li>
<li>$\forall(x, y) \in Q, x=\theta, 0 \leq \theta \leq 1$, and $y \sim U(0,1)$<h3 id="2-when-theta-neq-0">(2) when $\theta \neq 0$</h3>
Computing the KL and JS divergences under these conditions:</li>
<li>$D_{KL}(P \| Q)=\sum_{x=0, y \sim U(0,1)} 1 \cdot \log \frac{1}{0}=+\infty$</li>
<li>$D_{KL}(Q \| P)=\sum_{x=\theta, y \sim U(0,1)} 1 \cdot \log \frac{1}{0}=+\infty$</li>
<li>$D_{JS}(P, Q)=\frac{1}{2}\left(\sum_{x=0, y \sim U(0,1)} 1 \cdot \log \frac{1}{\frac{1}{2}}+\sum_{x=\theta, y \sim U(0,1)} 1 \cdot \log \frac{1}{\frac{1}{2}}\right)=\log 2$,
so the JS divergence stays at $\log 2$ regardless of $\theta$ and its gradient is 0.</li>
</ul>
<p>For the Wasserstein distance, however, the optimal way to bring the two distributions together is to move one along the line, so: </p>
<ul>
<li>$W(P,Q) = |\theta|$</li>
</ul>
<h3 id="3-when-theta--0">(3) when $\theta = 0$</h3>
<ul>
<li>$D_{KL}(P \| Q)=D_{KL}(Q \| P)=D_{JS}(P, Q)=0$</li>
<li>$W(P,Q) = 0 = |\theta|$</li>
</ul>
<p>Comparing cases (2) and (3), the Wasserstein distance is a much <span style="color:red">smoother</span> distance measure than the KL and JS divergences.</p>
<h2 id="2-kantorovich-rubinstein-duality">2. Kantorovich-Rubinstein duality</h2>
<p>So far we have seen that, when two distributions do not overlap, the Wasserstein distance provides far more meaningful information as a distance measure than the KL and JS divergences.
However, in $W\left(p_r, p_\theta\right)=\inf_{\gamma \sim \Pi\left(p_r, p_\theta\right)} \mathbb{E}_{(x, y) \sim \gamma}[\|x-y\|]$,
${\gamma \sim \Pi\left(p_r, p_\theta\right)}$ ranges over the set of all possible joint distributions,
so computing the Wasserstein distance by <span style="color:red">considering all of them</span> is practically <span style="color:red">impossible</span>.
To solve this primal problem, the <span style="color:red">Kantorovich-Rubinstein duality</span> appears, which converts it into a <span style="color:red">dual problem</span>.</p>
<h3 id="1-highly-intractable-term-in-inf">(1) Highly intractable term in inf</h3>
<p>$W\left(\mathbb{P}_r, \mathbb{P}_g\right)=\inf_{\gamma \in \Pi\left(\mathbb{P}_r, \mathbb{P}_g\right)} \mathbb{E}_{(x, y) \sim \gamma}[\|x-y\|]$ is highly intractable.
Therefore, via the <strong>Kantorovich-Rubinstein duality</strong>, the problem is solved through the following dual problem, using &#39;some&#39; function $f: X \rightarrow R$ that satisfies a Lipschitz continuity condition.</p>
<ul>
<li>$W\left(p_r, p_\theta\right)=\frac{1}{K} \sup_{\|f\|_{L} \leq K} \mathbb{E}_{x \sim p_r}[f(x)]-\mathbb{E}_{x \sim p_\theta}[f(x)]$</li>
</ul>
<h4 id="이렇게-wasserstein-distance를-duaility를-통해서-다르게-정의할-수-있다-그렇다면-f는-무엇인가">이렇게 Wasserstein distance를 duaility를 통해서 다르게 정의할 수 있다. 그렇다면 $f$는 무엇인가?</h4>
<h3 id="2-find-optimal-fx">(2) Find optimal $f(x)$</h3>
<h4 id="f는-말-그대로-mathbbex-sim-p_rfx-mathbbex-sim-p_thetafx의-값을-최대로-만족하는-어떠한-function-f이다">$f$는 말 그대로, $\mathbb{E}<em>{x \sim p_r}[f(x)]-\mathbb{E}</em>{x \sim p_\theta}[f(x)]$의 값을 최대로 만족하는 &#39;어떠한&#39; function $f$이다.</h4>
<p>이러한 <span style="color:red">$f$</span>를 잘 <span style="color:red"><strong>&#39;추정&#39;</strong></span> 하기 위해서 parameter $w$를 가지는 neural network를 사용하여 다음과 같은 수식을 만족시키는 $f_w$를 추정해준다. (neural network 는 universal function approximator 이기 때문에 neural net을 사용하여 $f$를 추정한다.)
$\max <em>{w \in W} \mathbb{E}</em>{x \sim p_r}\left[f_w(x)\right]-\mathbb{E}<em>{x \sim p_\theta}\left[f_w(x)\right] \leq \sup _{|f|</em>{L \leq K}} \mathbb{E}<em>{x \sim p_r}[f(x)]-\mathbb{E}</em>{x \sim p_\theta}[f(x)]=K \cdot W(P_r, P_\theta)$</p>
<p>Since we only need a good estimate of the $f(x)$ attaining the sup, we do not need to know $K$ in detail. </p>
<ul>
<li><p>To find the $f_w(x)$ achieving $\max_{w \in W} \mathbb{E}_{x \sim p_r}\left[f_w(x)\right]-\mathbb{E}_{x \sim p_\theta}\left[f_w(x)\right]$, we compute its gradient.</p>
</li>
<li><p>Using $\nabla_w[f_w(x)-f_w(g_\theta(z))]$, we update the parameters $w$ of $f_w(x)$, which estimates the solution $f(x)$ of the sup problem.</p>
</li>
</ul>
<h3 id="3-generator-update-process">(3) Generator update process</h3>
<h4 id="neural-network를-통해서-f_w를-구한-다음-generator의-parameter-theta를-update-시켜준다">Neural network를 통해서 $f_w$를 구한 다음, Generator의 parameter $\theta$를 update 시켜준다.</h4>
<p>$\begin{aligned} \nabla_\theta W\left(p_r, p_g\right) &amp;=\nabla_\theta\left(\mathbb{E}_{x \sim p_r}\left[f_w(x)\right]-\mathbb{E}_{z \sim Z}\left[f_w\left(g_\theta(z)\right)\right]\right) \\ &amp;=-\mathbb{E}_{z \sim Z}\left[\nabla_\theta f_w\left(g_\theta(z)\right)\right]\end{aligned}$</p>
<h3 id="4-weight-clipping">(4) Weight Clipping</h3>
<p>Recall the original sup problem: the <strong>Kantorovich-Rubinstein duality</strong> takes the primal problem to a dual problem, under the condition that $f$ is Lipschitz continuous. When we estimate $f_w(x)$ with a neural network, $f_w(x)$ must also satisfy Lipschitz continuity. Since the gradient of $f_w$ with respect to its input is determined by the network&#39;s weights, clipping the weights bounds that gradient, which by itself enforces Lipschitz continuity. Therefore, weight clipping is used to satisfy the Lipschitz continuity constraint.</p>
<h3 id="5-total-training-process">(5) Total training process</h3>
<p>Pseudo code of the overall process:</p>
<ol>
<li>First, convert the Earth Mover&#39;s distance optimization problem into its dual problem. Then, with $\theta$ fixed, find $f_w(x)$, the approximate solution of the dual problem, through training.</li>
<li>Backprop the Wasserstein distance to update the generator&#39;s parameters.<img src="https://velog.velcdn.com/images/coma_403/post/564e4612-3c78-40b6-a2cd-3a7924ded417/image.png" width="80%" height="50%">
</li>
</ol>
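<p>To make the pseudo code concrete, below is a minimal WGAN sketch with weight clipping (my own illustration, not the official implementation; the critic and generator architectures, the toy data, and all hyperparameters are assumptions):</p>
<pre><code class="language-python">import torch
import torch.nn as nn

z_dim, clip_value, n_critic = 64, 0.01, 5

critic = nn.Sequential(nn.Linear(2, 128), nn.ReLU(), nn.Linear(128, 1))         # f_w
generator = nn.Sequential(nn.Linear(z_dim, 128), nn.ReLU(), nn.Linear(128, 2))  # g_theta

# The paper uses RMSProp rather than momentum-based optimizers
opt_c = torch.optim.RMSprop(critic.parameters(), lr=5e-5)
opt_g = torch.optim.RMSprop(generator.parameters(), lr=5e-5)

for step in range(1000):
    # 1. Train the critic n_critic times: approximate sup E[f_w(x)] - E[f_w(g(z))]
    for _ in range(n_critic):
        real = torch.randn(256, 2) + 3.0                       # stand-in for p_r samples
        fake = generator(torch.randn(256, z_dim)).detach()
        loss_c = -(critic(real).mean() - critic(fake).mean())
        opt_c.zero_grad()
        loss_c.backward()
        opt_c.step()
        # Weight clipping: enforce the Lipschitz constraint on f_w
        for p in critic.parameters():
            p.data.clamp_(-clip_value, clip_value)
    # 2. Train the generator: descend -E[f_w(g_theta(z))]
    fake = generator(torch.randn(256, z_dim))
    loss_g = -critic(fake).mean()
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()</code></pre>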
]]></description>
        </item>
        <item>
            <title><![CDATA[Lipschitz Continuity Visualization]]></title>
            <link>https://velog.io/@coma_403/Lipshitz-Continous-Visualization</link>
            <guid>https://velog.io/@coma_403/Lipshitz-Continous-Visualization</guid>
            <pubDate>Thu, 08 Dec 2022 03:26:41 GMT</pubDate>
            <description><![CDATA[<p><a href="https://www.desmos.com/calculator/dobs3sfeiv?lang=ko">https://www.desmos.com/calculator/dobs3sfeiv?lang=ko</a></p>
]]></description>
        </item>
        <item>
            <title><![CDATA[One-line summary - VAE]]></title>
            <link>https://velog.io/@coma_403/%ED%95%9C%EC%A4%84-%EC%A0%95%EB%A6%AC-VAE</link>
            <guid>https://velog.io/@coma_403/%ED%95%9C%EC%A4%84-%EC%A0%95%EB%A6%AC-VAE</guid>
            <pubDate>Wed, 07 Dec 2022 05:42:26 GMT</pubDate>
            <description><![CDATA[<p>VAE was the dominant model in image generation in the era before GANs.</p>
<p align="center">
  <img src="https://velog.velcdn.com/images/coma_403/post/a4d6c0a6-8d9a-4264-a0d4-bdbabbe35a30/image.png" alt="VAE architecture">
The VAE architecture.
</p>

<p>One-line summary: the encoder sends the input data x into a latent space and computes the latent mean and variance. From the mean and variance, z is obtained using a stochastic sample $\epsilon \sim N(0,1)$. That z is then fed into the decoder to reconstruct the input data x.</p>
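<p>A minimal sketch of that one-line summary (my own illustration; the layer sizes and the 784-dimensional input are arbitrary assumptions):</p>
<pre><code class="language-python">import torch
import torch.nn as nn

class TinyVAE(nn.Module):
    def __init__(self, x_dim=784, z_dim=16):
        super().__init__()
        self.enc = nn.Linear(x_dim, 128)
        self.mu = nn.Linear(128, z_dim)      # mean of the latent distribution
        self.logvar = nn.Linear(128, z_dim)  # log variance of the latent distribution
        self.dec = nn.Sequential(nn.Linear(z_dim, 128), nn.ReLU(), nn.Linear(128, x_dim))

    def forward(self, x):
        h = torch.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        eps = torch.randn_like(mu)              # stochastic sample, eps ~ N(0, 1)
        z = mu + torch.exp(0.5 * logvar) * eps  # reparameterization trick
        return self.dec(z), mu, logvar          # reconstruction of x, plus mu/logvar

x_hat, mu, logvar = TinyVAE()(torch.rand(8, 784))</code></pre>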
]]></description>
        </item>
        <item>
            <title><![CDATA[Markdown math reference]]></title>
            <link>https://velog.io/@coma_403/Markdown-%EC%88%98%EC%8B%9D-%EB%AA%A8%EC%9D%8C</link>
            <guid>https://velog.io/@coma_403/Markdown-%EC%88%98%EC%8B%9D-%EB%AA%A8%EC%9D%8C</guid>
            <pubDate>Fri, 02 Dec 2022 07:52:18 GMT</pubDate>
            <description><![CDATA[<p><a href="https://ko.wikipedia.org/wiki/%EC%9C%84%ED%82%A4%EB%B0%B1%EA%B3%BC:TeX_%EB%AC%B8%EB%B2%95">https://ko.wikipedia.org/wiki/%EC%9C%84%ED%82%A4%EB%B0%B1%EA%B3%BC:TeX_%EB%AC%B8%EB%B2%95</a>
The syntax needed to write Markdown or TeX equations can be found here.</p>
]]></description>
        </item>
        <item>
            <title><![CDATA[Stanford CS230 Career Advice]]></title>
            <link>https://velog.io/@coma_403/Stanford-CS230-Career-Advice</link>
            <guid>https://velog.io/@coma_403/Stanford-CS230-Career-Advice</guid>
            <pubDate>Fri, 02 Dec 2022 05:03:51 GMT</pubDate>
            <description><![CDATA[<p><a href="https://youtu.be/733m6qBH-jI">https://youtu.be/733m6qBH-jI</a></p>
<p>Career advice in a regular course lecture... top schools really are on another level...</p>
]]></description>
        </item>
        <item>
            <title><![CDATA[GPU memory imbalance when using nn.DataParallel]]></title>
            <link>https://velog.io/@coma_403/nn.dataparrel-%EC%82%AC%EC%9A%A9%EC%8B%9C-GPU-%EB%AA%B0%EB%A6%BC-%ED%98%84%EC%83%81</link>
            <guid>https://velog.io/@coma_403/nn.dataparrel-%EC%82%AC%EC%9A%A9%EC%8B%9C-GPU-%EB%AA%B0%EB%A6%BC-%ED%98%84%EC%83%81</guid>
            <pubDate>Wed, 30 Nov 2022 12:15:51 GMT</pubDate>
            <description><![CDATA[<p>I tried distributing NeRF across GPUs using &#39;model = nn.DataParallel(model).to(device)&#39;.</p>
<ul>
<li>Problem
However, memory piled up on a single GPU, as shown below.
<img src="https://velog.velcdn.com/images/coma_403/post/7941e4a3-7241-4370-8c87-fd676c2a07ca/image.png" alt=""></li>
</ul>
<p>The model&#39;s weights may be used in a distributed way, but the loss value is reportedly computed concentrated on one GPU.</p>
<ul>
<li>Solution
The loss value can reportedly also be computed in parallel across the GPUs; here are links on the topic, and a sketch of the idea follows after this list.
[Link 1] - <a href="https://medium.com/daangn/pytorch-multi-gpu-%ED%95%99%EC%8A%B5-%EC%A0%9C%EB%8C%80%EB%A1%9C-%ED%95%98%EA%B8%B0-27270617936b">https://medium.com/daangn/pytorch-multi-gpu-%ED%95%99%EC%8A%B5-%EC%A0%9C%EB%8C%80%EB%A1%9C-%ED%95%98%EA%B8%B0-27270617936b</a>
[Link 2] - <a href="https://aigong.tistory.com/186">https://aigong.tistory.com/186</a></li>
</ul>
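<p>One workaround discussed in those links, sketched here under my own assumptions (a stand-in model and MSE loss): compute the loss inside forward, so that nn.DataParallel scatters the loss computation across the GPUs as well and only scalar per-replica losses are gathered on GPU 0.</p>
<pre><code class="language-python">import torch
import torch.nn as nn

# Sketch: wrap model + criterion so the loss is computed on each replica
class ModelWithLoss(nn.Module):
    def __init__(self, model):
        super().__init__()
        self.model = model
        self.criterion = nn.MSELoss()

    def forward(self, x, target):
        pred = self.model(x)
        return self.criterion(pred, target)  # one scalar per replica

model = nn.Sequential(nn.Linear(10, 10))     # stand-in model
wrapped = nn.DataParallel(ModelWithLoss(model)).to(0)
inputs = torch.rand(64, 10).to(0)
targets = torch.rand(64, 10).to(0)
loss = wrapped(inputs, targets).mean()       # average the gathered per-replica losses
loss.backward()</code></pre>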
]]></description>
        </item>
        <item>
            <title><![CDATA[A site that converts equation images to text]]></title>
            <link>https://velog.io/@coma_403/%EC%88%98%EC%8B%9D-%EC%9D%B4%EB%AF%B8%EC%A7%80%EB%A5%BC-text%EB%A1%9C-%EB%B3%80%ED%99%98%EC%8B%9C%EC%BC%9C%EC%A3%BC%EB%8A%94-%EC%82%AC%EC%9D%B4%ED%8A%B8</link>
            <guid>https://velog.io/@coma_403/%EC%88%98%EC%8B%9D-%EC%9D%B4%EB%AF%B8%EC%A7%80%EB%A5%BC-text%EB%A1%9C-%EB%B3%80%ED%99%98%EC%8B%9C%EC%BC%9C%EC%A3%BC%EB%8A%94-%EC%82%AC%EC%9D%B4%ED%8A%B8</guid>
            <pubDate>Mon, 21 Nov 2022 04:13:12 GMT</pubDate>
            <description><![CDATA[<p><a href="https://snip.mathpix.com/">https://snip.mathpix.com/</a>
Extremely useful.
<img src="https://velog.velcdn.com/images/coma_403/post/08edc0fd-e9e3-49c1-b781-f9dc5615db48/image.png" alt=""></p>
<ul>
<li>Just upload an image containing an equation and it converts it into text, LaTeX, and various other formats. Genuinely convenient.</li>
</ul>
]]></description>
        </item>
        <item>
            <title><![CDATA[The difference between torch.unsqueeze() and torch.flatten()]]></title>
            <link>https://velog.io/@coma_403/torch.unsqueeze%EC%99%80-torch.flatten%EC%9D%98-%EC%B0%A8%EC%9D%B4</link>
            <guid>https://velog.io/@coma_403/torch.unsqueeze%EC%99%80-torch.flatten%EC%9D%98-%EC%B0%A8%EC%9D%B4</guid>
            <pubDate>Wed, 16 Nov 2022 11:51:13 GMT</pubDate>
            <description><![CDATA[<ol>
<li>A plain torch array</li>
</ol>
<ul>
<li>Code<pre><code class="language-python">import torch
x = torch.linspace(-1,1,20)
x.shape</code></pre>
</li>
<li>Result<img src="https://velog.velcdn.com/images/coma_403/post/04214f29-9f7d-468d-af93-69e76537221b/image.png" alt=""></li>
</ul>
<ol start="2">
<li>torch.unsqueeze()</li>
</ol>
<ul>
<li>Code<pre><code class="language-python">import torch
x = torch.linspace(-1,1,20).unsqueeze(dim=1)
x.shape</code></pre>
</li>
<li>Result
<img src="https://velog.velcdn.com/images/coma_403/post/2e51773e-717d-4b38-8dbe-4dd6933213a4/image.png" alt=""></li>
</ul>
<ol start="3">
<li>torch.flatten()</li>
</ol>
<ul>
<li>코드<pre><code class="language-python">x.flatten()
x.shape</code></pre>
</li>
<li>Result
<img src="https://velog.velcdn.com/images/coma_403/post/ab1e1e47-3fdc-41c7-994e-a71fd94fc527/image.png" alt=""></li>
</ul>
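<p>Since the results above are screenshots, here is the same comparison with the shapes written out (my summary of the three outputs):</p>
<pre><code class="language-python">import torch

x = torch.linspace(-1, 1, 20)
print(x.shape)                             # torch.Size([20])
print(x.unsqueeze(dim=1).shape)            # torch.Size([20, 1]) - adds a size-1 dim
print(x.unsqueeze(dim=1).flatten().shape)  # torch.Size([20]) - back to 1-D</code></pre>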
]]></description>
        </item>
        <item>
            <title><![CDATA[Why do we use optimizer.zero_grad()?]]></title>
            <link>https://velog.io/@coma_403/optimizer.zerograd%EB%8A%94-%EC%99%9C-%EC%82%AC%EC%9A%A9%ED%95%98%EB%8A%94%EA%B0%80</link>
            <guid>https://velog.io/@coma_403/optimizer.zerograd%EB%8A%94-%EC%99%9C-%EC%82%AC%EC%9A%A9%ED%95%98%EB%8A%94%EA%B0%80</guid>
            <pubDate>Wed, 16 Nov 2022 09:31:52 GMT</pubDate>
            <description><![CDATA[<p>Why do we use optimizer.zero_grad()?
The answer is simple: to keep gradient values from accumulating when loss.backward() computes the gradients of tensors.
<img src="https://velog.velcdn.com/images/coma_403/post/e74dc683-f9ff-4e03-b427-49bbbf13819f/image.png" alt="">
After defining $a2 = 2 \times a1$ and running backward(), print(a1.grad) shows a1&#39;s gradient. You can see that a1.grad comes out as tensor(2.).</p>
<p>But if we run a2.backward() again in the same way,
<img src="https://velog.velcdn.com/images/coma_403/post/dbbfd5ba-81fe-497a-9660-c4253fa9ca3f/image.png" alt=""> a1&#39;s gradient now comes out as 4.
Since $a2 = 2 \times a1$, a1.grad should be 2, but because gradient accumulation was not blocked with optimizer.zero_grad(), a1.grad comes out as 4.</p>
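<p>The experiment in the screenshots can be reproduced with a few lines (a small sketch of the same setup):</p>
<pre><code class="language-python">import torch

a1 = torch.tensor(1.0, requires_grad=True)

a2 = 2 * a1
a2.backward()
print(a1.grad)  # tensor(2.)

a2 = 2 * a1
a2.backward()
print(a1.grad)  # tensor(4.) - accumulated, because .grad was never cleared

a1.grad = None  # roughly what optimizer.zero_grad() does for its parameters
a2 = 2 * a1
a2.backward()
print(a1.grad)  # tensor(2.) again</code></pre>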
]]></description>
        </item>
        <item>
            <title><![CDATA[Errors from mixing CPU and GPU in nn.Module]]></title>
            <link>https://velog.io/@coma_403/nn.module%EC%97%90%EC%84%9C-CPU-GPU-%ED%98%BC%ED%95%A9-%EC%82%AC%EC%9A%A9%EC%9C%BC%EB%A1%9C-%EC%9D%B8%ED%95%9C-error</link>
            <guid>https://velog.io/@coma_403/nn.module%EC%97%90%EC%84%9C-CPU-GPU-%ED%98%BC%ED%95%A9-%EC%82%AC%EC%9A%A9%EC%9C%BC%EB%A1%9C-%EC%9D%B8%ED%95%9C-error</guid>
            <pubDate>Wed, 16 Nov 2022 06:51:30 GMT</pubDate>
            <description><![CDATA[<pre><code class="language-python">#all tensors of each operation should be in same device
class Module(nn.Module):
  def __init__(self):
    super().__init__()
    self.network = nn.Sequential(nn.Linear(1000, 1000), nn.Linear(1000, 100))

  def forward(self, x):
    return self.network(x)</code></pre>
<p>With the module defined as above, I ran the following computation.</p>
<pre><code class="language-python">module = Module().to(0)
x = torch.zeros(1000, 1000).to(0)
module(x)</code></pre>
<p><img src="https://velog.velcdn.com/images/coma_403/post/8dff08db-18eb-4478-b12b-a5c4e4ad4ad0/image.png" alt="">
module에서 to(0)연산을 통해 module의 parameter를 GPU로 보냈지만, x는 여전히 cpu에 남아있기 때문에 
&#39;RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!&#39; 에러가 발생하였다.</p>
<pre><code class="language-python">module = Module().to(0)
x = torch.zeros(1000, 1000)
module(x)</code></pre>
<p><img src="https://velog.velcdn.com/images/coma_403/post/a72c3c0e-f14e-4013-8b30-2c768d2273de/image.png" alt=""></p>
<p>After likewise sending x to the GPU, the computation runs without any device error.</p>
]]></description>
        </item>
        <item>
            <title><![CDATA[Turning off the annoying autocomplete window in Colab]]></title>
            <link>https://velog.io/@coma_403/colab%EC%97%90%EC%84%9C-%EC%84%B1%EA%B0%80%EC%8B%A0-%EC%9E%90%EB%8F%99%EC%99%84%EC%84%B1-window-%EB%81%84%EA%B8%B0</link>
            <guid>https://velog.io/@coma_403/colab%EC%97%90%EC%84%9C-%EC%84%B1%EA%B0%80%EC%8B%A0-%EC%9E%90%EB%8F%99%EC%99%84%EC%84%B1-window-%EB%81%84%EA%B8%B0</guid>
            <pubDate>Wed, 16 Nov 2022 06:21:36 GMT</pubDate>
            <description><![CDATA[<p><img src="https://velog.velcdn.com/images/coma_403/post/05443729-f621-4641-a864-d246d1441410/image.png" alt=""></p>
<ul>
<li>When coding in Colab, an autocomplete window pops up with inline documentation and covers the code above it, as shown here. It is quite annoying, but I did not know what this window was called, so I did not even know how to search for a fix.
Searching Google for &#39;colab annoying window&#39; immediately turned up the answer...
<img src="https://velog.velcdn.com/images/coma_403/post/0062d4ae-da2e-4271-9e0a-1cb879bd00c6/image.png" alt=""></li>
</ul>
<p align = "center">
Google really does deliver...
</p>

<p><img src="https://velog.velcdn.com/images/coma_403/post/766de1ae-9ab5-4ff7-96f9-9896f5e12066/image.png" alt=""></p>
<h4 id="1-다음과-같이-상단에-톱니바퀴-모양의-설정-메뉴를-클릭한다">1. 다음과 같이 상단에 톱니바퀴 모양의 설정 메뉴를 클릭한다.</h4>
<p><img src="https://velog.velcdn.com/images/coma_403/post/0e9c4e85-7f8a-4266-bfa0-d1a52612b8b7/image.png" alt=""></p>
<h4 id="2-그-다음-편집기에서-코드-완성-제안을-자동으로-표시-기능을-꺼주도록-한다">2. 그 다음, 편집기에서 &#39;코드 완성 제안을 자동으로 표시&#39; 기능을 꺼주도록 한다.</h4>
<h4 id="출처--httpsstackoverflowcomquestions63696360google-colab-how-to-turn-off-suggestion-window">출처 : <a href="https://stackoverflow.com/questions/63696360/google-colab-how-to-turn-off-suggestion-window">https://stackoverflow.com/questions/63696360/google-colab-how-to-turn-off-suggestion-window</a></h4>
]]></description>
        </item>
        <item>
            <title><![CDATA[Computing midpoints between points]]></title>
            <link>https://velog.io/@coma_403/point%EB%93%A4-%EC%82%AC%EC%9D%B4%EC%9D%98-%EC%A4%91%EC%A0%90-%EA%B3%84%EC%82%B0%ED%95%98%EA%B8%B0</link>
            <guid>https://velog.io/@coma_403/point%EB%93%A4-%EC%82%AC%EC%9D%B4%EC%9D%98-%EC%A4%91%EC%A0%90-%EA%B3%84%EC%82%B0%ED%95%98%EA%B8%B0</guid>
            <pubDate>Tue, 08 Nov 2022 04:42:15 GMT</pubDate>
            <description><![CDATA[<pre><code class="language-python">import torch

x = torch.tensor([1.0,2.0,4.0,10.0,12])
x_mids = 0.5*(a[1:] + a[:-1])
y = [1,1,1,1,1]
y_mids = [1,1,1,1]

plt.plot(x,y, &#39;ro&#39;, label=&#39;org_points&#39;)
plt.plot(x_mids,y_mids, &#39;b*&#39;, label=&#39;mids&#39;)
plt.legend()
plt.show()</code></pre>
<p><img src="https://velog.velcdn.com/images/coma_403/post/76e53d05-753a-4871-ba16-5d5563b2dd14/image.png" alt="">
This computes the midpoints between the given points.</p>
]]></description>
        </item>
        <item>
            <title><![CDATA[NeRF Code Review - def raw2outputs]]></title>
            <link>https://velog.io/@coma_403/NeRF-Code-Review-def-raw2outputs</link>
            <guid>https://velog.io/@coma_403/NeRF-Code-Review-def-raw2outputs</guid>
            <pubDate>Fri, 04 Nov 2022 08:42:49 GMT</pubDate>
            <description><![CDATA[<blockquote>
<p>This lives in the run_nerf.py file.</p>
</blockquote>
<blockquote>
<p>Input</p>
<blockquote>
<p>raw: [N_rand, N_samples, 3+1], the RGB$\sigma$ output estimated by the NeRF network
z_vals: [N_rand, N_samples], the code calls this &#39;integration time&#39; - what is that exactly?
rays_d: [N_rand, 3] direction of each ray
white_bkgd: white-background flag
pytest -&gt; and what is this?</p>
</blockquote>
</blockquote>
<blockquote>
<p>Additional variable notes</p>
<blockquote>
<p>rgb_map: [N_rand, 3] estimated RGB color of a ray
disp_map: [N_rand] disparity map (inverse of the depth map, per the docstring)</p>
</blockquote>
</blockquote>
<h2 id="전체-코드">전체 코드</h2>
<pre><code class="language-python">def raw2outputs(raw, z_vals, rays_d, raw_noise_std=0, white_bkgd=False, pytest=False):
    &quot;&quot;&quot;Transforms model&#39;s predictions to semantically meaningful values.
    Args:
        raw: [num_rays, num_samples along ray, 4]. Prediction from model.
        z_vals: [num_rays, num_samples along ray]. Integration time.
        rays_d: [num_rays, 3]. Direction of each ray.
    Returns:
        rgb_map: [num_rays, 3]. Estimated RGB color of a ray.
        disp_map: [num_rays]. Disparity map. Inverse of depth map.
        acc_map: [num_rays]. Sum of weights along each ray.
        weights: [num_rays, num_samples]. Weights assigned to each sampled color.
        depth_map: [num_rays]. Estimated distance to object.
    &quot;&quot;&quot;
    raw2alpha = lambda raw, dists, act_fn=F.relu: 1.-torch.exp(-act_fn(raw)*dists)

    dists = z_vals[...,1:] - z_vals[...,:-1]
    dists = torch.cat([dists, torch.Tensor([1e10]).expand(dists[...,:1].shape)], -1)  # [N_rays, N_samples]

    dists = dists * torch.norm(rays_d[...,None,:], dim=-1)

    rgb = torch.sigmoid(raw[...,:3])  # [N_rays, N_samples, 3]
    noise = 0.
    if raw_noise_std &gt; 0.:
        noise = torch.randn(raw[...,3].shape) * raw_noise_std

        # Overwrite randomly sampled data if pytest
        if pytest:
            np.random.seed(0)
            noise = np.random.rand(*list(raw[...,3].shape)) * raw_noise_std
            noise = torch.Tensor(noise)

    alpha = raw2alpha(raw[...,3] + noise, dists)  # [N_rays, N_samples]
    # weights = alpha * tf.math.cumprod(1.-alpha + 1e-10, -1, exclusive=True)
    weights = alpha * torch.cumprod(torch.cat([torch.ones((alpha.shape[0], 1)), 1.-alpha + 1e-10], -1), -1)[:, :-1]
    rgb_map = torch.sum(weights[...,None] * rgb, -2)  # [N_rays, 3]

    depth_map = torch.sum(weights * z_vals, -1)
    disp_map = 1./torch.max(1e-10 * torch.ones_like(depth_map), depth_map / torch.sum(weights, -1))
    acc_map = torch.sum(weights, -1)

    if white_bkgd:
        rgb_map = rgb_map + (1.-acc_map[...,None])

    return rgb_map, disp_map, acc_map, weights, depth_map</code></pre>
<h3 id="1-alpha-dists-구하기">1. alpha, dists 구하기</h3>
<pre><code class="language-python">    raw2alpha = lambda raw, dists, act_fn=F.relu: 1.-torch.exp(-act_fn(raw)*dists)

    dists = z_vals[...,1:] - z_vals[...,:-1]
    dists = torch.cat([dists, torch.Tensor([1e10]).expand(dists[...,:1].shape)], -1)  # [N_rays, N_samples]

    dists = dists * torch.norm(rays_d[...,None,:], dim=-1)

    rgb = torch.sigmoid(raw[...,:3])  # [N_rays, N_samples, 3]</code></pre>
<p align="center">
  <img src=https://velog.velcdn.com/images/coma_403/post/660e1275-b8c1-4025-b5b2-2e573a3c42ae/image.png> 
  <img src = https://velog.velcdn.com/images/coma_403/post/0bcc2985-be1a-492e-b087-22a7552a7a80/image.png width="500"</p>


<ul>
<li>raw2alpha = lambda raw, dists, act_fn=F.relu: 1.-torch.exp(-act_fn(raw)*dists)</li>
</ul>
<p>-&gt; These are the weight term and alpha term that appear in 5.2 Hierarchical volume sampling.
$\sigma_i$ denotes the volume density, and $\delta_i$ the distance between adjacent sampling points on a single ray ($t_{i+1} - t_i$).
raw2alpha, as the name says, <span style="color: red">maps raw data to the paper&#39;s alpha.</span>
$$
raw2alpha := 1-\exp{(-ReLU(x) \times dists)}
$$
|Equation|Code|
|------|---|
|$$\alpha_i = 1 - \exp(-\sigma_i\delta_i)$$| raw2alpha = lambda raw, dists, act_fn=F.relu: 1.-torch.exp(-act_fn(raw)*dists)|</p>
<ul>
<li>dists = z_vals[...,1:] - z_vals[...,:-1]<p align="center">
<img src=https://velog.velcdn.com/images/coma_403/post/c69efd74-1c99-49a3-8f89-2ed384c99053/image.png>
<img src=https://velog.velcdn.com/images/coma_403/post/6f044fc0-b457-4e48-9365-13bbdf9ea8ad/image.png>
</p>

</li>
</ul>
<h6 id="출처-httpstowardsdatasciencecomits-nerf-from-nothing-build-a-vanilla-nerf-with-pytorch-7846e4c45666">출처: <a href="https://towardsdatascience.com/its-nerf-from-nothing-build-a-vanilla-nerf-with-pytorch-7846e4c45666">https://towardsdatascience.com/its-nerf-from-nothing-build-a-vanilla-nerf-with-pytorch-7846e4c45666</a></h6>
<p> The distances between the points drawn from the ray via stratified sampling.
Roughly something like that, I think...</p>
<ul>
<li><p>dists = torch.cat([dists, torch.Tensor([1e10]).expand(dists[...,:1].shape)], -1)  # [N_rays, N_samples]</p>
<p> dists = dists * torch.norm(rays_d[...,None,:], dim=-1)</p>
</li>
</ul>
<p>We concatenate 1e10 onto dists (the distances between the stratified sample points computed above), then compute the norm of the ray direction rays_d and multiply it into dists.</p>
<ul>
<li>rgb = torch.sigmoid(raw[...,:3])  # [N_rays, N_samples, 3]
The RGB part of the NeRF model&#39;s RGB + volume density estimate is mapped into 0~1 with a sigmoid. Why map it again here?  <ul>
<li>So I printed the min and max of the NeRF model&#39;s output,<p align="center">
<img src="https://velog.velcdn.com/images/coma_403/post/a5eff39a-dc2f-40ee-af31-a55adc675e72/image.png">
</p></li>
<li>rgb min and max after mapping with torch.sigmoid<p align="center">
<img src="https://velog.velcdn.com/images/coma_403/post/dd5b9309-7a35-47a4-8096-1c7ebff3368e/image.png">
</p></li>
</ul>
You can see the values are mapped to around 0.5.
<br><br></li>
</ul>
<ul>
<li>alpha = raw2alpha(raw[...,3] + noise, dists)<ul>
<li>alpha is computed through the raw2alpha function.</li>
</ul>
</li>
</ul>
<h3 id="2-t_i-rgb-depth-disp-acc-map-구하기">2. $T_i$, RGB, Depth, Disp, acc map 구하기.</h3>
<h4 id="21-weight식-깔끔하게-정리하기">2.1 Weight식 깔끔하게 정리하기.</h4>
<p>우선, Paper에서 $T_i$는 다음과 같이 정의되어 있다.
<img src="https://velog.velcdn.com/images/coma_403/post/e25fbe24-46ef-4b15-985e-1e80fa607bae/image.png" alt=""></p>
<p>Meanwhile, alpha is <img src="https://velog.velcdn.com/images/coma_403/post/210dcd93-eb04-4bf2-984c-31e3652b821a/image.png" alt=""></p>
<p>Don&#39;t the two expressions for $T_i$ and $\alpha_i$ look very similar? Rearranging the alpha equation gives
$$
1-\alpha_i = \exp(-\sigma_i\delta_i)
$$
and writing the product in exponential-sum form,
$$
T_i = \prod_{j = 1}^{i-1}( 1-\alpha_j) = \exp(-\sum_{j = 1}^{i-1} \sigma_j\delta_j)
$$
The code transforms the alpha term this way and performs the $\prod$ with torch.cumprod, finally computing $T_i$.</p>
<pre><code class="language-python">    alpha = raw2alpha(raw[...,3] + noise, dists)  # [N_rays, N_samples]
    # weights = alpha * tf.math.cumprod(1.-alpha + 1e-10, -1, exclusive=True)
    weights = alpha * torch.cumprod(torch.cat([torch.ones((alpha.shape[0], 1)), 1.-alpha + 1e-10], -1), -1)[:, :-1]
    rgb_map = torch.sum(weights[...,None] * rgb, -2)  # [N_rays, 3]

    depth_map = torch.sum(weights * z_vals, -1)
    disp_map = 1./torch.max(1e-10 * torch.ones_like(depth_map), depth_map / torch.sum(weights, -1))
    acc_map = torch.sum(weights, -1)

    if white_bkgd:
        rgb_map = rgb_map + (1.-acc_map[...,None])

    return rgb_map, disp_map, acc_map, weights, depth_map</code></pre>
<ul>
<li><p>weights = alpha * torch.cumprod(torch.cat([torch.ones((alpha.shape[0], 1)), 1.-alpha + 1e-10], -1), -1)[:, :-1]</p>
<ul>
<li>This line, expressed as an equation: </li>
<li><span style="color: red">$$w_i$$</span> $$= T_i(1-\exp(-\sigma_i\delta_i)) = T_i\alpha_i = \prod_{j = 1}^{i-1}( 1-\alpha_j)\,\alpha_i$$</li>
<li>The weights are computed through torch.cumprod (a toy check follows after this list).</li>
<li>The 1e-10 is presumably there to prevent NaN?<p align="center">
<img src=https://velog.velcdn.com/images/coma_403/post/42ecb7d8-7574-4126-82f4-1c834bf2707b/image.png>
<img src=https://velog.velcdn.com/images/coma_403/post/25ca43bb-bffc-4aab-8b3a-b9ec345d44fb/image.png></p>


</li>
</ul>
</li>
</ul>
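<p>A toy check of that cumprod trick (my own example values): prepending a column of ones and dropping the last column makes the cumulative product exclusive, so sample $i$ is weighted by the transmittance of the samples in front of it.</p>
<pre><code class="language-python">import torch

alpha = torch.tensor([[0.1, 0.5, 0.9]])  # alphas along one ray
ones = torch.ones((alpha.shape[0], 1))
T = torch.cumprod(torch.cat([ones, 1. - alpha + 1e-10], -1), -1)[:, :-1]
print(T)          # tensor([[1.0000, 0.9000, 0.4500]]) - exclusive cumprod of (1 - alpha)
print(alpha * T)  # the weights w_i = T_i * alpha_i</code></pre>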
<ul>
<li><p>rgb_map = torch.sum(weights[...,None] * rgb, -2)  # [N_rays, 3]</p>
<ul>
<li>This line, expressed as an equation:</li>
<li>$$\hat{C_c}(r) = \sum_{i = 1}^{N_c} w_ic_i$$</li>
</ul>
</li>
<li><p>depth_map = torch.sum(weights * z_vals, -1)</p>
<ul>
<li>Simply multiplies the weights by the stratified sample depths along the ray and sums.</li>
</ul>
</li>
<li><p>disp_map = 1./torch.max(1e-10 * torch.ones_like(depth_map), depth_map / torch.sum(weights, -1))</p>
<ul>
<li>Computes the disparity map; think of it as inverse depth.</li>
</ul>
</li>
<li><p>acc_map = torch.sum(weights, -1)</p>
<ul>
<li>What is this one? I should look into what acc_map actually does.</li>
</ul>
</li>
</ul>
]]></description>
        </item>
        <item>
            <title><![CDATA[NeRF Code Review - def render_rays]]></title>
            <link>https://velog.io/@coma_403/NeRF-Code-Review-def-renderrays</link>
            <guid>https://velog.io/@coma_403/NeRF-Code-Review-def-renderrays</guid>
            <pubDate>Fri, 04 Nov 2022 07:51:09 GMT</pubDate>
            <description><![CDATA[<h2 id="render_rays-함수에-대해-알아보자">render_rays 함수에 대해 알아보자.</h2>
<p>다음은 render_rays 함수의 전체 code이다. 하나하나 뜯어보도록 하자.</p>
<pre><code class="language-python">def render_rays(ray_batch,
                network_fn,
                network_query_fn,
                N_samples,
                retraw=False,
                lindisp=False,
                perturb=0.,
                N_importance=0,
                network_fine=None,
                white_bkgd=False,
                raw_noise_std=0.,
                verbose=False,
                pytest=False):
    &quot;&quot;&quot;Volumetric rendering.
    Args:
      ray_batch: array of shape [batch_size, ...]. All information necessary
        for sampling along a ray, including: ray origin, ray direction, min
        dist, max dist, and unit-magnitude viewing direction.
      network_fn: function. Model for predicting RGB and density at each point
        in space.
      network_query_fn: function used for passing queries to network_fn.
      N_samples: int. Number of different times to sample along each ray.
      retraw: bool. If True, include model&#39;s raw, unprocessed predictions.
      lindisp: bool. If True, sample linearly in inverse depth rather than in depth.
      perturb: float, 0 or 1. If non-zero, each ray is sampled at stratified
        random points in time.
      N_importance: int. Number of additional times to sample along each ray.
        These samples are only passed to network_fine.
      network_fine: &quot;fine&quot; network with same spec as network_fn.
      white_bkgd: bool. If True, assume a white background.
      raw_noise_std: ...
      verbose: bool. If True, print more debugging info.
    Returns:
      rgb_map: [num_rays, 3]. Estimated RGB color of a ray. Comes from fine model.
      disp_map: [num_rays]. Disparity map. 1 / depth.
      acc_map: [num_rays]. Accumulated opacity along each ray. Comes from fine model.
      raw: [num_rays, num_samples, 4]. Raw predictions from model.
      rgb0: See rgb_map. Output for coarse model.
      disp0: See disp_map. Output for coarse model.
      acc0: See acc_map. Output for coarse model.
      z_std: [num_rays]. Standard deviation of distances along ray for each
        sample.
    &quot;&quot;&quot;
    N_rays = ray_batch.shape[0]
    rays_o, rays_d = ray_batch[:,0:3], ray_batch[:,3:6] # [N_rays, 3] each
    viewdirs = ray_batch[:,-3:] if ray_batch.shape[-1] &gt; 8 else None
    bounds = torch.reshape(ray_batch[...,6:8], [-1,1,2])
    near, far = bounds[...,0], bounds[...,1] # [-1,1]

    t_vals = torch.linspace(0., 1., steps=N_samples)
    if not lindisp:
        z_vals = near * (1.-t_vals) + far * (t_vals)
    else:
        z_vals = 1./(1./near * (1.-t_vals) + 1./far * (t_vals))

    z_vals = z_vals.expand([N_rays, N_samples])

    if perturb &gt; 0.:
        # get intervals between samples
        mids = .5 * (z_vals[...,1:] + z_vals[...,:-1])
        upper = torch.cat([mids, z_vals[...,-1:]], -1)
        lower = torch.cat([z_vals[...,:1], mids], -1)
        # stratified samples in those intervals
        t_rand = torch.rand(z_vals.shape)

        # Pytest, overwrite u with numpy&#39;s fixed random numbers
        if pytest:
            np.random.seed(0)
            t_rand = np.random.rand(*list(z_vals.shape))
            t_rand = torch.Tensor(t_rand)

        z_vals = lower + (upper - lower) * t_rand

    pts = rays_o[...,None,:] + rays_d[...,None,:] * z_vals[...,:,None] # [N_rays, N_samples, 3]


#     raw = run_network(pts)
    raw = network_query_fn(pts, viewdirs, network_fn)
    rgb_map, disp_map, acc_map, weights, depth_map = raw2outputs(raw, z_vals, rays_d, raw_noise_std, white_bkgd, pytest=pytest)

    if N_importance &gt; 0:

        rgb_map_0, disp_map_0, acc_map_0 = rgb_map, disp_map, acc_map

        z_vals_mid = .5 * (z_vals[...,1:] + z_vals[...,:-1])
        z_samples = sample_pdf(z_vals_mid, weights[...,1:-1], N_importance, det=(perturb==0.), pytest=pytest)
        z_samples = z_samples.detach()

        z_vals, _ = torch.sort(torch.cat([z_vals, z_samples], -1), -1)
        pts = rays_o[...,None,:] + rays_d[...,None,:] * z_vals[...,:,None] # [N_rays, N_samples + N_importance, 3]

        run_fn = network_fn if network_fine is None else network_fine
#         raw = run_network(pts, fn=run_fn)
        raw = network_query_fn(pts, viewdirs, run_fn)

        rgb_map, disp_map, acc_map, weights, depth_map = raw2outputs(raw, z_vals, rays_d, raw_noise_std, white_bkgd, pytest=pytest)

    ret = {&#39;rgb_map&#39; : rgb_map, &#39;disp_map&#39; : disp_map, &#39;acc_map&#39; : acc_map}
    if retraw:
        ret[&#39;raw&#39;] = raw
    if N_importance &gt; 0:
        ret[&#39;rgb0&#39;] = rgb_map_0
        ret[&#39;disp0&#39;] = disp_map_0
        ret[&#39;acc0&#39;] = acc_map_0
        ret[&#39;z_std&#39;] = torch.std(z_samples, dim=-1, unbiased=False)  # [N_rays]

    for k in ret:
        if (torch.isnan(ret[k]).any() or torch.isinf(ret[k]).any()) and DEBUG:
            # print(f&quot;! [Numerical Error] {k} contains nan or inf.&quot;)
            print(&#39;what?&#39;)

    return ret</code></pre>
<h2 id="1-ray_batch로-부터-ray_o-rays_d-near-focal-나누기">1. ray_batch로 부터 ray_o, rays_d, near focal 나누기.</h2>
<pre><code class="language-python">    N_rays = ray_batch.shape[0]
    rays_o, rays_d = ray_batch[:,0:3], ray_batch[:,3:6] # [N_rays, 3] each
    viewdirs = ray_batch[:,-3:] if ray_batch.shape[-1] &gt; 8 else None
    bounds = torch.reshape(ray_batch[...,6:8], [-1,1,2])
    near, far = bounds[...,0], bounds[...,1] # [-1,1]</code></pre>
<ul>
<li>ray_batch: the batch of rays built earlier. For lego, its dimension is [N_rand, 11]</li>
<li>Split rays_o and rays_d out of ray_batch.</li>
<li>Split viewdirs out of ray_batch.</li>
<li>bounds = torch.reshape... -&gt; separates the near and far values from each ray.<ul>
<li>np.shape(bounds) = [N_rand,1,2], e.g. [1024,1,2]</li>
</ul>
</li>
<li>Finally, near is stored as an [N_rand,1] tensor of 2s and far as an [N_rand,1] tensor of 6s.</li>
</ul>
<h2 id="2-stratified-sampling">2. Stratified sampling</h2>
<pre><code class="language-python">    t_vals = torch.linspace(0., 1., steps=N_samples)
    if not lindisp:
        z_vals = near * (1.-t_vals) + far * (t_vals)
    else:
        z_vals = 1./(1./near * (1.-t_vals) + 1./far * (t_vals))

    z_vals = z_vals.expand([N_rays, N_samples])

    if perturb &gt; 0.:
        # get intervals between samples
        mids = .5 * (z_vals[...,1:] + z_vals[...,:-1])
        upper = torch.cat([mids, z_vals[...,-1:]], -1)
        lower = torch.cat([z_vals[...,:1], mids], -1)
        # stratified samples in those intervals
        t_rand = torch.rand(z_vals.shape)

        # Pytest, overwrite u with numpy&#39;s fixed random numbers
        if pytest:
            np.random.seed(0)
            t_rand = np.random.rand(*list(z_vals.shape))
            t_rand = torch.Tensor(t_rand)

        z_vals = lower + (upper - lower) * t_rand

    pts = rays_o[...,None,:] + rays_d[...,None,:] * z_vals[...,:,None] # [N_rays, N_samples, 3]


#     raw = run_network(pts)
    raw = network_query_fn(pts, viewdirs, network_fn)
    rgb_map, disp_map, acc_map, weights, depth_map = raw2outputs(raw, z_vals, rays_d, raw_noise_std, white_bkgd, pytest=pytest)</code></pre>
<p>This part performs stratified sampling, the sampling method described in the paper.
$$
t_i \sim \mathcal{U}\left[t_n+\frac{i-1}{N}\left(t_f-t_n\right), t_n+\frac{i}{N}\left(t_f-t_n\right)\right]
$$
|Paper| Code |
|----|----|
|Generate N_samples evenly spaced points from 0 to 1 <br>N evenly-spaced bins|t_vals = torch.linspace(0., 1., steps=N_samples)|
|we use a stratified sampling approach where we partition $[t_n,t_f]$|if not lindisp:<br>z_vals = near * (1.-t_vals) + far * (t_vals)<br>else:<br>z_vals = 1./(1./near * (1.-t_vals) + 1./far * (t_vals))<br>z_vals = z_vals.expand([N_rays, N_samples])|</p>
<ul>
<li>z_vals = lower + (upper - lower) * t_rand - I did not get this part at first. Why is the computation this roundabout? Is it just stratified sampling? (See the sketch after this list.)</li>
</ul>
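<p>A small sketch of what those three lines do (my own example values): lower and upper are per-sample bin bounds, and each z value is jittered uniformly within its own bin, which is exactly the per-bin uniform draw of the equation above.</p>
<pre><code class="language-python">import torch

z_vals = torch.linspace(2., 6., steps=5).unsqueeze(0)  # e.g. near=2, far=6, N_samples=5
mids = .5 * (z_vals[..., 1:] + z_vals[..., :-1])
upper = torch.cat([mids, z_vals[..., -1:]], -1)        # per-sample bin upper bounds
lower = torch.cat([z_vals[..., :1], mids], -1)         # per-sample bin lower bounds
t_rand = torch.rand(z_vals.shape)
z_jittered = lower + (upper - lower) * t_rand          # one uniform draw per bin
print(lower)   # tensor([[2.0000, 2.5000, 3.5000, 4.5000, 5.5000]])
print(upper)   # tensor([[2.5000, 3.5000, 4.5000, 5.5000, 6.0000]])</code></pre>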
<hr>
<h2 id="작성중">작성중</h2>
<pre><code> raw = network_query_fn(pts, viewdirs, network_fn)
 rgb_map, disp_map, acc_map, weights, depth_map = raw2outputs(raw, z_vals, rays_d, raw_noise_std, white_bkgd, pytest=pytest)</code></pre><p>Through raw2outputs (raw: the RGB and density values predicted by the NeRF network for the sampled points), we obtain rgb_map, disp_map, acc_map, and weights as output. Details on raw2outputs are in the NeRF Code Review series: <a href="https://velog.io/@coma_403/NeRF-Code-Review-def-raw2outputs">raw2outputs link</a></p>
<h2 id="3-hierarchical-volume-sampling">3. Hierarchical volume Sampling</h2>
<pre><code class="language-python">    if N_importance &gt; 0:

        rgb_map_0, disp_map_0, acc_map_0 = rgb_map, disp_map, acc_map

        z_vals_mid = .5 * (z_vals[...,1:] + z_vals[...,:-1])
        z_samples = sample_pdf(z_vals_mid, weights[...,1:-1], N_importance, det=(perturb==0.), pytest=pytest)
        z_samples = z_samples.detach()

        z_vals, _ = torch.sort(torch.cat([z_vals, z_samples], -1), -1)
        pts = rays_o[...,None,:] + rays_d[...,None,:] * z_vals[...,:,None] # [N_rays, N_samples + N_importance, 3]

        run_fn = network_fn if network_fine is None else network_fine
#         raw = run_network(pts, fn=run_fn)
        raw = network_query_fn(pts, viewdirs, run_fn)

        rgb_map, disp_map, acc_map, weights, depth_map = raw2outputs(raw, z_vals, rays_d, raw_noise_std, white_bkgd, pytest=pytest)
</code></pre>
<ul>
<li><p>Code walkthrough</p>
<pre><code>  rgb_map_0, disp_map_0, acc_map_0 = rgb_map, disp_map, acc_map</code></pre><p>Save the rgb_map, disp_map, and acc_map obtained from the coarse network.</p>
<pre><code>  z_vals_mid = .5 * (z_vals[...,1:] + z_vals[...,:-1])</code></pre><p>Compute the midpoints of the samples obtained on the ray through coarse sampling.</p>
<pre><code>  z_samples = sample_pdf(z_vals_mid, weights[...,1:-1], N_importance, det=(perturb==0.), pytest=pytest)
  z_samples = z_samples.detach()</code></pre><p>Compute the samples that will go into the fine network via hierarchical sampling.</p>
<pre><code>  z_vals, _ = torch.sort(torch.cat([z_vals, z_samples], -1), -1)
  pts = rays_o[...,None,:] + rays_d[...,None,:] * z_vals[...,:,None] # [N_rays, N_samples + N_importance, 3]</code></pre><p>z_vals = coarse sampled points + fine sampled points;
this defines the combined coarse + fine sample points along the ray.</p>
<pre><code>  run_fn = network_fn if network_fine is None else network_fine

  raw = network_query_fn(pts, viewdirs, run_fn)

  rgb_map, disp_map, acc_map, weights, depth_map = raw2outputs(raw, z_vals, rays_d, raw_noise_std, white_bkgd, pytest=pytest)</code></pre><p>With the $N_c + N_f$ samples as input, the fine network returns rgb_map, disp_map, acc_map, and depth_map as output.</p>
</li>
</ul>
<pre><code class="language-python">    ret = {&#39;rgb_map&#39; : rgb_map, &#39;disp_map&#39; : disp_map, &#39;acc_map&#39; : acc_map}
    if retraw:
        ret[&#39;raw&#39;] = raw
    if N_importance &gt; 0:
        ret[&#39;rgb0&#39;] = rgb_map_0
        ret[&#39;disp0&#39;] = disp_map_0
        ret[&#39;acc0&#39;] = acc_map_0
        ret[&#39;z_std&#39;] = torch.std(z_samples, dim=-1, unbiased=False)  # [N_rays]

    for k in ret:
        if (torch.isnan(ret[k]).any() or torch.isinf(ret[k]).any()) and DEBUG:
            # print(f&quot;! [Numerical Error] {k} contains nan or inf.&quot;)
            print(&#39;what?&#39;)

    return ret</code></pre>
<ul>
<li>Code walkthrough
Store rgb_map, disp_map, and acc_map in a dictionary called ret. Then check for nan/inf to catch numerical errors. Finally, return ret.</li>
</ul>
]]></description>
        </item>
        <item>
            <title><![CDATA[NeRF Code Review - def batchify(fn, chunk)]]></title>
            <link>https://velog.io/@coma_403/NeRF-Code-Review-def-batchifyfn-chunk</link>
            <guid>https://velog.io/@coma_403/NeRF-Code-Review-def-batchifyfn-chunk</guid>
            <pubDate>Fri, 04 Nov 2022 07:50:55 GMT</pubDate>
            <description><![CDATA[<pre><code class="language-python">def batchify(fn, chunk):
    &quot;&quot;&quot;Constructs a version of &#39;fn&#39; that applies to smaller batches.
    &quot;&quot;&quot;
    if chunk is None:
        return fn
    def ret(inputs):
        return torch.cat([fn(inputs[i:i+chunk]) for i in range(0, inputs.shape[0], chunk)], 0)
    return ret</code></pre>
<ol>
<li>if chunk is None:<ul>
<li>If chunk is not set, fn is returned unchanged. The default value stored is 1024*64.</li>
</ul>
</li>
<li>def ret(inputs):
 return torch.cat([fn(inputs[i:i+chunk]) for i in range(0, inputs.shape[0], chunk)], 0)<ul>
<li>fn is the NeRF class object.    </li>
<li>The input is cut into chunk-sized slices that are fed into the NeRF network.<ul>
<li>run_network is executed in this part(?) - to check and revise later</li>
</ul>
</li>
<li>Through fn(inputs[i:i+chunk]), the batched rays go into the NeRF network and RGB estimation proceeds.
<img src="https://velog.velcdn.com/images/coma_403/post/b0ad9b7f-c835-47d4-bce6-bfebbb92e2da/image.png" alt=""></li>
</ul>
</li>
</ol>
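<p>A quick sketch of how batchify behaves on a toy fn (my own example, assuming the batchify defined above):</p>
<pre><code class="language-python">import torch

# fn is applied chunk-wise and the outputs are concatenated back together
double = batchify(lambda x: 2 * x, chunk=4)
out = double(torch.arange(10))  # runs fn on [0:4], [4:8], [8:10]
print(out)                      # tensor([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])</code></pre>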
]]></description>
        </item>
        <item>
            <title><![CDATA[NeRF Code Review - class NeRF(nn.Module)]]></title>
            <link>https://velog.io/@coma_403/NeRF-Code-Review-class-NeRFnn.Module</link>
            <guid>https://velog.io/@coma_403/NeRF-Code-Review-class-NeRFnn.Module</guid>
            <pubDate>Fri, 04 Nov 2022 06:15:38 GMT</pubDate>
            <description><![CDATA[<pre><code class="language-python">class NeRF(nn.Module):
    def __init__(self, D=8, W=256, input_ch=3, input_ch_views=3, output_ch=4, skips=[4], use_viewdirs=False):
        &quot;&quot;&quot; 
        &quot;&quot;&quot;
        super(NeRF, self).__init__()
        self.D = D
        self.W = W
        self.input_ch = input_ch
        self.input_ch_views = input_ch_views
        self.skips = skips
        self.use_viewdirs = use_viewdirs

        self.pts_linears = nn.ModuleList(
            [nn.Linear(input_ch, W)] + [nn.Linear(W, W) if i not in self.skips else nn.Linear(W + input_ch, W) for i in range(D-1)])

        ### Implementation according to the official code release (https://github.com/bmild/nerf/blob/master/run_nerf_helpers.py#L104-L105)
        self.views_linears = nn.ModuleList([nn.Linear(input_ch_views + W, W//2)])

        ### Implementation according to the paper
        # self.views_linears = nn.ModuleList(
        #     [nn.Linear(input_ch_views + W, W//2)] + [nn.Linear(W//2, W//2) for i in range(D//2)])

        if use_viewdirs:
            self.feature_linear = nn.Linear(W, W)
            self.alpha_linear = nn.Linear(W, 1)
            self.rgb_linear = nn.Linear(W//2, 3)
        else:
            self.output_linear = nn.Linear(W, output_ch)

    def forward(self, x):
        input_pts, input_views = torch.split(x, [self.input_ch, self.input_ch_views], dim=-1)
        h = input_pts
        for i, l in enumerate(self.pts_linears):
            h = self.pts_linears[i](h)
            h = F.relu(h)
            if i in self.skips:
                h = torch.cat([input_pts, h], -1)

        if self.use_viewdirs:
            alpha = self.alpha_linear(h)
            feature = self.feature_linear(h)
            h = torch.cat([feature, input_views], -1)

            for i, l in enumerate(self.views_linears):
                h = self.views_linears[i](h)
                h = F.relu(h)

            rgb = self.rgb_linear(h)
            outputs = torch.cat([rgb, alpha], -1)
        else:
            outputs = self.output_linear(h)

        return outputs   </code></pre>
<ol>
<li><pre><code class="language-python">input_pts, input_views = torch.split(x, [self.input_ch,self.input_ch_views], dim=-1) </code></pre>
</li>
</ol>
<ul>
<li>Code interpretation<ul>
<li>input_pts : the ray position information corresponding to rays_o. shape = [1024*64, 63], presumably 60+3(?)</li>
<li>input_views : the ray direction information corresponding to rays_d. shape = [1024*64, 27], presumably 24+3(?)</li>
</ul>
</li>
</ul>
<ol start="2">
<li><pre><code class="language-python">     for i, l in enumerate(self.pts_linears):
         h = self.pts_linears[i](h)
         h = F.relu(h)
         if i in self.skips:
             h = torch.cat([input_pts, h], -1)</code></pre>
</li>
</ol>
<ul>
<li>Code interpretation<ul>
<li>self.pts_linears type: &#39;torch.nn.modules.container.ModuleList&#39;</li>
<li>These are the five fully-connected layers of the NeRF model, up to the layer that additionally re-receives the rays_o information ($\gamma(x)$ in the paper).</li>
<li>ReLU is used as the activation function.</li>
<li>When i falls in self.skips, input_pts (corresponding to rays_o) is concatenated with the fully-connected output h and fed back into the network.
<img src="https://velog.velcdn.com/images/coma_403/post/b7c81952-2de0-476b-aa14-b815b1396451/image.png" alt=""></li>
</ul>
</li>
</ul>
<ol start="3">
<li><pre><code class="language-python">     if self.use_viewdirs:
         alpha = self.alpha_linear(h)
         feature = self.feature_linear(h)
         h = torch.cat([feature, input_views], -1)

         for i, l in enumerate(self.views_linears):
             h = self.views_linears[i](h)
             h = F.relu(h)

         rgb = self.rgb_linear(h)
         outputs = torch.cat([rgb, alpha], -1)
     else:
         outputs = self.output_linear(h)</code></pre>
</li>
</ol>
<ul>
<li><p>Code interpretation</p>
<ul>
<li><p>alpha = self.alpha_linear(h)</p>
<ul>
<li><p><span style="color:red">Volume Density($\sigma$)</span>를 output으로 뽑는다. Paper의 그림으로만 보았을 때, Activation function 없이 바로 feature extraction 하였을 때, Volume density값과 256 dimension의 feature가 exreact 될 것 같은데, <span style="color:red">실제 코드에서는 그렇지 않았다.</span><br>  &#39;Detailed expression&#39; 그림을 참조해서 코드를 설명하면, activation function skip 과정 전 단계에서 input feature가 256, output feature가 1로 뽑히는 것을 확인할 수 있다. Paper 에서도 &#39;volume density $\sigma$ (which is rectified using a ReLU to ensure that the output volume density is nonegative)&#39;라고 명시되어 있다.
<img src="https://velog.velcdn.com/images/coma_403/post/eaafc494-6b2d-4b7c-8837-64e12477030a/image.png" alt=""></p>
</li>
<li><p>feature = self.feature_linear(h)</p>
<ul>
<li>Feature extraction without an activation function. This corresponds to the orange arrow in the paper&#39;s figure.</li>
</ul>
</li>
<li><p>h = self.views_linears[i](h)</p>
<ul>
<li>The ray direction values are concatenated with the 256-dimensional feature and fed into the linear layer, which takes a 283-dimensional input.<h5 id="256feature-dim--24direction-dim---embedded-by-encoding--3original-direction--283">256(feature dim) + 24(direction dim - embedded by encoding) + 3(original direction) = 283</h5>
</li>
</ul>
</li>
<li><p>rgb = self.rgb_linear(h)</p>
<ul>
<li>Computes the 3-dimensional RGB value from the 128-dimensional feature.</li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
</ul>
<p><img src="https://velog.velcdn.com/images/coma_403/post/ad1306ae-5162-4584-8570-fad7e887f57f/image.png" alt=""></p>
]]></description>
        </item>
        <item>
            <title><![CDATA[NeRF code review - def get_embedder (WIP)]]></title>
            <link>https://velog.io/@coma_403/NeRF-code-review-def-getembedder</link>
            <guid>https://velog.io/@coma_403/NeRF-code-review-def-getembedder</guid>
            <pubDate>Thu, 03 Nov 2022 10:08:29 GMT</pubDate>
            <description><![CDATA[<pre><code class="language-python">def get_embedder(multires, i=0):
    if i == -1:
        return nn.Identity(), 3

    embed_kwargs = {
                &#39;include_input&#39; : True,
                &#39;input_dims&#39; : 3,
                &#39;max_freq_log2&#39; : multires-1,
                &#39;num_freqs&#39; : multires,
                &#39;log_sampling&#39; : True,
                &#39;periodic_fns&#39; : [torch.sin, torch.cos],
    }

    embedder_obj = Embedder(**embed_kwargs)
    embed = lambda x, eo=embedder_obj : eo.embed(x)
    return embed, embedder_obj.out_dim</code></pre>
<ul>
<li>multires is the maximum frequency (exponent) of the frequencies used in the encoding.</li>
<li>In the NeRF paper, multires is L=10 when the position information (rays_o) is encoded, and L=4 when the direction information (rays_d) is encoded.
<img src="https://velog.velcdn.com/images/coma_403/post/927e1140-f150-4696-9d06-94c9ec9f466e/image.png" alt=""></li>
</ul>
<ul>
<li>Since positional encoding is fundamentally done with sin and cos, this is expressed as &#39;periodic_fns&#39; : [torch.sin, torch.cos].</li>
</ul>
<pre><code class="language-python">class Embedder:
    def __init__(self, **kwargs):
        self.kwargs = kwargs
        self.create_embedding_fn()

    def create_embedding_fn(self):
        embed_fns = []
        d = self.kwargs[&#39;input_dims&#39;]
        out_dim = 0
        if self.kwargs[&#39;include_input&#39;]:
            embed_fns.append(lambda x : x)
            out_dim += d

        max_freq = self.kwargs[&#39;max_freq_log2&#39;]
        N_freqs = self.kwargs[&#39;num_freqs&#39;]

        if self.kwargs[&#39;log_sampling&#39;]:
            freq_bands = 2.**torch.linspace(0., max_freq, steps=N_freqs)
        else:
            freq_bands = torch.linspace(2.**0., 2.**max_freq, steps=N_freqs)

        for freq in freq_bands:
            for p_fn in self.kwargs[&#39;periodic_fns&#39;]:    #   torch.sin, torch.cos
                embed_fns.append(lambda x, p_fn=p_fn, freq=freq : p_fn(x * freq))   # sin(2^freq * x), cos(2^freq * x)
                out_dim += d

        self.embed_fns = embed_fns
        self.out_dim = out_dim

    def embed(self, inputs):
        return torch.cat([fn(inputs) for fn in self.embed_fns], -1)</code></pre>
<ul>
<li>The Embedder class positionally encodes rays_o and rays_d.</li>
<li>kwargs is a dictionary; the arguments are stored by the parser early in the code. </li>
<li>rays_o and rays_d have 3 channels, so &#39;input_dims&#39; is stored as 3.</li>
<li>include_input:True -&gt; when the positional-encoding functions are appended to embed_fns, this is used to also store the identity (input) function.</li>
<li>max_freq is L-1, corresponding to the last frequency of the encoded functions in the paper.</li>
<li>N_freqs is the number of encoded functions in the paper.</li>
</ul>
<h3 id="--positional-encoding-with-code">- positional Encoding with code</h3>
<p><img src="https://velog.velcdn.com/images/coma_403/post/bedda5d9-2213-4c96-ad8f-4a2294495077/image.png" alt=""></p>
<pre><code class="language-python">        for freq in freq_bands:
            for p_fn in self.kwargs[&#39;periodic_fns&#39;]:    #   torch.sin, torch.cos
                embed_fns.append(lambda x, p_fn=p_fn, freq=freq : p_fn(x * freq))   # sin(2^freq * x), cos(2^freq * x)
                out_dim += d
</code></pre>
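<p>A quick check of the output dimension (my own sketch, assuming the get_embedder and Embedder defined above): each of the 3 input channels gets sin and cos at N_freqs frequencies, plus the raw input itself.</p>
<pre><code class="language-python">import torch

embed_fn, out_dim = get_embedder(10)        # L = 10 for positions (rays_o)
print(out_dim)                              # 63 = 3 + 3 * 2 * 10
print(embed_fn(torch.rand(4, 3)).shape)     # torch.Size([4, 63])

embed_dirs, out_dim_dirs = get_embedder(4)  # L = 4 for directions (rays_d)
print(out_dim_dirs)                         # 27 = 3 + 3 * 2 * 4</code></pre>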
]]></description>
        </item>
        <item>
            <title><![CDATA[NeRF Code Review - ray sampling inside def train() (WIP)]]></title>
            <link>https://velog.io/@coma_403/NeRF-ray-sampling</link>
            <guid>https://velog.io/@coma_403/NeRF-ray-sampling</guid>
            <pubDate>Wed, 02 Nov 2022 07:00:01 GMT</pubDate>
            <description><![CDATA[<h2 id="전체-코드">전체 코드</h2>
<pre><code class="language-python">            if N_rand is not None:
                rays_o, rays_d = get_rays(H, W, K, torch.Tensor(pose))  # (H, W, 3), (H, W, 3)

                if i &lt; args.precrop_iters:
                    dH = int(H//2 * args.precrop_frac)
                    dW = int(W//2 * args.precrop_frac)
                    coords = torch.stack(
                        torch.meshgrid(
                            torch.linspace(H//2 - dH, H//2 + dH - 1, 2*dH), 
                            torch.linspace(W//2 - dW, W//2 + dW - 1, 2*dW)
                        ), -1)
                    if i == start:
                        print(f&quot;[Config] Center cropping of size {2*dH} x {2*dW} is enabled until iter {args.precrop_iters}&quot;)                
                else:
                    coords = torch.stack(torch.meshgrid(torch.linspace(0, H-1, H), torch.linspace(0, W-1, W)), -1)  # (H, W, 2)

                coords = torch.reshape(coords, [-1,2])  # (H * W, 2)
                select_inds = np.random.choice(coords.shape[0], size=[N_rand], replace=False)  # (N_rand,)
                select_coords = coords[select_inds].long()  # (N_rand, 2)
                rays_o = rays_o[select_coords[:, 0], select_coords[:, 1]]  # (N_rand, 3)
                rays_d = rays_d[select_coords[:, 0], select_coords[:, 1]]  # (N_rand, 3)
                batch_rays = torch.stack([rays_o, rays_d], 0)
                target_s = target[select_coords[:, 0], select_coords[:, 1]]  # (N_rand, 3)
                # print(target_s)</code></pre>
<h2 id="1-cropping-부분lego">1. Cropping 부분.(lego)</h2>
<pre><code class="language-python">if N_rand is not None:
   rays_o, rays_d = get_rays(H, W, K, torch.Tensor(pose))  # (H, W, 3), (H, W, 3)

   if i &lt; args.precrop_iters:
        dH = int(H//2 * args.precrop_frac)
        dW = int(W//2 * args.precrop_frac)
        coords = torch.stack(
                 torch.meshgrid(
                 torch.linspace(H//2 - dH, H//2 + dH - 1, 2*dH), 
                 torch.linspace(W//2 - dW, W//2 + dW - 1, 2*dW)
                 ), -1)
        if i == start:
            print(f&quot;[Config] Center cropping of size {2*dH} x {2*dW} is enabled until iter {args.precrop_iters}&quot;)                
   else:
       coords = torch.stack(torch.meshgrid(torch.linspace(0, H-1, H), torch.linspace(0, W-1, W)), -1)  # (H, W, 2)</code></pre>
<p>rays_o and rays_d are returned from get_rays.
Then, when i &lt; args.precrop_iters is true,
a centered crop whose area is 1/4 of the original [H,W,3] image is indexed.</p>
<p>e.g. For lego.blend, center cropping is applied early in training. The original image is [400,400,3].
While i &lt; args.precrop_iters (i.e., until precrop_iters, early training focuses on the center of the image),
<img src="https://velog.velcdn.com/images/coma_403/post/b0d876ce-e19d-4fd5-b017-4e7f0873b792/image.png" alt="">coords is indexed to the central <span style="color: red">square region</span> of the [400,400,3] image, as shown!</p>
<blockquote>
<ul>
<li>As in the figure below, N_rand pixels are randomly selected from the cropped [200 $\times$ 200] region, turned into rays, and fed into the NeRF network as input.<blockquote>
<p><img src="https://velog.velcdn.com/images/coma_403/post/15804ace-45d4-4f6b-a7f5-f79b95c98a80/image.png" alt=""></p>
</blockquote>
</li>
</ul>
</blockquote>
<h2 id="2-ray-random-sampling">2. Ray random sampling.</h2>
<pre><code class="language-python">coords = torch.reshape(coords, [-1,2])  # (H * W, 2)
select_inds = np.random.choice(coords.shape[0], size=[N_rand], replace=False)  # (N_rand,)
select_coords = coords[select_inds].long()  # (N_rand, 2)
rays_o = rays_o[select_coords[:, 0], select_coords[:, 1]]  # (N_rand, 3)
rays_d = rays_d[select_coords[:, 0], select_coords[:, 1]]  # (N_rand, 3)
batch_rays = torch.stack([rays_o, rays_d], 0)
target_s = target[select_coords[:, 0], select_coords[:, 1]]  # (N_rand, 3)
</code></pre>
<p>Early in training, N_rand was taken in from the parser as a user-specified value. N_rand is the number of random rays, and it is used in the code above.
torch.reshape(coords, [-1,2]) changes the [H, W, 2] shape into [H * W, 2], and np.random.choice draws N_rand numbers at random from (0 ~ H * W). These numbers are ray indices, and rays_o, rays_d, and target are selected from the chosen indices.</p>
<blockquote>
<p>rays_o: ray origin position
rays_d: ray direction
target_s: the RGB pixel values of the image at the selected indices</p>
</blockquote>
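<p>A toy version of the pixel selection (my own sketch with small H, W, and N_rand):</p>
<pre><code class="language-python">import numpy as np
import torch

H, W, N_rand = 4, 4, 3
coords = torch.stack(torch.meshgrid(torch.linspace(0, H-1, H), torch.linspace(0, W-1, W)), -1)
coords = torch.reshape(coords, [-1, 2])     # (H*W, 2) pixel coordinates
select_inds = np.random.choice(coords.shape[0], size=[N_rand], replace=False)
select_coords = coords[select_inds].long()  # N_rand random (row, col) pairs
print(select_coords.shape)                  # torch.Size([3, 2])</code></pre>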
]]></description>
        </item>
    </channel>
</rss>