<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
        <title>"Hello, Estelle!"</title>
        <link>https://velog.io/</link>
        <description>Studying</description>
        <lastBuildDate>Thu, 01 May 2025 18:13:21 GMT</lastBuildDate>
        <docs>https://validator.w3.org/feed/docs/rss2.html</docs>
        <generator>https://github.com/jpmonette/feed</generator>
        <image>
            <title>"Hello, Estelle!"</title>
            <url>https://velog.velcdn.com/images/estelle_y/profile/5752f59d-7fe2-4fd6-a4f2-b650073054a3/social_profile.jpeg</url>
            <link>https://velog.io/</link>
        </image>
        <copyright>Copyright (C) 2019. "Hello, Estelle!". All rights reserved.</copyright>
        <atom:link href="https://v2.velog.io/rss/estelle_y" rel="self" type="application/rss+xml"/>
        <item>
            <title><![CDATA[[Paper Study] PointNet++: Deep hierarchical feature learning on point sets in a metric space]]></title>
            <link>https://velog.io/@estelle_y/%EB%85%BC%EB%AC%B8%EC%8A%A4%ED%84%B0%EB%94%94-PointNet-Deep-hierarchical-feature-learning-on-point-sets-in-a-metric-space</link>
            <guid>https://velog.io/@estelle_y/%EB%85%BC%EB%AC%B8%EC%8A%A4%ED%84%B0%EB%94%94-PointNet-Deep-hierarchical-feature-learning-on-point-sets-in-a-metric-space</guid>
            <pubDate>Thu, 01 May 2025 18:13:21 GMT</pubDate>
            <description><![CDATA[<h2 id="bibtex-인용">BibTeX citation</h2>
<pre><code>@article{qi2017pointnet++,
  title={Pointnet++: Deep hierarchical feature learning on point sets in a metric space},
  author={Qi, Charles Ruizhongtai and Yi, Li and Su, Hao and Guibas, Leonidas J},
  journal={Advances in neural information processing systems},
  volume={30},
  year={2017}
}</code></pre><hr>
<h2 id="요약">Summary</h2>
<ul>
<li>Add local features to PointNet<ul>
<li>Learn at multiple scales and combine the results well, so that the network produces the same kind of output while also utilising local features</li>
</ul>
</li>
</ul>
<hr>
<h2 id="인트로">Introduction</h2>
<ul>
<li>PointNet does not capture local structure well<ul>
<li>Yet exploiting local structure is said to be what drives the success of convolutional architectures</li>
</ul>
</li>
<li>So a multi-resolution hierarchy is introduced<ul>
<li>The lower levels can learn local structure</li>
</ul>
</li>
<li>There are two problems<ul>
<li>How should the point set be partitioned?</li>
<li>How should sets of points or local features be abstracted through a local feature learner?</li>
</ul>
</li>
<li>The two problems are coupled: the partitions need a common structure so that the weights of the local feature learner can be shared across them, so the partitioning also affects learning<ul>
<li>Since PointNet already works well, it is used as the local feature learner</li>
</ul>
</li>
<li>The remaining problem is generating the overlapping partitions of the point set<ul>
<li>The entanglement of feature scale and the non-uniformity of the input points make it hard to build suitable local partitions</li>
</ul>
</li>
</ul>
<hr>
<h2 id="기여">Contributions</h2>
<ul>
<li>Proposes PointNet++, a deep network that captures robust and detailed features at multiple scales</li>
<li>The rest restates material from PointNet</li>
</ul>
<hr>
<h2 id="3-method">3. Method</h2>
<h3 id="31-review-of-pointnet">3.1 Review of PointNet</h3>
<ul>
<li>Skipped here, since PointNet is covered in a separate summary</li>
</ul>
<h3 id="32-hierarchical-point-set-feature-learning">3.2 Hierarchical Point Set Feature Learning</h3>
<p><img src="https://velog.velcdn.com/images/estelle_y/post/4df2e8d6-a367-4b2a-99fd-d834b46d2c6c/image.png" alt=""></p>
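<ul>
<li>Centroids are sampled with farthest point sampling (FPS), and the grouping step gathers the points around each centroid and passes them on; as a result, the number of points differs from group to group<ul>
<li>This is fine, because PointNet can produce a fixed-length output feature vector from a flexible number of input points</li>
</ul>
</li>
<li>This grouping (ball query) generalises better than kNN</li>
<li>A PointNet is then applied to each group</li>
</ul>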
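<p>The sampling and grouping steps above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the function names, the fixed start index, and the toy points are assumptions made for the example.</p>

```python
import math

def farthest_point_sampling(points, m, start=0):
    """Pick m indices so that each new point is farthest from those already chosen."""
    chosen = [start]
    # min_dist[i] = distance from point i to its nearest chosen centroid so far
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(chosen) < m:
        nxt = max(range(len(points)), key=lambda i: min_dist[i])
        chosen.append(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return chosen

def ball_query(points, centroid_idx, radius):
    """For each centroid, return the indices of all points within `radius`."""
    return [[i for i, p in enumerate(points) if math.dist(p, points[c]) <= radius]
            for c in centroid_idx]

# Toy cloud (hypothetical): two tight pairs and one isolated point
points = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (1.0, 0.0, 0.0),
          (1.0, 0.1, 0.0), (0.0, 1.0, 0.0)]
centroids = farthest_point_sampling(points, 3)
groups = ball_query(points, centroids, radius=0.2)
print(centroids, groups)
```

<p>The groups come out with different sizes, which is the situation the notes describe: a fixed radius, not a fixed neighbour count, defines each region.</p>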
<h3 id="33-robust-feature-learning-under-non-uniform-sampling-density">3.3 Robust Feature Learning under Non-Uniform Sampling Density</h3>
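<ul>
<li>Features learned in dense regions do not generalise well to sparse regions, and vice versa</li>
<li>In dense regions the network should inspect more closely; in sparse regions it should inspect over a larger scale</li>
<li>To learn this, density-adaptive PointNet layers are used</li>
</ul>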
<p><img src="https://velog.velcdn.com/images/estelle_y/post/04acbdae-4492-42c6-845b-46f64e94043c/image.png" alt=""></p>
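<ul>
<li>Multi-scale grouping (MSG)<ul>
<li>During training, each input point is dropped at random with a randomly chosen probability</li>
<li>The drop probability was hand-tuned to avoid empty groups; around 0.95 reportedly avoids them</li>
<li>This covers point clouds whose uniformity is not guaranteed</li>
</ul>
</li>
<li>Multi-resolution grouping (MRG)<ul>
<li>Proposed because applying the method above all the way from the raw data is too computationally expensive</li>
<li>A feature computed from all raw points in the region is concatenated with the multi-resolution feature<ul>
<li>Depending on density, the more reliable information may or may not be in the feature from all raw points</li>
<li>So the two are concatenated first, and the network learns downstream which one is more reliable<ul>
<li>The paper describes this as the weights being adjusted, but nothing is done explicitly</li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
</ul>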
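<p>A rough sketch of random input dropout as described above. Treating 0.95 as the upper bound of a uniformly drawn drop ratio is my reading of the notes, and the fallback that keeps one point when everything is dropped is an assumption added for the example.</p>

```python
import random

def random_input_dropout(points, p_max=0.95, rng=random):
    """Drop each point independently with a ratio drawn uniformly from [0, p_max]."""
    theta = rng.uniform(0.0, p_max)          # one drop ratio per training sample
    kept = [pt for pt in points if rng.random() >= theta]
    # Assumed safeguard (not from the notes): never return an empty set
    return kept if kept else [rng.choice(points)]

cloud = [(float(i), 0.0, 0.0) for i in range(1000)]
sparse = random_input_dropout(cloud, rng=random.Random(42))
print(len(sparse), "of", len(cloud), "points kept")
```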
<h3 id="34-point-feature-propagation-for-set-segmentation">3.4 Point Feature Propagation for Set Segmentation</h3>
<ul>
<li>Segmentation needs a feature for every original point, but processing the full point cloud at every level would be too computationally expensive</li>
<li>So features are propagated back using skip connections</li>
<li>Features are interpolated and matched to the original points carried over by the skip connections</li>
<li>Interpolation uses an inverse-distance-weighted average</li>
<li>The interpolated features are concatenated with the skip-connected point features</li>
<li>The result is passed through a unit PointNet, which is roughly like a 1x1 convolution</li>
</ul>
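<p>The interpolation and concatenation steps can be sketched as follows. The defaults <code>k=3</code> and <code>p=2</code>, the small <code>eps</code>, and the function names are assumptions for illustration; the notes only state that an inverse-distance-weighted average is used.</p>

```python
import math

def interpolate_features(query, known_xyz, known_feats, k=3, p=2, eps=1e-8):
    """Inverse-distance-weighted average of the k nearest known features."""
    nearest = sorted(range(len(known_xyz)),
                     key=lambda i: math.dist(query, known_xyz[i]))[:k]
    # Weight 1 / d^p; eps avoids division by zero when the query hits a known point
    weights = [1.0 / (math.dist(query, known_xyz[i]) ** p + eps) for i in nearest]
    total = sum(weights)
    dim = len(known_feats[0])
    return [sum(w * known_feats[i][d] for w, i in zip(weights, nearest)) / total
            for d in range(dim)]

def propagate(query, skip_feat, known_xyz, known_feats):
    """Concatenate the interpolated feature with the skip-linked point feature."""
    return interpolate_features(query, known_xyz, known_feats) + list(skip_feat)
```

<p>In the full network the concatenated vector would then go through the unit PointNet; that step is omitted here.</p>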
]]></description>
        </item>
        <item>
            <title><![CDATA[[Paper Study] PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation]]></title>
            <link>https://velog.io/@estelle_y/%EB%85%BC%EB%AC%B8%EC%8A%A4%ED%84%B0%EB%94%94-PointNet-Deep-Learning-on-Point-Sets-for-3D-Classification-and-Segmentation</link>
            <guid>https://velog.io/@estelle_y/%EB%85%BC%EB%AC%B8%EC%8A%A4%ED%84%B0%EB%94%94-PointNet-Deep-Learning-on-Point-Sets-for-3D-Classification-and-Segmentation</guid>
            <pubDate>Thu, 01 May 2025 18:12:04 GMT</pubDate>
            <description><![CDATA[<h2 id="bibtex-인용">BibTeX citation</h2>
<pre><code>@inproceedings{qi2017pointnet,
  title={Pointnet: Deep learning on point sets for 3d classification and segmentation},
  author={Qi, Charles R and Su, Hao and Mo, Kaichun and Guibas, Leonidas J},
  booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
  pages={652--660},
  year={2017}
}</code></pre><hr>
<h2 id="요약">Summary</h2>
<ul>
<li>A point cloud is irregular geometric data, so it is usually converted to a 3D voxel grid or a collection of images before use</li>
<li>That makes the data unnecessarily voluminous, so this paper presents a neural network that consumes points directly</li>
<li>Applications range from object classification and part segmentation to scene semantic parsing</li>
<li>The network is simple, yet efficient and effective</li>
</ul>
<hr>
<h2 id="1-인트로">1. Introduction</h2>
<ul>
<li>Point clouds and meshes have an irregular format → they are typically converted to 3D voxel grids or collections of images</li>
<li>This makes the data unnecessarily voluminous</li>
<li>Unlike meshes, point clouds are simple and unified → easier to learn from</li>
<li>The proposed PointNet is a unified architecture that takes the points themselves as input and returns either a label for the whole input or a per-point segment/part label</li>
<li>Only the $(x,~ y,~ z)$ coordinates are used → additional dimensions may be added by computing normals and other local/global features</li>
<li>The key to the approach is the use of a single simple symmetric function, max pooling</li>
<li>Surprisingly (so the paper says), visualization shows that PointNet learns to summarise an object by a sparse set of key points that roughly forms its skeleton</li>
</ul>
<hr>
<h2 id="기여">Contributions</h2>
<ul>
<li>A novel deep net architecture suited to unordered 3D point clouds</li>
<li>Trains such a net for 3D shape classification, shape part segmentation, and scene semantic parsing</li>
<li>Provides empirical and theoretical analysis of the network's stability and efficiency</li>
</ul>
<hr>
<h2 id="2-related-work">2. Related Work</h2>
<h3 id="point-cloud-feature">Point Cloud Feature</h3>
<h3 id="deep-learning-on-3d-data">Deep Learning on 3D Data</h3>
<h3 id="deep-learning-on-unordered-sets">Deep Learning on Unordered Sets</h3>
<h2 id="3-problem-statement">3. Problem Statement</h2>
<ul>
<li>For object classification, the input point cloud is either sampled directly from a shape or pre-segmented from a scene point cloud</li>
<li>Each point may carry extra feature channels in addition to its $(x,~ y,~z)$ coordinates</li>
<li>PointNet uses only $(x, ~ y, ~ z)$ here. For segmentation it outputs a score for each class at each point → with $n$ points and $m$ classes, an $n \times m$ output</li>
</ul>
<h2 id="4-deep-learning-on-point-sets">4. Deep Learning on Point Sets</h2>
<h2 id="41-properties-of-point-sets-in-r-n">4.1. Properties of Point Sets in $\R ^n$</h2>
<ul>
<li>The input is a subset of points from a Euclidean space</li>
<li>Unlike images, it is unordered, so the network must be invariant to permutations of the input</li>
<li>Points are not isolated: neighbouring points can form meaningful subsets</li>
<li>The learned representation should be invariant to certain transformations: applying them must not change the category or the point segmentation</li>
</ul>
<h2 id="42-pointnet-architecture">4.2. PointNet Architecture</h2>
<p><img src="https://velog.velcdn.com/images/estelle_y/post/26aad474-a35d-4d3a-9d21-e44ebcb19563/image.png" alt=""></p>
<ul>
<li>Three key modules<ul>
<li>A max pooling layer as a symmetric function to aggregate information</li>
<li>A structure for combining local and global information</li>
<li>Alignment networks that align the input points and the point features</li>
</ul>
</li>
</ul>
<h3 id="symmetry-function-for-unordered-input">Symmetry Function for Unordered Input</h3>
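<ul>
<li>There are roughly three strategies for handling unordered input<ul>
<li>Sort the input into a canonical order<ul>
<li>Sorting does not fully resolve the ordering issue</li>
<li>Training needs an ordering that always stays stable, and in general such an ordering is not available</li>
<li>An MLP applied to the sorted point set performs only slightly better than on unsorted input, and still poorly</li>
</ul>
</li>
<li>Treat the input as a sequence for training an RNN<ul>
<li>The idea is that training on randomly permuted sequences makes the RNN invariant to input order</li>
<li>But an RNN's output cannot be considered fully independent of the input order, so order remains a factor</li>
</ul>
</li>
<li>Use a simple symmetric function that aggregates information from each point<ul>
<li>Empirically this works well</li>
<li>Its simplicity also makes it easy to analyse</li>
</ul>
</li>
</ul>
</li>
</ul>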
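<p>A small sketch of why max pooling is a symmetric function: an element-wise max over per-point features does not depend on the order of the points. The per-point map <code>h</code> below is a hypothetical stand-in for the shared MLP, chosen only to make the example runnable.</p>

```python
import random

def h(point):
    """Toy per-point feature map standing in for the shared MLP (hypothetical)."""
    x, y, z = point
    return [x + y, y * z, x - z, x * x]

def global_feature(points):
    """Element-wise max over per-point features: a symmetric aggregation."""
    feats = [h(p) for p in points]
    return [max(col) for col in zip(*feats)]

points = [(0.1, 0.5, 0.9), (0.7, 0.2, 0.3), (0.4, 0.8, 0.6)]
shuffled = points[:]
random.shuffle(shuffled)
# Same set of points in a different order gives the same global feature
print(global_feature(points) == global_feature(shuffled))
```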
<h3 id="local-and-global-information-aggregation">Local and Global Information Aggregation</h3>
<ul>
<li>Classification can be done simply by training an SVM or MLP classifier on the global feature</li>
<li>Point segmentation, however, requires both local and global knowledge</li>
<li>After the global point cloud feature vector is computed, it is fed back to the per-point features</li>
<li>The global feature is combined with each point feature, and new per-point features are extracted from the combined features, so each one carries both local and global information</li>
</ul>
<h3 id="joint-alignment-network">Joint Alignment Network</h3>
<ul>
<li>A mini network predicts an affine transformation matrix, which is applied directly to the coordinates of the input points</li>
<li>The mini network resembles the main network: it consists of point-independent feature extraction, max pooling, and fully connected layers</li>
<li>The same idea is applied again at the feature level, but the feature-space transformation matrix has a much higher dimension than the spatial one → optimization becomes harder</li>
<li>So a regularization term is added to the softmax loss</li>
</ul>
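<p>As far as I recall, the PointNet paper defines this regularizer as $L_{reg} = \| I - A A^T \|_F^2$, which pushes the predicted feature transform $A$ towards an orthogonal matrix. A minimal sketch with nested lists follows; the function name is an assumption.</p>

```python
def orthogonality_penalty(A):
    """Squared Frobenius norm of (I - A A^T) for a square matrix given as nested lists."""
    n = len(A)
    penalty = 0.0
    for i in range(n):
        for j in range(n):
            # (A A^T)[i][j] is the dot product of rows i and j of A
            aat_ij = sum(A[i][k] * A[j][k] for k in range(n))
            target = 1.0 if i == j else 0.0
            penalty += (target - aat_ij) ** 2
    return penalty

rotation = [[0.0, -1.0], [1.0, 0.0]]   # orthogonal, so the penalty vanishes
scaling = [[2.0, 0.0], [0.0, 2.0]]     # not orthogonal, so it is penalised
print(orthogonality_penalty(rotation), orthogonality_penalty(scaling))
```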
<h2 id="42-theoretical-analysis">4.3. Theoretical Analysis</h2>
<h3 id="universal-approximation">Universal approximation</h3>
<ul>
<li>Intuitively, a small perturbation of the input point set should not change the output much</li>
<li>Given enough neurons at the max pooling layer, the network can approximate any continuous set function arbitrarily well</li>
</ul>
<h3 id="theorem-1">Theorem 1.</h3>
<ul>
<li>Suppose $f : \mathcal{X} \to \mathbb{R}$ is a continuous set function with respect to the Hausdorff distance</li>
<li>In the worst case, the network can convert the point cloud into a volumetric representation by partitioning space into equal-sized voxels</li>
<li>In practice, the network reportedly learns a much smarter strategy for probing the space</li>
</ul>
<h3 id="bottleneck-dimension-and-stability">Bottleneck dimension and stability</h3>
<ul>
<li>Theoretically and experimentally, the expressiveness of the network is strongly affected by the dimension of the max pooling layer → the paper states this without elaborating at this point</li>
<li>The next theorem shows which properties affect stability</li>
</ul>
<h3 id="theorem-2">Theorem 2.</h3>
<ul>
<li>Suppose $u : \mathcal{X} \to \mathbb{R}^K$ with $u = \max_{x_i \in S} \{ h(x_i) \}$ and $f = \gamma \circ u$. Then<ul>
<li>a) $\forall S, \exists C_S, N_S \subseteq \mathcal{X}, ~ f(T) = f(S) \text{ if } C_S \subseteq T \subseteq N_S$</li>
<li>b) $|C_S| \leq K$</li>
</ul>
</li>
<li>a) says that $f(S)$ is unchanged as long as every point of $C_S$ is preserved, and is also unaffected by extra noise points up to $N_S$</li>
<li>b) says that $f(S)$ is in fact determined by a finite subset $C_S \subseteq S$ of at most $K$ elements</li>
<li>So $C_S$ is called the critical point set of $S$, and $K$ the bottleneck dimension of $f$</li>
<li>Taken together, the two properties express robustness</li>
<li>Intuitively, PointNet learns to summarise a shape by a sparse set of key points</li>
</ul>
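<p>Theorem 2 can be illustrated on a toy example: collect the points that supply the maximum in at least one of the $K$ feature dimensions, and check that they alone reproduce $u(S)$. The feature map <code>h</code> and the points are hypothetical, picked so that $K = 4$.</p>

```python
def h(point):
    """Toy per-point feature map with K = 4 output dimensions (hypothetical)."""
    x, y, z = point
    return [x, y, z, -(x + y + z)]

def u(points):
    """Element-wise max over per-point features."""
    return [max(col) for col in zip(*[h(p) for p in points])]

def critical_indices(points):
    """Indices of points that supply the max in at least one feature dimension."""
    feats = [h(p) for p in points]
    K = len(feats[0])
    return sorted({max(range(len(points)), key=lambda i: feats[i][k])
                   for k in range(K)})

points = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0),
          (0.2, 0.2, 0.2), (0.5, 0.1, 0.1)]
crit = critical_indices(points)
subset = [points[i] for i in crit]
# At most K points are critical, and they alone determine u(S)
print(crit, u(subset) == u(points))
```

<p>Here the last point never wins any dimension, so removing it leaves the global feature unchanged. That is the sense in which the output depends only on a sparse set of key points.</p>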
<p><img src="https://velog.velcdn.com/images/estelle_y/post/1ad7a544-2fb8-4040-afda-659c6cb70664/image.png" alt=""></p>
<ul>
<li>In short, it works well; training takes about 3-6 hours on a 1080 GPU</li>
</ul>
]]></description>
        </item>
        <item>
            <title><![CDATA[[Paper Study] Semantic Graph Based Place Recognition for 3D Point Clouds]]></title>
            <link>https://velog.io/@estelle_y/%EB%85%BC%EB%AC%B8%EC%8A%A4%ED%84%B0%EB%94%94-Semantic-Graph-Based-Place-Recognition-for-3D-Point-Clouds</link>
            <guid>https://velog.io/@estelle_y/%EB%85%BC%EB%AC%B8%EC%8A%A4%ED%84%B0%EB%94%94-Semantic-Graph-Based-Place-Recognition-for-3D-Point-Clouds</guid>
            <pubDate>Thu, 01 May 2025 18:10:30 GMT</pubDate>
            <description><![CDATA[<h2 id="bibtex-인용">BibTeX citation</h2>
<pre><code>@inproceedings{kong2020semantic,
  title={Semantic graph based place recognition for 3d point clouds},
  author={Kong, Xin and Yang, Xuemeng and Zhai, Guangyao and Zhao, Xiangrui and Zeng, Xianfang and Wang, Mengmeng and Liu, Yong and Li, Wanlong and Wen, Feng},
  booktitle={2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
  pages={8216--8223},
  year={2020},
  organization={IEEE}
}</code></pre><hr>
<h2 id="요약">Summary</h2>
<ul>
<li>For place recognition on 3D point clouds, it is hard to build a descriptor that is robust to occlusion and viewpoint changes</li>
<li>Most methods rely on local, global, or statistical features</li>
<li>This paper instead targets the semantic level, based on the human perspective</li>
<li>It recognises semantic objects and presents a graph-based approach</li>
<li>Place recognition is recast as a graph matching problem</li>
<li><a href="https://github.com/kxhit/SG_PR">The code is here</a>.</li>
</ul>
<hr>
<h2 id="인트로">Introduction</h2>
<ul>
<li>The most effective way to remove accumulated odometry drift is loop closing</li>
<li>Current place recognition strategies are mostly based on descriptor generation and feature distance measurement</li>
<li>LiDAR-based methods commonly extract local or global descriptors from the raw data, using either a neural network or a handcrafted design</li>
<li>This usually yields low-level features, e.g. local structure or distribution characteristics</li>
<li>Such features are sensitive to occlusion and rotation, and they ignore the relations between segments, which can be critical for scene expression</li>
<li>This paper uses a novel graph representation built by aggregating the semantic information of the point cloud</li>
<li>Because the graph-based representation accounts for topological relations, it makes the point cloud representation more efficient and comprehensible</li>
</ul>
<hr>
<h2 id="기여">Contributions</h2>
<ul>
<li>Presents a semantic graph representation for 3D point clouds<ul>
<li>Captures semantic information and models topological relations between objects</li>
</ul>
</li>
<li>Presents a graph similarity matching network that can be used for loop closure detection</li>
<li>Tests on Semantic KITTI and reports state-of-the-art reverse loop closure detection and robustness to occlusion and viewpoint changes</li>
</ul>
<hr>
<h2 id="methodology">Methodology</h2>
<ul>
<li>Key insights<ol>
<li>Use the human perspective</li>
<li>Use semantic-level descriptors</li>
<li>Encode relations among semantic objects</li>
</ol>
</li>
<li>Semantic segmentation of the raw points yields instance, semantic, and topological information, from which the semantic graph is built</li>
<li>The raw point clouds are then converted to topological semantic graphs, turning place recognition into a graph matching problem</li>
</ul>
<h3 id="a-semantic-graph-representation">A. Semantic Graph Representation</h3>
<p><strong>Semantic Segmentation for Point Clouds</strong></p>
<ul>
<li>Semantic objects are detected using RangeNet++ and Semantic KITTI; in the process some classes are merged or removed, leaving only 12 categories</li>
</ul>
<p><img src="https://velog.velcdn.com/images/estelle_y/post/70e7221a-2a0b-4910-8ee3-0049ea669b66/image.png" alt=""></p>
<ul>
<li>A different clustering radius is set for each category, and semantic instances are obtained by Euclidean clustering</li>
</ul>
<p><strong>Semantic Graph Construction</strong></p>
<ul>
<li>64채널 라이다가 보통 한 프레임 당 10만개 이상의 포인트를 capture하는데, 이거 너무 redundant함</li>
<li>줄이기위해서 down sampling이나 2D평면에 투영하는데, 우리는 topological semantic graph를 사용함<ul>
<li>concise하고 meaningful하며 semantic information과 topological relation이 잘 보존됨</li>
</ul>
</li>
<li>각 semantic instance들은 one hot encoding되어서 사용되고 유클리디안 디스턴스 기반으로 나타남</li>
<li>그 그래프가 scene에 대한 representation임 그래서 이제 similarity measurement problem으로 두 그래프를 비교할 수 있음</li>
</ul>
<h3 id="b-graph-similarity-network">B. Graph Similarity Network</h3>
<ul>
<li>Common graph similarity metrics are Graph Edit Distance (GED) and Maximum Common Subgraph (MCS), but computing them is NP-complete, so exact distances are hard to obtain</li>
<li>Because this is place recognition for loop closing, the similarity must also be permutation invariant and rotation invariant</li>
<li>Computing the classical similarity measures under those conditions is not feasible in reasonable time</li>
<li>So the paper proposes a graph similarity network for graph matching, inspired by SimGNN</li>
</ul>
<p><img src="https://velog.velcdn.com/images/estelle_y/post/f02af069-06a5-4f48-8fd1-0a43e23c82f2/image.png" alt=""></p>
<p><strong>Node Embedding</strong></p>
<ul>
<li>A Graph Convolutional Network aggregates features based on the relations between nodes, but the adjacency matrix has to be defined in advance</li>
<li>For point clouds it is therefore better to build the graph dynamically<ul>
<li>EdgeConv is used, as proposed in Dynamic Graph CNN (DGCNN)</li>
</ul>
</li>
<li>EdgeConv captures local geometric information and guarantees permutation invariance</li>
<li>Because the graph is updated dynamically, it lends itself to semantic grouping</li>
<li>EdgeConv layer<ul>
<li>A kNN search finds the $k$ nearest neighbours of each node in feature space and in Euclidean space</li>
<li>Each node feature is initialised with the centroid information and the one-hot encoded semantic label</li>
<li>The edge function is defined as $h_{\Theta} (f_i, f_j) = h_{\bar{\Theta}} \left( f_i, f_i - f_j^m \right)$</li>
<li>$\Theta$ is the set of learnable parameters; $f_i$ carries the global information, and the difference $f_i - f_j^m$ carries the local relation</li>
<li>For multimodal feature aggregation, independent convolutions are run at the spatial and semantic levels, and the resulting embeddings are concatenated</li>
</ul>
</li>
</ul>
<p><img src="https://velog.velcdn.com/images/estelle_y/post/2945826b-8d41-4f1e-90c5-b723996702ad/image.png" alt=""></p>
<p><strong>Graph Embedding</strong></p>
<ul>
<li>A graph embedding is usually produced from the node embeddings by a weighted or unweighted average</li>
<li>Here, inspired by SimGNN, an attention module with a learnable weight matrix estimates a weight for each node</li>
<li>The network learns which nodes are better suited to represent the graph</li>
<li>The global graph context $c$ is computed by averaging the node embeddings and applying $\tanh$ → $c = \tanh \left( \frac{1}{N} \sum_{i=1}^{N} u_i W \right)$</li>
<li>$c$ provides the overall structure and feature information of the graph; the weights are updated during training</li>
<li>Nodes that are similar to the global context receive higher attention<ul>
<li>The attention is the inner product of the global context and the node embedding, passed through a sigmoid $\sigma$ so that it lies in $[0,~1]$</li>
</ul>
</li>
<li>The final graph embedding is the attention-weighted sum of the node embeddings</li>
<li>$e = \sum_{i=1}^{N} \sigma \left( u_i \tanh \left( \frac{1}{N} \sum_{m=1}^{N} u_m W \right)^T \right) u_i$</li>
</ul>
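<p>The two formulas above can be traced with a short sketch. Node embeddings are rows of <code>U</code> and <code>W</code> is an $F \times F$ matrix, both as nested lists; the function name and the toy values are assumptions, and a real implementation would learn <code>W</code>.</p>

```python
import math

def graph_embedding(U, W):
    """U: N node embeddings (rows of length F); W: F x F weight matrix."""
    n, f = len(U), len(U[0])
    mean = [sum(u[d] for u in U) / n for d in range(f)]
    # Global context c = tanh(mean(U) W), equal to tanh((1/N) sum_i u_i W) by linearity
    c = [math.tanh(sum(mean[k] * W[k][d] for k in range(f))) for d in range(f)]
    # Attention a_i = sigmoid(u_i . c): one scalar per node
    att = [1.0 / (1.0 + math.exp(-sum(ui * ci for ui, ci in zip(u, c)))) for u in U]
    # Graph embedding e = sum_i a_i u_i
    e = [sum(a * u[d] for a, u in zip(att, U)) for d in range(f)]
    return e, att

U = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]   # toy node embeddings
W = [[1.0, 0.0], [0.0, 1.0]]               # identity stands in for learned weights
e, att = graph_embedding(U, W)
print(e, att)
```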
<p><strong>Graph-Graph Interaction</strong></p>
<ul>
<li>A neural tensor network (NTN) estimates the relation between the two graph-level embeddings<ul>
<li>An NTN uses a bilinear layer instead of a linear layer to learn the relation between two vectors → more expressive than an inner product</li>
<li>The relation between the graph-level embeddings is given by</li>
<li>$g(e_1, e_2) = \text{ReLU} \left( e_1^T \omega_{[1:S]} e_2 + \alpha \begin{bmatrix} e_1 \\ e_2 \end{bmatrix} + b \right)$</li>
<li>The first term is a bilinear tensor product that learns the relation between the two graph embeddings $e_1 ,~e_2$</li>
<li>The second term concatenates the two embeddings and applies a linear map, which learns additional features</li>
<li>The third term is simply the bias</li>
</ul>
</li>
</ul>
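<p>A minimal sketch of the NTN formula, with $\omega_{[1:S]}$ as a list of $S$ square slices <code>W</code>, $\alpha$ as an $S \times 2F$ matrix <code>V</code>, and $b$ as a length-$S$ bias. The argument names and shapes are assumptions for the example.</p>

```python
def ntn(e1, e2, W, V, b):
    """W: S slices of F x F matrices; V: S x 2F matrix; b: length-S bias."""
    concat = list(e1) + list(e2)
    out = []
    for s in range(len(W)):
        # Bilinear term e1^T W[s] e2
        bilinear = sum(e1[i] * W[s][i][j] * e2[j]
                       for i in range(len(e1)) for j in range(len(e2)))
        # Linear term on the concatenated embeddings
        linear = sum(v * x for v, x in zip(V[s], concat))
        out.append(max(0.0, bilinear + linear + b[s]))  # ReLU
    return out

W = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [0.0, 0.0]]]
V = [[0.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0]]
scores = ntn([1.0, 2.0], [3.0, 4.0], W, V, [0.0, 0.0])
print(scores)
```

<p>With an identity slice the bilinear term reduces to the plain inner product, which shows how the NTN generalises it: every further slice adds another learned pairwise interaction.</p>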
<p><strong>Graph Similarity</strong></p>
<ul>
<li>A fully connected (FC) layer computes the similarity</li>
<li>The final output is a score in $[0,~1]$, which turns the task into a binary classification problem</li>
<li>The feature vector from the NTN is passed through the FC layer to obtain a single scalar score, which is normalised with a sigmoid</li>
<li>The loss is binary cross-entropy (BCE), since the ground-truth labels are binary: $L = - \frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right]$</li>
</ul>
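<p>The scoring head and the loss can be written out directly. This is a sketch with a single linear layer standing in for the FC layer; the weights, the clipping constant <code>eps</code>, and the function names are assumptions.</p>

```python
import math

def similarity_score(features, weights, bias):
    """One linear layer followed by a sigmoid, giving a score in (0, 1)."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def bce_loss(y_true, y_pred, eps=1e-12):
    """Binary cross-entropy averaged over N pairs, as in the formula above."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)   # clip to keep log() finite
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)

preds = [similarity_score([11.0, 5.0], [0.2, -0.1], -1.0),
         similarity_score([0.5, 0.0], [0.2, -0.1], -1.0)]
print(bce_loss([1, 0], preds))
```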
<p><img src="https://velog.velcdn.com/images/estelle_y/post/0d219390-f20c-4c3c-8420-a2dccbac189c/image.png" alt=""></p>
<h2 id="experiment">Experiment</h2>
<ul>
<li>In short, the method works well; recall in particular is good</li>
</ul>
<p><img src="https://velog.velcdn.com/images/estelle_y/post/9fde2b7b-72e2-45b9-8470-ec7b0f24dd0e/image.png" alt=""></p>
<ul>
<li>Precision is somewhat low on a few sequences, but overall the results look solid</li>
</ul>
<p><img src="https://velog.velcdn.com/images/estelle_y/post/98accae4-86e6-466b-bd68-0fc8e17ee8a5/image.png" alt=""></p>
<ul>
<li>Detection is evaluated against a threshold distance, and the method again performs well</li>
<li>The remaining performance results point in the same direction</li>
</ul>
]]></description>
        </item>
        <item>
            <title><![CDATA[[WIP] ORB-SLAM3: An Accurate Open-Source Library for Visual, Visual–Inertial, and Multimap]]></title>
            <link>https://velog.io/@estelle_y/WIP-ORB-SLAM3-An-Accurate-Open-Source-Library-for-Visual-VisualInertial-and-Multimap</link>
            <guid>https://velog.io/@estelle_y/WIP-ORB-SLAM3-An-Accurate-Open-Source-Library-for-Visual-VisualInertial-and-Multimap</guid>
            <pubDate>Tue, 18 Mar 2025 15:32:02 GMT</pubDate>
            <description><![CDATA[<h1 id="orb-slam3-an-accurate-open-source-library-for-visual-visualinertial-and-multimap">ORB-SLAM3: An Accurate Open-Source Library for Visual, Visual–Inertial, and Multimap</h1>
<p>Date: 2021
Journal: T-RO</p>
<h1 id="3-system-overview">3. System Overview</h1>
<h2 id="1-atlas">1) Atlas</h2>
<p>A multi-map representation composed of a set of disconnected maps</p>
<p>The active map is the one in which the tracking thread localizes incoming frames; the others are non-active maps</p>
<h2 id="2-tracking-thread">2) Tracking thread</h2>
<p>Computes the pose of the current frame with respect to the active map in real time</p>
<p>If tracking is lost, it tries to relocalize the current frame in all the Atlas maps</p>
<h2 id="3-local-mapping-thread">3) Local mapping thread</h2>
<p>Adds keyframes and points to the active map, removes redundant ones, and refines the map using visual or visual-inertial bundle adjustment (BA)</p>
<h2 id="4-loop-and-map-merging-thread">4) Loop and map merging thread</h2>
<p>Detects common regions between the active map and the whole Atlas at keyframe rate</p>
<p>If the common region belongs to the active map, it performs loop correction</p>
<p>If it belongs to a different map, both maps are merged into a single one</p>
<p><a href="https://velog.io/@estelle_y/%EB%85%BC%EB%AC%B8%EC%8A%A4%ED%84%B0%EB%94%94-ORB-SLAM-a-Versatile-and-Accurate-Monocular-SLAM-System">ORB SLAM summary</a>
<a href="https://velog.io/@estelle_y/WIP-ORB-SLAM2-an-Open-Source-SLAM-System-for-Monocular-Stereo-and-RGB-D-Cameras">WIP ORB SLAM2 summary</a>
<a href="https://velog.io/@estelle_y/WIP-ORB-SLAM3-An-Accurate-Open-Source-Library-for-Visual-VisualInertial-and-Multimap">WIP ORB SLAM3 summary</a></p>
]]></description>
        </item>
        <item>
            <title><![CDATA[[WIP] ORB-SLAM2: an Open-Source SLAM System for Monocular, Stereo and RGB-D Cameras]]></title>
            <link>https://velog.io/@estelle_y/WIP-ORB-SLAM2-an-Open-Source-SLAM-System-for-Monocular-Stereo-and-RGB-D-Cameras</link>
            <guid>https://velog.io/@estelle_y/WIP-ORB-SLAM2-an-Open-Source-SLAM-System-for-Monocular-Stereo-and-RGB-D-Cameras</guid>
            <pubDate>Tue, 18 Mar 2025 15:30:43 GMT</pubDate>
            <description><![CDATA[<h1 id="orb-slam2-an-open-source-slam-system-for-monocular-stereo-and-rgb-d-cameras">ORB-SLAM2: an Open-Source SLAM System for Monocular, Stereo and RGB-D Cameras</h1>
<p>Date: 2017
Journal: T-RO</p>
<h1 id="3-orb-slam2">3. ORB SLAM2</h1>
<h2 id="c-bundle-adjustment-with-monocular-and-stereo-constraints">c. Bundle Adjustment with Monocular and Stereo Constraints</h2>
<h3 id="motion-anly-ba">Motion-only BA</h3>
<p>Optimizes the camera orientation and position by minimizing the reprojection error between matched 3D points in world coordinates and keypoints</p>
<h3 id="local-ba">Local BA</h3>
<p>Optimizes a set of covisible keyframes and all points seen in those keyframes</p>
<p>All other keyframes that are not in the covisible set contribute to the cost function but are not themselves optimized</p>
<h3 id="full-ba">Full BA</h3>
<p>A special case of local BA in which all keyframes and points in the map are optimized</p>
<p><a href="https://velog.io/@estelle_y/%EB%85%BC%EB%AC%B8%EC%8A%A4%ED%84%B0%EB%94%94-ORB-SLAM-a-Versatile-and-Accurate-Monocular-SLAM-System">ORB SLAM summary</a>
<a href="https://velog.io/@estelle_y/WIP-ORB-SLAM2-an-Open-Source-SLAM-System-for-Monocular-Stereo-and-RGB-D-Cameras">WIP ORB SLAM2 summary</a>
<a href="https://velog.io/@estelle_y/WIP-ORB-SLAM3-An-Accurate-Open-Source-Library-for-Visual-VisualInertial-and-Multimap">WIP ORB SLAM3 summary</a></p>
]]></description>
        </item>
        <item>
            <title><![CDATA[[WIP] PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space]]></title>
            <link>https://velog.io/@estelle_y/WIP-PointNet-Deep-Hierarchical-Feature-Learning-on-Point-Sets-in-a-Metric-Space</link>
            <guid>https://velog.io/@estelle_y/WIP-PointNet-Deep-Hierarchical-Feature-Learning-on-Point-Sets-in-a-Metric-Space</guid>
            <pubDate>Tue, 18 Mar 2025 15:21:54 GMT</pubDate>
            <description><![CDATA[<h1 id="pointnet-deep-hierarchical-feature-learning-on-point-sets-in-a-metric-space">PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space</h1>
<p>Date: 2017
Journal: NeurIPS</p>
<h1 id="1-introduction">1 Introduction</h1>
<p>Exploiting local structure has proven to be important for the success of convolutional architectures</p>
<p>A CNN takes data defined on regular grids as input and is able to progressively capture features at increasingly larger scales along a multi-resolution hierarchy</p>
<p>PointNet++ is a hierarchical neural network that processes a set of points sampled in a metric space in a hierarchical fashion</p>
<p>It first partitions the set of points into overlapping local regions using the distance metric of the underlying space</p>
<p>Two issues are addressed by PointNet++</p>
<ol>
<li>How to generate the partitioning of the point set</li>
<li>How to abstract sets of points or local features through a local feature learner</li>
</ol>
<p>PointNet++ applies PointNet recursively on a nested partitioning of the input set</p>
<p>Unlike CNNs, where smaller kernels often enhance performance, point cloud data can be sparse, making small scales inadequate</p>
<p>PointNet++ addresses this by using multi-scale neighborhoods, adapting to different scales during training, and achieving superior results on 3D point cloud benchmarks</p>
<h1 id="2-problem-statement">2. Problem Statement</h1>
<h1 id="3-method">3. Method</h1>
<p>Extension of PointNet with added hierarchical structure</p>
<h2 id="31-review-of-pointnet">3.1 Review of PointNet</h2>
<p>Invariant to point permutations and can arbitrarily approximate any continuous set function</p>
<p>Lacks the ability to capture local context at different scales</p>
<h2 id="32-hierarchical-point-set-feature-learning">3.2 Hierarchical Point Set Feature Learning</h2>
<p>Uses a hierarchical grouping of points and progressively abstracts larger and larger local regions along the hierarchy</p>
<p>The hierarchical structure is composed of a number of set abstraction levels</p>
<p>Three key layers: Sampling layer, Grouping layer, PointNet layer</p>
<h3 id="sampling-layer">Sampling layer</h3>
<p>Iterative farthest point sampling to choose a subset of points</p>
<p>Generates receptive fields in a data dependent manner</p>
<h3 id="grouping-layer">Grouping layer</h3>
<p>Groups the input point set, an $N \times (d + C)$ matrix, into an output of size $N&#39; \times K \times (d + C)$</p>
<p>$K$ is the number of points in neighborhood</p>
<h3 id="pointnet-layer">PointNet layer</h3>
<p>Each local region is abstracted by its centroid and a local feature that encodes the centroid’s neighbourhood</p>
<p>Output size of $N&#39; \times (d + C&#39;)$</p>
<h2 id="33-robust-feature-learning-under-non-uniform-sampling-density">3.3 Robust Feature Learning under Non-Uniform Sampling Density</h2>
<h3 id="msg-multi-scale-grouping">MSG Multi-scale grouping</h3>
<p>Captures multi-scale patterns by applying grouping layers with different scales, each followed by a corresponding PointNet that extracts the features of that scale</p>
<p>The features at the different scales are concatenated to form a multi-scale feature</p>
<p>Trained with random input dropout</p>
<h3 id="mrg-multi-resolution-grouping">MRG Multi-resolution grouping</h3>
<p>The number of centroid points is usually large at the lowest level, which increases the time cost</p>
<p>Multi-resolution grouping is used instead, with region features built from two vectors</p>
<p>One vector is obtained by summarizing the features of each subregion from the lower level</p>
<p>The other vector is obtained by directly processing all raw points in the local region</p>
<h2 id="34-point-feature-propagation-for-set-segmentation">3.4 Point Feature Propagation for set Segmentation</h2>
]]></description>
        </item>
        <item>
            <title><![CDATA[[WIP]  PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation]]></title>
            <link>https://velog.io/@estelle_y/WIP-PointNet-Deep-Learning-on-Point-Sets-for-3D-Classification-and-Segmentation</link>
            <guid>https://velog.io/@estelle_y/WIP-PointNet-Deep-Learning-on-Point-Sets-for-3D-Classification-and-Segmentation</guid>
            <pubDate>Tue, 18 Mar 2025 15:19:49 GMT</pubDate>
            <description><![CDATA[<h1 id="pointnet-deep-learning-on-point-sets-for-3d-classification-and-segmentation">PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation</h1>
<p>Date: 2017
Journal: CVPR</p>
<h1 id="1-introduction">1. Introduction</h1>
<p>Point clouds and meshes are not in a regular format, which creates the need to transform them into 3D voxel grids or collections of images</p>
<p>This data representation transformation renders the resulting data unnecessarily voluminous</p>
<p>PointNets simply use point clouds</p>
<p>As a point cloud is just a set of points, the basic architecture is simple: in the initial stages each point is processed identically and independently</p>
<p>PointNet is trained to perform 3D shape classification, shape part segmentation, and scene semantic parsing tasks</p>
<h1 id="2-related-work">2. Related Work</h1>
<h2 id="point-cloud-feature">Point Cloud Feature</h2>
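<p>Point features often encode certain statistical properties of points and are designed to be invariant to certain transformations; they are typically classified as intrinsic or extrinsic, and can also be categorized as local or global features</p>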
<h2 id="deep-learning-on-3d-data">Deep Learning on 3D Data</h2>
<h2 id="deep-learning-on-unordered-sets">Deep Learning on Unordered Sets</h2>
<p>One recent work used a read-process-write network with an attention mechanism to consume unordered input sets</p>
<h1 id="3-problem-statement">3. Problem Statement</h1>
<p>For object classification task, the input point cloud is either directly sampled from a shape or pre-segmented from a scene point cloud</p>
<h1 id="4-deep-learning-on-point-sets">4. Deep Learning on Point Sets</h1>
<h2 id="41-properties-of-point-sets-in-r-n">4.1. Properties of Point Sets in $\R ^n$</h2>
<p>The input is a subset of points from a Euclidean space</p>
<p>Three properties: the points are unordered, nearby points interact, and the learned representation of the point set should be invariant to certain transformations</p>
<h2 id="42-pointnet-architecture">4.2. PointNet Architecture</h2>
<p>Three key modules, max pooling layer as a symmetric function to aggregate information, local and global information combination structure, two joint alignment networks</p>
<h3 id="symmetry-function-for-unordered-input">Symmetry Function for Unordered Input</h3>
<ul>
<li>Sort input into a canonical order</li>
</ul>
<p>Sorting does not fully resolve the ordering issue</p>
<p>An MLP applied to the sorted point set performs only slightly better than on unsorted input, and still poorly</p>
<ul>
<li>Treat input as a sequence to train RNN</li>
</ul>
<p>The idea is that training on randomly permuted sequences makes the RNN invariant to input order</p>
<p>However, for an RNN the order does matter and cannot be totally omitted</p>
<ul>
<li>Simple symmetric function to aggregate information from each point</li>
</ul>
<p>Approximates a general function defined on a point set by applying a symmetric function to transformed elements in the set</p>
<p>The simplicity of the module makes theoretical analysis possible</p>
<h3 id="local-and-global-information-aggregation">Local and Global Information Aggregation</h3>
<p>Point segmentation requires a combination of local and global knowledge</p>
<p>After computing the global point cloud feature vector, feed it back to per point feature</p>
<p>Extract new per point features based on the combined point features</p>
<h3 id="joint-alignment-network">Joint Alignment Network</h3>
<p>Predict affine transformation matrix by a mini network and directly apply this transformation to coordinates of input points</p>
<p>The mini network itself resembles the big network and is composed of basic modules: point-independent feature extraction, max pooling, and fully connected layers</p>
<p>Transformation matrix in the feature space has much higher dimension than the spatial transform matrix</p>
<p>Therefore add a regularization term to our softmax training loss</p>
<h2 id="42-theoretical-analysis">4.3. Theoretical Analysis</h2>
<h3 id="universal-approximation">Universal approximation</h3>
<p>Ability of the network to approximate continuous set functions</p>
<p>Given enough neurons at the max pooling layer, such a function can be approximated arbitrarily closely</p>
<h3 id="theorem-1">Theorem 1.</h3>
<p>Suppose $f : \mathcal{X} \rarr \R$ is a continuous set function with respect to the Hausdorff distance</p>
<p>In the worst case the network can learn to convert a point cloud into a volumetric representation by partitioning the space into equal sized voxels</p>
<h3 id="bottleneck-dimension-and-stability">Bottleneck dimension and stability</h3>
<p>Expressiveness of network is strongly affected by the dimension of the max pooling layer</p>
<p>Define the sub-network of $f$ that maps a point set in $[0, ~ 1] ^m$ to a $K$ dimensional vector</p>
<h3 id="theorem-2">Theorem 2.</h3>
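<p>Theorem 2 states that $f(S)$ is unchanged by input corruption as long as all points of a critical set are preserved</p>
<p>It is also unchanged by extra noise points up to $\mathcal{N}_S$</p>
<p>This robustness is analogous to the sparsity principle in machine learning models</p>
<p>Intuitively, the network learns to summarize a shape by a sparse set of key points</p>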
]]></description>
        </item>
        <item>
            <title><![CDATA[[WIP] Semantic KITTI: A Dataset for Semantic Scene Understanding of LiDAR Sequences]]></title>
            <link>https://velog.io/@estelle_y/WIP-Semantic-KITTI-A-Dataset-for-Semantic-Scene-Understanding-of-LiDAR-Sequences</link>
            <guid>https://velog.io/@estelle_y/WIP-Semantic-KITTI-A-Dataset-for-Semantic-Scene-Understanding-of-LiDAR-Sequences</guid>
            <pubDate>Tue, 18 Mar 2025 15:17:09 GMT</pubDate>
            <description><![CDATA[<h1 id="semantic-kitti-a-dataset-for-semantic-scene-understanding-of-lidar-sequences">Semantic KITTI: A Dataset for Semantic Scene Understanding of LiDAR Sequences</h1>
<p>Date: 2019
Journal: ICCV</p>
<h1 id="1-introduction">1. Introduction</h1>
<p>LiDAR sensors are not affected by lighting, providing precise distance measurements</p>
<p>SemanticKITTI focuses on laser based semantic segmentation and semantic scene completion</p>
<h1 id="2-related-work">2. Related Work</h1>
<h1 id="3-the-semantic-kitti">3. The Semantic KITTI</h1>
<h2 id="31-labeling-process">3.1. Labeling Process</h2>
<p>Loop close the sequences using an off the shelf laser based SLAM system</p>
<p>Subdivide the sequence of point clouds into tiles of 100m by 100m</p>
<p>For each tile, load the scans that overlap with it, which makes it possible to label all scans consistently</p>
<h2 id="32-dataset-statistics">3.2. Dataset Statistics</h2>
<p>The class counts are unbalanced, which is common for data from natural environments</p>
<h1 id="4-evaluation-of-semantic-segmentation">4. Evaluation of Semantic Segmentation</h1>
<h2 id="41-single-scan-experiments">4.1. Single Scan Experiments</h2>
<h3 id="task-and-metrics">Task and Metrics</h3>
<p>Uses the commonly applied mean Jaccard index, or mean intersection over union (mIoU), as the metric</p>
<p>Moving and non-moving objects cannot be expected to be distinguished from a single scan</p>
<h3 id="state-of-the-art">State of the Art</h3>
<p>Feature extraction and classification are replaced by end-to-end deep neural networks (CNNs) with 3D convolutions for object classification and semantic segmentation</p>
<p>To overcome the limitations of the voxel-based representation, such as exploding memory consumption, recent approaches either upsample voxel predictions using a CRF or use different representations</p>
<h3 id="baseline-approaches">Baseline approaches</h3>
]]></description>
        </item>
        <item>
            <title><![CDATA[[WIP] Deep SORT: SIMPLE ONLINE AND REALTIME TRACKING WITH A DEEP ASSOCIATION METRIC]]></title>
            <link>https://velog.io/@estelle_y/WIP-Deep-SORT-SIMPLE-ONLINE-AND-REALTIME-TRACKING-WITH-A-DEEP-ASSOCIATION-METRIC</link>
            <guid>https://velog.io/@estelle_y/WIP-Deep-SORT-SIMPLE-ONLINE-AND-REALTIME-TRACKING-WITH-A-DEEP-ASSOCIATION-METRIC</guid>
            <pubDate>Tue, 18 Mar 2025 15:15:48 GMT</pubDate>
            <description><![CDATA[<h1 id="deep-sort-simple-online-and-realtime-tracking-with-a-deep-association-metric">Deep SORT: SIMPLE ONLINE AND REALTIME TRACKING WITH A DEEP ASSOCIATION METRIC</h1>
<p>Date: 2017
Journal: ICIP</p>
<h1 id="1-introduction">1. Introduction</h1>
<p>SORT is a simple framework that performs Kalman filtering in image space and frame-by-frame data association using the Hungarian method, with an association metric that measures bounding box overlap</p>
<p>But it returns a relatively high number of identity switches, because the employed association metric is only accurate when state estimation uncertainty is low</p>
<p>To overcome this issue, the association metric is replaced with a more informed metric that combines motion and appearance information</p>
<p>Deep SORT increases robustness against misses and occlusions while keeping the system easy to implement, efficient, and applicable to online scenarios</p>
<h1 id="2-sort-with-deep-association-metric">2. SORT with Deep Association Metric</h1>
<h2 id="21-track-handling-and-state-estimation">2.1. Track Handling and State Estimation</h2>
<p>The track handling and Kalman filtering framework is mostly identical to the original formulation</p>
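<p>State space is defined as $(u, v, \gamma, h, \dot{x}, \dot{y}, \dot{\gamma}, \dot{h})$</p>
<p>$(u, v)$ is the bounding box center position, $\gamma$ the aspect ratio, and $h$ the height; the remaining terms are their respective velocities in image coordinates</p>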
<p>Tracks that exceed a predefined maximum age $A_{max}$ are considered to have left the scene and are deleted from the track set</p>
<h2 id="22-assignment-problem">2.2. Assignment Problem</h2>
<p>Motion and appearance information are integrated by combining two appropriate metrics. For motion, the (squared) Mahalanobis distance between predicted Kalman states and newly arrived measurements is used</p>
<p>$d^{(1)} (i, ~j) = (d_j - y_i ) ^T S_i ^{-1} (d_j - y_i )$ </p>
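<p>A worked sketch of this distance, assuming $d_j$ is the $j$-th detection and $(y_i, S_i)$ are the mean and covariance of the $i$-th track projected into measurement space. For brevity the example uses a hypothetical 2D measurement space with a hand-written $2 \times 2$ inverse; the numbers are made up for illustration.</p>

```python
def mahalanobis_squared(detection, track_mean, track_cov):
    # Invert the 2x2 covariance S_i explicitly (assumes it is non-singular).
    (a, b), (c, d) = track_cov
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    # diff = d_j - y_i
    diff = [detection[0] - track_mean[0], detection[1] - track_mean[1]]
    tmp = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
           inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    # diff^T S^-1 diff
    return diff[0] * tmp[0] + diff[1] * tmp[1]


track_mean = (10.0, 20.0)
detection = (12.0, 23.0)
confident_cov = [[4.0, 0.0], [0.0, 9.0]]
uncertain_cov = [[16.0, 0.0], [0.0, 36.0]]

distance_confident = mahalanobis_squared(detection, track_mean, confident_cov)
distance_uncertain = mahalanobis_squared(detection, track_mean, uncertain_cov)
```

<p>With the first covariance the detection is one standard deviation away in each axis, giving a squared distance of about 2. Quadrupling the covariance shrinks the distance to about 0.5 for the same detection, so a more uncertain track appears closer.</p>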
<h2 id="23-matching-cascade">2.3. Matching Cascade</h2>
<p>The Mahalanobis distance favors larger uncertainty, because it effectively reduces the distance in standard deviations of any detection towards the projected track mean</p>
<p>This is undesired behaviour, as it can lead to increased track fragmentation and unstable tracks. The matching cascade therefore gives priority to more frequently seen objects</p>
<p>In a final matching stage, intersection over union association is run as proposed in the original SORT algorithm</p>
<h2 id="24-deep-appearance-descriptor">2.4. Deep Appearance Descriptor</h2>
<p>A wide residual network with two convolutional layers followed by six residual blocks is employed</p>
]]></description>
        </item>
        <item>
            <title><![CDATA[[WIP] SORT: SIMPLE ONLINE AND REALTIME TRACKING]]></title>
            <link>https://velog.io/@estelle_y/WIP-SORT-SIMPLE-ONLINE-AND-REALTIME-TRACKING</link>
            <guid>https://velog.io/@estelle_y/WIP-SORT-SIMPLE-ONLINE-AND-REALTIME-TRACKING</guid>
            <pubDate>Tue, 18 Mar 2025 15:14:27 GMT</pubDate>
            <description><![CDATA[<h1 id="sort-simple-online-and-realtime-tracking">SORT: SIMPLE ONLINE AND REALTIME TRACKING</h1>
<p>Date: 2016
Journal: ICIP</p>
<h1 id="1-introduction">1. Introduction</h1>
<p>The MOT problem can be viewed as a data association problem where the aim is to associate detections across frames in a video sequence</p>
<p>There is a resurgence of mature data association techniques including Multiple Hypothesis Tracking (MHT) and Joint Probabilistic Data Association (JPDA), which occupy many of the top positions of the MOT benchmark</p>
<p>Traditional trackers are too slow for realtime applications</p>
<p>SORT focuses on efficient and reliable handling of the common frame-to-frame associations. Rather than aiming to be robust to detection errors, it exploits recent advances in visual object detection to solve the detection problem directly</p>
<h1 id="2-literature-review">2. Literature review</h1>
<p>Traditional MOT methods such as MHT and JPDA delay making difficult decisions while there is high uncertainty over the object assignments</p>
<p>Many online tracking methods aim to build appearance models of either the individual objects themselves or a global model through online learning</p>
<p>When considering only one-to-one correspondences modelled as bipartite graph matching, globally optimal solutions such as the Hungarian algorithm can be used</p>
<h1 id="3-methodology">3. Methodology</h1>
<h2 id="31-detection">3.1 Detection</h2>
<p>Utilizes the Faster Region CNN (FrCNN) detection framework, an end-to-end framework that consists of two stages</p>
<p>The first stage extracts features and proposes regions, and the second stage classifies the objects in the proposed regions</p>
<p>The network architecture itself can be swapped for any design</p>
<h2 id="32-estimation-model">3.2 Estimation Model</h2>
<p>The inter-frame displacements of each object are approximated with a linear constant velocity model which is independent of other objects and camera motion</p>
<p>When a detection is associated to a target, the detected bounding box is used to update the target state, where the velocity components are solved optimally via a Kalman filter framework</p>
<p>If no detection is associated to the target, its state is simply predicted without correction using the linear velocity model</p>
<h2 id="33-data-association">3.3 Data Association</h2>
<p>The assignment cost matrix is then computed as the intersection over union (IOU) distance between each detection and all predicted bounding boxes from the existing targets</p>
<p>The assignment is solved optimally using the Hungarian algorithm</p>
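<p>The association step can be sketched as follows. This is an illustration under stated assumptions: boxes are given as hypothetical <code>(x1, y1, x2, y2)</code> tuples, the cost is taken as 1 - IOU, and a brute-force search over permutations stands in for the Hungarian algorithm. It finds the same optimum on tiny inputs but does not scale, and it assumes equal numbers of detections and predictions.</p>

```python
from itertools import permutations


def iou(box_a, box_b):
    # Boxes are (x1, y1, x2, y2) corner coordinates with positive area.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)


def associate(detections, predictions):
    # Cost matrix of IOU distances between every detection and every prediction.
    cost = [[1.0 - iou(d, p) for p in predictions] for d in detections]
    # Brute-force stand-in for the Hungarian algorithm (equal counts assumed).
    best = min(permutations(range(len(predictions))),
               key=lambda perm: sum(cost[i][j] for i, j in enumerate(perm)))
    return list(enumerate(best))


detections = [(0, 0, 10, 10), (20, 20, 30, 30)]
predictions = [(21, 20, 31, 29), (1, 1, 11, 11)]
matches = associate(detections, predictions)
```

<p>Here detection 0 is matched to prediction 1 and detection 1 to prediction 0, since those pairs overlap and the others do not.</p>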
<h2 id="34-creation-and-deletion-of-track-identities">3.4 Creation and deletion of Track Identities</h2>
<p>Any detection with an overlap less than $IOU_{min}$ is considered to signify the existence of an untracked object</p>
<p>Tracks are terminated if they are not detected for $T_{Lost}$ frames, which prevents an unbounded growth in the number of trackers and localisation errors</p>
<p>A small $T_{Lost}$ causes early deletion of lost targets, which aids efficiency</p>
]]></description>
        </item>
        <item>
            <title><![CDATA[[WIP] Attention Is All You Need]]></title>
            <link>https://velog.io/@estelle_y/Attention-Is-All-You-Need</link>
            <guid>https://velog.io/@estelle_y/Attention-Is-All-You-Need</guid>
            <pubDate>Tue, 18 Mar 2025 15:12:18 GMT</pubDate>
            <description><![CDATA[<p>Date: 2017
Journal: NIPS</p>
<h1 id="1-introduction">1 Introduction</h1>
<h2 id="background">Background</h2>
<p>RNNs, LSTMs, and GRUs have been firmly established as the state of the art in sequence modeling tasks such as language modeling and machine translation</p>
<h2 id="problem">Problem</h2>
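<p>Recurrent models compute sequentially along the positions of the sequence, which precludes parallelization within training examples. This becomes critical at longer sequence lengths, as memory constraints limit batching across examples</p>
<p>Many solutions (factorization tricks, conditional computation) improved computational efficiency, and conditional computation also improved model performance</p>
<p>But the fundamental constraint of sequential computation remains</p>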
<h1 id="2-background">2 Background</h1>
<p>To reduce sequential computation, many CNN-based models were proposed</p>
<p>These models compute hidden representations in parallel for all input and output positions</p>
<p>But the number of operations required to relate signals from two arbitrary input or output positions grows with the distance between the positions, which makes it more difficult to learn dependencies between distant positions</p>
<p>In the Transformer, this is reduced to a constant number of operations</p>
<p>This comes at the cost of reduced effective resolution due to averaging attention-weighted positions, an effect counteracted with Multi-Head Attention</p>
<p>Self-attention is an attention mechanism relating different positions of a single sequence to compute a representation of the sequence</p>
<h1 id="3-model-architecture">3 Model Architecture</h1>
<p>Most competitive neural sequence transduction models have an encoder decoder structure</p>
<p>Encoder - maps an input sequence of symbol representations $(x_1, \dots, x_n)$ to a sequence of continuous representations $(z_1, \dots, z_n)$
Decoder - given the continuous representations, generates an output sequence $(y_1, \dots, y_m)$ one element at a time</p>
<p>Transformer follows this overall architecture using self-attention and point-wise, fully connected layers for both encoder and decoder</p>
<h2 id="31-encoder-and-decoder-stacks">3.1 Encoder and Decoder Stacks</h2>
<h3 id="encoder">Encoder</h3>
<p>Composed of 6 identical layers</p>
<p>Each layer has two sub-layers - a multi-head self-attention mechanism and a position-wise fully connected feed-forward network</p>
<p>Each sub-layer has a residual connection around it, followed by layer normalization</p>
<h3 id="decoder">Decoder</h3>
<p>Composed of 6 identical layers</p>
<p>Each layer has three sub-layers - the two sub-layers of the encoder layer, plus a third that performs multi-head attention over the output of the encoder stack</p>
<p>Each sub-layer has a residual connection followed by layer normalization, as in the encoder</p>
<p>The self-attention sub-layer is modified (masked) to ensure that the prediction for position $i$ can depend only on the known outputs at positions less than $i$</p>
<h2 id="32-attention">3.2 Attention</h2>
<p>Attention function can be described as mapping a query and a set of key-value pairs to an output</p>
<p>Output is computed as a weighted sum of the values, where the weight assigned to each value is computed by a compatibility function of the query with the corresponding key</p>
<h3 id="321-scaled-dot-product-attention">3.2.1 Scaled Dot-Product Attention</h3>
<p>The weights on the values (dimension $d_v$) are obtained by computing the dot products of the query with all keys (both of dimension $d_k$), dividing each by $\sqrt{d_k}$, and applying a softmax function</p>
<p>The two most commonly used attention functions are additive attention and dot-product attention</p>
<p>The two have similar theoretical complexity, but dot-product attention is much faster and more space-efficient in practice since it can be implemented using highly optimized matrix multiplication</p>
<p>If $d_k$ is small the two perform similarly, but for larger $d_k$ additive attention outperforms dot-product attention without scaling, presumably because large dot products push the softmax into regions with extremely small gradients</p>
<p>To counteract this, the dot products are scaled by $\frac{1}{\sqrt{d_k}}$</p>
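<p>A minimal pure-Python sketch of scaled dot-product attention for small inputs, written with nested lists instead of optimized matrix multiplication so that each step stays visible. The function names and toy inputs are illustrative assumptions.</p>

```python
import math


def softmax(scores):
    # Subtract the maximum for numerical stability.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Dot product of the query with every key, scaled by 1 / sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the values.
        outputs.append([sum(w * v[c] for w, v in zip(weights, values))
                        for c in range(len(values[0]))])
    return outputs


queries = [[1.0, 0.0]]
keys = [[1.0, 0.0], [1.0, 0.0]]
values = [[1.0, 2.0], [3.0, 4.0]]
attended = scaled_dot_product_attention(queries, keys, values)
```

<p>Both keys are identical here, so the query weights the two values equally and the output is their average.</p>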
<h3 id="322-multi-head-attention">3.2.2 Multi-Head Attention</h3>
<p>Linearly projecting the queries, keys, and values $h$ times with different learned linear projections to $d_k$, $d_k$, and $d_v$ dimensions respectively is more beneficial than performing a single attention function with $d_{model}$-dimensional keys, values, and queries</p>
<p>The attention function is performed in parallel on each projected version of the queries, keys, and values, yielding $d_v$-dimensional output values, which are concatenated and projected once more</p>
<p>Because each head has reduced dimension, the total computational cost is similar to that of single-head attention with full dimensionality</p>
<h3 id="323-application-of-attention-in-out-model">3.2.3 Application of Attention in our Model</h3>
<p>The Transformer uses multi-head attention in three different ways</p>
<ol>
<li>Mimics the typical encoder-decoder attention mechanism in sequence-to-sequence models
In the “encoder-decoder attention” layers, the queries come from the previous decoder layer, and the memory keys and values come from the output of the encoder
This allows every position in the decoder to attend over all positions in the input sequence</li>
<li>Allows each position in the encoder to attend to all positions in the previous layer of the encoder
In a self-attention layer of the encoder, all of the keys, values, and queries come from the output of the previous layer in the encoder</li>
<li>Similarly, self-attention layers in the decoder allow each position in the decoder to attend to all positions in the decoder up to and including that position
Leftward information flow in the decoder must be prevented to preserve the auto-regressive property
The paper implements this inside scaled dot-product attention by masking out (setting to $-\infty$) all values in the input of the softmax that correspond to illegal connections</li>
</ol>
<h2 id="33-position-wise-feed-forward-networks">3.3 Position-wise Feed-Forward Networks</h2>
<p>In addition to the attention sub-layers, each layer in the encoder and decoder contains a fully connected feed-forward network, which consists of two linear transformations with a ReLU activation in between and is applied to each position separately and identically</p>
<p>The linear transformations are the same across different positions, but use different parameters from layer to layer</p>
<h2 id="34-embeddings-and-softmax">3.4 Embeddings and Softmax</h2>
<p>Similar to other sequence transduction models, learned embeddings are used to convert the input tokens and output tokens to vectors of dimension $d_{model}$</p>
<p>A learned linear transformation and softmax function are used to convert the decoder output to predicted next-token probabilities</p>
<p>In this model, the same weight matrix is shared between the two embedding layers and the pre-softmax linear transformation. In the embedding layers, the weights are multiplied by $\sqrt {d_{model}}$</p>
<h2 id="35-positional-encoding">3.5 Positional Encoding</h2>
<p>As there is no recurrence and no convolution in the model, information about the position of the tokens must be injected</p>
<p>“Positional encodings” with the same dimension $d_{model}$ are added to the input embeddings at the bottoms of the encoder and decoder stacks</p>
<p>Sinusoidal positional encodings are chosen on the hypothesis that they let the model easily learn to attend by relative positions, since for any fixed offset $k$, $PE_{pos+k}$ can be represented as a linear function of $PE_{pos}$</p>
<p>Compared to learned positional embeddings, this produced nearly identical results, but may allow the model to extrapolate to sequence lengths longer than the ones encountered during training</p>
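<p>The linear-function property can be checked numerically. The sketch below assumes the paper's definition $PE_{(pos, 2i)} = \sin(pos / 10000^{2i/d_{model}})$ and $PE_{(pos, 2i+1)} = \cos(pos / 10000^{2i/d_{model}})$ with an even $d_{model}$. The helper <code>shift_encoding</code> is a hypothetical name: it applies a fixed rotation to each sine/cosine pair, which depends only on the offset and not on the position.</p>

```python
import math


def positional_encoding(position, d_model):
    # Interleaved sin/cos pairs, one frequency per pair (d_model assumed even).
    encoding = []
    for i in range(0, d_model, 2):
        angle = position / (10000 ** (i / d_model))
        encoding.append(math.sin(angle))
        encoding.append(math.cos(angle))
    return encoding


def shift_encoding(encoding, offset, d_model):
    # Linear map from PE(pos) to PE(pos + offset): a rotation of each pair.
    shifted = []
    for i in range(0, d_model, 2):
        freq = 1.0 / (10000 ** (i / d_model))
        s, c = encoding[i], encoding[i + 1]
        shifted.append(s * math.cos(offset * freq) + c * math.sin(offset * freq))
        shifted.append(c * math.cos(offset * freq) - s * math.sin(offset * freq))
    return shifted


d_model = 8
direct = positional_encoding(12, d_model)
via_shift = shift_encoding(positional_encoding(7, d_model), 5, d_model)
max_error = max(abs(a - b) for a, b in zip(direct, via_shift))
```

<p>The rotation coefficients in <code>shift_encoding</code> are built from the offset alone, which is what makes attending by relative position a linear operation on the encodings.</p>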
<h1 id="4-why-self-attention">4 Why Self-Attention</h1>
<h2 id="path-length-between-long-range-dependencies-in-network">Path length between long-range dependencies in network</h2>
<p>A key factor in the ability to learn long-range dependencies is the length of the paths that forward and backward signals have to traverse in the network</p>
<p>The shorter these paths, the easier it is to learn long range dependencies</p>
<h2 id="amount-of-computation-that-can-be-parallelized">Amount of computation that can be parallelized</h2>
<h2 id="total-computational-complexity-per-layer">Total computational complexity per layer</h2>
<h3 id="compare-to-recurrent-layer">Compare to Recurrent layer</h3>
<p>A self-attention layer connects all positions with a constant number of sequentially executed operations, $O(1)$, whereas a recurrent layer requires $O(n)$</p>
<p>In terms of computational complexity, self-attention layers - $O(n^2 \cdot d)$ - are faster than recurrent layers - $O(n \cdot d^2)$ - when $n&lt;d$</p>
<p>For larger $n$, self-attention could be restricted to a neighborhood of size $r$ around each position - left for future work</p>
<h3 id="compare-to-single-convolutional-layer">Compare to Single convolutional layer</h3>
<p>A single convolutional layer with kernel width $k &lt; n$ does not connect all pairs of input and output positions</p>
<p>Doing so requires a stack of $O(n/k)$ convolutional layers with contiguous kernels, or $O(\log_k (n))$ with dilated convolutions, which increases the length of the longest paths between any two positions in the network</p>
<p>By using separable convolutions, the complexity can be decreased to $O(k \cdot n \cdot d + n \cdot d^2)$
Even with $k=n$, the complexity of a separable convolution is equal to the combination of a self-attention layer and a point-wise feed-forward layer</p>
]]></description>
        </item>
        <item>
            <title><![CDATA[[WIP] DETR: End-to-End Object Detection with Transformers]]></title>
            <link>https://velog.io/@estelle_y/WIP-DETR-End-to-End-Object-Detection-with-Transformers</link>
            <guid>https://velog.io/@estelle_y/WIP-DETR-End-to-End-Object-Detection-with-Transformers</guid>
            <pubDate>Tue, 18 Mar 2025 15:09:12 GMT</pubDate>
            <description><![CDATA[<p>Date: 2020
Journal: ECCV</p>
<h1 id="1-introduction">1 Introduction</h1>
<h2 id="problem">Problem</h2>
<p>Modern detectors address set prediction in an indirect way, by defining surrogate regression and classification problems on a large set of proposals, anchors, or window centers</p>
<p>Their performance is significantly influenced by post-processing steps</p>
<p>To overcome this, a direct set prediction approach is used. This end-to-end philosophy has led to significant advances in complex structured prediction tasks, but not yet in object detection</p>
<h2 id="proposal">Proposal</h2>
<p>Streamlines the training pipeline by viewing object detection as a direct set prediction problem</p>
<p>Adopts an encoder-decoder transformer</p>
<p>DETR predicts all objects at once, and is trained end to end with a set loss function which performs bipartite matching between predicted and ground truth objects</p>
<p>DETR simplifies the detection pipeline by dropping multiple hand-designed components that encode prior knowledge</p>
<p>DETR doesn’t require any customized layers and can be reproduced easily in any framework that has standard CNN and transformer classes</p>
<h1 id="2-related-work">2 Related work</h1>
<h2 id="21-set-prediction">2.1 Set Prediction</h2>
<p>A general approach is to use auto-regressive sequence models such as RNNs</p>
<p>In all cases, the loss function needs to be invariant to a permutation of the predictions</p>
<p>The usual solution is to design a loss based on the Hungarian algorithm, which enforces permutation invariance and guarantees that each target element has a unique match</p>
<p>DETR follows this bipartite matching loss approach, but uses a transformer with parallel decoding instead of an auto-regressive model</p>
<h2 id="22-transformer-and-parallel-decoding">2.2 Transformer and Parallel Decoding</h2>
<p>The Transformer introduced self-attention layers, which scan through each element of a sequence and update it by aggregating information from the whole sequence</p>
<p>The main advantages are global computation and perfect memory, which make it more suitable than RNNs for long sequences</p>
<p>DETR combines transformers and parallel decoding for their suitable trade-off between computational cost and the ability to perform the global computations required for set prediction</p>
<h2 id="23-object-detection">2.3 Object detection</h2>
<p>In DETR, the hand-crafted process is removed: the set of detections is predicted directly, with absolute box predictions with respect to the input image rather than an anchor</p>
<h3 id="set-based-loss">Set based loss</h3>
<p>Several object detectors used a bipartite matching loss</p>
<p>But in these early models, the relations between different predictions were modeled only with convolutional or fully connected layers</p>
<p>Hand-designed NMS post-processing can still improve their performance</p>
<p>This means that a set-based loss alone did not remove the need for hand-designed post-processing in these models</p>
<h3 id="recurrent-detectors">Recurrent detectors</h3>
<p>Recurrent detectors use bipartite matching losses with encoder-decoder architectures based on CNN activations and RNNs to directly produce a set of bounding boxes</p>
<p>These approaches were only evaluated on small datasets and not against modern baselines</p>
<h1 id="3-the-detr-model">3 The DETR model</h1>
<p>Two ingredients are essential</p>
<ol>
<li><p>a set prediction loss that forces unique matching between predicted and ground truth boxes</p>
</li>
<li><p>an architecture that predicts a set of objects and models their relation</p>
</li>
</ol>
<h2 id="31-object-detection-set-prediction-loss">3.1 Object detection set prediction loss</h2>
<p>DETR infers a fixed size set of $N$ predictions</p>
<p>The loss produces an optimal bipartite matching between predicted and ground truth objects, and then optimizes object-specific (class and bounding box) losses</p>
<p>The optimal assignment is computed with the Hungarian algorithm</p>
<h2 id="32-detr-architecture">3.2 DETR architecture</h2>
<h3 id="backbone">Backbone</h3>
<p>Conventional CNN is used for backbone, which generates a lower resolution activation map</p>
<h3 id="transformer-encoder">Transformer encoder</h3>
<p>First, a 1x1 convolution reduces the channel dimension of the high-level activation map to a smaller dimension $d$</p>
<p>Each encoder layer has a standard architecture and consists of a multi-head self-attention module and an FFN, with positional encodings added to the input of each attention layer</p>
<h3 id="transformer-decoder">Transformer decoder</h3>
<p>The transformer decoder transforms $N$ embeddings of size $d$ using multi-headed self-attention and encoder-decoder attention mechanisms, and decodes the $N$ objects in parallel at each decoder layer</p>
<h3 id="prediction-feed-forward-network">Prediction feed forward network</h3>
<p>The final prediction is computed by a 3-layer perceptron with ReLU activation function and hidden dimension $d$, and a linear projection layer</p>
<h3 id="auxiliary-decoding-losses">Auxiliary decoding losses</h3>
<p>Auxiliary losses can help the model output the correct number of objects of each class during training</p>
]]></description>
        </item>
        <item>
            <title><![CDATA[[WIP] Masked-attention Mask Transformer for Universal Image Segmentation]]></title>
            <link>https://velog.io/@estelle_y/WIP-Masked-attention-Mask-Transformer-for-Universal-Image-Segmentation</link>
            <guid>https://velog.io/@estelle_y/WIP-Masked-attention-Mask-Transformer-for-Universal-Image-Segmentation</guid>
            <pubDate>Tue, 18 Mar 2025 15:06:11 GMT</pubDate>
            <description><![CDATA[<p>Date: 2022
Journal: CVPR</p>
<h1 id="1-introduction">1. Introduction</h1>
<h2 id="background">Background</h2>
<p>The universal architecture is showing SOTA performance for semantic/panoptic segmentation and is flexible. But recent research is focusing on advancing specialized architectures.</p>
<h2 id="problem">Problem</h2>
<p>Why have universal architectures not replaced specialized ones?</p>
<p>→ Mask2Former : backbone feature extractor - pixel decoder - transformer decoder</p>
<h1 id="2-related-work">2. Related Work</h1>
<h2 id="specialized-semantic-segmentation-architectures">Specialized semantic segmentation architectures</h2>
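<p>Typically treat the task as per-pixel classification</p>
<p>FCN-based architectures independently predict a category label for every pixel</p>
<p>Follow-up methods find that context is important for per-pixel classification and focus on context modules and self-attention variants</p>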
<h2 id="specialized-instance-segmentation-architectures">Specialized instance segmentation architectures</h2>
<p>Typically predict a set of binary masks, each associated with a single class label</p>
<p>Mask R-CNN generates masks from detected bounding boxes</p>
<p>Follow-up methods focus on detecting more precise bounding boxes or on new ways to generate a dynamic number of masks</p>
<p>Lack the flexibility to generalize to other segmentation tasks</p>
<h2 id="panoptic-segmentation">Panoptic segmentation</h2>
<p>Proposed to unify semantic and instance segmentation</p>
<h2 id="universal-architectures">Universal architectures</h2>
<p>Emerged with DETR</p>
<p>Show that mask classification architectures with an end-to-end set prediction objective are general enough for any image segmentation task</p>
<h1 id="3-masked-attention-mask-transformer">3. Masked-attention Mask Transformer</h1>
<h2 id="31-mask-classification-preliminaries">3.1 Mask classification preliminaries</h2>
<p>Mask classification architectures group pixels into $N$ segments by predicting $N$ binary masks along with $N$ corresponding category labels, which is general enough for any segmentation task</p>
<p>The difficulty is finding good representations for each segment</p>
<p>→ each segment can be represented as a $C$-dimensional feature vector (“object query”), which can be processed by a Transformer decoder</p>
<p>Architecture components</p>
<ol>
<li>backbone - extracts low resolution features</li>
<li>pixel decoder - gradually upsamples them to generate high resolution per-pixel embeddings</li>
<li>transformer decoder - operates on image features to process object queries, from which the binary mask predictions are decoded</li>
</ol>
<h2 id="32-transformer-decoder-w-masked-attention">3.2 Transformer decoder w/ masked attention</h2>
<p>Key components of the proposed Transformer decoder</p>
<p>Extract localized features by constraining cross-attention to the foreground region of the predicted mask for each query</p>
<p>For small objects, an efficient multi-scale strategy makes use of high resolution features</p>
<h3 id="321-masked-attention">3.2.1 Masked attention</h3>
<p>Context features are important for image segmentation, but global context causes slow convergence, because cross-attention needs many training epochs to learn to attend to localized object regions</p>
<p>Hypotheses</p>
<ol>
<li>local features are enough to update query features</li>
<li>context information can be gathered through self attention</li>
</ol>
<p>Solution
cross-attention attends only within the foreground region of predicted mask for each query</p>
<p>Masked attention
$X_l = \mathrm{softmax}(\mathcal{M}_{l-1} + Q_l K_l ^T ) V_l + X_{l-1}$
$\mathcal{M}_{l-1} (x, y) = \begin{cases} 0 &amp; \text{if } M_{l-1} (x, y) = 1 \\ - \infty &amp; \text{otherwise} \end{cases}$</p>
<p>$M_{l-1}$ is the binarized mask prediction of the previous Transformer decoder layer, obtained from $X_{l-1}$ and resized to the same resolution as $K_l$</p>
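<p>A minimal sketch of the masked attention update for a single query, assuming flattened image locations and at least one foreground location in the previous mask (otherwise every score would be $-\infty$). The function name and toy inputs are hypothetical, and the learned projections that produce the query, keys, and values are omitted.</p>

```python
import math

NEG_INF = float("-inf")


def masked_attention_row(query, keys, values, prev_mask, prev_feature):
    # prev_mask holds 1 where the previous layer predicted foreground, else 0.
    scores = []
    for key, m in zip(keys, prev_mask):
        bias = 0.0 if m == 1 else NEG_INF
        scores.append(bias + sum(q * k for q, k in zip(query, key)))
    # Softmax over locations; masked-out locations get exactly zero weight.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    attended = [sum(w * v[c] for w, v in zip(weights, values))
                for c in range(len(values[0]))]
    # Residual connection: add the previous query feature.
    updated = [a + x for a, x in zip(attended, prev_feature)]
    return updated, weights


query = [1.0, 0.0]
keys = [[1.0, 0.0], [5.0, 5.0], [1.0, 0.0]]
values = [[1.0, 1.0], [100.0, 100.0], [3.0, 3.0]]
prev_mask = [1, 0, 1]
prev_feature = [0.5, 0.5]
updated, weights = masked_attention_row(query, keys, values, prev_mask, prev_feature)
```

<p>The second location has the largest raw score, but it lies outside the predicted foreground, so it receives zero weight and the query is updated only from the two foreground locations.</p>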
<h3 id="322-high-resolution-features">3.2.2 High resolution features</h3>
<p>Problem
High resolution features are good for small objects, but come at a high computation cost</p>
<p>Solution
Do not always use the high resolution feature map; use multi-scale features to control the increase in computation
Feed one resolution of the pixel decoder's low-to-high resolution feature pyramid to one Transformer decoder layer at a time</p>
<h3 id="323-optimization-improvements">3.2.3 Optimization improvements</h3>
<ol>
<li>switch self/cross attention order</li>
</ol>
<p>Query features fed to the first self-attention layer are image-independent and carry no signal from the image, so applying self-attention first is unlikely to enrich the information</p>
<ol start="2">
<li>make query features learnable, and supervise them directly before they are used in the Transformer decoder</li>
</ol>
<p>These learnable query features function like a region proposal network and have the ability to generate mask proposals</p>
<ol start="3">
<li>remove dropout</li>
</ol>
<p>Dropout is not necessary and usually decreases performance</p>
<h2 id="33-improving-training-efficiency">3.3 Improving training efficiency</h2>
<p>Problem
Large memory consumption while training</p>
<p>Solution
Motivated by PointRend/Implicit PointRend, which show that a segmentation model can be trained with a mask loss calculated on $K$ randomly sampled points</p>
<p>Use sampled points to calculate the mask loss in both the matching loss and the final loss</p>
<p>For the matching loss, uniformly sample the same set of $K$ points for all prediction and ground truth masks</p>
<p>For the final loss, sample different sets of $K$ points for different pairs of prediction and ground truth using importance sampling</p>
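<p>The uniform-sampling case can be sketched as below. This is an illustration under simplifying assumptions: it samples integer pixel locations and uses a plain binary cross-entropy, while importance sampling and interpolation at continuous coordinates are left out. All names are hypothetical.</p>

```python
import math
import random


def sampled_mask_loss(pred_probs, target_mask, num_points, rng):
    # pred_probs and target_mask are same-sized 2D lists.
    height, width = len(pred_probs), len(pred_probs[0])
    # Uniformly sample K pixel locations instead of using the full mask.
    locations = [(rng.randrange(height), rng.randrange(width)) for _ in range(num_points)]
    loss = 0.0
    for row, col in locations:
        # Clamp probabilities to avoid log(0).
        p = min(max(pred_probs[row][col], 1e-7), 1.0 - 1e-7)
        t = target_mask[row][col]
        loss -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return loss / num_points


target = [[1, 0], [0, 1]]
good_prediction = [[1.0, 0.0], [0.0, 1.0]]
bad_prediction = [[0.0, 1.0], [1.0, 0.0]]

loss_good = sampled_mask_loss(good_prediction, target, 16, random.Random(0))
loss_bad = sampled_mask_loss(bad_prediction, target, 16, random.Random(0))
```

<p>Only <code>num_points</code> values are read per mask, so the memory needed for the loss no longer grows with the mask resolution.</p>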
<h1 id="4-experiments">4. Experiments</h1>
<p>Datasets</p>
<ul>
<li>COCO (80 things, 53 stuff)</li>
<li>ADE20K (100 things, 50 stuff)</li>
<li>Cityscapes (8 things, 11 stuff)</li>
<li>Mapillary Vistas (37 things, 28 stuff)</li>
</ul>
<p>Limitations
When trained only on panoptic segmentation, the model performs slightly worse than the exact same model trained with the corresponding annotations for instance and semantic segmentation, which means it still needs to be trained for specific tasks</p>
]]></description>
        </item>
        <item>
            <title><![CDATA[[논문스터디] LIO-SAM: Tightly-coupled Lidar Inertial Odometry via Smoothing and Mapping]]></title>
            <link>https://velog.io/@estelle_y/%EB%85%BC%EB%AC%B8%EC%8A%A4%ED%84%B0%EB%94%94-LIO-SAM-Tightly-coupled-Lidar-Inertial-Odometry-via-Smoothing-and-Mapping</link>
            <guid>https://velog.io/@estelle_y/%EB%85%BC%EB%AC%B8%EC%8A%A4%ED%84%B0%EB%94%94-LIO-SAM-Tightly-coupled-Lidar-Inertial-Odometry-via-Smoothing-and-Mapping</guid>
            <pubDate>Tue, 18 Mar 2025 14:50:56 GMT</pubDate>
            <description><![CDATA[<h2 id="bibtex-인용">Bibtex 인용</h2>
<pre><code>@INPROCEEDINGS{9341176,
  author={Shan, Tixiao and Englot, Brendan and Meyers, Drew and Wang, Wei and Ratti, Carlo and Rus, Daniela},
  booktitle={2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
  title={LIO-SAM: Tightly-coupled Lidar Inertial Odometry via Smoothing and Mapping},
  year={2020},
  volume={},
  number={},
  pages={5135-5142},
  doi={10.1109/IROS45743.2020.9341176}}</code></pre><hr>
<h2 id="요약">Summary</h2>
<ul>
<li>Proposes a tightly coupled lidar inertial odometry framework that uses smoothing and mapping</li>
<li>LIO built on a factor graph</li>
<li>Lidar point cloud deskewing and initial guess based on IMU preintegration</li>
<li>IMU bias estimation based on odometry</li>
<li>Improved real-time performance through marginalization-based pose optimization</li>
<li>Improved performance through sub-keyframes obtained with keyframe selection and a sliding window</li>
</ul>
<hr>
<h2 id="인트로">Introduction</h2>
<ul>
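<li>Vision-based SLAM is good at place recognition, but weak in initialization, range, and similar aspects</li>
<li>Lidar-based methods stay invariant to illumination changes and can capture fine details of the environment</li>
<li>LOAM is the representative method: it offers low drift, real-time pose estimation, and mapping, but because it is based on a voxel map it is poor at loop closing and at fusing GPS and similar measurements</li>
<li>LOAM's real-time performance is also poor, and because it is based on scan matching it performs poorly at large scale</li>
<li>Proposes a tightly coupled LIO based on smoothing and mapping</li>
<li>Point cloud deskewing based on a nonlinear motion model</li>
<li>The IMU is used to estimate sensor motion during a lidar scan, and the estimate serves as the initial guess for the optimization</li>
<li>Lidar odometry is used to estimate the IMU bias</li>
<li>Trajectory estimation with a global factor graph<ul>
<li>lidar and IMU fusion</li>
<li>integrates place recognition between poses</li>
<li>absolute measurements such as GPS and heading can be used</li>
<li>joint optimization over multiple factors</li>
</ul>
</li>
<li>Pose optimization using prior sub-keyframes</li>
<li>Local scan matching → good real-time performance</li>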
</ul>
<hr>
<h2 id="기여">Contributions</h2>
<ul>
<li>Builds a tightly coupled LIO on a factor graph</li>
<li>Achieves real-time performance through local sliding-window based scan matching</li>
</ul>
<hr>
<h2 id="iii-lidar-inertial-odometry-via-smoothing-and-mapping">III. LIDAR INERTIAL ODOMETRY VIA SMOOTHING AND MAPPING</h2>
<p><img src="https://velog.velcdn.com/images/estelle_y/post/16ebf4c0-f18d-4d5c-be31-ba52736836ec/image.png" alt=""></p>
<h3 id="a-system-overview">A. System Overview</h3>
<ul>
<li>Robot state<ul>
<li>$x = \begin{bmatrix} R^T, p^T, v^T, b^T \end{bmatrix}^T$</li>
<li>$R \in SO(3)$ rotation matrix</li>
<li>$p \in \mathbb{R}^3$ position vector</li>
<li>$v \in \mathbb{R}^3$ velocity</li>
<li>$b$ IMU bias</li>
</ul>
</li>
<li>Takes a 3D lidar, an IMU, and GPS as inputs</li>
<li>Estimates the robot pose and trajectory from the sensor measurements</li>
<li>Formulates state estimation as a MAP problem<ul>
<li>Uses a factor graph</li>
<li>Under a Gaussian noise assumption, MAP inference is equivalent to a nonlinear least-squares problem</li>
</ul>
</li>
<li>Factor graph<ul>
<li>State variables</li>
<li>IMU preintegration</li>
<li>Lidar odometry</li>
<li>GPS</li>
<li>Loop closure</li>
</ul>
</li>
<li>A new node is added based on the amount of pose change</li>
<li>The graph is optimized with incremental smoothing and mapping using the Bayes tree (iSAM2)</li>
</ul>
<h3 id="b-imu-preintegration-factor">B. IMU Preintegration Factor</h3>
<ul>
<li>IMU measurement<ul>
<li>$\hat{\omega}_t = \omega_t + b^\omega_t + n^\omega_t$</li>
<li>$\hat{a}_t = R^B_W (a_t - g) + b^a_t + n^a_t$</li>
<li>$\hat{\omega} _t , ~\hat{a} _t$  IMU raw data</li>
<li>$b^\omega_t, ~ b^a_t$ IMU bias</li>
<li>$n^\omega_t ,~ n^a_t$ white noise</li>
<li>$R^B_W$ rotation matrix from the world frame to the body frame</li>
</ul>
</li>
<li>motion update<ul>
<li>$v_{t+\Delta t} = v_t + g\Delta t + R_t(\hat{a}_t - b^a_t - n^a_t) \Delta t$</li>
<li>$p_{t+\Delta t} = p_t + v_t \Delta t + \frac{1}{2} g \Delta t^2 + \frac{1}{2} R_t (\hat{a}_t - b^a_t - n^a_t) \Delta t^2$</li>
<li>$R_{t+\Delta t} = R_t \exp((\hat{\omega}_t - b^\omega_t - n^\omega_t) \Delta t)$</li>
</ul>
</li>
<li>IMU preintegration<ul>
<li>$\Delta v_{ij} = R^T_i (v_j - v_i - g\Delta t_{ij})$</li>
<li>$\Delta p_{ij} = R^T_i (p_j - p_i - v_i \Delta t_{ij} - \frac{1}{2} g \Delta t^2_{ij})$</li>
<li>$\Delta R_{ij} = R^T_i R_j$</li>
<li>The IMU bias is optimized in the factor graph jointly with the lidar odometry factors</li>
</ul>
</li>
</ul>
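<p>The motion update above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the LIO-SAM implementation: the helper names (<code>so3_exp</code>, <code>imu_step</code>) are made up for this note, the noise terms $n^a_t$ and $n^\omega_t$ are unknown in practice and taken as zero, and $\exp(\cdot)$ is evaluated with the Rodrigues formula.</p>

```python
import math


def mat_vec(R, v):
    """3x3 matrix times 3-vector."""
    return [sum(R[i][j] * v[j] for j in range(3)) for i in range(3)]


def mat_mul(A, B):
    """3x3 matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def so3_exp(phi):
    """Rodrigues formula: rotation vector -> rotation matrix."""
    theta = math.sqrt(sum(c * c for c in phi))
    eye = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    if theta == 0.0:
        return eye
    x, y, z = (c / theta for c in phi)
    K = [[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]]  # skew matrix of the unit axis
    K2 = mat_mul(K, K)
    s, c = math.sin(theta), 1.0 - math.cos(theta)
    return [[eye[i][j] + s * K[i][j] + c * K2[i][j] for j in range(3)] for i in range(3)]


def imu_step(R, p, v, acc_meas, gyro_meas, b_a, b_w, g, dt):
    """One motion-update step (noise terms assumed zero)."""
    # Bias-corrected acceleration rotated into the world frame: R_t (a_hat - b^a)
    acc_world = mat_vec(R, [acc_meas[i] - b_a[i] for i in range(3)])
    v_new = [v[i] + g[i] * dt + acc_world[i] * dt for i in range(3)]
    p_new = [p[i] + v[i] * dt + 0.5 * g[i] * dt * dt + 0.5 * acc_world[i] * dt * dt
             for i in range(3)]
    # R_{t+dt} = R_t exp((w_hat - b^w) dt)
    R_new = mat_mul(R, so3_exp([(gyro_meas[i] - b_w[i]) * dt for i in range(3)]))
    return R_new, p_new, v_new
```

<p>With $g = (0, 0, -9.81)$ and a stationary, level IMU that measures $\hat{a} = (0, 0, 9.81)$, the gravity term and the measured acceleration cancel, so position and velocity stay at zero, which is a quick sanity check on the signs in the equations.</p>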
<h3 id="c-lidar-odometry-factor">C. Lidar Odometry Factor</h3>
<ul>
<li>feature extraction<ul>
<li>Extract edge and planar features</li>
<li>$F_i = \{F^e_i, F^p_i\}$</li>
</ul>
</li>
<li>Keyframe selection<ul>
<li>A new keyframe is selected when the pose change exceeds the $1\,\text{m}$ / $10^\circ$ thresholds → saves memory and computation</li>
</ul>
</li>
<li>Sub-keyframe selection based on a sliding window<ul>
<li>A voxel map is built from the sub-keyframes<ul>
<li>Edges are downsampled at 0.2 m resolution, planes at 0.4 m</li>
</ul>
</li>
</ul>
</li>
<li>Scan matching<ul>
<li>The initial guess comes from the motion predicted by the IMU</li>
<li>Features are matched against their correspondences in the voxel map</li>
</ul>
</li>
<li>Relative transformation<ul>
<li>Based on the distances between the edge and planar features and their correspondences<ul>
<li>$d^e_k = \frac{\left| (p^e_{i+1,k} - p^e_{i,u}) \times (p^e_{i+1,k} - p^e_{i,v}) \right|}{\left| p^e_{i,u} - p^e_{i,v} \right|}$<ul>
<li>This is just the point-to-line distance for an edge feature</li>
</ul>
</li>
<li>$d^p_k = \frac{\left| (p^p_{i+1,k} - p^p_{i,u}) \cdot \left( (p^p_{i,u} - p^p_{i,v}) \times (p^p_{i,u} - p^p_{i,w}) \right) \right|}{\left| (p^p_{i,u} - p^p_{i,v}) \times (p^p_{i,u} - p^p_{i,w}) \right|}$<ul>
<li>This is just the point-to-plane distance for a planar feature</li>
</ul>
</li>
</ul>
</li>
<li>The optimal transformation is found with Gauss-Newton<ul>
<li>$\min_{T_{i+1}} \sum_{p^e_{i+1,k} \in {}^{\prime}F^e_{i+1}} d^e_k + \sum_{p^p_{i+1,k} \in {}^{\prime}F^p_{i+1}} d^p_k$<ul>
<li>Starting from the initial guess, find the transformation that minimizes the error function defined by the edge and plane distances above</li>
</ul>
</li>
</ul>
</li>
<li>Finally, compute the LO factor<ul>
<li>$\Delta T_{i,i+1} = T_i^{-1} T_{i+1}$</li>
</ul>
</li>
</ul>
</li>
</ul>
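<p>The two distances $d^e_k$ and $d^p_k$ are plain point-to-line and point-to-plane distances, which a short sketch makes concrete. The function names below are hypothetical and the code is a minimal illustration with Python lists as 3D points, not the LIO-SAM code.</p>

```python
import math


def sub(a, b):
    return [a[i] - b[i] for i in range(3)]


def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dot(a, b):
    return sum(a[i] * b[i] for i in range(3))


def norm(a):
    return math.sqrt(dot(a, a))


def edge_distance(p, u, v):
    """d^e: distance from feature point p to the line through edge points u and v."""
    return norm(cross(sub(p, u), sub(p, v))) / norm(sub(u, v))


def plane_distance(p, u, v, w):
    """d^p: distance from feature point p to the plane through planar points u, v, w."""
    normal = cross(sub(u, v), sub(u, w))
    return abs(dot(sub(p, u), normal)) / norm(normal)
```

<p>For example, the point $(0, 1, 0)$ lies at distance 1 from the line through $(0,0,0)$ and $(1,0,0)$, and $(0, 0, 2)$ lies at distance 2 from the plane through $(0,0,0)$, $(1,0,0)$, $(0,1,0)$.</p>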
<h3 id="d-gps-factor">D. GPS Factor</h3>
<ul>
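<li>GPS measurements are converted to a local Cartesian coordinate frame, and the GPS factor is inserted when a new node is added</li>
<li>Correction conditions<ul>
<li>If the GPS is not synchronized with the lidar frames, the GPS measurements are linearly interpolated to the lidar frame timestamp</li>
<li>A GPS factor is added only when the LO covariance is larger than the GPS covariance → it is not always added</li>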
</ul>
</li>
</ul>
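<p>The two GPS conditions can be sketched as below. This is a simplified illustration under stated assumptions: the function names are hypothetical, and each covariance is reduced to a single scalar so that "larger" has an obvious meaning.</p>

```python
def interpolate_gps(t_lidar, t0, pos0, t1, pos1):
    """Linearly interpolate two GPS positions (taken at t0 and t1) to the lidar timestamp."""
    alpha = (t_lidar - t0) / (t1 - t0)
    return [a + alpha * (b - a) for a, b in zip(pos0, pos1)]


def should_add_gps_factor(lo_cov, gps_cov):
    """Add a GPS factor only when the lidar-odometry estimate is less certain than GPS.

    Both covariances are scalars here, which is an assumption of this sketch.
    """
    return lo_cov > gps_cov
```

<p>A lidar frame stamped halfway between two GPS fixes simply gets the midpoint of the two positions, and the factor is skipped while the odometry covariance stays below the GPS covariance.</p>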
<h3 id="e-loop-closure-factor">E. Loop Closure Factor</h3>
<ul>
<li>Loop closing integrates cleanly thanks to the factor graph</li>
<li>Loop detection is based on Euclidean distance → other methods, such as descriptors, can be used instead</li>
<li>Loop closing is performed when a prior pose is closer than 15 m</li>
</ul>
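<p>A Euclidean-distance loop search can be sketched as follows. The 15 m radius comes from the notes above. The <code>min_index_gap</code> parameter is an assumption added for this sketch so that the poses just before the current one are not reported as loops, and the function name is hypothetical.</p>

```python
import math


def find_loop_candidate(positions, search_radius=15.0, min_index_gap=50):
    """Return the index of the closest sufficiently old pose within the radius, or None.

    positions: list of (x, y, z) keyframe positions; the last one is the current pose.
    """
    current = positions[-1]
    best_idx, best_dist = None, search_radius
    last_allowed = len(positions) - 1 - min_index_gap
    for idx in range(max(0, last_allowed + 1)):
        dist = math.dist(current, positions[idx])
        if best_dist >= dist:
            best_idx, best_dist = idx, dist
    return best_idx
```

<p>The returned index would then be handed to scan matching to compute the relative transformation for the loop closure factor. A descriptor-based detector could replace this search without changing the rest of the pipeline.</p>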
<h2 id="iv-experiments">IV. EXPERIMENTS</h2>
<ul>
<li>They report that it works well when tested on data they collected themselves</li>
</ul>
<p><img src="https://velog.velcdn.com/images/estelle_y/post/be56b3ae-e7d6-4123-97cf-8697afda9c39/image.png" alt=""></p>
<p><img src="https://velog.velcdn.com/images/estelle_y/post/984216d4-3272-41ae-bcf5-7d0a5409a2f4/image.png" alt=""></p>
<ul>
<li>They also tried it on setups like these, and it works well there too</li>
</ul>
<p><img src="https://velog.velcdn.com/images/estelle_y/post/6bafef3e-62cf-413d-bb88-5e02f84fdba2/image.png" alt="">
<img src="https://velog.velcdn.com/images/estelle_y/post/b16322f3-246c-4029-b688-e338de3fb702/image.png" alt=""></p>
<ul>
<li>Yes, reportedly it works well</li>
</ul>
<p><img src="https://velog.velcdn.com/images/estelle_y/post/9a63b5e3-65cf-4a7a-9dd0-3b4b42ecd7a5/image.png" alt="">
<img src="https://velog.velcdn.com/images/estelle_y/post/b6c78aa7-a0b7-45ea-b280-099b6e5bbe9e/image.png" alt=""></p>
<ul>
<li>But for all the claims that it works well, they do not report results under the same metric across all datasets</li>
</ul>
]]></description>
        </item>
        <item>
            <title><![CDATA[[WIP] Let's get MatrixVT running]]></title>
            <link>https://velog.io/@estelle_y/MatrixVT</link>
            <guid>https://velog.io/@estelle_y/MatrixVT</guid>
            <pubDate>Tue, 18 Mar 2025 14:44:46 GMT</pubDate>
            <description><![CDATA[<p>A log of the struggle to install and run MatrixVT</p>
<h2 id="설치">Installation</h2>
<h3 id="cuda">CUDA</h3>
<p>Required CUDA version: 11.1</p>
<h4 id="cuda만-지우기">Remove only CUDA</h4>
<pre><code>sudo apt-get --purge remove &#39;cuda*&#39;
sudo apt-get autoremove --purge &#39;cuda*&#39;
sudo rm -rf /usr/local/cuda*</code></pre><h4 id="cuda-113-설치힐거임">Installing CUDA 11.3</h4>
<p>Install with the local runfile from the <a href="https://developer.nvidia.com/cuda-11.3.0-download-archive?target_os=Linux&amp;target_arch=x86_64&amp;Distribution=Ubuntu&amp;target_version=20.04&amp;target_type=runfile_local">CUDA 11.3 download page</a>.
Leave the NVIDIA driver out of the installation.</p>
<p>If the graphics driver is missing as well, see <a href="https://velog.io/@estelle_y/Nvidia-%EA%B7%B8%EB%9E%98%ED%94%BD-%EB%93%9C%EB%9D%BC%EC%9D%B4%EB%B2%84-Cuda-%EC%A7%80%EC%9A%B0%EA%B3%A0-%EB%8B%A4%EC%8B%9C-%EA%B9%94%EA%B8%B0">this post</a>.</p>
<h3 id="pytorch">pytorch</h3>
<p>Required versions
torch==1.9.0
torchvision==0.10.0</p>
<pre><code>pip3 install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html</code></pre><h3 id="mmdetection3d">MMDetection3D</h3>
<p>Install by following the <a href="https://github.com/open-mmlab/mmdetection3d">official MMDetection3D repo</a> and the <a href="https://mmdetection3d.readthedocs.io/en/latest/get_started.html#installation">official MMDet3D documentation</a></p>
<p><a href="https://velog.io/@estelle_y/MMDetection3D-%EC%84%A4%EC%B9%98">MMDet3D installation notes</a></p>
<h3 id="requirement">requirement</h3>
<pre><code>pip3 install -r requirements.txt</code></pre><h3 id="install">install</h3>
<pre><code>python3 setup.py develop --user</code></pre><h2 id="문제-상황">Problems</h2>
<h3 id="pytorch-lightning">pytorch-lightning</h3>
<p>The following error occurred:</p>
<pre><code>ERROR: No matching distribution found for pytorch-lightning==1.6.0</code></pre><h4 id="해결-방법">Fix</h4>
<p>Install it together with PyTorch in a single command, as below, so that a version compatible with PyTorch is resolved</p>
<pre><code>conda install pytorch==1.9.0 torchvision==0.10.0 torchaudio==0.9.0 cudatoolkit=11.3 pytorch-lightning -c pytorch -c conda-forge
</code></pre><p>Remove the <code>pytorch-lightning</code> entry from requirements.txt</p>
<h3 id="nvcc-exit-status-1">nvcc exit status 1</h3>
<pre><code>1 error detected in the compilation of &quot;bevdepth/ops/voxel_pooling_train/src/voxel_pooling_train_forward_cuda.cu&quot;.
error: command &#39;/usr/local/cuda-11.3/bin/nvcc&#39; failed with exit status 1</code></pre><h4 id="서치-결과-해결-방법">Fixes found by searching</h4>
<ol>
<li>Change the gcc version</li>
<li>Change the PyTorch version</li>
<li>Change the pytorch-lightning version</li>
</ol>
<pre><code>pip3 install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 pytorch-lightning --extra-index-url https://download.pytorch.org/whl/cu113</code></pre>]]></description>
        </item>
        <item>
            <title><![CDATA[[WIP] Let's run CIL++ locally without Docker]]></title>
            <link>https://velog.io/@estelle_y/CIL-%EB%8F%84%EC%BB%A4-%EC%97%86%EC%9D%B4-%EB%A1%9C%EC%BB%AC%EC%97%90%EC%84%9C-%EB%8F%8C%EB%A0%A4%EB%B3%B4%EC%9E%90</link>
            <guid>https://velog.io/@estelle_y/CIL-%EB%8F%84%EC%BB%A4-%EC%97%86%EC%9D%B4-%EB%A1%9C%EC%BB%AC%EC%97%90%EC%84%9C-%EB%8F%8C%EB%A0%A4%EB%B3%B4%EC%9E%90</guid>
            <pubDate>Tue, 18 Mar 2025 14:41:43 GMT</pubDate>
            <description><![CDATA[<h3 id="cil-git-repo-레포를-도커-없이-로컬에서-돌리기-위한-똥꼬쇼">The struggle to run the <a href="https://github.com/yixiao1/CILv2_multiview?tab=readme-ov-file">CIL++ Git Repo</a> locally without Docker</h3>
<h2 id="깃-클론">Git clone</h2>
<pre><code>git clone git@github.com:yixiao1/CILv2_multiview.git</code></pre><p>Clone the Git repo</p>
<h2 id="칼라-설치">Installing CARLA</h2>
<p>The repo uses CARLA 0.9.13, so CARLA simulator 0.9.13 must be installed</p>
<h3 id="설치-방법">How to install</h3>
<ul>
<li>Followed the <a href="https://carla.readthedocs.io/en/0.9.13/start_quickstart/">CARLA documentation</a><ul>
<li>Installation by this method failed</li>
</ul>
</li>
<li>Downloaded CARLA simulator 0.9.13 directly from the <a href="https://github.com/carla-simulator/carla/releases/tag/0.9.13/">CARLA 0.9.13 release on GitHub</a><ul>
<li>Downloaded the tar file from the official GitHub release above and extracted it under <code>/opt</code>, which completed the CARLA installation</li>
</ul>
</li>
</ul>
<h2 id="경로-설정">Path setup</h2>
<pre><code>export ROOTDIR=/home/amlab/save_ws
export CARLAPATH=$ROOTDIR/CARLA_0.9.13/PythonAPI/carla/:$ROOTDIR/CARLA_0.9.13/PythonAPI/carla/dist/carla-0.9.13-py3.7-linux-x86_64.egg
</code></pre><p>I got this far, then fixed something and got it working,
but I cannot remember what it was</p>
]]></description>
        </item>
        <item>
            <title><![CDATA[[Paper study] KPConv: Flexible and Deformable Convolution for Point Clouds]]></title>
            <link>https://velog.io/@estelle_y/%EB%85%BC%EB%AC%B8%EC%8A%A4%ED%84%B0%EB%94%94-KPConv-Flexible-and-Deformable-Convolution-for-Point-Clouds</link>
            <guid>https://velog.io/@estelle_y/%EB%85%BC%EB%AC%B8%EC%8A%A4%ED%84%B0%EB%94%94-KPConv-Flexible-and-Deformable-Convolution-for-Point-Clouds</guid>
            <pubDate>Tue, 18 Mar 2025 12:55:37 GMT</pubDate>
            <description><![CDATA[<h2 id="bibtex-인용">BibTeX citation</h2>
<pre><code class="language-jsx">@InProceedings{Thomas_2019_ICCV,
author = {Thomas, Hugues and Qi, Charles R. and Deschaud, Jean-Emmanuel and Marcotegui, Beatriz and Goulette, Francois and Guibas, Leonidas J.},
title = {KPConv: Flexible and Deformable Convolution for Point Clouds},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
month = {October},
year = {2019}
}</code></pre>
<p><a href="https://github.com/HuguesTHOMAS/KPConv-PyTorch"><strong>code</strong></a></p>
<hr>
<h2 id="요약">Summary</h2>
<ul>
<li>Proposes kernel point convolution, which needs no intermediate representation</li>
<li>The convolution weights are located in Euclidean space by kernel points<ul>
<li>The kernel points can be changed → flexibility</li>
</ul>
</li>
<li>Can be extended to a deformable convolution that adapts the kernel points to the local geometry</li>
<li>Regular subsampling makes it robust to varying density and efficient</li>
<li>Achieves state-of-the-art results</li>
</ul>
<hr>
<h2 id="인트로">Introduction</h2>
<ul>
<li>Discrete convolution can be computed efficiently on grids, but that is not possible for points in continuous space<ul>
<li>Applications that use unstructured (non-grid) data such as 3D point clouds are increasing</li>
<li>A point cloud is unordered, which makes it different from a grid, but it is spatially localized</li>
</ul>
</li>
<li>Several approaches have been proposed to process this kind of data<ul>
<li>Direct processing with MLPs</li>
<li>Convolution applied directly to the points</li>
</ul>
</li>
<li>KPConv<ul>
<li>Made of local 3D filters</li>
<li>The weight regions are defined by kernel points rather than kernel pixels</li>
<li>No limit on the number of kernel points → flexible design</li>
</ul>
</li>
<li>Deformable<ul>
<li>A different shift is generated for each convolution location<ul>
<li>This means the kernel adapts to the input point cloud</li>
</ul>
</li>
</ul>
</li>
<li>Radius neighborhoods + regular subsampling → robust to density</li>
</ul>
<hr>
<h2 id="기여">Contributions</h2>
<ul>
<li>Proposes a new kernel for 3D point clouds</li>
<li>Proposes a deformable version of the kernel</li>
<li>Proposes new network architectures</li>
</ul>
<hr>
<h2 id="related-work">Related Work</h2>
<ul>
<li>Projection networks</li>
<li>graph convolution network</li>
<li>pointwise MLP network</li>
<li>point convolution network</li>
</ul>
<h2 id="kernel-point-convolution">Kernel Point Convolution</h2>
<h3 id="a-kernel-function-defined-by-point">A Kernel Function Defined by Point</h3>
<ul>
<li>KPConv is made of local 3D filters</li>
<li>Kernel points define the regions where each kernel weight applies</li>
<li>The number of points is not constrained, which allows a flexible design</li>
<li>Robust when processing data with varying density</li>
<li>The general point convolution is defined as follows<ul>
<li>$(F * g)(x) = \sum_{x_i \in N_x} g(x_i - x) f_i$</li>
<li>$x_i$ - neighbor point of $x$</li>
<li>$N_x$ - set of points within radius $r$ of $x$</li>
<li>A consistent spherical domain is considered to help the network learn meaningful representations</li>
</ul>
</li>
<li>How to define regions in 3D space: with points<ul>
<li>The most intuitive choice</li>
<li>Features are localized by them as well</li>
</ul>
</li>
<li>Kernel function for any point (proposed in this paper)<ul>
<li>$g(y_i) = \sum_{k &lt; K} h(y_i, \tilde{x}_k) W_k$, with $y_i = x_i - x$</li>
<li>$h(y_i, \tilde{x}_k) = \max \left( 0, 1 - \frac{\| y_i - \tilde{x}_k \|}{\sigma} \right)$</li>
<li>$\sigma$ is the influence distance of the kernel points, chosen according to the input density</li>
<li>A linear correlation is used instead of a Gaussian one, which is simpler and makes back-propagation easier</li>
</ul>
</li>
</ul>
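<p>The kernel function above translates almost directly into code. The sketch below evaluates a rigid KPConv at a single center point with plain Python lists. It is meant only to illustrate the formula, not to mirror the official KPConv-PyTorch code, and the function names and argument layout are assumptions of this note.</p>

```python
import math


def linear_correlation(y, x_k, sigma):
    """h(y, x_k) = max(0, 1 - ||y - x_k|| / sigma)."""
    return max(0.0, 1.0 - math.dist(y, x_k) / sigma)


def kpconv_point(center, neighbors, features, kernel_points, weights, sigma):
    """Output feature of a rigid KPConv at one center point.

    neighbors:     points x_i within the radius of center
    features:      features[i] is the list of D_in input features of x_i
    kernel_points: kernel point positions, relative to the center
    weights:       weights[k] is the D_in x D_out matrix W_k as nested lists
    """
    d_out = len(weights[0][0])
    out = [0.0] * d_out
    for x_i, f_i in zip(neighbors, features):
        y_i = [a - b for a, b in zip(x_i, center)]  # y_i = x_i - x
        for x_k, W_k in zip(kernel_points, weights):
            h = linear_correlation(y_i, x_k, sigma)
            if h == 0.0:
                continue  # this kernel point has no influence on the neighbor
            for d_in, f in enumerate(f_i):
                for o in range(d_out):
                    out[o] += h * f * W_k[d_in][o]
    return out
```

<p>Because $h$ drops linearly to zero at distance $\sigma$, each neighbor only interacts with the few kernel points near it, which is what keeps the operator local.</p>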
<h3 id="rigid-or-deformable-kernel">Rigid or Deformable Kernel</h3>
<ul>
<li><p>Kernel points are placed at the optimal positions found by letting each point apply a repulsive force on the others</p>
</li>
<li><p>An attractive force constrains the points to stay inside the sphere, and one point is fixed at the center
<img src="https://velog.velcdn.com/images/estelle_y/post/d6d54e01-19ce-4ab3-a91f-b72ad0de56a2/image.png" alt=""></p>
</li>
<li><p>All points are rescaled to an average radius of $1.5 \sigma$</p>
<ul>
<li>Ensures a small overlap between the areas of influence of the kernel points</li>
<li>Guarantees good space coverage</li>
</ul>
</li>
<li><p>Works well when K is large enough to cover the domain of $g$</p>
</li>
<li><p>The kernel point positions can also be learned to extend the capacity of the kernel</p>
</li>
<li><p>Because $g$ is differentiable with respect to $\tilde{x}_k$, the kernel point positions are learnable parameters</p>
</li>
<li><p>Deformable KPConv uses the following $g$</p>
<ul>
<li>$(F * g)(x) = \sum_{x_i \in N_x} g_{\text{deform}}(x - x_i, \Delta(x)) f_i$</li>
<li>$g_{\text{deform}}(y_i, \Delta(x)) = \sum_{k &lt; K} h(y_i, \tilde{x}_k + \Delta_k(x)) W_k$</li>
</ul>
</li>
<li><p>The local shifts $\Delta(x)$ are defined as the output of a rigid KPConv that maps the input features to $3K$ values</p>
</li>
</ul>
<p><img src="https://velog.velcdn.com/images/estelle_y/post/3ec055a5-d6b1-4788-b95b-5ee7b171d65f/image.png" alt=""></p>
<ul>
<li><p>The shifts are trained with a learning rate 0.1 times that of the global network</p>
<ul>
<li>Rigid kernel → produces the shifts</li>
<li>Deformable kernel → produces the output</li>
</ul>
</li>
<li><p>With this scheme, which is derived from image convolution, the kernel points can be learned in directions that pull them away from the input points</p>
<ul>
<li>Such points are lost to the network, because the gradient of a shift is null when no input point lies in its influence range</li>
<li>A fitting regularization loss is proposed to address this</li>
</ul>
</li>
<li><p>Regularization loss</p>
<ul>
<li>$L_{\text{reg}} = \sum_x L_{\text{fit}}(x) + L_{\text{rep}}(x)$</li>
<li>$L_{\text{fit}}(x) = \sum_{k &lt; K} \min_{y_i} \left( \frac{\| y_i - (\tilde{x}_k + \Delta_k(x)) \|}{\sigma} \right)^2$</li>
<li>$L_{\text{rep}}(x) = \sum_{k &lt; K} \sum_{l \neq k} h(\tilde{x}_k + \Delta_k(x), \tilde{x}_l + \Delta_l(x))^2$</li>
</ul>
</li>
<li><p>The fitting loss penalizes the distance between each kernel point and its closest input point</p>
</li>
<li><p>The repulsive loss penalizes overlap between the kernel points' areas of influence → keeps them from collapsing onto each other
<img src="https://velog.velcdn.com/images/estelle_y/post/9ac1786c-f78d-4ca9-8986-024a18f7acdf/image.png" alt=""></p>
</li>
<li><p>Look how well it works</p>
</li>
</ul>
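<p>The regularization loss at one convolution location can be written down directly from the two formulas. This is a minimal sketch with hypothetical names. It assumes the neighbor positions are already expressed relative to the center, and it reuses the linear correlation $h$ from the kernel definition.</p>

```python
import math


def h(a, b, sigma):
    """Linear correlation between two positions."""
    return max(0.0, 1.0 - math.dist(a, b) / sigma)


def regularization_loss(neighbors_rel, kernel_points, shifts, sigma):
    """L_fit + L_rep at a single location.

    neighbors_rel: neighbor positions y_i relative to the center
    kernel_points: rigid kernel point positions
    shifts:        learned shift for each kernel point
    """
    deformed = [[a + d for a, d in zip(x_k, delta_k)]
                for x_k, delta_k in zip(kernel_points, shifts)]
    # Fitting term: squared, sigma-normalized distance to the closest neighbor
    l_fit = sum(min(math.dist(y, k) / sigma for y in neighbors_rel) ** 2
                for k in deformed)
    # Repulsive term: penalize overlapping influence areas of distinct kernel points
    n = len(deformed)
    l_rep = sum(h(deformed[k], deformed[l], sigma) ** 2
                for k in range(n) for l in range(n) if l != k)
    return l_fit + l_rep
```

<p>A kernel point that sits two influence distances away from every input point contributes 4 to the fitting term, and shifting it halfway back toward the data drops that to 1, which is the pull the fitting loss is meant to create.</p>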
<h3 id="kernel-point-network-layers">Kernel Point Network Layers</h3>
<ul>
<li><strong>Subsampling to deal with varying densities</strong><ul>
<li>Grid subsampling → ensures spatial consistency of the point locations</li>
<li>The barycenter of the points in each non-empty cell is used as the location of the feature</li>
</ul>
</li>
<li><strong>Pooling Layer</strong><ul>
<li>Since subsampling is already grid based, pooling layers are built simply by doubling the grid cell size at each layer</li>
<li>Features at the new locations are obtained either by max pooling or by a KPConv</li>
<li>Here a KPConv is used, and this is called strided KPConv</li>
</ul>
</li>
<li><strong>KPConv layer</strong><ul>
<li>Inputs to the convolution layer<ul>
<li>Points, features, and a matrix of neighborhood indices</li>
<li>The neighborhood matrix is sized to the largest neighborhood<ul>
<li>It therefore contains unused entries, which are ignored in the convolution computation</li>
</ul>
</li>
</ul>
</li>
</ul>
</li>
</ul>
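<p>Grid subsampling with barycenters is short enough to sketch in full. The function name is hypothetical and the code is a plain-Python illustration rather than the optimized routine a real pipeline would use.</p>

```python
import math
from collections import defaultdict


def grid_subsample(points, cell_size):
    """Return the barycenter of the points that fall in each non-empty grid cell."""
    cells = defaultdict(list)
    for p in points:
        # Integer cell index along each axis
        key = tuple(math.floor(c / cell_size) for c in p)
        cells[key].append(p)
    return [[sum(axis) / len(members) for axis in zip(*members)]
            for members in cells.values()]
```

<p>Calling it again with twice the cell size gives the support points of the next, coarser layer, which is how the pooling described above is built.</p>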
<h3 id="kernel-point-network-architecture">Kernel Point Network Architecture</h3>
<p><img src="https://velog.velcdn.com/images/estelle_y/post/d305599b-9296-40a0-bca4-89a4ae14d47e/image.png" alt=""></p>
<p><img src="https://velog.velcdn.com/images/estelle_y/post/f021c682-cfc5-4093-a3de-049e5e0ba41c/image.png" alt=""></p>
<ul>
<li><p>Two architectures were designed empirically → classification and segmentation</p>
</li>
<li><p>KP-CNN</p>
<ul>
<li><p>5-layer classification convolutional network</p>
</li>
<li><p>Each layer has 2 convolutional blocks</p>
</li>
<li><p>Designed like ResNet</p>
<ul>
<li>A KPConv replaces the image convolution, together with batch normalization and leaky ReLU</li>
</ul>
<p><img src="https://velog.velcdn.com/images/estelle_y/post/25f3f7e0-565e-4267-8db1-f9a443c3000e/image.png" alt=""></p>
</li>
</ul>
</li>
</ul>
<ul>
<li>The last layer aggregates features with global average pooling, then processes them with a fully connected layer and softmax</li>
<li>In the deformable version, only the last 5 KPConv blocks are deformable</li>
</ul>
<ul>
<li>KP-FCNN<ul>
<li>Fully convolutional network for segmentation</li>
<li>The encoder is the same as above</li>
<li>The decoder uses nearest upsampling</li>
<li>Skip connections link the encoder and the decoder</li>
<li>The skip features are concatenated with the upsampled ones and processed by a unary convolution</li>
<li>Nearest upsampling could be replaced by a KPConv, but it makes little difference in performance</li>
</ul>
</li>
</ul>
]]></description>
        </item>
        <item>
            <title><![CDATA[[Paper study] iSAM2: Incremental Smoothing and Mapping Using the Bayes Tree]]></title>
            <link>https://velog.io/@estelle_y/%EB%85%BC%EB%AC%B8-%EC%8A%A4%ED%84%B0%EB%94%94-iSAM2-Incremental-Smoothing-and-Mapping-Using-the-Bayes-Tree</link>
            <guid>https://velog.io/@estelle_y/%EB%85%BC%EB%AC%B8-%EC%8A%A4%ED%84%B0%EB%94%94-iSAM2-Incremental-Smoothing-and-Mapping-Using-the-Bayes-Tree</guid>
            <pubDate>Tue, 18 Mar 2025 12:45:54 GMT</pubDate>
            <description><![CDATA[<h2 id="bibtex-인용">BibTeX citation</h2>
<pre><code class="language-jsx">@INPROCEEDINGS{5979641,
  author={Kaess, Michael and Johannsson, Hordur and Roberts, Richard and Ila, Viorela and Leonard, John and Dellaert, Frank},
  booktitle={2011 IEEE International Conference on Robotics and Automation}, 
  title={iSAM2: Incremental smoothing and mapping with fluid relinearization and incremental variable reordering}, 
  year={2011},
  volume={},
  number={},
  pages={3281-3288},
  keywords={Simultaneous localization and mapping;Graphical models;Smoothing methods;Sparse matrices;Accuracy;Trajectory},
  doi={10.1109/ICRA.2011.5979641}}</code></pre>
<hr>
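<h2 id="요약">Summary</h2>
<ul>
<li>Provides a foundation for understanding the connection between existing graphical model inference algorithms and sparse matrix factorization methods<ul>
<li>It does so through a new structure called the Bayes tree</li>
</ul>
</li>
<li>Presents the Bayes tree, which is similar to a clique tree but directed</li>
<li>It maps more naturally to the square root information matrix of the SLAM problem</li>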
</ul>
<hr>
<h2 id="인트로">Introduction</h2>
<ul>
<li>Probabilistic inference algorithms<ul>
<li>Used across many areas of robotics, e.g. SLAM, tracking, etc.</li>
</ul>
</li>
<li>This work focuses on large-scale SLAM</li>
<li>Probabilistic inference algorithms are preferred because of sensor uncertainty</li>
</ul>
<h3 id="관련-연구">Related work</h3>
<ul>
<li>Thin Junction Tree Filter (TJTF), 2003<ul>
<li>Incremental solution based on graphical models</li>
<li>Inconsistent for nonlinear SLAM problems</li>
</ul>
</li>
<li>Full SLAM, 2005<ul>
<li>Keeps all robot poses, which gives an exact solution without the inconsistency problem</li>
</ul>
</li>
<li>Graphical SLAM (Folkesson and Christensen), 2004<ul>
<li>Proposes a way to reduce complexity locally</li>
</ul>
</li>
<li>Treemap, 2006<ul>
<li>Performs QR factorization in the tree nodes</li>
</ul>
</li>
<li>Loopy SAM, 2007<ul>
<li>Applies loopy belief propagation directly to the SLAM graph</li>
</ul>
</li>
<li>iSAM<ul>
<li>Computes the square root information matrix with fast incremental updates, so the global map and trajectory can be obtained at any time</li>
<li>New measurements are added through matrix update equations, reusing previously computed parts of the information matrix</li>
<li>Needs periodic batch steps to maintain efficiency and consistency</li>
<li>Does not maintain real-time performance; runs at 0.7 × real time</li>
</ul>
</li>
</ul>
<hr>
<h2 id="기여">Contributions</h2>
<ul>
<li>Proposes a new data structure, the Bayes tree<ul>
<li>A matrix factorization can be converted into a Bayes net</li>
<li>The result of QR factorization maps onto it more naturally</li>
<li>The structure can be analyzed in terms of conditional probability densities</li>
</ul>
</li>
<li>Develops a new algorithm, iSAM2<ul>
<li>Improves efficiency through incremental variable reordering, fluid relinearization, and removal of the periodic batch steps</li>
<li>An efficient solution for sparse nonlinear problems</li>
<li>Recomputes only the affected part of the Bayes tree → higher efficiency</li>
<li>Achieves real-time performance</li>
</ul>
</li>
</ul>
<hr>
<h2 id="problem">Problem</h2>
<h3 id="target">Target</h3>
<ul>
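<li>An incremental, real-time solution to nonlinear estimation problems<ul>
<li>incremental: the estimate is updated every time a new measurement arrives, reflecting the most accurate model of the environment that can be derived from all measurements so far</li>
<li>real-time: the estimate is available while the robot carries out its task, because navigation and planning need it</li>
</ul>
</li>
<li>A factor graph is used to express the estimation problem as a graphical model<ul>
<li>Can contain a variety of probability distributions or cost functions</li>
<li>Factor nodes - landmark measurements, odometry (information about motion), loop closing constraints (constraints that arise on a revisit), etc.</li>
<li>Variable nodes - the variables to estimate: the pose at each time step, the landmark positions, etc.</li>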
</li>
</ul>
<p><img src="https://velog.velcdn.com/images/estelle_y/post/82da6f5b-c413-44da-bdb0-a1b9019c6275/image.png" alt=""></p>
<h3 id="gaussian-case">Gaussian Case</h3>
<ul>
<li>Nonlinear least-squares problem<ul>
<li>$\arg\min_{\Theta} \frac{1}{2} \sum_{i} \lVert h_i(\Theta_i) - z_i \rVert^2_{\Sigma_i}$<ul>
<li>$h_i$ - measurement function</li>
<li>$z_i$ - measurement</li>
<li>$\lVert e \rVert^2_{\Sigma} = e^T \Sigma^{-1} e$ - squared Mahalanobis distance</li>
</ul>
</li>
</ul>
</li>
<li>Linearization<ul>
<li>Gauss-Newton, Levenberg-Marquardt</li>
<li>At each iteration, a Taylor expansion around the linearization point $\Theta$ yields a new linear least-squares problem</li>
<li>$\arg\min_{\Delta} \lVert A\Delta - b \rVert^2$</li>
<li>$A$ - measurement Jacobian</li>
<li>Once linearized, the new estimate is obtained by simply adding the update, $\Theta + \Delta$</li>
</ul>
</li>
<li>The minimizer of $\lVert A \Delta - b \rVert^2$ is computed through Cholesky or QR factorization</li>
<li>iSAM uses QR factorization<ul>
<li>Incrementally updates the square root information matrix</li>
<li>When measurements are added, the variable order in the matrix can become suboptimal and fill-in can occur<ul>
<li>Periodic batch reordering followed by batch factorization is performed</li>
<li>In iSAM, relinearization is performed only in the batch steps</li>
<li>The period of the batch steps is chosen heuristically (probably meaning empirically?)</li>
</ul>
</li>
</ul>
</li>
</ul>
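<p>A small Gauss-Newton loop shows how the linearized problem $\arg\min_{\Delta} \lVert A\Delta - b \rVert^2$ is built and solved at each iteration. The toy problem (a 2D position estimated from range measurements to known landmarks) and all names are assumptions chosen for illustration. With only two unknowns, the normal equations are solved in closed form here, standing in for the Cholesky or QR factorization a real solver would use.</p>

```python
import math


def gauss_newton_2d(landmarks, ranges, sigma, theta, iterations=10):
    """Estimate a 2D position from range measurements to known landmarks."""
    x, y = theta
    for _ in range(iterations):
        rows, rhs = [], []
        for (lx, ly), z in zip(landmarks, ranges):
            pred = math.hypot(x - lx, y - ly)          # h_i(theta)
            # Whitened Jacobian row of h_i and whitened residual z_i - h_i(theta)
            rows.append(((x - lx) / pred / sigma, (y - ly) / pred / sigma))
            rhs.append((z - pred) / sigma)
        # Normal equations (A^T A) delta = A^T b for the 2x2 case
        a11 = sum(r[0] * r[0] for r in rows)
        a12 = sum(r[0] * r[1] for r in rows)
        a22 = sum(r[1] * r[1] for r in rows)
        g1 = sum(r[0] * b for r, b in zip(rows, rhs))
        g2 = sum(r[1] * b for r, b in zip(rows, rhs))
        det = a11 * a22 - a12 * a12
        dx = (a22 * g1 - a12 * g2) / det
        dy = (a11 * g2 - a12 * g1) / det
        # The new estimate is the old linearization point plus the update
        x, y = x + dx, y + dy
    return x, y
```

<p>Every iteration relinearizes all measurements, which is exactly the cost that a batch solver pays and that incremental methods try to avoid.</p>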
<h2 id="the-bayes-tree">The Bayes Tree</h2>
<ul>
<li>Instead of converting the factor graph into a sparse matrix and relying on sparse linear algebra, the computation is carried out on the graphical model itself</li>
</ul>
<h3 id="inference-and-elimination">Inference and Elimination</h3>
<ul>
<li>Inference can be understood as converting the factor graph into a Bayes net</li>
<li>Variable elimination $P(\Theta) = \prod_{j} P(\theta_j | S_j)$<ul>
<li>$S_j$ is the set of variables directly connected to $\theta_j$</li>
</ul>
</li>
<li>Factor graph conversion process</li>
</ul>
<p><img src="https://velog.velcdn.com/images/estelle_y/post/e5a68a87-0fd6-4b0e-a7b5-b0c1ad6cbd7d/image.png" alt=""></p>
<ul>
<li>Repeating the step above until every variable is eliminated yields the Bayes net</li>
<li>Probabilistically, this amounts to turning the product of factors into a product of conditional probabilities</li>
</ul>
<p><img src="https://velog.velcdn.com/images/estelle_y/post/936fef0c-db34-4ae3-ad2b-601b0a1bb717/image.png" alt=""></p>
<ul>
<li>I do not fully understand the figure above</li>
</ul>
<p><a href="https://youtu.be/_W3Ua1Yg2fk">Supplementary lecture</a>
<img src="https://velog.velcdn.com/images/estelle_y/post/e815e36b-fc57-4a6c-b84e-7fe6b2f0a472/image.jpg" alt=""></p>
<ul>
<li>Once everything is eliminated, all factors can be expressed as conditional probabilities and the result has a tree structure<ul>
<li>This is the key element that lets inference for a new measurement touch only part of the structure</li>
</ul>
</li>
</ul>
<p><strong>Gaussian Case</strong></p>
<ul>
<li>The elimination process is equivalent to sparse QR factorization of the measurement Jacobian</li>
<li>The Gaussian density of the joint factor is defined as $f_{\text{joint}}(\Delta_j, s_j) \propto \exp \left( -\frac{1}{2} \left\| a \Delta_j + A_S s_j - b \right\|^2 \right)$<ul>
<li>$A_j = [a | A_S]$ - matrix that concatenates the partial derivatives of all factors connected to $\Delta_j$</li>
</ul>
</li>
<li>The conditional probability used in the conversion is $P(\Delta_j | s_j) \propto \exp \left( -\frac{1}{2} (\Delta_j + r s_j - d)^2 \right)$<ul>
<li>$r = a^\dagger A_S , ~d =a^\dagger b$</li>
<li>$a^\dagger = (a^T a)^{-1} a^T$ is the pseudo-inverse of $a$</li>
<li>$a$ - the part of the factors that relates to $\Delta_j$, a partial derivative</li>
<li>$b$ - measurements related to $\Delta$</li>
<li>$d$ - $a^\dagger  b$</li>
<li>$A_S$ - partial derivatives with respect to $S_j$</li>
<li>$S_j$ - separator; the variables directly connected to $\theta_j$</li>
</ul>
</li>
<li>The new factor is $f_{\text{new}}(s_j) = \exp \left( -\frac{1}{2} \left\| A&#39;_0 s_j - b&#39;_0 \right\|^2 \right)$<ul>
<li>$A&#39;_0 = A_S - ar ,~ b&#39;_0 = b-ad$</li>
</ul>
</li>
<li>This process is one step of Gram-Schmidt, interpreted in terms of densities</li>
<li>The sparse vector $r$ and the scalar $d$ specify a single joint conditional density in the Bayes net, or equivalently a single row of the sparse square root information matrix</li>
<li>The least-squares problem is solved by one pass from the leaves to the root, followed by a pass from the root to the leaves that retrieves the optimal assignment $\Delta^*$ of each variable → back-substitution</li>
</ul>
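<p>One elimination step in the Gaussian case is small enough to write out. The sketch below eliminates a single scalar variable $\Delta_j$ using the formulas for $r$, $d$, $A&#39;_0$ and $b&#39;_0$ given above. The function name and the list-of-lists matrix layout are assumptions of this note.</p>

```python
def eliminate_scalar(a, A_S, b):
    """Eliminate one scalar variable from the joint factor ||a*delta + A_S*s - b||^2.

    a:   column of partial derivatives with respect to delta (list of m floats)
    A_S: m x s matrix of partial derivatives with respect to the separator
    b:   right-hand side (list of m floats)
    Returns (r, d, A_new, b_new): the conditional delta = d - r*s and the new
    factor ||A_new*s - b_new||^2 on the separator.
    """
    m, s = len(a), len(A_S[0])
    aa = sum(x * x for x in a)                        # a^T a
    r = [sum(a[i] * A_S[i][j] for i in range(m)) / aa for j in range(s)]
    d = sum(a[i] * b[i] for i in range(m)) / aa
    A_new = [[A_S[i][j] - a[i] * r[j] for j in range(s)] for i in range(m)]
    b_new = [b[i] - a[i] * d for i in range(m)]
    return r, d, A_new, b_new
```

<p>With $a = (1, 1)^T$, $A_S = (1, 3)^T$ and $b = (2, 4)^T$ this gives $r = 2$, $d = 3$, and a new factor whose column $(-1, 1)^T$ is orthogonal to $a$, which is the Gram-Schmidt step mentioned above. Solving the new factor gives $s = 1$, and back-substitution gives $\Delta_j = d - r s = 1$.</p>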
<h3 id="creating-bayes-tree">Creating Bayes Tree</h3>
<ul>
<li>Bayes tree<ul>
<li>Expresses the equivalence with linear algebra more clearly</li>
<li>Enables new recursive algorithms</li>
<li>Chordal structure - every cycle of length 4 or more must have a chord</li>
<li>Convenient for optimization and marginalization</li>
<li>Directed, and encodes a factored probability density</li>
<li>Each node defines a conditional density, $P(\Theta) = \prod_k P(F_k | S_k)$<ul>
<li>$S_k$ - separator, the intersection of clique $C_k$ and its parent clique $\Pi_k$</li>
<li>$F_k$ - frontal variables, the remaining variables</li>
</ul>
</li>
</ul>
</li>
</ul>
<p><strong>Gaussian Case</strong></p>
<p><img src="https://velog.velcdn.com/images/estelle_y/post/537af777-41d5-47bd-b8b2-3f0515e9a0f6/image.png" alt=""></p>
<ul>
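<li>A single Bayes tree can correspond to several different square root information factors<ul>
<li>Because part of the ordering is arbitrary</li>
<li>The overall variable order then affects neither the fill-in nor the numerical values, only the positions within the matrix</li>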
</ul>
</li>
</ul>
<h3 id="incremental-inference">Incremental Inference</h3>
<p><a href="https://www.youtube.com/watch?v=R47oeNAatLI">Inference video</a></p>
<ul>
<li>Incremental inference only needs simple edits to the tree</li>
<li>Only the path between the affected cliques and the root is affected (clique to root, root to clique)</li>
<li>Because new factors are added, the elimination process is run again on that part</li>
</ul>
<p><img src="https://velog.velcdn.com/images/estelle_y/post/399d85d7-af7f-4aff-bcac-06807032d3a9/image.png" alt=""></p>
<ul>
<li>I do not fully understand this picture
<img src="https://velog.velcdn.com/images/estelle_y/post/87c3b7b2-65bb-425c-8365-3b5887237a28/image.jpg" alt=""></li>
</ul>
<h3 id="incremental-variable-ordering">Incremental Variable Ordering</h3>
<ul>
<li>Variable ordering is essential for sparse matrix solutions</li>
<li>An optimal ordering is sought to minimize fill-in, the additional entries in the square root information matrix</li>
<li>Fill-in is unavoidable except in the chordal case<ul>
<li>Finding the optimal ordering is NP-hard; heuristics such as COLAMD find good orderings instead</li>
</ul>
</li>
<li>With incremental inference, the variables can be reordered at every update<ul>
<li>The periodic batch reordering used in iSAM is no longer needed</li>
<li>Partial variable reordering is performed on the Bayes tree<ul>
<li>Not globally optimal, but gives a locally optimal ordering</li>
</ul>
</li>
</ul>
</li>
<li>An example of the advantage of the tree structure</li>
</ul>
<p><img src="https://velog.velcdn.com/images/estelle_y/post/f5da005e-b351-4874-a62f-e7d31e538f2f/image.png" alt=""></p>
<ul>
<li>The cost of integrating a measurement gets smaller the closer it is to the root</li>
<li>Using a heuristic such as COLAMD locally has the limitation that it only considers the fill-in of the current step<ul>
<li>An incremental ordering that places the most recently accessed variables last is proposed</li>
</ul>
</li>
<li>Incremental ordering<ul>
<li>Uses constrained COLAMD, which forces the most recent variables to the end while keeping a reasonable ordering overall</li>
<li>Keeps the part affected by later updates small</li>
<li>A large loop closure is the exception, where the cost is high</li>
</ul>
</li>
</ul>
<p><img src="https://velog.velcdn.com/images/estelle_y/post/4db3dfe3-4b98-4ee2-8182-acaff52c7441/image.png" alt=""></p>
<ul>
<li>Better than batch, which seems obvious</li>
<li>Fill-in grows sharply in certain sections, probably because of loop closing</li>
</ul>
<h2 id="the-isam2-algorithm">The iSAM2 Algorithm</h2>
<ul>
<li>Handling nonlinear factors → the Bayes tree as described so far only handles the linear case</li>
<li>Fluid relinearization → relinearizes only the parts that need it, reducing cost and improving efficiency</li>
<li>Partial state update → updates only the parts that actually change</li>
</ul>
<h3 id="fluid-relinearization">Fluid Relinearization</h3>
<p><img src="https://velog.velcdn.com/images/estelle_y/post/eb351895-3498-4bd9-822a-1a6c4858d71d/image.png" alt=""></p>
<ul>
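<li>Deciding whether relinearization is needed<ul>
<li>When the current estimate moves away from the linearization point</li>
<li>When the change exceeds a threshold</li>
</ul>
</li>
<li>Partial modification of the Bayes tree<ul>
<li>Only the information related to the variables being relinearized is removed, giving a partial relinearization</li>
</ul>
</li>
<li>Marginal factor computation<ul>
<li>Information from the eliminated sub-trees that arises during relinearization is passed upward</li>
<li>With caching, the computation can restart from the middle of the tree</li>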
</li>
</ul>
<h3 id="partial-state-update">Partial State Update</h3>
<p><img src="https://velog.velcdn.com/images/estelle_y/post/c1f9048c-d219-424a-b3be-652c83984fa8/image.png" alt=""></p>
<ul>
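<li>Update partially, computing only the variables that changed → lower computational cost</li>
<li>Only the top of the tree changes, so the change propagates into the sub-trees only to a limited extent</li>
<li>The update stops at a clique when the change in $\Delta$ of its variables is below a threshold<ul>
<li>The variables in that clique's sub-tree are then guaranteed not to change</li>
</ul>
</li>
<li>A nearly exact solution is maintained</li>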
</ul>
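<p>The thresholded descent can be sketched on a toy tree of scalar conditionals. Everything here is an assumption made for illustration: each node holds one scalar variable with a conditional of the form $\Delta = d - r \Delta_{\text{parent}}$, the cached solution lives in a dict, and the function name is hypothetical. Real cliques hold several variables and matrix-valued conditionals.</p>

```python
def partial_state_update(node, parent_delta, threshold, visited):
    """Back-substitute down a tree of scalar conditionals, pruning unchanged sub-trees.

    node is a dict with keys "name", "r", "d", "delta" (cached solution) and
    "children". Each conditional reads delta = d - r * parent_delta.
    """
    new_delta = node["d"] - node["r"] * parent_delta
    change = abs(new_delta - node["delta"])
    node["delta"] = new_delta
    visited.append(node["name"])
    if threshold > change:
        return  # change below threshold: the sub-tree keeps its cached values
    for child in node["children"]:
        partial_state_update(child, new_delta, threshold, visited)
```

<p>In the usual case only the cliques near the root change noticeably, so the descent stops after a few levels and most of the cached solution is reused. That is why the result is nearly exact rather than exact.</p>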
<h3 id="algorithm-and-complexity">Algorithm and Complexity</h3>
<p><img src="https://velog.velcdn.com/images/estelle_y/post/d4da294a-7932-44c3-a0a9-cd91e727a8ba/image.png" alt=""></p>
<ul>
<li>algorithm<ul>
<li>Estimates a set of variables</li>
<li>Takes nonlinear factors $F$ that arrive incrementally</li>
<li>New factors and variables keep being added</li>
<li>Optimization is performed using the Bayes tree</li>
<li>Solves the linearized system iteratively</li>
</ul>
</li>
<li>complexity<ul>
<li>general case<ul>
<li>Uses Gauss-Newton</li>
<li>Quadratic convergence near the minimum</li>
</ul>
</li>
<li>exploration task<ul>
<li>Constraints exist for each pose</li>
<li>The number of affected factors is constant → $O(1)$</li>
</ul>
</li>
<li>loop closure<ul>
<li>general case → full factorization needed → $O(n^3 )$</li>
<li>under certain assumptions → backsubstitution → $O(n^{1.5} )$</li>
</ul>
</li>
<li>empirical complexity<ul>
<li>Far below the theoretical upper bound</li>
<li>Most steps only need partial recomputation/refactorization, so the computation is efficient in most cases</li>
</li>
</ul>
</li>
</ul>
]]></description>
        </item>
        <item>
            <title><![CDATA[[WIP] ORB SLAM 2]]></title>
            <link>https://velog.io/@estelle_y/ORB-SLAM-2</link>
            <guid>https://velog.io/@estelle_y/ORB-SLAM-2</guid>
            <pubDate>Fri, 14 Mar 2025 04:41:23 GMT</pubDate>
            <description><![CDATA[<h2 id="1-pangolin-설치">1. Installing <a href="https://github.com/stevenlovegrove/Pangolin">Pangolin</a></h2>
<pre><code>git clone https://github.com/stevenlovegrove/Pangolin.git
cd Pangolin
git checkout v0.6
cmake -B build
cmake --build build</code></pre><p>Built this way, hooking the package into ORB SLAM is a real hassle: it keeps failing to find the path</p>
<pre><code>cd build &amp;&amp; rm -rf *
cmake .. -DCMAKE_INSTALL_PREFIX=/usr/local
make -j
sudo make install</code></pre><p>Doing it this way solved the problem</p>
<h2 id="2-opencv">2. <a href="https://github.com/opencv/opencv">opencv</a></h2>
<pre><code>sudo apt update
sudo apt upgrade
sudo apt install build-essential cmake git pkg-config
sudo apt install libjpeg-dev libtiff-dev libpng-dev
sudo apt install libavcodec-dev libavformat-dev libswscale-dev libv4l-dev
sudo apt install libxvidcore-dev libx264-dev
sudo apt install libgtk-3-dev
sudo apt install libatlas-base-dev gfortran
sudo apt install python3-dev
mkdir opencv &amp;&amp; cd opencv
git clone https://github.com/opencv/opencv.git
git clone https://github.com/opencv/opencv_contrib.git
cd opencv &amp;&amp; git checkout 3.2.0
cd ..
cd opencv_contrib &amp;&amp; git checkout 3.2.0
cd ..
cd opencv &amp;&amp; mkdir build &amp;&amp; cd build
cmake -D CMAKE_BUILD_TYPE=RELEASE -D CMAKE_INSTALL_PREFIX=/usr/local -D WITH_TBB=OFF -D WITH_IPP=OFF -D WITH_1394=OFF -D BUILD_WITH_DEBUG_INFO=OFF -D BUILD_DOCS=OFF -D INSTALL_C_EXAMPLES=ON -D INSTALL_PYTHON_EXAMPLES=ON -D BUILD_EXAMPLES=OFF -D BUILD_TESTS=OFF -D BUILD_PERF_TESTS=OFF -D WITH_QT=OFF -D WITH_GTK=ON -D WITH_OPENGL=ON -D OPENCV_EXTRA_MODULES_PATH=../../opencv_contrib/modules -D WITH_V4L=ON  -D WITH_FFMPEG=ON -D WITH_XINE=ON -D BUILD_NEW_PYTHON_SUPPORT=ON -D OPENCV_GENERATE_PKGCONFIG=ON -D WITH_CUDA=OFF  -DLAPACKE_INCLUDE_DIR=/usr/include/lapacke ..
make -j
sudo make install
sudo ldconfig</code></pre><p><a href="https://velog.io/@estelle_y/opencv-%EC%84%A4%EC%B9%98">reference</a></p>
<p>I installed OpenCV 3.2.0 but it would not run, which was heartbreaking</p>
<p>It turned out
the files were corrupted..</p>
<h2 id="3-ros-noetic">3. ROS Noetic</h2>
<p>(omitted)</p>
<p>Gave up on this and moved on to ORB-SLAM3</p>
]]></description>
        </item>
        <item>
            <title><![CDATA[[Install] Installing OpenCV]]></title>
            <link>https://velog.io/@estelle_y/opencv-%EC%84%A4%EC%B9%98</link>
            <guid>https://velog.io/@estelle_y/opencv-%EC%84%A4%EC%B9%98</guid>
            <pubDate>Tue, 11 Mar 2025 10:00:22 GMT</pubDate>
            <description><![CDATA[<h2 id="아래-순서로-진행">Proceed in the following order</h2>
<pre><code>sudo apt update
sudo apt upgrade
sudo apt install build-essential cmake git pkg-config
sudo apt install libjpeg-dev libtiff-dev libpng-dev
sudo apt install libavcodec-dev libavformat-dev libswscale-dev libv4l-dev
sudo apt install libxvidcore-dev libx264-dev
sudo apt install libgtk-3-dev
sudo apt install libatlas-base-dev gfortran
sudo apt install python3-dev
sudo apt install libblas-dev libopenblas-dev</code></pre><pre><code>mkdir opencv &amp;&amp; cd opencv
git clone https://github.com/opencv/opencv.git
git clone https://github.com/opencv/opencv_contrib.git
cd opencv &amp;&amp; git checkout &lt;desired_ver&gt;
cd ..
cd opencv_contrib &amp;&amp; git checkout &lt;desired_ver&gt;
cd ..</code></pre><pre><code>cd opencv &amp;&amp; mkdir build &amp;&amp; cd build</code></pre><pre><code>cmake -D CMAKE_BUILD_TYPE=RELEASE -D CMAKE_INSTALL_PREFIX=/usr/local -D WITH_TBB=OFF -D WITH_IPP=OFF -D WITH_1394=OFF -D BUILD_WITH_DEBUG_INFO=OFF -D BUILD_DOCS=OFF -D INSTALL_C_EXAMPLES=ON -D INSTALL_PYTHON_EXAMPLES=ON -D BUILD_EXAMPLES=OFF -D BUILD_TESTS=OFF -D BUILD_PERF_TESTS=OFF -D WITH_QT=OFF -D WITH_GTK=ON -D WITH_OPENGL=ON -D OPENCV_EXTRA_MODULES_PATH=../OpenCV_contrib/modules -D WITH_V4L=ON  -D WITH_FFMPEG=ON -D WITH_XINE=ON -D BUILD_NEW_PYTHON_SUPPORT=ON -D OPENCV_GENERATE_PKGCONFIG=ON -D WITH_CUDA=OFF  -DLAPACKE_INCLUDE_DIR=/usr/include/lapacke ..
make -j$(nproc)
sudo make install
sudo ldconfig</code></pre><h3 id="에러-발생한-것들">Errors encountered</h3>
<h4 id="1">1</h4>
<p>Add <code>set(CMAKE_CXX_STANDARD 11)</code> to <code>CMakeLists.txt</code>.</p>
<h4 id="2">2</h4>
<p>In <code>opencv_lapack.h</code>, set the include path explicitly: <code>#include &quot;/usr/include/eigen3/Eigen/src/misc/lapacke.h&quot;</code></p>
<p>In <code>descriptor.cpp</code>, replace</p>
<pre><code>CV_Assert(image.size &gt; 0);
CV_Assert(cost.size &gt; 0);</code></pre><p>with</p>
<pre><code>CV_Assert(image.cols &gt; 0 &amp;&amp; image.rows &gt; 0);
CV_Assert(cost.cols &gt; 0 &amp;&amp; cost.rows &gt; 0);</code></pre>
<p>Also install <code>lapack</code>:</p>
<pre><code>sudo apt-get install liblapack-dev liblapacke-dev</code></pre>
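<p>Before hard-coding an include path, it can help to see which copies of <code>lapacke.h</code> exist on the system. A minimal sketch, assuming the headers live under <code>/usr/include</code> as in the paths above; <code>find_header</code> is a hypothetical helper.</p>
<pre><code># Hypothetical helper: find_header NAME ROOT
# Lists every regular file called NAME under ROOT, in sorted order.
find_header() {
  find "$2" -name "$1" -type f | sort
}</code></pre>
<p>For example, <code>find_header lapacke.h /usr/include</code> lists every candidate for the include path.</p>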
<h4 id="3">3</h4>
<pre><code>/usr/bin/ld: ../../lib/libopencv_core.so.3.2.0: undefined reference to `cblas_zgemm(CBLAS_ORDER, CBLAS_TRANSPOSE, CBLAS_TRANSPOSE, int, int, int, void const*, void const*, int, void const*, int, void const*, void*, int)&#39;
/usr/bin/ld: ../../lib/libopencv_core.so.3.2.0: undefined reference to `cblas_dgemm(CBLAS_ORDER, CBLAS_TRANSPOSE, CBLAS_TRANSPOSE, int, int, int, double, double const*, int, double const*, int, double, double*, int)&#39;
/usr/bin/ld: ../../lib/libopencv_core.so.3.2.0: undefined reference to `cblas_sgemm(CBLAS_ORDER, CBLAS_TRANSPOSE, CBLAS_TRANSPOSE, int, int, int, float, float const*, int, float const*, int, float, float*, int)&#39;
/usr/bin/ld: ../../lib/libopencv_core.so.3.2.0: undefined reference to `cblas_cgemm(CBLAS_ORDER, CBLAS_TRANSPOSE, CBLAS_TRANSPOSE, int, int, int, void const*, void const*, int, void const*, int, void const*, void*, int)&#39;
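collect2: error: ld returned 1 exit status</code></pre><p>The linker could not resolve the <code>cblas_*gemm</code> symbols. Installing the BLAS development packages fixed it:</p>
<pre><code>sudo apt update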
sudo apt install libblas-dev
sudo apt install libopenblas-dev</code></pre><h4 id="4-필수-의존-패키지-누락">4 Missing required dependency packages</h4>
<p>This error occurred in a Docker environment. The missing packages were:</p>
<pre><code>python3-dev python3-numpy python3-pip libjasper-dev liblapacke-dev libeigen3-dev libgstreamer1.0-dev libgstreamer-plugins-base1.0-dev</code></pre><h4 id="5-cmake-경로-오류">5 cmake path error</h4>
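<p>Fix the cmake command: from the <code>build</code> directory, <code>OPENCV_EXTRA_MODULES_PATH</code> must be <code>../../opencv_contrib/modules</code>, not <code>../OpenCV_contrib/modules</code>.</p>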
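<p>A quick check before running cmake can confirm that a relative modules path resolves from the build directory. This is a minimal sketch; <code>resolve_modules_path</code> is a hypothetical helper, and it assumes <code>opencv</code> and <code>opencv_contrib</code> were cloned side by side as in the steps above.</p>
<pre><code># Hypothetical helper: resolve_modules_path BUILD_DIR REL_PATH
# Prints the absolute path if REL_PATH exists relative to BUILD_DIR.
resolve_modules_path() {
  build_dir=$1
  rel_path=$2
  if [ -d "$build_dir/$rel_path" ]; then
    # cd inside a subshell so the caller's working directory is unchanged.
    (cd "$build_dir/$rel_path"; pwd -P)
  else
    echo "not found: $build_dir/$rel_path"
    return 1
  fi
}</code></pre>
<p>From the <code>build</code> directory, <code>resolve_modules_path . ../../opencv_contrib/modules</code> prints the absolute path on success, while a wrong path prints <code>not found</code> and returns a non-zero status.</p>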
<pre><code>cmake -D CMAKE_BUILD_TYPE=RELEASE -D CMAKE_INSTALL_PREFIX=/usr/local -D WITH_TBB=OFF -D WITH_IPP=OFF -D WITH_1394=OFF -D BUILD_WITH_DEBUG_INFO=OFF -D BUILD_DOCS=OFF -D INSTALL_C_EXAMPLES=ON -D INSTALL_PYTHON_EXAMPLES=ON -D BUILD_EXAMPLES=OFF -D BUILD_TESTS=OFF -D BUILD_PERF_TESTS=OFF -D WITH_QT=OFF -D WITH_GTK=ON -D WITH_OPENGL=ON -D OPENCV_EXTRA_MODULES_PATH=../../opencv_contrib/modules -D WITH_V4L=ON  -D WITH_FFMPEG=ON -D WITH_XINE=ON -D BUILD_NEW_PYTHON_SUPPORT=ON -D OPENCV_GENERATE_PKGCONFIG=ON -D WITH_CUDA=OFF  -DLAPACKE_INCLUDE_DIR=/usr/include/lapacke ..
</code></pre>]]></description>
        </item>
    </channel>
</rss>