"Hello, Estelle!"

[Paper Study] PointNet++: Deep hierarchical feature learning on point sets in a metric space

Thu, 01 May 2025 18:13:21 GMT

BibTeX citation

@article{qi2017pointnet++,
  title={Pointnet++: Deep hierarchical feature learning on point sets in a metric space},
  author={Qi, Charles Ruizhongtai and Yi, Li and Su, Hao and Guibas, Leonidas J},
  journal={Advances in neural information processing systems},
  volume={30},
  year={2017}
}

Summary

Let's add local features to PointNet
- Learn at multiple scales and combine them well, so the output keeps the same form while local features are also utilised

Introduction

PointNet does not capture local structure well
- But exploiting local structure is said to be decisive for the success of convolutional architectures
So a multi-resolution hierarchy is introduced
- Lower levels can learn local structure
There are two problems
- How do we partition the point set?
- How do we abstract a set of points or local features through a local feature learner?
The two problems are related: the partitions need a common structure so that the weights of the local feature learner can be shared, which in turn affects learning
- Since PointNet already works well, it is used as the local feature learner
What remains is how to generate an overlapping partition of the point set
- The entanglement of feature scale and the non-uniformity of the input points make it hard to build suitable local partitions

Contributions

Proposes PointNet++, a deep network that captures robust and detailed features at multiple scales
Everything else restates PointNet

3. Method

3.1 Review of PointNet

Already summarized in the PointNet notes, so skipping

3.2 Hierarchical Point Set Feature Learning

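Sampling picks centroids with FPS (farthest point sampling), and grouping gathers the points around each centroid and passes them on, so the number of points differs from group to group
- That is fine, because PointNet can produce a fixed-length output feature vector from a flexible number of input points
This grouping (ball query in the paper) generalizes better than kNN
A PointNet is then applied to each group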
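
The sampling and grouping steps above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names `farthest_point_sampling` and `ball_query`, the toy points, and the radius are all assumptions made for the example.

```python
import math

def farthest_point_sampling(points, m, start=0):
    """Pick m indices so that each new pick is farthest from those already chosen."""
    chosen = [start]
    # dist[i] = distance from point i to its nearest chosen point so far
    dist = [math.dist(p, points[start]) for p in points]
    while len(chosen) < m:
        nxt = max(range(len(points)), key=lambda i: dist[i])
        chosen.append(nxt)
        dist = [min(d, math.dist(p, points[nxt])) for d, p in zip(dist, points)]
    return chosen

def ball_query(points, centroid_idx, radius):
    """For each centroid, list the indices of all points within radius of it."""
    return [[i for i, p in enumerate(points) if math.dist(p, points[c]) <= radius]
            for c in centroid_idx]

# Toy cloud (hypothetical data): two tight pairs plus two isolated points
points = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (1.0, 0.0, 0.0),
          (1.0, 1.0, 0.0), (0.9, 1.0, 0.0), (0.0, 0.9, 0.0)]
centroids = farthest_point_sampling(points, 3)
groups = ball_query(points, centroids, radius=0.2)
```

The groups come out with different sizes, which is exactly the case a max pooling PointNet handles without padding.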

3.3 Robust Feature Learning under Non-Uniform Sampling Density

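Features learned on dense data do not generalize well to sparse data, and vice versa
So dense regions should be inspected more closely, and sparse regions at a larger scale
To learn this, density adaptive PointNet layers are used

Multi-scale grouping (MSG)
- During training, each point is randomly dropped with a randomly chosen probability (random input dropout)
- The upper bound on the drop probability was hand-tuned so that no group ends up empty; about 0.95 seems to be enough
- This covers point clouds whose uniformity is not guaranteed
Multi-resolution grouping (MRG)
- Proposed because applying the method above all the way down from the raw data is too computationally expensive
- The feature from the whole region is concatenated with the multi-resolution one
  - Depending on density, the reliable information may or may not be in the feature from the whole region
  - So concatenate first and let the later layers learn which one is more reliable
    - They describe this as the weights adapting; nothing is done explicitly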

3.4 Point Feature Propagation for Set Segmentation

We want to segment the original point cloud, but doing that directly is too computationally expensive
So feature propagation with skip connections is used
Features are interpolated and matched to the original points carried over by the skip connection
Interpolation uses an inverse distance weighted average
The interpolated features are concatenated with the skip-linked point features
The result goes into a unit PointNet, which is roughly like a 1x1 convolution
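
A rough sketch of the interpolation and concatenation described above. It assumes the k = 3 nearest neighbours and a power of p = 2 for the inverse distance weights; `interpolate_features`, `propagate`, the `eps` guard, and the toy data are hypothetical names and values chosen for illustration.

```python
import math

def interpolate_features(query, coords, feats, k=3, p=2, eps=1e-8):
    """Inverse-distance-weighted average of the features of the k nearest known points."""
    nearest = sorted(range(len(coords)), key=lambda i: math.dist(query, coords[i]))[:k]
    # eps avoids division by zero when the query coincides with a known point
    weights = [1.0 / (math.dist(query, coords[i]) ** p + eps) for i in nearest]
    total = sum(weights)
    dim = len(feats[0])
    return [sum(w * feats[i][c] for w, i in zip(weights, nearest)) / total
            for c in range(dim)]

def propagate(query, coords, feats, skip_feat):
    """Concatenate the interpolated coarse feature with the skip-linked fine feature."""
    return interpolate_features(query, coords, feats) + list(skip_feat)

coarse_xyz = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
coarse_feat = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
out = propagate((1.0, 0.0, 0.0), coarse_xyz, coarse_feat, skip_feat=[0.5])
```

The concatenated vector `out` is what would then be fed to the unit PointNet.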

[Paper Study] PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation

Thu, 01 May 2025 18:12:04 GMT

BibTeX citation

@inproceedings{qi2017pointnet,
  title={Pointnet: Deep learning on point sets for 3d classification and segmentation},
  author={Qi, Charles R and Su, Hao and Mo, Kaichun and Guibas, Leonidas J},
  booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
  pages={652--660},
  year={2017}
}

Summary

Point clouds are irregular geometric data, so they are usually converted to 3D voxel grids or collections of images
That makes the data unnecessarily voluminous, so this paper presents a neural network that consumes points directly
Applications range from object classification and part segmentation to scene semantic parsing
Simple, hence efficient and effective

1. Introduction

Point clouds and meshes are irregular → they usually get converted to 3D voxels or collections of images
This makes the data unnecessarily voluminous
Unlike meshes, point clouds are simple and unified → easier to learn from
The proposed PointNet is a unified architecture that takes the points themselves as input and outputs either a class label for the whole input or per point segment/part labels
Only the $(x,~ y,~ z)$ coordinates are used → additional dimensions may be added by computing normals and other local/global features
The key approach is a single simple symmetric function, max pooling
Surprisingly (so they say), visualization shows that PointNet learns to summarize an object by a sparse set of key points, roughly its skeleton

Contributions

A novel deep net architecture suited to unordered 3D point sets
Shows how such a net can be trained for 3D shape classification, shape part segmentation, and scene semantic parsing
Provides empirical and theoretical analysis of the stability and efficiency of the network

Point Cloud Feature

Deep Learning on 3D Data

Deep Learning on Unordered Sets

3. Problem Statement

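For object classification, the input point cloud is either directly sampled from a shape or pre-segmented from a scene point cloud
Each point can carry extra feature channels besides its $(x,~ y,~z)$ coordinates
PointNet uses only $(x, ~ y, ~ z)$ here. Classification outputs a score for each class, and segmentation outputs a score per point per class → $n$ points and $m$ classes give an $n \times m$ output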

4. Deep Learning on Point Sets

4.1. Properties of Point Sets in $\R ^n$

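The input is a subset of points from a Euclidean space
Unlike images, it is unordered, so the network must be invariant to permutations
Points are not isolated: neighbouring points form meaningful subsets
The learned representation should be invariant to certain transformations, which should not change the category or the segmentation of the points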

4.2. PointNet Architecture

Three key modules
- max pooling layer as a symmetric function to aggregate information
- a structure for combining local and global information
- alignment networks that align the input points and the point features

Symmetry Function for Unordered Input

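Three kinds of solutions are typically used to handle unordered data
- Sort the input into a canonical order
  - Sorting does not fully resolve the ordering issue
  - Learning needs an ordering that stays stable under point perturbations, which does not exist in general
  - An MLP applied directly to the sorted point set performs poorly, only slightly better than on unsorted input
- Treat the input as a sequence to train an RNN
  - Training with randomly permuted sequences is hoped to make the RNN invariant to input order
  - But by its nature an RNN's output is not fully independent of the sequence order, so order still matters
- Use a simple symmetric function to aggregate the information from each point
  - Empirically this works well
  - It is simple, so analysis is easy too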
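
To make the third option concrete, here is a small check that a channel-wise max over per-point features does not depend on point order. The per-point map `h` is a hand-written toy standing in for the shared MLP; it and the sample cloud are assumptions for the example only.

```python
import random

def h(point):
    """Toy per-point feature map standing in for the shared MLP (hypothetical)."""
    x, y, z = point
    return [x + y, y * z, max(x, z), x - z]

def global_feature(points):
    """Channel-wise max over per-point features: a symmetric function of the set."""
    feats = [h(p) for p in points]
    return [max(column) for column in zip(*feats)]

cloud = [(0.1, 0.2, 0.3), (0.9, 0.1, 0.4), (0.5, 0.8, 0.2), (0.3, 0.3, 0.9)]
shuffled = cloud[:]
random.shuffle(shuffled)

# Same set, different order, same global feature
same = global_feature(cloud) == global_feature(shuffled)
```

Sorting or an RNN would need extra machinery to get this property; max pooling has it by construction.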

Local and Global Information Aggregation

Classification is easy: train an SVM or MLP classifier on the global feature
But point segmentation requires both local and global knowledge
After computing the global point cloud feature vector, feed it back to the per point features
The global feature is concatenated with each point feature, and new per point features are extracted from the combination, so they carry both local and global information

Joint Alignment Network

A mini network predicts an affine transformation matrix, which is applied to the coordinates of the input points
The mini network resembles the big network: point independent feature extraction, max pooling, and fully connected layers
The same idea is applied again at the feature level, but the feature space transform has much higher dimension than the spatial one → optimization gets harder
So a regularization term is added to the softmax loss

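4.3. Theoretical Analysis

Universal approximation

Intuitively, by continuity of the set function, a small perturbation of the input points should not change the output much
Given enough neurons at the max pooling layer, the network can approximate such a function arbitrarily well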

Theorem 1.

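Suppose $f : \chi \to \mathbb{R}$ is a continuous set function with respect to the Hausdorff distance
In the worst case, the network can learn to convert a point cloud into a volumetric representation by partitioning the space into equal sized voxels
In practice, though, the network learns a much smarter strategy to probe the space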

Bottleneck dimension and stability

Theoretically and experimentally, the expressiveness of the network is strongly affected by the dimension of the max pooling layer → they say this and then don't elaborate?
Anyway, the next theorem is said to show which properties affect stability

Theorem 2.

Assume $u : X \to \mathbb{R}^K$ with $u = \max_{x_i \in S} \{ h(x_i) \}$ and $f = \gamma \circ u$
- a) $\forall S, \exists \, C_S, N_S \subseteq X, \; f(T) = f(S) \text{ if } C_S \subseteq T \subseteq N_S$
- b) $|C_S| \leq K$
a) says $f(S)$ is unchanged under input corruption as long as every point in $C_S$ is preserved, and unchanged by extra noise points up to $N_S$
b) says $f(S)$ is in fact determined by a finite subset $C_S$ of at most $K$ elements
So $C_S$ is called the critical point set of $S$, and $K$ the bottleneck dimension of $f$
Together, the two properties express the robustness of the network
Intuitively, PointNet learns to summarise a shape by a sparse set of key points
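
The critical point set can be read off directly from where each max pooling channel attains its maximum. A minimal sketch with a toy three-channel feature map, so $K = 3$; `h`, `u`, `critical_set`, and the sample points are illustrative assumptions.

```python
def h(point):
    """Toy per-point feature map with K = 3 channels (hypothetical)."""
    x, y = point
    return [x, y, -x]

def u(points):
    """Max pooled feature u(S): channel-wise max of h over the set."""
    return [max(column) for column in zip(*(h(p) for p in points))]

def critical_set(points):
    """Indices of the points that attain the max in at least one channel."""
    feats = [h(p) for p in points]
    channels = len(feats[0])
    winners = {max(range(len(points)), key=lambda i: feats[i][c])
               for c in range(channels)}
    return sorted(winners)

S = [(0.0, 0.0), (1.0, 0.2), (0.3, 0.9), (0.5, 0.5), (-0.4, 0.1)]
C = critical_set(S)
C_points = [S[i] for i in C]
```

Dropping every point outside `C`, or adding back any of the non-critical ones, leaves `u` and therefore $f = \gamma \circ u$ unchanged, and `C` can never hold more than $K$ points.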

Anyway, it works well; training takes about 3-6 hours on a 1080

[Paper Study] Semantic Graph Based Place Recognition for 3D Point Clouds

Thu, 01 May 2025 18:10:30 GMT

BibTeX citation

@inproceedings{kong2020semantic,
  title={Semantic graph based place recognition for 3d point clouds},
  author={Kong, Xin and Yang, Xuemeng and Zhai, Guangyao and Zhao, Xiangrui and Zeng, Xianfang and Wang, Mengmeng and Liu, Yong and Li, Wanlong and Wen, Feng},
  booktitle={2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
  pages={8216--8223},
  year={2020},
  organization={IEEE}
}

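Summary

For 3D point clouds, it is hard to build a descriptor for place recognition that is robust to occlusion and viewpoint changes
Most methods use local, global, or statistical features
This paper works at the semantic level, based on the human perspective
It recognizes semantic objects and presents a graph based approach
Place recognition is recast as a graph matching problem
The code is available here.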

Introduction

Loop closing is the most effective way to remove accumulated odometry drift
Most current place recognition strategies are based on descriptor generation and feature distance measurement
LiDAR based methods commonly extract local or global descriptors from raw data, using either neural networks or handcrafted designs
These usually yield low level features, e.g. local structure or distribution characteristics
Such features are sensitive to occlusion and rotation, and they ignore the relations between segments, which can be critical for scene expression
This paper uses a novel graph representation of the point cloud that aggregates semantic information
Because it takes topological relations into account, the graph based representation makes the point cloud more efficient and comprehensible

Contributions

Presents a semantic graph representation for 3D point clouds
- capture semantic information and model topological relations between objects
Presents a graph similarity matching network that can be used for loop closure detection
Tested on SemanticKITTI, showing SOTA performance on reverse loop closure detection and robustness to occlusion and viewpoint changes

Methodology

The key insights are
1. use the human perspective
2. use semantic level descriptors
3. encode relations among semantic objects
Semantic segmentation of the raw points yields instance, semantic, and topological information, from which the semantic graph is built
The raw point clouds are thus converted to topological semantic graphs, turning place recognition into a graph matching problem

A. Semantic Graph Representation

Semantic Segmentation for Point Clouds

Semantic object detection uses RangeNet++ with SemanticKITTI; some classes are merged or dropped so that only 12 categories are used

A clustering radius is set per category, and Euclidean clustering yields the semantic instances

Semantic Graph Construction

A 64-channel LiDAR typically captures over 100,000 points per frame, which is highly redundant
Common reductions are down sampling or projection to a 2D plane; this work uses a topological semantic graph instead
- Concise and meaningful, and it preserves semantic information and topological relations well
Each semantic instance is one hot encoded, and the graph is built on the Euclidean distances between instances
The resulting graph is the representation of the scene, so two scenes can be compared as a graph similarity measurement problem

B. Graph Similarity Network

Common graph similarity metrics are Graph Edit Distance (GED) and Maximum Common Subgraph (MCS), but these are NP-complete, so exact distances are hard to obtain
Since this is place recognition for loop closing, the similarity must also be permutation invariant and rotation invariant
Meeting these conditions with the classical similarity computations is not feasible in reasonable time
So they propose a graph similarity network for graph matching, inspired by SimGNN

Node Embedding

A Graph Convolutional Network aggregates features based on the relations between nodes, but the adjacency matrix must be defined in advance
So for point clouds it is better to build the graph dynamically
- Uses EdgeConv, proposed in Dynamic Graph CNN (DGCNN)
EdgeConv captures local geometric information and guarantees permutation invariance
Because the graph is updated dynamically, it suits semantic grouping
EdgeConv layer
- A kNN search finds the k nearest neighbours of each node, in feature space and in Euclidean space
- Each node feature is initialized with centroid information and the one-hot encoded semantic label
- The edge function is defined as $h_{\Theta} (f_i, f_j) = h_{\bar{\Theta}} \left( f_i, f_i - f_j^m \right)$
- $\Theta$ is the set of learnable parameters; $f_i$ carries global information, and the difference term carries local relational information
- For multimodal feature aggregation, independent convolutions run at the spatial and semantic levels, and the resulting embeddings are concatenated

Graph Embedding

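Usually a graph embedding is produced by a weighted or unweighted average of the node embeddings
Here, inspired by SimGNN, an attention module with a learnable weight matrix estimates a weight for each node
The network learns which nodes are better suited to represent the graph
The global graph context $c$ is the mean of the node embeddings, passed through the learnable matrix $W$ and $\tanh$ → $c = \tanh \left( \frac{1}{N} \sum_{i=1}^{N} u_i W \right)$
$c$ summarizes the overall structure and feature information of the graph, and $W$ is updated during training
Nodes similar to the global context receive higher attention
- The attention weight is the inner product of the global context and the node embedding, squashed into $[0,~1]$ by a sigmoid $\sigma$
The final graph embedding is the attention weighted sum of the node embeddings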
$e = \sum_{i=1}^{N} \sigma \left( u_i \tanh \left( \frac{1}{N} \sum_{m=1}^{N} u_m W \right)^T \right) u_i$
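
The formula for $e$ can be traced with plain lists. This is only a sketch of the equation as written here: node embeddings `U` are rows $u_i$, `W` is taken to be the identity for the example, and the function names are made up.

```python
import math

def row_times_matrix(v, W):
    """Row vector v (length D) times matrix W (D x D)."""
    return [sum(v[k] * W[k][j] for k in range(len(v))) for j in range(len(W[0]))]

def graph_embedding(U, W):
    """Attention pooled graph embedding e and the per-node attention weights."""
    n, d = len(U), len(U[0])
    mean = [sum(u[j] for u in U) / n for j in range(d)]
    c = [math.tanh(x) for x in row_times_matrix(mean, W)]   # global context
    # sigmoid of the inner product between each node embedding and the context
    att = [1.0 / (1.0 + math.exp(-sum(a * b for a, b in zip(u, c)))) for u in U]
    e = [sum(a * u[j] for a, u in zip(att, U)) for j in range(d)]
    return e, att

U = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # three toy node embeddings
W = [[1.0, 0.0], [0.0, 1.0]]               # identity, purely for illustration
e, att = graph_embedding(U, W)
```

The third node is the most aligned with the context, so it gets the largest weight, and reordering the nodes leaves `e` unchanged up to floating point rounding.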

Graph-Graph Interaction

A neural tensor network (NTN) estimates the relation between the two graph level embeddings
- An NTN uses a bilinear layer instead of a linear layer to learn the relation between two vectors → better than a plain inner product
- The relation between graph level embeddings is given by
- $g(e_1, e_2) = \text{ReLU} \left( e_1^T \omega_{[1:S]} e_2 + \alpha \begin{bmatrix} e_1 \\ e_2 \end{bmatrix} + b \right)$
- The first term is a bilinear tensor operation that learns the interaction between the two graph embeddings $e_1 ,~e_2$
- The second term concatenates the two embeddings and applies a linear map to learn additional features
- The third term is just the bias
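
A worked instance of the three terms, with $S = 2$ slices and 2-dimensional embeddings. All numbers are made up for the example and `ntn` is a hypothetical helper, not code from the paper.

```python
def ntn(e1, e2, omega, alpha, b):
    """omega: S slices of D x D, alpha: S rows of length 2D, b: length S."""
    concat = list(e1) + list(e2)
    out = []
    for w_s, a_s, b_s in zip(omega, alpha, b):
        # first term: bilinear form e1^T w_s e2 for this slice
        bilinear = sum(e1[i] * w_s[i][j] * e2[j]
                       for i in range(len(e1)) for j in range(len(e2)))
        # second term: linear map of the concatenated embeddings
        linear = sum(a * x for a, x in zip(a_s, concat))
        out.append(max(0.0, bilinear + linear + b_s))   # ReLU
    return out

e1, e2 = [1.0, 2.0], [3.0, -1.0]
omega = [[[1.0, 0.0], [0.0, 1.0]],     # slice 1: plain inner product
         [[0.0, 1.0], [1.0, 0.0]]]     # slice 2: cross terms
alpha = [[0.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0]]
b = [0.0, -10.0]
scores = ntn(e1, e2, omega, alpha, b)
```

Slice 1 reduces to the inner product $e_1 \cdot e_2 = 1$, while slice 2 gives $5 + 1 - 10 = -4$ before the ReLU and is clipped to 0.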

Graph Similarity

FC layers are used to compute the similarity
The final output is a score in $[0,~1]$, so the task becomes a binary classification problem
The feature vector from the NTN is passed through FC layers to get a single scalar score, which is normalized with a sigmoid
The loss is binary cross entropy (BCE), since the GT labels are binary: $L = - \frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right]$

Experiment

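The overall message is that it works well; recall in particular is good

Precision is somewhat low on some sequences, but overall the results look solid

Performance is evaluated by whether loops are found within a threshold distance, and again it seems to work well
That is about all there is to the performance section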

[WIP] ORB-SLAM3: An Accurate Open-Source Library for Visual, Visual–Inertial, and Multimap

Tue, 18 Mar 2025 15:32:02 GMT

ORB-SLAM3: An Accurate Open-Source Library for Visual, Visual–Inertial, and Multimap

Date: 2021 Journal: T-RO

3. System Overview

1) Atlas

Multimap representation composed of a set of disconnected maps

The active map is where the tracking thread localizes incoming frames; the others are non-active maps

2) Tracking thread

Computes the pose of the current frame with respect to the active map in real time

If tracking is lost, it tries to relocalize the current frame in all the atlas maps

3) Local mapping thread

Adds keyframes and points to the active map, removes redundant ones, and refines the map using visual or visual-inertial BA

4) Loop and map merging thread

Detects common regions between the active map and the whole atlas at keyframe rate

If the common region belongs to the active map, it performs loop correction

If it belongs to a different map, both maps are merged into a single one

Related notes: ORB SLAM, [WIP] ORB SLAM2, [WIP] ORB SLAM3

[WIP] ORB-SLAM2: an Open-Source SLAM System for Monocular, Stereo and RGB-D Cameras

Tue, 18 Mar 2025 15:30:43 GMT

ORB-SLAM2: an Open-Source SLAM System for Monocular, Stereo and RGB-D Cameras

Date: 2017 Journal: T-RO

3. ORB SLAM2

c. Bundle Adjustment with Monocular and Stereo Constraints

Motion-only BA

Optimizes the camera orientation and position, minimizing the reprojection error between matched 3D points in world coordinates and keypoints

Local BA

Optimizes a set of covisible keyframes and all points seen in those keyframes

All other keyframes that observe those points but are not in the covisible set contribute to the cost function but remain fixed in the optimization

Full BA

Specific case of local BA, where all the keyframes and points in the map are optimized


[WIP] PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space

Tue, 18 Mar 2025 15:21:54 GMT

PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space

Date: 2017 Journal: NIPS

1 Introduction

Exploiting local structure has proven to be important for the success of convolutional architectures

A CNN takes data defined on regular grids as input and is able to progressively capture features at increasingly larger scales along a multi resolution hierarchy

PointNet++ is a hierarchical neural network that processes a set of points sampled in a metric space in a hierarchical fashion

First, partition the set of points into overlapping local regions by the distance metric of the underlying space

Two issues are addressed by PointNet++

How to generate the partitioning of the point set
How to abstract sets of points or local features through a local feature learner

PointNet++ applies PointNet recursively on a nested partitioning of the input set

Unlike CNNs, where smaller kernels often enhance performance, point cloud data can be sparse, making small scales inadequate

PointNet++ addresses this by using multi-scale neighborhoods, adapting to different scales during training, and achieving superior results on 3D point cloud benchmarks

2. Problem Statement

3. Method

Extension of PointNet with added hierarchical structure

3.1 Review of PointNet

Invariant to point permutations and can arbitrarily approximate any continuous set function

Lacks the ability to capture local context at different scales

3.2 Hierarchical Point Set Feature Learning

Use a hierarchical grouping of points and progressively abstract larger and larger local regions along the hierarchy

The hierarchical structure is composed of a number of set abstraction levels

Three key layers: Sampling layer, Grouping layer, PointNet layer

Sampling layer

Iterative farthest point sampling to choose a subset of points

Generates receptive fields in a data dependent manner

Grouping layer

Groups the input point set, an $N \times (d + C)$ matrix, around the $N'$ sampled centroids into an output of size $N' \times K \times (d + C)$

$K$ is the number of points in the neighborhood of a centroid

PointNet layer

Each local region is abstracted by its centroid and a local feature that encodes the centroid's neighbourhood

Output size of $N' \times (d+C')$

3.3 Robust Feature Learning under Non-Uniform Sampling Density

MSG Multi-scale grouping

Captures multi scale patterns by applying grouping layers with different scales, each followed by a corresponding PointNet that extracts the features of that scale

The features are concatenated to form a multi scale feature

Trained with random input dropout

MRG Multi-resolution grouping

The # of centroid points is usually large at the lowest level, which makes MSG time consuming

Use multi resolution grouping instead

One vector summarizes the features of each subregion from the lower level

The other vector comes from directly processing the raw points in the local region

3.4 Point Feature Propagation for Set Segmentation

[WIP] PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation

Tue, 18 Mar 2025 15:19:49 GMT

PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation

Date: 2017 Journal: CVPR

1. Introduction

Point clouds and meshes are not in a regular format, which creates the need to transform them into 3D voxel grids or collections of images

This data representation transformation renders the resulting data unnecessarily voluminous

PointNets simply use point clouds

As a point cloud is just a set of points, the basic architecture is simple: in the initial stages each point is processed identically and independently

PointNet is trained to perform 3D shape classification, shape part segmentation, and scene semantic parsing tasks

Point Cloud Feature

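Point features often encode certain statistical properties of points and are designed to be invariant to certain transformations, which are typically classified as intrinsic or extrinsic; they can also be categorized as local and global features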

Deep Learning on 3D Data

Deep Learning on Unordered Sets

One recent work used a read-process-write network with an attention mechanism to consume unordered input sets

3. Problem Statement

For object classification task, the input point cloud is either directly sampled from a shape or pre-segmented from a scene point cloud

4. Deep Learning on Point Sets

4.1. Properties of Point Sets in $\R ^n$

The input is a subset of points from a Euclidean space

Unordered, with interaction among nearby points, and the learned representation of the point set should be invariant to certain transformations

4.2. PointNet Architecture

Three key modules: a max pooling layer as a symmetric function to aggregate information, a local and global information combination structure, and two joint alignment networks that align both input points and point features

Symmetry Function for Unordered Input

Sort input into a canonical order

Sorting does not fully resolve the ordering issue

An MLP applied directly to the sorted point set performs poorly, only slightly better than on unsorted input

Treat input as a sequence to train RNN

Using randomly permuted sequences, the hope is that the RNN becomes invariant to input order

However, for an RNN order does matter and cannot be totally omitted

Use a simple symmetric function to aggregate the information from each point

Approximate a general function defined on a point set by applying a symmetric function to transformed elements in the set

Thanks to the simplicity of the module, theoretical analysis is possible

Local and Global Information Aggregation

Point segmentation requires a combination of local and global knowledge

After computing the global point cloud feature vector, feed it back to per point feature

Extract new per point features based on the combined point features

Joint Alignment Network

Predict affine transformation matrix by a mini network and directly apply this transformation to coordinates of input points

The mini network itself resembles the big network and is composed of basic modules: point independent feature extraction, max pooling, and fully connected layers

The transformation matrix in the feature space has much higher dimension than the spatial transform matrix

Therefore a regularization term is added to the softmax training loss

4.3. Theoretical Analysis

Universal approximation

Ability of the network to approximate continuous set functions

Holds given enough neurons at the max pooling layer

Theorem 1.

Suppose $f : \chi ~ \rarr \R$ is a continuous set function with respect to the Hausdorff distance

In the worst case the network can learn to convert a point cloud into a volumetric representation by partitioning the space into equal sized voxels

Bottleneck dimension and stability

Expressiveness of network is strongly affected by the dimension of the max pooling layer

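Define $u$ as the sub network of $f$ which maps a point set in $[0, ~ 1] ^m$ to a $K$ dimensional vector

Theorem 2.

$f(S)$ is unchanged as long as all points of a critical point set $\mathcal{C}_S$ with at most $K$ elements are preserved

It is also unchanged by extra noise points up to $\mathcal{N}_S$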

Robustness is gained in analogy to sparsity principle

Intuitively network learns to summarize a shape by a sparse set of key points

[WIP] Semantic KITTI: A Dataset for Semantic Scene Understanding of LiDAR Sequences

Tue, 18 Mar 2025 15:17:09 GMT

Semantic KITTI: A Dataset for Semantic Scene Understanding of LiDAR Sequences

Date: 2019 Journal: ICCV

1. Introduction

LiDAR sensors are not affected by lighting, providing precise distance measurements

SemanticKITTI focuses on laser based semantic segmentation and semantic scene completion

3. The Semantic KITTI

3.1. Labeling Process

Loop close the sequences using an off the shelf laser based SLAM system

Subdivide the sequence of point clouds into tiles of 100m by 100m

For each tile, load the scans overlapping with the tile, which makes it possible to label all scans consistently

3.2. Dataset Statistics

The class counts are unbalanced, which is common for data from natural environments

4. Evaluation of Semantic Segmentation

4.1. Single Scan Experiments

Task and Metrics

The commonly applied mean Jaccard Index, or mean intersection over union (mIoU), is used as the metric

A single scan cannot be expected to distinguish moving from non-moving objects

State of the Art

Feature extraction and classification are replaced by end to end deep neural networks, such as CNNs with 3D convolutions, for object classification and semantic segmentation

To overcome the limitation of voxel based representation such as exploding memory consumption, recent approaches either upsample voxel predictions using CRF or use different representations

Baseline approaches

[WIP] Deep SORT: SIMPLE ONLINE AND REALTIME TRACKING WITH A DEEP ASSOCIATION METRIC

Tue, 18 Mar 2025 15:15:48 GMT

Deep SORT: SIMPLE ONLINE AND REALTIME TRACKING WITH A DEEP ASSOCIATION METRIC

Date: 2017 Journal: ICIP

1. Introduction

SORT is a simple framework that performs Kalman filtering in image space and frame by frame data association using the Hungarian method, with an association metric that measures bounding box overlap

But it returns a relatively high # of identity switches, as the employed association metric is only accurate when state estimation uncertainty is low

This issue is overcome by replacing the association metric with a more informed metric that combines motion and appearance information

Deep SORT increases robustness against misses and occlusions while keeping the system easy to implement, efficient, and applicable to online scenarios

2. Sort with Deep Association Metric

2.1. Track Handling and State Estimation

The track handling and Kalman filtering framework is mostly identical to the original formulation

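The state space is defined as $(u, v, \gamma, h, \dot{x}, \dot{y}, \dot{\gamma}, \dot{h})$

$(u, v)$ is the bounding box center, $\gamma$ the aspect ratio, and $h$ the height; the remaining terms are their velocities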

Tracks that exceed a predefined maximum age $A_{max}$ are considered to have left the scene and are deleted from the track set

2.2. Assignment Problem

Motion and appearance information are integrated through a combination of two appropriate metrics; for motion, the (squared) Mahalanobis distance between predicted Kalman states and newly arrived measurements is used

$d^{(1)} (i, ~j) = (d_j - y_i ) ^T S_i ^{-1} (d_j - y_i )$
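
As a sanity check on the formula, here is a stdlib-only sketch that evaluates $d^{(1)}$ and applies a gate. The small Gaussian elimination `solve` stands in for a matrix inverse, the track mean, covariance, and detections are made-up numbers, and the gate 9.4877 (the 0.95 quantile of the chi-square distribution with 4 degrees of freedom) is the threshold assumed here for the 4-dimensional measurement space.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def squared_mahalanobis(d_j, y_i, S_i):
    """(d_j - y_i)^T S_i^{-1} (d_j - y_i) without forming the inverse explicitly."""
    r = [a - b for a, b in zip(d_j, y_i)]
    return sum(a * b for a, b in zip(r, solve(S_i, r)))

GATE = 9.4877  # assumed chi-square 0.95 quantile for 4 degrees of freedom

y = [100.0, 50.0, 0.5, 80.0]                     # projected track mean (u, v, gamma, h)
S = [[4.0, 0.0, 0.0, 0.0], [0.0, 4.0, 0.0, 0.0],
     [0.0, 0.0, 0.01, 0.0], [0.0, 0.0, 0.0, 16.0]]
near = squared_mahalanobis([102.0, 50.0, 0.5, 84.0], y, S)
far = squared_mahalanobis([110.0, 50.0, 0.5, 80.0], y, S)
admissible = [d <= GATE for d in (near, far)]
```

The nearby detection passes the gate and the distant one is excluded from association.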

2.3. Matching Cascade

Mahalanobis distance favors large uncertainty because it effectively reduces the distance in standard deviations of any detection towards the projected track mean

It is an undesired behaviour as it can lead to increased track fragmentations and unstable tracks

In a final matching stage, intersection over union association is run as proposed in the original SORT algorithm

2.4. Deep Appearance Descriptor

A wide residual network with two convolutional layers followed by six residual blocks is employed

[WIP] SORT: SIMPLE ONLINE AND REALTIME TRACKING

Tue, 18 Mar 2025 15:14:27 GMT

SORT: SIMPLE ONLINE AND REALTIME TRACKING

Date: 2016 Journal: ICIP

1. Introduction

The MOT problem can be viewed as a data association problem where the aim is to associate detections across frames

There is a resurgence of mature data association techniques including Multiple Hypothesis Tracking(MHT) and Joint Probabilistic Data Association(JPDA) which occupy many of the top positions of the MOT benchmark

Traditional trackers are too slow for realtime applications

The focus is on efficient and reliable handling of the common frame to frame associations; rather than aiming to be robust to detection errors, recent advances in visual object detection are exploited to solve the detection problem directly

2. Literature review

Traditional MOT methods delay making difficult decisions while there is high uncertainty over the object assignments

Many online tracking methods aim to build appearance models of either the individual objects themselves or a global model through online learning

When considering only one-to-one correspondence modelled as bipartite graph matching, globally optimal solutions such as the Hungarian algorithm can be used

3. Methodology

3.1 Detection

Utilizes the Faster Region CNN (FrCNN) detection framework, an end to end framework that consists of two stages

The first stage extracts features and proposes regions; the second stage classifies the objects in the proposed regions

The detector can be swapped for any design

3.2 Estimation Model

The inter-frame displacements of each object are approximated with a linear constant velocity model which is independent of other objects and camera motion

When a detection is associated to a target, the detected bounding box is used to update the target state, where the velocity components are solved optimally via a Kalman filter framework

If no detection is associated to the target, its state is simply predicted without correction using the linear velocity model

3.3 Data Association

The assignment cost matrix is computed as the intersection over union distance between each detection and all predicted bounding boxes from the existing targets

The assignment is solved optimally using the Hungarian algorithm
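
A small sketch of the IoU cost matrix and the assignment step. To stay dependency free it brute-forces the optimal one-to-one assignment over permutations, which gives the same result as the Hungarian algorithm on tiny inputs but is not the Hungarian algorithm itself. Boxes are assumed to be `(x1, y1, x2, y2)`, there are assumed to be no more detections than predictions, and `iou_min=0.3` is an assumed threshold.

```python
from itertools import permutations

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def assign(detections, predictions, iou_min=0.3):
    """Minimum-cost one-to-one matching; pairs with IoU below iou_min are rejected."""
    cost = [[1.0 - iou(d, p) for p in predictions] for d in detections]
    # brute-force stand-in for the Hungarian algorithm (fine for a handful of boxes)
    best = min(permutations(range(len(predictions)), len(detections)),
               key=lambda perm: sum(cost[i][j] for i, j in enumerate(perm)))
    return [(i, j) for i, j in enumerate(best) if 1.0 - cost[i][j] >= iou_min]

predictions = [(0.0, 0.0, 10.0, 10.0), (20.0, 20.0, 30.0, 30.0), (50.0, 50.0, 60.0, 60.0)]
detections = [(21.0, 21.0, 31.0, 31.0), (1.0, 0.0, 11.0, 10.0)]
matches = assign(detections, predictions)
```

Each match is a `(detection index, prediction index)` pair; the third prediction gets no detection and would simply be propagated by the motion model.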

3.4 Creation and deletion of Track Identities

Any detection with an overlap less than $IOU_{min}$ is taken to signify the existence of an untracked object

Tracks are terminated if they are not detected for $T_{Lost}$ frames to prevent an unbounded growth in the # of trackers and localisation errors

A small $T_{Lost}$ causes early deletion of lost targets, which aids efficiency

[WIP] Attention Is All You Need

Tue, 18 Mar 2025 15:12:18 GMT

Date: 2017 Journal: NIPS

1 Introduction

Background

RNN, LSTM, and GRU have been firmly established as the state of the art in sequence modeling, language modeling, and machine translation

Problem

The sequential nature of recurrent models precludes parallelization within a training example, which becomes critical at longer sequence lengths, as memory constraints limit batching across examples

Recent work (factorization tricks, conditional computation) has improved computational efficiency, with conditional computation also improving model performance

But the fundamental constraint of sequential computation remains

2 Background

To reduce sequential computation, many CNN based models were proposed

These models compute hidden representations in parallel for all I/O positions

But the number of operations required to relate signals from two arbitrary I/O positions grows with the distance between the positions, which makes it harder to learn dependencies between distant positions

In the Transformer, this is reduced to a constant number of operations

This comes at the cost of reduced effective resolution due to averaging attention weighted positions, which is counteracted with Multi-Head Attention

Self-attention is an attention mechanism relating different positions of a single sequence to compute a representation of the sequence

3 Model Architecture

Most competitive neural sequence transduction models have an encoder decoder structure

Encoder - maps an input sequence of symbol representations ($x_1, ~~...,~~x_n$) to continuous representations ($z_1, ~~...,~~z_n$)

Decoder - given the continuous representations, generates an output sequence ($y_1, ~~...,~~y_m$) one element at a time

Transformer follows this overall architecture using self-attention and point-wise, fully connected layers for both encoder and decoder

3.1 Encoder and Decoder Stacks

Encoder

Composed of 6 identical layers

Each layer has two sub-layers - a multi head self attention mechanism and a position wise fully connected feed forward network

Each sub layer is wrapped in a residual connection followed by layer normalization

Decoder

Composed of 6 identical layers

Each layer has three sub-layers - the two sub-layers of the encoder layer plus a third that performs multi head attention over the output of the encoder stack

Each sub layer has a residual connection followed by layer normalization, like the encoder

The self attention sub layer is modified (masked) so that the prediction at a position can depend only on the known outputs at earlier positions

3.2 Attention

Attention function can be described as mapping a query and a set of key-value pairs to an output

The output is computed as a weighted sum of the values, where the weight assigned to each value is computed by a compatibility function of the query with the corresponding key

3.2.1 Scaled Dot-Product Attention

The input consists of queries and keys of dimension $d_k$ and values of dimension $d_v$; the weights on the values are obtained by computing the dot products of the query with all keys, dividing each by $\sqrt{d_k}$, and applying a softmax function

The two most commonly used attention functions are additive attention and dot product attention

The two have similar theoretical complexity, but dot product attention is much faster and more space efficient since it can be implemented using highly optimized matrix multiplication

If $d_k$ is small the two perform similarly, but for larger $d_k$ additive attention outperforms dot product attention without scaling, since large dot products push the softmax into regions with extremely small gradients

To counteract this, the dot products are scaled by $\frac{1}{\sqrt{d_k}}$
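
A minimal list-based sketch of scaled dot product attention for a handful of vectors, just to trace the three steps (dot products, scaling by $\sqrt{d_k}$, softmax-weighted sum of the values). The function names and toy inputs are assumptions for the example; a real implementation would use batched matrix multiplication.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V, with rows of Q, K, V as individual vectors."""
    d_k = len(K[0])
    out = []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, V))
                    for c in range(len(V[0]))])
    return out

K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
focused = attention([[1.0, 0.0]], K, V)[0]   # query aligned with the first key
uniform = attention([[0.0, 0.0]], K, V)[0]   # zero query: equal weights
```

The aligned query pulls the output toward the first value row, while the zero query returns the plain average of the values.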

3.2.2 Multi-Head Attention

Linearly projecting the queries, keys, and values $h$ times with different learned linear projections to $d_k$, $d_k$ and $d_v$ dimensions respectively is more beneficial than using a single attention function with $d_{model}$ dimensional keys, values, and queries

On each projected version of the queries, keys, and values, the attention function is performed in parallel, yielding $d_v$ dimensional output values, which are concatenated and projected once more

Because each head has reduced dimension, the total computational cost is similar to that of single head attention with full dimensionality

3.2.3 Application of Attention in our Model

Transformer use multi head attention in three different ways

In the "encoder decoder attention" layers, the queries come from the previous decoder layer, and the memory keys and values come from the output of the encoder. This allows every position in the decoder to attend over all positions in the input sequence, mimicking the typical encoder decoder attention mechanism in sequence to sequence models.
In the self attention layers of the encoder, all of the keys, values, and queries come from the previous layer of the encoder, so each position in the encoder can attend to all positions in that previous layer.
Similarly, self attention layers in the decoder allow each position to attend to all positions in the decoder up to and including that position. Leftward information flow in the decoder must be prevented to preserve the auto-regressive property; the paper implements this inside scaled dot product attention by masking out all values in the input of the softmax that correspond to illegal connections.

3.3 Position-wise Feed-Forward Networks

In addition to attention sub-layers, each layer in the encoder and decoder contains a fully connected feed-forward network, consisting of two linear transformations with a ReLU activation in between, applied to each position separately and identically

The linear transformations are the same across different positions, but use different parameters from layer to layer

3.4 Embeddings and Softmax

As in other sequence models, learned embeddings convert the input tokens and output tokens to vectors of dimension $d_{model}$

A learned linear transformation and softmax function convert the decoder output to predicted next-token probabilities

In this model, the same weight matrix is shared between the two embedding layers and the pre-softmax linear transformation; in the embedding layers, the weights are multiplied by $\sqrt{d_{model}}$

3.5 Positional Encoding

Since the model contains no recurrence and no convolution, information about token positions must be injected

“Positional encodings” of dimension $d_{model}$ are added to the input embeddings at the bottoms of the encoder and decoder stacks

Sinusoidal positional encodings were chosen so the model can easily learn to attend by relative position, since for any fixed offset $k$, $PE_{pos+k}$ can be represented as a linear function of $PE_{pos}$

Compared to learned positional embeddings, this produced nearly identical results, but it may allow the model to extrapolate to sequence lengths longer than those seen during training
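A small sketch of the sinusoidal encoding, assuming the formulation from the paper, $PE_{(pos, 2i)} = \sin(pos / 10000^{2i/d_{model}})$ and $PE_{(pos, 2i+1)} = \cos(pos / 10000^{2i/d_{model}})$ (these formulas are not written out in the notes above). The function name and table sizes are illustrative.

```python
import math

def positional_encoding(max_len, d_model):
    """Return a max_len x d_model table of sinusoidal positional encodings."""
    pe = [[0.0] * d_model for _ in range(max_len)]
    for pos in range(max_len):
        # i walks over the even dimension indices (i = 2 * pair_index).
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe

# Hypothetical sizes, chosen only for illustration.
pe = positional_encoding(max_len=50, d_model=8)
```

For each sine/cosine pair, shifting the position by $k$ is a rotation by a fixed angle, which is the linear relation that makes relative positions easy to attend to.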

4 Why Self-Attention

Path length between long-range dependencies in network

Key factor to learn dependencies is the length of paths forward and backward signals have to traverse in the network

The shorter these paths, the easier it is to learn long range dependencies

Amount of computation that can be parallelized

Total computational complexity per layer

Comparison with recurrent layers

A self-attention layer connects all positions with a constant number of sequentially executed operations, $O(1)$, whereas a recurrent layer requires $O(n)$

In terms of computational complexity, self-attention layers ($O(n^2 \cdot d)$) are faster than recurrent layers ($O(n \cdot d^2)$) when the sequence length $n$ is smaller than the representation dimension $d$

For larger $n$, self-attention could be restricted to a neighborhood of size $r$ around each position; this is left for future work

Comparison with convolutional layers

A single convolutional layer with kernel width $k < n$ does not connect all pairs of input and output positions

Doing so requires a stack of $O(n/k)$ convolutional layers with contiguous kernels, or $O(\log_k(n))$ with dilated convolutions, which increases the length of the longest paths between any two positions in the network

With separable convolutions, the complexity can be decreased to $O(k \cdot n \cdot d + n \cdot d^2)$. Even with $k=n$, this equals the complexity of a self-attention layer combined with a point-wise feed-forward layer

[WIP] DETR: End-to-End Object Detection with Transformers

Tue, 18 Mar 2025 15:09:12 GMT

Date: 2020 Journal: ECCV

1 Introduction

Problem

Modern detectors address set prediction indirectly, by defining surrogate regression and classification problems on a large set of proposals, anchors, or window centers

Their performance is significantly influenced by post-processing steps

To avoid this, a direct set prediction approach (the end-to-end philosophy) is used; it has led to significant advances in complex structured prediction tasks, but not yet in object detection

Proposal

Streamlines the training pipeline by viewing object detection as a direct set prediction problem

Adopts an encoder-decoder transformer

DETR predicts all objects at once, and is trained end to end with a set loss function that performs bipartite matching between predicted and ground-truth objects

DETR simplifies the detection pipeline by dropping multiple hand-designed components that encode prior knowledge

DETR doesn’t require any customized layers and can be reproduced easily in any framework that provides a CNN and a transformer

2.1 Set Prediction

A common approach is to use auto-regressive sequence models such as RNNs

In all cases, the loss function needs to be invariant to a permutation of the predictions

The usual solution is to design a loss based on the Hungarian algorithm, which enforces permutation invariance and guarantees a unique match for each target

DETR uses this bipartite matching loss, but with a transformer and parallel decoding instead of an auto-regressive model

2.2 Transformer and Parallel Decoding

The Transformer introduced self-attention layers, which scan through each element of a sequence and update it by aggregating information from the whole sequence

The main advantages are global computation and perfect memory, which make it suitable for long sequences

DETR combines transformers and parallel decoding for their suitable trade-off between computational cost and the ability to perform global computations

2.3 Object detection

In DETR, the hand-crafted process is removed: the set of detections is predicted directly, with absolute box predictions relative to the input image rather than to an anchor

Set based loss

Several object detectors use a bipartite matching loss

But in these models the relations between predictions were modeled only with convolutional or fully connected layers

To improve their performance, hand-designed NMS post-processing is needed

This means that unless they use a set-based loss, they still need manual post-processing

Recurrent detectors

Recurrent detectors use bipartite matching losses with encoder-decoder architectures based on CNN activations and RNNs to directly produce a set of bounding boxes

These were evaluated only on small datasets, not against modern baselines

3 The DETR model

Two ingredients are essential

a set prediction loss that forces unique matching between predicted and ground truth boxes
an architecture that predicts a set of objects and models their relation

3.1 Object detection set prediction loss

DETR infers a fixed-size set of $N$ predictions

The loss first finds an optimal bipartite matching between predicted and ground-truth objects, then optimizes the losses on the matched pairs

The optimal assignment is computed with the Hungarian algorithm
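To illustrate what the matching step computes, here is a brute-force sketch that tries every assignment of predictions to ground-truth boxes and keeps the cheapest one. It is not the Hungarian algorithm (which finds the same optimum in polynomial time), and the L1 box cost is a simplified stand-in for DETR's actual matching cost; all names and values are hypothetical.

```python
from itertools import permutations

def match_cost(pred, gt):
    # Hypothetical pairwise cost: L1 distance between box coordinates.
    return sum(abs(p - g) for p, g in zip(pred, gt))

def optimal_assignment(preds, gts):
    """Return (best_perm, best_cost); best_perm[i] is the prediction index matched to gts[i]."""
    best_perm, best_cost = None, float("inf")
    # Enumerate every injective assignment of predictions to ground-truth boxes.
    for perm in permutations(range(len(preds)), len(gts)):
        cost = sum(match_cost(preds[j], gts[i]) for i, j in enumerate(perm))
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    return best_perm, best_cost

# N = 3 predictions, 2 ground-truth boxes (cx, cy, w, h), made-up values.
preds = [(0.9, 0.9, 0.2, 0.2), (0.1, 0.1, 0.3, 0.3), (0.5, 0.5, 0.1, 0.1)]
gts = [(0.1, 0.1, 0.3, 0.3), (0.5, 0.5, 0.1, 0.1)]
perm, cost = optimal_assignment(preds, gts)
```

Predictions left unmatched (index 0 here) would be trained toward the "no object" class. The result does not depend on the order of the predictions, which is the permutation invariance the loss needs.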

3.2 DETR architecture

Backbone

A conventional CNN is used as the backbone, which generates a lower-resolution activation map

Transformer encoder

First, a 1x1 convolution reduces the channel dimension of the high-level activation map to a smaller dimension $d$

Each encoder layer has a standard architecture consisting of a multi-head self-attention module and an FFN, with additional positional encodings

Transformer decoder

The decoder transforms $N$ embeddings of size $d$ using multi-headed self-attention and encoder-decoder attention, decoding the $N$ objects in parallel at each decoder layer

Prediction feed forward network

The final prediction is computed by a 3-layer perceptron with ReLU activation and hidden dimension $d$, plus a linear projection layer

Auxiliary decoding losses

Auxiliary losses can help the model output the correct number of objects of each class during training

[WIP] Masked-attention Mask Transformer for Universal Image Segmentation

Tue, 18 Mar 2025 15:06:11 GMT

Date: 2022 Journal: CVPR

1. Introduction

Background

Universal architectures show SOTA performance for semantic/panoptic segmentation and are flexible, but recent research still focuses on advancing specialized architectures

Problem

Why haven't universal architectures replaced specialized ones?

→ Mask2Former : backbone feature extractor - pixel decoder - transformer decoder

Specialized semantic segmentation architectures

Typically per-pixel classification

FCN-based architectures classify each pixel independently

Follow-up work looks for context for each pixel and focuses on context modules and self-attention variants

Specialized instance segmentation architectures

Typically mask classification: predict a set of binary masks, each associated with a class label

Mask R-CNN generates masks from bounding boxes

Follow-up work focuses on more precise bounding boxes or on new ways to generate a dynamic number of masks

These lack the flexibility to generalize to other tasks

Panoptic segmentation

Proposed to unify semantic and instance segmentation

Universal architectures

Emerged with DETR

Show that mask classification architectures with end-to-end prediction are general enough for any image segmentation task

3. Masked-attention Mask Transformer

3.1 Mask classification preliminaries

Mask classification architectures group pixels into $N$ segments by predicting $N$ binary masks along with $N$ corresponding category labels; this formulation is general

The difficulty is finding good representations for each segment

→ each segment can be represented as a $C$-dimensional feature vector (“object query”), which can be processed by a transformer decoder

Architecture components
1. backbone - extracts low-resolution features
2. pixel decoder - gradually upsamples them to generate high-resolution per-pixel embeddings
3. transformer decoder - processes the object queries, from which the binary mask predictions are decoded

3.2 Transformer decoder w/ masked attention

Key components of the proposed Transformer decoder

Extract localized features by constraining cross-attention to the foreground region of the predicted mask for each query

For small objects, an efficient multi-scale strategy is proposed to use high-resolution features

3.2.1 Masked attention

Context features are important for image segmentation, but global context causes slow convergence, since cross-attention needs many training epochs to learn to attend to localized object regions

Hypotheses

local features are enough to update query features
context information can be gathered through self attention

Solution: cross-attention attends only within the foreground region of the predicted mask for each query

Masked attention
$X_l = \text{softmax}(\mathcal{M}_{l-1} + Q_l K_l^T) V_l + X_{l-1}$
$\mathcal{M}_{l-1}(x, y) = \begin{cases} 0 & \text{if } M_{l-1}(x, y) = 1 \\ -\infty & \text{otherwise} \end{cases}$

$M_{l-1}$ is the binarized mask prediction of the previous Transformer decoder layer, obtained from $X_{l-1}$ and resized to the same resolution as $K_l$
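A minimal sketch of how the additive mask changes the attention weights, in plain Python. The function name and toy numbers are hypothetical, and the fallback for an all-background mask is a guard added for this sketch so the softmax stays defined.

```python
import math

def masked_attention_weights(scores, mask):
    """scores: N x L raw Q K^T values; mask: N x L binary mask (1 = foreground)."""
    weights = []
    for row_scores, row_mask in zip(scores, mask):
        if not any(row_mask):
            # Guard (assumption of this sketch): an empty mask falls back to full attention.
            row_mask = [1] * len(row_mask)
        # Add 0 on foreground positions and -inf elsewhere.
        biased = [s if m == 1 else -math.inf for s, m in zip(row_scores, row_mask)]
        top = max(biased)
        exps = [math.exp(b - top) for b in biased]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

# Two queries over four key locations (made-up values).
scores = [[2.0, 1.0, 0.5, 3.0], [0.1, 0.2, 0.3, 0.4]]
mask = [[1, 1, 0, 0], [0, 0, 0, 0]]
w = masked_attention_weights(scores, mask)
```

The first query puts zero weight on the two background locations even though one of them has the highest raw score, so its update uses only features inside its predicted mask.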

3.2.2 High resolution features

Problem: high-resolution features are good for small objects, but have a high computation cost

Solution: instead of always using the high-resolution feature map, use multi-scale features to control the increase in computation, feeding one resolution (from low to high) to one Transformer decoder layer at a time

3.2.3 Optimization improvements

switch self/cross attention order

query features entering the first self-attention layer are image-independent and carry no signal from the image, so self-attention there is unlikely to enrich information

make query features learnable, and supervise them directly before they are used in the Transformer decoder

These learnable features function like a region proposal network and have the ability to generate mask proposals

remove dropout

dropout is not necessary and decreases performance

3.3 Improving training efficiency

Problem: large memory consumption during training

Solution: motivated by PointRend/Implicit PointRend, which show that a segmentation model can be trained with the mask loss calculated on $K$ randomly sampled points

Use sampled points to calculate the mask loss in both the matching loss and the final loss

For the matching loss, uniformly sample the same set of $K$ points for all predictions and ground-truth masks

For the final loss, use importance sampling to sample different sets of $K$ points for different prediction/ground-truth pairs

4. Experiments

Datasets

COCO (80 things, 53 stuff)
ADE20K (100 things, 50 stuff)
Cityscapes (8 things, 11 stuff)
Mapillary Vistas (37 things, 28 stuff)

Limitations: a model trained only on panoptic segmentation performs slightly worse on instance and semantic segmentation than the exact same model trained with the corresponding annotations, which means it still needs to be trained for each specific task

[Paper Study] LIO-SAM: Tightly-coupled Lidar Inertial Odometry via Smoothing and Mapping

Tue, 18 Mar 2025 14:50:56 GMT

BibTeX citation

@INPROCEEDINGS{9341176,
  author={Shan, Tixiao and Englot, Brendan and Meyers, Drew and Wang, Wei and Ratti, Carlo and Rus, Daniela},
  booktitle={2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
  title={LIO-SAM: Tightly-coupled Lidar Inertial Odometry via Smoothing and Mapping},
  year={2020},
  volume={},
  number={},
  pages={5135-5142},
  doi={10.1109/IROS45743.2020.9341176}}

Summary

Proposes a tightly coupled lidar inertial odometry framework based on smoothing and mapping
LIO built on a factor graph
Lidar point cloud deskewing and initial estimation based on IMU preintegration
IMU bias estimation based on odometry
Better real-time performance through marginalization-based pose optimization
Better performance through keyframe selection and sub-keyframes chosen with a sliding window

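Intro

Vision-based SLAM is good at place recognition but weak in initialization, range, and so on
Lidar-based methods stay invariant to illumination changes and capture fine details of the environment
LOAM is the representative method: it provides low drift, real-time pose estimation and mapping, but because it is based on a voxel map it is poorly suited to loop closing, GPS fusion, and so on
LOAM also has poor real-time performance and, being scan-matching based, performs poorly at large scale
Proposes a tightly coupled LIO based on smoothing and mapping
Deskews the point cloud with a nonlinear motion model
The IMU is used to estimate the sensor motion during a lidar scan and as the initial guess for optimization
Lidar odometry is used to estimate the IMU bias
Estimates the trajectory with a global factor graph
- lidar-IMU fusion
- incorporates place recognition between poses
- can use absolute measurements such as GPS and heading
- joint optimization of multiple factors
Pose optimization using prior sub-keyframes
Local scan matching → good real-time performance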

Contributions

Builds a tightly coupled LIO based on a factor graph
Achieves real-time performance through local sliding-window-based scan matching

III. LIDAR INERTIAL ODOMETRY VIA SMOOTHING AND MAPPING

A. System Overview

Robot state
- $x = \begin{bmatrix} R^T, p^T, v^T, b^T \end{bmatrix}^T$
- $R \in SO(3)$ rotation matrix
- $p \in \mathbb{R}^3$ position vector
- $v$ velocity
- $b$ IMU bias
Uses a 3D lidar, an IMU and GPS as inputs
Estimates the robot pose and trajectory from the sensor observations
Formulates state estimation as a MAP problem
- uses a factor graph
- under a Gaussian noise assumption, MAP inference is equivalent to a nonlinear least squares problem
factor graph
- state variable
- IMU preintegration
- Lidar odometry
- GPS
- loop closure
A new node is added based on the amount of pose change
The graph is optimized with incremental smoothing and mapping based on the Bayes tree (iSAM2)

B. IMU Preintegration Factor

IMU measurement
- $\hat{\omega}_t = \omega_t + b^\omega_t + n^\omega_t$
- $\hat{a}_t = R^B_W (a_t - g) + b^a_t + n^a_t$
- $\hat{\omega} _t , ~\hat{a} _t$ IMU raw data
- $b^\omega_t, ~ b^a_t$ IMU bias
- $n^\omega_t ,~ n^a_t$ white noise
- $R^B_W$ rotation matrix from the world frame to the body frame
motion update
- $v_{t+\Delta t} = v_t + g\Delta t + R_t(\hat{a}_t - b^a_t - n^a_t) \Delta t$
- $p_{t+\Delta t} = p_t + v_t \Delta t + \frac{1}{2} g \Delta t^2 + \frac{1}{2} R_t (\hat{a}_t - b^a_t - n^a_t) \Delta t^2$
- $R_{t+\Delta t} = R_t \exp((\hat{\omega}_t - b^\omega_t - n^\omega_t) \Delta t)$
IMU preintegration
- $\Delta v_{ij} = R^T_i (v_j - v_i - g\Delta t_{ij})$
- $\Delta p_{ij} = R^T_i (p_j - p_i - v_i \Delta t_{ij} - \frac{1}{2} g \Delta t^2_{ij})$
- $\Delta R_{ij} = R^T_i R_j$
- the IMU bias is optimized jointly with the lidar odometry factors in the factor graph

C. Lidar Odometry Factor

feature extraction
- extract edge and planar features
- $F_i = \{F^e_i, F^p_i\}$
keyframe selection
- a new keyframe is selected when the pose change exceeds 1 m or $10^\circ$ → saves memory and computation
sub-keyframe selection based on a sliding window
- build a voxel map from the sub-keyframes
  - edges at 0.2 m resolution, planes at 0.4 m resolution
scan matching
- initialize with the IMU-predicted motion
- match features to their correspondences in the voxel map
relative transformation
- based on the distances between edge and planar features and their correspondences
  - $d^e_k = \frac{\left| (p^e_{i+1,k} - p^e_{i,u}) \times (p^e_{i+1,k} - p^e_{i,v}) \right|}{\left| p^e_{i,u} - p^e_{i,v} \right|}$
    - simply the point-to-line distance for an edge feature
  - $d^p_k = \frac{\left| (p^p_{i+1,k} - p^p_{i,u}) \cdot \left( (p^p_{i,u} - p^p_{i,v}) \times (p^p_{i,u} - p^p_{i,w}) \right) \right|}{\left| (p^p_{i,u} - p^p_{i,v}) \times (p^p_{i,u} - p^p_{i,w}) \right|}$
    - simply the point-to-plane distance for a planar feature
- solve for the optimal transformation with Gauss-Newton
  - $\min_{T_{i+1}} \left\{ \sum_{p^e_{i+1,k} \in {}'F^e_{i+1}} d^e_k + \sum_{p^p_{i+1,k} \in {}'F^p_{i+1}} d^p_k \right\}$, where ${}'F$ denotes the transformed features
    - starting from the initial estimate, find the transformation that minimizes the error function (defined by the edge and plane distances above)
- finally compute the LO factor
  - $\Delta T_{i,i+1} = T_i^{-1} T_{i+1}$
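The two distances used in scan matching can be sketched directly from the formulas for $d^e_k$ and $d^p_k$. The helper names below are illustrative and the points are plain 3-element lists; this is not code from the LIO-SAM repository.

```python
import math

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def norm(a):
    return math.sqrt(dot(a, a))

def edge_distance(p, u, v):
    """Distance from feature point p to the line through edge points u and v."""
    return norm(cross(sub(p, u), sub(p, v))) / norm(sub(u, v))

def plane_distance(p, u, v, w):
    """Distance from feature point p to the plane through planar points u, v and w."""
    normal = cross(sub(u, v), sub(u, w))
    return abs(dot(sub(p, u), normal)) / norm(normal)

# Made-up points: p is 1 m from the x-axis, q is 2 m above the xy-plane.
d_edge = edge_distance([0.0, 1.0, 0.0], [0.0, 0.0, 0.0], [1.0, 0.0, 0.0])
d_plane = plane_distance([0.0, 0.0, 2.0], [0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
```

The Gauss-Newton step then sums these distances over all matched features and minimizes the total with respect to the transformation.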

D. GPS Factor

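GPS measurements are converted to a local Cartesian coordinate frame, and the factor is inserted when a new node is added
Correction conditions
- if the GPS is not synchronized with the lidar frame, linearly interpolate the GPS measurements to the lidar frame timestamp
- a GPS factor is added only when the LO covariance is larger than the GPS covariance → it is not always added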
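A small sketch of the two GPS rules above: interpolating a GPS position to the lidar timestamp, and deciding whether to add the factor. Function names are hypothetical, and comparing covariances as single scalars is a simplification made for this sketch.

```python
def interpolate_gps(t_query, t0, p0, t1, p1):
    """Linearly interpolate a GPS position at t_query between fixes (t0, p0) and (t1, p1)."""
    if t0 == t1 or not t0 <= t_query <= t1:
        raise ValueError("t_query must lie between two distinct GPS timestamps")
    alpha = (t_query - t0) / (t1 - t0)
    return [a + alpha * (b - a) for a, b in zip(p0, p1)]

def should_add_gps_factor(lo_position_cov, gps_position_cov):
    # Simplification: both covariances are treated as single scalar values.
    return lo_position_cov > gps_position_cov

# Lidar frame stamped halfway between two GPS fixes (made-up values).
p_lidar = interpolate_gps(0.5, 0.0, [0.0, 0.0, 0.0], 1.0, [2.0, 4.0, 6.0])
add_factor = should_add_gps_factor(lo_position_cov=2.0, gps_position_cov=1.0)
```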

E. Loop Closure Factor

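Using a factor graph makes it easy to integrate loop closing
Loop detection is based on Euclidean distance → they say other methods (e.g. descriptors) can be used too
Loop closing is performed when a prior state is closer than 15 m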

IV. EXPERIMENTS

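They tested on data they collected themselves and say it works well

They tried it on these too and, yes, it works

Yes, apparently it works

But for something that supposedly works so well, they don't report results with the same metric across all datasets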

[WIP] Let's run MatrixVT

Tue, 18 Mar 2025 14:44:46 GMT

MatrixVT install/run struggle log

Installation

CUDA

Required CUDA version: 11.1

Remove only CUDA

sudo apt-get --purge remove 'cuda*'
sudo apt-get autoremove --purge 'cuda*'
sudo rm -rf /usr/local/cuda*

CUDA 11.3 will be installed

Install from the CUDA 11.3 download site using the local runfile, leaving out the NVIDIA driver

If the graphics driver is missing too, see here

pytorch

Required versions: torch==1.9.0 torchvision==0.10.0

pip3 install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html

MMDetection3D

Installed following the official MMDetection3D repo and the official MMDet3D documentation

MMDet3D installation log

requirement

pip3 install -r requirements.txt

install

python3 setup.py develop --user

Problems

pytorch-lightning

The following error occurred

ERROR: No matching distribution found for pytorch-lightning==1.6.0

Solution

Install it together with PyTorch in one go, as below, to resolve compatibility with PyTorch

conda install pytorch==1.9.0 torchvision==0.10.0 torchaudio==0.9.0 cudatoolkit=11.3 torchlightning -c pytorch -c conda-forge

Remove the pytorch-lightning entry from requirements.txt

nvcc exit status 1

1 error detected in the compilation of "bevdepth/ops/voxel_pooling_train/src/voxel_pooling_train_forward_cuda.cu".
error: command '/usr/local/cuda-11.3/bin/nvcc' failed with exit status 1

Fixes found by searching

Change the gcc version
Change the pytorch version
Change the pytorch-lightning version

pip3 install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 torchlightning --extra-index-url https://download.pytorch.org/whl/cu113

[WIP] Let's run CIL++ locally without Docker

Tue, 18 Mar 2025 14:41:43 GMT

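A struggle to run the CIL++ Git repo locally without Docker

Git clone

git clone git@github.com:yixiao1/CILv2_multiview.git

Clone the git repo

Installing CARLA

The repo uses CARLA 0.9.13, so CARLA simulator 0.9.13 must be installed

How to install

Refer to the CARLA documentation
- tried installing that way and failed
Download directly from the CARLA 0.9.13 git and install CARLA simulator 0.9.13
- downloaded the tar file from the official GitHub above and extracted it under /opt, which completed the CARLA installation

Path setup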
경로 설정

export ROOTDIR=/home/amlab/save_ws
export CARLAPATH=$ROOTDIR/CARLA_0.9.13/PythonAPI/carla/:$ROOTDIR/CARLA_0.9.13/PythonAPI/carla/dist/carla-0.9.13-py3.7-linux-x86_64.egg

Got this far, then fixed something and it worked, but I can't remember what

[Paper Study] KPConv: Flexible and Deformable Convolution for Point Clouds

Tue, 18 Mar 2025 12:55:37 GMT

BibTeX citation

@InProceedings{Thomas_2019_ICCV,
author = {Thomas, Hugues and Qi, Charles R. and Deschaud, Jean-Emmanuel and Marcotegui, Beatriz and Goulette, Francois and Guibas, Leonidas J.},
title = {KPConv: Flexible and Deformable Convolution for Point Clouds},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
month = {October},
year = {2019}
}

code

Summary

Proposes kernel point convolution, which needs no intermediate representation
Convolution weights are located in Euclidean space by kernel points
- the kernel points can be changed → provides flexibility
Can be extended to a deformable convolution that adapts the kernel points to local geometry
Regular sub-sampling makes it robust to density and efficient
It's SOTA

Intro

Discrete convolution allows efficient computation, but not in continuous space
- applications using unstructured (non-grid) data such as 3D point clouds are increasing
- point clouds are unordered, differ from grids, and are spatially localized
Several methods have been proposed to process such data
- processing directly with MLPs
- convolution directly on points
KPConv
- consists of local 3D filters
- the weight regions are defined by points rather than kernel pixels
- no limit on the number of kernel points → flexible design
deformable
- generates different shifts for each convolution location
  - meaning the kernel adapts to the input point cloud
radius neighborhoods + regular sub-sampling → robust to density

Contributions

Presents a new kernel for 3D point clouds
Presents a deformable kernel
Presents new network architectures

Projection networks
graph convolution network
pointwise MLP network
point convolution network

Kernel Point Convolution

A Kernel Function Defined by Point

KPConv consists of local 3D filters
Kernel points define the weight regions of the kernel
No limit on the number of points, so the design is flexible
Robust when processing data of varying density
The general point convolution is defined as
- $(F * g)(x) = \sum_{x_i \in N_x} g(x_i - x) f_i$
- $x_i$ - neighbor point of $x$
- $N_x$ - points within radius $r$
- the authors consider a consistent spherical domain meaningful for network training
How to define areas in 3D space using points
- the most intuitive approach
- localized features
kernel function for any point ← proposed in this paper
- $g(y_i) = \sum_{k < K} h(y_i, \tilde{x}_k) W_k$
- $h(y_i, \tilde{x}_k) = \max \left( 0, 1 - \frac{\lVert y_i - \tilde{x}_k \rVert}{\sigma} \right)$
- $\sigma$ is the influence distance of the kernel points, chosen according to the input density
- uses a linear correlation rather than a Gaussian one, which is simpler and makes back-propagation easier
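A minimal sketch of the kernel function $g$ with the linear correlation $h$, in plain Python. The function names, the two kernel points and the 1x1 weight matrices are illustrative assumptions; a real layer uses $D_{in} \times D_{out}$ weight matrices and batched tensors.

```python
import math

def linear_correlation(y, x_k, sigma):
    """h(y, x_k) = max(0, 1 - ||y - x_k|| / sigma)."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(y, x_k)))
    return max(0.0, 1.0 - dist / sigma)

def kernel_weight(y, kernel_points, weights, sigma):
    """g(y) = sum_k h(y, x_k) W_k, where each W_k is a D_in x D_out matrix."""
    d_in, d_out = len(weights[0]), len(weights[0][0])
    g = [[0.0] * d_out for _ in range(d_in)]
    for x_k, W_k in zip(kernel_points, weights):
        h = linear_correlation(y, x_k, sigma)
        for r in range(d_in):
            for c in range(d_out):
                g[r][c] += h * W_k[r][c]
    return g

# Two kernel points with made-up 1x1 weight matrices.
kernel_points = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
weights = [[[1.0]], [[10.0]]]
# A neighbor halfway between the two kernel points gets half of each weight.
g = kernel_weight([0.5, 0.0, 0.0], kernel_points, weights, sigma=1.0)
```

A neighbor farther than $\sigma$ from every kernel point contributes nothing, which is what keeps each weight matrix spatially localized.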

Rigid or Deformable Kernel

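Kernel points are placed at optimal positions where each point applies a repulsive force on the others
The points are constrained to stay inside the sphere by an attractive force, and one point must be at the center
All points are rescaled so that the average radius is $1.5 \sigma$
- ensures a small overlap with the other kernel points
- guarantees space coverage
Works well when $K$ is large enough to cover the domain of $g$
Efficiency can be extended further by learning the kernel point positions
$g$ is differentiable with respect to $\tilde{x}_k$, so the positions are learnable parameters
Deformable KPConv uses the following $g$
- $(F * g)(x) = \sum_{x_i \in N_x} g_{\text{deform}}(x_i - x, \Delta(x)) f_i$
- $g_{\text{deform}}(y_i, \Delta(x)) = \sum_{k < K} h(y_i, \tilde{x}_k + \Delta_k(x)) W_k$
The local shifts $\Delta_k(x)$ are defined as the output of a rigid KPConv that maps the input features to $3K$ values

Trained with a learning rate 0.1 times that of the global network
- rigid kernel → generates the shifts
- deformable kernel → generates the output
With this approach derived from image convolution, kernel points can be learned in a direction that moves them away from the input points
- such points are then lost by the network, because the gradient of a shift is null when no neighbor is within its influence range
- a fitting regularization loss is proposed
regularization loss
- $L_{\text{reg}} = \sum_x \left( L_{\text{fit}}(x) + L_{\text{rep}}(x) \right)$
- $L_{\text{fit}}(x) = \sum_{k < K} \min_{y_i} \left( \frac{\lVert y_i - (\tilde{x}_k + \Delta_k(x)) \rVert}{\sigma} \right)^2$
- $L_{\text{rep}}(x) = \sum_{k < K} \sum_{l \neq k} h(\tilde{x}_k + \Delta_k(x), \tilde{x}_l + \Delta_l(x))^2$
The fitting loss penalizes the distance between each kernel point and its closest input point
The repulsive loss penalizes overlap between kernel points → so they don't collapse onto each other
See how well it works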

Kernel Point Network Layers

Subsampling to deal with varying densities
- grid subsampling → guarantees consistent point locations
- for each non-empty cell, the barycenter of its points is used as the location of the feature
Pooling Layer
- since subsampling is already grid-based, pooling layers are built by simply doubling the grid size each time
- features at the new locations are obtained with max pooling or a KPConv
- here a KPConv is used, and this is called strided KPConv
KPConv layer
- inputs of the convolution layer
  - points, features, matrix of neighborhood indices
  - the neighborhood matrix is sized by the largest neighborhood
    - it contains unused entries, which are ignored in the convolution computation

Kernel Point Network Architecture

Two architectures were built empirically → classification and segmentation
KP-CNN
- 5-layer classification convolution network
- 2 convolutional blocks in each layer
- designed like ResNet
  - with KPConv replacing the image convolution, plus batch norm and leaky ReLU

- in the last layer, features are aggregated by global average pooling and processed by fully connected and softmax layers
- with deformable KPConv, deformable kernels are used only in the last 5 KPConv blocks

KP-FCNN
- fully convolutional network for segmentation
- the encoder is the same as above
- the decoder uses nearest upsampling
- skip connections link the encoder and decoder
- the concatenated features are processed by a unary convolution
- nearest upsampling could be replaced with KPConv, but it makes little difference in performance

[Paper Study] iSAM2: Incremental Smoothing and Mapping Using the Bayes Tree

Tue, 18 Mar 2025 12:45:54 GMT

BibTeX citation

@INPROCEEDINGS{5979641,
  author={Kaess, Michael and Johannsson, Hordur and Roberts, Richard and Ila, Viorela and Leonard, John and Dellaert, Frank},
  booktitle={2011 IEEE International Conference on Robotics and Automation}, 
  title={iSAM2: Incremental smoothing and mapping with fluid relinearization and incremental variable reordering}, 
  year={2011},
  volume={},
  number={},
  pages={3281-3288},
  keywords={Simultaneous localization and mapping;Graphical models;Smoothing methods;Sparse matrices;Accuracy;Trajectory},
  doi={10.1109/ICRA.2011.5979641}}

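Summary

Provides a foundation for understanding the connection between graphical model inference algorithms and sparse matrix factorization methods
- provided through a new structure, the Bayes tree
Presents the Bayes tree, which is similar to a clique tree but directed
It maps more naturally to the square root information matrix of the SLAM problem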

Intro

Probabilistic inference algorithms
- used in many areas of robotics, e.g. SLAM, tracking, etc.
This work focuses on large-scale SLAM
Probabilistic inference algorithms are preferred because of sensor uncertainty

Contributions

Proposes a new data structure, the Bayes tree
- matrix factorization can be translated into a Bayes net
- the result of QR factorization maps to it more naturally
- the structure can be analyzed in terms of conditional probability densities
Develops a new algorithm, iSAM2
- improves efficiency through incremental variable re-ordering, fluid re-linearization, and removal of the periodic batch steps
- an efficient solution for sparse non-linear problems
- recalculates only the affected parts of the Bayes tree → more efficient
- achieves real-time performance

Problem

Target

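An incremental, real-time solution for non-linear estimation problems
- incremental: the estimate is updated whenever a new measurement is added, reflecting the most accurate environment model obtainable from all measurements so far
- real-time: estimates are provided while the task is running, since they are needed for navigation and planning
A factor graph is used to represent the estimation problem as a graphical model
- can contain a variety of probability distributions or cost functions
- factor node - landmark measurements, odometry (information about motion), loop closing constraints (constraints arising on revisits), etc.
- variable node - the variables to estimate: the pose at each time step, landmark positions, etc.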

Gaussian Case

non-linear least squares problem
- $\arg\min_{\Theta} \frac{1}{2} \sum_{i} \lVert h_i(\Theta_i) - z_i \rVert^2_{\Sigma_i}$
  - $h_i$ - measurement function
  - $z_i$ - measurement
  - $\lVert e \rVert^2_{\Sigma} = e^T \Sigma^{-1} e$ - squared Mahalanobis distance
linearization
- Gauss-Newton, Levenberg-Marquardt
- at each iteration, a Taylor expansion around the linearization point $\Theta$ yields a new linear least squares problem
- $\arg\min_{\Delta} \lVert A\Delta - b \rVert^2$
- $A$ - measurement Jacobian
- since the problem is linearized, the new estimate is obtained by simple addition, $\Theta + \Delta$
The minimizer of $\lVert A \Delta - b \rVert^2$ is computed with Cholesky or QR factorization
iSAM (the predecessor) uses QR factorization
- incrementally updates the square root information matrix
- when measurements are added, the variable ordering of the matrix can become suboptimal and fill-in can occur
  - so periodic batch re-ordering followed by batch factorization is performed
  - in iSAM, re-linearization is performed only in these batch steps
  - the period of the batch step is chosen heuristically (empirically, it seems?)
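A toy Gauss-Newton loop for a one-parameter problem, to show the linearize, solve and add cycle described above. The measurement model $h_i(\theta) = e^{\theta t_i}$, the unit covariance and all names are assumptions made for this sketch; with a single unknown, the normal equations reduce to a scalar division instead of a Cholesky or QR factorization.

```python
import math

def gauss_newton(ts, zs, theta0, iterations=20):
    """Minimize sum_i (exp(theta * t_i) - z_i)^2 over the scalar theta."""
    theta = theta0
    for _ in range(iterations):
        # Jacobian column A and right-hand side b = z - h(theta) at the linearization point.
        A = [t * math.exp(theta * t) for t in ts]
        b = [z - math.exp(theta * t) for t, z in zip(ts, zs)]
        # Normal equations for one unknown: delta = (A^T b) / (A^T A).
        delta = sum(a * bi for a, bi in zip(A, b)) / sum(a * a for a in A)
        # The problem is linearized, so the update is a simple addition.
        theta += delta
        if abs(delta) < 1e-12:
            break
    return theta

# Noise-free synthetic measurements generated with theta = 0.7 (made-up data).
ts = [0.0, 0.5, 1.0, 1.5]
zs = [math.exp(0.7 * t) for t in ts]
theta = gauss_newton(ts, zs, theta0=0.0)
```

In SLAM the unknown is a large sparse vector, which is why the factorization of $A$ and the variable ordering matter so much.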

The Bayes Tree

Instead of converting the factor graph to a sparse matrix and applying sparse linear algebra, computation is performed directly on the graphical model

Inference and Elimination

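Inference can be understood as converting the factor graph into a Bayes net
Variable elimination: $P(\Theta) = \prod_{j} P(\theta_j | S_j)$
- $S_j$ is the set of variables directly connected to $\theta_j$
Factor graph conversion process

- repeating the above process until all variables are eliminated yields the Bayes net
- probabilistically, this amounts to converting the product of factors into a product of conditional probabilities

I don't quite understand the figure above

Supplementary lecture

Once everything is eliminated, all factors can be expressed as conditional probabilities, and the result has a tree structure
- this is the key element that makes it possible to do inference for new measurements only partially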

Gaussian Case

The elimination process is equivalent to sparse QR factorization of the measurement Jacobian
The Gaussian density of the joint factor is defined as $f_{\text{joint}}(\Delta_j, s_j) \propto \exp \left( -\frac{1}{2} \lVert a \Delta_j + A_S s_j - b \rVert^2 \right)$
- $A_j = [a | A_S]$ - matrix concatenating the partial derivatives of all factors connected to $\Delta_j$
The conditional probability used when converting to the Bayes tree is $P(\Delta_j | s_j) \propto \exp \left( -\frac{1}{2} (\Delta_j + r s_j - d)^2 \right)$
- $r = a^\dagger A_S , ~d =a^\dagger b$
- $a^\dagger = (a^T a)^{-1} a^T$ is the pseudo-inverse of $a$
- $a$ - the part of the factors related to $\Delta_j$, partial derivatives
- $b$ - measurements related to $\Delta_j$
- $d$ - $a^\dagger b$
- $A_S$ - the set of partial derivatives with respect to $S_j$
- $S_j$ - separator; the variables directly connected to $\theta_j$
The new factor is $f_{\text{new}}(s_j) = \exp \left( -\frac{1}{2} \lVert A'_0 s_j - b'_0 \rVert^2 \right)$
- $A_0 ' = A_S - ar ,~ b'_0 = b-ad$
This process is one step of Gram-Schmidt, interpreted in terms of densities
The sparse vector $r$ and scalar $d$ specify a single joint conditional density in the Bayes net, or equivalently a single row of the sparse square root information matrix
The least squares problem is solved with one pass from the leaves to the root, then one pass from the root down to the leaves to obtain the optimal assignment $\Delta^*$ for each variable → back-substitution

Creating Bayes Tree

Bayes tree
- expresses the equivalence with linear algebra more clearly
- enables new recursive algorithms
- chordal structure - every cycle of length 4 or more must have a chord
- convenient for optimization and marginalization
- directed, and encodes a factored probability density
- a conditional density is defined per node: $P(\Theta) = \prod_k P(F_k | S_k)$
  - $S_k$ - the intersection of clique $C_k$ and its parent clique $\Pi_k$
  - $F_k$ - the remaining variables

Gaussian Case

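A single Bayes tree can correspond to several different square root information factors
- because the ordering is assigned arbitrarily
- the overall variable ordering affects neither fill-in nor the numerical values, only the positions within the matrix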

Incremental Inference

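inference video

Incremental inference is possible with simple tree edits
Only the path between the affected clique and the root is affected (clique to root, root to clique)
Since new factors are added, the elimination process is run again

I don't really understand the picture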

Incremental Variable Ordering

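Variable ordering is essential for sparse matrix solutions
An optimal ordering is sought to minimize fill-in, i.e. the additional entries in the square root information matrix
Fill-in is unavoidable except in the chordal case
- finding the optimal ordering is NP-hard; heuristics such as COLAMD find good orderings
During incremental inference, the variable ordering can be updated at every update
- the periodic batch reordering used in iSAM is no longer needed
- partial variable reordering is performed in the Bayes tree
  - not globally optimal, but provides a locally optimal ordering
Example of the advantage of the tree structure

The cost of incorporating a measurement gets smaller the closer it is to the root
Applying a heuristic such as COLAMD locally has the limitation that it only considers fill-in at the current step
- proposes an incremental ordering that places the most recently accessed variables at the end of the ordering
incremental ordering
- uses constrained COLAMD, forcing the most recent variables to the end while still keeping a good overall ordering
- keeps the part affected by subsequent updates small
- however, the cost is exceptionally large when a big loop closure occurs

- better than batch - isn't that obvious?
- sharp increase in fill-in in certain sections - probably due to loop closing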

The iSAM2 Algorithm

Handling non-linear factors → the Bayes tree so far only dealt with the linear case
Fluid re-linearization → linearize only the parts that need it, reducing cost and improving efficiency
Partial state update → update only the factors that actually changed

Fluid Relinearization

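Deciding whether linearization is needed
- when the current estimate deviates from the linearization point
- when the change exceeds a threshold
Partial modification of the Bayes tree
- only the information related to the variables being linearized is removed, giving a partial re-linearization
Marginal factor computation
- information from the sub-trees eliminated during re-linearization is passed up the tree
- with caching, recomputation can restart from the middle of the tree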

Partial State Update

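Update partially, computing only the changed variables → lower computational cost
Since only the top of the tree changes, the update propagates to sub-trees only to a limited extent
If the change in $\Delta$ of a clique's variables is below a threshold, the update stops there
- this guarantees that the variables in the sub-tree of that clique do not change
A nearly exact solution can be maintained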

Algorithm and Complexity

algorithm
- estimates a set of variables
- considers incremental non-linear factors $F$
- new factors and variables keep being added
- optimization is performed using the Bayes tree
- iteratively solves linearized systems
complexity
- general case
  - uses Gauss-Newton
  - quadratic convergence near the minimum
- exploration task
  - a constraint exists for each pose
  - a constant number of factors is affected → $O(1)$
- loop closure
  - general case → full factorization needed → $O(n^3)$
  - under certain assumptions → back-substitution → $O(n^{1.5})$
- empirical complexity
  - far below the theoretical upper bound
  - most steps only require partial computation/refactorization, so computation is efficient in most cases

[WIP] ORB SLAM 2

Fri, 14 Mar 2025 04:41:23 GMT

1. Installing Pangolin

git clone https://github.com/stevenlovegrove/Pangolin.git
cd Pangolin
git checkout v0.6
cmake -B build
cmake --build build

Done this way, feeding the package to ORB SLAM is a real hassle; it keeps failing to find the path

cd build && rm -rf *
cmake .. -DCMAKE_INSTALL_PREFIX=/usr/local
make -j
sudo make install

This fixed it

2. opencv

sudo apt update
sudo apt upgrade
sudo apt install build-essential cmake git pkg-config
sudo apt install libjpeg-dev libtiff-dev libpng-dev
sudo apt install libavcodec-dev libavformat-dev libswscale-dev libv4l-dev
sudo apt install libxvidcore-dev libx264-dev
sudo apt install libgtk-3-dev
sudo apt install libatlas-base-dev gfortran
sudo apt install python3-dev
mkdir opencv && cd opencv
git clone https://github.com/opencv/opencv.git
git clone https://github.com/opencv/opencv_contrib.git
cd opencv && git checkout 3.2.0
cd ..
cd opencv_contrib && git checkout 3.2.0
cd ..
cd opencv && mkdir build && cd build
cmake -D CMAKE_BUILD_TYPE=RELEASE -D CMAKE_INSTALL_PREFIX=/usr/local -D WITH_TBB=OFF -D WITH_IPP=OFF -D WITH_1394=OFF -D BUILD_WITH_DEBUG_INFO=OFF -D BUILD_DOCS=OFF -D INSTALL_C_EXAMPLES=ON -D INSTALL_PYTHON_EXAMPLES=ON -D BUILD_EXAMPLES=OFF -D BUILD_TESTS=OFF -D BUILD_PERF_TESTS=OFF -D WITH_QT=OFF -D WITH_GTK=ON -D WITH_OPENGL=ON -D OPENCV_EXTRA_MODULES_PATH=../../opencv_contrib/modules -D WITH_V4L=ON  -D WITH_FFMPEG=ON -D WITH_XINE=ON -D BUILD_NEW_PYTHON_SUPPORT=ON -D OPENCV_GENERATE_PKGCONFIG=ON -D WITH_CUDA=OFF  -DLAPACKE_INCLUDE_DIR=/usr/include/lapacke ..
make -j
sudo make install
sudo ldconfig

reference

Installed opencv 3.2.0 but it won't run, so sad

Turned out it failed because a file was corrupted..

3. ROS Noetic

(omitted)

Gave up and moved on to ORB-SLAM3

[Install] Installing OpenCV

Tue, 11 Mar 2025 10:00:22 GMT

Proceed in the following order

sudo apt update
sudo apt upgrade
sudo apt install build-essential cmake git pkg-config
sudo apt install libjpeg-dev libtiff-dev libpng-dev
sudo apt install libavcodec-dev libavformat-dev libswscale-dev libv4l-dev
sudo apt install libxvidcore-dev libx264-dev
sudo apt install libgtk-3-dev
sudo apt install libatlas-base-dev gfortran
sudo apt install python3-dev
sudo apt install libblas-dev libopenblas-dev

mkdir opencv && cd opencv
git clone https://github.com/opencv/opencv.git
git clone https://github.com/opencv/opencv_contrib.git
cd opencv && git checkout 
cd ..
cd opencv_contrib && git checkout 
cd ..

cd opencv && mkdir build && cd build

cmake -D CMAKE_BUILD_TYPE=RELEASE -D CMAKE_INSTALL_PREFIX=/usr/local -D WITH_TBB=OFF -D WITH_IPP=OFF -D WITH_1394=OFF -D BUILD_WITH_DEBUG_INFO=OFF -D BUILD_DOCS=OFF -D INSTALL_C_EXAMPLES=ON -D INSTALL_PYTHON_EXAMPLES=ON -D BUILD_EXAMPLES=OFF -D BUILD_TESTS=OFF -D BUILD_PERF_TESTS=OFF -D WITH_QT=OFF -D WITH_GTK=ON -D WITH_OPENGL=ON -D OPENCV_EXTRA_MODULES_PATH=../OpenCV_contrib/modules -D WITH_V4L=ON  -D WITH_FFMPEG=ON -D WITH_XINE=ON -D BUILD_NEW_PYTHON_SUPPORT=ON -D OPENCV_GENERATE_PKGCONFIG=ON -D WITH_CUDA=OFF  -DLAPACKE_INCLUDE_DIR=/usr/include/lapacke ..
make -j
sudo make install
sudo ldconfig
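
After the install step, it can help to confirm that the library is actually visible. The following is a minimal sketch, not part of the original steps. It assumes the build was configured with OPENCV_GENERATE_PKGCONFIG=ON (as in the cmake command above) and that the pkg-config module is named opencv4 (4.x) or opencv (3.x). The function name opencv_pc_version is made up for illustration.

```shell
# Sketch: report the OpenCV version that pkg-config can see.
# Assumption: the .pc module is called "opencv4" (4.x) or "opencv" (3.x).
opencv_pc_version() {
    if ! command -v pkg-config >/dev/null 2>&1; then
        echo "pkg-config not installed"
        return 1
    fi
    for name in opencv4 opencv; do
        if pkg-config --exists "$name"; then
            # Print the module name followed by its version string.
            echo "$name $(pkg-config --modversion "$name")"
            return 0
        fi
    done
    echo "no OpenCV .pc file found"
    return 1
}
```

Calling opencv_pc_version right after sudo ldconfig prints the detected module and version, or a short message if nothing was found.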

Errors encountered

1

Add set(CMAKE_CXX_STANDARD 11) to CMakeLists.txt
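
The note does not say where in CMakeLists.txt the line goes. One possible way to add it, as a rough sketch: insert it right after the first project(...) line so that it takes effect before any targets are created. The placement and the helper name add_cxx_standard are assumptions, not something the original steps specify.

```shell
# Sketch: insert set(CMAKE_CXX_STANDARD 11) after the first project(...) line.
# Assumption: project(...) starts at the beginning of a line in the given file.
add_cxx_standard() {
    file="$1"
    # Do nothing if the file already mentions CMAKE_CXX_STANDARD.
    grep -q 'CMAKE_CXX_STANDARD' "$file" && return 0
    awk '
        { print }
        !inserted && /^[ \t]*project[ \t]*\(/ {
            print "set(CMAKE_CXX_STANDARD 11)"
            inserted = 1
        }
    ' "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}
```

Usage would be something like add_cxx_standard CMakeLists.txt from the source directory. Running it twice leaves only one copy of the line.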

2

In opencv_lapack.h, set the include path explicitly: #include "/usr/include/eigen3/Eigen/src/misc/lapacke.h"

In descriptor.cpp, change

CV_Assert(image.size > 0);
CV_Assert(cost.size > 0);

to

CV_Assert(image.cols > 0 && image.rows > 0);
CV_Assert(cost.cols > 0 && cost.rows > 0);
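
The two CV_Assert replacements above could also be applied with sed instead of editing by hand. This is only a sketch: the path to descriptor.cpp is passed in as an argument because the note does not give it, and the helper name patch_descriptor_asserts is hypothetical. A .bak backup is kept next to the file.

```shell
# Sketch: rewrite the two size checks in descriptor.cpp with sed.
# Assumption: the lines appear exactly as quoted in the note.
patch_descriptor_asserts() {
    file="$1"
    # \& is a literal ampersand in the sed replacement text.
    sed -i.bak \
        -e 's/CV_Assert(image\.size > 0);/CV_Assert(image.cols > 0 \&\& image.rows > 0);/' \
        -e 's/CV_Assert(cost\.size > 0);/CV_Assert(cost.cols > 0 \&\& cost.rows > 0);/' \
        "$file"
}
```

If the build still fails afterwards, comparing the file with its .bak copy shows whether both lines were actually matched.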

Install LAPACK:

sudo apt-get install liblapack-dev liblapacke-dev

3

/usr/bin/ld: ../../lib/libopencv_core.so.3.2.0: undefined reference to `cblas_zgemm(CBLAS_ORDER, CBLAS_TRANSPOSE, CBLAS_TRANSPOSE, int, int, int, void const*, void const*, int, void const*, int, void const*, void*, int)'
/usr/bin/ld: ../../lib/libopencv_core.so.3.2.0: undefined reference to `cblas_dgemm(CBLAS_ORDER, CBLAS_TRANSPOSE, CBLAS_TRANSPOSE, int, int, int, double, double const*, int, double const*, int, double, double*, int)'
/usr/bin/ld: ../../lib/libopencv_core.so.3.2.0: undefined reference to `cblas_sgemm(CBLAS_ORDER, CBLAS_TRANSPOSE, CBLAS_TRANSPOSE, int, int, int, float, float const*, int, float const*, int, float, float*, int)'
/usr/bin/ld: ../../lib/libopencv_core.so.3.2.0: undefined reference to `cblas_cgemm(CBLAS_ORDER, CBLAS_TRANSPOSE, CBLAS_TRANSPOSE, int, int, int, void const*, void const*, int, void const*, int, void const*, void*, int)'
collect2: error: ld returned 1 exit status

sudo apt update
sudo apt install libblas-dev
sudo apt install libopenblas-dev
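
Before rebuilding, one way to check whether any BLAS shared libraries are registered with the dynamic linker is sketched below. It is a diagnostic aid only and does not prove that the cblas_* symbols from the error will resolve. The helper name list_blas_libs is made up for illustration.

```shell
# Sketch: list BLAS-related shared libraries in the dynamic linker cache.
list_blas_libs() {
    if command -v ldconfig >/dev/null 2>&1; then
        # Matches libblas, libopenblas and libcblas entries.
        ldconfig -p 2>/dev/null | grep -E 'lib(open|c)?blas' \
            || echo "no BLAS libraries found"
    else
        echo "ldconfig not available"
    fi
}
```

Running list_blas_libs before and after the apt install commands shows whether the new libraries were picked up.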

4 Missing required dependency packages

The error occurred in a Docker environment. The missing packages were:

python3-dev python3-numpy python3-pip libjasper-dev liblapacke-dev libeigen3-dev libgstreamer1.0-dev libgstreamer-plugins-base1.0-dev
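
To find out which of these packages are still missing inside a container, something like the following could be used. This is a sketch assuming a Debian/Ubuntu image where dpkg is available. The helper name missing_packages is hypothetical.

```shell
# Sketch: print each package name that dpkg does not report as installed.
# Assumption: Debian/Ubuntu base image. Without dpkg, every name is printed.
missing_packages() {
    for pkg in "$@"; do
        if ! dpkg -s "$pkg" >/dev/null 2>&1; then
            echo "$pkg"
        fi
    done
}
```

For example, missing_packages python3-dev python3-numpy liblapacke-dev libeigen3-dev prints only the names that still need to be installed. Some packages may not exist on every Ubuntu release, in which case apt reports that it cannot locate them.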

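5 Wrong path in the cmake command

Fix the cmake command. OPENCV_EXTRA_MODULES_PATH changes from ../OpenCV_contrib/modules to ../../opencv_contrib/modules: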

cmake -D CMAKE_BUILD_TYPE=RELEASE -D CMAKE_INSTALL_PREFIX=/usr/local -D WITH_TBB=OFF -D WITH_IPP=OFF -D WITH_1394=OFF -D BUILD_WITH_DEBUG_INFO=OFF -D BUILD_DOCS=OFF -D INSTALL_C_EXAMPLES=ON -D INSTALL_PYTHON_EXAMPLES=ON -D BUILD_EXAMPLES=OFF -D BUILD_TESTS=OFF -D BUILD_PERF_TESTS=OFF -D WITH_QT=OFF -D WITH_GTK=ON -D WITH_OPENGL=ON -D OPENCV_EXTRA_MODULES_PATH=../../opencv_contrib/modules -D WITH_V4L=ON  -D WITH_FFMPEG=ON -D WITH_XINE=ON -D BUILD_NEW_PYTHON_SUPPORT=ON -D OPENCV_GENERATE_PKGCONFIG=ON -D WITH_CUDA=OFF  -DLAPACKE_INCLUDE_DIR=/usr/include/lapacke ..
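
Since the path is relative to the build directory, a quick check before running cmake can catch this kind of mistake. A minimal sketch, assuming the directory layout from the clone steps above (opencv/build next to opencv_contrib). The helper name check_extra_modules is made up for illustration.

```shell
# Sketch: confirm the directory intended for OPENCV_EXTRA_MODULES_PATH exists,
# resolved relative to the current (build) directory.
check_extra_modules() {
    dir="$1"
    if [ -d "$dir" ]; then
        # Show the resolved absolute path for confirmation.
        echo "ok: $(cd "$dir" && pwd)"
    else
        echo "missing: $dir"
        return 1
    fi
}
```

From inside opencv/build, check_extra_modules ../../opencv_contrib/modules should report ok, while the earlier ../OpenCV_contrib/modules path reports missing.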

