yooni1231__.log

Java Coding Test - Input/Output

Tue, 25 Nov 2025 05:14:44 GMT

1. Why use BufferedReader instead of Scanner

Performance difference: BufferedReader reads input in large chunks into a buffer (8 KB by default), whereas Scanner reads in small units and does a lot of parsing on every call.

```
BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
int n = Integer.parseInt(br.readLine());
```

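2. Basic structure of an algorithm problem

1) Read n from the first line 2) Read n numbers from the second line 3) Process them in a loop 4) Print the result

int n = Integer.parseInt(br.readLine());
StringTokenizer st = new StringTokenizer(br.readLine());
for (int i = 0; i < n; i++) {
    int x = Integer.parseInt(st.nextToken()); // process each number here
}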

2.1 Input patterns

2.1-1 A single number
int n = Integer.parseInt(br.readLine());
2.1-2 Several numbers on one line
Input: 3 10 7
StringTokenizer st = new StringTokenizer(br.readLine());
int a = Integer.parseInt(st.nextToken());
int b = Integer.parseInt(st.nextToken());
int c = Integer.parseInt(st.nextToken());
2.1-3 Array input
First line: 5
Second line: 1 2 3 4 5
int n = Integer.parseInt(br.readLine());
int[] arr = new int[n];
StringTokenizer st = new StringTokenizer(br.readLine());
for (int i = 0; i < n; i++) arr[i] = Integer.parseInt(st.nextToken());
2.1-4 String input

String s = br.readLine();
2.2 Output patterns
2.2-1 Printing a single result
System.out.println(answer);
2.2-2 Printing multiple lines
StringBuilder sb = new StringBuilder();
for (int i = 0; i < n; i++) sb.append(arr[i]).append('\n');
System.out.print(sb);
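2.3 Basic I/O template

import java.io.*;
import java.util.*;

public class Main {
    public static void main(String[] args) throws IOException {
        BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
        StringBuilder sb = new StringBuilder(); // builds multi-line output efficiently

        // input
        int n = Integer.parseInt(br.readLine()); // read one line and convert the string to an int
        StringTokenizer st = new StringTokenizer(br.readLine()); // split into tokens on whitespace
        int[] arr = new int[n];
        for (int i = 0; i < n; i++) arr[i] = Integer.parseInt(st.nextToken());

        // output
        for (int x : arr) sb.append(x).append('\n');
        System.out.print(sb);
    }
}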
2.4 Full example

Baekjoon A+B problem
import java.io.*;
import java.util.*;

public class Main {
    public static void main(String[] args) throws IOException {

        BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
        StringTokenizer st = new StringTokenizer(br.readLine()); // two integers on one line
        int a = Integer.parseInt(st.nextToken());
        int b = Integer.parseInt(st.nextToken());

        System.out.println(a + b);
    }
}



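Models Used in an Automatic Short-Form Video Generation Program
Thu, 20 Nov 2025 00:55:48 GMT
** This post describes the computer vision pipeline and the DeepOCSort tracking algorithm used in a project carried out in the first half of 2024. The system tracks a specific person in a user-uploaded source video and automatically generates a 9:16 short-form video centered on that person. **
Core Problem and Computer Vision Pipeline Automation
1. The problem we set out to solve
With manual editing, an editor has to watch the entire video, keep a specific person centered, and hand-adjust the crop region to fit the target aspect ratio, which is very time-consuming. Our project automates this with a three-stage computer vision pipeline: person detection -> person tracking -> smart crop.
2. Pipeline structure

Person detection: extracts the bounding box coordinates of every person present in each video frame (using YOLOv5).
Person tracking: assigns a unique ID to the specific person selected by the user and keeps tracking that ID as frames change.
Smart crop: based on the tracked person's position, dynamically computes the ROI (Region of Interest) so that the person sits naturally at the center of the final short-form video.
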
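3. DeepOCSort: implementing a tracking algorithm with person re-identification
Plain object detection or a basic tracking algorithm tends to switch IDs when a person briefly leaves the frame and comes back, or overlaps with another person. To address this, we adopted the DeepOCSort algorithm in the final implementation.

Components
DeepOCSort (Deep Observation-Centric SORT)


Kalman filter: predicts the object's next position, providing the basis for motion-based matching.
Re-ID network: a deep learning model extracts an appearance feature vector for each person. This vector acts like the person's unique fingerprint and plays a decisive role in re-identifying the same person even after they are occluded or briefly disappear.


OSNet (Omni-Scale Network)


A strong Re-ID model used as the appearance feature extractor in DeepSORT-style trackers. It captures multi-scale (omni-scale) features effectively, so it produces usable feature vectors both when the person is far away (appears small) and when the person is close (appears large).
In our DeepOCSort setup, OSNet plays the feature extractor role inside the tracking framework, which maximizes tracking accuracy in terms of identity preservation.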

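4. Smart crop algorithm design and implementation
Based on the bounding box corner coordinates that DeepOCSort returns for the target person, we dynamically compute the best crop region for the final short-form video.

Dynamic ROI calculation logic
The goal is to place the tracked person at the center of the 9:16 frame while keeping natural margins on screen.
Center coordinates: $$C_x = \frac{x_{min} + x_{max}}{2}, \quad C_y = \frac{y_{min} + y_{max}}{2}$$
Crop size: the source video has width $W_o$ and height $H_o$, and the final short-form video has a $9:16$ aspect ratio.

Final RoI coordinates:
We compute the four edge coordinates $(X_{start}, Y_{start}, X_{end}, Y_{end})$ of the final RoI so that the crop region is centered on $(C_x, C_y)$. Margin logic is then applied so the crop does not leave the frame (for example, if $X_{start}$ drops below 0 we set $X_{start}=0$, and $X_{end}$ is limited so it does not exceed $W_o$).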
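
As a rough illustration of the RoI calculation described above, here is a minimal Python sketch. The function name `compute_crop` and the sizing rule (use the full source height and derive the width from 9:16, shrinking only if the source is too narrow) are assumptions made for this example; the post does not state how the crop size was actually chosen.

```python
def compute_crop(bbox, frame_w, frame_h, aspect=(9, 16)):
    """Return (x_start, y_start, x_end, y_end) of a crop centered on bbox.

    bbox is (x_min, y_min, x_max, y_max) in pixels.
    Assumption: the crop is as tall as the source frame allows.
    """
    x_min, y_min, x_max, y_max = bbox
    cx = (x_min + x_max) / 2  # C_x
    cy = (y_min + y_max) / 2  # C_y

    # Crop size: start from the full height, derive the width from the ratio.
    crop_h = frame_h
    crop_w = int(crop_h * aspect[0] / aspect[1])
    if crop_w > frame_w:  # source is narrower than 9:16, so shrink the height
        crop_w = frame_w
        crop_h = int(crop_w * aspect[1] / aspect[0])

    # Center on (C_x, C_y), then clamp so the crop stays inside the frame.
    x_start = int(min(max(cx - crop_w / 2, 0), frame_w - crop_w))
    y_start = int(min(max(cy - crop_h / 2, 0), frame_h - crop_h))
    return (x_start, y_start, x_start + crop_w, y_start + crop_h)
```

For a 1920x1080 source this yields a 607x1080 window that follows the person horizontally and stops at the left and right frame edges.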

Applying a smoothing filter (jitter prevention):
Tracked coordinates can shake slightly from frame to frame, which makes the final cropped video jitter.
To prevent this, we applied an Exponential Moving Average (EMA) smoothing filter to the RoI center coordinates $(C_x, C_y)$.


$$\text{Smoothed } C_x^t = \alpha \times C_x^t + (1 - \alpha) \times \text{Smoothed } C_x^{t-1}$$
Here $\alpha$ is the smoothing coefficient: the smaller $\alpha$ is, the smoother the motion becomes, but the slower it reacts. We tuned $\alpha$ to balance smooth camera movement against responsive tracking of the person.
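
The EMA formula above fits in a few lines. This is a sketch with a hypothetical helper name (`ema_smooth`) and an arbitrary default of `alpha=0.2`; the value actually used in the project is not given in this post.

```python
def ema_smooth(values, alpha=0.2):
    """Exponentially smooth a sequence of per-frame center coordinates.

    smoothed[t] = alpha * values[t] + (1 - alpha) * smoothed[t - 1],
    with the first frame used as the initial state.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    prev = None
    for value in values:
        prev = value if prev is None else alpha * value + (1 - alpha) * prev
        smoothed.append(prev)
    return smoothed
```

The same function would be applied separately to the $C_x$ and $C_y$ sequences before the RoI edges are computed.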
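5. Backend pipeline implementation (Flask & FFmpeg)
Running the full AI model pipeline and rendering the final video are automated through a Flask-based backend.

Flask API endpoint development
We built an API endpoint that receives the source video file uploaded by the user.

The uploaded video goes through person detection → DeepOCSort tracking → smart crop calculation on the backend server.
The final output of the AI model pipeline is the best crop region (RoI) coordinates for each frame.

Automatic encoding and rendering with FFmpeg
We used FFmpeg to produce the actual short-form video file from the coordinates computed by the AI model.

Data handoff: the Flask API converts the per-frame RoI coordinate list returned by the AI model into a form that can be passed to an FFmpeg command.
FFmpeg filter: we use FFmpeg's crop filter, implemented so that the filter arguments can change dynamically per frame. The key is composing the command so that the crop region transitions smoothly over time according to the timestamps in the coordinate list.
Automatic encoding: given this command, FFmpeg reads the source video, applies the dynamic crop that keeps the tracked person centered along with the 9:16 ratio adjustment, and then encodes and renders the final short-form video.
With this pipeline in place, a single video upload request from the frontend triggers the entire process, from AI analysis to producing the final output.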



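Real-Time Conversation on a Raspberry Pi 5 with Two Cameras and Dual Microphones (1)
Fri, 27 Jun 2025 06:27:35 GMT
Now it is time to set up the hardware that will run the software we developed earlier.
Using a Raspberry Pi 5 as the base, we will build a real-time AI conversation system with the following structure.
The parts you need are listed below.
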
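🧩 Hardware used

| Component | Model | Function |
| --- | --- | --- |
| Raspberry Pi 5 | 8GB | Main control board |
| Camera Module 3 | RGB / NoIR | Recognizes the picture the user is looking at |
| ReSpeaker 2-Mics Pi HAT | Seeed | Two built-in microphones, audio input/output |
| IR LED illuminator | 850nm | Detects the user's pupil movement |
| Speaker | Wired or USB | Plays the GPT response audio |

Caution: always power off the Raspberry Pi before connecting any component.
After disassembling parts, store them safely in their bags.
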
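📌 Raspberry Pi 5

📷 Cameras

CAMERA 0 port → standard RGB camera
CAMERA 1 port → NoIR infrared camera
The Pi 5 uses a dedicated 22-pin CSI cable (15-pin cables are not compatible)

Caution:

The Pi 5 uses high-density 22-pin CSI ports, so you must use the dedicated cable that comes with the camera. Older 15-pin cables are not compatible.

Gently lift the port latch at both ends, then insert the cable.

Watch the cable orientation. The black side must face the Ethernet port.

👁️ The role of IR illumination and the NoIR camera

What is a NoIR camera?

A Raspberry Pi camera module with the infrared-cut filter removed, so it can pick up IR illumination in the 850nm to 940nm band, which is invisible to the human eye.

It can capture clear images of the area around the eyes even in dark environments.
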
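🎤 Audio input/output
The two microphones built into the HAT handle voice input, and a speaker can be connected through the 3.5mm port.
A USB speaker is optional, not required.
Connecting and running the audio board
The ReSpeaker 2-Mics Pi HAT mounts directly onto the GPIO header.
Because it has two built-in microphones, it can pick up the user's voice without a separate input device.

Seat the board so that the dual-microphone board sits over the inside of the Pi.

🧩 Done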




Implementing Book Info Crawling from Yes24 (Spring Boot + Jsoup)
Tue, 24 Jun 2025 04:44:46 GMT
I developed a book management system using React and Spring Boot, and I wanted to allow users to simply enter a book title and have the rest of the information (author, publisher, price, genre) filled in automatically. To achieve this, I implemented a web scraping feature that pulls book data from Yes24, a major Korean online bookstore.
🤢 Problem
Manually entering all book details during registration is inconvenient for users, so I decided to build a feature that:

- Takes only the book title as input
- Scrapes the top result from Yes24's search results
- Populates the form with book metadata automatically

💻 Tech Stack
Jsoup - HTML parsing and web scraping library for Java
Spring Boot - REST API backend
React - Frontend interface (not covered here)
REST API - Used to request and return scraped book data
📜 Final Controller
@GetMapping("/search-from-yes24")
public ResponseEntity searchFromYes24(@RequestParam String title) {
    try {
        String encodedQuery = URLEncoder.encode(title, StandardCharsets.UTF_8);
        String searchUrl = "https://www.yes24.com/Product/Search?domain=BOOK&query=" + encodedQuery;

        // Step 1: Parse search result page
        Document searchDoc = Jsoup.connect(searchUrl)
                .userAgent("Mozilla/5.0")
                .get();

        Element firstItem = searchDoc.selectFirst("div.itemUnit");
        if (firstItem == null) return ResponseEntity.notFound().build();

        // Guard against a search result card that has no link
        Element link = firstItem.selectFirst("a[href]");
        String detailUrl = link != null ? link.absUrl("href") : "";
        if (detailUrl.isEmpty()) return ResponseEntity.status(502).build();

        // Step 2: Parse book detail page
        Document detailDoc = Jsoup.connect(detailUrl)
                .userAgent("Mozilla/5.0")
                .get();

        String bookTitle = detailDoc.selectFirst("h2.gd_name") != null
                ? detailDoc.selectFirst("h2.gd_name").text().trim()
                : "Unknown Title";

        String author = detailDoc.selectFirst("span.gd_auth a") != null
                ? detailDoc.selectFirst("span.gd_auth a").text().trim()
                : "Unknown Author";

        String priceText = detailDoc.selectFirst("em.yes_m") != null
                ? detailDoc.selectFirst("em.yes_m").text().replaceAll("[^0-9]", "")
                : "0";

        String publisher = detailDoc.selectFirst("span.gd_pub a") != null
                ? detailDoc.selectFirst("span.gd_pub a").text().trim()
                : "Unknown Publisher";

        String genre = "Unknown Genre";
        Elements genreEls = detailDoc.select("div#infoset_goodsCate dl.yesAlertDl dt:contains(Category) + dd ul.yesAlertLi li a");
        if (!genreEls.isEmpty()) {
            genre = genreEls.last().text().trim();
        }

        // Build DTO
        BookDTO dto = BookDTO.builder()
                .title(bookTitle)
                .author(author)
                .publisher(publisher)
                .price(Double.parseDouble(priceText))
                .genre(genre)
                .build();

        return ResponseEntity.ok(dto);

    } catch (Exception e) {
        e.printStackTrace();
        return ResponseEntity.status(500).build();
    }
}



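Types of Databases
Thu, 29 May 2025 22:40:08 GMT
While connecting IntelliJ to MySQL for my graduation project, I got curious about how it differs from the Oracle Database (SQL Developer) setup I had been using, so here is a summary.
Types of databases

| Category | Examples | Characteristics and use cases |
| --- | --- | --- |
| Relational (RDBMS) | MySQL, PostgreSQL, Oracle, MariaDB | Table-based structure, SQL, strong focus on transactions and consistency. e.g. ERP, banking, e-commerce systems |
| Key-Value | Redis, DynamoDB, Riak | Simple key-value structure, fast lookups. e.g. cache systems, session stores |
| Document | MongoDB, CouchDB, Firebase | Stores JSON-style documents, flexible schema. e.g. CMS, user data storage |
| Column-Family | Apache Cassandra, HBase | Stores data by column, suited to large-scale analytics. e.g. log analysis, IoT data collection |
| Graph | Neo4j, Amazon Neptune | Node-and-edge based, strong at relationship-centric data. e.g. social networks, recommendation systems |
| Time-series | InfluxDB, TimescaleDB | Optimized for storing data over time. e.g. sensor logs, server monitoring |
| Object-oriented | db4o, ObjectDB | Stores objects directly, integrates closely with OOP. e.g. systems with complex object models |
| Multi-model | ArangoDB, OrientDB | Supports a mix of models (document + graph + key-value, etc.). e.g. composite systems that need a flexible structure |
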
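Oracle vs MySQL
| Aspect | Oracle Database | MySQL |
| --- | --- | --- |
| Developer | Oracle Corporation | MySQL AB (now owned by Oracle) |
| License | Commercial (paid) / free edition available (XE) | Open source (GPL) + commercial (Enterprise) |
| Performance | Optimized for large-scale transactions and complex business logic | Fast responses, suited to web and small-to-medium services |
| Language support | PL/SQL (advanced features) | SQL; stored procedures are comparatively simple |
| Storage engine | Its own storage layer (focused on performance and stability) | InnoDB (default), MyISAM, and others selectable |
| Transaction support | Full ACID guarantees, high-performance transactions | ACID support when using the InnoDB engine |
| Security features | Advanced security: row-level security, data masking, etc. | Basic user authentication and privilege control |
| Admin tools | Oracle SQL Developer, Enterprise Manager | MySQL Workbench, CLI |
| Scalability | Strong horizontal/vertical scaling, easy high-availability setup | Limited horizontal scaling, replication-based |
| Best suited for | Large enterprises, finance, public institutions, ERP systems | Startups, web services, educational projects |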



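What Is Uvicorn?
Thu, 29 May 2025 20:02:14 GMT
When we run our model server, we launch it with uvicorn.
If you have built a web API with FastAPI or Starlette, you have probably run the following command.
uvicorn main:app --reload
This one command carries the core of how asynchronous Python web frameworks run.
Today I will write about uvicorn, which is essential when running FastAPI as a real service.
🌐 What is Uvicorn?
Uvicorn is an ASGI server for running Python's asynchronous web frameworks. Put simply, it is the engine that runs an API built with FastAPI as an actual server that can receive web requests.
🔍 Why ASGI rather than WSGI?
Django and Flask have traditionally run on WSGI, a synchronous standard. Supporting asynchronous processing and real-time features such as WebSocket and streaming APIs requires a newer asynchronous standard, ASGI.
WSGI: synchronous; each worker handles one request at a time
ASGI: asynchronous; a single worker can handle many requests concurrently
So asynchronous frameworks like FastAPI need to run on an ASGI server such as uvicorn.
⚙️ Basic Uvicorn usage
| Option | Meaning |
| --- | --- |
| main:app | The app = FastAPI() object in main.py |
| --host 0.0.0.0 | Allow external access |
| --port 8000 | Port to use |
| --reload | Auto-restart on code changes (useful in development) |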
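
To make the `main:app` part concrete, here is a minimal sketch of a bare ASGI application with no framework at all: an async callable that takes `scope`, `receive`, and `send`. FastAPI builds an object with this same interface for you. The helper `call_once` is hypothetical and exists only to drive the app once without starting a server.

```python
import asyncio


async def app(scope, receive, send):
    """Bare ASGI app; saved as main.py, `uvicorn main:app` would serve it."""
    if scope["type"] != "http":
        return  # other scope types (such as lifespan) are ignored in this sketch
    body = b"hello from ASGI"
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [
            (b"content-type", b"text/plain"),
            (b"content-length", str(len(body)).encode()),
        ],
    })
    await send({"type": "http.response.body", "body": body})


def call_once(asgi_app, path="/"):
    """Call the app with a fake request and collect the messages it sends."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    scope = {"type": "http", "method": "GET", "path": path, "headers": []}
    asyncio.run(asgi_app(scope, receive, send))
    return sent
```

Because the app is a coroutine, the server can switch to another request whenever one of them is waiting on I/O, which is the practical difference from WSGI.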
Uvicorn itself is a quiet helper, but it is essential for bringing out FastAPI's performance and real-time capabilities.



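Building an AI Docent Explanation Delivery System by Integrating FastAPI and Spring Boot
Thu, 29 May 2025 19:53:52 GMT
Up to the previous post, we built a structure in which the AI model recognizes objects in an artwork, generates a description, and passes that information to the FAISS database.
This system connects FastAPI and Spring Boot, and it is designed with both real-time responsiveness and authentication security in mind.
Overall flow

1. The user clicks an image -> a request is sent to the FastAPI server
2. FastAPI generates an object description using YOLO/CLIP
3. The description is sent to Spring Boot together with a JWT auth token (POST)
4. Spring Boot validates the painting ID and saves the description to the DB

Steps 3 and 4 are what this post covers.
Example: POST from FastAPI to Spring Boot
@app.post("/analyze")
def analyze_click(req: AnalyzeClickRequest):
    description = generate_description(req.image_id)
    response = requests.post(
        "http://localhost:8080/api/model/response",
        json={"paintingId": req.image_id, "description": description},
        headers={"Authorization": ACCESS_TOKEN},  # JWT auth header; ACCESS_TOKEN is assumed to include any scheme prefix the server expects
    )
    return {"status": response.status_code}

 --> The AI-generated description is sent to the Spring server via POST. For security, the JWT is included in the request header.
 ! What is JWT?
A JSON Web Token (JWT) is a digital token for exchanging user authentication information safely.
After login, the server issues a token to the client,
and the client includes this token in the HTTP header of later requests so the server can identify the user.
   Structure: a string made up of three parts: header, payload, and signature
   Advantages: no session management needed, since the server does not have to store user state
          well suited to distributed systems
        lightweight and fast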
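
The header / payload / signature structure can be seen directly with a small standard-library sketch of an HS256-signed token. The function names (`sign_jwt`, `verify_jwt`) and the claim used in the example are assumptions for illustration; a real service would normally rely on a maintained JWT library and would also check claims such as expiry.

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWT requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    """Decode base64url, restoring the stripped padding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_jwt(payload: dict, secret: bytes) -> str:
    """Build header.payload.signature with an HMAC-SHA256 signature."""
    header = {"alg": "HS256", "typ": "JWT"}
    parts = [
        b64url(json.dumps(header, separators=(",", ":")).encode()),
        b64url(json.dumps(payload, separators=(",", ":")).encode()),
    ]
    signing_input = ".".join(parts)
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + b64url(signature)


def verify_jwt(token: str, secret: bytes) -> dict:
    """Recompute the signature and return the payload if it matches."""
    header_b64, payload_b64, signature_b64 = token.split(".")
    expected = hmac.new(
        secret, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256
    ).digest()
    if not hmac.compare_digest(expected, b64url_decode(signature_b64)):
        raise ValueError("invalid signature")
    return json.loads(b64url_decode(payload_b64))
```

Since the server only needs the secret to verify a token, it does not have to keep any per-user session state, which is the "stateless" advantage listed above.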
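  ! What is asynchronous processing?
  A way of working in which you do not wait for one task to finish before moving on; other tasks can make progress concurrently in the meantime.
Frameworks like FastAPI run on an asynchronous server (ASGI), which makes the following possible:

 - Handling multiple requests at the same time
 - Processing other requests while waiting on a response
 - Using server resources efficiently
@app.post("/async.process")
async def handle_async():
    result = await slow_task()
    return result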
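
The effect of `await` is easiest to see outside a web server. In this sketch, `slow_task` is a stand-in that sleeps instead of doing real I/O, and `timed_run` is a hypothetical helper; three 0.2-second tasks finish in roughly 0.2 seconds rather than 0.6 because the event loop switches between them while each one waits.

```python
import asyncio
import time


async def slow_task(name: str, delay: float) -> str:
    await asyncio.sleep(delay)  # stands in for a network or disk wait
    return name


async def handle_many() -> list:
    # gather() runs the three coroutines concurrently on one event loop
    return await asyncio.gather(
        slow_task("a", 0.2),
        slow_task("b", 0.2),
        slow_task("c", 0.2),
    )


def timed_run():
    """Run the batch and return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = asyncio.run(handle_many())
    return results, time.perf_counter() - start
```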
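Saving the description on the Spring Boot server
@PostMapping("/api/model/response")
public ResponseEntity<Void> receiveDescription(@RequestBody ObjectDescriptionRequest request) {
    Painting painting = paintingRepository.findById(request.getPaintingID())
            .orElseThrow(() -> new IllegalArgumentException("Painting not found"));
    painting.setBackground(request.getDescription());
    paintingRepository.save(painting); // persist the updated description
    return ResponseEntity.ok().build();
}
This integration brings together several features that real production environments require: asynchronous request handling, JWT authentication, RESTful communication between the model and the backend, and DB persistence.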



AI Docent System for Art Exhibitions (6-2)

Wed, 21 May 2025 03:42:48 GMT
Object Detection -> Embedding -> FAISS -> LLaMA Explanation pipeline
In our project, we aim to develop an AI docent system that automatically provides descriptions for specific objects detected within artwork images.
Pipeline Overview
[ full artwork image ] 
YOLOv8 segmentation
->
[object detection & crop]
CLIP embedding 
-> 
[ vectorized image ]
FAISS Indexing 
->
[ query -> object retrieval ]
Llama-based generation
-> 
[ visitor-friendly docent explanation ]

Object Detection and Cropping with YOLOv8

We use the yolov8-seg.pt model to detect objects within a given artwork image. 
Unlike simple bounding box detection, this model utilizes segmentation masks, allowing us to precisely crop each object based on its actual shape.

Embedding Cropped Objects Using CLIP

Each cropped object image is passed through CLIP to convert it into a semantic vector.
This enables us to later retrieve semantically similar objects based on the visitor's query.
What is a Semantic Vector?
A semantic vector is a numerical representation that captures the meaning of text, images, or other human-understandable content in a form that machines can interpret.
In simpler terms: it's an array of numbers that represents meaning.
For example:
CAT : [0.8, 0.2, 0.5]
DOG : [0.79, 0.21, 0.52]
CAR : [0.1, 0.9, 0.3]
Cat and Dog have similar vectors because they are semantically related.
Car is conceptually different, so its vector is far apart.
The similarity between these vectors tells us how closely related the meanings are.
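
One common way to measure that similarity is cosine similarity. The sketch below applies it to the three toy vectors from the example; real CLIP embeddings have hundreds of dimensions, and the choice of cosine similarity here is an assumption made for illustration rather than something this post specifies.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


vectors = {
    "cat": [0.8, 0.2, 0.5],
    "dog": [0.79, 0.21, 0.52],
    "car": [0.1, 0.9, 0.3],
}

cat_dog = cosine_similarity(vectors["cat"], vectors["dog"])  # very close to 1
cat_car = cosine_similarity(vectors["cat"], vectors["car"])  # noticeably lower
```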
CLIP is trained to embed both images and text into the same semantic space.
Example: 
"an apple on the table" -> [text vector]
image of an actual apple -> [image vector] 
These are trained to be close in vector space.
So when a user asks, "Where is the apple?", we convert the question into a vector, use FAISS to find the closest image embedding, and then explain it.

Indexing Semantic Vectors with FAISS

The image embeddings obtained from CLIP are indexed using FAISS. This enables fast, approximate nearest neighbor search to retrieve the most semantically similar objects later.

Query-based Retrieval and Description generation via LLaMA

When a user submits a natural-language query, the following steps occur: 

The query is converted into a semantic vector using CLIP's text encoder.
FAISS searches for the most similar object vectors.
Metadata for the top result (label, description, etc.) is retrieved.
This information is passed into a prompt, which is then fed into LLaMA to generate a natural, human-friendly explanation.

This creates a dynamic docent experience where visitors can ask questions or click on an object and receive personalized explanations generated on the spot. 



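From Image Object Recognition to LLaMA-Based Explanation Generation
Tue, 20 May 2025 14:02:42 GMT
In our project, we want to build an AI docent that automatically provides explanations for specific objects (an apple, a vase, a person, and so on).
Overall structure
[Full artwork image]
YOLOv8 segmentation
->
[Object detection and crop]
CLIP embedding
->
[Vectorized image]
Stored in a FAISS index
->
[Query -> related object retrieval]
LLaMA text generation
->
[Docent explanation tailored to the visitor]

YOLOv8-based object detection and cropping

First, we use the yolov8-seg.pt model to detect objects in the artwork image. Instead of bounding boxes, we use segmentation masks so that only the object itself is extracted.

Embedding object images with CLIP

Each cropped object image is converted into a semantic vector using CLIP.
This lets us later search for objects whose meaning is similar to the user's question.
 ** What is a semantic vector? **
 A vector representation that lets a computer handle, numerically, the meaning humans see in words, sentences, or images.
 Put simply, it is an array of numbers that carries meaning.
 e.g. cat: [0.8, 0.2, 0.5], dog: [0.79, 0.21, 0.52], car: [0.1, 0.9, 0.3]
 Cats and dogs are similar animals, so their vectors are close; a car means something entirely different, so its vector is different.
 --> Computing the similarity between vectors tells us how semantically close two things are.
 CLIP embeds images and text into the same semantic space.
 "an apple on the table" -> [text semantic vector]
 A photo of an actual apple -> [image semantic vector]; the model is trained so the two end up close together.
 When a user asks "Where is the apple?", we turn the question into a vector, find the closest image vector in FAISS, and explain that object.

Vector indexing with FAISS

The CLIP embedding vectors are stored in an index using FAISS.
This index lets us quickly retrieve semantically similar objects later.

Query-based object retrieval and explanation generation with LLaMA

Vectorize the question with CLIP's text embedding
Retrieve the most similar object image with FAISS
Fetch the label and object description from the metadata
Pass them to LLaMA along with the prompt -> natural-language generation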






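Building a Lightweight RAG System with LangChain and LLaMA

Mon, 21 Apr 2025 05:05:57 GMT
I have been working with RAG (Retrieval-Augmented Generation) systems a lot lately, and at some point a question came to mind.
"Do you really need a large model like GPT-4 to build a usable RAG system?"
Considering cost, response speed, and control over the model, I felt decent results should be possible without depending on the GPT family, so this time I put together a lightweight RAG system with LangChain + LlamaIndex + LLaMA.
✅ Goals
The goals I set for this project were:
It must run entirely in a local environment
A lightweight model must be able to give adequate answers
The pipeline should be as simple as possible
Vector search should reach a reasonable level of quality
Components used
| Role | Tool |
| --- | --- |
| LLM | LLaMA2 / LLaMA3 (gguf or HF format) |
| Pipeline orchestration | LangChain |
| Document retrieval / indexing | LlamaIndex |
| Vector DB | FAISS (or Chroma, optional) |
| Embedding model | Lightweight models such as Instructor or MiniLM |
✅ Document chunking strategy
I designed the step of splitting documents into fixed-size pieces carefully.
Putting too much text into one chunk lowered retrieval accuracy, while splitting too finely broke the context and made answers inaccurate.
In my experiments, chunks of 512 to 1024 tokens gave a good balance between quality and efficiency.
I also applied overlapping splits with a sliding window to preserve continuity of context.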
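
The sliding-window split can be sketched as follows. The function name and the default `overlap=64` are assumptions for this example (the post gives the chunk size range but not the overlap), and `tokens` can be any list, such as the output of whatever tokenizer the embedding model uses.

```python
def sliding_window_chunks(tokens, chunk_size=512, overlap=64):
    """Split a token list into chunks that share `overlap` tokens with the previous chunk."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last chunk already reaches the end of the document
    return chunks
```

For example, ten tokens with `chunk_size=4` and `overlap=1` give three chunks, each starting with the last token of the previous one.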
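✅ Choosing an embedding model
The embedding model, which turns documents and queries into vectors, was the core of the system.
I mainly used all-MiniLM-L6-v2, which runs quickly even on a CPU and gave stable retrieval accuracy.
I also tried BAAI's bge-small-en model, which performed well.
I tested the e5-small model family as well, which lets you optimize document and query embeddings separately, but kept the default model because of the added implementation complexity.
✅ Vector search system
Retrieval quality depended heavily on the vector DB, so I chose FAISS to get fast, stable search locally.
It was a good fit for a small corpus (a few thousand documents), and being able to self-host without extra external services was an advantage.
The document embeddings are saved locally so they can be reused when the system starts.
✅ Prompt design
The prompt passed to the LLM after retrieval was kept clear and concise.
I used a fixed system message along the lines of "Answer the question using the following documents" and attached 2 to 4 retrieved documents as context.
With too many documents the model lost focus, so capping the number of documents helped quality.
✅ Lightweight LLM setup
For the LLM I used a quantized version (q4_K_M) of the LLaMA-2 7B Chat model.
I ran it locally through llama-cpp-python and tuned parameters such as n_threads and the context window.
A quantized model was a must because the system had to work without a GPU.
Besides LLaMA, I also tested Mistral and Phi-2; the models differed in response style and speed.
✅ Pipeline design
I combined LangChain and LlamaIndex to build the full RAG pipeline.
I avoided LangChain's more complex chain structures and kept things as simple as possible.
The flow was kept to the minimum of query → retrieval → context assembly → LLM call → response output to avoid hurting performance.
📌 Conclusion
Here is what I confirmed through this RAG system.
A lightweight model was enough to build a meaningful question-answering system.
What drove quality was less the model and more the document chunking, retrieval accuracy, and prompt design.
The simpler the overall pipeline, the easier it was to maintain, and tuning for the characteristics of each model mattered a great deal.
Next, I plan to add a web interface and real-time document updates, and extend this into a practical chatbot over internal documents.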



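A Museum Docent System Using LLM+RAG: Hardware Preparation
Sat, 12 Apr 2025 05:55:51 GMT
What if you could build your own smart docent system that acts as your eyes and ears in a museum or exhibition space?
In this post, I will lay out the hardware plan for building an LLM+RAG-based smart docent system with a Raspberry Pi 5.
Nothing has been built yet; I plan to document the build process on this blog.
First, because the finished docent device is worn close to the face, for safety reasons we decided to use the Raspberry Pi only as a sensor hub and interface device.
Running the LLM and RAG on the Raspberry Pi itself would likely cause heat problems.
In other words, the Raspberry Pi handles the user's voice input, camera sensing, and UI output,
while the LLM + RAG computation runs on a smartphone or a cloud server.
The expected parts list is as follows.

| | Component | Role | Connection / Port |
| --- | --- | --- | --- |
| 🧠 | Raspberry Pi 5 (4GB) | Main controller | Power / GPIO / CSI / USB, etc. |
| 🔌 | PD 27W adapter | Power supply | USB-C port |
| 💾 | microSD 64GB + 32GB | OS and storage | microSD slot |
| 🎙️ | ReSpeaker 2-Mics Pi HAT | Voice input (microphone) + output (speaker) | GPIO header (I2S / I2C) |
| 📷 | Pi Camera Module 3 | Color recognition / QR codes, etc. | CSI port 0 (22-pin) |
| 🌙 | NoIR IR camera | Infrared sensing / night recognition | CSI port 1 (FPC cable required) |
| 🔌 | FPC camera cable (22P → 15P) | Connects older cameras to the Pi 5 | CSI adapter cable |

The Raspberry Pi 5 provides two CSI ports (ports for connecting Raspberry Pi cameras), which use the MIPI CSI-2 protocol to transfer high-speed video data at low power.
Starting with the Raspberry Pi 5, the CSI port changed to a 22-pin, 0.5mm-pitch FPC connector, so connecting an older 15-pin Raspberry Pi camera requires an adapter cable.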



Search Smarter, Generate Better: The Power of Advanced RAG
Sat, 12 Apr 2025 04:54:27 GMT
📌 “Searching well is half the battle”
Turns out that’s completely true for RAG too.
🧑‍💻 Intro: What is RAG?
RAG stands for Retrieval-Augmented Generation,
a method that augments LLMs (like GPT-4, Claude, Mistral, etc.) with search capabilities.
At first, I thought:
"Isn't this just feeding documents into the LLM after searching?"
But once I started studying it deeply… I realized it’s much more than that.
Advanced RAG isn’t just search + generation.
It’s an optimized pipeline that improves retrieval accuracy, context understanding, and trustworthiness of the responses.
In this post, I’ll break down what I’ve learned so far about:
What Advanced RAG is
Why we need it
What techniques make it powerful
📚 So what exactly is RAG?
RAG stands for Retrieval-Augmented Generation.
In simple terms, it means giving a language model the ability to look things up.
Take a model like GPT-3.5, for example:
its knowledge stops at its training cutoff, so anything after that? It's clueless.
Imagine asking:
“What new policies were introduced in the 2024 elections?”
“What features were added in GPT-4.5?”
A base model won’t be able to answer these.
That’s where RAG comes in.
It lets the LLM pull in external documents, PDFs, web search results, and more—
giving it access to real-time and external knowledge.
📦 Basic (Naive) RAG: How it works
Here’s a simple version of how RAG works:
📄 Chunk documents and convert them into vector embeddings
🔍 Convert the user question into a vector and search for top-K similar chunks
🧠 Feed those chunks + question into the LLM to generate the final answer
Sounds easy, right? But real-world RAG has… issues.
⚠️ Limitations of Naive RAG
| Area | Problem |
| --- | --- |
| Indexing | Poor parsing of PDFs, tables, or sections → information loss |
| Retrieval | Returns duplicates or irrelevant chunks, misses key content |
| Generation | Bad context = misleading or incorrect answers |
Even with a powerful LLM, bad retrieval ruins everything 😭
🌟 Enter: Advanced RAG
Advanced RAG addresses those limitations with a 4-stage optimized pipeline:
🧱 Advanced RAG Framework
Pre-Retrieval → Document parsing, query rewriting
Retrieval → Better search (hybrid, fine-tuned embeddings)
Post-Retrieval → Reranking, compression, filtering
Generation → Optimized prompting and response generation
🔍 Stage 1: Pre-Retrieval
🧾 PDF Parsing
PDFs are not plain text—they’re layout commands.
If you extract text directly, you lose formatting, tables, and flow.
Solutions:
pypdf: Rule-based, simple
Unstructured, LayoutParser: DL-based, can detect structure
PP-StructureV2: Extracts semantic info from layouts
✍️ Query Rewriting
Real user questions are often vague or multi-topic, which ruins search accuracy.
Solutions:
Step-Back Prompting: Generalize the question
HyDE: Generate pseudo-docs to embed and search
Query2Doc: Rewrites the query like a document
ITER-RETGEN: Alternating retrieval & generation for refinement
🔗 Context Expansion
One sentence isn’t enough—expand to include surrounding context.
Techniques:
Sentence Window Retrieval: Add k sentences before/after
Parent Chunking: Group chunks into higher-level units
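
Sentence Window Retrieval is simple enough to sketch directly. Here `hit_index` is assumed to be the position of the sentence that the retriever matched, and the function name is made up for this example.

```python
def sentence_window(sentences, hit_index, k=1):
    """Return the matched sentence plus up to k sentences on each side."""
    start = max(0, hit_index - k)
    end = min(len(sentences), hit_index + k + 1)
    return " ".join(sentences[start:end])
```

The small sentence is what gets embedded and searched, while the wider window is what gets handed to the LLM.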
🔎 Stage 2: Retrieval
🧬 Hybrid Search
BM25: Keyword-based, great for precision
Vector Search: Embedding-based, great for semantics
💡 Combine both using RRF (Reciprocal Rank Fusion) → best of both worlds!
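
Here is a minimal sketch of Reciprocal Rank Fusion. Each document scores the sum of `1 / (k + rank)` over the rankings it appears in; `k=60` is the constant commonly used for RRF, and the document IDs below are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one list, best first."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by score (descending); break ties by ID for a stable result.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


bm25_hits = ["d1", "d2", "d3"]    # keyword search results, best first
vector_hits = ["d3", "d1", "d4"]  # embedding search results, best first
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
```

Documents that both retrievers agree on (`d1`, `d3`) rise above documents that only one of them found, without having to put BM25 scores and cosine similarities on the same scale.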
✂️ Stage 3: Post-Retrieval
🔄 Re-ranking
Even the top-K results may include junk.
So we re-rank based on relevance or importance.
Tools:
bge-reranker, Cohere, RankGPT
📉 Prompt Compression
LLMs have token limits. Example: GPT-3.5 ≈ 4,000 tokens.
Solutions:
Selective Context: Keep only informative content
LLMLingua, AutoCompressor, RECOMP: Token-level or semantic compression
🧼 Filtering
Remove duplicates, irrelevant, or low-trust chunks.
Models:
FiD-TF, Self-RAG, CRAG: Filter at the token or chunk level
🧠 Stage 4: Generation
🛠 Advanced Generation Techniques
DSP: Multi-query + multi-doc → merged answer
PRCA: RL-based generation refinement
REPLUG: Inserts search results directly into prompts
RECITE: Generate multiple answers → majority vote
✅ Evaluation Matters!
How do you know if your RAG pipeline is actually working?
You can’t just rely on “it feels right.” You need structured evaluation.
Here are some of the best tools and metrics used to evaluate RAG systems:
📊 Tools & Frameworks for RAG Evaluation
| Tool / Method | Purpose |
| --- | --- |
| Ragas | Evaluate factual accuracy, retrieval precision, and generation faithfulness |
| LangSmith | Tracks individual steps inside retrieval and generation chains (LangChain-friendly) |
| OpenAI Cookbook | Offers scripts and guidelines for evaluating performance by category |
| Helm | Holistic Evaluation of Language Models, useful for benchmarking |
| LlamaIndex Evaluation | Measures document coverage and response relevance |
| BERTScore / ROUGE / BLEU | Traditional NLP metrics, can help for generation fidelity |
| User Feedback Loop | In production systems, nothing beats real user voting and correction tracking |
🧪 Key Evaluation Metrics
| Metric | What It Measures |
| --- | --- |
| Context Precision | Did the retriever bring back relevant content? |
| Context Recall | Did it miss any important information? |
| Answer Faithfulness | Is the generated answer grounded in retrieved facts? |
| Answer Relevance | Does the answer actually address the question? |
| Latency | How fast is retrieval + generation? Important in real-time apps |
🔁 Recap: Why Advanced RAG Matters
Basic RAG might be enough for demos or prototypes.
But if you want to build real-world LLM apps — like search assistants, internal tools, or voice docents — you need:
✅ Clean document ingestion
✅ Accurate and rich retrieval
✅ Efficient compression + reranking
✅ Faithful generation
✅ A tight feedback loop for evaluation
🚀 TL;DR: RAG, When Done Right, Changes the Game
Advanced RAG is more than "search + generate."
It’s a full-stack retrieval-generation architecture.
Think of it as the “search engine” behind your LLM—
and just like real search engines, optimizing the pipeline is everything.
If you nail each stage — from chunking, to retrieval, to reranking, to compression, to generation —
your LLM can answer anything, grounded in your own data.
📌 Final Thoughts
“Just vector search and pass it to GPT” is where everyone starts.
But if you're serious about performance, trust, and user satisfaction, you’ll want:
✨ Pre-processing pipelines
🧠 Smart retrievers
⚙️ Modular evaluators
🤖 Agents that collaborate
📉 And generators that know what to leave out.
RAG isn't just a trick.
It’s a strategy.
References & Further Reading
Advanced RAG — Part 10(https://medium.com/@vipra_singh/building-llm-applications-advanced-rag-part-10-ec0fe735aeb1)



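🤖 Building a RAG Smarter than ChatGPT?! Advanced RAG, from Concepts to Techniques

Sun, 06 Apr 2025 00:49:59 GMT
📌 "Searching well is half the battle": that seems to apply perfectly to RAG too.
🧑‍💻 Intro: What is RAG?
RAG (Retrieval-Augmented Generation) is
a way of attaching search to an LLM (e.g. GPT-4, Claude, Mistral).
When I first ran into RAG, I thought:
"Isn't this just searching documents and feeding them to the LLM?"
But
Advanced RAG is not just search + generation.
It optimizes the whole pipeline
to raise retrieval accuracy, context understanding, and the trustworthiness of responses.
In this post I will summarize what I have learned so far about
"what Advanced RAG is", "why we need it", and "which techniques it uses".
📚 So what exactly is RAG?
RAG is short for "Retrieval-Augmented Generation".
Put simply, it is a structure that gives an LLM a search capability.
An LLM like GPT-3.5, for example,
has only learned knowledge up to its training cutoff,
so it does not know about anything that appeared after that.
For example,
"What new policies were introduced in the 2024 general election?"
"What features were added in GPT-4.5?"
A base GPT model cannot answer questions like these.
That is where RAG comes in.
It feeds external documents, PDFs, web search results, and more into the LLM together with the question,
so the response can reflect up-to-the-moment information.
📦 How Naive RAG works
Basic RAG has the following structure:
📄 Split documents into chunks and convert them into vectors
🔍 Convert the user's question into a vector too, compare it with the document vectors, and retrieve the Top-K
🧠 Send the retrieved chunks to the LLM together with the question to generate the response
That is the basic structure, but in practice several problems show up...
⚠️ Limitations of Naive RAG
| Area | Problem |
| --- | --- |
| Indexing | Documents such as PDFs and reports are parsed poorly, so information is lost |
| Retrieval | Often returns only duplicated content or misses the important content |
| Generation | The LLM receives the wrong context and produces wrong answers or biased information |
In particular, when retrieval is inaccurate,
even a great LLM ends up giving strange answers 😭
🌟 Enter: Advanced RAG
Advanced RAG is a structure that addresses the weaknesses of Naive RAG.
It divides the whole process into the following 4 stages and applies optimization techniques at each one.
The 4-stage structure of Advanced RAG
Pre-Retrieval: preparation before search (document parsing, query cleanup, etc.)
Retrieval: search optimization (hybrid search, embedding tuning, etc.)
Post-Retrieval: compressing, reranking, and filtering results
Generation: optimizing answer generation (multiple generations, summarization, etc.)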
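🔍 Stage 1: Pre-Retrieval
🧾 PDF parsing
A PDF is not plain text;
it is closer to "a bundle of commands that say where and how to draw things".
So if you simply extract the text,
paragraphs, tables, and figures break apart and line breaks become a mess.
💡 Solutions:
pypdf: rule-based parser, simple but weak at recognizing structure
Unstructured, LayoutParser: deep learning based, can recognize tables and paragraphs
PP-StructureV2: extracts even the key information inside the document
✍️ Query rewriting
When a user's question is vague or mixes several topics,
searching with it as-is often fails to find accurate results.
💡 Solutions:
Step-Back Prompting: generalize the question before searching
HyDE: generate a hypothetical document from the question → embed it and search
Query2Doc: the LLM rewrites the query like a document to improve search efficiency
ITER-RETGEN: alternate generation and retrieval to obtain more accurate information
🔗 Context expansion
A single retrieved sentence is hard to understand on its own.
So we bring in the surrounding sentences, or group chunks into a higher-level document unit before passing them on.
Sentence Window Retrieval: send k sentences before and after as well
Parent Chunking: group chunks into higher-level semantic units
🔎 Stage 2: Retrieval (search optimization)
🧬 Hybrid search
Keyword search (BM25, etc.) → exact word matches
Semantic search (vector embeddings) → understands similar context
💡 Combining the two with RRF (Reciprocal Rank Fusion)
raises accuracy and coverage at the same time.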
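✂️ Stage 3: Post-Retrieval (cleaning up results)
🔄 Re-ranking
Even among the retrieved Top-K chunks,
this step moves the truly important ones to the top.
bge-reranker, Cohere API
RankGPT: LLM-based reranking with a sliding window
📉 Prompt compression
There is a limit to how many tokens you can give an LLM.
(e.g. GPT-3.5 has about 4,000 tokens)
💡 Solutions:
Selective Context: keep only the most informative parts
LLMLingua, LongLLMLingua: compress at the level of meaning
AutoCompressor: pass summary information as a soft prompt
RECOMP: compress at the sentence level or generate a summary
🧼 Filtering
Remove unnecessary documents, duplicated content, and low-trust information.
FiD-TF, Self-RAG, CRAG, and similar methods
judge importance at the token level and filter accordingly!
🧠 Stage 4: Generation
Various generation techniques
DSP: multiple queries → multiple document searches → combined into one response
PRCA: reward-based learning to steer toward better responses
REPLUG: adds search results directly to the LLM input
RECITE: generates several answers, then picks the final response by majority vote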
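🧑‍💻 Other advanced components
🗨️ Chat Engine
In conversational use, the previous dialogue needs to be summarized automatically and reflected in the query.
e.g. ContextChatEngine, CondensePlusContextMode
🤖 Agent structure
Create a dedicated summarization / Q&A agent for each document,
with a top agent in overall control → it routes the question to each agent and combines their responses!
🔧 Model tuning
Build question-answer data with GPT-4 → fine-tune GPT-3.5 on it
RA-DIT: trains the retriever and generator together to improve performance
✅ Evaluation matters too!
How can you tell whether your RAG is working well?

Ragas: measures answer correctness, retrieval accuracy, and response relevance

LangSmith: traces behavior inside the chain

OpenAI Cookbook: provides test scripts for each evaluation criterion


Reference: [ https://medium.com/@vipra_singh/building-llm-applications-advanced-rag-part-10-ec0fe735aeb1 ]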



Bringing Museums to Life: Starting Our Journey with LLM, RAG, and Smart Glasses 
Wed, 02 Apr 2025 07:41:12 GMT
Imagine walking through a museum with nothing but a pair of smart glasses—and having a personal AI docent narrate the story behind every piece of art, tailored just for you. No need to scan QR codes, fumble with audio guides, or wait for a human tour.
This is the future we're building.
We're kicking off an exciting project that combines the power of Large Language Models (LLMs), Retrieval-Augmented Generation (RAG), and wearable smart glasses to create an intelligent, context-aware museum guide. Our goal? To transform static exhibitions into deeply personalized, immersive storytelling experiences.
In this technical blog series, we'll be documenting our journey step by step—from early experiments to real-world implementation. We’ll dive into our architecture, data pipelines, model tuning, challenges, and everything in between.
If you're interested in conversational AI, edge computing, or just love museums and emerging tech, you’re in the right place.
Let’s begin.



Database week 2
Sun, 06 Oct 2024 18:53:03 GMT
What is RAID? (Redundant Array of Independent Disks)
A technique that operates two or more disks in parallel to improve performance and reliability
A safeguard against data loss
RAID levels
Feature	RAID 0	RAID 1	RAID 2	RAID 3
Parity bits	x	x	error correction	dedicated disk
Mirroring	x	o	x	x
Striping	block level	x	bit level	byte level
         
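RAID 0: distributes (stripes) data across disks
RAID 1: duplicates (mirrors) data
RAID 2: uses Hamming codes so that errors can be detected and corrected
RAID 3: stores data in byte units, with parity information kept on a separate disk
RAID 4: stores data in block units, with parity information kept on a separate disk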
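To show why a dedicated parity disk lets a lost disk be rebuilt, here is a small Java sketch of XOR parity, the scheme typically used for RAID 3/4 style parity. The class name `ParitySketch` and the tiny three-byte "blocks" are assumptions for illustration; real arrays work on whole stripes and handle many details omitted here.

```java
import java.util.Arrays;

// Illustrative XOR parity; assumes all blocks have the same length.
class ParitySketch {
    // XORs the given blocks together byte by byte.
    static byte[] xor(byte[]... blocks) {
        byte[] result = new byte[blocks[0].length];
        for (byte[] block : blocks) {
            for (int i = 0; i < result.length; i++) {
                result[i] ^= block[i];
            }
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] d0 = {1, 2, 3};
        byte[] d1 = {4, 5, 6};
        byte[] d2 = {7, 8, 9};
        byte[] parity = xor(d0, d1, d2);    // written to the parity disk

        // Suppose the disk holding d1 fails: XOR the survivors with the parity block.
        byte[] rebuilt = xor(d0, d2, parity);
        System.out.println(Arrays.equals(rebuilt, d1)); // prints true
    }
}
```

The same `xor` routine serves both to compute parity and to recover a block, because XOR-ing a value twice cancels it out.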
Select Execution
Q1. How does the DBMS find the pieces of data on disk?
Logical	Physical
Schema / Database	-
Tablespace	Data file
Segment	-
Extent	-
Oracle data block	OS block


Fixed-length record formats vs. variable-length record formats
Fixed-length record formats: fields stored consecutively
Variable-length record formats: array of offsets; a field is null when its start offset = end offset
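The offset-array idea can be sketched in Java as follows. The layout used here (n + 1 four-byte offsets followed by the field bytes) and the names `VarLenRecord`, `encode`, and `field` are assumptions made for this example, not the format of any particular DBMS. Note that with this simple rule an empty string is indistinguishable from null.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Illustrative variable-length record: [n+1 int offsets][field bytes...]
class VarLenRecord {
    static byte[] encode(String[] fields) {
        int n = fields.length;
        byte[][] data = new byte[n][];
        int total = 0;
        for (int i = 0; i < n; i++) {
            // A null field contributes zero bytes, so its start offset equals its end offset.
            data[i] = fields[i] == null ? new byte[0] : fields[i].getBytes(StandardCharsets.UTF_8);
            total += data[i].length;
        }
        int header = (n + 1) * Integer.BYTES;
        ByteBuffer buf = ByteBuffer.allocate(header + total);
        int offset = header;
        for (int i = 0; i < n; i++) {
            buf.putInt(offset);          // start offset of field i
            offset += data[i].length;
        }
        buf.putInt(offset);              // end offset of the last field
        for (byte[] d : data) {
            buf.put(d);
        }
        return buf.array();
    }

    // Reads field i by looking up its start and end offsets in the header.
    static String field(byte[] record, int i) {
        ByteBuffer buf = ByteBuffer.wrap(record);
        int start = buf.getInt(i * Integer.BYTES);
        int end = buf.getInt((i + 1) * Integer.BYTES);
        if (start == end) {
            return null;                 // start offset = end offset means null
        }
        return new String(record, start, end - start, StandardCharsets.UTF_8);
    }
}
```

For example, `VarLenRecord.field(VarLenRecord.encode(new String[] {"Kim", null, "Seoul"}), 1)` returns null, while index 2 returns "Seoul" without scanning the earlier fields.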
File Structure
Data Items -> Records -> Blocks -> Files -> Memory
Records: collection of related data items (fields)
Placing records into blocks
<4 options>

1. Separating records
fixed-size records do not need a separator
special marker
give record lengths (or offsets), either within each record or in the block header


2. Spanned vs Unspanned
Spanned: needs an indication of partial record/continuation
required if record size > block size
Unspanned: each record fits within one block



3. Sequencing
ordering records in file (and block) by some key value
lets records be read efficiently in key order

  <3 options>
  #1 next record physically contiguous
  #2 linked
  #3 overflow area
4. Indirection 
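Spectrum: fully physical <-> fully indirect
Fully physical:
The physical file structure points to the data directly
The data blocks that make up the file map straight to physical addresses or locations
Fully indirect:
All data is referenced indirectly
One index block points to another index block, which in turn points to the actual data
Used for very large files; the data is managed through a hierarchical structure
Block header: data at the beginning of a block that describes the block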

Advantages of Column Store
more compact storage
efficient reads on data mining operations
Advantages of Row Store
writes (multiple fields of one record) more efficient
efficient reads for record access (OLTP)
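
The trade-off can be pictured with two in-memory layouts in Java. This is only a sketch under assumed names (`StoreLayouts`, `Row`, `Columns`) and a made-up three-field record; it shows the shape of the data, not how a real storage engine lays out pages.

```java
// Illustrative layouts only; field names and types are assumptions.
class StoreLayouts {
    // Row store: all fields of one record sit together, so writing or
    // fetching a whole record touches a single object.
    static final class Row {
        final int id;
        final String name;
        final int price;

        Row(int id, String name, int price) {
            this.id = id;
            this.name = name;
            this.price = price;
        }
    }

    // Column store: all values of one field sit together, so an aggregate
    // over one field scans a single compact array.
    static final class Columns {
        final int[] ids;
        final String[] names;
        final int[] prices;

        Columns(int[] ids, String[] names, int[] prices) {
            this.ids = ids;
            this.names = names;
            this.prices = prices;
        }
    }

    // Has to walk every record even though only one field is needed.
    static long sumPricesRowStore(Row[] rows) {
        long sum = 0;
        for (Row row : rows) {
            sum += row.price;
        }
        return sum;
    }

    // Reads only the price column.
    static long sumPricesColumnStore(Columns columns) {
        long sum = 0;
        for (int price : columns.prices) {
            sum += price;
        }
        return sum;
    }
}
```

Both methods return the same total; the difference is how much unrelated data each one has to pass over to get there.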



LeetCode - Add Two Numbers
Sat, 03 Aug 2024 21:46:02 GMT
LeetCode's "Add Two Numbers" problem is a popular algorithm challenge that involves adding two non-negative integers represented by linked lists. Each linked list stores its digits in reverse order, with each node containing a single digit. Our goal is to add the two numbers and return the sum as a linked list.
Problem Description
You are given two non-empty linked lists representing two non-negative integers. The digits are stored in reverse order, and each node contains a single digit. Add the two numbers and return the result as a linked list. You may assume the two numbers do not contain any leading zero, except the number 0 itself.
Example
Input: (2 -> 4 -> 3) + (5 -> 6 -> 4)
Output: 7 -> 0 -> 8
Explanation: 342 + 465 = 807
Approach
To solve this problem, we use a dummy node to simplify edge-case handling and iterate through both linked lists, computing the sum digit by digit while tracking the carry.
Solution
ListNode Class
Each node contains a value 'val' and a reference to the next node 'next'.
This class is provided by LeetCode.
class ListNode {
    int val;
    ListNode next;
    ListNode() {}
    ListNode(int val) { this.val = val; }
    ListNode(int val, ListNode next) { this.val = val; this.next = next; }
}
addTwoNumbers Method
Dummy Node: We use a dummy node to simplify the code for edge cases, such as when the result list is still empty. The 'dummy' node serves as the starting point of the result list.
Current List: The 'current' node is used to build the result list. Initially, it points to the dummy node.
Carry Variable: The 'carry' variable holds any carry-over value from the sum of two digits.
public class AddTwoNumbersSolution {
    public ListNode addTwoNumbers(ListNode l1, ListNode l2) {
        ListNode dummy = new ListNode(0);  // Dummy node to hold the result list
        ListNode current = dummy;
        int carry = 0;  // Variable to store the carry value
While Loop
This loop continues until both linked lists are fully traversed and there is no carry left.
Value Extraction: For each node in 'l1' and 'l2', we extract the value if the node is not null; otherwise, we use 0.
Sum Calculation: We calculate the sum of the extracted values and the carry.
Carry Update: The carry is updated to 'total / 10'.
New Node Creation: We create a new node with the value 'total % 10' and link it to the current node.
Advance Nodes: We move the 'l1', 'l2', and 'current' pointers to their respective next nodes.
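Putting the steps together, a complete version might look like the sketch below. It follows the walkthrough in this post; the class name `AddTwoNumbersSketch` and the helpers `fromDigits` and `render` are additions for this example only, so the code can be run outside LeetCode, and the `ListNode` here is a trimmed copy of the provided class.

```java
// Trimmed copy of the LeetCode-provided node class, so the sketch is self-contained.
class ListNode {
    int val;
    ListNode next;

    ListNode(int val) { this.val = val; }
}

class AddTwoNumbersSketch {
    static ListNode addTwoNumbers(ListNode l1, ListNode l2) {
        ListNode dummy = new ListNode(0);   // placeholder in front of the result list
        ListNode current = dummy;
        int carry = 0;

        // Keep going while either list has digits left or a carry remains.
        while (l1 != null || l2 != null || carry != 0) {
            int x = (l1 != null) ? l1.val : 0;
            int y = (l2 != null) ? l2.val : 0;
            int total = x + y + carry;

            carry = total / 10;                       // carry for the next digit
            current.next = new ListNode(total % 10);  // digit for this position
            current = current.next;

            if (l1 != null) l1 = l1.next;
            if (l2 != null) l2 = l2.next;
        }
        return dummy.next;                  // skip the placeholder node
    }

    // Helper (not part of the LeetCode solution): builds a list from digits in list order.
    static ListNode fromDigits(int... digits) {
        ListNode dummy = new ListNode(0);
        ListNode tail = dummy;
        for (int d : digits) {
            tail.next = new ListNode(d);
            tail = tail.next;
        }
        return dummy.next;
    }

    // Helper (not part of the LeetCode solution): renders a list as "7 -> 0 -> 8".
    static String render(ListNode node) {
        StringBuilder sb = new StringBuilder();
        for (; node != null; node = node.next) {
            if (sb.length() > 0) sb.append(" -> ");
            sb.append(node.val);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        ListNode sum = addTwoNumbers(fromDigits(2, 4, 3), fromDigits(5, 6, 4));
        System.out.println(render(sum)); // prints "7 -> 0 -> 8"
    }
}
```

Including `carry != 0` in the loop condition is what handles inputs such as 99 + 1, where the result has one more digit than either input.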
Conclusion
This solution adds two numbers represented by linked lists in reverse order in a single pass. By using a dummy node and tracking the carry, the implementation stays clean and easy to understand, and it handles edge cases such as lists of different lengths or a leftover carry.



LeetCode - Two Sum Problem in Java
Thu, 25 Jul 2024 22:04:07 GMT
The Two Sum problem from LeetCode is an algorithm challenge where you need to find two numbers in an array that add up to a specific target.
*Problem Statement*
Given an array of integers 'nums' and an integer 'target', return the indices of the two numbers such that they add up to 'target'.
You may not use the same element twice.
*Example*
Input: nums = [2, 7, 11, 15], target = 9
Output: [0, 1]
Explanation: Because nums[0] + nums[1] == 9, we return [0, 1].
*Initial Approach*
Solution 1. Brute Force

class Solution {
    public int[] twoSum(int[] nums, int target) {
        // Loop through each element
        for (int i = 0; i < nums.length; i++) {
            // Loop through each element after the current element
            for (int j = i + 1; j < nums.length; j++) {
                // Check if the sum of the two elements equals the target
                if (nums[i] + nums[j] == target) {
                    // Return the indices of the two elements
                    return new int[] {i, j};
                }
            }
        }
        // If no solution is found, throw an exception
        throw new IllegalArgumentException("No two sum solution");
    }
}
#1. Loop through each element: The outer loop iterates through each element in the array, starting from the first element.
#2. Loop through each subsequent element: The inner loop iterates through the elements that come after the current element of the outer loop.
#3. Check if the sum equals the target: Inside the inner loop, we check whether the sum of the elements at indices 'i' and 'j' equals the target.
#4. Return the indices: If the sum equals the target, we return the indices '[i, j]'.
#5. Handle the no-solution case: If no pair sums to the target, we throw an 'IllegalArgumentException'.
*Complexity Analysis*

Time complexity: O(N^2)
Space complexity: O(1)

*Solution 2. Hash Map*
A more efficient solution uses a 'HashMap'.
-- What is a HashMap? --
A 'HashMap' is part of Java's Collections Framework and stores key-value pairs. It is an implementation of the 'Map' interface and is often used for its fast lookups.
⭐Key characteristics of HashMap⭐

unordered collection
allows null values (and one null key)
non-synchronized (wrap it with 'Collections.synchronizedMap()' when thread safety is needed)
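
The characteristics above can be checked with a few lines of Java. This is a small sketch; the class name `HashMapTraits` and the sample keys are made up for illustration.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Illustrative demo of the HashMap traits listed above.
class HashMapTraits {
    static Map<String, Integer> build() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put(null, 0);    // a single null key is allowed
        map.put("b", null);  // null values are allowed
        return map;          // iteration order is not guaranteed
    }

    // HashMap itself is not synchronized; this wrapper serializes access to it.
    static Map<String, Integer> threadSafe(Map<String, Integer> map) {
        return Collections.synchronizedMap(map);
    }

    public static void main(String[] args) {
        Map<String, Integer> map = threadSafe(build());
        System.out.println(map.get(null));        // prints 0
        System.out.println(map.containsKey("b")); // prints true
        System.out.println(map.get("b"));         // prints null
    }
}
```

Because 'get' returns null both for a missing key and for a key mapped to null, 'containsKey' is the reliable way to tell the two cases apart.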

🌟Solution using HashMap🌟

Calculate Complement
As we iterate through the array, we calculate the complement (i.e., 'target - nums[i]').
Check Complement
Before adding the current element to the map, we check whether its complement is already present in the map.
Return Indices
If the complement exists, we return the indices of the complement and the current element.
Store Current Element
Otherwise, we store the current element and its index in the map and move on.

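import java.util.HashMap;
import java.util.Map;

class Solution {
    public int[] twoSum(int[] nums, int target) {
        // HashMap to store each number and its index
        Map<Integer, Integer> map = new HashMap<>();

        // Iterate through the array
        for (int i = 0; i < nums.length; i++) {
            int complement = target - nums[i];
            // If the complement was seen earlier, we have our pair
            if (map.containsKey(complement)) {
                return new int[] {map.get(complement), i};
            }
            // Otherwise remember the current number and its index
            map.put(nums[i], i);
        }
        // If no solution is found, throw an exception
        throw new IllegalArgumentException("No two sum solution");
    }
}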

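Category	Examples	Characteristics and use cases
Relational (RDBMS)	MySQL, PostgreSQL, Oracle, MariaDB	Table-based structure, uses SQL, emphasizes transactions and consistency. Examples: ERP, banking, e-commerce systems
Key-Value	Redis, DynamoDB, Riak	Simple key-value structure, fast lookups. Examples: cache systems, session stores
Document	MongoDB, CouchDB, Firebase	Stores JSON-style documents, flexible schema. Examples: CMS, user data storage
Column-Family	Apache Cassandra, HBase	Stores data by column, well suited to large-scale analytics. Examples: log analysis, IoT data collection
Graph	Neo4j, Amazon Neptune	Based on nodes and edges, strong at relationship-centric data. Examples: social networks, recommendation systems
Time-series	InfluxDB, TimescaleDB	Optimized for storing data over time. Examples: sensor logs, server monitoring
Object-oriented	db4o, ObjectDB	Stores objects themselves, integrates closely with OOP. Examples: systems with complex object models
Multi-model	ArangoDB, OrientDB	Supports a mix of models (document + graph + key-value, etc.). Examples: composite systems with flexible structure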

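Component	Model	Function
Raspberry Pi 5	8GB	Main control board
Camera Module 3	RGB / NoIR	Recognizes the artwork the user is looking at
ReSpeaker 2-Mics Pi HAT	Seeed	Two built-in microphones, audio input/output
IR LED illuminator	850nm	Detects the user's pupil movement
Speaker	Wired or USB	Plays GPT responses as audio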

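	Part	Role	Connection / port
🧠	Raspberry Pi 5 (4GB)	Main controller	Power / GPIO / CSI / USB, etc.
🔌	PD 27W adapter	Power supply	USB-C port
💾	microSD 64GB + 32GB	OS and storage	microSD slot
🎙️	ReSpeaker 2-Mics Pi HAT	Voice input (microphone) + output (speaker)	GPIO header (I2S / I2C communication)
📷	Pi Camera Module 3	Color recognition / QR codes, etc.	CSI port 0 (22-pin)
🌙	NoIR IR camera	Infrared sensing / night-time recognition	CSI port 1 (FPC cable required)
🔌	FPC camera cable (22P → 15P)	Connects the Pi 5 to older cameras	CSI adapter cable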