zero-pepsi.log

Proxy

Thu, 19 Sep 2024 04:59:42 GMT

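What is a proxy?

A proxy server sits between a client and the web. When a client requests the information at some internet address, the proxy first looks for that information in what it has already stored. If it is there, the proxy responds immediately; if not, it connects to the web server at that address, fetches the requested information, stores it, and then responds.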
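
The look-in-the-cache-first behavior described above can be sketched in a few lines of Python. This is only an illustration: `CachingProxy` and `fake_origin` are hypothetical names, and the "origin server" is a plain function rather than a real HTTP request.

```python
from typing import Callable, Dict


class CachingProxy:
    """Toy proxy: answers from its cache, otherwise asks the origin server."""

    def __init__(self, fetch_from_origin: Callable[[str], str]) -> None:
        self._fetch = fetch_from_origin   # hypothetical stand-in for a real HTTP request
        self._cache: Dict[str, str] = {}

    def get(self, url: str) -> str:
        if url in self._cache:            # already stored: respond immediately
            return self._cache[url]
        body = self._fetch(url)           # not stored: ask the origin web server
        self._cache[url] = body           # store it for the next request
        return body


origin_calls = []


def fake_origin(url: str) -> str:
    """Pretend web server that records every request it receives."""
    origin_calls.append(url)
    return f"content of {url}"


proxy = CachingProxy(fake_origin)
first = proxy.get("http://example.com/a")
second = proxy.get("http://example.com/a")
print(first == second, len(origin_calls))
```

The second request is answered from the cache, so the origin function is called only once.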

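What is a forward proxy?

When a client tries to reach a web server, its request is not sent to the web server directly. It goes to the proxy server in between, and the proxy server forwards the request to the web server and brings back the response.

What is a reverse proxy?

The client sends its request to the address configured for the reverse proxy rather than to the web server's own address. The proxy server receives the request and makes a new request to the web server behind it, so the client cannot learn anything about the real web server.

Recommended uses

- Application delivery, including load balancing (TCP multiplexing)
- SSL offload/acceleration (SSL multiplexing)
- Caching
- Compression
- Content switching/redirection
- Application firewall
- Server obfuscation
- Authentication
- Single sign-on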

Network Layer

Thu, 19 Sep 2024 04:59:37 GMT

Internet Protocol Stack

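The set of protocols that computers use to exchange information over the internet. Because TCP and IP are the most widely used of them, it is also called the TCP/IP protocol suite.

- Application layer: DNS, TLS/SSL, FTP, HTTP, SMTP, MQTT, ...
- Transport layer: TCP, UDP, QUIC, DCCP, RSVP, ...
- Internet layer: IPv4, IPv6, ICMP, IGMP, IPsec, ECN, ...
- Link layer: ARP, NDP, OSPF, tunnels (L2TP), MAC, Ethernet, ...

The link layer can be further divided into the physical layer and the data link layer.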

OSI 7 Layers

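This model fits well when explaining SSL or TLS.

- Application layer: HTTP, SMTP, FTP, Telnet, SSH, NFS, ...
- Presentation layer: XDR, ASN.1, SMB, AFP, ...
- Session layer: TLS, SSL, ISO 8327, RPC, ...
- Transport layer: TCP, UDP, RTP, SCTP, SPX, ...
- Network layer: IP, ICMP, IGMP, ARP, RARP, RIP, ...
- Data link layer: Ethernet, Token Ring, wireless LAN, ...
- Physical layer: wire, optical fiber, coaxial cable, modem, ...

Layers 1 to 4 run in the OS:

1. Physical layer: data transfer rate, clock synchronization method, physical connection type, and so on
2. Data link layer: MAC address, frame
3. Network layer: IP address, packet; needs an address concept to distinguish hosts (routing path)
4. Transport layer: port number; needs an address concept to distinguish processes; process-to-process communication

Layers 5 to 7 run as user programs:

5. Session layer: session support
6. Presentation layer: handles the meaning and representation of data; encryption and compression
7. Application layer: typical internet services such as HTTP, FTP, Telnet, and mail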

Linux Basics

Thu, 12 Sep 2024 12:35:14 GMT

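1. What is Linux?

Linux is a computer operating system created by Linus Torvalds and the best-known example of free software and open source development. It is an open source project with one of the largest contributor communities in computing history, and anyone can take part in development and read the code.

By the way, the Linux mascot is a penguin named Tux :)

Linux is also closely tied to the GNU project: the Linux kernel is not itself a GNU package, but it is usually distributed together with GNU tools. GNU stands for "GNU's Not Unix" (a recursive acronym that contains itself), and the GNU project is a free software project started under the leadership of Richard Stallman. It arose as a reaction to a culture in which each company developed its own Unix operating system and did not share the source code. It is built on the philosophy of free software, and through the GNU Manifesto, which followed his initial announcement, and many other writings, Stallman argued for "bringing back the cooperative spirit that prevailed in the early computing community."

2. Structure of Linux

The structure of Linux can be broadly divided into four parts: applications, the shell, the kernel, and hardware.

When a user issues a command from an application such as an office document editor or a web browser, the shell interprets that command. This is why the shell is also called a command interpreter; it passes the interpreted user command on to the kernel. The kernel communicates with software through code that controls the hardware, and it controls and manages all of the system's resources.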

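3. Features and types of Linux

Linux is modeled on the Unix operating system (it is Unix-like) and is known for its stability, security, reliability, and performance. It manages and uses system resources efficiently and supports multi-user and multi-tasking operation.

- Multi-user: several users can access a single system at the same time
- Multi-tasking: several tasks run at the same time, taking turns using the computer's resources

Most Linux distributions also support both a CLI (command line) and a GUI (graphical interface), and their varied, powerful networking features make them well suited as a server OS. Linux delivers enterprise-grade performance on PC servers and also runs on low-spec PCs. As mentioned above, it is an open source project, so the kernel source code and related material are public, which supports rapid development. Many distributions exist to suit different working environments, and a rich set of applications is available.

Ubuntu, the best known and easy to use; Fedora, with a well-developed user interface; Raspbian, often used on the Raspberry Pi; and even the familiar Android are all kinds of Linux.

Package format | Package manager | Operating systems
Red Hat (.rpm) | yum | CentOS, Fedora
Debian (.deb) | apt | Ubuntu, Linux Mint, Raspbian
Android (.apk) | Android Package Manager | Android OS

Application areas: network and server equipment such as routers and access points, embedded systems and IoT devices such as the Raspberry Pi, TV set-top boxes, precision medical devices, Linux development servers, and so on.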

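4. The best-known Linux: Ubuntu

Ubuntu is the most widely recognized Linux. It is a Linux distribution built on Debian GNU/Linux that ships with a desktop environment.

Ubuntu: derived from the word "ubuntu" in the Bantu languages of southern Africa. It names an ethical or humanist philosophy centered on relationships and devotion between people, regarded as an intellectual root of peace movements. It carries the meaning of being open-minded and of helping and respecting others.

It is optimized for personal PC use, and being simple and easy to use is a big advantage. Its high recognition and large user base also mean there are many communities, so when a problem or question comes up while using Ubuntu it can be resolved quickly through them. Because it is updated roughly every six months, security issues and bugs are addressed quickly. Its interface is based on GNOME.

Version numbers: for example, 19.04 means the release published in April 2019. LTS stands for Long Term Support: the most stable releases, which Ubuntu supports for a long period (about five years).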

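5. GUI vs CLI

As noted above, most Linux distributions support both a GUI and a CLI, meaning a graphical window and a command window respectively.

5-1. GUI: Graphical User Interface

This is the interface that ordinary users typically use. It presents functions as graphics such as icons and images so that they are convenient to use. It supports mouse clicks and drag and drop, and the state of things is always visible, so it is easy to use. Both Windows and macOS, the operating systems most people use, provide one.

5-2. CLI: Command Line Interface

An interface in which the user and the computer interact through text. On Windows you can use a CLI through CMD, and on a Mac through Terminal.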

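6. Package manager: apt

apt is short for Advanced Package Tool. It is used to install, remove, and update software on Debian (.deb) and on distributions derived from it, such as Ubuntu. In the past, apt-get was used for installing, removing, and updating, and apt-cache was used separately for searching and inspecting. The newer apt command brings the most common of these operations together. For actions that need elevated privileges, put sudo in front of apt to obtain them.

6-1. Permissions? sudo?

For actions such as installing a package, entering apt install ~~ may be refused with a "permission denied" message. In that case, prefixing the command with sudo runs it with the privileges of root, the superuser who holds every permission on Linux. root has the right to control everything in the operating system.

sudo apt install package

Entering the command with sudo requests administrator privileges; once you enter your password, the package is installed with administrator rights.

6-2. File and folder permissions

The ls -al command shows all attributes of all files. A file's attributes look like this:

-rw-r--r-- 1 soryeongk elice 8980 Sep 18 11:52 soryeongk.txt

- - (first character): the file type; - means a file and d means a directory (folder)
- rw-r--r--: the file's permissions
- 1: the number of links
- soryeongk: the file owner
- elice: the owning group
- 8980: the file size
- Sep 18 11:52: the last modification time
- soryeongk.txt: the file name

Permissions are made up of r (read, 4), w (write, 2), and x (execute, 1), in three groups: owner, group, and other users. For example, rw-r--r-- means the owner can read and write, while the group and other users can only read.

The same thing can be written with numbers, where a hyphen (-) counts as 0. The numbers for each permission are added up, so rw (read and write) is 4+2 = 6 and r-x (read and execute) is 4+1 = 5. Read, write, and execute together are 4+2+1 = 7.

To change permissions, use these numbers with chmod [permissions] [file path or name]. For example, to give the owner read, write, and execute, the group read and execute only, and other users execute only, enter 751. 777 means every user gets every permission.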
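
The r=4, w=2, x=1 arithmetic can be checked with a small Python sketch. `symbolic_to_octal` is a hypothetical helper written for this note, not part of any Linux tool; it takes the nine permission characters without the leading file-type character.

```python
def symbolic_to_octal(perms: str) -> str:
    """Convert a 9-character permission string such as 'rwxr-x--x' to '751'."""
    if len(perms) != 9:
        raise ValueError("expected 9 characters, e.g. 'rw-r--r--'")
    weights = {"r": 4, "w": 2, "x": 1, "-": 0}
    digits = []
    for start in range(0, 9, 3):          # owner, group, other users
        triplet = perms[start:start + 3]
        digits.append(str(sum(weights[ch] for ch in triplet)))
    return "".join(digits)


print(symbolic_to_octal("rw-r--r--"))   # 644
print(symbolic_to_octal("rwxr-x--x"))   # 751
```

The second call reproduces the 751 example: 4+2+1 for the owner, 4+1 for the group, and 1 for other users.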

To change the owner, enter chown [new owner]:[new group] [file path or name]; this requires root privileges.

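7. The Linux file system

First, a file is a collection of data stored in hardware storage such as main memory or a disk.

A file system is a set of rules agreed on in advance for reading and writing data on a storage device. Where data ends up on a hard disk or SSD depends on these rules. For that reason, the method of managing files so that they can be stored and found is also called a file system. You can think of it as the policy for how files are managed.

Most file systems are organized as directories and files. In the Linux file system, every file and directory is created hierarchically under the root directory (/).

7-1. Types of file systems

FAT: File Allocation Table
A file system widely used on most memory cards fitted to digital cameras and similar devices, and on a great many computer systems. Its drawback is that, because its data structures are so simple, it makes poor use of space when there are many small files. Its supported capacity has kept growing across versions, and it is highly compatible.

NTFS: New Technology File System
The file system of the Windows NT family, created to replace the older FAT structure. After a system failure or corruption it can rebuild the disk volume and restore it to a consistent state, so it is more reliable, and its security is also improved over FAT.

EXT: EXTended file system
Short for extended file system, this is the default file system on Linux. It has been released as a series with improving performance: ext3 and ext4 are backward compatible with the second extended file system, ext2, which used to be the default. From ext3 onward, large directories are accessed through HTree, a hash-based index, which makes lookups easier, and ext4 supports the largest file systems of the series so far.

7-2. The Linux directory structure

Every directory is created below the top-level directory, root.

- bin: folder holding the basic commands
- boot: folder holding what is needed to boot (start) Linux
- etc: folder holding almost all of Linux's configuration files
- home: the home folder, literally; a folder is created for each account that logs in
- lib: folder of libraries used by Linux and by various programs

🌱root
┣ 📦bin
┣ 📦home
┃ ┣ 📂soryeongk
┃ ┃ ┣ 📂Desktop
┃ ┃ ┣ 📂some folder
┃ ┃ ┗ 📜index.html
┣ 📦lib
┣ 📦usr
┣ 📦boot
┗ 📦etc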

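8. Linux commands

- head, tail: print the first and last N lines respectively; often used together with cat
  Usage: cat [filename] | head -n [N], cat [filename] | tail -n [N]
- alias: gives a command a custom name
- su: switches the current user
  Usage: su [account name; root if omitted], then enter the password
- more: unlike cat, prints one screen at a time; press the space bar to move down through the content
  Usage: more [filename]
- which: prints the absolute path of a command, so you can see where it is located
  Example: which cat
- wc: prints the number of bytes, characters, words, and lines in a file
  Usage: wc [option] [filename]
- shutdown: shuts down or reboots the system
  Examples: shutdown -r now (reboot immediately), shutdown -h now (shut down immediately)
- diff: shows the differences between two files
  Usage: diff [filename1] [filename2]

8-1. File redirection

Redirection changes the flow of the standard streams (standard input, standard output, and standard error) so that they go to a file instead of their usual destination. It is done with < and >.

Standard streams?

- stdin: standard input, keyboard input
- stdout: standard output, screen output (cat, ls)
- stderr: standard error output

Example: entering ls > existingFilename.txt saves the result of ls to existingFilename.txt instead of printing it to the console. Be careful, because this replaces the file's existing contents. To keep them, use >> instead of >, as in ls >> newFilename.txt: if newFilename.txt already exists, its contents are not erased and the output is appended after the last line. With either operator, the file is created if it does not exist yet.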

File redirection also works when you have written a Python file and stored its input in another file. For example, suppose the following files are in the same folder.

add_3.py

input_num = input()
answer = int(input_num) + 3
print(answer)

input_num.txt

10

result.txt

The result is

The command that feeds the contents of input_num.txt, 10, to add_3.py as input and then appends the result to result.txt without erasing what is already there is built up as follows.

1. Call the interpreter to run the code: python [python filename]
2. Use < to make the content on the right the input of the command on the left.
3. Append the result to the existing contents of result.txt.

Answer: python add_3.py < input_num.txt >> result.txt

Explanation: python [python filename] calls the interpreter to run the code, and < makes the file on its right the input of the command on its left. >> is then used to append the result after the existing contents of result.txt.

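8-2. Piping commands

A pipe, which resembles file redirection, separates commands with |. It lets you chain several commands so that the output of one becomes the input of the next. For example, the command written above with file redirection, python add_3.py < input_num.txt >> result.txt, can be carried out in the same way with a pipe. The steps are as follows.

1. Print the contents of input_num.txt.
2. Run add_3.py with that output as its input.
3. Write the result to result.txt (using file redirection).

Answer: cat input_num.txt | python add_3.py >> result.txt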
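
For comparison, here is a rough Python sketch of what the shell does for `<`, `>>`, and `|`, using the standard `subprocess` module. It recreates the three example files in a temporary folder (an assumption made so that the sketch is self-contained) and runs add_3.py with the current interpreter.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Recreate the example files in a scratch folder.
workdir = Path(tempfile.mkdtemp())
(workdir / "add_3.py").write_text(
    "input_num = input()\nanswer = int(input_num) + 3\nprint(answer)\n"
)
(workdir / "input_num.txt").write_text("10\n")
(workdir / "result.txt").write_text("The result is\n")

# Equivalent of: python add_3.py < input_num.txt >> result.txt
with open(workdir / "input_num.txt") as stdin_file, \
        open(workdir / "result.txt", "a") as stdout_file:
    subprocess.run(
        [sys.executable, "add_3.py"],
        stdin=stdin_file,     # "<": standard input comes from a file
        stdout=stdout_file,   # ">>": standard output is appended to a file
        cwd=workdir,
        check=True,
    )

result_text = (workdir / "result.txt").read_text()

# Equivalent of the pipe: the text "10" is fed to the program's standard input.
piped = subprocess.run(
    [sys.executable, "add_3.py"],
    input="10\n",
    capture_output=True,
    text=True,
    cwd=workdir,
    check=True,
)

print(result_text)
print(piped.stdout)
```

Opening result.txt in append mode ("a") is what corresponds to `>>`; opening it with "w" would behave like `>` and overwrite the existing line.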

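9. The Linux notepad: the nano editor

A light, simple text editor available on UNIX-compatible systems that makes it easy to edit file contents.

Shortcut | Function
ctrl + o | Save
ctrl + x | Exit
ctrl + w | Search
Alt + 6 | Copy
ctrl + u | Paste
ctrl + ^ | Select multiple lines

10. mount

Mounting means attaching a physical storage device (secondary storage) to a directory (folder). On Windows, when you connect secondary storage such as a hard drive or USB stick, it is attached to a directory (folder) automatically. A USB stick can be used as soon as it is plugged in; this is called Plug and Play, or PnP.

On Linux, however, this does not always happen automatically, particularly on systems without a desktop environment. When you add secondary storage, you may have to perform the mount yourself to attach it.

Basic form of the command: mount [option] [device] [directory]

This attaches the contents of device to directory. You need to know the device name, which you can check with fdisk -l.

options

- -a: mounts the file systems listed in /etc/fstab
- -t: specifies the file system type when it is not taken from /etc/fstab
- -o: applies additional settings; separate multiple settings with ,

Print information about mounted disks: df

Unmount: umount [device or directory]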

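11. Linux processes

A process is any program that has been loaded into memory on the system and is running.

A program is a collection of code (instructions) produced by coding, and a process is the current state of a program while it is being executed. In other words, a running program is a process, and it is held in RAM. When several processes are created within one program, this is called multiprocessing. All of them are managed by the operating system.

11-1. Characteristics of processes

- Every program has at least one process when it runs.
- Processes can run concurrently.
- There are parent processes (identified by the PPID) and child processes (copies created through fork).
- They are managed by the kernel.
- Every process has an owner (a Linux account).
- Each process is given an ID (PID) that identifies it.

PID: every process has a unique PID. PID 1 is the init process (systemd on many current distributions) and PID 2 is kthreadd (the kernel thread daemon). init is the ancestor of all other user-space processes: every process other than kthreadd and its children descends from init through fork. kthreadd, in turn, is the parent of the kernel threads created afterward.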
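
The PID and PPID relationship can be observed from Python. This sketch, which assumes a Linux machine, starts a child interpreter and has it report its own PID and its parent's PID.

```python
import os
import subprocess
import sys

# Start a child Python process that prints its PID and its parent's PID.
child = subprocess.run(
    [sys.executable, "-c", "import os; print(os.getpid(), os.getppid())"],
    capture_output=True,
    text=True,
    check=True,
)
child_pid, child_ppid = (int(x) for x in child.stdout.split())

print("this process :", os.getpid())
print("child PID    :", child_pid)
print("child's PPID :", child_ppid)
```

The child's PPID should match the PID of the script that started it, which is the parent and child link that ps -ef shows in its PPID column.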

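11-2. Memory (RAM) layout

Process memory is broadly divided into the kernel address space and the user address space, and the user cannot access the kernel part. The space we use is divided into four regions, stack, heap, data, and text, and items such as argc, argv, and the environment variables are also part of the stack region.

11-3. Related commands

List processes: ps [option]

- -e: prints information about every running process
- -f: shows full information
- -a: prints processes of all users that are running
- -u: prints the user who started each process, the process start time, and so on
- -x: also shows processes without a controlling terminal
- options can be combined

Terminate a process: kill [option] [PID]

- -l: prints the list of available signals
- -1: restart (SIGHUP)
- -9: force kill (SIGKILL)
- -15: normal termination (SIGTERM)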
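
As a small illustration of kill -15, the following sketch (assuming Linux or another POSIX system) starts a sleeping child process and sends it SIGTERM from Python. The child here is a stand-in for any long-running command.

```python
import signal
import subprocess
import sys

# A stand-in long-running process (plays the role of `sleep 60`).
proc = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])

proc.send_signal(signal.SIGTERM)    # same effect as: kill -15 <PID>
exit_code = proc.wait(timeout=10)   # wait for the process to end

print("PID", proc.pid, "exit code", exit_code)
```

On POSIX systems, `subprocess` reports a process ended by a signal as a negative return code, so the exit code here is the negated signal number.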

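12. job

Almost every command run from a Linux terminal works in the foreground, that is, right in the screen we are looking at. With &, however, a command can be sent to the background and keep running out of sight.

A job is a task started from the terminal in this way, and the jobs command lists background jobs so that they can be managed efficiently. Unlike processes in general, jobs refer only to work started by terminal commands, and each terminal has its own jobs. In other words, jobs depend on the terminal: when the terminal closes, its jobs end with it.

Adding & after a command runs it in the background, and the resulting list can be viewed with jobs. As with any process, you can find the PID with ps and terminate it that way, or terminate it without options using kill %[job number].

For example, running the sleep command, which pauses for a while, in the background and then terminating it looks like this.

$ sleep 500 &
$ sleep 700 &
$ jobs

The output looks like this.

[1]- Running sleep 500 &
[2]+ Running sleep 700 &

To terminate sleep 500 &, enter kill %1.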

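13. Scheduling tasks

13-1. at

A command that schedules a task to run once at a specified time; when the time comes the task runs and then disappears from the task list.

at [option] [time] [date] [+increment]

- -m: sends the user mail when the task completes, along with the output (mail is sent even if there is no output)
- -f: used to run a given script file or similar; at now + 3 hours -f soryeongk.sh means run soryeongk.sh three hours from now
- -l: prints the list of scheduled tasks; does the same as the atq command
- -v: prints the exact time the task will run
- -d: deletes a scheduled task; does the same as the atrm command

atq: shows the list of scheduled at tasks (at number, date, time, command)
atrm [at number]: deletes that scheduled task

13-2. crontab

Unlike at, crontab can run scheduled tasks periodically. More examples can be found in my personal project that used crontab. Those notes were written while I was studying Linux, so they have many shortcomings ಥ_ಥ

crontab [option] [text matching the option]

- -l: shows the crontab settings of the current account
- -e: edits the crontab of the current account
- -r: deletes the entire crontab of the current account
- -u: works on a specific user's crontab; requires root privileges, so use it with sudo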

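14. SSH

Short for Secure SHell, SSH is a protocol for accessing another computer over the network and running commands on it. Through SSH you can connect to a Linux machine from another computer and run commands and programs. (Example: [AWS] Using Jupyter Notebook on an Amazon virtual server)

Telnet: the protocol used before SSH for accessing another computer and running commands on it, but it has a critical security flaw. Packet data is not encrypted, so if it is intercepted in transit, sensitive information such as passwords is exposed. For this reason SSH encrypts its data.

On Ubuntu, SSH is provided by the openssh packages, and after installing Ubuntu only openssh-client is installed by default. To connect to the Ubuntu machine from another computer, you need to install the openssh-server package.

You can check whether openssh is installed with dpkg -l | grep openssh, and install it with sudo apt-get install openssh-server.

14-1. Starting the ssh server

sudo service ssh start
service --status-all | grep +

To stop the server, use stop instead of start; to restart it, use restart. Entering the commands above in order prints the list of running services; to see only ssh, enter service --status-all | grep ssh.

14-2. Checking the ssh port

To use ssh, the other computer needs to know which port to connect to on this machine. The sudo netstat -antp command shows the port the running ssh server is using: it lists the currently running processes and their ports along with their PIDs.

14-3. Connecting over ssh

Entering ssh [server user ID]@[IP address, server name, or domain] connects to that server.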

Tableau quick table calculation

Wed, 11 Sep 2024 13:37:24 GMT

Quick Table Calculation

Running Total
Difference
Percent Difference
Percent of Total
Rank
Percentile
Moving Average
YTD Total
Compound Growth Rate
Year over Year Growth
YTD Growth

Running Total

Blue: monthly sales. Orange: cumulative sales (running total).

Difference

Each value is the difference in total sales from the previous quarter, shown for Q2 2016 through Q4 2019.

The first value is null because Q1 2016 has no earlier quarter to compare with.

Difference - different format examples

Comparison based on the first value

Comparison based on the January value of each year

Percent of Total

Share of the total for each "중분류" (mid-level category)

Rank

The first ranking is computed across the entire table; the second is computed within the pane for each product category.

Percent Difference

-> growth rate

Year Over Year Growth

Year-on-year growth rate -> Comparison with previous year in the same month

Percentile

Expressed as a percentile rank (top 5%, top 10%, and so on)

Moving average

Choose the number of periods to average over, sum the values in that window, and divide by the number of periods.

Used for test score averages and stock data

The average of Q2, Q3, and Q4 2019 is 337,600,948

Change the averaging period

Where the value is above the orange moving average line, performance is relatively good; where it is below the line, performance is relatively poor.
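
The moving average calculation can be sketched in plain Python. `moving_average` is a hypothetical helper and the sales figures are made up for illustration. Leaving the first incomplete windows empty is an assumption of this sketch; a BI tool may treat them differently.

```python
from typing import List, Optional


def moving_average(values: List[float], window: int) -> List[Optional[float]]:
    """Trailing moving average: mean of the current value and the window-1 before it."""
    if window < 1:
        raise ValueError("window must be at least 1")
    result: List[Optional[float]] = []
    for i in range(len(values)):
        if i + 1 < window:
            result.append(None)                    # not enough periods yet
        else:
            chunk = values[i + 1 - window:i + 1]   # the last `window` values
            result.append(sum(chunk) / window)
    return result


quarterly_sales = [100.0, 120.0, 90.0, 150.0, 180.0]   # hypothetical figures
averages = moving_average(quarterly_sales, window=3)
print(averages)
```

Changing `window` corresponds to changing the averaging period: a longer window gives a smoother line that reacts more slowly.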

YTD Total

The total from the start of the year up to a given point in time

For example, the YTD value for February 2016 is the sum of January and February 2016 sales
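
A YTD total is a running total that restarts each year. A minimal sketch with `itertools.accumulate`, using made-up monthly figures:

```python
from itertools import accumulate

# Hypothetical monthly sales for one year, in calendar order.
monthly_sales_2016 = {"Jan": 50, "Feb": 70, "Mar": 40}

# Pair each month with the cumulative sum up to and including that month.
ytd = dict(zip(monthly_sales_2016, accumulate(monthly_sales_2016.values())))
print(ytd)
```

The February entry is the January value plus the February value, which is the YTD February value described above.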

Compound Annual Growth Rate (CAGR)

= average annual growth rate

The growth over several years is converted to an average annual rate, computed as a geometric mean rather than an arithmetic mean.
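
The geometric-mean definition corresponds to the usual formula CAGR = (end / start)^(1 / years) - 1. A short sketch with made-up numbers (`cagr` is a hypothetical helper name):

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate between two values `years` apart."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1 / years) - 1


# Hypothetical example: sales grow from 100 to 121 over 2 years.
rate = cagr(100.0, 121.0, 2)
print(f"{rate:.2%}")   # 10.00%
```

Growing by 10% twice gives 100 × 1.1 × 1.1 = 121, which is why the geometric mean, not the arithmetic mean of yearly rates, recovers the constant annual rate.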

YOY growth

Growth rate compared to last year

SQL - HackerRank Certi

Wed, 11 Sep 2024 13:15:07 GMT

A/B testing study

Tue, 10 Sep 2024 12:54:28 GMT

A/B testing steps

Pre-requisites
Experiment Design
Running Experiment
Result to Decision
Post-launch Monitoring

1.Pre-requisites

Objective & Key Metrics
- Key Metric
  - Revenue
  - Fair only when N(control) = N(treatment)
  - Otherwise normalize revenue by the number of users
  - Revenue per user
Variants
- Control group : checkout
- Treatment group 1: display similar products in checkout
- Treatment group 2: popup similar products window in checkout
Randomization units
- Users
- Assume enough users

2.Experiment Design

User to target
- All users?
- Specific segment of users?

If users are chosen from "Land on homepage", most of them will never see the feature; therefore, target users should come from "Start checkout"

Practical significance boundary
- Assume practical significance:
  - Revenue increase: $2 per user
- Power of the test: 80% (industry standard)
- Significance level: 5% (industry standard)
- Sample size per variant: n ≈ 16 * sigma^2 / delta^2 (sigma = standard deviation of the population, delta = difference between treatment and control)

Assume sigma is 20 for this example:

16*20^2 / 2^2 = 1600 (we need 1600 unique users in each variant)

Therefore 4800 unique users are needed for 3 variants
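
The constant 16 is a rounded form of 2 × (z_alpha/2 + z_beta)^2 for a 5% significance level and 80% power. The sketch below compares the rule of thumb with that fuller expression using only the standard library. `sample_size_per_variant` is a hypothetical helper, and the formula assumes a two-sided test on a difference of means with equal group sizes.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(sigma: float, delta: float,
                            alpha: float = 0.05, power: float = 0.80) -> int:
    """Users needed per variant to detect a difference `delta` (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)            # about 0.84 for power = 0.80
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2)


rule_of_thumb = 16 * 20 ** 2 / 2 ** 2       # the calculation from the notes
exact = sample_size_per_variant(sigma=20, delta=2)

print(rule_of_thumb, exact)
print(sample_size_per_variant(20, 1))               # smaller change to detect
print(sample_size_per_variant(20, 2, alpha=0.025))  # smaller significance level
```

Both a smaller delta and a smaller alpha increase the required sample size, which matches the "need more samples" cases in these notes.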

Other cases - more samples are needed when

The change to detect is smaller, delta = $1 per user
The significance level is smaller, alpha = 2.5%

Decide how long to run (consider 4 factors)

1. Ramp-up plan: Start with dozens of users
- No bugs
- Traffic can be handled
- Expose to a small population
- Gradually increase percentage

Assume 2000 users per day entering checkout; at full traffic, 4800 users would take about 2.4 days (4800 / 2000)

2. Day of week effect
- People behave differently (people make more purchases on payday)

Recommended - Run the experiment for more than 1 whole week

3. Seasonality
- Holiday season (surge in sales during Black Friday)

Data collected during the holidays cannot be used for analysis, so run the experiment longer

4. Primacy and novelty effects
- Users respond to changes differently
3.Running Experiment

Run the experiment according to the experiment design and collect log data

Running the experiment for too long will not improve precision any further

4.Result to Decision

Before going into the analysis

Sanity checks
- Results are unreliable if assumptions are violated

Things to check

Number of users assigned to each group
Latency when loading the webpage (user experience is consistent across groups)

Hypothesis test to make a recommendation

Recommend launching a change when

Statistically significant
Practically significant
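
One common way to combine the two criteria is to compare the confidence interval of the measured difference with the practical significance boundary. The sketch below is an illustration under assumptions: `recommend` is a hypothetical helper, the numbers are made up, and a normal approximation is used for the interval.

```python
from statistics import NormalDist


def recommend(diff: float, std_err: float, practical_boundary: float,
              alpha: float = 0.05) -> str:
    """Turn an observed treatment-minus-control difference into a recommendation."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    low, high = diff - z * std_err, diff + z * std_err   # confidence interval
    if low >= practical_boundary:
        # The whole interval is above the boundary: statistically and
        # practically significant.
        return "launch"
    if high < practical_boundary:
        # Even the optimistic end is below the boundary.
        return "do not launch"
    # The interval straddles the boundary: the result is arguable.
    return "run a follow-up test with more power"


# Hypothetical results in revenue per user, with a $2 practical boundary.
print(recommend(diff=3.0, std_err=0.4, practical_boundary=2.0))
print(recommend(diff=0.5, std_err=0.3, practical_boundary=2.0))
print(recommend(diff=1.5, std_err=1.0, practical_boundary=2.0))
```

The third case, where the interval covers both "no impact at all" and "impact large enough", is the arguable situation in which a follow-up test with more power is the usual recommendation.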

Treatment 1 vs control

Result of treatment 1 (arguable)

No impact at all or
Impact is significant enough

Recommendation of treatment 1

Due to some uncertainty

Do not launch the change
Run a follow-up test with more power

Treatment 2 vs control

Recommendation of treatment 2

Run a follow-up test with more power

Data analysis dashboard (Excel)

Fri, 06 Sep 2024 13:08:57 GMT

Name of the file : Data Analysis of Coca-Cola 2023 & 2024 USA Sales Performance

Data source : kaggle

Data analysis dashboard practice using Excel

Structure
Analysis
Visuals
Slicers

1. Structure

Set up the basic structure for the dashboard

2. Analysis

Analysis with pivot table

3. Visuals

Visual representation through charts and tables

1.Sales

KPIs

Sales and margin chart

4. Adding slicers

Make dashboard more dynamic

Region - Midwest

Region - South

default (all regions and all years)

Tableau dashboard

Fri, 06 Sep 2024 13:08:53 GMT

Action (6 options)

Filter + Highlight

Go to sheet

Change parameter

Parameter Dashboard

Change set values

URL

Layout modification

Story (collection of dashboards)

Is the topic of interest to the audience?
Ultimately, what do I want to say?
Planning according to the flow from start to finish
Avoid using too much data or computational parameters unnecessarily.
If complex calculations are made with data, it takes a long time to load the dashboard.

SQL - HackerRank Certi

Fri, 06 Sep 2024 13:08:49 GMT

Coffee shops analysis (MySQL)

Wed, 04 Sep 2024 07:50:13 GMT

Problem 1.

Create a project-related database in AWS RDS (MySQL) and create an accessible user account.

Database Name: oneday
User Name / Password: oneday / 1234

Call required module

# Install the connector first (shell command, run once):
# pip install mysql-connector-python
import mysql.connector

# Connect to the AWS RDS instance with the admin account
mydb = mysql.connector.connect(
    host = "your aws rds's host name",
    port = 3306,
    user = "your ID",
    password = "your password"
)

cursor = mydb.cursor(buffered = True)

Account setting

sql = 'create database oneday default character set utf8mb4'
cursor.execute(sql)

cursor.execute("create user 'oneday'@'%' identified by '1234'")
cursor.execute("grant all on oneday.* to 'oneday'@'%'")

Checking

Database creation statement query result: SHOW CREATE DATABASE oneday;
User permission check result: SHOW GRANTS FOR 'oneday'@'%';

result1 = "show create database oneday"
cursor.execute(result1)

result1 = cursor.fetchall()
for i in result1:
    print(i)

result2 = "show grants for 'oneday'@'%'"
cursor.execute(result2)

result2 = cursor.fetchall()
for i in result2:
    print(i)

Problem 2.

Create tables to store Starbucks and Ediya data

cursor.execute('use oneday')

cBrand = "create table COFFEE_BRAND(id int not null auto_increment primary key, name varchar(12))"
cursor.execute(cBrand)

cStore = "create table COFFEE_STORE(id int not null auto_increment primary key, brand int, name varchar(32) not null, gu_name varchar(5) not null, address varchar(128) not null, lat decimal(16,14) not null, lng decimal(17,14) not null, foreign key (brand) references COFFEE_BRAND(id))"
cursor.execute(cStore)

Checking

Table creation result: Desc COFFEE_BRAND; Desc COFFEE_STORE;
COFFEE_BRAND query result: SELECT * FROM COFFEE_BRAND;

result3 = 'desc COFFEE_BRAND'
cursor.execute(result3)

result3 = cursor.fetchall()
for i in result3:
    print(i)

result4 = 'desc COFFEE_STORE'
cursor.execute(result4)

result4 = cursor.fetchall()
for i in result4:
    print(i)

Problem 3.

Enter and check the COFFEE_BRAND data with Python code as follows.

cursor = mydb.cursor(buffered = True)
cursor.execute("insert into COFFEE_BRAND values (1, 'STARBUCKS'), (2, 'EDIYA')")
mydb.commit()  # the connection object in this post is mydb

Checking

result5 = "select * from COFFEE_BRAND"
cursor.execute(result5)

result5 = cursor.fetchall()
for i in result5:
    print(i)

Problem 4.

When importing data from the Starbucks page with Python code, modify it to enter directly into the COFFEE_STORE table.

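import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By

url = "https://www.starbucks.co.kr/store/store_map.do"
driver = webdriver.Chrome()
driver.get(url)

# Open the region search tab, then the city/province selector
driver.find_element(By.CSS_SELECTOR,'#container > div > form > fieldset > div > section > article.find_store_cont > article > header.loca_search > h3 > a').click()
time.sleep(0.5)
driver.find_element(By.CSS_SELECTOR, '.set_sido_cd_btn').click()
time.sleep(0.5)

# Click the first entry in the list
xpath = '//*[@id="mCSB_2_container"]/ul/li[1]/a'
tag = driver.find_element(By.XPATH, xpath)
tag.click()
time.sleep(0.5)

# Parse the loaded store list; each li carries the store's data-* attributes
soup = BeautifulSoup(driver.page_source, "html.parser")
seoul_list = driver.find_elements(By.CSS_SELECTOR, '#mCSB_3_container ul li')
# Spot check: address text of the 100th store, minus its trailing 9 characters
soup.select_one('#mCSB_3_container > ul > li:nth-child(100) > p').text[:-9]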

Read entire data

from tqdm import tqdm_notebook

cursor = mydb.cursor(buffered=True)

sql = "insert into COFFEE_STORE (brand, name, gu_name, address, lat, lng) values (1, %s, %s, %s, %s, %s)"

cnt = 1

for content in tqdm_notebook(seoul_list):
    name = content.get_attribute('data-name')
    address = soup.select_one(f'#mCSB_3_container > ul > li:nth-child({cnt}) > p').text[:-9]
    lat = content.get_attribute('data-lat')
    lng = content.get_attribute('data-long')
    # The district (gu) is the second word of the address
    gu_name = address.split()[1] if address else ''

    cnt += 1

    cursor.execute(sql, (name, gu_name, address, lat, lng))
    mydb.commit()

# Close the browser only after every store has been read
driver.close()

Count Records

count_query = 'SELECT COUNT(*) FROM COFFEE_STORE where brand = 1;'
cursor.execute(count_query)

record_count = cursor.fetchone()[0]
record_count

Checking 10 records

check = "select * from COFFEE_STORE where brand = 1 limit 10"
cursor.execute(check)

result = cursor.fetchall()
for i in result:
    print(i)

Problem 5.

When importing data from the Ediya page with Python code, modify it to enter directly into the COFFEE_STORE table.

Collecting Data

driver = webdriver.Chrome()
driver.get('https://www.ediya.com/contents/find_store.html')
driver.find_element(By.CSS_SELECTOR, '#contentWrap > div.contents > div > div.store_search_pop > ul > li:nth-child(2) > a').click()

sql = "INSERT INTO COFFEE_STORE (brand, name, gu_name, address, lat, lng) VALUES (2, %s, %s, %s, %s, %s)"

# gu_list (the district names to search for) is not defined in this post
for gu in tqdm_notebook(gu_list):

    # Type the district name into the search box and submit
    keyword = driver.find_element(By.CSS_SELECTOR, '#keyword')
    keyword.clear()
    keyword.send_keys(gu)

    driver.find_element(By.CSS_SELECTOR, '#keyword_div > form > button').click()

    time.sleep(1)

    html = driver.page_source
    soup_ed = BeautifulSoup(html, 'html.parser')
    contents = soup_ed.select('#placesList li')

    for content in contents:
        name = content.select_one('dt').text
        address = content.select_one('dd').text
        gu_name = address.split(' ')[1]

        print(f'{name}--{address}--{gu_name}')

        # lat and lng are not set anywhere in this loop; they have to be
        # obtained for each address (that step is not shown in this post)
        cursor.execute(sql,(name, gu_name, address, lat, lng))
        mydb.commit()

# Close the browser only after every district has been searched
driver.close()

Records count

count_query = 'SELECT COUNT(*) FROM COFFEE_STORE where brand = 2;'
cursor.execute(count_query)

record_count = cursor.fetchone()[0]
record_count

Checking 10 records

cursor.execute("select * from COFFEE_STORE where brand = 2 limit 10;")
result = cursor.fetchall()
for row in result:
    print(row)

Checking

Main distribution areas of Starbucks stores (names of top 5 districts with the most stores, output of number of stores)

strb = "select s.gu_name, count(s.brand) from COFFEE_BRAND as b, COFFEE_STORE as s where b.id = s.brand and b.name='STARBUCKS' group by s.gu_name order by count(s.brand) desc limit 5"

cursor.execute(strb)
result = cursor.fetchall()

for row in result:
    print(row)

Ediya store main distribution area (name of top 5 districts with most stores, output number of stores)

edy = "select s.gu_name, count(s.brand) from COFFEE_BRAND as b, COFFEE_STORE as s where b.id = s.brand and b.name='EDIYA' group by s.gu_name order by count(s.brand) desc limit 5"

cursor.execute(edy)
result = cursor.fetchall()

for row in result:
    print(row)

Number of stores of each brand by district (output: district name, brand name, number of stores)

gu_each_st = "select gu_name, '스타벅스' as brand, count(brand) as count from COFFEE_STORE where brand = 1 group by gu_name"

cursor.execute(gu_each_st)
result = cursor.fetchall()

for row in result:
    print(row)

gu_each_ed = "select gu_name, '이디야' as brand, count(brand) as count from COFFEE_STORE where brand = 2 group by gu_name"

cursor.execute(gu_each_ed)
result = cursor.fetchall()

for row in result:
    print(row)

Number of stores of each brand by district (output: district name, number of Starbucks stores, number of Ediya stores)

count = ("select s.gu_name, "
         "sum(s.brand=1) as count1, "
         "sum(s.brand=2) as count2 "
         "from COFFEE_BRAND b, COFFEE_STORE s " 
         "where b.id = s.brand " 
         "group by s.gu_name " 
         "order by s.gu_name;")

cursor.execute(count)
result = cursor.fetchall()

for row in result:
    gu_name, count1, count2 = row
    print(f'({gu_name}, {count1}, {count2})')

Problem 6.

Save as CSV file. (Working with Python code)

final = ("select A.*, B.* from (select * from COFFEE_STORE where brand=1) as A join (select * from COFFEE_STORE where brand=2) as B on B.gu_name = A.gu_name")

cursor.execute(final)
result = cursor.fetchall()

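import pandas as pd

# Columns 1-7 are the Starbucks store (A.*), columns 8-14 the Ediya store (B.*)
df = pd.DataFrame(result)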
df.columns = ['s_id', 's_brand', 's_name', 's_gu', 's_address', 's_lat', 's_lng', 
              'e_id', 'e_brand', 'e_name', 'e_gu', 'e_address', 'e_lat', 'e_lng']

df.to_csv('./starbucks_ediya.csv', index = False, encoding = "euc-kr")

SQL - Contest Leaderboard

Sat, 31 Aug 2024 13:52:43 GMT

You did such a great job helping Julia with her last coding contest challenge that she wants you to work on this one, too!

The total score of a hacker is the sum of their maximum scores for all of the challenges. Write a query to print the hacker_id, name, and total score of the hackers ordered by the descending score. If more than one hacker achieved the same total score, then sort the result by ascending hacker_id. Exclude all hackers with a total score of 0 from your result.

Input Format

The following tables contain contest data:

Hackers: The hacker_id is the id of the hacker, and name is the name of the hacker.

Submissions: The submission_id is the id of the submission, hacker_id is the id of the hacker who made the submission, challenge_id is the id of the challenge for which the submission belongs to, and score is the score of the submission.

Sample Input

Hackers Table:

Submissions Table:

Sample Output

4071 Rose 191
74842 Lisa 174
84072 Bonnie 100
4806 Angela 89
26071 Frank 85
80305 Kimberly 67
49438 Patrick 43

Explanation

Answer :

SELECT h.hacker_id, h.name, SUM(score) FROM (
    SELECT hacker_id, challenge_id, MAX(score) AS score FROM SUBMISSIONS
    GROUP BY hacker_id, challenge_id
)t 
JOIN Hackers h on t.hacker_id = h.hacker_id
GROUP BY h.hacker_id, h.name
HAVING SUM(score) > 0
ORDER BY SUM(score) desc, h.hacker_id
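
The same logic (best score per challenge, summed per hacker, zero totals dropped, sorted by total descending and then hacker_id ascending) can be traced in Python. The hackers and submissions below are made-up rows, not the HackerRank sample data, and scores are assumed to be non-negative.

```python
from collections import defaultdict

hackers = {1: "Ann", 2: "Bob", 3: "Cho"}   # hypothetical hacker_id -> name
submissions = [                            # (hacker_id, challenge_id, score)
    (1, 10, 30), (1, 10, 50), (1, 11, 20),
    (2, 10, 70), (2, 10, 10),
    (3, 11, 0),
]

# Inner query: MAX(score) per (hacker_id, challenge_id)
best = defaultdict(int)
for hacker_id, challenge_id, score in submissions:
    key = (hacker_id, challenge_id)
    best[key] = max(best[key], score)

# Outer query: SUM of those maxima per hacker
totals = defaultdict(int)
for (hacker_id, _), score in best.items():
    totals[hacker_id] += score

# HAVING total > 0, ORDER BY total DESC, hacker_id ASC
leaderboard = sorted(
    ((hid, hackers[hid], total) for hid, total in totals.items() if total > 0),
    key=lambda row: (-row[2], row[0]),
)
for row in leaderboard:
    print(*row)
```

Hackers 1 and 2 both total 70, so the tie is broken by the smaller hacker_id; hacker 3 is excluded because the total is 0.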

SQL - Project Planning

Sat, 31 Aug 2024 13:52:38 GMT

You are given a table, Projects, containing three columns: Task_ID, Start_Date and End_Date. It is guaranteed that the difference between the End_Date and the Start_Date is equal to 1 day for each row in the table.

If the End_Date of the tasks are consecutive, then they are part of the same project. Samantha is interested in finding the total number of different projects completed.

Write a query to output the start and end dates of projects listed by the number of days it took to complete the project in ascending order. If there is more than one project that has the same number of completion days, then order by the start date of the project.

Sample Input

Sample Output

2015-10-28 2015-10-29
2015-10-30 2015-10-31
2015-10-13 2015-10-15
2015-10-01 2015-10-04

Explanation

The example describes following four projects:

Project 1: Tasks 1, 2 and 3 are completed on consecutive days, so these are part of the project. Thus start date of project is 2015-10-01 and end date is 2015-10-04, so it took 3 days to complete the project.
Project 2: Tasks 4 and 5 are completed on consecutive days, so these are part of the project. Thus, the start date of project is 2015-10-13 and end date is 2015-10-15, so it took 2 days to complete the project.
Project 3: Only task 6 is part of the project. Thus, the start date of project is 2015-10-28 and end date is 2015-10-29, so it took 1 day to complete the project.
Project 4: Only task 7 is part of the project. Thus, the start date of project is 2015-10-30 and end date is 2015-10-31, so it took 1 day to complete the project.

Answer :

/* A date is a start-date if it's not end-date for anyone */
WITH START_DATES AS (
    SELECT Start_Date
    FROM Projects
    WHERE Start_Date NOT IN (
        SELECT DISTINCT End_Date FROM Projects
    )
),
/* A date is a end-date if it's not start-date for anyone */
END_DATES AS (
    SELECT End_Date
    FROM Projects
    WHERE End_Date NOT IN (
        SELECT DISTINCT Start_Date FROM Projects
    )
)
SELECT 
    S.Start_Date AS SD, 
    /* For each start-date, corresponding end-date is the nearest end-date which is higher than start-date */
    (SELECT MIN(E.End_Date) FROM END_DATES E WHERE E.End_Date > S.Start_Date) AS ED
FROM 
    START_DATES S
ORDER BY
    (ED - SD) ASC,
    SD ASC;
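
The idea behind the query can be checked in Python: a project starts on a start date that is nobody's end date, and ends on the nearest later end date that is nobody's start date. The task rows below are an assumption, reconstructed from the explanation above (seven one-day tasks), since the sample input table is not shown here.

```python
from datetime import date

tasks = [  # (start_date, end_date); assumed rows, each task lasts one day
    (date(2015, 10, 1), date(2015, 10, 2)),
    (date(2015, 10, 2), date(2015, 10, 3)),
    (date(2015, 10, 3), date(2015, 10, 4)),
    (date(2015, 10, 13), date(2015, 10, 14)),
    (date(2015, 10, 14), date(2015, 10, 15)),
    (date(2015, 10, 28), date(2015, 10, 29)),
    (date(2015, 10, 30), date(2015, 10, 31)),
]

starts = {s for s, _ in tasks}
ends = {e for _, e in tasks}

project_starts = sorted(starts - ends)   # start dates that are nobody's end date
project_ends = sorted(ends - starts)     # end dates that are nobody's start date

projects = []
for start in project_starts:
    # The matching end date is the nearest project end after this start.
    end = min(e for e in project_ends if e > start)
    projects.append((start, end))

# Order by duration in days, then by start date.
projects.sort(key=lambda p: ((p[1] - p[0]).days, p[0]))
for start, end in projects:
    print(start, end)
```

With these rows the printed order matches the sample output: the two one-day projects first, then the two-day project, then the three-day project.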

Gas station analysis (MySQL)

Sat, 31 Aug 2024 13:52:31 GMT

Call required module and connect to database in AWS RDS

import mysql.connector

conn = mysql.connector.connect(
    host = "your host",
    port = 3306,
    user = "xxx",
    password = "xxx",
    database = "xxx"
)

cursor = conn.cursor(buffered=True)

Problem 1.

Create a table to store gas station data with the following structure.

# gas_brand
sql_b = "CREATE TABLE GAS_BRAND(" + \
            "id int not null auto_increment primary key, " + \
            "name varchar(16) not null)"

cursor.execute(sql_b)

# gas_station
sql_s = "CREATE TABLE GAS_STATION(" + \
            "id int auto_increment primary key, " +\
            "brand int not null, " +\
            "name varchar(64) not null, " +\
            "city char(2) not null, " +\
            "gu varchar(10) not null, " +\
            "address varchar(128) not null, " +\
            "gasoline int not null, " +\
            "diesel int not null, " +\
            "self boolean not null, " +\
            "car_wash boolean not null, " +\
            "charging_station boolean not null, " +\
            "car_maintenance boolean not null, " +\
            "convenience_store boolean not null, " +\
            "24_hours boolean not null, " +\
            "lat decimal(16,14) not null, " +\
            "lng decimal(17,14) not null, " +\
            "foreign key (brand) references GAS_BRAND(id));"

cursor.execute(sql_s)

Problem 2.

Enter and check the GAS_BRAND data with Python code as follows.

cursor.execute("insert into GAS_BRAND values (1, 'SK에너지')")
cursor.execute("insert into GAS_BRAND values (2, '현대오일뱅크')")
cursor.execute("insert into GAS_BRAND values (3, 'GS칼텍스')")
cursor.execute("insert into GAS_BRAND values (4, 'S-OIL')")
cursor.execute("insert into GAS_BRAND values (5, '알뜰주유소')")
cursor.execute("insert into GAS_BRAND values (6, '자가상표')")
conn.commit()

Table creation result: Desc GAS_BRAND; Desc GAS_STATION;

sql_result = "DESC GAS_STATION"
cursor.execute(sql_result)

result = cursor.fetchall()
for i in result:
    print(i)

GAS_BRAND query result: SELECT * FROM GAS_BRAND;

sql_result = "SELECT * FROM GAS_BRAND"
cursor.execute(sql_result)

result = cursor.fetchall()
for i in result:
    print(i)

Problem 3.

Write the following functions and test them.

a. A function that takes a currency-formatted string as input and returns it as a number (test input: ‘1,000’)

def stringToInt(s):
    if s != '':
        s = s.replace(',', '')
        return int(s)
    else: 
        return None

stringToInt('1,000')

b. A function that takes a gas station brand name and returns its ID by looking it up in the GAS_BRAND data (test input: ‘SK Energy’)
- A function that receives an address and returns the district name (test input: ‘730 Heolleung-ro, Gangnam-gu, Seoul’)

def getID(brand):
    # '알뜰(ex)' is not stored in GAS_BRAND; treat it as 알뜰주유소 (id 5)
    if brand == '알뜰(ex)':
        return 5
    sql_result = "SELECT * FROM GAS_BRAND"
    cursor.execute(sql_result)
    result = cursor.fetchall()
    for i in result:
        if i[1] == brand:
            return i[0]
    # no matching brand
    return None

getID('SK에너지')

def getGu(add):
    addList = add.split()
    return addList[1]

getGu('서울시 강남구 헌릉로 730')

c. A function that receives an address and returns latitude and longitude (test input: ‘730 Heolleung-ro, Gangnam-gu, Seoul’)

import googlemaps
gmaps_key = "your Google Maps API key"
gmaps = googlemaps.Client(key = gmaps_key)

def getLL(add):
    tmp = gmaps.geocode(add, language='ko')
    lat = tmp[0].get("geometry")["location"]["lat"]
    lng = tmp[0].get("geometry")["location"]["lng"]

    return lat, lng

getLL('서울시 강남구 헌릉로 730')

Problem 4.

Modify the Python code that scrapes the gas station page so that the data is inserted directly into the GAS_STATION table.

import time 
from selenium import webdriver
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
from tqdm import tqdm_notebook

url = 'https://www.opinet.co.kr/searRgSelect.do'
driver = webdriver.Chrome()
driver.get(url)

sido_list_raw = driver.find_element(By.ID, "SIDO_NM0")
sido_list = sido_list_raw.find_elements(By.TAG_NAME, "option")

seoul_select = sido_list[1].get_attribute("value")
sido_list_raw.send_keys(seoul_select)

gu_list_raw = driver.find_element(By.ID, "SIGUNGU_NM0")
gu_list = gu_list_raw.find_elements(By.TAG_NAME, "option")

gu_names = [option.get_attribute("value") for option in gu_list]
gu_names = gu_names[1:]

sql = "INSERT INTO GAS_STATION (brand, name, city, gu, address, gasoline, diesel, self, " +\
        "car_wash, charging_station, car_maintenance, convenience_store, 24_hours, lat, lng) " +\
        "VALUES (%s, %s, '서울', %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)"

def check(data, tag):
    return 'off' not in data.select_one(tag)['src']

sqltmp = "ALTER TABLE GAS_STATION MODIFY diesel int NULL;"
cursor.execute(sqltmp)
conn.commit()

for gu in tqdm_notebook(gu_names):
    element = driver.find_element(By.ID, 'SIGUNGU_NM0')
    element.send_keys(gu)
    time.sleep(0.5)

    html = driver.page_source
    soup = BeautifulSoup(html, 'html.parser')

    cnt = int(driver.find_element(By.ID, 'totCnt').text)

    for i in range(1, cnt+1):

        station = driver.find_element(By.CSS_SELECTOR, f'#body1 > tr:nth-child({i}) > td.rlist > a')
        station.click()

        html = driver.page_source
        soup = BeautifulSoup(html, 'html.parser')

        data = soup.select('#os_dtail_info')[0]

        # brand
        brand = getID(data.select_one('#poll_div_nm').text)

        # name
        name = data.select_one('.header').text.strip()

        # address
        address = data.select_one('#rd_addr').text

        # gasoline
        gasoline = stringToInt(data.select_one('#b027_p').text)

        # diesel
        diesel = stringToInt(data.select_one('#d047_p').text)

        # self 
        slf = data.select_one('#SPAN_SELF_VLT_YN_ID')
        is_self = slf.find('img') is not None

        # car_wash
        car_wash = check(data, '#cwsh_yn')

        # charging_station
        charging_station = check(data, '#lpg_yn')

        # car_maintenance
        car_maintenance = check(data, '#maint_yn')

        # convenience_store
        convenience_store = check(data, '#cvs_yn')

        # 24_hours
        sel24 = check(data, '#sel24_yn')

        tmp = gmaps.geocode(address, language='ko')
        # lat
        lat = tmp[0].get('geometry')['location']['lat']

        # lng
        lng = tmp[0].get('geometry')['location']['lng']

        cursor.execute(sql, (brand, name, gu, address, gasoline, diesel, 
                            is_self, car_wash, charging_station, car_maintenance, convenience_store, sel24, lat, lng))

        conn.commit()

cursor.execute("select count(*) from GAS_STATION")
result = cursor.fetchall()
print(result[0])

cursor.execute("select * from GAS_STATION limit 10")
result = cursor.fetchall()
for i in result:
    print(i)

Problem 5.

For visualization, run a query that follows the rules below and save the result as a CSV file. (written in Python code)

Retrieve all the data, but display the brand name instead of the gas station brand ID. (Sort by gas station ID)
It must be saved in the following format (note the brand name and column names; id: GAS_STATION.id)

import pandas as pd

sql = "select s.id, b.name 'brand', s.name, s.city, s.gu, s.address, s.gasoline, s.diesel, s.self, " +\
        "s.car_wash, s.charging_station, s.car_maintenance, s.convenience_store, s.24_hours, " +\
        "s.lat, s.lng " +\
        "from GAS_BRAND b, GAS_STATION s " +\
        "where b.id = s.brand ORDER BY s.id"

cursor.execute(sql)
result = cursor.fetchall()

columns = [i[0] for i in cursor.description]

df = pd.DataFrame(result)
df.columns = columns
df.head()

df.to_csv('./sql2_oil_station_data.csv', index=False, encoding='utf-8')

df = pd.read_csv('./sql2_oil_station_data.csv', index_col=0, thousands=',', encoding='utf-8')
df.head()

Problem 6.

Search for information on gas stations located within 1 kilometer from Miwang Building using latitude and longitude information.

Gas station ID, gas station brand name, gas station store name, address, distance from Miwang Building (km)

lat, lng = getLL('서울 강남구 강남대로 364')
lat, lng

cursor.execute('set @location = point(127.02915553846, 37.495435686811)')

sql = "select s.id, b.name 'brand', s.name, s.address, ST_Distance_Sphere(@location, point(lng, lat))/1000 'distance' " +\
        "from GAS_BRAND b, GAS_STATION s " +\
        "where s.brand = b.id " +\
        "having distance <= 1 " +\
        "order by distance"
cursor.execute(sql)
result = cursor.fetchall()

for i in result:
    print(i)
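
To make the distance column less of a black box, here is a minimal haversine sketch in plain Python. It is an illustration of a great-circle distance on a sphere, not MySQL's actual implementation; haversine_km is a hypothetical helper name, and the radius is an assumption taken from the default that the MySQL documentation gives for ST_Distance_Sphere (6,370,986 m), so results may differ slightly from the query.

```python
from math import asin, cos, radians, sin, sqrt

# Assumed sphere radius in km (MySQL documents 6,370,986 m as the default).
EARTH_RADIUS_KM = 6370.986

def haversine_km(lng1, lat1, lng2, lat2):
    """Great-circle distance in km between two (longitude, latitude) points."""
    lng1, lat1, lng2, lat2 = map(radians, (lng1, lat1, lng2, lat2))
    dlat = lat2 - lat1
    dlng = lng2 - lng1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlng / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

# Miwang Building coordinates as used in the query above
# (longitude first, matching point(lng, lat)).
origin = (127.02915553846, 37.495435686811)

print(haversine_km(*origin, *origin))  # distance from a point to itself
print(haversine_km(0, 0, 1, 0))        # one degree of longitude on the equator
```

Note the argument order: like point(lng, lat) in the query, the function takes longitude before latitude.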

Problem 7.

Using latitude and longitude information, find the 10 gas stations nearest to Miwang Building that offer self-service fueling, are open 24 hours a day, and have a convenience store, and sort them from the lowest gasoline price.

Gas station ID, gas station brand name, gas station store name, address, gasoline price, additional information (self-service, 24-hour, convenience store), distance from Miwang Building (km)

sql = "select * " +\
        "from (select s.id, b.name 'brand', s.name, s.address, s.gasoline, s.self, s.24_hours, s.convenience_store, ST_Distance_Sphere(@location, point(lng, lat))/1000 'distance' " +\
        "from GAS_BRAND b, GAS_STATION s " +\
        "where s.brand = b.id and s.self=1 and s.24_hours=1 and s.convenience_store=1 " +\
        "order by distance limit 10) as tmp " +\
        "order by tmp.gasoline;"
cursor.execute(sql)
result = cursor.fetchall()

for i in result:
    print(i)
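
The query above sorts in two stages: the subquery keeps the nearest stations, and the outer query reorders only those by price. A small Python sketch of that two-stage ordering, using made-up rows and a hypothetical helper name (nearest_then_cheapest):

```python
# Hypothetical rows: (id, gasoline price, distance in km); values are made up.
stations = [
    (1, 1650, 0.4),
    (2, 1590, 2.1),
    (3, 1720, 0.9),
    (4, 1610, 5.0),
]

def nearest_then_cheapest(rows, n):
    """Keep the n nearest rows, then order only those rows by price."""
    nearest = sorted(rows, key=lambda r: r[2])[:n]  # inner: order by distance limit n
    return sorted(nearest, key=lambda r: r[1])      # outer: order by gasoline

for row in nearest_then_cheapest(stations, 3):
    print(row)
```

Station 4 is cheaper than station 1 but is dropped in the first stage because it is too far away, which is why the limit has to be applied before the price sort.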

Problem 8.

Find the average gasoline price by district and gas station brand, and print it in descending order.

District name, gas station brand name, average price of gasoline

sql = "select s.gu, b.name, avg(s.gasoline) " +\
        "from GAS_STATION as s, GAS_BRAND as b " +\
        "where b.id = s.brand " +\
        "group by s.gu, b.name " +\
        "order by s.gu, avg(s.gasoline) desc"

cursor.execute(sql)
result = cursor.fetchall()
for i in result:
    print(i)

Tableau chart part 2

Sat, 31 Aug 2024 13:52:24 GMT

Using various functions

Group

Two items

Three items

Set

With two items, there is no difference between "group" and "set". With three or more items, use only the "group" function.

Combined Set

Hierarchy

All can be checked on one sheet -> Efficient use of dashboard

Map chart

Normal map chart

Map chart with tooltip

Map chart with filters

Wordcloud

Calendar Heatmap

Simple data analysis (Excel)

Fri, 30 Aug 2024 14:15:22 GMT

Name of the file : Billionaires Statistics Dataset

Data source : Kaggle

Simple data analysis practice using Excel

Data Cleaning
Data Analysis
Data visualization

Check data

1. Data Cleaning

Find and remove duplicates

Check and change values (easy to see and understand)

M -> Male / F -> Female

Clean up unnecessary information or convert it into necessary information

Check and convert data type

2. Data Analysis

Descriptive statistics

Basic analysis with descriptive statistics

Advanced analysis with pivot tables

Find interesting or meaningful insights

a. Find top 10 overall

Sort by largest to smallest

b. billionaires by age

change the value to count

group function

checking detail

3. Data visualization

Make it more dynamic (sort by industry, gender, etc.)

Connect this to age table as well

Graphs for visual impact

Make it like a dashboard

SQL - Binary Tree Nodes

Fri, 30 Aug 2024 14:15:09 GMT

You are given a table, BST, containing two columns: N and P, where N represents the value of a node in Binary Tree, and P is the parent of N.

Write a query to find the node type of Binary Tree ordered by the value of the node. Output one of the following for each node:

Root: If node is root node.
Leaf: If node is leaf node.
Inner: If node is neither root nor leaf node.

Sample Input

Sample Output

Answer :

SELECT CASE
    WHEN P IS NULL THEN CONCAT(N, ' Root')
    WHEN N IN (SELECT DISTINCT P FROM BST) THEN CONCAT(N, ' Inner')
    ELSE CONCAT(N, ' Leaf')
    END
FROM BST
ORDER BY N ASC
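
The CASE logic can be checked by hand with a few lines of Python. The rows below are assumed sample data (the original sample table did not survive in this post), and node_types is a hypothetical helper name.

```python
# Assumed sample rows (N, P); P is None for the root.
bst = [(1, 2), (3, 2), (6, 8), (9, 8), (2, 5), (8, 5), (5, None)]

def node_types(rows):
    """Label each node as Root, Inner, or Leaf, ordered by node value."""
    # Every value that appears as a parent has at least one child.
    parents = {p for _, p in rows if p is not None}
    result = []
    for n, p in sorted(rows, key=lambda r: r[0]):
        if p is None:
            kind = "Root"    # no parent
        elif n in parents:
            kind = "Inner"   # has a parent and at least one child
        else:
            kind = "Leaf"    # has a parent but no children
        result.append(f"{n} {kind}")
    return result

for line in node_types(bst):
    print(line)
```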

SQL - Products and Members with Repeat Purchases

Fri, 30 Aug 2024 14:15:03 GMT

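Problem description: The ONLINE_SALE table below contains online product sales information for a clothing shopping mall. The ONLINE_SALE table has the structure shown below, and ONLINE_SALE_ID, USER_ID, PRODUCT_ID, SALES_AMOUNT, and SALES_DATE represent the online sale ID, member ID, product ID, sales quantity, and sale date, respectively.

There is only one sales record for each combination of date, member ID, and product ID.

Problem: Write an SQL statement that finds the records in the ONLINE_SALE table where the same member repurchased the same product, and outputs the member ID and the repurchased product ID. Sort the results by member ID in ascending order, and for the same member ID, by product ID in descending order.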

Answer :

SELECT USER_ID,PRODUCT_ID
FROM ONLINE_SALE
GROUP BY USER_ID,PRODUCT_ID
HAVING COUNT(PRODUCT_ID) > 1
ORDER BY USER_ID, PRODUCT_ID DESC
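
As a cross-check of the GROUP BY / HAVING logic, the same count-and-filter step can be written with collections.Counter. The rows are hypothetical and repurchases is an illustrative helper name.

```python
from collections import Counter

# Hypothetical rows: (user_id, product_id, sales_date)
sales = [
    (1, 3, "2022-03-01"), (1, 3, "2022-03-05"), (1, 4, "2022-03-01"),
    (2, 4, "2022-03-02"), (2, 4, "2022-03-09"), (1, 4, "2022-03-20"),
    (3, 3, "2022-03-03"),
]

def repurchases(rows):
    """Return (user_id, product_id) pairs that occur more than once."""
    counts = Counter((u, p) for u, p, _ in rows)          # GROUP BY user, product
    pairs = [pair for pair, c in counts.items() if c > 1]  # HAVING COUNT(*) > 1
    # user_id ascending, then product_id descending
    return sorted(pairs, key=lambda up: (up[0], -up[1]))

print(repurchases(sales))
```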

SQL - Placements

Fri, 30 Aug 2024 14:14:57 GMT

You are given three tables: Students, Friends and Packages. Students contains two columns: ID and Name. Friends contains two columns: ID and Friend_ID (ID of the ONLY best friend). Packages contains two columns: ID and Salary (offered salary in $ thousands per month).

Write a query to output the names of those students whose best friends got offered a higher salary than them. Names must be ordered by the salary amount offered to the best friends. It is guaranteed that no two students got same salary offer.

Now,

Samantha's best friend got offered a higher salary than her at 11.55
Julia's best friend got offered a higher salary than her at 12.12
Scarlet's best friend got offered a higher salary than her at 15.2
Ashley's best friend did NOT get offered a higher salary than her

The name output, when ordered by the salary offered to their friends, will be:

Samantha
Julia
Scarlet

Answer :

Select S.Name
From ( Students S join Friends F Using(ID)
       join Packages P1 on S.ID=P1.ID
       join Packages P2 on F.Friend_ID=P2.ID)
Where P2.Salary > P1.Salary
Order By P2.Salary;
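
The double join on Packages (once for the student, once for the friend) is easier to see with dictionaries. The IDs and salary values below are assumed sample data, chosen to agree with the walkthrough above; lower_paid_than_friend is a hypothetical helper name.

```python
# Assumed sample data, keyed by student ID.
students = {1: "Ashley", 2: "Samantha", 3: "Julia", 4: "Scarlet"}
friends = {1: 2, 2: 3, 3: 4, 4: 1}                  # ID -> best friend's ID
packages = {1: 15.20, 2: 10.06, 3: 11.55, 4: 12.12}  # ID -> offered salary

def lower_paid_than_friend(students, friends, packages):
    """Names of students whose best friend was offered more, by friend's salary."""
    rows = [
        (packages[friends[sid]], name)              # P2.Salary, S.Name
        for sid, name in students.items()
        if packages[friends[sid]] > packages[sid]   # P2.Salary > P1.Salary
    ]
    return [name for _, name in sorted(rows)]        # ORDER BY P2.Salary

print(lower_paid_than_friend(students, friends, packages))
```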

National museum/art gallery data analysis

Thu, 29 Aug 2024 13:08:34 GMT

Data preparation steps

Target Data (JSON): National Museum and Art Gallery Information Standard Data
Data Source: Public Data Portal
Download: National Museum and Art Gallery Information Standard Data.json

Import required modules

import json
import pandas as pd

with open('../data/전국박물관미술관정보표준데이터.json', 'r', encoding='utf-8') as f:
    json_data = json.load(f)

Step 1: Create DataFrame with Json Data

1-1 Creating a Pandas DataFrame with Json Data

df_target = pd.json_normalize(json_data, record_path="records")
df_target.head(2)

Step 2: Preprocessing DataFrame 01

2-1) Basic preprocessing

Missing values in json_data are stored as empty strings (""), so checks such as df_target.info() or df_target.isna() report that there are no null values. To inspect the data properly, replace "" with a null value.

Condition 1: The empty value ("" or '') consists only of a pair of double (or single) quotation marks with no spaces.
Condition 2: Change "" (or '') to a null value (None).
Condition 3: Do not change the index or order.
Condition 4: Assign the result DataFrame to the 'df_target' variable.

# dict form: older pandas versions may treat replace("", None) as a pad fill
df_target = df_target.replace({"": None})

df_target.isnull().sum()

2-2 Basic preprocessing 02

When the JSON data was loaded into a pandas DataFrame, numeric data was read as strings.

Condition 1: Change the Column Data of type_int_col below to integer (int) type Data.
Condition 2: Change the Column Data of type_float_col below to float type Data.
Condition 3: If there is a null value in the Data to be changed, fill it with 0.
Condition 4: Do not change the index or order.
Condition 5: Assign the result DataFrame to the 'df_target' variable.

type_int_col = ['어른관람료', '청소년관람료', '어린이관람료']
type_float_col = ['위도', '경도']

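# con 3: fill null values with 0 first, because the int cast fails on nulls

num_cols = type_int_col + type_float_col
df_target[num_cols] = df_target[num_cols].fillna(0)

# con 1, 2

df_target[type_int_col] = df_target[type_int_col].astype('int')
df_target[type_float_col] = df_target[type_float_col].astype('float')
df_target.head(2)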

2-3 Basic Preprocessing 03

Improve the readability of the data by dropping the columns that are not related to the analysis.

Condition 1: Delete the columns listed in drop_cols below.
Condition 2: Do not change the index or order.
Condition 3: Assign the result DataFrame to the 'df_target' variable.

drop_cols = ['소재지지번주소', '위도', '경도', '운영기관전화번호','운영기관명', '운영홈페이지', '편의시설정보', '휴관정보', 
            '관람료기타정보', '박물관미술관소개', '교통안내정보', '관리기관전화번호', '관리기관명', '제공기관코드', '제공기관명']

df_target.drop(drop_cols, axis=1, inplace=True)
df_target.head(2)

2-4 Basic preprocessing 04

If the admission fee for adults, teenagers, or children looks abnormal, delete the entire row.

Condition 1: The column related to the admission fee is type_int_col defined above.
Condition 2: If the admission fee is not divisible by 10 won, it is judged as an outlier. Delete the row.
Condition 3: If the admission fee is 100,000 won or more, it is judged as an outlier. Delete the row.
Condition 4: Do not change the index or order.
Condition 5: Assign the result DataFrame to the 'df_target' variable.

for col in type_int_col:
    df_target.drop(df_target[(df_target[col] % 10 != 0) |
                             (df_target[col] >= 100000)].index, inplace=True)

df_target.head(2)
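
The outlier rule in Conditions 2 and 3 can be stated as a small predicate. This is a plain-Python restatement for checking individual values; is_valid_fee and is_valid_row are hypothetical helper names, not part of the pandas solution.

```python
def is_valid_fee(fee):
    """True when the fee is a multiple of 10 won and below 100,000 won."""
    return fee % 10 == 0 and fee < 100000

def is_valid_row(fees):
    """A row is kept only if every admission fee in it passes the check."""
    return all(is_valid_fee(f) for f in fees)

# (adult, teenager, child) fees in won; values are made up for illustration.
print(is_valid_row([3000, 2000, 0]))     # all multiples of 10, all below 100,000
print(is_valid_row([3005, 2000, 1000]))  # 3005 is not divisible by 10
print(is_valid_row([100000, 0, 0]))      # 100,000 or more counts as an outlier
```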

Step 3: DataFrame Preprocessing 02

3-1 Advanced preprocessing 01

Delete the data of museums/art galleries that are closed or duplicated. Other duplicate data exists beyond the conditions below, but in this test only the duplicates that meet these conditions are deleted.

Condition 1: If the Facility Name Column data contains the word 'Closed', the corresponding row is deleted.
Condition 2: If the Facility Name Column data is duplicated, the data with the latest 'Data Reference Date' of the corresponding row is left and the row that is not the latest is deleted.
Condition 3: Whether the Facility Name Column data is duplicated is determined as a duplicate museum/art gallery if the value of the Facility Name Column data with the spaces removed matches.
Condition 4: Do not change the Index or order. - If you changed the order to solve the problem, sort it again in the Index order.
Condition 5: Assign the result DataFrame to the 'df_target' variable.

df_target.drop(df_target[df_target['시설명'].str.contains('휴관')].index, inplace=True)

df_target.sort_values(by='데이터기준일자', ascending=False, inplace=True)
df_target = df_target[~df_target.duplicated(['시설명'])]

df_target = df_target[~df_target['시설명'].str.replace(' ', '').duplicated()]

df_target = df_target.sort_index()
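
The conditions above amount to "skip closed facilities, compare names with spaces removed, keep the newest record, restore index order". A plain-Python sketch of that logic on hypothetical rows (dedupe is an illustrative helper name, and the facility names and dates are made up):

```python
# Hypothetical rows: (index, facility name, data reference date as an ISO string)
rows = [
    (0, "국립 중앙박물관", "2023-01-01"),
    (1, "국립중앙박물관", "2024-01-01"),
    (2, "시립미술관", "2023-06-01"),
    (3, "시립미술관 (휴관)", "2024-02-01"),
]

def dedupe(rows):
    """Drop closed facilities and keep the latest record per normalised name."""
    latest = {}
    for idx, name, ref_date in rows:
        if "휴관" in name:            # Condition 1: closed facility
            continue
        key = name.replace(" ", "")   # Condition 3: ignore spaces when comparing
        # Condition 2: ISO date strings compare correctly as plain strings
        if key not in latest or ref_date > latest[key][2]:
            latest[key] = (idx, name, ref_date)
    return sorted(latest.values())    # Condition 4: back to index order

for row in dedupe(rows):
    print(row)
```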

3-2 Advanced preprocessing 02

Find the 'opening hours', that is, how many hours a museum/art gallery is open per day on weekdays and on public holidays.

Condition 1: The opening hours on weekdays are from 'Weekday opening time' to 'Weekday closing time'. Create a 'Weekday opening hours' column and enter the opening hours on weekdays.
Condition 2: The opening hours on public holidays are from 'Holiday opening time' to 'Holiday closing time'. Create a 'Holiday opening hours' column and enter the opening hours on public holidays.
Condition 3: 'Weekday opening hours' and 'Holiday opening hours' are expressed as floats in hours.
Condition 4: Do not change the index or order.
Condition 5: Assign the result DataFrame to the 'df_target' variable.

time_cols = ['평일관람시작시각', '평일관람종료시각', '공휴일관람시작시각', '공휴일관람종료시각']
for idx, row in df_target[time_cols].iterrows():
    open_hour, open_min = map(int, row.평일관람시작시각.split(':'))
    close_hour, close_min = map(int, row.평일관람종료시각.split(':'))
    total = (close_hour - open_hour) + round((close_min - open_min) / 60, 2)
    df_target.loc[idx, '평일관람가능시간'] = 24 if total > 23 else total

    open_hour, open_min = map(int, row.공휴일관람시작시각.split(':'))
    close_hour, close_min = map(int, row.공휴일관람종료시각.split(':'))
    total = (close_hour - open_hour) + round((close_min - open_min) / 60, 2)
    df_target.loc[idx, '공휴일관람가능시간'] = 24 if total > 23 else total
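
The per-row arithmetic in the loop above can be isolated into one function and tried on a few inputs. This sketch uses the same formula; opening_hours is a hypothetical helper name and the times are made up.

```python
def opening_hours(open_time, close_time):
    """Hours between two 'HH:MM' strings, using the same formula as the loop above."""
    open_hour, open_min = map(int, open_time.split(":"))
    close_hour, close_min = map(int, close_time.split(":"))
    total = (close_hour - open_hour) + round((close_min - open_min) / 60, 2)
    # Anything above 23 hours (for example 00:00 to 23:59) is treated as open all day.
    return 24 if total > 23 else total

print(opening_hours("09:00", "18:30"))
print(opening_hours("09:30", "18:00"))
print(opening_hours("00:00", "23:59"))
```

Wrapping the calculation this way would also let the two near-identical halves of the loop share one implementation.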

3-3 Advanced preprocessing 03

Process the data in the 'Location Road Name Address' Column and split it into metropolitan autonomous government - basic autonomous government (administrative city) - detailed address.

Condition 1: The first word of the 'Location Road Name Address' Column data always means the name of the metropolitan autonomous government. Create a 'Metropolitan' Column and enter the name of the metropolitan autonomous government for the corresponding row data. 'Sejong Special City' has now been renamed to 'Sejong Special Self-Governing City'. Please reflect this.
Condition 2: The second word of the 'Location Road Name Address' Column data mostly means the name of the basic autonomous government. Create a 'Basic' Column and enter the name of the basic autonomous government for the corresponding row data. - In the case of 'Jeju Special Self-Governing Province', there is no basic autonomous government, but the administrative city ('Jeju-si', 'Seogwipo-si') is the second word of the 'Location Road Name Address' Column data. Enter the administrative city in the 'Basic' Column. - In the case of 'Sejong Special Self-Governing City', there is no basic autonomous government, so enter a null value (None) in the 'Basic' Column data.
Condition 3: Create a 'Detailed' Column and enter the part of the 'Location Road Name Address' Column data that is not included in the metropolitan/basic autonomous government (including administrative city).
Condition 4: The data in the 'Location Road Name Address', 'Metropolitan', 'Basic', and 'Detailed' Columns must not have leading or trailing spaces.
Condition 5: Do not change the index or order.
Condition 6: Assign the resulting DataFrame to the 'df_target' variable.

for idx, value in df_target['소재지도로명주소'].items():
    if '세종특별' in value:
        wide = '세종특별자치시'
        basic = None
        detail = tuple(value.split(' ', 1))[1]
    else:
        wide, basic, detail = tuple(value.split(' ', 2))

    df_target.loc[idx, '광역'] = wide
    df_target.loc[idx, '기초'] = basic
    df_target.loc[idx, '상세'] = detail
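
The splitting rule can be tried on single strings before it is applied to the whole column. The function below is a sketch that follows the same branching as the loop above and additionally strips surrounding spaces (Condition 4); split_address is a hypothetical helper name and the addresses are made-up examples.

```python
def split_address(address):
    """Split a road-name address into (metropolitan, basic, detailed)."""
    address = address.strip()
    if "세종특별" in address:
        # Sejong has no basic autonomous government; use the current official name.
        wide, basic = "세종특별자치시", None
        detail = address.split(" ", 1)[1]
    else:
        # First word: metropolitan, second word: basic, the rest: detailed address.
        wide, basic, detail = address.split(" ", 2)
    return wide, basic, detail.strip()

print(split_address("세종특별자치시 한누리대로 2130"))
print(split_address("제주특별자치도 제주시 일주동로 17"))
```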

Step 4: Get the information

4-1 Get the information 01

Check the total number of museums/art galleries by metropolitan government.

Condition 1: Please display the total number of museums/art galleries by metropolitan government using the metropolitan government data in the 'metropolitan' Column of df_target.
Condition 2: The index of the result DataFrame is the metropolitan government. The priority of the metropolitan government is provided by the value of the province_dict below. Please list the order of the index according to the priority of the metropolitan government. Source: Ministry of the Interior and Safety
Condition 3: The name of the Column that displays the total number of museums/art galleries in the result DataFrame is 'Number of Museums/Art Galleries'.
Condition 4: Assign the result DataFrame to the 'df_result' variable.

province_dict = {
    '서울특별시': 0,
    '부산광역시': 1,
    '대구광역시': 2,
    '인천광역시': 3,
    '광주광역시': 4,
    '대전광역시': 5,
    '울산광역시': 6,
    '세종특별자치시': 7,
    '경기도': 8,
    '강원도': 9,
    '충청북도': 10,
    '충청남도': 11,
    '전라북도': 12,
    '전라남도': 13,
    '경상북도': 14,
    '경상남도': 15,
    '제주특별자치도': 16
}

df_result = df_target.groupby('광역').size().to_frame(name='박물관미술관수')
df_result = df_result.sort_index(key=lambda x: x.map(province_dict))

display(df_result)

4-2 Get the information 02

Check the metropolitan-basic autonomous governments (administrative cities) where the total number of museums/art galleries is 8.

Condition 1: Using the metropolitan autonomous government/basic autonomous government (administrative city) data in the 'metropolitan' and 'basic' columns of df_target, find the places where the total number of museums/art galleries is 8 by metropolitan autonomous government-basic autonomous government (administrative city).
Condition 2: Enter the metropolitan autonomous government in the 'metropolitan' column of the result DataFrame and the basic autonomous government (administrative city) in the 'basic' column.
Condition 3: List the 'metropolitan' column in order of metropolitan autonomous government priority, as in problem 4-1. Refer to province_dict in 4-1
Condition 4: Within the same metropolitan autonomous government, list the data of the 'Basic' Column in reverse alphabetical order.
Condition 5: The name of the Column that indicates the total number of museums/art galleries in the result DataFrame is 'Number of Museums/Art Galleries'.
Condition 6: Set the Index in ascending order of numbers (integers).
Condition 7: Assign the result DataFrame to the 'df_result' variable.

df_result = df_target.groupby(['광역', '기초'], dropna=False).size().to_frame(name='박물관미술관수')

df_result = df_result[df_result['박물관미술관수'] == 8]

df_result = df_result.sort_index(level=1, ascending=False).sort_index(level=0, key=lambda x: x.map(province_dict), sort_remaining=False)

df_result = df_result.reset_index()
df_result.head(5)

4-3 Getting the information 03

Find out the difference in average admission fees by metropolitan government and museum/art gallery category (private, national, public, university).

Condition 1: Using the metropolitan government/museum art gallery category data in the 'metropolitan' and 'museum art gallery category' columns of df_target, find the largest and smallest differences between the average adult admission fee and the average child admission fee by metropolitan government-museum art gallery category. However, if either the adult admission fee or the child admission fee is 0 won (free), please exclude museums/art galleries from the average calculation.
Condition 2: Enter the metropolitan government in the 'metropolitan' Index of the result DataFrame, and the museum art gallery category in the 'museum art gallery category' Index.
Condition 3: List the 'metropolitan' Index in order of metropolitan government priority, as in problem 4-1. - Refer to province_dict in 4-1
Condition 4: The 'Adult Admission Fee' Column of the result DataFrame is the average adult admission fee by metropolitan government-museum/art gallery division, the 'Children's Admission Fee' Column is the average children's admission fee by metropolitan government-museum/art gallery division, and the 'Admission Fee Difference' Column is the average adult admission fee minus the average children's admission fee for the same division. - For the adult/child admission fee and admission fee difference, enter an integer value rounded at the ones place (to the nearest 10 won) from the average value. - Example: 2,978.5 won -> 2,980.0 won (rounded at the ones place) -> 2,980 won (integer value)
Condition 5: Assign the result DataFrame to the 'df_result' variable.

df_result = df_target[~((df_target['어른관람료'] == 0) | (df_target['어린이관람료'] == 0))]

df_result = df_result.pivot_table(index=['광역', '박물관미술관구분'],
                      values=['어른관람료', '어린이관람료'],
                      aggfunc='mean')

df_result = df_result.apply(lambda x: round(x, -1))

df_result['어른관람료'] = df_result['어른관람료'].astype(int)
df_result['어린이관람료'] = df_result['어린이관람료'].astype(int)

df_result['관람료차이'] = df_result['어른관람료'] - df_result['어린이관람료']

df_result = df_result[(df_result['관람료차이'] == df_result['관람료차이'].min()) |
                      (df_result['관람료차이'] == df_result['관람료차이'].max())]

df_result.head(2)

4-4 Get the information 04

A family (2 adults, 1 teenager, 1 child) wants to visit an art gallery in Jeju-si, Jeju Special Self-Governing Province on a public holiday. Please show a list of art galleries with a total admission fee of 20,000 won or less and a viewing period of 4 hours or more on a public holiday.

Condition 1: The total admission fee for a family (2 adults, 1 teenager, 1 child) must be 20,000 won or less.
Condition 2: We want to go to an art gallery in Jeju-si, Jeju Special Self-Governing Province. Art Gallery: In this test, we define 'Art Gallery' as the data in the facility name column of df_target that contains the letters <'Art Gallery' or 'Gallery' or 'Art'>.
Condition 3: We want to go on a public holiday. It must be an art gallery that can be viewed for 4 hours or more on a public holiday.
Condition 4: The result DataFrame has the same columns as df_target.
Condition 5: Assign the result DataFrame to the 'df_result' variable.

money = (df_target['어른관람료'] * 2 + df_target['청소년관람료'] + df_target['어린이관람료']) <= 20000
location = df_target['기초'] == '제주시'
gallery = df_target['시설명'].str.contains('미술관|갤러리|아트')
holiday = df_target['공휴일관람가능시간'] >= 4

df_result = df_target[money & location & gallery & holiday]

Tableau chart

Thu, 29 Aug 2024 13:08:27 GMT

Table chart

Bar chart (Frequently used)

Line chart

Pie chart

Treemaps

Stacked Bar chart

Dashboard

Important considerations when creating a dashboard

Purpose

Who will use it
What information do you want to convey

Display environment

Tablet? Phone? PC? or even hard copy

Layout arrangement

Key content is located in the upper left

Dashboard 01

Scatter chart

Combination chart

1. Bar + Line with dual axes (Frequently used)

2. Line chart with dual axes - highlights sales along with the trend

3. Line + Area chart with dual axes

Donut chart (Frequently used)

Dashboard 02

Action filter added

Action Highlight added