junhyeok_kim.log

tmp

Tue, 21 May 2024 12:36:46 GMT

CR3 레지스터는 x86 아키텍처에서 중요한 역할을 하는 레지스터로, 문맥 전환(context switch) 시 프로세스 간의 페이지 디렉토리 베이스 주소를 저장합니다. 이 레지스터는 가상 메모리 관리에서 핵심적인 역할을 하며, 특히 다음과 같은 방식으로 쓰레드 간 문맥 전환을 가능하게 합니다:

CR3 레지스터와 문맥 전환

현재 실행 중인 쓰레드의 상태 저장:
- 현재 실행 중인 쓰레드의 상태를 저장할 때, CPU는 일반 레지스터, 프로그램 카운터, 상태 레지스터 등을 저장합니다. 이 상태 정보에는 CR3 레지스터의 값도 포함됩니다.
- CR3 레지스터는 현재 쓰레드 또는 프로세스의 페이지 디렉토리 베이스 주소를 포함하고 있습니다. 이 주소는 현재 쓰레드가 사용하는 가상 메모리와 물리 메모리 간의 매핑 정보를 가리킵니다.
새로운 쓰레드의 상태 로드:
- 새로운 쓰레드가 실행되기 전에, CPU는 이 쓰레드의 상태를 복원합니다. 이 과정에서 저장된 CR3 레지스터의 값이 복원됩니다.
- 복원된 CR3 레지스터의 값은 새로운 쓰레드의 페이지 디렉토리 베이스 주소를 가리킵니다. 이 주소는 새로운 쓰레드가 사용하는 가상 메모리와 물리 메모리 간의 매핑 정보를 가리킵니다.
가상 메모리 공간의 전환:
- CR3 레지스터의 값이 변경됨으로써, CPU는 새로운 페이지 디렉토리를 사용하여 가상 주소를 물리 주소로 변환합니다.
- 이 변경은 전체 가상 메모리 공간을 새로운 쓰레드의 메모리 맵으로 전환합니다. 결과적으로, 각 쓰레드는 자신만의 가상 메모리 공간을 가지게 되어, 메모리 접근이 격리됩니다.

문맥 전환의 예

현재 쓰레드의 문맥 저장:
- 현재 쓰레드 A가 실행 중일 때, 운영 체제는 이 쓰레드의 레지스터 상태와 CR3 값을 저장합니다.
- 저장된 CR3 값은 쓰레드 A의 페이지 디렉토리 베이스 주소입니다.
새로운 쓰레드의 문맥 로드:
- 새로운 쓰레드 B를 실행하기 위해, 운영 체제는 쓰레드 B의 저장된 상태를 복원합니다.
- 여기에는 쓰레드 B의 CR3 값도 포함됩니다. 이 값은 쓰레드 B의 페이지 디렉토리 베이스 주소입니다.
가상 메모리 공간 전환:
- CR3 레지스터가 쓰레드 B의 페이지 디렉토리 베이스 주소로 설정되면, CPU는 이제 쓰레드 B의 가상 주소 공간을 사용하여 메모리 접근을 수행합니다.
- 결과적으로, 쓰레드 A의 가상 메모리 공간에서 쓰레드 B의 가상 메모리 공간으로 전환됩니다.

요약

CR3 레지스터는 문맥 전환 시 가상 메모리 주소 공간을 전환하는 데 필수적인 역할을 합니다. 각 쓰레드 또는 프로세스는 고유의 페이지 디렉토리를 가지고 있으며, CR3 레지스터의 값을 변경함으로써 CPU는 현재 실행 중인 쓰레드의 가상 메모리 맵을 전환합니다. 이는 쓰레드 간 메모리 격리를 유지하고, 각 쓰레드가 자신만의 독립적인 가상 주소 공간을 사용할 수 있게 합니다.

[PintOS] Introduction to User Program - Virtual Memory

Tue, 21 May 2024 06:50:09 GMT

User Program을 시작하기 앞서,

We strongly recommend you to read synchronization and virtual address before you start.

라는 ☢️ 엄중한 경고 🚥 를 받아들이고 Virtual Memory 부터 정리하기로 하였다!

🎬 Introduction

By now you should have some familiarity with the inner workings of Pintos. Your OS can properly handle multiple threads of execution with proper synchronization, and can load multiple user programs at once. However, the number and size of programs that can run is limited by the machine's main memory size. In this assignment, you will remove that limitation by buiding an illusion of infinite memory.

Thread 단원에서 우리는 여러 쓰레드를 다루며 동기화 하는 법을 했었죠? 이번에는 여러개의 userprog을 로드함은 물론, Main Memory 사이즈의 제약을 벗어나 각각의 프로세스가 Main Memory 전체를 활용한다는 환상을 심어줄 것 입니다!

🛎️ Background

Source Files

You will work in the vm directory for this project. The Makefile is updated to turn on the setting -DVM. We provide an enormous amount of template code. You MUST follow the given template. That is, if you submit the code, that is not based on the given template, you get 0pts. Also, you should never change the template where it is marked "DO NOT CHANGE". Here, we provide some details about each template file that you will be modifying.

반드시 주어진 템플릿을 따르세요!! 주어진 템플릿에 기반하지 않고 코드를 제출한다면 0점을 받습니다! 또한, “DO NOT CHANGE”라고 적혀있는 부분의 코드는 절대 수정하지 마세요. 아래에 각 템플릿 파일에서 당신이 수정하게 될 부분에 대한 자세한 설명을 하겠습니다.
하지 말라면 하지 말자!

include/vm/vm.h, vm/vm.c

enum vm_type {
    /* page not initialized */
    VM_UNINIT = 0,
    /* page not related to the file, aka anonymous page */
    VM_ANON = 1,
    /* page that realated to the file */
    VM_FILE = 2,
    /* page that hold the page cache, for project 4 */
    VM_PAGE_CACHE = 3,

    /* Bit flags to store state */

    /* Auxillary bit flag marker for store information. You can add more
     * markers, until the value is fit in the int. */
    VM_MARKER_0 = (1 << 3),
    VM_MARKER_1 = (1 << 4),

    /* DO NOT EXCEED THIS VALUE. */
    VM_MARKER_END = (1 << 31),
};

VM_UNINIT.c

Provides operations for uninitialized pages (vm_type = VM_UNINIT). Under the current design, all pages are initially set up as uninitialized pages, then it transforms to anonymous pages or file-backed pages.

초기화되지 않은 페이지에 대한 연산을 제공함. 모든 페이지는 VM_UNINIT 로 설정된 다음VM_ANON 혹은 file-backed pages 로 변환됨.

vm/anon.c

Provides operations for anonymous pages (vm_type = VM_ANON).

vm/file.c

Provides operations for file-backed pages (vm_type = VM_FILE).

vm/inspect.c

grading을 위한 메모리 기능 검사. 수정 금지! 🚫

Memory Terminology

Pages

출처 : [1]

63          48 47            39 38            30 29            21 20         12 11         0
+-------------+----------------+----------------+----------------+-------------+------------+
| Sign Extend |    Page-Map    | Page-Directory | Page-directory |  Page-Table |    Page    |
|             | Level-4 Offset |    Pointer     |     Offset     |   Offset    |   Offset   |
+-------------+----------------+----------------+----------------+-------------+------------+
              |                |                |                |             |            |
              +------- 9 ------+------- 9 ------+------- 9 ------+----- 9 -----+---- 12 ----+
                                          Virtual Address

A page, sometimes called a virtual page, is a continuous region of virtual memory of size 4,096 bytes (the page size) in length. A page must be page-aligned, that is, start on a virtual address evenly divisible by the page size. Thus, the last 12 bits of a 64-bit virtual address is the page offset (or just offset). The upper bits are used to indicate the index in the page table, which will be introduced soon. With 64-bit system, we use 4-level page table, which makes a virtual address to look like this:

가상 페이지는 크기가 4,096 bytes 인 가상 메모리의 연속적인 영역입니다. 페이지는 페이지 정렬이 되어 있어야 합니다, 즉 페이지 크기로 균등하게 나눌 수 있는 가상 주소에서 시작해야 합니다. 따라서 64비트 가상 주소의 마지막 12비트가 페이지 오프셋(또는 그냥 오프셋)입니다. 위쪽 비트는 페이지 테이블의 인덱스를 나타내는 데 사용되며, 이는 곧 소개할 페이지 테이블에 표시됩니다. 64비트 시스템에서는 가상 주소를 다음과 같이 보이도록 하는 4단계 페이지 테이블을 사용합니다:

4KB는 4096 바이트입니다. 4096은 2¹² 따라서, 4KB 페이지의 주소를 나타내려면 12비트가 필요합니다.

Each process has an independent set of user (virtual) pages, which are those pages below the virtual address KERN_BASE (0x8004000000). The set of kernel (virtual) pages, on the other hand, is global, and thus remain in the same position regardless of what thread or process is running. The kernel may access both user and kernel pages, but a user process may access only its own user pages. See Virtual Memory Layout, for more information.

각 프로세스는 KERN_BASE(0x8004000000) '아래' 에 독립적인 공간을 갖는 user (virtual) pages를 갖고 있습니다. 반면에, kernel (virtual) pages 는 전역적으로 쓰이며, 항상 같은 위치에 존재합니다.
커널은 유저 페이지와 커널 페이지 모두에 접근할 수 있지만, 유저 프로세스는 본인의 유저 페이지에만 접근할 수 있습니다

KERN_BASE (0x8004000000) 🤔?

이 가상 주소는 커널 공간이 시작되는 지점을 나타냅니다. 즉, 이 주소 이상의 영역은 커널 전용이고, 이 주소 미만의 영역은 사용자 전용입니다.

Pintos provides several useful functions for working with virtual addresses. See Section Virtual Addresses, for details. https://casys-kaist.github.io/pintos-kaist/appendix/virtual_address.html

Frames

A frame, sometimes called a physical frame or a page frame, is a continuous region of physical memory. Like pages, frames must be page-size and page-aligned. Thus, a 64-bit physical address can be divided into a frame number and a frame offset (or just offset), like this:

Frame 은 물리적 프레임 또는 page frame 으로 불리기도 하며, 물리적 메모리의 연속적인 영역입니다. 페이지와 마찬가지로 프레임은 page-size와 page-aligned이 되어 있어야 합니다. 따라서 64비트 물리적 주소는 다음과 같이 프레임 번호와 프레임 오프셋(또는 그냥 오프셋)으로 나눌 수 있습니다:

                          12 11         0
    +-----------------------+-----------+
    |      Frame Number     |   Offset  |
    +-----------------------+-----------+
              Physical Address

페이지 정렬 (page-aligned)은 메모리 주소가 페이지 크기의 배수인 경우를 의미합니다. 즉, 페이지의 시작 주소가 페이지 크기의 배수여야 합니다. 4KB 페이지의 경우, 페이지 정렬 주소는 0x0000, 0x1000, 0x2000 등과 같이 4096(0x1000)의 배수입니다.

x86-64 doesn't provide any way to directly access memory at a physical address. Pintos works around this by mapping kernel virtual memory directly to physical memory - the first page of kernel virtual memory is mapped to the first frame of physical memory, the second page to the second frame, and so on. Thus, frames can be accessed through kernel virtual memory.

x86-64는 물리적 주소에서 메모리에 직접 액세스하는 방법을 제공하지 않습니다. 핀토스는 커널 가상 메모리를 물리적 메모리에 직접 매핑하여 이 문제를 해결합니다. 커널 가상 메모리의 첫 번째 페이지는 물리적 메모리의 첫 번째 프레임에 매핑되고, 두 번째 페이지는 두 번째 프레임에 매핑됩니다. 따라서 프레임은 커널 가상 메모리를 통해 액세스할 수 있습니다.

Page Tables

A page table is a data structure that the CPU uses to translate a virtual address to a physical address, that is, from a page to a frame. The page table format is dictated by the x86-64 architecture. Pintos provides page table management code in threads/mmu.c.

페이지 테이블은 CPU가 가상 주소를 물리 주소, 즉 page에서 frame으로 변환하는 데 사용하는 자료구조 입니다. 페이지 테이블 형식은 x86-64 아키텍처에 의해 결정됩니다. 핀토스는 페이지 테이블 관리 코드를 스레드/mmu.c로 제공합니다.

The diagram below illustrates the relationship between pages and frames. The virtual address, on the left, consists of a page number and an offset. The page table translates the page number into a frame number, which is combined with the unmodified offset to obtain the physical address, on the right.

아래 다이어그램은 page와 frame의 관계를 보여줍니다. 왼쪽의 가상 주소는 페이지 번호와 오프셋으로 구성되어 있습니다. 페이지 테이블은 페이지 번호를 프레임 번호로 변환하고 오른쪽의 수정되지 않은 오프셋과 결합하여 물리적 주소를 얻습니다.

Swap Slots

A swap slot is a page-size region of disk space in the swap partition. Although hardware limitations dictating the placement of slots are more flexible than for frames, swap slots should be page-aligned because there is no downside in doing so.

스왑 슬롯은 스왑 파티션에서 디스크 공간의 페이지 크기 영역입니다. 슬롯의 배치를 지시하는 하드웨어 제한이 프레임보다 더 유연하지만, 스왑 슬롯은 page-aligned 를 하던 안하던 큰 차이가 없기에 (단점은 없기에) 정렬해줍니다.

🕹️ Resource Management Overview

You will need to design/implement the following data structures:

Supplemental page table

Enables page fault handling by supplementing the page table. See Managing the Supplemental Page Table below.

page table 을 보조하여 page fault handling 처리를 활성화합니다. 아래 보충 페이지 테이블 관리를 참조하십시오.

Frame table

Allows efficient implementation of eviction policy of physical frames. See Managing the Frame Table below.

물리적 프레임의 퇴거 정책을 효율적으로 구현할 수 있습니다. 아래의 프레임 테이블 관리를 참조하십시오.

Swap table

Tracks usage of swap slots. See Managing the Swap Table below.

스왑 슬롯이 사용되는 것을 추적합니다. 아래의 스왑 테이블 관리 부분을 참고하세요.

You do not necessarily need to implement three completely distinct data structures: it may be convenient to wholly or partially merge related resources into a unified data structure.

For each data structure, you need to determine what information each element should contain. You also need to decide on the data structure's scope, either local (per-process) or global (applying to the whole system), and how many instances are required within its scope.

To simplify your design, you may store these data structures in non-pageable memory (e.g., memory allocated by calloc or malloc). That means that you can be sure that pointers among them will remain valid.

각 자료구조를 완전히 별개로 구현할 필요는 없습니다. 서로 관련된 자원을 통합된 자료구조로 합치는 것이 편리할 수 있습니다. 각 자료구조에서 각 원소가 어떤 정보를 담는지 결정해야 하고, 자료구조를 지역(프로세스별) 또는 전역(전체 시스템에 적용)으로 적용할지 결정해야 합니다. 또한 필요한 인스턴스의 수를 결정해야 합니다.
설계를 단순화하기 위해, non-pageable 메모리에 이러한 자료구조를 저장할 수 있습니다. 이는 calloc이나 malloc으로 할당된 메모리를 의미합니다. 즉, 자료구조들 사이의 포인터가 항상 유효한 상태로 유지될 것을 보장할 수 있습니다.

Choices of implementation (Performance perspective)

Possible choices for implementation include arrays, lists, bitmaps, and hash tables. An array is often the simplest approach, but a sparsely populated array wastes memory. Lists are also simple, but traversing a long list to find a particular position wastes time. Both arrays and lists can be resized, but lists more efficiently support insertion and deletion in the middle.

구현을 위해 선택할 수 있는 방법으로는 배열, 목록, 비트맵, 해시 테이블 등이 있습니다. 배열이 가장 간단한 방법인 경우가 많지만, 밀도가 희박한 배열은 메모리를 낭비합니다. lists는 간단하지만 특정 위치를 찾기 위해 긴 목록을 이동하는 것은 시간을 낭비합니다. 배열과 lists은 모두 크기를 조정할 수 있지만 lists은 중간에 삽입과 삭제를 더 효율적으로 지원합니다.

Pintos includes a bitmap data structure in lib/kernel/bitmap.c and include/lib/kernel/bitmap.h. A bitmap is an array of bits, each of which can be true or false. Bitmaps are typically used to track usage in a set of (identical) resources: if resource n is in use, then bit n of the bitmap is true. Pintos bitmaps are fixed in size, although you could extend their implementation to support resizing.

pintOS 는 비트맵 데이터 구조를 포함합니다. 비트맵은 비트들의 배열로, 각각 참이거나 거짓일 수 있습니다. 비트맵들은 일반적으로 자원들의 집합에서 사용을 추적하는 데 사용되는데, 자원 n이 사용 중이면 비트맵의 비트 n은 참입니다. 핀토스 비트맵들은 크기가 고정되어 있지만, re-sizing을 지원하도록 구현을 확장할 수 있습니다.

Pintos also includes a hash table data structure (See Hash Table). Pintos hash tables efficiently support insertions and deletions over a wide range of table sizes.

핀토스는 해시 테이블 재료 구조도 포함하고 있습니다(해시 테이블 참조). 핀토스 해시 테이블은 다양한 테이블 크기에서 삽입과 삭제를 효율적으로 지원합니다.

🎁 Managing the Supplemental Page Table

The supplemental page table supplements the page table with additional data about each page. It is needed because of the limitations imposed by the page table's format. Such a data structure is often called a "page table" also; we add the word "supplemental" to reduce confusion.

보조 페이지 테이블 은 각 페이지에 대한 추가 데이터로 페이지 테이블을 보완합니다. 페이지 테이블의 형식에 따른 제한 때문에 필요합니다. 이러한 데이터 구조는 종종 "page table"이라고도 불리는데, 우리는 혼란을 줄이기 위해 "보조"이라는 단어를 추가합니다. Supplemental Page Table은 프로세스마다 존재하는 자료구조입니다.
→ 각각의 페이지에 대해서 데이터가 존재하는 곳(frame, disk, swap 중 어디에 존재하는지), 이에 상응하는 커널 가상주소를 가리키는 포인터 정보, active인지 inactive 인지 등

The supplemental page table is used for at least two purposes. Most importantly, on a page fault, the kernel looks up the virtual page that faulted in the supplemental page table to find out what data should be there. Second, the kernel consults the supplemental page table when a process terminates, to decide what resources to free.

보조 페이지 테이블은 적어도 두 가지 목적을 위해 사용됩니다. 가장 중요한 것은page fault 에 대해 커널이 보조 페이지 테이블에서 fault 가 발생한 가상 페이지를 검색하여 어떤 데이터가 있어야 하는지 확인하는 것입니다. 둘째, 커널은 프로세스가 종료될 때 보조 페이지 테이블을 조사하여 어떤 리소스를 free 할 것인지 결정합니다.

🧹 Organization of Supplemental Page Table

You may organize the supplemental page table as you wish. There are at least two basic approaches to its organization: in terms of segments or in terms of pages. A segment here refers to a consecutive group of pages, i.e., memory region containing an executable or a memory-mapped file.

우리가 원하는 대로 보조 페이지 테이블을 구성할 수 있습니다. 보조 페이지 테이블 구성에 대한 최소한 두 가지 기본적인 접근법이 있습니다: 세그먼트 측면 또는 페이지 측면. 여기서 세그먼트는 연속적인 페이지 그룹, 즉 실행 파일 또는 메모리 매핑된 파일을 포함하는 메모리 영역을 나타냅니다.

Optionally, you may use the page table itself to track the members of the supplemental page table. You will have to modify the Pintos page table implementation in threads/mmu.c to do so. We recommend this approach for advanced students only.

"advanced students only"

❌ Handling page fault

The most important user of the supplemental page table is the page fault handler. In project 2, a page fault always indicated a bug in the kernel or a user program. In project 3, this is no longer true. Now, a page fault might only indicate that the page must be brought in from a file or swap slot. You will have to implement a more sophisticated page fault handler to handle these cases. The page fault handler, which is page_fault() in userprog/exception.c, calls your page fault handler, vm_try_handle_fault() in vm/vm.c.

보조 페이지 테이블에서 가장 중요한 것은 page fault handler 입니다. project 2에서 page fault 는 항상 커널이나 사용자 프로그램의 버그를 나타냅니다. 하지만 Project 3에서는 아닙니다. 현재 프로젝트에서는 page fault 가 해당 페이지를 파일이나 스왑 슬롯에서 가져와야 한다는 것만 나타낼 수 있습니다. 이러한 경우를 처리하려면 보다 정교한 page fault handler를 구현해야 합니다. userprog/exception.c에서 page_fault()인 page fault handler는 vm/vm.c에서 우리의 코드의 page fault handler인 vm_try_handle_fault()를 호출합니다.

페이지 폴트 핸들러는 대략 다음 작업을 수행해야 합니다:

Supplemental Page Table에서 오류가 발생한 페이지를 찾습니다: 메모리 참조가 유효한 경우, Supplemental Page Table 항목을 사용하여 해당 페이지에 들어가는 데이터를 찾습니다. 데이터는 파일 시스템이나 스왑 슬롯에 있을 수 있으며, 단순히 all-zero page 일 수도 있습니다. Copy-on-Write를 구현하는 경우, 페이지의 데이터는 이미 페이지 프레임에 있을 수 있지만 페이지 테이블에는 없을 수도 있습니다. Supplemental Page Table에서 사용자 프로세스가 접근하려는 주소에 데이터가 있을 것으로 기대해서는 안 됩니다. 페이지가 커널 가상 메모리 내에 있거나, 읽기 전용 페이지에 쓰기를 시도하는 경우, 접근이 유효하지 않습니다. 잘못된 접근은 프로세스를 종료하여 모든 리소스를 해제합니다.
페이지를 저장할 프레임을 얻습니다: Copy-on-Write를 구현한 경우 필요한 데이터가 이미 프레임에 있을 수 있으며, 이 경우 해당 프레임을 찾아야 합니다.
데이터를 프레임으로 가져옵니다: 파일 시스템에서 읽어오거나 스왑, zeroing 등을 통해 데이터를 프레임으로 가져옵니다. Copy-on-Write를 구현한 경우 필요한 페이지가 이미 프레임에 있을 수 있으며, 이 경우 이 단계에서 추가 조치가 필요하지 않습니다.
페이지 테이블 항목을 업데이트합니다: 폴트가 발생한 가상 주소의 페이지 테이블 항목을 물리 페이지로 매핑합니다. 이를 위해 threads/mmu.c의 기능을 사용할 수 있습니다.

🖼️ Managing the Frame Table

The frame table contains one entry for each frame. Each entry in the frame table contains a pointer to the page, if any, that currently occupies it, and other data of your choice. The frame table allows Pintos to efficiently implement an eviction policy, by choosing a page to evict when no frames are free.

Frame Table에는 각 Frame에 대해 하나의 항목이 포함됩니다. Frame Table의 각 항목에는 현재 해당 Frame을 점유하고 있는 Page에 대한 포인터와 기타 데이터가 포함되어 있습니다. Frame Table을 통해 Pintos는 Frame이 부족할 때 퇴출할 Page를 선택하여 효율적인 퇴출 정책을 구현할 수 있습니다.

The frames used for user pages should be obtained from the "user pool," by calling palloc_get_page(PAL_USER). You must use PAL_USER to avoid allocating from the "kernel pool," which could cause some test cases to fail unexpectedly. If you modify palloc.c as part of your frame table implementation, be sure to retain the distinction between the two pools.

사용자 페이지에 사용되는 Frame은 user pool에서 얻어야 하며, 이를 위해 palloc_get_page(PAL_USER)를 호출해야 합니다. PAL_USER를 사용하여 kernel pool에서 할당하는 것을 피해야 합니다. 그렇지 않으면 일부 테스트 케이스가 예기치 않게 실패할 수 있습니다. Frame Table 구현의 일환으로 palloc.c를 수정하는 경우, 두 풀 간의 구분을 반드시 유지해야 합니다.

The most important operation on the frame table is obtaining an unused frame. This is easy when a frame is free. When none is free, a frame must be made free by evicting some page from its frame.

Frame Table에서 가장 중요한 작업은 사용되지 않은 Frame을 얻는 것입니다. Frame이 비어 있을 때는 이 작업이 쉽습니다. 그러나 비어 있는 Frame이 없을 경우, 어떤 Page를 퇴출하여 Frame을 비워야 합니다.

If no frame can be evicted without allocating a swap slot, but swap is full, panic the kernel. Real OS apply a wide range of policies to recover from or prevent such situations, but these policies are beyond the scope of this project.

어떤 Frame도 스왑 슬롯을 할당하지 않고 퇴출할 수 없고, 스왑이 가득 찬 경우, 커널을 패닉 상태로 만드십시오. 실제 운영체제는 이러한 상황에서 복구하거나 방지하기 위해 다양한 정책을 적용하지만, 이러한 정책은 이 프로젝트의 범위를 벗어납니다.

퇴출 과정은 대략 다음 단계로 구성됩니다

페이지 교체 알고리즘을 사용하여 퇴출할 Frame을 선택합니다. 아래에 설명된 대로, 페이지 테이블의 "accessed" 비트와 "dirty" 비트가 유용할 것입니다.
해당 Frame을 참조하는 모든 페이지 테이블에서 참조를 제거합니다. 공유를 구현하지 않은 경우, 어느 시점에서든 하나의 Page만이 Frame을 참조해야 합니다.
필요한 경우, 페이지를 파일 시스템이나 스왑에 기록합니다. 퇴출된 Frame은 다른 Page를 저장하는 데 사용될 수 있습니다.

🗑️ Accessed and Dirty Bits

x86-64 hardware provides some assistance for implementing page replacement algorithms, through a pair of bits in the page table entry (PTE) for each page. On any read or write to a page, the CPU sets the accessed bit to 1 in the page's PTE, and on any write, the CPU sets the dirty bit to 1. The CPU never resets these bits to 0, but the OS may do so.

x86-64 하드웨어는 각 페이지 테이블 항목(PTE)에 있는 두 개의 비트를 통해 페이지 교체 알고리즘을 구현하는 데 도움을 줍니다. 페이지에 대한 읽기 또는 쓰기가 발생할 때마다 CPU는 페이지의 PTE에서 accessed 비트를 1로 설정하고, 쓰기가 발생할 때마다 CPU는 dirty bit를 1로 설정합니다. CPU는 이러한 비트를 0으로 재설정하지 않지만, 운영체제가 이를 할 수 있습니다.

You need to be aware of aliases, that is, two (or more) pages that refer to the same frame. When an aliased frame is accessed, the accessed and dirty bits are updated in only one page table entry (the one for the page used for access). The accessed and dirty bits for the other aliases are not updated.

Aliases라고 하는 것에 주의해야 합니다. 이것은 두 개 이상의 페이지가 동일한 프레임을 참조하는 경우를 말합니다. Aliased 프레임에 액세스할 때, 액세스된 페이지에 대해서만 페이지 테이블 항목(PTE)에서 accessed 및 dirty 비트가 업데이트됩니다. 다른 별칭에 대한 accessed 및 dirty 비트는 업데이트되지 않습니다.

In Pintos, every user virtual page is aliased to its kernel virtual page. You must manage these aliases somehow. For example, your code could check and update the accessed and dirty bits for both addresses. Alternatively, the kernel could avoid the problem by only accessing user data through the user virtual address.

Pintos 에서는 모든 사용자 가상 페이지가 해당하는 커널 가상 페이지에 별칭으로 지정됩니다. 이 별칭을 어떻게 관리할지 결정해야 합니다. 예를 들어, 코드에서는 각 주소의 access 및 dirty 비트를 확인하고 업데이트할 수 있습니다. 또는 커널은 사용자 데이터에 대해서만 사용자 가상 주소를 통해 액세스하여 이 문제를 회피할 수도 있습니다.

Other aliases should only arise if you implement sharing or if there is a bug in your code. See Section Page Table Accessed and Dirty Bits for details of the functions to work with accessed and dirty bits.

🚥 Managing the Swap Table

The swap table tracks in-use and free swap slots. It should allow picking an unused swap slot for evicting a page from its frame to the swap partition. It should allow freeing a swap slot when its page is read back or the process whose page was swapped is terminated.

스왑 테이블은 사용 중인 스왑 슬롯과 빈 스왑 슬롯을 추적합니다. 이는 페이지를 해당 프레임에서 스왑 파티션으로 퇴출하기 위해 사용되지 않은 스왑 슬롯을 선택할 수 있어야 함을 의미합니다. 페이지가 다시 읽혔거나 페이지를 스왑한 프로세스가 종료될 때 스왑 슬롯을 free할 수 있어야 합니다.

From the vm/build directory, use the command pintos-mkdisk swap.dsk --swap-size=n to create a disk named swap.dsk that contains a n-MB swap partition. Afterward, swap.dsk will automatically be attached as an extra disk when you run pintos. Alternatively, you can tell pintos to use a temporary n-MB swap disk for a single run with --swap-size=n.

vm/build 디렉토리에서 pintos-mkdisk swap.dsk --swap-size=n 명령을 사용하여 n-MB 스왑 파티션을 포함하는 swap.dsk라는 디스크를 생성합니다. 이후에 swap.dsk는 pintos를 실행할 때 자동으로 추가 디스크로 연결됩니다. 또는 --swap-size=n을 사용하여 단일 실행에 대한 임시 n-MB 스왑 디스크를 사용하도록 pintos에 지시할 수 있습니다.

Swap slots should be allocated lazily, that is, only when they are actually required by eviction. Reading data pages from the executable and writing them to swap immediately at process startup is not lazy. Swap slots should not be reserved to store particular pages. Free a swap slot when its contents are read back into a frame.

스왑 슬롯은 lazy 하게 할당되어야 합니다. 이 말은, eviction에 실제로 필요할 때만 할당되어야 한다는 말입니다. 프로세스가 시작될 때 실행파일에서 데이터 페이지들을 읽고 스왑에 곧바로 쓰는 행위는 lazy 하지 못한 행위입니다. 특정 페이지를 저장하기 위해 스왑 슬롯이 예약되어서는 안 됩니다. 스왑 슬롯의 내용물이 프레임으로 읽혀 돌아오면 그 때 스왑 슬롯을 free 해주면 됩니다.

📂 Managing Memory Mapped Files

The file system is most commonly accessed with read and write system calls. A secondary interface is to "map" the file into virtual pages, using the mmap system call. The program can then use memory instructions directly on the file data. Suppose file foo is 0x1000 bytes (4 kB, or one page) long. If foo is mapped into memory starting at address 0x5000, then any memory accesses to locations 0x5000...0x5fff will access the corresponding bytes of foo.

파일 시스템은 주로 read와 write 시스템 호출을 사용하여 액세스됩니다. 보조 인터페이스로는 파일을 가상 페이지로 "매핑"하는 것 입니다. 이는 mmap 시스템 호출을 사용합니다. 프로그램은 그런 다음 파일 데이터에 대해 직접 메모리 명령을 사용할 수 있습니다. 예를 들어, 파일 foo가 0x1000 바이트(4KB 또는 한 페이지) 길이인 경우입니다. 파일 foo가 주소 0x5000에서 메모리에 매핑되어 있다고 가정해보세요. 그럼 주소 0x5000부터 0x5fff까지의 모든 메모리 액세스는 foo의 해당 바이트에 액세스하게 됩니다.

#include 
#include 

int main (int argc UNUSED, char *argv[])
{
  void *data = (void *) 0x10000000;                 /* Address at which to map. */
  int fd = open (argv[1]);                          /* Open file. */
  void *map = mmap (data, filesize (fd), 0, fd, 0); /* Map file. */
  write (1, data, filesize (fd));                   /* Write file to console. */
  munmap (map);                                     /* Unmap file (optional). */
  return 0;
}

🔖 Sitation 📝

권영진(2022). "Virtual Memory.pptx"
원문 출처: https://casys-kaist.github.io/pintos-kaist/appendix/virtual_address.html

[CS-KEYWORD] 32bit OS 🆚 64bit OS

Tue, 21 May 2024 03:54:48 GMT

추억의 Window 98....

역사 및 발전

32bit OS는 1980년대부터 1990년대까지 널리 사용되었습니다. 대표적인 32bit OS 로는 Microsoft Windows 95/98/NT , Apple Mac OS X, Linux 등이 있습니다.

하지만 2000년대 이후에는 64 bit OS가 점차 대중화되었습니다. 64비트 운영 체제는 더 많은 메모리를 사용할 수 있고, 주소 공간 분할 문제를 해결하며, 보안이 강화되었습니다👮🏻‍♂️.

현재 대부분의 컴퓨터는 64 bit OS를 사용하고 있으며, 32 bit OS는 주로 오래된 컴퓨터 또는 특수 용도로 사용됩니다.

🕵 32비트와 64비트 운영체제 이해하기 🕵🏻‍♂️

1. 정의와 기본 개념

CPU의 bit width:
- 32비트: CPU register width가 32비트인 경우를 말합니다. 이는 CPU가 한 번에 처리할 수 있는 데이터 양과 직접 접근할 수 있는 메모리 양에 영향을 줍니다.
- 64비트: CPU register width가 64비트인 경우로, 더 많은 데이터 처리 능력과 훨씬 더 큰 메모리 주소 공간을 가집니다.

CPU bit width🤔?

CPU bit width는 CPU 레지스터가 한 번에 저장할 수 있는 데이터의 비트 수를 의미합니다. 마치 작업대의 폭(책상의 넓이!)과 같다고 생각하면 됩니다. 일반적으로 레지스터 너비는 8비트, 16비트, 32비트, 64비트 등이 있습니다.

2. 메모리 주소 지정

주소 지정 가능한 메모리 공간:
- 32비트 시스템:
  - 최대 주소 지정 가능 메모리 = ( 2³² ) byte = 4 GB.
  - 실제로는 시스템 리소스 예약(예: 하드웨어 메모리 매핑) 때문에 이보다 약간 낮을 수 있습니다.
- 64비트 시스템:
  - 이론적인 최대 주소 지정 가능 메모리 = ( 2⁶⁴) byte = 16 엑사바이트.

3. 레지스터 저장 및 데이터 처리

CPU 레지스터:
- 32비트 CPU: register width 로 32비트 값을 가질 수 있습니다.
- 64비트 CPU: register width 로 64비트 값을 가질 수 있습니다.

4. 성능 및 애플리케이션 호환성

성능 영향:
- 32비트 🪺: 메모리 요구가 적고 간단한 계산을 필요로 하는 애플리케이션에 적합합니다.
- 64비트 🏋🏾‍♂️: 더 많은 메모리와 복잡한 계산을 필요로 하는 애플리케이션에 대해 향상된 성능을 제공합니다:
  - 더 많은 메모리를 사용할 수 있어 디스크 스와핑 필요성을 줄여줍니다.
  - 더 많은 범용 레지스터와 더 큰 정수 크기로 처리 능력을 향상시킵니다.
소프트웨어 호환성:
- 32비트 OS: 32비트 애플리케이션을 네이티브로 실행할 수 있습니다.
- 64비트 OS: 32비트와 64비트 애플리케이션 모두 실행할 수 있습니다(에뮬레이터가 있는 경우).

5. 운영체제와 애플리케이션 제약

운영체제 한계:
- 32비트 OS: 4 GB 이상의 RAM을 네이티브로 지원할 수 없으므로, 메모리 집약적인 애플리케이션의 성능이 제한됩니다, 또한 32비트 운영 체제는 64비트 운영 체제보다 보안 취약점에 더 취약할 수 있습니다.
- 64비트 OS: 훨씬 더 많은 양의 RAM을 주소 지정할 수 있어, 요구가 많은 애플리케이션과 서비스에 더 나은 성능을 제공합니다.
애플리케이션 제약:
- 32비트 애플리케이션: 주소 공간이 4 GB로 제한되어 있어(실제로는 더 적음), 큰 데이터셋이나 복잡한 계산에는 제약이 있습니다.
- 64비트 애플리케이션: 더 많은 메모리를 사용할 수 있으며, 넓은 레지스터의 이점을 통해 향상된 성능을 제공합니다.

6. 32 bit OS 🔜 64 bit OS

도입 및 사용:
- 32비트에서 64비트로의 전환은 더 많은 메모리와 향상된 성능을 필요로 하는 현대 애플리케이션의 요구에 의해 이루어졌습니다.
- 대부분의 OS 와 HW 는 이제 주로 64비트 아키텍처를 지원하며, 📈
- 32비트 시스템*에 대한 지원은 점차 줄어들고 있습니다. 📉
컴파일러와 소프트웨어 설계:
- 컴파일러: 현대 컴파일러는 32비트와 64비트 바이너리를 모두 생성할 수 있으며, 아키텍처별 최적화를 포함하는 경우가 많습니다.
- 소프트웨어 설계: 개발자는 32비트와 64비트 시스템 간 전환 시 데이터 크기와 포인터를 적절히 처리하여 정수 오버플로우나 메모리 접근 오류와 같은 문제를 방지해야 합니다.

요약

32비트와 64비트 운영체제의 주요 차이점은 메모리 처리 및 데이터 처리 능력에 있습니다.
32비트 시스템은 4GB 메모리 한계와 더 좁은 CPU 레지스터로 인해 성능이 제한됩니다.
64비트 시스템은 더 많은 메모리를 주소 지정할 수 있으며, 더 넓은 레지스터로 인해 더 나은 성능과 복잡한 계산 처리 능력을 갖추고 있습니다.

😊 도움을 준 고마운 사람들 🥳

김광윤 : https://github.com/leorivk 정승호 : https://github.com/seungho-jg 황연경 : https://github.com/yunnn426 전병준 : https://github.com/jun9898

[CS-KEYWORD] File discriptor

Tue, 21 May 2024 01:34:25 GMT

유닉스 시스템의 File discriptor 📁

File discriptor란?

유닉스 시스템에서 File discriptor는 프로세스가 파일, 디렉토리, 소켓 등 다양한 객체에 접근하기 위한 중요한 개념입니다. 프로세스가 파일을 열면 kernel 은 해당 프로세스에 대한 File discriptor Table 에서 사용하지 않는 가장 작은 값을 할당하고, 이 값을 사용하여 파일을 식별하고 관리합니다.

File discriptor는 구체적으로 어떻게 쓰일까요? 🤔

프로세스 관리: 각 프로세스는 고유한 파일 디스크립터 테이블 을 가지고 있으며, 해당 테이블에는 프로세스가 열어둔 모든 파일과 관련된 정보가 저장됩니다.
파일 식별: 파일 디스크립터는 단순히 정수값일 뿐이지만, 프로세스 내에서 열린 파일을 고유하게 식별하는 데 사용됩니다.
파일 정보 저장: 파일 디스크립터 테이블의 각 항목에는 파일의 현재 위치, 상태, 권한 등 다양한 정보가 저장됩니다.
시스템 호출 연동: 프로세스가 read(), write(), close()와 같은 시스템 호출을 사용하여 파일에 접근할 때 파일 디스크립터 값을 사용하여 해당 파일을 식별합니다.

File discriptor를 그림으로 이해해봐요 🎨 🤓

1. 파일 디스크립터 테이블:

프로세스 당 하나씩 존재하며, 프로세스가 열어둔 모든 파일을 관리합니다.
정수 배열로 구현되며, 각 인덱스는 파일 디스크립터 값에 해당합니다.
각 항목은 파일이나 리소스에 대한 메타데이터를 저장합니다.

2. 파일 테이블:

파일 시스템 전체에 하나만 존재하며 (global) , 파일 시스템 내에서 열린 모든 파일을 추적합니다.
각 열린 파일에 대해 하나씩 파일 테이블 엔트리가 생성됩니다.
엔트리는 파일의 메타데이터를 저장합니다.

3. I-node 테이블:

유닉스 파일 시스템에서 사용하는 index 블록을 의미합니다.
파일의 데이터 블록을 가리키는 역할을 합니다.
파일의 크기, 소유자, 권한, 생성 시간, 수정 시간 등을 저장합니다.
파일 디스크립터 테이블의 항목과 연결되어 있으며, 각 파일에 대한 I-node의 인덱스를 포함합니다.

표준 파일 디스크립터 ✨

프로그램이 실행될 때 기본적으로 할당되는 세 개의 표준 파일 디스크립터가 있습니다.

STDIN_FILENO (0): 표준 입력 (키보드 입력)을 나타냅니다.
STDOUT_FILENO (1): 표준 출력 (콘솔 출력)을 나타냅니다.
STDERR_FILENO (2): 표준 에러 출력 (오류 메시지 출력)을 나타냅니다.
### File discriptor 의 작동 방식 ⚙️

프로세스가 파일을 열면 운영 체제는 해당 프로세스에 대한 File discriptor Table 에서 사용하지 않는 가장 작은 값을 할당합니다.

프로세스가 File discriptor 값을 사용하여 시스템 호출을 호출하면 운영 체제는 해당 값을 사용하여 Global File Table 에서 해당 파일을 식별합니다.

파일 테이블 엔트리에는 파일의 I-node 인덱스가 포함되어 있으며, 운영 체제는 I-node 테이블을 사용하여 파일의 실제 데이터 블록에 접근합니다.

😊 도움을 준 고마운 사람들 🥳 김광윤 : https://github.com/leorivk 정승호 : https://github.com/seungho-jg 황연경 : https://github.com/yunnn426 전병준 : https://github.com/jun9898

[CS-KEYWORD] User Stack and Kernel Stack

Mon, 20 May 2024 15:34:09 GMT

🤔 Question 🙋🏻

🧑🏻‍🎓 : 나의 궁금증

각 프로세스에는 사용자 모드 스택과 커널 모드 스택이 있나요? 아니면 각 프로세스에는 사용자 모드 스택이 있고, 모든 프로세스가 공유하는 커널 모드 스택은 유일한가요?

✅ Answer 👨‍🏫

🧑🏻‍🔬 :

kernel stack 과 user stack은 cpu processing mode에 따라 개별적으로 사용됩니다. 이 분할 상태를 통해 우리는 두 가지 모드(Kernel, User)를 쉽게 이해할 수 있습니다.

user stack과 kernel stack이라는 두 개의 메모리 영역은 프로세스 실행 및 시스템 상호 작용 처리에 필수적인 기능을 수행합니다.

⛹🏻‍♂️ User Stack : User Mode 프로세스의 작업 공간 🏋🏻

User Stack 은 User Mode 에서 실행되는 각 프로세스에 할당된 개별 메모리 영역입니다. User Mode 코드 실행에 필수적인 다양한 요소를 위한 동적 저장 공간 역할을 합니다.

사용자 스택의 주요 구성 요소

지역 변수: 함수 내에서 선언된 변수와 해당 값은 사용자 스택에 저장됩니다.
함수 매개 변수: 함수 호출 전에 함수에 전달되는 인수가 사용자 스택에 push됩니다.
프레임 포인터: frame pointer 는 스택 프레임 구조를 유지하여 중첩된 함수 호출에서 지역 변수 및 함수 매개 변수를 식별할 수 있도록 합니다.
반환 주소: 함수실행이 끝나면 호출자의 위치를 나타내는 반환 주소가 User Stack에 push 됩니다.
자동 변수: 함수 매개 변수 및 지역 변수를 포함하여 블록 범위 내에서 선언된 변수가 사용자 스택에 저장됩니다.
임시 값: 중간 결과 또는 함수 호출 간에 전달되는 값과 같은 임시 데이터가 User Stack에 저장될 수 있습니다.

🧱 블록 범위가 뭐지?!

함수 블록: 함수 블록은 {로 시작하고 }로 끝나는 코드 블록입니다. 함수 내에서 선언된 모든 변수는 해당 함수 블록의 범위에 속합니다.

조건문 블록: if, else, while과 같은 조건문 블록은 ()로 묶인 코드 블록입니다. 조건문 블록 내에서 선언된 변수는 해당 블록의 범위에 속합니다.
반복문 블록: for, while, do-while과 같은 반복문 블록은 ()로 묶인 코드 블록입니다. 반복문 블록 내에서 선언된 변수는 해당 블록의 범위에 속합니다.

User Stack 작업

Stack 작업을 하려면 Stack Pointer 가 있어야 겠죠? Stack Pointer 에 대해 알아봅시다! 📚 👈🏻

x86 아키텍처: ESP (Extended Stack Pointer): 32비트 모드에서 사용되는 스택 포인터입니다. RSP (Register Stack Pointer): 64비트 모드에서 사용되는 스택 포인터입니다. (PintOS) 에서는 RSP를 쓰겠군!

스택 포인터: Stack Pointer 는 특수 레지스터로서 User Stack의 현재 상단을 추적하여 스택 요소에 대한 효율적인 액세스를 허용합니다.
스택 푸시: 새로운 데이터가 스택 포인터를 감소시키고 데이터를 새 주소에 저장하여 스택의 상단에 추가됩니다.
스택 팝: 데이터가 스택 포인터 주소의 값을 읽고 스택 포인터를 증가시켜 스택의 상단에서 검색됩니다.

💂🏻 Kernel Stack: Kernel Mode 작업의 토대 👮🏻‍♂️

Kernel Stack은 OS 에 의해 독점적으로 사용되는 권한이 부여된 메모리 영역입니다. 커널 코드 실행과 시스템 상호 작용을 위한 작업 공간 역할을 합니다.

커널 스택의 주요 구성 요소:

커널 코드 변수: 커널 함수 내에서 선언된 변수와 해당 값이 Kernel Stack에 저장됩니다.
시스템 호출 매개 변수 및 반환 값: 시스템 호출에 전달되는 인수와 해당 반환 값이 Kernel Stack에 저장됩니다.
커널 데이터 포인터: 프로세스 디스크립터 및 메모리 관리 정보와 같은 커널 데이터 구조에 대한 포인터가 커널 스택에 저장됩니다.
하드웨어 컨텍스트 정보: 인터럽트 발생 시점의 하드웨어 컨텍스트, CPU 레지스터, 프로그래머 카운터 등이 커널 스택에 저장됩니다.

🤯 프로세스 디스크립터? 🔛

파일 디스크립터 는 들어봤는데.. 그렇다면 FD랑 유사한 개념인가?!

프로세스 디스크립터 에 대해 간략하게 알아봅시다!

*PID (Process Identifier) : * 프로세스를 식별하는 고유한 숫자입니다.

메모리 할당 정보 : 프로세스에 할당된 메모리 영역의 주소, 크기 및 사용 권한을 포함합니다.
파일 디스크립터 : 프로세스가 열어둔 파일과 관련된 정보를 포함합니다.
프로세스 우선순위 : 프로세스가 CPU 리소스를 얼마나 자주 사용할지 결정하는 값입니다.
부모 프로세스 : 프로세스를 생성한 프로세스의 PID를 포함합니다.
자식 프로세스 : 해당 프로세스를 생성한 자식 프로세스의 PID 목록을 포함합니다.
프로세스 상태 : 실행 중, 대기 중, 종료된 등 프로세스의 현재 상태를 나타냅니다.
PID 🥊 🆚 🤜🏻 Process Discriptor
PID는 Process Discriptor 의 고유한 식별자 역할을 하며, Kernel Stack 에 저장된 커널 데이터 포인터를 통해 Process Discriptor에 직접 접근할 수 있도록 합니다. 이를 통해 운영 체제는 프로세스를 효율적으로 관리하고 제어할 수 있습니다.

** 🛠️ 커널 스택 작업 🔧 **

스택 전환: 프로세스가 User Mode 에서 Kernel Mode로 전환될 때 Kernel Stack Pointer 으로 전환되어 User Mode 데이터에 대한 무단 액세스를 방지하고 격리합니다. 과정: 시스템 호출(syscall)이나 인터럽트가 발생하면, CPU는 현재 프로세스의 사용자 스택 포인터(esp 또는 rsp)를 저장하고, 커널 스택 포인터로 전환합니다. 이때 커널 스택은 프로세스 컨텍스트와 하드웨어 상태(레지스터 값 등)를 저장합니다.
인터럽트 처리: 인터럽트가 발생하면 CPU는 현재 실행 중인 코드의 상태를 저장하고 interupt handler를 호출합니다. 이 저장 작업은 커널 스택을 통해 이루어집니다.
 **인터럽트 처리 과정! 📈 **

 - **인터럽트 발생**: 하드웨어 장치(예: 디스크, 네트워크 카드 등)가 CPU에 인터럽트 신호를 보냅니다.
 - **현재 상태 저장**: CPU는 현재 프로세스의 레지스터 값, 프로그램 카운터(PC), 플래그 레지스터 등을 커널 스택에 저장합니다. 이는 인터럽트 발생 시점의 상태를 보존하여, 인터럽트 처리 후 원래 작업을 재개할 수 있도록 합니다.
 - **인터럽트 핸들러 실행**: 커널은 해당 인터럽트에 대한 인터럽트 핸들러(Interrupt Service Routine, ISR)를 실행합니다. 인터럽트 핸들러는 해당 인터럽트에 대한 특정 작업을 수행합니다.
 - **인터럽트 처리 완료**: 인터럽트 핸들러가 작업을 완료하면, CPU는 커널 스택에 저장된 이전 상태를 복원합니다.
 - **원래 작업 재개**: CPU는 인터럽트가 발생하기 전의 상태로 복귀하여, 중단되었던 사용자 모드의 작업을 계속 수행합니다.

User Stack 🆚 Kernel Stack: 간단 요약

User Stack

사용자 모드 코드에서 사용되는 스택
함수 매개 변수, 지역 변수, 임시 변수 등을 저장
각 함수 호출마다 생성 및 해제
사용자 모드 데이터에만 접근 가능

*Kernel Stack *

커널 모드 코드에서 사용되는 스택
시스템 호출 처리, 인터럽트 처리, 스택 전환 등에 사용
운영 체제 내내 유지
커널 데이터 및 사용자 모드 데이터에 접근 가능

스택 전환 과정에서 User Stack 포인터는 Kernel Stack 포인터로 전환되어 사용자 모드 데이터에 대한 무단 액세스를 방지합니다🙅🏻‍♂️

** 📊 정리정리 🧹 **

구분	User Stack	Kernel Stack
사용 모드	사용자 모드	커널 모드
용도	함수 실행, 변수 저장	시스템 호출, 인터럽트 처리, 스택 전환
생명 주기	함수 호출마다 생성 및 해제	운영 체제 내내 유지
데이터 액세스	사용자 모드 데이터만	커널 데이터 및 사용자 모드 데이터

[CS-KEYWORD] Register vs Memory

Mon, 20 May 2024 08:09:08 GMT

Register

Registers are the smallest data holding elements that are built into the processor itself. These are the memory locations that are directly accessible by the processor. It may hold an instruction, a storage address or any kind of data such as a bit sequence or individual characters. For example, an instruction may specify that the contents of two defined registers be multiplied together and then placed in a specific register.

Example: Accumulator register, Program counter, Instruction register, Address register, etc.

register는 프로세서 자체에 내장된 가장 작은 memory 이며, register는 프로세서에 의해 직접 액세스됩니다.
명령어, storage address 또는 비트 시퀀스 또는 개별 문자와 같은 임의의 종류의 데이터를 보유할 수 있습니다. 예를 들어, 명령어를 통해 두 개의 레지스터의 값에 대한 곱셈 값이 특정 레지스터에 저장될 수 있습니다.
ex) : 누산 레지스터, PC (프로그램 카운터), 명령 레지스터, 주소 레지스터 등.

Memory

Memory is a hardware device used to store computer programs, instructions and data. The memory that is internal to the processor is a primary memory (RAM), and the memory that is external to the processor is a secondary memory (Hard Drive). Memory can also be categorized on the basis of volatile and non-volatile memory. Volatile memory is memory that loses its contents when the computer or hardware device loses power. RAM (Random Access Memory) is an example of volatile memory. Non-volatile memory is the memory that keeps its contents even if power gets lost. EPROM is an example of non-volatile memory.

Memory는 컴퓨터 프로그램, 명령어 및 데이터를 저장하는 데 사용되는 하드웨어 장치입니다.
프로세서와 연결된 메모리는 Ram 이고, 프로세서의 외부에 있는 메모리는 보조 메모리인 Hard Drive 입니다. 메모리는 휘발성 메모리와 비휘발성 메모리를 기준으로 분류할 수도 있습니다.

휘발성 메모리

휘발성 메모리는 컴퓨터나 하드웨어 장치의 전원이 차단되면 내용물이 손실되는 메모리입이며 RAM이 이에 해당합니다.
비휘발성 메모리
비휘발성 메모리는 전원이 차단되어도 내용물을 유지하는 메모리이며, EPROM(Erasable Programmable Read-Only Memory)은 비휘발성 메모리의 한 예입니다.

🦾 결론

⚖️ 레지스터

register는 CPU 내부에 있는 작고 빠른 메모리 셀입니다. 각각의 register는 CPU가 직접 접근할 수 있어 매우 빠른 데이터 액세스를 제공합니다.
register는 현재 실행 중인 프로세스나 명령어에서 필요한 데이터를 보관하고, CPU가 이를 직접 사용하여 산술 논리 연산을 수행합니다.
register는 CPU가 다음에 실행할 명령어의 주소를 저장하는 Program Counter(PC)와 같은 중요한 제어 정보를 보유합니다.
register의 용량은 CPU 아키텍처에 따라 다르지만 보통 몇 비트에서 수십 비트까지입니다. 예를 들어, 32비트 레지스터는 32비트 데이터를 저장할 수 있습니다.

🎛️ 메모리

memory는 컴퓨터에서 데이터와 명령어를 저장하는 공간입니다. CPU 외부에 위치하며, register보다는 느리지만 대용량의 데이터를 저장할 수 있습니다.
주로 RAM(Random Access Memory)과 ROM(Read-Only Memory)으로 구성됩니다. RAM은 프로그램 실행 중에 필요한 데이터를 저장하고, ROM은 프로그램이 실행되는 데 필요한 기본 코드와 데이터를 저장합니다.
memory는 주소 버스를 통해 CPU와 통신하고 데이터를 주고받습니다. CPU가 메모리에 접근할 때마다 memory address를 참조하여 데이터를 읽거나 쓸 수 있습니다.
보조 기억 장치인 hard drive 등도 memory의 일종으로 데이터를 장기 저장하고 프로그램에 필요한 정보를 제공합니다.

🗂️ 테이블로 정리

N	레지스터	메모리
1	레지스터는 CPU가 현재 처리 중인 피연산자 또는 instruction을 보유합니다.	메모리는 CPU에서 현재 실행 중인 프로그램이 필요로 하는 명령과 데이터를 보유합니다.
2	레지스터는 대략 32비트에서 64비트 정도의 소량의 데이터를 보유합니다.	컴퓨터의 메모리는 몇 GB에서 TB까지 다양할 수 있습니다.
3	CPU는 한 클록 주기에서 여러 작업을 수행할 수 있는 레지스터의 내용을 처리할 수 있습니다.	CPU는 레지스터보다 메모리에 접근하는 속도가 느립니다.
4	누산기 레지스터, 프로그램 카운터, 명령 레지스터, 주소 레지스터 등이 있습니다.	메모리의 종류에는 RAM 등이 있습니다.
5	레지스터는 CPU에 의해 제어될 수 있습니다. 즉, 메모리와 달리 CPU에 의해 직접적으로 수정될 수 있습니다.	메모리에 데이터를 저장하거나 검색하는 것은 프로그램이나 운영 체제 등의 소프트웨어에 의해 관리됩니다.
6	레지스터는 메모리보다 빠릅니다.	RAM은 레지스터보다 훨씬 느립니다.

[출처] : "Difference between Register and Memory" - GeeksforGeeks. https://www.geeksforgeeks.org/difference-between-register-and-memory/

[CS-KEYWORD] User Mode and Kernel Mode

Mon, 20 May 2024 07:02:05 GMT

User Mode

When a Program is booted up on an Operating system let’s say Windows, then it launches the program in user mode. When a user-mode program requests to run, a process and virtual address space (address space for that process) are created for it by Windows. User-mode programs are less privileged than user-mode applications and are not allowed to access the system resources directly. For instance, if an application under user mode wants to access system resources, it will have to first go through the Operating system kernel by using syscalls.

운영 체제 (Windows로 가정)에서 User Mode Program이 실행을 요청하면 OS는 process 및 Virtual address space이 만들어집니다. User Mode Program은 User Mode applications보다 권한이 낮고 시스템 리소스에 직접 액세스할 수 없습니다. 예를 들어, 사용자 모드 응용 프로그램이 시스템 리소스에 액세스하려면 먼저 syscall을 사용하여 OS kernel을 통과해야 합니다.

User Mode Program 🆚 User Mode Applications

공통점

실행 환경
- 모두 운영 체제의 사용자 모드에서 실행됩니다.
- 사용자 모드는 시스템의 커널 모드와 대비되는 개념으로, 사용자 모드에서는 프로그램이 직접 하드웨어 자원에 접근하지 않고 운영 체제의 API를 통해 간접적으로 접근합니다.
안정성 및 보안
- 사용자 모드에서 실행되는 프로그램들은 시스템 전체에 영향을 미치지 않으며, 개별 프로그램이 문제가 발생하더라도 해당 프로그램만 종료될 수 있습니다.
- 이는 시스템의 안정성과 보안을 유지하는 데 중요한 역할을 합니다.
자원 접근 방식
- 사용자 모드 프로그램과 애플리케이션 모두 운영 체제의 API를 사용하여 시스템 자원에 접근합니다.
- 직접적인 하드웨어 자원 접근은 커널 모드를 통해 이루어집니다.

차이점

1. 용어의 범위

사용자 모드 프로그램(User Mode Program):
- 사용자 모드에서 실행되는 모든 종류의 프로그램을 포함하는 광범위한 용어입니다.
- 시스템 유틸리티나 백그라운드 서비스 등도 사용자 모드 프로그램에 포함될 수 있습니다.
사용자 모드 애플리케이션(User Mode Applications):
- 일반적으로 우리가 사용하는 응용 프로그램을 지칭합니다.
- 예를 들어, 웹 브라우저, 워드 프로세서, 게임 등이 사용자 모드 애플리케이션에 해당합니다.

2. 목적 및 사용 사례

사용자 모드 프로그램(User Mode Program):
- 시스템 관리 및 유지 보수와 관련된 프로그램이 포함될 수 있습니다.
- 사용자 인터페이스가 없거나 최소한의 인터페이스를 가진 프로그램들도 포함됩니다.
사용자 모드 애플리케이션(User Mode Applications):
- 사용자와 직접 상호작용하는 응용 프로그램을 지칭합니다.
- 직관적인 그래픽 사용자 인터페이스(GUI)를 가지고 있으며, 사용자가 특정 작업을 수행하기 위해 사용하는 프로그램입니다.

3. 예시

사용자 모드 프로그램(User Mode Program):
- 시스템 모니터링 도구, 백업 소프트웨어, 네트워크 유틸리티 등이 포함될 수 있습니다.
사용자 모드 애플리케이션(User Mode Applications):
- Microsoft Word, Google Chrome, Adobe Photoshop, 게임 소프트웨어 등이 이에 해당합니다.

✅ 사용자 모드 프로그램과 사용자 모드 애플리케이션은 사용자 모드에서 실행된다는 공통점이 있습니다. ❌ 용어의 범위와 목적, 사용 사례에 있어 차이점이 존재합니다.

User Mode Program은 User Mode applications보다 권한이 낮고 시스템 리소스에 직접 액세스할 수 없습니다.

🤔 위 문장이 시사하는 바가 무엇일까? 예시를 통해 알아보자!

백그라운드 업데이트 프로그램 vs. 웹 브라우저

백그라운드 업데이트 프로그램 (User Mode Program):
- 개요: 업데이트 프로그램은 운영 체제나 소프트웨어의 업데이트를 자동으로 확인하고 다운로드합니다. 사용자가 직접 이 프로그램을 실행하거나 상호작용하지 않습니다.
- 권한: 시스템 자원에 대한 접근 권한이 제한적입니다. 업데이트 파일을 다운로드하고 설치하는 기본 기능만 수행합니다.
- 보안과 안정성: 많은 권한을 가지지 않기 때문에 시스템 전체에 미치는 영향이 적고, 잠재적인 보안 취약점이 줄어듭니다.
웹 브라우저 (User Mode applications):
- 개요: 사용자가 직접 사용하는 응용 프로그램으로, 인터넷 서핑, 이메일 확인, 온라인 쇼핑 등을 합니다.
- 권한: 더 많은 시스템 자원에 접근할 수 있는 권한이 필요합니다. 예를 들어, 네트워크에 접근하거나 파일을 다운로드하고 저장할 수 있습니다.
- 보안과 안정성: 다양한 기능을 제공하기 때문에 더 많은 권한이 필요하며, 사용자가 직접 상호작용하므로 보안에 신경을 써야 합니다.

위의 예시를 통해 User Mode Program 이 제한된 권한을 가지는 이유와, User Mode Application이 더 많은 권한을 필요로 하는 이유를 쉽게 이해할 수 있습니다 😊

Kernel Mode

The kernel is the core program on which all the other operating system components rely, it is used to access the hardware components and schedule which processes should run on a computer system and when, and it also manages the application software and hardware interaction. Hence it is the most privileged program, unlike other programs, it can directly interact with the hardware. When programs running under user mode need hardware access for example webcam, then first it has to go through the kernel by using a syscall, and to carry out these requests the CPU switches from user mode to kernel mode at the time of execution. After finally completing the execution of the process the CPU again switches back to the user mode.

Kernel은 다른 모든 OS 구성 요소들이 의존하는 핵심 프로그램으로, 하드웨어 구성 요소들에 접근하여 컴퓨터 시스템에서 어떤 프로세스를 언제 실행해야 하는지 스케쥴링하며, 응용 소프트웨어와 하드웨어 상호 작용도 관리합니다.
따라서 Kernel은 다른 프로그램과 달리 하드웨어와 직접 상호 작용할 수 있는 가장 특권이 있는 프로그램입니다. 웹캠과 같은 User Mode에서 실행되는 프로그램이 하드웨어 액세스가 필요할 때, 먼저 syscall 을 사용하여 kernel을 통과해야 하며, 이러한 요청을 수행하기 위해 CPU는 실행 시 User Mode에서 kernel Mode로 전환됩니다. 최종적으로 프로세스 실행을 완료한 후 CPU는 다시 User Mode 로 전환됩니다.

정리

Difference Between Kernel Mode and User Mode

Criteria	Kernel Mode	User Mode
Access to Resources	Kernel mode에서 프로그램은 시스템 자원에 직접적이고 무제한적인 접근이 가능합니다.	User mode에서 응용 프로그램은 시스템 자원에 직접 접근할 수 없으며, 자원에 접근하기 위해서는 시스템 호출을 해야 합니다.
Interruptions	Kernel mode에서 인터럽트가 발생하면 전체 OS가 다운될 수 있습니다.	User mode에서 인터럽트가 발생하면 단일 프로세스만 실패합니다.
Modes	Kernel mode는 master mode, privileged mode, system mode로도 알려져 있습니다.	User mode는 unprivileged mode, restricted mode, slave mode로도 알려져 있습니다.
Virtual address space	Kernel mode에서는 모든 프로세스가 단일 가상 주소 공간을 공유합니다.	User mode에서는 모든 프로세스가 별도의 가상 주소 공간을 가집니다.
Level of privilege	Kernel mode에서는 응용 프로그램이 더 많은 권한을 가집니다.	User mode에서는 응용 프로그램이 더 적은 권한을 가집니다.
Restrictions	Kernel mode는 사용자 프로그램과 kernel 프로그램 모두에 접근할 수 있기 때문에 제한이 없습니다.	User mode는 kernel 프로그램에 접근해야 하며, 직접 접근할 수 없습니다.
Mode bit value	Kernel mode의 모드 비트 값은 0입니다.	User mode의 모드 비트 값은 1입니다.
Memory References	Kernel mode에서는 두 메모리 영역을 모두 참조할 수 있습니다.	User mode에서는 User mode에 할당된 메모리만 참조할 수 있습니다.
System Crash	Kernel mode에서 시스템 크래시는 심각하며 상황을 더욱 복잡하게 만듭니다.	User mode에서 시스템 크래시는 단순히 세션을 재개함으로써 복구될 수 있습니다.
Access	이 모드에서는 필수 기능만 작동할 수 있습니다.	User 프로그램은 이 모드에서 접근하고 실행할 수 있습니다.
Functionality	Kernel mode는 시스템의 모든 메모리 블록을 참조할 수 있으며, CPU에게 명령 실행을 지시할 수 있어 매우 강력하고 중요한 모드입니다.	User mode는 표준 및 일반적인 보기 모드로, 정보는 스스로 실행되거나 메모리 블록을 참조할 수 없으며, 이를 위해 API(Application Protocol Interface)가 필요합니다.

출처: GeeksforGeeks - Difference between User Mode and Kernel Mode

부동소수점과 고정소수점

Sun, 19 May 2024 08:11:13 GMT

고정 소수점 (Fixed-point)

고정 소수점 수는 소수점의 위치가 고정된 형태로 소수를 표현합니다. 고정된 비트를 정수 부분과 소수 부분에 나누어 사용합니다. 예를 들어, 16비트 고정 소수점 형식에서 8비트를 정수 부분, 나머지 8비트를 소수 부분에 사용할 수 있습니다.

장점

고정된 소수점 위치로 인해 연산 속도가 빠르고 단순하다.
특정 상황(예: 임베디드 시스템)에서 메모리와 성능 효율이 좋다.

단점

표현할 수 있는 수의 범위가 제한적이다.
정밀도가 고정되어 있어, 매우 큰 수나 매우 작은 수를 표현하는 데 한계가 있다.

부동 소수점 (Floating-point)

부동 소수점 수는 소수점의 위치가 가변적입니다. 이는 더 넓은 범위의 수를 더 큰 정밀도로 표현할 수 있게 합니다. IEEE 754 표준에 따라, 일반적으로 32비트(단정밀도)와 64비트(double! 배정밀도) 형식이 사용됩니다.

장점

매우 큰 수와 매우 작은 수를 표현할 수 있는 넓은 범위를 가진다.
높은 정밀도로 다양한 계산을 수행할 수 있다.

단점

연산이 복잡하고 느리다.
특정 숫자 범위 내에서만 정밀도가 높다.

비교

특성	고정 소수점	부동 소수점
표현 범위	제한적	넓음
정밀도	고정	가변
연산 속도	빠름	느림
메모리 사용	적음	많음
복잡도	단순	복잡

🪶 부동 소수점의 표현 방식

부동 소수점의 표현 방식은 fixed-point 와 유사하다, 하지만 정규화 과정 과 Bias 를 추가하는 과정이 따라 붙는다.

정규화 과정을 통해 지수 표기법 으로 이진수를 나타낸 뒤, 지수 와 bias(32bit) = 127 을 더하여 bias를 추가한다. 위의 그림 예제에서는 지수가 3 이다. (10³)

🦉 Bias를 추가하는 이유

그렇다면 자연스럽게 드는 의문이 있을 것이다. Bias는 왜 추가하는 것일까?

부동 소수점 계산에서 Bias 지수를 사용하는 주요 이유는 바로 덧셈 회로 활용을 통한 효율적인 계산 때문 입니다.

덧셈 회로와 🆚 뺄셈 회로

🪃 구조적 차이

덧셈 회로는 간단한 XOR와 AND 게이트로 구성되어 있습니다. 뺄셈 회로는 XOR, AND, NOT 게이트를 조합하여 복잡한 빌림(Borrow) 비트를 계산합니다.

🎢 연산 복잡성

덧셈 회로는 각 비트를 독립적으로 더하는 간단한 연산입니다. 뺄셈 회로는 빌림(Borrow) 비트를 계산해야 하므로 조금 더 복잡한 연산이 필요합니다.

💻 하드웨어 구현

덧셈 회로는 간단한 구조로 이루어져 있어 하드웨어에서 쉽게 구현할 수 있습니다. 뺄셈 회로는 빌림(Borrow) 비트를 계산해야 하므로 구현이 더 복잡합니다.

🙆‍♂️ 따라서 Bias를 추가하는 이유는 아래와 같습니다.

Bias exponent를 사용하면 부호 없는 수 표현을 통해 실수 연산을 편리하게 할 수 있게 되고, 덧셈 회로의 이점을 이용해서 연산이 단순해집니다.

float의 경우 127(0111_1111)을 더하고, double의 경우 1023(0011_1111_1111)을 지수에 더합니다.

🐈 꼬리에 꼬리를 무는 질문...

해당 포스트는 처음에 PintOS의 커널단에서는 Fixed-point 연산을 통해 실수를 계산한다고 한 것에서부터 시작됐다.

fixed-point 방식과 floating-point 방식의 차이.
floating-point로 표현하는 과정에서 나타난 bias 지수 표기법

앞으로 PintOS를 하면서 궁금한 것이 생기면 이렇게 조금씩 정리를 해놔야겠다! 😼

😊 도움을 준 고마운 사람들 🥳

김광윤 : https://github.com/leorivk 정승호 : https://github.com/seungho-jg 황연경 : https://github.com/yunnn426 전병준 : https://github.com/jun9898

[PintOS] 1-3 Advanced Scheduler (구현)

Sat, 18 May 2024 06:07:31 GMT

Main goal

Implement 4.4 BSD scheduler MLFQ like scheduler without the queues.
MLFQS (Multi-level Feedback Queue Scheduling) 와 유사한 스케쥴러를 구현한다.

🙋🏻 왜 `MLFSQS like` 일까?

❌ Priority의 값에 따라 Ready-Queue 여럿 존재하며, 순위를 기준으로 Ready-Queue를 실행하며, Queue에 여러 쓰레드가 존재할 경우 Round-Robin 으로 큐를 실행한다. *(Multi-Level) *
✅ Priority의 값을 일정 tick 마다 조정한다. (Feedback)

여기서, 우리는 여러개의 Ready_Queue를 관리하는 것이 아니라, 한 개의 all_list 로 feedback system을 관리할 것 이기 때문에 MLFQS like 이라고 표현을 하였다!

priority feedback에 사용되는 식

Priority

priority = PRI_MAX - (recent_cpu / 4) - (nice * 2),

recent_cpu

recent_cpu = (2 * load_avg)/(2 * load_avg + 1) * recent_cpu + nice

load_avg

load_avg = (59/60) * load_avg + (1/60) * ready_threads,

추가 함수

fixed_point.h & fixed_point.c

#include 

#define F (1 << 14)  // 고정 소수점 표현을 위한 상수, 여기서 14는 소수 부분의 비트 수

int fixed_add(int x, int y);
int fixed_sub(int x, int y);
int fixed_add_int(int x, int n);
int fixed_sub_int(int x, int n);
int fixed_mul(int x, int y);
int fixed_div(int x, int y);
int int_to_fixed(int n);
int fixed_to_int_round(int x);
int fixed_to_int_trunc(int x);
int fixed_div_int (int x, int n);
int fixed_mul_int (int x, int n);

#include 
#include 
#include "threads/fixed_point.h";

// fixed_point 간 덧셈
int fixed_add(int x, int y) { 
    return x + y;
}
// fixed_point 간 뺄셈
int fixed_sub(int x, int y) {
    return x - y;
}

// fixed_point + 정수 
int fixed_add_int(int x, int n) {
    return x + n * F;
}

// fixed_point - 정수
int fixed_sub_int(int x, int n) {
    return x - n * F;
}

// fixed_point끼리 곱하기
int fixed_mul(int x, int y) {
    return ((int64_t) x) * y / F;
}

// fixed_point끼리 나누기
int fixed_div(int x, int y) {
    return ((int64_t) x) * F / y;
}

// 정수를 fixed_point로
int int_to_fixed(int n) {
    return n * F;
}

// fixed_point * 정수
int fixed_mul_int (int x, int n) {
  return x * n;
}

// fixed_point / 정수
int fixed_div_int (int x, int n) {
  return x / n;
}

// fixed_point를 정수로 바꾸되, 반올림처리
int fixed_to_int_round(int x) {
    if (x >= 0) {
        return (x + F / 2) / F;
    } else {
        return (x - F / 2) / F;
    }
}

// fixed_point를 정수 바꾸되, 내림
int fixed_to_int_trunc(int x) {
    return x / F;
}

이전 포스팅에서 언급 했듯이, recent_cpu , load_avg 및 decay는 fixed_point이기 때문에 fixed_point 와 정수를 연산하려면 위와 같은 함수를 사용하여 연산하여야 합니다.

calc_all_priority() & calc_priority()

priority = PRI_MAX - (recent_cpu / 4) - (nice * 2),

void
calc_all_priority() {
    // In every fourth tick, recompute the priority of all threads
    // priority = PRI_MAX – (recent_cpu / 4) – (nice * 2)
    struct list_elem *e;
    struct thread *cur;
    for (e = list_begin (&all_list); e != list_end (&all_list); e = list_next (e)) {
        cur = list_entry (e, struct thread, allelem); 
        calc_priority(cur);
    }
}


void
calc_priority(struct thread *cur) {
    // In every fourth tick, recompute the priority of all threads
    // priority = PRI_MAX – (recent_cpu / 4) – (nice * 2)
    if(cur==idle_thread) return;
    int tmp = fixed_to_int_trunc (
        fixed_add_int(fixed_div_int (cur->recent_cpu, -4),
          PRI_MAX - cur->nice * 2)
    );
    if (tmp > PRI_MAX) {
        tmp = PRI_MAX;
    } else if (tmp < PRI_MIN) {
        tmp = PRI_MIN;
    }
    cur->priority = tmp;
}

우선순위를 구해서 thread에 지정해주는 함수. 우선순위를 구하기 위해선 우선 recent_cpu를 최신화해주어야 한다.

calc_all_recent_cpu() & calc_recent_cpu()

calc_all_recent_cpu()

void
calc_all_recent_cpu() {
    // In every second, update every thread’s recent_cpu
    // recent_cpu = decay * recent_cpu + nice,
    struct list_elem *e;
    struct thread *cur;
    for (e = list_begin (&all_list); e != list_end (&all_list); e = list_next (e)) {
        cur = list_entry (e, struct thread, allelem);
        calc_recent_cpu(cur);
    }
}

모든 쓰레드가 담긴 all_list를 순회하며 calc_recent_cpu를 실행해주는 함수이다.

calc_recent_cpu(struct thread *cur)

Priority = PRI_MAX – (recent_cpu/4) – (nice2) recent_cpu = (2load_avg)/(2load_avg+1)recent_cpu + nice load_avg = (59/60)load_avg + (1/60)ready_threads

void
calc_recent_cpu(struct thread *cur) {
    // In every second, update every thread’s recent_cpu
    // recent_cpu = decay * recent_cpu + nice,
    if (cur == idle_thread) return;
    int tmp = fixed_mul_int(load_avg,2);
    int decay = fixed_div(tmp , fixed_add_int(tmp,1));
    int ret = fixed_add_int(fixed_mul(decay , cur->recent_cpu) , cur->nice);
    if ((ret >> 31) == (-1) >> 31) {
        ret = 0;
    }
    cur->recent_cpu = ret;
}

idle 쓰래드가 아닐 경우에 최신화된 decay를 사용해 recent_cpu를 업데이트 해주는 코드이다.

load_avg

load_avg = (59/60) * load_avg + (1/60) * ready_threads,


void
calc_load_avg() {
    int ready_threads;

      if (thread_current () == idle_thread)
        ready_threads = list_size (&ready_list);
      else
        ready_threads = list_size (&ready_list) + 1;

      load_avg = fixed_add(fixed_mul (fixed_div (int_to_fixed (59), int_to_fixed (60)), load_avg), 
                            fixed_mul_int (fixed_div (int_to_fixed (1), int_to_fixed (60)), ready_threads));
}

load_avg를 계산해서 최신화해주는 함수.

수정해줘야 하는 함수

우리가 구현할 MLFQS like에서는 priority를 donate하는 방식이 아닌 사전에 협의된 방적식에 의거해서 priority를 설정해준다. 그래서 MLFQS 를 테스트 할때는 해당 기능을 잠시 비활성화 해줘야 한다.

thread_init & thread_create & thread_exit

thread_init

void
thread_init (void) {
    ASSERT (intr_get_level () == INTR_OFF);

    /* Reload the temporal gdt for the kernel
     * This gdt does not include the user context.
     * The kernel will rebuild the gdt with user context, in gdt_init (). */
    struct desc_ptr gdt_ds = {
        .size = sizeof (gdt) - 1,
        .address = (uint64_t) gdt
    };
    lgdt (&gdt_ds);

    /* Init the globla thread context */
    lock_init (&tid_lock);
    list_init (&ready_list);
    list_init (&sleep_list);
    list_init (&destruction_req);
    list_init (&all_list); // all_list 추가

    /* Init global_ticks to track the minimum ticks */
    global_ticks = INT64_MAX; // global_ticks의 타입은 

    /* Set up a thread structure for the running thread. */
    initial_thread = running_thread ();
    init_thread (initial_thread, "main", PRI_DEFAULT);
    list_push_back(&all_list, &(initial_thread->allelem)); // all_list 추가
    initial_thread->status = THREAD_RUNNING;
    initial_thread->tid = allocate_tid ();

모든 쓰레드를 순회하기 위해선 all_list를 추가해줘야 한다.

thread_create

tid_t
thread_create (const char *name, int priority,
        thread_func *function, void *aux) {
    struct thread *t;
    tid_t tid;

    ASSERT (function != NULL);

    .
    .
    .

    list_push_back(&all_list, &t->allelem); // 이 부분 추가

    /* Add to run queue. */
    thread_unblock (t);
    preempt();
    return tid;
}

thread_create 함수가 실행될때 현재 쓰레드를 all_list에 추가해줘야 한다.

thread_exit

void
thread_exit (void) {
    ASSERT (!intr_context ());

#ifdef USERPROG
    process_exit ();
#endif

    /* Just set our status to dying and schedule another process.
       We will be destroyed during the call to schedule_tail(). */
    list_remove(&thread_current()->allelem); // 이부분 추가
    intr_disable ();
    do_schedule (THREAD_DYING);
    NOT_REACHED ();
}

쓰레드가 종료될때 all_list에서 해당 쓰레드를 제거해주는 로직을 추가해줘야 한다.

lock_acquire & lock_release & thread_set_priority

lock_acquire (struct lock *lock)

void
lock_acquire (struct lock *lock) {
    ASSERT (lock != NULL);
    ASSERT (!intr_context ());
    ASSERT (!lock_held_by_current_thread (lock));

    struct thread *cur = thread_current();

    // thread_mlfqs 일때만 실행되도록
    if (thread_mlfqs) {    
        if (lock->holder) {
            cur->wait_on_lock = lock;
        }
        sema_down (&lock->semaphore); // mlfqs면 도네이트 안하고 바로 리턴해야함
        lock->holder = cur;
        lock->holder->wait_on_lock = NULL;
        return; // mlfqs면 도네이트 안하고 바로 리턴해야함
    }

    if (lock->holder) {
        cur->wait_on_lock = lock;
        list_insert_ordered(&lock->holder->donation_list, &cur->donation_elem,cmp_priority_by_donation_elem,NULL);
        // 그러면, list 내에서 next,prev 설정을 자동적으로 해주겠구나.
        donate();
    }

    sema_down (&lock->semaphore);
    cur->wait_on_lock = NULL;     // sema down이 끝나면, 현재 cur는 작업을 끝낸 것 이므로 기다리는 것이 없음.
    lock->holder = cur;
}

lock_release (struct lock *lock)

void
lock_release (struct lock *lock) {
    ASSERT (lock != NULL);
    ASSERT (lock_held_by_current_thread (lock));

    struct list_elem *e;
    struct thread *cur = thread_current ();

      if (thread_mlfqs) {
        lock->holder = NULL;
        sema_up (&lock->semaphore);
        return;
      }

    /* donation 리스트를 순회하며 */
    for (e = list_begin (&cur->donation_list); e != list_end (&cur->donation_list); e = list_next (e)){
        struct thread *t = list_entry (e, struct thread, donation_elem);
        /* 인자로 받은 lock을 원해서 donate를 한 경우라면 */
        if (t->wait_on_lock == lock)
            list_remove (&t->donation_elem);
    }
    update_donation();

    lock->holder = NULL;
    sema_up (&lock->semaphore);
}

thread_set_priority(int new_priority)

void
thread_set_priority (int new_priority) {
    if (thread_mlfqs) {
        return;
    }
    thread_current ()->original_priority = new_priority;
    update_donation();
    preempt();
}

mlfqs like 옵션이 켜졌을때 donate관련 기능을 비활성화 시켜줘야 한다.

timer_interrupt

static void
timer_interrupt (struct intr_frame *args UNUSED) {
    ticks++; 
    thread_tick ();


    if (thread_mlfqs) {
        // In every clock tick, increase the running thread’s recent_cpu by one.
        incr_recent_cpu();
        if (timer_ticks() % 4 == 0) {
            // 모든 스레드에 대해 이거 재계산 하라함
            calc_all_priority();
        }
        if (timer_ticks() % TIMER_FREQ == 0) {
            calc_all_recent_cpu();
            calc_load_avg();
        }
    }

이렇게 작성된 함수를 timer_interrupt에 조건에 맞게 실행해주면 된다.

[PintOS] 1-3 Advanced Scheduler

Fri, 17 May 2024 04:55:34 GMT

📈 Advanced Scheduler

Like the priority scheduler, the advanced scheduler chooses the thread to run based on priorities. However, the advanced scheduler does not do priority donation. Thus, we recommend that you have the priority scheduler working, except possibly for priority donation, before you start work on the advanced scheduler.

고급 스케줄러는 우선 순위 스케줄러 (Priority scheduler) 와 마찬가지로 우선 순위에 따라 실행할 스레드를 선택합니다. 그러나 고급 스케줄러는 Priority Donation을 하지 않습니다. 따라서 고급 스케줄러 작업을 시작하기 전에 Priority Donation를 제외하고 우선 순위 스케줄러가 작동하도록 하는 것이 좋습니다.

When the 4.4BSD scheduler is enabled, threads no longer directly control their own priorities. The priority argument to thread_create() should be ignored, as well as any calls to thread_set_priority(), and thread_get_priority() should return the thread's current priority as set by the scheduler. The advanced scheduler is not used in any later project.

4.4BSD 스케줄러 가 활성화되면 스레드는 더 이상 자신의 우선 순위를 직접 제어하지 않습니다. thread_create() 에 대한 priority 매개변수 뿐만 아니라 thread_set_priority() 에 대한 호출도 무시해야 하며 thread_get_priority() 에 대한 호출도 스케줄러가 설정한 대로 스레드의 현재 우선 순위를 반환해야 합니다. (아래 자세한 설명이 있습니다!)
고급 스케줄러(mlsfq)는 이후 프로젝트에서 사용되지 않습니다.

📝 4.4BSD Scheduler

The goal of a general-purpose scheduler is to balance threads' different scheduling needs. Threads that perform a lot of I/O require a fast response time to keep input and output devices busy, but need little CPU time. On the other hand, compute-bound threads need to receive a lot of CPU time to finish their work, but have no requirement for fast response time. Other threads lie somewhere in between, with periods of I/O punctuated by periods of computation, and thus have requirements that vary over time. A well-designed scheduler can often accommodate threads with all these requirements simultaneously.

범용 스케줄러의 목표는 쓰레드의 서로 다른 스케줄링 요구의 균형을 맞추는 것입니다.
I/O 를 많이 수행하는 스레드는 입력 및 출력 장치를 바쁘게 유지하기 위해 빠른 응답 시간이 필요하지만 CPU 시간이 거의 필요하지 않습니다.
반면에, 많은 계산 이 요구되는 쓰레드는 작업을 완료하기 위해 많은 CPU 시간을 받아야 하지만 빠른 응답 시간에 대한 요구 사항은 없습니다.
다른 쓰레드는 둘의 중간 지점에 있으므로 시간에 따라 달라지는 요구 사항을 가집니다. _ 잘 설계된 스케줄러는 이러한 쓰레드를의 요구사항을 즉각적으로 수용할 수 있습니다._

This type of scheduler maintains several queues of ready-to-run threads, where each queue holds threads with a different priority. At any given time, the scheduler chooses a thread from the highest-priority non-empty queue. If the highest-priority queue contains multiple threads, then they run in "round robin" order.

이 유형의 스케줄러는 실행 준비가 완료된 쓰레드의 여러 큐를 유지하며, 각 큐는 다른 우선 순위를 가진 스레드를 보유합니다.
스케줄러 는 비어 있지 않으며, 가장 높은 우선 순위의 큐에서 스레드를 선택합니다. 가장 높은 우선 순위의 큐에 여러 스레드가 포함되어 있으면 Round-Robin 순서로 실행됩니다.

Multiple facets of the scheduler require data to be updated after a certain number of timer ticks. In every case, these updates should occur before any ordinary kernel thread has a chance to run, so that there is no chance that a kernel thread could see a newly increased timer_ticks() value but old scheduler data values.

⚠️ The 4.4BSD scheduler does not include priority donation ⚠️

스케줄러는 일정 수의 타이머 틱 후에 데이터를 업데이트해야 합니다.
이러한 업데이트는 일반 커널 스레드 가 실행할 기회를 갖기 전에 이루어져야 하며, 따라서 커널 스레드가 새로 증가된 timer_ticks() 값을 볼 수 있지만, 오래된 스케줄러 데이터 값을 볼 가능성이 없습니다.
*The 4.4BSD 스케쥴러는 우선순위 기부를 하지 않습니다!! *

👍 Niceness 😇

Thread priority is dynamically determined by the scheduler using a formula given below. However, each thread also has an integer nice value that determines how "nice" the thread should be to other threads. A nice of zero does not affect thread priority. A positive nice, to the maximum of 20, decreases the priority of a thread and causes it to give up some CPU time it would otherwise receive. On the other hand, a negative nice, to the minimum of -20, tends to take away CPU time from other threads.

The initial thread starts with a nice value of zero. Other threads start with a nice value inherited from their parent thread. You must implement the functions described below, which are for use by test programs. We have provided skeleton definitions for them in threads/thread.c

스케줄러는 다음과 같은 공식을 사용하여 Priority 를 동적으로 결정합니다. 그러나 각 쓰레드에는 해당 쓰레드가 다른 쓰레드에 얼마나 Nice 한 것인지를 결정하는 정수 Nice 값도 있습니다. 나이스가 0이면 스레드 우선 순위에 영향을 미치지 않습니다.
Positive Nice 는 최대 20까지 쓰레드의 우선 순위를 감소시키고, 그렇지 않으면 수신할 CPU 시간을 포기하게 합니다.
반면에 Negative Nice 는 최소 -20까지 다른 쓰레드에서 CPU 시간을 빼앗는 경향이 있습니다.
Nice가 높다 = 친절하다 = CPU를 잘 양도한다 Nice가 낮다 = 불친절하다 = CPU를 잘 뺏어온다

int thread_get_nice (void);
int thread_set_nice (int nice);

위와 같은 getter, setter 함수를 만들어야 합니다.

✒️ Calculating Priority 🧮

Our scheduler has 64 priorities and thus 64 ready queues, numbered 0 (PRI_MIN) through 63 (PRI_MAX). Lower numbers correspond to lower priorities, so that priority 0 is the lowest priority and priority 63 is the highest. Thread priority is calculated initially at thread initialization. It is also recalculated once every fourth clock tick, for every thread. In either case, it is determined by the formula:

priority = PRI_MAX - (recent_cpu / 4) - (nice * 2),

스케줄러는 64개의 우선 순위를 가지며, 따라서 64개의 대기열에 0(PRI_MIN)부터 63(PRI_MAX)까지 번호가 매겨집니다. 숫자가 작을수록 낮은 우선 순위에 해당하므로 우선 순위 0이 가장 낮고 우선 순위 63이 가장 높습니다. 쓰레드 우선 순위는 처음에 쓰레드 초기화 시 계산됩니다. 또한 매 4번째 클럭 틱마다 한 번씩, 모든 스레드에 대해 다시 계산됩니다.

where recent cpu is an estimate of the CPU time the thread has used recently (see below) and nice is the thread's nice value. The result should be rounded down to the nearest integer (truncated). The coefficients 1/4 and 2 on recent cpu and nice, respectively, have been found to work well in practice but lack deeper meaning. The calculated priority is always adjusted to lie in the valid range PRI_MIN to PRI_MAX.

This formula gives a thread that has received CPU time recently lower priority for being reassigned the CPU the next time the scheduler runs. This is key to preventing starvation: a thread that has not received any CPU time recently will have a recent cpu of 0, which barring a high nice value should ensure that it receives CPU time soon.

recent_cpu 는 스레드가 최근에 사용한 CPU 시간을 추정한 값이고 nice는 쓰레드의 nice 값입니다. 결과는 가장 가까운 정수로 내림해야 합니다 (소수점 버림). recent_cpu와 nice의 계수 1/4과 2는 각각 실제로는 잘 작동하지만 의미가 깊지 않습니다. 계산된 우선 순위는 항상 유효한 범위인 PRI_MIN ~ PRI_MAX에 놓이도록 조정됩니다.
🙋🏻'의미가 깊지 않다' 의 의미는, 그냥 여러 실험을 해본 결과 해당 계수가 최적의 계수로 판별됐으나 의미있는 숫자는 아니다 라는 뜻인 것 같다.

이 공식은 최근에 CPU 시간을 받은 스레드에 스케줄러가 다음에 실행할 때 CPU를 재할당하는 우선 순위를 낮춥니다. 이것은 Starvation 현상을 방지하는 핵심입니다. 최근에 CPU 시간을 받은 적이 없는 레드는 최근 CPU 값이 0이므로 nice 값이 높으면 곧 CPU 시간을 받을 수 있습니다.

💾 Calculating recent_cpu

We wish recent cpu to measure how much CPU time each process has received "recently." Furthermore, as a refinement, more recent CPU time should be weighted more heavily than less recent CPU time. One approach would use an array of n elements to track the CPU time received in each of the last n seconds. However, this approach requires O(n) space per thread and O(n) time per calculation of a new weighted average.

우리는 recent_cpu 가 각 프로세스가 "최근에" 얼마나 많은 CPU 시간을 받았는지 측정하기를 바랍니다.
최근에 CPU 시간을 받았다면, 오래 전에 CPU를 할당받았던 쓰레드 보다 더 무겁게 가중치를 부여받아야 합니다. 이를 달성하기 위해 n개의 요소로 구성된 배열을 사용하여 마지막 n초 각각에 수신된 CPU 시간을 추적할 수 있습니다. 그러나 이 접근법은 스레드당 O(n) 공간과 새로운 가중 평균 계산당 O(n) 시간이 필요합니다.
*결론 : 위의 선형탐색 법은 안쓰겠습니다. *

Instead, we use a exponentially weighted moving average, which takes this general form:

$$ x(0) = f(0) $$

$$ x(t) = a x(t - 1) + (1 - a) f(t) $$

$$ a = \frac{k}{k + 1} $$

where x(t) is the moving average at integer time t ≥ 0, f(t) is the function being averaged, and k > 0 controls the rate of decay. We can iterate the formula over a few steps as follows:

$$ x(1) = f(1) $$

$$ x(2) = a f(1) + f(2) $$

$$ \vdots $$

$$ x(n) = \sum_{i=1}^n a^{n-i} f(i) $$

$$ a = \frac{k}{k + 1} $$ 👨‍🏫 a = e^-k 이 되는 이유?!

The value of f(t) has a weight of 1 at time t, a weight of a at time t + 1, a^2 at time t + 2, and so on. We can also relate x(t) to k: f(t) has a weight of approximately 1/e at time t + k, approximately 1/(e^2) at time t + 2k, and so on. From the opposite direction, f(t) decays to weight w at time t + log_a(w).

The initial value of recent_cpu is 0 in the first thread created, or the parent's value in other new threads. Each time a timer interrupt occurs, recent_cpu is incremented by 1 for the running thread only, unless the idle thread is running. In addition, once per second the value of recent cpu is recalculated for every thread (whether running, ready, or blocked), using this formula:

recent_cpu = (2 * load_avg)/(2 * load_avg + 1) * recent_cpu + nice

where load_avg is a moving average of the number of threads ready to run (see below). If load_avg is 1, indicating that a single thread, on average, is competing for the CPU, then the current value of recent cpu decays to a weight of .1 in log_(2/3) .1 ≈ 6 seconds; if load avg is 2, then decay to a weight of .1 takes log_(3/4) .1 ≈ 8 seconds. The effect is that recent cpu estimates the amount of CPU time the thread has received "recently," with the rate of decay inversely proportional to the number of threads competing for the CPU.

Assumptions made by some of the tests require that these recalculations of recent cpu be made exactly when the system tick counter reaches a multiple of a second, that is, when timer_ticks () % TIMER_FREQ == 0, and not at any other time.

You may need to think about the order of calculations in this formula. We recommend computing the coefficient of recent cpu first, then multiplying. Some students have reported that multiplying load_avg by recent_cpu directly can cause overflow.

tick 마다 running 스레드의 recent_cpu 값이 1 증가합니다. (강의참조)

1초가 흐를 때 마다 모든 스레드의 recent_cpu 값을 다시 계산하여야 합니다.

일부 테스트에서 몇가지 가정을 했습니다. 시스템 틱 카운터가 1초의 배수에 도달했을 때, 즉 timer_ticks() % TIMER_FREQ == 0일 때 최근 CPU를 다시 계산해야 합니다.

이 공식의 계산 순서를 생각해 볼 필요가 있을 수 있습니다. 우리는 먼저 recent_cpu 의 계수를 계산한 다음 곱하기를 권장합니다. 몇몇 학생들은 load_avg에 recent_cpu를 직접 곱하면 오버플로우가 발생할 수 있다고 합니다.

Calculating load_avg

Finally, load avg, often known as the system load average, estimates the average number of threads ready to run over the past minute. Like recent_cpu, it is an exponentially weighted moving average. Unlike priority and recent_cpu, load_avg is system-wide, not thread-specific. At system boot, it is initialized to 0. Once per second thereafter, it is updated according to the following formula:

load_avg = (59/60) * load_avg + (1/60) * ready_threads,

where ready threads is the number of threads that are either running or ready to run at time of update (not including the idle thread).

Because of assumptions made by some of the tests, load_avg must be updated exactly when the system tick counter reaches a multiple of a second, that is, when timer_ticks () % TIMER_FREQ == 0, and not at any other time. You must implement thread_get_load_avg(), for which there is a skeleton in threads/thread.c.

load_avg 는 지난 1분 동안 실행할 준비가 된 스레드의 평균 수를 추정합니다. recent_cpu와 마찬가지로, exponentially 하게 가중된 이동 평균입니다. priority나 recent_cpu와 달리 load_avg 는 쓰레별이 아닌 시스템 전체에 걸쳐 있습니다. 시스템 부팅 시, load_avg 0으로 초기화됩니다. 그 후 1초에 한 번씩, load_avg 는 아래 공식에 따라 업데이트됩니다.

 load_avg = (59/60) * load_avg + (1/60) * ready_threads,

여기서 ready thread는 업데이트 시 실행 중이거나 실행할 준비가 된 스레드의 수입니다(유휴 스레드는 포함되지 않음).

일부 테스트에서 수행된 가정 때문에 load_avg는 시스템 틱 카운터가 1초의 배수에 도달할 때, 즉 타이머_ticks() % TIMER_FREQ == 0에 도달할 때 정확히 업데이트해야 하며 다른 시간에는 업데이트하지 않아야 합니다. thread.c에 있는 thread_get_load_avg (),을 구현해야 합니다.
*Returns 100 times the current system load average, rounded to the nearest integer. *

Fixed-Point Real Arithmetic

In the formulas above, priority, nice, and ready_threads are integers, but recent_cpu and load_avg are real numbers. Unfortunately, Pintos does not support floating-point arithmetic in the kernel, because it would complicate and slow the kernel. Real kernels often have the same limitation, for the same reason. This means that calculations on real quantities must be simulated using integers. This is not difficult, but many students do not know how to do it. This section explains the basics.

priority, nice, ready_thread 는 정수이지만 recent_cpu와 load_avg는 실수입니다. 핀토스는 커널에서 부동 소수점 산술을 지원하지 않는데, 이는 커널을 복잡하고 느리게 만들 것이기 때문입니다. 실제 커널은 같은 이유로 같은 제한을 가지는 경우가 많습니다. 이것은 실제 양에 대한 계산은 정수를 사용하여 모의실험을 해야 한다는 것을 의미합니다. 이 절에서는 기본 사항에 대해 설명합니다.

The fundamental idea is to treat the rightmost bits of an integer as representing a fraction. For example, we can designate the lowest 14 bits of a signed 32-bit integer as fractional bits, so that an integer x represents the real number x/(2^14). This is called a 17.14 fixed-point number representation, because there are 17 bits before the decimal point, 14 bits after it, and one sign bit. A number in 17.14 format represents, at maximum, a value of (2^31 − 1)/(2^14) ≈ 131,071.999.

기본적인 아이디어는 정수의 가장 오른쪽 비트를 분수를 나타내는 것으로 취급하는 것입니다. 예를 들어 부호가 붙은 32비트 정수의 가장 낮은 14비트를 분수 비트로 지정하여 정수 x가 실수 x/(2¹⁴)를 표현할 수 있습니다. 이것을 17.14 고정 소수점 앞에는 17비트가 있고 그 뒤에는 14비트가 있으며 부호 비트가 하나 있기 때문에 이것을 17.14 고정 소수점 숫자 표현이라고 합니다. 17.14 형식의 숫자는 최대 (2^³¹ - 1)/(2^¹⁴) ≈ 131,071.999의 값을 나타냅니다.

Suppose that we are using a p.q fixed-point format, and let f = 2^q. By the definition above, we can convert an integer or real number into p.q format by multiplying with f. For example, in 17.14 format the fraction 59/60 used in the calculation of load_avg, above, is (59/60)2^14 = 16,110. To convert a fixed-point value back to an integer, divide by f. (The normal / operator in C rounds toward zero, that is, it rounds positive numbers down and negative numbers up. To round to nearest, add f / 2 to a positive number, or subtract it from a negative number, before dividing.)

p와 q 고정 소수점 형식을 사용하고 있고, f = 2^q 라고 하자. 위의 정의에 의해, 우리는 f와 곱하여 정수 또는 실수를 p.q 형식으로 변환할 수 있습니다. 예를 들어, 17.14 형식 에서 load_avg 계산에 사용된 분수 59/60은 (59/60)2^14 = 16,110입니다. 고정 소수점 값을 정수로 다시 변환하려면 f로 나누십시오. (C의 정규 / 연산자는 0을 향해 반올림합니다. 즉, 양수와 음수를 반올림하기 전에 음수에 f / 2를 더하거나 음수에서 빼십시오.)

[PintOS] 1-2 Priority Scheduling

Thu, 16 May 2024 11:41:34 GMT

🐳 문제정의

현재상황

Pintos - thread - alarm_clock 까지 구현된 상태에서는 작업의 우선순위가 고려되지 않은 채 쓰레드 스케쥴링을 수행한다. ( thread_unblock, thread_set_priority )
여기서 만약 Priority에 따라 Ready_list를 정렬한다고 할지라도, Priority Inversion이 발생할 수 있기 때문에 공유자원에 대한 관리 또한 필요하다. (sema_up, sema_down)

Priority Inversion을 해결하기 위한 방법으로는 priority inheritance protocol에서 소개된 Priority Donation이 있습니다.

Donation을 할 때 고려해야할 3가지 케이스

1. Priority donation 2. Nested donation 3. Multiple donation

해결방안

thread_unblock, thread_yield, 그리고 set_thread_priority

1) thread_unblock()

void
thread_unblock (struct thread *t) {
    enum intr_level old_level;

    ASSERT (is_thread (t));

    old_level = intr_disable ();
    ASSERT (t->status == THREAD_BLOCKED);
    // list_push_back (&ready_list, &t->elem);
    list_insert_ordered(&ready_list, &t->elem, cmp_thread_priority,NULL);
    t->status = THREAD_READY;
    intr_set_level (old_level);
}

2) thread_unblock()

void
thread_yield (void) {
    struct thread *curr = thread_current ();
    enum intr_level old_level;

    ASSERT (!intr_context ());

    old_level = intr_disable ();
    if (curr != idle_thread) {
        // list_push_back (&ready_list, &curr->elem);
        list_insert_ordered(&ready_list, &curr->elem, cmp_thread_priority,NULL);
    }
    do_schedule (THREAD_READY);
    intr_set_level (old_level);
}

ready_list에 Thread를 삽입할 때, Priority를 기준으로 정렬된 형식을 유지하게 list_insert_orderded()를 사용합니다.

3) preempt()

void
preempt() {
    // 쓰레드를 선점할 수 있어야 합니다.
    // 현재 running 중인 thread와 ready 리스트에 있는 스레드 비교를 해야함
    struct thread *cur = thread_current();

    if (list_empty(&ready_list)) {
        return;
    }
    if (cur == idle_thread) {
        return;
    }
    // list_elem 타입의 '변수명' 이 elem 이므로.
    // list_front vs list_begin ;
    if (cur->priority < list_entry(list_front(&ready_list), struct thread, elem)->priority) { 
        thread_yield();
    }
}

void
thread_set_priority (int new_priority) {
    thread_current ()->original_priority = new_priority;
    update_donation(); // 추후 설명합니다
    preempt();
}

새로운 쓰레드가 ready_list에 추가되거나, 현재 쓰레드의 우선순위가 변경 됐을 때(thread_set_priority) 대기중인 쓰레드들과 우선순위를 비교합니다. 만약 더 높은 우선순위를 가진 쓰레드가 존재하면, thread_yield()를 실행하여 context-switching을 수행해줍니다.

4) 🐭 sema_up , sema_down

void
sema_up (struct semaphore *sema) {
    enum intr_level old_level;

    ASSERT (sema != NULL);

    old_level = intr_disable ();
    //sema->value++;
    if (!list_empty (&sema->waiters)){
        list_sort(&sema->waiters, cmp_thread_priority, NULL); // donate 받아서 순위가 변경된 상황 고려.
        thread_unblock (list_entry (list_pop_front (&sema->waiters),
                    struct thread, elem));
    }
    sema->value++;
    preempt(); // 락을 모두 사용하고, sema를 올려준 다음에 선점을 해주면 다음 우선순위의 쓰레드가 선점을 할 수 있습니다!
    intr_set_level (old_level);
}

void
sema_down (struct semaphore *sema) {
    enum intr_level old_level;

    ASSERT (sema != NULL);
    ASSERT (!intr_context ());

    struct thread *t = thread_current();

    old_level = intr_disable ();
    while (sema->value == 0) {
        // list_push_back (&sema->waiters, &thread_current ()->elem);
        list_insert_ordered(&sema->waiters, &t->elem, cmp_thread_priority, NULL);
        thread_block ();
    }
    sema->value--; // 이제 내가 lock을 차지하겠다! 
    intr_set_level (old_level);
}

sema_up에서 list_sort()를 해주는 이유와 preempt()를 해주는 이유

sema_up이 되는 상황은, thread가 공유자원을 활용한 작업을 마치고 lock을 반납하는 상황이다. 그렇기 때문에 우선순위가 가장 높은 thread가 CPU를 선점할 수 있도록 정렬을 해준 뒤 thread_unblock을 해준다. 그 뒤 preempt 함수를 실행해주면 priority가 높은 thread가 실행 될 수 있다.

5) condvar -> condition을 활용한 lock

-> 독산동학원가슬리퍼연쇄절도범 전낙타가 내 아이디어에 영감을 받아 팀원들과 작성한 figma -> https://github.com/jun9898

void
cond_wait (struct condition *cond, struct lock *lock) {
    struct semaphore_elem waiter;

    ASSERT (cond != NULL);
    ASSERT (lock != NULL);
    ASSERT (!intr_context ());
    ASSERT (lock_held_by_current_thread (lock));

    sema_init (&waiter.semaphore, 0);
//    list_insert_ordered(&cond->waiters, &waiter.elem, cmp_priority_by_sema , NULL);
    list_push_back(&cond->waiters, &waiter.elem); // 그냥 넣어줘도 됨.
    lock_release (lock);            // 소유권을 놔줌 == 락 릴리즈.
    sema_down (&waiter.semaphore);
    lock_acquire (lock); // 다시 취득해야함.
}

void
cond_signal (struct condition *cond, struct lock *lock UNUSED) {
    ASSERT (cond != NULL);
    ASSERT (lock != NULL);
    ASSERT (!intr_context ());
    ASSERT (lock_held_by_current_thread (lock));

    if (!list_empty (&cond->waiters)) {
        list_sort(&cond->waiters,cmp_priority_by_sema,NULL); // pop 할때 정렬이 안되어 있을 수 있나?
        sema_up(&list_entry (list_pop_front (&cond->waiters), struct semaphore_elem, elem)->semaphore);
    }
}

cond_wait 및 cond_signal 테스트 코드

Test code에서 cond_wait 의 동작 과정은 아래와 같다.

스레드가 lock 소유권을 획득한다.
이후 cond_wait 에서 Condition 을 위한 세마포어를 따로 생성해줍니다. 해당 세마포어는 생성할 때 부터 0 으로 초기화 되어 cond_signal 을 기다립니다.
그 후, lock 을 해제해주면서( cond의 lock 이 아님을 유의) 다른 스레드 가 1번의 과정을 반복한다. 이 때, lock을 release한 쓰레드는 cond_wait 함수에 진입하여 cond_signal을 기다린다.

Test code에서 cond_signal 의 동작 과정은 아래와 같다.

스레드가 lock 소유권을 획득한다.
이후 cond_signal 함수 에서 cond_wait에서 초기화 한 list를 호출합니다.
그 후, 조건에 맞고 대기중인 thread를 sema_up 을 통해 실행해줍니다.

6) donation

Implement priority donation. You will need to account for all different situations in which priority donation is required. Be sure to handle multiple donations, in which multiple priorities are donated to a single thread. You must also handle nested donation: if H is waiting on a lock that M holds and M is waiting on a lock that L holds, then both M and L should be boosted to H's priority. If necessary, you may impose a reasonable limit on depth of nested priority donation, such as 8 levels.

깃북 가이드에서 multiple donations 와 nested donation 케이스를 고려하여 구현해야 한다고 명시되어 있다.

*Nested donation *

void
donate(void) {
    struct thread *cur = thread_current();
    for (int i=0; i<8; i++) {
        if (cur->wait_on_lock == NULL) return;
        if (cur->priority > cur->wait_on_lock->holder->priority) {
            cur->wait_on_lock->holder->priority = cur->priority; // priority 변경
            cur = cur->wait_on_lock->holder; // cur = next(cur); 와 비슷한 느낌.
        }else {
            return;
        }
    }
}

donate 함수는 lock_acquired 내에서 만약 얻으려는 lock의 홀더가 있을때 작동한다.

*Multiple donation *

void 
update_donation() {
    struct thread *cur = thread_current();
    struct list_elem *e;

    int max_priority = 0;
    cur->priority = cur->original_priority;
    if (!list_empty(&cur->donation_list)) {
        list_sort(&cur->donation_list, cmp_priority_by_donation_elem, 0);
        e = list_front(&cur->donation_list);
        struct thread *max = list_entry(e, struct thread, donation_elem);
        max_priority = max->priority;
    }
    // list가 비어있는 경우를 생각해서, 미리 cur->priority = cur->original_priority 를 수행해놔야함.
    if (max_priority > cur->priority) {
        cur->priority = max_priority;
    }
}

update_donation 함수는 lock_release 함수 내에서 donation_list를 갱신한 뒤, 새로운 donation 값을 업데이트 해줍니다.

[PintOS] 1-2 Priority Scheduling 을 들어가기 전에...

Thu, 16 May 2024 04:01:08 GMT

💇🏼‍♀️ CPU 스케쥴링의 역사와 관련 기술

CPU 스케쥴링 기법은 시간이 지남에 따라 발전해 왔으며, 이는 SW 스레드 기법, 세마포어, 잠금, 뮤텍스 등의 발전과 밀접한 관련이 있습니다. 각 시대별로 개발된 기술과 역사를 살펴보겠습니다.

🙋‍♀️ 초기 단일 프로그램 환경 (1940년대 ~ 1950년대)

초기 컴퓨터 시스템에서는 한 번에 하나의 프로그램만 실행할 수 있었습니다. 이러한 환경에서는 CPU 스케쥴링이 필요하지 않았습니다. 프로그램이 실행되면 CPU를 점유한 채 종료될 때까지 계속 실행되었습니다.

👨🏼‍⚖️ 배치 시스템 (1960년대 초반)

배치 시스템은 여러 개의 프로그램을 순차적으로 실행할 수 있게 해주었습니다. 이때 가장 간단한 CPU 스케쥴링 기법인 FCFS(First-Come, First-Served) 방식이 사용되었습니다(1960년). 프로그램들은 순서대로 CPU를 할당받아 실행되었습니다.

🏃🏻‍♂️ 다중 프로그래밍 (1960년대 중반)

다중 프로그래밍 환경에서는 여러 개의 프로그램이 동시에 메모리에 상주할 수 있었습니다. 이때 CPU 스케쥴링 알고리즘이 필요해졌습니다. 초기에는 SJF(Shortest Job First, 1966년), SRTN(Shortest Remaining Time Next, 1968년) 등의 단순한 알고리즘이 사용되었습니다.
다중 프로그래밍 환경에서는 프로세스 간 동기화 문제가 발생했습니다. 이를 해결하기 위해 세마포어(semaphore, 1965년)와 뮤텍스(mutex)와 같은 기법이 도입되었습니다.

🥿 Plus!

초기 컴퓨터에서는 하나의 프로그램이 완전히 실행되기를 기다려야 했습니다. 그리고 그 후에 다음 프로그램이 실행될 수 있었습니다. 문제는 프로세서의 처리 속도와 입출력 장치의 속도가 현저하게 다르다는 것입니다. 이것은 입출력 작업이 완료될 때까지 프로세서가 대기해야 한다는 것을 의미합니다. 그렇기 때문에 한 프로그램이 입출력 작업을 마치기 전까지는 다음 프로그램이 실행될 수 없었습니다.

🍄 세마포어 (1965년)

세마포어는 공유 자원에 대한 접근을 제어하는 데 사용되는 동기화 기법입니다. 네덜란드 컴퓨터 과학자 에드스거 다이크스트라(Edsger Dijkstra)가 1965년에 제안했습니다. 세마포어는 카운터 값과 두 가지 원자적 연산(wait, signal)으로 구성됩니다.

🍈 뮤텍스 (1960년대 후반)

뮤텍스는 상호 배제(mutual exclusion)를 위한 동기화 기법입니다. 뮤텍스는 lock과 unlock 연산으로 구성됩니다. 한 번에 하나의 프로세스만 뮤텍스를 소유할 수 있어 공유 자원에 대한 독점적 접근이 가능합니다.

🦴 시분할 시스템 (1960년대 후반)

시분할 시스템은 다중 프로그래밍 환경에서 더 나아가 CPU 시간을 작은 단위로 나누어 여러 프로세스에게 할당하는 방식입니다. 이를 위해 라운드 로빈(Round-Robin) 스케쥴링 알고리즘이 도입되었습니다(1966년).
시분할 시스템에서는 스레드 개념이 등장했습니다(1960년대 후반). 스레드는 프로세스 내에서 실행 단위로, 프로세스의 자원을 공유하며 독립적으로 실행됩니다. 스레드 기법은 CPU 활용도를 높이고 응답성을 개선했습니다.

🧄 스레드 기법 (1960년대 후반)

스레드 기법은 프로세스 내에서 여러 개의 실행 흐름을 생성할 수 있게 해줍니다. 각 스레드는 독립적인 스택과 레지스터를 가지지만, 프로세스의 코드, 데이터, 파일 등의 자원을 공유합니다. 스레드 기법을 통해 병렬성과 응답성을 높일 수 있습니다.
스레드 간에도 동기화 문제가 발생할 수 있습니다. 이를 해결하기 위해 세마포어, 뮤텍스, 모니터(monitor, 1970년대 초반) 등의 기법이 사용됩니다.

🦍 현대 CPU 스케쥴링 알고리즘 (1970년대 ~ 현재)

현대 운영체제에서는 다양한 CPU 스케쥴링 알고리즘이 사용됩니다. 대표적인 알고리즘으로는 다음과 같은 것들이 있습니다.

우선순위 스케쥴링(Priority Scheduling, 1970년대 초반)
다단계 큐 스케쥴링(Multilevel Queue Scheduling, 1970년대 후반)
공정 Share 스케쥴링(Fair Share Scheduling, 1980년대 초반)
실시간 스케쥴링(Real-Time Scheduling, 1970년대 후반)

이러한 알고리즘들은 프로세스나 스레드의 우선순위, 실행 시간, 자원 사용량 등을 고려하여 CPU 자원을 효율적으로 할당하고 공정성을 유지하려고 합니다.

🍙 결론

CPU 스케쥴링 기법은 SW 스레드 기법, 세마포어, 잠금, 뮤텍스 등의 발전과 밀접하게 연관되어 있습니다. 초기의 단순한 환경에서 시작하여 점차 복잡해지면서 다양한 알고리즘과 기법들이 1960년대부터 현재까지 지속적으로 등장했습니다. 이를 통해 CPU 자원을 효율적으로 관리하고, 프로세스 및 스레드 간 동기화 문제를 해결할 수 있게 되었습니다.

[BoJ] 10972 다음 순열

Mon, 13 May 2024 02:53:07 GMT

https://www.acmicpc.net/problem/10972

접근법

1) Brute-Force

Brute-force 방식으로, 백트래킹을 이용한 접근법을 생각해봤으나 N이 10,000까지 가능하므로 무조건 시간초과가 납니다.

입력 : 첫째 줄에 N(1 ≤ N ≤ 10,000)이 주어진다. 
둘째 줄에 순열이 주어진다.

2) 규칙 찾기

2-1) 배열 뒤에서부터 탐색을 수행하며, 값이 작아지는 idx를 찾는다.

2-2) idx가 n과 동일하면, 해당 배열은 순열의 마지막이므로 -1을 출력하고 프로그램을 중료한다.

2-3) 배열의 뒤에서부터 반복문을 수행하며 arr[idx-1] 보다 큰 값을 발견하면 Swap을 수행한 뒤, idx를 기준으로 배열을 다시 정렬해준다.

전체 코드

//
//  main.cpp
//  10972
//
//  Created by Jun Hyeok Kim on 5/13/24.
//

#include 
#include 
#include 
#include 

using namespace std;

int n;
int arr[10000];


void doSomething() {

    int idx=n;

    for (int i=n-1; i>0; i--) {
        if (arr[i] > arr[i-1]) {
            idx=i;
            break;
        }
    }

    if (idx == n) {
        cout << -1;
        exit(0);
    }

    for (int i=n-1; i>= idx; i--) {
        if (arr[i] > arr[idx-1]) {
            swap(arr[i],arr[idx-1]);
            break;
        }
    }

    sort(arr+idx,arr+n);
}

int main() {
    cin >> n;

    for (int i=0; i> arr[i];
    }


    doSomething();
    for (int i=0; i


N,M을 활용한 수열 백트래킹에 익숙해진 나머지 규칙을 찾아내기가 조금 까다로웠다..!



OS 특강?
Sat, 11 May 2024 12:54:14 GMT
SW Design의 철학.
Abstraction

Design abstractions to use hardware
Define APIs for applications to use


Protection & Isolation
Contain malicious or buggy behaviors of applications
Protecting OS from malicious or buggy applications
Isolating one application from another


Sharing resources
Multiplex hardware resources



What is “Abstraction”?

The process or outcome of making something easier to understand by ignoring some of details that may be unimportant.

→ 추상화의 Target은 HW이다.

메모리의 추상화는 가상메모리
디스크의 추상화는 파일..!

OS designers’ first thought

No one want to write programs directly handling hardware details (easy to program)
To utilize hardware resources, OS has to run multiple applications (management unit of execution)
Protect applications from each other (protection unit of execution)

What is the conclusion?

Building an abstraction that gives an illusion that each application runs on a single machine
Let’s call it process (= executed application)

How to make it easy to use hardware?

OS designers build each abstraction of hardware resources and bind it to process
CPU -> Virtualizing CPU
Memory -> Virtual address space
Storage -> File


OS designers provide APIs to applications to use the abstractions

They call them as system calls
Process
What process abstracts?

Each process has its own view of ( Machines )
Own address space
Own virtual CPU
Own files



Nice clean abstraction!
The next question is how to design each abstraction?
Abstraction of address space

How to associate virtual address to physical address?
• Divide each physical memory to small chunk (called page)
• Create mapping from virtual to physical address

→ 은행계좌와 실제 현물을 상관지어보면 이해가 쉽네!
테이블 블록 단위를 4KB로 한것은 이게 효율이 좋게 나왔다함.
Virtual memory: Level of Indirection
Level of indirection
가상주소와 물리주소 매핑 → Page-Tabe 
Abstraction of address space

How to map virtual to physical address?
• Segmentation
• Paging (Single-level, multi-level …)
• Segmented paging

Level of Indirection happens!
MMU 칩이 알고리즘을 통해서 가상 주소를 물리 주소로 변환해줌.
DRAM이 너무 빨라서, HW fault 메커니즘이 나온겁니다. 그렇기 때문에 MMU칩을 따로 둔것입니다.
→ 1970년대 정립된 내용입니다.
Address translation: Paging

TLB caches page table entries
Translation lookaside buffer
페이지 테이블 생성.


Page number: logical address
Virtual Memory → Physical Memory로 변경


Frame number: physical address

Mechanism → HW (빠르고 일관된 일 수행)
Policy → SW (복잡한 로직 수행)
Mechanism vs Policy
code 안에서 Mechanism과 Policy를 분리해야합니다.
Abstraction of address space
Think about these questions

Where is the page tables stored?
What are role(s) of software (OS) for paging?
What are role(s) of hardware for paging?

When to allocate physical memory?

Demand paging
Application first accesses Virtual address of unallocated physical memory



→ Page fault handler and getting demanding page
Zero the page usually takes LONG time.
만약 page를 0로 설정하지 않으면, 이전에 있던 데이터를 읽을 수 있다는 뜻입니다. 따라서 Process간 protect가 되지 않습니다.
→ 그렇다면 malloc 에서는 왜 초기화를 안할까요?
→ Thread 끼리의 공유는 Process간 Protection에 위반하지 않으므로 괜찮습니다.
Zero the page file-backed  and Anonymous

file-backed → get the file
Anonymous → set it 0

Page fault handling
Two types of memory: ( Code + Data and Stack + Heap )
→ file-backed  and Anonymous
→ 이미 그 크기와 사이즈를 알고 있으므로.
→ mmap 은 Anonymous
Page fault handling
Processor signals controller

Read block of length P starting at disk
address X and store starting at memory
address Y

Read occurs

Direct Memory Access (DMA)
Under control of I/O controller

I / O controller signals completion

Interrupt processor
OS resumes suspended process

Abstraction of storage
파일은, 저장소의 논리적 단위! → 이것도 그냥 매핑임… 파일은 존재하지 않고 디스크에 산재해 있는 데이터에 접근하는 지도가 파일임.

File: a logical unit of storage

Identifier : pathname (path + filename)
Location of data is identified by ( offset )
OS subsystem maps the file to physical storage Let’s call it ( file System )


Analogy

Virtual memory is an abstraction of physical memory
Level of indirection: ( VA ) → ( PA )


File is an abstraction of physical storage
Level of indirection: ( path , offset) → (block Address)





File System: A Mapping Problem

 → a set of blocks


How to map file to storage media?
Divide a file to small chucks (called block)
Create mappings from each block to a storage location
(called block address)



→ 어떤 자료구조를 사용해서, 언제 Mapping을 할 것인가?
Abstraction of storage

How to create the mappings from file to storage?
File’s logical block -> Storage physical block



→ 파일 시스템의 디자인 (오래된)
Think about these questions!

Where are the internal nodes in the index? (memory?
storage? or both! )
Does hardware help for the indexing?
If yes, what is role(s) of hardware?
If not, why?
(내 생각에는 HW에게 있어서 인덱싱 처리가 힘들줄 알았는데, 교수님 말씀으로는 속도차이 떄문이라함)
미래에는, Storage가 DRAM처럼 빨라지면 HW로 indexing 처리를 할 수도 있음!




When to allocate physical block?
파일을 쓰고, ‘저장’ 할 때 해당 블록을 실제로 할당합니다.
어떤 system call 이 불릴까??  fsync()  크러쉬 컨시스턴시…


Any performance optimization for slow storage device?
DRAM caching



Storage stack overview
Think about these questions

Why VFS is required?
각 파일 시스템에 대해 호환되게 하기 위해서.


Why Generic Block Layer is required?
같은 인터페이스로 접근할 수 있음.


Where is IO queues implemented?
Generic Block Layer → 디바이스에도 종속되지 않으며, 파일 시스템에도 종속되지 않은 일반적은 I/O 큐 설계를 할 수 있는 위치임.


Where is page cache implemented?
VFS-POSIX API →



Page Cache



[PintOS] Alarm Clock 
Sat, 11 May 2024 12:47:17 GMT
구현목표
Busy wait 상태의 next tick 을 기다리는 쓰레드를 단순히 sleep 상태로 변경하여 Kernel ticks 와 idle ticks 변화를 확인합니다.
Busy wait

현재 위와 같이 yield가 지속적으로 일어나 CPU를 낭비하고 있다.
sleep

따라서, 위와 같이 ticks 만큼 sleep 상태에 놓음으로써 간단하게 해결할 수 있습니다.
sleep_list

따라서, sleep_list 구조체를 하나 선언한 뒤, thread_wakeup() 과 thread_sleep()을 통해 ready_list와 sleep_list를 관리해주면 됩니다!
성공 결과

위와 같이 550 idle ticks를 확인할 수 있다.
추가할 함수

thread_sleep() : The function that sets thread state to blocked and wait after insert it to sleep queue. 
thread_wakeup() : The function that find the thread to wake up from sleep queue and wake up it.
*set_global_ticks() : * The function that save the minimum value of tick that threads have.
*get_global_ticks() : * The function that return the minimum value of tick.

1) thread_sleep()
void
thread_sleep(int64_t howLong) {

    struct thread *curr;

    enum intr_level old_level;
    ASSERT(!intr_context());
    // 1) intr disable
    old_level = intr_disable(); 

    curr = thread_current();
    ASSERT(curr != idle_thread);

    curr->local_ticks = howLong;
//    list_push_back(&sleep_list, &curr->elem);
    list_insert_ordered(&sleep_list, &curr->elem, ticks_less ,&curr->local_ticks);
    set_global_ticks(); 

    thread_block();    // hread_block() 과 do_schedule 뭘 써야하지?

    // 2) intr able 
    intr_set_level(old_level);
}
2) thread_wakeup()
void
thread_wakeup(int64_t ticks) { // OS ticks from timer! 

    struct list_elem *e = list_begin (&sleep_list);
    struct thread *cur;
    struct list_elem *next;
    ASSERT (intr_context ());

    // ticks와 동일한 시간에 깨어나야 하는 쓰레드가 여러개 있을 수 있으니까 반복문으로 체크합니다.
      while (!list_empty(&sleep_list)) // incr 'e' inside of the loop by doing e = list_next;
    {
        e = list_begin(&sleep_list);
        cur = list_entry (e, struct thread, elem); // iterate the sleep list !

        // Time to wake up
        if (cur->local_ticks > ticks) break;
        // e == &cur->elem
        list_remove(e);  // e 갱신, //fix
        // next = list_pop_front(&sleep_list);
        thread_unblock(cur); // 이미 만들어진 함수에서 다 처리
        // list_push_back(&ready_list, next);
    }
    set_global_ticks();
}
3) set_global_ticks()
void set_global_ticks() {
    struct thread *tmp = list_min(&sleep_list,ticks_less,NULL);
    global_ticks = MIN(tmp->local_ticks,global_ticks);
}
4) get_global_ticks()
int64_t get_global_ticks(void) {
    return global_ticks;
}
수정한 함수
static void timer_interrupt()
static void
timer_interrupt (struct intr_frame *args UNUSED) {
    ticks++;
    thread_tick ();

    int64_t next = get_global_ticks();
    if (next <= ticks) { 
        thread_wakeup(ticks);
    }
    /*
        code to add : 
        check sleep list and the global tick.
        find any threads to wake up,
        move them to the ready list if needed
        update the global tick.
    */
}
내장함수 정리
-> 천천히 할 예정입니다...



CSAPP 컴파일 오류 
Tue, 07 May 2024 03:30:29 GMT
Problem
undefined reference to `Pthread_create'

collect2: error: ld returned 1 exit status
해당 코드를 실행하려고 하니 Pthread_create를 찾을 수 없다고 나온다.
분명 "csapp.h" 을 제대로 include 했는데... 링커는 이를 찾을 수 없다고 합니다ㅠ.
1) csapp를 Include 한다는 것
#include "csapp.h" 
를 한다는 것이 만능은 아니다. include를 하는 것은 컴파일러에게 function의 signatures(함수 이름, 리턴 타입, 매개변수 타입 등등)을 알려주는 것이며, 실제 구현된 코드를 제공하는 것은 아닙니다.
2) pthread 라이브러리
pthread 라이브러리는 standard C library가 아니기 때문에(즉 외부 라이브러리), linking phase에서 개별적으로 Linking을 해주어야 합니다.
3) Compilation Phase
컴파일 할 때, 컴파일러는 우리의 소스코드를 기계어로 번역합니다. 이 때, 컴파일러는 위에서 언급한 signatures(함수 이름, 리턴 타입, 매개변수 타입 등등)를 알아야 합니다. 그러나 실제 구현 코드가 이 시점에 필요한 것은 아닙니다! (linking phase 때 필요합니다.)
4) Linking Phase
ld error가 이 시점에서 발생한 것 입니다! 링커가 실제로 구현된 함수들을 찾을 수 없어서 에러를 나타는 것 이였습니다!

Solution
gcc -o sharing sharing.c csapp.o -lpthread
gcc
GNU 컴파일러를 호출합니다.
-o sharing
목작 파일의 이름을 명시합니다
sharing.c
우리의 소스파일 입니다
csapp.o
csapp.c에서 컴파일 된 목적파일 입니다. 이렇게 목적 파일을 명시함으로써 실행 가능한 파일을 만들 때 csapp.o를 추가하게 됩니다.
-lpthread
pthread 라이브러리가 linked 되는 것을 명시하며, -l 은 링커에게 즉시 이것을 포함시키도록 명령합니다.
결과적으로 gcc -o sharing sharing.c csapp.o -lpthread 옵션으로 컴파일하게 되면 정상적으로 동작하게 됩니다!



M1 Issue?
Thu, 02 May 2024 01:56:40 GMT
M 맥북 시리즈 에서는 왜 이렇게 점수 차이가 클까?

당연히 RISC 기반의 프로세서 위에서, Apple Rosetta 번역기를 통해 x86 프로그램을 구동하므로 성능저하가 있을 수 있습니다. 아무리 성능저하가 있을지라도, 이렇게 간단한 C 프로그램을 구동하는 데 왜 성능 차이가 이렇게 크게 나는걸까요??


가능한 상황들..?


에뮬레이션 오버헤드: x86 프로그램을 ARM 프로세서에서 실행하기 위해 Rosetta와 같은 에뮬레이션 기술을 사용하는 경우, 명령어 번역 및 실행에 따른 오버헤드가 발생합니다. 이는 원본 명령어를 번역하고 ARM 아키텍처에 맞게 실행하는 과정에서 발생하는 것으로, 이로 인해 성능 저하가 발생할 수 있습니다.
최적화 부재: Rosetta는 x86 명령어를 ARM 명령어로 번역하는 과정에서 최적화를 수행하지만, 이는 완벽하지 않을 수 있습니다. 따라서 특정 프로그램이나 작업에 대해서는 최적화가 충분하지 않을 수 있으며, 이로 인해 성능 저하가 발생할 수 있습니다.
메모리 및 캐시 관리: x86과 ARM 아키텍처는 서로 다른 메모리 및 캐시 구조를 가지고 있습니다. 따라서 x86 프로그램을 ARM 프로세서에서 실행할 때 메모리 및 캐시 관리에 대한 차이로 인해 성능 저하가 발생할 수 있습니다.
하드웨어 호환성: x86과 ARM 아키텍처는 하드웨어 수준에서도 차이를 가지고 있습니다. 따라서 x86 프로그램을 ARM 프로세서에서 실행하는 경우 하드웨어 호환성 문제로 인해 성능 저하가 발생할 수 있습니다.


위와 같이 몇몇 케이스로 나눠봤지만, 해답을 찾을 순 없었습니다. 그래서 중간중간 궁금해진 내용들을 찾아봤습니다.
Docker on Mac
최신 Mac App들은 Swift로 개발되는데, 어떻게 Docker와 같은 가상 환경을 구동하는지 궁금해졌습니다.

Apple에서는 위와 같은 가상환경 API를 제공하여 경량 가상 환경을 제공한다고 합니다. 아마 위 프레임워크를 사용하여 개발했지 않을까 추측해봅니다!
Apple Rosetta 2

Intel Instruction ( CISC ) 인스트럭션을 만나면, 로제타는 자동적으로 실행되고 (OS 단에서 실행되는 것 같음) 해당 인스트럭션을 ARM프로세서가 이해할 수 있는 RISC로 자동으로 번역합니다.
문뜩 든 생각..?
RISC <-> CISC 방식의 프로그램 구동 차이가 인스트럭션 셋에 의한 것 이라면, 번역기 (Apple Rosetta)는 단순히 번역만 해주는 것 일까?



[CSAPP] 6.3 The Memory Hierarchy

Mon, 29 Apr 2024 14:43:47 GMT
6.3.1 Caching in the Memory Hierarchy

In general, a cache (pronounced “cash”) is a small, fast storage device that acts as a staging area for the data objects stored in a larger, slower device. The process of using a cache is known as caching (pronounced “cashing”).
The central idea of a memory hierarchy is that for each k, the faster and smaller storage device at level k serves as a cache for the larger and slower storage device at level k + 1. In other words, each level in the hierarchy caches data objects from the next lower level. For example, the local disk serves as a cache for files (such as Web pages) retrieved from remote disks over the network, the main memory serves as a cache for data on the local disks, and so on, until we get to the smallest cache of all, the set of CPU registers.

일반론적으로, cache는 보다 크고 느린 디바이스에 저장된 데이터 객체를 위한 준비 영역으로 사용하는 작고 빠른 저장장치이다. 
메모리 계층구조의 핵심은, 더 빠르고 더 작은 저장 장치 k 라는 녀석이 k+1레벨의 저장장치에게 캐시 서비스를 제공해준다는 것이다. 


예를 들어서, 로컬 디스크는 원격 디스크인 네트워크에 대한 캐시이며, 메모리는 로컨 디스크에 대한 캐시입니다. 결과적으로 CPU에 대한 CPU register 캐시들이 있습니다!


It shows the general concept of caching in a memory hierarchy. The storage at level k + 1 is partitioned into contiguous chunks of data objects called blocks. Each block has a unique address or name that distinguishes it from other blocks. Blocks can be either fixed size (the usual case) or variable size (e.g., the remote HTML files stored on Web servers). For example, the level k + 1 storage in Figure 6.22 is partitioned into 16 fixed-size blocks, numbered 0 to 15.

위 그림은 캐시 메모리의 일반적인 계층구조를 나타냅니다.
k+1 레벨에 데이터 블록들이 있습니다. 각각의 블록에는 다른 블록과 구별 가능한 고유한 '주소'나 '이름' 이 있습니다.
k+1 레벨에는 0~15, 즉 16개의 블록들이 있습니다.

Similarly, the storage at level k is partitioned into a smaller set of blocks that are the same size as the blocks at level k + 1. At any point in time, the cache at level k contains copies of a subset of the blocks from level k + 1. For example, in Figure 6.22, the cache at level k has room for four blocks and currently contains copies of blocks 4, 9, 14, and 3.

이와 비슷하게, k 레벨에는 k+1 레벨에 있는 동일한 사이즈의 블록들이 있지만, 더 적은 수의 블록을 갖고 있습니다. 
어느 시점에서든지 , k 레벨에 있는 캐시는 k+1 레벨의 블록에 있는 카피된 데이터를 갖고 있습니다.

Data are always copied back and forth between level k and level k + 1 in block-size transfer units. It is important to realize that while the block size is fixed between any particular pair of adjacent levels in the hierarchy, other pairs of levels can have different block sizes. 

데이터는 항상 k ~ k+1 레벨에서 블록사이즈 단위로 이동합니다. 인접 레벨들간의 블록 사이즈는 동일하지만, 다른 레벨의 캐시들은 다른 블록 사이들을 갖을 수 있습니다.

For example, in Figure 6.21, transfers between L1 and L0 typically use word-size blocks. Transfers between L2 and L1 (and L3 and L2, and L4 and L3) typically use blocks of tens of bytes. And transfers between L5 and L4 use blocks with hundreds or thousands of bytes. In general, devices lower in the hierarchy (further from the CPU) have longer access times, and thus tend to use larger block sizes in order to amortize these longer access times.

예를 들어, L1L0 간의 데이터 이동에는 일반족으로 word-size 블록이 사용됩니다. 


**(L1L2), (L2L3), (L3L4)** 의 블록들은 수십개의 바이트들을이 옮겨집니다. 
*(L4~L5) (Main Memory <---> Local Disks ) * 는 수백 수천의 바이트들이 옮겨집니다.


일반론적으로, (CPU에서 멀리 떨어진) 계층에서 낮은 장치는 액세스 시간이 더 길기 때문에 이러한 더 긴 액세스 시간을 상각(부동산 용어)하기 위해 더 큰 블록 크기를 사용하는 경향이 있습니다.

In general, the storage devices get slower, cheaper, and larger as we move from higher to lower levels. At the highest level (L0) are a small number of fast CPU registers that the CPU can access in a single clock cycle. Next are one or more small to moderate-size SRAM-based cache memories that can be accessed in a few CPU clock cycles. 

일반론적으로, 메모리 구조에서 아래로 갈수록 더 느리고, 값은 싸지며, 크기는 커집니다. 가장 높은 레벨(L0)에는 CPU가 한 clock 사이클만에 도달할 수 있는 빠른 레지스터들이 존재합니다. 다음에는 몇 개의 CPU 클럭 사이클로 액세스할 수 있는 하나 이상의 소형 내지 중간 크기의 SRAM 기반 캐시 메모리입니다.

These are followed by a large DRAM-based main memory that can be accessed in tens to hundreds of clock cycles. Next are slow but enormous local disks. Finally, some systems even include an additional level of disks on remote servers that can be accessed over a network. 

그 후로는 큰 수십~수백 사이클 클록안에 접근할 수 있는 DRAM-베이스의 Main Memory가 있습니다. 그 이후로는 느리지만 거대한 로컬 디스크가 있으며, 마지막으로 네트워크로 접근 가능한 원격 서버들이 있습니다.

For example, distributed file systems such as the Andrew File System (AFS) or the Network File System (NFS) allow a program to access files that are stored on remote network-connected servers. Similarly, the World Wide Web allows programs to access remote files stored on Web servers anywhere in the world.

분산 파일 시스템인 Andrew File System(AFS) 혹은,
네트워크 파일 시스템인 Network File System(NFS)가 원격으로 연결된 서버에 있는 파일들을 접근할 수 있게 해줍니다.

Cache Hits
When a program needs a particular data object d from level k + 1, it first looks for d in one of the blocks currently stored at level k. If d happens to be cached at level k, then we have what is called a cache hit. The program reads d directly from level k, which by the nature of the memory hierarchy is faster than reading d from level k + 1. For example, a program with good temporal locality might read a data object from block 14, resulting in a cache hit from level k.

프로그램이 d 라는 특정한 데이터를 k+1레벨에서 요구한다면, 프로그램은 k 레벨에 있는 블록들을 찾아봅니다. 만약 k 레벨에 캐시가 되어있다면, d 는 k에 cache hit 됐다고 표현합니다.


프로그램은 k에서 d를 직접적으로 읽습니다. 

Cache Misses
If, on the other hand, the data object d is not cached at level k, then we have what is called a cache miss. When there is a miss, the cache at level k fetches the block containing d from the cache at level k + 1, possibly overwriting an existing block if the level k cache is already full.

데이터 d 가 레벨 k에서 캐시되지 않았다면, 우리는 이것을 cache miss 라고 부릅니다. cache miss가 일어나면, 캐시레벨 k 에서 k+1로 data를 fetch 합니다. 만약 k의 cache가 가득 찼다면 overwriting을 하야겠지요?

This process of overwriting an existing block is known as replacing or evicting the block. The block that is evicted is sometimes referred to as a victim block. The decision about which block to replace is governed by the cache’s replacement policy. For example, a cache with a random replacement policy would choose a random victim block. A cache with a least recently used (LRU) replacement policy would choose the block that was last accessed the furthest in the past.

이미 데이터가 있는 블록을 overwriting하는 것을  replacing 혹은 evicting 한다고 합니다. evicted 된 block을 victim block 이라고도 한답니다.
예를 들어, 랜덤 교체 정책은 랜덤하게 victim block을 정합니다. Least Recently Used 정책은 최근에 가장 적은 빈도로 사용된 블록을 제거합니다.

Kinds of Cache Misses
It is sometimes helpful to distinguish between different kinds of cache misses. If the cache at level k is empty, then any access of any data object will miss. An empty cache is sometimes referred to as a cold cache, and misses of this kind are called compulsory misses or cold misses. Cold misses are important because they are often transient events that might not occur in steady state, after the cache has been warmed up by repeated memory accesses.

만약 k 레벨의 캐시가 비어 있다면, 어떤 데이터 접근도 cache miss가 날 것 입니다. 빈 cache는 cold cache 라고 불리며 
이러한 종류의 miss 는 compulsory misses 혹은 cold misses라 불립니다.


cold cache는 캐시가 반복적인 메모리 액세스로 예열된 후(즉, 캐시에 데이터가 있는 상황) 정상 상태에서는 발생할 수 없는 일시적인 이벤트이기 때문에 중요합니다.

Whenever there is a miss, the cache at level k must implement some placement policy that determines where to place the block it has retrieved from level k + 1. The most flexible placement policy is to allow any block from level k + 1 to be stored in any block at level k. For caches high in the memory hierarchy (close to the CPU) that are implemented in hardware and where speed is at a premium, this policy is usually too expensive to implement because randomly placed blocks are expensive to locate.

cache miss가 있을 때마다 레벨 k의 캐시는 레벨 k + 1에서 검색한 블록을 배치할 위치를 결정하는 배치 정책을 구현해야 합니다.
가장 유연한 배치 정책은 레벨 k + 1의 모든 블록을 레벨 k의 모든 블록에 저장할 수 있도록 하는 것입니다. 하드웨어로 구현되고 속도가 매우 빠른 메모리 계층 구조가 높은 캐시의 경우 무작위로 배치된 블록을 찾는 데 비용이 많이 들기 때문에 이 정책을 구현하기에는 일반적으로 비용이 너무 많이 듭니다.

Restrictive placement policies of this kind lead to a type of miss known as a conflict miss, in which the cache is large enough to hold the referenced data objects, but because they map to the same cache block, the cache keeps missing. For example, in Figure 6.22, if the program requests block 0, then block 8, then block 0, then block 8, and so on, each of the references to these two blocks would miss in the cache at level k, even though this cache can hold a total of four blocks.

이런 종류의 제한적인 배치 정책은 충돌 미스(conflict miss)라고 알려진 유형의 미스로 이어지는데, 이 미스는 캐시가 참조된 데이터 객체를 보유할 만큼 충분히 크지만 동일한 캐시 블록에 매핑되기 때문에 캐시가 계속 누락됩니다. 
예를 들어 그림 6.22에서 프로그램이 블록 0을 요청하고, 블록 8를 요청하고, 0,8 등을 연속적으로 요청하면 이 캐시가 총 4개의 블록을 보유할 수 있음에도 불구하고 이 두 블록에 대한 각 참조가 k 수준의 캐시에서 누락됩니다.

Programs often run as a sequence of phases (e.g., loops) where each phase accesses some reasonably constant set of cache blocks. For example, a nested loop might access the elements of the same array over and over again. This set of blocks is called the working set of the phase. When the size of the working set exceeds the size of the cache, the cache will experience what are known as capacity misses. In other words, the cache is just too small to handle this particular working set.

프로그램은 종종 각 단계가 일정한 캐시 블록 세트에 액세스하는 단계(루프)의 시퀀스로 실행됩니다. 예를 들어, 중첩된 루프는 동일한 어레이의 요소에 계속해서 액세스할 수 있습니다. 이 블록 세트는 Working set 라고 불립니다. Working set의 크기가 캐시의 크기를 초과하면, 캐시는 용량 누락으로 알려진 것을 경험할 것입니다. 다시 말해서, 캐시는 이 특정 작업 세트를 처리하기에는 너무 작습니다.

Cache Management
As we have noted, the essence of the memory hierarchy is that the storage device at each level is a cache for the next lower level. At each level, some form of logic must manage the cache. By this we mean that something has to partition the cache storage into blocks, transfer blocks between different levels, decide when there are hits and misses, and then deal with them. The logic that manages the cache can be hardware, software, or a combination of the two.

앞서 언급했듯이 메모리 계층 구조의 본질은 각 레벨의 스토리지 장치가 다음 하위 레벨을 위한 캐시라는 것입니다. 각 레벨에서 어떠한 로직이 캐시를 관리해야 합니다. 이는 캐시 스토리지를 블록으로 분할하고, 여러 레벨 사이에 블록을 전송하고, 히트와 미스가 있을 때를 결정한 다음 이를 처리해야 한다는 것을 의미합니다. 캐시를 관리하는 논리는 하드웨어, 소프트웨어 또는 이 둘의 조합일 수 있습니다.

For example, the compiler manages the register file, the highest level of the cache hierarchy. It decides when to issue loads when there are misses, and determines which register to store the data in. The caches at levels L1, L2, and L3 are managed entirely by hardware logic built into the caches.

예를 들어 컴파일러는 캐시 계층 구조의 최상위 계층인 레지스터 파일을 관리합니다. 누락된 부분이 있을 때 로드를 발행할 시기를 결정하고 데이터를 저장할 레지스터를 결정합니다. L1, L2, L3 수준의 캐시는 캐시에 내장된 하드웨어 로직에 의해 전적으로 관리됩니다.

In a system with virtual memory, the DRAM main memory serves as a cache for data blocks stored on disk, and is managed by a combination of operating system software and address translation hardware on the CPU. For a machine with a distributed file system such as AFS, the local disk serves as a cache that is managed by the AFS client process running on the local machine. In most cases, caches operate automatically and do not require any specific or explicit actions from the program.

가상 메모리가 있는 시스템에서 DRAM 메인 메모리는 디스크에 저장된 데이터 블록의 캐시 역할을 하며 CPU의 운영 체제 소프트웨어와 주소 변환 하드웨어의 조합에 의해 관리됩니다. AFS와 같은 분산 파일 시스템이 있는 시스템의 경우 로컬 디스크는 로컬 시스템에서 실행되는 AFS 클라이언트 프로세스에 의해 관리되는 캐시 역할을 합니다. 대부분의 경우 캐시는 자동으로 작동하며 프로그램의 특정 작업이나 명시적인 작업을 필요로 하지 않습니다.


6.3.2 Summary of Memory Hierarchy Concepts
To summarize, memory hierarchies based on caching work because slower storage is cheaper than faster storage and because programs tend to exhibit locality:
Exploiting temporal locality.
Because of temporal locality,the same data objects are likely to be reused multiple times. Once a data object has been copied into the cache on the first miss, we can expect a number of subsequent hits on that object. Since the cache is faster than the storage at the next lower level, these subsequent hits can be served much faster than the original miss.

** 시간 지역성** 때문에, 동일한 데이터 객체들은 여러 번 재사용될 가능성이 높습니다. 첫 번째 cache miss에서 데이터 객체가 캐시에 복사되면, 우리는 그 객체에 대한 다수의 후속 히트들을 기대할 수 있습니다. 캐시가 다음 하위 레벨의 스토리지보다 빠르기 때문에, 이러한 후속 히트들은 원래 미스보다 훨씬 더 빠르게 서비스될 수 있습니다.

Exploiting spatial locality.
Blocks usually contain multiple data objects. Because of spatial locality, we can expect that the cost of copying a block after a miss will be amortized by subsequent references to other objects within that block.

블록은 일반적으로 여러 개의 데이터 객체를 포함합니다. 공간 지역성 때문에 누락된 후 블록을 복사하는 비용은 해당 블록 내의 다른 객체를 참조하여 상각(부동산 용어)될 것으로 예상할 수 있습니다.




[CSAPP] 6.2 Locality
Mon, 29 Apr 2024 11:30:17 GMT
Locality
Well-written computer programs tend to exhibit good locality. That is, they tend to reference data items that are near other recently referenced data items or that were recently referenced themselves. This tendency, known as the principle of locality, is an enduring concept that has enormous impact on the design and performance of hardware and software systems.

잘 짜여진 프로그램은 좋은 지역성을 갖는 경향이 있습니다. 이는 프로그램이 최근에 사용한 데이터가 최근에 참조됐거나, 그 참조된 데이터 주변에 있는 경향성을 띕니다.
이러한 경향성은 '지역성의 원리'로 알려져 있으며, 하드웨어와 소프트웨어 시스템의 성능과 설계에 엄청난 영향을 주는 지속적인 개념입니다.

Locality is typically described as having two distinct forms: temporal locality and spatial locality. In a program with good temporal locality, a memory location that is referenced once is likely to be referenced again multiple times in the near future. In a program with good spatial locality, if a memory location is referenced once, then the program is likely to reference a nearby memory location in the near future.

지역성은 일반론적으로 시간과 공간 두 가지의 구분되는 형식이 있습니다.


Temporal locality ** 은 한 번 참조된 지역은 가까운 시간 내에 여러번 참조되는 것을 나타냅니다.
**Spatial locality 는 한 번 참조된 메모리 지역의 근처를 가까운 시간 내에 참조하는 것을 나타냅니다.



Programmers should understand the principle of locality because, in general, programs with good locality run faster than programs with poor locality. All levels of modern computer systems, from the hardware, to the operating system, to application programs, are designed to exploit locality.

프로그래머는 지역성의 원리를 알아야 합니다. 일반론적으로, 좋은 지역성을 갖는 프로그램을 더 빨리 구동되기 때문입니다. 하드웨에어소 OS, 응용 프로그램들은 좋은 지역성을 활용하기 위해 설계되고 있습니다.

At the hardware level, the principle of locality allows computer designers to speed up main memory accesses by introducing small fast memories known as cache memories that hold blocks of the most recently referenced instructions and data items. At the operating system level, the principle of locality allows the system to use the main memory as a cache of the most recently referenced chunks of the virtual address space.

하드웨어 레벨에서, 지역성은 가장 최근에 사용된 데이터를 담는 캐시 메모리라고 불리는 작고 빠른 메모리를 도입함으로써 메인 메모리 접근속도를 높혔습니다.


운영체제 레벨에서는, 지역성의 원리는 메인 메모리를 가장 최근에 사용한 가상 주소 공간블록의 캐시로 사용될 수 있게 해줍니다.

Similarly, the operating system uses main memory to cache the most recently used disk blocks in the disk file system. The principle of locality also plays a crucial role in the design of application programs. For example, Web browsers exploit temporal locality by caching recently referenced documents on a local disk. 

이와 유사하게, OS는 메인 메모리를 가장 최근에 사용한 디스크의 파일 블록을 캐시하기 위해 사용합니다. 지역성의 원리는 프로그램 개발에 있어 중요한 역할을 합니다. 예를 들어, 웹 브라우저는 디스크 상의 가장 최근에 참조한 문서들을 캐싱해서 시간적 지역성을 활용합니다.

6.2.1 Locality of References to Program Data
Stride-1 (순차참조패턴)


위의 함수는 벡터의 원소들이 메모리에 저장된 것 과 같은 순서로 접근하기 때문에 좋은 지역성을 갖고 있다.



위의 함수는 배열이 한 row의 cols를 순서대로 접근하기 때문에 좋은 지역성을 갖고 있다.

stride-N


위의 함수는 배열이 한 col을 기준으로 rows를 접근하고 있기에 나쁜 지역성을 갖고 있다.

6.2.2 Locality of Instruction Fetches
Since program instructions are stored in memory and must be fetched (read) by the CPU, we can also evaluate the locality of a program with respect to its instruction fetches. For example, in Figure 6.17 the instructions in the body of the for loop are executed in sequential memory order, and thus the loop enjoys good spatial locality. Since the loop body is executed multiple times, it also enjoys good temporal locality. An important property of code that distinguishes it from program data is that it is rarely modified at run time. While a program is executing, the CPU reads its instructions from memory. The CPU rarely overwrites or modifies these instructions.

프로그램의 인스트럭션들은 메모리에 저장되고 CPU에 의해 선입(읽어야) 하기 때문에 인스트럭션 선입에 관한 프로그램의 지역성도 평가할 수 있다. 예를 들어, 위의 벡터를 탐색하는 예제를 다시 상기해보자. 해당 예제에서는 메모리 순서대로 실행되기 때문에 좋은 공간 지역성을 갖는다고 할 수 있으며, for문에 의해 여러번 실행되기에 좋은 시간 지역성을 갖고 있다.


코드와 프로그램 데이터를 구분짓는 가능 큰 특징은 코드는 런타임중 거의 수정되지 않는다. 프로그램이 실행되는 동안 CPU는 메모리로부터 인스트럭션을 읽는다. 이 때 CPU는 인스트럭션을 거의 수정하거나 지우지 않는다.




[CSAPP] 6.1 Storage Technologies
Mon, 29 Apr 2024 07:02:07 GMT
Storage Technologies
if you understand how the system moves data up and down the memory hierarchy, then you can write your application programs so that their data items are stored higher in the hierarchy, where the CPU can access them more quickly.

시스템이 어떻게 데이터를 메모리 계층구조에서 움직이는지 이해한다면, 메모리 상위 계층에 데이터가 저장되는 프로그램을 짤 수 있습니다. 따라서 CPU는 더 빨리 데이터에 접근이 가능합니다!

This idea centers around a fundamental property of computer programs known as locality. Programs with good locality tend to access the same set of data items over and over again, or they tend to access sets of nearby data items. Programs with good locality tend to access more data items from the upper levels of the memory hierarchy than programs with poor locality, and thus run faster. For example, on our Core i7 system, the running times of different matrix multiplication kernels that perform the same number of arithmetic operations, but have different degrees of locality, can vary by a factor of almost 40!

해당 아이디어는 '지역성' 이라고 알려진 컴퓨터 프로그램의 근본 특성에 중심을 둔다. 지역성 좋은 프로그램은 같은 set의 데이터들을 계속해서 접근하거나, 그 근처에 있는 데이터로 접근하는 경향을 보입니다. 
좋은 지역성을 갖는 프로그램은 메모리 상위 계층의 레벨에 있는 데이터에 더 자주 접근합니다. 

SRAM
SRAM stores each bit in a bistable memory cell. Each cell is implemented with a six-transistor circuit. This circuit has the property that it can stay indefinitely in either of two different voltage configurations, or states. Any other state will be unstable—starting from there, the circuit will quickly move toward one of the stable states. Such a memory cell is analogous to the inverted pendulum illustrated in Figure 6.1.

역진자가 어쩌고 저쩌고... 뭔소리야..!!! 이 부분은 넘어가기로 하고 SRAM이 왜 DRAM 보다 빠른지만 요약해보았다!!



Access Speed (액세스 속도): SRAM은 데이터에 액세스하기 위해 캐패시터 충전 및 방전과 같은 복잡한 프로세스를 거치지 않습니다. 대신에 SRAM은 간단한 논리 게이트 및 플립플롭으로 구성되어 있어 데이터에 더 빠르게 액세스할 수 있습니다.

Refresh Cycle (새로고침 주기): DRAM은 주기적인 새로고침이 필요합니다. 이는 캐패시터의 충전 상태가 시간이 지나면 약해져서 데이터를 잃을 수 있기 때문입니다. 하지만 SRAM은 새로고침이 필요하지 않습니다. 따라서 SRAM은 데이터에 대한 액세스를 위해 추가적인 시간을 소비하지 않아도 되므로 더 빠릅니다.

구조적 차이: SRAM은 플립플롭을 사용하여 각 비트를 저장하는 반면, DRAM은 캐패시터와 트랜지스터로 구성된 복잡한 회로를 사용합니다. SRAM의 간단한 구조는 데이터에 대한 액세스를 더 빠르게 만듭니다.


DRAM
DRAM stores each bit as charge on a capacitor. This capacitor is very small-typically around 30 femtofarads—that is, 30 × 10^-15 farads. Recall, however, that a farad is a very large unit of measure. DRAM storage can be made very dense— each cell consists of a capacitor and a single access transistor. Unlike SRAM, however, a DRAM memory cell is very sensitive to any disturbance. When the capacitor voltage is disturbed, it will never recover. Exposure to light rays will cause the capacitor voltages to change. In fact, the sensors in digital cameras and camcorders are essentially arrays of DRAM cells.


패러드(Farad)는 전기 용량을 측정하는 단위입니다. 이것이 "매우 큰 단위"라고 설명된 것은, 대부분의 전기 용량 측정에서 일반적으로 사용되는 단위가 마이크로패러드(Microfarad, 10^-6 패러드) 또는 나노패러드(Nanofarad, 10^-9 패러드) 수준이기 때문입니다. 따라서 패러드는 이러한 일반적인 단위보다 훨씬 큰 값으로 사용됩니다.

DRAM은 각각의 비트를 캐퍼시터에 저장합니다. DRAM은 매우 밀집되어 제작될 수 있으며 각각의 셀은 캐퍼시터와 접근 트랜지스터로 구성됩니다. SRAM과는 달리 DRAM은 작은 방해(?)에도 매우 민감하게 반응합니다. 캐퍼시터의 전압이 다운되면,  절대 복구가 되지 않습니다. 빛을 캐퍼시터에 쬐게하면 전압이 변할 수 있습니다. 

Conventional DRAMs

The cells (bits) in a DRAM chip are partitioned into d supercells, each consisting of w DRAM cells. A d × w DRAM stores a total of dw bits of information. The supercells are organized as a rectangular array with r rows and c columns, where rc = d. Each supercell has an address of the form (i, j), where i denotes the row and j denotes the column.

DRAM은 i,j의 2차원 배열로 이루어져 있으며 혼란을 피하기 위해 이 책에서는 i,j에 대응 되는 셀을 SuperCell 이라고 부르겠습니다!

For example, Figure 6.3 shows the organization of a 16 × 8 DRAM chip with d = 16 supercells, w = 8 bits per supercell, r = 4 rows, and c = 4 columns. The shaded box denotes the supercell at address (2, 1). Information flows in and out of the chip via external connectors called pins. Each pin carries a 1-bit signal. Figure 6.3 shows two of these sets of pins: eight data pins that can transfer 1 byte in or out of the chip, and two addr pins that carry two-bit row and column supercell addresses. Other pins that carry control information are not shown.

비트 정보는 핀(pin) 이라 불리는 외부 커넥터에 의해 칩과 메모리 사이를 이동합니다. 각각의 핀은 1비트 정보를 옮길 수 있습니다. 그림에서는 8개의 데이터 핀을 나타냅니다.

Each DRAM chip is connected to some circuitry, known as the memory controller, that can transfer w bits at a time to and from each DRAM chip. To read the contents of supercell (i, j ), the memory controller sends the row address i to the DRAM, followed by the column address j . The DRAM responds by sending the contents of supercell (i, j ) back to the controller. The row address i is called a RAS (row access strobe) request. The column address j is called a CAS (column access strobe) request. Notice that the RAS and CAS requests share the same DRAM address pins.

DRAM 칩은 각각의 DRAM 칩들로부터 비트 정보를 전송할 수 있는 Memory Controller와 연결되어 있습니다. 슈퍼셀(i,j)로부터 정보를 읽으려면, 메모리 컨트롤러는 행 주소 i를 DRAM에 보내고, 다음에 열 주소 j를 DRAM에 보냅니다. 


행 주소 i 는 RAS (Row Access Strobe) 라 불리고
열 주소 j 는 CAS (Column Access Strob)라 불립니다.


For example, to read supercell (2, 1) from the 16 × 8 DRAM in Figure 6.3, the memory controller sends row address 2, as shown in Figure 6.4(a). The DRAM responds by copying the entire contents of row 2 into an internal row buffer. Next, the memory controller sends column address 1, as shown in Figure 6.4(b). The DRAM responds by copying the 8 bits in supercell (2, 1) from the row buffer and sending them to the memory controller.

예를 들어서 , 슈퍼셀 (2,1)을 읽으려면, 


메모리 컨트롤러는 Row 주소2를 보냅니다. (RAS Request)
DRAM은 내부 로우 버퍼에 row2의 전체 내용을 복사합니다.
그 다음, 메모리 컨트롤러는 col1 주소를 보냅니다. (CAS Request)
DRAM은 행 버퍼에서 슈퍼셀(2,1)에 있는 8비트를 복사해서 메모리 컨트롤러에 보냅니다.

One reason circuit designers organize DRAMs as two-dimensional arrays instead of linear arrays is to reduce the number of address pins on the chip. For example, if our example 128-bit DRAM were organized as a linear array of 16 supercells with addresses 0 to 15, then the chip would need four address pins instead of two. The disadvantage of the two-dimensional array organization is that addresses must be sent in two distinct steps, which increases the access time.

128 비트 (2⁷)를 8로 나누면 16이다. 즉, 선형적 늘어진 16개를 구분하기 위해서는 4개의 주소 핀 (2⁴ = 16)이 필요하게 되는 것이다!
즉, DRAM 구조는 메모리 핀 주소의 갯수와 접근 타임에 대한 트레이드 오프를 고려하여 설계된 것이다.

Nonvolatile Memory
ROMs
DRAMs and SRAMs are volatile in the sense that they lose their information if the supply voltage is turned off. Nonvolatile memories, on the other hand, retain their information even when they are powered off. There are a variety of nonvolatile memories. For historical reasons, they are referred to collectively as read-only memories (ROMs), even though some types of ROMs can be written to as well as read. ROMs are distinguished by the number of times they can be reprogrammed (written to) and by the mechanism for reprogramming them.

DRAM, SRAM 모두 전원이 꺼지면 정보를 잃기 때문에 휘발성이다. 하지만 비휘발성 메모리는 그렇지 않다. 
ROM(Read Only Memory)는 쓰기가 되지만,  역사적인 이유로 ROMs 이라고 부르겠습니다! 
ROMs 은 재프로그램(쓰기)의 횟수에 따라서, 방법에 따라 구분됩니다.

Erasable Programmable ROM
An erasable programmable ROM (EPROM) has a transparent quartz window that permits light to reach the storage cells. The EPROM cells are cleared to zeros by shining ultraviolet light through the window. Programming an EPROM is done by using a special device to write ones into the EPROM. An EPROM can be erased and reprogrammed on the order of 1,000 times. An electrically erasable PROM (EEPROM) is akin to an EPROM, but it does not require a physically separate programming device, and thus can be reprogrammed in-place on printed circuit cards. An EEPROM can be reprogrammed on the order of 105 times before it wears out.

EPROM은 빛이 저장장치 셀에 도달할 수 있는 투명한 수정 윈도우를 갖고있다. EPROM 셀은 이 투명한 수정 윈도우를 통해 자외선을 비추면 0으로 지울 수 있다. EPROM을 프로그래밍 하는 것은 EPROM에 1을 쓸 수 있는 특별한 장치를 사용해서 이루어진다.


Electrically erasable PROM (EPROM)은 물리적으로 별도의 프로그램 장치를 필요로 하지 않고, PCB에서 직접 재프로그램 될 수 있다. 

Flash memory
Flash memory is a type of nonvolatile memory, based on EEPROMs, that has become an important storage technology. Flash memories are everywhere, providing fast and durable nonvolatile storage for a slew of electronic devices, including digital cameras, cell phones, and music players, as well as laptop, desktop, and server computer systems. In Section 6.1.3, we will look in detail at a new form of flash-based disk drive, known as a solid state disk (SSD), that provides a faster, sturdier, and less power-hungry alternative to conventional rotating disks.

플래시 메모리는 EEPROMs를 기반으로 한 비 휘발성 메모리의 한 종류로, 중요한 저장장치 기술이 됐습니다. 디지털 카메라, 휴대폰, 노트북, 데스크탑과 서버 컴퓨터 시스템 등 어디에서나 플래시 메모리를 찾을 수 있습니다.

Accessing Main Memory
Data flows back and forth between the processor and the DRAM main memory over shared electrical conduits called buses. Each transfer of data between the CPU and memory is accomplished with a series of steps called a bus transaction. A read transaction transfers data from the main memory to the CPU. A write transaction transfers data from the CPU to the main memory.

프로세서와 DRAM간의 데이터 이동은 BUS라고 불리는 전선에서 Bus Transaction을 통해서 이뤄집니다.
Read  : DRAM -> Processor 
write : Processor -> DRAM

A bus is a collection of parallel wires that carry address, data, and control signals. Depending on the particular bus design, data and address signals can share the same set of wires or can use different sets. Also, more than two devices can share the same bus. The control wires carry signals that synchronize the transaction and identify what kind of transaction is currently being performed. For example, is this transaction of interest to the main memory, or to some other I/O device such as a disk controller? Is the transaction a read or a write? Is the information on the bus an address or a data item?

버스는 주소, 데이터, 제어신호들을 운반하는 전선의 병렬 배치 입니다. 버스를 어떻게 설계하는지에 따라서 데이터와 주소 신호들은 같은 전선을 사용하거나, 다른 전선을 사용하게 됩니다. 


또한, 2개 이상의 기기에서 같은 버스를 공유할 수 있습니다. 
제어선들은 트랜잭션을 동기화하는 신호들을 운반하고, 어떤 트랙잭션이 현재 수행중인지 확인합니다.

 
예를 들어서, (signals from control wires)
해당 트랜잭션이 메인 메모리로 가는지, 아니면 disk contoller와 같은 I/O 디바이스인지?  혹은 read, write 인지? 혹은 주소인지 데이터 그 자체인지를 판별합니다.


It shows the configuration of an example computer system. The main components are the CPU chip, a chipset that we will call an I/O bridge (which includes the memory controller), and the DRAM memory modules that make up main memory. These components are connected by a pair of buses: a system bus that connects the CPU to the I/O bridge, and a memory bus that connects the I/O

CPU, I/O bridge(Memory Controller를 포함) 그리고 DRAM 메인 메모리 모듈이 주요 컴포넌트이다.


System bus : CPU와 I/O bridge를 연결하는 시스템 버스.
memory bus : I/O와 메인 메모리를 연결하는 메모리 버스.

The I/O bridge translates the electrical signals of the system bus into the electrical signals of the memory bus. As we will see, the I/O bridge also connects the system bus and memory bus to an I/O bus that is shared by I/O devices such as disks and graphics cards. For now, though, we will focus on the memory bus.

I/O 브릿지는 시스템버스의 전기적 신호를 메모리 버스의 전기적 신호로 번역해줍니다. I/O 브릿지는 시스템 버스와 메모리 버스를 디스크나 그래픽 카드 같은 입출력 장치들이 공유하는 I/O 버스로 연결한다.






Consider what happens when the CPU performs a load operation such as
movq A,%rax
where the contents of address A are loaded into register %rax. Circuitry on the CPU chip called the bus interface initiates a read transaction on the bus. The read transaction consists of three steps. First, the CPU places the address A on the system bus. The I/O bridge passes the signal along to the memory bus (Figure 6.7(a)). Next, the main memory senses the address signal on the memory bus, reads the address from the memory bus, fetches the data from the DRAM, and writes the data to the memory bus. The I/O bridge translates the memory bus signal into a system bus signal and passes it along to the system bus (Figure 6.7(b)). Finally, the CPU senses the data on the system bus, reads the data from the bus, and copies the data to register %rax (Figure 6.7(c)).

CPU가 아래의 load operation을 수행하는 상황을 가정해봅시다.

movq A,%rax


**프로세서는 Bus Interface를 호출하여 Read Transaction을 버스에 시행합니다.**
1. I/O bridge는 시그널을 메모리 버스로 보냅니다.
2. 메인 메모리는 address signal을 memory bus를 통해 감지합니다, memory bus로부터 address를 읽고 DRAM으로부터 데이터를 가져온 뒤 메모리 버스에 데이터를 write 합니다.
3.I/O bridge는 memory bus signal을 system bus signal로 번역 후 패스해줍니다.
4. 마지막으로, CPU는 system bus로부터 데이터를 감지하고, 버스로부터 데이터를 읽은 뒤 register %rax에 데이터를 카피합니다.

junhyeok_kim.log

tmp

CR3 레지스터와 문맥 전환

문맥 전환의 예

요약

[PintOS] Introduction to User Program - Virtual Memory

🎬 Introduction

🛎️ Background

Source Files

include/vm/vm.h, vm/vm.c

VM_UNINIT.c

vm/anon.c

vm/file.c

vm/inspect.c

Memory Terminology

Pages

KERN_BASE (0x8004000000) 🤔?

Frames

Page Tables

Swap Slots

🕹️ Resource Management Overview

Supplemental page table

Frame table

Swap table

Choices of implementation (Performance perspective)

🎁 Managing the Supplemental Page Table

🧹 Organization of Supplemental Page Table

❌ Handling page fault

🖼️ Managing the Frame Table

🗑️ Accessed and Dirty Bits

🚥 Managing the Swap Table

📂 Managing Memory Mapped Files

🔖 Sitation 📝

[CS-KEYWORD] 32bit OS 🆚 64bit OS

역사 및 발전

🕵 32비트와 64비트 운영체제 이해하기 🕵🏻‍♂️

1. 정의와 기본 개념

CPU bit width🤔?

2. 메모리 주소 지정

3. 레지스터 저장 및 데이터 처리

4. 성능 및 애플리케이션 호환성

5. 운영체제와 애플리케이션 제약

6. 32 bit OS 🔜 64 bit OS

요약

😊 도움을 준 고마운 사람들 🥳

[CS-KEYWORD] File discriptor

유닉스 시스템의 File discriptor 📁

File discriptor란?

File discriptor는 구체적으로 어떻게 쓰일까요? 🤔

File discriptor를 그림으로 이해해봐요 🎨 🤓

표준 파일 디스크립터 ✨

[CS-KEYWORD] User Stack and Kernel Stack

🤔 Question 🙋🏻

✅ Answer 👨‍🏫

⛹🏻‍♂️ User Stack : User Mode 프로세스의 작업 공간 🏋🏻

사용자 스택의 주요 구성 요소

🧱 블록 범위가 뭐지?!

User Stack 작업

💂🏻 Kernel Stack: Kernel Mode 작업의 토대 👮🏻‍♂️

커널 스택의 주요 구성 요소:

🤯 프로세스 디스크립터? 🔛

PID 🥊 🆚 🤜🏻 Process Discriptor

User Stack 🆚 Kernel Stack: 간단 요약

[CS-KEYWORD] Register vs Memory

Register

Memory

휘발성 메모리

비휘발성 메모리

🦾 결론

⚖️ 레지스터

🎛️ 메모리

🗂️ 테이블로 정리

[CS-KEYWORD] User Mode and Kernel Mode

User Mode

User Mode Program 🆚 User Mode Applications

공통점

차이점

1. 용어의 범위

2. 목적 및 사용 사례

3. 예시

[PintOS] 1-3 Advanced Scheduler (구현)

🙋🏻 왜 `MLFSQS like` 일까?

[PintOS] 1-3 Advanced Scheduler