File-System Implementation

toc {:toc}

File System Structure

Disk는 덮어쓰기, 랜덤 접근을 제공한다.
- I/O 전송은 하나 이상의 sector로 이루어진 block단위로 이루어진다. (512bytes or 4KB)
- 일반적으로 4KB
File system → secondary storage에 위치한다.
Mapping Logical to physical
Logical file system 은 metadata 정보를 다룬다.
Metadata : 구조 유지를 위한 정보. 실제 데이터가 아닌 파일 시스템 구조를 포함한다.
File Control Block (FCB)는 파일에 대한 정보를 포함한다.
File organization module → logical block과 파일에 대한 정보를 가지고 있다.
- 각 파일의 logical block ⇒ 0~N 으로 번호 매겨진다.
- Free-space manager를 통해 할당되지 않은 block을 추적한다.
Basic file system
- logical block 주소들에 대해 드라이브에 명령을 내린다.
- 메모리 버퍼와 캐쉬를 관리한다.
- Block I/O subsystem이라 불린다.
Device driver
- I/O device를 다룬다.
- 메인 메모리와 디스크 사이 정보를 전송한다.
- High level 명령이 주어지고 low level instruction을 출력한다.

Disk Structure

Partition : 디스크의 논리적 분할을 뜻한다. ⇒ 디스크를 분할한 것을 말함.
- Raw(w/o file system), Formatted(w/ file system)으로 사용될 수 있다.
Volumn : 하나의 file system의 logical area
- Device directory 또는 volume table of contents를 이용해 file system 정보를 기록한다.

File System Operation

Boot control block (per volumn) : OS를 boot하기 위한 정보를 포함한다.
- OS를 포함한 volumn의 경우 주로 volumn의 첫 block을 boot control block이 차지한다.
Volumn control block (per volumn) : volumn의 세부 정보를 포함한다.
- 총 block 수, free block 수, 크기, 등
Directory structure (per file system) : file을 관리한다.
File control block (per file) : file의 세부 정보를 포함한다.
Mount table : file system mount, mount point 등을 저장한다.
Directory structure cache : 최근 접근한 directory 정보를 저장한다.
System-wide open-file table : 시스템 전체의 열린 file의 정보를 포함한다.
Per-process open-file table : system-wide open-file table에서 프로세스가 연 pointer를 포함한다.

파일 open 과정

Open
디스크에서 디렉토리 구조를 가져온다.
디렉토리 구조에서 파일을 선택한다.
FCB를 메모리에 올린다.
처음 올리는 경우 system-wide open-file table에 entry를 추가한다.
Per-process open-file table에 해당 파일의 system-wide open-file table entry를 가리키는 pointer를 추가한다.

Directory Implementation

Linear List : Data block에 대해 (file name / pointer) 저장
- 프로그래밍 하기 쉽다.
- 시간 소요가 있다. (선형 복잡도)
- linked list 또는 B+ tree 이용해 알파벳순으로 정렬 가능
Hash Table : hash 데이터 구조를 가진 linear list
- Search time을 감소시킨다.
- Collision : 해쉬 함수에 의해 변환됐을 때 같은 위치에 2개의 파일이 위치하는 상황
  - 해결하기 위해 하나의 file만 저장할 수 있도록 구현 → Collision 발생 X
  - 같은 값을 가지면 linked list로 파일 연결

Contiguous Allocation Method

각 file이 contiguous block의 집합으로 점유한다.
간단하다.
(file, Start, Length) 구조
문제점
- 비어있는 공간을 찾는 과정(Free frame 찾는 것과 비슷하다. Free disk block 찾기)
- 동적 저장소 할당
- External fragmentation
⇒ 변형
처음에는 contiguous하게 저장
Hole이 생겼을 때 contiguous하게 저장한 후 저장하지 못했을 때 다음 hole에서 contiguous하게 저장

Linked Allocation Method

각 file이 block의 linked list로 구성되어 있다.
Block 안에 다음 block의 번호가 포함된다.
(file, Start, End) 구조
압축, External fragmentation 없다.
문제점
- 신뢰성이 낮다. ---⇒ 중간 정보가 유실되면 다음 block은 전부 접근 불가
- Block의 중간 위치를 알기 위해서는 disk seek 시간이 오래 걸린다.
File Allocation Table (FAT)
- File들이 연결된 번호들을 저장하고 있는 table이다.
- Table이 각 block에 대해 연결된 block number로 인덱싱된 하나의 entry만 갖는다.
- MS Dos 에서 사용됐다.

Indexed Allocation Method

각 block의 pointer를 저장하고 있는 index block를 지닌다.
(file, index block)

Performance of Allocation Methods

File access type에 따라 좋은 방법 다르다.
- Contiguous → sequential / random
- Linked → sequential / not random
만들 때 contiguous or linked를 선택한다.
Indexed가 더 복잡하다.
- 하나의 block을 읽으려해도 index block을 읽어야 하므로 추가 연산 발생
- ⇒ 여러 개 연속된 묶음으로 할당한다. ⇒ throughput 향상, CPU overhead 감소
NVM → disk head 없기 때문에 다른 알고리즘 필요

Free-Space Management

이용 가능한 block, cluster를 추적하는 free-space list를 사용한다.
Bit vector or bit map (n blocks)
block free : 1
block occupied : 0
word ⇒ bit를 일정한 크기로 나눈 것
(0으로 구성된 word) X (word 안의 bit의 수) + (1이 나올 때 까지의 bit 수) ⇒ free block
Contiguous file을 얻기 쉽다.
Linked List (Free List) : free block이 linked list로 연결되어 있다.
Grouping : n 개의 free block의 주소를 저장하도록 linked list 구성
Counting : (start addr, length)로 counting해 저장 [(0, 2), (6, 2), (14, 3) …]
Space Maps

Efficiency and Performance

Efficiency
- Fragmentation 발생하면 효율 감소
- Metadata 증가 ⇒ 실제 데이터 저장공간 감소 ⇒ 효율 감소
- Metadata 필요시 추가 ⇒ 빈공간 찾는 overhead ⇒ 효율 감소
- FCB 공간 미리 할당 ⇒ 안 쓰면 효율 감소
Performance
- data와 metadata가 같이 위치하도록 유지해 disk가 덜 움직이게 한다.
- Buffer cache : 자주 사용되는 block에 대한 메인 메모리 구역을 분리한다.
- Synchronous : 바로 disk에 작성
- Asynchronous : buffer에 놔두고 나중에 disk에 작성 (더 흔한 방식)
  - 모았다가 작성해서 더 빠르다. 계속 접근할 시간을 줄이기 때문.
- Free-behind / read-ahead : sequential access에 최적화
- Read가 주로 write보다 느리다.
  - read → 데이터를 가져와서 읽어야 한다.
  - write → 데이터 buffer에 저장해뒀다가 덮어씌우면 된다.

Page Cache

Virtual memory, address를 이동해 page를 저장한다.
디스크로의 접근보다 cache 접근이 더 빠르기 때문에 사용한다.
한 번 읽은 파일의 내용을 page cache 영역에 저장시켜 놓고 사용하는 방식 ⇒ 디스크 접근 감소
Memory-mapped I/O이 page cache 이용
File system의 routine I/O는 buffer cache 이용

Unified Buffer Cache

대부분의 block은 데이터를 저장하고 있는 block이다.
→ 파일의 내용이 buffer cache에 존재
→ 따라서 buffer cache에도 존재하고 page cache에도 존재하는 이중 cache문제 발생
→ 해결하기 위해 같은 page cache를 사용한다.

Recovery

Consistency checking : 디렉토리 구조에서 데이터가 동일한지 확인 / 동일하지 않으면 복구
Back up 해 놓고 복구한다.

Log Structured File Systems

Log structured(or journaling) : 파일의 메타데이터 업데이트를 기록한다.
모든 거래는 순차적으로 log에 써진다.
거래가 가끔 device나 disk section에 써지는 경우 있다.
log는 asynchronous하게 파일 시스템 구조에 반영된다.
- log기록만 되고 disk에 반영되지 않는 경우가 생길 수 있다.
파일 시스템이 충돌이 나면 log에서 남아있는 모든 거래는 처리되어야 한다.

🖥️ Techtellus

Explorer

File-System Implementation

File System Structure

Disk Structure

File System Operation

Directory Implementation

Contiguous Allocation Method

Linked Allocation Method

Indexed Allocation Method

Performance of Allocation Methods

Free-Space Management

Efficiency and Performance

Page Cache

Unified Buffer Cache

Recovery

Log Structured File Systems

Graph View

Table of Contents

Backlinks