
Introduction

Before LeNet-5 was created, research on pattern recognition for handwriting had been ongoing for some time. A traditional pattern recognition system consists of two modules: a hand-designed feature extractor and a fully-connected multi-layer classifier. This design has several problems.
*(Figure: traditional pattern recognition model)*


Problems with the traditional system

1. Only information relevant to the input is extracted.

A hand-designed feature extractor keeps only the information its designer considers relevant to the input and discards the rest. At first glance this may not look like a problem, but relying on hand-picked information ultimately leads to overfitting and poor generalization. As a result, more input data tailored to each case is required.

Because only the information chosen by the designer of the feature extractor reaches the classifier, the classifier receives limited information. The paper therefore argues that, ideally, learning should take place in the feature extractor itself.

2. There are too many parameters.

A single image has several hundred pixels, so even the first layer of a fully-connected multi-layer network alone contains tens of thousands of weights. Having that many parameters increases the capacity of the system, which in turn requires a larger training set. In addition, storing and processing so many parameters greatly increases memory requirements.
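To make this count concrete, the sketch below assumes a hypothetical 20x20 input (400 pixels, i.e. "several hundred") and a first hidden layer of 100 units; both sizes are illustrative assumptions, not values from the text.

```python
def fc_params(n_inputs: int, n_hidden: int) -> int:
    """Trainable values in one fully-connected layer: one weight per
    (input, hidden unit) pair plus one bias per hidden unit."""
    return n_inputs * n_hidden + n_hidden


# Hypothetical sizes: a 20x20 image flattened to 400 inputs, 100 hidden units.
print(fc_params(20 * 20, 100))  # 40100
```

Even this modest first layer already holds tens of thousands of weights before any further layers are added.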

Handwriting comes in many styles, so the inputs contain translations and distortions. A fully-connected layer has to learn to produce the correct output for all of these variations. Even a feature that appears commonly ends up being detected by multiple units with similar weight patterns placed at different locations of the input (the same detector is learned redundantly). Learning all of these weight patterns therefore requires a large number of training instances.

3. The topology of the input is completely ignored.

Because the image is flattened into a vector before being fed to the fully-connected layer, the spatial (2D) positional information of the input cannot be exploited.
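One way to see that a fully-connected layer ignores topology: if the pixels are reordered by any fixed permutation and the weights are reordered the same way, the outputs are identical, so nothing in the layer depends on which pixels are neighbors. This sketch assumes a tiny hypothetical 4x4 image and integer values so the comparison is exact.

```python
import random


def fc_forward(x, weights):
    """Each output unit is a weighted sum over the flattened input vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


random.seed(0)
image = [[random.randint(0, 9) for _ in range(4)] for _ in range(4)]
x = [p for row in image for p in row]  # flatten 4x4 -> 16 values
weights = [[random.randint(-3, 3) for _ in range(16)] for _ in range(3)]

perm = list(range(16))
random.shuffle(perm)  # one fixed reordering of pixel positions
x_perm = [x[i] for i in perm]
weights_perm = [[row[i] for i in perm] for row in weights]

print(fc_forward(x, weights) == fc_forward(x_perm, weights_perm))  # True
```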


LeNet-5

Convolution Network

A convolutional network combines three ideas: local receptive fields, shared weights, and sub-sampling.

1. local receptive fields

A CNN restricts each unit's receptive field to a small local region and extracts local features such as edges, end-points, and corners. These features are fed into the next layer, which combines them into higher-order features.

As a result, even if the input is distorted or shifted, a pattern with similar characteristics that falls inside a receptive field still produces a feature map reflecting that feature.

Because each unit computes only over its local region, the number of parameters is also reduced.
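A rough weight count shows how much locality saves. The sketch assumes a 32x32 input mapped to a single 28x28 output map with 5x5 receptive fields, ignores biases, and still gives every output unit its own weights (no sharing); the function names are illustrative.

```python
def weights_fully_connected(in_h: int, in_w: int, out_h: int, out_w: int) -> int:
    """Each output unit is connected to every input pixel."""
    return (in_h * in_w) * (out_h * out_w)


def weights_locally_connected(k: int, out_h: int, out_w: int) -> int:
    """Each output unit sees only a k x k patch; weights are not shared."""
    return (k * k) * (out_h * out_w)


# One 28x28 output map computed from a 32x32 input (biases ignored).
print(weights_fully_connected(32, 32, 28, 28))  # 802816
print(weights_locally_connected(5, 28, 28))     # 19600
```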

2. shared weights

A CNN uses the same weights and bias at every position of a feature map. This is what makes it robust to shifts: if the input is shifted, the feature map is shifted by the same amount but is otherwise unchanged.

→ A CNN slides over the input from beginning to end to build a feature map. Because the same weights and bias are used at every position, the same feature is extracted even when its location changes.

Weight sharing therefore reduces the capacity of the network and the number of parameters that must be learned.
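A minimal pure-Python sketch of weight sharing: one 3x3 kernel and one bias are reused at every position. Shifting the input one pixel to the right shifts the feature map one column to the right. The image, the kernel values, and the 3x3 size are hypothetical choices for illustration, not LeNet-5's.

```python
def conv2d_valid(image, kernel, bias=0):
    """Slide a single shared kernel over the image (stride 1, no padding).
    Every output position uses the same kernel weights and the same bias."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    return [
        [
            sum(kernel[i][j] * image[r + i][c + j] for i in range(k) for j in range(k)) + bias
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def shift_right(image, n=1):
    """Move every row n (>= 1) pixels to the right, filling the left edge with zeros."""
    return [[0] * n + row[:-n] for row in image]


# Hypothetical 6x6 image containing a short vertical stroke in column 1.
image = [[0] * 6 for _ in range(6)]
for r in range(1, 5):
    image[r][1] = 1

# Hypothetical 3x3 vertical-edge kernel: 9 shared weights (+ 1 shared bias).
kernel = [[1, 0, -1] for _ in range(3)]

fmap = conv2d_valid(image, kernel)
fmap_shifted = conv2d_valid(shift_right(image), kernel)

# Column c of the original map reappears as column c + 1 of the shifted map.
print(all(fmap_shifted[r][c + 1] == fmap[r][c] for r in range(4) for c in range(3)))  # True
```

However large the image, this layer has only 3x3 + 1 = 10 trainable values.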

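According to a result proven in earlier work, the gap between the test and training error behaves approximately as

$$E_{test} - E_{train} = k\left(\frac{h}{P}\right)^{\alpha}$$

$E_{test}$ : error rate on the test set

$E_{train}$ : error rate on the training set

$k$ : a constant

$h$ : measure of effective capacity or complexity of the machine

$\alpha$ : a constant (between 0.5 and 1.0 according to the paper)

$P$ : number of training samples

Since this relation holds, reducing the capacity $h$ narrows the gap between the two error rates. ⇒ This helps prevent overfitting.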
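A quick numeric reading of the formula, assuming hypothetical values k = 1 and α = 0.5 and a hypothetical training-set size: with P held fixed, cutting the capacity h by a factor of 4 halves the predicted gap.

```python
def generalization_gap(h: float, P: int, k: float = 1.0, alpha: float = 0.5) -> float:
    """E_test - E_train ~= k * (h / P) ** alpha  (k and alpha are illustrative)."""
    return k * (h / P) ** alpha


P = 10_000  # hypothetical number of training samples
print(round(generalization_gap(h=400, P=P), 3))  # 0.2
print(round(generalization_gap(h=100, P=P), 3))  # 0.1
```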

3. sub-sampling

The sub-sampling described in the paper is essentially average pooling, with a trainable coefficient and a trainable bias for each feature map.

Once a local feature has been detected, its exact location becomes less important. The paper argues that precise positions are not only irrelevant to recognizing the pattern but potentially harmful, because the position at which a feature appears varies from one input to another.

A simple way to reduce the precision of positional information is to reduce the resolution of the feature map. A sub-sampling layer, which performs local averaging and sub-sampling, lowers the resolution and thereby reduces the sensitivity of the output to shifts and distortions.

The information lost by reducing positional precision is compensated for by using more filters, so that a richer variety of features is extracted.
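A minimal sketch of one LeNet-5 sub-sampling unit, following the paper's description: the four values of each 2x2 block are summed, multiplied by one trainable coefficient, offset by one trainable bias, and passed through a sigmoid. With the coefficient fixed at 1/4 and the bias at 0, the value before the sigmoid is exactly 2x2 average pooling. The feature-map values are made up.

```python
import math


def subsample_preactivation(fmap, coef, bias):
    """Sum each non-overlapping 2x2 block, multiply by one trainable coefficient,
    then add one trainable bias (the value fed to the sigmoid)."""
    return [
        [
            coef * (fmap[r][c] + fmap[r][c + 1] + fmap[r + 1][c] + fmap[r + 1][c + 1]) + bias
            for c in range(0, len(fmap[0]) - 1, 2)
        ]
        for r in range(0, len(fmap) - 1, 2)
    ]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


# Made-up 4x4 feature map.
fmap = [[1, 3, 0, 2],
        [5, 7, 4, 6],
        [0, 0, 8, 8],
        [0, 4, 8, 8]]

# coef = 1/4 and bias = 0 reproduce plain 2x2 average pooling before the sigmoid.
pre = subsample_preactivation(fmap, coef=0.25, bias=0.0)
print(pre)  # [[4.0, 3.0], [1.0, 8.0]]

out = [[sigmoid(v) for v in row] for row in pre]  # 4x4 map -> 2x2 map
```

Because each map has just one coefficient and one bias, a sub-sampling layer adds only two trainable values per feature map.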



## Structure of LeNet-5

Input → Conv1 → Subsampling2 → Conv3 → Subsampling4 → Conv5 → FC6 → FC7 (Output)

*(Figure: LeNet-5 architecture)*

Input

  • size : 32x32x1

The actual character images are 28x28x1, with a roughly 20x20 digit centered in them. The input is enlarged to 32x32 so that distinctive features such as corners and edges can appear near the center of the receptive fields of the highest-level feature detectors.
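Going from 28x28 to 32x32 amounts to adding a 2-pixel border on every side. This sketch uses a placeholder image and pads with 0 by default; the paper normalizes pixel intensities, so the background value need not be exactly 0, which is why the hypothetical `fill` parameter is exposed.

```python
def pad_image(image, pad, fill=0.0):
    """Surround a 2D image with `pad` rows/columns of `fill` on every side."""
    width = len(image[0]) + 2 * pad
    top = [[fill] * width for _ in range(pad)]
    middle = [[fill] * pad + list(row) + [fill] * pad for row in image]
    bottom = [[fill] * width for _ in range(pad)]
    return top + middle + bottom


digit = [[1.0] * 28 for _ in range(28)]  # placeholder 28x28 image
padded = pad_image(digit, pad=2)
print(len(padded), len(padded[0]))  # 32 32
```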

Conv1

  • kernel size : 5x5
  • number of kernels (feature maps) = 6
  • stride = 1
  • output : 28x28x6
  • 156 trainable parameters, 122,304 connections
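The Conv1 numbers follow from simple arithmetic: each of the 6 maps has 5x5 weights plus one bias, and because these are shared, every one of the 28x28 output units reuses those 26 values. A small sketch to check this, assuming stride 1, no padding, and every output map reading every input map (function names are illustrative):

```python
def conv_output_size(n: int, k: int, stride: int = 1) -> int:
    """Spatial size of a 'valid' convolution (no padding)."""
    return (n - k) // stride + 1


def conv_counts(in_size: int, k: int, in_maps: int, out_maps: int):
    """(output size, trainable parameters, connections) of a conv layer
    whose output maps each read all input maps."""
    out = conv_output_size(in_size, k)
    params = (k * k * in_maps + 1) * out_maps  # shared weights + one bias per map
    connections = params * out * out          # every output unit reuses them
    return out, params, connections


print(conv_counts(32, 5, 1, 6))  # (28, 156, 122304)
```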

Subsampling2

  • window size : 2x2
  • number of feature maps = 6
  • stride = 2
  • output : 14x14x6
  • 12 trainable parameters, 5,880 connections

Conv3

  • kernel size : 5x5
  • number of kernels (feature maps) = 16
  • stride = 1
  • output : 10x10x16
  • 1,516 trainable parameters, 151,600 connections (each map reads only a subset of the 6 Subsampling2 maps)
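The paper gives 1,516 trainable parameters for this layer, and the figure follows from its partial connection table: 6 of the 16 maps read 3 of the Subsampling2 maps, 9 read 4, and 1 reads all 6. The sketch summarizes the table as (number of maps, inputs per map) pairs rather than reproducing the exact index pattern.

```python
# (number of Conv3 maps, number of Subsampling2 maps each of them reads)
CONNECTION_GROUPS = [(6, 3), (9, 4), (1, 6)]
KERNEL = 5
OUT_SIZE = 10  # Conv3 output is 10x10

weights = sum(n_maps * n_inputs * KERNEL * KERNEL for n_maps, n_inputs in CONNECTION_GROUPS)
biases = sum(n_maps for n_maps, _ in CONNECTION_GROUPS)  # one bias per map
params = weights + biases
connections = params * OUT_SIZE * OUT_SIZE  # shared across all 10x10 positions

print(params, connections)  # 1516 151600
```

Connecting every Conv3 map to all six inputs would instead give (5·5·6 + 1)·16 = 2,416 parameters; the paper's partial table both keeps the count down and breaks the symmetry between maps.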

Subsampling4

  • window size : 2x2
  • number of feature maps = 16
  • stride = 2
  • output : 5x5x16
  • 32 trainable parameters, 2,000 connections
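Both sub-sampling layers follow the same rule: two trainable values (coefficient and bias) per map, and each output unit connects to its 2x2 inputs plus the bias. A sketch of that arithmetic, with an illustrative function name:

```python
def subsample_counts(in_size: int, n_maps: int, window: int = 2):
    """(output size, trainable parameters, connections) of a LeNet-5 style
    sub-sampling layer with non-overlapping window x window regions."""
    out = in_size // window
    params = 2 * n_maps  # one coefficient + one bias per map
    connections = (window * window + 1) * n_maps * out * out
    return out, params, connections


print(subsample_counts(28, 6))   # (14, 12, 5880)
print(subsample_counts(10, 16))  # (5, 32, 2000)
```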

Conv5

  • kernel size : 5x5
  • number of kernels (feature maps) = 120
  • stride = 1
  • output : 1x1x120
  • 48,120 trainable parameters (also 48,120 connections, since the output is 1x1)
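The spatial sizes listed above can be traced with a short loop, assuming 'valid' convolutions with stride 1 and 2x2 sub-sampling with stride 2 (the stage table is a hypothetical encoding of the layers above).

```python
def trace_shapes(size: int = 32):
    """Return (stage, height, width, maps) after each LeNet-5 feature-extraction stage."""
    stages = [
        ("Conv1", "conv", 5, 6),
        ("Subsampling2", "pool", 2, 6),
        ("Conv3", "conv", 5, 16),
        ("Subsampling4", "pool", 2, 16),
        ("Conv5", "conv", 5, 120),
    ]
    shapes = []
    for name, kind, k, maps in stages:
        size = size - k + 1 if kind == "conv" else size // k
        shapes.append((name, size, size, maps))
    return shapes


for name, h, w, c in trace_shapes():
    print(f"{name}: {h}x{w}x{c}")
# Conv1: 28x28x6
# Subsampling2: 14x14x6
# Conv3: 10x10x16
# Subsampling4: 5x5x16
# Conv5: 1x1x120
```

Because Conv5's 5x5 kernel covers its entire 5x5 input, its output is 1x1 and it behaves like a fully-connected layer here; the paper still labels it convolutional because a larger input would give a larger output map.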

FC6

  • uses a (scaled) tanh as the activation function
  • input = 120
  • output = 84
  • 10,164 trainable parameters

⇒ The output size of 84 was chosen because each character of the ASCII set is drawn as a 7x12 bitmap (7 × 12 = 84), so the output is expected to take a form suited to interpreting those bitmaps.
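FC6 is an ordinary fully-connected layer. In the paper its squashing function is a scaled tanh, f(a) = A·tanh(S·a) with A = 1.7159 and S = 2/3, chosen so that f(1) ≈ 1. A sketch of the parameter count and activation (function names are illustrative):

```python
import math


def scaled_tanh(a: float, A: float = 1.7159, S: float = 2.0 / 3.0) -> float:
    """Squashing function f(a) = A * tanh(S * a)."""
    return A * math.tanh(S * a)


def fc6_params(n_in: int = 120, n_out: int = 84) -> int:
    """One weight per input plus one bias, for each of the n_out units."""
    return n_out * (n_in + 1)


print(fc6_params())  # 10164
print(7 * 12)        # 84: one unit per pixel of a 7x12 bitmap
```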

FC7

  • uses Euclidean RBF (Radial Basis Function) units in place of an ordinary activation function
  • input = 84
  • output = num_classes = 10

Loss function

  • MSE (Mean Squared Error): since each RBF output is already a squared distance, the criterion averages the correct class's RBF output over the training samples.
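In the paper, each output unit i computes y_i = Σ_j (x_j − w_ij)², the squared Euclidean distance between the 84-dimensional FC6 output and a fixed per-class code with +1/−1 entries, and the basic loss averages the correct class's output over the training samples. The sketch below uses tiny made-up 4-dimensional codes instead of 7x12 bitmaps; the paper also discusses a more discriminative criterion, which is omitted here.

```python
def rbf_outputs(x, prototypes):
    """Euclidean RBF units: y_i = sum_j (x_j - w_ij)^2, one unit per class.
    A smaller output means x is closer to that class's prototype."""
    return [sum((xj - wj) ** 2 for xj, wj in zip(x, w)) for w in prototypes]


def correct_class_loss(batch, labels, prototypes):
    """Average, over the samples, of the RBF output of the correct class."""
    total = sum(rbf_outputs(x, prototypes)[y] for x, y in zip(batch, labels))
    return total / len(batch)


# Hypothetical 3-class problem with 4-dimensional +1/-1 prototypes.
prototypes = [[1, 1, -1, -1],
              [-1, 1, -1, 1],
              [1, -1, 1, -1]]
x = [1, 1, -1, 0]

y = rbf_outputs(x, prototypes)
print(y)                                         # [1, 5, 9]
print(min(range(len(y)), key=y.__getitem__))     # 0 (predicted class = smallest distance)
print(correct_class_loss([x], [0], prototypes))  # 1.0
```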


References

[1] http://yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf

[2] https://deep-learning-study.tistory.com/368