
Overview of SPPnet

SPPnet proposes Spatial Pyramid Pooling (SPP) to remove a limitation of earlier CNNs (e.g. AlexNet): the input image has to be fixed to a size such as 224x224. With SPP, the network produces a fixed-length vector representation regardless of the input size.

image

  • The overall structure of SPPnet is as follows.
  1. Feed the input image into the CNN.
  2. Apply the Spatial Pyramid Pooling layer to the feature map extracted by the CNN.
  3. Use the fixed-length vector produced by the Spatial Pyramid Pooling layer as the input to the fully-connected layers.

Problem

image

  • Earlier CNN architectures required a fixed input size of 224x224, which constrained the aspect ratio and scale of the input image.
  • Forcing an image to that ratio and size can distort the image or lose information, which can hurt accuracy.

In a CNN, only the fc layers need a fixed-length vector. SPPnet therefore runs the conv layers on the image at whatever size it has and places SPP just before the fc layers, so that they receive a fixed-length vector as input.
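To see why only the fc layers pin down the input size, the sketch below tracks the spatial size through a single convolution using the standard output-size formula. The 3x3 kernel, stride 2, and 256 channels are illustrative assumptions, not values taken from SPPnet.

```python
def conv_out_size(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output size of a convolution (or pooling) layer along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def flattened_length(height: int, width: int, channels: int = 256) -> int:
    """Length of the vector an fc layer would receive after one conv layer.

    The kernel size, stride, and channel count are hypothetical values
    chosen only for illustration.
    """
    out_h = conv_out_size(height, kernel=3, stride=2)
    out_w = conv_out_size(width, kernel=3, stride=2)
    return channels * out_h * out_w


# The conv layer accepts both inputs, but the flattened lengths differ,
# so one fc weight matrix cannot serve both without an extra pooling step.
len_square = flattened_length(224, 224)
len_wide = flattened_length(180, 300)
```

The convolution itself is indifferent to the input size; only the flattened length handed to the fc layer changes, which is the gap SPP fills.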


Main Idea - Spatial Pyramid Pooling

Spatial Pyramid Pooling is said to be inspired by the BoW (Bag of Words) approach, which pools features together.

Why is Conv layer fine?

image

  • Assume a CNN made of 5 conv layers and 2 fully-connected layers.
  • The figure above shows (a) Pascal VOC 2007 images, (b) conv5 feature maps, and (c) how strongly ImageNet images respond to the corresponding conv5 filters.

Comparing (a) and (b), the feature maps are strongly activated where the object's features and shape appear. This suggests the image does not need to be resized before the conv layers: using the feature map produced by the conv layers has the same effect as using the original image.

Spatial Pyramid Pooling

image

SPP defines spatial bins. A bin is one cell of the 4x4, 2x2, and 1x1 grids in the figure above, so that figure has 16 + 4 + 1 = 21 bins in total. The bin layout determines the set of pooling operations (e.g. 21 bins correspond to 4x4, 2x2, and 1x1 pooling), and for each one the window size and stride are adjusted so that the pooled output has a fixed size. In other words, the output size of each pooling is fixed first, and the window is then chosen so that a feature map of any size is reduced to that output size. The pooling used here is max pooling.
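The bin count, and the vector length that follows from it, can be checked with a few lines. This is a minimal sketch; the function name and the 256-channel figure are assumptions made for illustration.

```python
def spp_output_length(channels: int, levels=(4, 2, 1)) -> int:
    """Length of the SPP output: one max value per bin per channel."""
    bins = sum(n * n for n in levels)  # 16 + 4 + 1 for the default pyramid
    return channels * bins


total_bins = spp_output_length(channels=1)
vector_length = spp_output_length(channels=256)  # 256 channels is an assumed value
```

The length depends only on the channel count and the pyramid levels, never on the height or width of the feature map.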

  • Window_size = Ceil(feature map size / pooling size), where pooling size is the number of output bins per side
  • Stride = Floor(feature map size / pooling size)
  • ex) 3x3 pooling on a 13x13 feature map
    • window size = ceil(13/3) = 5
    • stride = floor(13/3) = 4
    • These settings produce a 3x3 pooling result.
    • Window_size and Stride are set adaptively from the feature map size.
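The window and stride rules above can be sketched in plain Python to confirm the 13x13 example. The function names are hypothetical, and the sketch assumes a square feature map with at least as many cells per side as bins.

```python
import math
from typing import List, Tuple


def pool_params(feature_size: int, bins: int) -> Tuple[int, int]:
    """Window size and stride for pooling `feature_size` cells into `bins` outputs."""
    window = math.ceil(feature_size / bins)
    stride = feature_size // bins
    return window, stride


def window_spans(feature_size: int, bins: int) -> List[Tuple[int, int]]:
    """Half-open [start, end) index ranges covered by each pooling window."""
    window, stride = pool_params(feature_size, bins)
    return [(s, s + window) for s in range(0, feature_size - window + 1, stride)]


def max_pool_2d(feature_map: List[List[float]], bins: int) -> List[List[float]]:
    """Max-pool a square 2D feature map using the window and stride above."""
    spans = window_spans(len(feature_map), bins)
    return [
        [max(feature_map[r][c] for r in range(r0, r1) for c in range(c0, c1))
         for c0, c1 in spans]
        for r0, r1 in spans
    ]


# 13x13 map whose value at (r, c) is r * 13 + c, pooled down to 3x3.
feature_map = [[r * 13 + c for c in range(13)] for r in range(13)]
pooled = max_pool_2d(feature_map, bins=3)
```

For the 13x13 case the windows cover rows and columns 0-4, 4-8, and 8-12, giving exactly 3x3 outputs. The ceil/floor rule does not yield exactly the requested number of windows for every size (a size of 7 with 4 bins gives 6 windows), so a general implementation would need to handle such cases separately.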

The pooled outputs are then flattened and concatenated into a single fixed-length vector representation.
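A minimal sketch of the whole layer shows that feature maps of different sizes end up with the same vector length. Note the swap: instead of the ceil/floor window and stride rule, this sketch computes each bin's boundaries directly with floor and ceil (the approach used by adaptive pooling), which is an assumption chosen so that any map size works. All names here are hypothetical.

```python
import math


def adaptive_max_pool(channel, bins):
    """Max-pool one 2D channel (a list of rows) into bins x bins values."""
    height, width = len(channel), len(channel[0])
    pooled = []
    for i in range(bins):
        r0, r1 = (i * height) // bins, math.ceil((i + 1) * height / bins)
        for j in range(bins):
            c0, c1 = (j * width) // bins, math.ceil((j + 1) * width / bins)
            pooled.append(max(channel[r][c]
                              for r in range(r0, r1) for c in range(c0, c1)))
    return pooled


def spp(feature_map, levels=(4, 2, 1)):
    """Flatten and concatenate the pooled output of every level and channel."""
    vector = []
    for bins in levels:
        for channel in feature_map:
            vector.extend(adaptive_max_pool(channel, bins))
    return vector


def make_map(channels, height, width):
    """Toy feature map: value at (ch, r, c) is ch + r * width + c."""
    return [[[float(ch + r * width + c) for c in range(width)]
             for r in range(height)] for ch in range(channels)]


small = spp(make_map(2, 10, 7))    # 2 channels, 10x7 map
large = spp(make_map(2, 13, 13))   # 2 channels, 13x13 map
```

Both vectors have 2 x 21 = 42 entries, so the same fc layer could consume either one.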

image

This figure illustrates SPP with 4x4, 2x2, and 1x1 pooling. Pooling at several scales captures spatial information at several granularities: levels with large pooling windows (e.g. 1x1) summarize broad, global information, while levels with small windows (e.g. 4x4) keep finer local detail.


Object Detection of SPPnet

image

  • The object detection pipeline of SPPnet is as follows.
  1. Feed the input image into the CNN.
  2. Select RoIs with a proposal method such as Selective Search and map them onto the feature map extracted by the CNN.
  3. Apply the Spatial Pyramid Pooling layer to each RoI.
  4. Cache the outputs of the fully-connected layers (because the SVMs and the bbox regressor do not share computation).
  5. Classify each RoI with SVMs.
  6. Obtain the final bounding boxes with bbox regression and Non Maximum Suppression.
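The Non Maximum Suppression in the last step can be sketched as a greedy filter over scored boxes. This is a generic illustration, not SPPnet's exact code; the `(x1, y1, x2, y2)` box format, the function names, and the 0.3 threshold are assumptions.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.3):
    """Greedy NMS: keep the best-scoring box, drop boxes overlapping it too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep box i only if it does not overlap any already kept box too much.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
```

Here the second box overlaps the first with an IoU of 81/119 and is suppressed, while the distant third box survives, so `kept` holds indices 0 and 2.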

Compare RCNN with SPPnet

image

  • Advantages
  1. RCNN applies the CNN separately to each of its 2000 RoIs; SPPnet applies the CNN once to the whole image, which reduces that cost.
  2. It is not constrained by the size or aspect ratio of the input image.
  3. The pyramid scheme extracts spatial features at several scales and passes them to the FC layers.

References