• toc {:toc}

Overview of SPPnet

SPPnet ์€ ๊ธฐ์กด CNN(e.g. AlexNet) ์˜ ์ด๋ฏธ์ง€ ์ž…๋ ฅ์„ 224x224 ์™€ ๊ฐ™์ด ๊ณ ์ •์œผ๋กœ ์„ค์ •ํ•ด์•ผ ํ•˜๋Š” ๋‹จ์ ์„ ํ•ด๊ฒฐํ•˜๋Š” Spatial Pyramid Pooling(SPP) ๋ฐฉ์‹์„ ์ œ์‹œํ–ˆ๋‹ค. SPP ๋ฅผ ํ†ตํ•ด ์ž…๋ ฅ์˜ ํฌ๊ธฐ๊ฐ€ ์–ด๋–ป๋“  ๊ฐ„์— ๊ณ ์ • ๊ธธ์ด ๋ฒกํ„ฐ ํ‘œํ˜„์„ ์ƒ์„ฑํ•  ์ˆ˜ ์žˆ๋‹ค.

image

  • SPPnet ์˜ ์ „์ฒด์ ์ธ ๊ตฌ์กฐ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค.
  1. ์ž…๋ ฅ ์ด๋ฏธ์ง€๋ฅผ CNN ์— ๋„ฃ๋Š”๋‹ค.
  2. CNN ์œผ๋กœ๋ถ€ํ„ฐ ์ถ”์ถœ๋œ ํŠน์ง•๋งต์— Spatial Pyramid Pooling layer ๋ฅผ ์ ์šฉํ•œ๋‹ค.
  3. Spatial Pyramid Pooling layer ๋ฅผ ํ†ตํ•ด ๋‚˜์˜จ ๊ณ ์ • ๊ธธ์ด ๋ฒกํ„ฐ ํ‘œํ˜„์„ fully-connected layers ์˜ ์ž…๋ ฅ์œผ๋กœ ์‚ฌ์šฉํ•œ๋‹ค.

Problem

image

  • ์ด์ „ CNN ๊ตฌ์กฐ์˜ ๊ฒฝ์šฐ ์ž…๋ ฅ ํฌ๊ธฐ๊ฐ€ 224x224 ๋กœ ์ž…๋ ฅ ์ด๋ฏธ์ง€์˜ ๋น„์œจ๊ณผ ํฌ๊ธฐ๊ฐ€ ์ œํ•œ๋˜์—ˆ๋‹ค.
  • ๋น„์œจ, ํฌ๊ธฐ ์ œํ•œ์€ ์ด๋ฏธ์ง€์˜ ์™œ๊ณก, ์ •๋ณด์˜ ์™œ๊ณก, ์†์‹ค์„ ๋งŒ๋“ค๊ณ  ์ •ํ™•๋„์— ์˜ํ–ฅ์„ ๋ฏธ์น  ์ˆ˜ ์žˆ๋‹ค.

CNN ๊ตฌ์กฐ์—์„œ ๊ณ ์ • ๊ธธ์ด ๋ฒกํ„ฐ๊ฐ€ ํ•„์š”ํ•œ ๋ถ€๋ถ„์€ fc layer ๊ฐ€ ์‚ฌ์šฉ๋  ๋•Œ์ด๋ฏ€๋กœ SPPnet ์—์„œ๋Š” conv layers ๋Š” ์ด๋ฏธ์ง€ ํฌ๊ธฐ์— ๊ด€๊ณ„์—†์ด ์‚ฌ์šฉํ•˜๊ณ , fc layer ์ด์ „์— SPP ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๊ณ ์ • ๊ธธ์ด ๋ฒกํ„ฐ๋ฅผ ์ž…๋ ฅ์œผ๋กœ ๋„ฃ์–ด์ฃผ๋Š” ๊ตฌ์กฐ๋ฅผ ์ทจํ•œ๋‹ค.


Main Idea - Spatial Pyramid Pooling

Spartial Pyramid Pooling ์˜ ๋ฐฉ์‹์€ ํŠน์ง•์„ ํ•จ๊ป˜ ๋ชจ์œผ๋Š” BoW(Bag of Words) ๋ฐฉ์‹์—์„œ ์ฐฉ์•ˆํ–ˆ๋‹ค๊ณ  ํ•œ๋‹ค.

Why is Conv layer fine?

image

  • Conv layer 5 ๊ฐœ, Fully-connnected layer 2 ๊ฐœ๋กœ ๊ตฌ์„ฑ๋œ CNN ์„ ๊ฐ€์ •ํ•˜์ž.
  • ์œ„ ๊ทธ๋ฆผ์€ (a) Pascal VOC 2007 ์ด๋ฏธ์ง€, (b) conv5 ์˜ ํŠน์ง•๋งต, (c) conv5 ์— ๋Œ€ํ•œ ์ด๋ฏธ์ง€๋„ท ์ด๋ฏธ์ง€์˜ ๋ฐ˜์‘ ์ •๋„๋ฅผ ํ‘œ์‹œํ•œ ๊ทธ๋ฆผ์ด๋‹ค.

(a), (b) ๋ฅผ ๋น„๊ตํ–ˆ์„ ๋•Œ ๋ฌผ์ฒด์˜ ํŠน์ง•, ํ˜•์ฒด๋ฅผ ๋‚˜ํƒ€๋‚ธ ๋ถ€๋ถ„์— ๊ฐ•ํ•˜๊ฒŒ ํ™œ์„ฑํ™”๊ฐ€ ๋˜์–ด ์žˆ๋Š” ๋ถ€๋ถ„์„ ์•Œ ์ˆ˜ ์žˆ๋‹ค. ๋•Œ๋ฌธ์— conv layer ์•ž์—์„œ ๋ฏธ๋ฆฌ ์ด๋ฏธ์ง€๋ฅผ ์กฐ์ •ํ•  ํ•„์š”์—†์ด conv ๋ฅผ ๊ฑฐ์นœ ํ›„ ํŠน์ง•๋งต์„ ์‚ฌ์šฉํ•˜๋”๋ผ๋„ ์›๋ณธ ์ด๋ฏธ์ง€๋ฅผ ์‚ฌ์šฉํ•œ ๊ฒƒ๊ณผ ๊ฐ™์€ ํšจ๊ณผ๋ฅผ ๋ณด์ธ๋‹ค๋Š” ๊ฒƒ์„ ์ž…์ฆํ•œ๋‹ค.

Spatial Pyramid Pooling

image

SPP ๋Š” spatial bins ์„ ์„ค์ •ํ•œ๋‹ค. Bin ์€ ์œ„ ๊ทธ๋ฆผ์˜ 4x4, 2x2, 1x1 ์˜ ์‚ฌ๊ฐํ˜• ํ•˜๋‚˜๋ฅผ ๋งํ•œ๋‹ค. ์œ„ ๊ทธ๋ฆผ์—์„œ๋Š” bin ์˜ ๊ฐœ์ˆ˜๋Š” 21 ๊ฐœ์ด๋‹ค.

์ดํ›„, ํ•ด๋‹น bin ์˜ ๊ฐœ์ˆ˜์— ๋งž๋Š” pooling ๊ฐœ์ˆ˜๊ฐ€ ์‚ฐ์ถœ๋˜๊ณ  (e.g. 21bin ์ผ ๋•Œ 4x4, 2x2, 1x1 pooling) ๊ฐ pooling ์— ๋Œ€ํ•ด window size ์™€ stride ๋ฅผ ์กฐ์ •ํ•ด ๊ณ ์ • ํฌ๊ธฐ์˜ pooling ์ถœ๋ ฅ์„ ์ƒ์„ฑํ•œ๋‹ค. ์ฆ‰, pooling ํ•˜์—ฌ ๋‚˜์˜ฌ ์ถœ๋ ฅ์„ ๊ณ ์ •ํ•˜๊ณ  ์ด๋ฏธ์ง€๊ฐ€ ํ•ด๋‹น ํฌ๊ธฐ๋กœ ์ถœ๋ ฅ๋˜๋„๋ก ์„ค์ •ํ•œ๋‹ค. ์ด๋•Œ์˜ pooling ์€ MaxPooling ์ด๋‹ค.

  • Window_size = Ceil(feature map size / pooling size)
  • Stride = Floor(feature map size / pooling size)
  • ex) 13x13 feature map, 3x3 pooling ์„ ํ•˜๋Š” ๊ฒฝ์šฐ
    • window size = ceil(13/3) = 5
    • stride = floor(13/3) = 4
    • ๋‹ค์Œ์„ ํ†ตํ•ด 3x3 pooling ๊ฒฐ๊ณผ๊ฐ€ ๋‚˜์˜จ๋‹ค.
    • Window_size ์™€ Stride ๋Š” adaptive ํ•˜๊ฒŒ ์„ค์ •๋œ๋‹ค.

์ดํ›„ pooling ์ถœ๋ ฅ์„ concat, flatten ์‹œํ‚ด์œผ๋กœ์จ ํ•˜๋‚˜์˜ ๊ณ ์ •๋œ ๊ธธ์ด์˜ ๋ฒกํ„ฐ ํ‘œํ˜„์œผ๋กœ ์„ค์ •ํ•  ์ˆ˜ ์žˆ๋‹ค.

image

SPP ๋ฅผ ํ†ตํ•ด 4x4, 2x2, 1x1 pooling ํ•˜๋Š” ๊ฒฝ์šฐ๋ฅผ ํ‘œํ˜„ํ•œ ๊ทธ๋ฆผ์ด๋‹ค. ๋‹ค์–‘ํ•œ ํฌ๊ธฐ์˜ ์ถ”์ถœ์„ ํ†ตํ•ด ์ด๋ฏธ์ง€์˜ ๋‹ค์–‘ํ•œ ๊ณต๊ฐ„์  ์ •๋ณด๋ฅผ ์–ป์„ ์ˆ˜ ์žˆ๋‹ค. ํฐ pooling ๊ฒฐ๊ณผ๋Š” ๊ด‘๋ฒ”์œ„ํ•œ ์ •๋ณด, ์ž‘์€ pooling ๊ฒฐ๊ณผ๋Š” ์„ธ๋ถ€์ ์ธ ์ด๋ฏธ์ง€ ์ •๋ณด๋ฅผ ๊ฐ–๋Š”๋‹ค.


Object Detection of SPPnet

image

  • SPPnet ์˜ Object Detection ๊ตฌ์กฐ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค.
  1. ์ž…๋ ฅ ์ด๋ฏธ์ง€๋ฅผ CNN ์— ๋„ฃ๋Š”๋‹ค.
  2. CNN ์œผ๋กœ๋ถ€ํ„ฐ ์ถ”์ถœ๋œ ํŠน์ง•๋งต์— Selective Search ์™€ ๊ฐ™์€ Proposal Method ๋ฅผ ์‚ฌ์šฉํ•ด RoI ๋ฅผ ์„ ์ •ํ•œ๋‹ค.
  3. ๊ฐ RoI ์— ๋Œ€ํ•ด Spatial Pyramid Pooling layer ๋ฅผ ์ ์šฉํ•œ๋‹ค.
  4. Fully-connected layers ๋ฅผ ํ†ต๊ณผํ•˜๊ณ  ๋‚œ ๊ฒฐ๊ณผ๋ฅผ cache ๋กœ ์ €์žฅํ•œ๋‹ค.(SVMs ์™€ Bbox reg ๊ฐ€ ์—ฐ์‚ฐ์„ ๊ณต์œ ํ•˜์ง€ ์•Š๊ธฐ ๋•Œ๋ฌธ)
  5. SVM ์„ ํ†ตํ•ด ๋ถ„๋ฅ˜ ์ž‘์—…์„ ํ•œ๋‹ค.
  6. Bbox reg ์™€ Non Maximum Suppression ์„ ํ†ตํ•ด์„œ bounding box ๋ฅผ ๊ตฌํ•œ๋‹ค.

Compare RCNN with SPPnet

image

  • ์žฅ์ 
  1. SPPnet ์€ RCNN ์˜ 2000 ๊ฐœ์˜ RoI ์— ๋Œ€ํ•ด ๊ฐ๊ฐ CNN ์„ ์ ์šฉํ•จ์œผ๋กœ์จ ๋ฐœ์ƒํ•˜๋Š” ๋น„์šฉ์„ ํ•˜๋‚˜์˜ ์ด๋ฏธ์ง€์— CNN ์„ ์ ์šฉํ•จ์œผ๋กœ์จ ๊ฐ์†Œ์‹œ์ผฐ๋‹ค
  2. ์ž…๋ ฅ ์ด๋ฏธ์ง€์˜ ํฌ๊ธฐ, ๋น„์œจ์— ์˜ํ–ฅ์„ ๋ฐ›์ง€ ์•Š๋Š”๋‹ค.
  3. Pyramid ๋ฐฉ์‹์„ ํ†ตํ•ด ์—ฌ๋Ÿฌ ๊ณต๊ฐ„์  ํŠน์ง•์„ ์ถ”์ถœํ•ด FC layer ๋กœ ์ „๋‹ฌํ•œ๋‹ค.

์ฐธ๊ณ ๋ฌธํ—Œ