
Building a Custom Dataset with Sequence

This works much like building a custom Dataset in PyTorch by subclassing torch.utils.data.Dataset.

In TensorFlow 2.x, you can write a custom dataset loader by subclassing tensorflow.keras.utils.Sequence and implementing __len__ and __getitem__, plus an optional on_epoch_end.

Defining __init__

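import cv2
import numpy as np
import sklearn.utils
from tensorflow.keras.utils import Sequence

# BATCH_SIZE and IMAGE_SIZE are module-level constants defined elsewhere
class CustomDataset(Sequence):
    def __init__(self, img, labels, batch_size=BATCH_SIZE, augmentor=None, shuffle=False):
        super().__init__()
        self.img = img                  # numpy array of image file paths
        self.labels = labels            # label array, or None for a test set
        self.batch_size = batch_size    # use the argument, not the global constant
        self.augmentor = augmentor      # optional augmentation callable
        self.shuffle = shuffle

        # Shuffle once up front so the first epoch is shuffled as well
        if self.shuffle:
            self.on_epoch_end()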
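  • img : a numpy array of image file paths; each element is passed to cv2.imread in __getitem__. Passing a numpy array of pixel values instead raised a NotImplementedError.
  • labels : the image labels, or None for a test set that has no labels.
  • batch_size, augmentor, shuffle : the batch size, an optional augmentation function, and whether to shuffle the data at the end of each epoch.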
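Since __getitem__ slices self.img and reads img_batch.shape[0], img has to be a numpy array of file paths rather than a single directory string. Below is a minimal sketch of building such an array from a directory. It assumes the images sit directly in one folder and use .jpg, .jpeg or .png extensions. The helper name list_image_paths is hypothetical, not part of TensorFlow or OpenCV.

```python
import glob
import os

import numpy as np


def list_image_paths(image_dir, extensions=(".jpg", ".jpeg", ".png")):
    """Hypothetical helper: gather the image file paths in one directory as a numpy array."""
    paths = [
        path
        for path in sorted(glob.glob(os.path.join(image_dir, "*")))
        if path.lower().endswith(extensions)
    ]
    # A numpy array supports the slicing and .shape[0] used in __getitem__
    return np.array(paths)
```

The paths are sorted so their order is deterministic. Whatever label array is passed alongside them must follow the same order. How the labels are obtained depends on the dataset and is not covered here.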

Defining __len__

def __len__(self):
    # Number of batches per epoch; count images because labels may be None (test set)
    return int(np.ceil(len(self.img) / self.batch_size))
  • Returns the number of steps (batches) in one epoch.
  • For example, with 60,000 samples and a batch_size of 600, one epoch runs 100 steps.
  • np.ceil rounds the result up. With a batch_size of 599, 60000 / 599 ≈ 100.17, so 101 steps are needed, and the last batch holds the remaining 100 samples.
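The rounding can be checked on its own. The small sketch below uses math.ceil, which already returns an int in Python 3, so no extra int() cast is needed. The function names are illustrative and not part of Keras.

```python
import math


def steps_per_epoch(num_samples, batch_size):
    """Batches per epoch, counting a final partial batch (same idea as __len__)."""
    return math.ceil(num_samples / batch_size)


def last_batch_size(num_samples, batch_size):
    """Size of the final batch; equals batch_size when the data divides evenly."""
    remainder = num_samples % batch_size
    return remainder if remainder else batch_size
```

With 60,000 samples, steps_per_epoch gives 100 for a batch_size of 600 and 101 for 599. In the second case, last_batch_size reports that the final batch holds 100 samples.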

Defining __getitem__

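def __getitem__(self, index):
    # File paths (and labels, if any) belonging to this batch
    img_batch = self.img[index*self.batch_size:(index+1)*self.batch_size]
    label_batch = None
    if self.labels is not None:
        label_batch = self.labels[index*self.batch_size:(index+1)*self.batch_size]

    # Pre-allocate the output; the last batch may have fewer than batch_size rows
    image_batch = np.zeros((img_batch.shape[0], IMAGE_SIZE, IMAGE_SIZE, 3), dtype=np.float32)

    for image_index in range(img_batch.shape[0]):
        # cv2.imread loads images as BGR, so convert to RGB before resizing
        image = cv2.cvtColor(cv2.imread(img_batch[image_index]), cv2.COLOR_BGR2RGB)
        image = cv2.resize(image, (IMAGE_SIZE, IMAGE_SIZE))
        if self.augmentor is not None:
            image = self.augmentor(image=image)['image']

        # Store the decoded pixels in image_batch (img_batch only holds paths)
        image_batch[image_index] = image

    # Test set: no labels, so return the images only
    if label_batch is None:
        return image_batch
    return image_batch, label_batch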
  • Returns the batch at position index, that is, batch_size samples taken from the data.
  • A test set has no labels, so labels are sliced only when labels is not None.
  • Because img_batch holds file paths, each image is read into a numpy array with cv2, converted from BGR to RGB, and resized to IMAGE_SIZE x IMAGE_SIZE.
  • If an augmentor is given, it is applied to each image, and the result is stored in the pre-allocated image_batch array.
  • The image batch and label batch are returned, so Keras fetches one batch per iteration.
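The slicing in __getitem__ can be tried on plain arrays, without reading any images. In the sketch below the file names are made up, and get_batch is a hypothetical stand-in for the slicing step only. It shows that slicing past the end of a numpy array is safe and the last batch just comes out shorter. That is why image_batch is sized with img_batch.shape[0] instead of batch_size.

```python
import numpy as np


def get_batch(data, index, batch_size):
    """Return the index-th batch using the same slice as __getitem__."""
    # Slicing past the end of a numpy array is safe: the last batch is simply shorter
    return data[index * batch_size:(index + 1) * batch_size]


# Hypothetical data: 10 file names with matching integer labels
paths = np.array([f"img_{i}.jpg" for i in range(10)])
labels = np.arange(10)
batch_size = 4

for index in range(int(np.ceil(len(paths) / batch_size))):
    path_batch = get_batch(paths, index, batch_size)
    label_batch = get_batch(labels, index, batch_size)
    print(index, path_batch.shape[0], label_batch.tolist())
```

With 10 samples and a batch_size of 4, the loop produces three batches of 4, 4 and 2 samples.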

Defining on_epoch_end

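def on_epoch_end(self):
    # Shuffle paths and labels together so each image keeps its label
    if self.shuffle:
        if self.labels is not None:
            self.img, self.labels = sklearn.utils.shuffle(self.img, self.labels)
        else:
            # Test set without labels: shuffle the paths alone
            self.img = sklearn.utils.shuffle(self.img)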
  • on_epoch_end is optional. Keras calls it at the end of every epoch.
  • Here it is used for shuffling: sklearn.utils.shuffle shuffles the file paths and labels together, so each image stays paired with its label.
  • sklearn.utils.shuffle() : shuffles several arrays with the same random permutation.
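What sklearn.utils.shuffle does for the paths and labels can be reproduced with numpy alone, by drawing one permutation and indexing both arrays with it. The sketch below is a swapped-in numpy equivalent, not sklearn's implementation. The helper name shuffle_together is hypothetical. It shows why the pairs stay aligned.

```python
import numpy as np


def shuffle_together(paths, labels, seed=None):
    """Shuffle paths and labels with one shared permutation so pairs stay aligned."""
    rng = np.random.default_rng(seed)
    # A single permutation of row indices, applied to both arrays
    perm = rng.permutation(len(paths))
    return paths[perm], labels[perm]
```

Shuffling the two arrays separately, for example with two independent np.random.shuffle calls, would break the image-to-label pairing. Sharing one permutation avoids that.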