
[Machine Learning] CIFAR-10 Image Classification - VGG-19 Model

by giem 2022. 7. 25.

2022.07.22 - [Studying/Machine Learning] - [Machine Learning] Implementing a CNN Model with PyTorch (CIFAR-10 dataset)

 


In the last post, we implemented a CNN model from scratch.

In this post, we will load a pretrained model and use it for classification.


VGG-19

The VGG-19 model architecture is shown below.

 

VGG-19 architecture diagram (source): https://media.springernature.com/lw685/springer-static/image/art%3A10.1038%2Fs41598-020-73287-7/MediaObjects/41598_2020_73287_Fig2_HTML.png

Rather than reading the diagram, let's load the model and inspect the layer structure directly.


Code

First, import the packages and load the dataset.

import torch
import torch.nn as nn
import torchvision.datasets as datasets
import torchvision.transforms as transforms

def CIFAR10_DATA(root='./data/', download=True, batch_size=32, num_worker=1):
    print("[+] Get the CIFAR10 DATA")
    # 50,000 training images
    train_dataset = datasets.CIFAR10(root=root, train=True, transform=transforms.ToTensor(), download=download)
    # 10,000 test images
    test_dataset = datasets.CIFAR10(root=root, train=False, transform=transforms.ToTensor())

    train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=batch_size, shuffle=True, num_workers=num_worker)
    test_loader = torch.utils.data.DataLoader(dataset=test_dataset, batch_size=batch_size, shuffle=False, num_workers=num_worker)
    print("[+] Finished loading data & preprocessing")
    return train_dataset, test_dataset, train_loader, test_loader
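
The call itself isn't shown in the original code, but the training loop later in this post relies on train_dataset and train_loader, so the function would be invoked along these lines:

# build the datasets and loaders used by the training loop below
train_dataset, test_dataset, train_loader, test_loader = CIFAR10_DATA(
    root='./data/', download=True, batch_size=32)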

 

Model

import torchvision.models as models

pre_vgg = models.vgg19_bn(pretrained=True)  # load the pretrained model (batch-norm variant of VGG-19)
pre_vgg
# VGG(
#   (features): Sequential(
#     (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
#     (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
#     (2): ReLU(inplace=True)
#     (3): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
#     (4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
#     (5): ReLU(inplace=True)
#     (6): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
#     (7): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
#     (8): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
#     (9): ReLU(inplace=True)
#     (10): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
#     (11): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
#     (12): ReLU(inplace=True)
#     (13): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
#     (14): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
#     (15): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
#     (16): ReLU(inplace=True)
#     (17): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
#     (18): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
#     (19): ReLU(inplace=True)
#     (20): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
#     (21): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
#     (22): ReLU(inplace=True)
#     (23): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
#     (24): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
#     (25): ReLU(inplace=True)
#     (26): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
#     (27): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
#     (28): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
#     (29): ReLU(inplace=True)
#     (30): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
#     (31): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
#     (32): ReLU(inplace=True)
#     (33): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
#     (34): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
#     (35): ReLU(inplace=True)
#     (36): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
#     (37): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
#     (38): ReLU(inplace=True)
#     (39): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
#     (40): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
#     (41): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
#     (42): ReLU(inplace=True)
#     (43): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
#     (44): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
#     (45): ReLU(inplace=True)
#     (46): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
#     (47): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
#     (48): ReLU(inplace=True)
#     (49): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
#     (50): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
#     (51): ReLU(inplace=True)
#     (52): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
#   )
#   (avgpool): AdaptiveAvgPool2d(output_size=(7, 7))
#   (classifier): Sequential(
#     (0): Linear(in_features=25088, out_features=4096, bias=True)
#     (1): ReLU(inplace=True)
#     (2): Dropout(p=0.5, inplace=False)
#     (3): Linear(in_features=4096, out_features=4096, bias=True)
#     (4): ReLU(inplace=True)
#     (5): Dropout(p=0.5, inplace=False)
#     (6): Linear(in_features=4096, out_features=1000, bias=True)
#   )
# )

Inspecting the printout, each "Conv" layer from the VGG-19 diagram is actually a conv-batchnorm-ReLU block, since we loaded the batch-norm variant (vgg19_bn).

The network has 16 convolution layers and 3 fully connected layers, 19 weight layers in total.
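
As a quick sanity check (not in the original post), we can count the weight layers directly from the loaded model:

# count the weight layers in the loaded vgg19_bn
num_conv = sum(isinstance(m, nn.Conv2d) for m in pre_vgg.features)
num_fc = sum(isinstance(m, nn.Linear) for m in pre_vgg.classifier)
print(num_conv, num_fc)  # 16 3 -> 19 weight layers in total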

 

Now let's adapt the model to the CIFAR-10 data.

class pretrained_vgg(nn.Module):
    def __init__(self, pre_vgg):
        super(pretrained_vgg, self).__init__()
        self.features = nn.Sequential(
            *list(pre_vgg.features.children())  # reuse all of the pretrained conv layers
        )
        # Replace the FC layers to fit the CIFAR-10 data
        # (a 32x32 input shrinks to 512x1x1 after the five max-pool stages)
        self.linear = nn.Sequential(
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 128),
            nn.ReLU(),
            nn.Linear(128, 10),
        )
    def forward(self, x):
        out = self.features(x)
        out = torch.flatten(out, 1)  # flatten to (N, 512); safer than squeeze() when N == 1
        out = self.linear(out)
        return out
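
A common transfer-learning variant the post doesn't use is to freeze the pretrained convolutional weights and train only the new classifier head, which trains much faster at some cost in accuracy. A minimal sketch:

# Hypothetical variant: freeze the pretrained feature extractor
frozen_vgg = pretrained_vgg(pre_vgg)
for param in frozen_vgg.features.parameters():
    param.requires_grad = False  # conv weights stay fixed during training

# optimize only the newly added FC layers
optimizer = torch.optim.Adam(frozen_vgg.linear.parameters(), lr=1e-4)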

 


Training

# set hyperparameters
batch_size = 32
learning_rate = 0.0001
num_epochs = 4

vgg19 = pretrained_vgg(pre_vgg)

vgg19 = vgg19.cuda()  # move the model to the GPU for faster training
criterion = nn.CrossEntropyLoss()  # loss function = cross entropy loss
optimizer = torch.optim.Adam(vgg19.parameters(), lr=learning_rate)

accuracy_list = []
for epoch in range(num_epochs):
    vgg19.train()
    for i, (images, labels) in enumerate(train_loader):
        images = images.cuda()
        labels = labels.cuda()

        # Forward + Backward + Optimize
        optimizer.zero_grad()
        outputs = vgg19(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        if (i+1) % 100 == 0:
            print('Epoch [%d/%d], Iter [%d/%d] Loss: %.4f'
                  % (epoch+1, num_epochs, i+1, len(train_dataset)//batch_size, loss.item()))
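
The evaluation code isn't shown in this section; a minimal sketch for measuring test accuracy with the loaders from above might look like this:

# Evaluation sketch (not from the original post): top-1 accuracy on the test set
vgg19.eval()
correct, total = 0, 0
with torch.no_grad():
    for images, labels in test_loader:
        images, labels = images.cuda(), labels.cuda()
        outputs = vgg19(images)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
accuracy_list.append(100.0 * correct / total)
print('Test accuracy: %.2f%%' % accuracy_list[-1])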

 


Checking the results, the loss is clearly low given how little training was done.

 

When the target images are similar to the images the model was originally trained on, as they are here,

you can reuse a pretrained model with only minor structural changes.

 

You can load models other than VGG as well.

Depending on your machine's specs, a model that is too large will take a long time to train,

so it's best to pick one of a reasonable size.
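
For reference, other pretrained torchvision models load with the same pattern; for example, these lighter alternatives:

import torchvision.models as models

# smaller pretrained models that train faster on modest hardware
resnet18 = models.resnet18(pretrained=True)
mobilenet = models.mobilenet_v2(pretrained=True)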

