How to Train an Image Classification Model in PyTorch and TensorFlow

Coder編程, published 2022-10-13 in Beijing

Author | PULKIT SHARMA
Translator | Flin
Source | analyticsvidhya

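Introduction

Image classification is one of the most important applications of computer vision. Its uses range from classifying objects in self-driving cars to identifying blood cells in healthcare, and from spotting defective items in manufacturing to building systems that can tell whether or not a person is wearing a mask. All of these industries use image classification in one form or another. How do they do it? Which framework do they use?

You have probably read a lot about the differences between deep learning frameworks such as TensorFlow, PyTorch, and Keras. TensorFlow and PyTorch are undoubtedly the most popular frameworks in the industry, and there is no shortage of resources comparing them.

Here is one such resource: 5 Amazing Deep Learning Frameworks Every Data Scientist Must Know!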

https://www./blog/2019/03/deep-learning-frameworks-comparison

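In this article, we will see how to build a basic image classification model in both PyTorch and TensorFlow. We start with a brief overview of each framework. We then take the MNIST handwritten digit classification dataset and build an image classification model with a CNN (convolutional neural network) in PyTorch and in TensorFlow.

This will be your starting point. From here you can pick whichever framework you prefer and start building other computer vision models.

If you are new to deep learning and interested in computer vision (who isn't?), check out the "Certified Computer Vision Master's Program".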

https://courses./bundles/certified-computer-vision-masters-program

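Overview of PyTorch

PyTorch is gaining popularity in the deep learning community and is widely used by deep learning practitioners. It is a Python package that provides tensor computation. Tensors are multidimensional arrays, much like NumPy's ndarrays, that can also run on a GPU.

A distinctive feature of PyTorch is that it uses dynamic computation graphs. PyTorch's Autograd package builds the computation graph from tensors and computes gradients automatically.

Instead of a predefined graph with fixed functionality, PyTorch gives us a framework for building the computation graph as we go, and even changing it at runtime. This is particularly useful when we do not know in advance how much memory a neural network will require.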
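A minimal sketch of what "dynamic" means in practice, assuming a reasonably recent PyTorch install. The function name `piecewise` and the input values are illustrative only. Because the graph is recorded while the Python code runs, an ordinary `if` statement can select different operations on each call, and Autograd differentiates whichever branch was actually taken.

```python
import torch

def piecewise(x):
    # Ordinary Python control flow decides which operations get recorded,
    # so the computation graph can differ from one call to the next.
    if x.item() > 0:
        return x ** 2
    return -3 * x

# positive input: the graph contains the squaring branch
x_pos = torch.tensor(2.0, requires_grad=True)
piecewise(x_pos).backward()
grad_pos = x_pos.grad.item()

# negative input: the graph contains the linear branch instead
x_neg = torch.tensor(-1.0, requires_grad=True)
piecewise(x_neg).backward()
grad_neg = x_neg.grad.item()

print(grad_pos, grad_neg)
```

Here `grad_pos` is 4.0 (the derivative of x^2 at x = 2) and `grad_neg` is -3.0 (the derivative of -3x).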

You can use PyTorch for a variety of deep learning challenges. Here are a few:

Images (detection, classification, etc.)
Text (classification, generation, etc.)
Reinforcement learning

If you want to learn PyTorch from scratch, here are some detailed resources:

A Beginner's Guide to PyTorch
- https://www./blog/2019/09/introduction-to-pytorch-from-scratch
Build an Image Classification Model using Convolutional Neural Networks in PyTorch
- https://www./blog/2019/10/building-image-classification-models-cnn-pytorch
Deep Learning for Everyone: Master the Powerful Art of Transfer Learning using PyTorch
- https://www./blog/2019/10/how-to-master-transfer-learning-using-pytorch
Image Augmentation for Deep Learning using PyTorch: Feature Engineering for Images
- https://www./blog/2019/12/image-augmentation-deep-learning-pytorch

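Overview of TensorFlow

TensorFlow was developed by researchers and engineers on the Google Brain team. It is by far the most commonly used software library in deep learning (although others are catching up quickly).

One of the biggest reasons TensorFlow is so popular is that it supports multiple languages for creating deep learning models, such as Python, C++, and R. It also comes with detailed documentation and walkthroughs.

TensorFlow comprises many components. Two that stand out:

TensorBoard: helps visualize data effectively using data flow graphs
TensorFlow: useful for rapid deployment of new algorithms/experiments

TensorFlow is currently in its 2.x series, which began with the official release of version 2.0 in September 2019. We will implement our CNN in this series as well.

If you want to learn more about this new version of TensorFlow, check out the TensorFlow 2.0 Tutorial for Deep Learning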

https://www./blog/2020/03/tensorflow-2-tutorial-deep-learning

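I hope you now have a basic understanding of both PyTorch and TensorFlow. Let's build a deep learning model with each framework and see how it works internally. Before that, let's understand the problem statement we will solve in this article.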

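Understanding the Problem Statement: MNIST

Before we begin, let's understand the dataset. In this article, we will tackle the popular MNIST problem. It is a digit recognition task in which we must classify images of handwritten digits into one of 10 classes, 0 through 9.

The MNIST dataset contains images of digits taken from a variety of scanned documents, normalized in size and centered. Each image is a 28 x 28 pixel square (784 pixels in total). A standard split of the dataset is used to evaluate and compare models: 60,000 images for training and a separate set of 10,000 images for testing.

Now that we understand the dataset, let's build the image classification model using a CNN in PyTorch and TensorFlow, starting with PyTorch. We will implement these models in Google Colab, which provides a free GPU for running deep learning models.

I hope you are familiar with convolutional neural networks (CNNs). If not, feel free to refer to the following article:

A Comprehensive Tutorial to Learn Convolutional Neural Networks from Scratch: https://www./blog/2018/12/guide-convolutional-neural-network-cnn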

Implementing a Convolutional Neural Network (CNN) in PyTorch

Let's start by importing all the libraries:

# importing the libraries
import numpy as np
import torch
import torchvision
import matplotlib.pyplot as plt
from time import time
from torchvision import datasets, transforms
from torch import nn, optim

Let's also check the version of PyTorch on Google Colab:

# version of pytorch
print(torch.__version__)

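So, I am using PyTorch version 1.5.1. With any other version you may get some warnings or errors, in which case you can switch to this version of PyTorch. We will apply some transformations to the images, such as normalizing the pixel values, so let's define those transformations as well: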

# transformations to be applied on images
transform = transforms.Compose([transforms.ToTensor(),
                              transforms.Normalize((0.5,), (0.5,)),
                              ])
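To see what these two transforms do to a single pixel, here is a small plain-Python sketch of the arithmetic. `ToTensor()` rescales 8-bit pixel values from [0, 255] to [0.0, 1.0], and `Normalize((0.5,), (0.5,))` then subtracts the mean 0.5 and divides by the standard deviation 0.5. The helper names below are hypothetical and only mirror that arithmetic; they are not part of torchvision.

```python
def to_unit_range(pixel_8bit):
    # what ToTensor() does to a uint8 pixel value: scale [0, 255] to [0.0, 1.0]
    return pixel_8bit / 255.0

def normalize(value, mean=0.5, std=0.5):
    # what Normalize(mean, std) does per channel: (value - mean) / std
    return (value - mean) / std

darkest = normalize(to_unit_range(0))      # black pixel
brightest = normalize(to_unit_range(255))  # white pixel

print(darkest, brightest)
```

The network therefore sees inputs in the range [-1, 1] rather than [0, 255].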

Now, let's load the training and test sets of the MNIST dataset:

# defining the training and testing set
trainset = datasets.MNIST('./data', download=True, train=True, transform=transform)
testset = datasets.MNIST('./data', download=True, train=False, transform=transform)

Next, I define the training and test loaders, which will help us load the training and test sets in batches. I set the batch size to 64:

# defining trainloader and testloader
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True)
testloader = torch.utils.data.DataLoader(testset, batch_size=64, shuffle=True)
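As a quick sanity check on what the loaders will yield, the number of batches per epoch follows from the dataset sizes given earlier (60,000 training and 10,000 test images) and the batch size of 64. This is plain arithmetic and assumes the `DataLoader` default of keeping the final, smaller batch (`drop_last=False`).

```python
import math

batch_size = 64
train_size, test_size = 60000, 10000

# number of batches each loader should yield per pass over the data
train_batches = math.ceil(train_size / batch_size)
test_batches = math.ceil(test_size / batch_size)

# size of the final, partially filled batch in each loader
last_train_batch = train_size - (train_batches - 1) * batch_size
last_test_batch = test_size - (test_batches - 1) * batch_size

print(train_batches, last_train_batch)
print(test_batches, last_test_batch)
```

That gives 938 training batches (the last one holding 32 images) and 157 test batches (the last one holding 16), so code that assumes every batch has exactly 64 images would be wrong for the final batch.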

Let's first look at a summary of the training set:


# shape of training data
dataiter = iter(trainloader)
images, labels = next(dataiter)  # built-in next() works across PyTorch versions

print(images.shape)
print(labels.shape)

So, in each batch we have 64 images, each of size 28 x 28, and for each image we have a corresponding label. Let's visualize a training image and see what it looks like:

# visualizing the training images
plt.imshow(images[0].numpy().squeeze(), cmap='gray')

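Here it is an image of the digit 0 (the loader shuffles the data, so you may see a different digit). Similarly, let's check the shape of a batch from the test set: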

# shape of validation data
dataiter = iter(testloader)
images, labels = next(dataiter)  # built-in next() works across PyTorch versions

print(images.shape)
print(labels.shape)

In the test set, too, we have batches of size 64. Now let's define the architecture.

Defining the Model Architecture

We will use a CNN model here. So, let's define and train that model:

# defining the model architecture
class Net(nn.Module):   
  def __init__(self):
      super(Net, self).__init__()

      self.cnn_layers = nn.Sequential(
          # Defining a 2D convolution layer
          nn.Conv2d(1, 4, kernel_size=3, stride=1, padding=1),
          nn.BatchNorm2d(4),
          nn.ReLU(inplace=True),
          nn.MaxPool2d(kernel_size=2, stride=2),
          # Defining another 2D convolution layer
          nn.Conv2d(4, 4, kernel_size=3, stride=1, padding=1),
          nn.BatchNorm2d(4),
          nn.ReLU(inplace=True),
          nn.MaxPool2d(kernel_size=2, stride=2),
      )

      self.linear_layers = nn.Sequential(
          nn.Linear(4 * 7 * 7, 10)
      )

  # Defining the forward pass    
  def forward(self, x):
      x = self.cnn_layers(x)
      x = x.view(x.size(0), -1)
      x = self.linear_layers(x)
      return x
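The `4 * 7 * 7` input size of the linear layer follows from how each layer changes the spatial size of a 28 x 28 image. Below is a small sketch of that arithmetic in plain Python, using the standard output-size formula for convolution and pooling, floor((size + 2 * padding - kernel) / stride) + 1. The helper name `out_size` is hypothetical.

```python
def out_size(size, kernel, stride=1, padding=0):
    # spatial output size of a convolution or pooling layer
    return (size + 2 * padding - kernel) // stride + 1

size = 28
size = out_size(size, kernel=3, stride=1, padding=1)  # first Conv2d keeps 28
size = out_size(size, kernel=2, stride=2)             # first MaxPool2d halves to 14
size = out_size(size, kernel=3, stride=1, padding=1)  # second Conv2d keeps 14
size = out_size(size, kernel=2, stride=2)             # second MaxPool2d halves to 7

channels = 4  # output channels of the last convolution
flat_features = channels * size * size

print(size, flat_features)
```

The flattened feature vector therefore has 4 * 7 * 7 = 196 entries, which is what `x.view(x.size(0), -1)` hands to `nn.Linear`.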

Let's also define the optimizer and the loss function, and then look at a summary of the model:

# defining the model
model = Net()
# defining the optimizer
optimizer = optim.Adam(model.parameters(), lr=0.01)
# defining the loss function
criterion = nn.CrossEntropyLoss()
# checking if GPU is available
if torch.cuda.is_available():
    model = model.cuda()
    criterion = criterion.cuda()
    
print(model)
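Note that the network above ends in a plain linear layer with no softmax. That works because `nn.CrossEntropyLoss` expects raw scores (logits) together with integer class labels and applies log-softmax internally. The following sketch, with made-up logits and labels, checks this against `log_softmax` followed by `nll_loss`.

```python
import torch
from torch import nn
import torch.nn.functional as F

# made-up logits for a batch of 2 samples and 3 classes, plus integer class labels
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])
labels = torch.tensor([0, 2])

# cross-entropy computed directly from the raw scores
loss_direct = nn.CrossEntropyLoss()(logits, labels)

# the same quantity in two explicit steps: log-softmax, then negative log-likelihood
loss_two_step = F.nll_loss(F.log_softmax(logits, dim=1), labels)

print(loss_direct.item(), loss_two_step.item())
```

Both values should agree up to floating-point error, which is why the model's outputs can be treated as logits at prediction time.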

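So, we have 2 convolutional layers that help extract features from the images. The features from these convolutional layers are passed to a fully connected layer, which classifies the images into their respective classes. Now that our model architecture is ready, let's train this model for ten epochs: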

for i in range(10):
    running_loss = 0
    for images, labels in trainloader:

        if torch.cuda.is_available():
          images = images.cuda()
          labels = labels.cuda()

        # Training pass
        optimizer.zero_grad()
        
        output = model(images)
        loss = criterion(output, labels)
        
        #This is where the model learns by backpropagating
        loss.backward()
        
        #And optimizes its weights here
        optimizer.step()
        
        running_loss += loss.item()
    # report the average training loss once every batch in the epoch has been processed
    print("Epoch {} - Training loss: {}".format(i+1, running_loss/len(trainloader)))

You will see the training loss decrease as the epochs go by. This means our model is learning patterns from the training set. Let's check the model's performance on the test set:

# getting predictions on test set and measuring the performance
correct_count, all_count = 0, 0
model.eval()  # use the running batch-norm statistics during inference
with torch.no_grad():
    for images, labels in testloader:
        if torch.cuda.is_available():
            images = images.cuda()
            labels = labels.cuda()
        # the model returns raw scores (logits); the largest score marks the predicted class
        logits = model(images)
        pred_labels = logits.argmax(dim=1)
        # count correct predictions for the whole batch at once
        correct_count += (pred_labels == labels).sum().item()
        all_count += labels.size(0)

print("Number Of Images Tested =", all_count)
print("\nModel Accuracy =", (correct_count/all_count))

So, we tested 10,000 images in total, and the model is about 96% accurate in predicting the labels of the test images.

This is how you can build a convolutional neural network in PyTorch. In the next section, we will look at how to implement the same architecture in TensorFlow.

Implementing a Convolutional Neural Network (CNN) in TensorFlow

Now let's solve the same MNIST problem using a convolutional neural network in TensorFlow. As always, we start by importing the libraries:

# importing the libraries
import tensorflow as tf

from tensorflow.keras import datasets, layers, models
from tensorflow.keras.utils import to_categorical
import matplotlib.pyplot as plt

Let's check the version of TensorFlow we are using:


# version of tensorflow
print(tf.__version__)

So, we are using TensorFlow version 2.2.0. Now let's load the MNIST dataset using the datasets class of tensorflow.keras:


(train_images, train_labels), (test_images, test_labels) = datasets.mnist.load_data(path='mnist.npz')
# Normalize pixel values to be between 0 and 1
train_images, test_images = train_images / 255.0, test_images / 255.0

Here, we have loaded both the training and test sets of MNIST. We have also normalized the pixel values of the training and test images. Next, let's visualize a few images from the dataset:

# visualizing a few images
plt.figure(figsize=(10,10))
for i in range(9):
    plt.subplot(3,3,i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(train_images[i], cmap='gray')
plt.show()

This is what our dataset looks like: images of handwritten digits. Let's also look at the shapes of the training and test sets:

# shape of the training and test set
(train_images.shape, train_labels.shape), (test_images.shape, test_labels.shape)

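So, we have 60,000 images of shape 28 by 28 in the training set and 10,000 images of the same shape in the test set. Next, we will reshape the images to add a channel dimension and one-hot encode the target variable: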

# reshaping the images
train_images = train_images.reshape((60000, 28, 28, 1))
test_images = test_images.reshape((10000, 28, 28, 1))

# one hot encoding the target variable
train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)
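For readers who have not met one-hot encoding before, here is a NumPy sketch of the idea: each integer label becomes a vector of zeros with a single 1 at the index of its class. The function `one_hot` is a hypothetical stand-in for illustration, not the Keras implementation of `to_categorical`.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    # one row per label, one column per class
    labels = np.asarray(labels)
    encoded = np.zeros((labels.size, num_classes), dtype="float32")
    # set a single 1 in each row, at the column given by that row's label
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

example = one_hot([5, 0, 4])
print(example.shape)
print(example[0])
```

The first row has its 1 in position 5, and taking `argmax` along each row recovers the original integer labels.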

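Defining the Model Architecture

Now we will define the architecture of the model. We will use the same overall design as in PyTorch: 2 convolutional layers, each followed by a max pooling layer, then a Flatten layer, and finally a fully connected layer with 10 neurons, since we have 10 classes. Note that the Keras version below is not layer-for-layer identical to the PyTorch one: it has no batch normalization, and Conv2D uses its default 'valid' padding, so the feature maps shrink slightly at each convolution.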

# defining the model architecture
model = models.Sequential()
model.add(layers.Conv2D(4, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(layers.MaxPooling2D((2, 2), strides=2))
model.add(layers.Conv2D(4, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2), strides=2))
model.add(layers.Flatten())
model.add(layers.Dense(10, activation='softmax'))

Let's quickly look at the summary of the model:

# summary of the model
model.summary()
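The parameter count that the summary reports can be reproduced by hand. The sketch below is plain Python and assumes the Keras defaults for the layers defined above: 'valid' padding and stride 1 for `Conv2D`, and one bias per output channel or neuron. The helper names are hypothetical.

```python
def valid_out(size, kernel, stride=1):
    # spatial output size with 'valid' (no) padding
    return (size - kernel) // stride + 1

def conv_params(kernel, in_channels, out_channels):
    # kernel weights plus one bias per output channel
    return kernel * kernel * in_channels * out_channels + out_channels

size = 28
size = valid_out(size, 3)     # first Conv2D:      28 -> 26
size = valid_out(size, 2, 2)  # first MaxPooling:  26 -> 13
size = valid_out(size, 3)     # second Conv2D:     13 -> 11
size = valid_out(size, 2, 2)  # second MaxPooling: 11 -> 5

flat_features = size * size * 4           # Flatten output: 5 * 5 * 4
dense_params = flat_features * 10 + 10    # weights plus biases of the Dense layer

total_params = conv_params(3, 1, 4) + conv_params(3, 4, 4) + dense_params
print(flat_features, total_params)
```

This gives 40 + 148 + 1,010 = 1,198 parameters, with the Dense layer receiving 100 flattened features.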

To summarize, we have 2 convolutional layers, 2 max pooling layers, a Flatten layer, and a fully connected layer. The total number of parameters in the model is 1,198. Now that our model is ready, we will compile it:

# compiling the model
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
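To make the loss in the compile call above concrete, here is a plain-Python sketch of categorical cross-entropy for a single sample: the negative sum of the one-hot target times the log of the predicted probability, which reduces to minus the log of the probability assigned to the true class. The function name and the probability vectors are made up for illustration; this is not the Keras implementation.

```python
import math

def categorical_crossentropy(y_true, y_pred):
    # y_true is a one-hot vector, y_pred a vector of softmax probabilities;
    # terms with a zero target contribute nothing, so they are skipped
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred) if t > 0)

y_true = [0, 0, 1]  # the true class is index 2

confident = categorical_crossentropy(y_true, [0.05, 0.05, 0.90])
unsure = categorical_crossentropy(y_true, [0.30, 0.40, 0.30])

print(confident, unsure)
```

The confident, correct prediction yields a much smaller loss than the unsure one, which is the behavior the optimizer exploits during training.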

We are using the Adam optimizer, which you are free to change. The loss function is set to categorical cross-entropy because we are solving a multi-class classification problem, and the metric is 'accuracy'. Now let's train the model for 10 epochs:

# training the model
history = model.fit(train_images, train_labels, epochs=10, validation_data=(test_images, test_labels))

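To summarize, the training loss was about 0.46 initially and dropped to 0.08 after 10 epochs. The training and validation accuracies after 10 epochs are 97.31% and 97.48%, respectively.

So, this is how we can train a CNN in TensorFlow.

End Notes

To summarize, in this article we first went through a brief overview of PyTorch and TensorFlow. We then looked at the MNIST handwritten digit classification challenge and, finally, built an image classification model using a CNN (convolutional neural network) in both PyTorch and TensorFlow. I hope you are now familiar with both frameworks. As a next step, take on another image classification challenge and try to solve it with both PyTorch and TensorFlow.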
