Studying One-Shot Learning with ChatGPT - 1 (CNNs in Siamese Networks)

seaofwest 2023. 2. 13. 12:18

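The Siamese network is the most basic algorithm for one-shot learning. This is a record of what I studied, starting from zero and mixing Google searches with ChatGPT, in order to implement a Siamese network in PyTorch.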

- What about Siamese neural networks?

A Siamese Neural Network is a type of neural network architecture that uses two identical sub-networks to process two separate inputs and compare their outputs for similarity. This type of network is particularly useful for tasks such as face recognition, where the goal is to determine if two images of faces depict the same person.

In a Siamese Neural Network, both sub-networks have the same architecture and share the same parameters (weights), so in practice a single network is applied to each input. During training, the network is presented with pairs of inputs along with a label indicating whether they are similar or not. The sub-networks process the inputs, and their outputs are compared using a distance metric such as the L1 or L2 distance. This comparison is then used to compute a loss that updates the parameters of the network to better separate the positive and negative examples.
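The shared-parameter idea can be sketched in PyTorch as follows. This is a minimal sketch, not the post's implementation: the `SiameseNetwork` class and the toy fully connected encoder for 1x28x28 images are hypothetical, while `torch.nn.functional.pairwise_distance` is the real PyTorch function used for the L2 distance. The key point is that one `encoder` object is called on both inputs, so there is only one set of weights.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SiameseNetwork(nn.Module):
    """Hypothetical wrapper: applies ONE encoder to both inputs, so the branches share weights."""

    def __init__(self, encoder: nn.Module):
        super().__init__()
        self.encoder = encoder  # a single module, reused for both branches

    def forward(self, x1: torch.Tensor, x2: torch.Tensor) -> torch.Tensor:
        z1 = self.encoder(x1)
        z2 = self.encoder(x2)
        # Euclidean (L2) distance between the two embeddings, one value per pair
        return F.pairwise_distance(z1, z2, p=2.0)


# Hypothetical toy encoder: flattens 1x28x28 images into a 32-dim embedding
encoder = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 64), nn.ReLU(), nn.Linear(64, 32))
model = SiameseNetwork(encoder)

x1 = torch.randn(4, 1, 28, 28)  # batch of 4 first images
x2 = torch.randn(4, 1, 28, 28)  # batch of 4 second images
distances = model(x1, x2)
print(distances.shape)  # torch.Size([4])
```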

Once trained, the Siamese Network can be used to compare two new inputs and predict their similarity based on the distance between their outputs. The output of the network can also be interpreted as a measure of similarity between the inputs, allowing it to be used for tasks such as one-shot learning, where the goal is to recognize a new object based on a single example.
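To make the one-shot use concrete, here is a small plain-Python sketch, assuming the trained sub-network is available as a function `embed` (hypothetical) that maps an input to a vector: a query gets the label of the single stored example it is closest to.

```python
import math
from typing import Callable, Dict, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """L2 distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def one_shot_classify(query, support: Dict[str, object], embed: Callable) -> str:
    """Return the label whose single support example is closest to the query.

    `support` maps each class label to exactly ONE example (the "one shot").
    `embed` stands in for the trained Siamese sub-network (hypothetical here).
    """
    q = embed(query)
    return min(support, key=lambda label: euclidean(q, embed(support[label])))


# Usage with an identity "embedding" purely for illustration
support = {"alice": [0.0, 0.0], "bob": [5.0, 5.0]}
print(one_shot_classify([4.5, 5.2], support, embed=lambda v: v))  # bob
```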

Overall, Siamese Neural Networks are a powerful tool for processing and comparing inputs that have a notion of similarity, and have found applications in various domains including computer vision, natural language processing, and speech recognition.

A Siamese network applies the same sub-network to both inputs. When I googled Siamese networks earlier, everything I found said to use a CNN, but it seems other kinds of sub-networks would work too.

- Should I use a CNN for the sub-network of a Siamese network?

Yes, you can use a Convolutional Neural Network (CNN) as a sub-network in a Siamese Neural Network. In fact, using a CNN as the sub-network is a common choice for many computer vision tasks, as CNNs are particularly well-suited for image processing and have achieved state-of-the-art results on many benchmark datasets.

When using a CNN as the sub-network in a Siamese Neural Network, you can take advantage of the ability of CNNs to automatically and adaptively learn hierarchical representations of the input data. This can help the Siamese Network to effectively capture and compare the relevant features between the two inputs, leading to improved performance on the similarity prediction task.

It's important to note that while using a CNN as the sub-network in a Siamese Neural Network can be a good choice, it is not the only option. Other types of neural networks, such as fully connected networks or recurrent neural networks, could also be used depending on the specific requirements of the task and the nature of the inputs.
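For illustration, a sketch of both options in PyTorch, assuming 1x28x28 grayscale inputs and a 32-dimensional embedding (both arbitrary choices): a small CNN whose stacked convolution and pooling layers build up increasingly abstract features, and a fully connected encoder with the same output size. Either could serve as the shared sub-network, since the Siamese setup only needs each branch to map an input to an embedding.

```python
import torch
import torch.nn as nn

# Hypothetical small CNN encoder: early convs pick up local patterns such as edges,
# deeper convs combine them into more abstract features.
cnn_encoder = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # -> 16x14x14
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # -> 32x7x7
    nn.Flatten(),
    nn.Linear(32 * 7 * 7, 32),                                               # -> 32-dim embedding
)

# The fully connected alternative, with the same output size
mlp_encoder = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 128), nn.ReLU(), nn.Linear(128, 32))

x = torch.randn(4, 1, 28, 28)
print(cnn_encoder(x).shape, mlp_encoder(x).shape)  # torch.Size([4, 32]) torch.Size([4, 32])
```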

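That is why I asked. It keeps praising CNNs, and then in the last paragraph says a CNN is not the only option. Since CNNs are a technique for achieving high accuracy on image data, I guess the sub-network does not have to be a CNN when learning from something other than images.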

But it said the Siamese network is trained using the distance between the two outputs, and I did not understand how the CNN that produces those two outputs is itself trained.
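One way to see how the CNN itself gets trained: the distance is computed from the encoder's outputs, so a loss defined on that distance can be backpropagated through both branches into the shared weights. The sketch below assumes the contrastive loss (one common choice; the answers above do not name a specific loss) with label 1 for similar pairs and 0 for dissimilar pairs, and uses a hypothetical `nn.Linear` as a stand-in for the CNN.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def contrastive_loss(distance: torch.Tensor, same: torch.Tensor, margin: float = 1.0) -> torch.Tensor:
    # Convention assumed here: same = 1.0 for similar pairs, 0.0 for dissimilar pairs
    pos = same * distance.pow(2)                         # pull similar pairs together
    neg = (1 - same) * F.relu(margin - distance).pow(2)  # push dissimilar pairs beyond the margin
    return (pos + neg).mean()


encoder = nn.Linear(8, 4)        # hypothetical stand-in for the CNN sub-network
x1, x2 = torch.randn(2, 8), torch.randn(2, 8)
same = torch.tensor([1.0, 0.0])  # first pair similar, second pair dissimilar

distance = F.pairwise_distance(encoder(x1), encoder(x2))
loss = contrastive_loss(distance, same)
loss.backward()                  # gradients flow back through BOTH branches into the shared encoder
print(encoder.weight.grad is not None)  # True
```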

- When I train a Siamese network, what CNN should I use?

When training a Siamese Neural Network, the choice of the Convolutional Neural Network (CNN) to use as the sub-network depends on several factors, such as the size and complexity of the input data, the computational resources available, and the desired level of accuracy. Here are some commonly used CNN architectures that can be used as the sub-network in a Siamese Neural Network:

LeNet: This is a simple and classic CNN architecture that was first introduced for handwritten digit recognition. It can be a good starting point for small-scale image recognition tasks.
VGGNet: This is a deep CNN architecture that was introduced for the ImageNet Large Scale Visual Recognition Challenge (ILSVRC). It is known for its good performance on a wide range of image recognition tasks.
ResNet: This is a deep residual network that was introduced for the ILSVRC. It is known for its ability to train very deep networks effectively.
InceptionNet: This is a deep CNN architecture that was introduced for the ILSVRC. It is known for its efficient use of computational resources and good performance on a wide range of image recognition tasks.
MobileNet: This is a lightweight CNN architecture that was introduced for mobile and embedded devices. It is known for its efficient use of computational resources and good performance on resource-constrained devices.

It's important to keep in mind that the choice of the CNN architecture will depend on the specific requirements of the task and the available computational resources. You may need to experiment with different architectures to find the one that works best for your specific use case.

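I had assumed that, like the CNNs in textbook theory, I would have to design and train everything myself, but it turns out there are existing architectures. It recommended about five and answered that which one to use depends on various factors (which, of course, is obvious).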

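Going through the five slowly, I spotted a familiar one: ResNet. I had seen someone else's Siamese network implementation where accuracy improved after switching the architecture to ResNet-101, and now I see what that was about.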

- So can I just use one of the sub-network algorithms above without training?

No, you generally should not expect a pre-trained Convolutional Neural Network (CNN) architecture to work well as the sub-network in a Siamese Neural Network without further training.

Pre-trained CNNs are typically trained on large datasets, such as ImageNet, to perform a specific task, such as object classification. However, the features learned by the pre-trained network may not be directly applicable to the similarity prediction task performed by the Siamese Network. As a result, you would typically need to fine-tune the pre-trained network on your specific task and data to achieve good performance.

In fine-tuning, you would typically use the pre-trained weights as the starting point for your network and continue training using your task-specific data. This allows the network to learn the task-specific features from your data, while still taking advantage of the general knowledge learned from the pre-training.

It's worth noting that the degree to which you need to fine-tune the pre-trained network will depend on the size and similarity of the pre-training and fine-tuning datasets. If your fine-tuning dataset is very different from the pre-training dataset, you may need to perform more extensive fine-tuning, or even train the network from scratch.

So I wondered whether I could just use something someone else had built and asked, and the answer was no. A pre-trained CNN is trained on a large dataset, so it may not fit my dataset, and I need to fine-tune it on my own data.

But when training the CNN, I can use the images I have; what I did not understand was which labels the parameters should be adjusted against.

- What does training a CNN mean? I understand that I have to use my task-specific images, but what label should I use for them?

Training a Convolutional Neural Network (CNN) means updating the network's parameters so that it can accurately perform a specific task. The process of training a CNN involves two main steps:

Feeding the network input data and computing the output predictions.
Comparing the predictions to the true target values and computing the loss.

To train a CNN, you need to have labeled data that you can use to train and evaluate the network. The label for each input data point is the true target value that you want the network to predict. For example, if you are training a CNN to perform image classification, the label for each image would be the class label, such as "dog" or "cat".

During training, you feed the input data and the corresponding labels to the network, and the network computes the output predictions. You then compare the predictions to the true target values using a loss function, which measures the difference between the predicted and target values. The goal of training is to minimize the loss by adjusting the network's parameters.

Training repeats this cycle of computing the loss and adjusting the parameters over many iterations. Backpropagation computes the gradient of the loss with respect to every parameter, and an optimization algorithm such as gradient descent uses those gradients to adjust the parameters in the direction that reduces the loss. Once the training process is completed, the network's parameters should ideally let it accurately predict the target values for new, unseen data.
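The loop described above can be sketched in PyTorch on hypothetical toy data (random 1x8x8 "images" labeled 0 for "cat" and 1 for "dog" by a synthetic rule), so each step is visible: forward pass, loss, backpropagation, and a gradient descent update. The model size, learning rate, and number of epochs are arbitrary.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy data: 64 random 1x8x8 "images"; label 1 ("dog") if mean pixel > 0, else 0 ("cat")
images = torch.randn(64, 1, 8, 8)
labels = (images.mean(dim=(1, 2, 3)) > 0).long()

model = nn.Sequential(
    nn.Conv2d(1, 4, kernel_size=3, padding=1), nn.ReLU(),
    nn.Flatten(), nn.Linear(4 * 8 * 8, 2),
)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

losses = []
for epoch in range(50):
    logits = model(images)          # 1. forward pass: compute predictions
    loss = loss_fn(logits, labels)  # 2. compare predictions with the true labels
    optimizer.zero_grad()           #    clear gradients left over from the previous step
    loss.backward()                 # 3. backpropagation: gradient of the loss w.r.t. every parameter
    optimizer.step()                # 4. gradient descent: move parameters against the gradient
    losses.append(loss.item())
```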

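I read the answer, but it only described how to train a CNN in the general case. In one-shot learning there is no such thing as a label; you just measure the similarity between two images, so this did not tell me how to train for one-shot learning.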

- Then how can I get training data and its labels for the CNN of a Siamese network?

To train the Convolutional Neural Network (CNN) component of a Siamese Neural Network, you will need labeled training data that consists of pairs of images and the corresponding labels indicating whether the images are similar or not.
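This is where the label question gets an answer: the label of a pair is simply whether its two images come from the same class, so any dataset with identity or class labels (such as LFW, which groups face images by person) can be turned into pairs. A plain-Python sketch, with `make_pairs` and the file names as hypothetical examples:

```python
import random
from typing import Dict, List, Tuple


def make_pairs(images_by_class: Dict[str, List[str]], n_pairs: int,
               seed: int = 0) -> List[Tuple[str, str, int]]:
    """Build (img_a, img_b, label) pairs: label 1 = same class, 0 = different class.

    Alternates positive and negative pairs so the set is balanced.
    Only classes with at least 2 images are used for positive pairs.
    """
    rng = random.Random(seed)
    classes = list(images_by_class)
    multi = [c for c in classes if len(images_by_class[c]) >= 2]
    pairs = []
    for i in range(n_pairs):
        if i % 2 == 0:  # positive pair: two different images of the same class
            c = rng.choice(multi)
            a, b = rng.sample(images_by_class[c], 2)
            pairs.append((a, b, 1))
        else:           # negative pair: one image each from two different classes
            c1, c2 = rng.sample(classes, 2)
            pairs.append((rng.choice(images_by_class[c1]), rng.choice(images_by_class[c2]), 0))
    return pairs


data = {
    "alice": ["alice_1.jpg", "alice_2.jpg"],
    "bob": ["bob_1.jpg", "bob_2.jpg", "bob_3.jpg"],
    "carol": ["carol_1.jpg"],
}
for pair in make_pairs(data, 4):
    print(pair)
```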

Here are some ways to obtain the training data and its labels:

Collect and label your own data: This is the most direct and flexible approach, but it may require significant effort to collect and label a large enough dataset.
Use publicly available datasets: There are many publicly available datasets that you can use to train your Siamese Network. For example, you can use the Labeled Faces in the Wild (LFW) dataset, which contains labeled face images.
Create synthetic data: If you have a smaller dataset, you can generate additional synthetic data by applying various transformations to the original data, such as rotations, translations, or color changes.

Once you have obtained the training data and its labels, you can use them to train the CNN component of the Siamese Network. You would typically perform this training by splitting the data into training and validation sets and using the training set to update the network's parameters. The validation set is used to evaluate the network's performance during training and to prevent overfitting, which occurs when the network memorizes the training data instead of learning to generalize to new data.
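A sketch of the split, with one assumption beyond the answer: for one-shot learning it is common to split by class rather than by image, so that validation measures how well the network compares classes it has never seen. `split_by_class` is a hypothetical helper in plain Python.

```python
import random
from typing import Dict, List, Tuple

Dataset = Dict[str, List[str]]


def split_by_class(images_by_class: Dataset, val_fraction: float = 0.2,
                   seed: int = 0) -> Tuple[Dataset, Dataset]:
    """Split so that no validation class ever appears in training."""
    rng = random.Random(seed)
    classes = sorted(images_by_class)  # sort first so the shuffle is reproducible
    rng.shuffle(classes)
    n_val = max(1, round(len(classes) * val_fraction))
    val_classes = set(classes[:n_val])
    train = {c: imgs for c, imgs in images_by_class.items() if c not in val_classes}
    val = {c: imgs for c, imgs in images_by_class.items() if c in val_classes}
    return train, val


data = {f"person_{i}": [f"person_{i}_1.jpg", f"person_{i}_2.jpg"] for i in range(10)}
train, val = split_by_class(data)
```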

That was the answer I got, but I still did not understand how to train the CNN with it, so in the end I decided to put off training the CNN for now and try a pre-trained architecture first.