Assignment 1

Dates and rules.

Please read the following guidelines carefully:

  • Submit the assignment as a single .zip archive, attached to an email sent to a.lamurias@fct.unl.pt from your official FCT email address, with "TIAB TP1" as the subject. Please do not use other compression formats or other email addresses.
  • The archive must contain the following:

    • TP1.txt: the questions file with your answers filled in.
    • tp1.py: the script that can be run to train your selected models.
    • Any additional .py modules you created, if you chose to split your code across several modules.
    • Any .png files you wish to include as reports on your results. These must be linked in the questions and answers file TP1.txt (see the instructions in that file).

    Do not include other files in this archive. In particular, do not include the dataset. Try to keep the archive small (<5MB).

  • For each student, only the last email sent before the deadline will be counted. You can therefore replace a submitted version simply by sending your assignment again before the deadline.
  • If, for some reason, you want to withdraw your assignment, send an email with the word WITHDRAW (in all caps) before the deadline. This is necessary only if you do not want to submit an assignment at all. It is not necessary if you just want to replace your previous submission with a more recent version.
  • The deadline for submitting assignment 1 is Wednesday, April 3, 2024, at 23:59. There will be a tolerance period of 48h, ending on Friday, April 5, at 23:59, but this period should be used only for correcting any problems with the submission.
  • No submissions will be accepted after 23:59 of April 5, once the 48 hour tolerance period is over.
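
As an illustration of the packaging rules above, the following is a minimal sketch that builds the archive with Python's standard zipfile module. The function name build_submission and the output file name are hypothetical and not part of the assignment materials. The sketch assumes all files to submit sit at the top level of one folder, so subfolders such as the dataset are skipped automatically.

```python
import warnings
import zipfile
from pathlib import Path

MAX_ARCHIVE_BYTES = 5 * 1024 * 1024  # the "<5MB" guideline from the rules


def build_submission(source_dir, archive_path):
    """Zip TP1.txt plus the .py and .png files found directly in source_dir.

    Hypothetical helper for illustration; returns the archived file names.
    """
    source_dir = Path(source_dir)
    archive_path = Path(archive_path)
    # Only top-level files are considered, so the dataset folder is left out.
    files = [
        p for p in source_dir.iterdir()
        if p.is_file()
        and (p.name == "TP1.txt" or p.suffix.lower() in {".py", ".png"})
    ]
    names = [p.name for p in files]
    for required in ("TP1.txt", "tp1.py"):
        if required not in names:
            raise FileNotFoundError(f"missing required file: {required}")
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for p in files:
            zf.write(p, arcname=p.name)  # flat archive, no folder prefix
    if archive_path.stat().st_size >= MAX_ARCHIVE_BYTES:
        warnings.warn("archive is not below the 5MB guideline")
    return names
```

Calling build_submission("my_work", "tp1_submission.zip") (both names are placeholders) would produce the archive to attach. It is still worth opening the result to confirm that only the intended files are inside.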

    Download this zip file: TP1.zip. Extract it to your working folder and do not change the directory structure.

    The TP1.zip archive contains a dataset folder with the data for the assignment and the following files:

    • TP1.txt: the questions and answers file you must fill out before submitting your assignment.
    • tp1.py: the Python 3.x script that can be used to run your assignment. It contains the code for reading the data and obtaining the unlabelled images, the labelled images and the respective labels. Write the necessary code in this file.

Description

The goal of this assignment is to create a network that classifies super-resolution fluorescence microscopy images of bacteria into 3 stages of the cell cycle. There are 400 images labelled with these stages:

Stage 0, before the cell starts replicating:

Stage 1, when the septum dividing the cells starts to form:

Stage 2, when the septum is fully formed, and before the cell splits:

Each image is a 40x40 array representing a grayscale image of 40 by 40 pixels. In addition to the 400 labelled images, there are 3892 unlabelled images, because obtaining images is easier than having biologists classify them.
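
To make the data shapes concrete, here is a small sketch using NumPy. The arrays are zero-filled placeholders, and stacking the images into a single array of shape (N, 40, 40) is an assumption. The real data comes from the loading code in tp1.py, whose variable names are not assumed here.

```python
import numpy as np

# Placeholder arrays with the sizes given in the description; the real
# images come from the loading code provided in tp1.py.
labelled = np.zeros((400, 40, 40), dtype=np.float32)
unlabelled = np.zeros((3892, 40, 40), dtype=np.float32)

# Flattened view for dense layers: one row of 40*40 = 1600 values per image.
labelled_flat = labelled.reshape(len(labelled), -1)

# Explicit channel axis, as convolutional layers usually expect.
# Channels-last is used here; some frameworks expect channels-first.
labelled_conv = labelled[..., np.newaxis]

print(labelled_flat.shape)  # (400, 1600)
print(labelled_conv.shape)  # (400, 40, 40, 1)
```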

In this assignment you will need to complete these tasks:

  • Multiclass classification from the images: Design and train a neural network that predicts the class of an image based on the labelled data (400 images). Use 300 images for training and 100 for validation. Discuss the problems that the small data set causes for this approach.
  • Feature extraction: Design and train an autoencoder that creates a more compact representation of the images. Each image has 40*40 = 1600 values, which is too many input features for only 400 labelled images. Your autoencoder should be able to create a smaller encoding. In this case you can use the unlabelled images for training and the labelled images for validation.
  • Multiclass classification based on the encoding: Create a classifier that predicts the class of an image from the smaller representations produced by the autoencoder. Use the 400 labelled images for this (300 images for training and 100 for validation). Remember that in this case your input is a vector with the encoding of the image, so you can use a small dense network.
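
The 300/100 split used in the first and third tasks can be sketched as follows. This is one possible approach, not a required one. It assumes a stratified split, which keeps the stage proportions similar in both subsets, is desirable. The helper name stratified_split is hypothetical and the labels in the usage lines are synthetic.

```python
import numpy as np


def stratified_split(labels, n_val=100, seed=0):
    """Return (train_idx, val_idx) with similar class proportions in each.

    Hypothetical helper for illustration; not part of tp1.py.
    """
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    val_fraction = n_val / len(labels)
    val_idx, leftover = [], []
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        k = int(val_fraction * len(idx))  # rounded-down share of this class
        val_idx.extend(idx[:k])
        leftover.extend(idx[k:])
    # Top up from the remaining images so that exactly n_val are selected.
    missing = n_val - len(val_idx)
    val_idx.extend(rng.permutation(leftover)[:missing])
    val_idx = np.array(val_idx, dtype=int)
    train_idx = np.setdiff1d(np.arange(len(labels)), val_idx)
    return rng.permutation(train_idx), rng.permutation(val_idx)


# Synthetic labels for illustration; not the real class counts.
labels = np.repeat([0, 1, 2], [134, 133, 133])
train_idx, val_idx = stratified_split(labels)
print(len(train_idx), len(val_idx))  # 300 100
```

Using the same index arrays for the image classifier and for the classifier on the encodings keeps the two validation results comparable.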

Instructions

In this assignment you will be graded both on the implementation of your models and on your reasoning, experiments and understanding of the subjects. So you must pay equal attention to the questions asked in the TP1.txt file. Confusing explanations, missing information or incorrect statements will be penalized.

In your answers you can link .png files by simply writing the name of the file on a separate line. See the instructions in the TP1.txt file. You can use this to show the training plots and other images.
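
As a sketch of how such a .png could be produced, the following uses Matplotlib, which is an assumption: the assignment does not prescribe a plotting library. The helper name save_loss_plot is hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # render to files without needing a display
import matplotlib.pyplot as plt


def save_loss_plot(train_loss, val_loss, filename):
    """Plot per-epoch training and validation loss and save the figure.

    Hypothetical helper for illustration; not part of tp1.py.
    """
    epochs = range(1, len(train_loss) + 1)
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.plot(epochs, train_loss, label="training")
    ax.plot(epochs, val_loss, label="validation")
    ax.set_xlabel("epoch")
    ax.set_ylabel("loss")
    ax.legend()
    fig.savefig(filename, dpi=100, bbox_inches="tight")
    plt.close(fig)  # free the figure when many plots are generated
```

A call such as save_loss_plot(train_loss, val_loss, "classifier_loss.png"), where the file name is a placeholder, writes an image that can then be named on its own line in TP1.txt. A modest dpi also helps keep the archive small.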

For each task you must consider several issues, including (among others):

  • Network architecture: Choose the appropriate layers and justify your choice. Also, try different models to find the best one, and explain this process.
  • Loss functions: Choose the correct loss functions and explain your choice.
  • Training and overfitting: Show the relevant plots to demonstrate that you selected a good model, learning rates, optimizer, etc.
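
When discussing training and overfitting, it can help to back the plots with a couple of numbers. The sketch below is plain Python with hypothetical helper names. The loss values are made up for illustration and are not results from this assignment. It finds the epoch with the lowest validation loss and the per-epoch gap between validation and training loss.

```python
def best_epoch(val_loss):
    """Return the 1-based epoch with the lowest validation loss."""
    return min(range(len(val_loss)), key=val_loss.__getitem__) + 1


def generalisation_gap(train_loss, val_loss):
    """Per-epoch difference between validation and training loss."""
    return [v - t for t, v in zip(train_loss, val_loss)]


# Made-up curves for illustration only.
train_loss = [1.10, 0.80, 0.55, 0.35, 0.20, 0.10]
val_loss = [1.12, 0.90, 0.75, 0.70, 0.78, 0.95]

print(best_epoch(val_loss))  # 4
gap = generalisation_gap(train_loss, val_loss)
```

A validation loss that rises after its minimum while the training loss keeps falling, so that the gap keeps widening, is the usual sign of overfitting. With only 300 training images this is likely to appear early.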