PyTorch Implementation (Structured Data Processing)#

(Last updated: Jan 30, 2025) [1]

In this practice, we will guide you through using the PyTorch deep learning framework to implement a deep regression model on the Smell Pittsburgh dataset. We only cover the basics in this notebook, and the following resources give more detailed information about PyTorch:

Important

To make this notebook work, you need to install PyTorch. You can also copy this notebook (as well as the dataset) to Google Colab and run it there.
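
If you are not sure whether your PyTorch installation works, a quick check like the sketch below can be run first; it only prints the installed version and which GPU backends are available.

import torch
print(torch.__version__)
print("CUDA available:", torch.cuda.is_available())
print("MPS available:", torch.backends.mps.is_available())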

First, we begin importing the required packages.

import pandas as pd
import numpy as np
from os.path import isfile
from os.path import join
from os import listdir
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset
from torch.utils.data import DataLoader
from tqdm import tqdm

# Import the answers for the tasks
from util.answer import (
   answer_preprocess_sensor,
   answer_preprocess_smell,
   answer_sum_current_and_future_data
)

# Import the utility functions that are provided
from util.util import (
    convert_wind_direction,
    insert_previous_data_to_cols
)

Then, we need to set the computing resources that we want to use. On a machine with a GPU (Graphics Processing Unit), we prefer to use it, since a GPU can greatly speed up the computation. On a macOS computer with Apple silicon (e.g., M1 or M2 chips), PyTorch can use the MPS backend for computing.

if torch.cuda.is_available():
    device = torch.device("cuda") # use CUDA device
elif torch.backends.mps.is_available():
    device = torch.device("mps") # use MacOS GPU device (e.g., for M2 chips)
else:
    device = torch.device("cpu") # use CPU device
device
device(type='mps')

The following code will help you move the data to the device that you choose.

def to_device(data, device):
    """Move PyTorch objects (e.g., tensors, models) to a chosen device"""
    if isinstance(data, (list,tuple)):
        return [to_device(x, device) for x in data]
    return data.to(device, non_blocking=True)
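
As a minimal usage sketch (the tensor below is made up just for illustration), moving an object to the chosen device looks like this:

example = torch.zeros(3, 4) # a dummy tensor, only for illustration
example = to_device(example, device)
print(example.device) # should show the chosen device, e.g., mps:0, cuda:0, or cpu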

Below we hide the function for computing the features and labels, which is taken from the structured data processing tutorial.

def compute_feature_label(df_smell, df_sensor, b_hr_sensor=0, f_hr_smell=0):
    """
    Compute features and labels from the smell and sensor data.

    Parameters
    ----------
    df_smell : pandas.DataFrame
        The preprocessed smell data.
    df_sensor : pandas.DataFrame
        The preprocessed sensor data.
    b_hr_sensor : int
        Number of hours that we want to insert the previous sensor data.
    f_hr_smell : int
        Number of hours that we want to sum up the future smell data.

    Returns
    -------
    df_x : pandas.DataFrame
        The features that we want to use for modeling.
    df_y : pandas.DataFrame
        The labels that we want to use for modeling.
    """
    # Copy data frames to prevent editing the original ones.
    df_smell = df_smell.copy(deep=True)
    df_sensor = df_sensor.copy(deep=True)

    # Replace -1 values in sensor data to NaN
    df_sensor[df_sensor==-1] = np.nan

    # Convert all wind directions.
    df_sensor = convert_wind_direction(df_sensor)

    # Scale sensor data and fill in missing values
    df_sensor = (df_sensor - df_sensor.mean()) / df_sensor.std()
    df_sensor = df_sensor.round(6)
    df_sensor = df_sensor.fillna(-1)

    # Insert previous sensor data as features.
    # Notice that the df_sensor is already using the previous data.
    # So b_hr_sensor=0 means using data from the previous 1 hour.
    # And b_hr_sensor=n means using data from the previous n+1 hours.
    df_sensor = insert_previous_data_to_cols(df_sensor, b_hr_sensor)

    # Sum up current and future smell values as label.
    # Notice that the df_smell is already the data from the future 1 hour.
    # (as indicated in the preprocessing phase of smell data)
    # So f_hr_smell=0 means using data from the future 1 hour.
    # And f_hr_smell=n means using data from the future n+1 hours.
    df_smell = answer_sum_current_and_future_data(df_smell, f_hr_smell)

    # Add suffix to the column name of the smell data to prevent confusion.
    # See the description above for the reason of adding 1 to the f_hr_smell.
    df_smell.columns += "_future_" + str(f_hr_smell+1) + "h"

    # We need to first merge these two timestamps based on the available data.
    # In this way, we synchronize the time stamps in the sensor and smell data.
    # This also means that the sensor and smell data have the same number of data points.
    df = pd.merge_ordered(df_sensor.reset_index(), df_smell.reset_index(), on=df_smell.index.name, how="inner", fill_method=None)

    # Sanity check: there should be no missing data.
    assert df.isna().sum().sum() == 0, "Error! There is missing data."

    # Separate features (x) and labels (y).
    df_x = df[df_sensor.columns].copy()
    df_y = df[df_smell.columns].copy()

    # Add the hour of day and the day of week.
    dow_radian = df["EpochTime"].dt.dayofweek.copy(deep=True) * 2 * np.pi / 6.0
    tod_radian = df["EpochTime"].dt.hour.copy(deep=True) * 2 * np.pi / 23.0
    df_x.loc[:,"day_of_week_sine"] = np.sin(dow_radian)
    df_x.loc[:,"day_of_week_cosine"] = np.cos(dow_radian)
    df_x.loc[:,"hour_of_day_sine"] = np.sin(tod_radian)
    df_x.loc[:,"hour_of_day_cosine"] = np.cos(tod_radian)
    return df_x, df_y

Now, we need to load and preprocess the dataset using the code in the structured data processing tutorial.

# Load and preprocess sensor data
path = "smellpgh-v1/esdr_raw"
list_of_files = [f for f in listdir(path) if isfile(join(path, f))]
sensor_raw_list = []
for f in list_of_files:
    sensor_raw_list.append(pd.read_csv(join(path, f)).set_index("EpochTime"))
df_sensor = answer_preprocess_sensor(sensor_raw_list)

# Load and preprocess smell data
smell_raw = pd.read_csv("smellpgh-v1/smell_raw.csv").set_index("EpochTime")
df_smell = answer_preprocess_smell(smell_raw)

# Compute features and labels
df_x, df_y = compute_feature_label(df_smell, df_sensor, b_hr_sensor=2, f_hr_smell=7)

# Use value 40 as the threshold to indicate a smell event
# In this way, we make it a binary classification problem
df_y = (df_y>=40).astype(int)
df_x
[Output of df_x, truncated for display: 16751 rows × 148 columns. The columns contain the standardized sensor readings from the previous 1 to 3 hours (e.g., 3.feed_1.SO2_PPM_pre_1h, 3.feed_23.CO_PPM_pre_1h, ..., 3.feed_3.SONICWD_DEG_sine_pre_3h), plus the added day_of_week_sine, day_of_week_cosine, hour_of_day_sine, and hour_of_day_cosine features.]

df_y
smell_value_future_8h
0 0
1 0
2 0
3 0
4 0
... ...
16746 0
16747 0
16748 0
16749 0
16750 0

16751 rows × 1 columns
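
Since we turned the labels into a binary smell-event indicator, it is worth checking how imbalanced the two classes are. A quick sketch (assuming the column name shown in the output above) could look like this:

# Count how many hours are labeled as a smell event (1) versus no event (0)
print(df_y["smell_value_future_8h"].value_counts())
# The mean of a 0/1 column is the fraction of positive (smell event) samples
print("Positive rate:", df_y["smell_value_future_8h"].mean())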

We now have the dataset ready in pandas. To make it work with PyTorch, we first need to convert the pandas.DataFrame objects to torch.Tensor objects with the torch.float32 data type (because our PyTorch model will take this data type as input). A PyTorch tensor is similar to a numpy.array but comes with more functions to support GPU (Graphics Processing Unit) computing. A GPU can perform matrix operations much more efficiently than a CPU (Central Processing Unit), and people often use PyTorch precisely to benefit from its powerful GPU computing support. The following code does the conversion. For more information about tensors, check this UvA deep learning tutorial and this PyTorch documentation.

# Convert pandas.DataFrame to torch.Tensor with type torch.float32
feature = torch.from_numpy(df_x.to_numpy()).float()
label = torch.from_numpy(df_y.to_numpy()).float()
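
To confirm that the conversion worked as expected, we can do a quick sanity check of shapes and data types (the expected values below follow from the data frames shown earlier):

print(feature.shape, feature.dtype) # should be torch.Size([16751, 148]) torch.float32
print(label.shape, label.dtype) # should be torch.Size([16751, 1]) torch.float32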

Next, we need to create a torch.utils.data.Dataset class, which is a uniform interface for loading data. More description about the PyTorch dataset class can be found in this UvA deep learning tutorial and this PyTorch documentation.

class SmellPittsburghDataset(Dataset):
    def __init__(self, feature=None, label=None):
        self.feature = feature
        self.label = label

    def __len__(self):
        return len(self.feature)

    def __getitem__(self, idx):
        return self.feature[idx], self.label[idx]
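
As a small sketch of how this class behaves, indexing the dataset returns a (feature, label) pair, and len gives the number of data points. The dataset_all name below is only for illustration; the actual splits are created next.

dataset_all = SmellPittsburghDataset(feature=feature, label=label)
print(len(dataset_all)) # number of data points
x0, y0 = dataset_all[0] # the first (feature, label) pair
print(x0.shape, y0.shape) # torch.Size([148]) torch.Size([1])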

We can use the dataset class to create training, validation, and test sets. In the code below, we use the first 60% of data points as the training set, the next 20% as the validation set, and the remaining data points as the test set. We will use the training set to train the model, the validation set for model selection, and the test set for evaluating model performance. The reason for this kind of split is that Smell Pittsburgh is a time-series dataset, which means we should use data from the past to predict outcomes in the future, not the other way around.

n = df_y.size
train_size = int(n*0.6)
validation_size = int(n*0.2)
i = 0
j = i + train_size
k = j + validation_size
dataset_train = SmellPittsburghDataset(feature=feature[i:j], label=label[i:j])
dataset_validation = SmellPittsburghDataset(feature=feature[j:k], label=label[j:k])
dataset_test = SmellPittsburghDataset(feature=feature[k:], label=label[k:])
print("Total dataset size: %d" % n)
print("Training dataset size: %d" % len(dataset_train))
print("Validation dataset size: %d" % len(dataset_validation))
print("Test dataset size: %d" % len(dataset_test))
Total dataset size: 16751
Training dataset size: 10050
Validation dataset size: 3350
Test dataset size: 3351

Then, let us define a scoring function to evaluate the model performance for binary classification. The function takes two inputs: the array of predicted labels and the array of true labels. It returns the confusion-matrix counts (true/false positives and negatives), from which we will later compute precision, recall, and F1 score. Notice that we use a small trick here: when the inputs are None, the function returns a dictionary of empty arrays, which will be handy later when we keep appending scores into arrays of metrics.

def binary_scorer(y_predict=None, y=None):
    """
    A customized scoring function to evaluate a binary classifier.

    Parameters
    ----------
    y_predict : torch.Tensor
        The predicted binary labels in 0 (negative) or 1 (positive).
    y : torch.Tensor
        The true binary labels in 0 (negative) or 1 (positive).

    Returns
    -------
    dict of int or float
        A dictionary of evaluation metrics.
    """
    if y_predict is not None and y is not None:
        # Compute metrics and return them
        eq = (y_predict == y)
        tp = float((eq & (y_predict == 1)).sum()) # true positive
        tn = float((eq & (y_predict == 0)).sum()) # true negative
        fp = float((y_predict > y).sum()) # false positive
        fn = float((y_predict < y).sum()) # false negative
        return {"tp": tp, "tn": tn, "fp": fp, "fn": fn}
    else:
        # Return the structure of the dictionary with empty arrays for initialization
        return {"tp": [], "tn": [], "fp": [], "fn": []}

Next, we need to train the model for multiple epochs, which means running through all the available data in the training set multiple times. Unlike traditional (full-batch) gradient descent, deep learning models are typically trained with mini-batch Stochastic Gradient Descent (SGD), which uses batches of data points instead of all the data at once. For more information about SGD, check this StatQuest video.

The reason for using mini-batch SGD is that training supervised deep learning models typically requires a lot of data (this is sometimes called many-shot learning now), and it is often impossible to fit all the data into computer memory, so we have to feed the data to the optimization algorithm in batches. Also, training deep learning models usually requires going through all data points multiple times (i.e., multiple epochs). The batch size is a hyperparameter for tuning.

Now let us create a torch.utils.data.DataLoader object, which is a way to load data efficiently and is extremely beneficial if you cannot fit all the data into computer memory at once (e.g., a large collection of videos). More information about DataLoader can be found in this UvA deep learning tutorial or this PyTorch documentation. The DataLoader also allows us to load data in batches by specifying the batch size. Notice that we only shuffle the training set here, not the validation set.

Important

When writing PyTorch code for training deep learning models, the first important thing is to get torch.utils.data.Dataset and torch.utils.data.DataLoader objects ready.

dataloader_train = DataLoader(dataset_train, batch_size=168, shuffle=True)
dataloader_validation = DataLoader(dataset_validation, batch_size=168, shuffle=False)

To allow the DataLoader to use the device that we specified at the beginning of this notebook, we need to use the following code.

class DeviceDataLoader():
    """Wrap a dataloader to move data to a chosen device"""
    def __init__(self, dl, device):
        self.dl = dl
        self.device = device
        
    def __iter__(self):
        """Yield a batch of data after moving it to device"""
        for b in self.dl:
            yield to_device(b, self.device)

    def __len__(self):
        """Number of batches"""
        return len(self.dl)

Now, we can move the DataLoader objects to the specified device.

dataloader_train = DeviceDataLoader(dataloader_train, device)
dataloader_validation = DeviceDataLoader(dataloader_validation, device)
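
To verify that batches come out with the expected shapes and on the chosen device, you can peek at the first training batch. This is just a quick sketch for inspection and is not required for training.

x_batch, y_batch = next(iter(dataloader_train)) # grab one batch
print(x_batch.shape, y_batch.shape) # should be torch.Size([168, 148]) torch.Size([168, 1])
print(x_batch.device) # should match the device chosen at the beginning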

Next, we need to define the deep regression model. We use two layers of linear neurons. The first layer maps the features to 4 hidden units (i.e., linear neurons), and the second layer maps the 4 hidden units to a single output (i.e., whether there is a smell event or not).

Notice that, for computational efficiency, we do not need to ensure that the output is a probability, since the loss function that we will define later (i.e., torch.nn.BCEWithLogitsLoss) already does this job for us. We only need to turn the output into a probability later, when we use the model to predict smell events for our task.

In the forward function, we define how the network will pass the result from each layer to the next layer. We use the ReLU activation function between the first and second layers to introduce some non-linearity. We do not need to define a backward function since PyTorch can automatically compute and backpropagate the gradients for us to iteratively update the model weights.

In this case, we choose to use 4 hidden units for demonstration purposes. In reality, this is a hyperparameter that you can tune. The input size should match the number of features (otherwise, running the code will give an error).

class DeepRegression(nn.Module):
    def __init__(self, input_size, hidden_size=4, output_size=1):
        super(DeepRegression, self).__init__()
        self.linear1 = nn.Linear(input_size, hidden_size)
        self.relu1 = nn.ReLU()
        self.linear2 = nn.Linear(hidden_size, output_size)
        
    def forward(self, x):
        out = self.linear1(x)
        out = self.relu1(out)
        out = self.linear2(out)
        return out
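
A quick way to inspect the architecture and count the trainable parameters is sketched below (the model_preview name and the input size of 148 features are only for illustration; the real model object is created in the next step):

model_preview = DeepRegression(148) # 148 input features, as in df_x above
print(model_preview)
n_params = sum(p.numel() for p in model_preview.parameters() if p.requires_grad)
print("Trainable parameters:", n_params) # (148+1)*4 + (4+1)*1 = 601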

We are getting close to being able to train the model. The last three things that we need are to create the model object, specify the loss function (for the optimization algorithm to compute the error so that PyTorch can compute and backpropagate the gradients to update the model weights), and define the optimizer (i.e., the optimization algorithm).

Regarding the model, the code above only defines a class. To be able to use it, we need to create an object from the class. Think of a class as a design specification (e.g., the spec for a car) and an object as the real artifact produced from that design (e.g., a real car). Regarding the loss criterion, we use the Binary Cross Entropy (BCE) loss, which is standard for binary classification. Notice that we are not using torch.nn.BCELoss; instead, the torch.nn.BCEWithLogitsLoss that we use here combines the Sigmoid and the BCE loss in a single, more numerically stable step, so it takes care of transforming the raw output of the final neural network layer into a probability.

Regarding the optimizer, we use the Adam optimization algorithm, which is a variation of Stochastic Gradient Descent with advanced capabilities in scheduling learning rates and scaling the gradients dynamically. Here, we use 0.0001 as the learning rate and 0.000001 as the weight decay for regularization. Adding the regularization can make the training more stable and prevent overfitting. In reality, they are hyperparameters for tuning.

model = DeepRegression(feature.shape[1])
model = to_device(model, device) # move the model to the specified device
criterion = nn.BCEWithLogitsLoss()
optimizer = optim.Adam(model.parameters(), lr=0.0001, weight_decay=0.000001)
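
To make the relation between the two loss functions concrete, the small sketch below (with made-up numbers) shows that torch.nn.BCEWithLogitsLoss applied to raw outputs matches torch.nn.BCELoss applied after a Sigmoid:

logits_toy = torch.tensor([[0.3], [-1.2]]) # hypothetical raw model outputs
targets_toy = torch.tensor([[1.0], [0.0]]) # hypothetical binary labels
loss_a = nn.BCEWithLogitsLoss()(logits_toy, targets_toy)
loss_b = nn.BCELoss()(torch.sigmoid(logits_toy), targets_toy)
print(loss_a.item(), loss_b.item()) # the two values should be numerically the same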

Finally, we can use the following function to train the model for multiple epochs and return the model performance metrics. We will use the tqdm package for help in tracking the progress in the for-loop, which is handy for deep learning code that can take a long time to run.

As mentioned before, for computational efficiency the output of the model is not a probability. But when computing the model performance, we need a probability to determine how likely a smell event is. We will use the Sigmoid function to convert the output from the final layer to a probability between 0 and 1. Then, we can check if the probability is larger than a threshold (e.g., 0.5).

def train_model(dataloader, model, criterion, optimizer, num_epochs=50, threshold=0.5, check_step=5):
    """
    Train a PyTorch model and print model performance metrics.

    Parameters
    ----------
    dataloader : torch.utils.data.DataLoader
        The dataloader object.
    model : torch.nn.Module
        The PyTorch model that we want to train.
    criterion : torch.nn.modules.loss._Loss
        The loss function.
    optimizer : torch.optim.Optimizer
        The optimization algorithm.
    num_epochs : int
        Number of epochs for training the model.
    threshold : float
        The threshold of probability to decide if the output should be 1.
    check_step : int
        Number of epochs to create a check point for evaluation.
    
    Returns
    -------
    tuple
        A tuple with two dictionaries.
        The first one is the evaluation of the training history.
        The second one is the evaluation of the validation history.
    """
    model.train() # set the model to training mode
    
    # Loop epochs
    train_score_history = {"precision": [], "recall": [], "fscore": [], "loss": []}
    eval_score_history = {"precision": [], "recall": [], "fscore": [], "loss": []}
    for epoch in tqdm(range(num_epochs)):
        score_arrays = binary_scorer() # get the empty structure
        score_arrays["loss"] = [] # add the field for appending the loss
        # Loop the dataloader
        for x, y in dataloader:
            y_pred = model(x) # make predictions using the model
            loss = criterion(y_pred, y) # compute the loss
            optimizer.zero_grad() # set initial gradients to zero
            loss.backward() # accumulate and backpropagate the gradients
            optimizer.step() # update model weights
            score_arrays["loss"].append(loss.item()) # append the loss
            y_label = torch.sigmoid(y_pred) # turn model output into probabilities
            y_label = (y_label > threshold).float() # turn probabilities into labels of 0 or 1
            score = binary_scorer(y_label, y) # compute evaluation metrics
            # Append the evaluation metrics to the arrays
            for k in score: # k means the type of score, such as precision or recall
                score_arrays[k].append(score[k])
        # Every check_step epochs, compute the evaluation metrics for that epoch
        if epoch % check_step == 0:
            averaged_loss_train = np.mean(score_arrays["loss"])
            sum_tp_train = np.sum(score_arrays["tp"])
            sum_fp_train = np.sum(score_arrays["fp"])
            sum_fn_train = np.sum(score_arrays["fn"])
            p_train, r_train, f_train = compute_prf(sum_tp_train, sum_fp_train, sum_fn_train)
            train_score_history["loss"].append(averaged_loss_train)
            train_score_history["precision"].append(p_train)
            train_score_history["recall"].append(r_train)
            train_score_history["fscore"].append(f_train)
            eval_score_arrays = evaluate_model(dataloader_validation, model, threshold=threshold)
            averaged_loss_eval = np.mean(eval_score_arrays["loss"])
            sum_tp_eval = np.sum(eval_score_arrays["tp"])
            sum_fp_eval = np.sum(eval_score_arrays["fp"])
            sum_fn_eval = np.sum(eval_score_arrays["fn"])
            p_eval, r_eval, f_eval = compute_prf(sum_tp_eval, sum_fp_eval, sum_fn_eval)
            eval_score_history["loss"].append(averaged_loss_eval)
            eval_score_history["precision"].append(p_eval)
            eval_score_history["recall"].append(r_eval)
            eval_score_history["fscore"].append(f_eval)
            
    # Return the evaluation metrics
    return train_score_history, eval_score_history

The function below computes precision, recall, and F-score.

def compute_prf(tp, fp, fn):
    """
    Compute precision, recall, and f-score based on the confusion matrix.
    
    Parameters
    ----------
    tp : int or float
        Number of true positives.
    fp : int or float
        Number of false positives.
    fn : int or float
        Number of false negatives.
    
    Returns
    -------
    tuple of float
        Precision, recall, and f-score.
    """
    tpfp = tp + fp
    tpfn = tp + fn
    precision = 0 if tpfp == 0 else tp/tpfp
    recall = 0 if tpfn == 0 else tp/tpfn
    fscore = 0 if precision+recall==0 else 2*(precision*recall)/(precision+recall)
    return precision, recall, fscore
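
For example, with 10 true positives, 5 false positives, and 5 false negatives, precision, recall, and F-score are all 10/15 ≈ 0.667:

p, r, f = compute_prf(tp=10, fp=5, fn=5)
print(p, r, f) # 0.666..., 0.666..., 0.666...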

Next, we need a function to evaluate the model on the validation or test set.

def evaluate_model(dataloader, model, threshold=0.5):
    """
    Evaluate a PyTorch model and print model performance metrics.

    Parameters
    ----------
    dataloader : torch.utils.data.DataLoader
        The dataloader object.
    model : torch.nn.Module
        The PyTorch model that we want to evaluate.
    threshold : float
        The threshold of probability to decide if the output should be 1.
    
    Returns
    -------
    dict of arrays
        A dictionary of evaluation metrics.
    """
    model.eval() # set the model to evaluation mode
    
    # Since we do not want to train the model, make sure that we deactivate the gradients
    with torch.no_grad():
        score_arrays = binary_scorer() # get the empty structure
        score_arrays["loss"] = [] # add the field for appending the loss
        # Loop the dataloader
        for x, y in dataloader:
            y_pred = model(x) # make predictions using the model
            loss = criterion(y_pred, y) # compute the loss
            y_label = torch.sigmoid(y_pred) # turn model output into probabilities
            y_label = (y_label > threshold).float() # turn probabilities into labels of 0 or 1
            score = binary_scorer(y_label, y) # compute evaluation metrics
            # Append the evaluation metrics to the arrays
            for k in score:
                score_arrays[k].append(score[k])
            score_arrays["loss"].append(loss.item()) # append the loss

    # Return the evaluation metrics
    return score_arrays

Finally, we can run the functions to train the model for several epochs and evaluate it on the validation set. In practice, you can train the model for many epochs, save the model every X epochs (e.g., X=5), and pick the model with the highest performance on the validation set. The training code may take a while to run, depending on the performance of your computer.

check_step = 1
num_epochs = 80
t_history, v_history = train_model(dataloader_train, model, criterion, optimizer, num_epochs=num_epochs, check_step=check_step)
100%|███████████████████████████████████████████| 80/80 [00:27<00:00,  2.95it/s]

Plot the F-scores and loss for the training and validation sets.

epochs = [i * check_step for i in range(0, len(t_history["loss"]))]
plt.figure(figsize=(18, 6))
plt.plot(epochs, t_history["fscore"], marker='o', linestyle='-', label='Training F-score')
plt.plot(epochs, v_history["fscore"], marker='s', linestyle='--', label='Validation F-score')
plt.xlabel("Epochs")
plt.ylabel("F-score")
plt.title("Training vs Validation F-score over Epochs")
plt.xlim(-1, max(epochs))
plt.ylim(0, 1.05)
plt.xticks([i for i in range(0, max(epochs) + 10, 10)])
plt.yticks([i/20 for i in range(21)])
plt.legend()
plt.grid(True)
plt.show()
[Figure: Training vs Validation F-score over Epochs]
plt.figure(figsize=(18, 6))
plt.plot(epochs, t_history["loss"], marker='o', linestyle='-', label='Training Loss')
plt.plot(epochs, v_history["loss"], marker='s', linestyle='--', label='Validation Loss')
plt.xlabel("Epochs")
plt.ylabel("Loss")
plt.xlim(-1, max(epochs))
plt.xticks([i for i in range(0, max(epochs) + 10, 10)])
plt.title("Training vs Validation Loss over Epochs")
plt.legend()
plt.grid(True)
plt.show()
[Figure: Training vs Validation Loss over Epochs]
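
As a final step, the trained model could also be evaluated on the held-out test set, reusing the functions defined above. The sketch below wraps the test set in a DataLoader first (the dataloader_test name is not used elsewhere in this notebook).

# Wrap the test set in a DataLoader and move its batches to the chosen device
dataloader_test = DeviceDataLoader(DataLoader(dataset_test, batch_size=168, shuffle=False), device)
test_scores = evaluate_model(dataloader_test, model, threshold=0.5)
p_test, r_test, f_test = compute_prf(np.sum(test_scores["tp"]), np.sum(test_scores["fp"]), np.sum(test_scores["fn"]))
print("Test precision: %.3f, recall: %.3f, f-score: %.3f" % (p_test, r_test, f_test))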

Your Task#

We have used the deep regression model to perform smell event classification on the Smell Pittsburgh dataset. Now, check the previous two graphs again. What is the biggest problem with this model and the training procedure? How do you plan to improve it? Provide your answer below:

(Write your answer here)

Now, it is your turn to implement the plan that you provided above. Edit the model architecture and the hyperparameters to improve the performance of your model. For example, you can try adding more layers, increasing the number of hidden units in the layers, tweaking the learning rate, tweaking the weight decay (i.e., regularization), etc. There is a set of hyperparameters and a model architecture that can bring your model performance to an F-score of 0.5 (or above) at some point within 80 epochs, after which the performance converges at about 0.45 F-score (or slightly higher), as in the graph below:

[Figure: example F-score curve that reaches 0.5 or above at some point and converges around 0.45]

The result should be fairly stable, which means that if you re-run the entire notebook several times, the result should be roughly the same. Explain how you found this set of hyperparameters and model architecture below.

(Write your answer here)

You can find the answer to the task in this link. Do not look at the answer first. Instead, try to do the task by yourself, and then check the answer for reference.


[1] Credit: this teaching material is created by Yen-Chia Hsu.