Learning cos(x) using torch::Sequential from the PyTorch C++ API
This post covers the use of the PyTorch C++ API for approximating a function of a single variable with a Neural Network (NN). We'll use the Sequential container to build the NN with very little C++ code and train it on $(x, cos(x))$ data.
The Sequential container in PyTorch
The Sequential container chains a sequence of PyTorch modules, i.e. layers of a Neural Network (NN), into a list. The Sequential container then forwards operations to the entire sequence of modules without additional code: for example, a call to to() is applied to every module in Sequential, and a call to forward() is chained automatically through all modules. This way, Sequential makes it possible to combine network layers and activation functions into NNs without writing structs or classes in C++.
Unnamed modules are chained inside the Sequential container like this:
torch::nn::Sequential model(
    torch::nn::Linear(1, 64),
    torch::nn::Tanh(),
    torch::nn::Linear(64, 1)
);
This creates a NN with a single scalar input, two linear layers with 64 hidden neurons and a tanh activation function between them, and an output layer with a single scalar output.
Named sub-modules can be passed to the Sequential model like this
torch::nn::Sequential model({
    {"in", torch::nn::Linear(1, 64)},
    {"tanh1", torch::nn::Tanh()},
    {"out", torch::nn::Linear(64, 1)}
});
or added using Sequential::push_back, as sketched below. Multiple Sequential models can also be joined together into larger networks.
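For illustration, here is a minimal sketch (assumed, not taken from the original application) that builds the same network incrementally with push_back, using the named overload:
// Build the same network step by step with Sequential::push_back
// (the unnamed overload push_back(module) also exists).
torch::nn::Sequential model;
model->push_back("in", torch::nn::Linear(1, 64));
model->push_back("tanh1", torch::nn::Tanh());
model->push_back("out", torch::nn::Linear(64, 1));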
The torch::nn::Sequential type is a module holder that wraps a std::shared_ptr to the underlying implementation, so it can be dereferenced for convenience
auto& model_ref = *model;
Then, Sequential member functions can be called with the . access operator
model_ref.pretty_print(cout);
which also makes it easier to loop over sub-modules using range-based for loops.
Note: the torch::nn::Sequential container stores its sub-modules as the torch::nn::AnyModule type:
The PyTorch C++ API does not impose an interface on the signature of forward() in Module subclasses. This gives you complete freedom to design your forward() methods to your liking. However, this also means there is no unified base type you could store in order to call forward() polymorphically for any module. This is where the AnyModule comes in. Instead of inheritance, it relies on type erasure for polymorphism.
To access the underlying module, we use the torch::nn::AnyModule::ptr() member function; then we can inspect the names and parameters of the sub-modules of Sequential
auto& model_ref = *model;
for (auto module : model_ref)
{
    cout << module.ptr()->name() << endl;
    cout << module.ptr()->parameters() << endl;
}
Training a PyTorch Sequential model on $cos(x)$
We will train the model on the $cos(x)$ function. To do this, we use the periodicity of $cos(x)$: if $f(x + T) = f(x)$, then $f$ is a periodic function with period $T$. This means we can train the model on the interval $[0, T]$ and then wrap the trained model so that model_periodic(x + T) returns model(x). Otherwise, we could try to train the model on $cos(x)$ over many periods, but this would not lead to much success.
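As an illustration, such a periodic wrapper could map the input back into one period with torch::fmod before evaluating the model. The helper model_periodic below is a hypothetical sketch (not part of the application code) and assumes non-negative inputs:
// Hypothetical wrapper: map x into [0, T) and evaluate the trained model there.
torch::Tensor model_periodic(torch::nn::Sequential& model,
                             const torch::Tensor& x,
                             double T)
{
    return model->forward(torch::fmod(x, T));
}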
Generating input data
We take a sequence of $1000$ samples from $[0, 3\pi]$.
// Preparing sample data
static const int N_SAMPLES = 1000;
torch::Tensor x_sequence = torch::linspace(0, 3*M_PI, N_SAMPLES);
// Reshape and save x_sequence
x_sequence = x_sequence.reshape({N_SAMPLES,1});
torch::save(x_sequence, "x_sequence.pt");
// Compute cos(x) and save y_sequence
torch::Tensor y_sequence = torch::cos(x_sequence);
torch::save(y_sequence, "y_sequence.pt");
The reason for using $3\pi$ is explained in the Results section, together with the other NN hyperparameters.
The 1D $x$ and $cos(x)$ sample sequences are reshaped to the shape $(N_{samples}, 1)$ so that they match the first and last modules of the model, which expect a single (scalar) input and output feature, respectively.
Next, we create the training sequence as a subset of the input sequence
// SAMPLE DATA: x, cos(x)
// Training / validation split: 70 / 30
torch::Tensor shuffled_indices = torch::randperm(
    N_SAMPLES,
    torch::TensorOptions().dtype(at::kLong)
);
auto n_val = int(0.7 * N_SAMPLES);
torch::Tensor training_indices =
    shuffled_indices.index({Slice(0, n_val)});
torch::Tensor x_training = x_sequence.index({training_indices});
torch::Tensor y_training = y_sequence.index({training_indices});
Note that we need an integer type for the indices into the $x$ and $cos(x)$ sequences, and the default dtype of a tensor created in PyTorch is float, so we pass the type argument at::kLong to dtype to make sure shuffled_indices holds integers.
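As a side note, a ScalarType converts implicitly to TensorOptions (to my knowledge), so the randperm call above can be written more compactly:
// Equivalent, shorter form: pass the dtype directly.
torch::Tensor shuffled_indices = torch::randperm(N_SAMPLES, torch::kLong);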
Training the model
We select the optimizer that we'll use to minimize the error and set its learning rate. We also need tensors to store the model prediction and the loss values
// TRAIN THE MODEL ON THE TRAINING SET
torch::optim::Adam optimizer(model->parameters(), 0.01);
torch::Tensor training_prediction = torch::zeros_like(x_training);
torch::Tensor loss_values = torch::zeros_like(x_training);
Here is the training loop:
ofstream conv_file("convergence_data.csv");
conv_file << "max_loss\n";
for (size_t epoch = 1; epoch <= 1000; ++epoch)
{
    optimizer.zero_grad();
    training_prediction = model->forward(x_training);
    loss_values = torch::mse_loss(training_prediction, y_training);
    loss_values.backward();
    optimizer.step();
    // Report the error with respect to y_training.
    double max_loss = loss_values.max().item<double>();
    cout << "Epoch " << epoch
         << ", max(loss_values) = " << max_loss << endl;
    conv_file << max_loss << "\n";
}
PyTorch accumulates the weight gradients of the network over subsequent backward propagations, so optimizer.zero_grad() is called to zero the gradients and ensure that previous passes do not influence the current gradient direction. From the official documentation:
torch.Tensor is the central class of PyTorch. When you create a tensor, if you set its attribute .requires_grad as True, the package tracks all operations on it. This happens on subsequent backward passes. The gradient for this tensor will be accumulated into .grad attribute. The accumulation (or sum) of all the gradients is calculated when .backward() is called on the loss tensor.
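As a small standalone illustration (assumed, not from the application code), two backward passes without zeroing the gradient in between add up in the .grad attribute:
// Standalone sketch: gradients accumulate across backward passes.
torch::Tensor w = torch::ones({1}, torch::requires_grad());
(w * 2).sum().backward();   // w.grad() is now 2
(w * 3).sum().backward();   // w.grad() is now 2 + 3 = 5 (accumulated)
std::cout << w.grad() << std::endl;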
Validating the model
For the validation, we use the last 30% of the randomly shuffled indices
// VALIDATE THE MODEL WITH THE VALIDATION SET
torch::Tensor validation_indices =
    shuffled_indices.index({Slice(n_val, N_SAMPLES)});
torch::Tensor x_validation =
    x_sequence.index({validation_indices});
torch::Tensor y_validation =
    y_sequence.index({validation_indices});
torch::Tensor validation_values =
    model->forward(x_validation);
torch::Tensor validation_loss =
    torch::mse_loss(validation_values, y_validation);
cout << "Validation max(validation_loss) = "
     << torch::max(validation_loss) << endl;
// REPORT THE PREDICTION OVER THE COMPLETE INPUT
torch::Tensor y_model_sequence = model->forward(x_sequence);
torch::save(y_model_sequence, "y_model_sequence.pt");
Results
In the preparation of the training set we didn't use the period $T=2\pi$ of $cos(x)$; instead we used $3\pi$. The reason is the asymmetry I found in the approximation of $cos(x)$ with this NN over $[0,2\pi]$: the NN seems not to capture the non-linearity of $cos(x)$ at the right end of the interval $[0,2\pi]$. The training and validation $L_\infty$ norms of the MSE loss are
Epoch 1000, max(loss_values) = 0.000760586
Validation max(validation_loss) = 0.000676395
So it seems the NN does not overfit, and since the model lacks accuracy near $2\pi$, one could think that it simply needs more neurons (an increase in nonlinearity). I tried adding more neurons and more hidden layers, and changing the loss function (to smooth $L_1$) and the activation functions: this changes the results, but the asymmetry in the approximation remains.
Interestingly, if the same model is trained on $[0,4\pi]$, the losses become
Epoch 1000, max(loss_values) = 0.00219363
Validation max(validation_loss) = 0.0024011
The convergence behavior of the maximal MSE loss over the epochs can be plotted from convergence_data.csv.
The validation error is larger, because the asymmetry this time appears at $4\pi$, but the same model seems to capture the nonlinearity on $[0,2\pi]$. Since the point of this post was to document torch::Sequential, I'll leave out further hyperparameter tuning of the NN for this example.
Thanks to Andre Weiner for the help with hyperparameters and the model IO!
Data & Code
The application is in applications/test/aiFoamTestPyTorch. You can compile it with wmake if you have OpenFOAM installed and run it anywhere, or with cmake
?> mkdir build && cd build && cmake .. && make
and then run ./aiFoamTestPyTorch in the build directory.
When compiling it within OpenFOAM, make sure OpenFOAM is built with support for C++14 by setting -std=c++14 (or newer, e.g. -std=c++2a) in
$WM_PROJECT_DIR/wmake/rules/General/Gcc/c++
if you are using the gcc compiler, or in the corresponding sub-folder of General for the compiler you use to build OpenFOAM.