int vs. float/double performance for matrices (neural networks)

Sioso
Jul 28, 2023

I have started a project to write a neural network library from scratch in C++. Obviously, it is not meant to compete with TensorFlow, SciKit, PyTorch, etc. but it (hopefully) will force me to learn the inner workings of deep learning, including the mathematical concepts, as well as allow me to customize networks better.

Starting with a simple ANN, I quickly realized I need a matrix computation library. For C++, the most popular would probably be Eigen.

After doing a good amount of research on neural networks, once I started playing around with Eigen I had to make a choice about data storage in my NN: should I store the network's values as integer or floating-point types? Int should be faster, right? "Integer arithmetic is generally faster than floating-point arithmetic" is what a quick Google search gives, and that was my experience as well.

But most networks seem to use float/double types. Although int8 quantization has become more popular for inference, the standard is still to train in float.

I decided to run my own tests to be sure (take these with a grain of salt, as the answer probably depends on your hardware):

#include <iostream>
#include <Eigen/Dense>
#include <chrono>

using namespace std;
using namespace Eigen;
using namespace chrono;

int main()
{
    const int dimension = 500;

    // MatrixXi/MatrixXf/MatrixXd are dynamically-sized matrices of int/float/double
    MatrixXi intMatrixA(dimension, dimension);
    MatrixXi intMatrixB(dimension, dimension);
    intMatrixA.setRandom();
    intMatrixB.setRandom();

    MatrixXf floatMatrixA(dimension, dimension);
    MatrixXf floatMatrixB(dimension, dimension);
    floatMatrixA.setRandom();
    floatMatrixB.setRandom();

    MatrixXd doubleMatrixA(dimension, dimension);
    MatrixXd doubleMatrixB(dimension, dimension);
    doubleMatrixA.setRandom();
    doubleMatrixB.setRandom();

    // Time the int product
    auto start = high_resolution_clock::now();
    MatrixXi resultI = intMatrixA * intMatrixB;
    auto stop = high_resolution_clock::now();
    cout << duration_cast<milliseconds>(stop - start).count() << "\n";

    // Time the float product
    start = high_resolution_clock::now();
    MatrixXf resultF = floatMatrixA * floatMatrixB;
    stop = high_resolution_clock::now();
    cout << duration_cast<milliseconds>(stop - start).count() << "\n";

    // Time the double product
    start = high_resolution_clock::now();
    MatrixXd resultD = doubleMatrixA * doubleMatrixB;
    stop = high_resolution_clock::now();
    cout << duration_cast<milliseconds>(stop - start).count() << "\n";
}
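
A quick note on the setup: Eigen parallelizes its matrix products through OpenMP, so this only runs multithreaded if it is compiled with OpenMP and optimizations turned on (for GCC/Clang, something like -O3 -march=native -fopenmp; the exact flags are an assumption for a typical setup). The thread count can be checked or set from code with a minimal sketch like this:

#include <iostream>
#include <Eigen/Dense>

int main()
{
    // With OpenMP enabled, Eigen defaults to the available hardware threads
    // (or whatever OMP_NUM_THREADS says); setNbThreads overrides that.
    Eigen::setNbThreads(16);
    std::cout << "Eigen is using " << Eigen::nbThreads() << " threads\n";
}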

The results in milliseconds, over two runs (note that parallel computation has been enabled through Eigen, with 16 cores):

Run 1: int 5032, float 1072, double 2266
Run 2: int 5048, float 981, double 2180

Much to my surprise at the time, the integer matrix multiplication performs worse than the float/double matrix multiplication. By quite a margin!

With some more research, I found that this comes down largely to the hardware and to how Eigen uses it. Modern desktop CPUs have heavily optimized SIMD floating-point units with fused multiply-add (FMA) instructions, and Eigen's matrix product kernels are tuned to exploit them for float and double; as far as I can tell, the integer path does not benefit from the same optimizations, so in practice floating-point matrix multiplication ends up faster than the integer version.
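
An easy way to see what a given build is actually using is to print the SIMD instruction sets Eigen was compiled with (on a typical modern x86-64 machine this includes things like AVX and FMA, which are exactly the paths the float/double products benefit from). A minimal check:

#include <iostream>
#include <Eigen/Core>

int main()
{
    // Prints the SIMD instruction sets Eigen was compiled to use,
    // e.g. SSE/AVX/FMA on a typical modern x86-64 build.
    std::cout << Eigen::SimdInstructionSetsInUse() << "\n";
}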

So for my purposes, I'll be using float for my neural network (at least at first), which also makes the math slightly easier, since it is much easier to avoid overflow errors with floating point.
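
As a small illustration of the overflow point (the specific numbers here are just my own example to make the arithmetic concrete): a single entry of a 500x500 integer matrix product is a 500-term dot product, and with 32-bit ints it already overflows once the entries reach a couple of thousand, while float carries the magnitude without issue.

#include <iostream>
#include <limits>
#include <Eigen/Dense>

int main()
{
    const int n = 500;

    // With entries of 2100, the exact 500-term dot product is
    // 500 * 2100 * 2100 = 2,205,000,000, past INT_MAX (2,147,483,647),
    // so a plain int matrix product would already overflow here.
    Eigen::VectorXi a = Eigen::VectorXi::Constant(n, 2100);
    Eigen::VectorXi b = Eigen::VectorXi::Constant(n, 2100);

    long long exact = static_cast<long long>(n) * 2100LL * 2100LL;
    std::cout << "exact sum: " << exact << "\n";
    std::cout << "INT_MAX:   " << std::numeric_limits<int>::max() << "\n";

    // The same computation in float keeps the right magnitude (with some
    // rounding), which is why training in float sidesteps this class of bugs.
    Eigen::VectorXf af = a.cast<float>();
    Eigen::VectorXf bf = b.cast<float>();
    std::cout << "float dot: " << af.dot(bf) << "\n";
}

Float's range (up to roughly 3.4e38) means sums like these never get close to overflowing, so the only thing given up is some precision.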
