[SOLVED] Generating huge amount of hashes

Issue

I want to generate a large amount (10 TB) of seemingly random, but predictable numbers. The generation speed should exceed that of fast SSDs, so I want 3000 MB/s to 4000 MB/s.

After the file has been written, the numbers will be read again and generated again, so that they can be compared. The total program is supposed to check disks.

At the moment I’m thinking of hashes. To keep the output predictable, the data to be hashed is just an 8-byte number (ulong). The binary file then looks like this:

<32 bytes of SHA256(0)> <32 bytes of SHA256(1)> ...

I don’t think I can use a Random number generator with a seed, because I can’t tell the random number generator to generate the nth number. But I can tell the SHA256 algorithm to calculate SHA256(n).
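To make the layout concrete, here is a minimal sketch of how any block can be recomputed from its position in the file. It assumes .NET 5 or later (for the static SHA256.HashData) and the little-endian byte order produced by the manual shifts in the test code. The names BlockCheck, ExpectedBlock and Verify are hypothetical helpers for illustration only.

```csharp
using System;
using System.Buffers.Binary;
using System.Security.Cryptography;

static class BlockCheck
{
    const int HashSize = 32; // SHA-256 output length in bytes

    // Hypothetical helper: recompute the 32-byte block stored at index n.
    public static byte[] ExpectedBlock(ulong n)
    {
        Span<byte> input = stackalloc byte[8];
        // Low byte first, matching ba[0] = n >> 0, ba[1] = n >> 8, ...
        BinaryPrimitives.WriteUInt64LittleEndian(input, n);
        return SHA256.HashData(input);
    }

    // Hypothetical helper: check a chunk read back from disk.
    // Assumes fileOffset and chunk.Length are multiples of 32.
    public static bool Verify(long fileOffset, ReadOnlySpan<byte> chunk)
    {
        ulong first = (ulong)(fileOffset / HashSize); // index of the first block in the chunk
        for (int i = 0; i < chunk.Length / HashSize; i++)
        {
            ReadOnlySpan<byte> expected = ExpectedBlock(first + (ulong)i);
            if (!chunk.Slice(i * HashSize, HashSize).SequenceEqual(expected))
                return false;
        }
        return true;
    }
}
```

Because the block at byte offset o depends only on o / 32, a chunk can be verified from any position without regenerating the blocks before it.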

I made a test with 128 MB of data using the SHA256 algorithm like this:

Parallel.For(0, 128 * 1024 * 1024 / 32,     // 128 MB / length of the hash
    a => {
        var sha = SHA256.Create();
        sha.Initialize();
        var ba = new byte[8];
        ba[0] = (byte)((long)a >> 0 & 0xFF);
        ba[1] = (byte)((long)a >> 8 & 0xFF);
        ba[2] = (byte)((long)a >> 16 & 0xFF);
        ba[3] = (byte)((long)a >> 24 & 0xFF);
        ba[4] = (byte)((long)a >> 32 & 0xFF);
        ba[5] = (byte)((long)a >> 40 & 0xFF);
        ba[6] = (byte)((long)a >> 48 & 0xFF);
        ba[7] = (byte)((long)a >> 56 & 0xFF);
        var hash = sha.ComputeHash(ba);
        // TODO: aggregate the byte[]s, stream to file
    }
);

With this code, the throughput is only 95 MB/s on my 8-core Ryzen 7 2700X processor running at 4.08 GHz.

Any chance of speeding this up to 4000 MB/s?

Solution

I don’t think it is possible to reach that speed without using a GPU.
But here are a few things you can do to gain some performance:

  1. You can utilize the localInit of Parallel.For to create the SHA256 object, as well as a byte array of size 8 to hold the data to be hashed, once per task.
  2. There is no need to explicitly call Initialize.
  3. Instead of converting the long to byte array manually, one byte at a time, you can use pointers or the Unsafe class to set the bytes all at once.
  4. Pre-allocate the array of bytes that will hold the hash and use TryComputeHash instead of ComputeHash since it allows passing a span for the output.

Here is code implementing the points above:

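using System.Runtime.CompilerServices;
using System.Security.Cryptography;
using System.Threading.Tasks;

Parallel.For(0, 128 * 1024 * 1024 / 32,     // 128 MB / length of the hash
  // localInit: one SHA256 object, one 8-byte input and one 32-byte output buffer per task
  () => (SHA256.Create(), new byte[8], new byte[32]),
  (a, state, tuple) =>
  {
    // Store the index into the input buffer in a single write (native byte order)
    Unsafe.As<byte, long>(ref tuple.Item2[0]) = a;
    tuple.Item1.TryComputeHash(tuple.Item2, tuple.Item3, out _);
    var hash = tuple.Item3;   // this buffer is reused, so copy it out before the next iteration
    // TODO: aggregate the byte[]s, stream to file
    return tuple;
  },
  // localFinally: dispose the per-task SHA256 object
  tuple => tuple.Item1.Dispose()
);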
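For the remaining TODO, one option is to hash straight into a shared output buffer that can later be written to the file in one call. The following is only a sketch under a few assumptions: .NET 5 or later (for the static SHA256.HashData overload that takes a destination span), a buffer whose length is a multiple of 32, and BinaryPrimitives as a safe alternative to Unsafe. HashFill and Fill are hypothetical names, and I have not measured whether this is faster than the version above.

```csharp
using System;
using System.Buffers.Binary;
using System.Security.Cryptography;
using System.Threading.Tasks;

static class HashFill
{
    const int HashSize = 32; // SHA-256 output length in bytes

    // Hypothetical helper: fills buffer with
    // SHA256(firstIndex), SHA256(firstIndex + 1), ... in order.
    public static void Fill(byte[] buffer, long firstIndex)
    {
        int count = buffer.Length / HashSize;
        Parallel.For(0, count,
            () => new byte[8],                  // per-task 8-byte input buffer
            (i, state, input) =>
            {
                // Little-endian, same layout as the manual shifts in the question
                BinaryPrimitives.WriteInt64LittleEndian(input, firstIndex + i);
                // Each index owns its own 32-byte slot, so tasks never
                // write to the same bytes and no locking is needed.
                SHA256.HashData(input, buffer.AsSpan(i * HashSize, HashSize));
                return input;
            },
            input => { });                      // nothing to dispose per task
    }
}
```

A caller could fill a buffer of a few megabytes with Fill(buffer, firstIndex), write it to the file, then advance firstIndex by buffer.Length / 32 and repeat.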

Answered By – Sohaib Jundi

Answer Checked By – Dawn Plyler (BugsFixing Volunteer)
