I recently needed to read an entire file into a byte array and then extract integers from it. It turns out this is very possible, but the result is platform-dependent, since the byte ordering (or endianness) can differ between machines.

For example, Little-Endian machine A has an integer that it needs to write to disk: 12345. Big-Endian machine B has the same integer that it needs to write to disk as well.

The hexadecimal representation of the file on machine A would look like: 39 30 00 00 while machine B’s output would look like: 00 00 30 39.

If machine B were to read machine A’s file as-is, the result would not be 12345 as expected but 959447040 (0x39300000) instead. Now that’s quite a problem. Unless the byte order is taken into account, any file written on machine A would be useless on a machine with the opposite byte order.
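To make the mismatch concrete, here is a minimal sketch that interprets the same four bytes both ways. The helper names ReadLittleEndian32 and ReadBigEndian32 are made up for this illustration, and the sketch assumes a 32-bit value held in std::uint32_t.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration; not part of any library.

// Interpret four bytes with the least significant byte first.
std::uint32_t ReadLittleEndian32(const unsigned char Bytes[4])
{
	return  std::uint32_t(Bytes[0])
	     | (std::uint32_t(Bytes[1]) << 8)
	     | (std::uint32_t(Bytes[2]) << 16)
	     | (std::uint32_t(Bytes[3]) << 24);
}

// Interpret the same four bytes with the most significant byte first.
std::uint32_t ReadBigEndian32(const unsigned char Bytes[4])
{
	return (std::uint32_t(Bytes[0]) << 24)
	     | (std::uint32_t(Bytes[1]) << 16)
	     | (std::uint32_t(Bytes[2]) << 8)
	     |  std::uint32_t(Bytes[3]);
}
```

Given the bytes 39 30 00 00, ReadLittleEndian32 yields 12345 while ReadBigEndian32 yields 959447040.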

Be aware that the raw bytes alone don’t tell you whether a file is Big- or Little-Endian, so you need to set a standard for your file format. I use Little-Endian byte ordering since that’s my machine’s native format and, most likely, yours too, as the x86 family of processors is Little-Endian.

So in order to read any of the files we have on disk, regardless of endianness, we need to know the byte order the file was written in (by convention, since it can’t be detected) and detect the endianness of our current machine. The latter is not as big a pain as you might suspect, since machines, just like files, order their memory in Big- or Little-Endian byte ordering, and that can be checked at run time.

C++ Code:

const bool IsLittleEndian(void)
{
	static const signed int Bytes = 58;
	static const bool Test = (*reinterpret_cast<const unsigned char*>(&Bytes) == 58);
	return(Test);
}

A little explanation of the code above: the first line in the function sets the value of the integer to 58 (the integer is static, so it is only initialized once). On a Little-Endian machine its bytes in memory are 3A 00 00 00; on a Big-Endian machine they are 00 00 00 3A.
An integer is typically four bytes long. The number 58 is smaller than the maximum value of one byte, 255, so it fits entirely in the least significant byte, which on a Little-Endian machine is the first byte in memory.
The next line takes the address of the integer and compares the first byte in memory (index zero) with the number 58. The function returns true on a Little-Endian machine and false on a Big-Endian one.
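If you want to see the memory layout for yourself, a small sketch like the following can help. BytesInMemory is a hypothetical helper, not something from the code above; it copies the integer’s bytes out with std::memcpy and formats them in the order they sit in memory.

```cpp
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical helper: returns the bytes of Value in memory order,
// formatted as hexadecimal pairs separated by spaces.
std::string BytesInMemory(int Value)
{
	unsigned char Raw[sizeof(int)];
	std::memcpy(Raw, &Value, sizeof(int)); // copy the object representation

	std::string Out;
	for (std::size_t i = 0; i < sizeof(int); ++i)
	{
		char Hex[4];
		std::snprintf(Hex, sizeof(Hex), "%02X", static_cast<unsigned>(Raw[i]));
		if (i != 0)
		{
			Out += ' ';
		}
		Out += Hex;
	}
	return Out;
}
```

With a four-byte int, BytesInMemory(58) gives "3A 00 00 00" on a Little-Endian machine and "00 00 00 3A" on a Big-Endian one.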

Now that we have this out of the way, you can save your file in the endianness that you want so you retain portability. From this point on, I’m going to assume you have a Little-Endian system.
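As a rough sketch of what saving in a fixed byte order could look like, the function below writes an integer to a byte array least significant byte first, whatever the host’s own byte order is. EncodeInt is a hypothetical name chosen for this example, and the caller is assumed to provide room for sizeof(int) bytes.

```cpp
#include <cstddef>

// Hypothetical helper: stores Value in Little-Endian order.
// Assumes ByteArray has room for at least sizeof(int) bytes.
void EncodeInt(int Value, char ByteArray[])
{
	// Work on an unsigned copy so the right shift is well defined
	// for negative values too.
	unsigned int Bits = static_cast<unsigned int>(Value);

	for (std::size_t i = 0; i < sizeof(int); ++i)
	{
		ByteArray[i] = static_cast<char>(Bits & 0xFF); // lowest 8 bits first
		Bits >>= 8;                                     // bring the next byte down
	}
}
```

Because the bytes are produced with shifts rather than by copying memory, the output for 12345 is 39 30 00 00 on any machine.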

The next obstacle I faced in trying to convert an array of bytes into an integer was the actual conversion. Consider again our previous example of integer 12345 in Little-Endian: 39 30 00 00. If we were to read this sequence from left to right, treating the first byte as the most significant one, the number would be wrong again. Because of this we need to read the bytes one by one, starting from the last, and insert them into the desired integer.

C++ Code:

const size_t IntSize = sizeof(int);

const int DecodeInt(const char ByteArray[])
{
	int RetVal = 0;

	for(int i = int(IntSize - 1); i > -1; --i)
	{
		RetVal |= int(ByteArray[i] & 0xFF);
		if(0 != i)
		{
			RetVal <<= 8;
		}
	}

	return(RetVal);
}

Alright, that’s a bit of code there, let’s go over it line by line.

const size_t IntSize = sizeof(int);
The first line is simply a constant I’ve added to improve readability a bit; it equals the size of an integer in bytes (typically 4).

int RetVal = 0;
The return value is a signed integer initialized to zero (so we don’t have numerical surprises).

const int DecodeInt(const char ByteArray[])
The actual function takes a byte-array as parameter but will only process the first IntSize bytes of it (4 on most platforms).

for(int i = int(IntSize - 1); i > -1; --i)
As you might have noticed, the loop inside the function runs backwards. On disk the least significant byte comes first, so we start at the most significant byte (the last one) and work our way down. The condition i > -1 keeps the loop going through index 0, because we also want to process that byte.

RetVal |= int(ByteArray[i] & 0xFF);
This line is part one of the bread and butter of this function. It masks the byte value with 0xFF (255) so that only its lowest 8 bits survive, then bitwise ORs the result into the integer. This basically results in something like:

10011101 (0x9D, some value from a byte-array)
11111111 (0xFF, the 255 mask)
-------- AND
10011101 (0x9D, unchanged, since 0x9D fits in 8 bits)

10011101 (0x9D, the result of the AND operation)
00000000 (0x00, the integer RetVal's value at this point)
-------- OR
10011101 (0x9D, the result is the same, since RetVal was still zero)

If you're sure that the values you're using are between 0 and 255 (for example because you read into unsigned char), you don't need the mask, but it's in there regardless. It matters when char is signed: a byte such as 0x9D is then a negative value, and converting it to an integer sign-extends it to 0xFFFFFF9D, which would overwrite the higher bytes already stored in RetVal. The mask strips those extra bits.
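Here is a small sketch of what the mask changes. WidenWithoutMask and WidenWithMask are hypothetical helpers written for this illustration; they take signed char explicitly so the behaviour doesn’t depend on whether plain char is signed on your compiler.

```cpp
// Hypothetical helpers showing the effect of the 0xFF mask.

int WidenWithoutMask(signed char Byte)
{
	return int(Byte);        // sign-extends: the upper bits copy the sign bit
}

int WidenWithMask(signed char Byte)
{
	return int(Byte & 0xFF); // keeps only the low 8 bits: always 0..255
}
```

The byte 0x9D stored in a signed char has the value -99. WidenWithoutMask returns -99 (bit pattern 0xFFFFFF9D in a 32-bit int), while WidenWithMask returns 157 (0x9D), which is the value we want to OR into RetVal.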

if(0 != i){ RetVal <<= 8; }
Translates to: if the current index is not the last (0), shift the value of RetVal 8 positions (the number of bits in a byte) to the left. This makes room for the next byte and ensures that the bytes already inserted end up in the correct position. The last iteration does not shift, since its byte is already in the correct place.

Sample iteration for value: 3258794

{ 0xAA 0xB9 0x31 0x00 } = Byte Array for value 3258794

0000 0000 0000 0000 0000 0000 0000 0000 RetVal = 0x00
0000 0000 0000 0000 0000 0000 0000 0000 After OR-ing 0x00
0000 0000 0000 0000 0000 0000 0000 0000 After Left Shift of 8
0000 0000 0000 0000 0000 0000 0011 0001 After OR-ing 0x31
0000 0000 0000 0000 0011 0001 0000 0000 After Left Shift of 8
0000 0000 0000 0000 0011 0001 1011 1001 After OR-ing 0xB9
0000 0000 0011 0001 1011 1001 0000 0000 After Left Shift of 8
0000 0000 0011 0001 1011 1001 1010 1010 After OR-ing 0xAA

Result: 1100011011100110101010, or the decimal value of 3258794
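Since the starting point was a whole file sitting in a byte array, here is one possible sketch for pulling a value out at a given offset. This is a variation on the function above, not the same code: DecodeUInt32At is a hypothetical name, it uses std::uint32_t so the width is fixed at four bytes and the shifts never touch a sign bit, and it assumes the buffer is Little-Endian.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: decodes a Little-Endian 32-bit value starting at Offset.
// Returns false (leaving Out untouched) if fewer than four bytes remain.
bool DecodeUInt32At(const std::vector<char>& Buffer, std::size_t Offset, std::uint32_t& Out)
{
	if (Buffer.size() < 4 || Offset > Buffer.size() - 4)
	{
		return false;
	}

	std::uint32_t Value = 0;
	for (int i = 3; i >= 0; --i)
	{
		// Shift first, then OR, so the last byte needs no special case.
		Value = (Value << 8) | std::uint32_t(Buffer[Offset + i] & 0xFF);
	}

	Out = Value;
	return true;
}
```

For a buffer holding { 0xAA 0xB9 0x31 0x00 }, calling it with offset 0 produces 3258794, and calling it with offset 1 fails because only three bytes are left.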

Good luck :)