----------------------------------------------------------------------------------------------------------------------------

Index file details

The index file is like a lookup table that maps each FileIdentifier (a GUID) to an offset in the data file (where the actual data is stored). See the picture below for a highlevel overview of the index file, please note that its a simplified version of the index file, in the next section we'll go into some more details.


Details of the index file can be seen in the picture below. Basically each byte in the file identifier (a GUID) is an index in a lookup table. The last lookup table is a pointer to the offset in the data file.



As you can see in the picture, one struct (let's call it an allocation table) for one byte of a GUID contains 256 positions, where-as each position holds a 64bit pointer to the next allocation table. If we would store one GUID in the storage, this will have costed us 16 x 1024 + 100 bytes (16 because there are 16 bytes in a GUID, 1024 as 256 positions x 8 byte (=64 bit) and 100 because the header of the file is 100 bytes). Depending on similarity between the GUIDs in the storage, the index file will become bigger, or will keep its size; looks at the following scenario's:
  • Scenario A

The GUID (255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,0) and the GUID (255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255) would 'share' all their 16 structs, thus if both would be stored in the same FileStorage, both will fit in 16 x 1024 + 100 bytes.
  • Scenario B

The GUID (255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,0) and the GUID (0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0) would only 'share' their first (1) struct. Note all GUIDs will always share their first index struct.

How to determine a free GUID

A nice feature of a GUID is that its practically impossible (although theoretically possible) to generate an identical GUID once you pick a random one. Random GUIDs are picked using the static function

GUID.NewGuid()

The fact that the generated GUID is unique is nice, but is less good news for the index approach we picked for NFileStorage; when inserting new files using random GUIDs will generate a tremendous amount of these structs. So if you want to create a more efficient GUID generator for NFileStorageHandler oposed to using GUID.NewGuid(), try to keep the initial byte sequences of GUIDs the same, and try only to vary the byte values at the end of the array of bytes (this will create the most efficient index structure, and thus consume the fewest bytes). The most ideal ordering of the GUIDs would ofcourse be to have a sequential one like so:

GUID BYTES (0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0)
GUID BYTES (0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1) 
GUID BYTES (0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2) 
...

----------------------------------------------------------------------------------------------------------------------------

Last edited May 30, 2009 at 10:40 AM by barkgj, version 10

Comments

No comments yet.