----------------------------------------------------------------------------------------------------------------------------

Background information about the NFileStorage project

  • Why?

I am currently working on a project (http://www.prijsvaneenhuis.nl) that stores thousands of relatively small files (currently 100,000 photos of houses in The Netherlands), which are exposed on the web to clients. While working on the project I had to make several decisions. The first important decision was how to identify the files, and the second was where to store them.
  • How to identify 100,000+ files?

In previous projects I have had quite good experiences with GUIDs (Globally Unique Identifiers), and more specifically with the so-called Combined GUIDs (http://www.constantum.com/Articles/Comb-Guids/). I therefore decided to uniquely identify each picture with a GUID.
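For readers unfamiliar with Combined (COMB) GUIDs: the idea is to overwrite part of a random GUID with a timestamp, so that GUIDs generated later also sort later in SQL Server, which compares uniqueidentifier values starting with their last 6 bytes. Below is a minimal Python sketch of that idea. It is not the code from the linked article or from NFileStorage. As a simplifying assumption it stores milliseconds since the Unix epoch in the last 6 bytes (as far as I know, the original COMB scheme derives those bytes from a SQL Server datetime value instead), and the function name new_comb_guid is hypothetical.

```python
import time
import uuid


def new_comb_guid(timestamp_ms=None):
    """Hypothetical helper: build a COMB-style GUID.

    The first 10 bytes come from a random (version 4) UUID, so its version
    and variant bits stay intact. The last 6 bytes are replaced by a
    big-endian timestamp (assumption: milliseconds since the Unix epoch).
    """
    if timestamp_ms is None:
        timestamp_ms = time.time_ns() // 1_000_000
    random_part = uuid.uuid4().bytes[:10]
    # Keep only the low 48 bits so the timestamp fits in 6 bytes.
    time_part = (timestamp_ms & 0xFFFF_FFFF_FFFF).to_bytes(6, "big")
    return uuid.UUID(bytes=random_part + time_part)
```

Calling new_comb_guid() twice, a few milliseconds apart, gives two GUIDs whose last 12 hex digits increase, while the first 20 hex digits stay random.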
  • Where to store?

The simplest option is, of course, to store the files in a specific folder that is exposed by the web server, and basically there is nothing wrong with that. However, storing 100,000+ files in one folder makes that folder nearly inaccessible when you open it in Windows Explorer. The poor man's solution is to spread the files over subfolders. A GUID like new Guid("aaaaaaaa-bbbb-cccc-dddd-123456789abc") could, for example, be stored in a folder structure like "/aaaa/aaaa/bbbb/cccc/dddd/1234/5678/9abc". The more levels of subfolders, the fewer entries each folder holds and the more accessible it becomes.

The downside shows when you have to deploy, zip, or back up these files: you pay a big penalty, because every single file and folder carries its own overhead. In my case the files themselves take up about 427 MB, which should be copied or backed up in a couple of seconds (or a few minutes at most). Another downside is that you lose some control (as I experienced myself). What if a specific file has to be deleted manually? You would have to navigate through all the levels of subfolders to locate it. So even though this option is possible, it definitely has some drawbacks.
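The GUID-to-folder mapping in the example is easy to express in code. A minimal Python sketch, assuming a hypothetical helper guid_to_folder that removes the dashes and cuts the 32 hex digits into 4-character chunks (one folder level per chunk), reproduces the example path:

```python
import uuid


def guid_to_folder(guid, chunk=4):
    """Hypothetical helper: map a GUID to a nested folder path.

    The 32 hex digits (dashes removed) are split into fixed-size chunks;
    each chunk becomes one folder level. With chunk=4 this gives 8 levels.
    """
    hex_digits = uuid.UUID(str(guid)).hex  # 32 lowercase hex chars, no dashes
    parts = [hex_digits[i:i + chunk] for i in range(0, len(hex_digits), chunk)]
    return "/" + "/".join(parts)


print(guid_to_folder("aaaaaaaa-bbbb-cccc-dddd-123456789abc"))
# /aaaa/aaaa/bbbb/cccc/dddd/1234/5678/9abc
```

Because the leading hex digits of these GUIDs are effectively random, roughly speaking nearly every one of 100,000 files ends up in its own chain of nested folders below the first level, so the tree holds several times more folders than files. That helps explain why deploying, zipping, or backing up such a tree is slow.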

Another option would be to store the files in a database, for example SQL Server. Being a true Dutchman, I decided it would be better not to spend the storage space of my SQL Server Express edition, as that edition is free but limited to (around) 4 GB of storage. In my case I want to reserve those 4 GB on SQL Server Express for the 'core' project data, such as GPS coordinates, postal codes, and streets.

I therefore decided it would be best to bundle the files into one big file on the file system, somewhat like a WinZip archive but without the compression (as I want the data to be available quickly). Storing the data in one big file of course also requires an index to be maintained. In my approach I don't mind much how big the index file grows (nowadays 1 TB of hard disk space costs around 75-100 EUR, I believe), as long as the files can be accessed quickly.
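To make the "one big file plus an index" idea concrete, here is a minimal Python sketch. It is not the NFileStorage file format; the class name BundledFileStore and the plain-text index layout (one "<guid> <offset> <length>" line per file) are assumptions for illustration. Each file is appended uncompressed to a single data file, and reading it back takes one seek and one read.

```python
import os
import uuid


class BundledFileStore:
    """Hypothetical sketch, not the actual NFileStorage format.

    All blobs are appended, uncompressed, to one data file; a separate
    index file records "<guid> <offset> <length>" for every blob.
    """

    def __init__(self, data_path, index_path):
        self.data_path = data_path
        self.index_path = index_path
        self.index = {}  # uuid.UUID -> (offset, length)
        # Create both files if they do not exist yet.
        open(data_path, "ab").close()
        open(index_path, "a").close()
        # Rebuild the in-memory index; a later line for the same GUID wins.
        with open(index_path) as f:
            for line in f:
                guid, offset, length = line.split()
                self.index[uuid.UUID(guid)] = (int(offset), int(length))

    def store(self, guid, data):
        """Append data to the data file and record where it was written."""
        with open(self.data_path, "ab") as f:
            f.seek(0, os.SEEK_END)
            offset = f.tell()
            f.write(data)
        with open(self.index_path, "a") as f:
            f.write(f"{guid} {offset} {len(data)}\n")
        self.index[guid] = (offset, len(data))

    def retrieve(self, guid):
        """Read a blob back with a single seek and read."""
        offset, length = self.index[guid]
        with open(self.data_path, "rb") as f:
            f.seek(offset)
            return f.read(length)
```

One consequence of this append-only layout: overwriting or deleting an entry leaves dead bytes in the data file, so a real implementation would need some form of compaction. The sketch ignores that, as well as concurrent access.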
  • Why on CodePlex?

I have used quite a few CodePlex and SourceForge projects myself and have been pretty happy with them. I initially planned to put the project on SourceForge (I don't know exactly why), but I had to wait more than a day for a confirmation before I could start the project there. I wanted to get started right away, which was possible on CodePlex, so thumbs up for CodePlex :).
  • Why share it in the first place?

Every project can benefit from having a whole bunch of people look at it and help improve the overall result. I am pretty sure NFileStorage can be useful in many projects, so enjoy it, and if you have feedback, feel free to share it!

----------------------------------------------------------------------------------------------------------------------------

Last edited May 30, 2009 at 10:38 AM by barkgj, version 4
