I know I am opening myself up for some ridicule and perhaps more than a few comparisons but my house has a few terabytes of data to manage. Eight terabytes of files, pictures, movies, audio and miscellaneous “stuff” in the library and as I digitize more and more of my records I find the estate growing every day. When I managed the Legoland California infrastructure 15 years ago we had well under a terabyte of data in total. Things sure are different these days. I manage more demanding users at home, for one. They do not understand proliferation or service levels. If the cell phone does not sync with the cloud or my server goes down my phone rings. Never mind that I am in Chicago and it is midnight; the 11-year-old needs her Restaurant Impossible. Sure, she could watch last week’s episode again but then how would she ever keep abreast of the latest failed attempt at haute cuisine? At least when I wore a pager (remember those?!) I had an on-call schedule. No more. I am ALWAYS on call.
And I am ALWAYS managing the data. It will always grow, never shrink and will remain a constant source of “why the heck are we keeping that?” queries for decades to come. To that end I have some observations about data management policies and ILM. True, my little octuplet of terabytes is nothing for an enterprise to get too excited about but I get down and dirty with the data, so let us get down and dirty with some practical data management for the home. Ted did start this blog with “practical” in the name, after all.
Until you reach a critical mass, formal backups can be more trouble than they are worth: I just make 3 copies instead.
Backup software in general just sucks. It is never flexible enough, fast enough, light enough on the processor cycles or reliable enough to count on. (Acronis, I love you, but your inter-version compatibility challenges make me cry.) It does not matter whether I am backing up to disk, tape, or some other media, whether I have cheaped out on the software or gone enterprise…none of it has proven as reliable or simple as keeping 3 copies of critical data on hand. Why 3?
Google engineers published a paper years ago with a deep analysis of backup failure rates and data restore frequencies. I have lost the source but if I can find this paper again, I will link to it. In short, they recommended eschewing most RAID protection and simply replicating the data to keep 3 copies. It is a tough sell but the data does not lie. I have put this approach to the test over the years and have shelved my formal backup software for anything other than volume imaging (full hard drive recovery operations). For my use case, backups are a waste of time. I use Allway Sync for my rinky-dink operation but there is a bevy of other solutions that perform the same function. Your enterprise class storage infrastructure probably has this function built in.
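For the curious, here is roughly what the 3 copy idea looks like in a few lines of Python. This is a minimal sketch and not how Allway Sync works under the hood; the function name `replicate` and the folder layout are my own assumptions. It mirrors one way only (source to replicas) and never deletes anything from a replica.

```python
import filecmp
import shutil
from pathlib import Path


def replicate(source: Path, replicas: list[Path]) -> int:
    """Copy new or changed files from source into every replica.

    One-way mirror only: nothing is ever deleted from a replica.
    Returns the number of file copies performed.
    """
    copied = 0
    for src in sorted(p for p in source.rglob("*") if p.is_file()):
        rel = src.relative_to(source)
        for replica in replicas:
            dst = replica / rel
            # shallow=False compares file contents, not just size and mtime
            if dst.exists() and filecmp.cmp(src, dst, shallow=False):
                continue
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)  # copy2 also preserves timestamps
            copied += 1
    return copied
```

Point it at the library plus two replica folders, ideally on different physical drives, and you have your 3 copies. A real sync tool also handles deletions, conflicts and scheduling, which this sketch deliberately leaves out.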
Duplicates are the number one problem. Long live duplicates.
Aside from the 3 copy rodeo, duplicate data pops up everywhere. I use NoClone to find these things, since it scans for both exact content and file name/file type duplicates. That stupid “CrazyFrog.MP3” pops up everywhere and no matter how hard the children try to hide it by changing the name, date or size, I find it. Ditto for that quarterly review PPTX I got sent 14 times by 5 different people. I would have a 25 terabyte pile of junk by now if I failed to eliminate duplicates. This becomes more of a problem as we share media libraries and have multiple reviews and markups of documentation across the family (e.g., tax time).
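If you want to see the exact-content half of that hunt in miniature, here is a rough Python sketch. I am not claiming this is how NoClone does it; the names `file_digest` and `find_duplicates` are made up for illustration. It groups files by size first, then by SHA-256 hash, so it catches copies that were renamed or re-dated but not copies whose bytes were actually altered.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root: Path) -> list[list[Path]]:
    """Group files under root whose contents are byte-for-byte identical."""
    by_size = defaultdict(list)
    for p in root.rglob("*"):
        if p.is_file():
            by_size[p.stat().st_size].append(p)

    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a unique size cannot have a content duplicate
        by_hash = defaultdict(list)
        for p in same_size:
            by_hash[file_digest(p)].append(p)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

Checking sizes before hashing means most files are never read at all, which matters when the pile is measured in terabytes.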
Organization is your friend. Have him over for dinner, often.
With all the content management built into your daily life it is hard to conceive of needing to keep the file cabinet tidy. iTunes will manage content for you, so will the operating system. Never mind your e-mail, SharePoint and any of a hundred other CMS tools out there for a “real” IT shop. The tools are great when everything is running well but when things break you are faced with abstraction layers that specifically prevent you from knowing the actual location of your critical data. Enough! Yes, the library construct is a neat idea but can I please have my network home or departmental drive organized in a hierarchical fashion without all the fancy hard and soft links? Oh, wait, yes I can; I am the CIO and CTO for my house. Done. Now I can find everything and my guests can, too. I can even print them a handy sitemap, which conveniently leaves out where the private data is stored. Do things become misfiled on occasion? Sure. That is when content management and indexing become useful. CMS is a fallback position, though. I should be able to find my tax returns in the “Financial” folder before using the search function. If you need to perform a Google desktop search to find something, your filing system sucks. Call someone for help. I hear TLC has a show about that.
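The guest sitemap is easy enough to sketch in Python. The function name `sitemap` and the list of private folder names are hypothetical; swap in whatever your own hierarchy calls them. It walks folders only and skips anything marked private, along with everything beneath it.

```python
from pathlib import Path

# Hypothetical names of folders that guests should not see.
PRIVATE = frozenset({"Financial", "Medical"})


def sitemap(root: Path, private: frozenset[str] = PRIVATE) -> list[str]:
    """Return an indented folder outline of root, skipping private folders."""
    lines = []

    def walk(folder: Path, depth: int) -> None:
        for child in sorted(p for p in folder.iterdir() if p.is_dir()):
            if child.name in private:
                continue  # skip the folder and everything below it
            lines.append("  " * depth + child.name + "/")
            walk(child, depth + 1)

    walk(root, 0)
    return lines
```

Join the result with newlines and send it to the printer. A tidy hierarchy is what makes this a ten-line job instead of a project.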
Seasons don’t fear the reaper, nor should you: press DELETE every so often
It is cathartic and endorphin-releasing to delete, so why rob yourself of this singular pleasure? The rules for retaining data apply equally to my terabytes as they do to your petabytes. I am required to hold my tax returns forever, I keep receipts for 3 years and other important documents for as long as the information is useful or could be a liability if lost. I empty my e-mail out every 6 months and delete anything that is a liability if kept past its expiry date. I do it in stages to keep from getting delirium tremens, first archiving and then aggressively hitting the delete key after a few months. I use a combination of manual and automatic intervention here. My hierarchical organization makes quick work of identifying and rounding up the dross but on occasion I like to use automatic scanning tools to size up the waste. If I had an object storage appliance at the house I could just feed that bad boy a set of policy rules and let self-management take over.
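Lacking the appliance, a set of policy rules can still be sketched in Python. The retention periods below come from my rules above; the folder names, the `RETENTION_DAYS` table and the `expired_files` function are assumptions made for illustration. It only reports candidates based on file modification time and deletes nothing, which fits the archive-first, delete-later routine.

```python
import time
from pathlib import Path
from typing import Optional

DAY = 24 * 60 * 60  # seconds

# Hypothetical top-level folder names; None means "keep forever".
RETENTION_DAYS: dict[str, Optional[int]] = {
    "TaxReturns": None,
    "Receipts": 3 * 365,
    "MailArchive": 182,  # roughly 6 months
}


def expired_files(root: Path, now: Optional[float] = None) -> list[Path]:
    """List files older than the retention period of their top-level folder.

    Folders without a rule are never reported, and nothing is deleted
    here: the result is a review list.
    """
    now = time.time() if now is None else now
    expired = []
    for folder, days in RETENTION_DAYS.items():
        base = root / folder
        if days is None or not base.is_dir():
            continue
        cutoff = now - days * DAY
        for p in base.rglob("*"):
            if p.is_file() and p.stat().st_mtime < cutoff:
                expired.append(p)
    return sorted(expired)
```

Review the list, move the files to an archive folder, and come back in a few months with the delete key.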
Not everyone is on board with this plan, so your mileage may vary
My wife, for instance, cannot be bothered to use the “documents” library on her Windows 7 laptop. This is a long-standing convention for her and changing it directly affects her productivity. I can adapt. For other matters the question rapidly becomes whether or not I can support all the customization of my IT infrastructure that goes with accommodating nonstandard ways of managing the data. This is where my “SLA” comes in. If my users can manage with a handful of options for data preservation I can support them. If not, I cannot guarantee that critical contract document or term paper you just spent four days creating will be there when the hard drive crashes. If this sounds to you like I am managing to a service catalog against my home IT estate, you are hearing it right. There are only two differences between my house and a typical small business IT shop:
- I never need to worry about chargeback and
- My users tend to be more savvy than the typical small business users
There are things home IT pros can do and things they cannot. As home IT infrastructure begins to take the same shape as that of our businesses, the distinction blurs. So do the techniques for managing our data and information assets.