Tackling that unstructured data mess, practically

Yesterday I wrote a little about the resurgence of phone calls asking about ILM and played a bit of a highlight reel of a workable strategy from the perspective of a compliance expert. The challenge, as I stated, was to tackle this massive problem in unstructured data without trying to solve for world peace in the process. Today I want to talk more practically about addressing unstructured data growth. This is timely, since the customer panel at EMC World is talking about it right now.

How We Do It

In Consulting, we help clients tackle the unstructured data estate either through a complete storage and data management strategy or through a tactical, targeted project to size, identify and execute on opportunities to control that data.  The steps are the same either way and assume the client has no tools at their disposal to accelerate the process:

1.) Deploy lightweight discovery tools to capture the disposition of the unstructured data in question, such as duplication, aging (last modified, last accessed) and file types (among other metadata elements)

2.) Talk to the business to understand their requirements for managing those unstructured files such as retention and access requirements, compliance considerations and other legal/regulatory concerns

3.) Review the application and storage infrastructure architecture requirements to support the unstructured estate

How That Works

Step one simply tells us whether there is a problem in the first place. If we scan a petabyte of data and find it frequently used and carrying high business value, the goals in steps two and three shift pretty rapidly from purging data to managing it more cheaply. We are not often surprised. Out of this data collection effort we learn how big the problem is and where we should focus our efforts to correct it (e.g., de-duplication versus archive or purge).
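To make step one concrete, here is a minimal sketch of the kind of lightweight discovery scan I mean. The function name (`scan_estate`) and the one-year staleness threshold are my own illustrative choices, not any particular vendor tool; a production scanner would also hash lazily (only on size collisions) rather than reading every file.

```python
import hashlib
import time
from collections import defaultdict
from pathlib import Path

def scan_estate(root, stale_days=365):
    """Walk a file tree and summarize its disposition:
    total size, aging, bytes by file type, and exact duplicates."""
    now = time.time()
    by_hash = defaultdict(list)   # content hash -> list of paths (duplicate candidates)
    by_ext = defaultdict(int)     # file extension -> total bytes
    total_bytes = 0
    stale_bytes = 0
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        st = path.stat()
        total_bytes += st.st_size
        by_ext[path.suffix.lower() or "(none)"] += st.st_size
        # Files untouched (neither modified nor accessed) for longer than
        # stale_days are the archive/purge candidates step one surfaces.
        if now - max(st.st_mtime, st.st_atime) > stale_days * 86400:
            stale_bytes += st.st_size
        # Hash content to spot exact duplicates. Fine for a sketch; a real
        # tool would only hash files whose sizes collide.
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        by_hash[digest].append(str(path))
    duplicates = {h: paths for h, paths in by_hash.items() if len(paths) > 1}
    return {
        "total_bytes": total_bytes,
        "stale_bytes": stale_bytes,
        "bytes_by_type": dict(by_ext),
        "duplicate_sets": duplicates,
    }
```

Point it at a share and the resulting report answers the two questions that matter at this stage: how big is the problem, and does it look more like a de-duplication job, an archiving job, or a purge.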

Step two is the trickiest of the three. I want to understand what the big picture looks like: are they a regulated business where ALL data is subject to controls, or is only a small fraction of the unstructured estate covered? Do they have a set of policies in place today? Are they reasonable? Do business units routinely try to subvert IT&#8217;s data management approach? These answers help us frame up recommendations that will actually work in practice, and they usually give IT intelligence about its consumers that it did not possess before. I can&#8217;t tell you how many times I&#8217;ve heard &#8220;If I knew [some critical business driver] I would have done this so much differently&#8230;&#8221;

Step three is the easiest piece and also the most fun because we get to play with ideas and speculate about the wrench-turning.  We aim to uncover the features and functionality this business needs to keep people productive in their jobs as they create and use that unstructured data.  If we make architecture recommendations, these data points guide our thinking.  We can do that most effectively when performance management tools are deployed but even without them, this is the one area your IT professionals probably have wired.  A few interviews are all it takes.

Putting it all together

Now that I know the size of the unstructured &#8220;problem&#8221; and where to target my solutions, I can combine that with useful recommendations for tools, process, policy and so forth that don&#8217;t conflict with the mission of the business. As an example, EMC found years ago that e-mail (and attachments, specifically) was the biggest target with a measurable return on investment. With a few policy changes and the implementation of an archive solution, that problem was made far less troublesome. Did we, the users, complain? Of course, but we came around. We griped mainly because we hate change of any kind.

We might recommend a course of action that includes acquiring a tool or some technology to automate or accommodate the changes we think are needed.  What we won’t do is recommend magic bullet solutions.  Get in, solve the problem, get out.  Like my colleague Sally Dovitz, I just want to keep it simple.  If the business case for widening the net is strong, so be it.  My goal is to keep our guidance manageable and to avoid creating more work and cost than the problem can justify.  I want to tell you to “archive these files right over here and delete those over there.”  If that action demands a tool, I’ll tell you what that tool should be capable of doing.  If not, let’s dive in and clean up the cesspool.

And that&#8217;s the way you get started dealing with your unstructured data estate: simple, direct, defined. Anyone with a tool (there are lots of free ones) that can report on the disposition of your files can tell you which ones should go away. The goal should be to get to a positive action for disposing, migrating, managing or otherwise dealing with the data. The goal should not be to wrap your unstructured data in overcomplicated policy, controls or rules in the hope that some magic tool will solve the problem for you. Your end users are crafty; they will find a way to unsolve it.


About Peter

Peter is a Geocacher, competitive cribbage player, surfer, amateur magician, golfer and star watcher (the astronomical kind). In his day job for Datalink, Peter is a Senior Manager with their Cloud Service Management Practice helping customers build, manage and improve their legacy IT and Private Cloud infrastructures through Automation, Orchestration and clean living. We're not so sure on the clean living.
