It's 10 P.M. - Do You Know
Where Your Data Is?
Do you remember the "Perfectly Normal Beast" from Douglas Adams'
Hitchhiker's Guide series? If you haven't read about it (or
don't remember), the Perfectly Normal Beast is a fictional
creature -- kind of like a buffalo -- that migrates twice a year
across the fictional "Anhondo Plain."
It's (ironically) called "perfectly normal" because it
spontaneously appears at one end of the migration path, thunders
across the plain in a tremendous stampede and spontaneously
vanishes at the other end -- nobody knows where it comes from
and nobody knows where it goes to. This is, in my opinion, a
great metaphor for the flow of data within the typical
enterprise.
Here's what I mean. From an IT perspective, keeping track of
data is like keeping track of the Perfectly Normal Beast: It
enters the scope of our awareness when it hits the
systems and applications that we're responsible for -- it flows
across those systems like a thundering herd -- and ultimately it
"vanishes" when it leaves those systems to flow to areas outside
IT control (vendors, partners or applications outside of IT).
Extremely Challenging
While the data is inside the boundaries, it's pretty hard to
miss -- but trying to keep track of it before it enters or after
it leaves those boundaries is extremely challenging. Unlike
the Perfectly Normal Beast, though, data doesn't just enter at
one point and vanish at another. The picture is even more
complex because there are multiple places where it might
originate and multiple points along the path where it might
leave our scope of control and awareness.
From the IT side of things, the fact that data behaves this way
makes our lives pretty difficult -- specifically, in most
respects the data is still our responsibility even when it's
outside our scope of awareness. All of us in IT have felt the
pain of addressing what seems to be a never-ending parade of
regulations -- PCI DSS (Payment Card Industry Data Security
Standard), HIPAA (Health Insurance Portability and
Accountability Act), SOX (Sarbanes-Oxley Act), breach
disclosure, e-Discovery -- the list
goes on and on.
All these regulations have one core thing in common: They all
require that an organization protect the regulated data
throughout the entire lifecycle. Breach disclosure laws, for
example, don't just require that we notify in the event that
data is lost within our systems -- they also require that we
notify in the event that our outsourcing partner loses it --
or our outsourcing partner's
partner loses it. In short, the regulations we're required to
meet presuppose that we know where our firm's data is, but
actually doing that in practice -- actually keeping track of
the data our firms process day-to-day -- is an extremely
challenging proposition.
So Where Is the Data Anyway?
Believe it or not, most firms don't know exactly where their
data is -- at least not throughout the entirety of the
lifecycle. Large organizations, for example, may have systems of
such tremendous complexity, and so many interaction points
between systems and applications, that maintaining an inventory
of where the data is throughout the entire lifecycle is complex
in the extreme.
Smaller firms, while potentially having far fewer systems and
interaction points between them, also have fewer staff
available to keep track of data within the organization. Both
large and small firms also have to address the issue of
locating places where data exists "under the radar"; for
example, QA (quality assurance) systems that use a copy of
production data for testing, developers who might make a copy
of data elements to test transaction flows, and staff who might
send or receive data via unapproved means to get the job done.
That's just inside our firms. How many places does the typical
firm share data with vendors and partners? Most likely, it's
quite a few. Those third parties we share data with might, in
turn, have data-sharing relationships with others; they might,
for example, subcontract work or outsource certain processes --
they might share access to network resources where our data is
resident.
Add to this mix the fact that technology is constantly
changing, and keeping track of the data gets even more complex
-- new applications being deployed, new systems being released,
and business processes being refined and adjusted all make the
challenge of maintaining an accurate picture even more
difficult. Realistically speaking, by default, most
organizations don't have the time or resources to keep track of
all the places where the data comes from and where it goes to.
So What Can We Do?
Many organizations have spent quite a bit of time and money
looking for solutions to this problem. They may have invested,
for example, in automated approaches and products designed to
locate and keep track of data as it travels through systems in
their enterprises. They may have updated policy and procedures
to ensure that data (particularly regulated or sensitive data)
is labeled and classified appropriately throughout the firm,
or they may have spent time on process mapping to document the
processes in place that may "touch" this data.
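To make the automated approach concrete, here's a minimal
sketch, in Python, of the kind of scan such discovery products
perform: walk a file tree, flag digit runs shaped like card
numbers, and apply the Luhn checksum to weed out false
positives. The scan root and the plain-text file handling are
assumptions for illustration, not any particular product's
behavior.

```
import os
import re

# Candidate card numbers: 13-16 digits, optionally separated
# by single spaces or dashes (illustrative pattern only).
PAN_PATTERN = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")

def luhn_valid(digits):
    """Luhn checksum -- weeds out digit runs that can't be
    real card numbers."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def scan_tree(root):
    """Yield (path, digits) for every Luhn-valid candidate
    found in readable text files under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "r", errors="ignore") as fh:
                    text = fh.read()
            except OSError:
                continue  # unreadable; a real tool would log it
            for match in PAN_PATTERN.finditer(text):
                digits = re.sub(r"[ -]", "", match.group())
                if luhn_valid(digits):
                    yield path, digits

for path, pan in scan_tree("/data/shares"):  # hypothetical root
    print(f"possible card number in {path}: ...{pan[-4:]}")
```

Real discovery tools go much further -- databases, network
traffic, more data patterns -- but note the built-in limit: a
scan like this only ever sees the storage IT can reach.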
However, each of these approaches in isolation leaves some
serious gaps. Specifically, automated approaches tend to locate
only data within the infrastructure under IT control -- they
won't, for example, locate the areas where data might exist in
hard copy or track data through processes that are under the
control of a vendor or partner.
Procedural approaches such as updating policy for data
classification have the disadvantage that they often require
humans to understand and follow the policy -- individuals
might, for example, forget to apply the policy during system
development, or they may run into situations -- such as
deployment of COTS (commercial off-the-shelf) solutions --
where ensuring appropriate classification and labeling isn't
supported by the product.
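One way to take some of that burden off human memory is to
make labeling a technical control rather than a step people
must remember. The sketch below is one hypothetical shape for
this (every name in it is invented): data models register
through a decorator, so a model can't be defined without
declaring its classification.

```
from enum import Enum

class Classification(Enum):
    PUBLIC = "public"
    INTERNAL = "internal"
    CONFIDENTIAL = "confidential"
    REGULATED = "regulated"  # e.g., PCI- or HIPAA-covered data

REGISTRY = {}  # model name -> Classification

def classified(label):
    """Class decorator: every data model declares a label at
    definition time, so classification can't be skipped."""
    if not isinstance(label, Classification):
        raise TypeError("a Classification label is required")
    def wrap(cls):
        REGISTRY[cls.__name__] = label
        cls.classification = label
        return cls
    return wrap

@classified(Classification.REGULATED)
class PaymentRecord:
    """Hypothetical model holding cardholder data."""
    def __init__(self, card_holder, pan_last4):
        self.card_holder = card_holder
        self.pan_last4 = pan_last4

# The registry doubles as a crude inventory of what data the
# codebase defines and how sensitive each piece is:
for name, label in REGISTRY.items():
    print(f"{name}: {label.value}")
```

The same idea can be carried into schema annotations or a
metadata repository; the point is that the label travels with
the data definition instead of living only in a policy
document.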
Finally, "paper-based" analyses centered around documenting data
flow are only as accurate as they are kept current: changes to
technology and process are rapid and make this type of
documentation difficult to keep up.
A Blended Approach
Given the shortcomings of these methods in isolation, one
useful strategy is to use a blended approach. Update policy to
require data classification and data labeling; in addition,
have legal and/or purchasing write language into new contracts
requiring that vendors do the same.
Attempt to strategically use other large-scope process-related
efforts -- such as business impact analysis done for BCP/DR
(business continuity planning/disaster recovery) purposes -- to
gather information about where data currently exists and to
document the flow of data within the firm. Couple these
approaches with a technical solution to "tip off" IT in the
event that new technology or new processes change how and
where data is stored within the firm, and ensure that the
output of automated data cataloging tools stays in sync with
the view provided by the documentation efforts.
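As a sketch of what that "tip off" and sync check might look
like, consider the toy reconciliation below -- the map entries,
tool findings and 90-day review window are all invented for
illustration. It flags map entries whose review date has
lapsed, and catalog findings the documented map doesn't
mention.

```
from datetime import date, timedelta

# Hypothetical documented data map: where each data set is
# believed to live, who owns it, when it was last reviewed.
DOCUMENTED_MAP = {
    "customer_orders": {"location": "crm-db", "owner": "jsmith",
                        "last_reviewed": date(2007, 6, 1)},
    "payment_batches": {"location": "finance-share",
                        "owner": "akhan",
                        "last_reviewed": date(2007, 1, 15)},
}

# What an automated cataloging tool actually found (invented).
CATALOG_FINDINGS = {"customer_orders", "payment_batches",
                    "qa_order_copy"}

MAX_AGE = timedelta(days=90)  # assumed review cycle

def stale_entries(today):
    """Map entries whose review date has lapsed."""
    return [name for name, entry in DOCUMENTED_MAP.items()
            if today - entry["last_reviewed"] > MAX_AGE]

def undocumented_findings():
    """Data the tooling found that the map doesn't mention --
    the tip-off that the map has drifted from reality."""
    return CATALOG_FINDINGS - set(DOCUMENTED_MAP)

today = date(2007, 8, 21)
for name in stale_entries(today):
    owner = DOCUMENTED_MAP[name]["owner"]
    print(f"review overdue: {name} (owner: {owner})")
for name in undocumented_findings():
    print(f"found but undocumented: {name}")
```

Anything the second check turns up is exactly the "under the
radar" data discussed earlier -- a QA copy, a developer
extract -- now surfaced for someone to document or remove.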
Most importantly, ensure that there's someone with ownership
of keeping the data "map" current -- nothing falls by the
wayside faster than something that nobody owns.
--By Ed Moyle
E-Commerce Times
08/21/07 4:00 AM PT