Thursday, July 2, 2009

A petabyte is no laughing matter


Ever feel like you need a break from work just to get your work done? You're not alone.
Information workers, who comprise about 63% of the U.S. work force, are each bombarded with 1.6 gigabytes of information on average every day through emails, reports, blogs, text messages, calls and more, according to preliminary data from a report coming later this year, an update of the 2003 "How Much Information?" report.

According to the report, the daily flow of emails worldwide is currently 1,829 TB. The report puts the yearly total at 3.35 petabytes, although 1,829 TB a day for 365 days works out to roughly 668 petabytes, so one of the two figures appears to be misquoted. It goes on to say that only half of the email traffic will be personal messages. Unsolicited email (also known as spam), commercial notifications and news alerts account for one-third of today's email load and will comprise nearly half of the traffic four years from now, the report said.

While this is fascinating and eye-opening data, I got sidetracked wondering: what exactly is a petabyte?

How about this to quantify it:


The prefix peta- means one quadrillion. That is one thousand trillion. A trillion is one thousand billion, and a billion is one thousand million. So a quadrillion is one billion million.

A petabyte of storage is a thousand terabytes. A terabyte of storage is one thousand gigabytes. So a petabyte is one million gigabytes.

Steering the conversation back to layman's terms: a petabyte is 20 million four-drawer filing cabinets, or 13 years of recorded HD-TV (more than 58,000 movies).

Most of us nowadays have a gigabyte of memory in our laptops. At a gigabyte each, a thousand laptops add up to a terabyte and a million laptops add up to a petabyte.
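For anyone who wants to check the arithmetic, here is a minimal Python sketch of the conversions above. It assumes decimal (powers of ten) units, as the text does, and the one-gigabyte laptop from the example; the `convert` helper and the variable names are made up for illustration. It also carries the quoted daily email figure of 1,829 TB over a 365-day year.

```python
# Decimal (SI) prefixes expressed in bytes, matching the definitions above.
GIGA = 10 ** 9
TERA = 10 ** 12
PETA = 10 ** 15


def convert(value, from_unit, to_unit):
    """Convert a byte quantity from one decimal unit to another."""
    return value * from_unit / to_unit


gb_per_pb = convert(1, PETA, GIGA)   # gigabytes in one petabyte
tb_per_pb = convert(1, PETA, TERA)   # terabytes in one petabyte

# Assumption from the text: each laptop holds one gigabyte.
laptops_per_tb = convert(1, TERA, GIGA)
laptops_per_pb = convert(1, PETA, GIGA)

# Daily email flow quoted from the report, carried over a 365-day year.
daily_email_tb = 1829
yearly_email_pb = convert(daily_email_tb * 365, TERA, PETA)

print(f"1 PB = {gb_per_pb:,.0f} GB = {tb_per_pb:,.0f} TB")
print(f"Laptops per TB: {laptops_per_tb:,.0f}, per PB: {laptops_per_pb:,.0f}")
print(f"Email per year at the quoted daily rate: {yearly_email_pb:,.1f} PB")
```

Vendors sometimes use binary units (1,024 rather than 1,000 per step), which would shift these numbers by a few percent.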

To bring the focus back to litigation technology:

If the size of the data involved in litigation is growing by 50% per year, and by all accounts will continue at this rate, the only way to effectively control litigation costs is to embrace electronic discovery tools, which are specifically designed to reduce data populations. The good news is that understanding and using the basic electronic discovery tools is relatively straightforward.

1. Data collection. How the data is collected will have a significant impact on the overall cost of the case.
2. Filter the data by custodian and/or date ranges.
3. Cull out the system files.
4. De-duplicate the data.
5. Search the data using well-conceived keywords, preferably obtained using an early case assessment tool.
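Steps 2 through 5 can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions, not how any particular eDiscovery product works: the `Document` record, the function names and the short extension list are all hypothetical. In practice, system files are commonly culled by comparing file hashes against a known-file list such as the NIST NSRL rather than by extension, and de-duplication usually runs on the hash of the original file rather than on extracted text.

```python
import hashlib
from dataclasses import dataclass
from datetime import date


@dataclass
class Document:
    """Hypothetical record for one collected file."""
    path: str
    custodian: str
    modified: date
    text: str


# Assumption: a toy extension list standing in for a real known-file list.
SYSTEM_EXTENSIONS = (".dll", ".exe", ".sys")


def filter_by_custodian_and_date(docs, custodians, start, end):
    """Step 2: keep documents from the named custodians within the date range."""
    return [d for d in docs
            if d.custodian in custodians and start <= d.modified <= end]


def cull_system_files(docs):
    """Step 3: drop files that look like system or program files."""
    return [d for d in docs if not d.path.lower().endswith(SYSTEM_EXTENSIONS)]


def deduplicate(docs):
    """Step 4: keep the first copy of each distinct content hash."""
    seen = set()
    unique = []
    for d in docs:
        digest = hashlib.sha256(d.text.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(d)
    return unique


def keyword_search(docs, keywords):
    """Step 5: keep documents containing at least one keyword (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    return [d for d in docs if any(k in d.text.lower() for k in lowered)]


def reduce_population(docs, custodians, start, end, keywords):
    """Run steps 2 to 5 in order and return the documents left for review."""
    docs = filter_by_custodian_and_date(docs, custodians, start, end)
    docs = cull_system_files(docs)
    docs = deduplicate(docs)
    return keyword_search(docs, keywords)
```

The order matters for cost: the cheap filters run first, so the more expensive hashing and searching only touch what is left.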


Here's a recent real-life example of the power of eDiscovery tools. Superior Document Services recently worked on a case in which we performed a full forensic image of 10 computers and laptops. The total data size collected was over 1 terabyte. Using the first four eDiscovery tools listed above (collection, filtering, culling and de-duplication), we reduced the data set to 150 gigabytes. Working with counsel to develop keyword terms for searching, we were able to reduce the 150 GB to 5 gigabytes. This represents a reduction of over 99% of the data size prior to attorney review. The attorneys reviewed the data and we produced one gigabyte of responsive data, or under 50,000 documents.
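The percentages in that example are easy to verify. The short sketch below assumes the collection was exactly 1,000 GB (the case involved "over 1 terabyte", so the real reductions were slightly larger); the function and variable names are illustrative only.

```python
def reduction_percent(before_gb, after_gb):
    """Percentage by which a data set shrank between two stages."""
    return (1 - after_gb / before_gb) * 100


# Assumption: treat "over 1 terabyte" as a round 1,000 GB starting point.
collected_gb = 1000
after_dedup_gb = 150    # after filtering, culling and de-duplication
after_search_gb = 5     # after keyword searching
produced_gb = 1         # responsive data produced after attorney review

first_pass = reduction_percent(collected_gb, after_dedup_gb)
before_review = reduction_percent(collected_gb, after_search_gb)
search_only = reduction_percent(after_dedup_gb, after_search_gb)

print(f"Filtering, culling and de-duplication: {first_pass:.1f}% reduction")
print(f"Keyword search on the 150 GB: {search_only:.1f}% reduction")
print(f"Total before attorney review: {before_review:.1f}% reduction")
```

On these assumptions the first pass removes about 85% of the data and the total reduction before review comes to about 99.5%, consistent with the "over 99%" figure above.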

Smart work processes are critical to success in eDiscovery. Always seek professional help. In other words, kids: do not try this at home without consulting an eDiscovery professional.








