EDD: Issues, Law, and Solutions: Got ECA? Consider Data Sampling.

Tuesday, March 22, 2011


It seems like everywhere you turn in litigation support circles, Early Case Assessment (ECA) is thrown around like a frisbee at a frat party. Yet despite how often it comes up in conversation, ECA can mean very different things to different people and in different cases. In the simplest matters, ECA refers to the use of processing technology to "cull," or filter out, data that falls outside the scope of relevance in a given case. In more complex matters, ECA can entail applying technology not only to filter out irrelevant data but also to leverage the processed information to gain a better understanding of the facts and merits of the case. Both applications of ECA are undoubtedly important when it comes to reducing the risks and costs associated with litigation.

When confronted with an almost impossible data analysis problem, a tried-and-true way to solve it is sampling. The mathematics behind sampling has been studied for many years, and sampling has been put into practice for well over seventy years in fields ranging from predicting election results to assessing the quality of light bulbs. Why not use it to certify your ESI productions, while also addressing defensibility and reasonableness?

Sampling as a way to assess quality is something the authors of the Electronic Discovery Reference Model (EDRM) Search Group covered in detail, laying out a strategy in the comprehensive EDRM Search Guide (see Section 9.5 and Appendix 2). While much of that work has yet to reach the mainstream litigation scene as general practice, I was pleasantly surprised to see it receive attention from a fellow blogger and litigator, Nick Brestoff, who highlighted it in a thoughtfully crafted article on Law.com titled A Strategy to Sample All the ESI You Need. I commend his article for helping the community understand the practical difficulties of getting a certifiable result that attorneys can stand behind. Indeed, it is highly likely that current practice is to certify electronic discovery productions without any real measure of validity behind them.

That leads us back to the mechanics of sampling, the math behind it, and its defensibility. As the EDRM Search Guide notes, meaningful sampling can only be done by the party that has the data, i.e., the producing party. While Rule 26(a) of the Federal Rules of Civil Procedure (FRCP) lists required disclosures and Rule 26(g) sets out signing and certification requirements, there is no agreed-upon way to specify sampling parameters or to report the results of sampling. It is in this context that Nick Brestoff's article is significant: it explores practical ways in which the producing party can shift the sampling mechanics to the requesting party. I do think, however, that this poses a logistical problem, because most litigators will balk at handing largely irrelevant and non-responsive items to the other side.

Perhaps the real need is for the requesting party to ask, during the Rule 26(f) meet and confer, that the production be certified for completeness by including a statement on sampling and its results. A simple request such as “Sample the data for a 98% confidence level and 2% error rate, and report the number of responsive documents” could be sufficient. The producing party can then perform random sampling to meet those goals, selecting 13,526 documents (based on the sampling table in the EDRM Search Guide). This allows the attorneys representing the producing party to certify and sign off against an agreed-upon target.
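To show where a figure like 13,526 can come from, here is a minimal Python sketch of the textbook sample-size formula for estimating a proportion, n = z²·p(1−p)/e², where z is the normal critical value for the confidence level, e is the margin of error, and p = 0.5 is the most conservative guess at the unknown responsive rate. Whether the EDRM table uses exactly this formula is an assumption, and the function names are hypothetical. Under this formula, 13,526 matches 98% confidence with a ±1% margin when z is rounded to 2.326, while a ±2% margin gives about 3,382. That suggests the "2% error rate" in the table corresponds to an interval 2% wide (±1%), so a request like the one above should probably state which reading is meant.

```python
import math
from statistics import NormalDist
from typing import Optional


def z_value(confidence: float) -> float:
    """Two-sided standard normal critical value, e.g. 0.98 -> about 2.326."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def sample_size(confidence: float, margin: float, p: float = 0.5,
                population: Optional[int] = None,
                z: Optional[float] = None) -> int:
    """Documents to sample so the estimated responsive rate is within +/- margin.

    p = 0.5 is the worst case (largest sample) for an unknown responsive rate.
    If population is given, the finite population correction is applied,
    which shrinks the sample for smaller collections.
    Passing z lets you reproduce tables that round the critical value.
    """
    if z is None:
        z = z_value(confidence)
    n0 = z * z * p * (1 - p) / margin ** 2
    if population is not None:
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)


print(sample_size(0.98, 0.01, z=2.326))  # 13526: 98% confidence, +/-1% margin
print(sample_size(0.98, 0.02, z=2.326))  # 3382: 98% confidence, +/-2% margin
print(sample_size(0.98, 0.01))           # 13530 with the unrounded critical value
```

The finite population correction only matters when the collection is not much larger than the sample; for collections in the millions, the uncorrected figure is effectively the answer.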

In addition to the EDRM Search Guide, The Sedona Conference Working Group commentary Achieving Quality in the E-Discovery Process is an indispensable resource for understanding the role of sampling. The paper discusses several sampling methods at length, along with their applicability for various purposes, including certifying that results meet given quality criteria. A number of electronic discovery cases have also mentioned sampling as a way of coping with the explosion in data volumes. A primary application of sampling is evaluating proportionality claims, which have moved from simple assertions to informed arguments that specifically demonstrate the cost burden.

In Barrera v. Boughton, 2010 WL 3926070 (D. Conn. Sept. 30, 2010), the court ruled that a phased approach to ESI discovery was appropriate, quoting an earlier case, S.E.C. v. Collins & Aikman Corp., 256 F.R.D. 403, 418 (S.D.N.Y. 2009), for the point that “[t]he concept of sampling to test both the cost and the yield is now part of the mainstream approach to electronic discovery.” The sampling recommendation in that instance both reduced the number of custodians from forty to three and significantly narrowed the date range for the search. What was initially a $60,000 ESI search and discovery effort was reduced to under $13,000.

Similarly, sampling is suggested in both M. Adams & Assoc., L.L.C. v. Fujitsu Ltd., No. 1:05-CV-64, 2010 WL 1901776, and Mt. Hawley Ins. Co. v. Felman Prod., Inc. as a way to run a small set of search terms against a smaller number of custodians in order to get a sense of the larger electronic discovery costs. ClearOne Communications v. Chiang offers another example of sampling: using Boolean logic to combine more common search terms, thereby avoiding over-inclusive results.
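The Boolean approach in the last example can be illustrated with a toy Python sketch: OR-ing two common terms matches any document containing either word, while AND-ing them requires both, which shrinks the hit set. The documents, the search terms, and the helper names below are hypothetical, and real review platforms apply their own query syntax, stemming, and tokenization rather than this simple whole-word match.

```python
import re
from typing import Iterable, List, Set


def tokens(text: str) -> Set[str]:
    """Lowercase whole-word tokens; punctuation is ignored."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search(docs: Iterable[str], terms: List[str], require_all: bool) -> List[str]:
    """Return docs matching ALL terms (Boolean AND) or ANY term (Boolean OR)."""
    wanted = [t.lower() for t in terms]
    combine = all if require_all else any
    hits = []
    for doc in docs:
        words = tokens(doc)
        if combine(t in words for t in wanted):
            hits.append(doc)
    return hits


# Hypothetical mini-collection and search terms, for illustration only.
docs = [
    "Draft license for the algorithm",
    "Lunch order for Friday",
    "Algorithm notes from the engineering team",
    "Software license renewal invoice",
]
terms = ["license", "algorithm"]

or_hits = search(docs, terms, require_all=False)
and_hits = search(docs, terms, require_all=True)
print(len(or_hits), len(and_hits))  # 3 1
```

Running the narrower AND query against a small sample of custodians first is one way to see how much a combination cuts the hit count before committing to a full search.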

Per the Sedona commentary definitions, this type of sampling is referred to as “judgmental sampling,” in which the practitioner has a general sense of which custodians and date ranges are most likely to offer the greatest yield. As judgmental sampling becomes more widely adopted as a way of controlling costs, electronic discovery practice can embrace the benefits of statistical sampling as well. It is a natural next step, because even with the narrow criteria of judgmental sampling, review costs can remain high. One advantage of statistical sampling is that it yields quantifiable measures of error and confidence intervals; judgmental sampling offers no such formal measurement. Again, if the requesting party wishes to ensure a level of completeness and quality, and if the producing party needs a basis for certifying its productions, statistical sampling can be a powerful aid.
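To make the contrast concrete, here is a minimal Python sketch of the statistical side: drawing a reproducible simple random sample from a collection, then turning the reviewers' count of responsive documents into an estimate with a confidence interval. The function names, the seed, the document IDs, and the counts in the usage lines are all hypothetical. The interval is the simple normal-approximation (Wald) interval; it is known to behave poorly when the responsive rate is close to 0 or 1, where a Wilson or exact (Clopper-Pearson) interval is the safer choice.

```python
import math
import random
from statistics import NormalDist
from typing import List, Sequence, Tuple


def draw_sample(doc_ids: Sequence[str], n: int, seed: int) -> List[str]:
    """Simple random sample without replacement.

    Recording the seed lets the other side reproduce exactly which
    documents were drawn.
    """
    rng = random.Random(seed)
    return rng.sample(list(doc_ids), n)


def proportion_interval(responsive: int, n: int,
                        confidence: float = 0.98) -> Tuple[float, float, float]:
    """Point estimate and Wald interval for the share of responsive documents."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p_hat = responsive / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


def estimate_count(population: int, responsive: int, n: int,
                   confidence: float = 0.98) -> Tuple[int, int, int]:
    """Scale the sample proportion and its interval up to the whole collection."""
    p_hat, low, high = proportion_interval(responsive, n, confidence)
    return (round(population * p_hat),
            math.floor(population * low),
            math.ceil(population * high))


# Hypothetical collection of 50,000 documents and a 3,383-document sample.
collection = [f"DOC-{i:06d}" for i in range(50_000)]
sample = draw_sample(collection, 3_383, seed=2011)
# Suppose reviewers mark 410 of the sampled documents responsive (hypothetical).
estimate, low, high = estimate_count(len(collection), 410, len(sample))
print(f"about {estimate} responsive documents (98% CI {low} to {high})")
```

The output is the kind of statement a producing party could attach to a certification: an estimated count plus the stated confidence level and interval, something judgmental sampling cannot provide.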

If you've gotten this far, I would be remiss if I failed to give a shout-out to the VCU Rams and Richmond Spiders basketball teams for representing our fine city in the Sweet 16 of the NCAA Basketball Tourney.


Sedona Principles 2nd ed.

1. Electronically stored information is potentially discoverable under Fed. R. Civ. P. 34 or its state equivalents. Organizations must properly preserve electronically stored information that can reasonably be anticipated to be relevant to litigation.
2. When balancing the cost, burden, and need for electronically stored information, courts and parties should apply the proportionality standard embodied in Fed. R. Civ. P. 26(b)(2)(C) and its state equivalents, which require consideration of the technological feasibility and realistic costs of preserving, retrieving, reviewing, and producing electronically stored information, as well as the nature of the litigation and the amount in controversy.
3. Parties should confer early in discovery regarding the preservation and production of electronically stored information when these matters are at issue in the litigation and seek to agree on the scope of each party’s rights and responsibilities.
4. Discovery requests for electronically stored information should be as clear as possible, while responses and objections to discovery should disclose the scope and limits of the production.
5. The obligation to preserve electronically stored information requires reasonable and good faith efforts to retain information that may be relevant to pending or threatened litigation. However, it is unreasonable to expect parties to take every conceivable step to preserve all potentially relevant electronically stored information.
6. Responding parties are best situated to evaluate the procedures, methodologies, and technologies appropriate for preserving and producing their own electronically stored information.
7. The requesting party has the burden on a motion to compel to show that the responding party’s steps to preserve and produce relevant electronically stored information were inadequate.
8. The primary source of electronically stored information for production should be active data and information. Resort to disaster recovery backup tapes and other sources of electronically stored information that are not reasonably accessible requires the requesting party to demonstrate need and relevance that outweigh the costs and burdens of retrieving and processing the electronically stored information from such sources, including the disruption of business and information management activities.
9. Absent a showing of special need and relevance, a responding party should not be required to preserve, review, or produce deleted, shadowed, fragmented, or residual electronically stored information.
10. A responding party should follow reasonable procedures to protect privileges and objections in connection with the production of electronically stored information.
11. A responding party may satisfy its good faith obligation to preserve and produce relevant electronically stored information by using electronic tools and processes, such as data sampling, searching, or the use of selection criteria, to identify data reasonably likely to contain relevant information.
12. Absent party agreement or court order specifying the form or forms of production, production should be made in the form or forms in which the information is ordinarily maintained or in a reasonably usable form, taking into account the need to produce reasonably accessible metadata that will enable the receiving party to have the same ability to access, search, and display the information as the producing party where appropriate or necessary in light of the nature of the information and the needs of the case.
13. Absent a specific objection, party agreement or court order, the reasonable costs of retrieving and reviewing electronically stored information should be borne by the responding party, unless the information sought is not reasonably available to the responding party in the ordinary course of business. If the information sought is not reasonably available to the responding party in the ordinary course of business, then, absent special circumstances, the costs of retrieving and reviewing such electronic information may be shared by or shifted to the requesting party.
14. Sanctions, including spoliation findings, should be considered by the court only if it finds that there was a clear duty to preserve, a culpable failure to preserve and produce relevant electronically stored information, and a reasonable probability that the loss of the evidence has materially prejudiced the adverse party.
Copyright © 2007 The Sedona Conference®. All Rights Reserved.
Reprinted courtesy of The Sedona Conference®.
Go to http://www.thesedonaconference.org/ to download a free copy of the complete document for your personal use only.