I've always been suspicious of the claims of the "predictive coding" zealots. Every year it seems that there is a new buzzword in the field of e-discovery. Technology Assisted Review - Check! Information Governance - Check! Cloud Computing - Check! Big Data - Check!
My friend John Martin did a magnificent job of distilling some of my concerns in his blog entry that can be found right here.
The Emperor has No Clothes - and PC Can't See Image-Only Documents
There are several parallels between predictive coding (AKA technology assisted review) and Hans Christian Andersen's tale, "The Emperor's New Clothes." In the story, two weavers tell the emperor they will make him a suit of clothes that will be invisible to those people who are unfit for their position, stupid, or incompetent. None of the emperor's subjects wants to admit to those deficiencies, so the emperor parades around with no clothes on until a child states the obvious - the emperor has no clothes.

This might be just an esoteric debating point if virtually all documents had associated text. However, in some industries like oil & gas, half or more of some collections will be engineering drawings and schematics that were output to image-only PDF for distribution and use by those who don't have the software licenses needed to view the documents in their original file formats.

PDFs will potentially be among the most relevant file types in a collection because PDF is the format used to distribute information among groups of people inside an organization, and between organizations. Note that even if predictive coding is acceptable in some unique e-discovery settings, its inability to handle documents without text will be fatal for broader information governance purposes.
So... if you're going to use predictive coding, at the very least measure what PC doesn't "see." If you're planning on using PC for information governance purposes, make sure that the organization doesn't mind not classifying a potentially significant percentage of its documents.
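One way to put a number on what PC doesn't "see" is to count the documents whose text extraction came back empty or nearly empty. The following is only a rough sketch, not anyone's actual workflow: it assumes you already have the extracted text for each document from your processing tool, and the function name `text_coverage`, the `min_chars` threshold, and the sample file names are all made up for illustration.

```python
def text_coverage(extracted, min_chars=25):
    """Return (unseen_ids, unseen_share) for a collection.

    `extracted` maps a document ID to the text pulled from that document
    (an empty string or None when nothing could be extracted).
    `min_chars` is an arbitrary, assumed cutoff: documents with fewer
    non-whitespace characters than this are treated as image-only.
    """
    unseen = sorted(
        doc_id
        for doc_id, text in extracted.items()
        # Strip all whitespace before counting so blank pages don't pass.
        if len("".join((text or "").split())) < min_chars
    )
    total = len(extracted)
    share = len(unseen) / total if total else 0.0
    return unseen, share


if __name__ == "__main__":
    # Hypothetical collection: one memo with text, three image-only files.
    sample = {
        "memo.docx": "Please review the attached well completion schedule before Friday.",
        "well_schematic.pdf": "",
        "site_plan.pdf": None,
        "scan_0001.pdf": " \n 3 ",
    }
    unseen, share = text_coverage(sample)
    print(f"{share:.0%} of documents have no usable text")
    for doc_id in unseen:
        print("  not visible to PC:", doc_id)
```

If the share turns out to be more than a rounding error, those documents need some other treatment (OCR, manual review, or at least disclosure) before anyone relies on the PC results.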