Showing posts with label OCR. Show all posts

Thursday, April 3, 2014

What BeyondRecognition Brings to Document Management

I found this article about BeyondRecognition, written by Mimi Dionne, which does an excellent job of explaining in plain English how BR can benefit every business with large unstructured data collections.
You can read the entire article right here
Ever heard of BeyondRecognition? If not, the time to learn is now. The Chantilly, Va.-based "document textnology" software provider offers document managers an alternative to optical character recognition (OCR), while delivering results with accuracy and speed.

How BeyondRecognition Works




BeyondRecognition (“BR”) may be a young innovation, but it is a viable alternative to OCR. It works with glyphs: letters or characters formed by pixels that differ enough in color from the document background to be identifiable. BR groups like glyphs into clusters at the character and word level, then converts one glyph per cluster to text as appropriate.
While OCR continuously decides what each glyph is, BeyondRecognition’s single instance technology need only recognize one glyph per cluster to form a catalog of letters or characters. The advantage: the return on investment of using single instance recognition technology is much higher with a smaller data set — a faster processing speed and better accuracy rate — which shortens the Records Management program’s work breakdown structure significantly.
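The single-instance idea can be illustrated with a toy sketch. This is not BeyondRecognition's implementation, which the article does not describe in detail; it assumes glyphs arrive as small bitmaps, groups them by exact match (a real system would group by visual similarity), and uses a hypothetical `toy_recognizer` stand-in for the recognition step. The point is only that the recognizer runs once per cluster rather than once per glyph.

```python
from collections import defaultdict

# Hypothetical toy glyphs: each glyph is a tuple of strings, one per pixel row.
GLYPH_A = (".#.", "#.#", "###", "#.#")
GLYPH_B = ("##.", "###", "#.#", "###")


def cluster_glyphs(glyphs):
    """Group glyph positions by bitmap (exact match here, for simplicity)."""
    clusters = defaultdict(list)
    for position, bitmap in enumerate(glyphs):
        clusters[bitmap].append(position)
    return clusters


def recognize_page(glyphs, recognize_one):
    """Call the recognizer once per cluster and copy its answer to every member."""
    text = [None] * len(glyphs)
    for bitmap, positions in cluster_glyphs(glyphs).items():
        label = recognize_one(bitmap)  # one decision per cluster
        for position in positions:
            text[position] = label
    return "".join(text)


calls = []  # records how many times the recognizer was actually invoked


def toy_recognizer(bitmap):
    """Stand-in for a real recognition step (assumption, for illustration only)."""
    calls.append(bitmap)
    return {GLYPH_A: "A", GLYPH_B: "B"}[bitmap]


page = [GLYPH_A, GLYPH_B, GLYPH_B, GLYPH_A, GLYPH_A]
result = recognize_page(page, toy_recognizer)
print(result, len(calls))
```

Five glyphs are converted with only two recognizer calls, and correcting one cluster's label would correct every occurrence at once.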
Because BeyondRecognition software is glyph dependent — not text — it is more versatile:
  • BR is language agnostic. It currently recognizes over forty languages.
  • BR is symbology agnostic. It can recognize and relate non-text elements.
  • BR clusters visual similarities. It works on all kinds of documents.
  • BR is over ninety-nine percent accurate.
  • BR scales. It can analyze millions of pages per day per the BeyondRecognition server.
BeyondRecognition’s zonal attribute extraction permits subject matter experts to extract attributes from document classifications by clicking and dragging zones on one document per document type cluster.

Again, for more of Mimi's article, click here

Information Governance Lessons from the Six Blind Men and the Elephant



Posted by Rich E. Davis
Mar 30, 2014 11:35:00 PM

Most of us have heard the parable of the six blind men and the elephant - it may actually be the first recorded instance of faceted classification. Six blind men touched different parts of an elephant and each described a completely different thing based on their own perspective or “view” of the elephant: The one who felt a tusk reported it as a pipe, the one who felt an ear reported a fan, the belly was reported as a wall, the trunk as a branch, a leg was reported as a pillar and the tail was described as being a rope.
This story illustrates several important information governance lessons:
Different Stakeholders Have Different Views & Needs. People’s views of and information needs from any given corpus of documents will vary according to where they are in an organization and the functions they perform. People with different roles in a company will naturally be interested in different attributes of the documents in the corpus and may well use different descriptors when describing or trying to find some of them. While some document attributes are common to all stakeholders, others, namely those which enable an individual or group to perform their job function within an organization, are probably not.
Here is an example of how different roles will be interested in different attributes:
The Offshore Power Plant
  • Stakeholders in the tax department need to know whether an expenditure on a sub-sea turbine, a critical component on a key project, can be categorized as an operating or capital expense in the jurisdiction where the project’s work is being performed.
  • The plant maintenance department needs to know:
    • When the warranty period kicks in.
    • Part numbers, nomenclature and service level.
  • Environmental Safety & Health wants to know that the team and all the contractors associated with the project are properly qualified and sanctioned to install the turbine to the engineering design specifications, and that it is operated within the tolerances.
  • RIM and Compliance wants to track the locations of all relevant project documents for information lifecycle management, disposition and regulatory reasons.
  • IT needs to ensure that business critical documents are backed up for disaster recovery and business continuity purposes.
  • InfoSec wants to know that the IT group has the requisite information to ensure that the IP associated with the project is properly secured and that the people who access the content have the proper authorization to do so.
     
Some or all of the information required by the stakeholders above will be objectively evident on the face of individual documents. Other “subjective” attributes may have to be assigned (e.g., “project lead engineer”) by knowledge workers with specific domain expertise, and other more granular data elements (e.g., installation location) may have to be assigned by linking attributes from other authoritative data sources or systems of record.
Just preserving documents without having a systematic, dynamically updatable and holistic view created by assimilating other interrelated data points will result in an incomplete picture of a project or process. Without a holistic way of assembling and viewing all the extracted document attributes of interest to the various stakeholders, the overarching information governance needs of the organization will never be met. There will be incomplete, ambiguous, erroneous and superfluous data points.
Limited Data Points Means Incomplete or Distorted Pictures. As the elephant parable illustrates, having only one or a few attributes available results in having an incomplete or distorted picture of what is being managed. The blind men’s picture is so distorted in fact, that when word of an elephant rumbling through cane fields destroying them in search of food reaches their ears, they have no adequate description for the sum of the parts, and thus no way of applying the individually assimilated knowledge in a holistic fashion. The more uniform, accurate and persistent the document attributes or facets that are available, the greater the ability of the organization to assimilate seemingly disparate information to form a more accurate picture of present and future state projects.
Duplicated data sets. Without a holistic enterprise content plan, each stakeholder starts keeping their own copies of documents so they can extract the attributes they are interested in. The result is multiple copies of the same documents, multiple expenditures to extract the same attributes, and inconsistencies in ways that the same data is extracted and stored.

THE SOLUTION

The challenge described above is endemic. It exists across all types of businesses in every jurisdiction. Corporations of all sizes are dealing with big data symptoms and are stymied when it comes to finding a cure, which has not been available from prior technology.
Standing apart from the herd is Continuum Advisors. At Continuum, we believe in using powerful emerging technology to help our clients address their most daunting data management challenges. To that end, we have incorporated BeyondRecognition (“BR”) in our services matrix for IG, legal, information security, RIM and a host of initiatives that require powerful, scalable data analytics.
BR is a radically new, data-driven information governance technology that meets the IG needs of multiple stakeholders in any enterprise, public or private. Continuum has implemented BR technology at multiple Fortune 500 clients with great success.
The highly experienced CA team chose to align with BR as it is the only technology in the world that automatically classifies electronic files or scanned paper documents based on their visual characteristics – and without having to waste time writing rules to identify each type of document or designating exemplars for each document type. This is tremendously important because accurate, consistent classification is the bedrock upon which all IG initiatives are built. BR solves this long-standing, previously intractable problem.
Subject matter experts can quickly determine how to classify all the documents in a document cluster by examining one or two documents per cluster. They can also associate a document type name with the cluster based on their organization’s document classification tree, and assign retention periods based on the classification.
Our subject matter experts in energy, financial services, and pharmaceuticals work with corporate knowledge workers to extract multiple attributes from each document classification by “painting,” i.e., clicking and dragging boxes, on an image of a document from each cluster. BR then automatically extracts the specified attributes and associates each attribute with the attribute or field names assigned by the subject matter experts. The extracted data can then be loaded into the appropriate content management system.
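The "painting" workflow described above can be sketched roughly as follows. This is a hypothetical illustration, not BR's actual data model: it assumes each document is a list of words with page coordinates, that a zone is a named rectangle drawn once on one exemplar, and that documents in the same cluster share a layout. The names `Word`, `Zone`, `extract` and the sample field names are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    text: str
    x: float  # horizontal position on the page (assumed units)
    y: float  # vertical position on the page (assumed units)


@dataclass(frozen=True)
class Zone:
    field: str  # field name assigned by the subject matter expert
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, word):
        return self.left <= word.x <= self.right and self.top <= word.y <= self.bottom


def extract(document_words, zones):
    """Return {field: text}, collecting the words inside each zone in reading order."""
    record = {}
    for zone in zones:
        hits = sorted((w for w in document_words if zone.contains(w)),
                      key=lambda w: (w.y, w.x))
        record[zone.field] = " ".join(w.text for w in hits)
    return record


# Zones drawn once, on a single exemplar of this (hypothetical) document type.
cluster_zones = [
    Zone("part_number", 100, 40, 300, 60),
    Zone("warranty_start", 100, 80, 300, 100),
]

# Every document in the cluster is processed with the same zones.
cluster = {
    "doc-001": [Word("Part", 10, 50), Word("TX-4471", 120, 50), Word("2014-04-03", 120, 90)],
    "doc-002": [Word("Part", 10, 50), Word("TX-9902", 118, 52), Word("2014-05-01", 121, 88)],
}
records = {doc_id: extract(words, cluster_zones) for doc_id, words in cluster.items()}
print(records)
```

The resulting field/value records are the kind of output that could then be loaded into a content management system.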
The various attributes enable the BR-processed documents to be associated with management control systems, e.g., pipeline planning and maintenance, or capital asset acquisition, or ESH inspections. The various attributes serve to provide multiple views into the document collection.
The extracted attribute values can be normalized prior to loading into the target system or the extracted values can be used to update and validate existing field authority lists.
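As a rough sketch of what normalizing extracted values against a field authority list might look like, the following assumes a very simple rule (ignore case, punctuation and extra spacing). The function names and the sample values are illustrative assumptions, not part of BR or any target system.

```python
def normalize(value):
    """Collapse case, punctuation and spacing so variant spellings compare equal."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in value.lower())
    return " ".join(cleaned.split())


def reconcile(extracted_values, authority_list):
    """Map each extracted value to its authority entry; collect values with no match."""
    lookup = {normalize(entry): entry for entry in authority_list}
    matched, unmatched = {}, []
    for raw in extracted_values:
        canonical = lookup.get(normalize(raw))
        if canonical is None:
            unmatched.append(raw)  # candidate for review or for a new authority entry
        else:
            matched[raw] = canonical
    return matched, unmatched


# Hypothetical authority list and extracted values.
authority = ["Continuum Advisors", "BeyondRecognition, LLC"]
extracted = ["CONTINUUM  ADVISORS", "beyondrecognition llc", "Acme Pipeline Co."]

matched, unmatched = reconcile(extracted, authority)
print(matched)
print(unmatched)
```

Matched values can be loaded in their canonical form, while unmatched values are the ones that would prompt an update to, or a validation of, the authority list.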
For more information, please contact Rich E. Davis.

Wednesday, June 13, 2012

MAY BRINGS RICH C-LEVEL EXPERIENCE IN INDUSTRY AND PHILANTHROPY TO HIGH-TECH STARTUP BEYONDRECOGNITION


Germantown, TN: (May 30, 2012). John Martin, founder and CEO of BeyondRecognition, LLC, today stated that, “BeyondRecognition is pleased to announce that Ken May will be providing business development guidance for BeyondRecognition as it pushes its innovative image-based document analysis technology into key markets like mortgage and loan processing, and the oil and gas industries.” BeyondRecognition’s breakthrough integrated workflow enables companies to obtain actionable intelligence from image-based and electronic format documents at a fraction of the cost associated with manually reviewing and abstracting paper files and often with higher accuracy and reliability.

Martin continued, “BeyondRecognition’s core competencies lie in document processing and analysis, and Ken brings an incredible wealth of experience managing FedEx Kinkos, one of the largest and most wide-spread document copying and handling operations in the world, as well as planning and managing some of the most highly automated decision-support systems in the world. He also has a wealth of C-level contacts at companies across America from his many years of service as Chairman of the National Board of Trustees for the March of Dimes. We look forward to being able to capitalize on his rich experience, energy, and industry knowledge.”

Ken May commented, “I have had the opportunity over the years to review many exciting technologies at all sorts of start-ups and emerging market leaders, but I was especially struck at how innovative BeyondRecognition’s technology is and at the incredible value it offers companies that are faced with needing to analyze and process large volumes of paper-based records. This is particularly true in industries like home loan processing where the documents in the underlying files are typically not all or even mostly electronic. The need to process existing back files of loan documents and to eventually automate the new loan initiation process represents an enormous potential. I look forward to helping spread the message about this important new technology.”

About Ken May

Beginning as a manager of hub operations for FedEx in 1982, May served in various management positions, becoming VP, Global Operations Scheduling and Control in 1996. He then served as Sr. VP, Air-Ground and Freight Services from 1997 to 1999, was Sr. VP US Operations from 1999 to 2004, COO at FedEx-Kinko’s Office and Print Centers from 2004 to 2006, and was President and CEO at FedEx-Kinko’s Office and Print Centers from 2006 to 2008.
May served as Chairman of the National Board of Trustees at the March of Dimes from 2007 to 2011, and from 2010 to 2011 was President of ES3, LLC, the third-party logistics subsidiary of C&S Wholesale Grocers, the eighth-largest privately held company in the US by revenue. From 2011 to 2012 he was President and COO at Krispy Kreme Doughnuts.

May has been a director of PF Chang’s China Bistro since May 2007, and serves on the Board of Directors of Greystone Medical Group. 

For more about Ken May, see http://en.wikipedia.org/wiki/Ken_may.

About BeyondRecognition

BeyondRecognition has developed unique character, word and document attribute recognition and extraction capabilities for analyzing image-based documents. Its glyph clustering and cataloging approach enables rapid, globally-editable text recognition with accuracy rates far beyond traditional OCR. BeyondRecognition also clusters documents based on visual similarity and permits location-based, cluster-specific data element extraction for coding or abstracting data elements from the documents. Clustering by document type permits prioritized data element extraction using the powerful graphical user interface to highlight zones, and to write and instantly test and verify extraction rules.

Although nominally a “startup,” the principal technologists at BeyondRecognition have been working in the fields of document conversion, electronic evidence forensics and processing for decades. CEO John Martin was a founder of Cricket Technologies, LLC and RedFile LLC.

For more information, visit www.BeyondRecognition.net

Wednesday, June 6, 2012

Unlocking Paper Based Intelligence with Disruptive Technology

"You want your documents to be searchable - not laughably searchable . . . " John Martin



When confronted with massive amounts of unstructured data and the need to access the business intelligence locked inside that data, the options before today were expensive, extremely time consuming, dependent on massive human intervention and, most troubling, very ineffective. John Martin loves disruptive technology. I love how John Martin thinks.

John's latest game-changing software, BeyondRecognition, is being referred to as a "Big Data innovator" by several of the "Big Four" accounting firms. The tool was originally built for a company that needed to extract key information from a 30+ year old scanned paper document set of 2.3 BILLION pages for a due diligence effort. In the energy sector, BeyondRecognition's glyph clustering technology makes it possible to search for symbols used on maps to indicate things like radioactive wells, salt-water wells or API number codes.

As a result of this disruptive new technology, Martin notes, "We're seeing a great deal of interest in this approach in the mortgage and energy sectors. The mortgage industry in particular typically has a relatively finite number of documents in loan files supporting the loan decisions, with definable types of data being of interest on each type of document. Our process could greatly lower the cost of tracking all those data elements during loan initiation, or to quality control the file for audit or sale purposes."




Essentially, BeyondRecognition offers unique character, word and document attribute recognition and extraction capabilities for analyzing image-based documents. In plain English, BR allows clients to extract very valuable business intelligence from scanned and digitized files quickly, accurately and cost-efficiently. In a press release, Barbara Johnson, CFA, a former executive of USAA Federal Savings Bank who served as Chief Credit Officer and Senior Vice President of Real Estate Lending Services and is now a Principal with Saccadent, a financial consulting firm, reviewed the clustering and data extraction capabilities and commented, "In today's environment the ability to extract, utilize and match data across a variety of documents is incredibly powerful. This type of technology offers the promise of significantly decreasing the time and cost to process a thoroughly compliant loan from application through origination, audit, sale and servicing. An automated system to confirm all the critical items match throughout the process and are in the appropriate format and location on all documents would be invaluable. Lending is a document-rich industry and the time is perfect for this type of technology."

Another intriguing aspect is that the BR solution is language agnostic, automatically recognizing 40+ languages interspersed throughout any data set with no up-front programming required, and performs at 99.5% word accuracy on first-pass unassisted review. The solution can scale to meet customer requirements of between 500k and 50m pages per day regardless of the legibility of the images. In fact, with BR Adaptive Image Enhancement techniques, restoration of poor quality document images to much improved legibility is a seamless byproduct.


For additional information or to request a demo  contact John Martin at John AT beyond recognition DOT net
or Michael Mulcahy at Michael@focusdata-mgt.com or by phone at 562 546-2465







Thursday, May 31, 2012

ReadySuite’s latest improvements, product briefing and a white paper



ReadySuite 4.2 Now Available
Today we released version 4.2 of ReadySuite, which includes upgrades to the image and text viewers, the addition of redaction capabilities within the image viewer, and improvements in the parsing and output of EDRM XML and Summation DII load files.
ReadySuite users can take advantage of a number of new features and enhancements made to the image and text viewers. Users will now have the ability to edit TIFF and PDF images inline using the image viewer, utilizing various image clean-up, redaction, and rotation options. When saving changes to documents, users can choose from 'Auto-Save', 'Prompt', and 'Read-Only' modes. Other upgrades to the image viewer include new save and print options, improved image display, and easier page navigation.
Further upgrades include new word-wrap and encoding options in the text viewer, support for reading and writing more fields and data types from EDRM XML load files and Summation DII load files, and new customization options. You can find the entire list of updates and improvements at our development blog.
Each new version of ReadySuite is developed to address your needs – and improve upon your requests. So please... keep sending us your suggestions for improvements!
You can purchase a one-year subscription of ReadySuite for $1,179.95 at our online store. If you haven’t bought it already, a free 14-day trial of ReadySuite can be downloaded here.
ILTA Product Briefing
On Friday, June 1 at 12:00 ET, we will be presenting a product briefing to our friends at ILTA. Please join us if you can. In this session, we’ll cover how ReadySuite, the affordable and easy-to-use litigation software, is designed to help you control the quality of your productions and manage various document related tasks, including:
  • Quality checking and validating productions
  • Converting among industry standard load file formats
  • Merging, manipulating, and validating load files
  • Converting among multiple image formats such as TIFF and PDF
  • Applying or removing endorsements from image sets
  • Generating searchable text and PDF files using OCR
  • Batch printing image sets to high capacity printers.
Learn More About Quality Control in e-Discovery
Last week, we released “The Need for e-Discovery Quality Control in the Age of Digital Data,” a free white paper that discusses the current state of quality control in the e-Discovery industry, areas for improvement and a possible solution.
The white paper covers the buildup to the need for better quality control in e-Discovery processes. It includes an overview of recent cases where e-Discovery quality became prominent in the outcome of the matters. It also shares possible solutions for law firms, in-house legal teams and discovery vendors to solve the challenges of managing quality in their own processes.
To download the white paper, click here.

Tuesday, April 17, 2012

BeyondRecognition's Disruptive Technology Recognized by LTN

Like you, I view hundreds of software demos annually, and in my experience BeyondRecognition is one of the few that isn't just the same technology painted a different color. Over the last month I've become more and more enamored with John Martin's game-changing coding and "OCR" software, BeyondRecognition. The more familiar I become with the software's accuracy, speed and potential to search sound and video, the more I truly believe this is a game-changing technology. Quantifiably more accurate than human coding, BeyondRecognition can process 1 million pages per day, which lends itself incredibly well to replacing the latest litigation technology buzzword, "predictive coding," and makes offshore document coding obsolete.




Law Technology News writer Evan Koblentz wrote an illuminating piece for the April 17, 2012 issue of LTN. The article, which you can read in its entirety here, starts out by stating
"Document review that involves optical character recognition may be on the cusp of a new level of accuracy, startup BeyondRecognition asserts." and continues on by saying "His approach is to perform OCR by using methods that make sense for computers, rather than methods that make sense for humans."




I've seen BeyondRecognition in action, and have seen the reaction of people involved in the industry as they "get it". Without any reservation at all, I would recommend you take time from your busy schedule and see this technology in action.

Saturday, December 17, 2011

Introducing ReadySuite 4.0 w/OCR


We’re excited to announce that ReadySuite version 4.0 has been released and is now available for download.
This version includes major new features, performance enhancements, improved stability and general fixes across the board.


Some of the highlights to this release include the new OCR add-on for generating OCR text files and searchable PDFs. Other features include the ability to save project files, specify field data types, and overlay existing documents. Please see our development blog for a more comprehensive list of changes made.
ReadySuite continues to be our flagship product – bundling specialized litigation utilities used by litigation support professionals, attorneys and paralegals. Using ReadySuite, users are able to handle various tasks related to converting and validating common load files, generating OCR text files and creating searchable PDFs, converting various image file formats, performing branding, numbering, and redaction of image sets and batch printing documents.
To learn about other features provided in ReadySuite, visit our product page. Additionally, you may get started with ReadySuite by downloading our 14-day trial or by contacting our sales staff.
Important Upgrade Information
This release of ReadySuite is considered a major release – with significant changes to the core product – and will require an upgrade for existing customers.
ReadySuite licenses issued on or after October 16, 2011 qualify for a free upgrade. Licenses issued before October 16, 2011 require an upgrade purchase for ReadySuite 4.0. Customers with a license to ReadySuite 3.x can upgrade to ReadySuite 4.0 with a 40% discount.

Now that ReadySuite v4.0 has been officially released, we want to highlight some of the important changes made in this release.
New Features:
Added ‘OCR Wizard’ utilizing RecoStar and Tesseract OCR engines
Added ‘Create PDFs Wizard’ for creating searchable PDFs using OCR engine
Added ability to create and save projects
Added ability to auto-save projects on a time based interval
Added ability to set field data types (text, memo, date, number, etc) during import process
Added ‘Manage Fields’ wizard for modifying fields and setting export masks
Added preference to parse Summation field data types or import only as ‘Text’
Added option to specify the ‘DOCID’ field when exporting load files
Added page level information: TIFF Compression, Width (in.), Height (in.), DPI
Added wizard for ‘Import Text Files’ to associate image sets with text files
Added ability to overlay images, text, and/or natives to documents already imported
Added record number output (;Record 1) for Summation load file export
Added ‘Open Output Folder’ links to wizards generating output
Added ‘Path Editor’ for globally editing existing document paths
Added advanced numbering to import wizards with identifier preview
Added preference to disable Page Rows in grid to improve performance/memory usage
Added ‘Memo’ as field data type to improve performance/memory usage
Added ‘Output bad records’ when importing delimited text files
Added check resources option when importing delimited text files
Added ability to parse multi-page and single-page files from a Summation briefcase
Added prompt to save project file when closing application
Added ability to create and delete custom fields using ‘Modify Fields’ dialog
Added ability to set the internal ‘DOCID’ field to a custom field
Added export mask to Number, Date and Boolean fields to change output format
Added page timeout to config settings for RecoStar OCR engine
Added project name and save status to title bar
Added ‘Link In Place’ option when exporting delimited text file
Created 64-bit version – available for download by request
Wizard for ‘Import Delimited Text’ can now populate new fields (useful for importing a “tag list”)
Wizard for ‘Number Documents’ can now track counters by their unique prefix
Wizards for importing Images, Text, Native and Load Files are more streamlined
Important Changes:
Changed folder browser in Wizards to sort alphabetically
Improved warning messages generated by various wizards
Improved error handling in ‘Import Delimited Text’ wizard
Improved remembering last folder paths for folder and file chooser dialogs
Improved drag/drop functionality to remove lock in Windows Explorer
Improved support for creating output with JPEG2000 and JBIG2 options
Improved display of various document and page count numbers
Improved document and page counts in status bar for grid when document set is filtered
Improved ability to sort by field data type in grid
Improved editing fields in grid and metadata panel by data type
Improved ‘Batch Update’ dialog, can search/replace during update and propagate to family
Improved validation of documents during and post import
Improved ability to check for duplicate pages across all imported documents
Improved ability to hash image, text and native files separately
Improved license check with new folder config setting and silent fail option
Improved reading ‘@D @V’ when improperly formatted for Summation DII load files
Improved memory usage and stability throughout application
Improved numbering by prefix
Improved ability to auto-detect file encoding
Improved ability to auto-detect field data types when importing delimited text
Fixed status bar display when document count is 6 digits or more
Fixed sensitivity in Find/Replace dialog
Fixed issue where grid disappears under certain circumstances
Fixed crash with toolbar when application left idle for long duration
Fixed numbering documents when value is only in prefix field
Fixed crash with ‘Trim Documents’ wizard when modifying bounds
Fixed 0kb project files, produced under certain circumstances where saving fails, by saving to a temp file first
Fixed rare crash caused by splash screen
Fixed display of installed licenses, added expiration reminder dialog
Fixed grid filter if count is zero after deleting records
Fixed various tab orders in dialogs and wizards
Note that the above list is not comprehensive, but it includes the more important changes we’ve made since the last release in October.