Trials and Translation
International companies face e-discovery challenges with multilingual data.
December 31, 2008 at 07:00 PM
Managing e-discovery is complex enough in English, with cases such as Victor Stanley Inc. v. Creative Pipe Inc. emphasizing the dangers of failing to maintain and document defensible production methods. When a matter involves global business with electronically stored information (ESI) in multiple languages, the problems multiply. In-house counsel involved in multinational e-discovery must deal with a host of technical issues as well as legal complications in foreign jurisdictions.
E-discovery companies are scrambling to present solutions, establish overseas operations to meet demand and develop technologies to enable collection, processing, review and production of multilingual data as the need becomes more pressing. Multilingual e-discovery is even developing its own jargon, including LOTE to refer collectively to “languages other than English,” and CJK to refer to Chinese, Japanese and Korean, languages with special characters that pose unique challenges.
“Foreign-language documents have become an integral part of the e-discovery landscape,” says attorney John Tredennick, CEO of Catalyst Repository Systems Inc., which provides secure data repositories. Tredennick estimates that more than 50 percent of the data his company processes now involves LOTE. “But treating foreign-language ESI as if it is just another component of the discovery process can get you in real trouble.”
Tredennick says improper collection of data in the EU can land you in jail. “Likewise, processing Chinese or Japanese e-mail using software not built for CJK languages will result in gibberish and a potential data spoliation claim,” he adds.
Tower of Babel
To appreciate the technical issues in multilingual e-discovery, the first step is to learn how computers recognize different language structures. The relatively small collection of letters, special characters and punctuation marks used in languages such as Spanish, French and German makes computer processing relatively simple. On the other hand, the CJK languages have tens of thousands of characters, and Chinese and Japanese text is typically written without spaces between words, so where one word ends and the next begins can be ambiguous. Languages such as Hebrew and Arabic that read right to left pose other issues.
Until recently, there was no global standard for coding languages so that they can be recognized by computers. The American Standard Code for Information Interchange (ASCII) defines only 128 characters, and the 8-bit extensions built on it allow just 256 slots for letters, numbers, special characters and punctuation marks in each language. This was sufficient for English and Western European languages but inadequate for many others. Cross-reference tables known as code pages, developed to allow ASCII-based computers to recognize more languages, are still used to store legacy data today. But because there is no universal set of code pages for all computers, data stored with a code page on one computer may be unreadable on another.
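The code page problem can be reproduced in a few lines. The sketch below is illustrative only: the sample text and the two code pages (Shift_JIS for Japanese, Windows-1252 for Western European languages) are assumptions chosen for the demonstration, not details from any particular matter.

```python
# Illustrative sketch: the sample text and code page choices are assumptions.
text = "日本語"  # the word for "Japanese language"

# Store the text as bytes using Shift_JIS, a legacy Japanese code page.
stored = text.encode("shift_jis")

# Reading the bytes back with the same code page recovers the original text.
correct = stored.decode("shift_jis")

# Reading the same bytes with a Western European code page produces gibberish.
# errors="replace" keeps the decode from failing on any undefined byte values.
garbled = stored.decode("cp1252", errors="replace")

print("stored bytes:", stored.hex(" "))
print("right code page:", correct)
print("wrong code page:", garbled)
```

The bytes never change. Only the table used to interpret them does, which is why a processing tool has to know or detect the code page a legacy file was written in.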
Fortunately, within the past decade computer hardware and software makers have adopted a global standard known as Unicode, which is typically stored in one of the Unicode Transformation Format (UTF) encodings such as UTF-8 or UTF-16. Unicode has the capacity to support more than 1.1 million characters, well beyond the 100,000 or so characters currently in use around the world.
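As a rough check on that capacity figure, Python reports the highest character number it supports, and encoding a few characters shows how many bytes each one takes in UTF-8. The sample characters are assumptions chosen for illustration.

```python
import sys

# Python exposes the highest Unicode code point it supports (0x10FFFF).
total_code_points = sys.maxunicode + 1


def utf8_length(ch: str) -> int:
    """Return the number of bytes UTF-8 uses to store one character."""
    return len(ch.encode("utf-8"))


# Sample characters chosen for illustration: English, French, Chinese.
samples = {ch: utf8_length(ch) for ch in ["A", "é", "中"]}

print("code points available:", total_code_points)
print("UTF-8 bytes per character:", samples)
```

Plain English letters take one byte each in UTF-8, while accented and CJK characters take more, so one encoding can cover every language without a separate table for each.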
But enough systems with code pages still exist that it's important to determine whether all data that needs to be reviewed is maintained in UTF. If any of it remains stored on systems using code pages, the e-discovery processing system must be able to support both legacy code pages and UTF. The system also must be able to identify the beginnings and ends of words and sentences in CJK languages. Some search engines can separate overlapping words and phrases by their context, a process known as tokenization.
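To give a feel for tokenization, here is a minimal sketch of one simple dictionary-based technique, forward maximum matching. It is a simplification, not the context-based method commercial search engines use, and the function name, toy dictionary and sample sentence are all hypothetical.

```python
def tokenize(text, dictionary, max_len=4):
    """Split unspaced text by greedily taking the longest dictionary match."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a match is found.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted as a fallback token.
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical toy dictionary: "electronic", "mail", "e-mail", "contract", "we".
toy_dictionary = {"电子", "邮件", "电子邮件", "合同", "我们"}

# Sample sentence meaning "we send e-mail", written without spaces.
tokens = tokenize("我们发送电子邮件", toy_dictionary)
print(tokens)
```

The toy dictionary contains "电子" (electronic), "邮件" (mail) and the combined "电子邮件" (e-mail). The tokenizer prefers the longest match, so it returns the combined word, which shows why overlapping candidates have to be resolved before a keyword search can be trusted.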
Cross-Border Complications
The first stage of e-discovery, collection of the potentially discoverable ESI, is complicated by privacy laws. For example, the EU has strict privacy laws limiting the transfer of personal information across borders. But the EU and the U.S. have a “safe harbor” agreement allowing companies to transfer personal data out of the EU if they certify that they will provide adequate privacy protection. One solution is Web-enabled software that allows document collections hosted in one country to be securely accessed by legal teams elsewhere.
“The documents can be read online but they always remain stored on servers in the country of origin without being transferred, cached or downloaded to computers outside those countries,” says Ian Campbell, COO of iCONECT, a litigation support software developer.
In addition to ensuring that your e-discovery collection team is safe harbor-certified, review the workflow and technical data formats so that the collection tools used do not corrupt the data that is ultimately processed.
“When data is collected improperly, there often will be no way to salvage it when it comes time to process and review it, or if there is, the process can be extremely difficult and costly,” says Greg Neustaetter, senior product manager at Stratify, an e-discovery vendor.
Man vs. Machine
Once collection is completed, e-discovery specialists suggest identifying all the languages contained in the potentially discoverable data to ensure the appropriate software is used to sort and process it.
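A rough first pass at that inventory can be sketched by counting the writing systems that appear in a document's text. This identifies scripts, not languages: it cannot tell French from German, or Chinese from the Chinese-derived characters used in Japanese. The function name and sample string below are hypothetical.

```python
import unicodedata
from collections import Counter


def script_counts(text):
    """Count letters by the first word of their Unicode character name."""
    counts = Counter()
    for ch in text:
        if not ch.isalpha():
            continue  # skip spaces, digits and punctuation
        name = unicodedata.name(ch, "")
        if name:
            # e.g. "LATIN SMALL LETTER A" -> "LATIN"
            counts[name.split()[0]] += 1
    return counts


# Hypothetical sample: the word "contract" in English, Japanese, Korean, Arabic.
sample = "Contract 契約 계약 عقد"
counts = script_counts(sample)
print(dict(counts))
```

A tally like this can flag which files need CJK-capable or right-to-left-capable tools before processing starts. Dedicated language identification software would still be needed to separate languages that share a script.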
“E-discovery filtering and processing is the systematic way of reducing a data set, converting the documents to a standard file format and gathering the metadata and extracted text for review,” says Michelle Lange, director of e-discovery at Kroll Ontrack.
Translation software is a cost-effective first step in processing, particularly where there is a large volume of data. Though not as accurate as human translation, the software is good enough to sort which e-mails discuss lunch plans and which go to the heart of the litigation.
“Compare $15 per page for human translation to 15 cents using translation software, and the difference in expense can be staggering,” says Tredennick.
After machine translation is used to eliminate irrelevant documents, human translators should take over to guarantee an accurate translation if English-speaking attorneys are viewing the remaining documents. Alternatively, attorneys with an understanding of the relevant languages and cultures may review the documents.
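The arithmetic behind that two-stage approach is easy to work through. The per-page rates below are the ones Tredennick quotes. The page counts are hypothetical, chosen only to show the scale of the difference.

```python
HUMAN_CENTS_PER_PAGE = 1500   # $15 per page, as quoted in the article
MACHINE_CENTS_PER_PAGE = 15   # 15 cents per page, as quoted in the article


def cost_dollars(pages, cents_per_page):
    """Return the translation cost in dollars for a number of pages."""
    return pages * cents_per_page / 100


pages = 100_000          # hypothetical size of the collection
relevant_pages = 10_000  # hypothetical number left after machine triage

human_only = cost_dollars(pages, HUMAN_CENTS_PER_PAGE)
machine_only = cost_dollars(pages, MACHINE_CENTS_PER_PAGE)

# Two-stage approach: machine-translate everything, then have humans
# translate only the documents that survive the first pass.
tiered = machine_only + cost_dollars(relevant_pages, HUMAN_CENTS_PER_PAGE)

print(f"human only:   ${human_only:,.0f}")
print(f"machine only: ${machine_only:,.0f}")
print(f"two-stage:    ${tiered:,.0f}")
```

Under these assumed volumes, human translation of everything would cost $1.5 million, against $165,000 for the two-stage approach. The actual savings depend on how much of the collection turns out to be relevant.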
“It is important to select a human processing team familiar with a country's practices and cultural differences when doing e-discovery review outside the U.S.,” says David Chaumette, a partner at Baker & McKenzie. The use of idiom and colloquialism in e-mails and text messages requires a review team familiar with the unique practices in each country, Chaumette adds.