Trials and Translation

International companies face e-discovery challenges with multilingual data.

December 31, 2008 at 07:00 PM

Managing e-discovery is complex enough in English, with cases such as Victor Stanley Inc. v. Creative Pipe Inc. emphasizing the dangers of failing to maintain and document defensible production methods. When a matter involves global business with electronically stored information (ESI) in multiple languages, the problems multiply. In-house counsel involved in multinational e-discovery must deal with a host of technical issues as well as legal complications in foreign jurisdictions.

E-discovery companies are scrambling to present solutions, establish overseas operations to meet demand and develop technologies to enable collection, processing, review and production of multilingual data as the need becomes more pressing. Multilingual e-discovery is even developing its own jargon, including LOTE for “languages other than English” and CJK for Chinese, Japanese and Korean, three languages whose large character sets pose distinct challenges.

“Foreign-language documents have become an integral part of the e-discovery landscape,” says attorney John Tredennick, CEO of Catalyst Repository Systems Inc., which provides secure data repositories. Tredennick estimates that more than 50 percent of the data his company processes now involves LOTE. “But treating foreign-language ESI as if it is just another component of the discovery process can get you in real trouble.”

Tredennick says improper collection of data in the EU can land you in jail. “Likewise, processing Chinese or Japanese e-mail using software not built for CJK languages will result in gibberish and a potential data spoliation claim,” he adds.

Tower of Babel

To appreciate the technical issues in multilingual e-discovery, the first step is to learn how computers handle different writing systems. The relatively small set of letters, special characters and punctuation marks used by European languages such as Spanish, French and German makes computer processing relatively simple. The CJK languages, by contrast, draw on tens of thousands of characters, and Chinese and Japanese text is typically written without spaces between words, so candidate words often overlap. Languages such as Hebrew and Arabic that read right to left pose other issues.
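Because Chinese and Japanese text carries no spaces, software has to decide where one word ends and the next begins. The sketch below shows one simple approach, dictionary-based forward maximum matching. The tiny lexicon and the function name `segment` are illustrative assumptions rather than any vendor's method, and production systems generally rely on far larger dictionaries or statistical models.

```python
# Hypothetical mini-lexicon; a real system would load a large dictionary.
LEXICON = {"我", "喜欢", "北京", "大学", "北京大学"}


def segment(text, lexicon, max_len=4):
    """Split unspaced text by always taking the longest dictionary match."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink one character at a time.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is accepted as a fallback token.
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
    return tokens


print(segment("我喜欢北京大学", LEXICON))
```

In this example the overlap between 北京 ("Beijing"), 大学 ("university") and 北京大学 ("Peking University") is resolved by preferring the longest dictionary entry.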

Until recently, there was no global standard for encoding languages so that computers could recognize them. The American Standard Code for Information Interchange (ASCII) defines just 128 characters, and the 8-bit extensions built on it allow only 256 slots for letters, numbers, special characters and punctuation marks. This was sufficient for English and Western European languages but inadequate for many others. Cross-reference tables known as code pages, developed to allow ASCII-based computers to recognize more languages, are still used to store legacy data today. But because there is no universal set of code pages for all computers, text written with one code page may be unreadable on a computer that assumes another.
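A short sketch makes the code page problem concrete. It stores a Japanese word the way a legacy Shift_JIS system would, then reads the same bytes back with a Western European code page. The sample word and variable names are illustrative assumptions.

```python
ORIGINAL = "日本"  # "Japan"

# Bytes as a legacy system using the Shift_JIS code page would store them.
raw = ORIGINAL.encode("shift_jis")

# Decoding with the same code page recovers the text.
round_trip = raw.decode("shift_jis")

# Decoding with a Western European code page raises no error here,
# but it produces the wrong characters.
misread = raw.decode("cp1252", errors="replace")

print(raw.hex())
print(round_trip)
print(misread)
```

The bytes never change. Only the table used to interpret them does, which is why the code page in use at collection time needs to be recorded.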

Fortunately, within the past decade computer hardware and software makers have adopted a global standard known as Unicode, whose encodings are called Unicode Transformation Formats (UTF), such as UTF-8 and UTF-16. Unicode has room for more than 1.1 million characters, well beyond the 100,000 or so characters currently in use around the world.
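The "more than 1.1 million" figure follows from the size of the Unicode code space, which is divided into 17 planes of 65,536 code points each. The arithmetic, along with the variable number of bytes UTF-8 uses per character, can be checked in a few lines. The helper name `utf8_length` is an illustrative assumption.

```python
import sys

PLANES = 17
CODE_POINTS_PER_PLANE = 2 ** 16  # 65,536

total_code_points = PLANES * CODE_POINTS_PER_PLANE
print(total_code_points)


def utf8_length(ch):
    """Number of bytes UTF-8 needs to store a single character."""
    return len(ch.encode("utf-8"))


# English letters take one byte; CJK characters typically take three.
for ch in ("A", "é", "日"):
    print(ch, utf8_length(ch))
```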

But enough systems with code pages still exist that it's important to determine whether all data that needs to be reviewed is maintained in UTF. If any of it remains stored on systems using code pages, the e-discovery processing system must be able to support both legacy code pages and UTF. The system also must be able to identify the beginnings and ends of words and sentences in CJK languages. Some search engines can separate overlapping words and phrases by their context, a process known as tokenization.
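One way to picture a processing step that supports both UTF and legacy code pages is a decoder that tries strict UTF-8 first and falls back to a list of candidate code pages. This is a minimal sketch under stated assumptions: the function name `decode_with_fallback` and the candidate list are hypothetical, and real tools usually combine collection metadata with statistical detection rather than trial and error.

```python
def decode_with_fallback(raw, legacy_codecs=("shift_jis", "gb18030", "cp1252")):
    """Return (text, codec_used), trying strict UTF-8 before legacy code pages."""
    for codec in ("utf-8", *legacy_codecs):
        try:
            return raw.decode(codec), codec
        except UnicodeDecodeError:
            continue
    # Nothing decoded cleanly: keep going, but flag the result as lossy.
    return raw.decode("utf-8", errors="replace"), "utf-8 (lossy)"


utf8_bytes = "日本語".encode("utf-8")
legacy_bytes = "日本語".encode("shift_jis")

print(decode_with_fallback(utf8_bytes))
print(decode_with_fallback(legacy_bytes))
```

The order of the candidates matters. Single-byte code pages such as cp1252 accept almost any byte sequence, so they belong at the end of the list, and a successful decode is not proof that the right code page was chosen.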

Cross-Border Complications

The first stage of e-discovery, the collection of potentially discoverable ESI, is complicated by privacy laws. For example, the EU has strict privacy laws limiting the transfer of personal information across borders. But the EU and the U.S. have a “safe harbor” agreement allowing companies to transfer personal data out of the EU if they certify that they will provide adequate privacy protection. One solution is Web-enabled software that allows document collections hosted in one country to be securely accessed by legal teams elsewhere.

“The documents can be read online but they always remain stored on servers in the country of origin without being transferred, cached or downloaded to computers outside those countries,” says Ian Campbell, COO of iCONECT, a litigation support software developer.

In addition to ensuring that your e-discovery collection team is safe harbor-certified, review workflows and technical data formats so that the collection tools do not corrupt the data that is ultimately processed.

“When data is collected improperly, there often will be no way to salvage it when it comes time to process and review it, or if there is, the process can be extremely difficult and costly,” says Greg Neustaetter, senior product manager at Stratify, an e-discovery vendor.

Man vs. Machine

Once collection is completed, e-discovery specialists suggest identifying all the languages contained in the potentially discoverable data to ensure the appropriate software is used to sort and process it.
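A rough first pass at identifying what a collection contains is to count which writing systems appear in each document. The sketch below uses the first word of each letter's Unicode character name as a stand-in for its script. The function name `script_profile` is an illustrative assumption, and a script is not a language: Latin letters alone cannot distinguish English from Spanish, so a dedicated language-identification tool would still be needed.

```python
import unicodedata
from collections import Counter


def script_profile(text):
    """Count letters by the first word of their Unicode character name."""
    counts = Counter()
    for ch in text:
        if not ch.isalpha():
            continue  # skip spaces, digits and punctuation
        name = unicodedata.name(ch, "UNKNOWN")
        counts[name.split()[0]] += 1
    return counts


print(script_profile("Hello 日本語 한국어"))
```

A document whose profile is dominated by CJK or HANGUL letters can then be routed to software built for those languages.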

“E-discovery filtering and processing is the systematic way of reducing a data set, converting the documents to a standard file format and gathering the metadata and extracted text for review,” says Michelle Lange, director of e-discovery at Kroll Ontrack.

Translation software is a cost-effective first step in processing, particularly where there is a large volume of data. Though not as accurate as human translation, the software is good enough to sort which e-mails discuss lunch plans and which go to the heart of the litigation.

“Compare $15 per page for human translation to 15 cents using translation software, and the difference in expense can be staggering,” says Tredennick.
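To see how the per-page rates scale, the sketch below applies them to a hypothetical matter. The page counts are invented for illustration, and only the $15 and 15-cent rates come from the quote. It compares translating everything by hand with using machine translation as a first pass and reserving human translation for the pages that survive triage.

```python
HUMAN_CENTS_PER_PAGE = 1500   # $15 per page, as quoted
MACHINE_CENTS_PER_PAGE = 15   # 15 cents per page, as quoted


def translation_costs(total_pages, pages_kept_after_triage):
    """Return costs in whole dollars for three translation strategies."""
    human_only = total_pages * HUMAN_CENTS_PER_PAGE
    machine_only = total_pages * MACHINE_CENTS_PER_PAGE
    # Machine pass over everything, then human translation of what remains.
    two_stage = machine_only + pages_kept_after_triage * HUMAN_CENTS_PER_PAGE
    return {
        "human_only": human_only // 100,
        "machine_only": machine_only // 100,
        "two_stage": two_stage // 100,
    }


# Hypothetical volumes: 100,000 pages collected, 5,000 kept after triage.
costs = translation_costs(100_000, 5_000)
print(costs)
```

Under these assumed volumes, human translation of everything would cost $1.5 million, the machine pass alone $15,000, and the two-stage approach $90,000.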

After machine translation is used to eliminate irrelevant documents, human translators should take over to guarantee an accurate translation if English-speaking attorneys are viewing the remaining documents. Alternatively, attorneys with an understanding of the relevant languages and cultures may review the documents.

“It is important to select a human processing team familiar with a country's practices and cultural differences when doing e-discovery review outside the U.S.,” says David Chaumette, a partner at Baker & McKenzie. The use of idiom and colloquialism in e-mails and text messages requires a review team familiar with the unique practices in each country, Chaumette adds.
