Text-mining software finds patterns and insights in collections of documents, and one of its powerful capabilities is identifying related words. The software groups the words in a corpus (the full set of documents) into topics. Latent Dirichlet Allocation (LDA) is one of the algorithms that carry out such topic modeling. In the legal industry, LDA can work its wonders on any set of documents: client satisfaction comments, documents obtained in discovery or due diligence, annual reports, hotline replies, survey answers, and other sources.

An Example of LDA

Let’s start with an actual set of documents and see how LDA performs. The author gathered self-descriptions by thirty U.S. law firms, of the kind they might use in recruitment brochures. Each self-description runs at least 150 words. After removing trivial words (common words such as “the” and “and” that carry little meaning), we used an LDA implementation from a package for the open-source R language and told it to model five topics. The table below lays out the 10 words the software most closely associated with each topic, in declining order of association within each topic.
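
For readers who want to see the mechanics, the workflow described above (drop trivial words, ask for a fixed number of topics, list the top words per topic) can be sketched in a few lines. This is only an illustration under stated assumptions: it is written in Python rather than R, it uses a small hand-rolled collapsed Gibbs sampler instead of the R package used for the article, and the documents, the stop-word list, and the names tokenize and fit_lda are hypothetical. The toy run asks for two topics and three words each; the article's analysis used five topics and 10 words.

```python
import random

# Hypothetical stop-word list and toy documents; NOT the article's data.
STOP_WORDS = {"the", "a", "an", "and", "of", "in", "to", "our", "we", "for", "is", "with"}
DOCS = [
    "Our litigation team wins trials and appeals for clients in court.",
    "We advise clients in mergers, acquisitions and corporate finance deals.",
    "The trial lawyers handle appeals, court disputes and litigation.",
    "Corporate lawyers close deals in finance and mergers for clients.",
]

def tokenize(text):
    """Lowercase, strip surrounding punctuation, and drop stop words."""
    words = (w.strip(".,;:!?()\"'").lower() for w in text.split())
    return [w for w in words if w and w not in STOP_WORDS]

def fit_lda(docs, num_topics, top_n=10, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return the top words per topic."""
    rng = random.Random(seed)
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({w for doc in tokenized for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)

    # Count tables: topic counts per document, word counts per topic.
    doc_topic = [[0] * num_topics for _ in tokenized]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Start by assigning every word occurrence to a random topic.
    assignments = []
    for d, doc in enumerate(tokenized):
        row = []
        for w in doc:
            z = rng.randrange(num_topics)
            row.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(row)

    # Repeatedly resample each occurrence's topic given all the others.
    for _ in range(iterations):
        for d, doc in enumerate(tokenized):
            for i, w in enumerate(doc):
                wid = word_id[w]
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][wid] -= 1
                topic_total[z] -= 1
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][wid] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][wid] += 1
                topic_total[z] += 1

    # Rank each topic's words by assigned count, highest first.
    top_words = []
    for k in range(num_topics):
        ranked = sorted(range(vocab_size), key=lambda j: topic_word[k][j], reverse=True)
        top_words.append([vocab[j] for j in ranked[:top_n] if topic_word[k][j] > 0])
    return top_words

topics = fit_lda(DOCS, num_topics=2, top_n=3)
for number, words in enumerate(topics, start=1):
    print(f"Topic {number}: {', '.join(words)}")
```

On a corpus this small the topics are unstable and depend on the random seed, so the printout should be read as a demonstration of the output's shape, not as a finding. A production analysis would normally rely on an established library instead of a hand-written sampler.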
