Legal knowledge is largely expressed in written language, and legal professionals read and write to access, process and reason with the knowledge in texts. Although one can use information extraction to process text on a computer, the text remains a meaningless string of characters to the machine unless something more is added, such as the Semantic Web.
The Semantic Web, an extension of the current World Wide Web, promises to make Web-based documents meaningful to both people and computers by changing how legal knowledge is represented, managed and reasoned with. This article focuses on ontologies, which are one of the means to complete the Semantic Web’s design. It introduces some of the broad concepts of ontologies, indicates some of the sources of further information and tools, then provides a brief example of a legal ontology.
ONTOLOGIES
An ontology represents a common vocabulary and organization of information that explicitly, formally and generally specifies a conceptualization of a given domain. Ontologies are related to knowledge management (cf. Rusanow’s “Knowledge Management and the Smarter Lawyer”) and taxonomies (cf. Sherwin’s article “Legal Taxonomies”). But an ontology is a more specific, explicit and formal representation of knowledge than KM provides, and it is richer and more flexible than a taxonomy.
KM is concerned with how legal professionals share documents and use communication tools like blogs, wikis and e-mail, none of which is the concern of ontologies. Taxonomies do not appear to require an ontology’s logical facilities. In contrast to KM and taxonomies, legal ontologies have not been widely discussed among legal professionals, although they have long been discussed among researchers in artificial intelligence and law (cf. papers by Professor Trevor Bench-Capon).
In making an ontology, one turns tacit expert knowledge into explicit representations that can be shared, tested and modified by people as well as processed by a computer. Ontologies are delimited, representing only some aspects of some domain, though one ontology may relate to another one (e.g., ontologies for intellectual property cases and human rights cases are related to one another and to a global ontology for legal cases).
To specify a conceptualization of a domain, we define classes of individuals (e.g., lawyers and law firms), subclasses of the classes (e.g., partners and associates are subclasses of lawyers), the properties which hold of individuals of a class (e.g., partners have seniority) and relationships among the individuals (e.g., every associate works for a partner, every lawyer works for a firm, every partner of a law firm works for that law firm). We can infer properties by inheritance: for example, since partners are a subclass of lawyers and every lawyer works for a firm, it follows that every partner works for a firm.
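The class hierarchy just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not OWL: the dictionaries `SUBCLASS_OF` and `RESTRICTIONS`, the helper functions and the class `SeniorityLevel` are hypothetical names chosen for the example. It shows how a subclass such as Partner inherits the restriction, stated only for Lawyer, that its members work for a law firm.

```python
# Hypothetical encoding of the law-firm conceptualization; not an OWL API.

# Subclass axioms: partners and associates are subclasses of lawyers.
SUBCLASS_OF = {
    "Partner": "Lawyer",
    "Associate": "Lawyer",
}

# Restrictions declared directly on a class: (property, class of the value).
RESTRICTIONS = {
    "Lawyer": [("worksFor", "LawFirm")],
    "Partner": [("hasSeniority", "SeniorityLevel")],  # SeniorityLevel is hypothetical
    "Associate": [("worksFor", "Partner")],
}


def superclasses(cls):
    """Return cls followed by every class above it in the hierarchy."""
    chain = [cls]
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        chain.append(cls)
    return chain


def inherited_restrictions(cls):
    """Collect the restrictions a class declares itself or inherits from superclasses."""
    result = []
    for c in superclasses(cls):
        result.extend(RESTRICTIONS.get(c, []))
    return result


print(superclasses("Partner"))            # ['Partner', 'Lawyer']
print(inherited_restrictions("Partner"))  # [('hasSeniority', 'SeniorityLevel'), ('worksFor', 'LawFirm')]
```

Nothing about Partner mentions a law firm directly; the `worksFor` restriction arrives only by walking up to Lawyer, which is the inheritance step described in the text.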
Rules for reasoning about cases can be expressed in the terminology of the ontology. With instances of an ontology along with rules, we have a knowledge base from which we can make inferences using the rules. For example, suppose Hale, Williams and Partners is a law firm, and Joan Williams is a partner at Hale, Williams and Partners. Given what we know about law firms, we infer that Joan Williams works at Hale, Williams and Partners and that Joan Williams is a lawyer. To us, these inferences are obvious, but for a computer to make them, the knowledge must be represented explicitly and formally, and the inferences must follow explicit, formal rules. Finally, contemporary ontologies can be used to mark up documents so that a computer can meaningfully access and process the content.
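The Joan Williams example can be made concrete with a small forward-chaining loop, sketched here under the assumption that facts are stored as (subject, predicate, object) triples. The predicate names (`type`, `partnerAt`, `worksAt`) and the rule functions are hypothetical; a real system would use an OWL reasoner or a dedicated rule language rather than hand-written Python.

```python
# Facts are (subject, predicate, object) triples; predicate names are hypothetical.
FACTS = {
    ("Hale, Williams and Partners", "type", "LawFirm"),
    ("Joan Williams", "partnerAt", "Hale, Williams and Partners"),
}


def partner_is_partner(facts):
    # Anyone who is a partner at something is an instance of the class Partner.
    return {(s, "type", "Partner") for (s, p, o) in facts if p == "partnerAt"}


def partner_works_at_firm(facts):
    # A partner at a law firm works at that firm.
    firms = {s for (s, p, o) in facts if p == "type" and o == "LawFirm"}
    return {(s, "worksAt", o) for (s, p, o) in facts if p == "partnerAt" and o in firms}


def partner_is_lawyer(facts):
    # Partner is a subclass of Lawyer, so every partner is a lawyer.
    return {(s, "type", "Lawyer") for (s, p, o) in facts if p == "type" and o == "Partner"}


RULES = [partner_is_partner, partner_works_at_firm, partner_is_lawyer]


def infer(facts, rules):
    """Apply every rule repeatedly until a pass derives nothing new (a fixpoint)."""
    known = set(facts)
    while True:
        new = set()
        for rule in rules:
            new |= rule(known) - known
        if not new:
            return known
        known |= new


for fact in sorted(infer(FACTS, RULES) - FACTS):
    print(fact)
```

The loop derives three new facts: that Joan Williams is a Partner, that she is a Lawyer (which takes a second pass, because it depends on the first), and that she works at Hale, Williams and Partners.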
For small examples and domains, there is little value in making an ontology and inference rules. The information and inferences are readily apparent to us. However, for large corpora and complex domains, the advantages are that knowledge can be systematized, rich patterns of information can be readily extracted, and inferences can be drawn that would not otherwise be apparent.
To develop and use an ontology, there must be a formalized, machine-readable format, tools to create and manage ontologies and draw inferences, and a means to mark up documents using the ontology. One format designed for the Web is the Web Ontology Language (OWL).
Markup that refers to the classes and properties of an ontology, such as one written in OWL, can be used to indicate the semantic role of a section of text. For example, we can take [lawFirmName :: Hale, Williams and Partners] to mean that the text after the colons is an instance of the name of a law firm. In the ontology, the expression “lawFirmName” stands for the class of entities which are names of law firms. The purpose of the markup is to provide a standard form that tells the machine what role the text plays in the knowledge representation of the firm.
OWL also supports a range of logical constructs, such as conjunction (intersection of classes), disjunction (union) and negation (complement), on which automated inference can operate. Marking up text on a page in this way is akin to adding hyperlinks or text styles such as italics: the markup ascribes additional properties or functionality beyond the text itself. With tools such as the free, open-source ontology editor Protégé, one can develop an ontology using graphical representations rather than elaborate markup; the editor comes with additional tools to visualize the ontology or test it for consistency. Finally, Web-based tools such as Semantic MediaWiki, along with the Halo extension, enable one to place documents on the Web that have been marked up using a specified ontology.
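One way to see what such an annotation gives the machine is to turn it into a (class, text) pair and check the class against the ontology's vocabulary. The sketch below assumes the single-bracket, double-colon form used in this article; `ONTOLOGY_CLASSES` and `read_annotation` are hypothetical names, and a real system would look the term up in an ontology file rather than in a Python dictionary.

```python
# Hypothetical vocabulary: ontology term -> human-readable description.
ONTOLOGY_CLASSES = {
    "lawFirmName": "names of law firms",
}


def read_annotation(annotation):
    """Split '[term :: text]' into (term, text) and check the term is declared."""
    inner = annotation.strip()
    if not (inner.startswith("[") and inner.endswith("]")):
        raise ValueError(f"not an annotation: {annotation!r}")
    name, sep, text = inner[1:-1].partition("::")
    if not sep:
        raise ValueError(f"missing '::' in {annotation!r}")
    name, text = name.strip(), text.strip()
    if name not in ONTOLOGY_CLASSES:
        raise KeyError(f"{name!r} is not declared in the ontology")
    return name, text


print(read_annotation("[lawFirmName :: Hale, Williams and Partners]"))
# ('lawFirmName', 'Hale, Williams and Partners')
```

Rejecting undeclared terms is a deliberate choice in this sketch: markup only means something to the machine if its terms come from the shared vocabulary.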
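The logical constructs OWL provides have a set-based reading: a class denotes a set of individuals, conjunction is intersection, disjunction is union and negation is complement. The sketch below illustrates that reading together with a simple disjointness check, one ingredient of the consistency testing mentioned above. The individuals other than Joan Williams (`Lee Park`, `Ana Ruiz`) and all function names are hypothetical, and unlike an OWL reasoner, which makes an open-world assumption, the code treats the listed individuals as the complete domain.

```python
# Hypothetical closed domain of individuals; OWL itself is open-world.
DOMAIN = {"Joan Williams", "Lee Park", "Ana Ruiz", "Hale, Williams and Partners"}

# The extension (set of members) of each named class.
EXTENSIONS = {
    "Lawyer": {"Joan Williams", "Lee Park", "Ana Ruiz"},
    "Partner": {"Joan Williams", "Lee Park"},
    "LawFirm": {"Hale, Williams and Partners"},
}


def evaluate(expr):
    """Evaluate a class name or a nested ('and' | 'or' | 'not', ...) expression."""
    if isinstance(expr, str):
        return EXTENSIONS[expr]
    op, *args = expr
    if op == "and":   # conjunction, cf. owl:intersectionOf
        return set.intersection(*(evaluate(a) for a in args))
    if op == "or":    # disjunction, cf. owl:unionOf
        return set.union(*(evaluate(a) for a in args))
    if op == "not":   # negation, cf. owl:complementOf (relative to DOMAIN here)
        return DOMAIN - evaluate(args[0])
    raise ValueError(f"unknown constructor: {op}")


def disjoint(a, b):
    """Two class expressions are disjoint if they share no members."""
    return not (evaluate(a) & evaluate(b))


print(evaluate(("and", "Lawyer", ("not", "Partner"))))  # {'Ana Ruiz'}
print(disjoint("Lawyer", "LawFirm"))                     # True
```

If the ontology declared Lawyer and LawFirm disjoint and some individual turned up in both, `disjoint` would return `False`, signalling the kind of inconsistency an editor's checker is meant to catch.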
ONTOLOGY FOR CASE LAW
Consider an example ontology for case law. There are various approaches to finding relevant case law: text-mining software, search tools, proprietary indices or legal research summaries. These approaches can extract some latent linguistic information from the text but often require researchers to craft the results. Indeed, successful information extraction depends on an ontology, and as there is not yet a rich ontology of the case law domain, much information in cases cannot be easily extracted or reasoned with. Moreover, none of these approaches applies inference rules.
When reading a case such as Manhattan Loft v. Mercury Liquors, any legal professional can answer elementary questions that a computer cannot:
- Where was the case decided?
- Who were the participants and what roles did they play?
- Was it a case of first instance or on appeal?
- What was the basis of the appeal?
- What were the legal issues at stake?
- What were the facts?
- What factors were relevant in making the decision?
- What was the decision?
- What legislation or case law was cited?
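The questions above can be read as the fields of a structured record that an ontology of cases would need to define. The Python sketch below makes that mapping explicit; `CaseRecord`, its field names and `unanswered` are hypothetical, and the sample values come only from the markup of Manhattan Loft v. Mercury Liquors quoted in this article.

```python
from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class CaseRecord:
    # One hypothetical field per question in the list above.
    court: Optional[str] = None                       # Where was the case decided?
    participants: dict = field(default_factory=dict)  # role -> participant
    on_appeal: Optional[bool] = None                  # First instance or appeal?
    basis_of_appeal: Optional[str] = None
    legal_issues: list = field(default_factory=list)
    facts: list = field(default_factory=list)
    factors: list = field(default_factory=list)       # factors relevant to the decision
    decision: Optional[str] = None
    citations: list = field(default_factory=list)     # legislation or case law cited


def unanswered(record):
    """Return the names of fields that are still empty, i.e. open questions."""
    return [f.name for f in fields(record) if getattr(record, f.name) in (None, [], {})]


case = CaseRecord(
    participants={"plaintiff": "Manhattan Loft", "defendant": "Mercury Liquors"},
    on_appeal=True,  # the decision text refers to an appeal
    decision="Appeal reversed and remanded with directions",
)
print(unanswered(case))
```

Each name printed corresponds to a question that, without markup, a person would still have to answer by reading the case.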
Legal information service providers such as LexisNexis index some of the information and provide it in headnotes, but many of the details, which may be crucial, can only be found by reading the case itself. Current text-mining technologies cannot answer the questions because the information is embedded in the complexities of the language of the case, which computers cannot yet fully parse and understand. Finally, there are relationships among the pieces of information which no current automated system can represent, such as the relationships among case factors or precedential relationships among cases.
To be specific, consider some sample markups. Among the participants, we have those in the role of plaintiff and others in the role of defendant, which we would mark as [plaintiff:: Manhattan Loft] and [defendant:: Mercury Liquors]. There is a decision indicated with [decision:: Appeal reversed and remanded with directions]. We have a legal question [legalQuestion:: Can a party to a pending arbitration record a notice of pendency of action without first filing a civil action in Superior Court?]. In the discussion section, there are references to legislation, which might appear as [legislationReference:: Section 425.16, subdivision (b)(1) of the Anti-SLAPP Statute]. Thus, a computer can search for the term “legalQuestion” within the case to find the content. Further components of a case can similarly be represented, such as the conditions which must be satisfied to meet requirements of a legal rule, or the mitigating or aggravating factors which contribute to the decision. The assumption is that while different cases represent information such as the legal question in different linguistic forms, the markup will remain constant; in this sense, the ontology is an abstract representation of knowledge.
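Searching a marked-up case by property name can be sketched with a regular expression. The sketch assumes the `[property:: value]` form shown above and that values never contain a closing bracket; `index_annotations` and the sample passage, assembled from the markups quoted in this section, are illustrative only.

```python
import re
from collections import defaultdict

# A property name, "::", then the value up to the closing bracket.
ANNOTATION = re.compile(r"\[(\w+)\s*::\s*([^\]]+?)\s*\]")


def index_annotations(text):
    """Map each property name to the list of values annotated with it."""
    index = defaultdict(list)
    for prop, value in ANNOTATION.findall(text):
        index[prop].append(value)
    return dict(index)


# Sample passage assembled from the markups quoted above.
passage = (
    "[plaintiff:: Manhattan Loft] v. [defendant:: Mercury Liquors]. "
    "[legalQuestion:: Can a party to a pending arbitration record a notice of "
    "pendency of action without first filing a civil action in Superior Court?] "
    "[legislationReference:: Section 425.16, subdivision (b)(1) of the Anti-SLAPP Statute] "
    "[decision:: Appeal reversed and remanded with directions]"
)

index = index_annotations(passage)
print(index["legalQuestion"])
print(index["defendant"])  # ['Mercury Liquors']
```

Because the lookup keys on the property name rather than on the wording of the case, the same query would work on any case marked up with the same vocabulary.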
PRACTICAL ISSUES
Several practical issues arise. Any given case contains a range of information that may prove useful to a researcher, and many case decisions are handed down every year. Given this volume and variety, the legal researcher must weigh a fine-grained ontology against a coarse-grained one. This is an ongoing, experimental issue which need not be decided all one way or the other, for there may be a variety of related and integrated ontologies which suit different purposes. Who develops the ontology? Given currently available tools, an exciting option is Web-based collaborative ontology development, where legal professionals contribute to a free, open ontology of the law. Who does the markup and how is it checked? At present, the labor is manual.
Where markup serves as a learning tool for law students or as a research tool, individuals can do the labor using Semantic MediaWiki. On a large scale, legal publishers or government agencies could mark up cases using toolbars integrated with word-processing software, so that each case is marked up as it is written up over the course of the case. Marked-up cases would add enormous value to case corpora for legal professionals, so there is adequate incentive.
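Related ontologies of different granularity can be connected by declaring that a specific property is a special case of a more general one, in the spirit of RDF Schema's `rdfs:subPropertyOf`. The Python sketch below assumes an acyclic mapping; the coarse properties `participant` and `citation`, the fine property `caseLawReference`, and the functions are hypothetical.

```python
# Hypothetical mapping from fine-grained properties to coarser ones; assumed acyclic.
SUBPROPERTY_OF = {
    "plaintiff": "participant",
    "defendant": "participant",
    "legislationReference": "citation",
    "caseLawReference": "citation",
}


def most_general(prop):
    """Follow the mapping upward until reaching a property with no parent."""
    while prop in SUBPROPERTY_OF:
        prop = SUBPROPERTY_OF[prop]
    return prop


def generalize(annotations):
    """Re-express fine-grained (property, value) pairs in the coarse vocabulary."""
    return [(most_general(prop), value) for prop, value in annotations]


fine = [
    ("plaintiff", "Manhattan Loft"),
    ("defendant", "Mercury Liquors"),
    ("decision", "Appeal reversed and remanded with directions"),
]
print(generalize(fine))
# [('participant', 'Manhattan Loft'), ('participant', 'Mercury Liquors'),
#  ('decision', 'Appeal reversed and remanded with directions')]
```

Properties without a mapping, such as `decision` here, pass through unchanged, so a coarse-grained search for every `participant` finds both parties while the finer plaintiff/defendant distinction survives in the original markup.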
Legal ontologies are one of the central elements of managing and automating legal knowledge. With ontologies, the means are available to realize significant portions of the Semantic Web for legal professionals, particularly if an open-source, collaborative approach is taken.
Dr. Adam Zachary Wyner is affiliated with the department of computer science at University College London, London, United Kingdom. He has a Ph.D. in linguistics from Cornell University and a Ph.D. in computer science from King’s College London. He has published on topics in the syntax and semantics of natural language, as well as artificial intelligence and law concerning legal systems, language, logic and argumentation. For further information, see Dr. Wyner’s blog LanguageLogicLawSoftware. He can be contacted by telephone at 00-44-(2)-208-809-3960.