At Knowable, our core mission is to convert clunky contract prose into usable, actionable data points. To achieve that goal, our team developed and continually refines a framework to capture, organize, and categorize those data points: the Knowable data model.
The notion of a data model is one borrowed from the software engineering field. In Knowable parlance, the data model is the foundation needed to disassemble abstract legal clauses into discrete data elements in a systematic and standardized manner. Perhaps an illustration is in order:
"This agreement may be terminated for convenience by either party upon 30 days’ prior written notice."
For practitioners and people familiar with contracts, this simple provision is read and its meaning is immediately understood. The challenge is to convert this straightforward clause into standardized discrete data points which would then enable use of an entirely new data analysis toolkit. So let’s take this clause and break down its meaning:
Taking all this into account, we can now convert our clause into structured data.
By applying the Concept/Sub-Concept/Position framework we defined above, this clause and its constituent parts can be broken down into the following simplified data model:
Structured Data for the Termination Concept

| Termination Sub-Concept | Sample 1 Position |
|---|---|
| Which party may exercise? | Both |
| Preconditions to exercise? | No – without cause |
| Procedural requirements? | 30 Days’ Written Notice |
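For readers who think in code, the table above could be represented roughly as follows. This is only a minimal sketch and not Knowable's actual schema: the class name `TerminationPositions` and its field names are hypothetical, chosen to mirror the three Sub-Concepts in the table.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class TerminationPositions:
    """One contract's Positions for the Termination Concept.

    Each field is a Sub-Concept and each value is the Position taken.
    All names here are hypothetical, chosen only for this sketch.
    """
    which_party_may_exercise: str
    preconditions_to_exercise: str
    procedural_requirements: str


# The Sample 1 clause, recorded with the Positions from the table.
sample_1 = TerminationPositions(
    which_party_may_exercise="Both",
    preconditions_to_exercise="No – without cause",
    procedural_requirements="30 Days’ Written Notice",
)

# Flatten to a plain dict, e.g. as one row of a report or export.
row = asdict(sample_1)
```

Because every contract is recorded with the same fields, rows from different documents can be filtered, counted, and compared directly.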
We now have a standardized method and structure to start collecting impactful data. To illustrate the benefits of standardization, we can introduce another sample contract clause:
Although this clause reads very differently than Sample 1, the underlying substance remains the same and can be reduced to identical data elements in a consistent manner across both documents. This is the core benefit of a standardized structured data model – being able to make meaningful “apples-to-apples” comparisons of Positions across a diverse set of documents.
To illustrate this benefit further, consider that a standardized data model also avoids a problem that many pure software extraction tools and traditional “due diligence” memos face. Variations in language from contract to contract, or in the writing style of individual drafters, may introduce subtle inconsistencies in the output across large populations of documents. This cumulative “drift” and the resulting unnormalized data ultimately prevent a true term-to-term comparison of the underlying legal positions contained within those documents, as illustrated below.
Comparison of Structured Data vs. Unstructured/Verbatim Extraction Data

| Termination Sub-Concept | Standardized Sample 1 and 2 Position | Unstructured/Verbatim Sample 1 Position | Unstructured/Verbatim Sample 2 Position |
|---|---|---|---|
| Which party may exercise? | Both | Either Party | Vendor and Customer |
| Preconditions to exercise? | No – without cause | Convenience | Any or no reason |
| Procedural requirements? | 30 Days’ Written Notice | 30 days’ prior written notice | Written notice not less than 30 days prior to termination |
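The normalization shown in the comparison table can be sketched in a few lines of Python. This is an illustrative toy and not a description of Knowable's tooling: the lookup tables and the helpers `lookup`, `normalize_notice`, and `standardize` are hypothetical, and they cover only the phrases that appear in the table.

```python
import re

# Hypothetical lookup tables: verbatim phrase -> standardized Position.
# Mapping "vendor and customer" to "Both" assumes a two-party contract.
PARTY_POSITIONS = {
    "either party": "Both",
    "vendor and customer": "Both",
}
PRECONDITION_POSITIONS = {
    "convenience": "No – without cause",
    "any or no reason": "No – without cause",
}


def lookup(verbatim, table):
    """Return the standardized Position, or None if the phrase is unknown."""
    key = " ".join(verbatim.lower().split())  # ignore case and extra spaces
    return table.get(key)


def normalize_notice(verbatim):
    """Reduce a written-notice phrase to 'N Days’ Written Notice'."""
    text = verbatim.lower()
    match = re.search(r"(\d+)\s+days", text)
    if match is None or "written" not in text:
        return None  # unrecognized wording: leave for human review
    return f"{int(match.group(1))} Days’ Written Notice"


def standardize(party, precondition, notice):
    """Map one clause's three verbatim extractions to Positions."""
    return (
        lookup(party, PARTY_POSITIONS),
        lookup(precondition, PRECONDITION_POSITIONS),
        normalize_notice(notice),
    )


sample_1 = standardize(
    "Either Party", "Convenience", "30 days’ prior written notice"
)
sample_2 = standardize(
    "Vendor and Customer",
    "Any or no reason",
    "Written notice not less than 30 days prior to termination",
)
```

Under these assumptions, `sample_1` and `sample_2` come out as the same tuple of Positions even though their verbatim text differs. Unknown phrases return `None` rather than a guess, so they can be routed to a reviewer.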
Now that we know what a data model is, we can explore the challenges of effectively developing and deploying one for your own datasets.
At the most basic level, a data modeler needs to decide what data to capture – “What do I need to know?” This might be driven by a business need (What $/widget am I paying?), a legal/risk/compliance obligation (Am I subject to Regulation “X”?), or workflows and administrative concerns (What Vendor # applies to this Purchase Order?).
Returning to our prior example, at a broad level we identified that we want to know about the Termination rights under an agreement and we’ve identified the right “altitude” to capture individual legal terms – the “Position.” You might even think the data model we created is pretty good as it tells us something about the parties involved, the conditions around exercising termination, and what affirmative actions we need to take to end the agreement. The problem is that you “may not know what you don’t know” without the benefit of practical experience or legal subject-matter expertise.
To illustrate, in our example we glossed over the fact that the mere existence of the Termination clause is an important data point in itself – “silence” has its own legal significance and, from a reporting and analytics perspective, we can use that data point as a filter to isolate and examine a subset of the document population. In addition, for any given Position, clear interpretive guidance may be required to parse out subtle textual nuances to arrive at the accurate underlying legal meaning (e.g., termination “without cause” = “for convenience” = “for any reason” = “at sole discretion”, etc.).
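As a rough sketch of using “silence” as a filter, suppose each document maps either to its Termination data or to `None` when the contract contains no Termination clause at all. The document IDs, the `portfolio` structure, and the function name below are invented for illustration.

```python
def silent_documents(portfolio):
    """Return the IDs of documents that have no Termination clause."""
    return [
        doc_id
        for doc_id, termination in portfolio.items()
        if termination is None
    ]


# Invented example data: None marks a contract that is silent on Termination.
portfolio = {
    "MSA-001": {"which_party_may_exercise": "Both"},
    "NDA-002": None,
    "SOW-003": {"which_party_may_exercise": "Both"},
}

silent = silent_documents(portfolio)
```

Recording absence explicitly, rather than simply omitting the document, is what makes this subset easy to isolate.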
To give you a sense of the potential complexity of data models, consider that Knowable’s Termination data model alone consists of 14 Sub-Concepts and 45 distinct Positions to accurately capture each and every permutation reflected in real-world contracting situations. The goal of such a detailed data model is to (i) maintain fidelity to the underlying legal meaning in contracts, (ii) permit the manipulation and reporting of individual data elements, (iii) enable our proprietary machine learning and artificial intelligence algorithms, and (iv) allow for flexibility in how deeply or shallowly a client would like to capture data relating to Termination.
Some of these Sub-Concepts and Positions address items of significant economic consequence (e.g., early termination fees, notice obligations); others carry legal weight (e.g., conditional termination in the event of breach). Still others enable powerful reporting and analytics. For example, instead of capturing a notice period of “30 days” verbatim as text, breaking that static text into the “30” integer and “day” denomination elements allows the notice period to be converted into other units of time. This simple ability to convert units of time enables the export of notice due dates, manually or via API, into a calendar/docketing application, or permits broader implementation actions in CLM systems that lack such capability out of the box.
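To make the notice-period point concrete, here is a minimal sketch of storing the amount and the unit separately and deriving a date from them. The `NoticePeriod` class, the `DAYS_PER_UNIT` table, and `latest_notice_date` are hypothetical names. The sketch covers only days and weeks, because months and years vary in length and would need calendar-aware handling.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Fixed-length units only; months and years are deliberately left out.
DAYS_PER_UNIT = {"day": 1, "week": 7}


@dataclass(frozen=True)
class NoticePeriod:
    amount: int  # e.g. 30
    unit: str    # e.g. "day"

    def to_days(self):
        """Convert the period to a whole number of days."""
        return self.amount * DAYS_PER_UNIT[self.unit]


def latest_notice_date(termination_date, period):
    """Last date on which notice can be given for the desired end date."""
    return termination_date - timedelta(days=period.to_days())


period = NoticePeriod(amount=30, unit="day")
deadline = latest_notice_date(date(2024, 3, 31), period)  # date(2024, 3, 1)
```

A date like `deadline` is the kind of value that could then be pushed to a calendar or docketing tool. A verbatim string such as “30 days” would first have to be parsed before any of this arithmetic is possible.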
Thoughtful data model design also considers efficiency in creating the structured data with the goal of providing cost-effective solutions for clients. To illustrate one efficiency-seeking example, Knowable analyzed frequency distributions across thousands of agreements to isolate the most common Positions for a given Sub-Concept, avoiding a “long tail” of edge cases and outliers.
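One plausible way to run this kind of frequency analysis is sketched below. It is not Knowable's actual method: the function `common_positions`, the `coverage` threshold, and the sample observations are all invented for illustration.

```python
from collections import Counter


def common_positions(observed, coverage=0.85):
    """Pick the most frequent Positions until `coverage` of all
    observations is accounted for; the rest form the long tail."""
    counts = Counter(observed)
    total = sum(counts.values())
    selected, covered = [], 0
    for position, count in counts.most_common():
        if covered / total >= coverage:
            break  # enough of the population is already covered
        selected.append(position)
        covered += count
    return selected


# Invented observations for one Sub-Concept across 100 agreements.
observed = (
    ["Both"] * 70
    + ["Customer only"] * 20
    + ["Vendor only"] * 8
    + ["Other"] * 2
)

core = common_positions(observed)
```

With this invented data, two Positions already cover 90% of the agreements, so the remaining rare variants could be handled as exceptions rather than modeled individually.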
Finally, good data model design should also consider the accuracy and quality of the output. Knowable provides a 98% accuracy guarantee that is partially enabled by intelligent data modeling. For example, Knowable data modelers leverage dependent relationships between contractual positions to ensure that no conflicting data exists in a dataset. Revisiting our Termination example: if a contract permits termination without the need to provide notice, that condition cannot coexist with a determination that 30 days is the applicable notice period. Knowable’s tools, processes, and workflows would either prevent the input of conflicting data or escalate the contradiction to a quality control specialist to resolve it.
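The dependency described above can be expressed as a simple validation rule. The sketch below is only an illustration of the idea, not Knowable's tooling: `TerminationEntry`, its fields, and `find_conflicts` are hypothetical names, and only the single rule from the example is implemented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TerminationEntry:
    notice_required: bool
    notice_period_days: Optional[int] = None


def find_conflicts(entry):
    """Return a description of each contradiction in the entry."""
    conflicts = []
    # Rule from the example: "no notice required" cannot coexist
    # with a recorded notice period.
    if not entry.notice_required and entry.notice_period_days is not None:
        conflicts.append(
            "notice period recorded although no notice is required"
        )
    return conflicts


consistent = TerminationEntry(notice_required=True, notice_period_days=30)
contradictory = TerminationEntry(notice_required=False, notice_period_days=30)

issues = find_conflicts(contradictory)
```

A non-empty result could either block the entry at input time or be routed to a reviewer, mirroring the two outcomes described above.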
Well-thought-out data modeling is an integral step in faithfully distilling complex legal prose into structured data, which in turn enables clear insights and analytics to better inform business decisions and outcomes. To learn more, please reach out to email@example.com.
Mike Kim is Head of Legal Operations and General Counsel at Knowable, where he currently focuses on driving productivity and efficiency improvements in direct support of Knowable’s mission to structure data from legal prose. Prior to his current role, Mike served as a Director - Client Service, where his responsibilities included leadership of a cross-functional data governance team, serving as the M&A substantive subject matter expert on client engagements, and general oversight of the service delivery organization.
Prior to joining Knowable, Mike practiced law in both large law firm and corporate settings, most recently as General Counsel for The Master Lock Company, a division of Fortune Brands, a Fortune 500 home and security consumer products company. Based at Knowable’s Chicago Center of Excellence, Mike is a lifelong Chicagoan and outside of work enjoys spending time outdoors with his wife and three children.