The Anatomy of Contract Data

The value of a company’s contract portfolio lies less in the ability to answer questions about a specific document than in the ability to answer enterprise-wide questions about risk, obligations, and entitlements. Structured data is the key to unlocking that value and to understanding (and acting upon) a portfolio of agreements.

In this post, we will explore one foundational tool used to organize and frame contract data – the data model – and why data modeling is trickier than you think. Along the way, we’ll consider some best practices for developing data models and how careful forethought in data modeling yields better structured data outcomes.


At Knowable, our core mission is to convert clunky contract prose into usable and actionable points of data.  In order to achieve that goal, our team developed and continually refines a framework to capture, organize, and categorize data points – the Knowable data model.

The notion of a data model is one borrowed from the software engineering field.  In Knowable parlance, the data model is the foundation needed to disassemble abstract legal clauses into discrete data elements in a systematic and standardized manner.  Perhaps an illustration is in order:

Anatomy of a Contract Clause

"This agreement may be terminated for convenience by either party upon 30 days’ prior written notice."

Practitioners and others familiar with contracts read this simple provision and immediately understand its meaning. The challenge is to convert this straightforward clause into standardized, discrete data points, which in turn opens up an entirely new data analysis toolkit. So let’s take this clause and break down its meaning:

  • Concept:  What legal topic are we broadly addressing with this provision?  “Termination” – easy enough.
  • Sub-Concept:  What aspect of Termination is specifically addressed here?  Well, there are a few to consider:
    • Which party may exercise the right to terminate?
    • What preconditions or justifications are required in order to exercise the termination right?
    • What procedural requirements are necessary to exercise the right to terminate?
  • Positions:  Finally, once the Sub-Concepts are identified, we need to reduce the prose down to the individual terms, or Positions, contained in the clause.  Regardless of how eloquent the prose within a contract may be, the underlying Positions are the heart of negotiations between contractual parties and ultimately state the substantive rights, obligations, entitlements, representations, and acknowledgements of each party.  Positions make up the individual data element building blocks that, when assembled at the document level, convey the legal relationship between contractual parties.  Broadening the scope, these positional building blocks can be further constructed into portfolio-level contract datasets which enable analytics that provide insights at the enterprise-level.  Examples of Positions include:
    • The names of the parties permitted to exercise a right to terminate.
    • Whether or not there must be “cause” to permit termination.
    • Whether notice is required to effectuate termination and the manner of such notice.

Illustration of a Basic Data Model

Taking all this into account, we can now convert our clause into structured data.

Sample 1: This agreement may be terminated for convenience by either party upon 30 days’ prior written notice.

By applying the Concept/Sub-Concept/Position framework we defined above, this clause and its constituent parts can be broken down into the following simplified data model:

Structured Data for the Termination Concept

Termination Sub-Concept    | Sample 1 Position
Which party may exercise?  | Both
Preconditions to exercise? | No – without cause
Procedural requirements?   | 30 Days’ Written Notice
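For readers who think in code, the simplified data model above could be sketched roughly as follows. This is only an illustration: the class names, field names, and the enum values other than "Both" are assumptions made for this example and are not the actual Knowable data model.

```python
from dataclasses import dataclass
from enum import Enum


class ExercisingParty(Enum):
    """Hypothetical Positions for the Sub-Concept 'Which party may exercise?'."""
    BOTH = "Both"
    VENDOR_ONLY = "Vendor only"      # assumed value, not from the article
    CUSTOMER_ONLY = "Customer only"  # assumed value, not from the article


@dataclass(frozen=True)
class TerminationRecord:
    """One contract's Positions under the Termination Concept (simplified)."""
    exercising_party: ExercisingParty  # Which party may exercise?
    cause_required: bool               # Preconditions to exercise?
    notice_form: str                   # Procedural requirements: form of notice
    notice_period_days: int            # Procedural requirements: length of notice


# Sample 1: "...terminated for convenience by either party upon
# 30 days' prior written notice."
sample_1 = TerminationRecord(
    exercising_party=ExercisingParty.BOTH,
    cause_required=False,
    notice_form="written",
    notice_period_days=30,
)

print(sample_1)
```

Each field holds a single Position, so the record can be filtered, sorted, and compared in ways the original sentence cannot.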


We now have a standardized method and structure to start collecting impactful data.  To illustrate the benefits of standardization, we can introduce another sample contract clause:

Sample 2: Vendor and customer shall each have the unrestricted right to terminate the agreement, for any reason or no reason, by service of written notice of termination to the other party not less than 30 days prior to the effective date of such termination.

Although this clause reads very differently than Sample 1, the underlying substance remains the same and can be reduced to identical data elements in a consistent manner across both documents. This is the core benefit of a standardized structured data model – being able to make meaningful “apples-to-apples” comparisons of Positions across a diverse set of documents.

To illustrate this benefit further, consider that a standardized data model also avoids a problem that many pure software extraction tools and traditional “due diligence” memos face. Variations in language from contract to contract, or in the writing style of individual drafters, may introduce subtle inconsistencies into the output across large populations of documents. This cumulative “drift” and the introduction of unnormalized data ultimately prevent a true term-to-term comparison of the underlying legal positions contained within those documents, as illustrated below.

Comparison of Structured Data vs. Unstructured/Verbatim Extraction Data

Termination Sub-Concept    | Standardized Sample 1 and 2 Position | Unstructured/Verbatim Sample 1 Position | Unstructured/Verbatim Sample 2 Position
Which party may exercise?  | Both                                 | Either Party                            | Vendor and Customer
Preconditions to exercise? | No – without cause                   | Convenience                             | Any or no reason
Procedural requirements?   | 30 Days’ Written Notice              | 30 days’ prior written notice           | Written notice not less than 30 days prior to termination
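The practical effect of that comparison can be shown with a small sketch. The values are taken from the table above; the dictionary layout and the `distinct_values` helper are assumptions made for this example.

```python
# Standardized Positions: Sample 1 and Sample 2 reduce to identical values.
standardized = {
    "Sample 1": {"party": "Both", "preconditions": "No - without cause",
                 "procedure": "30 Days' Written Notice"},
    "Sample 2": {"party": "Both", "preconditions": "No - without cause",
                 "procedure": "30 Days' Written Notice"},
}

# Verbatim extraction: the same substance, worded differently per contract.
verbatim = {
    "Sample 1": {"party": "Either Party", "preconditions": "Convenience",
                 "procedure": "30 days' prior written notice"},
    "Sample 2": {"party": "Vendor and Customer", "preconditions": "Any or no reason",
                 "procedure": "Written notice not less than 30 days prior to termination"},
}

SUB_CONCEPTS = ("party", "preconditions", "procedure")


def distinct_values(records, sub_concept):
    """Return the set of distinct values recorded for one Sub-Concept."""
    return {record[sub_concept] for record in records.values()}


for sub_concept in SUB_CONCEPTS:
    print(
        sub_concept,
        "| standardized:", len(distinct_values(standardized, sub_concept)),
        "| verbatim:", len(distinct_values(verbatim, sub_concept)),
    )
```

With only two contracts, the verbatim data already yields two distinct values for every Sub-Concept, while the standardized data yields one. Across thousands of documents, a simple group-by on verbatim text would fragment in the same way.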


Why Are Good Data Models Hard to Develop?

Now that we know what a data model is, we can explore the challenges of effectively developing and deploying one for your own datasets.

At the most basic level, a data modeler needs to decide what data to capture – “What do I need to know?”  This might be driven by a business need (What $/widget am I paying?), a legal/risk/compliance obligation (Am I subject to Regulation “X”?), or workflows and administrative concerns (What Vendor # applies to this Purchase Order?).

Returning to our prior example, at a broad level we identified that we want to know about the Termination rights under an agreement and we’ve identified the right “altitude” to capture individual legal terms – the “Position.”  You might even think the data model we created is pretty good as it tells us something about the parties involved, the conditions around exercising termination, and what affirmative actions we need to take to end the agreement.  The problem is that you “may not know what you don’t know” without the benefit of practical experience or legal subject-matter expertise.

To illustrate, in our example we glossed over the fact that the mere existence of the Termination clause is an important data point in itself – “silence” has its own legal significance and, from a reporting and analytics perspective, we can use that data point as a filter to isolate and examine a subset of the document population.  In addition, for any given Position, clear interpretive guidance may be required to parse out subtle textual nuances to arrive at the accurate underlying legal meaning (e.g., termination “without cause” = “for convenience” = “for any reason” = “at sole discretion”, etc.).
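Both points, interpretive guidance for equivalent phrasings and silence as its own data point, can be sketched in a few lines. The phrase list, the labels, and the sample portfolio below are illustrative assumptions. Real interpretive guidance would be written and maintained by legal subject-matter experts, not by a lookup table alone.

```python
from typing import Optional

WITHOUT_CAUSE = "No - without cause"
SILENT = "Silent"                # the contract has no Termination clause at all
NEEDS_REVIEW = "Needs review"    # unrecognized wording goes to a human

# Hypothetical interpretive guidance: phrasings treated as equivalent.
WITHOUT_CAUSE_PHRASES = {
    "without cause",
    "for convenience",
    "for any reason",
    "at sole discretion",
}


def normalize_precondition(phrase: Optional[str]) -> str:
    """Map extracted wording to a standardized Position."""
    if phrase is None:
        return SILENT
    if phrase.strip().lower() in WITHOUT_CAUSE_PHRASES:
        return WITHOUT_CAUSE
    return NEEDS_REVIEW


# Invented mini-portfolio: contract name -> extracted wording (None = no clause).
portfolio = {
    "contract-a": "for convenience",
    "contract-b": None,
    "contract-c": "At sole discretion",
}

# Silence is a data point: use it as a filter to isolate a subset of documents.
silent_contracts = [
    name for name, phrase in portfolio.items()
    if normalize_precondition(phrase) == SILENT
]
print(silent_contracts)
```

Routing unrecognized wording to review, rather than guessing, is a deliberate choice here: it keeps the dataset faithful to the underlying legal meaning.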

Case Study – Knowable Data Model

To give you a sense of the potential complexity of data models, consider that Knowable’s Termination data model alone consists of 14 Sub-Concepts and 45 distinct Positions to accurately capture each and every permutation reflected in real-world contracting situations.  The goal of such a detailed data model is to (i) maintain fidelity to the underlying legal meaning in contracts, (ii) permit the manipulation and reporting of individual data elements, (iii) enable our proprietary machine learning and artificial intelligence algorithms, and (iv) allow for flexibility on how deep or shallow a client would like to capture data relating to Termination.

Some of these Sub-Concepts and Positions address items of significant economic consequence (e.g., early termination fees, notice obligations); others carry legal weight (e.g., conditional termination in the event of breach). Still others enable powerful reporting and analytics. For example, instead of capturing a notice period of “30 days” verbatim as text, breaking that static text into the integer “30” and the unit “day” allows the notice period to be converted into other units of time. This simple ability enables the export of notice due dates, manually or via API, into a calendar/docketing application, or permits broader implementation in CLM systems that lack such capability out of the box.
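A minimal sketch of the notice-period idea follows, assuming a hypothetical `NoticePeriod` type and simple calendar-day arithmetic. Actual day-counting conventions (business days, months, governing-law rules) vary and are deliberately left out.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Only fixed-length units are handled in this sketch; months and years
# depend on the calendar and on the contract's own counting rules.
DAYS_PER_UNIT = {"day": 1, "week": 7}


@dataclass(frozen=True)
class NoticePeriod:
    value: int  # e.g. 30
    unit: str   # e.g. "day"

    def in_days(self) -> int:
        """Convert the period to calendar days."""
        return self.value * DAYS_PER_UNIT[self.unit]


def latest_notice_date(termination_date: date, period: NoticePeriod) -> date:
    """Last date on which notice can be served for the given termination date."""
    return termination_date - timedelta(days=period.in_days())


period = NoticePeriod(value=30, unit="day")
deadline = latest_notice_date(date(2025, 3, 31), period)
print(deadline.isoformat())  # a date string a calendar or docketing tool could ingest
```

Had the period been stored as the text “30 days”, none of this arithmetic would be possible without first parsing the string.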

Thoughtful data model design also considers efficiency in creating the structured data with the goal of providing cost-effective solutions for clients.  To illustrate one efficiency-seeking example, Knowable analyzed frequency distributions across thousands of agreements to isolate the most common Positions for a given Sub-Concept, avoiding a “long tail” of edge cases and outliers.
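The frequency-distribution idea can be illustrated with a short sketch. The function name, the coverage threshold, and the sample counts are invented for this example and do not reflect Knowable's actual analysis or data.

```python
from collections import Counter


def common_positions(observed, coverage=0.8):
    """Return the most frequent Positions that together reach `coverage`
    of all observations, leaving the long tail for case-by-case handling."""
    counts = Counter(observed)
    total = sum(counts.values())
    if total == 0:
        return []
    selected, covered = [], 0
    for position, count in counts.most_common():
        if covered / total >= coverage:
            break
        selected.append(position)
        covered += count
    return selected


# Invented observations of notice periods across 20 agreements.
observed = ["30 days"] * 12 + ["60 days"] * 5 + ["90 days"] * 2 + ["45 days"]

print(common_positions(observed))
```

Here the two most common values already account for 17 of the 20 observations, so the remaining values would be treated as the long tail.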

Finally, good data model design should also consider the accuracy and quality of the output. Knowable provides a 98% accuracy guarantee that is partially enabled by intelligent data modeling. For example, Knowable data modelers leverage dependent relationships between contractual positions to ensure that no conflicting data exists in a dataset. Revisiting our Termination example: if a contract permits termination without the need to provide notice, that condition cannot coexist with a determination that 30 days is the applicable notice period. Knowable’s tools, processes, and workflows would either prevent the input of conflicting data or escalate the contradiction to a quality control specialist for resolution.
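As a rough illustration of such a dependency rule, the check below flags the contradiction described above. The function and field names are hypothetical, and this is not a description of Knowable's actual tooling.

```python
from typing import List, Optional


def find_conflicts(notice_required: bool,
                   notice_period_days: Optional[int]) -> List[str]:
    """Return descriptions of any conflicting Positions in one record."""
    conflicts = []
    # Dependency rule from the example: "no notice required" cannot
    # coexist with a recorded notice period.
    if not notice_required and notice_period_days is not None:
        conflicts.append(
            "notice period of %d days recorded although no notice is required"
            % notice_period_days
        )
    return conflicts


record_conflicts = find_conflicts(notice_required=False, notice_period_days=30)
if record_conflicts:
    # In a real workflow this might block the input or route the record
    # to a quality control reviewer; here we simply report it.
    print("Escalate for review:", record_conflicts)
```

Encoding the rule once in the data model means every record is checked the same way, rather than relying on each reviewer to spot the inconsistency.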

Well-thought-out data modeling is an integral step to faithfully distilling complex legal prose into structured data, which in turn enables clear insights and analytics to better inform business decisions and outcomes.  To learn more, please reach out to

Mike Kim is Head of Legal Operations and General Counsel at Knowable, where Mike currently focuses on driving productivity and efficiency improvement in direct support of Knowable’s mission to structure data from legal prose. Prior to his current role, Mike served as a Director - Client Service, where his responsibilities included leadership of a cross-functional data governance team, serving as the M&A substantive subject matter expert on client engagements, and general oversight of the service delivery organization.

Prior to joining Knowable, Mike practiced law in both large law firm and corporate settings, most recently as General Counsel for The Master Lock Company, a division of Fortune Brands, a Fortune 500 home and security consumer products company. Resident to Knowable’s Chicago Center of Excellence, Mike is a lifelong Chicagoan and outside of work enjoys spending time outdoors with his wife and three children.