How Does Predictive Review Work?
Predictive Review is a process. It begins with the review of a small sample of documents and iteratively learns as more documents are reviewed. The accuracy of the process is measured and validated with the statistical tools embedded within the Servient E-Discovery Platform.
Step 1 - Servient Represents Each Document Mathematically Using Statistical Analysis
Predictive Review is based upon statistical modeling of the document set. During import into the Servient E-Discovery Platform, Servient represents each document mathematically based upon the important features contained in the document, as well as a comparison with the important features present in all other documents in the data set.
Document features correlate to the information that a lawyer uses to subjectively determine the relevance of any document. For example, we know that the textual meaning of the document is central to a lawyer's review decision. Servient represents the meaning of the document through statistical analysis of the important terms in the document.
To understand this, it is helpful to think of a document as being made up of one or more topics. Servient identifies topics that permeate the data set and further evaluates the mix of topics present in each document. The mathematical representation of the document based upon its topic mix is an important document feature.
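As a rough illustration of what "representing a document mathematically" can mean, the sketch below weights each term by TF-IDF, a standard technique in which terms that are frequent in one document but rare across the data set receive higher weights. This is a simplified stand-in chosen for illustration, not Servient's actual model; the function name tfidf_vectors and the sample documents are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: weight} dict per document using TF-IDF weighting.

    Illustrative only: whitespace tokenization, no stemming or stop words.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            # Term frequency in this document times inverse document frequency.
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors

# Hypothetical three-document data set.
docs = [
    "merger pricing memo",
    "lunch menu friday",
    "merger board memo",
]
vectors = tfidf_vectors(docs)
```

In this toy data set, "pricing" appears in only one document, so it carries more weight in the first document's vector than "merger", which appears in two. A topic model would go a step further and group co-occurring terms into topics, but the idea of turning text into comparable numbers is the same.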
Further, we know that a typical case has a finite number of witnesses involved in the matter. Thus, a lawyer is usually interested in who sent or received an email, who authored a report, or who possessed a file. If one were to perform a quantitative analysis of a completely reviewed data set, one would likely find that only a modest percentage of the unique email addresses present in the data set correlate with responsive email. Features such as sender email address and email domain are therefore important for evaluating the likelihood of a document's relevance.
Also, we know that in a typical matter important events occur in different periods of time. As with email addresses, a quantitative analysis of a reviewed data set will typically show that a significant percentage of responsive documents cluster within similar date ranges. An email that was sent or a document that was created on a date within an important time frame is more likely to be responsive.
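The quantitative analysis described above can be sketched in a few lines. The example assumes a fully reviewed data set reduced to (sender, is_responsive) pairs; the function name, the addresses, and the labels are all hypothetical and serve only to show the kind of tally involved.

```python
from collections import defaultdict

def responsive_rate_by_sender(reviewed):
    """Map each sender to (responsive count, total count, responsive rate)."""
    counts = defaultdict(lambda: [0, 0])  # sender -> [responsive, total]
    for sender, is_responsive in reviewed:
        counts[sender][1] += 1
        if is_responsive:
            counts[sender][0] += 1
    return {s: (r, t, r / t) for s, (r, t) in counts.items()}

# Hypothetical review results: (sender address, attorney's label).
reviewed = [
    ("alice@example.com", True),
    ("alice@example.com", True),
    ("alice@example.com", False),
    ("bob@example.com", False),
    ("carol@example.com", False),
    ("carol@example.com", False),
]
rates = responsive_rate_by_sender(reviewed)

# Senders with at least one responsive email.
senders_with_responsive = [s for s, (r, _, _) in rates.items() if r > 0]
```

Here only one of three senders is associated with any responsive email, which is the pattern that makes sender address a useful feature.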
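A similar tally shows how responsive documents cluster in time. The sketch below assumes each reviewed document is reduced to a (date, is_responsive) pair and counts responsive documents per calendar month; the function name and the dates are hypothetical, and a real analysis might use finer or coarser buckets.

```python
from collections import Counter
from datetime import date

def responsive_by_month(reviewed):
    """Count responsive documents per (year, month) bucket."""
    counts = Counter()
    for sent_on, is_responsive in reviewed:
        if is_responsive:
            counts[(sent_on.year, sent_on.month)] += 1
    return counts

# Hypothetical review results: (sent or created date, attorney's label).
reviewed = [
    (date(2011, 3, 2), True),
    (date(2011, 3, 15), True),
    (date(2011, 3, 28), True),
    (date(2011, 7, 9), False),
    (date(2011, 11, 1), True),
    (date(2012, 1, 20), False),
]
by_month = responsive_by_month(reviewed)
busiest_month, busiest_count = by_month.most_common(1)[0]
```

Months with a disproportionate share of responsive documents mark the time frames that make a document's date a useful feature.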
Servient combines all of these document features (and others) in its statistical analysis to represent each document mathematically. With this model created, Servient is now ready to learn from attorney review.
Step 2 - Intelligently Select an Initial Sample of Documents for Attorney Review
Servient automatically selects a sample of documents for the legal team to review.
In Predictive Review we refer to the initial sample as the "Alpha Set". The Alpha Set is not merely a random sample of documents. Servient intelligently selects documents based upon their important features to create a representative sample of the entire data set. While the size of the Alpha Set differs from case to case, the Alpha Set is typically around one percent of the total data set.
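One common way to make a sample more representative than a purely random draw is stratified sampling: group the documents by some feature, then draw from every group in proportion to its size. The sketch below assumes documents have already been assigned a group label (for example, a dominant topic); it is an illustration of the general idea, not a description of how Servient builds the Alpha Set, and the names stratified_sample and doc_groups are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(doc_groups, fraction, seed=0):
    """Sample roughly `fraction` of documents from every group.

    doc_groups maps doc_id -> group label. Every group contributes at
    least one document so that small groups are not missed entirely.
    """
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for doc_id, group in doc_groups.items():
        by_group[group].append(doc_id)
    sample = []
    for group in sorted(by_group):
        ids = sorted(by_group[group])
        k = max(1, round(len(ids) * fraction))
        sample.extend(rng.sample(ids, k))
    return sample

# Hypothetical data set: 200 contract documents, 100 HR documents, 5 others.
doc_groups = {}
doc_groups.update({f"c{i}": "contracts" for i in range(200)})
doc_groups.update({f"h{i}": "hr" for i in range(100)})
doc_groups.update({f"m{i}": "misc" for i in range(5)})

alpha_set = stratified_sample(doc_groups, fraction=0.01)
```

With a one percent fraction, a purely random draw of three documents could easily miss the five "misc" documents altogether; the stratified draw guarantees that each group is seen at least once.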
Step 3 - Iterative Learning and Review
After the attorney reviews the Alpha Set and labels each document relevant or irrelevant, Servient learns from the labels and incorporates the knowledge into an enhanced statistical model of the Alpha Set. Servient then applies the statistical model to all non-reviewed documents and calculates the probability that each document is responsive or non-responsive.
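To make the learning step concrete, the sketch below trains a small logistic regression model on labeled documents and then scores an unreviewed one. Logistic regression is used here only because it is a simple, well-known way to turn features into a probability; the source does not say which model Servient uses. The two features (share of a "pricing" topic, and whether the document falls in a key date window), the training data, and the function names are all hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(features, labels, epochs=500, lr=0.5):
    """Fit weights and bias by stochastic gradient descent on log loss."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            error = p - y  # gradient of log loss with respect to the score
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias

def predict_proba(model, x):
    """Probability that a document with features x is responsive."""
    weights, bias = model
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

# Hypothetical labeled sample: [topic share, in key date window] -> label.
features = [[0.9, 1], [0.8, 1], [0.1, 0], [0.2, 0]]
labels = [1, 1, 0, 0]  # 1 = relevant, 0 = irrelevant

model = train_logistic(features, labels)
likely_responsive = predict_proba(model, [0.85, 1])
likely_non_responsive = predict_proba(model, [0.15, 0])
```

Each additional round of attorney labels would simply extend features and labels and retrain, which is what lets the probabilities for non-reviewed documents shift as review continues.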
Servient then intelligently selects additional documents to review. Servient assigns a mixture of probably responsive documents and a small selection of probably non-responsive documents for review. As the legal team continues to label documents, Servient continues to learn and adjusts the relevance probability of each non-reviewed document.
This iterative learning and legal review process (i.e., a relevance feedback loop) continues until the legal team has reviewed all of the identified responsive documents. Depending on the case strategy, the legal team can also decide to stop the review and rely on statistical validation of the automated document decisions. Servient provides this flexibility because its "active" learning technology is tightly integrated into the Servient E-Discovery Platform.
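The selection of the next review batch can be sketched as follows. The example assumes each non-reviewed document already has a probability of being responsive and picks the highest-scoring documents plus a few of the lowest-scoring ones, so that reviewers also check documents the model considers non-responsive. The function name, the batch sizes, and the scores are hypothetical, and the actual selection logic in the platform may differ.

```python
def select_next_batch(scores, n_high, n_low):
    """Pick the n_high most likely and n_low least likely responsive docs.

    scores maps doc_id -> probability that the document is responsive,
    for documents that have not yet been reviewed.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    n_low = min(n_low, len(ranked))
    n_high = min(n_high, len(ranked) - n_low)  # avoid picking a doc twice
    high = ranked[:n_high]
    low = ranked[len(ranked) - n_low:]
    return high + low

# Hypothetical scores for ten non-reviewed documents, d1 highest, d10 lowest.
scores = {f"d{i}": 1 - i / 11 for i in range(1, 11)}

batch = select_next_batch(scores, n_high=4, n_low=1)
```

After the attorney labels the batch, those documents leave the non-reviewed pool, the model is retrained, and the remaining documents are rescored before the next batch is drawn.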
Step 4 - Statistical Validation of Process
Servient provides integrated tools to perform statistical validation of the process. As detailed in Understanding Statistical Validation, Servient measures the extent to which the learning model is consistent with manual review calls. Servient also allows for sampling of non-reviewed documents to measure the accuracy of the automated document decisions.
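As an example of the kind of calculation behind sample-based validation, the sketch below estimates how many documents in the non-reviewed set were wrongly predicted non-responsive, based on a random sample that attorneys reviewed manually. It uses the Wilson score interval, a standard formula for a confidence interval around a proportion; whether Servient uses this particular interval is not stated in the source, and the sample figures are hypothetical.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 is roughly 95%)."""
    if n == 0:
        raise ValueError("sample size must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

# Hypothetical validation sample: 400 documents predicted non-responsive
# were reviewed manually, and 8 turned out to be responsive.
sample_size = 400
found_responsive = 8

observed_rate = found_responsive / sample_size
low, high = wilson_interval(found_responsive, sample_size)
```

The observed rate gives the point estimate, and the interval gives the range within which the true rate for the whole non-reviewed set plausibly falls. A larger sample narrows the interval, which is the trade-off the legal team weighs when deciding how much to sample.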
