The Dirty Little Secret Behind “Big Data”

These days, no industry can operate without data. With the enormous amounts of data being generated every minute, every second from client logs, business transactions, sales figures, and stakeholders, data is the fuel that drives organizations. All this data gets piled up into huge data sets. This is what is referred to as Big Data.

Big data companies have been collecting and mining public and quasi-public information about consumer, financial, health, political, and personal interests of most Americans for years.

The vast amount of data gathered by these companies can be used in predictive modeling, a process that leverages the statistical analysis of known demographic groups to provide reliable predictions as to how representative sections of the population will respond to a set of questions, statements and/or facts. This process and its proven methodologies have been used in a plethora of industries with success. The field of law needs to catch up to this trend.

What some in the legal field are doing now (perhaps as a means to catch up with the times and jump on the big data bandwagon!) is taking uninformed data, which can be meaningless when you don’t know what to look for, and trying to extrapolate meaning and harvest value from it. Not all data is valuable, so refining raw data into valuable data assets is crucial. Too often the interpretation is biased because those looking at the data are looking for the answer they want, a tendency known as confirmation bias.

Confirmation Bias

Confirmation bias is the tendency for people to search for, favor, interpret, and/or recall information in a way that confirms their preexisting beliefs. At the same time, it drives them to ignore or dismiss information that supports a different viewpoint.

Confirmation bias is a heuristic, a mental shortcut, that developed as a way for people to make sense of and impose order on a complex world. Rather than exerting the cognitive effort needed to consider new or opposing information, people find it easier to construct their world using information that fits neatly into their existing viewpoints. Confirmation bias can be found in many facets of everyday behavior and can exert great influence on how people process information and make decisions.

Bias is one fundamental way in which people err. Confirmation bias affects how attorneys evaluate their cases and how jurors process the information presented to them at trial. Being aware of it can help trial attorneys mitigate its effects and adjust their strategies to achieve better outcomes in their cases.

Bias Addressed. Now How To Make Big Data Smart.

It’s important to know how to turn the big data you’ve collected into smart data. If this is not done, the data you’ve spent valuable resources gathering will have very real limitations, because it will remain an unstructured data set. Without structure, and without predictable, reliable methods of consuming and interpreting data assets to generate insights and actionable intelligence, big data is unusable. But when structure is overlaid effectively and the proper analytics are used, big data starts to become smart data.

What is Smart Data?

Smart data is robust and meaningful data that provides accurate and valuable insights. It’s the difference between chasing illusory correlations between things that have nothing to do with each other and identifying real relationships between events and their predictors. It’s understanding what’s behind an algorithm and turning those meaningless numbers into actionable insights.
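To make the idea of an illusory correlation concrete, here is a minimal sketch in Python. It is an illustration only, not a description of any particular firm's analytics: the function names, the sample size of 30 people, and the 200 unrelated variables are all assumptions chosen for the example. Every variable is pure random noise, yet when enough of them are compared against an outcome, at least one will usually appear to be meaningfully related to it.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def strongest_chance_correlation(n_people=30, n_variables=200, seed=0):
    """Return the largest |r| found between a random outcome and many
    random variables that have nothing to do with it (hypothetical setup)."""
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_people)]
    best = 0.0
    for _ in range(n_variables):
        unrelated = [rng.gauss(0, 1) for _ in range(n_people)]
        best = max(best, abs(pearson(unrelated, outcome)))
    return best


if __name__ == "__main__":
    # Every variable here is noise, so any "strong" correlation is illusory.
    print(round(strongest_chance_correlation(), 2))
```

The more variables that are screened against a small sample, the larger the strongest chance correlation tends to be, which is why a correlation found by searching through a big data set needs to be checked against fresh data before it is treated as a real relationship.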

Smart data is data from which meaningful signals and patterns have been extracted. Smart data, simply put, means the data actually makes sense!

The Challenges of Big Data

Many companies get stuck at the initial phase of Big Data projects, mainly because they’re not aware of Big Data challenges or simply not equipped to tackle them. Modern technologies and large data tools require skilled data professionals, including data scientists, data analysts, and data engineers, to work with the tools, tackle the challenges, and make sense of these giant data sets.

The data needs to be structured and properly analyzed to enhance and assist in decision-making. The challenges many come across include things like data quality, lack of data science professionals, validating data, accumulating data from different sources, and managing data. 

Historically, data sets have lacked the entity relationships or contextual information needed to bridge the gap between unrefined data assets and business processes, i.e., the cross-reference assets that make the data “smart”. Big data is not built around demographic profiles. It does not carry rich details about exactly who the people on its lists are, from age to income to race and ethnicity, the way that robust panels do. Because these data sets are created by machine-to-machine transfers, they also increase the possibility of waste and fraud.

Because of that, the level of certainty they can provide about specific segments of people and insights is limited. 

Even when you combine that data with other sources, you’re almost guaranteed to have massive gaps and errors in your estimates, because you cannot correctly identify the relationships between the aggregated data’s variables. This data doesn’t provide the accuracy, objectivity, and transparency required to deliver reliable results.
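One way to make these gaps visible is to count how many records fail to link when two sources are combined. The sketch below is a simplified illustration under stated assumptions: the record layout, the name-based matching rule, and the sample records are all hypothetical, and real record linkage is considerably more involved.

```python
def normalize(name):
    """Lowercase and collapse whitespace so trivially different spellings match."""
    return " ".join(name.lower().split())


def merge_sources(primary, secondary):
    """Join two lists of (name, attributes) records on the normalized name.

    Returns (merged, unmatched): merged records carry attributes from both
    sources; unmatched holds the primary names with no counterpart.
    """
    lookup = {normalize(name): attrs for name, attrs in secondary}
    merged, unmatched = [], []
    for name, attrs in primary:
        extra = lookup.get(normalize(name))
        if extra is None:
            unmatched.append(name)
        else:
            merged.append((name, {**attrs, **extra}))
    return merged, unmatched


if __name__ == "__main__":
    # Hypothetical records, invented purely for illustration.
    source_a = [
        ("Alex Rivera", {"age": 41}),
        ("Jordan  Lee", {"age": 29}),
        ("Sam Patel", {"age": 55}),
    ]
    source_b = [
        ("alex rivera", {"income_band": "middle"}),
        ("jordan lee", {"income_band": "high"}),
    ]
    merged, unmatched = merge_sources(source_a, source_b)
    print(f"linked {len(merged)} of {len(source_a)} records; unmatched: {unmatched}")
```

Any estimate built on the merged records silently leaves out the unmatched people, so the match rate is worth reporting alongside the result.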

Data and Predictive Modeling in Jury Selection

The legal industry has been painfully slow to adopt data mining and predictive modeling. As a result, billions of data points of personal information on the general public, routinely stored in ever-growing databases, have gone unused by the legal profession. Big data is valuable to legal professionals in many ways, none more so than in jury selection.

Lawyers have traditionally picked jurors based on what they could observe: race, gender, age, body language. This used to be the only information lawyers definitively obtained about the jury panel, which resulted in a jury selection system that:

  • Failed to create a representative cross-section of the community
  • Encouraged the discriminatory use of peremptory challenges
  • Resulted in an unacceptably high juror “no show” rate
  • Disproportionately disadvantaged litigants and defendants who couldn’t afford to hire jury consultants

Big data has the potential to remedy many of these existing limitations and inequities in the jury selection system as it has the ability to deliver highly personalized, current, and targeted information for locating qualified jurors in any jurisdiction.

The use of precise algorithms to provide personally targeted data in real time yields a more accurately representative panel of jurors. Combine the real-person robust panels with the real-time data and you get a very accurate picture. The personal data that’s collected can also be quite revealing about attitudes, inclinations, and interests.

Panel data is not perfect, but jurors decide the outcome of cases, meaning we should be learning only from real-life, verified, potential jurors.

Most Important Tips

The most important tips when dealing with big data are:

1) Use REAL data! Gather real-life data that contains psychographic information and other segmentation variables that can be used to gain accurate and valid results on specific groups of interest.

2) REDUCE the NOISE! Be able to tell what matters in your analyses from what is just noise. Noise leads to invalid results and must be sifted through to find valuable insights.

3) Go to the EXPERT! Use robust and advanced analytics conducted by research experts who can efficiently sift through the plethora of noise in your data and discover invaluable information you otherwise would have missed.
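As an illustration of the second tip, one common way to check whether a difference between two groups is more than noise is a permutation test. The sketch below is a minimal example under stated assumptions, not a description of Jury Analyst's methods: the function name, the number of shuffles, and the sample ratings are hypothetical.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_p_value(group_a, group_b, n_shuffles=2000, seed=0):
    """Estimate how often a gap in group means at least as large as the
    observed one appears when group labels are assigned at random."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_large = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # break any real link between group and value
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_large += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (at_least_as_large + 1) / (n_shuffles + 1)


if __name__ == "__main__":
    # Hypothetical 1-10 ratings from two segments, invented for illustration.
    segment_a = [8, 9, 9, 10, 8, 9, 10, 9]
    segment_b = [2, 3, 2, 1, 3, 2, 2, 3]
    print(permutation_p_value(segment_a, segment_b))
```

A small value suggests the gap between the groups is unlikely to be a product of chance alone, while a value near 1 suggests the apparent difference is just noise.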


For five years, Jury Analyst has been promoting the use of big data in the legal industry. Our slogan has never changed: “big data for big decisions.”