As a long-term advocate of big data-based solutions to investigative challenges, I have been interested to see the recent application of such approaches to the ever-growing problem of data breaches. More data is stored electronically than ever before: financial data, marketing data, customer data, vendor listings, sales transactions, email correspondence, and more, and evidence of fraud can be located anywhere within those mountains of data. Unfortunately, fraudulent data often looks like legitimate data when viewed in the raw, and testing a sample might not uncover the fraudulent activity. Fortunately, today's fraud examiners can sort through piles of information using specialized software and data analysis techniques. These methods can identify emerging trends within a given industry, and they can be configured to identify breaks in audit control programs and anomalies in accounting records.
In general, fraud examiners perform two primary functions to explore and analyze large amounts of data: data mining and data analysis. Data mining is the science of searching large volumes of data for patterns. Data analysis refers to any statistical process used to analyze data and draw conclusions from the findings. These terms are often used interchangeably. If properly used, data analysis processes and techniques are powerful resources. They can systematically identify red flags and perform predictive modeling, detecting a fraudulent situation long before many traditional fraud investigation techniques would be able to do so.
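As a simple illustration of the kind of red-flag test such software automates, the sketch below applies Benford's law (the expected distribution of leading digits in naturally occurring amounts) to a set of transaction amounts. The function name and input format are my own illustrative choices, not those of any particular tool, and the test assumes positive amounts written in ordinary notation.

```python
import math
from collections import Counter

def benford_deviation(amounts):
    """Compare the first-digit frequencies of transaction amounts against
    Benford's law and return the per-digit deviation (actual - expected).
    A strongly positive value means the digit is over-represented, a
    classic red flag worth follow-up, not proof of fraud by itself."""
    # Extract each amount's leading significant digit.
    digits = [int(str(abs(a)).lstrip("0.")[0]) for a in amounts if a]
    if not digits:
        return {}
    total = len(digits)
    observed = Counter(digits)
    report = {}
    for d in range(1, 10):
        expected = math.log10(1 + 1 / d)        # Benford expected proportion
        actual = observed.get(d, 0) / total
        report[d] = round(actual - expected, 4)
    return report
```

In practice an examiner would run a test like this across an entire ledger and then drill into the accounts driving any over-represented digits.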
Big data are high-volume, high-velocity, and/or high-variety information assets that require new forms of processing to enable enhanced decision making, insight discovery, and process optimization. Simply put, big data is information of extreme size, diversity, and complexity. In addition to thinking of big data as a single set of data, fraud investigators and forensic accountants are also thinking about how data grow when data sets that might not normally be connected are linked together. Big data represents the continuous expansion of data sets whose size, variety, and speed of generation make them difficult for investigators and client managements to manage and analyze.
Big data can be instrumental to the evidence gathering phase of an investigation. Distilled down to its core, how do fraud examiners gather data in an investigation? They look at documents and financial or operational data, and they interview people. The challenge is that people often gravitate to the areas with which they are most comfortable. Attorneys will look at documents and email messages and then interview individuals. Forensic accounting professionals will look at the accounting and financial data (structured data). Some people are strong interviewers. The key is to consider all three data sources in unison.
Big data helps to make it all work together to bring the complete picture into focus. With the ever-increasing size of data sets, data analytics has never been more important or useful. Big data requires the use of creative and well-planned analytics due to its size and complexity. One of the main advantages of using data analytics in a big data environment is that it allows the investigator to analyze an entire population of data rather than having to choose a sample and risk drawing erroneous conclusions in the event of a sampling error.
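A minimal sketch of a full-population test of this kind is a duplicate-payment scan: rather than sampling, it examines every payment record for matches on vendor, amount, and invoice number. The field names here are illustrative assumptions, not a real system's schema.

```python
from collections import defaultdict

def find_duplicate_payments(payments):
    """Scan the full payment population (not a sample) and return groups of
    payment IDs that share the same vendor, amount, and invoice number,
    a common symptom of double billing or duplicate disbursement."""
    seen = defaultdict(list)
    for rec in payments:
        key = (rec["vendor"], rec["amount"], rec["invoice"])
        seen[key].append(rec["payment_id"])
    # Keep only keys matched by more than one payment.
    return {key: ids for key, ids in seen.items() if len(ids) > 1}
```

Because the scan touches every record, a duplicate that a 5 percent sample would almost certainly miss is guaranteed to surface.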
To conduct an effective data analysis, a fraud examiner must take a comprehensive approach. Any direction can (and should) be taken when applying analytical tests to available data. The more creative fraudsters get in hiding their breach-related schemes, the more creative the fraud examiner must become in analyzing data to detect these schemes. For this reason, it is essential that fraud investigators consider both structured and unstructured data when planning their engagements.
Data are either structured or unstructured. Structured data is the type of data found in a database, consisting of recognizable and predictable structures. Examples of structured data include sales records, payment or expense details, and financial reports. Unstructured data, by contrast, is data not found in a traditional spreadsheet or database. Examples of unstructured data include vendor invoices, email and user documents, human resources files, social media activity, corporate document repositories, and news feeds. When using data analysis to conduct a fraud examination, the fraud examiner might use structured data, unstructured data, or a combination of the two. For example, conducting an analysis on email correspondence (unstructured data) among employees might turn up suspicious activity in the purchasing department. Upon closer inspection of the inventory records (structured data), the fraud examiner might uncover that an employee has been stealing inventory and covering her tracks in the records.
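The two-pass approach in the example above, mining unstructured email for suspect language and then cross-referencing the structured inventory records, can be sketched as follows. The suspect phrases, field names, and record layouts are all hypothetical; a real engagement would map them to the client's actual email archive and inventory ledger.

```python
# Hypothetical watch list; real engagements tune this to the scheme at hand.
SUSPECT_TERMS = {"write off", "adjust the count", "don't tell", "shrinkage"}

def flag_emails(emails):
    """Unstructured pass: return the senders of messages whose body
    contains any phrase on the watch list."""
    return {m["sender"] for m in emails
            if any(term in m["body"].lower() for term in SUSPECT_TERMS)}

def cross_reference(flagged_senders, inventory_adjustments):
    """Structured pass: pull the inventory adjustments entered by
    employees whose email was flagged in the first pass."""
    return [adj for adj in inventory_adjustments
            if adj["entered_by"] in flagged_senders]
```

Neither pass is conclusive on its own; the value comes from the intersection, where flagged language and flagged ledger entries point at the same person.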
Recent reports of breach responses in social media and the trade press indicate that investigators who deployed advanced forensic data analysis tools across larger data sets gained better insight into the penetration, which led to more focused investigations, better root cause analysis, and more effective fraud risk management. Advanced technologies that incorporate data visualization, statistical analysis, and text-mining concepts, as compared to spreadsheets or relational database tools, can now be applied to massive data sets from disparate sources, enhancing breach response at all organizational levels.
These technologies enable our client companies to ask new compliance questions of their data that they might not have been able to ask previously. Fraud examiners can establish important trends in business conduct or identify suspect transactions among millions of records rather than being forced to rely on smaller samplings that could miss important transactions.
Data breaches bring enhanced regulatory attention. It’s clear that data breaches have raised the bar on regulators’ expectations of the components of an effective compliance and anti-fraud program. Adopting big data/forensic data analysis procedures into the monitoring and testing of compliance can create a cycle of improved adherence to company policies and improved fraud prevention and detection, while providing additional comfort to key stakeholders.
CFEs and forensic accountants are increasingly being called upon to be members of teams implementing or expanding big data/forensic data analysis programs so as to more effectively manage data breaches and a host of other instances of internal and external fraud, waste and abuse. To build a successful big data/forensic data analysis program, your client companies would be well advised to:
— begin by focusing on the low-hanging fruit: the priority of the initial project(s) matters. The first few projects, the low-hanging investigative fruit, normally incur the largest cost associated with setting up the analytics infrastructure, so it's important that they yield tangible results and recoveries.
— go beyond the usual rule-based, descriptive analytics. One of the key goals of forensic data analysis is to increase the detection rate of internal control noncompliance while reducing the risk of false positives. From a technology perspective, clients' internal audit and other investigative groups need to move beyond rule-based spreadsheets and database applications and embrace both structured and unstructured data sources, making use of data visualization, text-mining, and statistical analysis tools.
— see that successes are communicated. Share information on early successes across divisional and departmental lines to gain broad business process support. Once validated, success stories will generate internal demand for the outputs of the forensic data analysis program. Try to construct a multi-disciplinary team, including information technology, business users (i.e., end-users of the analytics) and functional specialists (i.e., those involved in the design of the analytics and day-to-day operations of the forensic data analysis program). Communicate across multiple departments to keep key stakeholders assigned to the fraud prevention program updated on forensic data analysis progress under a defined governance program. Don’t just seek to report instances of noncompliance; seek to use the data to improve fraud prevention and response. Obtain investment incrementally based on success, and not by attempting to involve the entire client enterprise all at once.
— leadership support will get the big data/forensic data analysis program funded, but regular interpretation of the results by experienced or trained professionals is what will make the program successful. Keep the analytics simple and intuitive; don't try to cram too much information into any one report. Invest in new, updated versions of tools to keep the analytics sustainable. Develop and acquire staff professionals with the skill sets required to sustain and leverage the forensic data analysis effort over the long term.
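The move beyond fixed rule-based tests toward statistical analysis recommended above can start very simply, for instance with an outlier screen. The sketch below flags amounts more than three standard deviations from the mean; the threshold is an illustrative choice, and production work would use more robust methods tuned to the data.

```python
import statistics

def zscore_outliers(amounts, threshold=3.0):
    """Statistical complement to a fixed rule-based threshold: flag any
    amount more than `threshold` standard deviations from the mean of
    the population being tested."""
    mean = statistics.fmean(amounts)
    sd = statistics.pstdev(amounts)
    if sd == 0:
        return []  # no variation, nothing stands out
    return [a for a in amounts if abs(a - mean) / sd > threshold]
```

Unlike a hard-coded "flag everything over $5,000" rule, the cutoff here adapts to each population, which is one way such tests reduce false positives.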
Finally, enterprise-wide deployment of forensic data analysis takes time; clients shouldn't be led to expect overnight adoption. An analytics integration is a journey, not a destination: quick-hit projects might take four to six weeks, but full program integration can take one to two years or more.
Our client companies need to look at a broader set of risks, incorporate more data sources, and move away from lightweight, end-user desktop tools toward real-time or near-real-time analysis of increased data volumes. Organizations that embrace these potential areas for improvement can deliver more effective and efficient compliance programs that are highly focused on identifying and containing the damage from hacker attacks and other exploitation of key high-fraud-risk business processes.