Scott Bauguess, Deputy Director and Deputy Chief Economist of the SEC’s Division of Economic and Risk Analysis, recently explained what it means for data to be “Big Data” and what the implications are for the SEC as the agency integrates new statistical methods into its daily work of monitoring and enforcing compliance with the federal securities laws. Bauguess delivered his remarks to an audience at the Midwest Region Meeting of the American Accounting Association in Chicago.
How big is big? Bauguess sought to dispel misconceptions about what “Big Data” actually is. The differing views of “Big Data,” along with Bauguess’s own conception of how much data it takes to be “Big,” can be summarized in a few bullet points; a short sketch of the spreadsheet test follows the list:
- “Rules of thumb”—More data than a thumb drive can hold.
- Spreadsheet test—Data exceeds capability of popular spreadsheet programs.
- One day theory—Amount of data is more than can be processed in one day.
- Related one-day concept—Data that takes more time to process than to create.
- Bauguess’s idea—Any data that approaches the computational limits for analysis.
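Of these, the spreadsheet test is the easiest to make concrete. As a rough illustration (not anything Bauguess presented), the Python sketch below checks whether a CSV file exceeds the row cap of recent versions of Excel (1,048,576 rows); the file name `holdings.csv` is a hypothetical placeholder.

```python
import csv

# Row cap of recent Excel versions (2**20); other popular
# spreadsheet programs have limits of the same order of magnitude.
EXCEL_ROW_LIMIT = 1_048_576

def fails_spreadsheet_test(path: str) -> bool:
    """Return True if the file holds more rows than a spreadsheet can."""
    with open(path, newline="") as f:
        for i, _row in enumerate(csv.reader(f), start=1):
            if i > EXCEL_ROW_LIMIT:
                return True  # stop early; the file is already "big"
    return False

# Hypothetical file name, for illustration only.
if fails_spreadsheet_test("holdings.csv"):
    print("Big Data by the spreadsheet test")
```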
Social vs. data scientists. “Big Data” can shed new light on human behavior, but these gains strain the long-standing tenets of social scientists, who typically eschew data mining, and put them at odds with the new data scientists. Bauguess, who was trained as a social scientist, said the new normal in data analytics has produced a few unintended results.
For one, Bauguess said there is less academic interest in descriptive statistics. He suggested this shift may hinder the development of new hypotheses amenable to empirical research, because researchers are reluctant to mull poorly understood correlations and other trends that lack rigorous proofs.
Another unexpected result is the tension between social scientists and data scientists. Both fields seek to demystify human behavior, albeit from different approaches. The new field of data science explains behavior by making predictions about what humans will do next. The data scientist’s tool kit includes various deep learning methods that can build upon still other methods such as data mining, neural networks and machine learning. By contrast, social science places greater emphasis on understanding the “why” of human behavior while avoiding the use of data mining, which can lead to false correlations.
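The false-correlation hazard is easy to demonstrate. The sketch below, our illustration rather than anything from Bauguess’s remarks, “mines” 1,000 purely random features against an equally random target; by chance alone, the best of them shows a correlation strong enough to tempt an uncritical data miner.

```python
import numpy as np

rng = np.random.default_rng(0)
n_obs, n_features = 50, 1_000

# Target and features are pure noise: no true relationship exists.
target = rng.standard_normal(n_obs)
features = rng.standard_normal((n_obs, n_features))

# Correlate every candidate feature with the target, then keep the best.
corrs = np.array([np.corrcoef(features[:, j], target)[0, 1]
                  for j in range(n_features)])
best = np.abs(corrs).max()

# With 1,000 tries on only 50 observations, the winning |r| is typically
# around 0.45 -- "significant" looking, yet entirely spurious.
print(f"largest absolute correlation found by mining: {best:.2f}")
```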
Ultimately, Bauguess sees the SEC’s “Big Data” efforts as embracing complementary aspects of the social and data sciences. As an example, the Corporate Issuer Risk Assessment (CIRA) dashboard can identify patterns in companies’ financial statements with the goal of finding anomalies that demand closer scrutiny. CIRA’s classical statistical modeling was an outgrowth of the SEC’s accounting quality model (aka “AQM” or “Robocop”).
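The SEC has not published CIRA’s internals, so the following is only a generic sketch of the kind of outlier screening the dashboard’s goal suggests: hypothetical peer-group financial ratios are scored by their distance from the peer mean, and issuers beyond an arbitrary cutoff are flagged for closer scrutiny. All issuer names, values, and the 2-sigma threshold are invented for illustration.

```python
import statistics

# Hypothetical peer-group ratios (e.g., accruals / total assets);
# every name and value here is invented for illustration.
ratios = {
    "Issuer A": 0.04, "Issuer B": 0.05, "Issuer C": 0.03,
    "Issuer D": 0.06, "Issuer E": 0.31, "Issuer F": 0.05,
}

mean = statistics.mean(ratios.values())
sd = statistics.stdev(ratios.values())

# Flag issuers whose ratio sits far from the peer mean; the 2-sigma
# cutoff is an arbitrary illustrative threshold, not CIRA's.
flagged = {name: round((value - mean) / sd, 1)
           for name, value in ratios.items()
           if abs(value - mean) > 2 * sd}

print(flagged)  # anomalies that would "demand closer scrutiny"
```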
The SEC has still other tools for conducting data analysis beyond CIRA. Earlier this year, Bauguess described the Broker-Dealer Risk Assessment tool and the SEC’s analytics for gauging investment company risks.
But the human element still plays a vital role in the SEC’s analytics programs. Bauguess said classical models require human input at the initial design stages, while machine learning models demand human interpretive skills to better judge the outputs.