Batman’s new gig: Data science superhero?

Data science is quickly becoming one of the most important fields. Here's why.

High demand for data scientists makes them real-life superheroes among employers, but which fictional superhero would make a great data scientist? Batman, according to 43 percent of respondents in a poll conducted by data science training provider Metis.

Not everyone agreed with the poll’s outcome. One Twitter respondent asked, “Does Bruce Wayne even know how to code?” “Barbara Gordon is literally a data scientist and she’s in second place, which says a lot about our society,” another weighed in.

Also known as Batgirl or Oracle, Barbara Gordon came in second with 23 percent of the vote, while Black Panther and Wonder Woman followed with 19 percent and 15 percent, respectively.

The informal poll was taken in advance of Metis’ September 27 Demystifying Data Science Live Online Conference, a free 12-hour program of live data science presentations from experts in the field. Presentations are archived and will be available here.

During the livestream, Kirk Borne, principal data scientist at Booz Allen Hamilton and former professor of astrophysics and computational science at George Mason University, outlined five of the most important parts of data science.

1. The data. “Data science is driven by the data: the fuel of what we’re doing is data,” Borne said. “Nothing is possible without fuel.” A real-world example: sensor data collected from U.S. commercial jet engines. In one year, those engines produce more than 1 zettabyte of data, Borne said. What’s interesting about data is not so much the volume as the combinations of different types of data that can be put together.

2. The science. “Thinking about data science as a scientific cycle is extremely important,” he said. Data collection, hypothesis formulation, deduction and formulation of a predictive test, experimental design and testing, evaluation, and reviewing results are critical steps in the process.

3. Data storytelling. This is more than just showing a plot; it’s being able to express it in human terms, describe the “so what,” and illustrate its importance. Being able to tell the data story can help clients and colleagues understand exactly what a data scientist has been doing in a way that’s human and consumable.

4. Data ethics. Data scientists should learn computational thinking and statistical thinking skills and be aware of their biases, Borne said. “If you torture your data long enough it will confess to anything,” he said, citing a well-known quote from economist Ronald Coase.

5. Data literacy. This includes understanding types of data, speaking intelligently about data, and being able to articulate what you can learn from it. “Data literacy to me comes in two major subcategories,” Borne said: “how to use data, which is data science, and how to use data correctly, which is data ethics.”

Borne also has created a reading list focusing on data science and data literacy, available here.

Laura Ascione