So, what exactly is Data Science?

Data Science is one of those terms that has entered the vocabulary of practically every professional, no matter the field. It is a multidisciplinary domain concerned with investigation: understanding what happened in the past, describing what is happening in the present, and anticipating what may happen in the future. Data Science is not a branch of any single discipline; instead, it is a collection of essential tools taken from statistics, mathematics, computer science, artificial intelligence, and more, all used for the purpose of investigation.

Investigation of what? Absolutely anything, as long as there are enough observations (data) about the subject of study.

Since the word investigation is so commonly linked to Data Science, we can think of a Data Scientist as a kind of detective. Think of any crime movie or documentary: there’s always a series of events that take place in order for the case to be solved—collection of evidence, organization of clues, search for patterns, and formulation of assumptions.

All these steps have equivalents in the Data Science world. Instead of evidence, the Data Scientist must collect data, which can come in the form of text, numbers, images, videos, and other types of media. Depending on the problem being studied, it's possible that public databases already exist. But if that's not the case, then it becomes necessary to gather the data directly, through surveys, experiments, interviews, requests to APIs, web scraping, and other data collection methods.

After collecting evidence, a detective must scrutinize it to determine what’s important and discard information that doesn’t contribute to the investigation. They also need to organize the clues in a way that makes the investigation easier to carry out. In addition, the detective must check for missing pieces—essential information that’s absent—and figure out how to handle that lack. This process is what we call data cleaning. It’s essential, because without organized data, it becomes much harder to identify patterns and apply most algorithms.
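To make the cleaning steps above a little more concrete, here is a minimal sketch in plain Python. Everything in it is made up for illustration: the witness reports, the field names, and the helper clean_reports are all hypothetical, and filling a missing value with the median is only one of several reasonable choices.

```python
from statistics import median

# Hypothetical "evidence": witness reports containing an irrelevant field
# (shoe_size), an exact duplicate, and a missing value (None).
raw_reports = [
    {"witness": "A", "height_cm": 180, "district": "North", "shoe_size": 42},
    {"witness": "B", "height_cm": None, "district": "North", "shoe_size": 39},
    {"witness": "B", "height_cm": None, "district": "North", "shoe_size": 39},
    {"witness": "C", "height_cm": 175, "district": "South", "shoe_size": 44},
    {"witness": "D", "height_cm": 185, "district": "North", "shoe_size": 40},
]


def clean_reports(reports, keep_fields, numeric_field):
    """Discard irrelevant fields, drop duplicates, and fill missing values."""
    # 1. Keep only the fields that matter to the investigation.
    trimmed = [{field: r[field] for field in keep_fields} for r in reports]

    # 2. Drop exact duplicates while preserving the original order.
    seen = set()
    unique = []
    for r in trimmed:
        key = tuple(r[field] for field in keep_fields)
        if key not in seen:
            seen.add(key)
            unique.append(r)

    # 3. Fill missing numeric values with the median of the known ones.
    #    (An assumption of this sketch; dropping the record is another option.)
    known = [r[numeric_field] for r in unique if r[numeric_field] is not None]
    fill_value = median(known)
    return [
        {**r, numeric_field: fill_value if r[numeric_field] is None else r[numeric_field]}
        for r in unique
    ]


cleaned = clean_reports(raw_reports, ["witness", "height_cm", "district"], "height_cm")
for report in cleaned:
    print(report)
```

In practice this kind of work is usually done with a dedicated data library, but the three steps stay the same: decide what is relevant, remove repetition, and deal with what is missing.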

The next step is one of the most exciting moments in any crime movie: the detective stands in front of a wall covered with carefully arranged evidence and begins to recognize patterns in the criminal’s behavior. In the Data Science world, this moment is called data analysis. During this stage, scientists may find answers to previously established questions, but they might also uncover unexpected patterns that lead to the formulation of new hypotheses.

Visual tools are usually what bring these patterns to light, a practice known as data visualization. With the help of charts and graphs, scientists not only gain a better understanding of the story behind the data, but also have a more effective way to communicate their findings to people from different fields.
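As a small illustration of analysis and visualization working together, the sketch below counts how often each district appears in a made-up incident log and draws the result as a text bar chart. The data and the helper text_bar_chart are hypothetical, and a real project would more likely use a plotting library than rows of # characters.

```python
from collections import Counter

# Hypothetical incident log: the district where each incident took place.
incidents = ["North", "South", "North", "East", "North", "South"]

# Analysis: count how many incidents occurred in each district.
counts = Counter(incidents)


def text_bar_chart(counts):
    """Return one line per category, most frequent first, as a text bar chart."""
    width = max(len(label) for label in counts)
    return [
        f"{label.ljust(width)} | {'#' * n} {n}"
        for label, n in counts.most_common()
    ]


# Visualization: print the counts as horizontal bars.
chart = text_bar_chart(counts)
print("\n".join(chart))
```

Even in this tiny example, the chart makes the pattern (most incidents happen in the North district) easier to spot than the raw list does.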

However, simply solving the mystery is often not enough. When dealing with dangerous criminals—like serial offenders—the goal is not just to discover who they are, but also to understand how they operate in order to predict and prevent their next actions. While detectives use logic and experience to make these predictions, Data Scientists use machine learning to forecast the future based on historical observations.
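To give a feel for what forecasting from historical observations can look like, here is a minimal sketch of one of the simplest predictive models, a straight line fitted by least squares. The weekly incident counts are invented for illustration, fit_line is a hypothetical helper, and real machine learning work typically involves richer models, more data, and a careful evaluation of how well the predictions hold up.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept to the points by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both left unnormalized,
    # since the normalizing factor cancels out in the ratio).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical history: number of incidents recorded in each of five weeks.
weeks = [1, 2, 3, 4, 5]
incident_counts = [3, 5, 7, 9, 11]

# "Learn" the trend from the past, then extrapolate it one week ahead.
slope, intercept = fit_line(weeks, incident_counts)
forecast_week_6 = slope * 6 + intercept
print(f"Forecast for week 6: {forecast_week_6:.1f} incidents")
```

The model knows nothing about why the numbers are rising; it only extends the pattern it found in the past, which is both the strength and the limitation of predictions made this way.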

Data Science is a broad concept made up of numerous tools and techniques borrowed from various disciplines, all aimed at investigating anything that can be quantified or documented. Being a Data Scientist requires a wide-ranging skill set, especially in statistics and computer science—but more than anything, it demands curiosity and a passion for seeing beyond the obvious.

