Manuel Durazo is a data scientist and a member of the Andela Community.
Let me introduce myself: I’m Manuel, a mid-level data practitioner starting my journey as a remote worker. Although I have about five years of experience in the field, Andela welcomed me into their community only recently: this is my fourth month enjoying the advantages of a nomadic working experience.
I won’t spend time discussing how it is possible to work for an international company from almost anywhere in the world; remote work is something everyone is increasingly aware of, even more so in a post-pandemic world. The more important question is: how do you get to work for the company you want in the first place?
I can only share my personal experience so far, which has been defined primarily by continuously improving my skill set and carefully picking the right sources to learn from.
After I graduated from college, I got my first 9-to-5 job as a data analyst. As a newcomer, I thought the job would consist of extracting insights from an already ordered dataset by applying some descriptive statistics and later making inferences with some fancy regressions. However, as most of you who work with data know, this is usually not the case.
The data I was going to work with was stored in a relational database; it was not normalized and had lots of missing values. At the time, I knew almost nothing about SQL. So that was my first stop: watching videos about the foundations of information storage, the multiple normal forms (1NF, 2NF, 3NF, etc.), and how to create views by combining existing tables.
That took a while, but once the view was created and returned clean data, it was easy to tap into using a statistical language like R along with a SQL connector library.
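As an illustrative sketch of that workflow: the real system was a production database queried from R, but the idea of a view that joins messy tables into a clean, queryable shape can be shown with Python’s built-in sqlite3 as a stand-in. All table names, columns, and data below are invented.

```python
import sqlite3

# In-memory database standing in for the production relational store.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Two source tables (names and columns are invented for illustration).
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT)")
cur.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)")
cur.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                [(1, "Acme", "North"), (2, "Globex", "South")])
cur.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(1, 1, 120.0), (2, 1, 80.0), (3, 2, None)])  # None = missing value

# A view that joins the tables and filters out missing values,
# yielding one row per observation -- the clean shape analysis tools expect.
cur.execute("""
    CREATE VIEW order_report AS
    SELECT c.name, c.region, o.amount
    FROM orders o
    JOIN customers c ON c.id = o.customer_id
    WHERE o.amount IS NOT NULL
""")

rows = cur.execute("SELECT * FROM order_report ORDER BY amount").fetchall()
print(rows)  # [('Acme', 'North', 80.0), ('Acme', 'North', 120.0)]
```

Because the view lives inside the database itself, every downstream tool sees the same cleaned shape without duplicating any data.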
Having data in the right form will save you a lot of time. In my case, to solve the messy-information problem I faced, I created a view inside the existing database that combined data from existing tables, with the aim of getting it into the shape that analysis and visualization libraries in R (lm, ggplot2) expect: typically third or fourth normal form, with one row per observation.
A later stage involved automated reporting with tools like RMarkdown. Understanding what your organization wants to accomplish with data, and how the different data sources (databases, streams, warehouses) align with those goals, is the first step towards a digital transformation of your organization. For me, this was how the world of data analysis started to unfold: I had no experience and a toolkit very different from what the circumstances required. However, I allowed myself some study time to find ways to change things up, and so I arrived at a slightly better result.
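A minimal RMarkdown skeleton for that kind of automated report might look like the following; the database file, view name, and columns are all hypothetical:

````markdown
---
title: "Weekly Sales Report"
output: html_document
---

```{r setup, include=FALSE}
library(DBI)       # database connector
library(ggplot2)   # visualization
```

```{r load-data}
# Connection details and view name are illustrative.
con  <- dbConnect(RSQLite::SQLite(), "warehouse.db")
data <- dbGetQuery(con, "SELECT * FROM order_report")
```

```{r plot}
ggplot(data, aes(region, amount)) + geom_col()
```
````

Knitting a file like this regenerates the whole report from the live database, so the weekly reporting step stops being manual work.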
This is key, since working in the data industry (and almost any other industry in tech) requires a lifelong-learning attitude. You should always look for ways to solve or improve on the daily challenges you encounter. Some people decide to transition from data analysis into data science and/or data engineering; this is a common trend, since the data you work with often does not arrive as clean as it needs to be for analysis. Sometimes you find data scientists combining their role with the tasks of a data engineer, including ingesting and transforming data and/or optimizing pipelines.
This transition is also the path I took and one I would love to keep sharing with you in future articles!
Want to be part of the Andela Community? Then join the Andela Talent Network!
If you found this blog useful, check out our other blog posts for more essential insights!