A data analyst gathers data, organizes it and then uses it to provide key business insights and solve business problems.
They also help organizations to understand the current state of their business by interpreting a wide range of structured and unstructured data.
They are very much in the data trenches, drawing on programming, math and statistics to derive insights from data that will help to improve business performance.
The key responsibilities of a data analyst include:
- Identifying data and acquiring it from primary and secondary sources
- Filtering and cleaning the data in preparation for analysis
- Analyzing the data using statistical techniques and tooling
- Working with key stakeholders to glean business requirements
- Interpreting and reporting the results of the analysis
- Identifying trends and patterns in complex data sets
- Developing databases and implementing data systems
Why are data analysts so important?
As almost all industries are coming under increased pressure from falling margins, rising costs and increased commoditization, enterprise organizations are having to dig deep to find new sources of revenue and competitive advantage.
The vast volume of data that enterprises preside over is the biggest untapped source of opportunity that exists. Yet, businesses often lack the technical expertise, experience and strategy to make use of their data.
Bringing on board data analysts with the right skills and mindset will help them to exploit their data to create new business opportunities.
Data cleaning and preparation
Data scientists spend about 19% of their time collecting data and a further 60% cleaning and organizing it, according to research by CrowdFlower.
If a similar split holds for analysts, your data analyst will be spending roughly 80% of their time (on average) preparing data for analysis, so they will need to be good at it!
Preparation starts with collecting data from various sources (databases, software, surveys, third-party data sets, etc.). The data is then cleaned, which means filling any gaps and smoothing out any inconsistencies and anomalies. Often, datasets must also be transformed to make them compatible with each other.
Your candidate should be familiar with working with different kinds of data (e.g. structured and unstructured) and be able to clean and transform these datasets effectively, including familiarity with Extract, Transform and Load (ETL) techniques.
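To make the cleaning steps above concrete, here is a minimal sketch using only the Python standard library. The records, field names and the `COUNTRY_ALIASES` mapping are invented for illustration, and filling gaps with the median is just one of several reasonable strategies.

```python
from statistics import median

# Hypothetical raw records merged from two sources with inconsistent formats.
raw_rows = [
    {"customer": " Alice ", "country": "uk", "spend": "120.50"},
    {"customer": "Bob", "country": "UK", "spend": ""},  # missing value
    {"customer": "Carla", "country": "United Kingdom", "spend": "80"},
    {"customer": "alice", "country": "UK", "spend": "120.50"},  # duplicate
]

# Assumed mapping used to smooth out inconsistent country labels.
COUNTRY_ALIASES = {"uk": "UK", "united kingdom": "UK"}


def clean(rows):
    """Normalize text fields, drop duplicates, fill missing spend with the median."""
    cleaned, seen = [], set()
    for row in rows:
        name = row["customer"].strip().title()
        raw_country = row["country"].strip()
        country = COUNTRY_ALIASES.get(raw_country.lower(), raw_country)
        spend = float(row["spend"]) if row["spend"].strip() else None
        key = (name, country)
        if key in seen:  # skip customers we have already kept
            continue
        seen.add(key)
        cleaned.append({"customer": name, "country": country, "spend": spend})

    # Fill gaps with the median of the values we do have.
    known = [r["spend"] for r in cleaned if r["spend"] is not None]
    fill_value = median(known)
    for r in cleaned:
        if r["spend"] is None:
            r["spend"] = fill_value
    return cleaned


cleaned_rows = clean(raw_rows)
```

In practice an analyst would more likely reach for a dedicated data library or an ETL tool, but the steps are the same: normalize, deduplicate, fill gaps.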
Once your data has been prepared, the next critical skill is to be able to analyze it. This is where the raw data is turned into useful business insights.
Deep data analysis requires strong mathematical and statistical proficiency, as well as familiarity with popular data analysis tools such as Tableau, Jupyter Notebook and Microsoft Power BI.
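As a small illustration of the statistical side, the sketch below computes a mean and a Pearson correlation coefficient by hand. The monthly figures are made up, and a real analysis would also consider sample size, outliers and whether a correlation reflects anything causal.

```python
from math import sqrt
from statistics import mean

# Hypothetical monthly figures, both in thousands.
ad_spend = [10, 12, 15, 18, 20, 25]
revenue = [110, 118, 131, 150, 158, 185]


def pearson(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


avg_revenue = mean(revenue)
r = pearson(ad_spend, revenue)
print(f"mean revenue: {avg_revenue:.1f}, correlation: {r:.3f}")
```

A value of `r` close to 1 suggests the two series move together, which is the kind of signal an analyst would then investigate further.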
An added bonus would be experience with cutting-edge modern approaches to data analysis. The knowledge graph, for example, is reaching maturity as a way of organizing data from multiple sources and then establishing connections between these data sources to derive new knowledge from those connections. Artificial intelligence and machine learning are also powerful tools for data analysis (see below!).
The real business value comes not when data has been analyzed and insights gathered, but when these are successfully communicated to other stakeholders in the business. That’s when they become useful to the business.
That’s why data visualization is such a critical skill: it means being able to take complex data and insights and use visuals, charts, graphs and so on to present them in a way that the audience can easily understand.
This requires a mixture of data analysis skills, on the one hand, and storytelling and presentation skills, on the other, to be able to understand what the data is saying and tell a story about it that hits home with the audience.
Familiarity with business intelligence tools such as Tableau and Power BI is also very helpful here, as these allow the user to present data and insights in a compelling way.
Dashboards and reporting
A key responsibility of a data analyst is to make important data and insights easily available to other areas of the business.
This could involve creating anything from very simple data reports to interactive dashboards that draw on hundreds of data sources.
There is an art to making data highly discoverable and accessible. Your prospective candidate should have a lot of experience in using data tooling (anything from Excel to Tableau) to create effective dashboards that allow users to easily get the insights they need to do their jobs better.
Structured Query Language (SQL) is the industry-standard relational database language.
It is the gold standard for querying and handling data in relational databases. In a recent survey, SQL was the most in-demand skill for data scientists.
Its ubiquity makes it probably the most important specific skill for data analysts to know. It’s an absolute non-negotiable for any prospective candidate!
Your candidate will need to be familiar with the fundamentals of SQL, including the basics of relational databases, tables and indexes, SQL syntax and conditional filters. Intermediate skills include joins and subqueries. Experience with supporting programming languages such as PHP, and with related systems such as MySQL and PostgreSQL, would also be a useful addition to an analyst’s armory.
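The kind of query a candidate should be comfortable writing can be sketched with Python's built-in `sqlite3` module. The `customers` and `orders` tables and their contents are hypothetical. The query combines a conditional filter, a join, an aggregate and a subquery.

```python
import sqlite3

# In-memory database with two small, made-up tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    CREATE INDEX idx_orders_customer ON orders (customer_id);
    INSERT INTO customers VALUES (1, 'Alice', 'EMEA'), (2, 'Bob', 'APAC'), (3, 'Carla', 'EMEA');
    INSERT INTO orders VALUES (1, 1, 120.0), (2, 1, 80.0), (3, 2, 50.0), (4, 3, 300.0);
""")

# EMEA customers whose total spend exceeds the average order amount.
query = """
    SELECT c.name, SUM(o.amount) AS total
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    WHERE c.region = 'EMEA'
    GROUP BY c.name
    HAVING SUM(o.amount) > (SELECT AVG(amount) FROM orders)
    ORDER BY total DESC
"""
top_customers = conn.execute(query).fetchall()
conn.close()
```

The same SQL would run with little or no change on MySQL or PostgreSQL, which is part of why the skill transfers so well between employers.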
NoSQL, which means “not only SQL”, is a database paradigm that focuses on non-tabular, non-relational databases. NoSQL databases allow developers to store large amounts of unstructured data, which gives them greater flexibility.
Ideally, your data analyst will be familiar with the four types of NoSQL database: key-value, document, graph and column-family stores. Similarly, they will have experience with some of the more popular NoSQL database tools, such as MongoDB, Cassandra, Redis and Neo4j.
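The four NoSQL models differ mainly in the shape of the data they store. The sketch below mimics each shape with plain Python structures. It is not the API of any real database, and all keys, names and the `followers_of` helper are invented for illustration.

```python
# Key-value store: an opaque value looked up by a single key.
key_value = {"session:42": "alice"}

# Document store: nested, self-describing records; fields can vary per document.
documents = [
    {"_id": 1, "name": "Alice", "orders": [{"sku": "A1", "qty": 2}]},
    {"_id": 2, "name": "Bob", "email": "bob@example.com"},
]

# Graph store: nodes plus explicit, typed relationships between them.
edges = [("Alice", "FOLLOWS", "Bob"), ("Bob", "FOLLOWS", "Carla")]

# Column-family store: each row key maps to families of sparse columns.
column_family = {
    "user:1": {
        "profile": {"name": "Alice"},
        "activity": {"last_login": "2024-01-01"},
    },
}


def followers_of(person):
    """Traverse the toy graph: who has a FOLLOWS edge pointing at this person?"""
    return [src for src, rel, dst in edges if rel == "FOLLOWS" and dst == person]


bob_followers = followers_of("Bob")
```

Relationship questions like this one are what graph stores are designed for, while the key-value shape suits simple lookups by key.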
Statistical Programming Languages (R or Python)
Statistical programming languages are used for advanced analyses and predictive analytics on large data sets. They allow data analysts to go beyond spreadsheets and even SQL, enabling much deeper analysis, much more quickly.
Your prospective candidate should have mastered at least one (if not both!) of the most popular of these languages, with R and Python being the industry standards. These open source languages are similar in many ways. However, Python is a general-purpose programming language, while R was designed specifically for statistical analysis.
Python tends to be used more for data manipulation and analysis and is effective for machine learning, whereas R slants towards cleansing, preparing and visualizing data.
Machine Learning (ML)
ML is a priority area of investment and research for many enterprises due to its potential to derive deep insights and make powerful predictions.
It’s a method of data analysis that automates analytical model building, allowing analysis to be carried out at massive scale.
ML can help data analysts to automate much of their standard data analysis workflow to provide more comprehensive insights from larger data sets at speed and scale.
A truly advanced data analyst will have good knowledge of ML algorithms, data modeling and neural networks. It is helpful if they are familiar with wider AI/ML operating models that allow ML to be conducted at scale, such as MLOps, which is a framework for automating the path to production for ML models.
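At its simplest, "automated model building" means letting an algorithm learn parameters from data rather than setting them by hand. The sketch below fits a straight line by ordinary least squares, which is one of the most basic learning algorithms. The training data is invented, and real ML work would involve much larger datasets, held-out evaluation data and dedicated libraries.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    intercept = my - slope * mx
    return slope, intercept


# Hypothetical training data that follows y = 2x + 1 exactly.
train_x = [1, 2, 3, 4, 5]
train_y = [3, 5, 7, 9, 11]

slope, intercept = fit_line(train_x, train_y)

# Use the learned parameters to predict an input the model has not seen.
prediction = slope * 6 + intercept
```

The model recovers the slope and intercept from the examples alone, and that learning step is what ML tooling automates at scale.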
A good data analyst will be able to grasp at a deep level the importance of data for their business and take a ‘data-first’ approach to business problems.
They see all the ways in which data can open up powerful new opportunities for increasing efficiency, sparking data-driven innovation, enhancing products, improving the customer experience and so on.
If they grok the power of data to truly transform businesses, this will be evident in how they go about their work and in the ambition and vision they set for themselves.
Ethics of data
Working with data is increasingly fraught with very relevant ethical concerns, particularly when it comes to customer data and personally identifiable information (PII).
The main areas to look out for are:
- Ownership (do you have consent to store this data?)
- Transparency (do individuals know how you will use the data?)
- Privacy (is PII truly private?)
- Intention (what is the goal behind collecting the data?)
- Outcome (what are you trying to achieve with the data?)
A data analyst will need to take these ethical concerns seriously and have a firm understanding of the different ways that they can arise when working with data.
This could include anything from simple compliance questions around the sharing of personal data and how long it is held, all the way to ensuring that machine learning models do not institutionalize race and gender biases.
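One practical habit that supports the privacy point above is replacing direct identifiers with tokens before data reaches an analysis environment. Here is a minimal sketch using a keyed hash from the Python standard library. The `SECRET_KEY`, the record and the field names are hypothetical. Note that this is pseudonymization rather than full anonymization, so the output may still count as personal data under some regulations.

```python
import hashlib
import hmac

# Hypothetical key for illustration; a real key belongs in a secrets manager.
SECRET_KEY = b"replace-with-a-managed-secret"


def pseudonymize(value: str) -> str:
    """Return a stable keyed hash so records can be linked without exposing the value."""
    normalized = value.strip().lower().encode("utf-8")
    return hmac.new(SECRET_KEY, normalized, hashlib.sha256).hexdigest()


record = {"email": "Alice@Example.com", "spend": 120.5}

# Keep the analytical fields, swap the identifier for a token.
safe_record = {
    "customer_token": pseudonymize(record["email"]),
    "spend": record["spend"],
}
```

Because the same input always yields the same token, analysts can still join and count by customer without ever seeing the underlying email address.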
All organizations come laden with particular assumptions, beliefs and received wisdom about how things are done.
A great data analyst will be able to bring critical thinking to bear on business problems to evaluate them in a fresh light and bring innovative, out-of-the-box solutions to the table.
Similarly, it’s not enough to rely on tools to do the data analysis alone. Data analysts must first deploy their critical thinking skills to determine the root of the business problem, what data to analyze, how to do it, what to look for and so on.
As data increasingly becomes the central driver of modern innovation and transformation, data skills are getting more and more valuable.
The data analyst sits right at the heart of any enterprise data program and a high-quality team of these in-demand data workers will typically be a major asset to the organization.
If you’re looking for first-class data analysts, check out our database of vetted, global technical talent that can seamlessly integrate with your existing team.