Friday, October 21, 2022

Why are graphs important for data analysis?

Hello!

Why are graphs important for data analysis? My dear reader, you may already know the answer to this simple question but rest assured that there are still those out there who just want to look at rows and columns in a spreadsheet.

If I remember correctly, I started building my first graphs in good old Excel to keep track of my household budget. Nothing special so far. A few years later, I ended up on a Data Warehouse team with some colleagues who were averse to colors, shapes and graphs. I would sometimes hear questions like: "Who cares about this? Why all these colors, when what matters is the data?"

Journalism has been using visual resources such as infographics to illustrate news stories for years. Corporate data warehouses have also been using visual resources to show their data to users, but it was the advent of big data that made data volumes too large to work with in a spreadsheet. Graphs then took on a stronger role: showing data in a more intelligible way than rows and columns alone.

Many tools have been developed to make the most of data visualization. There are very sophisticated tools on the market with dozens of chart models.

You can have a Ferrari as a tool, but if you don't know how to drive it, it will be nothing more than your favorite car. Visual tools need to be used well, so that you can not only drive them but also extract the best interpretations the data can offer.

In this post, I share the graphs built in Microsoft's Power BI tool to illustrate how a bad graph can make data analysis difficult.

You, dear reader, can download the data set I used for free from the address provided on the web by Microsoft: Sales Financial Data.

First, I wanted to analyze product sales over time, and my favorite chart for this is the line chart. See how it looks in the figure below.


You can see a jumble of colored lines overlapping each other. The graph has become so confusing that you feel discouraged from even starting to analyze it.

Determined to get a better view, I set out for a second attempt. Did I succeed? See below.


Well, not this time. I removed the quarter from the X-axis and kept only the year in an attempt to improve the visualization. The lines still overlap, which makes the sales difficult to analyze.
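For readers who like to see the idea outside Power BI: dropping the quarter from the axis amounts to summing sales per product and year. The sketch below is only an illustration in plain Python. The rows, the name "Product B" and the function `sales_by_year` are made up for the example and do not come from the Microsoft data set.

```python
from collections import defaultdict

# Made-up rows for illustration only: (product, year, quarter, sales).
# These numbers are NOT taken from the Microsoft sample data set.
rows = [
    ("Carretera", 2013, 4, 100.0),
    ("Carretera", 2014, 1, 150.0),
    ("Carretera", 2014, 2, 50.0),
    ("Product B", 2014, 1, 80.0),
]

def sales_by_year(rows):
    """Sum sales per (product, year), ignoring the quarter."""
    totals = defaultdict(float)
    for product, year, _quarter, sales in rows:
        totals[(product, year)] += sales
    return dict(totals)

yearly = sales_by_year(rows)
```

Fewer points per line means a cleaner axis, but every product still gets its own line, so the overlap does not go away.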

Don't be discouraged, I've come up with a solution. See below.


In the figure above, I used a horizontal bar chart stacked at 100%. I put the year on the y-axis and sales on the x-axis, and I used the products as the legend. This stacked bar chart made it possible to see how the sales of each product changed year by year, without the colors and shapes competing with the data. For example, it is clear that the product/car "Carretera" accounted for a larger share of sales in 2014. With this visualization, we do not need to spend hours trying to find the product/car "Carretera" as we did in the line chart.
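A 100% stacked bar shows each product's share of the year's total rather than the raw amount. Here is a minimal sketch of that calculation in Python, with an optional plot using matplotlib (a third-party library, assumed to be installed). The figures, the names "Product B" and "Product C", and both function names are hypothetical and are not taken from the Microsoft data set or from Power BI.

```python
# Made-up yearly totals for illustration only (not from the Microsoft sample).
yearly_sales = {
    2013: {"Carretera": 20.0, "Product B": 50.0, "Product C": 30.0},
    2014: {"Carretera": 45.0, "Product B": 30.0, "Product C": 75.0},
}

def shares_by_year(yearly_sales):
    """Convert each year's sales into percentages that add up to 100."""
    shares = {}
    for year, by_product in yearly_sales.items():
        total = sum(by_product.values())
        shares[year] = {p: 100.0 * v / total for p, v in by_product.items()}
    return shares

def plot_100pct_stacked(shares):
    """Draw one horizontal bar per year, split by product share."""
    import matplotlib.pyplot as plt  # imported here so the math runs without it

    years = sorted(shares)
    labels = [str(y) for y in years]
    products = sorted({p for s in shares.values() for p in s})
    left = [0.0] * len(years)  # where the next segment starts in each bar
    fig, ax = plt.subplots()
    for product in products:
        widths = [shares[y].get(product, 0.0) for y in years]
        ax.barh(labels, widths, left=left, label=product)
        left = [l + w for l, w in zip(left, widths)]
    ax.set_xlabel("Share of sales (%)")
    ax.set_ylabel("Year")
    ax.legend(title="Product")
    return fig

shares = shares_by_year(yearly_sales)
```

Calling `plot_100pct_stacked(shares)` returns a figure with the year on the y-axis and the products in the legend. Keep in mind that this kind of chart compares shares: a wider segment means a larger slice of that year's sales, not necessarily a larger amount.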

Dear reader, it is worth spending a few minutes, or even hours, testing which visualization works best. Your user/client will certainly thank you for being able to draw conclusions easily from a well-made graph.

Happy data analysis!








