Hello!
Why are
graphs important for data analysis? My dear reader, you may already know the
answer to this simple question but rest assured that there are still those out
there who just want to look at rows and columns in a spreadsheet.
If I
remember correctly, I started developing my first graphs in good old Excel to
control my household budget. Nothing special until then. A few years later, I
ended up on a Data Warehouse team with some colleagues who were averse to
colors, shapes and graphs. I would sometimes hear questions like: "Who
cares about this? All these colors, when what matters is the data?"
Journalism
has been using visual resources such as infographics to illustrate news stories
for years. Corporate data warehouses have also been using visual resources to
show their data to users, but it was the advent of big data that made the
volume of data impossible to handle in a spreadsheet. Graphs then reinforced the
role of showing data in a more intelligible way than just in rows and
columns.
Many tools
have been developed to make the most of data visualization. There are very
sophisticated tools on the market with dozens of chart models.
You can
have a Ferrari as a tool, but if you don't know how to use it, it will not be
your favorite car in data visualization. Visual tools should be used well so that
you can not only drive them, but also extract the best interpretations that the
data can show.
In this
post, I share the graphs built in Microsoft's Power BI tool to illustrate how a
bad graph can make data analysis difficult.
You, dear reader, can download the data set I used, free of charge, from the address Microsoft provides on the web: Sales Financial Data.
First, I wanted to analyze product sales over time, and my favorite chart for this is the line chart. See how it looks in the figure below.
You can see
a jumble of colored lines overlapping each other. The graph has become so
confusing that you feel discouraged from even starting to analyze it.
Determined
to get a better view, I set out for a second attempt. Did I succeed? See
below.
Well, not
this time. I removed the quarter from the x-axis and kept only the year
in an attempt to improve the visualization. The lines continue to overlap,
making it difficult to analyze sales.
Don't be
discouraged: I've come up with a solution. See below.
In the
figure above, I used a horizontal bar chart stacked at 100%. I put the year on
the y-axis and sales on the x-axis. I used the products as the legend. This
stacked bar chart makes it possible to see how each product's share of sales
changes year by year without the colors and shapes competing with the data.
For example, it is clear that the share of sales of
the product/car "Carretera" increased in 2014. With this visualization, we do
not need to spend hours trying to find the product/car "Carretera" as
in the line chart.
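
For readers who want to see what a 100% stacked bar actually computes, here is a minimal Python sketch of the idea, not the Power BI implementation. It assumes rows of (year, product, sales). The numbers and the name "Product B" are made up for illustration and are not taken from the Microsoft data set. The function names shares_by_year and render are hypothetical helpers of my own.

```python
from collections import defaultdict

# Hypothetical rows (year, product, sales). The figures are invented for
# illustration only; they do NOT come from the Microsoft sample data set.
ROWS = [
    (2013, "Carretera", 100.0),
    (2013, "Product B", 300.0),
    (2014, "Carretera", 300.0),
    (2014, "Product B", 300.0),
]


def shares_by_year(rows):
    """Return {year: {product: percent of that year's total sales}}."""
    totals = defaultdict(float)
    per_product = defaultdict(lambda: defaultdict(float))
    for year, product, sales in rows:
        totals[year] += sales
        per_product[year][product] += sales
    # Normalizing each year to 100% is what makes every bar the same length.
    return {
        year: {p: 100.0 * s / totals[year] for p, s in products.items()}
        for year, products in per_product.items()
        if totals[year] > 0  # skip years with no sales to avoid dividing by zero
    }


def render(shares, width=40):
    """Draw one text bar per year, using each product's first letter."""
    lines = []
    for year in sorted(shares):
        bar = "".join(
            product[0] * round(width * pct / 100)
            for product, pct in shares[year].items()
        )
        lines.append(f"{year} {bar}")
    return lines


if __name__ == "__main__":
    for line in render(shares_by_year(ROWS)):
        print(line)
```

Each year becomes one bar of the same length, so the eye compares only the proportions inside it. The text rendering is deliberately crude (products that share a first letter would be indistinguishable); it is only there to show the normalization step.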
Dear
reader, it is worth spending a few minutes or hours testing visualizations to
find the best one. Your user/client will certainly thank you for being able to
easily draw inferences from a well-made graph.
Happy data
analysis!
