Welcome to GnomeLedge (39/42)
Mar 2024 Updates - 39 Gnomes remains
Mary is an illustrator working in a publishing house. Recently she has decided to take up a part-time business intelligence and data analytics course to further her own studies.
In one of the first group project, they are supposed to identify a topic to adopt as a data visualisation and monitoring - dashboard project. The only constraints in their selection of the topic is that the primary dataset has to be from a validated Singapore source, and the data visualisation topic must be relevant to the Singapore audience.
After much deliberation, the group decided anchor their data visualisation and monitoring - dashboard around the theme of Singapore's Baby Bonus.
Through Mary's learning journey, you will understand some considerations required for a data exploration and visualisation project such as:
- Thinking about questions to ask
- Define the tools of the trade
- Outline the areas to first begin
- Understand the value of picking up skills in this area
Are you ready to begin exploring with Mary?
Here is the list of resources and generic reference points compiled for Mary's Journey:
- [Generic References] How We Decode Visual Information – #PoDV: https://www.fusioncharts.com/blog/how-we-decode-visual-information-podv/
- [Generic Reference] W3Schools - SQL: https://www.w3schools.com/sql
- [Generic Reference] Automating the Design of Graphical Presentations of Relational Information: https://info.sice.indiana.edu/~katy/S637-S11/Mackinlay86.pdf
- [Generic Reference] 3 Design Layouts: Gutenberg Diagram, Z-Pattern, And F-Pattern: https://vanseodesign.com/web-design/3-design-layouts/
- [Generic Reference] The Art of Consequences: https://edspace.american.edu/visualwar/nightingale/
- [Generic Reference] Datakind: https://www.datakind.org/
- [Generic Reference] Omdena: https://omdena.com/projects/
- [Generic Reference] SG Healthcare AI Datathon & Expo: https://sg-ai.org/
- [Generic Reference] Baby Bonus Dashboard – Building a Nation. Creating a Family.: https://analyticsandintelligentsystems.wordpress.com/2017/08/07/baby-bonus-dashboard-building-a-nation-creating-a-family/
"Natural selection is the blind watchmaker,
blind because it does not see ahead,
does not plan consequences,
has no purpose in view."
— RICHARD DAWKINS, 1986, P. 21
Mary's Perspective: So you need to explore and visualise data
"Now that we have locked-in our theme for the group project, how should we begin?" Mary asked as she sent out the email informing the lecturer of the group's selection. "According to the project requirements, we need to design the dashboard around five business/mission critical questions."
"Our primary data that we have selected from the Department of Statistics Singapore covers birth rates and incomes. We can probably ask and find answers for:
- What is Singapore’s fertility trend?
- What is Singapore’s birth order trend?
- Which ethnic group responds positively to the Baby Bonus Scheme?"
Ben, the second member of the team noted. "But we need 2 more questions. If we are coming from an angle where the dashboard is designed for mission critical use-case, based on the context of our chosen theme, it makes sense to also ask:
- What is the progression of living costs (of a child) over the years since the introduction of the Baby Bonus scheme?
- How effective is the Baby Bonus Scheme in lessening the financial burden of raising a child?"
"We have a problem." Joel the last member of the group said as he combed through the data. "The primary data is not enough to ask questions of such depth. The data we have can cover the first 3 questions, but we will need to collect secondary data to form a hypothesis that can inform and provide some insights for the last 2 questions."
"We will probably need some market price of commodities associated with raising a child over time, to analyse the progression of living costs (of a child)." Mary suggested. "I know my kid cost me a lot in milk powder and diapers during his first few years. So the price of those items over time could possibly produce some insights about the burden of raising a child at different stages of his/her growth. The problem is that I don't think there are publicly downloadable data in these area."
"One way is that we stick to asking easy questions with the data we have." Ben noted. "But we are all adults. If we are going to invest our time to do this, I feel that we should get something meaningful out of the learning experience. The spirit of data exploration and visualisation isn't meant to find simple answers, but to really uncover/deliver some new insights in the data wilderness. If you two are open to take a risk on the project marks to uncover something new, I am okay to work together and find answers to the harder questions."
"I am all for it." Mary said. "I am one of the 'stakeholders' of the Baby Bonus programme, being a mom and all. It will be interesting to see what we uncover."
"I am also fine with tackling harder questions where outcome is uncertain. That's why I applied for this course, to translate unknown-unknowns into partially known elements. I can do a quick scan of potential sources for whatever data features we feel maybe necessary for our secondary data and scrap them for exploration." Joel noted. "The secondary data wouldn't be validated by official sources, but they can demonstrate like a data minimum viable product (MVP), of what such data features might bring to the table, in terms of generation of insights to deeper questions. But we will have to be prepared to clean and harmonise the datasets and prepare the multiple individual datasets collected into a final database schema for connection to the dashboard."
Ideation Exercise
If you are given a task to identify potential data features that could reveal an estimated cost of raising a child overtime, what could be the key data features that might have significant impact in the analysis? e.g., milk powder, vaccination costs
Write down your thoughts in your journal / blog.
Tools of the Trade for Data Storage, Transformation and Visualisation
Some examples include:
- Data Storage and/or Transformation/Preparation:
- Data Visualisation:
Mary's Perspective: Selecting the Tools, SQL Query and Design
"Let's keep this project simple." Ben said as he began outlining the plan. "While we have missing data that requires manual effort to collect and prepare, the overall datasets are not that complex. We don't have to design complex data modelling to solve any inherent data issues. We can select SQL server to store and transform the data, and link the database to PowerBI which we can then use to create the dashboard."
"If we use SQL, that means I will need to use SQL commands to instruct the system for data manipulation tasks. I am still quite new to SQL query." Mary said. "Can you run me through it again?"
"I can guide you through some simple SQL statements you may need, when you are doing feature engineering." Joel offered. "But given that the intention of this project also consists of the display of insights and design of the dashboard, we will need your help as well to work on the design aspect."
"That I can contribute." Mary said with a smile. "Effective data visualisation involves consideration and application of pre-attentive attributes to form analytical patterns. I can walk you through some of the considerations why creating a dashboard layout that can convey the critical information we wish to display in a visual format. Let's trade our knowledge."
Tutorial 1: SQL JOIN
JOIN is used to combine rows from two or more tables using a primary key.
Where "Costs" table has the columns "Year" and "Total Cost Year".
Where "Income" table has the columns "Year" and "Median Income".
Where "Year" column in the "Costs" and "Income" tables refer to each other.
We can use the following INNER JOIN SQL statement to select records with matching values in both tables.
SELECT Costs.Year, Income.Median_Income, Costs.Total_Cost_Year
FROM Costs
INNER JOIN Income ON Costs.Year=Income.Year;
Reference to test out a sample INNER JOIN SQL statement: https://www.w3schools.com/sql/sql_join.asp
Tutorial 2: SQL DELETE
DELETE can be used for deletion of records where the column contains an identified value.
Where "Merged_Costs" table has the columns "Year", "Maid_Cost_Month", "Maid_Cost_Changes", "Infant_Milk_Powder_900_Gram_Tin", "Infant_Milk_Powder_900_Gram_Tin_Changes", "Tins_of_Milk_Per_Month", "Infant_Care_Services_Month", "Infant_Care_Services_Month_Changes", "Child_Care_Services_Month", "Child_Care_Services_Changes", "Thomson_Medical_Delivery_OneTime", "Thomson_Medical_Delivery_OneTime_Changes", "Childcare_Subsidies_Month", "Infant_Care_Subsidies_Month", "BB_Cash_Year", "BB_Cash_Month_Changes", "MaidCost_Year", "Infant Milk_Year", "Infant_Care_Year", "Child_Care_Year", "Child_Infant_Care_Avg_Year" and "Total_Cost_Year".
DELETE FROM Merged_Costs WHERE Maid_Cost_Changes=' ' OR Maid_Cost_Changes= IS NULL;
Reference to test out a sample DELETE SQL statement: https://www.w3schools.com/sql/sql_delete.asp
Data visualisation - communicating through visuals
One of the basic building blocks for visualisation are the preattentive attributes utilised by human's working memory. These attributes are what the human eyes can capture immediately when viewing an image. The attributes might be categorised according to form, color and spatial positioning.
The common analytical patterns such as the bar chart, variations of the line and dots are intended to tap on the preattentive attributes to form the "visual language" to expose the structure of the data, so as to disseminate critical insights via the visualisation dashboard.
Dashboard layout design
Part of dashboard design consist of consideration as to where visualisation can be best organised to support efficient and meaningful monitoring.
Most humans are conditioned to read from left to right in an F-shape reading pattern.
As such a simplified golden triangle to mark critical charts to be positioned could help piroritise charts for ease of view and monitoring, as they will be areas where the human eyes will normally start reading from.
The concluding KPI and metrics placed as the last chart at the bottom right corner of the screen as they represent the end of the info-representation journey.
Which industries may seek data exploration and visualisation skills?
Data visualisation is not new. Ever heard of Florence Nightingale? Florence Nightingale was a well known female icon that operates as a social reformer, statistician and most importantly, one of the founder of modern nursing and a pioneer in data visualisation.
According to "The Art of Consequences" website, Nightingale's well-known infographic "Diagram of the causes of mortality in the Army in the East" was alleged to be critical in convincing then Queen Victoria to adopt Nightingale's recommendations for war medicine and sanitation practices.
Any industries that require data exploration, investigation, insights generation, reducing "complex/voluminous/noisy" data and getting to the "meat" of the issue, conveyance of critical information to key (but busy) stakeholders etc. will at times tap on the skills inherent in data exploration and visualisation.
Industries with such continuous needs and fixed positions may be from:
- Education
- Research and Development
- Healthcare
- Security
- Military
- Banks
- Merger & Acquisition
- Lawyers
- Valuators
- Market Forecasters
- Investment
- Journalism
- Accounting
- Pharmaceuticals
- etc.
In the wider societies, citizen scientists have also banded together to pool efforts and resources. Such gathering of citizen expertise usually identify a cause or theme to explore and produce insights for. Such ad-hoc exploration, research and visualisation works will usually be related to social good or to contribute in pro bono work for non-profits, in exchange for access to proprietary data not available to public.