The first phase of data analysis is understanding the business problem. Stakeholders can quickly tell whether the Data Analyst has understood the business problem from how the problem statement is structured and how the issue is mapped. There are several foundational steps to follow.
The HDEIP Framework
Below are the five stages:
1. HYPOTHESIS: The broad scope of the problem. What is the problem you are facing?
Note: Before proceeding to the Data Sourcing phase, complete the hypothesis formation, the SMART principles, the Issue Tree, and, if business metrics need to be converted, the Value Driver Tree, as discussed further below.
2. DATA SOURCING: Locate the key data needed to address the hypothesis and prove it to be TRUE or FALSE. What data sources are needed to answer the questions from the hypothesis stage?
3. EXPLORATORY DATA ANALYSIS: Exploration of the data set with respect to the hypothesis formed. What does the data indicate with respect to the hypotheses formed?
4. INSIGHTS: Synthesis, recommendation, and compression of findings. What insights/conclusions can you draw from answering the hypotheses/issues?
5. PRESENTATION: Choose the type of presentation style — executive, technical, or non-technical presentation. Review the aim, objectives, and summary. What insights can you present to the respective stakeholders to gain buy-in?
To craft the best problem statement, you will begin with hypothesis formation by answering the following:
- CONTEXT: Why are you working on this business problem? Note down any relevant context for the project.
- SUCCESS CRITERIA: What key criteria will deem this work a success? What deliverables are necessary?
- SCOPE OF SOLUTION SPACE: What is the focus of this business initiative? (i.e. What are you focusing on exclusively?)
- CONSTRAINTS WITHIN THE SOLUTION SPACE: What constraints could prevent this business initiative from succeeding? Do you have the right access to data? Are there timeline constraints? Financial constraints? Legal considerations?
- STAKEHOLDER ASSESSMENT: Who are the key stakeholders that need to be involved to get the answers? Think about this from a data-gathering perspective as well as a stakeholder engagement perspective.
- KEY DATA SOURCES: What are the key pieces of data you need to answer the questions related to the problems you are trying to solve?
Now for the exciting part! The hypothesis has been organized and it’s time to begin crafting the problem statement. Every good problem statement follows the SMART principles:
- S — be specific, not general
- M — measurable
- A — action-oriented
- R — relevant to the key problem
- T — time-bound
Example: How can ByJulissaMarin grow viewership (relevant) by 15% (measurable) within the next 3 months (time-bound) by creating (action) more content (specific) with other bloggers (e.g., Teradata) and/or networking (action) to share existing content (e.g., on YouTube), without diluting or hurting ByJulissaMarin's overall brand?
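To make the measurable and time-bound parts of the example concrete, here is a minimal Python sketch that turns "15% within 3 months" into a target value and a required monthly growth rate. The `SmartTarget` class and the baseline of 10,000 views are assumptions made up for illustration; the article does not state a baseline.

```python
from dataclasses import dataclass


@dataclass
class SmartTarget:
    """Hypothetical container for the measurable parts of a SMART statement."""
    metric: str          # relevant: the metric tied to the key problem
    baseline: float      # ASSUMPTION: current value, not given in the article
    growth_pct: float    # measurable: required growth in percent
    months: int          # time-bound: number of months to reach the target

    def target_value(self) -> float:
        """Value the metric must reach by the deadline."""
        return self.baseline * (1 + self.growth_pct / 100)

    def monthly_growth_rate(self) -> float:
        """Constant compound monthly rate that reaches the target on time."""
        return (1 + self.growth_pct / 100) ** (1 / self.months) - 1


# Illustrative numbers only: 15% growth in 3 months from an assumed baseline.
goal = SmartTarget(metric="viewership", baseline=10_000, growth_pct=15, months=3)
print(f"Target {goal.metric}: {goal.target_value():.0f}")
print(f"Needed growth per month: {goal.monthly_growth_rate():.2%}")
```

Breaking the goal into a monthly rate gives a checkpoint to compare against at the end of each month, instead of waiting for the deadline to find out whether the target was met.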
Once your SMART problem statement is in place and focused exclusively on the hypothesis, it is time to create an issue tree. The sole purpose of an issue tree is to map the problem statement. Divide the problem statement into several components, then split each issue into sub-issues. This lets the Data Analyst break down a complex problem into manageable pathways and allocate the components to different groups or people.
For the issue tree to be foolproof, the pathways should be mutually exclusive and collectively exhaustive. That means:
- no overlapping parts
- no component appearing more than once
- no gaps
- all options considered, even non-actions
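The two conditions above can be checked mechanically once each branch of the tree is written down as a set of options. The sketch below is one possible way to do it in Python; the function `check_mece`, the branch names, and the list of options are all hypothetical and only loosely based on the viewership example.

```python
from itertools import combinations


def check_mece(all_options, branches):
    """Return (overlaps, gaps) for branches given as {branch_name: set_of_options}.

    overlaps: options shared by two branches (violates mutual exclusivity)
    gaps:     options not covered by any branch (violates exhaustiveness)
    """
    overlaps = {}
    for (name_a, items_a), (name_b, items_b) in combinations(branches.items(), 2):
        shared = items_a & items_b
        if shared:
            overlaps[(name_a, name_b)] = shared
    covered = set().union(*branches.values())
    gaps = set(all_options) - covered
    return overlaps, gaps


# ASSUMPTION: illustrative options and branches, not taken from the article.
all_options = {"guest posts", "co-written series", "YouTube", "newsletter", "do nothing"}
branches = {
    "create new content": {"guest posts", "co-written series"},
    "share existing content": {"YouTube", "newsletter"},
    "no action": {"do nothing"},
}

overlaps, gaps = check_mece(all_options, branches)
print("Overlaps:", overlaps)
print("Gaps:", gaps)
```

Both results come back empty when the tree is mutually exclusive and collectively exhaustive. Note that the "no action" branch is included on purpose, since non-actions count as options too.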
Let’s take the SMART problem statement and create a simple issue tree:
Value Driver Tree
The function of the value driver tree is to express business metrics (e.g., revenue, operational costs, production costs) in a clear and accountable manner. As in the issue tree, there is a main lever that impacts the other connected levers. Whether moving from the primary/root node or from the end node, a sense-check ensures each business metric can be traced back to a correctly defined unit.
- Main lever/Primary node: What is the unit of measurement? What is the business measuring? In the image below, the production cost is calculated by dividing the operational cost by the lbs produced.
- Sub-value lever: What directly impacts the primary node? Continue the process of adding sub-levers until you reach the final node. However, do not over-expand. Each sub-value lever must serve a purpose.
- End node: A data analyst must know when to stop converting metrics. In the image below, the final end node shows the operational cost is calculated by adding the fixed and variable costs. Both sub-value levers are expressed in the same unit.
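The two relationships described above (operational cost = fixed cost + variable cost, and production cost = operational cost / lbs produced) can be written as a small Python sketch. The function names and the dollar and weight figures are assumptions for illustration only; they do not come from the article's image.

```python
def operational_cost(fixed_cost, variable_cost):
    """End node: both inputs must be in the same unit (here, dollars)."""
    return fixed_cost + variable_cost


def production_cost_per_lb(fixed_cost, variable_cost, lbs_produced):
    """Primary node: operational cost divided by the lbs produced."""
    if lbs_produced <= 0:
        raise ValueError("lbs_produced must be positive")
    return operational_cost(fixed_cost, variable_cost) / lbs_produced


# ASSUMPTION: made-up figures used only to sense-check the units.
fixed = 6_000      # dollars
variable = 4_000   # dollars
lbs = 5_000        # pounds produced

cost_per_lb = production_cost_per_lb(fixed, variable, lbs)
print(f"Operational cost: ${operational_cost(fixed, variable):,.2f}")
print(f"Production cost: ${cost_per_lb:.2f} per lb")
```

Walking the calculation from the end node up to the primary node is the sense-check in practice: dollars plus dollars gives dollars, and dollars divided by pounds gives dollars per pound.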
Once a robust hypothesis has been developed, the next phase of the HDEIP framework, data sourcing, begins! Hopefully, the succinct notes gathered in this article are as useful to you as they have been to me. As it's the start of 2022, feel free to use the SMART principles to structure your personal goals. One goal I have in mind is to improve my golf game! Let us see if I can decrease my putting average by 2% in the next three months by practicing twice a week at the range!