If you’re a seasoned analytics developer, you may be able to relate to the following scenario:
You’ve spent more hours than you’d like to admit designing a report that will blow your director away. It has time series drill downs and custom Chiclet Slicers and refreshes every five minutes. It answers the big questions and the small ones. It accepts natural language queries! It’s nothing short of a Nobel Prize winner.
While preparing your final email to explain the report and where it’s located, you have a last minute panic: “Have I double-checked the numbers?” You’ve been so excited about building the report that its accuracy has slipped into the background.
Reconciling a report can be a thankless job – akin to weeding a garden during the Super Bowl. However, delivering an inaccurate report can break your reputation as an analyst. Once you’ve accepted that this work should happen BEFORE you start developing the report, here are the top seven methods you should use to ensure the information you’re delivering is as accurate as possible:
- Find a similar report or reports. These will be your North Star because they are already accepted as fact by their stakeholders. If anyone questions your accuracy, you can assure them that your report matches an existing one, which in some offices carries significant weight.
- Count records with the same attributes and compare their measures. In some cases, this can seem like trying to boil the ocean. So, for sanity’s sake, I always choose a slice of the data that I know should match exactly – usually a time period, like one month. If you find mismatched aggregations, it’s time to drill into those records to see whether records are missing or duplicated, or whether you’re missing some business logic.
- Find attribute values that are missing in your report and explain them. If your stakeholders ask you why the Supersonics were excluded from your Seattle Sports History dashboard, you should have a good answer.
- Know your data sources and how they were built.
  - Are they views or multi-dimensional cubes? If so, what base tables are they built on?
  - What business process does this data source represent?
  - What is their refresh cadence – weekly? Monthly? Is this okay with your stakeholders?
  - Is your test environment up to date? Test teams who put functional testing over data testing may not refresh their data often.
- Data lineage. Know what your fields mean and where they come from. Ideally, although not always practical, you should be able to trace your data all the way to the point of collection. Like the great Nile River, all data has a source – or something like that.
- Are any fields in your report renamed during the ETL process? If so, why? Once you understand the transformations your data goes through, you should feel empowered to ask “why,” because this is an opportunity to better understand the business. For example, it’s possible for a “fiscal week” to have only one day (true story).
- Ask a friend. A quick glance at the numbers by a co-worker or friend can make all the difference in the world. If your buddy Pavel says, “Pat, there’s no way 1 billion orders were made at Adventure Works in one day,” he might have a point. Go back to the drawing board.
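The count-and-compare check from the second method above can be sketched in plain Python. This is only a minimal illustration – the table, column names, and numbers are invented, and in practice you’d pull these rows from your source system and your report’s dataset:

```python
from collections import Counter

# Hypothetical order rows from the source system and from the report,
# restricted to one month (the slice that should match exactly).
source_rows = [
    {"month": "2024-01", "region": "West", "amount": 100.0},
    {"month": "2024-01", "region": "West", "amount": 250.0},
    {"month": "2024-01", "region": "East", "amount": 75.0},
]
report_rows = [
    {"month": "2024-01", "region": "West", "amount": 100.0},
    {"month": "2024-01", "region": "East", "amount": 75.0},
]

def summarize(rows):
    """Row counts and summed measures per (month, region) grouping."""
    counts, totals = Counter(), Counter()
    for r in rows:
        key = (r["month"], r["region"])
        counts[key] += 1
        totals[key] += r["amount"]
    return counts, totals

def reconcile(source_rows, report_rows):
    """Return the groupings whose record counts or totals disagree."""
    s_counts, s_totals = summarize(source_rows)
    r_counts, r_totals = summarize(report_rows)
    mismatches = {}
    for key in s_counts.keys() | r_counts.keys():
        if s_counts[key] != r_counts[key] or s_totals[key] != r_totals[key]:
            mismatches[key] = {
                "source": (s_counts[key], s_totals[key]),
                "report": (r_counts[key], r_totals[key]),
            }
    return mismatches

print(reconcile(source_rows, report_rows))
# → {('2024-01', 'West'): {'source': (2, 350.0), 'report': (1, 100.0)}}
```

Each mismatched grouping is your cue to drill into the underlying records for missing rows, duplicates, or overlooked business logic.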
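The missing-attributes check from the third method is often just a set difference (or an anti-join in SQL). A small sketch, with invented team names standing in for whatever attribute you’re auditing:

```python
# Hypothetical: attribute values present in the source vs. values that
# actually appear anywhere in the report.
source_teams = {"Mariners", "Seahawks", "Supersonics", "Storm"}
report_teams = {"Mariners", "Seahawks", "Storm"}

missing = sorted(source_teams - report_teams)
for team in missing:
    # Every value printed here needs an explanation before you ship.
    print(f"Missing from report: {team}")
```

If “Supersonics” shows up in this list, you want to know why before a stakeholder asks.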
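The refresh-cadence question under the fourth method can even be automated. A minimal sketch, assuming you can query the source’s last load date and have agreed an acceptable lag with your stakeholders (both values here are invented):

```python
from datetime import date, timedelta

def is_stale(last_refresh: date, max_age: timedelta, today: date) -> bool:
    """True if the source's last refresh is older than stakeholders allow."""
    return today - last_refresh > max_age

# Hypothetical: a weekly-refreshed source checked ten days after its last load.
print(is_stale(date(2024, 3, 1), timedelta(days=7), date(2024, 3, 11)))  # → True
```

A check like this is especially useful in test environments, where data refreshes tend to lag behind schema changes.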
If you’re reading this article, you already understand the importance of accurate reporting, but maybe now you’ll have a couple of tips in mind the next time a new analytics developer asks for advice.
I’ll leave you with one last nugget of advice that often makes my design decisions much simpler:
#FactsOverFeatures: A stakeholder won’t always be upset about missing features, but they will always be upset over broken facts!