Trainings and tools
Data: Quality, analysis, and interpretation
- How do I find existing data for my county or community health board?
- How do I think about the quality of the data?
- What is the difference between rates, percentages, and proportions?
- How do I use percent difference and percent change?
- What should I think about when presenting two variables together (e.g., cross-tabs)?
- What should I think about when presenting trend data or comparisons between groups?
- How do I know if a health condition or issue is serious in my county or jurisdiction?
- How do I use qualitative information?
- How do I put it all together? (Considering the entire process)
How do I find existing data for my county or community health board?
There are several sources of existing (secondary) data at the county or community health board level. Much of this data is collected every year and in the same way, which makes it especially useful. The Minnesota County Health Tables, the Minnesota Student Survey (every three years), and vital statistics (birth and death) are key pieces of information for describing community demographics, health issues and contributing causes of community health issues. This set of information can lead you to additional questions and perhaps additional data sources to consider.
- Minnesota county-level indicators for community health assessment provides population, birth, and death data through the MDH Center for Health Statistics.
- United States Census Bureau QuickFacts: Minnesota provides additional population data.
- The Minnesota Student Survey has been administered to students in public, charter, and tribal schools every three years since 1989. Initially, students in grades 6, 9, and 12 were surveyed. The surveyed grades changed to 5, 8, 9, and 11 in 2013. Minnesota Student Survey results are available by state and county. Note when using trend data that participating school districts may have changed between survey years.
- MN Public Health Data Access Portal is a data portal that contains health and environmental information, which you can query by state or county and over time.
- Local Minnesota health surveys: For more information, please contact the Minnesota Center for Health Statistics.
- County Health Rankings & Roadmaps: Rankings are a starting point, not an endpoint, and communities are encouraged to draw on local sources, which often have more recent data. Reliability of measures used in the rankings can vary. This is a particular concern for counties with small populations. Improvements in a county’s rank from year to year may be due to real improvement in health in that county—or to declines in health in other counties.
How do I think about the quality of the data?
There are several factors to consider when evaluating the quality of a data source. Things to think about include:
- Data collection methodology: How was this data collected? Who collected it? What was the main purpose in collecting this data?
- Population represented: Who is included in the data? Who isn’t included? Is everyone possible included (birth, death, population data) or is it a statistical sample?
- For surveys:
- Response rate: What was the response rate? Can the researchers describe differences and/or similarities between respondents and non-respondents? Is the data weighted to account for non-response?
- Self-report: Do you have reason to believe that people may feel uncomfortable answering some of the questions accurately (response bias)? Are they asking questions about things that might be hard for respondents to remember (recall bias)?
- Number of observations in the dataset: How many people are in the dataset? Are there enough to have stable percentages and/or rates? Are any of the data counts less than 20? If so, remember that a shift of one person could result in what appears to be a big percentage change. Are any counts less than 5? If so, do not use this data.
- Trend data: Have data collection methods changed over time? Are there differences in the population studied? Have questions or measures changed? If yes, be cautious in presenting data as a trend.
- If you still have questions about data quality, consider following up with the people who collected it. Read reports that highlight specific strengths and limitations of the data. Contact researchers to ask additional questions, as appropriate.
Data strengths and limitations for a selected group of potential local data sources are included in the resource linked below. This list is not exhaustive, but does provide a starting point. To learn more about a particular data source, you can contact the organization that collects and maintains the data for additional information.
Remember, no data is perfect—and data doesn’t have to be without limits or caveats to use it—but you want to make sure you understand the limitations and note them when presenting data.
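The small-count guidance above (counts under 20 are unstable, counts under 5 should not be used) can be illustrated with a short Python sketch. The function names are hypothetical, and the thresholds are simply the ones stated in this section.

```python
def percent_change_from_one_more(count):
    """Percent change in a count when a single additional case is added."""
    return 100.0 * 1 / count


def count_guidance(count):
    """Apply the thresholds from this section to a single data count."""
    if count < 5:
        return "do not use"
    if count < 20:
        return "use with caution"
    return "ok"


# One extra person matters far more when the starting count is small.
for count in (4, 19, 200):
    change = percent_change_from_one_more(count)
    print(count, count_guidance(count), f"{change:.1f}% change from one more case")
```

With a count of 4, one additional case looks like a 25% increase; with a count of 200, the same single case is a 0.5% increase.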
What is the difference between rates, percentages, and proportions?
Data is more useful when placed in context. Simply knowing the count of something often isn’t helpful. For example, knowing that 40 kids are up to date on immunizations isn’t as useful as knowing whether that is 40 out of 42 kids (95%) or 40 out of 100 kids (40%). Your perception of immunization coverage would be dramatically different in those two scenarios. It is especially important to use rates or percentages when comparing data across time or between geographic areas (e.g., local vs. state data, yearly trends), because they account for potential differences in population size, either by geography or over time.
- Percentages represent a rate, number or amount per 100. If 80 moms out of 95 report driving their kids to school every day, then that represents 84% of moms.
- Rates typically imply action or the incidence of something. They are created by dividing the occurrence of an event by a denominator that provides context (e.g., total population, people at risk for a condition), and the result is often multiplied by a factor so the number isn’t too small. Birth data is often represented as rates. For example, the birth rate is the number of live births divided by the population and multiplied by 1,000. The rate is written as xx births per 1,000 population. Without the multiplier, the birth rate would be a decimal (0.xxx), which is harder to understand and compare.
- Proportions allow you to consider a part, share or number in comparative relation to a whole. 1/3 or 2/5 is considered a proportion. “1/3 of all households eat out once per week.” You could also represent this as a percent: 33% of households eat out once per week.
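The three measures above can be computed in a few lines of Python. This is a minimal sketch: the function names are illustrative, and the birth counts (600 births in a population of 50,000) are hypothetical numbers chosen only to show the arithmetic. The other figures come from the examples in this section.

```python
from fractions import Fraction


def percentage(part, whole):
    """Number per 100."""
    return 100.0 * part / whole


def rate(events, population, per=1000):
    """Events divided by a population, scaled by a multiplier (default: per 1,000)."""
    return per * events / population


def proportion(part, whole):
    """Share of a whole, kept as a reduced fraction."""
    return Fraction(part, whole)


print(round(percentage(40, 42)))   # 95  (40 of 42 kids up to date)
print(round(percentage(40, 100)))  # 40  (40 of 100 kids up to date)
print(round(percentage(80, 95)))   # 84  (80 of 95 moms)
print(rate(600, 50_000))           # 12.0 births per 1,000 (hypothetical counts)
print(proportion(2, 6))            # 1/3
```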
More information: Tools of the trade (Pennsylvania Department of Health)
How do I use percent difference and percent change?
The percent difference measures either an increase or decrease between two numbers you are comparing. If you had 47% of adults reporting seat belt use in 2018, compared to 34% in 2016, you have a 13-percentage point increase between 2016 and 2018. It would not be accurate to state that seat belt use increased by 13%.
Percent change is similar to percent difference, but describes the change as a percent of the initial value. The above example suggests that you had a 13-percentage point increase in seat belt use (using the percent difference approach). For percent change, take the amount of change and divide it by the initial value, then multiply by 100. In this case, 13 / 34 x 100 = 38%. One way to say this is that there was a 38% increase in seat belt use between 2016 and 2018.
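The seat belt example can be worked through in Python as a check on the arithmetic. The function names below are illustrative, not part of any standard library.

```python
def percentage_point_difference(initial_pct, final_pct):
    """Simple subtraction of two percentages (the 'percent difference' above)."""
    return final_pct - initial_pct


def percent_change(initial, final):
    """Change expressed as a percent of the initial value."""
    return 100.0 * (final - initial) / initial


# Seat belt use: 34% in 2016, 47% in 2018.
print(percentage_point_difference(34, 47))  # 13 percentage points
print(round(percent_change(34, 47)))        # 38 (percent increase)
```

Keeping the two calculations in separate, clearly named functions makes it harder to report a 13-percentage point increase as a "13% increase" by mistake.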
More information: Percent difference and percent change (PDF) (Substance Abuse in Minnesota)
What should I think about when presenting two variables together (e.g., cross-tabs)?
It can be very powerful to show how specific conditions or issues vary between or across groups or by some other factor, e.g., age, race/ethnicity, income level, education level, gender, etc. Typically, this calculation is done with Excel or another statistical package. Even though these packages will calculate cross-tabs and frequencies for you, you still need to think about the data when presenting it.
- Try to use percentages or rates over raw numbers within tables, to account for differences across groups.
- Be careful when framing your results—are you using column or row percentages? Make sure you know the difference in what is calculated.
- When you have cells within tables, are some of the counts less than 5? If so, be very cautious in presenting this data. Any calculated percentages will be unstable given the small denominator. A change of one person could make it seem like a 20% difference. In addition, depending on what data you are using, a small cell size could lead to the identification of an individual. In general, a cell size of 15 or more tends to have more stability in terms of rates or percentages.
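The difference between row and column percentages, and the check for small cells, can be sketched in plain Python. The table below is entirely hypothetical (made-up counts by age group and smoking status), and the helper names are illustrative.

```python
# Hypothetical cross-tab: rows are age groups, columns are smoking status.
counts = {
    "18-34": {"smoker": 30, "non-smoker": 70},
    "35+": {"smoker": 20, "non-smoker": 180},
}


def row_percentages(table):
    """Each cell as a percent of its row total."""
    result = {}
    for row, cells in table.items():
        row_total = sum(cells.values())
        result[row] = {col: 100.0 * n / row_total for col, n in cells.items()}
    return result


def column_percentages(table):
    """Each cell as a percent of its column total."""
    col_totals = {}
    for cells in table.values():
        for col, n in cells.items():
            col_totals[col] = col_totals.get(col, 0) + n
    return {
        row: {col: 100.0 * n / col_totals[col] for col, n in cells.items()}
        for row, cells in table.items()
    }


def small_cells(table, minimum=5):
    """List (row, column) pairs whose count falls below the minimum."""
    return [
        (row, col)
        for row, cells in table.items()
        for col, n in cells.items()
        if n < minimum
    ]


print(row_percentages(counts)["18-34"]["smoker"])     # 30.0: 30% of 18-34 year olds smoke
print(column_percentages(counts)["18-34"]["smoker"])  # 60.0: 60% of smokers are 18-34
print(small_cells(counts))                            # []
```

The same cell produces two very different statements depending on the denominator, which is why the framing of the result needs to match the percentage that was calculated.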
What should I think about when presenting trend data or comparisons between groups?
It is important to make sure that you are presenting comparable data, whether between groups or over time. You may need to adjust for differences in population size or age distributions, or account for other changes over time. Some strategies to ensure that your data is comparable include:
Use rates, percentages or per capita to ensure comparability across jurisdictions that might vary by size or over time.
- For example, instead of the number of births per year in your county, present it as the birth rate, or births per 1,000 population.
- A second example is low birthweight. Do not just list the number of births that are low birthweight in a given year, but rather create a percent of total births. This allows you to compare to jurisdictions of other sizes and/or over time.
To account for financial differences over time, consider inflation-adjusting any dollar amounts you present. There are inflation-adjustment online calculators that make it easy to adjust your data. Make sure to list the year to which you are adjusting in your notes.
Similarly, you should consider adjusting for potential population changes over time by creating per capita estimates or rates.
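The two adjustments above (inflation and population size) can be combined in a short Python sketch. All numbers here are hypothetical, including the price index values; in practice the index values would come from a published price index such as the Consumer Price Index, or from an online inflation calculator.

```python
def inflation_adjust(amount, index_original_year, index_target_year):
    """Convert a dollar amount into target-year dollars using a price index."""
    return amount * (index_target_year / index_original_year)


def per_capita(amount, population):
    """Dollar amount divided by the population it covers."""
    return amount / population


# Hypothetical example: $500,000 spent in an earlier year,
# with made-up index values of 200 (earlier year) and 250 (target year).
adjusted = inflation_adjust(500_000, index_original_year=200, index_target_year=250)
print(adjusted)                      # 625000.0 in target-year dollars
print(per_capita(adjusted, 25_000))  # 25.0 dollars per person
```

Whatever source is used for the index, the target year should be recorded alongside the adjusted figures, as noted above.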
Death data is frequently age-adjusted to account for differences in age distributions between different jurisdictions. Florida has a very large elderly population relative to younger age groups, so if one were to strictly look at their death rate per population each year, it would look like Florida is a bad place to live compared to other states with a younger population. By age-adjusting those rates (commonly provided at the national, state, and local levels), you remove the age-related bias that can occur.
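One common way to age-adjust is direct standardization: calculate the death rate within each age group, then weight those rates by a shared standard population. The sketch below assumes that approach, which may differ from how a given data provider adjusts its rates. The two counties, their counts, and the standard weights are all hypothetical.

```python
def crude_rate(deaths, population, per=1000):
    """Total deaths divided by total population, scaled per 1,000."""
    return per * sum(deaths.values()) / sum(population.values())


def age_adjusted_rate(deaths, population, standard_weights, per=1000):
    """Directly standardized rate: age-specific rates weighted by a standard population."""
    return per * sum(
        weight * deaths[group] / population[group]
        for group, weight in standard_weights.items()
    )


# Hypothetical standard population shares (must sum to 1).
standard = {"0-64": 0.85, "65+": 0.15}

# Two hypothetical counties with identical age-specific death rates
# (2 per 1,000 under 65; 40 per 1,000 at 65+) but different age mixes.
older_pop = {"0-64": 40_000, "65+": 60_000}
older_deaths = {"0-64": 80, "65+": 2_400}
younger_pop = {"0-64": 80_000, "65+": 20_000}
younger_deaths = {"0-64": 160, "65+": 800}

print(round(crude_rate(older_deaths, older_pop), 1))      # 24.8
print(round(crude_rate(younger_deaths, younger_pop), 1))  # 9.6
print(round(age_adjusted_rate(older_deaths, older_pop, standard), 1))      # 7.7
print(round(age_adjusted_rate(younger_deaths, younger_pop, standard), 1))  # 7.7
```

The crude rates make the older county look far worse, but once both are weighted to the same age distribution the rates are identical, which is the age-related bias that adjustment removes.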
How do I know if a health condition or issue is serious in my county or jurisdiction?
Magnitude, severity and disparities are the three main factors to consider when examining health issues in your community.
- Magnitude: How many people are affected by this health issue? Does it occur more in your county compared to rates in Minnesota, other counties, or nationally? How do your rates compare to targets set in Healthy People 2020 or other benchmarked indicators?
- Severity: Some conditions are not as common, but their severity or complications are so important that even conditions with low rates warrant intervention. For example, you may not have many children with elevated blood lead levels in your community, but if some have very high levels, this might make it a higher-priority issue given the severity of the short- and long-term effects of lead exposure. Similarly, one case of tuberculosis (TB) can warrant a major response and use of resources.
- Disparities: While your overall rates of a specific condition or issue may be low, are some groups within your community disproportionately affected by these conditions? Are those rates so high that reducing those disparities merits a focus within your overall work plan?
How do I use qualitative information?
Qualitative data is defined as data that approximates or characterizes, but does not measure, the attributes, characteristics, properties, etc. of an issue or topic. Common sources of qualitative data include:
- Interviews
- Focus groups
- Document reviews
- Organizational chart reviews
You should still consider the source and quality of the qualitative information you are using. For document reviews, how did the reviewers choose what to examine? What were their inclusion/exclusion criteria?
For key informant interviews or focus groups:
- How were the questions developed? Were they taken from existing instruments or written for the purposes of a specific project?
- How were the respondents selected? How many people were involved?
- When were the interviews/focus groups conducted and by whom?
- Do you have a summary of findings or were there enough respondents that themes could be generated from the individual responses?
- Which population groups were represented among your respondents?
Formal qualitative research is not the only source of valid qualitative data, but it is important to have a sense as to who provided the information, when and in what ways.
How do I put it all together? (Considering the entire process)
Thinking through potential data sources, their quality, the extent to which a health condition or issue is important in your community… all of this is a process. The above is designed to help you through that process, but ideally you will not be doing it alone. Additional factors to consider include:
- Community context: Are there particular issues that have become important in your community? Are there political factors that will make a topic difficult to address at this time? Conversely, is there political will around a topic (or topics) that might make this a good opportunity to tackle it?
- Community engagement: How do you know the above? Taking the time to connect with a variety of stakeholders will improve your process and final product. Information gathered at sessions can augment the quantitative data you use for decision-making. Engaging stakeholders early in your process is important to authentically hearing their perspectives and insights, as well as helping them feel a part of the process.
- Community resources: Are there additional resources available for specific topics that you could leverage? Your community stakeholders may also be aware of resources or other groups that could help you work on a specific topic or issue.