Statistical Reporting 101


Pch 594 Guidelines Statistical Reporting for the Special Project

Suggested Citation: Jung, BC (2006 - 2017). Statistical Reporting 101.


I am writing this additional set of guidelines for PCH 594 - Special Project Seminar II because I have found, over the years, that statistical reporting is one area that is not done very well. It is also the area on which I have spent the most advisement time. Therefore, my hope in writing these guidelines is to help you better understand why statistical reporting is important, and why it can and should be done well, both while you are in school and when you are working in the field of Public Health.

Of course, I cannot cover everything, but I do provide enough detail so that you can submit a presentable Section 4 for your final Special Project Report. Basically, this documentation deals with how to report data collected with the use of surveys, telephone interviews and focus groups. These are the most common methods used for the Special Project's data collection activities.

Introduction to Statistical Reporting

The purpose of reporting the results of your data collection is to share your experiences and your findings so future researchers can use them as resources for their own research endeavors. You should always provide the context your reader will need to truly understand what your data collection and analysis were all about.

Briefly, your readers should only have to look at your appendices with your data collection instruments and documentation, and the "Results" section of your Report's Section 4 to know what you did. If you lose your readers at this point, you can forget about having them bother to read the rest of the report you have slaved over during many sleepless weeks. In essence, your statistical presentation has to be perfect for the serious reader (like me).

The instruments and documentation should be self-explanatory so that any data analyst, regardless of what statistical software s/he uses, would be able to analyze the data and make statistical sense of the data. This should also allow the data analyst to compare the results with those that were collected by others with different populations, using the same instrument.

Basics of Data Collecting

Collecting Data

  • You should know what the scale and data types are (i.e., nominal, ordinal, interval, ratio) and what kinds of data analyses you can perform for each type.
  • Pertaining to survey data collection, you should always collect some demographic data that will allow you to describe your survey population.
  • When collecting numeric or measurement variables (e.g., age, cholesterol level, blood count, etc.), collect them in numeric form rather than in categories so you can calculate means.
  • When using categories, use conventional categories that are used in the research literature so you can compare your findings to those in the literature.

Developing Data Shells

Before you even start to analyze the data, you should develop templates (shells) for your statistical tables. These shells will help you to organize your statistical analysis activities so you don't miss anything. Creating these shells will provide you with the freedom to organize the tables in a logical way, without being distracted by the data, and will enhance your statistical reporting. Once you have completed your analysis, you can just plug in the numbers.

You can also develop data shells for the reporting of qualitative data. This would include tables that would have at least 2 columns, in which you would be recording the actual statements, or, short answers in one column, and the category you are grouping the statement under in the second column.
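As a rough illustration of a table shell, the Python sketch below builds an empty version of a demographic table before any analysis and plugs in the numbers afterward. The function names (`make_shell`, `render`), the `--` placeholder, and the row and column labels are assumptions made for this example; any tool that lets you lay out an empty table would serve the same purpose.

```python
def make_shell(row_labels, column_labels):
    """Return an empty table shell: every cell starts as None."""
    return {row: {col: None for col in column_labels} for row in row_labels}


def render(shell, placeholder="--"):
    """Format the shell as tab-separated lines, marking empty cells."""
    columns = list(next(iter(shell.values())))
    lines = ["\t".join(["Variable"] + columns)]
    for row, cells in shell.items():
        values = [placeholder if cells[c] is None else str(cells[c])
                  for c in columns]
        lines.append("\t".join([row] + values))
    return lines


# Shell designed before the analysis (labels are illustrative).
shell = make_shell(["Female", "Male", "Total"], ["Total", "Percent"])
empty_view = render(shell)

# Once the analysis is done, plug in the numbers.
shell["Female"]["Total"] = 48
shell["Female"]["Percent"] = "48%"
filled_view = render(shell)
```

Cells that still show the placeholder after analysis are a quick signal that something was missed.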

Basics of Codebooks


  • When preparing for data entry, precoding can be done when your survey questions are structured, and there are preset choices from which to choose. Basically, you are operationalizing your variables, or defining what kind of data you are collecting with your variables.
  • A Codebook is the collection of these definitions, which standardizes data entry. Anyone performing the data entry will have the same understanding of what the variables mean, and what kinds of data are being collected.
  • Precoding can be done in Epi Info with the development of a Check file. The Check file will suffice as a Codebook for the required documentation. Here is an example of an Epi Info Check file field:
    • Female
    • Male
    • Unknown

  • If you are not using Epi Info, then you must create a Codebook listing all your variables and the data you have collected for each variable. If you are using a statistical program that can only analyze numeric data, you must define what the numeric codes mean in the Codebook. For example:
    • 1 (Female)
    • 2 (Male)
    • 0 (Unknown)

  • An advantage of using Epi Info over other statistical programs is that you can analyze alphanumeric (character, string) data. Many statistical programs can only process numeric data. In such cases, without a Codebook, it is virtually impossible to know what 1, 2, 3, etc. mean for any particular variable by looking at a file's structure. (Just imagine how interesting hieroglyphics would be if the ancient Egyptians had taken the time to develop and leave behind some codebooks in the pyramids!!) Additionally, when reporting, you would have to convert your numeric codes to character data anyway.
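To make the idea of a Codebook for numeric codes concrete, here is a minimal Python sketch that stores the gender codes listed above and converts numeric entries back to labels for reporting. The dictionary layout, the `decode` helper, and the sample entries are assumptions for illustration only; they are not an Epi Info feature.

```python
# Hypothetical codebook: variable name -> {numeric code: label}.
CODEBOOK = {
    "gender": {1: "Female", 2: "Male", 0: "Unknown"},
}


def decode(variable, code, codebook=CODEBOOK):
    """Translate a numeric code into its label; fail loudly on undefined codes."""
    labels = codebook[variable]
    if code not in labels:
        raise ValueError(f"Code {code!r} is not defined for {variable!r}")
    return labels[code]


raw_gender = [1, 2, 2, 0, 1]  # made-up data entries for illustration
decoded = [decode("gender", code) for code in raw_gender]
```

Raising an error on an undefined code, rather than guessing, helps catch data entry mistakes before they reach your tables.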

For each data collection instrument, include a Codebook in the Appendix.


  • When collecting qualitative responses (statements), you have to add some extra steps in developing your Codebook.
  • You would first record the statements as reported in a word processing document. Then you would develop codelists in which you show how you categorized all the responses. These categories would be a derived variable that you would then analyze. These codelists would then be compiled into a Codebook.
  • The Codebook would list all the derived variables and all the categories you have developed for each derived variable.

For each data collection instrument, include a Codebook in the Appendix.


  • After your initial precoding for structured answers and postcoding for qualitative open-ended answers (i.e., "Other" write-in responses), you may need to do some additional recoding. This can take place any time during analysis when differences start showing up and you are trying to find an explanation for those differences.
  • This step involves developing derived variables and then recoding the derived variables with a meaningful and manageable number of categories. These variables are usually created solely for purposes of data analysis. They are not data you have collected directly, but data organized in ways to answer specific questions that have come up while you were looking at the data.
  • When working with qualitative data - written statements - you should document how you developed the categories in a "Data Analysis Protocol." This is an explanation (rationale) of how you analyzed the statements you have collected. For example, if you decide to group statements into a few major categories, you should provide an explanation of how or why you are using such categories. Analysis of qualitative data can be very subjective, so you need to let the reader know how you have operationalized the categories you are using to quantify the raw data. Categorizing qualitative data will allow you to quantify your data so you can provide some descriptive statistics.
  • Recoding includes:
    • Developing categories for "Other" write-in responses,
    • Developing categories for continuous variables (e.g., 16 years to "10-19 Age group"),
    • Collapsing categorical variables with too many categories into derived variables with broader and fewer categories,
    • Having categories that are mutually exclusive (e.g., 10-19, 20-29, 30-39, etc.),
    • Having a meaningful number of categories (e.g., 4 or 5 age groups),
    • Showing the "n-size" - number of respondent cases. For example, tables for demographic data should include an "N" for the total number of people surveyed, and each gender should include an "n" so the reader would know how many people actually responded to the question asked.
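The recoding of a continuous variable into mutually exclusive categories can be sketched as follows. This is only one possible approach: the `age_group` function, the 10-year width, the cutoffs, and the sample ages are all assumptions for illustration, and the sketch assumes ages are recorded in whole years.

```python
from collections import Counter


def age_group(age, width=10, lowest=10, highest=59):
    """Recode a whole-number age into a mutually exclusive group such as '10-19'."""
    if age < lowest:
        return f"Under {lowest}"
    if age > highest:
        return f"{highest + 1}+"
    start = lowest + ((age - lowest) // width) * width
    return f"{start}-{start + width - 1}"


ages = [16, 23, 35, 19, 42, 61]           # made-up ages for illustration
groups = [age_group(a) for a in ages]      # the derived variable
counts = Counter(groups)                   # n for each age group
total_n = len(ages)                        # N for the table
```

Because the original ages were collected in numeric form, you can still calculate a mean age and also change the grouping later. The reverse is not possible if only categories were collected.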

For each data collection instrument, include a Codebook in the Appendix.

Basics of Data Entry

Depending on what kinds of data you collect, you will have to enter the data into some program and process them so you can then analyze them.

  • Word Processor Text Files. You can enter qualitative statements into a word processor file. You can also do this with any write-in answers for such choices as "Other" or "Please list" or "Please describe," etc.
  • Spreadsheets. You can enter numeric data into spreadsheet files. If you plan only to report frequencies and means, this format is all right.
  • Database Files. You can enter numeric and string responses into database files. If you plan only to report frequencies, this format is all right. Such programs, however, are not very good for advanced statistical analyses. And, modifying database structures can be a real nightmare.
  • Epi Info. I recommend using Epi Info for survey data. Actually, I recommend Epi Info for just about any kind of data management you have in mind. It's that flexible. The only caveat is to stick with Epi Info 6 for now. Epi 2000, the Windows-based version, still requires lots of debugging. However, if you like Epi Info, you can try out EpiData, a parallel Windows development product that stays with the original intent of Epi Info - simplicity.

    Finally, with Epi Info, you can develop data entry screens that look like your data collection forms, edit in spreadsheet format, modify the database structure at will without losing data, analyze numeric and string data, perform a whole range of statistics, and develop some charts as well.

Basics of Data Processing

While data processing can be considered by some to be data analysis, I am making a distinction. Raw data that are unusable without some intermediate steps to make them "analyzable" remain raw. Thus, data processing includes those procedures in which raw data are redefined, or categorized with the use of derived variables.

For example, qualitative statements make for colorful anecdotal narratives, but they are practically useless (qualitative researchers, don't kill me yet) without some categorization (processing). Categorization facilitates the generating of descriptive statistics. Examples include:

Statement                       Category
I would definitely read this    Positive
Suitable for teenagers          Neutral
Not very useful                 Negative

Country      Continent
China        Asia
Venezuela    South America

In this instance, because of the variety in responses, you may decide to go back and recode the original choices using the derived variable, Continent, as this would reduce the number of categories (in this case, countries) you would have to analyze, during data analysis, and may be just as useful as having all the countries' names.
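The country-to-continent recoding described above can be sketched in Python as follows. The codelist contains only the two countries from the example, and the `recode_country` helper, the "Uncoded" label, and the sample responses are assumptions for illustration.

```python
from collections import Counter

# Codelist for the derived variable Continent (illustrative, not complete).
CONTINENT = {"China": "Asia", "Venezuela": "South America"}


def recode_country(country, codelist=CONTINENT):
    """Map a country to its continent; flag anything not yet in the codelist."""
    return codelist.get(country.strip(), "Uncoded")


responses = ["China", "Venezuela", "China ", "Canada"]  # made-up responses
continents = [recode_country(c) for c in responses]
frequencies = Counter(continents)
```

Any response that comes back as "Uncoded" tells you the codelist, and therefore the Codebook, needs another entry before the frequencies can be reported.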

Include preliminary data processing of qualitative statements and "Other" responses in the Appendix.
Include the analyses of derived variables in Section 4's "Results" section, using APA format.

Basics of Data Analysis

Most of the data collected for developing and evaluating your Product Prototype will most likely fall into one of the following:

  • Multiple choice responses (nominal data, yes/no data),
  • Rating or ranking responses with the use of a scale (Likert-type),
  • Written statements or short answers.

Therefore, the MINIMUM data analyses I expect to see include:

  • Category data description - Frequency and percentage tables
    • Example 1: Demographic Variables
      Demographic Variable   Total (N = 100)   Percent   Mean Age (Std Dev)
      Female                 48                48%       28.9 yrs (± 15.0)
      Male                   52                52%       31.8 yrs (± 19.6)
      Total                  100               100%      31.1 yrs (± 18.8)

      Note: This is how you should set up your tables to report demographic data - with totals adding up to 100%. If there are unknowns, they should be included, so everyone is accounted for.

    • Example 2: Content Variables
      Choice Variable (N = 20)   Total               Percent
      Font Choice (n = 20)       Choice A (n = 10)   50%
                                 Choice B (n = 5)    25%
                                 Choice C (n = 5)    25%
      Graphic Choice (n = 20)    Choice A (n = 2)    10%
                                 Choice B (n = 8)    40%
                                 Choice C (n = 10)   50%
      Title Choice (n = 18)      Choice A (n = 0)    0%
                                 Choice B (n = 15)   83.3%
                                 Choice C (n = 3)    16.7%

      Note: Not everyone answered every question.

    • Example 3: Derived Variables from Qualitative Responses
      Category Variable (N = 10)   Percent
      Useful (n = 5)               50%
      Not Useful (n = 3)           30%
      Don't Know (n = 2)           20%

    • Example 4: Likert Scale Survey Items
      Statements (N = 50)                           Strongly Agree (%)   Agree (%)     Neutral (%)   Disagree (%)   Strongly Disagree (%)
      The program was useful (n = 35)               10 (28.6%)           20 (57.1%)    0 (0.0%)      5 (14.3%)      0 (0.0%)
      I will be a better health educator (n = 50)   0 (0.0%)             50 (100.0%)   0 (0.0%)      0 (0.0%)       0 (0.0%)

      Note: Not everyone answered every question.
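A frequency and percentage table like the ones above can be produced with a few lines of Python. The sketch below uses the Title Choice counts from Example 2 and bases the percentages on n, the number who actually answered, rather than on N, the number surveyed. The `frequency_table` function and its argument names are assumptions for illustration.

```python
def frequency_table(counts, total_surveyed):
    """Return (rows, n), with percentages based on n, the number who answered."""
    n = sum(counts.values())
    if n > total_surveyed:
        raise ValueError("More answers than people surveyed")
    rows = []
    for choice, count in counts.items():
        percent = round(100 * count / n, 1) if n else 0.0
        rows.append((choice, count, percent))
    return rows, n


# Title Choice counts from Example 2 (N = 20 surveyed, 18 answered).
title_counts = {"Choice A": 0, "Choice B": 15, "Choice C": 3}
rows, n = frequency_table(title_counts, total_surveyed=20)

for choice, count, percent in rows:
    print(f"{choice} (n = {count}) {percent}%")
```

Reporting both N and n, as the examples do, lets the reader see at a glance how many people skipped the question.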

  • Relationships between Variables - What is Bivariate Analysis?
  • Since the data collected for the Special Project are mostly from surveys, or other methods using some sort of survey-type instrument or template, for purposes of developing and modifying your Product Prototype, the presentation of descriptive statistics, with some basic bivariate analyses is usually sufficient. You would not need to perform statistical procedures necessary to test a hypothesis, since this is not the purpose of the Special Project.

    Because you have collected categorical data, along with some demographic data, you can perform some cross-tabulations to see if any relationships exist between two variables. Cross-tabs are most useful in seeing whether or not demographics can be used to explain differences in your other variables. For example, cross-tabs may show that the choice in fonts, colors, or topics may be determined by gender, age, or even geographic location.

    This is why it is always important to collect some basic demographics, such as Age, Gender, and Geographic Location, because such characteristics could explain differences that may show up in the other variables you collect. It will also help you to better tailor your product to the various populations in your audience.

  • Relationships between Variables - Cross Tabulations
  • In addition to providing descriptive statistics, you should perform some basic statistical procedures to explore any relationships that may exist between variables, especially between your independent and your dependent variables. Think of your demographic variables as the independent variables. Does gender affect the responses to the questions you have asked? Does location affect the responses to the questions you have asked?

    • Contingency Table - (2 categorical variables) - Chi-Square analysis
      Gender   Agree (%)     Disagree (%)   Total
      Female   75 (68.2%)    35 (31.8%)     110 (51.2%)
      Male     95 (90.5%)    10 (9.5%)      105 (48.8%)
      Total    170 (79.1%)   45 (20.9%)     215 (100.0%)

      Note: Chi-Square (Yates Corrected) = 14.82; p value=0.0001
      Agreement status is statistically significantly different by gender.

      Note: Dependent Variable=Agreement Status ; Independent Variable=Gender
      Calculate percentages by row (independent variable); Compare percentages by column (dependent variable).
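For readers who want to check a 2 x 2 table by hand, the sketch below computes row percentages and the Yates-corrected Chi-Square for the counts in the contingency table above, using only the Python standard library. The function names are assumptions for illustration, and the p value shortcut shown applies only to tables with 1 degree of freedom.

```python
import math


def row_percentages(row):
    """Percentages within one row (one category of the independent variable)."""
    total = sum(row)
    return [round(100 * count / total, 1) for count in row]


def yates_chi_square(a, b, c, d):
    """Yates-corrected Chi-Square and p value for the 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    corrected = max(abs(a * d - b * c) - n / 2, 0)
    chi2 = n * corrected ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, the upper-tail p value is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


female = [75, 35]  # Agree, Disagree
male = [95, 10]    # Agree, Disagree

female_pct = row_percentages(female)
male_pct = row_percentages(male)
chi2, p_value = yates_chi_square(*female, *male)
print(f"Chi-Square (Yates Corrected) = {chi2:.2f}; p value = {p_value:.4f}")
```

Run on these counts, the calculation reproduces the Chi-Square of 14.82 and p value of 0.0001 given in the note.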

  • Relationships between Variables - Analysis of Variance (ANOVA)
    • Analysis of Variance (categorical and continuous variables) - F Ratio
    • If you treat your Likert Scale responses as linear numeric, then you can calculate means for your scales and then compare these means across the series of questions you asked. You would report this as follows:

      Statements (N = 30)                         Mean (Std Dev)
      The graphics add to the brochure (n = 10)   4.5 (± 0.4)
      The fonts add to the brochure (n = 24)      4.2 (± 0.3)
      Explanations add to the brochure (n = 30)   3.3 (± 1.27)

      Note: Strongly Agree=5, Strongly Disagree=1

      For example, the last statement seems to have received low ratings. You may want to see if gender could explain the difference in ratings.

      Gender (N = 30)   Mean (Std Dev)
      Female (n = 15)   3.9 (± 1.06)
      Male (n = 15)     2.8 (± 1.3)

      Note: Strongly Agree=5, Strongly Disagree=1
      F statistic = 6.266; p value= 0.017
      Mean ratings are statistically significantly different by gender.

      Oh, oh. Looks like the men in your audience didn't like the explanations. You may want to think about revising the explanations. Also, be cautious when interpreting the mean for the graphics statement. Even though the graphics statement had the highest mean rating, only one-third of your respondents (10 of 30) answered that question. It could be that the 20 who didn't respond simply didn't like the graphics.
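The F ratio for a comparison of group means can be approximated from the summary figures in a table, as in the Python sketch below. It assumes the reported standard deviations are sample standard deviations (computed with n - 1), and the `f_from_summary` function is a name made up for this example. Using the rounded means and standard deviations from the gender table, it gives an F ratio of about 6.45, close to but not the same as the 6.266 reported above; the gap may reflect rounding in the table, since a statistical program works from the raw ratings.

```python
def f_from_summary(groups):
    """One-way ANOVA F ratio from a list of (n, mean, std_dev) tuples."""
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / total_n
    # Between-groups variation: how far each group mean is from the grand mean.
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    # Within-groups variation: rebuilt from each group's standard deviation.
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# (n, mean, std dev) for Female and Male, taken from the gender table.
f_ratio, df1, df2 = f_from_summary([(15, 3.9, 1.06), (15, 2.8, 1.3)])
```

The p value requires the F distribution with `df1` and `df2` degrees of freedom, which a statistical program will report for you.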


Published on the Net: January 17, 2001
Updated: 12/22/2016 R126


© Copyright 1999 - 2017 Betty C. Jung All rights reserved.