SPSS On-Line Training Workshop
Valid data
General Considerations
Two Examples that did not take into account the context behind the study
These two examples ignore the context of the study. As a result, their conclusions defy common sense. In Example One, the apparent link between foot size and mathematics ability in young children is driven by age: older children have both larger feet and stronger mathematics ability. Example Two confuses the response and the predictor: as the population increased, more houses were built, which in turn attracted more storks to the city. These examples show that, without proper consideration of the context behind a study, an analysis can easily become 'Garbage In, Garbage Out'.
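The foot-size example can be illustrated with a small simulation. This is only a sketch on made-up data: the variable names, the coefficients, and the noise levels are all assumptions chosen for illustration, not values from any real study. It compares the raw correlation between foot size and mathematics score with the partial correlation after controlling for age.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# Simulated (hypothetical) children: age drives BOTH foot size and score.
rng = random.Random(0)
age = [rng.uniform(5, 12) for _ in range(500)]
foot_size = [10 + 1.2 * a + rng.gauss(0, 1) for a in age]
math_score = [5 * a + rng.gauss(0, 5) for a in age]

raw_r = pearson(foot_size, math_score)
adjusted_r = partial_corr(foot_size, math_score, age)

print(f"raw correlation:               {raw_r:.2f}")
print(f"partial correlation given age: {adjusted_r:.2f}")
```

In this simulation the raw correlation is strong, while the partial correlation is close to zero, because foot size carries no information about the score once age is known.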
Data Analysis
A valid data analysis starts with solid planning of the study.
For a survey or an observational study, where a controlled experiment cannot be conducted, one should begin by considering the appropriate measurement, the target population, the sampling technique, the sample size, the factors associated with the characteristics of interest, the design of the questionnaire, and the way the survey will be distributed and collected.
For a controlled experimental study, one should begin by considering the measurement, the potential confounding factors associated with the measurement, the intended factors for the experiment, the design of the experiment, experimental units, sample size, and possible statistical techniques for analysis based on the experimental design.
In many situations, a controlled experiment may not be possible, but a semi-experimental design may be. In these situations, the background and environmental factors are extremely important. For example, in studying the effect of different pedagogical approaches, one may not be able to randomly assign subjects to each approach. It may happen that one class has much stronger students than the other, so the effectiveness of the teaching method is confounded with the students' initial ability. If we collect information on potential confounding factors such as GPA, gender, and age, and conduct a pre-test, then an analysis such as Analysis of Covariance or Repeated Measures Analysis can be used to make a proper comparison.
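The idea behind adjusting for a pre-test can be sketched in a few lines. The scores below are hypothetical, and the calculation is only the classical covariate-adjusted difference in means (using the pooled within-group slope of post-test on pre-test); it assumes the slope is the same in both classes and does not include the significance test that a full Analysis of Covariance would provide.

```python
def mean(values):
    return sum(values) / len(values)


def within_sums(pre, post):
    """Return (Sxy, Sxx) computed from deviations around the group's own means."""
    m_pre, m_post = mean(pre), mean(post)
    sxy = sum((p - m_pre) * (q - m_post) for p, q in zip(pre, post))
    sxx = sum((p - m_pre) ** 2 for p in pre)
    return sxy, sxx


# Hypothetical scores: class A starts with much stronger students than class B.
pre_a, post_a = [70, 75, 80, 85, 90], [78, 82, 88, 92, 97]
pre_b, post_b = [50, 55, 60, 65, 70], [60, 66, 70, 76, 80]

# Pooled within-group slope of post-test on pre-test (assumes equal slopes).
sxy_a, sxx_a = within_sums(pre_a, post_a)
sxy_b, sxx_b = within_sums(pre_b, post_b)
pooled_slope = (sxy_a + sxy_b) / (sxx_a + sxx_b)

# Raw difference in post-test means versus difference adjusted for the pre-test.
raw_diff = mean(post_a) - mean(post_b)
adjusted_diff = raw_diff - pooled_slope * (mean(pre_a) - mean(pre_b))

print(f"raw difference (A - B):      {raw_diff:.2f}")
print(f"adjusted difference (A - B): {adjusted_diff:.2f}")
```

With these made-up numbers, the raw difference favors class A, but the adjusted difference changes sign: most of class A's apparent advantage reflects where its students started, not the teaching method.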
A valid data analysis must have a valid set of data.
Once data are collected, the next step is to design a proper format for data entry. Data are often entered in a way that statistical software cannot read. However advanced computer technology may be, a data value is still either numeric or non-numeric. Numeric values can be quantified, while non-numeric values can, in most cases, only be summarized. It is important to create proper data values so that statistical software can perform the analysis.
After data entry comes the data cleaning and manipulation stage. Some data points may have been entered completely out of range. A quick way to locate these out-of-range values is to run frequency or descriptive procedures and check the output to see whether any variable has such a problem.
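The same check can be sketched outside SPSS. The records, the valid codes, and the valid age range below are all hypothetical; the point is only that a frequency table exposes undefined category codes and that the minimum and maximum expose impossible numeric values.

```python
from collections import Counter

# Hypothetical survey records; gender is coded 1 or 2, age should be 18 to 99.
records = [
    {"id": 1, "gender": 1, "age": 34},
    {"id": 2, "gender": 2, "age": 29},
    {"id": 3, "gender": 3, "age": 41},   # gender code 3 is not defined
    {"id": 4, "gender": 1, "age": 230},  # probably a typing error
    {"id": 5, "gender": 2, "age": 52},
]
valid_gender = {1, 2}
age_min, age_max = 18, 99

# Frequency table for the categorical variable.
gender_freq = Counter(r["gender"] for r in records)
print("gender frequencies:", dict(gender_freq))

# Simple descriptives for the numeric variable.
ages = [r["age"] for r in records]
print("age min/max:", min(ages), max(ages))

# List the ids of records that need to be checked against the original forms.
flagged = [
    r["id"]
    for r in records
    if r["gender"] not in valid_gender or not (age_min <= r["age"] <= age_max)
]
print("records to check:", flagged)
```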
Data transformation is often needed before a valid analysis can be performed.
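As one illustration, a logarithmic transformation is a common choice for strongly right-skewed data. The values below are hypothetical, and the log is only one of several possible transformations; the sketch simply compares a moment-based skewness measure before and after transforming.

```python
import math


def skewness(values):
    """Moment-based sample skewness: m3 / m2 ** 1.5."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed measurements (all positive, so the log is defined).
raw = [12, 15, 18, 20, 22, 25, 30, 45, 80, 250]
logged = [math.log(v) for v in raw]

skew_raw = skewness(raw)
skew_log = skewness(logged)

print(f"skewness before transformation: {skew_raw:.2f}")
print(f"skewness after log transform:   {skew_log:.2f}")
```

The transformed values are still somewhat skewed here, but much less so, which is often enough to bring the data closer to what a procedure assumes.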
Appropriate Statistical Procedures are the key to a correct analysis.
Almost every statistical procedure has assumptions behind it, and it is necessary to check carefully whether the data violate them. A minor violation usually does not create serious problems. However, if there is a serious violation, it may be necessary to transform the data or to select a different statistical procedure.
The appropriate statistical procedure often depends on the type of data. Categorical data need to be analyzed using procedures developed for categorical data. We do not use frequency analysis or crosstabulation procedures to analyze continuous data. More detailed discussion is given in the Data Type and Possible Analysis Section.
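The distinction can be sketched with two small, hypothetical data sets: a crosstabulation with a Pearson chi-square statistic for two categorical variables, and a mean and standard deviation for a continuous variable. The variable names and values are assumptions made up for this illustration.

```python
import statistics
from collections import Counter

# Hypothetical categorical data: (gender, answer) pairs.
pairs = (
    [("M", "yes")] * 20 + [("M", "no")] * 10
    + [("F", "yes")] * 10 + [("F", "no")] * 20
)

# Crosstabulation: observed counts per cell, plus row and column totals.
observed = Counter(pairs)
row_totals = Counter(g for g, _ in pairs)
col_totals = Counter(a for _, a in pairs)
n = len(pairs)

# Pearson chi-square statistic: sum of (observed - expected)^2 / expected.
chi2 = 0.0
for g in row_totals:
    for a in col_totals:
        expected = row_totals[g] * col_totals[a] / n
        chi2 += (observed[(g, a)] - expected) ** 2 / expected
print(f"chi-square statistic: {chi2:.3f}")

# Hypothetical continuous data: summarize with mean and standard deviation.
heights_cm = [158.2, 162.5, 165.0, 170.3, 171.1, 174.8, 180.4]
print(f"mean height: {statistics.mean(heights_cm):.1f}")
print(f"std dev:     {statistics.stdev(heights_cm):.1f}")
```

A frequency table of the heights would show almost every value occurring once, which is why counting procedures are not informative for continuous data.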
In data analysis, one often needs to conduct several analyses before an appropriate one is selected. Analysis is rarely a one-step process: it usually involves going back and forth among analyses and decisions before a proper analysis is reached.
Appropriate analysis needs correct interpretation of the results.
Interpreting and summarizing the results from a large pile of output is a crucial step in a valid data analysis. It requires an understanding of the project and of the statistical techniques, and the ability to bring the numbers into the context of the project.
One must make sure that the output is interpreted and summarized in a way that non-statisticians can understand.
Generally speaking, statistical techniques are chosen based on the type of data. The Data Type and Analysis page provides some details regarding different types of data and possible statistical techniques for analysis.
The bottom line:
If you are not familiar with any step described above, seek statistical consulting help.
This online SPSS Training Workshop was developed by Dr Carl Lee, Dr Felix Famoye, and student assistants Barbara Shelden and Albert Brown, Department of Mathematics, Central Michigan University. All rights reserved.