Categorical data
Contingency Table
A contingency table is a way of summarising the relationship between variables, each of which can take only a small number of values. It is a table of frequencies classified according to the values of the variables in question. When a population is classified according to two variables it is said to have been 'cross-classified' or subjected to a two-way classification. Higher classifications are also possible. A contingency table is used to summarise categorical data. It may be enhanced by including the percentages that fall into each category. What you find in the rows of a contingency table is contingent upon (dependent upon) what you find in the columns.

Confidence Interval for a Proportion
A confidence interval gives us some idea of the range of values which an unknown population parameter (such as the mean or variance) is likely to take based on a given set of sample data. Sometimes we are interested in the proportion of responses that fall into one of two categories. For example, a firm may wish to know what proportion of its customers pay by credit card as opposed to those who pay by cash; the manager of a TV station may wish to know what percentage of households in a certain town have more than one TV set; a doctor may be interested in the proportion of patients who benefited from a new drug as opposed to those who didn't, etc. A confidence interval for a proportion would specify a range of values within which the true population proportion may lie, for such examples. The procedure for obtaining such an interval is based on the proportion, p, of a sample from the overall population.

Confidence Interval for the Difference Between Two Proportions
A confidence interval gives us some idea of the range of values which an unknown population parameter (such as the mean or variance) is likely to take based on a given set of sample data. Many occasions arise where we have to compare the proportions of two different populations.
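Both kinds of proportion interval can be sketched numerically. The glossary does not state a formula, so the code below assumes the common normal-approximation (Wald) interval, which is only one of several possible procedures. The function names and the counts in the usage note are hypothetical, chosen purely for illustration.

```python
import math
from statistics import NormalDist


def proportion_ci(successes, n, level=0.95):
    """Wald (normal-approximation) interval for one population proportion.

    Assumption: the sample is large enough for the normal approximation.
    """
    p = successes / n                              # sample proportion
    z = NormalDist().inv_cdf(0.5 + level / 2)      # two-sided critical value
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width


def difference_ci(x1, n1, x2, n2, level=0.95):
    """Wald interval for the difference p1 - p2 of two population proportions.

    Assumption: the two samples are independent and both are large.
    """
    p1, p2 = x1 / n1, x2 / n2                      # the two sample proportions
    z = NormalDist().inv_cdf(0.5 + level / 2)
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se
```

With made-up data, proportion_ci(60, 200) gives an interval around the sample proportion 0.30, and difference_ci(30, 200, 20, 200) gives an interval around the sample difference 0.05. If the second interval contains zero, the data are compatible with the two population proportions being equal.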
For example, a firm may want to compare the proportions of defective items produced by different machines; medical researchers may want to compare the proportions of men and women who suffer heart attacks, etc. A confidence interval for the difference between two proportions would specify a range of values within which the difference between the two true population proportions may lie, for such examples. The procedure for obtaining such an interval is based on the sample proportions, p1 and p2, from their respective overall populations.

Expected Frequencies
In contingency table problems, the expected frequencies are the frequencies that you would predict ('expect') in each cell of the table, if you knew only the row and column totals, and if you assumed that the variables under comparison were independent. See also contingency table.

Observed Frequencies
In contingency table problems, the observed frequencies are the frequencies actually obtained in each cell of the table, from our random sample. When conducting a chi-squared test, the term observed frequencies is used to describe the actual data in the contingency table. Observed frequencies are compared with the expected frequencies, and differences between them suggest that the model expressed by the expected frequencies does not describe the data well. See also contingency table.

Chi-Squared Goodness of Fit Test
The Chi-Squared Goodness of Fit Test is a test for comparing a theoretical distribution, such as a Normal, Poisson, etc., with the observed data from a sample.

Chi-Squared Test of Association
The Chi-Squared Test of Association allows the comparison of two attributes in a sample of data to determine if there is any relationship between them. The idea behind this test is to compare the observed frequencies with the frequencies that would be expected if the null hypothesis of no association / statistical independence were true.
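The comparison of observed with expected frequencies can be illustrated with a short sketch. It assumes the usual Pearson form of the statistic, the sum over all cells of (observed - expected)^2 / expected, with each expected frequency taken as row total times column total divided by the grand total. The function names and the 2 x 2 table of counts are hypothetical.

```python
def expected_frequencies(observed):
    """Expected cell counts under independence, from row and column totals only."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_squared_statistic(observed):
    """Pearson chi-squared statistic: sum of (O - E)^2 / E over every cell."""
    expected = expected_frequencies(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Made-up counts: rows are one attribute, columns the other.
observed = [[30, 20],
            [20, 30]]

print(expected_frequencies(observed))   # every cell is 25.0
print(chi_squared_statistic(observed))  # 4.0
```

For a table with r rows and c columns the statistic is referred to a chi-squared distribution with (r - 1)(c - 1) degrees of freedom, here 1. The 5% critical value for 1 degree of freedom is about 3.84, so a statistic of 4.0 would count as too large at that level.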
By assuming the variables are independent, we can also predict an expected frequency for each cell in the contingency table. If the value of the test statistic for the chi-squared test of association is too large, it indicates a poor agreement between the observed and expected frequencies and the null hypothesis of independence / no association is rejected.

Chi-Squared Test of Homogeneity
