# The Religious Typology

## Appendix A: About the religious typology

The religious typology divides the public into seven groups based on their answers to 16 questions that measure their religious and spiritual beliefs, their engagement with their faith, and the religious and nonreligious sources of meaning and fulfillment in their lives. The typology groups are created using cluster analysis, a statistical technique that identifies homogeneous groups of respondents based on their answers to the 16 questions used in the analysis.

The tables on the following pages show the distribution of responses on each variable used to build the clusters. Each of these variables was rescaled so that its values ranged from 0 to 1. This ensures that the clustering algorithm treats all variables as equally important. For variables with only two response options, the values were coded as 0 and 1. For questions whose response options form an ordered scale, the highest value was set to 1, and the lowest to 0. The remaining answer choices were distributed evenly between 0 and 1. For example, the question measuring views of the impact of churches and religious organizations on society has three categories: “They do more harm than good,” “They do more good than harm,” and “They don’t make much difference.” For this analysis, these were recoded as 0, 1 and 0.5, respectively.
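The recoding rule for ordered scales can be illustrated with a short Python sketch. The function name `rescale_ordered` and the dictionary layout are hypothetical, chosen only for illustration; the ordering of the three categories follows the recoding given in the example above.

```python
def rescale_ordered(categories):
    """Map response options, listed from lowest to highest, to evenly
    spaced values between 0 and 1."""
    n = len(categories)
    if n < 2:
        raise ValueError("need at least two response options")
    return {cat: i / (n - 1) for i, cat in enumerate(categories)}


# Options ordered from lowest (0) to highest (1) on the scale.
impact_scale = rescale_ordered([
    "They do more harm than good",
    "They don't make much difference",
    "They do more good than harm",
])

# A two-option question reduces to the 0/1 coding.
binary_scale = rescale_ordered(["No", "Yes"])
```

With three options the middle category lands at 0.5, and a four-option scale would be spaced at 0, 1/3, 2/3 and 1.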

The type of cluster analysis used in this study is called K-means clustering. In K-means clustering, the researcher first decides how many clusters should be created. The algorithm then assigns each individual to exactly one cluster, where each cluster is constructed to contain individuals who are similar to each other but dissimilar to members of the remaining clusters.^{11}
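One common way to make "similar to each other" precise is the within-cluster sum of squares that K-means seeks to minimize. The formulation below assumes squared Euclidean distance on the rescaled variables; the report does not state the distance measure, so this is the textbook version rather than a confirmed detail of the analysis, and the symbols are illustrative notation.

$$
\min_{C_1,\dots,C_K} \; \sum_{k=1}^{K} \sum_{i \in C_k} \lVert x_i - \mu_k \rVert^2,
\qquad
\mu_k = \frac{1}{|C_k|} \sum_{i \in C_k} x_i
$$

Here $x_i$ is the vector of rescaled answers for respondent $i$, $C_k$ is the set of respondents assigned to cluster $k$, $\mu_k$ is the mean of that cluster, and $K$ is the number of clusters chosen in advance.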

Cluster analysis is not an exact process. Different cluster solutions are possible using the same data depending on model specifications and even the order in which the cases appear in the dataset. To address the sensitivity of cluster analysis to the order in which cases are entered, each cluster model was run several thousand times with different randomly selected start points. Then the results were compared to identify the solution that produced the set of groups that were both homogeneous internally and different from one another with respect to the 16 clustering variables. In technical terms, the solution chosen for each model was the one with the lowest within-cluster sum of squared errors (i.e., the cluster solution with the lowest within-cluster variance).
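The idea of running the model from many random start points and keeping the lowest-error solution can be sketched in plain Python. This is a minimal illustration using Lloyd's algorithm with squared Euclidean distance, not the software or settings used in the study; the function names, the toy data and the default of 50 starts are all assumptions made for the example.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_once(points, k, rng, max_iter=100):
    """One K-means run (Lloyd's algorithm) from randomly chosen start points."""
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    sse = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return sse, labels, centers


def best_of_restarts(points, k, n_starts=50, seed=0):
    """Run K-means n_starts times and keep the run with the lowest SSE."""
    rng = random.Random(seed)
    runs = (kmeans_once(points, k, rng) for _ in range(n_starts))
    return min(runs, key=lambda run: run[0])


# Toy data on the 0-to-1 scale: two visibly separate groups.
points = [(0.0, 0.0), (0.0, 0.25), (0.25, 0.0),
          (1.0, 1.0), (1.0, 0.75), (0.75, 1.0)]
sse, labels, centers = best_of_restarts(points, k=2)
```

Because each run can settle on a different local optimum, comparing the sum of squared errors across runs gives a simple, objective criterion for choosing among them.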

Because the survey data were weighted to account for unequal probabilities of selection, and to correct for nonresponse, 1,000 datasets were created by randomly resampling respondents from the full set of completed interviews with probability proportionate to their weights. This effectively undoes the weighting and makes it possible to treat each dataset as a simple random sample. A cluster solution was found for each of these datasets using the procedure described above. A statistical procedure was then used to identify a final set of clusters that best summarize the patterns found across all 1,000 datasets.^{12}
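The resampling step can be sketched as follows. The sketch assumes that respondents are drawn with replacement and that each resampled dataset has the same number of cases as the original; neither detail is stated above. The names `resample_by_weight` and `make_replicates` and the toy weights are hypothetical.

```python
import random


def resample_by_weight(respondents, weights, rng):
    """Draw len(respondents) cases with replacement, each with probability
    proportional to its survey weight (assumed sampling scheme)."""
    return rng.choices(respondents, weights=weights, k=len(respondents))


def make_replicates(respondents, weights, n_replicates=1000, seed=0):
    """Build a list of resampled datasets that can each be analyzed unweighted."""
    rng = random.Random(seed)
    return [resample_by_weight(respondents, weights, rng)
            for _ in range(n_replicates)]


# Toy example: respondent "c" carries most of the weight, so it should
# appear far more often than "a" or "b" in the resampled datasets.
replicates = make_replicates(["a", "b", "c"], [1.0, 1.0, 8.0], n_replicates=200)
share_c = sum(rep.count("c") for rep in replicates) / (200 * 3)
```

Heavily weighted respondents appear more often in each resampled dataset, which is why the replicates can then be analyzed without weights.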

Models with different numbers of clusters and clustering variables were examined, and the results evaluated for their effectiveness in producing cohesive groups that were sufficiently distinct from one another, large enough in size to be analytically practical, and substantively meaningful. While each model differed somewhat from the others, all of them shared certain key features; for example, each contained at least one group of individuals who were highly religious and actively practiced their faith and one that was traditionally religious but far less engaged.

Models that produced five, six, seven and eight clusters were evaluated in depth. The seven-cluster model was found to be strongest from a statistical point of view (most consistently sorting people into the same groups over the course of 1,000 replications of the model, each of which was run on a random sample of respondents in the dataset), most persuasive from a substantive point of view, and representative of the general patterns seen across the various cluster solutions.
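One simple way to quantify how consistently two replications sort the same people into the same groups is the Rand index, which counts the share of respondent pairs that are treated the same way (together or apart) in both solutions. This is a swapped-in illustration, not the procedure used in the study, which relied on the method cited in the footnotes; the function name is hypothetical, and the comparison assumes both label lists cover the same respondents in the same order.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Share of respondent pairs on which two cluster solutions agree.

    A pair counts as agreement when both solutions put the two respondents
    in the same cluster, or both put them in different clusters. The
    measure ignores how the clusters happen to be numbered.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must have the same length")
    if len(labels_a) < 2:
        raise ValueError("need at least two respondents")
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Same grouping with the cluster numbers swapped: full agreement.
same_groups = rand_index([0, 0, 1, 1], [1, 1, 0, 0])
# A grouping that cuts across the first one: low agreement.
crossed_groups = rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

Averaging such a measure over many pairs of replications would give one rough indication of which number of clusters yields the most stable assignments.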

- For more information on K-means clustering, see MacQueen, J. 1967. “Some methods for classification and analysis of multivariate observations.” Berkeley Symposium on Mathematical Statistics and Probability. Also see James, Gareth, Daniela Witten, Trevor Hastie and Robert Tibshirani. 2013. “An Introduction to Statistical Learning with Applications in R.”
- For details on this procedure, see Stephens, Matthew. 2002. “Dealing with label switching in mixture models.” Journal of the Royal Statistical Society: Series B (Statistical Methodology).