Sampling and Questionnaires

Primary Data Gathering and Representative Sampling Related to Questionnaires

Are you interested in a factual survey - where the data presented is purely statistical and descriptive? Or are you interested in an attitude survey, where you also need to present some sense of what the respondents mean in their answers? Or are you interested in an explanatory survey, where you present information so as to prove or disprove hypotheses? Whichever it is, you have to present the data.

Let's start with quantitative data collection and sampling, as used with questionnaires. What part of the population do you want to ask your questions - all of the adults or a specific selection of the adults you are interested in? When you have the relevant population, as it is called, then you must have a sample that is reliable regarding that larger population. For example, if you are interested in women, your sample must exclude men. But more than this, you want the units in your sample to reflect that population proportionately. To get your sample, you need a sampling frame. A sampling frame is a basis by which to sample from a relevant population. An example of a sampling frame is people in the phone book. What's wrong with that? Well, not everyone - especially the poor - has a telephone. It's not even perfect for sampling telephone customers, because many people are ex-directory and others have mobile phones. With sampling frames you ask, "Who is being excluded?" You could use the electoral register. What's wrong with that? Well, not everyone is registered to vote. In fact you cannot find a perfect basis of sampling, the perfect sampling frame. So when you choose one, admit its faults. And then off you go. Your sample must reflect the relevant population you are interested in. That's because on the basis of the sample you want to apply the results to the whole of that relevant population.

So you have to sample from the sampling frame and make it representative of that relevant population. Method 1 is random sampling. This is where each representative unit has an equal chance of being in your sample. The way to do it is take your sample frame and then come up with a set of random numbers. You then apply each number to the sampling frame and each time that number hits upon a person that is the person you must question. You carry on the random number selection until your sample is large enough that the law of averages suggests that the sample is genuinely representative.
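As a rough illustration of random sampling, here is a minimal Python sketch. The sampling frame of 1,000 made-up names, the sample size of 50 and the function name simple_random_sample are all assumptions for the example, not anything prescribed by a particular survey method.

```python
import random

def simple_random_sample(sampling_frame, sample_size, seed=None):
    """Draw sample_size units so that every unit in the frame
    has an equal chance of being picked (no one is picked twice)."""
    rng = random.Random(seed)  # a seed makes the draw repeatable
    return rng.sample(sampling_frame, sample_size)

# Hypothetical sampling frame: 1,000 people listed in some register.
frame = [f"person_{i}" for i in range(1, 1001)]

sample = simple_random_sample(frame, 50, seed=1)
print(len(sample), "people selected for questioning")
```

The random numbers do the choosing, not the researcher, which is the whole point: you question whoever the numbers land on.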

Stratified random sampling is where you choose the proportions of the people that are within your relevant population. Suppose that in naturist clubs 20% are male, 70% are female and 10% transgender. Suppose naturists are made up of 10% lesbian, 15% homosexual and 75% heterosexual. Suppose naturists are made up of 80% middle class and above, 20% working class. Your sample will first of all categorise these people and make sure you have the proportions right. Then within these categories, nicely sorted out, you do random sampling. Trouble is, whilst this requires a smaller sample, how do you know you've got these categories correct?
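The idea can be sketched in Python as follows. This is only an illustration under stated assumptions: it stratifies on the gender split from the example above (20% male, 70% female, 10% transgender) and nothing else, the frame of 1,000 records is invented, and stratified_sample is a hypothetical helper name. Stratifying on sexuality and class at the same time would need the joint proportions, which the example does not give.

```python
import random
from collections import Counter, defaultdict

def stratified_sample(frame, stratum_of, sample_size, seed=None):
    """Sort the frame into strata, then sample randomly within each
    stratum in proportion to its share of the frame."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in frame:
        strata[stratum_of(unit)].append(unit)

    sample = []
    for members in strata.values():
        # Proportional allocation; rounding can shift the total by a
        # unit or two when the shares do not divide evenly.
        quota = round(sample_size * len(members) / len(frame))
        sample.extend(rng.sample(members, quota))
    return sample

# Hypothetical frame matching the 20/70/10 split in the text.
frame = (
    [{"id": i, "gender": "male"} for i in range(200)]
    + [{"id": i, "gender": "female"} for i in range(200, 900)]
    + [{"id": i, "gender": "transgender"} for i in range(900, 1000)]
)

sample = stratified_sample(frame, lambda p: p["gender"], 50, seed=1)
counts = Counter(p["gender"] for p in sample)
print(counts)
```

With a sample of 50 the allocation comes out at 10 male, 35 female and 5 transgender, whoever the individuals turn out to be.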

So you might choose quota sampling. Here you ditch the sampling frame. You have a particular test of inclusion - you want so many who are married and so many single. Stop people in the street and knock on doors, and if they pass the test then ask them. If there are too many of one kind, say sorry and find someone else. There are two problems. First, how do we know the people asked are in any way representative of the relevant population, given that the test for being asked questions is usually fairly slight? An example would be interviewing in the street or on doorsteps and getting far too many unemployed and part time employed people. The full time employed are not there, are they? Second, as soon as you ask those entry questions, and especially if you ask too many, people get suspicious of who you are and your intentions and start giving dodgy answers. Had you used a random method, you would not need such qualifying and often personal entry questions.
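The quota procedure can be sketched in Python like this. The list of passers-by, the quota of two married and two single people, and the function name quota_sample are all made up for the illustration.

```python
def quota_sample(passers_by, category_of, quotas):
    """Accept people in the order they turn up until each quota is
    full; anyone whose category is already full is turned away."""
    remaining = dict(quotas)
    accepted = []
    for person in passers_by:
        category = category_of(person)
        if remaining.get(category, 0) > 0:
            accepted.append(person)
            remaining[category] -= 1
        if not any(remaining.values()):
            break  # every quota has been filled
    return accepted

# Hypothetical people met in the street, in order of arrival.
statuses = ["single", "single", "married", "single",
            "married", "married", "single"]
street = [{"id": i, "status": s} for i, s in enumerate(statuses)]

accepted = quota_sample(street, lambda p: p["status"],
                        {"married": 2, "single": 2})
print([p["id"] for p in accepted])
```

Notice there is no random draw anywhere: who gets in depends entirely on who happens to walk past, which is exactly the first problem described above.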

A way to save time is to take a ready made sample and sample from that. This is multi-stage sampling. Polling organisations do this all the time. Indeed many surveys do this. But as you do this, the sampling error rises.

If you have a specific target audience then you can do something called snowballing. This is where you get a set of people involved in an activity who introduce you to more people, who introduce you to more still, and so on. You build up a large number of people of your specific relevant population to interview. But the relevant population must be specific for such a method.
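Snowballing can be pictured as following a chain of introductions. Here is a minimal Python sketch, assuming you can record who introduces whom; the names, the referrals table and the function snowball_sample are invented for the example.

```python
from collections import deque

def snowball_sample(referrals, seeds, max_size):
    """Start from a few known contacts and follow their introductions,
    wave by wave, until max_size people have been recruited."""
    recruited = []
    seen = set()
    queue = deque(seeds)
    while queue and len(recruited) < max_size:
        person = queue.popleft()
        if person in seen:
            continue  # already recruited through someone else
        seen.add(person)
        recruited.append(person)
        # Everyone this person can introduce joins the waiting list.
        queue.extend(referrals.get(person, []))
    return recruited

# Hypothetical record of who can introduce you to whom.
referrals = {
    "Ann": ["Bob", "Cat"],
    "Bob": ["Dan"],
    "Cat": ["Ann", "Eve"],
    "Dan": [],
    "Eve": ["Fay"],
}

recruited = snowball_sample(referrals, ["Ann"], max_size=5)
print(recruited)
```

Everyone recruited is connected to the starting contact, so the sample tells you about that network rather than about the population at large.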

Another method is to turn the whole notion of sampling on its head. Why not go for deliberately unrepresentative people? You have a hypothesis and you want to falsify it. Find a group of people most likely to meet that hypothesis. If they do not, you have broken the hypothesis and it is falsified; if they confirm the hypothesis then you can claim it is working among that surveyed group. Many random surveys of course then pick specific respondents to particular questions for further questioning - those respondents are deliberately unrepresentative.

Now in order to do a survey like this you may want insight into what questions to ask. So for this you do a preliminary case study and formulate the issues and questions. You may also carry out a pilot study on a group too small to be representative but who may indicate in advance to the researcher whether the research method is any good. Did the questions make sense to the respondents? Did people co-operate? Did it help the interviewer develop interviewing skills? Did it reveal practical problems? If you can iron out the problems then you go ahead with the bigger representative sample - however, none of the people in the pilot study can reappear in the main study.

Let's talk about questionnaires. As soon as an individual presents even a most rigorous questionnaire to a respondent, that respondent also responds to the person and not just the questions. You should always think about exactly how you look and exactly what you say throughout. Of course you could send questionnaires in the post, but you get very poor response rates and these threaten the representativeness of the data, because those who reply select themselves rather than being selected at random. Of course you can ask people within a group or setting known to you and the response rate is high, but you must not let them discuss questionnaires amongst themselves. They should answer alone.

To make a questionnaire you have to be able to translate hypotheses and concepts into working clear questions. The questions must not be leading - they must be neutral and give equal chance to every answer. The questions must also be designed so that they produce clear components of data so that even though the respondent may not realise it they are contributing to some answer which relates back to those hypotheses and concepts you started with - so that the answer means something in terms of statistical representation. This process of concept to question and then back to proof or disproof or meaning in the data records may take quite some skill in translating. How does your question represent or measure the concept you are interested in? This is where so many questionnaires can fall down. You have to decide whether open questions with flexible answers or closed questions with fixed answers serve your needs and intentions the best.

Questionnaires are useful because they cut costs and time involvement. They produce repeatable and therefore reliable data because of the uniform application of questions and the regular presentation of the data. They are useful for multivariate testing - especially with large samples that can be broken down into subgroups, such as those who gave a particular answer to a specific question. They claim to represent the relevant population.

But is the data valid? Even if you ask the same questions in all the same ways, you cannot be sure that respondents mean the same thing when they read or hear the questions. Words mean different things with different moral slants between social classes, age groups, and cultures. Suppose you ask a question regarding graffiti on walls, "Is this really bad?", and the youth says it is bad and so does the older adult. But the youth interprets bad as good and so "it's really bad" means it's good to do it. The researcher goes on to claim everyone disapproves of graffiti on walls. However, the youth carries on putting graffiti on walls because it is really bad, in other words he approves. Questionnaires impose an impersonal distance between researcher and researched, which reduces the ability to understand and interpret what the respondent meant. Another problem - the questions are what's important to the researcher, but are they what is really important to the respondent, when an interview would have given the chance for the interviewee to say what is important to him or her? If you turn a concept into a simple question, are you not creating false categories that define reality too neatly? Reality is fuzzy and needs exposition. Another problem is that what people say is not what they do, so they answer questionnaires one way and behave another. Think of driving cars! Also people have poor memories. Another problem lies with open ended questions - these need perhaps too much manipulation afterwards to fit them into neatly presented data sets.

Haralambos, M., Holborn, M., Sociology: Themes and Perspectives, London: Collins Educational, 734-740.