Please be patient with me while I wait for the full data set

September 30, 2015

Something that has come up a few times in my self-employed career is the issue of when to start analysing a data set for a quantitative survey.  Well, it is obvious to me: when the data set is complete, i.e. when all of the respondents have filled it in.

But there seem to be a couple of situations that arise which mess with this.  Either:

  • A client wanting / expecting me to ‘get started’ on analysing and reporting on a dataset before all of the responses are in.
  • A client ‘discovering’ / ‘adding in’ a bunch of extra respondents after the survey has officially closed and the analysis has started.

This isn’t ideal from an analysis / reporting perspective.  The thing is, running the analysis / reporting on the data set is quite a bit of effort, and you don’t want to have to do it twice.

Why might you have to do it twice?

OK, well here’s a little example to show what can happen if you run your analysis based on a partial data set.

Let’s imagine I’m running a survey with one question in it, and the question is:

Do you like dragons? (Yes/No)

I have asked a representative sample of 100 people to complete this survey.

When the first 50 people have completed my survey, 45 said yes they liked dragons, and 5 said no they did not like dragons.  So my analysis / report looks like this:

Do you like dragons? (Base: 50)

Yes=45 (90%)

No=5 (10%)

In total, 90% of respondents said yes, they like dragons.

Later, the final 50 people completed my survey, giving me 100 respondents in total.  Across all 100 respondents, 50 said yes they liked dragons, and 50 said no they did not like dragons.  So my analysis / report now looks like this:

Do you like dragons? (Base: 100)

Yes=50 (50%)

No=50 (50%)

In total, 50% of respondents said yes, they like dragons.

Completely different.
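For anyone who wants to see the arithmetic, here is a minimal sketch of the same example in Python.  The tabulate function is a hypothetical helper of my own, not part of any survey tool, and the split of the final 50 respondents (5 yes, 45 no) is simply what the totals above imply.

```python
from collections import Counter


def tabulate(responses):
    """Return {answer: (count, percent)}, with percentages based on all responses given."""
    base = len(responses)
    counts = Counter(responses)
    return {answer: (n, round(100 * n / base)) for answer, n in counts.items()}


# First 50 respondents: 45 yes, 5 no (the partial data set).
first_half = ["Yes"] * 45 + ["No"] * 5
# Final 50 respondents: 5 yes, 45 no (implied by the final totals of 50 and 50).
second_half = ["Yes"] * 5 + ["No"] * 45

partial = tabulate(first_half)
full = tabulate(first_half + second_half)

print("Base 50: ", partial)
print("Base 100:", full)
```

Every percentage is divided by the base, so as soon as the base changes, every figure has to be recalculated.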

So what was the value in knowing the first set of stats, where 90% of respondents said they liked dragons?

Nothing.  It meant nothing, because it was not based on the full representative sample.  Any conclusions I might have drawn from this data would simply be irrelevant.

So I just wasted my time in running the data and writing my ‘report’.

Now what if I had a much larger survey, say with thirty questions in it?  How much time would it have taken me to analyse / report on all of that?  Ages.  Then if it turned out I wasn’t basing my findings on the full data set I’d just have to do it all over again.

That’s an extreme example perhaps, but adding in new data changes the base for every calculation, because percentages need to be based on the final total, and that usually changes the figures to some degree.  And actually it is somehow all the more frustrating when you’ve run the data for the second time and you’re just amending every figure by one or two percentage points!

It is so much more efficient just to wait and do it the once.

So I wait… and I wait… until I’m absolutely sure that the survey is closed.

Please be patient with me while I wait.

 
