Structure and notation in this booklet
This booklet contains two types of content: principles and examples. The principles are universally applicable, regardless of the software you choose. While the examples use SPSS and Stata command files, they are designed to be helpful even if you are using different software. The primary goal of this booklet is not to teach you SPSS or Stata, but to provide valuable insights.
SPSS examples are shown in single frames:
GET FILE = 'c:\dokumenter\proj1\alfa.sav'.
SAVE OUTFILE = 'c:\dokumenter\proj1\alfa2.sav'.
SPSS words are shown in UPPERCASE characters, while variable information (file and variable names) is shown in lowercase characters.
Stata examples are shown in double frames:
use "c:\dokumenter\proj1\alfa.dta", clear
generate bmi=weight/(height^2)
save "c:\dokumenter\proj1\alfa2.dta" [ , replace]
All Stata text is lowercase. Stata words are shown in italics, just for clarity. Optional parts of commands are shown in light typeface in square brackets [ ].
Command files, essential for executing a series of instructions, are called "syntax files" in SPSS and have the file extension .sps. In Stata, these files are known as "do-files" and have the extension .do.
1) Juul S. SPSS for Windows 8, 9 and 10. Århus: Department of Epidemiology and Social Medicine, 2000. (Download from www.biostat.au.dk/teaching/software)
2) Juul S. Introduction to Stata 8. Århus: Department of Epidemiology and Social Medicine, 2004.
The audit trail
When keeping financial accounts, e.g. for a company or an association, there are some obvious principles to follow:
To ensure transparency in financial reporting, it must be possible to trace back from the balance sheet to the individual vouchers. Each voucher is therefore assigned a unique number, which makes it possible to identify the component amounts behind each item in the balance sheet. This is the audit trail: a path from the final results back to the original sources of information.
If you are the bookkeeper, you need this for yourself; otherwise you will have a hard time tracing errors. It is also an unconditional requirement for auditing (Danish: revision).
When conducting research, you must adhere to the guidelines of Udvalget vedrørende Videnskabelig Uredelighed (the Danish Committees on Scientific Dishonesty), which emphasize traceability. Each piece of information must be verifiable and linked back to its original source document.
• ID (case identifier) included in the original documents and in the data set
• All corrections must be documented and explained
• All modifications to the data set must be documented by command files
• A command file must document each analysis
This technique is essential for error checking and correction, as well as for documenting your actions. It also plays a crucial role in projects subject to external audits and monitoring.
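The principle that corrections are documented and applied by a command file, never by editing the data by hand, carries over to other software. The following Python fragment is only a sketch of the idea; the records, the variable names and the single correction entry are hypothetical. Each correction names the ID, the variable, the old and new value, and the reason, and the script refuses to apply a correction if the old value does not match what is actually in the data.

```python
# Hypothetical raw data keyed by ID (the case identifier also written on the paper form).
raw = {
    101: {"sex": 1, "height": 1.82, "weight": 78.0},
    102: {"sex": 2, "height": 16.5, "weight": 61.0},  # height obviously mistyped
}

# Each correction is documented: which record, which variable, old value, new value, why.
corrections = [
    {"id": 102, "var": "height", "old": 16.5, "new": 1.65,
     "reason": "decimal point misplaced; checked against questionnaire 102"},
]


def apply_corrections(data, corrections):
    """Return a corrected copy of the data plus a log; the raw data stay untouched."""
    corrected = {rec_id: dict(rec) for rec_id, rec in data.items()}
    log = []
    for c in corrections:
        current = corrected[c["id"]][c["var"]]
        if current != c["old"]:
            # The documented old value must match, otherwise the correction is suspect.
            raise ValueError(f"ID {c['id']}: expected {c['old']!r}, found {current!r}")
        corrected[c["id"]][c["var"]] = c["new"]
        log.append(f"ID {c['id']}: {c['var']} {c['old']} -> {c['new']} ({c['reason']})")
    return corrected, log


corrected, log = apply_corrections(raw, corrections)
for line in log:
    print(line)
```

Because the raw data are left unchanged and the corrections live in the script, rerunning the script always reproduces the corrected data set, and the log shows what was changed and why.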
• Protect yourself against:
  o mistakes
  o errors
  o waste of time
  o loss of information
Documentation procedures must be part of the project from the planning stage, and they should stay with you throughout the project.
Overview of the process
This booklet discusses questionnaires of several kinds: self-administered questionnaires, interviewer-administered questionnaires, and forms for recording information without direct contact, such as forms for extracting data from medical records. The term 'questionnaires' will be used for all these types of information recording forms.
When designing a questionnaire, it is crucial to give priority to the respondent by carefully crafting questions and response options and by ensuring a clear layout. It is also important to consider how the collected information will be processed, but this must not complicate the questionnaire for the respondent.
Processing of questionnaires before data entry
Questionnaires should be labelled with a unique number (an ID).
A codebook lists the name, meaning, and coding of each variable. Textual data should be coded before entry rather than entered verbatim. For numerical data, do not perform calculations before entry; the computer does them faster and more reliably. In particular, always enter dates as recorded and do not calculate ages before data entry.
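A codebook can also be kept in machine-readable form and used to check the entered data. The following is a minimal Python sketch under stated assumptions: the variable names, codes, ranges and records are all hypothetical, and the check covers only unique IDs, valid codes and plausible ranges.

```python
from collections import Counter

# Hypothetical codebook: variable name -> meaning, plus the codes or range allowed.
codebook = {
    "sex": {"label": "Sex of respondent", "codes": {1: "male", 2: "female"}},
    "born": {"label": "Year of birth", "range": (1900, 2005)},
    "children": {"label": "Number of children", "range": (0, 20)},
}

# Hypothetical entered records; "id" is the number written on the questionnaire.
records = [
    {"id": 1, "sex": 1, "born": 1961, "children": 2},
    {"id": 2, "sex": 3, "born": 1975, "children": 0},  # sex=3 is not in the codebook
    {"id": 2, "sex": 2, "born": 1958, "children": 1},  # ID 2 is used twice
]


def check_records(records, codebook):
    """Return a list of problems: duplicate IDs, unknown codes, out-of-range values."""
    problems = []
    id_counts = Counter(r["id"] for r in records)
    for rec_id, n in id_counts.items():
        if n > 1:
            problems.append(f"ID {rec_id} occurs {n} times")
    for r in records:
        for var, spec in codebook.items():
            value = r[var]
            if "codes" in spec and value not in spec["codes"]:
                problems.append(f"ID {r['id']}: {var}={value} is not a valid code")
            if "range" in spec:
                low, high = spec["range"]
                if not low <= value <= high:
                    problems.append(f"ID {r['id']}: {var}={value} outside {low}-{high}")
    return problems


problems = check_records(records, codebook)
for p in problems:
    print(p)
```

The check only reports problems; deciding what the correct value is still requires going back to the questionnaire with that ID.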
Use a professional data entry program; I recommend EpiData (see note 3). To reduce errors, double entry of part or all of the data is advisable.
Double entry errors can arise from difficulties in interpreting ambiguous responses or from inconsistent coding when the questionnaires are processed before entry. In addition, respondents may give inconsistent answers. Chapters 7 and 8 describe techniques for identifying these errors and inconsistencies, how to correct them, and why every adjustment must be documented.
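The idea behind double entry is that two independent entries of the same questionnaires are compared, and every discrepancy is looked up in the original. A minimal Python sketch of such a comparison follows; it is an illustration only, and the two small data sets and their variable names are hypothetical.

```python
# Two hypothetical, independent entries of the same questionnaires, keyed by ID.
first_entry = {
    1: {"born": 1961, "children": 2},
    2: {"born": 1975, "children": 0},
}
second_entry = {
    1: {"born": 1961, "children": 3},  # disagrees with the first entry
    2: {"born": 1975, "children": 0},
    3: {"born": 1980, "children": 1},  # missing from the first entry
}


def compare_entries(first, second):
    """List every discrepancy as (ID, variable, value in first, value in second)."""
    discrepancies = []
    for rec_id in sorted(set(first) | set(second)):
        if rec_id not in first or rec_id not in second:
            # The whole record is missing from one of the entries.
            discrepancies.append((rec_id, None, first.get(rec_id), second.get(rec_id)))
            continue
        for var in sorted(set(first[rec_id]) | set(second[rec_id])):
            a = first[rec_id].get(var)
            b = second[rec_id].get(var)
            if a != b:
                discrepancies.append((rec_id, var, a, b))
    return discrepancies


discrepancies = compare_entries(first_entry, second_entry)
for d in discrepancies:
    print(d)
```

The comparison cannot tell which entry is right; it only tells you which questionnaires to look at again.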
When working with original data, you will often need to derive new variables, such as calculating body mass index from height and weight, determining age from dates, or assessing quality of life from several responses. Combining information from multiple sources by merging files is also often necessary. Chapter 9 covers the documentation of these modifications to ensure clarity and accuracy in data handling.
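Deriving variables in a script rather than by hand keeps the original values intact and makes the calculation reproducible. The following Python sketch computes body mass index with the same formula as the Stata example earlier (weight/height^2) and age in completed years from two dates. The record and its variable names are hypothetical, and weight in kilograms and height in metres are assumptions.

```python
from datetime import date


def bmi(weight_kg, height_m):
    """Body mass index: weight divided by height squared."""
    return weight_kg / height_m ** 2


def age_in_years(born, on):
    """Completed years between the date of birth and a reference date."""
    years = on.year - born.year
    if (on.month, on.day) < (born.month, born.day):
        years -= 1  # birthday not yet reached in the reference year
    return years


# Hypothetical record: dates are entered as recorded, age is derived afterwards.
record = {
    "id": 101,
    "weight": 78.0,
    "height": 1.82,
    "born": date(1961, 5, 17),
    "interview": date(2004, 3, 2),
}

record["bmi"] = round(bmi(record["weight"], record["height"]), 1)
record["age"] = age_in_years(record["born"], record["interview"])
print(record["id"], record["bmi"], record["age"])
```

Because the dates themselves are stored, age at any other reference date can be derived later without returning to the questionnaires.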
3) Download the program at no cost from www.epidata.dk. Find a short description in Introduction to
It is essential to archive your data after completing your project to ensure safe storage and accessibility. Many health researchers find archiving difficult because they lack a stable affiliation with a research organization. For detailed guidance on archiving data, see chapter 11 and appendix 4, which discuss the option of archiving at ERAS within the Danish Data Archives.
While only a fraction of your analyses will appear in your final publication, many will inform your decisions on which results to highlight. Chapter 10 offers guidance on organizing and maintaining documentation for your analyses.
Two main considerations are covered in chapters 11 and 12:
1. Prevent your data from being lost
2. Prevent your data from being abused by someone else
The principles discussed here apply not only to self-administered questionnaires but also to interviewer-administered questionnaires and case report forms completed by investigators or their assistants.
In self-administered questionnaires, the phrasing of questions, response categories, question sequence, and layout are crucial for effective data collection. With case report forms, there is often less focus on these elements. However, poorly designed case report forms can leave investigators unsure how to enter data consistently, and they may give external monitors or auditors the impression of disorganized data collection.
Therefore, the advice below on designing self-administered questionnaires also applies to other data collection instruments.
Example 1. A short questionnaire for self-administration
3 Which year were you born?
4 At which level did you leave school?
5 How many children do you have?
6 Do you have a vocational education? (Write below)
1st consideration: the respondent. The questionnaire should be simple and clear, and there should be no doubt about how to fill it in.
2nd consideration: processing of the information recorded. However, this consideration must never complicate the questionnaire for the respondent. The first consideration really is the first.
The layout of the questionnaire in example 1 is simple, and it requires only standard word-processing tools. I used the following principles:
1. Each question with response categories is framed by a box, to help the respondent concentrate on one question at a time. Technically it is simple: I created a 6×1 table.
2. Questions are written in bold typeface.
3. Response categories are written in ordinary typeface.
4. Instructions (write) are written in italic typeface.
5. For closed questions, the response is given by circling a number (the code used when entering data).
6. For open questions, responses are written in the box. Do not add lines to write on; they only complicate writing the response.
7. The amount of blank space should be appropriate, both for circling numbers and for writing text.
Example 2. Layout of closed questions
A right-handed person's hand hides the response text, increasing the risk of misplacing the response.
The response field should be placed to the right of the response text to avoid this problem.
The use of dotted lines minimizes the risk of misplacing responses, and it is as easy to circle a number as to check a box. In addition, showing the code on the form reduces the risk of errors during data entry.
This is OK for the respondent, but the risk of errors when entering data is higher than in 2b.
2d Your sex:
This is OK too and includes the code, reducing the risk of errors when entering data.
I prefer style 2b myself, but I can't explain exactly why. Perhaps because it looks less pretentious.
4.2 On questions and response categories
There are three major types of variables, defined by what kind of properties they express:
Weight in grams; age in days; age in 10-year age groups.
Measurements reflect continuous properties and are expressed in defined units (°C, grams, years). Interval width can be as narrow as measurements permit, or values can be grouped in wider categories.
Ordinal scale: always / often / seldom / never; rare / medium / well done.
There is a natural rank of categories, but no exact relationship to a continuous property.
No natural rank of categories (unless you are a chauvinist).
When conducting research on sensitive topics, such as fertility problems or quality of life, it is essential to ask questions that match the respondents' expectations and their understanding of the study's purpose. For instance, in a fertility study, respondents may expect questions about the frequency of intercourse, while questions about sexual pleasure may be unexpected. Conversely, in a quality of life study, avoiding the topic of sexual pleasure could surprise participants. It is therefore important to state in the introduction that respondents may skip any question they prefer not to answer, ensuring a respectful and comfortable survey experience.
It is crucial for respondents to grasp the study's purpose; if they perceive that the questions lack relevance to this purpose, they may suspect a hidden agenda and cease to engage with subsequent questions.
Related questions should be neighbours. In example 1, the question on number of children separates the questions on school education and vocational education, and this complicates things for the respondent.
It's often suggested to begin interviews with neutral questions and save sensitive topics for later; however, I believe it's more effective for respondents to recognize early on that important questions are being asked throughout a lengthy interview or questionnaire.
Use the respondent's language and steer clear of medical or bureaucratic jargon. Keep sentences short and easy to read, but maintain a mature tone. This can sometimes clash with the need for clarity. For instance, question 6 of example 1 asks about vocational education, a term that may not be clear to all respondents.
3a Do you have a vocational education?
3b After you left school: Did you have any further education?
The distinction between school education and vocational education (erhvervsuddannelse) is a matter of complex definitions. 3b, although not without problems, may be a better choice.
Response categories must be both exhaustive and mutually exclusive. Exhaustive means that the options cover all possible situations; mutually exclusive means that the options do not overlap, so that exactly one option applies to each respondent.
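For numeric categories, both requirements can be checked mechanically before the questionnaire is printed. The following Python sketch is an illustration only; the age groups, their labels and the assumed range of possible ages (0 to 120) are hypothetical. A value matched by no category reveals a gap (not exhaustive), and a value matched by more than one category reveals an overlap (not mutually exclusive).

```python
def matching_categories(value, categories):
    """Return the labels of all categories whose interval contains the value."""
    return [label for label, (low, high) in categories.items() if low <= value <= high]


def check_categories(categories, possible_values):
    """Return (gaps, overlaps): values covered by no category or by more than one."""
    gaps = []
    overlaps = []
    for value in possible_values:
        n = len(matching_categories(value, categories))
        if n == 0:
            gaps.append(value)
        elif n > 1:
            overlaps.append(value)
    return gaps, overlaps


# Hypothetical age groups; bounds are inclusive.
faulty = {"0-20": (0, 20), "20-40": (20, 40), "41-60": (41, 60)}
sound = {"0-19": (0, 19), "20-39": (20, 39), "40-59": (40, 59), "60 or more": (60, 120)}

possible_ages = range(0, 121)  # assumed range of ages that can occur

faulty_gaps, faulty_overlaps = check_categories(faulty, possible_ages)
sound_gaps, sound_overlaps = check_categories(sound, possible_ages)

print("faulty: first uncovered age", faulty_gaps[0], "- overlapping ages", faulty_overlaps)
print("sound: gaps", sound_gaps, "- overlaps", sound_overlaps)
```

In the faulty set, a 20-year-old fits two categories and anyone above 60 fits none; the sound set has neither problem.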
4a Did your child ever have an itchy skin rash affecting the front of the elbows, behind the knees, front of the ankles, around the neck, or around the eyes?
This question is several questions in one.