1. Overview
The Wide format presents each respondent as a single row, with each survey question represented as a separate column. This format is particularly useful when you want to:
Run statistical analysis or modeling in packages like SPSS, R, or Python.
Use dashboards or BI tools that expect one record per respondent.
Quickly view all answers for a respondent in one row.
2. File Structure & Layout
Each row corresponds to one respondent. Each column corresponds to one question or metadata field.
Example (first 5 columns):
V001_respondent_id | V002_Recent_Restaurant_Visit | V003_Age | V004_Gender | V005_Ethnicity |
ps-0097ac15-868c-6608-25fc-c0fe2cd884a8 | Yes | 35 | Female | White |
ps-012a5d8b-9dc4-2132-782d-73742be4088f | Yes | 21 | Female | White |
ps-022b6b43-6304-2dc5-0830-10d98bd7dee3 | Yes | 18 | Male | White |
3. Key Columns
V001_respondent_id – Unique identifier for each respondent.
Question columns (V###_...) – Each prefixed with a variable code (V###) followed by a short label. For example:
V002_Recent_Restaurant_Visit
V003_Age
V004_Gender
The last three columns are always:
Weight column (e.g., V171_weight) – The respondent’s statistical weight.
Timing columns (e.g., V172_start_time, V173_end_time) – Timestamps recording when the respondent started and completed the survey.
4. Data Representation
Single-choice questions
Stored as one column per question with the selected answer recorded.
Multi-choice questions
Each option is represented as a separate column. Values indicate whether the option was selected, and may also include rank order (e.g., -1
= not selected, 1
= selected first, 2
= selected second, etc.).
Numeric questions
Stored directly as numeric values (e.g., annual income).
Open-end questions
Stored as free-text responses in their own columns.
⚠️ Note: Recoded variables and coded open ends are not included in the Wide format. They are only available in the Long format.
5. Missing & Special Values
-1 often denotes an option that was not selected in multi-choice or rank questions.
Empty cells may indicate a skipped or non-applicable question.
"Prefer not to say" appears as a standard response category.
6. Weighting
Apply the weight column when analyzing results to ensure the dataset reflects the target population.
7. Best Practices
Use variable codes (V###) for merging with codebooks or comparing across formats.
Treat
-1
values consistently as non-selections in analysis.For ranked multi-choice questions, filter out
-1
and use positive values to analyze the order of selections.When comparing across formats, match on
reporting_id
(long format) toV###
codes (wide format).Look to the Long format for recoded values and coded open ends, since these are not included in the Wide format.
8. When to Use Wide Format
For statistical modeling and regressions.
When using survey data in BI dashboards or visualization tools.
When analysts want one record per respondent with all answers side by side.