1. Overview
The Long format is designed for detailed, question-by-question analysis. It includes all the data stored in the platform and is identical to the data used internally for reporting. Each row represents a single respondent's answer to a single survey question. This format is particularly useful when you want to:
Explore how different questions were answered across respondents.
Pivot, filter, and group responses in tools like Excel, R, or Python.
Work with multi-choice and grid questions without needing to manage many columns.
2. File Structure & Layout
Each row corresponds to one respondent's answer to one option of one question. Respondents therefore have multiple rows: at least one for each question they encountered, and one per option for multi-choice questions.
Example (first 3 rows):
respondent_id | status | question | reporting_id | type | response | raw_response | responded | timestamp | weight |
00044f3b-ec63-2e17-9b5c-970e0efd5a8b | Terminated | How old are you? | Age | NumericQuestion | 35-44 | 36 | 1 | 2025-07-30 16:27:06.025 | 0.638 |
00044f3b-ec63-2e17-9b5c-970e0efd5a8b | Terminated | What is your gender? | Gender | MultiChoiceQuestion | Male | Male | 1 | 2025-07-30 16:27:06.068 | 0.638 |
00044f3b-ec63-2e17-9b5c-970e0efd5a8b | Terminated | What is your gender? | Gender | MultiChoiceQuestion | Female | Female | 0 | 2025-07-30 16:27:06.068 | 0.638 |
3. Key Columns
respondent_id – Unique identifier for each participant.
status – Final survey status (e.g., Completed, Terminated).
question – Full wording of the question asked.
reporting_id – The labelled identifier for the question as set in the dashboard (e.g., Age, Gender).
line_number – The line number of the question in the survey script.
type – Type of question (NumericQuestion, MultiChoiceQuestion, OpenEnd, etc.).
response – The recoded, human-readable response category (e.g., 35-44).
raw_response – The raw value stored (e.g., 36).
responded – Indicates whether and in what order the respondent selected the option: 0 = not selected, 1 = selected first, 2 = selected second, and so on.
timestamp – Time when the answer was submitted.
weight – Weighting factor applied to this respondent’s answers for statistical adjustment.
4. Data Representation
Single-choice questions
Stored as one row with responded=1.
Multi-choice questions
Stored as multiple rows per respondent, one per option. The chosen options have responded>0, with the number indicating the order in which the options were chosen. Unchosen options have responded=0.
Example: Multi-choice question
Question: Which of the following fruits do you like? (Select all that apply)
respondent_id | question | response | responded |
r1 | Fruits | Apple | 1 |
r1 | Fruits | Banana | 0 |
r1 | Fruits | Orange | 2 |
Here, the respondent chose Apple first, Orange second, and did not select Banana.
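As a minimal sketch of how the responded column can be read, the snippet below recovers the selection order from the three rows above. It uses plain Python dictionaries as stand-ins for rows of the export; the variable names (rows, selection_order) are illustrative and not part of the format.

```python
# The three example rows above, as plain dictionaries (illustrative only).
rows = [
    {"respondent_id": "r1", "question": "Fruits", "response": "Apple", "responded": 1},
    {"respondent_id": "r1", "question": "Fruits", "response": "Banana", "responded": 0},
    {"respondent_id": "r1", "question": "Fruits", "response": "Orange", "responded": 2},
]

# Keep only the options that were selected (responded > 0),
# then sort them by the order in which they were chosen.
selected = sorted(
    (row for row in rows if row["responded"] > 0),
    key=lambda row: row["responded"],
)
selection_order = [row["response"] for row in selected]

print(selection_order)  # ['Apple', 'Orange']
```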
Numeric questions
Both response (bucketed/cleaned category, e.g. 35-44) and raw_response (e.g. 36) are provided.
Open-end questions
The full text appears in response and raw_response.
5. Missing & Special Values
Non-responses may appear with responded=0 and empty raw_response.
"Prefer not to say" or similar options appear as normal response categories.
Terminated respondents may have partial rows depending on where they dropped out.
6. Weighting
Apply the weight column in analysis to ensure results reflect population targets.
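One way to apply the weights is sketched below with pandas, for a single question. The rows and weight values are invented for illustration, and the approach (weighted share = total weight of respondents who selected an option, divided by the total weight of respondents who were shown the question) is an assumption about how you want to aggregate, not a description of the platform's internal reporting. Because a respondent's weight repeats on every one of their rows, the denominator counts each respondent only once.

```python
import pandas as pd

# Hypothetical long-format rows for one question; values are invented.
df = pd.DataFrame(
    {
        "respondent_id": ["r1", "r1", "r2", "r2", "r3", "r3"],
        "reporting_id": ["Gender"] * 6,
        "response": ["Male", "Female"] * 3,
        "responded": [1, 0, 0, 1, 1, 0],
        "weight": [0.5, 0.5, 1.5, 1.5, 1.0, 1.0],
    }
)

gender = df[df["reporting_id"] == "Gender"]

# Denominator: total weight of respondents who have rows for the question,
# counting each respondent once even though they have one row per option.
base = gender.drop_duplicates("respondent_id")["weight"].sum()

# Numerator: total weight of the respondents who selected each option.
selected_weight = (
    gender[gender["responded"] > 0].groupby("response")["weight"].sum()
)

weighted_share = selected_weight / base
print(weighted_share)
```

In this invented data, two of three respondents chose Male, so the unweighted share is about 0.67, while the weighted share is 0.5 because the respondent who chose Female carries a larger weight.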
7. Best Practices
Use pivot tables (Excel) or groupby (Python/Pandas) to aggregate responses.
For multi-choice questions, include all rows where responded>0 to capture all selected options. Use the order number if you need to analyze the sequence of selection.
When comparing across formats, match reporting_id (long) to variable codes (wide/SPSS).
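The aggregation advice above can be sketched with pandas as follows. The data is invented, and the DataFrame is built in memory rather than read from an export so that the example is self-contained; the column names are the ones listed under Key Columns.

```python
import pandas as pd

# Hypothetical long-format rows for a multi-choice question; values are invented.
df = pd.DataFrame(
    {
        "respondent_id": ["r1", "r1", "r1", "r2", "r2", "r2"],
        "reporting_id": ["Fruits"] * 6,
        "response": ["Apple", "Banana", "Orange"] * 2,
        "responded": [1, 0, 2, 2, 1, 0],
    }
)

# Keep every selected option, whatever its position in the selection order.
chosen = df[df["responded"] > 0]

# Number of respondents who selected each option, per question.
counts = chosen.groupby(["reporting_id", "response"]).size()

# Use the order number to count how often each option was picked first.
first_picks = chosen[chosen["responded"] == 1].groupby("response").size()

print(counts)
print(first_picks)
```

These counts are unweighted; summing the weight column instead of counting rows gives the weighted equivalent.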
8. When to Use Long Format
For deep exploratory analysis.
When handling multi-select or grid questions where wide format becomes cumbersome.
When exporting data into R/Python for custom cleaning, text analysis, or advanced visualization.