Long Excel Format

1. Overview

The Long format is designed for detailed, question-by-question analysis. It includes all the data stored in the platform and is identical to the data used internally for reporting. Each row records one respondent's answer to one survey question or, for multi-choice questions, to one answer option. This format is particularly useful when you want to:

Explore how different questions were answered across respondents.
Pivot, filter, and group responses in tools like Excel, R, or Python.
Work with multi-choice and grid questions without needing to manage many columns.

2. File Structure & Layout

Each row corresponds to one respondent's answer to one option of one question. Respondents therefore have multiple rows: at least one for each question they encountered, and one per option for multi-choice questions.

Example (first 3 rows; the line_number column is not shown):

respondent_id	status	question	reporting_id	type	response	raw_response	responded	timestamp	weight
00044f3b-ec63-2e17-9b5c-970e0efd5a8b	Terminated	How old are you?	Age	NumericQuestion	35-44	36	1	2025-07-30 16:27:06.025	0.638
00044f3b-ec63-2e17-9b5c-970e0efd5a8b	Terminated	What is your gender?	Gender	MultiChoiceQuestion	Male	Male	1	2025-07-30 16:27:06.068	0.638
00044f3b-ec63-2e17-9b5c-970e0efd5a8b	Terminated	What is your gender?	Gender	MultiChoiceQuestion	Female	Female	0	2025-07-30 16:27:06.068	0.638
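
As a minimal sketch, the sample rows above can be read with Python's standard library, assuming the sheet has been saved as tab-separated text. The SAMPLE string and the load_rows helper are illustrative names, not part of the export.

```python
import csv
import io
from collections import defaultdict

# The three sample rows shown above, as tab-separated text.
SAMPLE = (
    "respondent_id\tstatus\tquestion\treporting_id\ttype\tresponse\t"
    "raw_response\tresponded\ttimestamp\tweight\n"
    "00044f3b-ec63-2e17-9b5c-970e0efd5a8b\tTerminated\tHow old are you?\tAge\t"
    "NumericQuestion\t35-44\t36\t1\t2025-07-30 16:27:06.025\t0.638\n"
    "00044f3b-ec63-2e17-9b5c-970e0efd5a8b\tTerminated\tWhat is your gender?\tGender\t"
    "MultiChoiceQuestion\tMale\tMale\t1\t2025-07-30 16:27:06.068\t0.638\n"
    "00044f3b-ec63-2e17-9b5c-970e0efd5a8b\tTerminated\tWhat is your gender?\tGender\t"
    "MultiChoiceQuestion\tFemale\tFemale\t0\t2025-07-30 16:27:06.068\t0.638\n"
)


def load_rows(text):
    """Parse tab-separated long-format text into a list of dicts (hypothetical helper)."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


rows = load_rows(SAMPLE)

# One respondent owns many rows, so group them by respondent_id.
by_respondent = defaultdict(list)
for row in rows:
    by_respondent[row["respondent_id"]].append(row)

for rid, respondent_rows in by_respondent.items():
    questions = sorted({r["reporting_id"] for r in respondent_rows})
    print(rid, len(respondent_rows), questions)
```

Every value is read as a string here; numeric columns such as responded and weight still need converting before analysis.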

3. Key Columns

respondent_id – Unique identifier for each participant.
status – Final survey status (e.g., Completed, Terminated).
question – Full wording of the question asked.
reporting_id – The labelled identifier for the question as set in the dashboard (e.g., Age, Gender).
line_number – The line number of the question in the survey script.
type – Type of question (NumericQuestion, MultiChoiceQuestion, OpenEnd, etc.).
response – The recoded, human-readable response category (e.g., 35-44).
raw_response – The raw value stored (e.g., 36).
responded – Indicates whether and in what order the respondent selected the option. 0 = not selected, 1 = selected first, 2 = selected second, and so on.
timestamp – Time when the answer was submitted.
weight – Weighting factor applied to this respondent’s answers for statistical adjustment.
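
The columns above could be modelled as a typed record, which makes the expected type of each field explicit. This is only a sketch: the LongRow class and parse_row helper are hypothetical names, the timestamp pattern is inferred from the sample value 2025-07-30 16:27:06.025, and line_number is treated as optional because the sample table does not show it.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class LongRow:
    respondent_id: str
    status: str
    question: str
    reporting_id: str
    type: str
    response: str
    raw_response: str
    responded: int        # 0 = not selected, 1 = selected first, 2 = second, ...
    timestamp: datetime
    weight: float
    line_number: Optional[int] = None  # assumed optional; absent from the sample


def parse_row(raw):
    """Convert a dict of strings into a typed LongRow (illustrative helper)."""
    line_number = raw.get("line_number")
    return LongRow(
        respondent_id=raw["respondent_id"],
        status=raw["status"],
        question=raw["question"],
        reporting_id=raw["reporting_id"],
        type=raw["type"],
        response=raw["response"],
        raw_response=raw["raw_response"],
        responded=int(raw["responded"]),
        # Pattern assumed from the sample timestamp format.
        timestamp=datetime.strptime(raw["timestamp"], "%Y-%m-%d %H:%M:%S.%f"),
        weight=float(raw["weight"]),
        line_number=int(line_number) if line_number else None,
    )


example = parse_row({
    "respondent_id": "00044f3b-ec63-2e17-9b5c-970e0efd5a8b",
    "status": "Terminated",
    "question": "How old are you?",
    "reporting_id": "Age",
    "type": "NumericQuestion",
    "response": "35-44",
    "raw_response": "36",
    "responded": "1",
    "timestamp": "2025-07-30 16:27:06.025",
    "weight": "0.638",
})
print(example.reporting_id, example.responded, example.weight)
```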

4. Data Representation

Single-choice questions

Stored as one row with responded=1.

Multi-choice questions

Stored as one row per respondent per option. The chosen options have responded>0, with the number indicating the order in which the options were chosen. Unchosen options have responded=0.

Example: Multi-choice question

Question: Which of the following fruits do you like? (Select all that apply)

respondent_id	question	response	responded
r1	Fruits	Apple	1
r1	Fruits	Banana	0
r1	Fruits	Orange	2

Here, the respondent chose Apple first, Orange second, and did not select Banana.
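
The selection order in this example can be recovered by keeping the rows with responded>0 and sorting on that value. A minimal sketch follows; selection_order is an illustrative helper name.

```python
# The fruit example above, with responded already converted to int.
rows = [
    {"respondent_id": "r1", "question": "Fruits", "response": "Apple", "responded": 1},
    {"respondent_id": "r1", "question": "Fruits", "response": "Banana", "responded": 0},
    {"respondent_id": "r1", "question": "Fruits", "response": "Orange", "responded": 2},
]


def selection_order(rows, respondent_id, question):
    """Return the options a respondent selected, in the order they were selected."""
    chosen = [
        r for r in rows
        if r["respondent_id"] == respondent_id
        and r["question"] == question
        and r["responded"] > 0          # drop unselected options
    ]
    chosen.sort(key=lambda r: r["responded"])
    return [r["response"] for r in chosen]


print(selection_order(rows, "r1", "Fruits"))  # ['Apple', 'Orange']
```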

Numeric questions

Both response (bucketed/cleaned category, e.g. 35-44) and raw_response (e.g. 36) are provided.
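
Having both columns means the category can be used for counts and the raw value for statistics. The sketch below uses made-up rows: only the pairing of 36 with 35-44 comes from the example earlier in this document, and the other values and the 45-54 bucket are assumptions for illustration.

```python
from collections import Counter
from statistics import mean

# Illustrative Age rows; r2 and r3 are invented for this sketch.
rows = [
    {"respondent_id": "r1", "response": "35-44", "raw_response": "36"},
    {"respondent_id": "r2", "response": "35-44", "raw_response": "41"},
    {"respondent_id": "r3", "response": "45-54", "raw_response": "52"},
]

# Count respondents per reported category.
bucket_counts = Counter(r["response"] for r in rows)

# Compute a statistic from the raw values, converting the strings to integers.
mean_age = mean(int(r["raw_response"]) for r in rows)

print(dict(bucket_counts), mean_age)
```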

Open-end questions

The full text appears in response and raw_response.

5. Missing & Special Values

Non-responses may appear with responded=0 and empty raw_response.
"Prefer not to say" or similar options appear as normal response categories.
Terminated respondents may have partial rows depending on where they dropped out.
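
One way to check for these cases is sketched below, assuming that a non-response shows up as responded=0 together with an empty raw_response, and that a respondent has no rows for questions they never reached. The rows and helper names are illustrative.

```python
from collections import defaultdict

# Made-up rows: r2 terminated early and has no Gender rows at all.
rows = [
    {"respondent_id": "r1", "status": "Completed", "reporting_id": "Age",
     "raw_response": "36", "responded": 1},
    {"respondent_id": "r1", "status": "Completed", "reporting_id": "Gender",
     "raw_response": "Male", "responded": 1},
    {"respondent_id": "r2", "status": "Terminated", "reporting_id": "Age",
     "raw_response": "", "responded": 0},
]


def is_non_response(row):
    """Assumed rule: not selected and nothing stored in raw_response."""
    return row["responded"] == 0 and row["raw_response"] == ""


non_responses = [r for r in rows if is_non_response(r)]

# Find which questions each respondent has no rows for.
all_questions = {r["reporting_id"] for r in rows}
seen = defaultdict(set)
for r in rows:
    seen[r["respondent_id"]].add(r["reporting_id"])
missing = {rid: all_questions - qs for rid, qs in seen.items() if all_questions - qs}

print(len(non_responses), missing)
```

A missing question does not always indicate a drop-out, since rows exist only for questions a respondent encountered; checking the status column alongside helps tell the two apart.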

6. Weighting

Apply the weight column when aggregating so that results reflect population targets. The weight belongs to the respondent and is repeated on each of that respondent's rows.
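
As a sketch of how the weight might be applied to a multi-choice question, the share of respondents selecting an option can be computed as the summed weight of those who selected it divided by the summed weight of everyone who has a row for it. The weights below are invented, and weighted_shares is an illustrative helper rather than the platform's own calculation.

```python
from collections import defaultdict

# Made-up rows for one multi-choice question; weights chosen for easy arithmetic.
rows = [
    {"respondent_id": "r1", "response": "Apple", "responded": 1, "weight": 0.5},
    {"respondent_id": "r1", "response": "Banana", "responded": 0, "weight": 0.5},
    {"respondent_id": "r2", "response": "Apple", "responded": 0, "weight": 1.5},
    {"respondent_id": "r2", "response": "Banana", "responded": 1, "weight": 1.5},
]


def weighted_shares(rows):
    """Weighted share of respondents selecting each option."""
    selected = defaultdict(float)   # weight of respondents who chose the option
    total = defaultdict(float)      # weight of all respondents with a row for it
    for r in rows:
        total[r["response"]] += r["weight"]
        if r["responded"] > 0:
            selected[r["response"]] += r["weight"]
    return {option: selected[option] / total[option] for option in total}


print(weighted_shares(rows))  # {'Apple': 0.25, 'Banana': 0.75}
```

Unweighted, each option would be selected by half of the respondents; the weights shift the result toward r2's answers.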

7. Best Practices

Use pivot tables (Excel) or groupby (Python/Pandas) to aggregate responses.
For multi-choice questions, include all rows where responded>0 to capture all selected options. Use the order number if you need to analyze the sequence of selection.
When comparing across formats, match reporting_id (long) to the variable codes (wide/SPSS).
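
To compare against a wide-style layout, the long rows can be collapsed into one record per respondent keyed by reporting_id. The sketch below is an assumption about how such a comparison might be set up: the to_wide helper, the sample rows, and the "; " separator for multi-choice answers are illustrative and do not describe the actual wide or SPSS export.

```python
from collections import defaultdict

# Made-up rows combining the Age and Fruits examples from this document.
rows = [
    {"respondent_id": "r1", "reporting_id": "Age", "response": "35-44", "responded": 1},
    {"respondent_id": "r1", "reporting_id": "Fruits", "response": "Apple", "responded": 1},
    {"respondent_id": "r1", "reporting_id": "Fruits", "response": "Banana", "responded": 0},
    {"respondent_id": "r1", "reporting_id": "Fruits", "response": "Orange", "responded": 2},
]


def to_wide(rows):
    """Collapse long rows into {respondent_id: {reporting_id: joined responses}}."""
    collected = defaultdict(dict)
    # Sort by selection order so multi-choice answers are joined in that order.
    for r in sorted(rows, key=lambda r: r["responded"]):
        if r["responded"] > 0:
            answers = collected[r["respondent_id"]].setdefault(r["reporting_id"], [])
            answers.append(r["response"])
    return {
        rid: {question: "; ".join(answers) for question, answers in columns.items()}
        for rid, columns in collected.items()
    }


wide = to_wide(rows)
print(wide)  # {'r1': {'Age': '35-44', 'Fruits': 'Apple; Orange'}}
```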

8. When to Use Long Format

For deep exploratory analysis.
When handling multi-select or grid questions where wide format becomes cumbersome.
When exporting data into R/Python for custom cleaning, text analysis, or advanced visualization.