Long Excel Format

All the data in one place

Updated over 2 weeks ago

1. Overview

The Long format is designed for detailed, question-by-question analysis. It includes all the data stored in the platform and is identical to the data used internally for reporting. Each row represents a single respondent's answer to a single option of a single survey question. This format is particularly useful when you want to:

  • Explore how different questions were answered across respondents.

  • Pivot, filter, and group responses in tools like Excel, R, or Python.

  • Work with multi-choice and grid questions without needing to manage many columns.


2. File Structure & Layout

Each row corresponds to one answer to one option of one question from a respondent. Respondents therefore have multiple rows: at least one for each question they encountered, and one per option for multi-choice questions.

Example (first 3 rows):

| respondent_id | status | question | reporting_id | type | response | raw_response | responded | timestamp | weight |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 00044f3b-ec63-2e17-9b5c-970e0efd5a8b | Terminated | How old are you? | Age | NumericQuestion | 35-44 | 36 | 1 | 2025-07-30 16:27:06.025 | 0.638 |
| 00044f3b-ec63-2e17-9b5c-970e0efd5a8b | Terminated | What is your gender? | Gender | MultiChoiceQuestion | Male | Male | 1 | 2025-07-30 16:27:06.068 | 0.638 |
| 00044f3b-ec63-2e17-9b5c-970e0efd5a8b | Terminated | What is your gender? | Gender | MultiChoiceQuestion | Female | Female | 0 | 2025-07-30 16:27:06.068 | 0.638 |


3. Key Columns

  • respondent_id – Unique identifier for each participant.

  • status – Final survey status (e.g., Completed, Terminated).

  • question – Full wording of the question asked.

  • reporting_id – The labelled identifier for the question as set in the dashboard (e.g., Age, Gender).

  • line_number – The line number of the question in the survey script.

  • type – Type of question (NumericQuestion, MultiChoiceQuestion, OpenEnd, etc.).

  • response – The recoded, human-readable response category (e.g., 35-44).

  • raw_response – The raw value stored (e.g., 36).

  • responded – Indicates whether and in what order the respondent selected the option. 0 = not selected, 1 = selected first, 2 = selected second, and so on.

  • timestamp – Time when the answer was submitted.

  • weight – Weighting factor applied to this respondent’s answers for statistical adjustment.
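
As a rough illustration of how these columns might be typed when read into Python, the sketch below models one row as a dataclass. The names LongRow and parse_row are hypothetical, and the sketch assumes every cell arrives as a string and that timestamps follow the pattern shown in the example row (YYYY-MM-DD HH:MM:SS.mmm); adjust it if your export differs.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class LongRow:
    """One row of the Long export: one respondent, one question, one option."""
    respondent_id: str
    status: str
    question: str
    reporting_id: str
    type: str
    response: str
    raw_response: str
    responded: int          # 0 = not selected, n > 0 = selected n-th
    timestamp: datetime
    weight: float
    line_number: Optional[int] = None  # may be absent from some exports (assumption)


def parse_row(cells: dict) -> LongRow:
    """Convert a dict of raw cell strings into a typed LongRow (hypothetical helper)."""
    line_number = cells.get("line_number")
    return LongRow(
        respondent_id=cells["respondent_id"],
        status=cells["status"],
        question=cells["question"],
        reporting_id=cells["reporting_id"],
        type=cells["type"],
        response=cells["response"],
        raw_response=cells["raw_response"],
        responded=int(cells["responded"]),
        # Timestamp pattern assumed from the example row in this article.
        timestamp=datetime.strptime(cells["timestamp"], "%Y-%m-%d %H:%M:%S.%f"),
        weight=float(cells["weight"]),
        line_number=int(line_number) if line_number else None,
    )


# The first example row from section 2.
example = parse_row({
    "respondent_id": "00044f3b-ec63-2e17-9b5c-970e0efd5a8b",
    "status": "Terminated",
    "question": "How old are you?",
    "reporting_id": "Age",
    "type": "NumericQuestion",
    "response": "35-44",
    "raw_response": "36",
    "responded": "1",
    "timestamp": "2025-07-30 16:27:06.025",
    "weight": "0.638",
})
```

Keeping raw_response as a string is deliberate here: it holds numbers for numeric questions but text for choice and open-end questions, so converting it is best done per question type.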


4. Data Representation

Single-choice questions

Stored as one row with responded=1.

Multi-choice questions

Stored as one row per respondent per option, so each respondent has multiple rows for the question. The chosen options have responded>0, with the number indicating the order in which the options were chosen. Unchosen options have responded=0.

Example: Multi-choice question

Question: Which of the following fruits do you like? (Select all that apply)

| respondent_id | question | response | responded |
| --- | --- | --- | --- |
| r1 | Fruits | Apple | 1 |
| r1 | Fruits | Banana | 0 |
| r1 | Fruits | Orange | 2 |

Here, the respondent chose Apple first, Orange second, and did not select Banana.
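
The same reading of the fruit example can be reproduced in code. The following is a minimal pandas sketch, not the platform's own logic: it keeps rows with responded>0 and sorts them by the responded value to recover the selection order for each respondent and question.

```python
import pandas as pd

# Rows copied from the fruit example above.
df = pd.DataFrame({
    "respondent_id": ["r1", "r1", "r1"],
    "question": ["Fruits", "Fruits", "Fruits"],
    "response": ["Apple", "Banana", "Orange"],
    "responded": [1, 0, 2],
})

# Keep only chosen options, then order them by when they were chosen.
selected = df[df["responded"] > 0].sort_values(
    ["respondent_id", "question", "responded"]
)

# One list of options per respondent and question, in selection order.
order = selected.groupby(["respondent_id", "question"])["response"].agg(list)

print(order.loc[("r1", "Fruits")])  # ['Apple', 'Orange']
```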

Numeric questions

Both response (bucketed/cleaned category, e.g. 35-44) and raw_response (e.g. 36) are provided.
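
One way to use the two columns together is to count respondents per bucket from response and compute numeric statistics from raw_response. The sketch below assumes raw_response is read as text and needs converting; apart from the 36 / 35-44 pair taken from the example above, the respondents, ages, and the 18-24 bucket are invented for illustration.

```python
import pandas as pd

# Hypothetical rows for the Age question; only 36 -> "35-44" comes from this article.
df = pd.DataFrame({
    "respondent_id": ["r1", "r2", "r3"],
    "reporting_id": ["Age", "Age", "Age"],
    "response": ["35-44", "35-44", "18-24"],
    "raw_response": ["36", "41", "22"],
})

age = df[df["reporting_id"] == "Age"]

# Distribution across the recoded categories.
bucket_counts = age["response"].value_counts()

# Numeric statistics from the raw values; unparseable cells become NaN.
mean_age = pd.to_numeric(age["raw_response"], errors="coerce").mean()

print(bucket_counts["35-44"])  # 2
print(mean_age)                # 33.0
```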

Open-end questions

The full text of the answer appears in both response and raw_response.


5. Missing & Special Values

  • Non-responses may appear with responded=0 and empty raw_response.

  • "Prefer not to say" or similar options appear as normal response categories.

  • Terminated respondents may have partial rows depending on where they dropped out.
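
The points above can be checked in a few lines of pandas. This is a sketch under stated assumptions: a row is treated as a non-response only when responded=0 and raw_response is empty, which separates it from an unchosen multi-choice option (responded=0 but raw_response filled in). The sample rows, including the Income question, are invented.

```python
import pandas as pd

# Hypothetical sample: r1 dropped out early, r2 skipped the Income question.
df = pd.DataFrame({
    "respondent_id": ["r1", "r1", "r2", "r2", "r2"],
    "status": ["Terminated", "Terminated", "Completed", "Completed", "Completed"],
    "reporting_id": ["Age", "Gender", "Age", "Gender", "Income"],
    "raw_response": ["36", "Male", "52", "Female", None],
    "responded": [1, 1, 1, 1, 0],
})

# Flag non-responses: responded == 0 and nothing stored in raw_response.
raw = df["raw_response"].fillna("").astype(str).str.strip()
df["is_nonresponse"] = (df["responded"] == 0) & (raw == "")

# Number of distinct questions each respondent actually answered, by status.
answered = df[df["responded"] > 0]
questions_answered = (
    answered.groupby(["respondent_id", "status"])["reporting_id"].nunique()
)

print(int(df["is_nonresponse"].sum()))  # 1
print(questions_answered)
```

Comparing questions_answered between Completed and Terminated respondents gives a quick view of where partial data comes from.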


6. Weighting

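  • Apply the weight column in analysis to ensure results reflect population targets. The weight is a respondent-level value that is repeated on each of that respondent's rows, so count it once per respondent when computing a base.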
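
As an illustration, a weighted share for each option can be computed by dividing the summed weight of respondents who selected it by the summed weight of all respondents who have rows for the question. The sketch below uses invented respondents and weights, and the choice of base is an assumption rather than a description of how the platform calculates its reports.

```python
import pandas as pd

# Hypothetical data: two respondents, one question, invented weights.
df = pd.DataFrame({
    "respondent_id": ["r1", "r1", "r2", "r2"],
    "reporting_id": ["Gender", "Gender", "Gender", "Gender"],
    "response": ["Male", "Female", "Male", "Female"],
    "responded": [1, 0, 0, 1],
    "weight": [0.5, 0.5, 1.5, 1.5],
})

q = df[df["reporting_id"] == "Gender"]

# Weighted base: each respondent's weight counted once, not once per option row.
base = q.drop_duplicates("respondent_id")["weight"].sum()

# Weighted share of respondents selecting each option.
selected = q[q["responded"] > 0]
weighted_share = selected.groupby("response")["weight"].sum() / base

print(weighted_share["Male"])    # 0.25
print(weighted_share["Female"])  # 0.75
```

Summing the weight column over all rows without deduplicating would count each respondent once per option and inflate the base.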


7. Best Practices

  • Use pivot tables (Excel) or groupby (Python/Pandas) to aggregate responses.

  • For multi-choice questions, include all rows where responded>0 to capture all selected options. Use the order number if you need to analyze the sequence of selection.

  • When comparing across formats, match reporting_id (long) to the corresponding variable codes (wide/SPSS).
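
The aggregation and cross-format comparison described above might look like the following in pandas. It is a sketch on invented data: groupby counts how many respondents selected each option, and pivot_table reshapes the long rows into one row per respondent with a 0/1 column per option, which is roughly the shape a wide export would have. The actual column names in the wide or SPSS files are not assumed here.

```python
import pandas as pd

# Hypothetical multi-choice data for two respondents.
df = pd.DataFrame({
    "respondent_id": ["r1", "r1", "r1", "r2", "r2", "r2"],
    "reporting_id": ["Fruits"] * 6,
    "response": ["Apple", "Banana", "Orange"] * 2,
    "responded": [1, 0, 2, 0, 1, 0],
})

# Respondents selecting each option (responded > 0 means selected).
counts = (
    df[df["responded"] > 0]
    .groupby(["reporting_id", "response"])["respondent_id"]
    .nunique()
)

# Long -> wide: one row per respondent, one 0/1 column per question option.
wide = df.assign(selected=(df["responded"] > 0).astype(int)).pivot_table(
    index="respondent_id",
    columns=["reporting_id", "response"],
    values="selected",
    aggfunc="max",
    fill_value=0,
)

print(counts)
print(wide)
```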


8. When to Use Long Format

  • For deep exploratory analysis.

  • When handling multi-select or grid questions where wide format becomes cumbersome.

  • When exporting data into R/Python for custom cleaning, text analysis, or advanced visualization.
