
Recoding and post-field data wrangling

Recoding, calculated fields, and removing poor quality respondents


However much you prepare upfront, there are always times after fieldwork when you need to make changes. Maybe a new grouping is needed, a typo slipped through, or you discover that a question needs to be bucketed differently for analysis.

To handle this, a standard recode script is generated automatically based on your survey. This script includes placeholders for getting values, recoding responses, marking poor-quality respondents, and creating new calculated variables. You can then edit and extend this script as needed.

You can open the script by clicking the data prep icon on the reporting page.

This will display the standard recode script for your survey, which will:

  • Read the value of each variable

  • Implement the recodes defined in your survey

  • Include example code for creating calculated variables and marking poor-quality respondents

Each of the functions is explained fully below.

recode(reporting_id, recodes)

Why you'd use it:

  • Clean up messy text input (e.g., standardizing "Nissan" vs "nissan")

  • Collapse multiple options into grouped buckets (e.g., "18-24" and "25-34" into "Young")

Example:

r.recode("Q1", {"Male": "M", "Female": "F"})
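For the messy-text and bucketing cases listed above, you can generate the `recodes` mapping instead of typing it by hand. Below is a minimal sketch. It assumes that `recodes` maps each existing answer label to its replacement, as in the example above. `build_case_recodes` and `invert_groups` are hypothetical helpers and are not part of the generated script. The `observed` answers would come from your own review of the responses, for example the raw data download.

```python
def build_case_recodes(observed, canonical):
    """Map each observed spelling to its canonical label, ignoring case and outer spaces."""
    lookup = {label.strip().lower(): label for label in canonical}
    recodes = {}
    for answer in observed:
        target = lookup.get(answer.strip().lower())
        # Skip answers with no canonical match, and answers that are already canonical
        if target is not None and answer != target:
            recodes[answer] = target
    return recodes


def invert_groups(groups):
    """Turn {"Young": ["18-24", "25-34"]} into {"18-24": "Young", "25-34": "Young"}."""
    recodes = {}
    for group_label, members in groups.items():
        for member in members:
            recodes[member] = group_label
    return recodes


# Usage inside the recode script ("Q_car_brand" and "Q_age_band" are hypothetical reporting IDs):
# r.recode("Q_car_brand", build_case_recodes(["nissan", " NISSAN", "Ford"], ["Nissan", "Ford"]))
# r.recode("Q_age_band", invert_groups({"Young": ["18-24", "25-34"]}))
```

Answers with no canonical match, such as a misspelling like "Toyta", are deliberately left out of the mapping so you can review them by hand.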

mark_poor_quality(respondent_ids)

Why you'd use it:

  • Remove respondents you've decided don't qualify based on manual analysis of their responses.

  • You can find respondent IDs in the raw data downloads

Example:

r.mark_poor_quality(["respondent_1", "respondent_2"])
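If your manual review follows a simple rule, such as completion time, you can build the ID list from the raw data download instead of copying IDs one by one. Below is a minimal sketch. It assumes the download is a CSV file with hypothetical `respondent_id` and `duration_seconds` columns, so check the real column names in your own file. The `find_speeders` helper is also hypothetical.

```python
import csv


def find_speeders(csv_lines, min_seconds):
    """Return IDs of respondents who finished faster than min_seconds.

    csv_lines can be an open file or any iterable of CSV text lines.
    The column names below are hypothetical; adjust them to match your download.
    """
    speeders = []
    for row in csv.DictReader(csv_lines):
        if float(row["duration_seconds"]) < min_seconds:
            speeders.append(row["respondent_id"])
    return speeders
```

Run this on your own machine against the downloaded file, for example inside `with open("raw_data.csv", newline="") as f:`. Then paste the resulting list into the `r.mark_poor_quality([...])` call.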

store_value(name, value)

Why you'd use it:

  • Create a new derived field that wasn't captured directly (e.g., "is_young", "heavy_user")

  • Prepare new reporting variables without changing the original questions

Example:

age = int(r.get_value("Q_age"))  # "Q_age" is a hypothetical reporting ID; assumes a numeric answer
r.store_value("is_young", 1 if age < 35 else 0)
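Derived fields often bucket a numeric answer into bands. Below is a minimal sketch of an age-band variable. It assumes the age question returns a value that converts to an integer. The band boundaries, the `age_band` helper, and the `Q_age` reporting ID are all hypothetical.

```python
from bisect import bisect_right

# Hypothetical bands: each start is the lowest age included in that band
BAND_STARTS = [18, 25, 35, 45, 55]
BAND_LABELS = ["18-24", "25-34", "35-44", "45-54", "55+"]


def age_band(age):
    """Return the band label for an age, or None if it is below the first band."""
    index = bisect_right(BAND_STARTS, age) - 1
    if index < 0:
        return None
    return BAND_LABELS[index]


# In the recode script ("Q_age" is a hypothetical reporting ID):
# r.store_value("age_band", age_band(int(r.get_value("Q_age"))))
```

Keeping the boundaries in one sorted list means you can add a band by editing the two lists rather than a chain of `if` statements. How `store_value` treats `None` is not documented here, so you may prefer to return a label such as "Under 18" instead.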

get_values(reporting_id)

Why you'd use it:

  • Retrieve all selected responses for a multi-select question to use the results in a new calculated variable.

Example:

selected_brands = r.get_values("Q_brands")
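A common use is counting how many options a respondent picked. Below is a minimal sketch. It assumes `get_values` returns a list of the selected answer labels. The exclusive option labels, the threshold, and the helper names are hypothetical.

```python
# Hypothetical labels for exclusive options that should not count as a brand
EXCLUSIVE_OPTIONS = {"None of these", "Don't know"}


def brand_count(selected_brands):
    """Count selected brands, ignoring exclusive 'none'-style options."""
    return sum(1 for brand in selected_brands if brand not in EXCLUSIVE_OPTIONS)


def multi_brand_flag(selected_brands, threshold=3):
    """Return 1 if the respondent picked at least `threshold` brands, else 0."""
    return 1 if brand_count(selected_brands) >= threshold else 0


# In the recode script:
# selected_brands = r.get_values("Q_brands")
# r.store_value("multi_brand_user", multi_brand_flag(selected_brands))
```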

get_value(reporting_id)

Why you'd use it:

  • Retrieve the single answer from a question where only one choice is allowed, to use the result in a new calculated variable.

Example:

gender = r.get_value("Q_gender")
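You can combine single answers into one reporting segment. Below is a minimal sketch. It assumes that `get_value` returns the answer label for gender and a number (or numeric string) for age. It also assumes that an unanswered question comes back as `None`. The helper name, reporting IDs, and segment labels are hypothetical.

```python
def demographic_segment(gender, age):
    """Combine a gender label and an age into a segment such as "Young female"."""
    if gender is None or age is None:
        return "Unknown"  # assumption: unanswered questions come back as None
    life_stage = "Young" if int(age) < 35 else "Older"
    return f"{life_stage} {gender.lower()}"


# In the recode script ("Q_gender" and "Q_age" are hypothetical reporting IDs):
# r.store_value("segment", demographic_segment(r.get_value("Q_gender"), r.get_value("Q_age")))
```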

Notes

  • recode() maps answers to new values without modifying the originally captured data

  • mark_poor_quality() flags the listed respondents so they are removed from the final data output

  • store_value() lets you add new derived variables without editing the original survey

  • Always check that your recodes use the correct reporting IDs: a mistyped ID raises an error with suggestions

  • You can customize the generated recoding script as much as you need to fit your analysis plan
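Putting the calls together, an edited script might look roughly like the sketch below. It is wrapped in a function only so it can be tried against a stand-in object, and the real generated script's layout may differ. Every reporting ID is hypothetical. The sketch also assumes that `get_value` and `get_values` return the current respondent's answers, as the `store_value` example suggests.

```python
def recode_script(r):
    """Hypothetical end-to-end sketch using only the calls documented above."""
    # 1. Recodes: map existing answer labels to new ones
    r.recode("Q_gender", {"Male": "M", "Female": "F"})
    r.recode("Q_age_band", {"18-24": "Young", "25-34": "Young"})

    # 2. Exclude respondents flagged during manual review of the raw data download
    r.mark_poor_quality(["respondent_1", "respondent_2"])

    # 3. Calculated variables built from existing answers
    age = int(r.get_value("Q_age"))  # assumes a numeric answer
    selected_brands = r.get_values("Q_brands")
    r.store_value("is_young", 1 if age < 35 else 0)
    r.store_value("brand_count", len(selected_brands))
```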
