Chapter 2 Data protection and privacy issues

Protecting the privacy and confidentiality of survey respondents is a key concern and has to be considered carefully before preparing and launching a survey. This involves understanding the concepts involved and clarifying the legal and organizational requirements for the survey. It also involves following certain standards for data management, storage and data reuse.

2.1 About privacy and confidentiality

Although “privacy” and “confidentiality” are often used interchangeably, the concepts should be distinguished (European Commission 2010).

Privacy is the more fundamental concept and entails a) control of information about oneself, b) control over access to oneself (both physical and mental), and c) control over one’s ability to make important life decisions.

Confidentiality on the other hand is a more limited concept and concerns first and foremost the protection of personal information. “Confidentiality is a duty that arises when someone has been granted access to information that would otherwise be kept secret” (European Commission 2010, 79ff).

Most importantly, privacy considerations limit the ways in which we acquire information, whereas confidentiality considerations deal with the protection of this acquired information.

In concrete terms, the distinction between “privacy” and “confidentiality” means specifying how privacy is protected during data collection, including the decision whether or not to participate, and how the collected data will be handled confidentially after it has been recorded.

For the GEAM survey, privacy means giving respondents relative control over data entry (“what” and “when”), including the possibility to opt out and delete their data at any moment during the submission process. Confidentiality means that result data is anonymized and stored securely to prevent unauthorized access.

2.2 Introduction to the General Data Protection Regulation (GDPR)

The EU General Data Protection Regulation (GDPR, Regulation (EU) 2016/679) is a regulation by which the European Parliament, the Council of the European Union and the European Commission intend to strengthen and unify data protection for individuals within the European Union (EU).

The GDPR enhances the rights of ‘data subjects’ (individuals to whom personal data pertain) and provides clear conditions for collecting, storing, processing and transferring personal data (including ‘special categories’ data) by institutions. It is advisable to become familiar with the principles outlined in Article 5 of the GDPR, which address the following:

  • Accountability
  • Lawfulness, fairness and transparency
  • Purpose limitation
  • Data minimization
  • Accuracy
  • Storage limitation
  • Integrity and confidentiality.

The GDPR is applicable to data processing. This includes data collection, recording, organization, structuring, storage, adaptation or alteration, retrieval, consultation, use, disclosure by transmission, dissemination or otherwise making available, alignment or combination, restriction, erasure or destruction.

2.3 Collecting personal (sensitive) data

The GDPR draws a distinction between “personal data” and “sensitive personal data”, which it refers to as “special categories of personal data”.

Personal data means “any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person” (article 4.1).

Sensitive personal data or special categories of personal data refers to information revealing racial or ethnic origin, political opinions, religious or philosophical beliefs, trade union membership, genetic data, biometric data, data concerning health or data concerning a natural person’s sex life or sexual orientation (article 9).

The GEAM includes items inquiring about sensitive personal data (e.g. sexual orientation, health impairments or reported discrimination associated with ethnicity) because these variables also constitute the main dimensions of social discrimination (Baumann, Egenberger, and Supik 2018). From our perspective, it is important to address these sensitive issues in the GEAM because otherwise important discriminatory practices will remain invisible and cannot be addressed by adequate gender equality measures.

The FUOC is responsible for providing a technically secure solution for conducting online GEAM surveys. However, the survey content and the handling of result data are the responsibility of the survey administrator.

Ultimately, it is the survey administrator who has to decide on adequate security settings for a specific survey instance, taking into account the GEAM content, national legislation, and organizational information needs and context(s). In other words, it is up to individual organizations to specify who will have access to the raw data and how it will be stored securely.

Furthermore, it is up to organizations and the survey administrators to make sure that the GEAM survey is accompanied by adequate information for respondents to provide informed consent. Specifically, you must provide:

  • Information about your organization: including name and contact details, details of your representative (if relevant) and contact details of your Data Protection Officer.

  • Information about the type of data you will collect: for example, gender, contract details, salary, etc.

  • The purpose of collecting the data: including what you will use it for and whether it will be used to make an automated decision; the legal basis for using the data including any ‘legitimate interest’ relied upon.

  • Who will receive or have access to the data: for example, members of an Equality, Diversity and Inclusion committee, Human Resources, or a Gender Equality Planning group.

  • Other information: including whether the data will be transferred, stored, or processed outside the EU and on what basis; how long the data will be stored for; what security arrangements are in place to protect the data; whether provision of the data is required and the consequences of not doing so.

  • Data subjects’ rights: the right to be informed; right of access; right to rectification; right to erasure; right to restrict processing; right to data portability; right to object; and rights in relation to automated decision making and profiling.

  • Contact information: who they can contact in relation to questions or complaints.

It is important to provide a means for individuals to give consent to your processing of their information after they have read the above information. An example statement for the collection of personal and sensitive personal data forms part of the GEAM questionnaire. By default, the GEAM LimeSurvey platform asks respondents to accept the data protection statement before they can proceed to answer the survey. Survey administrators are nevertheless advised to revise this statement carefully, with the legal support of the targeted organization, before launching their survey.

Questions and response options may need to be adapted to an organization’s or country’s legal requirements where these affect monitoring practices, policies and the terms used to describe populations. Survey administrators, when editing questions and response options on protected characteristics, need to be aware of the rights and permissions in their country. For instance, they need to know whether permissions to collect data on protected characteristics such as sex, race and sexual orientation differ, or whether they require an organization to rephrase the language in the survey to be in line with regulatory standards.

When editing questions or response options for regulatory reasons, it is recommended that you use tried and tested, nationally accepted replacements (for example, the response options presented in a national census). This will increase the likelihood that other organizations in the same country have used similar questions and response options, producing a standardized approach and enabling organizations to benchmark themselves nationally.

In order to make informed decisions regarding privacy protection and the confidential treatment of result data, please read this document carefully, especially the section on Controlling data privacy and response tracking. You will also be required to sign the Annex III - Declaration of Data Protection and Confidentiality Agreement for survey administrators before you can launch a survey. You can still edit and prepare your questionnaire without having signed the agreement with the FUOC, but you will need to provide a signed copy before collecting data.

Finally, additional information regarding the collection of sensitive personal data and considerations for primary research is discussed further in Advance HE’s research and data briefing on GDPR compliance (Christoffersen 2018).

2.4 Data storage

Once a survey has finished and the result data matrix has been downloaded from the LimeSurvey platform, we recommend the following steps to protect the confidentiality of the responses and prevent unauthorized access to your data.

2.4.1 Anonymization

Anonymity means that a person participating in the research cannot be identified from the information provided. This is distinct from confidentiality, where the researcher or data collector knows the identity of the person but keeps this information secure and anonymizes the data before it is published. For more on this distinction see Advance HE’s briefing on ethics in primary research (Haley 2017).

The GEAM does not contain questions regarding direct personal identifiers such as social security numbers, names, email addresses or similar. Thus, the collected data is anonymous on a very basic level. LimeSurvey does not store any personal information with the survey results either. However, respondents might provide certain identifiers in open text questions, such as organizational names or names of colleagues, which can unintentionally identify their (and others’) contributions.

Before further processing your results, please check the open text fields in your result data for any plain names and other possible identifiers submitted by respondents. We recommend replacing these instances with custom codes or XXX-ing them out.
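Part of this check can be scripted. The following is a minimal sketch in Python, assuming the result data has been loaded as a list of rows and that you maintain your own list of names to look for. The column name "comments", the example names and the code format [PERSON_n] are hypothetical and not part of the GEAM or LimeSurvey. The script only finds the terms you give it, so a manual read-through of the open text answers remains necessary.

```python
import re


def redact(text, terms, code_prefix="PERSON"):
    """Replace each known term in `text` with a custom code such as [PERSON_1].

    `terms` is a list of names or identifiers you already know about
    (hypothetical input; no such list ships with the GEAM).
    """
    for index, term in enumerate(terms, start=1):
        # \b keeps "Ann" from matching inside "annual"; matching ignores case.
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        text = pattern.sub(f"[{code_prefix}_{index}]", text)
    return text


def redact_rows(rows, open_text_columns, terms):
    """Return copies of `rows` (dicts) with the open text columns redacted."""
    cleaned = []
    for row in rows:
        new_row = dict(row)  # leave the original row untouched
        for column in open_text_columns:
            if new_row.get(column):
                new_row[column] = redact(new_row[column], terms)
        cleaned.append(new_row)
    return cleaned


# Hypothetical example data.
rows = [
    {"id": 1, "comments": "My supervisor Maria Lopez ignored the complaint."},
    {"id": 2, "comments": "No issues in the annual review."},
]
cleaned = redact_rows(rows, ["comments"], ["Maria Lopez", "Ann"])
```

Using numbered codes rather than a uniform XXX keeps it visible that two answers mention the same person, which can matter for the analysis.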

2.4.2 Encrypting result data

A relatively easy but seldom applied measure to protect your data is to save the corresponding file(s) with a password. This works mainly when the results are downloaded as a Microsoft Excel file.

Before sharing the result data with the inner circle of colleagues, please password protect the file. Do not send the password together with the file, for example in the same email.

More sophisticated options are available, such as digital signatures and GPG encryption. GPG is readily available on Linux operating systems; for Windows, see https://www.gpg4win.org/. The advantage of these advanced encryption techniques is that you protect your data with a personal signature/password without limiting your ability to share the data with others. The architecture of public and private encryption keys enables you to encrypt data with a specific addressee in mind, so that it can only be decrypted by that particular person. This enables you to share protected data without having to worry about circulating a single password (used for encryption) among collaborators.
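As an illustration of encrypting for a specific addressee, the sketch below wraps the gpg command line tool from Python. It assumes that gpg is installed and that the recipient’s public key has already been imported into your keyring. The file name and email address are hypothetical, and the helper functions are not part of any GEAM tooling.

```python
import subprocess


def build_gpg_command(source_path, recipient, output_path=None):
    """Build the gpg call that encrypts `source_path` for one recipient."""
    if output_path is None:
        output_path = source_path + ".gpg"
    return [
        "gpg",
        "--output", output_path,   # where the encrypted copy is written
        "--encrypt",
        "--recipient", recipient,  # selects the recipient's public key
        source_path,
    ]


def encrypt_for(source_path, recipient):
    """Run gpg; raises an exception if gpg is missing or reports a failure."""
    command = build_gpg_command(source_path, recipient)
    subprocess.run(command, check=True)
    return command[2]  # path of the encrypted file
```

A call such as encrypt_for("geam_results.xlsx", "colleague@example.org") would write geam_results.xlsx.gpg, which only the holder of the matching private key can decrypt. The unencrypted original still exists afterwards and has to be handled separately.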

2.4.3 Deletion

Note that when you delete data from a computer, the file is usually not erased but moved to a trash bin, from which it can easily be recovered. This is especially important when working on a shared computer. When you download the result dataset, make sure others (including administrators) cannot access it at a later point in time.

Data should be stored on your personal (desktop) account (protected by your password) or on an encrypted shared drive (e.g. a cloud drive or the organization’s virtual private network). In either case, access to the folder in which the data are stored should be restricted to the one or two individuals who require access to the raw data. If you store the data on an external (flash) drive, the drive should be encrypted or password protected in order to prevent unauthorized access in case of loss.

2.5 Disclosure control

Whereas “anonymization” can remove or replace identifiers in a dataset and thus prevent the direct identification of respondents, “quasi-identifiers” pose a different challenge. Quasi-identifiers are variables used in the study whose combination can lead to the re-identification of a respondent. For example, Golle (2006) estimates that 63% of the US population can be unambiguously identified by combining a 5-digit ZIP code, birth date and sex; Sweeney (2000), working with older data, put the figure at 87%.

Please consider whether the person to whom the data pertains might still be identified through indirect identifiers (e.g. occupation, salary); by people who know them or the context; or by those who have access to other information which, when combined with the data, might allow them to be identified. In these cases, it is necessary to take further action beyond removing names to anonymize the data. The GDPR contains a strict definition of anonymity: it considers data anonymous only when it cannot be identified by any means “reasonably likely to be used … either by the controller or by another person”. This means that if the data could be re-identified by any person using ‘reasonable effort’, it would not be considered to be anonymized, and respondents would need to be made aware of this during the consent process described above. For example, if a report summarizing the proportions of men and women working in individual roles only included one female respondent from a specific department, it is possible for this single individual to be identified by her colleagues.

In case you plan to distribute the raw data beyond the inner circle of persons involved with the collection and analysis of the GEAM, you need to carefully think about your quasi-identifiers. Quasi-identifiers need to be examined in relation to the number of respondents in your data. For example, a combination of professional category, age and gender might be enough to identify a person in a small research institute but insufficient in a dataset of several hundred or thousands of respondents.
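One way to examine quasi-identifiers in relation to the number of respondents is to count how many respondents share each combination of values. The sketch below does this in Python; the column names, the example rows and the threshold k are hypothetical, and choosing a suitable threshold remains a judgment for the survey administrator.

```python
from collections import Counter


def small_groups(rows, quasi_identifiers, k=5):
    """Return quasi-identifier combinations shared by fewer than k respondents."""
    counts = Counter(
        tuple(row[name] for name in quasi_identifiers) for row in rows
    )
    return {combo: n for combo, n in counts.items() if n < k}


# Hypothetical example data: one professor, three postdocs.
rows = [
    {"category": "professor", "age": "55-64", "gender": "woman"},
    {"category": "postdoc", "age": "25-34", "gender": "man"},
    {"category": "postdoc", "age": "25-34", "gender": "man"},
    {"category": "postdoc", "age": "25-34", "gender": "man"},
]
risky = small_groups(rows, ["category", "age", "gender"], k=3)
```

Here the only combination flagged is the single professor, whose answers could be attributed to her by anyone who knows the institute. Combinations that appear in the result are candidates for removal or grouping before the raw data is shared.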

If there is a small number of respondents within a certain category, this identifier may need to be removed before sharing to prevent breaching participant confidentiality and anonymity. Alternatively, it may be better to refrain from sharing the raw data and instead share only the results of the survey in summary tables. In such tables, small numbers can be represented as “less than” values (e.g. groups with fewer than 5 individuals are shown as ‘< 5’ instead of the exact frequency), or a rounding strategy can be applied (e.g. all frequencies are rounded to the nearest 5 and any percentages with denominators of fewer than 22 individuals are suppressed).
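Both strategies are simple to apply when preparing a summary table. The following Python sketch uses the thresholds mentioned above (5 and 22) as defaults; the function names and example counts are hypothetical. It assumes that percentages are computed from the unrounded counts, which the text above does not specify.

```python
def less_than_label(count, threshold=5):
    """Show small counts as '< threshold' instead of the exact frequency."""
    return f"< {threshold}" if count < threshold else str(count)


def round_to_nearest_5(count):
    """Round a non-negative integer count to the nearest multiple of 5."""
    return (count + 2) // 5 * 5


def percentage_or_suppressed(numerator, denominator, min_denominator=22):
    """Return the percentage, or None when the denominator is too small."""
    if denominator < min_denominator:
        return None  # suppressed
    return round(100 * numerator / denominator, 1)


# Hypothetical counts for one department.
counts = {"women": 3, "men": 18, "non-binary": 1}
labelled = {group: less_than_label(n) for group, n in counts.items()}
rounded = {group: round_to_nearest_5(n) for group, n in counts.items()}
```

With these counts, the first strategy reports women and non-binary respondents as ‘< 5’, while the second reports 5, 20 and 0. Note that rounding to the nearest 5 turns counts of 1 and 2 into 0.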

Finally, when there are few participants, anonymization can also involve creating groupings from certain variables. For example, the GEAM collects data on respondents’ nationality, which could be transformed into groups by continent, or by EU versus non-EU, both of which would decrease the likelihood of them being identified.
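The EU versus non-EU grouping could be sketched as follows in Python. The example assumes that nationality is available as an ISO 3166-1 alpha-2 country code, which may not match how your export encodes it, and the member list reflects the 27 EU member states at the time of writing. The function name and the labels are hypothetical.

```python
# ISO 3166-1 alpha-2 codes of the 27 EU member states (at the time of writing).
EU_MEMBER_CODES = {
    "AT", "BE", "BG", "HR", "CY", "CZ", "DK", "EE", "FI",
    "FR", "DE", "GR", "HU", "IE", "IT", "LV", "LT", "LU",
    "MT", "NL", "PL", "PT", "RO", "SK", "SI", "ES", "SE",
}


def eu_group(country_code):
    """Map a country code to 'EU', 'non-EU' or 'not stated'."""
    code = (country_code or "").strip().upper()
    if not code:
        return "not stated"  # keep missing answers separate
    return "EU" if code in EU_MEMBER_CODES else "non-EU"


# Hypothetical answers to the nationality question.
nationalities = ["de", "NO", "ES", ""]
groups = [eu_group(code) for code in nationalities]
```

The original nationality column should be dropped from any dataset that is shared, since keeping it alongside the grouped variable would undo the effect.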

2.6 Sharing data

Given the potential disclosure control issues, the ACT project does not publish any result data tables of GEAM questionnaires.

Survey administrators are advised at this point to only publish survey results in aggregated format.

References

Baumann, Anne-Luise, Vera Egenberger, and Linda Supik. 2018. “Erhebung von Antidiskriminierungsdaten in Repräsentativen Wiederholungsbefragungen. Bestandsaufnahme Und Entwicklungsmöglichkeiten.” Antidiskriminierungsstelle des Bundes.

Christoffersen, Ashlee. 2018. “Data Protection and Anonymity Considerations for Equality Research and Data.” London: Advance HE.

European Commission. 2010. European Textbook on Ethics in Research. Luxembourg: Publications Office of the European Union.

Golle, Philippe. 2006. “Revisiting the Uniqueness of Simple Demographics in the US Population.” In Proceedings of the 5th ACM Workshop on Privacy in Electronic Society, 77–80. WPES ’06. Alexandria, Virginia, USA: ACM. https://doi.org/10.1145/1179601.1179615.

Haley, J. 2017. “Ethics in Primary Research (Focus Groups, Interviews and Surveys).” London: Advance HE.

Sweeney, Latanya. 2000. “Uniqueness of Simple Demographics in the US Population.” In LIDAP-WP4. Pittsburgh, PA: Carnegie Mellon University, Laboratory for International Data Privacy.