Working with sensitive data
Sensitive Research Data Tool
All researchers are encouraged to complete the Sensitive Research Data Tool when working on a project. The tool advises on the sensitivity level of your data, suggests the best publication method, helps determine ethics requirements and more.
Why classify research data by sensitivity?
All types of research data carry risks associated with their reuse, but the risk varies greatly between data types. It is important to understand how sensitive your research data is. The sensitivity level affects how you manage, store, share, publish and re-use data in a way that protects you, your research participants, the university and any other groups associated with your research.
Not all data has the same level of sensitivity. The Sensitive Research Data Tool classifies data under one of four levels. These are the data sensitivity levels from most to least sensitive:
- High
- Medium
- Possible Sensitivities
- Low
What is considered sensitive data?
There are many types of data that can be considered sensitive, including but not limited to:
- Personal data about participants
- Data related to vulnerable groups, e.g. children, Indigenous communities
- Ecological data
- Data with military applications
- Data involving biological agents
- Data collected via sensors, cameras or other technologies
- Non-public corporate data
- Data intended for future commercial release
- Data collected under an agreement with a third party
If your research data contains multiple types of sensitive data, the highest sensitivity level among those types should be applied to the overall dataset. For example, if your research includes both high sensitivity and medium sensitivity types of data, the sensitivity level of the whole dataset would be considered high.
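The rule above amounts to taking the maximum over an ordered scale. A minimal Python sketch follows, assuming the four levels listed earlier on this page are ranked from Low to High. The function name `overall_sensitivity` and the numeric ranks are illustrative only and are not part of the Sensitive Research Data Tool:

```python
# Illustrative ranks for the four levels named on this page (assumed ordering).
SENSITIVITY_RANK = {
    "Low": 0,
    "Possible Sensitivities": 1,
    "Medium": 2,
    "High": 3,
}


def overall_sensitivity(levels):
    """Return the highest sensitivity level among the data types in a dataset."""
    if not levels:
        raise ValueError("at least one sensitivity level is required")
    # An unrecognised level name raises KeyError rather than being ignored.
    return max(levels, key=SENSITIVITY_RANK.__getitem__)


print(overall_sensitivity(["Medium", "High", "Low"]))  # High
```

A dataset holding only Low and Possible Sensitivities data types would be classed as Possible Sensitivities under the same rule.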
How do I go about publishing sensitive research data?
When publishing research data, the general rule is to publish in the most open way possible. Completing the Sensitive Research Data Tool will suggest the most open publication method that your data's sensitivities allow. For more information about how to publish your data to CQUniversity's institutional repository, aCQUIRe, please visit the Using Data Manager page.
The many types of personal data
Personal data about research participants is some of the most sensitive data regularly handled by researchers.
Personal data can contain several types of information.
Some of this data can directly identify a particular participant in a research project. Examples include:
- Name
- Date of Birth
- Physical Address/s
- Electronic Address/s – this includes email and IP addresses
- Phone Number
- Other Identification numbers – e.g. Medical Record Number, Employee Number
- Biometric data – this includes photographs
Some personal data does not directly identify a research participant but can be combined with other personal information to allow for a participant to be identified. This is known as re-identifiable data.
Two of the most common types of re-identifiable data are postcodes/suburbs and age (either exact or ranges). Combined with other sensitive personal data, as outlined below, these could allow a participant to be identified.
A subset of re-identifiable data is classed as sensitive personal information. This is information that, if made public, increases the chance of negative reactions, bias or adverse actions against participants.
Examples of sensitive personal data include:
- racial or ethnic origin
- political opinions
- membership of a political association
- religious beliefs or affiliations, philosophical beliefs
- membership of a professional or trade association or trade union
- sexual preferences or practices
- criminal record
- gender identity
- medical condition
These pieces of personal information can be combined, either with each other or with other re-identifiable data, to allow a participant to be identified.
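To illustrate how combined fields can single out a participant, the sketch below counts how many records share each combination of re-identifiable values. The records, field names and the helper `unique_combinations` are hypothetical examples, not a prescribed method. A combination that appears only once points to exactly one participant:

```python
from collections import Counter

# Hypothetical records: no names, only re-identifiable fields.
records = [
    {"postcode": "4700", "age_range": "30-39", "medical_condition": "asthma"},
    {"postcode": "4700", "age_range": "30-39", "medical_condition": "diabetes"},
    {"postcode": "4701", "age_range": "60-69", "medical_condition": "asthma"},
]


def unique_combinations(rows, fields):
    """Return the field-value combinations shared by exactly one row."""
    counts = Counter(tuple(row[f] for f in fields) for row in rows)
    return [combo for combo, n in counts.items() if n == 1]


print(unique_combinations(records, ["postcode", "age_range"]))
# [('4701', '60-69')]
```

Adding `medical_condition` to the list of fields makes every record in this small example unique, which is why sensitive personal data needs to be considered together with postcode and age.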
De-Identifying data
A common way to share or publish personal information about participants is to de-identify the data. De-identification removes the details that would allow any particular participant to be identified from the published data. Keeping both the original and the de-identified data is considered best practice, as it allows for data verification.
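As a rough illustration of the idea, the following Python sketch produces a de-identified copy of a dataset by dropping direct identifier fields while leaving the original records untouched. The field names and the function `de_identify` are assumptions made for this example. Removing direct identifiers alone may not be enough, because re-identifiable fields such as postcode and age remain, so treat this as a starting point rather than a complete procedure:

```python
# Assumed field names for direct identifiers; adjust to match your dataset.
DIRECT_IDENTIFIERS = {"name", "date_of_birth", "address", "email", "phone"}


def de_identify(rows, identifiers=DIRECT_IDENTIFIERS):
    """Return new rows without the identifier fields; input rows are not modified."""
    return [
        {key: value for key, value in row.items() if key not in identifiers}
        for row in rows
    ]


original = [
    {"name": "Participant A", "email": "a@example.org", "age_range": "30-39", "score": 7},
]
published = de_identify(original)

print(published)  # [{'age_range': '30-39', 'score': 7}]
print("name" in original[0])  # True
```

Because the function builds new dictionaries, the original data stays available for verification.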
Guidelines on best practices for data de-identification are available from ARDC’s Sensitive Data page, which includes links to a number of different Australian and International guidelines and frameworks.
Looking for a procedural approach to anonymising your data? Check out the UK Data Service's Step By Step page, which includes guidance on what is required for de-identifying both quantitative and qualitative data.
Ethics and research data
Professional bodies, institutions and funding organisations follow the Australian Code for the Responsible Conduct of Research in order to maintain confidentiality of research data and primary materials by protecting the data from unauthorised access and use.
Ethical treatment of data applies to:
- Collecting data
- Creating/analysing data
- Storing data
- Whether or not data will be shared, and associated access provisions
CQUniversity has three ethics committees for different research types.
In addition, there are specialist requirements for any research undertaken involving the Great Barrier Reef.
Full details about these different boards and relevant procedures are available from the CQUniversity Ethics and Integrity page.
Further reading on sensitive data
ARDC’s Sensitive data page – This page includes detailed information on publishing sensitive data, how to de-identify sensitive data, tips for working with Indigenous data and more.
The De-Identification Decision-Making Framework – Developed by CSIRO, this document includes a highly detailed approach to ensuring proper de-identification of data.
Five Safes framework – Originally developed by the UK's Office for National Statistics and adopted by many groups, including the Australian Bureau of Statistics. Designed to balance risk and data access for personal data that could potentially identify individuals.
CARE Principles – CARE principles are designed to allow for Indigenous data to be properly governed and re-used with respect.
ARDC’s FAIR data page – Includes detailed information about how to make your data FAIR, including a self-assessment tool to determine how FAIR your published data is.