February 3, 2022
Clinical studies and privacy: Anonymize data with this Narrativa model
By Sofía Sánchez González
Privacy is one of the most talked-about topics these days, and keeping our data safe sometimes seems impossible. Clinical studies are not immune: to carry them out, participants' data must be anonymized so that science can advance while respecting the rights of those involved. But did you know that you can use artificial intelligence to anonymize data and comply with privacy regulations more easily?
How does our AI model work?
At Narrativa we wanted to provide a solution to the privacy problem. It all started with our technical team investigating medical files found in the Plan for the Promotion of Language Technologies.
The plan, known as Plan TL, aims to promote the development of natural language processing, machine translation and conversational systems in Spanish and other languages.
To that end, it regularly releases datasets. One of them contains clinical notes about patients, and that gave our technical team an idea.
https://github.com/PlanTL-GOB-ES/SPACCC_MEDDOCAN
In this dataset there is an extensive list of patient data:
- Names
- Surnames
- Location data
- Phone number
- Date of birth
- Date of admission
- Treating doctor
But to anonymize this data for a clinical study, we have to complete two steps.
Two steps to anonymization
1. Identify the information on each tab
The dataset contains a large amount of data, but it isn't structured. We start with the most difficult part: identifying the information and classifying it. (It's crucial to know where the sensitive and personal information is.)
https://github.com/PlanTL-GOB-ES/SPACCC_MEDDOCAN/blob/master/corpus/train/brat/S0004-06142005000500011-1.txt
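To make the identification step more concrete, here is a minimal sketch of reading annotations in the brat standoff format, in which each text-bound line holds an ID, a label with character offsets, and the annotated text, separated by tabs. The sample note, the NAME and DATE labels, and the helper names are made up for illustration (the real corpus defines its own label set), and the sketch assumes contiguous spans only. It is not Narrativa's model, which has to predict these spans rather than read them from a file.

```python
from typing import NamedTuple


class Entity(NamedTuple):
    label: str  # entity type, e.g. a name or a date
    start: int  # character offset where the span begins
    end: int    # character offset just past the end of the span
    text: str   # the annotated text itself


def parse_brat(ann: str) -> list[Entity]:
    """Parse the text-bound ("T") lines of a brat standoff annotation."""
    entities = []
    for line in ann.splitlines():
        if not line.startswith("T"):
            continue  # skip notes, relations and blank lines
        _id, label_and_offsets, text = line.split("\t", 2)
        # Assumption: contiguous spans only ("LABEL start end").
        label, start, end = label_and_offsets.split(" ")
        entities.append(Entity(label, int(start), int(end), text))
    return sorted(entities, key=lambda e: e.start)


# Made-up note and annotations, for illustration only.
NOTE = "Name: Ana. Admitted: 12/03/2019."
ANN = "T1\tNAME 6 9\tAna\nT2\tDATE 21 31\t12/03/2019\n"

entities = parse_brat(ANN)
for e in entities:
    # The offsets index directly into the note text.
    print(e.label, repr(NOTE[e.start:e.end]))
```

Each entity carries both its type and its exact position, which is what the masking step needs.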
2. Anonymization
https://github.com/PlanTL-GOB-ES/SPACCC_MEDDOCAN/blob/master/corpus/train/brat/S0004-06142005000500011-1.ann
Once we have identified all the patient data, we have to mask (or redact) their personal data to protect their privacy and comply with confidentiality protocols. The model created by Narrativa offers two masking options:
- To cover it up/black it out
- To use false/made-up information
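The two masking options can be sketched as follows. This is a simplified illustration under stated assumptions, not Narrativa's implementation: the `mask` function, the sample note, the span offsets, the labels, and the replacement values are all hypothetical.

```python
def mask(text, spans, surrogates=None):
    """Replace each (start, end, label) span in text.

    With surrogates=None, every span is blacked out as [LABEL].
    Otherwise the span is swapped for the made-up value given for its
    label, falling back to [LABEL] when no value is provided.
    """
    pieces = []
    cursor = 0
    for start, end, label in sorted(spans):
        pieces.append(text[cursor:start])  # keep the text before the span
        placeholder = f"[{label}]"
        if surrogates is None:
            pieces.append(placeholder)
        else:
            pieces.append(surrogates.get(label, placeholder))
        cursor = end
    pieces.append(text[cursor:])  # keep whatever follows the last span
    return "".join(pieces)


# Made-up note and spans, for illustration only.
NOTE = "Name: Ana. Admitted: 12/03/2019."
SPANS = [(6, 9, "NAME"), (21, 31, "DATE")]

redacted = mask(NOTE, SPANS)
pseudonymized = mask(NOTE, SPANS, {"NAME": "Eva", "DATE": "01/01/2000"})

print(redacted)       # Name: [NAME]. Admitted: [DATE].
print(pseudonymized)  # Name: Eva. Admitted: 01/01/2000.
```

Blacking out makes it obvious that something was removed, while made-up values keep the text readable for people or systems that expect natural-looking notes.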
Privacy and pharmaceutical companies
Data is the new oil, but we have to be careful when dealing with it—especially when it comes to information as sensitive and personal as medical data. Clinical studies must scrupulously comply with current regulations and Narrativa can help pharmaceutical companies to do this.
Anonymizing data is the most fundamental step; worldwide there has been a huge push for data protection regulation. Take a look at this study in Spanish that delves into the reasons.
Narrativa specializes in collecting, processing and analyzing data, so your company can focus on what's truly important. With this model, pharmaceutical companies will be able to streamline these privacy processes with much more powerful technology. In fact, the model achieves an F1 score of more than 70%.
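For readers unfamiliar with the F1 metric: it is the harmonic mean of precision (how many of the flagged items really were sensitive) and recall (how many of the sensitive items were flagged), which is not the same as plain accuracy. A minimal sketch of the calculation follows; the counts are made up for illustration and are not Narrativa's evaluation results.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Return the F1 score: the harmonic mean of precision and recall."""
    predicted = true_positives + false_positives  # everything the model flagged
    actual = true_positives + false_negatives     # everything it should have flagged
    if predicted == 0 or actual == 0 or true_positives == 0:
        return 0.0
    precision = true_positives / predicted
    recall = true_positives / actual
    return 2 * precision * recall / (precision + recall)


# Made-up counts: 70 entities found correctly, 30 flagged by mistake, 30 missed.
example = f1_score(true_positives=70, false_positives=30, false_negatives=30)
print(round(example, 2))  # 0.7
```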
A solution for all types of companies
Sure, pharmaceutical companies can benefit from this model, but so can businesses in other industries. It can also be decisive for companies that conduct research in other sectors, both social and economic.
When it comes to artificial intelligence, data must first be anonymized before it is used to train any model. As each company is different, Narrativa offers personalized solutions for your organization. If you want to anonymize any type of data at your company, don't hesitate to contact us!
About Narrativa
Narrativa is an internationally recognized business intelligence company that uses its artificial intelligence and machine learning platform to build and deploy natural language content solutions for enterprises. Its proprietary technology suite consists of data extraction, data analysis, natural language processing (NLP) and natural language generation (NLG) tools that all work together seamlessly to power a lineup of smart content creation, reporting automation and process optimization products.
Contact us to learn more about our solutions!