For almost every organization, data protection requirements are painful and burdensome. Irrespective of jurisdiction, holding personal data in most cases entails corresponding duties (even China has a new data protection regime coming into force). That is a fact, and with the increasing pressure on data privacy we should expect ever more stringent rules, while every data breach may cost a lot in terms of money (fines) and reputation.
But we are not here to worry you; we are here to show you that sharing can indeed be caring, and that compliance with the relevant laws and regulations is not as difficult as you may imagine.
But let’s start with an explanation. Using AI-based models or systems, including natural language processing tools, requires data. That data can be personal or non-personal, depending on the particular use case and needs, and it can come from internal sources (the company’s systems) or external ones (websites, external data pools). The NLP tool will have to process such data to deliver the outcome you need, which includes processing personal data where applicable.
One way to mitigate the risk associated with external providers (a risk that is often somewhat exaggerated) is to deploy the NLP solution on-premises and perform every task needed to maintain a fully fledged model internally. This is, however, rarely the optimal way to leverage automation, as any calibration and adjustment of the model must then be done by people within the organization rather than by those who built the NLP tool.
The second way is built on trust. A provider of an AI-based system or tool that is external to your organization must ensure that all transfers and processing of data comply with the relevant laws and regulations. Providers use various techniques and methods to protect incoming and outgoing data, in particular encryption and pseudonymization (though pseudonymization is not always the best fit for every use case). Providers are also subject to stringent cybersecurity rules that should ensure the data entrusted to them is properly secured.
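To make the pseudonymization idea more concrete, here is a minimal sketch (not any particular provider’s implementation) of how a direct identifier such as an e-mail address could be replaced with a keyed, irreversible token before text ever leaves your systems. The regular expression, the key handling and the `pseudonymize` helper are illustrative assumptions, not features of a specific product.

```python
import hmac
import hashlib
import re

# Assumption: a secret key kept inside your organization (e.g. in a vault)
# and never shared with the external NLP provider.
SECRET_KEY = b"replace-with-a-key-from-your-secret-store"

# Illustrative pattern: e-mail addresses as one example of a direct identifier.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def tokenize(value: str) -> str:
    """Map an identifier to a stable, keyed token (same input -> same token)."""
    digest = hmac.new(SECRET_KEY, value.lower().encode("utf-8"), hashlib.sha256)
    return f"<PSEUDONYM:{digest.hexdigest()[:16]}>"


def pseudonymize(text: str) -> str:
    """Replace e-mail addresses with tokens before sending text to a provider."""
    return EMAIL_PATTERN.sub(lambda m: tokenize(m.group(0)), text)


if __name__ == "__main__":
    sample = "Please contact jane.doe@example.com about the overdue invoice."
    print(pseudonymize(sample))
    # -> Please contact <PSEUDONYM:...> about the overdue invoice.
```

Because the same input always maps to the same token, the provider can still tell that two documents mention the same person, while the key needed for any re-identification stays on your side. This is also why pseudonymized data is generally still treated as personal data under regimes such as the GDPR.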
You should also not forget that your partner is itself subject to the relevant data protection rules and requirements and may become a data controller or data processor, which must adhere to certain standards and is likewise subject to the supervision of the data protection authority. Such an entity has the same ‘interest’ in protecting data as you have.
In the end, the decision whether to choose an on-premises deployment or a more flexible SaaS model is yours. You can also consider not granting the provider access to personal data at all and training your NLP model only on non-personal data (see the sketch below). If you go that route, do not forget that the effectiveness and accuracy of the model may be impaired and the outcome may not meet your expectations. The choice is yours, choose wisely.
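If you do want to keep personal data out of the training pipeline altogether, one deliberately conservative pre-processing step is to drop any training example that appears to contain a direct identifier. The patterns below (e-mail addresses and phone-like numbers) are illustrative assumptions only; a real pipeline would rely on a dedicated PII-detection step and your own data classification.

```python
import re

# Illustrative patterns for two common direct identifiers; real pipelines
# would use a proper PII-detection step and organization-specific rules.
PII_PATTERNS = [
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),   # e-mail addresses
    re.compile(r"\+?\d[\d\s().-]{7,}\d"),      # phone-like numbers
]


def contains_pii(text: str) -> bool:
    """Conservative check: treat any pattern match as personal data."""
    return any(p.search(text) for p in PII_PATTERNS)


def filter_training_corpus(examples: list[str]) -> list[str]:
    """Keep only examples with no detected personal data."""
    return [t for t in examples if not contains_pii(t)]


if __name__ == "__main__":
    corpus = [
        "The invoice was paid three days after the due date.",
        "Call John on +48 123 456 789 to confirm delivery.",
    ]
    print(filter_training_corpus(corpus))
    # -> only the first sentence remains in the training set
```

Filtering this aggressively shrinks the corpus and removes exactly the kind of contextual examples the model might otherwise learn from, which is the accuracy trade-off mentioned above.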