A question organizations working with AI-driven recruitment tools often ask: is my candidate data being used to train or improve AI models? It is a valid concern. At a time when privacy and data protection are paramount, and the EU AI Act and the GDPR impose strict requirements on the use of personal data, transparency is crucial. For CHROs, HR directors, and other decision-makers, it is essential to know how suppliers such as Selection Lab handle candidate data, whether this data is used for other purposes, and what safeguards are in place.
Using candidate data for AI training means that you use applicant data to teach a model patterns so that it can deliver better predictions, analyses, or generated output. This could involve training a matching model (who suits which job), a scoring model (chance of success), or a generative model that creates candidate profiles. Essentially, historical data is used as a learning basis to improve future decisions or recommendations.
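To make "historical data as a learning basis" concrete, the toy sketch below derives per-skill hire rates from made-up outcome records and uses them to score a new candidate. The data, field names, and scoring rule are all hypothetical and far simpler than any production matching or scoring model:

```python
from collections import defaultdict

# Hypothetical historical outcome data: (skill set, hired yes/no).
# Not any vendor's actual schema; purely illustrative.
history = [
    ({"python", "sql"}, True),
    ({"python"}, True),
    ({"sql"}, False),
    ({"excel"}, False),
]

def train_skill_weights(records):
    """Learn a per-skill hire rate from historical records."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for skills, hired in records:
        for skill in skills:
            totals[skill] += 1
            if hired:
                hits[skill] += 1
    return {skill: hits[skill] / totals[skill] for skill in totals}

def score(candidate_skills, weights):
    """Score a new candidate as the average learned weight of known skills."""
    known = [weights[s] for s in candidate_skills if s in weights]
    return sum(known) / len(known) if known else 0.0

weights = train_skill_weights(history)
```

The point of the sketch: past applicants' outcomes directly shape how future applicants are scored, which is exactly why the reuse questions below matter.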
What data can be used for this purpose?
Various types of data can be used for training, such as professional data (work experience, education, skills), assessment results (cognitive scores, personality profiles), interaction data (chat conversations, interview transcripts), and outcome data (hired yes/no, performance after joining the company). Derived data such as structured competency profiles or embeddings can also be used, provided they are carefully managed.
Identifiable personal data (name, email, raw CV, video recordings) entails increased privacy and compliance risks. Special personal data (such as health or ethnicity) may not, in principle, be used for training purposes, unless a very specific legal exception applies. Even pseudonymized data remains subject to the GDPR.
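Why pseudonymized data remains personal data can be illustrated with a keyed hash: anyone who holds the key can reproduce the token and link it back to the candidate, so the data is pseudonymized rather than anonymized. The key and scheme below are illustrative assumptions, not a description of any specific vendor's implementation:

```python
import hashlib
import hmac

# Hypothetical pseudonymization key; in practice this would live in a
# managed secret store. Whoever holds it can re-identify candidates,
# which is why the output is still personal data under the GDPR.
SECRET_KEY = b"replace-with-a-managed-secret"

def pseudonymize(email: str) -> str:
    """Map a candidate email to a stable pseudonymous token (HMAC-SHA256)."""
    return hmac.new(SECRET_KEY, email.lower().encode(), hashlib.sha256).hexdigest()

token = pseudonymize("jane.doe@example.com")
```

The mapping is deterministic, so records can still be joined per candidate, which is useful for training but also exactly what keeps the data within the GDPR's scope.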
It is important to distinguish between using data for direct processing in a recruitment process and reusing data for model improvement. The latter requires a clear basis, data minimization, and appropriate security measures. Without explicit governance, training on candidate data can lead to legal risks, bias reinforcement, and reputational damage.
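The distinction above (direct processing versus reuse for model improvement) can be sketched as a simple purpose guard: a record carries the purposes the candidate agreed to, and training use is refused unless that purpose was recorded. The record structure and purpose labels are hypothetical, not a real API:

```python
from dataclasses import dataclass, field

@dataclass
class CandidateRecord:
    """Hypothetical record: candidate data tagged with its allowed purposes."""
    candidate_id: str
    purposes: set = field(default_factory=set)

def usable_for_training(record: CandidateRecord) -> bool:
    """Only allow reuse for model improvement if that purpose was recorded."""
    return "model_improvement" in record.purposes

selection_only = CandidateRecord("c-001", {"selection"})
with_training = CandidateRecord("c-002", {"selection", "model_improvement"})
```

A guard like this makes purpose limitation enforceable in code rather than a matter of policy documents alone.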
Candidate data is used to improve AI models because it helps those models function better within the recruitment context: historical examples allow a matching or scoring model to learn which profiles fit which roles and to refine its predictions over time.
The General Data Protection Regulation (GDPR) applies within Europe. The use of candidate data for AI training falls under the processing of personal data and must comply with strict conditions.
Legal basis: Organizations must have a valid legal basis, for example consent or legitimate interest. Consent must be freely given; in a job application context, this can be complicated by the power imbalance between applicant and employer.
Purpose limitation: Data may only be used for clearly defined purposes. If data was originally collected for selection purposes, an explicit assessment must be made as to whether reuse for model improvement is compatible with this.
Data minimization and retention periods: Only necessary data may be used. Retention periods must also be clear and proportionate.
Transparency: Candidates must know what data is collected, for what purposes it is used, how long it is retained, and what rights they can exercise. Transparency is crucial for trust in data-driven assessments.
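The retention-period requirement above can be sketched as a simple expiry check: records older than the defined period are flagged for deletion. The 30-day period is a hypothetical example, not a statement of any actual retention policy:

```python
from datetime import datetime, timedelta, timezone

# Hypothetical retention period; the GDPR requires it to be defined
# and proportionate, not a specific number of days.
RETENTION = timedelta(days=30)

def expired(created_at: datetime, now: datetime) -> bool:
    """True if the record has outlived the retention period."""
    return now - created_at > RETENTION

now = datetime(2024, 6, 1, tzinfo=timezone.utc)
old_record = datetime(2024, 4, 1, tzinfo=timezone.utc)
fresh_record = datetime(2024, 5, 20, tzinfo=timezone.utc)
```

Running such a check on a schedule is one straightforward way to turn a stated retention period into an enforced one.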
Data storage in the EU
All personal data and application data are stored on EU servers. We work with infrastructure and services in Europe, including AWS (Frankfurt and Dublin), SendinBlue (Belgium/Ireland), and Heroku (Dublin). Data is not transferred outside the EEA without the explicit consent of the customer. For continuity, failover mechanisms are in place to ensure that services remain available even in the event of incidents.
Clear roles and strict agreements with processors
We apply clear responsibilities in our processing. Selection Lab acts as the controller for candidate and employee assessments, while customers remain the controllers for their own HR processes. We work with sub-processors (such as AWS, Typeform, and SendinBlue) exclusively under signed processing agreements, so that obligations regarding security, confidentiality, and data processing are contractually laid down.
Access and security
We limit access to data to what is strictly necessary and provide extra security for privileged access. We do this through role-based access control (RBAC) and multi-factor authentication (MFA), among other measures. For identity management, we use Auth0 and Azure AD, with quarterly reviews of access rights and immediate revocation in the event of a role change or departure. In addition, we take technical measures such as encryption, pseudonymization/anonymization where appropriate, and central logging/monitoring via SIEM. We regularly perform penetration tests and vulnerability scans to identify and mitigate risks in a timely manner.
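The combination of role-based access control with extra protection for privileged access can be sketched as follows; the roles, permissions, and MFA rule below are hypothetical illustrations, not Selection Lab's actual configuration:

```python
# Hypothetical RBAC table: each role gets a minimal permission set.
ROLE_PERMISSIONS = {
    "recruiter": {"read_results"},
    "admin": {"read_results", "export_data", "manage_users"},
}

# Privileged actions additionally require a verified MFA session.
PRIVILEGED = {"export_data", "manage_users"}

def allowed(role: str, action: str, mfa_verified: bool) -> bool:
    """Deny by default; privileged actions need both the role and MFA."""
    if action not in ROLE_PERMISSIONS.get(role, set()):
        return False
    if action in PRIVILEGED and not mfa_verified:
        return False
    return True
```

The deny-by-default shape mirrors the data-minimization idea: access exists only where explicitly granted, and sensitive operations carry an extra factor.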
Privacy by design and control for candidates
Privacy is built into our processes. Candidates give explicit, unambiguous consent for each purpose, and that consent can be withdrawn at any time. Upon withdrawal, deletion is processed within 24 hours. We follow the GDPR principles (Art. 5–32), including data minimization, transparency, and facilitating data subject rights. A Data Protection Officer (DPO) has been appointed for supervision and assurance: Joeri Everaers.
Retention and sharing of results
We apply short retention periods and ensure that sharing results is always a conscious choice on the part of the candidate.
AI, fairness, and verifiability
When AI is used, we do so with clear boundaries and verifiable methods. We do not use characteristics such as gender, age, ethnicity, or religion in scoring. Our models are transparent and explainable (decision-tree based), and we perform independent bias checks. Within the framework of the EU AI Act, we classify this as high-risk AI, with corresponding safeguards to support reliability, non-discrimination, and auditability.