The AI platform purpose-built to prevent bias
Since 2014, Textio has pioneered ethical AI in HR. By training its guidance on real-world outcomes, Textio promotes equal opportunity in hiring and performance management. Find out why 25% of Fortune 500 companies trust Textio when it counts most: in the hiring and performance conversations that shape people’s careers.
Recognized on the Impact 20 list, as a most innovative company, as a top HR product, and as one of the fastest growing private companies.
Seriously secure
Your data is private
Textio AI automatically strips your content of personally identifiable information (PII) such as names, email addresses, and phone numbers.
Results are bias-free
Textio Verified scans the results from LLMs and removes bias, so you know the text is safe to use.
Security is best-in-class
Textio maintains a suite of certifications to ensure that our users’ data is safe and secure.
How we build mindful AI at Textio
Textio’s approach ensures that we build and apply this technology in a safe, ethical, and balanced way. That’s why many of our customers use Textio to screen for bias in other AI platforms’ results.
A diverse team
Just as diversity in a training dataset matters, a diversity of perspectives, backgrounds, and experiences on a team is necessary to create the best, safest, smartest products. Textio’s executive level is 70% women. We invest in DEIB internally with headcount, programming, training, and of course technology.
Designing for a specific purpose
Textio brings together over 30 different models that run in concert to generate its real-time guidance. Because it’s purpose-built for specific use cases like job posts and performance management, Textio can both improve what’s written and identify what’s missing.
Building with bias in mind
At Textio, we have several layers of bias mitigation and quality assurance built into our approach from the very first step of development.
Bias creeps into training data in two ways: lack of diversity and representation in the training examples, and bias among human labelers.
It’s not unusual for a dataset to be, for example, heavily skewed toward men in their 30s. That’s bias in the representation of the dataset. We mitigate this by balancing our datasets across demographics (gender, race, etc.).
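For concreteness, here is a minimal sketch of this kind of rebalancing, assuming the labeled examples live in a pandas DataFrame with a hypothetical demographic_group column; the column name and the downsampling strategy are illustrative, not a description of Textio’s actual pipeline:

# Illustrative sketch: rebalance a labeled dataset so each demographic
# group contributes an equal number of training examples.
import pandas as pd

def balance_by_group(df: pd.DataFrame, group_col: str = "demographic_group",
                     seed: int = 42) -> pd.DataFrame:
    # Downsample every group to the size of the smallest group.
    smallest = df[group_col].value_counts().min()
    balanced = (
        df.groupby(group_col, group_keys=False)
          .apply(lambda g: g.sample(n=smallest, random_state=seed))
    )
    return balanced.reset_index(drop=True)

# Example usage (file name is hypothetical):
# train_df = pd.read_csv("training_examples.csv")
# balanced_df = balance_by_group(train_df)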
Then there’s the risk of bias from human labelers. Before data can be used for AI training purposes, it must be “labeled,” or annotated, to show the model how it should interpret that data.
The problem is that the people labeling data can have their own biases. We assess this by having multiple people annotate the same text and by using a diverse group of experts to label our data. We measure how often the annotators agree with each other using Cohen’s Kappa statistic. High agreement (a kappa in the 0.75–0.80 range or above) indicates better-quality data and helps ensure individual bias is not introduced.
In cases where annotators disagree, a separate annotator breaks the tie before the data is used for model training. If agreement falls below 0.75, we retrain the annotators and re-annotate the data.
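As a simple illustration, inter-annotator agreement can be computed with Cohen’s Kappa using scikit-learn; the labels below are made up, and the threshold mirrors the agreement bar described above:

# Illustrative sketch: measure agreement between two annotators with
# Cohen's Kappa and flag datasets that need re-annotation.
from sklearn.metrics import cohen_kappa_score

# Hypothetical labels from two annotators over the same ten phrases
# (1 = "flag this phrase", 0 = "do not flag").
annotator_a = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
annotator_b = [1, 0, 1, 0, 0, 0, 1, 0, 1, 1]

kappa = cohen_kappa_score(annotator_a, annotator_b)
print(f"Cohen's Kappa: {kappa:.2f}")

# Below the agreement bar, the data goes back for annotator retraining
# and re-annotation rather than into model training.
AGREEMENT_THRESHOLD = 0.75
if kappa < AGREEMENT_THRESHOLD:
    print("Agreement too low: retrain annotators and re-annotate.")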
We use classification—a machine learning method—to determine the context of a phrase and whether it should be “flagged” according to what you’re writing (employee feedback, a performance review, a job post, etc.).
We test for bias in our classification models using a common ML metric called the F-score, a statistic that balances precision (how many of the phrases the model flags are correct) and recall (how many of the correct phrases it finds). If we test our model over the same data with different names (“Sally is a bubbly person” and “Bob is a bubbly person”), we should see consistent F-scores. If the F-scores are not consistent, there is bias.
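A minimal sketch of this kind of name-swap check, assuming a trained classifier that exposes a scikit-learn-style predict method; the model, data, and tolerance are illustrative:

# Illustrative sketch: compare F-scores on the same test set with only
# the names swapped. A consistent score suggests the classifier is not
# keying off gendered names; a gap suggests bias to investigate.
from sklearn.metrics import f1_score

def f_score_for_variant(model, texts, labels):
    # "model" is assumed to expose a scikit-learn-style predict() method.
    predictions = model.predict(texts)
    return f1_score(labels, predictions)

def name_swap_gap(model, texts_with_sally, texts_with_bob, labels):
    # Identical sentences with identical labels; only the name differs.
    score_sally = f_score_for_variant(model, texts_with_sally, labels)
    score_bob = f_score_for_variant(model, texts_with_bob, labels)
    return abs(score_sally - score_bob)

# Example usage (tolerance is illustrative):
# gap = name_swap_gap(model, sally_sentences, bob_sentences, gold_labels)
# if gap > 0.02:
#     print("Inconsistent F-scores across name variants: investigate for bias.")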
When we find bias, we identify its source, remediate appropriately, and retest. Some things we consider:
- Is the data balanced? Do we need more representative data?
- Do we need to choose a different model? Or do we need to re-train the model with different parameters?
- Is there bias in the data we use to measure the performance of the model?
- Should we incorporate “in-processing fairness techniques” that influence how the model learns? (See the sketch after this list.)
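One widely used family of in-processing techniques constrains the model while it trains. The sketch below uses the open-source Fairlearn library with a placeholder estimator, features, and constraint; it illustrates the general idea rather than Textio’s actual models:

# Illustrative sketch: an "in-processing" fairness technique that
# constrains a classifier during training, using Fairlearn.
from fairlearn.reductions import ExponentiatedGradient, DemographicParity
from sklearn.linear_model import LogisticRegression

def train_with_fairness_constraint(X, y, sensitive_features):
    # The reduction retrains the base estimator under a demographic
    # parity constraint so flag rates stay similar across groups.
    mitigator = ExponentiatedGradient(
        estimator=LogisticRegression(),
        constraints=DemographicParity(),
    )
    mitigator.fit(X, y, sensitive_features=sensitive_features)
    return mitigator

# Example usage with hypothetical training data and group labels:
# fair_model = train_with_fairness_constraint(X_train, y_train, groups_train)
# predictions = fair_model.predict(X_test)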
Textio’s generative AI features take input text or answers to prompts and can write, rewrite, or expand the message. We test to make sure that the output doesn’t contain demographic biases based on gender, race, or age.
One way we measure bias in these features is to vary names in the input text and test if the generated content is different. For example, we’d test “Rewrite: Sally is a bubbly person” and “Rewrite: Bob is a bubbly person” and compare the results.
To determine whether the differences are meaningful across demographic groups, we collect generative AI outputs for each variation (for example, male vs. female names) at a large scale. We then run a paired t-test to compare the distribution of words across the two groups. If there is a significant difference in the language used for one group compared to the other (defined by the p-value, where p < .05), we can confidently say the output of the generative AI model is biased (see the sketch after this list). If so, we would then:
- Do a qualitative analysis of the bias to identify the themes and characteristics of the differences
- Iterate on the prompt strategy and add hard-coded rules (if necessary) to correct the behaviors of the AI
- Remeasure
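For concreteness, here is a minimal sketch of the paired comparison described above, assuming each generated output has already been reduced to a single numeric score (for example, a count of personality-focused words); the scoring step and the data are hypothetical:

# Illustrative sketch: paired t-test over matched generative outputs.
# Each pair comes from the same prompt with only the name varied
# (e.g., "Sally" vs. "Bob"), and each output has been reduced to a
# single per-output score by a hypothetical scoring function.
from scipy.stats import ttest_rel

def paired_bias_test(scores_female_names, scores_male_names, alpha=0.05):
    # Paired t-test: the i-th entries of both lists come from the
    # same underlying prompt.
    t_stat, p_value = ttest_rel(scores_female_names, scores_male_names)
    biased = p_value < alpha
    return t_stat, p_value, biased

# Example usage with hypothetical per-output scores:
# t, p, biased = paired_bias_test(sally_scores, bob_scores)
# if biased:
#     print(f"Significant difference (p={p:.3f}): analyze, adjust prompts, remeasure.")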
Research
Textio regularly publishes industry-leading research on performance feedback and workplace bias. Message our research team at research@textio.com.