In the context of developing and implementing AI-based systems, I have observed three key bias-related challenges:
- Unquestioned acceptance of a practice gives rise to a bias
- Some biases are well hidden
- The datasets often generate the bias
Let us look at each of these in detail.
Unquestioned acceptance of a practice gives rise to a bias
English grammar teachers have long taught us to prefer the active voice (e.g. "The cat ate the rat") over the passive voice (e.g. "The rat was eaten by the cat"). Over time, this has become almost axiomatic. Run the text of any article through the grammar checker of MS Word or Grammarly and it will highlight the passive sentences and suggest changes.
Is the passive voice really that bad? Boni Wagner-Stafford, co-founder of Ingenium Books, argues in a great article that there are four good reasons to use the passive voice, including that readers perceive it as "…more professional and more authoritative." Nonetheless, the bias has become established, consciously and otherwise, in society. It is quite possible that a machine-learning admissions program screens thousands of admission essays for "0% Passive Sentences", and that a single passive sentence lowers the score of an otherwise outstanding essay! A naive version of such a screen is sketched after the warning below.
Warning: This article is strewn with passive clauses and very passive sentences!
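To make the concern concrete, here is a minimal sketch of how such a screen might penalise essays. The regex, the base score, and the per-hit penalty are my own illustrative assumptions; this does not describe any real admissions system.

```python
import re

# A naive passive-voice screen of the kind such a program might apply.
# The pattern (a "to be" form followed by a word ending in -ed/-en) and
# the 5-point penalty are illustrative assumptions, not a real system.
PASSIVE = re.compile(
    r"\b(?:am|is|are|was|were|be|been|being)\s+\w+(?:ed|en)\b",
    re.IGNORECASE,
)

def score_essay(text: str, base_score: float = 100.0, penalty: float = 5.0) -> float:
    """Deduct a fixed penalty for every passive construction detected."""
    hits = len(PASSIVE.findall(text))
    return max(0.0, base_score - penalty * hits)

print(score_essay("The rat was eaten by the cat."))  # 95.0 -- one passive hit
print(score_essay("The cat ate the rat."))           # 100.0 -- no deduction
```

Note how crude the heuristic is: it inherits the teachers' bias wholesale, and an otherwise outstanding essay loses points for a single stylistic choice.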
Some biases are well hidden
Recent research at the University of Kansas shows that mutual fund managers may be influenced by the political leanings of the executives of the companies in their portfolios. The researchers report that "…partisan-leaning fund-managers allocate about 43 percent of their assets to firms whose executives have similar political leaning and allocate only 33 percent of their assets to firms with the opposite political leanings."
Robo-advisors are all the rage in the financial world, offering advice to millions of clients based on algorithms and rules possibly supplied by some of the very fund managers identified in the study above. The online user is offered a specially curated selection from thousands of options. The user would know little of the hidden biases, but would surely see the real-world outcomes of her investment decisions in the years to come. The bias is well hidden! The sketch below shows how such a tilt can hide inside an otherwise plausible ranking rule.
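This is a sketch only: the `political_alignment` field and the tilt weight are hypothetical stand-ins for the kind of hidden preference the Kansas study describes, not a field any real robo-advisor exposes.

```python
from dataclasses import dataclass

@dataclass
class Fund:
    name: str
    expected_return: float      # advertised annualised return, %
    political_alignment: float  # -1.0 .. +1.0, hypothetical label

def rank_funds(funds, manager_leaning: float, hidden_tilt: float = 0.3):
    """Rank funds by return, quietly nudged toward the manager's leaning.

    The client only ever sees the final ordering; the tilt term never
    appears in the disclosed methodology -- the bias is well hidden.
    """
    def score(f: Fund) -> float:
        return f.expected_return + hidden_tilt * manager_leaning * f.political_alignment
    return sorted(funds, key=score, reverse=True)

funds = [
    Fund("Fund A", 7.0, political_alignment=+0.9),
    Fund("Fund B", 7.2, political_alignment=-0.9),
]
# On pure expected return, Fund B should rank first; with the hidden
# tilt, a leaning "manager" puts Fund A on top instead.
print([f.name for f in rank_funds(funds, manager_leaning=+1.0)])  # ['Fund A', 'Fund B']
```

The point of the sketch is that nothing in the output reveals the tilt; only the real-world returns, years later, might.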
The datasets often generate the bias
Joy Buolamwini, in her MIT Master's thesis, showed that facial recognition systems identified the gender of lighter-skinned males with high accuracy but made significant errors on darker-skinned females. The thesis identified the root cause: the benchmark datasets evaluated were overwhelmingly lighter-skinned (79.6%–86.2% lighter-skinned subjects), only 24.6% female overall, just 4.4% darker-skinned female, and 59.4% lighter-skinned male. The suggested solution for improving accuracy was to develop more diverse datasets. Reports suggest that the companies have since updated their technology.
I advise clients to rigorously validate dataset diversity, and also to ensure diversity among the individuals managing the data! A minimal audit along these lines is sketched below.
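Here is the kind of first-pass diversity audit I have in mind. The group labels, the sample counts (chosen to loosely echo the proportions cited above), and the 10% floor are illustrative assumptions; real audits should use domain-appropriate categories and thresholds.

```python
from collections import Counter

def audit_dataset(records, group_key: str, min_share: float = 0.10):
    """Print each group's share of the dataset and flag under-represented ones."""
    counts = Counter(r[group_key] for r in records)
    total = sum(counts.values())
    for group, n in sorted(counts.items()):
        share = n / total
        flag = "  <-- under-represented" if share < min_share else ""
        print(f"{group}: {n} ({share:.1%}){flag}")

# Hypothetical face dataset, loosely echoing the proportions cited above.
faces = (
    [{"group": "lighter male"}] * 594
    + [{"group": "lighter female"}] * 202
    + [{"group": "darker male"}] * 160
    + [{"group": "darker female"}] * 44
)
audit_dataset(faces, "group")  # flags 'darker female' at 4.4%
```

A report like this, reviewed by a diverse team before training, surfaces exactly the imbalance the thesis found.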
Summary
Traditional IT systems were built by first defining the process and then moving data through its steps. In AI-based systems, the data often determines the process and the outcomes. Here, GIGO (garbage in, garbage out) is worse: smellier and sometimes downright dangerous. Decision makers need to consciously raise the issue of bias and address it in their business processes.
Pic Credits: https://free-images.com