[Photograph by Sergey Klimkin under Creative Commons]
The last few months have been an eye-opener for me on how ubiquitous the use of psychometrics as an assessment tool is. My new role as the founder of a company dedicated to leadership development, and my engagement with various organisations, brought this home to me. I have realised that this trend is driven by well-meaning people in organisations who chase objectivity and want to eliminate subjectivity (read: judgement) in selecting managers and leaders. However, during my engagements, I recognised that most of the HR functionaries responsible for talent acquisition and leadership development have not taken the time to sit back and reflect on the limitations of psychometric tools and on how best to leverage them.
I was no different until KV Kamath, then MD and CEO of ICICI Bank, opened my eyes in July 2002. I was heading the HR function at ICICI Bank at the time, and I too was blindly using a combination of intelligence tests and behaviour profiling tools to select managers and leaders. Kamath, as is his wont, challenged me on this. How could I be sure, he asked, that these tools were not actually filtering out the best managers and leaders?
I told him that internationally acclaimed test development organisations had developed these tools and that they were well researched. He asked me to carry out my own research by administering the same tests to 50 leaders who were one level below the board and whose capabilities as leaders were beyond dispute, given their track record over at least 15 years. Most of these leaders are today CEOs of large organisations across the world.
The rest, as they say, is history. Based on the tests, only one among the 50 would have been selected. I was shocked. My mission to deconstruct the world of intelligence tests and psychometrics commenced that day. This article has been brewing in my mind for years.
It is important for business leaders and HR practitioners to take a step back and re-examine the unintended consequences of blindly using off-the-shelf psychometric tools for assessment. This article is intended to help them think about psychometric tools from a new perspective.
Can human ability and behaviour be measured?
The quest to predict human ability conclusively is age-old. The period following World War II saw the indiscriminate use of IQ tests to find the “whiz kids”. This led institution after institution to use IQ tests blindly, with scant understanding of their application and limitations.
In his book The Best and the Brightest, journalist-turned-author David Halberstam chronicles how in the 1960s, the high IQ “whiz kids” in John F Kennedy’s administration crafted “brilliant policies that defied common sense” with respect to Cuba and Vietnam, which led to disastrous consequences for the US.
This is what Kamath was drawing my attention to: leaders in industry can become academic and invite disastrous consequences by selecting or rejecting people with inappropriate tools and norms.
Alfred Binet, whose intelligence scale was the forerunner of modern IQ tests, had himself cautioned against using such tests to predict human ability. Eminent psychologists such as Jean Piaget, David McClelland, Richard Boyatzis and Daniel Goleman have questioned the propriety of using IQ tests and other such tools to judge or predict human ability.
I believe that it is important to challenge this lazy and ill-informed practice and put psychometrics in perspective. Can psychometric tools predict someone’s success or failure in a job or leadership role? Can these tools “measure” human behaviour? Is there a “unit of measure” of human behaviour? Distance is measured in kilometres, volume in kilolitres, speed in kilometres per hour, and weight in kilograms. How can behaviour have calibrated measures like these?
Human behaviour is indeterminate; it is a response to stimuli from the environment. Its frequency, strength, volatility or stability, flexibility and, above all, its impact cannot be measured, because we have not yet designed a calibrated scale for them.
Informed professionals will challenge me by referring to the Likert scale. The scale gauges attitudes, values and opinions based on the extent to which respondents agree or disagree with a set of statements.
However, the Likert scale delivers broad judgement at best, not measurement. When more than one person responds to statements about the frequency, strength, volatility or stability of someone’s behaviour, the scale helps in forming a judgement. But it does not provide an estimate of the behaviour, let alone a measurement, because no two people’s experience of another person’s behaviour can be consistently recalled and accurately reported.
The other issue with the measurement logic is that the scale used for measurement has to be calibrated. This means that every point on the scale must correspond to a definite quantity, so that we arrive at a quantitative value whose variance stays within an acceptable range, irrespective of who does the measuring.
Let us take an example. Respondents rate the following statement:
“He/She understands the unexpressed motives of others.”
- Always
- Most of the time
- Sometimes
- Rarely
In a calibrated scale, the distance in terms of measurement value between two measurement indicators on the scale has a fixed value or proportion.
So in this example, how fixed are the measurement values between “Always” and “Most of the time”; and between “Most of the time” and “Sometimes”?
Is it “one unit” between “Always” and “Most of the time”; one unit between “Most of the time” and “Sometimes”, and two units between “Always” and “Sometimes”?
“Always”, “Most of the time” and “Sometimes” cannot have fixed numerical values. This is different from a scale that says “100 times in a year”, “50 times in a year” and “25 times in a year”.
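To make the spacing problem concrete, here is a small Python sketch. The statement and the four response labels come from the example above; the two people, their ratings and the three numeric codings are invented for illustration. Each coding keeps the order “Always” > “Most of the time” > “Sometimes” > “Rarely”, and nothing in the verbal labels says which spacing is the correct one.

```python
# Hypothetical illustration: the response labels are from the example above,
# but the ratings and the numeric codings are invented.

# Three codings that all preserve the order
# Always > Most of the time > Sometimes > Rarely.
CODINGS = {
    "evenly spaced": {"Always": 4, "Most of the time": 3, "Sometimes": 2, "Rarely": 1},
    "stretched at the top": {"Always": 10, "Most of the time": 5, "Sometimes": 2, "Rarely": 1},
    "compressed at the top": {"Always": 4, "Most of the time": 3.8, "Sometimes": 3.5, "Rarely": 1},
}

# Two hypothetical people, each rated by two colleagues on the same statement.
RATINGS = {
    "Person A": ["Always", "Rarely"],
    "Person B": ["Most of the time", "Sometimes"],
}


def mean_score(responses, coding):
    """Average the numeric codes assigned to a list of verbal responses."""
    return sum(coding[r] for r in responses) / len(responses)


def compare(coding):
    """Return each person's mean score under one coding."""
    return {person: mean_score(resp, coding) for person, resp in RATINGS.items()}


if __name__ == "__main__":
    for name, coding in CODINGS.items():
        scores = compare(coding)
        print(f"{name:>22}: " + ", ".join(f"{p} = {s:.2f}" for p, s in scores.items()))
```

With these invented numbers, the two people tie under the evenly spaced coding, Person A comes out ahead when the top of the scale is stretched, and Person B comes out ahead when it is compressed. The ranking hinges on a spacing that the respondents never supplied.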
The Likert scale notwithstanding, we do not yet know how to measure human behaviour. At best, we can confirm the presence, absence or preference of a behaviour, along with a judgment of its frequency and strength. This is not even like estimating (rather than measuring) the size of a room. When we describe a room as “very big”, “big” or “small”, we are judging, not measuring.
When the output from these psychometric instruments is presented as numerical values such as indices or normative scores, it fools us into believing that we have achieved measured, quantitative objectivity.
Can psychometric tests reliably predict job success or leadership success?
We should be certain that direct, not surrogate, outcome indicators connect a particular behaviour to a specific outcome. Performance or talent ratings are surrogate indicators and will not suffice. How sure are we that an extravert orientation brings more success in sales than an introvert orientation? Even the Myers-Briggs Type Indicator (MBTI), the orientation profiling tool, does not claim so. More importantly, what direct measure of selling can be linked to extraversion or introversion?
We need to establish that a particular behaviour is not merely related to a particular outcome, but is the driver that causes it. Hence, correlation will not suffice. Correlation only tells us that when a particular behaviour is present, a particular outcome also tends to be present; it does not tell us which causes which, or whether something else causes both. For example, fever and infection occur together, but observing them together does not tell us what caused the infection. At ICICI Bank, by blindly linking analytical and quantitative ability to leadership success, I would have caused great damage had it not been for Kamath’s challenge.
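A toy simulation can show why a correlation on its own cannot settle causation. In this Python sketch, which is an illustration under stated assumptions rather than a model of any real test, a hypothetical hidden factor feeds both a “behaviour score” and an “outcome”. Neither is computed from the other, yet they come out clearly correlated. The Pearson coefficient is also symmetric, so it cannot say which variable drives which. The function names, sample size and noise levels are assumptions made for the example.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_common_cause(n=20_000, seed=1):
    """Toy model (hypothetical): a hidden factor drives both variables.

    The behaviour score and the outcome are each the hidden factor plus
    independent noise; neither is computed from the other. With unit
    variances everywhere, the population correlation is 0.5.
    """
    rng = random.Random(seed)
    behaviour, outcome = [], []
    for _ in range(n):
        hidden = rng.gauss(0, 1)                    # e.g. a supportive context (hypothetical)
        behaviour.append(hidden + rng.gauss(0, 1))  # no term involving the outcome
        outcome.append(hidden + rng.gauss(0, 1))    # no term involving the behaviour
    return behaviour, outcome


if __name__ == "__main__":
    behaviour, outcome = simulate_common_cause()
    print(f"corr(behaviour, outcome) = {pearson(behaviour, outcome):.3f}")
    print(f"corr(outcome, behaviour) = {pearson(outcome, behaviour):.3f}")
```

Both printed values are identical and should land close to 0.5, even though, by construction, the behaviour score has no effect on the outcome at all.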
When the designers of these tools claim that they have established predictive validity (that is, that the test results predict later performance), we have to take it with a pinch of salt. If, say, they claim to predict leadership success, should we not ask the following?
- What is the “measure” of leadership success? Is it the same in all contexts?
- Which characteristic in a multi-characteristic tool have they found to have predictive validity for leadership success? Let us say that trust, power, perspectives, risk appetite and political savvy are the characteristics that influence leadership outcomes. Should we not ask what algorithm was used to establish predictive validity? It is almost impossible to establish the predictive validity of even a single characteristic, say trust, for leadership outcomes, especially when we have no clarity on what the measure of the outcome is. So most designers use distant measures such as an individual’s performance rating or an internal talent rating. These are what I call surrogate measures, and they are themselves the outcome of gross judgment.
Let us grant some leeway and accept this approach of surrogate measures. Let us also accept that, instead of a causal connection, there is only a correlational one. Should we then not at least ensure that the correlation coefficients are in excess of 0.65 to 0.80? (The closer the coefficient is to 1, the stronger the correlation.)
To date, I have not come across any psychometric tool that shows a correlation of more than 0.2 even with these surrogate indicators. A correlation of 0.2 between two variables is little better than a guess, and it does not tell us what caused the outcome. How risky is it, then, to use these tools for selection and elimination?
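The weakness of a 0.2 correlation can be put in numbers. Squaring the coefficient gives the share of variation in the outcome that a straight-line relationship with the test score accounts for: 0.2 × 0.2 = 0.04, or 4 percent. The Python sketch below goes a step further with a toy model, assuming (purely for illustration) that test scores and job performance follow a bivariate normal distribution with correlation r, and that the top 20 percent of test scorers are selected. The function name and parameter values are hypothetical.

```python
import math
import random


def selection_hit_rate(r=0.2, n=50_000, top_share=0.2, seed=7):
    """Share of people selected on the test who are also top performers.

    Toy model (an assumption, not empirical data): test score and performance
    are standard normal with correlation r. The top `top_share` of test
    scorers are selected; "top performers" are the top `top_share` on
    performance.
    """
    rng = random.Random(seed)
    tests, perfs = [], []
    for _ in range(n):
        t = rng.gauss(0, 1)
        # Mix the test score with independent noise so that
        # var(perf) = 1 and corr(test, perf) = r.
        p = r * t + math.sqrt(1 - r * r) * rng.gauss(0, 1)
        tests.append(t)
        perfs.append(p)

    k = int(n * top_share)
    by_test = sorted(range(n), key=lambda i: tests[i], reverse=True)
    by_perf = sorted(range(n), key=lambda i: perfs[i], reverse=True)
    selected = set(by_test[:k])
    top_performers = set(by_perf[:k])
    return len(selected & top_performers) / k


if __name__ == "__main__":
    for r in (0.0, 0.2, 0.65, 0.8):
        share = selection_hit_rate(r)
        print(f"r = {r:.2f}: share of selected who are top performers = {share:.2f}")
```

Because the selected group and the top-performer group are the same size, one minus the printed share is also the fraction of genuine top performers that the test rejects. Under these assumptions, at r = 0.2 most of the people selected are not top performers; the rows for r = 0 (pure chance) and for the 0.65 to 0.80 range show how much a stronger correlation would change that.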
We also have to be sure that a particular outcome is not caused by any other behaviour. In the sales example, let us say for the sake of argument that extraversion drives success. How sure are we that perseverance does not also cause the same outcome? Assuming both drive success in sales, how would we empirically assign weightage to extraversion and perseverance? If we were to assign weightage non-empirically, how can the outcome be objective?
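The weighting question can be illustrated with a short Python sketch. The two candidates, their 0 to 10 scores on extraversion and perseverance, and the two weightings are all invented. The point is only that when the weights are not established empirically, the choice of weights, rather than the candidates, can decide who ranks first.

```python
# Hypothetical profile scores on a 0-10 scale; the numbers are invented.
CANDIDATES = {
    "Candidate X": {"extraversion": 9, "perseverance": 4},
    "Candidate Y": {"extraversion": 5, "perseverance": 8},
}

# Two weightings someone might pick without empirical grounds (hypothetical).
WEIGHTINGS = {
    "extraversion-heavy": {"extraversion": 0.7, "perseverance": 0.3},
    "perseverance-heavy": {"extraversion": 0.3, "perseverance": 0.7},
}


def composite(scores, weights):
    """Weighted sum of a candidate's trait scores."""
    return sum(weights[trait] * value for trait, value in scores.items())


def ranking(weights):
    """Candidates ordered from highest to lowest composite score."""
    return sorted(CANDIDATES, key=lambda c: composite(CANDIDATES[c], weights), reverse=True)


if __name__ == "__main__":
    for name, weights in WEIGHTINGS.items():
        print(f"{name}: {' > '.join(ranking(weights))}")
```

With these made-up numbers, Candidate X ranks first under the extraversion-heavy weighting and Candidate Y under the perseverance-heavy one, although neither candidate’s profile has changed.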
It is also possible that a particular behaviour causes a particular outcome only when another behaviour is present or absent. For example, for nurturance to be effective, care together with faith in others’ ability is required, but there should be no competitiveness. What constructs will help us make this empirical? If it cannot be empirical, this is at best qualitative data supported by judgment. It cannot have empirically confirmed predictive validity.
We also have to know precisely the frequency, intensity, stability or flexibility of behaviour that is necessary to cause the particular outcome. For instance, copper melts at a specific temperature, while another metal may not melt at that temperature. At what level of intensity is someone passionate? At what level does she become obsessive? How much care is useful, and when does care obstruct a clinical focus on performance?
All these have to be tested and empirically established through longitudinal studies over long periods of time, with the same set of people, in pre-set contexts, where there is no other potential interference. How long should the longitudinal period be? How many people should be studied? How do we control and isolate the contextual factors? All these questions must be answered before we can claim to predict job, role or leadership success from a set of behaviours, using what we call psychometric tools.
My proposition is not that we should rely on our current subjective judgmental approach and not use any tools.
We currently use behaviour profiling tools blindly for behaviour measurement and prediction, and we use them irresponsibly for selecting and rejecting people.
Instead, if we accept that profiling tools do not offer measurement or prediction of outcomes, but are more like tools that generate data to support decisions, we will make better use of them.
Behaviour profilers are not like pathology tests, which are truly measurement-based and predictive. They are not 3D imaging tools that accurately capture every feature of a person. They are at best thumbnail sketches of people. This is not biometrics. It is like using your hand instead of a thermometer to check for fever. These are not calibrated maps, but a pencil sketch on a piece of paper showing the general route to be taken.
With this perspective, profilers will not leave us confused about whether the person whose profile we are studying has the characteristics of a Mike Tyson or a Mother Teresa. These tools cannot predict that all aggressive people will create Tyson-like outcomes, in a boxing ring or outside it, or that all compassionate people will end up like Mother Teresa. More importantly, they cannot tell you how much aggressiveness or compassion creates Tyson-like or Mother Teresa-like outcomes.
Aggression and compassion, like all behaviours, are neutral. Context and the magnitude of these behaviours determine whether they are appropriate.
Hence, we should stop claiming that behaviour profiling tools can predict performance. They can be used as data-gathering tools for making someone aware of their orientations. The output from profilers is valuable for conversations about a person’s development, to create insights for them on the consequences of their behavioural orientation in different contexts. Profilers help us reconcile our self-image with how others experience us. This is the best value that profilers deliver to us.
If used as a tool to support decisions, for collecting qualitative data and for development purposes, profilers will add immense value. But to connect them with any degree of confidence to performance outcomes or leadership success, and use them as assessment tools is irresponsible.