Answer:
We have set out the 5 most common types of bias:
1. Confirmation bias
This occurs when the person performing the data analysis wants to prove a predetermined assumption. They then keep looking in the data until this assumption can be proven, e.g. by intentionally excluding particular variables from the analysis. This often happens when data analysts are briefed in advance to support a particular conclusion.
It is therefore advisable not to set out doggedly to prove a predefined conclusion, but rather to test hypotheses in a targeted way.
2. Selection bias
This occurs when data is selected subjectively. As a result, the sample used is not a good reflection of the population. This error is often made in surveys. Selection bias is also common in customer panels: the customers who are (easily) found willing to participate in a customer panel are far from being “average customers”.
This too can be done deliberately or unwittingly. Just look at opinion polls in elections: Can it really be true that so many voters completely change their mind on the last day, or is it more likely that the sample on which the poll is based is not a good reflection of all the voters?
So you should always ask what sort of sample has been used for research.
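The customer panel effect can be illustrated with a small simulation. This is only a sketch under invented assumptions: the spend figures are random numbers, and the rule that heavier spenders are more likely to volunteer is hypothetical, chosen purely to show how a self-selected panel drifts away from the population.

```python
import random

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population: monthly spend per customer, in euros.
population = [rng.uniform(10, 200) for _ in range(10_000)]

# Random sample: every customer has the same chance of being picked.
random_sample = rng.sample(population, 500)

# Self-selected panel (assumption): the chance of volunteering grows
# with spend, so heavy spenders are over-represented.
panel = [spend for spend in population if rng.random() < spend / 400]


def mean(values):
    return sum(values) / len(values)


population_mean = mean(population)
random_sample_mean = mean(random_sample)
panel_mean = mean(panel)

print(f"population mean:    {population_mean:.1f}")
print(f"random sample mean: {random_sample_mean:.1f}")
print(f"panel mean:         {panel_mean:.1f}")
```

The random sample should land close to the population mean, while the panel overstates it, because panel membership depends on the very thing being measured.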
3. Outliers
An outlier is an extreme data value, e.g. a customer with an age of 110 years, or a consumer with €10 million in their savings account. You can spot outliers by inspecting the data closely, and particularly by looking at the distribution of values: outliers are values that are much higher, or much lower, than the region in which almost all the other values lie. Outliers can make it a dangerous business to base a decision on the “average”. Just think: a single customer with extreme spending habits can have a huge effect on the average profit per customer. If someone presents you with average values, you should check whether they have been corrected for outliers, for example by basing the conclusions on the median (the middle value).
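A tiny example shows how much one extreme customer can distort the average. The profit figures are made up for illustration.

```python
from statistics import mean, median

# Hypothetical profit per customer, in euros.
# The last customer is the outlier.
profits = [40, 45, 50, 55, 60, 10_550]

average_profit = mean(profits)
median_profit = median(profits)
average_without_outlier = mean(profits[:-1])

print(f"mean:                 {average_profit:.2f}")           # 1800.00
print(f"median:               {median_profit:.2f}")            # 52.50
print(f"mean without outlier: {average_without_outlier:.2f}")  # 50.00
```

The mean suggests a typical customer brings in 1,800 euros, although five of the six bring in 60 euros or less. The median stays close to what most customers actually look like.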
4. Overfitting and underfitting
Underfitting occurs when a model gives an oversimplistic picture of reality. Overfitting is the opposite, i.e. when the model is overcomplicated. An overfitted model risks treating a pattern that only happens to appear in the sample as the truth, whereas in practice it does not hold. Always ask the data analyst what they have done to validate the model. If the analyst looks at you with a rather glazed expression, there is a good chance that the outcomes of the analysis have not been validated and therefore might not apply to the whole database of customers. In particular, ask whether they used separate training and test samples.
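Here is a rough sketch of what validating on a test sample looks like. Everything in it is an illustrative assumption: the data is simulated (a straight-line relationship plus noise), and the three "models" are deliberately crude stand-ins for an underfitted, a reasonable, and an overfitted model.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable


def make_point():
    x = rng.uniform(0, 10)
    y = 2 * x + rng.gauss(0, 3)  # true pattern plus random noise
    return x, y


data = [make_point() for _ in range(1500)]
train, test = data[:1000], data[1000:]  # hold back a test sample


def mse(model, points):
    """Mean squared error of a model on a list of (x, y) points."""
    return sum((model(x) - y) ** 2 for x, y in points) / len(points)


mean_x = sum(x for x, _ in train) / len(train)
mean_y = sum(y for _, y in train) / len(train)


def underfit(x):
    # Too simple: ignores x and always predicts the training average.
    return mean_y


slope = (sum((x - mean_x) * (y - mean_y) for x, y in train)
         / sum((x - mean_x) ** 2 for x, _ in train))
intercept = mean_y - slope * mean_x


def line(x):
    # Reasonable: least-squares straight line fitted on the training sample.
    return intercept + slope * x


def memoriser(x):
    # Too complex: repeats the y of the nearest training point, noise included.
    nearest = min(train, key=lambda point: abs(point[0] - x))
    return nearest[1]


results = {}
for name, model in [("underfit", underfit), ("line", line), ("memoriser", memoriser)]:
    results[name] = (mse(model, train), mse(model, test))
    train_mse, test_mse = results[name]
    print(f"{name:10s} train MSE {train_mse:6.2f}   test MSE {test_mse:6.2f}")
```

The memoriser looks perfect on the training sample because it has learned the noise by heart, yet it does worse than the plain line on data it has not seen. The underfitted model is poor on both. Without the held-back test sample, the memoriser would have looked like the best model.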
5. Confounding variables
If the research results show that when more ice creams are sold, more people drown, ask whether the analyst has checked for what are known as confounding variables. In this case, the confounding variable is the temperature. If the weather is hotter, people will eat more ice cream and more people will go swimming. This is likely to result in more drownings than on a cold day.
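The ice cream example can be simulated to show what "checking for a confounder" might look like. The figures below are invented: temperature drives both series, ice cream sales have no direct effect on drownings, and the drowning numbers are a stylised index rather than real counts. The check used here is a partial correlation, which is one common way of controlling for a third variable, not the only one.

```python
import random
from math import sqrt

rng = random.Random(1)  # fixed seed so the run is repeatable


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical daily data: both series depend on temperature only.
temperature = [rng.uniform(0, 35) for _ in range(2000)]
ice_creams = [2 * t + rng.gauss(0, 5) for t in temperature]
drownings = [0.1 * t + rng.gauss(0, 1) for t in temperature]  # stylised index

# Naive view: ice cream sales and drownings move together.
raw = pearson(ice_creams, drownings)

# Partial correlation: the same relationship with temperature held constant.
r_ice_temp = pearson(ice_creams, temperature)
r_drown_temp = pearson(drownings, temperature)
partial = (raw - r_ice_temp * r_drown_temp) / sqrt(
    (1 - r_ice_temp ** 2) * (1 - r_drown_temp ** 2)
)

print(f"raw correlation:                      {raw:.2f}")
print(f"correlation controlling for the heat: {partial:.2f}")
```

The raw correlation is strong, but once temperature is accounted for it largely disappears, which is what you would expect if the heat, and not the ice cream, is behind the drownings.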