Day by day, we become more aware of the impact of data science and of how data mining techniques are used to analyze what we perceive in our daily lives. In an earlier article we looked at the difference between data reporting and data analysis. With that in mind, a semi-real story (maybe even a real one) took shape in my mind, and I use it here to show how differences in the way data is collected lead to differences in the results of data mining.
Data Mining and a playbook
My daughter is in the second grade of primary school and has art and sports classes on Wednesdays. Since I am at home with her while classes are held virtually because of air pollution, I tried to help her with her activities. The assignment was to color a drawing. My daughter brought her box of colored pencils. I saw that she had about 100 of them, probably left over from three or four boxes. To make it easier to find each color, we decided to sort the pencils by length. We ended up with a row of about a hundred pencils, from the shortest to the longest. The interesting thing was that the blue and green pencils were shorter than the others, while the white, black, pink and red ones were longer. That was when I remembered machine learning algorithms and told myself it would be worth deriving a rule from the collected data. I reviewed the steps of data mining in my mind.
Steps of data mining
- Data Cleaning
- Data Integration
- Data Selection
- Data Transformation
- Data Mining
- Pattern Evaluation
- Knowledge Presentation
Collecting and sorting the colored pencils clearly corresponds to the first four steps. In the data mining step, seeing how short the green and blue pencils were, I noticed that these two colors had been used more than the others. So the pattern that came to my mind was that my daughter likes these two colors more than the rest. To evaluate the pattern, I said to her, “How interesting, I just noticed that you like blue and green more than all the other colors!” But she replied: “The sky is blue and the mountains are green, and since I mostly draw mountains and the sky, these pencils have been used more. But I like pink more than any other color, and that’s why I make the girls’ clothes pink in my paintings.”
In the final step, the presentation of knowledge, I was completely confused, and I realized that behind the collected data we should also look for its story. This example shows that we should not always take the results of data mining at face value: it is better to pay attention to the story behind the data and to consider the domain it comes from.
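The pencil example can be sketched in a few lines of Python. This is only an illustration, not a real analysis: the lengths below are invented for the sketch, and the function names `mean_length_by_color` and `most_used_colors` are hypothetical helpers. It groups the pencils by color, averages their remaining length, and treats the shortest colors as the most used ones.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical measurements (color, remaining length in cm).
# These numbers are invented for illustration; they are not real data.
pencils = [
    ("blue", 7.5), ("blue", 8.0),
    ("green", 8.5), ("green", 9.0),
    ("pink", 15.0), ("red", 15.5),
    ("white", 17.0), ("black", 16.0),
]


def mean_length_by_color(items):
    """Group pencils by color and return the average length per color."""
    lengths = defaultdict(list)
    for color, length in items:
        lengths[color].append(length)
    return {color: mean(values) for color, values in lengths.items()}


def most_used_colors(items, top=2):
    """Return the `top` colors with the shortest average length.

    Assumption: a shorter pencil has been sharpened more, so it was used more.
    """
    averages = mean_length_by_color(items)
    return sorted(averages, key=averages.get)[:top]


print(most_used_colors(pencils))  # ['blue', 'green']
```

The sketch recovers the pattern "blue and green are used most", but nothing in the data says whether they are liked most. That part only came out by asking the person who drew the pictures.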