Classification | A Statistical Approach

Classification
Computer Softwares StatisticsLeave a Comment on Classification | A Statistical Approach

Classification | A Statistical Approach

In statistics, classification is a technique that assigns an observation (or observations) to a set of categories (subpopulations). For example, you can assign a specific email to the class “spam” or “non-spam”. Obviously, this diagnosis is based on previously observed features (examples). In this text, we want to deal with classification with a statistical approach.


Classifying in statistics or Classification

One of the types of variables is related to the class or category to which the observations belong. For example, when we have an information worksheet based on income, job position, etc., along with the gender of the employees, we can guess whether he is a woman or a man, according to the income and job status, with the help of the relationships established between these variables. In fact, descriptive variables are known as descriptive variables or characteristics. These variables may be variously nominal (eg ‘A’, ‘B’, ‘AB’ or ‘O’, for blood group), ordinal (eg ‘large’, ‘medium’ or ‘small’). , be quantitative or integer (for example, the number of occurrences of a certain word in an email) or some real number (for example, a blood pressure measurement). Such variables play an important role in classifiers. Each new observation is compared to previous observations using a function similarity or distance corresponds to one of the categories or classes.

Classification algorithms

The algorithm that implements the classification is known as a classifier. The term “classifier” sometimes also refers to the mathematical operation performed by a classification algorithm that assigns the input data to a group. In statistics, classification is often done with logistic regression or a similar method. In this case, the observed characteristics are called descriptive variables (or independent variables, regressors, etc.) and the categories to be predicted are known as outcomes. These categories are referred to as dependent variable values in machine learning, which are obtained by a sampling operation. Descriptive variables are called characteristics and are used to identify these categories or classes.

Classification and clustering

Classification and clustering are examples of more general problems called pattern recognition, which consists of assigning an output value to one or more input values. Examples in this field are the regression technique, which creates an output associated with a set of inputs during a process. This output may be labeling each observation (clustering) or adding a new member to existing classes based on existing examples (classification). Both of these issues are related to machine learning, the first is unsupervised learning and the second is learning with opinions. A common class of classification algorithms is probabilistic classification. Algorithms in this class use statistical inference to find the best class for a given instance. Unlike other algorithms that simply produce a “best” class, probabilistic algorithms produce a value for each observation as the probability of belonging to a class, with the desired class being selected based on the highest probability. classification

Several classification algorithms

In the class of linear classification algorithms (Linear Discriminant Function), we can mention things like the following.

  • Logistic regression and Multinomial logistic regression
  • Probit regression
  • The perceptron algorithm
  • Support vector machines
  • Linear discriminant analysis.

 

Loading

Leave a Reply

Your email address will not be published. Required fields are marked *

Back To Top