Classification | A Statistical Approach

In statistics, classification is a technique that assigns an observation (or observations) to a set of categories (subpopulations). For example, you can assign a specific email to the class “spam” or “non-spam”. Obviously, this diagnosis is based on previously observed features (examples). In this text, we want to deal with classification with a statistical approach.

Classifying in statistics or Classification

One of the types of variables is related to the class or category to which the observations belong. For example, when we have an information worksheet based on income, job position, etc., along with the gender of the employees, we can guess whether he is a woman or a man, according to the income and job status, with the help of the relationships established between these variables. In fact, descriptive variables are known as descriptive variables or characteristics. These variables may be variously nominal (eg ‘A’, ‘B’, ‘AB’ or ‘O’, for blood group), ordinal (eg ‘large’, ‘medium’ or ‘small’). , be quantitative or integer (for example, the number of occurrences of a certain word in an email) or some real number (for example, a blood pressure measurement). Such variables play an important role in classifiers. Each new observation is compared to previous observations using a function similarity or distance corresponds to one of the categories or classes.

Classification algorithms

The algorithm that implements the classification is known as a classifier. The term “classifier” sometimes also refers to the mathematical operation performed by a classification algorithm that assigns the input data to a group. In statistics, classification is often done with logistic regression or a similar method. In this case, the observed characteristics are called descriptive variables (or independent variables, regressors, etc.) and the categories to be predicted are known as outcomes. These categories are referred to as dependent variable values in machine learning, which are obtained by a sampling operation. Descriptive variables are called characteristics and are used to identify these categories or classes.

Classification and clustering

Classification and clustering are examples of more general problems called pattern recognition, which consists of assigning an output value to one or more input values. Examples in this field are the regression technique, which creates an output associated with a set of inputs during a process. This output may be labeling each observation (clustering) or adding a new member to existing classes based on existing examples (classification). Both of these issues are related to machine learning, the first is unsupervised learning and the second is learning with opinions. A common class of classification algorithms is probabilistic classification. Algorithms in this class use statistical inference to find the best class for a given instance. Unlike other algorithms that simply produce a “best” class, probabilistic algorithms produce a value for each observation as the probability of belonging to a class, with the desired class being selected based on the highest probability.

Several classification algorithms

In the class of linear classification algorithms (Linear Discriminant Function), we can mention things like the following.

Logistic regression and Multinomial logistic regression
Probit regression
The perceptron algorithm
Support vector machines
Linear discriminant analysis.

February 15, 2025General Statistics

Anomaly detection | Sort and Rank techniques

In most cases, there are complex and specific algorithms for abnormal or anomaly detection that have been implemented through various software in statistical or data analysis. However, in this article from the Arman Computer magazine, we aim to discuss the application of sorting and identifying abnormal observations, which, in addition to being simple, is also […]

January 16, 2025General Statistics

correlation coefficient | Inner product of vectors

The correlation coefficient or more precisely, the Pearson correlation coefficient, is a bivariate statistical descriptive statistics that examines the linear dependency between two vectors of observations. Given the range [1,-1] that the value of the Pearson correlation coefficient can have, it is very close to the value of trigonometric functions, especially the cosine function. Therefore, […]

January 6, 2025January 9, 2025General Statistics

Mean Formula and its foundation

In other articles from Arman Computer Magazine, we have examined how to calculate the average and weighted mean. But in this text, we want to discover the average or mean formula and learn how to derive its formula. I try to mention the formula syntax less, but sometimes it is necessary to use the syntax […]

Classification | A Statistical Approach

Classification | A Statistical Approach