Logistic Regression

Data and goal

We are given a dataset of $N$ observations $\{(\mathbf{x}_n, y_n)\}_{n=1}^{N}$ with $y_n \in C_k$, where $C_k$ is the set of $K$ discrete classes.

The goal is to take an input vector $\mathbf{x}$ and assign it to one of these $K$ classes. For binary classification, $C_k = \{-1, +1\}$.

Intuition behind Logistic Regression

Let us think of predicting the probability that an object belongs to the positive class, $p_+ = p\left(y = 1 \mid \mathbf{x}\right)$.

We cannot use linear regression directly to predict this value: the output of a linear model $\mathbf{w}^T\mathbf{x}$ can be any real number, while a probability must lie between 0 and 1.

A few observations:

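$\frac{p_+}{1 - p_+} \in [0, +\infty)$ is the odds of the class “+”, that is, the ratio of the probability of the positive class to the probability of the negative class.
$\log\left(\frac{p_+}{1 - p_+}\right) \in (-\infty, +\infty)$ is the log-odds (logit).

Since the log-odds can take any real value, we can predict $\log\left(\frac{p_+}{1 - p_+}\right)$ with a linear model.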

Setting the log-odds equal to the linear regression expression, we get \[ \begin{equation} \log\left(\frac{p_+}{1 - p_+}\right) = \mathbf{w}^T\mathbf{x} \end{equation} \] and, solving for $p_+$, \[ \begin{equation} p_{+} = \frac{1}{1 + \exp(-\mathbf{w}^T\mathbf{x})} = \sigma(\mathbf{w}^T\mathbf{x}), \end{equation} \] where $\sigma(z) = \frac{1}{1 + \exp(-z)}$ is the logistic (sigmoid) function.
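The relationship between the log-odds and the sigmoid can be checked numerically. The following is a minimal sketch using NumPy; the function names `sigmoid` and `log_odds` and the score values are illustrative choices, not part of the original text.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real score to a value in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def log_odds(p):
    """Logit: maps a probability in (0, 1) back to a real number."""
    return np.log(p / (1.0 - p))

# Arbitrary example values standing in for the linear score w^T x.
scores = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])

probs = sigmoid(scores)        # every entry lies strictly between 0 and 1
recovered = log_odds(probs)    # applying the logit recovers the scores

print(probs)
print(recovered)
```

Applying `log_odds` to the output of `sigmoid` returns the original scores up to floating-point error, which is the statement that the two functions are inverses of each other.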

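More generally, since $1 - \sigma(z) = \sigma(-z)$, the negative class has probability $p\left(y = -1 \mid \mathbf{x}\right) = 1 - p_+ = \sigma(-\mathbf{w}^T\mathbf{x})$. With labels $y_i \in \{-1, +1\}$, both cases can be written as a single expression: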

\[ \begin{equation} p\left(y = y_i \mid \mathbf{x}_i, \mathbf{w}\right) = \sigma(y_i\mathbf{w}^T\mathbf{x}_i) \end{equation} \]
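As a quick sanity check of this compact form, the sketch below evaluates $\sigma(y\,\mathbf{w}^T\mathbf{x})$ for both labels and confirms that the two probabilities sum to one. The helper name `label_probability` and the values of `w` and `x` are hypothetical, chosen only for illustration.

```python
import numpy as np

def sigmoid(z):
    """Logistic function sigma(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def label_probability(y, w, x):
    """p(y | x, w) = sigma(y * w^T x) for a label y in {-1, +1}."""
    return sigmoid(y * np.dot(w, x))

w = np.array([0.5, -1.0])  # hypothetical weight vector
x = np.array([2.0, 0.5])   # hypothetical input, so w^T x = 0.5

p_pos = label_probability(+1, w, x)  # sigma(0.5)
p_neg = label_probability(-1, w, x)  # sigma(-0.5) = 1 - sigma(0.5)

print(p_pos, p_neg, p_pos + p_neg)
```

Because $\mathbf{w}^T\mathbf{x}$ is positive here, the positive class receives the larger probability.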

Maximum Likelihood Estimation

Assuming that the objects in our dataset are i.i.d., the likelihood of the dataset can be written as \[ \begin{equation} P\left(\mathbf{y} \mid \mathbf{X}, \mathbf{w}\right) = \prod_{i=1}^{N} p\left(y = y_i \mid \mathbf{x}_{i}, \mathbf{w}\right), \end{equation} \] so the log-likelihood is

\[ \begin{split} \log P(\mathbf{y} \mid \mathbf{X}, \mathbf{w}) &= \log \prod_{i=1}^{N} p(y = y_i \mid \mathbf{x}_{i}, \mathbf{w}) \\
&= \log \prod_{i=1}^{N} \sigma(y_i\mathbf{w}^{T}\mathbf{x}_i) \\
&= \sum_{i=1}^{N} \log \sigma(y_i\mathbf{w}^{T}\mathbf{x}_i) \\
&= \sum_{i=1}^{N} \log \frac{1}{1 + \exp(-y_i\mathbf{w}^{T}\mathbf{x}_i)} \\
&= -\sum_{i=1}^{N} \log \left(1 + \exp(-y_i\mathbf{w}^{T}\mathbf{x}_i)\right)
\end{split} \]

Maximizing the log-likelihood is equivalent to minimizing its negative, which gives us the logistic loss

\[ \begin{equation} E_{\text{logistic}} = \sum_{i=1}^{N} \log \left(1 + \exp(-y_i\mathbf{w}^{T}\mathbf{x}_i)\right) \end{equation} \] where $y_i \in \{-1, +1\}$.
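To make the loss concrete, here is a minimal NumPy sketch that evaluates it and minimizes it with plain gradient descent. The toy data, the step size, the number of iterations, and the function names are all assumptions made for illustration. The gradient used is $-\sum_i y_i \mathbf{x}_i\, \sigma(-y_i\mathbf{w}^T\mathbf{x}_i)$, which follows from differentiating the loss above, and `np.logaddexp` is used to compute $\log(1 + \exp(\cdot))$ without overflow.

```python
import numpy as np

def logistic_loss(w, X, y):
    """Sum over i of log(1 + exp(-y_i * w^T x_i)), with y_i in {-1, +1}."""
    margins = y * (X @ w)
    # logaddexp(0, -m) = log(exp(0) + exp(-m)) = log(1 + exp(-m)), computed stably.
    return np.sum(np.logaddexp(0.0, -margins))

def logistic_loss_grad(w, X, y):
    """Gradient of the loss: -sum_i y_i * x_i * sigma(-y_i * w^T x_i)."""
    margins = y * (X @ w)
    # sigma(-m) = 1 / (1 + exp(m)) = exp(-log(1 + exp(m))), computed stably.
    sigma_neg = np.exp(-np.logaddexp(0.0, margins))
    return X.T @ (-y * sigma_neg)

# Made-up toy data: the first column of ones acts as a bias feature.
X = np.array([[1.0, -2.0], [1.0, -1.0], [1.0, -0.5],
              [1.0, 0.5], [1.0, 1.0], [1.0, 2.0]])
y = np.array([-1.0, -1.0, 1.0, -1.0, 1.0, 1.0])

w = np.zeros(X.shape[1])
initial_loss = logistic_loss(w, X, y)  # equals N * log(2) at w = 0

learning_rate = 0.1  # assumed step size, small enough for this toy problem
for _ in range(200):
    w = w - learning_rate * logistic_loss_grad(w, X, y)

final_loss = logistic_loss(w, X, y)
print(initial_loss, final_loss, w)
```

In practice a library solver would normally be used instead of a hand-written loop; the sketch is only meant to show that the loss derived above can be minimized directly.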

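Note: A logistic regression model tends to predict well-calibrated probabilities, because it is fitted by directly maximizing the likelihood of the observed labels. Predicted probabilities are called calibrated when they match the observed class frequencies: among the examples assigned a probability of about $p$ for a class, roughly a fraction $p$ actually belong to that class.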

Metrics

Links
