Section 5.5 Stochastic Optimization of Compositional OCE

The goal of this section is to present and analyze stochastic algorithms for solving compositional OCE (COCE) risk minimization as introduced in Chapter 2. In particular, we consider the following abstract problem:

\[\begin{align}\label{eq:reform_eq} &\min_{\mathbf{w}\in\mathbb{R}^d,\boldsymbol{\nu}\in\mathbb{R}^n} F(\mathbf{w},\boldsymbol{\nu}):= \frac{1}{n}\sum_{i=1}^n F_i(\mathbf{w},\nu_{i}), \end{align}\]

where

\[F_i(\mathbf{w},\nu_{i}) = \mathbb{E}_{\zeta\sim \mathbb{P}_i}[\Phi_i(\mathbf{w},\nu_{i};\zeta)],\quad \Phi_i(\mathbf{w},\nu_{i};\zeta)=\tau\phi^*\bigg(\frac{s_i(\mathbf{w};\zeta) - \nu_{i}}{\tau}\bigg) + \nu_{i},\]

where \(\tau>0\) is a constant.

In the special case when \(\phi^*(\cdot)=[\cdot]_+/\alpha\) for some \(\alpha\in(0,1)\), the general COCE minimization problem reduces to

\[\min_{\mathbf{w},\boldsymbol{\nu}}F(\mathbf{w},\boldsymbol{\nu}):=\frac{1}{n}\sum_{i=1}^n\mathbb{E}_{\zeta\sim\mathbb{P}_i}\frac{[s_i(\mathbf{w}; \zeta)-\nu_i]_+}{\alpha} + \nu_i.\]

We refer to this problem as the compositional CVaR minimization (CCVaR) problem. The direct one-way partial AUC optimization problem can be reformulated as an instance of CCVaR minimization as shown in Sec. 6.4.

In the special case when \(\phi^*(\cdot)=\exp(\cdot)-1\), the problem (\(\ref{eq:reform_eq}\)) is equivalent to:

\[\min_{\mathbf{w}}F(\mathbf{w}):=\frac{1}{n}\sum_{i=1}^n\tau\log\left(\mathbb{E}_{\zeta\sim\mathbb{P}_i}\exp\bigg(\frac{s_i(\mathbf{w}; \zeta)}{\tau}\bigg)\right).\]

We refer to this problem as the compositional entropic risk minimization (CERM) problem. The cross-entropy loss for multi-class classification, the listwise cross-entropy loss for ranking, the indirect one-way partial AUC loss for imbalanced classification, and the contrastive losses for representation learning discussed in Chapter 2 are all instances of the CERM problem. In particular, for cross-entropy loss minimization, the proposed framework becomes especially relevant when the number of classes is very large, so that the normalization term in the loss cannot be computed efficiently. This setting naturally motivates the stochastic algorithms developed in this section.

Although we can cast the CERM problem into a special instance of FCCO, there remain some gaps to be filled. (i) For the convex CERM problem with a convex loss function \(s_i(\cdot;\zeta)\), the ALEXR algorithm and its convergence analysis are not directly applicable, since the outer function \(f(\cdot)=\tau\log(\cdot)\) is not convex, as required by ALEXR. Consequently, a convergence rate of \(O(1/\epsilon^2)\) for solving convex CERM remains to be developed. (ii) For the CCVaR problem, the optimal solution of \(\boldsymbol{\nu}\) given \(\mathbf{w}\) is typically difficult to derive in closed form, and hence the problem cannot be reduced to an instance of FCCO. As a result, previous analyses for FCCO do not directly apply. We address these gaps in this section.

← Go Back