1. Contributions
- A new nonlinear supervised multi-view learning method, multi-view Laplacian eigenmaps (MvLE), is proposed for RGB-D human emotion recognition.
- The multi-hidden-layer out-of-sample network (MHON) is proposed as the out-of-sample extension of MvLE.
- As a new distance metric, bag of neighbors (BON) not only overcomes the distribution differences between views but also introduces category discriminant information.
- We introduce a new RGB-D video-emotion dataset and a new RGB-D face-emotion dataset to evaluate the proposed method.

2. Motivations
- Few nonlinear subspace learning methods have been proposed.
- Most existing two-view learning methods are difficult to extend to more than two views.
- Unsupervised multi-view learning methods do not utilize label information.
- Most supervised methods aim only at optimizing the correlation between classes or views.
- Most existing human emotion datasets focus only on the RGB view.

3. Multi-view Laplacian Eigenmaps
3.1. Bag of Neighbors
Since the distance between samples of different views is difficult to measure directly, we represent each sample with a bag-of-neighbors (BON) vector. Let \(BON_a = [x_1, x_2, \cdots, x_c]\) denote the BON vector of sample \(a\), where \(x_t\) is the number of samples labeled as class \(t\) among the \(K\)-nearest neighbors of sample \(a\), and \(c\) is the number of classes.
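The BON counting step can be sketched as follows. This is a minimal illustration, not the paper's code; the function name `compute_bon` and the choice of Euclidean distance within a view are assumptions.

```python
import numpy as np

def compute_bon(X, labels, n_classes, K=5):
    """Sketch of the BON vector: for each sample, count how many of its
    K nearest neighbors (Euclidean distance, hypothetical choice) belong
    to each class. Returns an (n_samples, n_classes) count matrix."""
    n = X.shape[0]
    # pairwise squared Euclidean distances
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)          # exclude the sample itself
    bon = np.zeros((n, n_classes), dtype=int)
    for a in range(n):
        knn = np.argsort(d2[a])[:K]       # indices of the K nearest neighbors
        for t in labels[knn]:
            bon[a, t] += 1                # x_t: count of class-t neighbors
    return bon

# two tight clusters with different labels
X = np.array([[0., 0.], [0.1, 0.], [0., 0.1], [5., 5.], [5.1, 5.]])
y = np.array([0, 0, 0, 1, 1])
print(compute_bon(X, y, n_classes=2, K=2))
# → [[2 0] [2 0] [2 0] [1 1] [1 1]]
```

Because each row is a histogram over classes rather than raw features, BON vectors from different views live in the same space and can be compared directly, which is what makes the cross-view graph below possible.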

3.2. Weighted graph of all views
A weighted graph over all views is constructed based on BON. It is composed of weighted graphs between each pair of views, encoding inter-view and intra-view similarity. Whether two samples are connected depends on their own labels and the labels of their neighbors. The optimization target is then formulated as:
\[\begin{align}\nonumber \begin{split} \mathop{\min\limits_{Y}}\frac{1}{2}\sum_{a,b}\|y_a - y_b\|_2^2 W_{ab} &= \mathop{\min\limits_{Y}} tr(Y^T L Y) \\ s.t. \ Y^T D Y &= I \\ D_{kk} = \sum_{j}{W_{jk}},&\ L = D - W \end{split} \end{align}\]

3.3. Multi-hidden-layer Out-of-sample Network
MHON is the out-of-sample extension of MvLE. It is trained on the original distributions of the RGB and depth views together with their labels. In the guiding layer of MHON, the low-dimensional representations of the training set are used as guiding information and fed forward.
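The two steps of Sections 3.2 and 3.3 can be sketched together: the constrained trace minimization above is a generalized eigenproblem \(Ly = \lambda Dy\), and the out-of-sample map is then fit to reproduce the resulting embedding from raw features. This sketch uses a toy k-NN graph and a ridge-regression map as a stand-in for the multi-hidden-layer network; the graph construction, regularizer, and all names here are illustrative assumptions, not the paper's architecture.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 6))              # stand-in features for one view

# toy affinity graph: connect each sample to its 3 nearest neighbors
d2 = ((X[:, None] - X[None]) ** 2).sum(-1)
np.fill_diagonal(d2, np.inf)
knn = np.argsort(d2, axis=1)[:, :3]
W = np.zeros((20, 20))
for a in range(20):
    W[a, knn[a]] = 1.0
W = np.maximum(W, W.T)                    # symmetrize the adjacency

D = np.diag(W.sum(axis=1))                # D_kk = sum_j W_jk
L = D - W                                 # graph Laplacian

# generalized eigenproblem L y = lambda D y; eigh returns eigenvalues
# in ascending order, so skip the (near-)constant trivial eigenvector
vals, vecs = eigh(L, D)
Y = vecs[:, 1:3]                          # 2-D embedding of the training set

# out-of-sample map (stand-in for MHON): ridge regression from X to Y
A = np.linalg.solve(X.T @ X + 1e-3 * np.eye(6), X.T @ Y)
Y_new = X @ A                             # map raw features into the subspace
print(Y.shape, Y_new.shape)
```

In MvLE the graph spans all view pairs and the map is a multi-hidden-layer network, but the pipeline shape is the same: embed the training set by eigendecomposition, then learn a parametric map so new samples never need the graph.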

4. Two new datasets
4.1. RGB-D video-emotion dataset
The video-emotion dataset consists of over 4k RGB video clips and 4k corresponding depth video clips.

4.2. RGB-D face-emotion dataset
The face-emotion dataset includes about 1k RGB face-emotion images and 1k corresponding depth images.

5. Evaluation
Experiments on the two RGB-D human emotion datasets demonstrate the effectiveness of MvLE.