Multi-view laplacian eigenmaps based on bag-of-neighbors for RGB-D human emotion recognition

Liu S, Guo S, Wang W, et al. [J]. Information Sciences, 2020. 509: 243-256. [中科院Q1, CCF-B, IF=8.1]


1. Contributions

  • A new nonlinear supervised multi-view learning method MvLE is proposed to perform the RGB-D human emotion recognition.
  • Multi-hidden-layer out-of-sample network (MHON) is the out-of-sample extension of MvLE.
  • As a new distance metric method, bag of neighbors (BON) can not only overcome the difference between views, but also introduce the category discriminant information.
  • We introduced a new video-emotion RGB-D dataset and a new face-emotion RGB-D dataset to evaluate the proposed method.

2. Motivations

  • There are few nonlinear subspace learning methods proposed.
  • Most of existing two-view learning methods are difficult to extend to multi-view.
  • Label information is not utilized for the unsupervised multi-view learning methods.
  • Most of supervised methods aim only at optimizing the correlation of classes or views.
  • Most of existing human emotion dataset focus only on the RGB view.

3. Multi-view laplacian eigenmaps

3.1. Bag of Neighbors

As the distance between samples of different views are difficult to metric, we represent each sample with a bag of neighbors (BON) vector. Let \(BON_a = [x_1, x_2, \cdots, x_c]\) denotes the BON vector of sample \(a\), where \(x_t\) denotes the number of samples which are labeled as class \(t\) in the \(K\)-nearest neighbors of sample \(a\).

3.2. Weighted graph of all views

A weighted graph of all views is construct based on BON, which is composed by weighted graphs between each two views, indicating inter-view or intra-view similarity measures. Whether samples are connected or not depends on labels of they and their neighbors. Then the optimization target is formulated as:

\[\begin{align}\nonumber \begin{split} \mathop{\min\limits_{Y}}\frac{1}{2}\sum_{a,b}\|y_a - y_b\|_2^2W_{ij}& = \mathop{\min\limits_{Y}}tr(Y^TLY) \\ s.t. \ Y^TDY &= I \\ D_{kk} = \sum_{j=1}^{n_i}{W_{jk}},&\ L=D-W \end{split} \end{align}\]

3.3. Multi-hidden-layer Out-of-sample Network

MNON is the out-of-sample extension of MvLE. MHON is trained on the original distributions of RGB-D views and their labels. In the guiding layer of MHON, low-dimensional representations of training set is used as the leading information and feed forward.


4. Two new datasets

4.1. RGB-D video-emotion dataset

The video-emotion dataset consists of over 4k clips of RGB videos and 4k clips of depth videos that are corresponding.

4.2. RGB-D face-emotion dataset

The face-emotion dataset includes about 1k RGB face emotion images and 1k depth face emotion images that are corresponding.


5. Evaluation

Experiments on the two RGB-D human emotion dataset indicate the effectiveness of MvLE.