A Dataset for Studying Biometric Identification on Heavily Degraded Data

The BioHDD Dataset


101 Participants

2 Sessions

10 Noise Factors

Visible + NIR

This dataset provides biometric data acquired under demanding imaging conditions, so that both existing and new biometric methods based on facial traits can be evaluated. It contains data captured from 101 subjects, simultaneously in the visible and near-infrared (NIR) spectra, under four degradation factors: illumination angle, head revolution, head tilt and camera exposure.

The imaging setup was installed in a dark room with no exterior light source. Data were captured under 8 different illumination angles and 8 participant revolutions, in 45° steps. For both acquisition phases, volunteers were captured facing forward and tilting their heads up and down, and under three exposure levels. In addition, high-quality ground-truth frontal and side images were captured for each participant.

Two distinct image acquisition sessions were performed, separated by an interval of at least two weeks. Volunteers were picked randomly; the large majority are young Caucasian people, approximately 2/3 men and 1/3 women. Images were manually cropped to a square format (600 × 600 pixels), centered on the head.

Degradation factors introduced in the acquisition stage:

Illum. Angle

Participants' heads were imaged while illuminated from different angles, covering all 360° at 45° steps.
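As a quick illustration of the angular sampling described above, the sketch below lists the angles obtained by covering a full turn in 45° steps. The function name and the choice of 0° as the starting angle are assumptions made for illustration; the page does not state which direction corresponds to 0°.

```python
# Hypothetical helper: angles implied by "all 360° at 45° steps".
# Starting at 0 degrees is an assumption; the source does not define the origin.
STEP_DEG = 45


def illumination_angles(step_deg=STEP_DEG):
    """Return the angles (in degrees) that cover a full turn at the given step."""
    return list(range(0, 360, step_deg))


angles = illumination_angles()
print(angles)  # [0, 45, 90, 135, 180, 225, 270, 315]
```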

Illum. Intensity

Photos were taken with different illumination intensities, ranging from 5% to 100%.


Subject revolution was also introduced over eight angles, in a similar manner to the illumination angle variation.

Head Tilt

Three different levels of head-tilting were registered, with participants facing forward, up and down.

Additional degradation procedures:


To mimic the issues associated with inappropriate lens settings, poor focus, subject movement, etc., Gaussian filtering was used with σ ranging from 5 to 20.
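A minimal sketch of Gaussian filtering, assuming a grayscale image stored as a list of rows. It is not the authors' implementation: the kernel truncation at 3σ, the border clamping and all function names are assumptions chosen for illustration. The demo uses a small σ on a tiny image to stay fast; the dataset reports σ from 5 to 20.

```python
import math


def gaussian_kernel(sigma):
    """1-D Gaussian kernel truncated at 3*sigma and normalised to sum to 1."""
    radius = int(math.ceil(3 * sigma))
    weights = [math.exp(-(x * x) / (2.0 * sigma * sigma))
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_rows(image, kernel):
    """Convolve every row with the kernel, clamping indices at the borders."""
    radius = len(kernel) // 2
    width = len(image[0])
    out = []
    for row in image:
        new_row = []
        for x in range(width):
            acc = 0.0
            for k, w in enumerate(kernel):
                src = min(max(x + k - radius, 0), width - 1)
                acc += w * row[src]
            new_row.append(acc)
        out.append(new_row)
    return out


def transpose(image):
    """Swap rows and columns."""
    return [list(col) for col in zip(*image)]


def gaussian_blur(image, sigma):
    """Separable Gaussian blur: filter the rows, then the columns."""
    kernel = gaussian_kernel(sigma)
    horizontal = blur_rows(image, kernel)
    vertical = blur_rows(transpose(horizontal), kernel)
    return transpose(vertical)


# Tiny demo: a single bright pixel is spread over its neighbourhood.
demo = [[0.0] * 9 for _ in range(9)]
demo[4][4] = 255.0
blurred = gaussian_blur(demo, sigma=1.0)
```

Filtering rows and columns separately gives the same result as a full 2-D Gaussian kernel at a lower cost, which matters for the larger σ values.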


Face occlusion was simulated by overlaying a black patch on the original image, covering 15% to 20% of the picture.

Rev. Occlusion

Reverse Occlusion is a different flavor of the occlusion degradation, where only a small portion of the image (20% down to 5%) is left visible.
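The two occlusion variants can be sketched as follows for a grayscale image stored as a list of rows. The page does not say where the patch is placed or what shape it has, so the centred patch with the image's aspect ratio, and the function names, are assumptions for illustration only.

```python
import math


def centered_box(height, width, area_fraction):
    """Return (top, left, bottom, right) of a centred box that covers
    roughly area_fraction of an image of the given size."""
    scale = math.sqrt(area_fraction)
    box_h = round(height * scale)
    box_w = round(width * scale)
    top = (height - box_h) // 2
    left = (width - box_w) // 2
    return top, left, top + box_h, left + box_w


def occlude(image, area_fraction):
    """Black out a centred patch covering area_fraction of the image."""
    h, w = len(image), len(image[0])
    top, left, bottom, right = centered_box(h, w, area_fraction)
    return [[0 if (top <= y < bottom and left <= x < right) else image[y][x]
             for x in range(w)] for y in range(h)]


def reverse_occlude(image, visible_fraction):
    """Keep only a centred patch covering visible_fraction; black out the rest."""
    h, w = len(image), len(image[0])
    top, left, bottom, right = centered_box(h, w, visible_fraction)
    return [[image[y][x] if (top <= y < bottom and left <= x < right) else 0
             for x in range(w)] for y in range(h)]


# Demo on a uniform 100 x 100 image.
demo = [[200] * 100 for _ in range(100)]
occluded = occlude(demo, 0.16)           # 40 x 40 black patch
revealed = reverse_occlude(demo, 0.05)   # 22 x 22 visible window
```

Because the box sides are rounded to whole pixels, the covered area only approximates the requested fraction.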


This degradation, associated with devices of low or insufficient spatial resolution or with post-processing censorship, was obtained by downscaling the original photo (to sizes from 100x100px down to 25x25px).
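A minimal downscaling sketch, assuming a square grayscale image stored as a list of rows and block averaging as the resampling method. The page does not name the interpolation that was actually used, so the method and the function name are assumptions.

```python
def downscale(image, target_size):
    """Box-average a square image down to target_size x target_size.

    Assumes the source side length is an integer multiple of target_size,
    which holds for a 600-pixel side and targets such as 100, 50 or 25.
    """
    size = len(image)
    if size % target_size != 0:
        raise ValueError("source size must be a multiple of target_size")
    factor = size // target_size
    out = []
    for ty in range(target_size):
        row = []
        for tx in range(target_size):
            # Average all source pixels that fall into this output pixel.
            block = [image[ty * factor + dy][tx * factor + dx]
                     for dy in range(factor) for dx in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


# Demo: synthetic 600 x 600 image reduced to 25 x 25.
demo = [[(x + y) % 256 for x in range(600)] for y in range(600)]
small = downscale(demo, 25)
```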

Storage/Transmission related degradation:


The compression degradation that can be found on systems relying on digital storage or broadcasting was simulated using a standard JPEG compression algorithm with low quality settings (20% to 5%).

White Noise

Based on the same reasoning, the issues associated with storage on photographic film or broadcasting through analog channels were simulated by adding white noise.
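As a rough sketch of this degradation, the code below adds independent zero-mean Gaussian noise to every pixel of an 8-bit grayscale image. The page does not give the noise distribution or its level, so the Gaussian model, the `sigma` value and the function name are all assumptions.

```python
import random


def add_white_noise(image, sigma, seed=None):
    """Add independent zero-mean Gaussian noise to each pixel.

    The result is rounded and clipped to the 8-bit range [0, 255].
    """
    rng = random.Random(seed)  # local generator so results are reproducible
    return [[min(255, max(0, round(p + rng.gauss(0.0, sigma)))) for p in row]
            for row in image]


# Demo on a uniform mid-gray 32 x 32 image.
flat = [[128] * 32 for _ in range(32)]
noisy = add_white_noise(flat, sigma=25.0, seed=0)
```

Drawing each pixel's perturbation independently is what makes the noise "white": it has no spatial correlation.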


Please cite the following paper if you use this dataset.

Paper 1
Gil Santos, Paulo T. Fiadeiro and Hugo Proença;
BioHDD: A Dataset for Studying Biometric Identification on Heavily Degraded Data,
IET Biometrics, ISSN 2047-4938, DOI: 10.1049/iet-bmt.2014.0045, 2014.

Substantial efforts have been put into bridging the gap between biometrics and visual surveillance, in order to develop automata able to recognise human beings ‘in the wild’. This study focuses on biometric recognition in extremely degraded data, and its main contributions are three-fold: (1) announce the availability of an annotated dataset that contains high quality mugshots of 101 subjects, and large sets of probes degraded extremely by 10 different noise factors; (2) report the results of a mimicked watchlist identification scheme: an online survey was conducted, where participants were asked to perform positive and negative identification of probes against the enrolled identities. Along with their answers, volunteers had to provide the major reasons that sustained their responses, which enabled the authors to perceive the kind of features that are most frequently associated with successful/failed human identification processes. As main conclusions, the authors observed that humans rely greatly on shape information and holistic features. Otherwise, colour and texture-based features are almost disregarded by humans; (3) finally, the authors give evidence that the positive human identification on such extremely degraded data might be unreliable, whereas negative identification might constitute an interesting alternative for such cases.


The BioHDD Dataset provided here is for non-commercial research/educational use only.

Requests for access to the BioHDD database must be sent by e-mail to one of the addresses in the contact section.
Applicants should fill in the application form by hand, sign it, scan it and attach it to the e-mail.
Once a signed copy of the application form is received, access instructions will be provided.
For faster processing of your request, please send the form from your institutional e-mail address.


High-quality registration data of all participants. Contains three images per participant: frontal, left and right mugshots.

Visible Data

Archive with the visible data acquired from all participants, with all the degradation factors introduced at the acquisition stage.

NIR Data

Archive with the counterpart NIR data, acquired simultaneously with the visible data set.


Videos of the subjects walking along a corridor while being affected by multiple degradation factors.

Contact Us

Want to get in touch? Come see our lab, or drop us an e-mail.

Our Location

SOCIA Lab - Soft Computing and Image Analysis Lab
Department of Computer Science
University of Beira Interior
Rua Marquês D'Ávila e Bolama
6201-001 Covilhã (Portugal)


Gil Santos

Hugo Proença


This work could not have been accomplished without the support from our sponsors.