In the present era of technology, data security is a critical concern. Advances in information technology have transformed the way data is processed, and nearly all real-world data is now processed electronically. This has resulted in a tremendous amount of digital information being generated every day and has led to the emergence of a popular technology and concept known as Big Data. Big Data is used by large companies, e-commerce giants and government projects alike. Its sources are many: online transactions, pictures, posts, videos, audio, emails, medical records, search queries, scientific applications and social networking sites. Another source is the data generated by devices such as smartphones.
So what actually is Big Data? The term “Big Data” describes any massive amount of structured, semi-structured or unstructured data. In essence, Big Data means data that is large in volume. Companies like Amazon, Facebook, Google and Twitter, and government projects such as digital identity cards, maintain large user databases that continue to grow. The security of big data is therefore considered very important, and a lot of research is actively conducted to tackle the security issues in big data projects. The use of biometric technologies in big data projects can help to create a secure database.
The emergence and the security challenges of big data
Until 2010, the term “Big Data” was little known, but today it is touted as the latest technology trend. Its role as a factor in production, market competitiveness and growth is continuously increasing, and big data is now being adopted by everyone from product vendors to large-scale outsourcing and cloud service providers. Big Data is making inroads into all areas of our digital life and transforming our day-to-day online activities.
The big data concept is based on extracting business value from a high volume, variety and velocity of data in a timely and cost-effective manner. This 3-V model is used by most analysts to define big data. The first aspect, volume, covers the large quantities of data that need to be harnessed to improve decision making across the organization. Variety involves handling the complexities of a range of new and emerging data sources, such as data generated from social media, location data from smartphones and public data available online. The third aspect, velocity, refers to the speed at which the data is disseminated or the speed at which it gets updated or refreshed during cyclical variations.
A number of challenges must be overcome to reap the benefits of big data. As big data handles large amounts of data with varying data structures and real-time processing, the most important challenge is to maintain data security and adopt proper data privacy policies. There is an urgent need to implement strict mechanisms that ensure data security as well as conform to the rising quality expectations of the involved stakeholders.
The risks to big data implementation should not be underestimated. A big data environment involves any number of stakeholders compiling, storing and analyzing data for any number of different reasons. There is thus a strong need to prevent misuse of data so that people’s trust in digital channels is not broken. Biometrics is one strategy that can respond to this digital revolution and reduce security risks significantly.
How can biometrics help to secure big data?
In this section we will highlight how biometrics can be leveraged to secure big data. We will also discuss the biometric scanning process and the technology that is used in big data projects. Recent years have seen a steady increase in the number of computer and cyber-crimes, which have become a serious problem for governments, the public and businesses. To counteract these crimes it is crucial to capture digital evidence, and so the focus is on how to obtain such evidence. However, users often do not host their data themselves; it has become increasingly popular to use a third-party data service provider to store data and emails. In some cases a large server may also be shared among many different users, which increases the difficulty of capturing data for investigation.
The problem is further amplified in a big data environment. Big Data technology stores data in a distributed manner that may involve a large number of servers and storage devices. Furthermore, these storage devices may be remote, so traditional forensic techniques cannot be applied very easily. For example, the huge volume of data and the distributed nature of the storage devices make it extremely difficult to clone a copy of the data from them. The introduction of big data technology, and the need to move such information throughout an organization, has therefore exposed a lot of vulnerabilities. This influx of data that is valuable to organizations has become a massive target for hackers and other cyber criminals.
Data that was earlier unusable is now valuable to organizations. Such data is therefore subject to privacy laws and compliance regulations and must be protected. Biometrics offers a high level of security, accuracy and privacy because it is based on the inherent characteristics of individuals. These characteristics may include the iris, fingerprints and voice, and they act as a strong deterrent to hacking attacks. Biometric traits are extremely difficult to replicate and are among the most accurate means known of verifying individual identity. We can thus conclude that biometrics can play a vital role in maintaining privacy and providing a highly secure environment for big data projects.
What biometric technologies can be used to enhance security in big data projects?
To understand how different biometric modalities can be used in big data projects, we will look at one of the most ambitious big data projects. The Aadhar project is a combination of big data and biometrics that is working to build an identity verification database of India’s more than one billion residents. This is the world’s largest biometric database, and it aims to provide each resident with a unique identification number that will help them access government welfare and services. The project uses a combination of iris scans and digital fingerprints for each resident.
Let us look at how an iris scan works in big data projects. Iris recognition is the technology behind the scan, and the scanning process is divided into three key stages: image acquisition and processing, feature extraction, and matching.
The first stage is image acquisition, in which the user positions his or her eye close to the lens and remains perfectly still. The user’s iris image is captured and then converted to the required digital format. In the feature extraction stage, the unique features of the iris are extracted and stored in a template database that holds the unique information for each user. In the matching stage, the existing template is compared with the data currently captured by the sensor. If the two scans match, the user is authenticated successfully.
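The matching stage described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it assumes each template is a fixed-length binary code (a common representation for iris templates) and that two scans are compared by the fraction of differing bits. The function names, the sample templates and the 0.32 threshold are hypothetical and are not taken from any particular system.

```python
from typing import Dict, Optional


def hamming_fraction(a: bytes, b: bytes) -> float:
    """Return the fraction of bits that differ between two equal-length templates."""
    if len(a) != len(b):
        raise ValueError("templates must have the same length")
    differing = sum(bin(x ^ y).count("1") for x, y in zip(a, b))
    return differing / (len(a) * 8)


def match(probe: bytes, templates: Dict[str, bytes],
          threshold: float = 0.32) -> Optional[str]:
    """Return the id of the closest stored template, or None if none is close enough.

    The threshold is an illustrative assumption; a real deployment would tune it.
    """
    best_id, best_dist = None, 1.0
    for user_id, stored in templates.items():
        dist = hamming_fraction(probe, stored)
        if dist < best_dist:
            best_id, best_dist = user_id, dist
    return best_id if best_dist <= threshold else None


# Hypothetical template database: user id -> stored binary template.
templates = {
    "alice": bytes([0b10101010] * 4),
    "bob": bytes([0b11110000] * 4),
}

# A fresh scan of "alice" with one noisy bit still matches her template.
noisy_alice = bytes([0b10101011, 0b10101010, 0b10101010, 0b10101010])
# A scan from someone who is not enrolled is far from every template.
stranger = bytes([0b01010101] * 4)
```

Comparing by distance rather than by exact equality reflects the fact that two scans of the same eye are never bit-for-bit identical, so a tolerance is needed to accept genuine users while still rejecting others.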
Fingerprint scanning is a very widely used biometric technique and plays a key role in maintaining privacy and security in various industries. Fingerprint biometrics has also been extremely helpful in forensic science for identifying criminals. In our example big data project, Aadhar, fingerprint biometrics is used to record the information of each resident by scanning their fingers and thumbs for future reference.
The fingerprint scanning process used in the Aadhar big data project has two parts: enrollment and authentication. Both processes share a common database.
The enrollment process captures snapshots of the user’s fingerprints, extracts the distinguishing features of the fingerprints from these snapshot images and stores this information in a database. A fingerprint sensor captures the images, and a feature extraction function processes them. Feature extraction acts as an encoding step: it converts each image into a compact template, and each template is associated with a unique id and stored in the database.
The key step in authentication is matching. When an already enrolled user of the Aadhar scheme wants to access a service, he or she scans a fingerprint. The scan passes through the same feature extraction step, and the matching function checks the resulting features against the existing template database. If a match is found with the current fingerprint scan, the user is successfully authenticated.
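The two-part flow above, enrollment followed by authentication against a shared template database, can be illustrated with a small Python sketch. Everything here is a simplifying assumption rather than a description of the Aadhar system: a scan is represented as a list of hypothetical (x, y) minutiae points, `extract_features` is a stand-in for a real feature extractor, templates are compared by set overlap, and the user presents a claimed id along with the scan.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Iterable, Tuple

# Hypothetical feature type: grid position of a ridge ending or bifurcation.
Minutia = Tuple[int, int]


def extract_features(scan: Iterable[Minutia]) -> FrozenSet[Minutia]:
    """Stand-in for a real extractor; here the 'image' is already a list of points."""
    return frozenset(scan)


def similarity(a: FrozenSet[Minutia], b: FrozenSet[Minutia]) -> float:
    """Jaccard overlap between two feature sets (0.0 = disjoint, 1.0 = identical)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


@dataclass
class TemplateStore:
    """In-memory template database shared by enrollment and authentication."""
    threshold: float = 0.6  # illustrative acceptance threshold, not a real setting
    templates: Dict[str, FrozenSet[Minutia]] = field(default_factory=dict)

    def enroll(self, user_id: str, scan: Iterable[Minutia]) -> None:
        # Enrollment: extract features and store them under the unique id.
        self.templates[user_id] = extract_features(scan)

    def authenticate(self, user_id: str, scan: Iterable[Minutia]) -> bool:
        # Authentication: extract features from the new scan and compare them
        # with the stored template for the claimed id.
        stored = self.templates.get(user_id)
        if stored is None:
            return False
        return similarity(stored, extract_features(scan)) >= self.threshold


store = TemplateStore()
store.enroll("user-1", [(1, 2), (3, 4), (5, 6), (7, 8), (9, 10)])

# Same finger, one point captured differently: overlap is 4/6, so it is accepted.
genuine = store.authenticate("user-1", [(1, 2), (3, 4), (5, 6), (7, 8), (11, 12)])
# A different finger shares only one point: overlap is 1/6, so it is rejected.
impostor = store.authenticate("user-1", [(0, 0), (3, 4)])
```

A production matcher would also have to tolerate rotation, translation and partial prints, and would protect the stored templates at rest; the sketch leaves both out to keep the enrollment and authentication steps visible.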
In this article, we have discussed how biometric devices and technologies can be used in big data projects to improve security and privacy. Fingerprint and iris scans are popular biometric technologies that many organizations have adopted to protect their user data.