Zain Khalid

Full Stack Web & Android Developer

Ubicuo

Introduction

This was my final year project at university. The project centered around facial detection and facial recognition, two sub-areas of computer vision. With this core, a system for automating attendance at our university was developed. As an additional verification measure alongside facial recognition, beacons were incorporated into the project to confirm that students were physically present in class.

A complete system was built, comprising a database of students and applications that give users a view of what is happening in the background. Additionally, an administration program was created for easily creating classes and adding students to them. The project's initial aim was to implement a session-based system built around classes that would extract facial features of students enrolled in given classes at the appropriate times and record them in a database that could then be viewed by teachers or students.

The primary objectives were to streamline attendance and automate it as much as possible, to lessen the workload of teachers, and to prevent students from cheating attendance by tracking their movements during class times in an unobtrusive manner.

Justification

This project was chosen for two reasons. Firstly, the team had no previous exposure to computer vision, so we felt this idea would be a good opportunity to be introduced to it. Secondly, the project allowed us to combine many different areas of computer science that we learnt throughout our degree. These areas were:

  • Relational Databases - design & implementation;

  • Mobile application development (Android);

  • Web development;

  • Socket programming to connect two or more programs together;

  • Object-Oriented Principles put into practice;

  • Low-Level Programming (threads & processes);

  • Distributed Computing - two or more machines working together to perform a task, with fault tolerance;

  • Multiple programming languages - C/C++, Java, PHP etc., all integrated into one project.

As the list above shows, we revisited almost all areas of computer science studied throughout our degree. Additionally, we were able to put our knowledge into practice while combining it into a complete product that could be commercially viable.

Scope

During development only a few people were used for testing the system; however, scalability to a real-world deployment was a design goal and the system was built to scale accordingly. Obviously, scaling to a real-world context would require upgrading the hardware we used throughout the project, which would involve its own integration work. Additionally, while we built the system for universities, we designed it in such a way that it could be applied to multiple use-cases without too much hassle.

Software

A variety of software was used in the project to integrate all its different areas into a seamless system. Integration turned out to be one of the new skills we all had to learn: previously we had only worked on such problems individually, and integration is tough in itself because of the unavoidable errors encountered along the way.

  • OpenCV - the computer vision library we chose for developing our face detection and recognition programs. We researched many computer vision libraries before deciding on OpenCV, weighing pros and cons based on our experience and relevance to the task at hand. It was ultimately chosen for its extensive documentation, open-source licence, and helpful community;

  • MySQL Connector/C++ - a library used minimally, to update the MySQL database from C++ while the face recognition program was running and to load sessions;

  • MAMP/XAMPP - used on Mac and Windows, respectively, to locally host the server that contained our database on our respective machines. Additionally, this software hosted the PHP scripts that the mobile application uses to access the database;

  • Android Studio - used to develop the mobile application; the default IDE for developing Android applications in Java;

  • WordPress - used to develop the website for students. It served mainly as a base layout on which we built custom web pages;

  • C++ IDEs - various IDEs (platform-dependent) used to program the C++ applications for the project. The IDEs used on macOS were Xcode and NetBeans 8.2, with Visual Studio 2015 on Windows.

Hardware

Multiple pieces of hardware were required to make a complete system; each is discussed briefly below:

  • Servers - two were used: the local and the remote. Each server is hosted on a different machine and communicates with the other to exchange the data used to conduct sessions for a class. Within the scope of the project, team members' laptops served as the servers;

  • Cameras - an obvious requirement, needed to capture a video stream that is broken down into individual frames used as input for face detection and recognition;

  • Beacons - used to broadcast a Bluetooth signal in a classroom, which is picked up by smartphones with the application installed. The application can then transmit a message with unique data about the user to the remote server;

  • Android Smartphones - provide the main point of interaction for base users (students/teachers) with the system.

System Architecture

[Figure: system architecture diagram]

The core components are the cameras, beacons and servers. Input is provided by the cameras & beacons, while management and data processing are done by the servers. Obviously, at a large scale there would be multiple instances of all the components across classrooms, all managed by a central, remote server.

The following points describe the basic flow of the overall system:

  • The camera has a wired connection to the local server to make feed transmission seamless (reduced latency) and for security reasons so the feed cannot be tampered with over the network;

  • Whenever a class is in session, the remote server creates a session on the database and signals the local server to start that session. This session is carried out for a set amount of time and all communication between the two servers for it takes place on a unique port number;

  • The yellow line represents the movement of facial data from the students to the remote server. The camera feed is sent directly to the local server, where face detection takes place. Face detection allows the captured frames to be processed and then sent to the remote server, where the images are fed to the face recognition code for an output;

  • The green line represents the movement of beacon data. The primary data in this context is the broadcast to the students' smartphones. On picking up this signal, the application can send a message to the remote server, which then updates the database based on the student's physical location.

Core Functionalities

The core functionalities of the system consist of all the computer vision functionality, which breaks down into three parts: face detection, face pre-processing and face recognition. Face detection and pre-processing take place on the local server. The final output of detection & pre-processing serves as input to facial recognition, which takes place on the remote server. Each of the core functionalities is expanded on in detail below.

Face Detection

Face detection is the initial step in the whole process of facial recognition. At the very least, a video stream is required in which faces can be detected accurately and, at the same time, quickly enough to deliver near real-time results. We looked at multiple algorithms during our research phase and eventually narrowed the choice down to two that we felt would fit our system well.

Viola-Jones (VJ)

The primary algorithm we used for face detection is the Viola-Jones object-detection framework. In OpenCV, the Viola-Jones framework is implemented under the name Haar Cascades. While the framework can be used to detect any object, its main intent was to cater to face detection.

[Figure: Haar features applied to a face image]

As can be seen from the image above, a Haar feature (black & white rectangles) is used to detect whether a face is present by running the different features over an image and getting a single output that determines whether facial features have been found. VJ uses a 24x24 window as a base for calculating features. This window moves across the image until it has calculated all the features over the whole image. The issue is that far too many features are calculated (around 160,000), which is inadequate for real-time face detection. To counter this, integral images, Adaboost and cascades are used with the VJ framework to significantly speed up detection.

  • With integral images, a Haar feature's output is computed from just the corner values of a window instead of summing over the whole window;

  • Adaboost reduces the feature-set to 2500 of the most accurate features;

  • Finally, cascading splits the search into smaller steps so that if a step fails, VJ doesn't have to continue searching a specific area thus reducing overall computation.

All this results in real-time face detection with a high detection rate while computation stays more than manageable. As an example, basic face detection without any processing runs at 93% of the base frame rate of one of our laptops: 6.96 FPS vs. a base of 7.44 FPS. However, there are two main downsides to VJ: first, it is highly sensitive to lighting conditions, so it has to be calibrated appropriately for whatever situation it'll be used in; and second, heads rotated beyond 45º will not be detected, as not enough features will be picked up.

For this reason, we decided to supplement VJ with another algorithm, Kanade-Lucas-Tomasi (KLT), that could account for head rotations and continue to track a rotating head for a limited time.

*It should be noted that it is unfair to expect VJ to initially pick up a face rotated beyond 45º; even at 45º, VJ only has half a face to work with.

[Figure: screenshots of basic face detection]

A screenshot of basic face detection. Notably, drawing a rectangle around the face takes more computing power than the face detection itself.
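For reference, below is a minimal sketch of how Haar-cascade detection is typically set up with OpenCV. It is illustrative rather than the project's actual code; the cascade file path, camera index and detection parameters are assumptions.

```cpp
#include <opencv2/objdetect.hpp>
#include <opencv2/videoio.hpp>
#include <opencv2/imgproc.hpp>
#include <vector>

int main() {
    // OpenCV ships this pretrained frontal-face cascade (path is assumed)
    cv::CascadeClassifier detector("haarcascade_frontalface_default.xml");
    cv::VideoCapture cap(0);  // default camera

    cv::Mat frame, gray;
    while (cap.read(frame)) {
        cv::cvtColor(frame, gray, cv::COLOR_BGR2GRAY);
        cv::equalizeHist(gray, gray);  // VJ is sensitive to lighting

        std::vector<cv::Rect> faces;
        // scale factor 1.1, 3 neighbours, minimum face size 80x80 px
        detector.detectMultiScale(gray, faces, 1.1, 3, 0, cv::Size(80, 80));

        for (const cv::Rect& f : faces)
            cv::rectangle(frame, f, cv::Scalar(0, 255, 0), 2);
        // ... display `frame` or hand the face rectangles to the tracker ...
    }
}
```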

KLT

Optical flow is the pattern of motion of objects across subsequent frames. It is achieved by first running a corner detector (the Shi-Tomasi algorithm) and then combining its results with the Lucas-Kanade algorithm to predict the movement of pixels across subsequent frames; this combination of the two algorithms is called KLT. Optical flow operates on two basic assumptions:

  1. Pixel intensities don't change between consecutive frames;

  2. Neighboring pixels have similar motion.

KLT achieves the basic principles of optical flow by taking a 3x3 patch around a given pixel and calculating the next position of those 9 pixels. On a small scale this is sufficient; however, when there is large motion (a person walking into the room), tracking becomes tougher. Therefore, pyramids are used to build large motions out of smaller motions (pyramids are analogous to cascades in VJ).

VJ + KLT Integration

Within the context of this project, the areas within the rectangles drawn around detected faces are tracked for their motion. Corners are found on the face using Shi-Tomasi and their movement is then tracked using Lucas-Kanade.
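A rough sketch of this combination is shown below, assuming a face rectangle from the detector above; the corner counts and quality thresholds are illustrative, not the project's tuned values.

```cpp
#include <opencv2/imgproc.hpp>
#include <opencv2/video/tracking.hpp>
#include <vector>

// Seed corners inside a detected face rectangle with Shi-Tomasi, then track
// them into the next frame with pyramidal Lucas-Kanade.
std::vector<cv::Point2f> trackFace(const cv::Mat& prevGray,
                                   const cv::Mat& currGray,
                                   const cv::Rect& face) {
    // Restrict corner detection to the face region via a mask
    cv::Mat mask = cv::Mat::zeros(prevGray.size(), CV_8UC1);
    mask(face).setTo(255);

    std::vector<cv::Point2f> corners;
    cv::goodFeaturesToTrack(prevGray, corners, /*maxCorners=*/50,
                            /*qualityLevel=*/0.01, /*minDistance=*/5, mask);
    if (corners.empty()) return {};

    // Pyramidal LK builds large motions out of smaller ones, coarse to fine
    std::vector<cv::Point2f> tracked;
    std::vector<uchar> status;
    std::vector<float> err;
    cv::calcOpticalFlowPyrLK(prevGray, currGray, corners, tracked, status, err);

    std::vector<cv::Point2f> alive;
    for (size_t i = 0; i < tracked.size(); ++i)
        if (status[i]) alive.push_back(tracked[i]);  // keep successfully tracked points
    return alive;
}
```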

Face Pre-processing

A crucial aspect of face recognition is having a set of images that are easy to work with. This is achieved through face pre-processing: the process of editing an image so that it is suitable to work with during recognition. Face pre-processing was done on the rectangle captured around the face and consists of 4 major steps (a short code sketch follows the list):

  1. Geometrical transformation & cropping - Aligning the image so that the face is straight and cropping it so that only the essential features of the face are visible in the image. In the project, aligning was initially a major issue, but as it progressed, through the implementation of KLT and the use of an upright camera that wouldn't move during use, this issue was minimized.

    The appropriate cropping was achieved through trial & error; it was highly dependent on the camera being used, primarily the aspect ratio it was set to. The rectangle around the face was resized to remove any extra noise captured around the face, first to make recognition more accurate and second to reduce the size of the image sent to the server.

  2. Histogram Equalization - Equalization of the image is the most important part of pre-processing as it makes all images that'll be worked on consistent in terms of their brightness and contrast. Therefore, corners detected on the images will be more similar as they rely on the differences in pixel intensity.

    While equalization seems simple enough, when it comes to equalizing the face there is an issue of one side of the face always coming out darker than the other. For this reason, the two sides of the face have to be first separated, then equalized gradually towards the center to achieve a consistent output.

  3. Smoothing - Reduction of the 'roughness' of an image is required to remove harsh details and pixel noise that would interfere with face recognition. This was a relatively simple step because all that is required is the application of an appropriate filter. The project used a bilateral filter as it gave a sufficient output.

  4. Elliptical Mask - Optional step to remove extraneous details by drawing an elliptical mask around the face. This step wasn't used in the project as it didn't significantly improve face recognition results, which made it redundant given the additional processing required on each image.
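The sketch below strings steps 1-3 together, assuming the face rectangle has already been cropped out. The target size and filter parameters are assumptions; the project tuned its own values by trial and error, and the gradual blend toward the centre described in step 2 is omitted for brevity.

```cpp
#include <opencv2/imgproc.hpp>

cv::Mat preprocessFace(const cv::Mat& faceBGR) {
    cv::Mat gray;
    cv::cvtColor(faceBGR, gray, cv::COLOR_BGR2GRAY);
    cv::resize(gray, gray, cv::Size(100, 100));  // fixed working size

    // Equalize the two halves separately, since one side of the face
    // usually comes out darker than the other
    cv::Mat left  = gray(cv::Rect(0, 0, 50, 100));
    cv::Mat right = gray(cv::Rect(50, 0, 50, 100));
    cv::Mat tmp;
    cv::equalizeHist(left, tmp);  tmp.copyTo(left);
    cv::equalizeHist(right, tmp); tmp.copyTo(right);

    // Bilateral filter smooths pixel noise while preserving edges
    cv::Mat smooth;
    cv::bilateralFilter(gray, smooth, /*d=*/5, /*sigmaColor=*/50, /*sigmaSpace=*/50);
    return smooth;
}
```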

[Figure: an equalized face image]

A sample of an equalized image after going through the above 4 pre-processing steps

Face Recognition

Face Recognition is the heart of the project. It is essentially identifying and verifying people by their facial features using a computer application. Initially, a database containing facial features is created. The project uses this database to create a classifier for verification in order to capture the students' attendance. OpenCV comes with three commonly used algorithms for face recognition: Eigenfaces, Fisherfaces, and Local Binary Patterns Histograms (LBPH). All three were tested, and the algorithm that fit the project best was LBPH.

Local Binary Pattern Histogram (LBPH)

Local Binary Patterns Histograms is a texture descriptor that transforms the image into an array containing local features of the image; these arrays, known as histograms, can then be used for image analysis. LBPH is widely used on monochrome images. It works by thresholding 3x3-pixel blocks of the image: each neighbor is marked 0 if its intensity is less than the center pixel's, and 1 if it is greater than or equal to it. It should also be noted that LBPH is illumination invariant, i.e. it is not affected by uniform changes in lighting: the pixel values may all increase, but their relative differences stay the same.

[Figure: the basic LBP operator]

[Figure: how LBPH works]
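To make the operator concrete, here is a small self-contained sketch of the basic 3x3 LBP code described above (boundary handling is left out; the caller must keep r and c at least one pixel inside the image).

```cpp
#include <opencv2/core.hpp>
#include <cstdint>

// Compute the 8-bit LBP code for the pixel at (r, c): each of the 8
// neighbours contributes one bit, set when its intensity is >= the centre's.
uint8_t lbpCode(const cv::Mat& gray, int r, int c) {
    const uchar centre = gray.at<uchar>(r, c);
    // Clockwise neighbour offsets, starting at the top-left
    const int dr[8] = {-1, -1, -1, 0, 1, 1,  1,  0};
    const int dc[8] = {-1,  0,  1, 1, 1, 0, -1, -1};
    uint8_t code = 0;
    for (int i = 0; i < 8; ++i)
        if (gray.at<uchar>(r + dr[i], c + dc[i]) >= centre)
            code |= static_cast<uint8_t>(1u << i);
    return code;
}
// Histograms of these codes, taken over a grid of cells and concatenated,
// form the LBPH descriptor for a face.
```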

Justification for LBPH

LBPH was picked as the best fit for the project due to its ability to learn easily compared to Eigenfaces and Fisherfaces. LBPH can receive a picture of a face and verify whether it matches the classifier; if it does, the picture is saved and the classifier is updated in place. In contrast, Eigenfaces and Fisherfaces have to be retrained from scratch for their classifiers to be updated. The project was designed with machine learning in mind. As stated above, LBPH is also illumination invariant; trying to control the light in a classroom would be difficult, and such problems are best avoided.
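In OpenCV's face module (part of opencv_contrib) this difference shows up as train() versus update(): update() extends an existing model in place and is only implemented for LBPH, while the Eigenfaces and Fisherfaces recognizers throw an error instead. A minimal sketch, with illustrative names:

```cpp
#include <opencv2/face.hpp>
#include <vector>

// Add one newly verified face to an already-trained LBPH model in place.
void addNewFace(cv::Ptr<cv::face::LBPHFaceRecognizer>& model,
                const cv::Mat& newFace, int studentId) {
    std::vector<cv::Mat> images{newFace};
    std::vector<int> labels{studentId};
    model->update(images, labels);  // LBPH-only; Eigen/Fisher must retrain
}
```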

Administration

The project needed an administrator program to manage and create classifiers of the facial features of students enrolled in a specific class. Its main functionalities were to take pictures of students, retrieve the list of students enrolled in a particular class (lab or tutorial), and create a face recognizer classifier for that class. The classifier could then be used for comparison when taking attendance.

[Figure: the administrator program's main menu]

Registration

A student has their pictures taken when enrolling at the university. This set of pictures is used throughout their time at the university, and can be updated at a later time if required. For the project, we set the number of images to 15, so 15 pictures of the student have to be taken and stored. These pictures contain the facial features of the student.

A camera is set up along with the program. The process starts when the administrator inputs the student's ID number. The camera then turns on and starts a video stream. Whenever a face is detected, the frame undergoes face pre-processing and is saved along with the student's ID number. As mentioned, 15 usable frames are stored.
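A condensed sketch of this capture loop is shown below. The cascade path and the file-naming scheme are assumptions; in practice the saved frame would first go through the pre-processing sketch from the previous section.

```cpp
#include <opencv2/objdetect.hpp>
#include <opencv2/videoio.hpp>
#include <opencv2/imgcodecs.hpp>
#include <opencv2/imgproc.hpp>
#include <string>
#include <vector>

// Capture and store 15 face images for one student, keyed by their ID.
void registerStudent(const std::string& studentId) {
    cv::CascadeClassifier detector("haarcascade_frontalface_default.xml");
    cv::VideoCapture cap(0);
    cv::Mat frame, gray;
    int saved = 0;
    while (saved < 15 && cap.read(frame)) {
        cv::cvtColor(frame, gray, cv::COLOR_BGR2GRAY);
        std::vector<cv::Rect> faces;
        detector.detectMultiScale(gray, faces, 1.1, 3, 0, cv::Size(80, 80));
        if (faces.empty()) continue;
        // Crop the detected face (pre-processing omitted for brevity)
        cv::Mat face = gray(faces[0]);
        cv::imwrite(studentId + "_" + std::to_string(saved) + ".png", face);
        ++saved;
    }
}
```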

Creating the Face Recognizer

One of the key functionalities of the administrator program is creating the face recognizer classifier. The program retrieves the list of classes and the list of all students enrolled in them. For each student enrolled in a specific class, the pictures containing that student's facial features are retrieved and used to create a classifier for the class. This classifier is then used for verification while a class is taking place.
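Under the same assumptions as the registration sketch (15 images per student, files named by student ID), building a per-class classifier with OpenCV's LBPH recognizer looks roughly like this; the output file follows the per-class .XML convention described later.

```cpp
#include <opencv2/face.hpp>
#include <opencv2/imgcodecs.hpp>
#include <string>
#include <vector>

// Train an LBPH classifier from every enrolled student's stored images
// and write it out as an .XML file for the class.
void buildClassifier(const std::vector<int>& studentIds,
                     const std::string& classCode) {
    std::vector<cv::Mat> images;
    std::vector<int> labels;
    for (int id : studentIds) {
        for (int i = 0; i < 15; ++i) {
            cv::Mat img = cv::imread(std::to_string(id) + "_" +
                                     std::to_string(i) + ".png",
                                     cv::IMREAD_GRAYSCALE);
            if (img.empty()) continue;
            images.push_back(img);
            labels.push_back(id);  // the student ID doubles as the label
        }
    }
    auto model = cv::face::LBPHFaceRecognizer::create();
    model->train(images, labels);
    model->write(classCode + ".xml");  // write() is save() on older OpenCV 3.x
}
```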

Server Programming

The server code holds all the face detection and recognition code and supplements it by connecting the two together into a well-integrated backend for the whole system. Discussed in detail below is the remote server, which contains the code for face recognition, the administration, and the overall management of the database. The local server will also be discussed; it holds the code for face detection and face pre-processing.

Remote Server

The remote server can be thought of as the central component of the whole project. It coordinates with the local server to create and destroy sessions as scheduled. On starting, the server loads the sessions for the given day and then checks the time every half hour to see if a class is about to start. If one or more classes are about to start, the remote server generates a random port number in the range 20020 to 20100 and forks into a new process.

[Figure: remote server code excerpts]

The newly forked process creates a session ID and, based on this ID, creates a new session entry in the database along with attendance entries for the students in that session. Following this, the local server is signaled by this process to start a new session on the same port number with the given session ID. With this, the remote and local servers are set up for synchronous transfer of images from local to remote. For as long as the local server is running, the process on the remote server keeps running too. Whenever it receives an image, it converts it from an array of characters back into an image, which is then passed to the facial recognition code.

The facial recognition code returns the student ID as an output; if it hasn't recognized the image, it returns -1. Recognition is run against an .XML classifier file created especially for the class through the administration program. The student ID is passed to a function which updates the specific attendance entry in the database. Once the session is completed, the process forked specifically for that session exits, while the original process continues to check the time every half hour.
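The sketch below condenses this flow: fork a session process on a random port in the 20020-20100 range, load the class's .XML classifier, and run each received byte array through it. Socket handling, signalling the local server and the database update are elided; the classifier file name is an assumption, and read() is load() on older OpenCV 3.x versions.

```cpp
#include <opencv2/face.hpp>
#include <opencv2/imgcodecs.hpp>
#include <unistd.h>
#include <cstdlib>
#include <vector>

// Recognize one received frame; returns the student ID, or -1 for no match.
int recognize(const std::vector<uchar>& bytes,
              const cv::Ptr<cv::face::LBPHFaceRecognizer>& model) {
    cv::Mat face = cv::imdecode(bytes, cv::IMREAD_GRAYSCALE);
    if (face.empty()) return -1;
    int studentId = -1;
    double confidence = 0.0;
    model->predict(face, studentId, confidence);
    return studentId;
}

int main() {
    auto model = cv::face::LBPHFaceRecognizer::create();
    model->read("CSCI321.xml");          // per-class classifier (assumed name)

    int port = 20020 + std::rand() % 81; // random port in 20020..20100
    if (fork() == 0) {
        // Child: serve the session on `port`, calling recognize() on every
        // image byte array received from the local server, then exit.
        _exit(0);
    }
    // Parent: keep checking the timetable every half hour for new sessions.
}
```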

[Figure: remote server console output after a successful recognition, "Face found successfully!"]

The remote server also has a backup program with identical code but a different port number in case the original crashes. Before going down, the original server launches this backup remote server.

Local Server

The local server was originally thought to be quite simple. However, during implementation it was realized that it would resemble the remote server: it has to listen for a signal from the remote server to start a new session, activate the cameras, and then start sending images of detected faces to the remote server.

[Figure: local server code excerpts]

As discussed in the previous section, the local server receives a signal from the remote server in the form of a port number and a session ID. The local server then also forks into a new process while the original keeps listening. The forked process starts the face detection portion of the code and records for a set amount of time. During the project this time was kept to a minimum, but in a real-world implementation it would run for 30 minutes each at entry and exit.

Whenever a face is detected, a rectangle is drawn around it and this rectangle is converted into an image that is pre-processed. The pre-processed image is then sent in a thread to the remote server on the specific port number assigned to that session, ensuring that sessions do not end up colliding with each other. Threading ensures that the transmission from the camera is not interrupted and keeps a constant flow of frames.
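Below is a sketch of that send path: the pre-processed face is serialized with imencode and pushed from a detached thread so the capture loop never blocks. The POSIX socket calls are standard, but the 4-byte length-prefix framing is an assumption rather than the project's exact wire format.

```cpp
#include <opencv2/imgcodecs.hpp>
#include <sys/socket.h>
#include <arpa/inet.h>
#include <cstdint>
#include <thread>
#include <vector>

// Serialize a face image and send it over the session's socket.
void sendFace(int sock, cv::Mat face) {
    std::vector<uchar> buf;
    cv::imencode(".jpg", face, buf);       // Mat -> byte array
    uint32_t len = htonl(static_cast<uint32_t>(buf.size()));
    send(sock, &len, sizeof(len), 0);      // length prefix first
    send(sock, buf.data(), buf.size(), 0); // then the JPEG bytes
}

// Called from the detection loop; detaches so capture is never interrupted.
void onFaceDetected(int sock, const cv::Mat& face) {
    std::thread(sendFace, sock, face.clone()).detach();
}
```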

[Figure: image sent to the remote server]

The local server is also set up for fault tolerance in case the remote server crashes. It does this by saving any detected faces into a folder on the machine, which can later be sent to the remote server once it is running again. The backup folder is created especially for the session to remove any confusion about which files belong to which session, and the files themselves are named according to the session and their iteration number.
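A small sketch of that backup path, following the naming scheme described above; the exact folder and file-name format are assumptions, and the folder is assumed to be created when the session starts.

```cpp
#include <opencv2/imgcodecs.hpp>
#include <string>

// Write a face frame into the session's backup folder for later replay.
void backupFrame(const cv::Mat& face, int sessionId, int iteration) {
    const std::string dir = "backup_" + std::to_string(sessionId);
    cv::imwrite(dir + "/" + std::to_string(sessionId) + "_" +
                std::to_string(iteration) + ".jpg", face);
}
```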

[Figure: fault tolerance on the local server]

Policy

It was agreed early on that for Ubicuo to be a success, some simple rules had to be adhered to for the system to work correctly. At the same time, however, the goal of the overall system is to be ubiquitous (where do you think the name comes from? 😏) and to not interfere too much with the natural movements of its users. The system's face detection works well enough that students don't have to stop midway just to get their face captured; however, some awareness is required on their part for the system to work optimally.

For this reason, a simple set of rules has been compiled for students to respect in order for Ubicuo to work well:

  • While walking into and out of a classroom, do not turn your face away from the camera in a way that prevents it from being captured;

  • Do not cover your face up in any way while walking past the camera so that it isn't able to capture it;

  • Have the Ubicuo application on your smartphone for physical location verification, additionally, make sure you bring your smartphone to labs & tutorials;

  • Students will have their pictures for the system taken as soon as they enroll at the university;

  • The students are required to attend a minimum of 75% of all tutorials and labs in all subject levels as per the university’s attendance requirements;

  • If by some unfortunate circumstance the system is not working as intended, the teacher has the final say on whether the student is present or not. This only takes effect when the system fails;

  • The system may update the student's facial details from time to time; this includes updating the picture database.

Database

The database is an important part of the project as it contains all the data that'll be used to provide users with information. For this reason, a well-designed database was required so that lookups were fast yet simple to implement. Initially, issues were faced designing it, but these cleared up rather easily once practical testing began, because it became easier to decide what data to store and display. Below is a discussion of the database design and the implementation of that design.

Database Design

The initial design of the database was well thought out and took into account the later scalability of the system. The design phase was brief because a practical implementation would provide more information about how to improve on the design. Because of this, the overall implementation of the database can be thought of as incremental.

The issues faced with the database were mainly encountered in this phase. The main reason is that, as mentioned above, there was no working system to test against, so it was hard to see how the database would actually be accessed and manipulated. While the design made sense in theory, numerous changes were made during the implementation phase to account for real-world issues.

[Figure: database schema]

Database Implementation

The implementation phase was relatively simple because the design was well thought out and heavily scrutinized; all it required was creating the tables and then making the appropriate queries in different parts of the project. Obviously, implementing the database and then querying it required a lot of learning on our part, as we'd previously only worked with Oracle MySQL. For Ubicuo, the database had to be integrated with the C++ code of the remote server and the Android code of the mobile application.

To connect the remote server to the database, the library MySQL Connector for C++ was used. This library made connecting, reading, and writing to the database simple. Making the data read from the database useful on the remote server was tricky, as it required converting strings to relevant variable types that could be manipulated easily within C++.
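A minimal sketch of that connector usage, using the JDBC-style Connector/C++ 1.1 API; the schema, table and column names here are assumptions, not the project's actual schema.

```cpp
#include <mysql_driver.h>
#include <mysql_connection.h>
#include <cppconn/prepared_statement.h>
#include <memory>

// Mark one student present for a session.
void markPresent(int sessionId, int studentId) {
    sql::mysql::MySQL_Driver* driver = sql::mysql::get_mysql_driver_instance();
    std::unique_ptr<sql::Connection> con(
        driver->connect("tcp://127.0.0.1:3306", "user", "password"));
    con->setSchema("ubicuo");

    std::unique_ptr<sql::PreparedStatement> stmt(con->prepareStatement(
        "UPDATE attendance SET present = 1 "
        "WHERE session_id = ? AND student_id = ?"));
    stmt->setInt(1, sessionId);
    stmt->setInt(2, studentId);
    stmt->executeUpdate();
}
```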

The Android application connects to the database through PHP scripts stored on a server. Throughout the project, the scripts were stored on our machines for local access, but as tried and tested, they can just as easily be hosted online and accessed by the application; this would be the case in a real-world implementation. The scripts contain the information necessary to connect to the database, query it, and return results encoded specifically for display on the user's device.

Beacons

Beacons are used for physical location verification in the attendance system. Since none of the team members had previous working experience with Bluetooth Low Energy (BLE), the communication platform on which beacons are built, extensive initial research was required to learn how the technology works and how it could be customized for the needs of the project. It was decided that Estimote Beacons would be used, primarily because the team had access to the hardware at no cost, but also because Estimote has useful documentation, a well-laid-out API, and a helpful developer community. Overall, implementing beacons in Ubicuo proved to be a great learning experience, mainly because the field of IoT is only starting out and beacons are still in their infancy.

Regions

A beacon region is like a filter or a regular expression; it is up to the developer/user to define what a region matches. Each beacon is identified by three values (a small code sketch of the matching logic follows):

  • UUID - most commonly represented as a string, e.g. "B9407F30-F5F8-466E-AFF9-25556B57FE6D". It is a standard identifying system which allows a 'unique' number to be generated for a device (or, in the case of beacons, a manufacturer, application, or owner);

  • Major Number - an unsigned short integer, i.e. an integer ranging from 1 to 65535 (0 is a reserved value). Major values are intended to identify and distinguish a group of beacons; in the project, all beacons in Block 14 can be assigned major value 14;

  • Minor Number - also an unsigned short integer, like the major number. Minor values are intended to identify and distinguish a single beacon; individual classroom beacons within a group are assigned a unique minor value, so the beacon in classroom 121 of Block 14 has Major:Minor 14:121.

Major and minor values are numbers assigned in order to identify beacons with greater accuracy than using the UUID alone. In our case, the entire UOWD campus is considered one region, as we used the beacons' UUID "B9407F30-F5F8-466E-AFF9-25556B57FE6D" to filter beacons and define our region.
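The region-matching logic itself is simple. The app used the Estimote Android SDK (Java), so the C++ below is only a language-neutral illustration of the UUID/major/minor filtering, not the SDK's API:

```cpp
#include <cstdint>
#include <optional>
#include <string>

struct Beacon {
    std::string uuid;
    uint16_t major;  // e.g. 14  -> Block 14
    uint16_t minor;  // e.g. 121 -> classroom 121
};

struct Region {
    std::string uuid;               // always required
    std::optional<uint16_t> major;  // unset -> match any block
    std::optional<uint16_t> minor;  // unset -> match any classroom

    bool matches(const Beacon& b) const {
        return b.uuid == uuid &&
               (!major || *major == b.major) &&
               (!minor || *minor == b.minor);
    }
};
// {UUID} matches the whole campus, {UUID, 14} every beacon in Block 14,
// and {UUID, 14, 121} only the beacon in classroom 121.
```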

[Figure: beacon identifiers (UUID, major, minor)]

Monitoring

Put simply, beacon monitoring can be considered a geo-fence: a virtual barrier that's usually defined using a set of geographic coordinates. While the application is monitoring, a user moving in or out of the region triggers "enter" or "exit" events, which the application can react to. For the project, monitoring wasn't used, as it wasn't precise enough for our needs; however, it was useful for testing and for learning how to use beacons. The system instead used beacon ranging, which helps determine a user's proximity to a specific beacon in a classroom.

Ranging

While monitoring creates a virtual fence to detect when a student is moving in and out, ranging actively scans for any nearby beacons and delivers results to the user every second (the broadcast rate can be set on the beacon by the system administrator).

To determine whether students are in a classroom, the system needs to check if they are in close range of the allocated beacon, and this is done by the ranging process. The signal broadcast by a beacon is received by the student's phone and relayed to the remote server by the mobile application for verification.

[Figure: visualization of three beacons' broadcast regions]

Above is a simple visualization of three different beacons located in their major regions; the minors in this case can be assumed to be 1 for each. All three broadcast with the same radius, hence the equal-sized regions; the grey region has no broadcast coverage. Of course, there is bound to be overlap at the edges of the radius. This is helped by the fact that classrooms are separated by walls, which significantly attenuate the broadcast signal; therefore, within a class a student should get a signal from the appropriate beacon. Additionally, the application has been programmed to select only the beacon with the strongest signal.
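That "strongest signal wins" rule reduces to picking the maximum RSSI among the beacons seen in a ranging pass. A sketch follows (in C++ for consistency with the other examples; the real logic lived in the Android app):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct RangedBeacon {
    uint16_t major, minor;
    int rssi;  // dBm: closer to 0 means stronger, so -45 beats -80
};

// Return the beacon with the strongest signal, or nullptr if none were seen.
const RangedBeacon* strongest(const std::vector<RangedBeacon>& seen) {
    if (seen.empty()) return nullptr;
    return &*std::max_element(seen.begin(), seen.end(),
        [](const RangedBeacon& a, const RangedBeacon& b) {
            return a.rssi < b.rssi;
        });
}
```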

Android Application

The Android application was a great learning experience in this project. It involved learning front-end design following Google's Material Design philosophy, and backend programming to connect the application to a remote database and fetch relevant data. Discussed below are the different views of the application and how they were implemented.

Student/Teacher View

Initially it was thought that two separate applications could be created, one for the teacher and one for the student. However, further research proved this to be redundant, as both views share many features and any differences could easily be masked while the application ran. The app differentiates between teacher and student through the login credentials entered: since each login requires a UOW email, a teacher's email will always contain 'uowdubai' and a student's email will always contain 'uowmail'. These two pieces of text are mutually exclusive; a student email cannot contain 'uowdubai' and vice-versa.

Based on the e-mail entered, the application calls different PHP scripts to log the user in. The homepage of the application is similar for both teachers & students, displaying the subjects they are teaching or enrolled in, respectively.

On a successful login, parameters that'll be used throughout the session are bundled with the intent going to the next activity. Error checks have been implemented as well; the app checks for Internet connectivity on the device and for empty fields. The subject list shown on the home screen is a RecyclerView, with each item implemented as a CardView in a two-column GridLayout. Doing this proved challenging, but it was the look the team had envisioned from the beginning.

Attendance

Shown below is the attendance view. Tabs were used to create two fragments (labs & tutorials), each containing a RecyclerView to display the attendance. Tabs were used as they make navigating between the two screens simple. Implementing both tabs and RecyclerView was a good learning experience. As can be seen in the screenshot, every screen has a personalized color that is passed over from the home screen; these colors are then applied throughout the activity and the fragments of the subject. At the same time, every subject screen has a personalized name for the subject. The item view in the RecyclerView was also designed specifically for the application to display relevant information in an easy manner.

Backend Integration

As briefly mentioned above, PHP scripts are hosted on a server, and these are the scripts the application calls to fetch data from the database for each user. There are separate scripts for students and teachers, used throughout the application for different screens. Special classes were created on the Android side to make sense of the information supplied by the database and display it as required.

In all the screenshots above, the subject list and the attendance lists for labs & tutorials within a specific subject are loaded dynamically based on the logged-in user. All the information is loaded in threads so as not to cause lag in the application; for this reason, the app is very responsive to user input. Of course, fluidity between different screens of the app depends on connection speed & latency.

*All testing was conducted on a local network connection

Development

Integration

Ubicuo consists of many different aspects that all have to be integrated together and function without hiccups. While developing all these separate parts of the project was a task on its own, the biggest undertaking was combining everything so that it worked seamlessly. Notably, almost every component being combined had to be re-configured slightly to work with the rest of the system. This was expected, and so the time integration would take was factored into the schedule.

The main point of integration was getting the server applications and the Android application to connect with the database and then write to and query it. Early on during development, making the database and integrating it seemed somewhat premature, as the applications themselves were very bare and couldn't really display the data they'd be fetching. For this reason, generic variables were created within the programs for use during testing, to make sure everything worked well together.

Following this method turned out to be productive because it helped the team realize how to implement the database and which attributes to create & configure so that they could be easily used both in the database and in the applications. An example of this is the time variables in C++ vs. those in MySQL: time in C++ is very low-level, while MySQL has rich, well-implemented time types. In practice, C++ received time as a string that had to be converted into a low-level variable before it could be manipulated.
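For illustration, a MySQL DATETIME arrives over the connector as a string such as "2017-04-12 09:30:00" (the value shown is made up) and has to be parsed into a time_t before C++ can compare or manipulate it; a minimal sketch:

```cpp
#include <ctime>
#include <iomanip>
#include <sstream>
#include <string>

// Parse a MySQL DATETIME string into a time_t (interpreted as local time).
std::time_t parseMysqlDatetime(const std::string& s) {
    std::tm tm = {};
    std::istringstream in(s);
    in >> std::get_time(&tm, "%Y-%m-%d %H:%M:%S");
    return std::mktime(&tm);
}
```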

Another significant point of integration was getting the local server application and the remote server application to communicate with each other over a network connection. A lot of coordination was required between the two programs in order to create a session entry on the database, start a session on the remote & local server, and to then manage the transfer of data from the local server to the remote server.

Development Strategies

The development strategy used was the incremental build model. Since most components of the project were new to the team, the tasks were divided into smaller, manageable parts and ordered by importance. Below is a brief breakdown:

  • Face Detection & Pre-processing;
  • Face Recognition;
  • Socket Programming to connect servers;
  • Server integration with face detection & face recognition;
  • Database design & implementation;
  • App design & implementation;
  • Beacon implementation.

Problems Encountered & Overcoming Them

As with every project, challenges and problems occurred. Initially, implementing the desired face detection (KLT + Viola-Jones) was worrying: on their own the two worked perfectly, but they just wouldn't cooperate to the project's desired expectation. After further research and trial and error, both components worked to the team's delight. The mistake lay in the way the two components were being combined: KLT's ability to mask a region was not being used. Once this error was realized, achieving the task was handled quite efficiently relative to the time it took to figure out the error in the first place.

Creating the database design was filled with lively arguments and disputes, and this helped the team end up with a well-designed database architecture. The initial database design the team agreed on seemed like the ideal design, but after consulting with the associate dean it was considered too complex for an attendance system; so complex, in fact, that attendance could not even be recorded. The subsequent database design was carefully planned according to the project's requirements. The initial design had been created with scalability in mind, but such complexity was not required within the scope of the final project. Therefore, a simpler database was implemented for the project, and once everything worked together, additional tables could be added for future scalability.

For a project so heavily involved with cameras, one would think that the group shouldn't have had problems with them. It was originally proposed to use GoPro cameras, but it was difficult to incorporate them into the project: expensive hardware was needed for them to work, and due to lack of capital the idea was discarded. An external webcam was bought for the project, but because of its low quality the program had a hard time recognizing faces. In the end, DSLR cameras were used, and the quality of recognition improved markedly after the switch. Throughout the project, the laptop webcams were used for simplicity's sake, while the above-mentioned cameras were tested periodically, and extensively once the final system was implemented. This was done so that development could take place rapidly instead of having to fuss over which camera to use; it proved the more pragmatic approach.

The local servers for the project were originally supposed to be Raspberry Pis. It took the team quite a long time to install and configure OpenCV on them, and during the first couple of tests the Raspberry Pi was not able to perform anywhere close to the desired expectations. Given the time it took just to set them up, an immediate decision was made to drop the Raspberry Pi from the project. During the project, the team members' laptops acted as the servers, but a real-world implementation would obviously involve these machines being replaced by much more powerful ones.

Server connectivity was not initially expected to play the role it ultimately played, and because of this the group wasn't very focused on it until problems cropped up. However, issues were figured out quickly, as all group members had experience with socket programming in multiple languages from multiple subjects at university. The primary issue was sending an image through to another machine and recreating it on the receiving side, an obviously essential step for carrying out face recognition. Through extensive online research and debugging this was achieved in good time, and it allowed the group to optimize the overall communication between the servers.

Conclusion

Lessons Learnt

The overall project, in its theme, was itself the biggest technical learning experience. We learnt quite a bit about computer vision, although it is a very diverse field. We delved deep into OpenCV and learnt our way around it, which will definitely serve the group members well in the future. Figuring out how to install libraries may seem like a minor issue, but troubleshooting and looking for solutions online is a task on its own, and all group members handled it very well in terms of coming up with solutions that worked for the project.

This learning was compounded by the fact that the group members had both Mac and Windows machines that allowed each to learn a bit about the other OS. The greatest learning lesson would be putting together a whole project of different modules and seeing it perform to expectations. This obviously was daunting in the beginning because the group members had never embarked on such a task before.

Evolution

As with every project, improvements can be made across the board. The team accomplished a lot in one year given that it began with zero knowledge of computer vision; the fact that a complete, functioning system was implemented was an achievement in itself.

However, the system can be called a prototype at best in terms of a real-world implementation. It would take longer than one year to implement a system that is commercially viable as well as generic enough to support different types of industry.

Below are a few improvements that the team had forecasted throughout the year to add as evolutions for a more complete system:

  • The project was not implemented on a large-scale infrastructure - this would be the primary goal in the evolution of the project, to deliver it to actual institutions and businesses would make it a commercially viable product;

  • An application for iOS phones would definitely be a requirement in order to reach a much wider audience;

  • A fixed camera has to be chosen for the project. The team members had to work with various cameras throughout the project’s lifetime, the one chosen in the end would prove too expensive in a real-world implementation;

  • Security components could be incorporated into the project, including encryption of sensitive data, privileges for specific data, etc. Additionally, performance would have to be taken into account, as security implementations usually result in performance degradation;

  • A system that is more fault tolerant and backs itself up regularly would also be a major requirement in the evolution of the project. During development, the team had trouble finding points of failure and deciding how to handle them, even though precautions were taken. In a real-world implementation, crashes of the system would be more unpredictable, and the team simply didn't have the exposure to build a fault-tolerant system of that scale.

References

Face Recognition: https://docs.opencv.org/3.2.0/da/d60/tutorial_face_main.html

Nitin Sharma, Ranjit Kaur, "Review of Face Recognition Techniques", International Journal of Advanced Research in Computer Science and Software Engineering, vol. 6, issue 7, 2016.

Ahonen T., Hadid A., Pietikäinen M. (2004) "Face Recognition with Local Binary Patterns". In: Pajdla T., Matas J. (eds) Computer Vision - ECCV 2004. Lecture Notes in Computer Science, vol 3021. Springer, Berlin, Heidelberg.

Picture: https://www.researchgate.net/publication/269688680_Curvelet_Transform_and_Local_Texture_Based_Image_Forgery_Detection

Beacons: http://developer.estimote.com/ibeacon/


© Zain Khalid 2020