The dissertation project is undoubtedly the largest piece of work a student has to produce during their time at university. When selecting the topic, I wanted a practical project that would allow me to develop new skills. I also wanted to work on something that is personally relevant to me and that can be useful to society.
As I love Apple’s ecosystem, I have been interested in iOS development for quite some time now. I started studying Swift when it was released back in 2014. In 2019, I became especially intrigued by the then newly announced front-end framework, SwiftUI. During my Placement Year, I managed to find time to do some iOS development and build upon the skills I gained years ago. I know that the best way to learn new things is via practical experience. However, I have always found it difficult to do a project just for the sake of doing something. Therefore, I was quite fortunate to be able to select a dissertation that would allow me to further develop my skills in mobile development while solving a real problem. The original title, “Building an iOS application using ResearchKit”, was open-ended enough for me to be able to expand the idea and work on a medical condition I am interested in and have personal experience with. This is why I decided to focus on hay fever.
The platform to be built had to be research-oriented. I explored several ideas. Some involved investigating the different types of antihistamines, as well as whether taking them regularly or on demand affects their efficacy. While doing preliminary research on the latest developments and papers on hay fever, I arrived at the final idea: to study whether and how hay fever and climate change are related.
I spent the Autumn semester developing the idea, doing research, communicating with third parties, as well as working on the system’s architecture and design. The Spring semester was dedicated to the development, testing, and user evaluation.
Achieved Result: First (96%)
Hay fever is among the most common medical conditions, and millions of people suffer from it every year. With symptoms such as itchy eyes and a runny nose, it can greatly affect someone's life. While some sufferers self-manage their symptoms, others are either misdiagnosed or not diagnosed at all. This is partially due to the symptoms being similar to those of many other conditions. However, possibly the biggest issue is that the hay fever season is known to run from March to October. Patients showing symptoms at other times of the year would most probably not even be sent for allergy sensitivity testing.
Climate change is a change in the average weather patterns. This includes changes in precipitation and more intense heatwaves. As a result, seasonal shifting has been observed: spring arrives earlier, and winter is getting shorter. This changes the timing of when flowers and trees bloom. The early onset of spring prolongs the pollen exposure, whereas late spring leads to people being exposed to multiple pollens at once. This is especially dangerous for those who suffer from respiratory diseases, such as asthma. It has been shown that increased concentrations of carbon dioxide lead to increased ragweed pollen production. However, all studies on the topic have been performed in a lab. Because a lab is a controlled environment, many factors, such as the ambient temperature and rainfall, are excluded. While these limited lab experiments lay the groundwork for future work, their results are hard to extrapolate to the whole world, as pollen concentration and air pollution differ based on geographical location.
Self-management of symptoms is something hay fever sufferers have to deal with on a daily basis. When working on the project’s idea, it was very important for me to ensure that the platform would be useful not only to the researchers who would use the app to gather the data they need but also to the users who would download the app on their phones. At the time I was working on the project, there was nothing available on the market that combined both the research and the self-management aspects.
ClimaFever is the first platform that focuses on research while providing additional features, such as displaying pollen and air quality data, allowing self-management of symptoms. ClimaFever makes use of Ambee’s proprietary technology — AIONN-MetNet — which combines a model trained on historical weather data, radar data, satellite information and Gaussian interpolation. The technology demonstrates a great application of Machine Learning. With the available information, ClimaFever was designed to be fundamentally different to what is currently available on the market:
- It is the first application that displays pollen count information throughout the whole year. The others are showing data between March and October and assume low pollen count at all other times. As one of the main purposes of ClimaFever is to change the established perceptions about when the hay fever season starts and how long it lasts, displaying pollen count information in real-time is of vital importance.
- It is the first that provides pollen count with postcode accuracy.
- It is also the first one that provides the actual concentration of the pollens in the environment.
ClimaFever also introduces a few new features which are exclusive to the platform, including pollen counts for subcategories and an individualised pollen health risk.
An extensive evaluation of the developed platform was performed with a varied population of participants. The project received very positive feedback. The personalised health risk, which is a ClimaFever exclusive feature, was identified as the users’ favourite. Some of the most wanted features, such as pollen forecast, were also implemented based on the feedback. One of the main limitations of the project is the high pricing of Ambee’s API. With the current features and the way they are implemented, a single launch of the application costs £0.76 ($1.08). While some optimisations can be made, users would have to pay a costly monthly subscription if the app were made available on the App Store.
In its current implementation, the ClimaFever iOS application contains fourteen core features, five more than the current best app on the market offers. With its research aspect, the dynamic web application, and the possible addition of the suggested extra features, ClimaFever provides everything hay fever researchers and sufferers need. Furthermore, the platform can be adapted to support different types of research. The project demonstrates how modern-day technologies can be used to investigate and resolve problems of high importance.
Supervision is a crucial aspect of the dissertation project. The whole process was very enjoyable for me — mostly because of the amazing academics who were overseeing my progress. I am very glad that, at every stage of the project, my ideas were positively accepted. The feedback I got was always on point and helped to shape the final project and the accompanying report. The comments on the submission showed the supervisor's and the second examiner's satisfaction with the results achieved:
The dissertation was awarded 96% — the highest in the cohort.
The Intelligent Web
The module explores advanced Web technologies. Broadly, it focused on two aspects. The first one is the evolution of the Web throughout the years with an analysis of what future developments could look like. The module explored big data, as well as the socio-economic aspects of search engines and the social Web. On the other hand, advanced programming concepts for the Web were presented. These include asynchronous and bidirectional client-server architectures, creation of Progressive Web Apps (PWA), providing offline user experiences via local persistence and service workers, and more.
Achieved Result: First (89%)
Group Project: Annotate.ME
As the assignment is worth 100% of the module, it aimed to cover all concepts studied throughout the semester. Annotate.ME allows two or more users to join a common chat room with a shared photo that can be annotated. The photo can be loaded from a URL, chosen from a local file or captured using the device’s camera via WebRTC. The participants can communicate with each other using a chat window. As the solution was implemented using Socket.io, both the annotations and the chat features work in real time. The website also implements Knowledge Graph, allowing users to link annotations with information for various entities, such as people or places.
A custom UI was built rather than relying on a front-end framework such as Bootstrap. Such frameworks are usually well suited to prototyping certain types of web projects; however, it was decided that this assignment did not fit into that category and that a custom solution would be a better fit.
The back-end was implemented using Express. The communication between the client and the server is via AJAX with JSON requests and responses. Time-consuming operations are performed using native APIs with promises to avoid blocking the main thread. The application implements a service worker with a stale-while-revalidate strategy. This ensures that files can be delivered fast, if available in the cache, and be up to date the next time they are requested. By implementing a service worker and utilising IndexedDB, the application also works offline. Users can upload images offline, which then get synchronised with the MongoDB server when online again. IndexedDB was used to store the images, the annotations, and the chat content, allowing users to re-join the room even while offline. It also aids the synchronization with the MongoDB server. The project’s documentation was written using Swagger.
The project was awarded 89%.
The module emphasised modern artificial intelligence (AI) techniques and their inspiration from biological systems. Some of the topics covered were evolution, neural systems, the immune system, swarms, and the counterpart concepts they have inspired: evolutionary and swarm-based optimization algorithms, neural computing, as well as cellular automata and agent-based models. The lectures showed the application of these concepts to real-world problems. Python was used for the lab sessions, as well as for the group assessment.
Achieved Result: First (91%)
Group Project: Modelling a forest fire with CA
A problem, inspired by the real world, with a set of requirements was presented. We were required to simulate a forest fire using Cellular Automata (CA). The simulation had to take into account the burnability of the terrain. The simulations had to be performed under different wind speeds and directions. Apart from coding, a scientific report, addressing the requirements and summarising the results, was written.
The simulations matched the expectations, with the fire spreading faster in the direction of the wind and accelerating as the wind speed increased. Several similar papers were studied for inspiration, as well as to compare the findings and the conclusions made. While challenging, the assignment showed that complex systems can be represented and simulated using simple models. However, the results and their interpretation can vary greatly based on the assumptions made and the factors taken into consideration when building the model.
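To make the idea concrete, here is a minimal sketch of a single update step of such a cellular automaton. It is not the model submitted for the assignment: the state encoding, the four-cell neighbourhood, the `wind_boost` parameter and the way wind changes the ignition probability are all assumptions chosen for illustration.

```python
import random

# Hypothetical cell states (not the encoding used in the original project).
EMPTY, FUEL, BURNING, BURNT = 0, 1, 2, 3


def step(grid, burnability, wind=(0, 0), wind_boost=0.3, rng=random):
    """Advance the fire by one time step and return the new grid.

    grid        -- 2D list of cell states
    burnability -- 2D list of base ignition probabilities in [0, 1]
    wind        -- (d_row, d_col) unit vector the wind blows towards
    wind_boost  -- assumed amount added to (or subtracted from) the
                   ignition probability when fire travels with
                   (or against) the wind
    """
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == BURNING:
                new[r][c] = BURNT  # a burning cell burns out after one step
            elif grid[r][c] == FUEL:
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    nr, nc = r + dr, c + dc
                    inside = 0 <= nr < rows and 0 <= nc < cols
                    if inside and grid[nr][nc] == BURNING:
                        # Fire travels from (nr, nc) to (r, c), i.e. along (-dr, -dc).
                        alignment = (-dr) * wind[0] + (-dc) * wind[1]
                        p = burnability[r][c] + wind_boost * alignment
                        p = min(1.0, max(0.0, p))
                        if rng.random() < p:
                            new[r][c] = BURNING
                            break
    return new
```

Calling `step` repeatedly on a grid with one burning cell produces the spread pattern; raising `wind_boost` biases the spread more strongly downwind, while `burnability` lets different terrain types ignite at different rates.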
The submission received the highest grade in the cohort — 89%. An individual grade of 93% was achieved.
The module provided the appropriate skill set needed to analyse legacy software and identify “anti-patterns” using static and dynamic software analysis techniques that support reverse engineering. The module also introduced the key reengineering strategies to improve software structure, at both unit and system scale. These include methods for improving the design, migration strategies, as well as regression testing to guard against the introduction of bugs.
Achieved Result: 2:1 (69%)
Group Project: An Evidence-Based Critique of an Open-Source Software System
This assignment was done in groups of four to five students. Each group could choose the open-source system to work on. The one chosen by the team I was in was Playwright for Java. Playwright is a Java library used to automate Chromium, Firefox and WebKit with a single API.
As part of the analysis, Bash scripts were used to obtain information about the source code. This included calculating the number of lines of code in each file and finding the biggest classes. The Bash scripts were also used for analysing the GitHub repository, giving information about which files are frequently updated and are thus potential candidates for reengineering.
Static analysis had to be performed. By hypothesising about the system, an initial class diagram was built. It was subsequently revised after inspecting the source code. Call graph analysis was also performed to find out which methods and classes are called the most.
Finally, the dynamic analysis gave insight into which methods are likely to be involved in a particular execution phase (via phase analysis). It also provided an overview of the methods and classes that are frequently executed.
Based on the information gathered, the system was critiqued in the context of the best software engineering practices and the main issues were outlined. The findings were presented as a pre-recorded video presentation aided with PowerPoint slides.
An individual grade of 81% was achieved.
Illustration by Freepik Storyset
Assignment: Re-engineering Apache Commons BCEL
The individual project consisted of re-engineering BCEL. BCEL is a framework that enables decompilation and direct manipulation of Java class files. Strategies similar to those in the group project were used to analyse the system and find substantive design weaknesses. Some of the strategies included Bash script analysis, class diagram analysis, call graph analysis, code clone analysis, as well as dynamic analysis.
The second part of the assignment required picking one weakness, identified in the analysis stage, describing the strategy that could be applied to address the weakness, setting up a test set, and applying the re-engineering strategy to the codebase.
Some of the issues identified were related to code duplicates, high coupling (and thus low cohesion), and the existence of God classes, among others. The report investigated potential ways the source code could be changed to get rid of the weaknesses. However, this had to be done carefully to ensure that the proposed solution does not violate other design patterns. Much of the analysis was done in the context of the SOLID design principles, and the report described how some of them have been violated in the code and what the potential fixes could be.
Software Testing and Analysis
The module introduced the problems and techniques of analysing and testing software systems. This includes various coverage criteria, based on control and data flow, as well as logic analysis. Grey-box coverage criteria based on input domain analysis were also presented. Random testing and automatic test generation, as well as search-based testing, were also covered. The second half focused on program slicing, test suite minimisation and prioritisation, mutation testing, as well as model-based testing with the use of Finite State Machines (FSM).
Achieved Result: First (72%)
Assignment: Automated Tool Support for Logic Coverage of Java Code
The aim of this assignment was to develop automated tool support for logic testing of Java methods, minimising the amount of effort required by a human tester when faced with the task of manually writing a test suite for some arbitrary Java method.
To fulfil the requirements, several features had to be implemented. First, a parser was used to analyse the method under test. It extracts and analyses the conditions and the structure of the predicates. Second, the tool has to be able to generate the test requirements needed by a set of logic coverage criteria. The tool supports Condition Coverage, Branch Coverage, and Multiple Condition Coverage. Third, the method under test has to be instrumented, so that the tool knows which conditions have been covered, i.e., whether they were executed or not and if they were, whether they evaluated to true or false. The test generation can be done either via Random Generation or using a custom-built evolutionary algorithm. As expected, the framework computes the coverage for each test case and reports the overall coverage of the test set using the logging information provided by the instrumentation. Finally, the tool generates JUnit tests that are ready to run. The solution developed supports short, int, long, float, double, Boolean, char and string types.
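The test-requirement generation and coverage reporting described above can be sketched in a few lines. The original tool targets Java; this sketch uses Python purely for illustration, and the function names, the representation of a requirement as a `(condition, outcome)` pair and the `observed` set standing in for the instrumentation log are all assumptions.

```python
from itertools import product


def branch_requirements(predicate):
    """Branch Coverage: the whole predicate must evaluate to true and to false."""
    return [(predicate, True), (predicate, False)]


def condition_requirements(conditions):
    """Condition Coverage: each atomic condition must be true once and false once."""
    return [(cond, value) for cond in conditions for value in (True, False)]


def multiple_condition_requirements(conditions):
    """Multiple Condition Coverage: every combination of truth values."""
    return [
        tuple(zip(conditions, values))
        for values in product((True, False), repeat=len(conditions))
    ]


def coverage(requirements, observed):
    """Fraction of requirements found in the (hypothetical) instrumentation log."""
    if not requirements:
        return 1.0
    met = sum(1 for req in requirements if req in observed)
    return met / len(requirements)
```

For a predicate with n conditions, Condition Coverage yields 2n requirements while Multiple Condition Coverage yields 2^n, which is why the latter becomes expensive quickly as predicates grow.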
The assignment proved to be quite tricky, and the final solution has some limitations. The tool supports only static methods. Furthermore, it does not support loops and switch statements. The test data generation using the Evolutionary Algorithm supports Branch Coverage and only works with methods that have int, double or string as inputs. Finally, random object generation and support for collections were not implemented.
The assignment was done in a pair with another student. The solution was awarded 95%.
Abstract vector created by vectorjuice
The module presented the fundamental concepts and ideas in natural language text processing (NLP). Information Retrieval, Text Compression, Information Extraction, and Sentiment Analysis were the main topics introduced. The module focused on the challenges in NLP, as well as the state-of-the-art techniques. Python was used extensively for the various labs and assignments.
Achieved Result: First (78%)
Assignment: Document Retrieval
The first assignment consisted of building and experimenting with a document retrieval system, as well as writing a report summarising the findings. Five different weighting schemes were implemented: binary, raw term frequency, logarithmic term frequency, tf-idf (using raw term frequency) and wf-idf (using logarithmic term frequency).
Special consideration was taken to ensure the high performance of the system. Operations needed for a specific scheme are only executed if that scheme is used. Sets are used whenever possible, as they give better performance than lists. The inverted index gets filtered, so only terms that appear both in the query and the document are taken into account when calculating the document scores. Operations related to the whole document collection, such as calculating term frequencies and IDF, are executed only once.
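As an illustration of the five weighting schemes, the sketch below computes a term weight under each of them and ranks documents by summing the weights of the query terms they contain. It is a simplified sketch rather than the submitted system: the function names are hypothetical, the log-scaled variant is assumed to be wf = 1 + log10(tf), idf is assumed to be log10(N / df), and document length normalisation is left out.

```python
import math
from collections import Counter


def build_index(docs):
    """Map each term to {doc_id: raw term frequency}."""
    index = {}
    for doc_id, tokens in docs.items():
        for term, tf in Counter(tokens).items():
            index.setdefault(term, {})[doc_id] = tf
    return index


def weight(tf, df, n_docs, scheme):
    """Weight of a term in one document under the chosen scheme."""
    if tf == 0:
        return 0.0
    idf = math.log10(n_docs / df)   # assumed idf definition
    wf = 1 + math.log10(tf)         # assumed log-scaled term frequency
    return {
        "binary": 1.0,
        "tf": float(tf),
        "wf": wf,
        "tfidf": tf * idf,
        "wfidf": wf * idf,
    }[scheme]


def rank(query_tokens, index, n_docs, scheme="tfidf"):
    """Return doc ids ordered by the summed weights of matching query terms."""
    scores = Counter()
    for term in set(query_tokens):          # a set avoids scoring a term twice
        postings = index.get(term, {})      # only terms present in the index count
        for doc_id, tf in postings.items():
            scores[doc_id] += weight(tf, len(postings), n_docs, scheme)
    return [doc_id for doc_id, _ in scores.most_common()]
```

Only the postings of terms that occur in the query are visited, which mirrors the index filtering described above.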
As anticipated, binary weighting produced the lowest scores, with more sophisticated methods, such as tf and tf-idf, performing better. The IR system built showed the importance of pre-processing before performing the retrieval.
The feedback complimented the “excellent performance” achieved with “recall, precision and f-measure scores at the level of the best-known scores”. The assignment was awarded a first-class grade.
Assignment: Sentiment Analysis
The second assignment required building a sentiment analysis system using a Naïve Bayes classifier, as well as producing a report summarising the extent of the implementation, the experiments carried out and the results obtained. The provided data set was based on movie reviews. The classification was performed on three and five classes. As expected, the classification of three classes performed much better.
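For readers unfamiliar with the technique, the following is a minimal sketch of a multinomial Naïve Bayes text classifier. It is not the submitted implementation: the function names are hypothetical, add-one (Laplace) smoothing is an assumption, and words unseen during training are simply skipped.

```python
import math
from collections import Counter, defaultdict


def train(samples):
    """samples: iterable of (tokens, label) pairs. Returns the model counts."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in samples:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def classify(tokens, model):
    """Return the label with the highest log-posterior for the given tokens."""
    class_counts, word_counts, vocab = model
    n_samples = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(count / n_samples)  # log prior
        for token in tokens:
            if token in vocab:  # ignore words never seen in training
                # Add-one smoothing (an assumption of this sketch).
                likelihood = (word_counts[label][token] + 1) / (total_words + len(vocab))
                score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Working with log-probabilities avoids numerical underflow when many small likelihoods are multiplied together.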
Different pre-processing and feature extraction steps were experimented with. Some of the strategies tried were lowercasing, spell checking, stop list removal, punctuation removal, stemming and lemmatisation, single character and digits removal. Additionally, most and least frequent words removal was applied. Part of speech (POS) tagging was used in order to filter out features and perform classification on subsets.
The assignment demonstrated the difficulties related to sentiment analysis. First, the order of the pre-processing steps and feature extraction can lead to different results. The difference in accuracy after applying many of the steps was minimal, sometimes less than 2%. This makes it hard to predict which one will perform better on an unseen example. Second, each step can introduce errors. For example, applying a spell checker might change a misspelt word to a word other than the intended one. Finally, most of the neutral sentiments were misclassified. This might be due to the fact that these sentences contain few or no strong words that can be used for the classification. Furthermore, sentences with neutral sentiment might include polar opinions, both positive and negative, which makes the classification task harder.
As part of the assignment, a program for evaluating the performance of the system was also developed. It calculates the accuracy and displays the confusion matrix, which is a useful way to show the results of the classification.
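A minimal sketch of such an evaluation step is shown below. The function names and the row/column convention (rows are gold labels, columns are predicted labels) are assumptions, not the layout of the submitted program.

```python
def confusion_matrix(gold, predicted, labels):
    """Build a matrix where rows are gold labels and columns are predictions."""
    position = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for g, p in zip(gold, predicted):
        matrix[position[g]][position[p]] += 1
    return matrix


def accuracy(matrix):
    """Correct predictions (the diagonal) divided by all predictions."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0
```

Off-diagonal cells show which classes are confused with each other, for example how often neutral reviews are labelled as positive.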
The submission was awarded 94%.
Love vector created by tartila - Freepik
Cyber Security Team Project
The module was about keeping an organisation secure from cybersecurity threats, as well as learning about how to react and what steps to take when a breach occurs. The topics covered were Threat Analysis and Modelling, Security Policies and Awareness, Physical and Technical Defences, Security Monitoring Strategies, and Incident Handling and Response.
Achieved Result: First (74%)
Assignment: Threat Modelling
The individual assessment of the module consisted of carrying out threat modelling of a system, based on its description. The assignment was split into three tasks. The first one was about creating a data flow diagram (DFD). Twenty cybersecurity threats had to be identified as part of the second task. This had to be done in the context of the STRIDE framework, which provides a mnemonic for security threats in six categories: spoofing, tampering, repudiation, information disclosure, denial of service and elevation of privilege. Finally, a mitigation had to be suggested for each of the threats identified.
The assignment helped me to think more about threats I had heard about but had never spent enough time investigating or considering mitigation strategies for. As the system description was quite broad and included various entities, I thought of a wide range of threats that impact the system's security both directly and indirectly.
The submission was awarded a first-class grade.
The team project was done in groups of four to five students. Similar to the individual assignment, it was split into several tasks. The first one was concerned with producing an incident response review based on a detailed timeline of an incident and a company’s response to it. The strengths and weaknesses were identified and advice was provided on what could have been done differently.
The second part required assessing the company’s preparedness for Cyber Essentials Plus. The framework consists of five key controls — firewalls, secure configurations, access control, malware protection and patch management. The existing security controls and their appropriateness to the company were evaluated. As a result, new security controls were proposed to increase the company’s cyber preparedness.
The third task consisted of preparing a security awareness program. It comprised a 6-month schedule, a presentation for a classroom-based online session, as well as an awareness poster. Finally, two non-security technical controls were selected and proposed to further secure the company’s digital assets.
As the Cyber Essentials Plus framework has requirements regarding many aspects of a company’s IT infrastructure, we managed to come up with numerous short- and long-term interventions which we identified as appropriate and necessary based on the company’s description and its current security posture. A lot of the suggestions were inspired by strategies and methods we had seen in the industry (during our Year in Industry), as well as rigorous research on the latest techniques against cyber threats.
The submission was awarded a first-class grade.
Computer Security and Forensics
This module provided an introduction to computer security and forensics. It focused on approaches and techniques for building secure systems. Some of the content studied was access control, cryptographic foundations and crypto attacks, security protocols, software security, threat modelling, secure programming, security testing, static code analysis, and secure operations and forensics.
Achieved Result: First (87%)
Finance and Law for Engineers
The module covered a wide range of financial and legal issues likely to be encountered in the industry. The finance component focused on the practical issues of budgeting, raising finance, assessing financial risks and making financial decisions in the context of engineering projects and/or product development. This includes preparing budgets and analysing financial plans, as well as determining the financial needs of an organisation and identifying appropriate sources of finance. The law part covered the law of contract, intellectual property law, including copyright and data protection, as well as tort law. Environmental law was also covered, outlining the environmental legislation one might have to adhere to.
Achieved Result: First (89%)