Abstract
"The eye, the window of the soul, is the principal means by which the central sense can most completely and abundantly appreciate the infinite works of nature.” (Leonardo Da Vinci). Can you imagine life without this window? Around 300 million people across the globe are visually impaired. Our investigation, “Lumos”, aims to assist those visually impaired using modern 21st century technologies such as transformers and Large Language Models (LLMs).
The idea is simple: the user straps on a wearable, speaks to it by holding down a button on the device, and receives a spoken response, much like talking to another person. The wearable contains a camera module and a Raspberry Pi; an innovative radar feature may also be implemented. The device was designed to be compact, intuitive, and capable of running Edge AI by optimising LLMs and transformers through techniques such as quantization. The VQA (Visual Question-Answering) model will be further fine-tuned and rigorously tested. Metrics and data on the model were collected in a Jupyter notebook, which can be used to optimise it further. The base model achieves approximately 80-84% accuracy at a latency of 1-3 seconds on a Cloud TPU. We expect the mobile model to achieve over 70% accuracy on VQAv2 and other benchmarks, with accuracy favouring larger token counts.
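
To illustrate the kind of on-device optimisation described above, the sketch below applies post-training dynamic quantization to an off-the-shelf VQA model. The specific checkpoint (dandelin/vilt-b32-finetuned-vqa), the `answer` helper, and the use of PyTorch dynamic quantization are assumptions chosen for illustration, not the project's actual pipeline.

```python
# Hedged sketch: the abstract does not name the VQA model or quantization
# toolkit; a Hugging Face ViLT checkpoint and PyTorch dynamic quantization
# are assumed here purely for illustration.
import torch
from PIL import Image
from transformers import ViltProcessor, ViltForQuestionAnswering

MODEL_ID = "dandelin/vilt-b32-finetuned-vqa"  # assumed example checkpoint

processor = ViltProcessor.from_pretrained(MODEL_ID)
model = ViltForQuestionAnswering.from_pretrained(MODEL_ID)
model.eval()

# Post-training dynamic quantization: Linear-layer weights are stored as
# int8 and dequantized on the fly, shrinking the model footprint for edge
# hardware such as a Raspberry Pi.
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

def answer(image_path: str, question: str) -> str:
    """Run one visual question-answering query on the quantized model."""
    image = Image.open(image_path).convert("RGB")
    inputs = processor(image, question, return_tensors="pt")
    with torch.no_grad():
        logits = quantized(**inputs).logits
    return quantized.config.id2label[logits.argmax(-1).item()]

# Example usage: answer("kitchen.jpg", "Is there a cup on the table?")
```

A setup along these lines would also make it straightforward to log latency and accuracy figures from a Jupyter notebook, as mentioned above.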
In conclusion, our project aims to positively impact daily activities such as navigation and communication for visually impaired individuals, contributing to greater independence and accessibility.