“See” More with Technology

By Ruoran (Kathy) Li, Junior Project Manager - Art-Science

On January 20, the 31st Café des Sciences event “Assistive Technology for Individuals with Visual Impairment” kicked off our 2022 lecture series, which focuses on the theme of social innovation. The event featured two researchers from Switzerland and one from China, all working on assistive technologies that help visually impaired persons “see” more and make their lives, studies, and work more accessible.

The first speaker of the day, Prof. Dr. Alireza Darvishy, is the head of the ICT Accessibility Lab at Zurich University of Applied Sciences, School of Engineering. He was one of the first two visually impaired computer science students in Switzerland and has been committed to accessibility for over 20 years, active in both the private and academic sectors. In 2016, he received the UNESCO Award for Digital Empowerment of People with Disabilities. 

Prof. Darvishy introduced his current project, Accessible Scientific PDFs for All, funded by the Swiss National Science Foundation’s Bridge Discovery program. PDF is the most widespread document format, and the EU has mandated since 2018 that PDFs be accessible to people with visual impairment. Yet many PDFs remain partially or entirely inaccessible because they lack support for screen readers - one of the most commonly used assistive technologies, which enables users to obtain information from computers. The goal of Prof. Darvishy’s project is to research and develop AI-based solutions that create accessible PDFs for visually impaired users, with potential impact on multiple technological, social, and economic levels.

Prof. Darvishy demonstrated how screen readers work: the software quickly reads aloud the information on the computer, including the text within a document as well as directive information that helps users navigate the operating system. The most common problem with inaccessible PDFs is a lack of “tagging”, which results in a scrambled reading order. In addition, when reading longer documents, sighted readers can quickly grasp the document structure by visually identifying headlines and figures and focusing on the sections of interest; screen reader users, however, can only rely on the software to read the text aloud one line at a time. This problem is particularly acute for scientific PDFs, which often contain formulas and equations and tend to be very long.
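Tagging is the machine-readable structure that screen readers navigate. As a hedged illustration (not part of the talk), the following Python sketch uses the third-party pikepdf library to check whether a PDF carries that structure at all; the file name is a placeholder.

```python
# A minimal sketch: check whether a PDF is "tagged", i.e. whether it carries
# the logical structure that screen readers rely on. Assumes pikepdf is
# installed; "paper.pdf" is a placeholder file name.
import pikepdf

def is_tagged(path: str) -> bool:
    with pikepdf.open(path) as pdf:
        root = pdf.Root  # the document catalog
        # A tagged PDF declares /MarkInfo with Marked = true and exposes
        # its logical structure tree under /StructTreeRoot.
        mark_info = root.get("/MarkInfo")
        marked = bool(mark_info.get("/Marked", False)) if mark_info is not None else False
        return marked and "/StructTreeRoot" in root

print(is_tagged("paper.pdf"))  # False for many scientific PDFs
```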

Prof. Darvishy mentioned that there are existing tools to make PDFs accessible by allowing authors to tag the document structure and provide alternative text for images that screen readers cannot process. However, this process requires a lot of tedious work and can be daunting for an untrained author. Prof. Darvishy is working on an AI that uses deep learning to identify the document structure through zone detection and classification and to recognize math symbols, tables, graphics, and so on. According to him, a fully automated process for making scientific PDFs accessible will remain a dream for the current decade. Therefore, his more immediate goal is to research and implement a semi-automated process that requires only a small amount of manual work. The diagram below illustrates the process.
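To complement the diagram, here is a purely illustrative Python sketch of what such a semi-automated pipeline could look like: a model proposes a label for each detected page zone, and only low-confidence zones are queued for human review. The Zone class, the stubbed classifier, and the threshold are our own assumptions, not details from the project.

```python
# Hypothetical sketch of a semi-automated tagging pipeline: the AI labels
# page zones, and a human reviewer only handles the uncertain ones.
from dataclasses import dataclass

@dataclass
class Zone:
    page: int
    bbox: tuple[float, float, float, float]  # (x0, y0, x1, y1) on the page
    label: str        # e.g. "heading", "paragraph", "table", "formula"
    confidence: float # model confidence in the label

def classify_zones(pdf_path: str) -> list[Zone]:
    """Placeholder for the deep-learning zone detector/classifier."""
    return [
        Zone(1, (50, 700, 550, 730), "heading", 0.97),
        Zone(1, (50, 400, 550, 690), "paragraph", 0.95),
        Zone(1, (60, 250, 540, 390), "formula", 0.62),  # math is the hard case
    ]

def semi_automated_tagging(pdf_path: str, review_threshold: float = 0.8):
    zones = classify_zones(pdf_path)
    auto = [z for z in zones if z.confidence >= review_threshold]
    needs_review = [z for z in zones if z.confidence < review_threshold]
    # Only uncertain zones reach the human, keeping manual work small.
    return auto, needs_review

accepted, queued = semi_automated_tagging("paper.pdf")
print(f"{len(accepted)} zones tagged automatically, {len(queued)} queued for review")
```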

Next up was Arthur Gassner, a software engineer at the Swiss start-up biped.ai. The team had just returned from the CES (Consumer Electronics Show), where they introduced their wearable device “biped” - the world’s first AI copilot for visually impaired persons, now in the beta-testing stage. The device is worn on the shoulders, with three 3D cameras capturing the environment; it can track, identify, and predict the trajectories of surrounding objects and guide its user with 3D sounds. All of this happens in real time and locally on the device.

Arthur elaborated on the technical process behind biped’s critical problem: figuring out where it is safe to walk. As shown in the slide, the data stream from one of the 3D cameras consists of greyscale and depth images. From these, the team needed a ground detection algorithm that predicts where the user can walk. The first solution they considered was deep learning; however, they wanted all computation to run offline on the device, and deep learning is usually computationally expensive and power-hungry, so they had to look for alternatives.

Inspired by the paper “Fast Cylinder and Plane Extraction from Depth Cameras for Visual Odometry” (Pedro F. Proença and Yang Gao, 2018), the team divided each depth image into “superpixels”. Since they only need the “planar” superpixels, they discard those with too much depth variation or too many invalid pixels. After this filtering, the remaining superpixels are roughly planar.
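As an illustration of this filtering step, here is a minimal NumPy sketch under our own assumptions; the cell size and thresholds are illustrative, not biped’s actual values.

```python
# A minimal sketch of the superpixel filtering step: split a depth image
# into fixed-size cells and keep only cells that look roughly planar.
# Thresholds are illustrative, not biped's real values.
import numpy as np

def planar_superpixels(depth: np.ndarray, cell: int = 16,
                       max_invalid_frac: float = 0.2,
                       max_depth_std: float = 0.05) -> list[tuple[int, int]]:
    h, w = depth.shape
    keep = []
    for y in range(0, h - cell + 1, cell):
        for x in range(0, w - cell + 1, cell):
            patch = depth[y:y + cell, x:x + cell]
            valid = patch > 0  # depth cameras report 0 for invalid pixels
            if valid.mean() < 1.0 - max_invalid_frac:
                continue  # too many invalid pixels
            if patch[valid].std() > max_depth_std:
                continue  # too much depth variation to be planar
            keep.append((y, x))
    return keep

# Fake, nearly flat frame for demonstration purposes.
depth = np.random.uniform(1.0, 1.02, (480, 640)).astype(np.float32)
print(len(planar_superpixels(depth)), "roughly planar superpixels")
```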

Then, they use Principal Component Analysis (PCA), as demonstrated in the slide below, to merge similar superpixels and “grow” plane regions, finally assigning the bottom-most region as the ground. The team optimized this pipeline through random subsampling, vectorization with NumPy, and ultimately Cython, improving the computation speed from 0.2 FPS to 10 FPS - a result that meets the needs of local real-time analysis.
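For readers curious about the PCA step, the following hedged NumPy sketch fits a plane normal to each superpixel’s 3D points via the eigenvectors of their covariance matrix and merges cells whose normals are nearly parallel; the angle threshold is our own illustrative choice, and the subsampling and Cython optimizations are not shown.

```python
# Hedged sketch of the PCA step: the plane normal of a superpixel is the
# covariance eigenvector with the smallest eigenvalue; two superpixels can
# be merged when their normals are nearly parallel.
import numpy as np

def fit_plane(points: np.ndarray) -> tuple[np.ndarray, float]:
    """points: (N, 3) array. Returns (unit normal, planarity score)."""
    centered = points - points.mean(axis=0)
    cov = centered.T @ centered / len(points)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    normal = eigvecs[:, 0]                  # smallest-variance direction
    planarity = 1.0 - eigvals[0] / (eigvals.sum() + 1e-12)
    return normal, planarity

def can_merge(n1: np.ndarray, n2: np.ndarray, max_angle_deg: float = 10.0) -> bool:
    cos = abs(float(n1 @ n2))  # abs: normals may point in opposite directions
    return cos >= np.cos(np.radians(max_angle_deg))

# Two noisy patches of the same horizontal plane should merge.
rng = np.random.default_rng(0)
patch = lambda: np.c_[rng.uniform(0, 1, 100), rng.uniform(0, 1, 100),
                      rng.normal(0, 0.001, 100)]
n1, _ = fit_plane(patch())
n2, _ = fit_plane(patch())
print(can_merge(n1, n2))  # True
```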

The third and final speaker of the day was Dr. Yang Jiao, assistant professor at the Future Laboratory at Tsinghua University. His main research interests are human-computer interaction design, haptic cognition and interaction design, affective haptic design, and tangible interaction design. At the Future Laboratory, he and his team developed “Graille”, a graphical tactile display that can dynamically generate Braille and other tactile graphical information.

“Graille” prototype

Dr. Jiao started his presentation with some key figures. According to WHO statistics from 2020, there were 253 million blind and visually impaired people worldwide, including 19 million children. Statistics from the China Disabled Persons’ Federation in 2018 showed that there were 17 million visually impaired people in China; however, according to data from the Ministry of Education, schools for the blind and special education departments in China offer only 40 thousand places for blind and visually impaired students in total. This means many people with visual impairment cannot receive compulsory education due to disability, travel conditions, economic conditions, school capacity, and other barriers.

Nowadays, most individuals with visual impairment can use Braille and voice-assisted software such as screen readers to obtain information. However, there are still very few tools that help them learn and understand graphical information, such as geometric knowledge in mathematics and circuit knowledge in physics. Dr. Jiao and his team set out to explore a multimodal immersive cognition approach that stimulates both touch and hearing, with the goal of creating a user-experience-oriented haptic interaction design with low-cost integration.

To better understand the needs of people with visual impairment, Dr. Jiao and the team conducted comprehensive research on current teaching methods and on existing assistive technologies that can communicate graphical information (such as extremely costly refreshable tactile screens), and worked with the Beijing School for the Blind to study how blind users experience tactile interaction.

Chinese Braille is based on the phonetic Pinyin system, which poses extra challenges: because many Chinese characters share the same pronunciation, each Braille word can be ambiguous, and readers must rely on context to understand its meaning.

Through an fMRI experiment, Dr. Jiao also found that the hand kinesthetic area of blind people is no larger than that of sighted people.

After experimenting with multiple models, the prototype “Graille” was born. To test its functions, Dr. Jiao co-designed a class with a math teacher at the Beijing School for the Blind to teach students the Pythagorean theorem and basic trigonometric functions. Previously, the textbooks combined several geometric shapes in one graphic; with Graille, teachers can start with a simple figure and then dynamically add more geometric information, making the material much easier to follow. The experiment yielded exciting results: the students significantly outperformed those in traditional paper-based classes.
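To make the idea of dynamically building up a tactile figure concrete, here is an illustrative Python sketch (not Graille’s actual software) that raises pins on a hypothetical refreshable pin grid one line at a time, mirroring how a teacher might add the sides of a triangle step by step.

```python
# Illustrative only: build a tactile figure incrementally on a pin grid,
# refreshing the display after each added element. Grid size is made up.
import numpy as np

def blank_grid(rows: int = 24, cols: int = 32) -> np.ndarray:
    return np.zeros((rows, cols), dtype=bool)  # True = raised pin

def draw_line(grid, r0, c0, r1, c1):
    """Raise pins along a straight segment (simple linear interpolation)."""
    n = max(abs(r1 - r0), abs(c1 - c0)) + 1
    for t in np.linspace(0.0, 1.0, n):
        grid[round(r0 + t * (r1 - r0)), round(c0 + t * (c1 - c0))] = True

def show(grid):
    print("\n".join("".join("o" if p else "." for p in row) for row in grid))

# Teach a right triangle incrementally: base, then height, then hypotenuse.
g = blank_grid()
draw_line(g, 20, 4, 20, 27)   # step 1: base
show(g)
draw_line(g, 4, 4, 20, 4)     # step 2: height
draw_line(g, 4, 4, 20, 27)    # step 3: hypotenuse
show(g)
```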

Beyond classrooms, Dr. Jiao also envisions Graille as an effective tool in libraries, museums, and other barrier-free public institutions, as well as in scenarios such as online shopping. The team is looking for partners to bring the product to market and benefit more people with visual impairments.

During the Q&A session, the audience was very engaged, and the speakers delved deeper into the challenges and prospects of their projects. We encourage our audience to email us with any additional questions, and we hope this edition of Café des Sciences will prompt you to consider the issue of accessibility in the future. Together, we can build a more inclusive and friendlier world.

Click here to watch the recording of the webinar.