
Domestic Multi-channel Sound Detection and Classification for the Monitoring of Dementia Residents’ Safety and Well-being using Neural Networks


Document Information

Basic Information

Title: Domestic Multi-channel Sound Detection and Classification for the Monitoring of Dementia Residents’ Safety and Well-being using Neural Networks
Author: Abigail Copiaco
Supervisors: Prof. Christian Ritz, Dr. Nidhal Abdulaziz, Dr. Stefano Fasciani
University: University of Wollongong
Discipline: Electrical, Computer, and Telecommunications Engineering
Type: Thesis
Year of publication: 2021
City: Wollongong
Number of pages: 194
File size: 5.77 MB


Structure

  • 1.1 Overview
  • 1.2 Dementia
    • 1.2.1 Signs and Symptoms
    • 1.2.2 Influence of Age and Gender
    • 1.2.3 Statistical Evidence
  • 1.3 Assistive Technology
    • 1.3.1 Continual Influence of Smart Home Devices
    • 1.3.2 Ethical Concerns and Considerations
  • 1.4 Existing Assistive Technology Related to Dementia
    • 1.4.1 Summary of the Limitations of Existing AT Devices for Dementia Care
    • 1.4.2 Recommendations and Compliance to Ethical Requirements
    • 1.4.3 Identification of Domestic Hazards for Dementia Monitoring Systems
    • 1.4.4 Users of the Monitoring System
  • 1.5 Objectives and Contributions
    • 1.5.1 Objectives
    • 1.5.2 Contributions
  • 1.6 Thesis Scope
  • 1.7 Thesis Structure
    • 1.7.1 Publications
    • 1.7.2 Thesis Structure and Research Output Alignment
  • 2.1 Introduction
    • 2.1.1 System Framework
  • 2.2 Acoustic Data
    • 2.2.1 Single-channel Audio Classification
    • 2.2.2 Multi-channel Audio Classification
    • 2.2.3 Factors affecting Real-life Audio Recordings
  • 2.3 Feature Engineering for Audio Signal Classification
    • 2.4.1 Neural Networks
    • 2.4.2 Convolutional Neural Network
    • 2.4.3 Deep Neural Network
    • 2.4.4 Recurrent Neural Networks
      • 2.4.4.1 Long-short Term Memory Recurrent Neural Network
      • 2.4.4.2 Gated Recurrent Neural Networks
    • 2.4.5 Pre-trained Neural Network Models
  • 2.5 Review of Audio Classification Systems
    • 2.5.1 Liabilities and Challenges: Audio Classification
  • 2.6 Review of Sound Source Localization Systems
    • 2.6.1 Liabilities and Challenges: Sound Source Node Location Estimation
  • 2.7 Proposed System Design Specifications and Requisites
  • 3.1 Introduction
  • 3.2 Existing Databases
  • 3.3 Pre-processing
  • 3.4 Noise Reduction Techniques
    • 3.4.1 Beamforming
  • 3.5 Data Augmentation Techniques
    • 3.5.1 Mixing and Shuffling
  • 3.6 Segmentation and Pre-processing Technique for Proposed System
    • 3.6.1 Pre-processing for Sound Classification
    • 3.6.2 Pre-processing for Source Node Estimation
  • 3.7 Development of the DASEE Synthetic Database
    • 3.7.1 Data Curation
    • 3.7.2 Experimental Setup
    • 3.7.3 Room Impulse Response Generation
    • 3.7.4 Dataset synthesis and refinement
    • 3.7.5 Background Noise Integration and Dataset Summary
    • 3.7.6 Curating an Unbiased Dataset
  • 3.8 Chapter Summary
  • 4.1 Introduction
  • 4.2 Feature Extraction for Audio Classification
    • 4.2.1.1 Cepstral, Spectral, and Spectro-temporal Feature Extraction
    • 4.2.1.2 Network Layer Activation Extraction
    • 4.2.1.3 General Feature Performance Study and Results
    • 4.2.2 Fast Scalogram Features for Audio Classification
      • 4.2.2.1 FFT-based Continuous Wavelet Transform
      • 4.2.2.2 Selection of the Mother Wavelet
    • 4.2.3 Scalogram Representation
    • 4.2.4 Results of CWTFT Features for Audio Classification
      • 4.2.4.1 Comparison against State-of-the-Art Features: Balanced and Imbalanced Data
      • 4.2.4.2 Consideration of Signal Time Alignment
      • 4.2.4.3 Per-channel Scalogram with Channel Voting Technique
      • 4.2.4.4 Cross-fold Validation
      • 4.2.4.5 Wavelet Normalization
    • 4.2.5 Classification Performance Observations
  • 4.3 Feature Extraction Methodology for Node Location Estimation
    • 4.3.1 STFT-based Phasograms for Sound Source Node Location Estimation
      • 4.3.1.1 Node Locations Setup
      • 4.3.1.2 Phasogram Feature Calculation
      • 4.3.1.3 Neural Network Integration
    • 4.3.2 Results and Detailed Study
      • 4.3.2.1 Comparison of STFT and CWTFT-based Phasograms
      • 4.3.2.2 Countering the Effects of Spatial Aliasing
      • 4.3.2.3 Comparison against a Magnitude-based Approach
      • 4.3.2.4 Results when using the DASEE Synthetic Database
  • 4.4 Chapter Summary
  • 5.1 Introduction
  • 5.2 Comparison of Pre-trained Models
  • 5.3 Development of MAlexNet-40
    • 5.3.1 Exploring Activation Functions for CNN Models
    • 5.3.2 Modifications on Weight Factors, Parameters, and the Number of Convolutional Layers
    • 5.3.3 Results and Detailed Study
      • 5.3.3.1 Exploring Variations of the Activation Function and the Number of Layers
      • 5.3.3.2 Fully-connected Layer Output Parameter Modification
      • 5.3.3.3 The Combination of Layer and Output Parameter Modification
      • 5.3.3.4 Exploring Normalization Layers
      • 5.3.3.5 Convolutional Layer Learning and Regularization Parameter Modification
      • 5.3.3.6 Examining System Response to Various Optimization Algorithms
    • 5.3.4 Discussion and Findings
  • 5.4 MAlexNet-40 as a Compact Neural Network Model
    • 5.4.1 Direct Comparison against Compact Neural Network Models
    • 5.4.2 Response to Compact Neural Network Configuration Inspirations
    • 5.4.3 Discussions and Findings
  • 5.5 Examining the Robustness of MAlexNet-40
    • 5.5.1 Cross-fold Validation
      • 5.5.1.1 Full dataset comparison
      • 5.5.1.2 Balanced dataset comparison
    • 5.5.2 Verification using the SINS Database
    • 5.5.3 Signal Time Alignment for Compact Networks
    • 5.5.4 Factors that affect training speed
  • 5.6 Chapter Summary
  • 6.1 Introduction
  • 6.2 Design Thinking Approach for Graphical User Interface Development
    • 6.2.1 Identifying the Persona
      • 6.2.1.1 User Information
      • 6.2.1.2 Challenges of Dementia Care
      • 6.2.1.3 Concerns on Monitoring Systems
    • 6.2.2 Identifying the Hill: Understanding the Challenges
    • 6.2.3 The Loop: Designing the Graphical User Interface
      • 6.2.3.1 Proposal and Reflection
    • 6.2.4 The Solution: Final Caregiver Software Application Functionalities
  • 6.3 Integrated Domestic Multi-Channel Audio Classifier
    • 6.3.1 Two-step Neural Network for Identifying Disruptive Sounds
      • 6.3.1.1 Detailed Results
    • 6.3.2 Node Voting Methodologies
      • 6.3.2.1 Histogram-based Counts Technique
      • 6.3.2.2 Weighted Energy-based Technique
      • 6.3.2.3 Comparison of the Node Voting Algorithms
  • 6.4 Graphical User Interface
    • 6.4.1 User Interface Overview
    • 6.4.2 Integrated Sound Levels Assessment Tool
  • 6.5 Chapter Summary
  • 7.1 Summary of Research Contributions
  • 7.2 Societal Relevance
  • 7.3 Future Work and Research Directions
    • 7.3.1 Directions for Research
    • 7.3.2 Interface Improvement
  • Appendix 1
    • 1. Room Impulse Response Generation Code
    • 2. Code for Sound Convolution with the Room Impulse Response
    • 3. Code for adding background noises

Content

Overview

Recent research indicates a growing global population of people living with dementia, highlighting the importance of effective dementia care. Various initiatives aim to alleviate the challenges faced by these individuals and their families. Technological advances have led to the development of medical assistive technologies, including programmable companion robots, multi-sensor monitoring systems, and virtual dementia simulations. These innovations not only help dementia patients adapt to their cognitive decline but also improve training for prospective caregivers. Consequently, applications in this field offer significant benefits to both the research community and society at large, as explored in the following sections.

Despite these advances in dementia care technologies, challenges remain, particularly regarding ethical considerations and adherence to ethical codes in the design of medical devices. The progressive cognitive decline of dementia patients complicates obtaining informed consent for the use of assistive devices, raising concerns about their freedom and human rights. Additionally, dementia residents may experience different side effects and reactions to these technologies, depending on their degree of cognitive impairment and their type of dementia. Therefore, any system developed for dementia care must be adjustable to the individual needs of each resident.

This work introduces a novel method for identifying different household acoustic environments, enabling a non-intrusive multi-channel acoustic monitoring system to assist dementia residents. The thesis describes the complete process, including feature extraction, classification, and system design.

This chapter highlights the importance of dementia care within the community and describes the distinct symptoms associated with the various types of dementia. It also reviews the assistive technologies available to support people with dementia, emphasizing key design considerations for medical assistance systems. Finally, it defines the scope and objectives of the research and summarizes its contributions to the field.

Dementia

Signs and Symptoms

Early diagnosis of dementia is crucial for improving the management of care services, as highlighted by various national strategies across Europe. Timely intervention eases the burden on caregivers and gives residents more time to adapt to the condition. Given the progressive nature of dementia, a clinical diagnosis can be made once cognitive deficits begin to affect daily life.

According to Hort et al. [10], mild cognitive impairment (MCI) often develops before dementia.

MCI is characterized by noticeable complaints about, and a decline in, cognitive function, even though individuals remain able to carry out their daily activities. Notably, MCI can be identified in individuals younger than 65.

Dementia is a category of syndromes characterized by cognitive impairment, which can be further classified into several subtypes. Alzheimer's disease is the most prevalent of these. An individual can also be diagnosed with mixed dementia, a combination of two or more subtypes. Common signs and symptoms of the main subtypes are outlined in Table 1.1.

Table 1.1 Main subtypes of dementia and their symptoms [10]

Subtype | Signs and symptoms
Alzheimer’s Disease | Cognitive dysfunction (memory loss, difficulty with language), behavioural symptoms (depression, hallucinations), difficulty with daily tasks and activities
Vascular Dementia | Stroke, vascular problems (hypertension), decreased mobility and stability
Dementia with Lewy Bodies | Tremor, frequent visual hallucinations and misapprehensions, slowness of movement, rigidity
Fronto-temporal Dementia | Decline in language skills (primary progressive aphasia), changes in behaviour, decline in social awareness, mood disturbances

Note: Fronto-temporal dementia accounts for a considerable percentage of residents under 65 years old.

Table 1.1 indicates that specific symptoms are more pronounced in certain types of dementia. This highlights the need to tailor the care of dementia residents to their diagnosed subtype and the severity of their cognitive decline.

Influence of Age and Gender

Dementia predominantly affects individuals aged 65 and older, and its prevalence increases with age. Females are also more likely than males to develop dementia. Notably, mild cognitive impairment (MCI) symptoms can appear as early as age 40; in Ontario, Canada, people aged 40 to 65 account for nearly 7% of dementia diagnoses. Age and gender therefore both significantly influence the progression of dementia and should be considered when evaluating care and assistance needs.

Figure 1.2 Percentage of Ontarians with Dementia, by Age and Gender, for a sample size of 90000 [15].

Statistical Evidence

The increasing number of dementia residents underscores the importance of continued advances in this field. According to the National Centre for Social and Economic Modelling (NATSEM) at the University of Canberra, an estimated 436,366 Australians currently live with dementia, a figure projected to rise to nearly 590,000 by 2028 and to around one million by 2058. Globally, about 47 million people were living with dementia in 2015, and this number could reach 131.5 million by 2050.

Dementia affects a large portion of the global population; the Australian Bureau of Statistics identifies it as the leading cause of disability among Australians aged 65 and older. It is also the second leading cause of death, accounting for approximately 5.4% of male deaths and 10.6% of female deaths. Furthermore, dementia imposes a substantial and growing economic burden, costing $15 billion in Australia alone, a figure expected to exceed $36.8 billion by 2056 [19].

With dementia prevalence projected to rise, innovation and research are essential to ease the challenges faced by dementia patients, their families, and caregivers.

Assistive Technology

Continual Influence of Smart Home Devices

Most assistive technology (AT) devices designed for dementia patients and their caregivers are smart home devices. Demand for these devices is increasing as more people seek a comfortable lifestyle. A ReportLinker survey indicates that approximately 41% of the U.S. population uses smart home devices, with 12% using them specifically for security and home monitoring. As of 2020, around 24 million Americans used security devices, and this number is projected to grow.

Figure 1.3 Population of Smart Home Device Users in the United States, projected to 2023 [25]

*Left column is in millions

The growing adoption of smart home devices makes it easier for caregivers to adapt to, and use, dementia-related AT home monitoring systems.

Smart home devices are often marketed as enhancing comfort and simplifying daily tasks, but for individuals with dementia their primary role is to reduce distress for both residents and caregivers. These devices also improve safety and security by enabling hazard identification and early intervention.

Ethical Concerns and Considerations

Assistive technology (AT) plays a crucial role in easing the challenges of dementia care. However, a 2017 World Health Organization (WHO) report revealed that only 10% of AT users had access to the data exchanged by their devices. Cognitive impairment often prevents dementia patients from understanding their rights regarding these systems. The key ethical concerns in using AT for dementia care are privacy invasion and restrictions on personal freedom.

Assistive technology includes household monitoring systems and location tracking devices that rely on visual technology, as well as caregiver-controlled lock systems and robotic companions. These technologies are often criticized for potentially restricting the social interaction and freedom of individuals with dementia. All forms of AT should therefore comply with the United Nations Convention on the Rights of Persons with Disabilities.

Table 1.2: Summary of related articles of the United Nations Convention on the Rights of Persons with Disabilities [32, 28]

Article | Summary
4 | Assistive technology devices should be designed according to the needs of people with limited abilities, with affordability as one of the prime concerns
9 | Assistive technology systems should help people with limited abilities while allowing them to continue exercising their right to freedom
12 | Differently-abled people should be given equal respect and acknowledgement when voicing their opinions and preferences
18 | Individuals have the right to openly express their choice of tenancy, promoting their freedom. The use of location tracking devices can enhance safety for all
19 | The significance of social participation is emphasized. However, further research is needed to determine the extent to which assistive technologies should encourage these rights
22 | Surveillance is permitted to guarantee the safety of dementia residents. However, the extent of monitoring should not allow unreasonable privacy intrusion
25 | People must have the right to access the best possible quality of healthcare without discrimination

These provisions prioritize the rights of individuals with limited abilities and provide a framework for assessing the effectiveness of assistive technology (AT) in dementia care. As the global population of dementia residents grows each year, various organizations have emerged to advocate for their rights. Notably, Dementia Alliance International was founded in 2014 as the first international organization dedicated to supporting dementia residents.

Existing Assistive Technology Related to Dementia

Summary of the Limitations of Existing AT Devices for Dementia Care

Developing effective monitoring systems for dementia care faces numerous challenges. Table 1.3 summarizes the constraints and shortcomings of current assistive technologies in this field, with reference to the relevant articles of the United Nations Convention on the Rights of Persons with Disabilities.

Table 1.3 Limitations of Existing Assistive Technology Devices

Area of Interest Type of Assistive Technology Constraints

Automatic Phone Calls and Messages, Reminder Devices [34]

- Beneficial for people with MCI, but these are user-dependent

- Do not adapt to moderate or severe dementia, which requires a higher level of assistance (Article 25)

Robotic assistants, customized playlists, simplified remote control systems, entertainment and relaxation applications

- Threatens social participation and freedom (Article 19)

Threat to Safety: Alarm systems, temperature control systems, water level monitoring, automated lock systems [43, 44, 45, 46]

- Alarm systems can be disruptive to dementia residents, many of whom have hearing impairments (Article 25)

- Complications in the maintenance of multiple sensor devices (Article 4)

- Ethical concerns on freedom limitation (Article 9, 12)

Surveillance cameras, Location tracking devices [48, 49]

- Ethical concerns on unnecessary privacy invasion (Article 22)

- Feeling overwhelmed by cameras can harm the residents’ social and emotional state (Articles 9, 12, 19)

Recommendations and Compliance to Ethical Requirements

Considering the drawbacks listed in Table 1.3, we aim to develop a system that can be adapted to the needs of dementia residents at every stage of cognitive decline.

An audio-based monitoring system can address the needs of both early-stage dementia residents and those with advanced cognitive decline: because it avoids visual intrusion, it is less overwhelming. This approach aligns with Article 22 (Table 1.2) and complies with Articles 9, 12, and 18, which uphold the rights of dementia residents to choose their tenancy, make personal decisions, and maintain their freedom. With audio rather than visual surveillance, residents can feel more at ease because they are not being watched or tracked.

Residents at all stages of dementia, from mild to severe, can benefit from a customizable severity index based on their individual profiles. This index allows caregivers to adjust notifications according to each resident's activity level and degree of cognitive decline. Setting a lower severity index for residents with mild cognitive impairment minimizes unnecessary intrusions, allowing them to maintain their freedom while keeping them safe.
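The severity index can be illustrated with a minimal Python sketch. It assumes a hypothetical index from 0 to 3 per activity, set by the caregiver (0 disables notifications), and a hypothetical urgency score from 1 to 3 attached to each detection; `ResidentProfile`, `should_notify`, and the activity labels are illustrative names, not the thesis implementation. A higher index lowers the urgency needed to trigger a notification, so caregivers of a resident with MCI who is configured with a low index are notified less often.

```python
from dataclasses import dataclass, field

# Hypothetical scale (not from the thesis): 0 = never notify, 3 = notify on any detection.
MAX_SEVERITY = 3


@dataclass
class ResidentProfile:
    name: str
    # Caregiver-set severity index per activity label (labels are illustrative).
    severity: dict = field(default_factory=dict)


def should_notify(profile: ResidentProfile, activity: str, urgency: int) -> bool:
    """Decide whether a detected activity triggers a caregiver notification.

    `urgency` (1 to 3) is a hypothetical per-detection score. A higher severity
    index lowers the urgency needed to notify.
    """
    # Activities the caregiver has not configured fall back to the most cautious setting.
    severity = profile.severity.get(activity, MAX_SEVERITY)
    if severity == 0:
        return False
    required_urgency = MAX_SEVERITY + 1 - severity  # index 3 -> 1, index 1 -> 3
    return urgency >= required_urgency


mild = ResidentProfile("mild MCI", {"cooking": 1, "water_running": 1})
advanced = ResidentProfile("advanced dementia", {"cooking": 3, "water_running": 3})
print(should_notify(mild, "cooking", 2))      # False: only urgency 3 notifies
print(should_notify(advanced, "cooking", 2))  # True: any urgency notifies
```

Defaulting unconfigured activities to the most cautious index is a design choice for safety; a real deployment might instead require every activity to be configured explicitly.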

Audio monitoring systems can also help residents with hearing impairment, a condition often associated with dementia, particularly in its advanced stages. These systems can be customized to notify caregivers when environmental noise levels exceed what the resident can tolerate, based on the specific hearing impairment identified through hearing tests.
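A noise-tolerance check of this kind might be sketched as follows. The sketch measures the RMS level of a frame in dBFS and converts it to an estimated sound pressure level with a per-microphone calibration offset; the offset, the tolerance value, and the function names are assumptions for illustration, since digital levels only map to dB SPL once the microphones have been calibrated.

```python
import math


def rms_dbfs(frame):
    """RMS level of a frame of samples in [-1, 1], in dB relative to full scale (dBFS)."""
    if not frame:
        raise ValueError("empty frame")
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    return 20 * math.log10(max(rms, 1e-12))  # floor avoids log10(0) on digital silence


def exceeds_tolerance(frame, tolerable_db_spl, calibration_offset_db):
    """Return True if the estimated sound pressure level exceeds the resident's limit.

    `calibration_offset_db` is a hypothetical per-microphone constant mapping dBFS
    to dB SPL; it would have to be measured for each node. `tolerable_db_spl`
    would come from the resident's hearing test.
    """
    return rms_dbfs(frame) + calibration_offset_db > tolerable_db_spl


frame = [0.5, -0.5] * 256                    # constant-amplitude test frame
print(round(rms_dbfs(frame), 2))             # -6.02
print(exceeds_tolerance(frame, 85.0, 94.0))  # True: about 88 dB SPL > 85 dB SPL
```

In practice, levels would likely be averaged over several frames before comparison, so that brief transients do not trigger notifications on their own.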

Before microphone arrays are installed, the permission and opinions of residents must be considered to prevent ethical issues. The audio-based monitoring system developed in this work captures household acoustic scenes and events, which may include speech and crosstalk. Although this raises privacy concerns, such elements will be filtered out before classification. Unlike devices that require user interaction or rely on multiple sensors, this audio-based system enables continuous, cost-effective monitoring of residents' wellbeing, in line with Article 4.

Identification of Domestic Hazards for Dementia Monitoring Systems

The primary goal of assistive technology for dementia care is to improve the quality of healthcare for residents. Effective monitoring systems must alert caregivers when assistance is needed while distinguishing between hazardous and non-hazardous domestic activities; this distinction is crucial for ensuring safety while minimizing unnecessary intrusions. A study by the Centers for Disease Control and Prevention (CDC) shows that falls are the most common household accidents among individuals over 65.

Household hazards span a range of risks, including drowning, poisoning, and burns. Table 1.4 summarizes the potential causes of these hazards, the household activities associated with them, and when caregivers should be alerted. All of these factors should be considered when developing monitoring systems.

Table 1.4 Common Household Hazards for Dementia Residents

Type | Causes | Related Household Activity | Alert Sent
Falls / Slips | Slippery floors, floor humps, absence of grab bars, leakage | Washing dishes; any activity where the faucet is on | When the faucet or shower has been on for longer than usual; loud sounds (indicating a fall)
Burns | Fire, hot water | Cooking | Depending on cognitive decline level
– | Presence of certain cooking devices (knives, stove, oven) or heavy objects | Cooking, working | Depending on cognitive decline level; loud sounds (indicating a fall)
– | Unattended faucets | Any activity where the faucet is on; eating | When the faucet or shower has been on for longer than usual; uncategorized sounds (indicating choking)
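The "on for longer than usual" alerts in Table 1.4 can be sketched as a run-length check over successive classifier outputs. This minimal Python version assumes back-to-back segments of fixed length, each labelled by the classifier; the label names, the function name, and the `max_duration_s` threshold are hypothetical, and the threshold would in practice reflect the resident's usual routine.

```python
def prolonged_activity_alerts(detections, label, max_duration_s, segment_s):
    """Find runs where `label` is detected continuously for longer than `max_duration_s`.

    `detections` is a time-ordered list of (segment_start_s, predicted_label) pairs
    from back-to-back classification segments of length `segment_s`. Returns the
    start time of each over-long run, reported once per run.
    """
    alerts = []
    run_start = None
    alerted = False
    for start, predicted in detections:
        if predicted != label:
            run_start = None  # the run is broken, e.g. the faucet was turned off
            continue
        if run_start is None:
            run_start, alerted = start, False
        elapsed = start + segment_s - run_start  # duration covered by the run so far
        if not alerted and elapsed > max_duration_s:
            alerts.append(run_start)
            alerted = True
    return alerts


# Hypothetical labels: water runs for 50 s, stops, then runs briefly again.
stream = [(t, "water_running") for t in range(0, 50, 5)]
stream += [(50, "silence"), (55, "water_running"), (60, "water_running")]
print(prolonged_activity_alerts(stream, "water_running", max_duration_s=30, segment_s=5))  # [0]
```

Reporting each run only once avoids flooding the caregiver with repeated notifications while the same faucet stays on.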

Hazard identification shows that a resident's proximity to these dangers strongly influences the level of assistance needed. Estimating the location of a detected sound is therefore a valuable feature of the proposed system, particularly for residents with advanced dementia and severe cognitive decline.
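As a point of comparison, the simplest proximity estimate picks the node that receives the most energy. The sketch below is a naive baseline under stated assumptions (identical microphone gains and a shared time window across nodes), not the thesis method, which according to its outline uses phasogram features with a neural network and node voting; the node names are hypothetical.

```python
def mean_energy(channels):
    """Average per-sample energy across all channels of one node."""
    return sum(sum(x * x for x in ch) / len(ch) for ch in channels) / len(channels)


def nearest_node_by_energy(node_signals):
    """Pick the node with the highest mean energy as a crude 'closest node' estimate.

    `node_signals` maps a node id to a list of channel recordings of the same
    time window. Assumes identical, calibrated microphone gains; reverberation
    and occlusion can break the louder-means-closer assumption.
    """
    return max(node_signals, key=lambda node: mean_energy(node_signals[node]))


# Hypothetical two-node, two-channel capture of the same event.
capture = {
    "kitchen": [[0.5, -0.5, 0.4], [0.45, -0.4, 0.5]],
    "bedroom": [[0.1, -0.1, 0.05], [0.08, -0.1, 0.1]],
}
print(nearest_node_by_energy(capture))  # kitchen
```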

Monitoring systems have inherent limitations: they can only identify hazards through audio or video detection, depending on the type of surveillance. They should therefore be complemented by a personal assessment of the resident's environment before occupancy, to ensure that no harmful chemicals, medications, or firearms are present.

Users of the Monitoring System

In dementia healthcare, monitoring systems primarily serve caregivers rather than the patients themselves. Caregivers fall into two categories: formal caregivers, such as healthcare professionals, and informal caregivers, typically family members such as spouses or children. In Australia, 22% of dementia patients rely exclusively on informal care, highlighting the need for effective interventions to support these caregivers in their crucial role.

Caring for individuals with dementia significantly increases caregivers' stress, anxiety, and depression, potentially leading to serious health issues. According to a survey by the American Association of Retired Persons (AARP), caregivers of people with dementia spend more time on caregiving than those caring for individuals without dementia.

Support for caregivers includes education, training, social support groups, and psychosocial therapies, and assistive technology can further reduce caregiving-related stress. This thesis focuses on developing an audio-based monitoring system to assist dementia caregivers. The key objectives and contributions of this research are outlined in Section 1.5, and the system's interface and the research-based design thinking approach behind it are detailed in Chapter 6.

Objectives and Contributions

Objectives

Upon the completion of this project, the following objectives will be achieved:

- Identify a combination of relevant spectral and spatial acoustic features, extracted from multi-channel household acoustics, that account for both sporadic sounds (sound events) and continuous sounds (acoustic scenes);

- Improve the accuracy of current multi-channel audio classification methods through a detailed review of prevalent, effective solutions;

- Study the performance of several efficient feature classification methods in the specific application of household acoustics;

- Use this performance analysis to identify the best combination of techniques for maximum accuracy;

- Conduct a thorough study of neural network architecture and hyperparameter modification, with a detailed comparison against other pre-trained neural network models;

- Integrate sound source location estimation to provide a more reliable dementia resident monitoring system, especially for more severe dementia cases.

Contributions

Accordingly, the following contributions are achieved:

- Design and validate an effective methodology for classifying multi-channel domestic acoustic scenes and events;

- Design and develop an effective methodology for estimating sound source location;

- Design, develop, and validate a novel, accurate, and customizable compact neural network architecture suitable for resource-limited devices;

- Develop a synthetic acoustic dataset of representative domestic audio recordings covering sounds often experienced in a dementia resident’s environment;

- Create a customizable Graphical User Interface (GUI) that lets caregivers set a severity index for each activity, specifying the level of notifications they wish to receive.

Thesis Scope

This research aims to improve the care of dementia residents by creating a non-intrusive monitoring system that helps prevent household accidents through acoustic scene and event monitoring. The primary objective is an effective multi-channel classification solution for household acoustics, focusing on feature extraction and on a compact neural network model that balances accuracy with the resource constraints of low-capability devices. The study also seeks to improve sound source location estimation and to advance home technology in terms of speed, accuracy, and reliability. The effectiveness of the proposed methodology will be assessed using quantitative performance measures. Furthermore, a synthetic database of dementia care environments will be created to avoid the ethical concerns associated with real recordings; it supports the development and refinement of new systems ahead of future testing in real settings under ethical guidelines.

Only detection and classification methods capable of real-time or near-real-time performance are considered. However, the proof-of-concept prototype prioritizes accuracy over computational efficiency, with real-time constraints a secondary focus.

Thesis Structure

Publications

During this PhD, I published one journal article, five conference papers, and three technical reports or pre-prints; full citations are listed in the “Thesis Publications” section. All conference papers were published in IEEE Xplore and indexed in Scopus. I also submitted two entries to the DCASE 2020 Task 4 challenge and created a synthetic audio database of sound scenes and events relevant to dementia care, which is openly available on Kaggle together with the source code.

Thesis Structure and Research Output Alignment

The overall structure of the thesis, and the alignment of the research outputs and publications with each contribution chapter, is shown in Figure 1.4.

Figure 1.4 Publication and Research Output Correspondence to Contribution Chapters

Note: C – conference, J – journal, T – technical report; numbers refer to the citation number mentioned in the thesis publication list

The first chapter of this thesis introduces dementia, outlining its signs, symptoms, and statistical evidence. It emphasizes the importance of this research in improving the management of the disorder while minimizing invasiveness for dementia residents, and it defines the scope and objectives of the study.

The second chapter reviews research relevant to assistive technology for dementia care, examining existing methodologies and studies. It discusses pre-processing, low-level feature extraction, and multi-channel household acoustic scene classification methods drawn from research papers, journals, and books. The advantages and disadvantages of these works are assessed in relation to our application, and the resulting research gaps are identified. The chapter also outlines the proposed methodology and justifies the selected techniques.

Chapters 3 to 6 then detail the research contributions: data acquisition and the generation of a synthetic database relevant to dementia healthcare; audio classification and sound source location estimation; the development of an accurate, compact neural network; and an integrated user interface developed in line with design thinking principles. These chapters describe how the research was implemented and the results obtained. Finally, the last chapter summarises the research outcomes and explores future work and potential improvements.

Review of Approaches to Classifying and Localizing Sound Sources

Introduction

System Framework

The system framework comprises three key components: data collection and preparation, pre-processing and feature extraction, and classification using neural networks. As illustrated in Figure 2.1, the framework also includes a sound source node location estimation stage, which extends the audio classifier so that it can identify the node nearest to the sound source.

The overall system framework for domestic sound classification begins with the preparation of various types of audio data, including acoustic scenes (recordings of activities over a period of time) and sound events (specific sound classes that occur briefly). This is followed by pre-processing and feature extraction, in which the audio data is transformed into informative features that can be visualized for classification. The final step generates predictions using neural networks. The subsequent sections of this literature review follow the sequence of these processes.

Acoustic Data

Single-channel Audio Classification

Audio-based monitoring systems are less prevalent in surveillance than video cameras, and most studies rely on mono-channel data from ASN nodes featuring a single microphone. Previous research has primarily focused on identifying indoor sounds, while other applications of audio classification include speech recognition in smart home systems and cultural heritage preservation. Notably, Almaadeed et al. proposed using audio event classification for road surveillance by combining time, frequency, and time-frequency domain features. However, the body of work on domestic audio classification remains limited.

Vafeiadis et al. proposed an effective audio-based recognition system for smart homes, focusing specifically on sound events in the kitchen. However, further testing and verification are needed to ensure its reliability for household acoustic classification in general. Similarly, the Tiny Energy Accounting and Reporting System (TinyEARS) classifies mono-channel household acoustics and has successfully reported the individual power consumption of domestic appliances within a 10% error margin.

While the reviewed works offer benefits, they primarily focus on single-channel sound events rather than complete acoustic scenes. Sound events are specific, identifiable noises, such as footsteps and coughing, whereas acoustic scenes capture the overall context of the recording environment. For the application in this study, both sound events and acoustic scenes must be integrated to develop a more adaptable system for monitoring dementia residents. However, the inherent differences between acoustic scenes and events can reduce system accuracy, making it vital to minimize the loss of critical information while maximizing the relevant data retained.

Multi-channel Audio Classification

Classifying audio from mono-channel recordings can be challenging because overlapping sounds and complex acoustic scenes may reduce accuracy and reliability. Single-channel audio also misses additional cues present in multi-channel recordings, which makes multi-channel audio sensor networks (ASNs) advantageous. These networks place multiple microphone sensors at each node, enhancing sound event recognition. While the fundamental steps of aural classification are similar for single- and multi-channel audio, multi-channel systems have demonstrated up to 10% greater accuracy and have identified 15.6% more sound events than their single-channel counterparts.

Existing multi-channel audio classification systems are used in various applications, including gender recognition and distinguishing crosstalk from speech. However, accurate acoustic event recognition is challenging due to factors such as background noise and reflections, particularly with smaller datasets. It is therefore crucial to investigate the robustness of methodologies under these acoustic conditions. Furthermore, a reliable audio monitoring system must identify hazards accurately, yet auditory detection may lead to a higher rate of false identifications. Enhancing accuracy and reliability is therefore essential for advancing research in this field.

Factors affecting Real-life Audio Recordings

Emmanouilidou and Gamper [70] highlight that significant variations in recording equipment, noise conditions, and environmental factors pose major challenges for audio event detection and classification. Therefore, ensuring high audio quality during data selection and accounting for differences across datasets are crucial to developing an effective methodology.

Audio quality, which reflects the clarity of recordings, is influenced by factors such as recording methods, microphone types, and ASN structure. Natural elements such as background noise, overlapping sounds, and reverberation can adversely affect audio classification. Research indicates that background noise can reduce the performance of classification systems, with effects that vary by class type. Studies have also shown that overlapping sounds can hinder sound event detection in recordings.

Finally, recent works have linked declines in system performance to the use of data gathered in reverberant environments [70].

This study focuses on generating and utilizing multi-channel acoustic data that includes both sound events and acoustic scenes, while also incorporating common background noises and room acoustic reverberation from various environments. A synthetic domestic acoustic database will be created to represent the selected application environment accurately, ensuring the robustness and effectiveness of the approach in realistic scenarios. The methodology will also be tested against another database to assess its robustness across different recording equipment and environments. The synthesized database will additionally serve as a valuable resource for future research.

Feature Engineering for Audio Signal Classification

Neural Networks

Artificial Neural Networks (ANNs) are algorithms used primarily for classification tasks. Loosely inspired by the human brain, these networks consist of interconnected neurons organized into three layers: the input layer, the intermediate (hidden) layer, and the output layer.

Figure 2.3 Three Layers of the Neural Network
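To make the three-layer structure concrete, the following minimal NumPy sketch passes a batch of feature vectors through an input layer, one intermediate (hidden) layer with a ReLU activation, and a softmax output layer. The layer sizes (40 input features, 16 hidden units, 9 classes), the random weights, and the function names are illustrative assumptions, not the configuration used in this thesis.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def softmax(z):
    # Subtract the row maximum for numerical stability before exponentiating.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Input layer -> intermediate (hidden) layer -> output layer."""
    h = relu(x @ w_hidden + b_hidden)   # intermediate-layer activations
    return softmax(h @ w_out + b_out)   # class probabilities at the output layer

rng = np.random.default_rng(0)
n_features, n_hidden, n_classes = 40, 16, 9   # hypothetical sizes
params = (
    rng.normal(scale=0.1, size=(n_features, n_hidden)), np.zeros(n_hidden),
    rng.normal(scale=0.1, size=(n_hidden, n_classes)), np.zeros(n_classes),
)
x = rng.normal(size=(4, n_features))          # a batch of 4 feature vectors
probs = forward(x, *params)
print(probs.shape)                            # (4, 9)
```

Each row of `probs` is a probability distribution over the classes; training would adjust the four weight and bias arrays.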

Neural networks have demonstrated remarkable efficiency in multi-channel audio classification, as highlighted in various studies. They can be divided into several sub-classes, each with unique strengths and weaknesses. A key advantage shared by all neural network sub-classes is that stored information is distributed throughout the network, so the network continues to function even when some data is lost. Additional benefits include fault tolerance and distributed memory, which make them suitable for parallel tasks and machine learning applications. These features contribute to the effectiveness of neural network classifiers in multi-channel acoustic scene classification. However, neural networks also have drawbacks, such as specific hardware requirements, unpredictable processing times, and limitations in managing nonlinearities.

This section reviews recent advances that address the limitations of traditional ANNs. The various subtypes are analyzed in terms of their unique benefits, their challenges, and their suitability for this application, with the aim of developing a classifier that meets the requirements of the proposed system.

Convolutional Neural Network

Convolutional Neural Networks (CNNs) have emerged as a popular algorithm for multi-channel sound scene classification, using features such as MFCCs, spectrograms, and log-mel energies. As a specialized type of neural network, CNNs apply convolution to the input signals to extract features for classification. They incorporate architectural concepts such as local receptive fields, shared weights, and temporal or spatial subsampling, which provide a degree of shift and distortion invariance. Unlike traditional fully connected networks, CNNs connect each neuron only to a small local region of the previous layer, which improves runtime, computational complexity, and storage efficiency. The architecture of CNNs is illustrated in Figure 2.4.

Figure 2.4 Convolutional Neural Network Architecture

The first step in every CNN is convolution, a linear operation that slides a learned kernel over the input to produce a feature map of linear activations. These activations are then passed through a non-linear activation function.

Pooling is used to reduce the dimensionality of the feature maps produced by convolution. It groups neighboring nodes into regions and summarizes each region with a single value, such as the maximum or the average. CNNs are structured with multiple convolutional layers that create feature maps, culminating in a fully connected layer at the output, where each neuron is connected to every neuron in the preceding layer.
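The convolution, activation, and pooling steps described above can be sketched in NumPy as follows. This is a simplified single-channel, single-filter illustration: the 6x6 input with a vertical edge, the 2x2 edge-detecting kernel, ReLU as the non-linearity, and 2x2 max pooling are all hypothetical choices. As in most deep learning libraries, the "convolution" is implemented as cross-correlation.

```python
import numpy as np

def conv2d_valid(x, kernel):
    """'Valid' 2-D cross-correlation, which CNN layers call convolution."""
    kh, kw = kernel.shape
    out = np.empty((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # The same kernel weights are shared across every position.
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool2d(x, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

x = np.zeros((6, 6))
x[:, 3:] = 1.0                                  # toy input containing a vertical edge
kernel = np.array([[-1.0, 1.0], [-1.0, 1.0]])   # hypothetical horizontal-gradient filter
linear = conv2d_valid(x, kernel)                # linear step: 5x5 feature map
activated = np.maximum(0.0, linear)             # non-linear step: ReLU
pooled = max_pool2d(activated)                  # pooling: 5x5 -> 2x2
print(linear.shape, pooled.shape)               # (5, 5) (2, 2)
```

The filter responds only where the edge lies, and pooling keeps that response while shrinking the map, which is the source of the parameter and storage savings mentioned above.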

Traditional CNNs are computationally efficient because they minimize the number of parameters required to train a classifier, and they achieve satisfactory accuracy in many applications, particularly with data that exhibits spatial relationships. However, deeper neural network architectures often deliver superior performance. One reason is that pooling in traditional CNNs primarily captures translational invariance and neglects more complex invariances, which can limit accuracy.

Over the years, several enhancements have been proposed to address the limitations of traditional CNNs. Le et al. introduced tiled convolutional networks, which use a tiled approach to capture more complex invariances while reducing the number of parameters by sharing common weights in the first layer and focusing on small input regions. Chong et al. developed multi-channel CNNs that employ a stack of 1D convolutional layers instead of the conventional 2D convolution, resulting in improved accuracy, real-time classification performance, and efficient execution, even with smaller datasets.

Deep Neural Network

Deep Neural Networks (DNNs) extend traditional neural network architectures with additional intermediate layers to handle complex, non-linear features. Also known as feed-forward networks or multilayer perceptrons, DNNs use hidden layers for which activation functions must be selected. Input data passes through multiple hidden layers before classification, which improves accuracy, particularly in multi-channel applications. However, the large number of hidden layers and the need for large datasets can significantly increase training time, resulting in a cost-benefit trade-off.

In acoustic scene classification, the complexity and lengthy training times of DNNs have often led researchers to combine them with CNNs to leverage the strengths of both. This approach has given rise to Deep Convolutional Neural Networks (DCNNs), as demonstrated in the studies by Weiping et al. [118] and Duppada and Hiray [119].

Recurrent Neural Networks

Recurrent Neural Networks (RNNs) are a specialized type of artificial neural network designed to handle sequential data through recurrent connections and non-linear dynamics. They excel at modeling data for sequence identification and prediction by transforming input sequences into output sequences over time. The RNN architecture features feedback loops, known as recurrent cycles, which provide temporary memory, as illustrated in Figure 2.5.

Figure 2.5 Recurrent Neural Networks Architecture [122]
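The recurrent cycle can be illustrated with a minimal vanilla RNN in NumPy, in which each hidden state depends on the current input and on the previous hidden state. The tanh activation, the layer sizes, and the random weights are assumptions chosen for illustration.

```python
import numpy as np

def rnn_forward(xs, w_x, w_h, b):
    """Vanilla RNN over a sequence: h_t = tanh(W_x x_t + W_h h_{t-1} + b)."""
    h = np.zeros(w_h.shape[0])          # the temporary memory starts empty
    states = []
    for x_t in xs:                      # process one time step at a time
        # Feedback loop: the new state depends on the previous state h.
        h = np.tanh(w_x @ x_t + w_h @ h + b)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(1)
n_in, n_hidden, n_steps = 8, 5, 20      # hypothetical sizes
w_x = rng.normal(scale=0.3, size=(n_hidden, n_in))
w_h = rng.normal(scale=0.3, size=(n_hidden, n_hidden))
b = np.zeros(n_hidden)
xs = rng.normal(size=(n_steps, n_in))   # e.g. 20 frames of 8 features each
hs = rnn_forward(xs, w_x, w_h, b)
print(hs.shape)                         # (20, 5)
```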

While RNNs are efficient at sequential tasks, their long training times complicate their use as classifiers. In addition, the standard RNN model struggles to classify long-term sequences due to its heavy reliance on the training data.

For effective domestic acoustic scene classification, the ability to classify long-term sequences is essential for continuous monitoring of household activities. To address the limitations of conventional RNNs, significant advances have been made, including Long Short-Term Memory (LSTM) networks and Gated Recurrent Neural Networks (GRNNs). Researchers have also explored combinations of RNNs with other neural network types, such as Convolutional Recurrent Neural Networks and Deep Recurrent Neural Networks.

Long Short-Term Memory Recurrent Neural Network

Conventional RNNs face significant limitations because their gradient-based training algorithms suffer from dissipating (vanishing) gradients, which hinders the learning of long-term sequential dependencies. Long Short-Term Memory (LSTM) networks, a specialized type of RNN, are designed to mitigate this effect. LSTMs use gates and memory cells that regulate the flow of information into the hidden neurons, preserving and maintaining outputs from previous time steps.

Despite this ability to manage long-term sequences, the gates that control the inputs and outputs of the intermediate layers require multiple memory cells. This results in higher memory requirements and greater computational complexity than traditional RNN models.
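A single LSTM step can be sketched as follows, using the common formulation with input, forget, and output gates and a memory cell. The hand-set biases in the usage example are hypothetical: they close the input gate and open the forget gate, so the cell simply retains its contents, which shows how the gated, additive cell update preserves information across many steps. The four stacked gate and candidate blocks in `W`, `U`, and `b` are also why an LSTM layer needs more memory than a plain RNN layer.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    W (4n x n_in), U (4n x n) and b (4n) stack the parameters of the input gate,
    forget gate, output gate and candidate cell content, in that order.
    """
    n = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[:n])            # input gate: how much new content is written
    f = sigmoid(z[n:2 * n])       # forget gate: how much of the old cell is kept
    o = sigmoid(z[2 * n:3 * n])   # output gate: how much of the cell is exposed
    g = np.tanh(z[3 * n:])        # candidate cell content
    c = f * c_prev + i * g        # memory cell: gated, additive update
    h = o * np.tanh(c)            # new hidden state
    return h, c

n_in, n_hidden = 8, 5             # hypothetical sizes
W = np.zeros((4 * n_hidden, n_in))
U = np.zeros((4 * n_hidden, n_hidden))
# Hypothetical hand-set biases: input gate closed, forget gate open.
b = np.concatenate([np.full(n_hidden, -20.0), np.full(n_hidden, 20.0),
                    np.zeros(n_hidden), np.zeros(n_hidden)])
h, c = np.zeros(n_hidden), np.arange(1.0, n_hidden + 1)   # the cell starts out holding 1..5
for _ in range(50):
    h, c = lstm_step(np.ones(n_in), h, c, W, U, b)
print(np.round(c, 3))             # [1. 2. 3. 4. 5.]: the stored values survive 50 steps
```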

Gated Recurrent Neural Networks (GRNNs) address the limitations of traditional RNNs and LSTM networks by handling long-term sequences efficiently with reduced memory requirements, as they do not rely on separate memory cells. Instead, GRNNs compute a linear interpolation between the current state and a candidate next state, controlled by reset and update gates. The update gate determines how much the model's current state is refreshed when significant events occur, while the reset gate determines how much of the current state is cleared. Together, these gates are used to calculate the outputs and update the states, as illustrated in Figure 2.6.

Figure 2.6 Gated Recurrent Unit Reset and Update Gates [122]
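The GRU update can be sketched in the same style. This assumes the common formulation h_t = (1 - z_t) * h_{t-1} + z_t * h~_t (some references swap the roles of z_t and 1 - z_t). The helper `n_params` is a hypothetical parameter count for this simplified formulation only; library implementations may add extra bias terms. With three gate/candidate blocks instead of four and no separate cell state, a GRU layer needs roughly three quarters of the weights of an equally sized LSTM layer.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, W, U, b):
    """One GRU time step; W, U, b stack update gate, reset gate and candidate (3 blocks)."""
    n = h_prev.shape[0]
    z = sigmoid(W[:n] @ x_t + U[:n] @ h_prev + b[:n])                  # update gate
    r = sigmoid(W[n:2 * n] @ x_t + U[n:2 * n] @ h_prev + b[n:2 * n])    # reset gate
    # The reset gate scales how much of the previous state feeds the candidate.
    h_tilde = np.tanh(W[2 * n:] @ x_t + U[2 * n:] @ (r * h_prev) + b[2 * n:])
    # Interpolate between the old state and the candidate; no separate memory cell.
    return (1.0 - z) * h_prev + z * h_tilde

def n_params(n_in, n_hidden, n_blocks):
    """Weights and biases of a recurrent layer with n_blocks gate/candidate blocks."""
    return n_blocks * (n_hidden * n_in + n_hidden * n_hidden + n_hidden)

print("LSTM:", n_params(40, 64, 4), "GRU:", n_params(40, 64, 3))   # LSTM: 26880 GRU: 20160

# With the update gate held almost closed (hypothetical bias), the state is carried over.
n_in, n_hidden = 8, 5
W, U = np.zeros((3 * n_hidden, n_in)), np.zeros((3 * n_hidden, n_hidden))
b = np.concatenate([np.full(n_hidden, -20.0), np.zeros(2 * n_hidden)])
h_prev = np.linspace(-1.0, 1.0, n_hidden)
h_next = gru_step(np.ones(n_in), h_prev, W, U, b)
```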

While GRNNs have lower memory requirements than LSTM networks, they still have higher computational complexity than standard RNNs. Furthermore, the efficiency and accuracy of both LSTM and GRNN models can differ significantly depending on the dataset used.

Pre-trained Neural Network Models

Pre-trained neural network models offer an efficient alternative for classification through transfer learning, which reuses the weights of a trained network when training a new model. This approach significantly reduces the effort required for model building, the training duration, and the overall learning workflow. Numerous studies indicate that transfer learning yields better accuracy than custom CNN models. The reduced time and resource requirements of pre-trained models also provide greater flexibility in developing classification systems. For instance, some studies use a neural network as a feature extractor: the input is passed through the trained network up to a certain layer, and the resulting activations serve as features for other machine learning techniques or neural networks.
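The feature-extractor form of transfer learning can be sketched without any deep learning framework. In this NumPy illustration, a frozen weight matrix stands in for the pre-trained layers (it is random here, a hypothetical placeholder for weights that would in practice be loaded from a network trained on a large dataset), and only a new logistic-regression head is trained on the extracted activations. The toy two-class data, the layer sizes, and the learning rate are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for pre-trained layers: random here, and kept frozen.
W_backbone = rng.normal(scale=0.5, size=(20, 32))

def extract_features(x):
    """Frozen layers: activations at an intermediate layer are used as features."""
    return np.maximum(0.0, x @ W_backbone)

# Hypothetical toy data: two classes whose 20-dimensional inputs differ in mean.
x = np.vstack([rng.normal(0.0, 1.0, (100, 20)), rng.normal(1.0, 1.0, (100, 20))])
y = np.repeat([0.0, 1.0], 100)

feats = extract_features(x)
feats = (feats - feats.mean(axis=0)) / (feats.std(axis=0) + 1e-8)   # standardize

# Only the new classification head (a logistic-regression layer) is trained.
backbone_before = W_backbone.copy()
w, b, lr = np.zeros(feats.shape[1]), 0.0, 0.1
for _ in range(300):
    p = 1.0 / (1.0 + np.exp(-(feats @ w + b)))   # predicted probability of class 1
    grad = p - y                                  # gradient of the log-loss w.r.t. the logits
    w -= lr * feats.T @ grad / len(y)
    b -= lr * grad.mean()

acc = np.mean(((feats @ w + b) > 0) == (y == 1.0))
print(f"training accuracy of the new head: {acc:.2f}")
```

Because the backbone is never updated, only 33 parameters are learned here, which illustrates why transfer learning reduces training time and resource requirements.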

Pre-trained CNN models such as AlexNet, GoogLeNet, ResNet, Inception-ResNet, Xception, SqueezeNet, VGGNet, and LeNet have been trained on extensive datasets, allowing their weights to be reused for transfer learning. Table 2.3 summarizes these models and their key characteristics; this comparison is essential for selecting the appropriate classification methodology for this work.

Table 2.3 General comparison summary of pre-trained CNN models (* number of layers may vary depending on the version used)

Model Year Size Input size Layers No of Parameters

Pre-trained networks are typically trained on large databases, so they can be excessively large and include layers and weights that do not improve performance on a specific dataset. This makes it challenging to develop accurate systems for smaller machines with limited resources. Chapter 5 of this thesis addresses this research gap.

Review of Audio Classification Systems

Liabilities and Challenges: Audio Classification

This section evaluates the open challenges and areas for improvement identified in existing research by comparing related works in terms of their applications, methodologies, contributions, and limitations. Table 2.4 focuses on studies related to multi-channel audio classification to ensure alignment with this thesis. Accuracy results from these studies are not included in the comparison, as they were not all measured on the same datasets.

Table 2.4 List of notable works in the field of multi-channel audio classification

| Features | Contributions | Drawbacks |
| Log-mel energies with CNN classifier | Shuffling and mixing data augmentation | Features and classification methods close to baseline |
| MFCC and log-mel features with pre-trained VGG-16 | Front-end and back-end module approach | Extensive training process required for the VGG-16 classifier, which is challenging when using different datasets |
| Mel-filter bank features with CNN classifier | Mix-up data augmentation in pre-processing | Accuracy level achieved; needs to be tested for domestic sound scenes |
| CNN-extracted multi-spectrogram features with SVM classifier | Multi-spectrogram through STFT and MFCC as features | Utilizing spectral features assumes a static representation of sounds; needs to be tested for domestic sound scenes |

Improvements in methodology could broaden the exploration of neural networks for classification, for example by combining multiple classifiers or using different feature sets. Many existing studies use CNNs as their main classifier; despite their success in multi-channel audio applications, these models may overlook complex invariances because of their pooling techniques and their assumptions about spatial behavior. In addition, the complexity and size of CNN architectures can lead to overfitting. Investigating compact, series architectures could therefore reduce resource demands while potentially improving accuracy.

Zheng et al. [77] highlight the challenges of using spectral features for classification, particularly when integrating them with other features. Spectral features are affected by the non-static properties of audio [143], and the different temporal and frequency resolutions required by different acoustic scene classes can complicate matters further [77]. This variability can also affect the filter shape used in neural network models [77].

Audio scenes are often affected by noise, and models trained on them are prone to overfitting; both can compromise system accuracy. Previous techniques focused mainly on data augmentation through shuffling and mixing, limiting improvements to the pre-processing phase. Improving system accuracy by refining feature extraction therefore presents a valuable opportunity. Exploring and combining various cepstral coefficients can provide a more complete representation of the signal while capturing additional cues within the audio scene. Moreover, few studies have tested their systems on multiple datasets of domestic acoustic scenes, so broader testing could significantly improve confidence in their reliability.

The adaptability of the system to significant applications is highly valuable. The notable research in multi-channel acoustic scene classification outlined in Table 2.4 is not tied directly to specific applications. By linking this research to specific applications and incorporating adjustability and flexible system operation, research developments can better address the challenges faced by the population.

Review of Sound Source Localization Systems

Liabilities and Challenges: Sound Source Node Location Estimation

To identify the challenges and potential areas for development in this work, Table 2.5 provides a comprehensive comparison of significant related works, highlighting their methodologies, advantages, and disadvantages This comparison facilitates a thorough evaluation for selecting the most suitable methodology to enhance the scope of this research.

Table 2.5 List of notable works in the field of multi-channel sound source estimation

| Study | Basis | Methodology | Contributions | Drawbacks |
| [145] | Time differences of arrival (TDOA) |  | Simplicity | Low performance due to noise and multipath propagation |
|  | Time differences of arrival (TDOA) |  | Improved performance; reduced effects of noise compared to [145] | Still subject to noise when more overlapping sounds are present |
|  |  | Narrowband Multiple Signal Classification (MUSIC) estimator | Performs well for signal sources with smaller separation distances; high resolution | Requires a scan vector, which increases computational complexity |
|  |  | Root Multiple Signal Classification (Root-MUSIC) estimator | Performs well for signal sources with smaller separation distances; high resolution | Assumes sources are perfectly spaced; requires a scan vector, which increases computational complexity |
|  |  | Estimation of Signal Parameters via Rotational Invariance Techniques (ESPRIT) | Does not require a scan vector; reduced complexity; maintains high resolution |  |
| [146] | Time differences of arrival (TDOA) | Cross-correlation coefficient matrix determinant | Low computational complexity, flexibility, low sensitivity to noise | Requires prior knowledge of the exact number of active sources |
|  |  | STFT magnitude and phase as features to a custom convolutional and recurrent NN | Does not require prior knowledge of the number of active sources; better performance than MUSIC | Inconsistency; performance degradation with increasing overlapping sounds |
|  |  | STFT magnitude and phase as features to a CNN with dilation | More computationally efficient than traditional NN-based methods | Inconsistency; performance degradation with increasing overlapping sounds |
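Several of the approaches in Table 2.5 rely on estimating the time difference of arrival between a pair of microphones. As one illustration, the following NumPy sketch implements GCC-PHAT (generalized cross-correlation with phase transform), a widely used TDOA estimator; it is not necessarily the method of [145] or [146]. The sampling rate, the synthetic white-noise source, and the 12-sample delay in the usage example are assumptions. PHAT weighting keeps only the phase of the cross-spectrum, which also makes the estimator less sensitive to the spectral shape of the source.

```python
import numpy as np

def gcc_phat_delay(sig, ref, fs, max_tau=None):
    """Estimate how many seconds `sig` lags behind `ref` using GCC-PHAT."""
    n = len(sig) + len(ref)                   # zero-pad so the correlation is not circular
    cross = np.fft.rfft(sig, n=n) * np.conj(np.fft.rfft(ref, n=n))
    cross /= np.abs(cross) + 1e-12            # PHAT weighting: keep phase, drop magnitude
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    if max_tau is not None:                   # optionally restrict to physically possible lags
        max_shift = max(1, min(int(fs * max_tau), max_shift))
    # Rearrange so that index 0 corresponds to a lag of -max_shift samples.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / fs

rng = np.random.default_rng(0)
fs = 16000                                            # hypothetical sampling rate
source = rng.normal(size=4000)
mic1 = source
mic2 = np.concatenate((np.zeros(12), source[:-12]))  # same signal arriving 12 samples later
tau = gcc_phat_delay(mic2, mic1, fs)
print(round(tau * fs))                               # 12
```

With a known microphone spacing, such a delay can be converted into a direction of arrival; with several overlapping sources, however, a single correlation peak no longer suffices, which matches the drawbacks listed in the table.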

The proposed home monitoring system aims to identify the location of hazardous sounds efficiently without needing the precise coordinates of the sound sources. Instead, estimating the room in which these sounds occur is sufficient to determine the resident's position relative to potential dangers. Additionally, integrating the sound source estimator into the multi-channel sound classification system reduces time and resource consumption.

Estimating the Direction of Arrival (DOA) using neural networks offers significant advantages, particularly because it does not require prior knowledge of the number of active sound sources in the environment. This capability is especially beneficial for multi-channel audio recordings.

This work seeks to build on previous research by improving computational efficiency and reducing time and resource demands through transfer learning instead of training a custom neural network classifier. Additionally, to mitigate the adverse effects of overlapping sound sources on DOA estimation, the study focuses on maximizing the information derived from the Short-Time Fourier Transform (STFT) phase differences between microphones, while discarding magnitude information, which primarily reflects overlapping sound sources.
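A minimal sketch of such phase-difference features follows: each channel is transformed with a Hann-windowed STFT, and the angle of X_m * conj(X_ref) is kept for every non-reference microphone, discarding magnitudes. The frame length (512), hop size (256), the choice of microphone 0 as reference, and the four-microphone synthetic 1 kHz test signal are assumptions for illustration, not the exact feature pipeline of this thesis.

```python
import numpy as np

def stft(x, frame_len=512, hop=256):
    """Minimal STFT: Hann-windowed frames -> one-sided spectra, shape (frames, bins)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def phase_difference_features(channels, ref=0, frame_len=512, hop=256):
    """Inter-channel phase differences (radians) relative to a reference microphone.

    Only the angle of X_m * conj(X_ref) is kept; STFT magnitudes are discarded.
    Returns an array of shape (n_channels - 1, n_frames, n_bins).
    """
    specs = [stft(ch, frame_len, hop) for ch in channels]
    return np.stack([np.angle(specs[m] * np.conj(specs[ref]))
                     for m in range(len(specs)) if m != ref])

fs = 16000
t = np.arange(fs) / fs                           # one second of audio
delays = [0.0, 1e-4, 2e-4, 3e-4]                 # hypothetical 4-microphone node
mics = np.stack([np.sin(2 * np.pi * 1000 * (t - d)) for d in delays])
ipd = phase_difference_features(mics)
print(ipd.shape)                                 # (3, 61, 257)
```

For a channel delayed by d seconds, the phase difference in the bin at frequency f is approximately -2πfd (wrapped to (-π, π]). At the 1 kHz bin (index 32 here), this gives about -0.63, -1.26, and -1.88 radians for the three delayed channels, independent of the signal's loudness.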

Proposed System Design Specifications and Requisites

This chapter has outlined the limitations and challenges in sound scene classification and source node estimation, alongside the health and ethical considerations discussed in Chapter 1. Based on these insights, the following system development specifications and requirements have been established; they are elaborated upon in the subsequent chapters of this work.

1. Generation of a synthetic domestic acoustic database comprising sound events and scenes (Chapter 3)

2. A well-researched approach to extracting relevant spectral and spatial features, aiming for better accuracy and a shorter inference execution time (Chapter 4)

3. Segmentation of recorded audio signals prior to classification (Chapter 4)

4. Precise source node estimation for approximating the source node of the identified hazard, which can be integrated with the sound classification component of the system design (Chapter 4)

5. A novel, compact, and reliable audio scene classification technique (Chapter 5)

Creating a synthetic dataset that accurately represents a typical real-world dementia care environment enables the examination of factors such as background noise and room reverberation that are relevant to the application. This approach also facilitates testing the system's robustness and effectiveness across diverse datasets with different recording setups, and it opens opportunities for future research. Further details on the dataset generation are discussed in Chapter 3.

After the dataset is generated, the optimal spectro-temporal features for audio classification will be identified and extracted. Sound event segmentation will be implemented to identify occurrences throughout the classification duration. Additionally, a phase-based audio location estimation system will be developed to enhance classification performance and reliability without requiring extensive prior knowledge of the recordings. Chapter 4 elaborates on both the classification features and the location estimation aspects of the system.

In Chapter 5, we will apply the optimal features identified in Chapter 4 to neural network methods for classification. This chapter focuses on designing a compact neural network model that achieves high accuracy while minimizing computational and resource demands. Through an in-depth analysis of various neural network approaches and architectures, we aim to enhance precision while ensuring compatibility with devices that have limited computing capabilities.

The system functionalities will be integrated into a user-friendly interface, developed using a design thinking and user-centered approach. For more information on the specifications and features of this interface, please refer to Chapter 6.

To evaluate the performance of the proposed system, its response to the following aspects is investigated:

This study conducts a comprehensive comparison of cepstral, temporal, and spectro-temporal features, on a per-class and overall basis, to identify the most effective features. The identified features are then classified using various pre-trained neural network and machine learning models to determine their performance.

2. Investigation of the effects of feature combination

3. Increasing the sample set and cross-fold validation

Identifying the top-performing features and model classifiers is crucial for developing a novel approach for the selected application. To strengthen the validity of performance claims, cross-fold validation will be employed, using different training and testing samples in each fold. This ensures that the reported system performance is reliable while maintaining a balanced dataset during implementation, and it verifies the results obtained in this research.
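The fold construction can be sketched as follows. The number of folds (k = 5), the per-class stratification, the random seed, and the balanced label list of nine categories with 338 clips each are assumptions for illustration; the text does not specify the exact cross-validation scheme.

```python
import numpy as np

def stratified_kfold_indices(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs; each class is spread evenly across the k folds."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    folds = [[] for _ in range(k)]
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))   # shuffle within the class
        for f, chunk in enumerate(np.array_split(idx, k)):
            folds[f].extend(chunk.tolist())
    all_idx = np.arange(len(labels))
    for f in range(k):
        test = np.array(sorted(folds[f]))
        train = np.setdiff1d(all_idx, test)                     # every sample not in the test fold
        yield train, test

labels = np.repeat(np.arange(9), 338)   # hypothetical: 9 balanced categories x 338 clips
for fold, (train, test) in enumerate(stratified_kfold_indices(labels, k=5)):
    print(fold, len(train), len(test))  # test folds hold 612 or 603 clips (68 or 67 per category)
```

Every clip appears in exactly one test fold, so averaging the per-fold scores uses all of the data for evaluation while keeping each fold's class proportions balanced.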

In addition to standard accuracy, the performance of the various techniques will be evaluated using F1-scores, which combine recall and precision. These metrics are calculated from the counts of True Positives (TP), True Negatives (TN), False Positives (FP), and False Negatives (FN). The F1-score, recall, and precision are each defined by their own equations, and the parameters needed to calculate them can be obtained from confusion matrices.
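For reference, the standard definitions of these quantities in terms of the confusion-matrix counts are written below in their usual form; accuracy is the only one of these metrics that uses TN.

```latex
\begin{aligned}
\text{Precision} &= \frac{TP}{TP + FP}, \qquad
\text{Recall} = \frac{TP}{TP + FN}, \\
\text{Accuracy} &= \frac{TP + TN}{TP + TN + FP + FN}, \\
F_1 &= 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
     = \frac{2\,TP}{2\,TP + FP + FN}.
\end{aligned}
```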

The performance of each technique will be evaluated on an expanding sample set across various experimental stages, beginning with 3,042 ten-second audio files (338 files for each of nine categories) and progressively increasing until the complete database is used. This analysis includes recordings from the SINS database and from the synthetic database created as outlined in Chapter 3, both of which contain unequal numbers of audio files per category. To account for this data imbalance, the following techniques are used:

This technique, applied during the initial development and experiments, equalizes the dataset across all categories to maintain balance. It prevents bias towards categories with more samples by reducing the number of samples per category to the minimum count among the categories. The retained samples are selected randomly throughout the experiments.
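This balancing step amounts to random undersampling to the size of the smallest category, which can be sketched as follows. The seed and the hypothetical label counts (500, 338, and 1,200 clips for three categories) are assumptions for illustration.

```python
import numpy as np

def undersample_to_minimum(labels, seed=0):
    """Randomly keep as many samples per class as the smallest class has; return indices."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(labels, return_counts=True)
    n_keep = counts.min()                              # size of the smallest category
    keep = [rng.choice(np.flatnonzero(labels == c), size=n_keep, replace=False)
            for c in classes]                          # random selection, no duplicates
    return np.sort(np.concatenate(keep))

# Hypothetical imbalanced label list: 3 categories with 500, 338 and 1200 clips.
labels = np.repeat([0, 1, 2], [500, 338, 1200])
idx = undersample_to_minimum(labels)
print(np.bincount(labels[idx]))   # [338 338 338]
```

Changing the seed between experiments draws a different random subset of the larger categories, while every category keeps the same number of samples.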

2. Using Weight-sensitive Performance Metrics

The F1-score is the primary performance metric for the experiments, so it is essential that it is unbiased, particularly in multi-class classification. In unbalanced datasets, the average F1-score can be influenced by the number of samples per class, potentially skewing results in favor of classes with more data. To address this issue, three methods of calculating the mean F1-score are evaluated.

The weighted F1-score computes the F1-score for each class individually and averages them using weights based on the number of true labels in each class. Each class therefore contributes to the overall score in proportion to its prevalence in the dataset, as outlined in equation (2.13).

- Micro F1-score: uses the total numbers of TP, FN, and FP across all classes of the multi-class problem.

The F1-score is then computed directly from these pooled counts, without assigning explicit weights to individual classes. Equivalently, the micro-averaged F1-score can be derived by first computing micro-averaged precision and recall over the entire dataset and then applying the F1 equation. Because the micro F1-score is based on overall totals rather than per-class counts, every sample contributes equally, so classes with more samples have more influence; in single-label multi-class classification, it equals the overall accuracy.

The macro F1-score is the conventional method of averaging F1-scores across multiple classes. It sums the individual per-class F1-scores and divides by the number of classes. This approach evaluates each class independently and applies no weights during aggregation, so every class counts equally regardless of its size.
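The three averaging schemes can be compared with a short NumPy sketch that derives TP, FP, and FN from a confusion matrix (rows are true classes, columns are predictions). The 3-class matrix is hypothetical, with a small, poorly classified third class. The macro average is pulled down by that class, the weighted average largely follows the two large classes, and the micro F1-score equals the overall accuracy (95/106).

```python
import numpy as np

def f1_scores(cm):
    """Per-class, macro, weighted and micro F1 from a confusion matrix (rows = true)."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp          # predicted as the class but belonging elsewhere
    fn = cm.sum(axis=1) - tp          # belonging to the class but predicted elsewhere
    denom = 2 * tp + fp + fn
    per_class = np.divide(2 * tp, denom, out=np.zeros_like(tp), where=denom > 0)
    support = cm.sum(axis=1)          # number of true labels per class
    macro = per_class.mean()                                       # every class counts equally
    weighted = np.sum(per_class * support) / support.sum()         # weighted by prevalence
    micro = 2 * tp.sum() / (2 * tp.sum() + fp.sum() + fn.sum())    # pooled TP, FP, FN
    return per_class, macro, weighted, micro

# Hypothetical 3-class confusion matrix with an under-represented third class.
cm = [[50, 2, 0],
      [3, 40, 1],
      [4, 1, 5]]
per_class, macro, weighted, micro = f1_scores(cm)
print(f"macro={macro:.3f} weighted={weighted:.3f} micro={micro:.3f}")
# macro=0.821 weighted=0.891 micro=0.896
```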

In addition to accuracy and F1-score, key performance metrics for the developed system include execution time and resource requirements. Because the system is intended for monitoring and is primarily used through mobile applications, it must be compatible with devices that have limited computing power. Minimal system requirements are therefore essential for optimal functionality.

Introduction

The development of reliable technology begins with obtaining a valid dataset that meets the system's requirements. Multi-channel household acoustic sounds can be recorded using evenly spaced microphone arrays at an appropriate sampling frequency. After data acquisition, the raw data must be pre-processed before feature extraction. This chapter evaluates various existing databases and pre-processing techniques, highlighting their advantages and disadvantages. It also discusses the curation of a realistic synthetic database created for this work, which is later used to evaluate the performance of the proposed algorithms against existing methodologies.

Existing Databases

Pre-processing is essential for both experimentally recorded raw data and data obtained from databases, as it helps prevent overfitting and reduces noise in sound signals. Various pre-processing techniques are explored in the following sections. While several environmental sound databases, such as NOISEX-92 and the AURORA-2 corpus, are available, most are restricted to single- or dual-channel audio and require a fee for access.

The Sound INterfacing through the Swarm (SINS) database is a valuable resource for multi-channel acoustics, featuring recordings of domestic activities. Developed as part of the DCASE Challenge, this database was recorded with an ASN of 13 nodes, each equipped with 4 microphones. The SINS database also provides acoustic data that includes Mel Frequency Cepstral Coefficient (MFCC) features classified with a Support Vector Machine (SVM).

The TUT database is a similar database, initiated as part of the DCASE 2013 challenge. It comprises 15 distinct scenes capturing human activities and home surveillance, recorded at a 44.1 kHz sampling rate with 24-bit resolution. The recordings underwent a two-level annotation process and privacy screening with post-processing. Unlike the SINS database, this dataset provides acoustic data in the form of MFCC features classified using a Gaussian Mixture Model (GMM) rather than an SVM. It also emphasizes environmental scenes rather than domestic acoustic environments.

The Diverse Environments Multi-channel Acoustic Noise Database (DEMAND) is also relevant to this study, as it provides multi-channel recordings of environmental noise. These were captured at a sampling rate of 48 kHz using a planar array of 16 microphones arranged in four rows with 5 cm spacing. The sounds are grouped into several categories, including environments beyond the home. Within the domestic category, they are further grouped by location, such as the kitchen, living room, or washroom, rather than by individual sound events. This research therefore requires a more detailed sound database that focuses specifically on domestic environments.
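To make the link between microphone spacing and sampling rate concrete, the sketch below takes an idealized array of four rows of four microphones at a 5 cm pitch (an assumption: the actual DEMAND layout may differ from a perfect grid) and computes (i) the highest frequency for which 5 cm spacing still satisfies the half-wavelength condition d < λ/2, i.e. f < c/(2d), and (ii) the largest inter-microphone delay, expressed in samples at 48 kHz. The speed of sound of 343 m/s and the helper names are assumptions for illustration.

```python
import itertools
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at roughly 20 degrees C


def grid_positions(rows=4, cols=4, pitch=0.05):
    """Hypothetical idealised planar grid: (x, y) coordinates in metres."""
    return [(c * pitch, r * pitch) for r in range(rows) for c in range(cols)]


def spatial_aliasing_limit(spacing, c=SPEED_OF_SOUND):
    """Highest frequency (Hz) satisfying the half-wavelength rule d < lambda / 2."""
    return c / (2.0 * spacing)


def max_delay_samples(positions, fs, c=SPEED_OF_SOUND):
    """Largest arrival-time difference between any two microphones, in samples.

    The bound is reached for a far-field source in line with the most distant pair.
    """
    d_max = max(math.dist(p, q) for p, q in itertools.combinations(positions, 2))
    return d_max * fs / c


mics = grid_positions()
print(len(mics))                                      # 16
print(round(spatial_aliasing_limit(0.05)))            # 3430
print(round(max_delay_samples(mics, 48000), 1))       # 29.7 (diagonal pair)
print(round(max_delay_samples(mics[:2], 48000), 1))   # 7.0 (adjacent pair)
```

Under these assumptions, inter-microphone delays span up to about 30 samples at 48 kHz, while the 5 cm spacing alone limits alias-free spatial sampling to roughly 3.4 kHz, well below the 24 kHz Nyquist frequency.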

Noise Reduction Techniques

Data Augmentation Techniques

Segmentation and Pre-processing Technique for Proposed System

Development of the DASEE Synthetic Database

Feature Extraction for Audio Classification

Feature Extraction Methodology for Node Location Estimation

Development of MAlexNet-40

MAlexNet-40 as a Compact Neural Network Model

Examining the Robustness of MAlexNet-40

Design Thinking Approach for Graphical User Interface Development

Integrated Domestic Multi-Channel Audio Classifier

Graphical User Interface

Future Work and Research Directions

Posted: 26/07/2023, 07:44

Nguồn tham khảo

Tài liệu tham khảo Loại Chi tiết
[2] I. Korolev, “Alzheimer's Disease: A Clinical and Basic Science Review,” Medical Student Research Journal, vol. 4, pp. 24-33, September 2014 Sách, tạp chí
Tiêu đề: Alzheimer's Disease: A Clinical and Basic Science Review
Tác giả: I. Korolev
Nhà XB: Medical Student Research Journal
Năm: 2014
[4] H. Ayyoub, “Team Assessment and Planning of Care: Vascular Dementia,” Middle East Journal of Age and Ageing, vol. 12, no. 2, 2015 Sách, tạp chí
Tiêu đề: Team Assessment and Planning of Care: Vascular Dementia
Tác giả: H. Ayyoub
Nhà XB: Middle East Journal of Age and Ageing
Năm: 2015
[6] M. A. a. S. Ramakrishnan, “Detection of Alzheimer disease in MR Images using structure tensor,” in 2014 36th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, Chicago, IL, 2014 Sách, tạp chí
Tiêu đề: Detection of Alzheimer disease in MR Images using structure tensor
Tác giả: M. A. a. S. Ramakrishnan
Nhà XB: 2014 36th Annual International Conference of the IEEE Engineering in Medicine and Biology Society
Năm: 2014
[7] Second Wind Dreams, Featured on ABC News, 2010, 1997-2017. [Online]. Available: https://www.secondwind.org/virtual-dementia-tourreg.html. [Accessed 23 Februrary 2019].[8] Scotland's National Dementia Strategy, 2010. [Online]. Available:http://www.scotland.gov.uk/Publications/2019/09/10151751/17 Sách, tạp chí
Tiêu đề: Scotland's National Dementia Strategy
Năm: 2010
[9] Health, D.O., “A Vistion for Change: Report of the expert group on mental health policy,” 2006 Sách, tạp chí
Tiêu đề: A Vistion for Change: Report of the expert group on mental health policy
Tác giả: D.O. Health
Năm: 2006
[10] T. Foley and G. Swanwick, Dementia: Diagnosis and Management in General Practice, Q. i. P. Committee, Ed., ICGP, 2014 Sách, tạp chí
Tiêu đề: Dementia: Diagnosis and Management in General Practice
Tác giả: T. Foley, G. Swanwick
Nhà XB: ICGP
Năm: 2014
[11] J. Hort, J. O'Brien, G. Gainotti, T. Pirttila, B. Popescu, I. Rektorova, S. Sorbi and P. Scheitens, “EFNS Guidelines for the Diagnosis and Management of Alzheimer's Disease,” European Journal of Neurology, 2010 Sách, tạp chí
Tiêu đề: EFNS Guidelines for the Diagnosis and Management of Alzheimer's Disease
Tác giả: J. Hort, J. O'Brien, G. Gainotti, T. Pirttila, B. Popescu, I. Rektorova, S. Sorbi, P. Scheitens
Nhà XB: European Journal of Neurology
Năm: 2010
[12] M. Rossor, N. Fox and e. al., “The Diagnosis of Young On-set Dementia,” Lancet Neurology, vol. 9, no. 8, pp. 793- 806, August 2010 Sách, tạp chí
Tiêu đề: The Diagnosis of Young On-set Dementia
Tác giả: M. Rossor, N. Fox, e. al
Nhà XB: Lancet Neurology
Năm: 2010
[13] K. Jellinger, “The enigma of mixed dementia,” Elsevier, vol. 3, no. 1, pp. 40-53, January 2007 Sách, tạp chí
Tiêu đề: The enigma of mixed dementia
Tác giả: K. Jellinger
Nhà XB: Elsevier
Năm: 2007
[14] N. Custodio, R. Montesinos, D. Lira, E. Herrera-Perez, Y. Bardales and L. Valeriano-Lorenzo, “Mixed Dementia: A review of the evidence,” Dementia Neuropsychology; Scielo, vol. 11, no. 4, pp. 364-370, December 2017 Sách, tạp chí
Tiêu đề: Mixed Dementia: A review of the evidence
Tác giả: N. Custodio, R. Montesinos, D. Lira, E. Herrera-Perez, Y. Bardales, L. Valeriano-Lorenzo
Nhà XB: Dementia Neuropsychology
Năm: 2017
[15] Ministry of Health and Long-term care, “Developing Ontario's Dementia Strategy: A Discussion Paper,” Ontario, 2016 Sách, tạp chí
Tiêu đề: Developing Ontario's Dementia Strategy: A Discussion Paper
Tác giả: Ministry of Health and Long-term care
Nhà XB: Ontario
Năm: 2016
[17] M. Wortmann, February 2019. [Online]. Available: https://www.openaccessgovernment.org/dementia-become-trillion-dollar-disease-2018/26160/ Sách, tạp chí
Tiêu đề: Dementia to become trillion dollar disease by 2018
Tác giả: M. Wortmann
Nhà XB: Open Access Government
Năm: 2019
[19] The NAtional Centre for Social and Economic Modelling NATSEM, “Economic Cost of Dementia in Australia: 2016- 2056,” 2016 Sách, tạp chí
Tiêu đề: Economic Cost of Dementia in Australia: 2016- 2056
Tác giả: The NAtional Centre for Social and Economic Modelling NATSEM
Năm: 2016
[20] J. O'Keeffe, J. Maier and M. Freiman, “Assistive Technology for People with Dementia and Their Caregivers at Home: What Might Help,” Washington, DC, 2020 Sách, tạp chí
Tiêu đề: Assistive Technology for People with Dementia and Their Caregivers at Home: What Might Help
Tác giả: J. O'Keeffe, J. Maier, M. Freiman
Nhà XB: Washington, DC
Năm: 2020
[21] L. Garcon, C. Khasnabis, L. Walker, Y. Nakatani, J. Lapitan, J. Borg and e. al., “Medical and assistive health technology: meeting the needs of aging populations,” in Gerontologist, 2016 Sách, tạp chí
Tiêu đề: Medical and assistive health technology: meeting the needs of aging populations
Tác giả: L. Garcon, C. Khasnabis, L. Walker, Y. Nakatani, J. Lapitan, J. Borg, e. al
Nhà XB: Gerontologist
Năm: 2016
[22] K. Marasinghe, J. Lapitan and A. Ross, “Assistive Technologies for ageing populations in six low-income and middle- income countries: A systematic review,” BMJ Innovations, vol. 1, no. 4, pp. 182-195, October 2015 Sách, tạp chí
Tiêu đề: Assistive Technologies for ageing populations in six low-income and middle- income countries: A systematic review
Tác giả: K. Marasinghe, J. Lapitan, A. Ross
Nhà XB: BMJ Innovations
Năm: 2015
[23] T. Chen, C. King, A. Thomaz and C. Kemp, “Touched by a Robot: An Investigation of Subjective Responses to Robot Initiated Touch,” in ACM/ IEEE Conference on Human Robot Interaction, Lausanne, Switzerland, 2011 Sách, tạp chí
Tiêu đề: Touched by a Robot: An Investigation of Subjective Responses to Robot Initiated Touch
Tác giả: T. Chen, C. King, A. Thomaz, C. Kemp
Nhà XB: ACM/IEEE Conference on Human Robot Interaction
Năm: 2011
[24] D. Goovaerts, “More than 40 percent of Americans have Adopted Smart Home Technology, what's stopping the rest?,” 19 May 2017. [Online]. Available: https://www.ecnmag.com/data-focus/2017/05/more-40-percent-americans-have-adopted-smart-home-tech-whats-stopping-rest Sách, tạp chí
Tiêu đề: More than 40 percent of Americans have Adopted Smart Home Technology, what's stopping the rest
Tác giả: D. Goovaerts
Nhà XB: ECN Magazine
Năm: 2017
[25] Statista, November 2018. [Online]. Available: https://www.statista.com/outlook/279/109/smart-home/united-states#market-revenue. [Accessed February 2019].[26] World Health Organization (WHO), 2017. [Online]. Available:http://apps.who.int/iris/bitstream/10665/254660/1/WHO-EMP-IAU- 2017.02-eng.pdf Sách, tạp chí
Tiêu đề: Smart Home - Worldwide | Statista Market Forecast
Tác giả: Statista
Nhà XB: Statista
Năm: 2018
[27] J. Cocco, “Smart home technology for the elderly and the need for regulation,” Pittsburgh Journal of Environmental and Public Health Law, vol. 6, no. 1, pp. 85-108, 2011 Sách, tạp chí
Tiêu đề: Smart home technology for the elderly and the need for regulation
Tác giả: J. Cocco
Nhà XB: Pittsburgh Journal of Environmental and Public Health Law
Năm: 2011
