| |
Last updated on September 30, 2024. This conference program is tentative and subject to change.
Technical Program for Wednesday October 9, 2024
|
WeAT1 |
MR01 |
Cybernetics and Quantum Systems 3 |
Regular Papers - Cybernetics |
Chair: Sheng, Taoran | University of Texas at Arlington |
|
08:45-09:05, Paper WeAT1.3 | |
Fine-Grained Derivative-Free Simultaneous Optimistic Optimization with Local Gaussian Process |
|
Song, Junhao | East China Normal University |
Zhang, Yangwenhui | East China Normal University |
Qian, Hong | East China Normal University |
Keywords: Machine Learning, Computational Intelligence, Evolutionary Computation
Abstract: Derivative-free optimization has achieved remarkable success across a variety of applications where the explicit formulation of an objective function is inaccessible. Learning an accurate surrogate model from solutions and their function values is crucial for derivative-free optimization. Methods for constructing global surrogate models, such as Bayesian optimization (BO), encounter the challenge of high learning cost, which impairs optimization efficiency. A series of domain partition methods, such as simultaneous optimistic optimization (SOO), have been proposed that split the entire search domain into smaller regions. SOO has demonstrated notable effectiveness in derivative-free optimization but still has room for improvement due to its relatively coarse-grained partition strategy. To this end, this paper proposes a fine-grained simultaneous optimistic optimization (FGSOO) method with a local Gaussian process. Specifically, FGSOO designs a fine-grained partition strategy to endow SOO with the capability of cross-height comparison, and utilizes a local Gaussian process to make nodes' potential more representative, so as to reduce the number of solutions required for learning surrogate models. Compared with BO, FGSOO reduces the learning cost. Meanwhile, compared with SOO, FGSOO can avoid unnecessary partitioning. The experimental results on real-world tasks, such as trajectory optimization and molecule substructure optimization, verify that FGSOO surpasses the compared methods in improving efficiency while maintaining effectiveness.
|
|
09:05-09:25, Paper WeAT1.4 | |
Domain Knowledge Based Weakly Self-Supervised Human Activity Recognition with Wearables |
|
Sheng, Taoran | University of Texas at Arlington |
Huber, Manfred | The University of Texas at Arlington |
Keywords: Machine Learning, Application of Artificial Intelligence, Deep Learning
Abstract: Recognizing different types of human activities from wearable sensor-based data remains a challenging research topic in ubiquitous computing, despite the availability of embedded sensors in smartphones and wearable devices. The lack of labeled data poses a significant hurdle for existing human activity recognition (HAR) systems that heavily rely on supervised methods. In this paper, we propose a novel weakly self-supervised approach consisting of two stages. Firstly, our model leverages the inherent nature of human activities to project the data into an embedding space, grouping similar activities together. Secondly, the model is fine-tuned using similarity information in a few-shot learning fashion, enhancing the embedding's discriminative power. This enables downstream classification or clustering tasks to benefit from the learned embeddings. We evaluate our framework on three benchmark datasets and demonstrate its effectiveness. Our approach achieves comparable performance to pure supervised techniques applied directly to fully labeled datasets, thereby aiding in identifying and categorizing underlying human activities. The results highlight the potential of our approach to improve clustering algorithms for activity recognition tasks in real-world scenarios with limited labeled data.
|
|
09:25-09:45, Paper WeAT1.5 | |
UG-STNN: A Spatial-Temporal Neural Network Based on Unsupervised Graph Representation Module for Traffic Flow Prediction |
|
Zhang, Enwei | Qingdao University |
Cheng, Zesheng | College of Computer Science and Technology, Qingdao University, |
Wang, Tiankuan | Faculty of Computer Science, University of Alberta, Edmonton, Ca |
Liu, Weidong | Menual School, Qingdao |
Keywords: Machine Learning
Abstract: Accurate and efficient traffic flow prediction helps to build an intelligent transportation system and improve the travel experience in daily life. In this study, a new Spatial-Temporal Neural Network Based on Unsupervised Graph Representation Module (UG-STNN) is proposed to improve the graph convolution module. It uses unsupervised learning to extract features in the spatial dimension and can better learn the structural and feature information in the graph. UG-STNN uses fewer convolutional layers to reduce the number of parameters, decrease the complexity of the model, and improve performance and accuracy. In experiments on different test datasets, UG-STNN approaches or even exceeds the prediction results of other models, which illustrates the accuracy and stability of the model.
|
|
WeAT2 |
MR02 |
Entertainment and Media Computing |
|
Chair: Zhao, Hongtian | Xinjiang University |
|
08:05-08:25, Paper WeAT2.1 | |
ReMark: Reversible Lexical Substitution-Based Text Watermarking |
|
Jiang, Ziyu | Sichuan University |
Wang, Hongxia | Sichuan University |
Keywords: Information Assurance and Intelligence, Multimedia Computation, Application of Artificial Intelligence
Abstract: Neural-based natural language watermarking (NLW) shows promise for generating context-aware lexical substitutions, minimizing semantic loss in watermarked text. However, existing works confront two primary challenges: 1) the reliance on and sensitivity to textual context during substitute generation hinders text reversibility, and 2) strict synchronization constraints on the generation order of substitutes from both original and watermarked text block out some suitable substitutes, limiting watermark capacity. This paper puts forward a reversible neural NLW approach with improved capacity and text quality. Specifically, we construct a novel lexical substitution system (LSS), utilizing prompt learning for candidate generation and comprehensive assessment features for candidate ranking. A reversible watermarking scheme is then presented by ingeniously screening recoverable positions and enabling multi-bit substitutions via the proposed LSS. Experiments validate that our method achieves complete reversibility while enhancing watermark payload and text fidelity compared to prior art.
|
|
08:25-08:45, Paper WeAT2.2 | |
Improving Multi-View Vehicle Identification in Complex Scenes Using Robust Deep Neural Networks |
|
Zhao, Hongtian | Xinjiang University |
Keywords: Deep Learning, Multimedia Computation, AI and Applications
Abstract: Vehicle re-identification is crucial for intelligent transportation systems and traffic management, enabling the matching of vehicles across diverse camera captures. The primary challenge lies in the significant variation in appearance and background due to different camera angles, which complicates the retrieval of consistent vehicle images from a database. The variability directly impacts the effectiveness of re-identification techniques. This paper proposes a novel learning approach that leverages a view-consistent triplet loss framework and region segmentation to address the challenges of pose variation and background complexity in vehicle re-identification due to multi-view imaging. Specifically, the consistency of segmentation area distribution is used to estimate view consistency, and then triplets are selected based on it. Concentrating on distinguishing features between sample pairs that are likely to be confused, our approach markedly enhances model robustness in scenarios involving multiple perspectives. Experimental evaluations on the Veri776 dataset demonstrate that the proposed method surpasses several state-of-the-art techniques across various metrics and shows exceptional performance in recognizing samples with complex viewpoints, thus validating the efficacy of our approach.
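As a rough illustration of the triplet loss that this abstract builds on, the sketch below computes the standard margin-based triplet loss for a single (anchor, positive, negative) triple. It does not reproduce the paper's view-consistency estimation or triplet selection; the margin value, the function names, and the plain-list feature vectors are assumptions made for illustration only.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=0.3):
    """Standard margin-based triplet loss for one triple.

    The loss is zero once the negative is at least `margin` farther from
    the anchor than the positive is. The margin of 0.3 is an assumed
    value, not one taken from the paper.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)
```

A triple whose negative is already far from the anchor contributes no loss, which is why selecting easily confused pairs matters for training.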
|
|
08:45-09:05, Paper WeAT2.3 | |
Multimodal Mutual Learning with Online Knowledge Distillation for Dim Object Recognition in Aerial Images |
|
Li, Zixing | National University of Defense Technology |
Lan, Zhen | National University of Defense Technology |
Yan, Chao | Nanjing University of Aeronautics and Astronautics |
Xiang, Xiaojia | National University of Defense Technology |
Tang, Dengqing | National University of Defense Technology |
Keywords: Deep Learning, Multimedia Computation
Abstract: Deep learning methods have shown promise in various visual tasks such as object recognition. However, achieving robust and accurate performance in dim object recognition for remote sensing images remains challenging in the field of computer vision. This challenge can be attributed to factors such as cluttered backgrounds, varying observing angles, and limited availability of labeled data. In contrast, the human brain exhibits robust and efficient recognition of sensitive targets. To leverage the strengths of both computer calculation and human cognition, we propose a multimodal mutual learning with online knowledge distillation method (MMOKD) for object recognition. Our approach enables simultaneous training and mutual learning between modalities, where each modality serves as both a teacher and a student. A series of experiments are conducted to verify the potential of multimodal learning for object recognition. The results demonstrate that our approach not only enhances the robustness of the multimodal fusion model, but also improves the accuracy of the visual modality.
|
|
WeAT3 |
MR03 |
Wearable Computing |
|
Chair: Ting, Wei-Lun | Department of Electronic Engineering, National Taipei University of Technology |
|
08:05-08:25, Paper WeAT3.1 | |
Real-Time Pedestrian Dead Reckoning for IoT Based Platform-Independent Positioning System |
|
Vu, Anh Van | Korea Advanced Institute of Science and Technology |
Nguyen, Thanh Minh | Korea Advanced Institute of Science and Technology |
Sung, Changmin | Korea Advanced Institute of Science and Technology |
Han, Dongsoo | Korea Advanced Institute of Science and Technology |
Keywords: Wearable Computing
Abstract: This paper introduces a real-time pedestrian dead reckoning (PDR) algorithm for Internet of Things (IoT) devices. It is motivated by a practical challenge encountered during the development and operation of a positioning system named KAILOS, which provides positioning services via a smartphone application. Its dependence on smartphones subjects data collection to constraints imposed by operating systems. To mitigate this issue, we designed and incorporated a dedicated IoT device into the system, enabling full access to sensing data. This design requires developing and porting a PDR algorithm to the IoT device to address the communication bottleneck with a remote positioning server. Our goal was to process a large amount of data from motion sensors instantly on the device to produce precise positioning information without forwarding all of the data to the server. To this end, we propose a new approach for detecting steps using the acceleration differential in our PDR. Additionally, a dynamic gyroscope bias update strategy is included to enhance the capability of heading estimation. These advancements not only enhance the accuracy of the PDR algorithm but also facilitate its implementation on IoT devices. We deployed the PDR algorithm on our IoT hardware platform called Kailos Tag (K-Tag). In extensive experiments conducted both indoors and outdoors, we found that our real-time PDR outperformed the conventional methods. It reduces step detection errors (SDE) to approximately 1.6%, travel distance errors (TDE) to below 1.8%, and end/start errors (E/SE) to about 3.2 m regardless of environment. Moreover, it enables an average positioning latency of 2.49 ms, while using only 20% of the CPU and accounting for 8.6% of total power consumption.
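The two ingredients named in this abstract, step detection from an acceleration differential and position propagation by dead reckoning, can be sketched roughly as follows. This is not the authors' algorithm: the rising-edge rule, the threshold, the minimum gap between steps, and the fixed step length are all assumed values chosen for illustration, and the gyroscope bias update is omitted.

```python
import math


def detect_steps(acc_magnitudes, diff_threshold=1.5, min_gap=3):
    """Return sample indices where a step is assumed to occur.

    A step is counted when the difference between consecutive
    acceleration magnitudes exceeds `diff_threshold` and at least
    `min_gap` samples have passed since the previous step.
    Both parameters are hypothetical tuning values.
    """
    steps = []
    last_step = -min_gap
    for i in range(1, len(acc_magnitudes)):
        diff = acc_magnitudes[i] - acc_magnitudes[i - 1]
        if diff > diff_threshold and i - last_step >= min_gap:
            steps.append(i)
            last_step = i
    return steps


def dead_reckon(start, headings_rad, step_length=0.7):
    """Advance a 2D position by one fixed-length step per heading.

    `step_length` (metres) is an assumed constant; a real system would
    estimate it per step.
    """
    x, y = start
    path = [(x, y)]
    for heading in headings_rad:
        x += step_length * math.cos(heading)
        y += step_length * math.sin(heading)
        path.append((x, y))
    return path
```

Because both functions touch each sample only once, this style of processing is cheap enough to run on a small device without forwarding raw sensor data to a server.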
|
|
08:25-08:45, Paper WeAT3.2 | |
FedBChain: A Blockchain-Enabled Federated Learning Framework for Improving DeepConvLSTM with Comparative Strategy Insights |
|
Li, Gaoxuan | Monash University |
Lim, Chern Hong | Monash University Malaysia |
Ma, Qiyao | Sichuan University |
Tang, Xinyu | Monash University |
Tew, Hwa Hui | Monash University Malaysia |
Ding, Fan | Monash University |
Luo, Xuewen | Monash University |
Keywords: Wearable Computing, Human-centered Learning, Systems Safety and Security
Abstract: Recent research in the field of Human Activity Recognition has shown that an improvement in prediction performance can be achieved by reducing the number of LSTM layers. However, this kind of enhancement is only significant on monolithic architectures; when it runs in large-scale distributed training, data security and privacy issues must be reconsidered, and its prediction performance is unknown. In this paper, we introduce a novel framework, FedBChain, which integrates the federated learning paradigm based on a modified DeepConvLSTM architecture with a single LSTM layer. This framework performs comparative tests of prediction performance on three different real-world datasets based on three different hidden layer units (128, 256, and 512) combined with five different federated learning strategies, respectively. The results show that our architecture has significant improvements in Precision, Recall and F1-score compared to the centralized training approach on all datasets with all hidden layer units for all strategies: FedAvg improves by 4.54% on average, FedProx by 4.57%, FedTrimmedAvg by 4.35%, Krum by 4.18%, and FedAvgM by 4.46%. Based on our results, FedBChain not only improves performance, but also guarantees the security and privacy of user data during the training process compared to centralized training methods. The code for our experiments is publicly available (https://github.com/Glen909/FedBChain).
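Of the five strategies compared in this abstract, FedAvg is the simplest to illustrate: the server averages client parameters, weighting each client by its number of training samples. The sketch below shows only that aggregation step on flat parameter lists. It is not taken from the FedBChain repository, and the function name and data layout are assumptions.

```python
def fed_avg(client_weights, client_sizes):
    """Sample-size-weighted average of client parameter vectors.

    client_weights: list of equal-length lists of floats, one per client.
    client_sizes:   number of local training samples for each client.
    """
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]
```

For example, two clients with parameters `[1.0, 2.0]` and `[3.0, 4.0]` and sample counts 1 and 3 aggregate to `[2.5, 3.5]`, so the larger client pulls the global model toward its own update.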
|
|
08:45-09:05, Paper WeAT3.3 | |
Development of Smart Mask System Integrated with Alert Detection and Vital-Sign Measurement (I) |
|
Ting, Wei-Lun | Department of Electronic Engineering, National Taipei University |
Hsiao, Chun-Chieh | Lunghwa University of Science and Technology |
Lee, Ren-Guey | National Taipei University of Technology |
Keywords: Wearable Computing, Environmental Sensing, Assistive Technology
Abstract: This research focuses on enhancing worker safety in environments with high levels of TVOC gases, such as toluene, a neurotoxin. Traditional cartridge replacement methods, based on fixed intervals, lack precision and pose risks. We propose a novel approach using the SGP30 sensor to monitor cartridge effectiveness, aligned with PEL-TWA regulations. Our system triggers alerts when sensor readings exceed 10 ppm for toluene, 35 ppm for carbon monoxide, and 5000 ppm for carbon dioxide. Additionally, we incorporate physiological monitoring via Photoplethysmography (PPG) using the AFE4404 sensor to assess Heart Rate (HR) and Heart Rate Variability (HRV), alongside respiration monitoring with the D6F-P0010AM2 airflow sensor. Data from these sensors is transmitted via low-power Bluetooth to a mobile app, enabling real-time monitoring of the wearer’s condition. In case of cartridge failure or abnormal physiological readings, the app triggers vibrations on the mask and automatically sends an SOS to the employer’s server via the UDP protocol, ensuring immediate intervention. This system aims to enhance occupational safety by reducing the risk of accidents and safeguarding worker well-being.
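The alert rule stated in this abstract can be written down directly. The sketch below checks a set of gas readings against the three thresholds the abstract gives (10 ppm toluene, 35 ppm carbon monoxide, 5000 ppm carbon dioxide). The dictionary keys and the function name are hypothetical, and sensor access, Bluetooth transfer, and the SOS message are not modeled.

```python
# Thresholds in ppm, as stated in the abstract.
ALERT_THRESHOLDS_PPM = {
    "toluene": 10.0,
    "carbon_monoxide": 35.0,
    "carbon_dioxide": 5000.0,
}


def gases_over_limit(readings_ppm):
    """Return the sorted names of gases whose reading exceeds its threshold.

    Readings for gases without a known threshold are ignored. A reading
    exactly equal to the threshold does not trigger an alert, matching
    the abstract's wording ("exceed").
    """
    return sorted(
        gas
        for gas, value in readings_ppm.items()
        if gas in ALERT_THRESHOLDS_PPM and value > ALERT_THRESHOLDS_PPM[gas]
    )
```

A non-empty result would be the point at which an application raises the vibration alert described above.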
|
|
WeAT5 |
MR05 |
Cyber-Physical Systems and Robotics 1 |
Regular Papers - SSE |
Chair: George, Nijil | TCS Research, Tata Consultancy Services Ltd |
|
08:05-08:25, Paper WeAT5.1 | |
Visual Anomaly Detection with Self-Attention and Separate Memory Bank |
|
Hattori, Kosaburo | Ritsumeikan University |
Ishibashi, Ryuto | Ritsumeikan University |
Kaneko, Hayata | Ritsumeikan University |
Meng, Lin | Ritsumeikan University |
Izumi, Tomonori | Ritsumeikan University |
Keywords: Robotic Systems, Soft Robotics, Manufacturing Automation and Systems
Abstract: Declining birthrates and aging populations are advancing all over the world. This has led to labor shortages, making visual inspections more challenging in various industries. Recently, visual anomaly detection methods using deep learning have been proposed to solve these problems. However, they are computationally expensive and difficult to run in real time, even in a GPU environment. In addition, while they detect structural anomalies (e.g., scratches and stains), they cannot detect logical anomalies (e.g., mis-position and mis-number). This work proposes an anomaly detection method that detects both structural and logical anomalies at high speed by improving PatchCore. The proposal applies a self-attention mechanism to the intermediate layer of the pre-trained Convolutional Neural Network (CNN) model. The self-attention mechanism enables the model to understand the relationships between image features and detect logical anomalies. In addition, the global and local features are extracted from the intermediate layer of the pre-trained CNN model and stored in a Separate Memory Bank (SMB). SMB improves AUROC, which represents accuracy, by calculating features for each feature type. It also avoids unnecessary upsampling and reduces the dimensionality, thus improving inference speed. Experiments validate the proposed method and compare it with previous anomaly detection methods. The experiments evaluate the performance of the proposal on the CAD-SD dataset and the MVTec LOCO dataset, which contains structural and logical anomalies. For the Co-occurrence dataset, the experimental results show that the proposal achieves 98.5% (improving 2.2%) for AUROC and 16.1 (improving 66.6%) for FPS compared to the state-of-the-art method. The experimental results also show that the proposal achieves 82.8% (improving 0.9%) for the MVTec-LOCO dataset. Hence, the proposal can contribute to the efficiency and automation of manufacturing, medical, and other fields.
|
|
08:25-08:45, Paper WeAT5.2 | |
HyperSurf: Quadruped Robot Leg Capable of Surface Recognition with GRU and Real-To-Sim Transferring |
|
Satsevich, Sergei | Skolkovo Institute of Science and Technology |
Savotin, Yaroslav | Skolkovo Institute of Science and Technology |
Belov, Danil | Skolkovo Institute of Science and Technology |
Pestova, Elizaveta | Skolkovo Institute of Science and Technology |
Erkhov, Artem | Skolkovo Institute of Science and Technology |
Khabibullin, Batyr | Skolkovo Institute of Science and Technology |
Bazhenov, Artem | Skolkovo Institute of Science and Technology |
Kovalev, Vyacheslav | Skolkovo Institute of Science and Technology |
Fedoseev, Aleksey | Skolkovo Institute of Science and Technology |
Tsetserukou, Dzmitry | Skoltech |
Keywords: Mechatronics, Adaptive Systems, Robotic Systems
Abstract: This paper introduces a system of data collection acceleration and real-to-sim transferring for surface recognition on a quadruped robot. The system features a mechanical single-leg setup capable of stepping on various easily interchangeable surfaces. Additionally, it incorporates a GRU-based Surface Recognition System, inspired by the system detailed in the Dog-Surf paper [1]. This setup facilitates the expansion of dataset collection for model training, enabling data acquisition from hard-to-reach surfaces in laboratory conditions. Furthermore, it opens avenues for transferring surface properties from reality to simulation, thereby allowing the training of optimal gaits for legged robots in simulation environments using a pre-prepared library of digital twins of surfaces. Moreover, enhancements have been made to the GRU-based Surface Recognition System, allowing for the integration of data from both the quadruped robot and the single-leg setup. The dataset and code have been made publicly available.
|
|
08:45-09:05, Paper WeAT5.3 | |
System for Autonomous Management of Retail Shelves Using an Omnidirectional Dual-Arm Robot with a Novel Soft Gripper |
|
George, Nijil | TCS Research, Tata Consultancy Services Ltd |
Saha, Somdeb | Tata Consultancy Services |
Parab, Shubham | Tata Consultancy Services |
Vakharia, Vismay | Tata Consultancy Services |
Lima, Rolif | Tata Consultancy Services |
Vatsal, Vighnesh | TCS Research, Tata Consultancy Services Ltd |
Das, Kaushik | TCS Research |
Keywords: Robotic Systems, Soft Robotics, Consumer and Industrial Applications
Abstract: Managing shelves in retail stores includes restocking, rearrangement and replenishment of products. As these are some of the most labor-intensive activities, there has been widespread demand from retailers for automation in this domain. However, major challenges still remain in perception, navigation and manipulation while implementing an autonomous robotic system for this purpose. We present a system aimed at addressing some of these challenges through novel approaches. In terms of perception, we have developed a transformer-based local anomaly detection algorithm that can identify misplaced items without the need for a central database. Navigation of the omnidirectional mobile base is performed through stereo vision and LiDAR sensors. Finally, identifying grasping and manipulation as one of the key shortcomings of present robotic systems in this domain, we have developed a customized soft robotic gripper targeted at retail objects. It has compliant cable-driven fingers, and a palm configuration that can be adapted in real-time based on the target object's geometry. Coupled with a conventional two-fingered gripper in a dual-arm setup, this system is equipped to handle most objects encountered in a retail setting. We describe the underlying hardware and algorithms for each component of the system, evaluating their individual performance. We then evaluate the whole system in a mock retail setup, demonstrating promising results for autonomous management of shelves.
|
|
09:05-09:25, Paper WeAT5.4 | |
Jam-Absorption Driving with Data Assimilation |
|
Li, Siyu | The University of Tokyo |
Nishi, Ryosuke | Tottori University |
Yanagisawa, Daichi | The University of Tokyo |
Nishinari, Katsuhiro | The University of Tokyo |
Keywords: Intelligent Transportation Systems, Autonomous Vehicle
Abstract: This paper introduces a data assimilation (DA) framework based on the extended Kalman filter-cell transmission model, designed to assist jam-absorption driving (JAD) operation to alleviate sag traffic congestion. To ascertain and demonstrate the effectiveness of the DA framework for JAD operation, in this paper, we initially investigated its impact on the motion and control performance of a single absorbing vehicle. Numerical results show that the DA framework effectively mitigated underestimated or overestimated control failures of JAD caused by misestimation of key parameters (e.g., free flow speed and critical density) of the traffic flow fundamental diagram. The findings suggest that the proposed DA framework can reduce control failures and prevent significant declines and deteriorations in JAD performance caused by changes in traffic characteristics, e.g., weather conditions or traffic composition.
|
|
WeAT6 |
MR06 |
Infrastructure Systems and Services 1 |
Regular Papers - SSE |
Chair: Walter, Marcelo Luis | PUC-PR |
|
08:05-08:25, Paper WeAT6.1 | |
Satellite Image and Tree Canopy Height Analysis Using Machine Learning on Google Earth Engine with Carbon Stock Estimation |
|
Loo, ChuKiong | University of Malaya |
Wang, Huang Han | University of Malaya |
Keywords: Smart Buildings, Smart Cities and Infrastructures
Abstract: This research presents a comprehensive investigation into the dynamics of forested ecosystems using advanced geospatial techniques and machine learning applications, focusing on the University of Malaya study area. The study aims to contribute crucial data for informed decision-making aligned with sustainable development goals. It encompasses canopy height estimation, aboveground biomass density prediction, and carbon stock estimation. Machine learning algorithms, including Random Forest, Gradient Boost Tree Regression, and Support Vector Machine, are employed for canopy height estimation. Their performance is evaluated with and without Principal Component Analysis using metrics such as Root Mean Squared Error and R-squared. Results, summarized in Table 1 and Table 2, highlight the variability in canopy height predictions across different models and feature selection methods. The research explores challenges associated with GEDI Aboveground Biomass Density data, emphasizing spatial variability in model performance across different strata. Results, detailed in Table 3, underscore the importance of tailoring the model, especially in areas characterized by high biomass canopy forests. The integration of Aboveground Biomass Density data with tree cover datasets forms the basis for aboveground carbon stock estimation. Carbon stock is calculated considering forest area, land-use types, and specific carbon content factors. Findings, presented in Table 5, reveal a spectrum of aboveground carbon stock estimates, reflecting the complexity of the University of Malaya study area. This research advances remote sensing and machine learning in forestry and environmental monitoring. Its insights support informed decision-making and policy formulation.
|
|
08:25-08:45, Paper WeAT6.2 | |
Evaluation of Machine Learning Models in a Smart Water Metering System |
|
Walter, Marcelo Luis | PUC-PR |
Ribeiro, Juliano | Pontifícia Universidade Católica - PUC-PR |
Nunes, Leonardo Reis | Sumersoft Tecnologia |
Nodari, Alexandre Luis | PUCPR |
Pellenz, Marcelo Eduardo | Graduate Program in Computer Science (PPGIa) - Pontifical Cathol |
Scalabrin, Edson Emilio | Pontifícia Universidade Católica Do Paraná |
Tramontini, Ramon | PUC Parana |
Keywords: Smart Metering, Smart Buildings, Smart Cities and Infrastructures, Fault Monitoring and Diagnosis
Abstract: The integration of AI with IoT heralds the era of AIoT (Artificial Intelligence of Things). It represents a transformative approach in technology and opens up a new opportunity for deploying machine learning models in embedded devices that face resource constraints and operate on the edge of networks. Central to this study is the implementation of computational vision techniques for digit recognition, evaluating various machine learning models, particularly in the context of smart metering. The selected models were converted from GPU-equipped workstations to ESP32-S3 microcontroller-based low-end devices. Through a series of experiments using ESP32-S3 development kits, the MNIST database, and TensorFlow Lite, we explore the effectiveness of these models in smart metering applications, focusing on accuracy, inference times, and the challenges in model conversion. The findings demonstrate the feasibility of executing machine learning inferences on low-end devices with high accuracy in smart meter contexts. However, challenges such as model size limitations, processing speed, conversion difficulties, and potential accuracy loss were noted. Not all models were viable for conversion to TensorFlow Lite. Simpler models like LeNet5 emerged as effective solutions for smart metering applications, balancing size, accuracy, and latency. This work offers practical insights for researchers and engineers looking to implement machine learning in AIoT and smart metering environments, highlighting the trade-offs and considerations for effective deployment.
|
|
08:45-09:05, Paper WeAT6.3 | |
A Neighborhood Reconstruction-Based Cyber Attack Detection Method for Smart Grid Security |
|
Ren, Wanwan | Central South University |
Peng, Jun | Central South University |
Li, Shuo | Changsha University of Science and Technology |
Zhang, Rui | Changsha University |
Rong, Jieqi | Central South University |
Li, Heng | Central South University |
Keywords: Smart Buildings, Smart Cities and Infrastructures
Abstract: The integration of advanced communication and information technologies in smart grids has led to enhanced efficiency and reliability but also introduced security vulnerabilities, prompting the need for robust cyber attack detection methods. Traditional approaches struggle to capture evolving attack patterns and handle high-dimensional data, highlighting the necessity for more sophisticated approaches. A neighborhood reconstruction-based smart grid attack detection scheme based on subgraphs is proposed. By leveraging Graph Neural Networks (GNNs), the challenge of capturing complex interdependencies among grid nodes is addressed. This approach employs unsupervised learning principles, training the model solely on normal data and utilizing the reconstruction error of node features to detect attacks. Additionally, by subgraph sampling and feature suppression, the model's ability to utilize neighborhood information is enhanced, thereby further improving detection effectiveness. Simulation results on the IEEE 30-bus and IEEE 118-bus power systems demonstrate the feasibility of the method, achieving detection accuracies of 96.67% and 97.46%, respectively.
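The detection principle described in this abstract, reconstructing each node's features from its neighborhood and flagging nodes with a large reconstruction error, can be illustrated with a deliberately simple stand-in. The sketch below replaces the trained GNN with the median of neighboring values, which is an assumption made purely for illustration. The scalar per-node feature, the threshold, and the function names are likewise hypothetical.

```python
from statistics import median


def reconstruct_from_neighbors(features, adjacency):
    """Estimate each node's feature as the median of its neighbors' features.

    This median rule is a stand-in for a learned GNN reconstruction.
    features:  dict mapping node id to a scalar feature value.
    adjacency: dict mapping node id to a list of neighboring node ids.
    """
    return {
        node: median(features[n] for n in neighbors)
        for node, neighbors in adjacency.items()
    }


def flag_suspect_nodes(features, adjacency, threshold):
    """Return sorted node ids whose reconstruction error exceeds `threshold`.

    In practice the threshold would be calibrated on attack-free data;
    here it is passed in as an assumed value.
    """
    recon = reconstruct_from_neighbors(features, adjacency)
    return sorted(
        node for node in adjacency
        if abs(features[node] - recon[node]) > threshold
    )
```

On a small fully connected graph with one tampered reading, only the tampered node is far from what its neighbors predict, so it alone is flagged.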
|
|
WeAT7 |
MR07 |
Online - AI Applications 5 |
|
Chair: Zou, Zhiyuan | Wuhan Textile University |
|
08:05-08:25, Paper WeAT7.1 | |
A Multi-Lead Electrocardiogram Signal Classification Method Based on Temporal and Multi-View Contrastive Learning |
|
Li, Luyao | Qilu University of Technology (Shandong Academy of Sciences) |
Liu, Hui | Qilu University of Technology (Shandong Academy of Sciences) |
Zhou, Shuwang | Shandong Artificial Intelligence Institute, Qilu University of T |
Liu, Zhaoyang | Shandong Artificial Intelligence Institute, Qilu University of T |
Shu, Minglei | Shandong Artificial Intelligence Institute, Qilu University of T |
|
|
08:25-08:45, Paper WeAT7.2 | |
XWCoDe: XGBoost with Weighted Code Dependency for Requirements-To-Code Traceability Link Recovery |
|
Zou, Zhiyuan | Wuhan Textile University |
Wang, Bangchao | School of Computer Science and Artificial Intelligence, Wuhan Te |
Deng, Yang | School of Computer Science and Artificial Intelligence, Wuhan Te |
Wan, Hongyan | School of Computer Science and Artificial Intelligence, Wuhan Te |
An, Zhiquan | Wuhan Textile University |
Cao, Yukun | Wuhan Textile University, School of Computer Science and Artific |
Keywords: Machine Learning
Abstract: Information Retrieval (IR), Machine Learning (ML), and Deep Learning (DL) have become mainstream methods for traceability link recovery. However, IR-based methods face the challenge of low precision, while DL-based methods require large-scale training data to achieve better performance. In this paper, we propose a novel model, XWCoDe, which applies XGBoost combined with a weighted code dependency strategy to the traceability link recovery domain. To refine the initial candidate links generated by the XGBoost model, the strategy modifies only low-confidence candidate links and pioneers the use of the graph embedding technique node2vec to calculate the importance of each code dependency relationship. The experimental results show that the average F1 score of XWCoDe on 4 datasets and 9 training/testing ratios is 12.93% higher than the state-of-the-art method DF4RT.
|
|
08:45-09:05, Paper WeAT7.3 | |
Robotic Crop Disease Monitoring Using Neural Network-Based Prediction and Weighted Path Planning |
|
Sutton, Jacob | University of North Florida |
Dutta, Ayan | University of North Florida |
Kreidl, O. Patrick | University of North Florida |
Boloni, Ladislau | University of Central Florida |
Roy, Swapnoneel | University of North Florida |
Keywords: Application of Artificial Intelligence, Deep Learning, Computational Intelligence
Abstract: Disease control is paramount in modern agriculture to ensure optimal yield. Monitoring the spread of crop diseases is crucial for effective control measures. Traditional methods involve uniform pesticide spraying across entire fields, which can be inefficient and environmentally harmful. In this paper, we propose an intelligent solution employing mobile robots equipped with predictive AI techniques for disease monitoring and targeted intervention. These robots strategically visit select locations within the field, guided by a convolutional and recurrent neural network model trained on limited data to predict disease spread. We introduce a novel weighted path planning algorithm to optimize robot movement within the field considering disease risk and battery constraints. Our approach is implemented in the WaterBerry benchmark, an open-source platform for agricultural robotics. Experimental results demonstrate the efficacy of our technique, showcasing improved prediction accuracy and operational efficiency compared to baseline methods.
|
|
09:05-09:25, Paper WeAT7.4 | |
Optimal Barcode Representation for NLP Embeddings |
|
Sinha, Soumen | Mahindra University |
Asilian Bidgoli, Azam | Wilfrid Laurier University |
Rahnamayan, Shahryar | Brock University |
Keywords: Deep Learning, Evolutionary Computation, Computational Intelligence in Information
Abstract: Binary representations of embeddings offer a promising alternative to real-valued features in terms of memory savings and faster operations for various machine learning models. In this paper, we explore a barcode representation for text embeddings derived from BERT, optimized using the Coordinate Search algorithm. These binary embeddings present a compact representation of text, thereby mitigating memory and computational demands, which is especially advantageous in resource-intensive large-scale text processing tasks. We introduce a novel optimal threshold technique, coupled with the Coordinate Search algorithm, to transform continuous BERT embeddings into binary barcodes, thereby enabling effective Natural Language Processing while sustaining computational efficiency. The optimal barcode representations have been applied in Natural Language Processing applications, showcasing their potential for text representation. Through an extensive series of experiments on various NLP tasks encompassing diverse datasets, we comprehensively evaluate our approach, comparing it against a spectrum of thresholding techniques. The binary embeddings achieved by optimal thresholds outperform traditional binarization methods in terms of accuracy. The proposed method for generating binary representations is versatile, being independent of the model, data, and task, making it applicable across various machine learning applications.
|
|
09:25-09:45, Paper WeAT7.5 | |
Disparity Map-Crack Detection: Combining Disparity Map Feature into Binary Segmentation for Accurate Crack Detection |
|
Liu, Yang | Qingdao University |
Yuan, Genji | Qingdao University |
Li, Jianbo | Qingdao University |
Keywords: Intelligent Transportation Systems
Abstract: To address the limitations of crack detection methods relying solely on RGB images, we propose an innovative approach that incorporates disparity maps as an additional data source. This integration with RGB images aims to enhance crack detection performance. However, the generation of disparity maps is susceptible to image noise and matching errors, leading to inaccurate or mismatched disparity values that may impede precise ground crack detection. To mitigate this challenge, we apply a disparity transformation technique to refine the estimated disparity map, improving the differentiation of crack regions. Additionally, we employ a feature fusion method based on a connected low-loss subspace. This approach adaptively assigns feature weights to facilitate the complementary fusion of disparity and RGB features. Furthermore, the decoder includes a multi-scale feature alignment module that uses the fused encoder features to align each layer in the decoding process. This preserves image details and local features, enhancing the overall detection accuracy. Extensive testing experiments demonstrate a significant breakthrough in crack detection performance, achieving an Intersection over Union (IoU) of 80.48%. Our approach sets the benchmark in crack detection, effectively leveraging multi-source information, mitigating disparity map noise, and enhancing feature fusion.
|
|
WeAT9 |
MR09 |
Deep Learning and Neural Networks 13 |
Regular Papers - Cybernetics |
Chair: Xin, Sida | Academy of Military Sciences |
|
08:05-08:25, Paper WeAT9.1 | |
Adaptive Graph Spatial Temporal Fourier-Enhanced Transformer Networks for Traffic Prediction |
|
Hu, Jun | Hunan University |
He, Xiaolong | Hunan University |
Keywords: Deep Learning, Neural Networks and their Applications, Machine Learning
Abstract: Traffic prediction is an important component of intelligent transportation systems, as it plays a key role in route planning and traffic management. However, traffic flow series present complex spatial-temporal correlations and nonlinear traffic patterns, which make accurate traffic prediction challenging. Current methods struggle to model the overall trend of traffic flow series and are unable to utilize dynamic information about spatial dependencies. In this paper, we propose adaptive graph spatial temporal Fourier-enhanced transformer networks (ASTFETN) to tackle the above traffic prediction problems. ASTFETN adopts an encoder-decoder architecture in which the encoder and decoder are both composed of multiple spatial-temporal blocks to capture dynamic spatial and nonlinear temporal correlations. Furthermore, a transformer attention layer captures the relationships between historical and future time steps. Experiments on two datasets, METR-LA and PEMS-BAY, demonstrate that ASTFETN outperforms the state-of-the-art baselines.
|
|
08:25-08:45, Paper WeAT9.2 | |
Multi-Objective Evolutionary Neural Architecture Search for Liquid State Machine |
|
Xin, Sida | Academy of Military Sciences |
Chen, Renzhi | Defense Innovation Institute |
Xiao, Xun | National University of Defense Technology |
Li, Yuan | National University of Defense Technology |
Wang, Lei | Defense Innovation Institute |
Keywords: Evolutionary Computation, Neural Networks and their Applications, Machine Learning
Abstract: Liquid State Machine (LSM) is a brain-inspired computational model that has proven highly effective in various applications, owing to its intrinsic capability to process spatiotemporal information and its minimal training complexity. However, the performance of LSMs significantly depends on the design of their network architecture, which is overly reliant on existing human experience. Furthermore, as the network scale increases, the computing resources required for deployment and operation also increase, so we regarded the network design as a multi-objective problem. To address these challenges, we introduced an effective surrogate-assisted multi-objective evolutionary neural architecture search algorithm that balanced the accuracy and network scale. Our approach utilized parameter sensitivity analysis followed by the upper confidence bound algorithm to reduce the search space. Experimental results demonstrate that we successfully reduced the dimensions of the search space by 11% and the size of the entire search space by 75%. Compared to the state-of-the-art, our approach offered better trade-off solutions, such as a solution that reduced network scale by 32.5% while maintaining the same accuracy, and another that improved accuracy by 1.4% without changing the network scale. Furthermore, the knee point reduced network scale by 25% and simultaneously increased accuracy by 0.7%. The source code can be accessed at https://github.com/XinSida/MOENAS-PSA.
|
|
08:45-09:05, Paper WeAT9.3 | |
Semantic Consistency Based Dual-Asymmetric Discrete Online Hashing for Multi-View Streaming Data Retrieval |
|
Jing, Chen | Beijing University of Posts and Telecommunications |
Zu, Yunxiao | Beijing University of Posts and Telecommunications |
Hou, Bin | Beijing University of Posts and Telecommunications |
Sang, Xinzhu | Beijing University of Posts and Telecommunications |
Liu, Meiru | Beijing University of Posts and Telecommunications |
Keywords: Big Data Computing, Machine Learning, Multimedia Computation
Abstract: Multi-view online hashing has received much attention due to its huge potential in the area of large-scale multimedia retrieval. However, some issues remain, e.g., how to alleviate catastrophic forgetting, how to adequately extract high-level semantic information from multi-view streaming data and improve the discrimination of hash models, and how to effectively optimize the binary constraint problem. In this paper, we propose a novel Semantic Consistency based Dual-asymmetric Discrete Online Hashing method, SC-DDOH for short. It adopts a dual-asymmetric distance-based similarity supervision to retain the similarities between the new data chunk and the database. To extract efficient high-level semantic information, an online semantic consistent supervision is used to mine semantically related information from word embedding labels. Moreover, an efficient discrete iterative optimization algorithm is introduced to directly learn hash codes in the Hamming space. Experimental results on three large-scale multi-view datasets demonstrate the superiority of SC-DDOH over the state-of-the-art baselines.
|
|
WeAT10 |
MR10 |
Image Processing and Pattern Recognition 4 |
Regular Papers - Cybernetics |
Chair: Yuzhe, Wang | Dalian Minzu University |
|
08:05-08:25, Paper WeAT10.1 | |
EMPA-YOLO: A Lightweight Real-Time Weed Detection Method Suitable for Natural |
|
Yuzhe, Wang | Dalian Minzu University |
Jiahao, Chen | Dalian Minzu University |
Xiaodong, Duan | Dalian Minzu University |
Li, Zhuohui | Dalian Minzu University |
Keywords: Application of Artificial Intelligence, Image Processing and Pattern Recognition, Machine Vision
Abstract: Weed detection is crucial for the healthy growth of crops, yet existing detection models struggle to perform high-accuracy real-time detection of weeds in natural environments on edge computing devices. This paper introduces EMPA-YOLO, a model designed for rapid, accurate, real-time weed detection on low-performance edge computing devices. It incorporates an efficient multi-scale convolutional structure, C3EMSC, and a lightweight, adaptive weight subsampling layer, LAWDS, into YOLOv5s. Additionally, a logical distillation algorithm, AlignSoftTarget, is proposed for knowledge distillation. Validation on a mixed dataset of crops and weeds showed that EMPA-YOLO improved mAP50 by 11.2%, reduced parameter count by 4.8M, decreased computational load by 8.7GFLOPs, and increased inference frame rate by 58% compared to the original YOLOv5s algorithm. When compared to YOLOv3, YOLOv5s, YOLOv6, YOLOv8s, and RT-DETR, inference speed improved by 89.2%, 60.8%, 77.5%, 76.5%, and 94.7%, respectively, with mean accuracy enhancements of 7.3%, 11.2%, 16.7%, 0.9%, and 1.2%. Real-world testing on edge computing devices met real-time detection requirements, proving its efficacy and practicality in weed detection. Keywords—lightweight model; YOLO; model compression; knowledge distillation; edge computing
|
|
08:25-08:45, Paper WeAT10.2 | |
A Multi-Stream Structure-Enhanced Network for Mesh Denoising |
|
Yang, Yutian | South China University of Technology |
Liang, Lingyu | South China University of Technology |
Yan, Jie | South China University of Technology |
Xu, Yong | South China University of Technology |
Keywords: Neural Networks and their Applications, Image Processing and Pattern Recognition, Application of Artificial Intelligence
Abstract: Triangular meshes provide an efficient representation of 3D shapes. Various applications such as 3D simulation suffer from degradation in geometric quality. This paper proposes a novel Multi-stream Structure-Enhanced Network (MSE-Net) based on graph convolutional networks. The network uses multi-scale features in addition to vertex positions to guide face normal filtering, which better preserves geometric features during the denoising process. In contrast to former methods that filter vertex coordinates and face normals separately, MSE-Net fuses additional structural features, such as face area, the inner product between the face normal and vertex normals, and the interior angles of faces, to guide the updating of face normals and vertex positions, utilizing the inherent structural characteristics of meshes. Our method achieves state-of-the-art performance on several publicly available datasets, demonstrating its effectiveness.
|
|
08:45-09:05, Paper WeAT10.3 | |
Object Detection Approaches to Identifying Hand Images with High Forensic Values |
|
Nguyen, Thanh Thi | Monash University |
Wilson, Campbell | Monash University |
Khan, Imad | Monash University |
Dalins, Janis | Australian Federal Police |
Keywords: AI and Applications, Image Processing and Pattern Recognition, Neural Networks and their Applications
Abstract: Forensic science plays a crucial role in legal investigations, and the use of advanced technologies, such as object detection based on machine learning methods, can enhance the efficiency and accuracy of forensic analysis. Human hands are unique and can leave distinct patterns, marks, or prints that can be utilized for forensic examinations. This paper compares various machine learning approaches to hand detection and presents the application results of employing the best-performing model to identify images of significant importance in forensic contexts. We fine-tune YOLOv8 and vision transformer-based object detection models on four hand image datasets, including the 11k hands dataset with our own bounding boxes annotated by a semi-automatic approach. Two YOLOv8 variants, i.e., YOLOv8 nano (YOLOv8n) and YOLOv8 extra-large (YOLOv8x), and two vision transformer variants, i.e., DEtection TRansformer (DETR) and Detection Transformers with Assignment (DETA), are employed for the experiments. Experimental results demonstrate that the YOLOv8 models outperform DETR and DETA on all datasets. The experiments also show that YOLOv8 approaches result in superior performance compared with existing hand detection methods, which were based on YOLOv3 and YOLOv4 models. Applications of our fine-tuned YOLOv8 models for identifying hand images (or frames in a video) with high forensic values produce excellent results, significantly reducing the time required by forensic experts. This implies that our approaches can be implemented effectively for real-world applications in forensics or related fields.
|
|
WeAT11 |
MR11 |
Image Processing and Pattern Recognition 7 |
|
Chair: Komiya, Daiki | Kanagawa University |
|
08:05-08:25, Paper WeAT11.1 | |
Low-Rank Tensor-Based Two-Dimensional Projection Learning for Feature Extraction |
|
Liang, Xiaojia | South China Normal University |
Wu, Yue | South China Normal University |
Xiao, Xiaolin | South China Normal University |
Keywords: Machine Learning, Image Processing and Pattern Recognition
Abstract: Recently, Low-Rank Matrix (LRM)-based feature extraction methods have drawn increasing attention since they can extract robust features when the data are corrupted. However, these algorithms require a matrix-to-vector transformation to tackle Two-Dimensional (2D) images, through which the spatial structure residing in 2D images is ignored. To solve this problem, we propose a Low-Rank Tensor-based 2D Projection learning model (LRT-2DP) to extract features directly from 2D images as well as to reduce dimensionality. In essence, LRT-2DP embraces the global self-expressiveness property to denoise the corrupted data, from which a 2D projection basis is learned for robust feature extraction. The proposed LRT-2DP can be efficiently optimized with an alternative optimization scheme. Extensive experiments on image feature extraction have demonstrated the superiority of LRT-2DP compared to state-of-the-art methods.
|
|
08:25-08:45, Paper WeAT11.2 | |
EE-MVSNet: Deep Learning-Based Cascaded High-Precision Multi-View Stereo Network with ECA and EVC (I) |
|
Zhang, Ziyi | Zhejiang University of Technology |
Kong, Changfei | Zhejiang University of Technology |
Mao, Jiafa | Zhejiang University of Technology |
Cheng, Xu | Norwegian University of Science and Technology |
Chan, Sixian | Zhejiang University of Technology |
Keywords: Deep Learning, Image Processing and Pattern Recognition, Multimedia Computation
Abstract: Multi-view stereo (MVS) has emerged as a pivotal algorithm in 3D reconstruction, garnering significant research attention over the past several decades. While recent coarse-to-fine methods have demonstrated promising results in enhancing the reconstruction quality of traditional algorithms, they often neglect the crucial aspect of feature layer refinement. Additionally, these methods face the challenge of low-cost feature matching. To address these limitations, we propose a novel learning-based MVS framework (EE-MVSNet). First, we incorporate an explicit visual center (EVC) module within the feature pyramid network (FPN), strengthening the adjustment within feature layers and improving model accuracy. Furthermore, we introduce the ECA+3DCNN module, which utilizes channel attention to alleviate the problem of low-cost feature matching. Finally, extensive experiments on the DTU dataset show that our model achieves competitive performance and high-quality 3D reconstruction.
|
|
08:45-09:05, Paper WeAT11.3 | |
A Classification Method for “kawaii” Images Using Semantic Interpretation (I) |
|
Komiya, Daiki | Kanagawa University |
Akiyoshi, Masanori | Kanagawa University |
Keywords: Representation Learning, Image Processing and Pattern Recognition, Application of Artificial Intelligence
Abstract: This paper proposes a classification method for five types of images represented by the word “kawaii”. “Kawaii images” do not have a fixed concept or object, which makes it difficult to classify them simply using conventional methods such as Convolutional Neural Networks or Support Vector Machines. Our previous study extracted color and shape features from such images, achieving a classification accuracy of 70.2%. However, this approach did not handle the semantic content of the images. In this study, a classification method based on the constituent elements of “kawaii images” is used, resulting in a classification accuracy of 71.4%. Additionally, the experiment suggests that the method may reflect similarities with human recognition to some extent.
|
|
WeAT12 |
MR12 |
Affective and Cognitive Computing |
Regular Papers - HMS |
Chair: Liu, Muxuan | Ochanomizu University |
|
08:05-08:25, Paper WeAT12.1 | |
Dual-Domain Attention Based Adaptive Graph Convolutional Network for EEG Emotion Recognition |
|
Xu, Tie | South China University of Technology |
Zhang, Tong | South China University of Technology |
Chen, Bianna | South China University of Technology |
Chen, C. L. Philip | University of Macau |
Keywords: Affective Computing, Brain-Computer Interfaces
Abstract: The asymmetry of emotional responses is observed in electroencephalogram (EEG) of different frequency bands across various spatial brain regions in neuroscience research. Many prior works have primarily emphasized the dependencies among channels in the spatial domain, neglecting the dynamic interaction of EEG in both spatial and frequency domains, which may limit the performance of EEG emotion recognition. To address these issues, we propose the dual-domain attention based adaptive graph convolutional network (DDA-AGCN) for EEG emotion recognition. Specifically, we propose the lightweight dual-domain attention mechanism (DDA) based on random vector similarity measurement and the squeeze-excitation technique to capture important characteristics in the channel and frequency domain respectively. Furthermore, the adaptive graph convolutional network (AGCN) is utilized to adaptively filter and refine low signal-to-noise ratio EEG data, while also learning the dynamic connectivity patterns among important EEG channels and extracting higher-level abstract features for emotion recognition tasks. To validate the effectiveness of the proposed method, experimental comparisons were conducted on SEED, SEED-IV, and MPED. The experimental results show that our method achieves highly competitive classification performance compared to existing methods. Moreover, under fair comparison, the DDA demonstrates better performance and computational efficiency than self-attention.
|
|
08:25-08:45, Paper WeAT12.2 | |
Transfer Learning for Emotion Recognition across Depression Patients and Healthy Subjects with Data Alignment and Selection |
|
Jiang, Chao | Shanghai University |
Dai, Yingying | Shanghai University of Electric Power |
Chen, Xi | The School of Communication and Information Engineering, Shangha |
Tang, Yingying | Shanghai Mental Health Center, Shanghai Jiao Tong University Sch |
Li, Yingjie | Shanghai University |
Keywords: Affective Computing, Brain-Computer Interfaces, Cognitive Computing
Abstract: Identifying and understanding emotions is crucial, particularly when considering a range of subjects. This study introduces a novel approach to emotion recognition between depression patients and healthy subjects, combining data alignment (DA) and subject selection (SS) within the transfer learning framework. Three methodologies for DA are explored: Riemannian alignment (RA), Euclidean alignment (EA), and correlation alignment (CORAL). The Jensen-Shannon (JS) divergence is employed to gauge the similarity between target and source subjects in order to select potential training datasets. Cross-subject experiments within depression and healthy cohorts, conducted with diverse models, show notable improvements in recognition performance from this combined DA and SS transfer learning paradigm. Moreover, the study demonstrates that, despite differences in emotion recognition between individuals with depression and those without the disorder, skillful design enables the use of data from healthy individuals and trained algorithms to greatly enhance emotion recognition in depression patients.
|
|
08:45-09:05, Paper WeAT12.3 | |
Do Feature Representations from Different Language Models Affect Accuracy of Brain Encoding Models' Predictions? |
|
Liu, Muxuan | Ochanomizu University |
Kobayashi, Ichiro | Ochanomizu University |
Keywords: Brain-based Information Communications, Cognitive Computing, Human Perception in Multimedia
Abstract: We investigate the impact of feature representations derived from different language models on brain encoding models, which are designed to predict brain states from linguistic stimuli. This study aims to determine whether the variances in the feature representations of language models, originating from their distinct encoder/decoder architectures, training data quality and quantity, and parameter sizes, affect their predictive accuracy on brain states. By examining how these feature representations influence brain encoding models, we identify specific brain regions where the predictability of brain activity is consistently influenced across various models, thereby uncovering similarities in their predictive effectiveness.
|
|
WeBT1 |
MR01 |
Cybernetics and Quantum Systems 4 |
|
Chair: Shen, Yixiang | Shanghai University |
|
11:00-11:20, Paper WeBT1.1 | |
Universum Based Class-Specific Self-Set Broad Learning System for Software Defect Prediction |
|
Tang, LeQi | South China University of Technology |
Huang, Sen | South China University of Technology |
Chen, Wuxing | South China University of Technology |
Bi, Jichao | Zhejiang University |
Zhou, Shan | Technology and Engineering Center for Space Utilization, Chinese |
Yang, Kaixiang | South China University of Technology |
Keywords: Machine Learning, Artificial Social Intelligence, Hybrid Models of Neural Networks, Fuzzy Systems, and Evolutionary Computing
Abstract: In today's rapidly evolving field of information technology, software plays a key role in a wide variety of applications, thus increasing the need for accurate and fast prediction of software defects. Despite the growing interest in software defect prediction, the prevalent prediction methods have paid little attention to the class imbalance embedded in the data. To address this problem, we introduce the Universum-based Class-Specific Self-Set Broad Learning System (UCSSBLS). UCSSBLS synthesises the prior information in the classification model by computing the mean of two samples from different classes. At the same time, the universum-based BLS tries to use this information to create a third plane between the two planes of symmetry. Based on the a priori information in the training data and the different class distributions, we adaptively modify the penalty parameters to fit the imbalanced class distributions. Simulation experiments demonstrate that our proposed method is effective for software defect prediction with imbalanced distributions.
|
|
11:20-11:40, Paper WeBT1.2 | |
S-ADMM: Optimizing Machine Learning on Imbalanced Datasets Based on ADMM |
|
Shen, Yixiang | Shanghai University |
Lei, Yongmei | Shanghai University |
Qiu, Qinnan | Shanghai University |
Keywords: Machine Learning
Abstract: Nowadays, addressing large-scale machine learning problems using high-performance computing (HPC) clusters has gained significant importance. The alternating direction method of multipliers (ADMM) is widely used in machine learning for solving optimization problems on clusters. However, ADMM's performance is greatly affected by imbalanced datasets on the HPC cluster. In this paper, we propose the distributed shunt ADMM with a new adaptive penalty method based on the hybrid MPI/OpenMP programming model (S-ADMM). The proposed shunt strategy chooses different sub-problem optimization algorithms to improve the accuracy with the imbalanced datasets. Additionally, we design a novel adaptive penalty parameter method and improve a sub-problem optimization algorithm for S-ADMM. The adaptive penalty parameter method accelerates the algorithm's convergence and the sub-problem optimization algorithm improves the training efficiency of ADMM. Moreover, S-ADMM reduces communication cost by exchanging parameters among nodes using the MPI and saves calculation time by parallel computation within nodes via OpenMP threads. For the SVM classification problem, experiments conducted on the Tianhe-2 supercomputing platform show that S-ADMM has competitive running efficiency and up to 43% accuracy improvement compared to existing distributed ADMM implemented with pure MPI or MPI/OpenMP on imbalanced datasets.
|
|
11:40-12:00, Paper WeBT1.3 | |
A Boosting Framework for Financial Distress Prediction Based on Imbalanced Data |
|
Dan, Zhao | Zhejiang University |
Keywords: Machine Learning, Soft Computing, Socio-Economic Cybernetics, Application of Artificial Intelligence
Abstract: This study introduces a boosting framework for financial distress prediction, specifically designed for imbalanced data, and incorporates robust business logic to enhance interpretability. The framework employs a clustering algorithm to group data samples based on corporate governance features and then determines the optimal number of clusters using a unique validation measure. An oversampling method is applied post-clustering, followed by a base prediction algorithm to predict financial distress for each cluster. The empirical analysis is conducted using imbalanced sample data from Chinese listed companies, with feature data at time t − m (where m = 1, 2, 3) and the Special Treatment (ST) status at time t used to train the model. The aim is to predict the occurrence of financial distress m years into the future. The results demonstrate that the proposed boosting framework outperforms the base model in terms of prediction accuracy on imbalanced data.
|
|
12:00-12:20, Paper WeBT1.4 | |
Optimising Horizons in Model Predictive Control for Motion Cueing Algorithms Using Reinforcement Learning |
|
Al-serri, Sari | Deakin University |
Qazani, Mohammad Reza Chalak | Deakin University |
Mohamed, Shady | Senior Research Fellow, Deakin University |
Arogbonlo, Adetokunbo | Deakin University |
Al-ashmori, Mohammed | Deakin University |
Lim, Chee Peng | Deakin University |
Nahavandi, Saeid | Swinburne University of Technology |
Asadi, Houshyar | Deakin University |
Keywords: Machine Learning, Metaheuristic Algorithms
Abstract: This paper explores the application of driving simulators across multiple sectors, highlighting the challenges associated with refining motion cueing algorithms (MCA) through model predictive control (MPC). Through these platforms, drivers can simulate the sensation of motion. The implementation of MPC-based MCA, while advantageous for its precision in controlling motion simulations, encounters significant hurdles such as the requirement for highly accurate system models and the extensive parameter tuning needed for each specific control scenario. These issues create a critical gap in achieving optimal simulation fidelity and efficiency with lower computational time, necessitating a novel approach to improve the MCA domain. Addressing these challenges, the study pioneers the use of Deep Q-Network (DQN), a reinforcement learning (RL) technique, to optimise the horizons of MPC within the MCA domain. This innovation is significant as it introduces, for the first time, a method to dynamically adjust MPC-based MCA horizons using DQN, which learns through continuous interaction with the simulation environment. This approach is set to overcome the limitations of traditional meta-heuristic optimisation methods, such as the Grasshopper Optimisation Algorithms (GOA) and Butterfly Optimisation Algorithms (BOA), by offering a more flexible and adaptable solution. The overarching goal of this research is to minimise the system's cost function by maximising a reward function that encompasses key performance metrics such as specific force sensation, angular velocity, linear displacement, linear velocity, and angular displacement. By integrating DQN into the MPC-based MCA environment, this study demonstrates a faster computational running time and improves the precision and efficiency of the simulations. This innovative approach enhances the efficiency of the horizon determination process, showcasing promising implications for the MCA domain's advancement.
|
|
12:20-12:40, Paper WeBT1.5 | |
Hybrid Quantum-Inspired Evolutionary Neural Networks for Intrusion Detection System |
|
Kuo, Shu-Yu | National Taiwan University |
Shen, Jyun-Yi | National Chi Nan University |
Liu, Chia-Lin | National Chi Nan University |
Chou, Yao-Hsin | National Chi Nan University |
Keywords: Quantum Cybernetics, Hybrid Models of Neural Networks, Fuzzy Systems, and Evolutionary Computing, Hybrid Models of Computational Intelligence
Abstract: Quantum-inspired evolutionary algorithms harness quantum properties to optimize the search process within classical computers, efficiently addressing complex and challenging problems. This study first proposes an intrusion detection system (IDS) based on a hybrid model using quantum-inspired evolutionary neural networks. The model integrates a deep neural network (DNN) and a global best-guided quantum-inspired tabu search algorithm (GQTS). To safeguard against potential threats, an IDS is deployed to monitor network or system traffic and detect malicious attacks. Anomaly detection, a pivotal aspect of IDS, aims to establish a normal model to respond effectively to unknown abnormal attacks. The experiment utilizes the latest dataset, CICIDS2017, which is generated based on realistic background traffic. During the training phase, GQTS selects valid features from the dataset and optimizes the hyperparameters of the DNN setting automatically, significantly contributing to improving accuracy and reducing the false negative rate. The results highlight that the proposed hybrid model decreases computational complexity through feature selection and enhances model accuracy via suitable hyperparameter optimization compared to other state-of-the-art methods. The proposed model demonstrates great potential over alternative structures.
|
|
12:40-13:00, Paper WeBT1.6 | |
Fine-Grained Speech Sentiment Analysis in Chinese Psychological Support Hotlines Based on Large-Scale Pre-Trained Model (I) |
|
Chen, Zhonglong | Beijing University of Technology |
Changwei, Song | Beijing University of Technology |
Chen, Yining | Beijing University of Technology |
Li, Jianqiang | Beijing University of Technology |
Fu, Guanghui | Sorbonne University |
Tong, Yongsheng | Peking University Huilongguan Clinical Medical School |
Zhao, Qing | Beijing University of Technology |
Keywords: Deep Learning, Machine Learning
Abstract: Suicide and suicidal behaviors remain significant challenges for public policy and healthcare. In response, psychological support hotlines have been established worldwide to provide immediate help to individuals in mental crises. The effectiveness of these hotlines largely depends on accurately identifying callers' emotional states, particularly underlying negative emotions indicative of increased suicide risk. However, the high demand for psychological interventions often results in a shortage of professional operators, highlighting the need for an effective speech emotion recognition model. This model would automatically detect and analyze callers' emotions, facilitating integration into hotline services. Additionally, it would enable large-scale data analysis of psychological support hotline interactions to explore psychological phenomena and behaviors across populations. Our study utilizes data from the Beijing psychological support hotline, the largest suicide hotline in China. We analyzed speech data from 105 callers containing 20,630 segments and categorized them into 11 types of negative emotions. We developed a negative emotion recognition model and a fine-grained multi-label classification model using a large-scale pre-trained model. Our experiments indicate that the negative emotion recognition model achieves a maximum F1-score of 76.96%. However, it shows limited efficacy in the fine-grained multi-label classification task, with the best model achieving only a 41.74% weighted F1-score. We conducted an error analysis for this task, discussed potential future improvements, and considered the clinical application possibilities of our study. All the code is publicly available at: https://github.com/czl0914/psy_hotline_analysis.
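For readers unfamiliar with the weighted F1-score reported above, the following is a minimal sketch of how that metric is commonly computed for multi-label data: per-label F1 values are averaged with weights proportional to each label's support. It is not taken from the authors' repository; the function names, the three hypothetical labels, and the toy data are assumptions made purely for illustration.

```python
def f1_from_counts(tp, fp, fn):
    """F1 for one label from true-positive, false-positive and false-negative counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def weighted_f1(y_true, y_pred, num_labels):
    """Support-weighted average of per-label F1 for multi-label data.

    y_true and y_pred are lists of sets of label indices, one set per segment.
    """
    total_support = 0
    weighted_sum = 0.0
    for label in range(num_labels):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if label in t and label in p)
        fp = sum(1 for t, p in pairs if label not in t and label in p)
        fn = sum(1 for t, p in pairs if label in t and label not in p)
        support = tp + fn  # number of segments that truly carry this label
        weighted_sum += support * f1_from_counts(tp, fp, fn)
        total_support += support
    return weighted_sum / total_support if total_support else 0.0


# Toy data (made up): three segments, three hypothetical emotion labels.
y_true = [{0, 1}, {1}, {2}]
y_pred = [{0}, {1, 2}, {2}]
score = weighted_f1(y_true, y_pred, num_labels=3)
print(f"weighted F1 = {score:.4f}")
```

Because the weights follow label support, frequent emotions dominate the score, which is one reason a weighted F1 can differ noticeably from a macro-averaged one on imbalanced data.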
|
|
WeBT2 |
MR02 |
Deep Learning and Neural Networks - 14 |
|
Chair: Yi, Chaoxiong | National University of Defense Technology |
|
11:00-11:20, Paper WeBT2.1 | |
HMO: Host Memory Optimization for Model Inference Acceleration on Edge Devices |
|
Yi, Chaoxiong | National University of Defense Technology |
Jian, Songlei | National University of Defense Technology |
Tan, Yusong | National University of Defense Technology |
Zhang, Yusen | National University of Defense Technology |
Keywords: Machine Learning, Deep Learning, AI and Applications
Abstract: Deep learning (DL) is characterized by its demanding computational and memory requirements, which creates a significant challenge when deploying on edge devices. These devices often have limited computational capabilities and constrained resources. Most existing methods primarily focus on model-level techniques, such as model pruning or parameter quantization, to reduce model size and computation for accelerating inference. Considering the prevalent programming paradigms in DL, we propose a host memory optimization method, namely HMO, which can be integrated into a DL programming framework, e.g., PyTorch, to improve the inference efficiency of DL models without modifying any model code. We particularly focus on memory optimization for intermediate variables in inference, aiming to enhance inference speed while maintaining a lower memory footprint. HMO involves a single profiling of inference to gather memory statistics about intermediate variables. These statistics are then used to guide subsequent inference. Additionally, we incorporate huge pages in operating systems to improve the memory access performance of HMO. Our experimental results show that HMO can achieve an average inference latency optimization ratio of 20.13% compared with native PyTorch on six typical DL image representation models while effectively managing memory usage. Importantly, this is achieved without compromising model accuracy.
|
|
11:20-11:40, Paper WeBT2.2 | |
Contrastive Learning-Based User Identification with Limited Data on Smart Textiles |
|
Zhang, Yunkang | University of Science and Technology of China |
Wu, Ziyu | University of Science and Technology of China |
Liang, Zhen | University of Science and Technology of China |
Xie, Fangting | University of Science and Technology of China |
Wan, Quan | University of Science and Technology of China |
Zhao, Mingjie | University of Science and Technology of China |
Cai, Xiaohui | University of Science and Technology of China |
Keywords: Deep Learning, Transfer Learning, Neural Networks and their Applications
Abstract: Pressure-sensitive smart textiles are widely applied in the fields of healthcare, sports monitoring, and intelligent homes. The integration of devices embedded with pressure sensing arrays is expected to enable comprehensive scene coverage and multi-device integration for smart home environments. However, the implementation of identity recognition, a fundamental function in this context, relies on extensive device-specific datasets due to variations in pressure distribution across different devices. To address this challenge, we propose a novel user identification method based on contrastive learning. We design two parallel branches to facilitate user identification on both new and existing devices respectively, employing supervised contrastive learning in the feature space to promote domain unification. When encountering new devices, extensive data collection efforts are not required; instead, user identification can be achieved using limited data consisting of only a few simple postures. Through experimentation with two 8-subject pressure datasets (BedPressure and ChrPressure), our proposed method demonstrates the capability to achieve user identification across 12 sitting scenarios using only a dataset containing 2 postures. Our average recognition accuracy reaches 79.05%, representing an improvement of 2.62% over the best baseline model.
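As an illustration of the supervised contrastive learning mentioned in the abstract, the sketch below implements a widely used form of the supervised contrastive loss in plain Python: embeddings that share a user label are pulled together, and all other embeddings are pushed apart. The abstract does not give the authors' exact loss, so the formulation, the temperature value, and the toy 2-D embeddings are all assumptions.

```python
import math


def normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def supcon_loss(features, labels, temperature=0.1):
    """Mean supervised contrastive loss over anchors that have at least one positive."""
    z = [normalize(f) for f in features]
    n = len(z)
    losses = []
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without a same-label partner contributes nothing
        sims = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n) if j != i
        }
        # Log-sum-exp over all other samples, shifted by the max for stability.
        max_sim = max(sims.values())
        log_denom = max_sim + math.log(sum(math.exp(s - max_sim) for s in sims.values()))
        losses.append(-sum(sims[p] - log_denom for p in positives) / len(positives))
    return sum(losses) / len(losses) if losses else 0.0


# Hypothetical 2-D embeddings of four pressure samples from two users.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_matching = supcon_loss(embeddings, labels=[0, 0, 1, 1])    # labels agree with clusters
loss_mismatched = supcon_loss(embeddings, labels=[0, 1, 0, 1])  # labels cut across clusters
print(loss_matching, loss_mismatched)
```

The loss is small when same-user samples already sit close together in the feature space and large when they do not, which is the property a shared feature space across devices would rely on.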
|
|
11:40-12:00, Paper WeBT2.3 | |
A Joint Multi-Dimensional Fine-Grained Pruning Method for Deep Neural Network |
|
Chen, Cong | Nanjing University of Aeronautics and Astronautics |
Zhang, Tong | Nanjing University of Aeronautics and Astronautics |
Zhu, Kun | Nanjing University of Aeronautics and Astronautics |
Keywords: Deep Learning, Machine Learning, Neural Networks and their Applications
Abstract: Existing deep neural network (DNN) pruning methods can be classified into two main categories: structured pruning and weight pruning. Structured pruning is a representative model compression technology of DNN to reduce the storage and computation requirements and accelerate inference, which mainly includes filter pruning and channel pruning. However, they both belong to coarse-grained methods, which can only decide whether to prune a whole filter or channel or not and provide limited decision space. On the other hand, structured stripe-wise pruning has finer granularity than filter pruning, and shape-wise pruning also has finer granularity than channel pruning. These two fine-grained methods are related to two dimensions: rows and columns from the general matrix multiplication (GEMM) perspective of convolution operations. Considering that combining pruning decisions in finer granularity from multiple dimensions will produce a larger solution space, in this paper we propose a joint multi-dimensional fine-grained pruning scheme (JFP) for DNN compression, which simultaneously prunes elements in filters and channels. Extensive experiments on the CIFAR-10 dataset demonstrate that: (1) JFP achieves more stable pruning ratios than stripe-wise pruning; and (2) JFP effectively compresses DNN parameters and reduces the amount of computation while maintaining accuracy compared with its counterparts.
|
|
12:00-12:20, Paper WeBT2.4 | |
PSA-Swin Transformer: Image Classification on Small-Scale Datasets |
|
Shao, Chao | Xinjiang University |
Jiang, Shaochen | Xinjiang University |
Li, Yongming | Xinjiang University |
Keywords: Deep Learning, AI and Applications, Image Processing and Pattern Recognition
Abstract: This paper introduces PSA-Swin Transformer, a novel framework for image classification on small-scale datasets, highlighting the challenges of training effective models in resource-constrained environments. Recognizing the limitations of current deep learning methods that rely heavily on large-scale datasets and extensive pre-training, we propose an approach to handle small datasets. Our model is able to effectively handle smaller data volumes without pre-training weights. The key to our approach is the introduction of an Efficient Positional Embedding (EPE) module, which improves parameter utilization and network expressiveness through a grouped convolutional architecture and shuffling operations for dynamic information exchange. In addition, we integrate the Polarized Self-Attention (PSA) module, which addresses the complexity of learning element-specific attention by combining polarized filtering with augmentation techniques. Through a series of experiments on the Mini-Imagenet dataset, PSA-Swin Transformer demonstrates decent performance, especially in environments where high-quality annotated data is scarce or costly to acquire. Our results are expected to lead to advances in areas where efficient and accurate image classification using limited resources is required.
|
|
12:20-12:40, Paper WeBT2.5 | |
Deep Reinforcement Learning-Based Strategies for Truck Platooning at Highway On-Ramps |
|
Wang, An | Shandong University of Science and Technology |
Qi, Liang | Shandong University of Science and Technology |
Luan, Wenjing | Shandong University of Science and Technology |
Liu, Kun | Shandong University of Science and Technology |
Guo, Xiwang | Liaoning Petrochemical University |
Keywords: Deep Learning, Application of Artificial Intelligence, Machine Learning
Abstract: The development of Connected and Automated Trucks (CATs) provides a new opportunity for the freight industry to enhance fuel efficiency, increase traffic flow, and improve safety through platooning. Particularly at highway on-ramps, how to effectively form CAT platoons is a key research topic. In the process of CAT platooning, the timing, location, and speed of CAT merging significantly impact safety and energy consumption. Thus, this study proposes a hierarchical merging strategy, aimed at achieving effective autonomous CAT platooning at highway on-ramps by considering the interference of human-driven vehicles (HDVs). Specifically, we employ a model-free deep reinforcement learning method that guides the CAT merging process by exploring optimal driving behaviors. It ensures the safety and efficiency of the CAT merging process. In addition, we use a real vehicle dynamics model in simulation. The proposed strategy can handle the variation of the CATs’ initial positions and speeds at on-ramps, as well as disturbances caused by HDVs on the highway mainline. The effectiveness of the proposed strategy has been validated through simulations. The results show that the proposed strategy can effectively coordinate CAT platooning at highway on-ramps.
|
|
12:40-13:00, Paper WeBT2.6 | |
Modeling Hydrodynamic Diffusion Processes Using Spatio-Temporal Deep Neural Networks with Environmental Physical-Coupled Constraints (I) |
|
Jia, Lei | University of Aizu |
Yen, Neil | University of Aizu |
Pei, Yan | University of Aizu |
Keywords: Deep Learning
Abstract: The simulation and analysis of complex spatiotemporal systems are crucial for expressing and solving chaotic dynamical systems such as those in Earth and environmental sciences. Understanding and computing physical processes, reactions, or substance transport typically relies on control equations. This paper aims to explore a novel research paradigm by enhancing the physical network coupling structure to construct predictive models for fluid dynamics systems, simulating spatiotemporal dynamical processes of substance transformation in the domain of environmental physics. In particular, when addressing problems involving the non-homogeneous 2D fluid dynamics equations, the characteristic parameters of the physical processes were redefined. This was achieved by encoding hard boundary conditions and designing appropriate neural network architectures to mitigate over-fitting issues during the prediction of parameterized dynamical systems. Comparative experiments involving five benchmark physics-informed neural network methods emphasize the significant improvement in capturing time-varying features and prediction accuracy brought by the proposed approach. Through various water cycle scenarios, the model’s estimation ability for diffusion fields is validated, focusing on analyzing the influence of data errors and sample size on the computational results of this deep neural network. Notably, the proposed method exhibits higher robustness to outlier observations under extreme conditions.
|
|
WeBT3 |
MR03 |
Cognitive and Affective Computing |
|
Chair: Gu, Jiaqi | GuangXi University for Nationalities |
|
11:00-11:20, Paper WeBT3.1 | |
Precise Knowledge Enhancement Via CBR Framework for Empathetic Dialogue Generation |
|
Gu, Ziyin | Chinese Academy of Sciences |
Zhu, Qingmeng | Science & Technology on Integrated Information System Laboratory |
He, Hao | Chinese Academy of Sciences |
Yu, ZhiPeng | ISCAS |
Keywords: Affective Computing, Human-Machine Interaction
Abstract: Empathetic dialogue systems are designed to capture emotions in conversations and provide appropriate emotional responses. Previous research has indicated that integrating specific knowledge into empathetic dialogue systems can enhance the overall effectiveness of generating empathetic responses. Nevertheless, existing methods for knowledge-enhanced empathetic dialogue generation lack a focus on the precise selection of knowledge enhancement configurations for this specific task. To address this, we propose a Case-Based Reasoning (CBR) framework called CBR-KNOWLEDGE for autonomously selecting precise knowledge enhancement configurations tailored to specific empathetic dialogue contexts. Firstly, CBR-KNOWLEDGE establishes a case base that mirrors the overall quality of empathetic dialogues generated under various knowledge enhancement configurations. Subsequently, CBR-KNOWLEDGE employs an innovative text representation method, integrating an additional representation for words with noteworthy emotional impact. This approach facilitates the retrieval of analogous empathetic dialogues, enabling the reuse of their knowledge enhancement configurations to determine a new knowledge enhancement configuration. Ultimately, CBR-KNOWLEDGE employs this precise knowledge enhancement configuration for the purpose of empathetic dialogue generation. Experimental results demonstrate that CBR-KNOWLEDGE effectively enhances the performance of the empathetic dialogue generation task.
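The retrieve-and-reuse steps described in the abstract can be sketched generically as a nearest-neighbour lookup over a case base. The code below is only an illustration of that case-based reasoning pattern, not the CBR-KNOWLEDGE implementation: the vector representation, the configuration names, the quality scores, and the similarity-weighted vote are all hypothetical.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def reuse_configuration(query_vec, case_base, k=2):
    """Retrieve the k most similar cases and reuse the configuration with the
    highest similarity-weighted quality score among them."""
    ranked = sorted(case_base, key=lambda c: cosine(query_vec, c["vector"]), reverse=True)
    votes = defaultdict(float)
    for case in ranked[:k]:
        votes[case["config"]] += cosine(query_vec, case["vector"]) * case["quality"]
    return max(votes, key=votes.get)


# Hypothetical case base: dialogue representation, configuration used, observed quality.
case_base = [
    {"vector": [0.9, 0.1, 0.0], "config": "commonsense", "quality": 0.8},
    {"vector": [0.8, 0.2, 0.1], "config": "commonsense", "quality": 0.7},
    {"vector": [0.0, 0.1, 0.9], "config": "emotion_lexicon", "quality": 0.9},
]
chosen = reuse_configuration([1.0, 0.0, 0.0], case_base, k=2)
print(chosen)
```

Weighting each retrieved case by both similarity and recorded quality is one simple way to let the case base reflect how well a configuration worked in analogous dialogues.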
|
|
11:20-11:40, Paper WeBT3.2 | |
Two-Stage Multi-Modal Prompt Tuning for Few-Shot Sentiment Analysis |
|
Gu, Jiaqi | GuangXi University for Nationalities |
Niu, Hao | Gengchi Technology Co., Ltd |
Keywords: Affective Computing, Multimedia Systems
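Abstract: Few-shot multi-modal sentiment analysis (MSA) is a crucial task in the field of vision-language understanding and plays a key role in various application domains (e.g., interaction, e-commerce promotion, and social media analysis). Recently, with advances in pre-trained language models, previous works have mainly utilized a combination of a pre-trained language model and a visual encoder, and adopted prompt learning to generalize pre-trained language models to MSA tasks. However, there are specialized vision-language pre-trained models (VLPMs) designed to handle vision-language tasks, such as visual question answering. There has been little exploration of VLPMs and their prompt learning methods for multi-modal sentiment analysis. Therefore, our work fills this gap by proposing two-stage multi-modal prompt tuning (TSMMP) for few-shot sentiment analysis based on VLPMs. TSMMP consists of two stages of prompt tuning. In the first stage, we encode the image and text separately, and then feed them ...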
|
|
11:40-12:00, Paper WeBT3.3 | |
Bi-VLA: Vision-Language-Action Model-Based System for Bimanual Robotic Dexterous Manipulations |
|
Gbagbe, Koffivi Fidele | Skolkovo Institute of Science and Technology |
Altamirano Cabrera, Miguel | Skolkovo Institute of Science and Technology Skoltech |
Alabbas, Ali | Skolkovo Institute of Science and Technology |
Alyounes, Oussama | Skolkovo Institute of Science and Technology Skoltech |
Lykov, Artem | Skolkovo Institute of Science and Technology |
Tsetserukou, Dzmitry | Skoltech |
Keywords: Human-Computer Interaction, Affective Computing, Human-Machine Interface
Abstract: This research introduces the Bi-VLA (Vision-Language-Action) model, a novel system designed for bimanual robotic dexterous manipulation that seamlessly integrates vision for scene understanding, language comprehension for translating human instructions into executable code, and physical action generation. We evaluated the system’s functionality through a series of household tasks, including the preparation of a desired salad upon human request. Bi-VLA demonstrates the ability to interpret complex human instructions, perceive and understand the visual context of ingredients, and execute precise bimanual actions to prepare the requested salad. We assessed the system’s performance in terms of accuracy, efficiency, and adaptability to different salad recipes and human preferences through a series of experiments. Our results show a 100% success rate in generating the correct executable code by the Language Module, a 96.06% success rate in detecting specific ingredients by the Vision Module, and an overall success rate of 83.4% in correctly executing user-requested tasks.
|
|
12:00-12:20, Paper WeBT3.4 | |
Self-Attention Residual Connection and Graph Neural Hawkes Bilayer Model for Session-Based Recommendation |
|
Li, Huan | Dongguan University of Technology |
Chen, Senpeng | Dongguan University of Technology |
Wei, Wenhong | Dongguan University of Technology |
Dong, Ani | Dongguan City University |
Li, Qingxia | Dongguan City University |
Keywords: Cognitive Computing, Intelligence Interaction, Human-Machine Interaction
Abstract: Session-based recommendation aims to make recommendations for anonymous users based on limited session data. However, traditional session-based recommendation methods fail to capture complex item transitions and simply represent the user’s last clicked item as a short-term preference, neglecting the global sequential information of the session. This approach struggles to consider transitions between contexts and cannot accurately capture the user’s true intentions. To address these issues, this paper proposes a session recommendation method based on self-attention residual connections and graph neural Hawkes (SRGNH). This method introduces a dual-layer network structure consisting of graph neural self-attention residual connection layers and graph neural Hawkes layers, designed to learn users’ long-term and short-term preferences, respectively. SRGNH employs a Gated Graph Neural Network (GGNN) to capture complex interactions between nodes, obtaining latent vectors for each item. It incorporates self-attention networks and residual connections to effectively utilize low-level inspired information for capturing users’ long-term preferences. The graph neural Hawkes layer combines the Hawkes process with GGNN to capture the relationship between user item clicks over continuous time, accurately representing users’ short-term preferences. To better represent user intent, we linearly combine users’ long-term and short-term preferences in the end. Experimental results demonstrate that the proposed SRGNH outperforms other recommendation models on the Diginetica, Yoochoose1/64, and Yoochoose1/4 datasets.
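To make the two ingredients named in the abstract more concrete, the sketch below shows a classical Hawkes process intensity, in which each past click temporarily raises the rate of future clicks, together with a linear combination of long-term and short-term preference vectors. The exponential decay kernel, the parameter values, and the mixing weight are assumptions; the abstract does not specify how SRGNH parameterizes either step.

```python
import math


def hawkes_intensity(t, event_times, mu=0.1, alpha=0.5, beta=1.0):
    """Intensity lambda(t) = mu + sum over past events t_i < t of alpha * exp(-beta * (t - t_i)).

    mu is the base rate, alpha the jump added by each event, beta the decay rate.
    """
    return mu + sum(alpha * math.exp(-beta * (t - ti)) for ti in event_times if ti < t)


def combine_preferences(long_term, short_term, weight=0.5):
    """Linear combination of long-term and short-term preference vectors."""
    return [weight * l + (1.0 - weight) * s for l, s in zip(long_term, short_term)]


# Hypothetical click timestamps (in minutes) within one session.
clicks = [0.0, 0.5, 2.0]
soon_after = hawkes_intensity(2.1, clicks)   # shortly after the last click
much_later = hawkes_intensity(10.0, clicks)  # influence of past clicks has decayed
user_vector = combine_preferences([1.0, 0.0], [0.0, 1.0], weight=0.25)
print(soon_after, much_later, user_vector)
```

The decaying influence of earlier clicks is what lets a Hawkes-style layer emphasize recent behavior when modelling short-term preference.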
|
|
12:20-12:40, Paper WeBT3.5 | |
Investigation of Correspondence between Learner Sensory Processing Sensitivity and Different Avatars in Online Lectures (I) |
|
Riese, Sean Mirai | Japan Advanced Institute of Science and Technology |
Koich, Ota | Japan Advanced Institute of Science |
Gu, Wen | Center for Innovative Distance Education and Research, Japan Adv |
Hasegawa, Shinobu | Japan Advanced Institute of Science and Technology |
Keywords: Affective Computing
Abstract: People with Highly Sensitive Person (HSP) characteristics such as "depth of processing," "overstimulation," "emotional reactivity and empathy," and "sensitivity to subtleties" sometimes face challenges due to their high Sensory Processing Sensitivity (SPS) to the environment. To investigate the differences in SPS for each learner and the impact of different avatars on video presentation for online videos, whose use has increased due to COVID-19, we surveyed 20 participants who studied SDGs instruction videos with four different avatars. Analysis of their SPS responses using the HSPS-J19 self-assessment tool revealed that participants' SPS scores followed a normal distribution, indicating individual differences in SPS. In addition, a few correlations were found between HSPS-J19 scores and participants' impressions of and motivation toward avatar presentation. Furthermore, the cluster analysis results indicated that the HSP tendency group was more effective in applying appropriate avatars. Based on these results, we designed an online lecture support environment that allows the control of other people's videos as environmental stimuli. This research focuses on an unexplored area and enhances online lectures for HSP since HSP statistically applies to 15% to 20% of the population, and supporting high SPS learners is socially significant in the post-COVID-19 era.
|
|
12:40-13:00, Paper WeBT3.6 | |
Teacher-To-Teacher: Harmonizing Dual Expertise into a Unified Speech Emotion Model (I) |
|
Singkul, Sattaya | SpeeChance Co., Ltd |
Yuenyong, Sumeth | Mahidol University |
Wongpatikaseree, Konlakorn | Mahidol University |
Keywords: Affective Computing, Human-Computer Interaction, Human-Machine Interaction
Abstract: This paper introduces the Teacher-to-Teacher (T2T) framework, a novel approach in speech emotion recognition (SER) specifically tailored for the Thai language. Leveraging the dual expertise of the Wav2Vec and Wav2Vec2 models, the T2T framework utilizes unsupervised and self-supervised learning knowledge to effectively address the unique challenges posed by tonal languages. By integrating these two powerful models into a unified SER framework, T2T enhances its capability to process and interpret nuanced emotional cues in speech, achieving superior performance compared to traditional SER methods. Evaluated across three major datasets (ThaiSER, EMOLA, and MU), the framework demonstrates significant improvements in unweighted accuracy and F1-score. Innovations such as emotional clustering representation and targeted emotional representation contribute to its high precision in detecting and differentiating subtle emotional states. Additionally, the integration of a fine-tuned teacher module aligns these advancements with practical SER applications, further increasing the framework's accuracy and sensitivity in real-world scenarios. The successful implementation of the T2T framework opens new avenues for enhancing SER technologies in other low-resource languages and extends its applicability to real-time processing applications, thereby advancing the field of computational emotion recognition.
|
|