| |
Last updated on September 30, 2024. This conference program is tentative and subject to change
Technical Program for Monday October 7, 2024
|
MoAT1 |
MR01 |
AI Applications 1 |
Regular Papers - Cybernetics |
Chair: Li, Xiaoou | CINVESTAV-IPN |
|
08:45-09:05, Paper MoAT1.3 | |
LDD-YOLO: An Improved Lightweight Detection Method for Steel Surface Defects Based on YOLOv8 |
|
Zhang, Yuechen | Qilu University of Technology (Shandong Academy of Sciences) |
Li, Aimin | Qilu University of Technology |
Li, Zhiyao | Qilu University of Technology (Shandong Academy of Sciences) |
Kong, Xiaotong | Qilu University of Technology (Shandong Academy of Sciences) |
Li, Wenqiang | Qilu University of Technology (Shandong Academy of Sciences) |
Keywords: Application of Artificial Intelligence, AI and Applications, Deep Learning
Abstract: Steel is an indispensable raw material in industry, and surface defects seriously affect its quality, so the detection of steel surface defects has attracted considerable research in recent years. Existing steel defect detection methods cannot fully mine the underlying feature information of the target image and do not achieve a dynamic balance between accuracy and speed. To address these problems, this paper proposes an optimised target detection algorithm based on YOLOv8. First, we propose the DMCA module, which combines deformable convolution with a multi-channel self-attention mechanism. We develop a strengthened self-attention module to enhance the generation of offsets in deformable convolution, so that the model can better adapt to the complex shapes of different defect targets and extract features at a deeper level. Second, drawing on LKA (Large Kernel Attention), we propose the lightweight LF-MSPP module, which has long-range dependence and adaptive capability and captures long-range relationships at small computational and parameter cost, mitigating the loss of defect feature information. Finally, we replace the head of the original YOLOv8 with a Dynamic Head and use the split attention mechanism to improve the detection capability of the head while keeping it lightweight. We conduct extensive experiments on the widely used Northeastern University steel defect dataset NEU-DET. Experimental results show that the improved model improves mAP@50, mAP@50-95, AP and AR by 2.4%, 2.0%, 6.1%, and 3.4% respectively compared with the original YOLOv8 model, while reducing the number of model parameters by 11.2%. The improved model also outperforms mainstream detection models such as SSD, RetinaNet, Faster R-CNN, YOLOv5, YOLOv6, YOLOv7, and YOLOv8, and can better meet the accuracy and speed requirements of actual industrial production for steel surface defect detection.
|
|
09:05-09:25, Paper MoAT1.4 | |
Smartphone-Based Structural Health Monitoring with Neural Network Regression for Damage Detection |
|
Li, Xiaoou | CINVESTAV-IPN |
Yu, Wen | CINVESTAV-IPN |
Yingqin, Zhu | CINVESTAV-IPN |
Keywords: Neural Networks and their Applications, Application of Artificial Intelligence, Machine Learning
Abstract: This paper presents a novel and cost-effective approach for structural health monitoring using smartphones. By using built-in accelerometers, smartphones can collect data on building motion, facilitating the detection of potential damage. Traditional methods often rely on classification techniques, requiring extensive training data encompassing both damaged and undamaged scenarios. However, this proves impractical for smartphones due to their limited computational resources for complex classification tasks. We propose a paradigm shift, transforming the classification problem into a regression problem. This enables robust structural health assessment using a neural network specifically designed for this purpose: the echo state network (ESN). ESNs offer inherent robustness to noise and perturbations, making them ideal for real-world applications with sensor data. Compared to traditional methods, the proposed smartphone-based system offers significant advantages in terms of cost-effectiveness, user-friendliness, and computational efficiency. The effectiveness of the proposed method is evaluated through several experiments, demonstrating its capability in identifying structural damage.
|
|
09:25-09:45, Paper MoAT1.5 | |
Dynamic Event-Triggered Distributed MPC for UAV-UGV Systems against DoS Attacks on Communication Channels |
|
Tang, Hui | University of Electronic Science and Technology of China |
Chen, Yong | University of Electronic Science and Technology of China |
Keywords: Intelligent Internet Systems, Agent-Based Modeling, Complex Network
Abstract: In this study, we address the secure model predictive control (MPC) problem for a system comprising unmanned aerial vehicles (UAVs) and unmanned ground vehicles (UGVs). Our objective is to cooperatively regulate the UAV-UGV system to maintain all states at equilibrium in spite of physical constraints, external disturbances, and Denial of Service (DoS) attacks on communication channels. We propose a dynamic event-triggered distributed MPC (DET-DMPC) approach. Initially, a robustness constraint is integrated into the MPC to incrementally confine system states within a progressively narrowing tube, thereby bolstering disturbance resilience. Additionally, we introduce a neighbor output sequence prediction mechanism (NOSPM) to mitigate packet loss due to DoS attacks. In response to the varying durations of DoS attacks, we have developed a DET mechanism equipped with adaptive triggering thresholds. Moreover, we have formulated sufficient conditions that ensure the recursive feasibility of the MPC and the stability of the closed-loop system. Comparative tests demonstrate the proposed DET-DMPC's superiority in robustness and security.
|
|
MoAT2 |
MR02 |
AI Applications 5 |
Regular Papers - Cybernetics |
Chair: Jiang, Ming | Guilin University of Electronic Technology |
|
08:45-09:05, Paper MoAT2.3 | |
CSFIR: Leveraging Code-Specific Features to Augment Information Retrieval in Low-Resource Code Datasets |
|
Tong, Zhenyu | University of Chinese Academy of Sciences |
Luo, Chenxi | La Trobe University |
Luo, Tiejian | University of Chinese Academy of Sciences |
Keywords: Application of Artificial Intelligence, AI and Applications, Deep Learning
Abstract: From search engines like Google to advanced applications such as Retrieval Augmented Generation integrating Large Language Models (LLMs), Information Retrieval (IR) plays a crucial role. To facilitate the development of increasingly large and complex code projects, researchers have introduced IR systems into the code domain. Unfortunately, although IR systems achieve significant success when both corpus and query are natural language, they face challenges when the corpus consists of code sequences. First, in practical applications most code sequences lack corresponding natural language annotations, a situation known as the low-resource scenario, which hinders the training of neural network-based retrievers in IR systems. Furthermore, modern IR systems often overlook the structural features of code sequences, which may be beneficial for understanding these sequences. Additionally, most code sequences are longer than equivalent natural language expressions, complicating the processing of relationships between code and natural language sequences. To address these challenges, we propose a novel IR system, CSFIR, which leverages Code-Specific Features to augment IR. For the prevalent issue of unlabeled code sequences in low-resource scenarios, we employ a supervised fine-tuned LLM as a generator to produce natural language queries for unlabeled code sequences. Subsequently, we extract structural features from the abstract syntax tree of each code sequence using graph convolution networks and integrate these features to enhance the original retriever. Finally, given the adverse effects of lengthy code sequences on generators, we propose a subtree segmentation algorithm that reduces the length of code sequences without compromising their original meaning, thereby enhancing the quality of the generated queries. We conduct comparative experiments to ascertain the efficacy of our method. Regarding Recall@100, our CSFIR system improves from 90.12 in a traditional IR system to 96.18. Our code is available at https://github.com/tzy31415/CSFIR.git.
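The Recall@100 figure quoted in this abstract is a standard retrieval metric. As a minimal sketch (not taken from the CSFIR repository; the function names and the toy identifiers are illustrative assumptions), Recall@k can be computed per query and then averaged:

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant items that appear in the top-k ranked results."""
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("relevant_ids must not be empty")
    top_k = set(ranked_ids[:k])
    return len(relevant & top_k) / len(relevant)


def mean_recall_at_k(queries, k):
    """Average Recall@k over a list of (ranked_ids, relevant_ids) pairs."""
    scores = [recall_at_k(ranked, relevant, k) for ranked, relevant in queries]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical toy data: one query whose two relevant snippets are "b" and "d".
    print(recall_at_k(["a", "b", "c", "d"], {"b", "d"}, k=2))
```

A retrieval system is usually scored with the mean over all evaluation queries, which is what `mean_recall_at_k` returns.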
|
|
09:05-09:25, Paper MoAT2.4 | |
Adaptive Fusion of Global Information for Position Aware Multi-Interest Sequential Recommendation |
|
Sun, Tianhao | Chongqing University |
Ma, Jiayi | Chongqing University |
Chen, Yanke | Chongqing University |
Ma, Yunhao | Chongqing University |
Wu, Quanwang | Chongqing University |
Keywords: Application of Artificial Intelligence, Deep Learning, Expert and Knowledge-Based Systems
Abstract: Sequential recommendation based on multi-interest networks has become a research hotspot because it meets the varied needs and preferences of users. However, current models focus only on how to extract multiple user interests without distinguishing the importance of those interests, and the global information that can simulate the evolution trend of user interest is not fully utilized. In this work, we propose a novel approach named adaptive fusion of global information for position aware multi-interest sequential recommendation (AGPM). Specifically, all users' historical interaction sequences are utilized to simulate the potential evolution trend of user interests. In this process, we set a neighbor window size, use the position interval between pairs of co-occurring items to adjust the weight between them, and design an adaptive strategy to fuse high- and low-order global information, enriching the representation with highly relevant items while expanding high-order correlations. In addition, relative position interval information is used as the key factor to capture users' multiple interests, so as to highlight the content that users have been more concerned about recently. Extensive experiments and analyses are carried out on three public datasets, and the experimental results demonstrate the effectiveness and superiority of the proposed method.
|
|
09:25-09:45, Paper MoAT2.5 | |
FedHiDiC: High-Dimensional Heterogeneous Data Condensation and Distillation Contrastive Learning in Federated Learning |
|
Jiang, Ming | Guilin University of Electronic Technology |
Li, Yun | Guilin University of Electronic Technology |
Zhang, Feng | Guilin University of Electronic Technology |
Guo, Biao | Guilin University of Electronic Technology |
Qin, Ping | Guangxi Polytechnic of Construction |
Zhang, Shuo | Guilin University of Electronic Technology |
Lu, Yao | Guilin University of Electronic Technology |
Keywords: Application of Artificial Intelligence, Machine Learning, AI and Applications
Abstract: In response to key challenges in federated learning, particularly those concerning the heterogeneity of locally distributed data which engenders client drift and the problematic occurrence of dimensionality collapse, we propose FedHiDiC: high-dimensional heterogeneous data condensation and distillation contrastive learning. A key idea of FedHiDiC is to integrate mean squared error, standard deviation, and covariance as regularization terms in local training to control parameter norm growth and bolster the capacity to handle high-dimensional, heterogeneous data. Another key idea is to utilize the similarity of representations between models to correct for local training, which reinforces intra-class compactness and inter-class distinction and reduces model variability through knowledge distillation from the global to local models, combined with contrastive strategies. Our experiments on CIFAR10 and CIFAR100 datasets show that FedHiDiC achieves significant performance improvements, surpassing FedAvg, FedProx, and MOON by 2.9% to 4.0% in accuracy, which confirms the effectiveness of FedHiDiC in tackling heterogeneity challenges.
|
|
MoAT3 |
MR03 |
Design Methods and Information Systems 1 |
|
Chair: Ying, Wang | Capital Normal University |
|
08:45-09:05, Paper MoAT3.3 | |
Design Thinking in Software Development: A Practical Approach in Technology Labs |
|
Moura de Jesus, Felipe | Federal University of Alagoas - UFAL |
Bion, Danillo | Federal University of Agreste of Pernambuco |
Rocha, Rodrigo | Universidade Federal Do Agreste De Pernambuco |
Neto, Pirangaba | Federal University of Agreste of Pernambuco |
Vanderlei, Igor | Universidade Federal Do Agreste De Pernambuco |
Cunha, Icaro | Federal University of Agreste of Pernambuco |
Araujo, Jean | Faculdade De Ciências, Universidade De Lisboa |
Keywords: Design Methods, Cooperative Work in Design, User Interface Design
Abstract: The search for ways to improve the software engineering process is never-ending, and large communities such as academia and the software industry are constantly looking for new methodologies, models, theories, tools, and resources to help them improve the quality of their products. In this context, Design Thinking appears as an alternative in the system development process, with specific procedures to better understand customer and user needs, assisting in creating and developing competent products and services. This paper presents an interactive, user-focused Design Thinking approach to software development. The approach is evaluated through a case study. The study objects included ten software projects developed in technology laboratories of a public university that followed this approach and produced technical solutions for real consumers. As a result of this work, a Design Thinking approach for creating computational systems is presented, which has proven its efficacy through positive effects on the end product's quality, as well as a more robust organization and a reduction in development time. Finally, this work serves as a reference point that allows for derivations and adaptations in similar contexts.
|
|
09:05-09:25, Paper MoAT3.4 | |
Encryption and Decryption of Communication Systems Using the CMAC-Based Chaotic System Synchronization Technique (I) |
|
Wang, Hung-Chan | Yuan Ze University |
Lin, Chih-Min | Yuan Ze University |
Keywords: Design Methods
Abstract: In this paper, a Cerebellar Model Articulation Controller (CMAC) is designed for the synchronization control of chaotic systems used for the encryption and decryption of communication systems. Chaotic systems are important nonlinear systems that display complex and unpredictable behavior, so how to synchronize a chaotic system has become an important problem in the engineering community. Since the architecture of the CMAC is small and it learns fast, it is suitable for high-speed signal processing. The audio and images to be transmitted can be mixed into chaotic systems for encrypted transmission, thereby preventing the message from being known by others; the synchronization of the chaotic systems is then used to decrypt the correct information at the receiving end. The synchronization of the chaotic systems can be controlled by using a CMAC. By adjusting the controller parameters, the CMAC can achieve fast and stable control for decrypting the original signals.
|
|
09:25-09:45, Paper MoAT3.5 | |
Research on Computing Offloading Methods for Deep Learning Scenario Applications (I) |
|
Ying, Wang | Capital Normal University |
Zhang, Zhe | School of Software, Shanxi Agricultural University |
Lan, Gao | Capital Normal University |
Xin, Liu | Capital Normal University |
Yixiong, Wu | Capital Normal University |
Weigong, Zhang | Capital Normal University |
Keywords: Design Methods
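Abstract: Deploying artificial intelligence algorithms in heterogeneous systems on the device side requires addressing the computing scheduling optimization problem for computation-intensive deep learning algorithms. In this paper, we propose a computing offloading architecture, with convolutional neural network scenarios as the background. First, a comprehensive resource architecture and a computing demand model are established based on actual scenario applications. Second, the computing demands of each scenario are divided into macro and micro aspects, and the hardware computing capabilities of the heterogeneous computing system are divided into coarse-grained and fine-grained levels, in order to optimize the computing service mapping process and the operator service mapping process. Finally, a Markov-based process scheduling strategy is proposed to determine whether each service is scheduled at the current moment and the offloading order. Simulation results demonstrate the significant effect of the proposed algorithm on overall task latency and hardware computing resource utilization.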
|
|
MoAT5 |
MR05 |
Adaptive Systems and Control 1 |
Regular Papers - SSE |
Chair: Thorén, Samuel | Company |
|
08:45-09:05, Paper MoAT5.3 | |
Model Predictive Geofence for Vehicle Containment |
|
Thorén, Samuel | Company |
Wikander, Lukas | AstaZero AB |
Jarlow, Victor | AstaZero |
Kero, Timo | Research Institute |
Keywords: System Modeling and Control, Adaptive Systems, Autonomous Vehicle
Abstract: As automated vehicle technology advances, measures for the safe containment of automated vehicles become increasingly important. To this end, geofencing is a prominent and fundamental technique for triggering specific actions when vehicles enter or leave a predefined operational area. Today's geofencing methods usually fall short in safety-critical use cases, failing to contain vehicles or triggering needless interventions. This work presents the novel model predictive geofence, which predicts future transgressions based on vehicle dynamics-informed real-time decisions. We studied its performance compared to representative approaches, both physically at the AstaZero Proving Ground in Sweden and through numerical calculations. Our geofence utilised the operational area more effectively than current approaches. Furthermore, the model predictive geofence successfully contained the vehicle within the operational area in all experiments, preventing exit with few false stops. The model predictive geofence presents an applicable approach for quick decision-making regarding the containment of vehicles in operational areas.
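The abstract does not specify the prediction model, so the following is only an illustrative sketch of the general idea of deciding from vehicle dynamics rather than from current position alone. It assumes straight-line motion toward the boundary, constant deceleration, and a fixed actuation latency; all function and parameter names are hypothetical.

```python
def stopping_distance(speed_mps, max_decel_mps2, latency_s):
    """Distance covered before standstill: latency travel plus braking distance."""
    if max_decel_mps2 <= 0:
        raise ValueError("max_decel_mps2 must be positive")
    return speed_mps * latency_s + speed_mps ** 2 / (2.0 * max_decel_mps2)


def should_stop(distance_to_boundary_m, speed_mps, max_decel_mps2,
                latency_s=0.0, margin_m=0.0):
    """True when the predicted stopping point reaches or crosses the boundary."""
    needed = stopping_distance(speed_mps, max_decel_mps2, latency_s) + margin_m
    return needed >= distance_to_boundary_m


if __name__ == "__main__":
    # Assumed example values: 10 m/s, 5 m/s^2 braking, 0.5 s latency.
    print(stopping_distance(10.0, 5.0, 0.5))
    print(should_stop(20.0, 10.0, 5.0, latency_s=0.5))
```

A purely position-based fence would only react once the boundary is crossed; a check like this one triggers earlier at high speed and later at low speed, which is one way a predictive fence can use more of the operational area without letting the vehicle exit.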
|
|
09:05-09:25, Paper MoAT5.4 | |
Optimal Operator-Based Modeling for Open Circuit Voltage Hysteresis of LiFePO4 Batteries |
|
Lisen, Yan | Central South University |
Peng, Jun | Central South University |
Wu, Yue | Central South University |
Zhu, Zeyu | Central South University |
Li, Heng | Central South University |
Huang, Zhiwu | Central South University |
Keywords: System Modeling and Control
Abstract: Accurate modeling of open circuit voltage hysteresis for LiFePO4 batteries is crucial for establishing an advanced battery model. However, existing hysteresis modeling methods often yield suboptimal results due to inadequate parameterization. This paper proposes an optimal modeling method for open circuit voltage hysteresis based on the Prandtl-Ishlinskii model and an associated parameterization method. First, an asymmetric operator with cubic envelope functions is designed to enhance the classical Prandtl-Ishlinskii model, which originally features a symmetric and linear operator. This modification enables the proposed model to accurately capture intricate hysteresis. Second, a hierarchical parameterization method is proposed to identify optimal parameters. Specifically, an improved grey wolf optimizer is employed to determine the operator-related parameters. Then, the remaining parameters are calculated using the least squares algorithm, enhancing computational efficiency. Finally, the proposed model is validated on experimental hysteresis data from three distinct scenarios. The modeling error of the proposed model decreased by 66.57% and 32.51% compared with two other benchmark models.
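For readers unfamiliar with the classical Prandtl-Ishlinskii model mentioned above, the sketch below shows its symmetric, linear play operator and the weighted superposition of several such operators. It does not implement the asymmetric cubic-envelope operator proposed in the paper, and the thresholds and weights used are arbitrary illustrative values.

```python
def play_operator(inputs, r, y0=0.0):
    """Classical symmetric play (backlash) operator with threshold r >= 0."""
    outputs = []
    y = y0
    for x in inputs:
        # The output only moves when the input leaves the band [y - r, y + r].
        y = max(x - r, min(x + r, y))
        outputs.append(y)
    return outputs


def prandtl_ishlinskii(inputs, thresholds, weights):
    """Weighted sum of play operators (classical Prandtl-Ishlinskii model)."""
    if len(thresholds) != len(weights):
        raise ValueError("thresholds and weights must have the same length")
    branches = [play_operator(inputs, r) for r in thresholds]
    return [sum(w * branch[k] for w, branch in zip(weights, branches))
            for k in range(len(inputs))]


if __name__ == "__main__":
    sweep = [0, 1, 2, 1, 0]  # rising then falling input
    print(play_operator(sweep, r=0.5))
    print(prandtl_ishlinskii(sweep, thresholds=[0.0, 0.5], weights=[1.0, 2.0]))
```

The hysteresis is visible in the single-operator output: the same input value 1 maps to 0.5 on the rising branch and to 1.5 on the falling branch.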
|
|
09:25-09:45, Paper MoAT5.5 | |
Integral Sliding Mode Control Design for the Consensus Problem in Microgrids |
|
Zarei, Jafar | Shiraz University of Technology |
Pooyandeh-far, Samaneh | Shiraz University of Technology |
Saif, Mehrdad | University of Windsor |
Keywords: Cooperative Systems and Control, Control of Uncertain Systems, Intelligent Power Grid
Abstract: This paper deals with the problem of consensus in multi-agent systems using an integral sliding mode controller based on a finite-time disturbance observer. It is assumed that the dynamics of each agent are non-linear. The purpose is to achieve consensus in multi-agent systems in the presence of matched and mismatched disturbances by designing an integral sliding mode control based on the disturbance observer. For this purpose, first, the disturbance observer is designed in such a way that it estimates matched and mismatched disturbances. Then, the integral sliding mode controller is modified by estimating the disturbances to eliminate the mismatched disturbances with the input. Stability analysis of the proposed controller is conducted using Lyapunov theory. Finally, to demonstrate the effectiveness of the proposed method, it is investigated and simulated on a microgrid model that includes several boost converters feeding constant power loads.
|
|
MoAT6 |
MR06 |
Adaptive Systems and Control 5 |
|
Chair: Lin, Yann-Horng | National Taiwan Ocean University |
|
08:25-08:45, Paper MoAT6.2 | |
Information Security Evaluation by Information Flow Analysis Based on Stochastic Petri Nets |
|
Tu, Hanqian | Zhejiang Sci-Tech University |
Xiang, Dongming | Zhejiang Sci-Tech University |
Lin, Wang | Zhejiang Sci-Tech University |
Liu, Guanjun | Tongji University |
Keywords: System Modeling and Control
Abstract: The Petri-net-based information flow analysis offers an effective approach for detecting information leakage by the concept of non-interference. Although the related studies propose efficient solutions, they lack quantitative evaluation on information leakage. In this paper, we propose a novel method for quantitative evaluation of information security based on stochastic labeled Petri nets (SLPNs) and information flow analysis. Specifically, we introduce four different levels of security metrics, and provide a methodology for evaluating the information security. Furthermore, a case study is presented to show the feasibility of our method.
|
|
08:45-09:05, Paper MoAT6.3 | |
A Study on the Scenario Design Method for National Decision-Making Behavior in International Conflicts |
|
Li, Bo | National University of Defense Technology |
Yang, Yang Xiao Yu | National University of Defense Technology |
Yao, Feng | National University of Defense Technology |
Tang, Fang | National University of Defense Technology |
Wen, Mengxuan | National University of Defense Technology |
Zhu, Renqi | National University of Defense Technology |
Keywords: System Modeling and Control, Conflict Resolution, Homeland Security
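Abstract: As the complexity and diversity of international conflicts increase, designing international conflict scenarios based on causal reasoning is of great significance and value for national decision-makers to grasp the key points of a conflict and achieve scientific and effective national policy responses. This study designs international conflict scenarios in three stages: data processing, structural model construction, and real-world situation fitting. Taking the construction of a Russia-Ukraine conflict scenario as an example, the study uses topic mining and causal inference methods based on conflict description texts to design a scientific scenario and analyze it to obtain reasonable policy recommendations, demonstrating the rationality of the scenario design method. The study not only provides a method for analyzing international conflicts and supporting actual policy-making, but also provides a scenario design framework using quantitative methods, which is of great significance for future research and practice in related fields, such as the international political situation and ...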
|
|
09:05-09:25, Paper MoAT6.4 | |
Formation and Containment Fuzzy Control Via Interval Type-2 Approach for Multiple Autonomous Ships with Complex Noises (I) |
|
Lin, Yann-Horng | National Taiwan Ocean University |
Lee, Yi-Chen | National Dong Hwa University |
Chang, Wen-Jer | National Taiwan Ocean University |
Ku, Cheung-Chieh | National Taiwan Ocean University |
Sun, Cheinchung | National Kaohsiung University of Science and Technology |
Keywords: System Modeling and Control, Control of Uncertain Systems, Intelligent Transportation Systems
Abstract: An Interval Type-2 (IT-2) fuzzy control approach is proposed to solve the Formation-Containment (F-C) problem for nonlinear multiple Autonomous Ship (AS) systems with consideration of uncertainties and multiplicative noises. Based on the IT-2 Takagi-Sugeno Fuzzy Model (T-SFM), the nonlinear control problem can be recast into a linear problem. Unlike existing F-C control research, the formation problem is solved by the individual IT-2 fuzzy tracking controller of each leader. As a result, communication between the leaders, which are farthest from each other, is unnecessary. Moreover, other valuable functions such as the avoidance of known obstacles and time-varying formation can also be accomplished by designing proper target trajectories for the leaders to track. By virtue of the IT-2 T-SFM, the analysis methods for linear Multi-Agent Control Systems (MACSs) can be utilized to solve the containment analysis problem, which is caused by the leaders' control input. Finally, simulation results for multiple ASs are provided to demonstrate the advantage of the IT-2 fuzzy controller design approach in the F-C problem.
|
|
09:25-09:45, Paper MoAT6.5 | |
Adaptive Event-Triggered Control for Switched Nonlinear Systems with Average Dwell Time (I) |
|
Wang, Xueliang | University of Science and Technology Beijing |
Zou, Yao | University of Science and Technology Beijing |
He, Wei | University of Science and Technology Beijing |
Keywords: Adaptive Systems, System Modeling and Control
Abstract: We explore event-triggered control for switched nonlinear systems with state constraints. The state constraint problem is solved by introducing nonlinear mapping. By enabling the triggering error to adapt in tandem with the switching signal, we effectively alleviate the negative repercussions stemming from mismatch, thereby enhancing overall system performance. Moreover, through the development of multiple Lyapunov functions, we propose novel switching signals that adhere to time-dependent switching, ensuring system stability and optimizing system performance. Simulation results confirm the efficacy of the control method.
|
|
MoAT7 |
MR07 |
Online - Adaptive Systems and Control 1 |
|
Chair: Dehnad, Parastoo | Tabriz University |
|
08:05-08:25, Paper MoAT7.1 | |
Interpretable DRL-Based Maneuver Decision of UCAV Dogfight |
|
Han, Haoran | University of Electronic Science and Technology of China |
Jian, Cheng | University of Electronic Science and Technology of China |
Maolong, Lv | Air Force Engineering University |
Keywords: Application of Artificial Intelligence, AI and Applications, Agent-Based Modeling
Abstract: This paper proposes a three-layer unmanned combat aerial vehicle (UCAV) dogfight framework in which deep reinforcement learning (DRL) is responsible for high-level maneuver decisions. A four-channel low-level control law is first constructed, followed by a library containing eight basic flight maneuvers (BFMs). A double deep Q-network (DDQN) is applied for BFM selection in UCAV dogfights, where the opponent strategy during training is constructed with DT. Our simulation results show that the agent achieves a win rate of 85.75% against the DT strategy, and positive results when facing various unseen opponents. With the proposed framework, the interpretability of the DRL-based dogfight is significantly improved. The agent performs a yo-yo maneuver to adjust its turn rate and gain higher maneuverability. The emergence of "Dive and Chase" behavior also indicates that the agent can generate a novel tactic that exploits the weakness of its opponent.
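The double deep Q-network named in this abstract differs from a plain DQN in how the bootstrap target is formed: the online network selects the next action and the target network evaluates it. The sketch below shows that target for a single transition, using plain lists of Q-values in place of network outputs; the names and the three-action example are illustrative assumptions (the paper's action set would have eight BFMs).

```python
def ddqn_target(reward, done, q_online_next, q_target_next, gamma=0.99):
    """Double DQN bootstrap target for one transition.

    q_online_next and q_target_next hold one Q-value per action for the
    next state, from the online and target networks respectively.
    """
    if done:
        return reward
    # The online network chooses the greedy next action ...
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    # ... and the target network supplies the value of that action.
    return reward + gamma * q_target_next[best_action]


if __name__ == "__main__":
    # Hypothetical Q-values for three actions in the next state.
    print(ddqn_target(1.0, False, [0.2, 0.9, 0.1], [5.0, 2.0, 7.0], gamma=0.5))
```

Decoupling selection from evaluation avoids always taking the maximum of the target network's own estimates (7.0 in the example), which is the usual motivation for DDQN over DQN.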
|
|
08:25-08:45, Paper MoAT7.2 | |
Mining Temporal Association Rules for Multivariate Time Series Classification Problems with Both Discrete and Continuous Values Based on Shapelets |
|
Ding, Guohui | Shenyang Aerospace University |
Yuan, Zhaoyi | Shenyang Aerospace University |
Tang, Wenjing | Shenyang Aerospace University |
Jiang, Chao | Shenyang Aerospace University |
Jiao, Qingyang | Shenyang |
Keywords: Big Data Computing, Representation Learning
Abstract: Due to its excellent interpretability and high accuracy, time series Shapelet has garnered widespread attention in time series classification tasks. However, currently prevalent Shapelet classification methods primarily focus on numerical data, neglecting the common occurrence of categorical feature variables in practical applications. Additionally, existing multivariate time series classification algorithms exhibit shortcomings in executing classification tasks following the Shapelet discovery process. Inspired by association rule mining, this paper proposes an innovative Shapelet classification algorithm aimed at addressing both numerical and categorical data in multivariate time series. This algorithm employs a unified representation method to effectively integrate categorical and continuous features, while enhancing existing time series Shapelet discovery methods by independently calculating Shapelets for each variable, making them more suitable for association rule mining. Leveraging the discovered Shapelets and Allen's interval relations, the algorithm constructs temporal relationships among multivariate time series Shapelets, enabling the discovery of frequent patterns and the completion of classification tasks. This study aims to fully leverage the interpretability of time series Shapelets, revealing hidden temporal patterns within time series data. Experimental results demonstrate that this algorithm outperforms existing benchmark algorithms for multivariate time series classification in terms of accuracy, while exhibiting significant interpretability advantages.
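Allen's interval relations, which this abstract uses to relate Shapelets across variables, are the thirteen qualitative relations between two time intervals. The sketch below classifies a pair of closed intervals given as `(start, end)` tuples; it illustrates the relations themselves and is not the authors' mining algorithm.

```python
def allen_relation(a, b):
    """Return the Allen relation of interval a with respect to interval b."""
    (a_start, a_end), (b_start, b_end) = a, b
    if a_start >= a_end or b_start >= b_end:
        raise ValueError("intervals must satisfy start < end")
    if a_end < b_start:
        return "before"
    if a_end == b_start:
        return "meets"
    if b_end < a_start:
        return "after"
    if b_end == a_start:
        return "met-by"
    if a_start == b_start and a_end == b_end:
        return "equals"
    if a_start == b_start:
        return "starts" if a_end < b_end else "started-by"
    if a_end == b_end:
        return "finishes" if a_start > b_start else "finished-by"
    if b_start < a_start and a_end < b_end:
        return "during"
    if a_start < b_start and b_end < a_end:
        return "contains"
    # Remaining case: the intervals partially overlap.
    return "overlaps" if a_start < b_start else "overlapped-by"


if __name__ == "__main__":
    # Hypothetical Shapelet occurrences on two variables, as index ranges.
    print(allen_relation((0, 4), (3, 5)))
```

In a rule-mining setting, each pair of Shapelet occurrences would be mapped to one such label, and frequent combinations of labels then serve as candidate temporal patterns.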
|
|
08:45-09:05, Paper MoAT7.3 | |
LB-VTON: Lower-Body Virtual Try-On Via Differential Constraints and Adversarial Training Strategies |
|
Rong, Chunyu | Sichuan University |
Yi, Li | Sichuan University |
Keywords: Machine Learning, Machine Vision, Image Processing and Pattern Recognition
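Abstract: Recently, image-based virtual try-on systems have made significant progress, but recent advances have focused mainly on upper-body try-on at the expense of lower-body try-on. Issues such as the complexity of leg poses and inappropriate regularization parameters mean that virtual try-on of lower-body clothing remains challenging. Here, we propose a novel lower-body virtual try-on network, called LB-VTON, which synthesizes realistic images from images of the wearer and the target clothing. To better preserve clothing and body features, LB-VTON adopts a four-stage design strategy. First, it preprocesses the images using methods such as human parsing to delineate body parts and fabric regions. Second, it uses a regression network to estimate the parameters L of the thin-plate spline transformation using second-order differential constraints derived from actual mesh data, thereby improving the resolution and fidelity of the try-on images. Third, to overcome fabric blurring ...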
|
|
09:05-09:25, Paper MoAT7.4 | |
Innovative Initialization Scheme for Multi-Objective Feature Selection in Continuous Search Spaces |
|
Dehnad, Parastoo | Tabriz University |
Asilian Bidgoli, Azam | Wilfrid Laurier University |
Rahnamayan, Shahryar | Brock University |
Keywords: Evolutionary Computation, Metaheuristic Algorithms, Machine Learning
Abstract: Feature selection is a demanding and costly task in machine learning and data mining that targets the elimination of irrelevant and redundant features. It significantly improves classification accuracy or other post-processing components, such as search or clustering. In this context, feature selection can be framed as a single-objective optimization task, with the primary objective of maximizing classification accuracy, or as a multi-objective one that also minimizes the number of selected features; both can be tackled using population-based metaheuristic algorithms. Feature selection (FS) is inherently a binary problem, whereas most metaheuristic algorithms operate in continuous domains (such as real-coded GA, DE, PSO, and CMA-ES), so transitioning them to binary search spaces necessitates substantial operator modifications. One of the crucial steps of population-based algorithms is the initialization of the population, which significantly impacts both the convergence speed and the quality of the final solution. However, in most cases, especially with continuous algorithms, random initialization is the predominant method for generating candidate solutions (the initial population), and it lacks diversity in large-scale feature selection search spaces. In this paper, two novel population initialization methods within the continuous search space are introduced, followed by a comparative analysis against the traditional random initialization method. Experimental results on ten datasets with 300 to 11,000 features demonstrate the effectiveness of population-level uniform initialization, which surpasses the widely recognized individual-level uniform initialization method. The experiments are conducted using Differential Evolution as the single-objective algorithm and Generalized Differential Evolution 3 as the multi-objective algorithm in our case studies. This study demonstrates the crucial role of initialization in population-based optimization algorithms when they tackle binary problems.
|
|
09:25-09:45, Paper MoAT7.5 | |
Port AGV Hierarchical Formation Control Considering High-Frequency Disturbance Factors |
|
Zhang, Qiang | Wuhan University of Technology |
Li, Wenfeng | Wuhan University of Technology |
Qi, Xiaohang | Wuhan University of Technology |
Keywords: Cooperative Systems and Control, Robotic Systems
Abstract: To enhance the flexibility of port horizontal transportation systems and improve the smoothness, accuracy, and efficiency of Automated Guided Vehicle (AGV) motion in high-frequency disturbance environments, a longitudinal and lateral hierarchical formation control strategy for AGVs based on angle and velocity tracking is proposed. Based on the leader-follower formation control model, an AGV control system is designed, comprising a lateral Sliding Mode Control (SMC) controller based on angle tracking and longitudinal SMC and Proportional-Integral (PI) controllers based on velocity tracking. To address the impact of high-frequency disturbance signals in the port environment, a first-order Low Pass Filter (LPF) is designed to enhance the robustness of the AGV formation control system. Finally, tests were conducted in a combined simulation environment using Simulink and TruckSim, focusing on typical operational conditions for empty AGV formations. Simulation results indicate that the proposed scheme significantly enhances the robustness of the port AGV control system.
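As a rough illustration of the first-order low-pass filtering idea mentioned in this abstract, the sketch below applies a discrete-time first-order filter to a sampled signal. The abstract does not state the filter's time constant, sample rate, or discretization, so the backward-Euler form, the function name `low_pass_filter`, and the parameters `dt` and `tau` are all assumptions made for illustration, not the authors' design.

```python
def low_pass_filter(samples, dt, tau):
    """Filter a sampled signal with a first-order low-pass filter.

    Backward-Euler discretization of tau * dy/dt + y = x (an assumed form).
    dt is the sample period and tau the filter time constant, both in seconds.
    """
    if dt <= 0 or tau < 0:
        raise ValueError("dt must be positive and tau non-negative")
    alpha = dt / (tau + dt)  # smoothing factor in (0, 1]
    filtered = []
    # Start from the first sample so the output has no start-up jump.
    y = samples[0] if samples else 0.0
    for x in samples:
        y = y + alpha * (x - y)  # move a fraction alpha toward the new sample
        filtered.append(y)
    return filtered


# Hypothetical example: a constant value of 5 with an alternating +/-1 disturbance.
noisy = [5.0 + (-1) ** k for k in range(200)]
smooth = low_pass_filter(noisy, dt=0.01, tau=0.1)
```

A larger `tau` suppresses high-frequency disturbance more strongly but also delays the response to genuine changes in the tracked signal, which is the usual trade-off when choosing such a filter.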
|
|
MoAT8 |
MR08 |
Online - Adaptive Systems and Control 2 |
|
Chair: Yu, Bihui | Shenyang Institute of Computing Technology, Chinese Academy of Sciences & University of Chinese Academy of Sciences |
|
08:45-09:05, Paper MoAT8.3 | |
Perturbing and Backtracking Based Textual Adversarial Attack |
|
Qiao, Yuanxin | Beijing Information Science and Technology University |
Xie, Ruilin | Beijing Information Science and Technology University |
Xie, Songcheng | Beijing Information Science and Information Technology Universit |
Cui, Zhanqi | Beijing Information Science and Technology University |
Keywords: Machine Learning, Deep Learning, Application of Artificial Intelligence
Abstract: In the field of Natural Language Processing (NLP), Language Models (LMs) are widely applied in tasks such as text classification, machine translation, and knowledge reasoning. However, the defects of LMs make them vulnerable to adversarial attacks, resulting in substantial economic losses. Adversarial examples can effectively expose vulnerabilities of LMs and be used for adversarial training to improve the robustness of the models. Existing methods mostly generate adversarial examples by first selecting important tokens and then adding perturbations to them. Such methods require a large number of queries to the victim model, which makes them inapplicable in scenarios where the query budget is limited. To address the demand for more query-efficient adversarial example generation, this paper presents CBAPB, a Classification Boundary Adjacent Perturbation and Backtrack based textual adversarial attack method, which initially introduces coarse-grained perturbations at random positions while preserving the original semantics of input examples until they reach the similarity threshold. Subsequently, fine-grained perturbation backtracking is conducted on all successfully misclassified examples to minimize perturbation magnitudes. We conduct multiple experiments on the Yelp Reviews, AG News, and DBpedia datasets, employing BERT as the victim model. Comparative analysis against baselines reveals that CBAPB requires merely 3.2% of the average number of queries of these baselines, while increasing the attack success rate by 7.6%, with only a slight decrease of 1.5% in textual similarity. Experimental results demonstrate the effectiveness of CBAPB, which is not only query-efficient but also achieves a higher attack success rate.
|
|
09:05-09:25, Paper MoAT8.4 | |
Faster and More Efficient Subject Image Generation for Text-To-Image Diffusion Models |
|
Yu, Bihui | Shenyang Institute of Computing Technology, Chinese Academy of S |
Yao, Zhengbing | Shenyang Institute of Computing Technology, Chinese Academy of S |
Wei, Jingxuan | Shenyang Institute of Computing Technology, Chinese Academy of S |
Sun, Linzhuang | Shenyang Institute of Computing Technology, Chinese Academy of S |
Zhang, Sibo | Shenyang Institute of Computing Technology, Chinese Academy of S |
Bu, Liping | Shenyang Institute of Computing Technology, Chinese Academy of S |
Keywords: AI and Applications, Application of Artificial Intelligence, Image Processing and Pattern Recognition
Abstract: In recent years, there has been significant progress in text-to-image generation models. However, text struggles to accurately describe abstract concepts like shapes and sizes. Some methods have been proposed to enhance text prompts by incorporating image prompts. While they have shown effective improvements, they either require substantial fine-tuning costs or struggle to effectively integrate text and image information. In our study, we delve into the difficulty of integrating text and image information in decoupled cross-attention and conduct a visual analysis. We identify the presence of background-related tokens in image features as a key factor affecting text fidelity. To address this issue, we develop an algorithm to filter out these tokens. Additionally, we observe differences in the attention of U-Net layers to text prompts and image prompts. Based on this finding, we optimize the flow of image information to reduce interference with text information. In summary, we introduce a new topic-customized method that requires no repeated training. It trains a plug-and-play image prompt adapter with only 417M parameters, lightweight yet powerful, surpassing existing models in both text and image consistency. Our code and pre-trained checkpoints will be available at https://github.com/YZBPXX/DDCA
|
|
09:25-09:45, Paper MoAT8.5 | |
TOPSIS Method Based on Entropy Weight for Interval-Valued Neutrosophic Numbers and Its Application in Multi-Attribute Group Decision-Making |
|
Sun, Mengran | Qilu University of Technology (Shandong Academy of Sciences) |
Ding, Xin | Synthesis Electronic Technology Co |
Geng, Yushui | Qilu University of Technology (Shandong Academy of Sciences) |
Zhao, Jing | Qilu University of Technology (Shandong Academy of Sciences) |
Keywords: Decision Support Systems, Control of Uncertain Systems
Abstract: Given the inherent ambiguity in human thinking and judgment, as well as the complexities of objective realities and decision-making contexts, accurately expressing decision-making information with precise numerical values is challenging. To address the uncertainties, incompleteness, and inconsistencies in multi-attribute decision-making, along with the issue of unknown attribute weights, this study introduces the entropy weight-based interval-valued neutrosophic set method. By employing this method, an extended TOPSIS approach is proposed, which integrates the TOPSIS method with interval-valued neutrosophic set theory. Through the TOPSIS method, candidate solutions are assessed based on their proximity to both the ideal solution and the negative ideal solution. Ultimately, the effectiveness and applicability of this methodology are demonstrated through a case study involving the selection of an e-commerce platform, validating its utility in real-world decision-making scenarios.
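The entropy-weight and TOPSIS steps named in this abstract can be sketched for ordinary (crisp) positive scores as follows. This is a deliberately simplified stand-in: the paper works with interval-valued neutrosophic numbers, whose distance and entropy measures are not given in the abstract, so the classical crisp formulas are swapped in here. The function names, the benefit-only criteria, and the example scores are hypothetical.

```python
import math


def entropy_weights(matrix):
    """Entropy weights for an m x n matrix of positive benefit scores."""
    m, n = len(matrix), len(matrix[0])
    raw = []
    for j in range(n):
        column = [row[j] for row in matrix]
        total = sum(column)
        shares = [value / total for value in column]
        # Shannon entropy scaled to [0, 1]; terms with zero share contribute 0.
        entropy = -sum(p * math.log(p) for p in shares if p > 0) / math.log(m)
        raw.append(1.0 - entropy)  # more dispersion means a larger weight
    total_raw = sum(raw)
    if total_raw < 1e-12:  # every criterion is uniform: fall back to equal weights
        return [1.0 / n] * n
    return [w / total_raw for w in raw]


def topsis(matrix, weights):
    """Closeness of each alternative to the ideal solution (1.0 is best)."""
    n = len(matrix[0])
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(n)] for row in matrix]
    ideal = [max(row[j] for row in weighted) for j in range(n)]
    anti_ideal = [min(row[j] for row in weighted) for j in range(n)]
    scores = []
    for row in weighted:
        d_ideal = math.dist(row, ideal)
        d_anti = math.dist(row, anti_ideal)
        total = d_ideal + d_anti
        # Identical alternatives give total == 0; score them as neutral.
        scores.append(d_anti / total if total else 0.5)
    return scores


# Hypothetical scores of three platforms on three benefit criteria.
scores_matrix = [[9, 8, 7], [5, 5, 5], [1, 2, 3]]
weights = entropy_weights(scores_matrix)
closeness = topsis(scores_matrix, weights)
```

Alternatives are then ranked by descending closeness. Cost-type criteria would need the ideal and anti-ideal roles reversed for those columns, which this sketch omits.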
|
|
MoAT9 |
MR09 |
Agent-Based and Autonomous Systems 1 |
|
Chair: Hayashida, Tomohiro | Hiroshima University |
|
08:45-09:05, Paper MoAT9.3 | |
Multi-Agent Reinforcement Learning-Based UAV Swarm Confrontation: Integrating QMIX Algorithm with Artificial Potential Field Method |
|
Zhang, Chao | Beijing University of Posts and Telecommunications |
Wu, Zhangyan | Beijing University of Posts and Telecommunications |
Li, Zhaoxin | Beijing University of Posts and Telecommunications |
Xu, Hao | Beijing University of Posts and Telecommunications |
Xue, Zhihao | Beijing University of Posts and Telecommunications |
Qian, Rongrong | Beijing University of Posts and Telecommunications |
Keywords: Swarm Intelligence, Agent-Based Modeling, Deep Learning
Abstract: As an important area of machine learning, reinforcement learning has specific applicability in multi-agent systems (including UAV swarms). In this article, we use a reinforcement learning algorithm (i.e., the QMIX algorithm) to resolve the problem of UAV swarm confrontation, considering the condition of asymmetric confrontation under which the adversary’s combat power is much stronger than our own. First, after constructing the system model, we develop the QMIX algorithm by designing the state space, action space, and reward function. Second, we propose a confrontation strategy that integrates decisions made by the QMIX algorithm and the artificial potential field method for UAV swarm confrontation. Finally, the experimental results show that our proposed confrontation strategy has a 72% higher win rate compared to the QMIX algorithm under asymmetric confrontation conditions.
|
|
09:05-09:25, Paper MoAT9.4 | |
How Collective Intelligence Emerges in a Crowd of People through Learned Division of Labor: A Case Study |
|
Wang, Dekun | Harbin Institute of Technology, Shenzhen |
Zhang, Hongwei | Harbin Institute of Technology, Shenzhen |
Keywords: Agent-Based Modeling, Swarm Intelligence, Artificial Social Intelligence
Abstract: This paper investigates the factors fostering collective intelligence (CI) through a case study of LinYi's Experiment, where over 2000 human players collectively control an avatar car. By conducting theoretical analysis and replicating observed behaviors through numerical simulations, we demonstrate how self-organized division of labor (DOL) among individuals fosters the emergence of CI and identify two essential conditions fostering CI by formulating this problem as a stability problem of a Markov Jump Linear System (MJLS). These conditions, independent of external stimulus, emphasize the importance of both elite and common players in fostering CI. Additionally, we propose an index for the emergence of CI and a distributed method for estimating joint actions, enabling individuals to learn their optimal social roles without global action information of the whole crowd.
|
|
09:25-09:45, Paper MoAT9.5 | |
Integrating Task Allocation and Hierarchical Reinforcement Learning for Optimized Cargo Transport Routing (I) |
|
Hayashida, Tomohiro | Hiroshima University |
Sekizaki, Shinya | Hiroshima University |
Furukawa, Ryuya | Hiroshima University |
Nishizaki, Ichiro | Hiroshima University |
Keywords: Agent-Based Modeling, Optimization and Self-Organization Approaches, Application of Artificial Intelligence
Abstract: In recent years, there have been growing expectations for the improvement of operational efficiency through the use of artificial intelligence (AI) in various fields such as healthcare, industry, and services. Additionally, advancements in robotics technology have enabled widespread collaborative operations among multiple autonomous mobile robots. Since the 1980s, there has been active research on improving the efficiency of coordinated actions by multiple autonomous mobile robots using Multi-Agent Robot Systems (MARS) (Cao et al., 1997; Yan et al., 2013; Ismail and Sariff, 2019). This study focuses on robots that transport cargo within a warehouse and aims to develop learning methods for effective division of labor among multiple robots. In large and complex environments, agents need to undergo extensive repetitive learning to make appropriate action choices solely through machine learning techniques such as reinforcement learning. Traditional multi-agent methods like QMIX (Rashid et al., 2020) and MADDPG (Multi-Agent Deep Deterministic Policy Gradient) (Lowe et al., 2017) require substantial computational time for environmental exploration, presenting a significant challenge. Therefore, this paper proposes a two-tiered learning model that separates overall optimization, such as establishing a general procedure for task performance, from individual optimization based on local situational judgments. Additionally, to demonstrate the effectiveness of the proposed method, a simulation system tailored to the target environment is constructed, and simulation analysis based on this system is conducted.
|
|
MoAT10 |
MR10 |
Cybernetics and Quantum Systems 1 |
Special Sessions: Cyber |
Chair: Liu, Pengda | Chongqing Technology and Business University |
|
08:45-09:05, Paper MoAT10.3 | |
Improved Event-Triggered Approximate Optimal Control for Nonlinear Nonzero-Sum Games Using Reinforcement Learning (I) |
|
Liu, Pengda | Chongqing Technology and Business University |
Zhang, Huiyan | Chongqing Technology and Business University |
Shi, Peng | University of Adelaide |
Rudas, Imre | Obuda University |
Keywords: Heuristic Algorithms, Cybernetics for Informatics, Neural Networks and their Applications
Abstract: This paper presents event-triggered integral reinforcement learning methods to solve nonlinear nonzero-sum differential game problems. Firstly, for nonlinear systems, the theoretical basis for solving multi-player nonzero-sum game problems is established by constructing coupled Hamilton-Jacobi equations. With the help of integral reinforcement learning, the approximate optimal control strategy corresponding to each player can be obtained without knowing the drift dynamics of the system. Then, the event-triggered mechanism with preliminary operation is constructed by designing an appropriate triggering condition. The dynamic triggering mechanism is further integrated into the algorithm architecture of the online learning method to realize aperiodic adaptive learning and sampling control, and to effectively save system computing and communication resources. Finally, the effectiveness of the proposed reinforcement learning method is verified by theoretical analysis and simulation experiments.
|
|
09:05-09:25, Paper MoAT10.4 | |
Cloud-Model-Improved Transformer for Robust Load Prediction of Power Network Clusters (I) |
|
Jiang, Cheng | The Department of Computer Application Technology, College of Co |
Lu, Gang | The Energy Strategy and Planning Research Department, State Grid |
Ma, Xue | The Green Energy Development Research Institute (Qinghai) and Th |
Wu, Di | Southwest University |
Keywords: Machine Learning, Deep Learning
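Abstract: Load data from power network clusters reflects the economic development of each region and is crucial for predicting regional trends and guiding enterprise decision-making. The Transformer model, a leading method for load prediction, faces challenges in modeling historical data due to variables such as weather, events, festivals, and data volatility. To address this issue, the fuzzy nature of the cloud model is used to manage uncertainty effectively. Presenting an innovative approach, the Cloud-Model-Improved Transformer (CMIT) method integrates the Transformer model with the cloud model using the particle swarm optimization algorithm, with the aim of achieving robust and precise power load prediction. Comparative experiments on 31 real datasets within a power network cluster show that CMIT significantly surpasses the Transformer model in prediction accuracy, highlighting its effectiveness in strengthening prediction capabilities in the field of power network clusters.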
|
|
09:25-09:45, Paper MoAT10.5 | |
A Novel Extended-Kalman-Filter-Incorporated Latent Feature Model on Dynamic Weighted Directed Graphs (I) |
|
Zhou, Hongxun | Southwest University |
Chen, Xiangyu | Southwest University |
Yuan, Ye | Southwest University |
Keywords: Machine Learning, Representation Learning
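Abstract: Dynamic weighted directed graphs (DWDGs) are common in various application scenarios. They involve extensive dynamic interactions among numerous nodes. Most existing approaches explore the intricate temporal patterns hidden in a DWDG from a purely data-driven perspective, and suffer accuracy loss when the DWDG exhibits strong fluctuations over time. To address this issue, this study proposes a novel Extended-Kalman-Filter-Incorporated Latent Feature (EKLF) model for representing a DWDG from a model-driven perspective. Its main idea is twofold: a) adopting a control model, i.e., the Extended Kalman Filter (EKF), to track complex temporal patterns precisely with nonlinear state-transition and observation functions; and b) introducing an alternating least squares (ALS) algorithm to train the latent features (LFs) for precisely representing a DWDG. Empirical studies on DWDG datasets demonstrate that the proposed EKLF model ...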
|
|
MoAT11 |
MR11 |
Computational Intelligence and Soft Computing 4 |
Special Sessions: Cyber |
Chair: Obo, Takenori | Tokyo Metropolitan University |
|
08:45-09:05, Paper MoAT11.3 | |
Energy-Optimized Offloading of Delay-Sensitive Tasks in Hybrid Edge-Cloud Computing (I) |
|
Yuan, Haitao | Beihang University |
Wang, Shen | Beihang University |
Ma, Yaofei | Beihang University |
Bi, Jing | Beijing University of Technology |
Yang, Jinhong | CSSC Systems Engineering Research Institute |
Zhang, Jia | Southern Methodist University |
Zhou, Mengchu | New Jersey Institute of Technology |
Keywords: Evolutionary Computation, Computational Intelligence, Cloud, IoT, and Robotics Integration
Abstract: Currently, a cloud-edge collaborative system combines almost unlimited storage and computing resources where tasks can be migrated to high-performance servers in edge servers or the cloud. However, resource allocation and task offloading present big challenges due to the competition among mobile devices (MDs) for communication and computing resources of edge servers. Therefore, it is important to properly offload MDs' tasks to edge servers or the cloud. This work proposes a collaborative edge-cloud architecture, including a centralized cloud, edge servers, and MDs. Then, this work jointly considers computing power, task sizes, computing resources, transmission power of MDs, transmission rates, computing power, transmission power, computing resource of edge servers, and computing resource of the cloud. Considering the abovementioned factors, this work formulates a mixed-integer nonlinear programming problem. To solve this problem, a genetically simulated annealing-based particle swarm optimization (GSPSO) algorithm is proposed to obtain the best solution. Building upon it, this work proposes an energy-minimized task offloading and resource allocation strategy, thereby minimizing the system’s energy consumption while ensuring strict task response time limits. Experimental results show that GSPSO reduces the system’s energy consumption by 66.34%, 34.65%, and 4.95% more than particle swarm optimization (PSO), self-adaptive PSO, and Tyrannosaurus optimization, respectively.
|
|
09:05-09:25, Paper MoAT11.4 | |
Multilayer Topological Clustering for Human Motion Segmentation (I) |
|
Obo, Takenori | Tokyo Metropolitan University |
Hamada, Kunikazu | Tokyo Metropolitan University |
Matsuda, Tadamitsu | Juntendo University |
Kubota, Naoyuki | Tokyo Metropolitan University |
Keywords: Computational Intelligence, Optimization and Self-Organization Approaches
Abstract: This paper presents a method for human motion segmentation aimed at motion analysis in healthcare and rehabilitation. Motion segmentation involves extracting small movements, known as motion primitives, from a sequence of behavioral patterns. While previous works have utilized unsupervised clustering methods as effective approaches for motion segmentation, many of these methods require prior knowledge to enhance performance. To overcome these challenges, we propose a hierarchical topological clustering method capable of representing spatiotemporal features using GNG and the Pulse Neuron Model. Additionally, we present experiments and discussions to validate the applicability of the proposed method for motion analysis in exercise.
|
|
09:25-09:45, Paper MoAT11.5 | |
Large Language Model Implemented Simulated Annealing Algorithm for Traveling Salesman Problem (I) |
|
Wang, Debing | Sun Yat-Sen University |
Zhang, Zizhen | Sun Yat-Sen University |
Teng, Yi | Guangdong University of Education |
Keywords: Computational Intelligence, Hybrid Models of Computational Intelligence, Application of Artificial Intelligence
Abstract: Large language models (LLMs) have recently attracted significant attention and permeated diverse fields and disciplines. This paper aims to investigate the efficacy of LLMs in efficiently tackling combinatorial optimization problems and integrating them with traditional heuristic algorithms. Firstly, we describe the fundamental concepts and developmental history of LLMs, outlining the basic LLM framework involving the instance prompt, solution prompt, and algorithm prompt. Subsequently, we introduce a novel LLM implemented simulated annealing (SA) approach that enhances the basic LLM method. In the experiments, we present the average iterations required, convergence speed, and overall solution quality of LLM-based approaches in addressing the Traveling Salesman Problem (TSP). The results demonstrate that the integration of LLM with SA can enhance TSP-solving capabilities. Our research endeavors to empower non-specialists to effectively address combinatorial optimization problems.
|
|
MoAT12 |
MR12 |
Haptic and Human-Computer Interaction 1 |
Regular Papers - HMS |
Chair: Ono, Keiichi | Tokyo University of Science |
|
08:45-09:05, Paper MoAT12.3 | |
Perspectives-Observer-Transparency - a Novel Paradigm for Modelling the Human in Human-To-Anything Interaction Based on a Structured Review of the Human Digital Twin |
|
Mandischer, Nils | University of Augsburg |
Atanasyan, Alexander | RWTH Aachen University |
Schluse, Michael | RWTH Aachen University |
Rossmann, Juergen | RWTH Aachen University |
Mikelsons, Lars | University of Augsburg |
Keywords: Human-Machine Cooperation and Systems, Human-Machine Interaction, Design Methods
Abstract: Modern modelling approaches fail when it comes to understanding rather than pure supervision of human behavior. As humans become more and more integrated into human-to-anything interactions, the understanding of the human as a whole becomes critical. In this paper, we conduct a structured review of the human digital twin to indicate where modern paradigms fail to model the human agent. Particularly, the mechanistic viewpoint limits the usability of human and general digital twins. Instead, we propose a novel way of thinking about models, states, and their relations: Perspectives-Observer-Transparency. The modelling paradigm indicates how transparency - or whiteness - relates to the abilities of an observer, which in turn makes it possible to model the penetration depth of a system model into the human psyche. The split between the human's outer and inner states is described with a perspectives model, featuring the introperspective and the exteroperspective. We explore this novel paradigm by employing two recent scenarios from ongoing research and give examples to emphasize specific characteristics of the modelling paradigm.
|
|
09:05-09:25, Paper MoAT12.4 | |
An Immersive Mirror-Reversal Interface for Teleoperated Bring-Me Task Using a Mobile Manipulator |
|
Ono, Keiichi | Tokyo University of Science |
Yuguchi, Akishige | Tokyo University of Science |
Matsumoto, Yoshio | Tokyo University of Science |
Keywords: Human-Machine Interaction, User Interface Design, Human-Collaborative Robotics
Abstract: One of the major tasks of assistive robots for future domestic applications is to bring objects requested by the user, which is called a "Bring-me Task." Many previous studies on the Bring-me Task focus on the technologies to perform the task autonomously. However, object recognition, grasping, and navigation on autonomous robots are still difficult and not yet practical in a real-world unstructured environment. As an alternative approach, Bring-me Tasks by teleoperated robots have also been studied, but such systems require remote operators to perform the task for the user. In this paper, we propose a novel approach to the Teleoperated Bring-me Task in which a user teleoperates a robot by him/herself and receives an object from the robot. The user wears an HMD showing a mirror-reversal image from the viewpoint of the robot. The results from the experiments on the Teleoperated Bring-me Task suggest that the proposed interface with the mirror-reversal image improved the performance of receiving the object by hand compared with the original image. Furthermore, it was also confirmed that our interface improved the operability of the robot.
|
|
09:25-09:45, Paper MoAT12.5 | |
Dynamic Hand Gesture Recognition for Robot Manipulator Tasks |
|
Sharma, Dharmendra | Indian Institute of Technology Mandi |
Thakur, Peeyush | Indian Institute of Technology Mandi |
Gupta, Sandeep | Indian Institute of Technology Mandi |
Dhar, Narendra Kumar | Indian Institute of Technology Mandi |
Behera, Laxmidhar | IIT Kanpur |
Keywords: Human-Collaborative Robotics, Human-Machine Interaction, Human-Computer Interaction
Abstract: This paper proposes a novel approach to recognizing dynamic hand gestures, facilitating seamless interaction between humans and robots. Here, each robot manipulator task is assigned a specific gesture. There may be several such tasks and hence several gestures. These gestures may be prone to several dynamic variations. All such variations for different gestures shown to the robot are accurately recognized in real time using the proposed unsupervised model based on the Gaussian Mixture Model. The accuracy during training and real-time testing proves the efficacy of this methodology.
|
|
MoBT1 |
MR01 |
AI Applications 2 |
Regular Papers - Cybernetics |
Chair: Zhu, Zhenyu | HoHai University |
|
11:00-11:20, Paper MoBT1.1 | |
DPCA: Dynamic Probability Calibration Algorithm in LMaaS |
|
Deng, Zhongyi | South China University of Technology |
Chen, C. L. Philip | University of Macau |
Zhang, Tong | South China University of Technology |
Keywords: AI and Applications, Application of Artificial Intelligence, Deep Learning
Abstract: Probability calibration is a method to improve the reliability of models by linking the predicted probability to accuracy. Most studies follow a static strategy of full fine-tuning. These studies do not consider dynamic data and sparse parameters in Language Models as a Service (LMaaS), leading to limited effectiveness of probability calibration. To address the above issues, we propose a dynamic probability calibration algorithm (DPCA) that considers both data flow and parameter freezing. DPCA consists of a streaming annotation (SA) task and a dynamic calibration (DC) task. The SA task takes a specified number of samples from the training data stream. The sampled data is automatically annotated according to the deviation between the predicted probability and the true label. The DC task injects the probability deviation of the SA task into the next training epoch through adapter-tuning. DPCA achieves data augmentation in LMaaS through joint learning of sample labels and their predicted probability deviation. This work validates DPCA through an implementation on the BERT architecture. The proposed model achieves overall performance improvement on both Chinese and English NLP tasks. Experimental results demonstrate a 2.17% decrease in average ECE without a decrease in accuracy. Experimental analysis demonstrates the effectiveness and generalizability of DPCA in LMaaS.
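The ECE figure quoted in this abstract refers to expected calibration error, which compares predicted confidence with observed accuracy. The sketch below shows the commonly used equal-width binned estimate; the abstract does not say how many bins or which binning variant the authors used, so `n_bins=10`, the function name, and the example values are assumptions for illustration only.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: weighted mean of |accuracy - confidence| over equal-width bins.

    confidences: predicted probabilities of the chosen class, each in [0, 1].
    correct: booleans (or 0/1) saying whether each prediction was right.
    """
    n = len(confidences)
    if n == 0 or n != len(correct):
        raise ValueError("inputs must be non-empty and of equal length")
    conf_sum = [0.0] * n_bins
    hit_sum = [0] * n_bins
    count = [0] * n_bins
    for confidence, is_correct in zip(confidences, correct):
        # Bins are [lo, hi); a confidence of exactly 1.0 goes to the last bin.
        index = min(int(confidence * n_bins), n_bins - 1)
        conf_sum[index] += confidence
        hit_sum[index] += 1 if is_correct else 0
        count[index] += 1
    ece = 0.0
    for index in range(n_bins):
        if count[index]:
            accuracy = hit_sum[index] / count[index]
            mean_confidence = conf_sum[index] / count[index]
            ece += (count[index] / n) * abs(accuracy - mean_confidence)
    return ece


# Hypothetical overconfident model: always 100% sure, right half of the time.
overconfident = expected_calibration_error([1.0, 1.0, 1.0, 1.0], [1, 0, 1, 0])
```

A well-calibrated model that reports 0.9 confidence and is right nine times out of ten would score close to zero under this estimate, while the overconfident case above scores 0.5.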
|
|
11:20-11:40, Paper MoBT1.2 | |
Multiscale Feature Extraction and Attention Mechanism Generative Adversarial Network for Super-Resolution and Deblurring of Fundus Images |
|
Sha, Hualing | Wuhan Textile University |
Zhou, GuoPeng | Wuhan Textile University |
Jianquan, Zhang | Hubei University of Science and Technology |
Keywords: Application of Artificial Intelligence
Abstract: High-resolution and clear fundus images are essential to help physicians diagnose lesions. However, the imaging quality of acquired fundus images is often degraded by differences in operator experience and equipment limitations. To address this problem, this paper proposes a super-resolution network for retinal fundus images based on generative adversarial networks (GANs). The network aims to improve the resolution of fundus images and restore fine retinal structure and lesion details. First, based on the analysis of the ophthalmic mirror system, we design a new degradation model to simulate the effects of various unfavorable factors on fundus images, and thus construct a batch of fundus image datasets for fundus image super-resolution work. To enhance the network's ability to extract local information, we introduce a texture reply block based on coordinate attention. Meanwhile, to capture fundus image features at different scales, we add a multi-scale feature extraction block to realize the fusion of multi-scale features. Experimental results show that our network is able to reconstruct high-quality fundus images, and the proposed method outperforms other super-resolution deblurring methods in both PSNR and SSIM metrics. This result provides strong support for accurate diagnosis of fundus images.
|
|
11:40-12:00, Paper MoBT1.3 | |
Enhancing On-Device Inference Security through TEE-Integrated Dual-Network Architecture |
|
Zhu, Zhenyu | HoHai University |
Qu, Zhihao | Hohai University |
Jia, Ninghui | Hohai University |
Zhou, Wenxuan | HoHai University |
Ye, Baoliu | Nanjing University |
Keywords: Deep Learning, Cloud, IoT, and Robotics Integration, Neural Networks and their Applications
Abstract: A Trusted Execution Environment (TEE) offers a secure data processing zone for model inference. Due to its limited resources, existing solutions like MirrorNet usually deploy a lightweight model within the TEE for sensitive data, and a backbone model outside for the rest. However, these approaches do not inherently limit the learning ability of the backbone model, which could acquire inference capabilities similar to the lightweight model, inevitably weakening the security. To counter this, we propose FakeNN, an innovative mechanism with a dual-network architecture that intentionally guides the backbone model towards low predictive performance, thereby reducing its ability to infer sensitive information. We further improve the accuracy of the entire model by integrating a channel attention mechanism which reduces the transmission of redundant information. We conduct extensive experiments, and the results demonstrate that FakeNN substantially expands the performance gap between the non-secure and secure TEE models, with improvements ranging from 3.16% to 66.42% compared to MirrorNet. This enhancement strengthens the system’s security without negatively impacting the accuracy of the model.
|
|
MoBT2 |
MR02 |
AI Applications 6 |
Regular Papers - Cybernetics |
Chair: Oikawa, Haruki | Tokyo University of Science |
|
11:00-11:20, Paper MoBT2.1 | |
Dynamic Fusion Network for Multi-Domain Dialogue State Tracking |
|
Li, Donghao | School of Cyber Security, University of Chinese Academy of Scien |
Weng, Jinta | School of Cyber Security, University of Chinese Academy of Scien |
Wu, Hao | Guangzhou University |
Ye, Zhuohai | School of Education(Teachers College), Guangzhou University |
Hu, Yue | School of Cyber Security, University of Chinese Academy of Scien |
Huang, Heyan | School of Computer Science and Technology, Beijing Institute Of |
Keywords: Transfer Learning, AI and Applications, Machine Learning
Abstract: Recently, some remarkable achievements have been made in research on end-to-end task-oriented dialogue systems. However, because most dialogue systems rely on large amounts of labelled data, it is difficult for models to transfer to low-resource domains. At present, related research pays little attention to the prediction of users' dialogue intentions, which limits the model’s ability to generate responses. Therefore, we propose a novel Dynamic Fusion Network for Multi-domain Dialogue State Tracking (DFN-DST). It is a framework that combines a dynamic fusion network that automatically captures domain dependencies with a dialogue state tracking module that predicts slot values. It combines the domain embedding and the slot embedding to enable the model not only to improve the performance of extracting users’ goals, but also to effectively transfer to unseen domains by tracking unseen slot values. Empirical results demonstrate that DFN-DST achieves a state-of-the-art Entity F1 score of 38.0% for the three domains of Multi-WOZ, a human-human dialogue dataset. In addition, DFN-DST outperforms the best baseline, DFN, by 3.6% on average in low-resource settings, showing its better transferability.
|
|
11:20-11:40, Paper MoBT2.2 | |
Real-Time Smoke Detection Network Based on Multi-Scale Feature Recognition and Lightweight Architecture Design |
|
Li, Ganggang | Xinjiang University |
Li, Yongming | Xinjiang University |
Jiang, Shaochen | Xinjiang University |
Keywords: AI and Applications, Deep Learning, Image Processing and Pattern Recognition
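Abstract: Forest fires have a significant impact on global ecosystems and human society, calling for the development of efficient and accurate early smoke detection techniques for fires. However, current smoke detection techniques face multiple challenges in real-time applications, including a large number of parameters, high computational complexity, and low detection accuracy in complex scenes. Therefore, based on YOLOv8, we propose a lightweight, high-precision, real-time smoke detection network, MLSD (MultiScale Lightweight Smoke Detection Network). First, to reduce the computational complexity and the number of parameters of the model, we propose a lightweight detection head called EISDH (Efficient Information Sharing Detection Head). Second, to reduce the extraction of redundant features, we innovatively propose the C2f-Pconv module. Third, to improve the ability to extract multi-scale and subtle smoke features in complex visual scenes, the downsampling module ADown is innovatively integrated into the model. On three test benchmarks, MLSD demonstrates ...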
|
|
11:40-12:00, Paper MoBT2.3 | |
Immobility Recognition System in Tail Suspension Test Using Single Camera and Deep Learning |
|
Oikawa, Haruki | Tokyo University of Science |
Kobayashi, Daima | Tokyo University of Science |
Yamamoto, Masataka | Tokyo University of Science |
Hagiwara, Akari | Tokyo University of Science |
Takemura, Hiroshi | Tokyo University of Science |
Keywords: AI and Applications, Deep Learning, Machine Learning
Abstract: The tail suspension test (TST) is a common method used in animal experiments with mice, particularly to evaluate the efficacy of antidepressant drugs. However, annotating immobility, a key metric in TST, often relies on human visual inspection. Therefore, there is a demand for a system capable of automatically recognizing immobility from video data. In this study, a system capable of automatically and accurately recording immobility was developed using a single domestic camera and deep learning analysis. The system employs two deep learning models, allowing for the evaluation of both temporal movement data and static mouse postures. This system achieves immobility recognition with extremely high precision, as evidenced by a correlation coefficient (r) of 0.99 with human annotations. The system is expected to improve behavioral analysis in TST by providing an unbiased and automated approach, thereby contributing to advances in neurology.
|
|
MoBT3 |
MR03 |
Design Methods and Information Systems 2 |
|
Chair: Wallace, Lankha | University of Tsukuba |
|
11:00-11:20, Paper MoBT3.1 | |
Tele-Running: Trajectory Generation for Monopod Robots by Teleoperation |
|
Wandinger, David | German Aerospace Center (DLR) |
Schmidt, Annika | Technical University of Munich |
Raffin, Antonin | German Aerospace Center (DLR) |
Albu-Schäffer, Alin | DLR - German Aerospace Center |
Keppler, Manuel | German Aerospace Center (DLR) |
Keywords: Haptic Systems, Design Methods
Abstract: Energy-efficient legged locomotion in robots depends on exploiting passive dynamics, particularly with integrated mechanical compliance. Traditional model-based control strategies that leverage these dynamics often encounter challenges due to inherent model uncertainties. We propose a novel model-free method for generating motor trajectories in compliantly actuated monopods using teleoperation with force feedback. This approach allows operators to detect ground reaction forces and excite the robot’s natural frequency, achieving highly efficient hopping. The method results in a mechanical Cost of Transport (CoT) of 0.25 at 0.63 m/s on an articulated hopper. To further enhance energy efficiency and adjust for hardware variations, these trajectories are refined using Black-Box Optimization (BBO) directly on the hardware. Experimental results confirm that these optimized trajectories closely match the efficiency of those initiated by humans, demonstrating the effectiveness of this method in exciting the robot’s natural dynamics.
|
|
11:20-11:40, Paper MoBT3.2 | |
Functional 3D-Printed Finger Prosthesis with Compliant Fingertips for Support of IADLs |
|
Wallace, Lankha | University of Tsukuba |
Hassan, Modar | University of Tsukuba |
Shimizu, Yukiyo | University of Tsukuba |
Kikuchi, Akira | University of Tsukuba |
Suzuki, Kenji | University of Tsukuba |
Keywords: Design Methods, Assistive Technology, Human Enhancements
Abstract: We propose a method for fabricating custom passive finger prostheses with compliant fingertips. The prosthetic devices detailed in this study are intended to mimic the compliance of biological fingertips, improving the capacity and comfort of keyboard typing, the manipulation of small objects, and instrumental activities of daily living (IADLs) in general. The proposed method utilizes 3D scanning, 3D printing, and computer-aided design to realize the compliant fingertip feature and to create custom-fit prostheses. We present the clinical application of the proposed prostheses on two end users and the evaluation results of typing speed and gross finger dexterity. The results showed successful fitting of the developed prostheses on both participants. Performance tests showed comparable performance in typing speed and accuracy with and without prostheses, and improved comfort with the prostheses. Gross finger dexterity and pinch force did not show improvement with the prostheses. The participants were able to utilize all fingers in keyboard typing and other IADLs instead of only the intact fingers, which indicates the potential for further functional improvement after habituation to the prostheses. In addition to improved comfort, the participants also reported reduced pain in sensitive sites and aesthetic satisfaction with the prostheses.
|
|
11:40-12:00, Paper MoBT3.3 | |
Path Tracking Algorithm for Mobile Robot Based on Learning-Based Nonlinear Model Predictive Control (I) |
|
Cheng, Poyuan | National Taipei University of Technology |
Chen, YuJie | National Taipei University of Technology |
Lian, Kuang-Yow | National Taipei University of Technology |
Keywords: Design Methods, Environmental Sensing
Abstract: Autonomous vehicles play a key role in disaster relief scenarios, where their reliability and path-following capability are crucial. We developed a tracked mobile robot with a robust chassis to adapt to varied road conditions. A machine learning-based nonlinear model predictive control (NMPC) algorithm, combining the XGBoost model with insights gained from the disturbances encountered during the mobile robot’s operation, significantly reduces path-tracking errors. Experimental results demonstrate an average path-following error of 14 cm on smooth surfaces and 17 cm on surfaces with potholes and slopes, showing consistent performance across different road conditions. With accumulated experience in repeated path following, further performance improvements are achievable.
|
|
MoBT5 |
MR05 |
Adaptive Systems and Control 2 |
Regular Papers - SSE |
Chair: Ignaciuk, Przemyslaw | Lodz University of Technology |
|
11:00-11:20, Paper MoBT5.1 | |
Balancing Energy Production and Drought Protection in Prosumer Hydropower Networks under Intense Vaporization |
|
Ignaciuk, Przemyslaw | Lodz University of Technology |
Morawski, Michal | Lodz University of Technology |
Keywords: Consumer and Industrial Applications, System Modeling and Control
Abstract: Susceptibility to weather conditions combined with different consumption and energy generation schemes threatens the large-scale adoption of renewable sources of energy. Moreover, the associated price fluctuation lowers economic gain. As a countermeasure, one may consider energy depots, although not in the form of bulky industrial installations that impose substantial costs themselves. This work proposes to involve a system of connected prosumer hydro plants with ponds acting as energy reservoirs to create the desirable buffer. For that purpose, a dynamic model of a hydro multi-cascade is constructed, and a cost-optimal control law is established via a formal analytical procedure. In addition to economic benefits, the proposed control policy counteracts droughts in periods of high vaporization, e.g., during sunny summer days or in arid climates. The prosumers are estimated to increase their revenue by as much as 5%, whereas the operators gain from reduced load variation on the grid.
|
|
11:20-11:40, Paper MoBT5.2 | |
Self-Optimizing Control of Stochastic Systems |
|
Su, Hongxin | Zhejiang University |
Zhou, Chenchen | Zhejiang University |
Cao, Yi | Zhejiang University |
Zhang, Xuefeng | Northeastern University |
Shuang-Hua, Yang | Zhejiang University |
Keywords: Control of Uncertain Systems
Abstract: Optimal control of stochastic systems involves finding control strategies that optimize certain performance criteria while accounting for the parametric uncertainties and stochastic additive disturbances involved in the system dynamics. Model predictive control (MPC) solves an open-loop constrained stochastic optimal control problem repeatedly in a receding-horizon manner, which can incur a heavy computational load. Alternatively, the proposed stochastic self-optimizing control (SOC) selects optimal nonlinear controlled variables (CVs) offline by minimizing the expectation of the weighted closed-loop loss function based on neural network training. The nonlinear self-optimizing CVs are simply kept constant online so that satisfactory control performance can be achieved. The proposed stochastic SOC requires much less online computation time than MPC, as demonstrated on a two-mass spring simulation model.
|
|
11:40-12:00, Paper MoBT5.3 | |
Adaptive Backstepping Integral Sliding Mode Control of Multirotor UAV System Used for Smart Agriculture |
|
Shi, Yuhao | Control System Laboratory, University of Nottingham Ningbo China |
Ijaz, Salman | Control System Laboratory, University of Nottingham Ningbo China |
He, Zenan | Zhejiang University |
Javaid, Umair | Ningbo University of Technology, Ningbo China |
Xia, Yu | University of Nottingham Ningbo China |
Keywords: Control of Uncertain Systems, Mechatronics, System Modeling and Control
Abstract: This work proposes a reliable control scheme to attain precise tracking control of multirotor unmanned aerial vehicle systems used for smart agriculture. A nonlinear mathematical model of a co-axial octorotor system equipped with a spraying mechanism is first established that accounts for the time-varying inertial coefficients and payload effect. Then an adaptive backstepping control scheme is proposed to attain the desired attitude and position tracking. To ensure robustness against parameter uncertainty and external disturbances, a high-order integral sliding mode controller is integrated with the adaptive backstepping controller. Numerical simulations under variable payload conditions are carried out to demonstrate the effectiveness of the proposed approach.
|
|
MoBT7 |
MR07 |
Online - Decision Support and Expert Systems |
|
Chair: Situ, Liwen | South China University of Technology |
|
11:00-11:20, Paper MoBT7.1 | |
A Multi-QoS-Constrained Routing Algorithm for Double-Layer Satellite Networks Based on Enhanced NSGA-II Algorithm |
|
Song, Yu | Central South University |
Ning, Hao | Central South University |
Long, Jun | Central South University |
Liu, Limin | Central South University |
Keywords: Complex Network, Evolutionary Computation, Heuristic Algorithms
Abstract: Addressing the issue of inadequate Quality of Service (QoS) in inter-satellite communication under high load conditions due to limited network resources and uneven user distribution in a double-layer satellite constellation, this paper proposes a routing algorithm that satisfies multiple QoS constraints based on an enhanced NSGA-II algorithm. By utilizing the NSGA-II genetic algorithm, a chromosome path encoding scheme is constructed, treating the multi-QoS constraints of paths as a minimization multi-objective optimization within the NSGA-II algorithm. Moreover, elite solution preservation strategies are fine-tuned to procure a broader and more efficacious optimal Pareto front solution ensemble. This ensures the derivation of optimal pathways that meet multiple Quality of Service (QoS) constraints effectively. Simulation results demonstrate that the proposed algorithm effectively achieves load balancing in satellite network communication. Furthermore, significant improvements are observed in meeting various QoS requirements such as end-to-end latency, latency jitter, packet loss rate, and remaining bandwidth.
|
|
11:20-11:40, Paper MoBT7.2 | |
Anatomy-Aware Enhancement and Cross-Modal Disease Representation Fusion for Medical Report Generation |
|
Chen, Jianbin | South China Normal University |
Yang, Kai | South China Normal University |
Lin, Runfeng | South China Normal University |
Wang, Yating | South China Normal University |
Xu, Dacheng | South China Normal University |
Zhang, Meng | South China Normal University |
Liu, Shouqiang | South China Normal University |
Keywords: Application of Artificial Intelligence, AI and Applications, Image Processing and Pattern Recognition
Abstract: The objective of medical report generation is to convert medical images into structured text reports. The limitations of existing methods primarily include the lack of efficient local modeling and fusion algorithms, as well as severe data bias in medical datasets. To address these issues, we propose a novel method for report generation based on multi-feature fusion, comprising Anatomy-Aware Fusion Enhancement (AAFE) and Cross-modal Disease Representation Fusion (CDRF). This method first localizes specific anatomical structures and then models the interdependencies among anatomy-aware tokens through AAFE, achieving deep integration of global and anatomical tokens. Simultaneously, CDRF dynamically fuses cross-modal clinical indications, visual features, and disease states to obtain a high-order representation rich in visual and disease information, enabling the generator to generate fine-grained medical reports based on rich multi-feature embedding. Furthermore, we introduce a cross-modal cosine similarity loss mechanism to further promote the extraction of disease text features with anatomy-aware information. Experimental results on the IU-Xray and MIMIC-CXR datasets demonstrate that our method outperforms existing advanced models in the medical report generation task and exhibits outstanding performance.
|
|
11:40-12:00, Paper MoBT7.3 | |
Human Multi-Dimensional Stiffness Skills Transfer for Robot Teleoperation System |
|
Situ, Liwen | South China University of Technology |
Lu, Zhenyu | University of the West of England |
Si, Weiyong | University of the West of England |
Yang, Chenguang | University of Liverpool |
Keywords: Robotic Systems
Abstract: Neuroscience research has demonstrated the significance of modulating stiffness during human task performance. Endowing robots with a similar capability is therefore desirable. However, existing methods for robot teleoperation require operators to simultaneously control position and stiffness, resulting in high workload and task inefficiency. On the other hand, learning from demonstration (LfD) offers a feasible approach for autonomously generating stiffness. Therefore, this paper proposes a robot teleoperation system that combines the advantages of teleoperation and LfD. Teleoperation enables precise positioning guided by human operators, while LfD can transfer human stiffness skills to robots. A teleoperation-oriented stiffness-adaptive Gaussian Mixture Model/Gaussian Mixture Regression method is proposed to learn human multi-dimensional stiffness and reproduce robot stiffness on a Riemannian manifold. To enhance generalization and cooperate with teleoperation, reference points and position-driven output are introduced. Furthermore, a teleoperation strategy for both the single-leader-single-follower configuration and the single-leader-dual-follower configuration is designed, which allows operators to control either one or two robot arms with a single leader device. Finally, the effectiveness of our method is verified through a plugging-in task and a continuous flipping task, demonstrating that the proposed system is capable of performing tasks that demand high positioning accuracy and stiffness adjustment. A supplementary video for this paper is available on GitHub.
|
|
MoBT8 |
MR08 |
Online - Big Data and Intelligent Systems |
|
Chair: Zheng, Yilin | Sichuan University |
|
11:00-11:20, Paper MoBT8.1 | |
Dynamic Spatial Feature Enhancement for Local Climate Zone Classification in SAR and Multi-Spectral Data |
|
Zheng, Yilin | Sichuan University |
Lan, Shiyong | Sichuan University |
Li, Yao | Sichuan University |
Ma, Wei | Sichuan University |
Deng, Guonan | Sichuan University |
Keywords: Deep Learning, Image Processing and Pattern Recognition
Abstract: Local Climate Zone (LCZ) classification from remote sensing images plays a crucial role in quantifying the urban heat island effect. However, the performance of LCZ classification has not been satisfactory so far, especially for built-up area categories. To alleviate this issue, we introduce a novel network architecture, DS-LCZNET, which incorporates a Dynamic Spatial Feature Enhancement (DSFE) module for capturing complex spatial information and a SAR-MS Fusion (SMF) module to improve feature integration from SAR and MS data. Extensive experiments demonstrate that DS-LCZNET significantly enhances classification performance, achieving a 3.55% increase in overall accuracy, a 1.18% improvement in average accuracy (AA), and a 3.88% rise in the kappa (×100) coefficient compared to the current leading baseline, MsF-LCZ-Net. The codes will be publicly available at: https://github.com/zhyilin97/DSLCZNET.
|
|
11:20-11:40, Paper MoBT8.2 | |
Vrefine: A Self-Refinement Approach for Enhanced Clarity and Quality in Text-To-Speech Models |
|
Huang, Dongjin | Shanghai University |
Liu, Yuhua | Shanghai University |
Qian, Jiyu | Shanghai University |
Wang, Yanli | Shanghai University |
Keywords: Multimedia Computation, Application of Artificial Intelligence, Deep Learning
Abstract: Driven by advancements in the Large Language Model (LLM), there has been significant global attention on integrating the Generative Pre-trained Transformer (GPT) concept into Text-to-Speech (TTS) technologies. However, existing TTS models face issues such as inconsistent training data quality and reliance on autoregressive models, which makes controlling the quality of audio generation challenging. To address these challenges, this study introduces a novel TTS framework known as TTS-Vrefine, which is a new type of TTS architecture based on a self-feedback mechanism, aimed at enhancing the inference capabilities of the model and the quality of generated audio. The Vrefine framework iteratively refines the output, allowing the TTS system to self-train using its own generated data, significantly improving the clarity and quality of the audio. It expands the training dataset and enhances the self-optimization potential of the audio generation model, reducing the Word Error Rate (WER) of the base model by 2.4%, increasing the Perceptual Evaluation of Speech Quality (PESQ) by 2.22%, and improving the Short-Time Objective Intelligibility (STOI) by 4.55%. Additionally, the architecture optimizes the use of low-quality resources through a self-refinement mechanism, effectively expanding the training dataset.
|
|
11:40-12:00, Paper MoBT8.3 | |
Closed-Loop Predictive Control for Adaptive Optics Via Neural Networks |
|
Hu, Haining | Academy of Military Science |
Sun, Qianchong | Academy of Military Science |
Hua, Yuchen | Academy of Military Science |
Tan, Jie | Academy of Military Science |
Ren, Xiaoguang | Academy of Military Science |
Zhang, Rongkai | Intelligent Game and Decision Lab |
Liu, Xin | Academy of Military Science |
Keywords: Adaptive Systems, Manufacturing Automation and Systems, System Modeling and Control
Abstract: Wavefront sensor-less adaptive optics (WFS-less AO) systems have garnered considerable interest in recent years due to their compact architecture and extensive applicability. However, most algorithms developed for AO predominantly rely on traditional control methodologies, which often require a prolonged period to converge. This inefficiency can severely hinder the feasibility of practical applications, especially in scenarios where rapid environmental fluctuations necessitate real-time control capabilities. In contrast to these conventional approaches, we propose an efficient closed-loop wavefront reconstruction method based on a novel forward prediction paradigm. The proposed method leverages information from the current control objective, the wavefront phase, and previous control signals (typically the voltages applied to deformable mirrors) to predict future wavefront distortions and implement anticipatory control actions accordingly. Experimental results demonstrate that, compared to most classical techniques, our method can reduce the post-control RMS wavefront aberration by nearly one-third while exhibiting robustness to different types of turbulence and showing promising potential for complex multi-layer turbulence scenarios.
|
|
MoBT9 |
MR09 |
Agent-Based and Autonomous Systems 2 |
|
Chair: Xiaobin, Pu | Shenzhen Metro Group Co., Ltd |
|
11:00-11:20, Paper MoBT9.1 | |
A Cognitive Inertia Sequence Model for Opinion Formation in Group Decision-Making Systems |
|
Dong, Jianglin | University of Electronic Science and Technology of China |
Mao, Haixia | Southwestern University of Finance and Economics |
Zhao, Yiyi | Southwestern University of Finance and Economics |
Peng, Yuan | University of Electronic Science and Technology of China |
Yang, Junyi | University of Electronic Science and Technology of China |
Hu, Jiangping | University of Electronic Science and Technology of China |
Keywords: Agent-Based Modeling, Complex Network
Abstract: Inspired by the empirical decision-making (EDM) phenomenon, wherein agents assimilate the opinion learned from social neighbors as their cognitive inertia and progressively rely on their own cognitive inertia sequence (CIS) for decision-making over time, we propose a novel CIS model paradigm and extend it based on the bounded confidence rule. In the extended CIS model, before agents obtain their acquired opinions, they reconstruct the weight coefficients by evaluating the credibility of the opinions of social neighbors based on a comprehensive trust degree, composed of the opinion similarity and the centrality degree. Then, agents update their opinions through weighted aggregation of their CISs and acquired opinions. Finally, we apply the proposal to Zachary’s karate club network, providing a comparative analysis between the extended CIS model and the HK model. Simulation results indicate that the number of opinion clusters increases as the trust threshold increases, and the extended CIS model has a shorter convergence time than the HK model, illustrating the effectiveness of the proposed model.
|
|
11:20-11:40, Paper MoBT9.2 | |
Enhancing Location Privacy through Prioritized Experience Replay in Deep Q-Networks |
|
Kaur, Harkeerat | Indian Institute of Technology Jammu |
Pandey, Manish | IIT Jammu |
Echizen, Isao | National Institute of Informatics Tokyo |
Keywords: Agent-Based Modeling, Application of Artificial Intelligence, Cybernetics for Informatics
Abstract: The growth of location-based services (LBS) brings increased privacy risks for individuals and therefore calls for a solid foundation of user privacy mechanisms. Our study proposes a decentralized method based on Deep Q-Networks (DQN), enhanced by Prioritized Experience Replay (PER), and applies Federated Learning (FL) to improve the training process. PER mainly aims to make learning more effective, while Federated Learning relies on a centralized server to train the model using the replay memory without compromising the privacy of the data during the learning process. After the trained model weights are transferred to client devices, prediction can be performed locally and real-time decisions can be made as positions change. The model incorporates implicit cues such as app context, frequency of use, and separation factors, among others, to estimate users’ privacy attitudes toward location data sharing. The results support the advantages of the augmented DQN model: PER and Federated Learning contribute to accelerated convergence and an enhanced ability to absorb information. The approach thus puts control over location sharing into the hands of individual users in the ongoing development of LBS, empowering them in the digital space.
|
|
11:40-12:00, Paper MoBT9.3 | |
A Tube Model Predictive Control Approach for Virtually Coupled Train Set with Nonlinear Relative-Braking Distance (I) |
|
Xiaobin, Pu | Shenzhen Metro Group Co., Ltd |
Liqun, Fan | Shenzhen Metro Group Co., Ltd |
Qiusheng, Liu | Shenzhen Municipal Design and Research Institute Co., Ltd |
Luo, Xiaolin | Beijing Jiaotong University |
Qingdong, Jia | Traffic Control Technology Co., Ltd |
Kaixuan, Li | Traffic Control Technology Co., Ltd |
Kun, Zhang | Traffic Control Technology Co., Ltd |
Liu, Hongjie | Beijing Jiaotong University |
Keywords: Cybernetics for Informatics, Agent-Based Modeling, Optimization and Self-Organization Approaches
Abstract: Virtual coupling deploys the relative-braking distance (RBD) to keep a minimal distance between successive units in a virtually coupled train set (VCTS). However, it is still challenging to design a control approach that addresses RBD in a stable and computationally efficient way. Thus, this paper proposes a tube model predictive control (MPC) approach to solve this problem. A nonlinear model is formulated to capture the movement of VCTS with RBD. In the prediction horizon, we linearize the model at each instant, such that the control input is calculated by solving a computationally efficient quadratic programming problem. Tubes are constructed to confine the tracking errors from linearization. Finally, experiments are carried out, and the results show that the proposed tube MPC can operate VCTS as well as nonlinear MPC while saving over 70% of the computation time.
|
|
MoBT10 |
MR10 |
Deep Learning and Neural Networks 1 |
|
Chair: Miyazawa, Ryuta | NTT Docomo, Inc |
|
11:00-11:20, Paper MoBT10.1 | |
Introduction of Weighted Local Response Normalization in VOneNets for Improved Robustness against Adversarial Examples |
|
Miyazawa, Ryuta | NTT Docomo, Inc |
Hattori, Motonobu | University of Yamanashi |
Keywords: Neural Networks and their Applications, Deep Learning, Machine Learning
Abstract: This study aims to enhance the robustness of neural networks against adversarial examples by introducing a novel approach called weighted local response normalization into a recently developed class of hybrid convolutional neural network vision models known as VOneNets. While conventional methods, such as local response normalization, do not significantly improve robustness to adversarial examples, the proposed weighted local response normalization demonstrates notable advancements in robustness. Experiments conducted using the MNIST and CIFAR-10 datasets revealed that the introduction of weighted local response normalization significantly enhanced robustness to adversarial examples. These findings emphasize the importance of how inhibition is applied, rather than merely its application, in boosting robustness. The study concludes that weighted local response normalization serves as an effective strategy to reduce the effects of noise, ultimately enhancing the robustness of neural networks against adversarial examples.
|
|
11:20-11:40, Paper MoBT10.2 | |
A Discrete Diffusion-Based Approach for Solving Multi-Objective Traveling Salesman Problem (I) |
|
Su, Dawei | Sun Yat-Sen University |
Zhang, Zizhen | Sun Yat-Sen University |
Chen, Jinbiao | Sun Yat-Sen University |
Fang, Zhanhong | Sun Yat-Sen University |
Keywords: Deep Learning, Neural Networks and their Applications, Optimization and Self-Organization Approaches
Abstract: Thanks to the highly expressive generative capabilities exhibited by diffusion models, recent works have shown their promising performance in combinatorial optimization (CO) problems, where the complicated problems are converted into the corrupting and denoising of heatmaps. The characteristics of diffusion-based approaches result in special advantages for Multi-Objective CO (MOCO) problems, especially the Multi-Objective Traveling Salesman Problem (MOTSP), which is well aligned with that solving paradigm. In this paper, we improve and adapt the diffusion-based approaches to tackle MOTSP; they are trained to generate various Pareto optimal solutions according to the problem decomposition strategies. Experimental results demonstrate that although the proposed approach may lag behind the most advanced neural methods at present, it outperforms several traditional heuristics with a single graph neural network, indicating its effectiveness and potential in addressing MOCO problems.
|
|
11:40-12:00, Paper MoBT10.3 | |
A Progressive Soft Erase and Multi-Scale Feature Fusion Method for Medical Education Management of Breast Cancer (I) |
|
Liu, Yiming | Beijing University of Technology |
Li, Jianqiang | Beijing University of Technology |
Liu, Xiaoling | Beijing University of Technology |
Xu, Xi | Beijing University of Technology |
Ding, Shujie | Beijing University of Technology |
Zhao, Linna | Beijing University of Technology |
Liu, Zhaolei | Beijing University of Technology |
Keywords: Deep Learning, Neural Networks and their Applications, Machine Vision
Abstract: Breast ultrasound (BUS) image lesion segmentation is an important step in computer-aided diagnosis (CAD) systems for breast cancer screening. Because acquiring pixel-level labels for fully supervised BUS lesion segmentation is extremely expensive and time-consuming, many studies have adopted weakly supervised methods to mitigate the reliance of models on pixel-level labels. However, in weakly supervised methods, the class activation map (CAM) often excessively focuses on the most distinctive regions of the target while overlooking other lesion parts. This limitation often leads to insufficient CAM activation and compromises the accuracy of lesion segmentation. To address this problem, we propose a weakly supervised framework for lesion segmentation on BUS images using image-level labels. Specifically, the method consists of two stages: classification and segmentation. During the classification stage, the network based on the progressive soft erase (PSE) module reduces the contribution of the most discriminative feature region in CAM, making the network focus on the features of non-dominant regions and generating more high-quality pseudo-masks. In the segmentation stage, we propose a dual-branch feature fusion (DBFF) network, which can not only capture more contextual information by fusing features from different scales but also generate a more accurate segmentation mask. Extensive experiments on the publicly available dataset BUSI show the effectiveness of our method. Furthermore, this model supports medical education management by providing a more efficient and cost-effective approach to breast lesion segmentation.
|
|
MoBT11 |
MR11 |
Computational Intelligence and Soft Computing 5 |
Special Sessions: Cyber |
Chair: Li, Jiacheng | Kanagawa University |
|
11:00-11:20, Paper MoBT11.1 | |
Surrogate-Assisted Multi-Class Collaborative Teaching and Learning Optimizer for High-Dimensional Industrial Optimization Problems (I) |
|
Bi, Jing | Beijing University of Technology |
Wang, Ziqi | Beijing University of Technology |
Yuan, Haitao | Beihang University |
Yang, Jinhong | CSSC Systems Engineering Research Institute |
Zhang, Jia | Southern Methodist University |
Keywords: Computational Intelligence, Metaheuristic Algorithms, Cloud, IoT, and Robotics Integration
Abstract: Swarm intelligence and evolutionary algorithms are widely applied in industrial scheduling, mobile edge computing, etc., due to their strong robustness and fast optimization speed. However, some real-world industrial optimization problems involve numerous decision variables, known as high-dimensional problems. Current algorithms often require considerable computational resources to evaluate objective function values because of high-dimensional decision spaces. Moreover, they are also prone to being trapped in local optima. To solve the above problems, this work proposes an improved algorithm named Surrogate-assisted Multi-class Collaborative Teaching and learning optimizer (SMCT). A multi-class collaborative teaching and learning optimizer is proposed as a base optimizer to improve exploration and exploitation abilities. Furthermore, an autoencoder-assisted radial basis function is proposed as the surrogate model to replace true function evaluations, thereby saving computational resources and balancing the complexity and accuracy in fitting true models. Finally, experimental results demonstrate that SMCT surpasses its existing peers in both search accuracy and convergence speed across eight high-dimensional benchmark functions.
|
|
11:20-11:40, Paper MoBT11.2 | |
Optimization Analysis of Bus Operation Scheduling Considering Passenger and Cargo Sharing under Uncertain Demand (I) |
|
Li, Jiacheng | Kanagawa University |
Zhang, Yang | Hosei University |
Noto, Masato | Kanagawa University |
Keywords: Computational Intelligence, Swarm Intelligence, Agent-Based Modeling
Abstract: In view of the problems faced during the ongoing development of public transport, such as departure intervals that are too long, low occupancy rates, difficulties making a profit, and high express delivery costs, bus operation and scheduling models that consider passenger and cargo sharing have been attracting attention. In this work, considering the interests of both bus companies and passengers, we propose a bus operation scheduling optimization model that ensures the largest revenue for urban and rural bus companies and the shortest travel time for passengers under uncertain demand. We also design a particle group genetic algorithm for use with this model. The results of a case study set in China demonstrate the feasibility and effectiveness of the model and algorithm.
|
|
11:40-12:00, Paper MoBT11.3 | |
A Neurobehavioral Evaluation of the Efficacy of 1mA Longitudinal, Anodal tDCS on Multitasking and Transfer Performance (I) |
|
K Rao, Akash | Manipal Institute of Technology, Manipal Academy of Higher Educa |
Uttrani, Shashank | IIT Mandi |
Shah, Darshil | IIT Mandi |
Menon, Vishnu | IIT Mandi |
Bhavsar, Arnav | Indian Institute of Technology Mandi (IIT Mandi) |
Chowdhury, Shubhajit Roy | IIT Mandi |
Negi, Ramsingh | IIT Mandi |
Dutt, Varun | Indian Institute of Technology Mandi |
Keywords: Computational Intelligence, Cybernetics for Informatics, Computational Intelligence in Information
Abstract: Multitasking requires rapid switching of attention and cognitive resources between different tasks in a dynamic environment, relying on cognitive processes such as working memory, executive control, and selective attention. Although studies have investigated the efficacy of various neurobehavioral interventions in improving multitasking capabilities, the effects of longitudinal anodal transcranial direct current stimulation (tDCS) in enhancing multitasking performance have not been investigated. This research investigated the efficacy of 1mA anodal tDCS administered longitudinally on multitasking performance. Forty-two participants were randomly and equally divided into the experimental and placebo control conditions in this study, which was conducted over 10 days. All participants executed two multitasking tasks on day 1 and received 1mA anodal/placebo tDCS during task training from day 2 to day 8. Various behavioral and neurophysiological measures were recorded. The findings revealed that tDCS tended to augment multitasking capabilities in the trained task but had limited transfer capabilities. EEG-based brain connectivity analysis also revealed the formation of network hubs in the prefrontal and frontal regions, indicating enhanced cortical activation in the beta band. We intend to use these findings to design interventional frameworks to enhance multitasking performance using tDCS.
|
|
MoBT12 |
MR12 |
Haptic and Human-Computer Interaction 2 |
|
Chair: Tokmurziyev, Issatay | Skolkovo Institute of Science and Technology |
|
11:00-11:20, Paper MoBT12.1 | |
Examining the Impact of Delay on Tele-Robotic Surgical Operability through Brain Activity Evaluation |
|
Ichihara, Junnosuke | Institute of Science Tokyo |
Miura, Satoshi | Tokyo Institute of Technology |
Keywords: Human-Machine Interface, Human-Computer Interaction, Brain-Computer Interfaces
Abstract: In tele-surgery, surgeons remotely operate a surgical robot via the leader-follower system. However, communication delays often lead to challenges that can affect the ease of operation. In this study, we evaluated the intuitive operability of surgical robotic systems by measuring brain activation in the intraparietal sulcus, which serves as an interface between visual information and body motor control. The participants completed a suturing task in virtual reality using a surgical simulator. We randomly introduced time delays while measuring brain activity using functional Near-Infrared Spectroscopy. We found significant differences in task completion time and activity in the right intraparietal sulcus according to the delay time. According to our data, delays of 300 ms or more could be identified as one of the indicators that negatively affect intuitive operability in tele-surgery.
|
|
11:20-11:40, Paper MoBT12.2 | |
GazeRace: Revolutionizing Remote Piloting with Eye-Gaze Control |
|
Tokmurziyev, Issatay | Skolkovo Institute of Science and Technology |
Serpiva, Valerii | Skolkovo Institute of Science and Technology Skoltech |
Fedoseev, Aleksey | Skolkovo Institute of Science and Technology |
Altamirano Cabrera, Miguel | Skolkovo Institute of Science and Technology Skoltech |
Tsetserukou, Dzmitry | Skoltech |
Keywords: Human-Machine Interface, Human-Machine Cooperation and Systems, Human-Computer Interaction
Abstract: This paper presents GazeRace, a novel system that leverages eye-tracking technology for intuitive drone control. Using the MediaPipe library, the system translates eye movements into precise drone commands, enabling effective remote piloting. In testing, GazeRace demonstrated an 18% reduction in drone trajectory length while maintaining competitive speed with traditional controls. The results suggest that this approach enhances control accuracy and reduces user frustration, offering a significant advancement in the field of human-computer interaction and drone navigation.
|
|
11:40-12:00, Paper MoBT12.3 | |
Haptic Shared Control by Adjusting the Stiffness of a Joystick Based on Sensor Systems Reliability for Underwater Vehicles (I) |
|
Fujie, Kenshin | Nara Institute of Science and Technology |
Orita, Yasuaki | Nara Institute of Science and Technology |
Sato, Eito | Nara Institute of Science and Technology |
Sakagami, Norimitsu | Ryukoku University |
Wada, Takahiro | Nara Institute of Science and Technology |
Keywords: Haptic Systems, Human-Machine Cooperation and Systems, Human-Machine Interaction
Abstract: Haptic shared control is a framework in which automation and the human pilot use the same operational input terminal to control a system. How both respond to automation failures is important. This paper describes the concept of adjusting the stiffness of the control stick based on the reliability of sensor systems. Changing the stiffness helps human pilots perceive when to intervene and when to follow (just resting their hand lightly on the stick). First, the haptic feedback design for changing the stiffness of a joystick is explained. Then, in the experiment, participants perform the task of keeping a remotely operated underwater vehicle stationary in front of a target in a water stream using a joystick with three different types of haptic feedback: ours, constant high stiffness, and constant low stiffness. The experimental results indicate that our strategy increases control accuracy without increasing the human pilot's workload while allowing the human pilot to understand whether the pilot or the automation should currently be in control of the task. As unexpected underwater environments adversely affect the sensor systems, our findings can be expected to play an important role in improving control performance at sea. Furthermore, they may inform the design of haptic feedback systems for scenes with low sensor visibility.
|
|
MoCT1 |
MR01 |
AI Applications 3 |
Regular Papers - Cybernetics |
Chair: Liping, Wang | Zhejiang University of Technology |
|
15:45-16:05, Paper MoCT1.1 | |
HR-SFormer: High Resolution Swin Transformer for Adrenal Tumor Segmentation |
|
Liping, Wang | Zhejiang University of Technology |
Liao, Jun | Zhejiang University of Technology |
Qicang, Qiu | Zhejiang Lab |
Wang, Jian | Tongde Hospital of Zhejiang Province |
Keywords: Application of Artificial Intelligence, Deep Learning, Neural Networks and their Applications
Abstract: Adrenal tumor segmentation is crucial for precise tumor diagnosis. However, existing methods often encounter difficulties attributed to over-segmentation induced by low contrast or adhesion of surrounding tissues. In this study, we propose a novel three-dimensional CT adrenal tumor segmentation method named HR-SFormer, based on the Swin Transformer. Firstly, this method leverages the attention mechanism inherent in the Swin Transformer to capture long-range dependencies in three-dimensional space, enabling initial localization of adrenal tumors and their boundaries. Secondly, we introduce a High Resolution (HR) block to enhance network depth and nonlinear features, improving the model's expressive ability and tumor localization accuracy. Additionally, we incorporate skip connections to fuse shallow and deep features through the HR module, enabling HR-SFormer to better comprehend the location, shape, and boundaries of tumors in CT images, resulting in more accurate tumor localization. Finally, we evaluate our method using three common metrics (Dice Similarity Coefficient, Intersection over Union, and Hausdorff Distance), comparing it with state-of-the-art methods on a dataset that includes four types of adrenal tumors. The results demonstrate the superior segmentation performance of HR-SFormer.
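Illustrative note: two of the metrics named in the abstract, the Dice Similarity Coefficient and Intersection over Union, have standard definitions on binary masks. The sketch below is not taken from the paper; it assumes flattened masks of 0/1 values, and the function names `dice_coefficient` and `intersection_over_union` are hypothetical. Treating two empty masks as a perfect match is an assumed convention.

```python
def _overlap_counts(pred, truth):
    """Return (intersection, pred_size, truth_size) for two flat 0/1 masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    return intersection, sum(1 for p in pred if p), sum(1 for t in truth if t)


def dice_coefficient(pred, truth):
    """Dice = 2|A and B| / (|A| + |B|)."""
    intersection, pred_size, truth_size = _overlap_counts(pred, truth)
    total = pred_size + truth_size
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def intersection_over_union(pred, truth):
    """IoU = |A and B| / |A or B|."""
    intersection, pred_size, truth_size = _overlap_counts(pred, truth)
    union = pred_size + truth_size - intersection
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else intersection / union
```

For a three-dimensional CT volume, the masks would be flattened to one list of voxels before calling either function.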
|
|
16:05-16:25, Paper MoCT1.2 | |
Mixformer: Feature Mixed Transformer for Rainfall Forecasting |
|
Liao, Yuanyuan | Xinjiang University |
Li, Boyuan | Xinjiang University |
Li, Xiuhong | Xinjiang University |
Keywords: AI and Applications
Abstract: In the Xinjiang region of China, water is scarce and unevenly distributed, and due to factors such as global warming, extreme rainfall events occur frequently, posing serious threats to people's lives and property safety. To accurately predict short-term precipitation, we propose a Mixformer model. Specifically, we first use a combination of min-max normalization and reversible instance normalization to reduce the impact of feature numerical ranges on modeling while preserving the distribution of the original data. Next, to enhance the representation of multivariate time series data, we propose a feature mixing module. This module enhances the representation of inter-feature correlation information by calculating the correlation between different sequences, thus improving prediction effectiveness. Finally, to enhance its non-linear characteristics, we introduce a residual prediction method. This method models periodic and trend components separately and also pays special attention to the residual component. We have validated the proposed method extensively, proving that it outperforms existing state-of-the-art (SOTA) methods in this field.
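Illustrative note: the abstract does not state how the two normalizations are combined, so the sketch below is only one possible reading. It assumes min-max scaling is applied to a feature first and that reversible instance normalization then standardizes each input window by its own mean and standard deviation, keeping those statistics so the model output can be mapped back. All function names and the `eps` value are hypothetical.

```python
from statistics import fmean, pstdev


def min_max_scale(values):
    """Scale a feature to [0, 1] using its own minimum and maximum."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


def instance_normalize(window, eps=1e-5):
    """Standardize one window and return the statistics needed to undo it."""
    mean = fmean(window)
    std = (pstdev(window) ** 2 + eps) ** 0.5  # eps guards against zero variance
    return [(v - mean) / std for v in window], (mean, std)


def instance_denormalize(values, stats):
    """Reverse instance_normalize using the stored (mean, std)."""
    mean, std = stats
    return [v * std + mean for v in values]
```

In use, a forecasting model would be applied between `instance_normalize` and `instance_denormalize`, so that predictions are returned on the scale of the scaled input window.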
|
|
16:25-16:45, Paper MoCT1.3 | |
Remaining Useful Life Prediction of Lithium-Ion Batteries Using Lag-Llama Model with Auto-Correlation Analysis |
|
Li, Heng | Central South University |
Zhu, Zeyu | Central South University |
Chen, Xiaolong | Central South University |
Fan, Yunsheng | Central South University |
Lisen, Yan | Central South University |
Liu, Weirong | Central South University |
Keywords: Application of Artificial Intelligence
Abstract: Accurately predicting the capacity degradation and remaining useful life (RUL) of lithium-ion batteries is critical to health management and safe operation. However, variations in operating conditions and the variety of battery types present challenges to data-driven predictive models. Most data-driven methods rely on traditional machine learning models, which often have constrained predictive and generalization abilities. In this paper, a foundation model, Lag-Llama, is used with auto-correlation analysis to predict battery capacity and RUL. Firstly, the tokenization scheme of Lag-Llama is improved by auto-correlation analysis, which calculates the most probable periods in the historical capacity sequence. This helps the model comprehend the capacity fluctuation pattern. Then, Lag-Llama is pre-trained to learn battery capacity degradation and thus calculate the RUL. Additionally, the model is fine-tuned with a small amount of data to update the top-level module for application to the target cell. Finally, experimental results show that the proposed model exhibits accurate RUL prediction and strong transfer capability, with average mean square error and absolute error below 0.035 and 9, respectively.
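Illustrative note: the auto-correlation step mentioned in the abstract can be pictured as scoring each candidate lag by the sample autocorrelation of the capacity sequence and keeping the highest-scoring lags as likely periods. The sketch below is a generic version of that idea, not the authors' procedure; the function names, the `max_lag` and `k` parameters, and the synthetic series are all assumptions.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of a sequence at a given positive lag."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    denom = sum(c * c for c in centered)
    if denom == 0:
        return 0.0  # a constant series has no periodic structure to score
    num = sum(centered[t] * centered[t + lag] for t in range(n - lag))
    return num / denom


def most_probable_periods(series, max_lag, k=3):
    """Return the k lags in 1..max_lag with the highest autocorrelation."""
    scored = [(autocorrelation(series, lag), lag) for lag in range(1, max_lag + 1)]
    scored.sort(reverse=True)
    return [lag for _, lag in scored[:k]]


# Synthetic example (not battery data): a pattern that repeats every 4 steps.
example = [0.0, 1.0, 2.0, 3.0] * 10
periods = most_probable_periods(example, max_lag=10, k=2)
```

The selected lags could then be used as the lag indices from which lagged features are built, which is one plausible way such an analysis might inform a lag-based tokenization.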
|
|
16:45-17:05, Paper MoCT1.4 | |
ECFO: An Efficient Edge Classification-Based Fusion Optimizer for Deep Learning Compilers |
|
Li, Wei | University of Science and Technology of China |
Pengcheng, Wang | The Institute of Computing Technology, Chinese Academy of Scienc |
Wu, Yutong | University of Chinese Academy of Sciences |
Liu, Kangcheng | University of Chinese Academy of Sciences |
Xue, Lide | University of Science and Technology of China |
Du, Zidong | State Key Lab of Processors, Institute of Computing Technology, |
Zhang, Xishan | Institute of Computing Technology, Chinese Academy of Scie |
Zhou, Xuehai | University of Science and Technology of China |
Keywords: AI and Applications, Application of Artificial Intelligence, Machine Learning
Abstract: Operation fusion is a critical technique in optimizing deep learning compilers as it enhances computational efficiency by integrating multiple operations into a single computational graph. However, finding an effective fusion strategy is challenging, requiring the definition of an optimization search space and identification of the best strategy within this space. Existing methods, such as heuristic searches and learning-based searches, have significant limitations. Heuristic searches are complex, labor-intensive, and often lack generalizability across different network architectures. On the other hand, learning-based methods demand extensive training and prolonged search time. To address these challenges, we introduce the Edge Classification-Based Fusion Optimizer (ECFO), a novel approach that reconceptualizes operation fusion as an edge classification problem. By leveraging Graph Neural Networks (GNNs) for efficient graph feature encoding, ECFO streamlines the optimization process and significantly reduces computational overhead. Comprehensive evaluations across diverse neural networks demonstrate that ECFO decreases search time by up to 23x and improves inference performance by 3.2%, representing a substantial advancement over existing strategies.
|
|
MoCT2 |
MR02 |
AI Applications 7 |
Regular Papers - Cybernetics |
Chair: Yan, Ruiyi | Beijing Institute of Technology |
|
15:45-16:05, Paper MoCT2.1 | |
TokenFree: A Tokenization-Free Generative Linguistic Steganographic Approach with Enhanced Imperceptibility |
|
Yan, Ruiyi | Beijing Institute of Technology |
Song, Tian | School of Cyberspace Science and Technology, Beijing Institute O |
Yang, Yating | School of Cyberspace Science and Technology, Beijing Institute O |
Keywords: Application of Artificial Intelligence, AI and Applications, Media Computing
Abstract: Since tokenization serves as a fundamental preprocessing step in numerous language models, tokens naturally constitute the basic embedding units for generative linguistic steganography. However, tokenization-based methods face challenges including limited embedding capacity and possible segmentation ambiguity. Although character-level linguistic steganographic approaches (one tokenization-free type) exist, they face the problem of generating unknown or out-of-vocabulary words, potentially compromising steganographic imperceptibility. In this paper, we focus on both the embedding capacity and the imperceptibility of tokenization-free linguistic steganography. First, we suggest that unknown words mainly result from low-entropy distributions and rigid coding rules used in candidate pools; thus, we propose an entropy-based selection approach to flexibly construct candidate pools. Further, we present a lexical emphasis approach, prioritizing characters within candidate pools capable of forming in-vocabulary words. Experiments show that, across a range of high embedding rates, our approaches achieve considerably higher imperceptibility and text fluency, increase anti-steganalysis capacity by an average of 14.4%, and in particular reduce the out-of-vocabulary rate by an average of 88.7%, compared to the existing state-of-the-art character-level steganographic methods.
|
|
16:05-16:25, Paper MoCT2.2 | |
A Multi-Level Contrastive Learning Framework for Knowledge Graph-Based Recommendation Systems |
|
Sun, Tianhao | Chongqing University |
Zhang, Xiaodong | Chongqing University |
Chen, Yanke | Chongqing University |
Huhai, Zou | Chongqing University |
Wu, Quanwang | Chongqing University |
Keywords: Application of Artificial Intelligence, Deep Learning, Expert and Knowledge-Based Systems
Abstract: In recent years, researchers have introduced knowledge information and structural information extracted from knowledge graphs into recommendation systems to improve their performance. However, existing knowledge graph-based recommendation algorithms face challenges in extracting sufficient knowledge and identifying noise unrelated to the recommendation task. To address the aforementioned challenges, we introduce a multi-level contrastive learning framework for knowledge graph-based recommendation systems, MCKGRec. The model integrates information from three levels: user-item interactions, item-entity knowledge, and collaborative knowledge structures. It utilizes lightweight graph convolutional networks to capture interactive signals, graph attention networks to learn knowledge information, and a collaborative knowledge propagation module to capture global structural information. Additionally, a multi-level contrastive learning task is introduced to enhance the recommendation accuracy and robustness. Comprehensive testing across three different datasets confirms that our method significantly enhances the efficacy of recommendation outcomes. By leveraging knowledge graph information and multi-level contrastive learning, MCKGRec better captures complex user-item relationships and filters out irrelevant noise.
|
|
16:25-16:45, Paper MoCT2.3 | |
Using Explainable AI for EEG-Based Reduced Montage Neonatal Seizure Detection |
|
Battagodage, Dinuka Sandun Udayantha | University of Moratuwa |
Weerasinghe, Kavindu Nirmana | University of Moratuwa |
Wickramasinghe, Nima | University of Moratuwa |
Abeyratne, Akila Jayashan | University of Moratuwa |
Wickremasinghe, Kithmin | The University of British Columbia |
Wanigasinghe, Jithangi | University of Colombo |
De Silva, Anjula Chathuranga | University of Moratuwa |
Edussooriya, Chamira | University of Moratuwa |
Keywords: Application of Artificial Intelligence, Deep Learning, Neural Networks and their Applications
Abstract: The neonatal period is the most vulnerable time for the development of seizures. Seizures in the immature brain lead to detrimental consequences and therefore require early diagnosis. The gold standard for neonatal seizure detection currently relies on continuous video-EEG monitoring, which involves recording multi-channel electroencephalogram (EEG) alongside real-time video monitoring within a neonatal intensive care unit (NICU). However, video-EEG monitoring technology requires clinical expertise and is often limited to technologically advanced and resourceful settings. Cost-effective new techniques could help the medical fraternity make an accurate diagnosis and advocate treatment without delay. In this work, a novel deep learning model to automate the neonatal seizure detection process with a reduced EEG montage is proposed, while addressing two key issues in existing methods. Firstly, the slow convergence problem is solved by introducing a wider convolution encoder with skip connections. Secondly, the explainability issue is solved by taking the derivatives of the model output with respect to the last graph attention layer. By evaluating the performance on the Zenodo dataset with 10-fold cross-validation, the presented model achieves an absolute improvement of 8.31% and 42.86% in area under curve (AUC) and recall, respectively.
|
|
16:45-17:05, Paper MoCT2.4 | |
Multi-Granularity Temporal-Spectral Representation Learning for Speech Emotion Recognition |
|
Zhichen, Yuan | South China University of Technology |
Chen, C. L. Philip | University of Macau |
Li, Shuzhen | South China University of Technology |
Zhang, Tong | South China University of Technology |
Keywords: AI and Applications, Deep Learning
Abstract: Speech emotion recognition (SER) captures emotional information from speech signals to recognize users' emotional states, which plays a crucial role in conversational human-computer interaction. Most SER research focuses on exploiting emotional information from global temporal or spectral features, but it may neglect detailed emotion-related information such as phonemes and syllables. To address this problem, this paper proposes a multi-granularity temporal-spectral representation learning (MG-TSRL) network for speech emotion recognition tasks. Specifically, MG-TSRL extracts different temporal features at phonetic, syllabic, and sentential granularity from spectrograms to retain more detailed emotion-related information. It then designs multi-layer emotion-aware units to capture emotion-related frequency patterns and obtain deep spectrum features for each temporal granularity feature. MG-TSRL further introduces a fast broad learning system and feeds deep temporal-spectral features to it to obtain more accurate emotions. MG-TSRL gradually achieves effective temporal-spectral representation learning through multi-granularity temporal features and multi-layer frequency pattern learning. The state-of-the-art results on the CASIA, RAVDESS, and SAVEE datasets are respectively 95.17%, 92.78%, and 87.50% in unweighted accuracy, demonstrating the effectiveness of MG-TSRL in speech emotion recognition.
|
|
MoCT3 |
MR03 |
Augmented and Virtual Reality 1 |
Regular Papers - HMS |
Chair: Zhou, Tianyi | Southeast University |
|
15:45-16:05, Paper MoCT3.1 | |
Emohance: Real-Time Emotional Amplification in Gaming Via Physiological Vibratory Feedback |
|
Kosuge, Yuki | Tokyo Metropolitan University |
Okamoto, Shoggo | Tokyo Metropolitan University |
Keywords: Affective Computing, Virtual and Augmented Reality Systems, Virtual/Augmented/Mixed Reality
Abstract: Vibratory feedback, when applied during emotionally charged scenes in movies and music, has been shown to intensify viewer experiences. Previous methodologies primarily relied on pre-determined timings based on audiovisual content, rendering them less effective for interactive media. This study introduces a new approach that triggers vibratory feedback in real-time, guided by the viewer's physiological responses, aiming to amplify emotional experiences. We executed a user study involving 11 participants who engaged in a treasure-hunting game. Vibratory feedback was administered to their upper bodies in instances deemed emotionally arousing, as indicated by their skin conductance responses (SCR) during gameplay. Comparative analysis of sessions with and without vibratory feedback revealed that vibrations significantly enhanced subjective feelings of anger and excitement. Moreover, vibrations led to a notable increase in the number of peaks and integrated value of positive SCR signals, substantiating the efficacy of vibratory feedback in enriching emotional experiences. The findings affirm the potential of leveraging users' physiological signals to enhance emotional responses in interactive content. This advancement holds promise for a wide range of applications, particularly in personalizing emotional experiences in dynamic scenarios shaped by user interactions.
|
|
16:05-16:25, Paper MoCT3.2 | |
Study on the Influence of Embodied Avatars on Gait Parameters in Virtual Environments and Real World |
|
Zhou, Tianyi | Southeast University |
Ding, Ding | Southeast University |
Wang, Shengyu | Southeast University |
Shi, Chuhan | Southeast University |
Xu, Xiangyu | Southeast University |
Keywords: Virtual/Augmented/Mixed Reality, Virtual and Augmented Reality Systems, Human-Machine Interaction
Abstract: In this study, we compare the virtual and real gait parameters to investigate the effect of appearances of embodied avatars and virtual reality experience on gait in physical and virtual environments. We developed a virtual environment simulation and gait detection system for analyzing gait. The system transfers real-life scenarios into a realistic presentation in the virtual environment and provides look-alike same-age and old-age avatars for participants. We conducted an empirical study and used subjective questionnaires to evaluate participants' feelings about the virtual reality experience. Also, the paired sample t-test and neural network were implemented to analyze gait differences. The results suggest that there are disparities in gait between virtual and real environments. Also, the appearance of embodied avatars could influence the gait parameters in the virtual environment. Moreover, the experience of embodying old-age avatars affects the gait in the real world.
|
|
16:25-16:45, Paper MoCT3.3 | |
Saving the Elf: Immersive VR Experience Utilizing Multiple Relaxing Approaches |
|
Nie, Cheng | Southeast University |
Ding, Ding | Southeast University |
Wu, Jinhao | Southeast University |
Yang, Xu | Southeast University |
Ou, Ronghuang | Southeast University |
Du, Wei | Southeast University |
Keywords: Virtual and Augmented Reality Systems, Virtual/Augmented/Mixed Reality, Human-Computer Interaction
Abstract: The global prevalence of anxiety has seen a significant increase in the past few years. Anxiety, if left untreated, can lead to various detrimental effects, such as sleep disturbances, excessive fear, and even suicidal tendencies. Traditional face-to-face therapies have been employed to address this issue; however, their effectiveness is limited by time and space constraints, particularly during the COVID-19 pandemic. As a result, computer-based systems have emerged as a potential solution. However, most existing relaxation systems rely on a single method and have unnatural interactions, leading to a lack of variety and engagement. To overcome these limitations, this study proposes a virtual reality (VR) relaxation system that combines various relaxation techniques, a captivating storyline and natural interactions to enhance the user experience. To assess the effectiveness of the proposed system, a pilot study was conducted with 36 participants. The study utilized a between-subjects design, dividing the participants into three groups: a control group, a progressive muscular relaxation group, and a virtual reality relaxation group. The findings revealed that the virtual reality relaxation system significantly reduced anxiety levels among the participants. Importantly, this effect was observed to persist even after a period of time and was found to be more effective than using only one relaxation method in the long term. Moreover, the performance of our system is close to that of commercial systems.
|
|
16:45-17:05, Paper MoCT3.4 | |
Enhancing Marine Navigation Performance Using the Head-Up Interface |
|
Singh, Avinash | University of Technology Sydney |
Zhou, Jinzhao | University of Technology Sydney |
Lin, Chin-Teng | University of Technology Sydney |
Lal, Sara | University of Technology Sydney |
Eidels, Ami | University of Newcastle |
Jiang, Xiaowei | University of Technology Sydney |
Brown, Scott | University of Newcastle |
Keywords: Virtual/Augmented/Mixed Reality, Affective Computing, Biometrics and Applications
Abstract: Modern marine navigation places significant physical and mental demands on officers stationed on ship bridges, primarily due to the continuous observation and evaluation of real-time navigational information displayed on scattered electronic equipment. To alleviate the high cognitive load experienced by marine officers and allow them to focus on essential tasks during complex situations, the integration of head-up displays (HUDs) in marine applications has emerged as a promising solution. HUDs offer the potential to provide crucial information, enhancing the accessibility and organization of previously disordered data. However, there is limited information on the impact of HUDs on marine officers, and this work explores that impact. We conducted a novel immersive navigation experiment with three conditions: traditional display (NonAR), augmented reality (AR) based information presentation, and a variant of AR with essential information only (AR-Indicator). The objective is to explore the effects of these three conditions on navigation performance and mental workload. Our findings indicate that the AR-based information presentation, specifically the variant that includes only essential information, is preferred by participants and showed performance improvements measured by time to complete tasks, gaze duration, and pupil dilation compared to the traditional display and full AR condition. These results have an impact on the design and development of HUDs in marine-related tasks. This pilot research sheds light on the potential benefits of HUDs in improving maritime navigation and paves the way for further advancements in this field.
|
|
MoCT5 |
MR05 |
Adaptive Systems and Control 3 |
Regular Papers - SSE |
Chair: Ge, Hangli | The University of Tokyo |
|
15:45-16:05, Paper MoCT5.1 | |
Time-Probability Dependent Knowledge Extraction in IoT-Enabled Smart Building |
|
Ge, Hangli | The University of Tokyo |
Seike, Hirotsugu | Interfaculty Initiative in Information Studies, the University O |
Koshizuka, Noboru | The University of Tokyo |
Keywords: Smart Buildings, Smart Cities and Infrastructures, System Modeling and Control, Cyber-physical systems
Abstract: Smart buildings incorporate various emerging Internet of Things (IoT) applications for comprehensive management of energy efficiency, human comfort, automation, and security. However, the development of a knowledge extraction framework for human activities is fundamental. Currently, there is a lack of a unified and practical framework for modeling heterogeneous sensor data within buildings. In this paper, we propose a practical inference framework for extracting status-to-event knowledge within smart buildings. Our proposal includes IoT-based API integration, ontology model design, and time-probability-dependent knowledge extraction methods. We leveraged the Building Topology Ontology (BOT) to construct spatial relations among sensors and spaces within the building. Additionally, we utilized Apache Jena Fuseki's SPARQL server for storing and querying RDF triple data. Two types of knowledge could be extracted: timestamp-based probability for abnormal event detection and time interval-based probability for the conjunction of multiple events. We conducted experiments over a 78-day period in a real smart building environment, collecting data on light and elevator states for evaluation. The evaluation revealed several inferred events, such as room occupancy, elevator trajectory tracking, and the conjunction of both events. The numerical values of detected event counts and probability demonstrate the potential for automatic control in the smart building.
|
|
16:05-16:25, Paper MoCT5.2 | |
A Target Trajectory Prediction Method in Air Combat Based on Wavelet-Attention-GRU under the Frenet Frame |
|
Zhang, An | Northwestern Polytechnical University |
Mao, Zeming | Northwestern Polytechnical University |
Xu, Haiyu | Shanghai Zhongchuan Ship Design Technology National Engineering |
Fan, Qiucen | Northwestern Polytechnical University |
Bi, Wenhao | Northwestern Polytechnical University |
Yan, Yuwen | Northwestern Polytechnical University |
Keywords: Decision Support Systems, System Modeling and Control
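Abstract: Target trajectory prediction methods can assist pilots in situational awareness and provide decision support for improving a pilot's ability to gain an advantage in highly dynamic within-visual-range air combat. To address the low training-data utilization and weak generalization of traditional methods based on the Cartesian frame, as well as the low accuracy of existing time-series prediction models, this paper proposes a target trajectory prediction method in air combat based on Wavelet-Attention-GRU under the Frenet frame. The method describes air combat characteristics through spatial trajectory curves based on the Frenet frame. A target trajectory prediction model is then established by combining a GRU network with an improved multi-head self-attention mechanism and adding a wavelet transform. Finally, the trajectory prediction model is trained and tested on a one-on-one within-visual-range air combat dataset obtained from a high-fidelity air combat simulator.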
|
|
16:25-16:45, Paper MoCT5.3 | |
Research on Integrated Elderly Care Service Supply Strategy under Competitive Subsidy Mechanism |
|
Zhao, Jing | Northwestern Polytechnical University |
Lou, Zhaoxiang | Northwestern Polytechnical University |
Keywords: Infrastructure Systems and Services, System Modeling and Control, Service Systems and Organizations
Abstract: The continued growth of health care expenditures and changes in health care demand patterns have brought challenges to societies around the world. The structural contradiction between supply and demand in integrated elderly care models has become a focus of concern for governments and academic circles around the world. The study constructed a price-quality decision-making model for the supply of elderly care services in medical and nursing care institutions under a competitive subsidy mechanism, and analyzed the impact of government price control and subsidy intensity on institutional portfolio decisions, expected returns, and bidding capabilities under the competitive subsidy model. The research results show that under the competitive subsidy model, the service supply strategy of medical and nursing institutions shows a supply trend of "relatively high price + high quality"; the government can improve and control the income, service quality and social welfare of institutions in the market.
|
|
16:45-17:05, Paper MoCT5.4 | |
DEA Malmquist Research on Efficiency of Agricultural Infrastructure in Bangladesh |
|
Anwar, Raied Al | Northwestern Polytechnical University |
Zhao, Jing | Northwestern Polytechnical University |
Keywords: Infrastructure Systems and Services, Adaptive Systems, Technology Assessment
Abstract: This study employs Data Envelopment Analysis (DEA) and the Malmquist productivity index to investigate the influence of infrastructure development on the efficiency and productivity of the agricultural sector in Bangladesh from 2021 to 2023. Focusing specifically on the sectors of irrigation, transportation, and electrical infrastructure, the research highlights how these critical elements underpin the operational and scale efficiencies across Bangladesh's seven main agricultural divisions. Through a detailed examination of efficiency change (Effch), technological change (Techch), and total factor productivity change (Tfpch), significant findings emerge regarding the differential impact of infrastructural advancements on agricultural outputs. The analysis reveals that despite some regions showing marked improvements due to infrastructure enhancements, there remains a notable disparity across divisions, underscoring the need for regionally tailored infrastructure strategies. This paper presents a comprehensive overview of the role that sophisticated infrastructure plays in augmenting agricultural efficiency, offering valuable insights for policymakers, engineers, and stakeholders engaged in infrastructure planning and development.
|
|
MoCT6 |
MR06 |
Adaptive Systems and Control 6 |
|
Chair: Chen, Tianqi | University of Science and Technology Beijing |
|
15:45-16:05, Paper MoCT6.1 | |
A Cybertwin-Driven 6G Network Architecture for Distributed Data Management across IoT Verticals |
|
Cai, Jiahong | Hunan University of Science and Technology |
Liang, Wei | Hunan University of Science and Technology |
Huo, Yingzi | Hunan University of Science and Technology |
Li, Yang | Hunan University of Science and Technology |
Xiong, Naixue | Northeastern State University |
Xiao, Lijun | College of Information Engineering, Shanghai Maritime University |
Vasilakos, Athanasios | UiA |
Keywords: Digital Twin, Communications, Distributed Intelligent Systems
Abstract: With the rapid development of mobile communications technology and the Internet of Things (IoT), the 5G network has encountered several problems, such as lack of frequency, mismatch of network architecture, and difficulty in coordinating network resources. Aiming to solve issues of distributed optimization and efficient data coordination based on the Cybertwin-driven 6G network architecture, we propose an architecture for distributed data management across IoT verticals. This architecture includes a Cybertwin-driven 6G basic network layer and a novel dynamic spectrum access management layer. We propose an advanced Particle Swarm Optimization - Simulated Annealing collaborative algorithm that dynamically adjusts the network structure, taking into consideration the location, different computational power, and variation in the importance of nodes in the Cybertwin-driven 6G basic network layer. Using Blockchain technology for dynamic spectrum access management, network nodes earn rewards by mining, verification, and consensus, and thus obtain the spectrum resources they need. Experimental results show that the proposed scheme effectively optimizes Device-to-Device (D2D) communication transmission while reducing resource overhead in data transmission and network delay, improving the utilization of the spectrum gap when compared with other algorithms. The utilization of the spectrum gap opportunity rate reaches 98.18%, and the system utility increases by 7.36%.
|
|
16:05-16:25, Paper MoCT6.2 | |
Dual Quaternion-Based Moving Target Trajectory Tracking Adaptive Sliding Mode Control for Robotic Manipulator (I) |
|
Chen, Tianqi | University of Science and Technology Beijing |
Sun, Liang | University of Science and Technology Beijing |
Jiang, Jingjing | Loughborough University |
Keywords: System Modeling and Control, Adaptive Systems, Control of Uncertain Systems
Abstract: This article focuses on the moving target trajectory tracking control problem for a robotic manipulator. The dynamics models of the rigid-body moving target and the manipulator end-effector based on the unit dual quaternion are first established. Then, the relative motion dynamics equation is deduced according to the arithmetic rules of dual quaternions. Further, an adaptive sliding mode controller is put forward to guarantee that the error between the pose of the rigid-body moving target and the pose of the manipulator end-effector asymptotically converges to zero, where the upper bound on the norm of the uncertainty in the second-order differential error dynamics equation is estimated online by an adaptive law. The asymptotic stability of the closed-loop system is ensured via Lyapunov theory. Numerical simulation validates the theoretical results.
|
|
16:25-16:45, Paper MoCT6.3 | |
Recurrent Polynomial-Based FBLS for Adaptive Predictive PID Control of Nonlinear Discrete-Time Systems: Comparative Studies on Control Performance and Time Complexity (I) |
|
Rospawan, Ali | National Chung Hsing University |
Tsai, Ching-Chih | National Chung Hsing University |
Keywords: Control of Uncertain Systems, System Modeling and Control, Adaptive Systems
Abstract: This paper proposes an adaptive predictive PID control approach using recurrent polynomial-based fuzzy broad learning systems (RP-FBLS) for nonlinear discrete-time dynamic systems. The RP-FBLS combines polynomial-based fuzzy logic, broad learning systems, and recurrent networks to enhance modeling capabilities and adaptive control performance. The RP-FBLS-based adaptive predictive PID controller is evaluated through three simulation case studies on two renowned discrete-time dynamic systems and one experimental study on a heating oven in semiconductor manufacturing. The comparative studies analyze the time complexity and control performance of the proposed approach over existing methods in the aspects of setpoint tracking, disturbance rejection, and robustness. The paper also quantifies the computational effort per sampling instance by considering real-time constraints and highlighting suitability for embedded systems. Simulation results demonstrate the RP-FBLS-based PID controller's efficacy in handling nonlinear dynamics while offering favorable computational performance characteristics. Experimental validations using the heating oven are provided for illustration of the practicability of the proposed method requiring computational efficiency and memory optimization.
|
|
16:45-17:05, Paper MoCT6.4 | |
Latent Factor Analysis Enhanced Graph Contrastive Learning for Recommendation (I) |
|
Long, Junfeng | Chongqing University of Posts and Telecommunications |
Hao, Wu | Southwest University |
Keywords: System Modeling and Control
Abstract: Graph Neural Networks (GNNs) are powerful learning methods for recommender systems owing to their robustness in handling complicated user-item interactions. Recently, the integration of contrastive learning with GNNs has demonstrated remarkable performance in recommender systems to handle the issue of highly sparse user-item interaction data. Yet, some available graph contrastive learning (GCL) techniques employ stochastic augmentation, i.e., nodes or edges are randomly perturbed on the user-item bipartite graph to construct contrastive views. Such a stochastic augmentation strategy not only brings noise perturbation but also cannot utilize global collaborative signals effectively. To address it, this study proposes a latent factor analysis (LFA) enhanced GCL approach, named LFA-GCL. Our model exclusively incorporates LFA to implement the unconstrained structural refinement, thereby obtaining an augmented global collaborative graph accurately without introducing noise signals. Experiments on four public datasets show that the proposed LFA-GCL outperforms the state-of-the-art models.
|
|
MoCT7 |
MR07 |
Online - Brain-Machine Interfaces (BMIs) 1 |
Regular Papers - Cybernetics |
Chair: Xu, Yingrui | Institute of Information Engineering, Chinese Academy of Sciences |
|
15:45-16:05, Paper MoCT7.1 | |
SRE-KGC: A Knowledge Graph Completion Model Based on Hidden Graph Structure and Relational Semantic Enhancement |
|
Lu, Sijun | Hohai University |
Xu, Guoyan | Hohai University |
Sun, Shuangyang | Hohai University |
Keywords: Neural Networks and their Applications, Deep Learning, Representation Learning
Abstract: Knowledge Graph Completion (KGC) endeavors to use existing knowledge graph data for predicting missing elements in triples. Recently, due to the efficiency of graph neural networks (GNNs) in capturing topological structures and the effectiveness of text descriptions in supplementing semantic information, numerous models integrating graph structures and entity descriptions have emerged. However, these approaches typically focus on aggregating neighboring feature information and overlook mining hidden structural information. Furthermore, they often append textual descriptions to entities independently without considering the semantics within specific relations. Hence, we propose a knowledge graph completion model, SRE-KGC, to address these challenges. First, while aggregating neighbors, we analyze and mine the hidden structure in the neighborhood from the perspective of entities and relations. Then, we introduce a dual-layer attention mechanism to extract the textual information most pertinent to relations at both the relational semantic level and the neighbor semantic level. Finally, the two learned features are fused and sent to the decoder for scoring. Experiments demonstrate that our model delivers superior performance.
|
|
16:05-16:25, Paper MoCT7.2 | |
Multimodal Fake News Detection Based on Chain-Of-Thought Prompting Large Language Models |
|
Xu, Yingrui | Institute of Information Engineering, Chinese Academy of Science |
Ge, Jingguo | University of Chinese Academy of Sciences |
Lyu, Guangxu | School of Cyber Security, University of Chinese Academy of Scien |
Li, Guoyi | Institute of Information Engineering, Chinese Academy of Science |
Li, Hui | Institute of Information Engineering, Chinese Academy of Science |
Keywords: AI and Applications, Deep Learning
Abstract: The rapid rise of social networks has led to a proliferation of fake news, especially news accompanied by images. The combination of images and text may confuse users and cause an even greater negative impact. Existing methods for fake news detection require either expert knowledge or large amounts of labeled data. In addition, these methods fail to clarify which part of the multimodal information is misleading or why. In this paper, we present a simple yet efficient Chain-of-thought Prompting method for Multimodal Fake News Detection (CP-FEND). It first finds the closest demonstration samples of the news to be detected by a kNN-based approach. Afterwards, we design a logical prompt method including Examination, Inference and Determination stages to guide Large Language Models (LLMs) to automatically construct reasoning processes for the authenticity of the samples. Finally, LLMs are prompted to derive the authenticity of multimodal news with the guidance of samples and Chain-of-Thought reasoning. A reflective verification is performed to further improve the detection performance of LLMs through comprehensive evaluation of their responses. Extensive experiments on two public datasets have demonstrated the superiority of our method over existing methods.
|
|
16:25-16:45, Paper MoCT7.3 | |
TS3IM: Unveiling Structural Similarity in Time Series through Image Similarity Assessment Insights |
|
Liu, Yuhan | East China Normal University |
Tu, Ke | East China Normal University |
Keywords: Machine Learning, Deep Learning
Abstract: In the realm of time series analysis, accurately measuring similarity is crucial for applications such as forecasting, anomaly detection, and clustering. However, existing metrics often fail to capture the complex, multidimensional nature of time series data, limiting their effectiveness and application. This paper introduces the Structured Similarity Index Measure for Time Series (TS3IM), a novel approach inspired by the success of the Structural Similarity Index Measure (SSIM) in image analysis, tailored to address these limitations by assessing structural similarity in time series. TS3IM evaluates multiple dimensions of similarity—trend, variability, and structural integrity—offering a more nuanced and comprehensive measure. This metric represents a significant leap forward, providing a robust tool for analyzing temporal data and offering more accurate and comprehensive sequence analysis and decision support in fields such as monitoring power consumption, analyzing traffic flow, and adversarial recognition. Our extensive experimental results also show that compared with traditional methods that rely heavily on computational correlation, TS3IM is 1.87 times more similar to Dynamic Time Warping (DTW) in evaluation results and improves by more than 50% in adversarial recognition.
|
|
16:45-17:05, Paper MoCT7.4 | |
CCAUNet: Boosting Feature Representation Using Complex Coordinate Attention for Monaural Speech Enhancement |
|
Liu, Sixing | Nanjing University of Aeronautics and Astronautics |
Jiang, Yi | Nanjing University of Aeronautics and Astronautics |
Yang, Qun | Nanjing University of Aeronautics and Astronautics |
Keywords: Application of Artificial Intelligence, Deep Learning, Machine Learning
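Abstract: Deep learning-based speech enhancement models, such as complex U-Net models, have achieved good results. However, traditional convolutional neural network-based methods often overlook inherent properties of the speech spectrum, such as long-term temporal dependencies, cross-frequency correlations, and spatial position information, when processing speech signals. These properties are crucial for helping the model distinguish speech from noise and improve speech quality. In this paper, we propose a novel speech enhancement model called CCAUNet. The core of the model is an innovative complex coordinate attention structure that can simultaneously capture and emphasize temporal dependencies, frequency dependencies, and spatial position information in the speech spectrum. Furthermore, we employ a joint multi-resolution STFT loss and SI-SNR loss to optimize the model, thereby assisting the complex coordinate attention in accurately processing spectral features. The multi-resolution STFT loss can capture different frequency scales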
|
|
MoCT8 |
MR08 |
Online - Infrastructure Systems and Services |
|
Chair: Xue, Jingyi | Zhengzhou University |
|
15:45-16:05, Paper MoCT8.1 | |
Multi-Level Graph Convolutional Networks with Enhanced Partitioning Strategies for Skeleton-Based Action Recognition |
|
Liu, Xinyu | Central China Normal University |
Yao, Huaxiong | Central China Normal University |
Peng, Qin | Central China Normal University |
Keywords: Image Processing and Pattern Recognition, Neural Networks and their Applications, Deep Learning
Abstract: The methods of skeleton-based human action recognition (HAR) have gained widespread interest due to their robustness in capturing actions amidst changing environments and intricate backgrounds. Utilizing graph convolutional networks (GCNs) to describe the human skeleton for HAR has been shown to achieve impressive performance. However, most GCN-based methods only consider the relationships between adjacent joints, overlooking the relationships between joints that are not connected by natural physical links. Therefore, we propose a novel method called Multi-level Graph Convolutional Network, abbreviated as ML-GCN. We have refined the original partitioning strategy, introducing three novel strategies to more effectively enhance the connectivity among distant joints. Additionally, we incorporate a multi-level graph convolutional network and a non-local temporal convolutional network to better extract spatio-temporal features. Experiments conducted on the NTU-RGB+D and Kinetics datasets demonstrate that our model achieves a certain improvement in accuracy.
|
|
16:05-16:25, Paper MoCT8.2 | |
ROP: Exploring Image Captioning with Retrieval-Augmented and Object Detection of Prompt |
|
Zhang, Meng | South China Normal University |
Yang, Kai | South China Normal University |
Liu, Shouqiang | South China Normal University |
Keywords: Application of Artificial Intelligence, AI and Applications, Image Processing and Pattern Recognition
Abstract: Image captioning models aim to bridge the modalities of vision and language by generating natural language descriptions that match the content of an input image. Existing methods for generating image captions do so by integrating visual encoders with language models, and by training the model using large volumes of image-text pairs. However, this approach leads to a significant increase in the model's parameter size due to the need to store extensive visual concepts and detailed textual descriptions. With substantial investments in datasets and computational resources, the quality of image caption generation has notably improved, though these advancements come at a high cost. In this paper, we introduce a novel method, termed ROP, that combines retrieval and detection to address this challenge. This approach significantly reduces the number of trainable parameters while preserving the accuracy of the descriptions. Through the retrieval module, the model can find the k most similar sentences to the input image, utilizing this rich contextual information to enhance the overall understanding of the image content. Simultaneously, the detection module enhances the model's interpretation of fine details by identifying prominent regions within the image. Our method has proven its effectiveness and feasibility in experiments.
|
|
16:25-16:45, Paper MoCT8.3 | |
Human-Machine Collaborative Decision-Making Method under Emergency Scenario for Unmanned Warehouse |
|
Xue, Jingyi | Zhengzhou University |
Niu, Tao | Zhengzhou University |
Zong, Jingting | Zhengzhou University |
Zhang, Yingkang | Zhengzhou University |
Guo, Yibo | Zhengzhou University |
Keywords: Human-Machine Cooperation and Systems, Networking and Decision-Making, Human-centered Learning
Abstract: In modern operational environments, all work plans are susceptible to the influence of emergencies, such as equipment failures and resource shortages, underscoring the criticality of emergency management. Current solutions primarily rely on experienced humans or employ machine scheduling models. Human-machine collaboration can leverage their respective strengths, thereby enhancing decision-making efficiency and safety. Based on this, a framework for human-machine collaborative emergency management is proposed in this paper. The emergency management process is divided into two stages: task selection and task scheduling. In the task selection phase, the Human-Machine Collaborative Decision-Making Algorithm for Dynamic Tasks (HMC-DMADT) is introduced to identify key nodes and generate task lists, and humans can correct and confirm lists they do not endorse. In the task scheduling phase, the Dynamic Task Scheduling Algorithm with Human Experience-Based Constraints (HEC-ETSA) is proposed, which integrates the Action-Mask mechanism with the Double Deep Q-Network (DDQN) algorithm to optimize action selection, ensuring decision safety and feasibility. Finally, a simulation platform is established to conduct numerous experiments within an unmanned warehousing scenario. The results demonstrate that the proposed human-machine collaborative emergency management framework effectively addresses emergencies in unmanned warehousing operations.
|
|
16:45-17:05, Paper MoCT8.4 | |
TS-FL: Software Fault Localization Based on Teacher-Student Network |
|
Zhang, Jiale | Beijing Information Science and Technology University |
Yue, Lei | Beijing Information Science and Technology University |
Zheng, Liwei | Beijing Information Science and Technology University |
Cui, Zhanqi | Beijing Information Science and Technology University |
Keywords: Quality and Reliability Engineering, Decision Support Systems, Fault Monitoring and Diagnosis
Abstract: Automated fault localization methods can expedite the process for developers to locate faulty code in complex software systems. Existing fault localization methods improve performance by combining the suspicious scores from different kinds of fault localization methods. Among these, the suspicious scores of mutation-based fault localization methods, commonly referred to as mutation features, have been proven to effectively enhance fault localization performance. However, collecting mutation features requires generating a large number of mutants and executing test cases for each mutant, which demands substantial computational resources and time. Additionally, certain code statements lack mutation features because no mutant can be generated for them, which affects the performance of fault localization. To address this, this paper proposes a Teacher and Student network-based Fault Localization (TS-FL) method. First, a BiLSTM-based classifier is used to extract the deep semantic features of code statements, and the suspicious scores calculated by spectrum-based and mutation-based fault localization methods are used as the spectrum features and mutation features of the code statements, respectively. Then, a teacher-student network is constructed, and a mutual learning strategy is used to collaboratively train the teacher and student networks, enabling the student network to learn the mutation feature information from the teacher network and thereby enhance its fault localization performance. The experimental results on Defects4J show that, without using mutation features, TS-FL can locate 36, 36, and 35 more faulty statements than the spectrum-based fault localization methods Ochiai, Tarantula, and DStar, respectively, and can locate 8 more faulty statements than the deep learning-based fault localization method TRANSFER-FL, in terms of Top-1.
|
|
MoCT9 |
MR09 |
AIoT 1 |
|
Chair: Zhang, ZhiYao | Guangdong University of Technology |
|
15:45-16:05, Paper MoCT9.1 | |
Energy-Optimized Computation Offloading with Improved Differential Evolution in UAV-Enabled Edge and Cloud Computing |
|
Yuan, Haitao | Beihang University |
Wang, Meijia | Beihang University |
Bi, Jing | Beijing University of Technology |
Zhang, Jia | Southern Methodist University |
Zhou, Mengchu | New Jersey Institute of Technology |
Keywords: Cloud, IoT, and Robotics Integration, Evolutionary Computation, Metaheuristic Algorithms
Abstract: Mobile edge computing (MEC) emerges as a vital paradigm to support the increasing use of mobile users (MUs) with capabilities similar to cloud computing. While most research concentrates on MEC facilitated by terrestrial base stations (BSs), its applicability in scenarios such as disaster rescue and field operations is limited. Efforts have been made to explore MEC assisted by unmanned aerial vehicles (UAVs) with efficient scheduling algorithms. However, relying solely on UAVs for MEC has limitations, particularly for computation-intensive applications. This work proposes a hybrid MEC system leveraging UAVs and BS. Multiple UAVs and a BS are deployed to provide MEC services directly from UAVs or indirectly from the BS. We formulate an energy-efficient scheduling problem to minimize energy consumption by jointly optimizing UAV trajectories, task associations, and allocation of computing and transmitting resources. To solve it, this work designs a hybrid algorithm named Success History-based parameter Adaptation for Differential Evolution with a Niching-based population size reduction strategy and an efficient Ensemble sinusoidal scheme (SHADE-NE). Experimental results validate the superiority of SHADE-NE over its benchmark peers, thus proving that SHADE-NE greatly enhances the performance of the system.
|
|
16:05-16:25, Paper MoCT9.2 | |
Low-Latency and Energy-Efficient Task Scheduling for End-Edge-Cloud Collaborative Computing (I) |
|
Yuan, Haitao | Beihang University |
Li, Jingyao | Beihang University |
Ma, Yaofei | Beihang University |
Bi, Jing | Beijing University of Technology |
Yang, Jinhong | CSSC Systems Engineering Research Institute |
Zhang, Jia | Southern Methodist University |
Keywords: Cloud, IoT, and Robotics Integration, Swarm Intelligence, Evolutionary Computation
Abstract: Mobile edge computing (MEC) is a new paradigm which improves the quality of service compared with traditional cloud computing. In MEC, computational tasks are submitted by numerous end users and are partially offloaded to edge servers or a central cloud. However, the characteristics of tasks are different from each other, and the limited resources of computational nodes are also heterogeneous, which brings great challenges to computation offloading and resource allocation for MEC. This work establishes an end-edge-cloud collaborative computing network, which consists of end devices, edge servers, and a central cloud. Task execution location and CPU running frequency determine the execution time and energy consumption to finish the tasks. Considering the aforementioned factors, a multi-objective constrained optimization problem is formulated. To solve the problem, an improved Non-dominated Sorting Genetic Algorithm II (NSGA-II) with self-adaptive crossover and mutation rates is proposed, which is called Improved NSGA-II with Self-adaptive Crossover and Mutation (INSCM). The total execution time and energy consumption can be jointly minimized with our proposed INSCM. Numerous experiments are carried out to test the performance of INSCM. Simulation results show that INSCM effectively improves the performance of NSGA-II and surpasses random offloading and NSGA-III.
|
|
16:25-16:45, Paper MoCT9.3 | |
Data-Enhanced Prediction with Decomposition and Amplitude-Aware Permutation Entropy in Distributed Computing Systems (I) |
|
Yuan, Haitao | Beihang University |
Hu, Qinglong | Beihang University |
Bi, Jing | Beijing University of Technology |
Zhang, Wei | University of Connecticut |
Zhang, Jia | Southern Methodist University |
Zhou, Mengchu | New Jersey Institute of Technology |
Keywords: Application of Artificial Intelligence, Cloud, IoT, and Robotics Integration, Deep Learning
Abstract: In recent years, distributed computing has witnessed widespread applications across numerous organizations. Predicting workload and computing resource data can facilitate proactive service operation management, leading to substantial improvements in quality of service and cost efficiency. However, these data often exhibit non-linearity, high volatility, and interdependencies across different categories, presenting challenges for accurate forecasting. Consequently, there is a critical need to develop a method that thoroughly and comprehensively analyzes all available data to forecast future trends effectively. This work proposes a novel integrated data-enhanced prediction model named SVI for achieving high-accuracy workload prediction in distributed computing systems. SVI employs the Savitzky-Golay filter and Variational mode decomposition for feature processing, whose features are subsequently utilized by Informer for multivariate joint analysis of the enhanced data, achieving high-precision prediction. Ablation and comparative experiments with advanced prediction models are conducted on the Google cluster trace and other typical datasets. Realistic data-driven results indicate that SVI improves the prediction accuracy by 35.4% compared to the original Informer, with each module contributing to the performance enhancement. Furthermore, compared with Autoformer, SVI enhances the prediction accuracy of workload, CPU, and memory by 62.5%, 65.6%, and 69.1%, respectively.
|
|
16:45-17:05, Paper MoCT9.4 | |
Energy-Aware Scheduling for a Cloud Data Center by LSTM Prediction and African Vulture Optimization (I) |
|
Zhang, ZhiYao | Guangdong University of Technology |
Zhu, QingHua | Guangdong University of Technology |
Hou, Yan | Guangdong University of Technology |
Keywords: Cloud, IoT, and Robotics Integration
Abstract: As data volume increases and data parallelism strengthens, cloud service providers must reduce energy consumption by using effective cloud scheduling methods. This paper investigates scheduling multiple servers in cloud data centers to achieve more balanced energy consumption over the long term while ensuring task completion. We propose a method based on long short-term memory (LSTM) and the African vulture optimization algorithm (AVOA), combining prediction and scheduling. The proposed method is divided into two main modules: the prediction phase and the scheduling phase. The purpose of the prediction is to reserve partial resources on servers for large tasks, thereby reducing energy consumption by minimizing the number of active servers. Therefore, we use a prediction method based on LSTM to predict upcoming resource demands and anticipate server requirements. In the scheduling phase, we adjust the priority of tasks to ensure task completion while maximizing the processing of tasks with larger demands. Experiments are performed to validate the application and effectiveness of the proposed method, which outperforms the benchmark algorithms.
|
|
MoCT10 |
MR10 |
Deep Learning and Neural Networks 3 |
Special Sessions: Cyber |
Chair: Ke, Zhenghao | Zhejiang University of Technology |
|
15:45-16:05, Paper MoCT10.1 | |
Consistency-Driven Cross-Modality Transferring for Continuous Sign Language Recognition (I) |
|
Ke, Zhenghao | Zhejiang University of Technology |
Liu, Sheng | Zhejiang University of Technology |
Ke, Chengyuan | Zhejiang University of Technology |
Feng, Yuan | Zhejiang University of Technology |
Keywords: Machine Vision, Deep Learning, Transfer Learning
Abstract: Sign language consists of a unique grammar and expression system. While Continuous Sign Language Recognition (CSLR) approaches share similarities with conventional NLP approaches in which language understanding is involved, the current works on CSLR usually focus on feature extraction and fusion, neglecting the language semantics, causing false positives and overfitting in gloss detection. In this paper, we propose a novel Consistency-Driven Cross-Modality Transferring (CDCM) mechanism to transfer the language modality to visual modality under a consistency-driven optimization. By progressively reducing the gap between text and visual modality, we are able to stably train CSLR networks. The experiments show the efficacy of our approach, with a notable relative reduction in Word Error Rate of 6.91% on average across multiple datasets. We also demonstrate that our approach contributes corrections to suppress the false peaks on highly related and visually similar glosses while training, making glosses in semantic space distinct, thereby achieving improved overall performance.
|
|
16:05-16:25, Paper MoCT10.2 | |
SCG: A Novel Spatiotemporal Coupling Graph Convolutional Network-Incorporated Approach for Dynamic QoS Estimation (I) |
|
Bi, Fanghui | University of Chinese Academy of Sciences |
He, Tiantian | Agency for Science, Technology and Research |
Keywords: Representation Learning, Big Data Computing, Deep Learning
Abstract: Dynamic Quality-of-Service (QoS) data capturing temporal variations in user-service interactions are an essential source for service selection and user behavior understanding. Approaches based on Latent Feature Analysis (LFA) have been shown to be beneficial for discovering effective temporal patterns in QoS data. However, existing methods cannot adequately model the spatiality and temporality implied in dynamic interactions in a unified form, causing considerable accuracy loss in missing QoS estimation. To address the problem, this paper presents a novel Graph Convolutional Network (GCN)-based dynamic QoS estimator, namely the Spatiotemporal Coupling GCN (SCG) model, built on the following three ideas. First, SCG builds its dynamic graph convolutional rules by incorporating a generalized tensor product framework, for unified modeling of spatial and temporal patterns. Second, SCG combines the heterogeneous GCN layer with tensor factorization, for effective representation learning on time-varying bipartite user-service graphs. Third, it further simplifies the dynamic GCN structure to lower the training difficulty. Extensive experiments have been conducted on two large-scale, widely adopted QoS datasets describing throughput and response time. The results demonstrate that SCG achieves higher QoS estimation accuracy than state-of-the-art methods, illustrating that it can learn powerful representations of users and cloud services.
|
|
16:25-16:45, Paper MoCT10.3 | |
Discrete Multi-View Feature Propagation Preserving Graph Clustering (I) |
|
Duan, Zhixuan | Southwest University |
Wang, Zuo | Southwest University |
Bi, Fanghui | University of Chinese Academy of Sciences |
He, Tiantian | Agency for Science, Technology and Research |
Keywords: Representation Learning, Complex Network, Machine Learning
Abstract: Graph clustering is a fundamental and challenging learning task, which is conventionally approached by grouping similar vertices based on edge structure and feature similarity. In contrast to previous methods, in this paper, we investigate how multi-view feature propagation can influence cluster discovery in graph data. To this end, we present Discrete Multi-View Feature Propagation Preserving Graph Clustering (DMVFPPGC), a novel method that leverages multi-view feature propagation to enhance cluster identification in graph data. DMVFPPGC employs a unified objective function that utilizes graph topology and multi-view vertex features to determine vertex cluster membership, regularized by a module that supports key latent feature propagation. We derive an iterative algorithm to optimize this function, prove model convergence within a finite number of iterations, and analyze its computational complexity. Our experiments on various real-world graphs demonstrate the superior clustering performance of DMVFPPGC compared to well-established methods, manifesting its effectiveness across different scenarios.
|
|
MoCT11 |
MR11 |
Computational Intelligence and Soft Computing 6 |
Regular Papers - Cybernetics |
Chair: Qiu, Yinghan | Nanjing University of Information Science and Technology |
|
15:45-16:05, Paper MoCT11.1 | |
Energy-Optimized Task Offloading with Genetic Simulated-Annealing-Based PSO for Heterogeneous Edge and Cloud Computing |
|
Yuan, Haitao | Beihang University |
Zheng, Ziyue | Beihang University |
Bi, Jing | Beijing University of Technology |
Zhang, Jia | Southern Methodist University |
Zhou, Mengchu | New Jersey Institute of Technology |
Keywords: Computational Intelligence, Evolutionary Computation, Cloud, IoT, and Robotics Integration
Abstract: Recent years have seen a surge in Internet of Things (IoT) technologies, with billions of mobile devices (MDs) straining limited computing and networking resources. Mobile edge computing offloads tasks from MDs to edge servers, saving energy and reducing network pressure. Edge servers provide closer services yet have fewer resources than cloud servers. A new heterogeneous edge and cloud computing paradigm combines the benefits of both: edge servers provide close-proximity services to MDs, while the cloud owns sufficient resources. Because IoT devices are mobile, it is more practical to consider mobility when allocating edge server resources, so as to decrease the energy consumption of the heterogeneous edge and cloud while meeting the latency needs of tasks. This work formulates a constrained energy consumption optimization problem and designs a hybrid algorithm named Genetic Simulated-annealing-based particle swarm optimization (PSO), abbreviated GSPSO, to yield a near-optimal solution. Simulation results show that, compared to genetic algorithm, PSO, simulated-annealing-based PSO, and Trex, GSPSO reduces the total energy consumption by 38.64%, 54.63%, 45.94%, and 36.21%, respectively.
|
|
16:05-16:25, Paper MoCT11.2 | |
Non-Linearly Weighted Pheromone Updating for Ant Colony Optimization |
|
Qiu, Yinghan | Nanjing University of Information Science and Technology |
Yang, Qiang | Nanjing University of Information Science and Technology |
Li, Jian-Yu | South China University of Technology |
Jia, Ya-Hui | South China University of Technology |
Wang, Zijia | Guangzhou University |
Gao, Xu-Dong | Nanjing University of Information Science and Technology |
Lu, Zhen-Yu | Nanjing University of Information Science and Technology |
Zhang, Jun | Hanyang University |
Keywords: Swarm Intelligence, Evolutionary Computation, Computational Intelligence
Abstract: Ant Colony Optimization (ACO) has witnessed great success in tackling the Traveling Salesman Problem (TSP). In ACO, the ants involved in the pheromone update play pivotal roles in its optimization effectiveness. Along this line, this paper designs an ant selection mechanism along with a non-linear weight method for ACO to update the pheromone effectively, leading to a novel ACO, called NLW-ACO. In particular, NLW-ACO leverages the fitness values of ants to assign each ant a selection probability. Then, it adaptively chooses ants for the pheromone update. Subsequently, a non-linear weight is assigned to each selected ant based on its fitness value to update the pheromone matrix. Consequently, better ants have higher selection probabilities and larger weights in the pheromone update. As a result, NLW-ACO appropriately balances search convergence and search diversity in seeking the optimum. Experiments have been carried out on 10 TSP instances of diverse scales. The experimental findings substantiate that NLW-ACO significantly outperforms 5 typical ACO methods, especially on large-scale TSP problems.
|
|
16:25-16:45, Paper MoCT11.3 | |
Forced Breeding Evolution for Numerical Optimization |
|
Lai, Wei-Kai | National Ilan University |
Cho, Hsin-Hung | National Ilan University |
Tseng, Fan-Hsun | National Cheng Kung University |
Chen, Chi-Yuan | National Ilan University |
Zeng, Jiang-Yi | CSIE, National Cheng Kung University |
Keywords: Swarm Intelligence, Evolutionary Computation, Computational Intelligence
Abstract: Genetic Algorithm (GA) and Differential Evolution (DE) are widely utilized and emulated in the field of metaheuristic algorithms. In these algorithms, species achieve population evolution through crossover and mutation with a small number of individuals. However, this paper argues that the continuity of species should be based on the phenomenon of species reproduction. This phenomenon applies to various species, with typically more dominant individuals having greater mate selection priority, and vice versa. This approach not only preserves the essence of GA and DE but also imparts a more diverse search capability. Experimental results demonstrate that our proposed method not only incorporates some concepts from GA and DE but also ensures the preservation of solution structures, preventing easy entrapment in local optima in high-dimensional problems.
|
|
16:45-17:05, Paper MoCT11.4 | |
Adaptive Ant Selection for Pheromone Update in Ant Colony Optimization |
|
Wang, Biao | Henan Normal University |
Duan, Danting | Key Laboratory of Media Audio & Video, Communication University |
Yang, Qiang | Nanjing University of Information Science and Technology |
Zhao, Xiao-Yan | Henan Normal University |
Li, Tao | Henan Normal University |
Liu, Dong | Henan Normal University |
Zhang, Jun | Hanyang University |
Keywords: Computational Intelligence, Evolutionary Computation, Swarm Intelligence
Abstract: Ant selection for updating the pheromone is one of the most crucial operations in ant colony optimization (ACO). In this direction, this paper designs an adaptive ant selection strategy (AAS) to adaptively and dynamically select ants to update the pheromone for ACO. On this basis, a new ACO, called AAS-ACO, is developed. Specifically, AAS-ACO first assigns a non-linear selection probability to each ant based on its path ranking. As a result, better ants preserve exponentially higher selection probabilities. Then, based on the selection probabilities, ants are adaptively selected for the pheromone update. By this means, on the one hand, the number of ants involved in the pheromone update is uncertain; on the other hand, relatively better ants instead of absolutely better ones are adaptively selected to update the pheromone, leading to the promotion of search diversity. Subsequently, a dynamic weighting strategy is designed to adjust the amount of the pheromone deposited by the best ant in the current iteration to enhance the search convergence. With the two schemes, AAS-ACO is expected to maintain a suitable compromise between search diversity and search convergence to seek the optimal solutions to TSP. Experiments on 10 classical TSP instances varying from 100 to 1000 cities have shown the significant superiority of AAS-ACO to 5 classic ACOs, especially on high-dimensional TSP problems.
|
|
MoCT12 |
MR12 |
Haptic and Human-Computer Interaction 3 |
|
Chair: Yuguchi, Akishige | Tokyo University of Science |
|
15:45-16:05, Paper MoCT12.1 | |
Learning Human Strategy for Flattening Wrinkled Cloth |
|
Kant, Nilay | Michigan State University |
Aryal, Ashrut | Michigan State University |
Ranganathan, Rajiv | Michigan State University |
Mukherjee, Ranjan | Michigan State University |
Owen, Charles | Michigan State University |
Keywords: Human-centered Learning, Assistive Technology, Human-Collaborative Robotics
Abstract: This paper explores a novel approach to model strategies for flattening wrinkled cloth learning from humans. A human participant study was conducted where the participants were presented with various wrinkle types and tasked with flattening the cloth using the fewest actions possible. A camera and Aruco marker were used to capture images of the cloth and finger movements, respectively. The human strategies for flattening the cloth were modeled using a supervised regression neural network, where the cloth images served as input and the human actions as output. Before training the neural network, a series of image processing techniques were applied, followed by Principal Component Analysis (PCA) to extract relevant features from each image and reduce the input dimensionality. This reduction decreased the model's complexity and computational cost. The actions predicted by the neural network closely matched the actual human actions on an independent data set, demonstrating the effectiveness of neural networks in modeling human actions for flattening wrinkled cloth.
|
|
16:05-16:25, Paper MoCT12.2 | |
Capturing Contact Surfaces by a Frustrated Total Internal Reflection System Using a Curved Plate for Comparison of the Beginning of Touching Motions by Humanitude Experts and Novices |
|
Yuguchi, Akishige | Tokyo University of Science |
Toyoda, Mayuki | Nara Institute of Science and Technology |
Cho, Sung-Gwi | Tokyo Denki University |
Nakazawa, Atsushi | Okayama University |
Takamatsu, Jun | Microsoft |
Yoshino, Koichiro | Guardian Robot Project, RIKEN/ Nara Institute of Science and Tec |
Ogasawara, Tsukasa | Nara Institute of Science and Technology |
Keywords: Human Performance Modeling, Haptic Systems, Affective Computing
Abstract: Analyzing time-series changes of the contact surface during touching is important to elucidate the touching skills of Humanitude, one of the pervasive multimodal comprehensive care methodologies. For this analysis, a frustrated total internal reflection (FTIR) method can capture contact surfaces on a transparent flat plate with a camera. However, the conventional flat plate is far from the actual surfaces of care receivers because the surfaces of the human body are curved. In this paper, we propose an FTIR sensing system using a transparent curved plate to capture more ideal contact states on a surface shape more similar to the human body. We collect contact surface data at the beginning of touching motions by Humanitude experts and novices using the FTIR sensing system with the curved and flat plates. Then, we compare the data from the experts and novices in terms of time-series contact areas, quantitative indices, and subjective evaluation, and discuss the analysis results. Through these experiments, we confirm that the proposed system has the potential to help novices perform the beginning of Humanitude's touching motions more correctly.
|
|
16:25-16:45, Paper MoCT12.3 | |
Predictive Tree-Based Virtual Keyboard for Improved Gaze Typing |
|
Hrushikesh, Etikikota | IIT Gandhinagar |
Meena, Yogesh | IIT Gandhinagar |
Keywords: Human-Computer Interaction, Human-Machine Interaction, Assistive Technology
Abstract: On-screen keyboard eye-typing systems are limited due to the lack of predictive text and user-centred approaches, resulting in low text entry rates and frequent recalibration. This work proposes integrating the prediction by partial matching (PPM) technique into a tree-based virtual keyboard. We developed the Flex-Tree on-screen keyboard using a two-stage tree-based character selection system with ten commands, testing it with three degrees of PPM (PPM1, PPM2, PPM3). Flex-Tree provides access to 72 English characters, including upper- and lower-case letters, numbers, and special characters, and offers functionalities like the delete command for corrections. The system was evaluated with sixteen healthy volunteers using two specially designed typing tasks, comprising hand-picked and randomly picked sentences. The spelling task was performed using two input modalities: (i) a mouse and (ii) a portable eye-tracker. Two experiments were conducted, encompassing 24 different conditions. The typing performance of Flex-Tree was compared with that of a tree-based virtual keyboard with an alphabetic arrangement (NoPPM) and the Dasher on-screen keyboard for new users. Flex-Tree with PPM3 outperformed the other keyboards, achieving average text entry speeds of 27.7 letters/min with a mouse and 16.3 letters/min with an eye-tracker. Using the eye-tracker, the information transfer rates at the command and letter levels were 108.4 bits/min and 100.7 bits/min, respectively. Flex-Tree, across all three degrees of PPM, received high ratings on the system usability scale and low weighted ratings on the NASA Task Load Index for both input modalities, highlighting its user-centred design.
|
|
16:45-17:05, Paper MoCT12.4 | |
Cognitive Processes of Haptic Perception of Virtual Objects: Effect of Human and Machine Disruptions (I) |
|
Ghaemi Dizaji, Lida | University of Calgary |
Zenia, Nusrat Zerin | University of Calgary |
Vite, Yobbahim | University of Calgary |
Hu, Yaoping | University of Calgary |
Keywords: Haptic Systems, Human-Machine Interaction, Virtual/Augmented/Mixed Reality
Abstract: Haptic perception of object shape is crucial for humans to interact with machines in human-machine systems (HMS). This perception is prone to disruptions arising from the human and/or machine sides of HMS. The cognitive processes underlying this perception remain an unexplored topic. This study therefore examined the feasibility of measuring these cognitive processes within a virtual environment (i.e., an HMS). Non-invasive electroencephalography was employed to record the brain activity of human participants during a task, which was perturbed by disruptions from the human and machine sides. The cognitive processes were measured by using an engagement ratio (ER) and an attention ratio (AR) as physiological metrics, in addition to behavioral metrics. The results of the study confirmed the feasibility of using ER and AR to measure the processes and, in turn, open an avenue towards elucidating the processes for improving HMS.
|
|
MoDT1 |
MR01 |
AI Applications 4 |
Regular Papers - Cybernetics |
Chair: Petropoulakis, Panagiotis | Technical University of Munich (TUM) |
|
17:30-17:50, Paper MoDT1.1 | |
State Representations As Incentives for Reinforcement Learning Agents: A Sim2Real Analysis on Robotic Grasping |
|
Petropoulakis, Panagiotis | Technical University of Munich (TUM) |
Gräf, Ludwig | Technical University of Munich |
Malmir, Mohammadhossein | Technical University of Munich |
Josifovski, Josip | Technical University of Munich |
Knoll, Alois | Technical University of Munich (TUM) |
Keywords: Representation Learning, Transfer Learning, Cloud, IoT, and Robotics Integration
Abstract: Choosing an appropriate representation of the environment for the underlying decision-making process of the reinforcement learning agent is not always straightforward. The state representation should be inclusive enough to allow the agent to informatively decide on its actions and disentangled enough to simplify policy training and the corresponding sim2real transfer. Given this outlook, this work examines the effect of various representations in incentivizing the agent to solve a specific robotic task: antipodal and planar object grasping. A continuum of state representations is defined, starting from hand-crafted numerical states to encoded image-based representations, with decreasing levels of induced task-specific knowledge. The effects of each representation on the ability of the agent to solve the task in simulation and the transferability of the learned policy to the real robot are examined and compared against a model-based approach with complete system knowledge. The results show that reinforcement learning agents using numerical states can perform on par with non-learning baselines. Furthermore, we find that agents using image-based representations from pre-trained environment embedding vectors perform better than end-to-end trained agents, and hypothesize that separation of representation learning from reinforcement learning can benefit sim2real transfer. Finally, we conclude that incentivizing the state representation with task-specific knowledge facilitates faster convergence for agent training and increases success rates in sim2real robot control.
|
|
17:50-18:10, Paper MoDT1.2 | |
An Time-Frequency Graph Attention Network for Obstructive Sleep Apnea Detection |
|
Luo, JingLiu | Shenzhen University |
Wu, Hao | Shenzhen University |
Keywords: Neural Networks and their Applications, Application of Artificial Intelligence, Biometric Systems and Bioinformatics
Abstract: Obstructive sleep apnea (OSA), characterized by interruptions in breathing during sleep, is among the diseases with a high incidence and long-term harm. With the development of wearable devices and advancements in deep learning techniques, methods utilizing deep learning to process photoplethysmography (PPG) and oxygen saturation (SpO2) signals have shown significant potential in OSA detection. However, recent work fails to explicitly capture the intricate relationships within and between single modalities and the long-range dependencies of signals from different modalities, and it lacks model interpretability. Recently, graph attention neural networks have progressed in tasks related to time series, enabling better extraction of spatial features and interpretability. To address this challenge and research gap, this paper proposes a time-frequency graph attention network for OSA detection. The proposed method uses a time-frequency feature extraction module to extract contextual dependencies between signals. It introduces a graph attention neural network to explore spatio-temporal dependencies among multi-channel signals. Experimental results on the MESA benchmark dataset demonstrate that the proposed model outperforms existing deep learning-based methods in OSA detection tasks.
|
|
18:10-18:30, Paper MoDT1.3 | |
AnoGrad: Time Series Anomaly Detection with Score-Based Generative Model |
|
Mi, Jiaxuan | Beijing University of Posts and Telecommunications |
Zhu, Xinning | Beijing University of Posts and Telecommunications |
Meng, Zhaoyang | Beijing University of Posts and Telecommunications |
Hu, Zheng | Beijing University of Posts and Telecommunications |
Keywords: AI and Applications, Deep Learning, Machine Learning
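Abstract: Time series anomaly detection is a widely studied but challenging task in modern academic research. Previous studies still suffer from performance degradation caused by anomaly concentration. To address this challenge, we propose a framework that combines a Transformer with a score-based generative model for time series anomaly detection, named AnoGrad. The model employs an innovative Transformer-embedded variational autoencoder to capture temporal dependencies and distribution information. For the anomaly concentration problem, we design a novel conditional stochastic differential equation based on score matching and use a new variance-mixing process to perturb a uniformly controllable probability density space, improving adaptability to scenarios with different anomaly concentrations. In addition, we design a score estimation network that takes temporal and distribution features as conditional inputs to reduce uncertainty in the time series data generation process. On multiple real ...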
|
|
MoDT2 |
MR02 |
Autonomous and Intelligent Vehicles |
|
Chair: Golabi, Mahmoud | IRIMAS - University of Haute Alsace |
|
17:30-17:50, Paper MoDT2.1 | |
Amplitude-Ensemble Quantum-Inspired Tabu Search Algorithm for Solving 0/1 Knapsack Problems |
|
Tseng, Kuo-Chun | National Ilan University |
Lai, Wei-Chieh | National Ilan University |
Chen, I-Chia | National Ilan University |
Hsiao, Yun-Hsiang | National Ilan University |
Chiue, Jr-Yu | National Ilan University |
Huang, Wei-Chun | National Ilan University |
Keywords: Metaheuristic Algorithms, Quantum Machine Learning, Evolutionary Computation
Abstract: In this study, we learned more about quantum properties from quantum and metaheuristic algorithms after conducting many experiments. We then propose an improved version of QTS (Quantum-inspired Tabu Search), which enhances the utilization of population information. The new version is called "Amplitude-Ensemble" QTS (AE-QTS). This makes AE-QTS more similar to the real quantum search algorithm, Grover Search Algorithm, in abstract concept, while keeping the simplicity of the algorithm. Later, we demonstrate the AE-QTS on the classical combinatorial optimization 0/1 knapsack problem. Experimental results show that the AE-QTS outperforms other algorithms, including the QTS, by at least an average increase of nearly 20% in all cases and even by 30% in some cases. Even as the problem complexity increases, the quality of the solutions found by our method remains superior to that of the QTS. These results prove that our method has better search performance.
|
|
17:50-18:10, Paper MoDT2.2 | |
Calibrating Low-Cost Environmental Sensors Using Optimised Artificial Neural Networks |
|
Arafin, Tanzila | Institute of Intelligent Systems Research and Innovation (IISRI) |
Hosen, Anwar | Deakin University |
Pappu, Mohammad Rokonuzzaman | Deakin University |
Keywords: Metaheuristic Algorithms, Machine Learning, Neural Networks and their Applications
Abstract: Low-cost sensors play a vital role in diverse applications, yet their accuracy limitations hinder widespread adoption. To address this problem, this study proposes a calibration technique using Artificial Neural Networks (ANNs) optimised with metaheuristic algorithms. We apply five well-known and widely used metaheuristic algorithms: Particle Swarm Optimisation (PSO), Harris Hawk Optimisation (HHO), Driving Training Based Optimisation (DTBO), Squirrel Search Optimisation (SSO) and Whale Optimisation Algorithm (WOA). These algorithms are used to tune the hyperparameters of an ANN to improve sensor accuracy via calibration. In this study, data collection was conducted alongside corresponding ground truth, resulting in a comprehensive dataset suitable for various applications such as environmental monitoring and sensor calibration. A comparative analysis among the algorithms revealed that WOA consistently outperformed the other metaheuristic optimisation techniques with the lowest Root Mean Square Error (RMSE) and Mean Absolute Error (MAE) values. This study provides valuable insights into combining metaheuristic optimisation techniques with ANNs for sensor calibration, potentially enhancing the development of more resilient and precise sensor technologies in the future.
|
|
18:10-18:30, Paper MoDT2.3 | |
A Multi-Surrogate-Assisted Solution Approach for Solving the Electric Vehicle Charging Scheduling Problem |
|
Golabi, Mahmoud | IRIMAS - University of Haute Alsace |
Azerine, Abdennour | IRIMAS, Université De Haute-Alsace |
Oulamara, Ammar | ORIA, Université De Lorraine, Nancy |
Idoumghar, Lhassane | Université De Haute-Alsace |
Keywords: Metaheuristic Algorithms, Machine Learning, AI and Applications
Abstract: The contribution of electric vehicles (EVs) to mitigating greenhouse gas emissions and achieving climate objectives is indisputable. However, the surge in EV usage poses significant challenges to the existing electrical grid, emphasizing the need for efficient charging strategies. To address these challenges, this paper delves into the complex task of scheduling EV charging at public stations, taking into account the arrival and departure times of the vehicles. EV drivers convey their charging requirements before arriving at the station. Considering limitations in power capacity and charger availability, the scheduler strategically allocates chargers and manages power distribution such that the total discrepancy between the requested and final state-of-charge levels at departure is minimized. Given the NP-hard complexity inherent in charging scheduling problems, this research introduces a solution framework employing a genetic algorithm integrated with mathematical programming. To expedite problem-solving, the investigation explores the implementation of a multi-surrogate-assisted model as a substitute for the mathematical model. Simulation results indicate the efficiency of the proposed approaches, showcasing their effectiveness in tackling the complex nature of EV charging scheduling problems.
|
|
MoDT3 |
MR03 |
Augmented and Virtual Reality 2 |
|
Chair: Hou, Qinyao | Southeast University |
|
17:30-17:50, Paper MoDT3.1 | |
Enhancing Sign Language Teaching: A Mixed Reality Approach for Immersive Learning and Multi-Dimensional Feedback |
|
Wen, Hongli | Beijing Normal University |
Xu, Yang | Beijing Normal University |
Li, Lin | Beijing Normal University |
Ru, Xudong | Beijing Normal University |
Wu, Zhongke | Beijing Normal University |
Fu, Yan | Beijing Normal University |
Zheng, Xuan | Beijing Normal University |
Wang, Xingce | Beijing Normal University |
Keywords: Virtual/Augmented/Mixed Reality, Virtual and Augmented Reality Systems, Wearable Computing
Abstract: Traditional sign language teaching methods face challenges such as limited feedback and diverse learning scenarios. 2D resources lack a sense of reality, classroom teaching is constrained by a scarcity of teachers, and methods based on VR and AR have relatively primitive interaction feedback mechanisms. This study proposes an innovative teaching model that uses real-time monocular vision and mixed reality technology. First, we introduce an improved hand-posture reconstruction method to achieve sign language semantic retention and real-time feedback. Second, a ternary system evaluation algorithm is proposed for a comprehensive assessment, maintaining good consistency with experts in sign language. Furthermore, we use mixed reality technology to construct a scenario-based 3D sign language classroom and explore the user experience of scenario teaching. Overall, this paper presents a novel teaching method that provides an immersive learning experience, advanced posture reconstruction, and precise feedback, achieving positive results in user experience and learning effectiveness.
|
|
17:50-18:10, Paper MoDT3.2 | |
EOSAD: An Event-Oriented Physiological and Behavioral Social Anxiety Annotation Dataset in Virtual Reality |
|
Hou, Qinyao | Southeast University |
Ding, Ding | Southeast University |
Yang, Jiaju | Southeast University |
Tu, Jiahang | Southeast University |
Keywords: Virtual/Augmented/Mixed Reality, Human-Computer Interaction, Affective Computing
Abstract: Social phobia is a prevalent mental health condition characterized by overwhelming fear or apprehension of social situations, often leading individuals to avoid such encounters. It is a very common mental disorder among adolescents and can have a significant negative effect on their social skills and well-being if not identified and addressed early. Existing methods to assess social anxiety level rely on self-report scales or clinical diagnosis, both of which may face challenges like subjective judgments and patient resistance. To address this, we applied Virtual Reality (VR) technology and introduced an event-oriented social anxiety dataset (EOSAD) to support fine-grained social anxiety level detection models. A case study (N = 44) was conducted in which participants wore a Vive Pro Eye HMD to experience two types of self-constructed task event scenes and labeled their social anxiety level accordingly. During the experiment, relevant signals and social anxiety scores were collected, including (1) behavioral signals (head and eye movements), (2) physiological signals (heart rate (HR), electrodermal activity (EDA), blood volume pulse (BVP), etc.), (3) discrete self-reported social anxiety labels, and (4) pre- and post-study evaluation questionnaires. We first verified participants' mean social anxiety labels and then ran baseline classification experiments, in which a GRU model with 0.5 s signal segments showed the best accuracy: 83.08% for 5-class classification and 92.84% for binary classification. It was also found that either behavioral data or physiological signals alone could achieve satisfactory accuracies (85.64% and 91.02% for binary classification), while the combination achieved slightly higher accuracy (92.84% for binary classification).
|
|
18:10-18:30, Paper MoDT3.3 | |
The Effect of Rhythmic Auditory Cues on Cognitive Resource Allocation During Gait Initiation: An EEG Study (I) |
|
Meng, Tao | Wenzhou Medical University,Cixi Biomedical Research Insti |
Wu, Jiajia | Cixi Biomedical Research Institute, Wenzhou Medical University |
Zhou, Huilin | Ningbo Institute of Materials Technology and Engineering, Chines |
Zuo, Guokun | Ningbo Institute of Materials Engineering and Technology, Chinea |
Shi, Changcheng | Ningbo Institute of Materials Technology and Engineering, Chines |
Keywords: Biometrics and Applications, Virtual/Augmented/Mixed Reality, Medical Informatics
Abstract: Rhythmic auditory stimulation (RAS) has been shown to be beneficial for the gait initiation (GI) in Parkinson's disease (PD) patients with freezing of gait. However, the underlying neurophysiological mechanisms are still poorly understood. In this study, we utilized electroencephalography (EEG) and surface electromyography (sEMG) to investigate differences in neural and muscular activity during the gait initiation phase of 20 healthy participants, under conditions with and without RAS. We analyzed contingent negative variation (CNV) amplitude in EEG during gait initiation and the behaviorally relevant onset time of sEMG initiation, primarily from a time-domain analysis perspective. Additionally, we employed a two-tailed t-test to compare the CNV amplitude and sEMG onset time between the two conditions (with RAS vs. without RAS). The results revealed that CNV was induced in the middle pre-frontal, frontal, central, and temporal regions under both rhythmic and non-rhythmic auditory stimulation (Non_RAS) conditions. Significant differences in CNV amplitude were observed in the temporal region between conditions with and without RAS, with higher CNV amplitudes observed under RAS conditions. Additionally, sEMG data indicated that the onset of gait was earlier in participants exposed to RAS.
|
|
MoDT5 |
MR05 |
Adaptive Systems and Control 4 |
Regular Papers - SSE |
Chair: Veil, Carina | University of Stuttgart |
|
17:30-17:50, Paper MoDT5.1 | |
Steady-State Analysis of a Competitive Age-Structured Population System with Two Inputs |
|
Veil, Carina | University of Stuttgart |
Arnold, Eckhard | University of Stuttgart |
Sawodny, Oliver | University of Stuttgart |
Keywords: System Modeling and Control
Abstract: Age-structured population models are an intuitive way to model competing bacteria populations in bioreactors, and are of interest for biotechnology processes, wastewater treatment, or epidemics. Such multi-population models with competition terms result in coupled partial differential equations with integral terms and non-local boundary conditions. They represent the population density of each species at a specific time and age. In this work, a model to represent two intra- and interspecific competing populations in a bioreactor is introduced. It has two system inputs, namely the dilution rate with nutrient solution and a recycling rate which introduces biomass in a steady-state from a second bioreactor. Adding a recycling rate to the multi-population models allows for influencing not only the entire biomass in the bioreactor but also the age distribution of the bacteria. In order to use population models to develop improved control concepts for such a cascaded bioreactor experiment, an extensive steady-state analysis is carried out. There exists an infinite number of steady-states of the system, depending on the initial condition. Each choice of that initial condition leads to uniquely determined steady-state profiles, inputs and outputs. In a next step, a stabilization around these steady-states is necessary.
|
|
17:50-18:10, Paper MoDT5.2 | |
Adaptive Optimization Tracking Control for an Unmanned Aerial Vehicle with Disturbance Suppression |
|
Yang, Meiying | Shanghai Jiao Tong University |
Zhu, Hai | AMS |
Xia, Xingyu | National University of Defense Technology |
Liu, Zhe | Shanghai Jiao Tong University |
Yao, Wen | Defense Innovation Institute, Chinese Academy of Military Scien |
Keywords: Adaptive Systems, Control of Uncertain Systems, Robotic Systems
Abstract: Aiming at the tracking control for an unmanned aerial vehicle (UAV) with uncertainty and external disturbances, an adaptive optimization control method is proposed based on the backstepping method. The model uncertainty in the UAV is estimated using neural networks. Moreover, a disturbance observer is designed to estimate and counteract the disturbance in real-time. In addition, reinforcement learning (RL) is used to solve optimal problems and overcome the difficult issues of the Hamilton-Jacobi-Bellman (HJB) equation. In the controller design, dynamic surfaces are employed to avoid complex derivation issues in the virtual controller and improve operating efficiency. Furthermore, the stability of the UAV system is proved by the Lyapunov theory. Finally, the effectiveness and superiority of the proposed method are verified through numerical simulations and real-world experiments.
|
|
18:10-18:30, Paper MoDT5.3 | |
Empirical Study on Memory Allocation Patterns in GUI-Based Applications |
|
Beletti Ferreira, Alexandre | Federal Institute of Technology |
Rodrigues dos Santos, Caio Augusto | Federal University of Uberlandia |
Matias, Rivalino | Federal University of Uberlandia |
Keywords: System Architecture, System Modeling and Control
Abstract: The ubiquity of dynamic memory allocations in computer programs makes the comprehension of their prevalent patterns of major importance. Previous studies have discussed patterns of memory allocations in different categories of applications. In this paper, we investigate these patterns for GUI-based applications. We analyzed 16 real-world applications built on top of two widely adopted frameworks, GTK+ and Qt. The target applications were selected due to their similarities in terms of look and feel and functionalities. We found that most of their memory allocation patterns were compatible with prevalent patterns observed in non-GUI applications. Surprisingly, on average, Qt-based applications showed allocation sizes twice as large as those of applications using GTK+, which underscores the significant impact of framework selection on memory allocation sizes. Two distinct patterns of allocation paths were observed in GTK+ applications and Qt applications, which indicate how these applications and the related frameworks behave in terms of memory allocations.
|
|
MoDT6 |
MR06 |
Agent-Based and Autonomous Systems |
|
Chair: Yang, Sung-Chi | National Chung Cheng University |
|
17:30-17:50, Paper MoDT6.1 | |
State-Of-Charge Estimation of Supercapacitors for Reconfigurable Circuits |
|
Li, Heng | Central South University |
Zhou, ZiZao | Central South University |
Zhu, Ren | Central South University |
Peng, Hui | Central South University |
Keywords: Modeling of Autonomous Systems
Abstract: The State-of-Charge (SOC) estimation for supercapacitors has been thoroughly examined in the literature, although the majority of the research so far concentrates on the SOC estimation of single supercapacitor units. Nevertheless, the system dynamics of the battery may shift to a different system when utilizing the recently suggested reconfigurable circuit, suggesting that the straightforward use of current SOC estimation techniques is not possible. In order to assess the battery’s state of charge (SOC), we use a switching systems technique in this paper. We establish the supercapacitor’s RC model with a reconfigurable circuit and carefully investigate the continuity of the state and observability of the switched system. Afterwards, we propose a switching observer and compare the performance of various observers, analyzing its convergence qualities. We compared the proposed observer with other observers through a hardware platform, and the experimental results proved the superiority of the proposed observer in SOC estimation.
|
|
18:10-18:30, Paper MoDT6.4 | |
A Traffic Sign Detection Technique Using Road Scene Images and GPS/GIS Information (I) |
|
Yang, Sung-Chi | National Chung Cheng University |
Lin, Huei-Yung | National Taipei University of Technology |
Yu-Hsiang, Fan | National Taipei University of Technology |
Shih-Han, Wei | National Taipei University of Technology |
Keywords: Autonomous Vehicle, Intelligent Transportation Systems, Robotic Systems
Abstract: With the advancement of computational intelligence, autonomous driving has become the future development trend of the automotive industry. Since the safety is commonly considered as the first priority of self-driving and driver assistance systems, the understanding of transportation infrastructure is an essential problem. In this paper, a technique for traffic sign detection and recognition is proposed. Different from the general image-based methods, we also incorporate the vehicle position and geographic information to improve the accuracy. Based on the approximate location of a traffic sign obtained from GPS and GIS, it can be used to increase the confidence level of network detection results. In the experiments, the training datasets are derived from Google Street View images and collected with an in-vehicle camera. The performance evaluation compared to the image-only methods has demonstrated the effectiveness of the proposed approach.
|
|
18:10-18:30, Paper MoDT6.5 | |
Comparison of Neural Network Models for Short-Term Load Forecasting (I) |
|
Surmon, Shinead | University of New South Wales |
Ahmad, Ahmad | University of New South Wales |
Xiao, Xun | University of Otago |
Mo, Huadong | University of New South Wales |
Keywords: System Modeling and Control, Modeling of Autonomous Systems, Decision Support Systems
Abstract: Balancing supply and demand is crucial for efficient energy distribution. To achieve it, accurate short-term electrical load forecasting is essential. This study investigates the applicability of various machine learning architectures for short-term load forecasting, using an NSW load dataset from the Australian Energy Market Operator. The key finding is the superior performance of a hybrid model, which integrates LSTM and GRU layers, on the NSW load dataset. This study demonstrates hybrid neural network models can significantly improve the accuracy and reliability of energy load predictions, thereby suggesting a viable pathway for enhancing future utility management practices.
|
|
MoDT7 |
MR07 |
Online - Brain-Machine Interfaces (BMIs) 2 |
|
Chair: Chen, Hanxin | Beijing University of Technology |
|
17:30-17:50, Paper MoDT7.1 | |
CB-YOLO: A Small Object Detection Algorithm for Industrial Scenarios |
|
Chen, Hao | Qilu University of Technology (Shandong Academy of Sciences), Ji |
Wu, Xiaoming | Qilu University of Technology, Shandong Computer Science Center |
Su, Zhanzhi | Qilu University of Technology (Shandong Academy of Sciences), Ji |
Zhao, Ying | Qilu University of Technology (Shandong Academy of Sciences), Ji |
Dong, Yunfeng | Qilu University of Technology (Shandong Academy of Sciences), Ji |
Liu, Xiangzhi | Shandong Computer Science Center (National Supercomputer Center |
Keywords: Artificial Social Intelligence, Image Processing and Pattern Recognition, Machine Vision
Abstract: In certain specific industrial scenarios, smoking and cellphone usage are strictly prohibited behaviors. In these scenarios, it is crucial to rapidly and accurately detect smoking and cell phone usage, and promptly issue warnings, to ensure industrial safety. The detection of prohibited behaviors using computer vision has gained attention from researchers with the development of artificial intelligence. However, detecting small objects like cigarettes and cell phones in complex backgrounds and varying angles poses a challenge due to their changing shapes. In this paper, we propose a detection model called CB-YOLO, which utilizes YOLOv7 as the baseline model, for detecting smoking and mobile phone usage behavior. Firstly, we propose a new channel space pyramid network called CSPPF that pools features at different scales and introduces scale focusing on improving the perceptual ability of the network to better handle targets at various scales and locations. Secondly, we propose an enhanced feature pyramid called AWBFPN. This introduces additional learnable weight parameters to improve the model’s ability to fuse multi-scale features effectively. Finally, we have also proposed a new loss function called SizeIoU. The experimental results demonstrate that our algorithm outperforms the baseline model. Specifically, it achieves a 3.9% improvement in Precision, a 2.1% improvement in Recall, a 6.4% improvement in mAP@0.5, and a 2% improvement in mAP@0.5:0.95 on the Phone check dataset. Similarly, on the Smoking dataset, our algorithm achieves a 3.3% improvement in Precision, a 1.9% improvement in Recall, a 1.7% improvement in mAP@0.5, and a 0.3% improvement in mAP@0.5:0.95.
|
|
17:50-18:10, Paper MoDT7.2 | |
Visual Object Tracker Based on Video Temporal-Spatial Features and Long-Term Memory |
|
Gao, Yongyang | Sichuan University |
Lan, Shiyong | Sichuan University |
Li, Piaoyang | Sichuan University |
Ma, Wei | Sichuan University |
Keywords: Deep Learning, Image Processing and Pattern Recognition
Abstract: High-performance video object tracking is pivotal for video comprehension and analysis. There exists evident temporal correlation information among consecutive video frames. Nevertheless, current methods fail to effectively leverage this temporal information, leading to inaccurate feature representation of visual targets and heightened risks of tracking failure. To tackle this issue, we introduce a novel transformer-based tracker, dubbed the Video Temporal-Spatial Features and Long-term Memory (TSFLM) tracker. Firstly, the Encoder sequentially integrates multiple self-attention modules to extract spatial and temporal features, respectively. Secondly, we design a novel continuous template update module capable of preserving the long-term memory of the target template. Thirdly, we employ the long-term memory template to further augment the feature representation of the input image (search frame). Finally, the tracking results are derived through the decoder. Extensive comparative experiments against baselines on multiple challenging benchmarks demonstrate that our tracker achieves state-of-the-art performance. The source codes will be accessible at https://github.com/SYLan2019/TSFLM-Tracker.
|
|
18:10-18:30, Paper MoDT7.3 | |
A Dual-Branch Structural Network for Human Pose Estimation Based on Millimeter Wave Radar |
|
Chen, Hanxin | Beijing University of Technology |
Kong, Dehui | Beijing University of Technology |
Li, Jinghua | Beijing University of Technology |
Yin, Baocai | Beijing University of Technology |
Keywords: Human-Computer Interaction, Human Perception in Multimedia
Abstract: Radar-based Human Pose Estimation (R-HPE) aims to localize the body joints of each individual in a given radar image. This is relevant for various applications such as action recognition, person re-identification, and human-object interaction. Unlike traditional RGB-based human pose estimation, radar-based human pose estimation can effectively preserve human privacy and remain stable under low-light conditions and darkness. However, research on radar-based human pose estimation is limited, and existing methods fail to adequately model radar features, resulting in lower accuracy of corresponding pose estimation algorithms. Therefore, this paper proposes a dual-branch structured network, which can extract dimension-independent features and dimension-dependent features separately and then combine them for more precise decision-making. This allows the network to learn richer and more diverse feature representations, thereby improving the quality of feature extraction. Meanwhile, a Multi-Dimensional Feature Fusion Network extracts more detailed feature representations. Furthermore, it is combined with a Transformer module to further enhance the model's ability to extract local features and global modeling capabilities, thereby improving the accuracy of human pose estimation. Extensive experiments conducted on the HuPR dataset demonstrate that our model outperforms existing state-of-the-art models in terms of human pose estimation performance.
|
|
MoDT8 |
MR08 |
Online - Deep Learning and Neural Networks |
|
Chair: Lin, Runfeng | South China Normal University |
|
17:30-17:50, Paper MoDT8.1 | |
DGRC: An Effective Fine-Tuning Framework for Distractor Generation in Chinese Multi-Choice Reading Comprehension |
|
Lin, Runfeng | South China Normal University |
Xu, Dacheng | South China Normal University |
Wang, Huijiang | Guangxi Normal University |
Chen, Zebiao | South China Normal University |
Wang, Yating | South China Normal University |
Liu, Shouqiang | South China Normal University |
Keywords: Application of Artificial Intelligence, AI and Applications, Computational Intelligence in Information
Abstract: When evaluating a learner's knowledge proficiency, the multiple-choice question is an efficient and widely used format in standardized tests. Nevertheless, generating these questions, particularly plausible distractors (incorrect options), poses a considerable challenge. Generally, the distractor generation can be classified into cloze-style distractor generation (CDG) and natural questions distractor generation (NQDG). In contrast to the CDG, utilizing pre-trained language models (PLMs) for NQDG presents three primary challenges: (1) PLMs are typically trained to generate "correct" content, like answers, while rarely trained to generate "plausible" content, like distractors; (2) PLMs often struggle to produce content that aligns well with specific knowledge and the style of exams; (3) NQDG necessitates the model to produce longer, context-sensitive, and question-relevant distractors. In this study, we introduce a fine-tuning framework named DGRC for NQDG in Chinese multi-choice reading comprehension from authentic examinations. DGRC comprises three major components: hard chain-of-thought, multi-task learning, and generation mask patterns. The experiment results demonstrate that DGRC significantly enhances generation performance, achieving a more than 2.5-fold improvement in BLEU scores.
|
|
17:50-18:10, Paper MoDT8.2 | |
CC-Net: Consistency Learning Combined with Contrastive Learning for Kidney Ultrasound Segmentation |
|
Luo, Yang | Wuhan University of Science and Technology |
Liu, Jun | Wuhan University of Science and Technology |
Ding, Mengqian | Wuhan University of Science and Technology |
Keywords: Deep Learning, Image Processing and Pattern Recognition
Abstract: Labeled medical images are scarce, and medical image data is not easily accessible. In addition, kidney ultrasound (KUS) images suffer from speckle noise, low image quality, acoustic shadows, and other issues, which pose significant challenges to the kidney ultrasound image segmentation task. To solve the above two problems, a novel consistency learning and contrastive learning-based kidney ultrasound image segmentation method (CC-Net) has been proposed. Specifically, the student-teacher model was adopted as the semi-supervised learning framework and uncertainty estimation was introduced. The Consensus Uncertainty Map (CUM) strategy was proposed to generate segmentation results for the teacher model when calculating uncertainty. To solve the problems of acoustic shadows and low image quality in kidney ultrasound images, a kidney ultrasound image data augmentation method (KUS-AUG) and a convolutional block (KUS-CONV) tailored to the characteristics of kidney ultrasound images were proposed. In addition, contrastive learning was implemented to enhance the encoder's ability to learn more semantic information, due to its inherent advantage in learning data distribution. Finally, a series of experiments were conducted on the ultrasound image dataset and the method achieved the best performance.
|
|
18:10-18:30, Paper MoDT8.3 | |
An Efficient Token Mixer Model for Sheet Metal Defect Detection |
|
Hao, Huijuan | Qilu University of Technology (Shandong Academy of Sciences) |
Zhu, Sijian | Qilu University of Technology (Shandong Academy of Sciences) |
Yi, ChangLe | Qilu University of Technology (Shandong Academy of Sciences) |
Chen, Yu | Qilu University of Technology (Shandong Academy of Sciences) |
Zhao, Hongge | Qilu University of Technology (Shandong Academy of Sciences) |
Feng, Yue | Qilu University of Technology (Shandong Academy of Sciences) |
Yang, Rong | Qilu University of Technology (Shandong Academy of Sciences) |
Keywords: Deep Learning, Machine Vision, Machine Learning
Abstract: Defects such as scratches, patches, and cracks frequently occur during sheet metal production. However, the low detection accuracy and slow processing speed of industrial defect detection models significantly impede enterprise production efficiency. The aforementioned issues primarily manifest in three aspects. Firstly, the model complexity and computational overhead are substantial. Secondly, detecting small local defects poses a significant challenge. Thirdly, extracting global features, such as elongated scratches, proves to be difficult. To address these challenges, this paper introduces a novel network architecture called SATRNet. Firstly, within the model backbone, the STR module is devised. Through incorporation of the sparse self-attention method and the CNNs parallel vision Transformer model in the shallow layers, this module significantly enhances the model's capability to extract global features. Secondly, the SCATR module is designed in this paper. By substituting self-attention with the designed SCA soft attention as the token mixer, the module aims to enhance detection accuracy while reducing the number of parameters, thereby fundamentally addressing the problem of model complexity. Finally, this paper presents the GCD bottleneck convolution module. This module combines shallow and deep features, enabling the fusion of more information beneficial for detection, thereby achieving improved efficacy in capturing minute targets. Experiments demonstrate that SATRNet surpasses existing advanced models in detection accuracy on public datasets.
|
|
18:10-18:30, Paper MoDT8.4 | |
Fraud Detection for Financial Transactions: Leveraging Big Data Analytics and Machine Learning |
|
Zhang, Yi | Tsinghua University |
Chen, Yixuan | Tsinghua University |
Zhang, Kai | Tsinghua University |
Keywords: AI and Applications, Machine Learning, Big Data Computing
Abstract: In recent years, transaction fraud has cost card issuers billions of dollars. With the increase in fraud rates, it is important to establish a comprehensive monitoring mechanism for detecting abnormal accounts that conform to the characteristics of online fraudulent activities. This paper primarily utilizes statistical methods and data mining techniques to analyze new signals that can represent the latest fraudulent transaction methods based on account static information, account transaction data and fraud blacklists collected by the financial big data system. To safeguard the financial security of customers, interpretable machine learning algorithms are also proposed to construct a fraud detection model. Finally, extensive experiments demonstrate that the proposed framework achieves state-of-the-art results with the highest F1 score, Recall and Accuracy.
|
|
18:10-18:30, Paper MoDT8.5 | |
Adversarial Patrolling Using a Shepherding Approach |
|
Zhou, Jonathan | Royal Australian Navy |
El-Fiqi, Heba | UNSW Canberra |
Hussein, Aya | University of Canberra |
Keywords: Agent-Based Modeling, Swarm Intelligence
Abstract: Adversarial patrolling has traditionally been explored using graph-based methods or planning and optimisation approaches. These methods inherently require environment discretisation and are computationally expensive. This paper introduces a new perspective of patrolling for reactive agents using a modified form of shepherding, typically used for large-scale swarm control. Shepherding utilises the emergent attraction-repulsion behaviours of simple agents to generate dynamic interactions between two types of agents. This paper shows that these behaviours can be utilised to model the problem space of adversarial patrolling. The emergent behaviour observed in shepherding provides a basis for more complex patrolling and responses to attackers. This paper will cover the theoretical analysis of adversarial patrolling and shepherding and propose the modified model underpinning shepherding for adversarial patrolling. The defending agent utilises a simple look-ahead strategy to opt for the best behaviour combination required to complete the adversarial patrolling task. The results demonstrate that using appropriate combinations of behaviours enables the defender to keep the attackers away from areas of interest.
|
|
18:10-18:30, Paper MoDT8.6 | |
A Method of Ultrasonic Gesture Recognition Based on Attention Mechanism |
|
Yang, Jieming | Northeast Electric Power University |
Lin, Ying | Northeast Electric Power University |
Wu, Yun | Northeast Electric Power University |
Keywords: Human-Computer Interaction, Environmental Sensing, Wearable Computing
Abstract: The focus of this research lies in wireless gesture recognition, a prominent model of human-computer interaction that has garnered significant attention in recent years. This study aims to achieve efficient gesture recognition on smartphones using ultrasonic signals via speakers and microphones. Firstly, the original one-dimensional audio sequence is transformed into a two-dimensional spectrogram through the data preprocessing module, which facilitates capturing gesture patterns and changes while unifying the data format to enhance subsequent processing operability. Secondly, an algorithm for contour extraction is devised to mitigate signal interference caused by multipath effects. This algorithm enhances feature representation while reducing feature dimensionality and improving model robustness in adapting to interference and changes in various environments. By incorporating the spatial attention mechanism into the CNN model, it allows the model to focus on key areas, eliminate distracting information more effectively, better understand gesture features and shapes, and achieve more accurate recognition results. In testing basic number gestures, our method achieved an accuracy rate of 94%-96%, demonstrating its effectiveness. This would further enrich the spectrum of human-computer interaction, enhance user experiences, and drive the advancement of technological innovation.
|
|
MoDT9 |
MR09 |
AIoT 2 |
|
Chair: Zhang, Leizhen | Old Dominion University |
|
17:30-17:50, Paper MoDT9.1 | |
Energy-Efficient and Latency-Optimized Computation Offloading with Improved MOEA for Industrial Internet of Things |
|
Zhai, Jiahui | Beijing University of Technology |
Bi, Jing | Beijing University of Technology |
Yuan, Haitao | Beihang University |
Yang, Jinhong | CSSC Systems Engineering Research Institute |
Zhang, Jia | Southern Methodist University |
Zhou, Mengchu | New Jersey Institute of Technology |
Keywords: Cloud, IoT, and Robotics Integration, Evolutionary Computation, Swarm Intelligence
Abstract: The unprecedented prosperity of the industrial Internet of Things has thoroughly facilitated the transition from traditional manufacturing towards intelligent manufacturing. In industrial environments, resource-constrained industrial equipments (IEs) often fail to meet the diverse demands of numerous compute-intensive and latency-sensitive tasks. Mobile edge computing has emerged as an innovative paradigm for lower latency and energy consumption for IEs. However, computational offloading and coordinating of multiple IEs with diverse task types and multiple edge nodes in industrial environments poses challenges. To address this challenge, we propose a multi-task approach encompassing scientific and concurrent workflow tasks to achieve energy-efficient and latency-optimized computation offloading. Furthermore, this work designs an improved Quantum Multi-objective Grey wolf optimizer with Manta ray foraging and Associative learning (QMGMA) to optimize multi-task computation offloading. Comprehensive experiments demonstrate the superior efficiency and stability of QMGMA compared to state-of-the-art algorithms in balancing latency and energy consumption. QMGMA improves average inverse generation distance and average spacing by 37% and 31% on average compared with multi-objective grey wolf optimizer, nondominated sorting genetic algorithm II, and multi-objective multi-verse optimization, proving the convergence and diversity of its non-dominated solutions.
|
|
17:50-18:10, Paper MoDT9.2 | |
Resource Allocation and Trajectory Optimization in Unmanned Aerial Vehicle-Assisted Mobile Edge Computing |
|
Bi, Jing | Beijing University of Technology |
Cheng, Xiangshuai | Beijing University of Technology |
Yuan, Haitao | Beihang University |
Niu, Siyu | Beijing University of Technology |
Zhai, Jiahui | Beijing University of Technology |
Keywords: Cloud, IoT, and Robotics Integration, Computational Intelligence in Information, Swarm Intelligence
Abstract: Edge computing offers a groundbreaking architecture for supplying computing, storage, and networking resources to propel the Internet of Things forward. By situating them at the network’s edge, this model makes computational power more accessible to users. If tasks are executed entirely at the edge, energy and resource constraints of edge nodes may lead to poor performance. Therefore, it is widely recognized that offloading certain tasks to cloud data centers (CDCs), which possess abundant execution resources, is advantageous. However, implementing CDCs is not widespread and lacks flexibility in isolated regions. This presents challenges and high costs for reliably completing tasks quickly. Consequently, employing more adaptable unmanned aerial vehicles (UAVs) as CDCs in specific scenarios is crucial. The work presents the idea of mobile edge computing supported by the UAV. By considering the needs of user services, we enhance the energy efficiency of the UAV by optimizing their trajectories, transmission power, and computational load distribution. Furthermore, the work introduces an improved algorithm called Genetic Simulated-annealing-based Particle Swarm Optimizer (GSPSO) to optimize the energy efficiency of the UAV. Experimental simulations show that regarding the energy efficiency of the UAV, GSPSO exhibits superior search efficiency, surpassing genetic algorithm, simulated annealing, and particle swarm optimization by 7.39%, 15.03%, and 27.93%, respectively.
|
|
18:10-18:30, Paper MoDT9.3 | |
Fairness-Aware Streaming Feature Selection with Causal Graphs (I) |
|
Zhang, Leizhen | Old Dominion University |
Lusi, Li | Old Dominion University |
Wu, Di | Southwest University |
Chen, Sheng | University of Louisiana |
He, Yi | William & Mary |
Keywords: Big Data Computing, Application of Artificial Intelligence, Computational Intelligence in Information
Abstract: This paper proposes a new online feature selection approach with an awareness of group fairness. Its crux lies in the optimization of a tradeoff between accuracy and fairness of resultant models on the selected feature subset. The technical challenge of our setting is twofold: 1) streaming feature inputs, such that an informative feature may become obsolete or redundant for prediction if its information has been covered by other similar features that arrived prior to it, and 2) non-associational feature correlation, such that bias may be leaked from those seemingly admissible, non-protected features. To overcome this, we propose Streaming Feature Selection with Causal Fairness (FSCF), which builds two causal graphs egocentric to the prediction label and the protected feature, respectively, striving to model the complex correlation structure among streaming features, labels, and protected information. As such, bias can be eradicated from predictive modeling by removing those features that are causally correlated with the protected feature yet independent of the labels. We theorize that the originally redundant features for prediction can later become admissible, when the learning accuracy is compromised by the large number of removed features (non-protected but can be used to reconstruct bias information). We benchmark FSCF on five datasets widely used in streaming feature research, and the results substantiate its performance superiority over six rival models in terms of efficiency and sparsity of feature selection and equalized odds of the resultant predictive models.
|
|
MoDT10 |
MR10 |
Agent-Based and Autonomous Systems 3 |
|
Chair: Kozma, Robert | University of Memphis, TN |
|
17:30-17:50, Paper MoDT10.1 | |
Visual Navigation by Fusing Object Semantic Feature |
|
Li, Weimin | Chongqing University |
Wu, Xing | Chongqing University |
Wang, Chengliang | Chongqing University |
He, Zhongshi | Chongqing University |
Wang, Peng | Southwest Hospital of Army Medical University |
Wang, Hongqian | Southwest Hospital of Army Medical University |
Keywords: Agent-Based Modeling, Deep Learning, Application of Artificial Intelligence
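Abstract: The key to object-goal visual navigation is learning the spatial relationships among objects in the environment and assessing their semantic relevance to the target object. We propose G2SNet, an end-to-end visual navigation model based on deep reinforcement learning, which consists of two feature maps and a dedicated fusion feature network: the GloVe feature map (GFM), the Sbbox feature map (SFM), and the GloVe fusion network (GNet). GFM represents the location and semantic information of the objects contained in the observed image, addressing the problem that complex background information in the observed image interferes with target recognition. SFM provides the sizes of objects in the field of view to assist distance judgment. GNet relies entirely on network learning to compute the semantic relevance and spatial position relationships among objects in the environment, giving the agent better generalization ability. This allows the spatial relationships among objects in the GFM to be learned. Experiments on AI2-THOR demonstrate the effectiveness of the three proposed new structures; the average over the four known scenes ...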
|
|
17:50-18:10, Paper MoDT10.2 | |
Dynamic Subgoal Based Path Formation and Task Allocation: A NeuroFleets Approach to Scalable Swarm Robotics |
|
Peter, Robinroy | Skolkovo Institute of Science and Technology |
Ratnabala, Lavanya | Skoltech |
Andrew Charles, Eugene Yugarajah | University of Jaffna |
Tsetserukou, Dzmitry | Skoltech |
Keywords: Swarm Intelligence, Agent-Based Modeling, Artificial Social Intelligence
Abstract: This paper addresses the challenges of exploration and navigation in unknown environments from the perspective of evolutionary swarm robotics. A key focus is on path formation, which is essential for enabling cooperative swarm robots to navigate effectively. We designed the task allocation and path formation process based on a finite state machine, ensuring systematic decision-making and efficient state transitions. The approach is decentralized, allowing each robot to make decisions independently based on local information, which enhances scalability and robustness. We present a novel subgoal-based path formation method that establishes paths between locations by leveraging visually connected subgoals. Simulation experiments conducted in the Argos simulator show that this method successfully forms paths in the majority of trials. However, inter-collision (traffic) among numerous robots during path formation can negatively impact performance. To address this issue, we propose a task allocation strategy that uses local communication protocols and light signal-based communication to manage robot deployment. This strategy assesses the distance between points and determines the optimal number of robots needed for the path formation task, thereby reducing unnecessary exploration and traffic congestion. The performance of both the subgoal-based path formation method and the task allocation strategy is evaluated by comparing the path length, time, and resource usage against the A* algorithm. Simulation results demonstrate the effectiveness of our approach, highlighting its scalability, robustness, and fault tolerance.
|
|
18:10-18:30, Paper MoDT10.3 | |
Cinematic Theory of Cognition and Consciousness - Implications for Efficient Human-Computer Interactions (I) |
|
Kozma, Robert | University of Memphis, TN |
Baars, Bernard | Society for MindBrain Sciences |
Geld, Natalie | MedNeuro, Inc |
Keywords: Neural Networks and their Applications, Agent-Based Modeling, Application of Artificial Intelligence
Abstract: Recent advances in human brain monitoring provide increasingly detailed insights into the spatio-temporal neurodynamic processes contributing to cognition and consciousness. The view that cognition is not a smooth temporal process but rather a sequence of metastable states, each lasting around 100 ms, is increasingly recognized. The cinematic theory of cognition is a potential approach to interpreting these experimental findings. Recent extensions explore the possible link between the cinematic sequence and the conscious broadcast events postulated by the Global Workspace Theory (GWT). The present work summarizes the existing arguments on this issue and elaborates on potential implications of this theory for the development of novel technologies for human-centered and efficient human-computer interaction.
|
|
MoDT11 |
MR11 |
AI Applications and AIoT |
|
Chair: Miyawaki, Tomoya | Kyushu University |
|
17:30-17:50, Paper MoDT11.1 | |
Joint Computing Offloading and Resource Allocation in MEC-Enabled IoT: A Diffusion-Based Reinforcement Learning Approach |
|
Cao, Huimin | East China Normal University |
Xiao, Bo | East China Normal University, Software Engineering Institute |
Keywords: Cloud, IoT, and Robotics Integration, Application of Artificial Intelligence
Abstract: The integration of the Internet of Things (IoT) with mobile edge computing (MEC) has emerged as a promising solution to the requirements of high computing capability and low-latency services, enabling user equipment (UE) to migrate the computation of tasks onto edge servers. This paper focuses on optimizing the performance of an MEC-enabled IoT system by formulating a joint computing offloading and resource allocation problem. The objective is to minimize the total delay of a system consisting of multiple servers and multiple users. The denoising network of a diffusion model, with its generative capability, can be trained to obtain an optimal solution under changing environment conditions. Therefore, we propose the diffusion-based deep deterministic policy gradient (DiffDDPG) algorithm, which utilizes a diffusion model as the policy to learn optimal decisions jointly. Simulation results demonstrate the superior performance of the DiffDDPG algorithm.
|
|
17:50-18:10, Paper MoDT11.2 | |
Wolfe: Wifi Based Object Recognition Framework Using Multiple Features |
|
Zhang, Zheng | Inner Mongolia University |
Zhang, Junxing | Inner Mongolia University |
Keywords: AIoT, AI and Applications, Application of Artificial Intelligence
Abstract: Existing work on stationary object recognition using WiFi CSI (Channel State Information) leverages only a single feature, such as profile or category. However, in many situations, objects have more than one feature. Multiple features can better reflect the characteristics of objects and make them easier to recognize. This paper takes the first step toward multiple-feature recognition using WiFi CSI. We propose WOLFE, a WiFi-based object recognition framework using multiple features. Our framework matches features and labels by decoding CSI data and multi-label matrices into Gaussian latent spaces and aligning their distributions. By resampling in this Gaussian latent space, we can restore the corresponding label information from the samples, thereby performing recognition. Our framework uses different pipelines during the training and inference stages to achieve end-to-end recognition. In our experiments, we collected two indoor small-scale static object datasets. WOLFE achieves recognition accuracy as high as 84.6% and 82.87% on them, respectively. We also considered multiple-object and cross-domain scenarios to verify the universality of our framework. The results show that WOLFE achieves more than 79% accuracy in the multiple-object scenario, and in the cross-domain experiment accuracy is around 80%, a decrease of less than 4% relative to the original domains.
|
|
18:10-18:30, Paper MoDT11.3 | |
Development of Dementia Care Training System Using AR and Large Language Model (I) |
|
Miyawaki, Tomoya | Kyushu University |
Nishiura, Yuki | Kyushu University |
Fukuda, Ryouta | Kyushu University |
Nakashima, Kazuto | Kyushu University |
Kurazume, Ryo | Kyushu University |
Keywords: Application of Artificial Intelligence, AI and Applications, Cloud, IoT, and Robotics Integration
Abstract: We have developed HEARTS, a dementia care training system using augmented reality based on Humanitude. Humanitude is a multimodal comprehensive care technique for dementia, and has attracted attention as a method to reduce the burden on both caregivers and patients. Of the three fundamental skills in Humanitude, "seeing," "touching," and "speaking," the versions of HEARTS developed so far could not evaluate "speaking" skills based on the content of conversations. Therefore, in a new version named HEARTS 5, we attempted a new quantitative evaluation of trainees' "speaking" skills by estimating the emotional value of conversational content using GPT 4. A survey of caregivers was conducted using the developed system, which was well received by the participants. We also developed a conversational version of HEARTS 5, with the addition of GPT 4 conversation generation and Azure Text-to-Speech.
|
|
MoDT12 |
MR12 |
Haptic and Human-Computer Interaction 4 |
Special Sessions: HMS |
Chair: Usai, Marcel | Fraunhofer FKIE |
|
17:30-17:50, Paper MoDT12.1 | |
Reacting on Human Stubbornness in Human-Machine Trajectory Planning (I) |
|
Schneider, Julian | Karlsruhe Institute of Technology |
Straky, Niels | Karlsruhe Institute of Technology |
Meyer, Simon | Karlsruhe Institute of Technology |
Varga, Balint | Karlsruhe Institute of Technology (KIT), Campus South |
Hohmann, Sören | KIT |
Keywords: Human-Machine Interaction, Human-Machine Cooperation and Systems, Haptic Systems
Abstract: In this paper, a method for cooperative trajectory planning between a human and an automation is extended by a behavioral model of the human. This model can characterize the stubbornness of the human, which measures how strongly the human adheres to their preferred trajectory. Accordingly, a static model is introduced indicating a link between the force in haptically coupled human-robot interactions and the human's stubbornness. The introduced stubbornness parameter enables an application-independent reaction of the automation in cooperative trajectory planning. Simulation results in the context of human-machine cooperation in a care application show that the proposed behavioral model can quantitatively estimate the stubbornness of the interacting human, enabling a more targeted adaptation of the automation to the human behavior.
|
|
17:50-18:10, Paper MoDT12.2 | |
Pattern Handler: Integrating Real World and Virtual Models of Human Systems Patterns to Regulate the Control Distribution in a Cooperative Automated Driving Task (I) |
|
Usai, Marcel | Fraunhofer FKIE |
Mandischer, Nils | University of Augsburg |
Flemisch, Frank | RWTH Aachen University/Fraunhofer |
Keywords: Human-Machine Cooperation and Systems, Human-Machine Interaction, Shared Control
Abstract: To achieve intuitive control of partially and highly automated machines, smooth cooperation between human and machine is necessary. A natural way to design cooperation is to use the structures of mental models already established in our minds by human-human or human-animal cooperation. These cooperation designs follow certain patterns to provide solutions in a variety of scenarios. By using interaction patterns, we form and trigger mental models in the human mind, which already uses patterns in a similar way. This paper provides a description of the structure of cooperation patterns. An exemplary pattern for the use case of control transition between a human and a driving automation is modelled based on data from an experiment in which human drivers reacted to takeover requests. To use the patterns in human-machine cooperation systems, the novel concept of a pattern handler is introduced. The pattern handler can be instantiated in software and hardware, and matches human behavior, automation actions, and environment data to cooperation patterns. Within the design phase, it helps with identifying and correcting flaws in the design.
|
|
18:10-18:30, Paper MoDT12.3 | |
Should, Want, Can, Will, Do and Be Accountable: Human-Machine and Human-AI Patterns for Integrating the Real World, Virtual Models and Society by Shared Control and Cooperative Systems (I) |
|
Flemisch, Frank | RWTH Aachen University/Fraunhofer |
Usai, Marcel | Fraunhofer FKIE |
Mandischer, Nils | University of Augsburg |
Baltzer, Marcel Caspar Attila | Fraunhofer FKIE |
Saito, Yuichi | University of Tsukuba |
Pacaux-Lemoine, Marie-Pierre | Lamih - Cnrs Umr 8201 |
Keywords: Human-Machine Cooperation and Systems, Human-Machine Interaction, Shared Control
Abstract: Machines, e.g. empowered by AI and based on virtual models, can help to improve the quality of life. To exploit this potential and also integrate it with the real world and society, cooperation and teaming of these machines with humans and with societies is crucial. Human-Machine Patterns can be a key concept to analyze, understand, design, engineer and evaluate the delicate interplay of humans and machines. Key issues here are to understand in which situations which agents should do, want to do, can do, will do and finally actually do which actions, and who is then accountable. This overview article is intended as an introduction to the special session on Shared and Cooperative Control, especially on patterns and models for controllability and resilience. It is a direct follow-up to the 2022 special session and overview article (which is also available on IEEE Xplore). It introduces the topic of shared and cooperative control of human-machine and human-AI systems, especially in the light of the new advances in AI technology. This paper gives a short overview of the state of research on interaction patterns, controllability and resilience, before it focuses on the fundamental aspects of which actor should do, wants to do, can do, does and will be accountable for a pattern, sub-pattern or action within a pattern. Examples of what can be achieved with this basic architecture are given, e.g. for the recognition of intent or for the support by assistant systems, using the automotive domain as a first application example.
|
| |