| |
Last updated on September 30, 2024. This conference program is tentative and subject to change
Technical Program for Tuesday October 8, 2024
|
TuAT1 |
MR01 |
Cognitive and Affective Computing 1 |
Regular Papers - Cybernetics |
Chair: Liping, Wang | Zhejiang University of Technology |
|
08:45-09:05, Paper TuAT1.3 | |
Parquet-Based CTR Model Training in Production Environment |
|
Liu, Zhibing | Institute of Information Engineering, CAS; School of Cyber Secur |
Guo, Jinrong | JD.com |
Zhou, Biyu | Institute of Information Engineering, Chinese Academy of Sciences |
Xiaokun, Zhu | JD.com |
Yongjun, Bao | JD.com |
Han, Jizhong | Institute of Information Engineering, Chinese Academy of Science |
Hu, Songlin | Institute of Information Engineering, Chinese Academy of Sciences |
Keywords: Deep Learning, Neural Networks and their Applications, Application of Artificial Intelligence
Abstract: Click-through rate (CTR) models play an important role in modern recommendation systems. Most recommendation models in industrial scenarios are trained with TensorFlow. However, we observed that TFRecord, the native key-value data format in TensorFlow, is not the best choice for CTR training: the keys take up to 54% of the storage space in TFRecord-formatted training data. To overcome this, we introduce Apache Parquet, a column-oriented data format, into CTR tasks to improve space efficiency. In addition, for use in production environments, we provide high-performance implementations of the Parquet training scheme. First, a GPU data preprocessing method replaces the original Spark-based solution to generate Parquet training data and accelerate data preprocessing. Second, we modify the data loader in TensorFlow to consume the Parquet training data efficiently. Experimental results show that the preprocessed Criteo dataset is 95.29% smaller than its TFRecord counterpart and that data preprocessing time is reduced by 99.6%. Without any loss in model performance, we speed up the training process by 1.45x. Our scheme has been applied to our internal business and has obtained similar performance benefits.
|
|
09:05-09:25, Paper TuAT1.4 | |
A Multi-Scale Qiantang River Tide Prediction Model Based on Multi-Period Decoupling |
|
Liping, Wang | Zhejiang University of Technology |
Jin, Hao | Zhejiang University of Technology |
Qicang, Qiu | Zhejiang Lab |
Hui, Wang | Zhejiang University of Technology |
Keywords: Neural Networks and their Applications, AI and Applications, Application of Artificial Intelligence
Abstract: To enhance the accuracy of tidal bore height prediction in the Qiantang River, this paper addresses the limitations of one-dimensional, single-scale time series in terms of representational capacity by proposing an MPCSM model that integrates multi-period decomposition with cross-scale fusion for tide height forecasting. The model initially employs the Maximum Overlap Discrete Wavelet Transform to decompose the original time series data into multiple periodic components. Subsequently, it leverages multi-frequency channel attention and gating mechanisms to ascertain the significance of each frequency component and optimize prediction weights. For prediction, cross-scale iterative forecasting is utilized to capture both long-term and short-term characteristics of tidal height data, accompanied by a designed loss function computation strategy that adapts to cross-scale prediction errors. At the Qiantang River Zhakou station, compared to current mainstream prediction models, the average absolute error across different forecast time spans has been reduced by 9.26%. The research findings can serve as a reference for tidal bore prediction in the Qiantang River and other similar basins.
|
|
09:25-09:45, Paper TuAT1.5 | |
Enhancing Session-Based Recommendation Via Inter-Session Similar Intent Modeling and Graph Neural Networks |
|
Li, Yunhan | Inner Mongolia University |
An, Chunyan | Inner Mongolia University |
Yang, Conghao | Inner Mongolia University |
Wang, Mingyuan | Inner Mongolia University |
Keywords: Neural Networks and their Applications, AI and Applications, Deep Learning
Abstract: Session-based recommendation (SBR) is a challenging task that aims to make item recommendations based on anonymized user session data. Mainstream SBR efforts focus on modeling information within a session and do not use information from other sessions. Although some works try to use information from other sessions, many limitations remain, and how to model that information is still an open problem. To overcome these limitations, we propose a new method for learning similar intentions between sessions, aiming to better model the recommendation information contained in other sessions. Specifically, we contribute a new model named ISIM-GNN that learns and integrates three levels of information simultaneously: (i) In the intra-session representation learning layer, we represent the session as a session graph and model it using a gated graph neural network. (ii) In the global item embedding learning layer, we use the graph attention mechanism to propagate and aggregate relevant item information from other sessions on the global graph. (iii) In the inter-session similar intent learning layer, we employ both "hard similarity" and "soft similarity" to select similar sessions, and use the attention mechanism to conduct session-level aggregation on the selected similar sessions to make better use of the inter-session collaboration information. Experiments on three real-world datasets show a significant performance improvement of our approach compared to state-of-the-art work.
|
|
TuAT2 |
MR02 |
Deep Learning and Neural Networks 4 |
Regular Papers - Cybernetics |
Chair: Yang, Yisheng | Xiamen University Malaysia |
|
08:45-09:05, Paper TuAT2.3 | |
AutoASD: An Automated Architecture Search for Detecting Insidious Malicious Traffic Behaviour in APT Attacks with Assorted Features |
|
Liu, Xinyu | Sichuan University |
Zhong, Zhentian | Sichuan University |
Li, Xiaohui | Sichuan University |
Xiang, Huifang | Sichuan University |
Keywords: Deep Learning, Transfer Learning, Machine Learning
Abstract: The risks posed by Advanced Persistent Threats (APTs) are widely acknowledged to be critical. These attacks can allow cybercriminals to remotely manipulate infected devices and steal sensitive data. To effectively combat APT attacks, it is crucial to employ multidimensional analysis techniques that can predict their impact and detect lateral infiltration behavior. This paper presents an approach called AutoASD, which utilizes Neural Architecture Search (NAS), Deep Learning (DL), and Transfer Learning (TL) to identify various types of malicious traffic in APT attacks. AutoASD analyzes data at multiple granularities to classify different types of malware traffic and enhance classification accuracy. It leverages feature extraction and a pre-trained high-performance backbone network as the seed network, and employs parameter remapping to adjust the depth, width, and kernel to create a super network. The aim of using NAS is to improve the real-time performance and accuracy of the system. In experiments, the effectiveness of AutoASD was verified using MobileNetV2, and it demonstrated superior performance in APT malicious traffic classification, particularly for attacks with small sample sizes.
|
|
09:05-09:25, Paper TuAT2.4 | |
EDAW: Enhanced Knowledge Distillation and Adaptive Pseudo Label Weights for Continual Named Entity Recognition |
|
Sheng, Yubin | Central South University |
Zhang, Zuping | Central South University |
Tang, Panrui | Central South University |
Huang, Bo | Central South University |
Xiao, Yao | XinJiang University |
Keywords: Knowledge Acquisition, Neural Networks and their Applications, Deep Learning
Abstract: Continual Learning for Named Entity Recognition (CL-NER) is designed to train models capable of adapting to evolving data by continuously introducing new entity types. This approach is crucial in dynamic environments where data evolves, such as social media, healthcare, and legal documents, requiring the model to retain the memory of previously learned entity types while learning to identify new ones. However, due to the neural network's tendency to acquire new knowledge and forget old knowledge in continual learning and the unique non-entity type annotations in NER tasks, CL-NER faces severe catastrophic forgetting and semantic drift issues. In this paper, we propose Enhanced Knowledge Distillation and Entropy-based Adaptive Pseudo Label Weights (EDAW) to address the catastrophic forgetting and semantic drift issues in CL-NER. Specifically, we develop an enhanced knowledge distillation method that combines Kullback-Leibler divergence and feature cosine discrepancy. This method effectively minimizes the variance in output probability distributions and aligns the internal feature spaces between new and old models, thus reducing catastrophic forgetting. Additionally, we propose an entropy-based adaptive pseudo label weight method that allows the model to assign different weights to pseudo labels with varying certainties during training, effectively alleviating semantic drift and error accumulation caused by erroneous re-labeling of pseudo labels. Notably, this study pioneers the inclusion of a Chinese dataset in CL-NER, enhancing the model's robustness and demonstrating its efficacy in a multilingual context. Experiments on fourteen CL-NER settings across four public NER datasets show that EDAW improves average Micro-F1 and Macro-F1 scores by 3.44% and 3.72%, respectively, over existing state-of-the-art (SOTA) methods. We make our code available at: https://github.com/livosr/EDAW/tree/master
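The two ingredients named in the abstract above (a distillation term combining Kullback-Leibler divergence with a feature cosine discrepancy, and an entropy-based weight for pseudo labels) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names, the `alpha`/`beta` mixing coefficients, and the "one minus normalized entropy" weighting rule are all assumptions made for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def cosine_discrepancy(u, v):
    """1 - cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def distillation_loss(old_logits, new_logits, old_feat, new_feat,
                      alpha=1.0, beta=1.0):
    """Hypothetical combined loss: alpha * KL(old || new) + beta * cosine discrepancy."""
    kl = kl_divergence(softmax(old_logits), softmax(new_logits))
    return alpha * kl + beta * cosine_discrepancy(old_feat, new_feat)

def entropy_weight(probs):
    """Assumed weighting rule: 1 - normalized entropy, so confident
    pseudo labels get weights near 1 and uncertain ones near 0."""
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return 1.0 - entropy / math.log(len(probs))
```

Under these assumptions, an old and a new model that agree exactly incur a loss near zero, and a pseudo label predicted with a one-hot distribution receives full weight while a uniform prediction receives almost none.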
|
|
09:25-09:45, Paper TuAT2.5 | |
Evolutionary Neural Architecture Search for 3D Point Cloud Analysis |
|
Yang, Yisheng | Xiamen University Malaysia |
Du, Guodong | Harbin Institute of Technology, Shen Zhen |
Toa, Chean Khim | Xiamen University Malaysia |
Tang, Ho-Kin | Harbin Institute of Technology (Shenzhen) |
Goh, Sim Kuan | Xiamen University Malaysia |
Keywords: Multimedia Computation, Neural Networks and their Applications, Computational Intelligence
Abstract: Neural architecture search (NAS) automates neural network design by using optimization algorithms to navigate architecture spaces, reducing the burden of manual architecture design. While NAS has achieved success, applying it to emerging domains, such as analyzing unstructured 3D point clouds, remains underexplored due to the data lying in non-Euclidean spaces, unlike images. This paper presents Success-History-based Self-adaptive Differential Evolution with a Joint Point Interaction Dimension Search (SHSADE-PIDS), an evolutionary NAS framework that encodes discrete deep neural network architectures to continuous spaces and performs searches in the continuous spaces for efficient point cloud neural architectures. Comprehensive experiments on challenging 3D segmentation and classification benchmarks demonstrate SHSADE-PIDS's capabilities. It discovered highly efficient architectures with higher accuracy, significantly advancing prior NAS techniques. For segmentation on SemanticKITTI, SHSADE-PIDS attained 64.51% mean IoU using only 0.55M parameters and 4.5GMACs, reducing overhead by over 22-26X versus other top methods. For ModelNet40 classification, it achieved 93.4% accuracy with just 1.31M parameters, surpassing larger models. SHSADE-PIDS provided valuable insights into bridging evolutionary algorithms with neural architecture optimization, particularly for emerging frontiers like point cloud learning.
|
|
TuAT3 |
MR03 |
Autonomous Systems and Robotics |
|
Chair: Disimino, Giuseppe | APPLICA Srl |
|
08:05-08:25, Paper TuAT3.1 | |
Matching Input and Output Devices and Physical Disabilities for Human-Robot Workstations |
|
Weidemann, Carlo Benedikt | RWTH Aachen University |
Mandischer, Nils | University of Augsburg |
Corves, Burkhard | RWTH Aachen University |
Keywords: Human-Collaborative Robotics, Design Methods, Assistive Technology
Abstract: As labor shortages rise at an alarming rate, it is imperative to enable all people to work, particularly people with disabilities and elderly people. Robots are often used as a universal tool to assist people with disabilities. However, universal design fails for such human-robot workstations. We mitigate the challenges of selecting an individualized set of input and output devices by matching the devices required by the work process with individual disabilities, adhering to the Convention on the Rights of Persons with Disabilities passed by the United Nations. The objective is to facilitate economically viable workstations with just the required devices, thereby lowering the overall cost of corporate inclusion and of workplace redesign. Our work focuses on developing an efficient approach to filter input and output devices based on a person's disabilities, resulting in a tailored list of usable devices. The methodology enables an automated assessment of devices compatible with specific disabilities defined in the International Classification of Functioning, Disability and Health. In a mock-up, we showcase the synthesis of input and output devices from disabilities, thereby providing a practical tool for selecting devices for individuals with disabilities.
|
|
08:25-08:45, Paper TuAT3.2 | |
Design and Implementation of a Cobot Arm System for Ladder Stitch (I) |
|
Disimino, Giuseppe | APPLICA Srl |
Mangini, Agostino Marcello | Polytechnic of Bari |
Fanti, Maria Pia | Polytechnic of Bari, Italy |
Keywords: Human-Collaborative Robotics, Design Methods
Abstract: While automation is widespread in tailoring, high-end fashion brands still rely on skilled manual labor for intricate stitching. However, finding skilled workers is challenging due to a lack of new talent entering the field. The stitches require both speed and precision. In response to the declining availability of skilled artisans, this study explores leveraging cobot technology to replicate traditional manual techniques. The research investigates the development and implementation of a system integrating cobots to execute precise stitching tasks. Key aspects include designing infrastructure to support cobots and implementing advanced techniques for fabric manipulation and needle guidance. The study also examines the adaptation of cobot technology to replicate the "Ladder Stitch" technique, traditionally performed by skilled artisans. By blending traditional craftsmanship with modern technology, this research aims to address the shortage of skilled labor in the tailoring industry.
|
|
08:45-09:05, Paper TuAT3.3 | |
Exploring Capability-Based Control Distributions of Human-Robot Teams through Capability Deltas: Formalization and Implications (I) |
|
Mandischer, Nils | University of Augsburg |
Usai, Marcel | Fraunhofer FKIE |
Flemisch, Frank | RWTH Aachen University/Fraunhofer |
Mikelsons, Lars | University of Augsburg |
Keywords: Human-Collaborative Robotics, Assistive Technology, Shared Control
Abstract: The implicit assumption that human and autonomous agents have certain capabilities is omnipresent in modern teaming concepts. However, none of these concepts formalizes the capabilities in a flexible and quantifiable way. In this paper, we propose Capability Deltas, which establish a quantifiable basis for crafting autonomous assistance systems in which one agent takes the leader role and the other the supporter role. We derive the quantification of human capabilities from an established assessment and documentation procedure used in the occupational inclusion of people with disabilities. This allows us to quantify the delta, or gap, between a team's current capability and a requirement established by a work process. The concept is then extended to the multi-dimensional capability space, which allows us to formalize compensation behavior and assess the actions required of the autonomous agent.
|
|
TuAT5 |
MR05 |
Autonomous and Intelligent Vehicles 1 |
|
Chair: Kshetrimayum, Satchidanand | National Taipei University of Technology |
|
08:45-09:05, Paper TuAT5.3 | |
Trajectory Planning for UAV Transportation Systems Using RRT*-Informed NMPC |
|
Kang, Junjie | York University |
Shan, Jinjun | York University |
Keywords: Intelligent Transportation Systems, Autonomous Vehicle, Robotic Systems
Abstract: This paper presents a novel trajectory planning approach for two typical aerial transportation systems: UAV-slung-load and flying inverted pendulum. By integrating Rapidly-exploring Random Trees* (RRT*) into Nonlinear Model Predictive Control (NMPC), the proposed method enhances motion planning, enabling effective navigation in complex environments while ensuring stability and safety. Simulation results demonstrate the approach's capability to overcome local minima and generate feasible trajectories, highlighting its potential to advance trajectory planning in UAV transportation systems.
|
|
09:05-09:25, Paper TuAT5.4 | |
Attention-Based Few-Shot Food Classification Using Prototypical Networks (I) |
|
Kshetrimayum, Satchidanand | National Taipei University of Technology |
Huang, Yo-Ping | National Taipei University of Technology |
Keywords: Intelligent Transportation Systems, Consumer and Industrial Applications, Manufacturing Automation and Systems
Abstract: In the era of rapidly advancing technology, food classification has emerged as a pivotal application across various domains including health monitoring, dietary assessment, and culinary innovation. However, efficiently categorizing food items remains a challenge, particularly in scenarios with limited labeled data. This paper introduces a novel approach for few-shot food classification using Prototypical Networks with ResNet-50 and an attention mechanism as embedding network. Leveraging the inherent capability of Prototypical Networks to learn from scarce examples, our method demonstrates exceptional adaptability and accuracy in classifying food items. Through extensive experimentation on the Food-101 dataset, employing various CNN architectures, our findings underscore the effectiveness of our approach. In particular, ResNet-50 integrated with the attention mechanism surpasses other architectures, achieving superior classification accuracies of 91.5% and 95.2% for 1-shot and 5-shot learning scenarios, respectively. This integrated approach showcases the potential of Prototypical Networks in addressing the challenges of limited labeled data in food classification tasks, marking a significant advancement in the field.
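The classification step of a Prototypical Network, as referred to in the abstract above, can be sketched in a few lines: each class prototype is the mean of its support embeddings, and a query is assigned to the nearest prototype. The sketch below assumes the embeddings have already been produced by some embedding network (the paper uses ResNet-50 with attention, which is not reproduced here); the function names and the toy two-dimensional vectors are hypothetical.

```python
def class_prototypes(support):
    """Compute one prototype per class as the mean of its support embeddings.

    support: dict mapping class label -> list of embedding vectors (lists of floats).
    """
    prototypes = {}
    for label, vectors in support.items():
        dim = len(vectors[0])
        prototypes[label] = [
            sum(v[i] for v in vectors) / len(vectors) for i in range(dim)
        ]
    return prototypes

def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def classify(query, prototypes):
    """Return the label of the prototype closest to the query embedding."""
    return min(prototypes, key=lambda label: squared_distance(query, prototypes[label]))

# Toy 2-way 2-shot episode with made-up 2-D embeddings.
support_set = {
    "pizza": [[0.0, 0.0], [0.0, 2.0]],
    "sushi": [[4.0, 4.0], [6.0, 4.0]],
}
protos = class_prototypes(support_set)
```

With a single support example per class (the 1-shot setting), each prototype is simply that example's embedding.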
|
|
09:25-09:45, Paper TuAT5.5 | |
Weighted Fuzzy Rough Sets Feature Selection for High Dimensional Classification Problems (I) |
|
Khabusi, Simon Peter | National Taipei University of Technology |
Huang, Yo-Ping | National Taipei University of Technology |
Vu, Van Phong | Ho Chi Minh City University of Technology and Education |
Keywords: Intelligent Transportation Systems, Decision Support Systems, System Modeling and Control
Abstract: Feature selection holds significant importance in knowledge mining as it plays a pivotal role in selecting and preserving the most informative features within a dataset while discarding irrelevant, redundant, or noisy attributes. This process contributes to enhancing model performance, reducing computational complexity, and refining interpretability, thus facilitating more accurate and efficient data analysis. In high-dimensional datasets, the necessity for feature selection becomes more pronounced due to the heightened risk of encountering the curse of dimensionality. Therefore, this study proposes a weighted fuzzy rough quickreduct (FRQR) feature selection approach employing feature weights to handle the equal situation problem inherent in FRQR. The proposed method is evaluated on ten publicly available datasets with feature sizes ranging from 2000 to 15154. The selected features are used to train and test random forest candidate models whose estimates are then combined according to the posterior probabilities by Bayesian Model Averaging (BMA). The performance of the model on the selected features is evaluated on four performance metrics. The essentiality of the selected features is further determined by comparing the model classification performance achieved on the non-selected features and all the dataset features. The results indicate competitiveness in the performance metric values achieved on selected features over the other two feature categories affirming the efficacy of the proposed method.
|
|
TuAT6 |
MR06 |
Autonomous Systems and Robotics 3 |
|
Chair: Chen, Yu-Xuan | National Chung Cheng University |
|
08:45-09:05, Paper TuAT6.3 | |
Is Shared Autonomous Driving Worth Promoting? Based on the Heterogeneity of Consumer Green Preference |
|
Li, Wenjing | Northwestern Polytechnical University |
Zhang, Yali | Northwestern Polytechnical University |
Keywords: Autonomous Vehicle, Decision Support Systems, Intelligent Transportation Systems
Abstract: To address human-related problems in ride-hailing services, such as high prices and frequent conflicts between drivers and consumers, China and other countries are actively promoting the development of shared autonomous driving. We explore the stimulative impact of subsidy initiatives on shared autonomous driving, focusing on consumers' green preferences. By constructing a Hotelling model of competition between AV and HV service platforms, we analyze the optimal solutions under three scenarios (no subsidy, subsidized AV platform, and subsidized consumers) and conduct a sensitivity analysis. We find that subsidizing AV platforms can reduce their service prices and enhance market competitiveness, while subsidizing consumers may increase service prices without necessarily increasing market share. Policy makers should adjust subsidies as AV services develop and may gradually withdraw incentives once the industry matures. At that point, AV platforms need to improve their technology and management to achieve sustainable development.
|
|
09:05-09:25, Paper TuAT6.4 | |
Integrating End-To-End and Modular Driving Approaches for Online Corner Case Detection in Autonomous Driving |
|
Kaljavesi, Gemb | Technical University of Munich |
Su, Xiyan | Technical University of Munich |
Diermeyer, Frank | Technical University of Munich |
Keywords: Autonomous Vehicle, Intelligent Transportation Systems, Fault Monitoring and Diagnosis
Abstract: Online corner case detection is crucial for ensuring safety in autonomous vehicles. Current autonomous driving approaches can be categorized into modular approaches and end-to-end approaches. To leverage the advantages of both, we propose a method for online corner case detection that integrates an end-to-end approach into a modular system. The modular system takes over the primary driving task and the end-to-end network runs in parallel as a secondary one; the disagreement between the two systems is then used for corner case detection. We implement this method on a real vehicle and evaluate it qualitatively. Our results demonstrate that end-to-end networks, known for their superior situational awareness, can effectively contribute to corner case detection when used as secondary driving systems. These findings suggest that such an approach holds potential for enhancing the safety of autonomous vehicles.
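The idea of using disagreement between a primary and a secondary driving system as a corner case signal can be illustrated with a small sketch. The abstract above does not say how disagreement is measured, so the metric here (mean Euclidean distance between corresponding planned waypoints), the function names, and the 1.0 m threshold are all assumptions chosen only for illustration.

```python
import math

def trajectory_disagreement(primary, secondary):
    """Mean Euclidean distance between corresponding (x, y) waypoints.

    Both trajectories must be non-empty and have the same number of waypoints.
    """
    if not primary or len(primary) != len(secondary):
        raise ValueError("trajectories must be non-empty and of equal length")
    distances = [math.dist(p, q) for p, q in zip(primary, secondary)]
    return sum(distances) / len(distances)

def is_corner_case(primary, secondary, threshold_m=1.0):
    """Flag a potential corner case when the two planners diverge too much.

    threshold_m is a hypothetical tuning parameter, not a value from the paper.
    """
    return trajectory_disagreement(primary, secondary) > threshold_m

# Toy example: the modular planner drives straight, the end-to-end planner swerves.
modular_plan = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
end_to_end_plan = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0)]
```

In a real vehicle the comparison would run continuously on each planning cycle; the sketch compares a single pair of plans.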
|
|
09:25-09:45, Paper TuAT6.5 | |
High-Precision Vehicle Positioning Technology by Combining Vehicle Images and Satellite Maps (I) |
|
Chen, Yu-Xuan | National Chung Cheng University |
Lin, Huei-Yung | National Taipei University of Technology |
Keywords: Autonomous Vehicle, Intelligent Transportation Systems
Abstract: With continuous advances in technology, the demand for precise vehicle positioning has grown significantly. Although it is now possible to capture driving scenes and record driving paths using a dashcam, standard civilian GPS typically has accuracy errors ranging from 3 to 5 meters. This level of precision is generally not sufficient for rapidly evolving ADAS (Advanced Driver Assistance Systems). In this paper, we present a high-precision vehicle positioning technique based on satellite map and image data. Instead of using expensive LiDAR sensors, the proposed approach utilizes lane detection, semantic segmentation, and geolocation to extract environmental features from images. The A* algorithm is then adopted to refine the driving trajectory and improve vehicle positioning accuracy. Furthermore, we establish an image dataset containing satellite maps and latitude/longitude coordinate information of various road scenes. The code and datasets are made publicly available at https://github.com/M610415018/M610415018-Paper.
|
|
TuAT7 |
MR07 |
Online - AI Applications 1 |
|
Chair: Xu, Jin | Shenyang Aerospace University |
|
08:05-08:25, Paper TuAT7.1 | |
Intrusion Detection System Based on FastICA and Multi-Grained Cascaded Forest |
|
Fei, Jiahui | Nanjing University of Science and Technology |
Zhang, Shuangquan | School of Cyber Science and Engineering |
Lian, Zhichao | Nanjing University of Science and Technology |
Keywords: AIoT, Application of Artificial Intelligence, Machine Learning
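Abstract: With advances in big data, microprocessors, and other applications, the Internet of Things (IoT) has developed considerably. Because they lack the necessary security defense mechanisms, IoT devices are easily targeted and controlled by attackers, who can manipulate large numbers of IoT devices to launch DDoS attacks on the network infrastructure of a country or region, causing severe economic losses and social security risks. Deep-learning-based intrusion detection methods usually rely on large numbers of high-quality training instances, which makes them difficult to apply to network traffic that lacks sufficient labeled data. Traditional machine learning (ML) methods have limited ability to extract and represent features from high-dimensional data, which makes it challenging to discover the underlying structure and patterns in the data. To address these problems, this paper proposes an intrusion detection system (IDS) that combines a fast independent component analysis (FastICA) module with a multi-grained cascaded forest (GcForest). By leveraging Fa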
|
|
08:25-08:45, Paper TuAT7.2 | |
Efficient One-Shot Pruning of Large Language Models with Low-Rank Approximation |
|
Xu, Yangyan | Institute of Information Engineering, Chinese Academy of Science |
Cao, Cong | Institute of Information Engineering, Chinese Academy of Science |
Yuan, Fangfang | Institute of Information Engineering, Chinese Academy of Science |
Mi, Rongxin | National Computer Network Emergency Response Technical Team/Coor |
Sun, Nannan | Institute of Information Engineering, Chinese Academy of Science |
Wang, Dakui | Institute of Information Engineering, Chinese Academy of Science |
Liu, Yanbing | Institute of Information Engineering, Chinese Academy of Science |
Keywords: Representation Learning, Deep Learning, Machine Learning
Abstract: Model pruning, as an effective method for compressing large language models (LLMs), has recently attracted considerable attention in the field of natural language processing. However, existing LLM pruning methods have two main drawbacks: (1) Iterative pruning for LLMs with over a billion parameters requires retraining, which leads to significant pruning costs. (2) LLM pruning is formalized as a weight reconstruction problem that requires second-order information, incurring expensive computations. To address these issues, we propose a novel pruning method named Eplra: efficient one-shot pruning of large language models with low-rank approximation, which efficiently identifies sparse networks in LLMs. Specifically, we design a novel pruning metric based on input activations for the rapid one-shot compression of LLMs. We first incorporate input activations into the calculation of weight importance to promote precise pruning of low-priority weights. Then, we perform local weight comparisons across each output of linear layers to induce uniform sparsity. Next, we extend Eplra to semi-structured pruning patterns to accommodate various acceleration scenarios. Finally, we employ low-rank parametrized update matrices to fine-tune the pruned model, facilitating a swift recovery of model performance. Experimental results on various language benchmark datasets demonstrate that Eplra outperforms state-of-the-art methods.
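To make the first two steps in the abstract above concrete (an importance score that incorporates input activations, compared locally within each output of a linear layer), here is a minimal sketch. The exact Eplra metric is not given in the abstract, so the score used below, the weight magnitude multiplied by the L2 norm of the matching input feature over a calibration batch, is an assumed stand-in, and all names are hypothetical.

```python
import math

def activation_norms(inputs):
    """L2 norm of each input feature over a calibration batch.

    inputs: list of samples, each a list of in_features floats.
    """
    n_features = len(inputs[0])
    return [math.sqrt(sum(sample[j] ** 2 for sample in inputs))
            for j in range(n_features)]

def prune_per_output(weights, inputs, sparsity=0.5):
    """One-shot pruning sketch with per-output (per-row) comparison.

    weights: list of rows, one row per output neuron, each of in_features floats.
    Assumed score: |w_ij| * ||x_j||_2. Within each row, the lowest-scoring
    fraction `sparsity` of weights is set to zero, so every output ends up
    with the same number of pruned weights.
    """
    norms = activation_norms(inputs)
    pruned = []
    for row in weights:
        scores = [abs(w) * n for w, n in zip(row, norms)]
        n_drop = int(len(row) * sparsity)
        drop = set(sorted(range(len(row)), key=lambda j: scores[j])[:n_drop])
        pruned.append([0.0 if j in drop else w for j, w in enumerate(row)])
    return pruned

# Toy layer: 2 outputs, 4 inputs; the last input feature has large activations.
layer_weights = [[0.1, -2.0, 0.5, 0.3],
                 [1.0, 0.2, -0.3, 4.0]]
calibration = [[1.0, 1.0, 1.0, 10.0],
               [1.0, 1.0, 1.0, 10.0]]
sparse_weights = prune_per_output(layer_weights, calibration, sparsity=0.5)
```

In the toy layer, the weight 0.3 in the first row survives even though 0.5 is pruned, because its input feature carries much larger activations; this is the effect an activation-aware metric is meant to capture, in contrast to pure magnitude pruning.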
|
|
08:45-09:05, Paper TuAT7.3 | |
Video-Based Examination Anomaly Action Recognition Via Channel-Temporal Model |
|
Peng, Qin | Central China Normal University |
Yao, Huaxiong | Central China Normal University |
Liu, Xinyu | Central China Normal University |
Keywords: Image Processing and Pattern Recognition, Neural Networks and their Applications, Deep Learning
Abstract: With rapid advances in computer vision, the recognition of abnormal behavior during examinations has shifted from human observation to computer-assisted recognition. Although traditional 2D Convolutional Neural Networks (CNNs) excel in computational efficiency, they do not capture the temporal dynamics crucial for comprehensive video analysis precisely enough. 3D CNN-based methods, by contrast, demonstrate promising performance in temporal modeling but impose substantial computational demands and deployment costs. To overcome these challenges, this paper introduces an Examination Anomaly Action Recognition Network named ReTANet. It incorporates cross-channel temporal modeling to capture temporal features within videos. It also employs Multi-Scale Channel Attention to enrich feature representation and extract channel and spatial information, thereby enhancing recognition accuracy without significantly increasing computational complexity or the number of model parameters. Furthermore, this paper introduces the Examination Anomaly Action Dataset, also named the ExamGuard Dataset (EGD), to facilitate model training and evaluation. Our model demonstrates superior performance compared to existing mainstream action recognition algorithms on the HMDB-51 dataset. Ablation studies conducted on the UCF-101 dataset show the effectiveness and significance of the proposed module.
|
|
09:05-09:25, Paper TuAT7.4 | |
KGNet: A Legal Knowledge Enhancement and GlobalPointer Triple Extraction Network |
|
Li, Jinchen | Inner Mongolia University of Finance and Economics |
Li, Yanling | Inner Mongolia Normal University |
Ge, Fengpei | Beijing University of Posts and Telecommunications |
Xingxing, Wang | Inner Mongolia Normal University |
Keywords: AI and Applications, Application of Artificial Intelligence, Deep Learning
Abstract: Extracting entity relations is vital in legal artificial intelligence because it automates the mining of triple data from vast legal texts. Current methods struggle to identify the boundaries of legal named entities accurately and to extract overlapping relation triples from legal texts. We present KGNet, a model developed to address these issues. Our approach introduces a Word Information Generator based on BMES tagging, combined with the Fusionformer module. This design enhances the incorporation of legal domain knowledge into text representations, improving the accuracy of entity recognition. Additionally, we utilize the GlobalPointer decoder, which redefines and decomposes relation triples, thus resolving the issue of overlapping entities. Performance evaluations on a specially constructed judicial document dataset show that KGNet achieves an F1 score of 66.7%, representing an average improvement of 15.3% over baseline models. These results confirm the effectiveness of KGNet in enhancing legal document processing.
|
|
09:25-09:45, Paper TuAT7.5 | |
Research on Task Assignment of Firefighting UAVs Based on E-CARGO Model (I) |
|
Xu, Jin | Shenyang Aerospace University |
Xiang, Zhiyu | Shenyang Aerospace University |
Zhang, Senyue | Shenyang Aerospace University |
Sun, Yue | Shenyang Aerospace University |
Gao, Beihang | Shenyang Aerospace University |
Keywords: Adaptive Systems, Cooperative Systems and Control
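Abstract: Firefighting UAV technology has become one of the core tools of modern firefighting operations, and the scale and complexity of the tasks it performs continue to grow. In the face of this development, it has become increasingly important to find an effective way to ensure that UAVs are assigned to the most suitable tasks. In this study, we use the Environment-Class, Agent, Role, Group, and Object (E-CARGO) model to systematically analyze the firefighting drone task assignment (FDTA) problem, and introduce an enhanced whale optimization algorithm (EWOA) to optimize path planning in the FDTA problem. Finally, simulation experiments are conducted under diverse terrain conditions to demonstrate the efficiency and rapid-response capability of the improved algorithm under different workloads and environmental conditions.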
|
|
TuAT8 |
MR08 |
Online - Affective and Cognitive Computing 1 |
Regular Papers - Cybernetics |
Chair: Yuan, Desen | ASR Microelectronics Co., Ltd.; University of Electronic Science and Technology of China |
|
08:05-08:25, Paper TuAT8.1 | |
RPID: Boosting Transferability of Adversarial Attacks on Vision Transformers |
|
Wang, Shujuan | Nanjing University of Science and Technology |
Wang, Panpan | Nanjing University of Science and Technology |
Sun, Yunpeng | Nanjing University of Science and Technology |
Lian, Zhichao | Nanjing University of Science and Technology |
Li, Shuohao | National University of Defense Technology |
Keywords: Image Processing and Pattern Recognition, Machine Learning, Deep Learning
Abstract: Vision Transformers (ViTs) have achieved excellent performance on many computer vision tasks, which has drawn the attention of many researchers to their adversarial robustness. As a kind of black-box attack, transfer-based attacks usually use adversarial examples generated by a surrogate model to attack structurally different models. Such attacks are practical and pose a certain threat to the application of ViTs in critical security areas. Existing transfer-based attacks against ViTs suffer from weak adversarial transferability and noticeable perceptibility. In this work, we propose a method called Reduce Regional Perturbation Interaction and Differentiated (RPID) attack, which employs two strategies, reducing the correlation between regional perturbations and adding differentiated perturbations, to produce adversarial examples. Extensive experiments demonstrate that our proposed method improves the transferability of the baseline methods for adversarial attacks against ViTs while maintaining stealthiness.
|
|
08:25-08:45, Paper TuAT8.2 | |
LESaET: Low-Dimensional Embedding Method for Link Prediction Combining Self-Attention and Enhanced-TuckER |
|
Ding, Lichao | Qilu University of Technology (Shandong Academy of Sciences) |
Zhao, Jing | Qilu University of Technology (Shandong Academy of Sciences) |
Lu, Kai | Qilu University of Technology (Shandong Academy of Sciences) |
Hao, Zenghao | Qilu University of Technology |
Keywords: Knowledge Acquisition, Representation Learning, Neural Networks and their Applications
Abstract: Knowledge graphs (KGs) provide a structured representation of the real world through entity-relation triples. However, current KGs are often incomplete, typically containing only a small fraction of all possible facts. Completing them involves inferring missing content from existing information, a task known as link prediction. Existing link prediction methods struggle to control the dimensionality of embedding vectors or suffer from overly complex models. To tackle these challenges, we introduce a method named Low-Dimensional Embedding Method for Link Prediction Combining Self-Attention and Enhanced-TuckER (LESaET). LESaET leverages both self-attention mechanisms and tensor factorization to learn expressive, context-enhanced representations of KGs. Specifically, LESaET employs the multi-head self-attention mechanism of Transformer as an encoder to capture the mutual information between entities and relationships, and utilizes Enhanced-TuckER as a decoder, ultimately achieving expressive low-dimensional embeddings for link prediction tasks. LESaET demonstrates competitive performance compared to advanced methods on standard datasets.
|
|
08:45-09:05, Paper TuAT8.3 | |
Towards Adversarial Robustness in Blind Image Quality Assessment with Soft Thresholding Norm |
|
Yuan, Desen | ASR Microelectronics Co., Ltd.; University of Electronic Science |
Wang, Lei | University of Electronic Science and Technology of China |
Keywords: Multimedia Computation, Deep Learning, Media Computing
Abstract: In this study, we address the issue of adversarial robustness within the context of Blind Image Quality Assessment (BIQA), an area of heightened importance due to the inherent susceptibility of Deep Neural Networks (DNNs) to adversarial assaults. Current approaches primarily rely on adversarial training, which, despite its efficacy, imposes a significant computational burden. Our research proposes an alternative strategy known as the Soft Thresholding Norm (ST Norm). This approach counters the 'feature shift' phenomenon, identified by a substantial Euclidean Distance Statistics (EDS) between original and adversarial features, through the imposition of sparse constraints on potential features following batch normalization. This novel method offers several advantages: it reduces the Lipschitz constant yielding smoother models, seamlessly integrates with existing models, and boasts inherent denoising capabilities, thereby effectively mitigating the impact of adversarial perturbations. Results suggest that our approach achieves robustness comparable to adversarial training but with significantly less computational overhead. Moreover, it consistently outperforms other adversarial defense strategies on BIQA datasets, highlighting its practical effectiveness in enhancing adversarial robustness. This study underscores the potential of the Soft Thresholding Norm within the realm of IQA tasks, positioning it as a resource-efficient alternative to traditional adversarial training methodologies.
|
|
09:05-09:25, Paper TuAT8.4 | |
Efficient Nearest Neighbor Prompt-Based Learning for Few-Shot NER in Manufacturing |
|
Chen, JiaXin | Shenyang Aerospace University |
Wang, Peiyan | Shenyang Aerospace University |
Keywords: Application of Artificial Intelligence, Knowledge Acquisition
Abstract: NER tasks in manufacturing usually lack sufficient labeled data. To tackle this issue, this paper presents an effective NN-PLM framework for few-shot NER in manufacturing, which introduces a simple enhancement of the prompt-based learning model using nearest neighbor retrieval. We retrieve morphologically similar characters for each character to be predicted and then rectify the prediction. Moreover, we use supervised contrastive learning (SCL) and instance weighting to obtain better semantic representations of multi-category characters. Compared with the best baseline, our NN-PLM achieves an average F1 score improvement of 7.12% across all few-shot settings in manufacturing.
|
|
09:25-09:45, Paper TuAT8.5 | |
MJR: Multi-Head Joint Reasoning on Language Models for Question Answering |
|
Li, Shunhao | South China Normal University |
Chen, Jiale | South China Normal University |
Yan, Enliang | South China Normal University |
Zhan, Choujun | South China Normal University |
Wang, Fu Lee | Hong Kong Metropolitan University |
Hao, Tianyong | South China Normal University |
Keywords: Deep Learning, Neural Networks and their Applications, Expert and Knowledge-Based Systems
Abstract: Language Models (LMs) have achieved impressive success in various question answering (QA) tasks but have shown limited performance on structured reasoning. Recent research suggests that Knowledge Graphs (KGs) can augment text data by providing a structured background to enhance the reasoning capabilities of LMs. However, how to integrate and reason over KG representations and language context remains an open question. In this work, we propose MJR, a novel model that integrates the encoded representations of LMs and a graph neural network through multiple layers of feature interaction operations. Subsequently, the fused feature representations of the two modalities are fed into a multi-head representation fusion module to comprehensively capture semantic and graph structure information, thereby enhancing language understanding and reasoning capabilities. In addition, we investigate the performance and applicability of different types of large language models as the text encoder in the question-answering task. We evaluate our model on three common datasets: CommonsenseQA, OpenBookQA, and MedQA-USMLE. The results demonstrate the advancements of MJR over existing LM, LM+KG, and LLM models in reasoning for question answering.
|
|
TuAT9 |
MR09 |
AI Applications 8 |
Regular Papers - Cybernetics |
Chair: Liu, Shanwen | College of Computer Science, Sichuan Normal University |
|
08:45-09:05, Paper TuAT9.3 | |
Robotic Manipulator Motion Planning Based on Global Path Guidance Reinforcement Learning in Dynamic Obstacle Environment |
|
Liu, Shixian | Chinese Academy of Sciences |
Zhang, Jinhan | Institute of Automation, Chinese Academy of Sciences |
Shanlin, Zhong | Institute of Automation, Chinese Academy of Sciences |
Chen, Jiahao | Institute of Automation, Chinese Academy of Sciences |
Zhengyu, Liu | Institute of Automation, Chinese Academy of Sciences |
Wu, Wei | Institute of Automation, Chinese Academy of Sciences |
Keywords: Application of Artificial Intelligence, Deep Learning, Machine Learning
Abstract: Friendly robots have extremely important application prospects in many fields. However, in unstructured environments, the interaction between a manipulator and a dynamic environment faces high uncertainty caused by random intrusions into the workspace and computational complexity brought by the multi-dimensional action space. Therefore, we propose a hierarchical planning algorithm based on global path guidance reinforcement learning to solve this problem at the decision and planning level. Specifically, the global path planning algorithm first produces a global reference path that ensures the target can be reached. Then the reference path is decomposed into consecutive local targets, which are combined with the objective function of reinforcement learning as local constraints. Finally, the reinforcement learning local planner generates the action of the manipulator based on the observed information. The simulation results show that our method is superior to the standard off-policy reinforcement learning algorithm in terms of learning speed and accuracy, which demonstrates the effectiveness of our algorithm.
|
|
09:05-09:25, Paper TuAT9.4 | |
MFFDR: An Advanced Multi-Branch Feature Fusion and Dynamic Reconstruction Framework for Enhancing Adversarial Robustness |
|
Liu, Shanwen | College of Computer Science, Sichuan Normal University |
Guo, Rongzuo | Sichuan Normal University |
Zhang, Xianchao | Sichuan Normal University |
Keywords: Application of Artificial Intelligence, Deep Learning, Machine Learning
Abstract: Deep Neural Networks (DNNs) are highly susceptible to adversarial noise, which can lead to erroneous predictions. In high-stakes scenarios, such as autonomous driving and medical diagnosis, the consequences of DNN inaccuracies can be dire. To address this issue, Adversarial Training (AT) has been widely adopted as an effective defense method. However, our analysis reveals two critical flaws in the traditional AT approach that hinder its adversarial robustness: (1) It focuses only on a subset of robust features during the training process. This narrow focus limits the model's ability to learn and perceive a diverse range of features. (2) It tends to overlook potential cues in non-robust features that could help the model make correct predictions. These cues, referred to as "positive activations" for simplicity, contain valuable information that can enhance the model's perception and understanding of the input data. To this end, we propose a novel plug-and-play framework called Multi-branch Feature Fusion and Dynamic Reconstruction (MFFDR), which leverages multi-branch attention mechanisms to enhance the model's perception of robust features and enrich the diversity of learned features. Moreover, we employ a dynamic weighting strategy to reconstruct non-robust features in order to utilize the positive activations embedded within them. Extensive experiments demonstrate that our method significantly improves the model's adversarial robustness and outperforms previous state-of-the-art methods.
|
|
09:25-09:45, Paper TuAT9.5 | |
BTP-CAResNet: An Encrypted Traffic Classification Method Based on Byte Transfer Probability and Coordinate Attention Mechanism |
|
Li, Junhao | Qilu University of Technology (Shandong Academy of Sciences) |
Zhang, Wei | Qilu University of Technology (Shandong Academy of Sciences) |
Shi, Huiling | Qilu University of Technology (Shandong Academy of Sciences) |
Keywords: Application of Artificial Intelligence, AI and Applications, Neural Networks and their Applications
Abstract: With the extensive application of network traffic encryption technology, the accurate and efficient classification of encrypted traffic has become a critical need for network management. Deep learning has become the predominant method for traffic classification, primarily involving the transformation of network traffic into grayscale images and their subsequent classification using Convolutional Neural Networks (CNNs). However, traditional grayscale image generation methods are plagued with issues of redundant and lost information, and conventional channel attention mechanisms are still insufficient in capturing key traffic features, collectively hindering the enhancement of classification performance. To tackle these issues, this paper introduces a classification method based on Byte Transfer Probability and Coordinate Attention Mechanism in Residual Network (BTP-CAResNet). This method, on the foundation of the classic ResNet architecture, incorporates a new grayscale image generation method that utilizes Byte Transfer Probability, effectively overcoming the deficiencies of traditional approaches. Additionally, this paper integrates a Coordinate Attention Mechanism into the ResNet model, which effectively overcomes the limitations of traditional channel attention mechanisms and further improves the performance of traffic classification. Experimental validation on the ISCX VPN-nonVPN dataset demonstrates that, compared to previous CNN-based methods, the method proposed in this paper exhibits superior performance in key metrics such as accuracy, precision, recall, and F1 score. It provides a new perspective for traffic classification based on convolutional neural networks.
|
|
TuAT10 |
MR10 |
Big Data and Intelligent Systems |
|
Chair: Li, Wei | University of Chinese Academy of Sciences |
|
08:05-08:25, Paper TuAT10.1 | |
SIKGC: Structural Information Prompt Based Knowledge Graph Completion with Large Language Models |
|
Li, Wei | University of Chinese Academy of Sciences |
Ge, Jingguo | University of Chinese Academy of Sciences |
Feng, Weihua | Zhengzhou Tobacco Research Institute of CNTC, China |
Zhang, Lei | Institute of Information Engineering,Chinese Academy of S |
Li, Liangxiong | Institute of Information Engineering, Chinese Academy of Science |
Wu, Bingzhen | Institute of Information Engineering, Chinese Academy of Science |
Keywords: AI and Applications, Big Data Computing, Deep Learning
Abstract: Knowledge Graph Completion (KGC) aims to enrich and complete the knowledge graph by discovering missing information from existing fact triples. However, existing KGC methods often overlook the utilization of structured knowledge within the knowledge base. In this paper, we propose a novel Large Language Models-based Knowledge Graph Completion framework, called SIKGC, which builds the structural information prompt to assist the knowledge graph completion tasks. Specifically, we arrange the triples in the knowledge graph as the sequences of text. By fusing the descriptions of entities, relations and their structural information as task-aware prompts, we input such prompts into large language models and regard the responses as prediction tasks. The experimental results on various public datasets show that the proposed method outperforms all baseline methods for the three knowledge completion tasks and attains state-of-the-art in triple classification. We also demonstrate that fine-tuning the smaller large language models (e.g., Baichuan2-13B, LLaMA2-13B, ChatGLM3-6B) with relevant data markedly enhances their KGC capabilities and significantly outperforms GPT-4.
|
|
08:25-08:45, Paper TuAT10.2 | |
RealDriftGenerator: A Novel Approach to Generate Concept Drift in Real World Scenario |
|
Lin, Borong | Xi'an Jiaotong-Liverpool University |
Huang, Chao | University of Southampton |
Zhu, Xiaohui | Xi'an Jiaotong-Liverpool University |
Jin, Nanlin | Xi’an Jiaotong-Liverpool University |
Keywords: Big Data Computing, Machine Learning
Abstract: Concept drift refers to changes over time in the probability distribution that generates the data in a data stream environment. In recent years, there has been increasing interest in drift detection models. However, due to the lack of labeled concept drift datasets, most researchers tend to use synthetic drift data generators for model training. These generators only have relatively simple feature distributions, which fail to capture the complexity found in real-world scenarios. This paper introduces a real-scenario concept drift label generator (RealDriftGenerator). This generator aims to preserve the complexity and temporal correlation of real-world scenarios while generating concept drifts with user-defined drift positions and drift widths. The validation results show that the temporal correlation coefficients of RealDriftGenerator are significantly higher than those of benchmark drift generators. Additionally, the ability of RealDriftGenerator to capture the complexity of real-world scenarios is 20% higher than that of benchmark drift generators (measured by model performance). The source code of RealDriftGenerator has been published at https://github.com/sniperrifle71/realDriftGenerator.
|
|
08:45-09:05, Paper TuAT10.3 | |
An Agent-Based Model of Opinion Dynamics with Hierarchical Thinking |
|
Ou, Lizhen | National University of Defense Technology |
Yao, Yiping | National University of Defense Technology |
Luo, Jiao | National University of Defense Technology |
Tang, Wenjie | National University of Defense Technology |
Keywords: Agent-Based Modeling, Artificial Social Intelligence
Abstract: Opinion dynamics studies the principles governing the evolution of collective opinion, offering valuable insights into the comprehension of social phenomena and the forecasting of group behavior. However, existing opinion dynamics models often overlook the impact of both opinion climate and cognitive capacities on interactive behaviors, causing simulation outcomes to diverge from real-world observations. Addressing this gap, we propose a novel opinion dynamics model based on hierarchical thinking to describe opinion evolution on social networks. Individuals are classified into different levels according to their cognitive abilities. They act with bounded rationality at their respective levels to optimize both the promotion of personal opinions and the avoidance of cyberbullying. Through simulation analysis, we found that users with high levels of hierarchical thinking play a crucial role. They can discern the opinion climate and articulate their opinions, acting as bridges in the evolution of public opinion. Their opinions can reach the bounded confidence range of more people, thereby enabling polarization to shift to consensus under the same conditions. Furthermore, this effect is independent of individuals' inherent attributes, which is more in line with real-life scenarios.
|
|
TuAT11 |
MR11 |
Brain-Machine Interfaces (BMIs) 2 |
Regular Papers - Cybernetics |
Chair: Shukla, Rishabh | Indian Institute of Technology Jammu |
|
08:25-08:45, Paper TuAT11.2 | |
RoPAR: Enhancing Adversarial Robustness with Progressive Image Aggregation and Reordering Noise |
|
An, Jong-Hyun | Korea University |
Hong, Jung-Ho | Korea University |
Kim, Hee-Dong | Korea University |
Lee, Seong-Whan | Korea University |
Keywords: Deep Learning, Application of Artificial Intelligence, Neural Networks and their Applications
Abstract: Adversarial attacks mislead deep neural network classifiers with slight perturbations, underscoring the necessity for the development of robust defenses to ensure the secure and responsible use of artificial intelligence. Recent research has shown that diffusion-based adversarial purification methods have emerged as a promising defense technique, but often suffer from computational inefficiencies and suboptimal results. To address these issues, we propose RoPAR, an innovative approach that enhances robustness against adversarial attacks by aggregating purified images at intermediate steps of the diffusion process. Our method improves model robustness while reducing the required diffusion steps. We also introduce a technique for reordering Gaussian noise to minimize semantic information loss while removing adversarial perturbations. These enhancements significantly reduce the number of function evaluations from 200 to 6, achieving a robust accuracy of 92.39% against preprocessor-blind PGD attacks on CIFAR-10, a 2.29 percentage point improvement over state-of-the-art. Moreover, our method demonstrates its effectiveness in real-world scenarios, achieving 87.46% accuracy on CIFAR-10C.
|
|
08:45-09:05, Paper TuAT11.3 | |
Bridging the Gap: Creating Authentic Biometric Templates for Secure Authentication Systems |
|
Shukla, Rishabh | Indian Institute of Technology Jammu |
Kaur, Harkeerat | Indian Institute of Technology Jammu |
Echizen, Isao | National Institute of Informatics Tokyo |
Keywords: Biometric Systems and Bioinformatics
Abstract: Fingerprints serve as a primary means of uniquely identifying individuals. However, employing fingerprints in online mode poses significant privacy and security risks, since it is susceptible to several forms of attack. In response, we propose an innovative approach to convert the original fingerprint into a secure template that can be retained and used for authentication. The new templates resemble original human fingerprints and ensure privacy through non-invertibility. This study presents a method for generating highly authentic fingerprint templates that allow a stolen fingerprint to be revoked and cancelled. For training and testing, we used the dataset derived from the Vikriti-ID fingerprint. The collection has 25,000 distinct fingerprint samples, divided into five classes, with each class containing 5,000 samples. In the testing phase, overall performance was evaluated based on matching performance, including EER and AUC.
|
|
09:05-09:25, Paper TuAT11.4 | |
PromotiCon: Prompt-Based Emotion Controllable Text-To-Speech Via Prompt Generation and Matching |
|
Lee, Ji-Eun | Korea University |
Kim, Seung-Bin | Korea University |
Cho, Deok-Hyeon | Korea University |
Lee, Seong-Whan | Korea University |
Keywords: Deep Learning, Application of Artificial Intelligence, Neural Networks and their Applications
Abstract: Text-to-speech (TTS) technologies have recently expanded to incorporate natural language prompts for user-friendly control of speech styles, driven by significant advancements in language models. Traditional prompt-based TTS research, however, typically requires large-scale prompt generation that often necessitates costly human annotations. To address this challenge, we propose PromotiCon, a system that leverages prompts generated without human annotations to control emotions in speech. Our model utilizes abundant prompts generated using a large language model. Additionally, we propose an emotion distance-based prompt-speech matching method to appropriately pair the generated prompts with the most resembling speech data. To enhance speaker adaptation, we introduce a semi-supervised approach that allows the joint utilization of multi-speaker data without emotion labels. As a result, our system facilitates zero-shot emotional speech synthesis. Our experimental results confirm the effectiveness of our approach. Audio samples are available at https://promoticon.github.io/.
|
|
09:25-09:45, Paper TuAT11.5 | |
CHBaR: Conditional Hilbert Schmidt Bottleneck As Regularization for Adversarial Robustness |
|
Jung, Seung-Wook | Korea University |
Hong, Jung-Ho | Korea University |
Kim, Hee-Dong | Korea University |
Lee, Seong-Whan | Korea University |
Keywords: Deep Learning, Application of Artificial Intelligence, Neural Networks and their Applications
Abstract: Adversarial attacks pose a significant threat to security-critical applications by deliberately deceiving model predictions. Numerous works attempt to create robust models by encoding useful information into intermediate representations. However, these representations still contain too much information about the training data, which hinders improving the robustness of the model. To mitigate this issue, we propose a novel approach, CHBaR, that incorporates class-conditioned information into intermediate representations. The class-conditioned information plays the role of weight components which are multiplied with the intermediate representations to produce class-conditioned representations. We utilize an attribution-based explanation method to obtain this class-conditioned information. As a result, the weight components emphasize class-relevant features by highlighting relevant information from the target class. This weighting process easily integrates the target class without complex computations and conceals useless representations, thus enhancing model predictions by masking features unrelated to the class. Extensive experiments demonstrate the effectiveness of our proposed method in enhancing adversarial robustness. In particular, on the SVHN dataset, our proposed method shows an improvement of 6.98 percentage points over the baseline model under the PGD40 adversarial attack with the TRADES training setting.
|
|
TuAT12 |
MR12 |
Haptic and Human-Computer Interaction 6 |
Regular Papers - HMS |
Chair: Panagopoulos, Dimitrios | Cranfield University |
|
08:25-08:45, Paper TuAT12.2 | |
GRUI: A Novel Gesture Recognition Utilizing UWB Sensor and IMU |
|
Lee, Dongjae | Korea University |
Yoo, Kyeonghyun | Korea University |
Jung, Wooyong | Korea University |
Kim, Hwangnam | Korea University |
Keywords: Human-Computer Interaction, Intelligence Interaction, Human-Machine Cooperation and Systems
Abstract: With recent advancements in sensor and artificial intelligence technologies, the reliability of gesture recognition has significantly improved, prompting various industrial fields to adopt this technology. However, most gesture recognition systems rely on optical methods because of their high accuracy, despite requiring complex and computationally intensive processes. Moreover, these systems are associated with high construction costs and are susceptible to environmental factors. This paper introduces a novel gesture recognition system, which effectively tracks and estimates gestures using cost-effective ultra-wideband (UWB) sensors and inertial measurement units (IMUs). The system acquires gesture position data through UWB sensors and includes essential data processing steps such as the detection and removal of abnormal data via the IMU, data smoothing with a Kalman filter, and data normalization and scaling. Notably, normalization and scaling are achieved by converting the position data into grayscale images, ensuring the consistency of data features and enhancing gesture recognition accuracy across diverse users. The proposed system employs a convolutional neural network (CNN) model to estimate gestures from these images. Comparative analyses demonstrate that the proposed system exhibits superior gesture classification performance compared to systems utilizing a long short-term memory (LSTM) model and those employing the same CNN model without the aforementioned data processing steps. Therefore, this system is not only cost-effective but also efficiently tracks and estimates gestures, offering significant improvements over existing methods.
|
|
08:45-09:05, Paper TuAT12.3 | |
Generating Explanations for Autonomous Robots Using Assumption-Alignment Tracking |
|
Cao, Xuan | Brigham Young University |
Crandall, Jacob | Brigham Young University |
Goodrich, Michael | Brigham Young University |
Keywords: Human-Machine Interaction, Intelligence Interaction
Abstract: As the techniques of autonomous robots advance, there is an increasing demand for robots to provide explanations for their behavior. There are two commonly used explanation types. The first type emphasizes that a robot’s policy is the best (or only) option that satisfies a specific property produced by its decision-making algorithms. The second explanation type is used when a robot fails and describes the cause of an error state that led to the failure. This paper proposes a new explanation type derived from a robot's proficiency self-assessment. The proposed explanation type not only supplements the first explanation type under typical operating conditions but also includes the second explanation type when the robot fails. The proposed explanation type is based on assumption-alignment tracking (AAT), a novel method for robot proficiency self-assessment. AAT provides three pieces of information for explanation generation: (1) an assessment of the veracity of the assumptions on which the robot's generators rely; (2) a proficiency assessment measured by the probability that the robot will successfully accomplish its task; (3) a counterfactual proficiency assessment computed by hypothetically varying assumptions. The information provided by AAT fits the situation awareness-based framework for explainable artificial intelligence. Examples of generated explanations are demonstrated using a simulated robot setting up a table with different blocks.
|
|
09:05-09:25, Paper TuAT12.4 | |
Selective Exploration and Information Gathering in Search and Rescue Using Hierarchical Learning Guided by Natural Language Input |
|
Panagopoulos, Dimitrios | Cranfield University |
Perrusquia, Adolfo | Cranfield University |
Guo, Weisi | Cambridge University |
Keywords: Human-centered Learning, Human-Machine Interaction, Human-Machine Cooperation and Systems
Abstract: In recent years, robots and autonomous systems have become increasingly integral to our daily lives, offering solutions to complex problems across various domains. Their application in search and rescue (SAR) operations, however, presents unique challenges. Comprehensively exploring the disaster-stricken area is often infeasible due to the vastness of the terrain, transformed environment, and the time constraints involved. Traditional robotic systems typically operate on predefined search patterns and lack the ability to incorporate and exploit ground truths provided by human stakeholders, which can be the key to speeding up the learning process and enhancing triage. Addressing this gap, we introduce a system that integrates social interaction via large language models (LLMs) with a hierarchical reinforcement learning (HRL) framework. The proposed system is designed to translate verbal inputs from human stakeholders into actionable RL insights and adjust its search strategy. By leveraging human-provided information through LLMs and structuring task execution through HRL, our approach not only bridges the gap between autonomous capabilities and human intelligence but also significantly improves the agent's learning efficiency and decision-making process in environments characterised by long horizons and sparse rewards.
|
|
09:25-09:45, Paper TuAT12.5 | |
Sensor System for Real-Time Classification of Manual Construction Tasks with Power Tools for Exoskeleton Control |
|
Leudesdorff, Bent | Fraunhofer IPA |
Salazar Strümpler, Lydia Rebeca | Fraunhofer IPA |
Dobosz, Thomas | Institute of Industrial Manufacturing and Management, University |
Maufroy, Christophe | Fraunhofer Institute for Manufacturing Engineering and Automatio |
Schneider, Urs | Institute of Industrial Manufacturing and Management, University |
Bauernhansl, Thomas | Institute of Industrial Manufacturing and Management, University |
Keywords: Human-Machine Cooperation and Systems, Assistive Technology, Human-Machine Interaction
Abstract: Work-related musculoskeletal disorders (WMSD) continue to be a significant cause of work incapacity. Exoskeletons have the potential to prevent these disorders, with passive exoskeletons already proving their usefulness but also displaying limitations. However, there is currently a lack of suitable methods to control active exoskeletons, which offer additional advantages like supporting the user just when needed. This paper proposes a sensor system and a method to classify different activities based on the kinematic and activation signals of the power tools used. First, requirements and thresholds for the sensor system and its signals were derived from representative activities in the construction environment. The paper then introduces a sensor system that collects the tool's kinematic signals and activation. The sensor system consists of an inertial measurement unit (IMU), a pressure sensor, and a WiFi-capable micro-controller to stream the data. Based on the signals, a threshold-based algorithm capable of identifying six predefined activities is presented. The paper presents a suitable test course based on a real situation in the construction industry to evaluate the proposed sensor system and algorithm. With the developed test course, a study is conducted and the activities are classified based on the signals of the sensor system. The results demonstrate that the defined activities can be distinguished based on the kinematic and activation signals of the power tools.
|
|
TuAT13 |
Foyer |
2P - AI Applications |
|
Chair: Yagi, Naomi | University of Hyogo |
|
08:05-08:25, Paper TuAT13.1 | |
CiRA CORE: A Low Code Platform That Makes AI Work for Industry 4.0 |
|
Loo, ChuKiong | University of Malaya |
Boonsang, Siridech | King Mongkut’s Institute of Technology |
Sasisaowapak, Thanyathep | King Mongkut’s Institute of Technology |
Chuwongin, Santhad | King Mongkut’s Institute of Technology |
Tongloy, Teerawat | King Mongkut’s Institute of Technology |
Nahavandi, Saeid | Swinburne University of Technology |
Wong, Kok Wai | Murdoch University |
Keywords: Application of Artificial Intelligence, Cloud, IoT, and Robotics Integration, AIoT
Abstract: CiRA CORE is a central hub designed to connect AI technology creation with practical application, making it easier to work with ROS (Robot Operating System) and link different systems through a user-friendly drag-and-drop interface. This approach removes the need for extensive coding, making the platform accessible to those with minimal programming experience. CiRA CORE offers a comprehensive suite of features for AI development and robot control, including algorithm creation, AI model training, and device integration commonly used in industrial settings. It supports tasks like image recognition and facilitates data storage, labeling, and integration with other systems for data-driven AI development. Overall, CiRA CORE aims to democratize AI development and robot control, simplifying AI development for Industry 4.0 applications, and leading to increased efficiency, reduced costs, and improved safety in industrial processes. This paper reports the progress of the CiRA CORE training modules funded by the SMCS TEAM Program Award. The project has completed the design of a 6-axis robot 3D training kit and simulation models for CiRA CORE training modules. The next steps involve developing 3D-printed robots and training materials. The main goal is to democratize advanced robotics and AI by simplifying integration through a visual, node-based programming interface. This approach reduces the need for complex coding, making these technologies accessible to users with limited programming experience. This initiative aims to foster widespread adoption in business and industrial settings, aligning with IEEE SMC's mission to promote professional growth and innovation in robotics and AI.
|
|
08:25-08:45, Paper TuAT13.2 | |
Towards an Optimal Design: What Can We Recommend to Elon Musk |
|
Ceberio, Martine | The University of Texas at El Paso |
Kosheleva, Olga | University of Texas at El Paso |
Kreinovich, Vladik | University of Texas at El Paso |
Nguyen, Hung T. | New Mexico State University |
Keywords: Consumer and Industrial Applications, Large-Scale System of Systems, Manufacturing Automation and Systems
Abstract: Elon Musk's successful "move fast and break things" strategy is based on the fact that in many cases, we do not need to satisfy all the usual constraints to be successful. By sequentially trying smaller numbers of constraints, he finds the smallest number of constraints that are still needed to succeed -- and using this smaller number of constraints leads to a much cheaper (and thus, more practical) design. In this strategy, Musk relies on his intuition -- which, like all intuitions, sometimes works and sometimes doesn't. To replace this intuition, we propose an algorithm that minimizes the worst-case cost of finding the smallest number of constraints.
|
|
08:45-09:05, Paper TuAT13.3 | |
Development of Tracking System for Swallowing Movement Using Optical Flow |
|
Yagi, Naomi | University of Hyogo |
Nishihara, Ryosuke | University of Hyogo |
Kawamura, Naoko | Himeji Dokkyo University |
Maezawa, Hitoshi | Kansai Medical University |
Kashioka, Hideki | National Institute of Information and Communications Technology |
Hirata, Masayuki | Osaka University |
Yanagida, Toshio | National Institute of Information and Communications Technology |
Sakai, Yoshitada | Kobe University |
Hata, Yutaka | University of Hyogo |
Keywords: AI and Applications, Application of Artificial Intelligence, Computational Intelligence
Abstract: Japan's population is aging at a speed unparalleled in other countries, and countermeasures against population aging and the worsening of disease in people with disabilities have become urgent issues. Pneumonia and aspiration pneumonia are the leading causes of death. It is said that swallowing function tends to decline from around the age of 40; however, it is important to keep it in good condition with as little deterioration as possible. The gold standard for evaluating swallowing function is the swallowing contrast test; however, X-ray exposure prevents repeated testing. In addition, the Repetitive Saliva Swallowing Test (RSST), a screening test, is difficult to use for self-checking. Therefore, in this study, we develop a system for self-evaluating swallowing ability in order to keep swallowing function healthy. The system applies optical flow and the DeepLabCut artificial intelligence tool. As a result, we were able to visualize the movement of the larynx during swallowing.
|
|
09:05-09:25, Paper TuAT13.4 | |
The Improved Mango Plant Detection Model Based on Attention Module Mechanism |
|
Sung, Wen-Tsai | National Chin-Yi University of Technology |
Isa, Indra Griha Tofik | National Chin-Yi University of Technology |
Keywords: AIoT, Computational Intelligence, Soft Computing, Socio-Economic Cybernetics
Abstract: Agriculture is one of the sources of income a region can rely on to support its economy. Traditional agriculture relies primarily on human performance and observation, resulting in greater production costs and, subsequently, higher selling prices. Artificial intelligence-based technology can be used to reduce production costs, increase productivity, and provide consumer convenience. An indicator that is easy to interpret in measuring the quality and optimization of plant growth is the visualization of the condition of the leaves. The artificial intelligence technique that can be implemented in this regard is the object detection model. However, the challenge is the complex, multi-object, and multi-intersection condition of the leaves, which causes the model to be less optimal in conducting classification and detection tasks regarding whether the leaf condition is good or not. A YOLOv7 model will be employed in order to detect leaf quality, whether in an “optimal” or “not optimal” condition. To enhance the model's performance by improving accuracy through feature extraction enhancement, YOLOv7 will be integrated with the attention module, called the convolutional block attention module (CBAM). The case study in this research is detecting a mango plant which is one of the plants that can provide a high economic impact and the object observed is the mango plant leaf. Several previous studies related to the implementation of attention modules in object detection include the improved pest-YOLO for real-time pest detection by combining YOLOv3 with efficient channel attention (ECA) and a transformer encoder. The ECA module and transformer encoder were integrated into the backbone and neck block systems of YOLO [1]. The lightweight YOLO model combined with SE-CSPGhostnet by improving the backbone block which employs squeeze-and-excitation networks (SENet) and a convolution technique consisting of regular convolution and ghost convolution [2]. 
There is a highlighted improvement of YOLOv7 compared to the previous version of YOLO, which is Extended Efficient Layer Aggregation Networks (E-ELAN). YOLOv7's learning ability is enhanced by using this network while maintaining the transition layer's architecture. E-ELAN enhance
|
|
09:25-09:45, Paper TuAT13.5 | |
AI-Enhanced Web Form Development: Tackling Accessibility Barriers with Generative Technologies |
|
Saraswathi, Pradeep Kumar | Salesforce |
Keywords: Assistive Technology, User Interface Design, Companion Technology
Abstract: Web forms play a pivotal role in digital interfaces but frequently pose significant accessibility challenges. This paper explores the main barriers to creating accessible web forms and investigates how generative AI technologies can provide solutions. We highlight core issues such as accurate labeling, keyboard navigation, error management, focus control, visual design factors, placeholder text usage, assistive technology compatibility, handling of complex inputs, responsive design, cognitive load reduction, and ongoing testing. For each of these challenges, we assess its effect on accessibility and present innovative AI-driven strategies. Our findings illustrate how AI can streamline the development process by automating label generation, improving tab indexing, enhancing real-time error detection, refining focus control, offering contrast improvement suggestions, and simulating interactions with assistive technologies. We conclude that incorporating generative AI into web form development can markedly improve accessibility, making digital experiences more inclusive for users of all abilities. This not only supports compliance with legal and ethical standards but also fosters a more inclusive online environment, enhancing user satisfaction and overall experience.
|
|
TuAPSR |
Room T14 |
Poster Presentation - Session 1 |
Poster Session |
|
08:05-09:45, Paper TuAPSR.1 | |
UAVs for Sustainable Palm Oil Production: An Ant Colony Approach to Efficient Path Planning |
|
Lai, Weng Kin | Tunku Abdul Rahman University of Management and Technology |
Chen, Pak Hen | Tunku Abdul Rahman University of Management and Technology |
Lim, Li Li | Tunku Abdul Rahman University of Management and Technology |
Lee, Patrick Sheng Siang | AONIC |
Keywords: Application of Artificial Intelligence, Swarm Intelligence, AI and Applications
Abstract: The production of palm oil on a commercial scale is labour intensive with many of its processes handled by humans. In some countries, there can be as many as 500,000 plantation workers in the palm oil sector involved in labour intensive work in large plantations. However, such dependence on humans for low skill manual work has led to many problems. Unmanned aerial vehicles (UAVs) have been seen as a possible alternative to support some of these processes that require low skills in the palm oil industry. However, the flying time of the UAVs is finite and hence it is important to maximize the number of palm trees that it can service. In this paper, an Ant Colony System (ACS) with a novel path constructor was used to identify good flight paths for UAVs in large palm oil plantations to help improve the efficiency for some of the agricultural activities. Good results were obtained for various data sets especially when compared with the standard ACS as well as those by the human experts.
|
|
08:05-09:45, Paper TuAPSR.2 | |
Incremental Learning Algorithms for Broad Learning System with Node and Input Addition |
|
Chen, Guang-Ze | University of Macau |
Jin, Junwei | Henan University of Technology |
Sun, Hai-Wei | University of Macau |
Chen, C. L. Philip | University of Macau |
Keywords: Computational Intelligence, AI and Applications, Machine Learning
Abstract: The Broad Learning System (BLS) has been established as an effective flat network alternative to Deep Neural Networks (DNNs), delivering high efficiency while achieving competitive accuracy. Despite its advantages, the incremental learning methods of BLS face challenges in stability and computation when expanding with new nodes or input. We introduce two novel incremental learning algorithms based on factorization updates for BLS that optimize node and input additions to overcome these limitations. Our node addition algorithm utilizes QR decomposition and Cholesky factorization, using the update of the Cholesky factor instead of pseudo-inverse computations. For input addition, we propose an iterative Cholesky factor update algorithm. Our algorithms demonstrate not only faster computation compared to the existing BLS but also improved testing accuracy on the MNIST or Fashion-MNIST dataset. This work presents a significant step forward in the practical application and scalability of BLS in various data-dense environments.
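The factorization-update idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it only shows the generic rank-one Cholesky update, under the assumption that adding one input sample `a` changes the regularized Gram matrix from `A^T A + lam*I` to `A^T A + lam*I + a a^T`. All function names and the toy data are hypothetical.

```python
import math

def cholesky(a):
    """Return lower-triangular L with L L^T = a (a symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low

def chol_rank_one_update(low, x):
    """Return the Cholesky factor of L L^T + x x^T without refactorizing."""
    n = len(x)
    low = [row[:] for row in low]  # work on copies, leave inputs untouched
    x = list(x)
    for k in range(n):
        r = math.hypot(low[k][k], x[k])
        c, s = r / low[k][k], x[k] / low[k][k]
        low[k][k] = r
        for i in range(k + 1, n):
            low[i][k] = (low[i][k] + s * x[i]) / c
            x[i] = c * x[i] - s * low[i][k]
    return low

def gram(rows, lam):
    """Regularized Gram matrix A^T A + lam*I for a list of sample rows."""
    n = len(rows[0])
    g = [[lam if i == j else 0.0 for j in range(n)] for i in range(n)]
    for r in rows:
        for i in range(n):
            for j in range(n):
                g[i][j] += r[i] * r[j]
    return g

# Toy data (hypothetical): three existing samples, one newly added sample.
rows = [[1.0, 2.0, 0.5], [0.0, 1.0, -1.0], [2.0, 0.0, 1.0]]
new_row = [0.5, -1.5, 2.0]
lam = 0.1

updated = chol_rank_one_update(cholesky(gram(rows, lam)), new_row)
recomputed = cholesky(gram(rows + [new_row], lam))
max_diff = max(abs(updated[i][j] - recomputed[i][j])
               for i in range(3) for j in range(3))
```

The update costs O(n^2) per added sample, compared with O(n^3) for refactorizing from scratch, which is the usual motivation for factor updates over recomputing a pseudo-inverse.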
|
|
08:05-09:45, Paper TuAPSR.3 | |
RTS-DETR: Efficient Real-Time DETR for Small Object Detection |
|
Li, Wenqiang | Qilu University of Technology (Shandong Academy of Sciences) |
Li, Aimin | Qilu University of Technology |
Li, Zhiyao | Qilu University of Technology (Shandong Academy of Sciences) |
Kong, Xiaotong | Qilu University of Technology (Shandong Academy of Sciences) |
Zhang, Yuechen | Qilu University of Technology (Shandong Academy of Sciences) |
Keywords: Deep Learning, AI and Applications
Abstract: In recent years, Transformer-based object detection models (DETRs) have played a major role in various fields. However, the DETR series models are not satisfactory in small object detection, mainly because of DETR's large computational cost, the loss of much feature information in the feature fusion stage, and the low tolerance of small objects to Intersection over Union (IoU). To solve these problems, we propose a near real-time detection model, RTS-DETR. In this paper, we revisit RT-DETR, which effectively handles multi-scale features by decoupling intra-scale interactions and cross-scale fusion, but this loses a lot of positive local information. To this end, we improve the efficient hybrid encoder. We propose a new positional encoding method that enables the hybrid encoder to more accurately convert the input feature sequence into a high-dimensional representation, and a new feature fusion module to enhance the model's ability to capture local features. Furthermore, to improve the tolerance of small objects to IoU, we combine Normalized Wasserstein Distance (NWD) with Shape-IoU to optimize the model. This method more accurately takes into account the shape and size of objects, thereby improving detection accuracy. Our model achieves an accuracy of 38.8% (in terms of mAP_{@0.5}) on the widely used VisDrone dataset, an improvement of 2.5% over RT-DETR with ResNet-18 as the backbone network.
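To make the IoU-tolerance point concrete, the following sketch contrasts plain IoU with a Normalized Wasserstein Distance between two small boxes. It assumes the commonly used formulation in which each box (cx, cy, w, h) is modeled as a 2D Gaussian and NWD = exp(-W2 / C). The constant `c` is dataset dependent and the value used here is an arbitrary placeholder. The paper's combination with Shape-IoU is not reproduced.

```python
import math

def iou(a, b):
    """Plain IoU of two boxes given as (cx, cy, w, h)."""
    ax1, ay1, ax2, ay2 = a[0] - a[2] / 2, a[1] - a[3] / 2, a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1, bx2, by2 = b[0] - b[2] / 2, b[1] - b[3] / 2, b[0] + b[2] / 2, b[1] + b[3] / 2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union

def nwd(a, b, c=12.8):
    """Normalized Wasserstein Distance; c is an assumed, dataset-dependent constant."""
    w2 = math.sqrt((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
                   + ((a[2] - b[2]) / 2) ** 2 + ((a[3] - b[3]) / 2) ** 2)
    return math.exp(-w2 / c)

# Two 4x4 boxes whose centers are 5 pixels apart: they do not overlap at all.
box_a = (10.0, 10.0, 4.0, 4.0)
box_b = (15.0, 10.0, 4.0, 4.0)

iou_ab = iou(box_a, box_b)   # zero overlap, so IoU carries no gradient signal
nwd_ab = nwd(box_a, box_b)   # still a smooth similarity in (0, 1)
nwd_aa = nwd(box_a, box_a)   # identical boxes
```

For tiny objects, a shift of a few pixels drives IoU to zero, whereas the Wasserstein-based measure decays smoothly with distance, which is why it is often preferred as a similarity or loss term for small targets.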
|
|
08:05-09:45, Paper TuAPSR.4 | |
Synergizing Internal and External Knowledge: Prompt Engineering for Efficient and Effective Large Language Model Reasoning |
|
Lu, Gewei | Shanghai Jiao Tong University |
He, Chaofan | Shanghai Jiao Tong University |
Shen, Liping | Shanghai Jiao Tong University |
Keywords: Application of Artificial Intelligence, Deep Learning, Knowledge Acquisition
Abstract: Large language models (LLMs), such as ChatGPT, have demonstrated remarkable capability in question answering but face challenges when it comes to knowledge-based reasoning, such as limited training data and hallucination. To address these challenges, integrating LLMs with knowledge graphs (KGs) has emerged as a promising solution. However, the cost associated with training and inference of LLMs is high. Our method integrates the Retrieval-Augmented Generation (RAG) paradigm, incorporating relevant information from KGs alongside the question to enhance LLMs' reasoning process without training. Moreover, we propose a novel concept of self-knowledge motivation to reduce the overhead of inference, which prompts LLMs to integrate retrieved information with their internal knowledge for reasoning before seeking additional queries to KGs. Experimental results showcase improvements in answer accuracy and a reduction in LLMs' API calls compared to the latest published state-of-the-art (SOTA) method employing an identical paradigm, underscoring the efficiency and effectiveness of our method.
|
|
08:05-09:45, Paper TuAPSR.5 | |
Try-Then-Eval: Equipping an LLM-Based Agent with a Two-Phase Mechanism to Solve Computer Tasks |
|
Cao, Thanh-Duy | Ho Chi Minh University of Science, VNU-HCM |
Nguyen, Phong Phu | University of Science - VNUHCM |
Le, Vy | University of Information Technology |
Nguyen, Long | University of Science, Ho Chi Minh City, Vietnam |
Nguyen, Vu | University of Science, Vietnam National University |
Keywords: Application of Artificial Intelligence, Computational Intelligence, Neural Networks and their Applications
Abstract: Building an autonomous intelligent agent capable of carrying out web automation tasks from descriptions in natural language offers a wide range of applications, including software testing, virtual assistants, and task automation in general. However, recent studies addressing this problem often require the manual construction of prior human demonstrations. In this paper, we approach the problem by leveraging the idea of reinforcement learning (RL) with a two-phase mechanism to form an agent using LLMs for automating computer tasks without relying on human demonstrations. We evaluate our LLM-based agent using the MiniWob++ dataset of web-based application tasks, showing that our approach achieves an 85% success rate without prior demonstrations. The results also demonstrate the agent's capability of self-improvement through training.
|
|
08:05-09:45, Paper TuAPSR.6 | |
Decrease the Prompt Uncertainty: Adversarial Prompt Learning for Few-Shot Text Classification |
|
Weng, Jinta | School of Cyber Security, University of Chinese Academy of Scien |
Zhang, Zhaoguang | Guangzhou University |
Jing, Yaqi | National Computer Network Emergency Response Technical Team/Coor |
Niu, Chenxu | China |
Huang, Heyan | School of Computer Science and Technology, Beijing Institute Of |
Hu, Yue | School of Cyber Security, University of Chinese Academy of Scien |
Keywords: Artificial Social Intelligence, AI and Applications, Machine Learning
Abstract: With few-shot learning abilities, pre-trained language models (PLMs) have achieved remarkable success in classification tasks. However, recent studies have shown that the performance of PLMs is vulnerable to different prompts and to the instability of the prompt-based learning process. To address this challenge, we explore appropriate perturbation addition in adversarial training and integrate the global knowledge of the full-parameter fine-tuned PLM. Specifically, we propose an adversarial prompt learning model (ATPET) and ATPET with fine-tuning (ATPETFT), which incorporates fine-tuning knowledge into the prompt learning process. Through extensive experiments on several few-shot classification tasks and challenging data settings, we demonstrate that our methods consistently improve the robustness while maintaining the effectiveness of PLMs.
|
|
08:05-09:45, Paper TuAPSR.7 | |
Enhancing Autofocus Performance through Predictive Motion-Targeting and Self-Attention in a Deep Reinforcement Learning Framework |
|
Wei, Xiaolin | Chongqing University |
Yang, Ruilong | Chongqing University |
Wu, Xing | Chongqing University |
Wang, Chengliang | Chongqing University |
Wang, Haidong | Southwest Hospital of Army Medical University |
Wang, Hongqian | Southwest Hospital of Army Medical University |
Tang, Tao | Chongqing University |
Keywords: Image Processing and Pattern Recognition, AI and Applications, Neural Networks and their Applications
Abstract: In focusing tasks on moving targets, traditional methods that rely on maximizing contrast struggle to capture moving objects due to insufficient focusing speed. Deep learning-based methods have attempted to directly predict the optimal focal length for the target; however, due to low prediction accuracy, they often lead to out-of-focus situations when capturing moving objects. In recent years, some approaches have utilized reinforcement learning to automatically explore focal length adjustment patterns, thus achieving better results than traditional methods. However, these approaches have not considered the motion characteristics of the targets, leading to a need for further improvement in focusing performance. To overcome these limitations, we introduce a motion-based feature and deep reinforcement learning-driven autofocus algorithm named MF-DRLAF for moving targets. This novel method tracks the object, predicts its motion state through feature extraction, and uses deep reinforcement learning to dynamically adjust the focus. We utilize a self-attention mechanism to adaptively learn various motion patterns and employ a feature pool structure to enhance processing efficiency. Experiments and real-world testing on a Google Pixel3 demonstrate that our approach significantly enhances autofocus performance on moving objects, highlighting its potential for broader imaging applications. This approach offers a promising direction for future development in autofocus technology.
|
|
08:05-09:45, Paper TuAPSR.8 | |
Fractional Order Controller Design for LFC of Two-Area Interconnected Power System with Time Delay Based on IMC Approach |
|
K, Gnaneshwar | PDPM IIITDM Jabalpur |
Padhy, Prabin Kumar | PDPM IIITDM Jabalpur |
Keywords: System Modeling and Control, Intelligent Power Grid, Control of Uncertain Systems
Abstract: Load frequency control (LFC) of a two-area connected electric power system is vital for maintaining grid stability and reliability by matching power generation with load demand. Thus, this work proposes an analytical approach for designing a fractional order (FO) controller to regulate the LFC of a two-area connected electrical power system with time delay. First, the interconnected electrical power system is accurately modelled as an FO system with time delay. Then, the FO controller is designed using the internal model control (IMC) technique, where a low-pass filter (LPF) is considered to mitigate the effect of the disturbances. The designed FO controller involves a single tuning parameter, which is analytically determined using gain crossover frequency criteria. Disturbance and parametric uncertainty analyses have been carried out to assess the efficacy of the proposed method under variation of the tuning parameter. Then, the frequency and tie-line power fluctuations are estimated under nominal and parametric uncertainty conditions. Also, its performance has been compared to recent state-of-the-art techniques for precise efficacy analysis.
|
|
08:05-09:45, Paper TuAPSR.9 | |
SELus: Towards Spatio-Temporal Modeling and Quantitative Evaluation for Cyber-Physical Systems |
|
Zhang, Quanguo | East China Normal University |
Liu, Jing | East China Normal University |
Liu, Mingxing | Nuclear Power Institute of China |
Huang, Yanhong | East China Normal University |
Hou, Rongbin | Nuclear Power Institute of China |
Shi, Jianqi | East China Normal University |
Keywords: System Modeling and Control, Cyber-physical systems, Modeling of Autonomous Systems
Abstract: Synchronous languages are routinely used to model safety-critical control systems. In recent years, they have gradually been applied to cyber-physical systems (CPS), which emphasise high levels of correctness and safety. They are based on the assumption that the system reacts instantaneously to input events and can compute the output before the next input event, so they are well suited for expressing temporal logic. However, they lack effective constructs for expressing spatial properties in CPS. Moreover, spatio-temporal properties in CPS are indispensable, requiring not only qualitative analysis but also quantitative analysis. Therefore, we propose SELus, a new synchronous language based on Lustre, to provide the capability of modeling spatio-temporal properties in CPS, enabling the representation of spatial topological relationships and the performance of quantitative analysis on them. To formally verify the SELus model, we introduce a set of mapping rules to transform the SELus model into a Ptolemy II model. The resulting model is used in Ptolemy II to perform quantitative analysis of the SELus model. Experiments are conducted on a lane-changing system, showcasing the usability and effectiveness of our language.
|
|
08:05-09:45, Paper TuAPSR.10 | |
Wheeled Mobile Robots on Rough Terrains As Stochastic Nonholonomic Systems |
|
Gzenda, Vaughn | Carleton University |
Chhabra, Robin | Carleton University |
Keywords: Control of Uncertain Systems, Modeling of Autonomous Systems, Robotic Systems
Abstract: In this paper, we investigate the motion of wheeled mobile robots on rough terrains modeled as noisy nonholonomic constraints. Such constraints are the natural extension of ideal nonholonomic constraints when the Stratonovich process is directly introduced in the constraint equations. The resulting stochastic model can capture motion on rough surfaces, random deformation in the wheel-ground contact, or stochastic loss/gain of traction. We study a differential robot with ideal noisy and affine noisy constraints, where each case models a certain aspect of motion on rough terrains. We then investigate their corresponding stochastic dynamics and the propagation of mean and covariance through Monte-Carlo simulations. The proposed model for roving rough terrains has the potential to serve as the stochastic model employed in model-based motion planning, pose estimation, and control of rover systems. The main challenge will be dealing with the nonlinear appearance of the noise and its feedback in the equations of motion.
|
|
08:05-09:45, Paper TuAPSR.11 | |
Energy-Efficient Hybrid Model Predictive Trajectory Planning for Autonomous Electric Vehicles |
|
Ding, Fan | Monash University |
Luo, Xuewen | Monash University |
Li, Gaoxuan | Monash University |
Tew, Hwa Hui | Monash University Malaysia |
Loo, Junn Yong | Monash University Malaysia |
Chor, Wai Tong | Tunku Abdul Rahman University of Management and Technology |
Bakibillah, A. S. M. | Tokyo Institute of Technology |
Zhao, Ziyuan | I2R,A*STAR |
Tao, Zhiyu | National Science Library, Chinese Academy of Sciences; Departmen |
Keywords: Autonomous Vehicle, System Modeling and Control, Modeling of Autonomous Systems
Abstract: To tackle the twin challenges of limited battery life and lengthy charging durations in electric vehicles (EVs), this paper introduces an Energy-efficient Hybrid Model Predictive Planner (EHMPP), which employs an energy-saving optimization strategy. EHMPP focuses on refining the design of the motion planner to be seamlessly integrated with the existing automatic driving algorithms, without additional hardware. It has been validated through simulation experiments on the Prescan, CarSim, and Matlab platforms, demonstrating that it can increase passive recovery energy by 11.74% and effectively track motor speed and acceleration at optimal power. To sum up, EHMPP not only aids in trajectory planning but also significantly boosts energy efficiency in autonomous EVs.
|
|
08:05-09:45, Paper TuAPSR.12 | |
A Novel Information-Theoretic Metric for Evaluating LiDAR Setups of Autonomous Vehicles |
|
Hafemann, Philipp | Technical University Munich |
Song, Xulin | Technical University Munich |
Brecht, David | Technical University of Munich |
Keywords: Autonomous Vehicle, Modeling of Autonomous Systems, Intelligent Transportation Systems
Abstract: The sensor configuration of an autonomous vehicle (AV) is determined in the early development phase when specific perception algorithms are not yet available. Therefore, approaches based on synthetic raw data are necessary to evaluate different configurations. One sensor type used in AVs is LiDAR, but developers should carefully consider the number and placement of the sensors due to their high costs. In this contribution, we propose the Omni-Lidar Evaluation Score (OLES), a novel metric to evaluate different LiDAR configurations based on their simulated raw data. Our OLES metric combines information-theoretic quantities with coverage-based metrics, considering both the spatial coverage and the uniformity of a LiDAR point cloud distribution. We show the need for a new metric and provide details on implementing OLES using the open-source simulator CARLA. We demonstrate the effectiveness of our new metric in a simulation study and highlight its usefulness in the early phases of vehicle development. This research provides a means to evaluate the quality of LiDAR configurations and provides a basis for further optimizing sensor setups for AVs.
|
|
08:05-09:45, Paper TuAPSR.13 | |
The Eco-Label Strategy of Green Manufacture under the Influence of Consumers’ Intrinsic Preferences |
|
Hou, Yingjie | Northwestern Polytechnical University |
Guo, Peng | Northwestern Polytechnical University |
Zhao, Jing | Northwestern Polytechnical University |
Keywords: Consumer and Industrial Applications
Abstract: Considering two eco-label strategies, self-label and certification-label, we construct a duopoly competition model that encompasses both green product and ordinary product manufacturing enterprises. By investigating the optimal eco-label standards, we explore the product pricing and profits for enterprises facing green-sensitive consumers and price-sensitive consumers. Then we analyze the optimal eco-label selection for green enterprises in different preference markets. Research indicates that the green quality standards and product prices under certification labels are invariably higher than those under self-labels. However, the choice of eco-label by enterprises is influenced by consumers' individual intrinsic preferences: in price-sensitive markets, enterprises tend to adopt self-labels; in green-sensitive markets, when the value of consumers' individual intrinsic preferences is below a certain threshold, enterprises will prioritize certification labels. Additionally, the profits of enterprises in green-sensitive markets are generally higher than those in price-sensitive markets; enterprises should highlight the advantages of green quality and guide consumers to prefer green attributes more when formulating promotional strategies.
|
|
08:05-09:45, Paper TuAPSR.14 | |
AutoForma: A Large Language Model-Based Multi-Agent for Computer-Automated Design |
|
Liao, JianXing | Shenzhen Institute for Advanced Study, University of Electronic |
Xu, Junyan | University of Electronic Science and Technology of China |
He, Sicheng | University of Electronic Science and Technology of China |
Chen, Zeke | UESTC |
Yu, Shui | Shen Zhen Institute for Advanced Study, UESTC |
Li, Yun | Shenzhen Institute for Advanced Study, University of Electronic |
Keywords: Consumer and Industrial Applications, System Architecture
Abstract: With the proliferation of artificial intelligence, Computer-Aided Design (CAD) is being transformed into Computer-Automated Design (CAutoD). The advent of Large Language Models (LLMs) introduces new opportunities for CAutoD. This study develops AutoForma, an LLM-based multi-agent system, for automatic conversion from natural language descriptions to 3D models. By harnessing the comprehension capabilities of LLMs, AutoForma streamlines the CAutoD workflow by efficiently translating design intents into precise models in CAD. Through a comprehensive set of evaluations, AutoForma is seen to offer automation performance across various design tasks, particularly in generating non-standard parts that meet specific requirements, with higher efficiency and accuracy than using just an LLM like GPT-4.
|
|
08:05-09:45, Paper TuAPSR.15 | |
Hybrid Data-Mechanism Modeling for Tire Response Dynamics in Estimating Tire–Road Friction Coefficient |
|
Lu, Jiaxing | Tongji University |
Cheng, Liangzhu | Dongfeng Automotive Technology Center |
Liang, Jun | Dongfeng Automotive Technology Center |
Wang, Nian | Dongfeng Motor Corporation |
Li, Bin | College of Electronic and Information Engineering, Tongji Univer |
Zhang, Lin | Tongji University |
Chen, Hong | Tongji University |
Keywords: System Modeling and Control, Electric Vehicles and Electric Vehicle Supply Equipment, Autonomous Vehicle
Abstract: Advanced control and safety systems are crucial for electric vehicles, and the accurate estimation of the tire-road friction coefficient (TRFC) is crucial for developing effective safety control strategies. The hybrid data-mechanism model (HDMM), introduced in this paper, addresses the performance challenges posed by the inaccuracies of physical models and the limited interpretability of data-driven models in tire force estimation for TRFC estimation. Tire dynamics often exhibit transient responses, while mechanism-based models (MBM) typically reflect steady-state characteristics. Neglecting transient characteristics leads to a decrease in model accuracy. A neural network is used to learn the transient response characteristics of tire dynamics. These characteristics are then integrated with the steady-state tire forces from MBM to estimate the lateral and vertical forces acting on the wheel. The estimated tire forces serve as virtual measurements to calibrate parameters in the TRFC estimator, based on the Unscented Kalman Filter (UKF). During real-world vehicle tests, the proposed method reduced the Mean Error (ME) in lateral and vertical forces by 1271.85 N and 996.7 N, respectively, compared to the estimated tire forces from MBM. Additionally, the estimated TRFC converged to the reference value approximately 40 ms earlier than the result from the MBM, with an estimated deviation within 0.1.
|
|
08:05-09:45, Paper TuAPSR.16 | |
FLSTAGCN: Traffic Flow Prediction Based on Federated Learning and Attention Graph Convolutional Network |
|
Shi, Lei | Zhengzhou University, School of Cyber Science and Engineering |
Yuan, Shaohua | Zhengzhou University |
Lian, Huijuan | Zhengzhou University |
Gao, Yufei | Zhengzhou University |
Wei, Lin | Zhengzhou University |
Wang, Qilong | Zhengzhou University |
Keywords: Intelligent Transportation Systems, Distributed Intelligent Systems, Smart Buildings, Smart Cities and Infrastructures
Abstract: Traffic flow prediction plays a pivotal role in helping governments and companies accurately forecast changes in vehicle volume, consequently enhancing transportation efficiency and facilitating vehicle travel. Presently, the majority of traffic flow prediction methods rely on centralized learning strategies, which entail the transmission of substantial data and may jeopardize user privacy. To address this issue, we propose a Federated Learning-based Attention Graph Convolutional Network (FLSTAGCN) algorithm for traffic flow prediction. Firstly, we develop a Spatial-Temporal Attention Graph Convolutional Network (STAGCN) method that employs an attention mechanism to proficiently extract spatial-temporal features from traffic flow data, augmenting the model's learning capabilities. Subsequently, within the aggregation mechanism of federated learning, we devise a bespoke optimal selection protocol to enhance training accuracy and reduce communication costs in traffic flow prediction scenarios. Finally, we integrate federated learning with STAGCN and utilize the optimal selection protocol to designate participants for transmitting optimal parameters. The experimental results substantiate that our approach outperforms advanced deep learning approaches in terms of traffic flow prediction performance while ensuring the privacy and security of traffic data.
|
|
08:05-09:45, Paper TuAPSR.17 | |
Steering Control Considering Motion Sickness and Vehicle Performance Via DDPG Algorithm and 6-DoF-SVC Model |
|
Kawakami, Uta | The University of Electro-Communications |
Sawada, Kenji | The University of Electro-Communications |
Keywords: Autonomous Vehicle, Decision Support Systems, Adaptive Systems
Abstract: Autonomous driving demands sophisticated control systems that optimize safety, performance, passenger comfort, and fuel efficiency. This study proposes a steering control system that integrates the Deep Deterministic Policy Gradient (DDPG) for speed planning with a novel feedback mechanism based on Subjective Vertical Conflict (SVC) in the reward function. Using simulations in MATLAB and Simulink, we evaluate the system's performance across various thresholds of SVC, examining its impact on ride comfort, fuel efficiency, and vehicle behavior during lane changes. Results reveal a trade-off relationship between ride comfort and fuel efficiency, with lower SVC thresholds generally improving comfort but potentially increasing steering input. Additionally, excessively low SVC thresholds degrade target-reaching performance and lengthen lane change distances, highlighting the need for careful parameter tuning. Overall, our findings demonstrate the potential of reinforcement learning-based steering control systems to optimize multiple evaluation criteria simultaneously while emphasizing the importance of balancing trade-offs in autonomous driving scenarios.
|
|
08:05-09:45, Paper TuAPSR.18 | |
Robust Controller for Varying Speed Autonomous Ground Vehicles Considering System Uncertainties and Road Conditions |
|
Rahim, Md Abdur | Deakin University |
Arogbonlo, Adetokunbo | Deakin University |
Pappu, Mohammad Rokonuzzaman | Deakin University |
Abu Alqumsan, Ahmad | Deakin University |
Keywords: Autonomous Vehicle, Control of Uncertain Systems, System Modeling and Control
Abstract: This paper presents a novel robust path-tracking controller for autonomous ground vehicles. Environmental and vehicle factors such as variations in road conditions and speed can adversely affect the path-tracking capability of autonomous ground vehicles. A polytopic linear parameter-varying model for autonomous ground vehicles that accounts for system uncertainties with varying speeds and road conditions is formulated. Then, an H_∞-based robust path-tracking controller is developed using this model to minimise the vehicle's lateral velocity, heading error, and slip angle. Simulation results comparing the proposed controller with a conventional robust controller are presented. The findings show that the proposed controller performs well and is more effective than the conventional robust controller.
|
|
08:05-09:45, Paper TuAPSR.19 | |
Safety Verification of Advanced Driver Assistance Systems Using Hybrid Automaton Reachability |
|
Liu, Lu | Huazhong University of Science and Technology |
Sun, Qi | Huazhong University of Science and Technology |
Yang, Liren | Huazhong University of Science and Technology |
Li, Yahui | Huazhong University of Science and Technology |
Zhou, Chunjie | Huazhong University of Science and Technology |
Keywords: Autonomous Vehicle, Modeling of Autonomous Systems, Cooperative Systems and Control
Abstract: Advanced driver assistance systems (ADAS) are effectively raising the level of vehicular automation, and ensuring their functional safety is critical. While existing analysis mainly focuses on individual ADAS applications, safety violations in the overall system can be found by extensive road tests, which are not only costly in terms of time and money but also lack a formal safety guarantee. This is because tests may not cover all driving scenarios, especially the ones that involve discrete mode switching. In this paper, we focus on the longitudinal vehicle motion and provide a pipeline to perform safety verification for all the related ADAS applications. To that end, we specify safety constraints and boundaries for a vehicle’s longitudinal cruising and collision avoidance and validate a longitudinal dynamic model against the high-fidelity simulation software CarSim. Then we define hybrid automata to describe the closed-loop system composed of the vehicle dynamics and the ADAS. Finally, by computing the reachable sets of the hybrid automata and comparing them with the specified safety boundaries, the ADAS is verified. Numerical experiments demonstrate the efficacy of the proposed approach.
|
|
08:05-09:45, Paper TuAPSR.20 | |
Multi-Segment Fusion-Enhanced Spatial-Temporal Graph Convolutional Network for Traffic Flow Prediction (I) |
|
Zhang, Wei | Chongqing University of Posts and Telecommunications |
Tang, Peng | Southwest University |
Keywords: Intelligent Transportation Systems
Abstract: Accurate traffic flow prediction can assist in traffic management, route planning, and congestion mitigation, and is therefore important for enhancing the efficiency and reliability of intelligent transportation systems (ITS). However, existing traffic flow prediction models suffer from limitations in capturing the complex spatial-temporal dependencies within traffic networks. To address this issue, this study proposes a multi-segment fusion-enhanced spatial-temporal graph convolutional network (MS-STGCN) for traffic flow prediction with the following three-fold ideas: a) building a unified spatial-temporal graph convolutional framework based on the Tensor M-product, which captures the spatial-temporal patterns simultaneously; b) incorporating hourly, daily, and weekly components to model the multi-temporal properties of traffic flows, respectively; c) fusing the outputs of the three components with an attention mechanism to obtain the final traffic flow prediction results. The results of experiments conducted on two traffic flow datasets demonstrate that the proposed MS-STGCN outperforms the state-of-the-art models.
|
|
TuBT1 |
MR01 |
Computational Intelligence and Soft Computing 2 |
Regular Papers - Cybernetics |
Chair: Yu, Baijiang | South China University of Technology |
|
11:00-11:20, Paper TuBT1.1 | |
Incremental Evolution of Three Degree-Of-Freedom Arachnid Gaits |
|
Parker, Gary | Connecticut College |
Isak, Manan Basil Masaru | Connecticut College |
O'Connor, Jim | Connecticut College |
Keywords: Evolutionary Computation, Computational Intelligence, Application of Artificial Intelligence
Abstract: In this research, we evolve gaits for an arachnid-inspired robot. The method used is an expansion upon previous research on the incremental evolution of gaits for hexapod robots with two degrees of freedom per leg, which we now apply to a more complex, eight-legged robot with three degrees of freedom per leg. Incremental evolution handles gait generation for legged robots in two discrete increments. The first increment uses a cyclic genetic algorithm to learn the activations (pulse instructions to the servos) required for each leg to perform a single-leg cycle. This learning program takes into account the way each leg is mounted on the body and the range of movement provided by the three servos on each leg to produce a smooth, straight, and efficient leg cycle. The second increment uses a genetic algorithm to select the best combination of leg cycles for each leg and to learn the timing to execute each leg cycle to coordinate them all together into a single gait. In this work, we learn the gait incrementally in a simulation and transfer the final gaits to the real robot to confirm the method’s viability.
|
|
11:20-11:40, Paper TuBT1.2 | |
Individual-Level Dominant Exemplar Selection for Particle Swarm Optimization |
|
Wang, Hu-Long | Nanjing University of Information Science and Technology |
Duan, Danting | Key Laboratory of Media Audio & Video, Communication University |
Yang, Qiang | Nanjing University of Information Science and Technology |
Gao, Xu-Dong | Nanjing University of Information Science and Technology |
Xu, Peilan | Nanjing University of Information Science and Technology |
Lin, Xin | Nanjing University of Information Science and Technology |
Lu, Zhen-Yu | Nanjing University of Information Science and Technology |
Zhang, Jun | Hanyang University |
Keywords: Swarm Intelligence, Evolutionary Computation, Computational Intelligence
Abstract: Leading exemplars play significant roles in updating particles to seek optimal solutions in Particle Swarm Optimization (PSO). Along this line, this paper devises an Individual-level Dominant Exemplar Selection (IDES) framework for PSO, giving rise to a new PSO variant named IDESPSO. Specifically, instead of using each particle's own personally best position and the globally best position of the entire swarm to update particles, IDES first randomly chooses two different exemplars for each particle from all personally best positions. Then, it compares the two selected exemplars with the personally best position of this particle. Based on the comparison results, different updating strategies are utilized to update different particles. This method notably enriches the variety among the chosen leading exemplars, thereby substantially bolstering the updating diversity of particles. Under IDES, this paper further develops seven selection strategies to help IDESPSO pick promising exemplars for particles to evolve. Specifically, the seven selection schemes are the roulette wheel selection, the tournament selection, and five hybridizations of the two basic models. A series of experiments have been undertaken on the universally used CEC2014 problem suite to compare IDESPSO under the seven selection schemes with two classic PSOs. The empirical results show that IDESPSO paired with any one of the seven selection methods markedly outperforms the two classical PSO variants, highlighting its significant performance.
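Editorial sketch (not from the paper): the selection step described in this abstract can be illustrated roughly as follows. This is a minimal sketch under several assumptions that the abstract does not confirm: fitness is minimized, a particle's own personal best is excluded from its candidate exemplars, and the hypothetical helpers `roulette_pick`, `tournament_pick`, and `select_exemplars` are illustrative names only. The paper's update strategies and its five hybrid schemes are not given in the abstract and are not reproduced here.

```python
import random


def roulette_pick(fitness, rng, exclude=()):
    """Roulette wheel selection for minimization: lower fitness gets a larger weight."""
    candidates = [i for i in range(len(fitness)) if i not in exclude]
    worst = max(fitness[i] for i in candidates)
    # Shift so every weight is positive; the epsilon keeps the worst candidate selectable.
    weights = [worst - fitness[i] + 1e-12 for i in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


def tournament_pick(fitness, rng, exclude=(), size=2):
    """Tournament selection: draw `size` random entrants and keep the fittest one."""
    candidates = [i for i in range(len(fitness)) if i not in exclude]
    entrants = rng.sample(candidates, min(size, len(candidates)))
    return min(entrants, key=lambda i: fitness[i])


def select_exemplars(fitness, particle, rng, pick=roulette_pick):
    """Choose two different exemplars among the personal bests (assumption: not the
    particle's own) and count how many of them beat the particle's personal best."""
    first = pick(fitness, rng, exclude=(particle,))
    second = pick(fitness, rng, exclude=(particle, first))
    dominating = sum(fitness[e] < fitness[particle] for e in (first, second))
    return first, second, dominating
```

The returned count (0, 1, or 2) is the kind of comparison result on which a caller could branch to pick an update strategy; which strategy corresponds to which outcome is left open here because the abstract does not say.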
|
|
11:40-12:00, Paper TuBT1.3 | |
EARL-Light: An Evolutionary Algorithm-Assisted Reinforcement Learning for Traffic Signal Control |
|
Chen, JingYuan | South China University of Technology |
Wei, Feng-Feng | South China University of Technology |
Chen, Tai-You | South China University of Technology |
Hu, Xiao-Min | Guangdong University of Technology |
Jeon, Sang-Woon | Hanyang University |
Wang, Yang | Northwestern Polytechnical University |
Chen, Wei-Neng | South China University of Technology |
Keywords: Evolutionary Computation, Computational Intelligence, Machine Learning
Abstract: Traffic signal control (TSC) problems have received increasing attention with the development of the smart city. Reinforcement learning (RL) models TSC as a Markov decision process and learns the timing relationship of traffic scheduling from massive historical data. Due to the uncertainty and mutability of TSC problems, existing RL methods face bottlenecks in diversity and are easily trapped in local optima. To alleviate this predicament, this paper combines evolutionary optimization and RL to propose an evolutionary algorithm-assisted reinforcement learning (EARL-Light) method for TSC problems. EARL-Light is a population-based algorithm, in which one individual represents a policy and a population of individuals is evolved to search for near-optimal policies. The diversified search ability of evolutionary optimization can help the algorithm escape local optima for global optimization, and the rapid gradient-based learning of RL can achieve fast convergence. Extensive experiments on seven real-world traffic datasets demonstrate that EARL-Light achieves shorter travel time with fast convergence.
|
|
12:00-12:20, Paper TuBT1.4 | |
Evolutionary Reinforcement Learning with Double Replay Buffers for UAV Online Target Tracking |
|
Yu, Baijiang | South China University of Technology |
Wei, Feng-Feng | South China University of Technology |
Hu, Xiao-Min | Guangdong University of Technology |
Jeon, Sang-Woon | Hanyang University |
Luo, Wenjian | Harbin Institute of Technology, Shenzhen |
Chen, Wei-Neng | South China University of Technology |
Keywords: Evolutionary Computation, Computational Intelligence, Application of Artificial Intelligence
Abstract: Target tracking has broad applications such as disaster relief, and unmanned aerial vehicles (UAVs) have been widely applied to target tracking in recent years. Due to its strong responsiveness to deceptive reward signals and its diverse exploration, evolutionary reinforcement learning (ERL) is a more noteworthy option for training UAVs than common reinforcement learning. However, because ERL contains many agents, its training efficiency is unsatisfactory. To address this shortcoming, this paper proposes an evolutionary reinforcement learning with double replay buffers (ERLDRB) method for the UAV online target tracking problem. Firstly, considering the energy consumption and the possible delay of feedback signals to the UAV, a more realistic model of the UAV online target tracking problem is designed. Then, based on the problem formulation, ERLDRB utilizes a double experience replay buffer technique to increase learning efficiency in the training stage, which can better solve the real-world UAV online target tracking problem. Simulation results show that ERLDRB outperforms multiple contrasting algorithms on the designed model.
|
|
12:20-12:40, Paper TuBT1.5 | |
Matrix-Based Ant Colony System for Traveling Salesman Problem |
|
Li, Xu | South China University of Technology |
Li, Jian-Yu | South China University of Technology |
Chen, Chun-Hua | South China University of Technology |
Zhan, Zhi-Hui | South China University of Technology |
Kwong, Sam Tak Wu | Lingnan University |
Zhang, Jun | Hanyang University |
Keywords: Evolutionary Computation, Swarm Intelligence, Computational Intelligence
Abstract: The ant colony system (ACS) algorithm, as an important evolutionary computation (EC) algorithm, has demonstrated significant advantages in solving complex optimization problems. However, traditional EC algorithms and the traditional ACS algorithm often face the challenge of slow computational speed when dealing with large-scale problems. In recent years, matrix-based EC approaches have been proposed to accelerate computation and have obtained promising results on large-scale problems. However, most existing matrix-based EC algorithms are designed for continuous optimization problems, while the matrix-based approach integrated with ACS, which could be efficient for solving large-scale discrete optimization problems, has not attracted enough attention. Therefore, in this paper, we propose a matrix-based ACS (MACS) algorithm and apply it to solve the traveling salesman problem (TSP). MACS is an innovative improvement over the traditional ACS algorithm, utilizing matrix operations to let ants select cities and update pheromones in parallel. Experimental results show that the MACS algorithm significantly accelerates computation while maintaining remarkable problem-solving ability on large-scale TSP.
|
|
12:40-13:00, Paper TuBT1.6 | |
Building Consensus in Group Decision-Making with Intuitionistic Reciprocal Preference Relations: An Analysis of Various Protocols of Information Granularity Distribution |
|
González-Quesada, Juan Carlos | University of Granada |
Cabrerizo, Francisco Javier | University of Granada (Q1818002F) |
Herrera Viedma, Enrique | University of Granada (Spain) |
Pedrycz, Witold | University of Alberta |
Keywords: Fuzzy Systems and their applications, Computational Intelligence
Abstract: On the one hand, to model experts' preferences in group decision-making, intuitionistic reciprocal preference relations have widely been used because they allow for accommodating hesitation degrees, which are inherent to all decision-making processes. On the other hand, an optimization of information granularity distribution has recently been applied to establish consensus during group decision-making processes. Concretely, a symmetric and uniform distribution of information granularity has been considered for intuitionistic reciprocal preference relations. However, there exist other protocols of information granularity distribution that could be used. Therefore, we aim to analyze all the information granularity distribution protocols and determine their effectiveness in building consensus through intuitionistic reciprocal preference relations. The performance of the different protocols is discussed by conducting some numerical experiments that help provide insights into the effectiveness of the protocols to build consensus.
|
|
TuBT2 |
MR02 |
Deep Learning and Neural Networks 5 |
Regular Papers - Cybernetics |
Chair: Raju, S M Taslim Uddin | University of Waterloo |
|
11:00-11:20, Paper TuBT2.1 | |
CMA-BP: A Clustered Multi-Task Learning and Branch Attention Based Branch Predictor |
|
Ming, Li | University of Electronic Science and Technology of China |
Rucong, Xu | University of Electronic Science and Technology of China |
Zhang, Hexu | University of Electronic Science and Technology of China |
Li, Lin | Qingdao Agriculture University |
Li, Yun | Shenzhen Institute for Advanced Study, University of Electronic |
Keywords: Neural Networks and their Applications, Machine Learning, AI and Applications
Abstract: Branch prediction stands as a key bottleneck in enhancing CPU performance, as evidenced by an average of around 10 mispredicted hard-to-predict (H2P) branches per benchmark in SPEC 2017 under current neural network methods. To address this, this paper proposes a Clustered Multi-Task Learning and Branch Attention Mechanism-Based Branch Predictor (CMA-BP). Clustered multi-task learning enhances model generalization, and branch attention extracts preferences of different branches for global history. Thus, CMA-BP efficiently aggregates branches with similar features, reducing training complexity. Experimental results show that CMA-BP significantly outperforms existing predictors in accuracy and also in the number of parameters required. By advancing the state-of-the-art in branch prediction, this work has important implications for future high-performance computer architecture design.
|
|
11:20-11:40, Paper TuBT2.2 | |
RS-DETR: An Improved DETR for High-Resolution Remote Sensing Image Object Detection |
|
Cao, Feng | Shanxi University |
Wang, Ruoyu | Shanxi University |
Li, Deyu | Shanxi University |
Hu, ZhiGuo | Shanxi University |
Keywords: Deep Learning, Machine Learning, Neural Networks and their Applications
Abstract: High-resolution remote sensing image object detection is an important research area in remote sensing information processing and has substantial practical applications. This domain presents unique challenges, including variable object scales, complex backgrounds, prevalent small objects, and densely arranged items, distinguishing it from traditional object detection in natural images. This paper proposes a novel object detection algorithm (RS-DETR), which builds upon the DETR framework and integrates the Swin Transformer. The algorithm features a dual-branch structure in its feature extraction module, markedly improving detection accuracy, especially for objects of varying scales. The addition of the GAM convolutional attention mechanism allows the model to concentrate more effectively on relevant regions, minimizing background complexities. Moreover, we have included the scale-invariant intersection over union (SIoU) loss function to enhance the precise localization of closely packed objects. To demonstrate the efficacy of the algorithm, RS-DETR was applied to the HRSC2016 and NWPU VHR-10 datasets. The results show average detection accuracies of 86.1% and 57.9% on these datasets, respectively, outperforming the baseline models by 1.1% and 0.9%, respectively.
|
|
11:40-12:00, Paper TuBT2.3 | |
TransUAAE-CapGen: Caption Generation from Histopathological Patches through Transformer and UNet-Based Adversarial Autoencoder |
|
Raju, S M Taslim Uddin | University of Waterloo |
Mohammad, Abdul Raqeeb | University of Waterloo |
Islam, Md. Milon | University of Waterloo |
Karray, Fakhreddine | University of Waterloo |
Keywords: Deep Learning, Neural Networks and their Applications, Machine Learning
Abstract: Captioning Whole Slide Images (WSIs) for pathological analysis is an essential but not extensively explored aspect of computer-aided pathological diagnosis. Challenges arise from insufficient datasets and the effectiveness of model training. Generating automatic caption reports for various gastric adenocarcinoma images is another challenge. In this paper, we introduce a hybrid method referred to as TransUAAE-CapGen to generate histopathological captions from WSI patches. The TransUAAE-CapGen architecture consists of a hybrid UNet-based Adversarial Autoencoder (AAE) for feature extraction and a transformer for caption generation. The hybrid UNet-based AAE extracts complex tissue properties from histopathological patches, transforming them into low-dimensional embeddings. The embeddings are then fed into the transformer to generate concise captions. Our proposed method is validated using the PatchGastricADC22 dataset. The TransUAAE-CapGen model provides the best estimated accuracy of BLEU-4 = 86.8%, METEOR = 59.6%, ROUGE = 89.3%, and CIDEr = 7.72%. Experimental analysis indicates that the TransUAAE-CapGen architecture outperforms the traditional LSTM-based model for the caption generation task. Our findings reveal that the proposed architecture can effectively generate accurate and precise reports for medical image analysis.
|
|
12:00-12:20, Paper TuBT2.4 | |
Learned Image Compression with Transformer-CNN Mixed Structures and Spatial Checkerboard Context |
|
Ji, Kexin | Hohai University |
Keywords: Deep Learning, Machine Vision, Image Processing and Pattern Recognition
Abstract: Learning-based image compression techniques that combine current Transformer models with checkerboard context models have shown excellent rate-distortion performance. However, the mixed structure still has room for optimization in terms of redundant information and decoding efficiency, while the checkerboard context model has redundancy in capturing correlations between latent representations. To solve these problems, we propose an innovative framework that combines a mixed Transformer-CNN structure with a checkerboard context model. Specifically, we introduce a "Checkerboard Channel-wise Entropy Module" to improve the coding efficiency of utilizing contexts through a two-channel decoding method with checkerboard contexts. Then, we propose the "In-slice Odd-even Context", which improves the handling of spatially redundant information by introducing a checkerboard context model into the original mixed structure with channel contexts and global contexts, thereby adding additional spatial contexts. Extensive experimental results demonstrate that our proposed method outperforms JPEG, BPG and previous learned image compression methods on the Kodak dataset.
|
|
12:20-12:40, Paper TuBT2.5 | |
Multi-Kernel Broad Learning System Based on Elastic-Net with Random Fourier Features |
|
Zhang, Qihuai | Beijing Normal University |
Zhao, Xiaojie | Beijing Normal University |
Keywords: Machine Learning, Neural Networks and their Applications
Abstract: The Broad Learning System (BLS) features a simple yet efficient network structure, with its core being the fast and random generation of hidden layers; however, this generation method not only fails to effectively capture the nonlinear characteristics in the task, but also generates certain 'redundant nodes', which can negatively affect its learning capabilities. In this study, we propose an improved version of BLS, named KEFBLS, aimed at enhancing the feature extraction capability of the hidden layer through the integration of multi-kernel technology and network sparsification strategies, complemented by deeper feature extraction using random Fourier features. KEFBLS first combines polynomial and wavelet kernels to boost the nonlinear mapping capabilities of data; then, it applies the elastic-net method to refine the BLS objective function, removing low-impact hidden layer nodes to reduce redundancy and create a more streamlined network; finally, KEFBLS employs random Fourier features to map the processed hidden layers, further enhancing the network's feature extraction capabilities and constructing a new learning model. Our experimental results on three UCI regression datasets demonstrate that KEFBLS surpasses other methods in terms of learning efficiency and model performance.
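Editorial sketch (not from the paper): random Fourier features, mentioned in this abstract, are commonly built as below. This is the standard construction for approximating a Gaussian kernel with an explicit feature map. The choice of the Gaussian kernel, the bandwidth `sigma`, and the helper names `make_rff` and `gaussian_kernel` are assumptions for illustration, since the abstract does not state which kernel or parameters KEFBLS uses.

```python
import math
import random


def make_rff(dim, num_features, sigma, rng):
    """Build a random Fourier feature map z(x) with z(x) . z(y) ~ exp(-|x-y|^2 / (2 sigma^2))."""
    # Frequencies are drawn from N(0, 1/sigma^2) per coordinate, phases from U[0, 2*pi].
    weights = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)]
               for _ in range(num_features)]
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def features(x):
        # One cosine feature per random (frequency, phase) pair.
        return [scale * math.cos(sum(w_k * x_k for w_k, x_k in zip(w, x)) + b)
                for w, b in zip(weights, offsets)]

    return features


def gaussian_kernel(x, y, sigma):
    """Exact Gaussian kernel value, used only to check the approximation."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))
```

A caller would compare the inner product of `features(x)` and `features(y)` with `gaussian_kernel(x, y, sigma)`; the approximation error shrinks roughly with the inverse square root of `num_features`.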
|
|
12:40-13:00, Paper TuBT2.6 | |
SFAM-Net: A Novel Dual-Branch Network Based on Spectral Feature and Attention Machine for Building Change Detection in Remote Sensing Imagery |
|
Li, Jiequn | Taiyuan University of Technology |
He, Zhisen | Taiyuan University of Technology |
Lv, Yanfang | Taiyuan University of Technology |
Yan, Chen | Taiyuan University of Technology |
Wang, XingKui | Taiyuan University of Technology |
Keywords: Neural Networks and their Applications, Deep Learning, Machine Vision
Abstract: Deep learning techniques have significantly advanced change detection in remote sensing imagery. However, building change detection presents challenges due to the varied appearance of buildings and the complexity of scenes in remote sensing images. Current deep learning-based methods encounter three primary issues. Firstly, CNN-based approaches struggle to model the global contextual information essential for remote sensing building image analysis, while Transformer-based methods may inadvertently degrade local features. Secondly, traditional attention mechanisms fall short in effectively modeling spatial and spectral features. Thirdly, certain channel attention methods extract excessive redundant information. To address these challenges, this study proposes SFAM-Net, a two-branch hybrid architecture. Our approach initially employs orthogonal methods to minimize redundant information extracted from channels and spaces. Subsequently, we leverage the parallel structure of convolutions and visual transformers to enhance image representation, integrating local features and global representations through cross-attention to better coordinate building and background features. In the CNN and Transformer branches, we adopt spatial-spectral feature coordination and spectral multi-head attention coordination strategies to improve performance in complex scenes. Additionally, we introduce a novel loss function combining edge and center guidance, focusing on the edges and centers of changing image regions to enhance sensitivity and accuracy in change area detection. Extensive experiments on the widely used LEVIR-CD and WHU-CD datasets validate the effectiveness and efficiency of our network.
|
|