Last updated on October 3, 2023. This conference program is tentative and subject to change
Technical Program for Wednesday October 4, 2023
|
We-PS10T1 Workshop Session, Puna |
Multimodal and Multisensory Sensory Information in BMI |
|
|
Organizer: Bougrain, Laurent | Université De Lorraine/LORIA |
Organizer: Herrera Altamira, Gabriela | Université De Lorraine, LORIA |
|
08:00-08:15, Paper We-PS10T1.1 |
Central and Peripheral Activities During Driving with Simulated Cataract Vision (I) |
|
Kasagi, Shunsuke | Ritsumeikan University |
Kashihara, Koji | Ritsumeikan University |
Keywords: Human Factors, Human-Centered Transportation, Human Performance Modeling
Abstract: Traffic accidents caused by elderly drivers have become major social problems. We investigated central and autonomic nervous system activities at obstacle events while operating a driving simulator with either normal or simulated cataract vision. Driving with cataract vision acutely increased brain and sympathetic nervous system activities, along with attenuated parasympathetic nervous system activity, in some obstacle events under accumulated eyestrain, which may induce driver distraction. In future research, attention alert systems based on biological features may prove useful for safe driving.
|
|
08:15-08:30, Paper We-PS10T1.2 |
Brain Patterns Generated While Using a Tongue Control Interface: A Preliminary Study with Two Individuals with ALS (I) |
|
Kæseler, Rasmus | Aalborg University |
Jochumsen, Mads | Aalborg University |
Santos Cardoso, Ana S | Center for Rehabilitation Robotics, Aalborg University |
Andreasen Struijk, Lotte N S | Aalborg University |
Keywords: Brain-Computer Interfaces, Human-Machine Interface, Medical Informatics
Abstract: Individuals with a progressive neurodegenerative disease, such as amyotrophic lateral sclerosis (ALS), lose muscle function over time and eventually become completely paralysed. For some time, people with ALS may retain functional tongue movement despite losing mobility below the neck. These individuals can benefit from using an inductive tongue control interface (ITCI) to control computers or assistive robotic devices and gain independence in their daily lives. Eventually, when the individual can no longer use their tongue, they can rely on a brain-computer interface (BCI). However, BCIs require a lot of data to calibrate and function properly. Recording this data while the individual can still use the ITCI can potentially speed up the training process, allowing for an easier transition between interface technologies. This study investigates whether it is possible to create a background data collector for a BCI based on attempted tongue movement by analyzing brain patterns of two individuals with ALS while using an ITCI. The participants used an ITCI in simple cued movement trials while electroencephalogram (EEG) signals were collected from the motor cortex. The EEG signal indicated that movement-related cortical potentials (MRCPs) were generated after the cued movements. After synchronising the signal to the activations recorded on the ITCI, the MRCPs became even more apparent. Therefore, it is concluded that it is possible to record MRCPs from individuals with ALS performing tongue movements, that the ITCI can assist in better extracting synchronized MRCP epochs, and that a background data collector for a tongue movement intention-based BCI is feasible.
|
|
08:30-08:45, Paper We-PS10T1.3 |
EEG Modulations Induced by a Visual and Vibrotactile Stimulation (I) |
|
Herrera Altamira, Gabriela | Université De Lorraine, LORIA |
Fleck, Stephanie | Université De Lorraine |
Lecuyer, Anatole | Inria |
Bougrain, Laurent | Université De Lorraine/LORIA |
Keywords: Other Neurotechnology and Brain-Related Topics, Active BMIs
Abstract: Kinesthetic motor imagery (KMI) based brain-computer interfaces (BCIs) hold great potential for post-stroke motor rehabilitation. However, KMI is a complex task to perform, mainly due to the absence of sensory and kinesthetic feedback. To address this issue, sensory neurofeedback solutions have been proposed. One of them is tactile vibration, which has been used in BCI paradigms, but the optimal design choices of the vibration remain unclear. In this study, we present a novel bimodal stimulus that combines a vibrotactile device for the upper limb with a visual animation of a hand grasping a bottle. We investigated the effects of four different vibration patterns and four stimulation intensities on the electroencephalography (EEG) signals of 17 healthy subjects who were not performing KMI. Our findings revealed EEG activity in the alpha, beta, and delta frequency bands within the somatosensory cortex during the sensory stimulation. While we did not observe significant differences in brain activity among the vibration patterns, we did observe statistically significant variations based on the vibration intensity. We observed central activity in both brain hemispheres, with stronger activity occurring in the contralateral hemisphere. Additionally, EEG activity seemed to involve the contralateral occipital areas. Based on these results, it is worth considering the delivery of feedback after participants have completed KMI tasks to differentiate and comprehensively study the brain activity resulting from the mental and sensory stimulation tasks.
|
|
08:45-09:00, Paper We-PS10T1.4 |
Decoding Taste from EEG: Gustatory Evoked Potentials During Wine Tasting (I) |
|
Gonzalez-Espana, Juan Jose | University of Houston |
Back, KiJoon | University of Houston |
Reynolds, Dennis E | University of Houston |
Contreras-Vidal, Jose | University of Houston |
Keywords: BMI Emerging Applications, Other Neurotechnology and Brain-Related Topics, Passive BMIs
Abstract: Decoding taste from brain activity is of paramount importance for understanding brain responses to gustatory stimuli, with applications in the food industry, neuro-marketing, objective evaluation of consumers' preferences, and addiction prevention/treatment. However, progress in this field has been thwarted by challenging data acquisition and the multimodal aspects of gustatory events, which make traditional decoding based on event-related potentials (ERP) very difficult. Additionally, emerging approaches such as Deep Learning fall short of giving insight into taste decoding because they need large amounts of data. In this paper, we propose a processing and analytical pipeline based on Multivariate Pattern Analysis (MPA) and Global Field Power (GFP) to analyze gustatory evoked potentials recorded with high-density scalp electroencephalography (EEG). Based on this analysis, we show the evolution over time of neural codes for identification of water-vs-wine and wine-vs-wine dyads.
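For readers unfamiliar with Global Field Power, the following is a minimal sketch of its usual textbook definition: the standard deviation across electrodes at each time sample. It is not taken from the paper; the function name `global_field_power` and the list-of-lists input layout are assumptions made for illustration.

```python
import math

def global_field_power(eeg):
    """Return GFP per time sample.

    eeg: list of channels, each a list of samples of equal length
    (hypothetical layout chosen for this sketch).
    """
    n_channels = len(eeg)
    gfp = []
    # zip(*eeg) iterates over time samples, yielding one value per channel.
    for samples in zip(*eeg):
        mean = sum(samples) / n_channels
        variance = sum((v - mean) ** 2 for v in samples) / n_channels
        gfp.append(math.sqrt(variance))
    return gfp
```

A high GFP value marks a moment when the scalp potential differs strongly between electrodes, which is why GFP peaks are often used to locate evoked components.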
|
|
We-PS10-T4 Regular Session, Hawaii 2 |
Information Visualization I |
|
|
|
08:15-08:30, Paper We-PS10-T4.2 |
Development of a Calibration Method of Hidden Background Observer Cameras for Diminished Reality Using Radio Direction Finding |
|
Ono, Tasuku | Kyoto University |
Ueda, Kimi | Kyoto University |
Ishii, Hirotake | Kyoto University |
Shimoda, Hiroshi | Kyoto University |
Keywords: Virtual and Augmented Reality Systems, Virtual/Augmented/Mixed Reality, Information Visualization
Abstract: Hidden background cameras used to realize Diminished Reality (DR), which allows the observer to see through obstacles, are installed behind the obstacles. In order to realize DR, it is necessary to measure the positions of these cameras with respect to the observer's position, but this is difficult because the cameras cannot be seen directly from the observer's position. In this study, two calibration methods, the intersection method and the P4P method, which use the directions of radio waves from radio sources mounted on each camera to measure the positions of the hidden background cameras, are proposed. The intersection method measures the direction from several different points to the hidden background cameras and estimates the 3D position of each hidden background camera as the intersection of the measured directions. The P4P method measures the relative pose between multiple hidden background cameras in advance and uses it as a constraint to estimate the 3D poses of the hidden background cameras. The results showed that the P4P method was more accurate than the intersection method and was able to calibrate the camera with a position error of less than 100 mm.
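The idea behind an intersection-style estimate can be illustrated with a generic least-squares formulation: given several observation points and the measured direction from each toward the source, find the 3D point closest to all of those lines. This is a sketch under that assumption, not the authors' implementation; `closest_point_to_lines` and `solve3` are hypothetical names.

```python
import math

def solve3(A, b):
    """Solve a 3x3 linear system by Gauss-Jordan elimination with partial pivoting."""
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("directions are nearly parallel; no unique point")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(3):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [x - f * y for x, y in zip(M[r], M[col])]
    return [M[i][3] / M[i][i] for i in range(3)]

def closest_point_to_lines(origins, directions):
    """Least-squares 3D point nearest to the lines origin_i + t * direction_i."""
    A = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for o, d in zip(origins, directions):
        norm = math.sqrt(sum(c * c for c in d))
        u = [c / norm for c in d]
        for r in range(3):
            for c in range(3):
                # (I - u u^T) projects onto the plane perpendicular to the line.
                m = (1.0 if r == c else 0.0) - u[r] * u[c]
                A[r][c] += m
                b[r] += m * o[c]
    return solve3(A, b)
```

With noise-free directions the lines meet exactly and the function returns that point; with noisy directions it returns the point minimizing the summed squared distance to the lines.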
|
|
08:30-08:45, Paper We-PS10-T4.3 |
Towards Modular and Formally-Verifiable Software Architecture for Clinical Guidance Systems |
|
Song, Shuang | University of Illinois at Urbana-Champaign |
Saxena, Manasvi | University of Illinois at Urbana-Champaign |
Tsai, Pei-Hsuan | National Cheng Kung University |
Sha, Lui | University of Illinois at Urbana-Champaign |
Keywords: Medical Informatics, Human-Machine Cooperation and Systems, Information Visualization
Abstract: Computer Science is being increasingly used in medicine to improve quality of care and patient outcome. Clinical Guidance Systems (CGSs) codify clinical Best Practice Guidelines (BPGs) and provide situation-specific advice to physicians. CGSs have shown effectiveness in reducing preventable medical errors during clinical evaluations. However, representing both the BPG and the associated physical processes in software is complex and tedious, making CGSs prone to bugs. This can be mitigated using a modular software architecture that allows encapsulated computational representation of physical processes, organized data management, fine-grained development and verification, in addition to improving comprehensibility and maintainability. This paper discusses the sources of complexity in CGSs and proposes a software architecture that consists of a patient digital twin, a compliance monitor, and a Graphical User Interface (GUI), along with a communication middleware. We developed a CGS for Pediatric Sepsis Management co-designed by physicians using our approach. We use the Fluid Resuscitation therapy of this CGS as a case study to illustrate the development process.
|
|
08:45-09:00, Paper We-PS10-T4.4 |
Development of a High-Definition Visualization System for Immersion into a Moving Miniature World |
|
Yamada, Suzuka | Kyoto University |
Murayama, Masahiro | Kyoto University |
Ueda, Kimi | Kyoto University |
Ishii, Hirotake | Kyoto University |
Shimoda, Hiroshi | Kyoto University |
Keywords: Virtual and Augmented Reality Systems, Information Visualization, Virtual/Augmented/Mixed Reality
Abstract: Free viewpoint video technology, which provides a high sense of presence and immersion by allowing users to view live video from any viewpoint, is increasingly in demand, mainly in the entertainment field, as an application that provides new experiences for users. Although there have been many studies on free viewpoint video generation for large-scale spaces, they have focused on generating video for the entire wide-area space, and challenges still remain in terms of reproduction of subject details and real-time performance of image generation. In this study, for the purpose of developing a system which enables immersion into a moving miniature world, a method to generate high-definition free viewpoint video for small-scale spaces in real time was realized and evaluated. It can generate high-definition free viewpoint images by combining free viewpoint depth images, to which noise reduction processing is applied, with the corresponding color images, considering occlusion and the normal direction of the object's surface. The free viewpoint images generated by the proposed system were evaluated. The results confirmed that consideration of occlusion and the normal direction of the subject's surface improved the quality of the generated images. In addition, the realized methods were able to generate clear free viewpoint images even when the free viewpoint approached the subject.
|
|
We-PS10-T6 Regular Session, Hawaii 4 |
Cyber-Physical Systems II |
|
|
|
08:45-09:00, Paper We-PS10-T6.4 |
Optimized Vision Transformer for Dementia Diagnosis Using Micro-Doppler Radar |
|
Ishibashi, Ryuto | Ritsumeikan University |
Kaneko, Hayata | Ritsumeikan University |
Nojiri, Naoto | Ritsumeikan University |
Saho, Kenshi | Ritsumeikan University |
Meng, Lin | Ritsumeikan University |
Keywords: Artificial Life, Application of Artificial Intelligence, AI and Applications
Abstract: In an aging society, the number of dementia patients continues to increase, and early detection of dementia is required. However, going to a hospital and being diagnosed by a doctor is burdensome for elderly people. This paper designs a Vision Transformer (ViT)-based gait diagnosis with micro-Doppler radar to diagnose dementia without burdening elderly people. The ViT is optimized by the proposed Vertical Rectangle Patching and Adaptive Thresholding, which improve the Attention of ViT. The micro-Doppler radar collects the signal of elderly people, and the signal is transformed into two kinds of signal-analyzed images by two signal-analysis methods (Short-Time Fourier Transform: STFT, and Continuous Wavelet Transform: CWT) for diagnosis by the optimized ViT. Experiments compare eight kinds of CNN models and current ViTs with the optimized ViT to evaluate the proposal's performance. The experimental results show that STFT is suitable for analyzing micro-Doppler radar signals, and that ViT-56x4s+th, which uses Vertical Rectangle Patching and Adaptive Thresholding, achieves high scores such as an accuracy of 88.9%. Another proposed model, ViT-224x1, allows for faster learning convergence for time series data and improved accuracy without fine-tuning. In summary, the possibility of diagnosing dementia by optimized ViT has been demonstrated. The experiments with Adaptive Thresholding also show that some patches are unimportant, which suggests reducing the unimportant computations to achieve a compact ViT as future work.
|
|
We-PS10-T7 Regular Session, Honolulu |
Human Enhancements II |
|
|
|
08:00-08:15, Paper We-PS10-T7.1 |
Open-Set Motion Recognition and Adaptive Structural Modification of Classifiers Based on Clustering of Unknown Motions |
|
Mukaeda, Takayuki | Yokohama National University |
Shima, Keisuke | Yokohama National University |
Keywords: Human-Machine Interface, Human Enhancements
Abstract: In the development of myoelectric prosthetic hands for reliable determination of the user’s intended motion, open-set recognition methods enabling consideration of unexpected inputs help to prevent malfunction. Conversely, some abnormal inputs may encompass new motions that can be used for motion classifier evolution and automatic acquisition of anomaly cluster structures. Against such a background, this paper outlines advanced open-set recognition involving clustering of unknown classes for enhanced ease of prosthetic hand interface usage. The proposed approach involves open-set recognition using the authors’ probabilistic neural network, a novel cluster detection method employing a non-parametric Bayesian model, and structural modification. Results from forearm motion recognition using electromyogram signals demonstrated the effectiveness of the technique.
|
|
08:15-08:30, Paper We-PS10-T7.2 |
CARE: Cable Assistive Rehabilitation Elbow Exoskeleton for Arm Movement Assistance |
|
Berdal, Jarren | San Francisco State University |
Lile, Richard | GAF Energy |
Quintero, David | San Francisco State University |
Keywords: Assistive Technology, Human-Machine Interaction, Human Enhancements
Abstract: The Cable Assistive Rehabilitation Elbow Exoskeleton (CARE) is an active arm assistance device for in-home rehabilitation of individuals who have suffered a traumatic brain injury, such as stroke, and experience elbow flexion/extension impairment. A robotic elbow exoskeleton can provide ongoing therapeutic treatment to retrain arm movement functionality to perform activities of daily living. Providing an accessible, lightweight robotic exoskeleton is an ongoing challenge for wearable devices. CARE introduces a cable-driven design using a bidirectional, dual pulley system mated with a custom 17.5:1 planetary gearbox manufactured using high strength engineering stereolithography resin material. The cable-driven dual pulley system transfers the rotary motion to actuate an arm brace for elbow joint flexion and extension. Benchtop experiments demonstrated a maximum output torque of 26 Nm while lifting a 1 kg object. Results exceeded the minimum joint torque and velocity requirement to carry out day-to-day activities. The actuation system used additive manufacturing techniques that decreased the transmission mass by 60% compared to conventional transmission designs, reduced cost for rapid prototyping, and provided formability for a compact elbow exoskeleton design. Overall, the system weighs less than 7 kg and can be translated to an in-home rehabilitation unit for stroke treatment.
|
|
08:30-08:45, Paper We-PS10-T7.3 |
Expressed and Private Opinion Dynamics with Group Pressure and Liberating Effect |
|
Peng, Yuan | University of Electronic Science and Technology of China |
Dong, Jianglin | University of Electronic Science and Technology of China |
Zhao, Yiyi | Southwestern University of Finance and Economics |
Hu, Jiangping | University of Electronic Science and Technology of China |
Keywords: Agent-Based Modeling, Complex Network
Abstract: This paper introduces the liberating effect under group pressure into the HK model and proposes a novel expressed and private opinion dynamics model. Agents in the group hide their honest opinions because of the group pressure, and each agent has a private and expressed opinion. The liberating effect is divided into two stages. In the first stage, one agent in the group is the first to liberate when the number of times it feels pressure exceeds a specific limit and the cumulative pressure is the maximum. The liberating agent will express its opinion authentically, with private opinion consistent with the expressed opinion. In the second stage, the other agents in the group are influenced by the liberating neighbors and also liberate until the group evolution reaches a stable state. Through simulations, we study the effects of confidence level and pressure threshold on group opinion evolution. The experimental results show that both confidence level and pressure threshold are critical. All agents liberate when they are smaller than the critical value; when they are greater than the critical value, the liberating effect disappears. We also find that the liberating effect can accelerate the group opinion evolution.
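As background, the classical Hegselmann-Krause (HK) bounded-confidence update that the paper builds on can be sketched as follows: each agent moves to the average of all opinions within a confidence level of its own. This sketch covers only the baseline HK step; the group pressure, expressed/private opinions, and liberating effect described in the abstract are not modeled. The names `hk_step` and `hk_run` are hypothetical.

```python
def hk_step(opinions, eps):
    """One synchronous HK update with confidence level eps."""
    updated = []
    for x in opinions:
        # Neighbors are all agents within eps of x, including the agent itself.
        neighbors = [y for y in opinions if abs(y - x) <= eps]
        updated.append(sum(neighbors) / len(neighbors))
    return updated

def hk_run(opinions, eps, max_steps=1000, tol=1e-9):
    """Iterate hk_step until opinions stop changing or max_steps is reached."""
    for step in range(max_steps):
        nxt = hk_step(opinions, eps)
        if max(abs(a - b) for a, b in zip(nxt, opinions)) < tol:
            return nxt, step
        opinions = nxt
    return opinions, max_steps
```

For example, `hk_run([0.0, 0.1, 1.0], 0.2)` merges the first two agents into one cluster while the third, outside their confidence range, keeps its opinion.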
|
|
We-PS10-T8 Regular Session, Kahuku |
Design Methods I |
|
|
|
08:00-08:15, Paper We-PS10-T8.1 |
Gradient Projection Differential Neural Solution for Quadratic Optimization with Quadratic Constraints: An ACP Perspective |
|
Liufu, Ying | Lanzhou University |
Liu, Mei | Lanzhou University |
Jin, Long | Lanzhou University |
Wang, Fei-Yue | Institute of Automation, Chinese Academy of Sciences |
Keywords: Cognitive Computing, Design Methods
Abstract: In recent years, quadratic optimizations have become increasingly popular in engineering. However, conventional methods that investigate this problem from the perspective of a canonical form with linear constraints are not effective in dealing with the significant challenges posed by quadratic constraints in practice. This paper proposes a solution framework for quadratic optimization with quadratic constraints (QOQC) based on the innovative artificial societies, computational experiments, and parallel execution (ACP) framework. Then, a gradient projection differential neural solution (GPDNS) is proposed to solve the QOQC problem. To illustrate the effectiveness of the GPDNS model in solving the QOQC system, numerical simulations are provided. Overall, this paper presents the potential of innovative approaches like the ACP framework to enhance our capabilities in addressing challenging optimization systems.
|
|
08:15-08:30, Paper We-PS10-T8.2 |
Development and Performance Evaluation of Variable Wheelchair Wheels |
|
Kato, Akira | Tokyo Denki University |
Kuge, Mayuu | Tokyo Denki University |
Yaowei, Chen | Tokyo Denki University |
Iwase, Masami | Tokyo Denki University |
Inoue, Jun | Tokyo Denki University |
Keywords: Assistive Technology, Design Methods, Human-Centered Transportation
Abstract: As ageing populations continue to grow, there is an increasing demand for mobility aids such as wheelchairs. However, conventional wheelchair wheels are designed for level ground and are not suitable for travelling over rough terrain or steps. In addition, with the rapid development of mega-cities, more obstacles such as steps and Braille blocks have been introduced, making outdoor mobility even more challenging for wheelchair users. Although electric wheelchairs have been developed for use in various environments, they are costly and not accessible for all users. To address this issue, the authors focused on developing an affordable wheel that can be used in various driving environments, including flat ground, steps, and rough terrain. They designed a wheel that utilizes a slider-crank mechanism, which can be adjusted by the wheelchair user using a central disc. In this paper, the design of the wheel is presented, and its performance is evaluated through experiments involving step climbing and rough terrain simulations. The results showed a reduction in the force required to overcome steps, improved stability when navigating steps, and enhanced driving performance on rough terrain after deploying the new wheel.
|
|
08:30-08:45, Paper We-PS10-T8.3 |
Investigating the Usability and Comprehensibility of Process Mining Tools within an Application-Specific Context |
|
Araújo, Thiago Sousa | Federal University of Pernambuco |
Lima, Ricardo | UFPE |
Oliveira, Adriano L. I. | Universidade Federal De Pernambuco |
D'Castro, Raphael José | Tribunal De Justiça De Pernambuco |
Gusmao, Braulio G | Tribunal Regional Do Trabalho Da 9ª Região |
Leite Paulo, Rafael | National Council of Justice - CNJ |
Guerra, João Thiago de França | Conselho Nacional De Justiça |
Keywords: Visual Analytics/Communication, User Interface Design, Design Methods
Abstract: Context: Process Mining (PM) aims to discover processes and their characteristics from event logs recorded by information systems. There are dozens of general-purpose tools. The Brazilian judiciary wants to make the technology available to magistrates with little or no knowledge of the field of PM. Problem: The usability and comprehensibility of the available tools prevent their adoption by laypeople. In fact, these are two of the eleven challenges proposed by the IEEE Task Force on PM that are still little explored in the context of non-specialists. Methodology: Applied qualitative research, using User Centered Design (UCD) principles to guide the construction of the JuMP tool, conceived considering anthropological and sociological aspects of the Brazilian judiciary. For eighteen months, a team of PM specialists worked with representatives of the judiciary sector to produce, evaluate and evolve the new product. An experiment was performed to evaluate the usability and understandability of JuMP. Results: The study demonstrated that the use of a PM tool oriented to the application domain is fundamental for domain experts with little or no knowledge of PM to be able to make good use of it. Contribution: Demonstration that, unlike the prevailing practice in the area of PM, which prioritizes the provision of general-purpose tools, the design of tools oriented to the application domain is a prerequisite for improving usability and comprehensibility. Finally, the work proposes a generic and adaptable methodology for developing specific-purpose PM tools and reports the experience of applying it to the design of the JuMP tool.
|
|
08:45-09:00, Paper We-PS10-T8.4 |
Homeostatic System Design Based on Understanding the Living Environmental Determinants of Falls |
|
Oono, Mikiko | National Institute of Advanced Industrial Science and Technology |
Nomura, Ayano | Tokyo Institute of Technology |
Kitamura, Koji | National Institute of Advanced Industrial Science and Technology |
Nishida, Yoshifumi | Tokyo Institute of Technology |
Nakahara, Shunsaburo | Japan Industrial Design Association |
Kawai, Hisashi | Tokyo Metropolitan Institute for Geriatrics and Gerontology |
Keywords: Cooperative Work in Design, Human Factors, Design Methods
Abstract: Falls among older adults are a serious global challenge. The World Health Organization strongly recommends gait, balance, and functional training, Tai Chi, or home assessment and modification to prevent falls among older people, but these strategies have not changed over the past few decades. The purpose of the present study is to propose an innovative approach to fall prevention naturally embedded in the environment. We first suggest new methods for simplifying and understanding the relationship between falls and everyday life, and describe this relationship as a knowledge graph using emergency transport data on falls among the aged. As a result of a cluster analysis using knowledge graphs, we identified whole-body balance as a key factor and living environmental determinant of falls. We then propose the concept of a homeostatic spatial system and discuss our development of seven homeostatic products as a proof of concept. Finally, we develop an evaluation system to visualize specific locations in a defined area where people maintain whole-body balance using their hands or other body parts. Our findings confirm that the system is useful for evaluating how people change their behaviors to maintain whole-body balance based on homeostatic products and the living environment as a whole.
|
|
We-PS10-T9 Regular Session, Oahu |
Wearable Computing II |
|
|
|
08:00-08:15, Paper We-PS10-T9.1 |
Work Recognition and Movement Trajectory Acquisition Using a Multi-Sensing Wearable Device |
|
Ogata, Kunihiro | National Institute of Advanced Industrial Science and Technology |
Tanaka, Hideyuki | National Institute of Advanced Industrial Science and Technology |
Kourogi, Masakatsu | AIST |
Keywords: Wearable Computing, Environmental Sensing, Human-centered Learning
Abstract: Musculoskeletal diseases such as low back pain may cause physical disabilities and make it difficult to continue working. From this point of view, logging of workers' conditions in the working environment is an important issue. So far, devices such as smart watches have enabled limited information acquisition, but no single device has provided comprehensive information. Therefore, in this study, we estimated the wearer's state using a device called THINKLET, which can acquire image information and inertial information at the same time. By acquiring the pose of the hand from the image information, we were able to estimate the work being performed. More detailed information can be obtained by calculating the position at the same time. By integrating image information with inertial information, it becomes possible to measure the position and posture of the wearer in the working environment. Because the two sensor modalities complement each other, it is possible to obtain robust and accurate self-location. In this research, we developed these basic technologies and evaluated their performance.
|
|
08:30-08:45, Paper We-PS10-T9.3 |
Estimation of Ground Reaction Force Using Carbon Insoles with Piezoelectric Film Sensors and Force Sensing Resistors |
|
Yabu, Soya | Osaka Institute of Technology |
Shimizu, Kotaro | National Rehabilitation Center for Persons with Disabilities |
Morita, Masanori | Murata Manufacturing Co. Ltd |
Kawashima, Noritaka | National Rehabilitation Center for Persons with Disabilities |
Yoshikawa, Masahiro | Osaka Institute of Technology |
Keywords: Wearable Computing
Abstract: Insole-attached force sensing resistors (FSRs) have been used as an alternative to force plates to estimate ground reaction force (GRF) during gait. Since the FSRs have a narrow sensing area and cannot measure force in the mediolateral and anteroposterior directions, it has been challenging to estimate three components of GRF with high accuracy. This paper reports on a gait measurement system using carbon insoles with attached piezoelectric film sensors and FSRs, and the estimation of three components of GRF using the system. Outputs of piezoelectric film sensors reflect the force that deforms the carbon insole in the mediolateral and anteroposterior directions. Using support vector regression (SVR) models trained with the piezoelectric film sensor and FSR data, the estimation accuracy of the three components of GRF was improved.
|
|
We-PS10-T11 Special Session, Hawaii 6 |
Assistive Technology and Human-Computer Interaction |
|
|
Organizer: Fortino, Giancarlo | University of Calabria |
Organizer: Liu, Peter X. | Carleton University |
Organizer: Wang, Zhelong | Dalian University of Technology |
Organizer: Ye, Li | Chinese Academy of Sciences |
|
08:00-08:15, Paper We-PS10-T11.1 |
SSE-Based Evolutionary Algorithm for Hyper-Parameter Optimization of LightGBM on Paddy Rice Yield Prediction Problem (I) |
|
Takai, Ayana | Nagoya University |
Makino, Hiroya | Nagoya University |
Kita, Eisuke | Nagoya University |
Keywords: Assistive Technology, Networking and Decision-Making, Environmental Sensing
Abstract: One of the purposes of smart agriculture is to predict the yield of paddy rice from agricultural data using machine learning. LightGBM, one of the machine learning algorithms, is applied to the yield prediction problem of paddy rice in this paper. Since LightGBM has a large number of hyperparameters, hyperparameter optimization using the stochastic schemata exploiter (SSE) is applied. A comparison with the Genetic Algorithm (GA) confirms that SSE converges quickly. In addition, it is found that the higher the mutation rate of SSE, the more readily the search converges to the global optimal solution without falling into a local solution.
|
|
08:15-08:30, Paper We-PS10-T11.2 |
A Study on Psychological Flow Measure by Human-Machine Interaction Modeling (I) |
|
Kille, Sean | Karlsruhe Institute of Technology |
Witucki, Linus | Karlsruhe Institute of Technology |
Rothfuss, Simon | Karlsruhe Institute of Technology (KIT) |
Hohmann, Sören | KIT |
Keywords: Human-Machine Interaction
Abstract: With human-machine interaction continuously developing, the consideration of human experience in the design of a machine becomes increasingly relevant. An experience measure that rates how well a work or interaction state is perceived by a user is psychological flow. In this paper, we introduce a novel approach to assess flow in human-machine interaction by game-theoretically modeling the interaction. To validate our approach, we perform a user study with 30 participants, with a study design that allows for the manipulation of the user’s experience through three experience modes. The results both validate our study design and indicate that our approach to assessing flow is promising and worth developing further.
|
|
08:30-08:45, Paper We-PS10-T11.3 |
Reinforcement Learning Based User-Specific Shared Control Navigation in Crowds (I) |
|
Zhang, Bingqing | University College London |
Holloway, Catherine | University College London |
Carlson, Tom | University College London |
Keywords: Shared Control, Assistive Technology, Human-Collaborative Robotics
Abstract: Shared control is a mode where the user input is combined with a planned motion to achieve a common goal. In navigation, a shared control approach could provide a potential mobility solution for people who have a mobility impairment and find traditional powered wheelchairs unsuitable. While state-of-the-art work in shared control has demonstrated its capability in improving safety and human-machine interaction and in reducing confusion, it is still challenging to use shared control in dynamic, crowded scenarios in a way that is acceptable to users. Learning from recent advances in robot navigation, we present a reinforcement learning based framework, which allows navigation to be achieved in a user-specific shared controlled way. Our approach was trained and tested in a Unity3D based simulator. It achieved 33% fewer collisions, similarly high user agreement (≥ 85%) and 27% less completion time when compared with our previous model-based method.
|
|
08:45-09:00, Paper We-PS10-T11.4 |
Tackling the Duality of Obstacles and Targets in Shared Control Systems: A Smart Wheelchair Table-Docking Example (I) |
|
Arditti, Samuel | University College London |
Habert, Felix | University College London |
Saracbasi, Ozge | University of Reading |
Walker, George | University College London |
Carlson, Tom | University College London |
Keywords: Human-Machine Cooperation and Systems, Assistive Technology
Abstract: Many studies have shown that a smart wheelchair could improve the quality of life of people with restricted mobility by providing them with more freedom in the daily activities they can undertake independently. In addition to enhancing independent mobility, it is important to ensure safety for wheelchair users and those around them. To date, previous studies have mostly focused on (semi-)autonomous navigation or obstacle avoidance. By contrast, in this study, we tackle the challenging but important problem of safely docking to tables. We propose a robotic navigation assistance, applied to electric powered wheelchairs using Time-of-Flight (ToF) sensors to facilitate table-docking for users. To meet this objective, we designed a low-cost sensor system that was integrated into our smart wheelchair prototype, which can detect a table and accurately estimate its height. We then developed a robust algorithm to deliver the manoeuvring assistance. First, we simulated the smart wheelchair system within Unity3D to find the best positions for the ToF sensors and evaluate the accuracy of the docking system, employing different table styles. Then, we experimentally validated the system on our physical wheelchair, using varying angles of approach, which demonstrated its feasibility.
|
|
We-PS10-T12 Special Session, Hilo |
Human-Machine Cooperation and Systems |
|
|
Organizer: Shen, Weiming | Huazhong University of Science and Technology |
Organizer: Trappey, Amy | National Tsing Hua University |
Organizer: Lee, Ching-Hung | Xi’an Jiaotong University |
Organizer: Shi, Yanjun | Dalian University of Technology |
|
08:00-08:15, Paper We-PS10-T12.1 |
Steering Mental Models During Drifting: Can Drivers Understand Automated Counter-Steering? (I) |
|
Yasuda, Hiroshi | Toyota Research Institute |
Chen, Tiffany | Toyota Research Institute |
Keywords: Shared Control, Human-Machine Cooperation and Systems, Human-Machine Interface
Abstract: Automated vehicle control beyond friction limits is an emerging technology that can further improve vehicle safety. This paper focuses on the driver's understanding of steering behavior during automated counter-steering. We hypothesize that: 1) motorsports fans or racing game players (GP) have a correct mental model of automated counter-steering, while 2) ordinary drivers (Non-GP) do not, and 3) Non-GP incorrectly estimate rotation direction as being always consistent between the steering wheel and vehicle. We first propose a method to examine the accuracy of a driver's steering mental model. The method measures the participants' ability to estimate steering wheel rotation from a video only showing vehicle rotation and vice versa. We compared the performance of GP and Non-GP using videos with and without counter-steering. The results only support hypothesis 2. This indicates that automated counter-steering likely confuses most drivers no matter their prior knowledge, which makes the improvement of driver-vehicle communication critical.
|
|
08:15-08:30, Paper We-PS10-T12.2 |
Shared Telemanipulation with VR Controllers in an Anti Slosh Scenario (I) |
|
Grobbel, Max | FZI Forschungszentrum Informatik |
Varga, Balint | Karlsruhe Institute of Technology (KIT), Campus South |
Hohmann, Sören | KIT |
Keywords: Human-Machine Cooperation and Systems, Shared Control, Human-Machine Interaction
Abstract: Telemanipulation has become a promising technology that combines human intelligence with robotic capabilities to perform tasks remotely. However, it faces several challenges such as insufficient transparency, low immersion and limited feedback to the human operator. Moreover, the high cost of haptic interfaces is a major limitation for the application of telemanipulation in various fields, including elder care where our research is focused. To address these challenges, this paper proposes the usage of nonlinear model predictive control for telemanipulation using low-cost virtual reality controllers, including multiple control goals in the objective function. The framework utilizes models for human input prediction and task related models of the robot and the environment. The proposed framework is validated on a UR5e robot arm in the scenario of handling liquid without spilling. Further extensions of the framework such as pouring assistance and collision avoidance can easily be included.
|
|
08:30-08:45, Paper We-PS10-T12.3 |
Interaction Mediation for Meaningful Human Control Over Highly Automated Vehicles (I) |
|
Baltzer, Marcel Caspar Attila | Fraunhofer FKIE |
Ripkens, Alexander | Fraunhofer FKIE |
López Hernández, Daniel | Fraunhofer FKIE |
Flemisch, Frank | RWTH Aachen University/Fraunhofer |
Keywords: Human-Machine Cooperation and Systems, Shared Control, Augmented Cognition
Abstract: The current advancement of technology with not only physical, but increasing cognitive functions widely coined as Artificial Intelligence (AI) leads to multiple situations where humans willingly or unwillingly accept the decisions of algorithms and Neural Network (NN) Models resulting in humans being decreasingly involved in the decision-making process. Human-machine cooperation and Human Autonomy Teaming (HAT) are an answer to this problem, joining the best of both sides into joint cognitive systems. Meaningful Human Control (MHC) is the concept of ensuring that humans have enough influence on the outcome of actions in HAT and hence stay morally responsible and legally accountable for their actions. This paper extends the concept of using an interaction mediator that facilitates the interaction between an automated and a human agent with interaction patterns to maintain MHC. The concept is applicable in almost any domain, and will be shown with the example of guiding a highly automated vehicle.
|
|
We-PS10-T13 Regular Session, Kona |
Brain-Computer Interfaces I |
|
|
|
08:00-08:15, Paper We-PS10-T13.1 |
Achieving Effective Artifact Subspace Reconstruction in EEG Using Real-Time Video-Based Artifact Identification |
|
Kang, Sunghyun | Gwangju Institute of Science and Technology |
Won, Kyungho | Centre Inria De l'Universite De Rennes |
Kim, Heegyu | Gwangju Institute of Science and Technology |
Baek, Jihoon | Gwangju Institute of Science and Technology |
Ahn, Minkyu | Handong Global University |
Jun, Sung | Gwangju Institute of Science and Technology |
Keywords: Brain-Computer Interfaces
Abstract: Identifying and minimizing physiological artifacts in EEG is challenging because these artifacts may corrupt the underlying brain activity severely. In this work, we proposed a hybrid approach to detect/reduce EEG artifacts by combining the MediaPipe face mesh model and artifact subspace reconstruction (ASR). Four types of artifacts (eye blinking, horizontal and vertical eye movements, and jaw movement) were generated during EEG measurement to test our approach. We observed that real-time video-based artifact identification achieved over 95% accuracy in detecting eye blinking, horizontal eye movement, and jaw movement. Moreover, the targeted noise reduction was effective in analyzing the signal-to-noise ratio (SNR) for each specific artifact. This work may contribute to improving the reliability and accuracy of EEG data analysis in real-world and online scenarios by providing a practical and effective approach to identifying and reducing physiological artifacts in real time.
|
|
08:15-08:30, Paper We-PS10-T13.2 |
Computer Aided Detection of Dominant Artifacts in Ear-EEG Signal |
|
Jayas, Tanuja | TCS Research |
A, Adarsh | TCS Research |
Muralidharan, Kartik | TCS Research |
Gubbi, Jayavardhana | TCS Research, Tata Consultancy Services Ltd |
Ramakrishnan, Ramesh Kumar | TCS Research |
Pal, Arpan | Tata Consultancy Services |
Keywords: Brain-Computer Interfaces, Wearable Computing, Medical Informatics
Abstract: Analysis of Electroencephalography (EEG) signals for everyday in-situ applications is hindered by many challenges including artifacts induced from physiological and environmental sources as well as anatomical factors. Recent studies have proposed analytical frameworks for the assessment of scalp EEG quality and identification of artifacts using rule based methods. These methods are typically used in an offline processing manner, employed before signal analysis when abundant data is available. With the advent of wearable devices, it is important to build techniques that work on short term signals giving us the ability of intervention based on the signal quality. Further, ear-EEG is a new modality that requires assessment and calibration with short term signals. To support this new modality we conducted a detailed study of scalp and ear-EEG data from the perspective of different measures as well as time duration. An algorithm is developed to identify the epochs with EOG and EMG artifacts in the ear-EEG using a set of metrics based on the characteristics of EEG. Thresholds of these metrics are determined by training on a dataset containing synchronous ear and scalp EEG with good results. The algorithm obtained an accuracy of 76.7% and 76.84% in classifying artifact EEG for scalp referenced ear-EEG and re-referenced ear-EEG, respectively.
|
|
We-PS20T1 Workshop Session, Puna |
Machine Learning for EEG-Based Brain-Computer Interfaces I |
|
|
Organizer: Wu, Dongrui | Huazhong University of Science and Technology |
|
10:45-11:00, Paper We-PS20T1.1 |
Beyond Within-Subject Performance: A Multi-Dataset Study of Fine-Tuning in the EEG Domain (I) |
|
Sartzetaki, Christina | Deeplab |
Antoniadis, Panagiotis | Deeplab |
Antonopoulos, Nick | Deeplab |
Gkinis, Ioannis | Deeplab |
Krasoulis, Agamemnon | Insilico Medicine |
Perdikis, Serafeim | University of Essex |
Pitsikalis, Vasileios | Deeplab; Ntua |
Keywords: Brain-Computer Interfaces, Cognitive Computing, Human-Machine Interface
Abstract: There is a critical demand for BCI systems that can swiftly adapt to a new user and at the same time function with any user. We propose a fine-tuning approach for neural networks that serves a dual purpose; first, to minimize calibration times through requiring considerably less data - up to one-sixth - from the target subject than training from scratch, and second, to alleviate cases of user illiteracy by providing a substantial performance boost of over 11% in absolute accuracy from the features learned from other subjects. Ultimately, our adaptation method surpasses standard within-subject performance by a large margin in all subjects. We present ablation studies across three datasets, in which we demonstrate that fine-tuning outperforms other adaptation methods for BCI systems and that what matters most is the quantity of pre-training subjects, rather than their BCI-ability, achieving over 8% absolute increase in classification accuracy when scaling up the order of magnitude. Finally, we compare our approach to the state-of-the-art in EEG-based motor imagery and find it comparable, if not superior, to methods employing far more complex neural networks, obtaining 82.60% and 85.64% within-subject accuracy in the four-class BCIC IV-2a and binary MMI datasets respectively.
|
|
11:00-11:15, Paper We-PS20T1.2 |
Explaining Convolutional Neural Networks for EEG-Based Brain-Computer Interface Using Influence Functions (I) |
|
Park, Hoonseok | Kyung Hee University |
Park, Donghyun | Republic of Korea |
Kim, Sangyeon | North Carolina State University |
Choo, Sanghyun | North Carolina State University |
Nam, Chang | North Carolina State University |
Lee, Sangwon | Korea University |
Jung, Jae-Yoon | Kyung Hee University |
Keywords: BMI Emerging Applications, Other Neurotechnology and Brain-Related Topics, Passive BMIs
Abstract: Despite the high performance of convolutional neural networks (CNNs) for brain-computer interface (BCI) tasks based on raw electroencephalography (EEG) signals, the explanation of the prediction result remains challenging owing to their complex structure and numerous parameters. We propose a novel framework for explaining CNNs for EEG-based BCI tasks by using perturbation-based influence scores. The method supports the interpretation of CNN classification for EEG signals at both the example level and the feature level. The experiments on the BCIC III-IVa dataset demonstrate that the proposed method is effective not only for the interpretation of the predictive models, but also for the improvement of the classification accuracy.
|
|
11:15-11:30, Paper We-PS20T1.3 |
TOINet: Transfer Learning from Overt Speech to Imagined Speech-Based EEG Signals with Convolutional Autoencoder (I) |
|
Lee, Dae-Hyeok | Korea University |
Kim, Sung-Jin | Korea University |
Han, Hyeon-Taek | Korea University |
Lee, Seong-Whan | Korea University |
Keywords: Active BMIs, Other Neurotechnology and Brain-Related Topics
Abstract: Brain-computer interface (BCI) enables communication between humans and devices by reflecting the intentions and status of humans. Endogenous BCI is imagination-based BCI, and it has the advantage that the fatigue level of the body, especially the eyes, is relatively low and no additional equipment for offering stimulation is required. When conducting imagined speech, one of the endogenous BCI paradigms, the users imagine the pronunciation as if actually speaking. In contrast, in overt speech the users directly pronounce the words. We proposed a transfer learning-based method from overt speech- to imagined speech-based electroencephalogram (EEG) signals (TOINet). The proposed method utilizes an encoder to extract the feature vector of imagined speech from EEG signals, which is subsequently reconstructed into overt speech signals using the decoder. Through this process, the model can identify the significant and common features present in EEG signals for both overt and imagined speech, facilitating the classification of EEG signals associated with imagined speech. Eight subjects participated in the experiment. The average accuracy of the TOINet was 0.4841 for classifying four words and the EEG features of overt speech improved the performance by 0.0742. Hence, we demonstrated that EEG features of overt speech could improve the decoding performance of imagined speech.
|
|
11:30-11:45, Paper We-PS20T1.4 |
Short Calibrated SSVEP-BCI for Cross-Subject Transfer Learning Via ELM-AE |
|
Christian, Flores Vega | Universidad De Ingeniería Y Tecnología |
Casas Castro, Paolo | UTEC |
Negreiros De Carvalho Leite, Sarah | UFOP: Universidade Federal De Ouro Preto |
Attux, Romis | UNICAMP |
Keywords: Passive BMIs, BMI Emerging Applications, Other Neurotechnology and Brain-Related Topics
Abstract: The Steady-State Visually Evoked Potential (SSVEP) is a robust paradigm used to build a high-speed Brain-Computer Interface (BCI). This technology can benefit disabled subjects, allowing them to interact with their surroundings without using their peripheral nerves. However, one challenge to address is reducing the calibration time of a BCI for a new subject (target subject) because of the high EEG variability among subjects and within subjects in different sessions. This constraint restricts the application of SSVEP-based BCI in natural environments; thus, some approaches to overcome this constraint propose a linear transformation of existing subjects over some trials of the target subject. In this paper, we propose a nonlinear transformation (NLT) of SSVEP trials using an Extreme Learning Machine Autoencoder (ELM-AE) to improve cross-subject classification while reducing the calibration time for the target subject. Our results show that the recognition accuracy improved by 6.58% for all subjects using NLT. These results also demonstrate the feasibility of NLT: using a few templates from the target subject can enhance the recognition accuracy over cross-subject classification without NLT.
|
|
11:45-12:00, Paper We-PS20T1.5 |
SAT-Net: SincNet-Based Attentive Temporal Convolutional Network for Motor Imagery Classification (I) |
|
Kim, Jun-Mo | Korea University |
Bak, Soyeon | Korea University |
Nam, Hyeonyeong | Korea University |
Choi, WooHyeok | Korea University |
Kim, Da-Hyun | Korea University |
Kam, Tae-Eui | Korea University |
Keywords: Active BMIs, BMI Emerging Applications
Abstract: Brain-computer interfaces (BCIs) are a promising method for users to interact with machines using brain signals, primarily electroencephalography (EEG). Motor imagery (MI)-BCI, which decodes EEG signals induced by the user's imagination of moving body parts, has gained great attention due to its applicability in various fields such as robotics and rehabilitation. MI-EEG signals exhibit class-discriminative patterns such as event-related de/synchronization across spectral-spatio-temporal (SST) domains. Numerous studies adopted deep learning frameworks, especially convolutional neural networks (CNNs), for learning SST feature representations automatically in a data-driven manner. In particular, most of the CNN-based methods adopted temporal convolution to learn the spectral information, as it can act as a band-pass filter. In this paper, we propose SAT-Net, a SincNet-based attentive temporal convolutional network for motor imagery classification. The proposed method utilizes Sinc convolution from SincNet for explicit extraction of spectral information from the input EEG signals with high interpretability. Moreover, we adopt an attentive temporal convolutional network to effectively learn SST feature representations while making full use of temporal information. We evaluate our proposed SAT-Net on the public BCI Competition IV-2a dataset, comparing it not only to conventional CNN-based approaches but also to the state-of-the-art method. The experimental results, supported by statistical analysis, demonstrate that our approach outperforms the competing methods.
|
|
We-PS20-T1 Regular Session, Hawaii 1 |
Multimedia Systems |
|
|
|
10:45-11:00, Paper We-PS20-T1.1 |
A Novel No-Reference HD Video Quality Metric Based on Perceptual Temporal Pooling |
|
Xiang, Jie | University of British Columbia |
Tohidypour, Hamid Reza | University of British Columbia |
Wang, Yixiao | University of British Columbia |
Nasiopoulos, Panos | University of British Columbia |
Pourazad, Mahsa T. | University of British Columbia |
Keywords: Human Perception in Multimedia, Visual Analytics/Communication, Multimedia Systems
Abstract: Impressive advancements in capturing, display, and broadcasting technologies significantly elevate image and video quality, and with that the need for designing new reference and no-reference image and video quality metrics. One of the latest and perceptually accurate video quality metrics is the Video Multi-Method Assessment Fusion (VMAF) method. However, VMAF considers the temporal nature of video using basic average temporal pooling, an approach that falls short of human perception. In this paper, we introduce a new no-reference video quality metric that uses deep learning to extract spatial features and a unique temporal pooling approach to accurately predict the visual quality score. To this end, we first created a video quality dataset that consists of high-resolution, 20s-long test video clips compressed at several different bitrates. These videos were labeled based on subjective evaluations and were used to determine the perceptual importance of frames in our temporal pooling scheme. Evaluations showed that our proposed approach achieved a correlation of 90.55% with human perception and outperformed the state-of-the-art VMAF approach by 15.63% in accuracy.
|
|
11:00-11:15, Paper We-PS20-T1.2 |
Acceptability and Renown of Digital Immortality through the Lens of the User |
|
Ferreira Galvão, Vinícius | Universidade Federal De Mato Grosso |
Maciel, Cristiano | Universidade Federal De Mato Grosso | CalPoly Pomona |
Carvalho Pereira, Vinícius | Universidade Federal De Mato Grosso |
Garcia, Ana Cristina Bicharra | Universidade Federal Do Estado Do Rio De Janeiro |
Keywords: Human-Computer Interaction, Multimedia Systems, Affective Computing
Abstract: Technology users will eventually face death in the offline world. However, their data may survive for a long time, recorded in a wide variety of digital services and devices. This digital legacy must be handled and legally abide by the owners’ wishes, which may vary from inheritance to deletion. By carrying out a literature review on the subject, the authors noticed there is a lack of user studies on these issues. We, therefore, conducted a focus group with technology users of different profiles to understand the volunteers’ acceptability of this topic. During the group discussions, an unexpected factor was raised: to what extent might a user's fame influence their digital immortalization? This study aims to discuss digital immortality in terms of its acceptability among users and to contribute to the field of Human-Computer Interaction (HCI) with important insights on death in the digital world. In our findings regarding acceptability, users can feel some comfort in interacting with their deceased loved ones. Regarding fame, while influential individuals may not need digital immortality, there are concerns regarding the utilitarian nature of their legacies.
|
|
11:30-11:45, Paper We-PS20-T1.4 |
Novel 3D-Aware Composition Images Synthesis for Object Display with Diffusion Model |
|
Chen, Tianrun | Zhejiang University |
Tao, Xu | Huzhou University School of Information Engineering |
Ye, Yiyu | KOKONI, Moxin (Huzhou) Technology Co., LTD |
Mao, Papa | KOKONI, Moxin (Huzhou) Technology Co., LTD |
Zang, Ying | Huzhou University |
Sun, Lingyun | Zhejiang University |
Keywords: Multimedia Systems, Human-Computer Interaction, Design Methods
Abstract: Designing attractive images for object display can be a time-consuming and skill-intensive process. The emergence of advanced algorithms, particularly the Diffusion Model, has made it possible to synthesize attractive images using AI. However, the existing diffusion models are mostly used to generate entire images and lack control over specific objects for object display. Here, to the best of our knowledge, we are the first to extend the application of the diffusion model to synthesize novel images for specific objects. By encoding the input images of objects into a NeRF representation and synthesizing the desired backgrounds using diffusion models with the input of rendered object images and text prompts, our method can generate 3D-aware object display images at arbitrary angles and with arbitrary backgrounds. We have conducted extensive experiments to demonstrate that our method is capable of generating high-quality and photo-realistic images, and that it is more than 6 times faster than the conventional photomontage approach. Moreover, our generated images have higher compositional scores, image quality scores, and aesthetics scores in our user experiments. By significantly reducing the need for human effort and producing higher quality generated images, our approach opens up exciting possibilities for creating versatile novel images of specific objects.
|
|
11:45-12:00, Paper We-PS20-T1.5 |
Cross-Modal Correspondence between Sound Pitch and Shape Controlled by Shape Feature Indices |
|
Hayashi, Jumpei | Keio University |
Kato, Takeo | Keio University |
Yanagisawa, Hideyoshi | The University of Tokyo |
Keywords: Kansei (sense/emotion) Engineering, Multimedia Systems, Interactive Design Science and Engineering
Abstract: Recently, cross-modal correspondences, which are compatibility effects between attributes or dimensions of stimuli in different sensory modalities, have been drawing attention. In the case of the correspondences between sound and shape, the effect of the “angularity” of shapes on sound pitch has been widely studied, while there has been little research on other features of shapes. Previous studies have shown that human emotion mediates cross-modal correspondences. Therefore, this study focuses on the complexity and order of shapes, which are said to influence human emotion, and aims to examine the cross-modal correspondences they induce. First, indices of complexity and order were selected. Next, shapes were selected based on each index. Finally, an experiment was conducted to investigate the correspondence between shapes and sound pitches. As a result, a correspondence between the complexity of shapes and higher pitches was found. On the other hand, a correspondence between the order of shapes and lower pitches was not found.
|
|
We-PS20-T2 Regular Session, Lanai |
Decision Making |
|
|
|
10:45-11:00, Paper We-PS20-T2.1 |
Spatial and Social Situation-Aware Transformer-Based Trajectory Prediction of Autonomous Systems (I) |
|
Donandt, Kathrin | University of Duisburg-Essen |
Soeffker, Dirk | University of Duisburg-Essen |
Keywords: Cognitive Computing, Human-Centered Transportation, Systems Safety and Security
Abstract: Autonomous transportation systems such as road vehicles or vessels require the consideration of the static and dynamic environment to dislocate without collision. Anticipating the behavior of an agent in a given situation is required to adequately react to it in time. Developing deep learning-based models has become the dominant approach to motion prediction recently. The social environment is often considered through a CNN-LSTM-based sub-module processing a social tensor that includes information of the past trajectory of surrounding agents. For the proposed transformer-based trajectory prediction model, an alternative, computationally more efficient social tensor definition and processing is suggested. It considers the interdependencies between target and surrounding agents at each time step directly instead of relying on information of last hidden LSTM states of individually processed agents. A transformer-based sub-module, the Social Tensor Transformer, is integrated into the overall prediction model. It is responsible for enriching the target agent’s dislocation features with social interaction information obtained from the social tensor. For the awareness of spatial limitations, dislocation features are defined in relation to the navigable area. This replaces additional, computationally expensive map processing sub-modules. An ablation study shows that for longer prediction horizons, the deviation of the predicted trajectory from the ground truth is lower compared to a spatially and socially agnostic model. Even if the performance gain from a spatial-only to a spatial and social context-sensitive model is small in terms of common error measures, visualizing the results shows that the proposed model is in fact able to predict reactions to surrounding agents and explicitly allows an interpretable behavior.
|
|
11:00-11:15, Paper We-PS20-T2.2 |
Modeling the Social Acceptability of Technologies Using Twitter Data (I) |
|
Nanba, Hidetsugu | Chuo University |
Yamamoto, Katsuya | Chuo University |
Fukuda, Satoshi | Chuo University |
Shoji, Hiroko | Chuo University |
Tanishita, Masayoshi | Chuo University |
Kyutoku, Yasushi | Chuo University |
Yamashina, Mitsuru | Chuo University |
Keywords: Assistive Technology
Abstract: Our society is built on the benefits of technological development, but this development also entails risks. When the convenience of a new technology outweighs the sense of anxiety about its risks, we can consider that this technology is socially accepted. In this study, we use social media to analyze the general public's feelings of anxiety toward various technologies and model the social acceptability of new technologies based on this analysis. We collected 6,452,730 tweets about three technologies (“automated driving,” “electronic currency,” and “drones”) and examined the social acceptability of the technologies using emotion classification, text categorization, and topic analysis techniques. As a result, we concluded that anxiety about unemployment can be one perspective for analyzing the social acceptability of a technology.
|
|
11:15-11:30, Paper We-PS20-T2.3 |
Preclinical Assessment of Upper Limb Tremor in Parkinson's Disease with Deep Learning and Wearable Technology (I) |
|
Lin, Fang | Dalian University of Technology |
Wang, Zhelong | Dalian University of Technology |
Sen, Qiu | Dalian University of Technology |
Hongyu, Zhao | Dalian University of Technology |
Keywords: Human-Machine Interaction, Medical Informatics
Abstract: Tremors are typically experienced by patients at the beginning of Parkinson’s disease (PD). Clinicians evaluate clinical symptoms based on scale and experience, but mild tremors do not have significant characteristics and are difficult to observe with the naked eye. Implementing an intelligent and objective method to distinguish PD patients with early tremor symptoms from healthy controls (HC) is necessary. This study used wearable sensors to collect 9-axis inertial signals and 2-channel sEMG signals at the wrists of 13 HC and 24 PD patients from Dalian Municipal Central Hospital. Based on a Long Short-Term Memory Network (LSTM), an attention mechanism, and a Fully Convolutional Network (FCN), we develop a model to classify and recognize data from PD patients. Compared with Support Vector Machine, FCN, and LSTM, the proposed method achieved higher classification accuracy (91.78%), precision (100%), recall (87.50%), and F1-score (93.33%) for the PD class. The computing time of the proposed method is approximately 1 second. The proposed method identifies early PD patients by pre-clinical assessment of mild upper limb tremors, which is valuable for early treatment and rehabilitation.
|
|
We-PS20-T3 Special Session, Lao Needle |
Machine Learning for Medical Data Analysis |
|
|
Organizer: Liu, Weifeng | China University of Petroleum (East China) |
Organizer: Zhang, Bingfeng | China University of Petroleum (East China) |
Organizer: Zhou, Yicong | University of Macau |
|
11:00-11:15, Paper We-PS20-T3.2 |
Instrumental Variable Learning for Chest X-Ray Classification (I) |
|
Nie, Weizhi | The School of Electrical and Information Engineering, Tianjin Un |
Zhang, Chen | Tianjin University |
Song, Dan | Tianjin University |
Bai, Yunpeng | The Department of Cardiac Surgery, Chest Hospital, Tianjin Unive |
Xie, Keliang | The Department of Critical Care Medicine, Department of Anesthes |
Liu, Anan | The School of Electrical and Information Engineering, Tianjin Un |
Keywords: Medical Informatics, Biometrics and Applications
Abstract: The chest X-ray (CXR) is commonly employed to diagnose thoracic illnesses, but accurate automatic diagnosis remains challenging because of the complex relationships between pathologies. In the past few years, numerous approaches based on deep learning have been proposed to address this issue, but confounding factors such as image resolution or noise often degrade model performance. In this paper, we focus on the chest X-ray classification task and propose an interpretable instrumental variable (IV) learning framework to eliminate spurious associations and obtain an accurate causal representation. Specifically, we first construct a structural causal model (SCM) for our task and learn the confounders and the preliminary representations of the IV. We then leverage electronic health records (EHR) as auxiliary information and fuse the above features with our transformer-based semantic fusion module, so that the IV carries medical semantics. Meanwhile, the reliability of the IV is further guaranteed via mutual-information constraints between related causal variables. Finally, our approach’s performance is demonstrated using the MIMIC-CXR, NIH ChestX-ray 14, and CheXpert datasets, and we achieve competitive results.
|
|
11:15-11:30, Paper We-PS20-T3.3 |
Wearable Devices for Early Warning of Acute Exacerbation in Chronic Obstructive Pulmonary Disease Patients (I) |
|
Hsiao, Chun-Chieh | Lunghwa University of Science and Technology |
Chu, Cai-Ying | Department of Electronic Engineering, National Taipei University |
Lee, Ren-Guey | National Taipei University of Technology |
Chang, Jer-Hwa | School of Respiratory Therapy, College of Medicine, Taipei Medic |
Tseng, Chwan-Lu | National Taipei University of Technology |
Keywords: Wearable Computing, Medical Informatics, Human-centered Learning
Abstract: Chronic Obstructive Pulmonary Disease (COPD) is one of the leading causes of chronic diseases and deaths worldwide. When acute exacerbation of COPD (AECOPD) occurs, the frequency and severity of malignant attacks are highly correlated with mortality rate. The purpose of this study is to use wearable devices to collect physiological parameters of patients for early warning and prevention of complications of possible AECOPD attacks in the future. The subjects used wearable devices to measure Heart Rate Variability (HRV) at home. Physiological data and physiological assessment scales of 13 COPD patients were collected during the 6-month study period. According to the scale responses, the severity of the condition was classified into mild AE and no AE. If the subject needed emergency medical treatment due to COPD, it was classified as AE. With the scale classification method, a machine-learning Random Forest (RF) algorithm is used to predict the occurrence of AECOPD in the next 7 days, so as to prevent the deterioration of the disease in advance. The results of the study show that the accuracy of the model is more than 92% according to different classification methods, and using the mixed-parameter model as a feature for the prediction can improve the sensitivity of the original warning mechanism. In order to provide predictive results to the nursing staff at any time, the user interface of our system would transmit a warning message to remind the nursing staff to ensure early medical intervention for patients to avoid the occurrence of AECOPD.
|
|
We-PS20-T4 Regular Session, Hawaii 2 |
Intelligence Interaction |
|
|
|
10:45-11:00, Paper We-PS20-T4.1 |
Question-Guided Graph Convolutional Network for Visual Question Answering Based on Object-Difference |
|
Minchang, Huangfu | Qilu University of Technology |
Geng, Yushui | Qilu University of Technology (Shandong Academy of Sciences) |
|
11:00-11:15, Paper We-PS20-T4.2 |
Self-Supervised Optimization of Hand Pose Estimation Using Anatomical Features and Iterative Learning |
|
Jauch, Christian | Fraunhofer IPA |
Leitritz, Timo | Fraunhofer IPA |
Huber, Marco | University of Stuttgart |
Keywords: Human-Machine Interaction, Assistive Technology, Human-Computer Interaction
Abstract: Manual assembly workers face increasing complexity in their work. Human-centered assistance systems could help, but object recognition as an enabling technology hinders a sophisticated human-centered design of these systems. At the same time, activity recognition based on hand poses suffers from poor pose estimation in complex usage scenarios, such as wearing gloves. This paper presents a self-supervised pipeline for adapting hand pose estimation to specific use cases with minimal human interaction. This enables cheap and robust hand pose-based activity recognition. The pipeline consists of a general machine learning model for hand pose estimation trained on a generalized dataset, spatial and temporal filtering to account for anatomical constraints of the hand, and a retraining step to improve the model. Different parameter combinations are evaluated on a publicly available and annotated dataset. The best parameter and model combination is then applied to unlabeled videos from a manual assembly scenario. The effectiveness of the pipeline is demonstrated by training an activity recognition model as a downstream task in the manual assembly scenario.
|
|
11:15-11:30, Paper We-PS20-T4.3 |
Wind Power Scenario Generation Based on Denoising Diffusion Probabilistic Model |
|
Xu, Chenglong | Wuhan University |
Dai, Yuxin | Wuhan University |
Xu, Peidong | Wuhan University |
Gao, Tianlu | Wuhan University |
Zhang, Jun | Wuhan University |
Keywords: Cognitive Computing, Human Factors, Intelligence Interaction
Abstract: The intermittency and randomness of wind power output have a negative impact on the stable operation of the power grid. Accurately modeling the uncertainty of wind power output is essential, and the primary method to achieve this is through scenario generation. Traditional scenario generation methods suffer from limitations such as low accuracy and high computational complexity. In this paper, a novel generation framework based on the denoising diffusion probabilistic model is proposed for scenario generation of wind power. This method can overcome the limitations of traditional methods and learn the distribution of real data to generate reliable wind power scenarios. Compared to a homogeneous generative model, the proposed method shows improved performance in precisely capturing features of wind power scenarios.
|
|
11:30-11:45, Paper We-PS20-T4.4 |
A Deep Q-Network-Based Algorithm for Obstacle Avoidance and Target Tracking for Drones |
|
Guo, Jingrui | The Hong Kong Polytechnic University |
Huang, Chao | The Hong Kong Polytechnic University |
Huang, Hailong | The Hong Kong Polytechnic University |
Keywords: Cognitive Computing, Intelligence Interaction, Multi-User Interaction
Abstract: This paper introduces a novel algorithm, referred to as NEWDQN, which is based on the deep Q-network (DQN) framework. The primary objective of this algorithm is to improve the success rate in both autonomous drone obstacle avoidance and target tracking tasks, while also addressing the convergence drawbacks of the previous algorithm. Furthermore, the algorithm endows the drone with environment perception capabilities and incorporates a direction-based reward-penalty function into the reward function, enhancing the drone’s generalization ability and overall performance. Extensive simulations demonstrate that compared to conventional DQN and Double DQN (DDQN) algorithms, NEWDQN exhibits faster convergence speed, shorter tracking paths, and more robust adaptability to different environments.
|
|
11:45-12:00, Paper We-PS20-T4.5 |
Hybrid Intelligent-Annotation Organ Segmentation on Medical Datasets |
|
Tao, Peng | Soochow University |
Zhao, Jing | Beijing Tsinghua Changgung Hospital |
Gu, Yidong | Suzhou Municipal Hospital |
Di, Gongye | The Affiliated Taizhou People’s Hospital of Nanjing Medical Univ |
Zhang, Lei | Duke Kunshan University |
Cai, Jing | Hong Kong Polytechnic University |
Keywords: Intelligence Interaction, Multimedia Systems, Medical Informatics
Abstract: Ultrasound image segmentation is crucial for early disease detection and treatment planning but remains a challenging task due to the low contrast of organ boundaries and varying image quality. Current methods often require manual intervention or have limited accuracy. In this paper, we propose a novel hybrid framework that combines an automatic option polygon segment (AOPS) algorithm and a distributed- and memory-based evolution (DME) algorithm for precise ultrasound organ segmentation. Our pipeline consists of two cascaded stages: (1) a coarse segmentation step using the AOPS algorithm, which determines the number of vertices/clusters without human intervention, and (2) a refinement step using the DME algorithm to search for the optimal neural network, which is then used to represent a smooth, explainable mathematical expression of the organ boundary. We employ the fractional backpropagation learning network with L2 regularization (FBLN) for training and use the scaled exponential linear unit (SELU) activation function to address the vanishing gradient problem. This is a new attempt at applying such a hybrid framework to ultrasound organ segmentation tasks, and it demonstrates significant contributions in terms of accuracy, smoothness, and computational efficiency.
|
|
We-PS20-T5 Regular Session, Hawaii 3 |
Virtual and Augmented Reality Systems |
|
|
|
10:45-11:00, Paper We-PS20-T5.1 |
VRx@Home Pilot: Can Virtual Reality Therapy Improve Quality of Life for People with Dementia Living at Home? |
|
Appel, Lora | York University |
Saryazdi, Raheleh | KITE-Toronto Rehabilitation Institute, University Health Network |
Lewis-Fung, Samantha Evelyn | University Health Network |
Garcia-Giler, Eduardo | Graduate Student (MSc) |
Qi, Di | University of Toronto Mississauga |
Tesfaye, Essete Makonnen | York University | UHN Open Lab |
Garito, Isabella | York University |
Campos, Jennifer | KITE - Toronto Rehabilitation Institute - University Health Netw |
Keywords: Virtual and Augmented Reality Systems, Virtual/Augmented/Mixed Reality, Interactive Design Science and Engineering
Abstract: Virtual Reality (VR) is increasingly considered a valuable therapeutic intervention for people with dementia (PwD). However, it has not yet been widely implemented or rigorously evaluated for use in private residences, where it has potential for significant impact on quality of life for both PwD and their family caregivers. This paper describes results from the VRx@Home Pilot study, which is among the first to explore the potential benefits of immersive VR experiences delivered through a head-mounted display when compared to a standard two-dimensional display (handheld tablet). This was a prospective mixed methods study involving seven PwD-caregiver dyads (n=14) who took part in a four-week home-based intervention (two weeks VR, two weeks Tablet-Only). We evaluated the feasibility, usability, and impact of 360-degree videos on the quality of life of PwD and their caregivers. These outcomes were assessed through in-app metrics, questionnaires, observations, and interviews conducted at a) baseline, b) after each phase of the intervention, and c) at the end of the study. Results revealed that the VR and Tablet-Only conditions were comparable in terms of ease-of-use, session length, and frequency. Both conditions appeared to positively affect in-the-moment mood and quality of life of PwD and their caregivers. Improvements to VR-content, system navigation, and evaluation measures were identified as factors that will increase the likelihood of VR-therapy being adopted in the home setting by PwD and their caregivers.
|
|
11:00-11:15, Paper We-PS20-T5.2 |
TeleGhost: Asymmetric Telepresence System Using AR and VR Avatars in a Shared Real Space |
|
Seki, Kohta | Waseda University |
Fukushige, Shinichi | Waseda University |
Keywords: Virtual and Augmented Reality Systems, Telepresence, Virtual/Augmented/Mixed Reality
Abstract: This study proposes a simplified telepresence system based on extended reality for increasing the number of remote-able tasks. The system provides an immersive virtual environment that duplicates remote real spaces to enable communication among system users in different geographic locations through their digital avatars. The motion of the avatars corresponds with that of the mobile devices manipulated by the system users, and the avatars’ position and orientation in the real space represent the viewpoint and viewing angle of the users. The three experiments showed that the proposed system creates a high-fidelity immersive environment in a short period of time and positively supports users' remote collaboration in the virtualized real space. The system successfully demonstrated its potential to contribute to the expansion of remotely enabled operations.
|
|
11:15-11:30, Paper We-PS20-T5.3 |
Comparing Perceptions of Performance across Virtual Reality, Video Conferencing, and Face-To-Face Collaborations |
|
Sanaei, Mohammadamin | Iowa State University |
Machacek, Marielle | Iowa State University |
Gilbert, Stephen | Iowa State University |
Wu, Peggy | Raytheon Technologies Research Center |
Oliver, James | Iowa State University |
Keywords: Human-Computer Interaction, Team Performance and Training Systems
Abstract: As Computer Mediated Communications (CMCs) advance, businesses have sought alternatives to face-to-face (F2F) meetings to increase productivity for geographically dispersed teams while saving time and money. However, critical differences between CMCs and F2F impact multiple aspects of communication performance. To explore these differences, the present study examined communication performance in three conditions: video conferencing (VC), virtual reality (VR), and F2F. The study utilized an electrical circuit repair task and multiple surveys to collect data from 104 participants on four dependent variables: shared situational awareness, usability, mental workload, and performance confidence. For all the variables, results showed significantly better scores in VR and F2F conditions than in VC, but there was no significant difference between the VR and F2F conditions. These findings can inform technology developers in improving communication performance in computer mediated contexts, especially by using VR.
|
|
11:30-11:45, Paper We-PS20-T5.4 |
A Physiological Approach of Presence and VR Sickness in Simulated Teleoperated Social Tasks |
|
Achanccaray, David | Advanced Telecommunications Research Institute International |
Sumioka, Hidenobu | Advanced Telecommunications Research Institute |
Keywords: Virtual and Augmented Reality Systems, Telepresence, Human-Machine Interaction
Abstract: The presence (or telepresence) feeling and virtual reality (VR) sickness affect the task execution in teleoperation. Most teleoperation works have assessed these concepts using objective (physiological signals) and subjective (questionnaires) measurements. However, these works did not include social tasks. To the best of our knowledge, there was no physiological approach in teleoperation of social tasks. We measured presence and VR sickness in a simulation of teleoperated social tasks by questionnaires and analyzed the correlation between their scores and multimodal biomarkers. The results showed some different correlations from the findings of non-teleoperation studies. These correlations were between presence and neural biomarkers in the frontal-central and central regions (for the beta and delta bands) and between VR sickness and brain biomarkers in the occipital region (for the alpha and beta bands) and the mean temperature. This work revealed significant correlations to support some biomarkers as predictors of the trend of presence and VR sickness in simulated teleoperated social tasks. These biomarkers might also be valid to predict the trend of telepresence and motion sickness in teleoperated social tasks in a remote environment.
|
|
11:45-12:00, Paper We-PS20-T5.5 |
Human Pain Relief by Using Augmented Reality of Noxious Stimulation |
|
Toratori, Kotaro | University of Tsukuba |
Tanaka, Fumihide | University of Tsukuba |
Keywords: Virtual and Augmented Reality Systems, Medical Informatics
Abstract: It is estimated that 3.5-10 percent of the population are afraid of injections. These people are at risk of vasovagal syncope, a condition in which the patient faints during the injection. To help overcome this response to the perceived pain of injections, this paper proposes a diffuse noxious inhibitory control effect that can be applied through Augmented Reality (AR) smartphone applications. Furthermore, we confirm that presenting ARs that evoke noxious stimulation in participants can influence pain perception. We investigate the pain perception and negative affect when presented with noxious ARs and a distraction AR. The results show that pain thresholds are significantly increased in each AR condition when a painful stimulation is presented to participants. However, no difference in pain reduction between the noxious and distraction ARs is observed. One reason for this result may come from a manipulation problem: exposure to ARs does not make all participants feel pain. For the group of participants who felt pain when exposed to the ARs, the magnitude of pain tended to be smaller in a noxious AR (fire) condition than in the distraction condition. There was no significant increase in negative affect in the fire AR condition compared with the other conditions. These results may imply that the proposed method could reduce pain through diffuse noxious inhibitory control without producing any extra discomfort.
|
|
We-PS20-T6 Special Session, Hawaii 4 |
Design Methods and Human Machine Interaction |
|
|
Organizer: Matta, Nada | University of Technology of Troyes |
|
10:45-11:00, Paper We-PS20-T6.1 |
Intelligent Product Quality Prediction for Highly Customized Complex Production Adopting Ensemble Learning Model (I) |
|
Trappey, Amy | National Tsing Hua University |
Chien, Chun-Hua | National Tsing Hua |
Keywords: Supervisory Control, Information Visualization, Resilience Engineering
Abstract: End-product quality prediction is crucial in smart manufacturing, where reliable evaluation and parameter optimization are essential for ensuring high-quality outputs. This study presents a novel approach that combines adaptive machine learning and nonlinear regression to accurately predict the quality of highly customized end products using limited supply-chain data through digital transformation. The research was conducted in collaboration with a major power transformer manufacturer and its supply chain partners. The adaptive model was trained and validated using real datasets from key components provided by the supply chain, resulting in accurate predictions of end-product quality. The model incorporates the core loss parameter, obtained from the power transformer's key component, as an input dataset for training and testing. The proposed approach, called AdaBoost-Regression, combines adaptive boosting (AdaBoost) and Regression machine learning techniques. Experimental results demonstrate that the AdaBoost-Regression model outperforms simple AdaBoost and Regression models in predicting transformer quality. The model also exhibits superior performance in terms of mean absolute percentage error (MAPE) and root mean square error (RMSE) during real-data verification. This approach has the potential to significantly reduce overall production costs by accurately predicting the quality of complex, expensive, and highly customized industrial products. It can be applied across various industrial sectors.
|
|
11:00-11:15, Paper We-PS20-T6.2 |
Mathematical Modeling of KANSEI Dynamics for Anxiety Based on Allergy Model (I) |
|
Ohno, Kota | Chuo University |
Shoji, Hiroko | Chuo University |
Keywords: Kansei (sense/emotion) Engineering
Abstract: Contemporary society is inundated with a plethora of information, which can induce anxiety in individuals. However, continuous exposure to various types of information generates a tolerance known as habituation. The dynamics of anxiety share similarities with allergic symptoms in terms of immune reactions to foreign substances. Immunotherapy, a well-established treatment for allergies, suppresses allergic symptoms by properly training the immune system. Anxiety tolerance exhibits parallels to allergic phenomena. This study proposes a KANSEI model based on a mathematical model for allergies focused on immunotherapy to explain anxiety dynamics using differential equations. We conducted numerical simulations using input-induced anxiety and provided a phenomenological discussion of the numerical results.
|
|
11:15-11:30, Paper We-PS20-T6.3 |
Design and Visualization of a Knowledge Graph Based on Hematology Data: Management of Anemia in Adults (I) |
|
Despres, Sylvie | Sorbonne Paris Nord University |
Hodroj, Soulaymane | Université Sorbonne Paris Nord, LIMICS, INSERM UMRS 1142 |
Hamadi Piriou, Chiraz | Université Sorbonne Paris Nord, LIMICS, INSERM UMRS 1142 |
Keywords: Design Methods, Interactive Design Science and Engineering, Medical Informatics
Abstract: Several clinical decision support systems (CDSSs) have been developed to help practitioners in their diagnostic procedures in order to achieve the best care. This article presents the construction of a knowledge graph within the framework of a CDSS dedicated to non-hematologist physicians, whose main objective is to identify urgent situations in hematology. First, we recall the basic notions concerning the different approaches to CDSSs. After identifying the skills issues and specifying the need, we describe our first work on the construction of this graph, based on the latest scientific data concerning the management of anemia, and the proposed topology. Finally, we turn to the validation of the graph and its visualization.
|
|
11:30-11:45, Paper We-PS20-T6.4 |
Knowledge Graph and Ontology for Representing CLL Data (I) |
|
Despres, Sylvie | Sorbonne Paris Nord University |
Hodroj, Soulaymane | Université Sorbonne Paris Nord, LIMICS, INSERM UMRS 1142 |
Hamadi Piriou, Chiraz | Université Sorbonne Paris Nord, LIMICS, INSERM UMRS 1142 |
Keywords: Design Methods, Medical Informatics, Interactive Design Science and Engineering
Abstract: This work is part of a project aiming to identify a profile of patients with an indolent form of chronic lymphocytic leukemia (CLL) using the MTS assay as a predictive marker. We describe the first results related to the construction of a knowledge graph representing data from heterogeneous data sources. After identifying the competency questions defining the prediction of the evolution of CLL, we propose a model to represent patient data in a knowledge graph, and we write the first expert rules to predict the disease progression.
|
|
We-PS20-T7 Special Session, Honolulu |
Networking and Decision Making |
|
|
Organizer: Zuo, Yi | Dalian Maritime University |
Organizer: Yada, Katsutoshi | Kansai University |
Organizer: Wang, Hao | Chinese Academy of Sciences |
|
10:45-11:00, Paper We-PS20-T7.1 |
Multi-Objective Optimization of Multi-Product U-Shaped Disassembly Line Balancing Problem Considering Human Factors (I) |
|
Guo, Xiwang | Liaoning Petrochemical University |
Wei, Tingting | Liaoning Petrochemical University |
Wang, Jiacun | Monmouth University |
Shen, Weiming | Huazhong University of Science and Technology |
Shi, Yanjun | Dalian University of Technology |
Qin, Shujin | Shangqiu Normal University |
Keywords: Human Factors
Abstract: The process of recycling and remanufacturing begins with disassembly. Through disassembly, the components with recycling value are decomposed. However, with the rapid development of production automation, designers often ignore the fact that manual operation is flexible but fails to achieve maximum production efficiency and profit. Therefore, the consideration of human factors in disassembly lines holds significant importance. This study delves into the multi-objective optimization of a U-shaped disassembly line balancing problem involving multiple products. A comprehensive objective function is developed, taking into account various factors, including employee fatigue. To address this problem, this study uses a collaborative resource allocation strategy within a multi-objective evolutionary algorithm based on decomposition. By comparing the results of different experimental cases, this paper shows that the proposed algorithm is more competitive than the carnivorous plant algorithm, fruit fly optimization algorithm, and Pareto archiving evolutionary strategy.
|
|
11:00-11:15, Paper We-PS20-T7.2 |
Post-Covid-19 Digital Nomadism: Beyond Work from (Almost) Anywhere (I) |
|
de Almeida, Marcos Antonio | UFRJ |
De Souza, Jano | UFRJ |
Correia, António | UTAD / INESC TEC / University of Kent |
Schneider, Daniel | UFRJ |
Keywords: Human Factors, Cooperative Work in Design, Information Systems for Design and Marketing
Abstract: In this paper, we continue our investigations on digital nomadism and the impact of the COVID-19 pandemic on the work-related aspects and lifestyle of digital nomads (DNs). The findings presented in this empirical study reflect the analysis of the impact of the COVID-19 outbreak (and its waves) on the market economy and work-life boundaries of DNs, as perceived from posts and comments gathered from a Reddit community from early March 2020 until the end of 2022. Our results indicate that the massification of remote work among formal workers in response to the COVID-19 pandemic has impacted both the formal labor market and the DN ecosystem. As a consequence, we argue that digital nomadism tends to play a critical role beyond work from (almost) anywhere (WFA) in a post-COVID-19 era, taking into account the novel facets of the nomadic work-lifestyle.
|
|
11:15-11:30, Paper We-PS20-T7.3 |
A Quantitative Analysis of Noise Impact on Document Ranking (I) |
|
Giamphy, Edward | Preligens and La Rochelle Université |
Sanchis, Kevin | Preligens |
Dashyan, Gohar | Preligens |
Guillaume, Jean-Loup | La Rochelle University |
Hamdi, Ahmed | University of La Rochelle |
Sanselme, Lilian | Preligens |
Doucet, Antoine | La Rochelle Université |
Keywords: Information Systems for Design, Networking and Decision-Making, Human-Computer Interaction
Abstract: After decades of massive digitization, a substantial number of documents exist in digital form. The accessibility of these documents is strongly impacted by the quality of document indexing. Most of these documents are indexed in noisy versions that include numerous errors. The noise can be due to manual input mistakes or the optical character recognition process and results in errors such as spelling mistakes and missing characters. This paper presents a study of the impact of noise on document ranking, an essential task in natural language processing (NLP) with wide-ranging practical applications. We provide a deep and quantitative analysis of the impact of recognition errors on document ranking by testing two popular ranking models on several noisy versions of a subset of the MS MARCO passage ranking dataset, with various levels and types of noise. Our study provides insights into the challenges of document ranking under noisy conditions and advocates for developing ranking models that are more robust to noise.
|
|
11:30-11:45, Paper We-PS20-T7.4 |
Combining Embedding-Based and Semantic-Based Models for Post-Hoc Explanations in Recommender Systems (I) |
|
Le, Ngoc Luyen | Université De Technologie De Compiègne |
Abel, Marie-Hélène | Sorbonne Universités, Université De Technologie De Compiègne, CN |
Gouspillou, Philippe | Vivocaz, 8 B Rue De La Gare, 002200, Mercin-Et-Vaux, France |
Keywords: Information Systems for Design and Marketing, Networking and Decision-Making, Information Visualization
Abstract: In today’s data-rich environment, recommender systems play a crucial role in decision support systems. They provide users with personalized recommendations and explanations of these recommendations. Embedding-based models, despite their widespread use, often suffer from a lack of interpretability, which can undermine trust and user engagement. This paper presents an approach that combines embedding-based and semantic-based models to generate post-hoc explanations in recommender systems, leveraging ontology-based knowledge graphs to improve interpretability and explainability. By organizing data within a structured framework, ontologies enable the modeling of intricate relationships between entities, which is essential for generating explanations. By combining embedding-based and semantic-based models for post-hoc explanations in recommender systems, the framework we define aims to produce meaningful and easy-to-understand explanations, enhancing user trust and satisfaction, and potentially promoting the adoption of recommender systems across the e-commerce sector.
|
|
11:45-12:00, Paper We-PS20-T7.5 |
CORec-Cri: How Collaborative and Social Technologies Can Help to Contextualize Crises? (I) |
|
Le, Ngoc Luyen | Université De Technologie De Compiègne |
Zhong, Jinfeng | Paris-Dauphine University, PSL Research University, CNRS UMR 724 |
Negre, Elsa | Paris-Dauphine University, PSL Research University |
Abel, Marie-Hélène | Sorbonne Universités, Université De Technologie De Compiègne, CN |
Keywords: Cooperative Work in Design, Networking and Decision-Making, Interactive Design Science and Engineering
Abstract: Crisis situations can present complex and multifaceted challenges, often requiring the involvement of multiple organizations and stakeholders with varying areas of expertise, responsibilities, and resources. Acquiring accurate and timely information about impacted areas is crucial to effectively respond to these crises. In this paper, we investigate how collaborative and social technologies help to contextualize crises, including identifying impacted areas and real-time needs. To this end, we define CORec-Cri (Contextualized Ontology-based Recommender system for crisis management) based on existing work. Our motivation for this approach is two-fold: first, effective collaboration among stakeholders is essential for efficient and coordinated crisis response; second, social computing facilitates interaction, information flow, and collaboration among stakeholders. We detail the key components of our system design, highlighting its potential to support decision-making, resource allocation, and communication among stakeholders. Finally, we provide examples of how our system can be applied to contextualize crises to improve crisis management.
|
|
We-PS20-T8 Regular Session, Kahuku |
Design Methods II |
|
|
|
10:45-11:00, Paper We-PS20-T8.1 |
Estimating Finger Joint Angles with Wearable System Based on Machine Learning Model Utilizing 3D Computer Graphics |
|
Obinata, Taichi | University of Tsukuba |
Yoshikawa, Dan | University of Tsukuba |
Uehara, Akira | University of Tsukuba |
Kawamoto, Hiroaki | University of Tsukuba |
Keywords: Human-Computer Interaction, Assistive Technology, Design Methods
Abstract: Robotic rehabilitation for paralyzed hands utilizes exoskeletons and soft gloves equipped with active mechanisms to provide support for hand motion. From a safety and control perspective, it is imperative to measure finger joint angles during motion support provided by a soft robotic wearable system. However, embedding sensors into these devices can be inconvenient as it may lead to bulkiness or structural difficulties. This study aims to develop a machine learning model for estimating finger joint angles from images utilizing data created with computer graphics (CG), and to validate the feasibility of this method through basic experiments. The three-dimensional CG (3DCG) hand model includes bones corresponding to the major joints of the fingers, and a wearable system of the index finger imported from a computer-aided design software was attached to the 3DCG hand model. After rendering the integrated motion between the hand model and the wearable system for finger flexion and extension, images in conjunction with the finger joint angles were recorded as the training data. The finger joint angle estimator was based on a pre-trained vision transformer model and was learned using the created 3DCG training dataset. The basic experiments showed that the developed machine learning model enabled the estimation of finger joint angles with reproducibility for four types of hand postures with the wearable system. Furthermore, the differences between the joint angles measured in the real world and those estimated by the developed model were smaller than those for an existing hand pose estimation model. The developed machine learning model, which utilizes 3DCG, has the potential to estimate finger joint angles with the wearable system using image videos.
|
|
11:00-11:15, Paper We-PS20-T8.2 |
A Novel KL Divergence Optimization Method for Aligning Neural Population Patterns During Task Learning |
|
Song, Zhiwei | The Hong Kong University of Science and Technology |
Zhang, Xiang | The Hong Kong University of Science and Technology |
Wang, Yiwen | Hong Kong University of Science and Technology |
Keywords: Brain-Computer Interfaces, Design Methods, Information Visualization
Abstract: Numerous studies suggest that learning related but different tasks prior to a new task makes it easier, possibly because of our brain's neural pattern alignment mechanism. Specifically, the neural patterns in the new task align with those in the learned task, enabling the reuse of knowledge from the previous task to aid learning in the new task. A brain-machine interface (BMI) is an excellent tool for analyzing the dynamics of neural population patterns during new task learning by directly recording neural signals from the brain. If we can repeat the neural pattern alignment process through an alignment algorithm using the recorded neural signals, it would provide a computational tool to help us understand the brain mechanism during task learning. Additionally, the pre-trained decoder parameters from the old task can be reused to expedite learning in the new task. However, the existing Iterative Closest Point (ICP) method easily fails as it is sensitive to neural data distribution. This paper proposes a pair-wise Kullback-Leibler (KL) divergence optimizing framework for stable neural pattern alignment. The KL divergence measures the difference between the data distribution of the previous task and the aligned new task. The alignment process is formulated as an optimization problem by minimizing the KL divergence. The proposed algorithm is tested in a simulated experiment where a rat learns a two-lever discrimination task from a one-lever pressing task. Three scenarios are designed to test the feasibility of our algorithm, including non-Gaussian neural pattern shapes, noisy neural data, and different alignment angles. The results demonstrate that the proposed method is more robust than ICP, indicating its potential to discover the brain's alignment mechanism more accurately.
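Editorial illustration (not the authors' algorithm): the idea of aligning a new-task point cloud to an old-task one by minimizing a KL divergence can be sketched as follows. This minimal sketch assumes 2D neural patterns, a pure rotation as the alignment transform, Gaussian fits to both clouds, and a grid search over angles. All of these are simplifying assumptions, and the function names `gaussian_kl`, `rotation`, and `align_by_kl` are hypothetical. The abstract explicitly targets non-Gaussian patterns, which this sketch does not handle.

```python
import numpy as np


def gaussian_kl(mu0, cov0, mu1, cov1):
    """Closed-form KL(N(mu0, cov0) || N(mu1, cov1)) for multivariate Gaussians."""
    k = mu0.shape[0]
    inv1 = np.linalg.inv(cov1)
    diff = mu1 - mu0
    _, logdet0 = np.linalg.slogdet(cov0)
    _, logdet1 = np.linalg.slogdet(cov1)
    return 0.5 * (np.trace(inv1 @ cov0) + diff @ inv1 @ diff - k + logdet1 - logdet0)


def rotation(theta):
    """2D counter-clockwise rotation matrix."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])


def align_by_kl(old, new, n_angles=360):
    """Grid-search the rotation of `new` that minimizes KL(rotated new || old).

    Both clouds are summarized by a Gaussian fit (an assumption of this sketch).
    Returns the best angle in radians and the KL value reached.
    """
    mu_old = old.mean(axis=0)
    cov_old = np.cov(old, rowvar=False)
    best_theta, best_kl = 0.0, np.inf
    for theta in np.linspace(-np.pi, np.pi, n_angles, endpoint=False):
        rotated = new @ rotation(theta).T
        kl = gaussian_kl(rotated.mean(axis=0), np.cov(rotated, rowvar=False),
                         mu_old, cov_old)
        if kl < best_kl:
            best_theta, best_kl = theta, kl
    return best_theta, best_kl
```

A gradient-based optimizer or a non-parametric density estimate could replace the grid search and the Gaussian fit; the grid is used here only because it is easy to verify.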
|
|
We-PS20-T9 Regular Session, Oahu |
Systems Safety and Security |
|
|
|
11:15-11:30, Paper We-PS20-T9.3 |
A Static Multi-Class Malicious Office Document Detection Method Via Multi-Feature Fusion |
|
Chen, Jia | Beihang University |
Hu, Yang | Beihang University |
Luo, Xin | Chinese Academy of Sciences |
Keywords: Systems Safety and Security
Abstract: Microsoft Office documents have become hackers’ preferred tool to construct malicious documents. However, current research on detecting malicious Office documents has not covered all document formats and various types of malicious attacks. To address this issue, this paper proposes a Static Multi-class Malicious Office document Detection Method (SM2ODM) for multiple versions of Office documents. The focus of this research is to design a unified static feature representation method for multiple versions of Office documents via multi-feature fusion, including VBA (Visual Basic for Applications) code keywords, DDE (Dynamic Data Exchange) instructions, embedded files, OLE (Object Linking and Embedding) objects, external links, and other relevant features. In addition, this research identifies eight new types of malicious features and embedding locations. Then, this paper proposes a multi-class detection method for malicious Office documents that can detect five common types of malicious documents. Through analyzing 20,000 samples provided by Topsec Technologies Group, the proposed SM2ODM achieves high accuracy in multi-classification detection and identifies 185 malicious Office samples that common antivirus software failed to detect.
|
|
We-PS20-T10 Regular Session, Hawaii 5 |
Affective Computing I |
|
|
|
10:45-11:00, Paper We-PS20-T10.1 |
Leveraging Task-Specific Context to Improve Unsupervised Adaptation for Myoelectric Control |
|
Eddy, Ethan | University of New Brunswick |
Campbell, Evan | University of New Brunswick |
Bateman, Scott | University of New Brunswick |
Scheme, Erik | University of New Brunswick |
Keywords: Human-Computer Interaction, Human-Machine Interaction, Human-Machine Interface
Abstract: While there has been renewed interest in the use of myoelectric control for general-purpose applications, the burden of training and maintaining robust models still limits its real-world viability. Online unsupervised adaptation has been proposed to solve this issue by updating the model using predicted pseudo-labels in real time during regular device use. Until now, however, these unsupervised strategies have been limited as they rely on the very classifier outputs they are adapting, making them ill-suited when there is a drastic shift in the input space (e.g., after donning and doffing a device) or there is insufficient training data. In such situations, leveraging context (i.e., task-specific information that can help understand or assess a circumstance) could provide additional guidance for adaptation and improve its robustness. Although difficult to extract in traditional prosthesis control use cases without additional sensors, context may be more readily available in other general-purpose applications, such as in human-computer interaction. In this study, we explore leveraging context, both positive (i.e., reinforcing correct actions) and negative (i.e., correcting poor actions), for conditioning pseudo-label predictions within an adaptive gamified target acquisition setting. The results show that leveraging this additional context significantly outperforms the current state-of-the-art high-confidence unsupervised adaptation (p<0.05) using both offline and online performance metrics. This pilot work contributes novel findings and contextual approaches that do not rely on additional sensors, and thus outlines a promising direction of study for myoelectric control as a reliable and effective interaction technique.
|
|
11:00-11:15, Paper We-PS20-T10.2 |
Metamorphopsia Inspection System Based on Relevance Feedback |
|
Zhu, Zhenyang | University of Yamanashi |
Moritake, Katsuhito | University of Yamanashi |
Kashiwagi, Kenji | University of Yamanashi |
Toyoura, Masahiro | University of Yamanashi |
Go, Kentaro | University of Yamanashi |
Fujishiro, Issei | Keio University |
Mao, Xiaoyang | University of Yamanashi |
Keywords: Assistive Technology, Human-Computer Interaction
Abstract: People with metamorphopsia suffer from perceiving things in a distorted way. Various methods for examining metamorphopsia have been suggested in the current literature, with the most advanced techniques demonstrating the ability to yield quantitative measurements. However, these cutting-edge methods necessitate extended examination durations and impose challenging manipulations on patients. In this study, our objective is to enhance the time efficiency of the inspection process and alleviate the burden placed on the user. We propose a novel user-friendly quantitative inspection system which utilizes interactive reinforcement learning. Instead of having users directly operate the system, we ask them to evaluate the stimuli generated by the system. Based on their evaluations, the system gradually refines the deformation map representing the distortion perceived by the user. The reinforcement learning scheme is implemented using a relevance feedback approach based on an optimum-path forest classifier. To evaluate the effectiveness of the proposed system, subjective evaluation experiments involving simulated and real metamorphopsia participants were conducted in this study. The experimental findings reveal that, when compared to the state-of-the-art method, our proposed system yields comparable inspection outcomes while significantly reducing both the inspection duration and the mental workload.
|
|
11:15-11:30, Paper We-PS20-T10.3 |
Real-Time Learning of Driving Gap Preference for Personalized Adaptive Cruise Control |
|
Zhao, Zhouqiao | UC, Riverside |
Liao, Xishun | UC, Riverside |
Abdelraouf, Amr | Toyota Motor North America R&D |
Han, Kyungtae | Toyota Motor North America |
Gupta, Rohit | Toyota Motor Engineering & Manufacturing North America, Inc |
Wu, Guoyuan | Center for Environmental Research and Technology, UC, Riverside |
Barth, Matthew | Center for Environmental Research and Technology, UC, Riverside |
Keywords: Human-Centered Transportation, Human-centered Learning, Human-Machine Interaction
Abstract: Advanced Driver Assistance Systems (ADAS) are increasingly important in improving driving safety and comfort, with Adaptive Cruise Control (ACC) being one of the most widely used. However, pre-defined ACC settings may not always align with drivers' preferences and habits, leading to discomfort and potential safety issues. Personalized ACC (P-ACC) has been proposed to address this problem, but most existing research uses historical driving data to imitate behaviors that conform to driver preferences, neglecting real-time driver feedback. To bridge this gap, we propose a cloud-vehicle collaborative P-ACC framework that incorporates driver feedback adaptation in real time. The framework is divided into offline and online parts. The offline component records the driver's naturalistic car-following trajectory and uses inverse reinforcement learning (IRL) to train the model on the cloud. In the online component, driver feedback is used to update the driving gap preference in real time. The model is then retrained on the cloud with the driver's takeover trajectories, achieving incremental learning to better match the driver's preference. Human-in-the-loop (HuiL) simulation experiments demonstrate that our proposed method significantly reduces driver intervention in automatic control systems by up to 62.8%. By incorporating real-time driver feedback, our approach enhances the comfort and safety of P-ACC, providing a personalized and adaptable driving experience.
|
|
11:30-11:45, Paper We-PS20-T10.4 |
Impact of QOL-Based Robot Counseling on Older Adults' QOL Improvement |
|
Nakagawa, Satoshi | The University of Tokyo |
Naruse, Kana | The University of Tokyo |
Endo, Ryoga | The University of Tokyo |
Kuniyoshi, Yasuo | The University of Tokyo |
Keywords: Companion Technology, Assistive Technology, Affective Computing
Abstract: This study aimed to develop a counseling robot to improve the quality of life (QOL) of older adults and verify its effectiveness. QOL is a comprehensive indicator that includes physical, mental, and social aspects, and gerontechnology aims to improve the autonomy and QOL of older adults. In recent years, utilizing robots to address the shortage of caregivers has been increasingly studied. We developed a counseling robot that estimates the QOL of older adults in real time, and generates appropriate and empathetic responses according to the estimated QOL. The results obtained from a one-week interaction experiment with a counseling robot targeted toward older adults indicated a significant improvement in the mental aspect of QOL, and the use of cognitive behavioral therapy and empathetic responses was inferred to facilitate self-disclosure and enhance the effectiveness of counseling. Additionally, advice based on the QOL estimation results contributed to organizing the thoughts of older adults and further improved their mental health. This study not only contributes to improving the QOL of older adults but also suggests that robots that understand and appropriately respond to individuals can facilitate continuous relationship building. This study may also serve as a guide for promoting the introduction of information and communication technology into welfare facilities.
|
|
11:45-12:00, Paper We-PS20-T10.5 |
Adjusted Attention YOLOX-Based Far-Distance Face-Recognition |
|
Hwang, Chih-Lyang | National Taiwan University of Science and Technology |
Cheng, Zih-En | National Taiwan University of Science and Technology |
Keywords: Affective Computing, Human-Machine Interaction, Intelligence Interaction
Abstract: To satisfy the required recognition accuracy from a far distance, an adjusted attention-based YOLOX for face recognition (AA-YOLOX-FR) is designed by an appropriate segmentation of the original image, so that faces’ pixels (e.g., 20x20 at 15 m) can be used effectively for training, validation, and testing. Based on an edge computing platform (e.g., NVIDIA Jetson-AGX), the processing time for the image with 8 segmentations of 640x640 equals 296.3 ms in comparison to 95.4 ms for its down-sampling to 640x640. Although the down-sampling technique achieves a faster processing speed, its recognition rate decreases when a face is at a far distance or under different lighting conditions. The average online video-based recognition rate of AA-YOLOX-FR for a distance from 10 m to 15 m and different lighting conditions is 92.5%.
|
|
We-PS20-T11 Regular Session, Hawaii 6 |
Assistive Technology |
|
|
|
10:45-11:00, Paper We-PS20-T11.1 |
A Study on a Sensory Feedback Armband Providing Sensory Information of Gripping Force and Finger Posture for Patients with Hand Paralysis |
|
Yoshikawa, Dan | University of Tsukuba |
Kawamoto, Hiroaki | University of Tsukuba |
Sankai, Yoshiyuki | University of Tsukuba |
Keywords: Assistive Technology, Wearable Computing, Haptic Systems
Abstract: Patients with hand paralysis have motor and sensory dysfunction in their hands. They can neither flex/extend their fingers nor feel gripping force or finger posture. To allow patients with hand paralysis to perform gripping motions, it is necessary to assist finger motion and provide sensory information about the gripping force and posture at the residual sensory area. We previously developed a wearable finger-motion assist system that assists finger motion and measures the gripping force necessary to present sensory information. Therefore, the purpose of this study is to propose and develop a wearable system that provides multiple types of sensory information based on gripping force and finger posture estimated by the finger-motion assist system, and to confirm the basic performance of the system through experiments. We developed a band-type sensory feedback system that tightens the forearm in response to gripping force using tendon-driven wires. A vibrator is installed in it, and the intensity of vibration changes in response to finger posture. As basic experiments, identification of gripping force and finger posture using band squeezing and vibration was tested. The results showed that the overall identification rate for trials in which both gripping force and finger posture were identified correctly was 75.6%. The identification rates for gripping force and finger posture when stimulations were presented simultaneously were 93.3% and 80.9%, respectively. In conclusion, we confirmed the feasibility of this system that presents multiple types of sensory information.
|
|
11:00-11:15, Paper We-PS20-T11.2 |
Development of a Balance Training Device That Can Apply Disturbance to the Ankle Using Pneumatic Gel Muscles |
|
Isoshima, Keigo | Hiroshima University |
Kurita, Yuichi | Hiroshima University |
Hirata, Kazuhiko | Hiroshima University Hospital |
Kimura, Hiroaki | Seiwakai Medical Corporation Association |
Keywords: Assistive Technology
Abstract: Although the average life expectancy in Japan has been increasing in recent years, the large gap between healthy life expectancy and average life expectancy remains unresolved. Among the factors that lead to the need for nursing care, injuries due to falls account for a certain percentage of the total. In this paper, we developed boots that can apply external disturbance to the ankle with pneumatic gel muscles (PGM). The device is smaller and lighter than conventional devices. We conducted an experiment using electromyographic potentials (EMG) and center of foot pressure (COP) as evaluation indices to evaluate the effectiveness of fall prevention training using this device, and we report on its usefulness.
|
|
11:15-11:30, Paper We-PS20-T11.3 |
Prediction of Gait Speed from Acceleration Based on Long Short-Term Memory |
|
Kambashi, Shuhei | Tokyo Denki University |
Inoue, Jun | Tokyo Denki University |
Keywords: Assistive Technology, Human Performance Modeling
Abstract: The authors have developed a cane gait-training machine that enables stroke and paraplegic patients to safely rehabilitate on their own. This training machine is pulled by a wire connected to a harness on the patient’s waist; thus, the training machine can follow the patient without the use of hands. However, the machine’s ability to follow the walker is an issue. Therefore, we motorised the casters and set them to follow the patient’s movements. To cope with the transmission, processing, and mechanical delays that occur in this process, and for predictive control, we used long short-term memory, a machine learning method, which predicted the future waist gait speed from the acceleration measured at multiple sites on the body. In this study, we examined the effects on prediction error of varying the combination of acceleration measurement sites used for learning and the prediction horizon, which is the target time to be predicted. Prediction errors under certain conditions enabled the prediction of each subject’s average gait speed with an accuracy of 3–5%. Overall, the prediction error increased with longer prediction horizon but temporarily decreased at 0.4 s. Although prediction is possible using only one site on the lower body, we believe that prediction using multiple sites will reduce the error due to noise.
|
|
11:45-12:00, Paper We-PS20-T11.5 |
Hand Gesture Classification Model for Intelligent Wheelchair with Improved Gesture Variance Compensation |
|
Bandara, H.M. Ravindu T. | University of Moratuwa |
Priyanayana, Kodikarage Sahan | University of Moratuwa |
Rajendran, Hoshalarajh | University of Moratuwa |
Pathirana, Chandima | University of Moratuwa |
Jayasekara, Buddhika | University of Moratuwa |
Keywords: Intelligence Interaction, Assistive Technology
Abstract: The rapid increase in the elderly and disabled population has been identified as a growing socioeconomic problem. Due to reasons such as a lack of reliable caretakers and the need to empower the elderly and disabled population, it is important to have assistive devices. The interactive capabilities of these devices should match the nature of the interaction that prospective users would have with their companions. Humans communicate with each other in many modalities, such as speech, hand gestures, head gestures, gaze, etc. Hand gestures have been a popular modality that has been used in these interactive devices for speech and mobility-impaired wheelchair users. There have been many gesture models that have been developed recently for hand gesture-controlled wheelchair navigation. Natural hand gestures that are used in human-human interactions include both static and dynamic gestures. Therefore, it was logical to include both of these gestures in a navigational gesture model or hand gesture-controlled navigational system. However, hand tremors that are prevalent among the elderly and disabled community could affect the nature of the hand gesture. These tremors can vary from person to person, and hence the fixed ranges cannot be used for hand features. Hand features such as palm velocity, fingertip velocity, and others will have different ranges from person to person. Due to these reasons, a static hand gesture intended by the human user could be identified as a dynamic gesture. Further, this could lead to the misrecognition of gestures defined in gesture models. Therefore, a system is proposed in this paper to validate the gestures by considering the activity of hand features in 3D regions defined for the gesture. The accuracies of the improved system for static and dynamic gestures were 0.9849 and 0.9840, which were improvements from the accuracies of 0.8994 and 0.8479.
|
|
We-PS20-T13 Regular Session, Kona |
Brain-Computer Interfaces II |
|
|
|
10:45-11:00, Paper We-PS20-T13.1 |
Distance Metric-Based Classification Comparisons for a Brain-Computer Interface Authentication |
|
Lewis, Tyree | University of South Florida |
Agarwal, Rupal | University of South Florida |
Andujar, Marvin | University of South Florida |
Keywords: Brain-Computer Interfaces, Human-Machine Interaction, Biometrics and Applications
Abstract: The rise of security concerns has spurred ongoing research into Brain-Computer Interfaces (BCI) based authentication. These applications utilize electroencephalogram (EEG) signals, due to their properties that can enhance security systems. In previous studies, EEG data has been incorporated into various authentication systems to compare the performance of new and existing classification methods. However, using EEG data to compare the performance of distance metrics in a P300-based BCI authentication system has not been explored yet. In this study, EEG data is used to determine the most effective distance metric for authenticating users in a closed-loop system. To accomplish this task, we conducted a longitudinal study to evaluate three distance metrics (Cosine, Correlation and Chebyshev) while participants interacted with our BCI authentication system. Our results indicated that the Cosine similarity outperformed all other distance metrics for each user.
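Editorial illustration: the three distance metrics named in the abstract have standard definitions, sketched below with NumPy. The `authenticate` helper, the idea of a stored template vector, and the threshold are hypothetical assumptions added for illustration; the abstract does not describe how the authors' system makes its accept/reject decision.

```python
import numpy as np


def cosine_distance(u, v):
    """1 minus the cosine of the angle between u and v."""
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))


def correlation_distance(u, v):
    """Cosine distance between the mean-centered vectors (1 minus Pearson r)."""
    return cosine_distance(u - u.mean(), v - v.mean())


def chebyshev_distance(u, v):
    """Largest absolute coordinate-wise difference."""
    return np.max(np.abs(u - v))


def authenticate(template, probe, metric, threshold):
    """Hypothetical decision rule: accept when the probe is close enough to the template."""
    return metric(template, probe) <= threshold
```

Cosine and correlation distances ignore overall amplitude scaling, whereas Chebyshev distance does not, which is one reason these metrics can rank the same pair of feature vectors differently.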
|
|
11:00-11:15, Paper We-PS20-T13.2 |
Feature Engineering for an Efficient Motor Related ECoG BCI System |
|
Jain, Ritwik | IIT Kharagpur |
Jaiman, Prakhar | BITS Pilani, Goa |
Baths, Veeky | BITS Pilani, K.K. Birla Goa Campus |
Keywords: Brain-Computer Interfaces
Abstract: Invasive Brain Computer Interface (BCI) systems through Electrocorticographic (ECoG) signals require efficient recognition of spatiotemporal patterns from a multi-electrode sensor array. Such signals are excellent candidates for automated pattern recognition through machine learning algorithms. The importance of these patterns can be highlighted through feature extraction techniques. However, the signal variability due to non-stationarity is ignored while extracting features, and which features to use can be challenging to figure out by visual inspection. In this study, we introduce the signal split parameter to account for the variability of the signal and increase the accuracy of the machine learning classifier. We use genetic selection, which allows the selection of the optimal combination of features from a pool of 8 different feature sets. Genetic selection of features increases accuracy and reduces the BCI’s prediction time. Along with genetic selection, we also use a reduced signal length, which leads to a higher Information Transfer Rate. Thus, this approach enables the design of a fast and accurate motor-related ECoG BCI system.
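Editorial illustration: the link between signal length and Information Transfer Rate can be made concrete with the widely used Wolpaw definition of ITR. The abstract does not say which ITR definition the authors use, so the choice of formula is an assumption, and the function names are hypothetical.

```python
import math


def itr_bits_per_trial(n_classes, accuracy):
    """Wolpaw ITR in bits per trial for n_classes equiprobable targets."""
    if n_classes < 2 or not 0.0 <= accuracy <= 1.0:
        raise ValueError("need n_classes >= 2 and 0 <= accuracy <= 1")
    bits = math.log2(n_classes)
    # Each term is skipped at its limit, where p * log2(p) tends to 0.
    if accuracy > 0.0:
        bits += accuracy * math.log2(accuracy)
    if accuracy < 1.0:
        bits += (1.0 - accuracy) * math.log2((1.0 - accuracy) / (n_classes - 1))
    return bits


def itr_bits_per_minute(n_classes, accuracy, trial_seconds):
    """Scale bits per trial by the number of trials that fit in one minute."""
    return itr_bits_per_trial(n_classes, accuracy) * 60.0 / trial_seconds
```

At a fixed accuracy, halving the trial duration doubles the rate in bits per minute, so a shorter signal window raises ITR as long as accuracy does not fall too far.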
|
|
11:15-11:30, Paper We-PS20-T13.3 |
Barrier Certificates for a Computational Model of Epileptic Seizures |
|
Ingham, John Frank | Newcastle University |
Wang, Yujiang | Newcastle University |
Zuliani, Paolo | Newcastle University |
Soudjani, Sadegh | Newcastle University |
Keywords: Brain-Computer Interfaces
Abstract: The concept of barrier certificate has been developed recently in control theory to give formal guarantees on safety of a dynamical system. Neural mass models (NMMs) simulate the aggregated activity of neurons in the brain and have been used to model phenomena such as epilepsy. With a view to move towards novel treatments for epilepsy by investigating the application of control theory to epilepsy, we take one such NMM, the Wilson-Cowan (WC) model, and show that it is possible to automatically generate barrier certificates in both deterministic and non-deterministic cases, where the parameters of the model belong to an uncertainty set.
|
|
11:30-11:45, Paper We-PS20-T13.4 |
Latent State Synchronization in Dyadic Partners Using EEG |
|
Gordon, Stephen | DCS Corporation |
King, Kevin | DCS Corp |
Rabin, Ashley | DCS Corporation |
Keywords: Brain-Computer Interfaces, Human Performance Modeling, Multi-User Interaction
Abstract: Emerging research has shown that interbrain synchronization can occur when individuals interact on a shared task or receive shared stimuli. While the mechanisms producing such effects are not fully understood, interbrain synchronization has been observed in multiple contexts. Analyses of such synchrony, however, are often limited to individual, bottom-up features such as activity at individual cortical sites and frequency bands, or cross-projections of the data into a common space that maximizes correlation. It is an open question whether synchronization can also be observed using more top-down notions of latent state composed of multiple, simultaneous features that are not fit to the individual or task at hand. Here, we investigate whether latent state estimation methods, borrowed from the domain generalization toolboxes of the brain-computer interface community, can be used to assess interbrain synchrony. Domain generalization methods learn models of neural activity that transfer from one domain to another without the need for participant- or task-specific training data. We use data from single individuals in a controlled task to train a domain-generalized model to detect latent state changes previously described as relating to alertness or vigilance. We apply the model to a novel target domain in which teams of two worked together to locate and defuse simulated improvised explosive devices. Our results revealed significantly greater correlation between the latent states of dyadic partners compared to randomly paired individuals, establishing that interbrain synchronization can be observed using top-down methods that are not fit to the individuals, or task, in question.
|
|
11:45-12:00, Paper We-PS20-T13.5 |
Exploring Hierarchical Changes in Functional Brain Network Hubs through Brain-Activity Prediction with Convolutional Neural Networks |
|
Kawasaki, Haruka | Ochanomizu University |
Nishida, Satoshi | National Institute of Information and Communications Technology |
Kobayashi, Ichiro | Ochanomizu University |
Keywords: Brain-based Information Communications
Abstract: This study aims to clarify how functional network hubs change during hierarchical visual processing in the human brain through the estimation of brain states from features extracted using a convolutional neural network (CNN), a hierarchical model of image processing. We used representational similarity analysis for brain states predicted through encoding models based on feature representations at each layer of the CNN, and applied the PageRank algorithm to matrices converted from the generated representational dissimilarity matrices to capture the hub characteristics of brain region-related systems. This succeeded in capturing changes in the hubness of interregional brain coordination during hierarchical information processing in the human cerebral cortex in visual processing. Specifically, we found that the hubness of the occipital visual cortex increased in the early phase of visual processing, and that the hubness of the prefrontal cortex and temporal lobe increased in the late phase of visual processing. From the above, we found that our proposed method allows us to capture hierarchical changes in the hubness of interregional coordination.
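Editorial illustration: applying PageRank to a matrix derived from a representational dissimilarity matrix (RDM) can be sketched with a plain power iteration. The abstract does not specify how the RDM is converted, so the conversion used here (similarity = maximum dissimilarity minus dissimilarity, with a zeroed diagonal) is an assumption, as are the function name `pagerank_from_rdm` and the damping factor of 0.85.

```python
import numpy as np


def pagerank_from_rdm(rdm, damping=0.85, tol=1e-10, max_iter=1000):
    """Hub scores for brain regions from a square, symmetric RDM.

    Assumed conversion: similarity = max(rdm) - rdm, self-links removed.
    """
    rdm = np.asarray(rdm, dtype=float)
    n = rdm.shape[0]
    sim = rdm.max() - rdm
    np.fill_diagonal(sim, 0.0)

    # Column-normalize so each column is a probability distribution;
    # a column with no outgoing weight falls back to a uniform jump.
    col_sums = sim.sum(axis=0)
    has_links = col_sums > 0
    transition = np.where(has_links, sim / np.where(has_links, col_sums, 1.0), 1.0 / n)

    rank = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        new_rank = damping * transition @ rank + (1.0 - damping) / n
        if np.abs(new_rank - rank).sum() < tol:
            return new_rank
        rank = new_rank
    return rank
```

Regions whose predicted representations are similar to many other regions receive higher scores, which is one way to read "hubness" in this setting.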
|
|
We-PS30T1 Workshop Session, Puna |
Representations of Neural Data in Brain-Machine Interfaces |
|
|
Organizer: Putze, Felix | University of Bremen |
Organizer: Herff, Christian | Maastricht University |
Organizer: Vortmann, Lisa-Marie | University of California, San Diego |
Organizer: Krusienski, Dean | Virginia Commonwealth University |
|
13:00-13:15, Paper We-PS30T1.1 |
On the Use of FOOOF for Electroencephalography Quality Measurement and Device Assessment (I) |
|
Tiwari, Abhishek | Myant Inc |
Wu, Gloria | Myant Inc |
Innanen, Katrina | Myant Inc |
Mahnam, Amin | Myant Inc |
Moineau, Bastien | Myant Inc |
Falk, Tiago H. | INRS-EMT |
Keywords: Other Neurotechnology and Brain-Related Topics, BMI Emerging Applications, Active BMIs
Abstract: Electroencephalography (EEG) signals capture the electrical activity of the brain and have traditionally been used in clinical settings. More recently, with the development of mobile, wearable EEG devices, their use has been explored for other applications, including emotion recognition, quality of experience monitoring, or fatigue detection, just to name a few. EEG signals are known to have a 1/f-noise-like structure and very low amplitude, thus making them highly susceptible to artefacts, such as power line interference, muscle movement, and eye blinks. Moreover, prototyping and development of new devices to record EEG signals may introduce additional sources of artefacts generated by different instrumentation settings. As such, automated quantification of the quality of the EEG signals has become important and the focus of recent research. Here, we propose a new quality metric based on the 1/f-noise structure of the EEG signal. Experimental results show the proposed metric classifying clean versus noisy EEG segments in a subject-independent setting with an accuracy of 86.0% for the AF7 electrode location and 64.6% for AF8, two electrode locations known to be highly degraded by artefacts. Additionally, the proposed metrics are shown to generalize well to unseen electrode locations. For example, a quality model trained on AF7 noisy EEG data achieved an accuracy of 61.4% when tested on data collected from the AF8 location.
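Editorial illustration: the 1/f structure mentioned in the abstract means that power falls off roughly as a power law in frequency, which appears as a straight line on log-log axes. The sketch below estimates the aperiodic exponent with an ordinary least-squares line fit and uses the fit residual as a crude deviation score. This is a simplification and not the FOOOF algorithm, which also models periodic peaks, and it is not the authors' quality metric. The function names are hypothetical.

```python
import numpy as np


def aperiodic_fit(freqs, power):
    """Fit log10(power) = offset - exponent * log10(freqs) by least squares.

    Returns (exponent, offset).
    """
    slope, intercept = np.polyfit(np.log10(freqs), np.log10(power), 1)
    return -slope, intercept


def fit_residual(freqs, power):
    """Mean absolute deviation of the log-power spectrum from its 1/f line fit."""
    exponent, offset = aperiodic_fit(freqs, power)
    predicted = offset - exponent * np.log10(freqs)
    return float(np.mean(np.abs(np.log10(power) - predicted)))
```

A spectrum that follows a clean power law gives a residual near zero, while a narrowband artefact such as power line interference pulls the spectrum away from the fitted line and raises the residual.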
|
|
13:15-13:30, Paper We-PS30T1.2 | Add to My Program |
Analyzing the Importance of EEG Channels for Internal and External Attention Detection (I) |
|
Putze, Felix | University of Bremen |
Eilts, Hendrik | University of Bremen |
Keywords: Passive BMIs, BMI Emerging Applications
Abstract: For Brain-Computer Interfaces to be affordable and efficient, it is worth analyzing the importance of individual EEG channels and finding the smallest subset that still provides adequate results. In this work, we applied five different feature importance approaches to three different datasets focused on internally and externally directed attention. The methods used were: Random Forest Importance, Mutual Information, Permutation Importance, Shapley Additive Explanations, and an Ablation Study. We determined the importance of EEG channels, compared the results among the algorithms through correlation analysis, and evaluated the classification performance using different subsets of channels to validate the importance rankings. The results indicate that, in line with the existing literature, electrodes located on the right parietal cortex with the alpha frequency band appear to be the most important, followed by several channels covering the left parietal and frontal lobes. For the first dataset, we were able to reduce the number of channels from 32 to 2 while increasing the accuracy from approximately 60% to 62%. In the second dataset, we reduced the number of channels from 12 to 1 while improving the accuracy from approximately 59.5% to 64%. In the third dataset, we reduced the number of channels from 19 to 1 while only slightly decreasing the accuracy from approximately 60% to 58.5%. The results of the individual feature importance methods generally exhibit positive correlations with each other. Furthermore, we demonstrated that the rankings of channels between subjects are largely positively correlated, suggesting the presence of shared patterns of neural activity across subjects. 
The substantial reduction in EEG channels, identification of crucial brain regions and application of channel feature importance methods to internal/external attention data, collectively advance the development of cost-effective and efficient Brain-Computer Interfaces, paving the way for future advancements in the field.
|
|
13:45-14:00, Paper We-PS30T1.4 | Add to My Program |
Modeling of Perceived Musical Rhythms Using Electrocorticography (I) |
|
Dexheimer, Michael | Virginia Commonwealth University |
Johnson, Garett | Old Dominion University |
Shih, Jerry | UC San Diego Health |
Herff, Christian | Maastricht University |
Krusienski, Dean | Virginia Commonwealth University |
Keywords: BMI Emerging Applications, Other Neurotechnology and Brain-Related Topics, Active BMIs
Abstract: Numerous studies have explored the neural correlates of musical rhythms using various neuroimaging modalities. Non-invasive neuroimaging modalities lack either the spatial or temporal resolution to reveal the nuances of the neural processes involved in the perception of musical rhythms. Intracranial recordings of electrophysiological activity such as electrocorticography (ECoG) can jointly provide spatial and temporal resolution for improved characterization and modeling of the underlying processes. The present study examines anticipatory and perceptual models that use ECoG recordings to estimate simple perceived and imagined musical rhythms in human participants. The resulting models are characterized and compared across participants. The results show that the anticipatory and perceptual models can reconstruct the auditory stimulus envelope with statistically significant correlations when trained and tested on independent listening data. However, these models are unable to reliably reconstruct the expected rhythm pattern when trained on listening data and applied to imagining data. This suggests, similar to recent findings in overt and imagined speech decoding using intracranial signals, that there are likely distinct neural substrates activated during listening and imagining of musical rhythms.
|
|
14:00-14:15, Paper We-PS30T1.5 | Add to My Program |
Semantic Representations of Speech Production in Intracranial EEG (I) |
|
Herff, Christian | Maastricht University |
Verwoert, Maxime | Maastricht University |
Amigó-Vega, Joaquín | Gran Sasso Science Institute |
Ottenhoff, Maarten | Maastricht University |
Keywords: BMI Emerging Applications, Active BMIs, Other Neurotechnology and Brain-Related Topics
Abstract: Speech neuroprostheses have the potential to provide severely paralyzed patients with a means of communication. To enable the best possible decoding of speech processes from neural data, it is important to choose a representation of speech that is both meaningful to the decoding process and represented well in the neural recordings. Previously, acoustic, articulatory, and textual representations of speech have been decoded from neural recordings. Semantic representations of speech could add additional information about the content of the produced speech. In this study, we show that semantic embeddings for individual words, as extracted by a word2vec model, can be used to reconstruct neural activity during speech production across wide-spread cortical and subcortical areas. We elucidate the temporal dynamics of reconstruction quality and show that a slight right hemisphere preference exists. These findings could be used to add semantic information into speech neuroprostheses in the future.
|
|
14:15-14:30, Paper We-PS30T1.6 | Add to My Program |
Machine Learning from Mistakes: Self-Improving Attention Classifier Using Error-Related Potentials (I) |
|
Vortmann, Lisa-Marie | University of California, San Diego |
Urban, Timo | Cognitive Systems Lab, University of Bremen |
Putze, Felix | University of Bremen |
Keywords: Passive BMIs, BMI Emerging Applications
Abstract: The detection of an individual's attentional state via a Brain-Computer Interface (BCI) holds significant promise, offering a multitude of possibilities, including enhancing the usability of applications and enabling timely alerts in hazardous situations. However, the variability of EEG data between individuals and the dynamic nature of recordings pose practical challenges for achieving reliable results with BCIs. Thus, conventional methods often require the collection of each person's training data prior to usage, which is then used to train an individual model for detection. Such training data collection makes it difficult to achieve practical use. To overcome this challenge, we propose a self-improving online learning system that personalizes a person-independent model for detecting attentional state in real-time during runtime. This eliminates the need for collecting individual training data prior to usage and instead generates the necessary labels for adaptation using automatically detected error-related potentials. The system was developed based on pre-trained models of two classifiers and used to evaluate different strategies of adaptation and label generation. A statistically significant accuracy improvement of 0.088 was achieved across all available subjects, based on simulations with pre-recorded data. These results suggest that person-dependent models for attentional state detection could in the future be substituted by self-improving classifiers that do not require a dedicated training data collection.
|
|
14:30-14:45, Paper We-PS30T1.7 | Add to My Program |
The Use of SPOD and Spherical Harmonics for the Analysis of EEG Data (I) |
|
Boy, Johann Heinrich | Technische Universität Berlin |
Sieber, Moritz | Technische Universität Berlin |
Oberleithner, Kilian | TU Berlin |
Martinuzzi, Robert J. | University of Calgary |
Hu, Yaoping | University of Calgary |
Keywords: Passive BMIs, Other Neurotechnology and Brain-Related Topics
Abstract: The assessment of mental workload from electroencephalogram (EEG) data for brain-computer interfaces (BCI) poses some challenges due to the interaction of spatial and temporal features within the data. Similar challenges are well known in the analysis of turbulent flows, which exhibit complex space-time correlations resulting from large-scale structures. The similarities motivated us to conduct this feasibility work of applying analytic methods of fluid dynamics – i.e., a scheme combining spectral proper orthogonal decomposition (SPOD) and spherical harmonics as basis functions – to EEG data for identifying features representing mental states. Based on EEG data of an existing BCI Hackathon, the scheme yielded some relevant features across subjects and sessions by relying only on a Fourier transform in time and the basis functions in space. The features were then classified by employing a conventional support vector machine algorithm to produce an accuracy comparable to those reported in a previous study on the same EEG data. This performance comparability indicates the scheme’s potential for analyzing EEG data in BCI applications. Nevertheless, future work is needed to select specific features as general indicators of mental workload.
|
|
We-PS30-T1 Regular Session, Hawaii 1 |
Add to My Program |
Human Factors |
|
|
|
13:15-13:30, Paper We-PS30-T1.2 | Add to My Program |
Uncovering Variability in Human Driving Behavior through Automatic Extraction of Similar Traffic Scenes from Large Naturalistic Datasets |
|
Siebinga, Olger | Delft University of Technology |
Zgonnikov, Arkady | Delft University of Technology |
Abbink, David | Delft University of Technology |
Keywords: Human Factors
Abstract: Recently, multiple naturalistic traffic datasets of human-driven trajectories have been published (e.g., highD, NGSim, and pNEUMA). These datasets have been used in studies that investigate variability in human driving behavior, for example for scenario-based validation of autonomous vehicle (AV) behavior, modeling driver behavior, or validating driver models. Thus far, these studies focused on the variability on an operational level (e.g., velocity profiles during a lane change), not on a tactical level (i.e., to change lanes or not). Investigating the variability on both levels is necessary to develop driver models and AVs that include multiple tactical behaviors. To expose multi-level variability, the human responses to the same traffic scene could be investigated. However, no method exists to automatically extract similar scenes from datasets. Here, we present a four-step extraction method that uses the Hausdorff distance, a mathematical distance metric for sets. We performed a case study on the highD dataset that showed that the method is practically applicable. The human responses to the selected scenes exposed the variability on both the tactical and operational levels. With this new method, the variability in operational and tactical human behavior can be investigated, without the need for costly and time-consuming driving-simulator experiments.
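The Hausdorff distance named in the abstract can be illustrated with a minimal sketch. The abstract does not say how a traffic scene is encoded, so the 2-D point sets below (`scene_a`, `scene_b`) and the function names are hypothetical; this shows only the distance metric itself, not the authors' four-step extraction method.

```python
import math


def directed_hausdorff(a, b):
    """Largest distance from any point in `a` to its nearest point in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


# Hypothetical scenes: each tuple is an (x, y) vehicle position in metres.
scene_a = [(0.0, 0.0), (10.0, 0.0)]
scene_b = [(0.0, 3.0), (10.0, 4.0)]

print(hausdorff(scene_a, scene_b))  # 4.0
```

A small distance means every point in one set has a close counterpart in the other, which is why the metric suits comparing sets of unequal size.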
|
|
13:30-13:45, Paper We-PS30-T1.3 | Add to My Program |
The Road to Industry 5.0: The Challenges of Human Fatigue Modeling |
|
Zanoli, Christopher | University of Modena and Reggio Emilia |
Villani, Valeria | University of Modena and Reggio Emilia |
Picone, Marco | University of Modena and Reggio Emilia |
Keywords: Affective Computing, Biometrics and Applications, Human Factors
Abstract: Industry 5.0 promotes the development of human-centered industrial operations fueled by a fresh wave of disruptive technologies that encourage synergistic human-machine integration. Its focus is on understanding how human cognition contributes to a more secure and harmonious coexistence between humans and machines in industrial scenarios, employing solutions that prioritize fundamental worker demands while preserving or enhancing industrial productivity. In this context, the ability to assess fatigue objectively is crucial for occupational health and safety because it can reduce cognitive and motor function, ultimately lowering productivity and raising the risk of harm to human operators. To this end, wearable systems provide a promising solution for continuous, non-intrusive, and long-term monitoring of biological signals for fatigue detection. However, the adoption of these devices presents unique challenges, such as inter-individual variability that renders traditional one-size-fits-all machine learning models unsuitable. This paper provides an analysis of the current state-of-the-art for wearable device monitoring, including ongoing issues and current knowledge gaps. In addition, an experimental analysis is presented, employing a pattern discovery pipeline based on unsupervised learning on a real-world dataset. Our analysis provides experimental evidence of the limitations of one of the classical approaches to fatigue assessment, thus highlighting the need for more advanced models.
|
|
13:45-14:00, Paper We-PS30-T1.4 | Add to My Program |
A Framework for Intervention Based Team Support in Time Critical Tasks |
|
Hughes, Dana | Carnegie Mellon University |
Li, Huao | University of Pittsburgh |
A Chis, Maximilian | University of Pittsburgh |
Oguntola, Ini | Carnegie Mellon University |
Stepputtis, Simon | Carnegie Mellon University |
Zheng, Keyang | University of Pittsburgh |
Campbell, Joseph | Carnegie Mellon University |
Sycara, Katia | Carnegie Mellon University |
Lewis, Michael | University of Pittsburgh |
Keywords: Team Performance and Training Systems, Human Factors
Abstract: In this paper we describe the intervention framework of ATLAS, an artificial socially intelligent agent that advises teams. The framework treats interventions as atomic components, and manages the lifecycle of each intervention through presentation, as well as followups to interventions. The key benefit of this framework is that it allows for rapid development of scenario-specific interventions that leverage scenario-agnostic team models. The implementation of this framework is reported for three-player teams in a Search and Rescue task simulated in Minecraft. Low-competence teams advised by ATLAS improved more between first and second trials than those with a human advisor, while the reverse was found for high-competence teams. Four times as many interventions were proposed as were presented. 15% of advice was withheld to avoid repetitive advice, excessive rate of advice, and needlessly advising high-performing teams, while a Theory of Mind model and delay for confirmation mechanism filtered out other unnecessary advice.
|
|
14:00-14:15, Paper We-PS30-T1.5 | Add to My Program |
A Multi-Modal Approach to Measuring the Effect of XAI on Air Traffic Controller Trust During Off-Nominal Runway Exits |
|
Pushparaj, Kiranraj | Air Traffic Management Research Institute, Nanyang Technological |
Reddy, Pratusha | School of Biomedical Engineering, Science and Health Systems, Dr |
Vu-Tran, Duy | Air Traffic Management Research Institute, Nanyang Technological |
Izzetoglu, Kurtulus | Drexel University |
Alam, Sameer | Nanyang Technological University |
Keywords: Human-Centered Transportation, Human Factors, Human-Computer Interaction
Abstract: Lack of transparency has been demonstrated to be a stumbling block in building Air Traffic Controller (ATCO) trust towards intelligent decision aids. To address this issue, a runway exit prediction decision aid, with explainability, was developed with the trait of providing explanations involving the top three contributing features to its predictions. To evaluate the influence of the intelligent decision aid’s explanations on ATCO trust, the decision aid was used in a human-in-the-loop study during off-nominal runway exits, utilizing 12 participants and a total of 67 trials. A multi-modal approach was adopted with three types of data (questionnaire, behavioural, physiological) being collected in this study to ensure a more comprehensive understanding of the effects of explainability on ATCO trust. The results indicated that higher levels of perceived transparency led to an increase in trust levels, with an accompanying increase in cognitive load and complacency, even with low prediction accuracy by the intelligent decision aid. As such, these effects must be accounted for in designing XAI decision aids, which are defined as decision aids that rationalize their recommendations, for off-nominal events, when attempting to enhance trust levels by increasing transparency.
|
|
14:15-14:30, Paper We-PS30-T1.6 | Add to My Program |
Effect of Traveling Speed on Visual Processing During Self-Driving of Personal Mobility Vehicle |
|
Hamazaki, Shunichi | The University of Tokyo |
Yoshitake, Hiroshi | The University of Tokyo |
Shino, Motoki | Tokyo Institute of Technology |
Keywords: Human-Centered Transportation, Human Factors
Abstract: The application of self-driving technology to personal mobility vehicles (PMVs) is being investigated. Passengers of self-driving PMVs will likely be engaged in activities with visual processing, such as operating smartphones. Thus, self-driving PMVs need to consider the possibility that passengers will perform these activities and the ease of performing these activities. However, little research has considered such passengers’ convenience when using self-driving PMVs as a service. It is known that visual information is particularly dominant among input information to human beings. As was observed in previous works, changes in the surrounding environment due to moving may negatively impact the passenger’s main activities with visual information processing. The traveling speed affects these changes in the surrounding, and it is a parameter of self-driving methods. Therefore, this paper aims to grasp the effect of traveling speed on passengers’ visual processing during self-driving of PMVs to realize self-driving PMVs on which passengers can easily process information. The effect of the traveling speed of self-driving PMV on task performance and mental workload of passengers’ visual processing was investigated by conducting a simulator experiment. The results showed that although the task performance is maintained, the mental workload of visual processing increases when the PMV moves compared to the stopping condition. Moreover, it was suggested that the visual input of the changing environment and the passenger’s gaze behavior influence the increase in mental workload.
|
|
14:30-14:45, Paper We-PS30-T1.7 | Add to My Program |
Force Information Presentation by Vibrotactile Stimulation Combining Amplitude and Frequency Modulation |
|
Nakada, Naoki | Nagoya Institute of Technology |
Yukawa, Hikari | Nagoya Institute of Technology |
Tanaka, Yoshihiro | Nagoya Institute of Technology |
Keywords: Haptic Systems, Human-Machine Interface, Human Factors
Abstract: Teleoperation makes it possible to work in inaccessible environments. To do so safely and effectively, some means of force feedback is necessary. However, force feedback methods that involve a reaction force often require bulky equipment. Instead, our study has utilized force feedback using vibrotactile stimulation. Although the modulation method of force information to vibrotactile stimulation has previously been demonstrated in magnitude (AM) and frequency (FM), we propose a method called AFM that uses amplitude and frequency modulation simultaneously to improve performance. In particular, we adjusted the amplitude using the sensation level and used logarithmized frequency for the AFM. First, the vibrotactile detection threshold of participants was measured to determine the amplitude to be equivalent in the same sensation level among participants. Then, the performance of discrimination was evaluated through stair-case psychophysical experiments. The results showed that our proposed AFM method significantly improved the discrimination sensitivity for the force information compared with the AM and FM methods.
|
|
We-PS30-T2 Regular Session, Lanai |
Add to My Program |
Human Machine Interface |
|
|
|
13:00-13:15, Paper We-PS30-T2.1 | Add to My Program |
Designing User Interface Elements for Remotely Operated Rubber-Tired Gantry Cranes |
|
Sitompul, Taufik Akbar | Norwegian University of Science and Technology |
Park, Jooyoung | Norwegian University of Science and Technology |
Alsos, Ole Andreas | Norwegian University of Science and Technology |
Keywords: User Interface Design, Human-Machine Interface, Human-Computer Interaction
Abstract: The graphical user interfaces (GUIs) for operating heavy machinery, such as cranes, vary significantly even for the same type of machines depending on machine manufacturers or third-party suppliers who develop the GUIs. This situation leads to diverse GUIs, which require operators to train themselves every time they use GUIs from different machine manufacturers or third-party suppliers. Using significantly different GUIs may also increase the risk of human error, since the GUIs may have different rules or mechanisms that operators should follow. To improve the design consistency across different machine manufacturers and third-party suppliers, there is a need for a design system that crane manufacturers and third-party suppliers can use when developing their own GUIs. This paper presents the process of designing user interface elements for operating remote rubber-tired gantry (RTG) cranes, which will be offered as part of OpenCrane Design System.
|
|
13:15-13:30, Paper We-PS30-T2.2 | Add to My Program |
Embedded Optoelectronics in Fiberglass PCBs and Applications for Robotics with Human Interface and ML-Enabled Detection |
|
Ackerman, Colin | Western Washington University |
Afshari, Reza | Western Washington University |
Lund, John | Western Washington University |
Keywords: Human-Machine Interaction, Human-Machine Interface, User Interface Design
Abstract: We present a novel method for using embedded optoelectronics in existing laminated printed circuit board (PCB) structures to achieve human interface as well as rudimentary sensing of orientation, obstruction, and failure in robotic systems using fiberglass PCBs as foundational structural elements. The ability to use these existing fiberglass structures for dimensionally stable positioning of electronics, mechanical support to broader robotic elements, as well as visualization and sensing, presents an expanded opportunity for PCB utilization in robotics without additional system cost or mass. We demonstrate using a stacked denoising autoencoder (SDAE) with the capacity to use a low feature density infrared emitter and receiver arrays implemented with embedded optoelectronic elements to detect small deviations in the position of human-actuated mechanical elements incorporated into robotic systems.
|
|
13:30-13:45, Paper We-PS30-T2.3 | Add to My Program |
Input Identification of Interface for Cartesian Coordinate System Manipulation Using Machine Learning |
|
Nagai, Harutake | Tokyo Institute of Technology |
Miura, Satoshi | Tokyo Institute of Technology |
Keywords: Human-Machine Interaction, Telepresence, Human-Machine Interface
Abstract: Intuitive robotic tele-operation is a necessary but often difficult task for users because the structure of a robot is different from that of a human body. Our previously developed novel interface allows a user to operate a robot intuitively because the user manipulates this interface according to the Cartesian coordinate system. However, interference between the axes in the interface causes unintended input by the user and reduces operability. In this study, we develop a model that identifies whether each input is the user’s intended or unintended input. We obtained the displacement and angle of the interface’s grip in each direction and investigated the interference relationships between axes. As a result, the large interferences were in the pairs of pitch and Z-axis, and yaw and Y-axis. Finally, we constructed an identification model using a recurrent neural network. The models for all axes achieved F1 scores above 0.97.
|
|
13:45-14:00, Paper We-PS30-T2.4 | Add to My Program |
Human Action Recognition Using Multi-Stream Fusion and Hybrid Deep Neural Networks |
|
Chopra, Saurabh | Royal Holloway, University of London |
Zhang, Li | Royal Holloway, University of London |
Jiang, Ming | University of Sunderland |
Keywords: Human-Machine Interface, Human-Machine Interaction, Human-Computer Interaction
Abstract: Action Recognition in videos is a topic of interest in the area of computer vision, due to potential applications such as multimedia indexing and surveillance in public areas. In this research, we first propose spatial and temporal Convolutional Neural Networks (CNNs), based on transfer learning using ResNet101, GoogleNet and VGG16, for undertaking human action recognition. Besides that, hybrid networks such as CNN-Recurrent Neural Network (RNN) models are also exploited as encoder-decoder architectures for video action classification. In particular, different types of RNNs such as Long Short-Term Memory (LSTM), Bidirectional-LSTM (BiLSTM), Gated Recurrent Unit (GRU), and Bidirectional-GRU (BiGRU), are exploited as the decoders for action recognition. To further enhance performance, diverse aggregation networks of CNN and CNN-RNN models are implemented. Specifically, an Average Fusion method is used to integrate spatial and temporal CNNs trained on images, as well as CNN-RNN trained on videos, where the final classification is formed by combining Softmax scores of these models via a late fusion. A total of 22 models (1 motion CNN, 3 spatial CNNs, 12 CNN-RNNs and 6 fusion networks) are implemented, which are evaluated using UCF11, UCF50, and UCF101 datasets for performance comparison. The empirical results indicate the significant efficiency of Average Fusion of multiple Spatial-CNNs with one Motion-CNN, and ResNet101-BiGRU, among all the networks for undertaking realistic video action recognition.
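The late fusion described in the abstract, in which Softmax scores from several models are averaged before the final decision, might look roughly like the sketch below. The three score vectors and the function name `average_fusion` are made up for illustration; the abstract does not report per-model scores or any weighting scheme, so equal weights are assumed.

```python
def average_fusion(score_lists):
    """Average per-class scores across models and return (fused, argmax).

    `score_lists` holds one list of class probabilities per model;
    all lists must have the same length. Equal weighting is an assumption.
    """
    n_models = len(score_lists)
    n_classes = len(score_lists[0])
    fused = [
        sum(scores[c] for scores in score_lists) / n_models
        for c in range(n_classes)
    ]
    predicted = max(range(n_classes), key=fused.__getitem__)
    return fused, predicted


# Hypothetical Softmax outputs for one clip over three action classes.
spatial_cnn = [0.6, 0.3, 0.1]
motion_cnn = [0.2, 0.5, 0.3]
cnn_rnn = [0.1, 0.7, 0.2]

fused, predicted = average_fusion([spatial_cnn, motion_cnn, cnn_rnn])
print(predicted)  # index of the class with the highest averaged score
```

Here the spatial model alone would pick class 0, but the averaged scores favour class 1, which is the kind of disagreement late fusion is meant to resolve.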
|
|
14:00-14:15, Paper We-PS30-T2.5 | Add to My Program |
Spiking Neural Networks for sEMG-Based Hand Gesture Recognition |
|
Montazerin, Mansooreh | Concordia University |
Naderkhani, Farnoosh | Concordia University |
Mohammadi, Arash | Concordia University |
Keywords: Human-Machine Interface, Medical Informatics, Wearable Computing
Abstract: Given the recent surge of significant interest in implementing intelligent hand gesture recognition methods in human-machine interface systems, a wide variety of Deep Neural Networks (DNNs) have been proposed in the literature. In this paper, we introduce a novel and compact Spiking Neural Network (SNN) model for hand gesture recognition using High-Density surface Electromyogram (HD-sEMG) signals. Capitalizing on their ability to extract spatiotemporal features of HD-sEMG signals along with their proven strength in imitating the human brain's neural activity using event-driven data processing, we used SNNs as the main building block of our proposed hand gesture recognition model. We show that our proposed model can efficiently differentiate 14 hand movements by considering each sample of the HD-sEMG data as a single time step for the SNN architecture. Moreover, we show that the proposed SNN model does not require extensive pre-processing, spike encoding and feature extraction tasks and works effectively on Min-Max normalized continuous-value sEMG signals. We evaluate our SNN model using a 5-fold cross-validation scheme and categorize different participants based on the range of classification accuracy we obtained for them. The following results are acquired by segmenting HD-sEMG signals into windows of size 62.5ms with no overlap. The proposed method led to 6 out of 19 subjects achieving average classification accuracy of greater than 80%, with a maximum accuracy of 98% associated with the 3rd session of the sEMG dataset as the test set.
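The two pre-processing steps the abstract names, Min-Max normalization and segmentation into 62.5 ms windows with no overlap, can be sketched as follows. The sampling rate is not stated in the abstract, so the 2048 Hz used here is purely an assumption (it gives 128 samples per window), and whether normalization is applied per channel, per window, or per recording is likewise unspecified; the function names are hypothetical.

```python
def min_max_normalize(samples):
    """Rescale a sequence to the range [0, 1]; a constant signal maps to zeros."""
    lo, hi = min(samples), max(samples)
    if hi == lo:
        return [0.0] * len(samples)
    return [(v - lo) / (hi - lo) for v in samples]


def segment(samples, fs_hz, window_ms=62.5):
    """Split a signal into consecutive non-overlapping windows.

    Trailing samples that do not fill a whole window are dropped.
    """
    n = int(round(fs_hz * window_ms / 1000.0))
    return [samples[i:i + n] for i in range(0, len(samples) - n + 1, n)]


# Hypothetical single-channel recording: 300 samples at an assumed 2048 Hz.
FS_HZ = 2048
signal = [float(i % 50) for i in range(300)]

windows = segment(min_max_normalize(signal), FS_HZ)
print(len(windows), len(windows[0]))  # 2 128
```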
|
|
14:15-14:30, Paper We-PS30-T2.6 | Add to My Program |
Personalized Decision Supports Based on Theory of Mind Modeling and Explainable Reinforcement Learning |
|
Li, Huao | University of Pittsburgh |
Fan, Yao | University of Pittsburgh |
Zheng, Keyang | University of Pittsburgh |
Lewis, Michael | University of Pittsburgh |
Sycara, Katia | Carnegie Mellon University |
Keywords: Human-Machine Cooperation and Systems, Networking and Decision-Making, Human-Machine Interaction
Abstract: In this paper, we propose a novel personalized decision support system that combines Theory of Mind (ToM) modeling and explainable Reinforcement Learning (XRL) to provide effective and interpretable interventions. Our method leverages DRL to provide expert action recommendations while incorporating ToM modeling to understand users' mental states and predict their future actions, enabling appropriate timing for intervention. To explain interventions, we use counterfactual explanations based on RL's feature importance and users' ToM model structure. Our proposed system generates accurate and personalized interventions that are easily interpretable by end-users. We demonstrate the effectiveness of our approach through a series of crowd-sourcing experiments in a simulated team decision-making task, where our system outperforms control baselines in terms of task performance. Our proposed approach is agnostic to task environment and RL model structure, and therefore has the potential to be generalized to a wide range of applications.
|
|
14:30-14:45, Paper We-PS30-T2.7 | Add to My Program |
Development of a Soldier-Robot Teaming Synthetic Environment for Team Effectiveness Evaluation |
|
Fang, Scott | Toronto Research Center, Defence Research and Development Canada |
Hou, Ming | Department of National Defence, Canada |
Pavlovic, Nada | Toronto Research Center, Defence Research and Development Canada |
Banbury, Simon | C3 Human Factors |
Gamble, Murray | CogSim Technologies Inc |
O'Young, Siu | Memorial University of Newfoundland |
Keywords: Human-Machine Cooperation and Systems, Virtual and Augmented Reality Systems, Human Factors
Abstract: To help enhance mission effectiveness for future dismounted soldiers in the Canadian Armed Forces (CAF), a Soldier-Robot Teaming (SRT) concept has been developed and accepted as a force multiplier to extend operational abilities of dismounted soldiers in the battlefield, such as reducing the number of soldiers in dangerous environments and empowering them with advanced robot technologies. To support this endeavour, the Defence Research and Development Canada (DRDC) Toronto Research Centre (TRC) has led a research effort 'Concept of Operations (CONOPS) for SRT in the CAF' since 2019. In the first research phase, key stakeholders within the Department of National Defence (DND) and CAF were engaged in a series of meetings and interviews to help identify future SRT concepts across a broad range of missions and operational environments. During the stakeholder analysis, five SRT use cases were developed and validated for the CAF to capture the intended SRT CONOPS, operational priorities, operational contexts, functionality, interactions, and expected mission performance, and to support Land Operations at section and platoon levels. To further facilitate SRT concept development and experimentation activities at DRDC TRC, a synthetic modeling and simulation environment was proposed and developed, in support of future SRT effectiveness research on team communication, coordination, collaboration and trust. In the meantime, this effort can also support studies on human-machine interface and human-systems integration, as well as help carry out SRT human-in-the-loop experiments and trials, for the CAF needs of reducing soldier workload and improving team performance and effectiveness in future SRT operations.
|
|
We-PS30-T3 Regular Session, Lao Needle |
Add to My Program |
Human Machine Interaction |
|
|
|
13:00-13:15, Paper We-PS30-T3.1 | Add to My Program |
Thermal Perception and Response to Overwarmed Contact and Surface Heating on Heat-Sensitive-Impaired Individuals in a BMW Vehicle Environment |
|
Kipp, Manuel | Technical University of Munich |
Hoffman, Fabian | BMW Group |
Koch, Philipp | Technical University of Munich |
Glockner, Matthias | BMW Group |
Bengler, Klaus | Chair of Ergonomics, Technical University of Munich |
Keywords: Human-Machine Interaction, Human-Centered Transportation, Human Factors
Abstract: In the automotive industry, contact and surface heating systems are designed to enhance thermal comfort in vehicles. The maximum temperature thresholds for such systems are specified in DIN EN ISO 13732-1. This paper presents a study on the effects of contact surface heating on thermal comfort in automotive interiors, with a particular focus on individuals with heat sensitivity impairments caused by polyneuropathy. Two groups of individuals, one with normal sensitivity and the other with impaired heat sensitivity due to polyneuropathy, were tested for their thermal perception and response to an overwarmed seat and steering wheel heating system. The study aimed to measure the temperature and time at which individuals in a control group and an experimental group would stop using overwarmed contact surface heating on the steering wheel and seat. In addition, the thermal load on passengers in a vehicle environment was investigated to define time limits and temperature thresholds for comfortable and uncomfortable states when using contact surface heating systems. A BMW driving simulator was used to simulate a realistic in-vehicle experience. The results indicate that both groups could feel the heat but differed significantly in their threshold temperature and time to discontinue heating. The group with impaired heat sensitivity showed more sensitive thermal perception than the group with normal sensitivity. The outcomes of this research offer valuable insights for the design and advancement of contact and surface heating systems in vehicles, specifically for individuals who suffer from heat sensitivity impairments.
|
|
13:15-13:30, Paper We-PS30-T3.2 | Add to My Program |
How to Discover Competences from Help Interactions (I) |
|
Merzouki, Hocine | University of Technology of Troyes |
Atifi, Hassan | University of Technology of Troyes |
Matta, Nada | University of Technology of Troyes |
Keywords: Intelligence Interaction, Human-centered Learning, Interactive Design Science and Engineering
Abstract: Companies increasingly need to know the skills and competences they have developed in order to meet market demands and to tackle projects. Yearly evaluations cannot capture the operational and technical competences that actors develop during their work. Competence generally refers to the experiences, knowledge, attitudes, abilities and behavior that enable effective action in a work environment. In this paper, an interaction analysis is proposed in order to emphasize the manifestation of actors’ competences. This approach is mainly based on action verbs and interaction linguistics studies.
|
|
13:30-13:45, Paper We-PS30-T3.3 | Add to My Program |
Granger Leadership in a Novel Dyadic Search Paradigm |
|
King, Kevin | DCS Corporation |
Gordon, Stephen | DCS Corporation |
Rabin, Ashley | DCS Corporation |
Keywords: Team Performance and Training Systems, Multi-User Interaction, Interactive Design Science and Engineering
Abstract: Understanding the dynamic nature of emergent team properties such as collaboration and role differentiation could inform intervention methods that aim to improve team outcomes. Often such emergent properties are studied in heavily controlled experiments that constrain how these properties develop. In this study, we use a novel paradigm to study emergent team properties in a dyadic team setting. Participants are granted extensive autonomy to determine their strategy as they navigate an immersive environment to pursue their objective of locating and defusing improvised explosive devices. We use Granger causality methods to identify when a participant’s behaviors are predictive of their teammate’s behaviors, which we refer to in this study as Granger leadership. Participants who did the most leading in their team were identified as leaders, while their teammates were identified as followers. Results indicated better task performance for teams that shared leadership responsibilities more evenly. We also found that gaze fixation rates increased and blink rates decreased with increasing Granger leadership values. This indicates that Granger leadership may be a surrogate for task engagement, and that interventions aimed at improving team performance should attempt to encourage engagement of the followers within a group.
|
|
13:45-14:00, Paper We-PS30-T3.4 | Add to My Program |
Spatial and Temporal Attention-Based Emotion Estimation on HRI-AVC Dataset |
|
Subramanian, Karthik | Rochester Institute of Technology |
Singh, Saurav | Rochester Institute of Technology |
Justin, Namba | Rochester Institute of Technology |
Heard, Jamison | Rochester Institute of Technology |
Kanan, Christopher | University of Rochester |
Sahin, Ferat | Rochester Institute of Technology |
Keywords: Affective Computing, Human-Collaborative Robotics, Human-Machine Interaction
Abstract: Many attempts have been made at estimating discrete emotions (calmness, anxiety, boredom, surprise, anger) and the continuous emotional measures commonly used in psychology, namely 'valence' (the pleasantness of the emotion being displayed) and 'arousal' (the intensity of the emotion being displayed). Existing methods to estimate arousal and valence rely on learning from data sets in which an expert annotator labels every image frame. Access to an expert annotator is not always possible, and the annotation can also be tedious. It is therefore more practical to obtain self-reported arousal and valence values directly from the human in a real-time human-robot collaborative setting. This paper provides an emotion data set (HRI-AVC) obtained while conducting a human-robot interaction (HRI) task. Each self-reported pair of labels in this data set is associated with a set of image frames. This paper also proposes a spatial and temporal attention-based network to estimate arousal and valence from this set of image frames. The results show that an attention-based network can estimate valence and arousal on the HRI-AVC data set even when arousal and valence values are unavailable per frame.
|
|
14:00-14:15, Paper We-PS30-T3.5 | Add to My Program |
Database for Human Emotion Estimation through Physiological Data in Industrial Human-Robot Collaboration |
|
Justin, Namba | Rochester Institute of Technology |
Savur, Celal | Intel Labs |
Subramanian, Karthik | Rochester Institute of Technology |
Sahin, Ferat | Rochester Institute of Technology |
Keywords: Human-Collaborative Robotics, Human-Machine Interaction, Human-Computer Interaction
Abstract: We introduce three new multi-modal data sets. They contain physiological and/or emotional information about humans working in close proximity to robotic arms to complete a task in an industrial setting. The data sets provide data from human subjects engaged in the assistive task of assembling a PVC joint pipe with robots. These data streams were collected to analyze and improve the comfort and safety of humans collaborating with robots in proximity in an industrial setting. These data sets can appeal to researchers studying human-robot collaboration, robot adaptation, and affective computing. Our data is stored in various formats, including images and human-readable Comma-Separated Values (CSV) or JavaScript Object Notation (JSON) files.
|
|
14:15-14:30, Paper We-PS30-T3.6 | Add to My Program |
EEG-Based Emotion Analysis Using Person-Event Network |
|
Tang, Liwei | Tongji University |
He, Lianghua | Tongji University |
Keywords: Brain-Computer Interfaces
Abstract: Brain-computer interface (BCI) technology has attracted a lot of attention in recent years. Emotion recognition based on electroencephalography is a typical application of BCI. Traditional methods for emotion recognition focus mainly on time-domain and frequency-domain features, while spatial information is often ignored. In this paper, to make use of spatial features, we propose a new convolutional neural network that uses not only temporal features but also person-related and event-related features. Depthwise convolution and separable convolution are also used for feature extraction. To verify the effectiveness of our method, we conduct extensive experiments on the public datasets DEAP and DREAMER. Compared with other methods, our method achieves state-of-the-art performance.
|
|
14:30-14:45, Paper We-PS30-T3.7 | Add to My Program |
Design of Haptic Experience Recording for Guide-Dog Training |
|
Zhu, Qirong | The University of Tokyo |
Wang, Ansheng | The University of Tokyo |
Tanaka, Shinji | Japan Guide Dog Association |
Matsunami, Yoshiro | Japan Guide Dog Association |
Makino, Yasutoshi | The University of Tokyo |
Shinoda, Hiroyuki | The University of Tokyo |
Keywords: Haptic Systems
Abstract: In recent years, the need for guide dogs has increased, and a more efficient methodology for training guide dog trainers is accordingly required. One challenge is that haptic information, which is a significant part of guide-dog training, is difficult to explain and visualize. Virtual reality (VR) systems have attracted attention owing to their ability to support highly immersive interactions with multisensory experiences. A haptic-enabled VR system could address the limitations of the conventional method and support novice trainers in practicing immersively and remotely. This study presents a handle-based sensing setup for recording haptic experiences of trainers during guide dog training for use in a VR system. Because trainers perceive and apply forces via the handle to sense and control the dog's motion status, the applied forces are decoded as haptic information and recorded using corresponding sensors. The accuracy of the proposed setup for collecting the required data was found to be high (average errors in validating forces and yaw angles are 0.4 N and 0.93°, respectively). We believe that the proposed recording system is able to measure haptic experiences for later re-creation in the proposed VR guide-dog training system and could ultimately facilitate the education of a large number of dog trainers.
|
|
We-PS30-T4 Regular Session, Hawaii 2 |
Add to My Program |
Image Processing, Pattern Recognition, Machine Vision, and Representation Learning |
|
|
|
13:15-13:30, Paper We-PS30-T4.2 | Add to My Program |
Initial Analysis of Multiple Retinal Diseases Classification with Fuzzy Medical Image Retrieval |
|
Uher, Vojtech | VSB - Technical University of Ostrava |
Nowaková, Jana | VSB-TUO |
Kromer, Pavel | VSB-Technical University of Ostrava |
Keywords: Image Processing and Pattern Recognition, Fuzzy Systems and their applications, Machine Learning
Abstract: Medical image retrieval is a highly discussed topic, and it includes an efficient classification of diagnoses based on similarity search in large databases of medical images. It is very important for early and correct diagnosis and treatment. In this paper, we focus on detecting four diagnoses of treatable retinal diseases in optical coherence tomography (OCT) images. The fuzzy medical image retrieval model (FMIR) is applied to transform images into fuzzy signatures organized in a Fuzzy S-tree, as it was previously used successfully for breast cancer detection and COVID-19 chest X-ray detection. The paper examines and compares the performance of the FMIR method on 4-class and binary classification models built on an OCT dataset and compares the impact of two metrics, Euclidean and Hamming fuzzy distances. The experiments show a clear dominance of the Hamming fuzzy distance. The best accuracy is achieved for binary classification (61.16% to 93.8%), while the performance of the 4-class model is worse (51.7%). The distribution of signature space and classification performance are analyzed in detail.
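The two metrics compared in this abstract can be illustrated with a minimal sketch. The abstract does not give the formulas, so the code below assumes the textbook definitions on vectors of membership degrees in [0, 1]; the function names and the sample signatures are hypothetical, and any normalization used by FMIR itself is not reproduced here.

```python
import math


def hamming_fuzzy_distance(a, b):
    """Sum of absolute differences between membership degrees (assumed definition)."""
    if len(a) != len(b):
        raise ValueError("signatures must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean_fuzzy_distance(a, b):
    """Square root of the summed squared differences (assumed definition)."""
    if len(a) != len(b):
        raise ValueError("signatures must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical fuzzy signatures: one membership degree per image feature.
query = [0.0, 0.5, 1.0, 0.5]
stored = [0.0, 1.0, 1.0, 0.0]

hamming = hamming_fuzzy_distance(query, stored)
euclidean = euclidean_fuzzy_distance(query, stored)
```

In a retrieval setting, the stored signature with the smallest distance to the query would supply the predicted diagnosis; which metric ranks neighbors better is the empirical question the paper reports on.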
|
|
13:30-13:45, Paper We-PS30-T4.3 | Add to My Program |
Domain Adaptation for Edge Detectors Using Lightweight Networks |
|
Bommireddy, Venkat Sumanth Reddy | Florida State University |
Raniwala, Sophie | Stanford Online High School |
Kumar, Piyush | Florida State University |
Keywords: Image Processing and Pattern Recognition, Transfer Learning, Neural Networks and their Applications
Abstract: Edge Detectors are one of the most fundamental tools used in Computer Vision applications. Using deep learning, state-of-the-art (SOTA) models are able to produce sharp, fine edges, mirroring human-level performance. However, most of these SOTA models are trained solely on color images. We experimentally determined that this skewed data severely hinders performance on images from other domain spaces. In this paper, we propose a way to adapt SOTA models to single-channel inputs from other domains with minimal overhead, using small-scale feed-forward networks. Our model, which serves as an additional layer to existing SOTA models, greatly boosts their performance on out-of-domain data, allowing the edge detector to generalize to new domain spaces without undergoing extensive, resource- or data-heavy retraining. Using a single off-the-shelf GPU and a small 30-image dataset, we were able to train this low-complexity (less than 13k parameters) model in half an hour to boost the F-score of the produced edgemaps for chest x-rays by over 0.2.
|
|
14:00-14:15, Paper We-PS30-T4.5 | Add to My Program |
Automatic Detection of Puncture Needle from CT Image with Deep Learning and Difference of CT Value Along Craniocaudal Direction |
|
Kobayashi, Seiya | Okayama University |
Toda, Yuichiro | Okayama University |
Matsuno, Takayuki | Okayama University |
Mayumi, Kotaro | Okayama University |
Muramoto, Wataru | Okayama University |
Fujitsuka, Nozomu | Okayama University |
Tanaka, Takaaki | Okayama University |
Kamegawa, Tetsushi | Okayama University |
Hiraki, Takao | Okayama University |
Keywords: Machine Vision, Deep Learning, Machine Learning
Abstract: We have developed a CT-guided needle puncture robot (Zerobot) to assist in interventional radiology surgery. Currently, Zerobot is operated remotely, and the next goal is to perform automatic needle puncture surgery. The first step toward automatic puncture surgery is the automatic detection of the puncture needle from CT images, which remains a challenge. Because the puncture needle can be curved, it is necessary to detect the needle from the CT image instead of estimating the needle position from the arm of Zerobot. First, the method detects the ROI with ResNet. Next, the difference of the CT value is calculated for each pixel in the ROI, and a linear approximation is performed to detect the needle shape. Images from animal experiments were used to evaluate the learner and the image processing. We confirmed that the proposed method can detect needles in a single image and in multiple images.
|
|
14:15-14:30, Paper We-PS30-T4.6 | Add to My Program |
Pyramid Pooling-Based Local Profiles for Graph Classification |
|
Wu, Chengpei | Sichuan Normal University |
Lou, Yang | Osaka University |
Li, Junli | Sichuan Normal University |
Keywords: Representation Learning, Neural Networks and their Applications, Complex Network
Abstract: Many natural and engineering systems can be modeled and represented in the form of graph data, and then studied using graph theory and network analysis tools. Graph representation learning aims at generating lower-dimensional representations from higher-dimensional graph data, which is a crucial step that facilitates follow-up tasks such as node and graph classification. In this paper, we present a simple but effective graph representation learning method, namely the pyramid pooling-based local profile (PPLP), which enables local nodal profiles to be transformed into a graph representation, with multi-scale features extracted. PPLP can either be embedded into a graph neural network as the readout layer, or operate independently as a graph embedding algorithm. The resultant representations of PPLP are for graph-level tasks. PPLP is experimentally tested by performing graph classification tasks on ten representative datasets, either as the readout layer of different graph neural networks, or as an independent graph embedding algorithm. Experimental results demonstrate that: 1) when embedded into graph neural networks, PPLP outperforms the widely-used global pooling-based readout methods; 2) as an independent graph embedding algorithm, PPLP performs fairly well, especially on the social network datasets. The investigation confirms PPLP as a simple but promising method for graph-level tasks.
|
|
We-PS30-T5 Regular Session, Hawaii 3 |
Add to My Program |
Neural Networks and Their Applications |
|
|
|
13:00-13:15, Paper We-PS30-T5.1 | Add to My Program |
Multifunctionality in a Connectome-Based Reservoir Computer |
|
Morra, Jacob | The University of Western Ontario |
Flynn, Andrew | School of Mathematical Sciences, University College Cork, Irelan |
Amann, Andreas | University College Cork |
Daley, Mark | The University of Western Ontario |
Keywords: Neural Networks and their Applications, Machine Learning, Computational Life Science
Abstract: Multifunctionality describes the capacity for a neural network to perform multiple mutually exclusive tasks without altering its network connections, and is an emerging area of interest in the reservoir computing machine learning paradigm. Multifunctionality has been observed in the brains of humans and other animals, particularly in the lateral horn of the fruit fly. In this work, we transplant the connectome of the fruit fly lateral horn to a reservoir computer (RC), and investigate the extent to which this 'fruit fly RC' (FFRC) exhibits multifunctionality using the 'seeing double' problem as a benchmark test. We furthermore explore the dynamics of how this FFRC achieves multifunctionality while varying the network's spectral radius. Compared to the widely-used Erdős-Rényi Reservoir Computer (ERRC), we report that the FFRC exhibits a greater capacity for multifunctionality; is multifunctional across a broader hyperparameter range; and solves the seeing double problem far beyond the previously observed spectral radius limit, wherein the ERRC's dynamics become chaotic.
|
|
13:15-13:30, Paper We-PS30-T5.2 | Add to My Program |
Semantic Segmentation of Spine and Femur Bone Using Atrous Spatial Pyramid Pooling-Based U-Net with Fully Connected CRF |
|
Tani, Yuki | Doshisha University |
Ono, Keiko | Doshisha University |
Yamakawa, Sohei | Doshisha University |
Yakushijin, Shoma | Ryukoku University |
Tawara, Daisuke | Ryukoku University |
Keywords: Neural Networks and their Applications, Biometric Systems and Bioinformatics, Application of Artificial Intelligence
Abstract: Semantic segmentation of bone structures requires pixel-wise classification and high accuracy to build a precise bone model capable of diagnoses. Various segmentation methods have been developed, and state-of-the-art models, such as DeepLabv3+, are generally based on a Convolutional Neural Network (CNN). A CNN requires many images for training, and high performance is not expected when few training images are available. In general, it is costly to obtain training images, known as ground truth for bone segmentation, because ground truth images are usually handcrafted by experts. Considering the lack of training images, we aim to develop a robust method in an environment with few images. U-Net is a well-known CNN and is often adopted for medical image segmentation because it can train with very few images compared to other models, thanks to its U-shaped encoder-decoder architecture. However, it is reported that U-Net cannot capture complex shapes in spine images. In this paper, we propose combining the advantages of both U-Net and DeepLabv3+. Specifically, we incorporate atrous spatial pyramid pooling (ASPP), which is used in DeepLabv3+, into U-Net to adopt varying-sized convolutions to capture precise bone structure. Because applying ASPP to all convolution layers of U-Net would be computationally expensive, it was applied after the encoder as the first step. It is known that the last convolution layer before the fully connected layer expresses visual features in a classification model; therefore, we hypothesized that the layer after the encoder would provide more inherent specific features than other layers in U-Net. Moreover, the Gaussian conditional random field (CRF) was utilized to refine the output semantic segmentation map. We evaluated the proposed model on spine and femur image datasets from The Cancer Imaging Archive. We confirmed that the proposed method outperforms U-Net, and observed that both ASPP and CRF are effective for the performance improvement of U-Net.
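The atrous (dilated) convolution that ASPP is built on can be sketched in one dimension. This is an illustrative toy, not the authors' network: the function name, the kernel, and the input signal are hypothetical, and the operation is written as cross-correlation without padding, as is customary in deep learning libraries.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Valid (unpadded) 1-D cross-correlation with gaps of `dilation` between taps."""
    if dilation < 1:
        raise ValueError("dilation must be at least 1")
    # Distance between the first and last kernel tap in input samples.
    span = (len(kernel) - 1) * dilation
    return [
        sum(k * signal[i + j * dilation] for j, k in enumerate(kernel))
        for i in range(len(signal) - span)
    ]


signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 0, -1]

# The same three weights cover 3 samples at dilation 1 and 5 samples at dilation 2.
narrow = dilated_conv1d(signal, kernel, dilation=1)
wide = dilated_conv1d(signal, kernel, dilation=2)
```

ASPP runs several such convolutions with different dilation rates in parallel on the same feature map and combines their outputs, which widens the receptive field without adding weights per branch.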
|
|
13:30-13:45, Paper We-PS30-T5.3 | Add to My Program |
An Approach Toward Multiobjective Optimization Problems in Hysteresis Neural Networks |
|
Kujirai, Shinya | Hosei University |
Saito, Toshimichi | Hosei University |
Keywords: Neural Networks and their Applications, Evolutionary Computation
Abstract: This paper studies a multiobjective optimization problem in continuous-time recurrent neural networks. For simplicity, we use a simple recurrent neural network: a hysteresis associative memory characterized by a binary hysteresis activation function and ternary cross-connection parameters. The optimization problem is based on two objectives. The first objective evaluates the memory accuracy and the second objective evaluates connection sparsity. In order to analyze the optimization problem, we present a simple evolutionary algorithm with growing connection structure. Applying the algorithm to typical examples, we have obtained a Pareto front that guarantees existence of a trade-off between the two objectives. The trade-off provides basic information for the system dynamics and becomes a criterion to optimize the system parameters.
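The notion of a Pareto front for two objectives can be shown with a short sketch. It assumes both objectives are minimized (for example, a memory error and a connection count); the candidate values and function names are hypothetical and do not come from the paper, and the brute-force filter below stands in for, rather than reproduces, the evolutionary algorithm the abstract describes.

```python
def dominates(p, q):
    """True if p is no worse than q in every objective and strictly better in one."""
    no_worse = all(a <= b for a, b in zip(p, q))
    strictly_better = any(a < b for a, b in zip(p, q))
    return no_worse and strictly_better


def pareto_front(points):
    """Keep the points that no other point dominates (all objectives minimized)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical candidates: (memory error, number of nonzero connections).
candidates = [(0.10, 12), (0.05, 20), (0.20, 6), (0.10, 15), (0.30, 6)]
front = pareto_front(candidates)
```

Every point left in `front` trades one objective against the other: lowering the error requires more connections, and vice versa.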
|
|
13:45-14:00, Paper We-PS30-T5.4 | Add to My Program |
Implementation of Neural Networks in Real-Time Swarm Robotics Applications |
|
Yazici, Emre | Istanbul Technical University, NISO |
Temeltas, Hakan | İstanbul Technical University |
Keywords: Neural Networks and their Applications, Swarm Intelligence, Deep Learning
Abstract: Over the last few decades, significant attention has been devoted to the coordinated motion of swarm systems, which involve multiple autonomous robots. These systems have a range of potential applications, such as creating formations with physical or non-physical bonds, navigating to desired locations while maintaining formation, and preventing collisions. To achieve coordinated motion, recent research has focused on the potential function method. We previously validated novel potential functions and developed sliding-mode speed controllers to promote collective behavior. In this study, we implement an accepted neural network with one hidden and one output layer to update potential function weights in a decentralized manner in real time, thereby enhancing swarm behavior for formation creation and maintenance. The effectiveness of this approach is demonstrated through simulation and processor-in-the-loop tests using two different micro-controllers, as well as by measuring the execution time of the neural network models. By successfully validating neural network implementation in this context, our study extends the current understanding of swarm behavior and opens up new possibilities for real-time swarm applications.
|
|
We-PS30-T6 Special Session, Hawaii 4 |
Add to My Program |
Data-Driven Optimization of Distributed Computing Systems |
|
|
Chair: Yuan, Haitao | Beihang University |
Organizer: Yuan, Haitao | Beihang University |
Organizer: Bi, Jing | Beijing University of Technology |
Organizer: Zhang, Jia | Southern Methodist University |
Organizer: Zhou, Mengchu | New Jersey Institute of Technology |
|
13:00-13:15, Paper We-PS30-T6.1 | Add to My Program |
Deep and Spatio-Temporal Detection for Abnormal Traffic in Cloud Data Centers (I) |
|
Yuan, Haitao | Beihang University |
Wang, Shen | Beihang University |
Bi, Jing | Beijing University of Technology |
Zhang, Jia | Southern Methodist University |
Keywords: Intelligent Internet Systems, Cloud, IoT, and Robotics Integration, Neural Networks and their Applications
Abstract: Current interactions of network traffic through cloud data centers have become an important process of network services. Precise and real-time detection and prediction of network traffic can assist system operators in effectively allocating resources, assessing network performance based on actual service requirements, and analyzing network health. However, the sources and distribution of network traffic differ, which makes accurate warning of network attack traffic a difficult problem. In recent years, neural networks have been proven to be effective in predicting time series data, particularly long short-term memory networks for capturing temporal features and convolutional methods for capturing spatial features. This work proposes a Deep Hybrid Spatio-Temporal (DHST) network method for abnormal traffic detection in cloud data centers, which combines a cooperative temporal convolutional network, an attention mechanism and a random inactivation method to capture the spatio-temporal features of network traffic data. It improves the accuracy of abnormal traffic detection and classifies traffic as normal or abnormal. It achieves higher accuracy than typical detection methods when applied to a real-life dataset collected from Yahoo Webscope S5.
|
|
13:15-13:30, Paper We-PS30-T6.2 | Add to My Program |
Towards Energy-Efficient Scheduling of UAV-Enabled Mobile Edge Computing Systems (I) |
|
Yuan, Haitao | Beihang University |
Wang, Meijia | Beihang University |
Bi, Jing | Beijing University of Technology |
Zhang, Jia | Southern Methodist University |
Keywords: Evolutionary Computation, Swarm Intelligence, Intelligent Internet Systems
Abstract: Current mobile edge computing (MEC) owns cloud resources at the network edge, which enables low-latency mobile services. In addition to fixed MEC servers, MEC proxy servers with certain mobility and limited computing, e.g., flying unmanned aerial vehicles (UAVs) and vehicles, have emerged as competitors in providing services. In this work, aiming at a task offloading problem of a UAV-assisted MEC system, a hybrid network environment with multiple mobile devices (MDs) and multiple UAVs is established. A constrained mixed integer nonlinear program of the UAV-assisted hybrid cloud-edge system is formulated. A novel hybrid metaheuristic algorithm called Genetic Simulated annealing-based Particle Swarm Optimization (GSPSO) is presented to solve the program. Then, a task offloading and resource scheduling method is designed to intelligently minimize the total energy consumption of the hybrid system. Simulation results verify the superiority of GSPSO over three benchmark algorithms, thus demonstrating that the proposed method significantly improves the energy efficiency of the UAV-enabled hybrid system.
|
|
13:30-13:45, Paper We-PS30-T6.3 | Add to My Program |
Joint Optimization of Cache-Assisted Offloading and Resource Allocation in Mobile Edge Computing (I) |
|
Bi, Jing | Beijing University of Technology |
Zhe, Sun | Beijing University of Technology |
Yuan, Haitao | Beihang University |
Zhang, Jia | Southern Methodist University |
Keywords: Cloud, IoT, and Robotics Integration, Swarm Intelligence, Evolutionary Computation
Abstract: Edge computing is a new architectural model that aims to offer computing, storage, and networking resources to support the Internet of Things. Its primary strategy involves transferring computational tasks to the edge of the network, which is closer to end-users. This paradigm facilitates offloading of computation, resulting in reduced latency and improved system performance. However, nodes located at the network edge have restricted energy and resources. As a result, running tasks entirely at the edge leads to higher energy consumption. This work proposes a novel three-tier offloading framework comprising multiple mobile vehicles (MVs), a base station (BS), and a cloud data center (CDC). It jointly optimizes offloading rates of tasks, CPU computation rates of MVs, BS, and CDC, and the allocation of wireless bandwidth resources at MVs during partial computation offloading of tasks. It also considers limits of maximum computational resources and maximum delay of task execution. To further reduce the total system energy consumption, this work actively caches execution codes of tasks in MEC servers, which reduces the data transmission energy of MVs. This work develops a mixed integer nonlinear program and designs a mixed metaheuristic algorithm with a multi-strategy adaptive particle swarm optimizer. Simulation results demonstrate that it outperforms various state-of-the-art algorithms by achieving lower energy consumption in fewer iterations.
|
|
13:45-14:00, Paper We-PS30-T6.4 | Add to My Program |
Latency-Minimized Computation Offloading in Vehicle Fog Computing with Improved Whale Optimization Algorithm (I) |
|
Bi, Jing | Beijing University of Technology |
Xue, Xiangdong | Beijing University of Technology |
Yuan, Haitao | Beihang University |
Zhang, Jia | Southern Methodist University |
Keywords: Intelligent Internet Systems, Metaheuristic Algorithms, Evolutionary Computation
Abstract: Fog computing provides lower latency and higher bandwidth compared to cloud computing and is widely used in the Internet of Vehicles (IoV). Vehicles cannot compute all tasks locally due to their limited computing power and battery capacity, so it is useful to offload some of their tasks to other resource-rich servers. However, due to the high mobility of vehicles, computing results may fail to be returned. It is therefore a challenge to minimize the latency of tasks while meeting the constraint of energy consumption. This work proposes a vehicle-fog offloading system that offloads tasks to fog servers or idle vehicles. To solve this problem, it proposes an improved optimization algorithm called an adaptive Lévy flight-based Whale optimization algorithm with Hierarchical learning (LWH). Simulation experiments show that LWH has a strong global search capability and outperforms five typical and widely used algorithms.
|
|
14:00-14:15, Paper We-PS30-T6.5 | Add to My Program |
Web Traffic Anomaly Detection Using a Hybrid Spatio-Temporal Neural Network (I) |
|
Bi, Jing | Beijing University of Technology |
Xu, Lifeng | Beijing University of Technology |
Yuan, Haitao | Beihang University |
Zhang, Jia | Southern Methodist University |
Keywords: Intelligent Internet Systems, Neural Networks and their Applications, Deep Learning
Abstract: Nowadays, the rapid development of the Internet has brought a sharp increase in traffic data. Abnormal traffic has a serious impact on network security. Traffic anomaly detection can be achieved by extracting characteristics of network traffic to detect anomalous intrusions, and therefore, anomaly detection algorithms are of great significance to the maintenance of network security. This work proposes a hybrid spatio-temporal neural network with attention named CTGA to effectively identify anomalous traffic. CTGA combines a Convolutional neural network (CNN), a Temporal convolutional network (TCN), a bidirectional Gated recurrent unit network (BiGRU), and a self-Attention mechanism. It automatically extracts temporal and spatial features of sequences from raw data by sliding window preprocessing followed by CNN, TCN, BiGRU, and the self-attention mechanism to detect anomalous data. CNN is used to extract spatial features of time sequences and reduce the loss of spatial information. TCN obtains short-term features in the sequence. Long-term dependencies in the data are captured by BiGRU, and the self-attention mechanism obtains important information in the sequence. Finally, experiments with the real-life Yahoo S5 dataset show that CTGA outperforms other approaches substantially.
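The sliding window preprocessing step mentioned in this abstract can be sketched as follows. The window width, the stride, and the sample series are hypothetical choices made for illustration; the abstract does not state the values used for CTGA.

```python
def sliding_windows(series, width, stride=1):
    """Split a sequence into overlapping fixed-width windows."""
    if width <= 0 or stride <= 0:
        raise ValueError("width and stride must be positive")
    # The last window starts at len(series) - width; shorter inputs yield no windows.
    return [
        series[start:start + width]
        for start in range(0, len(series) - width + 1, stride)
    ]


# Hypothetical traffic counts per time step, with one spike.
traffic = [3, 4, 5, 9, 4, 3]

windows = sliding_windows(traffic, width=3)
sparse_windows = sliding_windows(traffic, width=3, stride=2)
```

Each window then becomes one input sample for the downstream network, so a label can be assigned per window, for example by whether it contains an anomalous point.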
|
|
14:15-14:30, Paper We-PS30-T6.6 | Add to My Program |
Delay-Aware and Energy-Efficient Task Offloading Based on Adaptive Large Neighborhood Search (I) |
|
Jiang, MingZhong | Guangdong University of Technology |
Lu, AnBang | Guangdong University of Technology |
Zhu, QingHua | Guangdong University of Technology |
Fei, Lunke | Guangdong University of Technology |
Keywords: Cloud, IoT, and Robotics Integration
Abstract: Mobile edge computing boosts application performance on mobile devices by collaborating with cloud platforms. This paper studies the task offloading and computing resource allocation problem in a multibase, multiserver, and multiuser scenario subject to resource constraints. The goal is to maximize the users’ task offloading utility, including improvements in task completion time, energy consumption, and communication cost. The addressed problem is formulated as a mixed integer nonlinear programming (MINLP) model. We decompose the MINLP, and the optimal computing resource allocation policy under a deterministic offloading strategy is obtained from the Karush-Kuhn-Tucker conditions. Then, a hybrid adaptive large neighborhood search (HALNS) algorithm is proposed to conduct task offloading. The adaptive large neighborhood search and the variable neighborhood descent stages are jointly employed in HALNS. The proposed algorithm, an improved simulated annealing algorithm, and a modified variable neighborhood search algorithm are executed to evaluate their performance. Numerical experimental results show that our proposed algorithm achieves higher system utility, lower delays, and less energy consumption.
|
|
14:30-14:45, Paper We-PS30-T6.7 | Add to My Program |
Autoencoder and Teaching-Learning-Based Optimizer for Mobile Edge Computing System Optimization Problems (I) |
|
Xu, Dian | Macau University of Science and Technology |
Zhou, Mengchu | New Jersey Institute of Technology |
Yuan, Haitao | Beihang University |
Keywords: Heuristic Algorithms, Hybrid Models of Computational Intelligence, Optimization and Self-Organization Approaches
Abstract: By using an autoencoder as a dimension reduction tool, an Autoencoder-embedded Teaching-Learning Based Optimization (ATLBO) has been shown, on several widely used benchmark functions, to be effective in solving high-dimensional computationally expensive problems. However, two crucial issues remain unresolved: 1) ATLBO should be verified by solving real-life optimization problems; and 2) it is unclear how autoencoder parameters and structures impact AEO’s performance. In this work, ATLBO is verified by an energy consumption minimization problem (ECM) in mobile edge computing systems. To design an effective autoencoder for ATLBO, this work proposes a parameter tuning optimization strategy for autoencoders. With the proposed Autoencoder Parameter Tuning (APT) strategy, ATLBO achieves higher robustness than without it. The experimental results show that it is three to six times better than state-of-the-art methods in solving ECM. We consider the strategy-induced overhead and take the execution time as the primary criterion to evaluate them. In addition, the experimental results show that, against the conventional wisdom that higher-accuracy autoencoders bring higher system performance, lower-accuracy ones can actually assist ATLBO in locating the best solutions. This work promotes a novel application of autoencoders in optimization theory and practice.
|
|
We-PS30-T7 Regular Session, Honolulu |
Add to My Program |
Deep Learning I |
|
|
|
13:00-13:15, Paper We-PS30-T7.1 | Add to My Program |
A GAN-Based FDI and Civil Attack Detection Framework for Digital Relays |
|
Aflaki, Arshia | University of Calgary Calgary, Canada |
Karimipour, Hadis | University of Calgary |
Namavar Jahromi, Amir | University of Guelph |
Keywords: Deep Learning, Neural Networks and their Applications, Expert and Knowledge-Based Systems
Abstract: Digital relays are critical components of smart power grids; their security is therefore paramount for the proper operation of the grid. This paper proposes a cyber-attack detection method for digital relays using a modified generative adversarial network combined with the extra tree classifier for dimensionality reduction. The proposed method is evaluated on an IEEE 39-bus transmission network. The results show that the proposed method can detect false data injection attacks and civil attacks with more than 97 percent accuracy and F1 score, and more than 96 percent sensitivity.
|
|
13:15-13:30, Paper We-PS30-T7.2 | Add to My Program |
Detecting Children Emotion through Facial Expression Recognition |
|
Chinta, Uma | University of Colorado at Colorado Springs |
Atyabi, Adham | University of Colorado at Colorado Springs |
Keywords: Deep Learning, Image Processing and Pattern Recognition
Abstract: Emotion detection has become an increasingly important area of research in recent years, as it has numerous applications in fields such as psychology, marketing, and human-computer interaction. Deep learning has shown success in emotion recognition due to the availability of large amounts of data and the ability to learn complex patterns in facial expressions, speech, and physiological signals. However, there are challenges associated with variations in lighting, pose, and facial expressions. This study introduces a novel deep-learning approach for emotion detection, leveraging the power of VGGFace2 to classify six emotional poses in children. The proposed approach outperforms the state-of-the-art in the field, achieving a success rate of 96.3% on the CAFE dataset. A comprehensive evaluation of the findings and a detailed discussion of their potential implications are offered as part of the study.
|
|
13:45-14:00, Paper We-PS30-T7.4 | Add to My Program |
Pruning Based on Activation Function Outputs for Efficient Neural Networks |
|
Kamma, Koji | Wakayama University |
Wada, Toshikazu | Wakayama University |
Keywords: Deep Learning, Machine Learning, Image Processing and Pattern Recognition
Abstract: Deep Neural Networks (DNNs) are dominant in the field of machine learning. However, because DNN models have large computational complexity, implementing them on resource-limited equipment is challenging. Therefore, techniques for compressing DNN models without degrading their accuracy are desired. Pruning is one such technique that removes unimportant neurons (or channels). In this paper, we present Pruning with Output Error Minimization (POEM), a method that performs not only pruning but also reconstruction to compensate for the error caused by pruning. The strength of POEM lies in its consistent criteria for neuron selection and reconstruction based on the output error of the activation function, whereas the previous methods use ad hoc criteria or relax the problem to minimize the error before the activation function. Experiments were conducted with well-known DNN models (VGG-16, ResNet-18, MobileNet) and image recognition datasets (ImageNet, CUB-200-2011). The results show that POEM significantly outperformed the previous methods in maintaining the accuracy of the compressed models.
|
|
14:00-14:15, Paper We-PS30-T7.5 | Add to My Program |
DS-Point: A Dual-Scale 3D Framework for Point Cloud Understanding |
|
Zhang, Renrui | The Chinese University of Hong Kong |
Zeng, Ziyao | ShanghaiTech University |
Guo, Ziyu | Peking University |
Chen, Borui | University of Electronic Science and Technology of China |
Zhang, Guangnan | Baoji University of Arts and Science |
Liu, Xilan | Baoji University of Arts and Science |
Keywords: Deep Learning, Machine Vision, Representation Learning
Abstract: Compared with grid-based 2D images, processing 3D point clouds is more challenging due to their irregular distribution and intricate spatial information. Most prior works introduce delicate designs on either local feature aggregators or global geometric architecture, but few combine the two scales effectively. Therefore, to better incorporate the advantages of both local and global processing, we propose DS-Point, a dual-scale 3D framework for point cloud understanding. DS-Point first disentangles 3D features along the channel dimension for concurrent dual-scale modeling, i.e., point-wise convolution for local fine-grained geometry parsing, and voxel-wise attention for global long-range spatial exploration. Upon that, an HF-fusion module is proposed to enhance the cross-modal interaction and thoroughly blend the dual-scale features. Then, with task-specific heads for different downstream tasks, DS-Point serves as an effective 3D framework for feature extraction. With the dual-scale paradigm, DS-Point achieves superior performance on multiple downstream tasks, e.g., 93.8% for shape classification on ModelNet40, 84.9% on ScanObjectNN, and 84.3% on ShapeNetPart.
|
|
We-PS30-T8 Regular Session, Kahuku |
Add to My Program |
Metaheuristic Algorithms |
|
|
|
13:00-13:15, Paper We-PS30-T8.1 | Add to My Program |
Cost-Minimized Partial Computation Offloading in Cloud-Assisted Mobile Edge Computing Systems |
|
Bi, Jing | Beijing University of Technology |
Wang, Ziqi | Beijing University of Technology |
Yuan, Haitao | Beihang University |
Zhang, Jia | Southern Methodist University |
Keywords: Metaheuristic Algorithms, Intelligent Internet Systems, Cloud, IoT, and Robotics Integration
Abstract: Nowadays, smart mobile devices (SMDs) support various computation-intensive and delay-sensitive applications, e.g., online games and figure compression. However, SMDs have limited computing resources and battery energy and cannot execute all tasks of the above applications in real time. Cloud computing provides enormous computing resources and energy that can easily execute tasks offloaded from SMDs. However, cloud data centers (CDCs) are often located in remote sites, which leads to long transmission time. Small base stations (SBSs) offer high-bandwidth and low-latency services for SMDs, which addresses this problem of cloud computing. However, it becomes a challenge to achieve the lowest cost in such a heterogeneous architecture including multiple SMDs, SBSs, and the CDC while meeting delay requirements of tasks. This work proposes a cost-minimized computation offloading strategy to minimize the total cost of the system. A constrained optimization problem is first formulated based on the hybrid architecture. Afterward, a two-stage optimization algorithm called a Lévy flights and Simulated Annealing-based Grey wolf optimizer (LSAG) is developed to optimize the total cost of the system. In the first stage, the optimal edge selection policy is determined given multiple available SBSs. In the second stage, task offloading and resource allocation among SMDs, SBSs, and the cloud are determined. LSAG integrates Lévy flights and simulated annealing into the grey wolf optimizer to improve the global exploration ability and avoid early trapping into local optima. Experiments with real-life tasks show that LSAG achieves significantly lower cost with faster convergence speed than state-of-the-art peers.
|
|
13:15-13:30, Paper We-PS30-T8.2 | Add to My Program |
Combinatorial Optimization Method Based on Hierarchical Structure in Solution Space with Stochastic Neighborhood Selection |
|
Nakada, Keigo | Tokyo Metropolitan University |
Sekii, Daisuke | Tokyo Metropolitan University |
Tamura, Kenichi | Tokyo Metropolitan University |
Yasuda, Keiichiro | Tokyo Metropolitan University |
Keywords: Metaheuristic Algorithms
Abstract: As systems become larger and more complex, real-world problems, such as system operation, require that quasi-optimal solutions with sufficient engineering optimality be obtained in practical time. Meta-heuristics have attracted attention as a framework for methods that seek quasi-optimal solutions. We focus on the issue of degeneracy in the Combinatorial optimization method with search strategy based on hierarchical interpretation of solution space, which has high performance and versatility compared to existing basic methods. Degeneracy is a phenomenon in which multiple search points follow the same path, which has negative effects such as wasting computational resources and weakening the interaction between search points. A new stochastic process is introduced with the main objective of improving search performance by dealing with degeneracy. The occurrence of degeneracy and the search performance are compared against those of the original method. We use the following basic benchmark problems: Knapsack Problem (KP), Traveling Salesman Problem (TSP), Flow-shop Scheduling Problem (FSP), and Quadratic Assignment Problem (QAP).
|
|
13:30-13:45, Paper We-PS30-T8.3 | Add to My Program |
Differential Evolution Using Superior Infeasible Solutions for Constrained Optimization |
|
Sato, Yuji | Tokyo Metropolitan University |
Kumagai, Wataru | Yokogawa Electric Corporation |
Yasuda, Yusuke | Tokyo Metropolitan University |
Tamura, Kenichi | Tokyo Metropolitan University |
Yasuda, Keiichiro | Tokyo Metropolitan University |
Keywords: Metaheuristic Algorithms
Abstract: Differential Evolution (DE) is an effective metaheuristic for solving unconstrained optimization problems. A Constraint Handling Technique (CHT) is needed to extend DE to constrained optimization. The Feasibility Rule (FR) is one of the typical CHTs. FR addresses constraints by using a simple rule that considers the objective function and constraint violation when comparing search individuals. However, since the solutions with small constraint violation, i.e., feasible solutions, are preferentially selected, the set of search individuals may be biased toward feasible regions and the improvement of the objective function value may stagnate. This paper overcomes this challenge by proposing a DE that uses a superior infeasible solution in an external archive. The external archive in the proposed DE stores search individuals that are superior in both objective function value and constraint violation and utilizes them to generate mutant individuals. Finally, we verify the effectiveness of the proposed method using a benchmark problem where the feasible region is a convex set.
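The pairwise comparison that the abstract attributes to FR can be sketched as below. This is a minimal illustration of the commonly used form of the feasibility rule for a minimization problem, not the authors' implementation; the `Individual` class and the `fr_better` function are hypothetical names introduced here, and the proposed external archive is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Individual:
    """Hypothetical search individual (illustrative only)."""
    objective: float  # objective value to minimize
    violation: float  # total constraint violation; 0.0 means feasible


def fr_better(a: Individual, b: Individual) -> bool:
    """Return True if `a` is preferred over `b` under the feasibility rule."""
    a_feasible = a.violation == 0.0
    b_feasible = b.violation == 0.0
    if a_feasible and b_feasible:
        # Both feasible: the smaller objective value wins.
        return a.objective < b.objective
    if a_feasible != b_feasible:
        # Exactly one is feasible: the feasible one wins regardless of objective.
        return a_feasible
    # Both infeasible: the smaller constraint violation wins.
    return a.violation < b.violation
```

Under this rule a feasible individual with objective 5.0 is preferred over a slightly infeasible one with objective 1.0, which is the bias toward feasible regions that the abstract describes.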
|
|
13:45-14:00, Paper We-PS30-T8.4 | Add to My Program |
A Two-Stage Iterated Local Search Algorithm for the Capacitated P-Center Problem |
|
Zhang, Qingyun | Huazhong University of Science and Technology |
Lu, Zhipeng | Huazhong University of Science and Technology |
Su, Zhouxing | Huazhong University of Science and Technology |
Keywords: Metaheuristic Algorithms, Heuristic Algorithms
Abstract: The capacitated p-center problem (CpCP) is an extension of the classical p-center problem. It consists of choosing p centers from a set of candidate centers and assigning each client to a center such that the total client demand assigned to each center does not exceed its given capacity. The objective of the CpCP is to minimize the maximum distance between each client and its assigned center. In this paper, we propose a two-stage iterated local search algorithm called TS-ILS to solve the CpCP. The first stage uses a tabu search procedure to select centers and greedily assign clients to centers, while the second stage adopts a variable neighborhood search procedure to perform the fine-grained assignment of clients. Tested on 39 commonly studied instances in the literature, TS-ILS improves the best known results of the state-of-the-art metaheuristic algorithms on 18 instances and matches the records for the remaining ones in less run time.
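The problem definition above can be made concrete with a small evaluator for a candidate solution. This is only a sketch of the CpCP constraints and objective as stated in the abstract, not part of TS-ILS; the function name `evaluate_cpcp` and its argument layout are assumptions made for illustration, and a solution that uses fewer than p centers is treated as acceptable here.

```python
def evaluate_cpcp(dist, demand, capacity, assignment, p):
    """Evaluate a candidate CpCP solution (hypothetical helper).

    dist[c][j]    -- distance from client c to candidate center j
    demand[c]     -- demand of client c
    capacity[j]   -- capacity of candidate center j
    assignment[c] -- index of the center that serves client c
    Returns the maximum client-to-center distance, or None if infeasible.
    """
    # At most p distinct centers may be used.
    if len(set(assignment)) > p:
        return None
    # Accumulate the demand assigned to each used center.
    load = {}
    for client, center in enumerate(assignment):
        load[center] = load.get(center, 0) + demand[client]
    # Every used center must respect its capacity.
    if any(load[j] > capacity[j] for j in load):
        return None
    # Objective: the largest distance between a client and its center.
    return max(dist[c][j] for c, j in enumerate(assignment))


dist = [[1, 4], [2, 3], [5, 1]]
print(evaluate_cpcp(dist, [1, 1, 1], [2, 2], [0, 0, 1], p=2))  # 2
print(evaluate_cpcp(dist, [1, 1, 1], [2, 2], [0, 0, 0], p=2))  # None (capacity exceeded)
```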
|
|
14:00-14:15, Paper We-PS30-T8.5 | Add to My Program |
A Memetic Algorithm for the Multi-Depot Vehicle Routing Problem |
|
Shao, Wenhan | Huazhong University of Science and Technology |
Su, Zhouxing | Huazhong University of Science and Technology |
Ding, Junwen | Huazhong University of Science and Technology |
Lu, Zhipeng | Huazhong University of Science and Technology |
Keywords: Metaheuristic Algorithms, Evolutionary Computation, Heuristic Algorithms
Abstract: The multi-depot vehicle routing problem (MDVRP) is a variant of the classical VRP, which includes several depots with a fleet of homogeneous vehicles to serve each customer exactly once while satisfying the vehicle capacity and duration constraints. We propose a memetic algorithm called GVTS-DPX, which hybridizes the granular variable tabu search (GVTS) with the depot partition crossover (DPX) for solving the MDVRP. GVTS combines tabu search and the granular neighborhoods with variable neighborhood descent, while DPX treats the solution as a collection of depots and partitions the depots into two groups covering the most customers. The main contributions of this study include proposing the DPX operator, reforming several existing move types used for the VRP and its variants, and designing a granular variable neighborhood consisting of a total of 21 kinds of move types. Experimental results on 33 public MDVRP instances indicate that GVTS-DPX is competitive with the state-of-the-art algorithms in the literature.
|
|
14:15-14:30, Paper We-PS30-T8.6 | Add to My Program |
Convolutional Neural Network Compression Based on Improved Fractal Decomposition Algorithm for Large Scale Optimization |
|
Llanza, Arcadi | University Paris Est Créteil, Laboratoire LISSI; Cyclope.ai |
Keddous, Fekhr Eddine | University Paris Est Créteil, Laboratoire LISSI; Cyclope.ai |
Shvai, Nadiya | National University of Kyiv-Mohyla Academy; Cyclope.ai |
Nakib, Amir | Universite Paris Est Creteil |
Keywords: Metaheuristic Algorithms, Neural Networks and their Applications, Image Processing and Pattern Recognition
Abstract: Deep learning methods have shown state-of-the-art results in various application areas such as computer vision, NLP, etc. However, their practical use presents many challenges, including those caused by the large size of the models, especially in the context of model weight storage and transmission. One of the possible solutions to this problem is Neural Network (NN) compression, which is a process of obtaining a derived model serving the same task with a smaller number of parameters or with parameters of lower precision. The most common NN compression techniques include pruning, sparse representation, quantization, and knowledge transfer. In this article, the compression of Convolutional Neural Networks (CNNs) using fractional differentiation is investigated. A formulation of this task as a large-scale continuous optimization problem is then proposed, and its resolution is performed through a new optimization algorithm, called the Improved Fractal Decomposition Algorithm (IFDA), based on space geometric fractal decomposition. The results obtained show that MobileNetV3, for instance, is compressed by 18.5% with only a 2.5% decrease in accuracy. Additionally, the proposed IFDA algorithm outperforms all other competing metaheuristics in solving this problem.
|
|
We-PS30-T9 Regular Session, Oahu |
Add to My Program |
Deep Learning II |
|
|
|
13:00-13:15, Paper We-PS30-T9.1 | Add to My Program |
Enhancing Graph Structures for Node Classification: An Alternative View on Adversarial Attacks |
|
Jang, Soobeom | Yonsei University |
Park, Junha | Yonsei University |
Lee, Jong-Seok | Yonsei University |
Keywords: Deep Learning, Neural Networks and their Applications
Abstract: Recently, graph neural networks (GNNs) have become a popular approach to deal with machine learning tasks for graph-structured data. To achieve reliable performance with a GNN-based approach, obtaining high-quality graph structures is crucial. However, real-world graph data often contain noise, either inherent in the data or introduced during collection, which degrades the performance of GNNs. In this paper, we propose a novel approach to enhance graph structures for performance improvement of GNNs by reversely applying the concept of adversarial attacks on graph data. Experimental results demonstrate the effectiveness of our method in improving the performance of GNNs. Furthermore, we investigate the changes in the graph structure induced by our method, taking into account the connectivity of both inter-class and intra-class edges and measuring the extent of over-smoothing.
|
|
13:15-13:30, Paper We-PS30-T9.2 | Add to My Program |
Acquiring a Low-Dimensional, Environment-Independent Representation of Brain MR Images for Content-Based Image Retrieval |
|
Tobari, Shuya | Hosei University |
Oishi, Kenichi | Johns Hopkins University School of Medicine |
Iyatomi, Hitoshi | Hosei University |
Keywords: Deep Learning, Representation Learning, Image Processing and Pattern Recognition
Abstract: To make content-based image retrieval (CBIR) technology for magnetic resonance (MR) images of the brain practical and useful for diagnosis and research, it is important to obtain low-dimensional representations that embody pathological attributes. However, recent evidence suggests that variations in domains resulting from differences in imaging equipment and protocols at each imaging facility can overshadow pathological attributes. In this study, we propose a novel approach known as multidecoder adversarial domain adaptation (MD-ADA) to obtain low-dimensional representations of brain MR images that preserve pathological features while mitigating domain differences. This method combines adversarial domain adaptation techniques with convolutional autoencoders that have distinct decoders for each domain and employs adversarial learning to prevent domain discrimination from the produced low-dimensional representations. Experimental evaluations on two datasets, ADNI and PPMI, comprising 4,168 brain images demonstrate that the proposed MD-ADA significantly reduces domain differences between datasets without compromising the recoverability of brain images or the accuracy of disease classification.
|
|
13:30-13:45, Paper We-PS30-T9.3 | Add to My Program |
Surpass Teacher: Enlightenment Structured Knowledge Distillation of Transformer |
|
Yang, Ran | Wuhan University of Science and Technology |
Deng, Chunhua | Wuhan University of Science and Technology |
Keywords: Deep Learning, Image Processing and Pattern Recognition, Machine Vision
Abstract: It is difficult to train a trustworthy transformer model on a small image classification dataset. This research proposes a structured knowledge distillation algorithm that uses CNNs as the Transformer's teachers, significantly lowering the amount of training data needed. To better develop the potential of the CNN teachers, this research configures a public data set for CNN teaching as an enlightenment textbook to guide the Transformer’s training and avoid falling into local optima prematurely. The distillation process then employs a "learn-digest-self-distillation" learning strategy to enable the Transformer to assimilate CNN knowledge in a structured manner. Extensive experiments show that the proposed method is significantly better than directly training the Transformer under the condition of limited data sets. Moreover, to show the practical application value, this research contributes a practical data set for the classification of smoking and calling. The corresponding code and dataset will be released at https://gitee.com/wustdch/surpass-teacher if this paper is accepted.
|
|
13:45-14:00, Paper We-PS30-T9.4 | Add to My Program |
Leveraging Feature Interaction for Modeling User-Item Interaction in Recommender Systems |
|
Jang, Soobeom | Yonsei University |
Lee, Soomin | Amazon |
Park, Junha | Yonsei University |
Lee, Jong-Seok | Yonsei University |
Keywords: Deep Learning, Big Data Computing, Neural Networks and their Applications
Abstract: Challenges in recent recommender systems include how to model high-order feature interaction and how to exploit user-item interaction, particularly for neural network-based recommender systems. While previous approaches have focused only on one aspect, this paper attempts to address both simultaneously by extracting augmented embeddings for users and items with feature interaction and modeling user-item interaction using graph neural networks. Real-world experimental results show that the proposed method outperforms state-of-the-art methods considering one type of interaction.
|
|
14:00-14:15, Paper We-PS30-T9.5 | Add to My Program |
Deep-Learning Approach for Revealing Latent Behaviors in Mice: Development of Walking Trajectories Prediction Model and Applications |
|
Oikawa, Haruki | Tokyo University of Science |
Tsuruda, Yoshito | Tokyo University of Science |
Sano, Yoshitake | Tokyo University of Science |
Furuichi, Teiichi | Tokyo University of Science |
Yamamoto, Masataka | Tokyo University of Science |
Takemura, Hiroshi | Tokyo University of Science |
Keywords: Neural Networks and their Applications
Abstract: In neuroscience research, in vivo imaging techniques for mice are used to observe brain activity and link it to their behavior. Brain activity can often only be associated with observed behavioral outcomes. In other words, it is difficult to speculate on unmanifested behavior caused by factors such as "hesitation" in humans. If a prediction model can predict mouse behavior, then brain activity observed in a specific brain region during incorrect predictions would be strong evidence of unmanifested behavior. In this study, we developed a trajectory prediction model to predict the walking trajectory of mice as a prelude to the behavior prediction model. The prediction model was applied to the behavioral analysis of mice administered an anxiolytic drug (diazepam) or saline, revealing significantly different outcomes.
|
|
We-PS30-T10 Regular Session, Hawaii 5 |
Add to My Program |
Human Computer Interaction I |
|
|
|
13:00-13:15, Paper We-PS30-T10.1 | Add to My Program |
Simultaneous Visual and Force Intervention to Present Stair Descent Sensation |
|
Nakagawa, Kota | Hiroshima University |
Kurita, Yuichi | Hiroshima University |
Keywords: Virtual/Augmented/Mixed Reality, Human-Machine Cooperation and Systems, Human-Machine Interaction
Abstract: Descending stairs is a difficult task for the elderly and disabled, and rehabilitation using real stairs is difficult because of the high degree of difficulty and the anxiety it causes. Therefore, stair training on a flat surface using motor imagery is expected to be a new rehabilitation training method that eliminates the fear of falling. This paper proposes a new rehabilitation training method that simultaneously provides a visual presentation using virtual reality (VR) and a force sensation to the muscular senses using a pneumatic gel muscle (PGM). The system is designed to allow users to perceive the sensation of descending stairs. To verify the effectiveness of the system, we confirmed the muscle activity of the quadriceps muscle using electromyography (EMG) and evaluated the presence of the sensation of descending stairs using a psychological evaluation.
|
|
13:15-13:30, Paper We-PS30-T10.2 | Add to My Program |
Eye Gaze Estimation Using Iris Segmentation Trained by Semi-Automated Annotation Work |
|
Tanaka, Gai | Tokai University |
Takemura, Kentaro | Tokai University |
Keywords: Human-Computer Interaction
Abstract: The pupil center and glints have been used in conventional eye-gaze estimations, and the optical axis and point-of-gaze are calculated using these as keys. However, the pupil center used as a basis gets shifted when the diameter of the pupil varies, and the variance in illumination condition leads to degradation of accuracy. Therefore, we propose a semi-automated annotation for iris detection and a model-based gaze estimation method that uses an iris center. The iris area was extracted using segmentation trained with annotated data. The optical axis is determined using the iris center, which does not shift, and is therefore implemented instead of the pupil center. We evaluated the robustness of the proposed gaze estimation through an experiment under changing illumination conditions and confirmed the effectiveness of the iris-based estimation.
|
|
13:30-13:45, Paper We-PS30-T10.3 | Add to My Program |
A Multi-Modal Behavior Quantitative Analysis Model for Autism Early Screening |
|
Lei, Jiayi | School of Informatics, Xiamen University |
Zhang, E | Department of Occupational Therapy Education, University of Kans |
She, Yingying | School of Film, National Institute for Data Science in Health An |
Wang, Xin | The School of Informatics, Xiamen University |
Liao, Yuhan | The School of Informatics, Xiamen University |
Hu, Bin | The School of Computer Science and Technology, Beijing Institute |
Wu, Hang | Together Learning Center |
Yang, Minqiang | School of Information Science and Engineering, Lanzhou Universit |
Tian, Jiajia | The School of Informatics, Xiamen University |
Wang, Yong | Department of Pediatric, Fujian Medical University Union Hospita |
Keywords: Human-Computer Interaction
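Abstract: Human-computer interaction (HCI) and machine learning (ML) technologies have the potential to enable behavioral screening of children with autism, but designing tools and analyzing behavior reliably is challenging. In this paper, we propose an interactive behavior perception analysis model to find objective, quantitative behavioral indicators for autism screening. We introduce a multi-scenario response behavior paradigm designed around the atypical characteristics of children with autism. We recorded eye-movement data and facial data from 91 participants, performed multi-modal feature extraction, and trained classification models using machine learning. We conducted comparative experiments, and the results verify the advantages of the multi-scenario paradigm and the multi-modal feature set, indicating that our analysis method and screening model are effective and reliable and have practical research significance.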
|
|
13:45-14:00, Paper We-PS30-T10.4 | Add to My Program |
Error Metric Using Correlation between Binocular Corneal Images |
|
Kawakami, Natsuki | Tokai University |
Takemura, Kentaro | Tokai University |
Keywords: Human-Computer Interaction
Abstract: In model-based eye gaze estimation, the 3D point-of-gaze has been calculated using the binocular visual axes. However, the 3D point-of-gaze is often shifted considerably when the visual axis includes certain errors. The distance between the estimated point-of-gaze and the visual axis can be used as an error metric; however, horizontal errors cannot be evaluated, and an additional error metric is required to assess the reliability of the estimated point-of-gaze. Therefore, we propose a novel error metric using the correlation between binocular corneal images. We hypothesize that binocular corneal images extracted around the reflection of the estimated point-of-gaze exhibit a high correlation when the 3D point-of-gaze is correctly estimated. Two approaches were implemented, one using corneal images extracted directly and one using images generated based on binocular corneal imaging, and these approaches were evaluated through comparative experiments. We confirmed the feasibility of the proposed error metric and the effectiveness of binocular corneal imaging.
|
|
14:00-14:15, Paper We-PS30-T10.5 | Add to My Program |
Predicting Operator Cognitive States for Supervisory Human-Autonomy Teaming |
|
Kintz, Jacob Ryan | University of Colorado Boulder |
Buchner, Savannah | University of Colorado Boulder |
Anderson, Allison | University of Colorado - Boulder |
Clark, Torin | University of Colorado-Boulder |
Keywords: Human-Computer Interaction, Supervisory Control, Human Space Flight
Abstract: Autonomous systems show promise as teammates for human operators in safety- and performance-critical environments. Humans’ cognitive states (like trust, mental workload, and situation awareness) change as they work in demanding environments with autonomous systems. Providing adaptive autonomous systems with information about human teammates’ cognitive states is a current gap in human-autonomy teaming research. In this paper we present results from a human-autonomy teaming experiment in which participants completed a spaceflight-relevant task in a supervisory, “on-the-loop” role. To complete the task, participants worked with an autonomous system that had five distinct modes of autonomy. Our experiment results show that unobtrusive measures (based on actions participants take and eye tracking data) could inform statistical models that accurately predicted three different subjectively reported cognitive states at the same time (RMSE from 11% to 14% of questionnaires’ ranges). Our work builds upon previous research and demonstrates that our model-building approach can extend to different human-autonomy teaming scenarios. This also represents the first time that three cognitive states were predicted using unobtrusive measures from the same supervisory human-autonomy teaming task. Our research enables future experiments investigating an autonomous system which adapts according to cognitive state predictions.
|
|
We-PS30-T11 Regular Session, Hawaii 6 |
Optimization and Self-Organization Approaches |
|
|
|
13:30-13:45, Paper We-PS30-T11.3 |
Analysis Method of Period Sensitivities and Bifurcations for Rhythm Phenomena |
|
Mori, Yoshihiro | Kyoto Institute of Technology |
Kuroe, Yasuaki | Doshisha University |
Keywords: Optimization and Self-Organization Approaches, Computational Intelligence
Abstract: The purpose of this paper is to propose a method for analyzing the period sensitivities and bifurcations of rhythm phenomena. The authors have already proposed a method for analyzing the sensitivities of the period with respect to parameters, an important characteristic of rhythm phenomena, and computationally efficient algorithms for this purpose. The period sensitivities of rhythm phenomena depend on the bifurcations of those phenomena. In this paper, we propose a method and computation algorithms for investigating variations of the period sensitivities and bifurcations of rhythm phenomena by incorporating those algorithms. We also show an example of investigating variations of the period sensitivities and bifurcations by using the developed algorithm and demonstrate what can be revealed by it.
|
|
13:45-14:00, Paper We-PS30-T11.4 |
Multioutput Surrogate Assisted Evolutionary Algorithm for Expensive Multi-Modal Optimization Problems |
|
Chen, Renzhi | Defense Innovation Institute |
Li, Ke | University of Exeter |
Keywords: Optimization and Self-Organization Approaches, Evolutionary Computation
Abstract: Real-world optimization problems are often computationally expensive and feature multi-modal objective functions. Surrogate-assisted evolutionary optimization has proven to be an effective approach for addressing expensive black-box optimization challenges, but the technique has not been adequately studied in multi-modal situations. In this paper, we propose a simple but effective multi-output surrogate-based approach for empowering surrogate-assisted evolutionary optimization to address expensive multi-modal optimization problems. Specifically, our proposed approach employs a multi-output Gaussian process to capture correlations between data collected from different local areas. Experiments on synthetic benchmark test problems demonstrate the effectiveness of our proposed algorithm against five state-of-the-art peer algorithms.
|
|
We-PS30-T12 Regular Session, Hilo |
Energy and Water Management in Smart Cities |
|
|
|
13:00-13:15, Paper We-PS30-T12.1 |
A New Method for Active Distribution Network Loss Minimization with Brain-Storm-Optimization–Based Robust Optimization |
|
Mori, Hiroyuki | Meiji University |
Kawauchi, Yusuke | Meiji University |
Keywords: Intelligent Power Grid, System Modeling and Control, Control of Uncertain Systems
Abstract: In this paper, a new method is proposed to minimize network losses in active distribution networks (ADNs). In recent years, studies on ADNs have been prevalent because of the emergence of distributed energy resources (DERs) such as photovoltaic (PV) systems and wind power generation. One of the challenges is how to handle the uncertainties of such renewables, which are affected by weather conditions. This paper proposes an Evolutionary-Computation (EC)-based Robust Optimization (RO) method for minimizing the network losses in ADNs. It aims at reducing the risks of the obtained solutions in ADNs with parameter uncertainties in nonlinear combinatorial optimization problems. RO evaluates feasible solutions by considering the worst scenario through data sampling, while EC works to evaluate better solutions that escape from local minima. As a method of EC, this paper makes use of Brain Storm Optimization (BSO) to obtain better solutions. A key feature of BSO is that it separates solution candidates into several clusters and evaluates better solutions by randomly selecting one of four rules. The effectiveness of the proposed method is demonstrated on the IEEE 69-node distribution network.
|
|
13:15-13:30, Paper We-PS30-T12.2 |
Application of Smart Meters for Detecting Human Activity: Concept and Case Study |
|
Kim, Kijun | The University of Tokyo |
Koshizuka, Noboru | The University of Tokyo |
Narisawa, Shota | Japan Data Science Consortium Co., Ltd |
Keywords: Smart Buildings, Smart Cities and Infrastructures, Smart Metering
Abstract: With the widespread implementation of smart meter infrastructure, various applications have been proposed and implemented beyond energy usage monitoring and savings. In particular, there has been a growing interest in using smart meter data for daily life monitoring and remote care systems by visualizing human activities based on electricity, gas, and water usage. Previous research has demonstrated the feasibility of activity detection using an individual smart meter dataset. However, research on the effectiveness of using multiple smart meter datasets is limited. This paper proposes a model for Activities of Daily Living (ADLs) detection using multiple smart meter datasets (i.e., gas and electricity). To evaluate the performance of the proposed approach, a case study was conducted with 47 households for one month, collecting electricity data, gas data, and ground truth data about ADLs. The case study results showed that our proposed approach, which uses multiple smart meter datasets, enables more accurate activity detection.
|
|
13:45-14:00, Paper We-PS30-T12.4 |
Hybrid Water Quality Prediction with Frequency Domain Conversion Enhancement and Seasonal Decomposition |
|
Bi, Jing | Beijing University of Technology |
Li, Yibo | Beijing University of Technology |
Yuan, Haitao | Beihang University |
Qiao, Junfei | Beijing University of Technology |
Chang, Xingyang | Beijing University of Technology |
Keywords: Smart Buildings, Smart Cities and Infrastructures, Infrastructure Systems and Services, Decision Support Systems
Abstract: Water quality prediction can accurately reflect the development trend of water quality, and it is an important means of preventing pollution of the water environment and maintaining its health. Existing prediction methods generally cannot accurately capture nonlinear characteristics of water quality, and they suffer from vanishing and exploding gradients. This work designs a water quality prediction model called SMF2 to effectively solve these problems and increase the accuracy of prediction. SMF2 combines the Savitzky-Golay filter, seasonal-trend decomposition using loess for multiple seasonal components, Fourier transform frequency-enhanced block and frequency-enhanced attention, serving for noise smoothing, extraction of exact seasonal components, time domain-frequency domain interconversion, feature extraction, and time series prediction by frequency domain low-rank approximation transform, respectively. Experimental results based on a real-life water environment data set show that the proposed SMF2 outperforms other advanced algorithms in terms of prediction accuracy.
|
|
14:00-14:15, Paper We-PS30-T12.5 |
Multi-Step Water Quality Prediction with Series Decomposition and Auto-Correlation |
|
Bi, Jing | Beijing University of Technology |
Yuan, Mingxing | Beijing University of Technology |
Yuan, Haitao | Beihang University |
Qiao, Junfei | Beijing University of Technology |
Keywords: Decision Support Systems, Infrastructure Systems and Services, Smart Buildings, Smart Cities and Infrastructures
Abstract: Water quality prediction supports timely management of possible water environmental problems, which is of great importance. However, the following challenges exist: 1) The existence of noise in the water quality time series can lead to overfitting of nonlinear models; 2) It is difficult to capture temporal dependencies in complex time series data; 3) Long-term forecasting is difficult to achieve. To address the above difficulties, this work proposes a multi-step water quality prediction model, called SG-Autoformer, which combines the Savitzky-Golay filter, the inner series decomposition, and an auto-correlation mechanism. First, SG-Autoformer performs noise reduction on the water quality time series to suppress overfitting of nonlinear models. Second, it embeds series decomposition inside the encoder and decoder, which obtains more predictable components from complex time series for long-term prediction. Third, SG-Autoformer utilizes the auto-correlation mechanism to capture the time dependence and improve information utilization. Extensive experiments with real-world datasets show that SG-Autoformer outperforms other advanced prediction methods in terms of prediction accuracy.
|
|
We-PS30-T13 Regular Session, Kona |
Swarm Intelligence |
|
|
|
13:00-13:15, Paper We-PS30-T13.1 |
Collision-Free Shepherding Control of a Single Target within a Swarm |
|
Deng, Yaosheng | Osaka University |
Li, Aiyi | Osaka University |
Ogura, Masaki | Osaka University |
Wakamiya, Naoki | Osaka University |
Keywords: Swarm Intelligence, Complex Network, Cybernetics for Informatics
Abstract: The shepherding problem refers to guiding a group of agents (called sheep) to a specific destination using the repulsive forces of an external agent (called a shepherd). Although various movement algorithms for the shepherd to achieve the guidance task can be found in the literature, there is a scarcity of methodologies for selective guidance, which is a key technology for realizing finer swarm control. Therefore, in this study, we investigate the problem of guiding a single target sheep within a large swarm to a given destination using a shepherd. We first present our model of the dynamics of sheep agents and the interaction between sheep and shepherd agents. We then show that the model is well-defined in the sense that no collision occurs if the magnitude of the interaction between sheep and shepherd is reasonably small. Based on this analysis and using Lyapunov stability principles, we design a shepherd control law to guide the target to the origin while avoiding collisions. Our experimental results demonstrate that the proposed method is effective in guiding the target sheep in both small and large scale swarms.
|
|
13:15-13:30, Paper We-PS30-T13.2 |
Distance Constrained Robotic Swarm Shepherding Based on Two-Phase Ant Colony Optimisation |
|
Liu, Jing | UNSW Canberra |
Singh, Hemant Kumar | UNSW Canberra |
Elsayed, Saber | University of New South Wales Canberra |
Hunjet, Robert | Defence Science and Technology, Australian Department of Defence |
Abbass, Hussein | University of New South Wales |
Keywords: Swarm Intelligence, Evolutionary Computation, Application of Artificial Intelligence
Abstract: This paper investigates a swarm shepherding problem which aims to herd multiple sub-swarms of robot agents (sheep) in a large-scale cluttered environment to a specific goal area using multiple distance-constrained robots (sheepdogs) located at different depots. We propose to formulate this challenging problem as a Multi-depot, Distance-constrained Close-Open Mixed Vehicle Routing Problem (MDCOMVRP). We also design a Two-phase Ant Colony Optimisation to address it by decomposing MDCOMVRP into a Multi-depot Open Vehicle Routing Problem (MOVRP) and a split problem. In the first phase, the Max-Min Ant System algorithm is employed to find open routes for all robots by transforming the MOVRP into a standard Travelling Salesman Problem using the proposed transformation method. In the second phase, a Modified Split algorithm is presented to construct a set of close or open distance-constrained routes, which are further optimised by the 2-opt local search method to generate the optimised sequence for each sheepdog robot to collect/drive sheep sub-swarms. Experiments are conducted to demonstrate that the proposed algorithm can solve MDCOMVRP successfully and assist the robots to complete the swarm shepherding mission efficiently.
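The 2-opt local search mentioned in the abstract is a standard route-improvement heuristic. The following is a minimal generic sketch for an open route whose first stop (standing in for a depot) stays fixed. It is not the authors' implementation: the function names, the Euclidean distance, and the sample coordinates are assumptions made purely for illustration, and the distance constraint from the paper is not modelled.

```python
import math


def route_length(route):
    """Total Euclidean length of an open route (no return to the start)."""
    return sum(math.dist(a, b) for a, b in zip(route, route[1:]))


def two_opt(route):
    """Repeatedly reverse segments of the route while doing so shortens it.

    The first stop is kept fixed, mimicking a route that starts at a depot.
    """
    best = list(route)
    improved = True
    while improved:
        improved = False
        for i in range(1, len(best) - 1):
            for j in range(i + 1, len(best)):
                # Reverse the segment best[i..j] and keep it if strictly shorter.
                candidate = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
                if route_length(candidate) < route_length(best) - 1e-12:
                    best = candidate
                    improved = True
    return best


# Hypothetical stops on a line, deliberately visited out of order.
stops = [(0.0, 0.0), (2.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
shorter = two_opt(stops)
print(route_length(stops), route_length(shorter))
```

Recomputing the full route length for every candidate keeps the sketch short; a practical implementation would usually evaluate only the change in length caused by the two replaced edges.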
|
|
13:30-13:45, Paper We-PS30-T13.3 |
Investigation of Using Large-Scale Swarm Optimizers to Optimize Sub-Problems in Cooperative Coevolution |
|
Lu, Ming-Yuan | Henan Normal University |
Yang, Qiang | Nanjing University of Information Science and Technology |
Liu, Dong | Henan Normal University |
Ma, Yuan-Yuan | Henan Normal University |
Li, Tao | Henan Normal University |
Zhang, Jun | Hanyang University |
Keywords: Swarm Intelligence, Evolutionary Computation, Metaheuristic Algorithms
Abstract: Cooperative co-evolutionary algorithms (CCEAs) have achieved great success in solving large-scale optimization problems (LSOPs). However, most existing CCEAs use low-dimensional EAs to optimize the decomposed sub-problems. Such utilization of low-dimensional EAs may limit the effectiveness of CCEAs because some of the decomposed sub-problems may still be high-dimensional. Since there exist many non-decomposition based large-scale EAs, it is interesting to investigate the optimization effectiveness of CCEAs by using these non-decomposition based large-scale EAs to solve the decomposed sub-problems. To this end, this paper incorporates two state-of-the-art large-scale swarm optimizers into CCEAs with five state-of-the-art decomposition strategies to solve LSOPs. Experiments conducted on the CEC'2010 and CEC'2013 LSOP benchmark sets have shown that the two large-scale swarm optimizers help CCEAs with the five decomposition strategies achieve much better performance than the most widely used low-dimensional EA.
|
|
13:45-14:00, Paper We-PS30-T13.4 |
Stochastic Dominant Cognitive Experience Guided Particle Swarm Optimization |
|
Pan, Hanyang | Henan Normal University |
Yang, Qiang | Nanjing University of Information Science and Technology |
Li, Ming | Henan Normal University |
Zhang, En | College of Computer and Information Engineering, Henan Normal Un |
Ma, Yuan-Yuan | Henan Normal University |
Li, Tao | Henan Normal University |
Liu, Dong | Henan Normal University |
Zhang, Jun | Hanyang University |
Keywords: Swarm Intelligence, Evolutionary Computation, Computational Intelligence
Abstract: This paper proposes a stochastic dominant cognitive experience-guided learning framework for particle swarm optimization (SDCEGPSO) to enhance its search ability in complex environments. Specifically, different from classical PSOs, SDCEGPSO randomly selects dominant cognitive experiences to guide the learning of particles. To this end, the cognitive experiences of all particles, namely their personal best positions, are sorted from the best to the worst. Then, each particle randomly chooses a personal best position better than its own to learn from. For the cognitive experience selection, this paper designs three selection methods, namely the random selection, the roulette wheel selection, and the tournament selection. With this learning framework, particles have diverse guiding exemplars to learn from, and thus high search diversity is expected to be maintained. Experiments conducted on the 50-D and 100-D CEC2014 problem suite have verified the effectiveness of SDCEGPSO. Compared with the classical global PSO (GPSO) and local PSO (LPSO), SDCEGPSO with the three selection schemes achieves significantly better performance. Besides, among the three selection schemes, the binary tournament selection is the most effective one to help SDCEGPSO solve optimization problems.
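The three exemplar selection schemes named in the abstract can be sketched as follows. This is a hedged illustration, not the authors' code: the function names are hypothetical, minimization is assumed, the roulette wheel uses rank-based weights (the paper's actual weighting is not stated in the abstract), and a particle with no better personal best simply falls back to its own index.

```python
import random


def dominant_indices(pbest_fitness, i):
    """Indices of personal bests strictly better than particle i's (minimization assumed)."""
    return [j for j, f in enumerate(pbest_fitness) if f < pbest_fitness[i]]


def select_random(pbest_fitness, i, rng):
    """Pick any dominant personal best uniformly at random."""
    better = dominant_indices(pbest_fitness, i)
    return rng.choice(better) if better else i  # fallback is an assumption


def select_roulette(pbest_fitness, i, rng):
    """Pick a dominant personal best with probability increasing with its rank."""
    better = dominant_indices(pbest_fitness, i)
    if not better:
        return i
    ranked = sorted(better, key=lambda j: pbest_fitness[j])  # best first
    weights = [len(ranked) - r for r in range(len(ranked))]  # assumed rank-based weights
    return rng.choices(ranked, weights=weights, k=1)[0]


def select_tournament(pbest_fitness, i, rng, size=2):
    """Draw `size` dominant personal bests at random and keep the best of them."""
    better = dominant_indices(pbest_fitness, i)
    if not better:
        return i
    contestants = [rng.choice(better) for _ in range(size)]
    return min(contestants, key=lambda j: pbest_fitness[j])


# Hypothetical personal-best fitness values for four particles.
fitness = [3.0, 1.0, 2.0, 5.0]
rng = random.Random(0)
for selector in (select_random, select_roulette, select_tournament):
    print(selector.__name__, [selector(fitness, i, rng) for i in range(len(fitness))])
```

With `size=2`, `select_tournament` corresponds to a binary tournament, the variant the abstract reports as most effective.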
|
|
14:00-14:15, Paper We-PS30-T13.5 |
Adaptive Perturbation Suppression Control for Multiple Nonholonomic Mobile Robot Clusters against Composite Motion Constraints |
|
Zheng, Zhi | Chongqing University |
Su, Xiaojie | Chongqing University |
Jiang, Tao | Chongqing University |
Shi, Peng | University of Adelaide, Adelaide |
Keywords: Swarm Intelligence, Cybernetics for Informatics
Abstract: The composite velocity and acceleration constraints suffered by nonholonomic mobile robots during motion severely hamper the stability and smoothness of the cluster. This paper presents an applicable control framework for the robust and smooth implementation of distributed formations based on nonholonomic mobile robot clusters against velocity and acceleration saturation. The decoupled feedback linearization control of cascaded position, heading and wheel velocity is used to achieve distributed time-varying formations with modular and scalable stabilization of the triple-level errors. The adaptive auxiliary dynamics are incorporated into the distributed coincident errors to reduce velocity and acceleration saturation, which improves the overall transient results and stability range. The adaptive saturation extended observer in the substrate dynamics is conceived to simultaneously estimate the lumped disturbances and suppress undesired peaks, smoothly enhancing the robustness of wheel velocity control. Ultimately, comparative simulations and extensive experiments for mobile robots are performed to test validity and practicality.
|
|
We-PS4-T1 Regular Session, Hawaii 1 |
Human Performance Modelling II |
|
|
|
15:00-15:15, Paper We-PS4-T1.1 |
Negative Transfer in Task-Based Human Reliability Analysis: A Formal Methods Approach |
|
Bolton, Matthew | University of Virginia |
Riabova, Svetlana | University at Buffalo |
Son, Yeonbin | University of Virginia |
Kang, Eunsuk | Carnegie Mellon University |
Keywords: Human Performance Modeling, Human-Machine Interaction, Human Factors
Abstract: Previous research has shown how statistical model checking can be used with human task behavior modeling and human reliability analysis to make realistic predictions about human errors and error rates. However, these efforts have not accounted for the impact that design changes can have on human reliability. In this research, we address this deficiency by using similarity theory from human cognitive modeling. This replicates how negative transfer can cause people to perform old task behaviors on modified systems. We present details about how this approach was realized with the PRISM model checker and the enhanced operator function model. We report results of a validation exercise using an application from the literature. We discuss the implications of our results and describe future research.
|
|
15:15-15:30, Paper We-PS4-T1.2 |
A Formal Method for Assessing Mental Workload |
|
Bolton, Matthew | University of Virginia |
Taylor, Skye | University of Virginia |
Humphrey, Laura | Air Force Research Laboratory |
Keywords: Human Performance Modeling, Human-Machine Interaction, Human Factors
Abstract: Mental workload extremes are associated with poor human performance and safety problems across safety critical domains. Mental workload is a complex, difficult-to-predict phenomenon, where issues may only arise due to concurrency between resource-conflicting tasks. This research addresses this deficiency by presenting a novel method for using model checking (for performing formal proofs about concurrent systems) to predict mental workload. Our method combines multiple resource theory and formal methods based on hierarchical task analysis to identify mental workload extremes in a complex system. This paper presents this method and shows preliminary validation using a texting and driving task. Implications of our results and future research are discussed.
|
|
We-PS4-T2 Regular Session, Lanai |
Medical Informatics |
|
|
|
15:00-15:15, Paper We-PS4-T2.1 |
Revised Set Prediction Matching for Chest X-Ray Pathology Detection with Transformers |
|
Bercean, Bogdan | Politehnica University of Timisoara |
Buburuzan, Alexandru-Stefan | The University of Manchester |
Birhala, Andreea | Rayscape |
Avramescu, Cristian | University Politehnica Timisoara |
Tenescu, Andrei | The Polytechnic University of Timisoara |
Costachescu, Dan | University of Medicine and Pharmacy ‘Victor Babes’ Timisoara |
Marcu, Marius | Politehnica University of Timisoara |
Keywords: Medical Informatics, Human-Computer Interaction, Human-Machine Cooperation and Systems
Abstract: Computer-aided diagnosis (CAD) systems have become an important tool in today’s radiologist’s toolbox. The new age of computer vision transformers could further increase their value, although domain-specific limitations and adaptations should be studied first. Here we show that with the new adoption of the set prediction paradigm in transformer-based object detection, the Hungarian loss’ applicability to medical imaging could benefit from specialized modifications. A new dataset of 50,000 chest radiographs was used to study the Hungarian set matching’s ability to model the detection of 17 classes of pathologies. Consequently, five new targeted matching schemes were derived. The proposed strategies increased the overall mean average precision (mAP) score from 47.17 to 50.82 (+3.65). A subsequent reader study involving four radiologists showed the physician’s mean overall sensitivity improved by 6.9 ± 7.1% (95% CI, P = 0.008) while the specificity remained non-inferior (P < 0.001) when assisted by the AI model in the diagnosis of 200 patients. The results show how some of the set prediction matching’s shortcomings could be remodelled to fit chest x-ray pathology detection and make a case for transformer-based computer-aided diagnosis (CAD).
|
|
15:15-15:30, Paper We-PS4-T2.2 |
Thyroid Nodule Classification in Ultrasound Videos by Combining 3D CNN and Video Transformer |
|
Huang, Jing | Wuhan University of Technology |
Chen, Tianyu | Wuhan University of Technology |
Jiang, Wen | SonoScape |
Zhang, Hewei | Wuhan University of Technology |
Wang, Ruoqi | Wuhan University of Technology |
Keywords: Medical Informatics
Abstract: Diagnosing thyroid nodules with computer-aided techniques remains a challenging task. Using ultrasound videos for the classification of benign and malignant nodules can provide valuable timing and change information that is consistent with clinical diagnosis. In this paper, we propose a novel thyroid nodule classification model based on ultrasound video. To capture different semantic information, we sample video frames at various time intervals and extract local and global features using two different feature extraction branches. Our experimental results show that our method outperforms existing state-of-the-art methods, demonstrating its effectiveness in accurately diagnosing thyroid nodules with ultrasound videos.
|
|
We-PS4-T4 Regular Session, Hawaii 2 |
Information Visualization II |
|
|
|
15:00-15:15, Paper We-PS4-T4.1 |
Phase Discriminated Multi-Policy for Visual Room Rearrangement |
|
Wang, Beibei | Xi'an Jiaotong University |
Wang, Xiaohan | Xi'an Jiaotong University |
Song, Xinhang | Institute of Computing Technology, Chinese Academy of Sciences |
Liu, Yuehu | Xian Jiaotong University |
Keywords: Intelligence Interaction, Networking and Decision-Making
Abstract: Embodied AI, where the agent learns to accomplish tasks through interaction with its surrounding environment, is drawing increasing attention in the community. As a challenging Embodied AI task, visual room rearrangement aims to restore the initially misplaced objects in a room to the target state. Existing approaches usually use a single policy to learn a mapping from visual observation to action. Those methods may be capable of accomplishing tasks with simple goals such as visual navigation. However, the agent in the rearrangement task has to explore various types of interaction for a long time. An agent that relies on a single policy may easily get stuck in a local optimum. In this paper, we propose a Phase Discriminated Multi-Policy (PDMP) model, decomposing the task into specific phases and tackling them with customized policies. In particular, we first introduce the graph representation of object relationships providing scene layout knowledge, which is discriminated to task phases. Then, based on this knowledge, a hierarchical actor-critic module is proposed to dynamically call the policies capable of navigation or object interaction. Each policy is trained with a narrowed action space and dense rewards so that the policies can better converge and cooperate to reach long-term goals. Comprehensive experiments based on the AI2-THOR platform show that the proposed model achieves better performance than baselines.
|
|
15:15-15:30, Paper We-PS4-T4.2 |
Object-Aware Attention Branch Network for Interior Style Scene Recognition |
|
Fukao, Kentaro | Doshisha University |
Ono, Keiko | Doshisha University |
Tani, Yuki | Doshisha University |
Keywords: Information Visualization, Kansei (sense/emotion) Engineering, Visual Analytics/Communication
Abstract: Recently, there has been a growing interest in developing recommendation systems that capture users’ preferences and interests, and a focus on visual styles has gained increasing attention. However, understanding interior-style scenes is challenging because of the complex interplay of various objects that must be correctly interpreted. To properly understand interior-style scenes, extracting appropriate image features and providing visual explanations for the objects that determine the style is necessary. This study proposes a model that can simultaneously extract image features at multiple scales and provide visual explanations for the objects that characterize each class. Specifically, we adopted a hierarchical attention branch network (ABN) for visual explanations and applied atrous convolution to the feature extractor. We hypothesized that atrous convolution could extract various specific objects of each class that differ in size because its algorithm extracts features at an arbitrary resolution. Our evaluation results show that a conventional hierarchical ABN could not extract interior objects accurately, whereas our proposed model can detect specific objects by incorporating atrous convolution.
|
|
15:30-15:45, Paper We-PS4-T4.3 |
How Are Negative Articles Consumed? A Quantitative Analysis of User Behavior in a Real News Service |
|
Ohata, Kazuya | Hosei University |
Iyatomi, Hitoshi | Hosei University |
Morita, Hajime | Gunosy Inc |
Iizuka, Kojiro | Gunosy |
Keywords: Human Perception in Multimedia, Human-Computer Interaction, Information Visualization
Abstract: Recommendation algorithms automatically suggest news articles based on past behavioral logs. There have been reported cases of mental health problems caused by continuous consumption of negative articles. In addition, recommendation algorithms have a problem of over-recommendation, which may induce continuous consumption of negative articles. Although research on the relationship between negative news article consumption and mental health has been conducted via small-scale user interviews, large-scale behavioral research on user engagement with recommended news articles has not been carried out. Therefore, we comprehensively investigated how the emotional polarity of articles affects each indicator of user attention by assigning emotional labels to news articles using crowdsourcing and analyzing about 1 million behavior logs of users who viewed these articles. To the best of our knowledge, this is one of the first publicly available studies to analyze the impact of negative articles on users' news consumption behavior on an online news platform. The findings indicated that negative articles, irrespective of their category, were more likely to be clicked on, were read for longer durations, and had lower bounce rates. Furthermore, users showed greater interest in negative news related to entertainment and sports. These findings can be used as a first step for news platforms to build safer recommendation algorithms that consider the psychological impact on users.
|
|
15:45-16:00, Paper We-PS4-T4.4 |
Agent Based Fetal Face Segmentation for Standard Plane Localization in 3D Ultrasound |
|
Huang, Jing | Wuhan University of Technology |
Wang, Ruoqi | Wuhan University of Technology |
Jiang, Wen | SonoScape |
Shao, Sen | Wuhan University of Technology |
Chen, Tianyu | Wuhan University of Technology |
Keywords: Medical Informatics, Information Visualization, Visual Analytics/Communication
Abstract: In practice, fetal 3D ultrasound can have difficulty in accurately detecting labels for auxiliary standard cut plane localization because of mass loss. Therefore, in this paper, we propose a new segmentation-based reinforcement learning framework for automatically localizing the standard plane of the face: in 3D fetal ultrasound, the initial plane is localized based on anatomical landmarks of mass and geometric relationships, agents navigate through visual segmentation to automatically localize the standard plane, and bound ultrasound views are presented to show the resultant plane. This study was extensively validated on an in-house large dataset. The accuracy of this automatic localization of 3D ultrasound standard planes with sonographer-calibrated median sagittal views of the face, horizontal transverse views of both eyeballs, and coronal views of the nasolabial was 6.64°/5.65 mm, 7.04°/3.58 mm, and 5.14°/4.26 mm, respectively, with success rates of 66.67%, 78.38%, and 80.41%, respectively. The experimental results verify that this system can effectively improve navigation performance.
|
|
We-S4VT1 Virtual Session, Room T1 |
General Cybernetics VI |
|
|
|
17:45-19:00, Paper We-S4VT1.1 |
Automated Order Dispatching Strategies Design Using Genetic Programming for Dynamic Ridesharing Problem |
|
Fan, Chongjiong | South China University of Technology |
Jia, Ya-Hui | South China University of Technology |
Chen, Wei-Neng | South China University of Technology |
Keywords: Evolutionary Computation, Swarm Intelligence, Metaheuristic Algorithms
Abstract: Ridesharing is a popular transportation mode and has become an important part of smart city development, which helps alleviate the pressure of urban travel. The ridesharing problem (RSP) is mainly to match drivers to suitable passengers. In practice, passengers appear dynamically, and the departure and destination locations of subsequent orders are unknown, resulting in the dynamic RSP (DRSP). To solve this dynamic optimization problem, this paper develops a new genetic programming hyper-heuristic (GPHH) method to evolve order dispatching rules (ODRs), which can guide drivers to match suitable passengers in real time. The proposed GPHH method contains a heuristic template for simulation-based hyper-heuristic optimization. The experimental results show that the proposed GPHH method outperforms the state-of-the-art methods. Further analysis revealed some valuable insights, such as the generalizability of the generated rules and the impact of some features on the results.
|
|
17:45-19:00, Paper We-S4VT1.2 |
ECdo: An Edge Computing Distributed Data-Driven Evolutionary Optimization Platform |
|
Zeng, Qingye | South China University of Technology |
Wei, Feng-Feng | South China University of Technology |
Guo, Xiao-Qi | South China University of Technology |
Chen, Wei-Neng | South China University of Technology |
Keywords: Evolutionary Computation, Swarm Intelligence, Computational Intelligence
Abstract: Surrogate-assisted evolutionary algorithms (SAEAs) have become a popular method to solve data-driven optimization problems (DOPs), which are common in industry. However, with the development of the Internet of Things, data are collected, processed, and stored in a distributed manner, leading to a new optimization paradigm for SAEAs. To make SAEAs adapt to these distributed DOPs, this paper employs the edge computing paradigm to develop a platform, named ECdo, that provides technical support for SAEAs with distributed structures. Specifically, the platform utilizes KubeEdge, an open-source edge computing framework, to mount the cluster and combines microservice interface design with the containerization strategy to offer a flexible deployment approach for distributed SAEAs. In addition, an efficient and stable internal communication mechanism is designed for the interaction between distributed components within the platform. To demonstrate the application of ECdo, we take as an example a class of distributed DOPs in which the objective and constraints are expensive and need to be approximated by accumulated data. These problems are known as distributed and expensive constrained optimization problems (DECOPs). We implement a distributed SAEA on ECdo to address DECOPs in real-world scenarios. Experiments show that ECdo can provide the expected implementation for distributed SAEAs with good network tolerance under tough network conditions.
|
|
17:45-19:00, Paper We-S4VT1.3 |
Heuristic Navigation Model Based on Genetic Programming for Multi-UAV Power Inspection Problem with Charging Stations |
|
Chen, Xiang-Ling | South China University of Technology |
Liao, Xiao-Cheng | South China University of Technology |
Chen, Wei-Neng | South China University of Technology |
Keywords: Evolutionary Computation, Swarm Intelligence, Heuristic Algorithms
Abstract: Efficient power inspection is crucial for maintaining a stable power system. During an inspection, unmanned aerial vehicles (UAVs) usually need to be recharged due to the wide geographical range of inspection and the limited battery capacity of UAVs. This limitation makes the problem more challenging, since it requires not only optimizing the task execution order but also taking the charging of UAVs into consideration. To address this complex problem, this work first formulates the UAV power inspection planning problem with charging stations. We then propose a new heuristic navigation model, in which each UAV follows a heuristic rule to decide where to go next based on both its own information and task-related information. To obtain the heuristic rule, we design a set of features to describe the status of the UAVs and task completion. A genetic programming (GP) algorithm is then introduced to evolve the heuristic rule. Finally, by applying the heuristic navigation rule, the UAV navigation model can automatically prioritize task and charging order and generate UAV flight routes that satisfy all constraints. The experimental results show that our method significantly outperforms the state-of-the-art algorithms.
|
|
17:45-19:00, Paper We-S4VT1.4 |
SDC: Spatial Depth Completion for Outdoor Scenes |
|
Huang, Weipeng | University of Electronic Science and Technology of China |
Zhang, Xiaofeng | Shanghai Jiao Tong University |
Xie, Ning | University of Electronic Science and Technology of China |
Zhang, Xiaohua | Hiroshima Institute of Technology |
Keywords: Machine Vision
Abstract: Depth completion is a crucial computer vision task that aims to fill in missing or incomplete depth values in a depth map. In this paper, we propose SDC: Spatial Depth Completion for Outdoor Scenes. Our approach leverages a two-stage architecture with a spatial feature extractor (SFE) to effectively utilize multi-scale features for accurate depth completion. The proposed method incorporates attention mechanisms, including the Efficient Position Attention Module (EPAM) and Channel Attention Module (CAM), to adaptively fuse depth map features and improve the accuracy of depth completion. Additionally, the Pearson loss function is employed to further enhance the accuracy of the completed depth maps. Experimental results on the KITTI depth completion benchmark demonstrate that our method achieves comparable or better results than traditional depth completion methods while significantly reducing the number of parameters. The proposed SDC model shows great potential in practical applications of depth completion, owing to its ability to effectively fuse multi-scale features and its compact model size.
|
|
17:45-19:00, Paper We-S4VT1.5 |
Where Characteristics Effect Night-To-Day Translation Performance? (I) |
|
Yan, Lan | Hunan University |
Zheng, Wenbo | Wuhan University of Technology |
Li, Kenli | Hunan University |
Keywords: Deep Learning, Machine Vision, Representation Learning
Abstract: Inspired by the huge success of generative adversarial networks (GANs), GAN-based night-to-day translation methods have achieved excellent results. However, these methods have not been well visualized or understood, so the characteristics that determine their night-to-day translation performance remain unidentified. To this end, we present a simple clustering-based parsing approach to effectively understand the internal representations of the GAN-based night-to-day translator. In particular, we first cluster the internal representations of specific layers of the translator into a number of classes. Then, according to the proposed selection strategy, the class that has the most significant impact on the translation performance can be identified. Through experiments on three publicly available datasets, we find that the answer to the question posed in the title is that the characteristics at the junction of bright and dark regions affect the performance of night-to-day translation.
|
|
17:45-19:00, Paper We-S4VT1.6 |
Generalized Zero-Shot Learning Via Implicit Attribute Composition (I) |
|
Zhou, Lei | Wuhan University of Technology |
Yang, Liu | Zhejiang University |
Li, Qiang | Institute of Automation, Chinese Academy of Sciences |
Keywords: Machine Vision, Deep Learning, Machine Learning
Abstract: Zero-shot learning (ZSL) is an important but challenging task in computer vision that aims to identify unseen classes without matching training samples. Current cutting-edge ZSL methods based on locality focus on acquiring the explicit locality of distinguishing characteristics, which could face a lack of adequate supervision at the class attribute level. This paper introduces a novel approach called IAC, which aims to learn Implicit Attribute Composition for ZSL. This method is more comprehensive compared to attribute localization that solely focuses on class-level attribute supervision. IAC utilizes subspace representations that efficiently capture the inherent structure of high-dimensional image features. Then, we learn implicit attribute composition through subspace representation learning. The superiority of the proposed IAC compared to the state-of-the-art is demonstrated through sufficient experiments conducted on three commonly used ZSL datasets, CUB, SUN, and AwA2.
|
|
17:45-19:00, Paper We-S4VT1.7 |
Quantum State Generation Via Deep Reinforcement Learning (I) |
|
Wu, Shaojun | University of Electronic Science and Technology of China |
Jin, Shan | University of Electronic Science and Technology of China |
Wang, Xiaoting | University of Electronic Science and Technology of China |
Keywords: Quantum Cybernetics, Quantum Machine Learning, Deep Learning
Abstract: The quantum state generation problem is a major research goal of quantum control and quantum variational algorithms, which use iterative optimization methods to evolve the initial state to the target state. The Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm in reinforcement learning achieves high learning efficiency and better stability in continuous control tasks. Here, using TD3, we propose a new quantum state preparation method that does not require case-by-case optimization to find a suitable evolution path to the desired state. Specifically, we input the initial state into the trained actor-network, which outputs the parameters of the unitary gates step by step, thus gradually evolving the initial state to the fixed quantum state. According to the reversibility of the unitary transformation, we can obtain a sequence of unitary gates to evolve the fixed state to the desired state. To verify the effectiveness of the algorithm, we perform simulations for one-qubit, two-qubit, and four-qubit cases, and the results show that the trained actor-network can provide appropriate unitary transformations to obtain the fixed state.
|
|
17:45-19:00, Paper We-S4VT1.8 |
Virtual Pivot Point Model Predicts Instability in Parkinsonian Gaits |
|
Scholl, Patrick | TU DARMSTADT |
Firouzi, Vahid | TU DARMSTADT |
Karimi, Mohammad Taghi | Shiraz University of Medical Sciences |
Seyfarth, Andre | TU DARMSTADT |
Ahmad Sherbafi, Maziar | TU DARMSTADT |
Keywords: Human Performance Modeling, Assistive Technology, Human Enhancements
Abstract: The fear of falling due to changes in gait leads to a decrease in quality of life in Parkinson’s patients. Each year, Parkinson’s patients also require medical treatment due to falls. However, the reasons for the changed walking style remain unknown. The goal of this study is to evaluate possible reasons for the changed walking pattern in Parkinson’s disease. A pilot study is conducted, which includes patient experiments, data analysis, and biomechanical modeling. Differences between Parkinsonian and healthy gait are detected and replicated by the model. The model represents simplified body dynamics and is optimized for healthy and Parkinsonian gait, respectively. Comparison measures are ground reaction forces, joint torques, and the virtual pivot point (VPP) location. The VPP is the intersection point of all forces throughout the gait cycle and is closely correlated with human balancing and stability. Parkinsonian gait showed different force and torque curves compared to healthy walking and a VPP below the center of mass location (negative VPP). The model represents healthy walking well. Specific Parkinsonian behavior can be explained by the changed modulation of the model. However, a negative VPP location makes the model unstable: after two steps, the trunk tilts more than 90 degrees forward. Our modeling with a negative VPP reproduces the instability observed in Parkinson’s patients, who struggle with frequent stumbles and falls while walking. Such modeling approaches could be used for developing new rehabilitation techniques and gait assistance devices.
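Editorial illustration, not taken from the paper: the abstract describes the VPP as the intersection point of the forces over the gait cycle. With measured data the force lines rarely meet in exactly one point, so one plausible way to estimate such a point is a least-squares fit. The sketch below assumes 2D sagittal-plane data expressed in a body-fixed frame centered at the center of mass; the function name `estimate_pivot_point` and the input format are hypothetical.

```python
import math

def estimate_pivot_point(cops, forces):
    """Least-squares point closest to all force lines in a 2D (sagittal) plane.

    cops   -- list of (x, z) force application points (hypothetical input format)
    forces -- list of (fx, fz) ground reaction force vectors, same length
    """
    a11 = a12 = a22 = b1 = b2 = 0.0
    for (px, pz), (fx, fz) in zip(cops, forces):
        norm = math.hypot(fx, fz)
        if norm == 0.0:
            continue  # a zero force defines no line of action
        dx, dz = fx / norm, fz / norm
        # Projector onto the normal of the line: I - d d^T
        m11, m12, m22 = 1.0 - dx * dx, -dx * dz, 1.0 - dz * dz
        a11 += m11
        a12 += m12
        a22 += m22
        b1 += m11 * px + m12 * pz
        b2 += m12 * px + m22 * pz
    # Solve the symmetric 2x2 system A p = b by Cramer's rule.
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("force lines are (nearly) parallel; no unique point")
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)
```

Under the stated frame assumption, a returned vertical coordinate below zero would correspond to a point below the center of mass, which is the situation the abstract calls a negative VPP.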
|
|
We-S4VT2 Virtual Session, Room T2 |
Cybernetics General V-I |
|
|
|
17:45-19:00, Paper We-S4VT2.1 |
Priority Scheduling Strategy for Reduced Energy Consumption in UAV-Aided 6G Green Mobile Edge Computing |
|
Zhu, Wenwu | Beijing Information Science and Technology University |
Chen, Xin | Beijing Information Science and Technology University |
Jiao, Libo | Beijing Information Science and Technology University |
Min, Geyong | University of Exeter |
Wang, Yijie | Beijing Information Science and Technology University |
Keywords: Complex Network, Cloud, IoT, and Robotics Integration, Evolutionary Computation
Abstract: With the gradual development of disciplines related to unmanned aerial vehicles (UAVs), such as mechanics, electronics, and wireless communication, UAV technology has become an important direction for the future development of sixth generation mobile communications (6G) technology. When the computing capacity of base stations (BSs) is insufficient or unavailable, UAV-aided 6G mobile edge computing (MEC) is considered a practical solution to maintain the balance between the number of computing tasks and the number of BSs. Firstly, we consider the transmission power setting and design a prioritization mechanism to determine the link selection. Secondly, we construct energy consumption models for user devices (UDs), BSs, and UAVs separately. Finally, we propose the UAV-aided MEC Genetic Algorithm (UMGA), based on the genetic algorithm (GA), to minimize the overall energy consumption of the UAV-aided MEC system. Simulation results show that the system energy consumption of our proposed scheme is lower than that of the baseline schemes. Compared with the three baseline algorithms, the performance of our algorithm is improved by 660%, 300%, and 41.2%, respectively.
|
|
17:45-19:00, Paper We-S4VT2.2 |
Optimal Sharding for Dynamic Throughput Optimization in Blockchain Systems with Deep Reinforcement Learning |
|
Yao, Bingbing | Inner Mongolia University of Technology |
Wan, Jianxiong | Inner Mongolia University of Technology |
Jaffry, Shan | Xi'an Jiaotong-Liverpool University |
Li, Leixiao | Inner Mongolia University of Technology |
Ma, Zhiqiang | Inner Mongolia University of Technology |
Liu, Chuyi | Inner Mongolia University of Technology |
Keywords: Agent-Based Modeling, AI and Applications, Optimization and Self-Organization Approaches
Abstract: The rapid advancement in blockchain technology has enabled its application across a wide spectrum of fields. Blockchain throughput, usually measured in Transactions Per Second (TPS), is one of the key metrics reflecting the performance of blockchain systems. However, current blockchain systems have low TPS rates that make them unsuitable for latency-critical applications such as Vehicle-to-Vehicle (V2V) communication. To address this issue, sharding technology, which divides the network into multiple disjoint groups so that transactions can be processed in parallel, is applied to blockchain systems as a promising solution to improve TPS. This paper considers the Optimal Blockchain Sharding (OBCS) problem, which is formulated as a Markov Decision Process (MDP) where the decision variables are the number of shards, the block size, and the block interval. Previous works solved the OBCS problem via Deep Reinforcement Learning (DRL) based methods in which the action space has to be discretized so that it remains small enough to be tractable. However, the discretization degrades the solution quality, since the optimal solution usually lies between discrete values. In this paper, we treat the block size and block interval as continuous decision variables and propose a sharding control algorithm based on Parametrized Deep Q-Networks (P-DQN) to efficiently handle the discrete-continuous hybrid action space without the scalability issue. Experimental results show that our Parametrized Deep Q-Networks Blockchain Sharding (P-DQNBS) method can effectively improve the TPS by up to 20%.
|
|
17:45-19:00, Paper We-S4VT2.3 |
Packet Simulator Tool for Many-Core Systems |
|
Paris, Paulo | Federal University of Sao Carlos |
Pedrino, Emerson | Federal University of Sao Carlos |
Keywords: Cybernetics for Informatics, Evolutionary Computation
Abstract: With recent advances in technology, many-core systems have become increasingly common in high-performance computing applications, such as embedded systems and artificial intelligence. To fully utilize the processing power offered by this architecture, it is necessary to have good management and allocation of application tasks on Processing Elements (PEs), using Design Space Exploration (DSE) techniques. In this article, we develop an approach, called the Packet Simulator Tool for Many-Core Systems, implemented in a MATLAB simulation environment, for analyzing and calculating an energy efficiency metric based on packet traffic in a Network-on-Chip (NoC) architecture for a many-core processor. The approach uses high-level abstraction, is modular, and can be integrated with statistical analysis and Directed Acyclic Graph (DAG) generation tools. The experimental results showed that, although the NoCTweak simulator obtained better results, the approach proposed in this article proved to be promising for academic studies due to its ease of use and faster learning curve.
|
|
17:45-19:00, Paper We-S4VT2.4 |
A Learning-Based Framework for Constrained Shortest Path Problems |
|
Jin, Xuefeng | Sun Yat-Sen University |
Yu, Shunzheng | Sun Yat-Sen University |
Keywords: Agent-Based Modeling, AI and Applications, Heuristic Algorithms
Abstract: The doubly resource constrained elementary shortest path problem (DRCESPP) has important applications in intelligent network scenarios. To solve this strongly NP-hard problem, we introduce learning-based techniques and present a solution framework integrating preprocessing, graph neural networks (GNNs), and deep reinforcement learning (DRL). First, the preprocessing procedure reduces the network size and provides initial feasible paths. Then, a classifier, implemented by a graph attention network (GAT), filters out the reduced networks where better paths exist after preprocessing. Finally, a DRL-based heuristic attempts to construct the optimal path for the filtered reduced networks with an end-to-end solution paradigm. We devise a crafted reward function and a shared low-variance baseline for the reinforcement learning optimization algorithm. Our experiments suggest that the proposed framework achieves better performance than competitive heuristic algorithms in terms of solution quality and computational efficiency.
|
|
17:45-19:00, Paper We-S4VT2.5 |
Sentence Pair Semantic Enhanced Matching Network for Text Information Retrieval |
|
Wang, Weigang | Ocean University of China |
Guo, Zhongwen | Ocean University of China |
Jing, Wei | Ocean University of China |
Wang, Jinxin | Ocean University of China |
Cui, Ziyuan | Ocean University of China |
Li, Xiaomei | Ocean University of China |
Keywords: Expert and Knowledge-Based Systems, Computational Intelligence in Information, Neural Networks and their Applications
Abstract: Text semantic matching is a core problem in Natural Language Processing (NLP) tasks such as information retrieval and question answering, and it is significant in intelligent human-computer interaction. However, most deep neural matching models are driven by external knowledge and lack fine-grained feature extraction from the sentences, leading to limited performance improvement. Therefore, this paper focuses on generating sentence semantic representations without external knowledge for sentence pair matching. We propose a Gated Attentive Convolutional Recurrent Neural Network (GACRNN), which incorporates a Gated Convolutional Neural Network (GCNN), a Multi-scale Cross-Channel Attention Block (MC2AB), and bidirectional gated recurrent units (BiGRUs). First, a gate mechanism is introduced in the convolutional neural network to control the information interaction and extract multi-scale features from the sentence. Then, a multi-scale cross-channel attention mechanism is utilized to capture the feature dependencies at different scales in the channel dimension and generate an expressive sentence representation. Finally, an extensive evaluation is conducted on two open-domain and two restricted-domain datasets. The experimental results show that the proposed model outperforms other baselines in terms of sentence pair semantic matching accuracy.
|
|
17:45-19:00, Paper We-S4VT2.6 |
High Dimensional Exact K Nearest Neighbor Search Using Lower Bound Technique and Parallel Computing |
|
Zhang, Haowen | College of Computer Science and Technology, Zhejiang Sci-Tech Un |
Feng, Jinwang | College of Computer Science and Technology, Zhejiang Sci-Tech Un |
Keywords: Computational Intelligence, Machine Learning, Application of Artificial Intelligence
Abstract: For the past decade, K Nearest Neighbor (K-NN) search in high dimensional space has been explored extensively. Many theoretical and practical algorithms to accelerate approximate K-NN search have been presented. The approximate methods can improve search efficiency and achieve satisfactory performance. Nevertheless, they are inherently approximate and are not guaranteed to yield exact solutions. Obtaining K-NN results over high dimensional datasets efficiently while guaranteeing the same results as a linear search is a challenging task that has attracted many researchers. To this end, in this paper, we focus on improving the efficiency of exact K-NN search over high dimensional datasets and present a framework named LBPC to solve this problem. LBPC combines a lower bound based method with parallel computing to accelerate the exact K-NN search. In LBPC, the whole K-NN search task is divided into sub-tasks, which can be conducted concurrently using the lower bound based method. The LBPC scheme allows users to utilize any lower bound to accelerate the exact K-NN query. In this paper, we use the segment mean to construct the lower bound and provide a theoretical analysis of its computational efficiency and lower bound property. Various experiments are conducted to analyze the efficiency of LBPC, and the experimental results validate its effectiveness.
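Editorial illustration, not the authors' LBPC implementation: the idea of pruning an exact K-NN search with a segment-mean lower bound can be sketched as follows. It assumes Euclidean distance and vectors whose length is divisible by the number of segments. For segment length l, the quantity sqrt(l · Σ (mean difference)²) never exceeds the true Euclidean distance, so a candidate whose bound is larger than the current k-th best distance can be skipped safely. All function names are hypothetical, the sketch is sequential (the parallel split into sub-tasks is omitted), and a real system would precompute the segment means of the dataset offline.

```python
import heapq
import math

def segment_means(vec, n_segments):
    """Mean of each equal-length segment (len(vec) must be divisible by n_segments)."""
    seg_len = len(vec) // n_segments
    return [sum(vec[i * seg_len:(i + 1) * seg_len]) / seg_len
            for i in range(n_segments)]

def lower_bound(means_a, means_b, seg_len):
    """Segment-mean lower bound on the Euclidean distance."""
    return math.sqrt(seg_len * sum((a - b) ** 2 for a, b in zip(means_a, means_b)))

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def exact_knn(data, query, k, n_segments):
    """Return ([(distance, index), ...] sorted ascending, number of full distances computed)."""
    seg_len = len(query) // n_segments
    q_means = segment_means(query, n_segments)
    heap = []  # max-heap on distance, stored as (-distance, index)
    full_computations = 0
    for idx, vec in enumerate(data):
        if len(heap) == k:
            lb = lower_bound(segment_means(vec, n_segments), q_means, seg_len)
            if lb > -heap[0][0]:
                continue  # cannot beat the current k-th best: skip the full distance
        dist = euclidean(vec, query)
        full_computations += 1
        if len(heap) < k:
            heapq.heappush(heap, (-dist, idx))
        elif dist < -heap[0][0]:
            heapq.heapreplace(heap, (-dist, idx))
    return sorted((-d, i) for d, i in heap), full_computations
```

Because the bound never overestimates the true distance, the distances returned match those of a plain linear scan; only the number of full distance computations changes.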
|
|
17:45-19:00, Paper We-S4VT2.7 |
Research on MacBERT-Based Multi-Type Questions Extractive Machine Reading Comprehension |
|
Xingxing, Wang | Inner Mongolia Normal University |
Bao, Yue | Inner Mongolia Normal University |
Li, Yanling | Inner Mongolia Normal University |
Ge, Fengpei | Beijing University of Posts and Telecommunications |
Qi, Yaohui | Hebei Normal University |
Wang, Sukun | Inner Mongolia Normal University |
Keywords: Deep Learning, Application of Artificial Intelligence
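Abstract: In this paper, we propose a multi-layer pre-training approach to overcome the limitations of extractive machine reading comprehension (EMRC) in capturing global, in-depth semantic information and to promote deep interaction between the text and the question. The proposed framework employs the MLM as correction BERT (MacBERT) model and a multilayer perceptron (MLP) to predict the probability of each position being the answer. To handle span-extraction, unanswerable, and yes/no questions, we employ bidirectional long short-term memory (BiLSTM) and self-attention to create different target-layer neural network models. The new model exhibits robust generalization and interactive extraction capabilities. On the Chinese Judicial Reading Comprehension (CJRC) dataset, experimental results show that the proposed algorithm brings significant performance improvements of 3.5% on civil cases and 4.9% on criminal cases in terms of F1 score.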
|
|
We-S4VT4 Virtual Session, Room T4 |
Anomaly Detection V-I |
|
|
Chair: Hou, Yuqiao | Institute of Information Engineering, Chinese Academy of Sciences, Beijing |
|
17:45-19:00, Paper We-S4VT4.1 |
NadGPT: Semi-Supervised Network Anomaly Detection Via Auto-Regressive Auxiliary Prediction |
|
Hou, Yuqiao | Institute of Information Engineering, Chinese Academy of Science |
Xu, Zhen | Institute of Information Engineering, Chinese Academy of Science |
Wang, Liming | Institute of Information Engineering, Chinese Academy of Science |
Wang, Yuxiang | Institute of Information Engineering, Chinese Academy of Science |
Li, Hongjia | Institute of Information Engineering, Chinese Academy of Science |
Keywords: Communications, Fault Monitoring and Diagnosis, System Modeling and Control
Abstract: We present NadGPT, a transformer-based semi-supervised framework for network anomaly detection. Transformer models are known to be good at modeling long sequence data such as network traffic; however, without sufficient ground-truth labels, they tend to suffer from over-fitting, which leads to inferior performance. Inspired by the recent success of GPT models in natural language processing (NLP), we propose a new auxiliary self-supervised task plugged into the backbone transformer, which enables GPT-like auto-regressive training on network traffic sequences without using ground-truth labels. Experiments demonstrate that the proposed method greatly reduces the label requirements in network anomaly detection. For example, on the ISCX 2012 dataset, given only 0.05% of the training labels, our semi-supervised approach obtains nontrivial F1-scores of 81.7% (2-class) and 64.9% (5-class) on the validation set, which is far better than the supervised counterparts using the same training data. We hope our research will inspire more label-efficient methods in network traffic analysis.
|
|
17:45-19:00, Paper We-S4VT4.2 |
Anomaly Detection in Spot Welding in Automotive Industry with Autoencoder Neural Networks |
|
Brandão, Laislla Carolina Pinheiro | Universidade De Pernambuco |
de Albuquerque Filho, Jose Edson | Universidade De Pernambuco |
Maciel, Alexandre | University of Pernambuco |
Keywords: Fault Monitoring and Diagnosis, Manufacturing Automation and Systems, System Modeling and Control
Abstract: Spot welding is one of the most frequently used material joining techniques in the automotive industry. Splashes are an anomalous condition of material expulsion that occurs randomly during the process; since they may result in welds of inadequate quality, they should be avoided. This study uses data from the spot welding process of a manufacturing unit that uses BOSCH technology and applies Autoencoders, considering a supervised learning approach, to identify the occurrence of splashes. Additionally, it aims to verify whether the Autoencoder outperforms traditional techniques in this context when employed to identify rarer anomalies, as anticipated by the studies in the literature review. For this reason, the results for datasets with different anomaly rates are evaluated.
|
|
17:45-19:00, Paper We-S4VT4.3 |
Fuzz Testing Based on Seed Diversity Analysis |
|
Lan, Wenwei | Beijing Information Science and Technology University |
Cui, Zhanqi | Beijing Information Science and Technology University |
Zhang, Jiaming | Beijing Information Science and Technology University |
Yang, Jun | Beijing Information Science and Technology University |
Gu, Xiguo | Beijing Information Science and Technology University |
Keywords: Quality and Reliability Engineering, Fault Monitoring and Diagnosis
Abstract: Fuzz testing is a widely used technique to detect software defects and vulnerabilities. Coverage-guided fuzzing aims to improve code coverage by generating offspring test cases through mutation, executing the program under test, and retaining interesting seeds for subsequent mutations using customized genetic algorithms. However, existing fuzzing tools rarely consider the similarity between seeds during mutation. Mutating similar seeds frequently generates similar offspring test cases, which results in similar coverage and reduces the efficiency of fuzz testing. To alleviate the impact of this problem, this paper proposes a fuzz testing method based on seed diversity analysis, which focuses on the characteristics of seeds and uses byte sequences as a feature to measure the similarity between seeds. It collects seeds that can cover new edges and, based on this feature, constructs a shorter seed queue with significant differences, which replaces the original seed queue for mutation. Based on the proposed method, we implement the prototype tools AFL-Varied and Neuzz-Varied. Compared with AFL and Neuzz on six projects, the edge coverage and basic block coverage increase by up to 214.57% and 233.33%, respectively.
|
|
17:45-19:00, Paper We-S4VT4.4 |
Statement-Level Software Bug Localization Based on Information Retrieval and Spectrum |
|
Li, Jingwen | Beijing Information Science and Technology University |
Yue, Lei | Beijing Information Science and Technology University |
Lan, Wenwei | Beijing Information Science and Technology University |
Cui, Zhanqi | Beijing Information Science and Technology University |
Keywords: Quality and Reliability Engineering, Fault Monitoring and Diagnosis
Abstract: Depending on whether the program under test is executed, software bug localization methods can be divided into static and dynamic bug localization. Among them, Information Retrieval-based Bug Localization (IRBL) and Spectrum-based Fault Localization (SFL) are widely used static and dynamic bug localization methods, respectively. However, the localization granularity of IRBL is coarse, and the localization accuracy of SFL is easily reduced by information unrelated to the bug. To refine the localization granularity of IRBL and improve the localization accuracy of SFL, this paper proposes ISBL (Combine Information Retrieval and Spectrum for Bug Localization), a statement-level software bug localization method based on information retrieval and spectrum. First, the suspicious files are filtered using an information retrieval technique; then the suspicious files are used to reduce the spectrum information for statement-level bug localization. To evaluate the performance of ISBL, experiments were conducted on the Defects4J dataset, with MRR and TOP@N used as evaluation metrics. The experimental results show that, for MRR, ISBL improves by 3.0% and 3.1% over Ochiai and DStar, respectively; for TOP@1, ISBL locates 4 more bug statements than Ochiai and DStar.
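Editorial illustration, not the ISBL implementation: the abstract names Ochiai as a spectrum-based baseline and describes restricting the spectrum to suspicious files. A minimal sketch of that combination is shown below, using the standard Ochiai score ef / sqrt((ef + nf) · (ef + ep)), where ef and ep count failing and passing tests that execute a statement and ef + nf is the total number of failing tests. The set `suspicious_files` stands in for the output of an information retrieval stage, which is not modeled here; the function names and the coverage format are hypothetical.

```python
import math

def ochiai(failed_cover, passed_cover, total_failed):
    """Standard Ochiai suspiciousness score for one statement."""
    denom = math.sqrt(total_failed * (failed_cover + passed_cover))
    return failed_cover / denom if denom else 0.0

def rank_statements(coverage, outcomes, suspicious_files):
    """Rank statements in suspicious files by Ochiai score, highest first.

    coverage         -- {test_name: set of (file, line)} executed by each test
    outcomes         -- {test_name: True if the test passed, False if it failed}
    suspicious_files -- files kept by a (not modeled) information retrieval step
    """
    total_failed = sum(1 for passed in outcomes.values() if not passed)
    counts = {}  # (file, line) -> (failing tests covering it, passing tests covering it)
    for test, statements in coverage.items():
        for stmt in statements:
            if stmt[0] not in suspicious_files:
                continue  # reduce the spectrum to the suspicious files only
            ef, ep = counts.get(stmt, (0, 0))
            counts[stmt] = (ef, ep + 1) if outcomes[test] else (ef + 1, ep)
    scored = [(ochiai(ef, ep, total_failed), stmt) for stmt, (ef, ep) in counts.items()]
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))
```

Filtering before scoring keeps statements from unrelated files out of the ranking entirely, which is one simple way to read the abstract's "reduce spectrum information" step.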
|
|
17:45-19:00, Paper We-S4VT4.5 |
Predicting Fault-Tolerant Workspace of Planar 3R Robots Experiencing Locked Joint Failures Using Mixture Density Networks |
|
Clark, Landon | University of Kentucky |
Metwly, Mohamed | University of Kentucky |
He, JiangBiao | University of Kentucky |
Xie, Biyun | University of Kentucky |
Keywords: Robotic Systems, Fault Monitoring and Diagnosis
Abstract: There are currently two methods to compute the fault-tolerant workspace of a redundant robot arm for a given set of artificial joint limits. However, both of these methods are very computationally expensive. This article proposes using a mixture density network to learn the probability that a rotation angle belongs to the fault-tolerant rotation ranges. A difference filter is used to remove outlying rotation angles predicted by the network, and the remaining rotation angles are grouped together to generate the fault-tolerant workspace. Because this method is highly computationally efficient, it can be used alongside a genetic algorithm to compute the optimal artificial joint limits that maximize the area of the fault-tolerant workspace for a given robot arm. The predicted fault-tolerant workspace is compared to the actual fault-tolerant workspace, which demonstrates the effectiveness of this algorithm. The proposed algorithm is roughly 390 times faster than the traditional method. Finally, a trajectory is placed within the fault-tolerant workspace predicted by the proposed method, and the experimental results show that this trajectory is tolerant to arbitrary joint failures.
|
|
17:45-19:00, Paper We-S4VT4.6 |
Concept-Based Anomaly Detection in Retail Stores for Automatic Correction Using Mobile Robots |
|
Kapoor, Aditya | TCS Research, Tata Consultancy Services Ltd |
Sengar, Vartika | TCS Research, Tata Consultancy Services Ltd |
George, Nijil | TCS Research, Tata Consultancy Services Ltd |
Vatsal, Vighnesh | TCS Research, Tata Consultancy Services Ltd |
Gubbi, Jayavardhana | TCS Research, Tata Consultancy Services Ltd |
P, Balamuralidhar | TCS Research, Tata Consultancy Services Ltd |
Pal, Arpan | Tata Consultancy Services |
Keywords: Robotic Systems, Service Systems and Organizations, Cyber-physical systems
Abstract: Tracking of inventory and rearrangement of misplaced items are some of the most labor-intensive tasks in a retail environment. While there have been attempts at using vision-based techniques for these tasks, they mostly use planogram compliance for detection of any anomalies, a technique that has been found lacking in robustness and scalability. Moreover, existing systems rely on human intervention to perform corrective actions after detection. In this paper, we present Co-AD, a Concept-based Anomaly Detection approach using a Vision Transformer (ViT) that is able to flag misplaced objects without using a prior knowledge base such as a planogram. It uses an auto-encoder architecture followed by outlier detection in the latent space. Co-AD has a peak success rate of 89.90% on anomaly detection image sets of retail objects drawn from the RP2K dataset, compared to 80.81% on the best-performing baseline of a standard ViT auto-encoder. To demonstrate its utility, we describe a robotic mobile manipulation pipeline to autonomously correct the anomalies flagged by Co-AD. This work is ultimately aimed towards developing autonomous mobile robot solutions that reduce the need for human intervention in retail store management.
|
|
17:45-19:00, Paper We-S4VT4.7 |
A Novel Adaptive Convolution Confidence Learning for Surface Defect Detection |
|
Lei, Lei | City University of Hong Kong |
Li, Han-Xiong | City University of Hong Kong |
Keywords: Machine Vision, Neural Networks and their Applications, Deep Learning
Abstract: Defect detection is an essential part of quality management for industrial processes. Existing vision-based detection methods are inefficient when uncertainty exists in the industrial image. This paper proposes a systematic methodology for defect detection of uncertain industrial images. It consists of adaptive convolution and confidence learning. First, a convolution model adaptively fuses multiple kernel prediction results, which is employed to learn image defect variation. After that, a confidence learning method is developed to filter the label noise and fine-tune the adaptive convolution model. Finally, experimental studies indicate the proposed method can achieve satisfactory detection accuracy and robustness.
|
|
17:45-19:00, Paper We-S4VT4.8 |
Lightweight Cryptography Implementation for Internet of Things Network on FPGA |
|
Wu, Ruoyu | Zhejiang University |
Tian, Guanzhong | Zhejiang University |
Ma, Longhua | Information School, NingboTech University |
Li, Zhishan | Zhejiang University |
Liu, Shanqi | Zhejiang University |
Keywords: System Architecture, Communications, Consumer and Industrial Applications
Abstract: With the development of modern communication technology, traditional Internet of Things systems cannot provide sufficient support for large data flows, constrained hardware resources, low power consumption, and security during transmission. This is especially true of the Industrial Internet of Things (IIoT), where the limits on hardware resources and power consumption are stricter and the requirements for network security and hardware security are much higher. This paper implements a User Datagram Protocol (UDP) communication system encrypted with Xoodyak, a lightweight cryptographic scheme. We aim to design a lightweight encrypted communication platform. Compared with a public-key encryption system, our implementation costs much less in hardware resources and power consumption, while providing excellent side-channel attack (SCA) protection. In addition, when the data flow has a large burst length, two asynchronous FIFOs can each store the incoming data, so our design can maintain throughput and data integrity in extreme cases. Based on these properties, our design is relatively well suited to IIoT encryption scenarios.
|
|
We-S4VT5 Virtual Session, Room T5 |
Assistive Technology V-I |
|
|
|
17:45-19:00, Paper We-S4VT5.1 |
Understanding Approaching Behavior for a Wheelchair with a Robotic Arm: A Human Study for Improving Autonomous Navigation |
|
Sarathchandra, Hadigngnapola Appuhamillage Harindu Yasasvi | University of Moratuwa |
Priyanayana, Kodikarage Sahan | University of Moratuwa |
Jayasekara, Buddhika | University of Moratuwa |
Gopura, Ruwan | University of Moratuwa |
Keywords: Assistive Technology, Cognitive Computing, Human-Collaborative Robotics
Abstract: An intelligent wheelchair is a system that evolves its functions for the well-being of humankind in the present state of robotics. Given the active evolution of these wheelchair systems, wheelchair-mounted robotic arms can be regarded as the next level of development. However, the current state of intelligence lags far behind this expansion, and the gap adds extra cognitive load on handicapped users. Most of the existing hindrances to autonomous operation could certainly be overcome by replicating natural human behaviors. Under these circumstances, understanding human cognition in positioning a wheelchair around a table will drastically ease autonomous object manipulation tasks by these systems. Hence, to understand natural human behavior, a human study with three sub-studies was designed to reveal the prominent factors behind human cognition. Results were analyzed statistically to identify the significance of the considered factors. In light of the study, approaching and positioning of the wheelchair mainly depend on the object position, obstacle position, and obstacle configuration within the considered workspace. Further, orientation mainly depends on the object’s position and approaching direction. These outcomes would also be extremely beneficial in synthesizing human cognition to build human-friendly mobile robots for uplifting users’ lives.
|
|
17:45-19:00, Paper We-S4VT5.2 |
Object Detection under Finger Occlusion in AR Geography Assisted Teaching System by Using PCCNet |
|
Yan, Ping | Chongqing University |
Chen, Hengxin | Chongqing University |
Dong, Shuang | Chongqing University |
Chen, Xinrun | Chongqing University |
Keywords: Virtual/Augmented/Mixed Reality, Assistive Technology, Augmented Cognition
Abstract: Augmented reality (AR) plays an important role in geography teaching by creating interactive and immersive experiences. Combining object detection algorithms with AR makes it possible to identify specified content quickly and thus overlay digital content on the real world. However, finger occlusion in AR interactions degrades object detection, which affects the users’ experience. In this paper, we focus on the detection of country regions on a globe and aim to improve the performance of object detection in practical AR development. Firstly, we propose a geographic region recognition approach based on region missing-completion. Specifically, we design a supplementary algorithm, PCCNet, to infer the obscured country by utilizing the invariance of relative position between countries. Moreover, to reduce manual annotation and enrich the virtual dataset, we design a scalable automatic annotation system based on the Unreal Engine and construct a virtual globe dataset named DGAR. Finally, we build an AR geography-assisted teaching system that recognizes the area pointed at and plays multimedia materials. Experimental results show that our proposed approach improves the recognition accuracy from 88.5% to 94%. The practical significance and value of the proposed recognition approach have been confirmed by the positive user experience with the AR system, highlighting its efficacy in real-world scenarios.
|
|
17:45-19:00, Paper We-S4VT5.3 |
Assessing Electromyographic and Kinematic Signals for Reach-And-Grasp Intention Decoding in Persons with Spinal Cord Injury |
|
Wolf, Marvin Frederik | Heidelberg University Hospital |
Rupp, Rüdiger | Universitätsklinikum Heidelberg |
Schwarz, Andreas | Heidelberg University Hospital |
Keywords: Assistive Technology, Human Performance Modeling, Human-Machine Interface
Abstract: Human-machine interfaces (HMIs) based on muscular and kinematic information promise intuitive real-time control of assistive devices such as grasp neuroprostheses for persons with cervical spinal cord injury (SCI). However, interpreting this data is challenging due to high dimensionality and nested discriminative information. Hence, feature engineering and ranking are imperative to minimize computational load while maintaining high performance. In this work, we recorded electromyography (EMG) and kinematic (acceleration, orientation, angular rate) information from inertial measurement units (IMUs) during reach-and-grasp movements (uni-/bimanual palmar/lateral grasps) in groups of non-disabled people (n=12) and of people with incomplete cervical SCI (n=3). We extracted 12 EMG and 11 IMU feature types from 8 EMG and 45 IMU channels. We applied the feature selection approaches chi-square, maximum-relevance minimum-redundancy (MRMR), Random Forest (RF), and Boruta for dimensionality reduction of the feature set and evaluated the resulting subsets. We could show for both groups that there was no significant decrease in classification accuracy (RF model) with chi-square and Boruta subsets compared to the baseline set of all features, despite their heavily reduced dimensionalities (<25% and <74%, respectively). Accuracies peaked at 98.6 ± STD 0.9% (control group, Boruta) and 97.7 ± STD 1.1% (participants with SCI, Boruta). We found that the MRMR subsets performed significantly worse. We could further show high information interpretability of chi-square and RF scores, which indicated the importance of sensors and extracted features for reach-and-grasp classification. We plan to investigate how the approach can be implemented in real-time reach-and-grasp HMIs for persons with SCI.
|
|
17:45-19:00, Paper We-S4VT5.4 |
Protecting the Future: Neonatal Seizure Detection with Spatial-Temporal Modeling |
|
Li, Ziyue | Microsoft Research |
Fang, Yuchen | Shanghai Jiao Tong University |
Youli, Youli | Central South University |
Ren, Kan | Microsoft |
Wang, Yansen | Microsoft |
Luo, Xufang | Microsoft Research |
Duan, Juanyong | Microsoft |
Huang, Congrui | Microsoft |
Li, Dongsheng | Microsoft Research Asia |
Qiu, Lili | Microsoft Research Asia |
Keywords: Brain-Computer Interfaces
Abstract: Timely detection of seizures in newborn infants with electroencephalogram (EEG) has been a common yet life-saving practice in the Neonatal Intensive Care Unit (NICU). However, real-time monitoring requires considerable human effort, which calls for automated solutions to neonatal seizure detection. Moreover, current automated methods, which focus on adult epilepsy monitoring, often fail due to (i) dynamic seizure onset location in human brains; (ii) different montages on neonates; and (iii) huge distribution shift among different subjects. In this paper, we propose a deep learning framework, namely STATENet, to address these challenges with dedicated designs at the temporal, spatial and model levels. Experiments on a real-world large-scale neonatal EEG dataset illustrate that our framework achieves significantly better seizure detection performance.
|
|
17:45-19:00, Paper We-S4VT5.5 |
Predictive Simulations of a Wearable Balance Assistance Device in Neuro-Musculoskeletal Models |
|
Hidalgo, Andres Francisco | Istituto Italiano Di Tecnologia |
Svampa, Davide Geoffrey | Istituto Nazionale Per l'Assicurazione Contro Gli Infortuni Sul |
Deshpande, Nikhil | Istituto Italiano Di Tecnologia |
Keywords: Human-Machine Cooperation and Systems, Human-Collaborative Robotics, Assistive Technology
Abstract: The traditional approach in developing balance support devices for humans follows two strategies: (i) to simulate these systems in simplified conditions, considering, for instance, the human model as an inverted pendulum; or (ii) to use them in an open-loop configuration, relying on feed-forward controllers. Both approaches, being focused on device performance, tend to ignore the collateral effect that the device-generated torques have on human motor control and the changes caused in gait patterns, etc. With the aim of extracting and understanding such effects, this paper presents a first study in combining the control of a balance support wearable gyroscope device with a full-body neuro-musculoskeletal human model in the loop in a predictive simulation framework. The SCONE predictive simulation software provides an OpenSim human model with 9 degrees-of-freedom (DoFs) and 18 muscles (9 per leg). The wearable device is modeled in OpenSim and implemented in SCONE as a pyramid cluster of variable speed control moment gyroscopes (VSCMGs), which includes realistic actuator models for the gyroscopes and allows redundancy to overcome singularities. The performance of the VSCMG device for balance support in 3D was evaluated in two different simulation scenarios: (i) recovering the vertical position of the human model without any human effort contribution, from an initial 10° backward inclination; (ii) supporting a walking human model, overcoming intermittent force perturbations to the torso in 3D. The second case also demonstrates that the VSCMG device and the human effort parameters can be optimized to keep them working synchronously to fulfil their balancing and walking tasks. The successful synchronization of the intervening controllers enabled us to operate the VSCMG device in three modes and analyze the effects on the metabolic energy expenditure and on the gait and balance patterns of the human model. The results show that the knowledge of such mutual device-human adaptations can indeed lead to crucial insights and help improve the designs of the wearable devices themselves.
|
|
17:45-19:00, Paper We-S4VT5.6 |
Conceptual Design and Simulation of Cold Gas Thrusters As Wearable Fall Arresting Devices |
|
Naderi Akhormeh, Alireza | Istituto Italiano Di Tecnologia |
Hidalgo, Andres Francisco | Istituto Italiano Di Tecnologia |
Svampa, Davide Geoffrey | Istituto Nazionale Per l'Assicurazione Contro Gli Infortuni Sul |
Deshpande, Nikhil | Istituto Italiano Di Tecnologia |
Keywords: Human-Collaborative Robotics, Assistive Technology, Human Enhancements
Abstract: Every third work-related accident involving death or permanent disability is a fall-from-height accident. Preventing or mitigating the impact of such accidents has been a long-standing challenge, and using wearable robotics technology in such applications is an emerging research field. In this paper, we propose a novel fall velocity mitigation and upper-body reorientation mechanism using a wearable Cold Gas Thruster (CGT) unit to decrease the impact and therefore the injuries for low-height falls (≤ 10m). The pressurized cold gas ejected by CGTs generates a reaction force that is used for attitude control of spacecraft and satellites, and in wearable manned maneuvering units of astronauts. Here, we demonstrate, through a conceptual design in simulation, the application of CGTs in reorienting falling humans and reducing their impact velocity to a safe level (≤ 4.8m/s). The simulation system accounts for constraints of the device design including the weight of the CGT unit (based on its eventual wearability and usability), the height and weight of the human, the safe impact velocity, and the safe impact orientation, evaluated for different fall heights. The simulation uses a pre-existing skeletal model from OpenSim for the human motion modeling, and implements a feedback control architecture to allow point-mass evaluation as well as multi-body dynamic simulation. Analysed over a set of different initial falling postures, the results show that a wearable backpack CO2-based CGT unit, with CGTs mounted in specific locations, is capable of achieving a fall impact velocity ≤ 4.8m/s and maintaining an upper body impact orientation within ± 15°, for falling heights ≤ 10m. The conceptual design establishes the feasibility of using CGTs as wearable fall arresting devices and lays the groundwork for their eventual real-world implementation.
|
|
17:45-19:00, Paper We-S4VT5.7 |
Assessing Upper Limb Motor Function in the Immediate Post-Stroke Period Using Accelerometry |
|
Wallich, Mackenzie | University of Calgary |
Lai, Kenneth | University of Calgary |
Yanushkevich, Svetlana | University of Calgary |
Keywords: Assistive Technology, Biometrics and Applications, Wearable Computing
Abstract: Recent advancements in machine learning have enabled the use of long-term accelerometry data collection and machine learning algorithms to quickly and accurately detect upper limb weakness. Although accelerometry-derived measurements are commonly used in long-term rehabilitation studies, this study aimed to determine whether similar techniques could be used to detect short-term changes in upper limb motor function in patients who were hospitalized soon after experiencing a stroke. Six binary classification models were created by training on variable data window times of paretic upper limb accelerometer feature data, and four preliminary visualizations were proposed to provide health professionals with information on the duration, intensity, symmetry, and variability of upper limb activity. The models were evaluated using Area Under the Curve (AUC) scores to classify the data into two classes: severe or moderately severe motor function. The AUC scores ranged from 0.72 to 0.94, with higher scores indicating better model performance. While this study provides a preliminary assessment of the efficacy of using accelerometry and machine learning to characterize upper limb motor function immediately following a stroke, the results suggest that further investigation is warranted.
|
|
We-S4VT6 Virtual Session, Room T6 |
Biometric Systems, Bioinformatics, and Other Applications of AI |
|
|
|
17:45-19:00, Paper We-S4VT6.1 |
Wildlife Species Recognition Using Deep Learning |
|
Salomón, Sergio | Axpe Consulting |
Bringas, Santos | University of Cantabria |
Duque, Rafael | University of Cantabria |
Montaña, José Luis | University of Cantabria |
González, Avelino | University of Central Florida |
Keywords: Application of Artificial Intelligence, Machine Vision, Deep Learning
Abstract: The deep learning techniques of the last decade are opening new applications and innovations in numerous diverse fields. In this research, we study the problem of wildlife species recognition based on camera trap snapshots and automated methods. In comparison to conventional image classification, this poses a harder problem due to noise and information redundancy. To this end, we apply state-of-the-art techniques from computer vision, and we consider the characteristics of this problem with regard to heterogeneous noisy data. We design a preliminary approach, in the form of a data pipeline, based on techniques such as image preprocessing, data augmentation, transfer learning, and convolutional neural network models. We introduce in this work a case study for the Integrated Management System of the Natural Environment (known as "SIGMedNat") that collects data about Cantabria's wildlife. We analyze several factors and challenges for this case, as well as results from our preliminary approach for species recognition. This application can be useful to facilitate and improve the monitoring and tracking of wildlife for the purposes of observation and preservation.
|
|
17:45-19:00, Paper We-S4VT6.2 |
Emotion-Conditioned Melody Harmonization with Hierarchical Variational Autoencoder |
|
Ji, Shulei | Xi'an Jiaotong University |
Yang, Xinyu | Xi'an Jiaotong University |
Keywords: Application of Artificial Intelligence, Multimedia Computation, Neural Networks and their Applications
Abstract: Existing melody harmonization models have made great progress in improving the quality of generated harmonies, but most of them ignored the emotions beneath the music. Meanwhile, the variability of harmonies generated by previous methods is insufficient. To solve these problems, we propose a novel LSTM-based Hierarchical Variational Auto-Encoder (LHVAE) to investigate the influence of emotional conditions on melody harmonization, while improving the quality of generated harmonies and capturing the abundant variability of chord progressions. Specifically, LHVAE incorporates latent variables and emotional conditions at different levels (piece- and bar-level) to model global and local music properties. Additionally, we introduce an attention-based melody context vector at each step to better learn the correspondence between melodies and harmonies. Experimental results of the objective evaluation show that our proposed model outperforms other LSTM-based models. Through subjective evaluation, we conclude that only altering the chords hardly changes the overall emotion of the music. The qualitative analysis demonstrates the ability of our model to generate variable harmonies.
|
|
17:45-19:00, Paper We-S4VT6.3 |
Trusted Detection for Parkinson's Disease Based on Multi-Type Speech Fusion |
|
Liu, Yuxuan | Nanjing University of Posts and Telecommunications |
Ji, Wei | Nanjing University of Posts and Telecommunications |
Zhou, Lin | Nanjing University of Posts and Telecommunications |
Zheng, Huifen | Jiangsu Province Geriatric Hospital |
Li, Yun | Nanjing University of Posts and Telecommunications |
Keywords: Application of Artificial Intelligence, Deep Learning
Abstract: Most patients with Parkinson's disease (PD) suffer from varying degrees of dysarthria. Therefore, speech can be exploited as an effective source of diagnostic information for PD. Different types of speech tasks are designed to evaluate subjects’ verbal ability. Currently, machine learning methods for the detection of PD mostly concentrate on a single type of speech data. To make full use of the information from multiple sources, multimodal learning has been proposed and developed rapidly in recent years. However, most multimodal frameworks fall short of the reliability requirements of medical diagnosis. To solve this problem, a trustworthy model based on multi-type acoustic materials is proposed in this paper. The framework consists of three major components, i.e., pseudo-type generation, decision-making, and opinion combination. The objective of this endeavor is to offer accurate and reliable PD detection, aiding in the diagnostic process. Experimental results demonstrate the advantage of the proposed model over state-of-the-art fusion methods and highlight the necessity of each component in the proposed framework.
|
|
17:45-19:00, Paper We-S4VT6.4 |
FaFCNN: A General Disease Classification Framework Based on Feature Fusion Neural Networks |
|
Kong, Menglin | Central South University |
Zhao, Shaojie | Shanghai University of Engineering Science |
Cheng, Juan | Central South University |
Li, Xingquan | Pengcheng Laboratory |
Su, Ri | Central South University |
Hou, Muzhou | Central South University |
Cao, Cong | Central South University |
Keywords: Biometric Systems and Bioinformatics, Image Processing and Pattern Recognition, Neural Networks and their Applications
Abstract: There are two fundamental problems in applying deep learning/machine learning methods to disease classification tasks: one is the insufficient number and poor quality of training samples; the other is how to effectively fuse multiple source features and thus train robust classification models. To address these problems, inspired by the process by which humans learn knowledge, we propose the Feature-aware Fusion Correlation Neural Network (FaFCNN), a general framework for disease classification. Specifically, FaFCNN improves the way existing methods obtain sample correlation features: experimental results show that training using augmented features obtained by pre-training a gradient boosting decision tree (GBDT) yields more performance gains than random forest (RF)-based methods. To further improve the classification performance on low-quality datasets, FaFCNN introduces the Feature-aware Interaction Module (FaIM) to model interaction terms of augmented features in a more fine-grained manner. In the feature fusion approach, the Feature Alignment Module (FAM) based on adversarial training is introduced to alleviate the performance degradation caused by the naive summation used in existing methods. On the low-quality dataset with a large amount of missing data in our setup, FaFCNN consistently obtains the best performance compared to competitive baselines. Extensive experiments demonstrate the robustness of the proposed method and the effectiveness of each component of the model.
|
|
17:45-19:00, Paper We-S4VT6.5 |
Prediction of Protein-Protein Interactions Based on Attention Diffusion Mechanism in Heterogeneous Information Network |
|
Liu, JinLing | School of Computer Science and Artificial Intelligence Wuhan Uni |
Peng, Jing | Wuhan University of Technology |
Keywords: Biometric Systems and Bioinformatics, Neural Networks and their Applications
Abstract: Protein-protein interaction (PPI) prediction is a deep exploration of the mechanism of life activities, but it is costly to rely solely on experimental methods to predict PPI. To accomplish this task, many computational methods have been proposed, but existing methods fail to make full use of the connection between proteins and other molecules and cannot effectively capture the complex semantics between biological entities related to proteins, resulting in poor performance. In this paper, we propose a heterogeneous graph attention diffusion network (HGADN-PPI) to capture the complex semantics in the biological heterogeneous graph for PPI prediction. HGADN-PPI enhances heterogeneous graph learning from the intra-layer perspective and the inter-layer perspective. Concretely, from the intra-layer perspective, we construct the heterogeneous graph attention diffusion layer (HGADL), which combines biological semantic information from different paths. To enable information passing from important nodes multiple hops away, we use graph diffusion to establish connections between nodes that are not directly connected. From the inter-layer perspective, we enlarge the receptive field by stacking multiple HGADLs. The experimental results demonstrate that the proposed model outperforms the selected representative baselines.
|
|
17:45-19:00, Paper We-S4VT6.6 |
DBPR: Dynamic Bidirectional Propagation Relationship Graph Convolution Network for Fake News Detection on Social Media |
|
Wang, Zihang | Ningbo University |
Pan, Shanliang | Ningbo University |
Yang, Ze | Ningbo University |
Keywords: Computational Intelligence in Information, Application of Artificial Intelligence, Deep Learning
Abstract: With the emergence of social networks and digital media, there has been an increasing proliferation of channels through which people receive information, potentially resulting in the widespread dissemination of fake news. Previous approaches to detecting fake news have primarily focused on mining textual features, while neglecting the dynamic spatiotemporal characteristics of news propagation. In this paper, we propose a novel Dynamic Bidirectional Propagation Relationship (DBPR) model that integrates sequential propagation graphs and backtracking trees in the temporal dimension, incorporating positional encoding based on the temporal order of news propagation to more effectively capture its dynamic nature. Extensive experiments conducted on two publicly available datasets demonstrate that our proposed approach achieves state-of-the-art performance, particularly by providing valuable insights for early detection of fake news.
|
|
17:45-19:00, Paper We-S4VT6.7 |
Enhancing Protein Subcellular Localization Prediction through Multi-Feature Fusion |
|
Kai, Zhao | Xinjiang University |
Liang, Weiyang | Xinjiang University |
Xuehua, Bi | Xinjiang Medical University |
Yu, Guanglei | Xinjiang Medical University |
Na, Quan | Xinjiang University |
Linlin, Zhang | Xinjiang University |
Keywords: Biometric Systems and Bioinformatics, Deep Learning, Machine Learning
Abstract: Accurately determining the subcellular location of proteins is essential for comprehending their functions, as it provides crucial insights into biochemical pathways and regulatory mechanisms. Although some sequence-based methods have achieved satisfactory performance, the amino acid sequence alone is insufficient to infer the subcellular location of a protein. In this paper, we propose a protein subcellular localization (PSL) method that utilizes multi-source data and multi-feature fusion. Firstly, we obtain three features, Di-peptide Composition, Moran correlation and Conjoint-Triad, from the amino acid sequence. We also employ node2vec to extract features from protein-protein interaction (PPI) networks and combine them with gene ontology (GO) features. To eliminate redundant information between different features, we fuse the multiple features from different sources with an auto-encoder. Finally, we employ a supervised learning model, wide and deep (W&D), to predict the subcellular location of proteins. The experimental results demonstrate that our approach achieves higher accuracy than state-of-the-art (SOTA) methods. This approach provides a promising solution for accurately predicting the subcellular location of proteins.
|
|
We-S4VT7 Virtual Session, Room T7 |
Blockchain and Cryptography in IoT and Data Security |
|
|
|
17:45-19:00, Paper We-S4VT7.1 |
A GPU-Accelerated Framework for Standard White-Box Cryptographic Algorithms in Unattended IoT Devices |
|
Ouyang, Qiaoliang | Tongji University |
Li, Yimin | Tongji University |
Shi, Yang | Tongji University |
Keywords: Cyber-physical systems, Smart Sensor Networks
Abstract: White-box cryptography is widely used in Internet of Things (IoT) devices to ensure data confidentiality. However, the traditional implementations of white-box cryptographic algorithms (WBCAs) are inefficient and impractical for IoT devices that require fast encryption and decryption of large amounts of data. To address this challenge, we propose a framework that can accelerate WBCAs employed by IoT devices with Graphics Processing Units (GPUs). Our framework leverages the inherent multithreading capabilities of GPUs to simultaneously perform a large number of table lookups required in WBCAs. Additionally, we employ pipelined execution strategies to enhance performance further. To demonstrate the effectiveness of our framework, we apply it to two well-known WBCAs and evaluate its performance on two IoT devices. The experimental results show that our framework can improve performance by up to 70 times compared with the execution on a single-threaded CPU.
|
|
17:45-19:00, Paper We-S4VT7.2 |
A Practical Framework of Blockchain in IoT Information Management |
|
Quanlong, Guan | Jinan University, Guangzhou |
Lei, Jiawei | Jinan University, Guangzhou |
Wang, Chaonan | Jinan University, Guangzhou |
Geng, Guanggang | Jinan University, Guangzhou |
Zhong, Yuansheng | Guangdong Testing Institute of Product Quality Supervision, Guan |
Fang, Liangda | Jinan University, Guangzhou |
Huang, Xiujie | Jinan University, Guangzhou |
Luo, Weiqi | Jinan University, Guangzhou |
Keywords: Enterprise Information Systems, Service Systems and Organizations, System Architecture
Abstract: In order to improve the security of IoT information systems, this paper proposes the Blockchain-based Framework for Securing IoT Information (BFSII), which is built on consortium blockchain and the edge IoT architectures. This paper addresses data security in smart hotels as a research scenario. The majority of data generated by IoT devices in smart hotels contains users' private information, which is susceptible to alteration and leakage during transmission and storage. The BFSII solution leverages the decentralized nature of blockchain to enhance data traceability and tamper-proof capabilities. And it uses edge IoT architecture and consortium blockchain to improve system operational efficiency. Sensitive data generated by IoT devices are protected in BFSII. The experiment's findings show that BFSII can boost smart hotel system security while maintaining operational effectiveness. The information management system of smart hotels is provided with an inventive and secure solution by the BFSII framework.
|
|
17:45-19:00, Paper We-S4VT7.3 |
An Overview of Blockchain-Based Application in Internet of Things (IoT) |
|
Molokwu, Reginald Chukwuka | University of New Brunswick |
Molokwu, Bonaventure Chidube | Concordia University - Gina Cody School of Engineering and Compu |
Molokwu, Victor | Heriot-Watt University, Edinburgh |
Keywords: Distributed Intelligent Systems, Infrastructure Systems and Services, Cyber-physical systems
Abstract: The Internet of Things (IoT) is a crucial aspect of computing, and it fosters the interconnection of physical nodes on the Internet by enabling them to interact and share data. However, the potential for security and privacy breaches increases as the number of connected devices/nodes in the network rises. Blockchain technology, with its capacity to provide secure and tamper-proof data storage, has the potential to mitigate the aforementioned vulnerabilities in the IoT. Hence, this paper gives a detailed review of the present state of blockchain-based applications in the IoT as well as the prospective benefits of employing blockchain technology with the aim of securing IoT devices and their infrastructure. Also discussed in this paper are the obstacles and research gaps currently impeding the implementation of some blockchain-based technologies, and the potential solutions for surmounting these challenges.
|
|
17:45-19:00, Paper We-S4VT7.4 |
Automaton-Based Data Consistency Detection |
|
Li, Xuejian | Anhui University |
Wang, Changyu | Anhui University |
Keywords: Quality and Reliability Engineering
Abstract: Data plays a very important role in documents. Although much effort has been devoted to data processing, little research has addressed data consistency. Data inconsistency may lead to data analysis errors or inaccurate analysis results. In text, mismatches among data referred to by the same label, and incompatibilities in the implied semantic relationships between data, can lead to data errors or data relationship errors. To check data consistency, we rewrite the adaptive automaton algorithm and extend the tree automaton to allow the construction of deterministic tree automata that express data relationships. Using the financial reports of the Postal Savings Bank of China from the past three years, we build adaptive automaton models and bottom-up tree automata from the report data, which include financial data, quarterly financial data, and data from the income statement and balance sheet, and we use them to evaluate the consistency detection algorithm. Experimental results show that the automated method simplifies the verification process, provides accurate and efficient results, and ensures data consistency and reliability in the text.
|
|
17:45-19:00, Paper We-S4VT7.5 | Add to My Program |
SEAT: A Spatiotemporal Encode-Again Transformer for Traffic Prediction |
|
You, Shengzhe | Zhejiang University of Technology |
Shao, Jianwen | Zhejiang Institute of Metrology |
Zhang, Shifeng | Hangzhou Hikvision Digital Technology Co |
Gao, Fei | Zhejiang University of Technology |
Keywords: Intelligent Transportation Systems, System Modeling and Control
Abstract: Networks such as recurrent neural networks and graph convolutional networks have recently received increasing attention for traffic prediction. However, limitations remain, such as a lack of consideration of the dynamics between spatial and temporal features, loss of the correlation between short-term and long-term predictions, and dimensional information destroyed by self-attention. To address these issues, a novel transformer, the Spatiotemporal Encode-Again Transformer (SEAT), is proposed for traffic prediction. In SEAT, two components, spatial-temporal cross attention and an encode-again strategy, are designed to learn spatiotemporal features and capture the relationships among forecasting series. We conducted experiments on several public datasets: METR-LA, PeMS-Bay, and PeMS-S. In particular, SEAT outperforms existing models by up to 6% in RMSE. The experimental results verify that SEAT learns spatiotemporal features better and can contribute to more efficient traffic control and management.
|
|
17:45-19:00, Paper We-S4VT7.6 | Add to My Program |
A Consensus Model to Manage Unavailability of Decision-Makers in Group Decision Making |
|
Singh, Manisha | Indian Institute of Technology (BHU) Varanasi |
Baranwal, Gaurav | Institute of Science, Banaras Hindu University, Varanasi |
Tripathi, Anil Kumar | Indian Institute of Technology (BHU) |
Keywords: Decision Support Systems
Abstract: All known Group Decision Making (GDM) models assume the continuous availability of all decision-makers (DMs) during the Consensus Reaching Process (CRP). In effect, constant presence means that the DMs concerned are interested in contributing adequately to the decision-making process and that the technical infrastructure continuously supports their participation. However, in a realistic situation, one or more DMs may be unavailable at times during CRP iterations for technical or non-technical reasons. Hence, it is pertinent to work out a model that handles such absences and still makes GDM possible. This paper considers such a scenario, in which the DMs are sparsely present. The bounded confidence of an individual DM is used to facilitate the CRP in evaluating the opinions of the unavailable DMs. We propose to assign a weight to each DM based on their cumulative presence in the decision process. When a DM is absent in a particular iteration, considering that DM's opinion based on their opinions in previous iterations is shown to be helpful in the implementation presented here.
|
|
17:45-19:00, Paper We-S4VT7.7 | Add to My Program |
The DAO to Social Transportation: Towards Smart Mobility in Cyber-Physical-Social Space |
|
Chen, Yuanyuan | Institute of Automation, Chinese Academy of Sciences |
Lv, Yisheng | Institute of Automation, Chinese Academy of Sciences |
Wang, Fei-Yue | Institute of Automation, Chinese Academy of Sciences |
Keywords: Multi-User Interaction, Systems Safety and Security
Abstract: The rapid advancement of intelligence and connectivity technology (ICT) has enhanced the efficiency and safety of intelligent transportation systems (ITS). However, it also increases the complexity of transportation systems, especially their social complexity, and poses new challenges for their management. This paper discusses the emergence of social transportation as part of a natural evolutionary adaptation to new traffic conditions and technology: a paradigm shift from engineering-centered technical systems to society-centered ecosystems. In social transportation systems, all participants, including administrators, individual travelers, and other stakeholders, play a more proactive role in the operation and management of transportation systems. Therefore, we propose to employ a decentralized autonomous organization (DAO) as a model for the social transportation system, allowing participants to collaborate and coordinate without relying on a central authority. To meet the challenges brought by this transition, we propose to apply ACP-based parallel system theory to restructure the methodology of transportation management. Finally, we present a case study of a personal carbon trading system in which economic incentives are used to manage traffic demand, influence others’ travel behaviors, and reduce carbon emissions. In the concluding remarks, we provide our vision and future directions for social transportation systems.
|
|
17:45-19:00, Paper We-S4VT7.8 | Add to My Program |
A Development of Time-Varying Weight Model Predictive Control for Autonomous Vehicles |
|
Chalak Qazani, Mohamad Reza | Computer Science and Information Systems (CSIS) |
Asadi, Houshyar | Deakin University |
Shajari, Arian | Deakin University |
Najdovski, Zoran | Deakin University |
Lim, Chee Peng | Deakin University |
Nahavandi, Saeid | Swinburne University of Technology |
Keywords: Modeling of Autonomous Systems, System Modeling and Control, Consumer and Industrial Applications
Abstract: Autonomous vehicles, commonly known as self-driving cars, are rapidly gaining popularity due to their numerous advantages, such as reducing traffic, pollution, and emissions while increasing safety, convenience, and transportation connectivity. To track the motion signal accurately, these vehicles now utilise advanced control techniques such as model predictive control (MPC). However, the efficiency of an MPC relies heavily on properly tuning its weights. The primary function of the MPC is to recalculate the optimal values of the vehicle control commands, such as desired speed and steering angle, while considering the dynamic model of the autonomous vehicle. Existing linear MPC models cannot reach higher efficiency because they use fixed weights without considering the error. This paper introduces a novel approach for developing an MPC model with a time-varying weights algorithm for autonomous vehicles. The study aims to minimise motion tracking errors, namely the lateral position and yaw angle errors. The relevant MPC weights are calculated online using fuzzy logic-based units that take the lateral position and yaw angle errors into account. The proposed linear time-varying MPC was designed and developed using MATLAB software, resulting in improved motion tracking performance, with reductions of 31.62% and 20.89% in the root mean square error of lateral position and yaw angle, respectively.
|
|
We-S4VT8 Virtual Session, Room T8 |
Add to My Program |
Deep Learning V-I |
|
|
|
17:45-19:00, Paper We-S4VT8.1 | Add to My Program |
GSWR-DARN: GNSS-R Sea Surface Wind Speed Retrieval Based on Data Augmentation and Residual Network |
|
Zhou, Zhenxiong | National University of Defense Technology |
Keywords: Deep Learning, Neural Networks and their Applications, Machine Learning
Abstract: Global sea surface wind speed is a key parameter for weather forecasting and climate studies. However, retrieving it from Global Navigation Satellite System Reflectometry (GNSS-R) signals, which are reflected by the ocean surface, requires complex data processing and modelling. Moreover, conventional GNSS-R methods have limited accuracy in high wind speed regions. A novel model - GSWR-DARN (GNSS-R Sea Surface Wind Speed Retrieval Based on Data Augmentation and Residual Network) - is presented here that combines data augmentation techniques with a residual network to improve the performance of GNSS-R wind speed retrieval. The model transforms one-dimensional (1D) data from the Cyclone Global Navigation Satellite System (CYGNSS) into two-dimensional (2D) data that can be fed into a Convolutional Neural Network (CNN), while also enhancing the interconnectivity between different physical variables. By adding a residual network module, the model achieves higher accuracy and a better distribution of wind speed estimates within the 0-20 m/s range than traditional methods. It is shown that the model reduces the average Root Mean Square Error (RMSE) by 22.48% and increases the average Pearson correlation coefficient by 22%. It is also demonstrated that the model significantly reduces the error distribution in high wind speed ranges. The model is compared with other models of varying complexity - including an Artificial Neural Network (ANN), a CNN, ResNet18, ResNet34, and a Vision Transformer (ViT) - and it is found that accuracy decreases with an increasing number of residual blocks due to inherent characteristics of CYGNSS data.
|
|
17:45-19:00, Paper We-S4VT8.2 | Add to My Program |
DQMix-BERT: Distillation-Aware Quantization with Mixed Precision for BERT Compression |
|
Tan, Yan | Institute of Information Engineering, Chinese Academy of Science |
Jiang, Lei | Institute of Information Engineering, Chinese Academy of Science |
Chen, Peng | Institute of Information Engineering,Chinese Academy of S |
Tong, Chaodong | Institute of Information Engineering, Chinese Academy of Science |
Keywords: Deep Learning, Neural Networks and their Applications, Application of Artificial Intelligence
Abstract: Transformer-based models such as BERT have performed excellently on various Natural Language Processing (NLP) tasks. However, these models are usually computationally expensive and have a large number of parameters. As a result, deploying them on edge devices has become a challenging task. Existing compression work on lower-precision quantization still suffers a severe accuracy decrease and rarely focuses on the information hidden in the different modules of the model. In this paper, we propose a distillation-aware mixed-precision quantization method that combines quantization and knowledge distillation. We achieve ultra-low mixed-precision quantization by exploiting the differing sensitivities of the different modules of BERT. Moreover, we leverage knowledge distillation to reduce the degradation in model accuracy. We extensively test our method on four GLUE tasks. The results show that DQMix-BERT outperforms other BERT compression methods and even achieves performance comparable to the original BERT model while achieving ~8x compression.
|
|
17:45-19:00, Paper We-S4VT8.3 | Add to My Program |
Black-Box Targeted Adversarial Attack Based on Multi-Population Genetic Algorithm |
|
Aiza, Yuuto | Niigata University |
Zhang, Chao | University of Fukui |
Yu, Jun | Niigata University |
Keywords: Deep Learning, Evolutionary Computation, AI and Applications
Abstract: The fast gradient sign method (FGSM) is an efficient white-box attack method that uses gradient information to generate adversarial examples. However, applying the classic FGSM to real-world applications is often difficult because the internal structure of the models is hard to obtain. Therefore, we make slight modifications to the conventional genetic algorithm (GA) to effectively optimize the gradient sign function of the classic FGSM and generate adversarial examples from the perspective of a black-box attack. To attack multiple given target classes simultaneously, we initialize multiple different subpopulations and ensure that each subpopulation attacks a specified target class. Additionally, we propose two different strategies for migrating successfully attacking subpopulations into unsuccessful ones to ramp up attacks on the unsuccessful classes. To evaluate the performance of the proposed algorithm, we compare it with the conventional GA when attacking the well-trained VGG19_BN model on the CIFAR-10 database. Furthermore, we investigate the impact of the proposed strategies on performance and analyze their respective contributions. The experimental results confirm that the proposed algorithm can successfully attack a greater variety of classes at a faster rate.
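For orientation, the classic white-box FGSM step that the abstract takes as its starting point can be sketched as below. This is a minimal illustration on a hypothetical logistic-regression model (the weights `w`, bias `b`, and step size `eps` are assumptions made for the example); it is not the paper's black-box genetic algorithm, which avoids exactly this need for gradient access.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    """Probability of class 1 under a (hypothetical) logistic model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def input_gradient(w, b, x, y):
    """Gradient of the binary cross-entropy loss with respect to the input x.

    For a logistic model, dL/dz = p - y and dz/dx = w, so dL/dx = (p - y) * w.
    """
    p = predict(w, b, x)
    return [(p - y) * wi for wi in w]

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(w, b, x, y, eps):
    """One FGSM step: move each input coordinate by eps in the direction
    that increases the loss (the sign of the input gradient)."""
    grad = input_gradient(w, b, x, y)
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad)]

# Usage: perturb an input whose true label is 1.
w, b = [2.0, -1.0], 0.0
x, y = [0.5, 0.5], 1
x_adv = fgsm(w, b, x, y, eps=0.1)
```

The perturbed input lowers the model's confidence in the true label, which is the effect a black-box method has to reproduce without reading `w` or the gradient.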
|
|
17:45-19:00, Paper We-S4VT8.4 | Add to My Program |
KTPose: Keypoint-Based Tokens in Vision Transformer for Human Pose Estimation |
|
Wang, Jiale | Qingdao University |
Zhang, Xiaowei | Qingdao University |
Wang, Wenjia | Qingdao University |
Keywords: Deep Learning, AI and Applications, Machine Vision
Abstract: Transformers have made remarkable progress on human pose estimation in recent years; however, vision tokens all sit at fixed positions, a property unsuitable for unknown human deformation. In this paper, we propose KTPose, a novel approach that uses keypoint-based tokens in a Vision Transformer for human pose estimation and includes an instance-aware keypoint head and transformer-based keypoint refinement. To address the limb deformation issue, the instance-aware keypoint head is devised to capture discriminative features dynamically based on the coarsely localized keypoints. Further, we propose multi-granularity vision tokens, in which each keypoint is explicitly embedded as a token so that spatial dependencies and constraint relationships are learned simultaneously by the vision transformer. Extensive experiments on two benchmark datasets demonstrate that KTPose outperforms state-of-the-art methods, achieving 76.6 AP (↑1.06%) and 75.7 AP (↑0.93%) on the COCO validation and test-dev sets, respectively. This is accomplished with a smaller computational footprint than current mainstream transformer-based methods. Code is publicly available.
|
|
17:45-19:00, Paper We-S4VT8.5 | Add to My Program |
MiLMo: Minority Multilingual Pre-Trained Language Model |
|
Deng, Junjie | Minzu University of China |
Shi, Hanru | Minzu University of China |
Yu, Xinhe | Minzu University of China |
Bao, Wugedele | Hohhot Minzu College |
Sun, Yuan | Minzu University of China; Minority Languages Branch, National La |
Zhao, Xiaobing | Minzu University of China; Minority Languages Branch, National La |
Keywords: Deep Learning, Machine Learning
Abstract: Pre-trained language models are trained on large-scale unsupervised data and can then be fine-tuned on only small-scale labeled datasets to achieve good results. Multilingual pre-trained language models are trained on multiple languages, so a single model can understand multiple languages at the same time. At present, research on pre-trained models mainly focuses on high-resource languages, while there is relatively little research on low-resource languages such as minority languages, and public multilingual pre-trained language models do not work well for minority languages. Therefore, this paper constructs a multilingual pre-trained model named MiLMo that performs better on minority language tasks, covering Mongolian, Tibetan, Uyghur, Kazakh and Korean. To address the scarcity of datasets for minority languages and verify the effectiveness of the MiLMo model, this paper constructs a minority multilingual text classification dataset named MiTC and trains a word2vec model for each language. By comparing the word2vec models and the pre-trained model on the text classification task, this paper provides an optimal scheme for downstream task research on minority languages. The final experimental results show that the pre-trained model performs better than the word2vec models and achieves the best results in minority multilingual text classification. The multilingual pre-trained model MiLMo, the multilingual word2vec models and the multilingual text classification dataset MiTC are published at https://milmo.cmli-nlp.com.
|
|
17:45-19:00, Paper We-S4VT8.6 | Add to My Program |
Memory-Guided Coordinate Encoding Network for Anomaly Detection |
|
Wang, Xingang | Qilu University of Technology(Shandong Academy of Sciences) |
Zhang, Hong | Qilu University of Technology |
Cao, Rui | Qilu University of Technology |
Zhou, Jinyan | Qilu University of Technology |
Lu, Xingchao | Qilu University of Technology |
Keywords: Deep Learning, Machine Vision, Image Processing and Pattern Recognition
Abstract: Video anomaly detection remains challenging due to the complexity of visual scenes. Surveillance video is often captured with a fixed lens, and existing approaches, whether based on autoencoder architectures or generative adversarial networks, do not encode dynamic and static information during feature extraction so as to emphasize information-rich features. Motivated by this and guided by extensive experiments, we improve the encoder architecture and propose a memory-guided coordinate encoding network, which introduces a coordinate attention module into the U-Net network to enhance the representation of dynamic entities. Considering the diversity of abnormal events, we use a memory module to record the prototype patterns of normal features, and we propose a feature discretization loss and a feature aggregation loss that encourage compact representations among features and separation between feature terms, improving the accessibility and prediction accuracy of the memory module. Experimental results on three open standard datasets show that our model outperforms state-of-the-art methods.
|
|
17:45-19:00, Paper We-S4VT8.7 | Add to My Program |
Adaptive-SpEx: Local and Global Perceptual Modeling with Speaker Adaptation for Target Speaker Extraction |
|
Xu, Xianbo | Ningbo University |
Diqun, Yan | Ningbo University |
Li, Dong | Ningbo University |
Keywords: Deep Learning, Neural Networks and their Applications, AI and Applications
Abstract: Target speaker extraction aims to extract a target speaker’s speech from a multi-talker environment with the help of the target speaker’s reference speech. However, the simple fusion of different features and purely local perceptual modeling lead to limited extraction performance. In this work, we propose a new speaker extraction model called Adaptive-SpEx. The correlation between mixed speech features and the speaker embedding is fully exploited, and a dual-path structure is used for local and global perceptual modeling. We evaluate the model on the WSJ0-2mix-extr dataset in terms of reconstructed signal quality. Experimental results show that the proposed model outperforms other baseline systems on WSJ0-2mix-extr and achieves better generalizability on the Libri-2talker dataset. Furthermore, the proposed model significantly reduces the word error rate of speech recognition on mixed speech from 79.49% to 32.73%.
|
|
We-S5VT1 Virtual Session, Room T1 |
Add to My Program |
General Cybernetics VII |
|
|
|
19:00-20:00, Paper We-S5VT1.1 | Add to My Program |
Genetic Algorithm with Dynamic Fitness Sharing Niching Method for Multimodal Opinion Maximization Problem |
|
Wan, Rong | South China University of Technology |
Chen, Wei-Neng | South China University of Technology |
Shi, Xuanli | South China University of Technology |
Geng, Mingcan | South China University of Technology |
Keywords: Evolutionary Computation, Metaheuristic Algorithms, Complex Network
Abstract: Social networks, whether built on real-world or online relationships, provide platforms for people to share and update their opinions. Building on the influence maximization problem, the opinion maximization (OM) problem aims to locate a set of initial nodes that maximizes the total positive opinion disseminated in the social network. However, decision-makers prefer to have multiple optimal or near-optimal solutions at hand, which gives rise to the multimodal OM problem. In this paper, we first define the multimodal OM problem, which aims to provide several promising sets of initial nodes at the same time. To solve this problem, we propose a genetic algorithm with a dynamic fitness sharing method (GADN). In GADN, we use the dynamic fitness sharing method to divide the population dynamically, design a repair strategy to fill possible gaps in the sets after crossover, and use scalable reproduction to assign reproductive opportunities dynamically. Finally, we conduct a series of experiments on multiple social networks. The results show that the proposed GADN outperforms other methods on the opinion ratio and active ratio of nodes in most cases.
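As background for the niching idea named in the abstract, classic (static) fitness sharing can be sketched as follows. This is an illustrative textbook version, not the paper's dynamic variant: the Jaccard distance between seed sets, the niche radius `sigma`, and the exponent `alpha` are all assumptions chosen for the example, and seed sets are assumed to be non-empty.

```python
def sharing(d, sigma, alpha=1.0):
    """Sharing function: 1 at distance 0, decaying to 0 at the niche radius sigma."""
    return 1.0 - (d / sigma) ** alpha if d < sigma else 0.0

def jaccard_distance(a, b):
    """Distance between two non-empty seed sets (assumed metric for this sketch)."""
    a, b = set(a), set(b)
    return 1.0 - len(a & b) / len(a | b)

def shared_fitness(population, fitness, sigma, alpha=1.0):
    """Divide each raw fitness by its niche count, so crowded niches are penalised.

    The niche count includes the individual itself (distance 0), so it is >= 1.
    """
    shared = []
    for i, ind in enumerate(population):
        niche_count = sum(
            sharing(jaccard_distance(ind, other), sigma, alpha)
            for other in population
        )
        shared.append(fitness[i] / niche_count)
    return shared

# Usage: two identical seed sets share one niche; the third keeps its raw fitness.
population = [{1, 2}, {1, 2}, {3, 4}]
raw = [10.0, 10.0, 8.0]
result = shared_fitness(population, raw, sigma=0.5)
```

Because duplicated solutions split their fitness, selection pressure shifts toward distinct seed sets, which is how niching keeps several near-optimal solutions alive at once.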
|
|
19:00-20:00, Paper We-S5VT1.2 | Add to My Program |
A Nash-Based Evolutionary Algorithm for Dynamic Optimization in Multi-Target UAV Tracking |
|
Zhu, Rui | South China University of Technology |
Chen, Tai-You | South China University of Technology |
Chen, Wei-Neng | South China University of Technology |
Keywords: Evolutionary Computation, Metaheuristic Algorithms, Optimization and Self-Organization Approaches
Abstract: Target tracking and path planning using unmanned aerial vehicles (UAVs) have attracted increasing research attention in recent years. The rapid development of communication technology enables the use of multiple UAVs to perform target tracking collaboratively. However, it remains challenging to coordinate multiple UAVs in some complicated scenarios, e.g., tracking multiple targets using multiple UAVs. In this paper, we propose a Nash-based evolutionary dynamic optimization algorithm for multi-target tracking using multiple UAVs. Firstly, considering the requirement of balancing the number of UAVs tracking each target, we formulate the tracking problem as a distributed constrained multi-objective dynamic optimization problem using model predictive control (MPC). Secondly, to better track dynamic targets with stochastic behaviors, we design an evolutionary dynamic optimization (EDO) approach to solve the optimization problem. Thirdly, in order to avoid collisions, we combine the EDO approach with Nash optimization. The experimental results show that our approach performs better than the compared algorithms.
|
|
19:00-20:00, Paper We-S5VT1.3 | Add to My Program |
Influence Maximization with Reverse Influence Sampling and Evolutionary Algorithm |
|
Du, Yinghao | South China University of Technology |
Qiu, Wen-Jin | South China University of Technology |
Chen, Wei-Neng | South China University of Technology |
Keywords: Metaheuristic Algorithms, Complex Network, Evolutionary Computation
Abstract: Identifying influential nodes in social networks is an important problem called the influence maximization (IM) problem. So far, a large number of IM algorithms have been proposed. Among these algorithms, meta-heuristic approaches such as evolutionary algorithms (EAs) can obtain high-quality solutions. But in general, they usually suffer from time efficiency problems and are designed only for a few diffusion models. In this paper, we propose a novel EA combined with the reverse influence sampling (RIS) to solve the IM problem. By introducing the RIS technique, we can evaluate influence spreading efficiently under various diffusion models using hypergraphs. Moreover, the hypergraphs are also used as a kind of high-level heuristic information. To combine the RIS technique with EA, we exploit the idea of RIS to design a surrogate model and decide to address the single-objective IM problem in a multi-objective way. Then we modify the classical NSGA-II algorithm and apply it to this strategy. Our experimental results on million-scale social networks validate the good performance of the proposed approach.
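The reverse influence sampling idea mentioned in the abstract can be sketched as below. This is a minimal illustration under the independent cascade model with a plain greedy coverage step; the paper itself pairs RIS with an evolutionary algorithm (a modified NSGA-II), which is not reproduced here. The graph encoding `in_neighbors` (node to list of `(source, probability)` pairs) and all function names are assumptions made for the example.

```python
import random

def sample_rr_set(in_neighbors, nodes, rng):
    """Sample one reverse-reachable (RR) set: pick a random root, then walk
    incoming edges backwards, keeping each edge with its activation probability."""
    root = rng.choice(nodes)
    rr, frontier = {root}, [root]
    while frontier:
        v = frontier.pop()
        for u, p in in_neighbors.get(v, []):
            if u not in rr and rng.random() < p:
                rr.add(u)
                frontier.append(u)
    return rr

def greedy_seeds(rr_sets, k):
    """Pick up to k seeds, each time taking the node that covers the most
    RR sets not yet covered (a standard greedy maximum-coverage step)."""
    remaining = list(rr_sets)
    seeds = []
    for _ in range(k):
        counts = {}
        for rr in remaining:
            for u in rr:
                counts[u] = counts.get(u, 0) + 1
        if not counts:
            break
        best = max(counts, key=counts.get)
        seeds.append(best)
        remaining = [rr for rr in remaining if best not in rr]
    return seeds

def estimated_spread(rr_sets, seeds, n_nodes):
    """Influence estimate: n times the fraction of RR sets hit by the seeds."""
    hit = sum(1 for rr in rr_sets if any(s in rr for s in seeds))
    return n_nodes * hit / len(rr_sets)

# Usage: a star graph where node 0 always activates nodes 1, 2 and 3.
star_in = {1: [(0, 1.0)], 2: [(0, 1.0)], 3: [(0, 1.0)]}
rng = random.Random(0)
samples = [sample_rr_set(star_in, [0, 1, 2, 3], rng) for _ in range(200)]
```

The fraction of RR sets covered by a seed set estimates its spread without simulating forward cascades, which is why RIS makes repeated fitness evaluation cheap.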
|
|
19:00-20:00, Paper We-S5VT1.4 | Add to My Program |
M2SH: A Hybrid Approach to Table Structure Recognition Using Two-Stage Multi-Modality Feature Fusion |
|
Zhang, Weilong | Nanjing University of Science and Technology |
Zhang, Chongyang | Nanjing University of Science and Technology |
Ning, Zhihan | The Chinese University of Hong Kong (Shenzhen) |
Wang, Guopeng | Linklogis |
Bai, Yingjie | Linklogis |
Jiang, Zhixing | Qilu University of Technology |
Zhang, David | The Chinese University of Hong Kong (Shenzhen) |
Keywords: Machine Vision, Deep Learning, AI and Applications
Abstract: Automatically recovering the original structure of tables from unstructured images is a challenging task that combines techniques from computer vision (CV) and natural language processing (NLP). Unfortunately, common feature extraction methods, naive fusion strategies, and rigid inductive biases have become roadblocks to the effective improvement of previous approaches. Unlike other modes of data representation, tables consist of many dispersed cells that are interdependent. Therefore, in this paper, we propose a novel approach for recognizing table structures by mining the special properties of tables. The method begins by using an adaptive fusion method to fuse the visual and textual features acquired through a two-stream network. In the second stage, the layout features are integrated using a Kronecker-based strategy. The table elements with multi-modality features are then modeled based on spatial relationships. Interactions among them are established by a hybrid contextual aggregator that allows message passing at both local and global levels. Finally, table structure recognition is achieved by predicting the relationships between elements. We evaluate the proposed approach on various public datasets, including ICDAR2013, UNLV, WTW, SciTSR, and SciTSR-COMP, as well as a more complicated private dataset. The proposed method performs excellently on these datasets.
|
|
19:00-20:00, Paper We-S5VT1.5 | Add to My Program |
Enhancing the Discriminative Ability for Multi-Label Classification by Handling Data Imbalance |
|
Lim, Jin-Ha | Korea University |
Oh, Myeong Seok | Korea University |
Lee, Seong-Whan | Korea University |
Keywords: Deep Learning, Machine Learning
Abstract: In computer vision, long-tailed multi-label visual recognition is a challenging problem due to the imbalance between classes and the difficulty of recognizing rare classes. Previous methods for resolving data imbalance mostly originate from single-label classification, which can be obstructed by the label co-occurrence issue, and they attempt to minimize or compensate for it. In this paper, we propose a novel tail class priority sampling method for long-tailed multi-label classification that untangles both issues. Our method samples tail classes more often and earlier so that the model learns tail classes before it develops a bias toward common classes. Due to label co-occurrence, other classes are learned spontaneously in the same iteration, ensuring a balanced representation of the head and medium classes. To further enhance recognition performance, we adopt a bilateral structure that samples from both the original and the proposed sampling distributions to better represent the tail classes. We evaluate our proposed method on the long-tailed versions of two widely used datasets, COCO-LT and VOC-LT, and compare it with previous methods. The experimental results show that our method achieves new state-of-the-art performance for tail classes on both datasets. Our method is applicable in various real-world scenarios, making rare class recognition achievable, and can be easily incorporated into conventional recognition frameworks.
|
|
19:00-20:00, Paper We-S5VT1.6 | Add to My Program |
Hierarchical Cross-Scale Graph Reasoning Network for Retinal Vessel Segmentation (I) |
|
Mo, Shaocong | Zhejiang University |
Keywords: Image Processing and Pattern Recognition, Deep Learning
Abstract: Retinal vessel segmentation holds considerable importance in clinical diagnostics for microvascular and ophthalmological diseases. Deep learning-based methodologies have demonstrated impressive results in this task. However, challenges persist due to complex structures that exhibit scale variation and the limited capacity of CNN models to capture long-range relationships. In this paper, we propose a U-shaped network that incorporates our hierarchical cross-scale graph reasoning module for retinal vessel segmentation. After encoding, the features are transformed from the spatial domain to the graph domain at different scale settings to capture various long-range features and to model correlations and global contextual information between graphs at different scales. Graph convolutional networks propagate node features within each graph, and cross-scale graph interaction is adopted to reason about and enhance the features of the target graph. We then concatenate all features transformed back from the graph domain and decode them. We evaluated our proposed method on two widely used datasets, DRIVE and CHASE_DB1, and compared it with multiple approaches. Experimental results demonstrate the effectiveness of our proposed approach in retinal vessel segmentation.
|
|
19:00-20:00, Paper We-S5VT1.7 | Add to My Program |
A Node-Collaboration-Informed Graph Convolutional Network for Precise Representation to Undirected Weighted Graphs |
|
Wang, Ying | Chongqing University of Posts and Telecommunications |
Yuan, Ye | Southwest University |
Wu, Di | Southwest University |
Keywords: Representation Learning, Machine Learning, Deep Learning
Abstract: An undirected weighted graph (UWG) is frequently adopted to describe the interactions among a single set of nodes in big data-connected applications, such as the user contact frequencies in a social network services system. A graph convolutional network (GCN) is widely adopted to perform representation learning on a UWG for subsequent pattern analysis tasks such as clustering or missing data estimation. However, existing GCNs mostly neglect the latent collaborative information hidden in connected node pairs. To address this issue, this study proposes to model the node collaborations via a symmetric latent factor analysis model, which is then incorporated into a GCN model as a node-collaboration module for supplementing the collaboration loss. Based on this essential idea, a Node-collaboration-informed Graph Convolutional Network (NGCN) model is proposed with three ingredients: a) learning latent collaborative information from the interaction of node pairs via a node-collaboration module; b) building the residual connection and weighted representation propagation to obtain high representation capacity; and c) implementing the model optimization in an end-to-end fashion to achieve a precise representation of the target UWG. Empirical studies on four UWGs emerging from real applications demonstrate that, owing to its efficient modeling of node collaborations, the proposed NGCN significantly outperforms state-of-the-art GCNs on the task of missing weight estimation. Meanwhile, its high scalability ensures its compatibility with more advanced GCN extensions, which will be further investigated in our future studies.
|
|
19:00-20:00, Paper We-S5VT1.8 | Add to My Program |
Adapting Energy Management Strategies for Hybrid Electric Vehicles in Dynamic Driving Cycles through Recurrent Policy |
|
Guo, Qinghong | SiChuan University |
Lian, Renzong | Tsinghua University |
Ma, Kuo | SiChuan University |
Hou, Luyang | Beijing University of Posts and Communications |
Wu, Yuankai | Sichuan University |
Keywords: AI and Applications, Deep Learning, Neural Networks and their Applications
Abstract: Deep Reinforcement Learning (DRL) techniques have shown promising results in developing proficient energy management strategies (EMSs) for hybrid electric vehicles (HEVs). However, the current DRL algorithms are often trained on a limited number of standard driving cycles, which do not necessarily represent the highly variable and diverse driving conditions encountered by HEVs in the real world. Therefore, a significant current research objective is to develop DRL methods that can adapt to new and arbitrary driving cycles. Recent studies have shown that DRL algorithms based on Recurrent Neural Networks (RNNs) can better address this challenge by taking into account the temporal dependencies of environmental state. Building upon this insight, we propose an EMS based on a recurrent policy that is trained on a large number of random driving cycles to enhance the generalization of energy management strategies to arbitrary driving cycles. We evaluate the proposed method on a classic HEV, namely the Prius, and demonstrate that the recurrent policy significantly outperforms traditional DRL methods in terms of fuel economy. Furthermore, the recurrent policy is shown to be more robust and generalizable to variable driving cycles.
|
|
We-S5VT2 Virtual Session, Virtual Room T2 |
Add to My Program |
Cybernetics General V-II |
|
|
|
19:00-20:00, Paper We-S5VT2.1 | Add to My Program |
Model Predictive Control for Carbon-Neutral Data Centers |
|
Ren, Guanyu | Inner Mongolia University of Technology |
Wan, Jianxiong | Inner Mongolia University of Technology |
Li, Leixiao | Inner Mongolia University of Technology |
Liu, Chuyi | Inner Mongolia University of Technology |
Wang, Xiaolei | Inner Mongolia University of Technology |
Keywords: Intelligent Internet Systems, Agent-Based Modeling, Cloud, IoT, and Robotics Integration
Abstract: As one of the major carbon producers, data centers produce around 1% of global carbon emissions per year. Researchers are making significant efforts to reduce data center carbon emissions. However, current carbon-neutral data center solutions seldom take carbon quotas into account, nor do they integrate new emission reduction technologies such as Carbon Capture (CC) and Power-to-Gas (P2G). To bridge this gap, this paper proposes a novel carbon-neutral data center architecture, based on which a holistic cost minimization problem is formulated. We use Variational Mode Decomposition (VMD) and the Long Short-Term Memory (LSTM) neural network to construct highly accurate prediction models, and develop a Model Predictive Control (MPC) algorithm for energy and carbon management. Finally, simulations on real-world data demonstrate that our approach reduces overall cost by up to 19.19% compared with traditional solutions.
|
|
19:00-20:00, Paper We-S5VT2.2 | Add to My Program |
Hidden Priors for Bayesian Bidirectional Backpropagation |
|
Kosko, Bart | University of Southern California |
Adigun, Olaoluwa | University of Southern California |
Keywords: Deep Learning, Machine Learning
Abstract: Non-uniform prior probabilities between hidden layers improved deep neural classifiers trained with bidirectional backpropagation. The resulting Bayesian bidirectional backpropagation algorithm jointly maximizes the forward and backward network likelihoods along with the weight priors. The backward direction exploits a hidden regression that ordinary unidirectional backpropagation ignores. Simulations compared Laplacian, Gaussian, Cauchy, and the new sinc-squared hidden priors on the CIFAR-10 and CIFAR-100 balanced image data sets. These hidden priors improved the classification accuracy of deep neural classifiers compared with default uniform priors and default unidirectional backpropagation. They did so at little extra computational cost. Sinc-squared and Cauchy multivariate priors often had the best classification accuracy. Cauchy hidden priors gave sparse hidden weights similar to the Laplacian priors associated with sparse lasso regression.
|
|
19:00-20:00, Paper We-S5VT2.3 | Add to My Program |
Majority Problems: Formal Study and Practical Resolution |
|
Godoy, Aitor | Universidad Complutense |
Rodríguez, Ismael | Universidad Complutense De Madrid |
Rubio, Fernando | Universidad Complutense |
Keywords: Soft Computing, Socio-Economic Cybernetics, AI and Applications, Computational Intelligence
Abstract: How much power does a political party really have, compared to another party? In principle, the answer might seem obvious, since power seems to be directly proportional to its number of voters (or its number of deputies). However, in reality, the power of a party depends on its ability to form government majorities. In this paper we will demonstrate the computational hardness (in particular, #P-hardness) of determining the relative power of a party in different settings, and provide practical algorithms to compute it. We will use our algorithms to face a case study where we will compare the relative power of parties in real elections. Although we will use political parties as an illustration, the problem is equally applicable to any multi-agent voting system, and this type of environment poses the greatest difficulties.
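The point that a party's power depends on the majorities it can form, not on its seat count, can be illustrated with a small sketch. The example below uses the normalized Banzhaf index, a standard power measure chosen here only for illustration; the abstract does not say which measure the paper uses. The function name, the seat counts, and the quota are hypothetical. The brute-force enumeration visits every coalition, which is consistent with the exponential cost one would expect from a #P-hard problem.

```python
from itertools import combinations


def banzhaf_power(seats, quota):
    """Normalized Banzhaf index per party, by enumerating all coalitions.

    Illustrative only: the paper's own power measure may differ.
    """
    parties = list(seats)
    swings = {p: 0 for p in parties}
    for size in range(len(parties) + 1):
        for coalition in combinations(parties, size):
            total = sum(seats[p] for p in coalition)
            if total < quota:
                continue  # not a winning coalition
            # A member is critical if the coalition loses without it.
            for p in coalition:
                if total - seats[p] < quota:
                    swings[p] += 1
    all_swings = sum(swings.values())
    return {p: (swings[p] / all_swings if all_swings else 0.0) for p in parties}


# Hypothetical 100-seat parliament with a 51-seat majority quota.
seats = {"A": 49, "B": 49, "C": 2}
power = banzhaf_power(seats, quota=51)
```

In this hypothetical parliament all three parties end up with the same index, even though party C holds only 2 of the 100 seats, because any two parties together reach the quota.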
|
|
19:00-20:00, Paper We-S5VT2.4 | Add to My Program |
Relation Extraction with Knowledge-Enhanced Prompt-Tuning on Multimodal Knowledge Graph |
|
Ming, Yan | Qilu University of Technology (Shandong Academy of Sciences) |
Shang, Yong | Qilu University of Technology (Shandong Academy of Sciences) |
Li, Huiting | Qilu University of Technology (Shandong Academy of Sciences) |
Keywords: Knowledge Acquisition
Abstract: Recently, Multimodal Knowledge Graphs (MKGs) with visual and textual factual knowledge have been widely used in tasks such as knowledge question answering, recommender systems, and entity disambiguation. Since most current MKGs are still incomplete, multimodal knowledge graph completion has been proposed, and multimodal relation extraction (MRE) is one of its basic processes. However, previous work usually selects visual objects with high object classification scores, which may add noise from objects that are either irrelevant or redundant and thus adversely affect multimodal relation extraction. For this reason, in this paper, we propose a Relation Extraction with Knowledge-enhanced Prompt-tuning model on multimodal knowledge graph (REKP) to address these issues. Specifically, we inject potential knowledge from relational labels into the prompt construction of answer words and optimize their representation with structured constraints. A Transformer architecture with cross-modal attention is then used to fuse the visual and textual representations. We conduct extensive experiments to verify that our REKP model can achieve SOTA performance on the MNRE dataset for multimodal relation extraction.
|
|
19:00-20:00, Paper We-S5VT2.5 | Add to My Program |
Local Entropy Based Fuzzy Connectedness Segmentation for Thyroid Ultrasound Images |
|
Fu, Yixuan | South China University of Technology |
Kai, Li | The Third Affiliated Hospital of Sun Yat-Sen University |
Chen, Junying | South China University of Technology |
Keywords: Hybrid Models of Neural Networks, Fuzzy Systems, and Evolutionary Computing, Fuzzy Systems and their applications, Image Processing and Pattern Recognition
Abstract: Although deep learning methods are popular in recent years to achieve dominant image segmentation performance, the carefully annotated training images are very difficult to obtain in clinical medicine, and the deep learning model training demands a lot of computing resources which are not always available. To overcome these difficulties, it is very necessary to investigate segmentation algorithms which do not require any training data. In this work, we improve a semi-automatic conventional segmentation algorithm, fuzzy connectedness segmentation, by introducing the local information entropy of the ultrasound image into the fuzzy affinity membership function. The proposed semi-automatic local entropy based fuzzy connectedness (LEFC) segmentation algorithm is further modified by incorporating the popular deep learning methods to make the LEFC algorithm fully automatic. The proposed semi-automatic LEFC algorithm achieves better performance than other representative fuzzy connectedness segmentation algorithms with the accuracy and IoU values increased by 5.9~14% and 5~10.5%, and achieves very close segmentation performance as compared with the best deep learning model in the experiments (0.2% lower accuracy and 0.2% lower IoU compared to CE-Net). Besides, the automatic LEFC algorithm improves the segmentation results obtained by the corresponding deep learning based coarse segmentation modules with the accuracy and IoU increased by 0.2~8% and 0.1~3.8%, respectively. Furthermore, the experiment results demonstrate that the proposed automatic LEFC algorithm obtains stably good segmentation performance even with limited training data, achieving the accuracy and IoU values (at the stable state) 23.6% and 18.5% higher than those of the corresponding U-Net model which is trained with the same-sized dataset.
|
|
19:00-20:00, Paper We-S5VT2.6 | Add to My Program |
Semi-Supervised Network for Thyroid Nodule Segmentation Via Joint Consistency Learning and Co-Training |
|
Dai, Li | Shandong Normal University |
Zhang, Guijuan | Shandong Normal University |
Lyu, Lei | Shandong Normal University |
Keywords: BMI Emerging Applications
Abstract: Thyroid nodules are a common clinical disease. Although most nodules are benign, the incidence rate of thyroid cancer has risen rapidly in recent years. Even though many methods have achieved automated thyroid nodule segmentation based on deep learning, these methods are based on supervised learning and require a large amount of labeled data for training. However, the labeling work must be carried out by professional doctors, which makes labeling difficult and results in a small number of datasets. To address this problem, this paper proposes a semi-supervised thyroid nodule segmentation model via joint consistency learning and co-training. This model includes two branches: consistency learning and co-training. In the consistency learning branch, based on consistency regularization, the teacher model guides the optimization of the student model. In order to make the teacher model more stable, we design a co-training framework to further optimize the teacher model. In the co-training branch, the teacher model and TransUNet extract different representations of the same sample and teach each other to prevent consistent but incorrect predictions between the teacher model and the student model. This semi-supervised model can learn useful feature representations from unlabeled data and train effectively with a small amount of labeled data, reducing the dependence on labeled data during the model training process.
|
|
19:00-20:00, Paper We-S5VT2.7 | Add to My Program |
Improving Human-Robot Collaboration in TV Assembly through Computational Ergonomics: Effective Task Delegation and Robot Adaptation (I) |
|
Olivas Padilla, Brenda Elizabeth | Mines Paris, Université PSL |
Papanagiotou, Dimitris | Centre for Robotics, Mines Paris, Université PSL |
Senteri, Gavriela | Centre for Robotics, Mines Paris, Université PSL |
Manitsaris, Sotiris | Centre for Robotics, Mines Paris, Université PSL |
Glushkova, Alina | Centre for Robotics, Mines Paris, Université PSL |
Keywords: AI and Applications, Expert and Knowledge-Based Systems, Cloud, IoT, and Robotics Integration
Abstract: The high prevalence of work-related musculoskeletal disorders (WMSDs) could be addressed by optimizing Human-Robot Collaboration (HRC) frameworks. In this context, this paper proposes a methodology for ergonomically effective task delegation and HRC for manufacturing applications based on two hypotheses. The first hypothesis states that it is possible to rapidly quantify ergonomically professional tasks using motion data from a few wearable sensors and then delegate the high-risk tasks to a collaborative robot. The second hypothesis is that, compared to typical HRC frameworks involving physical interaction, the ergonomics and safety of an HRC scenario can be enhanced by combining gesture recognition and pose estimation. These remove unnecessary motions that could expose operators to ergonomic risks, decrease the amount of physical effort, and make the robot aware and responsive to the presence and gestures of the operators. The methodology is evaluated by optimizing the HRC scenario of a television manufacturing process, yet it is described how it can be reconfigured for other industrial scenarios. The effect of the temporal and spatial adaptation on the operator's range of motion was analyzed through three separate experiments. The effectiveness of HRC is measured through the standard key performance indicators (KPIs); however, to evaluate the collaboration and required physical demand, two KPIs are proposed in this paper. These are the rate of spatial adaptation and the rate of reduction in the operator's motion. The results demonstrated that the methodology enhanced the ergonomics and efficiency of the production process. First, the robot was delegated two tasks identified as the most ergonomically dangerous for human operators. Then the optimized HRC achieved an average rate of spatial adaptation of 29.37% and a decrease in operator movement of 28.8% across 14 subjects compared to HRC frameworks that do not include spatial and temporal adaptation.
|
|
We-S5VT4 Virtual Session, Room T4 |
Add to My Program |
Evolutionary Computation, Other Metaheuristic Algorithms, and Computational Intelligence in General |
|
|
|
19:00-20:00, Paper We-S5VT4.1 | Add to My Program |
Pricing Research on Spatial Crowdsourcing Tasks under Incompletely Uncertain Scene Information |
|
Lin, Weida | Harbin Engineering University |
Dong, Hongbin | University of Harbin Engineering |
Keywords: Computational Intelligence
Abstract: Task pricing is an important step for crowdsourcing platforms to solve profit-driven task allocation and maximize profits. Most existing studies design algorithms only on the premise that the scene information is fully determined. However, due to the interference of many factors in real scenes, information such as workers and task costs is usually not completely uncertain. To solve the above problems, a spatial crowdsourcing task pricing algorithm is proposed. First, the algorithm uses the proposed improved gray wolf algorithm and support vector regression to predict the task price, and then sets the price based on the prediction. In order to address the instability of the average price and the matching number caused by dynamic supply and demand, an adjustment mechanism is designed to stabilize the average price of tasks. The experiment uses a real data set, the New York taxi data set, and compares the proposed algorithm with the classic greedy algorithm and binary matching algorithm. The experimental results show that the matching rate of the proposed algorithm is above 75% when the scene information is not completely uncertain.
|
|
19:00-20:00, Paper We-S5VT4.2 | Add to My Program |
Surrogate-Assisted Evolutionary Optimization Based on Interpretable Convolution Network |
|
Jiang, Wenxiang | Tongji University |
Xu, Lihong | Tongji University |
Keywords: Evolutionary Computation, Deep Learning
Abstract: When performing evolutionary optimization for computationally expensive objectives, the surrogate-assisted evolutionary algorithm (SAEA) is an effective approach. However, due to the limited availability of data in these scenarios, it can be challenging to create a highly accurate surrogate model, leading to reduced optimization effectiveness. To address this issue, we propose an Interpretable Convolution Network (ICN) for offline surrogate-assisted evolutionary optimization. ICN retains the non-linear expression ability of traditional neural networks, while possessing the advantages of a clear physical structure and the ability to incorporate prior knowledge during network parameter design and training. We compare ICN-SAEA with the tri-training method (TT-DDEA) and the model-ensemble method (DDEA-SA) on several benchmark problems. Experimental results show that ICN-SAEA is better at searching for the optimal solution than the compared algorithms.
|
|
19:00-20:00, Paper We-S5VT4.3 | Add to My Program |
A Two-Stage Constrained Multi-Objective Evolutionary Algorithm for DNA Encoding Problem |
|
Zhang, Xinbo | Wuhan University of Science and Technology |
Zhang, Kai | Wuhan University of Science and Technology |
Wu, Ni | Wuhan University of Science and Technology |
Duan, Hengyu | Wuhan University of Science and Technology |
Keywords: Evolutionary Computation, Computational Life Science, Computational Intelligence
Abstract: In recent years, the DNA computing model has gradually attracted attention due to its low energy consumption, high information storage capacity, and good parallelism. The DNA computing model computes with DNA molecules as the medium, so its core is to design high-quality DNA sequences conforming to various constraints. Designing DNA sequences that meet a series of constraints, such as temperature, H-measure, and continuity, is a typical multi-objective optimization problem. In traditional multi-objective optimization problems, the fitness functions are usually related only to their own solutions and have no correlation with other candidate solutions. Based on the characteristics of the DNA coding problem, we propose a two-stage constrained multi-objective evolutionary algorithm. Our algorithm overcomes the tendency of traditional algorithms to fall into local optimal solutions when solving DNA coding problems. Experimental results demonstrate that our algorithm is effective and reliable in solving DNA coding problems when compared to other mainstream algorithms from recent years.
|
|
19:00-20:00, Paper We-S5VT4.4 | Add to My Program |
A Three-Stage Adaptive Hybrid Algorithm for Flexible Job Shop Scheduling Problem |
|
Wu, ChongRui | NingBo University |
Xie, ZhiJun | Ningbo University |
Chen, KeWei | NingBo University |
Xin, Yu | NingBo University |
Zarei, Roozbeh | Deakin University |
Xie, YunTao | New South Wales University |
Keywords: Metaheuristic Algorithms, Heuristic Algorithms, Evolutionary Computation
Abstract: The flexible job shop scheduling problem (FJSP) is a complex problem with significant applications in modern manufacturing. While meta-heuristic algorithms have been used to solve FJSP, they often converge to local optima, especially as the problem size increases. To address this problem, we propose an adaptive hybrid algorithm with three stages of "explore-exploit-escape" (E3HA). In the first stage, we design a Simplified Variable Neighborhood Search (Sim-VNS) algorithm and introduce a simplified Nopt1 neighborhood for extensive exploration of solution spaces. In the second stage, we introduce the crossover operation from genetic algorithms to better exploit the elite solutions obtained in the first stage; we also use mutation operations to improve the quality of regular solutions. Finally, in the third stage, we design a hybrid mechanism of Reverse Learning with Path Relinking (RLPR) and introduce a critical path neighborhood structure to improve the ability of solutions to escape from local optima and prevent premature convergence. To evaluate the effectiveness of each stage of the proposed algorithm, we perform ablation experiments and test the algorithm on all BRdata and Fdata instances, comparing its performance to relevant existing state-of-the-art algorithms. The experimental results demonstrate the effectiveness and stability of our algorithm in solving the FJSP.
|
|
19:00-20:00, Paper We-S5VT4.5 | Add to My Program |
Improved Mayfly Algorithm Based on Co-Evolution for Feature Selection |
|
Guo, Tianyu | Harbin Engineering University |
Dong, Hongbin | University of Harbin Engineering |
Zhou, Jing | Harbin Engineering University |
Keywords: Evolutionary Computation, Heuristic Algorithms
Abstract: Mayfly algorithm (MA) is a new type of intelligent optimization algorithm with excellent search ability and a broad application prospect in feature selection. However, as the dimension of the data increases, MA also has the problem of weak search ability and falling into the local optimal solution. To solve these problems, this paper combines the idea of collaborative evolution and proposes an improved Mayfly algorithm (CEIMA) suitable for feature selection problems. The CEIMA algorithm first divides the population into two subpopulations and initializes them in different ways to increase the diversity of the population. During evolution, information sharing is achieved through an information transmission mechanism between sub-populations, achieving collaborative evolution between sub-populations to enhance the global search ability of CEIMA. Finally, the mRMR-m strategy is proposed to mutate the global optimal individual based on feature importance, improving the ability of CEIMA to escape from local optimal solutions. Through a comparison with 7 feature selection algorithms on 12 UCI datasets, the effectiveness of CEIMA is proven.
|
|
19:00-20:00, Paper We-S5VT4.6 | Add to My Program |
An End-To-End Mandarin Audio-Visual Speech Recognition Model with a Feature Enhancement Module |
|
Wang, Jinxin | Ocean University of China |
Yang, Chao | University of Technology Sydney |
Guo, Zhongwen | Ocean University of China |
Li, Xiaomei | Ocean University of China |
Wang, Weigang | Ocean University of China |
Keywords: Multimedia Computation, Image Processing and Pattern Recognition, Deep Learning
Abstract: Compared to relying only on audio information, incorporating visual information improves speech recognition accuracy in noisy environments. Existing works tend to design specific architectures for feature extraction while neglecting feature enhancement. In this paper, we propose an End-to-End Mandarin Audio-Visual Speech Recognition Model with a Feature Enhancement Module. Specifically, we design a Feature Enhancement Module (FEM) that uses deconvolution and upsampling to obtain the twin enhanced data for generating high-resolution feature representations. We further develop the Visual Feature Enhancement Module (Visual FEM) and Audio Feature Enhancement Module (Audio FEM) to enhance feature extraction from both visual data and audio data. We incorporate the proposed modules into the blocks of the Residual Network for accurate audio-visual speech recognition. We conducted experiments on the CAS-VSR-W1k and Chinese Mandarin Lip Reading (CMLR) datasets. The experimental results show that the proposed method outperforms the selected competitive baselines and the state-of-the-art, indicating the superiority of our proposed modules.
|
|
19:00-20:00, Paper We-S5VT4.7 | Add to My Program |
ShapeRef: A Representation Method of Industrial Abnormal Time-Series Waveform Based on Shape Reference |
|
Shi, Lin | Institute of Software Chinese Academy of Sciences |
Zhang, ChangYou | Laboratory of Parallel Software and Computational Science, Insti |
Yang, Shuai | Institute of Software Chinese Academy of Sciences |
Wu, Wenjia | Institute of Software Chinese Academy of Sciences |
Bo, Wen | Institute of Software Chinese Academy of Sciences |
Ma, Ji | Institute of Software Chinese Academy of Sciences |
Keywords: Computational Intelligence, Knowledge Acquisition, Representation Learning
Abstract: Time-series waveform data widely exist in various industrial fields, such as equipment monitoring and fault diagnosis. The current time series representation methods have limitations when dealing with industrial abnormal time-series waveforms, such as limited applicability, semantic ambiguity, and time distortion. This work proposes a novel shape reference-based representation method for industrial abnormal time-series waveform (ShapeRef), which takes the shape of the standard waveform as a reference to represent the anomaly deviation. Specifically, ShapeRef first establishes a time-series shape reference frame, then proposes the minimum shape difference-based mapping method to describe the mapping process of coordinates, and finally reduces multi-intersection points in the mapping process to achieve uniform mapping of the abnormal time-series waveform. Experimental results show that ShapeRef can effectively represent abnormal time-series waveforms and outperforms several baseline methods in the clustering task of a real industrial equipment waveform dataset. This work enhances the accuracy and reliability of industrial equipment monitoring and fault diagnosis, which could have significant practical implications.
|
|
We-S5VT5 Virtual Session, Room T5 |
Add to My Program |
Machine Learning |
|
|
|
19:00-20:00, Paper We-S5VT5.1 | Add to My Program |
RPN: A Word Vector Level Data Augmentation Algorithm in Deep Learning for Language Understanding |
|
Yuan, Zhengqing | Anhui Polytechnic University |
Zhang, Xiaolong | Anhui Polytechnic University |
Wang, Yue | Anhui Polytechnic University |
Hou, Xuecong | Anhui Polytechnic University |
Xue, Huiwen | Soochow University |
Zhao, Zhuanzhe | Anhui Polytechnic University |
Liu, Yongming | Anhui Polytechnic University |
Keywords: Machine Learning, Deep Learning
Abstract: Data augmentation is a widely used technique in machine learning to improve model performance. However, existing data augmentation techniques in natural language understanding (NLU) may not fully capture the complexity of natural language variations, and they can be challenging to apply to large datasets. This paper proposes the Random Position Noise (RPN) algorithm, a novel data augmentation technique that operates at the word vector level. RPN modifies the word embeddings of the original text by introducing noise based on the existing values of selected word vectors, allowing for more fine-grained modifications and better capturing natural language variations. Unlike traditional data augmentation methods, RPN does not require gradients in the computational graph during virtual sample updates, making it simpler to apply to large datasets. Experimental results demonstrate that RPN consistently outperforms existing data augmentation techniques across various NLU tasks, including sentiment analysis, natural language inference, and paraphrase detection. Moreover, RPN performs well in low-resource settings and is applicable to any model featuring a word embeddings layer. The proposed RPN algorithm is a promising approach for enhancing NLU performance and addressing the challenges associated with traditional data augmentation techniques in large-scale NLU tasks. Our experimental results demonstrated that the RPN algorithm achieved state-of-the-art performance in all seven NLU tasks, thereby highlighting its effectiveness and potential for real-world NLU applications.
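The idea of adding noise at randomly selected word positions, scaled by the existing embedding values, can be sketched as follows. This is only a rough illustration under stated assumptions, not the authors' algorithm: the function name, the per-position selection probability `select_prob`, the relative noise `scale`, and the uniform noise distribution are all hypothetical choices that the abstract does not specify.

```python
import random


def random_position_noise(embeddings, select_prob=0.3, scale=0.1, seed=None):
    """Return a noisy copy of a sequence of word vectors.

    Each position is selected with probability `select_prob` (assumption);
    a selected vector has every component perturbed by at most
    `scale` times its own magnitude (assumption). No gradients are needed.
    """
    rng = random.Random(seed)
    augmented = []
    for vector in embeddings:
        if rng.random() < select_prob:
            # Noise is proportional to the existing value of each component.
            noisy = [x + scale * x * rng.uniform(-1.0, 1.0) for x in vector]
        else:
            noisy = list(vector)  # unselected positions are copied unchanged
        augmented.append(noisy)
    return augmented
```

Because the perturbation is a plain arithmetic update on the embedding values, a sketch like this could be applied to the output of any embedding layer before the rest of the model runs.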
|
|
19:00-20:00, Paper We-S5VT5.2 | Add to My Program |
Cosine Similarity Based Representation Learning for Adversarial Imitation Learning |
|
Zhang, Xiongzhen | School of Computer Science and Technology, Soochow University, S |
Liu, Quan | School of Computer Science and Technology, Soochow University, S |
Zhang, Lihua | School of Computer Science and Technology, Soochow University, S |
Keywords: Machine Learning, Expert and Knowledge-Based Systems, Representation Learning
Abstract: Adversarial imitation learning (AIL) aims to recover the reward signal from expert demonstrations and learn expert policy by employing reward and reinforcement learning. However, the raw state-action features of the demonstrations usually have redundant information for a particular control task, and therefore the reward learned from the raw features is often biased, which eventually results in low sample efficiency and instability in AIL. To address these issues, we present CSAIL: Cosine Similarity based Adversarial Imitation Learning. CSAIL extracts expert policy representations from demonstrations via a novel cosine similarity based loss and recovers a robust and unbiased reward function by the learned representations. Based on the reward, CSAIL mimics the expert policy by the Wasserstein distance optimization method. Experimental results show that CSAIL outperforms existing state-of-the-art AIL methods on challenging Mujoco robot control and autonomous driving tasks.
|
|
19:00-20:00, Paper We-S5VT5.3 | Add to My Program |
Meta Pattern Concern Score: A Novel Evaluation Measure with Human Values for Multi-Classifiers |
|
Wang, Yanyun | The University of Hong Kong |
Du, Dehui | East China Normal University |
Liu, Yuanhao | East China Normal University |
Keywords: Machine Learning, Deep Learning, AI and Applications
Abstract: While advanced classifiers have been increasingly used in real-world safety-critical applications, how to properly evaluate the black-box models given specific human values remains a concern in the community. Such human values include punishing error cases of different severity in varying degrees and making compromises in general performance to reduce specific dangerous cases. In this paper, we propose a novel evaluation measure named Meta Pattern Concern Score, based on the abstract representation of probabilistic prediction and an adjustable threshold for the concession in prediction confidence, to introduce human values into multi-classifiers. Technically, we learn from the advantages and disadvantages of two kinds of common metrics, namely the confusion matrix-based evaluation measures and the loss values, so that our measure is as effective as they are even under general tasks, and the cross entropy loss becomes a special case of our measure in the limit. Besides, our measure can also be used to refine model training by dynamically adjusting the learning rate. The experiments on four kinds of models and six datasets confirm the effectiveness and efficiency of our measure. A case study further shows that it can not only find the ideal model, reducing dangerous cases by 0.53% while sacrificing only 0.04% of training accuracy, but also refine the learning rate to train a new model that on average outperforms the original one, with a 1.62% lower value of the measure itself and 0.36% fewer dangerous cases.
|
|
19:00-20:00, Paper We-S5VT5.4 | Add to My Program |
Deep Reinforcement Learning for Large-Scale TSP Graph |
|
Yang, Hua | Tsinghua University |
Keywords: Machine Learning, Optimization and Self-Organization Approaches, Deep Learning
Abstract: In the last few years, the Transformer network architecture has performed better than both Convolutional Neural Networks (CNN) and Recursive Neural Networks (RNN). The Vision Transformer is better than CNN in Computer Vision, and the original Transformer is far ahead in Natural Language Processing. Nevertheless, in combinatorial optimization, the Transformer can barely handle some problems such as the large-scale Traveling Salesman Problem (TSP). Therefore, we design a more straightforward Transformer-based network structure, termed TSP Transformer, to deal with large-scale Traveling Salesman Problems. To better handle tasks in combinatorial optimization, we have made improvements to the Transformer network structure. We train the TSP Transformer to predict a distribution over different city permutations given the coordinates of a set of city nodes as input, use the negative tour length as the reward, and optimize the network parameters using a policy gradient method. The extensive experimental results show that the TSP Transformer network structure can increase the effect by five times compared with the previous work of other authors, and the optimal ratio gap has been reduced from 1.22% to 0.24%.
|
|
19:00-20:00, Paper We-S5VT5.5 | Add to My Program |
FBL-BP: Byzantine-Resilient and Privacy-Preserving Federated Broad Learning |
|
Cheng, Siyao | Capital Normal University |
Ren, Chang-E | Capital Normal University |
Keywords: Machine Learning, Neural Networks and their Applications, AI and Applications
Abstract: In response to the growing demand for clients’ privacy protection, federated learning frameworks often need appropriate mechanisms to better protect clients’ privacy, such as having the client upload a blinded local model instead of the original local model to the server, which makes the real value of the local model unobservable to the server. Although privacy is protected, it becomes a huge challenge for the server to identify which clients are Byzantine. We propose FBL-BP, a federated learning framework based on broad learning that simultaneously protects clients’ privacy and is robust against Byzantine attacks. We apply differential privacy techniques to perturb the local models of the clients, which protects clients’ privacy. The server receives the perturbed local models and then guarantees the Byzantine resilience of the global model through an outlier removal mechanism based on cosine similarity. Finally, experimental results show that FBL-BP has significant Byzantine robustness and satisfactory accuracy, and takes less time than traditional methods.
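A rough sketch of the two ingredients named in this abstract, noise perturbation on the client and cosine-similarity outlier removal on the server, might look as follows. The coordinate-wise median reference, the threshold of 0, the noise scale, and all function names are assumptions for illustration. The noise here is not calibrated to any privacy budget, and the paper's actual mechanism may differ.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def perturb(update, sigma, rng):
    """Client side: add Gaussian noise to a local model (illustrative scale)."""
    return [w + rng.gauss(0.0, sigma) for w in update]


def filter_and_average(updates, threshold=0.0):
    """Server side: drop updates dissimilar to a median reference, then average."""
    dim = len(updates[0])
    # Reference direction: coordinate-wise (upper) median of all updates.
    ref = [sorted(u[j] for u in updates)[len(updates) // 2] for j in range(dim)]
    kept = [u for u in updates if cosine(u, ref) > threshold]
    if not kept:  # fall back to all updates rather than divide by zero
        kept = updates
    avg = [sum(u[j] for u in kept) / len(kept) for j in range(dim)]
    return avg, len(kept)


rng = random.Random(0)
honest = [perturb([1.0, 2.0, 3.0], 0.05, rng) for _ in range(4)]
byzantine = [[-1.0, -2.0, -3.0]]  # one client sends a sign-flipped model
aggregate, kept_count = filter_and_average(honest + byzantine)
```

With four honest clients and one sign-flipped update, the median reference stays close to the honest direction, so only the flipped update is removed before averaging.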
|
|
19:00-20:00, Paper We-S5VT5.6 | Add to My Program |
DF4RT: Deep Forest for Requirements Traceability Recovery between Use Cases and Source Code |
|
Wang, Bangchao | School of Computer Science and Artificial Intelligence, Wuhan Te |
Deng, Yang | School of Computer Science and Artificial Intelligence, Wuhan Te |
Li, Xingfu | Wuhan Textile University |
Wan, Hongyan | School of Computer Science and Artificial Intelligence, Wuhan Te |
Keywords: Machine Learning
Abstract: Many supervised learning techniques have been applied to requirements traceability recovery (RTR). However, the performance of these supervised learning techniques is still far from satisfactory, and exploring a more effective model is necessary. This paper proposes a new deep forest model for RTR (DF4RT) with a novel composition to improve the model’s performance. The proposed model incorporates three feature representation methods, which cover not only information retrieval and query quality but also distance. The DF4RT model is evaluated on four open-source projects and compared with nine state-of-the-art tracing approaches. The experimental results show that DF4RT improves precision by 94%, recall by 58%, and F-measure by 72% on average. We also conduct ablation experiments to explore the impact of the different features. This is the first time that deep forest has been employed in requirements traceability. Our approach is effective for RTR, with good interpretability, few parameters, and good performance on small-scale data.
|
|
19:00-20:00, Paper We-S5VT5.7 | Add to My Program |
A Systematic Mapping Study of Machine Learning Techniques Applied to Software Traceability |
|
Wang, Bangchao | School of Computer Science and Artificial Intelligence, Wuhan Te |
Li, Xingfu | Wuhan Textile University |
Wan, Hongyan | School of Computer Science and Artificial Intelligence, Wuhan Te |
Deng, Yang | School of Computer Science and Artificial Intelligence, Wuhan Te |
Keywords: Machine Learning
Abstract: Context: Software traceability (ST) refers to capturing associations among various artifacts. There has been growing interest in applying machine learning (ML) techniques to ST. Objective: The purpose of this work is to present a comprehensive review of the state-of-the-art progress at the intersection of ML and ST. Method: A systematic mapping study (SMS) is conducted. A total of 965 citations are retrieved from 2013 to 2022, among which 37 studies are selected as primary studies. Result: 32 ML technologies and 9 enhancement strategies for generating trace links have been identified. In addition, 90 datasets and 16 measures that are applied to evaluate the efficacy of the ML-based tracing techniques have been summarized. The overall reproducibility of these primary studies is at a medium level. Conclusion: We have found that ML is playing a positive role in improving the accuracy and efficiency of ST. However, there are still some challenges, such as reproducibility. Hence, researchers are encouraged to pay more attention to standardization to improve the reproducibility of studies.
|
|
We-S5VT6 Virtual Session, Room T6 |
Add to My Program |
Machine Learning and Machine Vision |
|
|
|
19:00-20:00, Paper We-S5VT6.1 | Add to My Program |
Capture More Structured Context by Vision Transformers for Free-Hand Sketch Recognition |
|
Guo, Weirong | Wuhan University |
Liu, Sirui | Wuhan University |
Yu, Yaoxiang | Wuhan University |
Cai, Bo | Wuhan University |
Keywords: Machine Vision, Image Processing and Pattern Recognition, Deep Learning
Abstract: Free-hand sketch recognition presents unique challenges that require careful consideration of spatial and temporal properties. Convolutional neural networks, which are widely used for sketch recognition, have limited ability to extract structured context, leading to insufficient learning of spatial properties. In this study, we introduce Vision Transformers to the sketch recognition domain to address this issue. To the best of our knowledge, this is the first attempt to evaluate the applicability and effectiveness of Vision Transformers for sketch recognition. We also address the problem of semantically different categories appearing similar in sketches. To overcome this challenge, we propose a novel loss function for sketch recognition, termed Sketch Online Label Smoothing (SketchOLS). Our experiments on the QuickDraw dataset demonstrate that the proposed approach is a robust and effective solution for sketch recognition. Remarkably, our method outperforms existing models even without the injection of additional temporal information. These results highlight the potential of our approach to enhance free-hand sketch recognition tasks.
|
|
19:00-20:00, Paper We-S5VT6.2 | Add to My Program |
SAMI: A Structure-Aware Multi-Partition Embedding Interaction Model for Accurate Link Prediction in Knowledge Graphs |
|
Gao, Shuai | Qilu University of Technology (Shandong Academy of Sciences) |
Li, Ming | Shandong University of Traditional Chinese Medicine |
Zhao, Jing | Qilu University of Technology (Shandong Academy of Sciences) |
Shi, Junkang | Qilu University of Technology (Shandong Academy of Sciences) |
Keywords: Representation Learning, Deep Learning, Neural Networks and their Applications
Abstract: Knowledge graph embedding (KGE), which applies representation learning to represent entities and relationships in knowledge graphs (KGs), has attracted significant attention from researchers due to its potential applications in various domains. However, most existing KGE methods rely on a single kind of semantic information and therefore fail to capture the complex structural information in KGs. In this paper, we introduce a novel KGE method called the Structure-Aware Enhanced Multi-Partition Embedding Interaction (SAMI) model. SAMI leverages both a graph attention network and tensor decomposition to learn expressive and structurally enhanced representations for KGs. Specifically, it uses graph attention layers as an encoder to aggregate nodes’ features in a neighborhood and utilizes an Enhanced Multi-Partition Embedding Interaction (EMEI) as a decoder to learn independent local features. SAMI shows impressive results on several popular datasets compared with baseline methods in terms of both Mean Reciprocal Rank (MRR) and Hits@K.
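For readers unfamiliar with the two link prediction metrics named above, the following sketch computes them from a list of ranks, using their standard definitions. Each rank is the 1-based position of the correct entity among the scored candidates. The example ranks are made up for illustration and are not results from the paper.

```python
def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all test triples (ranks are 1-based)."""
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at_k(ranks, k):
    """Fraction of test triples whose correct entity is ranked in the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)


# Hypothetical ranks of the correct entity for four test triples.
ranks = [1, 2, 4, 10]
mrr = mean_reciprocal_rank(ranks)  # (1 + 1/2 + 1/4 + 1/10) / 4
hits3 = hits_at_k(ranks, 3)        # two of the four ranks are <= 3
hits10 = hits_at_k(ranks, 10)      # all four ranks are <= 10
```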
|
|
19:00-20:00, Paper We-S5VT6.3 | Add to My Program |
LocoNeRF: A NeRF-Based Approach for Local Structure from Motion for Precise Localization |
|
Nenashev, Artem | Skolkovo Institute of Science and Technology (Skoltech) |
Kurenkov, Mikhail | Skolkovo Institute of Science and Technology |
Potapov, Andrei | Skolkovo Institute of Science and Technology |
Zhura, Iana | Skolkovo Institute of Science and Technology |
Katerishich, Maksim | Skolkovo Institute of Science and Technology |
Tsetserukou, Dzmitry | Skoltech |
Keywords: Machine Vision, Deep Learning, Neural Networks and their Applications
Abstract: Visual localization is a critical task in mobile robotics, and researchers are continuously developing new approaches to enhance its efficiency. In this article, we propose a novel approach to improve the accuracy of visual localization using Structure from Motion (SfM) techniques. We highlight the limitations of global SfM, which suffers from high latency, and the challenges of local SfM, which requires large image databases for accurate reconstruction. To address these issues, we propose utilizing Neural Radiance Fields (NeRF), as opposed to image databases, to cut down on the space required for storage. We suggest that sampling reference images around the prior query position can lead to further improvements. We evaluate the accuracy of our proposed method against ground truth obtained using LiDAR and Advanced Lidar Odometry and Mapping in Real-time (A-LOAM), and compare its storage usage against local SfM with COLMAP in the conducted experiments. Our proposed method achieves an accuracy of 0.068 meters relative to the ground truth, slightly worse than the most advanced method, COLMAP, which achieves 0.022 meters. However, the size of the database required for COLMAP is 400 megabytes, whereas the size of our NeRF model is only 160 megabytes. Finally, we perform an ablation study to assess the impact of using reference images from the NeRF reconstruction.
|
|
19:00-20:00, Paper We-S5VT6.4 | Add to My Program |
Deep Moore-Penrose Inverse Network with Refinement Strategy for One-Class Classification |
|
Gao, Junna | Beijing University of Technology |
Kong, Dehui | Beijing University of Technology |
Yin, Baocai | Beijing University of Technology |
Lin, Weisi | Nanyang Technological University |
Zhang, Wandong | Western University |
Keywords: Representation Learning, Machine Learning, Deep Learning
Abstract: Multilayer least-square-based one-class classification networks (MLS-OCNs) have gained great attention for the purpose of identifying anomalies and outliers. However, many MLS-OCNs encounter the issue of loosely connected feature coding because they use two separate mechanisms for feature encoding and final pattern recognition. This paper proposes a solution to this problem by introducing a multilayer algorithm called deep Moore-Penrose inverse network with refinement (DMPINR). In particular, DMPINR employs an end-to-end learning process based on the Moore-Penrose inverse (MPI) to identify optimal latent space and classify objects simultaneously. To enhance the robustness of representations, the DMPINR technique pulls back the residual error from the output layer to the hidden layers sequentially, recalculating the parameters of these hidden layers using MPI. The experimental results on ten popular OCC datasets demonstrate that the proposed approach outperforms many existing MLS-OCNs in G-Mean and F_1 scores.
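The two evaluation scores mentioned above can be sketched as follows. G-Mean is taken here as the geometric mean of sensitivity and specificity, which is the common definition in imbalanced and one-class classification. The paper's exact definition is not given in the abstract, so this is an assumption, and the labels are made up for illustration.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp, fp, tn, fn


def g_mean(tp, fp, tn, fn):
    """Geometric mean of sensitivity (TPR) and specificity (TNR)."""
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical labels: 1 = target class, 0 = outlier.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]
tp, fp, tn, fn = confusion_counts(y_true, y_pred)
gm = g_mean(tp, fp, tn, fn)
f1 = f1_score(tp, fp, fn)
```

Unlike accuracy, G-Mean drops to zero when either class is entirely misclassified, which is why it is often reported for one-class problems.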
|
|
19:00-20:00, Paper We-S5VT6.5 | Add to My Program |
Multi-View Attention Learning for Residual Disease Prediction of Ovarian Cancer |
|
Gao, Xiangneng | Southern University of Science and Technology |
Ruan, Shulan | University of Science and Technology of China |
Shi, Jun | Anhui Province Key Laboratory of Big Data Analysis and Applicati |
Hu, Guoqing | Department of Radiology, the First Affiliated Hospital of USTC |
Wei, Wei | Department of Radiology, the First Affiliated Hospital of USTC |
Keywords: Neural Networks and their Applications, Machine Vision, Deep Learning
Abstract: In the treatment of ovarian cancer, precise residual disease prediction is significant for clinical and surgical decision-making. However, traditional methods are either invasive (e.g., laparoscopy) or time-consuming (e.g., manual analysis). Recently, deep learning methods have been widely applied to the automatic analysis of medical images. Despite the remarkable progress, most of them underestimate the importance of 3D image information about the disease, which might limit performance in residual disease prediction, especially on small-scale datasets. To this end, we propose a novel Multi-View Attention Learning (MuVAL) method for residual disease prediction, which focuses on the comprehensive learning of 3D CT images in a multi-view manner. Specifically, we first obtain multiple views of the 3D CT images from the transverse, coronal, and sagittal planes. To better represent the image features in a multi-view manner, we further leverage an attention mechanism to help find the more relevant slices in each view. Extensive experiments on a dataset of 111 patients show that our method outperforms existing deep-learning methods.
|
|
19:00-20:00, Paper We-S5VT6.6 | Add to My Program |
A Dynamic Global Semantic Fusion GNN Model for Commonsense Question Answering |
|
Li, Jinbao | Qilu University of Technology (Shandong Academy of Sciences) |
Wang, Guangchen | Qilu University of Technology (Shandong Academy of Sciences) |
Tian, Cheng | Qilu University of Technology (Shandong Academy of Sciences) |
Liu, Song | Qilu University of Technology (Shandong Academy of Sciences) |
Keywords: Deep Learning, Knowledge Acquisition, Neural Networks and their Applications
Abstract: Commonsense question answering (CSQA) is a challenging learning task that aims to give correct answers to commonsense questions. CSQA models combining large pre-trained language models with knowledge graphs have been proposed to perform one-way or two-way information fusion to enhance their commonsense reasoning ability. However, existing CSQA models only fuse local information at the word level, ignoring global semantic information fusion. Furthermore, current CSQA models often introduce noise nodes when constructing the knowledge subgraph. In addition, existing methods neglect the edge information in message aggregation. To address these shortcomings, we propose a novel CSQA model named MDE-QA. In our model, we design a multi-layer attention fusion module to bidirectionally fuse the word-level local information and global semantic information of the question context and knowledge subgraph. Moreover, we design a dynamic graph neural network module that uses an improved GAT and aggregates edge information to form dynamic subgraphs, which alleviate the interference of noise nodes on reasoning and enhance the commonsense reasoning ability of our model. Finally, we evaluate our model on the CommonsenseQA and OpenBookQA datasets and compare it with other baseline models.
|
|
19:00-20:00, Paper We-S5VT6.7 | Add to My Program |
Exemplar-Free Continual Learning in Vision Transformers Via Feature Attention Distillation |
|
Dai, Xiaoyu | Qilu University of Technology (Shandong Academy of Sciences) |
Cheng, Jinyong | Qilu University of Technology |
Wei, Zhonghe | Qilu University of Technology (Shandong Academy of Sciences) |
Du, Baoyu | Qilu University of Technology (Shandong Academy of Sciences) |
Keywords: Machine Vision, Image Processing and Pattern Recognition, Machine Learning
Abstract: In this paper, we propose a new approach for continual learning based on Vision Transformers (ViTs). The purpose of continual learning is to address the catastrophic forgetting problem. One method for preventing the forgetting of previous tasks is exemplar replay. However, exemplar replay has limitations such as limited memory capacity, large storage requirements, and difficulty adapting to new tasks, which restrict its practical application. Therefore, we adopt knowledge distillation to implement exemplar-free continual learning. This approach transfers the knowledge of old tasks to the model's prior knowledge, helping the model to learn new tasks better without increasing the storage burden. Based on this, we propose a new method for training the ViT model called Feature Attention Distillation (FAD). Specifically, we introduce a feature attention mechanism into the knowledge distillation model. By passing attention information between the teacher and student models, it helps the student model better retain meaningful feature representations and attention distributions while learning new tasks, thereby improving the learning efficiency and performance of the model. In addition, we introduce a new normalization method called Continual Normalization (CN) into the student model. This method dynamically calculates the sample mean and variance of the current and historical tasks, allowing the student network to adapt better to the feature distribution changes between different tasks, thus improving the model's generalization ability and robustness. Extensive experiments on the CIFAR100 and ImageNet-32 datasets show that our exemplar-free method is competitive in performance compared to rehearsal-based ViT methods.
|
|
We-S5VT7 Virtual Session, Room T7 |
Add to My Program |
Medical Informatics or Biometrics and Applications |
|
|
|
19:00-20:00, Paper We-S5VT7.1 | Add to My Program |
Semi-Supervised Carotid Plaque Image Classification Using Feature Correction and Pseudo-Label Balance Correction |
|
Yu, Wenjie | Hubei University of Technology |
Gan, Weiyan | Hubei University of Technology |
Wang, Furong | Huazhong University of Science and Technology |
Yang, Zhi | Hubei University of Technology, School of Computer Science |
Shi, Ming | Hubei University of Technology |
Zhou, Ran | Hubei University of Technology |
Keywords: Medical Informatics
Abstract: Carotid plaque classification is of great significance for the diagnosis and treatment of carotid artery disease and the prediction of ischemic stroke. However, the ultrasound image structure of carotid plaques is complex, and manual labeling is time-consuming, which results in many unlabeled images. Semi-supervised methods are well suited to this field because they use unlabeled samples to improve model performance. Most existing semi-supervised methods are based on pseudo-labels and consistency regularization; however, these methods produce erroneous pseudo-labels during sample selection and an imbalanced distribution of pseudo-labels due to cognitive biases in model training. To solve these problems, we propose a semi-supervised method based on feature correction and pseudo-label balance correction. The feature correction module uses the features of labeled data to correct the predicted probabilities of unlabeled data and improve the discrimination of pseudo-labels. The pseudo-label balance correction module dynamically adjusts the sample weight in the loss function according to the number of pseudo-labels to alleviate the problem of data imbalance. Evaluated on 1270 ultrasound carotid plaque images, our method achieved superior performance to other state-of-the-art methods (i.e., MixMatch, FixMatch, and FlexMatch) with 10%, 30%, and 50% labeled training data. Our method demonstrated accurate carotid plaque classification using a small amount of labeled training data, potentially benefiting clinical practice and trials.
|
|
19:00-20:00, Paper We-S5VT7.2 | Add to My Program |
SAL-Net: Semi-Supervised Auxiliary Learning Network for Carotid Plaques Classification |
|
Fu, Lingchao | Hubei University of Technology |
Gan, Haitao | Hubei University of Technology |
Gan, Weiyan | Hubei University of Technology |
Yang, Zhi | Hubei University of Technology, School of Computer Science |
Zhou, Ran | Hubei University of Technology |
Wang, Furong | Huazhong University of Science and Technology |
Keywords: Medical Informatics
Abstract: The analysis of plaque regions in carotid ultrasound images is crucial for determining and assessing the harmfulness of carotid plaques. Carotid ultrasound images provide both the location and status information of plaques, which can help diagnose carotid atherosclerosis. Despite this, the relationship between various plaque tasks has been disregarded in prior research, and due to the significant expense associated with manual image segmentation, there is a shortage of datasets that contain a substantial quantity of manually annotated plaque regions. In this paper, a semi-supervised learning algorithm is proposed to reduce reliance on annotated data. Because the plaque classification task and the plaque region semantic segmentation task are correlated, an auxiliary learning method named SAL-Net is proposed. The primary task of this model is supervised plaque classification, while the auxiliary task is semi-supervised semantic segmentation. The experiments are carried out on a carotid ultrasound image dataset, and the results show that SAL-Net can effectively utilize the correlation between different tasks to improve the performance of the model.
|
|
19:00-20:00, Paper We-S5VT7.3 | Add to My Program |
Multi-Feature Enhanced Multimodal Action Recognition |
|
Zhu, Qingmeng | Science & Technology on Integrated Information System Laboratory |
Gu, Ziyin | Chinese Academy of Sciences |
He, Hao | Chinese Academy of Sciences |
Yu, ZhiPeng | ISCAS |
Li, Chen | Institute of Software Chinese Academy of Sciences |
Zhao, Tianci | Institute of Software, Chinese Academy of Sciences |
Xu, Zixuan | Institute of Software Chinese Academy of Sciences |
Keywords: Biometrics and Applications, Human Perception in Multimedia, Visual Analytics/Communication
Abstract: Action recognition applications achieve impressive success in various fields, yet existing approaches cannot make good use of different information flows. To tackle this issue, multimodal action recognition exhibits remarkable potential for strengthening the video representation. However, direct fusion may not enable the model to learn appropriate knowledge. To leverage extra feature information flows to improve model performance, this paper proposes MFE, a multi-feature enhanced multimodal action recognition model. MFE introduces a guiding tag to explicitly guide the different extra feature information flows and designs a feature fusion module to fuse them, enhancing the hidden representation through more semantic supervision. The experimental results show that MFE achieves better or comparable accuracy relative to some advanced video action recognition models on several action recognition datasets.
|
|
19:00-20:00, Paper We-S5VT7.4 | Add to My Program |
Handling Data Distortion in IM Based on Network Embedding and Ant Colony Optimization |
|
Li, Jin-Yong | South China University of Technology |
Shi, Xuanli | South China University of Technology |
Chen, Wei-Neng | South China University of Technology |
Keywords: Complex Network, Evolutionary Computation, Metaheuristic Algorithms
Abstract: Online social networks have greatly facilitated the dissemination of information. The influence maximization (IM) problem is a fundamental problem in research on social network propagation. IM aims to find a subset of nodes with the maximum spreading influence. However, traditional research on IM ignores the reliability of data. Because data are uncertain, the connections between network nodes used in IM need to be verified as reliable. To solve this problem, this paper proposes a network embedding prediction model to analyze data reliability and an improved ant colony algorithm for IM. First, we predict the true connections between network nodes through the network embedding prediction model. EA-NE community network embedding is used to represent the connections obtained from unreliable data as vectors. Based on these vectors, a back-propagation neural network predicts the true connectivity relationships. Second, we develop an ant colony algorithm with specially designed heuristics and an evaluation function. The heuristics are calculated from the seed set influence evaluation and an influence overlap penalty term, which is calculated by cosine similarity based on vectors obtained from the network embedding. The evaluation function also takes these two items into account.
|
|
19:00-20:00, Paper We-S5VT7.5 | Add to My Program |
Deep Feature Selection Algorithm for Classification of Gastric Cancer Subtypes |
|
Si, Chengkun | Qilu University of Technology (Shandong Academy of Sciences) |
Zhao, Long | Qilu University of Technology |
Liu, Jiao | Qilu University of Technology |
Keywords: Medical Informatics, Biometrics and Applications
Abstract: Globally, gastric cancer is the third most deadly cancer. More than one million new patients are diagnosed with gastric cancer every year, and the cure rate is extremely low. The probability of successful treatment of gastric cancer and the possibility of recovery are closely related to the subtype of cancer. To improve patients' quality of life, it is of great practical significance to correctly distinguish clinically relevant cancer subtypes based on omics data so that gastric cancer patients can receive precise treatment. However, gene expression data has tens of thousands of feature dimensions, and only a few of them help predict gastric cancer subtype classification. Moreover, the important features selected by existing feature selection algorithms do not achieve good results in predicting gastric cancer subtypes. To this end, this paper proposes, for the first time, a Gradient Boosting Deep Feature Selection (GBDFS) algorithm for gastric cancer subtype classification, to reduce the feature dimension of omics data and improve the classification accuracy of gastric cancer subtypes. A self-built deep neural network classifier was used to evaluate the prediction accuracy and cost of gastric cancer subtype classification before and after feature selection, which demonstrated the effectiveness of the algorithm. In addition, in a comparison with eight existing feature selection algorithms, GBDFS achieves an accuracy as high as 99.115%. The robustness of the algorithm is verified using two classification algorithms, support vector machines and deep neural networks. The best low-dimensional feature subset of GBDFS was selected as a biomarker for gastric cancer subtype classification, with 24 gene features that effectively distinguish gastric cancer subtypes. Finally, bioinformatics analyses, including survival analysis, Gene Ontology (GO) enrichment, and biological pathway analysis, are performed.
|
|
19:00-20:00, Paper We-S5VT7.6 | Add to My Program |
Privacy-Preserving Remote Heart Rate Estimation from Facial Videos |
|
Gupta, Divij | Queen's University |
Etemad, Ali | Queen's University |
Keywords: Medical Informatics
Abstract: Remote Photoplethysmography (rPPG) is the process of estimating PPG from facial videos. While this approach benefits from contactless interaction, it relies on videos of faces, which often raises an important privacy concern. Recent research has revealed that deep learning techniques are vulnerable to attacks, which can result in significant data breaches, making deep rPPG estimation even more sensitive. To address this issue, we propose a data perturbation method that involves extraction of certain areas of the face with less identity-related information, followed by pixel shuffling and blurring. Our experiments on two rPPG datasets (PURE and UBFC) show that our approach reduces the accuracy of facial recognition algorithms by over 60%, with minimal impact on rPPG extraction. We also test our method on three facial recognition datasets (LFW, CALFW, and AgeDB), where our approach reduced performance by nearly 50%. Our findings demonstrate the potential of our approach as an effective privacy-preserving solution for rPPG estimation.
|
|
19:00-20:00, Paper We-S5VT7.7 | Add to My Program |
Using Attention and Multi-Scale Unet Approach to Continuous Blood Pressure Prediction |
|
Jianquan, Ouyang | Xiangtan University |
Yihui, Tan | Xiangtan University |
Xianjun, Tang | Xiangtan University |
Keywords: Biometrics and Applications, Medical Informatics, Design Methods
Abstract: Blood pressure prediction is a crucial tool in preventing cardiovascular-related diseases. Therefore, improving the accuracy of blood pressure prediction plays a critical role in disease prevention. Previous models for blood pressure prediction have faced challenges related to inadequate feature extraction and insignificant effective information mining. To address these issues, this paper proposes an improved Unet-based continuous blood pressure prediction method that effectively processes the spatial information of multi-scale feature maps and establishes long-term dependencies between multi-scale channels. The proposed model achieves high accuracy meeting the requirements of the AAMI standard and the BHS A Grade. Moreover, the mean absolute errors of systolic blood pressure (SBP) and diastolic blood pressure (DBP) are 3.41 mmHg and 2.58 mmHg, respectively, with standard deviations (STD) of 6.25 mmHg and 5.08 mmHg.
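The error statistics and the two standards cited in this abstract can be made concrete with a small sketch. It assumes the commonly quoted criteria: AAMI requires a mean error within ±5 mmHg with a standard deviation of at most 8 mmHg, and BHS Grade A requires at least 60%, 85%, and 95% of absolute errors within 5, 10, and 15 mmHg. Cohort-size requirements are ignored, the population standard deviation is an arbitrary choice, and the readings are made up.

```python
import statistics


def error_summary(true_bp, pred_bp):
    """Summarize prediction errors (in mmHg) for one blood pressure target."""
    errors = [p - t for p, t in zip(pred_bp, true_bp)]
    abs_errors = [abs(e) for e in errors]
    n = len(errors)
    return {
        "mae": sum(abs_errors) / n,
        "mean_error": statistics.fmean(errors),
        "std": statistics.pstdev(errors),  # population STD, an assumption
        "within_5": sum(1 for e in abs_errors if e <= 5) / n,
        "within_10": sum(1 for e in abs_errors if e <= 10) / n,
        "within_15": sum(1 for e in abs_errors if e <= 15) / n,
    }


def meets_aami(summary):
    """Simplified AAMI check: |mean error| <= 5 mmHg and STD <= 8 mmHg."""
    return abs(summary["mean_error"]) <= 5.0 and summary["std"] <= 8.0


def meets_bhs_grade_a(summary):
    """Simplified BHS Grade A check on cumulative absolute-error percentages."""
    return (summary["within_5"] >= 0.60
            and summary["within_10"] >= 0.85
            and summary["within_15"] >= 0.95)


# Hypothetical systolic readings (reference vs. predicted), in mmHg.
true_sbp = [120, 118, 125, 130]
pred_sbp = [122, 117, 121, 136]
summary = error_summary(true_sbp, pred_sbp)
```

Applied to the figures reported in the abstract, the reported standard deviations of 6.25 mmHg and 5.08 mmHg fall under the 8 mmHg limit assumed here.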
|
|
We-S5VT8 Virtual Session, Room T8 |
Add to My Program |
Deep Learning V-II |
|
|
|
19:00-20:00, Paper We-S5VT8.1 | Add to My Program |
Geometric Contrastive Learning for Heterogeneous Graphs Encoding |
|
Wang, Siheng | East China Normal University |
Cao, Guitao | East China Normal University |
Wu, Chunwei | East China Normal University |
Keywords: Deep Learning, Representation Learning, Neural Networks and their Applications
Abstract: Heterogeneous graphs can represent many network structures in the real world, and research on heterogeneous graph data has attracted increasing attention. Most existing approaches require additional label information to obtain meaningful node representations. However, labeling heterogeneous graphs is tedious and time-consuming, and low-quality labels harm the efficiency of models. In addition, the number of nodes increases exponentially with the distance from the root node, and it is difficult for the linearly expanding Euclidean space to match this growth rate. In this paper, we propose a unified framework that leverages the label independence of contrastive learning and the unique expressive ability of hyperbolic space to encode heterogeneous graphs. Specifically, we use the contrast mechanism to obtain semantic information in an unsupervised way. Meanwhile, we design hyperbolic encoders that are better suited to graph structure, to learn the latent information from heterogeneous graphs efficiently. Experiments on four real-world heterogeneous graph datasets demonstrate the competitive efficacy of the proposed method.
|
|
19:00-20:00, Paper We-S5VT8.2 | Add to My Program |
Method for Reducing MCI Misclassification Rate Based on Cross-Modal Prototype Generation |
|
Li, Jiyun | Donghua University |
Zhang, Yongmeng | Donghua University |
Qian, Chen | Donghua University |
Keywords: Deep Learning, AI and Applications, Neural Networks and their Applications
Abstract: Mild cognitive impairment (MCI) is an early stage of Alzheimer’s disease (AD), and its accurate detection is very important for early intervention. Because the between-group differences are small, MCI is easily misclassified. Inspired by the idea of metric learning, we propose a novel framework based on cross-modal prototype generation to reduce the misclassification rate of MCI. First, single-modal prototypes are generated from magnetic resonance images (MRI), and then the auxiliary modality is used to correct the original prototypes and generate new cross-modal prototypes. An adaptive mechanism that dynamically adjusts the proportion of the auxiliary modality data is added to the proposed framework to improve its practicability. Experiments show that the cross-modal prototype generation framework can significantly reduce the misclassification rate of MCI, by 40.3%.
|
|
19:00-20:00, Paper We-S5VT8.3 | Add to My Program |
Continual Learning Via Manifold Expansion Replay |
|
Xu, Zihao | East China Normal University |
Tang, Xuan | School of Communication & Electronic Engineering, East China Nor |
Shi, Yufei | Department of Medical Informatics and Zhongshan School of Medici |
Zhang, Jianfeng | Software Engineering Institute, East China Normal University |
Yang, Jian | School of Geospatial Information, Information Engineering Univer |
Chen, Mingsong | Software Engineering Institute, East China Normal University |
Wei, Xian | Software Engineering Institute, East China Normal University |
Keywords: Deep Learning, Transfer Learning, Representation Learning
Abstract: In continual learning, the learner learns multiple tasks in sequence, with data being acquired only once for each task. Catastrophic forgetting is a major challenge to continual learning. To reduce forgetting, some existing rehearsal-based methods use episodic memory to replay samples of previous tasks. However, in the process of knowledge integration when learning a new task, this strategy also suffers from catastrophic forgetting due to an imbalance between old and new knowledge. To address this problem, we propose a novel replay strategy called Manifold Expansion Replay (MaER). We argue that expanding the implicit manifold of the knowledge representation in the episodic memory helps to improve the robustness and expressiveness of the model. To this end, we propose a greedy strategy to keep increasing the diameter of the implicit manifold represented by the knowledge in the buffer during memory management. In addition, we introduce Wasserstein distance instead of cross entropy as the distillation loss to preserve previous knowledge. With extensive experimental validation on MNIST, CIFAR10, CIFAR100, and TinyImageNet, we show that the proposed method significantly improves accuracy in the continual learning setup, outperforming the state of the art.
|
|
19:00-20:00, Paper We-S5VT8.4 | Add to My Program |
A Hybrid Framework for Video Anomaly Detection Based on Semantic Consistency of Motion and Appearance |
|
Wang, Xingang | Qilu University of Technology (Shandong Academy of Sciences) |
Cao, Rui | Qilu University of Technology |
Zhang, Hong | Qilu University of Technology |
Zhou, Jinyan | Qilu University of Technology |
Keywords: Deep Learning, Image Processing and Pattern Recognition
Abstract: Video anomaly detection aims to identify abnormal events deviating from the expected behavior. Because the boundary between abnormal and normal behavior is difficult to define directly, and abnormal behavior is rare and random, video anomaly detection is challenging. This paper proposes a hybrid architecture, MAU-VAD, based on the premise that the semantic consistency of appearance features and motion information is high in normal events but low in anomalous events. First, we design a motion feature reconstruction module to reconstruct optical flow, which aims to record the stable motion information in normal events. Afterward, the motion information of the reconstructed optical flow and the appearance information of the corresponding original frame are extracted separately using a two-stream autoencoder, and the extracted motion features and appearance features are constrained and fused to generate predicted future frames. Since the reconstruction quality of optical flow directly affects the generation quality of the final future frames, the reconstructed optical flow of abnormal events has a greater impact on the quality of generated future frames, thus improving the efficiency of anomaly detection. Experimental results on three standard public datasets demonstrate the effectiveness of the method.
|
|
19:00-20:00, Paper We-S5VT8.5 | Add to My Program |
Helping Language Models Learn More: Multi-Dimensional Task Prompt for Few-Shot Tuning |
|
Weng, Jinta | University of Chinese Academy of Sciences |
Xu, Xiaofeng | Guangzhou University |
Fa, Daidong | School of Computer and Information, Qiannan Normal University Fo |
Zhang, Jiarui | School of Cyber Security, University of Chinese Academy of Scien |
Hu, Yue | School of Cyber Security, University of Chinese Academy of Scien |
Huang, Heyan | School of Computer Science and Technology, Beijing Institute Of |
Keywords: Deep Learning, Machine Learning, Knowledge Acquisition
Abstract: Large language models (LLMs) can be used as accessible and intelligent chatbots by constructing natural language queries and directly inputting the prompt into the model. However, different prompt constructions often lead to uncertainty in the answers and thus make it hard to utilize the specific knowledge of LLMs (like ChatGPT). To alleviate this, we use an interpretable structure to explain the prompt learning principle in LLMs, which verifies that the effectiveness of language models is determined by position changes of the task-related tokens. Therefore, we propose MTprompt, a multi-dimensional task prompt learning method based on the task-related object, summary, and task description information. By automatically building and searching for appropriate prompts, our proposed MTprompt achieves the best results in a few-shot setting on five different datasets. In addition, we demonstrate the effectiveness and stability of our method in different experimental settings and ablation experiments. In interaction with large language models, embedding more task-related information into prompts makes it easier to elicit the knowledge embedded in these models.
|
|
19:00-20:00, Paper We-S5VT8.6 | Add to My Program |
Inter-Modal Interactions Learning Based Entity Alignment |
|
Zhang, Zihao | Qilu University of Technology (Shandong Academy of Sciences) |
Sun, Tao | Qilu University of Technology (Shandong Academy of Sciences) |
Zhang, Xiang | Qilu University of Technology (Shandong Academy of Sciences) |
Yin, Xinyan | Qilu University of Technology (Shandong Academy of Sciences) |
Zheng, Hong Yan | Qilu University of Technology (Shandong Academy of Sciences) |
Zhang, Zhiping | Qilu University of Technology (Shandong Academy of Sciences) |
Liu, Hao | Qilu University of Technology (Shandong Academy of Sciences) |
Keywords: Deep Learning, Neural Networks and their Applications, Representation Learning
Abstract: Multi-modal entity alignment aims to identify equivalent entities between two individual multi-modal knowledge graphs, and this technique plays an essential role in integrating knowledge from different data sources. However, most previous works directly adopt simple concatenation or weighted sums as their fusion strategy, ignoring the inter-modal interactions of entities, which introduces potential noise from uni-modal features during the feature fusion stage. This noise arises because most previous works use separate encoders for each modality and lack inter-modal information interactions, which leads to highly similar but non-equivalent entities within each uni-modal feature space. Because this work follows the convention of a one-to-one alignment constraint, such noise is bound to impair the model’s performance. This paper proposes IMILEA, an Inter-modal Interactions Learning based Entity Alignment approach that explores inter-modal interactions to reduce the harm caused by potential noise in the feature fusion process. In addition, this paper proposes a negative sample weighting strategy, which improves the model’s robustness by increasing its attention to hard-to-distinguish negative samples. Experiments on two public datasets show that the proposed model achieves state-of-the-art performance.
|
|
19:00-20:00, Paper We-S5VT8.7 | Add to My Program |
Hierarchical Dynamic Graph Convolutional Networks with Feature Enhancement for Rumor Detection on Social Media |
|
Zhang, Xiang | Qilu University of Technology (Shandong Academy of Sciences) |
Sun, Tao | Qilu University of Technology (Shandong Academy of Sciences) |
Zhang, Zihao | Qilu University of Technology (Shandong Academy of Sciences) |
Yin, Xinyan | Qilu University of Technology (Shandong Academy of Sciences) |
Zheng, Hong Yan | Qilu University of Technology (Shandong Academy of Sciences) |
Zhang, Zhiping | Qilu University of Technology (Shandong Academy of Sciences) |
Liu, Hao | Qilu University of Technology (Shandong Academy of Sciences) |
Keywords: Deep Learning, Neural Networks and their Applications
Abstract: Social media is highly active and full of rumors, and information propagates widely. Capturing information about the rumor propagation process plays an important role in promoting automated rumor detection. However, existing methods ignore the rumor evolution and development process, resulting in the loss of details, which is not conducive to extracting key information in rumor detection. In this paper, we propose a novel hierarchical dynamic graph convolution network (H-DynGCN-Enh) to build a dynamic propagation graph for each news differentiation, while taking into account the bi-directional features of rumor propagation and dissemination to fully extract the key information. In addition, we introduce attention-based graph pooling to design a feature enhancement strategy that avoids redundant information and increases the interaction between comments and the original news. Finally, we use the multi-head self-attention mechanism to adjust the weight of dynamic information to obtain global information for detection. Results on two open real-world datasets demonstrate that our model outperforms the most advanced baseline, and the ablation experiment confirms the effectiveness of each module.
|
| |