-
NFT.mine: An xDeepFM-based Recommender System for Non-fungible Token (NFT) Buyers
[pdf]
A non-fungible token (NFT) is a tradable unit of data stored on the blockchain that can be associated with a digital asset as a certificate of ownership. The past several years have witnessed exponential growth in the NFT market, which peaked in 2021 with more than $40 billion in trades. Despite this boom, most NFT-related studies focus on technical aspects such as standards, protocols, and security, whereas our study aims to develop a pioneering recommender system for NFT buyers. In this paper, we introduce NFT.mine, a recommender system based on the extreme deep factorization machine (xDeepFM), which performs real-time data collection, data cleaning, feature extraction, training, and inference. We used data from OpenSea, the most influential NFT trading platform, to test the performance of NFT.mine. Experiments showed that NFT.mine outperforms traditional models such as logistic regression, naive Bayes, and random forest, achieving higher AUC and lower cross-entropy loss, and that it outputs personalized recommendations for NFT buyers.
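As a rough illustration of the two metrics named in this abstract, the sketch below computes AUC and binary cross-entropy loss in plain Python. It is not the paper's evaluation code: the labels and the two score lists (`model_a`, `model_b`) are made-up toy values, and the function names are hypothetical.

```python
import math

def binary_cross_entropy(labels, probs, eps=1e-12):
    """Mean cross-entropy loss for 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)

def auc(labels, probs):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [p for y, p in zip(labels, probs) if y == 1]
    neg = [p for y, p in zip(labels, probs) if y == 0]
    wins = 0.0
    for pp in pos:
        for pn in neg:
            if pp > pn:
                wins += 1.0
            elif pp == pn:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy data (assumed for illustration): 1 = the user bought the item.
labels = [1, 0, 1, 0]
model_a = [0.9, 0.2, 0.7, 0.4]    # ranks every positive above every negative
model_b = [0.6, 0.5, 0.4, 0.55]   # ranks only half of the pairs correctly

print("AUC:", auc(labels, model_a), auc(labels, model_b))
print("loss:", binary_cross_entropy(labels, model_a),
      binary_cross_entropy(labels, model_b))
```

A model with higher AUC and lower cross-entropy, like `model_a` here, both ranks purchased items above unpurchased ones more often and assigns better-calibrated probabilities.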
-
Reinforcement Learning for Resilient Power Grids
[pdf]
Traditional power grid systems are becoming obsolete as natural disasters grow more frequent and extreme. Reinforcement learning (RL) is a promising approach to resilience, given its successful history in power grid control. However, most power grid simulators and RL interfaces do not support simulating a power grid under large-scale blackouts or when the network is divided into sub-networks. In this study, we proposed an updated power grid simulator built on Grid2Op, an existing simulator and RL interface, and experimented with limiting the action and observation spaces of Grid2Op. By testing with the DDQN and SliceRDQN algorithms, we found that reduced action spaces significantly improve training performance and efficiency. In addition, we investigated a low-rank neural network regularization method for deep Q-learning, one of the most widely used RL algorithms, in this power grid control scenario. The experiments demonstrated that, in the power grid simulation environment, adopting this method significantly increases the performance of RL agents.
-
Identifying Exoplanets with Machine Learning Methods: A Preliminary Study
[pdf]
The discovery of habitable exoplanets has long been a hot topic in astronomy. Traditional methods for exoplanet identification include the wobble method, direct imaging, and gravitational microlensing, which not only require a considerable investment of manpower, time, and money, but are also limited by the performance of astronomical telescopes. In this study, we proposed using machine learning methods to identify exoplanets. For supervised learning, we used the Kepler dataset collected by NASA from the Kepler Space Observatory to predict the existence of exoplanet candidates as a three-class classification task, using decision tree, random forest, naïve Bayes, and neural network models. For unsupervised learning, we used another NASA dataset consisting of confirmed exoplanet data to divide the confirmed exoplanets into clusters using k-means clustering. Our models achieved accuracies of 99.06%, 92.11%, 88.50%, and 99.79%, respectively, in the supervised learning task and obtained reasonable clusters in the unsupervised learning task.
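The k-means step mentioned in this abstract can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the study's pipeline: the two-dimensional points stand in for hypothetical scaled planet features, the `kmeans` helper is an invented name, and the centroids are naively initialized from the first `k` points so the run is deterministic.

```python
def kmeans(points, k, iters=20):
    """Minimal k-means: alternate nearest-centroid assignment and mean update."""
    # Naive deterministic initialization (assumption): first k points.
    centroids = [list(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid
        # (squared Euclidean distance).
        assignments = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignments

# Toy, made-up feature vectors forming two well-separated groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.2, 0.8), (8.8, 9.2), (0.9, 1.1), (9.1, 8.9)]
centroids, assignments = kmeans(points, k=2)
print(assignments)
print(centroids)
```

In practice the number of clusters and the feature scaling matter a great deal; here both are fixed by hand purely to keep the example short.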
-
Machine Learning Security: Can Defense against Centralized Backdoors Work on Distributed Backdoors?
[pdf]
Federated learning trains a global model in a distributed manner by aggregating local agents’ models; malicious agents can therefore inject backdoors into their local models to attack the global model, an attack referred to as a distributed backdoor. In this study, my objective is to determine whether the traditional approach to centralized backdoor defense also works against distributed backdoors. I reproduced the Model Poisoning attack, a representative pixel-patterned distributed backdoor attack against federated learning proposed by A. N. Bhagoji et al. (2019), on the MNIST and CIFAR-10 datasets. I then tested the effectiveness of Neural Cleanse, a generic defense against centralized backdoors proposed by B. Wang et al. (2019), by running it on the poisoned models. Neural Cleanse produced false-positive results, which demonstrates that, although it is an effective defense against centralized backdoors, it could not recognize distributed backdoors.
-
Educational Data Mining (EDM): Discovering Determinants of Better Academic Performance
[pdf]
The objective of this study is to use Educational Data Mining (EDM) techniques to discover the principal factors that affect students’ academic performance. We crawled a dataset of 10,279 samples from the China Education Panel Survey (CEPS) and grouped student-related and parent-related variables into three categories: demographic and family background information (Demographic), self-perceived willingness for education (Willingness), and perceived family interaction (Interaction). We then applied various EDM methods, such as linear regression, regression tree, and random forest, to the dataset. As the first attempt to conduct a comprehensive and quantitative investigation into the principal factors that influence Chinese junior high school students’ academic performance using a nationally representative survey, this study not only summarizes, explains, and compares the principal factors discovered by different EDM techniques, but also provides some insight into mitigating China’s educational inequality.
-
The Development of a Neural Network-based Solution for Intelligent Waste Recycling
[pdf]
This study aims to develop a pragmatic solution for intelligent garbage recycling that can be deployed on portable, real-time, and energy-efficient edge-computing devices. We proposed a novel YOLO-based neural network model with a Variational Autoencoder (VAE) to increase classification accuracy, accelerate computation, and reduce the model size, making it feasible in real-world garbage recycling scenarios. The model consists of a convolutional feature extractor, a convolutional predictor, and a decoder. After training, the model achieves an accuracy of 69.70% with a total of 32.1 million parameters and a processing speed of 60 frames per second (FPS), surpassing the performance of existing models such as YOLO v1 and Fast R-CNN.
-
Modeling of Potential Migration and Fishing of Scottish Mackerel and Herring
[pdf]
This project presents a solution to Problem A of the 2020 Mathematical Contest in Modeling and was awarded the Meritorious Winner prize (top 6% among 13,749 teams). To model the potential migration and fishing of Scottish mackerel and herring, we predicted changes in North Atlantic sea temperature from three authoritative datasets using the Holt-Winters seasonal forecasting method, visualized the data with heatmaps, and used a Markov process to simulate the potential future habitats of Scottish mackerel and herring probabilistically. Finally, based on the estimated migration of Scottish mackerel and herring, we developed an economic model that predicts future fishing profits through cost-benefit analysis.
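The Markov-process idea in this abstract can be illustrated with a small discrete-state sketch. Everything below is assumed for illustration and is not taken from the project: the three regions, the transition probabilities, the 50-year horizon, and the `step` helper are all hypothetical.

```python
def step(dist, transition):
    """Advance a probability distribution over regions by one Markov step."""
    n = len(dist)
    return [sum(dist[i] * transition[i][j] for i in range(n)) for j in range(n)]

# Hypothetical regions, ordered south to north.
regions = ["south", "central", "north"]

# Hypothetical yearly transition probabilities; transition[i][j] is the
# probability that a shoal in region i is found in region j the next year.
# Each row sums to 1.
transition = [
    [0.7, 0.3, 0.0],
    [0.1, 0.6, 0.3],
    [0.0, 0.2, 0.8],
]

# Start with the whole population in the southern region.
dist = [1.0, 0.0, 0.0]
for year in range(50):
    dist = step(dist, transition)

for name, prob in zip(regions, dist):
    print(f"{name}: {prob:.3f}")
```

With these made-up numbers the distribution drifts north and settles near a fixed long-run split; a temperature-driven model would presumably derive the transition probabilities from the forecast sea temperatures rather than fixing them by hand.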