Big Data and Social Computing Lab

Outlier Detection: A Python Library for Graph Outlier Detection (PyGOD)

Natural Language Processing: Exploiting the Massive User Generated Utterances for Intent Mining under Scarce Annotations

Spam Detection: Learning Dynamic and Robust Defenses Against Co-Adaptive Spammers

Deep Learning: With the ever-increasing volume of information gathered from online and offline sources, deep learning contributes significantly to a wide range of research tasks in which representative features can be learned and used with minimal hand-crafted feature engineering. Our research on deep learning focuses on knowledge discovery/extraction and representation learning on user-generated data such as online text queries, user reviews, and information/social networks.

Recommender System: With the ability to provide personalized suggestions, recommender systems have become an important tool in many web services for attracting and retaining users. We focus our research on building the next generation of recommender systems, which can better understand people’s needs. In particular, we aim to use deep learning to alleviate the cold-start problem, a common issue in recommender systems.
A Survey and Critique of Deep Learning on Recommender Systems by Lei Zheng.

Healthcare: The human brain is one of the most complicated biological structures in the known universe. It is very challenging to understand how it works, especially when disorders and diseases occur. Multiple data representations are usually involved, including multi-view biomarkers, neuroimaging tensor data, brain network data, and sequential user behavior data. We collaborate with the CoNECt Lab on various neuroimaging and healthcare projects.
-Computer-aided diagnosis: We focus on fusing heterogeneous data sources to assist diagnosis.
-Precision medicine: We aim to leverage deep learning techniques to provide personalized healthcare services.
-Mobile health: The increasing use of electronic forms of communication presents new opportunities in the study of healthcare, including the ability to investigate the manifestations of psychiatric diseases unobtrusively and in the setting of patients’ daily lives. We aim to study the connections between mood disorders and mobile phone usage.
A Review of Heterogeneous Data Mining for Brain Disorder Identification by Bokai Cao.

Social Network Analysis: Powered by cloud data infrastructure and MapReduce, social network platforms are gathering data on many aspects of our daily lives. Motivated by this trend, our research addresses interesting phenomena on social networks, including the following topics:
-Network structure and macro social pattern mining: magnet community detection, and social influence evaluation.
-Influence propagation and social activity mining: social sharing temporal pattern mining, spam detection, and social advertising.
-Role discovery: finding the most influential nodes.
Link Prediction across Heterogeneous Social Networks: A Survey by Jiawei Zhang.
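As a rough illustration of the role discovery topic above (finding the most influential nodes), the sketch below ranks nodes with PageRank computed by power iteration. PageRank is only one possible influence measure and is an assumption here, as are the function name `pagerank`, its parameters, and the toy "follows" edges.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Score the nodes of a directed graph by power-iteration PageRank."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without out-links is spread uniformly.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (1.0 - damping + damping * dangling) / len(nodes)
        new_rank = {node: base for node in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


# Hypothetical "follows" edges: (follower, followed).
follows = [("ann", "dan"), ("bob", "dan"), ("cat", "dan"), ("dan", "ann")]
scores = pagerank(follows)
most_influential = max(scores, key=scores.get)
```

In this toy graph three accounts follow `dan`, so `dan` receives the highest score. On real social networks the edge list would come from the platform data, and other centrality measures may fit a given notion of influence better.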

Learning from Multiple Data Sources: Multiple related data sources containing different types of features may be available for a given task. For instance, users’ profiles can be used to build recommendation systems; in addition, a model can use users’ historical behaviors and social networks to infer their interests in related products. It is desirable to use all available heterogeneous data sources collectively to build effective learning models, through approaches including transfer learning, crowdsourcing, and heterogeneous learning.

Multi-label Learning: Many real-world classification tasks involve multiple concepts instead of a single concept, and each data object can be assigned multiple concepts (class labels) simultaneously. Multi-label learning aims at building accurate classification models that can predict multiple concepts collectively for each object.
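To make the multi-label setting concrete, the sketch below represents each object's labels as a set and scores a set of predictions with two commonly used measures, Hamming loss and subset accuracy. The label names, the toy data, and the function names are illustrative assumptions, not part of any particular model described here.

```python
def hamming_loss(true_labels, predicted_labels, label_space):
    """Fraction of (object, label) decisions that are wrong."""
    # The symmetric difference holds labels that were missed or wrongly added.
    errors = sum(len(t ^ p) for t, p in zip(true_labels, predicted_labels))
    return errors / (len(true_labels) * len(label_space))


def subset_accuracy(true_labels, predicted_labels):
    """Fraction of objects whose whole label set is predicted exactly."""
    exact = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return exact / len(true_labels)


# Hypothetical label space and predictions for three documents.
label_space = {"sports", "politics", "finance"}
y_true = [{"sports"}, {"politics", "finance"}, {"finance"}]
y_pred = [{"sports"}, {"politics"}, {"finance", "sports"}]

loss = hamming_loss(y_true, y_pred, label_space)
accuracy = subset_accuracy(y_true, y_pred)
```

Here two of the nine label decisions are wrong, while only one of the three objects has its full label set predicted correctly, which shows how the two measures can diverge.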

Graph Mining: Graphs are increasingly important in modeling real-world data with complex structures. Our research on graph mining includes the following topics:
-Large graph database management: graph search and indexing on a database of large graphs, and on a single large network.
-Scalable machine mining on large graph(s): community detection, link inference, and collective classification.
-Subgraph pattern mining: finding and extracting useful information from graph structured data sets (e.g., molecular structure graphs) to discover significant features.
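As a small illustration of the link inference topic listed above, the sketch below scores every non-adjacent node pair in an undirected graph by the Jaccard similarity of their neighbor sets. This is a simple baseline chosen for illustration and an assumption here; the function name and the toy edge list are hypothetical.

```python
from itertools import combinations


def jaccard_link_scores(edges):
    """Score each non-adjacent node pair by neighbor-set Jaccard similarity."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    scores = {}
    for u, v in combinations(sorted(neighbors), 2):
        if v in neighbors[u]:
            continue  # The link already exists, so there is nothing to infer.
        shared = neighbors[u] & neighbors[v]
        union = neighbors[u] | neighbors[v]
        scores[(u, v)] = len(shared) / len(union)
    return scores


# Hypothetical undirected graph.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
scores = jaccard_link_scores(edges)
best_pair = max(scores, key=scores.get)
```

The pair `("a", "d")` ranks first because both nodes share the neighbors `b` and `c`. Scoring all pairs is quadratic in the number of nodes, so a scalable version would restrict candidates, for example to pairs within two hops.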

Heterogeneous Information Networks: Many real-world networks, such as social networks and information systems, involve a large number of components: multiple types of entities interconnected by different types of relations. We call these networks heterogeneous information networks, which are critical for modern information infrastructure.
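One way to picture such a network is as a list of typed edges, over which one counts paths that follow a fixed sequence of relation types (often called a meta-path in the literature, a term not used above). The sketch below is a minimal illustration under that assumption; the bibliographic entities, the relation names, and the `~` prefix for reverse traversal are all hypothetical.

```python
from collections import defaultdict


def count_meta_path(edges, start, relations):
    """Count path instances from `start` that follow `relations` in order.

    A relation prefixed with '~' is traversed from target back to source.
    """
    forward = defaultdict(list)
    backward = defaultdict(list)
    for source, relation, target in edges:
        forward[(source, relation)].append(target)
        backward[(target, relation)].append(source)

    counts = {start: 1}
    for relation in relations:
        next_counts = defaultdict(int)
        for node, count in counts.items():
            if relation.startswith("~"):
                reached = backward[(node, relation[1:])]
            else:
                reached = forward[(node, relation)]
            for other in reached:
                next_counts[other] += count
        counts = next_counts
    return dict(counts)


# Hypothetical typed edges: (source, relation, target).
edges = [
    ("alice", "writes", "p1"),
    ("bob", "writes", "p1"),
    ("bob", "writes", "p2"),
    ("carol", "writes", "p2"),
    ("p1", "published_in", "KDD"),
    ("p2", "published_in", "KDD"),
]

# Author -> Paper -> Author: who shares papers with bob, and how many?
coauthor_paths = count_meta_path(edges, "bob", ["writes", "~writes"])
```

The result maps each reachable author to the number of shared papers, including `bob` himself, who is reached once through each of his own papers.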

Mining Uncertain and Incomplete Data: Most real-world data today is neither certain nor complete, which poses a great challenge for applying conventional data mining methods. We aim to design effective models for knowledge discovery from uncertain and incomplete data.

Stream Mining: We design efficient real-time algorithms for continuous data streams, especially graph streams.
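A classic building block for this setting is reservoir sampling, which keeps a fixed-size uniform sample of a stream in a single pass and with bounded memory. The sketch below applies it to a stream of graph edges; it is a generic textbook technique shown for illustration, and the function name, the seed parameter, and the synthetic stream are assumptions.

```python
import random


def reservoir_sample_edges(edge_stream, k, seed=None):
    """Keep a uniform random sample of k edges from a stream in one pass."""
    rng = random.Random(seed)
    reservoir = []
    for index, edge in enumerate(edge_stream):
        if index < k:
            reservoir.append(edge)
        else:
            # Keep the new edge with probability k / (index + 1).
            slot = rng.randint(0, index)
            if slot < k:
                reservoir[slot] = edge
    return reservoir


# Hypothetical stream of 1,000 edges, consumed lazily as a generator.
stream = ((i, (i * 7 + 3) % 1000) for i in range(1000))
sample = reservoir_sample_edges(stream, k=10, seed=0)
```

Memory use depends only on `k`, not on the stream length, which is why samples like this are often used to approximate graph statistics when the full edge set cannot be stored.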

Privacy Preserving Data Publishing: Privacy-preserving data publishing provides methods and tools for publishing useful information while preserving data privacy.