This paper proposes a feature-mapping technique for statistically modeling unconventional data sets (UDS), such as social and behavioral networks, that consist of complex data objects; demonstrates the versatility of this UDS modeling with real-world datasets; and discusses generating synthetic data from UDS using an adversarial autoencoder (AAE) as the deep generative approach.
When a sufficient amount of training data is available, Machine Learning (ML) models show great promise for solving problems involving complex and dynamic patterns. Social and behavioral domains are rich with such challenging problems, with complex object data extracted from documents, surveys, etc., and represented in forms such as graphs and trees. However, many social and behavioral data sets are inherently sparse and incomplete. The same data field may be unavailable in different records of a data set due to different causes, e.g., because it was not measured, not known, or simply not applicable to that particular record. Furthermore, collection challenges, cost, lack of participation, small affected populations, etc., result in very small sets of data. The resulting unconventional datasets cannot be directly used with potent approaches such as machine learning. A technique to model and synthesize large sets of such complex data objects while maintaining the same statistical and topological characteristics of the original data helps overcome these challenges. The authors propose a novel feature-mapping technique to eliminate data inconsistencies and model data objects from unconventional datasets. The feature-mapped data objects are used to synthesize data using two likelihood approaches, i.e., multivariate Gaussian and regular vine copulas, and one generative adversarial approach using an adversarial autoencoder (AAE). They demonstrate the robustness of the proposed technique with three real-world datasets representing disparate domains and validate the performance of likelihood and deep-generative approaches with these object synthesis strategies. (Published Abstract Provided)