Project Descriptions 2024

Understanding the Geographical Concentration of Birth Rates of Individuals with Down Syndrome at the County Level - Dr. Heidi Berger

This project focuses on better understanding the prevalence and distribution of Down syndrome in the United States. We want to understand whether Down syndrome births are geographically concentrated. If so, where? Is there a spatial process at play? Once we better understand where the need is, we can better assess how well resources for these individuals are positioned to serve that need.

This work will be done collaboratively with Dr. Drew Westberg, an economist at Coe College, and Dr. Brian Skotko, the director of the Down Syndrome Program at Massachusetts General Hospital and the director of DSC2U. It builds on work conducted in the Bryan Summer Research Program starting in 2016.

 

Data Augmentation Applied to Tabular Data - Dr. Marilyn Vazquez

Classification methods have become valuable tools in multiple sectors of society. Examples of classification applications include self-driving cars, ad targeting, fraud detection, face recognition, protein function prediction, and medical diagnosis. Because classification is so widely used, scientists have developed powerful machine-learning techniques for it. One issue with current state-of-the-art classification approaches, such as deep learning, is that they require large amounts of data. However, collecting sufficient data to create reliable models is not always possible. For example, collecting data from patients can be time-consuming and costly, or even impossible if they no longer want to participate in the data collection. In computer vision, researchers get around the lack of data by applying data augmentation approaches. Data augmentation refers to creating new data points without collecting any further data. For example, in computer vision, new images are created by rotating, scaling, flipping, or recoloring images from the original set. This process is applied so that machine learning techniques have large enough data sets to classify data accurately. We will develop and test our data augmentation methods with real tabular data, that is, data organized by rows and columns, to see how well our techniques preserve the intrinsic patterns in the data and whether classification accuracy improves.
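
To make the idea of augmenting tabular data concrete, here is a minimal sketch of one simple, generic approach: adding small Gaussian noise ("jitter") to copies of existing rows, scaled by each column's spread within the row's class. This is only an illustration and not the method the project will develop. The function name `augment_with_jitter`, its parameters `copies` and `scale`, and the toy data are all assumptions made for this example.

```python
import numpy as np

def augment_with_jitter(X, y, copies=2, scale=0.05, seed=0):
    """Return (X, y) extended with noisy copies of every row.

    Hypothetical helper for illustration only. For each class, noise is
    drawn per column with standard deviation equal to `scale` times that
    column's standard deviation within the class.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    new_X, new_y = [X], [y]          # keep the original rows first
    for label in np.unique(y):
        rows = X[y == label]
        col_std = rows.std(axis=0)   # per-column spread within this class
        for _ in range(copies):
            noise = rng.normal(0.0, 1.0, size=rows.shape) * (scale * col_std)
            new_X.append(rows + noise)
            new_y.append(np.full(len(rows), label))
    return np.vstack(new_X), np.concatenate(new_y)

# Toy table (made-up values): 4 rows, 2 numeric columns, 2 classes.
X = np.array([[5.1, 3.5], [4.9, 3.0], [6.2, 2.9], [6.7, 3.1]])
y = np.array([0, 0, 1, 1])
X_aug, y_aug = augment_with_jitter(X, y, copies=2)
print(X_aug.shape, y_aug.shape)  # (12, 2) (12,)
```

Jitter of this kind only makes sense for numeric columns, and it keeps class labels unchanged by construction. Whether the augmented rows preserve the patterns in the original table, and whether they help a classifier, would still need to be checked on real data.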