Research and Projects

Research in the lab focuses on strong proofs of concept in real-world applications. The world around us is diverse, and so is its data. MData lab focuses on harnessing this heterogeneity in modeling solutions to real-world challenges.

Current projects

Our projects shape and change the world around us by providing solutions to real-world problems. They are funded by governmental agencies and the private sector. Explore our projects.

Funded by NSF

Improving Human Discernment of Audio Deepfakes via Multi-level Information Augmentation


This project increases listeners’ discernment of audio deepfakes through augmentation of information, both technological and sociolinguistic. It establishes an innovative pathway for collaborative research across sociolinguistics, human-centered analytics, and data science, and lays the groundwork for future analyses of deepfakes that are broadly relevant across disciplines and informed by human behavioral perspectives. The project’s broader significance is to address the societal challenge of misinformation by generating insights that empower listeners, particularly college students, whose lives are indelibly shaped by technology, to decide how to evaluate the veracity and authenticity of information they encounter online. The project also improves understanding and modeling of how deepfakes are involved in spreading misinformation, and tracks how language technology is adapted for social harm or used in unethical ways.

The proposed work will increase listeners’ discernment of audio deepfakes through augmentation of information that draws upon integrated interdisciplinary knowledge and advances data augmentation as an important tool for deepfake detection. The objectives of the project are to: (1) Study and evaluate listener perceptions of audio deepfakes that have been created with varying degrees of linguistic complexity; (2) Study and evaluate the efficacy of training sessions that increase listeners’ sociolinguistic perceptual ability and improve their ability to discern deepfake audio content; (3) Augment the audio deepfake discernment via multi-level temporal and linguistic signatures, informed by training and linguistic labeling; (4) Evaluate the impact of augmented signature information on listener perceptions of audio deepfakes; (5) Create open-access online modules and materials with social science and data science student involvement to improve listeners’ discernment of audio cues on a wider public scale.


Funded by NSF

HDR Institute: HARP - Harnessing Data and Model Revolution in the Polar Regions



Data Science Pedagogy at Scale

Funded by NSF

Because data science is a comparatively new field, little is known about how pedagogical and curricular approaches work in this domain, particularly at the undergraduate level. As such, data science education presents an outstanding “case” in which to examine the efficacy of a range of existing practices in STEM education. This project will generate new knowledge about a data science curriculum and pedagogy designed to promote learning among diverse undergraduate students, many from groups underrepresented in STEM; inspire their involvement in research; and lay the groundwork for their future engagement and success in STEM courses. Through its continuous cycle of design, implementation, evaluation, and refinement, the project will advance the field of data science education; contribute to the empirical literature on social impact-based active learning, team-based learning, near-peer teaching (meet the Data Science Scholars affiliated with this grant), and student involvement in research; and generate new knowledge on strategies for recruitment and retention of groups underrepresented in data science and across STEM fields, including minorities, women, and first-generation college students.

Completed Projects

Ethical Considerations in Data Science Pedagogy

Funded by Hrabowski Innovation Fund

This project will develop a pedagogical module to infuse ethics into data science and related curricula. PI Janeja will work with faculty partners across the campus to design lectures (in person and online) that discuss ethical considerations in data science, with a particular focus on decision making throughout the data life cycle. Students will be given a survey before the ethics module is discussed in class, including scenarios of decision making during the development of a data science project, and a post-survey after the module has been discussed. This before-and-after evaluation will help us gauge the success of the project. We plan to leverage our experience in such pedagogical evaluation to demonstrate the effectiveness of this approach. Infusing ethics into data science curricula is a recognized need, as is evident in various national discussions. In addition to the PIs from the IS department, this proposal includes partnerships from across campus. Colleagues from DPS and other academic units involved in data science education will also be invited to participate in the roundtables and to use the module in their own classes. We anticipate that this project will contribute effectively to discussions of ethical considerations in data science curricula and give UMBC leadership opportunities in this area for engaging with partners across the country in the future.

Spatial Temporal Linkages

Funded by US Army Corps

Relationships between spatial datasets from multiple domains are not always apparent even though they exist all around us. Domain here refers to distinct applications or geo-processes generating spatial datasets. For example, in a spatial location there may be relationships between air toxicity and disease prevalence that are not necessarily apparent. This project was geared towards three key tasks: discovery of single-domain spatio-temporal anomalies, association discovery between the anomalies, and discovery of multi-domain anomalies. We also developed novel metrics for correlation evaluation.

We used single-domain anomalies and considered overlaps among them to find multi-domain anomalies. In addition, we considered a master data management approach in which we coalesce the data from multiple domains and mine for associations across the multi-domain datasets. We also investigated methods for managing time series data, which is prevalent in such spatio-temporal analysis. For this, we investigated temporal interpolation and evaluated multiple interpolation methods.
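As a rough illustration of the overlap idea described above, the following Python sketch treats each domain's anomalies as a set of (location, time) cells and reports the cells flagged in more than one domain. It is only a minimal sketch under stated assumptions, not the project's actual method: the function name `multi_domain_anomalies`, the `min_domains` threshold, the cell representation, and the sample data are all hypothetical.

```python
from collections import defaultdict


def multi_domain_anomalies(anomalies_by_domain, min_domains=2):
    """Return cells flagged as anomalous in at least `min_domains` domains.

    `anomalies_by_domain` maps a domain name to a set of (location, time)
    cells. This cell representation is an assumption made for illustration.
    """
    domains_per_cell = defaultdict(set)
    for domain, cells in anomalies_by_domain.items():
        for cell in cells:
            domains_per_cell[cell].add(domain)
    # Keep only the cells where enough distinct domains overlap.
    return {
        cell: sorted(domains)
        for cell, domains in domains_per_cell.items()
        if len(domains) >= min_domains
    }


# Hypothetical single-domain results; the names and values are made up.
example = {
    "air_toxicity": {("tract_12", 2010), ("tract_07", 2011)},
    "disease_prevalence": {("tract_12", 2010), ("tract_03", 2010)},
}
result = multi_domain_anomalies(example)
```

Here only the cell that appears in both hypothetical domains is returned, together with the list of domains that flagged it.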

Dynamic Reputation Scoring Model for Real Time Anomaly Detection

Funded by CISCO

The objective of this project is to overcome the shortcomings of current IP reputation scoring systems in order to enable a variety of high-level analysis and attack forensics. Our project facilitates attack prevention and empowers organizations to “achieve a level of closed-loop intelligence” by stopping an attack at an early stage. The key contribution lies in enriching the set of attributes that the reputation scoring considers, providing an expressive scoring system that enables an administrator to understand what is at stake, and increasing robustness by correlating the various pieces of information while factoring in the trustworthiness of their sources.

We develop models for reputation scoring that support multiple observable attributes and factor in the trustworthiness of the data sources. We plan to explore a number of viable options and assess their effectiveness using multiple publicly available external sources. We investigate multiple perspectives on reputation scoring, updating and informing trustworthiness by dynamically mining internal and external data sources.
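One simple way to picture a score that factors in source trustworthiness is a trust-weighted average, with trust nudged up or down as sources prove right or wrong. The Python sketch below is a hedged illustration only, not the model developed in this project: the function names, the 0-to-1 score scale (1 meaning most suspicious), the update rate, and the source names are all assumptions.

```python
def reputation_score(observations, trust):
    """Trust-weighted average of per-source scores for one IP address.

    `observations` maps a source name to its score in [0, 1], where 1 is
    most suspicious. `trust` maps a source name to a weight in [0, 1].
    Both scales are assumptions made for this illustration.
    """
    total_weight = sum(trust.get(src, 0.0) for src in observations)
    if total_weight == 0:
        return None  # no trusted evidence is available
    weighted = sum(
        score * trust.get(src, 0.0) for src, score in observations.items()
    )
    return weighted / total_weight


def update_trust(trust, source, agreed, rate=0.1):
    """Return a copy of `trust` with `source` moved toward 1 if it agreed
    with a confirmed outcome, or toward 0 if it did not."""
    target = 1.0 if agreed else 0.0
    current = trust.get(source, 0.5)
    updated = dict(trust)
    updated[source] = current + rate * (target - current)
    return updated


# Hypothetical sources and values.
trust = {"blocklist_a": 0.9, "honeypot_b": 0.3}
observations = {"blocklist_a": 1.0, "honeypot_b": 0.0}
score = reputation_score(observations, trust)
new_trust = update_trust(trust, "honeypot_b", agreed=False)
```

In this sketch the more trusted source dominates the combined score, and a source that disagrees with a confirmed outcome loses a little trust on each update.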

Integrating Cybersecurity with Undergraduate IT Programs

Funded by NSF

Cybersecurity has become a matter of national and global importance because of the economy’s dependence on the Internet and on cyberinfrastructure. Workforce development through cyber education and training is key to protecting the ever-growing cyberspace and cyberinfrastructure. However, US universities are not graduating nearly enough workers in IT areas, let alone in cybersecurity. Academic programs in STEM and IT fields are uniquely positioned to address this need. Tailoring curricula in these areas to include cyber education can bridge the gap between the demand for and supply of a trained workforce. Existing certificate programs do not offer a data analytics perspective alongside software security and network security perspectives for undergraduate curricula. In addition, the existing certificates are not seamlessly integrated with the existing curriculum towards a bachelor’s degree. This project aims to fill this gap.



Other funding sources