Ethical Thinking in Data Science

On this page you will find information on ethical thinking in data science, including workshops, embedded modules that you can adapt, and related publications. For questions or follow-ups, please contact Dr. Vandana Janeja. This is an important disciplinary discussion, and we would love to hear from you.

Ethical Data Life Cycle

Ethics generally rests on the principle of "do no harm." Research protocols to protect human subjects have been in place for some time. However, many types of data are now pervasive and are used in many ways, which makes it less clear where in the data life cycle the impact on human beings occurs. Harm is therefore not only direct, as when identifiable data about individuals are exposed. It can also be indirect, resulting from the reuse of easily available data and the combination of multiple datasets.
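To make the indirect-harm point concrete, here is a minimal Python sketch of a linkage attack on entirely hypothetical toy data. A "de-identified" table with names removed is joined to a public roster on quasi-identifiers (ZIP code, birth year, sex). The records, field names, and the `link` helper are illustrative assumptions. They are not drawn from any dataset or tool referenced on this page.

```python
# Hypothetical toy data: a "de-identified" health table (names removed)
# and a public roster (e.g., a membership list) that still carries names.
health = [
    {"zip": "21250", "birth_year": 1985, "sex": "F", "diagnosis": "asthma"},
    {"zip": "21250", "birth_year": 1990, "sex": "M", "diagnosis": "diabetes"},
    {"zip": "21228", "birth_year": 1985, "sex": "F", "diagnosis": "migraine"},
]
roster = [
    {"name": "Alice", "zip": "21250", "birth_year": 1985, "sex": "F"},
    {"name": "Bob", "zip": "21250", "birth_year": 1990, "sex": "M"},
    {"name": "Carol", "zip": "21228", "birth_year": 1985, "sex": "F"},
    {"name": "Dana", "zip": "21228", "birth_year": 1985, "sex": "F"},
]

# Attributes that are harmless alone but identifying in combination (assumed here).
QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def key(record):
    """Project a record onto its quasi-identifier values."""
    return tuple(record[q] for q in QUASI_IDENTIFIERS)


def link(anonymized, public):
    """Join two datasets on quasi-identifiers.

    Returns (name, diagnosis) pairs for anonymized records that match
    exactly one named person in the public dataset.
    """
    names_by_key = {}
    for person in public:
        names_by_key.setdefault(key(person), []).append(person["name"])

    reidentified = []
    for rec in anonymized:
        candidates = names_by_key.get(key(rec), [])
        if len(candidates) == 1:  # a unique match exposes the individual
            reidentified.append((candidates[0], rec["diagnosis"]))
    return reidentified


print(link(health, roster))
```

Neither table alone reveals who has which diagnosis, but combining them does for Alice and Bob. Carol and Dana share the same quasi-identifier values, so the third record stays ambiguous.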

Data science in particular requires ethical critical thinking while analyzing data. Throughout the life cycle of the data in the knowledge discovery process, there are many points of ethical decision making at which a data scientist can evaluate choices so as to do no harm.


V. P. Janeja, Do No Harm: An Ethical Data Life Cycle, AAAS Sci on the Fly, April 2019.

Module to Embed an Ethical Perspective in Data Science Classes

Funded by Hrabowski Innovation Fund

Embedded Module


We designed a module that can be embedded in data science and related curricula. The module included the following:

  • Ethical data life cycle
  • Discussion of the theory of ethics
  • Introduction to various data science codes of conduct
  • Case studies, readings, and reflection questions in the context of an ethical data life cycle
  • A Jupyter Notebook demonstrating ethical principles in action in an algorithmic choice (using the k-nearest neighbors (KNN) algorithm as an example)
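The module's own notebook is not reproduced here. As a separate, hypothetical illustration of how an algorithmic choice in KNN can carry ethical weight, the sketch below shows one feature-selection decision. Including a protected attribute as a feature flips the outcome for an applicant whose scores match previously approved records. The data, field meanings, and `knn_predict` helper are all assumptions made up for this example.

```python
import math
from collections import Counter

# Hypothetical historical decisions: (income_score, credit_score, group, label).
# "group" stands in for a protected attribute; past labels differ by group.
TRAIN = [
    (5.0, 5.0, 0, "approve"),
    (5.3, 5.0, 0, "approve"),
    (5.0, 5.3, 0, "approve"),
    (5.6, 5.6, 1, "deny"),
    (4.4, 4.4, 1, "deny"),
    (5.6, 4.4, 1, "deny"),
]


def features(row, use_group):
    """Build the feature vector, optionally including the protected attribute."""
    income, credit, group = row[0], row[1], row[2]
    return (income, credit, group) if use_group else (income, credit)


def knn_predict(query, k=3, use_group=False):
    """Majority vote among the k training rows nearest to `query` (Euclidean distance)."""
    q = features(query, use_group)
    ranked = sorted(TRAIN, key=lambda row: math.dist(q, features(row, use_group)))
    votes = Counter(row[3] for row in ranked[:k])
    return votes.most_common(1)[0][0]


applicant = (5.0, 5.0, 1)  # same scores as an approved record, but in group 1
print(knn_predict(applicant, use_group=False))
print(knn_predict(applicant, use_group=True))
```

Without the group feature, the three nearest neighbors are the approved records. With it, the extra distance to group-0 records pushes the denied group-1 records to the top. The data are contrived so the flip is easy to see. On real data the effect also depends on feature scaling and on other features that correlate with the protected attribute.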

GitHub repository for the Jupyter Notebook and module files:


A series of statements was presented to the students to measure the effect of the course on their attitudes toward ethical considerations in the analysis of big data. In particular, we included statements addressing (1) perceptions of ethical considerations and (2) actions students may need to take as a result of their understanding of ethical issues.

Our survey-based evaluation found that, overall, the ethics module had a positive effect on students' attitudes toward the importance of ethical considerations in data analysis.


In general, students in the 2021 offering began the module more inclined toward neutrality or agreement with the action statements than students in the 2020 offering. Their responses were somewhat skewed toward agreement. One possible explanation is that increased exposure to data during the pandemic brought students in at a higher level of awareness. This pattern did not necessarily hold for the perception statements. Our hypothesis is that perceptions are complex to understand but easier to establish. Actions, by contrast, take time to establish but are concrete and easier to actualize.

The goal of the module was to strengthen students' ethical thinking when analyzing data in the real-world projects they would encounter. The survey results indicated that this goal was achieved: the percentage of students agreeing with the statements about ethical thinking generally increased. These findings were presented at the EDULEARN 2022 conference [1].

[1] V. P. Janeja and M. Sanchez, “Rethinking Data Science Pedagogy with Embedded Ethical Considerations,” EDULEARN 2022.

EDULEARN 2022 full paper (PDF)



Workshop: Ethics in Data Science Pedagogy

Funded by NSF


  • C. Erickson, C. Carson, J. Aikat, S. Davis, and V. Janeja, Ethics Panel Report: 2018 Data Science Leadership Summit, Zenodo, April 1, 2019.
  • V. P. Janeja, Do No Harm: An Ethical Data Life Cycle, AAAS Sci on the Fly, April 2019.
  • V. P. Janeja and S. M. Sterett, Infusing Ethical Considerations in a Data Science Curriculum, UMBC Provost’s Teaching and Learning Symposium, 2018.