Ethics of using AI in scientific research

Chapter 18
  • Olga Akimova
    Author
The use of artificial intelligence in scientific research opens up new opportunities for scholars, but at the same time requires strict adherence to ethical principles. In this chapter, we examine key steps that help researchers use AI responsibly and effectively: disclosure, personal data protection, bias prevention, and ensuring reproducibility.
/ STEP 1

Disclosure

Disclosure — transparency in the use of AI technologies, including descriptions of models, data, and methods, is the foundation of trust in scientific results.
The use of artificial intelligence in scientific work opens new horizons for researchers, but simultaneously confronts them with a number of ethical challenges. The first step toward ethically responsible use of AI in scientific research is disclosure, or transparency.
At this stage, the researcher is obliged to provide complete information about the AI technologies used, including the type of model, its training data, as well as the methods and algorithms applied. This is not only a matter of trust in the research but also a way to prevent potential ethical issues related to bias, discrimination, or data manipulation. Without disclosure, it is impossible to ensure reproducibility and reliability of results, which are fundamental principles of the scientific method.
If AI algorithms assisted in data analysis, text generation, literature processing, or even hypothesis formulation, this must be clearly stated. However, practice shows that many researchers, especially early-career scholars, either underestimate the importance of this step or forget about it altogether. As a result, there is a risk not only of losing trust in research results but also of violating academic ethics.
It is important to understand that the use of AI does not absolve researchers of responsibility for their results. On the contrary, it requires a deeper understanding of both the capabilities and limitations of these technologies. In this chapter, we discuss how to properly integrate and document the use of AI in scientific work so that it becomes a strength rather than a weakness of the research.
Many international publishers have published detailed guidelines on the use of generative AI in scientific research. One of the first requirements is disclosure of AI and AI-based technologies used in the study.
Here, for example, is an excerpt from Elsevier’s policy:
Authors should disclose in their manuscript the use of AI and AI-assisted technologies and a statement will appear in the published work. Declaring the use of these technologies supports transparency and trust between authors, readers, reviewers, editors and contributors and facilitates compliance with the terms of use of the relevant tool or technology.
And from Wiley:
If an author has used a GenAI tool to develop any portion of a manuscript, its use must be described, transparently and in detail, in the Methods section (or via a disclosure or within the Acknowledgements section, as applicable).
Thus, the paper must specify:
  • The type of tool used by the author;
  • How exactly it was used;
  • The large language model used in the research: ChatGPT-4o, YandexGPT-3, Gemini-2.0-flash, etc.
  • important
Ethical use of generative artificial intelligence does not involve generating any type of original research data and is permissible only for improving the readability and language of the work as a whole. All conclusions and research results must be obtained and formulated by the author.
This rule also implies a prohibition on using generative AI to create or modify images, graphs, etc., presented in the work—namely, improving, moving, removing, or adding specific elements to an image or figure.
Thus, generative AI may be used only as an editing tool and a linguistic analysis instrument (grammar, spelling, punctuation, style checking, etc.). Ultimately, the author bears responsibility for the content of the work; therefore, generative AI cannot be listed as a co-author of a scientific article.
/ STEP 2

Personal Data Protection

Personal data protection — compliance with legal and ethical standards when working with sensitive information, including data minimization, anonymization, and ensuring security.
The use of artificial intelligence in scientific research is often associated with processing large volumes of data, including personal data of experiment participants and respondents. This makes personal data protection one of the key ethical aspects of working with AI.
Personal data are any information that can be used to identify an individual. This may include name, address, email, phone number, IP address, financial details, and other sensitive information.
Using AI to analyze such data can increase the risk of data leakage, especially if algorithms are not properly configured or data are not protected at all stages of processing. Different countries have different laws and regulations governing personal data protection. Researchers must be aware of applicable legislation and comply with it when working with data.
Legislation in different countries:
  • in the European Union: the General Data Protection Regulation (GDPR) establishes strict requirements for processing personal information;
  • in the United States: data privacy and security laws (e.g., HIPAA for medical data);
  • in Russia: similar regulations are governed by the Federal Law "On Personal Data."
History provides many examples where violations of personal data protection led to serious consequences. For example, the 2018 scandal involving Cambridge Analytica showed how data from millions of Facebook users were used to manipulate public opinion without their consent. In academia, such incidents can lead not only to reputational damage but also to retraction of publications and even criminal liability.
Principles of working with personal data:
  • data minimization — collect and process only the data strictly necessary to answer the research question;
  • anonymization or pseudonymization — remove or replace direct identifiers before analysis;
  • security — protect data at all stages of processing, from collection to storage and publication.
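The anonymization principle can be sketched in Python. This is a minimal illustration, not a production scheme: the field names (`name`, `email`, `response`) and the salt value are hypothetical, and a real project should keep the salt secret (out of version control) or use dedicated pseudonymization tooling.

```python
import hashlib

# Hypothetical project-specific salt; in practice keep it secret.
SALT = "research-project-2025"

def pseudonymize(record):
    """Replace direct identifiers with a salted hash token, keep only research fields."""
    token = hashlib.sha256((SALT + record["email"]).encode()).hexdigest()[:12]
    return {"participant_id": token, "age": record["age"], "response": record["response"]}

records = [
    {"name": "Alice", "email": "alice@example.com", "age": 34, "response": "yes"},
    {"name": "Bob", "email": "bob@example.com", "age": 29, "response": "no"},
]
anonymized = [pseudonymize(r) for r in records]
# Direct identifiers (name, email) no longer appear in the analysis dataset,
# while the same participant always maps to the same stable token.
```

The salted hash gives a stable pseudonym, so repeated measurements from one participant can still be linked without ever storing the identifier itself.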
/ STEP 3

Bias Prevention

Bias prevention — data analysis, selection of fair algorithms, and regular model evaluation help avoid unfair and discriminatory outcomes.
Bias in AI is a systematic error that arises at various stages of algorithm development and use. While AI can significantly improve data analysis efficiency in scientific research, it also carries the risk of amplifying bias and injustice. Bias in AI occurs when algorithms reproduce or amplify existing stereotypes, discrimination, or inequality present in data. This can distort research results, lead to unfair conclusions, and even harm specific groups. Therefore, preventing bias and ensuring fairness are essential ethical principles in the use of AI in science.

Sources of bias

Bias may be caused by the following factors:
  • Bias in data
If training data contain prejudices or uneven representation of certain groups (e.g., by gender, age, race, or social status), the AI model may learn to reproduce these biases. For example, if hiring data contain historical discrimination, the model may begin to favor certain candidate groups.
  • Bias in algorithms
Some algorithms may be more sensitive to certain data types or have built-in assumptions that lead to unfair outcomes. For instance, algorithms optimized solely for accuracy may ignore minorities if they represent a small portion of the data.
In scientific research, bias can manifest in various contexts. For example, in medical data analysis, an AI model may underestimate risks for certain patient groups, leading to incorrect treatment decisions. In social sciences, bias may distort survey or experiment results if data are not representative.
One well-known example of AI bias is the COMPAS algorithm used in the United States to predict recidivism. Studies showed that the algorithm was more likely to predict repeat offenses for African Americans than for white individuals with similar risk levels.
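A first step toward detecting this kind of bias is simply measuring whether a model's positive-prediction rate differs across groups. A minimal sketch of such a demographic-parity check (the group labels and predictions below are invented for illustration):

```python
from collections import defaultdict

def selection_rates(groups, predictions):
    """Positive-prediction rate per group (a demographic-parity check)."""
    positives = defaultdict(int)
    totals = defaultdict(int)
    for group, pred in zip(groups, predictions):
        totals[group] += 1
        positives[group] += pred
    return {g: positives[g] / totals[g] for g in totals}

groups = ["A", "A", "A", "B", "B", "B"]
preds = [1, 1, 0, 1, 0, 0]
rates = selection_rates(groups, preds)   # group A: 2/3, group B: 1/3
# A large gap between groups' rates signals that the model needs a closer audit.
gap = max(rates.values()) - min(rates.values())
```

A rate gap alone does not prove discrimination, but it flags where deeper analysis (error rates per group, calibration) is warranted.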
How to prevent bias?
  • analyze training data for uneven representation of groups before building models;
  • select algorithms and fairness criteria appropriate to the task;
  • evaluate models regularly, reporting results separately for different groups.
/ STEP 4

Ensuring Reproducibility

Ensuring reproducibility — documenting all research stages, publishing data and code, and using standard methods help strengthen trust in results and enable verification by other scholars.
Reproducibility is one of the key principles of the scientific method, implying that research results should be confirmed by independent experiments or analyses. In the context of AI use in scientific research, reproducibility becomes especially important because machine learning algorithms can be complex and not always predictable.
Reproducibility ensures the reliability of research results. If other researchers can replicate an experiment or analysis and obtain similar results, trust in the scientific work is strengthened. In AI research, reproducibility also helps identify potential errors or biases in algorithms. There are many examples in science where lack of reproducibility has led to serious consequences.
Example:
In 2010, The Lancet retracted a 1998 study on a supposed link between vaccination and autism after its results could not be reproduced and serious methodological flaws were identified.
More information on the reproducibility crisis can be found in Nature or Science.
How to ensure reproducibility?
  • document all research stages, including data preprocessing, model parameters, and random seeds;
  • publish data and code whenever possible;
  • use standard, well-documented methods and report the versions of the software used.
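The documentation steps above can be sketched as: fix the random seed, record it together with the environment, and verify that a rerun yields the identical result. A minimal Python illustration (the "experiment" here is a stand-in for a real analysis):

```python
import random
import sys

def run_experiment(seed):
    """Stand-in experiment: mean of simulated measurements under a fixed seed."""
    random.seed(seed)
    data = [random.gauss(0, 1) for _ in range(1000)]
    return sum(data) / len(data)

SEED = 42
result = run_experiment(SEED)
# Record everything another researcher needs to reproduce the run.
provenance = {"seed": SEED, "python": sys.version.split()[0], "result": result}
# The same seed must give bitwise-identical output.
assert run_experiment(SEED) == result
```

Publishing `provenance` alongside the code lets other researchers rerun the analysis and confirm they obtain the same numbers.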
The steps discussed not only help avoid ethical and legal risks but also contribute to improving the quality and reliability of scientific research. The use of AI in science is not merely a technical tool but a new stage in the development of scientific methodology, requiring researchers to deeply understand both the capabilities and limitations of these technologies.

Practicum

  • Why is disclosure of AI use in scientific research an important ethical principle?
  • What elements must be specified when describing the use of AI in a scientific paper?
  • Why can generative AI not be used to create original research data?
  • What are personal data, and why is their protection important when using AI in scientific research?
  • What methods of data anonymization do you know, and how do they help protect personal information?
  • What is bias in AI, and how can it affect the results of scientific research?
  • What factors can cause bias in data and algorithms?
  • What methods can be used to minimize bias in machine learning models?
  • Why is model interpretability important for preventing bias?
  • What is reproducibility, and why is it important in scientific research?
  • Why is publishing data and code an important element of reproducibility?
  • What is cross-validation, and how does it help ensure the robustness of results?
  • What consequences may arise if research results cannot be reproduced?
  • Which ethical principles do you consider most important when using AI in science?
  • In your opinion, what new ethical challenges may arise with the further development of AI in the scientific field?