A collective textbook-guide by Colaboratoria

Working in Google Colab and Generating Code Using AI

Chapter 12

Olga Akimova

Author

In modern science, especially in interdisciplinary fields such as digital diplomacy, researchers often need to work with large volumes of data. Large language models can generate code for data analysis, and tools such as Google Colab or Replit.com let researchers run that code without deep programming knowledge.

1. Working in Google Colab
2. Code Generation in LLMs
3. Uploading a CSV File to Google Colab

/01

Working in Google Colab

Google Colab is a cloud-based platform that supports Python and R and provides an intuitive interface for writing and running code. It does not require installing additional software: all you need is a browser and a Google account. The platform supports popular data analysis libraries in Python (Pandas, NumPy, Matplotlib, Seaborn) and in R (ggplot2, dplyr).

Google Colab allows you to work with data stored in Google Drive, BigQuery, and other cloud services. Multiple users can work on the same project simultaneously, which is especially useful for academic research. With pre-written scripts and templates, researchers can quickly analyze data and create visualizations. All your work is automatically saved to Google Drive, making it easy to manage files and share them with colleagues.

Step-by-step instructions for working in Google Colab:

Go to the Google Colab website and sign in to your Google account.

After signing in, you will see the Google Colab interface. Click New Notebook to create a new document where you can write and run code.

In the notebook, you can create cells for code and text. To run code, click Run (the triangle icon).
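As a first cell to try, here is a minimal sketch that checks which of the Python libraries mentioned above can be imported in the current runtime. The `available` helper and the `libraries` list are illustrative names, not part of Colab itself.

```python
import importlib.util

# Libraries named earlier in this chapter.
libraries = ["pandas", "numpy", "matplotlib", "seaborn"]


def available(names):
    """Return a dict mapping each package name to True if it can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


for name, ok in available(libraries).items():
    print(f"{name}: {'available' if ok else 'not installed'}")
```

Paste the code into a code cell and click Run; the result appears directly under the cell.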

Getting started in Google Colab

Creating a notebook in Google Colab

Code input field in Google Colab

/02

Code Generation in LLMs

Large language models such as ChatGPT, Perplexity, or DeepSeek can generate programming code from a text description of a task, which makes them a practical option for researchers without deep programming expertise.

Example:
If you describe the task as: "Create a bar chart for data on how many times countries are mentioned on Twitter; use Python code for Google Colab," the model will generate code that you can run directly in Google Colab.

The tools and libraries available in Google Colab make it possible to solve a wide range of tasks. With code, you can:

build graphs and charts;

visualize time series in a study;

perform sentiment analysis;

generate word clouds, etc.
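To give a sense of the kind of code such a request tends to produce, the sketch below counts how often each country is mentioned in a handful of posts. Counts like these are the usual input for a bar chart or a word cloud. The sample posts, the country list, and the `count_mentions` function are invented for illustration; real data would come from your own dataset.

```python
from collections import Counter

# Hypothetical sample posts, invented for this example.
posts = [
    "Talks between France and Germany continue in Berlin.",
    "Germany welcomes a new ambassador from China.",
    "China and France sign a cultural agreement.",
    "Germany hosts a summit on digital diplomacy.",
]

countries = ["China", "France", "Germany"]


def count_mentions(texts, names):
    """Count in how many texts each name appears (case-insensitive)."""
    counts = Counter()
    for text in texts:
        lowered = text.lower()
        for name in names:
            if name.lower() in lowered:
                counts[name] += 1
    return counts


mentions = count_mentions(posts, countries)
for country, n in mentions.most_common():
    print(f"{country}: {n}")
```

Simple substring matching is a deliberate simplification here; it would miss alternative spellings and abbreviations of country names.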

Example of using an LLM to generate code

Suppose you have digital diplomacy data and want to analyze it using Python (you can also ask for code in R). You can describe your task to a language model, for example via Perplexity or a similar tool.

/ STEP 1

Describing the task

To begin, you need to clearly formulate the task you want to solve. For instance, if you need to create a chart to visualize the number of diplomatic missions for each country, you can write the following prompt:

/prompt

Write Python code for Google Colab to create a bar chart showing the number of diplomatic missions for each country: 'Country': ['USA', 'China', 'Russia', 'Germany', 'France'], 'Number of missions': [300, 250, 200, 150, 100]

/ STEP 2

Code generation

The model will generate code that creates a simple bar chart displaying the number of diplomatic missions by country.

# Import the libraries used below
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Create the data
data = {
    'Country': ['USA', 'China', 'Russia', 'Germany', 'France'],
    'Number of missions': [300, 250, 200, 150, 100]
}
df = pd.DataFrame(data)

# Create the bar chart
plt.figure(figsize=(10, 6))
sns.barplot(x='Country', y='Number of missions', data=df)
plt.title('Number of Diplomatic Missions by Country')
plt.xlabel('Countries')
plt.ylabel('Number of missions')
plt.show()

/ STEP 3

Running the code

Copy the generated code and paste it into a code cell in Google Colab using Ctrl+V. After pasting, click Run. Running the code produces the bar chart shown in the illustration below. If the platform returns an error instead, copy the full error message and ask the LLM you are using to fix or rewrite the code.
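One way to collect the full error text for such a follow-up request is sketched below. It runs a code string, captures the traceback, and wraps both in a prompt. The function name `build_fix_prompt`, the prompt wording, and the use of `exec` are assumptions made for illustration; in Colab you can equally copy the error straight from the cell output.

```python
import traceback


def build_fix_prompt(code, error_text):
    """Combine the failing code and its error message into one request."""
    return (
        "This Python code for Google Colab fails.\n\n"
        f"Code:\n{code}\n\n"
        f"Error:\n{error_text}\n"
        "Please explain the cause and return a corrected version."
    )


# A line of generated code pasted without its imports: `pd` is undefined.
code = "df = pd.DataFrame({'Country': ['USA'], 'Number of missions': [300]})"

try:
    exec(code, {})
    error_text = ""
except Exception:
    # format_exc() returns the same traceback text that Colab would display.
    error_text = traceback.format_exc()

if error_text:
    print(build_fix_prompt(code, error_text))
```

In this case the traceback ends with a NameError, which points to the missing `import pandas as pd` line.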

/03

Uploading a CSV File to Google Colab

You can upload a data file directly into a large language model when generating code. However, if your dataset is quite large or contains confidential information, you may face a security concern: how can you work with the data without sending it to third-party services such as LLMs? In this case, Google Colab offers an approach that allows you to keep data inside a closed environment, minimizing the risk of leaks or unauthorized access.

Google Colab allows you to upload data as CSV files directly into the runtime environment, so the data itself never has to be pasted into an LLM or another third-party tool; only the generated code moves between the two. This is especially important if you work with sensitive information, for example data on international relations, diplomatic negotiations, or internal statistics.

/ METHOD 1

Local upload

You can upload a CSV file from your computer directly into Google Colab. Use the following code:
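A minimal sketch of such an upload cell follows, assuming the `files.upload()` helper from the `google.colab` package that Colab runtimes provide. The `summarize_upload` function and the import guard are illustrative additions so that the snippet can also be read and run outside Colab.

```python
try:
    from google.colab import files  # available only inside a Colab runtime
except ImportError:
    files = None  # outside Colab the upload step is skipped


def summarize_upload(uploaded):
    """Map each uploaded file name to its size in bytes.

    `uploaded` is the dict returned by files.upload(): name -> bytes.
    """
    return {name: len(content) for name, content in uploaded.items()}


if files is not None:
    uploaded = files.upload()  # opens the file picker in the cell output
    for name, size in summarize_upload(uploaded).items():
        print(f"Uploaded {name} ({size} bytes)")
```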

After running this code, a Choose file button will appear, allowing you to upload a file from your device. The uploaded files are stored in the Colab runtime and can be referenced in code by file name.

/ METHOD 2

Working with Google Drive

If your data is stored in Google Drive, you can connect Google Colab to your account and work with files directly. Use the following code:
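A minimal sketch of the connection step follows, assuming the `drive.mount()` helper from the `google.colab` package and the mount point `/content/drive`. The `drive_path` helper and the file name `missions.csv` are hypothetical and only show how a path to a file in Drive is put together.

```python
import posixpath

try:
    from google.colab import drive  # available only inside a Colab runtime
except ImportError:
    drive = None  # outside Colab the mount step is skipped

MOUNT_POINT = "/content/drive"


def drive_path(relative_path, mount_point=MOUNT_POINT):
    """Build the full path to a file stored in the user's My Drive folder."""
    return posixpath.join(mount_point, "MyDrive", relative_path)


if drive is not None:
    drive.mount(MOUNT_POINT)  # asks for permission to access your Drive
    print(drive_path("missions.csv"))  # hypothetical file name
```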

After running this code, you will have access to your Google Drive files via the path: /content/drive/MyDrive/

/ METHOD 3

Working with a file path

When asking a large language model to generate code, specify that you already have a file to work with: paste its name or the path used in Google Colab, not the data itself.
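For example, a prompt along the lines of "Write Python code for Google Colab that loads /content/missions.csv with pandas and shows the first rows" could produce something like the sketch below. The path, the file name, and the `load_csv` function are hypothetical; replace the path with the one shown in your own Colab file panel.

```python
import os

import pandas as pd

# Hypothetical path to a file already uploaded to the Colab runtime.
CSV_PATH = "/content/missions.csv"


def load_csv(path):
    """Read a CSV file into a DataFrame and report its size."""
    df = pd.read_csv(path)
    print(f"Loaded {len(df)} rows and {len(df.columns)} columns from {path}")
    return df


# Load the file only if it is actually present in the runtime.
if os.path.exists(CSV_PATH):
    df = load_csv(CSV_PATH)
    print(df.head())
```

The model only sees the file path in the prompt; the file is read inside the Colab runtime when the code runs.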

Using large language models to generate code opens up new opportunities for researchers in digital diplomacy and other fields. It allows you to focus on data analysis and interpretation rather than technical programming details. Google Colab, in turn, provides a secure and convenient environment for working with large datasets, including sensitive information.