An introduction to Python

Author

Dr. Emily Johnson

Plotting with Python

Before beginning

Please open this link if you haven’t already and let it run in the background.

Introduction

Graphs are extremely important for communicating data quickly and effectively. You might have created a graph in Microsoft Excel before. When you do, you have to click around the software to modify the graph’s appearance and what data it uses. This might not take too much time if you’re only doing it once, but what if you had to make similar graphs another ten times? A hundred times? Or a thousand times? The process would quickly get very boring, take a lot of time, and you’d be more likely to make mistakes!

We can use Python to write programs that plot our data. The program acts as instructions to create the graph. It’s very customisable and you can use the same code over and over again! Many of the graphs and visualisations you see in magazines, newspapers and social media are created using programming. Data scientists and statisticians create graphs to communicate data to doctors, politicians, CEOs, etc., to influence important decisions.

In this activity you are going to create your own graphs using Python.

Don’t feel nervous if this is your first time using Python and you don’t understand all the code (this is a normal feeling for programmers too). You won’t be asked to write your own from scratch, only to edit what we give you.



Animal ageing

The oldest human ever was a French woman named Jeanne Calment. She lived to the age of 122 years and 164 days. Whilst humans can be very long-lived, some animals can live even longer. Understanding what makes these animals live for so long could be important for letting us live longer and healthier lifespans. Below are some examples of long-lived animals:



We are going to create plots to show the maximum lifespan of various animals, both long-lived and short-lived. We will customise our plots in various ways.

Learning objectives:

  • To understand which types of animals live the longest

  • To have an introduction to Python

  • To use Python to create plots of animal maximum ages



Load packages

Python does not know how to process and plot data on its own. Python packages contain additional commands that don’t come installed with Python, and allow us to carry out certain extra tasks. In this activity we’re going to need to load two packages, known as pandas and matplotlib. Pandas is a package that lets Python read and edit data, much like you’d use Excel to process raw data. Matplotlib is the package that then lets us plot the data. We can load them by pasting the following code into our first chunk and pressing the play button:


import matplotlib.pyplot as plt
import pandas as pd


As part of this, we rename the packages to something shorter. ‘matplotlib.pyplot’ is quite a lot of letters to type every time! Each time we used it we’d have to type:

matplotlib.pyplot.plot


Instead, we can rename the package as we load it to something simple and easy to remember. In this case ‘plt’. So the above line of code would become:

plt.plot


In summary, this is how we load and rename a package:


[Figure: how to load and rename a package]
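To check that renaming a package really is just a shortcut, here is a small sketch. It loads matplotlib.pyplot twice, once under its full name and once as ‘plt’, and then asks Python whether the two names point to the same thing. The variable name ‘same_package’ is made up for this example.

```python
import matplotlib.pyplot          # load the package under its full name
import matplotlib.pyplot as plt   # load it again under the short name

# Both names refer to exactly the same package, so either spelling works
same_package = plt is matplotlib.pyplot
print(same_package)  # True
```

So plt.plot and matplotlib.pyplot.plot are the same command. The short name only saves typing.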



Load the data

Next we are going to load our animal ageing data. The data we’re going to be using can be found here. We are going to load it using the package mentioned earlier, ‘pandas’. Remember, we have loaded ‘pandas’ and renamed it to ‘pd’.


The code to load the data is below.

data = pd.read_csv("https://raw.githubusercontent.com/CBFLivUni/scholars_event/refs/heads/main/data/animal_ageing_data.csv") 

We use the read_csv function contained inside the ‘pandas’ package to load the data at the URL. We then store it in a variable called ‘data’ using the equals sign.
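If you are curious what read_csv actually does, the sketch below may help. Instead of a URL, it reads a tiny piece of CSV text that we typed in ourselves (three animals from this activity, with only three of the columns). The names ‘csv_text’ and ‘sample’ are made up for this example, and this is only an illustration, not the real data file.

```python
import io
import pandas as pd

# Stand-in for a CSV file: each line is a row, and commas separate the columns
csv_text = """Type,Common name,Maximum longevity (yrs)
Molluscs,Ocean quahog clam,507
Fish,Greenland shark,392
Mammals,Bowhead whale,211
"""

# read_csv turns the text into a table with named columns
sample = pd.read_csv(io.StringIO(csv_text))
print(sample)
```

The first line of the CSV becomes the column names, and every line after it becomes a row of the table.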

We can view the data by typing the name of the variable in a code chunk:

data
          Type                           Common name  ...  Sample size  Data quality
0     Molluscs                     Ocean quahog clam  ...       medium    acceptable
1         Fish                       Greenland shark  ...        small    acceptable
2      Mammals                         Bowhead whale  ...       medium    acceptable
3         Fish                     Rougheye rockfish  ...       medium    acceptable
4    Echinoids                        Red sea urchin  ...       medium    acceptable
5     Reptiles                    Galapagos tortoise  ...       medium    acceptable
6         Fish                         Lake sturgeon  ...       medium    acceptable
7      Mammals                                 Human  ...         huge          high
8      Mammals                            Blue whale  ...       medium    acceptable
9   Arthropods                               Lobster  ...       medium    acceptable
10     Mammals                          Killer whale  ...       medium    acceptable
11       Birds                         Pink cockatoo  ...       medium    acceptable
12     Mammals                      Asiatic elephant  ...        large    acceptable
13       Birds                      Laysan albatross  ...        large    acceptable
14    Reptiles          West African dwarf crocodile  ...       medium    acceptable
15       Birds                          Common raven  ...       medium    acceptable
16       Birds                    Eurasian eagle-owl  ...        large    acceptable
17     Mammals                            Chimpanzee  ...        large    acceptable
18     Mammals                          Hippopotamus  ...        large    acceptable
19    Reptiles                        Painted turtle  ...       medium    acceptable
20     Mammals                               Gorilla  ...        large    acceptable
21     Mammals                                 Horse  ...        large          high
22        Fish                     Great white shark  ...       medium    acceptable
23       Birds                          Golden eagle  ...       medium    acceptable
24     Mammals                            Polar bear  ...       medium    acceptable
25     Mammals                     Indian rhinoceros  ...       medium    acceptable
26     Mammals                        Naked mole-rat  ...        large          high
27     Mammals                          Domestic dog  ...        large    acceptable
28     Mammals                                 Tiger  ...        large          high
29     Mammals                          Snow leopard  ...        large          high
30     Mammals                             Gray wolf  ...        large          high
31     Mammals  European badger, or Old World badger  ...       medium    acceptable
32     Mammals                              Capybara  ...        large          high
33       Birds                          Powerful owl  ...        small  questionable
34     Mammals                        Golden hamster  ...       medium    acceptable
35     Mammals                                   Rat  ...        large    acceptable
36        Fish                           Pink salmon  ...       medium    acceptable
37     Mammals                       Star-nosed mole  ...        small  questionable
38  Amphibians                          Rainbow frog  ...        small  questionable
39        Fish                        Dwarf seahorse  ...       medium    acceptable
40       Birds                        Bassian thrush  ...         tiny    acceptable
41  Arthropods                             Fruit fly  ...        large    acceptable
42  Arthropods                             Bumblebee  ...        small    acceptable

[43 rows x 13 columns]
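Notice the ‘...’ in the printout above: some columns are hidden to save space. A short sketch of a few other ways to inspect a table is below. It uses a small stand-in table called ‘sample’ (a made-up name, with three animals from this activity) so that it runs on its own. The same commands should work on ‘data’.

```python
import pandas as pd

# Small stand-in for `data` so this example runs without downloading anything
sample = pd.DataFrame({
    "Type": ["Molluscs", "Fish", "Mammals"],
    "Common name": ["Ocean quahog clam", "Greenland shark", "Bowhead whale"],
    "Maximum longevity (yrs)": [507.0, 392.0, 211.0],
})

print(sample.shape)          # (number of rows, number of columns)
print(list(sample.columns))  # every column name, with none hidden
print(sample.head(2))        # just the first two rows
```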

Inspect and format data

There are lots of rows and columns in our data. In the code below we will extract just the oldest animals from the data, and then we will print out their names and lifespans.

# Find the top 8 longest living animals
oldest_animals = data.sort_values("Maximum longevity (yrs)", ascending=False).head(8)

print("\nThe 8 longest-living animals in this dataset are:")
print(oldest_animals[["Common name", "Maximum longevity (yrs)"]])

The 8 longest-living animals in this dataset are:
          Common name  Maximum longevity (yrs)
0   Ocean quahog clam                    507.0
1     Greenland shark                    392.0
2       Bowhead whale                    211.0
3   Rougheye rockfish                    205.0
4      Red sea urchin                    200.0
5  Galapagos tortoise                    177.0
6       Lake sturgeon                    152.0
7               Human                    122.5

In this code we take our data and sort it in order of longevity. We tell it NOT to do it in ascending order by saying ascending=False. We take the top 8 by including head(8). We then save it in a new variable called ‘oldest_animals’.

After that, we print the two columns we are interested in (“Common name” and “Maximum longevity (yrs)”). Do any of these animals surprise you?
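The sorting and the ‘head’ step happen one after the other. The sketch below splits them into two separate lines so you can see each step. It uses a small stand-in table of four animals (the names ‘sample’, ‘sorted_sample’ and ‘top_two’ are made up for this example).

```python
import pandas as pd

# Four animals from the dataset, deliberately in a jumbled order
sample = pd.DataFrame({
    "Common name": ["Human", "Ocean quahog clam", "Lobster", "Greenland shark"],
    "Maximum longevity (yrs)": [122.5, 507.0, 100.0, 392.0],
})

# Step 1: sort from the largest lifespan to the smallest
sorted_sample = sample.sort_values("Maximum longevity (yrs)", ascending=False)

# Step 2: keep only the first two rows of the sorted table
top_two = sorted_sample.head(2)
print(top_two)
```

Here ‘top_two’ contains the ocean quahog clam and the Greenland shark, because they are the first two rows once the table has been sorted.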

Challenge 1: Can you modify the above code so that ‘oldest_animals’ has the 12 most long-lived animals instead?

Challenge 2: Can you make another variable called ‘shortest_lived_animals’ that has the 12 most short-lived animals?


# CODE GOES HERE

# Challenge 1

# Challenge 2


Plot the data!

We are now going to create a simple horizontal bar plot of the 12 most long-lived animals. In the code below:

  • The first line sets the size of the figure.
  • The second line creates a horizontal bar plot (this is a normal bar plot rotated 90 degrees).
  • The third line tells Python to display the plot.

We have left the name of the column containing the lifespan data blank (“______”). Can you fill it in below with the correct column name? (Hint: look at what columns we printed above.)

plt.figure(figsize=(10, 6))
plt.barh(oldest_animals["Common name"], oldest_animals["_______"]) # ← Which column shows lifespan?
plt.show()

So far this is quite a simple plot, and it doesn’t have any labels or units for the x- and y-axes. This would be considered a poor graph in reality!

With matplotlib (our Python plotting package) we can keep adding layers of new information to our plot. We are now going to label the x- and y-axes. However, we have left the y-axis label blank for you to fill in with a suitable name.

plt.figure(figsize=(10, 6))
plt.barh(oldest_animals["Common name"], oldest_animals["Maximum longevity (yrs)"]) 
plt.xlabel("Maximum Lifespan (years)") 
plt.ylabel("______") # ← Label for y-axis
plt.show()

We have now created a simple plot of animal longevity!
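As an optional extra, here is a sketch of one more layer you could add. A horizontal bar plot draws the first row of the data at the bottom, so the longest-lived animal ends up at the bottom of the graph. Flipping the y-axis puts it at the top instead. The short lists ‘names’ and ‘lifespans’ are stand-ins typed in by hand so that the example runs on its own.

```python
import matplotlib.pyplot as plt

# Three animals and their lifespans, typed in by hand for this example
names = ["Ocean quahog clam", "Greenland shark", "Bowhead whale"]
lifespans = [507.0, 392.0, 211.0]

plt.figure(figsize=(10, 6))
plt.barh(names, lifespans)
plt.xlabel("Maximum Lifespan (years)")

# Extra layer: flip the y-axis so the first animal in the list is drawn at the top
ax = plt.gca()
ax.invert_yaxis()
axis_is_flipped = ax.yaxis_inverted()

plt.show()
```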



Make it pretty

Our graph above looks quite good, but we might want to customise it more to our liking. We could first change the colours. We do this by passing an additional ‘argument’ called color to our command plt.barh to let it know the colours we want to use. An example of changing all the colours of our graph is below. (Note: Python uses the American spelling, ‘color’.)

We have also added a title to our graph to explain what we are seeing.

plt.figure(figsize=(10, 6))
plt.barh(oldest_animals["Common name"], oldest_animals["Maximum longevity (yrs)"], 
         color=["red", "orange", "yellow", "green", "blue", "purple", "pink", "brown", "gray", "cyan", "teal", "magenta"])
plt.xlabel("Maximum Lifespan (years)")
plt.ylabel("Animal")
plt.title("Top 12 Longest-Living Animals in the Data")
plt.show()


You can use any colours you like to customise the graph. You just have to change the name of the colours inside the ‘color’ argument. The ones below all come included with matplotlib.


[Figure: the named colours that come included with matplotlib]


As well as the colours that come included, matplotlib lets you pick any colour using a hexcode (a 6-character number/letter code after a hash/#).

We can create hexcodes for colours using some of these links:

The last link will allow you to create a colour palette from an image and give you the hex codes for the palette.
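A hexcode is made of three pairs of characters: the first pair sets how much red is in the colour, the second how much green, and the third how much blue. Each pair goes from 00 (none) to ff (the maximum). The sketch below uses two helper commands from matplotlib.colors to convert between colour names, hexcodes and red/green/blue amounts. The variable names are made up for this example.

```python
import matplotlib.colors as mcolors

# Turn a named colour into its hexcode
teal_hex = mcolors.to_hex("teal")
print(teal_hex)  # #008080

# Turn a hexcode into red, green and blue amounts between 0 and 1
red_rgb = mcolors.to_rgb("#ff0000")
print(red_rgb)  # (1.0, 0.0, 0.0)
```

You can use a hexcode anywhere you would use a colour name, for example color="#008080".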



Challenge 1: Can you change the colours of the graph using the inbuilt colour names in matplotlib?

Challenge 2: Can you change the colours using hexcodes instead? Maybe you could pick colours that remind you of the animals? If you’re unsure what they look like then use Google to check. Otherwise, just select colours you like. ☺


# Challenge 1

# Challenge 2

Another way we can modify our graph is by using a different ‘theme’. Themes are different ways of styling the plot, for example background colour, default bar colours, fonts, gridlines, etc. Matplotlib comes with prepared styles that you can use to modify your graph.

We can see a list of the styles that are available using this code:

plt.style.available
['Solarize_Light2', '_classic_test_patch', '_mpl-gallery', '_mpl-gallery-nogrid', 'bmh', 'classic', 'dark_background', 'fast', 'fivethirtyeight', 'ggplot', 'grayscale', 'seaborn', 'seaborn-bright', 'seaborn-colorblind', 'seaborn-dark', 'seaborn-dark-palette', 'seaborn-darkgrid', 'seaborn-deep', 'seaborn-muted', 'seaborn-notebook', 'seaborn-paper', 'seaborn-pastel', 'seaborn-poster', 'seaborn-talk', 'seaborn-ticks', 'seaborn-white', 'seaborn-whitegrid', 'tableau-colorblind10']
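The list you see may be different from the one above, because it depends on which version of matplotlib is installed. In some newer versions, for example, the ‘seaborn’ styles appear with longer names such as ‘seaborn-v0_8-whitegrid’. A cautious sketch is below: it checks whether a style exists before using it, and otherwise falls back to another one. The names ‘wanted_style’, ‘fallback_style’ and ‘chosen_style’ are made up for this example.

```python
import matplotlib.pyplot as plt

wanted_style = "seaborn-whitegrid"  # a name taken from the list above
fallback_style = "ggplot"           # assumed to be available as a backup

# Use the wanted style only if this version of matplotlib has it
if wanted_style in plt.style.available:
    chosen_style = wanted_style
else:
    chosen_style = fallback_style

plt.style.use(chosen_style)
print("Using style:", chosen_style)
```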

An example of one of the styles applied to our graph is below:

plt.style.use('ggplot')

plt.figure(figsize=(10, 6))
plt.barh(oldest_animals["Common name"], oldest_animals["Maximum longevity (yrs)"])
plt.xlabel("Maximum Lifespan (years)")
plt.ylabel("Animal")
plt.title("Top 12 Longest-Living Animals in the Data")
plt.show()


Challenge 1: Pick a style! Apply it to your graph instead of the one used in the example above. Try a few. Which do you like best?

Challenge 2: Can you include a theme AND your custom colours?


# Challenge 1

# Challenge 2

If you don’t want to use a theme and prefer the default appearance, you can change it back at any time by running this block of code.


plt.style.use('default')
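Another option, sketched below, is to apply a style to just one graph. Matplotlib’s plt.style.context only changes the style for the code indented underneath the ‘with’ line, and the previous settings come back afterwards. The two-animal data and the variable names here are stand-ins made up for this example.

```python
import matplotlib.pyplot as plt

# Remember the plot background colour before we change anything
default_background = plt.rcParams["axes.facecolor"]

# The style only applies to the code indented under the "with" line
with plt.style.context("ggplot"):
    styled_background = plt.rcParams["axes.facecolor"]
    plt.figure(figsize=(10, 6))
    plt.barh(["Ocean quahog clam", "Greenland shark"], [507.0, 392.0])
    plt.xlabel("Maximum Lifespan (years)")
    plt.show()

# Outside the block, the previous settings are back
restored_background = plt.rcParams["axes.facecolor"]
print(restored_background == default_background)  # True
```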

To finish

Well done! You have now learned how to create a plot using Python.

Now, using everything we’ve learned, can you modify the code to create a plot of the shortest-lived animals in the dataset? Remember that earlier in the activity we created a variable storing this data called shortest_lived_animals.

Don’t worry if this part proves tricky and ask the teachers for help at any point if you get stuck!



Advanced/additional activities

This section is entirely optional and you should only attempt it if you have completed all the challenges above!

When we first looked at our dataset at the start it had columns containing extra information. One column of interest is “Type”, which lets us know what type of animal they are (mammal, reptile, etc). Let’s manually inspect what type of animal the longest-lived animals are…

print(oldest_animals[["Type", "Common name", "Maximum longevity (yrs)"]])
          Type         Common name  Maximum longevity (yrs)
0     Molluscs   Ocean quahog clam                    507.0
1         Fish     Greenland shark                    392.0
2      Mammals       Bowhead whale                    211.0
3         Fish   Rougheye rockfish                    205.0
4    Echinoids      Red sea urchin                    200.0
5     Reptiles  Galapagos tortoise                    177.0
6         Fish       Lake sturgeon                    152.0
7      Mammals               Human                    122.5
8      Mammals          Blue whale                    110.0
9   Arthropods             Lobster                    100.0
10     Mammals        Killer whale                     90.0
11       Birds       Pink cockatoo                     83.0

Are there any of these you haven’t heard of before?

For example, echinoids are animals that have a spiky hard shell. Echinoids evolved about 450 million years ago, which is about 220 million years before the first dinosaurs appeared! Today we would commonly think of them as ‘sea urchins’ but there are plenty of fossils of ancient echinoids. Below is an artist’s rendition of one vs some red and black sea urchins.


Echinoids


It would be useful to colour our graph by the different animal types. This would give our colours more meaning and also allow us to see if there are any patterns in the data. First, we will store the type of each animal in a new variable called ‘types’. We can print ‘types’ to see what this includes.

types = oldest_animals['Type']

print(types)
0       Molluscs
1           Fish
2        Mammals
3           Fish
4      Echinoids
5       Reptiles
6           Fish
7        Mammals
8        Mammals
9     Arthropods
10       Mammals
11         Birds
Name: Type, dtype: object
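Because ‘types’ holds one entry per animal, some types (such as Fish and Mammals) appear more than once. If you only want to see each type once, pandas has a command called unique() for this. A small sketch is below. The short list of types is typed in by hand so the example runs on its own, and ‘unique_types’ is a made-up name.

```python
import pandas as pd

# A short stand-in for the 'Type' column, with "Fish" appearing twice
types = pd.Series(["Molluscs", "Fish", "Mammals", "Fish", "Echinoids"], name="Type")

# unique() keeps each type once, in the order it first appears
unique_types = types.unique()
print(list(unique_types))  # ['Molluscs', 'Fish', 'Mammals', 'Echinoids']
```

This is a handy way to check which types you need to include in a colour map.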

Now we are going to create a colour map for the different types of animals.

This code is a little more complex so don’t try to understand all of it. To summarise, we first manually create a colour map for our different types of animals. Then we create a ‘list’, which is a type of information Python can work with to set the colours.


# Pick a color for each type of animal
color_map = {
    'Molluscs': 'purple',
    'Fish': 'blue',
    'Mammals': 'orange',
    'Echinoids': 'teal',
    'Reptiles': 'green',
    'Arthropods': 'pink',
    'Birds': 'red'
}

# Create a list of colors based on each animal's type
bar_colors = [color_map.get(t, 'gray') for t in types]
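The last line above packs a loop into a single line. If you would like to see what it is doing, the sketch below writes the same idea out as a longer loop, using a shortened colour map and a hand-typed list of types. For each animal type, color_map.get looks up its colour, and uses ‘gray’ if the type is not in the colour map.

```python
# A shortened colour map and list of types, typed in by hand for this example
color_map = {"Fish": "blue", "Mammals": "orange"}
types = ["Fish", "Mammals", "Birds"]  # "Birds" is deliberately missing from color_map

# Build the list of colours one animal at a time
bar_colors = []
for t in types:
    color = color_map.get(t, "gray")  # look up the colour, or use gray if not found
    bar_colors.append(color)

print(bar_colors)  # ['blue', 'orange', 'gray']
```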

We can then use our colour map in our plotting code. This includes an additional step where we manually add a legend too so we can see which colour corresponds to which type of animal. (Note, we have given the graph an alternative/more interesting title this time.)

# Plot
plt.figure(figsize=(10, 5))
plt.barh(oldest_animals["Common name"], oldest_animals["Maximum longevity (yrs)"], color=bar_colors)
plt.xlabel("Maximum Lifespan (years)")
plt.ylabel("Animal")
plt.title("How Long Do These Animals Live?")

# Add a legend manually
legend_labels = {v: k for k, v in color_map.items()}
for color in legend_labels:
    plt.bar(0, 0, color=color, label=legend_labels[color])  # invisible bars for legend
plt.legend()

# Show the plot
plt.show()

From this, we can see that many of the most long-lived animals in our dataset are mammals, in particular whales and humans! The longest-lived animal is a mollusc, but there is only one of them.

The full code to generate the graph is below:

types = oldest_animals['Type']

# Pick a color for each type of animal
color_map = {
    'Molluscs': 'purple',
    'Fish': 'blue',
    'Mammals': 'orange',
    'Echinoids': 'teal',
    'Reptiles': 'green',
    'Arthropods': 'pink',
    'Birds': 'red'
}

# Create a list of colors based on each animal's type
bar_colors = [color_map.get(t, 'gray') for t in types]

# Plot
plt.figure(figsize=(10, 5))
plt.barh(oldest_animals["Common name"], oldest_animals["Maximum longevity (yrs)"], color=bar_colors)
plt.xlabel("Maximum Lifespan (years)")
plt.ylabel("Animal")
plt.title("How Long Do These Animals Live?")

# Add a legend manually
legend_labels = {v: k for k, v in color_map.items()}
for color in legend_labels:
    plt.bar(0, 0, color=color, label=legend_labels[color])  # invisible bars for legend
plt.legend()

# Show the plot
plt.show()
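One thing to be aware of: the legend code above flips the colour map around so that each colour points to a type. This works as long as every type has its own colour. If two types shared a colour, one of them would be missing from the legend. The sketch below shows one possible alternative, which builds the legend directly from the colour map using matplotlib’s Patch (a small coloured box). The shortened data and the names ‘handles’ and ‘legend_text’ are made up for this example.

```python
import matplotlib.pyplot as plt
from matplotlib.patches import Patch

# Shortened stand-in data so this example runs on its own
color_map = {"Molluscs": "purple", "Fish": "blue", "Mammals": "orange"}
names = ["Ocean quahog clam", "Greenland shark", "Bowhead whale", "Rougheye rockfish"]
lifespans = [507.0, 392.0, 211.0, 205.0]
types = ["Molluscs", "Fish", "Mammals", "Fish"]

bar_colors = [color_map.get(t, "gray") for t in types]

plt.figure(figsize=(10, 5))
plt.barh(names, lifespans, color=bar_colors)
plt.xlabel("Maximum Lifespan (years)")
plt.ylabel("Animal")

# One coloured box per animal type, taken straight from the colour map
handles = [Patch(color=color, label=animal_type)
           for animal_type, color in color_map.items()]
legend = plt.legend(handles=handles)
legend_text = [text.get_text() for text in legend.get_texts()]

plt.show()
```

With this approach every type in the colour map gets a legend entry, even if two types use the same colour.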

Challenge 1: Using the above code, can you modify the colour map and assign your own colours to the types of animals?

Challenge 2: Can you repeat the same steps as above to colour code the data for the shortest-lived animals?


# Challenge 1
# Challenge 2 - check what types of animals there are
types = shortest_lived_animals['Type']

print(types)
# 5 unique types of animal: Arthropods, Birds, Fish, Amphibians, Mammals
# Challenge 2 - create the colour map and graph

color_map = {
    '_____': '_____', # ← Enter types and colours in these blank spaces
    '_____': '_____',
    '_____': '_____',
    '_____': '_____',
    '_____': '_____'
}

# Create a list of colors based on each animal's type
bar_colors = [color_map.get(t, 'gray') for t in types]

# Plot
plt.figure(figsize=(10, 5))
plt.barh(shortest_lived_animals["Common name"], shortest_lived_animals["Maximum longevity (yrs)"], color=bar_colors)
plt.xlabel("Maximum Lifespan (years)")
plt.ylabel("Animal")
plt.title("How Long Do These Animals Live?")

# Add a legend manually
legend_labels = {v: k for k, v in color_map.items()}
for color in legend_labels:
    plt.bar(0, 0, color=color, label=legend_labels[color])  # invisible bars for legend
plt.legend()

# Show the plot
plt.show()

If you got this far then well done, that concludes all of our activities! If you enjoyed learning about animal ageing and want to browse the real website this data came from then check out AnAge.