import matplotlib.pyplot as plt
import pandas as pd
An introduction to Python
Plotting with Python
Before beginning
Please open this link if you haven’t already and let it run in the background.
Introduction
Graphs are extremely important for communicating data quickly and effectively. You might have created a graph in Microsoft Excel before. When you do, you have to click around the software to modify the graph’s appearance and what data it uses. This might not take too much time if you’re only doing it once, but what if you had to make similar graphs another ten times? A hundred times? Or a thousand times? The process would quickly get very boring, take a lot of time, and you’d be more likely to make mistakes!
We can use Python to write programs that plot our data. The program acts as instructions to create the graph. It’s very customisable and you can use the same code over and over again! Many of the graphs and visualisations you see in magazines, newspapers and social media are created using programming. Data scientists and statisticians create graphs to communicate data to doctors, politicians, CEOs, etc., to influence important decisions.
In this activity you are going to create your own graphs using Python.
Don’t feel nervous if this is your first time using Python and you don’t understand all the code (this is a normal feeling for programmers too). You won’t be asked to write your own from scratch, only to edit what we give you.
Animal ageing
The oldest human ever was a French woman named Jeanne Calment. She lived to the age of 122 years and 164 days. Whilst humans can be very long-lived, some animals can live even longer. Understanding what makes these animals live for so long could be important for helping us live longer and healthier lives. Below are some examples of long-lived animals:
We are going to create plots to show the maximum lifespan of various animals, both long-lived and short-lived. We will customise our plots in various ways.
Learning objectives:
To understand which types of animals live the longest
To have an introduction to Python
To use Python to create plots of animal maximum ages
Load packages
Python does not know how to process and plot data on its own. Python packages contain additional commands that don’t come installed with Python, and allow us to carry out certain extra tasks. In this activity we’re going to need to load two packages known as pandas and matplotlib. Pandas is a package that lets Python read and edit data, much like you’d use Excel to process raw data. Matplotlib is the package that then lets us plot the data. We can load them by pasting the following code into our first chunk and pressing the play button:
import pandas as pd
import matplotlib.pyplot as plt
As part of this, we rename the packages to something shorter. ‘matplotlib.pyplot’ is quite a lot of letters to type every time! Each time we used it we’d have to type:
matplotlib.pyplot.plot
Instead, we can rename the package as we load it to something simple and easy to remember. In this case ‘plt’. So the above line of code would become:
plt.plot
In summary, this is how we load and rename a package:
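As a rough sketch of that pattern: the word after `import` is the package, and the word after `as` is the short name we choose for it. The names `pd` and `plt` are only popular conventions, and the variables `same_package` and `same_command` are made up here to check that the short name really is the same thing as the long one.

```python
# General pattern:  import <package> as <short name>
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.pyplot  # the long name, loaded only for comparison

# Both names point at exactly the same package...
same_package = plt is matplotlib.pyplot
print(same_package)  # True

# ...so plt.plot and matplotlib.pyplot.plot are the same command.
same_command = plt.plot is matplotlib.pyplot.plot
print(same_command)  # True
```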
Load the data
Next we are going to load our animal ageing data. The data we’re going to be using can be found here. We are going to load it using the package mentioned earlier ‘pandas’. Remember, we have loaded ‘pandas’ and renamed it to ‘pd’.
The code to load the data is below.
data = pd.read_csv("https://raw.githubusercontent.com/CBFLivUni/scholars_event/refs/heads/main/data/animal_ageing_data.csv")
We use the read_csv function contained inside the ‘pandas’ package to load the data at the URL. We then store it in a variable called ‘data’ using the equals sign.
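To see what read_csv does without needing the internet, here is a small sketch that reads a few rows typed directly into the code. The three animals and their lifespans are copied from the dataset used in this activity; the variable names `csv_text` and `small_data` are made up for this illustration, and the real file has many more rows and columns.

```python
import io
import pandas as pd

# A tiny CSV written as text: the first line holds the column names.
csv_text = """Common name,Maximum longevity (yrs)
Ocean quahog clam,507
Greenland shark,392
Bowhead whale,211
"""

# read_csv accepts a URL, a file path, or (as here) a file-like object.
small_data = pd.read_csv(io.StringIO(csv_text))

print(small_data.shape)          # (number of rows, number of columns)
print(list(small_data.columns))  # the column names
```

A CSV file is just text with commas separating the columns, which is why pandas can read it from almost anywhere.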
We can view the data by typing the name of the variable in a code chunk:
data
Type Common name ... Sample size Data quality
0 Molluscs Ocean quahog clam ... medium acceptable
1 Fish Greenland shark ... small acceptable
2 Mammals Bowhead whale ... medium acceptable
3 Fish Rougheye rockfish ... medium acceptable
4 Echinoids Red sea urchin ... medium acceptable
5 Reptiles Galapagos tortoise ... medium acceptable
6 Fish Lake sturgeon ... medium acceptable
7 Mammals Human ... huge high
8 Mammals Blue whale ... medium acceptable
9 Arthropods Lobster ... medium acceptable
10 Mammals Killer whale ... medium acceptable
11 Birds Pink cockatoo ... medium acceptable
12 Mammals Asiatic elephant ... large acceptable
13 Birds Laysan albatross ... large acceptable
14 Reptiles West African dwarf crocodile ... medium acceptable
15 Birds Common raven ... medium acceptable
16 Birds Eurasian eagle-owl ... large acceptable
17 Mammals Chimpanzee ... large acceptable
18 Mammals Hippopotamus ... large acceptable
19 Reptiles Painted turtle ... medium acceptable
20 Mammals Gorilla ... large acceptable
21 Mammals Horse ... large high
22 Fish Great white shark ... medium acceptable
23 Birds Golden eagle ... medium acceptable
24 Mammals Polar bear ... medium acceptable
25 Mammals Indian rhinoceros ... medium acceptable
26 Mammals Naked mole-rat ... large high
27 Mammals Domestic dog ... large acceptable
28 Mammals Tiger ... large high
29 Mammals Snow leopard ... large high
30 Mammals Gray wolf ... large high
31 Mammals European badger, or Old World badger ... medium acceptable
32 Mammals Capybara ... large high
33 Birds Powerful owl ... small questionable
34 Mammals Golden hamster ... medium acceptable
35 Mammals Rat ... large acceptable
36 Fish Pink salmon ... medium acceptable
37 Mammals Star-nosed mole ... small questionable
38 Amphibians Rainbow frog ... small questionable
39 Fish Dwarf seahorse ... medium acceptable
40 Birds Bassian thrush ... tiny acceptable
41 Arthropods Fruit fly ... large acceptable
42 Arthropods Bumblebee ... small acceptable
[43 rows x 13 columns]
Inspect and format data
There are lots of rows and columns in our data. In the code below we will extract just the oldest animals from the data, and then we will print out their names and lifespans.
# Find the top 8 longest living animals
oldest_animals = data.sort_values("Maximum longevity (yrs)", ascending=False).head(8)
print("\nThe 8 longest-living animals in this dataset are:")
print(oldest_animals[["Common name", "Maximum longevity (yrs)"]])
The 8 longest-living animals in this dataset are:
Common name Maximum longevity (yrs)
0 Ocean quahog clam 507.0
1 Greenland shark 392.0
2 Bowhead whale 211.0
3 Rougheye rockfish 205.0
4 Red sea urchin 200.0
5 Galapagos tortoise 177.0
6 Lake sturgeon 152.0
7 Human 122.5
In this code we take our data and sort it in order of longevity. We tell it NOT to sort in ascending order by saying ascending=False. We take the top 8 by including head(8). We then save the result in a new variable called ‘oldest_animals’.
After that, we print the two columns we are interested in (“Common name” and “Maximum longevity (yrs)”). Do any of these animals surprise you?
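The same two steps can be tried on a much smaller table to see what each one does. This is only a sketch: the four animals and their lifespans are taken from the dataset, but the variable names `small_data`, `sorted_data` and `top_two` are invented for this example.

```python
import pandas as pd

small_data = pd.DataFrame({
    "Common name": ["Human", "Bowhead whale", "Ocean quahog clam", "Greenland shark"],
    "Maximum longevity (yrs)": [122.5, 211.0, 507.0, 392.0],
})

# Step 1: sort from the largest lifespan to the smallest.
sorted_data = small_data.sort_values("Maximum longevity (yrs)", ascending=False)

# Step 2: keep only the first two rows of the sorted table.
top_two = sorted_data.head(2)

print(top_two["Common name"].tolist())  # ['Ocean quahog clam', 'Greenland shark']
```

Writing the steps on separate lines gives the same result as chaining them together with dots, as the activity code does.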
Challenge 1: Can you modify the above code so that ‘oldest_animals’ has the 12 most long-lived animals instead?
Challenge 2: Can you make another variable called ‘shortest_lived_animals’ that has the 12 most short-lived animals?
# CODE GOES HERE
# Challenge 1
# Challenge 2
Plot the data!
We are now going to create a simple horizontal bar plot of the 12 most long-lived animals. In the code below:
- The first line sets the size of the figure.
- The second line creates a horizontal bar plot (this is a normal bar plot rotated 90 degrees).
- The third line tells Python to display the plot.
We have left the name of the column containing the lifespan data blank (“_______”). Can you fill it in below with the correct column name? (Hint: look at what columns we printed above.)
plt.figure(figsize=(10, 6))
plt.barh(oldest_animals["Common name"], oldest_animals["_______"]) # ← Which column shows lifespan?
plt.show()
So far this is quite a simple plot, and it doesn’t have any labels or units for the x- and y-axes. This would be considered a poor graph in reality!
With matplotlib (our Python plotting package) we can keep adding layers of new information to our plot. We are now going to label the x- and y-axes. However, we have left the y-axis label blank for you to fill in with a suitable name.
plt.figure(figsize=(10, 6))
plt.barh(oldest_animals["Common name"], oldest_animals["Maximum longevity (yrs)"])
plt.xlabel("Maximum Lifespan (years)")
plt.ylabel("______") # ← Label for y-axis
plt.show()
We have now created a simple plot of animal longevity!
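One thing that can be confusing with horizontal bar plots is which values end up on which axis. The sketch below uses three animals from the dataset (the variable names `names`, `lifespans`, `bars` and `bar_lengths` are made up for this example) to show that the first thing we give `plt.barh` goes along the y-axis and the second sets how long each bar is.

```python
import matplotlib.pyplot as plt

names = ["Bowhead whale", "Greenland shark", "Ocean quahog clam"]
lifespans = [211.0, 392.0, 507.0]

plt.figure(figsize=(6, 3))
bars = plt.barh(names, lifespans)   # names -> y-axis, lifespans -> bar length
plt.xlabel("Maximum Lifespan (years)")

# Each bar's width (its length along the x-axis) equals the lifespan.
bar_lengths = [float(bar.get_width()) for bar in bars]
print(bar_lengths)  # [211.0, 392.0, 507.0]

plt.show()
```

This is why the lifespan label belongs on the x-axis in a horizontal bar plot, even though it would sit on the y-axis in an upright one.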
Make it pretty
Our graph above looks quite good but we might want to customise it more to our liking. We could first change the colours. We do this by passing an additional ‘argument’ to our command plt.barh to let it know the colours we want to use. An example of changing all the colours of our graph is below. (Note: Python uses the American spelling ‘color’.)
We have also added a title to our graph to explain what we are seeing.
plt.figure(figsize=(10, 6))
plt.barh(oldest_animals["Common name"], oldest_animals["Maximum longevity (yrs)"],
         color=["red", "orange", "yellow", "green", "blue", "purple", "pink", "brown", "gray", "cyan", "teal", "magenta"])
plt.xlabel("Maximum Lifespan (years)")
plt.ylabel("Animal")
plt.title("Top 12 Longest-Living Animals in the Data")
plt.show()
You can use any colours you like to customise the graph. You just have to change the name of the colours inside the ‘color’ argument. The ones below all come included with matplotlib.
As well as the colours that come included, matplotlib lets you pick any colour using a hexcode (a 6-character number/letter code after a hash/#).
We can create hexcodes for colours using some of these links:
- https://htmlcolorcodes.com/color-picker/
- https://color.adobe.com/create/color-wheel
- https://www.colorhexa.com/6738c9
- https://imagecolorpicker.com/en
The last link will allow you to create a colour palette from an image and give you the hex codes for the palette.
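If you are curious how colour names and hexcodes relate, matplotlib can translate between them. This is a small sketch using the `to_hex`, `to_rgb` and `is_color_like` helpers from `matplotlib.colors`; the hexcode `#6738c9` is simply the one that appears in the colorhexa link above, and the variable names are made up for this example.

```python
from matplotlib import colors

# A named colour converted to its hexcode.
red_hex = colors.to_hex("red")
print(red_hex)  # #ff0000

# A hexcode split into red, green and blue amounts (each between 0 and 1).
r, g, b = colors.to_rgb("#6738c9")
print(round(r, 2), round(g, 2), round(b, 2))

# Names and hexcodes can be mixed in the same list of colours.
mixed_colours = ["red", "#6738c9"]
both_valid = [colors.is_color_like(c) for c in mixed_colours]
print(both_valid)  # [True, True]
```

The first two characters of a hexcode set the amount of red, the middle two the green, and the last two the blue.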
Challenge 1: Can you change the colours of the graph using the inbuilt colour names in matplotlib?
Challenge 2: Can you change the colours using hexcodes instead? Maybe you could pick colours that remind you of the animals? If you’re unsure what they look like then use Google to check. Otherwise, just select colours you like. ☺
# Challenge 1
# Challenge 2
Another way we can modify our graph is by using a different ‘theme’. Themes are different ways of styling the plot, for example background colour, default bar colours, fonts, gridlines, etc. Matplotlib comes with prepared styles that you can use to modify your graph.
We can see a list of the available styles using this code:
plt.style.available
['Solarize_Light2', '_classic_test_patch', '_mpl-gallery', '_mpl-gallery-nogrid', 'bmh', 'classic', 'dark_background', 'fast', 'fivethirtyeight', 'ggplot', 'grayscale', 'seaborn', 'seaborn-bright', 'seaborn-colorblind', 'seaborn-dark', 'seaborn-dark-palette', 'seaborn-darkgrid', 'seaborn-deep', 'seaborn-muted', 'seaborn-notebook', 'seaborn-paper', 'seaborn-pastel', 'seaborn-poster', 'seaborn-talk', 'seaborn-ticks', 'seaborn-white', 'seaborn-whitegrid', 'tableau-colorblind10']
An example of one of the styles applied to our graph is below:
plt.style.use('ggplot')
plt.figure(figsize=(10, 6))
plt.barh(oldest_animals["Common name"], oldest_animals["Maximum longevity (yrs)"])
plt.xlabel("Maximum Lifespan (years)")
plt.ylabel("Animal")
plt.title("Top 12 Longest-Living Animals in the Data")
plt.show()
Challenge 1: Pick a style! Apply it to your graph instead of the one used in the example above. Try a few. Which do you like best?
Challenge 2: Can you include a theme AND your custom colours?
# Challenge 1
# Challenge 2
If you don’t want to use a theme and prefer the default appearance, you can change it back at any time by running this block of code.
plt.style.use('default')
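If you only want a style for one graph, a possible alternative (not used elsewhere in this activity) is `plt.style.context`, which applies a style inside a `with` block and switches back automatically afterwards. The sketch below uses three animals from the dataset; the variable names are made up for this example, and it checks the plot background colour setting before, during and after the block.

```python
import matplotlib.pyplot as plt

names = ["Bowhead whale", "Greenland shark", "Ocean quahog clam"]
lifespans = [211.0, 392.0, 507.0]

background_before = plt.rcParams["axes.facecolor"]

# The 'ggplot' style only applies to code indented under the 'with' line.
with plt.style.context("ggplot"):
    background_inside = plt.rcParams["axes.facecolor"]
    plt.figure(figsize=(6, 3))
    plt.barh(names, lifespans)
    plt.xlabel("Maximum Lifespan (years)")
    plt.show()

# Outside the block, the previous appearance is back.
background_after = plt.rcParams["axes.facecolor"]
print(background_before == background_after)
```

The list of style names can differ between matplotlib versions, so it is worth checking `plt.style.available` on your own computer before picking one.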
To finish
Well done! You have now learned how to create a plot using Python.
Now, using everything we’ve learned, can you modify the code to create a plot of the shortest-lived animals in the dataset? Remember that earlier in the activity we created a variable storing this data called shortest_lived_animals.
Don’t worry if this part proves tricky, and ask the teachers for help at any point if you get stuck!
Advanced/additional activities
This section is entirely optional and you should only attempt it if you have completed all the challenges above!
When we first looked at our dataset at the start it had columns containing extra information. One column of interest is “Type”, which lets us know what type of animal each one is (mammal, reptile, etc.). Let’s manually inspect what type of animal the longest-lived animals are…
print(oldest_animals[["Type", "Common name", "Maximum longevity (yrs)"]])
Type Common name Maximum longevity (yrs)
0 Molluscs Ocean quahog clam 507.0
1 Fish Greenland shark 392.0
2 Mammals Bowhead whale 211.0
3 Fish Rougheye rockfish 205.0
4 Echinoids Red sea urchin 200.0
5 Reptiles Galapagos tortoise 177.0
6 Fish Lake sturgeon 152.0
7 Mammals Human 122.5
8 Mammals Blue whale 110.0
9 Arthropods Lobster 100.0
10 Mammals Killer whale 90.0
11 Birds Pink cockatoo 83.0
Are there any of these you haven’t heard of before?
For example, echinoids are animals that have a spiky hard shell. Echinoids evolved about 450 million years ago, which is about 220 million years before the first dinosaurs appeared! Today we would commonly think of them as ‘sea urchins’ but there are plenty of fossils of ancient echinoids. Below is an artist’s rendition of one vs some red and black sea urchins.
It would be useful to colour our graph by the different animal types. This would give our colours more meaning and also allow us to see if there are any patterns in the data. First, we will store the type of each animal in a new variable called ‘types’. We can print ‘types’ to see what this includes.
types = oldest_animals['Type']
print(types)
0 Molluscs
1 Fish
2 Mammals
3 Fish
4 Echinoids
5 Reptiles
6 Fish
7 Mammals
8 Mammals
9 Arthropods
10 Mammals
11 Birds
Name: Type, dtype: object
Now we are going to create a colour map for the different types of animals.
This code is a little more complex so don’t try to understand all of it. To summarise, we first manually create a colour map for our different types of animals. Then we create a ‘list’, which is a type of information Python can work with to set the colours.
# Pick a color for each type of animal
color_map = {
    'Molluscs': 'purple',
    'Fish': 'blue',
    'Mammals': 'orange',
    'Echinoids': 'teal',
    'Reptiles': 'green',
    'Arthropods': 'pink',
    'Birds': 'red'
}
# Create a list of colors based on each animal's type
bar_colors = [color_map.get(t, 'gray') for t in types]
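The two ideas in that code, looking a colour up by name and building a list in one line, can be seen more easily with a shorter example. This is only a sketch with made-up variable names (`small_map`, `some_types`, `some_colors`); 'Birds' is deliberately left out of the map to show what the 'gray' fallback does.

```python
# A dictionary pairs each key (animal type) with a value (colour).
small_map = {"Fish": "blue", "Mammals": "orange"}

some_types = ["Fish", "Mammals", "Fish", "Birds"]

# For every type t, look up its colour; use 'gray' if t is not in the map.
some_colors = [small_map.get(t, "gray") for t in some_types]

print(some_colors)  # ['blue', 'orange', 'blue', 'gray']
```

The list ends up with one colour per animal, in the same order as the animals, which is what the plotting command needs.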
We can then use our colour map in our plotting code. This includes an additional step where we manually add a legend so we can see which colour corresponds to which type of animal. (Note: we have given the graph an alternative, more interesting title this time.)
# Plot
plt.figure(figsize=(10, 5))
plt.barh(oldest_animals["Common name"], oldest_animals["Maximum longevity (yrs)"], color=bar_colors)
plt.xlabel("Maximum Lifespan (years)")
plt.ylabel("Animal")
plt.title("How Long Do These Animals Live?")
# Add a legend manually
legend_labels = {v: k for k, v in color_map.items()}
for color in legend_labels:
    plt.bar(0, 0, color=color, label=legend_labels[color]) # invisible bars for legend
plt.legend()
# Show the plot
plt.show()
From this, we can see that many of the most long-lived animals in our dataset are mammals, in particular whales and humans! The longest-lived animal is a mollusc, but there is only one of them.
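The invisible-bar trick in the code above is one way to build a legend. Another common approach, shown here only as an alternative sketch, is to hand `plt.legend` a list of `Patch` objects from `matplotlib.patches`, one per animal type. The three animals and the shortened `color_map` are taken from the dataset and code above; `legend_handles` and `legend_text` are names invented for this example.

```python
import matplotlib.pyplot as plt
from matplotlib.patches import Patch

names = ["Bowhead whale", "Greenland shark", "Ocean quahog clam"]
lifespans = [211.0, 392.0, 507.0]
types = ["Mammals", "Fish", "Molluscs"]

color_map = {"Molluscs": "purple", "Fish": "blue", "Mammals": "orange"}
bar_colors = [color_map.get(t, "gray") for t in types]

plt.figure(figsize=(6, 3))
plt.barh(names, lifespans, color=bar_colors)
plt.xlabel("Maximum Lifespan (years)")

# One coloured square per entry in the colour map.
legend_handles = [Patch(color=colour, label=animal_type)
                  for animal_type, colour in color_map.items()]
legend = plt.legend(handles=legend_handles)

legend_text = [t.get_text() for t in legend.get_texts()]
print(legend_text)  # ['Molluscs', 'Fish', 'Mammals']

plt.show()
```

Because this loops over the animal types rather than the colours, two types could share a colour without one of them disappearing from the legend.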
The full code to generate the graph is below:
types = oldest_animals['Type']
# Pick a color for each type of animal
color_map = {
    'Molluscs': 'purple',
    'Fish': 'blue',
    'Mammals': 'orange',
    'Echinoids': 'teal',
    'Reptiles': 'green',
    'Arthropods': 'pink',
    'Birds': 'red'
}
# Create a list of colors based on each animal's type
bar_colors = [color_map.get(t, 'gray') for t in types]
# Plot
plt.figure(figsize=(10, 5))
plt.barh(oldest_animals["Common name"], oldest_animals["Maximum longevity (yrs)"], color=bar_colors)
plt.xlabel("Maximum Lifespan (years)")
plt.ylabel("Animal")
plt.title("How Long Do These Animals Live?")
# Add a legend manually
legend_labels = {v: k for k, v in color_map.items()}
for color in legend_labels:
    plt.bar(0, 0, color=color, label=legend_labels[color]) # invisible bars for legend
plt.legend()
# Show the plot
plt.show()
Challenge 1: Using the above code, can you modify the colour map and assign your own colours to the types of animals?
Challenge 2: Can you repeat the same steps as above to colour code the data for the shortest-lived animals?
# Challenge 1
# Challenge 2 - check what types of animals there are
types = shortest_lived_animals['Type']
print(types)
# 5 unique types of animal: Arthropods, Birds, Fish, Amphibians, Mammals
# Challenge 2 - create the colour map and graph
color_map = {
    '_____': '_____', # ← Enter types and colours in these blank spaces
    '_____': '_____',
    '_____': '_____',
    '_____': '_____',
    '_____': '_____'
}
# Create a list of colors based on each animal's type
bar_colors = [color_map.get(t, 'gray') for t in types]
# Plot
plt.figure(figsize=(10, 5))
plt.barh(shortest_lived_animals["Common name"], shortest_lived_animals["Maximum longevity (yrs)"], color=bar_colors)
plt.xlabel("Maximum Lifespan (years)")
plt.ylabel("Animal")
plt.title("How Long Do These Animals Live?")
# Add a legend manually
legend_labels = {v: k for k, v in color_map.items()}
for color in legend_labels:
    plt.bar(0, 0, color=color, label=legend_labels[color]) # invisible bars for legend
plt.legend()
# Show the plot
plt.show()
If you got this far then well done, that concludes all of our activities! If you enjoyed learning about animal ageing and want to browse the real website this data came from then check out AnAge.