Without solutions
The fundamentals of the Python language and Jupyter notebooks¶
# Copyright (c) Thalesians Ltd, 2019-2023. All rights reserved.
# Copyright (c) Paul Alexander Bilokon, 2019-2023. All rights reserved.
# Author: Paul Alexander Bilokon <[email protected]>
# This version: 2.0 (2023.11.17)
# Previous versions: 1.0 (2019.01.28)
# Email: [email protected]
Motivation¶
Programming is one of the most important skills for a data scientist, and Python is the de facto lingua franca — the programming language of choice — for Data Science.
Data Scientists perform much of this programming inside the Jupyter environment.
In this Chapter we introduce just enough Python (and Jupyter) to get you started in Data Science.
Objectives¶
- To introduce the Python programming language.
- To explain where and how the reader can download the Anaconda Python distribution.
- To introduce Jupyter notebooks.
- To demonstrate how different types of Jupyter notebook cells can be used.
- To introduce variables.
- To explain how to use Python’s numeric data types: `int`s and `float`s.
- To introduce type casting.
- To demonstrate how to use Python libraries, using `math` as an example.
- To explain the concept of dynamic typing.
- To introduce strings.
- To introduce `None`.
- To introduce arithmetic expressions.
- To introduce functions and explain their role in code reuse.
- To explain why functions are first-class citizens in Python.
- To introduce `bool`eans and logic.
- To introduce comparison operators.
- To explain how comparison operators can be combined with logical operators, such as `not`, `and`, and `or`.
- To introduce `all` and `any`.
- To explain that any value can be cast to a `bool`.
- To introduce control flow and `if` statements.
- To introduce key data structures: lists, tuples, dictionaries, and sets.
- To explain the difference between the shallow copy and the deep copy.
- To explain iteration, and introduce the `while` loop and the `for` loop.
- To introduce the temporal types: `date`, `time`, and `datetime`.
- To provide examples and exercises on this material, so the reader can practise programming.
- To introduce the reader to the literature and web resources on Python.
What are Python and Jupyter¶
Python is a programming language that was created by Guido van Rossum and first released in 1991.
Its distinguishing characteristics are straightforwardness and readability, especially in comparison with other programming languages, such as C++ and Java. At the same time, Python is very expressive, powerful, and laconic, enabling programmers to express complex ideas in very little code.
Python is not only a language of choice for data science. It is frequently employed by web designers (for making websites), system administrators (for writing scripts and automation), hackers (also for writing scripts) and anyone who needs to process numeric and textual data in bulk.
There are two “lineages” of the Python language. There is Python 2.x (the final release being version 2.7.18) and there is Python 3.x (the latest being version 3.12.0 at the time of writing). Python 3.x has superseded Python 2.x, which reached its end of life in 2020 and is no longer maintained, although a good deal of legacy systems code still runs on it. We shall stick with Python 3.x in the present work.
There are several Python distributions to choose from. The Anaconda distribution is a popular choice among Data Scientists. You can download the latest version of the Anaconda distribution for your operating system from https://www.anaconda.com/
If you have a 64-bit operating system, we suggest that you download the 64-bit version.
Once you have downloaded the distribution, install it.
Launch Anaconda Navigator. Once the Anaconda Navigator window shows up, launch Jupyter notebook from it. When it shows up in the browser, click on “New”, then “Python 3”. A blank Jupyter notebook should show up inviting you to enter some Python code.
It is worth noting that Jupyter notebooks are not the only way to write Python code. You could launch the Python interpreter (`python.exe` on Windows) from the Anaconda Prompt and type in Python code closer to the metal. Or you could write your Python code in a text file, save it as `something.py`, and end up with a standalone Python module or multiple such modules, forming a complex software product. This is something that we would do for a finished, polished solution in production. For research and prototyping, though, Jupyter notebooks are a perfect environment. (While Python is perfectly good for many production use cases, for others you may consider migrating to a language like C++, C#, or Java.)
For completeness, we shall mention that you don’t have to use Python in Jupyter notebooks. The name “Jupyter” itself stands for “Julia, Python, R” — indeed, other programming languages, such as kdb+’s q, can be used in Jupyter notebooks, although we shall stick with Python in this work.
Introduction to Jupyter¶
Jupyter notebooks are at the core of Python’s research environment. In Jupyter notebooks, the data is
- loaded,
- cleaned,
- visualised,
- analysed,
possibly over multiple iterations, until the desired result is obtained. It is therefore unsurprising that Jupyter notebooks are often quite messy until they are finally cleaned up to present the conclusions of the research work. In fact, what you are reading right now is also a Jupyter notebook.
Cells¶
A Jupyter notebook comprises a column of basic building blocks called cells.
To insert a new cell in Jupyter, first click on an existing cell, then click on “Insert” in Jupyter’s menu and select “Insert Cell Above” or “Insert Cell Below”.
Under the menu, in the toolbar, there is a drop-down box with cell types: “Code”, “Markdown”, “Raw NBConvert”, and “Heading”. Click on an existing cell, then alter its type by selecting a different value from that drop-down box.
The most important cell types for us are “Code” and “Markdown”.
“Code” cells, such as the one below…
3 + 5
8
allow you to enter Python code (in our example, the numeric expression `3 + 5`) as “In” (input) and display the result as “Out” (output, in our example, `8`). Don’t forget to press [Shift] + [Enter], once you have entered the code in your “Code” cell, to evaluate it and display the result in “Out”. (The cursor will automatically move to the next cell.)
Markdown cells, such as the one you are currently reading, enable you to document your code. Moreover, you can use Markdown syntax, such as `*this*` (to italicise the text), `**this**` (to make the text bold), include `# Headings` (prefixed with `#`), bulleted lists (prefixed with `*`, such as
- this
- simple
- list),
and numbered lists (prefixed with `1.`, such as
1. this
2. simple
3. list).
It is possible to include snippets of Python code, between two backticks, which will be `rendered in a special font`.
Finally, if you are a mathematician, you will be pleased to hear that you can include mathematical formulae, in $\LaTeX$, between two dollar signs (or double dollar signs for standalone equations). $\LaTeX$ looks pretty in Jupyter notebooks, such as this Euler’s formula, $e^{ix} = \cos x + i \sin x$.
If we use double dollar signs, then we get $$e^{ix} = \cos x + i \sin x.$$
Unfortunately, teaching you $\LaTeX$, Leslie Lamport’s document preparation system built on top of Donald Knuth’s TeX typesetting system, is outside the scope of this work, but you will find plenty of resources on it online.
However, by now we hope that we have shown you the power of Markdown, Jupyter’s language for documenting Python. The work that you are reading now is written in Markdown. You can read up on Markdown in Wikipedia: https://en.wikipedia.org/wiki/Markdown
Exercise¶
Typeset the following in Markdown:
In algebra, a quadratic equation (from the Latin quadratus for “square”) is any equation having the form $$ax^2 + bx + c = 0,$$ where
- $x$ represents an unknown, and
- $a$, $b$, and $c$ represent known numbers, with $a \neq 0$.
If $a = 0$, then the equation is linear, not quadratic, as there is no $ax^2$ term.
The numbers $a$, $b$, and $c$ are the coefficients of the equation and may be distinguished by calling them, respectively, the quadratic coefficient, the linear coefficient, and the constant or free term.
The values of $x$ that satisfy the equation are called solutions of the equation, and roots or zeros of its left-hand side. A quadratic equation has at most two solutions.
The value $b^2 - 4ac$ is known as the discriminant.
- If the discriminant is positive, there are two real solutions given by the formula $$x_{1,2} = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}.$$
- If the discriminant is zero, there is one real solution (referred to as a double root) given by the formula $$x = \frac{-b}{2a}.$$
- If the discriminant is negative, there are no (real) solutions.
You can learn more about the quadratic equations on Wikipedia: https://en.wikipedia.org/wiki/Quadratic_equation
Introduction to Python¶
We have already entered our first piece of Python code, namely
3 + 5
8
Exercise¶
Compute, using Python, (i) the product of seven and eight, (ii) the difference between 2190 and 518, (iii) the result of dividing 100 by four, and (iv) the result of multiplying the difference between 2190 and 518 by 10.
(2190 - 518) * 10
16720
It should now be clear why we call Python a “supercalculator”. Indeed, you could use Python as a calculator (but it is so much more). To start harnessing its power we should introduce variables.
Variables¶
A variable is one of the most important concepts in programming. Essentially, it is a named value. Moreover, as the name suggests, this named value can be varied (changed), while keeping the name the same.
Let us create a variable named `a`. We create a variable by assigning to it, using the assignment operator `=`, its initial value:
a = 5
This statement (command) essentially says “set the variable `a` to value 5”.
Once the variable `a` has been created and initialised (set to its initial value), we can use it in expressions, such as `a + 3`. When we write `a` in expressions, its value (5) will be substituted for `a`, so the result of the arithmetic expression `a + 3` will be `5 + 3`, in other words, 8:
a + 3
8
Note that the difference between statements and expressions is that the latter evaluate to a result.
As we said, the value of the variable can be varied (changed). Let us assign to `a` a different value, say, 7:
a = 7
Now when we evaluate the expression `a + 3`, we will get a different result, namely 10:
a + 3
10
What if we now assign to `a` the result of the expression `a + 3`?
a = a + 3
First, the expression `a + 3` on the right-hand side of `=` is evaluated (it is 7 + 3, i.e. 10). Next, it is assigned to `a` as its new value. So, as a result of this assignment, the value of the variable `a` has become 10:
a
10
We may now introduce a different variable, say `b`,
b = 5
and use it in arithmetic expressions alongside `a`:
a + b + 3
18
Notice that the values of the variables persist (are remembered) as we go from one Jupyter cell to the next.
We could write all of the above more succinctly in a single cell:
a = 7
a = a + 3
b = 5
a + b + 3
18
Notice that only the result of the last expression, `a + b + 3`, is returned as the output (“Out”) by Jupyter.
Sometimes there is no “Out” to be printed, as is the case with assignment to a variable:
a = 10
However, as we said, variables persist throughout the Jupyter session, so we can inspect them in one of the following cells:
a
10
Remember that only the result from the last expression is printed:
2 + 2
3 + 7
10
However, you can print multiple things using the `print` function. This is convenient for inspecting intermediate results in your code:
print(2 + 2)
print(3 + 5)
3 + 4
4
8
7
In the example above, `4` and `8` are displayed by the two calls to `print`, whereas the output of the cell is `7`, which is the result of evaluating the last expression in the cell, `3 + 4`.
Note that variable names in Python are case-sensitive, so `a` is not the same as `A`, and `myvar` is not the same as `myVar`:
a = 3
A = 5
a
3
Whereas
A
5
Exercise¶
Set the variable `a` to `15`, the variable `b` to `7`, then, without typing in any digits, swap the values of the two variables, so the variable `a` becomes equal to `7` and the variable `b` to `15`.
Numerics¶
So far all the values that we have dealt with in Python have been numeric, such as `3` and `5` in the expression
3 + 5
8
The result, `8`, is also numeric.
Moreover, these values are all integers. An integer in programming is the same as in mathematics: a whole number with no digits after the decimal point:
8
8
We can use the built-in Python function `type` to confirm that the type of 8 is indeed an integer (or `int` for short):
type(8)
int
We can assign this value to a variable
my_int = 8
And then that variable will have the type integer:
type(my_int)
int
Or we could print out the value of `my_int` along with the type of its value using `print`:
print(my_int, type(my_int))
8 <class 'int'>
Python supports numbers with a fractional part (approximations of the real numbers of mathematics), as well as integers. They are implemented using a different type, the floating point type, `float`:
type(3.57)
float
We can force a literal to be interpreted as a float (rather than as an integer) by including the decimal point:
type(42.)
float
whereas
type(42)
int
We say that `42.` is a `float` literal, whereas `42` is an `int` literal.
We could also cast a value of type `int` to `float`:
float(42)
42.0
type(float(42))
float
When casting a value of type `float` to type `int` we may end up losing precision as we lose all digits after the decimal point:
int(3.57)
3
type(int(3.57))
int
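As a hedged aside on the cast above: `int` discards the fractional part (it truncates toward zero) rather than rounding, which matters for negative values. The minimal sketch below contrasts it with `math.floor` and the built-in `round`; the sample values are arbitrary.

```python
import math

# int() discards the fractional part, i.e. it truncates toward zero.
truncated_pos = int(3.57)        # 3
truncated_neg = int(-3.57)       # -3, not -4

# math.floor() always rounds down (toward negative infinity).
floored_neg = math.floor(-3.57)  # -4

# round() rounds to the nearest integer.
rounded_pos = round(3.57)        # 4

print(truncated_pos, truncated_neg, floored_neg, rounded_pos)
```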
The `float` data type is used throughout data science to represent numerical values in arithmetic operations.
Exercise¶
Is the sum of `3` and `3.57` an `int` or a `float`? Will you lose precision by casting `3` to a `float` then back to an `int`? Will you lose precision by casting `3.57` to an `int` then back to a `float`?
Standard Python libraries¶
The power of Python is in its libraries: pre-written collections of Python code that do useful things for us. We make use of libraries by `import`ing their modules:
import math
Once we have imported the standard Python library module `math`, we can start using functions defined in it, such as `sqrt` for the square root:
math.sqrt(3.57)
1.8894443627691184
We can use the results of these functions in expressions:
4.5 + 2 * math.sqrt(3.57)
8.278888725538238
Modules may define other things in addition to functions, such as constants. In particular, the `math` module defines the mathematical $\pi$ (“pi”) constant, which relates the radius of a circle to its circumference (via $C = 2\pi r$, where $r$ is the radius, $C$ the circumference):
math.pi
3.141592653589793
As a side comment, many real numbers, such as the transcendental number $\pi$, cannot be represented exactly using floating point. Floating point arithmetic relies on truncated, approximate representations of real numbers, which may lead to all sorts of numerical issues (often subtle) in scientific computing. However, what we are doing here is too basic for us to worry about these numerical issues. If you want to really understand floating point numbers, have a look at the paper What Every Computer Scientist Should Know About Floating-Point Arithmetic by David Goldberg (Google it).
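To make the point about inexact representation concrete, here is a minimal sketch using the classic decimal values 0.1 and 0.2 (our choice of illustration, not taken from the text above). It also shows `math.isclose`, one standard-library way of comparing floats with a tolerance.

```python
import math

total = 0.1 + 0.2

# The sum is not exactly 0.3, because 0.1 and 0.2 have no exact
# binary floating point representation.
print(total == 0.3)              # False
print(total)                     # 0.30000000000000004

# Compare floats with a tolerance instead of using ==.
print(math.isclose(total, 0.3))  # True
```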
Exercise¶
In one of the previous exercises we have already mentioned quadratic equations. Use `math` to find both solutions of the quadratic equation $2x^2 - 3x + \frac{1}{2} = 0$.
Dynamic typing¶
Let us set the variable `x`, so it equals 65:
x = 65
Its type, then, will be integer:
type(x)
int
We could overwrite `x` with a value of a different type, such as a float:
x = 3.57
The type of `x` has now changed:
type(x)
float
Some programming languages (such as Java, C++, C#, and many others) would not allow overwriting `x` with a value of a different type: once something is an `int`, it is always an `int`. We say that these languages are statically typed, whereas Python is dynamically typed. Types are important in Python, and Python is still a strongly typed language, although the type of a variable may change over the lifetime of the program, hence the expression “dynamically typed”.
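The distinction between dynamic and strong typing can be illustrated with a short sketch. It uses a string and a `try`/`except` block, both of which go slightly beyond what has been introduced so far; the variable names `result` and `combined` are ours.

```python
x = 65          # x refers to an int
x = 'sixty'     # rebinding x to a str is allowed (dynamic typing)

try:
    result = x + 5   # str + int is not implicitly converted (strong typing)
except TypeError as err:
    result = None
    print('TypeError:', err)

# An explicit conversion makes the intent clear.
combined = x + str(5)
print(combined)  # sixty5
```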
Strings¶
The string type allows us to define textual variables. A string literal is enclosed within two single `'` or double `"` quotation marks.
my_str = 'foo'
print(my_str, type(my_str))
foo <class 'str'>
It is customary for introductions to programming languages to include an example that prints out the string `'Hello, World!'`. In Python, this is a one-liner:
print('Hello, World!')
Hello, World!
The function `len` returns the length of a string:
len('Hello, World!')
13
We can access individual characters in a string using indexing with the square brackets. Notice that the indexing starts at zero, thus
'Hello, World!'[0]
'H'
whereas
'Hello, World!'[1]
'e'
We can also index from the back using negative indices:
'Hello, World!'[-1]
'!'
Moreover, we can index longer substrings, rather than individual characters:
'Hello, World!'[3:7]
'lo, '
Notice that the first index is inclusive, whereas the second exclusive, so the resulting substring consists of characters at indices 3, 4, 5, and 6 (but not 7).
When indexing, we can also provide a step:
'Hello, World!'[3:7:2]
'l,'
'Hello, World!'[::2]
'Hlo ol!'
Of course, instead of repeating the string `'Hello, World!'` so many times (while running the risk of mistyping it), we should have stored it in a variable…
greet = 'Hello, World!'
…and then indexed:
greet[::2]
'Hlo ol!'
One of the most useful operations on strings is concatenation. It enables us to produce a single string from multiple:
'first' + 'second'
'firstsecond'
separator = ', '
'first' + separator + 'second' + separator + 'third'
'first, second, third'
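When the same separator is repeated many times, the string method `join` is a common alternative to chained `+`. The sketch below assumes the pieces are held in a list (lists are introduced later in this chapter); the names `words` and `joined` are ours.

```python
words = ['first', 'second', 'third']
separator = ', '

# str.join concatenates the items, inserting the separator between them.
joined = separator.join(words)
print(joined)  # first, second, third

# The result matches the explicit concatenation.
print(joined == 'first' + separator + 'second' + separator + 'third')  # True
```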
Exercise¶
Use indexing and concatenation to obtain the string `'World, Hello!'` from `'Hello, World!'`.
None¶
We can set Python variables to a special value, `None`,
a = None
of a special type,
type(a)
NoneType
`None` is used to signal that the value is absent or missing.
In fact, this is the value implicitly returned by functions that do not explicitly return anything, such as `print`:
print(357)
357
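To see the implicit `None` directly, we can capture what a call returns. This is a minimal sketch; `greet_silently` is a hypothetical function of our own (functions are covered later in this chapter) that has no `return` statement.

```python
result = print(357)    # prints 357 as a side effect

# print has no explicit return value, so the call evaluates to None.
print(result is None)  # True

def greet_silently(name):   # hypothetical helper, for illustration only
    greeting = 'Hello, ' + name
    # no return statement, so the function returns None

print(greet_silently('World'))  # None
```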
Arithmetic expressions¶
Python supports the standard arithmetic operators:
print('Addition:', 5 + 3)
print('Subtraction:', 5 - 3)
print('Multiplication:', 5 * 3)
print('Division:', 5 / 3)
print('Exponentiation:', 5**3)
print('Modulo:', 5 % 3)
Addition: 8
Subtraction: 2
Multiplication: 15
Division: 1.6666666666666667
Exponentiation: 125
Modulo: 2
Python also supports integer (floor) division, which produces the largest integer less than or equal to `5 / 3`:
5 // 3
1
If any of the arguments is a `float`, the result will also be of type `float`:
5.1 // 3.1
1.0
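Because `//` yields the largest integer less than or equal to the quotient, it rounds toward negative infinity rather than toward zero. The sketch below (with values of our choosing) shows the effect for a negative operand, together with the built-in `divmod`.

```python
# Floor division rounds toward negative infinity, not toward zero.
print(5 // 3)     # 1
print(-5 // 3)    # -2

# The modulo result takes the sign of the divisor, so that
# (a // b) * b + (a % b) == a always holds.
print(-5 % 3)     # 1

quotient, remainder = divmod(-5, 3)
print(quotient, remainder)  # -2 1
```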
Expressions such as
3 + 5
8
2. * x + 7.
14.14
evaluate to numbers (whether integers, or floating point numbers). They are known as arithmetic expressions.
We can perform some other common operations on numerics:
print('Absolute value:', abs(-5))
print('Rounding:', round(3.56))
print('Maximum value:', max(3, 2, 8, 10, 2, 5))
print('Minimum value:', min(3, 2, 8, 10, 2, 5))
Absolute value: 5
Rounding: 4
Maximum value: 10
Minimum value: 2
Functions¶
Suppose that we have written some code to compute the area of a circle:
radius = 5.
area = math.pi * radius * radius
print(area)
78.53981633974483
There is little point in rewriting it each time we encounter a new circle with a different radius. So we wrap it inside a function, which takes `radius` as its parameter (argument) and returns the result:
def area_of_circle(radius):
    area = math.pi * radius * radius
    return area
We can call our function with the values of the arguments that we need in each case:
area_of_circle(5.)
78.53981633974483
r = 7.5
area_of_circle(r)
176.71458676442586
Functions can have multiple arguments:
def area_of_triangle(base, height):
    print('Base:', base)
    print('Height:', height)
    area = .5 * base * height
    return area
area_of_triangle(3., 5.)
Base: 3.0
Height: 5.0
7.5
Notice how the block of code was indented (we chose to indent it using four spaces, although some people prefer to use tabs) to delimit it, designating it as the body of the function `area_of_triangle`. The function call that ensues, `area_of_triangle(3., 5.)`, is not indented, and is not part of that body.
The variables `base` and `height` are defined only within the body of the function. We say that those variables’ scope is limited to the body of the function.
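A quick way to see this scoping rule in action is to try to read a parameter outside the function. The sketch below is self-contained and assumes that no global variable called `height` exists; `triangle_area` and `height_is_visible` are hypothetical names used only for illustration.

```python
def triangle_area(base, height):   # hypothetical stand-in for area_of_triangle
    area = .5 * base * height      # base, height and area are local names
    return area

result = triangle_area(3., 5.)
print(result)  # 7.5

# Outside the function body the name height is not defined
# (assuming no global variable of that name exists).
try:
    height
    height_is_visible = True
except NameError:
    height_is_visible = False

print(height_is_visible)  # False
```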
It is possible to call the function specifying the values of the arguments in order
area_of_triangle(3., 5.)
Base: 3.0
Height: 5.0
7.5
or by name
area_of_triangle(height=5., base=3.)
Base: 3.0
Height: 5.0
7.5
Functions can also specify default values for their arguments in their definitions:
def area_of_triangle(base, height=5.):
    return .5 * base * height
So calling
area_of_triangle(3., 5.)
7.5
can now be equivalently done as
area_of_triangle(3.)
7.5
Notice that, like everything else (e.g. integers), functions are objects and first-class citizens. Thus we can think of `area_of_triangle` as a variable set to a value of type `function`:
type(area_of_triangle)
function
Function objects can be passed to other functions as parameters:
def add(x, y):
    return x + y
def multiply(x, y):
    return x * y
def result_printer(op, x, y):
    print('The result is', op(x, y))
result_printer(add, 3, 5)
result_printer(multiply, 3, 5)
The result is 8
The result is 15
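The same idea applies to library functions, which are ordinary objects too. In the hedged sketch below, `apply_twice` is a hypothetical helper of our own; `math.sqrt` is the standard library function used earlier.

```python
import math

def apply_twice(func, value):   # hypothetical helper, for illustration only
    # Call the function that was passed in, then call it again on the result.
    return func(func(value))

fourth_root = apply_twice(math.sqrt, 16.)
print(fourth_root)   # 2.0

# Functions can be bound to other names like any other value.
root = math.sqrt
print(root(9.))      # 3.0
```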
Good programmers are masters of code reuse; they therefore wrap generally useful pieces of code into convenient functions.
If a library defines the function that we need, then we don’t need to write our own. We have already seen (and used) the function
math.sqrt(9.)
3.0
Exercise¶
Write two functions that will return the two roots of a given quadratic equation. Test them on the quadratic equation $2x^2 -3x + \frac{1}{2} = 0$.
Booleans and logic¶
A `bool`ean is a binary type whose values can be either `True` or `False`. It is named after the self-taught English mathematician, philosopher, and logician George Boole: https://en.wikipedia.org/wiki/George_Boole
my_bool = True
print(my_bool, type(my_bool))
True <class 'bool'>
my_bool = False
print(my_bool, type(my_bool))
False <class 'bool'>
Let us set `x` to the integer 10:
x = 10
Expressions that evaluate to either `True` or `False` are known as boolean expressions.
x < 10
False
type(x < 10)
bool
Different boolean expressions can be obtained by using different comparison operators, such as less than:
x < 10
False
less than or equals:
x <= 10
True
equals:
x == 10
True
greater than or equals:
x >= 10
True
greater than:
x > 10
False
And these comparison operators can be combined with logical operators, such as `not`, `and`, and `or`:
x <= 10 and x % 2 == 1
False
x <= 10 or x % 2 == 1
True
We can also use the built-in function `all`:
all([x > 1, 5 <= x, 5 > 3, 7 != 1])
True
which is equivalent to
x > 1 and 5 <= x and 5 > 3 and 7 != 1
True
Similarly,
all([x > 1, 5 <= x, x == 5, 5 > 3, 7 != 1])
False
is equivalent to
x > 1 and 5 <= x and x == 5 and 5 > 3 and 7 != 1
False
Another built-in function, `any`, enables us to write
any([x > 1, 5 <= x, x == 5, 5 > 3, 7 != 1])
True
which is somewhat more succinct and arguably more readable than the equivalent
x > 1 or 5 <= x or x == 5 or 5 > 3 or 7 != 1
True
Any value can also be cast to `True` or `False`. As a general rule, empty objects (such as the empty string), zeros, and `None` will be cast to `False`, while everything else will be cast to `True`:
print(bool())
print(bool(''))
print(bool(' '))
print(bool(0))
print(bool(0.))
print(bool(1))
print(bool(1.5))
print(bool(None))
False
False
True
False
False
True
True
False
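The same rule extends to the container types introduced later in this chapter. The following sketch, with examples of our own choosing, shows that emptiness is what matters, not the values stored inside.

```python
print(bool([]))       # False: empty list
print(bool([0]))      # True: a non-empty list, even if its element is falsy
print(bool(()))       # False: empty tuple
print(bool({}))       # False: empty dictionary
print(bool('False'))  # True: any non-empty string
```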
Control flow¶
We can control the flow of our programs using boolean expressions and `if` statements. The `if` statement executes the `if` block if the given boolean expression is `True` and the `else` block (as long as it is present) if the given boolean expression is `False`. Else-if, or `elif`, lets us specify a further boolean expression to test when the preceding conditions are not `True`.
if x <= 7:
    print('x is less than or equal to seven')
else:
    print('x is greater than seven')
x is greater than seven
if x <= 7:
    print('x is less than or equal to seven')
In this example, `x > 7` (so `x <= 7` is `False`) but there is no `else` block, so nothing is executed or printed.
if x <= 7:
    print('x is less than or equal to seven')
elif x <= 10:
    print('x is greater than seven but less than or equal to ten')
elif x <= 15:
    print('x is greater than ten but less than or equal to fifteen')
else:
    print('x is greater than fifteen')
x is greater than seven but less than or equal to ten
We can also have nested `if-else` statements:
if x % 2 == 0:
    print('x is divisible by 2')
    if x % 5 == 0:
        print('x is divisible by 2 and 5')
elif x % 5 == 0:
    print('x is divisible by 5 but not 2')
else:
    print('x is divisible by neither 2 nor 5')
x is divisible by 2
x is divisible by 2 and 5
To check whether a variable is `None` we use `is None` rather than `== None`:
if x is None:
    print('x is None')
else:
    print('x is not None')
x is not None
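One reason to prefer `is None` is that `==` can be customised by an object, whereas identity cannot. The sketch below uses a deliberately contrived, hypothetical class, `AlwaysEqual` (classes are beyond the scope of this chapter), to show how `== None` can give a misleading answer.

```python
class AlwaysEqual:
    # Hypothetical class whose == comparison always succeeds.
    def __eq__(self, other):
        return True

obj = AlwaysEqual()

print(obj == None)   # True, although obj is clearly not None
print(obj is None)   # False: identity checks cannot be overridden
```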
Exercise¶
Write a function that will return the number of real solutions of a quadratic equation.
Exercise¶
The Fibonacci sequence is a sequence of integers, starting with zero and one, such that each term in the sequence is the sum of the previous two. Thus the first few terms of the sequence are 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, etc. Write a function that, given `n`, will return the `n`th term of the Fibonacci sequence.
Data structures¶
As data scientists, we care a lot about data structures that let us store and access large amounts of data. Some such data structures, such as lists, tuples, dictionaries, and sets, are built into the Python language. Others, such as multidimensional arrays and dataframes, are provided by third-party, but de facto standard, libraries, such as NumPy and pandas, respectively.
Lists¶
A list is arguably the most commonly used data structure in Python. Its core function is to allow storage of and access to various elements. Financial data in particular are often represented as time series, which are collections of observed values with corresponding times. To define a `list` we use square brackets `[]`:
my_list = [1, 5, 6, 3]
print(my_list, type(my_list))
[1, 5, 6, 3] <class 'list'>
Python allows us to combine elements of different types into the same `list`:
my_list = [3, "hello world", True, None, 3, math.pi]
print(my_list)
[3, 'hello world', True, None, 3, 3.141592653589793]
Let’s examine the length of our list:
len(my_list)
6
Notice that repeated values are counted as distinct elements.
Accessing elements of a `list` is performed using indexing with `[]`. Remember that the index of the first element of the list is `0`:
print(my_list[0])
print(my_list[1])
print(my_list[3])
print(my_list[2])
print(my_list[4])
3
hello world
None
True
3
You may also access elements from the end of a list by using negative indexing:
print(my_list[-1])
print(my_list[-2])
print(my_list[-3])
print(my_list[-4])
3.141592653589793
3
None
True
We may set an element in a `list` to a new value:
my_list[-1] = 4
print(my_list)
[3, 'hello world', True, None, 3, 4]
We can select a sublist from the list:
my_list = ['problems','worthy','of','attack','prove','their','worth','by','fighting','back']
print(my_list[3:6])
['attack', 'prove', 'their']
Notice that the index 3 is inclusive, whereas the index 6 exclusive, so, as a result, we obtain a sublist containing elements at indices 3, 4, and 5 (but not 6).
We may also select sublists without the lower and/or upper bounds:
print(my_list[3:])
print(my_list[:5])
print(my_list[:])
['attack', 'prove', 'their', 'worth', 'by', 'fighting', 'back']
['problems', 'worthy', 'of', 'attack', 'prove']
['problems', 'worthy', 'of', 'attack', 'prove', 'their', 'worth', 'by', 'fighting', 'back']
You can specify a step:
my_list[::2]
['problems', 'of', 'prove', 'worth', 'fighting']
Reverse the order by setting a negative step size:
my_list[::-1]
['back', 'fighting', 'by', 'worth', 'their', 'prove', 'attack', 'of', 'worthy', 'problems']
And combine the step with lower and upper bounds:
my_list[2:10:3]
['of', 'their', 'fighting']
We can use Python’s `range` function, together with `list`, to generate a list of consecutive integers:
list(range(10))
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Conveniently, we can specify a step as well:
list(range(1,10,2))
[1, 3, 5, 7, 9]
We can add elements to the end of the list via the `append` method (a method is a function associated with a particular object, in our example, `my_list`):
my_list = list(range(0,10))
my_list.append(25)
my_list.append(25)
print(my_list)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 25, 25]
We can remove specific elements by calling the method `remove` and supplying it with the value of an element that we would like to remove. Note that only the first instance of an element will be removed:
my_list.remove(25)
my_list.remove(my_list[5])
print(my_list)
[0, 1, 2, 3, 4, 6, 7, 8, 9, 25]
We can filter a list using something like
list(filter(lambda x: x > 5, my_list))
[6, 7, 8, 9, 25]
Here the lambda or anonymous function `lambda x: x > 5` is equivalent to
def my_func(x): return x > 5
but it is shorter and avoids giving the function a name, which is not needed, as we don’t intend to call this function in the future.
We can map or apply a function (or lambda) to each element of a list:
list(map(lambda x: x*2, my_list))
[0, 2, 4, 6, 8, 12, 14, 16, 18, 50]
The function `sorted` sorts a list without modifying it: it returns a new, sorted list, while keeping the original one intact:
sorted(my_list, reverse=True)
[25, 9, 8, 7, 6, 4, 3, 2, 1, 0]
my_list
[0, 1, 2, 3, 4, 6, 7, 8, 9, 25]
On the other hand, the method `sort` modifies the list, sorting it in place:
my_list.sort()
my_list
[0, 1, 2, 3, 4, 6, 7, 8, 9, 25]
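Since `my_list` happened to be sorted already, the difference is easier to see on an unsorted list. The sketch below uses an arbitrary list of our own, `values`, and also shows that the `sort` method returns `None`.

```python
values = [7, 2, 9, 4]

ordered = sorted(values)   # builds a new list; values is untouched
print(ordered)             # [2, 4, 7, 9]
print(values)              # [7, 2, 9, 4]

returned = values.sort()   # sorts in place and returns None
print(values)              # [2, 4, 7, 9]
print(returned)            # None
```

Because `sort` returns `None`, writing `values = values.sort()` would lose the list; the assignment form is only appropriate with `sorted`.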
Tuples¶
Let us consider an example.
a = [3, "hello world", True, None, 3, math.pi]
a = ['some', 'other', 'list']
a
['some', 'other', 'list']
The variable `a` was first assigned to the list `[3, "hello world", True, None, 3, math.pi]`, but was then reassigned to another list, `['some', 'other', 'list']`. Variables can be thought of as pointers (references) to objects in memory, such as lists. Two variables can reference the same object in memory, e.g.
a = [3, "hello world", True, None, 3, math.pi]
b = a
Now,
a
[3, 'hello world', True, None, 3, 3.141592653589793]
b
[3, 'hello world', True, None, 3, 3.141592653589793]
Since lists are mutable objects, they can be modified after construction. Notice that we are not reassigning a variable so that it references a new object in memory; we are modifying the object that it is already pointing to:
a[2] = False
a
[3, 'hello world', False, None, 3, 3.141592653589793]
Notice that, since the variable `b` is referencing the same object, its value has also changed:
b
[3, 'hello world', False, None, 3, 3.141592653589793]
In this sense, mutable objects are somewhat dangerous. Consider the following code:
def my_mean(arg):
    # This could be a long function, which, perhaps by mistake,
    # modifies arg:
    # ...
    arg[3] = 11.7
    # ...
    return sum(arg) / len(arg)
Let’s apply this function to
a = [4.25, 18.5, 22.5, 13.7, 25.4]
The result of the (broken) `my_mean` looks roughly correct…
my_mean(a)
16.47
…although it’s not. But what’s worse, the user of `my_mean`, who never expected that function to modify its argument, is in for a surprise:
a
[4.25, 18.5, 22.5, 11.7, 25.4]
When we doubt the validity of some code, we may defensively copy the arguments like so:
a = [4.25, 18.5, 22.5, 13.7, 25.4]
print(my_mean(a.copy()))
a
16.47
[4.25, 18.5, 22.5, 13.7, 25.4]
Notice that a copy is equal to…
a = [4.25, 18.5, 22.5, 13.7, 25.4]
b = [4.25, 18.5, 22.5, 13.7, 25.4]
a == b
True
…but not identical to (does not correspond to the same object in memory as) the original:
a is b
False
Whereas if both variables point to the same object in memory we get both equality and identity:
a = [4.25, 18.5, 22.5, 13.7, 25.4]
b = a
print(a == b)
print(a is b)
True
True
We can also check this by examining the `id` of the object, which in CPython is equal to its address in memory:
id(a)
1401550383104
id(b)
1401550383104
Mutable objects, such as lists, may therefore be a source of subtle and difficult to track bugs. They are less safe than immutable objects, which cannot be modified after construction. Mutable objects are particularly dangerous in multi-threaded environments where code runs in parallel.
Fortunately, Python has a built-in data structure, which is very similar to a list, but immutable: a tuple. We create a tuple instead of a list by using round brackets instead of square brackets:
a = (4.25, 18.5, 22.5, 13.7, 25.4)
type(a)
tuple
Alternatively, we may cast a list to a tuple:
a = tuple([4.25, 18.5, 22.5, 13.7, 25.4])
type(a)
tuple
Once a tuple has been created, it cannot be modified: a[0] = 3.57 will raise a TypeError, and tuples have no mutating methods such as append, so a.append(3.57) will raise an AttributeError.
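A short sketch of what immutability means in practice. The variable b is introduced purely for illustration; it shows that we can always build a new tuple from an existing one, even though the existing one cannot be changed.

```python
a = (4.25, 18.5, 22.5, 13.7, 25.4)

try:
    a[0] = 3.57
except TypeError as error:
    print("item assignment failed:", error)

print(hasattr(a, "append"))  # False

# Building a new tuple from pieces of the old one is fine:
b = (3.57,) + a[1:]
print(b)  # (3.57, 18.5, 22.5, 13.7, 25.4)
print(a)  # (4.25, 18.5, 22.5, 13.7, 25.4)
```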
Notice that
(3)
3
is interpreted as the number 3, whereas
(3,)
(3,)
is interpreted as a tuple containing a single element, the number 3.
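It is the comma, rather than the round brackets, that makes a tuple. The following sketch checks this with type; the names single and empty are illustrative only.

```python
print(type((3)))   # <class 'int'>
print(type((3,)))  # <class 'tuple'>

# The brackets are optional when the comma is present:
single = 3,
print(single, len(single))  # (3,) 1

# The empty tuple is the one case written with brackets alone:
empty = ()
print(type(empty), len(empty))  # <class 'tuple'> 0
```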
Exercise¶
Write a single function that will return the two roots of a given quadratic equation as a tuple. Test your function on the quadratic equation $2x^2 - 3x + \frac{1}{2} = 0$.
Exercise¶
Set the variable a to 15, the variable b to 7, then, without typing in any digits, without using arithmetic, and without introducing any new variables, swap the values of the two variables, so that the variable a becomes equal to 7 and the variable b to 15. Hint: use tuples.
Shallow and deep copies¶
Since we have mentioned the copying of objects, we should point out that Python offers two variants of copying: shallow and deep.
The shallow variant copies the object but not its elements; elements of the original data structure are still referenced. For example:
a = (['one', 'two', 'three'], [0, 1, 2, 3, 4, 5])
import copy
a_copy = copy.copy(a)
a_copy
(['one', 'two', 'three'], [0, 1, 2, 3, 4, 5])
a_copy[0].append('four')
While we cannot change the tuple itself since the tuple is immutable, we can change the tuple’s elements, which in this particular case are mutable. a_copy’s zeroth element has changed:
a_copy
(['one', 'two', 'three', 'four'], [0, 1, 2, 3, 4, 5])
And, because a_copy is a shallow copy of a, the zeroth element of a has also changed:
a
(['one', 'two', 'three', 'four'], [0, 1, 2, 3, 4, 5])
This isn’t the case for the deep copy:
a_deep_copy = copy.deepcopy(a)
a
(['one', 'two', 'three', 'four'], [0, 1, 2, 3, 4, 5])
a_deep_copy
(['one', 'two', 'three', 'four'], [0, 1, 2, 3, 4, 5])
a_deep_copy[0].append('five')
a_deep_copy
(['one', 'two', 'three', 'four', 'five'], [0, 1, 2, 3, 4, 5])
Notice that the zeroth element of the original a has not changed:
a
(['one', 'two', 'three', 'four'], [0, 1, 2, 3, 4, 5])
Since a_deep_copy is a deep copy of a, a_deep_copy[0] and a[0] are distinct objects:
id(a[0])
1401549898496
id(a_deep_copy[0])
1401550384640
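The same distinction can be checked directly with the is operator. The sketch below uses a list of lists rather than the tuple above, because for an immutable container such as a tuple copy.copy may simply return the original object; the names nested, shallow, and deep are illustrative.

```python
import copy

nested = [['one', 'two'], [0, 1, 2]]
shallow = copy.copy(nested)
deep = copy.deepcopy(nested)

# The shallow copy is a new outer list that shares its elements:
print(shallow is nested)        # False
print(shallow[0] is nested[0])  # True

# The deep copy has its own copies of the elements as well:
print(deep[0] is nested[0])     # False

nested[0].append('three')
print(shallow[0])  # ['one', 'two', 'three']
print(deep[0])     # ['one', 'two']
```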
Dictionaries¶
Python dictionaries are powerful abstractions that let us define key-value pairs. In other programming languages, such abstractions are also known as maps. We define dictionaries by using the following notation:
book = {
    'authors': 'Michael Berthold',
    'title': 'Intelligent Data Analysis',
    'publisher': 'Springer',
    'year': 2003
}
In this dictionary, the keys 'authors', 'title', 'publisher', and 'year' correspond to the values 'Michael Berthold', 'Intelligent Data Analysis', 'Springer', and 2003, respectively.
Data structures can be nested. For example, the value in a dictionary may itself be a data structure, such as a list:
book = {
    'authors': ['Michael Berthold', 'David J. Hand'],
    'title': 'Intelligent Data Analysis',
    'publisher': 'Springer',
    'year': 2003
}
We can index the dictionary using the [] notation:
book['authors']
['Michael Berthold', 'David J. Hand']
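Because the value stored under 'authors' is itself a list, indexing can be chained. The sketch below also shows two standard ways of handling a key that may be absent: the in operator and the dictionary's get method. The key 'isbn' is hypothetical and is used only because it is not in the dictionary.

```python
book = {
    'authors': ['Michael Berthold', 'David J. Hand'],
    'title': 'Intelligent Data Analysis',
    'publisher': 'Springer',
    'year': 2003,
}

# Index the dictionary, then index the list it returns:
print(book['authors'][0])  # Michael Berthold

# Test for a key before using it:
print('year' in book)  # True
print('isbn' in book)  # False

# get returns None (or a supplied default) instead of raising KeyError:
print(book.get('isbn'))         # None
print(book.get('isbn', 'n/a'))  # n/a
```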
Notice that dictionaries are mutable:
my_dict = {1: 'one', 2: 'two', 3: 'three'}
print(my_dict[1])
my_dict[4] = 'four'
print(my_dict)
one
{1: 'one', 2: 'two', 3: 'three', 4: 'four'}
Let’s see how we could define a toy dataset of financial time series:
my_dict = {
    'AAPL': [200, 201, 200.1, 205],
    'GOOG': [700, 750, 640, 720],
    'AMZN': [900, 850, 920, 910]
}
my_dict
{'AAPL': [200, 201, 200.1, 205], 'GOOG': [700, 750, 640, 720], 'AMZN': [900, 850, 920, 910]}
Here, each value is a list of asset prices, e.g.
my_dict['AMZN']
[900, 850, 920, 910]
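Once a price list has been looked up by its key, it behaves like any other list. As a small sketch, assuming the toy prices above, we can compute summary statistics for a single asset; the names goog and goog_mean are illustrative.

```python
my_dict = {
    'AAPL': [200, 201, 200.1, 205],
    'GOOG': [700, 750, 640, 720],
    'AMZN': [900, 850, 920, 910],
}

# Look up one asset's prices, then treat them as an ordinary list:
goog = my_dict['GOOG']
goog_mean = sum(goog) / len(goog)
print(goog_mean)  # 702.5

print(max(my_dict['AMZN']))  # 920
print(my_dict['AAPL'][-1])   # 205 (the most recent price)
```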
Sets¶
Sets are defined using the syntax
s = {'red', 'green', 'blue', 'red', 'red', 'green', 'blue'}
Alternatively, a set can be constructed from another collection (such as the list in the following example) using the set constructor:
s = set(['red', 'green', 'blue', 'red', 'red', 'green', 'blue'])
Unlike in lists, repeated elements in a set count as one:
s
{'blue', 'green', 'red'}
len(s)
3
Sets are unordered, so it doesn’t make sense to talk about the indices of their elements. An element is either present in or absent from the set:
'green' in s
True
'purple' in s
False
Sets are mutable:
s.add('cyan')
s
{'blue', 'cyan', 'green', 'red'}
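The properties described above can be verified in a few lines. This is a minimal sketch; the names colours and unique are introduced for illustration only.

```python
colours = ['red', 'green', 'blue', 'red', 'red', 'green', 'blue']
unique = set(colours)
print(len(colours), len(unique))  # 7 3

# Adding an element that is already present has no effect:
unique.add('red')
print(len(unique))  # 3

# Sets have no positions, so indexing is an error:
try:
    unique[0]
except TypeError as error:
    print('sets cannot be indexed:', error)

# sorted returns a list, which gives a reproducible order for display:
print(sorted(unique))  # ['blue', 'green', 'red']
```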
We can consider unions of sets…
{'red', 'green', 'blue', 'red', 'red', 'green', 'blue'}.union({'purple', 'green', 'yellow'})
{'blue', 'green', 'purple', 'red', 'yellow'}
…intersections of sets…