With solutions
The fundamentals of the Python language and Jupyter notebooks¶
# Copyright (c) Thalesians Ltd, 2019-2023. All rights reserved.
# Copyright (c) Paul Alexander Bilokon, 2019-2023. All rights reserved.
# Author: Paul Alexander Bilokon
# This version: 2.0 (2023.11.17)
# Version: 1.1 (2020.04.07)
Motivation¶
Programming is one of the most important skills for a data scientist, and Python is the de facto lingua franca — the programming language of choice — for Data Science.
Data Scientists perform much of this programming inside the Jupyter environment.
In this Chapter we introduce just enough Python (and Jupyter) to get you started in Data Science.
Objectives¶
- To introduce the Python programming language.
- To explain where and how the reader can download the Anaconda Python distribution.
- To introduce the Jupyter notebooks.
- To demonstrate how different types of Jupyter notebook cells can be used.
- To introduce variables.
- To explain how to use Python’s numeric data types: `int`s and `float`s.
- To introduce type casting.
- To demonstrate how to use Python libraries, using `math` as an example.
- To explain the concept of dynamic typing.
- To introduce strings.
- To introduce `None`.
- To introduce arithmetic expressions.
- To introduce functions and explain their role in code reuse.
- To explain why functions are first-class citizens in Python.
- To introduce `bool`eans and logic.
- To introduce comparison operators.
- To explain how comparison operators can be combined with logical operators, such as `not`, `and`, and `or`.
- To introduce `all` and `any`.
- To explain that any value can be cast to a `bool`.
- To introduce control flow and `if` statements.
- To introduce key data structures: lists, tuples, dictionaries, and sets.
- To explain the difference between the shallow copy and the deep copy.
- To explain iteration, and introduce the `while` loop and the `for` loop.
- To introduce the temporal types: `date`, `time`, and `datetime`.
- To provide examples and exercises on this material, so the reader can practise programming.
- To introduce the Python literature and web resources on Python.
What are Python and Jupyter¶
Python is a programming language that was created by Guido van Rossum and first released in 1991.
Its distinguishing characteristics are straightforwardness and readability, especially in comparison with other programming languages, such as C++ and Java. At the same time, Python is very expressive, powerful, and laconic, enabling programmers to express complex ideas in very little code.
Python is not only a language of choice for data science. It is frequently employed by web designers (for making websites), system administrators (for writing scripts and automation), hackers (also for writing scripts) and anyone who needs to process numeric and textual data in bulk.
There are two “lineages” of the Python language in existence. There is Python 2.x (the final release being version 2.7.18) and there is Python 3.x (the latest being version 3.12.0 at the time of writing). Python 3.x has superseded Python 2.x: Python 2.x reached its end of life in 2020 and is no longer maintained, although a good deal of legacy systems code still runs on it. We shall stick with Python 3.x in the present work.
There are several Python distributions to choose from. The Anaconda distribution is a popular choice among Data Scientists. You can download the latest version of the Anaconda distribution for your operating system from https://www.anaconda.com/
If you have a 64-bit operating system, we suggest that you download the 64-bit version.
Once you have downloaded the distribution, install it.
Launch Anaconda Navigator. Once the Anaconda Navigator window shows up, launch Jupyter notebook from it. When it shows up in the browser, click on “New”, then “Python 3”. A blank Jupyter notebook should show up inviting you to enter some Python code.
It is worth noting that Jupyter notebooks are not the only way to write Python code. You could launch the Python interpreter (`python.exe` on Windows) from the Anaconda Prompt and type in Python code closer to the metal. Or you could write your Python code in a text file, save it as `something.py`, and end up with a standalone Python module or multiple such modules, forming a complex software product. This is something that we would do for a finished, polished solution in production. For research and prototyping, though, Jupyter notebooks are a perfect environment. (While Python is perfectly good for many production use cases, for others you may consider migrating to a language like C++, C#, or Java.)
For completeness, we shall mention that you don’t have to use Python in Jupyter notebooks. The name “Jupyter” itself stands for “Julia, Python, R” — indeed, other programming languages, such as kdb+’s q, can be used in Jupyter notebooks, although we shall stick with Python in this work.
Introduction to Jupyter¶
Jupyter notebooks are at the core of Python’s research environment. In Jupyter notebooks, the data is
- loaded,
- cleaned,
- visualised,
- analysed,
- documented,
possibly over multiple iterations, until the desired result is obtained. It is therefore unsurprising that Jupyter notebooks are often quite messy until they are finally cleaned up to present the conclusions of the research work. In fact, what you are reading right now is also a Jupyter notebook.
Cells¶
A Jupyter notebook comprises a column of basic building blocks called cells.
To insert a new cell in Jupyter, first click on an existing cell, then click on “Insert” in Jupyter’s menu and select “Insert Cell Above” or “Insert Cell Below”.
Under the menu, in the toolbar, there is a drop-down box with cell types: “Code”, “Markdown”, “Raw NBConvert”, and “Heading”. Click on an existing cell, then alter its type by selecting a different value from that drop-down box.
The most important cell types for us are “Code” and “Markdown”.
“Code” cells, such as the one below…
3 + 5
8
allow you to enter Python code (in our example, the numeric expression `3 + 5`) as “In” (input) and display the result as “Out” (output; in our example, `8`). Don’t forget to press [Shift] + [Enter], once you have entered the code in your “Code” cell, to evaluate it and display the result in “Out”. (The cursor will automatically move to the next cell.)
Markdown cells, such as the one you are currently reading, enable you to document your code. Moreover, you can use Markdown syntax, such as `*this*` (to italicise the text), `**this**` (to make the text bold), include `# Headings` (prefixed with `#`), bulleted lists (prefixed with `*`, such as
- this
- simple
- list),
and numbered lists (prefixed with `1.`, such as
1. this
2. simple
3. list).
It is possible to include snippets of Python code, between two backticks, which will be rendered in `a special font`.
Finally, if you are a mathematician, you will be pleased to hear that you can include mathematical formulae, in $\LaTeX$, between two dollar signs (or double dollar signs for standalone equations). $\LaTeX$ looks pretty in Jupyter notebooks, such as this Euler’s formula, $e^{ix} = \cos x + i \sin x$.
If we use double dollar signs, then we get $$e^{ix} = \cos x + i \sin x.$$
The mathematical formulae introduced using a pair of single dollar signs, such as `$x^2$`: $x^2$, are called inline mathematical formulae, whereas those introduced using a pair of double dollar signs are called standalone.
Unfortunately, teaching you $\LaTeX$, Leslie Lamport’s document preparation system built on Donald Knuth’s $\TeX$ typesetting system, is outside the scope of this work, but you will find plenty of resources on it online.
However, by now we hope that we have shown you the power of Markdown, Jupyter’s language for documenting Python. The work that you are reading now is written in Markdown. You can read up on Markdown in Wikipedia: https://en.wikipedia.org/wiki/Markdown
Exercise¶
Typeset the following in Markdown:
In algebra, a quadratic equation (from the Latin quadratus for “square”) is any equation having the form $$ax^2 + bx + c = 0,$$ where
- $x$ represents an unknown, and
- $a$, $b$, and $c$ represent known numbers, with $a \neq 0$.
If $a = 0$, then the equation is linear, not quadratic, as there is no $ax^2$ term.
The numbers $a$, $b$, and $c$ are the coefficients of the equation and may be distinguished by calling them, respectively, the quadratic coefficient, the linear coefficient, and the constant or free term.
The values of $x$ that satisfy the equation are called solutions of the equation, and roots or zeros of its left-hand side. A quadratic equation has at most two solutions.
The value $b^2 - 4ac$ is known as the discriminant.
1. If the discriminant is positive, there are two real solutions given by the formula $$x_{1,2} = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}.$$
2. If the discriminant is zero, there is one real solution (referred to as a double root) given by the formula $$x = \frac{-b}{2a}.$$
3. If the discriminant is negative, there are no (real) solutions.
You can learn more about the quadratic equations on Wikipedia: https://en.wikipedia.org/wiki/Quadratic_equation
Solution¶
In algebra, a **quadratic equation** (from the Latin *quadratus* for “square”) is any equation having the form $$ax^2 + bx + c = 0,$$ where
* $x$ represents an unknown, and
* $a$, $b$, and $c$ represent known numbers, with $a \neq 0$.

If $a = 0$, then the equation is linear, not quadratic, as there is no $ax^2$ term.

The numbers $a$, $b$, and $c$ are the **coefficients** of the equation and may be distinguished by calling them, respectively, the **quadratic coefficient**, the **linear coefficient**, and the **constant** or **free term**.

The values of $x$ that satisfy the equation are called **solutions** of the equation, and **roots** or **zeros** of its left-hand side. A quadratic equation has at most two solutions.

The value $b^2 - 4ac$ is known as the **discriminant**.
1. If the discriminant is *positive*, there are two real solutions given by the formula $$x_{1,2} = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}.$$
1. If the discriminant is *zero*, there is one real solution (referred to as a **double root**) given by the formula $$x = \frac{-b}{2a}.$$
1. If the discriminant is *negative*, there are no (real) solutions.

You can learn more about the quadratic equations on Wikipedia: https://en.wikipedia.org/wiki/Quadratic_equation
Introduction to Python¶
We have already entered our first piece of Python code, namely
3 + 5
8
Exercise¶
Compute, using Python, (i) the product of seven and eight, (ii) the difference between 2190 and 518, (iii) the result of dividing 100 by four, and (iv) the result of multiplying the difference between 2190 and 518 by 10.
Solution¶
7 * 8
56
2190 - 518
1672
100 / 4
25.0
10 * (2190 - 518)
16720
or
(2190 - 518) * 10
16720
It should now be clear why we call Python a “supercalculator”. Indeed, you could use Python as a calculator (but it is so much more). To start harnessing its power we should introduce
Variables¶
A variable is one of the most important concepts in programming. Essentially, it is a named value. Moreover, as the name suggests, this named value can be varied (changed), while keeping the name the same.
Let us create a variable named a
. We create a variable by assigning to it, using the assignment operator =
, its initial value:
a = 5
This statement (command) essentially says “set the variable `a` to value 5”.
Once the variable a
has been created and initialised (set to its initial value), we can use it in expressions, such as a + 3
. When we write a
in expressions, its value (5) will be substituted for a
, so the result of the arithmetic expression a + 3
will be 5 + 3
, in other words, 8:
a + 3
8
We note that the difference between the statements and expressions is that the latter evaluate to a result.
As we said, the value of the variable can be varied (changed). Let us assign to a
a different value, say, 7:
a = 7
Now when we evaluate the expression a + 3
, we will get a different result, namely 10:
a + 3
10
What if we now assign to a
the result of the expression a + 3
?
a = a + 3
First, the expression a + 3
on the right-hand side of =
is evaluated (it is 7 + 3, i.e. 10). Next, it is assigned to a
as its new value. So, as a result of this assignment, the value of the variable a
has become
a
10
We may now introduce a different variable, say b
,
b = 5
and use it in arithmetic expressions alongside a
:
a + b + 3
18
Notice that the values of the variables persist (are remembered) as we go from one Jupyter cell to the next.
We could write all of the above more succinctly in a single cell:
a = 7
a = a + 3
b = 5
a + b + 3
18
Notice that only the result of the last expression, `a + b + 3`, is returned as the output (“Out”) by Jupyter.
Sometimes there is no “Out” to be printed, as is the case with assignment to a variable:
a = 10
However, as we said, variables persist throughout the Jupyter session, so we can inspect them in one of the following cells:
a
10
Remember that only the result from the last expression is printed:
2 + 2
3 + 7
10
However, you can print multiple things using the print
function. This is convenient for inspecting intermediate results in your code:
print(2 + 2)
print(3 + 5)
3 + 4
4
8
7
In the example above, 4
and 8
are displayed by the two print
functions, whereas the output of the cell is 7
, which is the result of evaluating the last expression in the cell, 3 + 4
.
Note that variable names in Python are case-sensitive, so a
is not the same as A
, myvar
is not the same as myVar
:
a = 3
A = 5
a
3
Whereas
A
5
Exercise¶
Set the variable a
to 15
, the variable b
to 7
, then, without typing in any digits, swap the values of the two variables, so the variable a
becomes equal to 7
and the variable b
to 15
.
Solution¶
a = 15
b = 7
temp = a
a = b
b = temp
print('a:', a) # or simply print(a)
print('b:', b) # or simply print(b)
a: 7
b: 15
Notice that in Python we use #
to introduce comments.
Here is another solution, without using a temporary variable. It’s a bit trickier:
a = 15
b = 7
a = a + b
b = a - b
a = a - b
print(a)
print(b)
7
15
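For completeness, here is a third way to swap, sketched under the assumption that we may use tuple unpacking (tuples are one of the data structures this chapter sets out to introduce). Python evaluates the right-hand side first and then assigns both names in one step, so no temporary variable and no arithmetic are needed:

```python
a = 15
b = 7

# The right-hand side (b, a) is evaluated first,
# then its two values are assigned to a and b.
a, b = b, a

print(a)  # 7
print(b)  # 15
```

Unlike the arithmetic trick, this form works for values of any type, not only for numbers.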
Numerics¶
So far all the values that we have dealt with in Python have been numeric, such as 3
and 5
in the expression
3 + 5
8
The result, 8
, is also numeric.
Moreover, these values are all integers. An integer in programming is the same as in mathematics: a whole number with no digits after the decimal point:
8
8
We can use the built-in Python function type
to confirm that the type of 8 is indeed an integer (or int
for short):
type(8)
int
We can assign this value to a variable
my_int = 8
And then that variable will have the type integer:
type(my_int)
int
We could print out the value of my_int
along with the type of its value using print
:
print(my_int, type(my_int))
8 <class 'int'>
Python supports numbers with a fractional part (approximations of the real numbers of mathematics), as well as integers. Such numbers are implemented using a different type, the floating point type, `float`:
type(3.57)
float
We can force a literal to be interpreted as a float (rather than as an integer) by including the decimal point:
type(42.)
float
whereas
type(42)
int
We say that 42.
is a float
literal, whereas 42
is an int
literal.
So let us get the terminology right. In the following code
a = 3
b = 7 + 5.3
print(b)
10 + a + b
12.3
25.3
Here, `a = 3`, `b = 7 + 5.3`, `print(b)` are statements (i.e. commands: “assign a value to a variable”, “print something”). Whereas “things” that evaluate to values (including the values themselves), `3`, `7 + 5.3`, `10 + a + b`, are expressions. `3`, `7` and `10` are literals of type `int`. `5.3` is a literal of type `float`. The value of the variable `a` is of type `int`, as is the value of the literal `3`. Notice that you can mix values of different types in the same expression, as in `7 + 5.3`. The type of the result in this particular instance will be
type(7 + 5.3)
float
float
is also the type of the expression 10 + a + b
.
We could also cast a value of type int
to float
:
float(42)
42.0
type(float(42))
float
whereas
type(42)
int
When casting a value of type float
to type int
we may end up losing precision as we lose all digits after the decimal point:
int(3.57)
3
type(int(3.57))
int
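It may help to see exactly how the digits are lost. The following sketch compares `int` with two alternatives, `math.floor` and the built-in `round`; the variable names are ours, chosen for illustration only:

```python
import math

truncated = int(-3.57)         # int() drops the fractional part (truncates towards zero)
floored = math.floor(-3.57)    # largest integer less than or equal to -3.57
rounded = round(3.57)          # nearest integer

print(truncated)  # -3
print(floored)    # -4
print(rounded)    # 4
```

For positive numbers `int` and `math.floor` agree; they differ only for negative numbers with a fractional part.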
The float
data type is used throughout data science to represent numerical values in arithmetic operations.
Exercise¶
Is the sum of 3
and 3.57
an int
or a float
? Will you lose precision by casting 3
to a float
then back to an int
? Will you lose precision by casting 3.57
to an int
then back to a float
?
Solution¶
3 + 3.57
6.57
type(3 + 3.57)
float
int(float(3))
3
We haven’t lost any precision.
float(int(3.57))
3.0
This time we have lost precision — the digits after the decimal point.
Standard Python libraries¶
The power of Python is in its libraries: pre-written collections of Python code that do useful stuff for us. We make use of libraries by `import`ing their modules:
import math
Once we have imported the standard Python library module math
, we can start using functions defined in it, such as sqrt
for the square root:
math.sqrt(3.57)
1.8894443627691184
We can use the results of these functions in expressions:
4.5 + 2 * math.sqrt(3.57)
8.278888725538238
Modules may define other things in addition to functions, such as constants. In particular, the math
module defines the mathematical $\pi$ (“pi”) constant, which relates the radius of a circle to its circumference (via $C = 2\pi r$, where $r$ is the radius, $C$ the circumference):
math.pi
3.141592653589793
As a side comment, many real numbers, such as the transcendental number $\pi$, cannot be represented exactly using floating point. Floating point arithmetic relies on truncated, approximate representations of real numbers, which may lead to all sorts of numerical issues (often subtle) in scientific computing. However, what we are doing here is too basic for us to worry about these numerical issues. If you want to really understand floating point numbers, have a look at the paper What Every Computer Scientist Should Know About Floating-Point Arithmetic by David Goldberg (Google it).
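A small illustration of the point about approximate representations (our own example, not taken from Goldberg's paper): even simple decimal fractions such as 0.1 and 0.2 have no exact binary floating point representation, so their sum is not exactly 0.3.

```python
total = 0.1 + 0.2

print(total)         # 0.30000000000000004
print(total == 0.3)  # False
```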
To check what functions, constants, etc. are defined in a library you can either Google its documentation, or use
dir(math)
['__doc__', '__loader__', '__name__', '__package__', '__spec__', 'acos', 'acosh', 'asin', 'asinh', 'atan', 'atan2', 'atanh', 'cbrt', 'ceil', 'comb', 'copysign', 'cos', 'cosh', 'degrees', 'dist', 'e', 'erf', 'erfc', 'exp', 'exp2', 'expm1', 'fabs', 'factorial', 'floor', 'fmod', 'frexp', 'fsum', 'gamma', 'gcd', 'hypot', 'inf', 'isclose', 'isfinite', 'isinf', 'isnan', 'isqrt', 'lcm', 'ldexp', 'lgamma', 'log', 'log10', 'log1p', 'log2', 'modf', 'nan', 'nextafter', 'perm', 'pi', 'pow', 'prod', 'radians', 'remainder', 'sin', 'sinh', 'sqrt', 'tan', 'tanh', 'tau', 'trunc', 'ulp']
Exercise¶
In one of the previous exercises we have already mentioned quadratic equations. Use math
to find both solutions of the quadratic equation $2x^2 -3x + \frac{1}{2} = 0$.
Solution¶
a = 2.
b = -3.
c = .5
discriminant = b * b - 4 * a * c
discriminant
5.0
The discriminant is positive, so we should indeed have two real solutions:
x1 = (-b + math.sqrt(discriminant)) / (2. * a)
print('x1:', x1)
x2 = (-b - math.sqrt(discriminant)) / (2. * a)
print('x2:', x2)
x1: 1.3090169943749475
x2: 0.19098300562505255
Let us check:
a * x1 * x1 + b * x1 + c
0.0
a * x2 * x2 + b * x2 + c
5.551115123125783e-17
We have encountered a (very minor) numerical issue of floating point arithmetic: instead of zero, we got a very small number (`...e-17` should be read as $\ldots \cdot 10^{-17}$). For all practical purposes, this is zero, and `x2` is indeed a solution of our quadratic equation.
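One way to make “for all practical purposes, this is zero” precise is to compare with a tolerance rather than with `==`. The sketch below uses `math.isclose` from the standard library; the tolerance `1e-12` is an arbitrary choice of ours, not a recommended value:

```python
import math

a, b, c = 2., -3., .5
x2 = (-b - math.sqrt(b * b - 4. * a * c)) / (2. * a)
residual = a * x2 * x2 + b * x2 + c

# abs_tol is needed here: the default relative tolerance
# is of no help when comparing against exactly zero.
is_root = math.isclose(residual, 0., abs_tol=1e-12)
print(is_root)
```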
Dynamic typing¶
Let us set the variable x
, so it equals 65:
x = 65
Its type, then, will be integer:
type(x)
int
We could overwrite x
with a value of a different type, such as a float:
x = 3.57
The type of x
has now changed:
type(x)
float
Some programming languages (such as Java, C++, C#, and many others) would not allow overwriting `x` with a value of a different type: once something is an `int`, it is always an `int`. We say that these languages are statically typed, whereas Python is dynamically typed. Types are important in Python, and Python is still a strongly typed language, although the type of a variable may change over the lifetime of the program, hence the expression “dynamically typed”.
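To illustrate the difference between “dynamically typed” and “strongly typed”, here is a minimal sketch (the variable names and strings are ours). Rebinding a name to a value of another type is allowed, but Python refuses to silently mix incompatible types in an operation:

```python
x = 65
x = 'sixty-five'   # allowed: the name x is simply rebound to a str

try:
    result = 'total: ' + 5   # str + int is not defined
except TypeError as error:
    result = None
    print('TypeError:', error)

# An explicit cast makes the intention clear:
print('total: ' + str(5))   # total: 5
```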
Strings¶
The string type allows us to define textual variables. A string literal is enclosed within a pair of single `'` or double `"` quotation marks.
my_str = 'foo'
print(my_str, type(my_str))
foo <class 'str'>
It is customary for introductions to programming languages to include an example that prints out the string `'Hello, World!'`. In Python, this is a one-liner:
print('Hello, World!')
Hello, World!
The function len
returns the length of a string:
len('Hello, World!')
13
We can access individual characters in a string using indexing with the square brackets. Notice that the indexing starts at zero, thus
'Hello, World!'[0]
'H'
whereas
'Hello, World!'[1]
'e'
We can also index from the back using negative indices:
'Hello, World!'[-1]
'!'
Moreover, we can index longer substrings, rather than individual characters:
'Hello, World!'[3:7]
'lo, '
Notice that the first index is inclusive, whereas the second exclusive, so the resulting substring consists of characters at indices 3, 4, 5, and 6 (but not 7).
When indexing, we can also provide a step:
'Hello, World!'[3:7:2]
'l,'
If we skip the first index, we start at the beginning. If we skip the second index, we go until the end.
'Hello, World!'[::2]
'Hlo ol!'
Of course, instead of repeating the string `'Hello, World!'` so many times (while running the risk of mistyping it), we should have stored it in a variable…
greet = 'Hello, World!'
…and then indexed:
greet[::2]
'Hlo ol!'
One of the most useful operations on strings is concatenation. It enables us to produce a single string from multiple:
'first' + 'second'
'firstsecond'
separator = ', '
'first' + separator + 'second' + separator + 'third'
'first, second, third'
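When there are many pieces to concatenate, repeating the separator by hand becomes tedious. A possible alternative, sketched here with a list of strings (lists are one of the data structures this chapter sets out to introduce), is the string method `join`:

```python
separator = ', '
words = ['first', 'second', 'third']   # a list of strings

joined = separator.join(words)
print(joined)   # first, second, third
```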
Exercise¶
Use indexing and concatenation to obtain the string 'World, Hello!'
from 'Hello, World!'
.
Solution¶
greet = 'Hello, World!'
greet[7:12] + greet[5:7] + greet[:5] + greet[-1]
'World, Hello!'
None¶
We can set Python variables to a special value, None
,
a = None
of a special type,
type(a)
NoneType
None
is used to signal that the value is absent or missing.
In fact, this is the value returned by a call to any function that does not explicitly return a result, such as
print(357)
357
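We can check this by capturing whatever `print` hands back (the name `result` is ours):

```python
result = print(357)    # prints 357 and returns None

print(result)          # None
print(result is None)  # True
```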
Arithmetic expressions¶
Python supports the standard arithmetic operators:
print('Addition:', 5 + 3)
print('Subtraction:', 5 - 3)
print('Multiplication:', 5 * 3)
print('Division:', 5 / 3)
print('Exponentiation:', 5**3)
print('Modulo:', 5 % 3)
Addition: 8
Subtraction: 2
Multiplication: 15
Division: 1.6666666666666667
Exponentiation: 125
Modulo: 2
Python also supports integer (floor) division, which produces the largest integer less than or equal to the exact quotient, in the next example, `5 / 3`:
5 // 3
1
If any of the arguments is a float
, the result will also be of type float
:
5.1 // 3.1
1.0
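The phrase “largest integer less than or equal to” matters for negative numbers, where integer division rounds down rather than towards zero. A brief sketch (variable names are ours), which also shows the built-in `divmod` returning the quotient and the remainder together:

```python
floor_quotient = -5 // 3      # rounds down
truncated = int(-5 / 3)       # drops the fractional part instead
quotient, remainder = divmod(-5, 3)

print(floor_quotient)         # -2
print(truncated)              # -1
print(quotient, remainder)    # -2 1
```

Note that `quotient * 3 + remainder` recovers `-5`.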
Expressions such as
3 + 5
8
2. * x + 7.
14.14
evaluate to numbers (whether integers, or floating point numbers). They are known as arithmetic expressions.
We can perform some other common operations on numerics:
print('Absolute value:', abs(-5))
print('Rounding:', round(3.56))
print('Maximum value:', max(3, 2, 8, 10, 2, 5))
print('Minimum value:', min(3, 2, 8, 10, 2, 5))
Absolute value: 5
Rounding: 4
Maximum value: 10
Minimum value: 2
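One detail of `round` that often surprises newcomers: in Python 3, values exactly halfway between two integers are rounded to the nearest even integer. The sketch below also shows the optional second argument, the number of decimal places to keep:

```python
halves = [round(0.5), round(1.5), round(2.5)]
two_places = round(3.567, 2)

print(halves)       # [0, 2, 2]
print(two_places)   # 3.57
```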
Functions¶
Suppose that we have written some code to compute the area of a circle:
radius = 5.
area = math.pi * radius * radius
print(area)
78.53981633974483
There is little point in rewriting it each time we encounter a new circle with a different radius. So we wrap it inside a function, which takes radius
as its parameter (argument) and returns the result:
def area_of_circle(radius):
    area = math.pi * radius * radius
    return area
We can call our function with the values of the arguments that we need in each case:
area_of_circle(5.)
78.53981633974483
r = 7.5
area_of_circle(r)
176.71458676442586
Functions can have multiple arguments:
def area_of_triangle(base, height):
    print('Base:', base)
    print('Height:', height)
    area = .5 * base * height
    return area
area_of_triangle(3., 5.)
Base: 3.0
Height: 5.0
7.5
Notice how the block of code was indented (we chose to indent it using four spaces, although some people prefer to use tabs) to delimit it, designating it as the body of the function `area_of_triangle`. The function call that ensues, `area_of_triangle(3., 5.)`, is not indented, and is not part of that body.
The variables base
and height
are defined only within the body of the function. We say that those variables’ scope is limited to the body of the function.
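A quick way to see this scoping rule in action is to try to use `base` outside the function. The sketch below assumes that no variable called `base` has been defined at the top level of the notebook; the flag `base_visible` is ours:

```python
def area_of_triangle(base, height):
    area = .5 * base * height
    return area

area_of_triangle(3., 5.)

try:
    print(base)            # base does not exist outside the function
    base_visible = True
except NameError as error:
    base_visible = False
    print('NameError:', error)
```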
It is possible to call the function specifying the values of the arguments in order
area_of_triangle(3., 5.)
Base: 3.0
Height: 5.0
7.5
or by name
area_of_triangle(height=5., base=3.)
Base: 3.0
Height: 5.0
7.5
Functions can also specify default values for their arguments in their definitions:
def area_of_triangle(base, height=5.):
    return .5 * base * height
So calling
area_of_triangle(3., 5.)
7.5
can now be equivalently done as
area_of_triangle(3.)
7.5
Notice that like everything else (e.g. integers) functions are objects and first-class citizens. Thus we can think of area_of_triangle
as a variable set to a value of type function
:
type(area_of_triangle)
function
Function objects can be passed to other functions as parameters:
def add(x, y):
    return x + y
def multiply(x, y):
    return x * y
def result_printer(op, x, y):
    print('The result is', op(x, y))
result_printer(add, 3, 5)
result_printer(multiply, 3, 5)
The result is 8
The result is 15
Good programmers are masters of code reuse; therefore, they wrap generally useful pieces of code into convenient functions.
If a library defines the function that we need, then we don’t need to write our own. We have already seen (and used) the function
math.sqrt(9.)
3.0
Exercise¶
Write two functions that will return the two roots of a given quadratic equation. Test them on the quadratic equation $2x^2 -3x + \frac{1}{2} = 0$.
Solution¶
def discriminant(a, b=0., c=0.):
    return b * b - 4. * a * c
def root1(a, b=0., c=0.):
    d = discriminant(a, b, c)
    return (-b - math.sqrt(d)) / (2. * a)
def root2(a, b=0., c=0.):
    d = discriminant(a, b, c)
    return (-b + math.sqrt(d)) / (2. * a)
root1(2., -3., .5)
0.19098300562505255
root2(2., -3., .5)
1.3090169943749475
Booleans and logic¶
A `bool`ean is a value of a binary type, `bool`, which can be either `True` or `False`. It is so named after the self-taught English mathematician, philosopher, and logician George Boole: https://en.wikipedia.org/wiki/George_Boole
my_bool = True
print(my_bool, type(my_bool))
True <class 'bool'>
my_bool = False
print(my_bool, type(my_bool))
False <class 'bool'>
Let us set x
to the integer 10:
x = 10
Expressions that evaluate to either True
or False
are known as boolean expressions.
x < 10
False
type(x < 10)
bool
Different boolean expressions can be obtained by using different comparison operators, such as less than:
x < 10
False
less than or equals:
x <= 10
True
equals:
x == 10
True
greater than or equals:
x >= 10
True
greater than:
x > 10
False
not equal:
x != 10
False
And these comparison operators can be combined with logical operators, such as not
, and
, and or
:
x <= 10 and x % 2 == 1
False
x <= 10 or x % 2 == 1
True
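The operator `not` negates a boolean expression. Here is a short sketch with `x` set to 10, as above (the variable names are ours). Note that `not` binds less tightly than the comparison operators but more tightly than `and` and `or`, so parentheses are sometimes needed:

```python
x = 10

negated = not x < 10                        # same as not (x < 10)
combined = not x < 10 and x % 2 == 0        # (not (x < 10)) and (x % 2 == 0)
grouped = not (x < 10 or x % 2 == 0)        # negates the whole disjunction

print(negated)    # True
print(combined)   # True
print(grouped)    # False
```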
We can also use the built-in function all
:
all([x > 1, 5 <= x, 5 > 3, 7 != 1])
True
which is equivalent to
x > 1 and 5 <= x and 5 > 3 and 7 != 1
True
Similarly,
all([x > 1, 5 <= x, x == 5, 5 > 3, 7 != 1])
False
is equivalent to
x > 1 and 5 <= x and x == 5 and 5 > 3 and 7 != 1
False
Another built-in function, `any`, enables us to write
any([x > 1, 5 <= x, x == 5, 5 > 3, 7 != 1])
True
which is somewhat more succinct and arguably more readable than the equivalent
x > 1 or 5 <= x or x == 5 or 5 > 3 or 7 != 1
True
A value of any data type can also be cast to `True` or `False`. As a general rule, empty objects (such as the empty string), zeros, and `None` are cast to `False`, while everything else is cast to `True`:
print(bool())
print(bool(''))
print(bool(' '))
print(bool(0))
print(bool(0.))
print(bool(1))
print(bool(1.5))
print(bool(None))
False
False
True
False
False
True
True
False
Control flow¶
We can control the flow of our programs using boolean expressions and `if` statements. The `if` statement executes the `if` block if the given boolean expression is `True`, and the `else` block (as long as it is present) if the given boolean expression is `False`. Else-if, or `elif`, lets us test a further boolean expression when the preceding ones are not `True`.
if x <= 7:
    print('x is less than or equal to seven')
else:
    print('x is greater than seven')
x is greater than seven
if x <= 7:
    print('x is less than or equal to seven')
In this example, x > 7
(so x <= 7
is False
) but there is no else
block, so nothing is evaluated/printed.
if x <= 7:
    print('x is less than or equal to seven')
elif x <= 10:
    print('x is greater than seven but less than or equal to ten')
elif x <= 15:
    print('x is greater than ten but less than or equal to fifteen')
else:
    print('x is greater than fifteen')
x is greater than seven but less than or equal to ten
We can also have nested if-else
statements:
if x % 2 == 0:
    print('x is divisible by 2')
    if x % 5 == 0:
        print('x is divisible by 2 and 5')
elif x % 5 == 0:
    print('x is divisible by 5 but not 2')
else:
    print('x is divisible by neither 2 nor 5')
x is divisible by 2
x is divisible by 2 and 5
To check whether a variable is None
we use is None
rather than == None
:
if x is None:
    print('x is None')
else:
    print('x is not None')
x is not None
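Why prefer `is None`? The operator `==` asks an object whether it considers itself equal to another, and an object is free to answer however it likes, whereas `is` checks whether two names refer to the very same object. The class below is a deliberately contrived, hypothetical example of ours (classes are beyond the scope of this chapter) that shows how the two can disagree:

```python
class AlwaysEqual:
    """Hypothetical class whose == comparison always succeeds."""
    def __eq__(self, other):
        return True

x = AlwaysEqual()

equals_none = (x == None)   # asks x, which answers True to everything
is_none = (x is None)       # checks object identity

print(equals_none)   # True, even though x is not None
print(is_none)       # False
```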
Exercise¶
Write a function that will return the number of real solutions of a quadratic equation.
Solution¶
def number_of_solutions(a, b=0, c=0):
    # We could have used a function to compute the discriminant, but didn't in this case:
    discriminant = b * b - 4. * a * c
    if discriminant > 0:
        return 2
    elif discriminant == 0:
        return 1
    else:
        return 0
number_of_solutions(2., -3., .5)
2
number_of_solutions(1., 3., 5.)
0
Exercise¶
The Fibonacci sequence is a sequence of integers, starting with zero and one, such that each term in the sequence is the sum of the previous two. Thus the first few terms of the sequence are 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, etc. Write a function that, given `n`, will return the `n`th term of the Fibonacci sequence (counting from zero, so that the 0th term is 0).
Solution¶
def fibonacci(n):
    if n == 0: return 0
    elif n == 1: return 1
    else:
        return fibonacci(n - 2) + fibonacci(n - 1)
fibonacci(0)
0
fibonacci(1)
1
fibonacci(2)
1
fibonacci(3)
2
fibonacci(4)
3
fibonacci(5)
5
fibonacci(6)
8
Note that, when defining the function fibonacci
, we have used what is known as recursion: the function returns specific values in the base cases (when n = 0
and n = 1
) and calls itself in the recursive case.
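The recursive definition is elegant, but each call spawns two further calls, so the running time grows very quickly with `n`. As a sketch of an alternative (the function name `fibonacci_iterative` is ours, and it relies on the `for` loop, which is one of the topics this chapter sets out to introduce), we can keep track of just the last two terms:

```python
def fibonacci_iterative(n):
    previous, current = 0, 1   # the terms at positions 0 and 1
    for _ in range(n):
        # Slide the pair of terms one position along the sequence.
        previous, current = current, previous + current
    return previous

print(fibonacci_iterative(6))    # 8
print(fibonacci_iterative(50))   # 12586269025
```

The iterative version returns the 50th term immediately, whereas the recursive one would take a very long time.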
Data structures¶
As data scientists, we care a lot about data structures that let us store and access large amounts of data. Some such data structures, such as lists, tuples, dictionaries, and sets, are part of the Python standard. Others, such as multidimensional arrays and dataframes, are provided by third-party but de facto standard libraries, such as NumPy and pandas, respectively.
Lists¶
A list is arguably the most commonly used data structure in Python. Its core function is to allow storage of and access to various elements. Financial data in particular are often represented as time series, that is, collections of observed values with their corresponding times. To define a `list` we use square brackets `[]`:
my_list = [1, 5, 6, 3]
print(my_list, type(my_list))
[1, 5, 6, 3] <class 'list'>
Python allows us to combine elements of different types into the same list
:
my_list = [3, "hello world", True, None, 3, math.pi]
print(my_list)
[3, 'hello world', True, None, 3, 3.141592653589793]
Let’s examine the length of our list:
len(my_list)
6
Notice that repeated values are counted as distinct elements.
Accessing elements of a `list` is performed using indexing with `[]`. Remember that the index of the first element of the list is `0`:
print(my_list[0])
print(my_list[1])
print(my_list[3])
print(my_list[2])
print(my_list[4])
3
hello world
None
True
3
You may also access elements from the end of a list by using negative indexing:
print(my_list[-1])
print(my_list[-2])
print(my_list[-3])
print(my_list[-4])
3.141592653589793
3
None
True
We may set an element in a list to a new value:
my_list[-1] = 4
print(my_list)
[3, 'hello world', True, None, 3, 4]
We can select a sublist from the list:
my_list = ['problems','worthy','of','attack','prove','their','worth','by','fighting','back']
print(my_list[3:6])
['attack', 'prove', 'their']
Notice that the index 3 is inclusive, whereas the index 6 is exclusive. As a result, we obtain a sublist containing the elements at indices 3, 4, and 5 (but not 6).
We may also select sublists without the lower and/or upper bounds:
print(my_list[3:])
print(my_list[:5])
print(my_list[:])
['attack', 'prove', 'their', 'worth', 'by', 'fighting', 'back']
['problems', 'worthy', 'of', 'attack', 'prove']
['problems', 'worthy', 'of', 'attack', 'prove', 'their', 'worth', 'by', 'fighting', 'back']
You can specify a step:
my_list[::2]
['problems', 'of', 'prove', 'worth', 'fighting']
Reverse the order by setting a negative step size:
my_list[::-1]
['back', 'fighting', 'by', 'worth', 'their', 'prove', 'attack', 'of', 'worthy', 'problems']
And combine the step with lower and upper bounds:
my_list[2:10:3]
['of', 'their', 'fighting']
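Slicing is more forgiving than indexing when the bounds fall outside the list. The following minimal sketch, which uses a shorter illustrative list called words (a name chosen here, not taken from the text above), shows the difference. It relies on try and except to catch the error; these are used here only for illustration.

```python
words = ['problems', 'worthy', 'of', 'attack']

# A slice whose upper bound exceeds the length is silently clipped.
clipped = words[2:100]
print(clipped)  # ['of', 'attack']

# A slice that starts beyond the end of the list is simply empty.
empty = words[10:]
print(empty)  # []

# Indexing a single element out of range, in contrast, raises IndexError.
try:
    words[10]
    raised = False
except IndexError:
    raised = True
print(raised)  # True
```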
We can use Python’s range function, together with list, to generate a list of consecutive integers:
list(range(10))
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Conveniently, we can specify a step as well:
list(range(1,10,2))
[1, 3, 5, 7, 9]
We can add elements to the end of the list via the append method (a method is a function associated with a particular object, in our example, my_list):
my_list = list(range(0,10))
my_list.append(25)
my_list.append(25)
print(my_list)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 25, 25]
We can remove specific elements by calling the method remove and supplying it with the value of an element that we would like to remove. Note that only the first instance of an element will be removed:
my_list.remove(25)
my_list.remove(my_list[5])
print(my_list)
[0, 1, 2, 3, 4, 6, 7, 8, 9, 25]
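One detail worth knowing is what happens when the value is not in the list at all. The sketch below, which uses an illustrative list called numbers (a name introduced here), shows that remove then raises a ValueError, and that a membership test with in is one way to guard against this.

```python
numbers = [0, 1, 2, 25, 25]

# remove deletes only the first matching element.
numbers.remove(25)
print(numbers)  # [0, 1, 2, 25]

# Removing a value that is not in the list raises ValueError...
try:
    numbers.remove(99)
    error_raised = False
except ValueError:
    error_raised = True
print(error_raised)  # True

# ...so we may check for membership with `in` first.
if 99 in numbers:
    numbers.remove(99)
print(numbers)  # [0, 1, 2, 25]
```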
We can filter a list, keeping only the elements that satisfy a condition, using filter:
list(filter(lambda x: x > 5, my_list))
[6, 7, 8, 9, 25]
Here the lambda, or anonymous function, lambda x: x > 5 is equivalent to
def my_func(x): return x > 5
but it is shorter and avoids giving the function a name, which is not needed, as we don’t intend to call this function in the future.
We can map or apply a function (or lambda) to each element of a list:
list(map(lambda x: x*2, my_list))
[0, 2, 4, 6, 8, 12, 14, 16, 18, 50]
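The same filtering and mapping can also be written with list comprehensions, which many Python programmers find easier to read. The following is a minimal sketch of that alternative, not a replacement for filter and map; the names filtered and doubled are chosen here for illustration.

```python
my_list = [0, 1, 2, 3, 4, 6, 7, 8, 9, 25]

# Equivalent to list(filter(lambda x: x > 5, my_list)).
filtered = [x for x in my_list if x > 5]
print(filtered)  # [6, 7, 8, 9, 25]

# Equivalent to list(map(lambda x: x*2, my_list)).
doubled = [x * 2 for x in my_list]
print(doubled)  # [0, 2, 4, 6, 8, 12, 14, 16, 18, 50]
```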
The function sorted does not modify the list it is given; it returns a new, sorted list, while keeping the original one intact:
sorted(my_list, reverse=True)
[25, 9, 8, 7, 6, 4, 3, 2, 1, 0]
my_list
[0, 1, 2, 3, 4, 6, 7, 8, 9, 25]
On the other hand, the method sort modifies the list, sorting it in place:
my_list.sort(reverse=True)
my_list
[25, 9, 8, 7, 6, 4, 3, 2, 1, 0]
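A common slip is to assign the result of sort to a variable: because sort works in place, it returns None. The sketch below illustrates this on a small list, and also shows the optional key argument, which both sorted and sort accept. The names values, words, and by_length are chosen here for illustration.

```python
values = [3, 1, 2]

# sorted returns a new list and leaves values unchanged.
new_values = sorted(values)
print(new_values, values)  # [1, 2, 3] [3, 1, 2]

# sort works in place and returns None, so do not assign its result.
result = values.sort()
print(result, values)  # None [1, 2, 3]

# Both accept a key function; here we sort words by their length.
words = ['attack', 'of', 'worthy', 'by']
by_length = sorted(words, key=len)
print(by_length)  # ['of', 'by', 'attack', 'worthy']
```

Words of equal length keep their original relative order, because Python's sort is stable.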
Tuples¶
Let us consider an example.
a = [3, "hello world", True, None, 3, math.pi]
a = ['some', 'other', 'list']
a
['some', 'other', 'list']
The variable a was first assigned to the list [3, "hello world", True, None, 3, math.pi], but was then reassigned to another list, ['some', 'other', 'list']. Variables can be thought of as pointers (references) to objects in memory, such as lists. Two variables can reference the same object in memory, e.g.
a = [3, "hello world", True, None, 3, math.pi]
b = a
Now,
a
[3, 'hello world', True, None, 3, 3.141592653589793]
b
[3, 'hello world', True, None, 3, 3.141592653589793]
Since lists are mutable objects, they can be modified after construction. Notice that here we are not reassigning a variable so that it references a new object in memory; we are modifying the object that it is already pointing to:
a[2] = False
a
[3, 'hello world', False, None, 3, 3.141592653589793]
Notice that, since the variable b is referencing the same object, its value has also changed:
b
[3, 'hello world', False, None, 3, 3.141592653589793]
In this sense, mutable objects are somewhat dangerous. Consider the following code:
def my_mean(arg):
    # This could be a long function, which, perhaps by mistake,
    # modifies arg:
    # ...
    arg[3] = 11.7
    # ...
    return sum(arg) / len(arg)
Let’s apply this function to
a = [4.25, 18.5, 22.5, 13.7, 25.4]
The result of the (broken) my_mean looks roughly correct…
my_mean(a)
16.47
…although it’s not. But what’s worse, the user of my_mean, who never expected that function to modify its argument, is in for a surprise:
a
[4.25, 18.5, 22.5, 11.7, 25.4]
When we doubt the validity of some code, we may defensively copy the arguments like so:
a = [4.25, 18.5, 22.5, 13.7, 25.4]
print(my_mean(a.copy()))
a
16.47
[4.25, 18.5, 22.5, 13.7, 25.4]
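One caveat about this defensive copy: the copy method creates a new outer list only, so it protects us fully only when the elements are themselves immutable, as the floats above are. The following minimal sketch uses an illustrative nested list (the names nested, shallow, and deep are chosen here) together with the standard library function copy.deepcopy to show the difference.

```python
import copy

nested = [[4.25, 18.5], [22.5, 13.7]]

shallow = nested.copy()        # new outer list, same inner lists
deep = copy.deepcopy(nested)   # new outer list and new inner lists

nested[0][0] = 0.0             # modify an inner list of the original

print(shallow[0][0])  # 0.0  (the shallow copy sees the change)
print(deep[0][0])     # 4.25 (the deep copy does not)
```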
Notice that a copy is equal to…
a = [4.25, 18.5, 22.5, 13.7, 25.4]
b = [4.25, 18.5, 22.5, 13.7, 25.4]
a == b
True
…but not identical to (does not correspond to the same object in memory as) the original:
a is b
False
Whereas if both variables point to the same object in memory we get both equality and identity:
a = [4.25, 18.5, 22.5, 13.7, 25.4]
b = a
print(a == b)
print(a is b)
True
True
We can also check this by examining the id of the object, which in CPython is equal to its address in memory:
id(a)
2363859066816
id(b)
2363859066816
Mutable objects, such as lists, may therefore be a source of subtle bugs that are difficult to track down. They are less safe than immutable objects, which cannot be modified after construction. Mutable objects are particularly dangerous in multi-threaded environments, where several threads may modify the same object.
Fortunately, Python has a built-in data structure, which is very similar to a list, but immutable: a tuple. We create a tuple instead of a list by using round brackets instead of square brackets:
a = (4.25, 18.5, 22.5, 13.7, 25.4)
type(a)
tuple
Alternatively, we may cast a list to a tuple:
a = tuple([4.25, 18.5, 22.5, 13.7, 25.4])
type(a)
tuple
Once a tuple has been created, it cannot be modified: a[0] = 3.57 will raise a TypeError, and tuples do not have mutating methods such as append, so a.append(3.57) will raise an AttributeError.
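We can verify this behaviour without interrupting the notebook by catching the errors. The sketch below does so with try and except; the variable names assignment_error and append_error are chosen here for illustration.

```python
a = (4.25, 18.5, 22.5, 13.7, 25.4)

# Item assignment is not supported by tuples.
try:
    a[0] = 3.57
except TypeError as error:
    assignment_error = type(error).__name__
print(assignment_error)  # TypeError

# Tuples have no append method at all.
try:
    a.append(3.57)
except AttributeError as error:
    append_error = type(error).__name__
print(append_error)  # AttributeError

# The tuple is unchanged.
print(a)  # (4.25, 18.5, 22.5, 13.7, 25.4)
```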
Notice that
(3)
3
is interpreted as the number 3, whereas
(3,)
(3,)
is interpreted as a tuple containing a single element — number 3.
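A quick check with type confirms the distinction. The sketch below also shows that it is the comma, rather than the brackets, that creates a tuple; the names b and empty are chosen here for illustration.

```python
print(type((3)))   # <class 'int'>
print(type((3,)))  # <class 'tuple'>

# It is the comma, not the brackets, that makes a tuple.
b = 3,
print(type(b), len(b))  # <class 'tuple'> 1

# The empty tuple is the one case written with brackets alone.
empty = ()
print(type(empty), len(empty))  # <class 'tuple'> 0
```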
Exercise¶
Write a single function that will return the two roots of a given quadratic equation as a tuple. Test your function on the quadratic equation $2x^2 -3x + \frac{1}{2} = 0$.