Functional Programming In Python #AI - The Entrepreneurial Way with A.I.

Breaking

Friday, December 31, 2021

Functional Programming In Python #AI

#A.I.

Last Updated on December 19, 2021

Python is a fantastic programming language. It is likely to be your first choice for developing a machine learning or data science application. Python is interesting because it is a multi-paradigm programming language that can be used for both object-oriented and imperative programming. It has a simple syntax that is easy to read and comprehend.

In computer science and mathematics, the solution of many problems can be more easily and naturally expressed using the functional programming style. In this tutorial, we’ll discuss Python’s support for the functional programming paradigm, and Python’s classes and modules that help you program in this style.

After completing this tutorial, you will know:

  • Basic idea of functional programming
  • The itertools library
  • The functools library
  • Map-reduce design pattern and its possible implementation in Python

Let’s get started.

Functional Programming In Python
Photo by Abdullah_Shakoor, some rights reserved

Tutorial Overview

This tutorial is divided into 5 parts; they are:

  1. The idea of functional programming
  2. High order functions: Filter, map, and reduce
  3. Itertools
  4. Functools
  5. Map-reduce pattern

The idea of functional programming

If you have programming experience, likely you learned imperative programming. It is built with statements and manipulating variables. Functional programming is a declarative paradigm. It is different from imperative paradigm that programs are built by applying and composing functions. The functions here are supposed to be closer to the definition of a mathematical function, which there are no side effects, or simply, no access to external variables and when you call them with the same argument, they always give you the same result.

The benefit of functional programming is to make your program less error-prone. Without the side effects, it is more predictable and easier to see the outcome. We also no need to worry about one part of the program is interfering another part.

Many libraries adopted a functional programming paradigm. For example the following using pandas and pandas-datareader:

import pandas_datareader as pdr
import pandas_datareader.wb

df = (
    pdr.wb
    .download(indicator="SP.POP.TOTL", country="all", start=2000, end=2020)
    .reset_index()
    .filter(["country", "SP.POP.TOTL"])
    .groupby("country")
    .mean()
)
print(df)

This gives you the following output:

SP.POP.TOTL
country                                  
Afghanistan                  2.976380e+07
Africa Eastern and Southern  5.257466e+08
Africa Western and Central   3.550782e+08
Albania                      2.943192e+06
Algeria                      3.658167e+07
...                                   ...
West Bank and Gaza           3.806576e+06
World                        6.930446e+09
Yemen, Rep.                  2.334172e+07
Zambia                       1.393321e+07
Zimbabwe                     1.299188e+07

The pandas-datareader is a useful library that helps you download data from the Internet in the realtime. The above example is to download population data from the World Bank. The result is a pandas dataframe with countries and years as index and a single column named “SP.POP.TOTL” for the population. Then we manipulate the dataframe step by step, and at the end, we find the average population of all countries across the years.

We can write in this way because in pandas, most functions on the dataframe are not changing the dataframe, but to produce a new dataframe to reflect the result of the function. We call this behavior immutable because the input dataframe never changed. The consequence is that, we can chain up the functions to manipulate the dataframe step by step. If we have to break it into using the style of imperative programming, the above program is same as the following:

import pandas_datareader as pdr
import pandas_datareader.wb

df = pdr.wb.download(indicator="SP.POP.TOTL", country="all", start=2000, end=2020)
df = df.reset_index()
df = df.filter(["country", "SP.POP.TOTL"])
groups = df.groupby("country")
df = groups.mean()

print(df)

High order functions: Filter, map, and reduce

Python is not a strictly functional programming language. But it is trivial to write Python in a functional style. There are three basic functions on iterables that allows us to write powerful program in a very trivial way: filter, map, and reduce.

Filter is to select some of the elements in an iterable, such as a list. Map is to transform elements one by one. Finally, reduce is to convert the entire iterable into a different form, such as the sum of all elements or concatenating substrings in a list into a longer string. To illustrate their use, let’s consider a simple task: Given a log file from the Apache web server, find the IP address that sent most requests with error code 404. If you have no idea what a log file from Apache web server looks like, the following is an example:

89.170.74.95 - - [17/May/2015:16:05:27 +0000] "HEAD /projects/xdotool/ HTTP/1.1" 200 - "-" "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:24.0) Gecko/20100101 Firefox/24.0" 
95.82.59.254 - - [19/May/2015:03:05:19 +0000] "GET /images/jordan-80.png HTTP/1.1" 200 6146 "http://www.semicomplete.com/articles/dynamic-dns-with-dhcp/" "Mozilla/5.0 (Windows NT 6.1; rv:27.0) Gecko/20100101 Firefox/27.0"
155.140.133.248 - - [19/May/2015:06:05:34 +0000] "GET /images/jordan-80.png HTTP/1.1" 200 6146 "http://www.semicomplete.com/blog/geekery/debugging-java-performance.html" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)"
68.180.224.225 - - [20/May/2015:20:05:02 +0000] "GET /blog/tags/documentation HTTP/1.1" 200 12091 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"

The above is from a bigger file located here. These are a few lines from the log. Each line begins with the IP address of the client (i.e., the browser) and the code after “HTTP/1.1” is the response status code. Normally it is 200 if the request is fulfilled. But if the browser requested something not exists in the server, the code will be 404. To find the IP address that corresponds to the most 404 requests, we can simply scan the log file line by line, find those with 404, and count the IP address to find the one with the most occurrences.

In Python code, we can do the following. First we see how we can read the log file and extract the IP address and status code from a line:

import urllib.request
import re

# Read the log file, split into lines
logurl = "https://raw.githubusercontent.com/elastic/examples/master/Common%20Data%20Formats/apache_logs/apache_logs"
logfile = urllib.request.urlopen(logurl).read().decode("utf8")
lines = logfile.splitlines()

# using regular expression to extract IP address and status code from a line
def ip_and_code(logline):
    m = re.match(r'([\d\.]+) .*? \[.*?\] ".*?" (\d+) ', logline)
    return (m.group(1), m.group(2))

print(ip_and_code(lines[0]))

then we can use a couple map() and filter() and some other functions to find the IP address:

...

import collections

def is404(pair):
    return pair[1] == "404"
def getIP(pair):
    return pair[0]
def count_ip(count_item):
    ip, count = count_item
    return (count, ip)

# transform each line into (IP address, status code) pair
ipcodepairs = map(ip_and_code, lines)
# keep only those with status code 404
pairs404 = filter(is404, ipcodepairs)
# extract the IP address part from each pair
ip404 = map(getIP, pairs404)
# count the occurrences, the result is a dictionary of IP addresses map to the count
ipcount = collections.Counter(ip404)
# convert the (IP address, count) tuple into (count, IP address) order
countip = map(count_ip, ipcount.items())
# find the tuple with the maximum on the count
print(max(countip))

Here we did not use reduce() function because we have some specialized reduce operation built-in, such as max(). But indeed, we can make a simpler program with list comprehension notation:

...

ipcodepairs = [ip_and_code(x) for x in lines]
ip404 = [ip for ip,code in ipcodepairs if code=="404"]
ipcount = collections.Counter(ip404)
countip = [(count,ip) for ip,count in ipcount.items()]
print(max(countip))

or even write it in a single statement (but less readable):

import urllib.request
import re
import collections

logurl = "https://raw.githubusercontent.com/elastic/examples/master/Common%20Data%20Formats/apache_logs/apache_logs"
print(
    max(
        [(count,ip) for ip,count in
            collections.Counter([
                ip for ip, code in
                [ip_and_code(x) for x in
                     urllib.request.urlopen(logurl)
                     .read()
                     .decode("utf8")
                     .splitlines()
                ]
                if code=="404"
            ]).items()
        ]
    )
)

Itertools in Python

The above example on filter, map, and reduce illustrates the ubiquity of iterables in Python. This includes lists, tuples, dictionaries, sets, and even generators as all of them can be iterated using a for-loop. In Python, we have a module named itertools that brings in more functions to manipulate (but not mutate) iterables. From Python’s official documentation:

The module standardizes a core set of fast, memory efficient tools that are useful by themselves or in combination. Together, they form an “iterator algebra” making it possible to construct specialized tools succinctly and efficiently in pure Python.

We’ll discuss a few functions of itertools in this tutorial. When trying out the examples given below, be sure to import itertools and operator as:

import itertools
import operator

Infinite Iterators

Infinite iterators help you create sequences of infinite length as shown below.

Construct + Example Output
count()
start = 0
step = 100
for i in itertools.count(start, step):
    print(i)
    if i>=1000:
        break
0
100
200
300
400
500
600
700
800
900
1000
cycle()
counter = 0
cyclic_list = [1, 2, 3, 4, 5]
for i in itertools.cycle(cyclic_list):
    print(i)
    counter = counter+1
    if counter>10:
        break
1
2
3
4
5
1
2
3
4
5
1
repeat()
for i in itertools.repeat(3,5):
    print(i)
3
3
3
3
3

Combinatoric iterators

You can create permutations, combinations, etc. with these iterators.

Construct + Example Output
product()
x = [1, 2, 3]
y = ['A', 'B']
print(list(itertools.product(x, y)))
[(1, 'A'), (1, 'B'), (2, 'A'), (2, 'B'), 
 (3, 'A'), (3, 'B')]
permutations()
x = [1, 2, 3]
print(list(itertools.permutations(x)))
[(1, 2, 3), (1, 3, 2), (2, 1, 3), 
 (2, 3, 1), (3, 1, 2), (3, 2, 1)]
combinations()
y = ['A', 'B', 'C', 'D']
print(list(itertools.combinations(y, 3)))
[('A', 'B', 'C'), ('A', 'B', 'D'), 
 ('A', 'C', 'D'), ('B', 'C', 'D')]
combinations_with_replacement()
z = ['A', 'B', 'C']
print(list(itertools.combinations_with_replacement(z, 2)))
[('A', 'A'), ('A', 'B'), ('A', 'C'), 
 ('B', 'B'), ('B', 'C'), ('C', 'C')]

More Useful Iterators

There are other iterators that stop at the end of the shorter of the two lists passed as arguments.  Some of them are described below. This is not an exhaustive list and you can see the complete list here.

Accumulate()

Automatically creates an iterator that accumulates the result of a given operator or function, and returns the result. You can choose an operator from Python’s operator  library or write your own customized operator.

# Custom operator
def my_operator(a, b):
    return a+b if a>5 else a-b
    
x = [2, 3, 4, -6]
mul_result = itertools.accumulate(x, operator.mul)
print("After mul operator", list(mul_result))
pow_result = itertools.accumulate(x, operator.pow)
print("After pow operator", list(pow_result))
my_operator_result = itertools.accumulate(x, my_operator)
print("After customized my_operator", list(my_operator_result))
After mul operator [2, 6, 24, -144]
After pow operator [2, 8, 4096, 2.117582368135751e-22]
After customized my_operator [2, -1, -5, 1]

Starmap()

Apply the same operator to pairs of items.

pair_list = [(1, 2), (4, 0.5), (5, 7), (100, 10)]

starmap_add_result = itertools.starmap(operator.add, pair_list)
print("Starmap add result: ", list(starmap_add_result))

x1 = [2, 3, 4, -6]
x2 = [4, 3, 2, 1] 

starmap_mul_result = itertools.starmap(operator.mul, zip(x1, x2))
print("Starmap mul result: ", list(starmap_mul_result))
Starmap add result:  [3, 4.5, 12, 110]
Starmap mul result:  [8, 9, 8, -6]

filterfalse()

Filter out data based on a specific criterion.

my_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
even_result = itertools.filterfalse(lambda x: x%2, my_list)
small_terms = itertools.filterfalse(lambda x: x>=5, my_list)                               
print('Even result:', list(even_result))
print('Less than 5:', list(small_terms))
Even result: [2, 4, 6, 8, 10]
Less than 5: [1, 2, 3, 4]

Functools in Python

In most programming languages, passing function as arguments or a function returning another function might be confusing or hard to work with. Python includes the functools library that makes it easy to work with these functions. From Python’s official functools documentation:

The functools module is for higher-order functions: functions that act on or return other functions. In general, any callable object can be treated as a function

Here we explain a few nice features of this library. You can look at the complete list of functools functions here.

Using lru_cache

In imperative programming languages, recursion is very expensive. Every time a function is invoked, it is evaluated, even if it is called with the same set of arguments. In Python, the lru_cache is a decorator that can be used to cache the results of function evaluations. When the function is invoked again with the same set of arguments, the stored result is used, avoiding the extra overhead related to recursion.

Let’s look at the following example. We have the same implementation of the computation of the nth Fibonacci number with and without lru_cache. We can see that fib(30) has 31 function evaluations just as we expect because of lru_cache. The fib() function is invoked only for n=0,1,2…30 and the result is stored in memory and used later. This is significantly less as compared to fib_slow(30), with 2692537 evaluations.

import functools
@functools.lru_cache
def fib(n):
    global count
    count = count + 1
    return fib(n-2) + fib(n-1) if n>1 else 1

def fib_slow(n):
    global slow_count
    slow_count = slow_count + 1
    return fib_slow(n-2) + fib_slow(n-1) if n>1 else 1

count = 0
slow_count = 0
fib(30)
fib_slow(30)

print('With lru_cache total function evaluations: ', count)
print('Without lru_cache total function evaluations: ', slow_count)
With lru_cache total function evaluations:  31
Without lru_cache total function evaluations:  2692537

It is worth to note that the lru_cache decorator is particularly useful when you’re experimenting with machine learning problems in Jupyter notebooks. If you have a function that downloads data from the Internet, wrapping it with lru_cache can keep your download in memory and avoid downloading the same file again even if you invoked the download function multiple times.

Using reduce()

Reduce is similar to the itertools.accumulate(). It applies a function repeatedly to the elements of a list and returns the result. Here are a few examples with comments to explain the working of this functions.

# Evaluates ((1+2)+3)+4
list_sum = functools.reduce(operator.add, [1, 2, 3, 4])
print(list_sum)

# Evaluates (2^3)^4
list_pow = functools.reduce(operator.pow, [2, 3, 4])
print(list_pow)
10
4096

The reduce() function can accept any “operators” and optionally an initial value. For example, the collections.Counter function in the previous example can be implemented as follows

import functools

def addcount(counter, element):
    if element not in counter:
        counter[element] = 1
    else:
        counter[element] += 1
    return counter

items = ["a", "b", "a", "c", "d", "c", "b", "a"]

counts = functools.reduce(addcount, items, {})
print(counts)
{'a': 3, 'b': 2, 'c': 2, 'd': 1}

Using partial()

There are situations when you  have a function that takes multiple arguments and some of its arguments are repeated again and again. The function partial() returns a new version of the same function with a reduced number of arguments.

For example, if you have to compute the power of 2 repeatedly, you can create a new version of numpy’s power() function as shown below:

import numpy

power_2 = functools.partial(np.power, 2)
print('2^4 =', power_2(4))
print('2^6 =', power_2(6))
2^4 = 16
2^6 = 64

Map-Reduce Pattern

We have mentioned the filter, map, and reduce functions as high order functions in a previous section. The use of map-reduce design pattern is indeed a way to help us easily make a highly scalable program. The map-reduce pattern is an abstract representation of many types of computations that manipulate lists or collections of objects. The map stage takes the input collection and maps it to an intermediate representation. The reduce step takes this intermediate representation and computes a single output from it. This design pattern is very popular in functional programming languages. Python also provides constructs to implement this design pattern in an efficient manner.

Map-Reduce In Python

As an illustration of the map-reduce design pattern, let’s take a simple example. Suppose we want to count the numbers divisible by 3 in a list. We’ll use lambda to define an anonymous function and use it to map() all items of a list to 1 or 0 depending upon whether they pass our divisibility test or not. The function map() takes as argument a function and an iterable. Next we’ll use reduce() to accumulate the overall result.

# All numbers from 1 to 20
input_list = list(range(20))
# Use map to see which numbers are divisible by 3
bool_list = map(lambda x: 1 if x%3==0 else 0, input_list)
# Convert map object to list
bool_list = list(bool_list)
print('bool_list =', bool_list)

total_divisible_3 = functools.reduce(operator.add, bool_list)
print('Total items divisible by 3 = ', total_divisible_3)
bool_list = [1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0]
Total items divisible by 3 =  7

The previous example, while being very simple, illustrates how easy it is to implement map-reduce design pattern in Python. You can solve complex and lengthy problems using the surprisingly simple and easy constructs in Python.

Further Reading

This section provides more resources on the topic if you are looking to go deeper.

Books

Python Official Documentation

Summary

In this tutorial, you discovered features of Python that support functional programming

Specifically, you learned:

  • The iterables returning finite or infinite sequences in Python using itertools
  • The higher order functions supported by functools
  • The map-reduce design pattern’s implementation in Python

Do you have any questions about Python discussed in this post? Ask your questions in the comments below and I will do my best to answer.

The post Functional Programming In Python appeared first on Machine Learning Mastery.



via https://AIupNow.com

Mehreen Saeed, Khareem Sudlow