Programming

  minte9
LearnRemember



Numpy / Matrices


Vector

A vector is an one-dimensional array.
 
""" Matrices / Vectors
Vectors are one-dimensional arrays
"""

import numpy as np

v1 = np.array([1, 2, 3])
v2 = np.array([[1], [4], [3]])

print("Row vector: \n", v1)
print("Column vector: \n", v2)

"""
    Row vector: 
     [1 2 3]
    Column vector: 
    [[1]
     [4]
     [3]]
"""

Matrix

Numpy main data structure is the multidimensional array.
 
""" Matrices / Create matrix

Numpy is the foundation of the Python machine learning stack
The main data structure is the multidimensional array

Arrays are zero-indexed, first element index is 0
Use ':' to select everything 'up to' or 'after'
"""

import numpy as np

M = np.array([ # three rows, four columns
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
])

v = np.array([1, 2, 3, 4, 5, 6])

print("M items: \n", M)
print("M.shape =", M.shape)
print("M.size =", M.size)
print("M.ndim =", M.ndim, '\n')

print("v[:]  =", v[:])
print("v[:3] =", v[:3])
print("v[3:] =", v[3:])
print("v[-1] =", v[-1], "\n")

print("M[1, 1] second row, second column:", M[1, 1])
print("M[:2, :] up to 2 rows, all columns: \n", M[:2, :])
print("M[:, 1:2] all rows, second column: \n", M[:, 1:2])

"""
    M items: 
     [[ 1  2  3  4]
     [ 5  6  7  8]
     [ 9 10 11 12]]
    M.shape = (3, 4)
    M.size = 12
    M.ndim = 2 

    v[:]  = [1 2 3 4 5 6]
    v[:3] = [1 2 3]
    v[3:] = [4 5 6]
    v[-1] = 6 

    M[1, 1] second row, second column: 6
    M[:2, :] up to 2 rows, all columns: 
     [[1 2 3 4]
      [5 6 7 8]]
    M[:, 1:2] all rows, second column: 
     [[ 2]
      [ 6]
      [10]]
"""

Sparse Matrices

Sparse matrices only store non-zero elements.
 
""" Matrices / Sparse matrix

A sparse matrix stores only non-zero elements, for computation savings.
Compress sparce row (CSR) matrices contain indexes of non-zero values.

Example (Netflix movies/users):
  Columns are every movie on Netflix
  Rows are every Netflix user
  Values are how many times a user watched that movie
"""

import numpy as np
from scipy import sparse

# Sparse matrix
M = np.array([
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], 
    [0, 1, 0, 0, 0, 0, 0, 0, 0, 0], 
    [3, 0, 0, 0, 0, 0, 0, 0, 0, 0], 
])
M_sparse = sparse.csr_matrix(M) # CSR matrix

# Random matrix
np.random.seed(0)
M_random1 = np.random.random(3)          # floats
M_random2 = np.random.randint(1, 11, 3)  # integers

print("Sparce matrix: \n", M_sparse)
print("Random matrix of floats: \n", M_random1)
print("Random matrix of integers: \n", M_random2)

"""
    Sparce matrix: 
      (1, 1)        1
      (2, 0)        3
    Random matrix of floats: 
     [0.5488135  0.71518937 0.60276338]
    Random matrix of integers: 
     [ 4  8 10]
"""

Vectorization

Vectorization is essentialy a for loop.
 
""" Vectorization
It is essentialy a for loop that does not increase performance.
When broadcasting different dimensions are allowed.
"""

import numpy as np

M = np.array([
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
])

# Add 100 every element
add_100 = lambda i: i + 100                 # create function that add 100
vectorize_add_100 = np.vectorize(add_100)   # create vectorized function
M_vectorized1 = vectorize_add_100(M)        # apply vectorization to all

# One line code
M_vectorized2 = np.vectorize(lambda i: i + 100)(M)
assert(M_vectorized1 == M_vectorized2).all()

# Brodcasting
M_vectorized3 = M + 100

print("Matrix: \n", M)
print("Vectorized 100: \n", M_vectorized1)
print("Broadcasted 100: \n", M_vectorized3)

"""
    Matrix: 
     [[1 2 3]
      [4 5 6]
      [7 8 9]]
    Vectorized 100: 
     [[101 102 103]
      [104 105 106]
      [107 108 109]]
    Broadcasted 100: 
     [[101 102 103]
      [104 105 106]
      [107 108 109]]
"""

Count Vectorizer

We can represent texts as vectors and compute similarity.
 
""" Count Vectorizer
Represent texts as vectors and compute similarity.
The word `london` occurs 2 times in A and 1 time in B.
We can find the cos similarity between these two vectors.
"""

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

A = 'London Paris London'
B = 'Paris Paris London'

cv = CountVectorizer()
matrix = cv.fit_transform([A, B])
similarity_scores = cosine_similarity(matrix)

print("Features names: \n", cv.get_feature_names_out())
print("Matrix vectorized: \n", matrix)
print("Matrix array: \n", matrix.toarray(), "\n")
print ("Similarity scores: \n", similarity_scores)

"""
    Features names: 
     ['london' 'paris']
    Matrix vectorized: 
       (0, 0)       2
      (0, 1)        1
      (1, 0)        1
      (1, 1)        2
    Matrix array: 
     [[2 1]
     [1 2]] 

    Similarity scores: 
     [[1.  0.8]
     [0.8 1. ]]
"""





References


Related