Libraries Pandas - Mlearning

Pandas Package

Pandas is the main package for working with relational or labeled data. It is built on top of NumPy package.

Data Frame

A DataFrame is a multi-dimensional table containing a collection of Series. A Pandas Series is like a column in a table. It is an one-dimensional array holding data of any type.

 
""" Pandas / DataFrame
"""

import pandas as pd

data = {
    'apples': [3, 2, 0, 1],
    'oranges': [0, 3, 7, 2],
    'available': ['yes', 'no', 'yes', 'no'],
}

df = pd.DataFrame(data)
print(df)

"""
   apples  oranges available
0       3        0       yes
1       2        3        no
2       0        7       yes
3       1        2        no
"""

Read Csv

We can import in data frame values from .csv file.

 
""" Read data / CSV
The source can be URL or FILE
For table display we can use tabulate package 
    pip install tabulate
"""

import pandas as pd
import pathlib

URL = 'https://raw.githubusercontent.com/chrisalbon/sim_data/master/data.csv'
DIR = pathlib.Path(__file__).resolve().parent 
FILE = DIR / '_data/01.csv'

df1 = pd.read_csv(URL)
df2 = pd.read_csv(FILE)

print("URL:")
print(df1.head(2).to_markdown())

print("FILE:")
print(df2.head(2).to_markdown())

"""
Read csv from url:
|    |   integer | datetime            |   category |
|---:|----------:|:--------------------|-----------:|
|  0 |         5 | 2015-01-01 00:00:00 |          0 |
|  1 |         5 | 2015-01-01 00:00:01 |          0 |

Read csv from file:
|    |   integer | datetime            |   category |
|---:|----------:|:--------------------|-----------:|
|  0 |         5 | 2015-01-01 00:00:00 |          0 |
|  1 |         5 | 2015-01-01 00:00:01 |          0 |
"""

Excel Spreadsheet

We can import data from an Excel spreadsheet.

 
""" Read data / Excel
Import an Excel spreadsheet
    pip install openpyxl
"""

import pandas as pd
import pathlib

DIR = pathlib.Path(__file__).resolve().parent 
FILE = DIR / '_data/02.xlsx'

df = pd.read_excel(FILE , sheet_name=0)

print("EXCEL:")
print(df.head(2).to_markdown())

"""
Read csv from excel:
|    |   integer | datetime            |   category |
|---:|----------:|:--------------------|-----------:|
|  0 |         5 | 2015-01-01 00:00:00 |          0 |
|  1 |         5 | 2015-01-01 00:00:01 |          0 |
"""

Json File

We can load data from a JSON for data preprocessing.

 
""" Read data / Json
Load a JSON file for data preprocessing.
"""

import pandas as pd
import pathlib

# File path
DIR = pathlib.Path(__file__).resolve().parent 
FILE = DIR / '_data/03.json'

# Read from json
print("Load from json file:")
df = pd.read_json(
        FILE, orient='columns'
)
print(df.head(2).to_markdown())

# Read from string
print("Load from json string:")
data = [
    {"id": 1, "name": "Mary"},
    {"id": 2, "name": "John"},
]
df = pd.json_normalize(data)
print(df.head(2).to_markdown())

"""
DataFrame from json file:
|    |   integer | datetime            |   category |
|---:|----------:|:--------------------|-----------:|
|  0 |         5 | 2015-01-01 00:00:00 |          0 |
|  1 |         5 | 2015-01-01 00:00:01 |          0 |

DataFrame from json string:
|    |   id | name   |
|---:|-----:|:-------|
|  0 |    1 | Mary   |
|  1 |    2 | John   |
"""

Sql Queries

Querying databases is probably the most used method in real world.

 
""" Read data / SQL Queries
"""

import pandas as pd
import sqlite3
import pathlib

# File path
DIR = pathlib.Path(__file__).resolve().parent
DB = DIR / '_data/04.db'

conn = sqlite3.connect(DB)
df = pd.read_sql_query("SELECT * FROM data", conn)

print("DB:")
print(df.head(2).to_markdown())

"""
DataFrame from DB:
|    | first_name   | last_name   |   age |   preTestScore |   postTestScore |
|---:|:-------------|:------------|------:|---------------:|----------------:|
|  0 | Jason        | Miller      |    42 |              4 |              25 |
|  1 | Molly        | Jacobson    |    52 |             24 |              94 |
"""

 Last update: 509 days ago