Pandas Package
Pandas is the main package for working with relational or labeled data. It is built on top of NumPy package.Data Frame
A DataFrame is a multi-dimensional table containing a collection of Series. A Pandas Series is like a column in a table. It is an one-dimensional array holding data of any type.
""" Pandas / DataFrame
"""
import pandas as pd
data = {
'apples': [3, 2, 0, 1],
'oranges': [0, 3, 7, 2],
'available': ['yes', 'no', 'yes', 'no'],
}
df = pd.DataFrame(data)
print(df)
"""
apples oranges available
0 3 0 yes
1 2 3 no
2 0 7 yes
3 1 2 no
"""
Read Csv
We can import in data frame values from .csv file.
""" Read data / CSV
The source can be URL or FILE
For table display we can use tabulate package
pip install tabulate
"""
import pandas as pd
import pathlib
URL = 'https://raw.githubusercontent.com/chrisalbon/sim_data/master/data.csv'
DIR = pathlib.Path(__file__).resolve().parent
FILE = DIR / '_data/01.csv'
df1 = pd.read_csv(URL)
df2 = pd.read_csv(FILE)
print("URL:")
print(df1.head(2).to_markdown())
print("FILE:")
print(df2.head(2).to_markdown())
"""
Read csv from url:
| | integer | datetime | category |
|---:|----------:|:--------------------|-----------:|
| 0 | 5 | 2015-01-01 00:00:00 | 0 |
| 1 | 5 | 2015-01-01 00:00:01 | 0 |
Read csv from file:
| | integer | datetime | category |
|---:|----------:|:--------------------|-----------:|
| 0 | 5 | 2015-01-01 00:00:00 | 0 |
| 1 | 5 | 2015-01-01 00:00:01 | 0 |
"""
Excel Spreadsheet
We can import data from an Excel spreadsheet.
""" Read data / Excel
Import an Excel spreadsheet
pip install openpyxl
"""
import pandas as pd
import pathlib
DIR = pathlib.Path(__file__).resolve().parent
FILE = DIR / '_data/02.xlsx'
df = pd.read_excel(FILE , sheet_name=0)
print("EXCEL:")
print(df.head(2).to_markdown())
"""
Read csv from excel:
| | integer | datetime | category |
|---:|----------:|:--------------------|-----------:|
| 0 | 5 | 2015-01-01 00:00:00 | 0 |
| 1 | 5 | 2015-01-01 00:00:01 | 0 |
"""
Json File
We can load data from a JSON for data preprocessing.
""" Read data / Json
Load a JSON file for data preprocessing.
"""
import pandas as pd
import pathlib
# File path
DIR = pathlib.Path(__file__).resolve().parent
FILE = DIR / '_data/03.json'
# Read from json
print("Load from json file:")
df = pd.read_json(
FILE, orient='columns'
)
print(df.head(2).to_markdown())
# Read from string
print("Load from json string:")
data = [
{"id": 1, "name": "Mary"},
{"id": 2, "name": "John"},
]
df = pd.json_normalize(data)
print(df.head(2).to_markdown())
"""
DataFrame from json file:
| | integer | datetime | category |
|---:|----------:|:--------------------|-----------:|
| 0 | 5 | 2015-01-01 00:00:00 | 0 |
| 1 | 5 | 2015-01-01 00:00:01 | 0 |
DataFrame from json string:
| | id | name |
|---:|-----:|:-------|
| 0 | 1 | Mary |
| 1 | 2 | John |
"""
Sql Queries
Querying databases is probably the most used method in real world.
""" Read data / SQL Queries
"""
import pandas as pd
import sqlite3
import pathlib
# File path
DIR = pathlib.Path(__file__).resolve().parent
DB = DIR / '_data/04.db'
conn = sqlite3.connect(DB)
df = pd.read_sql_query("SELECT * FROM data", conn)
print("DB:")
print(df.head(2).to_markdown())
"""
DataFrame from DB:
| | first_name | last_name | age | preTestScore | postTestScore |
|---:|:-------------|:------------|------:|---------------:|----------------:|
| 0 | Jason | Miller | 42 | 4 | 25 |
| 1 | Molly | Jacobson | 52 | 24 | 94 |
"""
Last update: 277 days ago