Click to Flip Card
Pandas / Read Data
Pandas package is built on top of NumPy. A Pandas Series is similar to a column in a table. A Series can contain any type of data. We can import from csv, json, databases and even excel.
Pandas Package
Pandas is the main package for working with relational or labeled data. It is built on top of NumPy package.Data Frame
A DataFrame is a multi-dimensional table containing a collection of Series. A Pandas Series is like a column in a table. It is an one-dimensional array holding data of any type.
""" Pandas / DataFrame
"""
import pandas as pd
data = {
'apples': [3, 2, 0, 1],
'oranges': [0, 3, 7, 2],
'available': ['yes', 'no', 'yes', 'no'],
}
df = pd.DataFrame(data)
print(df)
"""
apples oranges available
0 3 0 yes
1 2 3 no
2 0 7 yes
3 1 2 no
"""
Read Csv
We can import in data frame values from .csv file.
""" Read data / CSV
The source can be URL or FILE
For table display we can use tabulate package
pip install tabulate
"""
import pandas as pd
import pathlib
URL = 'https://raw.githubusercontent.com/chrisalbon/sim_data/master/data.csv'
DIR = pathlib.Path(__file__).resolve().parent
FILE = DIR / '_data/01.csv'
df1 = pd.read_csv(URL)
df2 = pd.read_csv(FILE)
print("URL:")
print(df1.head(2).to_markdown())
print("FILE:")
print(df2.head(2).to_markdown())
"""
Read csv from url:
| | integer | datetime | category |
|---:|----------:|:--------------------|-----------:|
| 0 | 5 | 2015-01-01 00:00:00 | 0 |
| 1 | 5 | 2015-01-01 00:00:01 | 0 |
Read csv from file:
| | integer | datetime | category |
|---:|----------:|:--------------------|-----------:|
| 0 | 5 | 2015-01-01 00:00:00 | 0 |
| 1 | 5 | 2015-01-01 00:00:01 | 0 |
"""
Excel Spreadsheet
We can import data from an Excel spreadsheet.
""" Read data / Excel
Import an Excel spreadsheet
pip install openpyxl
"""
import pandas as pd
import pathlib
DIR = pathlib.Path(__file__).resolve().parent
FILE = DIR / '_data/02.xlsx'
df = pd.read_excel(FILE , sheet_name=0)
print("EXCEL:")
print(df.head(2).to_markdown())
"""
Read csv from excel:
| | integer | datetime | category |
|---:|----------:|:--------------------|-----------:|
| 0 | 5 | 2015-01-01 00:00:00 | 0 |
| 1 | 5 | 2015-01-01 00:00:01 | 0 |
"""
Json File
We can load data from a JSON for data preprocessing.
""" Read data / Json
Load a JSON file for data preprocessing.
"""
import pandas as pd
import pathlib
# File path
DIR = pathlib.Path(__file__).resolve().parent
FILE = DIR / '_data/03.json'
# Read from json
print("Load from json file:")
df = pd.read_json(
FILE, orient='columns'
)
print(df.head(2).to_markdown())
# Read from string
print("Load from json string:")
data = [
{"id": 1, "name": "Mary"},
{"id": 2, "name": "John"},
]
df = pd.json_normalize(data)
print(df.head(2).to_markdown())
"""
DataFrame from json file:
| | integer | datetime | category |
|---:|----------:|:--------------------|-----------:|
| 0 | 5 | 2015-01-01 00:00:00 | 0 |
| 1 | 5 | 2015-01-01 00:00:01 | 0 |
DataFrame from json string:
| | id | name |
|---:|-----:|:-------|
| 0 | 1 | Mary |
| 1 | 2 | John |
"""
Sql Queries
Querying databases is probably the most used method in real world.
""" Read data / SQL Queries
"""
import pandas as pd
import sqlite3
import pathlib
# File path
DIR = pathlib.Path(__file__).resolve().parent
DB = DIR / '_data/04.db'
conn = sqlite3.connect(DB)
df = pd.read_sql_query("SELECT * FROM data", conn)
print("DB:")
print(df.head(2).to_markdown())
"""
DataFrame from DB:
| | first_name | last_name | age | preTestScore | postTestScore |
|---:|:-------------|:------------|------:|---------------:|----------------:|
| 0 | Jason | Miller | 42 | 4 | 25 |
| 1 | Molly | Jacobson | 52 | 24 | 94 |
"""
Last update: 53 days ago