ML Pandas Read Data
Import data into a DataFrame. The source can be a URL, a local file, or a database connection.
df = pd.read_csv(URL)
df = pd.read_json(FILE, orient='columns')
df = pd.read_sql_query("SELECT * FROM data", conn)
CSV
p27 Import a comma-separated values (CSV) file.
""" Read CSV
The source can be a URL or a local file
"""
import pandas as pd
import pathlib
# Read from URL
URL = 'https://raw.githubusercontent.com/chrisalbon/sim_data/master/data.csv'
dataframe = pd.read_csv(URL)
# Read from FILE
DIR = pathlib.Path(__file__).resolve().parent
dataframe = pd.read_csv(DIR / '../_data/01.csv')
print(dataframe.head(2).to_markdown())
# | | integer | datetime | category |
# |---:|----------:|:--------------------|-----------:|
# | 0 | 5 | 2015-01-01 00:00:00 | 0 |
# | 1 | 5 | 2015-01-01 00:00:01 | 0 |
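Real CSV files often need a few extra options. The sketch below is only an illustration: the semicolon-separated text in `CSV_TEXT` is a hypothetical stand-in for a file or URL, and the chosen columns are assumptions. It shows the `sep`, `parse_dates`, `usecols`, and `nrows` parameters of `pd.read_csv`.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV standing in for a file or URL (assumption)
CSV_TEXT = """integer;datetime;category
5;2015-01-01 00:00:00;0
5;2015-01-01 00:00:01;0
9;2015-01-01 00:00:02;1
"""

df = pd.read_csv(
    io.StringIO(CSV_TEXT),            # any file-like object, path, or URL works here
    sep=';',                          # field delimiter, the default is ','
    parse_dates=['datetime'],         # convert this column to datetime values
    usecols=['integer', 'datetime'],  # load only the listed columns
    nrows=2,                          # read only the first two data rows
)
print(df.dtypes)
```

Restricting `usecols` and `nrows` is a cheap way to inspect a large file before loading it in full.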
Excel
p28 Import an Excel spreadsheet.
""" Read Excel
Import an Excel spreadsheet.
Requires openpyxl: pip install openpyxl
"""
import pandas as pd
import pathlib
FILE = pathlib.Path(__file__).resolve().parent / '../_data/02.xlsx'
df = pd.read_excel(FILE, sheet_name=0)
print(df.head(2).to_markdown())
# | | integer | datetime | category |
# |---:|----------:|:--------------------|-----------:|
# | 0 | 5 | 2015-01-01 00:00:00 | 0 |
# | 1 | 5 | 2015-01-01 00:00:01 | 0 |
JSON
p29 Load a JSON file for data preprocessing.
""" Read JSON
Load a JSON file for data preprocessing.
"""
import pandas as pd
import pathlib
FILE = pathlib.Path(__file__).resolve().parent / '../_data/03.json'
df = pd.read_json(FILE, orient='columns')
print(df.head(2).to_markdown())
# | | integer | datetime | category |
# |---:|----------:|:--------------------|-----------:|
# | 0 | 5 | 2015-01-01 00:00:00 | 0 |
# | 1 | 5 | 2015-01-01 00:00:01 | 0 |
# JSON normalize: flatten a list of records into a table
DATA = [
    {
        "id": 1,
        "name": "Mary",
    },
    {
        "id": 2,
        "name": "John",
    },
]
df = pd.json_normalize(DATA)
print(df.head(2).to_markdown())
# | | id | name |
# |---:|-----:|:-------|
# | 0 | 1 | Mary |
# | 1 | 2 | John |
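`pd.json_normalize` is most useful when records are nested. The following is a minimal sketch, assuming hypothetical `address` and `scores` fields that are not part of the original data. It flattens nested dictionaries into dotted column names and expands a nested list into one row per element.

```python
import pandas as pd

# Hypothetical nested records (the address and scores fields are assumptions)
DATA = [
    {
        "id": 1,
        "name": "Mary",
        "address": {"city": "Oslo", "zip": "0150"},
        "scores": [{"test": "pre", "value": 4}, {"test": "post", "value": 25}],
    },
    {
        "id": 2,
        "name": "John",
        "address": {"city": "Bergen", "zip": "5003"},
        "scores": [{"test": "pre", "value": 24}],
    },
]

# Nested dicts become dotted column names such as 'address.city'
flat = pd.json_normalize(DATA)
print(flat.columns.tolist())

# One row per element of the 'scores' list, with parent fields repeated
scores = pd.json_normalize(DATA, record_path='scores', meta=['id', 'name'])
print(scores)
```

Here `record_path` names the list to expand and `meta` lists the parent fields to carry along on each row.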
SQL
p30 Load data from a database using SQL queries. This is probably the most common method in real-world projects.
""" Read SQL Database
Load data from a database using SQL queries.
Probably the most common method in real-world projects.
"""
import pandas as pd
import sqlite3
import pathlib
DIR = pathlib.Path(__file__).resolve().parent
conn = sqlite3.connect(DIR / '../_data/04.db')
df = pd.read_sql_query("SELECT * FROM data", conn)
print(df.head(2).to_markdown())
# | | first_name | last_name | age | preTestScore | postTestScore |
# |---:|:-------------|:------------|------:|---------------:|----------------:|
# | 0 | Jason | Miller | 42 | 4 | 25 |
# | 1 | Molly | Jacobson | 52 | 24 | 94 |
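Queries usually filter rows rather than select everything. As a self-contained sketch, the example below builds a small in-memory SQLite table (the table contents and the age threshold are assumptions chosen for illustration) and passes the filter value through the `params` argument of `pd.read_sql_query` instead of formatting it into the SQL string.

```python
import sqlite3

import pandas as pd

# In-memory database standing in for a real .db file (assumption)
conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE data (first_name TEXT, last_name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?)",
    [('Jason', 'Miller', 42), ('Molly', 'Jacobson', 52)],
)
conn.commit()

# The '?' placeholder is filled from params by the database driver
df = pd.read_sql_query(
    "SELECT first_name, age FROM data WHERE age > ?",
    conn,
    params=(45,),
)
conn.close()  # release the connection once the DataFrame is loaded
print(df)
```

Passing values through `params` lets the driver handle quoting, which avoids SQL injection when the value comes from user input.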
Last update: 45 days ago