  minte9
learningjourney




Csv

p27 Import a comma-separated values (CSV) file.
 
""" Read CSV
The source can be a URL or a local file.
Printing with to_markdown() requires the tabulate package (pip install tabulate).
"""

import pandas as pd
import pathlib

# Read from URL
URL = 'https://raw.githubusercontent.com/chrisalbon/sim_data/master/data.csv'
dataframe = pd.read_csv(URL)

# Read from FILE
DIR = pathlib.Path(__file__).resolve().parent
dataframe = pd.read_csv(DIR / '../_data/01.csv')
print(dataframe.head(2).to_markdown())

# |    |   integer | datetime            |   category |
# |---:|----------:|:--------------------|-----------:|
# |  0 |         5 | 2015-01-01 00:00:00 |          0 |
# |  1 |         5 | 2015-01-01 00:00:01 |          0 |
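
A few read_csv options are often useful at import time. The sketch below is only an illustration: the inline CSV_TEXT is a hypothetical stand-in for a real file or URL, and it mirrors the columns shown above. It parses the datetime column, keeps two columns, and reads only the first two rows.

```python
import io
import pandas as pd

# Hypothetical inline CSV standing in for a real file or URL
CSV_TEXT = """integer,datetime,category
5,2015-01-01 00:00:00,0
5,2015-01-01 00:00:01,0
9,2015-01-01 00:00:02,0
"""

df = pd.read_csv(
    io.StringIO(CSV_TEXT),            # any file-like object works here
    usecols=['integer', 'datetime'],  # load only the columns that are needed
    parse_dates=['datetime'],         # convert this column to datetime64
    nrows=2,                          # read only the first two data rows
)
print(df.dtypes)
```

Without parse_dates the datetime column would be loaded as plain strings.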

Excel

p28 Import an Excel spreadsheet.
 
""" Read Excel
Import an Excel spreadsheet.
Reading .xlsx files requires openpyxl: pip install openpyxl
"""

import pandas as pd
import pathlib

FILE = pathlib.Path(__file__).resolve().parent / '../_data/02.xlsx'
df = pd.read_excel(FILE, sheet_name=0)
print(df.head(2).to_markdown())

# |    |   integer | datetime            |   category |
# |---:|----------:|:--------------------|-----------:|
# |  0 |         5 | 2015-01-01 00:00:00 |          0 |
# |  1 |         5 | 2015-01-01 00:00:01 |          0 |
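
The sheet_name argument also accepts a sheet label, or None to load every sheet at once. The following is a minimal sketch under assumptions: it builds a small two-sheet workbook in memory (the sheet names 'first' and 'second' are made up for the example) so that it runs without the 02.xlsx file, and it needs openpyxl installed.

```python
import io
import pandas as pd

# Build a small two-sheet workbook in memory (hypothetical data)
buffer = io.BytesIO()
with pd.ExcelWriter(buffer, engine='openpyxl') as writer:
    pd.DataFrame({'integer': [5, 5]}).to_excel(writer, sheet_name='first', index=False)
    pd.DataFrame({'integer': [7]}).to_excel(writer, sheet_name='second', index=False)

# sheet_name=None returns a dict that maps each sheet name to a DataFrame
buffer.seek(0)
sheets = pd.read_excel(buffer, sheet_name=None)

# A single sheet can also be selected by its label
buffer.seek(0)
second = pd.read_excel(buffer, sheet_name='second')
print(list(sheets), second.shape)
```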

Json

p29 Load a JSON file for data preprocessing.
 
""" Read Json
Load a JSON file for data preprocessing.
"""

import pandas as pd
import pathlib

FILE = pathlib.Path(__file__).resolve().parent / '../_data/03.json'
df = pd.read_json(FILE, orient='columns')
print(df.head(2).to_markdown())

# |    |   integer | datetime            |   category |
# |---:|----------:|:--------------------|-----------:|
# |  0 |         5 | 2015-01-01 00:00:00 |          0 |
# |  1 |         5 | 2015-01-01 00:00:01 |          0 |


# json_normalize: turn a list of JSON-like records (dicts) into a flat table
DATA = [
    {
        "id": 1,
        "name": "Mary",
    },
    {
        "id": 2,
        "name": "John",
    },
]
df = pd.json_normalize(DATA)
print(df.head(2).to_markdown())

# |    |   id | name   |
# |---:|-----:|:-------|
# |  0 |    1 | Mary   |
# |  1 |    2 | John   |
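
json_normalize is most useful when the records are nested. As a rough sketch, the example below extends the records above with a nested "address" object and a "scores" list. Both fields are hypothetical additions for illustration and are not part of the original data.

```python
import pandas as pd

# Hypothetical nested records: "address" and "scores" are made up for this example
DATA = [
    {"id": 1, "name": "Mary", "address": {"city": "Paris"},
     "scores": [{"test": "pre", "value": 4}, {"test": "post", "value": 25}]},
    {"id": 2, "name": "John", "address": {"city": "Rome"},
     "scores": [{"test": "pre", "value": 24}, {"test": "post", "value": 94}]},
]

# Nested dicts become dotted column names such as 'address.city'
flat = pd.json_normalize(DATA)

# record_path expands the "scores" list into one row per score,
# meta repeats the parent fields on each of those rows
scores = pd.json_normalize(DATA, record_path='scores', meta=['id', 'name'])
print(scores)
```

Without record_path, the "scores" lists stay as they are inside a single object column.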

Sql

p30 Load data from a database with a SQL query. This is probably the most common method in real-world projects.
 
""" Read SQL Database
Load data from a database using SQL queries.
Probably the most common method in real-world projects.
"""

import pandas as pd
import sqlite3
import pathlib

DIR = pathlib.Path(__file__).resolve().parent

conn = sqlite3.connect(DIR / '../_data/04.db')
df = pd.read_sql_query("SELECT * FROM data", conn)
conn.close()  # close the connection once the data is loaded
print(df.head(2).to_markdown())

# |    | first_name   | last_name   |   age |   preTestScore |   postTestScore |
# |---:|:-------------|:------------|------:|---------------:|----------------:|
# |  0 | Jason        | Miller      |    42 |              4 |              25 |
# |  1 | Molly        | Jacobson    |    52 |             24 |              94 |
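
When the query depends on a value, it is safer to pass that value through params than to format it into the SQL string. The sketch below assumes an in-memory SQLite database with a cut-down "data" table. The two rows mirror the output above, while the schema and the age threshold are illustrative assumptions.

```python
import sqlite3
import pandas as pd

# In-memory database standing in for 04.db (schema is an assumption)
conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE data (first_name TEXT, last_name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?)",
    [("Jason", "Miller", 42), ("Molly", "Jacobson", 52)],
)
conn.commit()

# The ? placeholder is filled in by the driver from params
df = pd.read_sql_query(
    "SELECT first_name, age FROM data WHERE age > ?",
    conn,
    params=(45,),
)
conn.close()
print(df)
```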

Last update: 45 days ago
Pandas, Data Cleaning