當前位置: 首頁>>技術問答>>正文


python – 如何將SQL查詢結果轉換為Pandas數據結構DataFrame?

如何將SQL查詢結果轉換為PANDAS數據結構DataFrame?

簡單來說,我想對我的SQL數據庫運行查詢並將返回的數據存儲為Pandas數據結構。

我正在閱讀有關Pandas的文檔,但是在識別查詢的返回類型時遇到了問題。

我試圖打印查詢結果,但沒有得到任何有用的信息。查詢代碼如下:

from sqlalchemy import create_engine

engine2 = create_engine('mysql://THE DATABASE I AM ACCESSING')
connection2 = engine2.connect()
dataid = 1022
resoverall = connection2.execute("
  SELECT 
      sum(BLABLA) AS BLA,
      sum(BLABLABLA2) AS BLABLABLA2,
      sum(SOME_INT) AS SOME_INT,
      sum(SOME_INT2) AS SOME_INT2,
      100*sum(SOME_INT2)/sum(SOME_INT) AS ctr,
      sum(SOME_INT2)/sum(SOME_INT) AS cpc
   FROM daily_report_cooked
   WHERE campaign_id = '%s'", %dataid)

如代碼所示,我想了解變量”resoverall”的格式/數據類型是什麽,以及如何將其與PANDAS數據結構一起使用。

 

最佳辦法

這是完成任務的最短代碼:

from pandas import DataFrame
df = DataFrame(resoverall.fetchall())
df.columns = resoverall.keys()

 

次佳辦法

如下所述, Pandas 現在可以使用SQLAlchemy進行數據庫的讀取(read_sql)和插入(to_sql)。以下方法有效:

import pandas as pd

df = pd.read_sql(sql, cnxn)  # 令 cnxn = connection2

老版的做法:

import pyodbc
import pandas.io.sql as psql

cnxn = pyodbc.connect(connection_info) 
cursor = cnxn.cursor()
sql = "SELECT * FROM TABLE"

df = psql.frame_query(sql, cnxn)
cnxn.close()

第三種辦法

SQLAlchemy之外,如果需要使用pyodbc,MySQLdb或psychopg2,像下麵這樣一個簡單的函數往往可以滿足我的需求:

import decimal

import pydobc
import numpy as np
import pandas

cnn, cur = myConnectToDBfunction()
cmd = "SELECT * FROM myTable"
cur.execute(cmd)
dataframe = __processCursor(cur, dataframe=True)

def __processCursor(cur, dataframe=False, index=None):
    '''
    Processes a database cursor with data on it into either
    a structured numpy array or a pandas dataframe.

    input:
    cur - a pyodbc cursor that has just received data
    dataframe - bool. if false, a numpy record array is returned
                if true, return a pandas dataframe
    index - list of column(s) to use as index in a pandas dataframe
    '''
    datatypes = []
    colinfo = cur.description
    for col in colinfo:
        if col[1] == unicode:
            datatypes.append((col[0], 'U%d' % col[3]))
        elif col[1] == str:
            datatypes.append((col[0], 'S%d' % col[3]))
        elif col[1] in [float, decimal.Decimal]:
            datatypes.append((col[0], 'f4'))
        elif col[1] == datetime.datetime:
            datatypes.append((col[0], 'O4'))
        elif col[1] == int:
            datatypes.append((col[0], 'i4'))

    data = []
    for row in cur:
        data.append(tuple(row))

    array = np.array(data, dtype=datatypes)
    if dataframe:
        output = pandas.DataFrame.from_records(array)

        if index is not None:
            output = output.set_index(index)

    else:
        output = array

    return output

 

第四種辦法

對於mysql:

import pandas as pd
import mysql.connector

# Setup MySQL connection
db = mysql.connector.connect(
    host="<IP>",              # your host, usually localhost
    user="<USER>",            # your username
    password="<PASS>",        # your password
    database="<DATABASE>"     # name of the data base
)   

# You must create a Cursor object. It will let you execute all the queries you need
cur = db.cursor()

# Use all the SQL you like
cur.execute("SELECT * FROM <TABLE>")

# Put it all to a data frame
sql_data = pd.DataFrame(cur.fetchall())
sql_data.columns = cur.column_names

# Close the session
db.close()

# Show the data
print(sql_data.head())

 

第五種辦法

對於SQL Server。

import pandas as pd
from sqlalchemy import create_engine

def getData():
  # Parameters
  ServerName = "my_server"
  Database = "my_db"
  UserPwd = "user:pwd"
  Driver = "driver=SQL Server Native Client 11.0"

  # Create the connection
  engine = create_engine('mssql+pyodbc://' + UserPwd + '@' + ServerName + '/' + Database + "?" + Driver)

  sql = "select * from mytable"
  df = pd.read_sql(sql, engine)
  return df

df2 = getData()
print(df2)

Pandas SQL


參考資料

 

本文由《純淨天空》出品。文章地址: https://vimsky.com/zh-tw/article/4363.html,未經允許,請勿轉載。