5 Writing to a Database
Contents
- Writing to a Database in Julia
- Writing to a Database in SAS
- Writing to a Database in R
- Writing to a Database in Python
Once you are connected to a database, you can create tables with data! If you’re only using a database to access data, and aren’t writing to it, feel free to continue to the next chapter on querying a database.
In the code blocks below, we assume you’ve already connected to your database and have named the connection as shown in the previous chapter on connecting to a database.
5.1 Julia
Depending on your ODBC driver, you could write a table with a simple line of code: ODBC.load(database, "name of sql table", julia_table)
MS SQL Server, along with some other ODBC drivers and database systems, doesn't support this, so we must use an alternative method.
Imports
using ODBC, CSV, JuliaDB, Query, DataFrames
The Julia Language Community has a suggested structure and instructions for working with tabular data, known as the Tables.jl interface. Both the indexed table type from JuliaDB and the more familiar DataFrame implement these standards, and integrate well with the ODBC package. Any of these queries can be converted into a DataFrame for a more familiar interface, or into a JuliaDB IndexedTable if you need to run online algorithms or want easier access to parallel features.
Clear Up Workspace
ODBC.execute!(
    dsn,
    """
    -- drop child tables before the tables they reference
    drop table if exists items;
    drop table if exists receipts;
    drop table if exists goods;
    drop table if exists customers;
    drop table if exists iris;
    drop table if exists iris2;
    """,
)
Create tables
ODBC.execute!(
    dsn,
    """
    create table customers
    (Id int Primary Key, LastName VARCHAR(30), FirstName VARCHAR(30))
    """,
)
ODBC.execute!(
    dsn,
    """
    create table goods
    (Id VARCHAR(30) Primary Key, Flavor VARCHAR(30), Food VARCHAR(30), Price FLOAT)
    """,
)
ODBC.execute!(
    dsn,
    """
    create table receipts
    (ReceiptNumber int PRIMARY KEY, Date VARCHAR(30), CustomerId int,
    CONSTRAINT CustomerID FOREIGN KEY (CustomerId) REFERENCES customers(Id))
    """,
)
ODBC.execute!(
    dsn,
    """
    create table items
    (Receipt INT, Ordinal INT, Item VARCHAR(30), PRIMARY KEY (Receipt, Ordinal),
    CONSTRAINT ReceiptNumber FOREIGN KEY (Receipt) REFERENCES receipts (ReceiptNumber),
    CONSTRAINT ItemId FOREIGN KEY (Item) REFERENCES goods (Id))
    """,
)
Insert the Data
First, make a SQL statement with "blank" values that we'll later fill in with actual values:
insertstmt = ODBC.prepare(dsn, "insert into customers values(?,?,?)")
Now, for each row of our csv file, run the prepared insert statement, substituting that row's values for the "blanks", i.e. those question marks.
for row in rows(customers)
    ODBC.execute!(insertstmt, row)
end

## Now do this for the remaining three tables
insertstmt = ODBC.prepare(dsn, "insert into goods values(?,?,?,?)")
for row in rows(goods)
    ODBC.execute!(insertstmt, row)
end

insertstmt = ODBC.prepare(dsn, "insert into receipts values(?,?,?)")
for row in rows(receipts)
    ODBC.execute!(insertstmt, row)
end

insertstmt = ODBC.prepare(dsn, "insert into items values(?,?,?)")
for row in rows(items)
    ODBC.execute!(insertstmt, row)
end
5.1.1 Alternative method for big data
The above method requires you to have the full dataset loaded into memory. But sometimes a data file is so large that it is inefficient to load everything at once and iterate through it to insert each row into a table.
The method used here instead loads one row of data into memory at a time, and writes it to the SQL table before loading the next row.
ODBC.execute!(
    dsn,
    """
    create table iris
    (Sepal_length FLOAT, Sepal_width FLOAT,
    Petal_length FLOAT, Petal_width FLOAT, Species VARCHAR(20))
    """,
)

insert_statement = ODBC.prepare(dsn, "insert into iris values(?,?,?,?,?)")
open("iris.csv") do f
    # skip the header row (remove this line if no header)
    readline(f)
    # read one line at a time, write to SQL table, and repeat
    for lines in readlines(f)
        rawline = map(String, split(lines, ","))
        ODBC.execute!(insert_statement, rawline)
    end
end
The file can also be read using the CSV package. The code below would replace open("iris.csv") do f and everything following it in the code chunk above.
for row in CSV.Rows("iris.csv", skipto=2)
    ODBC.execute!(insert_statement, row)
end
5.1.2 Bulk Insert
If you have bulk insert permissions you can write data without reading it into memory. For more info, see bulk insert documentation.
TODO: Test this. This code is not tested, since we do not have bulk load permissions. TODO: Explain the difference between these two options.
ODBC.execute!(
    dsn,
    """
    BULK INSERT iris
    FROM 'iris.csv';
    """,
)
ODBC.execute!(
    dsn,
    """
    INSERT INTO iris (Sepal_length, Sepal_width, Petal_length, Petal_width, Species)
    SELECT *
    FROM OPENROWSET (
        BULK 'iris.csv') AS b;
    """,
)
5.2 SAS
Downloading Data
Here we use a url filename to download the iris csv file from the internet, and preprocess it using a data step.
filename download url "https://raw.githubusercontent.com/uiuc-cse/data-fa14/gh-pages/data/iris.csv";
data iris;
infile download delimiter = "," firstobs=2;
input var1-var4 species $;
run;
Writing Data
Finally, we write this data to the database.
proc sql;
create table conn2.iris as (select * from work.iris);
quit;
5.3 R
5.3.1 Simple Table: Iris
The first example is a basic one to illustrate how to create and fill a table.
# drop table from database, if it exists
dbExecute(conn, 'drop table if exists iris;')
# make column names sql-compatible; can't have "." in a name
names(iris) <- c(
'SepalLength','SepalWidth',
'PetalLength','PetalWidth',
'Species'
)
# create new table from iris dataset
dbWriteTable(conn, name = "iris", value = iris)
# make sure table exists in database
dbListTables(conn, table_name = "iris")
# list fields in the new table
dbListFields(conn, name = "iris")
5.3.2 Alternative method for big data
The above method requires you to have the full dataset loaded into memory. But sometimes a data file is so large that it is inefficient to load everything at once and iterate through it to insert each row into a table.
The method used here instead loads one row of data into memory at a time, and writes it to the SQL table before loading the next row.
It’s a bit more complex because we can’t use the simple dbWriteTable anymore. Instead, we define the table with a create table statement via dbExecute. To keep the code readable, we suggest using the paste function and the same indentation you would use in a sql file.
For simplicity and to avoid repetition, this example doesn’t use the four related datasets again. For information on defining primary and foreign keys in a create table statement, see the w3schools examples for SQL primary keys on SQL Server and for foreign keys.
dbExecute(conn, 'drop table if exists iris;')
dbExecute(conn, paste(
"create table iris(",
"SepalLength decimal(5,2),",
"SepalWidth decimal(5,2),",
"PetalLength decimal(5,2),",
"PetalWidth decimal(5,2),",
"Species varchar(50)",
");"
))
f <- file("iris.csv", open = "r")
first = TRUE
# while the next line exists
while (length(oneLine <- readLines(f, n = 1)) > 0) {
    # this skips the header row. delete this if there's no header
    if (first) {
        first = FALSE
        next
    }
    # separate values
    myLine <- unlist(strsplit(oneLine, ","))
    # insert values
    # note: str_replace_all() comes from the stringr package
    insert_statement <- paste(
        "insert into iris(",
        "SepalLength,",
        "SepalWidth,",
        "PetalLength,",
        "PetalWidth,",
        "Species",
        ") values (",
        as.numeric(myLine[1]), ",",
        as.numeric(myLine[2]), ",",
        as.numeric(myLine[3]), ",",
        as.numeric(myLine[4]), ",'",
        str_replace_all(myLine[5], '"', ''), "');"
    )
    dbExecute(conn, insert_statement)
}
close(f)
5.4 Python
In python, there’s no pretty function to create a table from a file or pandas DataFrame like there is in R. Here, we rely exclusively on the cursor’s execute method and raw sql strings. To keep the code readable, we suggest using the .join method on an empty string or space (" ") and using the same indentation you would in a sql file.
Note that if you are using pandas, you can write data in the same way as shown here by iterating through the rows of the DataFrame as opposed to lines of a file.
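For instance, with a pandas DataFrame you can iterate over rows with itertuples and reuse the same parameterized insert. This is a minimal sketch, not the chapter's exact setup: the small DataFrame is illustrative, and the standard-library sqlite3 module stands in for your real connection so the example is self-contained (with a pyodbc cursor the calls are the same).

```python
import sqlite3
import pandas as pd

# illustrative DataFrame standing in for data read from iris.csv
df = pd.DataFrame({
    "SepalLength": [5.1, 4.9],
    "SepalWidth": [3.5, 3.0],
    "PetalLength": [1.4, 1.4],
    "PetalWidth": [0.2, 0.2],
    "Species": ["setosa", "setosa"],
})

# sqlite3 is used here only so the sketch runs on its own
conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute(" ".join([
    "create table iris(",
    "SepalLength decimal(5,2),",
    "SepalWidth decimal(5,2),",
    "PetalLength decimal(5,2),",
    "PetalWidth decimal(5,2),",
    "Species varchar(50)",
    ");"
]))

# iterate over DataFrame rows instead of file lines
insert = "insert into iris values(?, ?, ?, ?, ?)"
for row in df.itertuples(index=False):
    cursor.execute(insert, tuple(row))
conn.commit()
```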
5.4.1 Simple Table: Iris
The first example is a basic one to illustrate how to create and fill a table.
Load data
# load in data
with open('iris.csv') as file:
    iris_lines = file.readlines()[1:]
Drop table if it exists, and create a new table
# drop table if it already exists
cursor.execute("drop table if exists iris;")

# create new table
cursor.execute(" ".join([
    "create table iris(",
    "SepalLength decimal(5,2),",
    "SepalWidth decimal(5,2),",
    "PetalLength decimal(5,2),",
    "PetalWidth decimal(5,2),",
    "Species varchar(50)",
    ");"
]))
Insert values
To insert values into a sql table, we have two options.
This first strategy is beneficial because it doesn’t require you to load the full file into memory. This means that even if a dataset is very large, python can handle it. Here we only show using it to enter the first line of data, but this could be put inside of a for loop to iterate over all rows.
# single input: write one item to table
line = iris_lines[0].split(',')

cursor.execute(" ".join([
    "insert into iris(",
    "SepalLength,",
    "SepalWidth,",
    "PetalLength,",
    "PetalWidth,",
    "Species",
    ")",
    "values(?, ?, ?, ?, ?)"]),
    line[0],
    line[1],
    line[2],
    line[3],
    line[4].strip()
)
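Wrapped in a loop over the remaining lines, the single-row insert looks like this. The sketch below is self-contained so it can be run as-is: the short iris_lines list and the sqlite3 in-memory connection are stand-ins for the file lines and pyodbc cursor above; the loop body itself is what you would reuse.

```python
import sqlite3

# stand-in for the lines read from iris.csv (header already skipped)
iris_lines = [
    "5.1,3.5,1.4,0.2,setosa\n",
    "4.9,3.0,1.4,0.2,setosa\n",
    "6.3,3.3,6.0,2.5,virginica\n",
]

conn = sqlite3.connect(":memory:")
cursor = conn.cursor()
cursor.execute(
    "create table iris(SepalLength decimal(5,2), SepalWidth decimal(5,2),"
    " PetalLength decimal(5,2), PetalWidth decimal(5,2), Species varchar(50));"
)

# one execute per row: only one line is processed at a time
for raw in iris_lines:
    line = raw.split(',')
    cursor.execute(
        "insert into iris values(?, ?, ?, ?, ?)",
        (line[0], line[1], line[2], line[3], line[4].strip()),
    )
conn.commit()
```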
This second option inserts many rows of data into the table in one command. Using cursor.executemany, we can pass the same insert statement as above, followed by a list of lists of values, where each inner list is one row of data. This is nice because it is less code. However, we prefer the first option because it is more consistent, in that it will work for data tables of any size.
# bulk input: write many items to table
cursor.executemany(" ".join([
    "insert into iris(",
    "SepalLength,",
    "SepalWidth,",
    "PetalLength,",
    "PetalWidth,",
    "Species",
    ")",
    "values(?, ?, ?, ?, ?)"]),
    [line.split(',') for line in iris_lines[1:]]  # [1:] because observation 0 was inserted above
)