It’s pretty common that you want to insert/update data into a new table, but insert or update depending on whether the data already exists. Schematically, we want to do something like this:

In the case above, we want row 37 to be updated, and row 45 to be added. Just to complement that with a Venn diagram, here’s the situation:

Upserts in Redshift

Redshift doesn’t support upserts (updates + inserts at the same time) but it can be done using a few tricks:

1. Create a temporary staging table and delete entries in the destination table from it
2. Removing duplicates in the staging table

The temporary staging table strategy is described pretty well in this blog post. The idea is to insert into a separate table, then delete any overlapping rows (the intersection in the Venn diagram). We can do this using a series of SQL commands:

Not quite done yet! There’s a slightly annoying edge case where we have duplicates in the source table. We can delete those using the following SQL statement:

Compound keys

The solution extends pretty easily to compound keys, meaning when you want to use a combination of columns as the key to delete on. Let’s say your compound key consists of (id_1, id_2):

Python code

In case it’s useful, here’s some Python code that creates and executes the statements for you: