PostgreSQL
Created: 2015-08-11 12:17:29 -0700 Modified: 2023-12-17 16:20:55 -0800
Getting started
Section titled Getting startedI wanted to test PostgreSQL out. I did the following steps:
- Save this as compose.yml
- Run this command: docker compose -f ./compose.yml up
- Install pgcli for a command-line client with autocomplete
- Run this command to connect to the database: pgcli foo -p 5432 -h localhost -u postgres
Quick examples
Section titled Quick examples- Case-insensitive queries:
select * from command_names where name ilike ('Addcom');
select * from command_names where LOWER(name) = LOWER('Addcom');
Dumping the database
Section titled Dumping the databaseUse pg_dump
(reference). To get the location on macOS, do brew info libpq
, e.g. /opt/homebrew/Cellar/libpq/16.1_1/bin/pg_dump
(you can add that to your PATH if you want (reference)). Example usage:
pg_dump -d postgres://postgres:password@localhost/dbname > dump.sql
Restoring from a backup: psql -h localhost -p 5432 -U postgres -d foo < dump.sql
- If you already have the
foo
table:\c postgres
(to switch away fromfoo
so that you donât get âcannot drop the currently open databaseâ)drop database foo;
create database foo
Old notes (from like 2015 đ)
Section titled Old notes (from like 2015 đ)========================================================================================
Simple syntax examples
Create table: CREATE TABLE movie(id SERIAL PRIMARY KEY, name VARCHAR(100) NOT NULL);
Insert row: INSERT INTO movie (name) VALUES(âGood Will Huntingâ);
The â(name)â is specifying that youâre only inserting based on the name, and then the value provide is the name. If inserting datetimes, use this format: â1970-01-01 00:00:00â
Delete row: DELETE FROM movie WHERE id = 1;
Delete everything from a table: TRUNCATE users;
Modifying tables (reference) (adding/removing columns, constraints, renaming, etc.)
Update row: UPDATE accounts SET balance = balance + 100.00, something_else = 5 WHERE acctnum = 12345;
Create index: CREATE INDEX user_region_index ON users(region);
========================================================================================
Basics
PostgreSQL is a client/server database, so unless someone has created an embedded version by now, you would have to install the server on any target machine you use. This could be done with a script.
To stop PostgreSQL from starting up on Windows, go to services.msc and set the startup to manual (reference). You can also stop the service to make the ~6-8 related executables stop immediately (even when âpg_ctl stop -m fastâ does nothing).
Pronunciation: post gress Q L (I looked up the Wikipedia IPA!)
Connection strings are in this form:
postgres://username:password @localhost/database
The default username is âpostgresâ, and the default database created is also âpostgresâ, so your basic link ends up looking like this:
postgres://postgres:fakepassword @localhost/postgres
On Windows, Postgres binaries will attempt to run as your username if you donât specify anything, so you should probably create your user with âcreateuser.exeâ:
createuser -P -U postgres -s agd13_000
The â-Pâ is so that you can provide a password for the new user. If you donât do this, I believe youâll be forced to change the password via pgAdmin or âalter userâ, because it doesnât seem to allow a user with no password to connect. The password is hashed (and salted with the username or something) and saved directly into the database.
A hint for the password Iâm using is 0131 134 4e
The â-Uâ is to connect with your existing âpostgresâ user. If you donât know which user you already have made, use pgAdmin III to find out. Youâll need to type in the password for this user after typing the password for your new user.
The â-sâ is to make your new user a super user. The only privilege this does NOT give you is ââreplicationâ.
To create a database: âcreatedb testpgâ (note: this calls createdb.exe from the install location).
The tutorial that they mention has to be obtained through source. Do âselect version();â from psql to find your version, then go download the source here.
Never mind, donât follow the tutorial. Building on Windows is very tough.
A Schema contains tables, data types, functions, and operators, and is itself sort of a namespace (e.g. you can have two tables named âmytableâ as long as theyâre in different Schemas).
Aggregate functions are things like âmaxâ, âminâ, and âcountâ (reference). WHERE selects input rows before groups and aggregates are computed, and HAVING selects group rows after groups and aggregates are computed. Thatâs why WHERE clauses cannot contain aggregate functions. Likewise, HAVING clauses almost always contain aggregate functions.
PostgreSQL provides views (reference), but they are for organization, not for performance (reference).
Unlike MongoDB, a table appears as soon as you create it; you donât need to actually add rows.
PostgreSQL has reserved words (reference), so âcreate table user (id int);â will fail with âsyntax error at or near userâ.
If you have a table with defaults, you can insert into it without specifying any arguments by doing this:
CREATE TABLE made_up(has_default int DEFAULT 5);
INSERT INTO made_up DEFAULT VALUES;
There will now be a single row in made_up whose has_default is set to 5.
Table aliases:
SELECT u.id FROM users AS u WHERE u.id = 1;
Note: the âASâ is optional, so this is the same as SELECT u.id FROM users u WHERE u.id = 1;
Table aliases are only necessary when joining a table to itself (or in some subqueries), but theyâre convenient.
Column aliases:
SELECT id AS user_id, name AS user_name FROM users WHERE id = 1;
user_id | user_name
---------+-----------
1 | Adam
(1 row)
The PostgreSQL documentation suggests that you always use âASâ for column aliases (reference).
There is a lot of good information in the indexes chapter (reference). For example, you can make an index on (x,y), but then if you frequently do searches using operators on just âyâ, you wonât be able to make full use of the index. The query planner figures out whether to even use an index at all, as sometimes a sequential search will be faster.
Use partial indexes when you want to ignore common values (e.g. IP address access from â192.168..â) that account for more than a few percent of the total number of rows (reference).
You should use EXPLAIN/ANALYZE to figure out which indexes are actually being used (reference). You can add an index in realtime, but doing so is discouraged since it prevents writes by default (see CONCURRENTLY for how to still allow writes).
Always test your indexes with something like 1000 rows out of 100000 rows (of real world data, not just test data). Fetching 1 row out of 100 will probably ignore the index entirely.
You can SELECT âFORâ something, which is called a âlocking clauseâ (reference). E.g. âSELECT FOR UPDATEâ causes rows retrieved by SELECT to be locked as though for update. This prevents them from being locked, modified, or deleted by other transactions until the current transaction ends. It does NOT prevent querying, only writing/locking! Also, apparently it may not even ensure that concurrent updates/deletes donât happen with a selected row unless you actually update the row (reference).
Foreign and Primary keys
You can have multiple foreign keys in a table. For example, if you had a really naĂŻve friend list that could contain exactly three friends, you would do this:
CREATE TABLE users (
id int primary key,
friend1 int references users(id),
friend2 int references users(id),
friend3 int references users(id)
);
Also, your primary key or your foreign key could refer to a group of columns:
CREATE TABLE example (
a integer,
b integer,
c integer,
PRIMARY KEY (a, c)
);
Letâs say you delete a row in Table X that was acting as a foreign key in Table Y (reference). For example, you delete a Movie row despite that some people had that listed as their favorite movie in the movie_preferences table. You could mark the foreign key as âON DELETE RESTRICTâ which wouldnât let you delete the movie, or âON DELETE CASCADEâ which would delete the preference row in addition to deleting the movie. âNO ACTIONâ (which is the default) will simply raise an error when you delete the movie (in MySQL, âNO ACTIONâ and âRESTRICTâ are actually the same, not sure about PostgreSQL).
Example of NO ACTION:
testpg=# DELETE FROM movie WHERE id=1;
ERROR: update or delete on table âmovieâ violates foreign key constraint âprefs_movie_id_fkeyâ on table âprefsâ
DETAIL: Key (id)=(1) is still referenced from table âprefsâ.
========================================================================================
Transaction isolation
There are three phenomena that could be caused by concurrent transactions (and the Wikipedia article is GREAT for this; keep in mind that PostgreSQL uses Multiversion Concurrency Control):
- Dirty read: a transaction reads data written by a concurrent uncommitted transaction.
- This is never possible in PostgreSQL, as the Read-Uncommitted isolation level actually maps to Read-Committed, which guarantees that you wonât get dirty reads.
- Nonrepeatable read: a transaction re-reads data it has previously read and finds that data has been modified by another transaction (that committed since the initial read).
- With a little help from StackOverflow (reference), hereâs how this can happen:
- BEGIN; â this is the same as BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED, because thatâs what PostgreSQL defaults to.
- SELECT name FROM users WHERE id=1; â âSamâ
- â ⌠other stuff perhapsâŚ
- â At some point, another transaction commits a change to the âusersâ table.
- â We rerun the original query within this transaction:
- SELECT name FROM users WHERE id=1; â âSamanthaâ. Despite being the same query, a row was modified! This is a nonrepeatable read.
- COMMIT;
- With a little help from StackOverflow (reference), hereâs how this can happen:
- Phantom read: (note: going to paste a quote and the explanation after; the quote makes no sense when you first read it, but it is precise): âa transaction re-executes a query returning a set of rows that satisfy a search condition and finds that the set of rows satisfying the condition has changed due to another recently-committed transaction.â An explanation as to how this differs from non-repeatable reads: the rows youâre getting back will have the same values as before, but now different rows are being selected (because the concurrent transaction deleted/inserted rows). Hereâs how this can happen:
- BEGIN;
- SELECT year_of_earthquake FROM nat_disaster WHERE country=âUSâ; â Returns 1970, 1975, 1990
- â At this point, another transaction adds in a new row.
- â We rerun the original query within this transaction:
- SELECT year_of_earthquake FROM nat_disaster WHERE country=âUSâ; â Returns 1970, 1975, 1990, 2015. We got the same values as before but with an addition! This is a phantom read.
Hereâs a useless conclusion that I drew from the above: nonrepeatable reads and phantom reads can appear to be functionally identical in this case:
- Nonrepeatable read: âSamâ becomes âSamanthaâ
- Phantom read: delete âSamâ, add in âSamanthaâ with the same ID.
- (you would never do this in a real database)
Here are the âfourâ isolation levels for PostgreSQL (âread uncommittedâ is internally the same as âread committedâ, so there are really only three):
[TABLE]
Remember that Read Committed is on by default, which means that the same query performed multiple times in Transaction X can see changes, insertions, and deletions from concurrent transactions committed while Transaction X is running.
Note: you can test the below âproblemsâ by just running two instances of PSQL on your machine. You donât even need to be on two different user accounts. Letâs say you run Transaction 1 below and then run Transaction 2 when the comment says to. T2 wonât actually finish until T1 is committed. If you go into pgAdmin â> your database â> catalogs â> pg_catalog â> views â> pg_locks, youâll see that T1 acquires a RowExclusiveLock and two ExclusiveLocks. Then, T2 requests ExclusiveLock, ShareLock, and AccessExclusiveLock, and it obviously canât get them because T1 has at least one of them.
â Transaction 1
BEGIN;
UPDATE accounts SET balance = balance + 100.00 WHERE acctnum = 12345;
UPDATE accounts SET balance = balance - 100.00 WHERE acctnum = 7534; â Transaction 2 is now run.
COMMIT;
â Transaction 2
BEGIN;
DELETE FROM accounts WHERE balance = 100;
COMMIT;
With Read-Committed isolation (which is the default for transactions (reference)), Transaction 1 can be run however many times you want concurrently and youâll never run into cases where you create/destroy âmoneyâ as you would in typical threading scenarios. This is because each Read-Committed transaction only sees data committed before the âBEGIN;â statement; it never sees uncommitted data, and it never sees data committed by concurrent transactions.
========================================================================================
Good database practices
Every table should have a primary key (reference).
Use NOT NULL wherever you can (reference). Primary keys donât need to be marked as NOT NULL because thatâs implied. Foreign keys, however, can have NULL, which means itâs up to you to decide if you want to make it NOT NULL.
You should supply a name with constraints so that if they end up being violated, you can know which constraint it was (reference).
CREATE TABLE products (
product_no integer,
name text,
price numeric CONSTRAINT positive_price CHECK (price > 0)
);
OIDs should not be assumed to be unique. In large databases, they can wrap.
Always index columns that you plan on searching on (reference). Indexes take up disk space (which means updating a row in the table will update the index), which usually means it costs more real-life money to index a table if youâre on a cloud platform like AWS or Azure. So donât add indexes willy-nilly; identify where you need them and use them.
To make JOINs faster, make sure the columns that youâre joining on are indexed.
If you are expecting a unique row (e.g. get a user with id==571), then put âLIMIT 1â in your query to speed it up (reference).
SELECT id,name FROM users WHERE id = 1 LIMIT 1;
If youâre NOT getting a unique row, then this should not be done without âORDER BYâ, otherwise the results you get will be unpredictable.
Use the smallest data types possible for each column. In paticular, with PostgreSQL:
Character vs. varchar vs. text: there is no real performance difference among these three types in PostgreSQL (reference). âIn most situations text or varchar should be used instead [of character]â. Varchar(80) will NOT use 80 bytes (plus some overhead) when storing the string âAdamâ.
Performance tips:
- When doing lots of commits (e.g. to populate a database for the first time), issue a BEGIN, then do the INSERT commands, then issue a COMMIT (reference). This is better than just issues tons of INSERT commands. Even better is to just use COPY to read from a file to populate a database.
- Creating an index of a pre-existing table is faster than updating the index each time a row is added, so if youâre populating a table for the first time, remove the indexes first, then add them.
- Likewise, do this for foreign keys too.
========================================================================================
PSQL
âpsqlâ is the shell for PostgreSQL. You can specify a database as you start it up: âpsql testpgâ.
The prompt itself is very important. Youâll usually see something like âdbname=#â. The â#â means youâre a superuser (reference).
- A â=â means that there is currently no context in the shell; you are free to type any command.
- A â-â means that youâre typing in a command and havenât entered the semicolon yet. E.g. if you type âselect * from usersâ, you will see the prompt change to a hyphen.
- A â(â means that you have one or more open parentheses somewhere. This symbol will never change until you close all parentheses.
If you donât see capitalized text when you finish an operation (e.g. âCREATE TABLEâ when creating a table), then you should check your prompt! Odds are it shows a â-â because you havenât put a semicolon, or maybe you have a â(â there because you didnât close your parens.
Semicolons are not implicit like in JavaScript! If a command says to type âSELECT version();â then youâd better add the semicolon! This is because you may want a âCREATE TABLEâ command or something to span multiple rows to make it easier to type/read.
Key words are case-insensitive, although you should always capitalize them so that they stand out from other identifiers.
âlâ will list all databases.
âdtâ will list all tables in the current database. Remember: something became the current database because you connected to it, probably via âpsql dbnameâ when you started psql. You can switch databases with âconnect dbnameâ (reference).
========================================================================================
Install (reference)
This includes pgAdmin III, which you run just like any other program on Windows (Windows key â> type âpgadminâ).
I may not have HAD to do this, but I added âC:Program FilesPostgreSQL9.4binâ (or usr/local/pgsql/bin) to my PATH so that I could call âcreatedbâ from the command line.
========================================================================================
Node interface
Connection pool example:
var conString = âpostgres://postgres:fakepassword @localhost/postgresâ;
//this initializes a connection pool
//it will keep idle connections open for a (configurable) 30 seconds
//and set a limit of 20 (also configurable)
pg.connect(conString, function(err, client, done) {
if(err) {
return console.error(âerror fetching client from poolâ, err);
}
client.query(âSELECT $1::int AS numberâ, [â1â], function(err, result) {
//call done()
to release the client back to the pool
done();
if(err) {
return console.error(âerror running queryâ, err);
}
console.log(result.rows[0].number);
//output: 1
});
});
Explanation:
âpg.connectâ - if there is a ready client in the pool, it will use that. If not and the pool is not full, it will create a new connection. If the pool is full, it will wait for a Client to be available. Your callback will only be called when the Client is ready to be used (or if an error is encountered).
Rationale:
I looked through the code and documentation. Node-postgres is backed by generic-pool, and its acquire function says this (typos are theirs): âThe callback will be called, when a new client will be availabe, passing the client to it.â
Hereâs why this is important: this lets you have many different queries running at once without knowing whether theyâre actually connected to the server. For example, suppose you wanted to query the User table 1000 times. If you just made a new pg.Client and went at it, it would do them all sequentially. With a pool, you can process up to 20 (see pg.defaults) at a time. The default max_connections to a PostgreSQL server is 100, so you definitely want to pool connections on the client.
As shown in the connection-pool example, you can submit a query with parameterization (reference):
query(âselect name from emp where emp_id=$1â, [123]);
This protects against SQL injections, but itâs slower than a plain query. You can get some of the speed back with prepared queries:
query({
name: âemp_name saved queryâ,
text: âselect name from emp where emp_id=$1â,
values: [123]
});
Instead of cramming everything into the callback for query(), you can instead use events for ârowâ, âerrorâ and âendâ (reference):
var query = client.query(âSELECT name, age as user_age FROM usersâ);
query.on(ârowâ, function(row, result) {
result.addRow(row);
});
query.on(âerrorâ, function(error) {
//handle the error
});
query.on(âendâ, function(result) {
//fired once and only once, after the last row has been returned and after all ârowâ events are emitted
//in this example, the ârowsâ array now contains an ordered set of all the rows which we received from postgres
console.log(result.rowCount + â rows were receivedâ);
})
Those are the most important details of the Node interface.k
========================================================================================
pgAdmin III
To change your password, go to Server Groups â> Servers â> (your server) -=-> Login Roles â> click a name â> File â> Change PasswordâŚ
To view tables, you really have to drill down. Server Groups â> Servers â> (your server) â> Databases â> (your database) â> Schemas â> public â> Tables â> (your table) â> click the button in the top bar that looks like a table.
Thereâs also a table-with-filter icon that you can press, then type any filter like âtemp_lo < 45â. You can validate this filter before executing it.
========================================================================================
Comparison to MySQL
Section titled Comparison to MySQLEveryJuan: Adam13531, PostgreSQL has a single node system with a async replication, which would have a ~5sec lag, so not good for real time, MySQL can have sync replica nodes, or better yet, master-master systems through stuff like percona and galera, we use percona in production and love it, they have async alter statements, donât have to wait for migrations in production
Community edition of MySQL is free, as is Percona. Same license as PostgreSQL.