PostgreSQL
PostgreSQL (or Postgres) is a database management system, similar to MySQL. As the name implies, it uses standard SQL syntax for database interaction.
The VMs and developer localhosts store user data and vds data as Postgres databases. On the VMs, these are
userdb2006_1012 and
vds2006_1022, but they may be called something different on individual localhosts.
Postgres Quick Reference
At the top for accessibility. Commands assume default or most typical settings; keep reading if these don't work for you.
OS X does not run services like Linux distributions do, so you'll have to use the
pg_ctl
commands.
$ sudo -u postgres pg_ctl start -D /Library/PostgreSQL/9.4/data/
$ sudo -u postgres pg_ctl restart -D /Library/PostgreSQL/9.4/data/
$ sudo -u postgres pg_ctl reload -D /Library/PostgreSQL/9.4/data/
$ sudo -u postgres pg_ctl stop -D /Library/PostgreSQL/9.4/data/
As shown,
pg_ctl
commands require specifying the data directory. The default on OS X is
/Library/PostgreSQL/X.X/data/
though you can check by examining the output of
$ ps -ef | grep postgres
for the
-D
flag or running
SHOW data_directory;
from within
psql
.
Postgres Basics
Postgres operates on a hierarchy of
cluster >
database >
schema >
table
- A cluster is the grouping of individual databases associated with a single instance of the Postgres server. Each cluster is associated one-to-one with a server instance as well as a unique data directory in which the databases are stored, and each cluster is created with the default databases
template0
, template1
and postgres
. A typical Postgres installation (ours, for example) uses only one cluster.
- A database is an orderable collection of related data, used in the normal sense of the word.
- A schema is an organizational unit within a database that contains one or more tables along with built-in data types, functions, and operators. Schema are similar to directories in a file system. They are most useful for controlling user permissions for various cross-sections of the data stored in the database. More on schema
- A table is a named set of rows whose entries are sorted into named columns. Again, the normal sense of the word when talking about databases.
There are also views and sequences.
- A view is a shorthand name assigned to a particular query statement. It is useful if you find yourself making the same query repeatedly and don't want to keep typing it out every time. Views are similar to tables and can be manipulated in almost exactly the same way.
- A sequence is something.
Postgres creates an OS user (or system user) called
postgres
upon installation. In addition, Postgres itself has Postgres users called
roles with passwords, groups, permissions, etc. The Postgres installation defines a Postgres role called
postgres
as a Postgres "super user" role (distinct from the
OS super user and from the
OS user
postgres
!). The Postgres super user role is used to create new Postgres roles and perform other administrative tasks. So, after installation, you have
- An OS user
postgres
- A Postgres super user role
postgres
- A Postgres database
postgres
- The Postgres server program
postgres
and the terminal command postgres
used to invoke it
It can get confusing.
Every Postgres database cluster is uniquely associated with one instance of the
postgres
server process and one data directory where cluster data is stored. The data directory is indicated by the
-D
flag in shell commands and output.
Using Postgres
When you use a Postgres utility (like
psql
,
pg_ctl
or
postgres
) or otherwise connect to a database, Postgres must recognize you acting as a certain role. If you don't specify a Postgres role explicitly, the role is assumed to be one with the same name as the OS user attempting to make the connection. So, for example, you
might try to start an instance of the
postgres
server with the command
[Don't do this - keep reading]
$ postgres -D /data/directory/filepath/for/this/cluster
Here, Postgres looks for a role with the same name as your username. If that role doesn't exist or doesn't have permissions to start the server, you'll get an error. Typically, this is handled by always acting as the OS user
postgres
, which will be recognized as the Postgres super role
postgres
. In practice, this means using the
sudo -u postgres
command to impersonate the OS user
postgres
when interacting with Postgres:
[This is typically the correct way to do the above command]
$ sudo -u postgres postgres -D /data/directory/filepath/for/this/cluster
The first
postgres
is the
postgres
user, part of (
sudo -u postgres
), telling the shell that you want to act as the OS user
postgres
. The second is the invocation of the
postgres
server program. I warned you.
Postgres in Practice
Postgres logs
Postgres logs are, by default, in
/Library/PostgreSQL/9.4/data/pg_log/
(OS X 10.11 El Capitan)
Identifying Postgres server processes
In general, multiple Postgres server instances are possible, though a single instance is most typical. To see what, if any, Postgres processes are running, try
$ ps aux | grep postgres
If a Postgres server is running, there will probably be several processes. Look for an output line of the form
postgres 95 0.0 0.1 2641612 15640 ?? Ss 12Mar16 0:07.54 /Library/PostgreSQL/9.4/bin/postmaster -D/Library/PostgreSQL/9.4/data
postmaster
is a deprecated alias of
postgres
, the Postgres database server (so you might see
bin/postgres
instead of
bin/postmaster
, depending on your version). This line identifies the server instances and identifies the data directory associated with it (what follows the
-D
flag). Remember that individual server instances, clusters, and data directories are all uniquely grouped, so if there's another cluster it will have to have its own process line with its server instance and data directory.
Note that the first entry in the line identifies which OS user initiated the server process.
Checking the status of a Postgres server instance
Once you've identified a server instance and its associated data directory, you can use the
pg_ctl
command to administrate it. For example,
$ sudo -u postgres pg_ctl status -D /Library/PostgreSQL/9.4/data/
pg_ctl: server is running (PID: 95)
/Library/PostgreSQL/9.4/bin/postgres "-D/Library/PostgreSQL/9.4/data"
Process id files
When a Postgres server instance is started, it creates a "process id" file, or pid file (with
.pid
extension) in the data directory to store the OS process id number for easy reference. This is called
postmaster.pid
on my system (OS X El Cap), which is probably pretty standard. I imagine that it might be named
postgres.pid
on some systems. When the server instance is stopped, this file is deleted. So, checking for the existence of the
.pid
file is another way to see if a the server is running. If it isn't, and you issue a
postgres
,
pg_ctl
, or
psql
command that assumes it is, you may get an error that references the absence of this file. For example,
$ sudo -u postgres pg_ctl stop -m smart -D /Library/PostgreSQL/9.4/data/
pg_ctl: PID file "/Library/PostgreSQL/9.4/data/postmaster.pid" does not exist
Is server running?
Postgres versions in the e-Labs
The e-Lab code includes a technology called VDS (Virtual Data Systems), an invention of some of the early e-Lab authors. VDS places restrictions on the version of Postgres that can be installed relative to the JDBC Postgres driver.
JDBC
Java Database Connectivity (JDBC) is a standard API, part of Java SE. JDBC classes are contained in Java packages
java.sql and
javax.sql. The e-Lab class
DatabaseConnectionManager
depends on many of these package members when forming connections to the e-Lab Postgres databases.
JDBC is for general database utility. Working with Postgres in particular requires a Postgres-specific driver, found in the e-Lab code base as
common/lib/postgresql-9.4-1205.jdbc4.jar
Note from the filename that this driver is intended for use with Postgres 9.4. Technically, later versions of Postgres should work fine with this driver. VDS, however, explicitly checks to see that the working version of Postgres is less than or equal to that indicated by this JDBC driver. Thus, the e-Lab code breaks if you use a higher version of Postgres than found in the filename of the JDBC driver.
If you want to (or have to) user a newer version of Postgres, there are a couple of solutions to this:
- Check for an updated JDBC driver. As of Sept 2016, the most recent driver version is 9.4-1211, while the most recent Postgres version is 9.5.4, so there won't always be a most-up-to-date driver.
- Have Postgres report itself as a lower version. This was suggested by Mihael, but I'm not certain how to implement it (Joel)
- Fix VDS not to check, or to have a more flexible check. Also not sure how to implement this.
For the moment, it's best to keep Postgres 9.4 or lower.
pg_ctl
To make certain things slightly less cumbersome, Postgres includes a wrapper program called
pg_ctl
that can take care of several administrative tasks. This also must be invoked by the OS's Postgres user account (
sudo -u postgres
) and specify a data directory with the
-D
flag, i.e.,
$ sudo -u postgres pg_ctl <command> -D <data directory>
Some handy
pg_ctl
commands are
sudo -u postgres pg_ctl -w start -D /Library/PostgreSQL/9.4/data/
sudo -u postgres pg_ctl restart -D /Library/PostgreSQL/9.4/data/
sudo -u postgres pg_ctl reload -D /Library/PostgreSQL/9.4/data/
sudo -u postgres pg_ctl stop -D /Library/PostgreSQL/9.4/data/
(Postgres 9.4 on OS X 10.11 El Capitan; adjust for your machine as needed).
The
-w
flag on
pg_ctl start
tells the process to wait for the startup or shutdown to complete. I'm not entirely sure what this entails, but in my experience it works better to have the flag.
psql
Postgres includes an interactive terminal program called
psql
. Thus, there are typically two ways to manipulate Postgres databases (or clusters or schema) from the terminal window: directly through the command line, or from within
psql
.
To start
psql
from the command line,
$ psql <database name>
<database name>
is optional; if you don't specify one, then
psql
will default to a database with the name of your user account.
The
psql
prompt will look like
<database name>=>
or
<database name>=#
(the latter, using the octothorpe, indicates a Postgres super user, just as in
bash
).
To add a user in psql,
# CREATE USER username;
just like SQL.
Facts
The default installation path for Postgres in OS X (El Cap) is
/Library/PostgreSQL/9.4
, where
9.4
may vary according to your version.
Postgres creates the OS user
postgres
, as you can see from examining
$ dscl . -list /Users
or
$ cat /etc/passwd | grep postgres
before and after installation.
During installation, you will be prompted for a postgres user password.
Postgres includes a terminal-based front-end called
psql
, like a Postgres-specific shell. Enter it from the terminal using
$ sudo -u postgres /Library/PostgreSQL/9.4/bin/psql
Password: <see below>
psql (9.4.5)
Type "help" for help.
postgres=#
For the password, first enter your localhost
sudo
password, which will return the password prompt without any comment from the shell. It'll look like you've had a password error. Now enter your postgres user password, though, and you should enter
psql
. It took me forever to figure this out. If you legitimately mis-enter the password, you'll get a
Sorry, try again
error.
Once you've entered
psql
, the following commands are useful:
-
\list
or \l
lists all databases
-
\connect <db_name>
or \c <db_name>
connects to database db_name
-
\dt
lists all tables in the current database
-
\d+ <table_name>
shows all columns in the table table_name
-
\du
lists users
-
\q
exits psql
and returns you to the shell.
Standard SQL syntax is accepted for general database work.
How to do a few specific things:
- CREATE DATABASE <db_name>;
creates a new database. There are several options to this; see
http://www.postgresql.org/docs/current/static/sql-createdatabase.html
deletes a database entirely. Use with extreme care. Once you do this, a database can only be recovered from a backup, if you've made one.
If you want to change the postgres user password,
# ALTER USER postgres WITH PASSWORD '<newpassword>';
ALTER ROLE
There's also
# ALTER USER postgres WITH ENCRYPTED PASSWORD '<newpassword>';
ALTER ROLE
ENCRYPTED PASSWORD
stores the password in an encrypted state, but I'm not sure what the practical difference is here, otherwise.
A default data directory can be set via the environment variable
$PGDATA
.
Backups
PostgreSQL provides a utility program
pg_dump
for creating multiple forms of backup. These should be executed from the command line, not from within
psql
. You will probably need to prepend
sudo -u postgres
to them, as with everything else.
To create a backup:
-
pg_dump database_name_here > database.sql
-
pg_dump -Fc database_name_here > database.bak
-
pg_dump -Ft database_name_here > database.tar
The default form of dump with
.sql
extension is simply a plaintext listing of the SQL commands required to re-create the database. This output file will be readable, but large. The
.bak
form is a compressed binary, which will be unreadable but significantly smaller than the
.sql
form. The
.tar
form is your ordinary tarball.
Permissions can be troublesome here. Using
sudo -u postgres
to access
pg_dump
as the super-user role
postgres
means that the dump file will be written as the OS user
postgres
, which means that the OS user
postgres
must have write permissions to the destination directory. This user will not typically have write permission to your home directory, for example, so outputting the file there will throw an error. There are several potential workarounds, but my (Joel) preferred method is to output to
/tmp/
and then move to home:
sudo -u postgres pg_dump -Fc database_name_here > /tmp/database.bak ; sudo mv /tmp/database.bak ~/database.bak
(this may give a "=could not change directory to "/home/username": Permission denied=" error if you run it from your home directory. That doesn't keep it from working, though).
To restore to a DB that already exists:
-
pg_restore -Fc database.bak
-
pg_restore -Ft database.tar
If the existing database has a different name:
-
pg_restore -Fc -d dbname database.bak
-
pg_restore -Ft -d dbname database.tar
To create the DB during restoration:
-
pg_restore -C -d postgres database.bak
-
pg_restore -C -d postgres database.tar
as appropriate for the file extension in either case.
On the VMs
Postgres is installed in
/etc/postgresql/9.3/
The data directory is
/var/lib/postgresql/9.3/main
, though you'll need to become root (
$ sudo su
) to
cd
to it.
You'll also have to become root to run the backup commands given above.
quirks
1)
Home directory
You may run into a
Permission denied
error of the following form when using Postgres from your home directory:
jgriffi8@i2u2-db:~$ sudo -u postgres <some valid command>
could not change directory to "/home/jgriffi8": Permission denied
Tildes (
~
), which the shell uses to abbreviate your home directory, can flip Postgres's crap for some reason. To get around this,
cd
out of your home directory and try the command again.
2)
Server won't stop
Sometimes the server won't stop with a
pg_ctl stop
or
pg_ctl restart
command. For example,
$ sudo -u postgres pg_ctl restart -D /Library/PostgreSQL/9.4/data/
waiting for server to shut down............................................................... failed
pg_ctl: server does not shut down
HINT: The "-m fast" option immediately disconnects sessions rather than
waiting for session-initiated disconnection.
If that happens, try the following sequence in order (adjusting for your own data directory):
-
$ sudo -u postgres pg_ctl stop -D /Library/PostgreSQL/9.4/data/
-
$ sudo -u postgres pg_ctl stop -m s -D /Library/PostgreSQL/9.4/data/
-
$ sudo -u postgres pg_ctl stop -m f -D /Library/PostgreSQL/9.4/data/
-
$ sudo -u postgres pg_ctl stop -m i -D /Library/PostgreSQL/9.4/data/
The
-m
flag allows you to specify how
postgres
should stop the server. In increasing order of severity, the
s
option means "smart", the
f
option means "fast", and the
i
option means "immediate".
If an immediate stop doesn't work, something is probably really wrong. It's possible to
kill
the server process directly through the shell, but it's probably best to investigate what the error is and correct it, instead.
3)
passwords
To make it worse, there's high potential for password confusion. When issuing the above command from a blank-slate terminal, the first password requested is your OS
sudo
password so that the OS can verify that you're allowed to run commands as an OS user other than yourself (in this case, as OS user
postgres
). Once you've entered your
sudo
password and the command is executed, Postgres takes the connecting OS username and, by default, looks for a Postgres role with the same name. If found, it allows the connection as that role (in this case, Postgres role
postgres
, the default super-user role). At this point, Postgres authenticates the connecting user. If you set a Postgres password for
postgres
at installation, you'll need to enter it here. Then (and only then) will Postgres allow connection to the database.
This is further complicated by two facts:
- On most UNIX-based systems, entering your
sudo
password once establishes a time window, typically 10 or 15 minutes, where you can issue further sudo
commands without re-entering your sudo
password
- At least on my system (OS X El Cap), the prompts for the two passwords are identical (
Password:
)
So, unless you're paying strict attention, it isn't always evident which password is being requested! There's probably a solution to this involving altering one or both password prompts, but my strategy so far has been to just get used to it.
Rough notes on installation
On OS X 10.11 (El Capitan)
On Wed 14Oct2015 I downloaded the graphical installer from EnterpriseDB:
http://www.enterprisedb.com/products-services-training/pgdownload#osx
Installed PostgreSQL version 9.4.5 to /Library/PostgreSQL/9.4
(default setting)
Data Directory /Library/PostgreSQL/9.4/data
(default setting)
password for dtabase superuser (postgres) created and stored in pw manager
Port number the server should listen on: 5432
(default setting)
Locale to be used by the new database cluster [Default locate]
(default setting)
[I didn't know what this was. It presented a giant list of options that seemed to be constructed from country name abbreviations and ISO numbers]
Do you want the application "postgres" to accept incoming network connections?
Clicking Deny may limit the application's behavior. This setting can be changed in the Firewall pane of Security & Privacy preferences.
I clicked "Deny".
According to the README, it may be necessary to configure OS X to allow larger amounts of shared memory.
I created /etc/sysctl.conf to contain
kern.sysv.shmmax=1610612736
kern.sysv.shmall=393216
kern.sysv.shmmin=1
kern.sysv.shmmni=32
kern.sysv.shmseg=8
kern.maxprocperuid=512
kern.maxproc=2048
and rebooted.
I downloaded and installed pgAdminIII (
http://www.pgadmin.org/download/macosx.php) v1.20.0
It immediately identified PostgreSQL 9.4 (localhost:5432)
Entered password, and it connected! Basic functions seem to be correct.
-- Main.jgriffith - 2016-03-03