You are here: Foswiki>ELabs Web>HowToGuide>UsingPostgres (2016-09-29, JoelG)Edit Attach

PostgreSQL

PostgreSQL (or Postgres) is a database management system, similar to MySQL. As the name implies, it uses standard SQL syntax for database interaction.

The VMs and developer localhosts store user data and vds data as Postgres databases. On the VMs, these are userdb2006_1012 and vds2006_1022, but they may be called something different on individual localhosts.

Postgres Quick Reference

At the top for accessibility. Commands assume default or most typical settings; keep reading if these don't work for you.

OS X does not run services like Linux distributions do, so you'll have to use the pg_ctl commands.

$ sudo -u postgres pg_ctl start -D /Library/PostgreSQL/9.4/data/
$ sudo -u postgres pg_ctl restart -D /Library/PostgreSQL/9.4/data/
$ sudo -u postgres pg_ctl reload -D /Library/PostgreSQL/9.4/data/
$ sudo -u postgres pg_ctl stop -D /Library/PostgreSQL/9.4/data/

As shown, pg_ctl commands require specifying the data directory. The default on OS X is

/Library/PostgreSQL/X.X/data/

though you can check by examining the output of

$ ps -ef | grep postgres

for the -D flag or running

SHOW data_directory;

from within psql.

Postgres Basics

Postgres operates on a hierarchy of cluster > database > schema > table

A cluster is the grouping of individual databases associated with a single instance of the Postgres server. Each cluster is associated one-to-one with a server instance as well as a unique data directory in which the databases are stored, and each cluster is created with the default databases template0, template1 and postgres. A typical Postgres installation (ours, for example) uses only one cluster.
A database is an orderable collection of related data, used in the normal sense of the word.
A schema is an organizational unit within a database that contains one or more tables along with built-in data types, functions, and operators. Schema are similar to directories in a file system. They are most useful for controlling user permissions for various cross-sections of the data stored in the database. More on schema
A table is a named set of rows whose entries are sorted into named columns. Again, the normal sense of the word when talking about databases.

There are also views and sequences.

A view is a shorthand name assigned to a particular query statement. It is useful if you find yourself making the same query repeatedly and don't want to keep typing it out every time. Views are similar to tables and can be manipulated in almost exactly the same way.
A sequence is something.

Postgres creates an OS user (or system user) called postgres upon installation. In addition, Postgres itself has Postgres users called roles with passwords, groups, permissions, etc. The Postgres installation defines a Postgres role called postgres as a Postgres "super user" role (distinct from the OS super user and from the OS user postgres!). The Postgres super user role is used to create new Postgres roles and perform other administrative tasks. So, after installation, you have

An OS user postgres
A Postgres super user role postgres
A Postgres database postgres
The Postgres server program postgres and the terminal command postgres used to invoke it

It can get confusing.

Every Postgres database cluster is uniquely associated with one instance of the postgres server process and one data directory where cluster data is stored. The data directory is indicated by the -D flag in shell commands and output.

Using Postgres

When you use a Postgres utility (like psql, pg_ctl or postgres) or otherwise connect to a database, Postgres must recognize you acting as a certain role. If you don't specify a Postgres role explicitly, the role is assumed to be one with the same name as the OS user attempting to make the connection. So, for example, you might try to start an instance of the postgres server with the command

[Don't do this - keep reading]  
$ postgres -D /data/directory/filepath/for/this/cluster

Here, Postgres looks for a role with the same name as your username. If that role doesn't exist or doesn't have permissions to start the server, you'll get an error. Typically, this is handled by always acting as the OS user postgres, which will be recognized as the Postgres super role postgres. In practice, this means using the sudo -u postgres command to impersonate the OS user postgres when interacting with Postgres:

[This is typically the correct way to do the above command]  
$ sudo -u postgres postgres -D /data/directory/filepath/for/this/cluster

The first postgres is the postgres user, part of (sudo -u postgres), telling the shell that you want to act as the OS user postgres. The second is the invocation of the postgres server program. I warned you.

Postgres in Practice

Postgres logs

Postgres logs are, by default, in

/Library/PostgreSQL/9.4/data/pg_log/

(OS X 10.11 El Capitan)

Identifying Postgres server processes

In general, multiple Postgres server instances are possible, though a single instance is most typical. To see what, if any, Postgres processes are running, try

$ ps aux | grep postgres

If a Postgres server is running, there will probably be several processes. Look for an output line of the form

postgres           95   0.0  0.1  2641612  15640   ??  Ss   12Mar16   0:07.54 /Library/PostgreSQL/9.4/bin/postmaster -D/Library/PostgreSQL/9.4/data

postmaster is a deprecated alias of postgres, the Postgres database server (so you might see bin/postgres instead of bin/postmaster, depending on your version). This line identifies the server instances and identifies the data directory associated with it (what follows the -D flag). Remember that individual server instances, clusters, and data directories are all uniquely grouped, so if there's another cluster it will have to have its own process line with its server instance and data directory.

Note that the first entry in the line identifies which OS user initiated the server process.

Checking the status of a Postgres server instance

Once you've identified a server instance and its associated data directory, you can use the pg_ctl command to administrate it. For example,

$ sudo -u postgres pg_ctl status -D /Library/PostgreSQL/9.4/data/
pg_ctl: server is running (PID: 95)
/Library/PostgreSQL/9.4/bin/postgres "-D/Library/PostgreSQL/9.4/data"

Process id files

When a Postgres server instance is started, it creates a "process id" file, or pid file (with .pid extension) in the data directory to store the OS process id number for easy reference. This is called postmaster.pid on my system (OS X El Cap), which is probably pretty standard. I imagine that it might be named postgres.pid on some systems. When the server instance is stopped, this file is deleted. So, checking for the existence of the .pid file is another way to see if a the server is running. If it isn't, and you issue a postgres, pg_ctl, or psql command that assumes it is, you may get an error that references the absence of this file. For example,

$ sudo -u postgres pg_ctl stop -m smart -D /Library/PostgreSQL/9.4/data/
pg_ctl: PID file "/Library/PostgreSQL/9.4/data/postmaster.pid" does not exist
Is server running?

Postgres versions in the e-Labs

The e-Lab code includes a technology called VDS (Virtual Data Systems), an invention of some of the early e-Lab authors. VDS places restrictions on the version of Postgres that can be installed relative to the JDBC Postgres driver.

JDBC

Java Database Connectivity (JDBC) is a standard API, part of Java SE. JDBC classes are contained in Java packages java.sql and javax.sql. The e-Lab class DatabaseConnectionManager depends on many of these package members when forming connections to the e-Lab Postgres databases.

JDBC is for general database utility. Working with Postgres in particular requires a Postgres-specific driver, found in the e-Lab code base as

common/lib/postgresql-9.4-1205.jdbc4.jar

Note from the filename that this driver is intended for use with Postgres 9.4. Technically, later versions of Postgres should work fine with this driver. VDS, however, explicitly checks to see that the working version of Postgres is less than or equal to that indicated by this JDBC driver. Thus, the e-Lab code breaks if you use a higher version of Postgres than found in the filename of the JDBC driver.

If you want to (or have to) user a newer version of Postgres, there are a couple of solutions to this:

Check for an updated JDBC driver. As of Sept 2016, the most recent driver version is 9.4-1211, while the most recent Postgres version is 9.5.4, so there won't always be a most-up-to-date driver.
Have Postgres report itself as a lower version. This was suggested by Mihael, but I'm not certain how to implement it (Joel)
Fix VDS not to check, or to have a more flexible check. Also not sure how to implement this.

For the moment, it's best to keep Postgres 9.4 or lower.

pg_ctl

To make certain things slightly less cumbersome, Postgres includes a wrapper program called pg_ctl that can take care of several administrative tasks. This also must be invoked by the OS's Postgres user account (sudo -u postgres) and specify a data directory with the -D flag, i.e.,

$ sudo -u postgres pg_ctl <command> -D <data directory>

Some handy pg_ctl commands are

sudo -u postgres pg_ctl -w start -D /Library/PostgreSQL/9.4/data/
sudo -u postgres pg_ctl restart -D /Library/PostgreSQL/9.4/data/
sudo -u postgres pg_ctl reload -D /Library/PostgreSQL/9.4/data/
sudo -u postgres pg_ctl stop -D /Library/PostgreSQL/9.4/data/

(Postgres 9.4 on OS X 10.11 El Capitan; adjust for your machine as needed).

The -w flag on pg_ctl start tells the process to wait for the startup or shutdown to complete. I'm not entirely sure what this entails, but in my experience it works better to have the flag.

psql

Postgres includes an interactive terminal program called psql. Thus, there are typically two ways to manipulate Postgres databases (or clusters or schema) from the terminal window: directly through the command line, or from within psql.

To start psql from the command line,

$ psql <database name>

<database name> is optional; if you don't specify one, then psql will default to a database with the name of your user account.

The psql prompt will look like

<database name>=>

<database name>=#

(the latter, using the octothorpe, indicates a Postgres super user, just as in bash).

To add a user in psql,

# CREATE USER username;

just like SQL.

Facts

The default installation path for Postgres in OS X (El Cap) is /Library/PostgreSQL/9.4, where 9.4 may vary according to your version.

Postgres creates the OS user postgres, as you can see from examining $ dscl . -list /Users or $ cat /etc/passwd | grep postgres before and after installation.

During installation, you will be prompted for a postgres user password.

Postgres includes a terminal-based front-end called psql, like a Postgres-specific shell. Enter it from the terminal using

$ sudo -u postgres /Library/PostgreSQL/9.4/bin/psql
Password: <see below>
psql (9.4.5)
Type "help" for help.

postgres=#

For the password, first enter your localhost sudo password, which will return the password prompt without any comment from the shell. It'll look like you've had a password error. Now enter your postgres user password, though, and you should enter psql. It took me forever to figure this out. If you legitimately mis-enter the password, you'll get a Sorry, try again error.

Once you've entered psql, the following commands are useful:

\list or \l lists all databases
\connect <db_name> or \c <db_name> connects to database db_name
\dt lists all tables in the current database
\d+ <table_name> shows all columns in the table table_name
\du lists users
\q exits psql and returns you to the shell.

Standard SQL syntax is accepted for general database work.

How to do a few specific things:

CREATE DATABASE <db_name>;

creates a new database. There are several options to this; see http://www.postgresql.org/docs/current/static/sql-createdatabase.html

DROP DATABASE <db_name>;

deletes a database entirely. Use with extreme care. Once you do this, a database can only be recovered from a backup, if you've made one.

If you want to change the postgres user password,

# ALTER USER postgres WITH PASSWORD '<newpassword>';
ALTER ROLE

There's also

# ALTER USER postgres WITH ENCRYPTED PASSWORD '<newpassword>';
ALTER ROLE

ENCRYPTED PASSWORD stores the password in an encrypted state, but I'm not sure what the practical difference is here, otherwise.

A default data directory can be set via the environment variable $PGDATA.

Backups

PostgreSQL provides a utility program pg_dump for creating multiple forms of backup. These should be executed from the command line, not from within psql. You will probably need to prepend sudo -u postgres to them, as with everything else.

To create a backup:

pg_dump database_name_here > database.sql
pg_dump -Fc database_name_here > database.bak
pg_dump -Ft database_name_here > database.tar

The default form of dump with .sql extension is simply a plaintext listing of the SQL commands required to re-create the database. This output file will be readable, but large. The .bak form is a compressed binary, which will be unreadable but significantly smaller than the .sql form. The .tar form is your ordinary tarball.

Permissions can be troublesome here. Using sudo -u postgres to access pg_dump as the super-user role postgres means that the dump file will be written as the OS user postgres, which means that the OS user postgres must have write permissions to the destination directory. This user will not typically have write permission to your home directory, for example, so outputting the file there will throw an error. There are several potential workarounds, but my (Joel) preferred method is to output to /tmp/ and then move to home:

sudo -u postgres pg_dump -Fc database_name_here > /tmp/database.bak ; sudo mv /tmp/database.bak ~/database.bak

(this may give a "=could not change directory to "/home/username": Permission denied=" error if you run it from your home directory. That doesn't keep it from working, though).

To restore to a DB that already exists:

pg_restore -Fc database.bak
pg_restore -Ft database.tar

If the existing database has a different name:

pg_restore -Fc -d dbname database.bak
pg_restore -Ft -d dbname database.tar

To create the DB during restoration:

pg_restore -C -d postgres database.bak
pg_restore -C -d postgres database.tar

as appropriate for the file extension in either case.

On the VMs

Postgres is installed in /etc/postgresql/9.3/

The data directory is /var/lib/postgresql/9.3/main, though you'll need to become root ($ sudo su) to cd to it.

You'll also have to become root to run the backup commands given above.

quirks

1) Home directory

You may run into a Permission denied error of the following form when using Postgres from your home directory:

jgriffi8@i2u2-db:~$ sudo -u postgres <some valid command>
could not change directory to "/home/jgriffi8": Permission denied

Tildes ( ~), which the shell uses to abbreviate your home directory, can flip Postgres's crap for some reason. To get around this, cd out of your home directory and try the command again.

2) Server won't stop

Sometimes the server won't stop with a pg_ctl stop or pg_ctl restart command. For example,

$ sudo -u postgres pg_ctl restart -D /Library/PostgreSQL/9.4/data/
waiting for server to shut down............................................................... failed
pg_ctl: server does not shut down
HINT: The "-m fast" option immediately disconnects sessions rather than
waiting for session-initiated disconnection.

If that happens, try the following sequence in order (adjusting for your own data directory):

$ sudo -u postgres pg_ctl stop -D /Library/PostgreSQL/9.4/data/

$ sudo -u postgres pg_ctl stop -m s -D /Library/PostgreSQL/9.4/data/

$ sudo -u postgres pg_ctl stop -m f -D /Library/PostgreSQL/9.4/data/

$ sudo -u postgres pg_ctl stop -m i -D /Library/PostgreSQL/9.4/data/

The -m flag allows you to specify how postgres should stop the server. In increasing order of severity, the s option means "smart", the f option means "fast", and the i option means "immediate".

If an immediate stop doesn't work, something is probably really wrong. It's possible to kill the server process directly through the shell, but it's probably best to investigate what the error is and correct it, instead.

3) passwords

To make it worse, there's high potential for password confusion. When issuing the above command from a blank-slate terminal, the first password requested is your OS sudo password so that the OS can verify that you're allowed to run commands as an OS user other than yourself (in this case, as OS user postgres). Once you've entered your sudo password and the command is executed, Postgres takes the connecting OS username and, by default, looks for a Postgres role with the same name. If found, it allows the connection as that role (in this case, Postgres role postgres, the default super-user role). At this point, Postgres authenticates the connecting user. If you set a Postgres password for postgres at installation, you'll need to enter it here. Then (and only then) will Postgres allow connection to the database.

This is further complicated by two facts:

On most UNIX-based systems, entering your sudo password once establishes a time window, typically 10 or 15 minutes, where you can issue further sudo commands without re-entering your sudo password
At least on my system (OS X El Cap), the prompts for the two passwords are identical (Password: )

So, unless you're paying strict attention, it isn't always evident which password is being requested! There's probably a solution to this involving altering one or both password prompts, but my strategy so far has been to just get used to it.

Rough notes on installation

On OS X 10.11 (El Capitan)

On Wed 14Oct2015 I downloaded the graphical installer from EnterpriseDB:

http://www.enterprisedb.com/products-services-training/pgdownload#osx

Installed PostgreSQL version 9.4.5 to /Library/PostgreSQL/9.4 (default setting)

Data Directory /Library/PostgreSQL/9.4/data (default setting)

password for dtabase superuser (postgres) created and stored in pw manager

Port number the server should listen on: 5432 (default setting)

Locale to be used by the new database cluster [Default locate] (default setting)

[I didn't know what this was. It presented a giant list of options that seemed to be constructed from country name abbreviations and ISO numbers]

Do you want the application "postgres" to accept incoming network connections?

Clicking Deny may limit the application's behavior. This setting can be changed in the Firewall pane of Security & Privacy preferences.

I clicked "Deny".

According to the README, it may be necessary to configure OS X to allow larger amounts of shared memory.

I created /etc/sysctl.conf to contain

    kern.sysv.shmmax=1610612736
    kern.sysv.shmall=393216
    kern.sysv.shmmin=1
    kern.sysv.shmmni=32
    kern.sysv.shmseg=8
    kern.maxprocperuid=512
    kern.maxproc=2048

and rebooted.

I downloaded and installed pgAdminIII (http://www.pgadmin.org/download/macosx.php) v1.20.0

It immediately identified PostgreSQL 9.4 (localhost:5432)
Entered password, and it connected! Basic functions seem to be correct.

-- Main.jgriffith - 2016-03-03

Topic revision: r21 - 2016-09-29, JoelG

ELabs

Webs
- ELabs
- Main
- Sandbox
- System

Create personal sidebar

Copyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding Foswiki? Send feedback