Postgresql kettastruktuurist
Allikas: Lambda
Enar Reilenti koostatud ülevaade
klaster - ühe serveri instance'i poolt hallatav failide kogum
ühes masinas võib olla mitu klastrit ja serverit
igal klastril oma conf (postgresql.conf, pg_hba.conf, pg_ident.conf)
klastri juur on tavaliselt dir data (/var/lib/pgsql/data) - PGDATA keskkonnamuutuja
Data dir'i sees:
Item Description
---------------------------------------------------------
* PG_VERSION A file containing the major version number of PostgreSQL
* base Subdirectory containing per-database subdirectories
* global Subdirectory containing cluster-wide tables, such as pg_database
* pg_clog Subdirectory containing transaction commit status data
* pg_multixact Subdirectory containing multitransaction status data (used for shared row locks)
* pg_subtrans Subdirectory containing subtransaction status data
* pg_tblspc Subdirectory containing symbolic links to tablespaces
* pg_twophase Subdirectory containing state files for prepared transactions
* pg_xlog Subdirectory containing WAL (Write Ahead Log) files
* postmaster.opts A file recording the command-line options the server was last started with
* postmaster.pid A lock file recording the current server PID and shared memory segment ID (not present after server shutdown
iga (kasutaja) baasi kohta on data/base/ dir'is subdir, mille nimi on andmebaasi nimele vastav ID (OID)
klastri globaalsest tabelist pg_database
näide:
===============================================================================
perf=# SELECT dаtnаme, oid FROM pg_dаtаbаse;
dаtnаme | oid
-----------+-------
perf | 16556
templаte1 | 1
templаte0 | 16555
--------------------------------------------------------------------
$ cd ./bаse
$ ls -l
totаl 12
drwx------ 2 postgres pgаdmin 4096 Jаn 01 20:53 1
drwx------ 2 postgres pgаdmin 4096 Jаn 01 20:53 16555
drwx------ 3 postgres pgаdmin 4096 Jаn 01 22:38 16556
===============================================================================
iga tabeli ja indeksi jaoks on OID nimega dir'is oma fail - nimetatakse 'filenode'
filenod'i nimi tuleb tabelist: pg_class.relfilenode
enamasti langeb relfilenode kokku OID'iga, aga, ajutisel tabelil ei ole relfilenode'i,
kirje pg_class tabelis on ja OID ka on
TRUNCATE, REINDEX, CLUSTER ja ALTER TABLE võivad muuta filenode'i, OID ei muutu
näiteks:
===============================================================================
templаte1=# SELECT oid, relnаme FROM pg_clаss ORDER BY oid;
oid | relnаme
-------+---------------------------------
1247 | pg_type
1249 | pg_аttribute
1255 | pg_proc
1259 | pg_clаss
1260 | pg_shаdow
1261 | pg_group
1262 | pg_dаtаbаse
16384 | pg_аttrdef
===============================================================================
kui tabeli või indeksi faili suurus läheb üle 1GB, siis tehakse segmendid:
esimene segment on algne filenode, teine filenoad.1, kolmas filenode.2 jne
iga tabeli või indeksi fail jaguneb page'ideks
näide:
===============================================================================
perf=# SELECT relnаme, oid, relpаges, reltuples FROM pg_clаss
relnаme | oid | reltuples | relpаges
--------------+------+-----------+----------
pg_type | 1247 | 143 | 2
pg_аttribute | 1249 | 795 | 11
pg_proc | 1255 | 1263 | 31
pg_clаss | 1259 | 101 | 2
-------------------------------------------------------------
$ ls -l 1247 1249
-rw------- 1 postgres pgаdmin 16384 Jаn 01 20:53 1247
-rw------- 1 postgres pgаdmin 90112 Jаn 01 20:53 1249
16384/2 = 8192
90122/11 = 8192
eаch pаge is 8192 bytes - saab muuta serveri kompileerimise käigus
===============================================================================
veel üks näide:
===============================================================================
robots=# select relname, oid,relfilenode, relpages,reltuples from pg_class order by reltuples;
relname | oid | relfilenode | relpages | reltuples
...
strings | 16389 | 16389 | 4 | 560
strings_string_key | 16395 | 16395 | 6 | 560
pg_operator | 2617 | 2617 | 13 | 702
pg_operator_oprname_l_r_n_index | 2689 | 2689 | 11 | 702
pg_operator_oid_index | 2688 | 2688 | 4 | 702
pg_description_o_c_o_index | 2675 | 2675 | 12 | 1988
pg_description | 2609 | 2609 | 16 | 1988
pg_proc_oid_index | 2690 | 2690 | 7 | 2143
pg_proc | 1255 | 1255 | 50 | 2143
pg_proc_proname_args_nsp_index | 2691 | 2691 | 40 | 2143
pg_attribute_relid_attnam_index | 2658 | 2658 | 28 | 2255
pg_attribute_relid_attnum_index | 2659 | 2659 | 9 | 2255
pg_attribute | 1249 | 1249 | 38 | 2255
pg_depend_depender_index | 2673 | 2673 | 26 | 5267
pg_depend | 2608 | 2608 | 37 | 5267
pg_depend_reference_index | 2674 | 2674 | 28 | 5267
robodata_property_idx | 16441 | 16441 | 2256 | 786039
robodata_addtime_idx | 16442 | 16442 | 2167 | 786039
robodata_date_idx | 16443 | 16443 | 2821 | 786039
robodata_pkey | 16403 | 16403 | 1728 | 786039
robodata_subject_idx | 16440 | 16440 | 2258 | 786039
idx | 24782 | 24782 | 3034 | 787235
robodata_sender_receiver_property_subject_idx | 24787 | 24787 | 3034 | 787241
robodata | 16397 | 16397 | 10792 | 787241
...
robots=# \d robodata
Table "public.robodata"
Column | Type | Modifiers
-----------+-----------------------------+-------------------------------------------------------
id | integer | not null default nextval('robobase_id_seq'::regclass)
sender | integer | not null
receiver | integer |
addtime | timestamp without time zone | not null default now()
status | character(1) | not null default 'A'::bpchar
subject | integer | not null
property | integer | not null
value | character varying(256) | not null
valuetype | integer | not null
rid | integer |
date | timestamp without time zone |
usectime | integer |
datestamp | timestamp without time zone |
usecstamp | integer |
source | integer |
context | integer |
Indexes:
"robodata_pkey" PRIMARY KEY, btree (id)
"idx" btree (subject, property, addtime)
"robodata_addtime_idx" btree (addtime)
"robodata_date_idx" btree (date)
"robodata_property_idx" btree (property)
"robodata_sender_receiver_property_subject_idx" btree (sender, receiver, property, subject)
"robodata_subject_idx" btree (subject)
Foreign-key constraints:
"robodata_context_fkey" FOREIGN KEY (context) REFERENCES strings(id)
"robodata_property_fkey" FOREIGN KEY (property) REFERENCES strings(id)
"robodata_receiver_fkey" FOREIGN KEY (receiver) REFERENCES strings(id)
"robodata_sender_fkey" FOREIGN KEY (sender) REFERENCES strings(id)
"robodata_source_fkey" FOREIGN KEY (source) REFERENCES strings(id)
"robodata_subject_fkey" FOREIGN KEY (subject) REFERENCES strings(id)
"robodata_valuetype_fkey" FOREIGN KEY (valuetype) REFERENCES strings(id)
postgres@eliko4:~/8.3/main/base/16384$ ls -l 16397
-rw------- 1 postgres postgres 88408064 2009-04-14 09:34 16397
88408064 / 10792 = 8192 bati iga page
88408064 / 787241 = 112.3 baiti iga rea kohta keskmiselt
===============================================================================
page'il on 5 osa:
Item Description
* PageHeaderData 24 bytes long. Contains general information about the page, including free space pointers.
* ItemIdData Array of (offset,length) pairs pointing to the actual items. 4 bytes per item.
* Free space The unallocated space. New item pointers are allocated from the start of this area, new items from the end.
* Items The actual items themselves.
* Special space Index access method specific data. Different methods store different data. Empty in ordinary tables.
Items osa 'item' on data tabeli puhul row, index'i puhul node (entry)
PageHeaderData hoiab pointereid (offset) Free space'i algusesse ja lõppu, Special space'ile;
flage;
viimase muutuse xlog andmeid (XLogRecPtr, TimeLineID);
versiooni !:
PostgreSQL 8.3 alates 4
PostgreSQL 8.1, 8.2 3
PostgreSQL 8.0 2
PostgreSQL 7.3, 7.4 1
varasemad 0
ItemIdData's on viidad itemitele, iga viit 4 baiti: 2 offset page'i algusest + 2 item'i pikkus.
itemite hulka saab PageHeaderData'st välja rehkendada.
kui item'it page'i sees ümber paigutatakse, siis viida asukoht (array's) jääb samaks!
seega mistahes viidad item'ile (ItemPointer, also known as CTID) esitatakse andmebaasis nii:
viita pageile ja item'i indeksit array's
Special space kas puudub või hoitakse seal access meetodite spetsiifilist datat,
näiteks b-puu indeksi vasaku ja parema naaber-page'i (sibling) viidad
item jaguneb kaheks: muutuva pikkusega header ja data osa
headeris on kindlasti:
Field Length Description
t_xmin 4 bytes insert XID stamp
t_xmax 4 bytes delete XID stamp
t_cid 4 bytes insert and/or delete CID stamp (overlays with t_xvac)
t_xvac 4 bytes XID for VACUUM operation moving a row version
t_ctid 6 bytes current TID of this or newer row version
t_infomask2 2 bytes number of attributes, plus various flag bits
t_infomask 2 bytes various flag bits
t_hoff 1 byte offset to user data
t_hoff must always be a multiple of the MAXALIGN distance for the platform
lisaks optional headeri osad: null bitmap ja object id
kui aligne'i pärast on vaja jätta tühja ruumi, jäetakse see null bitmapi ja object id vahele
data osa interpreteerimiseks on vaja teiste (süsteemsete) tabelite abi:
tulpade väärtused on järjest ühes baidijadas
pg_attribute tabelist veerud attlen ja attalign määravad välja väärtuse asukoha baidijadas
näide:
=====================================================================
robots=# select count(*) from pg_attribute;
count
-------
2262
(1 row)
robots=# select * from pg_attribute where attname = 'subject' limit 7;
attrelid | attname | atttypid | attstattarget | attlen | attnum | attndims | attcacheoff | atttypmod | attbyval | attstorage | attalign | attnotnull | atthasdef | attisdropped | attislocal | attinhcount
----------+---------+----------+---------------+--------+--------+----------+-------------+-----------+----------+------------+----------+------------+-----------+--------------+------------+-------------
16397 | subject | 23 | -1 | 4 | 6 | 0 | -1 | -1 | t | p | i | t | f | f | t | 0
16440 | subject | 23 | -1 | 4 | 1 | 0 | -1 | -1 | t | p | i | f | f | f | t | 0
24782 | subject | 23 | -1 | 4 | 1 | 0 | -1 | -1 | t | p | i | f | f | f | t | 0
24787 | subject | 23 | -1 | 4 | 4 | 0 | -1 | -1 | t | p | i | f | f | f | t | 0
(4 rows)
robots=#
robots=# select * from pg_attribute where attname = 'receiver' limit 7;
attrelid | attname | atttypid | attstattarget | attlen | attnum | attndims | attcacheoff | atttypmod | attbyval | attstorage | attalign | attnotnull | atthasdef | attisdropped | attislocal | attinhcount
----------+----------+----------+---------------+--------+--------+----------+-------------+-----------+----------+------------+----------+------------+-----------+--------------+------------+-------------
16397 | receiver | 23 | -1 | 4 | 3 | 0 | -1 | -1 | t | p | i | f | f | f | t | 0
24787 | receiver | 23 | -1 | 4 | 2 | 0 | -1 | -1 | t | p | i | f | f | f | t | 0
(2 rows)
=====================================================================
kõige parem olukord on siis, kui kõik väljad on fixed pikkusega ja null'e ei ole
kui välja andmetüüp on muutuva pikkusega (attrlen=-1) jaguneb väli omakorda dataks ja päiseks
päises on öeldud data pikkus ja TOAST info
TOAST - The Oversized-Attribute Storage Technique or "the best thing since sliced bread"
kuna ei lubata data row jagamist mitmesse page'i, siis tuleb suurte väärtuste puhul pakkida
või jagada row osadeks ja panna teise tabelisse.
TOAST'i saab teha ainult muutuva pikkusega väljadel (varlena)
flag bit'id varlena välja päises näitavad, kas tegu on:
tavalise dataga otse väljas
pakitud dataga otse väljas
viidaga (+muud andmed) TOAST'itud datale teises tabelis
kui kasutaja tabelis on võimalikud suured väärtused, siis tehakse automaatselt
selle tabeli TOAST tabel, mille oid kirjutatakse pg_class.reltoastrelid välja
suured väärtused jagatakse tükkideks - 2000 baiti tavaliselt - 4 tükki mahub 1 page'i
iga tükk läheb row'ks TOAST tabelis
chunk_id ja chunk_seq väljad seovad (loogiliselt) tükid kokku
päris tabeli row väärtuse kohal on TOAST tabeli oid ja chunk_id (+suurused)
seda, kui suur peab välja väärtus olema, et TOAST rakenduks, saab konfida
tabeli tulpadel võib olla 4 erinevat TOAST strateegiat:
PLAIN - ei pakkimist, ei TOAST'i
EXTENDED - algul üritatakse pakkida, kui ei piisa, siis TOAST'ida
EXTERNAL - pakkimist ei ole lubatud (tekstiotsinguks)
MAIN - pakkimine on lubatud, TOAST'i kasutatakse viimases hädas, kui muul viisil ei saa rida väikseks teha
ALTER TABLE SET STORAGE - TOAST strateegia muutmiseks