Postgresql kettastruktuurist
Allikas: Lambda
Enar Reilenti koostatud ülevaade
klaster - ühe serveri instance'i poolt hallatav failide kogum ühes masinas võib olla mitu klastrit ja serverit igal klastril oma conf (postgresql.conf, pg_hba.conf, pg_ident.conf) klastri juur on tavaliselt dir data (/var/lib/pgsql/data) - PGDATA keskkonnamuutuja Data dir'i sees: Item Description --------------------------------------------------------- * PG_VERSION A file containing the major version number of PostgreSQL * base Subdirectory containing per-database subdirectories * global Subdirectory containing cluster-wide tables, such as pg_database * pg_clog Subdirectory containing transaction commit status data * pg_multixact Subdirectory containing multitransaction status data (used for shared row locks) * pg_subtrans Subdirectory containing subtransaction status data * pg_tblspc Subdirectory containing symbolic links to tablespaces * pg_twophase Subdirectory containing state files for prepared transactions * pg_xlog Subdirectory containing WAL (Write Ahead Log) files * postmaster.opts A file recording the command-line options the server was last started with * postmaster.pid A lock file recording the current server PID and shared memory segment ID (not present after server shutdown iga (kasutaja) baasi kohta on data/base/ dir'is subdir, mille nimi on andmebaasi nimele vastav ID (OID) klastri globaalsest tabelist pg_database näide: =============================================================================== perf=# SELECT dаtnаme, oid FROM pg_dаtаbаse; dаtnаme | oid -----------+------- perf | 16556 templаte1 | 1 templаte0 | 16555 -------------------------------------------------------------------- $ cd ./bаse $ ls -l totаl 12 drwx------ 2 postgres pgаdmin 4096 Jаn 01 20:53 1 drwx------ 2 postgres pgаdmin 4096 Jаn 01 20:53 16555 drwx------ 3 postgres pgаdmin 4096 Jаn 01 22:38 16556 =============================================================================== iga tabeli ja indeksi jaoks on OID nimega dir'is oma fail - nimetatakse 'filenode' filenod'i nimi tuleb tabelist: pg_class.relfilenode enamasti langeb relfilenode kokku OID'iga, aga, ajutisel tabelil ei ole relfilenode'i, kirje pg_class tabelis on ja OID ka on TRUNCATE, REINDEX, CLUSTER ja ALTER TABLE võivad muuta filenode'i, OID ei muutu näiteks: =============================================================================== templаte1=# SELECT oid, relnаme FROM pg_clаss ORDER BY oid; oid | relnаme -------+--------------------------------- 1247 | pg_type 1249 | pg_аttribute 1255 | pg_proc 1259 | pg_clаss 1260 | pg_shаdow 1261 | pg_group 1262 | pg_dаtаbаse 16384 | pg_аttrdef =============================================================================== kui tabeli või indeksi faili suurus läheb üle 1GB, siis tehakse segmendid: esimene segment on algne filenode, teine filenoad.1, kolmas filenode.2 jne iga tabeli või indeksi fail jaguneb page'ideks näide: =============================================================================== perf=# SELECT relnаme, oid, relpаges, reltuples FROM pg_clаss relnаme | oid | reltuples | relpаges --------------+------+-----------+---------- pg_type | 1247 | 143 | 2 pg_аttribute | 1249 | 795 | 11 pg_proc | 1255 | 1263 | 31 pg_clаss | 1259 | 101 | 2 ------------------------------------------------------------- $ ls -l 1247 1249 -rw------- 1 postgres pgаdmin 16384 Jаn 01 20:53 1247 -rw------- 1 postgres pgаdmin 90112 Jаn 01 20:53 1249 16384/2 = 8192 90122/11 = 8192 eаch pаge is 8192 bytes - saab muuta serveri kompileerimise käigus =============================================================================== veel üks näide: =============================================================================== robots=# select relname, oid,relfilenode, relpages,reltuples from pg_class order by reltuples; relname | oid | relfilenode | relpages | reltuples ... strings | 16389 | 16389 | 4 | 560 strings_string_key | 16395 | 16395 | 6 | 560 pg_operator | 2617 | 2617 | 13 | 702 pg_operator_oprname_l_r_n_index | 2689 | 2689 | 11 | 702 pg_operator_oid_index | 2688 | 2688 | 4 | 702 pg_description_o_c_o_index | 2675 | 2675 | 12 | 1988 pg_description | 2609 | 2609 | 16 | 1988 pg_proc_oid_index | 2690 | 2690 | 7 | 2143 pg_proc | 1255 | 1255 | 50 | 2143 pg_proc_proname_args_nsp_index | 2691 | 2691 | 40 | 2143 pg_attribute_relid_attnam_index | 2658 | 2658 | 28 | 2255 pg_attribute_relid_attnum_index | 2659 | 2659 | 9 | 2255 pg_attribute | 1249 | 1249 | 38 | 2255 pg_depend_depender_index | 2673 | 2673 | 26 | 5267 pg_depend | 2608 | 2608 | 37 | 5267 pg_depend_reference_index | 2674 | 2674 | 28 | 5267 robodata_property_idx | 16441 | 16441 | 2256 | 786039 robodata_addtime_idx | 16442 | 16442 | 2167 | 786039 robodata_date_idx | 16443 | 16443 | 2821 | 786039 robodata_pkey | 16403 | 16403 | 1728 | 786039 robodata_subject_idx | 16440 | 16440 | 2258 | 786039 idx | 24782 | 24782 | 3034 | 787235 robodata_sender_receiver_property_subject_idx | 24787 | 24787 | 3034 | 787241 robodata | 16397 | 16397 | 10792 | 787241 ... robots=# \d robodata Table "public.robodata" Column | Type | Modifiers -----------+-----------------------------+------------------------------------------------------- id | integer | not null default nextval('robobase_id_seq'::regclass) sender | integer | not null receiver | integer | addtime | timestamp without time zone | not null default now() status | character(1) | not null default 'A'::bpchar subject | integer | not null property | integer | not null value | character varying(256) | not null valuetype | integer | not null rid | integer | date | timestamp without time zone | usectime | integer | datestamp | timestamp without time zone | usecstamp | integer | source | integer | context | integer | Indexes: "robodata_pkey" PRIMARY KEY, btree (id) "idx" btree (subject, property, addtime) "robodata_addtime_idx" btree (addtime) "robodata_date_idx" btree (date) "robodata_property_idx" btree (property) "robodata_sender_receiver_property_subject_idx" btree (sender, receiver, property, subject) "robodata_subject_idx" btree (subject) Foreign-key constraints: "robodata_context_fkey" FOREIGN KEY (context) REFERENCES strings(id) "robodata_property_fkey" FOREIGN KEY (property) REFERENCES strings(id) "robodata_receiver_fkey" FOREIGN KEY (receiver) REFERENCES strings(id) "robodata_sender_fkey" FOREIGN KEY (sender) REFERENCES strings(id) "robodata_source_fkey" FOREIGN KEY (source) REFERENCES strings(id) "robodata_subject_fkey" FOREIGN KEY (subject) REFERENCES strings(id) "robodata_valuetype_fkey" FOREIGN KEY (valuetype) REFERENCES strings(id) postgres@eliko4:~/8.3/main/base/16384$ ls -l 16397 -rw------- 1 postgres postgres 88408064 2009-04-14 09:34 16397 88408064 / 10792 = 8192 bati iga page 88408064 / 787241 = 112.3 baiti iga rea kohta keskmiselt =============================================================================== page'il on 5 osa: Item Description * PageHeaderData 24 bytes long. Contains general information about the page, including free space pointers. * ItemIdData Array of (offset,length) pairs pointing to the actual items. 4 bytes per item. * Free space The unallocated space. New item pointers are allocated from the start of this area, new items from the end. * Items The actual items themselves. * Special space Index access method specific data. Different methods store different data. Empty in ordinary tables. Items osa 'item' on data tabeli puhul row, index'i puhul node (entry) PageHeaderData hoiab pointereid (offset) Free space'i algusesse ja lõppu, Special space'ile; flage; viimase muutuse xlog andmeid (XLogRecPtr, TimeLineID); versiooni !: PostgreSQL 8.3 alates 4 PostgreSQL 8.1, 8.2 3 PostgreSQL 8.0 2 PostgreSQL 7.3, 7.4 1 varasemad 0 ItemIdData's on viidad itemitele, iga viit 4 baiti: 2 offset page'i algusest + 2 item'i pikkus. itemite hulka saab PageHeaderData'st välja rehkendada. kui item'it page'i sees ümber paigutatakse, siis viida asukoht (array's) jääb samaks! seega mistahes viidad item'ile (ItemPointer, also known as CTID) esitatakse andmebaasis nii: viita pageile ja item'i indeksit array's Special space kas puudub või hoitakse seal access meetodite spetsiifilist datat, näiteks b-puu indeksi vasaku ja parema naaber-page'i (sibling) viidad item jaguneb kaheks: muutuva pikkusega header ja data osa headeris on kindlasti: Field Length Description t_xmin 4 bytes insert XID stamp t_xmax 4 bytes delete XID stamp t_cid 4 bytes insert and/or delete CID stamp (overlays with t_xvac) t_xvac 4 bytes XID for VACUUM operation moving a row version t_ctid 6 bytes current TID of this or newer row version t_infomask2 2 bytes number of attributes, plus various flag bits t_infomask 2 bytes various flag bits t_hoff 1 byte offset to user data t_hoff must always be a multiple of the MAXALIGN distance for the platform lisaks optional headeri osad: null bitmap ja object id kui aligne'i pärast on vaja jätta tühja ruumi, jäetakse see null bitmapi ja object id vahele data osa interpreteerimiseks on vaja teiste (süsteemsete) tabelite abi: tulpade väärtused on järjest ühes baidijadas pg_attribute tabelist veerud attlen ja attalign määravad välja väärtuse asukoha baidijadas näide: ===================================================================== robots=# select count(*) from pg_attribute; count ------- 2262 (1 row) robots=# select * from pg_attribute where attname = 'subject' limit 7; attrelid | attname | atttypid | attstattarget | attlen | attnum | attndims | attcacheoff | atttypmod | attbyval | attstorage | attalign | attnotnull | atthasdef | attisdropped | attislocal | attinhcount ----------+---------+----------+---------------+--------+--------+----------+-------------+-----------+----------+------------+----------+------------+-----------+--------------+------------+------------- 16397 | subject | 23 | -1 | 4 | 6 | 0 | -1 | -1 | t | p | i | t | f | f | t | 0 16440 | subject | 23 | -1 | 4 | 1 | 0 | -1 | -1 | t | p | i | f | f | f | t | 0 24782 | subject | 23 | -1 | 4 | 1 | 0 | -1 | -1 | t | p | i | f | f | f | t | 0 24787 | subject | 23 | -1 | 4 | 4 | 0 | -1 | -1 | t | p | i | f | f | f | t | 0 (4 rows) robots=# robots=# select * from pg_attribute where attname = 'receiver' limit 7; attrelid | attname | atttypid | attstattarget | attlen | attnum | attndims | attcacheoff | atttypmod | attbyval | attstorage | attalign | attnotnull | atthasdef | attisdropped | attislocal | attinhcount ----------+----------+----------+---------------+--------+--------+----------+-------------+-----------+----------+------------+----------+------------+-----------+--------------+------------+------------- 16397 | receiver | 23 | -1 | 4 | 3 | 0 | -1 | -1 | t | p | i | f | f | f | t | 0 24787 | receiver | 23 | -1 | 4 | 2 | 0 | -1 | -1 | t | p | i | f | f | f | t | 0 (2 rows) ===================================================================== kõige parem olukord on siis, kui kõik väljad on fixed pikkusega ja null'e ei ole kui välja andmetüüp on muutuva pikkusega (attrlen=-1) jaguneb väli omakorda dataks ja päiseks päises on öeldud data pikkus ja TOAST info TOAST - The Oversized-Attribute Storage Technique or "the best thing since sliced bread" kuna ei lubata data row jagamist mitmesse page'i, siis tuleb suurte väärtuste puhul pakkida või jagada row osadeks ja panna teise tabelisse. TOAST'i saab teha ainult muutuva pikkusega väljadel (varlena) flag bit'id varlena välja päises näitavad, kas tegu on: tavalise dataga otse väljas pakitud dataga otse väljas viidaga (+muud andmed) TOAST'itud datale teises tabelis kui kasutaja tabelis on võimalikud suured väärtused, siis tehakse automaatselt selle tabeli TOAST tabel, mille oid kirjutatakse pg_class.reltoastrelid välja suured väärtused jagatakse tükkideks - 2000 baiti tavaliselt - 4 tükki mahub 1 page'i iga tükk läheb row'ks TOAST tabelis chunk_id ja chunk_seq väljad seovad (loogiliselt) tükid kokku päris tabeli row väärtuse kohal on TOAST tabeli oid ja chunk_id (+suurused) seda, kui suur peab välja väärtus olema, et TOAST rakenduks, saab konfida tabeli tulpadel võib olla 4 erinevat TOAST strateegiat: PLAIN - ei pakkimist, ei TOAST'i EXTENDED - algul üritatakse pakkida, kui ei piisa, siis TOAST'ida EXTERNAL - pakkimist ei ole lubatud (tekstiotsinguks) MAIN - pakkimine on lubatud, TOAST'i kasutatakse viimases hädas, kui muul viisil ei saa rida väikseks teha ALTER TABLE SET STORAGE - TOAST strateegia muutmiseks