#rand Selecting a 'sample' of a table's rows
Create a table that contains 1000 rows — the numbers 1 to 1000:

CREATE TABLE numbers (number int);
INSERT INTO numbers (number)
SELECT id FROM generate_series(1,1000) AS id;

Postgres 9.5 introduced a new TABLESAMPLE clause that lets you sample tables in different ways (two ways by default, but more can be added via extensions). Let's give it a go at retrieving a random 0.5% of the rows from our table:

SELECT * FROM numbers TABLESAMPLE BERNOULLI (0.5);

 number
--------
    101
    213
    278
    433

BERNOULLI here selects each row of the table with a 0.5% chance (so statistically we'd get 5 rows on average), but there's also a more efficient SYSTEM method that uses a cruder block-based technique (with the possible downside that it tends to return ranges/groups of rows within the table).

You can also use TABLESAMPLE with other SQL statements like UPDATE or DELETE if, say, you wanted to delete 30% of a table's rows for whatever reason.
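For example (sketches against the numbers table above; note that TABLESAMPLE itself is only allowed in a FROM clause, so DELETE reaches it through a subselect):

SELECT * FROM numbers TABLESAMPLE SYSTEM (0.5); -- block-based: faster, clumpier

-- delete roughly 30% of the table's rows
DELETE FROM numbers
WHERE number IN (SELECT number FROM numbers TABLESAMPLE BERNOULLI (30));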
#json
# “contains” / “is contained by” (subset tests)
SELECT * FROM json_test WHERE data @> '{"a":1}'; -- data contains {"a":1}
SELECT * FROM json_test WHERE data <@ '{"a":1}'; -- data is a subset of {"a":1}
# test for the existence of a key
SELECT * FROM json_test WHERE data ? 'a';
# test for objects that have any of a list of keys:
SELECT * FROM json_test WHERE data ?| array['a', 'b'];
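These queries assume a small table with a jsonb column named data, e.g.:

CREATE TABLE json_test (data jsonb);
INSERT INTO json_test (data) VALUES
  ('{"a":1}'),
  ('{"a":1,"b":2}'),
  ('{"c":3}');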
#array
In Postgres syntax, the part '{1,2,3}' is a plain string, :: is the Postgres typecast operator, and int[] is the type (an array of integers). The expression '{1,2,3}'::int[] reads as “typecast the string '{1,2,3}' into an array of integers”. The string notation of curly brackets and comma-separated values is the array literal syntax recognized by Postgres.
test=# SELECT '{1,2,3}'::int[] ; -- an array of integers
int4
---------
{1,2,3}
(1 row)
test=# SELECT '{"hello","world"}'::text[] ; -- an array of strings
text
---------------
{hello,world}
(1 row)

#array
test=# SELECT array[1,2,3];
array
---------
{1,2,3}
(1 row)
test=# SELECT array['hello', 'world'];
array
---------------
{hello,world}
(1 row)
test=# SELECT array[1, 'world'];
ERROR: invalid input syntax for integer: "world"
LINE 1: SELECT array[1, 'world'];
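Array elements must all resolve to a single type; an explicit cast fixes the mix:

test=# SELECT array[1::text, 'world'];
   array
-----------
 {1,world}
(1 row)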
#array Here is how you can have an array as a column type in a table:
test=# CREATE TABLE posts (
test(# title text NOT NULL PRIMARY KEY,
test(# tags text[] NOT NULL DEFAULT '{}'::text[]
test(# );
CREATE TABLE
test=# INSERT INTO posts (title, tags) VALUES
test-# ('Using PostgreSQL Arrays with Golang', '{"postgres","golang"}');
INSERT 0 1
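A quick usage sketch: = ANY compares a value against each element of an array, so this finds posts tagged "golang" (for large tables, the containment test tags @> '{golang}' can use a GIN index, unlike = ANY):

test=# SELECT title FROM posts WHERE 'golang' = ANY (tags);
                title
-------------------------------------
 Using PostgreSQL Arrays with Golang
(1 row)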
#index #bloom
Let us create a 1,000,000 row, 57 MB table with 5 random integer columns:

CREATE TABLE foo(id SERIAL, c1 INT, c2 INT, c3 INT, c4 INT, c5 INT,
                 it TIMESTAMP NOT NULL DEFAULT NOW());
INSERT INTO foo(c1, c2, c3, c4, c5)
SELECT random() * 1000000, random() * 1000000, random() * 1000000,
       random() * 1000000, random() * 1000000
FROM generate_series(1, 1000000);

The following query takes 110.2 ms:

SELECT * FROM foo WHERE c1 = 171267 AND c4 = 65555;
-- Seq Scan on foo
--   Filter: ((c1 = 171267) AND (c4 = 65555))
--   Rows Removed by Filter: 999999

After creating a 13 MB Bloom index on the five columns (bloom is a contrib extension shipped with Postgres, so it has to be enabled first), the same query takes 9.3 ms:

CREATE EXTENSION IF NOT EXISTS bloom;
CREATE INDEX foo_c1c2c3c4c5
ON foo USING bloom(c1, c2, c3, c4, c5)
WITH (length=64, col1=3, col2=3, col3=3, col4=3, col5=3);

SELECT * FROM foo WHERE c1 = 171267 AND c4 = 65555;
-- Bitmap Heap Scan on foo
--   Recheck Cond: ((c1 = 171267) AND (c4 = 65555))
--   Rows Removed by Index Recheck: 232
--   Heap Blocks: exact=232
--   -> Bitmap Index Scan on foo_c1c2c3c4c5
--        Index Cond: ((c1 = 171267) AND (c4 = 65555))

The index provides a 10 to 20× speedup on these queries on my SSD laptop.
#date
date_trunc truncates timestamps to a given level of precision, such as a specific hour, second, day, week, year, and so on.
CREATE TABLE orders (created_at timestamp);
INSERT INTO orders VALUES
('2019-01-01 04:05:33'),
('2019-01-01 09:05:51'),
('2019-01-03 02:55:14'),
('2019-01-08 14:12:07');
SELECT date_trunc('day', created_at) AS day,
COUNT(created_at) FROM orders
GROUP BY day
ORDER BY day;
         day         | count
---------------------+-------
 2019-01-01 00:00:00 |     2
 2019-01-03 00:00:00 |     1
 2019-01-08 00:00:00 |     1
#generate
CREATE TABLE electrothingy
AS
SELECT
x::int AS id,
(x::int % 20000)::int AS locid, -- fake location ids in the range 0-19999
now() AS tsin, -- static timestamp
97.5::numeric(5,2) AS temp, -- static temp
x::int AS usage -- usage is the same as id; not sure what we want here
FROM generate_series(1,1728000000) -- for 1.7 billion rows
AS gs(x);
#index
Combined indexes vs Separate indexes
A “composite index” (also known as a “concatenated index”) is an index on multiple columns of a table. Which is more beneficial: separate indexes, or one composite index? A sketch of the two options follows below.
https://www.cybertec-postgresql.com/en/combined-indexes-vs-separate-indexes-in-postgresql/
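A minimal sketch (the sales table and its columns are made up for illustration):

-- option 1: two separate single-column indexes
CREATE INDEX ON sales (customer_id);
CREATE INDEX ON sales (created_at);

-- option 2: one combined index covering both columns
CREATE INDEX ON sales (customer_id, created_at);

-- the combined index answers this with a single index scan, and (because
-- customer_id is its leading column) it also serves filters on customer_id alone:
SELECT * FROM sales WHERE customer_id = 42 AND created_at >= '2019-01-01';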
pgstattuple is a built-in Postgres extension that provides a variety of functions for looking at statistics of various objects within your database. One use is for checking how many dead rows (often caused when data is deleted from a table) there are or how much free space there is within a table.
Let's create a table with 1000 rows of data and see what info pgstattuple can give us:
CREATE TABLE numbers (id int);
INSERT INTO numbers SELECT * FROM generate_series(1,1000);
CREATE EXTENSION pgstattuple;
SELECT * FROM public.pgstattuple('numbers');
DELETE FROM numbers WHERE id < 500;
SELECT * FROM public.pgstattuple('numbers');
Now we get a dead_tuple_count of 499 — the number of rows we deleted! But the overall table_len is the same, because those dead rows still take up space. This can be resolved by running: VACUUM numbers;
Re-running pgstattuple at this stage will show no dead rows, but the free_space will jump up, with table_len remaining the same! The reason for this is that VACUUM cleans up dead rows and turns their space into free space for new rows to reuse; it does not shrink the table file itself.
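Continuing the session above:

VACUUM numbers;
SELECT * FROM public.pgstattuple('numbers');
-- dead_tuple_count is back to 0, free_space has jumped up, table_len is unchanged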
#locks
-- blocks all writes
CREATE INDEX items_value_idx ON items USING GIN (value jsonb_path_ops);
-- only blocks other DDL
CREATE INDEX CONCURRENTLY items_value_idx ON items USING GIN (value jsonb_path_ops);
#locks
-- risky: if this ALTER waits on a lock, every query behind it queues up too
ALTER TABLE items ADD COLUMN last_update timestamptz;
-- safer: give up after 2 seconds instead of waiting (and blocking) indefinitely
SET lock_timeout TO '2s';
ALTER TABLE items ADD COLUMN last_update timestamptz;
#locks
-- blocks queries for a long time
ALTER TABLE items ADD PRIMARY KEY (id);
-- takes a long time, but doesn’t block queries
CREATE UNIQUE INDEX CONCURRENTLY items_pk ON items (id);
-- blocks queries, but only very briefly
ALTER TABLE items ADD CONSTRAINT items_pk PRIMARY KEY USING INDEX items_pk;
#locks
Add a new column (safe)
Drop a column (safe)
Add an index concurrently (safe)
Drop a constraint (safe)
Add a default value to an existing column (safe)
Add an index (unsafe)
Add a column with a default (unsafe if PostgreSQL < 11; see the sketch after this list)
Add a column that is non-nullable (unsafe if PostgreSQL < 11)
Add a column with a unique constraint (unsafe)
Change the type of a column (unsafe)
VACUUM FULL (unsafe)
ALTER TABLE SET TABLESPACE (unsafe)
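On PostgreSQL < 11, the usual safe alternative to adding a column with a default is to split the operation up (a sketch; the items table and flag column are illustrative):

-- unsafe pre-11: rewrites the whole table while holding an exclusive lock
-- ALTER TABLE items ADD COLUMN flag boolean NOT NULL DEFAULT false;

-- safe: add the column with no default (a metadata-only change),
ALTER TABLE items ADD COLUMN flag boolean;
-- set the default, which applies to newly inserted rows only,
ALTER TABLE items ALTER COLUMN flag SET DEFAULT false;
-- then backfill existing rows in batches to keep each lock short:
UPDATE items SET flag = false
WHERE id IN (SELECT id FROM items WHERE flag IS NULL LIMIT 10000);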
#mvcc
A series of posts on PostgreSQL MVCC internals from the Postgres Professional blog (in Russian):
https://habr.com/ru/company/postgrespro/blog/442804/
https://habr.com/ru/company/postgrespro/blog/444536/
https://habr.com/ru/company/postgrespro/blog/445820/
https://habr.com/ru/company/postgrespro/blog/446652/
https://habr.com/ru/company/postgrespro/blog/449704/
https://habr.com/ru/company/postgrespro/blog/452320/
https://habr.com/ru/company/postgrespro/blog/452762/
#query
At the most basic level, DISTINCT is a keyword you can use in SELECT queries to eliminate duplicate rows from the result. For example:

SELECT DISTINCT city FROM employees;

This simple example is basically a nicer-looking equivalent to:

SELECT city FROM employees GROUP BY city;