MATCH_RECOGNIZE (short intro) Alex S, http://theparticle.com/ alex@theparticle.com ref: https://trino.io/docs/current/sql/match-recognize.html The MATCH_RECOGNIZE is similar to windowing-functions, in that it operates on a window of records. It provides for regex matching on sequence of records! In traditional string regular-expressions, suppose we wanted to detect an "A" followed by one-or-more "B"s, followed by a single "C". The regex would be "AB+C", and we'd write something like this with sample_strings as ( select 'AAAAAABBBBBCCCCC' as s union all select 'BBBBBCCCCC' union all select 'AAAAAABBBBB' union all select 'AAAAAACCCCC' union all select 'ABCCCCC' union all select 'AAAAAABBBBBC' union all select 'ABC' ) select s, regexp_match(s, 'AB+C') mtched from sample_strings where regexp_like(s, 'AB+C'); output: s | mtched ------------------+----------- AAAAAABBBBBCCCCC | {ABBBBBC} ABCCCCC | {ABC} AAAAAABBBBBC | {ABBBBBC} ABC | {ABC} (4 rows) With SQL2016 and MATCH_RECOGNIZE we have something similar with records. Suppose we wanted to detect a sequence of events such as, for each customer: one-or-more sell orders, a sequence of one-or-more buy orders at progressively increasing prices, followed by one-or-more sell-trades, followed by one-or-more buy-order-cancels. note: this is often called layering or spoofing: https://en.wikipedia.org/wiki/Layering_(finance) https://en.wikipedia.org/wiki/Spoofing_(finance) Suppose we have a mktevents table such as: drop table mktevents; create table mktevents ( type_cd varchar(1), --event type: O-order, T-trade, C-cancel tim bigint, --timestamp, nanoseconds since epoch or whatever. customer_id varchar(10), --identifies the customer order_id varchar(6), --identifies the order side varchar(1), --B-buy, S-sell price decimal(18,8), --limit price of the order or trade price. qty int --quantity to buy/sell, or trade quantity. ); insert into mktevents values --positive scenario ('O',1,'XYZ','ABC123','S',10.50,100), --sell order priced at 10.50 ('O',2,'XYZ','ABC124','B',10.40,100), --buy order priced at 10.40 ('O',3,'XYZ','ABC125','B',10.41,100), ('O',4,'XYZ','ABC126','B',10.42,100), ('O',5,'XYZ','ABC127','B',10.45,100), ('O',6,'XYZ','ABC128','B',10.48,100), ('O',7,'XYZ','ABC129','B',10.49,100), --buy order priced at 10.49 ('T',8,'XYZ','ABC123','S',10.50,100), --SELL trade at 10.50 ('C',9,'XYZ','ABC124','B',null,null), --cancels for buy interst. ('C',10,'XYZ','ABC125','B',null,null), ('C',11,'XYZ','ABC126','B',null,null), ('C',12,'XYZ','ABC127','B',null,null), ('C',13,'XYZ','ABC128','B',null,null), ('C',14,'XYZ','ABC129','B',null,null), --negative scenario ('O',15,'XYZ','ABC991','S',10.50,100), --sell order priced at 10.50 ('T',16,'XYZ','ABC991','S',10.50,100), --SELL trade at 10.50 --negative scenario ('O',17,'XYZ','ABC888','S',10.50,100), --sell order priced at 10.50 ('O',18,'XYZ','ABC999','B',10.40,100), --buy order priced at 10.40 ('T',19,'XYZ','ABC999','B',10.40,100), --BUY trade at 10.40 --small positive scenario ('O',20,'XYZ','ABC001','S',10.50,100), --sell order priced at 10.50 ('O',21,'XYZ','ABC002','B',10.49,100), --buy order priced at 10.49 ('T',22,'XYZ','ABC001','S',10.50,100), --SELL trade at 10.50 ('C',23,'XYZ','ABC002','B',null,null), --cancels for buy interst. --small negative scenario (buy prices are not increasing) ('O',24,'XYZ','ABC001','S',10.50,100), --sell order priced at 10.50 ('O',25,'XYZ','ABC002','B',10.49,100), --buy order priced at 10.49 ('O',26,'XYZ','ABC003','B',10.48,100), --buy order priced at 10.48 NOT INCREASING. ('O',27,'XYZ','ABC004','B',10.47,100), --buy order priced at 10.47 ('T',28,'XYZ','ABC001','S',10.50,100), --SELL trade at 10.50 ('C',29,'XYZ','ABC002','B',null,null), --cancel ('C',30,'XYZ','ABC003','B',null,null), ('C',31,'XYZ','ABC004','B',null,null) ; We can use the match_recognize SELECT * FROM mktevents match_recognize ( PARTITION BY customer_id ORDER BY tim MEASURES match_number() as nmbr ALL ROWS PER MATCH AFTER MATCH SKIP PAST LAST ROW PATTERN (A+ B+ C+ D+) DEFINE A as type_cd='O' and side='S', B as type_cd='O' and side='B' and (price >= prev(price) or prev(side)='S'), C as type_cd='T' and side='S', D as type_cd='C' and side='B' ); output: customer_id | tim | nmbr | type_cd | order_id | side | price | qty -------------+-----+------+---------+----------+------+-------------+------ XYZ | 1 | 1 | O | ABC123 | S | 10.50000000 | 100 XYZ | 2 | 1 | O | ABC124 | B | 10.40000000 | 100 XYZ | 3 | 1 | O | ABC125 | B | 10.41000000 | 100 XYZ | 4 | 1 | O | ABC126 | B | 10.42000000 | 100 XYZ | 5 | 1 | O | ABC127 | B | 10.45000000 | 100 XYZ | 6 | 1 | O | ABC128 | B | 10.48000000 | 100 XYZ | 7 | 1 | O | ABC129 | B | 10.49000000 | 100 XYZ | 8 | 1 | T | ABC123 | S | 10.50000000 | 100 XYZ | 9 | 1 | C | ABC124 | B | NULL | NULL XYZ | 10 | 1 | C | ABC125 | B | NULL | NULL XYZ | 11 | 1 | C | ABC126 | B | NULL | NULL XYZ | 12 | 1 | C | ABC127 | B | NULL | NULL XYZ | 13 | 1 | C | ABC128 | B | NULL | NULL XYZ | 14 | 1 | C | ABC129 | B | NULL | NULL XYZ | 20 | 2 | O | ABC001 | S | 10.50000000 | 100 XYZ | 21 | 2 | O | ABC002 | B | 10.49000000 | 100 XYZ | 22 | 2 | T | ABC001 | S | 10.50000000 | 100 XYZ | 23 | 2 | C | ABC002 | B | NULL | NULL (18 rows) If we do not want all records, we can just pick out aggregate scenarios: SELECT * FROM mktevents match_recognize ( PARTITION BY customer_id ORDER BY tim MEASURES match_number() as nmbr, first(A.tim) as start_tim, last(D.tim) as end_tim, avg(A.price) as avg_sell_px, avg(B.price) as avg_buy_px, avg(C.price) as avg_sell_ex_px ONE ROW PER MATCH AFTER MATCH SKIP PAST LAST ROW PATTERN (A+ B+ C+ D+) DEFINE A as type_cd='O' and side='S', B as type_cd='O' and side='B' and (price >= prev(price) or prev(side)='S'), C as type_cd='T' and side='S', D as type_cd='C' and side='B' ); output: customer_id | nmbr | start_tim | end_tim | avg_sell_px | avg_buy_px | avg_sell_ex_px -------------+------+-----------+---------+-------------+-------------+---------------- XYZ | 1 | 1 | 14 | 10.50000000 | 10.44166667 | 10.50000000 XYZ | 2 | 20 | 23 | 10.50000000 | 10.49000000 | 10.50000000 (2 rows) The MATCH_RECOGNIZE feature is an unbelievably powerful tool! It opens up amazing possibilities for pattern detection and event stream processing, making complex data analysis feel almost effortless. Good luck!