Samples of Data Change Notifications of Interest to Finance

Scott:
How to find the answers to these seven sample "datababase change notifications" provided by Ben and Doug. I don't think I need to figure out how to do all of them right away, but I'd like to flesh out a few. Below are the tables/views I'm contemplating using. If I end up using triggers as the main mechanism, I think any of the views used would need to be materialized - using Lloyd's technique.

1. New hire in HVTN or any other network

Scott:
Is there a table or view that lists all employee ids and their network affiliation (or department)?

Maybe?

views.mpmybt_employee_data

views.mpbt_activestaff

pdb.tbl_dir_person_network

matviews.mv_mpmybt_employee_data

Lloyd:
When do they want to know? If after the first paycheck is ok, then you can monitor public.payroll which has the project id's and the join to public.node_project_id (new table) where the node is the network.


This tell you all new people who charge to a project. I think this may be actually what they want
.

Mija:
You could certainly do a combo of admin.tbl_protocols with something from finance. One question is how is Ben defining in a network? Paid initially from a network? Paid in part by a network?

2. New protocol is entered

Scott:
Maybe a entry (row) in:

pdb.tbl_protocol

or protimp.protocols

Mija:
Does Ben want a new protocol or a new study? What does he mean by entered? Protocol should be from pdb.tbl_protocol, study is from source.protocol.


Polling Sample Code

-- Function: poll_new_protocols()

-- DROP FUNCTION poll_new_protocols();

CREATE OR REPLACE FUNCTION poll_new_protocols()
  RETURNS void AS
$BODY$DECLARE
  inserted_row  source.protocol%ROWTYPE;
BEGIN

  IF NOT EXISTS(SELECT tablename FROM pg_catalog.pg_tables WHERE 
     tablename = 'protocol_previous' AND schemaname ='source') THEN
    CREATE TABLE source.protocol_previous AS SELECT protocol_id FROM source.protocol;
  END IF;

  FOR inserted_row IN SELECT  * FROM source.protocol WHERE protocol_id NOT IN
      (SELECT protocol_id FROM 	source.protocol_previous WHERE protocol_id IS NOT NULL) 
      LOOP
	  INSERT INTO joe_event_table
	       VALUES
	  (
		'poll_new_protocols',
		now(),
		'New Protocol',
		'New Protocol - ' || inserted_row.tla,
		CURRENT_USER,
		'source.protocol'
	   );
	   INSERT INTO source.protocol_previous VALUES ( inserted_row.protocol_id);	
       END LOOP;
END;$BODY$
  LANGUAGE 'plpgsql' VOLATILE
  COST 100;

Sample run:

SELECT * FROM poll_new_protocols();
DELETE FROM source.protocol_previous WHERE protocol_id=12;
DELETE FROM source.protocol_previous WHERE protocol_id=24;
SELECT * FROM poll_new_protocols();


INSERT INTO joe_event_table 
(originating_event, event_time, event_text, event_query, database_user, triggering_table, event_id)
 VALUES 
('poll_new_protocols', '2009-04-06 14:08:41.007-07', 'New Protocol', 'New Protocol - v041', 'slangley', 'source.protocol', 1);
INSERT INTO joe_event_table 
(originating_event, event_time, event_text, event_query, database_user, triggering_table, event_id)
 VALUES 
('poll_new_protocols', '2009-04-06 14:08:41.007-07', 'New Protocol', 'New Protocol - h021', 'slangley', 'source.protocol', 2);

Trigger Sample Code

   Example:

CREATE OR REPLACE FUNCTION getnewprotocol()
  RETURNS trigger AS
$BODY$
    BEGIN
	INSERT INTO joe_event_table
	    VALUES
	(
		TG_NAME,
		now(),
		'New Protocol,
		'New Protocol - ' || inserted_row.tla,
		CURRENT_USER,
		TG_RELNAME
	);
	RETURN NULL;
     END;
$BODY$
  LANGUAGE 'plpgsql' VOLATILE
  COST 100;

CREATE TRIGGER new_protocol
  AFTER INSERT
  ON source.protocol
  FOR EACH ROW
  EXECUTE PROCEDURE getnewprotocol();

3. Key date changes to protocol (1st participant enrolled, Database Lock)

Scott:
Somehow capture the dates changes are made to the table:

protimp.protocols

Lloyd:
Also take a look at matviews.mv_protocol_milestone_dates This has all the major dates that they are interested in. It i s a materialized view, so you may want to take a look at the underlying queries.

Mija:
I would go for stuff in pdb.tbl_protocol_milestones

If we are only looking for key milestones as defined in the table:

source.protocol_status

"not implemented"
"in setup"
"enrolling"
"followup"
"closed"
"locked"
"archived"

then we could add a trigger to source.protocol that detects row updates and inserts.

Otherwise, other types of milestones - as recorded in:

	pdb.tbl_protocol_milestones
would need to be captured via polling instead of triggers as they are replicated - dropped and recreated every night.

4. New project ID is created

Scott:
A new entry (row) in the view:

views.all_project_ids

Lloyd:
This is also a brand new table that I just created yesterday. It combines the projects ID's from various table to try and create a master list.


5. First time a cost occurs on a project ID

Scott:
Derive a new view from:

public.project_expenses

Lloyd:
You may also want to check public.der_enhanced and public.payroll as this data comes directly from the hutch accounting.


6. If it has been greater than 30 days since a cost occurred on any given project ID

Scott:
Derive a new view from:

public.project_expenses

Lloyd:
Same as above.


7. Increase in personnel costs >10% for a given month

Scott
Not sure what this means. Per VIDI, per network or department, per project ID, per supervisor, per employee id?


Lloyd:
This would be take the payroll for each project id for the previous month and compare it to the current months payroll and if any project increased over 10% then send out an alert. You would monitor public.payroll for this
.

  • Do we need to track when we've fired an event for a projectid, month pair so that we don't fire the event a second time or just fire an event anytime we transition from NULL or less than 10% to greater than or equal to 10%?
  • Need a function to calculate the percentage changed between one month total and the next month total.
  • Do we need to allow a window of time for the data to 'settle' before we actually run this comparison query on a month's total?
  • Lloyd says finance is not interested in 'adjustments to previous months' for this calculation.
  • Lloyd thinks this should be per project/per employee rather than just per project.
  • Triggers are not currently possible for this event because the relevant data is replicated nightly.

In the below version, if there was no data for the previous month, we assume that the percent increase is 100% - rather than infinity.

Polling Sample Code

-- Function: poll_ten_percent_monthly_personnel_costs_change()

-- DROP FUNCTION poll_ten_percent_monthly_personnel_costs_change();

CREATE OR REPLACE FUNCTION poll_ten_percent_monthly_personnel_costs_change()
  RETURNS void AS
$BODY$

DECLARE
    set_of_rows_to_process REFCURSOR;
    row_to_process percent_monthly_personnel_costs_change%ROWTYPE;
    new_percent_change double precision := 0.0;
    current_percent_change double precision := 0.0;
    old_percent_change double precision:= 0.0;
BEGIN

    -- Create the Table for monitoring if it doesn't exist
    IF NOT EXISTS(SELECT tablename FROM pg_catalog.pg_tables 
	WHERE tablename = 'percent_monthly_personnel_costs_change' AND schemaname ='public') THEN
	CREATE TABLE public.percent_monthly_personnel_costs_change AS SELECT projectid, earnenddate, SUM(total) 
		FROM public.payroll WHERE (earnenddate::date = payenddate::date) 
		GROUP BY projectid, earnenddate ORDER BY projectid, earnenddate; 
	ALTER TABLE public.percent_monthly_personnel_costs_change ADD COLUMN percent_change double precision DEFAULT 0.0;
	ALTER TABLE public.percent_monthly_personnel_costs_change ADD COLUMN previous_percent_change double precision DEFAULT 0.0;
                -- Grab all the newly created rows
	OPEN set_of_rows_to_process FOR SELECT * FROM percent_monthly_personnel_costs_change FOR UPDATE;
	-- Should we surpress change events for data records that happened many months ago, or maybe for all events when 
	-- the percent_monthly_personnel_costs_change table is first created?
	-- If so, we shouldn't be attempting event detection for these rows in the loop below.
    ELSE
	-- Grab all the rows that have been updated
	OPEN set_of_rows_to_process FOR SELECT * FROM find_new_or_changed_rows() FOR UPDATE;
    END IF;
    
    LOOP
	FETCH set_of_rows_to_process INTO row_to_process;
	EXIT WHEN NOT FOUND;
	old_percent_change = row_to_process.previous_percent_change;
	current_percent_change = row_to_process.percent_change;	
	new_percent_change := calculate_percent_change_from_previous_month(row_to_process);
	row_to_process.previous_percent_change := current_percent_change;
	row_to_process.percent_change := new_percent_change;
	UPDATE percent_monthly_personnel_costs_change 
	SET  previous_percent_change = row_to_process.previous_percent_change, percent_change = row_to_process.percent_change
	-- WHERE CURRENT OF set_of_rows_to_process -- probably not supported on this variety of cursor until PG 8.4;
	WHERE projectid = row_to_process.projectid AND earnenddate = row_to_process.earnenddate;
	IF (
		(new_percent_change >= 10.0)
		AND 
		(current_percent_change < 10.0)
	    )
	THEN
		RAISE NOTICE 'New Change Event: old percent = %, new percent = %', row_to_process.previous_percent_change,
			row_to_process.percent_change;
	ELSE
		RAISE NOTICE 'NOT a New Change Event: old percent = %, new percent = %', row_to_process.previous_percent_change,
			row_to_process.percent_change;		
	END IF;
    END LOOP;

    CLOSE set_of_rows_to_process;

END;$BODY$
  LANGUAGE 'plpgsql' VOLATILE;


-- Function: find_new_or_changed_rows()

-- DROP FUNCTION find_new_or_changed_rows();

CREATE OR REPLACE FUNCTION find_new_or_changed_rows()
  RETURNS SETOF percent_monthly_personnel_costs_change AS
$BODY$
DECLARE

  inserted_or_changed_row RECORD;
  row_to_be_updated  percent_monthly_personnel_costs_change%ROWTYPE;
  row_to_return percent_monthly_personnel_costs_change%ROWTYPE;

BEGIN

FOR inserted_or_changed_row IN SELECT  projectid, earnenddate, SUM(total) FROM public.payroll 
	WHERE (earnenddate::date = payenddate::date) GROUP BY projectid, earnenddate 
	EXCEPT SELECT projectid, earnenddate, sum FROM percent_monthly_personnel_costs_change
LOOP  
      SELECT * INTO row_to_be_updated FROM percent_monthly_personnel_costs_change 
	WHERE (projectid = inserted_or_changed_row.projectid) AND  (earnenddate = inserted_or_changed_row.earnenddate);
      IF(row_to_be_updated IS NOT NULL) THEN	
	-- UPDATE this changed row in the percent_monthly_personnel_costs_change table
	-- That is, the source table (payroll) has updated (or changed) info for 
	-- a particular (projectid,earneddate) combination for which we will
	-- "update" the corresponding row in the percent_monthly_personnel_costs_change table.
	RAISE NOTICE 'Row Changed: projectid = %, earnenddate = %, old total = %, new total = %', 
		row_to_be_updated.projectid, 
		row_to_be_updated.earnenddate, 
		row_to_be_updated.sum,  
		inserted_or_changed_row.sum;
	   UPDATE  percent_monthly_personnel_costs_change
	   SET sum=inserted_or_changed_row.sum
	   WHERE (projectid = row_to_be_updated.projectid) AND (earnenddate = row_to_be_updated.earnenddate)
	   RETURNING * INTO STRICT row_to_return; -- Return the updated row, but (STRICT) only one row
	   RETURN NEXT row_to_return; -- Add to the Set of Results returned by this function

      ELSE
	-- ADD this new row to the percent_monthly_personnel_costs_change table
	-- That is, the source table (payroll) has information about a 
	-- a new (projectid,earneddate) combination for which we will
	-- "insert" a new row in the percent_monthly_personnel_costs_change table.
	RAISE NOTICE 'New Row: projectid = %, earnenddate = %, total = %',  
		inserted_or_changed_row.projectid,  
		inserted_or_changed_row.earnenddate,  
		inserted_or_changed_row.sum;	
	INSERT INTO percent_monthly_personnel_costs_change(
	            projectid, earnenddate, sum)
	    VALUES (inserted_or_changed_row.projectid,  
		inserted_or_changed_row.earnenddate,  
		inserted_or_changed_row.sum)
	     RETURNING * INTO STRICT row_to_return; -- Return the inserted row, but (STRICT) only one row
	RETURN NEXT row_to_return; -- Add to the Set of Results returned by this function
      END IF;
END LOOP;
      RETURN;
END;$BODY$
  LANGUAGE 'plpgsql' VOLATILE;

-- Function: calculate_percent_change_from_previous_month(percent_monthly_personnel_costs_change)

-- DROP FUNCTION calculate_percent_change_from_previous_month(percent_monthly_personnel_costs_change);

CREATE OR REPLACE FUNCTION calculate_percent_change_from_previous_month(current_month percent_monthly_personnel_costs_change)
  RETURNS double precision AS
$BODY$
DECLARE

	previous_month  percent_monthly_personnel_costs_change%ROWTYPE;
BEGIN

	SELECT INTO previous_month * FROM percent_monthly_personnel_costs_change  
	WHERE (projectid = current_month.projectid) 
		AND (earnenddate < current_month.earnenddate) 
		AND (earnenddate > (current_month.earnenddate - interval '2 months'));

	IF ((previous_month IS NULL) OR (previous_month.sum = 0.0 )) THEN
		RETURN 100.0;
	END IF;

	RETURN (100.0 * (current_month.sum - previous_month.sum) / previous_month.sum);

END;$BODY$
  LANGUAGE 'plpgsql' VOLATILE STRICT;

-- ScottLangley - 16 Apr 2009