Monitoring Tables: The first phase of the Database Change Notifications-to-RSS Project

Overview:

Create an RSS item for each INSERT, UPDATE, or DELETE of a row from selectively monitored tables

Operational Status:

Can be determined by reading the directions on this page

Subscribe:

Subsrcibe to Scott's RSS using http://www.scharp.org/rss/finance_tables.rss

Requirements:

  1. Monitor the following tables on the main database for changes on the server db.scharp.org
    • pdb.protocol (network_id field tag in the xml file for filtering; example at W:\web\cgi-bin\rss)
    • admin.employee (employee_id tag)
    • admin.employee_job (supervisor_id tag)
    • admin.tbl_tbn (supervisor_id tag)
    • public.grants (network tag)
    • public.projectidcontractmapping (project_id tag)
    • public.projectid_budref(project_id tag)
    • vidi_secure.tbl_employee (employee_id)
  2. Create a separate RSS feed for each monitored table.
  3. Feed entries will be in the form of this sample:

  1. The RSS Feeds will have no inherit security other than being located inside the Hutch firewall.
  2. We will utilize Wolfe's Email-to-RSS script for creating the new RSS items. This we'll prepend new RSS items to the top of the feed.

Project Milestones & Timeline

  • TBD - Rough estimate is 80 hours of development time - estimate July 15th for testing with Finance.
  • DONE Mija estimates that the admin and pdb schemas will no longer be replicated sometime in the 2nd half of May. That's a prequisite for using the trigger-based mechanism for monitoring changes made to a table. Done.
  • DONE Test whether triggers will work on the admin and pdb schemas after the new sync'ng mechanism is in place. Chuck has given me some guidance on how to test this ( 6 hours). Yes, they do work. More rigorous testing (2 more hours) as recommended by Chuck also successful..
  • No Implement a common pg/PSQL trigger for monitoring tables and recording changes in a new event table and a shadow or history table for each monitored table. Using Robert's shadow table PL/pgperl script as a base. (12 hours). Mostly done. Forecast 8 more hours.
    • Need to ensure recorded data is in a format that is easy to extract for monitoring script and html page generation script. (+4 hours)
    • Optimize functions for database performance by batching table INSERT's, preferring pg_catalog over information_schema tables, and adding appropriate indexes. (+4 hours)
    • Because the PL/perl language doesn't allow you to catch database exceptions, if one of the triggers for this project throws an exception, it may (probably will) cause the nightly sync process to fail. As discussed with Robert, we will either have to turn on the unsafe (unsecure) version of PL/perl language that does have the ability to catch database exceptions; or I will need to rewrite our trigger functions in a safe PL language that can handle exceptions. The only such language appears to be PL/pgsql. (Probable +16 hours).
    • Working with Robert on the issue of how to extract the fields of a RECORD Pseudo Data Type, e.g., OLD and NEW, into an array of indexable TEXT values. May involve coding a user-function for PostgreSQL in C..
    • Aborting the trigger mechanism for now. Should be more easily doable in PostgreSQL 8.4..
  • DONE Due to the difficulty of implementing the above and the concerns expressed by Robert, Mija, and Chuck, we've decided to switch from using database triggers for our implementation to using polling. Estimate 40 hour to implement the polling version.
    • Worked with Lloyd Albin to modify his Materialized view functionality to allow storing of both the current and last version of a materialized view, so that differences between the two, i.e., the changes, can easily be detected. (Done.)
    • Write a PL/pgSQL function to record the primary keys of a monitored table into a meta-table to be used by the difference function (2 hours).
    • Write a PL/pgSQL function to record the differences between two versions of a materialized view into event tables. (6 hours). Had to re-write this as a PL/Perl function in order to extract the column names, due to their dynamic nature. (7 more hours).
  • DONE Implement a script to monitor the event table for new events and send the results to the email-to-RSS program. As per Robert, the scripts will be written in the python language. (16 hours)
  • hourglass.gif Create an RSS feed for each table to be monitored. (Should this be automated? Should a master index of available RSS feeds be created somewhere?) (1 hour or 4 hours) For now, went with a single rss feed that receives the updates for all monitored tables - finance_tables.rss.
  • DONE Implement a script to create a html page for each database event to display the details of the database change - to be linked to from the RSS entries. (22 hours)
  • DONE Doug expressed a desire to have a PostgreSQL view for each monitored table that records the complete status of a row as it changes over time: it's current values; how it was changed; and the date of change; such that these views could be imported into Microsoft Excel and be acted on as a 'pivot table'. This would facilitate them being able to filter and drill-down with the data in a way that suits their work practice. When certain fields on a row are UPDATE'd, it will not be necessary to record the previous values side-by-side with the current values. Instead, Finance will be able to compare the newer version of the row to the earlier version of the row - as long as the 'original' copy of the row is loaded into the view at the time monitoring begins.

Future TODO's (updated 9/1/09, DL)

  1. Request: In the RSS view, we need views that return condensed, relevant information as opposed to the entire table view. For instance, change in admin database for Bartz, Traci... we have to use a separate tool (Pg III) to obtain the meaningful metadata. A second example was from the protocol table, where change from stage 2 to 3... it would be best to have the description or name of the stage instead of 2 and 3. The goal is to reach a condensed screen shot that the ordinary user could quickly scan in the morning.
    1. pdb_protocol
      1. Instead of using tbl_protocol, please use views.views.rss_update_protocol. The primary key is probably a composite key and we may have to add protocol_id to the view. We stripped it since it was not as useful. Let's talk about this.
    2. admin_tbl_employee
      1. use views.views.rss_update_employee
    3. admin_tbl_employee_job
      1. use views.views.rss_update_employee_job
    4. admin_tbl_tbn
      1. No change to view. Continue with table only.
    5. public.grants
      1. No change to view. Continue with table only.
    6. public.projectidcontractmapping
      1. No change to view. Continue with table only.
    7. public.projectid_budref
      1. No change to view. Continue with table only.
    8. vidi.views.employee_list.
      1. This was suggested as an addition, as table only on 9/2/09.
  2. Request: Pursue the sophisticated version of query. Higher levels of changes https://twiki.vidi.org/bin/view/PopSci/SCHARP/CustomQueryDatabaseMonitoring
  3. Request: List of what we can answer by using a view.
  4. Note: The "Open History Table" did work IF we were already logged into phpPGadmin.
  5. Cancelled: Since we would like to rollout specific feeds to various users (e.g. Admin - Rose or HVTN - Drienna), it would be best if the RSS could be subscribed to using separate feeds as opposed to a single feed.
    1. Three user groups:
      1. All current views and tables - Finance & DDs, where feed name Finance_tables
      2. Admin tables (tbn, employee, and employee job) - Rose and Helen, where feed name admin_tables
      3. Protocol and Finance table (protocol, grants, projectidcontractmapping, projectid_budref, where feed name is public_pdb_tables
  6. Cancelled on 9/1/09. Can we have the table open in Google spreadsheets on select (Ben). Our original thought was to cut/paste the html in excel or use the history table view in tableau. Doug talked with Ben about this and demonstrated Excel solution and this seems good enough for now.

Bug's and Issues

  • Can only monitor database tables with a primary key defined. (Composite - or multiple - primary keys are also supported). Lloyd has indicated that matviews are basically tables and do therefore have a primary key.
  • If a change to the structure (columns) or constraints is made to the monitored table, those changes may cause the monitoring of that table to completely fail - with an error message - or for some data to be silently ignored. This is because the 'definition' of the monitored table is recorded the first time monitoring is setup on a table and never updated subsequently. (The first version of the materialized view table contains an exact copy of the original's table structure and constraints.) Being able to accurately detect these circumstances would be possible with more coding - checking the table structure and constraints on every poll and comparing with the previously detected values. Being able to automatically handle/compensate for such changes without manual intervention may not be possible in the current implementation.
  • Security for this system, as to who can view database changes, is only partially implemented at this time. The content of changes is secure on the webserver but not on the filesystem.
  • If the target RSS feed does not already exist, the RSS notifications will fail. The only indication for such failures will be bounced emails that say Undelivered Mail Returned to Sender, unknown user. There is currently no easy way to resend such failed RSS notifications. From a engineering point of view, it would be better to abandon the use of the email-to-RSS.pl script in order to make this system more robust.
  • Because we are using polling - instead of triggers - we are not capturing the information on which database user made a change to the data.
  • There is no user-interface for setting up the monitoring on particular tables, nor for monitoring the status and health of the application. Everything is done via Postgresql tables and functions.

Project Deliverables

Current Program Source Code

-- ScottLangley - 07 May 2009