Using Databases with Dispersion Modeling: Nested Queries


Most of you out there reading this article are scientists or engineers. In the course of our education we have come across some really gnarly complicated problems to solve. We probably started shaking and sweating profusely while reading the text of these problems. To solve these types of problems, we were all trained to break these complicated problems down into smaller more easily solvable problems then combine all the results together.

When processing all the model results for a dispersion modeling analysis, compiling and summarizing the results can be overwhelming. You may literally have millions of records of results to sort through, manipulate, and analyze. How do you break down reams and reams of results into something manageable?

If you are using a database to process your model output, the answer to the question above is pretty straightforward: nested queries.

Using Nested Queries

We have talked about queries with one table (http://www.naviknow.com/2018/07/18/using-databases-simple-queries/) and queries with multiple tables (http://www.naviknow.com/2018/08/08/using-databases-more-on-queries/), so you have seen that the general structure of a SQL SELECT statement looks like this:

SELECT
    <table_name>.<field_name>,
    …
    <table_name>.<field_name>
FROM
    <table>

When looking at the FROM clause, though it indicates the records are being selected from a database table, it would be more correct to use the term record set rather than table. Why record set? A SELECT statement can select from records returned from another SELECT statement. So if you have massive number of records to return, perform calculations on (http://www.naviknow.com/2018/07/25/using-databases-calculations-with-queries/), compile, and summarize, you can use SELECT statements within SELECT statements to break the problem into manageable steps.

Unitized Emission Rate Dispersion Modeling

When using the unitized emission rate modeling technique, you will generate potentially millions of records of results contained in your PLOTFILE. Using nested queries is the best way to perform calculations on, compile, and summarize these records.

The general SQL SELECT statement structure in pseudo-code will look something like:

SELECT
    MAX(SUM((unitized concentrations)*(air contaminant emission rate)))
        for each air contaminant
FROM
    (SELECT
        SUM((unitized concentrations)*(air contaminant emission rate))
            for each air contaminant at each receptor
    FROM
        (SELECT
            (unitized concentrations)*(air contaminant emission rate)
            for each source and each air contaminant at each receptor
        FROM
        (PLOTFILE results table)))

Notice the nesting of the SELECT statements, indented and color coded for your convenience.

In red, you are multiplying the unitized concentration by the emission rate (products) for each source, each air contaminant, at each receptor.

In blue, you are summing all the products (sum of the products) for each air contaminant at each receptor.

Finally, in black, you take the maximum of the sum of the products for each air contaminant.

Pretty straightforward and compact.

If you have done this same calculation in a spreadsheet, and a lot of you have, you would have an entire sheet dedicated to each SELECT statement above. That equates to thousands to tens of thousands of cells. Also, for each project, you will most likely have a different number of receptors, so you will have to customize each spreadsheet workbook. Why do that to yourself?

Another thing to keep in mind is the regulator reviewing your modeling will have thousands of cells to check and validate. Why do that to them?

Lastly, why bill your clients for your time doing the calculations in a spreadsheet and extend the permit review time by providing the spreadsheet to the regulator when there is a simpler and quicker way to perform and validate the same calculations?

Based on the points made above, for this type of analysis, databases are just better than spreadsheets.

SUMMARY

For many types of data analysis involved with dispersion modeling, database queries can provide results to very complicated problems. These problems can be simplified tremendously by using nested queries that break the larger problem down into simple and manageable problems that are easy to solve by the modeler and easy to validate by the regulator.

The unitized dispersion modeling technique provides a good example of how to apply nested queries.

If you found this article informative, there is more helpful and actionable information for you.  Go to http://learn.naviknow.com to see a list of past webinar mini-courses. Every Wednesday (Webinar Wednesday), NaviKnow is offering FREE webinar mini-courses on topics related to air quality dispersion modeling and air quality permitting. If you want to be on our email list, drop me a line at [email protected].

One of the goals of NaviKnow is to create an air quality professional community to share ideas and helpful hints like those covered in this article. So if you found this article helpful, please share with a colleague.