Keyword Search using Regex
Many research questions revolve around finding transactions that match a specific key word or combination of key words. In the example below, we are interested in getting all "Sales Tax" revenue from the "State of Utah" entity. We first define a variable "search_term" and give it the value of "sales tax". Please make sure your search term is lowercase since all the paramters in the WHERE clause lower the text fields we are searched against. The GoogleSQL function "REGEXP_CONTAINS" provides standard regex detection.
DECLARE search_term STRING;
SET search_term = r"sales tax"; -- search term must be LOWER case!
SELECT * FROM `ut-sao-transparency-prod.transaction.transaction`
WHERE entity_name = "State of Utah" AND
type = "RV" AND
(
REGEXP_CONTAINS(LOWER(description), search_term)
REGEXP_CONTAINS(LOWER(fund1), search_term) OR
REGEXP_CONTAINS(LOWER(fund2), search_term) OR
REGEXP_CONTAINS(LOWER(fund3), search_term) OR
REGEXP_CONTAINS(LOWER(fund4), search_term) OR
REGEXP_CONTAINS(LOWER(org1), search_term) OR
REGEXP_CONTAINS(LOWER(org2), search_term) OR
REGEXP_CONTAINS(LOWER(org3), search_term) OR
REGEXP_CONTAINS(LOWER(org4), search_term) OR
REGEXP_CONTAINS(LOWER(org5), search_term) OR
REGEXP_CONTAINS(LOWER(org6), search_term) OR
REGEXP_CONTAINS(LOWER(org7), search_term) OR
REGEXP_CONTAINS(LOWER(org8), search_term) OR
REGEXP_CONTAINS(LOWER(org9), search_term) OR
REGEXP_CONTAINS(LOWER(org10), search_term) OR
REGEXP_CONTAINS(LOWER(cat1), search_term) OR
REGEXP_CONTAINS(LOWER(cat2), search_term) OR
REGEXP_CONTAINS(LOWER(cat3), search_term) OR
REGEXP_CONTAINS(LOWER(cat4), search_term) OR
REGEXP_CONTAINS(LOWER(cat5), search_term) OR
REGEXP_CONTAINS(LOWER(cat6), search_term) OR
REGEXP_CONTAINS(LOWER(cat7), search_term) OR
REGEXP_CONTAINS(LOWER(program1), search_term) OR
REGEXP_CONTAINS(LOWER(program2), search_term) OR
REGEXP_CONTAINS(LOWER(program3), search_term) OR
REGEXP_CONTAINS(LOWER(program4), search_term) OR
REGEXP_CONTAINS(LOWER(program5), search_term) OR
REGEXP_CONTAINS(LOWER(program6), search_term) OR
REGEXP_CONTAINS(LOWER(program7), search_term) OR
REGEXP_CONTAINS(LOWER(function1), search_term) OR
REGEXP_CONTAINS(LOWER(function2), search_term) OR
REGEXP_CONTAINS(LOWER(function3), search_term) OR
REGEXP_CONTAINS(LOWER(function4), search_term) OR
REGEXP_CONTAINS(LOWER(function5), search_term) OR
REGEXP_CONTAINS(LOWER(function6), search_term) OR
REGEXP_CONTAINS(LOWER(function7), search_term) OR
);
Search using Uniform Chart of Accounts Code
Since local governments have different ways of tracking various accounts in their accounting system, using a keyword search is not always sufficient and sometimes incomplete. For example, If you wanted all transactions from the "General Fund", there could be many variations such as "Gen Fund", "GF", "Gen. Fund", "General Fund 1001", etc.
The uniform chart of accounts (UCA) solves this issue by standardizing accounts into a 3 component code.
- Fund (3 digits)
- Function (6 digits)
- Caveat1: CARES ACT uses a lower case 'a' in the second digit of the 'function' component"
- Caveat2: ARPA uses a lower case 'b' in the second digit of the 'function' component"
- Account (8 digits)
- If the transaction type is "RV" then the "Account" code should start with a "3"
- If the transaction type is "EX" then the "Account" code should start with a "4"
An example of a valid UCA code would be "100-100000-30000000". For more information on available and valid UCA codes, visit this link: https://auditor.utah.gov/wp-content/uploads/sites/6/2020/09/Chart-of-Accounts-10-26-15_9_17_2020.pdf
In our example, we are interested in getting grant expenses. Below, we show how to extract the fund, function, and account codes from the UCA column ("account number" field). Then we also shown how to filter WHERE clauses off "Account" component.
-- Note, there are two other Charts of Account that may be used other than the Office of the State Auditor's UCA.
-- 1. Local Education Agency (LEA) will use The Utah State Board of Educations (USBE)'s chart of accounts.
-- 2. State of Utah. Uses its own COA.
SELECT entity_name,
fiscal_year,
type,
fund1,
fund2,
org1,
org2,
org3,
cat1,
cat2,
cat3,
cat4,
description,
vendor_name,
account_number AS uca,
REGEXP_EXTRACT(account_number, r'^\d{3}') AS uca_fund,
REGEXP_REPLACE(REGEXP_EXTRACT(account_number, r'-\d{1}[\d|a|b]\d{4}-'), '-','') AS uca_function,
REGEXP_EXTRACT(account_number, r'\d{8}$') AS uca_account,
SUM(amount) AS net_amount
FROM `ut-sao-transparency-prod.transaction.transaction`
WHERE type IN('EX') AND
REGEXP_REPLACE(REGEXP_EXTRACT(account_number, r'-\d{8}'),'-','') IN ( '40060000', # Grants and Contracts
'40060100', # Grants and Contracts - Federal
'40060200', # Grants and Contracts - State
'40060300', # Grants and Contracts - Local
'40060400' # Grants and Contracts - Private
)
GROUP BY
entity_name,
fiscal_year,
type,
fund1,
fund2,
org1,
org2,
org3,
cat1,
cat2,
cat3,
cat4,
description,
account_number,
uca_fund,
uca_function,
uca_account
School District and Charter School UCA
-- Description: This scripts shows how to perform searches using USBE's Uniform
-- Chart of Accounts. This script show how to use built-in Big Query string
-- manipulation functions.
-- To see other UCA codes available visit:
-- https://www.schools.utah.gov/financialoperations/reporting
SELECT
entity_name,
govt_lvl,
fiscal_year,
type,
account_number AS uca,
REGEXP_EXTRACT(account_number , r"^\d{2}") AS uca_fund,
REGEXP_EXTRACT(REGEXP_EXTRACT(account_number , r"^\d{2}-\w{3}"), r"\w{3}$") AS uca_location,
REGEXP_EXTRACT(REGEXP_EXTRACT(account_number , r"^\d{2}-\w{3}-\d{4}"), r"\d{4}$") AS uca_program,
REGEXP_EXTRACT(REGEXP_EXTRACT(account_number , r"^\d{2}-\w{3}-\d{4}-\d{4}"), r"\d{4}$") AS uca_function,
REGEXP_EXTRACT(REGEXP_EXTRACT(account_number , r"^\d{2}-\w{3}-\d{4}-\d{4}-\d{3}"), r"\d{3}$") AS uca_object,
REGEXP_EXTRACT(account_number , r"\d{4}$") AS uca_revenue,
COUNT (account_number) AS freq,
SUM(amount) AS net_amount
FROM transaction.transaction
WHERE govt_lvl IN ("K12 EDUCATION") AND
type IN ("RV","EX") AND
(-- Search by Fund:
REGEXP_EXTRACT(account_number , r"^\d{2}") IN ('10', '20')
-- Search by Location:
AND REGEXP_EXTRACT(REGEXP_EXTRACT(account_number , r"^\d{2}-\w{3}"), r"\w{3}$") IN ('A45')
-- Search by Program:
AND REGEXP_EXTRACT(REGEXP_EXTRACT(account_number , r"^\d{2}-\w{3}-\d{4}"), r"\d{4}$") IN ('1205')
-- Search by Function:
AND REGEXP_EXTRACT(REGEXP_EXTRACT(account_number , r"^\d{2}-\w{3}-\d{4}-\d{4}"), r"\d{4}$") IN ('1000')
-- Search by Object Code
AND REGEXP_EXTRACT(REGEXP_EXTRACT(account_number , r"^\d{2}-\w{3}-\d{4}-\d{4}-\d{3}"), r"\d{3}$") IN ('110')
-- Search by Revenue Code
OR REGEXP_EXTRACT(account_number , r"-\d{4}$") IN ('1100')
)
GROUP BY entity_name,
govt_lvl,
fiscal_year,
type,
batch_id,
uca,
uca_fund,
uca_location,
uca_program,
uca_function,
uca_object,
uca_revenue
How can I replicate the data in the employee payment search?
We make an aggregated table from the transaction table, which partitions by employee begining initial. We also use regex to define what counts as "Wages/Salary" vs "Benefits"
CREATE OR REPLACE TABLE `your_project_id.you_data_set.employee_search` (
fiscal_year INT64,
wages FLOAT64,
benefits FLOAT64,
title STRING,
vendor_name STRING,
entity_name STRING,
employee_beg_name_id INT64
)
PARTITION BY
RANGE_BUCKET(employee_beg_name_id, GENERATE_ARRAY(0, 26, 1))
AS
SELECT
fiscal_year,
wages,
benefits,
INITCAP(title),
INITCAP(vendor_name) AS vendor_name,
entity_name,
employee_beg_name_id
FROM
(SELECT
fiscal_year,
SUM(
CASE
WHEN is_salary = 1 THEN amount
ELSE 0
END
) AS wages,
SUM(
CASE
WHEN is_salary = 0 THEN amount
ELSE 0
END
) AS benefits,
title,
vendor_name,
entity_name,
employee_beg_name_id
FROM
(SELECT
fiscal_year, category,
CASE
WHEN
(REGEXP_CONTAINS(LOWER(category),
r'(salary)|(salaries)|(wage)|(compensation)|(excess time)|(overtime)|(bonus)|(incentive)|(leave)|(allowance)|(earning)|(stipend)|(regular pay)|(per diem)|(holiday)|(vacation)|(sick)|(comp time)|(pto)|(capital outlay)|(personal time)|(uncollected)|(hourly)')
OR
LOWER(category) IN ('regular employees', 'miscellaneous earnings', 'full-time earnings', 'on call pay', 'hourly and adjunct faculty', 'personnel', 'regular pay', '340 other contracted professional services', 'other pay', '320 professional - educational services', 'temporary employees', 'uniform', 'adjuncts', '999 revenue', '8131 local', 'taxable reimbursements', 'special programs', 'payroll', 'payroll system incentive, service and retirement awards', 'administrative', 'additional pay', 'bereavement pay', '850 contingency (for budgeting purposes only)', '340 other professional services'))
THEN 1
ELSE 0
END AS is_salary,
amount,
title,
vendor_name,
entity_name,
employee_beg_name_id
FROM
(SELECT
fiscal_year,
CASE
WHEN col_check = 'c3' THEN cat3
WHEN col_check = 'c2' THEN cat2
WHEN col_check = 'c1' THEN cat1
WHEN col_check = 'error' THEN 'no category'
END AS category,
amount,
title,
vendor_name,
entity_name,
employee_beg_name_id
FROM
(SELECT
fiscal_year, cat1, cat2, cat3,
CASE
WHEN cat3 != '' THEN 'c3'
WHEN cat2 != '' THEN 'c2'
WHEN cat1 != '' THEN 'c1'
ELSE 'error'
END AS col_check,
amount, title,
vendor_name,
entity_name,
CASE
WHEN STARTS_WITH(LOWER(vendor_name), "a") THEN 1
WHEN STARTS_WITH(LOWER(vendor_name), "b") THEN 2
WHEN STARTS_WITH(LOWER(vendor_name), "c") THEN 3
WHEN STARTS_WITH(LOWER(vendor_name), "d") THEN 4
WHEN STARTS_WITH(LOWER(vendor_name), "e") THEN 5
WHEN STARTS_WITH(LOWER(vendor_name), "f") THEN 6
WHEN STARTS_WITH(LOWER(vendor_name), "g") THEN 7
WHEN STARTS_WITH(LOWER(vendor_name), "h") THEN 8
WHEN STARTS_WITH(LOWER(vendor_name), "i") THEN 9
WHEN STARTS_WITH(LOWER(vendor_name), "j") THEN 10
WHEN STARTS_WITH(LOWER(vendor_name), "k") THEN 11
WHEN STARTS_WITH(LOWER(vendor_name), "l") THEN 12
WHEN STARTS_WITH(LOWER(vendor_name), "m") THEN 13
WHEN STARTS_WITH(LOWER(vendor_name), "n") THEN 14
WHEN STARTS_WITH(LOWER(vendor_name), "o") THEN 15
WHEN STARTS_WITH(LOWER(vendor_name), "p") THEN 16
WHEN STARTS_WITH(LOWER(vendor_name), "q") THEN 17
WHEN STARTS_WITH(LOWER(vendor_name), "r") THEN 18
WHEN STARTS_WITH(LOWER(vendor_name), "s") THEN 19
WHEN STARTS_WITH(LOWER(vendor_name), "t") THEN 20
WHEN STARTS_WITH(LOWER(vendor_name), "u") THEN 21
WHEN STARTS_WITH(LOWER(vendor_name), "v") THEN 22
WHEN STARTS_WITH(LOWER(vendor_name), "w") THEN 23
WHEN STARTS_WITH(LOWER(vendor_name), "x") THEN 24
WHEN STARTS_WITH(LOWER(vendor_name), "y") THEN 25
WHEN STARTS_WITH(LOWER(vendor_name), "z") THEN 26
ELSE 0
END AS employee_beg_name_id
FROM `ut-sao-transparency-prod.transaction.transaction`
WHERE title IS NOT NULL
AND type = 'PY')))
GROUP BY fiscal_year,
title,
vendor_name,
entity_name,
employee_beg_name_id);
You can then use that table to conduct your own searches like below:
--getEmployeeSearch
SELECT fiscal_year,
ROUND(wages,2) AS wages,
ROUND(benefits,2) AS benefits,
ROUND((wages + benefits),2) AS total,
title,
entity_name,
employee_name,
DENSE_RANK() OVER ( ORDER BY entity_name,employee_name,title ) AS ids
FROM (
SELECT fiscal_year,
wages,
benefits,
title,
entity_name,
vendor_name AS employee_name
FROM `ut-sao-transparency-prod.tu_public.employee_search`
WHERE lower(vendor_name) LIKE '%john%' AND lower(vendor_name) LIKE '%dougall%'
)
ORDER BY fiscal_year;
How can I replicate the Highest Paid Employees data?
Similar to the employee pay search, we have to define salaries and wages. We also create a rank for each employee and filter to only the top 100. You can search the created table easily by filter to entity name and fiscal year. You can expand the employee rank to be greater than 100 if you desire too.
CREATE OR REPLACE TABLE `your_project_name.your_dataset.highest_paid_emp` (
entity_id INT64,
fiscal_year INT64,
entity_name STRING,
org1 STRING,
employee_name STRING,
title STRING,
emp_rank INT64,
wages FLOAT64,
benefits FLOAT64,
total FLOAT64
)
PARTITION BY RANGE_BUCKET(entity_id, GENERATE_ARRAY(1, 4000, 1))
AS
SELECT
entity_id ,
fiscal_year ,
entity_name ,
org1 ,
employee_name ,
title ,
emp_rank ,
wages ,
benefits ,
total
FROM
(
SELECT entity_id ,
fiscal_year ,
entity_name ,
org1 ,
employee_name ,
title ,
ROW_NUMBER() OVER (PARTITION BY entity_id, fiscal_year ORDER BY total DESC) AS emp_rank ,
wages ,
benefits ,
ROUND(total, 2) AS total
FROM
(
SELECT
entity_id,
entity_name,
fiscal_year,
org1,
ROUND(wages, 2) AS wages,
ROUND(benefits, 2) AS benefits,
coalesce(wages,0) + coalesce(benefits,0) AS total,
INITCAP(title) AS title,
INITCAP(employee_name) AS employee_name,
INITCAP(govt_lvl) AS govt_lvl
FROM
(SELECT
entity_id,
entity_name,
fiscal_year,
org1,
SUM(
CASE
WHEN is_salary = 1 THEN amount
ELSE 0
END
) AS wages,
SUM(
CASE
WHEN is_salary = 0 THEN amount
ELSE 0
END
) AS benefits,
title,
employee_name,
govt_lvl
FROM
(SELECT
entity_id,
entity_name,
fiscal_year,
org1,
category,
CASE
WHEN
(REGEXP_CONTAINS(LOWER(category),
r'(salary)|(salaries)|(wage)|(compensation)|(excess time)|(overtime)|(bonus)|(incentive)|(leave)|(allowance)|(earning)|(stipend)|(regular pay)|(per diem)|(holiday)|(vacation)|(sick)|(comp time)|(pto)|(capital outlay)|(personal time)|(uncollected)|(hourly)')
OR
LOWER(category) IN ('regular employees', 'miscellaneous earnings', 'full-time earnings', 'on call pay', 'hourly and adjunct faculty', 'personnel', 'regular pay', '340 other contracted professional services', 'other pay', '320 professional - educational services', 'temporary employees', 'uniform', 'adjuncts', '999 revenue', '8131 local', 'taxable reimbursements', 'special programs', 'payroll', 'payroll system incentive, service and retirement awards', 'administrative', 'additional pay', 'bereavement pay', '850 contingency (for budgeting purposes only)', '340 other professional services'))
THEN 1
ELSE 0
END AS is_salary,
amount,
title,
employee_name,
govt_lvl
FROM
(SELECT
entity_id,
entity_name,
fiscal_year,
org1,
CASE
WHEN col_check = 'c3' THEN cat3
WHEN col_check = 'c2' THEN cat2
WHEN col_check = 'c1' THEN cat1
WHEN col_check = 'error' THEN 'no category'
END AS category,
amount,
title,
vendor_name AS employee_name,
govt_lvl
FROM
(SELECT
elookup.entity_id AS entity_id,
entity_name,
fiscal_year,
org1,
cat1,
cat2,
cat3,
CASE
WHEN cat3 != '' THEN 'c3'
WHEN cat2 != '' THEN 'c2'
WHEN cat1 != '' THEN 'c1'
ELSE 'error'
END AS col_check,
amount,
title,
vendor_name,
elookup.govt_lvl
FROM `ut-sao-transparency-prod.transaction.transaction` tran
LEFT JOIN `ut-sao-transparency-prod.admin.entity_lookup` AS elookup ON tran.entity_name = elookup.sf_name
WHERE title IS NOT NULL
AND TRIM(LOWER(vendor_name)) NOT IN ('not provided', 'not applicable', 'redacted')
AND type = 'PY')))
GROUP BY entity_id,
entity_name,
fiscal_year,
org1,
title,
employee_name,
govt_lvl)
)
)
WHERE emp_rank < 101;
How can I replicate the data in the vendor payment search?
We make an aggregated table from the transaction table, which partitions by vendor names begining initial.
CREATE OR REPLACE TABLE `your_project_name.your_dataset.vendor_search` (
fiscal_year INT64,
entity_name STRING,
vendor_name STRING,
vendor_beg_name_id INT64,
net_amount FLOAT64
)
PARTITION BY
RANGE_BUCKET(vendor_beg_name_id, GENERATE_ARRAY(0, 26, 1))
AS
SELECT fiscal_year,
entity_name,
vendor_name,
CASE
WHEN STARTS_WITH(LOWER(vendor_name), "a") THEN 1
WHEN STARTS_WITH(LOWER(vendor_name), "b") THEN 2
WHEN STARTS_WITH(LOWER(vendor_name), "c") THEN 3
WHEN STARTS_WITH(LOWER(vendor_name), "d") THEN 4
WHEN STARTS_WITH(LOWER(vendor_name), "e") THEN 5
WHEN STARTS_WITH(LOWER(vendor_name), "f") THEN 6
WHEN STARTS_WITH(LOWER(vendor_name), "g") THEN 7
WHEN STARTS_WITH(LOWER(vendor_name), "h") THEN 8
WHEN STARTS_WITH(LOWER(vendor_name), "i") THEN 9
WHEN STARTS_WITH(LOWER(vendor_name), "j") THEN 10
WHEN STARTS_WITH(LOWER(vendor_name), "k") THEN 11
WHEN STARTS_WITH(LOWER(vendor_name), "l") THEN 12
WHEN STARTS_WITH(LOWER(vendor_name), "m") THEN 13
WHEN STARTS_WITH(LOWER(vendor_name), "n") THEN 14
WHEN STARTS_WITH(LOWER(vendor_name), "o") THEN 15
WHEN STARTS_WITH(LOWER(vendor_name), "p") THEN 16
WHEN STARTS_WITH(LOWER(vendor_name), "q") THEN 17
WHEN STARTS_WITH(LOWER(vendor_name), "r") THEN 18
WHEN STARTS_WITH(LOWER(vendor_name), "s") THEN 19
WHEN STARTS_WITH(LOWER(vendor_name), "t") THEN 20
WHEN STARTS_WITH(LOWER(vendor_name), "u") THEN 21
WHEN STARTS_WITH(LOWER(vendor_name), "v") THEN 22
WHEN STARTS_WITH(LOWER(vendor_name), "w") THEN 23
WHEN STARTS_WITH(LOWER(vendor_name), "x") THEN 24
WHEN STARTS_WITH(LOWER(vendor_name), "y") THEN 25
WHEN STARTS_WITH(LOWER(vendor_name), "z") THEN 26
ELSE 0
END AS vendor_beg_name_id,
SUM(amount) AS net_amount
FROM `ut-sao-transparency-prod.transaction.transaction`
WHERE type="EX"
GROUP BY
fiscal_year,
entity_name,
vendor_name,
vendor_beg_name_id;
An example of a search against that table using the vendor name "amazon"
--getVendorSearch
SELECT fiscal_year,
entity_name,
vendor_name,
net_amount,
DENSE_RANK() OVER ( ORDER BY entity_name,vendor_name ) AS ids
FROM (
SELECT fiscal_year,
entity_name,
vendor_name,
ROUND(net_amount,2) AS net_amount
FROM `ut-sao-transparency-prod.tu_public.vendor_search`
WHERE lower(vendor_name) LIKE '%amazon%'
)
ORDER BY fiscal_year;
How do I replicate the Highest Paid Vendors and Vendor Payments?
If you wish to make data similar to the highest paid vendors page, please use the code below. We remove data without a vendor name from the aggregation. We also rank vendors and filter to the top 100.
CREATE OR REPLACE TABLE `your_project_name.your_dataset.highest_paid_ven` (
entity_id INT64,
fiscal_year INT64,
entity_name STRING,
-- org1 STRING,
vendor_name STRING,
ven_rank INT64,
net_amount FLOAT64
)
PARTITION BY RANGE_BUCKET(entity_id, GENERATE_ARRAY(1, 4000, 1))
AS
SELECT entity_id,
fiscal_year,
entity_name,
-- org1,
vendor_name,
ven_rank,
net_amount
FROM
(
SELECT entity_id,
fiscal_year,
entity_name,
-- org1,
vendor_name,
net_amount,
ROW_NUMBER() OVER (PARTITION BY entity_id, fiscal_year ORDER BY net_amount DESC) AS ven_rank
FROM
(
SELECT elookup.entity_id AS entity_id,
fiscal_year,
entity_name,
title,
-- org1,
vendor_name,
SUM(amount) AS net_amount
FROM `ut-sao-transparency-prod.transaction.transaction` AS tran
LEFT JOIN `ut-sao-transparency-prod.admin.entity_lookup`AS elookup ON tran.entity_name = elookup.sf_name
WHERE
(
type = "EX" AND
entity_name != "State of Utah" AND
LOWER(vendor_name) NOT IN ("not provided", 'not applicable', 'redacted', 'not specified')
)
OR
(
type="EX" AND
entity_name = "State of Utah" AND
org1 != "Education" AND
LOWER(vendor_name) NOT IN ("not provided", 'not applicable', 'redacted', 'not specified')
)
GROUP BY entity_id,
fiscal_year,
entity_name,
title,
-- org1,
vendor_name
)
)
WHERE ven_rank < 101;
If you want to replicate the highest payments, please use this code:
CREATE OR REPLACE TABLE `your_project_name.your_dataset.highest_payments` (
entity_id INT64,
fiscal_year INT64,
entity_name STRING,
-- org1 STRING,
vendor_name STRING,
ven_rank INT64,
amount FLOAT64
)
PARTITION BY RANGE_BUCKET(entity_id, GENERATE_ARRAY(1, 4000, 1))
AS
SELECT entity_id,
fiscal_year,
entity_name,
-- org1,
vendor_name,
ven_rank,
amount
FROM
(
SELECT entity_id,
fiscal_year,
entity_name,
-- org1,
vendor_name,
amount,
ROW_NUMBER() OVER (PARTITION BY entity_id, fiscal_year ORDER BY amount DESC) AS ven_rank
FROM
(
SELECT elookup.entity_id AS entity_id,
fiscal_year,
entity_name,
title,
-- org1,
vendor_name,
amount
FROM `ut-sao-transparency-prod.transaction.transaction` AS tran
LEFT JOIN `ut-sao-transparency-prod.admin.entity_lookup`AS elookup ON tran.entity_name = elookup.sf_name
WHERE
(
type = "EX" AND
entity_name != "State of Utah" AND
LOWER(vendor_name) NOT IN ("not provided", 'not applicable', 'redacted', 'not specified')
)
OR
(
type="EX" AND
entity_name = "State of Utah" AND
org1 != "Education" AND
LOWER(vendor_name) NOT IN ("not provided", 'not applicable', 'redacted', 'not specified')
)
)
)
WHERE ven_rank < 101;
How do I recreate the data from the Job Title Search?
CREATE OR REPLACE TABLE `your_project_name.project_dataset.job_title_base`(
entity_id INT64,
entity_name STRING,
fiscal_year INT64,
wages FLOAT64,
benefits FLOAT64,
total FLOAT64,
title STRING,
employee_name STRING,
title_beg_id INT64,
govt_lvl STRING
)
PARTITION BY
RANGE_BUCKET(title_beg_id, GENERATE_ARRAY(1, 4000, 1))
AS
SELECT
entity_id,
entity_name,
fiscal_year,
wages,
benefits,
coalesce(wages,0) + coalesce(benefits,0) AS total,
INITCAP(title),
INITCAP(employee_name),
title_beg_id,
govt_lvl
FROM
(SELECT
entity_id,
entity_name,
fiscal_year,
SUM(
CASE
WHEN is_salary = 1 THEN amount
ELSE 0
END
) AS wages,
SUM(
CASE
WHEN is_salary = 0 THEN amount
ELSE 0
END
) AS benefits,
title,
employee_name,
title_beg_id,
govt_lvl
FROM
(SELECT
entity_id,
entity_name,
fiscal_year,
category,
CASE
WHEN
(REGEXP_CONTAINS(LOWER(category),
r'(salary)|(salaries)|(wage)|(compensation)|(excess time)|(overtime)|(bonus)|(incentive)|(leave)|(allowance)|(earning)|(stipend)|(regular pay)|(per diem)|(holiday)|(vacation)|(sick)|(comp time)|(pto)|(capital outlay)|(personal time)|(uncollected)|(hourly)')
OR
LOWER(category) IN ('regular employees', 'miscellaneous earnings', 'full-time earnings', 'on call pay', 'hourly and adjunct faculty', 'personnel', 'regular pay', '340 other contracted professional services', 'other pay', '320 professional - educational services', 'temporary employees', 'uniform', 'adjuncts', '999 revenue', '8131 local', 'taxable reimbursements', 'special programs', 'payroll', 'payroll system incentive, service and retirement awards', 'administrative', 'additional pay', 'bereavement pay', '850 contingency (for budgeting purposes only)', '340 other professional services'))
THEN 1
ELSE 0
END AS is_salary,
amount,
title,
employee_name,
title_beg_id,
govt_lvl
FROM
(SELECT
entity_id,
entity_name,
fiscal_year,
CASE
WHEN col_check = 'c3' THEN cat3
WHEN col_check = 'c2' THEN cat2
WHEN col_check = 'c1' THEN cat1
WHEN col_check = 'error' THEN 'no category'
END AS category,
amount,
title,
vendor_name AS employee_name,
title_beg_id,
govt_lvl
FROM
(SELECT
elookup.entity_id AS entity_id,
entity_name,
fiscal_year,
cat1,
cat2,
cat3,
CASE
WHEN cat3 != '' THEN 'c3'
WHEN cat2 != '' THEN 'c2'
WHEN cat1 != '' THEN 'c1'
ELSE 'error'
END AS col_check,
amount, title,
vendor_name,
CASE
WHEN STARTS_WITH(LOWER(title), "a") THEN 1
WHEN STARTS_WITH(LOWER(title), "b") THEN 2
WHEN STARTS_WITH(LOWER(title), "c") THEN 3
WHEN STARTS_WITH(LOWER(title), "d") THEN 4
WHEN STARTS_WITH(LOWER(title), "e") THEN 5
WHEN STARTS_WITH(LOWER(title), "f") THEN 6
WHEN STARTS_WITH(LOWER(title), "g") THEN 7
WHEN STARTS_WITH(LOWER(title), "h") THEN 8
WHEN STARTS_WITH(LOWER(title), "i") THEN 9
WHEN STARTS_WITH(LOWER(title), "j") THEN 10
WHEN STARTS_WITH(LOWER(title), "k") THEN 11
WHEN STARTS_WITH(LOWER(title), "l") THEN 12
WHEN STARTS_WITH(LOWER(title), "m") THEN 13
WHEN STARTS_WITH(LOWER(title), "n") THEN 14
WHEN STARTS_WITH(LOWER(title), "o") THEN 15
WHEN STARTS_WITH(LOWER(title), "p") THEN 16
WHEN STARTS_WITH(LOWER(title), "q") THEN 17
WHEN STARTS_WITH(LOWER(title), "r") THEN 18
WHEN STARTS_WITH(LOWER(title), "s") THEN 19
WHEN STARTS_WITH(LOWER(title), "t") THEN 20
WHEN STARTS_WITH(LOWER(title), "u") THEN 21
WHEN STARTS_WITH(LOWER(title), "v") THEN 22
WHEN STARTS_WITH(LOWER(title), "w") THEN 23
WHEN STARTS_WITH(LOWER(title), "x") THEN 24
WHEN STARTS_WITH(LOWER(title), "y") THEN 25
WHEN STARTS_WITH(LOWER(title), "z") THEN 26
ELSE 0
END AS title_beg_id,
elookup.govt_lvl
FROM `ut-sao-transparency-prod.transaction.transaction` tran
LEFT JOIN `ut-sao-transparency-prod.admin.entity_lookup` AS elookup ON tran.entity_name = elookup.sf_name
WHERE title IS NOT NULL
AND TRIM(LOWER(vendor_name)) NOT IN ('not provided', 'not applicable', 'redacted')
AND type = 'PY')))
GROUP BY entity_id,
entity_name,
fiscal_year,
title,
employee_name,
title_beg_id,
govt_lvl)
WHERE wages >= 0 AND benefits >=0;
An example to query that table for jobs with a title including 'data' and 'analyst':
--jobTitleWildcard
SELECT entity_name,
fiscal_year,
wages,
benefits,
total,
title,
employee_name,
govt_lvl
FROM `ut-sao-transparency-prod.tu_public.job_title_base`
WHERE lower(title) LIKE '%data%' AND lower(title) LIKE '%analyst%'
ORDER BY total;
How do I recreate the data used in the entity overview page?
That table is a combination of three data sets. The first is a high level surplus/deficit calculation. The next table is one with revenue aggregation. The final table is an expense aggregation.
This code creates the surplus deficit data
CREATE TABLE `your_project_name.your_dataset.entity_surpdef` (
entity_id INT64,
fiscal_year INT64,
entity_name STRING,
net_revenue FLOAT64,
net_expense FLOAT64,
surdef FLOAT64
)
PARTITION BY RANGE_BUCKET(entity_id, GENERATE_ARRAY(1, 4000, 1))
AS
SELECT exp.entity_id,
exp.fiscal_year,
exp.entity_name,
net_revenue,
net_expense,
ROUND((net_revenue-net_expense),2) AS surdef
FROM
(
SELECT elookup.entity_id AS entity_id,
fiscal_year,
entity_name,
ROUND(SUM(amount),2) AS net_expense
FROM `ut-sao-transparency-prod.transaction.transaction` AS tran
LEFT JOIN `ut-sao-transparency-prod.admin.entity_lookup`AS elookup ON tran.entity_name = elookup.sf_name
WHERE type="EX"
GROUP BY entity_id, entity_name, fiscal_year
) AS exp
LEFT JOIN
(
SELECT elookup.entity_id AS entity_id,
fiscal_year,
entity_name,
ROUND(SUM(amount),2) AS net_revenue
FROM `ut-sao-transparency-prod.transaction.transaction` AS tran
LEFT JOIN `ut-sao-transparency-prod.admin.entity_lookup`AS elookup ON tran.entity_name = elookup.sf_name
WHERE type="RV"
GROUP BY entity_id, entity_name, fiscal_year
) AS rev ON
exp.entity_id = rev.entity_id AND
exp.fiscal_year = rev.fiscal_year AND
exp.entity_name = rev.entity_name ;
The following code creates the revenue details information
CREATE OR REPLACE TABLE `your_project_name.your_dataset.entity_revenue_details` (
entity_id INT64,
entity_name STRING,
fiscal_year INT64,
agg_type STRING,
agg1 STRING,
-- agg2 STRING,
-- agg3 STRING,
-- agg4 STRING,
net_amount FLOAT64
)
PARTITION BY RANGE_BUCKET(entity_id, GENERATE_ARRAY(1, 4000, 1))
AS
SELECT entity_id,
entity_name,
fiscal_year,
agg_type,
CASE WHEN agg1 IS NOT NULL THEN agg1
ELSE "NOT SPECIFIED"
END AS agg1,
net_amount
FROM
(
SELECT elookup.entity_id AS entity_id,
entity_name,
fiscal_year,
'fund' AS agg_type,
INITCAP(fund1) AS agg1,
-- fund2 AS agg2,
-- fund3 AS agg3,
-- fund4 AS agg4,
ROUND(SUM(amount),2) AS net_amount
FROM `ut-sao-transparency-prod.transaction.transaction` AS tran
LEFT JOIN `ut-sao-transparency-prod.admin.entity_lookup`AS elookup ON tran.entity_name = elookup.sf_name
WHERE type="RV"
GROUP BY entity_id,
entity_name,
fiscal_year,
agg_type,
agg1
-- agg2,
-- agg3,
-- agg4
)
UNION ALL
(
SELECT elookup.entity_id AS entity_id,
entity_name,
fiscal_year,
'organization' AS agg_type,
INITCAP(org1) AS agg1,
-- org2 AS agg2,
-- org3 AS agg3,
-- org4 AS agg4,
ROUND(SUM(amount),2) AS net_amount
FROM `ut-sao-transparency-prod.transaction.transaction` AS tran
LEFT JOIN `ut-sao-transparency-prod.admin.entity_lookup`AS elookup ON tran.entity_name = elookup.sf_name
WHERE type="RV"
GROUP BY entity_id,
entity_name,
fiscal_year,
agg_type,
agg1
-- agg2,
-- agg3,
-- agg4
)
UNION ALL
(
SELECT elookup.entity_id AS entity_id,
entity_name,
fiscal_year,
'category' AS agg_type,
INITCAP(cat1) AS agg1,
-- cat2 AS agg2,
-- cat3 AS agg3,
-- cat4 AS agg4,
ROUND(SUM(amount),2) AS net_amount
FROM `ut-sao-transparency-prod.transaction.transaction` AS tran
LEFT JOIN `ut-sao-transparency-prod.admin.entity_lookup`AS elookup ON tran.entity_name = elookup.sf_name
WHERE type="RV"
GROUP BY entity_id,
entity_name,
fiscal_year,
agg_type,
agg1
-- agg2,
-- agg3,
-- agg4
)
UNION ALL
(
SELECT elookup.entity_id AS entity_id,
entity_name,
fiscal_year,
'program' AS agg_type,
INITCAP(program1) AS agg1,
-- program2 AS agg2,
-- program3 AS agg3,
-- program4 AS agg4,
ROUND(SUM(amount),2) AS net_amount
FROM `ut-sao-transparency-prod.transaction.transaction` AS tran
LEFT JOIN `ut-sao-transparency-prod.admin.entity_lookup`AS elookup ON tran.entity_name = elookup.sf_name
WHERE type="RV"
GROUP BY entity_id,
entity_name,
fiscal_year,
agg_type,
agg1
-- agg2,
-- agg3,
-- agg4
)
UNION ALL
(
SELECT elookup.entity_id AS entity_id,
entity_name,
fiscal_year,
'function' AS agg_type,
INITCAP(function1) AS agg1,
-- function2 AS agg2,
-- function3 AS agg3,
-- function4 AS agg4,
ROUND(SUM(amount),2) AS net_amount
FROM `ut-sao-transparency-prod.transaction.transaction` AS tran
LEFT JOIN `ut-sao-transparency-prod.admin.entity_lookup`AS elookup ON tran.entity_name = elookup.sf_name
WHERE type="RV"
GROUP BY entity_id,
entity_name,
fiscal_year,
agg_type,
agg1--,
-- agg2,
-- agg3,
-- agg4
) ;
UPDATE `ut-sao-transparency-prod.tu_public.entity_revenue_details`
SET agg1 = "NOT SPECIFIED"
WHERE agg1 IS NULL;
The following code creates the expense aggregate information:
CREATE TABLE `your_project_name.your_dataset.entity_expense_details` (
entity_id INT64,
entity_name STRING,
fiscal_year INT64,
agg_type STRING,
agg1 STRING,
-- agg2 STRING,
-- agg3 STRING,
-- agg4 STRING,
net_amount FLOAT64
)
PARTITION BY RANGE_BUCKET(entity_id, GENERATE_ARRAY(1, 4000, 1))
AS
SELECT entity_id,
entity_name,
fiscal_year,
agg_type,
CASE
WHEN agg1 IS NOT NULL THEN agg1
WHEN agg1 IS NULL THEN "NOT SPECIFIED"
ELSE "NOT SPECIFIED"
END AS agg1,
net_amount
FROM
(
SELECT elookup.entity_id AS entity_id,
entity_name,
fiscal_year,
'fund' AS agg_type,
INITCAP(fund1) AS agg1,
-- fund2 AS agg2,
-- fund3 AS agg3,
-- fund4 AS agg4,
ROUND(SUM(amount),2) AS net_amount
FROM `ut-sao-transparency-prod.transaction.transaction` AS tran
LEFT JOIN `ut-sao-transparency-prod.admin.entity_lookup`AS elookup ON tran.entity_name = elookup.sf_name
WHERE type="EX"
GROUP BY entity_id,
entity_name,
fiscal_year,
agg_type,
agg1
-- agg2,
-- agg3,
-- agg4
)
UNION ALL
(
SELECT elookup.entity_id AS entity_id,
entity_name,
fiscal_year,
'organization' AS agg_type,
INITCAP(org1) AS agg1,
-- org2 AS agg2,
-- org3 AS agg3,
-- org4 AS agg4,
ROUND(SUM(amount),2) AS net_amount
FROM `ut-sao-transparency-prod.transaction.transaction` AS tran
LEFT JOIN `ut-sao-transparency-prod.admin.entity_lookup`AS elookup ON tran.entity_name = elookup.sf_name
WHERE type="EX"
GROUP BY entity_id,
entity_name,
fiscal_year,
agg_type,
agg1
-- agg2,
-- agg3,
-- agg4
)
UNION ALL
(
SELECT elookup.entity_id AS entity_id,
entity_name,
fiscal_year,
'category' AS agg_type,
INITCAP(cat1) AS agg1,
-- cat2 AS agg2,
-- cat3 AS agg3,
-- cat4 AS agg4,
ROUND(SUM(amount),2) AS net_amount
FROM `ut-sao-transparency-prod.transaction.transaction` AS tran
LEFT JOIN `ut-sao-transparency-prod.admin.entity_lookup`AS elookup ON tran.entity_name = elookup.sf_name
WHERE type="EX"
GROUP BY entity_id,
entity_name,
fiscal_year,
agg_type,
agg1
-- agg2,
-- agg3,
-- agg4
)
UNION ALL
(
SELECT elookup.entity_id AS entity_id,
entity_name,
fiscal_year,
'program' AS agg_type,
INITCAP(program1) AS agg1,
-- program2 AS agg2,
-- program3 AS agg3,
-- program4 AS agg4,
ROUND(SUM(amount),2) AS net_amount
FROM `ut-sao-transparency-prod.transaction.transaction` AS tran
LEFT JOIN `ut-sao-transparency-prod.admin.entity_lookup`AS elookup ON tran.entity_name = elookup.sf_name
WHERE type="EX"
GROUP BY entity_id,
entity_name,
fiscal_year,
agg_type,
agg1
-- agg2,
-- agg3,
-- agg4
)
UNION ALL
(
SELECT elookup.entity_id AS entity_id,
entity_name,
fiscal_year,
'function' AS agg_type,
INITCAP(function1) AS agg1,
-- function2 AS agg2,
-- function3 AS agg3,
-- function4 AS agg4,
ROUND(SUM(amount),2) AS net_amount
FROM `ut-sao-transparency-prod.transaction.transaction` AS tran
LEFT JOIN `ut-sao-transparency-prod.admin.entity_lookup`AS elookup ON tran.entity_name = elookup.sf_name
WHERE type="EX"
GROUP BY entity_id,
entity_name,
fiscal_year,
agg_type,
agg1--,
-- agg2,
-- agg3,
-- agg4
) ;
UPDATE `ut-sao-transparency-prod.tu_public.entity_expense_details`
SET agg1 = "NOT SPECIFIED"
WHERE agg1 IS NULL;