This describes the analysis of the legacy-data-import which took way too long, which turned out to be a problem in the RBAC-access-rights-check.
## Our Performance-Problem
During the legacy data import for hosting assets we noticed massive performance problems. The import of about 2200 hosting-assets (IP-numbers, managed-webspaces, managed- and cloud-servers) as well as the creation of booking-items and booking-projects as well as necessary office-data entities (persons, contacts, partners, debitors, relations) **took 10-25 minutes**.
We could not find a pattern, why the import mostly took about 25 minutes, but sometimes took *just* 10 minutes. The impression that it had to do with too many other parallel processes, e.g. browser with BBB or IntelliJ IDEA was proved wrong, but stopping all unnecessary processes and performing the import again.
## Preparation
### Configuring PostgreSQL
The pg_stat_statements PostgreSQL-Extension can be used to measure how long queries take and how often they are called.
The module auto_explain can be used to automatically run EXPLAIN on long-running queries.
To use this extension and module, we extended the PostgreSQL-Docker-image:
LEFT JOIN hs_office_person_rv a1_0 ON a1_0.uuid=hore1_0.anchoruuid
LEFT JOIN hs_office_contact_rv c1_0 ON c1_0.uuid=hore1_0.contactuuid
LEFT JOIN hs_office_person_rv h1_0 ON h1_0.uuid=hore1_0.holderuuid
WHERE hore1_0.uuid=$1
```
That query on the `hs_office_relation_rv`-table joins the three references anchor-person, holder-person and contact.
### Total-Query-Time > Total-Import-Runtime
That both queries total up to more than the runtime of the import-process is most likely due to internal parallel query processing.
## Attempts to Mitigate the Problem
### VACUUM ANALYZE
In the middle of the import, we updated the PostgreSQL statistics to recalibrate the query optimizer:
```SQL
VACUUM ANALYZE;
```
This did not improve the performance.
### Improving Joins + Indexes
We were suspicious about the sequential scan over all `rbacpermission` rows which was done by PostgreSQL to execute a HashJoin strategy. Turning off that strategy by
```SQL
ALTER FUNCTION queryAccessibleObjectUuidsOfSubjectIds SET enable_hashjoin = off;
```
did not improve the performance though. The HashJoin was actually still applied, but no full table scan anymore:
-> CTE Scan on grants (cost=0.00..11.84 rows=592 width=16)
[...]
```
The HashJoin strategy could be great if the hash-map could be kept for multiple invocations. But during an import process, of course, there are always new rows in the underlying table and the hash-map would be outdated immediately.
Also creating indexes which should suppor the RBAC query, like the following, did not improve performance:
```SQL
create index on RbacPermission (objectUuid, op);
create index on RbacPermission (opTableName, op);
```
### LAZY loading for Relation.anchorPerson/.holderPerson/
At this point, the import took 21mins with these statistics:
| query | calls | total_m | mean_ms |
|-------|-------|---------|---------|
| select hore1_0.uuid,a1_0.uuid,a1_0.familyname,a1_0.givenname,a1_0.persontype,a1_0.salutation,a1_0.title,a1_0.tradename,a1_0.version,c1_0.uuid,c1_0.caption,c1_0.emailaddresses,c1_0.phonenumbers,c1_0.postaladdress, c1_0.version,h1_0.uuid,h1_0.familyname,h1_0.givenname,h1_0.persontype,h1_0.salutation,h1_0.title,h1_0.tradename,h1_0.version,hore1_0.mark,hore1_0.type,hore1_0.version from public.hs_office_relation_rv hore1_0 left join public.hs_office_person_rv a1_0 on a1_0.uuid=hore1_0.anchoruuid left join public.hs_office_contact_rv c1_0 on c1_0.uuid=hore1_0.contactuuid left join public.hs_office_person_rv h1_0 on h1_0.uuid=hore1_0.holderuuid where hore1_0.uuid=$1 | 517 | 11 | 1282 |
| select hope1_0.uuid,hope1_0.familyname,hope1_0.givenname,hope1_0.persontype,hope1_0.salutation,hope1_0.title,hope1_0.tradename,hope1_0.version from public.hs_office_person_rv hope1_0 where hope1_0.uuid=$1 | 973 | 4 | 254 |
| select hoce1_0.uuid,hoce1_0.caption,hoce1_0.emailaddresses,hoce1_0.phonenumbers,hoce1_0.postaladdress,hoce1_0.version from public.hs_office_contact_rv hoce1_0 where hoce1_0.uuid=$1 | 973 | 4 | 253 |
| with recursive grants as ( select descendantUuid, ascendantUuid from RbacGrants where descendantUuid = grantedId union all select ""grant"".descendantUuid, ""grant"".ascendantUuid from RbacGrants ""grant"" inner join grants recur on recur.ascendantUuid = ""grant"".descendantUuid ) select exists ( select $3 from grants where ascendantUuid = any(granteeIds) ) or grantedId = any(granteeIds) | 47540 | 0 | 0 |
| insert into RbacGrants (grantedByTriggerOf, ascendantuuid, descendantUuid, assumed) values (currentTriggerObjectUuid(), superRoleId, subRoleId, doAssume) on conflict do nothing" | 40472 | 0 | 0 |
The slowest query now was fetching Relations joined with Contact, Anchor-Person and Holder-Person, for all tables using the restricted (RBAC) views (_rv).
We changed these mappings from `EAGER` (default) to `LAZY` to `@ManyToOne(fetch = FetchType.LAZY)` and got this result:
| with recursive grants as ( select descendantUuid, ascendantUuid from RbacGrants where descendantUuid = grantedId union all select ""grant"".descendantUuid, ""grant"".ascendantUuid from RbacGrants ""grant"" inner join grants recur on recur.ascendantUuid = ""grant"".descendantUuid ) select exists ( select $3 from grants where ascendantUuid = any(granteeIds) ) or grantedId = any(granteeIds) | 47538 | 0 | 0 |
insert into RbacGrants (grantedByTriggerOf, ascendantuuid, descendantUuid, assumed) values (currentTriggerObjectUuid(), superRoleId, subRoleId, doAssume) on conflict do nothing | 40472 | 0 | 0 |
Now, finally, the total runtime of the import was down to 12 minutes. This is repeatable, where originally, the import took about 25mins in most cases and just rarely - and for unknown reasons - 10min.
## Summary
That the import runtime is down to about 12min is repeatable, where originally, the import took about 25mins in most cases and just rarely - and for unknown reasons - just 10min.
Merging the recursive CTE query to determine the RBAC SELECT-permission, made it more clear which business-queries take the time.
Avoiding EAGER-loading where not neccessary, reduced the total runtime of the import to about the half.