This document describes the analysis of the legacy data import, which took far too long. The cause turned out to be a problem in the RBAC access-rights check.
## Our Performance-Problem
During the legacy data import for hosting assets, we noticed massive performance problems. Importing about 2200 hosting-assets (IP-numbers, managed-webspaces, managed- and cloud-servers), together with creating the booking-items, the booking-projects and the necessary office-data entities (persons, contacts, partners, debitors, relations), **took 10-25 minutes**.
We could not find a pattern that explains why the import mostly took about 25 minutes, but sometimes took *just* 10 minutes. The impression that it was caused by too many other processes running in parallel, e.g. a browser with BBB or IntelliJ IDEA, was proved wrong by stopping all unnecessary processes and running the import again.
## Preparation
### Configuring PostgreSQL
The PostgreSQL extension `pg_stat_statements` can be used to measure how long queries take and how often they are called.
The module `auto_explain` can be used to automatically run `EXPLAIN` on long-running queries.
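As a rough sketch of how these two tools are typically used from a `psql` session: the column names `total_exec_time` and `mean_exec_time` assume PostgreSQL 13 or later (older versions call them `total_time` and `mean_time`), and the one-second threshold and the `LIMIT 5` are arbitrary example values, not settings taken from our setup.

```SQL
-- Make the statistics view available in the current database
-- (requires pg_stat_statements in shared_preload_libraries).
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Reset the counters right before starting the import.
SELECT pg_stat_statements_reset();

-- After the import: the statements with the highest accumulated runtime.
SELECT query,
       calls,
       round((total_exec_time / 60000)::numeric, 1) AS total_min,
       round(mean_exec_time::numeric)               AS mean_ms
  FROM pg_stat_statements
 ORDER BY total_exec_time DESC
 LIMIT 5;

-- Log the execution plan of every statement slower than one second
-- (session-level setting, needs superuser rights).
LOAD 'auto_explain';
SET auto_explain.log_min_duration = '1s';
SET auto_explain.log_analyze = true;
```

Resetting the counters first keeps the numbers limited to the import run, so `calls`, the total and the mean can be compared directly between attempts.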
To use this extension and module, we extended the PostgreSQL-Docker-image:
```SQL
[...]
LEFT JOIN hs_office_person_rv a1_0 ON a1_0.uuid=hore1_0.anchoruuid
LEFT JOIN hs_office_contact_rv c1_0 ON c1_0.uuid=hore1_0.contactuuid
LEFT JOIN hs_office_person_rv h1_0 ON h1_0.uuid=hore1_0.holderuuid
WHERE hore1_0.uuid=$1
```
This query against the `hs_office_relation_rv` view is presumably the one that determines the partner of a debitor, which requires two such queries per debitor (Debitor -> Debitor-Relation -> Debitor-Anchor == Partner-Holder -> Partner-Relation -> Partner).
### Total-Query-Time > Total-Import-Runtime
The total time of both queries adds up to more than the runtime of the import process. This is most likely due to internal parallel query processing.
## Attempts to Mitigate the Problem
### VACUUM ANALYZE
In the middle of the import, we updated the PostgreSQL statistics to recalibrate the query optimizer:
```SQL
VACUUM ANALYZE;
```
This did not improve the performance.
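Whether the statistics were actually refreshed can be checked in the `pg_stat_user_tables` view. This is only a sketch; it assumes that the table is stored under the lowercase name `rbacpermission`, which is how PostgreSQL folds unquoted identifiers.

```SQL
-- Shows when the table was last analyzed (manually or by autovacuum)
-- and how many live and dead rows the statistics collector has counted.
SELECT relname,
       n_live_tup,
       n_dead_tup,
       last_analyze,
       last_autoanalyze
  FROM pg_stat_user_tables
 WHERE relname = 'rbacpermission';
```

A recent `last_analyze` timestamp combined with unchanged query times supports the conclusion that outdated statistics were not the cause.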
### Improving Indexes
We created the following indexes on `RbacPermission`:
```SQL
create index on RbacPermission (objectUuid, op);
create index on RbacPermission (opTableName, op);
```
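Whether PostgreSQL actually uses indexes like these during the import can be checked in the `pg_stat_user_indexes` view. The query below is a sketch that assumes the table name is stored in lowercase as `rbacpermission`; the index names are generated by PostgreSQL because the `create index` statements do not specify any.

```SQL
-- idx_scan counts how often each index was used since the statistics
-- were last reset; a value of 0 after an import run means the planner
-- never chose that index.
SELECT indexrelname,
       idx_scan,
       idx_tup_read,
       idx_tup_fetch
  FROM pg_stat_user_indexes
 WHERE relname = 'rbacpermission'
 ORDER BY idx_scan DESC;
```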
### Disabling the HashJoin Strategy
We were suspicious about the sequential scan over all `rbacpermission` rows, which PostgreSQL performed in order to execute a HashJoin strategy. Turning off that strategy with
```SQL
ALTER FUNCTION queryAccessibleObjectUuidsOfSubjectIds SET enable_hashjoin = off;
```
did not improve the performance, though. The HashJoin was still applied, but without a full table scan:
```
-> CTE Scan on grants (cost=0.00..11.84 rows=592 width=16)
[...]
```
The HashJoin strategy could be great if the hash-map could be kept for multiple invocations. But during an import process, of course, there are always new rows in the underlying table and the hash-map would be outdated immediately.
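To confirm that a function-local setting such as `enable_hashjoin = off` is attached to the function, and to remove it again, something along these lines should work. This is a sketch: the lowercase function name in the `WHERE` clause assumes the function was created without quoted identifiers, and reverting the setting is a suggestion, not a step described above.

```SQL
-- proconfig lists the settings applied while the function runs;
-- it is NULL if no function-local settings exist.
SELECT proname, proconfig
  FROM pg_proc
 WHERE proname = 'queryaccessibleobjectuuidsofsubjectids';

-- Remove the override again so the planner is free to choose the join strategy.
ALTER FUNCTION queryAccessibleObjectUuidsOfSubjectIds RESET enable_hashjoin;
```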
### LAZY loading for Relation.anchorPerson/.holderPerson/
The slowest query was now the one fetching Relations joined with Contact, Anchor-Person and Holder-Person, using the restricted (RBAC) views (`_rv`) for all tables.
We changed these mappings from `EAGER` (the default) to `LAZY` by using `@ManyToOne(fetch = FetchType.LAZY)` and got this result:
| query | calls | total (min) | mean (ms) |
|-------|-------|-------------|-----------|
| select hore1_0.uuid,hore1_0.anchoruuid,hore1_0.contactuuid,hore1_0.holderuuid,hore1_0.mark,hore1_0.type,hore1_0.version from hs_office_relation_rv hore1_0 where hore1_0.uuid=$1 | 517 | 5 | 565 |
| select hope1_0.uuid,hope1_0.familyname,hope1_0.givenname,hope1_0.persontype,hope1_0.salutation,hope1_0.title,hope1_0.tradename,hope1_0.version from hs_office_person_rv hope1_0 where hope1_0.uuid=$1 | 1015 | 4 | 240 |
| select hoce1_0.uuid,hoce1_0.caption,hoce1_0.emailaddresses,hoce1_0.phonenumbers,hoce1_0.postaladdress,hoce1_0.version from hs_office_contact_rv hoce1_0 where hoce1_0.uuid=$1 | 497 | 2 | 235