This describes the analysis of the legacy-data-import which took way too long, which turned out to be a problem in the RBAC-access-rights-check as well as `EntityManager.persist` creating too many SQL queries.
During the legacy data import for hosting assets we noticed massive performance problems. The import of about 2200 hosting-assets (IP-numbers, managed-webspaces, managed- and cloud-servers) as well as the creation of booking-items and booking-projects as well as necessary office-data entities (persons, contacts, partners, debitors, relations) **took 25 minutes**.
Importing hosting assets up to UnixUsers and EmailAddresses even **took about 100 minutes**.
(The office data import sometimes, but rarely, took only 10min.
We could not find a pattern, why that was the case. The impression that it had to do with too many other parallel processes, e.g. browser with BBB or IntelliJ IDEA was proved wrong, but stopping all unnecessary processes and performing the import again.)
LEFT JOIN hs_office_person_rv a1_0 ON a1_0.uuid=hore1_0.anchoruuid
LEFT JOIN hs_office_contact_rv c1_0 ON c1_0.uuid=hore1_0.contactuuid
LEFT JOIN hs_office_person_rv h1_0 ON h1_0.uuid=hore1_0.holderuuid
WHERE hore1_0.uuid=$1
```
That query on the `hs_office_relation_rv`-table joins the three references anchor-person, holder-person and contact.
### Total-Query-Time > Total-Import-Runtime
That both queries total up to more than the runtime of the import-process is most likely due to internal parallel query processing.
## Attempts to Mitigate the Problem
### VACUUM ANALYZE
In the middle of the import, we updated the PostgreSQL statistics to recalibrate the query optimizer:
```SQL
VACUUM ANALYZE;
```
This did not improve the performance.
### Improving Joins + Indexes
We were suspicious about the sequential scan over all `rbacpermission` rows which was done by PostgreSQL to execute a HashJoin strategy. Turning off that strategy by
```SQL
ALTER FUNCTION queryAccessibleObjectUuidsOfSubjectIds SET enable_hashjoin = off;
```
did not improve the performance though. The HashJoin was actually still applied, but no full table scan anymore:
-> CTE Scan on grants (cost=0.00..11.84 rows=592 width=16)
[...]
```
The HashJoin strategy could be great if the hash-map could be kept for multiple invocations. But during an import process, of course, there are always new rows in the underlying table and the hash-map would be outdated immediately.
Also creating indexes which should suppor the RBAC query, like the following, did not improve performance:
```SQL
create index on RbacPermission (objectUuid, op);
create index on RbacPermission (opTableName, op);
```
### LAZY loading for Relation.anchorPerson/.holderPerson/
At this point, the import took 21mins with these statistics:
| query | calls | total_m | mean_ms |
|-------|-------|---------|---------|
| select hore1_0.uuid,a1_0.uuid,a1_0.familyname,a1_0.givenname,a1_0.persontype,a1_0.salutation,a1_0.title,a1_0.tradename,a1_0.version,c1_0.uuid,c1_0.caption,c1_0.emailaddresses,c1_0.phonenumbers,c1_0.postaladdress, c1_0.version,h1_0.uuid,h1_0.familyname,h1_0.givenname,h1_0.persontype,h1_0.salutation,h1_0.title,h1_0.tradename,h1_0.version,hore1_0.mark,hore1_0.type,hore1_0.version from public.hs_office_relation_rv hore1_0 left join public.hs_office_person_rv a1_0 on a1_0.uuid=hore1_0.anchoruuid left join public.hs_office_contact_rv c1_0 on c1_0.uuid=hore1_0.contactuuid left join public.hs_office_person_rv h1_0 on h1_0.uuid=hore1_0.holderuuid where hore1_0.uuid=$1 | 517 | 11 | 1282 |
| select hope1_0.uuid,hope1_0.familyname,hope1_0.givenname,hope1_0.persontype,hope1_0.salutation,hope1_0.title,hope1_0.tradename,hope1_0.version from public.hs_office_person_rv hope1_0 where hope1_0.uuid=$1 | 973 | 4 | 254 |
| select hoce1_0.uuid,hoce1_0.caption,hoce1_0.emailaddresses,hoce1_0.phonenumbers,hoce1_0.postaladdress,hoce1_0.version from public.hs_office_contact_rv hoce1_0 where hoce1_0.uuid=$1 | 973 | 4 | 253 |
| with recursive grants as ( select descendantUuid, ascendantUuid from RbacGrants where descendantUuid = grantedId union all select ""grant"".descendantUuid, ""grant"".ascendantUuid from RbacGrants ""grant"" inner join grants recur on recur.ascendantUuid = ""grant"".descendantUuid ) select exists ( select $3 from grants where ascendantUuid = any(granteeIds) ) or grantedId = any(granteeIds) | 47540 | 0 | 0 |
| insert into RbacGrants (grantedByTriggerOf, ascendantuuid, descendantUuid, assumed) values (currentTriggerObjectUuid(), superRoleId, subRoleId, doAssume) on conflict do nothing" | 40472 | 0 | 0 |
The slowest query now was fetching Relations joined with Contact, Anchor-Person and Holder-Person, for all tables using the restricted (RBAC) views (_rv).
We changed these mappings from `EAGER` (default) to `LAZY` to `@ManyToOne(fetch = FetchType.LAZY)` and got this result:
| with recursive grants as ( select descendantUuid, ascendantUuid from RbacGrants where descendantUuid = grantedId union all select ""grant"".descendantUuid, ""grant"".ascendantUuid from RbacGrants ""grant"" inner join grants recur on recur.ascendantUuid = ""grant"".descendantUuid ) select exists ( select $3 from grants where ascendantUuid = any(granteeIds) ) or grantedId = any(granteeIds) | 47538 | 0 | 0 |
insert into RbacGrants (grantedByTriggerOf, ascendantuuid, descendantUuid, assumed) values (currentTriggerObjectUuid(), superRoleId, subRoleId, doAssume) on conflict do nothing | 40472 | 0 | 0 |
Now, finally, the total runtime of the import was down to 12 minutes. This is repeatable, where originally, the import took about 25mins in most cases and just rarely - and for unknown reasons - 10min.
But once UnixUser and EmailAlias assets got added to the import, the total time went up to about 110min.
This was not acceptable, especially not, considering that domains, email-addresses and database-assets are almost 10 times that number and thus the import would go up to over 1100min which is 20 hours.
In a first step, a `HsHostingAssetRawEntity` was created, mapped to the raw table (hs_hosting_asset) not to the RBAC-view (hs_hosting_asset_rv). Unfortunately we did not keep measurements, but that was only part of the problem anyway.
The main problem was, that there is something strange with persisting (`EntityManager.persist`) for EmailAlias assets. Where importing UnixUsers was mostly slow due to RBAC SELECT-permission checks, persisting EmailAliases suddenly created about a million (in numbers 1.000.000) SQL UPDATE statements after the INSERT, all with the same data, just increased version number (used for optimistic locking). We were not able to figure out why this happened.
Keep in mind, it's the same table with the same RBAC-triggers, just a different value in the type column.
Once `EntityManager.persist` was replaced by an explicit SQL INSERT - just for `HsHostingAssetRawEntity`, the total time was down to 17min. Thus importing the UnixUsers and EmailAliases took just 5min, which is an acceptable result. The total import of all HostingAssets is now estimated to about 1 hour (on my developer laptop).
That the `INSERT into hs_hosting_asset` (No. 1) takes up the most time, seems to be normal, and 21ms for each call is also fine.
It seems that the trigger effects (eg. No. 3 and No. 4) are included in the measure for the causing INSERT, otherwise summing up the totals would exceed the actual total time of the whole import. And it was to be expected that building the RBAC rules for new business objects takes most of the time.
In production, the `SELECT ... FROM hs_office_relation_rv` (No. 2) with about 0.5 seconds could still be a problem. But once we apply the improvements from the hosting asset area also to the office area, this should not be a problem for the import anymore.
## Further Options To Explore
1. Instead of separate SQL INSERT statements, we could try bulk INSERT.
2. We could use the SQL INSERT method for all entity-classes, or at least for all which have high row counts.
3. For the production code, we could use raw-entities for referenced entities, here usually RBAC SELECT permission is given anyway.
Avoiding EAGER-loading where not necessary, reduced the total runtime of the import to about the half.
The major improvement came from using direct INSERT statements, which then also bypassed the RBAC SELECT permission checks.
### What Still Has To Be Done?
Where this performance analysis was mostly helping the performance of the legacy data import, we still need measures and improvements for the productive code.
For sure, using more LAZY-loading also helps in the production code. For some more ideas see section _Further Options To Explore_.