How to Optimize the Generation of Sequence Identifiers via the hi/lo Algorithm

4 min readFeb 18, 2025

How to Optimize the Generation of Sequence Identifiers via the hi/lo Algorithm

In MySQL and Hibernate 5, the GenerationType.AUTO generator type will result in using the TABLE generator. This adds a significant performance penalty. The TABLE generator type doesn’t scale well and is much slower than the IDENTITY and SEQUENCE (not supported in MySQL) generators types, even with a single database connection.

As a rule of thumb, always avoid the TABLE generator.

By default, the SEQUENCE generator must hit the database for each new sequence value via a SELECT statement. Assuming the following Author entity:

@Entity
public class Author implements Serializable {
   ...
   @Id
   @GeneratedValue(strategy = GenerationType.SEQUENCE) 
   private Long id;
...
}

Each persisted Author requires an identifier (the current sequence value) fetched via a database round trip materialized in the following SELECT:

SELECT nextval('hibernate_sequence')

relying on cached sequences or database sequence pre-allocation doesn’t help. for cached sequences, the application still requires a database round trip for every new sequence value. On the other hand, the database sequence pre-allocation still has a significant database round trip score.

This can be optimized via the Hibernate-specific hi/lo algorithm (especially with a high number of inserts). This algorithm is part of the Hibernate built-in optimizers capable of computing identifiers values in-memory. Therefore, using hi/lo reduces the number of database round trips and, as a consequence, increases application performance.

This algorithm splits the sequence domains into synchronously hi groups. The hi value can be provided by the database sequence (or the table generator), and its initial value is configurable (initial_value). Basically, at a single database round trip, the
hi/lo algorithm fetches from the database a new hi value and uses it to generate a number of identifiers given by a configurable increment (increment_size) representing the number of lo entries. While lo is in this range, no database round trip that fetches a new hi is needed and the in-memory generated identifiers can be safely used. When all the lo values are used, a new hi value is fetched via a new database round trip.

In code, the hi/lo algorithm can be employed for the Author entity, as shown here:

@Entity
public class Author implements Serializable {

    private static final long serialVersionUID = 1L;

    @Id
    // This will disable insert batching - AVOID IT!
    // @GeneratedValue(strategy = GenerationType.IDENTITY)

    // This will work, but better use the below solution to reduce database roundtrips
    // @GeneratedValue(strategy = GenerationType.AUTO)
    
    // This will allow insert batching and optimizes the identifiers
    // generation via the hi/lo algorithm which generated in-memory identifiers   
    @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "hilo")
    @GenericGenerator(name = "hilo",
            strategy = "org.hibernate.id.enhanced.SequenceStyleGenerator",
            parameters = {
                @Parameter(name = "sequence_name", value = "hilo_sequence"),
                @Parameter(name = "initial_value", value = "1"),
                @Parameter(name = "increment_size", value = "10"),
                @Parameter(name = "optimizer", value = "hilo")
            })
    private Long id;

    private int age;
    private String name;
    private String genre;
}

The hi/lo algorithm requires several parameters:

sequence_name: This is the name of the database sequence (e.g., hilo_sequence); the database sequence is created via the following statement:

CREATE sequence hilo_sequence start 1 increment 1

initial_value: This is the first sequence value or the first hi (e.g., 1)
increment_size: This is the number of identifiers (number of lo entries) that will be computed in memory before fetching the next hi (e.g., 100)
optimizer: This is the name of the Hibernate built-in optimizers (in this case, hilo)

To generate identifiers in memory, the hi/lo algorithm uses the following formula to compute the valid range of values:

[increment_size x (hi - 1) + 1, increment_size x hi]

For example, conforming to these settings, the ranges of in-memory generated identifiers will be:

For hi=1, the range is [1, 100]
For hi=2, the range is [101, 200]
For hi=3, the range is [201, 300]
…

The lo value ranges from [0, increment_size) starting at (hi — 1) * increment_size) + 1.

Ned starts a transaction and fetches from the database a new hi and obtains the value 1.
Ned has two in-memory generated identifiers (1 and 2). He uses the identifier with a value of 1 to insert a row.
Jo starts her transaction and fetches from the database a new hi. She obtains the value 2.
Jo has two in-memory generated identifiers (3 and 4). She uses the identifier with value 3 to insert a row.
Jo triggers one more insert having the in-memory identifier with a value of 4.
Jo doesn’t have more in-memory generated identifiers; therefore, the program must fetch a new hi. This time, she gets from the database the value 3. Based on this hi, Jo can generate in-memory the identifiers with values 5 and 6.
Ned uses the in-memory generated identifier with value 2 to insert a new row.
Ned has no more in-memory generated identifiers; therefore, the program must fetch a new hi. This time, he gets from the database the value 4. Based on this hi, Ned can generate in-memory the identifiers with values 7 and 8.

This being said, a simple way to test the hi/lo algorithm consists of employing a quick batching process. Let’s insert in batches 1,000 Author instances (in the author table). The following service-method batches 1,000 inserts with a batch size of 30 via the saveAll() built-in method (while saveAll() is okay for examples, it’s not the proper choice for production

public void batch1000Authors() {
    List<Author> authors = new ArrayList<>();
    for (int i = 1; i <= 1000; i++) {
        Author author = new Author();
        author.setName("Author_" + i);
        authors.add(author);
    }
    authorRepository.saveAll(authors);
}

Thanks to the hi/lo algorithm, all 1,000 identifiers are generated using only 10 database round trips. The code fetches only 10 hi and, for each hi, it generates 100 identifiers in memory. This is way better than 1,000 database round trips. Each round trip for fetching a new hi looks as follows:

SELECT nextval('hilo_sequence')

IMP:

spring.datasource.url=jdbc:mysql://localhost:3306/bookstoredb?cachePrepStmts=true&useServerPrepStmts=true&rewriteBatchedStatements=true&createDatabaseIfNotExist=true

Written by Prateek

No responses yet