Since v1.10.0, Pegasus supports single row atomic operations.
Here, a single row means all the data under the same HashKey.
Principle
Pegasus uses a fixed hash sharding strategy to distribute data. All data under the same HashKey is always stored in the same partition, and therefore in the same Replica on a single node. Write operations on the same Replica are always executed serially on the server side. As a result, atomic semantics are easy to implement for operations on data under the same HashKey.
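The routing rule can be illustrated with a minimal sketch. It assumes zlib.crc32 as a stand-in for whatever hash function Pegasus actually uses, and an arbitrary example partition count of 8; the names partition_of and route are hypothetical. The point it shows is that the SortKey plays no part in choosing the partition.

```python
import zlib

# Assumption: zlib.crc32 stands in for the hash function Pegasus actually uses,
# and 8 is an arbitrary example partition count.
PARTITION_COUNT = 8


def partition_of(hash_key: bytes, partition_count: int = PARTITION_COUNT) -> int:
    """Fixed hash sharding: the partition depends only on the HashKey."""
    return zlib.crc32(hash_key) % partition_count


def route(hash_key: bytes, sort_key: bytes) -> int:
    # sort_key is intentionally ignored: every SortKey under one HashKey
    # maps to the same partition, and therefore to the same Replica.
    return partition_of(hash_key)
```

Because the partition count is fixed, the mapping never changes for a given HashKey, so every operation on that row always reaches the same serially executing Replica.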
Pure write operations such as multiSet and multiDel, which set or delete multiple SortKeys in a single operation, have easily understood atomic semantics: they either all succeed or all fail. These two operations are therefore single row atomic operations.
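The all-or-nothing behavior can be modeled in memory. This is a hypothetical sketch, not the Pegasus client or server API: AtomicRow stands for the data under one HashKey, the whole batch is validated before any change is applied, and a lock ensures that readers never observe a half-applied batch.

```python
import threading
from typing import Dict, Iterable


class AtomicRow:
    """Hypothetical in-memory model of the data under one HashKey."""

    def __init__(self) -> None:
        self._data: Dict[bytes, bytes] = {}  # SortKey -> value
        self._lock = threading.Lock()

    def multi_set(self, kvs: Dict[bytes, bytes]) -> None:
        # Validate the whole batch first, so one bad entry leaves the row untouched.
        for sort_key, value in kvs.items():
            if not isinstance(sort_key, bytes) or not isinstance(value, bytes):
                raise TypeError("SortKeys and values must be bytes")
        with self._lock:  # apply every entry as one unit
            self._data.update(kvs)

    def multi_del(self, sort_keys: Iterable[bytes]) -> None:
        with self._lock:
            for sort_key in sort_keys:
                self._data.pop(sort_key, None)  # deleting a missing key is a no-op

    def snapshot(self) -> Dict[bytes, bytes]:
        with self._lock:
            return dict(self._data)
```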
Here we focus on another type of operation: read first, then write, where the write depends on the result of the read.
Operations of this type are non-idempotent: if the same operation is repeated, the results (including the actual data updates and the results returned to the user through the API) may differ. Atomic increment and decrement operations, as well as CAS operations, belong to this category. Pegasus can ensure the atomicity and consistency of such operations because:
- The data of the same HashKey is always stored in the same Replica
- Write operations on the same Replica are always executed serially on the server side
- Each operation is guaranteed to be executed exactly once, even during data migration, failure recovery, etc.
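A minimal sketch, assuming a hypothetical SerialReplica class, can show both properties listed above. A single worker thread applies every write, so the read and the write of incr can never interleave with another write. It also shows non-idempotence: repeating set gives the same result, while repeating incr does not. The exactly-once guarantee across migration and recovery is not modeled here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Hashable


class SerialReplica:
    """Hypothetical model: one worker thread applies all writes in order."""

    def __init__(self) -> None:
        self._data: Dict[Hashable, int] = {}
        self._writer = ThreadPoolExecutor(max_workers=1)  # the serial write path

    def _do_incr(self, key: Hashable, delta: int) -> int:
        # Read and write run inside one queued task; no other write can slip between them.
        new_value = self._data.get(key, 0) + delta
        self._data[key] = new_value
        return new_value

    def _do_set(self, key: Hashable, value: int) -> int:
        self._data[key] = value
        return value

    def incr(self, key: Hashable, delta: int) -> int:
        return self._writer.submit(self._do_incr, key, delta).result()

    def set(self, key: Hashable, value: int) -> int:
        return self._writer.submit(self._do_set, key, value).result()

    def close(self) -> None:
        self._writer.shutdown()
```

Calling set("k", 5) twice leaves the same state and returns the same result both times, whereas calling incr("c", 1) twice returns 1 and then 2; that difference is exactly why such operations need special handling.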
Because they are non-idempotent, such operations may conflict with other Pegasus features, such as Duplication. Therefore, since v1.10.0, Pegasus provides an option that specifies whether the cluster allows non-idempotent operations. If it is set to false, all non-idempotent operations return ERR_OPERATION_DISABLED:
[replication]
allow_non_idempotent_write = false
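The effect of this option can be sketched as an admission check. This is a hypothetical model, not Pegasus server code: the option is parsed with Python's configparser, and the set of operation names treated as non-idempotent is an assumption based on the operations named in this document.

```python
import configparser

CONFIG_TEXT = """
[replication]
allow_non_idempotent_write = false
"""

ERR_OK = "ERR_OK"
ERR_OPERATION_DISABLED = "ERR_OPERATION_DISABLED"

# Assumption: the non-idempotent operations named in this document.
NON_IDEMPOTENT_OPS = {"incr", "check_and_set", "check_and_mutate"}


def load_allow_non_idempotent(text: str) -> bool:
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser.getboolean("replication", "allow_non_idempotent_write")


def admit(op: str, allow_non_idempotent: bool) -> str:
    # Idempotent writes such as multi_set are always admitted.
    if op in NON_IDEMPOTENT_OPS and not allow_non_idempotent:
        return ERR_OPERATION_DISABLED
    return ERR_OK
```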
Atomic increment and decrement operations
Although Pegasus values have no schema, Pegasus still provides atomic increment and decrement operations, similar to the Redis incr command. See the Pegasus interface incr.
Description
- Because the storage engine RocksDB can only store values as byte strings, incr() reads the value byte string and converts it to int64 (using a simple String-to-Int conversion). For example, the byte string "12345" is converted to the number 12345. After the incr() operation completes, the result is converted back into a byte string and stored as the new value.
- Converting a byte string to int64 may fail, for example when the string is not a valid number or overflows int64. In all such cases, a failure status is returned.
- If the original value does not exist, it is treated as 0 and the incr() operation proceeds normally.
- The operand increment can be positive or negative, so the incr() interface implements both atomic increment and atomic decrement.
- TTL: If the original value exists, the new value keeps the same TTL as the original value. If the original value does not exist, the new value is stored without a TTL.
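The rules above can be modeled as a pure function over an in-memory row. This is a hypothetical sketch, not the server implementation. Each stored value is a (bytes, expire timestamp or None) pair. "Simple String-to-Int" is assumed to mean an optional sign followed by decimal digits. Treating an overflowing result as an error is also an assumption. Edge cases such as an existing empty value are not covered.

```python
import re
from typing import Dict, Optional, Tuple

INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1
_INT_RE = re.compile(rb"[+-]?[0-9]+")

# SortKey -> (value bytes, expire timestamp, or None for no TTL)
Store = Dict[bytes, Tuple[bytes, Optional[int]]]


class IncrError(ValueError):
    """Stands in for the failure status the real interface returns."""


def parse_int64(raw: bytes) -> int:
    # Simple String-to-Int: optional sign followed by decimal digits only.
    if not _INT_RE.fullmatch(raw):
        raise IncrError(f"not a valid number: {raw!r}")
    value = int(raw.decode("ascii"))
    if not INT64_MIN <= value <= INT64_MAX:
        raise IncrError(f"overflows int64: {raw!r}")
    return value


def incr(store: Store, sort_key: bytes, increment: int) -> int:
    if sort_key in store:
        old_raw, expire_ts = store[sort_key]
        old = parse_int64(old_raw)
    else:
        old, expire_ts = 0, None  # missing value counts as 0; result has no TTL
    new = old + increment  # a negative increment performs a decrement
    if not INT64_MIN <= new <= INT64_MAX:
        raise IncrError("result overflows int64")  # assumption
    store[sort_key] = (str(new).encode("ascii"), expire_ts)  # TTL carried over
    return new
```

Starting from {b"counter": (b"12345", 1700000000)}, incr(store, b"counter", 1) returns 12346 and stores b"12346" with the same expiry timestamp.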
CAS operations
Another useful type of atomic operation is CAS (Compare-And-Swap). Many advanced distributed concurrency features, such as distributed locks, can be built on CAS operations.
Pegasus provides the check_and_set CAS operation, with the following semantics: whether the value of one SortKey is updated depends on whether the value of another SortKey under the same HashKey meets certain conditions.
The SortKey used to evaluate the condition is called CheckSortKey, and the SortKey whose value is set is called SetSortKey. Correspondingly, the value of CheckSortKey is called CheckValue, and the value to be set for SetSortKey is called SetValue.
See checkAndSet, as well as its extended versions checkAndMutate and compareExchange.
Description
- The value of SetSortKey is set only when CheckValue meets the specified condition.
- The condition is specified through CheckType, and some CheckType values also require an operand, CheckOperand. Currently supported:
  - Existence of CheckValue: whether it exists, whether it is an empty byte string, etc.
  - Byte string comparison: compare CheckValue and CheckOperand in byte order to check whether they satisfy <, <=, ==, >=, or >.
  - Number comparison: as with atomic increment and decrement operations, convert CheckValue to int64, then compare the converted value with CheckOperand to check whether they satisfy <, <=, ==, >=, or >.
- CheckSortKey and SetSortKey can be the same. In that case, the old value is checked against the condition first and, if it passes, the same SortKey is set to the new value.
- Enable the CheckAndSetOptions.returnCheckValue option if you want CheckValue to be returned.
- Use the CheckAndSetOptions.setValueTTLSeconds option if you want to specify a TTL for SetValue.
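The check-then-set semantics can be sketched against an in-memory row. This is a hypothetical model, not the client API. The check type strings are modeled on the CT_VALUE_BYTES_EQUAL naming pattern. CheckAndSetResult and its fields are illustrative. Treating a comparison against a missing CheckValue as a failed check is an assumption, and the TTL option is omitted. In Pegasus, the whole read-check-write sequence runs inside the Replica's serial write path, which this single function stands in for.

```python
import operator
import re
from dataclasses import dataclass
from typing import Dict, Optional

_CMP = {
    "LESS": operator.lt,
    "LESS_OR_EQUAL": operator.le,
    "EQUAL": operator.eq,
    "GREATER_OR_EQUAL": operator.ge,
    "GREATER": operator.gt,
}
_INT_RE = re.compile(rb"[+-]?[0-9]+")


@dataclass
class CheckAndSetResult:
    set_succeed: bool
    check_value_exist: Optional[bool] = None  # filled only if return_check_value
    check_value: Optional[bytes] = None


def _to_int64(raw: bytes) -> int:
    if not _INT_RE.fullmatch(raw):
        raise ValueError(f"not a valid int64: {raw!r}")
    value = int(raw.decode("ascii"))
    if not -(2**63) <= value < 2**63:
        raise ValueError(f"overflows int64: {raw!r}")
    return value


def check(check_type: str, value: Optional[bytes], operand: bytes = b"") -> bool:
    if check_type == "CT_VALUE_NOT_EXIST":
        return value is None
    if check_type == "CT_VALUE_NOT_EXIST_OR_EMPTY":
        return value is None or value == b""
    if check_type == "CT_VALUE_EXIST":
        return value is not None
    if check_type == "CT_VALUE_NOT_EMPTY":
        return value is not None and value != b""
    for prefix, convert in (("CT_VALUE_BYTES_", bytes), ("CT_VALUE_INT_", _to_int64)):
        if check_type.startswith(prefix):
            if value is None:
                return False  # assumption: comparisons need an existing CheckValue
            compare = _CMP[check_type[len(prefix):]]
            # bytes compare lexicographically by byte; ints compare numerically
            return compare(convert(value), convert(operand))
    raise ValueError(f"unsupported check type: {check_type}")


def check_and_set(row: Dict[bytes, bytes], check_sort_key: bytes, check_type: str,
                  check_operand: bytes, set_sort_key: bytes, set_value: bytes,
                  return_check_value: bool = False) -> CheckAndSetResult:
    check_value = row.get(check_sort_key)
    passed = check(check_type, check_value, check_operand)
    if passed:
        row[set_sort_key] = set_value  # may be the same SortKey as check_sort_key
    result = CheckAndSetResult(set_succeed=passed)
    if return_check_value:
        result.check_value_exist = check_value is not None
        result.check_value = check_value
    return result
```

Note how the two comparison families differ: b"9" is greater than b"10" in byte order, while 9 is less than 10 as int64.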
For ease of use, the Pegasus Java Client also provides the compare_exchange interface: when the value of a SortKey equals the user-specified ExpectedValue byte for byte, it is updated to the user-specified DesiredValue. Semantically, compare_exchange is a form of Compare-And-Swap. The interface can be found in compareExchange.
In fact, compare_exchange is a specialized form of check_and_set, in which:
- CheckSortKey and SetSortKey are the same
- CheckType is CT_VALUE_BYTES_EQUAL
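The relationship, and its use for a simple distributed lock, can be sketched in memory. This is a hypothetical model rather than the Java client. CasRow, check_and_set_bytes_equal, try_acquire, release, and the b"free" marker are all illustrative names, and CompareExchangeResult only mirrors the idea of reporting whether the swap happened plus the value actually found. Here compare_exchange passes the same SortKey as both CheckSortKey and SetSortKey, with ExpectedValue as CheckOperand and DesiredValue as SetValue.

```python
import threading
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CompareExchangeResult:
    set_succeed: bool
    actual_value: Optional[bytes]  # value seen at check time (None if absent)


class CasRow:
    """Hypothetical in-memory stand-in for the data under one HashKey."""

    def __init__(self) -> None:
        self._data: Dict[bytes, bytes] = {}
        self._mutex = threading.Lock()  # models the Replica's serial write path

    def put(self, sort_key: bytes, value: bytes) -> None:
        with self._mutex:
            self._data[sort_key] = value

    def check_and_set_bytes_equal(self, check_sort_key: bytes, check_operand: bytes,
                                  set_sort_key: bytes, set_value: bytes) -> CompareExchangeResult:
        # check_and_set restricted to CheckType CT_VALUE_BYTES_EQUAL.
        with self._mutex:
            current = self._data.get(check_sort_key)
            matched = current == check_operand
            if matched:
                self._data[set_sort_key] = set_value
            return CompareExchangeResult(matched, current)

    def compare_exchange(self, sort_key: bytes, expected: bytes,
                         desired: bytes) -> CompareExchangeResult:
        # Same SortKey for check and set: compare_exchange as a check_and_set.
        return self.check_and_set_bytes_equal(sort_key, expected, sort_key, desired)


FREE = b"free"  # hypothetical "unlocked" marker


def try_acquire(row: CasRow, lock_key: bytes, owner: bytes) -> bool:
    return row.compare_exchange(lock_key, FREE, owner).set_succeed


def release(row: CasRow, lock_key: bytes, owner: bytes) -> bool:
    # Only the holder's own value matches, so other clients cannot release the lock.
    return row.compare_exchange(lock_key, owner, FREE).set_succeed
```

This lock never expires, so a crashed holder would block everyone else. A practical design would also give the lock value a TTL so that it is eventually released.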