Overview

Apache Pegasus is a distributed key-value storage system which is designed to be:

  • horizontally scalable: distributed using hash-based partitioning
  • strongly consistent: ensured by the PacificA consensus protocol
  • high-performance: using RocksDB as the underlying storage engine
  • simple: well-defined, easy-to-use APIs
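To illustrate hash-based partitioning, the sketch below maps a key to a partition by hashing it and taking the result modulo the partition count. It is a minimal sketch under stated assumptions: FNV-1a is used only as a stand-in hash function (the hash Pegasus actually uses is not described here), and `fnv1a64` and `partition_index` are hypothetical helpers, not part of the Pegasus API.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// FNV-1a 64-bit hash: a simple stand-in hash function, for illustration only.
inline std::uint64_t fnv1a64(const std::string& key) {
    std::uint64_t h = 14695981039346656037ULL;  // FNV offset basis
    for (unsigned char c : key) {
        h ^= c;                 // mix in one byte
        h *= 1099511628211ULL;  // multiply by the FNV prime (wraps mod 2^64)
    }
    return h;
}

// Hypothetical helper: map a key to one of `partition_count` partitions.
inline std::uint32_t partition_index(const std::string& key,
                                     std::uint32_t partition_count) {
    assert(partition_count > 0 && "a table needs at least one partition");
    return static_cast<std::uint32_t>(fnv1a64(key) % partition_count);
}
```

Because the same key always produces the same hash, any client or server can compute a key's partition independently, without a central lookup. In this sketch, keys spread across partitions only as evenly as the hash function distributes them.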

Background

Pegasus aims to fill the gap between Redis and HBase. The former is in-memory and low-latency, but does not provide a strong-consistency guarantee. Unlike the latter, Pegasus is written entirely in C++, and its write path relies only on the local filesystem.

Beyond performance, we also need a storage system that ensures multiple levels of data safety and supports fast data migration between data centers, automatic load balancing, and online partition splitting.

Features

  • Persistence of data: Each write is replicated to three different ReplicaServers before the client receives a response. Using the PacificA protocol, Pegasus supports strongly consistent replication and membership changes.

  • Automatic load balancing over ReplicaServers: Load balancing is a built-in function of the MetaServer, which manages the distribution of replicas. When the cluster becomes unbalanced, the administrator can run a simple rebalance command that automatically schedules replica migration.

  • Cold Backup: Pegasus supports an extensible backup and restore mechanism to ensure data safety. Snapshots can be stored in a distributed filesystem such as HDFS or in the local filesystem. Snapshots stored in the filesystem can also be used for analysis with pegasus-spark.

  • Eventually-consistent inter-datacenter replication: This is a feature we call duplication. It makes a change made in the local cluster accessible to a remote cluster after a short time. It helps your service achieve higher availability and better performance, since clients access only their local cluster.
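The replication rule described under "Persistence of data" can be sketched as a primary that acknowledges a write only after every replica in its group has accepted it. This is a simplified, in-memory illustration assuming one primary and two secondaries (three copies in total); `Replica`, `Primary`, and their members are hypothetical names, and networking, on-disk logs, and the MetaServer are all omitted.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory replica: stores prepared mutations by sequence number.
struct Replica {
    bool alive = true;
    std::map<std::int64_t, std::string> prepared;  // seq -> mutation

    // Returns true (an acknowledgement) if the replica accepted the mutation.
    bool prepare(std::int64_t seq, const std::string& mutation) {
        if (!alive) return false;
        prepared[seq] = mutation;  // a retry with the same seq overwrites it
        return true;
    }
};

// Hypothetical, simplified primary: a write commits, and is acknowledged to
// the client, only after the primary and every secondary have prepared it.
class Primary {
public:
    explicit Primary(std::size_t secondary_count) : secondaries_(secondary_count) {}

    // Returns true if the write was committed.
    bool write(const std::string& mutation) {
        // The next seq is only consumed on success, so a failed attempt is
        // retried under the same number and committed entries stay contiguous.
        const std::int64_t seq = committed_ + 1;
        bool all_acked = self_.prepare(seq, mutation);
        for (Replica& s : secondaries_) {
            // Send to every secondary, even after one has failed to ack.
            all_acked = s.prepare(seq, mutation) && all_acked;
        }
        if (all_acked) committed_ = seq;
        return all_acked;
    }

    std::int64_t committed() const { return committed_; }
    Replica& secondary(std::size_t i) { return secondaries_.at(i); }

private:
    Replica self_;
    std::vector<Replica> secondaries_;
    std::int64_t committed_ = 0;
};
```

When a secondary is down, a write in this sketch simply cannot commit. In PacificA, the primary would instead request a membership change to remove the failed replica from the group and then continue; that step is left out here.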

Presentations

(This list is incomplete. If you have given a Pegasus-related talk, please feel free to submit a PR.)

  • 2023, Chengdu China, COSCon 2023, How does Apache Pegasus used in SensorsData, Guohao Li (Intro, Slides)
  • 2023, Beijing China, DataFunSummit 2023, The Implementation and Future Planning of Apache Pegasus Application, Yuchen He
  • 2022, Beijing China, DataFunSummit 2022, The Design, Implementation, and Open Source Way of Pegasus, Yuchen He (Intro)
  • 2022, Beijing China, Pegasus meetup, How does the Apache Pegasus used in Advertising Data Stream in SensorsData, Jiaoming Shi (Slides, video)
  • 2022, Beijing China, Pegasus meetup, How to continuously improve Apache Pegasus in complex toB scenarios, Hao Wang (Slides, video)
  • 2022, Beijing China, Pegasus meetup, The Construction and Practice of Apache Pegasus in Offline and Online Scenarios Integration, Wei Wang (Slides, video)
  • 2022, Beijing China, Pegasus meetup, How does Apache Pegasus used in Xiaomi’s Universal Recommendation Algorithm Framework, Wei Liang (Slides, video)
  • 2022, Beijing China, Pegasus meetup, The Introduction of the Apache Pegasus 2.4.0 release, Shuo Jia (Slides, video)
  • 2022, Online, ApacheCon Asia 2022, How does Apache Pegasus (incubating) community develop at SensorsData, Dan Wang, Yingchun Lai (Slides, video)
  • 2021, Beijing China, System Software Tech Day, Apache Pegasus: A high performance, strong consistent distributed key-value storage system, Yuchen He (Intro, video)
  • 2021, Beijing China, Pegasus meetup, The Design, Implementation and Open Source Way of Apache Pegasus, Yuchen He (Slides, video)
  • 2021, Beijing China, Pegasus meetup, Apache Pegasus’s Practice in Data Access Business of Xiaomi, Fateng Xiao (Slides, video)
  • 2021, Beijing China, Pegasus meetup, The Advertising Algorithm Architecture in Xiaomi and How does Pegasus Practice in Feature Caching, Gang Hao (Slides, video)
  • 2021, Beijing China, Pegasus meetup, How do we manage more than one thousand of Pegasus clusters - engine part, Guohao Li (Slides, video)
  • 2021, Beijing China, Pegasus meetup, How do we manage more than one thousand of Pegasus clusters - backend part, Dan Wang (Slides, video)
  • 2021, Online, ApacheCon Asia 2021, Apache Pegasus (incubating): A distributed key-value storage system, Yuchen He, Shuo Jia (Slides, video)
  • 2020, Beijing China, MIDC 2020, Pegasus: Make an open source Key-Value storage system, Tao Wu (Intro)
  • 2018, Beijing China, MIDC 2018, Pegasus: A distributed Key-Value storage system, Zuoyan Qin
  • 2018, Beijing China, Pegasus In Depth, Zuoyan Qin (Slides)
  • 2018, Beijing China, Pegasus KV Storage, Let the Users focus on their work, Zuoyan Qin (Slides)
  • 2017, Shenzhen China, ArchSummit, Behind Pegasus, What matters in a Distributed System, Weijie Sun (Intro, Slides)
  • 2016, Beijing China, ArchSummit, Pegasus: Designing a Distributed Key Value System, Zuoyan Qin (Intro, Slides)