This article discusses Ory's pioneering effort to build Ory Keto, an open source implementation of Google's Zanzibar authorization system.
Authorization is the process of determining who is allowed to do what in an application. It remains a hard problem, especially in critical global systems where human and machine actors hold different sets of permissions. In such a system, billions of permission checks have to be processed quickly, reliably, and consistently across many different environments.
The Zanzibar research paper describes how Google solved authorization globally and across all of its products:
"Zanzibar provides a uniform data model and configuration language for expressing a wide range of access control policies from hundreds of client services at Google, including Calendar, Cloud, Drive, Maps, Photos, and YouTube. Its authorization decisions respect causal ordering of user actions and thus provide external consistency amid changes to access control lists and object contents. Zanzibar scales to trillions of access control lists and millions of authorization requests per second to support services used by billions of people. It has maintained 95th-percentile latency of less than 10 milliseconds and availability of greater than 99.999% over 3 years of production use."
Problem Statement
Google needed a solution that would enable them to continue to grow at a massive scale while still responding within milliseconds, independent of location around the globe. For this to work, they had a number of hard requirements. The solution had to be:
- flexible;
- low latency;
- large scale;
- always available and
- consistent.
A highly qualified team of researchers came up with a new type of access system,
later codenamed Zanzibar (originally it was called “Spice”).
“Spice” from the space saga “Dune” makes it possible to safely travel to other
planets and grants superhuman powers.
For cloud based services a globally spanning low latency access control system
is nothing short of a superpower.
At Ory, we knew we had to unlock this superpower free and open for everyone!
Prior to 2019, Ory Keto was based on Open Policy Agent (OPA), and that approach
had reached its limits in both complexity and scalability.
Many developers within our own community were also looking for a solution,
prompting us to study alternatives. There is also work in progress on some other
proprietary implementations of the Zanzibar paper, most notably by
authzed and
auth0. We believe this shows that there is
great demand for an open source solution that is community vetted and
transparent.
Building upon the knowledge that was available publicly, the Ory team devised a new generation of Ory Keto, with input and hard work from our awesome open source community.
Ory uses Keto as part of the Ory Platform, a comprehensive security system designed to meet the needs of sophisticated application developers seeking to provide cloud identity and access management in low latency infrastructure at scale.
Flexible
“When strangers meet, great allowances should be made for differences in custom and training.” - Frank Herbert, Dune
Ory Keto is designed to serve hundreds or thousands of different applications and services (its clients). Each client may have different access control patterns, so it is crucial that it works for all possible patterns.
Small applications with few users, resources and integrations have vastly different requirements than massive multi-national cloud services with millions of users. All kinds of services must be able to manage access control policies at any level of detail and in line with their internal data structures. Ory Keto needs to be able to understand and work with all of these policies.
A simple, but powerful data model provides the basis for this, combined with effective configuration capabilities.
This data model makes it possible to define arbitrary relationships such as "owner", "editor", and "pays for" between digital objects and subjects (like users).
In Ory Keto, Access Control Lists (ACLs) contain information of the form "is _subject_ allowed to do _relation_ on _object_". Here, _object_, _relation_ and _subject_ are all variables that can be defined as required. Together they form a "relation tuple".
In the simplest case, a relation tuple reads "user X has the relation Y to object Z". But ACLs can become much more complex, for example "set of users X has the relation Y to object Z", where this set of users is itself defined by another relation tuple. So ACLs can contain other ACLs. For example, the set of users that can view a document can be made up of the users that have been granted rights to modify that specific document, as well as those who have viewing permissions for the whole folder of documents.
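The nesting described above can be sketched in a few lines of Python. This is a minimal illustration of the data model only, not Ory Keto's actual implementation or API: the names `RelationTuple`, `SubjectSet` and `check`, as well as the object IDs, are hypothetical and chosen for this example.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class SubjectSet:
    """Hypothetical type: all subjects that have `relation` on `obj`."""
    obj: str
    relation: str


@dataclass(frozen=True)
class RelationTuple:
    """Hypothetical type: `subject` has `relation` to `obj`."""
    obj: str
    relation: str
    subject: Union[str, SubjectSet]  # a plain subject ID or a set of subjects


def check(tuples, obj, relation, subject, _seen=None):
    """Return True if `subject` has `relation` on `obj`, following subject sets."""
    seen = _seen if _seen is not None else set()
    if (obj, relation) in seen:  # guard against cyclic ACL definitions
        return False
    seen.add((obj, relation))
    for t in tuples:
        if t.obj != obj or t.relation != relation:
            continue
        if t.subject == subject:  # direct match: "user X has relation Y to Z"
            return True
        if isinstance(t.subject, SubjectSet) and check(
            tuples, t.subject.obj, t.subject.relation, subject, seen
        ):  # indirect match through another ACL
            return True
    return False


acl = [
    RelationTuple("folder:reports", "view", "alice"),
    RelationTuple("doc:q3", "edit", "bob"),
    # Viewers of doc:q3 are its editors plus the viewers of the whole folder.
    RelationTuple("doc:q3", "view", SubjectSet("doc:q3", "edit")),
    RelationTuple("doc:q3", "view", SubjectSet("folder:reports", "view")),
]

print(check(acl, "doc:q3", "view", "alice"))  # True (via the folder)
print(check(acl, "doc:q3", "view", "bob"))    # True (via edit rights)
print(check(acl, "doc:q3", "view", "carol"))  # False
```

Storing the indirection as data, rather than copying every folder viewer onto every document, is what keeps such a model flexible: changing the folder's viewers changes the answer for all documents that reference it.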
Low latency and high availability
“Delay is as dangerous as the wrong answer.” - Frank Herbert, Dune
Access control systems need to be secure, consistent and flexible enough to meet
the myriad of user demands.
They also have to serve high numbers of users around the planet, while ideally
being unnoticed and available at all times.
Authorization checks are usually in the path of the most critical parts of our
interaction with digital systems.
When a service wants to do something that could have dire consequences, such as
deleting a table, authorization checks should ensure that the initiator is
really allowed to perform that action.
It is important not to confuse this with authentication (is this person who they claim to be?). That question can be answered by Ory Kratos, which is a fully fledged modern identity management system.
In sophisticated cloud systems this can mean that thousands of authorization checks have to be made to serve a basic search response. Even if each of those checks takes only tenths of a second, round trips and waiting times can add up to minutes. This is bothersome when searching for something, but not acceptable when running critical infrastructure.
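A quick back-of-the-envelope calculation makes the point concrete. The figures below are illustrative assumptions, not measurements: 3,000 checks for one response, evaluated strictly one after another, at 100 milliseconds versus 10 milliseconds per check. The function name `total_wait_ms` is made up for this example.

```python
def total_wait_ms(num_checks: int, latency_ms_per_check: int) -> int:
    """Total waiting time if the checks run strictly one after another."""
    return num_checks * latency_ms_per_check


# Assumed workload: 3,000 authorization checks for a single search response.
slow = total_wait_ms(3000, 100)  # a tenth of a second per check
fast = total_wait_ms(3000, 10)   # 10 milliseconds per check

print(f"{slow / 1000:.0f} s")  # 300 s, i.e. five minutes
print(f"{fast / 1000:.0f} s")  # 30 s
```

Real systems batch and parallelize checks, so the absolute numbers will differ, but per-check latency still multiplies through every request that depends on it.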
Worse, if the authorization system is offline, all requests have to be denied - completely blocking even basic interactions. Ory Keto was designed to be able to meet these demands spanning across clouds and environments.
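The "deny when offline" behavior is usually called failing closed. The sketch below shows the idea from the point of view of a calling service. The wrapper `is_allowed` and the two stand-in backends are hypothetical and do not correspond to any Ory Keto client API.

```python
from typing import Callable

# Hypothetical signature of a permission check: (object, relation, subject) -> bool
CheckFn = Callable[[str, str, str], bool]


def is_allowed(check_backend: CheckFn, obj: str, relation: str, subject: str) -> bool:
    """Fail closed: any failure to get an answer is treated as a deny."""
    try:
        return check_backend(obj, relation, subject) is True
    except Exception:
        # The authorization system is unreachable or broken, so deny the request.
        return False


def offline_backend(obj: str, relation: str, subject: str) -> bool:
    """Stand-in for an authorization service that is down."""
    raise ConnectionError("authorization service unreachable")


def healthy_backend(obj: str, relation: str, subject: str) -> bool:
    """Stand-in for a working service with a single granted permission."""
    return (obj, relation, subject) == ("table:orders", "delete", "admin")


print(is_allowed(healthy_backend, "table:orders", "delete", "admin"))  # True
print(is_allowed(offline_backend, "table:orders", "delete", "admin"))  # False
```

Even the administrator is locked out while the backend is down, which is why availability is a hard requirement rather than a nice-to-have.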
Consistency
“A place is only a place.” - Frank Herbert, Dune
One concern is strong consistency. In distributed systems this means you always get the same response if you send the same request to different parts of your system. This is opposed to "eventual consistency", which, put simply, means that you will get the same response only once all updates have propagated.
In an ideal world, every access control system would be strongly consistent.
The problem is that this clashes with other requirements, especially low
latency and high availability.
Because of the strict data replication it requires, a strongly consistent system
cannot achieve low latency worldwide. Also, your access control policies are
probably not in the same physical place as the data you want to protect.
The solution is to use version snapshot tokens ("snaptokens" for short). They allow
us to provide a strongly consistent response from an eventually consistent
system.
The basic idea is that all digital objects are versioned. Keto only answers
"allow" if access to this specific version is granted. After an ACL update is
issued (granting or revoking permissions), there is a short period until the
update propagates. During that period, the snaptokens ensure that the system can
only wrongly answer "allow" if the permission on the requested version was
previously granted.
This requires the application to store the snaptoken next to every object
version and provide it on every access check request. In essence, Keto
guarantees that an entity can only access objects they currently have access to
and the exact versions of the objects they used to have access to.
For example, a common scenario is that a user had access to a previous object
version, but access to the object was then revoked. During the propagation time,
the system can still answer "allow" for the old version as defined by the
snaptoken. But when newer object versions are requested, access will always be
denied.
Similarly, the system might answer "deny" for older object versions if access
was granted later on, but it will always answer "allow" for object versions newer
than the last ACL update.
This means that the system is not only consistent within itself - same requests lead to the same response in different parts of your system - but also in sync with data object changes.
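The two scenarios above can be modeled with a small sketch. It is a conceptual illustration only: snaptokens are not implemented in Ory Keto, and everything here (the `AclChange` log, integer logical timestamps as snaptokens, the `replica_time` parameter, and the function names) is an assumption made for this example. The sketch assumes a check is evaluated at a snapshot that is at least as fresh as the snaptoken stored with the object version.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AclChange:
    timestamp: int  # logical commit time of the ACL update
    granted: bool   # True = grant, False = revoke
    obj: str
    relation: str
    subject: str


def allowed_at(log, snapshot, obj, relation, subject):
    """Evaluate the ACL as of `snapshot` (all changes with timestamp <= snapshot)."""
    state = False
    for change in sorted(log, key=lambda c: c.timestamp):
        if change.timestamp > snapshot:
            break
        if (change.obj, change.relation, change.subject) == (obj, relation, subject):
            state = change.granted
    return state


def check(log, replica_time, snaptoken, obj, relation, subject):
    """Answer from a replica that has seen updates up to `replica_time`.

    The snapshot must be at least as fresh as the snaptoken, so a lagging
    replica has to catch up first. That is modeled here by evaluating the
    full log at max(replica_time, snaptoken).
    """
    snapshot = max(replica_time, snaptoken)
    return allowed_at(log, snapshot, obj, relation, subject)


# Scenario 1: access granted at t=1, revoked at t=5.
revoked = [
    AclChange(1, True, "doc", "view", "alice"),
    AclChange(5, False, "doc", "view", "alice"),
]
# The old object version was stored with snaptoken 2, the new one with snaptoken 6.
print(check(revoked, 2, 2, "doc", "view", "alice"))  # True: lagging replica, old version
print(check(revoked, 2, 6, "doc", "view", "alice"))  # False: new version is always denied
print(check(revoked, 7, 2, "doc", "view", "alice"))  # False: an up-to-date replica denies

# Scenario 2: access granted only at t=5.
granted_later = [AclChange(5, True, "doc", "view", "alice")]
print(check(granted_later, 2, 2, "doc", "view", "alice"))  # False: old version, lagging replica
print(check(granted_later, 2, 6, "doc", "view", "alice"))  # True: version newer than the grant
```

The stale "allow" in the first scenario only ever concerns a version the user was already permitted to see, which is the guarantee described above.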
One important point is that Keto can be operated locally as well, which means there are no database sync delays.
Currently snaptokens are not implemented in Ory Keto, but all the concepts have been worked out and integrated into the design already. Actual consistency guarantees will be added soon.
Conclusion
“The spice must flow.” - Frank Herbert, Dune
So far the Ory Keto effort implements:
- Access Control Language
- Data model
- Example configurations
- Read / Write / Check / Expand API
- Namespaces
In the near future:
- Consistency guarantees using snaptokens
- Globe spanning cluster operation mode
- Interoperability with other Ory products including Hydra and Kratos
Also check our Implemented and Planned Features documentation for more information.
If you are interested in seeing what this software package can do, you might
want to check out the
quickstart example and the
full feature example.
Also read more about the
concepts implemented in
Ory Keto.
Ory, a multi-cloud SaaS platform, aims to advance the state of zero trust
security by making it easier and safer for developers to implement a number of
boilerplate services such as ID management, credentials, service authorisation,
authentication, rule based reverse proxy and access control in a low latency
access system.
Ory has years of open source software experience with a global community of
contributors, leading zero trust security products such as
Ory Hydra, and tens of thousands of satisfied users
worldwide.