Ellen Friedman
Ellen is principal technologist at HPE Ezmeral, focused on large-scale analytics and artificial intelligence. She is also a committer for Apache Drill and Apache Mahout. With a Ph.D. in biochemistry, Ellen is an international speaker and co-author of multiple books.

You leave home for two weeks, so you carefully lock the door to keep your home secure. You briefly hesitate. Have you locked all the windows? Have you left a way for the plumbing and carpentry teams to get access when they come to remodel your bathroom? During your trip, your son announces that he’ll be dropping by for a couple days — now you need to grant him access to your home.

Now imagine your house has millions of windows to lock in thousands of rooms and many different groups are allowed access to only certain rooms. That’s how it feels to manage security for a large-scale data analytics and AI/ML system.

You can manage who has access and who is denied access to data in a variety of ways. But does your approach to data access management stay practical as systems scale and complexity increases? Most importantly, does it let you adapt easily to new situations? If not, data protection and control over data access may become inadequate because they are too cumbersome and inconsistent.


Expressivity Makes Data Security Easier

One helpful approach uses a high degree of expressivity. To explain why this makes a difference, let’s compare data access management using the relatively simple but less expressive access control lists (ACLs) with a more expressive approach known as access control expressions (ACEs).

With file system ACLs, a list of users and groups with associated permissions is typically attached to each object in your system, such as a file or directory. These permissions control which operations the users, specified directly or by membership in a group, can perform on data objects. That’s fairly straightforward. The ACL approach has some limitations, however, especially at large scale and complexity or when you need to make changes.

In addition, with ACLs, you have the power to give permission for access and a particular operation, yet you don’t have the power to explicitly deny permission. Simply omitting a user from the ACL isn’t sufficient to exclude that user, because the user might be a member of a group that is listed in the ACL.
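To make the omission problem concrete, here is a minimal Python sketch of an ACL check. The user names, the group names and the `acl_allows` helper are all hypothetical and only illustrate the idea; this is not how any particular file system implements ACLs.

```python
# Hypothetical data: user and group names are illustrative only.
GROUPS = {
    "finance": {"alice", "bob"},
    "interns": {"bob"},
}

# An ACL for one file: each entry grants a set of operations to a user or group.
salaries_acl = {
    "user:alice": {"read", "write"},
    "group:finance": {"read"},
}


def acl_allows(acl, user, operation):
    """Return True if any ACL entry that matches the user grants the operation."""
    for principal, operations in acl.items():
        kind, name = principal.split(":", 1)
        matches = (kind == "user" and name == user) or (
            kind == "group" and user in GROUPS.get(name, set())
        )
        if matches and operation in operations:
            return True
    return False


# bob is never listed by name, yet he can still read through the finance group.
print(acl_allows(salaries_acl, "bob", "read"))   # True
print(acl_allows(salaries_acl, "bob", "write"))  # False
```

Because an ACL entry can only grant, there is nothing you could add to `salaries_acl` in this sketch that would keep `bob` out while leaving the rest of the finance group in.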

Consider how this might play out in your data management approach. Imagine you grant access and read/write operations for particular data to people in finance and to a fraud analyst team as shown in Table 1.

Group          | Salaries     | Transactions | Training data
               | Read | Write | Read | Write | Read | Write
Finance        |      |       |      |       |      |
Fraud control  |      |       |      |       |      |

Table 1: Different groups have different data access needs that may be expressed as ACLs, but exclusion is hard to express explicitly with ACLs.

Now suppose you bring in some interns as members of the different groups. You don’t want them to read any salaries, and you don’t want them to be able to write to any files, except for the interns in the data science group. It’s not easy to make that change without duplicating most of each group’s permissions for that group’s interns, which would violate the DRY (don’t repeat yourself) principle. This is just a toy situation, though. Imagine how difficult ongoing changes and updates are in a complex system with millions or billions of files. In simple terms, ACLs are somewhat lacking in expressive power.


In contrast, an alternative way to control access is to use access control expressions. With ACEs, you can express the pattern of each column in the table above as a Boolean expression of users and groups. In fact, ACLs are a special case of ACEs because any ACL can be written as an expression using just “or” (Finance OR Fraud). ACEs also allow us to express exclusion. The read permission on transactions might be written, for instance, as (Finance OR Fraud) AND NOT Interns. The power comes from the ability to use the “and” and “not” operations in these expressions.
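As a rough illustration, an ACE can be modeled as a small Boolean function over the set of groups a user belongs to. The combinator names below (`member_of`, `any_of`, `all_of`, `negate`) are hypothetical helpers invented for this sketch and do not correspond to any product's ACE syntax.

```python
def member_of(group):
    """ACE atom: true when the user belongs to the named group."""
    return lambda user_groups: group in user_groups


def any_of(*exprs):
    """Logical "or" over sub-expressions."""
    return lambda user_groups: any(e(user_groups) for e in exprs)


def all_of(*exprs):
    """Logical "and" over sub-expressions."""
    return lambda user_groups: all(e(user_groups) for e in exprs)


def negate(expr):
    """Logical "not" of a sub-expression."""
    return lambda user_groups: not expr(user_groups)


# An ACL is the special case that uses only "or": Finance OR Fraud
acl_style = any_of(member_of("Finance"), member_of("Fraud"))

# The article's example: (Finance OR Fraud) AND NOT Interns
read_transactions = all_of(acl_style, negate(member_of("Interns")))

print(acl_style({"Fraud", "Interns"}))          # True: "or" alone cannot exclude
print(read_transactions({"Fraud", "Interns"}))  # False: the intern is excluded
print(read_transactions({"Finance"}))           # True
```

The intern in the fraud group passes the "or"-only expression but is rejected once the exclusion is added, which is exactly the case an ACL cannot express.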

Where this extra expressive power of ACEs comes in particularly handy is when we want to allow a security team to specify high-level access constraints over an entire section of a file system. We also want to allow operational teams to specify their own permissions on specific files and directories. With ACEs, these two parts can be combined with an “and” operation. In doing so, we can be sure that the security team’s constraints will apply everywhere. For example, the security team might use an ACE such as TOP_SECRET_CLEARANCE AND FULL_TIME_EMPLOYEE, while the application team might specify what data engineers and data scientists can do. ACEs can express this easily, but ACLs cannot.
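The layering described above can be sketched as two independent predicates joined by "and". The security expression is taken from the text; the application team's rule and the group names `DATA_ENGINEER` and `DATA_SCIENTIST` are assumptions made up for the example.

```python
def security_constraint(groups):
    # From the article: TOP_SECRET_CLEARANCE AND FULL_TIME_EMPLOYEE
    return "TOP_SECRET_CLEARANCE" in groups and "FULL_TIME_EMPLOYEE" in groups


def application_permission(groups):
    # Hypothetical team-level rule: data engineers or data scientists may read.
    return "DATA_ENGINEER" in groups or "DATA_SCIENTIST" in groups


def can_read(groups):
    # Joining the two parts with "and" means the security rule always applies,
    # whatever the application team chooses to grant.
    return security_constraint(groups) and application_permission(groups)


contractor = {"TOP_SECRET_CLEARANCE", "DATA_SCIENTIST"}
employee = {"TOP_SECRET_CLEARANCE", "FULL_TIME_EMPLOYEE", "DATA_SCIENTIST"}

print(can_read(contractor))  # False: fails the security team's constraint
print(can_read(employee))    # True
```

Each team edits only its own function, and neither can loosen what the other has set, because both must hold for access to be granted.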

Be Careful How You Use ‘Not’

Using the “not” expression is powerful but must be done carefully, or you may not be denying access in the way you intend. It’s good practice to set permissions for inclusion independently from exclusion constraints. Think of it this way: A and C are groups, and you want to deny permission for a user of subgroup B. Remember that “and” has higher precedence than “or.” If you define the permission as


A OR C AND NOT B

you are denying permission for B when B is a subset of C, but B would still have permission if this user or subgroup is a member of A.

It is better to set the ACE this way:

(A OR C) AND NOT (B)

In the second case, you’ve granted permission for groups A and C and denied permission for B, regardless of what other groups B may belong to. The parentheses around B aren’t strictly necessary, but are handy to mark the exclusion part of the expression.
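Python's `or`, `and` and `not` operators follow the same relative precedence, so the difference between the two forms can be checked directly. This is only an illustration with plain Booleans standing in for group membership; the function names are made up for the sketch.

```python
from itertools import product


def without_parentheses(a, b, c):
    # Parsed as: a or (c and (not b))
    return a or c and not b


def with_parentheses(a, b, c):
    # Exclusion applies to everyone, whichever group granted access.
    return (a or c) and not b


# A user who belongs to A and to the excluded subgroup B, but not to C.
a, b, c = True, True, False
print(without_parentheses(a, b, c))  # True: access leaks through A
print(with_parentheses(a, b, c))     # False: B is denied as intended

# List every membership combination on which the two expressions disagree.
differing = [
    combo
    for combo in product([False, True], repeat=3)
    if without_parentheses(*combo) != with_parentheses(*combo)
]
print(differing)  # [(True, True, False), (True, True, True)]
```

The two expressions disagree only when the user is in both A and B, which is precisely the case the exclusion was meant to cover.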

Data Access Control on Different Levels

An example of a system that uses ACEs is the HPE Ezmeral Data Fabric. ACEs can be set at multiple levels in this data infrastructure, and these ACEs are implicitly combined with an “and.” For example, overall security constraints can be expressed at the data fabric volume level and operational constraints at the directory, file or NoSQL database table level, while specific access to masked or unmasked data can be controlled at the column level within a table. This range of control allows access to be controlled en masse, in detail or at any level in between.

To find out more about ACEs and other aspects of data management and data storage using the data fabric as your data platform, read the technical report HPE Ezmeral Data Fabric: Modern infrastructure for data storage and management.

 Featured image via Shutterstock, provided by HPE.