Thank you for the detailed response Jakub.

-mdg

On Thu, Jan 30, 2020 at 6:22 AM Jakub Skoczen <jakub@indexdata.com> wrote:
I'll just confirm what others said on this thread -- this is not a security issue.

Zak -- there seems to be an escaping/encoding problem in the UI though -- we don't want user input to "break" or "manipulate" UI-generated CQL queries. I think the UI should escape all special CQL chars when constructing CQL queries, including double-quotes, to prevent this.

Marc and Tod -- as far as potential backend vulnerabilities go, I don't think we should be using the term "CQL injection" as there's no such thing. CQL is exposed in the public APIs (it's part of the API). The specific concerns Tod listed (2 and 3) are worth investigating though and they would apply to API design in general, they are not specific to using CQL. We probably need to dig deeper into those but here are my initial thoughts (again this is not related to the problem reported by Kevin):

1. reading sensitive data that the user does not have no access to

In FOLIO we don't use DB row-level nor field-level security. Platform authorization (permissions) is enforced per API endpoint -- either the user has access to one or not. FOLIO has a "hook" for implementing field- or row-level security with the so-called "desired permissions" -- not enforced by the platform but passed down to the module  -- but it's left to the module to implement and enforce. What this means in practice is that if you wish to have certain fields more "sensitive" than others (e.g password field in the user record) and have it accessible only to a subset of users you have two options:
  a) implement access control in the module --- e.g add a new "desired permission" called "user.item.get.sensitive". This permission will not be enforced in the gateway but it will be passed down to the module in the headers. The module must then verify the permission and strip/include sensitive data accordingly. Afaik, no module in FOLIO actually does this (mod-login being an exception)
  b) structure the API so that the sensitive information is accessible under a separate endpoint -- e.g assuming user is retrieved through GET /user, the sensitive portion would be in GET /user/sensitive. It would need it's own JSON schema (there must be one for each endpoint) either for the entire record or just the sensitive part. In this case the permission to read sensitive data is a "required permission" and it's enforced by the gateway

That's the basics and in the case when a module implements its own access control -- it's hard to reason about what may happen (anything?).

In the platform-enforced access control case, there is a risk of "leaking" information from other "endpoints" through a bad module implementation -- e.g when processing a request the module may read something sensitive from it's DB and append to the response. I guess the only way, I can imagine, we could truly prevent bad implementations is by granulary enforcing data access in the DB (e.g a separate ROLE for each table) and mapping that up to the endpoint -- e.g request to access /user has only privileges to access the USER table. This would be an extremely complex and elaborate DB ROLE scheme though and I've never seen that used in practice. I guess it's possible in theory though. We would still need to figure out who does the ROLE "switching", we obviously can't leave it up to the module -- if we assume the module may include "bad implementation" it could also break switching.

So that's in general, fortunately in FOLIO most modules use a similar approach for API impl and DB access (through RMB) and it's a bit easier to reason what happens there. 

For the sake of argument, assume the module exposes two endpoints /instances and /items, items being "linked" to the instances through a foreing-key (instanceId), each with individual access permissions (instances.list.get and items.list.get). 

Tod already noted that CQL is used only for filtering down result sets, it is not used for data manipulation (though it can be combined with data manipulation functions as a filtering mechanism to narrow down what needs to be manipulated). I'll add that CQL is also NOT used to control what information is retrieved back to the user -- it only controls if a bit of information is included in the result or not. In SQL terms, CQL only controls what goes in after the WHERE clause, it does not control what is specified right after the SELECT clause, in short: SELECT something FROM somewhere WHERE <criteria based on CQL>. So with CQL per se it's not possible to actually READ any information that were not explicitly made available in the API endpoint. In the above example -- you can't use CQL to read items through the /instances endpoint. For that you would need to have a new endpoint with a separate JSON Schema and with a new permission called e.g /instances-with-items. Since CQL only controls what goes after the WHERE clause it cannot manipulate the retrieval portion of your SELECT and include anything beyond what the endpoint (which usually directly maps to a table or a view) specifies. (Side note: this is also the major difference between CQL and GraphQL and also a major source of frustration by FOLIO developers, I wonder how does GraphQL deal with this "problem"?)

What CQL can do, however, is to filter/match on properties from "related" entities, if and only if, such link has been established in the so-called "schema.json" configuration file (I'm assuming RMB here, ERM uses a different CQL impl). Because of this, I guess, one could find out if a piece of information, he or she does not have direct access to, is present in the system or not, though still not read it. E.g I could find out if there is an item with a barcode "xyz" without having access to the /items endpoint but only having access to the /instances endpoint by querying it with "item.barcode='xyz'". I can't quite tell if this is a vulnerability or not. I guess there might be examples where this undesirable. I should stress though that for this search to work, the link must be explicit in "schema.json" -- and it sounds like we should avoid casually linking sensitive information this way anyway.

2. slowing or taking a module down 

This I think is a practical threat, though not specific to CQL -- we currently have such "vulnerabilities" present in the system through the unbounded "limit" parameter, if the "limit" is set too high attacker may shut the module down with OOM. Platform team is looking for a solution to the "limit" issue under https://issues.folio.org/browse/RMB-535.  In the CQL case, a simple "fix" could be limiting the size of the query (long queries may take more time/resources) and, which Marc already mentioned, rejecting queries that are not backed up by an explicit DB index -- currently we only flag such queries with WARN messages but pass them through as there are many "small" modules where lack of indexing still results in acceptable performance.

On Thu, Jan 30, 2020 at 1:57 AM Zak Burke <zburke@cornell.edu> wrote:
This is not a security issue; there is no injection vector.

An SQL injection attack happens when the _backend_ fails to properly sanitize its input. The situation described here asserts the _frontend_ is not properly sanitizing its input. There is not a security issue on the frontend because it has no special relationship with the backend. Everything sent from the frontend to an endpoint can be sent to an endpoint directly via cURL or Postman or what have you.

It may be worth considering whether we want to allow API endpoints to accept arbitrary CQL queries, but that’s not the issue described in this ticket.

It would be polite of the frontend to sanitize its input so folks could get results when they search for titles like

    Robert"; drop table students; —

instead of a syntax error about unmatched quotes. Is that a bug? Sure, but a syntax error is not a security issue.

Zak

> On Jan 29, 2020, at 5:10 PM, Kevin Day <kday@library.tamu.edu> wrote:
>
> Hello Tech Council,
>
> I original identified the problem described in https://issues.folio.org/browse/STRIPES-667 .
>
> While there may be concerns about CQL itself and its processing, my concern has been primarily with the forms in Stripes and their lack of sanitization. This lack of sanitization opens up a front-door.
>
> This is way too easy to do:
> step 1) Adding a single double-quote in the form input.
> step 2) Done; injection now possible.
>
> The sanitization would be to, at the very least, escape double-quotes.
> Additional sanitization rules might need to be applied depending on the specific use of that form input.
>
> There are multiple modules that exhibit the same behavior of not escaping the form input.
> This is cause for some concern.
>
> I believe that, from a security perspective, there needs to be a standardized (and well documented) way to ensure that form input is safe to use.
>
> Whether or not a specific case of CQL has any risks is moot until such time that form inputs themselves are properly sanitized.
>
> Thank You,
> Kevin Day
> To unsubscribe from this list please go to http://archives.simplelists.com

To unsubscribe from this list please go to http://archives.simplelists.com


--
--
Regards,
Jakub Skoczen

To unsubscribe from this list please go to http://www.simplelists.com/confirm.php?u=SeK0ArgpLZB2ijVHc1q8eivZ4CXy8J2w