Hi Tod, I’ll try to answer your questions.

On Wed, 29 Jan 2020 at 20:18, Tod Olson <tod@uchicago.edu> wrote:
I'm trying to think quickly about malicious ways to use CQL injection:

(1) As I understand, CQL is query-only, there is no way to make a CQL query that causes an update to the system. Is that true? No sneaky way to slip in an update? If that is true, good, one concern is addressed.

This is mostly true. 

Increasingly, folks have asked for bulk endpoints that use CQL for record selection, either to delete records or in combination with some form of transformation.

I believe delete using CQL has been implemented in a limited number of places.


(2) What about reading sensitive information? Could one use injection to retrieve information that should be protected, like user addresses or purchase prices (which are sometimes confidential by contract, especially for large packages of electronic resources)? The injection examples make it seem like one could request any data in a module where one has read privileges. This is probably my main concern.


For the most part, the collection endpoints where CQL can be provided use a coarse-grained permission that either allows or disallows access.

Only tokens for users or other modules that have that permission should be able to make requests to that endpoint.

However, once they do, they can receive any records in that collection, which might include additional information that they wouldn’t normally have access to.

This last part is fairly standard. For example, loans provided by the business logic API have a small amount of item information included with them, even if the user does not have direct access to items.

It might be that some modules have introduced finer-grained permissions that can affect search results or what is included in the records.

I don’t know of an example of that off the top of my head. The only core modules that I think might have done that are in acquisitions.
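
To make the concern in (2) a bit more concrete, here is a minimal Python sketch of how a client that builds CQL by string concatenation can have its filter widened, and how escaping the term avoids that. The `barcode` field, the helper names and the exact set of escaped characters (backslash, double quote, and the masking characters `*`, `?`, `^`) are my assumptions for illustration; this is not code from any FOLIO module.

```python
def build_query_unsafe(user_input: str) -> str:
    # Naive concatenation: a double quote in the input closes the term early.
    return 'barcode == "' + user_input + '"'


def cql_escape(term: str) -> str:
    # Backslash-escape characters that are special inside a quoted CQL term.
    # Assumed set: backslash, double quote, and the masking characters * ? ^
    return "".join("\\" + ch if ch in '\\"*?^' else ch for ch in term)


def build_query_safe(user_input: str) -> str:
    # The escaped input stays inside a single quoted term.
    return 'barcode == "' + cql_escape(user_input) + '"'


crafted = '123" or barcode = "*'
print(build_query_unsafe(crafted))
print(build_query_safe(crafted))
```

With the crafted input, the unsafe version produces a query with an extra `or` clause that matches every barcode, while the escaped version keeps the whole input as one literal term. Either way, the endpoint's coarse-grained permission is what ultimately decides which records can come back.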


(3) Could CQL injection be crafted simply to consume inordinate resources, slow the system, or take a module down? 

I think it is possible to construct CQL that could use lots of resources. I’m not aware of specific examples or exploitations. 

At present, it would likely take the form of using a CQL index not supported by a PostgreSQL index, or a combination of CQL indexes that generates a very expensive SQL query.

I believe the Core Platform team may already have work outlined that could reduce or mitigate some of this. For example, only allowing CQL queries to use indexes that have a corresponding PostgreSQL index defined.
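
As a rough illustration of that kind of restriction (not the Core Platform team's design, which I haven't seen), a check could reject any query that references a CQL index without a backing PostgreSQL index. The sketch below is Python, uses a hypothetical allow-list, and pulls index names out with a simplified regular expression rather than a real CQL parser, so it ignores things like `sortby` clauses and prefix assignments.

```python
import re

# Hypothetical set of CQL indexes that are backed by a PostgreSQL index.
INDEXED_FIELDS = {"barcode", "title", "status.name"}

# Quoted terms, including backslash-escaped characters inside them.
_QUOTED = re.compile(r'"(?:\\.|[^"\\])*"')
# An index name followed by a relation (symbolic or all/any/adj).
_INDEX_BEFORE_RELATION = re.compile(
    r'([A-Za-z_][\w.]*)\s*(?:==|<>|<=|>=|=|<|>|\b(?:all|any|adj)\b)'
)


def unindexed_fields(cql: str) -> set[str]:
    # Blank out quoted terms so that operators inside them are ignored.
    stripped = _QUOTED.sub('""', cql)
    used = set(_INDEX_BEFORE_RELATION.findall(stripped))
    # Anything left over has no backing database index and could be refused.
    return used - INDEXED_FIELDS


print(unindexed_fields('barcode == "123" or notes = "a = b"'))
```

Here `notes` is reported because it is not in the allow-list, and the `=` inside the quoted term is not mistaken for a relation.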

Theoretically, there might be a variant of this where specially crafted CQL could generate injected SQL, in the sense that it exploits the transformation process from CQL to SQL to generate SQL that shouldn’t be possible. This is pure speculation on my part; I’m not aware of any examples of this. It may well be that the CQL syntax, parsing, or translation-to-SQL implementation already prevents this.
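
To show the general class of problem I mean (not how the actual CQL to SQL translation works, which I haven't checked), here is a small Python sketch contrasting a search term spliced into SQL text with one passed as a bound parameter. It uses an in-memory SQLite database as a stand-in for PostgreSQL, and the `item` table and function names are made up for the example.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; table and column are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (barcode TEXT)")
conn.executemany("INSERT INTO item VALUES (?)", [("123",), ("456",)])


def find_unsafe(term: str) -> list:
    # The term is spliced into the SQL text, so a quote in it changes the query.
    sql = "SELECT barcode FROM item WHERE barcode = '" + term + "'"
    return conn.execute(sql).fetchall()


def find_safe(term: str) -> list:
    # The term is passed as a bound parameter and is only ever treated as data.
    sql = "SELECT barcode FROM item WHERE barcode = ?"
    return conn.execute(sql, (term,)).fetchall()


crafted = "x' OR '1'='1"
print(find_unsafe(crafted))
print(find_safe(crafted))
```

The spliced version returns both rows because the crafted term rewrites the WHERE clause; the parameterized version returns nothing, since no barcode equals the literal string. If the translation layer binds or escapes terms consistently, this variant should not be reachable.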

Jakub may be able to expand upon this area better than I can.


I do think we want to close that injection hole, but for prioritizing I'm trying to focus on potential malicious uses. (2) above makes me a bit nervous for reasons of privacy and contractual obligation.

-Tod

On Jan 29, 2020, at 11:34 AM, Mike Gorrell <mdg@indexdata.com> wrote:

This was raised at the end of today's TC call - a Texas A&M dev has identified an issue that they thought should be raised to the Security Group - which doesn't exist yet.

The Tech Council is the closest thing we have to that security group at this time. I would like to ask this group to weigh in on this issue:


And comment on the issue as a potential security concern as well as how urgently we might want to address it.

Please correspond in email. Feel free to invite others who aren't officially on the TC to be part of this email thread.

Thanks.

-mdg