Be cautious about using Chartio (or at least, don’t follow their directions)
In the past couple of years the number of web startups aiming to help other web startups with various tasks has grown immensely. That’s probably a good thing; more innovation for the web is always welcome, and a great way to drive innovation is to have good tools readily available. On the other hand, any time you have a bunch of newcomers to a space, there’s going to be some rough edges involved as they learn some of the lessons that older participants learned the hard way. There’s also going to be some risks taken in the name of the aforementioned innovation, or perhaps in the name of “disruption” (a more vague goal).
Case in point: Chartio. To quote their site, “Chartio is your data’s interface. It’s simple to set up, easy to use, and provides business intelligence for the world’s most popular data sources.”
The service they offer is definitely a useful one – there’s a bunch of other companies that also try to provide it, each in their own way, and with various trade-offs. I’ve certainly seen some grotesque systems built for the purpose of providing business analysts with tools to do their job, and I’m willing to bet that Chartio’s internal code is a lot cleaner than a lot of that which I’ve come across in the past.
Where things go off the rails, however, is the frankly astounding compromises Chartio wants its clients to make in order to enable the “simple to set up and easy to use” parts of their pitch. If we click through the landing page to get to the setup instructions (we’ll use their MySQL instructions as an example), we find that Chartio essentially wants to connect directly to your site’s production database.
If you’re a systems administrator, alarm bells are probably going off in your mind now. Oh, but supposedly, it’s okay – they’ll use an encrypted SSH tunnel so that instead of you opening a hole in your firewall for them, they’ll bypass your firewall for you. Well, unless you don’t have shell access, in which case you will have to open a hole in your firewall for them.
Sure, that SSH tunnel might be difficult for an third-party attacker to break into, but what about compromises of Chartio’s servers? Whatever Chartio machines are on the other ends of these tunnels are veritable goldmines if a malicious user can compromise them, with active firewall-bypassing connections to a multitude of companies’ database servers. By opening up a tunnel, it’s effectively reducing your network’s defenses to the lowest common denominator of your existing defenses or Chartio’s. While I’m sure the people behind Chartio are just as dedicated to security as any of us, their entire company is 8 people, and only half of those are even engineers, let alone actively working on security.
It’s okay, though – even if someone did get control of Chartio’s servers and credentials, they specifically have you set it up so that they connect to your database as a read-only user. So an attacker couldn’t delete all your data or anything malicious like that. Except, well, there’s still that matter of being able to read all of your data. At least, if you follow their setup instructions (again using MySQL as an example):
GRANT SELECT, SHOW VIEW ON $database_name.* TO $user@`rackspace1.chart.io` IDENTIFIED BY '$password'; FLUSH PRIVILEGES;
See the * wildcard on the end of that second line, giving access to every single table in the database? After all, it’d be a hassle to grant access to only specific tables that it would make sense for business analysts to examine, and disallow access to things like your users’ sensitive data. It might also mean that your analysts’ time is wasted asking engineers to add new tables for them to read when they need access to data that’s not on the whitelist.
Of course, that doesn’t even take into account the harm that non-malicious users can do. While I don’t know the exact extent of the queries that a user of Chartio can cause it to run, it’s certainly possible to impact the performance of a database by issuing read-only queries that happen to result in large, inefficient scans of tables. One might hope that Chartio has built-in protections against this, but given the wide variety of databases they inter-operate with, all of which has varying levels of similarity in their query semantics, it seems unlikely that every query coming from Chartio is going to be perfectly optimized for the data it’s running against.
Let me make it clear that I don’t hate Chartio or anything – and they’re certainly not unique in making some of the choices I’ve highlighted above. My real goal here is just to make people more aware of the security trade-offs they are making when they use these kinds of methods to enable third-party services. It’s quite possible that the risks I’ve highlighted above are ones that you feel it’s okay to take, and in that case, go for it – as long as you’re respecting your end users’ interests as well. Just try to be cognizant of the risks you’re taking, and not just plug in new things because they’re shiny. Also realize that there may be ways that you can adjust the risks you’re taking – such as not using wildcard grants, as I mentioned above.
I would love to see Chartio develop some alternative methods of data acquisition that didn’t involve plugging their servers directly into your database, or at least have some guides on their site about good data isolation practices (e.g. restricting access to only tables that are really relevant to business analysts, and partitioning other sensitive user data into separate tables). That would be good for both Chartio’s customers (in that their overall approach to data security would improve) and also for Chartio (who might garner a little extra goodwill for helping that to happen). I expect that it might take some time before that happens, though, given that Chartio is a startup and has limited personnel resources to devote to all of their endeavors.