Be cautious about using Chartio (or at least, don’t follow their directions)

In the past couple of years the number of web startups aiming to help other web startups with various tasks has grown immensely. That’s probably a good thing; more innovation for the web is always welcome, and a great way to drive innovation is to have good tools readily available. On the other hand, any time you have a bunch of newcomers to a space, there are going to be some rough edges as they learn some of the lessons that older participants learned the hard way. There are also going to be some risks taken in the name of the aforementioned innovation, or perhaps in the name of “disruption” (a vaguer goal).

Case in point: Chartio. To quote their site, “Chartio is your data’s interface. It’s simple to set up, easy to use, and provides business intelligence for the world’s most popular data sources.”

The service they offer is definitely a useful one – there are a bunch of other companies that also try to provide it, each in their own way, and with various trade-offs. I’ve certainly seen some grotesque systems built for the purpose of providing business analysts with tools to do their job, and I’m willing to bet that Chartio’s internal code is a lot cleaner than much of what I’ve come across in the past.

Where things go off the rails, however, is in the frankly astounding compromises Chartio wants its clients to make in order to enable the “simple to set up and easy to use” parts of their pitch. If we click through the landing page to get to the setup instructions (we’ll use their MySQL instructions as an example), we find that Chartio essentially wants to connect directly to your site’s production database.

Wait, what?

If you’re a systems administrator, alarm bells are probably going off in your mind now. Oh, but supposedly, it’s okay – they’ll use an encrypted SSH tunnel so that instead of you opening a hole in your firewall for them, they’ll bypass your firewall for you. Well, unless you don’t have shell access, in which case you will have to open a hole in your firewall for them.

Wait, what?

Sure, that SSH tunnel might be difficult for a third-party attacker to break into, but what about compromises of Chartio’s servers? Whatever Chartio machines are on the other ends of these tunnels are veritable goldmines if a malicious user can compromise them, with active firewall-bypassing connections to a multitude of companies’ database servers. By opening up a tunnel, you’re effectively reducing your network’s defenses to the weaker of your existing defenses and Chartio’s. While I’m sure the people behind Chartio are just as dedicated to security as any of us, their entire company is 8 people, and only half of those are even engineers, let alone actively working on security.

It’s okay, though – even if someone did get control of Chartio’s servers and credentials, they specifically have you set it up so that they connect to your database as a read-only user. So an attacker couldn’t delete all your data or do anything malicious like that. Except, well, there’s still that matter of being able to read all of your data. At least, if you follow their setup instructions (again using MySQL as an example):

GRANT SELECT, SHOW VIEW
    ON $database_name.*
    TO $user@`rackspace1.chart.io` IDENTIFIED BY '$password';
FLUSH PRIVILEGES;

See the * wildcard on the end of that second line, giving access to every single table in the database? After all, it’d be a hassle to grant access to only specific tables that it would make sense for business analysts to examine, and disallow access to things like your users’ sensitive data. It might also mean that your analysts’ time is wasted asking engineers to add new tables for them to read when they need access to data that’s not on the whitelist.
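
For illustration, a narrower set of grants might look something like the sketch below. The table and column names (orders, page_views, accounts and its columns) are hypothetical stand-ins, not anything from Chartio’s instructions, and the $-prefixed values are the same placeholders used in the statement above. It creates the user separately and then grants SELECT table by table, with a column-level grant for a table that mixes sensitive and non-sensitive data.

```sql
-- Sketch only: orders, page_views, accounts and the column names are
-- hypothetical; substitute the tables your analysts actually need.
CREATE USER $user@`rackspace1.chart.io` IDENTIFIED BY '$password';

-- Table-level grants instead of the $database_name.* wildcard.
GRANT SELECT ON $database_name.orders     TO $user@`rackspace1.chart.io`;
GRANT SELECT ON $database_name.page_views TO $user@`rackspace1.chart.io`;

-- Column-level grant: expose only the non-sensitive columns of a mixed table.
GRANT SELECT (id, plan, created_at)
    ON $database_name.accounts
    TO $user@`rackspace1.chart.io`;
```

Running SHOW GRANTS FOR $user@`rackspace1.chart.io`; afterwards is a quick way to confirm that the account can see only what was intended.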

Of course, that doesn’t even take into account the harm that non-malicious users can do. While I don’t know the exact extent of the queries that a user of Chartio can cause it to run, it’s certainly possible to impact the performance of a database by issuing read-only queries that happen to result in large, inefficient scans of tables. One might hope that Chartio has built-in protections against this, but given the wide variety of databases they inter-operate with, all of which have varying levels of similarity in their query semantics, it seems unlikely that every query coming from Chartio is going to be perfectly optimized for the data it’s running against.
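
One partial mitigation on the MySQL side is to put per-account resource limits on the Chartio user. The sketch below assumes MySQL 5.7 or newer (older versions express the same limits through GRANT USAGE ... WITH instead), and the numbers are arbitrary placeholders to tune for your own workload.

```sql
-- Sketch only: the limit values are arbitrary placeholders.
ALTER USER $user@`rackspace1.chart.io`
    WITH MAX_QUERIES_PER_HOUR 500
         MAX_USER_CONNECTIONS 2;
```

Note that these limits cap how many queries and concurrent connections the account gets, not how expensive any single query is, so one badly planned table scan can still hurt.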

Afterword

Let me make it clear that I don’t hate Chartio or anything – and they’re certainly not unique in making some of the choices I’ve highlighted above. My real goal here is just to make people more aware of the security trade-offs they are making when they use these kinds of methods to enable third-party services. It’s quite possible that the risks I’ve highlighted above are ones that you feel it’s okay to take, and in that case, go for it – as long as you’re respecting your end users’ interests as well. Just try to be cognizant of the risks you’re taking, and not just plug in new things because they’re shiny. Also realize that there may be ways that you can adjust the risks you’re taking – such as not using wildcard grants, as I mentioned above.

I would love to see Chartio develop some alternative methods of data acquisition that didn’t involve plugging their servers directly into your database, or at least have some guides on their site about good data isolation practices (e.g. restricting access to only tables that are really relevant to business analysts, and partitioning other sensitive user data into separate tables). That would be good for both Chartio’s customers (in that their overall approach to data security would improve) and also for Chartio (who might garner a little extra goodwill for helping that to happen). I expect that it might take some time before that happens, though, given that Chartio is a startup and has limited personnel resources to devote to all of their endeavors.

Posted on May 10, 2013, in Software Development. 7 Comments.

  1. As posted on HN

    Hi, founder of Chartio here. Other than the title (which makes me a little sad) I liked your post. It’s great to fully inform people of the security tradeoffs and you’ve done a nice job of laying out the levels and options of security that we’ve spent a lot of time developing.
    In anything that is cloud based there is going to be some level where some hacker could get in and destroy everything. Most people on this site use cloud hosted servers, all of which would be at risk if Amazon or Rackspace got hacked. BI in the cloud is a new space and will be cautiously entered by some, but benefits will outweigh the potential risks, just as has happened in every other segment of cloud computing.

    (will write more soon)

  2. Hey.
    I think Dave has been particularly nice and understanding about your article, that’s great!
    Personally, I dislike this kind of post. In fact, as soon as it’s related to security, I start being cautious.
    It’s damn too easy to tell this and that about one’s security issues, or, just like you do, the “potential” security issues their architecture MAY present. I agree with Linus on that point, security is almost always a matter of intellectual masturbation, a whole bunch of “what if”.

    I tell you, you could write an article about the poor security model and the flaws of almost ANY platform. Then you add a sexy little name, just like you did, to attract readers, and serve them a lesson about YOUR point of view about the security of something, which had almost nothing to do with the heinous title, all of that served on a security-awarded wordpress blog, and you get a ton of readers.

    It’s too easy, it’s bad, it has negative impact on chartio’s reputation for nothing, so really, do something else with your life. I don’t know chartio much but my guess is that it’s far more interesting to study their architecture and in general, how to build excellent software than miserably pointing out potential flaws that did not even harm you. You’re not helping anybody nor any community here.

    Regards,

    • I suppose we’ll have to disagree about whether this kind of post is useful. Security always seems like “intellectual masturbation” until something gets broken into, and then it’s not.

      As far as titles go; I really don’t care about reader count – Dave pointed out on HN that the title was kind of harsh and I agreed, and I toned it down *because* I agreed with him.

      The entire blog post came out of a discussion with other colleagues from a handful of companies who actively work in the industry as infrastructure engineers and thus run into BI people asking them to set up stuff like this without any care for the security aspects. So yes, actually, I am “helping anybody or any community here” – they now have something to point at and tell their BI people “we can consider it, but we also need to take these things into account”.

      Furthermore, as mentioned above and as the discussion on Hacker News has shown, there *are* ways that things could be fairly easily improved (notes about restricting access to only the tables that make sense, suggestions to have a second analytics-only database, et cetera). That’s a useful discussion to have, and yes, Dave seems to have been taking that feedback quite well.

      ~A

      P.S. “security-awarded wordpress blog” – er, what?

  3. Well, I guess I got a little too excited there maybe, especially after having read your explanations. But definitely, something was disturbing in this article. Maybe in the original title, or something in the tone… I usually never react to anything I read. But anyway, we both made our points, so… sorry if my answer was a little warm.

  4. Why would anyone even *think* about granting a third-party system access to a live production system? How do you know if the Chartio queries that will be running don’t take down your entire database (performance-wise)? How about the security implications that Amber pointed out?
    So instead how about regularly replicating the relevant tables to a different database, maybe even on a different server, in that process anonymizing critical (user-identifying) data, and granting Chartio access to that database instead? Isn’t that what every sane database administrator would do?

  5. Great post in spirit – security should be given more thought than it seems to receive.

    However, in defense of Chartio:

    1. Their website does recommend connecting to a read-only replica rather than the production master (at least in the Postgres instructions).

    2. You have the option to write raw SQL and inspect the queries that their tool uses if you’re still concerned about performance hits against the replica DB.

    3. A good sysadmin shouldn’t configure a DB to contain both highly sensitive data and data that would be interesting to report on. Anyone who has gone through PCI certification knows this – a “red zone” DB with tight access controls should contain password hashes, SSN, credit card info, etc., insulated from domain-specific data (that which is more interesting to BI).

    I do think it would be prudent for Chartio to at least mention the option of granting more granular table access, but still, if you’re the engineer integrating with Chartio (i.e. having root DB access), you should probably already know better.

    Ross