ggsql: A Grammar of Graphics for SQL

thomasp85 390 points 78 comments April 20, 2026
opensource.posit.co · View on Hacker News

Discussion Highlights (20 comments)

thomasp85

The new visualisation tool from Posit. Combines SQL with the grammar of graphics, known from ggplot2, D3, and plotnine

kasperset

Looks intriguing. Brings plotting to Sql instead of “transforming” sql for plotting.

radarsat1

Wow, love this idea.

anentropic

Maybe I skim read it too fast, but I did not find any clear description in the blog post or website docs of how this relates to SQL databases I was kind of guessing that it doesn't run in a database, that it's a SQL-like syntax for a visualisation DSL handled by front end chart library. That appears to be what is described in https://ggsql.org/get_started/anatomy.html But then https://ggsql.org/faq.html has a section, "Can I use SQL queries inside the VISUALISE clause," which says, "Some parts of the syntax are passed on directly to the database". The homepage says "ggsql interfaces directly with your database" But it's not shown how that happens AFAICT confused

efromvt

Love the layering approach - that solves a problem I’ve had with other sql/visual hybrids as you move past the basics charts.

kasperset

Will this ever integrate rest of the ggplot2 dependent packages described here: https://exts.ggplot2.tidyverse.org/gallery/ in the near or distant future? Sorry if it already mentioned somewhere.

dartharva

Would be awesome if somehow coupled into Evidence.dev

gh5000

It is conceivable that this could become a duckdb extension, such that it can be used from within the duckdb CLI? That would be pretty slick.

getnormality

I skimmed the article for an explanation of why this is needed, what problem it solves, and didn't find one I could follow. Is the point that we want to be able to ask for visualizations directly against tables in remote SQL databases, instead of having to first pull the data into R data frames so we can run ggplot on it? But why create a new SQL-like language? We already have a package, dbplyr, that translates between R and SQL. Wouldn't it be more direct to extend ggplot to support dbplyr tbl objects, and have ggplot generate the SQL? Or is the idea that SQL is such a great language to write in that a lot of people will be thrilled to do their ggplots in this SQL-like language? EDIT: OK, after looking at almost all of the documentation, I think I've finally figured it out. It's a standalone visualization app with a SQL-like API that currently has backends for DuckDB and SQLite and renders plots with Vegalite. They plan to support more backends and renderers in the future. As a commenter below said, it's supposed to help SQL specialists who don't know Python or R make visualizations.

data_ders

ok, this is definitely up my alley. color me nerd-sniped and forgive the onslaught of questions. my questions are less about the syntax, which i'm largely familiar with knowing both SQL and ggplot. i'm more interested in the backend architecture. Looking at the Cargo.toml [1], I was surprised to not see a visualization dependency like D3 or Vega. Is this intentional? I'm certainly going to take this for a spin and I think this could be incredible for agentic analytics. I'm mostly curious right now what "deployment" looks like both currently in a utopian future. utopia is easier -- what if databases supported it directly?!? but even then I think I'd rather have databases spit out an intermediate representation (IR) that could be handed to a viz engine, similar to how vega works. or perhaps the SQL is the IR?! another question that arises from the question of composability: how distinct would a ggplot IR be from a metrics layer spec? could i use ggsql to create an IR that I then use R's ggplot to render (or vise versa maybe?) as for the deployment story today, I'll likely learn most by doing (with agents). My experiment will be to kick off an agent to do something like: extract this dataset to S3 using dlt [2], model it using dbt [3], then use ggsql to visualize. p.s. @thomasp85, I was a big fan of tidygraph back in the day [4]. love how small our data world is. [1]: https://github.com/posit-dev/ggsql/blob/main/Cargo.toml [2]: https://github.com/dlt-hub/dlt [3]: https://github.com/dbt-labs/dbt-fusion [4]: https://stackoverflow.com/questions/46466351/how-to-hide-unc...

hei-lima

Really cool!

jiehong

Outstanding! This can replace a lot of Excel in the end. It makes so much sense now that it exists!

rvba

1) does this alllw to export to Excel? 2) how to make manual adjustments?

jorin

Really cool project! Would love to see a standard established for representing visualizations in SQL! I built a whole dashboarding tool on top of the idea: https://taleshape.com/shaper/docs/getting-started/ But Shaper takes a more pragmatic approach and just uses built in functionality to describe how to visualize the results. The most value I see with viz as SQL is that it's a great format for LLMs to specify what they want while making it easy to audit and reproduce. Just built a slack bot on top of that concept last week: https://taleshape.com/blog/build-your-own-data-analytics-sla...

breakfastduck

This is fantastic. Feels like something that should've been in there from the start!

jiehong

The cli only produces vega-lite[0] json graphics, right? It would be nice if it included a rendering engine. [0]: https://github.com/vega/vega-lite

nicoritschel

This is neat. I do wish there was a way for this to gracefully degrade in contexts without support for the grammar, though. I devised a similar in spirit (inside SQL, very simplified vs GoG) approach that does degrade (but doesn't read as nice): https://sqlnb.com/spec

lmeyerov

This is great We reached a similar conclusion for GFQL (oss graph dataframe query language), where we needed an LLM-friendly interface to our visualization & analytics stack, especially without requiring a code sandbox. We realized we can do quite rich GPU visual analytics pipelines with some basic extensions to opencypher . Doing SQL for the tabular world makes a lot of sense for the same reasons! For the GFQL version (OpenCypher), an example of data loading, shaping, algorithmic enrichment, visual encodings, and first-class pipelines: - overall pipelines: https://pygraphistry.readthedocs.io/en/latest/gfql/benchmark... - declarative visual encodings as simple calls: https://pygraphistry.readthedocs.io/en/latest/gfql/builtin_c...

urams

What feels like a lifetime ago, I made almost all of the R tooling at Uber and actually implemented what was effectively exactly this on top of the R DB tooling. Everyone, I think correctly in retrospect, I showed it to thought it was more or less useless. I had hoped it would be a nice way to pull and instantly visualize data but that was rarely all that valuable like this.

semmulder

This seems really cool, will try it out soon! What made it click for me was the following snippet from: https://ggsql.org/get_started/grammar.html > We’ve tried to make the learning curve as easy as possible by keeping the grammar close to the SQL syntax that you’re already familiar with. You’ll start with a classic SELECT statement to get the data that you want. Then you’ll use VISUALIZE (or VISUALISE ) to switch from creating a table of data to creating a plot of that data. Then you’ll DRAW a layer that maps columns in your data to aesthetics (visual properties), like position, colour, and shape. Then you tweak the SCALEs, the mappings between the data and the visual properties, to make the plot easier to read. Then you FACET the plot to show how the relationships differ across subsets of the data. Finally you finish up by adding LABELs to explain your plot to others. This allows you to produce graphics using the same structured thinking that you already use to design a SQL query.

Semantic search powered by Rivestack pgvector
5,126 stories · 48,318 chunks indexed