When I shared this discovery with my co-workers, they told me I’d better not write a blog post about it: God forbid it turns out to be a bug, somebody fixes it, and we lose this feature. But I hope it’s not a bug!
… For years I’ve been complaining about the fact that Postgres functions are atomic. There is no way to have transactions inside a function, so it’s impossible to commit intermediate results: it’s always all or nothing. Not that I really wanted checkpoints and such, but processing huge data volumes without the option of committing intermediate results is challenging, to say the least. You are bound to have long-running transactions, extensive locks, and so on. I really missed the option I had with Oracle functions of committing every 100,000 records….
For a while I kept asking the lead Postgres contributors “how much longer?”, and for a while they kept replying “in the next release”, until they just stopped replying…
So, the other day I was testing my new function, which builds a table out of multiple materialized views. For each INSERT I have a prepared statement, which is executed by a single EXECUTE operator. When the execution crashed because one of the materialized views that was supposed to be in place was not, I thought: well… now I need to start all over again… and to my surprise I saw that all the inserts that happened before the crash had persisted!
So, let’s reiterate. It may be the same function, but if the SQL statements are executed as generated statements using the EXECUTE operator, each execution is treated as a separate transaction! Which is pretty awesome, keeping in mind that we need to insert over 18 million records! And no, I do not mean I am going to run 18 million separate INSERTs 🙂