Ir al contenido principal

Ralsina.Me — El sitio web de Roberto Alsina

Excel formula parsers are hell

On 2004 I wrote a spread­sheet in python, which was about a 25KB down­load (com­pressed). It was pret­ty func­tion­al!.

The main prob­lem that would nev­er let that pro­gram be a re­al ap­pli­ca­tion was that it used python as a for­mu­la lan­guage in­stead of the tra­di­tion­al Lo­tus/Ex­cel thing.

Yes­ter­day I de­cid­ed to look at it again, and see if there was a way around it [1]. My main as­set was that I know a fair bit more about parsers now than I did then, and how hard could it be?

Well, it turns out that lan­guage is a whole lot hard­er than I knew!

I found this page about a pars­er

Here's an ex­am­ple from it:

=IF(R[39]C[11]>65,R[25]C[42],ROUND((R[11]C[11]*IF(OR(AND(R[39]C[11]>=55,
R[40]C[11]>=20),AND(R[40]C[11]>=20,R11C3="YES")),R[44]C[11],R[43]C[11]))+
(R[14]C[11] *IF(OR(AND(R[39]C[11]>=55,R[40]C[11]>=20),AND(R[40]C[11]>=20,
R11C3="YES")), R[45]C[11],R[43]C[11])),0))

Yes, that's a valid Ex­cel for­mu­la. No, I have no idea what it does.

And here's some ex­tra nuggets about the lan­guage (quot­ed from the same page):

  • For some rea­­son Ex­­cel us­es a com­­ma as the range union op­er­a­­tor. This means that its on­­ly un­am­bigu­ous use be­tween ranges is out­­­side of a func­­tion call or with­­in a sub­­ex­pres­­sion (i.e., be­tween paren­the­s­es).

  • For some oth­­er rea­­son Ex­­cel us­es a space (or mul­ti­­ple spaces) as the range in­­ter­sec­­tion op­er­a­­tor. While not as am­bigu­ous as the com­­ma, it does re­quire some con­sid­er­a­­tion.

And it goes on, and on, and on....

But hey, he wrote a nice parser!

Just for kick­s, here's the best BNF I could find.

I re­al­ly don't en­vy the KSpread au­thors if they try to be com­pat­i­ble to this garbage!

But why is this lan­guage so gnarly?

  • Be­­cause it has been grow­ing or­­gan­i­­cal­­ly for 30 years?

  • Be­­cause of back­­wards com­­pat­i­­bil­i­­ty?

  • Be­­cause it was nev­er de­signed?

  • All of the above?

I mean, it can't be in­ten­tion­al! Noone ac­tu­al­ly meant it to be like this, right?

Anonymous / 2007-05-22 10:44:

You forgot the worst part: the function names are locale dependant!

So if you find a function on the web, you can't just paste it into Excel (if you don't have the English UI), you have to translate it first...


Contents © 2000-2023 Roberto Alsina