Wednesday, October 3, 2007

@BFNH This is a Big Fat Nasty Hack. Be ye' warned!

The above is my recent favorite @ tag. We all come upon those times when, no matter how much we try to avoid it, a Big Fat Nasty Hack is required. We try to make them pretty. We roll them in perfume, wrap them in silk and adorn them with flowers, but they remain BIG, FAT and NASTY. It's like a warthog in a wedding dress. Let's just call them what they are. The @BFNH tag calls these hacks out and exposes them to the world. Use it wisely.

I ran across this situation most recently when trying to insert cells into an Excel workbook that contained functions with references to named ranges. It would seem that Apache POI does not really support this. POI can read these cells in, but it is unable to provide a valid "function value" for the cell when calling HSSFCell.getFunctionValue(). Instead of a good function string, it gives something like "COUNT( NO IDEA )" or some such nonsense. Not being able to produce a function string for this type of function call, it also fails to parse a function of this type for insertion. I really needed this functionality. As I described in my previous post, I came upon a way to provide dynamic charts in Excel from Java. Seeing what could be done with this, my users went a little over-the-top. Excel reports with charts are something they have been requesting for a while, only to be rejected. So, I set upon the creation of the plethora of new "must have" reports. Many of the reports they want involved breaking up data into ranges and reporting on range statistics. They also wanted to be able to play with the raw data and see the results in the charts. This was a task.

What I decided to do was create named ranges referencing the desired "raw" data ranges for periods. Because this data is being queried based on the users parameters, the ranges must be calculated and created on the server. I planned to then reference these ranges in the cells of a table of period statistics. This would free me from calculating the statistics on the server and allow Excel to alter the charts as data in the ranges is changed. The only thing that was missing was how to go about setting a cell value to something like "MEDIAN("MyWorkbook.xls"!MyNamedRange)". I looked everywhere for a solution. I scoured the POI mailing lists and APIs, I tried a number of variations on the formula string, I searched Krugle like mad and I got nowhere. Not getting any response on lists or forums, and being one to take defeat likely, I took up my trusty debugger and set to probing the depths of POI. Let me warn those of weak constitution to stop reading here. The resulting code involves accessing the inaccessible, breaking the rules that the makers have set out for us and generally doing nasty things.

I created a number of cells that contained references of the type that I required, then created expressions in the debugger to expose the internal representation of the file that POI uses. I found that formulas are parsed into a list of references (to functions, ranges, etc.). I found that I could create formulas that resembled the ones that I desired, using a dummy static reference as a place holder and then swap the static reference for a dynamic one of my creation. This converts a formula like "AVERAGE(A:A)" to something like "AVERAGE("MyWorkbook.xls"!MyNamedRange)". The thing is, the POI guys/gals didn't really intend these things to be accessed directly. So, naturally, I promptly went against their wishes. The following is the result. Use it at your own risk, observe the comments and stated limitations, expand it, generalize it, whatever...but I would appreciate it if you relay any improvements back to me as well. ;)

    /**
* The purpose of this method is to HACK OUT a static range reference from
* POI's internal representation of a single range parameter function cell
* and replace it with a reference to a existing named range. Because the
* necessary fields and methods are not meant for general use, they are
* inaccessible. As a result, reflection was used to obtain the necessary
* references. This provides an opportunity for undetected errors to creep
* in, as changes to these methods would not be detectable at compile time.
* Furthermore, as non-accessible methods, they are more likely to be
* subject to change with future releases.
*
* The process for replacement of references is as follows: - Retrieve the
* FormulaRecordAggregate from the target cell. - Reduce the FormulaRecord's
* expression length by 2 (area references are larger than name references). -
* Set the calculate on load option for the cell to true (options = 2). -
* Create a new named range reference and set field_2_ilbl to the index of
* the desired name in the workbook's name table. - Remove the static area
* reference from the parsed expression (index = 0). - Insert the newly
* created named range reference into the parsed expression (index = 0). -
* Smile.
*
* @BFNH This is a Big Fat Nasty Hack. Be ye' warned!
* @param results the workbook being dealt with
* @param targetcell the cell in which the formula reference
* @param nameName the name of the named range to reference
*/
@SuppressWarnings("unchecked")
protected void hackFunctionCellToSetNewNamePointer(HSSFWorkbook results,
HSSFCell targetcell, String name) {
try {
short nameIndex = (short) (results.getNameIndex(name) + 1);

// Get the original formula record aggregate. This must be done
// via reflection, as it is not an accessible method.
Method method = HSSFCell.class.getDeclaredMethod(
"getCellValueRecord", new Class[] {});
method.setAccessible(true);
FormulaRecordAggregate original = (FormulaRecordAggregate) method
.invoke(targetcell);

// A static area reference is a couple of bytes larger than a name
// reference. So, if we don't reduce the formula record size, we
// will get corruption of the produced file.
List parsedExpression = original.getFormulaRecord()
.getParsedExpression();
short length = (short) (original.getFormulaRecord()
.getExpressionLength() - 2);
original.getFormulaRecord().setExpressionLength(length);

// Now we want to make sure that we cause the cell to be calculated
// on-load. The following line should accomplish that. Note, that
// if you have other options on the cell, they will be wiped out by
// this. Chances are that won't effect you though.
original.getFormulaRecord().setOptions((short) 2);

// Now we create a name pointer...the constructor I'm using here has
// not been implemented yet. If it were, we would likely not need to
// set the "field_2_ilbl" value. From what I can tell, this field is
// essentially the pointer to the name table in the workbook. We
// give it the name index and leave the rest alone.
NameXPtg nameptr = new NameXPtg(name);
Field nameTableIndex = NameXPtg.class
.getDeclaredField("field_2_ilbl");
nameTableIndex.setAccessible(true);
nameTableIndex.setShort(nameptr, nameIndex);

// Now we remove the static area reference from the parsed
// expression and add our name pointer in it's place. Please note
// that I have only tried this with functions that take a single
// name pointer and of those, only functions with 1 or 2 total
// parameters. In the case of two parameters, I have only tested
// when the second parameter is a constant. For example,
// This should work with: COUNT("MyFile.xls"!MyName) or
// AVERAGE("MyFile.xls"!MyName) or even
// QUARTILE("MyFile.xls"!MyName, 1). I would imagine that the
// process is similar for more complex function cells, but this is
// all I have needed, so that's as far as I've gone. My guess is
// that the length would have to be decremented once for each Area
// to Name conversion you do and I have no speculation about where
// the other area pointers might land in the parsed expression (in
// terms of their index).
parsedExpression.remove(0);
parsedExpression.add(0, nameptr);
} catch (Exception e) {
// You screwed up and I don't care...write your own error handling.
// ;)
log.error("Bad developer...BAD!");
}
}

No comments: