Showing posts with label POI. Show all posts
Showing posts with label POI. Show all posts

Wednesday, October 3, 2007

@BFNH This is a Big Fat Nasty Hack. Be ye' warned!

The above is my recent favorite @ tag. We all come upon those times when, no matter how much we try to avoid it, a Big Fat Nasty Hack is required. We try to make them pretty. We roll them in perfume, wrap them in silk and adorn them with flowers, but they remain BIG, FAT and NASTY. It's like a warthog in a wedding dress. Let's just call them what they are. The @BFNH tag calls these hacks out and exposes them to the world. Use it wisely.

I ran across this situation most recently when trying to insert cells into an Excel workbook that contained functions with references to named ranges. It would seem that Apache POI does not really support this. POI can read these cells in, but it is unable to provide a valid "function value" for the cell when calling HSSFCell.getFunctionValue(). Instead of a good function string, it gives something like "COUNT( NO IDEA )" or some such nonsense. Not being able to produce a function string for this type of function call, it also fails to parse a function of this type for insertion. I really needed this functionality. As I described in my previous post, I came upon a way to provide dynamic charts in Excel from Java. Seeing what could be done with this, my users went a little over-the-top. Excel reports with charts are something they have been requesting for a while, only to be rejected. So, I set upon the creation of the plethora of new "must have" reports. Many of the reports they want involved breaking up data into ranges and reporting on range statistics. They also wanted to be able to play with the raw data and see the results in the charts. This was a task.

What I decided to do was create named ranges referencing the desired "raw" data ranges for periods. Because this data is being queried based on the users parameters, the ranges must be calculated and created on the server. I planned to then reference these ranges in the cells of a table of period statistics. This would free me from calculating the statistics on the server and allow Excel to alter the charts as data in the ranges is changed. The only thing that was missing was how to go about setting a cell value to something like "MEDIAN("MyWorkbook.xls"!MyNamedRange)". I looked everywhere for a solution. I scoured the POI mailing lists and APIs, I tried a number of variations on the formula string, I searched Krugle like mad and I got nowhere. Not getting any response on lists or forums, and being one to take defeat likely, I took up my trusty debugger and set to probing the depths of POI. Let me warn those of weak constitution to stop reading here. The resulting code involves accessing the inaccessible, breaking the rules that the makers have set out for us and generally doing nasty things.

I created a number of cells that contained references of the type that I required, then created expressions in the debugger to expose the internal representation of the file that POI uses. I found that formulas are parsed into a list of references (to functions, ranges, etc.). I found that I could create formulas that resembled the ones that I desired, using a dummy static reference as a place holder and then swap the static reference for a dynamic one of my creation. This converts a formula like "AVERAGE(A:A)" to something like "AVERAGE("MyWorkbook.xls"!MyNamedRange)". The thing is, the POI guys/gals didn't really intend these things to be accessed directly. So, naturally, I promptly went against their wishes. The following is the result. Use it at your own risk, observe the comments and stated limitations, expand it, generalize it, whatever...but I would appreciate it if you relay any improvements back to me as well. ;)

    /**
* The purpose of this method is to HACK OUT a static range reference from
* POI's internal representation of a single range parameter function cell
* and replace it with a reference to a existing named range. Because the
* necessary fields and methods are not meant for general use, they are
* inaccessible. As a result, reflection was used to obtain the necessary
* references. This provides an opportunity for undetected errors to creep
* in, as changes to these methods would not be detectable at compile time.
* Furthermore, as non-accessible methods, they are more likely to be
* subject to change with future releases.
*
* The process for replacement of references is as follows: - Retrieve the
* FormulaRecordAggregate from the target cell. - Reduce the FormulaRecord's
* expression length by 2 (area references are larger than name references). -
* Set the calculate on load option for the cell to true (options = 2). -
* Create a new named range reference and set field_2_ilbl to the index of
* the desired name in the workbook's name table. - Remove the static area
* reference from the parsed expression (index = 0). - Insert the newly
* created named range reference into the parsed expression (index = 0). -
* Smile.
*
* @BFNH This is a Big Fat Nasty Hack. Be ye' warned!
* @param results the workbook being dealt with
* @param targetcell the cell in which the formula reference
* @param nameName the name of the named range to reference
*/
@SuppressWarnings("unchecked")
protected void hackFunctionCellToSetNewNamePointer(HSSFWorkbook results,
HSSFCell targetcell, String name) {
try {
short nameIndex = (short) (results.getNameIndex(name) + 1);

// Get the original formula record aggregate. This must be done
// via reflection, as it is not an accessible method.
Method method = HSSFCell.class.getDeclaredMethod(
"getCellValueRecord", new Class[] {});
method.setAccessible(true);
FormulaRecordAggregate original = (FormulaRecordAggregate) method
.invoke(targetcell);

// A static area reference is a couple of bytes larger than a name
// reference. So, if we don't reduce the formula record size, we
// will get corruption of the produced file.
List parsedExpression = original.getFormulaRecord()
.getParsedExpression();
short length = (short) (original.getFormulaRecord()
.getExpressionLength() - 2);
original.getFormulaRecord().setExpressionLength(length);

// Now we want to make sure that we cause the cell to be calculated
// on-load. The following line should accomplish that. Note, that
// if you have other options on the cell, they will be wiped out by
// this. Chances are that won't effect you though.
original.getFormulaRecord().setOptions((short) 2);

// Now we create a name pointer...the constructor I'm using here has
// not been implemented yet. If it were, we would likely not need to
// set the "field_2_ilbl" value. From what I can tell, this field is
// essentially the pointer to the name table in the workbook. We
// give it the name index and leave the rest alone.
NameXPtg nameptr = new NameXPtg(name);
Field nameTableIndex = NameXPtg.class
.getDeclaredField("field_2_ilbl");
nameTableIndex.setAccessible(true);
nameTableIndex.setShort(nameptr, nameIndex);

// Now we remove the static area reference from the parsed
// expression and add our name pointer in it's place. Please note
// that I have only tried this with functions that take a single
// name pointer and of those, only functions with 1 or 2 total
// parameters. In the case of two parameters, I have only tested
// when the second parameter is a constant. For example,
// This should work with: COUNT("MyFile.xls"!MyName) or
// AVERAGE("MyFile.xls"!MyName) or even
// QUARTILE("MyFile.xls"!MyName, 1). I would imagine that the
// process is similar for more complex function cells, but this is
// all I have needed, so that's as far as I've gone. My guess is
// that the length would have to be decremented once for each Area
// to Name conversion you do and I have no speculation about where
// the other area pointers might land in the parsed expression (in
// terms of their index).
parsedExpression.remove(0);
parsedExpression.add(0, nameptr);
} catch (Exception e) {
// You screwed up and I don't care...write your own error handling.
// ;)
log.error("Bad developer...BAD!");
}
}

Friday, August 10, 2007

Excel Charting in Java

Ahh, Excel...  I hate it.  I find it rather unintuitive, to be honest.  No, no no...that is not quite saying enough.  I find it nearly impossible to use.  Business folk however, love it.  In every company that I have worked for, every user group I have served has, eventually, asked for something to be output to Excel.  Of course, for us Java folk, there is good old Apache POI.  Now, POI is pretty good.  I've used it in a number of projects and have not had any stability issues.  Performance is generally acceptable when dealing with reasonable amounts of data.  It does however, have a good number of limitations.  A large subset of the Excel functions are still unimplemented and there is almost no support for charts (To be specific: charts will be preserved but cannot be altered).  This is often a big stumbling block.  Furthermore, building spreadsheets with method calls is not a very efficient process for the developer.  Creating just a single spreadsheet can require a lot of code.

This is the problem I found myself up against recently.  In searching for a possible solution, I came up with a couple of really useful finds.  The first of which was jXLS.  jXLS is an excel templating framework of sorts.  It allows a designer to insert tags into a template spreadsheet which will be interpreted by jXLS and populated appropriately.  The tags utilize an expression language very similar to the EL used in JSPs or XSLT.  Developers who are comfortable with either of these technologies should feel right at home with jXLS.

Basically, you can create an excel sheet and throw in some expressions like so:

jXLSexpressions.png

Then you run in through jXLS with something similar to:

Map beans = new HashMap();
beans.put("item", items);
XLSTransformer transformer = new XLSTransformer();
HSSFWorkbook results  = transformer.transformXLS(Thread.currentThread().getContextClassLoader()
    .getResourceAsStream("daysToProvideInstructions.xls"), beans);

As you can see by the fact that we get a HSSFWorkbook back from this method call, jXLS is based around POI, but it does the having lifting for the data insertion.  If you want to shove that file back out via a HttpServletResponse, you don't need much:

response.setContentType("application/vnd.ms-excel");
response.setHeader("Content-Disposition","attachment; filename=warrantyAgingReport.xls");
results.write(response.getOutputStream());

Put all this in your web app and you'll get:

jXLSpopulated.png

So, jXLS gives me a nice and easy solution for populating a spreadsheet with data...but, we still cannot edit the ranges of any tables that we create.  Furthermore, if we want functions to be evaluated when the spreadsheet is opened, we have to evaluate them in POI, right?  Named Ranges to the rescue!  It just so happens that we can define a dynamic range in Excel and give it a name.  We can then use those named ranges in charts, functions, etc.  This results in a those items being dynamically evaluated and gives us the results we want.  So following Micro$oft's own instructions on the matter, we can create a couple of ranges in our template spreadsheet like so:

namedExcelRanges.png

Using those ranges in a chart gives us the following:

dynamicallyRangedChart.png

Now we have the dynamic Excel charts we need, generated via our Java web application with very little excess effort.  So, while I still hate Excel...today I hate it from Java just a little less. :)