Reverse Engineer Apache Jackrabbit Setup

travisdh1

@anthonyh said in Reverse Engineer Apache Jackrabbit Setup:

@travisdh1 said in Reverse Engineer Apache Jackrabbit Setup:

@anthonyh said in Reverse Engineer Apache Jackrabbit Setup:

@travisdh1 said in Reverse Engineer Apache Jackrabbit Setup:

@anthonyh said in Reverse Engineer Apache Jackrabbit Setup:

@dafyre said in Reverse Engineer Apache Jackrabbit Setup:

@anthonyh said in Reverse Engineer Apache Jackrabbit Setup:

I think I may go with a less elegant method, but one I can put together more quickly.

I discovered that once I'm logged into the system (it's web based), I can simply browse to the document retrieval URL and stick the appropriate document ID in said URL. This will spit out said document.

I can script this via Lynx on a Linux VM relatively easily.

All we need to do is dump the desired document IDs to a list that I can then read on the Lynx side and, boom, we'll have the docs to do with as we please.
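For anyone who would rather script this in Python than Lynx, here is a rough sketch of the same loop: read the ID list, build each retrieval URL, and save whatever comes back. Everything specific in it is an assumption, not something from the actual system: the URL template, the `docId` parameter name, the idea that a session cookie copied from a logged-in browser is enough to authenticate, and the one-ID-per-line list format.

```python
import urllib.parse
import urllib.request
from pathlib import Path

# Hypothetical retrieval URL; substitute the one the application really uses.
URL_TEMPLATE = "https://docs.example.invalid/retrieve?docId={doc_id}"


def read_ids(list_path):
    """Return the stripped, non-empty lines of the ID list, skipping '#' comments."""
    ids = []
    for line in Path(list_path).read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            ids.append(line)
    return ids


def build_url(doc_id, template=URL_TEMPLATE):
    """Insert a URL-quoted document ID into the template."""
    return template.format(doc_id=urllib.parse.quote(doc_id, safe=""))


def http_fetch(url, session_cookie):
    """Fetch one URL, sending the logged-in session cookie (assumes cookie auth)."""
    request = urllib.request.Request(url, headers={"Cookie": session_cookie})
    with urllib.request.urlopen(request, timeout=60) as response:
        return response.read()


def download_all(ids, out_dir, fetch):
    """Save each document as <out_dir>/<id>.bin and return the IDs that failed."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    failed = []
    for doc_id in ids:
        try:
            data = fetch(build_url(doc_id))
        except OSError:  # urllib.error.URLError is a subclass of OSError
            failed.append(doc_id)
            continue
        # Quote the ID again so it is always safe to use as a file name.
        name = urllib.parse.quote(doc_id, safe="")
        (out / f"{name}.bin").write_bytes(data)
    return failed
```

It would be called along the lines of `download_all(read_ids("ids.txt"), "docs", lambda url: http_fetch(url, cookie))`, where `cookie` is whatever cookie string the browser session holds. Passing the fetch function in keeps the loop testable without touching the real server, and the returned list of failed IDs gives you something to retry.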

You could also browse the database tables and figure out where said document IDs live, that way you can simply pull straight from the DB.

If I could do that, I would. The DB is in no way, shape, or form readable by anything other than Jackrabbit. This was just confirmed by the vendor of the system. They actually just suggested exactly what I'm working on doing (after my boss had what he calls a "come to Jesus" moment with them).

Hrm, let me guess, they're storing entire tables of values from PHP in single database columns? That is so very highly annoying, and goes against everything relational databases are supposed to be. I've had bad experiences with this in Drupal myself.

No, it's not doing that. What it's doing kinda makes sense (at least from the limited sleuthing knowledge I have), it's just organized for Jackrabbit and not for a human. There are 6 tables:

GLOBAL_REVISION - Not sure what this is, we only have one record here. I believe it has to do with clustering (there are 4 app servers and Jackrabbit runs on each app).
JOURNAL - I believe this is something to do with clustering as well.
BINVAL - Where the documents are stored, I believe. There are two columns, BINVAL_ID and BINVAL_DATA.
BUNDLE - Not sure what this is.
NAMES - A reference table for various object names.
REFS - Empty in our implementation.

From what I've researched, the docs are stored in hexadecimal format. However, when I pull the BINVAL_DATA field for a given record and convert from hex to binary, the file is unreadable. Even if I could successfully convert the doc, the IDs for these records do not correspond to the IDs that we see on the front-end. I have not found any sort of relationship table/list in the front-end database; I suspect it's all done via Jackrabbit.
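A quick way to sanity-check the hex conversion is to decode the value and look at the first few bytes, since common file types start with a well-known signature. This is only a sketch: it assumes the BINVAL_DATA value was exported as a hex string (possibly with a `0x` or `\x` prefix, depending on the DB client), and the helper names are made up for illustration. It says nothing about how Jackrabbit actually lays out the column.

```python
# Well-known leading "magic" bytes for a few common file formats.
SIGNATURES = {
    b"%PDF": "pdf",
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"GIF8": "gif",
    b"PK\x03\x04": "zip (also docx/xlsx)",
}


def hex_to_bytes(hex_text):
    """Decode a hex dump to bytes, ignoring whitespace and a 0x or \\x prefix."""
    cleaned = "".join(hex_text.split())
    if cleaned[:2] in ("0x", "0X", "\\x"):
        cleaned = cleaned[2:]
    # Raises ValueError if the text is not valid hex (odd length, stray chars).
    return bytes.fromhex(cleaned)


def sniff(data):
    """Return a format name if the data starts with a known signature, else None."""
    for magic, name in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return None
```

If `sniff(hex_to_bytes(value))` comes back `None` for every record, that suggests the column does not hold a plain copy of the file (it might be wrapped, compressed, or not hex at all), which would fit with the converted output being unreadable.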

BINVAL_DATA is probably the raw jpg/gif/whatever; I'd be surprised if you needed to convert it.

Overall, Jackrabbit sounds like it was designed horribly, and you've found the best option out of the bad choices you have.

Looks like BINVAL_DATA is a byte array type. The link below, though not Jackrabbit specific, shows how to convert between a file and a byte array.

http://www.programcreek.com/2009/02/java-convert-a-file-to-byte-array-then-convert-byte-array-to-a-file/
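The linked article is Java. For comparison, the same file-to-bytes-and-back round trip is only a couple of lines in Python. This is a generic sketch with made-up helper names, not anything Jackrabbit specific:

```python
from pathlib import Path


def file_to_bytes(path):
    """Read a whole file into an immutable bytes object."""
    return Path(path).read_bytes()


def bytes_to_file(data, path):
    """Write bytes to a file, replacing it if it exists; return the byte count."""
    return Path(path).write_bytes(data)
```

If the bytes pulled from the column really are the raw file, then passing them to `bytes_to_file` with the right extension should produce something that opens normally.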

The more I find out about this thing, the more my dislike is turning to hate.... just saying.

anthonyh

lol @travisdh1