I’ve pretty much got my setup as needed now: parallel Jena and AllegroGraph triple stores, each with an identical set of 21 million triples. So I’m doing experiments with queries - trying to find good tests that demonstrate particular points I want to make. I’m having some trouble with AG queries - my setup isn’t quite right I think, but Franz are being very helpful. With Jena I’m running queries via JDBC and having a lot of grief over Java heap space in Eclipse. Inching forwards though…

Now I should do the conversion and loading of thesauri I think. I need that in place before I can do my query summarisation stuff. Designing the schema should be pretty straightforward, but I’ve found a couple of papers on the subject from the Amsterdam people, so I’ll check out their thoughts first. I’m hoping it’ll only take a day or two to do the whole thing: design schema, export from Oracle in the right format, load into my two stores. The plan is to use separate named graphs within the same physical store.
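As a rough sketch of the named-graph idea: once the thesauri sit in their own graph alongside the main data, a query can pull terms from one graph and their labels from the other. Everything specific here is an assumption for illustration only: the graph URIs are made up, the helper `label_lookup_query` is hypothetical, and SKOS `prefLabel` is just one plausible way a thesaurus schema might carry labels, not the schema actually chosen.

```python
# Hypothetical graph URIs, for illustration only (not the real store's names).
DATA_GRAPH = "http://example.org/graph/sites"
THESAURUS_GRAPH = "http://example.org/graph/thesaurus"


def label_lookup_query(data_graph: str, thesaurus_graph: str, limit: int = 10) -> str:
    """Build a SPARQL query that joins two named graphs in the same store.

    Assumes (hypothetically) that thesaurus terms carry skos:prefLabel.
    """
    return f"""PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?record ?term ?label
WHERE {{
  GRAPH <{data_graph}> {{ ?record ?p ?term . }}
  GRAPH <{thesaurus_graph}> {{ ?term skos:prefLabel ?label . }}
}}
LIMIT {limit}"""


if __name__ == "__main__":
    # Print the query text so it can be pasted into either store's query tool.
    print(label_lookup_query(DATA_GRAPH, THESAURUS_GRAPH))
```

Keeping each thesaurus in its own graph means the same query text should work against both stores, with only the graph URIs changing.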

I need to write up my query experiments in time for the IACH deadline in August, as they’ve accepted my submission. My attempt for ISWC wasn’t in the 16% of good ones, alas. Pity, as I’d have liked to go, but actually ECDL might be more useful in terms of meeting people. It’s certainly likely to be a more familiar milieu for me.

Lots of things have been happening over the last few weeks but none is relevant here. Now I’m getting back in gear, picking up the loose threads, climbing the learning curve and applying various similar metaphors. I want to get my AllegroGraph and Jena databases fully in sync with each other, with copies on DICE and on my laptop. Then I’ll get on with query experiments, which is the penultimate big task remaining.

Today I installed the latest version of Jena and ARQ on DICE (and upgraded the quite recent version on laurie the laptop, as there are improvements in ARQ that are worth having). I’ve got a Jena reload running on DICE, which should be finished sometime mid-morning tomorrow. That should bring the Jena version into line with the AG one, with 21,152,388 triples. I installed AllegroGraph 3 on DICE too, and copied across the latest version of the data (to my scratch disk at present as it’s so big, which isn’t ideal…). Once all the jobs are finished tomorrow I just need to double-check the versions are all just so, and run a few test queries on each setup. Then I can get down to designing the queries I want to use.
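One simple way to do the double-check is to run a count query on each setup and compare the results against the expected 21,152,388 triples. The sketch below is only that, a sketch: `COUNT_QUERY` uses SPARQL 1.1 aggregate syntax, which an older store may not accept, and `out_of_sync` is a hypothetical helper that takes counts obtained by whatever means each store offers.

```python
# Expected size of each store, taken from the AllegroGraph load.
EXPECTED_TRIPLES = 21_152_388

# SPARQL 1.1 aggregate syntax (an assumption: older engines may need
# their own extension syntax or a separate count tool instead).
COUNT_QUERY = "SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }"


def out_of_sync(counts: dict, expected: int = EXPECTED_TRIPLES) -> dict:
    """Return {store name: difference from expected} for stores that disagree.

    `counts` maps a store label to the triple count reported by that store.
    An empty result means every store matches the expected total.
    """
    return {name: n - expected for name, n in counts.items() if n != expected}
```

An empty dictionary back means all the copies agree; anything else shows which store is short (negative) or over (positive) and by how much.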

I’ve done some work on the AllegroGraph querying front. I updated the Java interface files and revised my query programs for the new API. It seems to work… but it’s incredibly slow. I need to do some more experimenting.

Today I wrote an extended abstract to submit to IACH, whose deadline was today, 8th June 2008. Or so I thought… I found when I went to submit that it had already closed, presumably at midnight last night. Seems a funny interpretation of “June 8th” to me (I’ve double-checked the calendar and that really is today’s date!) so I’ve sent it by email with a plea for their understanding.

Yet more EPIC revisions done, and n thousand words dropped into the technical appendix. We had an encouraging message from ADS about that and Claire’s now sending it on its way. I wonder if they’ll fund it? It’s stuff that ought to be done but I have to admit there’s a tiny corner of me that won’t mind if it doesn’t come off, as I sometimes think wistfully that it might be nice to be paid the way other people are. But I’m feeling so much encouraged by Catherine thinking the technical appendix is ok, that I don’t really care about such sordid things!

I expect those ignoble thoughts intruded just for a moment because I’m a bit tired. I really fancy a bit of kip. However, I’m working this afternoon and till Friday at the airfield, doing one of the “flight test” courses for Strathclyde University. One does need a leetle bit of sordid cash. If I can manage not to fall asleep as soon as I finish in the evenings I need to get down to some SPARQL experiments so I can write something to send to the IACH workshop for their Sunday deadline. The idea, as with the RDB2RDF stuff, is to get another chapter written, though I would rather like to go to ECDL (digital libraries) where the workshop’s being held.

Next Tuesday I go down to Wales for a couple of days for SWISH meetings (must read the papers for that…) and then we’re off on holiday for a couple of weeks (Yes!), calling in on my sister on the way back to help her with some house clearing work for a few days. Mustn’t forget to pack up my BP stuff at some point too. All sorts of housekeeping jobs to do before we go away, that I’ve been letting slide while I’ve been a bit busy.

And now, as we pilots say, I must fly!

I was working all last week, running a holiday course at the airfield. In the evenings I tried to resist the body’s natural urge to flake out and fall asleep - it’s physically demanding work. (And immensely worthwhile-feeling.) Many hours went into my response to the CAA’s Mode S proposals and a letter to my MP on the subject. Less worthwhile-feeling, but it had to be done. Being on duty yesterday was bad timing, but it rained in the afternoon so it was a short day. This week I’m only working from Wednesday lunchtime till Friday.

I had another play with AllegroGraph over the weekend and tidied everything up: reinstalled the newest build of 3.0 that I’d been sent and made sure all the API stuff was consistent with it. I realised that the Java SPARQL interface has changed, which may account for why my query program was having problems. A quick SPARQL test showed that my 32-bit laptop can access the database created with the 64-bit software, which is a relief as torridon doesn’t seem very usable. I need to rewrite the queries using the new API - I hope to get to that on Wednesday.

Today has gone into more tinkering with the EPIC bid and probably tomorrow will too. Better get back to it now, in fact.

Another revision of the EPIC case for support done and circulated, and today I drafted the objectives and summary pieces. Had an interesting presentation on D2RQ in the rdb2rdf telecon on Friday. Have given up torridon as a bit of a bad job after another try on it. I’ve copied the tether database from it to my laptop and will have another go at it on there asap. I thought I might have a go at something for the IACH workshop at ECDL, which would mean getting some query experiment results. The deadline is 8th June, just after the AHRC one on Thurs 5th. My next priority is to do my response to the CAA consultation on Mode S transponders, which is due this week. In the coming week I’m working at the airfield so only able to do PhD work in the evenings. I’ll only have occasional Web access.

Bit of a mixed bag today. I went to the first couple of talks in the NeSC Symposium on Provenance in Databases, but I didn’t have time to stay for the rest of the day, as I wanted to see the RCAHMS Faces and Places exhibition at the National Portrait Gallery in advance of going to a lunchtime lecture about it at the National Gallery. In the afternoon Claire and I reviewed the EPIC documents ahead of a meeting tomorrow, and I’ve spent the rest of the day revising the Case for Support. Still not quite finished.

As soon as I get a moment I really want to get on with AllegroGraph experiments. I’m finding torridon too awkward to use and I need to find a workaround, but I haven’t had time yet.

I’ve been experimenting with 64-bit AllegroGraph on torridon and the good news is that the data loads and the indexing worked, I think (though with some odd messages). I’m having trouble running SPARQL queries though. I’d been hoping the servers would be feeling more like themselves today but it seems not - at any rate torridon is an extremely tiresome beast to work with, and I’m really hoping it’s not always like that. I shall try again tomorrow.

Still doing a few odds and ends on the EPIC proposal and other housekeeping stuff.

Mostly writing last week, for a Friday deadline - useful to get my database to RDF conversion stuff down on paper. There’s a fair bit that wouldn’t fit in the paper’s page limit, so I’ll need to write that up too. We had a meeting for EPIC last Monday and have another one scheduled for Thursday.

Today I downloaded the 64-bit version of AllegroGraph, installed it and started the loading and indexing jobs. As they’ve been upgrading machines today it has been rather a frustrating day to try to get anything done as the servers keep hanging. I didn’t manage to get much done, though I wasted a lot of time trying. Better luck tomorrow I hope.

Yesterday I went to a really good workshop about ontologies in cultural heritage, hosted by the Archaeology Data Service in York. Excellent mixture of people: NLP experts, heritage thesauri people, archaeologists. It was a pity RCAHMS didn’t send someone themselves, as well as alerting me to it; but I can report back to them. The ADS have clearly realised that NLP can really help in cultural data management, and are taking practical steps about it.

Lots of work done on the EPIC proposal, which is taking shape reasonably I think. I’ve also got access to a 64-bit machine now, and a download link from Franz for the 64-bit version of AllegroGraph, so I must install and test that as soon as I get a moment.

The priority in the coming week though is to try to write a paper about my RDB to RDF conversion work by Friday.
