Sunday 5 February 2012

SEQwiki integration with NeuroLex


Topic 1) Sharing data between SMWs:

There are three broad ways to do it (if not more):
** If you want to cut to the chase, skip to method 3.

1) Export / import of pages.

The most reliable but least 'dynamic' way to share content between SMWs is to use (automated?) export / import of pages using standard MW functionality (http://meta.wikimedia.org/wiki/Help:Export).
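To make the 'automated' part concrete, here is a minimal Python sketch that builds a Special:Export URL for a list of pages. The index.php location and the page titles are made up for illustration; the title/pages/curonly parameters are the standard ones for Special:Export, but I haven't run this against either wiki, so treat it as a starting point rather than a working sync tool.

```python
from urllib.parse import urlencode


def export_url(index_php, titles, current_only=True):
    """Build a Special:Export URL for a list of page titles.

    index_php is the wiki's index.php location (hypothetical here).
    """
    params = {
        "title": "Special:Export",
        # Special:Export expects one page title per line.
        "pages": "\n".join(titles),
    }
    if current_only:
        # Export only the latest revision, not the full history.
        params["curonly"] = "1"
    return index_php + "?" + urlencode(params)


# Hypothetical wiki location and page names, for illustration only.
url = export_url("http://example.org/w/index.php", ["Page A", "Page B"])
print(url)
```

The XML fetched from that URL would then have to be pushed through Special:Import on the other wiki, which is where the merging problems described below start.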

This isn't a great solution for several reasons:

  • By copying pages around, we require the wikis to synchronize the SMW structure precisely, including properties, templates, forms, etc. This isn't ideal, given the different focus of the two wikis, the effort required to define an agreed standard, and subsequent migration overhead required to meet the standard.
  • Most importantly, synchronization (merging) of edits on the two different wikis will cause problems unless carefully managed. We could code something to perform merging, or implement a policy that keeps edits separate. Both would take time to develop. There may be existing tools for doing this, but I don't know of any...



2) Nice but 'unsupported' tools.

There are a couple of solutions that look nice, but the underlying projects are in an uncertain state:

2.1) Distributed Semantic MediaWiki (DSMW)
http://www.mediawiki.org/wiki/Extension:DSMW

Looks impressive, but I can't even find the documentation any more! The purpose isn't (wasn't?) 100% identical to what we want, as it seems to be aimed at mirroring *all* content (or more precisely, all edits) between two SMW instances, rather than just a subset of pages or properties. It's possible it could be tailored to our needs, but I'm not sure how it would handle different user accounts on the two wikis, for example. I think it needs some time to investigate if it's a viable solution, and again, it probably requires us to harmonize data models.

2.2) Remote queries via the Exhibit extension
http://semantic-mediawiki.org/wiki/Help:Exhibit_format

This is a 'format' plugin for SMW's #ask query. It has a lot of nice functionality for faceted display of query results (for example, http://seqanswers.com/wiki/Service_Provider, see facets on the right).

This extension additionally has a simple mechanism for issuing #ask queries against a 'remote' SMW instance. This allows us to display (and potentially 'process') the results of a query on wiki X within wiki Y.

Unfortunately, the SMW format plugin project isn't being maintained, although the underlying 'exhibit' project is:
http://www.simile-widgets.org/exhibit/

However, the latter project isn't really relevant to our needs.

Perhaps we can just extract the 'remote' part of the code and create our own extension?


3) Using the External Data extension

This is the best solution I've found so far:
http://www.mediawiki.org/wiki/Extension:External_Data

Using this extension, we issue a query against the remote wiki (or a variety of other sources) and then display and/or annotate the results on the local wiki. This is dynamic, and flexible.

I've put a basic demonstration of a NeuroLex query on SEQwiki here:
http://seqanswers.com/wiki/NeuroLex_test

Configuring the query, the mapping between remote and local 'variables', and displaying and/or annotating the results is quite a painful process, but it does work (as demonstrated). To be clear, above I've only implemented displaying the results, not semantically annotating properties or creating 'objects'.
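To show what the 'mapping' step amounts to, here is a rough Python sketch of the idea, outside the wiki: take CSV returned by the remote wiki and rename the remote columns to local variable names. The column names and the sample row are invented for illustration (this is not real NeuroLex output), and the function is my own stand-in, not part of External Data.

```python
import csv
import io


def map_rows(csv_text, mapping):
    """Rename remote CSV columns to local variable names.

    mapping is {local_name: remote_column}. Remote columns that are
    not listed are dropped; missing columns become empty strings.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {local: row.get(remote, "") for local, remote in mapping.items()}
        for row in reader
    ]


# Illustrative data only; the header and values are assumptions.
sample = "Label,Id\nExample neuron,ex_1\n"
rows = map_rows(sample, {"name": "Label", "identifier": "Id"})
print(rows)
```

Having to spell out that local-to-remote dictionary by hand for every query is essentially the painful part.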

There are some clear and (relatively?) simple ways we can develop this extension to make it more useful and easier to deploy.

  • The mapping of remote and local 'variables' can be simplified (factored out).
  • The need for a separate annotation step could be removed, allowing us to 'replicate' remote properties and classes locally. This is a bit more complex, but doable.
  • Created objects should be viewable in Semantic Drilldown, which is a broader SIO/SD bug. Resolved by the native SIO support?



4) The semantic web solution?

It's annoying that SMW exports RDF and speaks SPARQL (internally), but can't function as a SPARQL endpoint out of the box, and has no support for issuing queries against remote endpoints. I've pestered the developers about this a few times (I'm not offering to do any work, so I can't complain) and it seems like all this IS possible by 'plumbing' existing functionality. It just requires a few good programming hours.

If I were in a position to magically create a solution, I'd pick this one, as it not only solves our problem succinctly, but also has the widest possible benefit to the SMW community, which directly feeds back to us.
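For a sense of how little is needed on the client side, here is a hedged Python sketch of querying a remote SPARQL endpoint. It prepares a standard SPARQL Protocol GET request and flattens the standard SPARQL JSON results format. The endpoint URL and the sample response are hypothetical, since neither wiki exposes an endpoint today.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def sparql_request(endpoint, query):
    """Prepare (but do not send) a SPARQL Protocol GET request."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the W3C SPARQL JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def bindings(results_json):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    data = json.loads(results_json)
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in data["results"]["bindings"]
    ]


# Hypothetical endpoint; pass req to urllib.request.urlopen() to send it.
req = sparql_request("http://example.org/sparql", "SELECT ?s WHERE { ?s ?p ?o } LIMIT 1")

# Hand-written sample response, for illustration only.
sample = '{"head": {"vars": ["s"]}, "results": {"bindings": [{"s": {"type": "uri", "value": "http://example.org/a"}}]}}'
print(bindings(sample))
```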

UPDATE:

I forgot to cover the various RDF/SPARQL enabled extensions that do this (please add comments and I'll work things in):


I got an email from Christian Becker mentioning this work:
http://www4.wiwiss.fu-berlin.de/bizer/ldif/

and previously I've seen the following work by Alfredas Chmieliauskas and Chris Davis:




5) Nice and supported tools:

SMW's Ask API is in its early stages, but avoids the messy URLs involved in solution 3.
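As a rough idea of what a call could look like, here is a Python sketch that builds an api.php request with action=ask and reads page titles out of a JSON reply. The wiki location, category and property names are invented, and the response layout (a "query" object with "results" keyed by page title) is my assumption about the API's output and may well change while it is this young.

```python
import json
from urllib.parse import urlencode


def ask_api_url(api_php, conditions, printouts):
    """Build an api.php?action=ask URL from conditions and printouts."""
    # Printouts are appended to the conditions as |?Property.
    query = conditions + "".join("|?" + p for p in printouts)
    return api_php + "?" + urlencode(
        {"action": "ask", "query": query, "format": "json"}
    )


def result_titles(response_json):
    """Return the sorted page titles from an (assumed) ask response."""
    data = json.loads(response_json)
    # Works whether "results" is an object keyed by title or an empty list.
    return sorted(data["query"]["results"])


# Hypothetical wiki, category and property names.
url = ask_api_url("http://example.org/w/api.php", "[[Category:Example]]", ["Has name"])

# Hand-written sample reply; the shape is an assumption, not captured output.
sample = '{"query": {"results": {"Page B": {"printouts": {}}, "Page A": {"printouts": {}}}}}'
print(url)
print(result_titles(sample))
```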


From Jesse Wang:
        I believe Wiki Object Model (WOM) can help here. It allows you to do a WOM API call. It forwards the query to the remote site (by WOM / ask API / ...), to get JSON data (or other supported formats) back. It requires both wikis to be running WOM, so maybe it isn't a universal solution.




SPECIFIC TODO
* Configure SIO from the resulting query results.
* Think about how to implement the 'recreate' function vis. SIO
