Monday 12 November 2012

A Perl client and REST wrapper for the CiteXplore SOAP webservices

OBSOLETE! See the official EuropePMC REST API which is now used within SEQwiki in this template.



As part of my ongoing plan for world domination, I decided to write a REST wrapper for the SOAP webservice provided by CiteXplore, the EBI's Citation Database. At first I thought this might be a daunting task. However, CPAN and the nice people on freenode came to the rescue, as usual.


First, I installed SOAP::WSDL locally using my usual recipe, and then simply pointed the handy script it provides at the WSDL for the CiteXplore SOAP webservices:

wsdl2perl.pl -b CXPSC URL


This 'auto-created' the Perl client for the webservice. What impressed me most was the resulting perldoc, that told me exactly how to use the Perl client. For example, typing:

perldoc CXPSC/MyInterfaces/WSCitationImplService/WSCitationImplPort.pm


gave this output. Using the POD and CiteXplore's Web Services Reference Guide I created a simple test script that even works!


Next came the task of wrapping these functions in a REST server. Some people on freenode told me to use Mojolicious::Lite. Although it's trendy, it also seems to work with minimal fuss.

Here it is, but it isn't pretty:
https://github.com/dbolser/Perl-CiteXplore-SOAP-client/blob/master/rest_script.plx


Finally I used the newly created REST interface to pull citation data into SEQwiki using the External Data extension. Setting up the execution environment using App::perlbrew (my webhost didn't have the requisite version of Perl for Mojo) was also much easier than I imagined.


THE END

Thursday 1 November 2012

MySQL, what are the values for this column in all tables?

If you have a complex schema, you may have an ID column that is used in several tables. Lets say you want to count all values of all such columns... without having to think about which tables it occurs in? Then this is the stored procedure for you:

DELIMITER $$

DROP   PROCEDURE IF EXISTS GetAllDistinctValuesByColumnName$$
CREATE PROCEDURE           GetAllDistinctValuesByColumnName (ColName VARCHAR(255))

BEGIN

DECLARE no_more_tables INT DEFAULT 0;
DECLARE my_table VARCHAR(255); 

DECLARE cur_tables CURSOR FOR
  SELECT DISTINCT
    TABLE_NAME
  FROM
    INFORMATION_SCHEMA.COLUMNS
  WHERE
    TABLE_SCHEMA = DATABASE()
  AND
    COLUMN_NAME = ColName;

DECLARE CONTINUE HANDLER FOR NOT FOUND
  SET no_more_tables = 1;

/* Result table */
CREATE TEMPORARY TABLE temp_my_values (
  my_table VARCHAR(255) NOT NULL,
  my_value VARCHAR(255) NOT NULL,
  my_count INT UNSIGNED NOT NULL
);


OPEN cur_tables; 

FETCH cur_tables INTO my_table; 

REPEAT

SET @ugly_sql := CONCAT('
  INSERT INTO temp_my_values
  SELECT "', my_table, '", ', ColName, ', COUNT(*)
  FROM ', my_table, '
  GROUP BY ', ColName
);

-- SELECT @ugly_sql;

PREPARE mySt FROM @ugly_sql;
EXECUTE mySt;

FETCH cur_tables INTO my_table;

UNTIL no_more_tables = 1 
END REPEAT; 

CLOSE cur_tables; 

SELECT * FROM temp_my_values; 
DROP TABLE temp_my_values; 

END$$

DELIMITER ;