STELLAR Network of Excellence

ResearchFM

From Stellar Deliverable 6.3

1 Why research.fm?

In a STELLAR working group we have discovered that many of us deal with similar problems: fetching and exploring research artefacts. And all of us face the same challenges: getting good data to work with -- and getting it fast. In our dispersed working groups, we have created a lot of dispersed data so far. With this application programming interface, we have the vision to collect and combine what we have -- and build new applications on top of a reliable service infrastructure.

There is some related work: the Mendeley API shares some features with this work but seems to have a different focus, and the last.fm API has been a major inspiration.

2 Requirements

Here is a list of applications describing their planned use of research.fm -- this should help to steer the discussion better. Remember: to create a beautiful API, we have to make sacrifices. We should not go for the 'lateral omnibus model' (= everyone is sitting in the first row), let's stick with a good core and add all other functionality on top of it (outside this API).

3 Overview

research.fm 4-tier view.


4 Schema

Here is the gliffy chart with the ER model. It is public, so everyone should be able to edit. Please be aware that the snapshot below is NOT a live snapshot of the gliffy chart!

ER.png

4.1 Missing keys, fields, tables, ... entities

@NOTE: I'm not sure whether a DB schema and/or definition should go here. Isn't the idea that the API can be connected to any kind of back end?

@NOTE: I think that an ER schema (not necessarily a DB schema) is needed to provide information about what the API is doing.

Please add here entities and relations you are missing.

For example:

  • 'sea-level' for 'affiliations': To place researchers institutions on a Google Earth, I need their sea-level (Fridolin Wild)

And here the real thing:

  • table 'journals' linking against papers
  • table 'conferences' linking against papers
  • table 'affiliation' linking against paper (linking it only to the author is not enough: authors can change affiliation, papers cannot)
  • table 'topics' linking against 'papers': using latent-semantic spaces, I can add topic relations for each paper and relations between topics (Fridolin Wild)
  • table 'feeds' linking against researchers and/or affiliations
  • tables 'tags' and 'items' linking against 'feeds'
  • BuRST and research.fm should hold keywords from the publications, either via dc:subject or via swrc:keywords. I'm missing this in the figure (Wolfgang Reinhardt)


  • what about other social media? Wolle? Gonzalo?
    • 'socialHandles': SlideShare, Mendeley, ResearchGATE, del.icio.us, linkedin, facebook, twitter
    • table 'slides'?
    • table 'bookmarks'?
    • table 'tweets'?

@NOTE: social media artefacts will be handled in a separate exchange format

@NOTE: An alternative implementation could be to use a graph-based database such as neo4j [1] to avoid all the complicated Joins.

4.2 How to add stuff?

So far we envision one API paired with a single central data store. We are working on federating and harvesting services behind it, but to the end user it should be exposed as a single API address, whatever data source lies behind it. If you want to provide publication data, you should look into the harvesting options.

However, the API needs some 'write' functions, e.g. to report duplicate identifiers or to add tags to an entry.

 To be discussed: should we allow authenticated users to add full records? This way, we could easily 
 get around the harvesting problem: in the worst case we just write an insert script that runs as a CRON job at midnight.

4.3 How to use it?

@NOTE: what about a REST approach?

Use the javascript package like this:

<script type="text/javascript" src="http://api.stellarnet.eu/libs/researchFM.js"></script>

And then subsequently in the code:

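<script type="text/javascript">
// Create the client and render the conference list into the element with id 'list'.
var rFM = new ResearchFM();
document.getElementById('list').innerHTML = rFM.getConferences();
</script>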

The script is a wrapper for a couple of REST API calls, which you can (if you need) also call directly.

5 The client application programming interface (API, draft version 0.1)

@NOTE: It seems to me that this is a description of a JavaScript library that USES the API, not of the API itself. Maybe we need an agnostic version of this description?

@NOTE: Yes, but this was done on purpose. Consider it as pseudocode ;) -- the aim is to think close to the implementation, which should define the back-end API, not the other way round!

This lists all the core access routines supported by the API. For each of the entities, you can find the available access methods.

Each of these entities works as follows: the REST service returns (unless specified otherwise) a JSON object that can be evaluated directly. This means you can directly address the data of each item and directly use lists of items. Each object handles caching in its own way: details that are not immediately relevant may be requested from the server on demand. By default, each item contains only a unique resource locator and a name, unless specified otherwise.
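
The on-demand loading described above could look roughly like the following. This is only a sketch under stated assumptions: the names LazyEntity, loadDetails and get(), and the path '/organizations/1', are hypothetical and not part of the draft API. The loader is passed in as a callback so that the example runs without a server.

```javascript
// Hypothetical sketch: an entity that starts with only a URL and a name,
// fetches further details on first access, and caches them afterwards.
function LazyEntity(url, name, loadDetails) {
  this.data = { url: url, name: name }; // the default fields of every item
  this._loadDetails = loadDetails;      // assumed callback wrapping the REST call
  this._loaded = false;
}

// Return one field, loading the full record only if it is not yet cached.
LazyEntity.prototype.get = function (field) {
  if (!(field in this.data) && !this._loaded) {
    var details = this._loadDetails(this.data.url);
    for (var key in details) {
      if (Object.prototype.hasOwnProperty.call(details, key)) {
        this.data[key] = details[key];
      }
    }
    this._loaded = true;
  }
  return this.data[field];
};

// Usage with a stand-in loader (a real one would call the REST service).
var calls = 0;
var ou = new LazyEntity('/organizations/1', 'The Open University', function (url) {
  calls += 1;
  return { address: 'Walton Hall, MK7 6AA, Milton Keynes, UK' };
});
ou.get('name');    // served from the default fields, no request
ou.get('address'); // triggers the loader once
ou.get('address'); // served from the cache
```

The loader runs at most once per entity, so repeated access to detail fields does not cause repeated requests.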

Each single entity can be used like this:

 // Search criteria are passed as an object literal (JavaScript has no named arguments).
 var myAffiliation = new Affiliation();
 myAffiliation.find({name: 'Open University'}); // result is stored in myAffiliation.data
 myAffiliation.data.name; // -> 'The Open University'
 var myPaper = new Paper();
 myPaper.find({author: 'Fridolin Wild', title: 'designing'});
 myPaper.data.title; // -> 'designing for change: mash-up personal learning environments'

Each list can be evaluated like this:

 var myList = new Affiliation();
 myList.find({country: 'UK'});
 myList.data; // -> ['OU','UCL', ...]
 // JavaScript arrays are zero-based, so the first entry ('OU') is data[0].
 myList.data[0].data.address; // -> 'Walton Hall, MK7 6AA, Milton Keynes, UK'
 myList.data[0].members(); // -> ['Fridolin Wild', 'Peter Scott', ... ]

Here are the main entities and their access methods:

  • Affiliation @NOTE: I would add Affiliation.papers(): return a list of papers published at the institution
    • Affiliation.info(): get the meta-data on the institution
    • Affiliation.find(): get a list of Affiliations fitting the search term, store them in Affiliation.data
    • Affiliation.data: link to last result (set)
    • Affiliation.geo(): get geo data
    • Affiliation.members(): return a list of Researchers at the institution
  • Geo
    • geo.metros(): return country and metro objects (metro objects are: e.g. 'London' or 'Bavaria', was 'borrowed' from last.fm)
    • geo.find(): find list of metros and countries, store them in geo.data
    • geo.data: link to last result (set)
    • geo.affiliations(): return the Affiliations in a metro or country
  • Researcher
    • researcher.info(): get detailed information on a particular researcher @NOTE: what kind of information? @NOTE: good point. Spelling alternatives? Skype address? Facebook account name?
    • researcher.find(): find researchers, store the list of researchers into researcher.data
    • researcher.data: link to last result (set)
    • researcher.coauthors(): find those colleagues, the researcher has written a paper with
    • researcher.affiliations: return the list of affiliations, the researcher is or has been associated with
    • researcher.papers: returns the list of papers the researcher has written
    • researcher.isSameAs(alter): stores in the storage system that the researcher is a duplicate of 'alter'
  • Network @NOTE : Examples to clarify ? @NOTE: Networks could also be between geolocations, between institutions, between journals and/or conferences
    • Network.initialize(personFilter, paperFilter=): return graph with papers and authors as nodes and their relations as edges
    • Network.filters.persons(names): names of the person(s) for which the network should be filtered
    • Network.filters.papers(titles): keyword(s) in the paper titles for which the network should be filtered
    • Network.getUrl(): get a url to a graphML file with papers and authors as nodes and their relations as edges, thereby execute filters (rem: I would do this with a stored procedure on the database level plus a small wrapper writing it out as graphML) example
    • Network.get(): fetch Network.getUrl() and store results into Network.data
    • Network.data: contains a list of nodes and edges
    • Network.display(): return flash fossa.swf object that renders the network at this.getUrl() using a force-directed lay-out in flash
    • Network.sets(): get a list of sets into which the nodes and edges of the network can be clustered
  • Paper @NOTE: why are the methods here different from the author methods (paging and so on)? @NOTE: I would expect the researcher lists to always be fairly short, but a query can find far too many papers to transfer and process in one go. @NOTE: Affiliation should also be related to a paper (e.g. finding all the publications from Ecuador/Guayaquil/ESPOL, or finding the country of a paper)
    • Paper.info(): returns meta-data for the item of desire
    • Paper.find('keyword', limits=10): find papers, store the list of the first 'limits' papers into paper.data
    • Paper.data: link to last result (set)
    • Paper.data.field1: value of 'field1', if there is only one result
    • Paper.data[]: list of the first 'limits' papers
    • Paper.next(): get the next 'limits' results (paging), returns NULL when there are no more
    • Paper.references(): returns the list of references listed typically as last section of the paper
    • Paper.cites(): returns the list of other papers that have cited this one
  • Journal
    • Journal.initialize(jid): journal with the id 'jid'
    • Journal.find(keyword): store (list of) fitting journals in Journal.data
    • Journal.data: interface to the (sets of) meta-data
    • Journal.papers(): return list of papers
  • Conference
    • Conference.initialize(cid): conference with the id 'cid'
    • Conference.papers(): return list of papers
    • Conference.find(keyword): store (list of) fitting conferences in Conference.data
    • Conference.data: interface to the (sets of) meta-data
    • Conference.data.years[]: list of the years the conference took place
  • Repository
    • Repository.harvest(feed): get a BuRST feed, put it into the database, index it
  • Feed
    • Feed.find(keywords): identify relevant feeds
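
The paging contract listed for Paper.find() and Paper.next() can be sketched as follows. The names PaperPager, search and fakeSearch are hypothetical and only serve the illustration. The in-memory list stands in for the back end, which is assumed to accept a keyword, an offset and a limit.

```javascript
// Hypothetical sketch of the find()/next() paging contract described for Paper.
// `search` is an assumed callback: (keyword, offset, limit) -> array of papers.
function PaperPager(search) {
  this._search = search;
  this.data = [];
}

// Fetch the first `limits` results (default 10, as in Paper.find above).
PaperPager.prototype.find = function (keyword, limits) {
  this._keyword = keyword;
  this._limits = limits || 10;
  this._offset = 0;
  this.data = this._search(keyword, 0, this._limits);
  return this.data;
};

// Fetch the next page; returns null when there are no more results.
PaperPager.prototype.next = function () {
  this._offset += this._limits;
  var page = this._search(this._keyword, this._offset, this._limits);
  if (page.length === 0) {
    return null;
  }
  this.data = page;
  return page;
};

// Stand-in back end: five titles filtered by substring.
var titles = ['mash-up A', 'mash-up B', 'mash-up C', 'other', 'mash-up D'];
function fakeSearch(keyword, offset, limit) {
  return titles
    .filter(function (t) { return t.indexOf(keyword) !== -1; })
    .slice(offset, offset + limit);
}

var pager = new PaperPager(fakeSearch);
var first = pager.find('mash-up', 2);  // first two matches
var second = pager.next();             // next two matches
var third = pager.next();              // null: nothing left
```

Keeping the last page in data when next() returns null is a design choice of this sketch. The draft does not say what Paper.data should hold at that point.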

6 The back-end application programming interface

API, draft version 0.1

6.1 Methods

NOTE Every call that returns a list supports paging with the parameters page={pageNbr}&items={itemsPerPage}
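
A small helper for appending these paging parameters might look like this. The function name withPaging is hypothetical, and the draft does not state whether page numbering starts at 0 or 1, so the values below are purely illustrative.

```javascript
// Append the paging parameters page={pageNbr}&items={itemsPerPage} to a path.
// Uses '&' instead of '?' when the path already carries a query string.
function withPaging(path, pageNbr, itemsPerPage) {
  var separator = path.indexOf('?') === -1 ? '?' : '&';
  return path + separator +
    'page=' + encodeURIComponent(pageNbr) +
    '&items=' + encodeURIComponent(itemsPerPage);
}

var listUrl = withPaging('/authors', 2, 25);
// -> '/authors?page=2&items=25'
var searchUrl = withPaging('/authors/search?q=wild', 1, 10);
// -> '/authors/search?q=wild&page=1&items=10'
```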

  • authors
    • /authors/{author_id}
      • return: SWRC.Person
    • /authors
      • return: List of SWRC.Person.about
    • /authors/search/{keyword}
      • return: List of SWRC.Person.about
    • /authors/search?q={query}
      • return: List of SWRC.Person.about
    • /authors/organization/{organization_id}/search/{keyword}
      • return: List of SWRC.Person.about
  • publications
    • /publications/{doi_prefix}/{doi_handle}
      • return: SWRC.Publication
    • /publications/{publication_id}
      • return: SWRC.Publication
    • /publications/year/{year}
      • return: List of SWRC.Publication.about
    • /publications/search/{keyword}/year/{year}
      • return: List of SWRC.Publication.about
    • /publications/search?q={query}
      • return: List of SWRC.Publication.about
    • /publications/author/{author_id}/year/{year}/search/{keyword}
      • return: List of SWRC.Publication.about
    • /publications/organization/{organization_id}/year/{year}/search/{keyword}
      • return: List of SWRC.Publication.about
  • organizations
    • /organizations/{organization_id}
      • return: SWRC.Organization
    • /organizations
      • return: List of SWRC.Organization.about
    • /organizations/search/{keyword}
      • return: List of SWRC.Organization.about
    • /organizations/search?q={query}
      • return: List of SWRC.Organization.about
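
The route templates above can be turned into concrete request paths with a small helper such as the following sketch. The function name fillRoute and the example values (author id 42, year 2010) are hypothetical. The templates themselves are copied from the list.

```javascript
// Fill a route template from the list above with URL-encoded values.
// Throws if the template names a parameter that was not supplied.
function fillRoute(template, params) {
  return template.replace(/\{(\w+)\}/g, function (match, key) {
    if (!Object.prototype.hasOwnProperty.call(params, key)) {
      throw new Error('Missing route parameter: ' + key);
    }
    return encodeURIComponent(params[key]);
  });
}

var byAuthor = fillRoute(
  '/publications/author/{author_id}/year/{year}/search/{keyword}',
  { author_id: 42, year: 2010, keyword: 'learning environments' }
);
// -> '/publications/author/42/year/2010/search/learning%20environments'
```

Encoding each value keeps keywords containing spaces or slashes from being read as extra path segments.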

6.2 Reference Implementations

6.3 Harvesting

Harvesting (or: 'import') of data is done via BuRST feeds. Here is a directory of BuRST feeds.

7 Limitations and other problems

  • So far, the 'search' is part of each class. If the result is a single item, it can be accessed through entity.data.title; if it is a list, access goes through, e.g., entity.data[1].data.title. Shouldn't lists and single items be separate? Attention: every other access routine (e.g. Affiliation.members()) has to check whether Affiliation.data is a single item, and throw an error otherwise.
  • The sets are so far only in the Networks section: they would be an easy way, when the number of items is too big to browse through (and a rank order does not make sense) -- maybe add an 'explore' method to complement 'find'?
  • Which elements need to have a unique identifier / locator?
  • Shall we offer read-only services, or also write services? Which authentication method should we use in that case (last.fm seems to have a smart way of doing it)?
  • We had quite a bit of discussion about whether we want to develop one single target for these services, or a distributed network. Likewise: do we want to rely on harvesting for replicating relevant data between research.fm sites? Basically, some exploratory tools need only one step at a time and can work on an API; other tools need almost all data immediately and would mostly need a dump or additional API calls.
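
The check mentioned in the first point (an access routine verifying that entity.data is a single item) could be sketched like this. The helper name requireSingle and the sample objects are hypothetical. The sketch assumes that lists are plain arrays and single items are plain objects.

```javascript
// Hypothetical helper: return entity.data if it is a single item,
// and throw if the last find() produced a list instead.
function requireSingle(entity) {
  if (Array.isArray(entity.data)) {
    throw new Error('Expected a single item, got a list of ' + entity.data.length);
  }
  return entity.data;
}

var single = { data: { title: 'designing for change' } };
var list = { data: [{ data: { title: 'a' } }, { data: { title: 'b' } }] };

var title = requireSingle(single).title; // works on a single item
var message = null;
try {
  requireSingle(list);                   // throws on a list
} catch (err) {
  message = err.message;
}
```

Every routine such as members() would have to call a guard like this first, which is part of the argument for separating list and single-item types.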