Session 1: Legal and policy issues

Audio from the session
[audio:http://www.jisc.ac.uk/media/avfiles/events/2008/07/session1b.mp3]

The motion being debated in this session is “Curating and sharing research data is best done where the researcher’s institution asserts IPR claims over the data”.

Initial discussion centred around what was the right approach to IPR and copyright that would enable curation and data sharing to take place. Some people argued for the Science Commons approach (ie that as soon as possible, all data should be put into the public domain with all rights waived) which was seen as a simple and effective way of sharing data. Others felt the challenge here was to get researchers to accept this approach – and thought that licensing data was the way forward. There was discussion around who was the rights holder – it’s not clear (a) what the rights are and (b) who owns them.

An interesting point was made about EU legislation, which requires any public sector institution creating data to supply it at cost price for commercial organisations to re-use. Universities may in time be included under this proposed legislation.

Speaking for the motion is Charles Oppenheim, Loughborough University, and against, Mags McGeever from the Digital Curation Centre (DCC).

Before the debate began, a vote was taken: 5 people supporting the motion, 10 against, 7 abstentions.

Charles Oppenheim spoke for the motion, and outlined the formal legal position.

  • the IPR of anything created by an employee in the course of their employment automatically belongs to the employer, unless there’s a different contract in place

In practice, there’s a different situation. Many employers choose to waive their rights by not doing anything about what their researchers do (it’s custom and practice). Copyright is often assigned to a publisher for an article, even though the researcher isn’t in a position to do so. Institutions don’t object, so a court of law would infer that the employer has waived their rights.

Problems:

  • researchers are too willing to assign copyright to commercial journal publishers, and can be restricted from using their own material (eg for teaching purposes). The Open Access movement is trying to work against this.
  • curation of research data is best done by those expert in the field. But they are only able to do the necessary tasks if they can be confident that they have the right to do this – all data curation involves copying of one sort or another. Copyright ownership must clearly be held – and this is easiest if this is the employer.

All this is not just a hypothetical argument – commercial publishers are well aware of the dangers to their business model of the open access movement. They are now showing great interest in raw data, and developing their own repositories of research data which subscribers can have access to.

So if researchers assign not just an article but also data to commercial publishers, access to this data can be charged for. If an employer owns research data, they can be much more robust with the publisher than a single researcher can.

The idea of employers asserting copyright may be seen as an infringement of academic freedom, and researchers may worry that employers may not exploit/use data properly.

So the employer should clearly assert ownership in employment contracts but should offer the employee

1. a royalty-free indefinite licence to do what the employee wishes to do with the data (disseminating it, putting it in a repository etc)

2. a right to object if the employer does anything with the data which may affect their reputation

The data curation expert then has assurance that there are no IPR problems, and the employer gives the employee a reassurance that nothing bad will happen. This doesn’t stop copying going on by the employee, it just stops the data being handed over to a commercial publisher.

Mags McGeever spoke against the motion, stating that curating and sharing research data is best done where no-one asserts the IPR.

Facts are not copyrightable, but the structure of a database is. Using the example of the British Horseracing Board, she outlined the rights for databases – where there has been substantial investment in obtaining, identifying and verifying the material within – but pointed out that all this only applies within Europe.

She moved on to the fact that the ‘ownership’ of data can be uncomfortable. Asserting IPR where it doesn’t exist is complicated, and if people are unsure, they don’t act, stifling curation and access.

As an example of complication – previously impossible collaboration is now possible, between research teams, projects, jurisdictions etc. There may be hundreds of owners, within different jurisdictions (eg databases, data). No one institution could own this data without a framework agreement – and this may cost in terms of time and resources.

She pointed out that time and money spent on this agreement, away from the core topic of research, is a nuisance and a waste of resources. Even where IPR is agreed, if curation may be done differently, how do you proceed?

Data unencumbered by IPR, therefore, is easier to share, and it’s easier for innovation, curation etc. In the US, data policy is much more open. The Science Commons recommends waiving all rights and placing data in the public domain.

Once the data has been manipulated, combined, worked on by many parties – what we need is a technical solution rather than a legal solution. We don’t need another set of licence terms that people don’t have time to read – ill-founded assertions of IPR only serve to stifle curation and innovation.

The debate was then opened to the floor:

Sam Pepler, British Atmospheric Data Centre: He pointed out that he gets data from academics on the basis they’re NERC (Natural Environment Research Council)-funded. When they get a grant, they sign up to a contract that gives NERC a licence to distribute indefinitely. IPR is about clearing this up early, rather than clouding the issue. Although, if you have one person asserting the rights in a group of 10, that’s even more unclear.

Charles Oppenheim: asked who ends up owning the IPR.

Sam Pepler: said that the researchers own it. But they give NERC the licence – ie you own it but you don’t have to care.

Chris Rusbridge, Digital Curation Centre: gave the example of wanting to use some of the data that Sam curates in NERC, in a revolutionary worldwide experiment with several data sources. He said it meant that he would have to negotiate licence agreements with tens/hundreds/thousands of people asserting control over these data sources, which would be problematic. Data sharing in an interdisciplinary web is difficult – with the semantic web, robots can’t negotiate licence agreements! The only way to promote interoperability is to waive all rights and consign data to the public domain.

Sam Pepler: said that the owner was waiving the rights. However, they need to assert them first, to put them in the public domain!

Mags McGeever: If you are making something available under a licence you need to assert your rights. In terms of the waiver, it can be a two-way thing. To the extent that I hold rights, I can waive them. This is different from granting a licence.

Owen Stephens, Imperial College London: made the point that this was diverging slightly from the debate. This was reuse and interoperability, whereas it should be about curation. He wasn’t convinced about the institutional nature of IPR, but said it can lead to preservation.

Chris Rusbridge: disagreed, and said that curating data is absolutely about making it reusable.

Peter Morgan, University of Cambridge: said that curation is the object of the exercise, and that if the institution has IPR it’s more likely to curate. There are complexities with lots of institutions etc, but researchers are all motivated by the desire to share their work, and reuse. Institutions (universities) are in competition with one another, and it’s much more difficult to see how different institutions can collaborate on IPR.

Sam Pepler: said that all the data he has arrives close to the end of the project. The conditions of use are based on an embargo period – as lots of projects are still writing up data. Individuals are often uninterested at this point – so he wasn’t sure that they knew best. It’s better done by the funder because they’re the people who actually want to know what’s happening to the data. People do want to keep the data close to their chests until it’s published.

Norman Gray, University of Leicester: said he was against the motion. He said that both speakers had talked about removing barriers to archiving, preservation etc, about dealing with the rights involved, but said that this was a red herring.

Robin Rice, University of Edinburgh: said that the funder taking responsibility had a similarity to the institution taking responsibility. The institution doesn’t necessarily want to get the data out there – they want to hold onto it in case there’s a gem buried in there. The researcher feels duty-bound to curate it themselves – she mentioned research ethics.

Joy Davidson, Digital Curation Centre: thought that without the IPR being associated with the institution, you wouldn’t get the infrastructure. She said that there are publishers who will take this on, to do the work, and pointed out that there were lots of places of deposit (eg the Arts and Humanities Data Service (AHDS)) which are no longer viable so institutions have to do it.

Owen Stephens: said that you could outsource to the private sector who would charge you for it. He pointed out that we pay somewhere, whether it’s a private organisation or you fund it nationally. So far we haven’t shown much appetite for the national thing. If AHDS isn’t viable, how on earth can we do it?

Joy Davidson: said that it’s still going to cost money. Investment might go down over time – and a lot of the investment in-house is less risky.

Norman Gray: said that institutions don’t have much interest in preserving data, and neither do researchers – the real interest in making sure it happens lies with funders. They’re big enough to develop expertise, and negotiate third party agreements.

Rob Sanderson, University of Liverpool: pointed out that the party with the greatest desire is the community, not the funder, and it should share the cost.

Chris Rusbridge: said that academic colleagues are two-faced. For their own purposes, data has to be exploited to the last inch but kept private and closed, but your data needs to be made available in the public domain! If you separate the researcher from the research community you get different views. We have to get away from the individual researcher, and start thinking more about the whole venture that we’re trying to construct. It should require that the results of science are made available in the public domain (withheld for a certain period).

Carmel de Nahlik, Coventry University: said it would be useful to have a stakeholder map to understand the process we’re looking at. It’s contextual – different for different disciplines.

Greg Pytel, LSE: said that the funder automatically dictates the rules. He gave the example of a major management consultancy company that funded research. The company was interested in putting research out immediately, to get people to publish on data, as they wanted to sell consultancy on the back of the research. However, the researchers wanted to publish articles for the next few years… He made the point that where a public purse is involved it becomes much more complex.

Chris Rusbridge: said that institutions are incompetent and self-serving and that agreements take a long time.

Robin Rice: pointed out that what we’re missing here is ramping up the services which are provided by the institutions – asking researchers if they want some help? There are some researchers who do produce large datasets, but no-one is coming from the institution to offer help. We can sit here and wait for mandates, but in the meantime how do we build institutional capacity?

Neil Geddes, Science and Technology Facilities Council: said that both speakers argue for clarity in data ownership. Successful curation means different things in different contexts. The only successful curation is where the body responsible understands the value of it – and a university wouldn’t. No one size fits all.

Greg Pytel, LSE: said that if things are publicly funded, that implies that the information goes to the public domain.

Sam Pepler: used the example of NERC – they take the right to do what they like with the data, so they can make sure that researchers can do what they want with it. If there’s commercial success, the money from it goes back into the public purse that paid for it.

Owen Stephens: mentioned the issue of patents. He said that if the economic value of research is tied up with the patent side, it demotivates people to preserve data. What’s in it for you? You’ve already made your money out of it.

Richard Green, University of Hull: said there is an implicit assumption that if the IPR is vested in the institution then it’s going to be the institution that does the curation. He pointed out that some small institutions don’t have the ability to do this – data should go to a national institution, passing the responsibility on.

After that, it was back to the original speakers to sum up for and against.

Charles Oppenheim summed up for the motion. He said that the idea of an institution was too simplistic – it should be broadened to include funders etc. Quite a lot of research is done without external funding – especially in the arts and humanities – so we should also consider institutional/employers/funding agencies.

He gave a couple of examples of what can go wrong with access to data, and also pointed out that international research involving a consortium agreement would be a nightmare.

Another issue was that any public sector institution (EU-based) creating data must supply that data at cost price for commercial organisations to reuse (including exploitation). Universities are currently exempt but are likely to be included and will be legally liable to hand over data unless there is a confidentiality agreement in place.

Mags McGeever summed up against the motion. She said that we’re not clear about what we’re talking about when it comes to ‘data’. While we don’t have clarity, we shouldn’t make stabs in the dark. Until we know what we’re talking about we should license as little as possible.

The time and resources spent negotiating would be better spent doing the research. She challenged institutions’ ability to do the curation, and said that the subject specialists in the data are better.

She pointed out that the motivation of researchers was a social rather than a legal issue. Removing legal barriers, and opting out, may help to investigate why researchers don’t want to share research.

To end the session, a final vote was taken: in favour 6, against 14, with 3 abstentions.