Data and information have public-good characteristics: they are not depleted by use, and restricting access to them can be inefficient. The value of data and information generally increases with their reuse by others. These characteristics are particularly strong for data and information generated by governments and by publicly funded activities outside government, including research. This is because the data and information, as well as the activities that generate them, are paid for with public money and undertaken in the public interest.
Data, meaning factual information that has been created or collected in a structured database or compilation, are particularly valuable to share (i.e., make openly available) on digital networks. Since the advent of the internet, many studies have shown that open data online have economic, social, educational, and research value that can greatly benefit society and the progress of science and technology. However, data and information strategies, policies, and management have not kept pace with scientific and technological change, and the practice of data sharing lags behind the technological ability to share.
Less economically developed countries and those with emerging economies have much to gain from formulating open data policies in the public sector and devising mechanisms to implement them. In addition to the general value inherent in open access to and reuse of data, schools, universities, research organizations, governments, and entire societies in the developing world can use open data to improve governance and decision making, empower and educate citizens, promote capacity building, and generally increase opportunities for innovation and the return on public investment. Without such data and information policies, nations in the developing world cannot reach the post-2015 U.N. Sustainable Development Goals, play an equal role in international cooperation programs, or close the digital divide. Moreover, governments, foundations, and in some cases even the private sector should open their data as much as possible for the benefit of users in developing countries. All these justifications and issues will be elaborated in greater detail in the accompanying Guidelines.
It is for these reasons that we, the participants in the International Workshop on Open Data for Science and Sustainability in Developing Countries, agree on the following ten Data Sharing Principles in Developing Countries, which we also refer to as “The Nairobi Data Sharing Principles”:
- Data should be open and unrestricted
Data generated with public support, including support from private charitable foundations, should be openly accessible and subject to unrestricted (re)use, absent specific, justified reasons to the contrary (see Principle 10). Openness is especially beneficial for development, educational, and research purposes, but it can benefit all of society and have a multiplier effect on the economy.
- Data should be free to the end users
In most cases, any charge for access is an insurmountable barrier to users in the developing world. Therefore, data should be free online to the user. In some special cases, the charge for access may be no more than the marginal cost of fulfilling a user request. At the same time, it is recognized that adequate preparation and open availability of data require sufficient financial support (see Principle 7).
- Data should be informative and assessed for quality
Data should be of known quality and integrity, and should be organized and described (with metadata) in datasets sufficient to allow them to be understood and effectively (re)used by others. Baseline technical and management standards need to be established, especially in the developing world, where state-of-the-art practices are not yet as prevalent. Adequate preparation and the use of non-proprietary software and formats are especially important for any datasets expected to have long-term value.
- Data sharing should be timely
Once datasets are sufficiently informative and quality controlled, they should be released as quickly as possible. This can be done in steps, starting with the release of metadata to help avoid duplication of effort. In some cases, such as public emergencies and disasters, open release of relevant data should be an immediate priority. In other cases, such as research, data should be openly available no later than the publication or patenting of results. Users in developing countries have the most to gain from such policies.
- Data should be easy to find and access
Upon the public release of any dataset, the provider should promote easy access for the broadest possible user base. Diverse means of publication should be considered, in recognition of potential connectivity and other technological challenges.
- Data should be interoperable
To facilitate reuse and combination with data from one or more other datasets (e.g., in geospatially referenced research), special attention should be given to making data technically, semantically, and legally interoperable.
- Data should be sustainable
The life cycle of every dataset should be planned at the outset, with support sufficient to implement the first six Principles successfully. The lower availability of funding in developing countries, especially for long-term preservation, makes this a key priority, so that valuable datasets remain intelligible and are neither lost nor in need of rescue. Consistent with Principle 2, the costs of data archiving and availability should be borne not by the immediate users but by other entities in the data life cycle.
- Data contributors should be given credit
A significant incentive for the open disclosure and publication of a dataset is the ability to properly cite and attribute the contributor(s), whether internal or external to an organization. Any subsequent user of the data has at least an ethical obligation, and possibly a legal one, to cite and attribute the source of the data whenever they are reused, and not to misuse the data in any way. Such practices can also improve the integrity of the datasets made available by the contributors, in support of Principle 3. Data contributors in the developing world require greater recognition and rewards for such disclosure, and this should become common practice.
- Data access should be equitable
Open access to and use of data in developing countries, especially for public purposes, should be supported by governments and institutions in the more economically developed nations. Building the capacity of essential experts and infrastructure in developing countries should be a priority of international organizations. Similarly, experts in developing countries should join and actively participate in the relevant regional and international organizations to exchange skills and knowledge.
- Data may be restricted for a limited time, if adequately justified
Restrictions may be placed on access to and uses of publicly funded data and datasets for specified periods of time. Justified restrictions may include specific protections of national security, personal privacy, intellectual property, confidentiality, and other values, such as indigenous peoples’ rights or the locations of endangered species. Nevertheless, the default rule should be one of openness, consistent with Principle 1, and any restrictions should be minimized to the extent possible.