Open data is the idea that some data should be freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control. The goals of the open-source data movement are similar to those of other "open(-source)" movements such as open-source software, hardware, open content, open education, open educational resources, open government, open knowledge, open access, open science, and the open web. Paradoxically, the growth of the open data movement is paralleled by a rise in intellectual property rights. The philosophy behind open data has been long established (for example in the Mertonian tradition of science), but the term "open data" itself is recent, gaining popularity with the rise of the Internet and World Wide Web and, especially, with the launch of open-data government initiatives such as Data.gov, Data.gov.uk and Data.gov.in.
Open data can also be linked data; when it is, it is called linked open data. One of the most important forms of open data is open government data (OGD), which is open data created by government institutions. The importance of open government data stems from its place in citizens' everyday lives, down to the most routine or mundane tasks that seem far removed from government.
The concept of open data is not new, but a formalized definition is relatively new. One such definition is the Open Definition, which can be summarized in the statement that "A piece of data is open if anyone is free to use, reuse, and redistribute it – subject only, at most, to the requirement to attribute and/or share-alike." Other definitions, including the Open Data Institute's "Open data is data that anyone can access, use or share", have an accessible short version of the definition but refer to the formal definition.
Open data may include non-textual material such as maps, genomes, connectomes, chemical compounds, mathematical and scientific formulae, medical data and practice, bioscience and biodiversity. Problems often arise because these are commercially valuable or can be aggregated into works of value. Access to, or re-use of, the data is controlled by organisations, both public and private. Control may be through access restrictions, licenses, copyright, patents and charges for access or re-use. Advocates of open data argue that these restrictions are against the common good and that these data should be made available without restriction or fee. In addition, it is important that the data are re-usable without requiring further permission, though the types of re-use (such as the creation of derivative works) may be controlled by a license.
A typical depiction of the need for open data:
Numerous scientists have pointed out the irony that right at the historical moment when we have the technologies to permit worldwide availability and distributed process of scientific data, broadening collaboration and accelerating the pace and depth of discovery ... we are busy locking up that data and preventing the use of correspondingly advanced technologies on knowledge.— John Wilbanks, VP Science, Creative Commons
Creators of data often do not consider the need to state the conditions of ownership, licensing and re-use, instead presuming that not asserting copyright puts the data into the public domain. For example, many scientists do not regard the published data arising from their work as theirs to control, and consider publication in a journal to be an implicit release of the data into the commons. However, the lack of a license makes it difficult to determine the status of a data set and may restrict the use of data offered in an "Open" spirit. Because of this uncertainty, it is also possible for public or private organizations to aggregate such data, protect it with copyright and then resell it.
The issue of indigenous knowledge (IK) poses a great challenge in terms of capture, storage and distribution. Many societies in third-world countries lack the technical processes for managing IK.
At his presentation at the XML 2005 conference, Connolly displayed these two quotations regarding open data:
Open data can come from any source. This section lists some of the fields that publish (or at least discuss publishing) a large amount of open data.
The concept of open access to scientific data was institutionally established with the formation of the World Data Center system, in preparation for the International Geophysical Year of 1957–1958. The International Council of Scientific Unions (now the International Council for Science) oversees several World Data Centres with the mandate to minimize the risk of data loss and to maximize data accessibility.
While the open-science-data movement long predates the Internet, the availability of fast, ubiquitous networking has significantly changed the context of open science data, since publishing or obtaining data has become much less expensive and time-consuming.
The Human Genome Project was a major initiative that exemplified the power of open data. It was built upon the so-called Bermuda Principles, stipulating that "All human genomic sequence information (…) should be freely available and in the public domain in order to encourage research and development and to maximise its benefit to society." More recent initiatives such as the Structural Genomics Consortium have illustrated that the open data approach can also be used productively within the context of industrial R&D.
In 2004, the Science Ministers of all nations of the Organisation for Economic Co-operation and Development (OECD), which includes most developed countries of the world, signed a declaration which essentially states that all publicly funded archive data should be made publicly available. Following a request and an intense discussion with data-producing institutions in member states, the OECD published in 2007 the OECD Principles and Guidelines for Access to Research Data from Public Funding as a soft-law recommendation.
Examples of open data in science:
There is a range of arguments for government open data. For example, some advocates contend that making government information available to the public as machine-readable open data can facilitate government transparency, accountability and public participation: "Open data can be a powerful force for public accountability—it can make existing information easier to analyze, process, and combine than ever before, allowing a new level of public scrutiny." Governments that enable public viewing of data can help citizens engage within the governmental sectors and "add value to that data."
Some make the case that opening up official information can support technological innovation and economic growth by enabling third parties to develop new kinds of digital applications and services.
Several national governments have created websites to distribute a portion of the data they collect. Open data can also be pursued as a collaborative project in municipal government, with the aim of creating and organizing a culture of open data or open government data.
Other levels of government have also established open data websites. Many government entities in Canada are pursuing open data. Data.gov lists the sites of a total of 40 US states and 46 US cities and counties that provide open data, for example the states of Maryland and California.
At the international level, the United Nations has an open data website that publishes statistical data from member states and UN agencies, and the World Bank publishes a range of statistical data relating to developing countries. The European Commission has created two portals for the European Union: the EU Open Data Portal, which gives access to open data from the EU institutions, agencies and other bodies, and the PublicData portal, which provides datasets from local, regional and national public bodies across Europe.
In October 2015, the Open Government Partnership launched the International Open Data Charter, a set of principles and best practices for the release of governmental open data, which was formally adopted by seventeen national, state and city governments during the OGP Global Summit in Mexico.
The debate on open data is still evolving. The best open government applications seek to empower citizens, to help small businesses, or to create value in some other positive, constructive way. Opening government data is only a way-point on the road to improving education, improving government, and building tools to solve other real world problems. While many arguments have been made categorically, the following discussion of arguments for and against open data highlights that these arguments often depend highly on the type of data and its potential uses.
Arguments made on behalf of open data include the following:
It is generally held that factual data cannot be copyrighted. However, publishers frequently add copyright statements (often forbidding re-use) to scientific data accompanying publications. It may be unclear whether the factual data embedded in full text are part of the copyright.
While the human abstraction of facts from paper publications is normally accepted as legal, there is often an implied restriction on machine extraction by robots.
Unlike open access, where groups of publishers have stated their concerns, open data is normally challenged by individual institutions. Their arguments have been discussed less in public discourse and there are fewer quotes to rely on at this time.
Arguments against making all data available as open data include the following:
The goals of the Open Data movement are similar to those of other "Open" movements.
Several funding bodies which mandate Open Access also mandate Open Data. A good expression of requirements (truncated in places) is given by the Canadian Institutes of Health Research (CIHR):
Other bodies active in promoting the deposition of data as well as full text include the Wellcome Trust. An academic paper published in 2013 advocated that Horizon 2020 (the science funding mechanism of the EU, due to launch in 2014) should mandate that funded projects hand in their databases as "deliverables" at the end of the project, so that they can be checked for third-party usability and then shared.
Several mechanisms restrict access to or reuse of data (and several reasons for doing this are given above). They include:
CiteSeerX (originally called CiteSeer) is a public search engine and digital library for scientific and academic papers, primarily in the fields of computer and information science. CiteSeer holds United States patent 6289342, titled "Autonomous citation indexing and literature browsing using citation context," granted on September 11, 2001. The inventors are Stephen R. Lawrence, C. Lee Giles and Kurt D. Bollacker, and the patent is assigned to NEC Laboratories America, Inc. It was filed on May 20, 1998, with a priority date of January 5, 1998. A continuation patent, US patent 6738780, was also granted to the same inventors and assigned to NEC Labs; it was filed on May 16, 2001 and granted on May 18, 2004. CiteSeer is considered a predecessor of academic search tools such as Google Scholar and Microsoft Academic Search. CiteSeer-like engines and archives usually harvest documents only from publicly available websites and do not crawl publisher websites. For this reason, authors whose documents are freely available are more likely to be represented in the index.
CiteSeer's goal is to improve the dissemination of, and access to, academic and scientific literature. As a non-profit service that can be freely used by anyone, it has been considered part of the open access movement, which is attempting to change academic and scientific publishing to allow greater access to scientific literature. CiteSeer freely provided Open Archives Initiative metadata for all indexed documents and, when possible, linked indexed documents to other sources of metadata such as DBLP and the ACM Portal. To promote open data, CiteSeerX shares its data for non-commercial purposes under a Creative Commons license.
The name can be construed in at least two ways. As a pun, a 'sightseer' is a tourist who looks at the sights, so a 'cite seer' would be a researcher who looks at cited papers. Alternatively, a 'seer' is a prophet, so a 'cite seer' is a prophet of citations. CiteSeer changed its name to ResearchIndex at one point and then changed it back.
DBpedia
DBpedia (from "DB" for "database") is a project aiming to extract structured content from the information created in the Wikipedia project. This structured information is made available on the World Wide Web. DBpedia allows users to semantically query relationships and properties of Wikipedia resources, including links to other related datasets. Tim Berners-Lee described DBpedia as one of the most famous parts of the decentralized Linked Data effort.
Data set
A data set (or dataset) is a collection of data. Most commonly a data set corresponds to the contents of a single database table, or a single statistical data matrix, where every column of the table represents a particular variable, and each row corresponds to a given member of the data set in question. The data set lists values for each of the variables, such as height and weight of an object, for each member of the data set. Each value is known as a datum. The data set may comprise data for one or more members, corresponding to the number of rows.
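A minimal Python sketch of the table layout just described: each column is a variable, each row is a member, and each cell is a datum. The column names and values are hypothetical, chosen only to mirror the height-and-weight example.

```python
# Hypothetical data set: each column is a variable, each row is one member.
COLUMNS = ("member", "height_cm", "weight_kg")
ROWS = [
    ("A", 170.0, 65.0),
    ("B", 182.5, 80.2),
    ("C", 158.0, 52.3),
]


def variable_values(name):
    """Return every datum recorded for one variable (one column)."""
    index = COLUMNS.index(name)
    return [row[index] for row in ROWS]


def member_record(row_number):
    """Return one member (one row) as a variable -> datum mapping."""
    return dict(zip(COLUMNS, ROWS[row_number]))


if __name__ == "__main__":
    print("members:", len(ROWS))         # number of members = number of rows
    print(variable_values("height_cm"))  # [170.0, 182.5, 158.0]
    print(member_record(1))
```

Reading down a column gives all values of one variable; reading across a row gives one member's values.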
The term data set may also be used more loosely, to refer to the data in a collection of closely related tables corresponding to a particular experiment or event. Less commonly used names for this kind of data set are data corpus and data stock. Examples of this type are the data sets collected by space agencies performing experiments with instruments aboard space probes. Data sets so large that traditional data processing applications cannot deal with them are known as big data.
In the open data discipline, the data set is the unit used to measure the information released in a public open data repository. The European Open Data portal aggregates more than half a million data sets. Other definitions have been proposed in this field, but there is currently no official one. Further issues (real-time data sources, non-relational data sets, etc.) make it harder to reach a consensus.
Definition of Free Cultural Works
The Definition of Free Cultural Works is a definition of free content from 2006. The project evaluates and recommends compatible free content licenses.
Geopolitical ontology
The FAO geopolitical ontology is an ontology developed by the Food and Agriculture Organization of the United Nations (FAO) to describe, manage and exchange data related to geopolitical entities such as countries, territories, regions and other similar areas.
iNaturalist
iNaturalist is a citizen science project and online social network of naturalists, citizen scientists, and biologists built on the concept of mapping and sharing observations of biodiversity across the globe. iNaturalist may be accessed via its website or from its mobile applications. Observations recorded with iNaturalist provide valuable open data to scientific research projects, conservation agencies, other organizations, and the public. The project has been called "a standard-bearer for natural history mobile applications."
KulturNav
KulturNav is a Norwegian cloud-based software service that allows users to create, manage and distribute name authorities and terminology, focusing on the needs of museums and other cultural heritage institutions. The software is developed by KulturIT ANS, and the development project is funded by the Arts Council Norway.
KulturNav is designed to enhance access to heritage information in archives, libraries and museums by working across institutions with common metadata, so that many institutions can collaborate to build up a list of standard names and terminology. The metadata is published as linked open data (LOD), which can be linked further to other LOD resources. The application programming interface (API) currently supports HTTP GET requests to read data. API calls are currently not authenticated or authorized, which means that the system returns only published content that is readable by any user. The system was developed with Play Framework together with Solr and jQuery.
The company KulturIT, launched in 2013, is owned by five Norwegian museums and one Swedish museum. It is a non-profit organisation, with all surplus going to development.
The website was launched on 20 January 2015 and is currently used by approximately 130 museums in Norway, Sweden and Åland. In March 2015, the Swedish national register of photography was in the process of being transferred to the KulturNav site. A register of Swedish architects is also available through KulturNav.
LIBRIS
LIBRIS (Library Information System) is a Swedish national union catalogue maintained by the National Library of Sweden in Stockholm. About 6.5 million titles nationwide can be searched freely.
In addition to bibliographic records, one for each book or publication, LIBRIS also contains an authority file of people. For each person there is a record connecting name, birth and occupation with a unique identifier.
The MARC code for the Swedish Union Catalog is SE-LIBR (normalized: selibr).
The development of LIBRIS can be traced to the mid-1960s. While rationalization of libraries had been an issue for two decades after World War II, it was in 1965 that a government committee published a report on the use of computers in research libraries. The government budget of 1965 created a research library council (Forskningsbiblioteksrådet, FBR). A preliminary design document, Biblioteksadministrativt Information System (BAIS), was published in May 1970, and the name LIBRIS, short for Library Information System, was used for a technical subcommittee that started on 1 July 1970. The newsletter LIBRIS-meddelanden (ISSN 0348-1891) has been published since 1972 and has been online since 1997.
Linked data
In computing, linked data (often capitalized as Linked Data) is a method of publishing structured data so that it can be interlinked and become more useful through semantic queries. It builds upon standard Web technologies such as HTTP, RDF and URIs, but rather than using them to serve web pages only for human readers, it extends them to share information in a way that can be read automatically by computers. Part of the vision of linked data is for the internet to become a global database.
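As a rough illustration of the idea (a hand-rolled sketch, not any W3C tool or RDF library), the Python below represents facts as RDF-style triples whose subjects and predicates are URIs, serializes them in the N-Triples text format, and answers a simple pattern query of the kind SPARQL generalizes. The `example.org` resource URIs are hypothetical; the RDF Schema `label` and FOAF `knows` URIs are real vocabulary terms.

```python
from typing import List, Optional, Tuple

# Real vocabulary URIs (RDF Schema and FOAF).
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
FOAF_KNOWS = "http://xmlns.com/foaf/0.1/knows"

# Hypothetical resource URIs; in published linked data these would resolve over HTTP.
ALICE = "http://example.org/people/alice"
BOB = "http://example.org/people/bob"

# (subject URI, predicate URI, object, object_is_literal)
Triple = Tuple[str, str, str, bool]

TRIPLES: List[Triple] = [
    (ALICE, RDFS_LABEL, "Alice", True),
    (BOB, RDFS_LABEL, "Bob", True),
    (ALICE, FOAF_KNOWS, BOB, False),  # a link from one resource to another
]


def escape_literal(text: str) -> str:
    """Escape backslashes, quotes and line breaks for an N-Triples string literal."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def to_ntriples(triples: List[Triple]) -> str:
    """Serialize triples as N-Triples, one statement per line."""
    lines = []
    for subject, predicate, obj, is_literal in triples:
        rendered = f'"{escape_literal(obj)}"' if is_literal else f"<{obj}>"
        lines.append(f"<{subject}> <{predicate}> {rendered} .")
    return "\n".join(lines)


def match(triples: List[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> List[Triple]:
    """Return triples matching a pattern; None acts as a wildcard, like a SPARQL variable."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


if __name__ == "__main__":
    print(to_ntriples(TRIPLES))
    # Whom does Alice know? Follow the link, then look up the target's label.
    for _, _, friend, _ in match(TRIPLES, s=ALICE, p=FOAF_KNOWS):
        print([label for _, _, label, _ in match(TRIPLES, s=friend, p=RDFS_LABEL)])
```

Because the link's target is itself a URI, a machine can follow it to further statements, which is the property that lets separately published data sets be joined.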
Tim Berners-Lee, director of the World Wide Web Consortium (W3C), coined the term in a 2006 design note about the Semantic Web project.
Linked data may also be open data, in which case it is usually described as linked open data (LOD).
OpenCorporates
OpenCorporates is a website that shares data on corporate entities as open data under the share-alike attribution Open Database License. It was created by Chris Taggart and Rob McKinnon under the auspices of their company, Chrinon Ltd, and launched on 20 December 2010. Its aims are to create a URL with such data for every corporate entity in the world, to import government data relating to companies, and to match that data to specific companies.
The site also shows groups of companies that are legally part of the same conglomerate. Basic company information is available as open data in XML or JSON format.
The OpenCorporates Advisory Board exists to advise OpenCorporates on policy, practice and principles, and to ensure that OpenCorporates remains true to its central mission of opening company data for the public good. It was formed by three members: David Eaves, Kaitlin Lee and Andrew Stott.
The same team also operates OpenCharities, which compiles data on registered charities.
Open Data Indices
Open data indices are indicators that assess and evaluate the general openness of an open government data portal. Open data indices not only show how open a data portal is, but also encourage citizens and government officials alike to participate in their local open data communities, particularly in advocating for local open data and local open data policies.
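A minimal Python sketch of how an index of this kind might turn an assessment into a score, assuming a hypothetical checklist of yes/no criteria with equal weights. The criteria names and the equal weighting are illustrative assumptions; the published indices define their own questions and weighting.

```python
from typing import Dict, List

# Hypothetical yes/no criteria with equal weights; real indices define their own.
CRITERIA = (
    "openly_licensed",
    "machine_readable",
    "downloadable_in_bulk",
    "free_of_charge",
    "up_to_date",
)


def openness_score(answers: Dict[str, bool]) -> float:
    """Return the share of criteria a data portal meets, from 0.0 to 1.0."""
    missing = [c for c in CRITERIA if c not in answers]
    if missing:
        raise ValueError(f"unanswered criteria: {missing}")
    return sum(answers[c] for c in CRITERIA) / len(CRITERIA)


def rank_portals(portals: Dict[str, Dict[str, bool]]) -> List[str]:
    """Order portal names from most to least open."""
    return sorted(portals, key=lambda name: openness_score(portals[name]), reverse=True)


if __name__ == "__main__":
    portals = {
        "portal_a": dict.fromkeys(CRITERIA, True),
        "portal_b": {**dict.fromkeys(CRITERIA, True),
                     "free_of_charge": False, "up_to_date": False},
    }
    for name in rank_portals(portals):
        print(name, openness_score(portals[name]))
```

Requiring an answer for every criterion keeps a portal from scoring well simply because it was only partly assessed.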
There are two mainstream methodologies: the Global Open Data Index and the Open Data Barometer. The Global Open Data Index evaluates an open data portal on 11 different aspects based on the Open Definition of open data, while the Open Data Barometer adds two more indices to those.
Open Knowledge International
Open Knowledge International (OKI), known as the Open Knowledge Foundation (OKF) until April 2014 and then as Open Knowledge until May 2016, is a global, non-profit network that promotes and shares information at no charge, including both content and data. It was founded by Rufus Pollock on 24 May 2004 in Cambridge, UK.
Its slogan is "Sonnets to statistics, genes to geodata ..."
Open government
Open government is the governing doctrine which holds that citizens have the right to access the documents and proceedings of the government to allow for effective public oversight. In its broadest construction it opposes reason of state and other considerations, which have tended to legitimize extensive state secrecy. The origins of open government arguments can be dated to the time of the European Enlightenment: to debates about the proper construction of a then nascent democratic society.
Open knowledge
Open knowledge is knowledge that one is free to use, reuse, and redistribute without legal, social or technological restriction. Open knowledge is a set of principles and methodologies related to producing and distributing knowledge in an open manner. Knowledge is interpreted broadly to include data, content and general information.
The concept is related to open source, and the Open Knowledge Definition is directly derived from the Open Source Definition. Open knowledge can be seen as a superset of open data, open content and libre open access, with the aim of highlighting the commonalities between these different groups.
Open science data
Open science data is a type of open data focused on making the observations and results of scientific activities available for anyone to analyze and reuse. A major purpose of the drive for open data is to allow the verification of scientific claims, by allowing others to look at the reproducibility of results, and to allow data from many sources to be integrated to give new knowledge. While the idea of open science data has been actively promoted since the 1950s, the rise of the Internet has significantly lowered the cost and time required to publish or obtain data.
Open source
Open source is a term denoting that a product includes permission to use its source code, design documents, or content. It most commonly refers to the open-source model, in which open-source software or other products are released under an open-source license as part of the open-source software movement. Use of the term originated with software, but has expanded beyond the software sector to cover other open content and forms of open collaboration.
The Open Definition
The Open Definition is a document published by Open Knowledge International (OKI) (previously the Open Knowledge Foundation) to define openness in relation to data and content. It specifies what licences for such material may and may not stipulate in order to be considered open licences. The definition itself was derived from the Open Source Definition for software.
OKI summarise the document as:
Open means anyone can freely access, use, modify, and share for any purpose (subject, at most, to requirements that preserve provenance and openness).
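A minimal Python sketch of the licence-condition part of this summary: a licence counts as open here only if every condition it imposes is attribution (preserving provenance) or share-alike (preserving openness). The condition labels are hypothetical shorthand, and the check is deliberately simplified; the full Open Definition sets further requirements that are not modelled.

```python
from typing import Iterable

# Conditions an open licence may impose, per the summary: attribution
# (provenance) and share-alike (openness). Labels are hypothetical shorthand.
PERMITTED_CONDITIONS = frozenset({"attribution", "share-alike"})


def is_open_licence(conditions: Iterable[str]) -> bool:
    """True if every condition the licence imposes is one an open licence may impose."""
    return set(conditions) <= PERMITTED_CONDITIONS


if __name__ == "__main__":
    examples = {
        "public domain dedication": [],
        "attribution only": ["attribution"],
        "attribution + share-alike": ["attribution", "share-alike"],
        "attribution + non-commercial": ["attribution", "non-commercial"],
        "no derivatives": ["no-derivatives"],
    }
    for name, conditions in examples.items():
        print(f"{name}: {'open' if is_open_licence(conditions) else 'not open'}")
```

A subset test fits the wording "subject, at most, to": imposing no conditions at all is open, while any condition outside the permitted pair, such as a non-commercial clause, is not.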
The latest form of the document, published in November 2015, is version 2.1. The use of language in the document conforms to RFC 2119.
The document is available under a Creative Commons Attribution 4.0 International License, which itself meets the Open Definition.
Tim Berners-Lee
Sir Timothy John Berners-Lee (born 8 June 1955), also known as TimBL, is an English engineer and computer scientist, best known as the inventor of the World Wide Web. He is currently a professor of computer science at the University of Oxford and the Massachusetts Institute of Technology (MIT). He made a proposal for an information management system on March 12, 1989, and he implemented the first successful communication between a Hypertext Transfer Protocol (HTTP) client and server via the internet in mid-November the same year.
Berners-Lee is the director of the World Wide Web Consortium (W3C), which oversees the continued development of the Web. He is also the founder of the World Wide Web Foundation and is a senior researcher and holder of the 3Com founders chair at the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL). He is a director of the Web Science Research Initiative (WSRI) and a member of the advisory board of the MIT Center for Collective Intelligence. In 2011, he was named a member of the board of trustees of the Ford Foundation. He is a founder and president of the Open Data Institute, and is currently an advisor at the social network MeWe.
In 2004, Berners-Lee was knighted by Queen Elizabeth II for his pioneering work. In April 2009, he was elected a foreign associate of the United States National Academy of Sciences. Named in Time magazine's list of the 100 Most Important People of the 20th century, Berners-Lee has received a number of other accolades for his invention. He was honoured as the "Inventor of the World Wide Web" during the 2012 Summer Olympics opening ceremony, in which he appeared in person, working with a vintage NeXT Computer at the London Olympic Stadium. He tweeted "This is for everyone", which was instantly spelled out in LCD lights attached to the chairs of the 80,000 people in the audience.
Berners-Lee received the 2016 Turing Award "for inventing the World Wide Web, the first web browser, and the fundamental protocols and algorithms allowing the Web to scale".
Wikidata
Wikidata is a collaboratively edited knowledge base hosted by the Wikimedia Foundation. It is a common source of open data that Wikimedia projects such as Wikipedia, and anyone else, can use under a public domain license. This is similar to the way Wikimedia Commons provides storage for media files, and access to those files, for all Wikimedia projects, with the files also freely available for reuse. Wikidata is powered by the software Wikibase.
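Because Wikidata publishes each item as machine-readable JSON, reusing its data can take only a few lines. The Python sketch below builds the URL of an item's `Special:EntityData` JSON document and pulls a label out of its `entities`, item, `labels` structure. The User-Agent client name is a hypothetical placeholder, and for brevity the sketch ignores edge cases such as redirected items, whose data may be keyed under a different ID.

```python
import json
import urllib.request
from typing import Optional

# Wikidata serves per-item JSON at Special:EntityData; Q42 below is just an example ID.
ENTITY_DATA_URL = "https://www.wikidata.org/wiki/Special:EntityData/{qid}.json"


def entity_data_url(qid: str) -> str:
    return ENTITY_DATA_URL.format(qid=qid)


def label_from_entity_json(payload: dict, qid: str, lang: str = "en") -> Optional[str]:
    """Pull one language's label out of an entity JSON document, or None if absent."""
    label = payload.get("entities", {}).get(qid, {}).get("labels", {}).get(lang)
    return label["value"] if label else None


def fetch_label(qid: str, lang: str = "en") -> Optional[str]:
    """Download an item's JSON and return its label (requires network access)."""
    request = urllib.request.Request(
        entity_data_url(qid),
        # Hypothetical client name; Wikimedia asks clients to identify themselves.
        headers={"User-Agent": "open-data-example/0.1"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        payload = json.load(response)
    return label_from_entity_json(payload, qid, lang)


if __name__ == "__main__":
    # Offline sample shaped like the entity JSON, trimmed to the fields used here.
    sample = {"entities": {"Q42": {"labels": {"en": {"language": "en", "value": "Douglas Adams"}}}}}
    print(entity_data_url("Q42"))
    print(label_from_entity_json(sample, "Q42"))        # Douglas Adams
    print(label_from_entity_json(sample, "Q42", "fr"))  # None
```

Keeping the parsing separate from the network call lets the extraction logic be checked against a saved sample without contacting Wikidata.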