OpenRefine

OpenRefine, formerly called Google Refine and before that Freebase Gridworks, is a standalone open source desktop application for cleaning up data and transforming it into other formats, an activity known as data wrangling.[3] It is similar to spreadsheet applications (and can work with spreadsheet file formats); however, it behaves more like a database.

It operates on rows of data whose cells are organized under columns, much like a relational database table. An OpenRefine project consists of one table. The user can filter the rows to display using facets that define filtering criteria (for example, showing rows where a given column is not empty). Unlike spreadsheets, most operations in OpenRefine are applied to all visible rows at once: transforming all cells in all rows under one column,[4] creating a new column based on existing column data, and so on. All actions performed on a dataset are stored in the project and can be replayed on another dataset.
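The row, facet and replay model described above can be sketched in a few lines of Python. This is only an illustrative model, not OpenRefine's implementation: the names `facet_not_empty`, `transform_column` and `replay` are hypothetical, and a table is assumed to be a plain list of dictionaries.

```python
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, str]
Facet = Callable[[Row], bool]
# One recorded operation: (column name, cell function, optional facet).
Operation = Tuple[str, Callable[[str], str], Optional[Facet]]


def facet_not_empty(column: str) -> Facet:
    """Facet that keeps rows whose cell in `column` is not blank."""
    return lambda row: bool(row.get(column, "").strip())


def transform_column(rows: List[Row], column: str,
                     fn: Callable[[str], str],
                     facet: Optional[Facet] = None) -> List[Row]:
    """Apply `fn` to `column` in every row selected by the facet."""
    result = []
    for row in rows:
        if facet is None or facet(row):
            row = {**row, column: fn(row.get(column, ""))}
        result.append(row)
    return result


def replay(rows: List[Row], history: List[Operation]) -> List[Row]:
    """Re-run a recorded list of operations on any dataset."""
    for column, fn, facet in history:
        rows = transform_column(rows, column, fn, facet)
    return rows


# The same recorded history is applied to two different datasets.
history = [
    ("city", str.strip, facet_not_empty("city")),
    ("city", str.title, facet_not_empty("city")),
]
first = replay([{"city": "  new york "}, {"city": ""}], history)
second = replay([{"city": "LONDON"}], history)
```

Keeping the history as data, separate from the rows, is what makes it possible to replay the same cleanup on a second dataset.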

Unlike spreadsheets, no formulas are stored in the cells, but formulas are used to transform the data, and transformation is done only once.[5] Transformation expressions can be written in General Refine Expression Language (GREL),[6] Jython (i.e. Python) and Clojure.[7]
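As a rough illustration of such an expression, the sketch below normalizes whitespace and capitalization in a cell. It assumes, as in OpenRefine's Jython expression editor, that the current cell is available as `value` and that the expression ends with a `return` statement; the wrapper function `transform` is hypothetical and exists only so the snippet runs outside OpenRefine.

```python
import re


def transform(value):
    # In the expression editor only the body of this function would be
    # entered; `value` is assumed to hold the current cell's content.
    collapsed = re.sub(r"\s+", " ", value.strip())
    return collapsed.title()


cleaned = transform("  john   SMITH ")
```

Because the result is written back into the cell as a plain value, rerunning the expression is an explicit action rather than something that happens automatically, which matches the "done only once" behaviour described above.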

The program has a web user interface. However, it is not hosted on the web as software as a service (SaaS); instead, it is downloaded and run on the local machine. When OpenRefine is launched, it starts a local web server and opens a browser pointing at the web UI served by that server.
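The "local server plus browser" arrangement can be sketched with Python's standard library. This is a minimal sketch of the general pattern, not OpenRefine's actual startup code: the handler, the placeholder page and the function name `start_local_ui` are all hypothetical, and the port is chosen by the operating system.

```python
import threading
import webbrowser
from http.server import BaseHTTPRequestHandler, HTTPServer


class UiHandler(BaseHTTPRequestHandler):
    """Serves a single placeholder page standing in for the web UI."""

    def do_GET(self):
        body = b"<h1>local web UI placeholder</h1>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet


def start_local_ui(open_browser=True):
    # Bind to the loopback interface only, so the UI is not exposed
    # to other machines; port 0 lets the OS pick a free port.
    server = HTTPServer(("127.0.0.1", 0), UiHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = "http://127.0.0.1:%d/" % server.server_address[1]
    if open_browser:
        webbrowser.open(url)  # hand the URL to the default browser
    return server, url
```

Calling `start_local_ui()` returns the running server and its URL; `server.shutdown()` stops it again.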

Developer(s): Freebase, then Google, now open source community
Initial release: November 10, 2010
Stable release: 2.8 / November 19, 2017 [1]
Written in: Java [2]
Platform: Microsoft Windows, Linux, macOS
Available in: English, Italian, Chinese, Japanese, French
License: BSD License
Website: openrefine.org

Possible uses of software

  • Cleaning messy data: for example if working with a text file with some semi-structured data, it can be edited using transformations, facets and clustering to make the data cleanly structured.[8]
  • Transformation of data: converting values to other formats, normalizing and denormalizing.
  • Parsing data from web sites: OpenRefine has a URL fetch feature and jsoup HTML parser and DOM engine.[9]
  • Adding data to a dataset by fetching it from web services (e.g. services that return JSON).[10] For example, this can be used to geocode addresses into geographic coordinates.[11]
  • Aligning to Wikidata (formerly Freebase[12]): this involves reconciliation — mapping string values in cells to entities in Wikidata.[13]
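The web-service use case in the list above (building one request per row and parsing the JSON response into new columns) might look roughly like this in Python. The endpoint `https://geocoder.example/search` and its `{"lat": ..., "lon": ...}` response shape are invented for illustration and do not refer to a real service; the function names are likewise hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint and response format, used only for illustration.
GEOCODER_URL = "https://geocoder.example/search"


def build_url(address):
    """Build the request URL for one cell value."""
    return GEOCODER_URL + "?" + urlencode({"q": address, "format": "json"})


def http_fetch(url):
    """Fetch a URL and return the response body as text."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8")


def add_coordinates(rows, fetch=http_fetch):
    """Return copies of the rows with `lat` and `lon` columns added."""
    enriched = []
    for row in rows:
        payload = json.loads(fetch(build_url(row["address"])))
        enriched.append({**row, "lat": payload["lat"], "lon": payload["lon"]})
    return enriched
```

Passing the fetch function as a parameter keeps the parsing logic testable without network access; a real service would also need rate limiting and error handling, which are omitted here.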

Supported formats for import and export

Import is supported from the following formats:[14]

If the input data is in a non-standard text format, it can be imported as whole lines, without splitting into columns, and the columns can then be extracted later with OpenRefine's tools. Archived and compressed files are supported (.zip, .tar.gz, .tgz, .tar.bz2, .gz, or .bz2), and OpenRefine can download input files from a URL. To use web pages as input, it is possible to import a list of URLs and then invoke a URL fetch function.
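The "import whole lines first, extract columns later" approach can be illustrated with a short Python sketch. The log-like line format, the regular expression and the function names are assumptions made for this example; they are not an OpenRefine feature or file format.

```python
import re

# Assumed line layout for this example: "<date> <LEVEL> <message>".
LINE_PATTERN = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2})\s+(?P<level>[A-Z]+)\s+(?P<message>.*)$"
)


def import_lines(text):
    """Step 1: one row per non-blank line, held in a single `line` column."""
    return [{"line": line} for line in text.splitlines() if line.strip()]


def extract_columns(rows):
    """Step 2: derive new columns from `line` where the pattern matches."""
    result = []
    for row in rows:
        match = LINE_PATTERN.match(row["line"])
        result.append({**row, **match.groupdict()} if match else dict(row))
    return result


rows = extract_columns(import_lines("2012-04-18 INFO started\nnot a record\n"))
```

Rows that do not match the pattern keep only their original `line` column, so nothing is lost and they can be handled by a later pass.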

Export is supported in the following formats:[16]

Whole OpenRefine projects in native format can be exported as a .tar.gz archive.

History

OpenRefine started life as Freebase Gridworks, developed by Metaweb, and has been available as open source since January 2010.[17] On 16 July 2010, Google acquired Metaweb,[18] the creators of Freebase, and on 10 November 2010 renamed the Freebase Gridworks software to Google Refine, releasing version 2.0.[19] On 2 October 2012, original author David Huynh announced that Google would soon stop its active support of Google Refine.[20][21][22] Since then, the codebase has been in transition to an open source project named OpenRefine.[23]

References

  1. ^ "Release OpenRefine v2.8".
  2. ^ "OpenRefine/OpenRefine - GitHub". Retrieved 25 June 2017.
  3. ^ "OpenRefine Project Home".
  4. ^ "Editing by transforming: Cell Editing wiki page from Refine documentation". Retrieved 18 April 2012.
  5. ^ "Comparison with spreadsheet software: Cell Editing wiki page in Refine documentation". Retrieved 18 April 2012.
  6. ^ General Refine expression language OpenRefine/OpenRefine Wiki GitHub. Github.com (2013-04-03). Retrieved on 2013-08-16.
  7. ^ "Expressions: Refine documentation". Retrieved 18 April 2012.
  8. ^ "Screencast: Google Refine 2.0 - Introduction (1 of 3) - editing government data". Retrieved 18 April 2012.
  9. ^ "Stripping HTML: Refine documentation wiki page". Retrieved 18 April 2012.
  10. ^ "FetchingURLsFromWebServices wiki page: Refine documentation". Retrieved 18 April 2012.
  11. ^ "Screencast: Google Refine 2.0 - Data Augmentation (3 of 3) - using Openstreetmap Nominatim for geocoding and Freebase for augmentation". Retrieved 18 April 2012.
  12. ^ "Schema Alignment: Refine documentation wiki page". Retrieved 18 April 2012.
  13. ^ "OpenRefine documentation: Reconciliation". Retrieved 12 March 2017.
  14. ^ "Importers: Refine documentation wiki page". Retrieved 18 April 2012.
  15. ^ "Changelog for 2.5". Retrieved 18 April 2012.
  16. ^ "Exporting: Refine documentation wiki page". Retrieved 18 April 2012.
  17. ^ https://code.google.com/p/google-refine/source/detail?r=2
  18. ^ "Google Official Blog: Deeper understanding with Metaweb". Retrieved 18 April 2012.
  19. ^ "Google Opensource blog: Announcing Google Refine 2.0, a power tool for data wranglers". Retrieved 18 April 2012.
  20. ^ "[announcement] the future of the Refine projects".
  21. ^ "From Freebase Gridworks to Google Refine and now OpenRefine".
  22. ^ OpenRefine. OpenRefine. Retrieved on 2013-08-16.
  23. ^ google-refine - Google Refine, a power tool for working with messy data (formerly Freebase Gridworks) - Google Project Hosting. Code.google.com. Retrieved on 2013-08-16.

AI Challenge

The AI Challenge was an international artificial intelligence programming contest started by the University of Waterloo Computer Science Club.

Initially the contest was for University of Waterloo students only. In 2010, the contest gained sponsorship from Google and allowed it to extend to international students and the general public.

Android Q

Android "Q" is the upcoming tenth major release and the 17th version of the Android mobile operating system. The first beta of Android Q was released on March 13, 2019 for all Google Pixel phones. The final release of Android Q is scheduled to be released in the third quarter of 2019.

BigQuery

BigQuery is a RESTful web service that enables interactive analysis of massively large datasets working in conjunction with Google Storage. It is a serverless Platform as a Service (PaaS) that may be used complementarily with MapReduce.

Chromebit

The Chromebit is a dongle running Google's Chrome OS operating system. When placed in the HDMI port of a television or a monitor, this device turns that display into a personal computer. Chromebit allows adding a keyboard or mouse over Bluetooth or Wi-Fi. The device was announced in April 2015 and began shipping that November.

Flutter (software)

Flutter is an open-source mobile application development framework created by Google. It is used to develop applications for Android and iOS, as well as being the primary method of creating applications for Google Fuchsia.

GData

GData (Google Data Protocol) provides a simple protocol for reading and writing data on the Internet, designed by Google. GData combines common XML-based syndication formats (Atom and RSS) with a feed-publishing system based on the Atom Publishing Protocol, plus some extensions for handling queries. It relies on XML or JSON as a data format.

Google provides GData client libraries for Java, JavaScript, .NET, PHP, Python, and Objective-C.

G Suite Marketplace

G Suite Marketplace (formerly Google Apps Marketplace) is a product of Google Inc. It is an online store for web applications that work with Google Apps (Gmail, Google Docs, Google Sites, Google Calendar, Google Contacts, etc.) and with third party software. Some Apps are free. Apps are based on Google APIs or on Google Apps Script.

Gayglers

Gayglers is a term for the gay, lesbian, bisexual and transgender employees of Google. The term was first used for all LGBT employees at the company in 2006, and was conceived as a play on the word "Googler" (a colloquial term to describe all employees of Google). The term, first published openly by The New York Times in 2006 to describe some of the employees at the company's new Manhattan office, came into public awareness when Google began to participate as a corporate sponsor and float participant at several pride parades in San Francisco, New York, Dublin and Madrid during 2006. Google has since increased its public backing of LGBT-positive events and initiatives, including an announcement of opposition to Proposition 8.

Google Behind the Screen

"Google: Behind the Screen" (Dutch: "Google: achter het scherm") is a 51-minute episode of the documentary television series Backlight about Google. The episode was first broadcast on 7 May 2006 by VPRO on Nederland 3. It was directed by IJsbrand van Veelen, produced by Nicoline Tania, and edited by Doke Romeijn and Frank Wiering.

Google Business Groups

Google Business Group (GBG) is a non-profit community in which business professionals share knowledge about web technologies for business success. It has over 150 local communities, or chapters, spanning 30 countries, in cities including Mumbai, Bangalore, Belgaum, Chandigarh, Jaipur, Chennai, Buenos Aires, Davao, Cape Town, Rio de Janeiro, Peshawar and Lahore. The initiative was started by and is backed by Google, but it is driven by local chapter managers and community members, who connect, learn and work to improve the success of their businesses; it is independent of Google as a corporation.

Google Dataset Search

Google Dataset Search is a search engine from Google that helps researchers locate online data that is freely available for use. The company launched the service on September 5, 2018, and stated that the product was targeted at scientists and data journalists.

Google Dataset Search complements Google Scholar, the company's search engine for academic studies and reports.

Google Finance

Google Finance is a website focusing on business news and financial information hosted by Google.

Google Fit

Google Fit is a health-tracking platform developed by Google for the Android operating system and Wear OS. It is a single set of APIs that blends data from multiple apps and devices. Google Fit uses sensors in a user's activity tracker or mobile device to record physical fitness activities (such as walking or cycling), which are measured against the user's fitness goals to provide a comprehensive view of their fitness.

Google Flights

Google Flights is an online flight booking search service which facilitates the purchase of airline tickets through third party suppliers.

Google Forms

Google Forms is a survey administration app that is included in the Google Drive office suite along with Google Docs, Google Sheets, and Google Slides.

Forms features all of the collaboration and sharing features found in Docs, Sheets, and Slides.

Google Guice

Google Guice (pronounced "juice") is an open-source software framework for the Java platform released by Google under the Apache License. It provides support for dependency injection using annotations to configure Java objects. Dependency injection is a design pattern whose core principle is to separate behavior from dependency resolution.

Guice allows implementation classes to be bound programmatically to an interface, then injected into constructors, methods or fields using an @Inject annotation. When more than one implementation of the same interface is needed, the user can create custom annotations that identify an implementation, then use that annotation when injecting it.

Being the first generic framework for dependency injection using Java annotations in 2008, Guice won the 18th Jolt Award for best Library, Framework, or Component.

Google The Thinking Factory

Google: The Thinking Factory is a 2008 documentary film about Google Inc., written and directed by Gilles Cayatte.

Project Sunroof

Project Sunroof is a solar power initiative started by Google engineer Carl Elkin. The initiative's stated purpose is "mapping the planet's solar potential, one roof at a time."

Rajen Sheth

Rajen Sheth is an executive at Google, where he currently runs product management for the cloud AI and machine learning team. Sheth pitched the idea of an enterprise version of Google's email service Gmail in a meeting with CEO Eric Schmidt in 2004. Schmidt initially rejected the proposal, arguing that the division should focus on web search, but the suggestion was later accepted. Sheth is known as the "father of Google Apps", and is responsible for the development of Chrome and Chrome OS for Business.

This page is based on a Wikipedia article written by authors (here).
Text is available under the CC BY-SA 3.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.