Sawzall (programming language)

Sawzall is a procedural domain-specific programming language used by Google to process large numbers of individual log records. Sawzall was first described in 2003,[1] and the szl runtime was open-sourced in August 2010.[2] However, because the MapReduce table aggregators have not been released,[3] the open-sourced runtime is not useful off the shelf for large-scale analysis across multiple log files. Sawzall has been replaced by Lingo (logs in Go) for most purposes within Google.[4]

Motivation

Google's server logs are stored as large collections of records (Protocol Buffers) that are partitioned over many disks within GFS. To perform calculations involving the logs, engineers can write MapReduce programs in C++ or Java. MapReduce programs need to be compiled and may be more verbose than necessary, so writing a program to analyze the logs can be time-consuming. To make it easier to write quick scripts, Rob Pike et al. developed the Sawzall language. A Sawzall script runs within the Map phase of a MapReduce and "emits" values to tables. The Reduce phase, which the script writer does not need to handle, then aggregates the tables from multiple runs into a single set of tables.

Currently, only the language runtime (which runs a Sawzall script once over a single input) has been open-sourced; the supporting program built on MapReduce has not been released.[3]

Features

Notable features include:

  • A Sawzall script has a single input (a log record) and can output only by emitting to tables. The script can have no other side-effects.
  • A script can define any number of output tables. Table types include:
    • collection saves every value emitted
    • sum saves the sum of every emitted value
    • maximum(n) saves only the n highest values, ranked by a given weight.
  • In addition, there are several statistical table types that give inexact results. The higher the parameter n, the more accurate the estimates are.
    • sample(n) gives a random sample of n values from all the emitted values
    • quantile(n) calculates a cumulative probability distribution of the given numbers.
    • top(n) gives n values that are probably the most frequent of the emitted values.
    • unique(n) estimates the number of unique values emitted.
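
The exact table types listed above can be modeled outside Sawzall. The following Python is a minimal sketch, not the szl implementation: the class names and the emit and result methods are hypothetical, the sample records are invented for illustration, and maximum(n) is assumed to keep the n values whose weights are largest.

```python
import heapq


class CollectionTable:
    """Models 'collection': keeps every emitted value."""

    def __init__(self):
        self.values = []

    def emit(self, value):
        self.values.append(value)

    def result(self):
        return list(self.values)


class SumTable:
    """Models 'sum': keeps only the running total of the emitted values."""

    def __init__(self):
        self.total = 0

    def emit(self, value):
        self.total += value

    def result(self):
        return self.total


class MaximumTable:
    """Models 'maximum(n)': keeps the n values with the highest weights."""

    def __init__(self, n):
        self.n = n
        self.heap = []  # min-heap of (weight, sequence, value)
        self.seq = 0    # tie-breaker so values are never compared directly

    def emit(self, value, weight):
        entry = (weight, self.seq, value)
        self.seq += 1
        if len(self.heap) < self.n:
            heapq.heappush(self.heap, entry)
        elif self.heap and entry > self.heap[0]:
            # The new entry outranks the smallest one kept so far.
            heapq.heapreplace(self.heap, entry)

    def result(self):
        return [value for _, _, value in sorted(self.heap, reverse=True)]


# Hypothetical log records: (page, response size in bytes).
requests = [("a.html", 120), ("b.html", 950), ("c.html", 430), ("d.html", 15)]

pages = CollectionTable()
total_bytes = SumTable()
largest = MaximumTable(2)

for page, size in requests:
    pages.emit(page)
    total_bytes.emit(size)
    largest.emit(page, weight=size)
```

In this sketch only the collection table grows with the input. The sum and maximum(n) tables keep a fixed amount of state however many values are emitted.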

Sawzall's design favors efficiency and engine simplicity over power:

  • Sawzall is statically typed, and the engine compiles the script to x86 before running it.
  • Sawzall supports the compound data types lists, maps, and structs. However, there are no references or pointers. All assignments and function arguments create copies. This means that recursive data structures and cycles are impossible.
  • As in C, functions can modify global and local variables but are not closures.
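
The copy-on-assignment rule can be illustrated by contrast with Python, where ordinary assignment shares a reference. This is only a sketch of the semantics described above, not of how the Sawzall engine implements them: the helper name assign is hypothetical, and copy.deepcopy stands in for the copy made on every assignment.

```python
import copy


def assign(value):
    """Imitates assignment as described above: the target gets a full copy."""
    return copy.deepcopy(value)


record = {"path": "/index.html", "sizes": [120, 950]}

alias = record             # ordinary Python assignment: one shared object
snapshot = assign(record)  # copy semantics: an independent value

record["sizes"].append(430)

# The alias sees the later change; the copy does not.
shared_len = len(alias["sizes"])
copied_len = len(snapshot["sizes"])

# With copies, a value cannot end up containing itself, so no cycle forms.
node = {"children": []}
node["children"].append(assign(node))
is_cyclic = node["children"][0] is node
```

Appending a copy of node to its own children list stores a snapshot of the earlier value, not a link back to node. This is why cyclic structures cannot be built under these rules.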

Sawzall code

This complete Sawzall program will read the input and produce three results: the number of records, the sum of the values, and the sum of the squares of the values.

count: table sum of int;
total: table sum of float;
sum_of_squares: table sum of float;
x: float = input;
emit count <- 1;
emit total <- x;
emit sum_of_squares <- x * x;
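
Because the aggregators have not been released, the program's behaviour across many inputs can only be approximated here. The Python below is a sketch under stated assumptions: the function names run_script, process_shard and merge are hypothetical, each inner list stands in for one partition of input records, and merge stands in for the Reduce phase that the script writer never sees.

```python
def run_script(x, tables):
    """Mirrors the Sawzall program: one record in, three emits out."""
    tables["count"] += 1
    tables["total"] += x
    tables["sum_of_squares"] += x * x


def process_shard(records):
    """Runs the script once per record, as the Map phase would."""
    tables = {"count": 0, "total": 0.0, "sum_of_squares": 0.0}
    for x in records:
        run_script(float(x), tables)
    return tables


def merge(partials):
    """Adds the per-shard sum tables together, standing in for Reduce."""
    merged = {"count": 0, "total": 0.0, "sum_of_squares": 0.0}
    for tables in partials:
        for name, value in tables.items():
            merged[name] += value
    return merged


# Hypothetical input, split into three partitions.
shards = [[1.0, 2.0], [3.0], [4.0, 5.0]]
result = merge(process_shard(shard) for shard in shards)

# The three sums are enough to derive the mean and the population variance.
mean = result["total"] / result["count"]
variance = result["sum_of_squares"] / result["count"] - mean ** 2
```

Since all three tables are sums, the partial results can be combined in any order, so each record can be processed independently of the others.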

Notes

  1. Rob Pike, Sean Dorward, Robert Griesemer, Sean Quinlan. Interpreting the Data: Parallel Analysis with Sawzall.
  2. Sawzall's open source project at Google Code.
  3. Discussion on which parts of Sawzall are open-source.
  4. "Replacing Sawzall". 2015-12-04. Retrieved 2018-06-18.

References

  • S. Ghemawat, H. Gobioff, S.-T. Leung, The Google file system, in: 19th ACM Symposium on Operating Systems Principles, Proceedings, 17 ACM Press, 2003, pp. 29–43.
  • MapReduce

This page is based on a Wikipedia article written by authors (here).
Text is available under the CC BY-SA 3.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.