Spark: The Definitive Guide

by Bill Chambers, Matei Zaharia
February 2018
Intermediate to advanced content level
606 pages
14h 54m
English

Chapter 24. Advanced Analytics and Machine Learning Overview

Thus far, we have covered fairly general data flow APIs. This part of the book dives deeper into some of the more specific advanced analytics APIs available in Spark. Beyond large-scale SQL analysis and streaming, Spark also provides support for statistics, machine learning, and graph analytics. We refer to this set of workloads as advanced analytics. The tools covered in this part of the book include:

  • Preprocessing your data (cleaning data and feature engineering)

  • Supervised learning

  • Recommendation learning

  • Unsupervised learning

  • Graph analytics

  • Deep learning

This chapter offers a basic overview of advanced analytics, some example use cases, and a typical advanced analytics workflow. The chapters that follow then cover each of the tools just listed and show you how to apply them.

Warning

This book is not intended to teach you everything you need to know about machine learning from scratch. We won't go into strict mathematical definitions and formulations. This is not because they are unimportant, but simply because they would take more space than we can give them here. This part of the book is not an algorithm guide that teaches the mathematical underpinnings of every available algorithm or the in-depth implementation strategies they use. Instead, the chapters included here serve as a guide for users, outlining what you need to know to use Spark's advanced analytics APIs.

A Short Primer ...


ISBN: 9781491912201