Building a data strategy in 2016

Vicki Boykis | CapTech | @vboykis

About CapTech

About me


The data landscape in 2006

The data landscape today

Now that we have all this cool stuff, how do we use it?

Agenda

  • The basic rules of data
  • Picking the right tool
  • Common architecture patterns today

Basic rules of data


Data Rule #1: Law of entropy

Natural processes have a preferred direction of progress: toward chaos.

Goal: "I want to know how many customers I have."
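
A minimal sketch of why this goal is harder than it sounds, assuming customer records live in two separate systems that format the same key differently. The system names, the email key, and all records below are hypothetical and are not taken from the talk.

```python
# Hypothetical records: the same customers as captured by two separate systems.
crm_customers = ["alice@example.com", "Bob@Example.com ", "carol@example.com"]
billing_customers = ["ALICE@EXAMPLE.COM", "bob@example.com", "dave@example.com"]


def normalize(email: str) -> str:
    """Trim whitespace and lowercase so formatting drift does not create 'new' customers."""
    return email.strip().lower()


def count_customers(*sources: list[str]) -> int:
    """Count distinct customers across all sources after normalizing each key."""
    return len({normalize(email) for source in sources for email in source})


# Naive count: every formatting variant is treated as a different customer.
naive_count = len(set(crm_customers + billing_customers))

# Normalized count: variants of the same address collapse into one customer.
clean_count = count_customers(crm_customers, billing_customers)

print(naive_count, clean_count)
```

Real customer data usually needs far more than lowercasing to reconcile (renamed accounts, shared addresses, missing keys), so treat this only as an illustration of how copies of the same data drift apart.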

Data Rule #1a: Vicki's Rule of Data

The further your data is from its source, the more annoying it gets to query and manage.

Result

Data Rule #2: Pareto Rule

People use 20% of their tools 80% of the time because people stick to what they know.

From the Data Science Salary Survey 2015 and The emergence of Spark.

Data Rule #3: Conway's Law

Organizations which design systems ... are constrained to produce designs which are copies of the communication structures of these organizations

How to evaluate data tools


Q1: Do you have enough data for Big Data?

If you don't have big data, here's what your stack looks like:

(Drawing: the stack when you don't have big data)

Q2: What kind of data do you have?

  • Slow/fast: stock trading vs. home insurance
  • Relational/non-relational: banking transactions vs. Wikipedia
  • Is your data special? Are you stargazing?

Q3: Who is going to be working with your data?

  • Engineers - Documentation? APIs? Testing?
  • Analysts - Access? Lineage?
  • Decision-makers - Data meaning?

Common architecture patterns today

Infrastructure    Analysis                   Visualization
Kafka             TensorFlow                 Caravel
Feather           Bayesian Query Language    Jupyter Dashboards

Thanks! Questions?

For more, check out my data post.

About the Icons:

Icons made by Freepik from www.flaticon.com, licensed under CC BY 3.0.

Twitter: @vboykis / Web