
You hear the words everywhere these days: “Big Data, Big Data, Big Data.” But how do you know whether the data you’re dealing with is big? In 2001, the data professionals at META Group (now part of Gartner) created a definition called the 3 Vs.

So, are you looking at big data? Here’s an explanation of the indicators:

1.    Volume

Every transaction, every marketing push, and every member record creates data that is collected. Every member or transaction could produce 100+ data points through a variety of organizational channels. Disney is an excellent example of a company collecting a large volume of data.

If you have recently visited a Disney theme park, you may have noticed that they distribute wristbands to every customer who walks through the gates. These wristbands are their way of tracking your every move throughout the park to better understand customer behavior. On which ride or in which park do customers spend the most time? Which restaurant is their favorite? How many rides does the typical family go on before stopping for a bathroom break? All of that information, generated by every park visitor at once, represents an extremely high volume of data.

For another sense of how much volume big data involves: it would overwhelm basic data tools like Excel, which caps a worksheet at 1,048,576 rows. You would need software like Hadoop, which provides the capacity to store and manipulate the volume of data being collected.

2.    Velocity

Velocity is the pace at which you receive new data and information. It can range from real-time, like social media data, to less frequent, like weekly event registration reports from a vendor.

Your organization’s ability to manage, analyze, and act on data velocity is a competitive differentiator; mastering this process is a way to gain a leg up on the competition.

Twitter is a great example of how fast data can be received. As soon as you tweet, click on a username or a link, or use a handle or hashtag, hundreds of data points are generated, from your location to your sentiment. The Twitterverse, like the rest of social media, moves fast and is always on, so the data being collected moves in real time and is multiplied by the vast number of users throughout the world.

3.    Variety

Variety describes data that arrives in many formats, with non-aligned data structures and inconsistent data semantics. Much of it is unstructured, which makes it difficult to deal with. In other words, the data is collected from various sources at different speeds for different purposes. This is not a controlled experiment; it is where as much data is collected as possible without a specific outcome in mind.

Resolving data variety issues must be approached as an ongoing endeavor. One way to start is through data profiling. You need to take a comprehensive inventory of a data set as you begin working with it: Are values missing? Which fields are numerical versus free text? Are there any relationships between variables? Discovering hidden relationships as well as resolving inconsistencies across multiple data sources will allow you to clean up the messiness of real data. To best analyze a variety of data, bring it into the same format in one location.
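The profiling questions above (are values missing, which fields are numerical versus free text) can be sketched in a few lines of Python. This is only a minimal illustration using the standard library; the `profile` function, the field names, and the sample records are hypothetical and not taken from any real registration system.

```python
def _is_number(value):
    """Return True if the value can be read as a number."""
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False


def profile(records):
    """Inventory each field: missing count, inferred kind, distinct values."""
    fields = sorted({key for row in records for key in row})
    report = {}
    for field in fields:
        values = [row.get(field) for row in records]
        # Treat None and empty strings as missing.
        present = [v for v in values if v not in (None, "")]
        numeric = [v for v in present if _is_number(v)]
        if not present:
            kind = "empty"
        elif len(numeric) == len(present):
            kind = "numeric"
        elif numeric:
            kind = "mixed"  # some numbers, some free text
        else:
            kind = "text"
        report[field] = {
            "missing": len(values) - len(present),
            "kind": kind,
            "distinct": len({str(v) for v in present}),
        }
    return report


# Hypothetical sample data, for illustration only.
records = [
    {"member_id": "101", "age": "34", "job_title": "CEO"},
    {"member_id": "102", "age": "", "job_title": "Chief Executive"},
    {"member_id": "103", "age": "forty", "job_title": ""},
]

report = profile(records)
for field, summary in report.items():
    print(field, summary)
```

A field flagged as "mixed," like `age` here, is a hint that the same question was answered in different formats and needs cleanup before analysis.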

4.    Veracity

Although not a part of the original definition, veracity is commonly considered the fourth V of Big Data. Organizations that hold events can relate to this factor because many of their events rely on manual registration forms. These forms are rarely identical, even year over year for the same event, and may be open to the interpretation of the attendee.

Data’s messiness, its “real world,” human-generated nature, can make it untrustworthy. When the data’s source is an event registrant typing into a free-text field, the data can become skewed. If your organization asks attendees to fill in their job title at registration, you can have three different people who are CEOs, but one will write “CEO,” one will write “Chief Executive,” and the third may leave the field blank. In a perfect data world, there would be no issues with data’s veracity, but no examination of “big data” is complete without analyzing the data’s source and examining its fidelity.
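To make the job title example concrete, here is a minimal sketch of how free-text entries might be mapped onto one canonical value. The lookup table and the `normalize_title` function are hypothetical; a real organization would need a much longer mapping and would still have to decide how to handle blanks.

```python
# Hypothetical lookup table: lowercase variants mapped to one canonical title.
CANONICAL_TITLES = {
    "ceo": "CEO",
    "chief executive": "CEO",
    "chief executive officer": "CEO",
}


def normalize_title(raw):
    """Map a free-text job title to a canonical form; None means missing."""
    if raw is None:
        return None
    # Drop periods, lowercase, and collapse repeated whitespace.
    key = " ".join(raw.replace(".", "").lower().split())
    if not key:
        return None  # blank field: record it as missing rather than guess
    # Unknown titles are kept as entered, minus surrounding whitespace.
    return CANONICAL_TITLES.get(key, raw.strip())


entries = ["CEO", "Chief Executive", "", " c.e.o. "]
cleaned = [normalize_title(entry) for entry in entries]
print(cleaned)
```

Note that the blank entry stays missing. Normalization can reconcile different spellings of the same answer, but it cannot recover an answer that was never given.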

To recap: Big Data arrives in large quantities, at a high velocity, in a variety of formats, and with varying degrees of trustworthiness. Now that you know the four Vs of big data, take a step back and ask yourself: is our organization handling Big Data?

Nick

@BearAnalytics

Photo By: jeltovski