Big Data

Looking Inside A Big Data Toolbox

Much like the term super, the big in big data comes with a certain amount of hype. Just as we now have supercars, supermodels, superspreaders and super-sized meals, we also have big business, big data and of course Big Macs.

Regardless of the hype cycle, big data has firmly entered our tech-business vocabulary. We now use it as a kind of blanket term when we talk about the massive web-scale information streams being passed over the cloud, inside the Internet of Things (IoT) and throughout the new realms of Artificial Intelligence (AI).

Broadly meant to refer to an amount of data that is too large to fit comfortably or productively into anything that resembles a ‘traditional’ relational database management system, big data is still just data… but it includes core operational enterprise data plus all the pieces of information that an organization knows it has, but has perhaps yet to act upon.

To wrangle our way through the mire of big data, an increasing number of software companies are getting into the big data tools business. So what size and shape are these tools and what do they do?

No-code data access platform company Okera reminds us that large organizations have a variety of data access control use cases, which means they require flexible Attribute-Based Access Control (ABAC) policies. An ABAC policy can combine multiple attributes, including user, tool, type of data and location, to enable self-service analytics while ensuring secure, compliant access to data.
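To make the idea of combining attributes concrete, here is a minimal Python sketch of an ABAC-style check. It is not Okera's product or API; the class, the policy layout and every attribute value are hypothetical and chosen only to mirror the four attributes named above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRequest:
    """Attributes attached to one request (all names are hypothetical)."""
    user_role: str   # who is asking
    tool: str        # which client tool is being used
    data_type: str   # what kind of data is requested
    location: str    # where the request originates


# Hypothetical policy: each attribute maps to the set of values it permits.
POLICY = {
    "user_role": {"analyst", "data_steward"},
    "tool": {"notebook", "bi_dashboard"},
    "data_type": {"sales", "web_traffic"},
    "location": {"US", "EU"},
}


def is_allowed(request: AccessRequest, policy: dict) -> bool:
    """Grant access only if every attribute of the request satisfies the policy."""
    return all(
        getattr(request, attribute) in allowed_values
        for attribute, allowed_values in policy.items()
    )


# An analyst in the EU reading sales data from a notebook passes every rule.
sales_request = AccessRequest("analyst", "notebook", "sales", "EU")
# The same analyst asking for payroll data fails the data_type rule.
payroll_request = AccessRequest("analyst", "notebook", "payroll", "EU")
```

The point of the sketch is that access depends on a combination of attributes rather than on a single role, so changing one attribute (here, the type of data) is enough to flip the decision.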

When enterprises allow access to the big data estate, not every employee should be able to get access to all the information that exists, for obvious reasons related to privacy and security. Okera has automated tools for this kind of function. The techniques involved are known as dynamic data masking (i.e. transforming data to a structurally similar shape but with inauthentic values for testing purposes) and data tokenization (i.e. transforming data values into a placeholder token that is random and unidentifiable), and both may be brought into play concurrently.
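The difference between masking and tokenization is easier to see in code. The Python sketch below follows the two definitions given above and nothing more; the function and class names are hypothetical, it is not how Okera implements either technique, and the in-memory dictionary merely stands in for what would be a secured token store in a real system.

```python
import random
import secrets
import string


def mask_value(value: str) -> str:
    """Masking: keep the shape (length, case, punctuation) but swap in fake characters."""
    masked = []
    for ch in value:
        if ch.isdigit():
            masked.append(random.choice(string.digits))
        elif ch.isalpha():
            pool = string.ascii_uppercase if ch.isupper() else string.ascii_lowercase
            masked.append(random.choice(pool))
        else:
            masked.append(ch)  # separators such as '-' or '@' are left alone
    return "".join(masked)


class Tokenizer:
    """Tokenization: replace a value with a random token that reveals nothing about it."""

    def __init__(self):
        # Hypothetical stand-in for a secured token vault.
        self._vault = {}

    def tokenize(self, value: str) -> str:
        # Reuse the same token for a repeated value so records can still be joined.
        if value not in self._vault:
            self._vault[value] = "tok_" + secrets.token_hex(8)
        return self._vault[value]


masked_id = mask_value("555-12-3456")            # same layout, different digits
tokenizer = Tokenizer()
email_token = tokenizer.tokenize("jane@example.com")
```

A masked value still looks like the real thing, which is why it suits testing, whereas a token is deliberately meaningless to anyone without access to the vault.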

“Self-service analytics is the holy grail of enabling enterprises to take full advantage of their data for digital transformation initiatives related to the customer experience, end-to-end business processes, and improved business decision making,” said Nick Halsey, Okera CEO. “By eliminating the need for coding, we have put access control management into the hands of the data stewards and governance and privacy professionals who understand the intricacies of regulations and internal [big] data privacy policies. This democratization of secure, compliant access to data is critical to making true self-service analytics a reality.”

Getting dirtier with the data toolbox grease-gun
But deeper (and potentially dirtier) than low-code is a technology described as an AI data science feature marketplace: a platform created for data scientists to allow them to discover and evaluate tens of thousands of external features across myriad datasets.

Using the platform, data scientists can identify and evaluate data sets of interest and quickly run correlation analyses against their original data set. This speeds up the search for ideal candidates for data modeling and means they purchase only the data that best fits their needs.
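As a rough illustration of what running a correlation analysis against an original data set can look like, the Python sketch below ranks candidate external features by their Pearson correlation with a target series. It does not represent the marketplace's own method; the function names and all figures are invented for the example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


def rank_features(target, candidates):
    """Sort candidate features by the strength (absolute value) of their correlation."""
    scores = {name: pearson(values, target) for name, values in candidates.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Invented figures: the company's own weekly sales and two external features.
weekly_sales = [10, 12, 15, 19, 24]
external_features = {
    "foot_traffic": [100, 118, 152, 190, 241],
    "rainfall_mm": [5, 3, 6, 2, 4],
}

ranking = rank_features(weekly_sales, external_features)
for name, score in ranking:
    print(f"{name}: {score:+.2f}")
```

In this made-up case foot traffic tracks sales closely while rainfall barely does, so only the former would be worth taking forward into a model. Correlation is a quick screening step, not proof that a feature will improve predictions.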

Real-world marketing location intelligence company Gravy Analytics has become the exclusive human foot traffic data provider to the data marketplace. The company’s work with the marketplace illustrates just how we can refer to these data navigation ‘applications’ as tools in the big data toolbox in their own right.

“Pre-pandemic data sets and assumptions won’t fly in the post-COVID-19 world. Businesses are starting over and they’re going to need relevant data to understand where people are going post-pandemic, and how that maps to their operations, product development, supply chains and marketing,” said Jeff White, founder and CEO of Gravy Analytics. “[The marketplace’s] platform makes it easier for researchers to explore the relationship between people’s movements and multitudinous other data features, fueling new use cases and possibilities.”

Deep in the toolbox
Arguably one dip deeper into the big data toolbox is JetBrains, a company known for its programmer-centric family of Integrated Development Environments (IDEs) for various programming languages. The company’s logically named Big Data Tools is an instrument that lets data engineers and other professionals who work with data bring all their tools into a single place, and it is offered for DataGrip and PyCharm.

JetBrains reminds us that since 2012, big data has created eight million jobs in the US alone and six million more worldwide. In 2020, new job openings for data scientists and similar advanced analytical roles in the US are expected to reach 61,799. The company says that it recognizes this industry shift and is using its expertise in building developer tools to make tools available for data scientists and engineers.

Vitaly Khudobakhshov, project lead of Big Data Tools at JetBrains, explains that his company’s Big Data Tools plugin (currently an early-access program) is now available to the general public. It offers functions such as smart navigation, code completion, inspections and quick-fixes, plus refactorings. So, without going into the deeper aspects of the technology, these tools help big data professionals manipulate, manage and shape data into workable forms… much like real-world tools and spanners in many ways.

Do we need big data in the first place?
So although we can go some way to defining the toolsets, procedures and techniques that exist to help us with specific big data wrangling tasks, not everybody is sold on big data analytics as a means of attaining greater business insight.

A report in the Harvard Business Review as far back as 2013 pointed out instances where enterprises have ingested huge amounts of big data to gain ‘insight’ into how they could potentially reengineer their operations for greater profit. On more than one occasion, the big data barometer’s suggestion has required the excessive redesign of an entire supply chain to make it worthwhile. On other occasions, common sense and a gut feel for business can prove just as useful, if not more so.

Big data tools are there to use, but sometimes all business needs for reinvention is a good kick with a hefty boot.