Looking for Value in Big Data: The Metaphor of the Anchovy Net, or Inefficient Fishing
Imagine a fishing boat captain who goes to sea to fish for tuna or other large species. The ship was built and equipped for that purpose, and to make the trip profitable the captain must ensure several things: suitable rigs and equipment (sonar, nets with a mesh appropriate to the target fish, among others); maps showing the precise location of the fish stocks and the route to follow; fishing permits; and other matters.
When they reach the fishing grounds, he discovers that
someone has brought anchovy-mesh nets on board. At first sight these nets
look better because they also catch small tuna and minor species, but
they create problems for several reasons. They can catch fish below the
permitted size, upsetting the balance of the fish population, and they can
catch smaller species for which the ship has neither equipment nor permits.
The crew then faces the unnecessary task of sorting the tuna and throwing the
other species back into the sea, unless the vessel is a predatory factory
ship where everything becomes fishmeal.
The result is wasted time, wasted resources and the risk of
having the ship confiscated, heavy fines or suspended permits when honest
maritime authorities find out. Let us assume a utopian world with no
possibility of kickbacks and no markets for illegal catches. The final effect
is wasted time, wasted resources and possibly a return to port with less
fish than expected, a loss of value in the fishery; all of these negative
effects arise from using the wrong fishing gear.
The story above lets us apply the metaphor of the wrong net to
the world of business, where the fish are the data available to companies
and the nets are the tools used to capture them. In today's world, where data
are captured from different sources, in different formats, at all times and
through different media, organizations should be able to find, select, filter
and process data and turn them into information useful for making the right
decisions, decisions that create value for the company and its customers:
giving customers what they want, selling, and earning an adequate return.
That is, they must have at hand, and use wisely, the correct techniques and
procedures for extracting value from data.
The data live in the "virtual open sea" of Big
Data, a huge stage which, unlike the physical ocean, whose size is fixed,
grows continuously. Big Data is characterized by the three classic Vs (volume,
variety, velocity) and a fourth that can be decisive, veracity. Volume
refers to the vast amount of data available in the world and constantly
being created; variety, to the different formats in which data come (text,
video, audio, images, etc.); velocity, to the speed at which large amounts of
data are added to the available stock. Veracity is a vital quality for making
critical decisions: if the data are false, the information is spurious, any
decision based on it is wrong, and losses and other problems follow.
Davenport and Dyché (2013: 3) indicate that new
information technologies such as Big Data can bring dramatic cost reductions,
substantial improvements in data processing times, or the creation of new
products and services. These technologies, and the concepts behind them, make
it possible to achieve a variety of objectives that influence an
organization's financial results, processes and quality management.
Regarding cost, technologies such as Hadoop clusters can bring the cost of storing 1 terabyte (one thousand gigabytes) down from $37,000 on a typical relational database to $5,000 on a database appliance and only $2,000 on a Hadoop cluster.
Davenport and Dyché (2013: 5) also consider time reduction to be the second common goal of businesses adopting Big Data technology. For example, the retailer Macy's reduced the time needed to optimize pricing for 73 million items from 27 hours to just one hour. This "big data analytics" capability allows the chain to reprice more often and adapt better to changing conditions in the retail market.
Some analysts say that mankind created five exabytes (i.e., 5 billion gigabytes) of data from the Stone Age until 2003; in 2011 that amount was created in just two days, and in 2013 it took only 10 minutes (van der Aalst, 2014: 15). Note that in US usage a billion is a thousand million.
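The figures quoted above can be checked with a little arithmetic. The sketch below is only an illustration: it takes the numbers exactly as cited in the text, assumes decimal (SI) units, and uses variable names invented for this example.

```python
# Figures as quoted in the text; decimal (SI) units are an assumption.
GB = 10**9   # bytes in one gigabyte
EB = 10**18  # bytes in one exabyte

# Storage cost per terabyte in US dollars, as cited from Davenport and Dyche.
cost_relational = 37_000
cost_hadoop = 2_000
cost_ratio = cost_relational / cost_hadoop

# Macy's pricing optimization: hours before and after.
macys_speedup = 27 / 1

# Five exabytes expressed in gigabytes.
five_eb_in_gb = 5 * EB // GB

# Time needed to create five exabytes: two days in 2011, ten minutes in 2013.
minutes_2011 = 2 * 24 * 60
minutes_2013 = 10
acceleration = minutes_2011 / minutes_2013

print(f"Hadoop storage is {cost_ratio:.1f} times cheaper than the relational database")
print(f"Macy's pricing run is {macys_speedup:.0f} times faster")
print(f"5 exabytes = {five_eb_in_gb:,} gigabytes")
print(f"Data creation in 2013 was {acceleration:.0f} times faster than in 2011")
```

On these quoted numbers, storage on Hadoop costs about one eighteenth of the relational option, and the rate of data creation rose by a factor of several hundred in two years.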
For this reason a new concept, a new metaphor, has been
coined to describe the immensity of the available data: the "data lake",
a large mass of data that exists in its natural, unprocessed state. The
central challenge is how to store, process and use this massive amount of
data efficiently. Companies such as Google and Facebook have technologies
for taking advantage of the data lake, but these are still at an early
stage. Because the "data lake" is a recent concept, so are the relevant
technologies, but a new way to manage this wealth effectively is certainly
needed.
What do we want to show with this background? In a business,
staff in the computing or Information Technology area may, through ignorance,
apathy or convenience, resort to the equivalent of anchovy nets. These
people may know programming languages, algorithms and the protocols for using
software and equipment, but they are often unaware of the essence of the
business, so someone must take responsibility for that part. In general, when
information is requested from the available data, mathematical models are
often applied at random, by "trial and error", in search of the model that
best fits the data. The fit may look good because the basic statistical
properties (mean, standard deviation, correlation) are acceptable, yet it can
be misleading, as Anscombe's quartet shows (four data sets with the same
statistical properties but totally different scatter plots).
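The Anscombe effect is easy to reproduce. The sketch below uses the four published data sets of Anscombe's quartet and computes the basic statistics mentioned above with the Python standard library. The helper names (pearson, summarize) are invented for this illustration and are not part of any library.

```python
from math import sqrt
from statistics import mean, stdev

# Anscombe's quartet: four (x, y) data sets with near-identical summary statistics.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
ANSCOMBE = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient, computed from its definition."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def summarize(xs, ys):
    """Return the basic statistics that make the four sets look alike."""
    return {
        "mean_x": mean(xs),
        "mean_y": mean(ys),
        "sd_x": stdev(xs),
        "sd_y": stdev(ys),
        "r": pearson(xs, ys),
    }


for name, (xs, ys) in ANSCOMBE.items():
    stats = summarize(xs, ys)
    line = ", ".join(f"{key}={value:.2f}" for key, value in stats.items())
    print(f"{name:>3}: {line}")
```

Rounded to two decimals, the four rows come out the same, yet a scatter plot of each set shows a roughly linear cloud, a curve, a line with one outlier, and a vertical stack with one extreme point. A model chosen only because these statistics "look acceptable" could be badly wrong for three of the four sets.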
If the systems analyst is lucky, the model will fit the
data in every respect, and the end user will have adequate information to
make correct decisions. One could say the analyst is a lucky person, except
that luck does not last and nobody wins the lottery twice (Note 1). The
troubling part is that the analyst did not obtain these results by acting as
a competent professional. If the information is incorrect, it can lead the
user into mistakes. Anchovy nets may catch tuna now and then, but using them
is inefficient and ineffective; it carries real and hidden costs that can and
should be avoided because they cancel out the advantages and benefits of Big
Data.
Who is the right person to prevent, correct and penalize
this guesswork and fortune-telling? The general manager, the CEO, the
entrepreneur, or other trained people who know the business and have plenty
of common sense. These people do not need to know programming or all the
secrets of the world of computers, but they must know enough to guide the
search and obtain the benefits that, as we saw earlier, Big Data offers
(high speed, low cost, variety).
In other words, the CEO should be able to prevent the use
of anchovy mesh, knowing that the goal is to fish for tuna or other large
species; he must identify who is using it and propose corrective measures. He
is the captain and must manage the ship, the crew and the processes so that
the work plan is fulfilled.
The CEO's involvement in preventing missteps such as the use of
inadequate nets (the arbitrary and random use of models to fish for whatever
turns up) avoids drawbacks such as:
i) Lack of professionalism among systems analysts
ii) The possibility of falling into the trap of Anscombe's quartet
iii) Wasted resources (time, equipment, man-hours, money paid for data that yield no results). Big Data is not free.
iv) Delays in generating information, a critical factor in these times of acceleration
v) Difficulty for the CEO in gaining a fuller insight into the data relevant to the organization
vi) Inefficient use of the technologies associated with Big Data
vii) Failure to build a bank of ideas, identify useful models, and discover new relationships in the data that answer new business questions
In conclusion, you are the captain of a tuna boat. The open
sea is your destination and you can fish as much as you want, but do it with
the appropriate gear. Keep your crew from making mistakes, because the result
will be disastrous; do not let anchovy-mesh nets be smuggled aboard.
Note 1. The policy of lucky decisions is still subject to the basic rule of
computing, GIGO (garbage in, garbage out): when useless data are fed into a
process, the information it generates is useless as well.
References
Davenport, Thomas H. and Dyché, Jill (2013). Big Data in Big Companies. May 2013, International Institute for Analytics.
van der Aalst, Wil M. P. (2014). Data Scientist: The Engineer of the Future.
Tableau (2015). Top 7 Trends in Big Data for 2015.