Sunday, April 10, 2016

Fishing for Whatever in Big Data: The Anchovy-Net Metaphor



Looking for Value in Big Data: The Metaphor of the Anchovy Net, or Inefficient Fishing
 

Imagine a fishing boat captain who goes to sea to fish for tuna or other large species. The ship was built and equipped for tuna or similar species, and to make the trip profitable the captain must ensure several things: that the appropriate fishing gear and equipment are on board (sonar, nets with the right mesh for the target fish, among others); that there are maps showing the precise location of the fish stocks and the route to follow; and that fishing permits and other matters are in order.

When the ship reaches the fishing grounds, the captain discovers that someone has brought nets with anchovy mesh on board. At first sight these nets seem better, because they also catch small tuna and minor species, but they create problems for several reasons. They can catch fish below the permitted size, creating an imbalance in the wild population, and they can catch smaller species for which the ship has neither equipment nor permits. The crew then faces an unnecessary task: sorting out the tuna and throwing the other species back into the sea, unless the ship is a predatory factory vessel where everything is turned into fishmeal.

The result is wasted time, wasted resources, and the risk of confiscation of the ship, heavy fines, or suspension of permits when honest maritime authorities discover the catch. Let us assume a utopian world in which there are no kickbacks and no markets for illegal catch. The final effect is still wasted time, wasted resources, and possibly a return to port with fewer fish than expected, a loss of value in the fishery. All of these negative effects arise from using the wrong fishing gear.



The story above lets us apply the metaphor of the wrong net to the business world, where the fish are the data available to a company's nets. In today's world, where data are captured from different sources, in different formats, at all times, and through different media, organizations must be able to find, select, filter, and process data and turn them into information that is useful for making the right decisions: decisions that create value for the company and its customers, give customers what they want, and let the company sell and earn an adequate return. That is, they must have at hand, and use wisely, the correct techniques and procedures for extracting value from data.

The data live in the "virtual open sea" of Big Data, a huge space that, unlike the physical ocean, which is fixed in size, grows continuously. Big Data is characterized by the three classic Vs (volume, variety, velocity) and a fourth that can be decisive: veracity. Volume means the vast amount of data available in the world and constantly being created; variety refers to the different formats the data come in (text, video, audio, images, etc.); velocity refers to the speed at which new data are added to the available stock. Veracity is a vital quality when making critical decisions. If the data are false, the information derived from them is spurious, and any decision based on it is likely to be wrong and to cause losses and other problems.

Davenport and Dyche (2013: 3) indicate that new information technologies such as Big Data can generate dramatic cost reductions, substantial improvements in data processing times, or the creation of a new product or service. These technologies, and the concepts behind them, make it possible to achieve a variety of objectives that influence an organization's financial results, processes, and quality management.

On cost, technologies such as Hadoop clusters can bring the cost of storing 1 terabyte (one thousand gigabytes) down from $37,000 in a typical relational database to $5,000 in a database appliance and only $2,000 on a Hadoop cluster.

Davenport and Dyche (2013: 5) also consider that the second common goal of businesses using Big Data technology is time reduction. For example, the retailer Macy's reduced the time needed to optimize the pricing of 73 million items from 27 hours to just one hour. This "big data analytics" capability lets the chain reprice its items more often and adapt better to changing conditions in the retail market.

Some analysts say that mankind created five exabytes (i.e., 5 billion gigabytes) of data from the Stone Age until 2003; in 2011 that amount was created in just two days, and in 2013 it took only 10 minutes (van der Aalst, 2014: 15). Remember that in US usage, a billion is a thousand million.
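The figures quoted above can be checked with a little arithmetic. The sketch below is only an illustration, assuming decimal (SI) units, where 1 exabyte equals 10^9 gigabytes; the variable names are made up for this example and the inputs are the numbers cited in the text, not new measurements.

```python
# Assumption: decimal (SI) units, 1 exabyte = 10**9 gigabytes.
GB_PER_EB = 10**9
MINUTES_PER_DAY = 24 * 60

volume_eb = 5                          # "five exabytes", as quoted in the text
volume_gb = volume_eb * GB_PER_EB      # the same volume expressed in gigabytes

# 2011: the same volume was created in two days.
rate_2011_eb_per_day = volume_eb / 2

# 2013: the same volume was created every 10 minutes.
ten_minute_periods_per_day = MINUTES_PER_DAY // 10
rate_2013_eb_per_day = volume_eb * ten_minute_periods_per_day

# How many times faster data were being created in 2013 than in 2011.
speedup = rate_2013_eb_per_day / rate_2011_eb_per_day

print(f"5 EB = {volume_gb:,} GB")
print(f"2011: {rate_2011_eb_per_day} EB per day")
print(f"2013: {rate_2013_eb_per_day} EB per day")
print(f"Growth factor between 2011 and 2013: {speedup:.0f}x")
```

Taken at face value, the quoted figures imply that the rate of data creation grew by more than two orders of magnitude in two years.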

For this reason a new concept, a new metaphor, has been created to describe the immensity of available data: the "data lake", a large mass of data that exists in its natural, unprocessed state. The central challenge is how to store, process, and use this massive amount of data efficiently. Companies such as Google and Facebook have technologies that let them take advantage of the data lake, but these are still at an early stage. Since the data lake is a recent concept, so are the relevant technologies, but a new way to manage this wealth effectively is certainly needed.



What do we want to show with this background? In a business, the staff of the computing or Information Technology area may, through ignorance, apathy, or convenience, opt for actions equivalent to using anchovy nets. These people may know programming languages, algorithms, and the protocols for using the software and equipment, but they are often unaware of the essence of the business, so someone else must take responsibility for that part. In general, when information is requested from the available data, mathematical models are often applied at random, by trial and error, in search of the model that best fits the data. The fit may look good because the basic statistics (mean, standard deviation, correlation) are acceptable, yet it can be misleading, as Anscombe's quartet shows: four data sets with nearly identical summary statistics but totally different scatter plots.
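The point about Anscombe's quartet can be reproduced in a few lines. The following is a minimal sketch using only the Python standard library; the data are the values from Anscombe's published quartet, while the helper names `pearson` and `summarize` are chosen here purely for illustration.

```python
from math import sqrt
from statistics import mean, stdev

# Anscombe's quartet: sets I-III share the same x values.
X_COMMON = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X_COMMON,
          [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_COMMON,
           [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_COMMON,
            [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def summarize(xs, ys):
    """Basic statistics an analyst might check before trusting a fit."""
    return {
        "mean_x": mean(xs),
        "mean_y": mean(ys),
        "sd_y": stdev(ys),      # sample standard deviation
        "r": pearson(xs, ys),
    }


SUMMARIES = {name: summarize(xs, ys) for name, (xs, ys) in QUARTET.items()}

for name, s in SUMMARIES.items():
    print(f"Set {name}: mean_x={s['mean_x']:.2f} mean_y={s['mean_y']:.2f} "
          f"sd_y={s['sd_y']:.2f} r={s['r']:.3f}")
```

All four sets report practically the same means, standard deviation, and correlation, yet when plotted they look nothing alike: one is roughly linear, one is a smooth curve, one is a straight line with a single outlier, and one is a vertical cluster with one distant point. An analyst who checks only the summary numbers cannot tell them apart.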

If the systems analyst is lucky, the model will fit the data in every respect, and the end user will have adequate information to make correct decisions. One could simply call the analyst a lucky person, except that luck is not permanent and nobody wins the lottery twice (Note 1). The questionable aspect is that the analyst did not obtain these results by acting as a competent professional. If the information is incorrect, it can lead the user into mistakes. Using anchovy nets will sometimes catch tuna, but the practice is inefficient and ineffective; it carries real and hidden costs that can and should be avoided, because they cancel out the advantages and benefits of Big Data.

Who is the appropriate person to prevent, correct, and penalize this fortune-telling and guesswork? It is the General Manager, the CEO, the entrepreneur, or other trained people who know the business and have a great deal of common sense. These people do not need to know programming or all the secrets of the computing world, but they must have the knowledge needed to guide the search and achieve the benefits that, as we saw earlier, Big Data offers (high speed, low cost, variety).

In other words, the CEO should be able to prevent the use of anchovy mesh, knowing that the goal is to fish for tuna or other large species; the CEO must identify who is using it and propose corrective measures. The CEO is the captain and must manage the ship, the crew, and the processes so that the work plan is fulfilled.

The CEO's involvement in preventing missteps such as the use of inadequate nets (the arbitrary, random use of models to fish for whatever turns up) overcomes drawbacks such as:

i) Lack of professionalism among systems analysts and operators
ii) The possibility of falling into the trap of Anscombe's quartet
iii) Wasted resources (time, equipment, man-hours, money paid for data that yield no results). Big Data is not free.
iv) Delays in generating information, a critical factor in these times of acceleration
v) The CEO's difficulty in obtaining a fuller perception, an insight, from the data relevant to the organization
vi) Inefficient use of the technologies associated with Big Data
vii) Failure to build a bank of ideas, identify useful models, and discover new relationships between data that answer new questions about the business

In conclusion, you are the captain of a tuna boat. The open sea is your destination and you can always fish for what you want, but do it with the appropriate fishing gear. Keep your crew from making mistakes, because the result will be disastrous, and do not let anyone smuggle nets with anchovy mesh aboard.

Note 1. A policy of lucky decisions is still subject to the basic rule of computing, GIGO ("garbage in, garbage out"): when useless data are fed into a process, the information it generates will be useless as well.

References

Thomas H. Davenport, Jill Dyche (2013) Big Data in Big Companies
May 2013, International Institute for Analytics

Wil M. P. van der Aalst (2014) Data Scientist: The Engineer of the Future

Tableau (2015) Top 7 Trends in Big Data for 2015