The Internet of Things, the infelicitously named multitude of connected smart devices that will soon number in the billions, instigates something of a tug of war where data storage and processing are concerned. A huge benefit of the IoT will come from aggregating the data it produces, which is likely to be stored in data lakes. It doesn't especially matter where those data lakes are located, because their purpose is to aid long-term decision-making. This is the essence of big data.
But also among the IoT's benefits is real-time adaptation to current conditions. That requires an entirely different model for handling data, one in which latencies caused by geographic distance do matter.
Effective cloud storage strategies for IoT data require several tiers of storage. These start at the edge of IoT networks, more than likely in the devices themselves or close by, and end in huge data lakes that can be largely location-independent.
The edge data has been referred to as small data. It's the limited and fairly well-structured data sets generated by one device or a handful of closely related devices, and it's used by those devices to react in real time. An obvious example would be in-car sensors that adjust engine functioning from moment to moment. The data used to do that must, of course, be stored within the vehicle itself.
But lots of small data adds up to big data, which is useful for bigger-picture decision-making: how engine designs can be tweaked to maximize performance and efficiency, or how people are actually using their cars. Big data of this sort can also be combined with other data streams to provide wider insights. Combined with geographical mapping data, for instance, engine data can be used for urban planning, for logistics, and for a host of other applications.
It's not enough to have just two tiers of data storage, though. While accurate real-time response requires that data be kept as close to the devices as possible, locally and regionally aggregated data is also of essential value. It's wasteful to ship this data back to a central data lake, and much of its advantage is lost unless it is collated and analysed close to the source and fed back to Internet-connected devices in near real time. Continuing with our car example, engine and travel sensor data aggregated from multiple vehicles can be used to plot traffic flows, and the results returned to in-car navigation systems for route planning. If and when self-driving cars become a reality, this is the data that will be used to maximize the efficiency of traffic flows.
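As a rough illustration of the regional tier in the car example, the Python sketch below collates per-vehicle speed readings into compact per-road-segment summaries that could be fed back to navigation systems instead of the raw readings. The `SpeedReading` record, the `aggregate_traffic` function, the segment names, and the 30 km/h congestion threshold are all hypothetical assumptions for illustration, not part of any real platform.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class SpeedReading:
    """Hypothetical edge-tier record: one vehicle reporting its speed on a road segment."""
    vehicle_id: str
    segment_id: str
    speed_kmh: float


def aggregate_traffic(readings, congestion_threshold_kmh=30.0):
    """Hypothetical regional-tier step: summarize readings from many vehicles per segment.

    Returns {segment_id: (mean_speed_kmh, distinct_vehicle_count, congested)}.
    The threshold is an illustrative assumption, not a standard value.
    """
    speeds_by_segment = defaultdict(list)
    vehicles_by_segment = defaultdict(set)
    for r in readings:
        speeds_by_segment[r.segment_id].append(r.speed_kmh)
        vehicles_by_segment[r.segment_id].add(r.vehicle_id)

    summary = {}
    for segment, speeds in speeds_by_segment.items():
        avg = mean(speeds)
        summary[segment] = (avg, len(vehicles_by_segment[segment]), avg < congestion_threshold_kmh)
    return summary


# Usage: readings from three hypothetical cars on two road segments.
readings = [
    SpeedReading("car-1", "A12-north", 22.0),
    SpeedReading("car-2", "A12-north", 18.0),
    SpeedReading("car-3", "ring-road", 64.0),
]
print(aggregate_traffic(readings))
# {'A12-north': (20.0, 2, True), 'ring-road': (64.0, 1, False)}
```

Only the small summary dictionary would need to travel back to vehicles, while the raw readings could be forwarded to the central data lake later for long-term analysis.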
From an IT infrastructure management perspective, running multiple local infrastructure deployments is complex, especially when it involves services from different vendors, each of which offers its own control interfaces, APIs, and billing systems, and which may be located in different countries. Forming the federated multi-cloud environment best suited to the Internet of Things is therefore both complex and expensive.
Or at least it would be without cloud marketplaces and integration layers that tie together dozens of different platforms across the world and provide a unified control and billing interface. Given the divergent infrastructure requirements of the Internet of Things and the Industrial Internet, and the inherent complexity of managing the required federated cloud environments, it's almost certain that cloud integrators will be the backbone that helps the Internet of Things fulfil its promise.

