Is distributed data realistic?

Distributed data is an old concept that has lived in white papers and PhD theses more than in the real world. I remember talking about distributed data in my database design college class back in the late ’80s, with the belief that it would likely show up the next year. It never did.

The idea has been consistent throughout the years: No matter where we store data, by using a common set of services or data management control plane, we’re able to deal with it all, no matter where it physically exists, as one logical grouping of data. This data is available at any time, to anybody, for any purpose. It’s federated, democratized, and it’s completely transparent as to how this magic occurs across clouds, edge computers, devices, and legacy systems.
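To make the "one logical grouping" idea concrete, here is a minimal Python sketch of a catalog that maps a logical data set name to several physical sources and reads them as one. The names `DataSource`, `ControlPlane`, `register`, and `query` are hypothetical and chosen only for illustration; a real control plane would also have to handle security, governance, and network failures.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Iterator, List

Record = Dict[str, object]


@dataclass
class DataSource:
    """A physical location holding part of a logical data set (hypothetical)."""
    name: str        # e.g. "store-17"
    location: str    # e.g. "edge", "public-cloud", "legacy"
    fetch: Callable[[], Iterable[Record]]  # returns the records held there


@dataclass
class ControlPlane:
    """Toy catalog mapping logical data set names to physical sources."""
    catalog: Dict[str, List[DataSource]] = field(default_factory=dict)

    def register(self, dataset: str, source: DataSource) -> None:
        # Attach one more physical source to a logical data set.
        self.catalog.setdefault(dataset, []).append(source)

    def query(self, dataset: str) -> Iterator[Record]:
        """Yield every record of a logical data set, wherever it lives."""
        for source in self.catalog.get(dataset, []):
            for record in source.fetch():
                # Tag each record with its origin so callers can audit it.
                yield {**record, "_source": source.name}


# Example data is invented for the illustration.
plane = ControlPlane()
plane.register("orders", DataSource(
    "cloud-db", "public-cloud", lambda: [{"id": 1, "total": 40.0}]))
plane.register("orders", DataSource(
    "store-17", "edge", lambda: [{"id": 2, "total": 15.5}]))

all_orders = list(plane.query("orders"))
```

The caller asks for "orders" and never needs to know that one record came from a cloud database and the other from an edge device, which is the transparency described above.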

Fast forward to 2022, and we’re talking about much the same concept as we did more than 30 years ago. What’s different is that we can now pull it off at a reasonable price. We also have emerging concepts such as cloud native, which we’re defining as a common stack where the private and public clouds are the foundation, but the foundation clouds don’t typically provide services (or data) directly to the applications or analytical tools.

A few things are driving this right now.

First, we finally have a working and reliable global network, or at least that should be the case once 5G completes its rollout.

Second, there’s interest in maintaining data on edge systems outside of the data center and cloud providers, meaning any device or server that can store and process data.

Finally, data storage has been democratized. Data administration and control are no longer the domain of a single data administrator; they belong to a group of people who own specific data sets. Those data sets are widely distributed and can be leveraged as a single data set or a federated grouping of data sets, without limitations on performance or functionality.

Of course, there is a lot of cross-coordination required to make data anywhere a reality. The biggest problem is having a functional management control plane that can keep track of the data as well as deal with governance and security. Simple things, such as changing the meaning of a data element on an edge device, could end up breaking hundreds of applications and embedded analytics processes if not managed correctly. Also, if devices or servers, cloud or not, are offline for a long period of time, then that offline data will be missing for applications and analytics that depend on it until communications are restored.
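The two failure modes just described, a changed data definition and a source that has gone silent, can be sketched as a simple health check that a control plane might run. This is an illustrative Python sketch only: `SourceStatus`, `check_sources`, the schema version number, and the silence threshold are all assumptions, not features of any particular product.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SourceStatus:
    """What a hypothetical control plane knows about one data source."""
    name: str
    schema_version: int  # version of the data definitions the source publishes
    last_seen: float     # time of last successful contact, seconds since epoch


def check_sources(
    sources: List[SourceStatus],
    expected_version: int,
    now: float,
    max_silence_seconds: float,
) -> Dict[str, List[str]]:
    """Sort sources into ok, schema_mismatch, and offline buckets."""
    report: Dict[str, List[str]] = {"ok": [], "schema_mismatch": [], "offline": []}
    for s in sources:
        if now - s.last_seen > max_silence_seconds:
            # Data on this source is unavailable until communications return.
            report["offline"].append(s.name)
        elif s.schema_version != expected_version:
            # A changed definition could break dependent applications.
            report["schema_mismatch"].append(s.name)
        else:
            report["ok"].append(s.name)
    return report


# Invented example values: "now" is fixed so the result is deterministic.
now = 1_000_000.0
sources = [
    SourceStatus("cloud-db", schema_version=3, last_seen=now - 30),
    SourceStatus("store-17", schema_version=4, last_seen=now - 60),
    SourceStatus("truck-09", schema_version=3, last_seen=now - 7200),
]
report = check_sources(sources, expected_version=3, now=now, max_silence_seconds=3600)
```

In this sketch an offline source is reported as offline even if its schema also differs, on the assumption that reachability has to be restored before anything else can be verified.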

You really need to use your head. Just because you can store data anywhere and leverage it as if it’s centralized does not mean you should. There are some gotchas, such as network and management control plane failures that can cost you downtime. Also, although we’re still figuring out costs, distributed data does seem somewhat more expensive to deploy and operate over the long term than more traditional, centralized approaches.

Despite all this, you should still consider distributed data. Indeed, it has many pragmatic applications that businesses can exploit to drive innovation and growth. For example, enhancing the customer experience by driving more control of the data down to the customer’s systems is one opportunity; there are hundreds of others.

So, take a look at distributed data or data anywhere in 2022. As always, look for pragmatic use cases to keep your company out of trouble.

Is distributed data realistic? | InfoWorld
