Image by David Schwarzenberg from Pixabay

And why we need data management, data literacy and data analytics

Data has become such a common word that many of us have probably never thought about its exact definition. What first comes to mind is most likely a spreadsheet, a table, or a chart made up of numbers and labels. When everyone talks about big data, the idea becomes even more abstract: an enormous number of bytes flowing through devices and servers that require programs to decipher them. While data can be understood by machines, it loses most of its meaning to humans once it is stored in a file or table. We rely on other people, documentation, data architecture, and data flows to restore the full sense of a piece of data as it relates to the real world. We often compare data to oil or land, waiting for people to discover and realize its value. However, as data is collected and processed, the most useful contextual information is often lost, making the data harder to discover and leverage further. …


Image by Qimono from Pixabay (CC0)

Back in 1958, Hans Peter Luhn, a researcher at IBM, introduced the concept of Business Intelligence (BI), using the definition from Webster’s Dictionary: to apprehend the interrelationships of presented facts in such a way as to guide action towards a desired goal. Given this definition, Business Intelligence is a vision. It should not be equated with the tools or technologies available at any given time. In other words, it should be viewed as a company’s strategic vision for transforming data assets into business insights that support data-driven decisions.

Historically, two eras revolutionized and popularized the concept of Business Intelligence. The first was the 1980s, when the relational database came of age and became the mainstream tool for data collection and storage. Databases, along with SQL, enabled businesses to access information more quickly and to make decisions based on current facts and historical trends. The second era was the 1990s, when the data warehouse was born. A data warehouse gathered data from various relational database systems, then transformed and aggregated it for BI tools to consume, which led to a jump in the accessibility of large amounts of information. As a result, the data warehouse stimulated new technologies that made business users’ lives simpler by giving them quicker access to more information with better visualizations. …


Photo by xdfolio via pixabay (CC0)

Data management, including metadata management, data governance, and master data management, has been advocated since the beginning of the data warehousing era in the 1980s. It has, however, been hard to implement or enforce. An enterprise can still get by on its data warehousing projects without it. However, project and organizational silos then introduce data duplication, inefficiencies, and the lack of a clear source of truth. These problems have been a headache for many organizations, particularly those that own a lot of data spread across multiple systems.

The recent rise of data privacy issues, followed by several regulations put in place, is a wake-up call for every organization to prioritize disciplined data management. Such discipline applies not only to existing data assets; it also ensures compliance as new data is acquired, collected, processed, used, and stored. …


Photo via Pixabay

In the new era of Big Data and Data Science, it is vitally important for an enterprise to have a centralized data architecture that is aligned with business processes, scales with business growth, and evolves with technological advancements. A successful data architecture provides clarity about every aspect of the data, which enables data scientists to work efficiently with trustworthy data and to solve complex business problems. It also prepares an organization to take advantage of new business opportunities quickly by leveraging emerging technologies, and it improves operational efficiency by managing complex data and information delivery throughout the enterprise.

Compared with information architecture, system architecture, and software architecture, data architecture is relatively new. The role of the Data Architect has also been nebulous and has often fallen on the shoulders of senior business analysts, ETL developers, and data scientists. Nonetheless, I will use Data Architect to refer to those data management professionals who design data architecture for an organization. …



What NoSQL databases can do while a Relational database cannot

NoSQL databases are increasingly popular in modern data architecture. They have become a powerful way to store data in specialized formats that yield fast performance for large amounts of data. Many NoSQL databases are available on the market, and new ones are still emerging. The most popular categorization consists of four types: Wide Column, Document, Key-Value Pairs, and Graph. A few popular NoSQL databases of each type are listed below:

  • Wide Column: Cassandra, HBase, AWS DynamoDB
  • Document: Couchbase, MongoDB, Azure Cosmos DB, AWS DynamoDB
  • Graph: Neo4j, Azure Cosmos DB, TigerGraph, AWS Neptune
  • Key-Value Pairs: AWS DynamoDB, Redis, Oracle…
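
To make the four categories a little more concrete, here is a minimal sketch that models the same customer data in each shape using plain Python structures. It does not use any of the databases listed above, and every name and value in it (the customer, the products, the helper function) is hypothetical and chosen only for illustration.

```python
import json

# Illustrative only: plain Python structures standing in for each NoSQL data model.
# All keys, names, and values below are hypothetical sample data.

# Key-value: an opaque value looked up by a single key.
key_value = {"customer:42": json.dumps({"name": "Ada", "city": "Paris"})}

# Document: a nested, self-describing record stored and retrieved as one unit.
document = {
    "_id": 42,
    "name": "Ada",
    "orders": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}],
}

# Wide column: row key -> column family -> columns.
# Different rows may carry different sets of columns.
wide_column = {
    "customer:42": {
        "profile": {"name": "Ada", "city": "Paris"},
        "orders": {"2020-01-03": "A1 x2", "2020-02-11": "B7 x1"},
    },
    "customer:43": {"profile": {"name": "Bo"}},
}

# Graph: nodes plus typed edges; queries walk the relationships.
nodes = {
    42: {"label": "Customer", "name": "Ada"},
    "A1": {"label": "Product"},
    "B7": {"label": "Product"},
}
edges = [(42, "BOUGHT", "A1"), (42, "BOUGHT", "B7")]


def products_bought(customer_id):
    """Follow BOUGHT edges from one customer node to product nodes."""
    return [dst for src, rel, dst in edges if src == customer_id and rel == "BOUGHT"]
```

The point of the sketch is the access pattern each shape favors: a single lookup by key, a whole nested record in one read, sparse columns grouped under a row key, or traversal along relationships.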



With rapid advances in AI and data science, data has become an essential asset to every enterprise. Setting up a data strategy has therefore become every enterprise’s mission, particularly in the C-suite and at executive levels. What is a data strategy, and how do we create the right one? I would like to dedicate this article to answering these two questions.

Before discussing data strategy, we need to understand what a strategy is. Using a simplified definition, a strategy is a thoughtful plan focused on changing the current state in order to reach a vision for the future. In other words, the right strategy needs to start with a vision, and the strategy is a way of making a series of changes, usually requiring innovation and out-of-the-box thinking, to achieve that vision. Every enterprise must have a business vision and a business strategy in place before having a data strategy. A data strategy should go hand in hand with the business strategy and serve to realize the business vision. On the other hand, data lives with technology, while providing value to businesses and customers. …



The evolution of Big Data technologies over the last 20 years has been a history of battles with growing data volume. The challenge of big data has not been solved yet, and the effort will certainly continue as data volume keeps growing in the coming years. The original relational database management system (RDBMS) and the associated OLTP (Online Transaction Processing) make it easy to work with data using SQL in all aspects, as long as the data size is small enough to manage. …
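
As a small illustration of why SQL on a relational OLTP system is so convenient at modest data sizes, the sketch below uses Python's built-in sqlite3 module with an in-memory database. The account table, the transfer helper, and the amounts are hypothetical and serve only to show a transactional update followed by a query.

```python
import sqlite3

# Hypothetical schema: a tiny account table with a non-negative balance rule.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money between two accounts atomically; return True on success."""
    try:
        # The connection context manager commits if the block succeeds
        # and rolls back every statement in it if an exception is raised.
        with conn:
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit above was undone.
        return False


ok = transfer(conn, 1, 2, 30)       # within the available balance
failed = transfer(conn, 1, 2, 500)  # would overdraw account 1
balances = dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))
```

The rejected transfer leaves both rows exactly as they were after the first one, which is the all-or-nothing behavior that OLTP workloads depend on.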


The practice of design patterns is most popular in Object-Oriented Programming (OOP), and it is effectively explained and summarized in the classic book “Design Patterns: Elements of Reusable Object-Oriented Software” by Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides. Below is the definition of a design pattern from Wikipedia:

“A software design pattern is a general, reusable solution to a commonly occurring problem within a given context in software design. It is not a finished design that can be transformed directly into source or machine code. It is a description or template for how to solve a problem that can be used in many different situations. …



Several years ago, I met a senior director from a large company. He mentioned that the company he worked for was facing data quality issues that eroded customer satisfaction, and that he had spent months investigating the potential causes and how to fix them. “What have you found?” I asked eagerly. “It is a tough issue. I did not find a single cause. On the contrary, many things went wrong,” he replied. He then started citing a long list of what contributed to the data quality issues. Almost every department in the company was involved, and it was hard for him to decide where to begin. …


I started my career as an Oracle database developer and administrator back in 1998. Over the past 20+ years, it has been amazing to see how IT has evolved to handle the ever-growing amount of data, via technologies including relational OLTP (Online Transaction Processing) databases, data warehouses, ETL (Extraction, Transformation, and Loading), OLAP (Online Analytical Processing) reporting, big data, and now AI, Cloud, and IoT. All these technologies were enabled by the rapid growth in computational power, particularly in terms of processors, memory, storage, and networking speed. …

About

Stephanie Shen

Data and Technology Executive, #BigData #ML #Analytics #DataGovernance, also love photography and travel.
