Data Operations is the infrastructure for working with data, together with the teams that keep it running.
Why does Data Operations matter?
To work with data securely and efficiently, companies need reliable infrastructure that provides access to data and the tools to work with it. Dedicated infrastructure simplifies access to and control over data, which speeds up the delivery of value. It also ensures reliable, predictable data flows for data engineers, data scientists, and business users.
Established practices to ensure reliable and controlled use of data within the organization, including deploying, accessing, testing, logging, monitoring, and recovering data.
My organization has staff who manage and provision infrastructure through machine-readable definition files.
The goal of Infrastructure as Code (IaC) is to make configuration management and deployment as efficient and repeatable as possible by removing manual processes. Because infrastructure is defined in code, teams can provision and change it faster and more consistently, without waiting for an IT administrator to finish one manual task before starting the next.
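To make the idea concrete, here is a minimal sketch of the reconciliation step at the heart of many IaC workflows: read a machine-readable definition of the desired infrastructure, compare it with what is currently deployed, and compute a plan of changes. The JSON layout, resource names, and the `plan` function are hypothetical illustrations, not the file format or API of any specific tool.

```python
import json

# Hypothetical definition file: the desired state of each resource, keyed by name.
DEFINITION = """
{
  "resources": {
    "raw-bucket": {"type": "bucket", "region": "us-east-1"},
    "etl-worker": {"type": "vm", "size": "medium"},
    "analytics-db": {"type": "postgres", "version": "15"}
  }
}
"""


def plan(desired: dict, current: dict) -> list:
    """Return (action, name) pairs that would bring `current` in line with `desired`."""
    actions = []
    for name, spec in desired.items():
        if name not in current:
            actions.append(("create", name))
        elif current[name] != spec:
            actions.append(("update", name))
    for name in current:
        if name not in desired:
            actions.append(("delete", name))
    # Sort by resource name so the plan is deterministic and easy to review.
    return sorted(actions, key=lambda action: action[1])


desired = json.loads(DEFINITION)["resources"]
# Hypothetical snapshot of what is currently deployed.
current = {
    "raw-bucket": {"type": "bucket", "region": "us-east-1"},
    "etl-worker": {"type": "vm", "size": "small"},
    "old-cache": {"type": "redis"},
}

for action, name in plan(desired, current):
    print(action, name)
# create analytics-db
# update etl-worker
# delete old-cache
```

Running `plan` against a state that already matches the definition returns an empty list, which is what makes repeated deployments safe to rerun.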
My organization provisions logging, monitoring, and alerting frameworks to continuously measure data processes (e.g., pipelines, storage, and APIs).
Data operations teams that lack logging, monitoring, and alerting are significantly more reactive. Teams that have these capabilities in place can pinpoint problems and then identify and implement long-term solutions.
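A minimal sketch, assuming a pipeline whose steps each return a list of rows, of how logging, monitoring, and alerting can wrap a data process using Python's standard `logging` module. The `Monitor` and `Thresholds` classes and their threshold values are hypothetical; in practice the alerts would be routed to an on-call or chat tool rather than kept in a list.

```python
import logging
import time
from dataclasses import dataclass, field

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("pipeline")


@dataclass
class Thresholds:
    # Hypothetical limits; real values come from each pipeline's service levels.
    min_rows: int = 1
    max_seconds: float = 60.0


@dataclass
class Monitor:
    thresholds: Thresholds
    alerts: list = field(default_factory=list)

    def run_step(self, name, func):
        """Run one step, log its row count and duration, and record any alert."""
        start = time.perf_counter()
        try:
            rows = func()
        except Exception as exc:
            log.exception("step=%s failed", name)
            self.alerts.append(f"{name}: failed with {exc!r}")
            return None
        elapsed = time.perf_counter() - start
        log.info("step=%s rows=%d seconds=%.3f", name, len(rows), elapsed)
        if len(rows) < self.thresholds.min_rows:
            self.alerts.append(f"{name}: {len(rows)} rows, expected at least {self.thresholds.min_rows}")
        if elapsed > self.thresholds.max_seconds:
            self.alerts.append(f"{name}: took {elapsed:.1f}s, limit {self.thresholds.max_seconds}s")
        return rows


monitor = Monitor(Thresholds(min_rows=1, max_seconds=5.0))
monitor.run_step("extract_orders", lambda: [{"id": 1}, {"id": 2}])
monitor.run_step("extract_refunds", lambda: [])  # empty result: should alert
monitor.run_step("extract_users", lambda: 1 / 0)  # failure: should alert
for alert in monitor.alerts:
    log.warning("ALERT %s", alert)
```

The logged metrics support later root-cause analysis, while the alerts flag problems as they happen instead of after a downstream user notices.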
My organization manages high availability by overseeing and regularly testing a backup and recovery architecture.
A regularly tested backup and recovery system, backed by supporting policies, is critical to robust data operations.
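Testing recovery means actually restoring a backup and checking it, not just confirming that a backup file exists. Below is a minimal sketch using Python's built-in `sqlite3` module, with SQLite standing in for a production database (an assumption made only to keep the example self-contained): copy the database with SQLite's online backup API, then open the copy and verify its integrity and row counts. The function names `backup` and `verify_restore` are illustrative.

```python
import os
import sqlite3
import tempfile


def backup(source, target_path):
    """Copy the whole source database to target_path with SQLite's online backup API."""
    target = sqlite3.connect(target_path)
    try:
        source.backup(target)
    finally:
        target.close()


def verify_restore(source, backup_path, tables):
    """Restore test: open the backup, run an integrity check, and compare row counts."""
    restored = sqlite3.connect(backup_path)
    try:
        if restored.execute("PRAGMA integrity_check").fetchone()[0] != "ok":
            return False
        for table in tables:
            # Table names come from a trusted, hard-coded list, not from user input.
            query = f"SELECT COUNT(*) FROM {table}"
            if source.execute(query).fetchone()[0] != restored.execute(query).fetchone()[0]:
                return False
        return True
    finally:
        restored.close()


# Hypothetical source database with a little sample data.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
src.executemany("INSERT INTO orders (amount) VALUES (?)", [(9.5,), (12.0,), (3.25,)])
src.commit()

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "orders-backup.db")
    backup(src, path)
    restore_ok = verify_restore(src, path, ["orders"])
print(restore_ok)  # True
```

Running a check like this on a schedule, rather than only after an incident, is what turns a backup into a tested recovery path.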
Deploying and managing infrastructure that supports exploring how data can meet business needs.
My organization manages an identity and access management (IAM) solution to control access to data.
Centralizing identity and access management makes it easier to grant access to data while still protecting that data and complying with regulations.
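A minimal sketch of centralized, role-based access control, assuming permissions are expressed as (dataset, action) pairs granted to roles. The `AccessPolicy` class is hypothetical and is not the API of any IAM product; it denies access by default and records every decision in an audit log, which helps when demonstrating compliance.

```python
from dataclasses import dataclass, field


@dataclass
class AccessPolicy:
    """Hypothetical central policy store: users -> roles -> (dataset, action) grants."""
    role_grants: dict = field(default_factory=dict)  # role -> set of (dataset, action)
    user_roles: dict = field(default_factory=dict)   # user -> set of roles
    audit_log: list = field(default_factory=list)    # (user, dataset, action, allowed)

    def grant(self, role, dataset, action):
        self.role_grants.setdefault(role, set()).add((dataset, action))

    def assign(self, user, role):
        self.user_roles.setdefault(user, set()).add(role)

    def is_allowed(self, user, dataset, action):
        """Deny by default; allow only if one of the user's roles grants the action."""
        allowed = any(
            (dataset, action) in self.role_grants.get(role, set())
            for role in self.user_roles.get(user, set())
        )
        self.audit_log.append((user, dataset, action, allowed))
        return allowed


policy = AccessPolicy()
policy.grant("analyst", "sales", "read")
policy.grant("engineer", "sales", "write")
policy.assign("ana", "analyst")

print(policy.is_allowed("ana", "sales", "read"))   # True
print(policy.is_allowed("ana", "sales", "write"))  # False
print(policy.is_allowed("bob", "sales", "read"))   # False: bob has no roles
```

Because grants live in one place, onboarding a new analyst is a single `assign` call, and revoking a role removes access everywhere it applied.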
My organization effectively manages a notebook environment (e.g., Jupyter, Amazon SageMaker, Google Colab, or Databricks) for data engineers and data scientists.
Modern data scientists and data engineers use notebooks to develop, test, and document code. A notebook environment is a key component of a company's data and analytics platform and must be managed effectively.
My organization develops and executes automated processes for machine learning model training.
A data operations team needs processes and automated scripts to train machine learning models and deploy them to production environments.
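A minimal sketch of an automated training job, assuming a toy dataset of (x, y) pairs and a one-variable least-squares model standing in for a real model. The job splits the data, trains a candidate, scores it on a holdout set, and recommends promotion only if it beats the error of the model currently in production. `training_job`, its parameters, and the production error value are hypothetical.

```python
import random
from statistics import mean


def train(xs, ys):
    """Fit y = a*x + b by ordinary least squares (a stand-in for a real model)."""
    mx, my = mean(xs), mean(ys)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx


def mean_absolute_error(model, xs, ys):
    a, b = model
    return mean(abs(a * x + b - y) for x, y in zip(xs, ys))


def training_job(data, production_mae, holdout_fraction=0.2, seed=0):
    """Hypothetical automated job: split, train, evaluate, and decide on promotion."""
    rows = list(data)
    random.Random(seed).shuffle(rows)  # fixed seed keeps runs reproducible
    cut = int(len(rows) * (1 - holdout_fraction))
    train_rows, test_rows = rows[:cut], rows[cut:]
    model = train([x for x, _ in train_rows], [y for _, y in train_rows])
    score = mean_absolute_error(model, [x for x, _ in test_rows], [y for _, y in test_rows])
    # Promote only if the candidate beats the model currently in production.
    return model, score, score < production_mae


# Toy data following y = 2x + 1; 0.5 is an assumed error for the production model.
data = [(x, 2 * x + 1) for x in range(50)]
model, score, promote = training_job(data, production_mae=0.5)
print(model, score, promote)
```

A scheduler or CI pipeline would run a job like this whenever new data arrives, so retraining and the promotion decision do not depend on someone running a notebook by hand.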
Deploying and managing infrastructure that supports operationalizing data to meet business needs.
My organization effectively deploys and manages APIs for data sharing and access.
APIs are one of the most fundamental channels through which outside systems securely retrieve data; therefore, knowing how to deploy and manage APIs is paramount for modern data operations teams.
My organization administers and manages relational databases (e.g., Postgres, MySQL, SQL Server, Oracle).
Relational databases are ubiquitous in enterprise architectures, so a data operations team should have the skills to administer and manage them effectively.
My organization administers and manages data lakes and data meshes.
Modern analytics techniques require the ability to analyze and share data at scales beyond those practical with relational databases. A central organizational data lake, or a series of lakes organized into a mesh, lets analytics teams across the company avoid access and performance bottlenecks and focus on delivering value.
My organization administers and manages Extract Transform Load (ETL) processes.
ETL processes allow businesses to connect data from various systems (such as relational databases and enterprise systems) for analysis and reporting. ETL processes can be coded manually or configured in a tool. Either way, the data operations team should know how to administer and manage them so that data stays complete and up to date.
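A minimal sketch of the three ETL stages, assuming a CSV export as the source and SQLite (from Python's standard library) as the target. The column names and sample rows are invented for illustration. Rows that fail validation are returned rather than silently dropped, so the job can check that every extracted row is either loaded or reported.

```python
import csv
import io
import sqlite3

# Hypothetical export from a source system; order 1002 is missing its amount.
RAW_CSV = """order_id,customer,amount,order_date
1001,acme,120.50,2024-03-01
1002,globex,,2024-03-02
1003,initech,75.00,2024-03-02
"""


def extract(text):
    """Extract: parse the CSV export into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: cast types and normalize names; return valid and rejected rows."""
    valid, rejected = [], []
    for row in rows:
        if not row["amount"]:
            rejected.append(row)
            continue
        valid.append((int(row["order_id"]), row["customer"].upper(),
                      float(row["amount"]), row["order_date"]))
    return valid, rejected


def load(conn, records):
    """Load: write into the target table, replacing rows with the same order_id."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL, order_date TEXT)"
    )
    # INSERT OR REPLACE keys on the primary key, so re-running the job adds no duplicates.
    conn.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?, ?)", records)
    conn.commit()
    return len(records)


conn = sqlite3.connect(":memory:")
raw = extract(RAW_CSV)
valid, rejected = transform(raw)
assert len(valid) + len(rejected) == len(raw)  # completeness: every row is loaded or reported
loaded = load(conn, valid)
load(conn, valid)  # a second run should not duplicate data
print(loaded, len(rejected), conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0])  # 2 1 2
```

Making the load step safe to rerun matters operationally: when a scheduled job fails halfway, it can simply be run again to bring the target up to date.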