Data Architecture Viewpoint
The Data Architecture Viewpoint is also technical in nature and communicates the data model used to operate the observatory. It is intended for the alliance's data management experts, who can use it to see how the data sources of each university have been integrated. It also allows users with less data processing expertise to run targeted queries on specific indicators under given conditions (reports on a particular instrument, its deployment, its evolution over time, etc.). The results of these queries, if relevant, could later be integrated into one of the four initial viewpoints.
PostgreSQL (Data Warehouse)
- Role: Central repository storing structured data such as deliverables, indicators, and metrics.
- Multi-Database Setup:
  - strapi: Contains raw input tables from Strapi forms.
  - datamart: Holds transformed and processed data ready for MediaWiki queries.
  - unita-data: Contains additional metadata or wiki configuration tables.
- Administration: Managed via pgAdmin for database operations (e.g., backups, user management).
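
To make the role of the datamart concrete, the following minimal sketch shows the kind of query a less expert user might run to follow one indicator over time. It uses Python's built-in sqlite3 module as a stand-in so that it runs without a server; the real target would be the datamart database in PostgreSQL. The table `indicator_values`, its columns, the university names and the figures are all hypothetical and are not taken from the observatory's actual schema.

```python
import sqlite3

# In-memory SQLite stands in for the PostgreSQL "datamart" database.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE indicator_values (   -- hypothetical table and columns
        university TEXT NOT NULL,
        indicator  TEXT NOT NULL,
        year       INTEGER NOT NULL,
        value      REAL NOT NULL
    )
    """
)
# Illustrative sample rows only, not real observatory data.
conn.executemany(
    "INSERT INTO indicator_values VALUES (?, ?, ?, ?)",
    [
        ("University A", "mobility_students", 2022, 120.0),
        ("University A", "mobility_students", 2023, 150.0),
        ("University B", "mobility_students", 2022, 80.0),
        ("University B", "mobility_students", 2023, 95.0),
        ("University A", "joint_courses", 2023, 4.0),
    ],
)


def indicator_evolution(connection, indicator):
    """Return (year, total) pairs for one indicator, summed over universities."""
    rows = connection.execute(
        """
        SELECT year, SUM(value)
        FROM indicator_values
        WHERE indicator = ?
        GROUP BY year
        ORDER BY year
        """,
        (indicator,),
    )
    return rows.fetchall()


evolution = indicator_evolution(conn, "mobility_students")
```

The same SELECT statement, with the placeholder style adapted to the chosen driver, could be issued against PostgreSQL once the actual table and column names are known.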
Apache HOP (ETL and Reporting)
- Processes:
  - Data Retrieval: Fetches raw datasets from MinIO buckets (CSV files) or Strapi tables in PostgreSQL.
  - Data Transformation: Cleans and normalizes data, ensuring consistency (e.g., date formatting, numeric checks, selecting values).
  - Data Integration: Loads validated data into the datamart schema for consumption by MediaWiki.
  - Scheduling & Monitoring: The deployed Apache HOP “Carte Server” schedules jobs and transformations and provides logs for error handling and performance monitoring.
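
The transformation step above can be illustrated outside Apache HOP with a small Python sketch. This is not how the observatory's transformations are defined (those are built in Apache HOP itself); it only shows the kind of checks involved: normalizing dates, validating numeric values, and setting aside rows that fail. The column names, the two accepted date formats and the sample rows are assumptions made for the example.

```python
import csv
import io
from datetime import datetime

# Hypothetical raw input, as it might arrive in a CSV file.
RAW_CSV = """university,indicator,date,value
University A,mobility_students,31/12/2023,150
University B,mobility_students,2023-12-31,n/a
University C,mobility_students,2023-12-31, 95
"""

DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")  # assumed input formats


def normalize_date(text):
    """Return the date as an ISO string, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def parse_number(text):
    """Return the value as a float, or None if it is not numeric."""
    try:
        return float(text.strip())
    except ValueError:
        return None


def transform(raw_text):
    """Split rows into cleaned valid rows and rejected raw rows."""
    valid, rejected = [], []
    for row in csv.DictReader(io.StringIO(raw_text)):
        date = normalize_date(row["date"])
        value = parse_number(row["value"])
        if date is None or value is None:
            rejected.append(row)  # kept for error reporting
            continue
        valid.append(
            {
                "university": row["university"].strip(),
                "indicator": row["indicator"].strip(),
                "date": date,
                "value": value,
            }
        )
    return valid, rejected


valid_rows, rejected_rows = transform(RAW_CSV)
```

Keeping rejected rows separate, rather than dropping them silently, mirrors the error-handling logs mentioned above: only validated rows would go on to the load step.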
MinIO (Object Storage)
- Role: Stores raw data files (CSV, PDFs, images, etc.) uploaded by UNITA Offices.
- Integration: Apache HOP connects to MinIO using an S3-compatible interface, retrieving files for ETL processing.
- Organization: Multiple buckets can be created (e.g., a “dev” bucket for storing Apache HOP transformations, and indicator buckets for the CSV files sent by UNITA Offices).
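
As an illustration of the S3-compatible access described above, the sketch below reads every CSV object under a prefix in a bucket. It is written against the interface of the MinIO Python client (`list_objects` and `get_object`), but it is not the connection Apache HOP uses internally. The function name `read_csv_objects`, the endpoint and the bucket name are hypothetical.

```python
import csv
import io


def read_csv_objects(client, bucket, prefix=""):
    """Yield (object_name, rows) for every CSV object under a prefix.

    `client` is expected to behave like minio.Minio: list_objects() yields
    items with an `object_name` attribute, and get_object() returns a
    response offering read(), close() and release_conn().
    """
    for obj in client.list_objects(bucket, prefix=prefix, recursive=True):
        if not obj.object_name.lower().endswith(".csv"):
            continue  # skip PDFs, images and other non-CSV files
        response = client.get_object(bucket, obj.object_name)
        try:
            text = response.read().decode("utf-8")
        finally:
            response.close()
            response.release_conn()
        yield obj.object_name, list(csv.DictReader(io.StringIO(text)))


# Usage with a real server (requires the third-party `minio` package;
# endpoint, credentials and bucket name below are placeholders):
#
#   from minio import Minio
#   client = Minio("minio.example.org:9000",
#                  access_key="...", secret_key="...", secure=True)
#   for name, rows in read_csv_objects(client, "indicators"):
#       print(name, len(rows))
```

Passing the client in as a parameter keeps the function independent of any particular deployment and makes it easy to exercise with a stub object.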
Strapi (Middleware / Headless CMS)
- Purpose: Provides a user-friendly interface for UNITA Offices to manually input or update indicator data.
- Data Flow: Stores raw records in its own PostgreSQL database schema, which Apache HOP then reads, transforms, and pushes into datamart.
- APIs: Exposes REST or GraphQL endpoints if needed for external integrations or advanced use cases.
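
For external integrations that go through the REST endpoints, a request URL could be assembled as in the sketch below. It assumes the REST conventions of Strapi v4 and later (`/api/<collection>`, `filters[field][$eq]`, `pagination[page]`, `pagination[pageSize]`); the Strapi version actually deployed is not stated here, and the host, the `indicators` collection and the `university` field are hypothetical.

```python
from urllib.parse import urlencode


def strapi_list_url(base_url, collection, filters=None, page=1, page_size=25):
    """Build a list-endpoint URL with equality filters and pagination."""
    params = {
        f"filters[{field}][$eq]": value
        for field, value in (filters or {}).items()
    }
    params["pagination[page]"] = page
    params["pagination[pageSize]"] = page_size
    # urlencode percent-encodes the brackets and "$" in the parameter names.
    return f"{base_url.rstrip('/')}/api/{collection}?{urlencode(params)}"


url = strapi_list_url(
    "https://strapi.example.org",       # placeholder host
    "indicators",                       # hypothetical collection name
    {"university": "University A"},     # hypothetical field and value
    page=2,
)
```

The resulting URL would then be requested with an HTTP client, typically with an API token in the `Authorization` header if the endpoint is not public.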