== Data Governance, Quality, and Security ==
=== Data Quality Assurance ===
==== Validation Rules ====
* '''Strapi Forms'''
** '''Field-Level Validation''': Certain fields (e.g., numerical indicators, date fields) must conform to a predefined format or fall within acceptable ranges. If a value is invalid, the user is prompted to correct it before the data is stored.
** '''Review & Publish Step''': Strapi includes a <q>Publish</q> button that serves as an extra checkpoint. Users must explicitly publish each entry after reviewing it, so that data is checked for accuracy and completeness before it proceeds to the transformation process in [https://unitapedia.univ-unita.eu/hop/ Apache HOP].
* '''Apache HOP Transformations'''
** Additional checks occur when data is moved from [https://unitapedia.univ-unita.eu/strapi/ Strapi] or [https://unitapedia.univ-unita.eu/minio/ MinIO] into the datamart. These checks may include data type conversions, date validations, and referential integrity (e.g., matching Task IDs to existing records such as [https://unitapedia.univ-unita.eu/index.php/Manage_and_coordinate_UNITA T1.2], [https://unitapedia.univ-unita.eu/index.php/UNITA_Quality_Assurance T1.3]).
** Records that fail validation can be flagged or routed to a separate output for manual review and correction.
==== Error Handling and Logging ====
* '''Transformation Logs''': [https://unitapedia.univ-unita.eu/hop/ Apache HOP] logs each step of the ETL process. If data is malformed or fails a validation check, the record is either flagged or routed to an <q>error output</q> step for manual review.
* '''Notifications''': Administrators (or designated data stewards) can receive email or dashboard alerts when ETL jobs fail or encounter anomalies.
* '''Rollback/Correction''': Erroneous data can be corrected in [https://unitapedia.univ-unita.eu/strapi/ Strapi] or in the source CSV files and then reprocessed. Historical error logs help identify recurring issues (e.g., formatting mistakes from a specific partner).
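The validation and error-routing behaviour described above can be illustrated with a minimal Python sketch. It is not the Apache HOP pipeline itself: the field names (<code>task_id</code>, <code>value</code>, <code>date</code>), the <code>KNOWN_TASK_IDS</code> set, and both functions are hypothetical and only show the pattern of checking each record and sending failures to a separate error output.

```python
from datetime import date

# Hypothetical reference data: task IDs assumed to already exist in the datamart.
KNOWN_TASK_IDS = {"T1.2", "T1.3"}


def validate_record(record: dict) -> list:
    """Return a list of validation errors (empty if the record is valid)."""
    errors = []

    # Referential integrity: the task ID must match an existing record.
    if record.get("task_id") not in KNOWN_TASK_IDS:
        errors.append(f"unknown task_id: {record.get('task_id')!r}")

    # Type conversion and range check: the value must be a non-negative integer.
    try:
        if int(record.get("value")) < 0:
            errors.append("value must be non-negative")
    except (TypeError, ValueError):
        errors.append(f"value is not an integer: {record.get('value')!r}")

    # Date validation: the date must parse as an ISO 8601 calendar date.
    try:
        date.fromisoformat(str(record.get("date")))
    except ValueError:
        errors.append(f"invalid date: {record.get('date')!r}")

    return errors


def route_records(records):
    """Split records into a valid list and an error output kept for manual review."""
    valid, rejected = [], []
    for record in records:
        errors = validate_record(record)
        if errors:
            rejected.append({"record": record, "errors": errors})
        else:
            valid.append(record)
    return valid, rejected


# Illustrative input only; these values are made up for the example.
sample = [
    {"task_id": "T1.2", "value": "12", "date": "2024-01-15"},
    {"task_id": "T9.9", "value": "abc", "date": "2024-13-01"},
]
valid, rejected = route_records(sample)
```

Keeping the error messages alongside each rejected record mirrors the idea of historical error logs: the same structure could later be grouped by partner to spot recurring formatting mistakes.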
=== Governance Model ===
The governance model defines roles, responsibilities, and processes to ensure data integrity and proper stewardship of indicator definitions.
==== Roles and Responsibilities ====
* '''UNITA Office (CRUD on Alliance Level for Their University):'''
** Can create, read, update, and delete records for their institution’s indicators.
** Responsible for ensuring that data submitted (e.g., participant counts, events) is accurate and timely.
* '''Task Leader (Read-Only at Task Level):'''
** Can monitor and view data for all universities participating in their assigned task(s) (e.g., [https://unitapedia.univ-unita.eu/index.php/Education_and_research_%26_innovation_community T2.3]).
** Typically does not edit or delete data, but may request corrections or clarifications from the UNITA Offices.
* '''Project Manager (CRUD at All Levels):'''
** Has the highest level of access and can modify any indicator data across the alliance.
** Primarily uses this access for oversight or emergency corrections, while day-to-day data entry is handled by UNITA Offices.
==== Version Control of Data Definitions ====
* '''Indicator Definitions''': Each indicator (e.g., T1.2.5, T2.3) may evolve over time. Changes to definitions (such as calculation rules, target values, or frequency) should be documented and versioned.
* '''MediaWiki Tracking''': Edits to indicator documentation pages (in the <code>Doc</code> namespace) can be tracked via MediaWiki’s revision history, providing a transparent audit trail.
=== Data Security ===
==== Access Control ====
* '''Database Roles and Privileges:'''
** Row-level or schema-level security can be configured if certain data must be restricted to specific roles.
* '''Application-Level Permissions:'''
** [https://unitapedia.univ-unita.eu/strapi/ Strapi] can enforce role-based access, so that only authorized roles (e.g., UNITA Office) can submit or update data.
** MediaWiki’s namespaces (e.g., <code>DataSrc</code>, <code>Doc</code>) can be locked down to specific groups or user roles for editing, while read access might be more widely available.
==== Authentication/Authorization ====
* '''MediaWiki''': Users log in to access pages or to perform data-related actions. In a production environment, Single Sign-On (SSO) or OAuth could be integrated to streamline user management.
* '''Strapi''': Form submissions require an authenticated user with the correct permissions.
* '''Apache HOP Carte''': Administrator credentials are required to deploy and schedule transformations.
==== Data Warehouse Compartmentalization ====
The UNITApedia data warehouse separates different types of data through the use of multiple databases and schemas. Each database within the data warehouse serves a distinct purpose, and within each database, data is organized into logical schemas containing related tables. For example:
* '''Strapi Database:'''
** Contains raw input data from [https://unitapedia.univ-unita.eu/strapi/ Strapi] forms.
** Schemas in this database hold tables with initial, untransformed indicator entries.
* '''Datamart Database:'''
** Stores transformed and processed data ready for consumption by MediaWiki.
** Within the datamart, tables are organized under one or more schemas (commonly the <code>public</code> schema), for example <code>public.t125</code> and <code>public.t126</code>.
* '''Additional Databases:'''
** The <code>unita-data</code> database contains metadata and configuration details used by MediaWiki.
This multi-database, multi-schema design enables:
* '''Isolation:''' Raw data is kept separate from processed data, reducing the risk of accidental overwrites and protecting data integrity.
* '''Security:''' Different access controls can be applied at both the database and schema levels, so only authorized roles can access or modify sensitive data.
* '''Maintainability:''' Logical separation simplifies performance tuning, backup strategies, and data governance by grouping related tables together.
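The roles listed under Roles and Responsibilities and the access controls in this section can be tied together in a small permission check. The following Python sketch is only an illustration of that model: the <code>Role</code> enum, the <code>User</code> class, the <code>is_allowed</code> function, and the placeholder university names are all hypothetical and do not reflect the actual Strapi, MediaWiki, or database configuration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Role(Enum):
    UNITA_OFFICE = "unita_office"        # CRUD for their own university
    TASK_LEADER = "task_leader"          # read-only for assigned tasks
    PROJECT_MANAGER = "project_manager"  # CRUD at all levels


@dataclass(frozen=True)
class User:
    role: Role
    university: Optional[str] = None  # only meaningful for UNITA Office users
    tasks: frozenset = frozenset()    # only meaningful for Task Leaders


ACTIONS = {"create", "read", "update", "delete"}


def is_allowed(user: User, action: str, university: str, task: str) -> bool:
    """Decide whether a user may perform an action on one indicator record."""
    if action not in ACTIONS:
        return False
    if user.role is Role.PROJECT_MANAGER:
        # Highest level of access: any action on any record.
        return True
    if user.role is Role.UNITA_OFFICE:
        # Full CRUD, but only for records of the user's own university.
        return university == user.university
    if user.role is Role.TASK_LEADER:
        # Read-only, across all universities, for assigned tasks.
        return action == "read" and task in user.tasks
    return False


# Placeholder users and universities, invented for the example.
office = User(Role.UNITA_OFFICE, university="university_a")
leader = User(Role.TASK_LEADER, tasks=frozenset({"T2.3"}))
manager = User(Role.PROJECT_MANAGER)
```

For instance, <code>is_allowed(office, "update", "university_b", "T2.3")</code> evaluates to <code>False</code> because the record belongs to another university, while the same call for <code>manager</code> evaluates to <code>True</code>. In practice the same rules would be enforced by Strapi roles and database privileges rather than by application code like this.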