Documentation: Difference between revisions

    From UNITApedia
    No edit summary
    =Data Collection for Impact Observatory=
    =SYSTEM OVERVIEW=
    ===Strapi & MinIO Video Tutorial===
    The UNITApedia system is composed of two integrated main components designed to enhance data accessibility, transparency, and collaboration among UNITA members. It connects a shared data warehouse with a MediaWiki-based front-end, creating a dynamic and scalable ecosystem for data visualization, management, and analysis.
    Below is a short video tutorial explaining how to use the [https://unitapedia.univ-unita.eu/strapi/ Strapi] and [https://unitapedia.univ-unita.eu/minio/buckets MinIO] tools to collect indicator data.
    <html>
    <div style="position: relative; overflow: hidden; padding-top: 56.25%;"><iframe src="https://share.synthesia.io/embeds/videos/6b74f097-50d8-4557-af6e-b09d2be95862" loading="lazy" title="Synthesia video player - v2" allowfullscreen allow="encrypted-media; fullscreen;" style="position: absolute; width: 100%; height: 100%; top: 0; left: 0; border: none; padding: 0; margin: 0; overflow:hidden;"></iframe></div>
    </html>
    ===Data Collection Methodology===
    Two data collection paths were implemented for the '''Impact Observatory'''. In the automatic path, users upload CSV files to our [https://unitapedia.univ-unita.eu/minio/buckets MinIO] object-storage service; from there, a suite of '''Pentaho''' jobs automatically ingests, transforms, and loads the data into our Datalake, keeping metrics up to date without manual intervention. In the manual path, UNITA offices log into our [https://unitapedia.univ-unita.eu/strapi/ Strapi] CMS and complete pre-built indicator forms (e.g., entering dates, numeric values, and descriptions). Once submitted, each entry is stored in our '''PostgreSQL''' database, tracked with metadata (timestamps, author, version), and immediately available for reporting. This dual approach combines high-volume automation with flexible manual data entry, so all indicators stay accurate and up to date.
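
As an illustration of the automatic path, a CSV file could be checked locally before it is uploaded to MinIO. The following is a minimal Python sketch, not part of the UNITApedia pipeline: the column names <code>date</code>, <code>value</code>, and <code>description</code> and the function <code>validate_indicator_csv</code> are assumptions chosen to mirror the form fields mentioned above, and the real indicator templates may differ.

```python
import csv
import io
from datetime import date

# Hypothetical column names; the real indicator templates may differ.
REQUIRED_COLUMNS = ("date", "value", "description")


def validate_indicator_csv(text, required=REQUIRED_COLUMNS):
    """Return a list of human-readable problems found in the CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [col for col in required if col not in header]
    if missing:
        return ["missing column(s): " + ", ".join(missing)]

    problems = []
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        raw_date = row["date"] or ""
        raw_value = row["value"] or ""
        try:
            date.fromisoformat(raw_date)  # expects YYYY-MM-DD
        except ValueError:
            problems.append(f"line {line_no}: invalid date {raw_date!r}")
        try:
            float(raw_value)
        except ValueError:
            problems.append(f"line {line_no}: non-numeric value {raw_value!r}")
    return problems
```

An empty list means every row parsed, so a script could upload the file only when no problems are reported.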


    [[File:Data_Collection.png|frameless|996px|center|Data Injection]]
    == Shared Data Warehouse ==
    Acts as the central repository for structured data such as deliverables, indicators, and progress metrics. Utilizes metadata, ontology, and semantic web technologies to provide a comprehensive, interconnected view of data collected across all UNITA members. Supports efficient data centralization, organization, and analysis, ensuring a unified understanding of the data ecosystem. 
    Backed by [https://unitapedia.univ-unita.eu/pga/ PostgreSQL], enabling complex queries, scalability, and robust data storage, with [https://unitapedia.univ-unita.eu/hop/ Apache HOP] as the ETL tool for building data pipelines.


    ===Strapi Manual Data Injection Steps===
    == MediaWiki-Based Front-End Interface ==
    *  Go to the [https://unitapedia.univ-unita.eu/strapi/ Strapi] CMS.
    Provides a user-friendly system for monitoring project progress, visualizing metrics, and assessing impact. Acts as the primary user interface, powered by extensions like External Data, Scribunto, and Semantic MediaWiki. Dynamically retrieves data through its API layer, integrating seamlessly with the data warehouse. Enhances decision-making and collaboration by providing stakeholders with real-time, actionable insights. Lets users share and collaborate to extend the UNITA knowledge base.
    *  Click on the button "Open the administration".
    *  Enter your credentials and click on Login.
    *  Click on Content Manager.
    *  Select the indicator entry form to fill in.
    *  Under the “COLLECTION TYPES” panel the respective forms will be displayed according to the user roles. We defined three user roles with different permissions:
    ** a) UNITAOffice
    ** b) TaskLeader
    ** c) ProjectManager
    *  Once you have selected the form to fill in, click on “Create new entry”.
    *  Enter the required data manually.
    *  Click on Save, double check that the information is correct and then click on Publish.
    *  Click on Back to see all the published entries.
    *  See how the status of the entry changed. Click on Unpublish if you want to revert it.
    *  Now you can see all the entries that have been saved and published.
    *  In this section you can edit, duplicate or delete the entry.
    *  Once the entry has been published, it is saved on our database with its metadata.
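
The manual steps above could in principle also be scripted. In Strapi version 4 and later, the REST API creates an entry through a POST request whose JSON body wraps the fields in a <code>data</code> object. The sketch below only builds such a request and does not send it. The base path, the collection name <code>indicator-entries</code>, the field names, and the token are all hypothetical, and whether API access is enabled for this instance is not stated here.

```python
import json
import urllib.request

# Assumption: the Strapi REST API is served under this base URL.
STRAPI_BASE = "https://unitapedia.univ-unita.eu/strapi"


def build_create_entry_request(collection, fields, token):
    """Build (but do not send) a POST request that creates one entry."""
    # Strapi v4+ expects the entry fields wrapped in a "data" object.
    body = json.dumps({"data": fields}).encode("utf-8")
    return urllib.request.Request(
        url=f"{STRAPI_BASE}/api/{collection}",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


# Hypothetical collection and field names, for illustration only.
request = build_create_entry_request(
    "indicator-entries",
    {"date": "2025-06-11", "value": 42, "description": "Example entry"},
    token="YOUR_API_TOKEN",
)
```

Passing <code>request</code> to <code>urllib.request.urlopen</code> would send it; the sketch stops short of that so it can be inspected without touching the live system.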


    ===MinIO Automatic Data Injection Steps===
    == Key Features ==
    * Go to the [https://unitapedia.univ-unita.eu/minio/buckets MinIO] site.
    * '''Near real-time integrated data pipeline process'''
    * Enter your credentials and click on Login.
    * '''Data Retrieval:''' Utilizes robust APIs to fetch and display updated information from the [https://unitapedia.univ-unita.eu/pga/ PostgreSQL] database. Near-instantaneous process from data extraction to final result display on UNITApedia.
    * Click on Object Browser in the left panel.
    * '''User-Friendly Interface:''' Built on MediaWiki, ensuring an intuitive experience for users of varying technical backgrounds. Extensions like Page Forms and Semantic MediaWiki simplify data input, annotation, and querying.
    * Select the folder (“bucket”) of the indicator for which you want to upload a file.
    * '''Open Source:''' Designed with modularity and scalability in mind, allowing deployment across other UNITA members or similar institutions. Supports customization to meet unique institutional needs while adhering to UNITA’s vision.  
    * To upload a file click on Upload and select the file from your machine.
    * '''Dynamic Queries:''' Uses optimized prepared [https://unitapedia.univ-unita.eu/pga/ PostgreSQL] statements and Lua scripting via MediaWiki extensions to deliver efficient and dynamic data visualization. Allows advanced customization of data presentation formats based on user needs.
    * Once you select a file, you can Download, Share or Delete it.
    * '''Scalable Architecture:''' Employs a Dockerized infrastructure for each subsystem (MediaWiki, [https://unitapedia.univ-unita.eu/strapi/ Strapi], [https://unitapedia.univ-unita.eu/pga/ PostgreSQL], Pentaho, etc.), ensuring modularity and scalability. Supports efficient deployment, updates, and resource allocation.
    * '''Enhanced Collaboration and Transparency:''' Enables cross-institutional collaboration by centralizing data in the shared warehouse. Provides stakeholders with real-time visualizations, ensuring informed decision-making and alignment with organizational goals.

    Revision as of 08:07, 11 June 2025

    SYSTEM OVERVIEW

    The UNITApedia system is composed of two integrated main components designed to enhance data accessibility, transparency, and collaboration among UNITA members. It connects a shared data warehouse with a MediaWiki-based front-end, creating a dynamic and scalable ecosystem for data visualization, management, and analysis.

    Shared Data Warehouse

    Acts as the central repository for structured data such as deliverables, indicators, and progress metrics. Utilizes metadata, ontology, and semantic web technologies to provide a comprehensive, interconnected view of data collected across all UNITA members. Supports efficient data centralization, organization, and analysis, ensuring a unified understanding of the data ecosystem. Backed by PostgreSQL, enabling complex queries, scalability, and robust data storage, with Apache HOP as the ETL tool for building data pipelines.

    MediaWiki-Based Front-End Interface

    Provides a user-friendly system for monitoring project progress, visualizing metrics, and assessing impact. Acts as the primary user interface, powered by extensions like External Data, Scribunto, and Semantic MediaWiki. Dynamically retrieves data through its API layer, integrating seamlessly with the data warehouse. Enhances decision-making and collaboration by providing stakeholders with real-time, actionable insights. Lets users share and collaborate to extend the UNITA knowledge base.

    Key Features

    • Near real-time integrated data pipeline process
    • Data Retrieval: Utilizes robust APIs to fetch and display updated information from the PostgreSQL database. Near-instantaneous process from data extraction to final result display on UNITApedia.
    • User-Friendly Interface: Built on MediaWiki, ensuring an intuitive experience for users of varying technical backgrounds. Extensions like Page Forms and Semantic MediaWiki simplify data input, annotation, and querying.
    • Open Source: Designed with modularity and scalability in mind, allowing deployment across other UNITA members or similar institutions. Supports customization to meet unique institutional needs while adhering to UNITA’s vision.
    • Dynamic Queries: Uses optimized prepared PostgreSQL statements and Lua scripting via MediaWiki extensions to deliver efficient and dynamic data visualization. Allows advanced customization of data presentation formats based on user needs.
    • Scalable Architecture: Employs a Dockerized infrastructure for each subsystem (MediaWiki, Strapi, PostgreSQL, Pentaho, etc.), ensuring modularity and scalability. Supports efficient deployment, updates, and resource allocation.
    • Enhanced Collaboration and Transparency: Enables cross-institutional collaboration by centralizing data in the shared warehouse. Provides stakeholders with real-time visualizations, ensuring informed decision-making and alignment with organizational goals.
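
The idea behind the Dynamic Queries feature, SQL statements with bound parameters, can be illustrated in a few lines. The sketch below is not the UNITApedia implementation: it uses Python's built-in <code>sqlite3</code> module as a stand-in for PostgreSQL so that it runs without a server, and the table <code>indicator_values</code>, its columns, and the sample rows are invented for illustration. PostgreSQL drivers use a different placeholder syntax than the <code>?</code> shown here.

```python
import sqlite3

# SQLite stands in for PostgreSQL here so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE indicator_values (indicator TEXT, year INTEGER, value REAL)"
)
# Invented sample rows, for illustration only.
conn.executemany(
    "INSERT INTO indicator_values VALUES (?, ?, ?)",
    [("mobility", 2024, 120.0), ("mobility", 2025, 150.0), ("events", 2025, 8.0)],
)


def values_for(indicator, since_year):
    """Fetch (year, value) rows using bound parameters, not string formatting."""
    cursor = conn.execute(
        "SELECT year, value FROM indicator_values "
        "WHERE indicator = ? AND year >= ? ORDER BY year",
        (indicator, since_year),
    )
    return cursor.fetchall()
```

Because user input is passed as parameters rather than spliced into the SQL text, a value such as <code>x' OR '1'='1</code> is treated as a plain string and matches no rows.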