LMS Analytics: Unlock Powerful Insights from Your LMS Data

Platforms and systems that support learning activities produce a huge amount of valuable data, which calls for some form of learning analytics system. Some of them provide a data visualisation system out of the box or as an add-on. However, integrating all of these systems is complex. Most LMS analytics systems are closed-source and offer limited customisation.

PANORAMA is an open-source data analytics architecture designed specifically for educational institutions that need an LMS reporting and analytics system. It can combine data sources from many LMSs (whether of the same brand or not) and other information systems, and provide a unified view across the organisation without losing control over data access and privacy. It can also be used to connect AI applications for further analysis.

This article was presented as a poster at IEEE’s LWMOOCs 2023 convention at MIT.

Keywords: LMS analytics, learning tools analytics, LMS reporting and analytics, LMS data analytics, LMS analytics dashboard


The problem

Learning projects involve multiple systems, e.g., LMS (Learning Management System), SIS (Student Information System), CRM (Customer Relationship Management), video streaming services, videoconference systems, proctoring systems, etc. Larger institutions may even have more than one LMS, of the same or different brands.

Students and teachers typically connect to more than one of these systems, both simultaneously and over time. As a result, some data is duplicated while other data is siloed in specific application systems.

In the best-case scenario, each system has its own analytics or reporting system. Integrating these analytics systems is complex, if it is possible at all.

At the end of the day, managers have to deal with a series of reports that contain duplicated, siloed, incomplete, conflicting and scattered data from which they have to make an informed decision.

Typical problems found are:

Some platforms don't even have a reporting system.
Incompatible data formats.
Siloed data.
Duplicate users.
Disparate metadata representations.

The solution

Panorama is an open architecture for learning management that centralises data in a common external repository and creates a metadata repository in a data lake representation. Panorama can work as a simple LMS analytics system, and also connect to other support systems.

The key benefits of this approach are:

Consolidates data to enable global views.
Data is conditioned and organised into partitions.
Granular data access control with row or column level security.
Enables data analysts to create custom queries, dashboards and reports.
Its open architecture enables further data engineering using standard programmatic or AI tools.
Its modular design allows it to connect to other systems, data lakes or data warehouses.

How the LMS analytics works

The whole process can be broken down into three broad phases:

Data extraction, storage and organisation
Data pipeline
Data utilisation

Data extraction, storage and organisation

[Figure: data lake approach for Open edX LMS analytics]

Software agents perform the data extraction: they connect to the data sources and extract the data of interest. They preserve the data as closely as possible to the original databases. However, in some cases they may pre-process the data to ensure that the raw data is stored in a valid format.

The agents upload the data to the data store in standard formats (CSV, JSON, etc.) to maximise compatibility and to facilitate the creation of data transformations.

Files are organised in a folder structure and data is partitioned to improve performance and scalability.
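As an illustration of the folder structure and partitioning described above, the following minimal Python sketch writes one CSV file into a partitioned folder. The key=value layout (lms, table, date), the function names and the sample fields are assumptions made for this example and are not taken from Panorama's actual layout.

```python
import csv
import tempfile
from datetime import date
from pathlib import Path


def partition_path(root: Path, lms: str, table: str, day: date) -> Path:
    """Build a key=value partition folder (hypothetical layout)."""
    return root / f"lms={lms}" / f"table={table}" / f"date={day.isoformat()}"


def write_partition(root: Path, lms: str, table: str, day: date, rows: list[dict]) -> Path:
    """Write a non-empty list of rows as CSV inside its partition folder."""
    folder = partition_path(root, lms, table, day)
    folder.mkdir(parents=True, exist_ok=True)
    target = folder / "data.csv"
    with target.open("w", newline="", encoding="utf-8") as handle:
        # Column order is taken from the first row.
        writer = csv.DictWriter(handle, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return target


# A temporary folder stands in for the data store.
root = Path(tempfile.mkdtemp())
sample = [{"user_id": "1", "course_id": "course-A", "grade": "0.85"}]
written = write_partition(root, "campus1", "grades", date(2023, 10, 1), sample)
print(written.relative_to(root).as_posix())
```

Partitioning by source and date in this way lets a query engine skip folders that a query does not need, which is where the performance and scalability gains come from.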

The data lake functionality implements a data metastore and query engine that allows these files to be queried in standard formats using Structured Query Language (SQL). In addition, data access control is implemented through row and column level security mechanisms.
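To make the idea of row and column level security concrete, here is a small Python sketch that applies a per-user policy to a list of records. In a real data lake the query engine enforces these rules, not application code; the `AccessPolicy` class, the `apply_policy` function and the sample data are hypothetical and only illustrate the concept.

```python
from dataclasses import dataclass, field


@dataclass
class AccessPolicy:
    """Hypothetical per-user policy: visible columns and allowed row values."""
    allowed_columns: set[str]
    row_filters: dict[str, set[str]] = field(default_factory=dict)


def apply_policy(rows: list[dict], policy: AccessPolicy) -> list[dict]:
    visible = []
    for row in rows:
        # Row level security: skip rows whose filtered column holds a value
        # outside the allowed set.
        if any(row.get(col) not in allowed for col, allowed in policy.row_filters.items()):
            continue
        # Column level security: keep only the columns the user may see.
        visible.append({k: v for k, v in row.items() if k in policy.allowed_columns})
    return visible


grades = [
    {"user": "ana", "email": "ana@example.org", "campus": "north", "grade": 0.9},
    {"user": "ben", "email": "ben@example.org", "campus": "south", "grade": 0.7},
]
north_manager = AccessPolicy(
    allowed_columns={"user", "campus", "grade"},
    row_filters={"campus": {"north"}},
)
result = apply_policy(grades, north_manager)
print(result)
```

With this policy, the same dataset can be shared with every campus manager while each one sees only their own campus and never the email column.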

Data pipeline

The data pipeline is implemented entirely in SQL, making it easy to create complex data representations from simple data sources.

The first stage is the raw data layer. The data here reflects what was in the transactional data sources at the time of extraction, without any modification. This is the most pristine and trustworthy data source of the LMS analytics system.

The second stage is data conditioning. This is where the data is properly formatted so that the next layers can operate on it using all the functions available. This includes converting numbers to integer or float types, parsing dates, interpreting JSON strings, and handling null values.
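The kind of conditioning described here can be sketched as follows. Panorama implements this stage in SQL; the Python below only illustrates the transformations on a single raw record, and the field names and the list of null markers are assumptions chosen for the example.

```python
import json
from datetime import datetime
from typing import Any, Optional

# Strings that some exports use to represent a missing value (assumed list).
NULL_MARKERS = {"", "NULL", "null", "None", "\\N"}


def to_null(value: Optional[str]) -> Optional[str]:
    """Return None for missing values, otherwise the trimmed string."""
    if value is None or value.strip() in NULL_MARKERS:
        return None
    return value.strip()


def condition_row(raw: dict[str, Optional[str]]) -> dict[str, Any]:
    """Turn a raw, all-text record into properly typed values."""
    user_id = to_null(raw.get("user_id"))
    grade = to_null(raw.get("grade"))
    created = to_null(raw.get("created"))
    profile = to_null(raw.get("profile"))
    return {
        "user_id": int(user_id) if user_id is not None else None,
        "grade": float(grade) if grade is not None else None,
        "created": datetime.fromisoformat(created) if created is not None else None,
        "profile": json.loads(profile) if profile is not None else None,
    }


raw_row = {
    "user_id": "42",
    "grade": "NULL",
    "created": "2023-10-01 08:30:00",
    "profile": '{"lang": "es"}',
}
clean = condition_row(raw_row)
print(clean)
```

After conditioning, later layers can compare dates, average grades and read JSON attributes without repeating the parsing logic in every query.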

Entity resolution is an important part of the process of creating a cross-platform learning tools analytics system. It handles the different ways in which the same real-world object is represented by different systems. The two main use cases are:

User deduplication: treat different system user entries for the same person as a single entity.
Homogeneous metadata: common objects such as users and courses may have differences in their metadata. This can be due to:
- Different field names
- Different field formats
- Missing fields
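Both use cases can be sketched in a few lines of Python. The example assumes that a normalised email address identifies a person across systems, which is a simplification; the source names, the field mappings and the function names are hypothetical.

```python
# Hypothetical mapping from each system's field names to common names.
FIELD_MAPS = {
    "lms_a": {"username": "username", "email": "email", "full_name": "name"},
    "lms_b": {"login": "username", "mail": "email", "displayname": "name"},
}
COMMON_FIELDS = ("username", "email", "name")


def homogenise(source: str, record: dict) -> dict:
    """Rename fields to the common schema; missing fields become None."""
    unified = {common: None for common in COMMON_FIELDS}
    for source_field, common in FIELD_MAPS[source].items():
        value = record.get(source_field)
        if value is not None:
            unified[common] = value.strip()
    if unified["email"]:
        unified["email"] = unified["email"].lower()  # normalise the format
    unified["sources"] = [source]
    return unified


def deduplicate(records: list[dict]) -> list[dict]:
    """Merge records that share the same email (assumed matching key)."""
    merged: dict[str, dict] = {}
    unmatched: list[dict] = []
    for rec in records:
        key = rec["email"]
        if key is None:
            unmatched.append(rec)  # nothing to match on, keep as is
        elif key not in merged:
            merged[key] = dict(rec)
        else:
            existing = merged[key]
            for name in COMMON_FIELDS:
                if existing[name] is None:  # fill gaps from the other system
                    existing[name] = rec[name]
            existing["sources"] = existing["sources"] + rec["sources"]
    return list(merged.values()) + unmatched


users = [
    homogenise("lms_a", {"username": "ana", "email": "Ana@Example.org "}),
    homogenise("lms_b", {"login": "aperez", "mail": "ana@example.org", "displayname": "Ana Perez"}),
    homogenise("lms_b", {"login": "ben", "mail": "ben@example.org", "displayname": "Ben Ortiz"}),
]
people = deduplicate(users)
print(len(people), people[0])
```

In practice the matching key is a design decision: an institutional ID is usually more reliable than an email address when one is available in every system.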

The final stage is entity abstraction. This means applying joins, unions, calculations, filters, aggregations, etc. to create new entities or enrich existing ones. These can be based on any previous layer in the pipeline, including other abstract entities. Queries can be stacked to create increasingly complex abstractions.
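The sketch below shows how stacked queries can build one abstract entity on top of another. It uses Python's built-in sqlite3 module as a stand-in for the data lake query engine; the table names, the view names and the 0.5 pass threshold are assumptions made for the example.

```python
import sqlite3

# An in-memory SQLite database stands in for the data lake query engine.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER, name TEXT);
CREATE TABLE enrolments (user_id INTEGER, course_id TEXT, grade REAL);
INSERT INTO users VALUES (1, 'Ana'), (2, 'Ben');
INSERT INTO enrolments VALUES (1, 'C1', 0.9), (1, 'C2', 0.7), (2, 'C1', 0.4);

-- First abstraction: join two conditioned tables into an enriched entity.
CREATE VIEW user_enrolments AS
SELECT u.user_id, u.name, e.course_id, e.grade
FROM users u
JOIN enrolments e ON e.user_id = u.user_id;

-- Second abstraction, stacked on the first: aggregate per course.
CREATE VIEW course_summary AS
SELECT course_id,
       COUNT(*) AS learners,
       ROUND(AVG(grade), 2) AS avg_grade,
       SUM(grade >= 0.5) AS passed  -- assumed pass threshold
FROM user_enrolments
GROUP BY course_id;
""")

summary = conn.execute("SELECT * FROM course_summary ORDER BY course_id").fetchall()
for row in summary:
    print(row)
```

Because each view is just a named query, a dashboard can read `course_summary` without knowing how the underlying join was built.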

In addition, it is possible to perform transformations using programmatic tools (Spark, Scala, Pandas, NumPy, R, etc.) or AI algorithms to create new data sources. Information can be taken from any stage of the pipeline, or even the original text files, and the results can be injected into the data lake for the pipeline to ingest and make available for analysis. This is where the architecture of our LMS analytics engine begins to show its value.

Learning tools analytics – Data utilisation

[Figure: data utilisation in Open edX analytics]

Finally, the data can be used for visualisation (near real-time dashboards), paged reporting, or to feed other data-driven applications.

The applications that consume the data at the end of the pipeline should provide the following key features:

In-memory database: This is optional, but highly recommended for faster response times on any learning tools analytics system. Queries can take time to return results from the data lake, especially those involving complex operations, nested queries, and large data sets. The in-memory database provides an intermediate mechanism that dramatically improves system performance.
Access control: Authentication is mandatory to control who has access to what data.
Custom reports and dashboards: Most analytics systems come with fixed dashboards and reports that cannot be modified. Data analysts should be able to create their own dashboards and reports based on their needs.
Control data access with row and column level security: Row and column level security allows administrators to control who can see what type of data. These features allow a single dataset to be shared among users while controlling data access at the user level.
Data source for other applications: Other applications should be able to connect to the data pipeline and use the data for specific purposes. This includes AI-powered applications, marketing tools, resource planning, etc.
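The benefit of the in-memory layer mentioned in the first item can be illustrated with a very small sketch. Here a time-limited cache stands in for the in-memory database; it is far simpler than what a real dashboard tool provides. The `QueryCache` class and the `fake_data_lake` function are hypothetical.

```python
import time
from typing import Any, Callable


class QueryCache:
    """Keep recent query results in memory for a limited time (TTL, seconds)."""

    def __init__(self, run_query: Callable[[str], Any], ttl: float = 300.0) -> None:
        self._run_query = run_query
        self._ttl = ttl
        self._store: dict[str, tuple[float, Any]] = {}

    def get(self, sql: str) -> Any:
        now = time.monotonic()
        cached = self._store.get(sql)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]  # fast path: served from memory
        result = self._run_query(sql)  # slow path: goes to the data lake
        self._store[sql] = (now, result)
        return result


calls: list[str] = []


def fake_data_lake(sql: str) -> list:
    """Hypothetical stand-in for a slow data lake query."""
    calls.append(sql)
    return [("C1", 2)]


cache = QueryCache(fake_data_lake, ttl=60)
first = cache.get("SELECT course_id, learners FROM course_summary")
second = cache.get("SELECT course_id, learners FROM course_summary")
print(len(calls))  # the data lake was queried only once
```

The trade-off is freshness: results can be up to one TTL old, which is usually acceptable for near real-time dashboards.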

How to get Panorama LMS analytics into your institution

There are three main ways to get Panorama learning tools analytics and LMS analytics working in your institution: from source code, as a service, or as a turnkey installation. In addition, we can provide data engineering support to help you get the most out of your data.

From the source code

The agent engine source code is available in our GitHub repository, and there is a handy Tutor plugin for those using Open edX. The agents take care of setting up the first two layers of the data pipeline, while abstract entities are up to each implementation. However, you will need to set up the data lake yourself. We currently support the AWS data lake.

As a service

We can offer Panorama as a service. This option is the fastest and easiest way to start leveraging your data, with no upfront investment and on a pay-as-you-go basis. Contact us and we will walk you through the necessary steps. This option is included for all our partners subscribed to our Premium plan.

Turnkey installation

If your organisation is concerned about data ownership, has a technical team to run the cloud infrastructure, and wants full control of its data, we can perform a clean installation of Panorama LMS analytics on your AWS account.

Conclusion

Panorama is the ultimate LMS analytics solution. It is system independent and can connect to multiple LMSs of the same or different brands and other supporting tools. Its modular architecture allows you to connect to third-party applications to get the most out of your data. It has a powerful data pipeline that allows data engineers to perform complex analysis across platforms.