How data synchronisation solution architecture can deliver information to your audience faster, more securely, and more robustly
In this article we cover how Info Rhino adapted our software and third-party software to automate content and data publication to our website for automated delivery to our audiences. Whilst we use our proprietary software to achieve many of these tasks, the important point to understand is how we think in terms of responsibilities to meet the needs of each individual client. So whilst we may offer this specific solution for your needs, we can equally come up with the right solution for your organisation.
If you want to find out more about this service, feel free to contact us here.
Content management systems allow website designers and content managers to maintain information on small to medium-sized websites. A bigger challenge with website content management is the need to move data between different media. For example, most content starts out in Word documents, OneNote, or other text editors before being moved into CMS editor screens and configuration files to eventually appear on the website.
Content is not just text, images, data, and key files, but also document formats tailored to different audiences.
The Internet is becoming more automated. Not only bots but AI language-processing models are now running against online content. Normally this data is structured, but a key quality of data is timeliness.
Thinking about the Data Synchronisation process
We took a good look at the many technologies that exist for storing data. There are many different cloud storage providers, blockchain-based data storage, and APIs within cloud providers that can capture data to data lakes or container storage, to name but a few. The challenge with each option is the possibility that future versions of these technologies either won't exist or will change, meaning there will be a need to upgrade code.
We took a look at our requirements for publishing data to our website platform for cryptocurrency data and analytics; the answer was relatively straightforward and, even more to the point, cost-effective.
High on the list of things to avoid is writing proprietary code that directly interfaces with third-party cloud storage providers unless we need to.
Data risk assessment and Data Governance
There is a very real risk that, as legislation tightens, automation coupled with AI will lead to many more false positives when it comes to posting information on centralised data stores. Honest actors can find themselves facing significant challenges when working with cloud storage providers. There are many excellent blockchain-based providers, but to keep things a lot simpler, why not have synchronisation between your content and your website?
How we have solved our needs for data synchronisation and how you can benefit by working with our technologies
We cannot eliminate manual processes altogether, and neither should we. Developing front ends for content management systems is expensive and often over the top. Most of the time we are just taking information from one format and putting it into a system of another.
Our systems have been improving. We look at structured data and, where possible, bring it into reporting solutions whereby users can access that information through dashboards and other front-end solutions. We think more in terms of whether the data can be automatically brought into a website.
Ad-hoc and Scheduled
Periodic and scheduled onboarding of data. We added the ability to run against a time schedule to discover data, and to be notified of changes.
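A scheduled discovery job of this kind can be sketched as a simple polling routine. This is a minimal illustration in Python, not our actual implementation; the folder layout and notification callback are hypothetical.

```python
from pathlib import Path

def discover_new_files(folder, seen, on_new):
    """Scan a folder and report files not seen before via the on_new callback."""
    current = {p.name for p in Path(folder).iterdir() if p.is_file()}
    new_files = sorted(current - seen)
    for name in new_files:
        on_new(name)          # e.g. raise a notification or queue an onboarding job
    seen.update(new_files)    # remember what we have already discovered
    return new_files
```

A scheduler (cron, Windows Task Scheduler, or a simple sleep loop) would call this periodically, so discovery and notification stay decoupled from whatever processes the data afterwards.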
Once data is known about, it should be made available in the right format to the right audience. Our Web Data Platform has a report manager and other data-aware solutions within it that know how and where to present this information.
Requirements gathering and solution architecture process
We tend to focus on what is known as promise theory; in a way, we see "Jobs To Be Done" as complementary to promise theory. Rather than focusing on building more technology into the Web Data Platform, we thought about what the primary need is.
- Situation - we have data on our systems
- Motivation - our audience can benefit from our information
- Expected outcome - more users will visit our website and consume our services
Defining possible requirements
Once we know these three basic elements, we realise the technology is secondary to the requirement. Rather than using all the modern technologies available, the requirement is quite straightforward, although still not necessarily the simplest to achieve.
- Data is collected from different systems continuously and periodically
- Data is further processed and stored after collection
- Our audiences will want to see and consume this information in a variety of formats
- We don't want to put too much extra technology into the Web Data Platform
- Rather than building more into each individual process, we may want to add new responsibilities that are independent.
Thinking of responsibilities
- Data collection
- Data Processing
- Data publication
- Data onboarding
- Data presentation
We now start to recognise that we may have categories of information that can be grouped into domains. This can help us to simplify thinking about common types of responsibilities. We may be in a position to create templates of responsibilities to automate and parameterise the creation of many of these artefacts and responsibilities.
The importance of responsibilities
Responsibilities are discrete actions that are almost entirely independent of other responsibilities. They are stateless in that they accept or detect an input, perform a process, and produce an output. In the example of data presentation, it has no need to understand how data onboarding occurred, and it definitely does not need to understand how data publishing occurred. We think of this as allowing for interchangeability of responsibilities, effectively Jobs To Be Done. A great analogy: we don't need to know how the grass was cut, we just appreciate that the lawn is tidy.
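The stateless input-process-output idea can be sketched as plain functions composed into a pipeline. This is an illustrative Python sketch only; the `onboard` and `present` steps are hypothetical stand-ins for real responsibilities.

```python
def pipeline(data, responsibilities):
    """Run independent responsibilities in sequence; each sees only its input."""
    for step in responsibilities:
        data = step(data)     # each step: accept an input, process, produce an output
    return data

# Each responsibility knows nothing about how the previous one did its job.
def onboard(rows):
    return [r.strip() for r in rows]             # e.g. clean raw records

def present(rows):
    return {"items": rows, "count": len(rows)}   # e.g. shape data for a report
```

Because each step only agrees on its input and output, any responsibility can be swapped for another implementation (in-house code, a third-party tool, a batch file) without the rest of the chain noticing.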
We now know the processes we need to meet our data synchronisation needs.
- Scheduled processing
- Completion notification
- Data Consumption
- Data Processing
- Data Delivery
- Data Presentation
Process Tasks Breakdown
We won't list all of these, but will instead give a couple of examples:
- Detection - looking for new information in a folder.
- Data Delivery - synchronising information between point A and point B.
Risks appraisal, Cost Benefit Analysis, Business Continuity
You can see how we have not looked at costs until we have an understanding of the main responsibilities, processes, and tasks we need to achieve. This is for a very important reason: we don't want to solutionise the implementation before understanding our risk appetite. For a specific implementation, we know there are things we absolutely need and things we can live without. For our specific implementation, we know our web host gives us an abundance of cheap storage space, so we don't need cloud-based infrastructure at the moment and can run most of these processes from a desktop in the short term. Once we need to upscale, we could look at moving these desktop processes to a virtual machine in the cloud, or to a cloud-based architecture. We always think in terms of the customer service level we wish to offer our audience, and what our competitors do too.
We look at the medium within which the process operates. For example, we understand that Data Delivery moves information from a file system on a PC to a web server. It should be fairly straightforward to see that the FTP protocol is probably the best way to achieve this. If we can find software to synchronise this information, it may meet our needs, lowering the development time needed to write bespoke code to talk to an API.
The technologies behind our solution
We will list our applications with a brief description of each one. The important thing is to use our strengths where necessary - for example, C#, dotnet, business intelligence, automation, and parallel and asynchronous execution.
We set up definitions of jobs and processes that can be run to perform one or more tasks. We keep this lightweight, and the technology can detect processes, making it easier to maintain this information.
Executor Processor - Batch Process Publication
Job store information is translated into batches of processes for execution. These processes are typically applications or batch files that perform a specific responsibility.
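The translation from job store entries to executable batches can be sketched as a simple grouping step. The job shape below (a `batch` name and a `command`) is hypothetical, chosen only to illustrate the idea.

```python
def build_batches(jobs):
    """Group job definitions into named batches of commands to execute."""
    batches = {}
    for job in jobs:
        # Jobs sharing a batch name run together; order within a batch is preserved.
        batches.setdefault(job["batch"], []).append(job["command"])
    return batches
```

An executor would then walk each batch and launch its commands, for example via the operating system's process API, in sequence or in parallel.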
Processor application - Execution
This is a lightweight application which runs applications within it. The processor can either run to completion on a schedule, or run based upon detecting a file, which can itself occur zero or more times.
WinSCP FTP application
Script automation capabilities exist within WinSCP. Whilst we have written FTP solutions in dotnet, we want to avoid reinventing the wheel. This solution is perfectly capable of synchronising information between a client and an FTP server, and can back up information.
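A WinSCP synchronisation script is just a short text file run by the WinSCP console. A minimal sketch might look like the following; the host, credentials, and folder paths are placeholders, not our configuration.

```text
open ftp://user:password@ftp.example.com/
synchronize remote "C:\data\publish" "/public_html/data"
exit
```

A script like this can be executed non-interactively with `winscp.com /script=sync.txt`, which makes it easy to trigger from a scheduled batch process.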
DOS Command Prompt capabilities
Whilst we have many input/output code processes within the technology, we always seek to avoid reinventing the wheel where possible. In some circumstances, a simple batch file using XCopy or RoboCopy can be preferable to writing an extra class or library feature in .NET or Java.
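As an illustration, a one-line RoboCopy batch file can stand in for a whole file-delivery class. The folder paths below are hypothetical examples.

```text
@echo off
rem Mirror the working folder into the publish staging folder.
rem /MIR mirrors the tree; /R:2 /W:5 limit retries and waits on locked files.
robocopy "C:\content\out" "C:\publish\staging" /MIR /R:2 /W:5
```

The batch file can then be invoked by the processor application like any other responsibility, keeping the copy logic out of compiled code.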
Reporting of artefacts
To reduce complexity, we have processes within our DevOps software to detect files below a folder and bring them into centralised locations, where we can see what processes and artefacts we have - for example, the locations of log files or batch files. We could take this information and automate further, for example housekeeping.
DevOps for deployment automation
Our Full Deployer application has a host of features for helping to deploy and publish applications in addition to generating and maintaining configuration. This is used in many of our Domain processes.
Web data platform detection
We added jobs to the Web Data Platform that can detect new content and bring it into our application data store.
Web data platform data presentation
We have multiple interfaces for website audiences to consume our content and data:
- Searchable APIs
We hope you enjoyed this article; feel free to reach out if anything in this article is of interest or potentially difficult to conceptualise. We hope you see how focusing on the job to be done is a much better way to break down requirements for your organisation or customer.
Written with StackEdit.