How data synchronisation solution architecture can deliver information to your audience faster, more securely, and more robustly
In this article we cover how Info Rhino adapted our software and third-party software to automate content and data publication to our website for automated delivery to our audiences. Whilst we use our proprietary software to achieve many of these tasks, the important point to understand is how we think in terms of responsibilities to meet the needs of each individual client. So whilst we may offer this specific solution for your needs, we can equally come up with the right solution for your organisation.
If you want to find out more about this service, feel free to contact us here.
Content management systems allow website designers and content managers to maintain information on small to medium-sized websites. A bigger challenge with website content management is the need to move data between different media. For example, most content starts out in Word documents, OneNote, or other text editors before being moved into CMS editor screens and configuration files to eventually appear on the website.
Content is not just text, images, data, and key files, but also document formats tailored to different audiences.
The Internet is becoming more automated. Not only bots but AI language-processing models are now running against online content. Normally this data is structured, but a key quality of data is timeliness.
Thinking about the Data Synchronisation process
We took a good look at the many technologies that exist for storing data. There are many different cloud storage providers, blockchain-based data storage, and APIs within cloud providers that can capture data to data lakes or container storage, to name but a few. The challenge with each option is the possibility that future versions of these technologies either won't exist or will change, meaning there will be a need to upgrade code.
We took a look at our requirements for publishing data to our website platform for cryptocurrency data and analytics; the answer was relatively straightforward and, even more to the point, cost-effective.
High on the list of things to avoid is writing proprietary code that directly interfaces with third-party cloud storage providers unless we need to.
Data risk assessment and Data Governance
There is a very real risk that, as legislation tightens, automation coupled with AI will lead to many more false positives when it comes to posting information on centralised data stores. Honest actors can find themselves facing significant challenges when working with cloud storage providers. There are many excellent blockchain-based providers, but to keep things a lot simpler, why not have synchronisation between your content and your website?
How we have solved our needs for data synchronisation and how you can benefit by working with our technologies
We cannot eliminate manual processes altogether, and neither should we. Developing front ends for content management systems is expensive and often over the top. Most of the time we are just taking information from one format and putting it into a system of another.
Our systems have been improving. We look at structured data and, where possible, bring it into reporting solutions whereby users can access that information through dashboards and other front-end solutions. We think more in terms of whether the data can be automatically brought into a website.
Ad-hoc and Scheduled
Periodic and scheduled onboarding of data. We added the ability to run against a time schedule to discover data, and to be notified of changes.
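A scheduled discovery job of this kind can be sketched as a simple polling routine. This is a minimal illustration in Python, not our actual implementation; the folder layout and notification callback are hypothetical.

```python
from pathlib import Path

def discover_new_files(folder, seen, on_new):
    """Scan a folder and report files not seen before via the on_new callback."""
    current = {p.name for p in Path(folder).iterdir() if p.is_file()}
    new_files = sorted(current - seen)
    for name in new_files:
        on_new(name)          # e.g. raise a notification or queue an onboarding job
    seen.update(new_files)    # remember what we have already discovered
    return new_files
```

A scheduler (cron, Windows Task Scheduler, or a simple sleep loop) would call this periodically, so discovery and notification stay decoupled from whatever processes the data afterwards.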
Once data is known about, it should be made available in the right format to the right audience. Our Web Data Platform has a report manager and other data-aware solutions within it that know how and where to present this information.
Requirements gathering and solution architecture process
We tend to focus on what is known as promise theory; in a way, we see "Jobs To Be Done" as complementary to promise theory. Rather than focusing on building more technology into the Web Data Platform, we thought about what the primary need is.
- Situation - we have data on our systems
- Motivation - our audience can benefit from our information
- Expected outcome - more users will visit our website and consume our services
Defining possible requirements
Once we know these three basic elements, we realise the technology is secondary to the requirement. Rather than using all the modern technologies available, the requirement is quite straightforward, although still not necessarily the simplest to achieve.
- Data is collected from different systems continuously and periodically
- Data is further processed and stored after collection
- Our audiences will want to see and consume this information in a variety of formats
- We don't want to put too much extra technology into the Web Data Platform
- Rather than building more into each individual process, we may want to add new responsibilities that are independent.
Thinking of responsibilities
- Data collection
- Data Processing
- Data publication
- Data onboarding
- Data presentation
We now start to recognise that we may have categories of information that can be grouped into domains. This can help us to simplify thinking about common types of responsibilities. We may be in a position to create templates of responsibilities to automate and parameterise the creation of many of these artefacts and responsibilities.
The importance of responsibilities
Responsibilities are discrete actions that are almost entirely independent of other responsibilities. They are stateless in that they accept or detect an input, perform a process, and produce an output. In the example of data presentation, it has no need to understand how data onboarding occurred, and it definitely does not need to understand how data publishing occurred. We think of this as allowing for interchangeability of responsibilities, effectively Jobs To Be Done. A great analogy: we don't need to know how the grass was cut, we just appreciate that the lawn is tidy.
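The stateless input-process-output idea can be sketched as plain functions composed into a pipeline. This is an illustrative Python sketch only; the `onboard` and `present` steps are hypothetical stand-ins for real responsibilities.

```python
def pipeline(data, responsibilities):
    """Run independent responsibilities in sequence; each sees only its input."""
    for step in responsibilities:
        data = step(data)     # each step: accept an input, process, produce an output
    return data

# Each responsibility knows nothing about how the previous one did its job.
def onboard(rows):
    return [r.strip() for r in rows]             # e.g. clean raw records

def present(rows):
    return {"items": rows, "count": len(rows)}   # e.g. shape data for a report
```

Because each step only agrees on its input and output, any responsibility can be swapped for another implementation (in-house code, a third-party tool, a batch file) without the rest of the chain noticing.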
We now know the processes we need to meet our data synchronisation needs.
- Scheduled processing
- Completion notification
- Data Consumption
- Data Processing
- Data Delivery
- Data Presentation
Process Tasks Breakdown
We won't list all of these, but will instead give a couple of examples:
- Detection - looking for new information in a folder.
- Data Delivery - synchronising information between point A and point B.
Risks appraisal, Cost Benefit Analysis, Business Continuity
You can see how we have not looked at costs until we have an understanding of the main responsibilities, processes, and tasks we need to achieve. This is for a very important reason: we don't want to solutionise the implementation before understanding our risk appetite. For a specific implementation, we know there are things we absolutely need and things we can live without. For our specific implementation, we know our web host gives us an abundance of cheap storage space, so we don't need cloud-based infrastructure at the moment and can run most of these processes from a desktop in the short term. Once we need to upscale, we could look at moving these desktop processes to a virtual machine in the cloud, or to a cloud-based architecture. We always think in terms of the customer service level we wish to offer our audience, and what our competitors do too.
We look at the medium within which the process operates. For example, we understand that Data Delivery moves information from a file system on a PC to a web server. It should be fairly straightforward to see that the FTP protocol is probably the best way to achieve this. If we can find software to synchronise this information, it may meet our needs, lowering the development time needed to write bespoke code to talk to an API.
The technologies behind our solution
We will list our applications with a brief description of each one. The important thing is to use our strengths where necessary - for example, C#, dotnet, business intelligence, automation, and parallel and asynchronous execution.
We set up definitions of jobs and processes that can be run to perform one or more tasks. We keep this lightweight, and the technology can detect processes, making it easier to maintain this information.
Executor Processor - Batch Process Publication
Job store information is translated into batches of processes for execution. These processes are typically applications or batch files that perform a specific responsibility.
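The translation from job store entries to executable batches can be sketched as a simple grouping step. The job shape below (a `batch` name and a `command`) is hypothetical, chosen only to illustrate the idea.

```python
def build_batches(jobs):
    """Group job definitions into named batches of commands to execute."""
    batches = {}
    for job in jobs:
        # Jobs sharing a batch name run together; order within a batch is preserved.
        batches.setdefault(job["batch"], []).append(job["command"])
    return batches
```

An executor would then walk each batch and launch its commands, for example via the operating system's process API, in sequence or in parallel.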
Processor application - Execution
This is a lightweight application which runs applications within it. The processor can either run to completion on a schedule, or run based upon detecting a file, which can itself occur zero or more times.
WinSCP FTP application
Script automation capabilities exist within WinSCP. Whilst we have written FTP solutions in dotnet, we want to avoid reinventing the wheel. This solution is perfectly capable of synchronising information between a client and an FTP server, and can back up information.
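A WinSCP synchronisation script is just a short text file run by the WinSCP console. A minimal sketch might look like the following; the host, credentials, and folder paths are placeholders, not our configuration.

```text
open ftp://user:password@ftp.example.com/
synchronize remote "C:\data\publish" "/public_html/data"
exit
```

A script like this can be executed non-interactively with `winscp.com /script=sync.txt`, which makes it easy to trigger from a scheduled batch process.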
DOS Command Prompt capabilities
Whilst we have many input/output code processes within the technology, we always seek to avoid reinventing the wheel where possible. In some circumstances, a simple batch file using XCopy or RoboCopy can be preferable to writing an extra class or library feature in .NET or Java.
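As an illustration, a one-line RoboCopy batch file can stand in for a whole file-delivery class. The folder paths below are hypothetical examples.

```text
@echo off
rem Mirror the working folder into the publish staging folder.
rem /MIR mirrors the tree; /R:2 /W:5 limit retries and waits on locked files.
robocopy "C:\content\out" "C:\publish\staging" /MIR /R:2 /W:5
```

The batch file can then be invoked by the processor application like any other responsibility, keeping the copy logic out of compiled code.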
Reporting of artefacts
To reduce complexity, we have processes within our DevOps software to detect files below a folder and bring them into centralised locations, where we can see what processes and artefacts we have - for example, the locations of log files or batch files. We could take this information and automate further, for example housekeeping.
DevOps for deployment automation
Our Full Deployer application has a host of features for helping to deploy and publish applications in addition to generating and maintaining configuration. This is used in many of our Domain processes.
Web data platform detection
We added jobs to the Web Data Platform that can detect new content and bring it into our application data store.
Web data platform data presentation
We have multiple interfaces for website audiences to consume our content and data:
- Searchable APIs
We hope you enjoyed this article; feel free to reach out if anything in this article is of interest or potentially difficult to conceptualise. We hope you see how focusing on the job to be done is a much better way to break down requirements for your organisation or customer.
Written with StackEdit.