Keeping track of technology stacks has always been a challenge for companies, and even more so for those that rely heavily on IT infrastructure.
Beaconsoft is no different. Here we have constantly evolving developer systems for testing purposes. These systems are then implemented either as part of our products or as new technologies that aid us in reducing costs or providing better availability to our end users.
As an AWS Partner, our solutions are all built around AWS technologies, and while costs are always a concern for a new startup, keeping our systems running, online, and working correctly has become a constant workload.
However, AWS’s technologies have allowed us much more flexibility than our own hardware or other suppliers had offered in the past.
Technology Stacks Used by Beacon
While this is not a comprehensive and detailed report of our technology stacks, it paints the overall picture. The Beacon product is broken down into four categories: Short URL Handler, Data Collection, Data Processing, and Data Presentation. Each of these categories contains the services that power it.
Short URL Handler
Generally referred to as Beacon short links, this is the system which deals with user flow for short link handling and is the primary point of consumer action for our clients. You may have seen a bcn.to short link or even used one to read this article.
Data Collection
Referred to as our Tracker system, this receives information from remote systems and saves any data sent from a site that has one of our trackers installed.
Data Processing
This service is rarely mentioned, as it works away in the background, but it does all the heavy lifting to keep the user-facing side working as fast as possible.
Data Presentation
Referred to as the Beacon App, this presents information to users in an easy-to-use UI.
Monitoring and Maintaining Beacon’s Technology Stacks
Over the years, we’ve constantly monitored spending to keep our system as available as possible. Early on, we decided that some services could safely run together on a single set of servers.
As with any growing company, our IT resources have also been growing, and we recently discovered that adding a few new clients was causing a slowdown on one of our critical server groups. We almost fell into the trap of simply increasing the server group’s resources and leaving it at that.
But that can be a very costly way to update technology stacks, and it often makes IT systems unmanageable because of a fear of ever turning servers off (in case a legacy system is still using them).
So with this in mind, I keep a close eye on all our systems: what they are powering and what is happening with the resources of each server group.
When this problem started, we did, as a first port of call, add resources to our technology stacks to keep the system stable and operating.
However, this was only a temporary measure; years in the industry have taught me that a good IT Manager needs to keep track of, and audit, which servers are running which systems, and why.
The decision made at the start of the company to put these systems in a shared resource pool was correct at the time, but we needed to re-justify whether that was still the case.
Was it still okay for these systems to live together, or was it costing us more? And did we need to look at newer technologies to see whether they could help us reduce costs or make the system more stable?
When Beaconsoft first started, monitoring all resource usage for our systems in order to guide decision making was important. While a server group for each service would be nice, it’s too expensive and difficult to maintain while keeping an easy-to-understand audit log of servers and their jobs.
One decision we made was to use a single server group to power three services:
- The Tracker Serving Service (Data Presentation) – this needs to be powered by a server as it has to keep track of tokens and use them inside the generation of tracking code.
- The Short URL Handler – this is the short link redirection service.
- Tracking Receiver Service (Data Collection) – this receives the information given from the remote systems.
A server group is made up of one or more serving servers and one or more database servers.
We recently started to see two of these services getting a lot more traffic than had been expected with some new clients.
You’re probably all now screaming to yourselves that it was the tracker, and you’d be right: the primary bottleneck was the Tracking Receiver. But it wasn’t the cause of the failures end users saw; those were caused by long response times from the Short URL redirection.
What To Do?
We needed to decide what to do. We could keep adding more resources to the server pool, or we could split apart the services it was providing.
As I said above, we originally added resources as a temporary fix. I then began evaluating what had gone wrong, as reported by our users and by our own monitoring tools.
After discovering the bottleneck, I started to evaluate how hard it would be to separate these services from the server group. Would setting up a new server group for one or more of these services just increase our costs?
Having attended a couple of Tech Conferences, including AWS Summit London and the recent AWS Partner MeetUp, I’d been investigating the popularity and stability of AWS Lambda and looked at porting these services into serverless groups.
The Situation Today
As of writing this, we have successfully and seamlessly moved one of our customer-facing services to AWS Lambda without a second of downtime.
We did it by migrating the database for this server group from a MariaDB RDS instance to an Aurora Serverless MySQL-compatible cluster, which affected all three services in this system. The well-architected code made the changeover simple, and the data we received was merged into the new database with a quick script. Our short URL redirection system now runs on AWS Lambda behind an AWS API Gateway.
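To give a feel for this pattern, here is a minimal sketch of a short-link redirect handler for Lambda behind API Gateway. This is not our actual implementation: the `SHORT_LINKS` dictionary stands in for a real lookup against the database, and the event shape assumes API Gateway's proxy integration, which passes the request path to the function.

```python
# Hypothetical sketch of a short-link redirect handler on AWS Lambda.
# SHORT_LINKS stands in for a database query (e.g. against the
# Aurora Serverless cluster); the entries here are illustrative.
SHORT_LINKS = {
    "abc123": "https://example.com/landing-page",
}

def handler(event, context):
    # With API Gateway's proxy integration, the request path arrives in
    # the event; the short code is the path with the leading "/" removed
    # (e.g. "/abc123" -> "abc123").
    code = event.get("path", "").lstrip("/")
    target = SHORT_LINKS.get(code)
    if target is None:
        return {"statusCode": 404, "body": "Short link not found"}
    # A 302 response with a Location header becomes an HTTP redirect.
    return {
        "statusCode": 302,
        "headers": {"Location": target},
        "body": "",
    }
```

Because the function only does a lookup and returns a redirect, it is a natural fit for pay-per-invocation pricing: no request, no cost.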
During testing, we hit a stumbling block: an API Gateway URL is not short, nor would it work for all our existing short links. After conversations with AWS Support, one of the measures we had already implemented for security and scalability provided a clean and simple method of integration.
As part of our security model, none of our servers are publicly available (internal IP addresses only). They all serve traffic via a publicly accessible Load Balancer, even if a group contains only one server. This security model allowed us to set up a host-based routing rule that forwards short link traffic to the API Gateway and Lambda solution.
This means all short link usage is now billed on actual use, not as a static ongoing cost for servers that must be available at specific times. Those times can now vary with the location of our clients’ customers: some businesses see more activity outside office hours, others within them.
Some businesses also target customers in other time zones, which had previously prevented us from reliably predicting when to add servers to our pool or remove them via timed scaling.
What Should You Take Away From This Article?
Foremost, make sure you’re monitoring your IT Usage.
You must ensure your business’s IT Infrastructure Lead knows what every server in use is for, and exactly what it’s doing. Make sure each team’s infrastructure goes through a process that makes it simple to keep track of what it uses and where. The AWS console and API provide some powerful tools to enable this.
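One simple approach along these lines is tagging every instance with its purpose and flagging any that are missing the tag. The sketch below assumes a made-up "Purpose" tag convention and operates on the data structure returned by boto3's `ec2.describe_instances`, using a hard-coded sample in place of a live call.

```python
# Hypothetical audit helper: find instances missing a "Purpose" tag.
# In practice the reservations list would come from:
#   boto3.client("ec2").describe_instances()["Reservations"]
def untagged_instances(reservations, required_tag="Purpose"):
    """Return the IDs of instances that lack the given tag."""
    missing = []
    for reservation in reservations:
        for instance in reservation.get("Instances", []):
            tags = {t["Key"]: t["Value"] for t in instance.get("Tags", [])}
            if required_tag not in tags:
                missing.append(instance["InstanceId"])
    return missing

# Sample data in the describe_instances response shape (IDs invented).
sample = [
    {"Instances": [
        {"InstanceId": "i-0aaa", "Tags": [{"Key": "Purpose", "Value": "tracker-db"}]},
        {"InstanceId": "i-0bbb", "Tags": [{"Key": "Name", "Value": "legacy"}]},
    ]}
]
```

Run on a schedule, a report like this keeps the "what is this server for?" question answerable, which is exactly what made it safe for us to decide which services could be split out.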
Don’t be afraid to test other options.
Redevelop systems to push teams to break services into microservices rather than monolithic services. While it might cost more in the short term, it will often cost less in the medium term and provide big savings in the long term.
Also keep your IT Team fully involved in all ongoing or new developments. Often your IT Manager can envision how a service will run and the resources it will need to operate. They can also help identify potential bottlenecks, and whether you should take certain steps during initial development to make it easier to manage a service’s resource allocation as its usage grows.
I hope you’ve found this article interesting and if you’ve any comments – or you’ve implemented different alternative solutions – let us know!