What’s the secret to providing superior service and staying competitive in a changing market? Well, you might learn something from Alta Refrigeration’s experience. Over 10 years, it transformed itself from a custom engineering services company into a scalable industrial equipment manufacturer using an edge-oriented control architecture to efficiently manage a growing installed base.
Alta has been designing and installing refrigeration systems across the United States for more than 45 years. For many years, these systems were large, custom-designed systems that used a central machine room to deliver refrigerant to various facility areas through long, overhead piping runs. Due to their size, these systems required significant time to design and program, and competitors were able to take some of Alta’s market share with cheaper, simpler offerings.
“Competitors could use 20-30 cheaper units with control limited to a dumb thermostat to compete against one of Alta’s large systems,” says Peter Santoro, controls engineer at Alta.
Alta knew it couldn’t compete by reducing its product quality, so the company looked for a way to standardize its offering without sacrificing features.
We’re the Experts
In 2013, Alta introduced its Expert series of modular refrigeration control units. Each unit uses a standard, reliable design and can be mounted on the roof above the area it serves, simplifying installation.
“A single Expert has almost as much I/O as an entire centralized system and, because the units are much smaller, the wiring and conduit runs are incredibly short, allowing us to cram in a ton of sensors,” explains Santoro. “The units themselves are also incredibly efficient. We analyze external ambient conditions and refrigerated space and do real-time thermodynamic calculations. This lets us do variable capacity refrigeration, and only run exactly the amount of refrigeration as needed. All motors are on variable speed drives. We also design many of the sensors we use on the system, allowing us to get precise valve positioning and monitor refrigerant levels throughout the system. We make good use of Hall effect sensors in various configurations to monitor refrigerant levels and motor positions. There is also a dedicated energy monitor on each unit so we can monitor voltages and power usage.”
Since all Expert systems are essentially the same, Santoro and his colleague, Todd Hedenstrom, were able to focus on creating a robust and complete solution that works for many different applications.
A good problem to have
Market response to Expert has been very positive. Alta has sold nearly 600 units and is typically sold out into the next year. But growth brings its own challenges. With only a small controls engineering team, servicing the growing installed base became time-consuming.
Adding to this time crunch, some aspects of Alta’s previous designs created system maintenance issues. For example, the control system required numerous steps to properly update control strategies in the field, including exchanging files between the control engine and the web server used for remote connectivity. And because Alta had previously left the details of remote connectivity to each customer, the team had to check in on each site every day using a different method for each, such as VPN, Citrix, LogMeIn, or TeamViewer.
Alta’s centralized control system design was built around an industrial PC (IPC) running custom C++ code on top of a distributed I/O system from Opto 22. When designing Expert, this control system was simplified by replacing the IPC with an Opto 22 PAC (programmable automation controller).
Though this change was an improvement because it allowed for all the components of the system to be managed through the PAC, it still required a multi-step update process and didn’t provide as much data access as Alta wanted. This led Alta to explore use of Opto 22’s groov EPIC (edge programmable industrial controller) system.
EPIC supports all the power, I/O, communications, storage, and networking functions of an IPC and PLC on a single backplane without the complexity of maintaining a full Windows OS environment.
Web interface via the controller
Santoro and Hedenstrom started by using groov EPIC’s operating system shell to port their PAC application to C++.
The new program controls the installed I/O modules—voltage and current sensing inputs and discrete AC outputs—using Opto 22’s C++ OptoMMP SDK (software development kit). The application also includes its own Modbus server that creates and manages connections to variable frequency drives, the local energy monitoring unit, and other remote devices.
Each Expert web interface is served from an EPIC controller. The interface includes prebuilt templates for different unit configurations and verifies system settings to help technicians identify configuration values that are out of range or not recommended. It also generates alarms as needed. Alternatively, customers can access unit data through the Expert’s Modbus server or REST API.
For managing groups of Experts, Alta uses a separate HMI server to read data from each unit and present a unified view of the entire system. “All of our sites are required to have a local interface for operators to see a global view of their refrigeration units, instead of having to manage network connections to hundreds of individual units,” Santoro explains.
To create this site-level HMI, each Expert stores transient data in the shared memory scratchpad area of the groov EPIC. Alta’s HMI server runs on Windows and uses Opto 22’s .NET OptoMMP SDK to retrieve data from all units in one-second increments. Data is stored in cyclical files that maintain a one-week buffer, and the HMI server uses this data to generate trends, charts, and email notifications.
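To make the idea of a cyclical buffer concrete, here is a minimal Python sketch. It is not Alta’s implementation: it keeps samples in memory rather than in files, and the class name CyclicBuffer, its methods, and the one-slot-per-second layout are all assumptions made for illustration. Each one-second sample lands in a slot derived from its timestamp, so new data overwrites whatever was written exactly one week earlier.

```python
from typing import Optional

WEEK_SECONDS = 7 * 24 * 60 * 60  # one-week buffer at one-second resolution


class CyclicBuffer:
    """Hypothetical fixed-size store: each second maps to one reusable slot."""

    def __init__(self, capacity: int = WEEK_SECONDS) -> None:
        self.capacity = capacity
        self._slots: list[Optional[tuple[int, float]]] = [None] * capacity

    def write(self, timestamp: int, value: float) -> None:
        # A new sample overwrites the one recorded a full cycle earlier.
        self._slots[timestamp % self.capacity] = (timestamp, value)

    def read(self, timestamp: int) -> Optional[float]:
        entry = self._slots[timestamp % self.capacity]
        # Ignore empty slots and data belonging to a different cycle.
        if entry is None or entry[0] != timestamp:
            return None
        return entry[1]
```

Because the storage never grows, the buffer needs no cleanup job; anything older than the configured window simply disappears as newer samples arrive.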
Alta can also access this data remotely for troubleshooting recent events. By default, groov EPIC does not route traffic between its Ethernet ports, so Alta can use the controller to create a security zone for each Expert. One port on each EPIC connects to a private network exclusively for the controller and its remote devices. The other port is connected to a common network between all the units at a given site, as well as the local HMI server.
This server is connected to the internet and uses MQTT to send and receive data, acting as a middleman for each individual Expert to the MQTT broker that resides in Alta’s headquarters. When Alta’s remote HMI requires new data, it sends a request to the local server over MQTT. The data is then queried and sent back. External connections to local HMI servers are restricted so that the only traffic allowed through is from outbound MQTT TLS connections.
Recently, Alta also made it possible for customers to access this remote server. The server has its own database that records temperatures and energy usage for each Expert in 10-minute intervals.
Nationwide data aggregation
Using groov EPIC, Alta has now built a nationwide HMI that aggregates data from its network of Expert units and highlights any issues the team needs to act on. Instead of spending hours every day to check on each site, they can monitor their entire installed base in minutes. They know when there is a problem, can input and track necessary work orders, track technicians’ locations, and monitor energy usage per unit. When an alarm occurs, the system creates an interactive timeline of events before and after the alarm event.
“Often, we know what the problem is before the customer calls. We just need to drive there and fix it,” says Santoro. “With the amount of data we get from our units, we are capable of diagnosing the vast majority of problems remotely. This allows many of our end users to not even staff on-site maintenance. And there’s no interfacing with third-party systems anymore. It’s all integral.”
Servicing the systems themselves has also become much simpler now that Alta can manage the entire platform—I/O configuration, control strategy, communications, and networking—through a single device. “One of the best features we introduced was the ability to update the programs through our web interface. Now a batch program packages all the program files into a .gz (compressed) file. Technicians can upload the file and restart the system,” Santoro says.
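The article only says that the batch program produces a .gz file, so the Python sketch below assumes a gzip-compressed tar archive. The function name package_programs and the directory layout are hypothetical, and Alta’s real batch program may work quite differently.

```python
import tarfile
from pathlib import Path


def package_programs(source_dir: str, archive_path: str) -> list[str]:
    """Bundle every file under source_dir into one gzip-compressed tar archive.

    Returns the archive member names, relative to source_dir.
    """
    source = Path(source_dir)
    target = Path(archive_path).resolve()
    packaged = []
    with tarfile.open(archive_path, "w:gz") as archive:
        for path in sorted(source.rglob("*")):
            # Skip directories, and skip the archive itself if it lives in source_dir.
            if path.is_file() and path.resolve() != target:
                relative = path.relative_to(source).as_posix()
                archive.add(path, arcname=relative)
                packaged.append(relative)
    return packaged
```

A single archive gives the technician one file to upload, which is the property the quote above highlights.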
Alta also uses the groov EPIC’s touchscreen as a maintenance interface inside the control cabinet. The native groov Manage application allows them to view and modify I/O and network settings directly on the controller without using a separate computer interface. Using the EPIC’s native HMI server, groov View, Alta also provides technicians with local control options and basic information about the Linux program’s status.
Andrew Latham has worked as a professional copywriter since 2005 and is the owner of LanguageVox, a Spanish and English language services provider. His work has been published in "Property News" and on the San Francisco Chronicle's website, SFGate. Latham holds a Bachelor of Science in English and a diploma in linguistics from Open University.
When I began my career in technical operations (mostly what we call DevOps today), the world was dramatically different. This was before the dawn of the new millennium, when the world’s biggest and most well-known SaaS company, Salesforce, was operating out of an apartment in San Francisco.
Back then, on-premise ruled the roost. Rows of towers filled countless rooms. These systems were expensive to set up and maintain, from both a labour and parts perspective. Building a business using only SaaS applications was technically possible back then but logistically a nightmare. On-prem would continue to be the default way for running software for years to come.
But technology always progresses at lightspeed. So just three years after Salesforce began preaching the “end of software”, Amazon Web Services came online and changed the game completely.
Today a new SaaS tool can be built and deployed across the world in mere days. Businesses are now embracing SaaS solutions at a record pace. The average small to medium-sized business can easily have over 100 SaaS applications in their technology stack. Twenty years ago, having this many applications to run a business was unthinkable and would have cost millions of dollars in operational resources. However, at Rewind, where I oversee technical operations, I looked after our software needs with a modem and a laptop.
SaaS has created a completely different reality for modern businesses. We can build and grow businesses cheaper and faster than ever before. Like most “too good to be true” things, there’s a catch. All this convenience comes with one inherent risk. It’s a risk that people rarely discussed in my early days in DevOps and is still rarely talked about. Yet this risk is important to understand; otherwise, all the vital SaaS data you rely on each and every day could disappear in the blink of an eye.
And it could be gone for good.
This likely goes without saying, but you rent SaaS applications; you don’t own them. Those giant on-prem server rooms that companies housed years ago now rest with the SaaS provider. You simply access their servers (and your data) through an operating system or API. Now you are probably thinking, “Dave, I know all this. So what?”
Well, this is where the conundrum lies.
If you look at the terms of service for SaaS companies, they do their best to ensure their applications are up and running at all times. It doesn’t matter if servers are compromised by fire, meteor strike, or just human error, SaaS companies strive to ensure that every time a user logs in, the software is available. The bad news is this is where their responsibility ends.
You, the user, are on the hook for backing up and restoring whatever data you’ve entered and stored in their services. Hence the term “Shared Responsibility Model”. This term is most associated with AWS but this model actually governs all of cloud computing.
The above chart breaks down the various scenarios for protecting elements of the cloud computing relationship. You can see that with the SaaS model, the largest onus is on the software provider. Yet there are still things a user is responsible for: User Access and Data.
I’ve talked to other folks in DevOps, site reliability, or IT roles in recent years and I can tell you that the level of skepticism is high. They often don’t believe their data isn’t backed up by the SaaS provider in real time. I empathize with them, though, because I was once in their shoes. So when I meet this resistance, I just point people to the various terms of service laid out by each SaaS provider. Here is GitHub’s, here is Shopify’s and the one for Office 365. It’s all there in black and white.
The reason the Shared Responsibility Model exists in the first place essentially comes down to the architecture of each application. A SaaS provider has built its software to maximize the use of its operating system, not continually snapshot and store the millions or billions of data points created by users. Now, this is not a “one-size fits all scenario”. Some SaaS providers may be able to restore lost data. However, if they do, in my experience, it’s often an old snapshot, it’s incomplete, and the process to get everything back can take days, if not weeks.
Again, it’s simply because SaaS providers are lumping all user data together, in a way that makes sense for the provider. Trying to find it again, once it’s deleted or compromised, is like looking for a needle in a haystack, within a field of haystacks.
The likelihood of losing data from a SaaS tool is the next question that inevitably comes up. One study conducted by Oracle and KPMG found that 49% of SaaS users have previously lost data. Our own research found that 40% of users have previously lost data. There are really three ways that this happens, and they are risks that you may already be very aware of: human error, cyberthreats, and third-party app integrations.
Humans and technology have always had co-dependent challenges. Let’s face it, it’s one of the main reasons my career exists! So it stands to reason that human interference, whether deliberate or not, is a common reason for losing information. This can be as innocuous as uploading a CSV file that corrupts data sets, accidentally deleting product listings, or overwriting code repositories with a forced push.
There’s also intentional human interference. This means someone who has authorized access, nuking a bunch of stuff. It may sound far-fetched but we have seen terminated employees or third-party contractors cause major issues. It’s not very common, but it happens.
Cyberthreats are next on the list, which are all issues that most technical operations teams are used to. Most of my peers are aware that the level of attacks increased during the global pandemic, but the rate of attacks had already been increasing prior to COVID-19. Ransomware, phishing, DDoS, and more are all being used to target and disrupt business operations. If this happens, data can be compromised or completely wiped out.
Finally, third-party app integrations can be a source of frustration when it comes to data loss. Go back and read the terms of service for apps connected to your favourite SaaS tool. They may save a ton of time, but they may also have a lot of control over all the data you create and store in these tools. We’ve seen apps overwrite and permanently delete reams of data. By the time teams catch it, the damage is already done.
There are some other ways data can be lost but these are the most common. The good news is that you can take steps to mitigate downtime. I’ll outline a common one, which is writing your own backup script for a Git repository.
There are a lot of ways to approach this. Simply Google “git backup script” and lots of options pop up. All of them have their quirks and limitations. Here is a quick rundown of some of them.
Creating a local backup in Cron Scripts
Essentially you are writing a script to clone a repo, at various intervals, using cron jobs. (Note that the cron job tool you use will depend on your OS.) This method essentially takes snapshots over time. To restore a lost repo, you just pick the snapshot you want to bring back. For a complete copy, use git clone --mirror to mirror your repositories. This ensures all remote and local branches, tags, and refs get included.
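As a rough illustration of the cron approach, here is a minimal Python sketch that a scheduled job could run. The function names, the timestamped folder layout, and the example URL in the usage note are all assumptions for illustration; the only real command involved is git clone --mirror.

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def snapshot_path(backup_root: str, repo_url: str, now: datetime) -> Path:
    """Build a timestamped destination: <root>/<repo>/<YYYYmmddTHHMMSSZ>.git."""
    name = repo_url.rstrip("/").rsplit("/", 1)[-1].removesuffix(".git")
    return Path(backup_root) / name / (now.strftime("%Y%m%dT%H%M%SZ") + ".git")


def mirror_command(repo_url: str, destination: Path) -> list[str]:
    # --mirror creates a bare repository containing all refs (branches, tags, and others).
    return ["git", "clone", "--mirror", repo_url, str(destination)]


def backup(repo_url: str, backup_root: str) -> Path:
    """Take one snapshot of repo_url under backup_root and return its location."""
    destination = snapshot_path(backup_root, repo_url, datetime.now(timezone.utc))
    destination.parent.mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError on failure, so a broken run is not silent.
    subprocess.run(mirror_command(repo_url, destination), check=True)
    return destination
```

A scheduled job would then call something like backup("https://example.com/org/app.git", "/backups") at whatever interval suits you. Each run produces a separate snapshot, which is what lets you pick a point in time to restore from.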
The pros of using this method are a lack of reliance on external tools for backups and the only cost is your time.
There are a few cons. You actually won’t have a full backup. This clone won’t have hooks, reflogs, configuration, description files, and other metadata. It’s also a lot of manual work and becomes more complex if you try to add error monitoring, logging, and error notification. And finally, as the snapshots pile up, you’ll need to account for cleanup and archiving.
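The cleanup chore can be sketched in a few lines of Python. This is one possible retention policy, not a recommendation: it assumes each snapshot is a directory whose name sorts chronologically (for example a UTC timestamp such as 20240102T030405Z.git), and the function names are hypothetical.

```python
import shutil
from pathlib import Path


def snapshots_to_prune(snapshot_names: list[str], keep: int) -> list[str]:
    """Return the oldest snapshot names, leaving the newest `keep` untouched.

    Assumes names sort chronologically, e.g. 20240102T030405Z.git.
    """
    if keep < 0:
        raise ValueError("keep must be zero or greater")
    ordered = sorted(snapshot_names)
    return ordered[: max(len(ordered) - keep, 0)]


def prune(repo_backup_dir: str, keep: int) -> None:
    """Delete all but the newest `keep` snapshot directories."""
    directory = Path(repo_backup_dir)
    names = [entry.name for entry in directory.iterdir() if entry.is_dir()]
    for name in snapshots_to_prune(names, keep):
        shutil.rmtree(directory / name)
```

Keeping the selection logic separate from the deletion makes it easy to dry-run the policy before anything is removed. Archiving to cheaper storage instead of deleting would slot in where shutil.rmtree is called.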
Using Syncthing
Syncthing is a GUI/CLI application that allows for file syncing across many devices. All the devices need to have Syncthing installed on them and be configured to connect with one another. Keep in mind that syncing and backing up are different, as you are not creating a copy, but rather ensuring a file is identical across multiple devices.
The pros are that it is free and one of the more intuitive methods for a DIY “backup” since it provides a GUI. Cons: Syncthing only works between individual devices, so you can’t directly back up your repository from a code hosting provider. Manual fixes are needed when errors occur. Also, syncing a git repo could lead to corruption and conflicts of a repository, especially if people work on different branches. Syncthing also sucks up a lot of resources with its continuous scanning, hashing, and encryption. Lastly, it only maintains one version, not multiple snapshots.
Using SCM Backup
SCM Backup creates an offline clone of a GitHub or Bitbucket repository. It makes a significant difference if you are trying to back up many repos at once. After the initial configuration, it grabs a list of all the repositories through an API. You can also exclude certain repos if need be.
SCM Backup lets you specify backup folder location, authentication credentials, email settings, and more.
Here’s the drawback, though: the copied repositories do not contain hooks, reflogs, configuration files, or metadata such as issues, pull requests, or releases. And configuration settings can change across different code hosting providers. Finally, in order to run it, you need to have .NET Core installed on your machine.
Now that’s just three ways to back up a Git repository. As I mentioned before, just type a few words into Google and a litany of options comes up. But before you get the dev team to build a homegrown solution, keep these two things in mind.
First, any DIY solution will still require a significant amount of manual work because it can only clone and/or back up; it can’t restore data. In fact, that’s actually the case with most SaaS tools, not just in-house backup solutions. So although you may have some snapshots or cloned files, they will likely be in a format that needs to be reuploaded into a SaaS tool. One way around this is to build a backup-as-a-service program, but that will likely eat up a ton of developer time.
That brings us to the second thing to keep in mind, the constantly changing states of APIs. Let’s say you build a rigorous in-house tool: you’ll need a team to be constantly checking for API updates, and then making the necessary changes to this in-house tool so it’s always working. I can only speak for myself, but I’m constantly trying to help dev teams avoid repetitive menial tasks. So although creating a DIY backup script can work, you need to decide where you want development teams to spend their time.
So what’s the way forward in all of this? There are a few things to consider. And these steps won’t be uncommon to most technical operations teams. First, figure out whether you want to DIY or outsource your backup needs. We already covered the in-house options and the challenges they present. So if you decide to look for a backup and recovery service, just remember to do your homework. There are a lot of choices, so as you go through due diligence, look at reviews, talk to peers, read technical documentation and, honestly, figure out if company X seems trustworthy. They will have access to your data after all.
Next, audit all your third-party applications. I won’t sugarcoat it, this can be a lot of work. But remember the “terms of service” agreements? There are always a few surprises to be found. And you may not like what you see. I recommend you do this about once a year and make a pros/cons list. Is the value you get from this app worth the level of access the app has? If it’s not, you may want to look for another tool. Fun fact: Compliance standards like SOC2 require a “vendor assessment” for a reason. External vendors or apps are a common culprit when it comes to accidental data loss.
And finally, limit who has access to each and every SaaS application. Most people acknowledge the benefits of the least-privilege approach, but it isn’t always put into practice. So make sure the right people have the right access, ensure all users have unique login credentials (use a password manager to manage the multiple login hellscape) and get MFA enabled.
It’s not a laundry list of things nor is it incredibly complex. I truly believe that SaaS is the best way to build and run organizations. But I hope now it’s glaringly obvious to any DevOps, SRE or IT professional that you need to safeguard all the information that you are entrusting to these tools. There is an old saying I learned in those early days of my career, “There are two types of people in this world – those who have lost data and those who are about to lose data”.
You don’t want to be the person who has to inform your CIO that you are now one of those people. Of course, if that happens, feel free to send them my way. I’m certain I’ll be explaining the Shared Responsibility Model of SaaS until my career is over!
Dave North has been a versatile member of the Ottawa technology sector for more than 25 years. Dave is currently working at Rewind leading 3 teams (DevOps, trust, IT) as the director of technical operations. Prior to Rewind, Dave was a long-time member of Signiant, holding many roles in the organization including sales engineer, pro services, technical support manager, product owner and DevOps director. A proven leader and innovator, Dave holds 5 US patents and helped drive Signiant’s move to a cloud SaaS business model with the award-winning Media Shuttle product. Prior to Signiant, Dave held several roles at Nortel, Bay Networks and ISOTRO Network Management working on the NetID product suite. Dave is fanatical about cloud computing, automation, gadgets and Formula 1 racing.