
Lifting the veil on BrainBox AI’s data architecture

Written by David Calado Varela | Jul 25, 2024

AI. It’s all anyone’s talking about. But hidden in the hype are some truly industry-revolutionizing applications of artificial intelligence systems that have been created, deployed, tested, and refined over years. One such case is that of AI for commercial HVAC systems.

In this interview, we’ll be picking at the expansive grey matter of BrainBox AI’s David Calado Varela, the mechanical engineer whose work is instrumental in crafting the algorithms that are saving energy and reducing emissions in buildings around the world.

Read on to discover David’s thoughts on how BrainBox AI is addressing concerns around AI and data safety, and why the company’s algorithmic architecture is so unique and successful.

David, there’s a lot of talk about AI and understandable concerns around accuracy and safety. When we look at something like AI that’s designed to integrate into and optimize HVAC systems, what are two fundamental things building owners and operators should consider? 
 
If I had to choose just two, I’d say architecture and safety systems. 
 
First, the architecture of the AI system is crucial. Building owners and operators need to ensure that the AI architecture is robust, scalable, and adaptable to the specific needs of their building. This includes understanding how the AI integrates with existing HVAC systems, how it processes data, and how it makes decisions. A well-designed architecture ensures that the AI can handle the complexities of a building’s environment and can evolve as the building's needs change over time. 
 
Second, it's essential to have multiple layers of safety measures in place to prevent any potential issues. This includes real-time monitoring and alert systems that can detect anomalies and take corrective actions automatically. For example, if the system detects a significant temperature drop, it should have protocols to alert the building operators and temporarily release control back to the building's original systems to ensure occupants' comfort and safety. These safety systems should be designed to default to safe conditions, ensuring that the building remains operational and comfortable, even if the AI encounters an unexpected situation. 
 

"A well-designed architecture ensures that the AI can handle the complexities of a building’s environment and can evolve as the building's needs change over time." 

 

Turning to BrainBox AI specifically, what do we do to ensure that the architecture of our algorithms is operating safely and smoothly? How is it unique? 
 
We currently have three filters, or layers of safety, when it comes to our algos. The first line of defense is the algos themselves, which are designed to continuously analyze data and make real-time adjustments to maintain optimal conditions within a building. They’re equipped with self-checking mechanisms to detect any anomalies and trigger immediate corrective actions.
 
The second line of safety lies in two systems. The first, which we refer to as the “Guardian”, is an automated system within the AI architecture that checks to ensure everything’s running as it should. The other is a series of standard HVAC operation golden rules set by ASHRAE. These include generalized standards for things like indoor air quality, energy efficiency, system maintenance, and temperature and humidity control.
 
These two systems, the Guardian and the golden rules, act as “judges”, checking for anomalies and continually evaluating all commands along with the current building conditions to ensure setpoints are met and equipment is operating up to code.  
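
To make the “judge” idea concrete, here’s a minimal Python sketch of a rule-based command check. The rule names, thresholds, and data structures are illustrative assumptions, not BrainBox AI’s actual Guardian implementation or ASHRAE’s exact limits.

```python
# Illustrative sketch only: a simplified rule-based "judge" layer.
# Rule names and thresholds are hypothetical examples.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Command:
    point: str          # e.g. "rtu_3/supply_fan_cmd"
    value: float

@dataclass
class BuildingState:
    zone_temp_c: float
    zone_co2_ppm: float

# Each "golden rule" returns True when the proposed command is acceptable
# given the current building conditions.
GoldenRule = Callable[[Command, BuildingState], bool]

def comfort_rule(cmd: Command, state: BuildingState) -> bool:
    # Keep occupied zones within a broad comfort band (example bounds).
    return 19.0 <= state.zone_temp_c <= 26.0

def air_quality_rule(cmd: Command, state: BuildingState) -> bool:
    # Reject commands while CO2 exceeds an example ventilation limit.
    return state.zone_co2_ppm <= 1000.0

def judge(cmd: Command, state: BuildingState, rules: list[GoldenRule]) -> bool:
    """True if every rule passes; otherwise the command is rejected and
    control would be released back to the building's own systems."""
    return all(rule(cmd, state) for rule in rules)

state = BuildingState(zone_temp_c=21.5, zone_co2_ppm=650)
ok = judge(Command("rtu_3/supply_fan_cmd", 1.0), state, [comfort_rule, air_quality_rule])
print("command accepted" if ok else "release control to building")
```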
 
The third line of defense is manual intervention. While our system is designed to minimize the need for manual intervention, we feel it’s important to have the fall-back option of human oversight. In the unlikely event that the first two lines of defense fail, our 24/7 monitoring team will step in. However, this situation is extremely rare.  
 
As to how it’s unique: Most companies don't have our level of automation; they mainly just raise alarms for someone else to address manually. We automate the response, and release control back to the building if there's an issue that cannot be resolved autonomously.  

 

"Most companies don't have our level of automation; they mainly just raise alarms for someone else to address manually."
 

Could you give us a tangible example of the layers of safety in action?  
 
Sure. So, let's say the temperature drops to 10°C during the night because all the equipment is off. If our system detects this drop, it raises an alert, indicating there's an issue. The system then releases the control points related to that equipment, effectively giving back control to the building. This allows the building to start heating and return to normal occupied values. Once the temperature stabilizes, we regain control. Essentially, our safety measure ensures that in any problematic situation, we stop controlling and let the building take over by default. 
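
As a rough illustration of that fail-safe behaviour, here is a minimal Python sketch. The 10°C threshold mirrors David’s example; the point names and the alert/release functions are hypothetical placeholders, not BrainBox AI’s actual code.

```python
# Minimal sketch of the fail-safe behaviour described above: on an anomaly,
# raise an alert and release the affected control points back to the building.
LOW_TEMP_LIMIT_C = 10.0   # example threshold from the interview

def raise_alert(message: str) -> None:
    print(f"ALERT: {message}")

def release_points(points: list[str]) -> None:
    # In a real BAS integration this would relinquish the AI's write priority
    # so the building's own sequences resume control.
    for point in points:
        print(f"released {point} back to building control")

def check_zone(zone_temp_c: float, control_points: list[str]) -> bool:
    """Return True if the AI keeps control, False if control was released."""
    if zone_temp_c < LOW_TEMP_LIMIT_C:
        raise_alert(f"zone temperature {zone_temp_c:.1f} °C below safe limit")
        release_points(control_points)
        return False
    return True

check_zone(9.4, ["rtu_1/heating_cmd", "rtu_1/fan_cmd"])  # -> releases control
```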
 

So, essentially these lines of defense act like safety switches, ensuring the automated system runs as it should? 
 
Exactly. They're always monitoring each system and will disengage if anything goes wrong. This ensures that if the temperature drops too low during the night in winter, for example, the heating system kicks back in to prevent pipes from freezing.  

How frequently do these automated checks happen? 
 
We run a water-side cycle every single minute and an air-side cycle every five minutes, 24 hours a day. This means we extract data, run commands through our AI, and then pass them to the safety service which intervenes if there’s any issue detected in the boilers, chillers, fans, dampers, etc. So, basically, we control a building’s HVAC system until there’s an alarm, and then we stop controlling the building and send it back to its default settings. 
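
A rough sketch of that cadence, with trivial stand-in functions for the real extraction, AI, and safety services:

```python
# Rough sketch of the cycle cadence described above: a water-side cycle every
# minute and an air-side cycle every five minutes. The functions below are
# placeholders, not BrainBox AI's actual services.
import time

def extract_data(side: str) -> dict:
    return {"side": side}              # placeholder for a real data extraction

def run_ai(side: str, data: dict) -> dict:
    return {"setpoint_c": 21.0}        # placeholder for the AI's command set

def safety_check(side: str, data: dict, commands: dict) -> bool:
    return True                        # placeholder for the safety service

def run_cycle(side: str) -> None:
    data = extract_data(side)
    commands = run_ai(side, data)
    if safety_check(side, data, commands):
        print(f"{side}-side: writing {commands}")
    else:
        print(f"{side}-side: issue detected, releasing control to building")

if __name__ == "__main__":
    minute = 0
    while True:
        run_cycle("water")             # every minute
        if minute % 5 == 0:
            run_cycle("air")           # every five minutes
        minute += 1
        time.sleep(60)
```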
 

Is control given back only when there’s a fault detected? What about if a technician needs to perform maintenance on an RTU?  
 
In BACnet, there are 16 priorities (16 being the lowest priority and 1 the highest). BrainBox AI typically writes back to a building at priority 9, which is just one priority below a manual override. So, if there's an operator that wants to, say, turn off a fan to conduct maintenance, they can overwrite at priority 8 to stop the fan, which will remain off until they clear the priority. After that, BrainBox AI automatically regains control. 
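
For readers unfamiliar with BACnet, here is a simplified model of a 16-slot priority array that shows why an operator write at priority 8 overrides the AI’s write at priority 9, and why clearing it hands control back automatically. This is an illustrative model, not a real BACnet client.

```python
# Simplified model of a BACnet-style priority array (not a real BACnet client).
# Slot 1 is the highest priority, slot 16 the lowest; the effective value is
# the highest-priority non-empty slot, falling back to a relinquish default.
class CommandablePoint:
    def __init__(self, relinquish_default: float):
        self.priority_array = [None] * 16   # index 0 -> priority 1
        self.relinquish_default = relinquish_default

    def write(self, value: float, priority: int) -> None:
        self.priority_array[priority - 1] = value

    def relinquish(self, priority: int) -> None:
        self.priority_array[priority - 1] = None

    @property
    def present_value(self) -> float:
        for value in self.priority_array:   # scan from priority 1 downward
            if value is not None:
                return value
        return self.relinquish_default

fan = CommandablePoint(relinquish_default=0.0)   # 0.0 = fan off
fan.write(1.0, priority=9)      # BrainBox AI typically writes at priority 9
print(fan.present_value)        # 1.0 -> AI is running the fan

fan.write(0.0, priority=8)      # operator override for maintenance
print(fan.present_value)        # 0.0 -> fan stays off for the technician

fan.relinquish(priority=8)      # operator clears the override
print(fan.present_value)        # 1.0 -> AI automatically regains control
```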
 

We’ve talked about control, but what about accuracy? How does BrainBox AI’s data architecture ensure that a building’s data is stored, accessed, and used correctly? 
 
Our data architecture involves multiple layers of data storage. We keep the last three hours of data in a hot cache for quick access, while longer-term data is stored in a database. Additionally, we store all data in a Data Lake, which we enhance with additional information. This setup allows us to identify and enhance key performance indicators on the fly, making the calculation of metrics like runtime and occupancy easier and more accurate. In fact, we have a 99.6% accuracy rate. This high level of accuracy is largely due to our practice of constantly re-validating our models against previous data, even after the initial training. 
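
As a sketch of what such a tiered read path could look like, assuming a three-hour hot cache in front of a database and a data lake (the class, store names, and eviction policy below are hypothetical):

```python
# Illustrative sketch of a tiered read path like the one described above:
# a three-hour hot cache for recent samples, a database for longer history,
# and a data lake that keeps everything. Names and stores are hypothetical.
import time

HOT_CACHE_WINDOW_S = 3 * 60 * 60   # keep the last three hours hot

class TieredStore:
    def __init__(self):
        self.hot_cache: dict[tuple[str, int], float] = {}
        self.database: dict[tuple[str, int], float] = {}
        self.data_lake: dict[tuple[str, int], float] = {}

    def write(self, point: str, timestamp: int, value: float) -> None:
        self.hot_cache[(point, timestamp)] = value
        self.database[(point, timestamp)] = value
        self.data_lake[(point, timestamp)] = value   # lake keeps everything
        self._evict(timestamp)

    def _evict(self, now: int) -> None:
        # Drop cache entries older than the hot window.
        cutoff = now - HOT_CACHE_WINDOW_S
        self.hot_cache = {k: v for k, v in self.hot_cache.items() if k[1] >= cutoff}

    def read(self, point: str, timestamp: int):
        key = (point, timestamp)
        for tier in (self.hot_cache, self.database, self.data_lake):
            if key in tier:          # cheapest tier first
                return tier[key]
        return None

store = TieredStore()
now = int(time.time())
store.write("rtu_1/zone_temp", now, 21.7)
print(store.read("rtu_1/zone_temp", now))   # served from the hot cache
```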

 

"We have a 99.6% accuracy rate. This high level of accuracy is largely due to our practice of constantly re-validating our models against previous data."
 

How much building data does BrainBox AI actually collect?  
 
A lot. Each building can have anywhere from 100 to 20,000 data points. And we perform 288 air-side data extraction cycles and 1440 water-side cycles per building per day. So, it’s safe to say we’re capable of handling hundreds of thousands of commands per building every day. Think about it: it takes one technician to create a command for each piece of equipment manually. Now we can execute thousands of commands without anyone being on site. Of course, we still need technicians to perform repairs – there’s only so much AI can do – but with this technology, they’re able to have much more visibility over their equipment and can make more informed decisions way faster. 
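
The back-of-the-envelope arithmetic behind “hundreds of thousands of commands” looks roughly like this; only the cycle counts come from the interview, while the number of commandable points touched per cycle is an assumed figure for illustration:

```python
# Back-of-the-envelope arithmetic for the figures above. The commandable
# points per cycle are illustrative assumptions; only the cycle counts
# (288 air-side, 1440 water-side per day) come from the interview.
air_side_cycles_per_day = 288      # one every five minutes
water_side_cycles_per_day = 1440   # one every minute

commands_per_air_cycle = 300       # assumed commandable points per air-side cycle
commands_per_water_cycle = 100     # assumed commandable points per water-side cycle

daily_commands = (air_side_cycles_per_day * commands_per_air_cycle
                  + water_side_cycles_per_day * commands_per_water_cycle)
print(f"{daily_commands:,} potential commands per building per day")  # 230,400
```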

 

"It’s safe to say we’re capable of handling hundreds of thousands of commands per building every day. Think about it, it takes one technician to create a command for each piece of equipment manually. Now we can execute thousands of commands without anyone being on site."


What about the actual integration process? What does that look like? 
 
Once we connect to a building, we compile a report to identify all extractable data points and begin the pre-extraction process. Simultaneously, we map the building zone by zone and RTU by RTU, preparing the necessary Haystack information. Haystack tagging is a standardized methodology for data tagging and modeling, which links up all the data points and helps to ensure that data is consistent and easily understandable across different systems. For us, this includes data like occupancy status, fan cycles, equipment status, commands, temperature, etc. Once all this is prepared, we launch the data extraction phase, gathering the data needed to auto-configure and deploy our algorithms. 

By using Haystack tagging, we can essentially form a blueprint of the mechanical architecture of the building, enabling us to accurately categorize and label all data points so they’re simpler to analyze and optimize. This normalization process is crucial because it allows our AI to interpret the data uniformly, regardless of the original source or format. 
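
To illustrate, here is a simplified, Python-flavoured view of Haystack-style tagging for a hypothetical RTU and two of its points. The identifiers and the exact tag set are examples rather than BrainBox AI’s actual point model.

```python
# Simplified illustration of Haystack-style tagging: each data point carries a
# set of descriptive tags that identify what it measures and which equipment
# it belongs to. Identifiers and tag choices below are examples only.
rtu_3 = {
    "id": "rtu_3",
    "equip": True, "ahu": True,                # marker tags: this entity is an RTU/AHU
    "siteRef": "building_42",
}

zone_temp_sensor = {
    "id": "rtu_3_zone_temp",
    "point": True, "sensor": True,             # a sensor point...
    "temp": True, "zone": True, "air": True,   # ...measuring zone air temperature
    "kind": "Number", "unit": "°C",
    "equipRef": "rtu_3",                       # links the point back to its equipment
    "his": True,                               # the point is historized
}

fan_cmd = {
    "id": "rtu_3_fan_cmd",
    "point": True, "cmd": True,                # a writable command point
    "fan": True, "run": True,
    "kind": "Bool",
    "equipRef": "rtu_3",
}

def points_for_equipment(points: list[dict], equip_id: str) -> list[dict]:
    """Once tagged consistently, points can be queried uniformly by equipment."""
    return [p for p in points if p.get("equipRef") == equip_id]

print([p["id"] for p in points_for_equipment([zone_temp_sensor, fan_cmd], "rtu_3")])
```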

So, for the first week or two, we run an extraction period to capture the building's typical behavior. Then, we prepare our algorithms based on this data. For example, we determine the usual room temperature ranges and other key metrics. Once everything is set, we launch the building in write-back mode, initially with a fixed schedule approach, before integrating AI for dynamic control. 
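
A minimal sketch of how “usual room temperature ranges” could be derived from that extraction period, assuming simple percentile bounds per zone (the choice of the 5th/95th percentiles is an illustrative assumption):

```python
# Minimal sketch of deriving "usual" temperature ranges from the initial
# extraction period by taking percentile bounds per zone.
import statistics

def typical_range(samples: list[float], low_pct: int = 5, high_pct: int = 95) -> tuple[float, float]:
    """Return (low, high) bounds covering the bulk of the observed values."""
    cuts = statistics.quantiles(samples, n=100)   # 99 percentile cut points
    return cuts[low_pct - 1], cuts[high_pct - 1]

# Imaginary zone temperatures sampled during the pre-extraction period:
observed = [20.1, 20.8, 21.3, 21.9, 22.4, 21.1, 20.6, 22.0, 21.5, 23.0, 19.8, 21.7]
low, high = typical_range(observed)
print(f"typical zone range: {low:.1f}–{high:.1f} °C")
```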

The beauty of our whole integration process is that we’re able to automate most of it. This means that, once the primary mapping process is complete, we’re capable of onboarding hundreds of buildings per day.

 

"The beauty of our whole integration process is that we’re able to automate most of it. This means that, once the primary mapping process is complete, we’re capable of onboarding hundreds of buildings per day."   

 
That’s a lot of buildings. How do we ensure the safety of all this data? 
 
We follow strict industry standards and use AWS for secure data storage. Additionally, our security certifications, like SOC 2, ensure we meet these standards and keep our clients’ data safe. 
 

One last question. Do you have a favorite algo? And what does it do? 
 
That’s a tough one. Optimus has always been my favorite because it was our first-born algo and I created it. It was responsible for optimizing the start and stop times of equipment based on predictions, so it used historical data and real-time inputs (e.g., indoor temperatures, outdoor weather conditions, and building occupancy patterns) to forecast when to start and stop equipment. This ensured that a building reached its desired temperature just as occupants arrived. For example, instead of starting the heating system two hours before occupancy, it might start just 15 minutes before. This ends up saving a significant amount of energy while also maintaining comfort levels. Optimus has been revamped over the years, morphing into a 2.0 version that we now know as Khronos. 
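
For a flavour of the optimal-start idea behind Optimus, here is a hedged sketch that estimates warm-up time from the gap to setpoint and a crude weather-adjusted heating rate, then backs the start time off from occupancy. The model and its coefficients are deliberately simple stand-ins, not the actual Optimus/Khronos algorithm.

```python
# Hedged sketch of the optimal-start idea: estimate how long the space needs
# to reach setpoint and start equipment only that far ahead of occupancy.
# The linear heating-rate model and its coefficients are simple stand-ins.
from datetime import datetime, timedelta

def estimated_warmup_minutes(indoor_c: float, setpoint_c: float,
                             outdoor_c: float, base_rate_c_per_min: float = 0.05) -> float:
    """Very rough warm-up estimate: heating is slower when it's colder outside."""
    gap = max(setpoint_c - indoor_c, 0.0)
    weather_penalty = max((setpoint_c - outdoor_c) / 40.0, 0.0)   # 0..~1 scaling
    rate = base_rate_c_per_min * (1.0 - 0.5 * weather_penalty)    # degrade rate in cold weather
    return gap / max(rate, 1e-6)

def optimal_start_time(occupancy_start: datetime, indoor_c: float,
                       setpoint_c: float, outdoor_c: float) -> datetime:
    minutes = estimated_warmup_minutes(indoor_c, setpoint_c, outdoor_c)
    return occupancy_start - timedelta(minutes=minutes)

occupancy = datetime(2024, 1, 15, 8, 0)
start = optimal_start_time(occupancy, indoor_c=18.5, setpoint_c=21.0, outdoor_c=-5.0)
print(f"start heating at {start:%H:%M} for {occupancy:%H:%M} occupancy")
```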

 

Find out more about BrainBox AI's unbeatable tech