How to overcome data engineering challenges in construction?

Ryan Buckley, Co-founder & CEO ● Jul 30th, 2024

The full transcript

Oleg

Hi, everybody! Welcome to Devico Breakfast Bar! Here we speak with different people involved in the business landscape, share their expertise, delve into the latest tech trends, and explore the ins and outs of IT outsourcing. I'm Oleg Sadikov, and today I'm excited to have Ryan Buckley, co-founder and CEO at Shovels. Don't forget to subscribe and hit the notification bell so you don't miss new episodes. Hi, Ryan!

Ryan

Hi Oleg! Thanks for having me on your podcast.

Oleg

Thanks for joining. Could you start by sharing a bit about your journey from your early career endeavors to your current role as the founder of Shovels?

Ryan

Sure. So, I'm from California. I'm in California right now, and actually was born and raised in Mountain View. Before Google, but after the surge in Silicon Valley development. I remember playing roller hockey in the Netscape and Silicon Graphics parking lots growing up, but I didn't really get the tech entrepreneur bug until after college. So, I went to the University of California, Berkeley, studied economics and environmental sciences, worked at a consulting firm in Los Angeles, and met a couple of guys who would become the co-founders of my first company. We started this business. It was a screenwriting software company. When I was in graduate school, at the Harvard Kennedy School, I went to get a public policy degree, but these guys that I met in LA convinced me to start building this screenwriting software business while I was still in school, and that got me interested in the whole idea that maybe instead of going into policy, I could become an entrepreneur professionally. I decided to apply to an MBA program to extend my grad school studies. Got into the MIT Sloan Business School program, graduated from that, and then went back to California to go full-time onto this screenwriting software business. Did not really work out. It was a big struggle. It was just really, really hard to make money in this market, but we discovered that there was a need for access to really good writers, and screenwriters could actually write really great B2B content. So, we did a hard pivot from screenwriting software to a content marketplace, a writer marketplace, and we ended up raising a lot of money for that business. And we ran it together for another five or six years. And we sold that business to a private equity firm. And then, that firm hired me to run a mobile data company. Did that for a few years. And when we exited that business, I decided to start Shovels. Had a perfect co-founder from my time running MightySignal. He had the skill set that I needed and the disposition.
I knew we would work really, really well together. And I'd already known that because we were working well together. So, when this idea came to start aggregating building permit data, and making it widely available, and allowing startups and large businesses to tap into this resource to enhance their products and make more revenue, it was just kind of obvious that he and I should work together. So, that was about a year and a half ago. And we were able to raise money starting a year ago, and we're now a team of four. We have some good revenue. We have some great customers, a big roadmap, a lot of plans. This company has been really, really fun to work on.

Oleg

That's very interesting: content, mobile, then Shovels. What inspired you to create Shovels and tackle the go-to-market problem in the construction industry?

Ryan

I was looking for a data problem. So, I was coming off of my company, MightySignal, which cracked open mobile apps and looked at the software installed. It's called SDK intelligence. I was not interested in staying in that business. So, when we sold it, I knew the right decision was to leave because I didn't care enough about mobile data to continue working there. And I really wanted to work in a field that I actually cared about and that would also fit my skill set. Finding that combination is hard, but I knew that if I could figure out a way to solve a data problem in climate tech, I would be very excited to go to work, very motivated to just do what is required as a founder of a brand-new company. That's the hardest stage, and I knew what it would take. And I knew that it would be just a lot easier and a lot more rewarding if I could do it for something that I cared about. So, this was this whole thread. At Berkeley, as I mentioned, I studied environmental sciences. At the Harvard Kennedy School, I studied energy policy. And throughout the last 10 years, all of my volunteer work has been in environmental sustainability. So, if I could find a B2B data problem that addressed something about environmental sustainability, I could combine the two things: my professional interest and my personal interest. And it took me a year to find that intersection. The lines crossed in building permits, and I became aware of building permits because I was doing a building electrification project on my house, and that was really the exposure that I needed to see that this data existed but that it wasn't readily available.

It wasn't nearly as structured and accessible as I wanted it to be. I was just doing some of my own research and trying to answer some questions that I had. At the same time, I also was reading How to Avoid a Climate Disaster by Bill Gates, Speed & Scale by John Doerr, and Electrify Everything by Saul Griffith. Those three books. There's one thing they all had in common, which was the importance of residential building electrification, which really means getting the homes off of gas and propane and converting them wherever possible to heat pumps. Even if that home still taps into a coal or gas-powered grid, heat pumps still use less energy overall. If you convert the electricity and compare the energy consumption of a heat pump versus a gas water heater or a gas air furnace, the heat pump uses less energy, apples to apples. So, no matter what, it's still a good thing to convert to a heat pump. It's even better if the energy, the electricity coming into the home, is renewable. But that was enough for me to take the dive. And the more I learned about building permits, the more applications I saw outside of climate tech as well, just for the broader building trades. There's a labor market opportunity or data set here as well, which tied directly into my experience running Scripted, which was like a labor marketplace but for writing. With building permits, we can track the work history of building contractors. And that was really, really interesting to me as well. I don't know, there was just kind of enough here where it was like, 'That's it! I am going to dive into this problem.' And I had plenty of optimism that this one would work out.

Oleg

Thanks. As someone who also teaches business and entrepreneurship, how do you balance your role at Shovels with your passion for education?

Ryan

I started teaching at my local community college, gosh, probably five years ago now. I began teaching Intro to Business and then Intro to Marketing. I developed a new course called Intro to Tech Sales, and I just taught Intro to Management for the first time. They say that the best way to learn something is to teach it. And for me also, being somewhat of a shy person, actually, and not someone who likes to be center stage and the focus of attention, learning to be comfortable at the front of a class and...

Oleg

That's why you're participating in podcasts and teaching students, right?

Ryan

That's right. It's a really good skill to be comfortable speaking extemporaneously as well. To be able to speak and think at the same time is the real muscle. For the last few years, I have been training that muscle, and, frankly, it feels really good to have that level of comfort and confidence when speaking. So, I find that besides the content itself, just the act of teaching, and preparing, and thinking through how to explain something in real time often is just really, I think, an essential skill for an entrepreneur. But when the content is also very relevant to what I do at Shovels, that also feels very additive, I guess. Like, it becomes more of a superpower than a burden or a distraction. So yeah, I continue to teach. I have been able to make it work. Just something that I've become pretty comfortable with doing.

Oleg

Data engineering plays a crucial role in Shovels' operations. Could you provide insights into the specific data engineering challenges you have encountered and how you have overcome them to ensure the accuracy of the platform?

Ryan

With Shovels, we are very engineering-heavy. This is a tremendous amount of very dirty data. The building permit information is unstructured at the source. It is just scraped off of HTML pages. So, there isn't an RSS feed of permits that you can tap into. You essentially have to collect the HTML, parse the HTML, and then normalize the schema, so you're mapping certain fields into a master standard schema so that the building permits in the different jurisdictions will all look the same. Even once you have that, you will realize that the values in the JSON are inconsistent. So, you might have a string where you expect there to only be integers or vice versa. So, that's a challenge, dealing with all the data typing issues. Building permits have dates. The date data type is a lot easier to deal with. There are only so many ways that dates will be formatted, and it's still kind of a tricky thing to deal with, but that is a much easier problem than addresses. So, permits will have addresses, but all over the world, addresses are complicated. They are very complicated in the United States, very inconsistent. Prefixes, suffixes, numbers in the front, numbers in the back. The permit addresses will be incomplete. They'll be missing a city. They'll be missing a zip code. And we pride ourselves on having very, very clean addresses. So, there is a lot of data science that goes into parsing, interpreting, understanding, and matching these partial addresses to databases that we have access to that include properly formatted address strings. So, we got really good at parsing addresses. There's a really interesting challenge in classifying permits. We just started using LLMs to do this. Previously, it was an NLP or machine-learning-based system to classify permits. We're finding that the LLMs are a lot better at understanding context, especially when there are multiple objects described in the permit.
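
The schema mapping, data typing, and date handling described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not Shovels' actual pipeline: the jurisdiction names, source field names, master schema fields (`permit_id`, `issue_date`, `job_value`), and the list of date formats are all hypothetical.

```python
from datetime import date, datetime

# Hypothetical per-jurisdiction mappings from source fields to an assumed master schema.
FIELD_MAPS = {
    "city_a": {"PermitNo": "permit_id", "IssuedDate": "issue_date", "JobValue": "job_value"},
    "city_b": {"permit_number": "permit_id", "date_issued": "issue_date", "valuation": "job_value"},
}

# Assumed set of date layouts seen at the source; a real list would be longer.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def parse_date(raw):
    """Try each known format in turn; return None if nothing matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date()
        except ValueError:
            continue
    return None


def parse_int(raw):
    """Coerce values like 12500, '12500' or '$12,500.00' to an int, else None."""
    if isinstance(raw, int):
        return raw
    cleaned = str(raw).replace("$", "").replace(",", "").strip()
    try:
        return int(float(cleaned))
    except (ValueError, OverflowError):
        return None


def normalize(jurisdiction, record):
    """Map one scraped record onto the master schema and fix its types."""
    out = {"jurisdiction": jurisdiction, "permit_id": None, "issue_date": None, "job_value": None}
    for source_field, master_field in FIELD_MAPS[jurisdiction].items():
        if source_field in record:
            out[master_field] = record[source_field]
    if out["permit_id"] is not None:
        out["permit_id"] = str(out["permit_id"]).strip()
    if out["issue_date"] is not None:
        out["issue_date"] = parse_date(str(out["issue_date"]))
    if out["job_value"] is not None:
        out["job_value"] = parse_int(out["job_value"])
    return out


record_a = normalize("city_a", {"PermitNo": " B-101 ", "IssuedDate": "07/30/2024", "JobValue": "$12,500.00"})
record_b = normalize("city_b", {"permit_number": 555, "date_issued": "2024-07-30", "valuation": 12500})
```

In this sketch, values that cannot be parsed become `None` rather than raising, so that one bad field does not discard the whole permit. That is a design assumption, not something stated in the interview.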

If you just used a pattern match like a regex or even language processing, there's a nuance that NLP doesn't pick up that an LLM does. For example, if a permit describes building a new fence surrounding a pool, you might classify that in the old system as a pool permit, but the LLM classifies it accurately as a fencing permit because the fence is around the pool. It's a new fence. It is not a new pool. The pool is kind of irrelevant to the permit itself. The LLM knows to ignore that and just focus on the fence. That's something that we found that's really interesting. And then finally, a lot of data work goes into contractor deduplication and essentially creating a contractor entity off of a pretty messy string. It's just kind of a blob of text that represents a contractor, which may include a name, may include an address, possibly a license number, maybe an email, maybe a phone number. Separating out all of that metadata, deduplicating it based on some sort of logic or rules, doing fuzzy string matching, because we don't want to call a contractor distinct if it's just off by a comma or a bit of punctuation or a typo. There are different ways to do fuzzy string matching. At the end of the day, we cluster all of the contractors that seem to be the same entity, the same company, and then assign that all one contractor ID, fill in all of the missing data from the different permits. So, it's kind of like use the other permits to fill the holes and then associate it back to all of their related permits. It's a big data engineering job. So, there are two people on our team who do all of this work. Proud of the work that they're able to accomplish with the small team.
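
The contractor deduplication described above (fuzzy matching, clustering, assigning one contractor ID, and filling holes from other permits) might look roughly like this. It is a minimal sketch under stated assumptions, not Shovels' method: the record fields (`permit_id`, `name`, `license`, `phone`), the 0.9 similarity threshold, and the choice of Python's standard-library `difflib.SequenceMatcher` as the fuzzy matcher are all illustrative.

```python
from difflib import SequenceMatcher


def normalize_name(name):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    kept = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(kept.split())


def same_contractor(a, b, threshold=0.9):
    """Hypothetical rule: equal license numbers, or very similar names."""
    if a.get("license") and a.get("license") == b.get("license"):
        return True
    ratio = SequenceMatcher(None, normalize_name(a["name"]), normalize_name(b["name"])).ratio()
    return ratio >= threshold


def cluster_contractors(records, threshold=0.9):
    """Group records with union-find, then merge each group into one entity."""
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    # Pairwise comparison is quadratic; fine for a sketch, not for millions of rows.
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if same_contractor(records[i], records[j], threshold):
                parent[find(j)] = find(i)

    clusters = {}
    for i, rec in enumerate(records):
        clusters.setdefault(find(i), []).append(rec)

    entities = []
    for contractor_id, members in enumerate(clusters.values(), start=1):
        merged = {"contractor_id": contractor_id, "permits": []}
        for rec in members:
            merged["permits"].append(rec["permit_id"])
            # Fill holes: keep the first non-empty value seen for each field.
            for field in ("name", "license", "phone"):
                if rec.get(field) and not merged.get(field):
                    merged[field] = rec[field]
        entities.append(merged)
    return entities


records = [
    {"permit_id": "P1", "name": "Acme Roofing, Inc.", "license": None, "phone": "555-0100"},
    {"permit_id": "P2", "name": "ACME ROOFING INC", "license": "L-77", "phone": None},
    {"permit_id": "P3", "name": "Bay Electric", "license": "L-12", "phone": None},
]
entities = cluster_contractors(records)
```

Here the first two records collapse into one entity that carries both the phone number from P1 and the license number from P2, while the third stays separate. At real scale, a blocking step (comparing only records that share a key such as a zip code) would usually replace the all-pairs loop.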

Oleg

That sounds impressive. As technology continues to advance, what role do you see data analytics and artificial intelligence playing in the future of the construction industry?

Ryan

Yeah. Like most industries, AI will have a huge impact on the entire field. Really, it's everything from the designing of plans to interpreting and understanding plans, structuring data on existing plans. I bet the construction itself will be driven at some point by AI-driven, autonomous construction equipment. We'll definitely see that in the future. It's already starting to form in certain types of discrete tasks, even using drones or AI robots to paint the outside of buildings. Painting especially large industrial buildings can be very expensive with human labor, but it gets a lot easier with drones and some sort of even autonomous AI component of these robots and drones. I have been amazed at what the field is evolving into just in the year and a half that I've been paying attention to it. There's a ton of new what we call prop tech, which is property technology, and construction tech companies. By the way, the distinction there is kind of interesting: prop tech, construction tech. Like, where does one end? Where does the other begin? The idea is that construction tech is about anything that happens when the building is still empty. And prop tech is when the building is occupied. And you can see those are sort of two different sets of problems and opportunities. Construction tech is very much about the building up, the erection of a building: all of the actual hard work that goes into it, the physical work, but also the planning, the finance, the engineering, in some ways, the permitting, the land acquisition, the zoning. All of that falls under construction tech. And prop tech, or property tech, usually we kind of think of that as the interface between the occupants and the building, certain types of appliances, services that the homeowners or the building owners will want or need.
In the climate space, this is distributed energy resources, getting some of these appliances to interact with the grid, whether that's charger software, HVAC software, water heater software, the convergence of IoT here, the internet of things. All of these appliances in modern buildings are tapping into the internet, therefore tapping into the grid, but also creating a stream of data that can be analyzed. And there's going to be AI there too. So yeah, unsurprisingly, AI use cases are everywhere in this field.

Oleg

Definitely. Given your expertise in analyzing labor market dynamics, what insights can you share about the root causes of tech talent scarcity?

Ryan

I think the scarcity is driven by demand, frankly. It's just that everybody needs good tech talent. And from what I see, from my vantage point, even at a community college, there's a lot of interest in learning JavaScript and Python, at least Python being still the language du jour for most data science and AI applications. So, that's what the kids are learning. And I see that they also like to do all the JavaScript stuff on the front end. But the community college computer science students that I interact with, they're doing Python and JavaScript, and they're getting hired. That speaks to the strong demand for this particular skill set in the labor market. And to the extent that there's scarcity, it's just because there are so many companies out there looking to hire.

Oleg

As Shovels currently relies solely on an in-house dev team, what considerations led to the decision not to outsource tech needs? And what advantages do you perceive in keeping these responsibilities internal?

Ryan

To be clear, our data engineering is in-house. We have outsourced DevOps, the orchestration with Amazon Web Services. Our job orchestrator, it's like Prefect, and we use Jenkins. I'm going to get this wrong. Maybe Jenkins is the orchestrator, and Prefect is something to do with all of the background jobs. I forget. That's not where I spend my time on Shovels. But we have one outsourced person who actually has been a contractor for Shovels for a while. The thinking is we don't need him full-time, so it's just more of a part-time need for now. And he prefers to be a consultant. So, that's one area that we did outsource. Another one is in application development. We prefer to use a platform, and, well, I guess it's a language called Flutter, a platform called FlutterFlow. And at least for right now, that's not a skill that we need to have in-house, so we're outsourcing that at the moment. But the core Shovels, like our expertise, is in data science, data engineering. I would argue that we should not outsource core knowledge that is really what we are about, and that's what we want to be best at in the market. And so, therefore, that needs to be a skill set that we just maintain internally.

Oleg

What key factors should companies consider when determining whether outsourcing tech needs to a vendor is right in their specific circumstances?

Ryan

If you go through an agency or a vendor, I've actually always been a fan of getting to know the individual engineer, like the actual person who is going to be doing the work, and having a direct relationship with them. We don't mind working with the agency. We understand the value of management, but we still want to have a direct line to the person doing the work. That just has worked well for us, helps with communication, also helps with the longevity of the relationship. I also believe that having the outsourced developers join your accounts, like your platform, join your GitHub, and work within environments that you own is important. It also just makes it a little easier to transition if it doesn't work out. I've seen some cases where, not so much with agencies, but I've seen it with some individual contributors, where the customer gets locked out of the GitHub account or whatever the platform is because they weren't the admin. The contractor was the admin, and then it became a messy situation.

Oleg

Reflecting on your business operations, are there specific tasks that you believe could potentially benefit from outsourcing in the future? And what criteria would you use to determine their feasibility for external collaboration?

Ryan

Yeah, there could be some other types of very, very specialized DevOps or even AI-type work where it would make sense to bring on an agency or consultants. I think we'd keep following the model that we've been following, where, for example, the DevOps consultant that we have right now just has a level of expertise in using Amazon Web Services that we just don't have and won't have, and that's when hiring an expert really makes sense. So, I could see in the future, if it's like a very, very specific use for AI, or we determine that we need to learn a certain technology or use a certain technology, and we don't know how to make it fit in our system, we'd hire someone to solve that for us.

Oleg

Thanks. And finally, what advice would you give to aspiring entrepreneurs looking to make an impact in their respective industries based on your own expertise with Shovels and beyond?

Ryan

My advice is to show up at the right conferences, physical places, network, and be very curious and enthusiastic about the work that you're doing, and then, very quickly, the relevant members of the network will find you and embrace you, because the hardest thing is getting a foothold in a new market and a new network. So, you do that by networking, and putting yourself out there, and being open about what you're working on. I think that's my best advice.

Oleg

Thank you. Thank you very much, Ryan, for joining me today, sharing your experience, your insights, your vision. I'm sure it will be very useful to my audience. Thanks for joining me today.

Ryan

You're welcome.

Oleg

If you enjoyed our discussion and want to stay updated on future episodes, don't forget to subscribe and hit the notification bell. That way, you will not miss out on the latest insights and conversations from Devico Breakfast Bar. See you in a week!
