This blog contains Part 3 of the Cybersecurity for Software as a Medical Device blog series, which features an interview with Bruce Parr, a DevSecOps leader and innovator at Paylocity.
Brett:
OK Bruce, so those are the success stories. But it can’t have been all roses and sunshine. What didn’t work? What were some of the hard lessons learned that you picked up along the way?
Bruce:
One thing that seems obvious, but it’s easy to forget, is that in any organization, there’s always going to be pushback. It’s part of human nature. I always tell this story, because I think it’s a great story from when I was first training at Paylocity to be a team lead.
First, a little background. In our organization, a team lead is basically a scrum master for a product or tech team. They run the team along with the product owner. In this story, we were presenting the performance fixes we had been working on at a product briefing. A senior executive was in the product briefing, and said, “Listen, I appreciate everything that your team is doing to improve the performance of the product. But I can’t sell performance. I can only sell features.”
What I took out of it was that performance is part of quality, but you can’t sell it. It’s the same way you can’t sell security, even though security is a part of quality. And quality is non-negotiable.
So how do I deal with pushback? When I go to the product owners and the team leads, I make the case that being proactive and attentive to security actually lets them develop more features. We can prove to them that they can move faster if they attack security early. If they build security into design, instead of waiting until the software is already built to address security, they won’t have that security technical debt coming due just when their features are otherwise ready to go live.
I explain that it may take an engineer a half hour to address the security concerns of the story before they develop it. If it isn’t addressed now, and the team develops it without planning for security, then it comes back around as a security debt a month later. That debt takes six hours to fix, which is a 12x difference! They are losing six hours of development time to drop what they are doing and fix this security hole, pronto, and that’s not including the testing requirements. Oh, and by the way, you just completely destroyed your sprint cycle with this priority work that you had nowhere on your backlog.
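To make that arithmetic concrete, here is a minimal sketch. The half-hour and six-hour figures come from the interview; the function names and the story count in the usage note are illustrative assumptions, and the calculation assumes every story that skips the up-front review comes back as debt.

```python
# Figures quoted in the interview.
UPFRONT_HOURS = 0.5  # addressing security concerns before development
REWORK_HOURS = 6.0   # fixing the same issue later as security debt


def cost_ratio(upfront: float = UPFRONT_HOURS, rework: float = REWORK_HOURS) -> float:
    """How many times more expensive the late fix is than the early one."""
    return rework / upfront


def hours_saved(stories: int, upfront: float = UPFRONT_HOURS,
                rework: float = REWORK_HOURS) -> float:
    """Hours saved across `stories` stories, assuming (hypothetically) that
    each one would otherwise have come back as security debt."""
    return stories * (rework - upfront)
```

With these numbers, `cost_ratio()` gives 12.0, and `hours_saved(4)` gives 22.0 hours for a hypothetical sprint with four such stories, before counting testing or the disruption to the sprint.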
We make that case, and then we say, “OK, you seem skeptical, but just try it for a few sprints to see if it works. Just see what happens.” So they do (which is likely also in part because we have the credibility of being software engineers and are not seen as some kind of ivory tower auditor or outsider). What they see pretty quickly are a few things that are all clear in the data.
First, they notice that the number of security bugs that are coming back is dropping. Second, they notice that the average complexity of the security debt that is coming back is getting lower. The bugs that are coming back are not nearly as difficult as what they used to be.
When you multiply a lower number of security bugs by a lower complexity per bug, that’s a big gain in time, time that’s available to do what they really want to do, which is to write and ship great software. Teams quickly start to appreciate that this is good for them, even in the most basic terms of the ratio of their time they get to spend doing the coding they love versus the time spent on annoying fixes. Thirdly, they find that by designing for security, and coding for security up front, they have a new outlet to be professionally creative.
A good example of that is a cross-site scripting (XSS) attack[1] vulnerability in code. For teams that have embraced security, when they find an XSS bug, they say, “Well, I have to fix it here, so while I’m at it, is there any way I can create a universal fix so I don’t see this same issue pop up elsewhere in the code?” And instead of fixing it in one place in the code, they may take another look and say, “If I put in a request filter or a response filter, then every single request that comes in is going to get scrubbed for cross-site scripting, no matter where it is in the API.” With one fix they’ve solved the immediate problem and also prevented future debt and vulnerabilities.
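As a rough illustration of the "one universal fix" idea, the sketch below wraps request handlers so that every string in an incoming payload is HTML-escaped before the handler sees it. This is not Paylocity's filter, which the interview does not show. The names `scrub`, `with_scrubbed_input`, and `save_comment` are hypothetical, and the payload is assumed to be already-decoded JSON (dicts, lists, strings, numbers). Escaping on input is only one layer; output encoding still depends on where the data is rendered.

```python
import html
from typing import Any, Callable


def scrub(value: Any) -> Any:
    """Recursively HTML-escape every string value in a decoded payload.

    Dict keys are left untouched; only values are escaped.
    """
    if isinstance(value, str):
        return html.escape(value, quote=True)
    if isinstance(value, dict):
        return {key: scrub(item) for key, item in value.items()}
    if isinstance(value, list):
        return [scrub(item) for item in value]
    return value  # numbers, booleans, None pass through unchanged


def with_scrubbed_input(handler: Callable[[Any], Any]) -> Callable[[Any], Any]:
    """Wrap a handler so it only ever receives a scrubbed payload."""
    def wrapped(payload: Any) -> Any:
        return handler(scrub(payload))
    return wrapped


@with_scrubbed_input
def save_comment(payload: dict) -> str:
    # Hypothetical endpoint: returns the comment text it would store.
    return payload["comment"]
```

Calling `save_comment({"comment": "<script>alert(1)</script>"})` returns `&lt;script&gt;alert(1)&lt;/script&gt;`. Because the scrubbing lives in the wrapper, any new endpoint that uses the decorator gets the same treatment without repeating the fix.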
A fourth thing happens that builds on this third one. Because of the way that we’ve built collaboration and prestige around secure software as a core part of quality, engineers want to share their security fixes outside of their own team. They usually do a presentation to the rest of the teams and explain how they got rid of this particular form of technical debt once and for all. The other teams look at it and say, “Great idea! We’re going to put a story on the board so that we never have to address this kind of XSS again.”
This is the “lather, rinse, repeat” for how our teams operate: figuring something out and distributing the knowledge to all the other teams. These fixes wind up being presented to a larger group, whether it’s our DevOps Community of Practice, our weekly TGIT (Thank God it’s Thursday) team meetings or internal technical conferences. That’s all baked in as a part of our internal culture.
Brett:
To ask a more provocative question: Do you know of a business, an organization or a team where you wouldn’t want to take this approach?
Bruce:
I can’t think of a team where you wouldn’t want to do it, honestly. There are some places and contexts where it wouldn’t work, but it depends on the particular project. Almost all of our projects that follow this method are public-facing projects, with public-facing interfaces and back ends. But what if you’re working on something that’s strictly internal? In that case, you may want to scale it down. You probably want to address some of the security concerns, but you don’t need to cover all of them.
Let’s say, for example, that you have a situation where you have an internal application, but you’re setting configuration for devices everywhere. No public user, no non-employee, is ever going to get access to that application, but the data you’re providing is being consumed by devices outside of the organization. Certainly, you’d still want some security on that to handle situations like XSS input validation or SQL injection. You’ve got an external consumer that’s depending on the application, and there could be ramifications if they get bad data.
I’m not saying that the internal employees intentionally create bad data. But you just don’t know. If somebody gets on your network and they get into the database or make changes to the API – that’s exposure. So pursuing security even for internal projects is still worth it.
Randy:
What are the cultural assumptions that underlie this? Like, if you don’t have this already in place in your company culture, you’re probably in for a rough road? What other problems in your organization do you have to tackle first before this is going to be effective?
Bruce:
The cultural piece definitely has to come first. You have to have executive buy-in. Your CEO and your top executives have to say, “Here’s where we stand from a standpoint of risk right now. This is what happens if we get breached. This is what our financials look like if we get breached. This is what happens to our employees if we get breached. This is what happens to our customers if we get breached.” From a cultural perspective, it has to be a beneficial viewpoint that they are sharing. It can’t be a punitive viewpoint. Executives have to buy in first, so they can ensure accountability through positive behavior modification – which is exactly what we did with the maturity models and the awards.
We talked about maturity models, but awards are a whole other aspect of buy-in through gamification. When an engineer gets certified as a ninja or champion, that goes companywide. They get recognized on our entire community channel. We even send them swag. It’s a whole big deal that’s baked into our culture.
Likewise, when teams go above and beyond, like that post that I put out on LinkedIn, they all get recognized on our Community board. I make sure I post and give them a big shout out so that everybody sees it. The more that we do that, the more that becomes the expected behavior, which is far better than a punitive approach. It’s that kind of reward atmosphere that our employees really respond to.
The other big thing that we do is our internal award system called “Impressions.” If somebody goes above and beyond in their role, for example with security or something related, I or any one of us can nominate that person for an impression. That impression goes on our community Slack channel, it goes into their employee record and it automatically gets pulled into their quarterly review. It’s a big deal. That’s the whole culture of positive reinforcement that we embrace as a company and then that gets embraced in application security.
Randy:
In your story, you’ve talked about the benefit of making security everyone’s problem. When it becomes part of the flow, you can better the tools that are part of the process. You don’t have to stop and think about security; you’ve empowered people as ninjas and champions to fix these issues. Can you give me an illustration of a success story since making security every team’s responsibility?
Bruce:
A year and a half ago, I was thinking to myself about the whole DevSecOps “shift left” approach, which is what we call moving security to the earliest part of the development cycle. One of the challenges that we had (and this is true in every company) was keeping security training relevant in our engineers’ minds. Without security training on a regular basis, it’s going to go out of your head.
Let’s say software and test engineers go through security training once when they are onboarded and then maybe once a year after that. How likely is it that a software engineer who did security training in January is going to remember how to handle a security story that comes around in September? That engineer has to dig around. They have to look at a story and ask, “How does the security learning I did all the way back in January apply to this story in front of me now?”
I was thinking, how could we address this issue in such a way that our software and test engineers don’t need repeated training? How could we make it so that we don’t require them to retain information that’s not in their wheelhouse? For security content – targeted, precise information – that’s not in their wheelhouse, how can we provide it to them at the exact time they actually need it?
The DevSecOps team came up with this idea of a risk advisory framework. It’s an internal web application that we built and that ties into our ticket system. Our engineering teams can go to the risk advisory framework and pull in their sprint. They look at all the stories in their sprint and put each story through an interview. The interview is almost like a game of Go Fish, but for software engineers. It asks a series of questions about the story.
All of these are simple “yes” or “no” questions and, from the software engineer’s perspective, are not security-centric.
Let’s say an engineer has a story that they’ve identified has string input. When they answer “yes” to that question and click submit, we add the information to that story that has all the specific guidance that they’re going to need. It actually links to the specific guidance, so we can change it or update it as needed.
Now, at the story stage, before they’ve even laid hands to code, they don’t have to go to the security training wiki and try to figure out where the hell that information is again. They have it when they need it because we provide it in the flow of their work. They don’t have to remember training from nine months ago. They don’t have to ask themselves, “Do I need access to XSS? What’s an IDOR?”[2] They don’t have to remember the security nomenclature or acronyms. The tool will provide it to them, so when they point their stories as part of the software development life cycle or Agile process, they can include the security information right in the story.
Going through this interview may reveal that the story isn’t a three point story any more. Maybe it’s a five point story. That’s two additional points of work, but that’s far easier to deal with now than all the security debt that would have been created otherwise. To my knowledge, that tool doesn’t exist anywhere.
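The interview flow described above can be sketched in a few lines. This is a guess at the shape of such a tool, not the actual risk advisory framework. The question ids, the `Story` type, and the guidance URLs are all hypothetical. Only the "string input" question is mentioned in the interview.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical mapping from a "yes" answer to guidance links.
# Storing links rather than copied text means the guidance can be
# updated later without touching the stories that point to it.
GUIDANCE: Dict[str, List[str]] = {
    "accepts_string_input": ["https://wiki.example.invalid/security/input-validation"],
    "looks_up_record_by_id": ["https://wiki.example.invalid/security/access-checks"],
}


@dataclass
class Story:
    key: str
    title: str
    guidance_links: List[str] = field(default_factory=list)


def run_interview(story: Story, answers: Dict[str, bool]) -> Story:
    """Attach guidance links to the story for every question answered yes."""
    for question, answered_yes in answers.items():
        if not answered_yes:
            continue
        for link in GUIDANCE.get(question, []):
            if link not in story.guidance_links:  # avoid duplicates on re-runs
                story.guidance_links.append(link)
    return story
```

For example, answering yes to `accepts_string_input` and no to `looks_up_record_by_id` leaves the story with exactly one link, the input-validation guidance. The engineer never has to know that the underlying concern is called XSS.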
Brett:
So do you sell this approach any differently to the product owners than you do to the team leaders?
Bruce:
My main comment to the product owners is: “Try it.” We’ve gotten some initial pushback to the tune of, “This is one more thing that I have to do.” But my point to the product owners is that you don’t have to do it. Your software engineers can do this. We designed it for them, so they can go in and they can do the story work before you ever have the sprint plan.
The end result is that you have more accurate sprint planning that reflects putting security right into your development and design, and you have less technical debt on the backend. And it has the added benefit of making your software engineers familiar with the information because they’re doing this six sprints in a row.
Now your engineers don’t have to rely on training from nine, 10 or even 12 months ago. They’re doing this over and over again so it becomes muscle memory. The product owners don’t have to deal with it. And the best part of it is when the directors take a look at their monthly numbers, and they say, “Oh, your numbers are way down! This is awesome! Great job, keep it up team!”
Brett:
Ever heard of a tool called Reek?[3]
Bruce:
I have not.
Brett:
It’s one of the Ruby language frameworks for tools on the command line. It’s more than a static lint checker. It actually looks for anti-patterns in your code. It alerts you when you have something in your code you don’t want to have, say, three attributes going into your method, and suggests you should get rid of them. You can override it, but you have to publicly say, “this is an exception to the rule.” It’d be really fantastic if there was something that you could run on the command line that would do what your tool is doing, but automated. That would be a godsend for the industry.
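For readers who have not used a smell detector, here is a minimal Python analogue of the kind of check Brett describes: flag functions that take too many parameters unless the author has marked an explicit exception. Reek itself is a Ruby tool and works differently; the threshold, the override marker, and the function name below are assumptions made for illustration.

```python
import ast
from typing import List, Tuple

MAX_PARAMS = 2  # hypothetical threshold: three or more parameters is flagged
OVERRIDE_MARKER = "smell: allow-long-parameter-list"  # hypothetical opt-out comment


def long_parameter_lists(source: str,
                         max_params: int = MAX_PARAMS) -> List[Tuple[str, int, int]]:
    """Return (function name, line number, parameter count) for each function
    with more than `max_params` parameters and no override comment on its
    `def` line. *args and **kwargs are not counted."""
    lines = source.splitlines()
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        args = node.args
        count = len(args.posonlyargs) + len(args.args) + len(args.kwonlyargs)
        if count <= max_params:
            continue
        if OVERRIDE_MARKER in lines[node.lineno - 1]:
            continue  # the author has publicly declared an exception
        findings.append((node.name, node.lineno, count))
    return sorted(findings, key=lambda finding: finding[1])


SAMPLE = '''
def ship(order, address, carrier):
    pass

def bill(order, card, currency):  # smell: allow-long-parameter-list
    pass

def ping(host):
    pass
'''
```

Running `long_parameter_lists(SAMPLE)` reports only `ship`: `bill` has the same parameter count but carries the visible override comment, and `ping` is under the threshold.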
Bruce:
I think it’s two different things. I think what we’re after is getting the thought of security into the story before our engineers actually code anything, as opposed to what you’re describing – more or less a static command line scan, which is very rules based. At that point, you’ve already written the code. We want to get security involved before then.
Brett:
Let me tell you how I felt – and how other people felt – about using Reek. Some of the static checking required significant feature changes, changes that could range from very simple, like the example I gave earlier, to those “smells” or anti-patterns that are much more involved and sophisticated.
What would end up happening is that every new person would say, “Oh my god, I hate this tool. I have my piece of code. I tried to check it in, but I can’t get it past the check-in because it’s got a list of 10 of these things.” You end up addressing them, but all the while, you’re thinking about different ways of coding. Next time, you don’t want that same list of 10 issues again. I think there is some value to a static lint checker.
Bruce:
We’re still doing the static scanning, though. It’s essentially the same thing you described, but here’s the difference. What you’re describing, a scan on the command line that the developer runs, is awesome, and we do encourage it. But we also have to have our DevSecOps scans for accountability.
We’re operating on the trust model with our software engineers, trusting that they’re actually running the scans. If we didn’t have a DevSecOps-level scan running and were just trusting our software engineers to do it, maybe they would and maybe they wouldn’t. But from both an accountability and audit perspective, we have to be able to say, “Yeah, we’ve got scan builds running on our CI/CD pipeline that kick off after successful builds of the develop branch.”
Brett:
Yeah.
Bruce:
So that way, we have accountability by instrumenting the process. If the team decides to blow it off, they’re not really blowing it off. They’re still getting the emails, they’re still getting the vulnerabilities and their directors are still seeing it.
Randy:
There are two data points that I would love to ask you about – they can be very specific or more illustrative data points or data stories. What is the macro level? You’ve said, “We’ve gone from X engineers to Y engineers. We are monumentally more secure.” Do you have an estimate of your return on this investment, numbers on productivity or velocity, or time to implementation? Or an estimate of the impact on the budget? You’ve still got just three people running this. You’ve not gone on a hiring spree.
Bruce:
Well, it’s not for lack of want. We’ve got open reqs like crazy. It’s like any other company: it’s hard to find people, and when we do interview people we want to hire, they’re gone before we can even say yes. That’s just part of the market.
From a quantifying perspective, here’s a metric I can give you. We had to orchestrate our SAST builds for onboarding, and the onboarding process is onerous. We have to get the CI/CD build for a team, we have to get their code repository, and we have to figure out how they build their project. Then we have to create a debug version of their project that we can submit to have it scanned.
Originally, we had to work with the teams. We were building in their repository and we were changing their build process. What wound up happening is it would take us weeks to onboard a team into static application scanning. It’s an evolutionary process, like Agile software. You do something, you see that isn’t working, you make small incremental changes, you come back to it.
The bottom line is, we went from one to two weeks for onboarding SAST projects to half an hour from start to finish. From the time that I pick up the ticket, figure out where the repositories and their CI/CD builds are, to building their code out for their scan, to adding the application to the scanner, creating the orchestration in our CI/CD pipeline and updating the database, in most cases it takes me half an hour.
Randy:
Basically, by the time the team goes for their first team kickoff lunch and comes back, they’re up and running.
Bruce:
Yeah. I just had a team hit me up, they want a new static scan. I’ll probably have it done by 10 AM tomorrow. For me, this is second nature. I’m neck-deep in it. I’ve got someone new who’s starting on our team. He’s learning how to do it, and he’s got the guts of it down. He’s already running at an hour per onboarding.
Randy:
So that’s how quickly you can onboard new people?
Bruce:
Our SCA scan literally takes five minutes. I can have a team onboarded in five minutes. I can copy a project, change three variables, run it and show them their results. Setting up a scan, depending on how big it is, takes just five to 10 minutes. They can get their results weekly, nightly, per-build, whatever.
Randy:
You’ve basically got a team in DevSecOps of fewer than ten FTEs who are now supporting this level of infrastructure for hundreds of engineers. Less than one percent, basically?
Bruce:
Yes, it’s awesome. It works, and as we can add people, it’s only going to get bigger.
Randy:
I want to ask about the other story that you shared in the LinkedIn post, where you walked in one morning and everybody had already solved everything. Can you tell me of one person on the team side who has taken ownership in a “high-five” moment?
Bruce:
It’s kind of interesting, because everybody on our AppSec team operates that way. We’re actually incentivised to do it.
I mentor a lot of our new engineers here. I tell them this all the time: If you want to grow within this organization, you have to find a way to make yourself more visible beyond your manager in a positive way. That means taking ownership of projects, doing presentations outside of the group or other activities. For example, I pretty much own the risk advisory framework. I built the SCA scanning. One of my other teammates owns the SAST scanning. Another teammate built out the DAST scanning and ran with it.
The thing that I love the most is the entrepreneurial spirit behind how we build out our stuff. We say, “Oh wow, I have this problem. I need to get faster onboarding or scheduled scanning beyond what the tools can provide.” It’s a ton of workarounds designed to fit our organizational structure. When we get these kinds of challenges, it’s always fun. I’ll tell my manager about it, and the next thing I know the director says to me, “Hey, I just heard you crushed something.” We’ve made our culture a lot of fun.
Brett:
Bruce, thank you so much for your time today. You and your team and entire organization have done some amazing things.
The key takeaway for me is that data security is both a cultural and technical concern. And I’m already starting to imagine how we can elevate the security discussion throughout our entire process. It’ll be interesting to see if we can make it as fun with our teams as it seems to be with yours.
Randy:
Bruce, I want to echo everything Brett just said. We appreciate your time, sharing this journey and your lessons learned with us. We’re very excited to be able to share this with the medical device software space. What’s a cool bonus for you is that it’s very possible that at some point in the future, someone in your life – a friend or family member or colleague – will need to use a medical device that runs on software. There’s a chance now that this interview, once it’s published, is going to be read by someone who takes what they learn from you here today, puts it into action, and that makes the device that your friend or family member or co-worker uses safer and more secure!
References:
1. KirstenS. Cross Site Scripting (XSS) Software Attack | OWASP Foundation. Owasp.org. https://owasp.org/www-community/attacks/xss/. Accessed January 19, 2022.
2. Insecure Direct Object References (IDOR) | Web Security Academy. Portswigger.net. https://portswigger.net/web-security/access-control/idor. Accessed January 19, 2022.
3. GitHub - troessner/reek: Code smell detector for Ruby. GitHub. https://github.com/troessner/reek. Accessed January 19, 2022.