FriscoRecruiter Since 2001
the smart solution for Frisco jobs

Lead Site Reliability Engineer

Company: Gearbox Software, LLC
Location: Frisco
Posted on: September 19, 2023

Job Description:

Site Reliability Engineer, Spark Lead
Location: Frisco, TX
Time type: Full time
Job requisition ID: JR100142

The Gearbox Entertainment Company is an award-winning creator and distributor of entertainment for people around the world. Gearbox Entertainment develops and publishes products through its subsidiaries, Gearbox Software and Gearbox Publishing. Gearbox Entertainment has become widely known for successful game franchises including Brothers in Arms and Borderlands, as well as acquired properties Duke Nukem and Homeworld. Gearbox's ambition is to entertain the world, and its key driving objectives include the pursuit of happiness for our talent, partners and customers, the prioritization of entertainment and creativity, and a measured respect for profitability. For more information, visit www.Gearbox.com.

To further drive our vision of premier stability and rapid feature delivery, we are looking for a Lead Site Reliability Engineer (SRE) to join our team. As a Lead SRE, you should feel exceptionally comfortable bringing architectural design proposals to the table for consideration among your colleagues on our platform and infrastructure development teams. You will be one of the principal technical designers helping push our cloud-native platform toward the future. You will be responsible for driving the implementation of flexible cloud architectures with an automation-first emphasis; manual user intervention likely makes you uneasy and maybe even a little twitchy.

We would expect a successful candidate for this position to be a self-starter with the ability to complete tasks independently. Though you will have technical leadership and senior engineers at your disposal, you should feel well acquainted with tackling complex problems without significant oversight.

Observability is paramount. If we can't measure it, we can't prove it works; if we can't prove it works, we must assume it doesn't. This is a philosophy you hopefully love (and preferably obsess over). If we can't observe how a new feature is behaving, our SRE team is excited to dive into the application code and make the necessary improvements.

Typical Day

TL;DR: You will be leading and managing a team of SREs, driving the ownership of observability libraries and the implementation of flexible AWS Cloud architectures with an automation-first emphasis, collaborating with other teams, and working on solutions to technical challenges in microservice availability for our online services.

This is a people management role with a mix of hands-on lead engineering expectations. Your days will primarily be filled with leading a team of seasoned engineers, empowering them to build solutions to technical challenges in the observability and availability of our SHiFT online services. You will evangelize for and be obsessed with user experience as it relates to the services you support. You will help manage and orchestrate these services by leaning heavily on technologies like Go, Terraform, Docker, and Bash. On any given day, you should expect to spend at least 25% of your time actively engineering and developing solutions; the remaining time should be a mixture of work planning, team mentoring and pair programming, reviewing code from engineers on your team, participating in design meetings, documentation, and self-development.

This position will eventually require you to carry a company-paid mobile device and participate in 24/7 on-call rotations alongside your engineering colleagues. Don't worry though, our on-call experience doesn't suck.

Core Responsibilities:

  • Lead and manage the day-to-day operations of a team of 3-5 SREs, including road-mapping, task assignments, and performance evaluations.
  • Mentor and train your team in observability best practices and foster a culture of continuous learning and improvement.
  • Lead incident response efforts and troubleshoot critical issues to minimize downtime and maintain high availability of systems.
  • Design and implement solutions for monitoring, alerting, and incident response to proactively identify and resolve issues.
  • Be a trusted voice in the evangelism of reliability engineering throughout the team with an eagerness for mentoring.
  • Work with technical leadership to help define and oversee short- and mid-term project roadmaps.
  • Participate in after-hours on-call support rotations.

Must Have (the non-negotiable parts):

  • Experience leading and managing teams in a Site Reliability Engineering or related role.
  • Minimum of 4 years of professional software development experience instrumenting complex observability stacks, preferably in Go.
  • Minimum of 2 years of experience with containers in a professional setting, preferably Docker.
  • Strong understanding of microservices architecture and its associated challenges.
  • Proficiency in AWS container management, orchestration, and observability features (ECS, Fargate, Aurora, AppConfig, CloudWatch, etc.).
  • Professional experience with Terraform and/or CloudFormation.
  • Adept understanding of observability stack management (OTel, tracing, monitoring, alerting, structured logging, APM, etc.).
  • Strong leadership and communication skills: able to lead and mentor other engineers, clearly detail designs and implementations, and effectively communicate with cross-functional teams.
  • Demonstrated experience in driving and leading incident response, incident management, and post-incident review processes.

Should Have (some wiggle room):

  • Extensive hands-on experience with OpenTelemetry.
  • Hands-on experience developing and maintaining CI/CD pipelines, preferably in Git/GitLab.
  • Understanding of RESTful and WebSocket-based APIs.
  • Bachelor's degree in computer science or a related field, or equivalent training and professional experience.

Now you're just showing off:

  • Familiarity with Datadog.
  • Familiarity with Atlassian products (Opsgenie, Jira, Confluence).
  • Experience working with developers in an agile environment.
  • Experience in the games industry, preferably launching multiple online-enabled AAA titles.
  • Knowledge about Gearbox-owned IPs.

Gearbox Entertainment believes that all team members should be able to enjoy a work environment free from all forms of discrimination and harassment. We are committed to reflecting the diversity of the world we strive to entertain. As an Equal Opportunity Employer, we provide fair and equal treatment to all team members and applicants. We do not discriminate on the basis of race, color, religion, sex, sexual orientation, gender identity or expression, national origin, disability, genetic information, pregnancy or maternity, veteran status, or any other status protected by applicable national, federal, state or local law.

About Us

Gearbox Studio Québec is an independent AAA game developer. We strive to maintain balance between ambitious projects, a human-scale team, and self-realization. Gearbox Quebec strives to make game development a high-level professional experience. In the heart of beautiful Quebec City, in the shadow of its ramparts, the studio is located in St-Roch, a district boasting a rich and inspiring cultural life close to a host of restaurants, bars, theaters and concert halls. We offer highly advantageous conditions, including flexible time management, a substantial employer RRSP contribution, and one of the most generous profit-sharing policies in the industry.

Keywords: Gearbox Software, LLC, Frisco, Lead Site Reliability Engineer, Professions, Frisco, Texas
