Site Reliability Engineer (Berkeley) Job at Bay Systems Consulting, Berkeley, CA

  • Bay Systems Consulting
  • Berkeley, CA

Job Description

Overview

Site Reliability Engineer (SRE) role at Bay Systems Consulting. Location: Berkeley, CA (Onsite at Lawrence Berkeley National Laboratory). Employment Type: 5-6 Month Contract (Extension Possible). Pay Rate: $80/hr + Full Benefits (Medical, Dental, Vision, 401k). Employer: Bay Systems Consulting.

About the Role: Bay Systems Consulting is seeking a Site Reliability Engineer (SRE) to support the National Energy Research Scientific Computing Center (NERSC) at Lawrence Berkeley National Laboratory. NERSC's mission is to accelerate scientific discovery through high-performance computing and data analysis for the U.S. Department of Energy's Office of Science. As an SRE in the Operations Group, you will help ensure the accessibility, reliability, security, and availability of world-class HPC systems that support over 10,000 scientific users. You will work with state-of-the-art monitoring systems (such as OMNI), respond to real-time alerts, automate processes, and improve reliability for mission-critical infrastructure.

Responsibilities

  • Monitor and support NERSC's HPC facility as part of a 24x7 operations team (including some overnight OWL shifts).
  • Respond to alerts from computer systems, storage, networks, and data center infrastructure by triaging issues or engaging on-call staff.
  • Develop automation to handle routine service conditions and improve system efficiency.
  • Maintain and enhance monitoring tools, pipelines, and alerting systems.
  • Create and maintain scripts and software to integrate HPC system APIs into monitoring pipelines.
  • Collaborate with cross-functional NERSC groups to coordinate maintenance activities and manage diagnostic software.
  • Document and track outages, incidents, and maintenance in the ticketing system.
  • Troubleshoot and resolve diverse technical issues involving HPC, networking, and infrastructure.

Qualifications

  • Required (Level 2): Bachelor's degree in Computer Science, Engineering, or related field (or equivalent work experience).
  • 5+ years of related experience (or 3+ years with a Master's).
  • Strong Linux/Unix administration and command-line skills.
  • Proficiency with programming/scripting languages (Python, C/C++, Perl, Java, or similar).
  • Experience supporting highly available systems in large-scale data centers.
  • Familiarity with networking, firewalls, ACLs, and network protocols.
  • Knowledge of automation and monitoring tools (e.g., Kubernetes, Prometheus, Alertmanager).
  • Strong troubleshooting and communication skills.
  • Preferred (Level 3): 8+ years of relevant experience (or 6+ years with a Master's).
  • Expertise in software development and monitoring pipeline design.
  • Experience leading technical projects and mentoring junior staff.
  • Advanced knowledge of data center management technologies.

Job Tags

Part time, Contract work, Work experience placement, Work at office, Night shift
