Site Reliability Engineer Job in Riverwoods, Illinois

Site Reliability Engineer

By mccannadmin2018 | 22 March, 2023

Location: Riverwoods, Illinois

Employment Type: Contract

Job ID: 559

Date Added: 03/22/2023


McCann Partners is supporting a financial institution with onboarding a Site Reliability Engineer for a 6-month contract-to-hire engagement. The client will consider both local and remote resources. Remote resources must work central hours (8 a.m. to 5 p.m. CST). Direct, W2 applicants only.

Overview:

Site Reliability Engineers (SREs) are responsible for keeping production systems running smoothly. SREs are a blend of pragmatic operators and software craftspeople who apply engineering principles, operational discipline, and mature automation to our operating environments.

An SRE specializes in systems (operating systems, networks, observability) while continuously implementing best practices to improve availability, reliability, and scalability.

As an SRE, you will:

Develop and run SRE tooling and observability using automation like CI/CD and Kubernetes.
Build monitoring that alerts on symptoms rather than on outages.
Document every action so that your findings turn into repeatable actions and then into automation.
Debug production issues across services and levels of the stack.
Plan the growth and reliability of services.
Use your on-call shift to prevent incidents from ever happening.
Participate in an on-call rotation to respond to “Code Red” incidents and help restore customer-impacting services.

You may be a fit for this role if you have some of these inclinations:

Have the urge to deliver quickly and effectively, and to iterate fast.
Think about systems: edge cases, failure modes, behaviors, and specific implementations.
Feel compelled, as an engineer, to fix anything you see broken.
Have the urge to document everything so you do not need to learn the same thing twice.

Strong knowledge of SDLC (System Development Life Cycle).
Strong knowledge of git, Docker, Kubernetes, Jenkins, AWS (Amazon Web Services), or similar technologies.
Knowledge of configuration management systems such as Chef and Ansible.
Strong programming skills in one or more of the following languages: C, Ruby, Python, or Java.
Good understanding of hybrid infrastructure.

Projects you could work on:

Automation like CI/CD, self-healing of services, end-to-end or performance testing.
Improve monitoring (Datadog, AppD, etc.) and build new smart metrics.
Develop a relationship with a product group and help define their SLO/SLI.
Work directly with AppDev to improve the product's non-functional qualities and production readiness.
Improve operability, latency, capacity planning, change management, and MTTR (Mean Time to Repair).

Leveling of Site Reliability Engineering:

Technical

Configuration management: use Chef and Ansible to manage our infrastructure effectively.
Infrastructure as code: use Terraform and GitLab CI/CD for automation, containerize the environments (Kubernetes), and leverage cloud technologies to meet our goals.
Systems: manage, configure, and troubleshoot operating system issues, storage (block and object), and networking, including VPCs (Virtual Private Clouds), proxies, and CDNs (Content Delivery Networks); administer high-availability PostgreSQL and Redis clusters.
Monitoring and instrumentation: implement metrics in Prometheus and Grafana, log management and related systems, and Slack/PagerDuty integrations.
Engineering practices: availability, reliability, scalability, and disaster recovery.
Use git and contribute code.
Experience coding in one or more languages: C, Ruby, Python, Shell, or Java.

Execution

Planning: familiar with agile methodologies; use epics and issues to drive projects.
Organization: organize your workload and lead OKRs (Objectives and Key Results).
Management: a manager of one, able to self-organize and report asynchronously.

Collaboration and Communication

Lead and contribute to the scoping and design of issues, epics, and OKRs (Objectives and Key Results).
Contribute to the Handbook by creating and updating runbooks and general documentation, and by writing blog posts.
Complete Root Cause Analysis (RCA) investigations and perform readiness reviews.
Improve team practices through code reviews, work handoffs, and incident response.

Influence and Maturity

Knowledge sharing and mentoring.
Self-awareness, handling conflict in the team, and providing and receiving feedback.
Maintaining good relationships with other engineering teams that help improve the product.
Accountability: willing to proactively step in and do the right thing while providing candid and constructive feedback.

Site Reliability Engineer:

Technical

General knowledge of four technical areas of expertise, with deep knowledge of one.
AWS Cloud Practitioner-level knowledge; provisioning and configuring resources through the CLI/API.
Chef (basic syntax, recipes, cookbooks) or Ansible (basic syntax, tasks, playbooks).
Working knowledge of CI/CD, Jenkins, Nexus, pipelines, and jobs.
Basic understanding of Kubernetes, including its CLI (Command Line Interface) and service re-provisioning.
Provision and set up metrics in AppD, Grafana, or Datadog.
Provision and set up logging, with queries for frequently asked questions.
Networking: VPCs, proxies, and CDNs (Content Delivery Networks).
Working knowledge of git.

Execution

Provide emergency response by being on call, reacting to symptoms surfaced by monitoring, and escalating when needed.
Propose ideas and solutions for debugging, optimizing code, and automating tasks.
Plan, design, and execute solutions within Card/Bank to reach specific goals agreed upon within the team.
Plan and execute configuration change operations at the application and infrastructure levels.
Actively look for opportunities to improve the availability and performance of the system by applying the learnings from monitoring and observation.
Experience designing, analyzing, and debugging distributed systems.

Collaboration and Communication

Self-organize through issues and epics.
Improve documentation, whether application documentation or runbooks, by explaining the why rather than stopping at the what.
Perform root cause analysis and drive corrective actions.

Influence and Maturity

Share learnings publicly through issues, runbooks, documentation, and blog posts.
Contribute to the hiring process by reviewing questionnaires or joining the interview team to assess SRE candidates.
Act as a reliability champion.

Mandatory Skills

5+ years of experience and a BE/B.Sc. degree.

We’re Chicago-based and Chicago-proud. McCann Partners’ leadership launched in 2011 to create a different recruitment firm where relationships come first and community matters. We’re not just placing talent but making Chicago a brighter place to work and live.
