Site Reliability Engineer (DevOps) in Washington, DC at APEX Systems

Date Posted: 4/16/2019

Job Description

Job #:  960938

Role: Site Reliability Engineering
Position Summary: This team consists of passionate engineers who strive for excellence in DevOps. We are responsible for the uptime of the various .COM websites and backend services, and a large portion of the job is innovation. Site Reliability Engineers are embedded with the development teams to create shared responsibility, ensure that the proper tooling and automation are in place, and measure everything.

Essential Responsibilities:

  • Design and develop a complete end-to-end automation environment using configuration and auto-scaling tools.
  • Lead architecture, monitoring, performance optimization and capacity planning of new infrastructure services to support a high-performance computing environment and ensure 99.9%+ uptime.
  • Respond to off-hours and weekend emergency alerts, alarms, and requests, in keeping with the team's on-call rotation schedule.
  • Work closely with Architects, Security Engineers, Product Managers, SRO and other clients and partners of the SRE team to meet the organization's needs and keep it competitive, from the infrastructure up to the highest level of applications.
  • Strategize with the teams to develop new technology initiatives with a primary focus on availability, supportability, scalability, security, and performance.
  • Configure and tune enterprise monitoring and instrumentation systems to efficiently detect existing issues and predict future issues based on trends.
  • Stay up-to-date with technology. Continually advance your technical skills.
  • Implement and manage CI/CD pipelines.
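
For context on the 99.9%+ uptime target mentioned above, the following is a minimal sketch of the arithmetic behind an availability figure. The function name `allowed_downtime_minutes` and the 30-day and 365-day periods are illustrative assumptions, not part of the role's actual tooling or SLA definition.

```python
def allowed_downtime_minutes(availability_pct: float, period_days: float) -> float:
    """Return the downtime budget, in minutes, implied by an availability target.

    Hypothetical helper for illustration only; the posting does not define
    how uptime is measured or over what period.
    """
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    # The unavailable fraction of the period is (100 - availability) percent.
    return total_minutes * (1 - availability_pct / 100)


if __name__ == "__main__":
    # Assumed measurement windows: a 30-day month and a 365-day year.
    for label, days in (("30-day month", 30), ("365-day year", 365)):
        minutes = allowed_downtime_minutes(99.9, days)
        print(f"{label}: {minutes:.1f} minutes of downtime at 99.9%")
```

Under these assumptions, a 99.9% target leaves roughly 43 minutes of downtime in a 30-day month, which is why the monitoring and on-call responsibilities listed above matter.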

Basic Qualifications:

  • Experience in Linux systems administration, management, best practices, and performance tuning
  • AWS Certification
  • 5+ years of hands-on experience as an individual contributor in a systems administration/development or DevOps role working on highly scalable distributed systems.
  • Experience supporting mission-critical platforms, both physical and virtualized environments, using CentOS, RedHat, Ubuntu.
  • Experience designing, building and managing large scale infrastructure in AWS and Rackspace, including experience leveraging one or more coding languages for automation.
  • Ability to communicate and transfer knowledge clearly and effectively in both technical and non-technical manners.

List of Technologies:

  • Automation: Ansible, Puppet, Jenkins, Bamboo, Rundeck
  • Repositories: Git
  • Web Architectures: Node.js, LAMP Stack, Java, JBoss, Tomcat, AEM
  • Scripting: Python, Bash
  • Cloud Providers: AWS (CF, AWS CLI, Botocore, Lambda, ECS, Beanstalk, etc.)
  • CDN: Akamai, CloudFront
  • Database: MySQL, Postgres, Mongo, Redshift, Dynamo
  • Containerization/disposable environments: Docker, Vagrant
  • Network Operation Tools: Icinga2, New Relic, Logstash, Elasticsearch, Nagios
  • Operating Systems: CentOS, RHEL, Ubuntu
  • Collaboration Tools: Jira, Confluence, Slack

EEO Employer

Apex is an Equal Employment Opportunity/Affirmative Action Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, age, sexual orientation, gender identity, national origin, disability, protected veteran status, or any other characteristic protected by law. Apex will consider qualified applicants with criminal histories in a manner consistent with the requirements of applicable law. If you have visited our website in search of information on employment opportunities or to apply for a position, and you require an accommodation in using our website for a search or application, please contact our Employee Services Department at 844-463-6178.