Site Reliability Engineer



India Tyk Technologies Ltd Full time

Who are Tyk and what do we do?

The Tyk API Management platform is helping to drive the connected world and power new products and services. We're changing the way that organisations connect any number of their systems and services, whether internal, external, public or highly encrypted. Tyk helps businesses drive value across the retail, finance, telecoms, healthcare and media industries, to name just a few. If you've banked online, used an app to check the news or perhaps even driven a connected car, APIs (and by extension Tyk) make that possible.

Founded in 2015, with offices in London (UK), London (Ontario), Atlanta and Singapore, we have many thousands of users of our B2B platform across the globe. Brands using Tyk range from Lotte, Bell and T-Mobile to RBS, Capital One and Vinci. We have a varied user base hailing from every continent - even Antarctica!

Our Mission

The internet started by connecting mainframes; by the end of the 20th century, 600m desktop and laptop computers exchanged email and web traffic. Today around 15 billion things are connected to the internet, and that number is growing at a rate of a billion per year. Tyk are committed to enabling interconnectivity between systems and between devices, and we've started by building an API Management platform.

Total flexibility, default remote, radical responsibility

We offer this to everyone - for real. Why? Tyk was founded on the principle of doing things differently, and flexibility and autonomy are two principles that we believe allow our employees to achieve their best results. It also means we can build the best possible team: location and working hours are no barrier. If this sounds like an environment that could work for you, read on to find out more.

What can you do with us?

We're looking for a Site Reliability Engineer to manage, maintain, improve and provide support on our platform. You will be curious by nature and always looking for ways to improve, as we will look to you for new ideas, solutions and metrics on how we can improve the platform. You will also be our first line of incident management for our clients and will help define our response going forward.

This is a great opportunity to become an integral part of Tyk as we continue on our journey. As a remote-first company, we offer you the opportunity to work with an industry-leading distributed team. Having access to expertise from across the globe will give you both the support and the opportunity to help shape not only Tyk's Cloud platform but also Tyk as a whole as we continue to grow.

Here's what you'll be getting up to:

• Maintaining global Tyk Cloud within the SLAs/SLOs you will help to define
• Identifying reliability issues and working together with your squad to solve them
• Identifying and introducing new metrics and building relevant dashboards
• Participating in the on-call rotation
• Working with your squad to expand the multi-region and multi-cloud reach of the platform
• Documenting operational knowledge
• Conducting post-incident analysis
• Automating common tasks
• Being a key shaper of and contributor to our continuous improvement agenda, be it the clarity of our user stories or how we estimate and communicate with other teams or customers; we expect this role to be an advocate of continuous improvement
• Reliability of our new global Tyk Cloud platform
• Automation of operations and support
• Writing and maintaining documentation on SRE processes and policies
• Recommending and implementing ways to drive operational efficiency and drive down our cost to run without impacting service
• Assisting in penetration testing for Cloud by liaising with our provider, providing technical details and setting up the environment
• Incident management

Here's what we're looking for:

• Strong collaboration skills
• Launching and operating production-scale Kubernetes clusters
• Designing and operating infrastructure on AWS and other providers
• Operating MongoDB or other document database clusters
• Operating Redis or other key-value storage clusters
• Administering Linux servers
• Maintaining distributed software
• Operating Prometheus and Grafana
• Operating log collection and analysis systems
• Participating in the on-call rotation (4:00-16:00 UTC)

Skills:

• Kubernetes/containers (advanced)
• AWS/EKS (advanced)
• Linux (advanced)
• Terraform and IaC in general (proficient)
• Helm (proficient)
• Go and/or Python (familiar)
• MongoDB or similar
• Redis or similar
• Monitoring: Prometheus, Grafana, Thanos (familiar)
• Grasp of networking concepts (subnets, routing, peering, load balancing, NAT, etc.)
• Common networking protocols (DNS, TCP/IP, HTTP, TLS, UDP)
• Proactive, energetic, innovative and change-oriented

Nice to have:

• GCP or Azure
• Bare-metal infrastructure engineering
• API management experience
• Large-scale distributed storage management
• Familiarity with Rancher
• CKA, CKAD, CKS
• Creating and delivering production software in Go

Here's why you should join us:

• Everyone has unlimited paid holidays
• We have total flexibility in hours, as we believe creativity flows better when our people are given the freedom to decide when they are most productive. Everyone is unique, after all!
• Employee share scheme
• Generous maternity and paternity leave
• Volunteering days
• Company retreats
• Employee wellbeing platform

We all share the same vision: we value authenticity, respect, responsibility, independence, honesty, diversity and inclusion and, most importantly, treating others how you wish to be treated. We look for like-minded people who bring their personalities to work every day, strive to achieve their personal goals and are willing to challenge the way we do things. Why? To make what we do even better.

Our values tell the story of Tyk. Here's how:

• It's OK to screw up. We've found that it's often the stupid or unexpected ideas that turn out to be the successful ones, so try it; at least we can say we have. The only stupid idea is the untested one.
• It's in our DNA. Starting a business with founders 12 hours apart? Giving our gateway away for free? Sure, we did that, and we'd do it again.
• Trust starts with you - make it count. Trust is a two-way street, so instil it from day one. Assume best intent. We have each other's back; we're all on the same team. Think before you speak or act.
• Make things better. Always try to leave things better than you found them. Change is constant, inevitable and embraced. Be the change we want to see.

Tyk is an equal opportunities employer, and we are determined to ensure that no applicant or employee receives less favourable treatment on the grounds of gender, age, disability, religion, belief, sexual orientation, marital status or race, or is disadvantaged by conditions or requirements which cannot be shown to be justifiable.



  • India CloudHire Full time

    Job Summary: The Technical Manager for Site Reliability Engineering (SRE) will lead a remote team of Site Reliability Engineers, ensuring operational excellence and fostering a high-performing team culture. Reporting to the US-based Director of Systems and Security, this role is responsible for overseeing day-to-day operations, technical mentorship, and...


  • India BQE Software Full time

    We are seeking a Senior Site Reliability Engineer to lead reliability efforts across our application stack, focusing on high availability, performance, and scalability. This role will own the health and uptime of our mission-critical application, cloud infrastructure, database system, and monitoring infrastructure. About Us: At BQE, our mission...


  • India JoVE Full time

    JoVE is the world-leading producer and provider of science video solutions, with the mission to improve scientific research and education. Millions of scientists, educators and students use JoVE for their research, teaching and learning. Our institutional clients comprise over 1,000 universities, colleges, and biopharma companies, including such leaders as...


  • Remote, India Rackspace Technology Full time

    Job Description: Site Reliability Engineer / Observability Engineer. Public Cloud - Offerings and Delivery - Workforce Mgmt & Delivery Ops / Full-Time / Remote. Rackspace is building up its Professional Services Center of Excellence on Application Performance Monitoring Suites. If you enjoy solving complex business problems and can contribute to building next...


  • India Pythian Full time

    Remote Site Reliability Engineering - Site Reliability Engineering | Full Time | Remote. Site Reliability Engineer, India, Multiple Timezones, Remote (Work from Home). Why Pythian? At Pythian, we are experts in strategic database and analytics services, driving digital transformation and operational excellence. Pythian, a multinational company, was...


  • India Xebia Full time

    We are looking for a highly skilled AWS Engineer with strong Python development and Chaos Engineering expertise to design, build, and validate resilient, scalable, and automated cloud-native environments. The ideal candidate will combine cloud engineering, DevOps, and chaos experimentation to improve reliability, fault tolerance, and operational efficiency...


  • India AionNimbius Full time

    We are looking for a Site Reliability Engineering Manager – Cloud Engineering to join our team in Bengaluru. This role will lead operations for a 24x7 cloud environment, ensuring our systems stay reliable, resilient, and ready to scale. You'll be the one making sure incidents are handled quickly, systems are well-documented, and automation is in place to...


  • India Cimpress Full time

    Senior Site Reliability Engineer. Who We Are: Cimpress Technology develops cutting-edge, best-in-world software that our mass customization businesses use to create personalized products for over 17 million global customers. Our Mass Customization Platform consists of modular, multi-tenant services. Our businesses can choose the solutions that work for them, or...


  • India iVedha Inc. Full time

    Senior Site Reliability Engineer (SRE) – ELK Expert | Platform Engineering Practice. Location: India (Remote) - Must be available to work in the EST (US/Canada) time zone. Role Summary: Are you a Senior Site Reliability Engineer (SRE) with deep ELK expertise, ready to take ownership of large-scale observability infrastructure? We're looking for an SRE with 7+...


  • India MindBrain Full time

    Position: Site Reliability Engineer. Budget: 1.7 LPM. Exp: 10 yrs. Duration: 6 months. Technical Skills: Programming: Proficiency in languages like Python. Operating Systems: Deep understanding of Linux/Windows operating systems and networking concepts. Cloud Technologies: Experience with Azure, including services, architecture, and best practices. Containerization and...