
Reliable Platform Engineer
24 hours ago
We are currently seeking a skilled Observability Site Reliability Engineer to join our team. This role involves building and fine-tuning platform components for the Observability product, working closely with the lead engineer and with the performance, data ingestion, platform DevOps, and data visualization teams within the Observability product.
This individual contributor position requires experience in Observability and Monitoring initiatives as a platform engineer. The ideal candidate will be able to troubleshoot platform issues, restore service by resolving customer-facing incidents, develop and implement build and release pipelines, and manage deployment schedules, issues, risks, and impediments.
Key Responsibilities:
- Experience in Observability and Monitoring initiatives as a platform engineer.
- Troubleshoot platform issues and restore service by resolving customer-facing incidents.
- Develop and implement build and release pipelines, with accountability for managing deployment schedules, issues, risks, and impediments.
- Agile development experience, with accountability as a team member for commitments and delivery each sprint.
- Troubleshoot and correct connectivity problems between the supported applications and the clients they serve.
- Provide technical guidance in diagnosing issues as they arise in critical applications.
- Drive collaboration sessions among IT and business groups to facilitate optimal support and operation of the relevant applications.
- Apply Site Reliability Engineering techniques such as observability, alerting, and performance tuning.
- Contribute to the design, implementation, and enhancement of critical applications.
- Perform proactive analysis and troubleshooting to predict and prevent production incidents.
- Define and contribute to monitoring capabilities for critical applications.
- Collaborate with key vendors on functional, performance and capacity improvements.
- Design and build tools to automate support and monitoring functions.
- Ensure that all observability implementations meet the requirements prescribed by IT Services by effectively implementing or using approved processes, methodologies, and deliverables.
- Provide expertise and build solutions for observability applications as well as system integration with internal systems and external vendors.
- Provide coding and technical direction to less experienced staff, or develop highly complex original code.
- Track infrastructure delivery and dependencies through implementation.
Required Skills and Qualifications:
- Experience gathering and organizing large volumes of data for instrumentation in an enterprise observability solution.
- Experience recommending baseline monitoring thresholds, performance-monitoring KPIs, and SLAs.
- Experience installing agents and forwarders, and working with APIs, performance-monitoring alerts, dashboards, and data trend analysis.
- Good knowledge of Azure foundation components (e.g., Application Gateway, APIM, Virtual Network, NSG, Load Balancer, and Azure VMs) is required.
- Experience with databases such as Azure SQL, PostgreSQL, MySQL, MongoDB, a time-series database (TSDB), or similar.
- Knowledge of monitoring tools such as Log Analytics, AppDynamics, Grafana, Prometheus, Splunk, and SiteScope.
- Hands-on Azure/GCP experience, including pulling observability data from managed services.
- Go or Python coding skills, or a solutioning background, with experience in SRE development and OpenTelemetry implementation.
- Deploying, managing, and optimizing an enterprise-level observability platform based on Grafana OSS products such as Mimir, Loki, and Tempo, along with Fluent Bit/Vector.
- Designing and developing standard Grafana dashboards that show critical metrics for Azure/GCP services using the collected observability data.
- Programming experience in Java is required; experience with Python, Go, or Node.js is desired.
- Experience working with ServiceNow or similar service management tools.
- Familiarity with cloud technologies in Azure, AWS, and Google Cloud.
- Experience with PCF, Docker, and Kubernetes platforms is required.
- Experience with DevOps and CI/CD tools and processes is required.
- Experience with high-performance, high-frequency data streaming (e.g., Kafka) and handling large volumes of batch data is strongly preferred but not required.
- Experience with Agile / Scrum methodologies is required.
Education and Experience:
- 4-year degree (Computer Science, Information Systems, or a related functional field) and/or an equivalent combination of education and work experience.
- 1-3+ years of integration engineering experience with observability/monitoring frameworks built on open-source technologies such as Grafana, Mimir, Loki, Tempo, Fluent Bit, and Vector.
- Hands-on experience with the tools and technologies listed above is preferred.
- At least 1 year of experience as a Site Reliability Engineer is required.
- Experience working with open-source platforms (e.g., Grafana) and OpenTelemetry libraries is preferred.
About Us:
Our company values innovation, collaboration, and teamwork. We offer a competitive salary and benefits package, as well as opportunities for professional growth and development. If you are a motivated and detail-oriented individual with a passion for Observability and Site Reliability Engineering, we encourage you to apply for this exciting opportunity.
-
Reliable Systems Engineer
17 hours ago
Salem, Tamil Nadu, India | beBeeSoftware | Full time | ₹ 1,80,00,000 - ₹ 2,50,00,000
Job Overview
We treat infrastructure and operations as software engineering problems. Our mission is to build and progress software platforms that enable the provisioning and managing of services in safe, reliable, and scalable ways. We challenge the status quo and use new technologies to build platforms and tooling for engineering teams. In this role, you will...
-
Site Reliability Engineering Executive
2 days ago
Salem, Tamil Nadu, India | beBeeSre | Full time | ₹ 1,80,00,000 - ₹ 2,00,00,000
Reliability Engineer Leader
Job Description
This is an exciting opportunity to shape the SRE function within our organisation and to be a founding member of the Group SRE team. We are seeking a highly skilled and experienced engineer to join our team at Natobotics. As a system reliability leader, you will define, drive, and implement the SRE strategy across...
-
Platform Reliability Specialist
1 day ago
Salem, Tamil Nadu, India | beBeeTechnical | Full time | ₹ 8,00,000 - ₹ 18,00,000
Job Overview:
We are seeking an experienced Platform Reliability Specialist to join our team. This role is crucial in ensuring the system availability, incident resolution, and production stability of our trading and risk management systems.
Key Responsibilities:
- Monitor and support our applications to ensure their availability and performance.
- Troubleshoot and...
-
Cloud Reliability Engineer
19 hours ago
Salem, Tamil Nadu, India | beBeeReliability | Full time | ₹ 1,20,00,000 - ₹ 2,42,50,000
Are you looking for a challenging role where you can utilize your skills to drive business growth?
About the Job
This position is responsible for ensuring the availability, latency, performance, and efficiency of our cloud-based platform. Create customer-centric service level indicators (SLIs) and service level objectives (SLOs) for key services and publish...
-
Senior Site Reliability Engineer T500-20117
4 days ago
Salem, Tamil Nadu, India | Delta Air Lines | Full time
About Delta Tech Hub:
Delta Air Lines (NYSE: DAL) is the U.S. global airline leader in safety, innovation, reliability and customer experience. Powered by our employees around the world, Delta has for a decade led the airline industry in operational excellence while maintaining our reputation for award-winning customer service. With our mission of connecting...
-
Reliability Engineering Specialist
7 days ago
Salem, Tamil Nadu, India | beBeeReliability | Full time | ₹ 1,50,00,000 - ₹ 2,00,00,000
Job Overview
We are seeking a highly skilled technical professional to fill the role of Reliability Engineering Specialist. This position involves maintaining the reliability, scalability, and performance of our critical infrastructure, ensuring seamless availability for our services.
Key Responsibilities:
- Maintain IT service and infrastructure uptime.
- Implement...
-
AWS Platform Engineer
4 days ago
Salem, Tamil Nadu, India | beBeeResilient | Full time | ₹ 15,00,000 - ₹ 20,10,000
System Engineer for Financial Systems
Join our team as a system engineer with expertise in AWS Platform Engineering. Your primary responsibility will be to oversee the reliability and scalability of mission-critical financial systems. This is an opportunity for technical leaders who want to own platforms end-to-end, applying SRE principles to financial...
-
Platform Engineering Leader
4 days ago
Salem, Tamil Nadu, India | beBeeTechnical | Full time | ₹ 1,50,00,000 - ₹ 2,01,00,000
We are seeking a seasoned Technical Program Manager to drive the delivery and execution of our platform engineering roadmap. This individual will play a critical role in driving and orchestrating program execution, ensuring seamless alignment across product, design, engineering, and cross-functional teams.
About the Role:
- Drive the delivery and execution of...
-
Senior Site Reliability Engineer
3 weeks ago
Salem, Tamil Nadu, India | MindBrain | Full time
Position: Site Reliability Engineer
Budget: 1.7 LPM
Experience: 10 yrs
Duration: 6 months
Technical Skills:
- Programming: Proficiency in languages like Python.
- Operating Systems: Deep understanding of Linux/Windows operating systems and networking concepts.
- Cloud Technologies: Experience with Azure, including services, architecture, and best practices.
...
-
Salem, Tamil Nadu, India | beBeeDevops | Full time | ₹ 18,00,000 - ₹ 24,00,000
Job Title:
We are seeking a highly skilled DevOps professional with expertise in Site Reliability Engineering. The successful candidate will have a minimum of 7 years of experience in DevOps, SRE, or a related field. They should possess strong technical skills in scripting languages such as Python, Bash, or PowerShell, and experience with CI/CD tools like Jenkins...