
Observability Engineer
17 hours ago
We are hiring an Observability Engineer (Dashboarding & Analytics Developer) with experience in Splunk, technical KPIs (application or API metrics), and logs.
WORK LOCATION : Bangalore/Hyderabad/Chennai
Budget: 25 LPA (MAX)
Notice period : Immediate to 20 Days
MUST BE INCLUDED WITH SUBMITTAL :
- Full Legal Name
- Phone
- Current Location
- Rate
- Work Authorization
- Willing to relocate
- Which tool(s) the candidate has used to create dashboards/visualizations
JOB DESCRIPTION :
Observability Engineer (Dashboarding & Analytics Developer)
The JedAI team is at the forefront of developing cutting-edge generative AI platforms that connect to Large
Language Models (LLMs), agents, knowledge bases, and Model Context Protocol (MCP) servers. Our
mission is to harness the power of generative AI to deliver innovative solutions that drive efficiency,
safety, and intelligence across various applications.
Job Description
We are seeking a highly skilled Dashboarding and Analytics Developer to join our JedAI team. In this role,
you will be responsible for the visualization and development of Key Performance Indicators (KPIs) that
are critical to monitoring and enhancing the performance of our generative AI systems. You will develop
and maintain comprehensive dashboards that provide real-time insights into the performance of LLMs,
Retrieval Augmented Generation (RAG) systems, safety mechanisms, other generative AI features, billing,
token consumption, and more.
- Dashboard Development: Design, develop, and maintain interactive and user-friendly dashboards for monitoring AI system performance.
- KPI Identification: Collaborate with cross-functional teams to define and implement KPIs related to LLMs, RAG systems, safety protocols, and other AI features.
- Data Visualization: Create clear and insightful visualizations that communicate complex data trends and patterns effectively to stakeholders.
- Performance Monitoring: Continuously monitor AI system metrics to identify anomalies, performance issues, and areas for improvement.
- Data Analysis: Analyze large and complex datasets to extract meaningful insights that support decision-making processes.
- Collaboration: Work closely with AI engineers, data scientists, and product managers to align dashboard functionalities with project goals.
- Innovation: Stay updated with the latest trends and technologies in data visualization and analytics to introduce innovative solutions.
- Documentation: Maintain thorough documentation of dashboard configurations, data sources, and visualization methodologies.
Details of work
1. Performance Metrics:
   - Latency and Throughput: Monitor the response times and the number of requests processed per unit time to ensure the system meets performance expectations.
   - Resource Utilization: Track CPU, memory, disk I/O, and network bandwidth usage to identify bottlenecks or inefficiencies.
2. Model Performance and Drift Monitoring:
   - Accuracy Metrics: Keep track of model accuracy, precision, recall, F1 score, etc., to ensure the models are performing as expected.
   - Data and Concept Drift Detection: Monitor for changes in data distribution that could affect model performance over time.
   - Feature Importance Tracking: Observe changes in feature importance to understand and explain model predictions.
3. Anomaly Detection:
   - Implement systems to detect unusual patterns or outliers in data inputs, user behavior, or system performance, which could indicate errors or security issues.
4. Security Monitoring:
   - Access Logs: Maintain detailed logs of user access and actions for security auditing.
   - Threat Detection: Use intrusion detection systems (IDS) to identify potential security threats.
   - Compliance Monitoring: Ensure adherence to regulations like GDPR, HIPAA, or other industry-specific compliance requirements.
5. User Engagement and Feedback:
   - Usage Analytics: Analyze how users interact with the system to improve user experience.
   - Feedback Collection: Provide mechanisms for users to report issues or suggest improvements.
   - Session Tracking: Monitor user sessions to understand behavior patterns and enhance personalization.
6. Error Handling and Logging:
   - Detailed Error Logs: Capture and categorize errors to facilitate quicker debugging and resolution.
   - Automated Alerting: Set up alerts for critical failures or error rate thresholds being exceeded.
7. Audit Trails and Traceability:
   - Transaction Logging: Keep records of all transactions and changes in the system for accountability.
   - Version Control Tracking: Monitor changes in models, code, or configurations to track the evolution of the system.
8. Data Quality Monitoring:
   - Validation Checks: Ensure incoming data meets quality standards before processing.
   - Missing or Corrupted Data Detection: Identify and handle incomplete or corrupted data inputs.
9. Scalability Metrics:
   - Load Testing Metrics: Assess how the system performs under various load conditions to plan for scaling.
   - Auto-Scaling Monitoring: Monitor the effectiveness of auto-scaling policies in cloud environments.
10. Cost Management:
   - Resource Cost Analysis: Monitor the costs associated with compute, storage, and network resources to optimize spending.
   - Budget Alerts: Set up alerts when spending exceeds predefined budgets.
11. Deployment and CI/CD Pipeline Monitoring:
   - Deployment Success Rates: Track the success or failure of deployments.
   - Pipeline Performance: Monitor the CI/CD pipeline for bottlenecks or failures.
12. Compliance and Governance:
   - Policy Enforcement: Ensure data usage and model deployment adhere to organizational policies.
   - Role-Based Access Control (RBAC): Implement and monitor access controls for different system components.
13. Disaster Recovery and Backup Monitoring:
   - Backup Integrity Checks: Regularly verify backups to ensure data can be recovered when needed.
   - Recovery Time Objectives (RTO) Monitoring: Ensure systems can be restored within acceptable time frames after outages.
14. Customer Support Integration:
   - Ticketing System Integration: Monitor support tickets related to the system to identify common issues.
   - Service Level Agreement (SLA) Compliance: Track metrics to ensure SLAs are being met.
15. Visualization and Reporting:
   - Custom Dashboards: Create dashboards tailored to different stakeholders (executives, developers, support teams).
   - Scheduled Reports: Automate reporting on key metrics for regular review.
Preferred tools and skills (candidates do not need to check all the boxes):
- Technical Domain Experience: AI LLMs, Retrieval Augmented Generation (RAG) systems, safety mechanisms, other generative AI features, billing, token consumption, and more
- Data Visualization Tools: Tableau, Power BI, Grafana, Splunk
- Programming Languages: Python, JQL, SPL
- Data Query Languages: SQL
- Cloud Platforms: AWS, Azure, GCP (likely if auto-scaling is a key responsibility)
- Monitoring Tools: Prometheus, Datadog, New Relic, CloudWatch (AWS), Azure Monitor
- Version Control Systems: Git
- Ticketing Systems: Jira, Zendesk, ServiceNow
- Logging Tools: ELK Stack (Elasticsearch, Logstash, Kibana), Splunk
- CI/CD Tools: Jenkins, GitLab CI, CircleCI, GitHub Actions
Shift hours - 2 PM to 11 PM IST Mon - Fri