Best Practices for Data Management

Explore top LinkedIn content from expert professionals.

  • View profile for Venkata Naga Sai Kumar Bysani

    Data Scientist | 200K LinkedIn | BCBS Of South Carolina | SQL | Python | AWS | ML | Featured on Times Square, Favikon, Fox, NBC | MS in Data Science at UConn | Proven record in driving insights and predictive analytics |

    208,108 followers

    Choosing the right chart is half the battle in data storytelling. This one visual helped me go from “𝐖𝐡𝐢𝐜𝐡 𝐜𝐡𝐚𝐫𝐭 𝐝𝐨 𝐈 𝐮𝐬𝐞?” → “𝐆𝐨𝐭 𝐢𝐭 𝐢𝐧 10 𝐬𝐞𝐜𝐨𝐧𝐝𝐬.”👇

    𝐇𝐞𝐫𝐞’𝐬 𝐚 𝐪𝐮𝐢𝐜𝐤 𝐛𝐫𝐞𝐚𝐤𝐝𝐨𝐰𝐧 𝐨𝐟 𝐡𝐨𝐰 𝐭𝐨 𝐜𝐡𝐨𝐨𝐬𝐞 𝐭𝐡𝐞 𝐫𝐢𝐠𝐡𝐭 𝐜𝐡𝐚𝐫𝐭 𝐛𝐚𝐬𝐞𝐝 𝐨𝐧 𝐲𝐨𝐮𝐫 𝐝𝐚𝐭𝐚:

    🔹 𝐂𝐨𝐦𝐩𝐚𝐫𝐢𝐬𝐨𝐧?
    • Few categories → Bar Chart
    • Over time → Line Chart
    • Multivariate → Spider Chart
    • Non-cyclical → Vertical Bar Chart

    🔹 𝐑𝐞𝐥𝐚𝐭𝐢𝐨𝐧𝐬𝐡𝐢𝐩?
    • 2 variables → Scatterplot
    • 3+ variables → Bubble Chart

    🔹 𝐃𝐢𝐬𝐭𝐫𝐢𝐛𝐮𝐭𝐢𝐨𝐧?
    • Single variable → Histogram
    • Many points → Line Histogram
    • 2 variables → Violin Plot

    🔹 𝐂𝐨𝐦𝐩𝐨𝐬𝐢𝐭𝐢𝐨𝐧?
    • Show part of a total → Pie Chart / Tree Map
    • Over time → Stacked Bar / Area Chart
    • Add/Subtract → Waterfall Chart

    𝐐𝐮𝐢𝐜𝐤 𝐓𝐢𝐩𝐬:
    • Don’t overload charts; less is more.
    • Always label axes clearly.
    • Use color intentionally, not decoratively.
    • 𝐀𝐬𝐤: What insight should this chart unlock in 5 seconds or less?

    𝐑𝐞𝐦𝐞𝐦𝐛𝐞𝐫:
    • Charts don’t just show data; they tell a story.
    • In storytelling, clarity beats complexity.
    • Don’t aim to impress with fancy visuals; aim to express the insight simply. That’s where the real impact is. 💡

    ♻️ Save it for later or share it with someone who might find it helpful!

    𝐏.𝐒. I share job search tips and insights on data analytics & data science in my free newsletter. Join 14,000+ readers here → https://lnkd.in/dUfe4Ac6
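
    If it helps to see the guide above in practice, here is a minimal matplotlib sketch (with invented sample data) of two of the picks: a bar chart for comparing a few categories and a line chart for comparison over time.

    ```python
    # Minimal sketch (hypothetical sample data) of two picks from the guide above:
    # a bar chart for a few categories and a line chart for change over time.
    import matplotlib.pyplot as plt

    regions = ["North", "South", "East", "West"]         # few categories -> bar chart
    revenue = [120, 95, 140, 110]

    months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]  # comparison over time -> line chart
    signups = [310, 335, 360, 420, 455, 500]

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

    ax1.bar(regions, revenue, color="steelblue")
    ax1.set_title("Revenue by region (few categories)")
    ax1.set_xlabel("Region")
    ax1.set_ylabel("Revenue ($K)")

    ax2.plot(months, signups, marker="o", color="darkorange")
    ax2.set_title("Signups over time")
    ax2.set_xlabel("Month")
    ax2.set_ylabel("Signups")

    fig.tight_layout()
    plt.show()
    ```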

  • View profile for Raul Junco

    Simplifying System Design

    117,435 followers

    After years of building event-driven systems, here are the top 4 mistakes I have seen:

    1. Duplication
    Events often get re-delivered due to retries or system failures. Without proper handling, duplicate events can:
    • Charge a customer twice for the same transaction.
    • Cause duplicate inventory updates, messing up stock levels.
    • Create inconsistent or broken system states.
    Solution:
    • Assign unique IDs to every event so consumers can track and ignore duplicates.
    • Design event processing to be idempotent, ensuring repeated actions don’t cause harm (see the sketch after this post).

    2. Not Guaranteeing Order
    Events can arrive out of order when distributed across partitions or queues. This can lead to:
    • Processing a refund before the payment.
    • Breaking logic that relies on the correct sequence.
    Solution:
    • Use brokers that support ordering guarantees (e.g., Kafka).
    • Add sequence numbers or timestamps to events so consumers can detect and reorder them if needed.

    3. The Dual Write Problem
    When writing to a database and publishing an event, one might succeed while the other fails. This can:
    • Lose events, leaving downstream systems uninformed.
    • Cause mismatched states between the database and event consumers.
    Solution:
    • Use the Transactional Outbox Pattern: store events in the database as part of the same transaction, then publish them separately.
    • Adopt Change Data Capture (CDC) tools to track and publish database changes as events automatically.

    4. Non-Backward-Compatible Changes
    Changing event schemas without considering existing consumers can break systems. For example:
    • Removing a field might cause missing data for consumers.
    • Renaming or changing field types can trigger runtime errors.
    Solution:
    • Maintain versioned schemas to allow smooth migration for consumers.
    • Use formats like Avro or Protobuf that support schema evolution.
    • Add adapters to translate new schema versions into older ones for compatibility.

    "Every schema change is a test of your system’s resilience—don’t fail it."

    What other mistakes have you seen out there?
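
    As a concrete illustration of mistakes 1 and 2 above, here is a minimal Python sketch of an idempotent consumer that deduplicates by event ID and detects out-of-order delivery via sequence numbers. The event fields and the in-memory stores are illustrative assumptions, not any specific broker's API; in production the processed-ID set and last sequence would live in a durable store.

    ```python
    # Hedged sketch: an idempotent event consumer that deduplicates by event ID
    # and flags out-of-order delivery using sequence numbers.
    import uuid
    from dataclasses import dataclass, field


    @dataclass(frozen=True)
    class Event:
        event_id: str          # unique ID assigned by the producer
        sequence: int          # monotonically increasing, for ordering checks
        payload: dict = field(default_factory=dict)


    class IdempotentConsumer:
        def __init__(self):
            self.processed_ids = set()   # in production: a durable store (DB table, Redis)
            self.last_sequence = -1

        def handle(self, event: Event) -> None:
            # Mistake #1: duplication. Skip events we have already processed.
            if event.event_id in self.processed_ids:
                print(f"skip duplicate {event.event_id}")
                return

            # Mistake #2: ordering. Detect out-of-order delivery via sequence numbers.
            if event.sequence <= self.last_sequence:
                print(f"out-of-order event {event.event_id}; buffering/reordering needed")
                return

            # Business logic goes here (e.g., apply the payment exactly once).
            print(f"processing {event.event_id}: {event.payload}")

            self.processed_ids.add(event.event_id)
            self.last_sequence = event.sequence


    # Usage: the same event delivered twice is only applied once.
    consumer = IdempotentConsumer()
    e = Event(event_id=str(uuid.uuid4()), sequence=1, payload={"charge": 42.0})
    consumer.handle(e)
    consumer.handle(e)  # retry/redelivery is safely ignored
    ```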

  • View profile for Nancy Duarte
    215,229 followers

    Many amazing presenters fall into the trap of believing their data will speak for itself. But it never does… Our brains aren't spreadsheets, they're story processors. You may understand the importance of your data, but don't assume others do too. The truth is, data alone doesn't persuade…but the impact it has on your audience's lives does. Your job is to tell that story in your presentation. Here are a few steps to help transform your data into a story:

    1. Formulate your Data Point of View. Your "DataPOV" is the big idea that all your data supports. It's not a finding; it's a clear recommendation based on what the data is telling you. Instead of "Our turnover rate increased 15% this quarter," your DataPOV might be "We need to invest $200K in management training because exit interviews show poor leadership is causing $1.2M in turnover costs." This becomes the north star for every slide, chart, and talking point.

    2. Turn your DataPOV into a narrative arc. Build a complete story structure that moves from "what is" to "what could be." Open with current reality (supported by your data), build tension by showing what's at stake if nothing changes, then resolve with your recommended action. Every data point should advance this narrative, not just exist as isolated information.

    3. Know your audience's decision-making role. Tailor your story based on whether your audience is a decision-maker, influencer, or implementer. Executives want clear implications and next steps. Match your storytelling pattern to their role and what you need from them.

    4. Humanize your data. Behind every data point is a person with hopes, challenges, and aspirations. Instead of saying "60% of users requested this feature," share how specific individuals are struggling without it.

    The difference between being heard and being remembered comes down to this simple shift from stats to stories. Next time you're preparing to present data, ask yourself: "Is this just a data dump, or am I guiding my audience toward a new way of thinking?"

    #DataStorytelling #LeadershipCommunication #CommunicationSkills

  • View profile for Robert F. Smallwood MBA, CIGO, CIGO/AI, IGP

    CEO IG World magazine, Chair at Certified IG Officers Association, Principal at AI Governance Advisors

    5,293 followers

    Why is the Records and Information Management Function Crucial to Good AI Governance?

    The RIM function is crucial to effective AI governance due to its integral role in managing the lifecycle of information, which forms the backbone of AI systems. Key reasons why RIM is indispensable for robust AI governance:

    1. Data Quality Assurance: AI systems depend on the quality of data they process. RIM ensures that the data feeding into AI systems is accurate, complete, and reliable. By maintaining high standards for data quality, RIM helps ensure that AI outputs are based on the best available information, reducing the risk of errors and enhancing the system's reliability.

    2. Compliance with Data Regulations: AI systems must comply with various data protection regulations such as GDPR, HIPAA, or CCPA. RIM manages these aspects by ensuring that data is handled in compliance with legal and regulatory requirements, thereby safeguarding the organization from legal risks and penalties.

    3. Information Lifecycle Management: RIM professionals are experts in managing the lifecycle of records from creation, use, storage, and retrieval to disposition. In AI governance, managing the lifecycle of datasets used for training and operationalizing AI is crucial. This ensures that data is retained only as long as necessary and disposed of securely to prevent unauthorized access or breaches.

    4. Facilitating Audits and Transparency: RIM helps create an audit trail for data and decisions made by AI systems. This is essential for transparency, allowing stakeholders to understand how decisions are made. Audit trails also facilitate compliance checks.

    5. Risk Management: By managing records and information properly, RIM reduces risks associated with information mismanagement, such as data breaches, loss of data integrity, and failure to comply with retention policies. This is particularly important in AI systems where data sensitivity and security are paramount.

    6. Supporting Data Accessibility and Retrieval: AI systems require seamless access to relevant data. RIM ensures that data is organized, classified, and stored in a manner that facilitates easy retrieval and efficient use. This not only enhances the efficiency of AI systems but also supports scalability and management of data resources.

    7. Enhancing Ethical Considerations: Ethical AI governance involves ensuring that data usage respects individual rights and societal norms. RIM contributes to ethical governance by managing personal and sensitive information in line with ethical standards and best practices, thus supporting the ethical deployment of AI technologies.

    By integrating RIM into AI governance frameworks, organizations can ensure that their AI initiatives are responsibly managed, legally compliant, and aligned with broader business and ethical standards. Learn more at InfoGov World https://lnkd.in/gRwtkExh

  • View profile for Alfredo Serrano Figueroa

    Senior Data Scientist | Statistics & Data Science Candidate at MIT IDSS | Helping International Students Build Careers in the U.S.

    8,463 followers

    Communicating complex data insights to stakeholders who may not have a technical background is crucial for the success of any data science project. Here are some personal tips that I've learned over the years while working in consulting:

    1. Know Your Audience: Understand who your audience is and what they care about. Tailor your presentation to address their specific concerns and interests. Use language and examples that are relevant and easily understandable to them.

    2. Simplify the Message: Distill your findings into clear, concise messages. Avoid jargon and technical terms that may confuse your audience. Focus on the key insights and their implications rather than the intricate details of your analysis.

    3. Use Visuals Wisely: Leverage charts, graphs, and infographics to convey your data visually. Visuals can help illustrate trends and patterns more effectively than numbers alone. Ensure your visuals are simple, clean, and directly support your key points.

    4. Tell a Story: Frame your data within a narrative that guides your audience through the insights. Start with the problem, present your analysis, and conclude with actionable recommendations. Storytelling helps make the data more relatable and memorable.

    5. Highlight the Impact: Explain the real-world impact of your findings. How do they affect the business or the problem at hand? Stakeholders are more likely to engage with your presentation if they understand the tangible benefits of your insights.

    6. Practice Active Listening: Encourage questions and feedback from your audience. Listen actively and be prepared to explain or reframe your points as needed. This shows respect for their perspective and helps ensure they fully grasp your message.

    Share your tips or experiences in presenting data science projects in the comments below! Let’s learn from each other. 🌟

    #DataScience #PresentationSkills #EffectiveCommunication #TechToNonTech #StakeholderEngagement #DataVisualization

  • View profile for Willem Koenders

    Global Leader in Data Strategy

    15,899 followers

    A few weeks ago, I posted about a best practices-informed #DataProduct team structure and corresponding roles and responsibilities. Some of the roles I sketched typically exist in a data product team and are well understood, such as the Product Owner, Scrum Master, Data Engineer, and Solution Architect. But a few roles tend to be less understood and more often neglected, especially in the haste to serve pressing #business needs. Yet it’s these roles that contribute to the trust, quality, and consistency of the #data product, not just during design but also by ensuring its continued governance in the years to come:

    🏛️ Data Domain Steward / Architect: This expert understands the nature of specific data types. They are often organized by data (or business) domains like customer, product, or financial data. Among other things, they ensure data products don’t overlap, that for each type of critical data there is a unique, trusted source, and that any data that is ingested is taken from the right source. They inject this kind of expertise into the design and management of the data product. The roles of steward and architect may be distinct or combined.

    🔗 Data Modeler: The person who designs data models to ensure data is stored in an optimized way, in alignment with the enterprise data model. Responsibilities include developing conceptual and logical data models and collaborating with engineers on physical models.

    🔍 Data Quality Specialist: The inspector who ensures the data is accurate and usable. Responsibilities include defining data quality criteria, performing data quality checks, and resolving data quality issues. Sometimes there can be a related, separate role for a Quality Assurance (QA) Analyst/Tester, to ensure that the data product is reliable, functional, and meets quality standards before it is released.

    🗂️ Metadata Management Specialist: A role that is essential for ensuring that the right, minimally required #metadata is captured, maintained, and democratized. This supports the documentation of data lineage and the cataloguing of the data product.

    🛡️ Data Privacy and Protection Specialist: Most commonly part of a dedicated #privacy or security team, this role reviews the data product’s design and requirements to ensure compliance with relevant policies, standards, and regulations.

    None of these roles is necessarily very “heavy” in terms of time commitment – none is full-time. In a well-functioning data product team, their contributions are made early in the data product lifecycle in the form of guidance or requirements, which are then incorporated “by design” by the Solution Architect and Data Engineer. It’s when they are neglected, and a remediation effort is needed later on, that the corresponding work becomes much heavier.

    For full descriptions of these roles and their typical responsibilities ➡️ https://lnkd.in/eKs76M7D

  • View profile for Chad Sanderson

    CEO @ Gable.ai (Shift Left Data Platform)

    89,279 followers

    Putting pressure on data science teams to deliver analytical value with LLMs is cruel and unusual punishment without a scalable data foundation.

    Over time, the best LLMs will be able to write queries as effectively as, or more effectively than, an analyst - or at minimum make writing the query easier. However, the most cost-intensive aspect of answering business questions is not producing SQL, but deciding what the query inputs should be and determining whether or not those inputs are trustworthy.

    Thanks to the rapid evolution of microservices and data lakes, data teams find themselves living in a world of fragmented truth. The same data points might be collected by multiple services, defined in multiple different ways, and could actually be going in opposite and contradictory directions. Today, data developers must do the hard work of understanding and resolving those discrepancies, which comes in the form of 1-to-1 conversations with the engineers managing logs and databases.

    Very few, if any, service teams at a company have documented their data for the purpose of analytics. That results in a giant gap in documentation across 1000s of datasets across the business. Without this gap being filled, data scientists will ultimately have to manually hand-check any prediction that an LLM makes in order to ensure it is accurate and not hallucinating. The model is doing its job with the information it has, but the business is not providing enough information for the model to deliver trustworthy outcomes!

    By investing in a scalable data foundation, this paradigm flips on its head. Data is well documented, clearly owned, and structured as an API enforced by contracts that define the use case, constraints, SLAs, and semantic meaning. A quality-driven infrastructure is a subset of all data in the lake, which reduces the surface area LLMs need to make decisions to only the nodes in the lineage graph that have clear governance and change management.

    Here's what I suggest:
    1. Start by identifying which pipelines are most essential to answering the business's most common questions (you can do this by accessing query history).
    2. Identify the core use cases (datasets/views) that are leveraged in these pipelines, and which intermediary tables are of critical importance.
    3. Define semantically what the data means at each level of the transformation. A good question to ask is, "What does a single row in this table represent?"
    4. Validate the semantic meaning with the table owners.
    5. Get the table owners to take ownership of the dataset as an API, ideally supported programmatically through a data contract.
    6. Define the semantic meaning and constraints within the data contract spec, mapped to a source file (see the sketch after this post).
    7. Limit any usage of an LLM to the source files under contract.

    Good luck! #dataengineering
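
    To make steps 5-7 more concrete, here is a hedged sketch of what a data contract might look like as code. The field names, the example table, and the validation logic are invented for illustration and are not any specific vendor's contract spec.

    ```python
    # Hedged sketch: a data contract that records ownership, row-level semantic
    # meaning, an SLA, and per-column constraints, with a toy validation check.
    from dataclasses import dataclass


    @dataclass
    class ColumnContract:
        name: str
        dtype: str
        semantic_meaning: str
        nullable: bool = False


    @dataclass
    class DataContract:
        table: str
        owner: str                 # the table owner who agreed to steward this dataset
        row_meaning: str           # answer to "what does a single row represent?"
        freshness_sla_hours: int
        columns: list[ColumnContract]

        def validate_row(self, row: dict) -> list[str]:
            """Return a list of contract violations for a single row."""
            violations = []
            for col in self.columns:
                if row.get(col.name) is None and not col.nullable:
                    violations.append(f"{col.name}: null not allowed")
            return violations


    orders_contract = DataContract(
        table="analytics.orders",
        owner="checkout-service-team",
        row_meaning="One row represents a single completed customer order.",
        freshness_sla_hours=24,
        columns=[
            ColumnContract("order_id", "string", "unique order identifier"),
            ColumnContract("amount_usd", "decimal", "order total in USD, tax included"),
            ColumnContract("customer_id", "string", "foreign key to analytics.customers"),
        ],
    )

    print(orders_contract.validate_row({"order_id": "o-1", "amount_usd": 19.99}))
    # -> ['customer_id: null not allowed']
    ```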

  • View profile for Morgan Depenbusch, PhD

    Helping analysts grow their influence through better charts, clearer stories, and more persuasive communication | Ranked top 3 data viz creator on LinkedIn | People Analytics | Snowflake, Ex-Google

    30,272 followers

    The biggest lie in data analytics: The data speaks for itself.

    Whenever I post about the importance of selling your insights, someone will inevitably comment that this is only true if: your audience is dumb / you work in big tech / you’re not a “real” scientist.

    Respectfully… no. High-quality work is the baseline. But it will never get off the shelf if you can’t communicate the value of that work.

    Unfortunately, no one teaches analysts how to influence. But for marketers… that’s their whole job. So I borrowed a few of their tricks.

    Here are 5 marketing-inspired strategies that’ll make your insights land:

    1. Treat slide titles like headlines
    ↳ Bad: “Q2 Revenue by Region”
    ↳ Better: “Europe Drove 80% of Our Q2 Growth”

    2. Repeat your message like a marketer
    ↳ Repetition builds clarity
    ↳ Reinforce your key point in titles, charts, notes, voiceover

    3. Use copywriting formulas
    ↳ Frame your message with a structure that sticks
    ↳ Example formula: PAS (Problem → Agitate → Solution)

    4. Add urgency
    ↳ People act when something feels time-sensitive
    ↳ Try: “Fixing this by Q3 could save $500K”

    5. Lead with the benefit, not the finding
    ↳ Your stakeholders don’t want a number (the “what”)
    ↳ They want a reason to care (the “so what”)

    Great analysts know how to run the numbers AND land the message.

    I break these down (with examples you can steal) in this week’s newsletter, “5 marketing tricks to land your insights.” Click “View my newsletter” at the top of this post to read it.

    ♻️ Repost to help other analysts build their influence.
    👋🏼 I’m Morgan. I write about data viz, storytelling, and how to make your insights actually land with your audience.

  • View profile for Brian Krueger, PhD

    Using SVs to detect cancer sooner | Vice President, Technology Development

    31,201 followers

    Everyone loves a good story. You should be using your data to tell one every chance you get.

    The importance of narrative in scientific communication cannot be overstated. And that includes communication in traditionally technical environments!

    One thing that gets beaten into you in graduate school is that a scientific presentation is a technical affair. Communicating science is fact-based, it's black and white, here's the data, this is the conclusion, do you have any questions?

    Actually, I do. Did you think about what story your data could tell before you put your slides together?

    I know this is a somewhat provocative question, because a lot of scientists overlook the importance of telling a story when they present results. But if you want to keep your audience engaged and interested in what you have to say, you should think about your narrative! This is true for a presentation at 'The Mountain Lake Lodge Meeting on Post-Initiation Activities of RNA Polymerases,' the 'ACMG Annual Clinical Genetics Meeting,' or to a class of 16-year-old AP Biology students. The narrative doesn't need to be the same for all of those audiences, BUT IT SHOULD EXIST!

    There is nothing more frustrating to me than seeing someone give a presentation filled with killer data, only to watch them blow it by putting the entire audience to sleep with an arcane technical overview of the scientific method. Please. Tell. A. Story. With. Your. Data.

    Here's how:
    1. Plot - the series of events that drive the story forward to its resolution. What sets the scene, the hypothesis or initial observation? How can the data be arranged to create a beginning, middle, and end?
    2. Theme - Good vs. Evil, Human vs. Virus, a day in the life of a microbe? Have fun with this (even just as a thought experiment), because it makes a big difference.
    3. Character development - the team, the protein, gene, or model system.
    4. Conflict - What were the blockers and obstacles? Needed a new technique? Refuting a previous finding?
    5. Climax - the height of the struggle. Use your data to build to a climax. How did one question lead to another, and how were any problems overcome?
    6. Resolution - What's the final overall conclusion, and how was the conflict that was set up in the beginning resolved by what you found?

    By taking the time to work through what story you can tell, you can engage your entire audience and they'll actually remember what you had to say!

  • View profile for Maarten Masschelein

    CEO & Co-Founder @ Soda | Data quality & Governance for the Data Product Era

    12,434 followers

    Petition to stop using vague terms in data governance.

    I see a lot of teams use words like “owners,” “stakeholders,” or “users” in policies. But no one really knows what those roles mean in practice.

    Who writes the checks? Who fixes them? Who gets alerts when something breaks? Who has the final say on schema changes?

    If your policy can’t answer those questions clearly, it won’t work.

    That’s why I push for more precise terms like data producers and data consumers. They describe actual behavior, not abstract roles.

    A data producer is any system, team, or individual responsible for creating, generating, or modifying data. This can include manual data entry, API ingestion, ETL pipelines, or applications that write to databases.

    A data consumer is any person, process, or tool that uses data for downstream purposes. This includes analysts running dashboards, ML models using features, finance teams generating reports, or business systems making decisions based on data.

    Clear language leads to clear responsibility.

    What vague governance term do you think we should retire next?
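
    As one way to make this concrete, here is a minimal Python sketch that encodes "data producer" and "data consumer" as answerable responsibilities for a single dataset. The dataset, team names, and fields are invented for illustration; they are not a specific governance tool's schema.

    ```python
    # Hedged sketch: replacing vague policy roles with concrete, answerable
    # responsibilities for one dataset.
    from dataclasses import dataclass


    @dataclass
    class GovernanceEntry:
        dataset: str
        producer: str                 # who creates/modifies the data
        consumers: list[str]          # who depends on it downstream
        check_author: str             # who writes the data quality checks
        alert_channel: str            # who gets alerted when a check breaks
        schema_change_approver: str   # who has the final say on schema changes

        def answers(self) -> dict:
            """The four questions every governance policy should be able to answer."""
            return {
                "Who writes the checks?": self.check_author,
                "Who fixes them?": self.producer,
                "Who gets alerts when something breaks?": self.alert_channel,
                "Who has the final say on schema changes?": self.schema_change_approver,
            }


    orders = GovernanceEntry(
        dataset="warehouse.orders",
        producer="checkout-service team",
        consumers=["finance reporting", "demand-forecast ML model"],
        check_author="data quality specialist",
        alert_channel="#orders-data-alerts",
        schema_change_approver="checkout-service team",
    )

    for question, answer in orders.answers().items():
        print(f"{question} -> {answer}")
    ```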
