
Stats for Data Science, from the Ground Up


Data scientists love debating which skills are essential for success in the field. It makes sense: in a rapidly changing ecosystem that adopts new and powerful technologies all the time, job requirements and toolkits never stop evolving.

Statistics seems to be one major outlier, though. Data professionals of all stripes seem to agree that a solid foundation in stats and math will make your life easier regardless of your role, and can open up opportunities that would otherwise remain beyond reach.

To help you on your learning journey, we’re sharing a few of our favorite recent posts that focus on statistics for data science and machine learning. They go from the basics all the way to more specialized use cases, but they’re all accessible and beginner-friendly, and they emphasize practical applications over lofty theory. Let’s dive in!

  • Stats novice? Not for long! If you’re tackling stats for the first time in your professional life—and especially if your memories of high school math inspire more dread than joy—you’re bound to appreciate Chi Nguyen’s simple explanations of basic concepts.
  • A structured approach to learning statistics. Looking for a thorough, step-by-step resource for learning stats? Adrienne Kline recently launched an excellent Statistics Bootcamp that unpacks the math behind all the data science libraries practitioners use daily. (If you’ve already discovered the first installment, linked above, parts two and three are already out!)
  • Making sense of occasionally confusing terms. For his debut TDS article, Ajay Halthor shared a lucid explanation of likelihood, and focused on the role it plays in machine learning, as well as its sometimes hard-to-grasp connection to probability, an equally crucial concept.
Photo by Alisa Anton on Unsplash
  • Putting your statistical know-how to good use. There’s always a gap between theoretical knowledge and its effective application. Mintao Wei’s recent contribution does a great job bridging it, as it walks us through the process of selecting the right statistical tests for a range of A/B testing metrics.
  • The inner workings of a powerful algorithm, explained. The bootstrap, says Christian Leschinski, “is an algorithm that allows you to determine the distribution of a test statistic without doing any theory.” It’s also one that’s been “widely overlooked.” Harnessing his deep knowledge as a statistician, Christian guides us through the magic behind the bootstrap, and shows how it can help practitioners in their analyses.
  • Why it’s crucial to connect statistics to business outcomes. Cassie Kozyrkov identifies the challenges data professionals face when they bring their stats and math knowledge to work projects, and stresses the importance of data budgeting, a topic college classes rarely cover. (If you’d like to read more of Cassie’s insights—and you should!—don’t miss our brand-new Q&A with her, which touches on data career paths, the value data analysts bring to companies, and much more.)
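To make the bootstrap idea mentioned above a little more concrete, here is a minimal sketch in Python that uses only the standard library. It shows a simple percentile bootstrap for the mean, which is one common variant and not necessarily the one Christian's article walks through. The sample values, the number of resamples, and the helper name bootstrap_means are assumptions made up for illustration.

```python
import random
import statistics


def bootstrap_means(sample, n_resamples=2000, seed=0):
    """Return the means of `n_resamples` resamples drawn with replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    return [
        statistics.fmean(rng.choices(sample, k=n))
        for _ in range(n_resamples)
    ]


# Hypothetical measurements, invented for this example.
sample = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.9, 3.7, 2.5]

means = sorted(bootstrap_means(sample))
# Percentile interval: cut roughly 2.5% off each tail of the resampled means.
lower = means[int(0.025 * len(means))]
upper = means[int(0.975 * len(means)) - 1]
print(f"sample mean: {statistics.fmean(sample):.2f}")
print(f"approx. 95% interval for the mean: [{lower:.2f}, {upper:.2f}]")
```

The resampled means stand in for the sampling distribution of the statistic, so no formula for its standard error is needed.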

All stats-ed out, are we? We hope not, but just in case—here are some non-statistics-related reading recommendations we think you’ll enjoy.

Your support means so much to us — thank you for reading our authors’ work; a special shoutout goes to all of you who’ve recently become Medium members.

Until the next Variable,

TDS Editors

Stats for Data Science, from the Ground Up was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.

461 days ago

Health Data

2 Comments and 15 Shares
Donate now to help us find a cure for causality. No one should have to suffer through events because of other events.
561 days ago
2 public comments
562 days ago
As a medical librarian, I can confirm that this is a comic about health data.
ATL again
562 days ago
Just wow!

Debugging Flatpak applications


Flatpak is a way to distribute applications on Linux. Its container-style approach allows applications to run across Linux distributions. This means native packages (rpm, deb, etc) are not needed and it's relatively easy to get your app to Linux users with fewer worries about distro compatibility. This makes life a lot easier for developers and is also convenient for users.

I've run popular applications like OBS Studio as flatpaks and even publish my own on Flathub, a popular hosting site for applications. Today I figured out how to debug flatpaks, which requires some extra steps that I'll share below so I don't forget them myself!

Bonus Tip: Testing local flatpaks

If you're building a flatpak of your own application it's handy to use the dir sources type in the manifest to compile your application's source code from a local directory instead of a git tag or tarball URL. This way you can make changes to the source code and test them quickly inside Flatpak.

Put something along these lines in the manifest's modules object, where /home/user/my-app is your local directory with your app's source code:

{
    "name": "my-app",
    "sources": [
        { "type": "dir", "path": "/home/user/my-app" }
    ]
}

Building and installing apps with debuginfo

flatpak-builder(1) automatically creates a separate .Debug extension for your flatpak that contains your application's debuginfo. You'll need the .Debug extension if you want proper backtraces and source level debugging. At the time of writing the Flatpak documentation did not mention how to install the locally-built .Debug extension. Here is how:

$ flatpak-builder --user --force-clean --install build my.org.app.json
$ flatpak install --user --reinstall --assumeyes "$(pwd)/.flatpak-builder/cache" my.org.app.Debug

It might be a good idea to install debuginfo for the system libraries in your SDK too in case it's not already installed:

$ flatpak install org.kde.Sdk.Debug # or your runtime's SDK
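If you rebuild often, the commands above could be collected in a small helper script. The following is only a sketch in Python: the function name debug_build_commands and its parameters are my own assumptions, and it only builds and prints the command lines shown above rather than running them.

```python
import shlex
from pathlib import Path


def debug_build_commands(app_id, manifest, sdk_debug=None):
    """Return the argv lists for a local debug build, in the order to run them."""
    cache = str(Path.cwd() / ".flatpak-builder" / "cache")
    commands = [
        # Build the manifest and install the app for the current user.
        ["flatpak-builder", "--user", "--force-clean", "--install",
         "build", manifest],
        # Install the locally built .Debug extension from the builder cache.
        ["flatpak", "install", "--user", "--reinstall", "--assumeyes",
         cache, f"{app_id}.Debug"],
    ]
    if sdk_debug:
        # Optionally install debuginfo for the SDK's system libraries.
        commands.append(["flatpak", "install", sdk_debug])
    return commands


for argv in debug_build_commands("my.org.app", "my.org.app.json",
                                 sdk_debug="org.kde.Sdk.Debug"):
    print(shlex.join(argv))
```

Each list can be passed to subprocess.run(argv, check=True) once you are happy with the printed commands.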

Running applications for debugging

There is a flatpak(1) option that launches the application with the SDK instead of the Runtime:

$ flatpak run --user --devel my.org.app

The SDK contains development tools whereas the Runtime just has the files needed to run applications.

It can also be handy to launch a shell so you can control the launch of your app and maybe use gdb or strace:

$ flatpak run --user --devel --command=sh my.org.app
[📦 my.org.app ~]$ gdb /app/bin/my-app

Working with core dumps

If your application crashes it will dump core like any other process. However, existing ways of inspecting core dumps like coredumpctl(1) are not fully functional because the process ran inside namespaces and debuginfo is located inside flatpaks instead of the usual system-wide /usr/lib/debug location. coredumpctl(1), gdb, etc aren't Flatpak-aware and need extra help.

Use the flatpak-coredumpctl wrapper to launch gdb:

$ flatpak-coredumpctl -m <PID> my.org.app

You can get PID from the list printed by coredumpctl(1).


This article showed how to install locally-built .Debug extensions and inspect core dumps when using Flatpak. I hope that over time these manual steps will become unnecessary as flatpak-builder(1) and coredumpctl(1) are extended to automatically install .Debug extensions and handle Flatpak core dumps. For now it just takes a few extra commands compared to debugging regular applications.

575 days ago

Let’s Encrypt Receives the Levchin Prize for Real-World Cryptography

1 Comment

On April 13, 2022, the Real World Crypto steering committee presented the Max Levchin Prize for Real-World Cryptography to Let’s Encrypt. The following is the speech delivered by our Executive Director, Josh Aas, upon receiving the award. We’d like to thank our community for supporting us and invite you to join us in making the Internet more secure and privacy-respecting for everyone.

Thank you to the Real World Crypto steering committee and to Max Levchin for this recognition. I couldn’t be more proud of what our team has accomplished since we started working on Let’s Encrypt back in 2013.

My first temptation is to name some names, but there are so many people who have given a significant portion of their lives to this work over the years that the list would be too long. You know who you are. I hope you’re as proud as I am at this moment.

Let’s Encrypt is currently used by more than 280 million websites, issuing between two and three million certificates per day. I often think about how we got here, looking for some nugget of wisdom that might be useful to others. I’m not sure I’ve really come up with anything particularly profound, but I’m going to give you my thoughts anyway. Generally speaking: we started with a pretty good idea, built a strong team, stayed focused on what’s important, and kept ease of use in mind every step of the way.

Let’s Encrypt ultimately came from a group of people thinking about a pretty daunting challenge. The billions of people living increasingly large portions of their lives online deserved better privacy and security, but in order to do that we needed to convince hundreds of millions of websites to switch to HTTPS. Not only did we want them to make that change, we wanted most of them to make the change within the next three to five years.

Levchin Prize Trophy

We thought through a lot of options but in the end we just didn’t see any other way than to build what became Let’s Encrypt. In hindsight building Let’s Encrypt seems like it was a good and rewarding idea, but at the time it was a frustrating conclusion in many ways. It’s not an easy solution to commit to. It meant standing up a new organization, hiring at least a dozen people, understanding a lot of details about how to operate a CA, building some fairly intense technical systems, and setting all of it up to operate for decades. Many of us wanted to work on this interesting problem for a bit, solve it or at least put a big dent in it, and then move on to other interesting problems. I don’t know about you, but I certainly didn’t dream about building and operating a CA when I was younger.

It needed to be done though, so we got to work. We built a great team that initially consisted of mostly volunteers and very few staff. Over time that ratio reversed itself such that most people working on Let’s Encrypt on a daily basis are staff, but we’re fortunate to continue to have a vibrant community of volunteers who do work ranging from translating our website and providing assistance on our community forums, to maintaining the dozens (maybe hundreds?) of client software options out there.

Today there are just 11 engineers working on Let’s Encrypt, as well as a small team handling fundraising, communication, and administrative tasks. That’s not a lot of people for an organization serving hundreds of millions of websites in every country on the globe, subject to a fairly intense set of industry rules, audits, and high expectations for security and reliability. The team is preparing to serve as many as 1 billion websites. When that day comes to pass the team will be larger, but probably not much larger. Efficiency is important to us, for a couple of reasons. The first is principle - we believe it’s our obligation to do the most good we can with every dollar entrusted to us. The second reason is necessity - it’s not easy to raise money, and we need to do our best to accomplish our mission with what’s available to us.

It probably doesn’t come as a surprise to anyone here at Real World Crypto that ease of use was critical to any success we’ve had in applying cryptography more widely. Let’s Encrypt has a fair amount of internal complexity, but we expose users to as little of that as possible. Ideally it’s a fully automated and forgettable background task even to the people running servers.

The fact that Let’s Encrypt is free is a huge factor in ease of use. It isn’t even about how much money people might be willing or able to pay, but any financial transaction requirement would make it impossible to fully automate our service. At some point someone would have to get a credit card and manage payment information. That task ranges in complexity from finding your wallet to obtaining corporate approval. The existence of a payment in any amount would also greatly limit our geographic availability because of sanctions and financial logistics.

All of these factors led to the decision to form ISRG, a nonprofit entity to support Let’s Encrypt. Our ability to provide this global, reliable service is all thanks to the people and companies who believe in TLS everywhere and have supported us financially. I’m so grateful to all of our contributors for helping us.

Our service is pretty easy to use under normal circumstances, but we’re not done yet. We can be better about handling exceptional circumstances such as large revocation events. Resiliency is good. Automated, smooth resiliency is even better. That’s why I’m so excited about the ACME Renewal Info work we’re doing in the IETF now, which will go into production over the next year.

Everyone here has heard it before, but I’ll say it again because we can’t afford to let it slip our minds. Ease of use is critical for widespread adoption of real world cryptography. As we look toward the future of ISRG, our new projects will have ease of use at their core. In fact, you can learn about our newest project related to privacy-preserving measurement at two of this afternoon’s sessions! Getting ease of use right is not just about the software though. It’s a sort of pas de trois, a dance for three, between software, legal, and finance, in order to achieve a great outcome.

Thank you again. This recognition means so much to us.

Supporting Let’s Encrypt

As a nonprofit project, 100% of our funding comes from contributions from our community of users and supporters. We depend on their support in order to provide our services for the public benefit. If your company or organization would like to sponsor Let’s Encrypt please email us at sponsor@letsencrypt.org. If you can support us with a donation, we ask that you make an individual contribution.

596 days ago
This project changed adoption for HTTPS in the best way possible.

AI-Generated Sleep Podcast Urges You To Imagine Pleasant Nonsense


[Stavros Korokithakis] finds the experience of falling asleep to fairy tales soothing, and this has resulted in a fascinating project that indulges this desire by using machine learning to generate mildly incoherent fairy tales and read them aloud. The result is a fantastic sort of automated, machine-generated audible sleep aid. Even the logo is machine-generated!

The Deep Dreams Podcast is entirely machine-generated, including the logo.

The project leverages the natural language generation abilities of OpenAI’s GPT-3 to create fairytale-style content that is just coherent enough to sound natural, but not quite coherent enough to make a sensible plotline. The quasi-lucid, dreamlike result is perfect for urging listeners to imagine pleasant nonsense (thanks to Nathan W Pyle for that term) as they drift off to sleep.

We especially loved reading about the methods and challenges [Stavros] encountered while creating this project. For example, he talks about how there is more to a good-sounding narration than just pointing a text-to-speech engine at a wall of text and mashing “GO”. A good episode has things like strategic pauses, background music, and audio fades. That’s where pydub — a Python library for manipulating audio — came in handy. As for the speech, text-to-speech quality is beyond what it was even just a few years ago (and certainly leaps beyond machine-generated speech in the 80s) but it still took some work to settle on a voice that best suited the content, and the project gradually saw improvement.
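To give a feel for the kind of post-processing described above (strategic pauses, audio fades, and a quiet music bed), here is a library-free sketch in Python that works on plain lists of samples. It is not pydub's API and not [Stavros]'s code. The function names, gain, durations, and the tiny sample rate are all assumptions chosen to keep the example readable.

```python
def silence(rate, seconds):
    """Return `seconds` of silence at `rate` samples per second."""
    return [0.0] * int(rate * seconds)


def fade(samples, fade_len):
    """Linearly fade the first and last `fade_len` samples in and out."""
    out = list(samples)
    fade_len = min(fade_len, len(out) // 2)
    for i in range(fade_len):
        gain = i / fade_len
        out[i] *= gain         # fade in
        out[-1 - i] *= gain    # fade out
    return out


def mix(voice, music, music_gain=0.2):
    """Overlay quiet background music under the voice track."""
    length = max(len(voice), len(music))
    voice = voice + [0.0] * (length - len(voice))
    music = music + [0.0] * (length - len(music))
    return [v + music_gain * m for v, m in zip(voice, music)]


rate = 8  # unrealistically small sample rate, for illustration only
sentences = [[1.0] * 8, [1.0] * 8]  # two stand-in "spoken" sentences
narration = sentences[0] + silence(rate, 0.5) + sentences[1]  # pause between them
music = fade([0.5] * 24, fade_len=4)  # music bed with fade-in and fade-out
episode = mix(narration, music)
print(len(episode), episode[:4])
```

A real pipeline would load and export audio files and work at a normal sample rate; the point here is only how pauses, fades, and mixing compose.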

Deep Dreams Podcast has a GitLab repository if you want to see the code that drives it all, and you can go to the podcast itself to give it a listen.

613 days ago

Choosing the Research Topic and Reading Its Literature


Are you starting with your first research project? Do you want to publish a research paper but don’t know where or how to start? A few years back, I was in a similar position: I was completely new to machine learning research and had no idea where and how to begin.

Fortunately, I had the opportunity to be a part of the machine learning and vision group (Lab1055). With the proper guidance, several setbacks, and learning along the way, I have published my research in an ICCV Workshop. The learnings and principles I developed as a member of Lab1055 have helped me pursue research and publish in vision conferences, workshops, and journals (e.g., ECCV, WACV, ECML, PRL, and CVPR).

Through this mini-blog series, I want to share these learnings and principles. I hope that these will help you hack your way into research and get something fruitful from it.

In this series, you will learn how to publish novel research.

This lesson is the first in a 5-part series on How to Publish Novel Research:

  1. Choosing the Research Topic and Reading Its Literature (this tutorial)
  2. Ideating the Solution and Planning Experiments
  3. Planning and Writing a Research Paper
  4. Planning Next Steps When Things Don’t Work Out
  5. Ensuring Your Research Stays Visible and General Tips

To learn how to choose your research topic and go about reading its literature (Figure 1), just keep reading.

Figure 1: Choosing the Research Topic and Reading Its Literature.

Choosing the Research Topic and Reading Its Literature

Choosing Your Research Topic

Nowadays, it’s hard to think of any problem where machine learning has not been applied. Everything seems to be virtually solved. And that’s why choosing a research topic in this vast space can be a tricky and overwhelming task. Here are a few ideas to help you narrow down your search.

Find Something That Excites You and Is Relevant to the Community

Without a doubt, the topic should pique your interest. However, not every topic that excites you can benefit the community. As a result, one should look for something that interests them and is relevant to the community. For example, if image classification piques your interest, you should try few-shot, zero-shot, unsupervised, self-supervised, or domain-generalized classification.

These topics gain more attention in the community as they model real-world constraints on available training data, unlike traditional supervised classification, which assumes an abundance of data. One method for identifying these trending or relevant topics is to conduct a topic-by-topic analysis of recent AI conferences. For example, Figure 2 shows the top 50 keywords of ICLR 2021 submissions. It indicates that topics such as GNNs, meta-learning, few-shot, unsupervised, supervised, and robustness are more popular, relevant, and trending in the community than topics like classification, CNNs, and so on.

Figure 2: Top 50 keywords in ICLR 2021 Submissions (credits).
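A topic-by-topic analysis like the one behind Figure 2 can start as a simple keyword count. The sketch below is in Python and uses made-up keyword lists. The data, the helper name top_keywords, and the choice to count each keyword once per submission are assumptions for illustration, not the method used to produce the figure.

```python
from collections import Counter

# Hypothetical keyword lists, one per submission; invented for illustration.
submissions = [
    ["few-shot learning", "meta-learning"],
    ["graph neural networks", "robustness"],
    ["Meta-Learning", "reinforcement learning"],
    ["few-shot learning", "graph neural networks", "meta-learning"],
]


def top_keywords(keyword_lists, k=3):
    """Count each keyword once per submission and return the k most common."""
    counts = Counter()
    for keywords in keyword_lists:
        # Normalize case and whitespace, and de-duplicate within a submission.
        counts.update({kw.strip().lower() for kw in keywords})
    return counts.most_common(k)


for keyword, count in top_keywords(submissions):
    print(f"{keyword}: {count}")
```

With real data, you would replace the hard-coded lists with keywords exported from the conference's submission site.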

Try to Find a Domain Where the Literature Is Not Very Crowded

When the literature is crowded, trying to be novel in your research becomes challenging and competitive. Crowded literature is more than just too many papers to read; it reflects how actively the community is working or publishing on similar topics. And given the rate at which these new works are published, you are likely to be scooped by them in terms of results or novelty.

Work with Your Advisor or Mentors

I strongly advise you to collaborate with your mentors (PhDs, Research Scientists, and so on), as well as any advisors, to identify promising and relevant topics. Working on something that intersects with your advisor’s or mentor’s expertise can greatly increase your chances of producing novel research. With their expertise and research experience, you can receive appropriate guidance and feedback to help you improve and publish your work.

Avoid Compute-Intensive Projects

Note how much computation the topic typically requires. Avoid topics that need intensive computation that is unavailable or unaffordable to you; this prevents disappointment later in the project. For example, someone working on large-scale image generation should keep track of the computation required to generate higher-resolution images, which is a standard and fundamental experiment in this literature.

Identify Low-Hanging Fruits and Potential Gaps

Identifying low-hanging fruits in the literature is probably one of the simplest ways to choose your topic. Low-hanging fruits are research topics that are simple to work on but have gone unnoticed. These could be as simple as

  • Combining two research topics into one:
    • Mancini et al. (2020) are the first to combine Zero-shot learning and Domain Generalization and propose a simple curriculum-based class/domain mixup strategy to train models that generalize under both domain and semantic shift.
    • Ganea et al. (2021) present the first incremental approach to few-shot instance segmentation: iMTFA, which learns discriminative embeddings for object instances merged into class representatives.
    • Chauhan et al. (2020) first propose studying the topic of few-shot graph classification in graph neural networks (GNNs) to recognize unseen classes, given limited labeled graph examples.
  • Applying a new class of algorithms or architectures:
    • Deng et al. (2021) first present a simple and effective transformer-based framework for visual grounding. Their TransVG method outperforms state-of-the-art methods that rely on complex modules with manually designed mechanisms to perform the query reasoning and multi-modal fusion.
  • Proposing new benchmarks or evaluations:
    • Hendrycks et al. (2021) propose four new real-world distribution shift datasets consisting of changes in image style, image blurriness, geographic location, camera operation, etc.
    • Gulrajani and Lopez-Paz (2020) implement DomainBed, a testbed for domain generalization (DG), including seven multi-domain datasets, nine baseline algorithms, and three model selection criteria. In addition, they test existing DG methods under their settings to understand how practical these algorithms are in realistic settings.

However, keep in mind that, because these are typically easy to identify and work on, you must act quickly to be the first to propose them. Another good strategy can be to identify the drawbacks or gaps in the current pipelines of the space and work on eliminating them. This could include making pipelines more compute- or time-efficient without sacrificing performance, making them robust to shifts, etc.

Analysis and Understanding as Research Topics

Comprehensive analysis that provides a holistic understanding of a specific space can be a research topic in and of itself. Analyzing what works best, any intriguing phenomena, trade-offs, limitations, or standardized benchmarking can help you and the community better understand the space and identify potential gaps to address in the future. The best part is that such analyses get a lot of attention from the community through citations and discussions.

For example,

  • Naseer et al. (2021) show and analyze several intriguing properties of Vision Transformers (ViTs), like their robustness to severe occlusions, perturbations, and domain shifts; their lower texture bias compared to CNNs; and their superior transfer learning capabilities.
  • Xian et al. (2017) discuss limitations in zero-shot learning formulations and algorithms by comparing and analyzing a significant number of the state-of-the-art methods in-depth, both in the classic zero-shot setting and the more realistic generalized zero-shot setting.
  • Chen et al. (2019) perform a consistent comparative analysis of several few-shot classification algorithms. They show that deeper backbones significantly reduce the performance differences among various state-of-the-art methods. Furthermore, in a realistic cross-domain evaluation setting, baseline methods compare favorably against other state-of-the-art algorithms.

Identify Applications

Applying existing ideas to a relevant topic (e.g., medical images, editing, or navigation) can also serve as a potential research topic. The following are examples of such papers.

  • Papadopoulos et al. (2019) aim to teach a machine to make a pizza by building a generative model that mirrors an ordered set of instructions. They learn composable module operations to either add or remove a particular ingredient through GANs.
  • Machiraju and Balasubramanian (2020) study the natural adversaries in the field of autonomous navigation wherein adverse weather conditions such as fog have a drastic effect on the predictions of these systems. These weather conditions can act like natural adversaries that can help test models.
  • Richardson et al. (2021) propose a StyleGAN encoder able to directly encode real images into style space and show that solving translation tasks through StyleGAN significantly simplifies the training process and has better support for solving tasks without pixel-to-pixel correspondence.

Reading the Literature

With the exponential growth of deep learning-related publications (Figure 3), it has become necessary to devise effective strategies to deal with the fast-growing literature. So, now that you’ve decided which topic to work on, let’s look at some popular resources/tools for understanding the topic better, skimming through its literature, and keeping yourself updated with the ongoing research in the community.

Figure 3: Evolution of deep learning related publications: (a) citations listed in Google scholar for AlexNet paper; (b) arXiv listed publications in the categories cs and stat, including the terms deep learning, convolutional neural networks, convolutional networks, or fully convolutional and their share of all publications listed in the two categories; and (c) publications in selected Earth observation journals, searched for with the same terms as in arXiv (caption and image credits: Hoeser and Kuenzer, 2020).

Survey and Analysis Papers

If you want to understand the fundamentals, different classes of algorithms introduced, or how they compare, survey papers are probably the best place to start. They are usually simple to locate and follow. On the other hand, analytical papers can help you better understand the topic by explaining gaps, limitations, trade-offs, best strategies, intriguing results, etc.

GitHub Compilations

Explore GitHub to get compilations (e.g., Awesome Visual Transformers, Awesome Zero-shot Learning, Awesome Self-supervised Learning, Awesome Visual Grounding, etc.) of research papers specific to your topic. For a kickstart, read initial papers (to get fundamentals) and top papers (to see where the trend is going). They also include links to any implementations, blogs, videos, etc., and are updated regularly to reflect the most up-to-date content. You can find these compilations easily by using the search term “awesome <topic name>.”

Conference and Workshop Proceedings

Following the proceedings of top conferences (e.g., CVPR, NeurIPS, ICCV, ICML, ICLR, ECCV, ACL, EMNLP, and KDD) is an excellent way to stay up to date on the latest research. As an example, Figure 4 ranks various computer vision conferences by h-index. Another great way to stay up to date is to attend area-specific workshops where you can find research talks, presentations, and submissions that are more relevant to your topic. In addition, these workshops often consolidate a specific research space, allowing you to understand the current trends better.

Figure 4: Top Computer Vision conferences by h-index (credits: Google Scholar).
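For readers unfamiliar with the metric, the h-index of a set of papers is the largest number h such that at least h of them have h or more citations each. Here is a minimal Python sketch of that definition. The citation counts are invented, and rankings such as the one in Figure 4 may use a time-windowed variant of the metric.

```python
def h_index(citations):
    """Largest h such that at least h papers have h or more citations each."""
    h = 0
    for rank, count in enumerate(sorted(citations, reverse=True), start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


# Hypothetical citation counts for one venue's papers.
print(h_index([25, 8, 5, 3, 3, 1]))
```

In this example three papers have at least three citations each, but there are not four papers with at least four, so the function returns 3.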

For example, the Workshop on Meta-Learning has been a popular NeurIPS workshop focusing on advancing meta-learning methods. Another popular workshop at ICML, Uncertainty and Robustness in Deep Learning, aims to make deep neural networks more reliable. The Adversarial Machine Learning in Real-World Computer Vision Systems and Online Challenges workshop at CVPR focuses on recent research and future directions for security in real-world machine learning and computer vision systems. Finally, the 3DVR Workshop (CVPR 2021) discusses the unique challenges and opportunities in 3D vision for robotics.

Online Tools and Platforms

Here are a few tools you can use to ease your search of papers relevant to your topic:

  • With Connected Papers, you can build a graph of papers relevant to a particular field and discover prior or derivative works in your field of interest. It also lets you create a bibliography for future use.
  • Arxiv Sanity allows researchers to keep track of recent papers, search for papers, sort papers by similarity to any paper, see recent popular papers, add papers to a personal library, and get personalized recommendations of (new or old) Arxiv papers.
  • Alpha Signal provides you with a weekly summary of research papers trending and worth reading.
  • Using Google Scholar, you can add your personalized keywords, fields, and researchers, and get notifications whenever any paper relevant to them gets uploaded.
  • Twitter, in my opinion, is by far the best place to stay up to date. If you follow the right people, research labs, and conferences, you will find a wealth of content, insights, and collaboration opportunities. The best part is that you can directly share your thoughts or ask questions with the community. Figures 5 and 6 recommend several researchers and academic and industry labs that you can follow to create an excellent feed for yourself.
Figure 5: Top 25 AI Influencers to Follow on Twitter (credits: AI TIME JOURNAL).
Figure 6: Top 100 institutions (Academic and Industry) at ICML 2019 (credits: Reddit).

Reading Strategy

After you’ve read the fundamental and introductory papers, devise a reading strategy that will help you efficiently filter out the relevant content and save you several hours of reading irrelevant articles.

  • Examine the abstract for a high-level summary of the work.
  • Look for any introductory figures or illustrations (to get a sense of the approach).
  • Skim through key results, and so on.

Papers are organized in a consistent format (Abstract, Introduction, Related Work, Methodology, Experiments, and Conclusion), making it simple to find anything specific. Aside from that, you can get a good overview by looking at their posters, spotlight videos, or blogs.

Keshav (2020) proposes a three-pass approach for reading papers.

  • The first pass gives you a general idea of the paper. It involves reading the abstract, introduction, and conclusions, and glancing over the references, to decide whether the paper is relevant to you and needs any further passes. After the first pass, you should be able to answer questions about the category, context, contributions, correctness, and clarity of the work.
  • The second pass lets you grasp the paper’s content by focusing on figures, illustrations, or diagrams, marking any unread references for future reading. This pass will help you see if the work can be relevant to your topic (can it be a potential baseline, related work, or even a solution to your topic). If yes, then go for a third pass to fully understand it.
  • The third pass involves reading the paper carefully to identify its strong and weak points. In particular, you should be able to pinpoint implicit assumptions, missing citations to relevant work, and potential issues with experimental or analytical techniques.

Consider yourself a reviewer and ask pertinent questions to ensure a thorough assessment. For example, are there any simpler methods/guidelines that the authors did not consider? Are the authors’ assumptions reasonable? Is their approach technically sound, or do they have any limitations (expensive computation, training/inference overheads, etc.)?

Think creatively about whether the presented idea can be extended, integrated, or applied elsewhere. Highlight important content, thoughts, or criticism, or summarize them in a few paragraphs. These notes will greatly assist you when rereading the paper or writing about it in the related work section of your own paper.

Selecting a research topic can be difficult in the vast machine/deep learning space. Not every topic that piques your interest can be turned into a successful research topic. You should therefore look for domains that are relevant to the research community, align with your interests, and are not overcrowded. Furthermore, take note of the compute typically required to work on these topics, and avoid those where the compute is unavailable or unaffordable.

After you have narrowed down your list of topics, consult with your mentors and advisors to see if they’re particularly interested or experienced in any of them. Begin with those topics and look for any low-hanging fruit, but act quickly if you want to be the first to claim it. In parallel, look for any constraints, gaps, trade-offs, or intriguing phenomena that could lead to a good analysis or to concerns worth addressing.

Conducting a literature review can be a daunting task. Start with survey papers and GitHub compilations to understand the fundamentals and skim through the recent approaches. Next, follow the proceedings of top conferences and their area-specific workshops to stay up to date with ongoing research. Use online tools and platforms such as Twitter to obtain a curated set of content relevant to your topic. Look for blogs, posters, or spotlight videos to get a quick overview of a paper. Finally, and most importantly, devise a reading strategy for parsing a paper without wasting time.

I hope this lesson helps you narrow down your search for a research topic and deal with its literature efficiently. Stay tuned for the next lesson on ideating a solution and planning your experiments.

Citation Information

Mangla, P. “Choosing the Research Topic and Reading Its Literature,” PyImageSearch, J. Haase, P. Chugh, R. Raha, K. Kudriavtseva, and S. Huot, 2022, https://pyimg.co/oravw

  author = {Puneet Mangla},
  title = {Choosing the Research Topic and Reading Its Literature},
  booktitle = {PyImageSearch},
  editor = {Jon Haase and Puneet Chugh and Ritwik Raha and Kseniia Kudriavtseva and Susan Huot},
  year = {2022},
  note = {https://pyimg.co/oravw},

The post Choosing the Research Topic and Reading Its Literature appeared first on PyImageSearch.
