OWASP Top 10 2020 Data Analysis Plan
Goals
To collect the most comprehensive dataset related to identified application vulnerabilities to date, to enable analysis for the Top 10 and other future research as well. This data should come from a variety of sources: security vendors and consultancies, bug bounties, along with company/organizational contributions. Data will be normalized to allow for level comparison between Human assisted Tooling and Tooling assisted Humans.
Analysis Infrastructure
We plan to leverage the OWASP Azure Cloud Infrastructure to collect, analyze, and store the data contributed.
Contributions
We plan to support both known and pseudo-anonymous contributions. The preference is for contributions to be known; this vastly helps with the validation/quality/confidence of the data submitted. If the submitter prefers to have their data stored anonymously, and even goes as far as submitting the data anonymously, then it will have to be classified as “unverified” vs. “verified”.
Verified Data Contribution
Scenario 1: The submitter is known and has agreed to be identified as a contributing party.
Scenario 2: The submitter is known but would rather not be publicly identified.
Scenario 3: The submitter is known but does not want it recorded in the dataset.
Unverified Data Contribution
Scenario 4: The submitter is anonymous. (Should we support?)
The analysis of the data will be conducted with a careful distinction when the unverified data is part of the dataset that was analyzed.
Contribution Process
There are a few ways that data can be contributed :
- Email a CSV/Excel file with the dataset(s) to [email protected]
- Upload a CSV/Excel file to a “contribution folder” (coming soon)
Template examples can be found in GitHub: https://github.com/OWASP/Top10/tree/master/2021/Data
Contribution Period
We plan to accept contributions to the new Top 10 from May to Nov 30, 2020, for data dating from 2017 to current.
Data Structure
The following data elements are required or optional.
The more information provided, the more accurate our analysis can be.
At a bare minimum, we need the time period, the total number of applications tested in the dataset, and the list of CWEs with counts of how many applications contained each CWE.
If at all possible, please provide the additional metadata, because that will greatly help us gain more insights into the current state of testing and vulnerabilities.
Metadata
- Contributor Name (org or anon)
- Contributor Contact Email
- Time period (2019, 2018, 2017)
- Number of applications tested
- Type of testing (TaH, HaT, Tools)
- Primary Language (code)
- Geographic Region (Global, North America, EU, Asia, other)
- Primary Industry (Multiple, Financial, Industrial, Software, ??)
- Whether or not data contains retests or the same applications multiple times (T/F)
CWE Data
- A list of CWEs w/ count of applications found to contain that CWE
If at all possible, please provide core CWEs in the data, not CWE categories.
This will help with the analysis; any normalization/aggregation done as part of this analysis will be well documented.
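To make the expected shape concrete, a hypothetical contribution might look like the CSV sketch below. The column names and values are illustrative only, not the official template; see the GitHub link above for the actual template files.

```csv
contributor_name,contact_email,time_period,apps_tested,testing_type,primary_language,region,industry,contains_retests,cwe,apps_with_cwe
Example Org,contact@example.org,2019,1200,HaT,Java,Global,Financial,FALSE,CWE-79,412
Example Org,contact@example.org,2019,1200,HaT,Java,Global,Financial,FALSE,CWE-89,207
```

Here each row repeats the dataset metadata and reports one CWE with the count of applications in which it was found; other layouts (e.g., one metadata sheet plus a CWE sheet) would carry the same information.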
Note:
If a contributor has two types of datasets, one from HaT and one from TaH sources, then it is recommended to submit them as two separate datasets.
HaT = Human assisted Tools (higher volume/frequency, primarily from tooling)
TaH = Tool assisted Human (lower volume/frequency, primarily from human testing)
Survey
Similarly to the Top Ten 2017, we plan to conduct a survey to identify up to two categories of the Top Ten that the community believes are important but may not be reflected in the data yet. We plan to conduct the survey in May or June 2020, and will be utilizing Google Forms in a similar manner as last time. The CWEs on the survey will come from current trending findings, CWEs that are outside the Top Ten in the data, and other potential sources.
Process
At a high level, we plan to perform a degree of data normalization; however, we will keep a version of the raw data contributed for future analysis. We will analyze the CWE distribution of the datasets and potentially reclassify some CWEs to consolidate them into larger buckets. We will carefully document all normalization actions taken so it is clear what has been done.
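As an illustration of the consolidation step described above, the sketch below folds a few child CWEs into broader buckets. The specific mappings are hypothetical; the real reclassification would be derived from the contributed data and documented with the analysis.

```python
# Hypothetical consolidation map: reported CWE -> broader bucket.
# These pairings are illustrative assumptions, not the documented
# normalization actions described in the plan.
CWE_BUCKETS = {
    "CWE-564": "CWE-89",  # SQL Injection: Hibernate -> SQL Injection
    "CWE-80": "CWE-79",   # Basic XSS -> Cross-site Scripting
}

def normalize_cwe(cwe_id: str) -> str:
    """Map a reported CWE to its consolidated bucket (identity if unmapped)."""
    return CWE_BUCKETS.get(cwe_id, cwe_id)
```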
We plan to calculate likelihood following the model we developed in 2017, using incidence rate instead of frequency to rate how likely a given app may contain at least one instance of a CWE. This means we aren't looking for the frequency rate (number of findings) in an app; rather, we are looking for the number of applications that had one or more instances of a CWE. We can calculate the incidence rate based on the total number of applications tested in the dataset compared to how many applications each CWE was found in.
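A minimal sketch of that incidence-rate calculation, assuming each contributed dataset reports the total applications tested plus a per-CWE count of affected applications (the field names and the simple pooling across datasets are assumptions for illustration):

```python
from collections import defaultdict

def incidence_rates(datasets):
    """Compute per-CWE incidence rate across contributed datasets.

    Each dataset is assumed to be a dict like:
        {"apps_tested": 1200, "cwe_app_counts": {"CWE-79": 412, "CWE-89": 207}}

    Incidence rate = applications containing at least one instance of the
    CWE / total applications tested, pooled across all datasets.
    """
    total_apps = 0
    apps_with_cwe = defaultdict(int)
    for ds in datasets:
        total_apps += ds["apps_tested"]
        for cwe, count in ds["cwe_app_counts"].items():
            apps_with_cwe[cwe] += count
    return {cwe: n / total_apps for cwe, n in apps_with_cwe.items()}
```

Note that simple pooling weights large contributions more heavily; the actual methodology may treat individual datasets differently.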
In addition, we will be developing base CWSS scores for the top 20-30 CWEs and include potential impact into the Top 10 weighting.
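One way such an impact weighting could enter the ranking is sketched below; combining incidence rate with a base CWSS score linearly is an assumption for illustration, not the finalized methodology.

```python
def impact_weighted_ranking(incidence, cwss_scores):
    """Rank CWEs by incidence rate weighted by a base CWSS score (0-100).

    Both the linear combination and the default score for unscored CWEs
    are illustrative assumptions, not the plan's finalized weighting.
    """
    scored = {
        cwe: rate * cwss_scores.get(cwe, 50.0) / 100.0
        for cwe, rate in incidence.items()
    }
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)
```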
Also, we would like to explore additional insights that could be gleaned from the contributed dataset to see what else can be learned that could be of use to the security and development communities.