How to Assess Startups Using Machine Learning: Part II — The GASP Open Source Framework

Arturo Moreno
PreSeries
Sep 18, 2018



How to Assess Startups Using Machine Learning

Part I: Introduction

Part II: The GASP Open Source Framework

Part III: The GASP for Predictive Modelling

A Common Framework To Work With Startup Data

In Part I, we started to explain the motivation behind the creation of the GASP. In today’s venture capital world, many players see using data science to power deal sourcing and assessment as the next technological frontier. But apart from big firms publicizing engineering and data science hires to build their own software, few investors offer a sneak peek of what it actually entails. The ones who do follow their own approach. That’s where the industry stands right now: everyone defending their own turf. For more on how VCs use data, read “How Venture Capitalists Use Artificial Intelligence” on our blog.

Currently, picking startups to invest in is more art than science. Investors all have different investing criteria, different outlooks on what the future holds, and, more importantly, different views on which patterns in a startup deserve attention. In other words, investors have their own frameworks which they believe, rightly or not, give them an advantage over other investors. The problem with relying on a set of assumptions about what a “good” startup is, is that investors will only consider startups matching those assumptions. Whether it’s aspects of the team, the product, traction, or even the market, investors will always look for the same signals, the same patterns. As a result, a lot of interesting ventures miss out on the attention they deserve simply because they do not belong to the categories that have already produced hits.

“But how can I find patterns of success/failures that I don’t know exist?” — You right now!

How Machine Learning Can Help

This is where machine learning shows promise. It works by extracting patterns from the data without pre-defined assumptions of what the output should look like. Let me illustrate! Instead of only looking at startups that have reached certain milestones, milestones you believe to be good predictors of success, you feed the predictive model a dataset of startup data covering everything from the team to the product, the market, etc., and let the model find the patterns of success and failure by itself.
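
To make the idea concrete, here is a minimal sketch of a model that picks its own rule from the data instead of being handed one. It is a one-rule "decision stump" written in plain Python, not the models PreSeries actually uses, and every feature name, value, and outcome label below is invented for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical toy dataset: one row per startup. Feature names and values are invented.
startups: List[Dict[str, float]] = [
    {"team_size": 3, "monthly_user_growth": 0.02, "patents": 0},
    {"team_size": 12, "monthly_user_growth": 0.15, "patents": 1},
    {"team_size": 5, "monthly_user_growth": 0.20, "patents": 0},
    {"team_size": 20, "monthly_user_growth": 0.01, "patents": 2},
    {"team_size": 8, "monthly_user_growth": 0.12, "patents": 0},
    {"team_size": 4, "monthly_user_growth": 0.03, "patents": 1},
]
# Invented outcome labels: 1 = success, 0 = failure.
outcomes: List[int] = [0, 1, 1, 0, 1, 0]


def fit_stump(rows: List[Dict[str, float]], labels: List[int]) -> Tuple[str, float, float]:
    """Search every feature and threshold for the best rule of the form
    'feature > threshold means success'. Returns (feature, threshold, accuracy)."""
    best: Tuple[str, float, float] = ("", 0.0, -1.0)
    for feature in rows[0]:
        for threshold in sorted({row[feature] for row in rows}):
            predictions = [1 if row[feature] > threshold else 0 for row in rows]
            accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
            if accuracy > best[2]:
                best = (feature, threshold, accuracy)
    return best


feature, threshold, accuracy = fit_stump(startups, outcomes)
print(f"Learned rule: {feature} > {threshold} (training accuracy {accuracy:.0%})")
```

Nobody told the stump which variable matters. It tried all of them and kept the one that best separated the two outcomes in this toy data. Real models combine many such rules and must be validated on startups they have not seen.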

Sounds wonderful! How do I do that? — A skeptical you right now!

So, two important points here! First, we want to build a dataset that covers all possible quantifiable aspects of each startup we look at. Second, we will then combine all individual datasets to generate predictive models. The first point is the reason for this blog post; we’ll dive into the machine learning aspect of things in Part III.
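
As a rough illustration of the second point, combining individual datasets can be as simple as merging per-aspect records into one row per startup. The sketch below assumes each dataset is a dictionary keyed by a startup identifier. The identifiers, field names, and the `combine` helper are all hypothetical and not part of the GASP itself.

```python
from typing import Dict, List, Optional

Record = Dict[str, float]

# Hypothetical per-aspect datasets, each keyed by a startup identifier.
team: Dict[str, Record] = {
    "acme": {"headcount": 7, "founders": 2},
    "globex": {"headcount": 15, "founders": 3},
}
product: Dict[str, Record] = {
    "acme": {"active_users": 1200},
    "globex": {"active_users": 300},
}
traction: Dict[str, Record] = {
    "acme": {"monthly_revenue": 8000.0},  # globex has not reported traction yet
}


def combine(datasets: Dict[str, Dict[str, Record]]) -> List[Dict[str, object]]:
    """Merge per-aspect records into one flat row per startup.

    Column names are prefixed with the aspect name, and missing values
    become None so that every row has the same columns.
    """
    startup_ids = sorted({sid for records in datasets.values() for sid in records})
    columns = sorted(
        {
            f"{aspect}.{field}"
            for aspect, records in datasets.items()
            for fields in records.values()
            for field in fields
        }
    )
    rows: List[Dict[str, object]] = []
    for sid in startup_ids:
        row: Dict[str, object] = {"startup": sid}
        for column in columns:
            aspect, field = column.split(".", 1)
            value: Optional[float] = datasets[aspect].get(sid, {}).get(field)
            row[column] = value
        rows.append(row)
    return rows


table = combine({"team": team, "product": product, "traction": traction})
for row in table:
    print(row)
```

Keeping missing values explicit, rather than dropping incomplete startups, leaves the decision about how to handle gaps to the modelling step.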

Our vision for fundraising

Why are we doing all this? We believe a better fundraising experience is possible. We believe fundraising can be a fast, fair, and widely available experience for founders while performing well financially for investors.

We envision a world where access to capital for your entrepreneurial venture becomes an analytical, fully transparent experience, where founders spend little time convincing others about the possibilities of their business and more time building products.

This project is a sandbox to demonstrate that making analytics a cornerstone of the fundraising process is possible and that it performs well, even though there is still plenty of opportunity to expand the input data (video, voice, etc.).

We dream of a future where all founders, from all backgrounds and irrespective of location, network, or industry vertical, have a fair chance to get funded. And I believe we can achieve that through data.

Bill Aulet, my professor of entrepreneurship at MIT, teaches new students every year to demystify entrepreneurship. “Entrepreneurship can be taught” is the motto of the Martin Trust Center for MIT Entrepreneurship, and he has shared his views on multiple occasions, like this one: “the Zuckerberg example is a fading myth of entrepreneurship”.

I believe the Midas List is a fading myth of venture capital. Of course there are great investors out there, but making venture capital investing look like a black art that only a few have mastered distracts us from the opportunity to provide worldwide financing to the asset class that brings, and will bring, the most benefit to humanity.

Want to join? Contributing to GASP is, in my opinion, the single step within your reach to help make this a reality soon. Let me tell you why…

The GASP

Because there is no standard industry practice in venture capital to assess startups, we took it upon ourselves to design a framework that can be used and re-used freely by anyone, anywhere. The objective of this framework is to offer a common set of variables from which features can easily be derived for machine learning. In other words, by standardizing the collection of startup data we empower investors to put that data to use and uncover insights, rather than let it sit on a dusty spreadsheet, never to be looked at again. We named our framework the GASP (Generally Accepted Startup Principles), a very obvious pun on the mother of accounting standards, the GAAP.

The GASP in its most complete form is currently represented in a 10-tab spreadsheet. Each variable is approached from a historical perspective, meaning we’re interested in the evolution of each variable over time. A detailed explanation of each variable is available in the documentation folder of the GASP GitHub repository. We open-sourced the framework because we believe it’s only through collaboration and transparency that we can transform venture capital from its current subjective practice to a highly analytical one.
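
One way to picture the historical perspective is to store each variable as a series of dated observations and derive features from it. The sketch below is an assumption about how such data could be represented in code, not the layout of the GASP spreadsheet. The `Series` type, the helper functions, and the headcount values are hypothetical.

```python
from datetime import date
from typing import Dict, List, Optional, Tuple

# A variable's history as (observation date, value) pairs. Values are invented.
Series = List[Tuple[date, float]]

headcount: Series = [
    (date(2018, 1, 1), 4),
    (date(2018, 4, 1), 6),
    (date(2018, 7, 1), 9),
]


def latest(series: Series) -> float:
    """Return the value of the most recent observation."""
    return max(series, key=lambda point: point[0])[1]


def relative_change(series: Series) -> Optional[float]:
    """Return (last - first) / first over the whole history,
    or None when the first value is zero."""
    ordered = sorted(series, key=lambda point: point[0])
    first, last = ordered[0][1], ordered[-1][1]
    if first == 0:
        return None
    return (last - first) / first


# Two simple features derived from one historical variable.
features: Dict[str, Optional[float]] = {
    "headcount_latest": latest(headcount),
    "headcount_change": relative_change(headcount),
}
print(features)
```

The same two helpers would apply to any other variable tracked over time, which is the practical benefit of collecting every variable in a consistent historical format.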

Contribute to the GASP framework on GitHub.

GASP — Team, Market, Product, and Traction (Tab 1)

This tab covers the primary data points we think are necessary to understand a company in a quantitative way.

  • Team: We look at the evolution of the team in terms of headcount, contract type, roles, diversity, background, duration, and work location.

*Note that a full company profile would also take into consideration data from key team members’ voluntarily shared social network accounts (LinkedIn & Twitter). See Tabs 7, 8, and 9.

  • Market: We look at the business model and the company’s positioning. We are interested in how the startup quantifies its market opportunity and who the main competitors are.

*Note that a full market profile would also take into consideration data from free and paid publicly available data sources.

  • Product: We look at the value proposition and the adoption of the product/service.
  • Traction: We look at user growth, revenue and expense growth, user churn, customer acquisition, and valuation.
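
Traction variables such as growth and churn lend themselves to simple derived metrics. The sketch below uses common textbook definitions (period-over-period growth, and churn as users lost divided by users at the start of the period). These are assumptions for illustration, since the GASP documentation may define them differently, and the numbers are invented.

```python
from typing import List, Optional


def growth_rates(values: List[float]) -> List[Optional[float]]:
    """Period-over-period growth: (current - previous) / previous.

    Returns None for a period whose previous value is zero.
    """
    rates: List[Optional[float]] = []
    for previous, current in zip(values, values[1:]):
        rates.append(None if previous == 0 else (current - previous) / previous)
    return rates


def churn_rate(users_at_start: int, users_lost: int) -> float:
    """Share of the users present at the start of a period who left during it."""
    if users_at_start <= 0:
        raise ValueError("users_at_start must be positive")
    return users_lost / users_at_start


# Invented monthly active user counts for one startup.
monthly_users = [1000, 1100, 1320]
user_growth = growth_rates(monthly_users)
monthly_churn = churn_rate(users_at_start=1000, users_lost=50)

print(user_growth)
print(monthly_churn)
```

The same `growth_rates` helper can be applied to revenue or expense series, so one function covers several of the traction variables listed above.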

GASP — Income Statement, Balance Sheet, and Cash Flow (Tabs 2, 3, and 4)

Financial data is better suited to startups with a more stable business model, but it is still worth gathering early enough to witness its evolution over time.

We also strongly believe it’s a great instrument for founders to keep control of basic financial considerations and to understand how value is created and consumed at the startup level.

  • Income Statement:
  • Balance Sheet:
  • Cashflow statement:

GASP — Patents and Trademarks (Tabs 5 and 6)

Patents and trademarks are a reasonably good indicator of a startup’s ability to innovate and protect that innovation. In certain industries (e.g. Life Sciences), patent ownership is what drives funding. Because patents are long and complex documents, it is more practical to ask for minimal information and then access the full document for more details. We’ll show in Part III how we automate the crawling and extraction of patent information with machine learning.

  • Patents:
  • Trademarks:

GASP — Founders, Board of Directors, and Key Team Members (Tab 7, 8 and 9)

Being able to assess the people involved in a startup is key to understanding the dynamics and cohesion of a team. Moreover, past experience and expertise acquired outside of the current venture are important to reference. As with patents and trademarks, instead of asking for the detailed CVs of all the people involved in a company, it is easier to ask for a LinkedIn profile and automate the crawling of that profile later on. We identified 3 distinct groups of stakeholders: the Board of Directors, the Founders, and the Management (incl. Key Team Members).

  • Board of Directors:
  • Founders:
  • Management or Key Employees:

Let Us Know Your Thoughts

The GASP is free and always improving. Don’t miss the next iterations: join our Venture Technology Newsletter (we email once a month).

If you’re interested in using our platform to apply machine learning to startup data, please get in touch with us. We’re on Twitter too, and we’d love to hear your story!

Arturo Moreno
PreSeries

Chief Information Officer @Civitatis, MBA @MIT, Formerly: Co-Founder CEO @PreSeries, @BessemerVP, @Kensho @EshipMIT Practice Leader, President @MITFinTech