diff --git a/201807_ETHPrize/ETHPrize_tagged_data_analysis.ipynb b/201807_ETHPrize/ETHPrize_tagged_data_analysis.ipynb index 2fe516d..eaef91e 100644 --- a/201807_ETHPrize/ETHPrize_tagged_data_analysis.ipynb +++ b/201807_ETHPrize/ETHPrize_tagged_data_analysis.ipynb @@ -6485,7 +6485,7 @@ }, { "cell_type": "code", - "execution_count": 15, + "execution_count": 14, "metadata": {}, "outputs": [ { @@ -7079,7 +7079,7 @@ "[593 rows x 3 columns]" ] }, - "execution_count": 15, + "execution_count": 14, "metadata": {}, "output_type": "execute_result" } @@ -7109,7 +7109,7 @@ }, { "cell_type": "code", - "execution_count": 14, + "execution_count": 15, "metadata": {}, "outputs": [], "source": [ @@ -7121,15 +7121,3138 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Data preparation, Feature extraction & engineering on Topics" + "# Choosing an evaluation metric for the model\n", + "\n", + "For us humans it's easy to judge whether an answer matches a topic or project, but machine learning models are not on/off.
Simple models will give us a probability of 20%, 50% or 80% and it's up to us to determine the threshold (or we could use a fancy neural net model that will learn the thresholds).\n", + "\n", + "We could also get fancy and penalize false positives (wrong topic added) more than false negatives (topic not added), but let's keep it simple for now.\n", + "\n", + "We will set our threshold to 50%: if the model outputs a probability below 50% we consider it a negative, and a positive otherwise." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Data preparation, Feature extraction & engineering on Topics\n", + "\n", + "We will use a simple Latent Semantic Analysis technique ([Wikipedia](https://en.wikipedia.org/wiki/Latent_semantic_analysis), [Stanford NLP](https://nlp.stanford.edu/IR-book/html/htmledition/latent-semantic-indexing-1.html)) for feature engineering.\n", + "\n", + "In short, we transform the text into a term-document frequency (and inverse document frequency) matrix, i.e. how often each word appears in a document relative to the whole corpus. We project that into a multidimensional vector space (say 2, 3, or 100 dimensions); this will be the lexical/semantic field of our documents. Then the model is trained to associate those semantic fields with our desired labels." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Note on Pipelines\n", + "\n", + "For simplicity we will use Scikit-Learn pipelines to represent our series of transformations, including the final classifier.
They have the following caveats:\n", + "\n", + " - Inefficient: during \"cross-validation\" (validation on unseen data) they ensure that unseen data never leaks into the training set (when computing a \"mean\" for example). However, this means redoing the intermediate computations for each \"fold\" (a train + validation dataset pair), even though many of them do not leak.\n", + " - Does not support early stopping: early stopping lets us tune the complexity of tree models without manual guesswork on the ideal number of trees, i.e. you keep adding trees until it stops helping.\n", + " - Other advanced use limitations: no caching support, no out-of-fold predictions." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Imports" ] }, { "cell_type": "code", - "execution_count": null, + "execution_count": 16, "metadata": {}, "outputs": [], - "source": [] + "source": [ + "from sklearn.multiclass import OneVsRestClassifier\n", + "from sklearn.linear_model import LogisticRegression\n", + "from sklearn.ensemble import RandomForestClassifier\n", + "from sklearn.pipeline import Pipeline\n", + "from sklearn.feature_extraction.text import TfidfVectorizer\n", + "from sklearn.decomposition import TruncatedSVD\n", + "from sklearn_pandas import DataFrameMapper\n", + "from sklearn.preprocessing import MultiLabelBinarizer\n", + "from sklearn.model_selection import cross_val_score" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Preparing target column for Multilabeling\n", + "First let's see what it looks like" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "2 unit tests, testing, functional tests, contact...\n", + "7 testing, compile, deployment, integration, con...\n", + "10 community, ETHGlobal, visualisation, logging, \n", + "11 readthedocs, websockets, transaction\n", + "13 tokens, tokens, open source
documentation, ERC...\n", + "15 human\n", + "21 IDE, Go\n", + "22 wallets, dapps, market\n", + "23 documentation, UX, transaction, light client, ...\n", + "24 ecosystem, local dummy client, continuous inte...\n", + "25 ecosystem, security, browser\n", + "31 bounties, ERC, volatility\n", + "37 usability, signing, ICO, bugs, education, boun...\n", + "41 UI, wallet, keys, signing, gas, contracts, tra...\n", + "44 visualisation, fuzz testing, unit tests\n", + "48 readthedocs, stack, opcodes, community\n", + "49 modularity, updatability, opcodes, open source...\n", + "52 caching, IDE, vidualisation, stack, modularity\n", + "55 Optimise, data\n", + "65 communication, ecosystem, static analysis, jav...\n", + "69 experimentation, optimise, dapps, UI, C++, RPC...\n", + "71 formal verification, audit\n", + "72 continuous integration, security, deployment, ...\n", + "73 bugs, bounties, multisig contracts, signatures...\n", + "75 wallet, UX\n", + "78 payments\n", + "80 staking, tokens, contracts, assembly, language...\n", + "83 analysis, cryptography, payment\n", + "85 audit, open source, security, \n", + "86 game theory, memory\n", + " ... 
\n", + "1366 assets, scalability\n", + "1368 RPC\n", + "1369 ERC, automatic, errors, documentation, testnet\n", + "1371 RPC, dapps, ERC, tokens, IDE, deployment, keys...\n", + "1375 money, management, ecosystem, education, video...\n", + "1376 community, infrastructure, security, UI, testing\n", + "1385 ecosystem, security\n", + "1389 linting, code coverage, platform agnostic, bou...\n", + "1390 integration\n", + "1393 infrastructure, debugger\n", + "1395 versioning, compile\n", + "1404 payments, exchange, ERC, fork, consensus, gove...\n", + "1409 ICO, identity, money\n", + "1413 ERC, interface, inheritance, errors, open sour...\n", + "1415 gas, security, audit, deployment, community, d...\n", + "1416 javascript, code coverage\n", + "1417 adoption, scalability, transactions, wallets, ...\n", + "1421 linting, dapps, contracts, upgradability\n", + "1422 formal verification, scaling, state channels\n", + "1423 compile, data, community, language, events\n", + "1424 gas, compile, debuggers, testnet\n", + "1427 query, data, protocol, design, RPC, data, laye...\n", + "1431 dapps, integration, usability, libraries, tran...\n", + "1435 UX, documentation, videos, documentation, boun...\n", + "1441 static analysis, UX, modularity, contracts\n", + "1442 grants, bounties, debugger, maintenance, incen...\n", + "1451 wallet, memory, keys, key management, document...\n", + "1453 contracts, copile, C++, javascript, state, deb...\n", + "1454 Natspec, typescript\n", + "1455 contracts, Go, video\n", + "Name: Topics, Length: 591, dtype: object" + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "topics_df['Topics']" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We need to feed that to \"MultiLabelBinarizer\".\n", + "\n", + "Since it expects a list we will use split on commas (with or without whitespace before/after) to transform those into a list." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [], + "source": [ + "topics_mlb = MultiLabelBinarizer()\n" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "2 [unit tests, testing, functional tests, contac...\n", + "7 [testing, compile, deployment, integration, co...\n", + "10 [community, ETHGlobal, visualisation, logging]\n", + "11 [readthedocs, websockets, transaction]\n", + "13 [tokens, tokens, open source documentation, ER...\n", + "15 [human]\n", + "21 [IDE, Go]\n", + "22 [wallets, dapps, market]\n", + "23 [documentation, UX, transaction, light client,...\n", + "24 [ecosystem, local dummy client, continuous int...\n", + "25 [ecosystem, security, browser]\n", + "31 [bounties, ERC, volatility]\n", + "37 [usability, signing, ICO, bugs, education, bou...\n", + "41 [UI, wallet, keys, signing, gas, contracts, tr...\n", + "44 [visualisation, fuzz testing, unit tests]\n", + "48 [readthedocs, stack, opcodes, community]\n", + "49 [modularity, updatability, opcodes, open sourc...\n", + "52 [caching, IDE, vidualisation, stack, modularity]\n", + "55 [Optimise, data]\n", + "65 [communication, ecosystem, static analysis, ja...\n", + "69 [experimentation, optimise, dapps, UI, C++, RP...\n", + "71 [formal verification, audit]\n", + "72 [continuous integration, security, deployment,...\n", + "73 [bugs, bounties, multisig contracts, signature...\n", + "75 [wallet, UX]\n", + "78 [payments]\n", + "80 [staking, tokens, contracts, assembly, languag...\n", + "83 [analysis, cryptography, payment]\n", + "85 [audit, open source, security]\n", + "86 [game theory, memory]\n", + " ... 
\n", + "1366 [assets, scalability]\n", + "1368 [RPC]\n", + "1369 [ERC, automatic, errors, documentation, testnet]\n", + "1371 [RPC, dapps, ERC, tokens, IDE, deployment, key...\n", + "1375 [money, management, ecosystem, education, vide...\n", + "1376 [community, infrastructure, security, UI, test...\n", + "1385 [ecosystem, security]\n", + "1389 [linting, code coverage, platform agnostic, bo...\n", + "1390 [integration]\n", + "1393 [infrastructure, debugger]\n", + "1395 [versioning, compile]\n", + "1404 [payments, exchange, ERC, fork, consensus, gov...\n", + "1409 [ICO, identity, money]\n", + "1413 [ERC, interface, inheritance, errors, open sou...\n", + "1415 [gas, security, audit, deployment, community, ...\n", + "1416 [javascript, code coverage]\n", + "1417 [adoption, scalability, transactions, wallets,...\n", + "1421 [linting, dapps, contracts, upgradability]\n", + "1422 [formal verification, scaling, state channels]\n", + "1423 [compile, data, community, language, events]\n", + "1424 [gas, compile, debuggers, testnet]\n", + "1427 [query, data, protocol, design, RPC, data, lay...\n", + "1431 [dapps, integration, usability, libraries, tra...\n", + "1435 [UX, documentation, videos, documentation, bou...\n", + "1441 [static analysis, UX, modularity, contracts]\n", + "1442 [grants, bounties, debugger, maintenance, ince...\n", + "1451 [wallet, memory, keys, key management, documen...\n", + "1453 [contracts, copile, C++, javascript, state, de...\n", + "1454 [Natspec, typescript]\n", + "1455 [contracts, Go, video]\n", + "Name: Topics, Length: 591, dtype: object" + ] + }, + "execution_count": 19, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# We split on comma and only keep non-empty labels.\n", + "# Line 3 for example has a ending comma followed by trailing space\n", + "topics_df['Topics'].apply(lambda labels: [x.strip() for x in labels.split(',') if x.strip()])" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": { 
+ "scrolled": true + }, + "outputs": [ + { + "data": { + "text/plain": [ + "array(['0x', 'ABI', 'ABI Encoding', 'ABIEncoderV2', 'ABIEncoding', 'AI',\n", + " 'AST', 'BigNumber', 'C', 'C++', 'DAO', 'DHT', 'DSL', 'EIP', 'ERC',\n", + " 'ETHGlobal', 'EuroToken', 'Go', 'Haskell', 'Human', 'ICO', 'IDE',\n", + " 'IDEA', 'IOT', 'Java', 'Jelly', 'LLL', 'MiniMe', 'NFT', 'Natspec',\n", + " 'Optimise', 'Proof of Stake', 'RLP', 'RPC', 'Radspec', 'Ruby',\n", + " 'Rust', 'SNARKs', 'STARKs', 'Schnorr signatures', 'TCRs',\n", + " 'Token Curated Registries', 'UI', 'UX', 'Vitalik', 'abigen',\n", + " 'adoption', 'analysis', 'analytics', 'arbitration', 'architecture',\n", + " 'art', 'artifacts', 'assembly', 'assets', 'attack', 'auction',\n", + " 'audit', 'audits', 'automatic', 'automatically', 'beige paper',\n", + " 'best practices', 'block explorer', 'blockchain explorer',\n", + " 'bloom filters', 'boilerplate', 'bootstrap', 'bounties', 'browser',\n", + " 'bug bounties', 'bugs', 'bunties', 'business logic', 'bytecode',\n", + " 'caching', 'chairty', 'code coverage', 'code review',\n", + " 'code reviews', 'collectibles', 'communication', 'community',\n", + " 'compile', 'compiler', 'complexity', 'consensus',\n", + " 'constinuous integration', 'contacts', 'continuous integration',\n", + " 'contract', 'contractis', 'contracts', 'contracts audit',\n", + " 'contracts bugs', 'contracts tokens transactions', 'coordination',\n", + " 'copile', 'cpp-ethereum', 'crowdfunding', 'cryptography',\n", + " 'culture', 'curation', 'curation markets', 'custom', 'dapps',\n", + " 'data', 'debugger', 'debuggers', 'decentralized Infura',\n", + " 'defensive', 'dependencies', 'deployment', 'design', 'desktop',\n", + " 'deterministic', 'diligence', 'discovery', 'disk', 'documenation',\n", + " 'documentation', 'dogfooding', 'dynamic', 'ecosystem',\n", + " 'ecosystems', 'ecryption', 'edge cases', 'eduation', 'education',\n", + " 'education consensus', 'efficiency', 'emscripten', 'encryption',\n", + " 'enforcement', 
'error handling', 'error logging', 'errors',\n", + " 'event logging', 'events', 'exchange', 'exchanges', 'expensive',\n", + " 'experimentation', 'faucet', 'fizz testing', 'fork',\n", + " 'formal verification', 'frameworks', 'functional tests',\n", + " 'fungibility', 'fuzz testing', 'game', 'game theory', 'games',\n", + " 'gaming', 'gas', 'gas limit', 'givernance', 'governance', 'grants',\n", + " 'haskell', 'holocracy', 'human', 'human-readable', 'identity',\n", + " 'immutability', 'immutable', 'improve', 'incentives', 'inentives',\n", + " 'infrastructure', 'inheritance', 'integration', 'interface',\n", + " 'interfaces', 'interoperability', 'interpeter', 'interpreter',\n", + " 'iterate', 'iteration', 'javascript', 'javascripts', 'javescript',\n", + " 'key management', 'key managementm adoption', 'keys', 'lanaguage',\n", + " 'language', 'languages', 'layer 2 solutions', 'layer 2 tooling',\n", + " 'leadership', 'legislation', 'libp2p', 'libraries', 'light client',\n", + " 'light clients', 'linking', 'linting', 'liquid democracy',\n", + " 'liquid pledging', 'liquidity', 'local client',\n", + " 'local dummy client', 'logging', 'logic', 'low hanging fruit',\n", + " 'maintenance', 'management', 'manual', 'market', 'marketplace',\n", + " 'marketplaces', 'memory', 'messages', 'metadats', 'metrics',\n", + " 'micropayments', 'middleware', 'migration', 'migrations',\n", + " 'mindset', 'minset', 'mobile', 'mocking', 'modularity', 'money',\n", + " 'monitoring', 'multisig', 'multisig contracts', 'natspec',\n", + " 'netowkr', 'network', 'noise', 'not_javascript', 'notifications',\n", + " 'off cahin', 'off chain', 'onboarding', 'opcode', 'opcodes',\n", + " 'open souce', 'open source', 'open source block explorer',\n", + " 'open source documentation', 'optimise', 'optimization', 'orders',\n", + " 'organisation', 'pattern', 'payment', 'payments', 'peer-to-peer',\n", + " 'permissions', 'philosophy', 'phishing', 'pipeline',\n", + " 'platform agnostic', 'prediction', 'prediction 
markets', 'privacy',\n", + " 'production', 'productivity', 'profiling', 'protocol', 'protocols',\n", + " 'provider', 'proxy', 'pythereum', 'python', 'queries', 'query',\n", + " 'race conditions', 'rants', 're-entrancy', 'readability',\n", + " 'readthedocs', 'reliability', 'reporting', 'reputation',\n", + " 'research', 'sanity check', 'sanity checks', 'scalabilit',\n", + " 'scalability', 'scaling', 'security', 'securiy', 'serialization',\n", + " 'sharding', 'side chains', 'side channels', 'signature',\n", + " 'signatures', 'signing', 'simplicity', 'snapshot', 'solc',\n", + " 'sourcemap', 'spam', 'stability', 'stablecoin', 'stablecoins',\n", + " 'stack', 'stack explorer', 'stack limit of 16', 'stack trace',\n", + " 'stadards', 'stake', 'stake signing', 'staking', 'standard',\n", + " 'standards', 'standrads', 'state', 'state channel',\n", + " 'state channels', 'state management', 'state watching',\n", + " 'static analysis', 'storage', 'syncing', 'talent', 'tesnet',\n", + " 'test environment', 'test-driven development', 'testeth',\n", + " 'testing', 'testnet', 'token', 'tokens', 'tools', 'tracing',\n", + " 'transaction', 'transactions', 'tribalism', 'trnsactions', 'trust',\n", + " 'typescript', 'unit testing', 'unit tests', 'unti tests',\n", + " 'updatability', 'upgradability', 'upgradaility', 'usability',\n", + " 'user', 'utils', 'value', 'verification', 'verisioning',\n", + " 'versioning', 'video', 'videos', 'vidualisation', 'visualisation',\n", + " 'volatility', 'wallet', 'wallets', 'websockets', 'yellow paper'],\n", + " dtype=object)" + ] + }, + "execution_count": 20, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Now assign the transformed labels\n", + "y_topics = topics_mlb.fit_transform(\n", + " topics_df['Topics'].apply(lambda labels: [x.strip() for x in labels.split(',') if x.strip()])\n", + ")\n", + "topics_mlb.classes_" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Seems like we have some tagging
problems: \"debugger\", \"debuggers\", \"securiy\", \"unti tests\", \"ecosystem\", \"ecosystems\", \"open souce\"...\n", + "\n", + "We will continue like this for now.\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "(591, 361)\n" + ] + } + ], + "source": [ + "print(y_topics.shape)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Wow, we have 361 possible topics!" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Feature pipeline" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [], + "source": [ + "mapper = DataFrameMapper([\n", + " ('Questions', [TfidfVectorizer(max_features=2**16,\n", + " min_df=1, stop_words='english',\n", + " use_idf=True), # Create term-document frequencies and inverse frequencies\n", + " TruncatedSVD(20)]), # Project on 20-dimension space (the text is very short)\n", + " ('Answer', [TfidfVectorizer(max_features=2**16,\n", + " min_df=1, stop_words='english',\n", + " use_idf=True), # Create term-document frequencies and inverse frequencies\n", + " TruncatedSVD(100)]) # Project on 100-dimension space\n", + "])\n", + "pipeline = Pipeline([\n", + " ('mapper_step', mapper),\n", + " # Most models don't support MultiLabel classification\n", + " # We will train one simple LogisticRegression classifier per target label with the OneVsRestClassifier wrapper\n", + " # LogisticRegression is simple but fast, which is needed since we have 361 topics at the moment\n", + " ('OvR_logreg', OneVsRestClassifier(\n", + " LogisticRegression(random_state = 1337, n_jobs = 1),\n", + " n_jobs = -1 # Launch as many classifiers as we have cores\n", + " ))\n", + "])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Measuring model performance" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": {}, +
"outputs": [], + "source": [ + "def crossval(pipe, X_train, y_train, n_folds):\n", + " # We don't test in parallel as the classifier is already parallel\n", + " cv = cross_val_score(pipe, X_train, y_train, cv=n_folds, n_jobs=1)\n", + " print(\"Cross Validation Scores are: \", cv.round(4))\n", + " print(\"Mean CrossVal score is: \", round(cv.mean(),4))\n", + " print(\"Std Dev CrossVal score is: \", round(cv.std(),4))" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
idQuestionsAnswer
2How do you handle testing?Just Truffle for tests\\nMocha for unit and fun...
7What are the tools/libraries/frameworks you use?Truffle for building, testing and compiling\\nC...
10What tools don’t exist at the moment?The community is doing a good job and a lot of...
11What was the hardest part about learning to de...Having a sequential getting started stuff on e...
13Who are you and what are you working on?Full stack web dev, working in finance and som...
15How do you handle smart contract verification ...Write a lot of tests myself. Get other people ...
21What are the tools/libraries/frameworks you use?Truffle - not his favourite, but best thing ou...
22What are you most excited about in the short t...What dev tools on near horizon that would chan...
23What are your biggest frustrations?Dapps: web3js stuff sucks. In the doc, it’s in...
24What tools don’t exist at the moment?Things I want improved with truffle: it has a ...
\n", + "
" + ], + "text/plain": [ + "id Questions \\\n", + "2 How do you handle testing? \n", + "7 What are the tools/libraries/frameworks you use? \n", + "10 What tools don’t exist at the moment? \n", + "11 What was the hardest part about learning to de... \n", + "13 Who are you and what are you working on? \n", + "15 How do you handle smart contract verification ... \n", + "21 What are the tools/libraries/frameworks you use? \n", + "22 What are you most excited about in the short t... \n", + "23 What are your biggest frustrations? \n", + "24 What tools don’t exist at the moment? \n", + "\n", + "id Answer \n", + "2 Just Truffle for tests\\nMocha for unit and fun... \n", + "7 Truffle for building, testing and compiling\\nC... \n", + "10 The community is doing a good job and a lot of... \n", + "11 Having a sequential getting started stuff on e... \n", + "13 Full stack web dev, working in finance and som... \n", + "15 Write a lot of tests myself. Get other people ... \n", + "21 Truffle - not his favourite, but best thing ou... \n", + "22 What dev tools on near horizon that would chan... \n", + "23 Dapps: web3js stuff sucks. In the doc, it’s in... \n", + "24 Things I want improved with truffle: it has a ... 
" + ] + }, + "execution_count": 24, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "X_topics = topics_df.drop('Topics', axis = 1)\n", + "X_topics.head(10)" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 4 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 6 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 8 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 25 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 11 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 30 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 49 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 19 is present in all training examples.\n", + " str(classes[c]))\n", + 
"/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 37 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 38 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 90 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 122 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 125 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 100 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 117 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 118 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 160 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 189 is present in all training examples.\n", + " str(classes[c]))\n", + 
"/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 191 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 148 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 216 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 234 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 236 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 253 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 210 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 241 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 244 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 229 is present in all training examples.\n", + " str(classes[c]))\n", + 
"/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 231 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 265 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 313 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 304 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 308 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 309 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 292 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 344 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 335 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 354 is present in all training examples.\n", + " str(classes[c]))\n", + 
"/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 356 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 342 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 2 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 45 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 61 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 16 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 39 is present in all training examples.\n", + " str(classes[c]))\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 121 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 124 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 79 is present in all training examples.\n", + 
" str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 95 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 101 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 140 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 186 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 167 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 202 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 217 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 233 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 219 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 247 is present in all training examples.\n", + " str(classes[c]))\n", + 
"/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 270 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 272 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 275 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 307 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 318 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 347 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 350 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 336 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 22 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 31 is present in all training examples.\n", + " str(classes[c]))\n", + 
"/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 34 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 35 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 60 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 65 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 67 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 93 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 94 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 73 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 74 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 109 is present in all training examples.\n", + " str(classes[c]))\n", + 
"/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 110 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 87 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 127 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 104 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 134 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 135 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 184 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 161 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 218 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 147 is present in all training examples.\n", + " str(classes[c]))\n", + 
"/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 222 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 223 is present in all training examples.\n", + " str(classes[c]))\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 199 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 200 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 230 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 288 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 273 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 301 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 302 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 256 is present in all training 
examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 331 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 281 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 346 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 287 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 0 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 7 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 58 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 18 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 64 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 26 is present in all training examples.\n", + " str(classes[c]))\n", + 
"/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 48 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 72 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 96 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 98 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 99 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 119 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 102 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 144 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 149 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 151 is present in all training examples.\n", + " str(classes[c]))\n", + 
"/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 129 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 131 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 154 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 133 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 115 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 116 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 176 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 179 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 182 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 192 is present in all training examples.\n", + " str(classes[c]))\n", + 
"/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 235 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 215 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 258 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 261 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 224 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 295 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 296 is present in all training examples.\n", + " str(classes[c]))\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 278 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 282 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 306 is present in all training 
examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 324 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 326 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 337 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 12 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 23 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 76 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 108 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 80 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 97 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 85 is present in all training examples.\n", + " str(classes[c]))\n", + 
"/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 103 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 169 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 157 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 165 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 181 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 208 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 212 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 257 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 214 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 246 is present in all training examples.\n", + " str(classes[c]))\n", + 
"/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 268 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 298 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 283 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 271 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 315 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 291 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 349 is present in all training examples.\n", + " str(classes[c]))\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Cross Validation Scores are: [0. 0. 0.0085 0. 0. 
]\n", + "Mean CrossVal score is: 0.0017\n", + "Std Dev CrossVal score is: 0.0034\n" + ] + } + ], + "source": [ + "crossval(pipeline, X_topics, y_topics, 5) # We test our model with 5 different splits of train/validation set" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Ugh something seems wrong\n", + "\n", + "Let's try to run the model on the input data" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Pipeline(memory=None,\n", + " steps=[('mapper_step', DataFrameMapper(default=False, df_out=False,\n", + " features=[('Questions', [TfidfVectorizer(analyzer='word', binary=False, decode_error='strict',\n", + " dtype=, encoding='utf-8', input='content',\n", + " lowercase=True, max_df=1.0, max_features=65536, m...1337, solver='liblinear', tol=0.0001,\n", + " verbose=0, warm_start=False),\n", + " n_jobs=-1))])" + ] + }, + "execution_count": 26, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "pipeline.fit(X_topics, y_topics)" + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[(),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('audit',),\n", + " ('contracts',),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('audit',),\n", + " (),\n", + " (),\n", + " (),\n", + " ('audit',),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('audit',),\n", + " (),\n", + " 
(),\n", + " (),\n", + " (),\n", + " (),\n", + " ('contracts',),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('javascript', 'unit tests'),\n", + " (),\n", + " ('contracts',),\n", + " (),\n", + " (),\n", + " ('contracts',),\n", + " ('audit',),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('contracts',),\n", + " ('contracts',),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('audit',),\n", + " (),\n", + " (),\n", + " ('contracts',),\n", + " (),\n", + " ('audit',),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('contracts',),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('contracts',),\n", + " (),\n", + " (),\n", + " ('audit',),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('audit',),\n", + " (),\n", + " (),\n", + " (),\n", + " ('contracts',),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('audit',),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('contracts',),\n", + " (),\n", + " (),\n", + " (),\n", + " ('contracts',),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('audit',),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('contracts',),\n", + " ('audit',),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " 
(),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('audit',),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('gas',),\n", + " (),\n", + " ('gas',),\n", + " (),\n", + " (),\n", + " ('audit',),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('gas',),\n", + " (),\n", + " (),\n", + " ('contracts',),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('javascript',),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('contracts',),\n", + " ('audit',),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('contracts',),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('contracts',),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('contracts',),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('contracts',),\n", + " (),\n", + " (),\n", + " ('audit',),\n", + " ('javascript',),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('contracts',),\n", + " ('audit',),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('audit',),\n", + " (),\n", + " (),\n", + " ('contracts',),\n", + " (),\n", + " (),\n", + " (),\n", + " ('audit',),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('contracts',),\n", + " ('audit', 'contracts'),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('contracts',),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", 
+ " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('contracts',),\n", + " (),\n", + " ('javascript', 'unit tests'),\n", + " (),\n", + " ('debugger',),\n", + " (),\n", + " (),\n", + " (),\n", + " ('audit',),\n", + " (),\n", + " (),\n", + " (),\n", + " ('gas',),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('contracts',),\n", + " (),\n", + " ('audit',),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('audit',),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('contracts',),\n", + " ('audit',),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('audit',),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('contracts',),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('contracts',),\n", + " (),\n", + " ('contracts',),\n", + " (),\n", + " ('audit',),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('contracts',),\n", + " (),\n", + " ('audit',),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('contracts',),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('debugger',),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('contracts',),\n", + " (),\n", + " (),\n", + " (),\n", + " ('audit',),\n", + " (),\n", + " (),\n", + " ('documentation',),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " 
(),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('contracts',),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('contracts',),\n", + " (),\n", + " (),\n", + " (),\n", + " ('contracts',),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('contracts',),\n", + " (),\n", + " (),\n", + " ('bounties',),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('audit', 'security'),\n", + " (),\n", + " (),\n", + " (),\n", + " ('gas',),\n", + " ('debugger',),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('audit',),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('contracts',)]" + ] + }, + "execution_count": 29, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "topics_mlb.inverse_transform(pipeline.predict(X_topics))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Yep, that's very wrong: even on the data it was trained on, the model predicts no topic for most answers. We probably need more data and a more complex model." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Projects\n" + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "2 Truffle, Mocha, Mthril\n", + "7 Open Zeppelin, Truffle, Ganache, Ethereum, not...\n", + "8 Proof of Stake, Casper, eWASM\n", + "10 ETHGlobal, Solcoverage, parity\n", + "11 Solidity, Ethereum, Geth, GitHub, reddit\n", + "13 Ethereum, Modular.network, AWS, Blossom, Statu...\n", + "21 Truffle, VIM, Etherscan, built_our_own\n", + "22 Status, web3.js, SNARKs, MetaMask, Cipher\n", + "23 not_web3.js, ethers.js, MetaMask, Geth, Vipnod...\n", + "24 Truffle\n", + "25 Truffle, MetaMask, Mist\n", + "27 ThousandEther\n", + "31 Gitcoin, MakerDAO\n", + "37 MakerDAO, RChain, MetaMask, Ethereum, Whymarrh...\n", + "41 MetaMask, Qhymarrh, EthereumJS, eth.js\n", + "44 Mythril, Mocha\n", + "48 Gitter, GitHub, Slack, Stack overflow, Ethereum\n", + "49 Truffle, EVM, Parity, Geth, MetaMask, Not_Myth...\n", + "50 Shyft, Bunz\n", + "52 Mythril, EVM\n", + "55 Aion, Shyft, EVM, POA, Parity, Geth\n", + "65 Ethereum, eWASM, EVM, Gitter, Solidity, GitHub...\n", + "69 Solidity, EVM, Trezor, eWASM, Testrpc, Ganache\n", + "71 Trail of Bits\n", + "73 Ethereum\n", + "76 Ethereum, GitHub\n", + "78 Lightning, Ethereum\n", + "80 Casper, LLL\n", + "81 Ethereum\n", + "82 Trail of Bits\n", + " ... \n", + "1371 EthVigil, Slack\n", + "1375 MEW, Ethereum\n", + "1376 Slack, MEW, Ethereum, Swarm City, Coinbase, Ha...\n", + "1385 Kosla, MEW, Ethereum\n", + "1389 Consensys, 0x, Truffle, Solidity, web3.js. 
Myt...\n", + "1390 Truffle, Ganache\n", + "1392 Gitter, Truffle, Stack overflow\n", + "1393 Truffle\n", + "1395 Ethereum, web3.js, Solidity, Aragon\n", + "1404 Ethereum, GitHub, Swift, Apple, DARPA\n", + "1409 Ethereum, Coinbase\n", + "1413 Ethereum, Solidity\n", + "1415 The Graph, Open Zeppelin\n", + "1416 Truffle, built_our_own\n", + "1417 Status MetaMask, L4, Next\n", + "1418 Aragon, IPFS\n", + "1420 CryptoZombies, Zasterin, Truffle, Medium, Unen...\n", + "1421 Truffle, Remix, Etherscan, Parity, Geth, IDE, ...\n", + "1422 Plasma, Vyper, eWASM\n", + "1423 Solidity, EVM, Ethereum, eWASM\n", + "1424 Remix, Atom, Embark\n", + "1427 The Graph, Ethereum\n", + "1431 Remix, Truffle, MetaMask, Swarm, web3.js, Soli...\n", + "1435 Remix, Solium, Oyente, GitHub\n", + "1441 Ethereum, Remix, Embark, Visual Studio Code\n", + "1442 Ethereum\n", + "1451 Livepeer, Casper, Geth, Pupeth, Docker, Ethere...\n", + "1453 Truffle, Solidity, Dapphub, Ethereum, Geth, pa...\n", + "1454 Aragon, Open Zeppelin, 0x\n", + "1455 Geth, EVM, Parity, POA, Truffle, Testrpc\n", + "Name: Projects, Length: 593, dtype: object" + ] + }, + "execution_count": 30, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "projects_df['Projects']" + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "metadata": {}, + "outputs": [], + "source": [ + "projects_mlb = MultiLabelBinarizer()" + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array(['0x', '1Protocol', '4byte.io', 'ARES', 'AWS', 'AdChain', 'Adtoken',\n", + " 'Aion', 'Airbit', 'Alethio', 'Ambisafe', 'Ansible', 'AppDynamics',\n", + " 'Apple', 'Aragon', 'Argus', 'ArtDAO', 'Atom', 'Augur', 'Ava',\n", + " 'AwesomeList', 'Bamboo', 'Bancor', 'Berkeley', 'Biddler',\n", + " 'BitFury', 'Bitcoin', 'Blockgeeks', 'Blockseer', 'Blossom',\n", + " 'Bounties.network', 'Braid', 'Brave', 'Bunz', 'Capture The Ether',\n", + " 'Capture the Ether', 'Cardano', 
'Casper', 'Chai', 'Chainshot',\n", + " 'Chrome', 'Chronologic', 'Cipher', 'Circle', 'Circles', 'Coinbase',\n", + " 'Colony', 'Consensys', 'Cosmos', 'Counterfactual', 'Coursera',\n", + " 'CryptoKitties', 'CryptoNYC', 'CryptoZombies', 'CryptoZombiew',\n", + " 'Cryptomechanics.info', 'Cure52', 'DARPA', 'DAT Protocol',\n", + " 'Dagger', 'Dapphub', 'Dapple', 'Dappnode', 'Decentraland',\n", + " 'Deja Vu', 'Dfinity', 'Dharma', 'Digital Ocean', 'District0x',\n", + " 'Django', 'Docker', 'Drizzle', 'EMACS', 'ENS', 'EOS', 'ETHFiddle',\n", + " 'ETHGasReporter', 'ETHGasStation', 'ETHGlobal', 'EVM', 'EVMLab',\n", + " 'Eclipse', 'Embark', 'Enchidna', 'Endorse.io',\n", + " 'Energy Web Foundation', 'EthFiddle', 'EthGasReporter',\n", + " 'EthGasStation', 'EthMix', 'EthPM', 'EthVigil', 'Ether cards',\n", + " 'EtherCamp', 'EtherDelta', 'Ethercamp', 'Etherdid', 'Ethereim',\n", + " 'Ethereu', 'Ethereum', 'Ethereum Alarm Clock', 'Ethereum Wallet',\n", + " 'EthereumJS', 'EthereumJS-blockstream', 'Ethermint', 'Ethernaturs',\n", + " 'Ethernauts', 'Etherscan', 'Ethmoji', 'FOAM', 'Facebook',\n", + " 'Feathers', 'Figma', 'Filecoin', 'Firebase', 'Ganache', 'Geo',\n", + " 'Geth', 'Geth. 
Solidity', 'GitHub', 'GitPivot', 'Gitcoin',\n", + " 'Github', 'Gitter', 'Giveth', 'Gnarly', 'Gnosis', 'Golem',\n", + " 'Goodle', 'Google', 'GovernX', 'Hackernoon', 'Hackerrank',\n", + " 'HashHeroes', 'HelloGold', 'Heroku', 'Hive', 'Horizon Blockchain',\n", + " 'Hyperledger', 'IC3', 'IDE', 'IPFS', 'IPLD', 'IPNS', 'IULIA',\n", + " 'IVY', 'Infura', 'Integral', 'IntelliJ', 'JetBrains',\n", + " 'Jupiter notebook', 'Kauri', 'Keep Network', 'Keythereum', 'Keyva',\n", + " 'Kleros', 'Kosla', 'Kyokan', 'L4', 'LLL', 'Ledger', 'Leeroy',\n", + " 'Lightning', 'Linnia', 'Lisk', 'Livepeer', 'Loom Netowkr',\n", + " 'Loom Network', 'Lukso', 'Lunyr', 'MEW', 'MakerDAO',\n", + " 'MakerDAO Plasma', 'Manticore', 'Mascara', 'Mattis', 'Medium',\n", + " 'MelonPort', 'MetaMask', 'Micha', 'Mist', 'Mix', 'Mocha',\n", + " 'Modular.network', 'Monax', 'Monero', 'Mozilla', 'Mthril',\n", + " 'MuleSoft', 'MyCrypto', 'MyEtherWallet', 'Mycrypto', 'Mythril',\n", + " 'NPM', 'Neo', 'Neufund', 'Next', 'NexusMutual', 'Node',\n", + " 'Not_Ganache', 'Not_MetaMask', 'Not_Mythril', 'Not_Solidity',\n", + " 'Not_Truffle', 'NuCypher', 'Numerai', 'OST', 'Odin protocol',\n", + " 'OmiseGo', 'Omkara', 'Open Zeppelin', 'Open Zeppeling',\n", + " 'OpenBazaar', 'Opensea', 'Oraclize', 'OrbitDB', 'Oyente',\n", + " 'Oyented', 'POA', 'Pantera Capital', 'Parity', 'Plasma',\n", + " 'Polychain', 'Polymath', 'Populus', 'Porosity', 'Portis',\n", + " 'Proof of Stake', 'Proof of Steak', 'Protocol Labs', 'Puddle',\n", + " 'Pupeth', 'Pythereum', 'Qhymarrh', 'Quantstamp', 'QuickBlocks',\n", + " 'Quickblocks', 'Quiknode', 'RChain', 'Raiden', 'React',\n", + " 'React Native', 'Rect', 'Reddit', 'Redis', 'Redux', 'Redux-saga',\n", + " 'Remix', 'Ripple', 'Rufflet', 'SNARKs', 'STARKs', 'Salesforce',\n", + " 'Samsara Protocol', 'Secure Scuttlebutt', 'Securify', 'Selenium',\n", + " 'Set Protocol', 'ShapeShift', 'Sherpal', 'Shyft', 'Skype', 'Slack',\n", + " 'SlockIt', 'SolCoverage', 'Solcover', 'Solcoverage', 'Solidify',\n", + " 'Solidity', 
'Solidity flattener', 'Solidity.berlin', 'Solint',\n", + " 'Solium', 'SourceCred', 'SourceCred Ethereu', 'Spankchain',\n", + " 'Sportcrypt', 'Squarespace', 'Stack overflow', 'Stackexchange',\n", + " 'Stanford', 'Status', 'Status MetaMask', 'Steemit', 'Sublime',\n", + " 'SuperMax', 'Surya', 'Swarm', 'Swarm City', 'Sweetbridge', 'Swift',\n", + " 'Tedermint', 'Telehash', 'Tendermint', 'Tensorflow', 'Terraform',\n", + " 'Testrpc', 'The Graph', 'ThousandEther', 'Toshi', 'Trail of Bits',\n", + " 'Travis', 'Trezor', 'Trinity', 'TrueBit', 'Truebit', 'Truffle',\n", + " 'Truffle.', 'Trufle', 'Trustory', 'Twitter', 'Typedoc', 'Ubuntu',\n", + " 'Udemy', 'Ujo', 'Unenumerated', 'VIM', 'Vinos', 'Vipnode',\n", + " 'VirtuePoker', 'Visual Studio Code', 'Voltaire House',\n", + " 'Voltaire Labs', 'VulcanizeDB', 'Vyper', 'WeTrust',\n", + " 'Web3.js\\n0x.js\\nTruffle\\nTestrpc\\nInfura\\nRemix Solidity\\nEtherscan',\n", + " 'WebRTC', 'WeekInEthereum', 'Whisper', 'Whymarrh', 'WithPragma',\n", + " 'Wix', 'Wyvern', 'XLNT', 'Yarn', 'YouTube', 'Youtube', 'Zasterin',\n", + " 'Zcash', 'Zeppelin', 'Zklabs', 'ZoKrates', 'Zokrates', 'blk.io',\n", + " 'built_our_ow', 'built_our_own', 'com', 'eWASM', 'eth.js',\n", + " 'ether.js', 'ethernauts', 'ethers.build', 'ethers.build. Parity',\n", + " 'ethers.cli', 'ethers.js', 'ganache', 'infura', 'jQuery', 'ledger',\n", + " 'manticore', 'metaMask', 'not_web3.js', 'not_webs.js', 'oyente',\n", + " 'parity', 'plasma', 'populus', 'proof of Stake', 'reddit', 'remix',\n", + " 'solc', 'solhint', 'steak.network', 'truebit', 'truffle', 'uPort',\n", + " 'web.js', 'web3.js', 'web3.js. Mythril', 'web3.js. 
eth.js',\n", + " 'web3.py', 'webs.js'], dtype=object)" + ] + }, + "execution_count": 33, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "y_projects = projects_mlb.fit_transform(\n", + " projects_df['Projects'].apply(lambda labels: [x.strip() for x in labels.split(',') if x.strip()])\n", + ")\n", + "projects_mlb.classes_" + ] + }, + { + "cell_type": "code", + "execution_count": 35, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "(593, 383)\n" + ] + } + ], + "source": [ + "print(y_projects.shape)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "383 project labels as well :/ Many of them are spelling or casing variants of the same project (e.g. 'Truffle', 'Truffle.', 'Trufle', 'truffle'), so the number of distinct projects is lower." + ] + }, + { + "cell_type": "code", + "execution_count": 38, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
<div>\n", + "<table border=\"1\" class=\"dataframe\">\n", + " <thead>\n", + " <tr>\n", + " <th>id</th>\n", + " <th>Questions</th>\n", + " <th>Answer</th>\n", + " </tr>\n", + " </thead>\n", + " <tbody>\n",
+ " <tr>\n", + " <th>2</th>\n", + " <td>How do you handle testing?</td>\n", + " <td>Just Truffle for tests\\nMocha for unit and fun...</td>\n", + " </tr>\n",
+ " <tr>\n", + " <th>7</th>\n", + " <td>What are the tools/libraries/frameworks you use?</td>\n", + " <td>Truffle for building, testing and compiling\\nC...</td>\n", + " </tr>\n",
+ " <tr>\n", + " <th>8</th>\n", + " <td>What are you most excited about in the short t...</td>\n", + " <td>Proof of Stake overlays will be really interes...</td>\n", + " </tr>\n",
+ " <tr>\n", + " <th>10</th>\n", + " <td>What tools don’t exist at the moment?</td>\n", + " <td>The community is doing a good job and a lot of...</td>\n", + " </tr>\n",
+ " <tr>\n", + " <th>11</th>\n", + " <td>What was the hardest part about learning to de...</td>\n", + " <td>Having a sequential getting started stuff on e...</td>\n", + " </tr>\n",
+ " <tr>\n", + " <th>13</th>\n", + " <td>Who are you and what are you working on?</td>\n", + " <td>Full stack web dev, working in finance and som...</td>\n", + " </tr>\n",
+ " <tr>\n", + " <th>21</th>\n", + " <td>What are the tools/libraries/frameworks you use?</td>\n", + " <td>Truffle - not his favourite, but best thing ou...</td>\n", + " </tr>\n",
+ " <tr>\n", + " <th>22</th>\n", + " <td>What are you most excited about in the short t...</td>\n", + " <td>What dev tools on near horizon that would chan...</td>\n", + " </tr>\n",
+ " <tr>\n", + " <th>23</th>\n", + " <td>What are your biggest frustrations?</td>\n", + " <td>Dapps: web3js stuff sucks. In the doc, it’s in...</td>\n", + " </tr>\n",
+ " <tr>\n", + " <th>24</th>\n", + " <td>What tools don’t exist at the moment?</td>\n", + " <td>Things I want improved with truffle: it has a ...</td>\n", + " </tr>\n",
+ " </tbody>\n", + "</table>\n", + "</div>
" + ], + "text/plain": [ + "id Questions \\\n", + "2 How do you handle testing? \n", + "7 What are the tools/libraries/frameworks you use? \n", + "8 What are you most excited about in the short t... \n", + "10 What tools don’t exist at the moment? \n", + "11 What was the hardest part about learning to de... \n", + "13 Who are you and what are you working on? \n", + "21 What are the tools/libraries/frameworks you use? \n", + "22 What are you most excited about in the short t... \n", + "23 What are your biggest frustrations? \n", + "24 What tools don’t exist at the moment? \n", + "\n", + "id Answer \n", + "2 Just Truffle for tests\\nMocha for unit and fun... \n", + "7 Truffle for building, testing and compiling\\nC... \n", + "8 Proof of Stake overlays will be really interes... \n", + "10 The community is doing a good job and a lot of... \n", + "11 Having a sequential getting started stuff on e... \n", + "13 Full stack web dev, working in finance and som... \n", + "21 Truffle - not his favourite, but best thing ou... \n", + "22 What dev tools on near horizon that would chan... \n", + "23 Dapps: web3js stuff sucks. In the doc, it’s in... \n", + "24 Things I want improved with truffle: it has a ... 
" + ] + }, + "execution_count": 38, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "X_projects = projects_df.drop('Projects', axis = 1)\n", + "X_projects.head(10)" + ] + }, + { + "cell_type": "code", + "execution_count": 39, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 1 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 7 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 32 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 33 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 44 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 22 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 39 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 28 is present in all training examples.\n", + " str(classes[c]))\n", + 
"/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 29 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 42 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 31 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 56 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 55 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 58 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 65 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 116 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 142 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 144 is present in all training examples.\n", + " str(classes[c]))\n", + 
"/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 154 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 155 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 138 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 139 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 160 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 169 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 181 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 183 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 187 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 201 is present in all training examples.\n", + " str(classes[c]))\n", + 
"/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 212 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 213 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 206 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 236 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 230 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 248 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 233 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 222 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 242 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 251 is present in all training examples.\n", + " str(classes[c]))\n", + 
"/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 260 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 254 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 246 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 264 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 259 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 298 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 289 is present in all training examples.\n", + " str(classes[c]))\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 280 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 302 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 292 is present in all training 
examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 325 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 327 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 318 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 330 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 356 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 341 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 342 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 369 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 371 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 363 is present in all training examples.\n", + " str(classes[c]))\n", 
+ "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 367 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 34 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 36 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 24 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 48 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 41 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 59 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 61 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 93 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 84 is present in all training examples.\n", + " str(classes[c]))\n", + 
"/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 87 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 101 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 104 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 105 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 108 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 128 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 118 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 112 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 143 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 134 is present in all training examples.\n", + " str(classes[c]))\n", + 
"/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 149 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 168 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 193 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 194 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 195 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 197 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 200 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 173 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 225 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 209 is present in all training examples.\n", + " str(classes[c]))\n", + 
"/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 273 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 274 is present in all training examples.\n", + " str(classes[c]))\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 276 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 277 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 285 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 286 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 257 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 283 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 299 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 311 is present in all training 
examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 294 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 332 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 321 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 322 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 317 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 344 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 339 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 347 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 328 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 358 is present in all training examples.\n", + " str(classes[c]))\n", 
+ "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 360 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 361 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 6 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 8 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 11 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 54 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 19 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 64 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 35 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 68 is present in all training examples.\n", + " str(classes[c]))\n", + 
"/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 62 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 72 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 94 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 96 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 75 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 76 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 86 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 89 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 114 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 153 is present in all training examples.\n", + " str(classes[c]))\n", + 
"/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 157 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 217 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 219 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 190 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 191 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 161 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 163 is present in all training examples.\n", + " str(classes[c]))\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 166 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 228 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 172 is present in all training 
examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 202 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 174 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 175 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 232 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 145 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 235 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 147 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 237 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 241 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 211 is present in all training examples.\n", + " str(classes[c]))\n", 
+ "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 215 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 312 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 252 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 253 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 258 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 326 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 267 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 270 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 331 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 364 is present in all training examples.\n", + " str(classes[c]))\n", + 
"/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 372 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 380 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 374 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 382 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 2 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 3 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 5 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 52 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 12 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 25 is present in all training examples.\n", + " str(classes[c]))\n", + 
"/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 15 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 27 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 30 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 43 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 21 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 92 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 69 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 109 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 77 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 88 is present in all training examples.\n", + " str(classes[c]))\n", + 
"/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 150 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 125 is present in all training examples.\n", + " str(classes[c]))\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 130 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 186 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 188 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 137 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 164 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 226 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 205 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 207 is present in all training 
examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 179 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 334 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 337 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 255 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 256 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 345 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 295 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 269 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 373 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 45 is present in all training examples.\n", + " str(classes[c]))\n", + 
"/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 50 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 16 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 57 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 81 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 85 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 97 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 98 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 100 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 91 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 106 is present in all training examples.\n", + " str(classes[c]))\n", + 
"/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 140 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 131 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 110 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 132 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 120 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 122 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 136 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 127 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 152 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 177 is present in all training examples.\n", + " str(classes[c]))\n", + 
"/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 156 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 184 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 196 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 198 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 199 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 224 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 204 is present in all training examples.\n", + " str(classes[c]))\n" + ] + }, + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 229 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 231 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 244 is present in all training 
examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 223 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 249 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 263 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 291 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 293 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 282 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 308 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 309 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 335 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 313 is present in all training examples.\n", + " str(classes[c]))\n", 
+ "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 315 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 338 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 304 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 343 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 307 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 368 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 370 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 359 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 381 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 366 is present in all training examples.\n", + " str(classes[c]))\n", + 
"/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 351 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 352 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 353 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 354 is present in all training examples.\n", + " str(classes[c]))\n", + "/Users/tesuji/miniconda3/envs/datascience-py3/lib/python3.6/site-packages/sklearn/multiclass.py:76: UserWarning: Label not 379 is present in all training examples.\n", + " str(classes[c]))\n" + ] + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Cross Validation Scores are: [0. 0.0336 0.0084 0. 
0.0254]\n", + "Mean CrossVal score is: 0.0135\n", + "Std Dev CrossVal score is: 0.0137\n" + ] + } + ], + "source": [ + "crossval(pipeline, X_projects, y_projects, 5) # We can reuse the previous pipeline" + ] + }, + { + "cell_type": "code", + "execution_count": 40, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Pipeline(memory=None,\n", + " steps=[('mapper_step', DataFrameMapper(default=False, df_out=False,\n", + " features=[('Questions', [TfidfVectorizer(analyzer='word', binary=False, decode_error='strict',\n", + " dtype=, encoding='utf-8', input='content',\n", + " lowercase=True, max_df=1.0, max_features=65536, m...1337, solver='liblinear', tol=0.0001,\n", + " verbose=0, warm_start=False),\n", + " n_jobs=-1))])" + ] + }, + "execution_count": 40, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "pipeline.fit(X_projects, y_projects)" + ] + }, + { + "cell_type": "code", + "execution_count": 41, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[(),\n", + " ('Truffle',),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('Remix', 'Truffle'),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('Solidity',),\n", + " ('Ethereum',),\n", + " (),\n", + " (),\n", + " ('Ethereum',),\n", + " (),\n", + " (),\n", + " ('Ethereum',),\n", + " (),\n", + " ('Ethereum',),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('Truffle',),\n", + " (),\n", + " ('Parity', 'Truffle'),\n", + " (),\n", + " ('Solidity',),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('Ethereum',),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('Ethereum',),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('Truffle',),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('Ethereum',),\n", + " ('Ethereum',),\n", + " 
('Ethereum',),\n", + " ('Ethereum',),\n", + " ('Truffle',),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('Solidity',),\n", + " (),\n", + " (),\n", + " ('Solidity', 'Truffle'),\n", + " ('Ethereum',),\n", + " ('Consensys',),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('Ethereum',),\n", + " ('Truffle',),\n", + " (),\n", + " ('Remix', 'Truffle'),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('Ethereum',),\n", + " (),\n", + " (),\n", + " (),\n", + " ('Truffle',),\n", + " ('Solidity',),\n", + " (),\n", + " ('Ethereum',),\n", + " (),\n", + " (),\n", + " ('Remix', 'Solidity', 'Truffle'),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('Ethereum',),\n", + " (),\n", + " (),\n", + " (),\n", + " ('Ethereum',),\n", + " (),\n", + " ('Ethereum',),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('Truffle',),\n", + " ('MetaMask', 'Truffle'),\n", + " (),\n", + " (),\n", + " ('Solidity',),\n", + " (),\n", + " (),\n", + " ('Truffle',),\n", + " (),\n", + " ('Solidity',),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('MetaMask', 'web3.js'),\n", + " ('Ethereum',),\n", + " (),\n", + " ('Truffle',),\n", + " (),\n", + " (),\n", + " ('Solidity', 'Truffle'),\n", + " (),\n", + " ('MetaMask',),\n", + " (),\n", + " ('Solidity',),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('Ethereum',),\n", + " (),\n", + " (),\n", + " ('Ethereum',),\n", + " (),\n", + " (),\n", + " ('Ethereum',),\n", + " (),\n", + " ('Ethereum',),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('Ethereum',),\n", + " ('Solidity',),\n", + " (),\n", + " (),\n", + " (),\n", + " ('Ethereum',),\n", + " ('Truffle',),\n", + " ('Solidity',),\n", + " (),\n", + " (),\n", + " ('Ethereum',),\n", + " (),\n", + " (),\n", + " ('Truffle',),\n", + " ('Solidity',),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " 
('Ethereum',),\n", + " ('Truffle',),\n", + " (),\n", + " (),\n", + " (),\n", + " ('Ethereum',),\n", + " ('Solidity',),\n", + " ('Ethereum',),\n", + " (),\n", + " (),\n", + " ('Solidity',),\n", + " ('Truffle',),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('Truffle',),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('Truffle',),\n", + " (),\n", + " (),\n", + " ('Ethereum',),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('Mythril',),\n", + " (),\n", + " ('Remix', 'Truffle'),\n", + " (),\n", + " (),\n", + " ('Ethereum',),\n", + " ('EVM',),\n", + " ('Ethereum',),\n", + " (),\n", + " (),\n", + " ('Remix', 'Truffle'),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('Ethereum',),\n", + " ('Remix', 'Truffle'),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('Ethereum',),\n", + " (),\n", + " ('Remix', 'Truffle'),\n", + " (),\n", + " ('MetaMask', 'Remix'),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('Consensys',),\n", + " ('Truffle',),\n", + " (),\n", + " ('Truffle',),\n", + " (),\n", + " ('Ethereum',),\n", + " ('Consensys',),\n", + " (),\n", + " ('Remix', 'Solidity', 'Truffle'),\n", + " (),\n", + " ('MetaMask',),\n", + " (),\n", + " ('Solidity',),\n", + " ('Ethereum',),\n", + " (),\n", + " (),\n", + " ('Truffle',),\n", + " (),\n", + " (),\n", + " (),\n", + " ('Truffle',),\n", + " (),\n", + " (),\n", + " ('Truffle',),\n", + " ('Truffle',),\n", + " ('Aragon',),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('Solidity',),\n", + " (),\n", + " ('Ethereum',),\n", + " (),\n", + " (),\n", + " ('Ethereum',),\n", + " (),\n", + " ('Solidity',),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", 
+ " (),\n", + " ('Truffle',),\n", + " (),\n", + " ('Solidity',),\n", + " (),\n", + " (),\n", + " ('Truffle',),\n", + " ('Ethereum',),\n", + " (),\n", + " ('Remix',),\n", + " (),\n", + " ('Ethereum',),\n", + " (),\n", + " ('Truffle',),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('Remix', 'Truffle'),\n", + " (),\n", + " ('Truffle',),\n", + " ('Solidity',),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('Truffle',),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('Ethereum',),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('Ethereum',),\n", + " (),\n", + " (),\n", + " ('Ethereum',),\n", + " (),\n", + " ('Ethereum',),\n", + " (),\n", + " (),\n", + " ('Solidity',),\n", + " (),\n", + " (),\n", + " (),\n", + " ('Truffle',),\n", + " ('Solidity',),\n", + " (),\n", + " (),\n", + " ('Truffle',),\n", + " (),\n", + " ('Truffle',),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('Truffle',),\n", + " ('Remix', 'Truffle'),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('Consensys',),\n", + " ('Truffle',),\n", + " ('Remix', 'Truffle'),\n", + " (),\n", + " (),\n", + " (),\n", + " ('Ethereum',),\n", + " (),\n", + " ('Ethereum',),\n", + " (),\n", + " (),\n", + " ('Truffle',),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('Remix', 'Truffle'),\n", + " ('Truffle',),\n", + " ('Solidity',),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('Solidity',),\n", + " ('Solidity',),\n", + " ('Truffle',),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('Truffle',),\n", + " (),\n", + " ('Truffle',),\n", + " ('Solidity',),\n", + " ('Solidity',),\n", + " (),\n", + " (),\n", + " ('Ethereum',),\n", + " (),\n", + " (),\n", + " ('Geth', 'Truffle'),\n", + " (),\n", + " (),\n", + " (),\n", + " 
('Truffle',),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('Truffle',),\n", + " (),\n", + " (),\n", + " ('Truffle',),\n", + " ('Truffle',),\n", + " (),\n", + " (),\n", + " ('Ethereum',),\n", + " (),\n", + " ('Ethereum',),\n", + " (),\n", + " (),\n", + " ('Ethereum',),\n", + " ('Truffle',),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('Truffle',),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('Solidity',),\n", + " (),\n", + " (),\n", + " ('Truffle',),\n", + " (),\n", + " (),\n", + " (),\n", + " ('Remix',),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('Solidity',),\n", + " (),\n", + " (),\n", + " (),\n", + " ('Truffle',),\n", + " (),\n", + " ('Truffle',),\n", + " (),\n", + " (),\n", + " (),\n", + " ('Ethereum',),\n", + " (),\n", + " ('Truffle',),\n", + " ('Truffle',),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('Solidity',),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('Solidity',),\n", + " (),\n", + " ('Remix', 'Truffle'),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('Ethereum',),\n", + " (),\n", + " (),\n", + " (),\n", + " ('Solidity',),\n", + " (),\n", + " ('Truffle',),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('Truffle',),\n", + " (),\n", + " (),\n", + " ('Ethereum',),\n", + " ('Truffle',),\n", + " ('Truffle',),\n", + " (),\n", + " ('Ethereum',),\n", + " (),\n", + " (),\n", + " (),\n", + " ('Ethereum',),\n", + " (),\n", + " ('Truffle',),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('Ethereum',),\n", + " (),\n", + " (),\n", + " (),\n", + " ('Truffle',),\n", + " ('Solidity',),\n", + " (),\n", + " (),\n", + " (),\n", + " (),\n", + " ('Truffle',),\n", + " (),\n", + " (),\n", + " (),\n", + " 
('Truffle',),\n", + " (),\n", + " ('Solidity',),\n", + " (),\n", + " ('Ethereum',),\n", + " (),\n", + " ('Remix',),\n", + " (),\n", + " (),\n", + " (),\n", + " ('Solidity',),\n", + " (),\n", + " ()]" + ] + }, + "execution_count": 41, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "projects_mlb.inverse_transform(pipeline.predict(X_projects))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Seems like our logistic regression model has an easier time with projects. It's still not perfect though." + ] } ], "metadata": {