Topic: Parsing

Maybe this is the wrong section, sorry.
Are there any public APIs for parsing Google and Bing (without private keys)?

2

Re: Parsing

Google has no such API. You can parse the search results pages directly.
The only official option is https://developers.google.com/custom-search/v1/overview

Free quota
Usage is free for all users, up to 100 queries per day.
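
For reference, a minimal sketch of what a request to that API could look like. It only builds the request URL and reads links out of an already decoded response, so nothing here hits the network. Note that it does need an API key and a search engine ID, so it does not satisfy the "without private keys" requirement; `YOUR_API_KEY`, `YOUR_ENGINE_ID` and the helper names are placeholders made up for this example.

```python
from urllib.parse import urlencode

# Endpoint of the Custom Search JSON API.
API_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_search_url(query, api_key, engine_id, start=1):
    """Return a request URL for `query` (key, cx, q, start are query parameters)."""
    params = {"key": api_key, "cx": engine_id, "q": query, "start": start}
    return f"{API_ENDPOINT}?{urlencode(params)}"


def extract_links(response):
    """Pull (title, link) pairs out of a decoded JSON response dict."""
    return [(item["title"], item["link"]) for item in response.get("items", [])]


# Placeholder credentials: replace with your own before sending the request.
url = build_search_url("ukrainian forum", "YOUR_API_KEY", "YOUR_ENGINE_ID")
print(url)
```

The resulting URL can be fetched with any HTTP client; each such request counts against the daily quota quoted above.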

3

Re: Parsing

:(

Daily Limit for Unauthenticated Use Exceeded. Continued use requires signup.

Someone on my network must have been playing with it.

4

Re: Parsing

Are there any other methods?

5 Last edited by funivan (30.10.2012 13:02:05)

Re: Parsing

Personally, I don't know of anything besides the one Replace pointed out. But why doesn't ordinary parsing work for you? :)

6

Re: Parsing

miroslav.chandler wrote:

Maybe this is the wrong section, sorry.
Are there any public APIs for parsing Google and Bing (without private keys)?

And if you were the owner of Google, would you create a public API for parsing it? :)

7

Re: Parsing

And if you were the owner of Google, would you create a public API for parsing it? :)

Technically there is an API, just with very strict limits:

The Google Custom Search API lets you develop websites and programs to retrieve and display search results from Google Custom Search programmatically. With this API, you can use RESTful requests to get either web search or image search results in JSON or Atom format.
Any usage beyond the free usage quota will fail if you are not signed up for billing. Once you have enabled billing, you will continue to receive 100 free queries per day. However, you will be billed for all additional requests at the rate of $5 per 1000 queries, for up to 10,000 queries per day.
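
To put those numbers in perspective, here is a small sketch that estimates the daily bill from the figures quoted above. It assumes billing is proportional per query (not rounded up to whole blocks of 1000) and that the 10,000 cap covers free and paid queries together; the quote does not spell out either point, and the function name is made up.

```python
FREE_PER_DAY = 100      # free queries per day (from the quote above)
PRICE_PER_1000 = 5.0    # USD per 1000 additional queries (from the quote above)
MAX_PER_DAY = 10_000    # assumed to be the total daily cap


def daily_cost(queries):
    """Estimated cost in USD for `queries` requests made in one day."""
    if queries < 0 or queries > MAX_PER_DAY:
        raise ValueError("query count outside the assumed daily range")
    billable = max(0, queries - FREE_PER_DAY)
    return billable * PRICE_PER_1000 / 1000


for n in (100, 1_100, 10_000):
    print(n, daily_cost(n))
```

So under these assumptions a full day at the cap comes to just under $50.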

8

Re: Parsing

Patron wrote:
miroslav.chandler wrote:

Maybe this is the wrong section, sorry.
Are there any public APIs for parsing Google and Bing (without private keys)?

And if you were the owner of Google, would you create a public API for parsing it? :)

http://ajax.googleapis.com/ajax/service … mp;q=ololo
(:
There's only this tool, and it's officially deprecated.

9

Re: Parsing

Has anyone managed to parse data from "Google's Keyword Tool - Google Adwords"? Is that even doable?

10

Re: Parsing

As an option, you could write a program/script/library. There's no JavaScript there, so parsing isn't a problem.

11

Re: Parsing

There's no JavaScript there, so parsing isn't a problem.

Actually, everything there goes through AJAX. Parsing that is unrealistic.

12 Last edited by ADR (11.02.2013 22:51:44)

Re: Parsing

Replace wrote:

There's no JavaScript there, so parsing isn't a problem.

Actually, everything there goes through AJAX. Parsing that is unrealistic.

I have parsed sites that use AJAX...

Could you post a link to the page with the data? (I hope no login is required?)

The main thing is that the script doesn't generate some kind of key or hash...

13

Re: Parsing

https://adwords.google.com/o/KeywordTool

14

Re: Parsing

What a mess they've made there...

You need to send a POST request to: https://adwords.google.com/o/Targeting/ … rrency=USD
BUT

POST request:

Hidden text
7|1|94|https://adwords.google.com/o/Targeting/|5BAA2B8303235029A339E698BB40CAC7|_|invoke|d|2m6|2s1|2lx|2ly|t|TiAction (Search.SearchInput.KEYWORD_IDEAS.NONE.bundles=0,defaulted=0.RelatedToKeyword+RelatedToUrl)|XIfY_r2Rjw5mPXWySvoYhmW9aoU:1360687460500|2ru|2n1|2mb|2md|m|u|v|2pa|2o0|2rs|Captcha|answered|validated|RelatedToKeyword(KEYWORD_INPUT)|TiImpression|ParameterInQuery|RelatedToUrl(URL_INPUT)|LanguageTarget(COMMON_ADVANCED_OPTIONS)|DeviceType(KEYWORD_IDEAS_ADVANCED_OPTIONS)|KeywordMatchType(MATCH_TYPE_SELECTOR_PANEL)|28s|2pz|o|AXIS-NONE|AXIS-CONCEPTS|CellTable|CreateAdGroups|CurrencyPicker|IcsCaptchaMessage|KeywordIdeasVisualizations|LoadNegativesFromCampaignAndAdgroup|MobileAppPlacements|OneBox|PersistPreferences|SavedAdGroups|SeedKeywordsTable|trafficEstimatorSunset|urlBundles|AverageMonthlyTrafficWithAfs|StarredIdeasEditing|SuggestedBid|28f|2qb|1ob|uk|1q8|2a2|292|2q1|293|IDEA_TYPE|KEYWORD|AVERAGE_TARGETED_MONTHLY_SEARCHES|AD_SHARE|COMPETITION|EXTRACTED_FROM_WEBPAGE|GLOBAL_MONTHLY_SEARCHES|IDEA_IN_ADGROUP|NEGATIVE_KEYWORDS|SEARCH_SHARE|SUGGESTED_BID|TARGETED_MONTHLY_SEARCHES|2pb|2b9|2q8|2ej|українська|2au|29y|2bm|2p9|2q7|nv|nw|Український 
форум|CATEGORY_PRODUCTS_AND_SERVICES|KEYWORD_CATEGORY|2bp|2bq|2rr|http://replace.org.ua|2b6|1|2|3|4|1|5|5|6|nNsd_KmT4|4|7|1|8|0|9|A|10|2|7msoA|7msoA|11|12|13|9|14|15|16|-4|0|0|17|6|A|0|0|0|0|0|0|0|18|0|19|1|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|20|0|7|0|0|0|21|22|0|-1|23|22|1|24|14|15|16|-4|0|0|-11|A|0|0|0|0|0|0|0|-12|-13|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|-14|0|7|0|0|0|21|-17|-1|23|22|1|25|14|15|16|-4|0|0|-11|A|0|0|0|0|0|0|0|-12|-13|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|-14|0|7|0|0|0|21|22|1|26|1|27|22|1|28|14|15|16|-4|0|0|-11|A|0|0|0|0|0|0|0|-12|-13|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|-14|0|7|0|0|0|21|22|1|29|1|27|-31|14|15|16|-4|0|0|-11|A|0|0|0|0|0|0|0|-12|-13|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|-14|0|7|0|0|0|21|22|1|30|1|27|-31|14|15|16|-4|0|0|-11|A|0|0|0|0|0|0|0|-12|-13|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|-14|0|7|0|0|0|21|22|1|31|1|27|-31|14|15|16|-4|0|0|-11|A|0|0|0|0|0|0|0|-12|-13|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|-14|0|7|0|0|0|21|22|1|32|1|27|-31|33|15|16|-4|0|0|-11|A|0|0|0|0|0|0|0|-12|-13|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|34|18|35|1|36|35|0|37|35|1|38|35|1|39|35|1|40|35|1|41|35|1|42|35|1|43|35|1|44|35|1|45|35|1|46|35|1|47|35|1|48|35|1|49|35|1|50|35|1|51|35|1|52|35|1|53|0|7|0|0|0|54|55|236|56|0|57|0|58|50|0|0|59|0|0|60|1|61|12|62|63|62|64|62|65|62|66|62|67|62|68|62|69|62|70|62|71|62|72|62|73|62|74|75|61|3|76|77|78|79|57|80|81|4|82|83|84|85|0|86|2|87|0|0|0|0|0|33|15|16|-4|0|0|-11|A|0|0|0|0|0|0|0|-12|-13|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|34|18|35|1|36|35|0|37|35|1|38|35|1|39|35|1|40|35|1|41|35|1|42|35|1|43|35|1|44|35|1|45|35|1|46|35|1|47|35|1|48|35|1|49|35|1|50|35|1|51|35|1|52|35|1|53|0|7|0|0|0|54|-80|-81|57|0|58|50|0|0|-83|60|0|61|12|-86|-87|-88|-89|-90|-91|-92|-93|-94|-95|-96|-97|61|3|62|88|-87|62|89|61|5|90|0|91|3|77|92|93|-100|-103|94|77|-109|82|-106|84|85|0|-109|87|0|0|0|0|0|

Result (JSON data):

Hidden text
//OK[100,5,4,0,91,90,1,0,10,83,89,88,3,87,86,3,85,84,3,0,10,83,1,10,83,1,10,83,2,10,82,81,34,1,4,-155,64,0.162,63,51,-151,0,51,-149,'u',38,53,-146,0,0,0,0,80,-132,0,47,0,0,0,0,79,-132,0,47,2,10,58,-141,-140,55,-138,'u',38,53,-135,0,51,-133,0,0,0,0,79,-132,0,47,46,-129,0,44,-127,'BDl_',78,41,40,-124,2012,2,'BJ',38,37,2012,3,'6',38,37,2012,4,'k',38,37,2012,5,'c',38,37,2012,6,'u',38,37,2012,7,'u',38,37,2012,8,'k',38,37,2012,9,'u',38,37,2012,10,'k',38,37,2012,11,'u',38,37,2012,12,'6',38,37,2013,1,0,37,12,10,36,-98,11,4,33,0.116,63,51,-151,0,51,-149,'u',38,53,-146,0,0,0,0,77,-132,0,47,0,0,0,0,76,-132,0,47,2,10,58,-141,-140,55,-138,'u',38,53,-135,0,51,-133,0,0,0,0,76,-132,0,47,46,-129,0,44,-127,'Yag',73,41,40,-124,2012,2,'u',38,37,2012,3,'c',38,37,2012,4,'c',38,37,2012,5,'k',38,37,2012,6,'Bb',38,37,2012,7,'u',38,37,2012,8,'6',38,37,2012,9,'c',38,37,2012,10,'c',38,37,2012,11,'6',38,37,2012,12,'6',38,37,2013,1,0,37,12,10,36,-98,11,4,33,0.05,63,51,-151,0,51,-149,'Bb',38,53,-146,0,0,0,0,75,-132,0,47,0,0,0,0,74,-132,0,47,2,10,58,-141,-140,55,-138,'Bb',38,53,-135,0,51,-133,0,0,0,0,74,-132,0,47,46,-129,0,44,-127,'Yag',73,41,40,-124,2012,2,'Cq',38,37,2012,3,'Bu',38,37,2012,4,'BJ',38,37,2012,5,'BJ',38,37,2012,6,'BJ',38,37,2012,7,'Bb',38,37,2012,8,'Bb',38,37,2012,9,'Bb',38,37,2012,10,'Bb',38,37,2012,11,'Bb',38,37,2012,12,'Bb',38,37,2013,1,0,37,12,10,36,-98,11,4,33,0.071,63,51,-151,0,51,-149,'Cq',38,53,-146,0,0,0,0,72,-132,0,47,0,0,0,0,71,-132,0,47,2,10,58,-141,-140,55,-138,'Cq',38,53,-135,0,51,-133,0,0,0,0,71,-132,0,47,46,-129,0,44,-127,'Dvn2',70,41,40,-124,2012,2,'Bu',38,37,2012,3,'CM',38,37,2012,4,'Bu',38,37,2012,5,'CM',38,37,2012,6,'DS',38,37,2012,7,'Cq',38,37,2012,8,'EE',38,37,2012,9,'FA',38,37,2012,10,'Cq',38,37,2012,11,'DS',38,37,2012,12,'Cq',38,37,2013,1,0,37,12,10,36,-98,11,4,33,4,10,0,10,-92,1,32,69,-92,'E1YnVUhXW',182408330,9936,29,28,1,10,27,26,0,0,0,0,0,4,0,'A','A',-1,68,0,'A','A',67,6,-7,15,6,-20,66,6,-5,3,4,3,13,0,23,1,0,0,11,1,5,1,0,4,65,64,0.061,63,51,62,34,0,51,61,
34,'lg',38,53,60,34,0,0,0,0,59,-132,0,47,0,0,0,0,49,-132,0,47,2,10,58,57,34,0,56,55,54,34,'lg',38,53,52,34,0,51,50,34,0,0,0,0,49,2,48,0,47,46,45,34,0,44,43,34,'BMFH',42,41,40,39,34,2012,2,'lg',38,37,2012,3,'tU',38,37,2012,4,'ds',38,37,2012,5,'ds',38,37,2012,6,'ZA',38,37,2012,7,'ds',38,37,2012,8,'ds',38,37,2012,9,'lg',38,37,2012,10,'lg',38,37,2012,11,'lg',38,37,2012,12,'lg',38,37,2013,1,0,37,12,10,36,35,34,11,4,33,1,10,0,10,-92,1,32,31,2,30,'E1YnVUhXU',182408330,9936,29,28,1,10,27,26,0,0,0,0,0,4,0,'A','A',-1,25,0,'A','A',24,6,-7,15,6,-20,24,6,-5,3,4,3,13,0,23,1,0,0,11,0,0,0,0,0,4,0,'A','A',-1,22,0,'A','A',14,6,-7,15,6,-20,14,6,-5,3,4,3,13,12,1,0,0,11,0,0,0,0,0,4,0,'A','A',-1,21,0,'A','A',14,6,-7,15,6,-20,14,6,-5,3,4,3,13,12,1,0,0,11,0,0,0,0,0,4,0,'A','A',-1,20,0,'A','A',14,6,-7,15,6,-20,14,6,-5,3,4,3,13,12,1,0,0,11,0,0,0,0,0,4,0,'A','A',-1,19,0,'A','A',14,6,-7,15,6,-20,14,6,-5,3,4,3,13,12,1,0,0,11,0,0,0,0,0,4,0,'A','A',-1,18,0,'A','A',14,6,-7,15,6,-20,14,6,-5,3,4,3,13,12,1,0,0,11,0,0,0,0,0,4,0,'A','A',-1,17,0,'A','A',14,6,-7,15,6,-20,14,6,-5,3,4,3,13,12,1,0,0,11,0,0,0,0,0,4,0,'A','A',-1,16,0,'A','A',15,6,-7,15,6,23,5,14,6,-5,3,4,3,13,12,1,0,0,11,9,10,0,4,'A','A',9,0,8,1,4,0,7,6,24,5,7,6,21,5,2,4,3,2,1,0,0,1,["e","2m8","2ma","2s1","2rl","2rr","1617","2lx","2m0","2ru","j","2n3","2mc","0","1","4BFS1YnVBACKVN8K0CYAAA","yRVS1YnVBACKVN8K0CYAAA","yhVS1YnVBACKVN8K0CYAAA","zBVS1YnVBACKVN8K0CYAAA","zhVS1YnVBACKVN8K0CYAAA","0BVS1YnVBACKVN8K0CYAAA","0hVS1YnVBACKVN8K0CYAAA","28u","141","1BVS1YnVBACKVN8K0CYAAA","28e","2a5","2a3","2a1","2a7","events {\n  time_micros: 1360687873075921\n  text: \"Start processing request\"\n}\nevents {\n  time_micros: 1360687873075924\n  text: \"Start fetching data\"\n}\nevents {\n  time_micros: 1360687873077314\n  text: \"Looked up 1 KeywordMetadata from SSTable in 1000 usec. 
Sent 2 rpcs of which 0 failed.\"\n}\nevents {\n  time_micros: 1360687873077475\n  text: \"Fetch done 3 keywords in 1551 usec\"\n}\nevents {\n  time_micros: 1360687873077489\n  text: \"Start filtering suggestions\"\n}\nevents {\n  time_micros: 1360687873077546\n  text: \"Filter done 3 suggestions to 3 suggestions in 57 usec\"\n}\nevents {\n  time_micros: 1360687873077549\n  text: \"Start sorting suggestions\"\n}\nevents {\n  time_micros: 1360687873077554\n  text: \"Sort done 3 suggestions in 5 usec\"\n}\nevents {\n  time_micros: 1360687873077558\n  text: \"Start dedup\"\n}\nevents {\n  time_micros: 1360687873077571\n  text: \"Dedup done 3 suggestions to 3 suggestions in 13 usec\"\n}\nevents {\n  time_micros: 1360687873077574\n  text: \"Start annotation\"\n}\nevents {\n  time_micros: 1360687873077576\n  text: \"Annotation done in 2 usec\"\n}\nevents {\n  time_micros: 1360687873077595\n  text: \"Start response conversion\"\n}\nevents {\n  time_micros: 1360687873077621\n  text: \"Response conversion done in 26 usec\"\n}\nevents {\n  time_micros: 1360687873077623\n  text: \"Response ready in 1702 usec\"\n}\n","2s2","28d","293","TARGETED_MONTHLY_SEARCHES","1om","1pp","2rn","SUGGESTED_BID","1ok","2lt","0,31\xA0$","EXTRACTED_FROM_WEBPAGE","1ow","KEYWORD","1og","nv","nw","український форум","SEARCH_SHARE","1o8","GLOBAL_MONTHLY_SEARCHES","1oi","IDEA_TYPE","1oc","1ob","NEGATIVE_KEYWORDS","1oh","український","AVERAGE_TARGETED_MONTHLY_SEARCHES","AD_SHARE","COMPETITION","2rh","28c","2pa","1615","1616","1hVS1YnVBACKVN8K0CYAAA","events {\n  time_micros: 1360687873091694\n  text: \"Start processing request\"\n}\nevents {\n  time_micros: 1360687873091695\n  text: \"Start fetching data\"\n}\nevents {\n  time_micros: 1360687873091731\n  text: \"Looking up cache with key : 12548898826427528153\"\n}\nevents {\n  time_micros: 1360687873092204\n  text: \"Cache miss\"\n}\nevents {\n  time_micros: 1360687873093018\n  text: \"Making quest online call with request : seed_input { keyword { 
text: \\\"\\\\321\\\\203\\\\320\\\\272\\\\321\\\\200\\\\320\\\\260\\\\321\\\\227\\\\320\\\\275\\\\321\\\\201\\\\321\\\\214\\\\320\\\\272\\\\320\\\\270\\\\320\\\\271 \\\\321\\\\204\\\\320\\\\276\\\\321\\\\200\\\\321\\\\203\\\\320\\\\274\\\" match_type: BROAD } crawl { url: \\\"http://replace.org.ua\\\" } } targeting { platform: SEARCH language: \\\"uk\\\" } max_results: 1000 client_id: \\\"KWI-OPT.API.EXPLORER\\\" config { u2kconfig { use_related_urls: false } return_keyword_stats: false } event_id { time_usec: 1360687873005014 server_ip: 182408330 process_id: 9936 }\"\n}\nevents {\n  time_micros: 1360687874485487\n  text: \"Looked up 16 KeywordMetadata from SSTable in 1000 usec. Sent 29 rpcs of which 0 failed.\"\n}\nevents {\n  time_micros: 1360687874485749\n  text: \"Fetch done 16 keywords in 1394054 usec\"\n}\nevents {\n  time_micros: 1360687874485758\n  text: \"Start filtering suggestions\"\n}\nevents {\n  time_micros: 1360687874485838\n  text: \"Filter done 16 suggestions to 4 suggestions in 80 usec\"\n}\nevents {\n  time_micros: 1360687874485842\n  text: \"Start sorting suggestions\"\n}\nevents {\n  time_micros: 1360687874485846\n  text: \"Sort done 4 suggestions in 4 usec\"\n}\nevents {\n  time_micros: 1360687874485848\n  text: \"Start dedup\"\n}\nevents {\n  time_micros: 1360687874485857\n  text: \"Dedup done 4 suggestions to 4 suggestions in 9 usec\"\n}\nevents {\n  time_micros: 1360687874485859\n  text: \"Start annotation\"\n}\nevents {\n  time_micros: 1360687874485861\n  text: \"Annotation done in 2 usec\"\n}\nevents {\n  time_micros: 1360687874485874\n  text: \"Start response conversion\"\n}\nevents {\n  time_micros: 1360687874485895\n  text: \"Response conversion done in 21 usec\"\n}\nevents {\n  time_micros: 1360687874485896\n  text: \"Response ready in 1394202 usec\"\n}\n","0,98\xA0$","український природоохоронний форум","український природоохоронний","0,10\xA0$","український форум благодійників","український благодійників","український сат 
форум","український сат","0,28\xA0$","український бізнес форум","український бізнес","CATEGORY_PRODUCTS_AND_SERVICES","29d","29e","11499","Форуми й чати","13418","Інтернет","10007","Інтернет і телекомунікація","10013","Мистецтво та розваги"],1,7]


How to parse the result is something you can still figure out, but the request is much harder...
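
For the response side, a rough sketch of one way to start: check the `//OK` prefix and pull out every double-quoted string, which is where the keywords and bids seem to live. This is a guess based only on the dump above (the format looks like GWT-RPC serialization, but that is an assumption), it ignores the numeric part and the single-quoted values entirely, and `sample` is a shortened, hand-made stand-in for the real response.

```python
import re

# A double-quoted literal, allowing backslash escapes such as \" and \n inside.
STRING_LITERAL = re.compile(r'"((?:[^"\\]|\\.)*)"')


def extract_strings(response_text):
    """Return the raw contents of every double-quoted string in the response."""
    if not response_text.startswith("//OK"):
        raise ValueError("unexpected response prefix")
    return [match.group(1) for match in STRING_LITERAL.finditer(response_text)]


# Hand-made sample in the same shape as the dump above (not a real response).
sample = '//OK[1,0,\'u\',["KEYWORD","український форум","0,31\\xA0$"],1,7]'

strings = extract_strings(sample)
bids = [s for s in strings if s.endswith("$")]  # values that look like bids
print(strings)
print(bids)
```

Mapping the strings back to the right columns would still require decoding the numeric index part, which this sketch does not attempt.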

15 Last edited by ping (17.07.2013 08:35:39)

Re: Parsing

Question: does anyone know a ready-made solution? I need to get ALL links from a URL down to a given depth.

Added:
Answer: if you don't need to write something of your own, here's a great little program:
KLinkStatus
KLinkStatus is KDE's web link validity checker. It allows you to search internal and external links throughout your web site. Simply point it to a single page and choose the depth to search.

You can also check local files, or files over ftp:, fish: or any other KIO protocols. For performance, links can be checked simultaneously.

This package is part of KDE web development module.

http://kdewebdev.org/
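
If you would rather write it yourself after all, here is a minimal sketch of a depth-limited link collector using only the Python standard library. It is unrelated to how KLinkStatus works internally; the names `collect_links` and `fetch_html` are made up for this example, and it skips robots.txt, rate limiting and proper error reporting.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(v for name, v in attrs if name == "href" and v)


def fetch_html(url):
    """Download a page and decode it leniently as UTF-8."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def collect_links(start_url, max_depth, fetch=fetch_html):
    """Breadth-first walk: return all http(s) links found within max_depth hops."""
    seen = {start_url}
    frontier = [start_url]
    for _ in range(max_depth):
        next_frontier = []
        for page in frontier:
            try:
                html = fetch(page)
            except Exception:
                continue  # sketch only: unreachable pages are silently skipped
            parser = LinkCollector()
            parser.feed(html)
            for href in parser.hrefs:
                # Resolve relative links and drop the #fragment part.
                link = urldefrag(urljoin(page, href)).url
                if link.startswith(("http://", "https://")) and link not in seen:
                    seen.add(link)
                    next_frontier.append(link)
        frontier = next_frontier
    return seen - {start_url}
```

Usage would be something like `collect_links("http://replace.org.ua", 2)`. Passing your own `fetch` function makes it easy to add caching or to test without network access.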

16

Re: Parsing

Actually, everything there goes through AJAX. Parsing that is unrealistic.

It's all doable =)

17

Re: Parsing

Take a look here:
http://answers.oreilly.com/topic/2165-h … bing-in-c/