Closed Bug 1201977 Opened 9 years ago Closed 9 years ago

Replace usage of nsINavHistoryQuery with a string query to nsPIPlacesDatabase.asyncStatement

Categories

(Firefox :: New Tab Page, defect, P3)

Product:

Component:

Version:

43 Branch

Type:

defect

Priority:

P3

Severity:

normal

Tracking

()

Status:

RESOLVED FIXED

Milestone:

Firefox 47

Iteration:

43.3 - Sep 21

Tracking Flags:

Tracking

Status

firefox43

---

affected

firefox47

---

fixed

People

(Reporter: oyiptong, Assigned: mzhilyaev)

References

Details

Attachments

(2 files, 1 obsolete file)

v1 of places query executer 9 years ago maxim zhilyaev 11.18 KB, patch		Details \| Diff \| Splinter Review
v2. fixed reviewer comments and reworked SQL logic 9 years ago maxim zhilyaev 20.76 KB, patch	mak : feedback-	Details \| Diff \| Splinter Review
MozReview Request: Bug 1201977 - Replace usage of nsINavHistoryQuery with a string query to nsPIPlacesDatabase.asyncStatement r=oyiptong 9 years ago maxim zhilyaev 58 bytes, text/x-review-board-request	oyiptong : review+	Details

Olivier Yiptong [:oyiptong]

Reporter

Description

•

9 years ago

The use of nsINavHistoryQuery involves additional code to maintain that doesn't need to be implemented. For instance, in PlacesProvider.jsm, there is code to sort the results of a database query after all results are collected. Using a string query and asyncStatement execution will make that code much shorter. The nsINavHistoryQuery for getLinks ends up being: SELECT h.id, h.url, h.title AS page_title, h.rev_host, h.visit_count, h.last_visit_date, f.url, null, null, null, null, null AS tags , h.frecency, h.hidden, h.guid FROM moz_places h LEFT JOIN moz_favicons f ON h.favicon_id = f.id WHERE 1 AND hidden = 0 AND last_visit_date NOTNULL ORDER BY 13 DESC LIMIT 100; This could be replaced with: SELECT url, title, last_visit_date, frecency FROM moz_places WHERE hidden = 0 AND last_visit_date NOTNULL ORDER BY frecency DESC LIMIT 10; Since the old code has not been tested before, we need to make sure the functionality stays the same.

Olivier Yiptong [:oyiptong]

Reporter

Comment 1

•

9 years ago

ugh, i meant: SELECT url, title, last_visit_date, frecency FROM moz_places WHERE hidden = 0 AND last_visit_date NOTNULL ORDER BY frecency DESC, last_visit_date DESC, url DESC LIMIT 100;

Olivier Yiptong [:oyiptong]

Reporter

Comment 2

•

9 years ago

This query would further add more work on sqlite and reduce the need to remove links in JS: SELECT url, title, MAX(last_visit_date), MAX(frecency) FROM moz_places WHERE hidden = 0 AND last_visit_date NOTNULL GROUP BY rev_host ORDER BY frecency DESC, last_visit_date DESC, url DESC LIMIT 10; NewTabUtils.getLinks currently calls extractSites. This would diminish the work extractSites has to do, because it would group by reverse hostname, and ignores the HTTP scheme

Comment 3

•

9 years ago

mak/ttaubert, can you provide any context on why NewTabUtils used nsINavHistoryQuery.getNewQuery? Or now that there's asyncStatement, it's okay to just query now?

Marco Bonardo [:mak]

Comment 4

•

9 years ago

I think we could just query through PlacesUtils.promiseDBConnection.

Tim Spurway [:tspurway]

Updated

•

9 years ago

Priority: -- → P3

Assignee

Updated

•

9 years ago

Assignee: nobody → mzhilyaev

Assignee

Comment 5

•

9 years ago

Attached patch v1 of places query executer (obsolete) — Details — Splinter Review

- parked in Links objects for now - also not that domain deduplication is not performed as RNT code does not seem to attempt domain deduplication (only the old NT code dedupes links on base domains)

Attachment #8711087 - Flags: review?(oyiptong)

Olivier Yiptong [:oyiptong]

Reporter

Comment 6

•

9 years ago

Comment on attachment 8711087 [details] [diff] [review] v1 of places query executer Review of attachment 8711087 [details] [diff] [review]: ----------------------------------------------------------------- ::: browser/components/newtab/PlacesProvider.jsm @@ +195,5 @@ > * > * @returns {Promise} Returns a promise with the array of links as payload. > */ > getLinks: function PlacesProvider_getLinks() { > + return Task.spawn(function* () { I think what you want is: > getLinks: Task.async(function* () { > ... > }), @@ +209,5 @@ > + params: {limit: this.maxNumLinks} > + }); > + > + // It's very important that the initial list of links is sorted in > + // the same order imposed by compareLinks, because Links uses compareLinks Please remove mention of compareLinks, because compareLinks will be (and should be) absent from this file. An explanation from first principles is a better way. localeCompare as defined in: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Object/String/localeCompare does a string comparison in ascending order, taking into account the character encoding if passed. compareLinks doesn't. Therefore, I believe that the sqlite string comparison shouldn't be very different. LinkUtils should die. With Fire. @@ +226,5 @@ > + i = BinarySearch.insertionIndexOf(LinkUtils.compareLinks, links, link); > + links.splice(i, 0, link); > + } > + > + return links; On the other hand, we want LinkChecker to run before we return the links. We want to filter out uncouth urls. @@ +280,2 @@ > } > + nit: additional whitespace ::: browser/components/newtab/tests/xpcshell/test_PlacesProvider.js @@ +335,5 @@ > + > + yield PlacesTestUtils.addVisits(visits); > + > + function testItemValue(results, index, value) { > + equal(results[index][0], "https://mozilla.com/" + value, "raw url"); nit: please make use of string templating, as such, where sensible: > let name = "FooBar"; > let someText = `my name is ${name}`; It makes code more readable and less error-prone @@ +397,5 @@ > + } > + catch (e) { > + do_check_true("expected failure - bad sql"); > + } > + // missing bidnings nit: typo should be > // missing bindings

Olivier Yiptong [:oyiptong]

Reporter

Comment 7

•

9 years ago

Also, did we say we were going to do: > GROUP BY rev_host ?

Assignee

Comment 8

•

9 years ago

Attached patch v2. fixed reviewer comments and reworked SQL logic — Details — Splinter Review

Attachment #8711087 - Attachment is obsolete: true

Attachment #8711087 - Flags: review?(oyiptong)

Attachment #8711734 - Flags: review?(oyiptong)

Assignee

Comment 9

•

9 years ago

Attached file MozReview Request: Bug 1201977 - Replace usage of nsINavHistoryQuery with a string query to nsPIPlacesDatabase.asyncStatement r=oyiptong — Details

Review commit: https://reviewboard.mozilla.org/r/32297/diff/#index_header See other reviews: https://reviewboard.mozilla.org/r/32297/

Attachment #8711754 - Flags: review?(oyiptong)

Marco Bonardo [:mak]

Comment 10

•

9 years ago

Comment on attachment 8711734 [details] [diff] [review] v2. fixed reviewer comments and reworked SQL logic Review of attachment 8711734 [details] [diff] [review]: ----------------------------------------------------------------- ::: browser/components/newtab/PlacesProvider.jsm @@ +167,5 @@ > + " \"history\" as type " + > + "FROM " + > + "( " + > + " SELECT rev_host, url, title, frecency, last_visit_date " + > + " FROM moz_places " + why are you doing a subquery with an order by like this? I don't see a reason, selecting directly from moz_places would bring the same results, maybe even better ones. This is very unefficient since it cannot use indices and has to go through the whole database. did you try to profile it? On my db with a very fast computer it takes more than 2 seconds. If you tell me what you are trying to query I can maybe help building something more efficient.

Attachment #8711734 - Flags: feedback-

Assignee

Comment 11

•

9 years ago

Marco, This is our (obviously poor) attempt to off-load as much work to sqlite as we can. I will be descriptive at how we got to this query, so you can give us an expert advice. What we wanted to achieve was quite simple: For every host select a SINGLE page with highest frecency, highest recency. Then, from that page set ordered by frecency, recency select top N pages. The result set is a list of pages ordered by frecency, recency where only a single page from a unique host is present. We made a few naive attempts with this approach: select rev_host, title, url, MAX(frecency), MAX(last_visit_date) from moz_places group by rev_host order by frecency DESC, last_visit_date DESC That did not work because "group by" behavior for columns other then aggregates is undefined in SQL. Since there are two aggregates in select, sqlte ended up mixing urls and titles from different records to satisfy both MAX(frecency) and MAX(last_visit_date). For example, it would take 'title' from a record with MAX(frecency) and 'url' from a record with MAX(last_visit_date), and would place them together into a resulting record. To get around this issue, we moved ordering with groupby into subselect, and noticed that sqlite picks the last row in the groupby bucket (which explains this backward orderby clause inside subselect). And then we re-order the result outside the subselect and do the limit. This forces sqlite to do the full table scan and screws performance. So, what should we do? One optimization that I can think of is to join moz_places with moz_hosts inside subselect - this will choose only the rows which hosts are listed in moz_hosts table. I am not particularity thrilled about this solution because it would require reversing host column if moz_hosts table in order to join on rev_host. Another option is to revert back to old behavior where we simply select to N most fresent pages inside subselect, and then run date ordering and rev_host deduplication outside the subselect. That will definitely help, but.... If you could advice a better solution, I would rather use that. Finally, if you have a beefy places.sqlite file to test performance on, please share. My miki-mouse profile did not catch performance degrade, and I woudl rather use what you use for performance testing.

Flags: needinfo?(mak77)

Olivier Yiptong [:oyiptong]

Reporter

Comment 12

•

9 years ago

Thanks for chiming in, Marco. I'll wait for Marco's feedback before I resume reviewing.

Marco Bonardo [:mak]

Comment 13

•

9 years ago

The perf problem is usually due to the most internal query, indeed in your case the most internal query selects the whole table. But you don't need the whole table for the external query, you just need the pages for the 10 most frecent domains. So a first improvement may be to add a condition in the most internal query to limit the number of entries that the external one will have to go through, for example: rev_host IN ( SELECT DISTINCT rev_host FROM moz_places WHERE hidden = 0 AND last_visit_date NOTNULL ORDER by frecency DESC, last_visit_date DESC LIMIT 10 ) With this, you may even try the query you were trying to do originally, but as you said SQL will take "random" values, thus to make it work you should select each value one by one. This works when you need just a couple values but in this case the query would be quite long: SELECT (SELECT url FROM moz_places WHERE rev_host = h.rev_host ORDER BY frecency DESC, last_visit_date DESC LIMIT 1) AS url, (SELECT title FROM moz_places WHERE rev_host = h.rev_host ORDER BY frecency DESC, last_visit_date DESC LIMIT 1) AS title, (SELECT frecency FROM moz_places WHERE rev_host = h.rev_host ORDER BY frecency DESC, last_visit_date DESC LIMIT 1) AS frecency, (SELECT last_visit_date FROM moz_places WHERE rev_host = h.rev_host ORDER BY frecency DESC, last_visit_date DESC LIMIT 1) AS lastVisitDate, "history" as type FROM moz_places h WHERE rev_host IN ( SELECT DISTINCT rev_host FROM moz_places WHERE hidden = 0 AND last_visit_date NOTNULL ORDER by frecency DESC, last_visit_date DESC LIMIT 10 ) GROUP BY rev_host ORDER BY frecency DESC, lastVisitDate DESC, url LIMIT 10 Even if this executes 6 subqueries, it is faster than your original query cause it acts on a subset and can use indices (about 70ms) There are various ways to obtain the result starting from that internal optimization, you could even apply it to the subquery trick you used and check how it performs. In the end, after some playing, I obtained this query that is about 4 times faster than the previous one (15ms): SELECT url, title, frecency, last_visit_date as lastVisitDate, "history" as type FROM moz_places WHERE frecency IN ( SELECT MAX(frecency) AS frecency FROM moz_places WHERE rev_host IN ( SELECT DISTINCT rev_host FROM moz_places WHERE hidden = 0 AND last_visit_date NOTNULL ORDER by frecency DESC, last_visit_date DESC LIMIT 10 ) GROUP BY rev_host ) GROUP BY rev_host HAVING MAX(last_visit_date) ORDER BY frecency DESC, last_visit_date DESC The idea is, starting from the 10 most frecent hosts, extract the max frecency for each of them, then select the pages having those frecency values, then group by rev_host to remove dupes. the HAVING is using an aggregate and thus it applies to the group subset. Finally it just needs sorting. Note I did some quick tests and I seem to get the expected values, but you should double check that I was not fooled. Regardless this should give you some interesting ideas to investigate. To test I'm currently just using my everyday browsing profile, it is about 80MB places.sqlite, nothing crazy. Unfortunaly it has many personal data so it's hard to share it... Btw, a good testing db should have something like 100 thousands unique pages from 7000 unique hosts.

Flags: needinfo?(mak77)

Olivier Yiptong [:oyiptong]

Reporter

Updated

•

9 years ago

Attachment #8711754 - Flags: review?(oyiptong)

Olivier Yiptong [:oyiptong]

Reporter

Comment 14

•

9 years ago

Comment on attachment 8711754 [details] MozReview Request: Bug 1201977 - Replace usage of nsINavHistoryQuery with a string query to nsPIPlacesDatabase.asyncStatement r=oyiptong https://reviewboard.mozilla.org/r/32297/#review29739 ::: browser/components/newtab/PlacesProvider.jsm:165 (Diff revision 1) > - .SORT_BY_FRECENCY_DESCENDING; > + let sqlQuery = "SELECT url, title, frecency, " + Please change query according to Marco's feedback ::: browser/components/newtab/PlacesProvider.jsm:205 (Diff revision 1) > + return Task.spawn(function*() { Can this be Task.async? ::: browser/components/newtab/tests/xpcshell/test_PlacesProvider.js:280 (Diff revision 1) > + // test paramter passing nit: typo: parameter passing ::: browser/components/newtab/tests/xpcshell/test_PlacesProvider.js:303 (Diff revision 1) > + // test ordering Unsure what this is testing. It looks like this is testing SQLite and not testing the logic.

Olivier Yiptong [:oyiptong]

Reporter

Comment 15

•

9 years ago

https://reviewboard.mozilla.org/r/32297/#review29771

Olivier Yiptong [:oyiptong]

Reporter

Comment 16

•

9 years ago

Thanks for the patch! The changes to be made are fairly cosmetic, aside from the SQL query suggestions by Marco.

Assignee

Comment 17

•

9 years ago

After some twisting on Marco's idea, I finally arrived to the query below that runs almost as fast as Marco's quickest query, and generates correct results: SELECT url, title, frecency, last_visit_date as lastVisitDate, "history" as type FROM moz_places WHERE frecency in ( SELECT MAX(frecency) as freceny FROM moz_places WHERE hidden = 0 AND last_visit_date NOTNULL GROUP BY rev_host ORDER BY freceny DESC LIMIT 10 ) GROUP BY rev_host HAVING MAX(lastVisitDate) ORDER BY frecency DESC, lastVisitDate DESC My profile is 1/10 of Marco's, so the speed tests may not be directly applicable. But I do observe the same performance. Marco, if not mach trouble, could you run the above query against your monstrous profile and verify performance acceptable.

Flags: needinfo?(mak77)

Assignee

Comment 18

•

9 years ago

Comment on attachment 8711754 [details] MozReview Request: Bug 1201977 - Replace usage of nsINavHistoryQuery with a string query to nsPIPlacesDatabase.asyncStatement r=oyiptong Review request updated; see interdiff: https://reviewboard.mozilla.org/r/32297/diff/1-2/

Attachment #8711754 - Flags: review?(oyiptong)

Olivier Yiptong [:oyiptong]

Reporter

Comment 19

•

9 years ago

Comment on attachment 8711754 [details] MozReview Request: Bug 1201977 - Replace usage of nsINavHistoryQuery with a string query to nsPIPlacesDatabase.asyncStatement r=oyiptong https://reviewboard.mozilla.org/r/32297/#review31091 ::: browser/components/newtab/PlacesProvider.jsm:192 (Diff revision 2) > - // searches on the list. So, ensure the list is so ordered. > + * aOptions.columns - an array of column names. if supplied the return typo: returned? ::: browser/components/newtab/PlacesProvider.jsm:193 (Diff revision 2) > - let i = 1; > + * items will consists of objects keyed on column names. Otherwise typo: consist ::: browser/components/newtab/PlacesProvider.jsm:193 (Diff revision 2) > - let i = 1; > + * items will consists of objects keyed on column names. Otherwise Otherwise _an_ array? ::: browser/components/newtab/PlacesProvider.jsm:196 (Diff revision 2) > - if (LinkUtils.compareLinks(links[i - 1], links[i]) > 0) { > + * aOptions.callback - a callback to handle query raws typo: raws ::: browser/components/newtab/PlacesProvider.jsm:196 (Diff revision 2) > - if (LinkUtils.compareLinks(links[i - 1], links[i]) > 0) { > + * aOptions.callback - a callback to handle query raws Do we need a callback parameter? How about we just return the results of the query? ::: browser/components/newtab/PlacesProvider.jsm:204 (Diff revision 2) > + return Task.spawn(function*() { Can this be in Task.async? Furthermore, do we need to handle each row or can we just obtain the rows? We'll probably need to make a copy of the keys, but IIRC, the rows are returned as objects with column names as keys. Something like: ```javascript executePlacesQuery: Task.async(function*(aSql, aOptions={}) { let {columns, params, callback} = aOptions; let conn = yield PlacesUtils.promiseDBConnection(); let rows = conn.executeCached(aSql, params, callback); let items = rows.map(row => { let item = {}; for (let column of columns) { item[column] = row.getResultByName(column); } return item; }); return items; ```

Attachment #8711754 - Flags: review?(oyiptong)

Olivier Yiptong [:oyiptong]

Reporter

Comment 20

•

9 years ago

https://reviewboard.mozilla.org/r/32297/#review31095 ::: browser/components/newtab/PlacesProvider.jsm:168 (Diff revision 2) > - > + " SELECT MAX(frecency) as freceny " + typo: freceny ::: browser/components/newtab/PlacesProvider.jsm:172 (Diff revision 2) > - let url = row.getResultByIndex(1); > + " ORDER BY freceny DESC " + typo: freceny

Assignee

Comment 21

•

9 years ago

Comment on attachment 8711754 [details] MozReview Request: Bug 1201977 - Replace usage of nsINavHistoryQuery with a string query to nsPIPlacesDatabase.asyncStatement r=oyiptong Review request updated; see interdiff: https://reviewboard.mozilla.org/r/32297/diff/2-3/

Attachment #8711754 - Flags: review?(oyiptong)

Assignee

Comment 22

•

9 years ago

https://reviewboard.mozilla.org/r/32297/#review31091 > Do we need a callback parameter? > > How about we just return the results of the query? talked to :oyiptong - decided to keep callback for the sake of generality. > Can this be in Task.async? > > Furthermore, do we need to handle each row or can we just obtain the rows? > > We'll probably need to make a copy of the keys, but IIRC, the rows are returned as objects with column names as keys. > > Something like: > > ```javascript > executePlacesQuery: Task.async(function*(aSql, aOptions={}) { > let {columns, params, callback} = aOptions; > let conn = yield PlacesUtils.promiseDBConnection(); > let rows = conn.executeCached(aSql, params, callback); > > let items = rows.map(row => { > let item = {}; > for (let column of columns) { > item[column] = row.getResultByName(column); > } > return item; > }); > > return items; > ``` talked to :oyiptong - decided to keep the original code for the sake of generality and error handling.

Marco Bonardo [:mak]

Comment 23

•

9 years ago

(In reply to maxim zhilyaev from comment #17) > My profile is 1/10 of Marco's, so the speed tests may not be directly > applicable. But I do observe the same performance. Marco, if not mach > trouble, could you run the above query against your monstrous profile and > verify performance acceptable. Is there a specific reason to change my query? like it is returning wrong results in some case? Locally I see both queries return the same results, but while my query takes 16ms, yours takes 250ms. That makes sense cause your query requires to group by the whole moz_places table and then it can limit it, while my query can first sort by frecency, then pick the first 10 distinct hosts, so it has to walk less rows.

Flags: needinfo?(mak77)

Assignee

Comment 24

•

9 years ago

Marco, I do see differences in my profile between results sets computed by the two queries. Consider the sub-query results from the slower query: SELECT rev_host, MAX(frecency) as freceny FROM moz_places WHERE hidden = 0 AND last_visit_date NOTNULL GROUP BY rev_host ORDER BY freceny DESC LIMIT 10 721.66.861.291.|42533 moc.elgoog.liam.|40700 moc.nideknil.www.|16985 moc.lanruojevil.1kintup.|13553 moc.elgoog.www.|12434 moc.lanruojevil.lledyps.|12046 moc.lanruojevil.dassaclenoloc.|10971 ur.oboboh.www.|10704 gro.allizom.koobenohp.|9561 moc.itic.enilno.|8580 Note that it has mail.google.com as a second row. Now, consider results from your subquery SELECT rev_host, MAX(frecency) AS frecency FROM moz_places WHERE rev_host IN ( SELECT DISTINCT rev_host FROM moz_places WHERE hidden = 0 AND last_visit_date NOTNULL ORDER by frecency DESC, last_visit_date DESC LIMIT 10 ) GROUP BY rev_host ORDER BY frecency DESC 721.66.861.291.|42533 moc.nideknil.www.|16985 moc.lanruojevil.1kintup.|13553 moc.lanruojevil.lledyps.|12046 moc.lanruojevil.dassaclenoloc.|10971 ur.oboboh.www.|10704 gro.allizom.koobenohp.|9561 ur.acimonocoen.|7895 moc.lanruojevil.nizahk.|7740 moc.lanruojevil.vearuk-kaid.|7625 Note, that mail.google.com is gone. I believe the "SELECT DISTINCT rev_host" does not necessarily picks rev_hosts in the order of frecency (which one would expect). If you run this query SELECT DISTINCT rev_host from ( SELECT rev_host FROM moz_places WHERE hidden = 0 AND last_visit_date NOTNULL ORDER by frecency DESC, last_visit_date DESC ) LIMIT 10 you discover that the result set is different from what one would expect by examining the output of the sub-query and choosing top 10 distinct sites going down the list in frecency order (you may want to increase the LIMIT to see this happening in your profile) Do you suggest we keep looking for the ways to further improve performance, or it's adequate at 250ms? Please advice.

Flags: needinfo?(mak77)

Olivier Yiptong [:oyiptong]

Reporter

Comment 25

•

9 years ago

Here's additional data points from a run on my machine. This will be comparing the first (a) and second (b) of mak's comment 13, and maksik's revised query (c) from comment 17. The size of my profile is: > sqlite> select count(*) from moz_places; > 104858 Here are the real clock times, averaged obtained from running 5 queries with `.time ON` in sqlite: > clock = {'a': [0.113, 0.106, 0.113, 0.11, 0.104], > 'b': [0.017, 0.016, 0.016, 0.017, 0.015], > 'c': [0.12, 0.122, 0.127, 0.126, 0.127]} > [(name, sum(real_clock)/len(real_clock)) for name, real_clock in clock.iteritems()] > > >> [('a', 0.1092), ('b', 0.0162), ('c', 0.1244)] i.e. mak's second query (16ms) is 6.7X faster than the first (109ms) and 7.7X faster than maksik's revised query (124ms). I obtain the same results from all 3 queries.

Olivier Yiptong [:oyiptong]

Reporter

Comment 26

•

9 years ago

maksik, if you generated that places db, can you please share that sqlite 3 file?

Olivier Yiptong [:oyiptong]

Reporter

Comment 27

•

9 years ago

I made a quick profile anonymizer: https://gist.github.com/oyiptong/784e1ed92c4851da2557 It expects a file named `places.backup.sqlite` to exist and will drop all tables except moz_places. It then replaces moz_places columns with anonymized entries. If anonymous enough, using this, we could share large profiles. Thoughts?

Olivier Yiptong [:oyiptong]

Reporter

Comment 28

•

9 years ago

updated the script to include rows where rev_host = null

Assignee

Comment 29

•

9 years ago

send anonymized profile via email. Note that i am using: sqlite3 -version 3.7.13 2012-07-17 17:46:21 65035912264e3acbced5a3e16793327f0a2f17bb

Marco Bonardo [:mak]

Comment 30

•

9 years ago

(In reply to maxim zhilyaev from comment #24) > Do you suggest we keep looking for the ways to further improve performance, > or it's adequate at 250ms? > Please advice. It depends on UX. If we need to run this query before the user can interact with the page, anything above 50-100ms will be noticeable to the user. If instead it's something running in the background that can happen at any time, it will likely be unnoticed. That said, I will look into your feedback, I think I have an half idea of the problem, and you're likely right the distinct is not doing what we think, likely it needs an aggregate. Let me see what I can do.

Olivier Yiptong [:oyiptong]

Reporter

Comment 31

•

9 years ago

Here's another script that measures the times it takes to run the queries. It also compares if the results are the same: https://gist.github.com/oyiptong/27a1f3f91204f3d56fb3

Olivier Yiptong [:oyiptong]

Reporter

Comment 32

•

9 years ago

My results: results for mak_1@10 mean: 106ms median: 107ms results for mak_1@100 mean: 313ms median: 315ms results for mak_1@1000 mean: 739ms median: 737ms results for mak_2@10 mean: 14ms median: 15ms results for mak_2@100 mean: 46ms median: 46ms results for mak_2@1000 mean: 122ms median: 120ms results for maksik_2@10 mean: 149ms median: 148ms results for maksik_2@100 mean: 148ms median: 148ms results for maksik_2@1000 mean: 165ms median: 166ms ['mak_1', 'mak_2', 'maksik_2'] same for 10 query limit: True ['mak_1', 'mak_2', 'maksik_2'] same for 100 query limit: False ['mak_1', 'mak_2', 'maksik_2'] same for 1000 query limit: False

Marco Bonardo [:mak]

Comment 33

•

9 years ago

you query is correct, unfortunately we cannot guarantee much about the distinct behavior, it's mostly undefined unless we distinct the same column we order on, and we can't. My query is much faster mostly cause it can use the frecency index to sort, but we need to sort on max(frecency), so we can't do better than what you are doing, if we want sqlite do all the work for us. The only thing I found is that if Sqlite doesn't use the rev_host index for the group by, and goes with a linear scan, it takes half the time, for whatever reason. (try replacing "GROUP BY rev_host" with "GROUP BY +rev_host" to suggest the optimizer not using the rev_host index). How is UX here? is the query blocking interaction? You definitely want to add a telemetry probe measuring the time taken by this query in the wild. as a side note, tomorrow I'm out, if I figure out something better in the meanwhile I'll let you know. But feel free to proceed. My only suggestions so far are trying to skip the index, add a telemetry probe to check how it performas in the wild, and ensure this doesn't block user interaction.

Flags: needinfo?(mak77)

Olivier Yiptong [:oyiptong]

Reporter

Comment 34

•

9 years ago

Comment on attachment 8711754 [details] MozReview Request: Bug 1201977 - Replace usage of nsINavHistoryQuery with a string query to nsPIPlacesDatabase.asyncStatement r=oyiptong https://reviewboard.mozilla.org/r/32297/#review31467 ::: browser/components/newtab/PlacesProvider.jsm:163 (Diff revision 3) > - // Sort by frecency, descending. > + let sqlQuery = "SELECT url, title, frecency, " + please use backticks for multiline string ::: browser/components/newtab/tests/xpcshell/test_PlacesProvider.js:306 (Diff revision 3) > + "select url, title, last_visit_date, frecency from moz_places " + nit: use backticks for multiline string ::: browser/components/newtab/tests/xpcshell/test_PlacesProvider.js:306 (Diff revision 3) > + "select url, title, last_visit_date, frecency from moz_places " + nit: backticks for multiline string

Attachment #8711754 - Flags: review?(oyiptong)

Olivier Yiptong [:oyiptong]

Reporter

Comment 35

•

9 years ago

Comment on attachment 8711754 [details] MozReview Request: Bug 1201977 - Replace usage of nsINavHistoryQuery with a string query to nsPIPlacesDatabase.asyncStatement r=oyiptong https://reviewboard.mozilla.org/r/32297/#review31469 r+ after string nits

Attachment #8711754 - Flags: review+

Olivier Yiptong [:oyiptong]

Reporter

Updated

•

9 years ago

Attachment #8711734 - Flags: review?(oyiptong)

Assignee

Comment 36

•

9 years ago

Comment on attachment 8711754 [details] MozReview Request: Bug 1201977 - Replace usage of nsINavHistoryQuery with a string query to nsPIPlacesDatabase.asyncStatement r=oyiptong Review request updated; see interdiff: https://reviewboard.mozilla.org/r/32297/diff/3-4/

Attachment #8711754 - Attachment description: MozReview Request: Bug 1201977 - Replace usage of nsINavHistoryQuery with a string query to nsPIPlacesDatabase.asyncStatement; r?oyiptong → MozReview Request: imported patch 1201977.v5.patch

Assignee

Comment 37

•

9 years ago

Comment on attachment 8711754 [details] MozReview Request: Bug 1201977 - Replace usage of nsINavHistoryQuery with a string query to nsPIPlacesDatabase.asyncStatement r=oyiptong Review request updated; see interdiff: https://reviewboard.mozilla.org/r/32297/diff/4-5/

Attachment #8711754 - Attachment description: MozReview Request: imported patch 1201977.v5.patch → MozReview Request: Bug 1201977 - Replace usage of nsINavHistoryQuery with a string query to nsPIPlacesDatabase.asyncStatement r=oyiptong

Comment 38

•

9 years ago

https://hg.mozilla.org/integration/fx-team/rev/874d13a1b6ea

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 39

•

9 years ago

And a complementary eslint fix: https://hg.mozilla.org/integration/fx-team/rev/33fc723946aa

Carsten Book [:Tomcat]

Comment 40

•

9 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/874d13a1b6ea https://hg.mozilla.org/mozilla-central/rev/33fc723946aa

Status: NEW → RESOLVED

Closed: 9 years ago

status-firefox47: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → Firefox 47

You need to log in before you can comment on or make changes to this bug.